[CompilerPerf] Using reflection to enhance #5112 and #5278 #5307

Closed
wants to merge 46 commits into from

manofstick
Contributor

Third in a series of PRs, each building on the other, sequentially merge-able without the subsequent ones - although ideally all of them would be merged.

This is realizing the discussion begun here.

It moves the relevant parts of reflect.fs down into prim-types.fs, and then uses those functions so that canUseDefaultComparer and canUseDefaultEqualityComparer can better determine whether PER = ER on the requested type. That provides a more efficient comparer, especially in the value type case, where it means that boxing can be avoided.
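To illustrate the idea only: the sketch below uses the public FSharp.Reflection API to decide whether PER equality (where nan <> nan) and ER equality (where nan equals nan) can ever differ for a type. The name perEqualsEr and its rules are assumptions made for this example, not the PR's code; the real checks live in prim-types.fs and cover more cases (unions, for instance).

```fsharp
open System
open FSharp.Reflection

// Hypothetical helper, not taken from the PR.
// Returns true when no float/float32 is reachable from `ty`, so PER and ER
// equality must agree. Deliberately conservative: anything unrecognised is false.
// Limitation: a record that directly contains itself would recurse forever.
let rec perEqualsEr (ty: Type) : bool =
    if ty = typeof<float> || ty = typeof<float32> then false
    elif ty.IsPrimitive || ty = typeof<string> || ty = typeof<decimal> then true
    elif FSharpType.IsTuple ty then
        FSharpType.GetTupleElements ty |> Array.forall perEqualsEr
    elif FSharpType.IsRecord ty then
        FSharpType.GetRecordFields ty
        |> Array.forall (fun p -> perEqualsEr p.PropertyType)
    else false
```

When such a check succeeds for a value type, a comparer specialised to that type can be used instead of one that works on obj, which is where the boxing is saved.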

@manofstick manofstick changed the title Using reflection to enhance #5112 and #5278 [WIP] Using reflection to enhance #5112 and #5278 Jul 8, 2018
@manofstick
Contributor Author

Some information about this PR:

  • Minimal subset of reflect.fs moved to prim-types.fs for querying fsharp types (records, unions)
  • Function stubs left in reflect.fs call implementations in prim-types.fs
  • Removed any stubs in reflect.fs which were no longer required
  • Copied required reshapedreflection.fs functions to prim-types.fs (I'm hoping this will all go away as .netstandard 2.0 becomes standard?)
  • Used Reflection functions to query objects to determine if ER comparer could be used.
  • All project internal

Some information about moving the functions

  • No access to option<>, list<>, Seq, etc.
  • reflect.fs has what appeared to be optimizations for option<> and list<>. This code was left in the stub.
  • tryXXX functions whose signature was ...->option<X> were changed to ...->byref<X>->bool
  • (Seq|Array).filter was converted to Array.FindAll
  • (Seq|Array).select was converted to Array.ConvertAll
  • A reshaped Array.ConvertAll was required, as Array.ConvertAll was only introduced in .netstandard 2.0
  • sortFreshArray didn't originally create a fresh array, and still doesn't, but I didn't rename it
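The signature change and the Array.FindAll / Array.ConvertAll substitutions listed above might look roughly like the following. This is a minimal sketch: tryFindEvenOpt, tryFindEven and the sample arrays are made-up names for illustration, not functions from reflect.fs or prim-types.fs.

```fsharp
open System

// Option-returning style, only possible once option<'T> is defined.
let tryFindEvenOpt (xs: int[]) : int option =
    Array.tryFind (fun x -> x % 2 = 0) xs

// byref/bool style: usable early in prim-types.fs, where option<'T> does not exist yet.
let tryFindEven (xs: int[], result: byref<int>) : bool =
    let mutable found = false
    let mutable i = 0
    while not found && i < xs.Length do
        if xs.[i] % 2 = 0 then
            result <- xs.[i]
            found <- true
        i <- i + 1
    found

// Array.filter replaced by System.Array.FindAll.
let evens = System.Array.FindAll([| 1; 2; 3; 4 |], Predicate<int>(fun x -> x % 2 = 0))

// Array.map replaced by System.Array.ConvertAll.
let labels = System.Array.ConvertAll(evens, Converter<int, string>(fun x -> string x))
```

A caller passes a mutable local by address, for example `let mutable r = 0` followed by `tryFindEven (xs, &r)`.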

@manofstick
Contributor Author

I've run the tests from #5112 & #5278 again after this PR. The following results are compared against those already optimized results reported in those PRs, rather than back to master.

I haven't rerun the old results, just used what was in the original PR.

Note that +/- 5% probably is just a fluctuation of the runs, I'm not doing this fully scientifically. The results should be significant enough that such "minor" performance changes are negligible.
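For reading the tables that follow: the Percent column appears to be the time measured with this PR divided by the baseline time, so values below 100% mean this PR is faster. A small sketch of that arithmetic (the function name percent is made up for this example):

```fsharp
// Percent = time with this PR / baseline time, rounded to a whole percentage.
let percent (baseline: float) (current: float) : int =
    int (System.Math.Round(100.0 * current / baseline))

// "value dynamic" row of the 64-bit table in Test 1: 933.83 -> 382.83
printfn "%d%%" (percent 933.83 382.83)   // prints 41%
```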

Test 1

#5112 (comment)
https://gist.github.com/manofstick/2b5f11c4574f206ad27b

64-bit

| test | #5112 | #5307 | Percent |
| --- | --- | --- | --- |
| custom dynamic | 120.33 | 102 | 85% |
| custom structural | 136.83 | 110.83 | 81% |
| custom default | 123.67 | 106.83 | 86% |
| value dynamic | 933.83 | 382.83 | 41% |
| value structural | 501 | 438.5 | 88% |
| value default | 465.67 | 410.33 | 88% |
| gen value dynamic | 1273.67 | 415.33 | 33% |
| gen value structural | 838 | 578 | 69% |
| gen value default | 572 | 399.33 | 70% |
| ref dynamic | 1002.67 | 664.33 | 66% |
| ref structural | 740 | 658.33 | 89% |
| ref default | 783.5 | 712.5 | 91% |
| gen ref dynamic | 1469.83 | 733.67 | 50% |
| gen ref structural | 1135.83 | 878.33 | 77% |
| gen ref default | 1011.67 | 764.67 | 76% |
| tuple dynamic | 593.17 | 528.5 | 89% |
| tuple structural | 251.33 | 241.33 | 96% |
| tuple default | 706.83 | 664.5 | 94% |
| value tuple dynamic | 149.5 | 126.5 | 85% |
| value tuple structural | 150.67 | 127 | 84% |
| value tuple default | 149.33 | 128.17 | 86% |

32-bit

| test | #5112 | #5307 | Percent |
| --- | --- | --- | --- |
| custom dynamic | 113.17 | 94.67 | 84% |
| custom structural | 277.17 | 185.17 | 67% |
| custom default | 112.67 | 100.17 | 89% |
| value dynamic | 2365.5 | 400.17 | 17% |
| value structural | 577.67 | 502.83 | 87% |
| value default | 481.67 | 413.67 | 86% |
| gen value dynamic | 3186.83 | 511.67 | 16% |
| gen value structural | 1702 | 1275 | 75% |
| gen value default | 603.33 | 521.67 | 86% |
| ref dynamic | 2289 | 654.67 | 29% |
| ref structural | 722 | 656.17 | 91% |
| ref default | 774.5 | 688.33 | 89% |
| gen ref dynamic | 3055.83 | 849.33 | 28% |
| gen ref structural | 2231.5 | 1798.33 | 81% |
| gen ref default | 1079.33 | 887 | 82% |
| tuple dynamic | 1208.67 | 949.83 | 79% |
| tuple structural | 261 | 240.67 | 92% |
| tuple default | 792.33 | 743.83 | 94% |
| value tuple dynamic | 152.67 | 134.17 | 88% |
| value tuple structural | 153.17 | 132.83 | 87% |
| value tuple default | 154.33 | 134.5 | 87% |

@manofstick
Contributor Author

manofstick commented Jul 14, 2018

Test 2

#5112 (comment)
https://gist.github.com/manofstick/847d965aff5c1de360cf63d66e8a53ed

32-bit

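| test | #5112 | #5307 | Percent |
| --- | --- | --- | --- |
| seqGroupBy | 674 | 510 | 76% |
| seqCountBy | 676 | 456 | 67% |
| listGroupBy | 716 | 252 | 35% |
| listCountBy | 457 | 211 | 46% |
| arrayCountBy | 304 | 151 | 50% |
| arrayGroupBy | 217 | 126 | 58% |
| arrayCountBy | 384 | 304 | 79% |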

64-bit

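| test | #5112 | #5307 | Percent |
| --- | --- | --- | --- |
| seqGroupBy | 986 | 569 | 58% |
| seqCountBy | 1296 | 559 | 43% |
| listGroupBy | 758 | 214 | 28% |
| listCountBy | 949 | 212 | 22% |
| arrayCountBy | 818 | 162 | 20% |
| arrayGroupBy | 432 | 121 | 28% |
| arrayCountBy | 525 | 301 | 57% |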

@forki
Contributor

forki commented Jul 14, 2018

Wow. So it's ready?

@manofstick
Contributor Author

Test 3

#5112 (comment)
https://gist.github.com/manofstick/9b06efefe1396808611cf55e812c0ea9

| test | #5112 | #5307 | Percent |
| --- | --- | --- | --- |
| 32-bit | 5512 | 5628 | 102% |
| 64-bit | 4822 | 3970 | 82% |

@manofstick
Contributor Author

Test 4

#5112 (comment)
https://gist.github.com/manofstick/42bfe5f1a888480af3d83190fe364e85

| test | #5112 | #5307 | Percent |
| --- | --- | --- | --- |
| 32-bit non-generic | 2664 | 2488 | 93% |
| 32-bit generic | 3083 | 4098 | 133% |
| 64-bit non-generic | 2639 | 2449 | 93% |
| 64-bit generic | 2891 | 2658 | 92% |

@manofstick
Contributor Author

Test 5

#5112 (comment)
https://gist.github.com/manofstick/d54cf0e11265e10fa0480f18c3dea8a5

| test | #5112 | #5307 | Percent |
| --- | --- | --- | --- |
| 32-bit | 288 | 341 | 118% |
| 64-bit | 301 | 279 | 93% |

@manofstick
Contributor Author

Test 6

#5112 (comment)
https://gist.github.com/manofstick/b7544e94ebae75ff0c29290056512558

| test | #5112 | #5307 | Percent |
| --- | --- | --- | --- |
| 32-bit | 915 | 365 | 40% |
| 64-bit | 710 | 291 | 41% |

@manofstick
Contributor Author

@forki

Yeah, I'm happy with it, so any code reviews now would be appreciated. I'll just finish posting my test times and then remove the WIP in the name...

Oh, and just as I got to test 6 I realized that I was comparing my results against times from the first cut I did, which was removing tailcalls. It's not too far off, but it does explain a couple of anomalies... I should have been referring back to #5112 (comment), but I don't think it totally invalidates the results... It's just painful posting, as I haven't automated this... Grrr... I'll just slog through.

@manofstick
Contributor Author

Test 7

#5112 (comment)
https://gist.github.com/manofstick/46ae43f5969cc1bb4c125509703299d5

32-bit

| test | #5112 | #5307 | Percent |
| --- | --- | --- | --- |
| `int` | 2439 | 2296 | 94% |
| `float` | 3795 | 3482 | 92% |
| `struct int*int` | 3079 | 2788 | 91% |
| `struct int*int64` | 3200 | 2883 | 90% |
| `struct int*float` | 8216 | 7517 | 91% |
| `struct int64*int64*int64*int64*int64*int64*int64*int64` | 5498 | 5272 | 96% |
| `struct float*float*float*float*float*float*float*float` | 22137 | 20456 | 92% |
| `int*int` | 8348 | 7742 | 93% |
| `int*float` | 8042 | 7454 | 93% |
| `int64*int64*int64*int64*int64*int64*int64*int64` | 21023 | 19362 | 92% |
| `float*float*float*float*float*float*float*float` | 18985 | 17860 | 94% |
| `TestRecordData` | 4542 | 4188 | 92% |
| `TestGenericRecordData` | 6482 | 6000 | 93% |
| `TestUnionData` | 5094 | 4788 | 94% |
| `TestGenericUnionData` | 5739 | 5203 | 91% |

64-bit

| test | #5112 | #5307 | Percent |
| --- | --- | --- | --- |
| `int` | 671 | 615 | 92% |
| `float` | 926 | 917 | 99% |
| `struct int*int` | 1230 | 1040 | 85% |
| `struct int*int64` | 18486 | 16990 | 92% |
| `struct int*float` | 21521 | 19824 | 92% |
| `struct int64*int64*int64*int64*int64*int64*int64*int64` | 19626 | 18585 | 95% |
| `struct float*float*float*float*float*float*float*float` | 28992 | 25770 | 89% |
| `int*int` | 3021 | 2914 | 96% |
| `int*float` | 3008 | 2988 | 99% |
| `int64*int64*int64*int64*int64*int64*int64*int64` | 8928 | 8322 | 93% |
| `float*float*float*float*float*float*float*float` | 8600 | 10502 | 122% |
| `TestRecordData` | 1317 | 1203 | 91% |
| `TestGenericRecordData` | 1495 | 1379 | 92% |
| `TestUnionData` | 1907 | 1557 | 82% |
| `TestGenericUnionData` | 1991 | 1693 | 85% |

@manofstick
Contributor Author

manofstick commented Jul 14, 2018

Test 8

#5278 (comment)
https://gist.github.com/manofstick/33fefb8ac6b60d401375be738b521ac3

32-bit

| test | #5278 | #5307 | Percent |
| --- | --- | --- | --- |
| custom dynamic | 1192.33 | 1239.67 | 104% |
| custom structural | 3798.5 | 3866.00 | 102% |
| custom default | 1208.83 | 1244.50 | 103% |
| value dynamic | 3648.83 | 1301.00 | 36% |
| value structural | 1251.33 | 1288.83 | 103% |
| value default | 1259.83 | 1303.83 | 103% |
| gen value dynamic | 6444.83 | 3659.67 | 57% |
| gen value structural | 3522.83 | 3697.17 | 105% |
| gen value default | 3560.67 | 3714.67 | 104% |
| ref dynamic | 3766.83 | 1403.00 | 37% |
| ref structural | 1281.67 | 1302.67 | 102% |
| ref default | 1438.83 | 1469.17 | 102% |
| gen ref dynamic | 6484.83 | 3991.83 | 62% |
| gen ref structural | 4804.17 | 4934.33 | 103% |
| gen ref default | 3961.5 | 4091.17 | 103% |
| tuple dynamic | 11176.33 | 11145.33 | 100% |
| tuple structural | 1256.67 | 1297.33 | 103% |
| tuple default | 8640.83 | 9091.50 | 105% |
| value tuple dynamic | 1348 | 1346.83 | 100% |
| value tuple structural | 1336.83 | 1342.17 | 100% |
| value tuple default | 1349.5 | 1350.33 | 100% |

64-bit

| test | #5278 | #5307 | Percent |
| --- | --- | --- | --- |
| custom dynamic | 1193.17 | 1204.50 | 101% |
| custom structural | 1380.33 | 1410.67 | 102% |
| custom default | 1197.33 | 1224.50 | 102% |
| value dynamic | 2187.17 | 1269.67 | 58% |
| value structural | 1240 | 1279.50 | 103% |
| value default | 1249 | 1270.33 | 102% |
| gen value dynamic | 2466.5 | 1521.50 | 62% |
| gen value structural | 1503.33 | 1551.00 | 103% |
| gen value default | 1524.83 | 1570.33 | 103% |
| ref dynamic | 1749.67 | 1369.17 | 78% |
| ref structural | 1262.83 | 1262.00 | 100% |
| ref default | 1361.67 | 1410.33 | 104% |
| gen ref dynamic | 2026.17 | 1665.33 | 82% |
| gen ref structural | 1572.33 | 1511.00 | 96% |
| gen ref default | 1673.5 | 1710.00 | 102% |
| tuple dynamic | 4430 | 4650.33 | 105% |
| tuple structural | 1070.33 | 1118.83 | 105% |
| tuple default | 7844.33 | 8283.17 | 106% |
| value tuple dynamic | 1570.17 | 1637.33 | 104% |
| value tuple structural | 1558.83 | 1636.83 | 105% |
| value tuple default | 1535.17 | 1633.17 | 106% |

@manofstick
Contributor Author

manofstick commented Jul 14, 2018

Test 9

#5278 (comment)
https://gist.github.com/manofstick/04ba0c70c398bf5edeaba9b17d0b17c5

| test | #5278 | #5307 | Percent |
| --- | --- | --- | --- |
| mapTest 32-bit | 3356 | 3248 | 97% |
| mapTest 64-bit | 4441 | 2815 | 63% |

@manofstick
Contributor Author

manofstick commented Jul 14, 2018

Test 10

#5278 (comment)
https://gist.github.com/manofstick/bd374436a6d40218b2d836db34c494ca

| test | #5278 | #5307 | Percent |
| --- | --- | --- | --- |
| 32-bit | 74615 | 70751 | 95% |
| 64-bit | 68278 | 65026 | 95% |

@manofstick
Contributor Author

manofstick commented Jul 14, 2018

Test 11

#5278 (comment)
https://gist.github.com/manofstick/258edf7e7d39a76e9d1cdf0316ae2b47

| test | #5278 | #5307 | Percent |
| --- | --- | --- | --- |
| non-generic 32-bit | 24545 | 11420 | 47% |
| generic 32-bit | 46421 | 26119 | 56% |
| non-generic 64-bit | 18227 | 12071 | 66% |
| generic 64-bit | 22051 | 14137 | 64% |

@manofstick
Contributor Author

manofstick commented Jul 14, 2018

Test 12

#5278 (comment)
https://gist.github.com/manofstick/847922bcce2e2f47d3eca033ed9dc068

| test | #5278 | #5307 | Percent |
| --- | --- | --- | --- |
| 32-bit | 363 | 350 | 96% |
| 64-bit | 306.8 | 287 | 94% |

@manofstick
Contributor Author

manofstick commented Jul 14, 2018

Test 13

#5278 (comment)
https://gist.github.com/manofstick/b4fa5acc1abe3ca73a77a7eea12d205a

| test | #5278 | #5307 | Percent |
| --- | --- | --- | --- |
| 32-bit | 575.8 | 558 | 97% |
| 64-bit | 498.8 | 509 | 102% |

@manofstick
Contributor Author

Done.

Remember that the times presented here are relative to already optimized #5278 and #5112 !

This PR is good to go! (well code review, etc.)

@manofstick manofstick changed the title [WIP] Using reflection to enhance #5112 and #5278 [CompilerPerf] Using reflection to enhance #5112 and #5278 Jul 14, 2018
@manofstick
Contributor Author

manofstick commented Jul 14, 2018

... and one final test ...

Five years ago I posted a response on stackoverflow where I compared various key types for Dictionary<_,_>, dict and Map. I have slightly modified the code in this gist (added ValueTuple and inlined some functions), but it is quite interesting to see how far FSharp.Core has come! (I have done some of this by removing layers in value types in dict and now this PR, but others have improved Map - maybe it's just general compiler improvements? I'm not sure...)

So below are the results! From the original posting:

Using .net's System.Collections.Generic.Dictionary
checksum -55.339450 elapsed 874/562 (KeyRecord)
checksum -55.339450 elapsed 1251/898 (KeyGenericRecord`1)
checksum -55.339450 elapsed 569/1024 (KeyStruct)
checksum -55.339450 elapsed 740/1427 (KeyGenericStruct`1)
checksum -55.339450 elapsed 2497/2218 (Tuple`3)
Using f# 'dict'
checksum -55.339450 elapsed 979/628 (KeyRecord)
checksum -55.339450 elapsed 1614/1206 (KeyGenericRecord`1)
checksum -55.339450 elapsed 3237/5625 (KeyStruct)
checksum -55.339450 elapsed 3290/5626 (KeyGenericStruct`1)
checksum -55.339450 elapsed 2448/1914 (Tuple`3)
Using f# 'Map'
checksum -55.339450 elapsed 8453/2638 (KeyRecord)
checksum -55.339450 elapsed 31301/25441 (KeyGenericRecord`1)
checksum -55.339450 elapsed 30956/26931 (KeyStruct)
checksum -55.339450 elapsed 53699/49274 (KeyGenericStruct`1)
checksum -55.339450 elapsed 32203/25274 (Tuple`3)
Using custom array
checksum -55.339450 elapsed 484/160 (Tuple`3)

(read the results as n/m where n is the time to create the container, m is the time to access the container)

Unfortunately I didn't record if this was the 32-bit or 64-bit build

The important thing here is not so much the absolute times (as I don't remember what the hardware was) but rather using the Dictionary with KeyRecord as a baseline from which the other numbers can be interpreted.

The results from a current run (prior to this PR) are:

32-bit

Using .net's System.Collections.Generic.Dictionary
checksum -55.339450 elapsed 634/444 (KeyRecord)
checksum -55.339450 elapsed 1001/829 (KeyGenericRecord`1)
checksum -55.339450 elapsed 380/283 (KeyStruct)
checksum -55.339450 elapsed 543/669 (KeyGenericStruct`1)
checksum -55.339450 elapsed 1700/1529 (Tuple`3)
checksum -55.339450 elapsed 597/539 (ValueTuple`3)
Using f# 'dict'
checksum -55.339450 elapsed 750/608 (KeyRecord)
checksum -55.339450 elapsed 1095/1185 (KeyGenericRecord`1)
checksum -55.339450 elapsed 526/565 (KeyStruct)
checksum -55.339450 elapsed 760/1121 (KeyGenericStruct`1)
checksum -55.339450 elapsed 1987/1922 (Tuple`3)
checksum -55.339450 elapsed 1500/1743 (ValueTuple`3)
Using f# 'Map'
checksum -55.339450 elapsed 6720/3540 (KeyRecord)
checksum -55.339450 elapsed 11355/7586 (KeyGenericRecord`1)
checksum -55.339450 elapsed 9682/3243 (KeyStruct)
checksum -55.339450 elapsed 10804/7364 (KeyGenericStruct`1)
checksum -55.339450 elapsed 10695/6700 (Tuple`3)
checksum -55.339450 elapsed 13516/6492 (ValueTuple`3)
Using custom array
checksum -55.339450 elapsed 272/3857 (Tuple`3)

64-bit

Using .net's System.Collections.Generic.Dictionary
checksum -55.339450 elapsed 591/434 (KeyRecord)
checksum -55.339450 elapsed 864/629 (KeyGenericRecord`1)
checksum -55.339450 elapsed 706/568 (KeyStruct)
checksum -55.339450 elapsed 799/811 (KeyGenericStruct`1)
checksum -55.339450 elapsed 1619/1585 (Tuple`3)
checksum -55.339450 elapsed 941/857 (ValueTuple`3)
Using f# 'dict'
checksum -55.339450 elapsed 575/526 (KeyRecord)
checksum -55.339450 elapsed 784/695 (KeyGenericRecord`1)
checksum -55.339450 elapsed 1381/1679 (KeyStruct)
checksum -55.339450 elapsed 1515/1912 (KeyGenericStruct`1)
checksum -55.339450 elapsed 1450/1338 (Tuple`3)
checksum -55.339450 elapsed 3129/3510 (ValueTuple`3)
Using f# 'Map'
checksum -55.339450 elapsed 5180/1748 (KeyRecord)
checksum -55.339450 elapsed 7364/3298 (KeyGenericRecord`1)
checksum -55.339450 elapsed 17353/13944 (KeyStruct)
checksum -55.339450 elapsed 19098/15394 (KeyGenericStruct`1)
checksum -55.339450 elapsed 7422/3325 (Tuple`3)
checksum -55.339450 elapsed 19207/15323 (ValueTuple`3)
Using custom array
checksum -55.339450 elapsed 274/99 (Tuple`3)

And post this PR are:

32-bit

Using .net's System.Collections.Generic.Dictionary
checksum -55.339450 elapsed 601/430 (KeyRecord)
checksum -55.339450 elapsed 741/553 (KeyGenericRecord`1)
checksum -55.339450 elapsed 359/284 (KeyStruct)
checksum -55.339450 elapsed 437/389 (KeyGenericStruct`1)
checksum -55.339450 elapsed 1772/1526 (Tuple`3)
checksum -55.339450 elapsed 590/527 (ValueTuple`3)
Using f# 'dict'
checksum -55.339450 elapsed 610/451 (KeyRecord)
checksum -55.339450 elapsed 711/570 (KeyGenericRecord`1)
checksum -55.339450 elapsed 358/327 (KeyStruct)
checksum -55.339450 elapsed 436/431 (KeyGenericStruct`1)
checksum -55.339450 elapsed 1866/1808 (Tuple`3)
checksum -55.339450 elapsed 556/540 (ValueTuple`3)
Using f# 'Map'
checksum -55.339450 elapsed 4299/1155 (KeyRecord)
checksum -55.339450 elapsed 5758/2476 (KeyGenericRecord`1)
checksum -55.339450 elapsed 6775/794 (KeyStruct)
checksum -55.339450 elapsed 4690/2081 (KeyGenericStruct`1)
checksum -55.339450 elapsed 12744/5136 (Tuple`3)
checksum -55.339450 elapsed 7207/943 (ValueTuple`3)
Using custom array
checksum -55.339450 elapsed 265/121 (Tuple`3)

64-bit

Using .net's System.Collections.Generic.Dictionary
checksum -55.339450 elapsed 569/425 (KeyRecord)
checksum -55.339450 elapsed 601/467 (KeyGenericRecord`1)
checksum -55.339450 elapsed 672/544 (KeyStruct)
checksum -55.339450 elapsed 698/592 (KeyGenericStruct`1)
checksum -55.339450 elapsed 1578/1538 (Tuple`3)
checksum -55.339450 elapsed 911/840 (ValueTuple`3)
Using f# 'dict'
checksum -55.339450 elapsed 533/447 (KeyRecord)
checksum -55.339450 elapsed 556/482 (KeyGenericRecord`1)
checksum -55.339450 elapsed 669/575 (KeyStruct)
checksum -55.339450 elapsed 699/624 (KeyGenericStruct`1)
checksum -55.339450 elapsed 1399/1274 (Tuple`3)
checksum -55.339450 elapsed 911/862 (ValueTuple`3)
Using f# 'Map'
checksum -55.339450 elapsed 4603/1203 (KeyRecord)
checksum -55.339450 elapsed 5409/1887 (KeyGenericRecord`1)
checksum -55.339450 elapsed 4323/1111 (KeyStruct)
checksum -55.339450 elapsed 4950/1585 (KeyGenericStruct`1)
checksum -55.339450 elapsed 7014/3007 (Tuple`3)
checksum -55.339450 elapsed 5078/1281 (ValueTuple`3)
Using custom array
checksum -55.339450 elapsed 263/88 (Tuple`3)

So finally we see that dict adds almost no overhead, and Map is now quite competitive!

Time for me to enjoy my Sunday.

@dsyme
Contributor

dsyme commented Jul 16, 2018

Amazing work!

@manofstick manofstick force-pushed the nobox_reflection branch 2 times, most recently from 080801c to 9e75253 Compare July 19, 2018 05:52
@KevinRansom
Member

KevinRansom commented May 28, 2020

@dsyme, @manofstick: guys, is this orphaned, should I close it? I am on a mission to reduce the number of PRs in the repo, and an amazing number haven't been worked on in over a year. I think the techniques in this look useful?

But if it's not going to be productized I would like to close it, what do you think?

Thanks

Kevin

@manofstick
Contributor Author

@KevinRansom,

I would pick these up again if anyone were going to do anything with them. They were complete, but obviously the code base has moved on.

So I'll just go and close 'em all, and if anyone decides that they are worth anything then they can poke me with a stick.
