Results tracking - Paste your results here! #1

Open
FCLC opened this issue Jan 4, 2024 · 12 comments

@FCLC
Owner

FCLC commented Jan 4, 2024

I don't have a great place to store different people's results, so let's add them here!

You can post them directly below, ideally within a code block to make parsing easier!

Otherwise, feel free to send a Pastebin link, a GitHub gist, etc.
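Each paste is two Python-style list literals (edge lengths, then GFLOPS), so it can be parsed without a custom format. The sketch below assumes exactly two bracketed lists per paste, one per line; `parse_results` and `peak` are hypothetical helper names, not part of the project.

```python
import ast


def parse_results(text: str) -> dict[int, float]:
    """Pair each matrix edge length with its GFLOPS value.

    Assumes (hypothetically) that `text` holds two bracketed list
    literals on their own lines: edge lengths first, GFLOPS second.
    """
    lists = [ast.literal_eval(line.strip())
             for line in text.splitlines()
             if line.strip().startswith("[")]
    if len(lists) != 2:
        raise ValueError(f"expected 2 lists, found {len(lists)}")
    sizes, gflops = lists
    if len(sizes) != len(gflops):
        raise ValueError("size and GFLOPS lists differ in length")
    return dict(zip(sizes, gflops))


def peak(results: dict[int, float]) -> tuple[int, float]:
    """Return the (edge length, GFLOPS) pair with the highest throughput."""
    n = max(results, key=results.__getitem__)
    return n, results[n]


if __name__ == "__main__":
    paste = """
    [4096, 6561, 8192, 10000]
    [434.0868590036045, 603.0099552563552, 522.8910446051781, 651.86191372195]
    """
    print(peak(parse_results(paste)))  # peak is at n = 10000
```

A length check like the one above would also catch pastes where a bracket got lost or a size has no matching value.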

Thanks,

-FCLC

@FCLC
Owner Author

FCLC commented Jan 4, 2024

Results as of 2024/01/05

M3 Family

M3


[2, 3, 4, 8, 9, 10, 16, 27, 32, 64, 81, 100, 128, 243, 256, 512, 729, 1000, 1024, 2048, 2187, 4096, 6561, 8192, 10000]
[4.816194453851074e-06, 3.346848817880602e-05, 0.022099447513812154, 0.0003986051932339883, 0.1272473381043812, 0.16494845360824742, 0.0032480006280313713, 0.10775374795049995, 5.295838383838384, 5.182452602653065, 1.071408266988696, 14.665444546287809, 15.978300952380952, 22.017369728950904, 25.911578667741605, 46.733875634671534, 77.07048363547351, 112.61182001815865, 101.13797444079104, 182.69022641318824, 231.65362377523925, 277.0745076462512, 376.8126402058972, 297.1503736405362, 395.07223439003553]

M3 Pro

[2, 3, 4, 8, 9, 10, 16, 27, 32, 64, 81, 100, 128, 243, 256, 512, 729, 1000, 1024, 2048, 2187, 4096, 6561, 8192, 10000, 16384, 19683, 32768]
[4.940304143649224e-06, 2.615275533806795e-05, 0.02381395348837209, 0.0003683177661527483, 0.09406451612903226, 0.13408420488066505, 0.003394298390815837, 0.6099378689515191, 6.24152380952381, 1.9919151393574663, 7.580751454981171, 13.460490096444412, 15.375465556174667, 23.925651419224348, 15.189585349667412, 17.63253153352875, 24.174622207265895, 42.4192018661734, 48.97949779016964, 150.95175086648985, 168.6605704378703, 254.58014680913837, 344.37474763881875, 319.1913365999555, 370.7826314248209, 305.2033364720307, 345.67616639437114, 319.32592492057694]
2023 MacBook Pro 14-inch, M3 Pro 12-core, 36 GB

M3 Max (3 channel)

[2, 3, 4, 8, 9, 10, 16, 27, 32, 64, 81, 100, 128, 243, 256, 512, 729, 1000, 1024, 2048, 2187, 4096, 6561, 8192, 10000, 16384, 19683, 32768]
[6.406406406406406e-06, 0.0009810690018531303, 0.02457757296466974, 0.00036613377040552175, 0.05841814247936533, 0.15335071308081583, 0.31059715639810426, 2.2765440666204024, 3.7718561151079135, 1.1752042588960494, 5.866863170444948, 2.9560261549194187, 1.8095471067249096, 5.087776863956844, 11.97214693789007, 12.792844578413735, 47.16240600277262, 45.73406059941392, 65.07148199906084, 80.04057586790874, 205.91364619739895, 174.1874089125134, 354.94201641466566, 599.6347164155466, 748.8818960202927, 619.0967398250291, 774.0532000340861, 632.1859241014777]
2023 MacBook Pro 14-inch, Apple M3 Max, 36 GB

M3 Max (4 channel)

[2, 3, 4, 8, 9, 10, 16, 27, 32, 64, 81, 100, 128, 243, 256, 512, 729, 1000, 1024, 2048, 2187, 4096, 6561, 8192, 10000, 16384, 19683, 32768]
[9.557031585989391e-05, 0.0010790935614084169, 0.018396090830698476, 0.025520249221183802, 0.048736462093862815, 0.14117314886708549, 0.3337951267215386, 1.4670741251444117, 3.3753605274000824, 11.84831638418079, 11.914648910411623, 18.44014788998608, 22.07040548931289, 35.934029112537175, 30.4348589569161, 42.56910515987076, 57.54855796421973, 104.29522139111678, 165.9180004206121, 382.86248949961544, 427.8850032433897, 580.0605971771008, 759.6805828297933, 669.6056906296085, 796.0521915427684, 687.7378855200425, 794.9496904346319, 654.4304466198903]

M3 Ultra (at time of writing, no such product exists)


M2 Family

M2

[2, 3, 4, 8, 9, 10, 16, 27, 32, 64, 81, 100, 128, 243, 256, 512, 729, 1000, 1024, 2048, 2187, 4096, 6561, 8192, 10000, 16384, 19683]
[5.815098758549104e-06, 0.0032319846779985634, 0.013017390420014237, 0.00032096540375347725, 0.07012312427856868, 0.08247422680412371, 0.01683864337101747, 1.4512811059907833, 2.9020059336669175, 2.020377649325626, 0.5769737488512171, 6.997092707979834, 8.201338633639477, 11.478932753929733, 17.945149112246924, 20.137630174008454, 39.082309254950914, 57.8646508782581, 51.50849683198146, 130.40949069506397, 166.68286223882805, 181.73445526123535, 238.1759029340736, 266.85908336597487, 280.1431849301336, 315.04831975561916, 282.6115287071625]
2022 MacBook Air 13.6-inch, Apple M2, 24 GB

M2 Pro

[2, 3, 4, 8, 9, 10, 16, 27, 32, 64, 81, 100, 128, 243, 256, 512, 729, 1000, 1024, 2048, 2187, 4096, 6561, 8192, 10000, 16384, 19683, 32768]
[6.907717215326498e-05, 0.0014931150804623127, 0.014628571428571428, 0.008189248412533389, 0.08792136525357294, 0.07339449541284404, 0.3260497512437811, 1.2750121457489878, 2.5998095842589652, 3.128527356593449, 4.425595523115166, 10.173819709740924, 9.399885255306941, 11.956590520576276, 19.94813100891513, 31.86823962273178, 61.34317490351311, 101.89265608681254, 156.55017845033754, 236.81182667668324, 280.24386080377013, 349.58749283024144, 537.5963733205274, 502.20111382470714, 641.1073376213504, 467.8289325965807, 608.4119807807264, 438.9683475917186]
14-inch MacBook Pro, M2 Pro 10-core (6+4), 32 GB, macOS Sonoma 14.3

M2 Max

[2, 3, 4, 8, 9, 10, 16, 27, 32, 64, 81, 100, 128, 243, 256, 512, 729, 1000, 1024, 2048, 2187, 4096, 6561, 8192, 10000, 16384, 19683]
[8.965644769947158e-05, 0.00116860351879504, 0.013183644041610876, 0.025924050632911394, 0.03319898900198101, 0.11678832116788321, 0.407055900621118, 1.2924683170267253, 2.818752688172043, 8.086745947279935, 9.364599118942731, 12.130401819560273, 16.60012585735375, 19.565032782448217, 22.240507772213046, 35.27848726271813, 52.515125121783846, 103.44025210872005, 96.07281529313705, 277.8700112248694, 358.5049594624609, 471.35730940175335, 623.3315456859455, 539.3242159590661, 675.8449319058319, 546.1567306369412, 671.1415636609279]
MacBook Pro 16", M2 Max 32GB

M2 Ultra


M1 Family

M1

[2, 3, 4, 8, 9, 10, 16, 27, 32, 64, 81, 100, 128, 243, 256, 512, 729, 1000, 1024, 2048, 2187, 4096, 6561, 8192, 10000, 16384, 19683]
[7.726333275385834e-05, 0.0014102530620772504, 0.009782940996637114, 0.011680697191613625, 0.07097653587771395, 0.12121212121212122, 0.48907462686567166, 1.8380725591819582, 2.716518134715026, 11.407981200226294, 7.518175066312997, 15.453920273225311, 48.67645386284772, 20.057881530665735, 22.37584181735856, 31.257345321030154, 85.64893355141632, 84.1687360953248, 98.14619273508809, 141.45160507708752, 159.94352445704678, 234.38705476988093, 290.7456121291836, 323.8187931753328, 305.4548644849349, 243.76339240445694, 208.08405051805715]

M1 Pro

[2, 3, 4, 8, 9, 10, 16, 27, 32, 64, 81, 100, 128, 243, 256, 512, 729, 1000, 1024, 2048, 2187, 4096, 6561, 8192, 10000, 16384, 19683]
[6.306462547495546e-05, 0.0025562130177514794, 0.004986948221451669, 0.00742701722574796, 0.060643873221861745, 0.03855421686746988, 0.1798779148917483, 0.7892932330827067, 2.0164923076923076, 0.24402513381428903, 7.83691797235023, 8.522715166597775, 6.9212937293729375, 9.01572909031904, 11.707585583184057, 30.26415130076947, 54.06653139119058, 92.50693802035153, 123.5597355024777, 109.69252220190681, 181.47595085481439, 227.3514657752904, 351.08518866610046, 453.59111469097905, 472.2007220969939, 207.15818735211207, 49.877772699966926]

M1 Max

[2, 3, 4, 8, 9, 10, 16, 27, 32, 64, 81, 100, 128, 243, 256, 512, 729, 1000, 1024, 2048, 2187, 4096, 6561, 8192, 10000, 16384, 19683]
[7.4839095943721e-05, 0.001098297638660077, 0.013016066707341876, 0.0083506625891947, 0.049846153846153846, 0.06828035915468915, 0.24545318352059925, 0.8921473087818697, 3.640888888888889, 6.779615429376851, 8.759968351822243, 8.625336927223719, 14.517331967312412, 45.766972547293484, 19.896889023823302, 65.2353852790182, 61.557606149079426, 152.11921461153736, 125.82389781674101, 275.6040287519756, 270.00672820624646, 438.00009073656537, 559.5139888275553, 550.4910635147069, 603.831309538254, 612.27554215099, 617.4469173501905]

M1 Ultra

[2, 3, 4, 8, 9, 10, 16, 27, 32, 64, 81, 100, 128, 243, 256, 512, 729, 1000, 1024, 2048, 2187, 4096, 6561, 8192, 10000, 16384, 19683, 32768]
[8.531422294739312e-05, 0.0015747113029277965, 0.01735593220338983, 0.03619397709599887, 0.12632126147981285, 0.15384615384615385, 0.07014239110890394, 2.617072197846031, 5.260555466366993, 16.534880787183045, 13.135622126649858, 13.168724279835391, 22.41948226188376, 37.22357638159277, 24.951391739391667, 40.74045806624337, 44.917992072521216, 81.4839564812115, 78.28079584490027, 273.70735474309384, 310.17074242761197, 559.5613794895514, 899.6634147989348, 931.2245655848407, 1058.0766591765296, 1101.572612714917, 1069.3424944129217, 667.2733136218509]

@neon-sunset

Apple M1 Pro (6+2 CPU, 14 GPU), macOS 14.1.2 23B92

[2, 3, 4, 8, 9, 10, 16, 27, 32, 64, 81, 100, 128, 243, 256, 512, 729, 1000, 1024, 2048, 2187, 4096, 6561, 8192, 10000, 16384, 19683]
[6.306462547495546e-05, 0.0025562130177514794, 0.004986948221451669, 0.00742701722574796, 0.060643873221861745, 0.03855421686746988, 0.1798779148917483, 0.7892932330827067, 2.0164923076923076, 0.24402513381428903, 7.83691797235023, 8.522715166597775, 6.9212937293729375, 9.01572909031904, 11.707585583184057, 30.26415130076947, 54.06653139119058, 92.50693802035153, 123.5597355024777, 109.69252220190681, 181.47595085481439, 227.3514657752904, 351.08518866610046, 453.59111469097905, 472.2007220969939, 207.15818735211207, 49.877772699966926]

The last two results are questionable: total power draw stayed at around 18-20 W, versus up to 40 W during the 10000 run. Nothing interesting turned up in the profiler; the kernel thread was extremely busy and the Swift thread was stuck in the BLAS guts.
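One way to make "questionable" concrete is to flag any size whose throughput falls well below the best result already seen at smaller sizes. A minimal sketch on the last four M1 Pro points above; the 50% threshold and the name `flag_regressions` are arbitrary assumptions, not something the benchmark defines.

```python
def flag_regressions(sizes, gflops, threshold=0.5):
    """Return sizes whose GFLOPS fall below `threshold` times the best
    value seen at any smaller size (a hypothetical heuristic)."""
    flagged = []
    best_so_far = 0.0
    for n, g in zip(sizes, gflops):
        if best_so_far and g < threshold * best_so_far:
            flagged.append(n)
        best_so_far = max(best_so_far, g)
    return flagged


sizes = [8192, 10000, 16384, 19683]
gflops = [453.59111469097905, 472.2007220969939,
          207.15818735211207, 49.877772699966926]
print(flag_regressions(sizes, gflops))  # [16384, 19683]
```

On the full list the same check would also trip on small sizes (e.g. 64, where the value drops to ~0.24), so it is more useful on the large-n tail where timings are less noisy.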

@shacron

shacron commented Jan 4, 2024

Ran out of memory? M1 Ultra 128GB

Welcome! How many GBs of memory are in your Orchard?
128
You entered 128 GB
Running the following tests: [2, 3, 4, 8, 9, 10, 16, 27, 32, 64, 81, 100, 128, 243, 256, 512, 729, 1000, 1024, 2048, 2187, 4096, 6561, 8192, 10000, 16384, 19683, 32768, 59049]
Filling the matrices with random data, this may take a while...
taking 10 seconds to avoid SOC hot spotting, normalizing clocks etc.

For {n=m=k}= 2  performance is  3.163618388531883e-05 GFLOPS
For {n=m=k}= 3  performance is  0.0005829959514170041 GFLOPS
For {n=m=k}= 4  performance is  0.0072279631825625384 GFLOPS
For {n=m=k}= 8  performance is  0.0003306380733616141 GFLOPS
For {n=m=k}= 9  performance is  0.04491820450414369 GFLOPS
For {n=m=k}= 10  performance is  0.0653061224489796 GFLOPS
For {n=m=k}= 16  performance is  0.2368793916085938 GFLOPS
For {n=m=k}= 27  performance is  0.9670097521432607 GFLOPS
For {n=m=k}= 32  performance is  1.2808255320812243 GFLOPS
For {n=m=k}= 64  performance is  1.9818705536360957 GFLOPS
For {n=m=k}= 81  performance is  2.83372480224377 GFLOPS
For {n=m=k}= 100  performance is  6.081337894336754 GFLOPS
For {n=m=k}= 128  performance is  7.811843938344173 GFLOPS
For {n=m=k}= 243  performance is  12.916034430379748 GFLOPS
For {n=m=k}= 256  performance is  13.480190553783878 GFLOPS
For {n=m=k}= 512  performance is  33.088096638008075 GFLOPS
For {n=m=k}= 729  performance is  68.4690153442719 GFLOPS
For {n=m=k}= 1000  performance is  101.31092281688132 GFLOPS
For {n=m=k}= 1024  performance is  163.78679228503802 GFLOPS
For {n=m=k}= 2048  performance is  130.50240308939672 GFLOPS
For {n=m=k}= 2187  performance is  214.0167349093067 GFLOPS
For {n=m=k}= 4096  performance is  306.8541273404492 GFLOPS
For {n=m=k}= 6561  performance is  525.382061852664 GFLOPS
For {n=m=k}= 8192  performance is  540.5429240744054 GFLOPS
For {n=m=k}= 10000  performance is  459.00796829869176 GFLOPS
For {n=m=k}= 16384  performance is  742.7630234469782 GFLOPS
For {n=m=k}= 19683  performance is  738.3370942465223 GFLOPS
For {n=m=k}= 32768  performance is  519.6819204513286 GFLOPS
zsh: killed     ./a.out
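The per-size GFLOPS lines in the log above are presumably derived from the conventional GEMM operation count of 2·n·m·k floating-point operations (one multiply and one add per inner-product term) divided by elapsed time; the tool's exact timing method isn't shown in this thread, so treat that as an assumption. A minimal pure-Python sketch, with a naive triple loop standing in for Accelerate's GEMM and no 10-second cool-down:

```python
import random
import time


def gemm_flops(n: int, m: int, k: int) -> int:
    """Conventional GEMM flop count: one multiply + one add per term."""
    return 2 * n * m * k


def gflops(n: int, m: int, k: int, seconds: float) -> float:
    return gemm_flops(n, m, k) / seconds / 1e9


def naive_matmul(a, b):
    """C = A @ B for row-major lists of lists (stand-in for a BLAS GEMM)."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for p in range(k):
            aip = a[i][p]
            row_b, row_c = b[p], c[i]
            for j in range(m):
                row_c[j] += aip * row_b[j]
    return c


def benchmark(n: int) -> float:
    """Time one n x n x n multiply on random data and report GFLOPS."""
    a = [[random.random() for _ in range(n)] for _ in range(n)]
    b = [[random.random() for _ in range(n)] for _ in range(n)]
    start = time.perf_counter()
    naive_matmul(a, b)
    elapsed = time.perf_counter() - start
    return gflops(n, n, n, elapsed)


if __name__ == "__main__":
    for n in (16, 32, 64):
        print(f"For {{n=m=k}}= {n}  performance is  {benchmark(n)} GFLOPS")
```

At n = 2 the whole multiply is only 16 flops, so any fixed per-call overhead dominates the measured time, which is consistent with the tiny values at the start of every list in this thread.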

@FCLC
Owner Author

FCLC commented Jan 4, 2024

Ran out of memory? M1 Ultra 128GB

That's strange.

Max value tested is edge length = 59049

(59049^2) × 8 bytes × 3 matrices would be ~80 GB.

Even allowing for a massive transposed matrix, that should add no more than a third to the memory footprint, which is still below the 128 GB available.
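The arithmetic above as a quick check. This sketch assumes 8-byte (FP64) elements, three n×n matrices, and one extra transposed copy (the "+1/3"); `footprint_bytes` is a hypothetical helper name.

```python
def footprint_bytes(n: int, matrices: int = 3, elem_bytes: int = 8) -> int:
    """Bytes needed for `matrices` dense n x n matrices."""
    return matrices * n * n * elem_bytes


n = 59049  # 3**10, the largest edge length tested
base = footprint_bytes(n)
with_transpose = footprint_bytes(n, matrices=4)  # +1/3: one transposed copy
print(f"3 matrices:  {base / 1e9:.1f} GB ({base / 2**30:.1f} GiB)")
print(f"+ transpose: {with_transpose / 1e9:.1f} GB ({with_transpose / 2**30:.1f} GiB)")
```

This prints about 83.7 GB (77.9 GiB) for the three matrices and 111.6 GB (103.9 GiB) with the extra copy; both stay under 128 GB, which is why an out-of-memory kill is surprising unless something else allocates on top.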

All of the memory allocations (outside of any supplementary matrix transposes done by Accelerate) are done at application start. You'd only "run out" if Accelerate ballooned?

I pushed a version to git a few hours ago (fd7382e) that takes this into account and divides the memory by 4 instead of 3 when figuring out the max edge length.

Even then, you'd still end up with a max exponent of N = 10 for 3^N, and therefore still test 59049 🤔
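A minimal sketch of the sizing rule as described here: take the largest power of 3 whose n×n matrix fits in the entered memory divided by 3 (before fd7382e) or 4 (after). This is a reconstruction from the thread, not the project's code; it assumes decimal GB (10^9 bytes) and 8-byte elements, only models the 3^N sizes, and `max_pow3_edge` is a hypothetical name.

```python
def max_pow3_edge(mem_gb: float, divisor: int, elem_bytes: int = 8) -> int:
    """Largest 3**N edge length whose n x n matrix fits in mem_gb / divisor.

    Hypothetical reconstruction: decimal GB (1e9 bytes), 8-byte elements.
    """
    budget = mem_gb * 1e9 / divisor  # bytes allowed per matrix
    n = 1
    while (3 * n) ** 2 * elem_bytes <= budget:
        n *= 3
    return n


for mem, div in [(128, 3), (128, 4), (72, 4), (64, 4)]:
    print(f"{mem} GB / {div}: largest 3^N edge = {max_pow3_edge(mem, div)}")
```

With 128 GB both divisors still allow 59049, which is the point being made above, while 64 GB and 72 GB both stop at 19683, matching the lists posted later in the thread for those inputs.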

If you do a run entering "only" 64 at the prompt, what do those results look like?

@FCLC
Owner Author

FCLC commented Jan 4, 2024

Memory bug aside, thanks for running it! Wowed to see ~750 GFLOPS on that machine.

@shacron

shacron commented Jan 5, 2024

M1 Ultra using 64 out of 128 GB. More GFLOPS for ya.

[2, 3, 4, 8, 9, 10, 16, 27, 32, 64, 81, 100, 128, 243, 256, 512, 729, 1000, 1024, 2048, 2187, 4096, 6561, 8192, 10000, 16384, 19683, 32768]
[8.531422294739312e-05, 0.0015747113029277965, 0.01735593220338983, 0.03619397709599887, 0.12632126147981285, 0.15384615384615385, 0.07014239110890394, 2.617072197846031, 5.260555466366993, 16.534880787183045, 13.135622126649858, 13.168724279835391, 22.41948226188376, 37.22357638159277, 24.951391739391667, 40.74045806624337, 44.917992072521216, 81.4839564812115, 78.28079584490027, 273.70735474309384, 310.17074242761197, 559.5613794895514, 899.6634147989348, 931.2245655848407, 1058.0766591765296, 1101.572612714917, 1069.3424944129217, 667.2733136218509]

@FCLC
Owner Author

FCLC commented Jan 5, 2024

1.1 TFLOPS, let's go!

That's (just!) enough to surpass Sandia National Laboratories' ASCI Red, the first-ever TFLOPS-class supercomputer! To reach that performance mark, ASCI Red needed 850 kW of power for the systems alone, excluding the needs of the rest of the infrastructure. It also needed over 7k sockets!

Thank you so much for providing the data!

@willkill07

willkill07 commented Jan 5, 2024

M2 Max (MacBook Pro 14")

[2, 3, 4, 8, 9, 10, 16, 27, 32, 64, 81, 100, 128, 243, 256, 512, 729, 1000, 1024, 2048, 2187, 4096, 6561, 8192, 10000, 16384, 19683, 32768]
[6.843719390395696e-05, 0.0011944788533003008, 0.012907129172128669, 0.0006551851441887189, 0.08431644691186677, 0.08664385045271412, 0.43115789473684213, 1.514076923076923, 2.859711131474451, 9.671782762691853, 2.290491572942908, 13.204980918802573, 17.80710789204427, 18.5206931268151, 29.52653957705629, 34.36578674646738, 33.90512955555299, 59.70052663625562, 62.96282863255188, 227.337912114305, 251.2209400473425, 434.0868590036045, 603.0099552563552, 522.8910446051781, 651.86191372195, 536.2811031755538, 644.2913163428625, 534.3311599311345]

Peak of 651 GFLOPS

EDIT: I built with the recommended ACCELERATE_NEW_LAPACK and ACCELERATE_LAPACK_ILP64 flags and got these numbers (Release build within Xcode):

[2, 3, 4, 8, 9, 10, 16, 27, 32, 64, 81, 100, 128, 243, 256, 512, 729, 1000, 1024, 2048, 2187, 4096, 6561, 8192, 10000, 16384, 19683, 32768]
[0.0001240704409928737, 0.0014102162331557505, 0.016697104096008348, 0.036036036036036036, 0.08452173913043479, 0.12631047113805735, 0.4020613496932515, 1.4059285714285714, 2.803679144385027, 9.619963302752293, 9.702963246973763, 12.841091492776886, 25.420024242424244, 99.40082713329731, 76.71776393255216, 69.59771301632011, 56.53796384052847, 130.37703408544584, 135.33281115581906, 263.91809370171416, 311.56847457055636, 466.12116917482507, 615.717295729097, 532.7772784987365, 672.9165756301664, 540.2428616611207, 665.6119555438962, 539.9115821076888]

Unsure why it's 20 GFLOPS faster, but I'll take it :)

@FCLC
Owner Author

FCLC commented Jan 5, 2024

The new interfaces seem to provide a much bigger bump for very small N.
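To put numbers on that, a minimal sketch dividing the new-interface GFLOPS by the original ones at a few sizes, with values copied from willkill07's two runs above; `speedups` is a hypothetical helper name.

```python
sizes = [2, 8, 243, 10000]
baseline = [6.843719390395696e-05, 0.0006551851441887189,
            18.5206931268151, 651.86191372195]
new_lapack = [0.0001240704409928737, 0.036036036036036036,
              99.40082713329731, 672.9165756301664]


def speedups(old, new):
    """Per-size ratio new/old; > 1 means the new-interface build was faster."""
    return [b / a for a, b in zip(old, new)]


for n, s in zip(sizes, speedups(baseline, new_lapack)):
    print(f"n = {n:>5}: {s:.2f}x")
```

This gives roughly 1.8x at n = 2, 55x at n = 8, 5.4x at n = 243 and only 1.03x at n = 10000, where the absolute gap is the ~21 GFLOPS mentioned above. These are single runs, so the small-n ratios in particular are noisy.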

Without checking the instruction stream, I'd assume part of the difference is not firing up the AMX tile for small problem sizes: there you're better off accepting lower throughput, because the latency is so much lower that it comes out in the wash.

I'd assume it's also being smarter about when it starts using a single AMX tile, when it switches from the E-core AMX tile to the P-core-attached AMX tile, and when it starts issuing calls to both.

@daemontus

I got these results with M2 Max and 32GB RAM config:

[2, 3, 4, 8, 9, 10, 16, 27, 32, 64, 81, 100, 128, 243, 256, 512, 729, 1000, 1024, 2048, 2187, 4096, 6561, 8192, 10000, 16384, 19683]
[8.965644769947158e-05, 0.00116860351879504, 0.013183644041610876, 0.025924050632911394, 0.03319898900198101, 0.11678832116788321, 0.407055900621118, 1.2924683170267253, 2.818752688172043, 8.086745947279935, 9.364599118942731, 12.130401819560273, 16.60012585735375, 19.565032782448217, 22.240507772213046, 35.27848726271813, 52.515125121783846, 103.44025210872005, 96.07281529313705, 277.8700112248694, 358.5049594624609, 471.35730940175335, 623.3315456859455, 539.3242159590661, 675.8449319058319, 546.1567306369412, 671.1415636609279]
MacBook Pro 16", M2 Max 32GB

Pretty consistent with @willkill07, but a smidge faster. Hope it helps :)

@srcc-chekh

M3 Max, 128 GB: got 784 GFLOPS max, but maybe it was held back because I was doing other stuff? I can try again later.

[2, 3, 4, 8, 9, 10, 16, 27, 32, 64, 81, 100, 128, 243, 256, 512, 729, 1000, 1024, 2048, 2187, 4096, 6561, 8192, 10000, 16384, 19683, 32768, 59049]
[1.952112246454171e-05, 0.0010845115681233934, 0.01150561797752809, 0.0004147149280364301, 0.16743224621038125, 0.09599692809830085, 0.0032652085097902447, 0.333375676430985, 3.183832102603964, 4.191520830168768, 2.360648528595225, 11.431314944158027, 52.59315360501567, 32.546429259994326, 44.33531835953242, 35.791394133333334, 97.78665010220486, 119.43596366160806, 112.1408702667772, 284.20455459298404, 354.892254498892, 512.38087355766, 693.1422611985672, 627.6962714485065, 768.8290439314724, 648.7229257764205, 784.1103685799497, 650.6355194016465, 773.5660835631116]

@srcc-chekh

I entered "72" into the prompt and got:

[2, 3, 4, 8, 9, 10, 16, 27, 32, 64, 81, 100, 128, 243, 256, 512, 729, 1000, 1024, 2048, 2187, 4096, 6561, 8192, 10000, 16384, 19683, 32768]
[9.557031585989391e-05, 0.0010790935614084169, 0.018396090830698476, 0.025520249221183802, 0.048736462093862815, 0.14117314886708549, 0.3337951267215386, 1.4670741251444117, 3.3753605274000824, 11.84831638418079, 11.914648910411623, 18.44014788998608, 22.07040548931289, 35.934029112537175, 30.4348589569161, 42.56910515987076, 57.54855796421973, 104.29522139111678, 165.9180004206121, 382.86248949961544, 427.8850032433897, 580.0605971771008, 759.6805828297933, 669.6056906296085, 796.0521915427684, 687.7378855200425, 794.9496904346319, 654.4304466198903]

796 GFLOPS max this time.


6 participants