Results tracking - Paste your results here! #1
Results as of 2024/01/05

M3 family
- M3
- M3 Pro
- M3 Max (3 channel)
- M3 Max (4 channel)
- M3 Ultra (at time of writing, no such product exists)

M2 family
- M2
- M2 Pro
- M2 Max
- M2 Ultra

M1 family
- M1
- M1 Pro
- M1 Max
- M1 Ultra
The last two are questionable, and total power consumption was constantly around 18-20 W vs. up to 40 W when doing the
Ran out of memory? M1 Ultra 128GB
That's strange. Max value tested is edge length = 59049, and 59049² × 8 bytes × 3 matrices would be ~83 GB. Even allowing for a massive transpose matrix, that should add no more than +1/3 of the memory footprint, still well below the 128 GB available. All of the memory allocations (outside of any supplementary matrix transposes done by Accelerate) are done on application start. You'd only "run out" if Accelerate ballooned? I pushed a version to git a few hours ago ( fd7382e ) that takes this into account and divides by 4 instead of 3 when figuring out max edge length. Even then, you'd still end up with a max exponent for 3^N of 10 and testing for 59049 🤔 If you do a run with "only" 64 GB, what do those results look like?
Beyond the memory bug, thanks for running it! Wowed to see ~750 GFLOPS on that machine
M1 Ultra using 64 out of 128 GB. More GFLOPS for ya.
1.1 TFLOPS, let's go! That's (just!) enough to surpass Sandia's ASCI Red, the first ever TFLOP-class supercomputer! To reach that performance mark, ASCI Red needed 850 kW of power for the systems alone, excluding the needs of the rest of the infrastructure. It also needed over 7k sockets! Thank you so much for providing the data!
M2 Max (MacBook Pro 14")
Peak of 651 GFLOPS. EDIT: I built with the recommended flags of
Unsure why it's 20 GFLOPS faster, but I'll take it :)
The new interfaces seem to provide a much bigger bump for very small N. Without checking the instruction stream, I'd assume part of the difference is not firing up the AMX tile for small problem sets: you're better off with lower throughput there, because your latency is so much lower that it comes out in the wash. I'd also assume it's being smarter about when it starts using a single AMX tile, when it switches from the E-core AMX tile to the P-core-attached AMX tile, and when it starts issuing calls to both.
I got these results with M2 Max and 32GB RAM config:
Pretty consistent with @willkill07, but a smidge faster. Hope it helps :)
M3 Max, 128 GB, got 784 max, but maybe because I was doing other stuff? I can try again later. [2, 3, 4, 8, 9, 10, 16, 27, 32, 64, 81, 100, 128, 243, 256, 512, 729, 1000, 1024, 2048, 2187, 4096, 6561, 8192, 10000, 16384, 19683, 32768, 59049]
I entered "72" into the prompt and got: [2, 3, 4, 8, 9, 10, 16, 27, 32, 64, 81, 100, 128, 243, 256, 512, 729, 1000, 1024, 2048, 2187, 4096, 6561, 8192, 10000, 16384, 19683, 32768]. 796 max this time.
I don't have a great place to store different people's results, so let's add them here!
You can place them directly below, ideally within a code tag to make for easier parsing!
Otherwise, feel free to send a Pastebin link, GitHub gist, etc.
Thanks,
-FCLC