Tilemajor matmul implementation #91
Row │ Size Octavian
│ Int64 Float64
─────┼──────────────────
269 │ 4886 1800.67
270 │ 5000 1807.82
271 │ 5117 1788.49
272 │ 5237 1798.98
273 │ 5359 1728.39
274 │ 5484 1707.73
275 │ 5613 1750.0
276 │ 5744 1727.5
277 │ 5878 1726.07
278 │ 6015 1709.47
279 │ 6156 1758.61
280 │ 6300 1789.8
281 │ 6447 1816.43
282 │ 6598 1810.03
283 │ 6752 1828.7
284 │ 6910 1791.21
285 │ 7071 1741.78
286 │ 7237 1726.9
287 │ 7406 1715.8
288 │ 7579 1728.31
289 │ 7756 1762.05
290 │ 7937 1751.56
291 │ 8123 1754.17
292 │ 8313 1769.72
293 │ 8507 1753.06
294 │ 8706 1785.01
295 │ 8909 1751.57
296 │ 9117 1768.85
297 │ 9330 1781.09
298 │ 9548 1833.02
299 │ 9772 1829.03
300 │ 10000 1842.31
I'll add benchmarks from the master branch in an hour or so.
Current master:
Row │ Size Octavian
│ Int64 Float64
─────┼──────────────────
269 │ 4886 1745.31
270 │ 5000 1696.69
271 │ 5117 1779.43
272 │ 5237 1759.75
273 │ 5359 1657.65
274 │ 5484 1696.74
275 │ 5613 1651.58
276 │ 5744 1635.12
277 │ 5878 1616.55
278 │ 6015 1636.0
279 │ 6156 1687.92
280 │ 6300 1769.92
281 │ 6447 1741.46
282 │ 6598 1751.08
283 │ 6752 1783.15
284 │ 6910 1773.49
285 │ 7071 1733.6
286 │ 7237 1670.6
287 │ 7406 1649.37
288 │ 7579 1660.92
289 │ 7756 1742.86
290 │ 7937 1736.86
291 │ 8123 1733.38
292 │ 8313 1737.56
293 │ 8507 1746.48
294 │ 8706 1741.73
295 │ 8909 1690.63
296 │ 9117 1599.14
297 │ 9330 1735.78
298 │ 9548 1632.21
299 │ 9772 1747.52
300 │ 10000 1806.31
So this does look like an improvement. CI doesn't like the compile time increase, though.
Full vector of results for max gflops on this PR (top) vs master (bottom): [40.34582132564841, 49.10519317564656, 57.83779726363916, 56.899033139961325, 64.68719706242351, 75.34026894951074, 89.35570454467579, 65.68873726774008, 73.21357336307155, 71.45129465903896, 80.17237059678308, 85.8050032071841, 93.16683822947185, 90.7559175828576, 108.24178126757612, 74.94998312683796, 77.28818401991094, 80.56081351927256, 81.40020992946764, 82.38276149992409, 87.6656092554581, 92.45198115821557, 96.93934419971752, 79.64347518566545, 85.09949592340742, 84.69135802469137, 95.67575392038603, 85.67103594080336, 88.12655584999598, 89.5651517439227, 107.1309005691329, 86.56242150213512, 90.79968135302408, 86.86441603845732, 99.70037453183521, 92.06870421823692, 171.66843033509699, 166.9582696791831, 188.33787465940054, 150.68715978226064, 160.63740923986379, 160.71117034165255, 171.73496183206106, 170.60333467025725, 156.36941410129097, 166.23646960865946, 240.22433486081667, 205.47320537002108, 208.77641645711842, 215.73424369747897, 231.68507990990025, 222.73662977702668, 228.28352490421454, 238.11480266638455, 334.7516281445537, 281.21959961087504, 290.1883025850951, 281.7451990632319, 302.62945139557263, 296.6965378825891, 295.2230669918233, 305.6456020495303, 372.8751248751249, 322.315581127733, 334.4658840792369, 340.46583572453375, 376.4444020962363, 364.30349780555923, 352.390099009901, 374.09695232474814, 453.54330708661416, 374.92898016775115, 372.95193716883995, 371.0944712611041, 394.33090773005114, 374.4521931328836, 402.08992493085736, 399.8500189753321, 486.669152945844, 400.97774617845715, 402.70680845187127, 406.758518318602, 443.74427467322005, 427.87791741472176, 440.5842920133939, 438.3587908225219, 574.5272129550713, 427.5959440465832, 436.87835283975994, 420.9931508972015, 465.1317319512276, 422.91066350016126, 444.4451358142874, 437.04709529047096, 580.4251805985554, 437.22460027697343, 452.2559331687867, 448.3094751608673, 491.4510874865893, 
505.1143470064356, 527.4768824306474, 511.8858426125198, 679.7882076449852, 513.5017052700258, 531.1621403603119, 539.3333136321996, 559.1436162273501, 554.6723681989534, 523.5573484646984, 570.0774405740768, 616.1106265226938, 614.5884327575354, 608.7226624405706, 1180.2132779248634, 1000.7274872686625, 1039.5332385353715, 990.243095329006, 1033.44340654191, 1041.9890901156448, 1110.9077076139756, 1378.8460243721806, 1179.8700013374348, 1396.1375046006626, 1078.7941747572816, 1148.9709507985854, 1266.1745549283544, 1543.8033951509647, 1364.4090349075977, 1630.2863065760685, 1271.4739730583735, 1326.2120675784392, 1381.8868163136265, 1377.409237536657, 1814.3300027005132, 1182.3058217865162, 1238.006864006864, 1216.40015789214, 1274.3684663986214, 1340.8219489120152, 1762.7403212758582, 1422.1795617270557, 1500.6189967982925, 1436.8894148184909, 1811.5540351982715, 1434.4339698224064, 1464.670990192977, 1483.6032535828142, 1271.1574801258496, 1205.166188807476, 1263.4390309223131, 1700.7318212487671, 1282.2539513733543, 1343.5076653682593, 1392.3679180180804, 1355.0662279670973, 1351.1550805262311, 1404.8845530765952, 1850.529181389358, 1425.5430098797196, 1516.5253527063549, 1535.7374614310686, 1534.3753743541363, 1508.3534476702762, 1846.7677154081387, 1412.25448122465, 1274.4416950158366, 1259.481971207228, 1301.7372236007382, 1317.7718932467787, 1677.7200681955671, 1436.729211531401, 1705.5074081037317, 1448.3545645618206, 1728.6501020079459, 1425.3967938433875, 1498.6647460589777, 1381.8697597221692, 1410.1036025289275, 1459.9232017306651, 1484.1826499285692, 1625.0189843011449, 1282.7791586496214, 1399.3890877959468, 1421.2363210527128, 1327.8257755601398, 1537.3181157433942, 1446.0510550428646, 1894.4646630240852, 1536.9515011547344, 1983.239557473163, 1881.277700529956, 1911.6987430462405, 1976.590361841889, 2053.2189763710594, 1666.609418369427, 1676.8946626607965, 1721.996191765017, 1775.6099529197468, 1809.8680087888379, 1828.7828577744554, 
1793.6098315966253, 1824.0356457219523, 1868.1032043343773, 1897.8684709417257, 1952.1828836741668, 2002.066195275048, 1863.5254631773832, 1866.7339836488952, 1850.5189626183196, 1860.6231879018646, 1807.0010117869076, 1648.893865094042, 1680.3100408152434, 1671.7127358580833, 1509.4748508112873, 1550.9174850285601, 1526.240797396785, 1589.8528524683322, 1614.8395136547858, 1655.4706383714376, 1623.9052077471797, 1620.367948917266, 1659.6982510286646, 1703.0939708897163, 1618.6605744920648, 1603.556403845013, 1654.2378101769687, 1506.0267417976777, 1542.6900379974843, 1553.4188496768918, 1577.3472114276676, 1621.5751060066825, 1652.1588400714031, 1593.8797653807178, 1656.2655618263411, 1691.0596295545129, 1696.1842096071011, 1692.4884194505992, 1705.9797505739668, 1715.043893049753, 1707.6829577090325, 1685.573877471744, 1745.0100110726469, 1610.1383028338794, 1633.749395415954, 1676.0898043365132, 1676.336815881731, 1739.8883922572516, 1708.9476454393873, 1728.2104501636875, 1788.5291291318058, 1747.6995350634618, 1764.619618815973, 1716.8947774098972, 1772.4279526483185, 1652.0787144766114, 1669.9998629171864, 1690.1282120776746, 1705.003374425393, 1713.0585039179089, 1720.5239107606383, 1750.8437904700693, 1729.562469706851, 1691.0325637976673, 1716.0675686603429, 1730.367186300895, 1759.4191262225352, 1781.0623973370289, 1800.671893693608, 1807.8240382674405, 1788.4893566798794, 1798.984050985084, 1728.394416415473, 1707.7266490627967, 1749.9953528934366, 1727.5007540837853, 1726.0688046616704, 1709.4721470318502, 1758.60524619606, 1789.7958445466677, 1816.431788944306, 1810.031155346502, 1828.6982237369236, 1791.2057693135548, 1741.7784915908455, 1726.8981546550692, 1715.797318378928, 1728.3130407139568, 1762.0525930469937, 1751.5552578843751, 1754.1696481870924, 1769.715831204492, 1753.060144421134, 1785.0073453280284, 1751.5693681138878, 1768.8478700057735, 1781.0860222393733, 1833.0227057421198, 1829.0339107782168, 1842.3120988099997]
[41.197564840296884, 47.06509747551882, 57.52872374688523, 54.11964809384165, 63.74684829511803, 73.57137150618017, 88.23251762097094, 63.30346550856997, 69.59713226732019, 68.47986155599743, 80.23700369076975, 86.52598702650756, 91.47997473152243, 90.12804533910264, 110.29120580235721, 74.45910511380644, 76.35200421246627, 82.36180228648284, 83.94147307939969, 81.97983193277312, 90.47184285082312, 94.9393085111398, 96.74003608331967, 79.56531365313654, 84.88097809049427, 84.25152642290686, 97.47813053549531, 84.28119800332777, 86.89152810768013, 89.22833935018052, 109.33629452464336, 89.27012499190467, 93.3686200378072, 96.67092224451333, 101.18066278655422, 94.62125538653237, 172.23038131469522, 165.6794063671906, 188.80409731113957, 150.8513912040005, 161.14477246358126, 160.66250832677287, 173.7295360474455, 164.26900584795322, 149.79927065165688, 164.763974471831, 239.83065892796176, 213.03692626251006, 217.85618579723092, 217.26330265524172, 241.57020634121793, 223.74906900328585, 232.590761223162, 239.48313291476006, 350.3427998663548, 279.56396335256187, 286.9698885376809, 282.99707266074233, 299.28486066310614, 304.97508896797154, 311.1895161290323, 306.86853386681906, 365.78994936571024, 297.5652728199898, 324.72762888433795, 306.6571093970844, 344.67035986913845, 326.62641599427644, 340.5196731114212, 346.60035149384885, 449.75843053047674, 325.6378676470589, 354.19311840044963, 341.35263609566806, 368.611342169705, 341.79769027410595, 385.6053349499848, 375.9517573595005, 483.67556484365764, 353.1529681182238, 376.5131217921818, 416.351945854484, 465.82696477978016, 440.9398704902868, 451.6651599089148, 453.9077493216862, 591.1787847149718, 433.1329491525423, 447.0007463192889, 430.13793103448273, 463.1773969430292, 419.94334459066033, 460.6651576695296, 439.94415207200996, 593.4000659413123, 431.7885117493472, 450.0060453400504, 451.50391596793514, 495.1049390803091, 516.4622133599203, 537.8398660740057, 543.2856197033897, 706.2221105166781, 
532.4670643951042, 550.3677057858404, 536.4165417511683, 581.1769524341432, 590.34624827834, 538.7795383352088, 568.8867671192004, 637.2917055565925, 638.2221576673082, 637.6358030965207, 1206.3710605645383, 990.3029222874354, 999.5580242693555, 1052.0785418486068, 1060.3760097448392, 1053.8311817279048, 1114.233388119277, 1385.5625465124144, 1186.814381327145, 1406.073689673067, 1088.066804482646, 1126.6291780533952, 1234.4163403534765, 1536.4419780490814, 1326.9439840239643, 1634.867878041269, 1336.9542712249715, 1324.8264887888774, 1359.324521847302, 1352.131126304426, 1822.3681735985533, 1193.9451357778883, 1241.6022372808432, 1239.082328106152, 1288.8036595991869, 1373.8355951919348, 1785.9600725952812, 1414.5070349589987, 1502.414839509339, 1486.9520837448156, 1834.3419169591175, 1425.6296738661626, 1444.0675587161973, 1432.8846939656414, 1391.173189643843, 1232.8419657599723, 1288.3617074912818, 1726.7673174716094, 1312.3323449932443, 1361.4113706319029, 1398.0601094789358, 1353.560504569926, 1355.6086548885178, 1392.8306400484653, 1899.9026412666644, 1438.0441493315861, 1537.55034628389, 1432.5605629486956, 1514.2096112416236, 1508.5351131630453, 1919.127054594794, 1255.3646916864745, 1290.8671396120915, 1282.7425606296567, 1343.0747228633045, 1352.332535176207, 1726.5135653293526, 1460.205414376333, 1719.2499801307092, 1431.8200819711105, 1767.0826797798131, 1438.9452678734992, 1499.2531785449733, 1471.0423372728355, 1446.1927702654482, 1480.5527006721256, 1512.591733243291, 1624.0525110972749, 1557.8120279286097, 1438.8749751825724, 1460.9623749830296, 1324.6278063993657, 1559.0355993597843, 1554.360889010677, 1972.8378479938021, 1580.0707432890786, 1896.2289718086888, 1752.9133647911913, 1764.693057233862, 1778.1117361394543, 2051.76242442751, 1662.634324092426, 1688.159421593669, 1714.3281030553676, 1770.9373437126496, 1786.9480600848958, 1829.7477898789434, 1721.119801363574, 1757.94742876309, 1820.871418119373, 1802.443499466438, 1828.8944489680907, 
1847.9822479228749, 1622.464103546999, 1605.7200598816519, 1586.1483108212356, 1476.1265230791555, 1431.1423292719423, 1423.6637388724039, 1440.8753148784306, 1516.202504305341, 1486.975507808609, 1545.9871410067187, 1502.100632211207, 1560.6060656381044, 1595.750893440515, 1625.2396842937449, 1612.1541710709962, 1620.4643951456683, 1673.7117284056949, 1675.291435844321, 1612.6254199198218, 1603.884414958813, 1649.3346203244282, 1502.3585454989409, 1522.8380341702248, 1524.382633878463, 1576.86892432821, 1616.6691458775783, 1610.2294854474687, 1589.946782996978, 1651.1534967603236, 1678.7365828640532, 1697.0499208859226, 1729.203170817776, 1714.3287836968639, 1711.903652199903, 1709.4336696830592, 1596.6552860700224, 1692.8925858238686, 1601.4781962248871, 1611.1468119512679, 1685.1565627153964, 1672.3507638824317, 1714.5495219556565, 1683.337783580443, 1733.5601753445228, 1769.7243994742817, 1741.6116681735366, 1740.238607295019, 1668.580615658366, 1730.5267070927782, 1644.754536855583, 1632.3595471457961, 1656.8051498942423, 1705.8379493216528, 1721.541056089482, 1721.8175073525333, 1726.9177688042128, 1696.1839698203942, 1666.9182534252132, 1590.6112184953847, 1600.4044074046271, 1725.5059412008961, 1755.179633003724, 1745.3052987118758, 1696.6948858697826, 1779.4282791133296, 1759.7541736422993, 1657.6515500160326, 1696.742053268825, 1651.5758122179027, 1635.1223112830924, 1616.5458194130292, 1636.0013653463366, 1687.9191748889323, 1769.9180002123007, 1741.4637015129506, 1751.0792423190032, 1783.1455649030156, 1773.48881743802, 1733.5962208525084, 1670.6004761652978, 1649.3732137075401, 1660.9188544064284, 1742.8593385914367, 1736.856510067945, 1733.3772188736275, 1737.558047945733, 1746.4789194619843, 1741.7324508484026, 1690.6337063771164, 1599.1374048613611, 1735.7815797082878, 1632.2066478631368, 1747.5176477354014, 1806.30705869422] Focusing on the largest 150 (as this is when packing starts to matter), on this PR: julia> rb.sizes[150] # size range: 
313:10_000
313
julia> hp = rb.gflops[150:end,1,BLASBenchmarksCPU.get_measure_index(:minimum)];
julia> using StatsBase, Statistics
julia> summarystats(hp)
Summary Stats:
Length: 151
Missing Count: 0
Mean: 1666.606157
Minimum: 1259.481971
1st Quartile: 1565.383031
Median: 1705.979751
3rd Quartile: 1774.018953
Maximum: 2053.218976
On master:
julia> hp = rb.gflops[150:end,1,BLASBenchmarksCPU.get_measure_index(:minimum)];
julia> summarystats(hp)
Summary Stats:
Length: 151
Missing Count: 0
Mean: 1631.788354
Minimum: 1255.364692
1st Quartile: 1541.768744
Median: 1657.651550
3rd Quartile: 1733.578198
Maximum: 2051.762424
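To put the two summaries side by side, here is a minimal sketch that computes the relative change of this PR over master. The numbers are copied from the `summarystats` outputs above; the helper name `relchange` and the named tuples are illustrative only, not part of the benchmark code.

```julia
# Summary statistics copied from the two `summarystats` outputs above (GFLOPS).
pr     = (mean = 1666.606157, median = 1705.979751, q1 = 1565.383031, q3 = 1774.018953)
master = (mean = 1631.788354, median = 1657.651550, q1 = 1541.768744, q3 = 1733.578198)

# Relative change of the PR over master, in percent (hypothetical helper).
relchange(new, old) = 100 * (new - old) / old

for k in keys(pr)
    println(rpad(string(k), 6), " ", round(relchange(pr[k], master[k]); digits = 2), " %")
end
```

On these numbers the mean and median both move by a few percent in favour of the PR, which matches the "does look like an improvement" reading above.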
Codecov Report
@@ Coverage Diff @@
## master #91 +/- ##
==========================================
- Coverage 87.86% 86.60% -1.27%
==========================================
Files 11 11
Lines 651 724 +73
==========================================
+ Hits 572 627 +55
- Misses 79 97 +18
The non-coverage CI jobs seem to complete quickly. It's just the coverage jobs that are really slow. Maybe decrease the size of the matrices in the coverage jobs?
And maybe also decrease the number of different matrices that we test in the coverage jobs?
I think lots of different sizes is fine. For example:
It's the initial compilations that take an eternity.
I’m away for a few days so I can’t review this right now, hopefully on Wednesday or Thursday. Feel free to merge without me if you’re in any hurry.
This PR though is about using a tile-major data layout for
We're all familiar with column major.
A = rand(200,200);
Atm = permutedims(reshape(A, (8, cld(size(A,1), 8), size(A,2))), (1,3,2));
Basically,
A[1:8,:]
A[9:16,:]
A[17:24,:]
etc. We order
What's the advantage of this? With AVX2, the microkernel is 8x6.
C[(1:8) .+ m, (1:6) .+ n] = Apack[(1:8) .+ m, :] * Bpack[:, (1:6) .+ n]
In other words, we calculate this 8x6 block of
If we make all data in the
Still, LoopVectorization is emitting software prefetch instructions for
What's the advantage of column-major?
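The reordering described above can be checked directly. The following is a small sketch that reuses the `permutedims`/`reshape` expression from the comment and confirms that tile `i` of `Atm` holds rows `8(i-1)+1:8i` of `A`. The helper `tile_matches`, the matrices `B` and `C`, and the offsets `m` and `n` are assumptions for illustration; the 8x6 update is written with ordinary array operations, not the actual microkernel.

```julia
# Tile-major reordering as in the comment above: rows are grouped into
# tiles of 8, and each tile is stored contiguously.
A = rand(200, 200)
Atm = permutedims(reshape(A, (8, cld(size(A, 1), 8), size(A, 2))), (1, 3, 2))

size(Atm)  # (8, 200, 25): 8 rows per tile, 200 columns, 25 tiles

# Tile i holds rows 8(i-1)+1 : 8i of A (hypothetical helper for checking).
tile_matches(i) = Atm[:, :, i] == A[(8 * (i - 1) + 1):(8 * i), :]

# Illustrative 8x6 block update using one tile of Atm; the offset m must be
# a multiple of 8 so that rows (1:8) .+ m line up with a single tile.
B = rand(200, 12)
C = zeros(200, 12)
m, n = 16, 6
C[(1:8) .+ m, (1:6) .+ n] = Atm[:, :, m ÷ 8 + 1] * B[:, (1:6) .+ n]
```

Note that this `reshape` only works because 200 is a multiple of 8; sizes with a remainder need separate handling.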
matmul_params(::Val{T}) where {T <: Base.HWReal} = LoopVectorization.matmul_params()

- function block_sizes(::Type{T}, _α, _β, R₁, R₂) where {T}
+ function block_sizes(::Val{T}, _α, _β, R₁, R₂) where {T}
Why do we need to wrap the type in a Val?
To throw an error if not specialized?
I don't think we need to, but I think I prefer it as a style.
It should force specialization, just like Array{T} always specializes on T.
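As a rough illustration of the style being discussed, the sketch below dispatches on `Val{T}` so that `T` is a static parameter of the method, and restricts one method to a set of element types so that unsupported types fail with a `MethodError`. All names here (`SupportedReal`, `demo_params`, `demo_block_bytes`) are hypothetical and are not the functions in this PR.

```julia
# Hypothetical set of supported element types; the PR itself uses Base.HWReal.
const SupportedReal = Union{Float32, Float64, Int32, Int64}

# Only defined for supported types, so other types raise a MethodError.
demo_params(::Val{T}) where {T <: SupportedReal} = (eltype = T, bytes = sizeof(T))

# T is a static parameter of the method and is available inside the body.
demo_block_bytes(::Val{T}, n) where {T} = n * sizeof(T)

demo_params(Val(Float64))
demo_block_bytes(Val(Float32), 16)

# Calling the restricted method with an unsupported type fails at dispatch.
unsupported = try
    demo_params(Val(String))
    false
catch err
    err isa MethodError
end
```

Whether `Val{T}` is needed over `Type{T}` for specialization in this code base is the open question in the thread; the sketch only shows the calling convention.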
end
end

if !pack
Are you sure it's if !pack?
Yes. pack means it is packing Apack, and !pack that it's using Apack.
When pack, it only evaluates a single n_r tile of Bpack, while with !pack, it needs to evaluate all of those remaining.
Ah, I see.
I do it in this awkward way because I haven't gotten around to adding "tile major" memory layout support to LV, and think this should wait until after the rewrite.
So the code is written to manually iterate over m_r and n_r.
The difference between tile major and an m_r x K x (M / m_r) array is the remainders. The last iteration along the (M/m_r) axis will only be partially filled, i.e. we won't have the full m_r iterations on the first axis.
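The remainder issue can be seen with a small example where M is not a multiple of m_r. The helper below, `demo_tile_pack`, is hypothetical; it packs into a full `m_r x K x cld(M, m_r)` array and zero-pads the last tile, which is an assumption chosen for illustration and not necessarily what the PR does.

```julia
# Hypothetical helper: pack an M x K matrix into an m_r x K x cld(M, m_r) array.
# Zero padding in the last tile is an assumption made for illustration.
function demo_tile_pack(A::AbstractMatrix{T}, m_r::Integer) where {T}
    M, K = size(A)
    ntiles = cld(M, m_r)
    Atm = zeros(T, m_r, K, ntiles)
    for t in 1:ntiles
        rows = ((t - 1) * m_r + 1):min(t * m_r, M)  # the last tile may be short
        Atm[1:length(rows), :, t] = A[rows, :]
    end
    return Atm
end

A = rand(20, 5)                 # 20 rows with m_r = 8: tiles of 8, 8, and 4 rows
Atm = demo_tile_pack(A, 8)
rem_rows = size(A, 1) - 8 * (size(Atm, 3) - 1)  # rows actually used in the last tile
```

Here the last tile carries only 4 meaningful rows, so a kernel iterating over it has to stop short of `m_r` on the first axis.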
I'll merge this. It does hurt compile times by a lot more than it helps benchmarks, but it does help benchmarks. We can get complex numbers to use this later.
The main change is to macrokernels.jl, which adds a macro kernel for convenience.
It also passes types around as Val{T} instead of Type{T}, which probably should've gone in a separate PR.
I am not sure if it should be merged. It makes compile times a fair bit worse, and from limited benchmarks doesn't really seem to help performance.