
[SYSTEMDS-3393] Implement SIMD usage for basic dense dense MM #1643

Closed

@kev-inn (Contributor) commented Jun 20, 2022

DoubleVector replacement for matrix multiply

JDK 17 adds the incubating Vector API classes for using SIMD instructions. This PR replaces the basic dense-dense matrix multiply with an equivalent DoubleVector implementation. Since it requires JDK 17, we should not merge this yet, but keep it in staging for future reference.
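For reference, below is a minimal sketch of a DoubleVector kernel in this style. It is not the exact code of this PR; the class and method names are illustrative, it assumes dense row-major arrays, and it must be compiled and run with `--add-modules=jdk.incubator.vector`:

```java
import jdk.incubator.vector.DoubleVector;
import jdk.incubator.vector.VectorSpecies;

public class DenseMMSketch {
  // The preferred species picks the widest SIMD width of the current CPU.
  static final VectorSpecies<Double> SPECIES = DoubleVector.SPECIES_PREFERRED;

  // C[n x m] += A[n x k] %*% B[k x m], all dense row-major arrays.
  public static void mmDense(double[] a, double[] b, double[] c, int n, int k, int m) {
    int upper = SPECIES.loopBound(m); // largest multiple of the vector length <= m
    for (int i = 0; i < n; i++) {
      int ci = i * m;
      for (int l = 0; l < k; l++) {
        double aval = a[i * k + l];
        DoubleVector av = DoubleVector.broadcast(SPECIES, aval);
        int bl = l * m;
        int j = 0;
        for (; j < upper; j += SPECIES.length()) {
          DoubleVector bv = DoubleVector.fromArray(SPECIES, b, bl + j);
          DoubleVector cv = DoubleVector.fromArray(SPECIES, c, ci + j);
          av.fma(bv, cv).intoArray(c, ci + j); // c[i,j..] += a[i,l] * b[l,j..]
        }
        for (; j < m; j++) // scalar tail for the remaining columns
          c[ci + j] += aval * b[bl + j];
      }
    }
  }
}
```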

As an experiment, we measure a simple matrix multiply:
`Z = X %*% Y`, with $X \in \mathbb{R}^{n\times k}$ and $Y \in \mathbb{R}^{k\times m}$, so $Z \in \mathbb{R}^{n\times m}$.

The experiment script performs 10 matrix multiplications and saves the times of the later iterations, discarding the first warm-up runs to give the JVM some time to optimize.

Vary rows $n$ ($m$ fixed at 1000)

Alpha Node

[Plots: plot_alpha_n_1000 ($k = 1000$), plot_alpha_n_10000 ($k = 10000$)]

Lima Node

[Plots: plot_lima_n_1000 ($k = 1000$), plot_lima_n_10000 ($k = 10000$)]

Vary cols $m$ ($n$ fixed at 1000)

Alpha Node

[Plots: plot_alpha_m_1000 ($k = 1000$), plot_alpha_m_10000 ($k = 10000$)]

Lima Node

[Plots: plot_lima_m_1000 ($k = 1000$), plot_lima_m_10000 ($k = 10000$)]

Conclusion

The implementation boosts performance in most cases. The case where we vary the number of columns $m$ on the Alpha node needs some more exploration, but we never seem to be worse than the current implementation.

Experiment Script

```
X = read($Xfname);
Y = read($Yfname);

lim = 10;
R = matrix(0, rows=lim, cols=1);
for (i in 1:lim) {
  t1 = time();
  Z = X %*% Y;
  t2 = time();
  R[i,1] = (t2-t1)/1000000; # time() returns ns, convert to ms
}

print(sum(Z)); # use Z so the multiplications are not optimized away
res = R[5:lim,]; # drop the first warm-up iterations
write(res, $fname, format="csv", sep="\t");
```

@phaniarnab (Contributor) commented:
This looks pretty good. I think Alpha has wider SIMD registers, which explains why most configurations perform better on Alpha.
Is there any JVM flag that you needed to enable? If so, can you please mention it as well for documentation purposes? @kev-inn

@kev-inn (Contributor, Author) commented Jun 20, 2022

> This looks pretty good. I think Alpha has wider SIMD registers, which explains why most configurations perform better on Alpha. Is there any JVM flag that you needed to enable? If so, can you please mention it as well for documentation purposes? @kev-inn

Note the varying-columns case for Alpha though: Lima is faster with DoubleVector than Alpha. I will take a closer look at why that might be.

Yes, the flag is `--add-modules=jdk.incubator.vector`, which has to be added when running SystemDS (see the `systemds` run script).
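For example, a direct invocation could look like the following; the main class `org.apache.systemds.api.DMLScript`, the jar name, and the file names are illustrative:

```
java --add-modules=jdk.incubator.vector -cp SystemDS.jar \
  org.apache.systemds.api.DMLScript -f mm.dml \
  -nvargs Xfname=X.bin Yfname=Y.bin fname=res.csv
```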

@Baunsgaard (Contributor) commented:
What if you use some of the other experimental JVMs that should have better support?

@kev-inn kev-inn changed the title Implement SIMD usage for basic dense dense MM [SYSTEMDS-3393] Implement SIMD usage for basic dense dense MM Jun 21, 2022
@kev-inn (Contributor, Author) commented Jun 21, 2022

> What if you use some of the other experimental JVMs that should have better support?

Which ones do you have in mind, and what kind of support do you expect (that the current one does not provide)? To clarify: do you expect better performance, or that we can remove the `--add-modules=jdk.incubator.vector` flag?

@Baunsgaard (Contributor) commented Jun 23, 2022

> > What if you use some of the other experimental JVMs that should have better support?
>
> Which ones do you have in mind, and what kind of support do you expect (that the current one does not provide)? To clarify: do you expect better performance, or that we can remove the `--add-modules=jdk.incubator.vector` flag?

Project Panama: https://openjdk.java.net/projects/panama/
And JDK 19 has the official full support for vectorizing: https://openjdk.org/jeps/426

JDK 17 only officially has the API; this does not guarantee that the instructions are correctly vectorized. Hence I am already looking positively at your improvements.
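One way to check whether the API calls are actually intrinsified into SIMD instructions is to inspect the JIT output with HotSpot's diagnostic flags, e.g. (this assumes an hsdis disassembler plugin is installed; `Main` is a placeholder class):

```
java --add-modules=jdk.incubator.vector \
  -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly Main
```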

@kev-inn (Contributor, Author) commented Jul 1, 2022

I ran the experiments again with JDK 19 (early access).
It still seems to require the same additional flag, and the Vector API is still part of the incubator module; this might be due to the early-access version.
Results are similar, except that occasionally a single iteration of our sample takes ~3-4x the time, probably due to GC. This already existed before, though, and was introduced in my last commit. Or maybe I just got lucky in my first run, before the second commit.

@kev-inn (Contributor, Author) commented Jul 31, 2022

More experiments

Dump of more experiments and updated plots.

All in all, the results look promising, but we can also clearly see some of the weak spots.

Alpha

Variable columns

[Plots: Alpha_1_1000_variable, Alpha_10_1000_variable, Alpha_1000_1_variable, Alpha_1000_10_variable, Alpha_1000_1000_variable, Alpha_1000_10000_variable]

Variable rows

[Plots: Alpha_variable_1_1000, Alpha_variable_10_1000, Alpha_variable_1000_1, Alpha_variable_1000_10, Alpha_variable_1000_1000, Alpha_variable_10000_1000]

Lima

Variable columns

[Plots: Lima_1_1000_variable, Lima_10_1000_variable, Lima_1000_1_variable, Lima_1000_10_variable, Lima_1000_1000_variable, Lima_1000_10000_variable]

Variable rows

[Plots: Lima_variable_1_1000, Lima_variable_10_1000, Lima_variable_1000_1, Lima_variable_1000_10, Lima_variable_1000_1000, Lima_variable_10000_1000]

@kev-inn kev-inn marked this pull request as ready for review July 31, 2022 20:21
@kev-inn (Contributor, Author) commented Jul 31, 2022

Closed by 9bf0a9f (messed up the commit message)

@kev-inn kev-inn closed this Jul 31, 2022
@Baunsgaard Baunsgaard reopened this Jul 31, 2022
@Baunsgaard Baunsgaard closed this Aug 6, 2022