Performance improved a lot in the last 2 weeks #1624

Closed
fenrus75 opened this issue Jun 17, 2018 · 5 comments

@fenrus75
Contributor

[Not a bug; I just wanted to put this somewhere useful on GitHub]

Below is a performance summary comparing git 36c4523 with what you get at the end of my other pull request, on a Core i9 CPU (so with AVX512); basically everything done in the last two weeks. At the small-matrix end you can see the benefit of the work from @sandwichmaker and @oon3m0oo, in the middle of the range the threading improvements, and across the whole range the AVX512 work.

   Matrix          SGEMM cycles    MPC Speedup                   DGEMM cycles      MPC Speedup
   1 x 1                  250.0    0.2   5.8x                           309.2      0.2   4.7x
   2 x 2                  272.7    0.3   5.4x                           335.6      0.3   4.4x
   4 x 4                  329.0    0.8   4.7x                           460.1      0.4   3.5x
   6 x 6                  648.0    0.5   2.8x                           696.8      0.6   2.5x
   8 x 8                  481.3    2.2   3.6x                           583.5      1.8   3.3x
  10 x 10                 946.8    1.4   2.3x                          1371.4      0.9   1.7x
  16 x 16                1153.6    4.5   2.3x                          1419.6      3.7   2.3x
  20 x 20                2092.9    4.3   1.8x                          3452.7      2.5   1.3x
  32 x 32                4453.7    7.8   1.6x                          6107.3      5.6   2.1x
  40 x 40                7979.4    8.3   1.5x                         12939.8      5.1   1.6x
  64 x 64               20793.9   12.8   1.5x                         40296.7      6.6   1.6x
  80 x 80               39823.5   12.9   3.9x                         63438.7      8.1   2.4x
  96 x 96               44082.8   20.2   4.4x                         77066.3     11.5   2.5x
 100 x 100              53297.6   18.8   3.8x                         86400.7     11.6   2.3x
 112 x 112              56750.1   24.9   4.1x                         97351.1     14.5   2.2x
 128 x 128              62867.8   33.5   5.1x                        114024.4     18.4   2.3x
 150 x 150             106685.3   31.7   2.8x                        181187.9     18.7   1.5x
 200 x 200             176946.9   45.3   2.2x                        311688.3     25.7   1.3x
 256 x 256             290318.2   57.8   2.0x                        531438.5     31.6   1.3x
 300 x 300             316617.5   85.3   2.9x                        654855.4     41.2   1.6x
 400 x 400             851045.7   75.2   1.2x                       1583273.4     40.4   1.1%
 500 x 500            1395180.0   89.6   1.3x                       2553968.7     48.9   1.2x
 512 x 512            1671261.6   80.3   1.3x                       2935469.4     45.7   1.1x
 600 x 600            1942775.7  111.2   1.9x                       3894347.7     55.5   1.4x
 700 x 700            4391303.0   78.1   1.2x                       7232828.9     47.4   1.1x
 800 x 800            4427185.5  115.7  10.2%                       8933843.5     57.3   1.1x
 900 x 900            8193011.1   89.0   1.1x                      14320142.3     50.9   1.2x
1000 x 1000           9294957.2  107.6   1.2x                      16591323.5     60.3   1.2x
1024 x 1024          11053822.8   97.1   1.2x                      18804424.6     57.1   1.2x
1536 x 1536          31137524.8  116.4   1.2x                      58578031.3     61.9   1.2x
2000 x 2000          55566820.6  144.0   1.2x                     110392289.5     72.5   1.2x
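
As a rough cross-check of the MPC column, here is a minimal Python sketch. It assumes MPC stands for multiplications per cycle and approximates the multiplication count of an n x n GEMM as n^3; neither is stated in the post, and the function name `mults_per_cycle` is made up for illustration. Under that assumption the largest rows reproduce the printed MPC to one decimal. The smaller rows show a somewhat higher MPC than n^3 / cycles, so the benchmark's exact operation count is probably a bit different.

```python
def mults_per_cycle(n: int, cycles: float) -> float:
    """Approximate multiplications per cycle for an n x n GEMM.

    Assumption (not from the post): the work is counted as n**3 multiplications.
    """
    return n ** 3 / cycles


# (matrix size, SGEMM cycles, DGEMM cycles) copied from the table above
rows = [
    (1024, 11053822.8, 18804424.6),
    (2000, 55566820.6, 110392289.5),
]

for n, sgemm_cycles, dgemm_cycles in rows:
    sgemm_mpc = mults_per_cycle(n, sgemm_cycles)
    dgemm_mpc = mults_per_cycle(n, dgemm_cycles)
    print(f"{n} x {n}: SGEMM {sgemm_mpc:.1f}  DGEMM {dgemm_mpc:.1f}")
```
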
@sandwichmaker

Thanks @fenrus75. @oon3m0oo gets the credit for the actual work. I just nudged him :). We are not done yet.

@martin-frbg
Collaborator

Good to see the rusty handbrake cable repaired after all these years :-)

@oon3m0oo
Contributor

Thanks. =)

Funnily enough @sandwichmaker and I have known each other for about 15 years or so, and yet I think this might be the first time we've directly worked on something together.

Hopefully later today (or tomorrow) I'll have TLS straightened out (support for non-keyword TLS, fast on Android, etc.). It's getting close. I'll need @sandwichmaker's help to run the Android benchmarks, though... nudge nudge.

@oon3m0oo
Contributor

I've uploaded #1625 which might eliminate a few more cycles. I'd be interested in knowing if it did, since I'm not measuring cycle counts, just time.
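
Since the table above is in cycles and this measurement is in time, here is a minimal sketch of converting between the two. It assumes the core runs at a fixed, known clock frequency, which does not hold under turbo or frequency scaling; the 3.0 GHz figure and the function name `seconds_to_cycles` are illustrative and not taken from this thread.

```python
def seconds_to_cycles(seconds: float, clock_hz: float) -> float:
    """Convert elapsed wall time to an approximate cycle count.

    Assumption: the clock frequency stays constant during the measurement.
    """
    return seconds * clock_hz


ASSUMED_CLOCK_HZ = 3.0e9  # hypothetical 3.0 GHz core clock

# Example: a call that takes 100 microseconds at the assumed clock
elapsed_seconds = 100e-6
approx_cycles = seconds_to_cycles(elapsed_seconds, ASSUMED_CLOCK_HZ)
print(f"{elapsed_seconds * 1e6:.0f} us is roughly {approx_cycles:.0f} cycles")
```
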

@martin-frbg
Collaborator

Unfortunately, the ongoing problems with the TLS code led me to make the old memory.c the default again in 0.3.3. You can still get the latest revision of the new code (based on PR #1739, with interim fixes for some new races it introduced) by building with USE_TLS=1. Hopefully this can be made the default in 0.3.4 once remaining issues like those seen in #1735 are understood and resolved.

@fenrus75 closed this as completed on Oct 7, 2018