performance issues #23

Closed
ThomasWaldmann opened this issue Feb 28, 2022 · 11 comments
Comments

@ThomasWaldmann
Collaborator

ThomasWaldmann commented Feb 28, 2022

Linux seems good, macOS (x64, Intel) is mediocre, and macOS (M1, Apple Silicon) is the worst.

See there: #21

TODO: move insights from there to issues (I guess the best place is not here, but in libdeflate's issue tracker).

@ghost

ghost commented Feb 28, 2022

Can you try forcing the ARM cpu_features.h file to be enabled for your M1 environment? I wonder if that would change anything.

@ghost

ghost commented Feb 28, 2022

Since I lack access to any macOS M1 environment, I can't test anything like this myself.

@ThomasWaldmann
Collaborator Author

That wouldn't help; the code there is Linux-specific. But I guess it's best to continue in libdeflate's issue tracker and point them to the zlib-ng code I found.

@ghost

ghost commented Feb 28, 2022

Ok. I can't say I'm surprised. For a long time Linux was the only serious ARM target of note.

@ThomasWaldmann
Collaborator Author

I was curious about how borgbackup's currently bundled crc32 code performs on macOS 12 with an M1 CPU (again on my local machine):

Name (time in us)                Mean              StdDev                Median                   OPS          
---------------------------------------------------------------------------------------------------------------
test_zlib_crc32              560.6912 (1.0)       14.5614 (1.0)        563.9375 (1.0)      1,783.5129 (1.0)    
test_borg_crc32_slice8     7,326.7399 (13.07)    117.9650 (8.10)     7,324.9590 (12.99)      136.4864 (0.08)   

have_clmul is False, thus borg_crc32_clmul is not available (only implemented on x64 within the code currently bundled into borg).
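
For anyone who wants to reproduce the zlib side of this comparison without the borg test suite, here is a minimal timing sketch using only the standard library. The helper name `bench_crc32`, the 1 MiB payload, and the repeat counts are assumptions for illustration; the thread does not state the payload size or the exact benchmark code used.

```python
import timeit
import zlib


def bench_crc32(func, data, repeat=5, number=20):
    """Return the best mean time per call in microseconds (hypothetical helper)."""
    timer = timeit.Timer(lambda: func(data))
    # Take the minimum over several runs to reduce noise from other processes.
    best = min(timer.repeat(repeat=repeat, number=number))
    return best / number * 1e6


if __name__ == "__main__":
    # Assumed payload: 1 MiB of zero bytes (the size used in the thread is unknown).
    payload = bytes(1024 * 1024)
    print(f"zlib.crc32: {bench_crc32(zlib.crc32, payload):.1f} us per call")
```

Any other callable that takes a bytes object, such as an alternative crc32 implementation, can be passed as `func` to compare it against `zlib.crc32` on the same machine.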

@ThomasWaldmann
Collaborator Author

Benchmarks run on GitHub CI (Linux, x64):

Name (time in us)                Mean              StdDev                Median                   OPS          
---------------------------------------------------------------------------------------------------------------
test_borg_crc32_clmul        515.9855 (1.0)       19.2178 (1.0)        520.4060 (1.0)      1,938.0391 (1.0)    
test_borg_crc32_slice8     3,958.2522 (7.67)      84.5450 (4.40)     3,973.1480 (7.63)       252.6368 (0.13)   
test_zlib_crc32            7,500.5678 (14.54)    116.3165 (6.05)     7,520.1550 (14.45)      133.3232 (0.07)

Benchmarks run on GitHub CI (macOS, x64):

Name (time in ms)            Mean            StdDev            Median                 OPS          
---------------------------------------------------------------------------------------------------
test_zlib_crc32            3.1880 (1.0)      0.39 (5.85)       3.0442 (1.0)      313.6777 (1.0)    
test_borg_crc32_slice8     4.6606 (1.46)     0.0656 (1.0)      4.6442 (1.53)     214.5655 (0.68)

@ThomasWaldmann
Collaborator Author

ThomasWaldmann commented Mar 2, 2022

code: borgbackup/borg#6387 - it would also benchmark deflate.crc32 as soon as that is in a PyPI release.

@ghost

ghost commented Mar 2, 2022

It makes me wonder how libdeflate would fare against zlib-ng. That might explain why Python on macOS is so different. Whichever version is in active use may be using zlib-ng instead of regular zlib. If so, should we just import the zlib-ng code since it may be doing better than libdeflate?
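
One way to check which zlib a given Python build is using is to print the version strings the `zlib` module exposes. This is only a rough sketch: `looks_like_zlib_ng` is a hypothetical helper, and it assumes that a zlib-ng build in zlib-compat mode tags its version string with "zlib-ng". A vendor-patched zlib would not be detected this way.

```python
import zlib


def looks_like_zlib_ng(version: str) -> bool:
    """Heuristic (assumption): zlib-ng compat builds carry 'zlib-ng' in the version."""
    return "zlib-ng" in version.lower()


if __name__ == "__main__":
    # Version Python was compiled against vs. the library actually loaded.
    print("compiled against: ", zlib.ZLIB_VERSION)
    print("loaded at runtime:", zlib.ZLIB_RUNTIME_VERSION)
    print("looks like zlib-ng:", looks_like_zlib_ng(zlib.ZLIB_RUNTIME_VERSION))
```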

@ThomasWaldmann
Collaborator Author

Yeah, zlib-ng is definitely also worth testing (but maybe a little bit off-topic here).

@ThomasWaldmann
Collaborator Author

Updated performance results using libdeflate 1.12 on macOS M1:

(borg-env) tw@mba2020 borg % borg benchmark cpu
Non-cryptographic checksums / hashes ===========================
crc32 (zlib, used)       1GB        0.055s
crc32 (libdeflate)       1GB        0.027s
xxh64                    1GB        0.122s

Great update, it used to be slower, but now libdeflate 1.12 is twice as fast as zlib crc32 on macOS M1!
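
The "twice as fast" figure can be checked directly from the timings above. The short sketch below converts them to throughput and a speedup ratio; the function names are made up for illustration, and it simply treats "1GB" as one unit of data, since the thread does not say whether that means 10^9 or 2^30 bytes.

```python
# Timings in seconds for 1GB, copied from the `borg benchmark cpu` output above.
timings = {
    "crc32 (zlib)": 0.055,
    "crc32 (libdeflate)": 0.027,
    "xxh64": 0.122,
}


def throughput_gb_per_s(seconds, gigabytes=1.0):
    """Data processed per second, in the same 'GB' unit the benchmark reports."""
    return gigabytes / seconds


def speedup(baseline_s, candidate_s):
    """How many times faster the candidate is than the baseline."""
    return baseline_s / candidate_s


if __name__ == "__main__":
    for name, secs in timings.items():
        print(f"{name:20s} {throughput_gb_per_s(secs):6.1f} GB/s")
    ratio = speedup(timings["crc32 (zlib)"], timings["crc32 (libdeflate)"])
    print(f"libdeflate vs zlib speedup: {ratio:.2f}x")
```

With these numbers the ratio works out to about 2.04, which matches the "twice as fast" observation.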

@ThomasWaldmann
Collaborator Author

guess this is solved by the new libdeflate.
