New zstd 1.5.5 version is two times slower in compression speed than older 1.4.5 version #3906
So, that gives a few possibilities to look into:
I changed the order of execution and added the -T32 option: same result.

```
time ./zstd1.4.5 -4 -T32 core-file -c > /dev/null
time ./zstd1.5.5 -4 -T32 core-file -c > /dev/null
```
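One more way to take the disk's read speed out of the timed run (a sketch, not from the original thread; the `zstd1.4.5`/`zstd1.5.5` binary names and `core-file` sample are as used in this thread) is to warm the page cache first, so both timed runs read the file from RAM:

```shell
# Hypothetical pre-warming step: the first read pulls the file into the
# OS page cache, so the timed runs below are served from memory.
cat core-file > /dev/null
time ./zstd1.4.5 -4 -T32 -c core-file > /dev/null
time ./zstd1.5.5 -4 -T32 -c core-file > /dev/null
```

This only helps if the file fits in free RAM; for a 13 GB sample on an 8 GB machine the cache cannot hold it all.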
```
START=$(date +%s); time ./zstd1.4.5 -vvv -4 -T32 core-file -o 3 ; END=$(date +%s); echo Elapsed time $((END-START)) Sec
real 0m2.925s

START=$(date +%s); time ./zstd1.5.5 -vvv -4 -T32 core-file -o 4 ; END=$(date +%s); echo Elapsed time $((END-START)) Sec
real 0m8.588s
```

For some strange reason, with the -vvv option zstd even reports the wrong execution time.
Some more advanced tests that could be attempted:
I was also wondering if the definition of "level 4" has changed between these two versions. This could be confirmed by using the internal benchmark module, bypassing the potential I/O bottleneck:
```
./zstd1.4.5 -b4 core-file
./zstd1.5.5 -b4 -T32 core-file
./zstd1.5.5 -b4 core-file
```

Where did the message "Not enough memory" come from? Every new run comes with a new bug :)

```
cat /proc/meminfo
```
The benchmark module has an internal memory limit (8 GB, divided into 3 buffers, hence ~2.7 GB per buffer). If you want to use more memory, you can change the limit manually and recompile.
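As an alternative to recompiling, a sketch of a workaround (assuming a stock `zstd` binary; the block size chosen here is arbitrary): the `-B#` option cuts the input into independent blocks, so each benchmark buffer stays well under the internal limit even for a very large file.

```shell
# Benchmark level 4 on independent 256 MB blocks instead of loading the
# whole multi-GB file into the bench buffers at once.
zstd -b4 -B256MB core-file
```

Note that small independent blocks change the compression ratio slightly (no history is shared across blocks), but speed comparisons between versions remain meaningful.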
All this seems to point towards I/O as the potential bottleneck. And it's logical, considering the extreme speeds requested. The next test could employ a RAM-backed filesystem (tmpfs), taking the HDD out of the picture. After that, presuming the performance difference comes from the I/O component within the zstd CLI, it could be narrowed down further.
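A minimal sketch of such a RAM-to-RAM test, assuming a Linux system where `/dev/shm` is mounted as tmpfs (the default on most distributions), with the binary names used in this thread:

```shell
# Copy the sample onto the RAM-backed tmpfs so the timed runs never
# touch the HDD on the read side; output goes to /dev/null.
cp core-file /dev/shm/
time ./zstd1.4.5 -4 -T32 -c /dev/shm/core-file > /dev/null
time ./zstd1.5.5 -4 -T32 -c /dev/shm/core-file > /dev/null
rm /dev/shm/core-file
```

This requires enough free RAM to hold the sample; otherwise, a dedicated tmpfs mount with an explicit size limit can be set up by root.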
I made a few local tests to attempt to mimic the scenario, using a highly compressible synthetic data source of 13 GB. As can be seen in these measurements, both versions behave similarly on this sample. Basically, same conclusion: the reported issue is not reproduced.
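For anyone trying to reproduce this: a core-dump-like sample can be sketched as mostly zero pages with a little random data mixed in, which gets into the same extreme-compressibility regime as the report (the exact layout here is an assumption; sizes are scaled down and can be raised to reach 13 GB):

```shell
# Build a 64 MiB synthetic "core file": 2 MiB of random data followed by
# 62 MiB of zeros, i.e. overwhelmingly compressible like a real core dump
# with large unused/uninitialized memory regions.
( dd if=/dev/urandom bs=1M count=2  status=none
  dd if=/dev/zero    bs=1M count=62 status=none ) > synthetic-core
ls -l synthetic-core
```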
I observed a similar discrepancy. I measured zstd's compression speed on a weak VM (zstd v1.4.8) that has only 4 cores and 8 GB RAM, and repeated the experiment on a more powerful machine (zstd v1.5.5), which is a few years old but has 128 GB of RAM and 64 cores. My expectation was that the latter should be faster (if only because of the more recent software version, whose release notes promise speedups).

In a small experiment, I compared 1) zstd, 2) zstd with the `--long` parameter, and 3) lrzip. Each strategy was restricted to use only 4 cores, and each was evaluated on compression speed and file size over several compression levels. The runs on v1.5.5 were about 10-15% slower than the runs using v1.4.8. That might not be as drastic as "two times slower", but it was contrary to my expectations and therefore seems suspicious to me.

An additional difference that might be relevant: version 1.4.8 was installed from the Debian repositories, whereas v1.5.5 was compiled manually.

The data to be compressed was a small set of web crawling results, where the single files are up to 4 GB in size. Unfortunately, I cannot share the files, but they are comparable to the Common Crawl web archive files.
As I said before, this is a regular core file from crashed software. It can have large areas of unused/uninitialized memory; there is nothing unusual about it. You are probably trying to reproduce the problem on your notebook with an SSD drive, but I have an old-school server with lots of regular HDDs.
In which case, I would assume the issue started happening somewhere between v1.4.8 and v1.5.5.
Thank you for reporting @Dmitri555. However, I didn't manage to reproduce the x2 slow-down reported here. My suspicions are that it's either the HDDs or NUMA, but these are just guesses. @Dmitri555, if you can run the same experiment RAM-to-RAM, without going through the HDD, it'd be helpful. Additionally, if you can make sure the process is pinned to one socket (in case there are multiple CPUs in the machine), that would allow us to rule out NUMA as well.

As for the 15% slowdown, I've spent some time debugging this, and I believe it is caused by the additional overhead introduced by AsyncIO's thread synchronization. This should only manifest in cases where the read, write and compression workloads are extremely fast, to the point where the added synchronization syscalls take a meaningful share of the runtime. Even so, it only reproduced for me on an AMD machine.

I don't think there's an easy fix here. One solution is to increase the size of our read buffers, but this could have negative results for other use-cases. The better solution would be to add an io_uring-compatible asyncio implementation, which should allow us to remove most of the overhead. We built the asyncio module with io_uring in mind, so the same API should work, but implementing and testing it would still take some work.
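The socket-pinning suggestion above can be sketched with standard Linux tools (a sketch with assumed node/CPU numbers, not part of the original thread; `numactl` comes from the numactl package, `taskset` from util-linux):

```shell
# Pin both CPU scheduling and memory allocation to NUMA node 0, so no
# compression thread pays cross-socket memory latency.
numactl --cpunodebind=0 --membind=0 ./zstd1.5.5 -4 -T32 -c core-file > /dev/null

# Alternative without numactl: restrict the process to a fixed CPU range
# (here CPUs 0-15, assumed to all sit on one socket).
taskset -c 0-15 ./zstd1.5.5 -4 -T32 -c core-file > /dev/null
```

If the slow-down disappears when pinned to one node, NUMA effects are the likely culprit; if it persists, the I/O path remains the prime suspect.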
Async I/O makes performance worse when reading files.

DISK DRIVE:

```
time zstd1.5.6 -o /dev/null -T0 -3 core-file
time zstd1.5.6 -o /dev/null --no-asyncio -T0 -3 core-file
time zstd1.4.5 -o /dev/null -T0 -3 core-file
```

TMPFS (RAM DRIVE):

```
time zstd1.5.6 -o /dev/null -T0 -3 core-file
time zstd1.5.6 --no-asyncio -o /dev/null -T0 -3 core-file
time zstd1.4.5 -o /dev/null -T0 -3 core-file
```
The new 1.5.5 version is two times slower in compression speed than the older 1.4.5 version.
New version 1.5.5:

```
START=$(date +%s); time ./zstd1.5.5 -T0 -4 -o testnew core-fastdpi_wrk0.470908.11;END=$(date +%s); echo Elapsed time $((END-START)) Sec
core-fastdpi_wrk0.470908.11 : 0.15% ( 13.1 GiB => 20.5 MiB, testnew)
real 0m7.465s
user 0m6.487s
sys 0m7.535s
Elapsed time 8 Sec
```
Old version 1.4.5:

```
START=$(date +%s); time ./zstd1.4.5 -T0 -4 -o testold core-fastdpi_wrk0.470908.11;END=$(date +%s); echo Elapsed time $((END-START)) Sec
core-fastdpi_wrk0.470908.11 : 0.15% (14083944448 => 21562128 bytes, testold)
real 0m2.975s
user 0m7.905s
sys 0m1.707s
Elapsed time 3 Sec
```
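Back-of-the-envelope throughput from the two runs above (same 14083944448-byte input, wall times 2.975 s vs 7.465 s), computed with a one-line awk script:

```shell
# Effective compression throughput = input size / wall-clock time.
awk 'BEGIN {
  size = 14083944448                              # bytes, from the v1.4.5 report line
  printf "v1.4.5: %.2f GiB/s\n", size / 2.975 / 1024^3
  printf "v1.5.5: %.2f GiB/s\n", size / 7.465 / 1024^3
}'
# prints:
# v1.4.5: 4.41 GiB/s
# v1.5.5: 1.76 GiB/s
```

Both figures are far beyond what a single HDD can deliver, which is consistent with the maintainers' suspicion that the bottleneck sits in the I/O path rather than in the compressor itself.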