Consider using LTO + PGO + Bolt #592

Open
zamazan4ik opened this issue Dec 22, 2022 · 3 comments

@zamazan4ik

zamazan4ik commented Dec 22, 2022

Did you search GitHub Issues and GitHub Discussions First?
Yes, no results.

Is your feature request related to a problem? Please describe.
Not a problem - an opportunity.

Describe the solution you'd like

DragonflyDB does not currently support building with more advanced optimization techniques such as PGO and BOLT. These tools are seeing increasing adoption as a way to further optimize programs, and there is a good chance they would yield extra performance here "for free".

I suggest at least experimenting with an LTO + PGO + BOLT pipeline (or any combination of the three) and measuring whether it improves the project's performance. If it does, it would be great to ship prebuilt binaries with these optimizations applied out of the box. It would also help users if this functionality were integrated into the build scripts, so they could tune their own binaries to their own workloads.

There are also some caveats to consider:

  • Increased build times
  • BOLT may still be unstable (or even broken) on some architectures

Links:

@romange
Collaborator

romange commented Dec 23, 2022

Thank you for suggesting this enhancement. It is definitely something we are going to explore in the future. By the way, we already use "-flto" in our release pipeline.

Having said that, based on my experience with Dragonfly, the majority of its CPU time is spent in the kernel, especially at higher throughput. Another (much smaller) part is spent around Boost.Fibers. I have yet to see a use case where Dragonfly benefits from these optimizations.

@zamazan4ik
Author

I just finished benchmarking Redis with PGO - link. I think these results can be useful for DragonflyDB too.

@zamazan4ik
Author

I did some testing of PGO applied to DragonflyDB.

Test environment

  • Fedora 38
  • Linux kernel 6.3.7
  • AMD Ryzen 9 5900X
  • 48 GiB RAM
  • SSD Samsung 980 Pro 2 TiB
  • Clang 16 (from the Fedora repositories). I use Clang just because I prefer LLVM-based tooling
  • DragonflyDB version: the latest commit on the main branch at the time of testing (commit e71fae7eea921b6396e4184cca7d26ee9960ec0e)

Tested configurations

I have tested the following DragonflyDB configurations:

  • Release (-DCMAKE_BUILD_TYPE=Release)
  • Release with PGO (-DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="-fprofile-instr-use=db.profdata")

For PGO, I use Clang's instrumentation-based options -fprofile-instr-generate / -fprofile-instr-use. The procedure is: build an instrumented version of the server, run memtier_benchmark against the instrumented DragonflyDB, collect the profile data, then rebuild DragonflyDB with the collected data.
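The cycle described above could be scripted roughly as follows. This is only a sketch under assumptions that are not in the comment: the build directory names, the `dragonfly` target name, the `PROFILE_DIR` location, and the `run_workload` hook (which should start the instrumented server, drive the benchmark, and stop the server cleanly so the raw profile is written) are all illustrative. `llvm-profdata merge` is the standard LLVM step that turns raw `.profraw` files into the `.profdata` file consumed by `-fprofile-instr-use`.

```shell
#!/bin/sh
# Sketch of a Clang instrumentation PGO cycle.
# Assumed (hypothetical) names: build-pgo-gen, build-pgo-use, PROFILE_DIR,
# the "dragonfly" target, and the user-supplied run_workload function.

PROFILE_DIR="${PROFILE_DIR:-$PWD/pgo-profiles}"
PROFDATA="${PROFDATA:-$PWD/db.profdata}"

# Print the extra compiler flags for a PGO phase: "generate" or "use <file>".
pgo_cxx_flags() {
  case "$1" in
    generate) echo "-fprofile-instr-generate" ;;
    use)      echo "-fprofile-instr-use=$2" ;;
    *)        echo "usage: pgo_cxx_flags generate|use [profdata]" >&2; return 1 ;;
  esac
}

# Configure and build a Release variant in directory $1 with extra flags $2.
build_variant() {
  cmake -S . -B "$1" -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ \
    -DCMAKE_CXX_FLAGS="$2" &&
  cmake --build "$1" --target dragonfly
}

# Full cycle: instrumented build, profiling run, merge, optimized rebuild.
pgo_cycle() {
  build_variant build-pgo-gen "$(pgo_cxx_flags generate)" || return 1

  # The instrumented binary writes its raw profile on exit; %p expands to the PID.
  mkdir -p "$PROFILE_DIR"
  export LLVM_PROFILE_FILE="$PROFILE_DIR/dragonfly-%p.profraw"
  run_workload build-pgo-gen || return 1   # hypothetical hook, defined by the user

  # Merge raw profiles into the indexed format expected by -fprofile-instr-use.
  llvm-profdata merge -output="$PROFDATA" "$PROFILE_DIR"/*.profraw || return 1

  build_variant build-pgo-use "$(pgo_cxx_flags use "$PROFDATA")"
}
```

With `run_workload` defined, calling `pgo_cycle` from the repository root would run all four steps. Separate build directories keep the instrumented and optimized binaries from overwriting each other.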

Benchmark

I run memtier_benchmark as taskset -c 1-4 memtier_benchmark --ratio 0:1 -t 4 -c 30 --distinct-client-seed -d 256 --key-maximum 1000000 --hide-histogram --pipeline 30 --test-time=300. DragonflyDB is started with taskset -c 0 dragonfly --logtostderr --proactor_threads=1. I use a single proactor thread because it gives more consistent results when DragonflyDB runs on the same machine as memtier_benchmark.
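A small wrapper along these lines could repeat the measurement and keep only the summary row of each report. It is a sketch, not the exact procedure used here: the flags are copied from the commands above (with the options written with double dashes), while the default run count, the 2-second startup wait, reusing one server instance for all runs, and stopping the server with SIGTERM are assumptions.

```shell
#!/bin/sh
# Sketch: pin the server to core 0 and the load generator to cores 1-4,
# repeat the benchmark, and print only the "Totals" row of each run.
# Assumed (not from the comment): run count, startup wait, SIGTERM shutdown.

# Start DragonflyDB in the background on core 0 and remember its PID.
start_server() {
  taskset -c 0 dragonfly --logtostderr --proactor_threads=1 &
  SERVER_PID=$!
}

# One memtier_benchmark run on cores 1-4 with the flags quoted above.
run_bench() {
  taskset -c 1-4 memtier_benchmark --ratio 0:1 -t 4 -c 30 \
    --distinct-client-seed -d 256 --key-maximum 1000000 \
    --hide-histogram --pipeline 30 --test-time=300
}

# Keep only the summary row of a memtier_benchmark report read from stdin.
totals_line() {
  grep '^Totals'
}

# Run the benchmark $1 times (default 4) against a single server instance.
benchmark_runs() {
  runs="${1:-4}"
  start_server
  sleep 2   # crude wait for the server to start listening
  i=1
  while [ "$i" -le "$runs" ]; do
    run_bench | totals_line
    i=$((i + 1))
  done
  kill -TERM "$SERVER_PID"
  wait "$SERVER_PID"
}
```

Calling `benchmark_runs 5` would print five `Totals` rows, one per run, which makes run-to-run variation easy to eyeball.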

Results

All configurations are benchmarked on the same machine with the same DragonflyDB configuration, several times each. The results are shown in memtier_benchmark format. I have rechecked: the results are consistent between runs.

Release
ALL STATS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets        55254.34          ---          ---         0.19819         0.19900         0.21500         0.24700     16397.59
Gets       552541.25      3815.65    548725.59         0.19771         0.19900         0.21500         0.23900     22488.64
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     607795.59      3815.65    548725.59         0.19775         0.19900         0.21500         0.23900     38886.23

ALL STATS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets        55053.75          ---          ---         0.19766         0.19900         0.21500         0.27900     16338.06
Gets       550535.70      3812.30    546723.41         0.19716         0.19900         0.21500         0.27900     22409.66
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     605589.45      3812.30    546723.41         0.19720         0.19900         0.21500         0.27900     38747.73

ALL STATS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets        54142.54          ---          ---         0.19961         0.19900         0.21500         0.23100     16067.64
Gets       541423.33      3712.54    537710.79         0.19906         0.19900         0.21500         0.23100     22029.47
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     595565.87      3712.54    537710.79         0.19911         0.19900         0.21500         0.23100     38097.12

ALL STATS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets        54862.17          ---          ---         0.20090         0.19900         0.22300         0.23100     16281.21
Gets       548619.54      3732.90    544886.65         0.20041         0.19900         0.22300         0.23100     22314.94
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     603481.72      3732.90    544886.65         0.20046         0.19900         0.22300         0.23100     38596.15

Release + PGO
ALL STATS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets        56478.72          ---          ---         0.19128         0.19100         0.20700         0.22300     16760.95
Gets       564785.16      4037.36    560747.81         0.19082         0.19100         0.20700         0.22300     23021.66
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     621263.89      4037.36    560747.81         0.19086         0.19100         0.20700         0.22300     39782.62

ALL STATS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets        56169.31          ---          ---         0.19353         0.19900         0.21500         0.22300     16669.13
Gets       561691.03      3970.97    557720.06         0.19313         0.19900         0.21500         0.22300     22884.35
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     617860.34      3970.97    557720.06         0.19317         0.19900         0.21500         0.22300     39553.48

ALL STATS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets        56491.59          ---          ---         0.19121         0.19100         0.20700         0.24700     16764.77
Gets       564914.02      4039.67    560874.34         0.19080         0.19100         0.20700         0.23900     23027.27
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     621405.61      4039.67    560874.34         0.19084         0.19100         0.20700         0.23900     39792.04

ALL STATS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets        57216.28          ---          ---         0.19136         0.19100         0.20700         0.22300     16979.83
Gets       572160.55      4089.20    568071.35         0.19091         0.19100         0.20700         0.22300     23322.08
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     629376.82      4089.20    568071.35         0.19095         0.19100         0.20700         0.22300     40301.91

ALL STATS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets        56998.75          ---          ---         0.19322         0.19100         0.21500         0.32700     16915.27
Gets       569985.29      4035.50    565949.80         0.19285         0.19100         0.21500         0.32700     23223.75
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     626984.04      4035.50    565949.80         0.19288         0.19100         0.21500         0.32700     40139.03
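
To put a single number on the difference, the mean "Totals" throughput of the two configurations can be compared. The sketch below copies the Totals Ops/sec values from the runs above; the function and variable names are illustrative, and with only four and five runs the result should be read as indicative rather than statistically rigorous.

```shell
#!/bin/sh
# "Totals" Ops/sec values copied from the runs listed above.
RELEASE_OPS="607795.59 605589.45 595565.87 603481.72"
PGO_OPS="621263.89 617860.34 621405.61 629376.82 626984.04"

# Print the relative change, in percent, of the mean of list $2 over the mean of list $1.
gain_percent() {
  awk -v base="$1" -v opt="$2" 'BEGIN {
    nb = split(base, b, " ")
    no = split(opt, o, " ")
    for (i = 1; i <= nb; i++) sb += b[i]
    for (i = 1; i <= no; i++) so += o[i]
    printf "%.2f\n", ((so / no) / (sb / nb) - 1) * 100
  }'
}

gain_percent "$RELEASE_OPS" "$PGO_OPS"   # prints 3.36
```

On these runs, Release + PGO averages about 3.4% more operations per second than plain Release.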

The gain may be larger on other workloads. I have not tested BOLT (llvm-bolt) yet. More PGO results for other kinds of software can be found here.
