
Evaluate using Profile-Guided Optimization (PGO) and LLVM BOLT #2192

Open
zamazan4ik opened this issue Oct 26, 2023 · 3 comments

@zamazan4ik

Feature Request

If this is a feature request, please fill out the following form in full:

Describe the problem the feature is intended to solve

According to my tests, Profile-Guided Optimization (PGO) improves performance in many projects, including network-based ones like Envoy and HAProxy. The results are available here. So I think optimizing TensorFlow Serving with PGO is at least worth trying: it could improve TensorFlow Serving's performance and reduce its CPU overhead.

Describe the solution

  • Perform PGO benchmarks on TensorFlow Serving. If they show improvements, add a note to the documentation about the possible performance gains from building TensorFlow Serving with PGO.
  • Provide an easier way (e.g. a build option) to build TensorFlow Serving with PGO, so that end users and maintainers can optimize it for their own workloads.
  • Optimize the pre-built binaries with PGO.

Testing post-link optimization techniques like LLVM BOLT would be interesting too (CPython, Clang, and Rustc already use BOLT in addition to PGO), but I recommend starting with regular PGO.
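For reference, a typical BOLT pass looks roughly like the sketch below. This assumes a Linux host with perf and the LLVM BOLT tools installed; tensorflow_model_server is a placeholder for the actual binary, and the exact optimization flags vary somewhat across BOLT versions:

```shell
# Sample the server under its normal workload, recording branch data (LBR).
perf record -e cycles:u -j any,u -o perf.data -- ./tensorflow_model_server --port=8500

# Convert the perf profile into BOLT's fdata format.
perf2bolt -p perf.data -o perf.fdata ./tensorflow_model_server

# Rewrite the binary with the profile applied.
llvm-bolt ./tensorflow_model_server -o tensorflow_model_server.bolt \
    -data=perf.fdata -reorder-blocks=ext-tsp -reorder-functions=hfsort \
    -split-functions
```

Note that BOLT needs the binary built with relocations preserved (e.g. linked with -Wl,--emit-relocs) to rewrite it safely.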

Describe alternatives you've considered

No viable alternative here.

Additional context

Here you can look at how PGO is already integrated into multiple projects:

@singhniraj08

@zamazan4ik,

We have documented a Performance Guide for TensorFlow Serving to help users get optimal model server performance.
Can you please explain in detail what needs to be done on our end to implement PGO with TensorFlow Serving? Based on that, I can take this feature implementation to the team. Thank you!

@zamazan4ik
Author

Can you please explain in detail what needs to be done on our end to implement PGO with TensorFlow Serving? Based on that, I can take this feature implementation to the team.

Sure! First, you need to integrate the PGO-specific compiler flags into your build pipeline (the flags for Clang are described here, and for GCC here; if you want to support other compilers, please consult their corresponding documentation). I recommend starting with instrumentation-based PGO, since it is generally easier to implement.
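For concreteness, the two-phase flags differ between the compilers. A minimal sketch (server.cc and the file names here are placeholders, not TF Serving's real build):

```shell
# Clang: front-end instrumentation; raw profiles are merged with llvm-profdata.
clang++ -O2 -fprofile-instr-generate server.cc -o server
./server            # run the training workload; writes default.profraw
llvm-profdata merge -output=server.profdata default.profraw
clang++ -O2 -fprofile-instr-use=server.profdata server.cc -o server

# GCC: equivalent flags; profiles (.gcda files) are consumed directly.
g++ -O2 -fprofile-generate server.cc -o server
./server            # run the training workload; writes *.gcda next to the objects
g++ -O2 -fprofile-use server.cc -o server
```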

Below I collected some examples of how PGO is integrated into the build scripts of other projects, so you can take a look at existing implementations:

After that, you need to perform the PGO training and optimization phases on your benchmarks, so you can estimate whether PGO has any positive effect on TF Serving performance (RPS, CPU usage).

This process is simple (for the Clang compiler):

  • Compile TF Serving in instrumentation mode (the -fprofile-instr-generate compiler option for Clang)
  • Run the instrumented TF Serving on the benchmark workload
  • When the run finishes, TF Serving should have generated some .profraw files
  • Merge them into a single profile with llvm-profdata
  • Recompile TF Serving with the profile generated above (-fprofile-instr-use for Clang)
  • Congratulations: you now have a PGO-optimized TF Serving binary! Run the benchmarks once more to measure the performance improvement
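Since TF Serving builds with Bazel, the steps above could be wired through --copt/--linkopt. A hedged sketch, assuming Clang is the configured toolchain; the target name comes from the TF Serving repository layout, and the exact flags may need adapting to your setup:

```shell
# Phase 1: build with instrumentation.
bazel build -c opt \
    --copt=-fprofile-instr-generate --linkopt=-fprofile-instr-generate \
    //tensorflow_serving/model_servers:tensorflow_model_server

# Run the instrumented server against the benchmark workload;
# it writes default.profraw file(s) on clean shutdown.

# Phase 2: merge the profiles and rebuild with them.
llvm-profdata merge -output=tfserving.profdata default*.profraw
bazel build -c opt \
    --copt=-fprofile-instr-use=tfserving.profdata \
    //tensorflow_serving/model_servers:tensorflow_model_server
```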

Only after that should you think about optimizing TF Serving's prebuilt binaries with some predefined, sample real-life workload. You would need to choose the sample workload, integrate profile gathering into your CI/CD pipeline, etc. The links above also offer some insights into this approach.

We have documented a Performance Guide for TensorFlow Serving to help users get optimal model server performance.

Awesome that you have such a guide! If PGO has a positive effect on TF Serving performance, I think you can extend this guide with an additional chapter about rebuilding TF Serving with PGO, or even create a dedicated page about PGO in the TF Serving documentation. Here I collected some examples of such documentation from various projects (maybe they can help you shape your PGO documentation for TF Serving):

Hope this information was helpful!

@singhniraj08

@zamazan4ik, Thank you for the detailed explanation. We will discuss this implementation internally and update this thread.

@singhniraj08 singhniraj08 removed their assignment Nov 2, 2023
@YanghuaHuang YanghuaHuang removed their assignment Nov 3, 2023