Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve /metrics endpoint performance #14453

Merged
merged 28 commits into from
May 12, 2022

Conversation

tjiuming
Copy link
Contributor

@tjiuming tjiuming commented Feb 24, 2022

Motivation

  1. Avoid more than once PrometheusMetricsGenerator#generate invoke in a period(1 minute), for the purpose of saving memory and reducing CPU usage.
  2. Persist metrics data into file and send to client by jetty's send_file. Avoid huge heap memory allocations, too much mem_copy operations, too much memory resizes, high GC pressure and high CPU usage.

Modifications

  1. Using sliding window in PrometheusMetricsGenerator#generate, invoke PrometheusMetricsGenerator#generate once in a period, cache the result and return to client directly.
  2. Persist metrics data into temp file, send it to client by send_file.
  3. Delete temp files when JVM exits.

Does this pull request potentially affect one of the following parts:

If yes was chosen, please highlight the changes

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API: (no)
  • The schema: (no)
  • The default values of configurations: (no)
  • The wire protocol: (no)
  • The rest endpoints: (no)
  • The admin cli options: (no)
  • Anything that affects deployment: (no)

Documentation

Check the box below or label this PR directly (if you have committer privilege).

Need to update docs?

  • doc-required

    (If you need help on updating docs, create a doc issue)

  • no-need-doc

    (Please explain why)

  • doc

    (If this PR contains doc changes)

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Feb 24, 2022
@tjiuming tjiuming changed the title Dev/prometheus performance Improve /metrics endpoint performance Feb 24, 2022
@tjiuming
Copy link
Contributor Author

Tests to be completed

@codelipenghui codelipenghui added this to the 2.11.0 milestone Feb 24, 2022
@tjiuming
Copy link
Contributor Author

Cancel write metrics data to file, cache buf only.

@github-actions github-actions bot added doc-required Your PR changes impact docs and you will update later. and removed doc-required Your PR changes impact docs and you will update later. labels Feb 25, 2022
Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work.
We should make this configurable. In case we have a bug and we want to disable this feature

@tjiuming
Copy link
Contributor Author

tjiuming commented Feb 26, 2022

8172CEA8-6E11-4F1E-A32B-EC4D23687220
DEB1D0CE-A277-4285-91BB-E69475FB3EEC

After and before(write out 200MB metrics data).

Copy link
Contributor

@Jason918 Jason918 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Persist metrics data into temp file, send it to client by send_file.

Do you actual use file IOs? If so, please help point it out.

Generally, it seems this is a nice optimization, but do we have any downside of this feature?

@tjiuming
Copy link
Contributor Author

tjiuming commented Feb 27, 2022

Do you actual use file IOs? If so, please help point it out.

Canceled persist data to temp files, maybe broker has a poor performance disk, so use CompositeDirectByteBuf only.

Generally, it seems this is a nice optimization, but do we have any downside of this feature?

When pulsar has to export more than 50MB metrics data, there will be cause high heap memory usage, high GC pressure and high CPU usage. The pr is for the purpose of fix it.

@Jason918

@Jason918
Copy link
Contributor

When pulsar has to export more than 50MB metrics data, there will be cause high heap memory usage, high GC pressure and high CPU usage. The pr is for the purpose of fix it.

Actually, I am just curious if there is some situation that it's better not use this feature, maybe like metric data is under 50MB?

@tjiuming
Copy link
Contributor Author

tjiuming commented Feb 27, 2022

Actually, I am just curious if there is some situation that it's better not use this feature, maybe like metric data is under 50MB?

Avoid more than once PrometheusMetricsGenerator#generate invoke in a period(1 minute), for the purpose of saving memory and reducing CPU usage. This can be configurable, I'll add a configuration if needed.

Canceled persist data to temp files, maybe broker has a poor performance disk, so use CompositeDirectByteBuf only. But this, we cannot know how much metrics data will be wrote to clients before it generated. So, it cannot be disabled.

@Jason918

@tjiuming
Copy link
Contributor Author

tjiuming commented Feb 27, 2022

Nice work. We should make this configurable. In case we have a bug and we want to disable this feature

Thanks.

Avoid more than once PrometheusMetricsGenerator#generate invoke in a period(1 minute), for the purpose of saving memory and reducing CPU usage.

This feature is a precursor. What I want is in the future we can clean all the metrics objects after metrics data generated, it could release a lot of objects and reduce heap memory usage if there are over 1 million topics in a broker.
Many metrics exported as Gauges, so we don't have to depend on rate/irate functions to aggregate these metrics. It can be reduce prometheus's pressure.

ariestp316@gmail.com added 2 commits March 18, 2022 23:41
@github-actions github-actions bot added doc Your PR contains doc changes, no matter whether the changes are in markdown or code files. and removed doc-not-needed Your PR changes do not impact docs labels Mar 18, 2022
@tjiuming
Copy link
Contributor Author

tjiuming commented Mar 19, 2022

Make cache-metrics-data configurable
@Jason918 @eolivelli Could you please help review?

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM

Can you also add the configuration entry to broker.conf and proxy.conf ?

ariestp316@gmail.com added 2 commits March 22, 2022 23:09
# Conflicts:
#	site2/docs/reference-configuration.md
@tjiuming tjiuming requested a review from eolivelli March 23, 2022 02:11
@tjiuming
Copy link
Contributor Author

@eolivelli @Jason918 hi,this PR has blocked for a long time, if there are no more change requests, could you please help approve?

# Conflicts:
#	pulsar-broker/src/main/java/org/apache/pulsar/broker/stats/prometheus/PrometheusMetricsGenerator.java
conf/proxy.conf Outdated Show resolved Hide resolved
@tjiuming tjiuming requested a review from Jason918 April 19, 2022 10:41
Copy link
Contributor

@Jason918 Jason918 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/metrics doc Your PR contains doc changes, no matter whether the changes are in markdown or code files. type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants