Skip to content

Conversation

@hangc0276
Copy link
Contributor

Fixes #xyz

Main Issue: #xyz

PIP: #xyz

Motivation

Exposing topic-level metrics costs a lot of CPU and memory for brokers and put heavy traffic pressure on Prometheus to pull metrics, especially when the Pulsar cluster has thousands of topics. To address this issue, we can disable the topic-level metrics by default and enable it when some topics encountered any issues.

However, the exposeTopicLevelMetricsInPrometheus flag is not dynamic and we need to change the broker configuration and restart all the brokers to enable or disable the flag.

We can make the exposeTopicLevelMetricsInPrometheus flag dynamic to avoid restarting all the brokers when enabling or disabling the flag

Modifications

Make the following flags dynamic

  • exposeTopicLevelMetricsInPrometheus
  • exposeConsumerLevelMetricsInPrometheus
  • exposeProducerLevelMetricsInPrometheus
  • exposeManagedLedgerMetricsInPrometheus
  • exposeManagedCursorMetricsInPrometheus
  • exposePreciseBacklogInPrometheus
  • exposeSubscriptionBacklogSizeInPrometheus

Verifying this change

  • Make sure that the change passes the CI checks.

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end deployment with large payloads (10MB)
  • Extended integration test for recovery after broker failure

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:

@github-actions github-actions bot added the doc-required Your PR changes impact docs and you will update later. label Aug 16, 2023
@hangc0276 hangc0276 self-assigned this Aug 16, 2023
@hangc0276 hangc0276 added this to the 3.2.0 milestone Aug 16, 2023
@hangc0276 hangc0276 added the type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages label Aug 16, 2023
@hangc0276 hangc0276 changed the title Make exposing topic level metric dynamic [improve] [broker] Make exposing topic level metric dynamic Aug 16, 2023
@hangc0276
Copy link
Contributor Author

@asafm Please help take a look this at this PR and it is related to metrics, thanks.

@Technoboy- Technoboy- closed this Aug 22, 2023
@Technoboy- Technoboy- reopened this Aug 22, 2023
@codecov-commenter
Copy link

codecov-commenter commented Aug 22, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 67.28%. Comparing base (d6734b7) to head (7369f7f).
Report is 518 commits behind head on master.

Additional details and impacted files

Impacted file tree graph

@@              Coverage Diff              @@
##             master   #21006       +/-   ##
=============================================
+ Coverage     36.84%   67.28%   +30.43%     
- Complexity    12195    35854    +23659     
=============================================
  Files          1698     1921      +223     
  Lines        129852   175600    +45748     
  Branches      14161    23035     +8874     
=============================================
+ Hits          47843   118148    +70305     
+ Misses        75680    47827    -27853     
- Partials       6329     9625     +3296     
Flag Coverage Δ
inttests 24.27% <28.57%> (+0.14%) ⬆️
systests 27.51% <50.00%> (+2.42%) ⬆️
unittests 66.47% <100.00%> (+34.40%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
...org/apache/pulsar/broker/ServiceConfiguration.java 98.09% <100.00%> (+0.04%) ⬆️
...n/java/org/apache/pulsar/broker/PulsarService.java 81.49% <100.00%> (+12.93%) ⬆️
...ats/prometheus/PulsarPrometheusMetricsServlet.java 100.00% <100.00%> (+22.22%) ⬆️

... and 1508 files with indirect coverage changes

@asafm
Copy link
Contributor

asafm commented Aug 30, 2023

@hangc0276 It looks ok, aside from the fact that you don't have any test to validate it and protect from even being ruined :)

@github-actions
Copy link

The pr had no activity for 30 days, mark with Stale label.

@github-actions github-actions bot added the Stale label Sep 30, 2023
Copy link
Member

@dao-jun dao-jun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Technoboy- Technoboy- modified the milestones: 3.2.0, 3.3.0 Dec 22, 2023
@hangc0276 hangc0276 closed this Mar 12, 2024
@hangc0276 hangc0276 reopened this Mar 12, 2024
@coderzc coderzc modified the milestones: 3.3.0, 3.4.0 May 8, 2024
@lhotari lhotari modified the milestones: 4.0.0, 4.1.0 Oct 11, 2024
@lhotari lhotari removed this from the 4.1.0 milestone Jan 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/metrics doc-required Your PR changes impact docs and you will update later. ready-to-test Stale type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants