Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exposing prometheus metrics for Pulsar function local run mode #10156

Merged
merged 2 commits into from
Apr 7, 2021

Conversation

jerrypeng
Copy link
Contributor

Motivation

Currently, metrics are not exposes with running functions/sources/sinks in local run mode.

Modifications

Expose metrics in prometheus format but running a metrics server and allow users to specify the port the metrics server will start on

Verifying this change

  • Make sure that the change passes the CI checks.

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end deployment with large payloads (10MB)
  • Extended integration test for recovery after broker failure

Does this pull request potentially affect one of the following parts:

If yes was chosen, please highlight the changes

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API: (yes / no)
  • The schema: (yes / no / don't know)
  • The default values of configurations: (yes / no)
  • The wire protocol: (yes / no)
  • The rest endpoints: (yes / no)
  • The admin cli options: (yes / no)
  • Anything that affects deployment: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)
  • If a feature is not applicable for documentation, explain why?
  • If a feature is not documented yet in this PR, please create a followup issue for adding the documentation

@jerrypeng jerrypeng added this to the 2.8.0 milestone Apr 7, 2021
@jerrypeng jerrypeng self-assigned this Apr 7, 2021
@jerrypeng jerrypeng merged commit 45661ea into apache:master Apr 7, 2021
Copy link
Member

@michaeljmarshall michaeljmarshall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great addition! I remember first exploring these local run connectors and thinking that metrics were missing. Hopefully we can get this cherry picked back to earlier versions too.

@@ -544,6 +552,11 @@ private void startThreadedMode(org.apache.pulsar.functions.proto.Function.Functi
if (functionConfig != null && functionConfig.getExposePulsarAdminClientEnabled() != null) {
exposePulsarAdminClientEnabled = functionConfig.getExposePulsarAdminClientEnabled();
}

// Collector Registry for prometheus metrics
CollectorRegistry collectorRegistry = new CollectorRegistry();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is just one CollectorRegistry shared among threads (assuming parallelism > 1 here), won't each metrics port serve the same metrics?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@michaeljmarshall yup thus after some thinking, I can up with a new PR:

#10208

For running instances as threads with in the local run JVM, it doesn't make sense to run a metrics server per instance. Better user experience to just run one metrics server and expose all of the metrics for all instances on the same port.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better user experience to just run one metrics server and expose all of the metrics for all instances on the same port.

I agree. Although, it looks like the new PR still has metrics at p ports where p = parallelism. I'll add a note to the new PR too, but wanted to comment here, where you mentioned a single port.

@@ -166,6 +171,8 @@ public RuntimeEnv convert(String value) {
protected String secretsProviderClassName;
@Parameter(names = "--secretsProviderConfig", description = "Whats the config for the secrets provider", hidden = true)
protected String secretsProviderConfig;
@Parameter(names = "--metricsPortStart", description = "The starting port range for metrics server", hidden = true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you help me understand why it needs to be a range and not just a single port? If it's obvious and I'm simply missing something, perhaps it'd be valuable to describe why it needs to be a range here so that users understand the behavior when using this new feature.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There needs to be a range of ports because many function instances can be run via single LocalRun process

wangjialing218 pushed a commit to wangjialing218/pulsar that referenced this pull request Apr 9, 2021
@michaeljmarshall
Copy link
Member

michaeljmarshall commented Apr 10, 2021

@jerrypeng - would you be able to take a look at my questions on this merged PR? Since we're using the same metrics registry, I don't think we need to expose metrics on multiple ports.

@jerrypeng
Copy link
Contributor Author

@michaeljmarshall yup you are right. I created a new PR:

#10208

ivankelly pushed a commit to ivankelly/pulsar that referenced this pull request Aug 10, 2021
public static void registerDefaultCollectors(CollectorRegistry registry) {
// Add the JMX exporter for functionality similar to the kafka connect JMX metrics
try {
new JmxCollector("{}").register(registry);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jerrypeng @srkukarni Isn't that line is a duplicate of all other exporters written below but with other names?

For example, JmxCollector create

# HELP java_lang_OperatingSystem_ProcessCpuTime java.lang:name=null,type=OperatingSystem,attribute=ProcessCpuTime
# TYPE java_lang_OperatingSystem_ProcessCpuTime untyped
java_lang_OperatingSystem_ProcessCpuTime 5.28804E8

and StandardExports creates:
process_cpu_seconds_total

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants