
[HUDI-6399] Warn when datadog api key is wrong#8997

Closed
parisni wants to merge 1 commit into apache:master from parisni:fix-datadog-wrong-apikey

Conversation

@parisni
Contributor

@parisni parisni commented Jun 16, 2023

Change Logs

Currently, when the Datadog API key is wrong, the job fails. We should not fail the job but log a warning instead, to avoid failing the whole pipeline.

Impact

Describe any public API or user-facing feature change or any performance impact.

Risk level (write none, low, medium or high below)

If medium or high, explain what verification was done to mitigate the risks.

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change

  • The config description must be updated if new configs are added or the default value of a config is changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
    ticket number here and follow the instruction to make
    changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@parisni
Contributor Author

parisni commented Jun 16, 2023

@xushiyan maybe?

@hudi-bot
Collaborator

CI report:

Bot commands: @hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

```diff
-    } catch (IOException e) {
-      throw new IllegalStateException("Failed to connect to Datadog to validate API key.", e);
+    } catch (IOException | IllegalStateException e) {
+      LOG.warn(String.format("Failed to connect to Datadog to validate API key. %s", e.getMessage()));
```
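The hunk above swaps a hard failure for a warning. A minimal self-contained sketch of the proposed behavior, where the class and method names and the simulated `IOException` are hypothetical stand-ins, not Hudi's actual reporter code (a real implementation would call Datadog's key-validation endpoint over HTTP):

```java
import java.io.IOException;
import java.util.logging.Logger;

public class ApiKeyValidationSketch {
  private static final Logger LOG = Logger.getLogger(ApiKeyValidationSketch.class.getName());

  // Hypothetical stand-in for the Datadog key-validation call.
  static void checkApiKey(boolean keyIsValid) throws IOException {
    if (!keyIsValid) {
      throw new IOException("403 Forbidden from Datadog validate endpoint");
    }
  }

  // Proposed behavior: catch the failure and warn instead of rethrowing,
  // so the ingestion pipeline keeps running without metrics.
  static boolean validateApiKeyLeniently(boolean keyIsValid) {
    try {
      checkApiKey(keyIsValid);
      return true;
    } catch (IOException | IllegalStateException e) {
      LOG.warning(String.format("Failed to connect to Datadog to validate API key. %s", e.getMessage()));
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(validateApiKeyLeniently(true));
    System.out.println(validateApiKeyLeniently(false));
  }
}
```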
Contributor

Should we still fail the job if the metrics collector does not work?

Contributor Author

We just should show a warning as proposed here

Contributor Author

Do you mean catch any exception? That makes sense.

Member

Should we still fail the job if the metrics collector does not work?

+1

Contributor

What I meant is, we should not change the behavior of throwing an exception here. If metric collection does not work because of the API key, it should fail the job so that the user knows and fixes it before proceeding.

Contributor

I don't understand the rationale here. Shouldn't the user know the API key is not working and fix it before running the job again, so that metrics are properly generated? It's not a good idea to silently fail here.

Contributor Author

@parisni parisni Jun 20, 2023


To me, the metrics provider is responsible for alerting the user when metrics stop working (mail alarm, on-call, ...). But the ingestion jobs should not stop working. Not having metrics is a minor problem compared to having all the company's pipelines broken because of a token renewal issue.

Also, users configure the metrics provider to alarm when no metrics arrive.

At least, I assume some users won't want their nightly jobs broken because of a token; the same would apply to any API or metrics-collection outage, which is a minor problem compared to the jobs stopping.

Currently the same applies to pushing metrics: if it does not work, it is only a warning, see

```java
try {
  MetricRegistry registry = Metrics.getInstance().getRegistry();
  HoodieGauge gauge = (HoodieGauge) registry.gauge(metricName, () -> new HoodieGauge<>(value));
  gauge.setValue(value);
} catch (Exception e) {
  // Here we catch all exceptions, so the major upsert pipeline will not be affected if the
  // metrics system has some issues.
  LOG.error("Failed to send metrics: ", e);
}
```

Contributor

Got it. I suggest having a feature flag on whether to fail the job if the metrics provider does not work. By default it's on, i.e., the job fails due to the metrics provider, the same behavior as before, while users can turn it off when metrics can be skipped.
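The flag suggested above could be sketched like this; the name `failOnMetricsError` and the method are hypothetical, since the reviewer only proposes "a feature flag", not a concrete config key:

```java
import java.io.IOException;
import java.util.logging.Logger;

public class MetricsFailureFlagSketch {
  private static final Logger LOG = Logger.getLogger(MetricsFailureFlagSketch.class.getName());

  // If failOnMetricsError is true (the proposed default), keep today's
  // behavior and fail fast; otherwise degrade to a warning so the
  // ingestion job keeps running without metrics.
  static void onValidationFailure(IOException cause, boolean failOnMetricsError) {
    if (failOnMetricsError) {
      throw new IllegalStateException("Failed to connect to Datadog to validate API key.", cause);
    }
    LOG.warning("Failed to connect to Datadog to validate API key. " + cause.getMessage());
  }
}
```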

Contributor Author

Sure, will do that

Contributor Author

https://hudi.apache.org/docs/configurations#hoodiemetricsdatadogapikeyskipvalidation
Shame on me, the option to skip validation already exists. Then this PR is unnecessary.
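For reference, the escape hatch linked above is a write config. A minimal properties snippet, with the key name taken from the linked configurations page and the default shown being an assumption:

```properties
# Skip Datadog API key validation at reporter startup so a bad key
# does not fail the job (assumed default: false, i.e. validate and fail).
hoodie.metrics.datadog.api.key.skip.validation=true
```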


Labels

area:metrics Metrics and monitoring

Projects

Status: ✅ Done

Development

Successfully merging this pull request may close these issues.

4 participants