Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Core: Introduce CompositeMetricsReporter #7337

Merged
merged 1 commit into from
Apr 21, 2023

Conversation

nastra
Copy link
Contributor

@nastra nastra commented Apr 13, 2023

This introduces a CompositeMetricsReporter that allows registering multiple (simple) MetricsReporter instances that all get notified whenever a new MetricsReport is available.

This also switches from a collection of reporters to the CompositeMetricsReporter in TableScanContext to fix an issue where duplicate reporters were not handled (see #7337 (comment) for details).

@github-actions github-actions bot added the core label Apr 13, 2023
@nastra nastra force-pushed the support-multiple-metrics-reporters branch from e08d9b6 to dbf7181 Compare April 13, 2023 10:36
@@ -448,4 +475,46 @@ public static MetricsReporter loadMetricsReporter(Map<String, String> properties

return reporter;
}

static class CompositeMetricsReporter implements MetricsReporter {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rdblue as we discussed I've made this an inner class. However, we could consider moving the class out of CatalogUtil and replace https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/TableScanContext.java#L359-L366 with a CompositeMetricsReporter.

For cases where we'd like to use multiple reporters during scans

TableScan tableScan = table
        .newScan()
        .metricsReporter(...)
        .metricsReporter(...)
        .metricsReporter(...);

we would internally instantiate a CompositeMetricsReporter in TableScanContext#reportWith(..) and then have TableScanContext#metricsReporter() return MetricsReporter rather than Collection<MetricsReporter>.

Copy link
Contributor Author

@nastra nastra Apr 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact I just realized that the current implementation of TableScanContext#reportWith() that uses Collection<MetricsReporter> actually can contain the same reporter multiple times (which I've seen in some tests where the same thing is reported twice to the log file)

multiple-logging-reporters

@nastra nastra force-pushed the support-multiple-metrics-reporters branch from dbf7181 to 7d276c5 Compare April 13, 2023 13:08
@nastra nastra requested a review from rdblue April 13, 2023 14:19
@rdblue
Copy link
Contributor

rdblue commented Apr 14, 2023

Why do we need to support multiple reporters? I don't think we need to change the behavior.

@nastra nastra force-pushed the support-multiple-metrics-reporters branch from 7d276c5 to af1440f Compare April 18, 2023 08:05
@nastra nastra changed the title Core: Support loading multiple metrics reporters Core: Introduce CompositeMetricsReporter Apr 18, 2023
@nastra nastra force-pushed the support-multiple-metrics-reporters branch 6 times, most recently from 13d7859 to 74c3615 Compare April 18, 2023 09:40
reporter -> {
try {
reporter.report(report);
} catch (Exception e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be RuntimeException since report doesn't throw a checked exception.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the reason I'm catching an Exception here is that I think we generally don't want to interfere with actual scans or commits when anything on the reporting side fails. Generally speaking, every reporter should make sure that it doesn't interefere with the scan/commit path when reporting and something fails but I'm not sure we can enforce that.
I'm a bit worried that if we only catch RuntimeException and some reporter misbehaves in an unexpected way, we'd have to go back and fix either the misbehaving reporter or this catch clause.
thoughts?

@nastra nastra force-pushed the support-multiple-metrics-reporters branch from 74c3615 to 8bb4721 Compare April 19, 2023 12:20
@github-actions github-actions bot added the API label Apr 19, 2023
@nastra nastra requested a review from rdblue April 19, 2023 16:39
@nastra nastra force-pushed the support-multiple-metrics-reporters branch 2 times, most recently from db694d4 to abd5453 Compare April 20, 2023 06:25
@nastra nastra requested a review from rdblue April 20, 2023 16:35
@rdblue
Copy link
Contributor

rdblue commented Apr 20, 2023

Looks good. Just one minor thing to fix with the sets.

This also switches from a collection of reporters to the
`CompositeMetricsReporter` in `TableScanContext` to fix an issue where
duplicate reporters were not handled.
@rdblue rdblue merged commit 74b18dd into apache:master Apr 21, 2023
34 checks passed
@nastra nastra deleted the support-multiple-metrics-reporters branch April 21, 2023 15:56
manisin pushed a commit to Snowflake-Labs/iceberg that referenced this pull request May 9, 2023
This also switches from a collection of reporters to the
`CompositeMetricsReporter` in `TableScanContext` to fix an issue where
duplicate reporters were not handled.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants