Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDDS-2476. Share more code between metadata and data scanners. #3727

Conversation

Galsza
Copy link
Contributor

@Galsza Galsza commented Aug 29, 2022

What changes were proposed in this pull request?

Clean up container scanner code so it contains less duplication. No functional change.

What is the link to the Apache JIRA

HDDS-2476: Share more code between metadata and data scanners

How was this patch tested?

No functional changes, unit/integration tests are green.

Copy link
Contributor

@errose28 errose28 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the refactor @Galsza. Just some minor comments but looks good overall. One thing I think we can clean up here as well is the scanner/scrubber mixed terminology in the classes and comments. IMO scanner is the term we should use. Scrubber terminology is already used on SCM for the pipeline scrubber, which deletes pipelines that have been allocated for too long. Since these tasks to do not delete anything I think scanner is a better name for them.

@swagle swagle requested a review from sodonnel September 6, 2022 15:51
Copy link
Contributor

@errose28 errose28 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates @Galsza. LGTM but let's give Stephen a chance to review as well.

Copy link
Contributor

@sodonnel sodonnel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@adoroszlai adoroszlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Galsza for working on this. LGTM overall. I only have a few minor comments.

Comment on lines +114 to +113
public ContainerDataScannerMetrics getMetrics() {
return this.metrics;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently metrics is stored in both parent and subclasses. We could eliminate the former by making getMetrics() abstract, and using it instead of metrics variable. It would also allow getting rid of the cast in subclass constructors.

Comment on lines 82 to 83
LOG.info("Completed an iteration of " + this.getClass().toString() +
" in {} minutes." +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actual output:

datanode_1  | 2022-09-07 15:48:00,816 [ContainerDataScanner(/data/hdds/hdds)] INFO ozoneimpl.AbstractContainerScanner: Completed an iteration of class org.apache.hadoop.ozone.container.ozoneimpl.ContainerDataScanner in 0 minutes. Number of iterations (since the data-node restart) : 13, Number of containers scanned in this iteration : 0, Number of unhealthy containers found in this iteration : 0
datanode_1  | 2022-09-07 15:48:00,831 [ContainerMetadataScanner] INFO ozoneimpl.AbstractContainerScanner: Completed an iteration of class org.apache.hadoop.ozone.container.ozoneimpl.ContainerMetadataScanner in 0 minutes. Number of iterations (since the data-node restart) : 25, Number of containers scanned in this iteration : 1, Number of unhealthy containers found in this iteration : 0

The part class org.apache.hadoop.ozone.container.ozoneimpl.ContainerDataScanner looks a bit odd in the message. I think instance name would be better than class name. And since it is already included due to thread name pattern, we might omit it.

Suggested change
LOG.info("Completed an iteration of " + this.getClass().toString() +
" in {} minutes." +
LOG.info("Completed an iteration in {} minutes" +

@Metrics(about = "DataNode container data scrubber metrics", context = "dfs")
public final class ContainerMetadataScrubberMetrics {
@Metrics(about = "Datanode container scanner metrics", context = "dfs")
public abstract class AbstractContainerScannerMetric {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this is a collection of metrics, should be plural: AbstractContainerScannerMetrics.

…t between ContainerDataScanner and ContainerMetadataScanner.
Rename scrubber to scanner.
Remove metric field from AbstractContainerScanner.
Remove warnings about raw use of parameterized class
@Galsza Galsza force-pushed the HDDS-2476_Share_more_code_between_metadata_and_data_scanners branch from 327096a to 67ab432 Compare September 8, 2022 09:28
Copy link
Contributor

@adoroszlai adoroszlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Galsza for updating the patch.

Please try to avoid force-push when updating the PR, as it, among other disadvantages, makes it harder to check for changes since last review. Here are some great articles that explain in more detail:

https://developers.mattermost.com/blog/submitting-great-prs/#4-avoid-force-pushing
https://www.freecodecamp.org/news/optimize-pull-requests-for-reviewer-happiness#request-a-review

Comment on lines 73 to 75
if (stopping) {
return;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already checked in run() right before calling runIteration(). Original code was also checking after the iteration, before logging and updating the metrics. I think the intention was to prevent incomplete iteration being logged as "Completed an iteration ...".

Therefore I think we should move it back.

Copy link
Contributor

@adoroszlai adoroszlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, pending CI.

@adoroszlai adoroszlai merged commit e84aa4c into apache:master Sep 8, 2022
@adoroszlai
Copy link
Contributor

Thanks @Galsza for the refactoring, @errose28, @sodonnel for the review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants