Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HUDI-7571] Add api to get exception details in HoodieMetadataTableValidator with ignoreFailed mode #10960

Merged
merged 2 commits into from
Apr 5, 2024

Conversation

lokeshj1703
Copy link
Contributor

Change Logs

When ignoreFailed is enabled, HoodieMetadataTableValidator ignores failure and continues the validation. This jira aims to add api to get list of exceptions and an api to check if validation exception was thrown.

Impact

NA

Risk level (write none, low medium or high below)

low

Documentation Update

NA

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@github-actions github-actions bot added the size:S PR with lines of changes in (10, 100] label Apr 3, 2024
@lokeshj1703
Copy link
Contributor Author

@the-other-tim-brown @codope Can you please take a look?

Copy link
Contributor

@the-other-tim-brown the-other-tim-brown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor nitpicks but otherwise lgtm

@@ -197,6 +200,27 @@ public HoodieMetadataTableValidator(JavaSparkContext jsc, Config cfg) {
this.taskLabels = generateValidationTaskLabels();
}

/**
* Returns list of Throwable which were encountered during validation. This method is useful
* when ignoreFailed parameter is set to true.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may also want to note you can use this option to find all issues instead of just the first one

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry! Did not get this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If i'm not wrong, I think it's about making javadoc more clear that the list of throwables is actually the list of all validation failures (MDT, RLI, etc).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that was the idea. It's used to get back all the things that may be wrong with the MDT

* Returns true if validation has failed. This method is useful when ignoreFailed
* parameter is set to true.
*/
public boolean isValidationFailed() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: method name could be hasValidationFailure() to distinguish between generic failures and validation failures?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed

@@ -180,6 +181,8 @@ public class HoodieMetadataTableValidator implements Serializable {

private final String taskLabels;

private List<Throwable> throwables = new LinkedList<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: Any reason to use linked list over array list?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed. No particular preference in this case. Generally I tend to use linked list if the size of list is not known and I don't need random access.

Copy link
Member

@codope codope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this will be useful for validation. Can you please rebase and add a test for the two public methods added in this patch? Check out TestHoodieMetadataTableValidator after rebase.

@hudi-bot
Copy link

hudi-bot commented Apr 5, 2024

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@@ -197,6 +200,27 @@ public HoodieMetadataTableValidator(JavaSparkContext jsc, Config cfg) {
this.taskLabels = generateValidationTaskLabels();
}

/**
* Returns list of Throwable which were encountered during validation. This method is useful
* when ignoreFailed parameter is set to true.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If i'm not wrong, I think it's about making javadoc more clear that the list of throwables is actually the list of all validation failures (MDT, RLI, etc).

@codope codope merged commit 19c20e4 into apache:master Apr 5, 2024
40 checks passed
@@ -438,6 +461,7 @@ private boolean doHoodieMetadataTableValidationOnce() {
if (!cfg.ignoreFailed) {
throw e;
}
throwables.add(e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where are you clearing up the throwables list?
if we are calling doHoodieMetadataTableValidationOnce repeatedly, don't we need to clear the list before we execute next time ?

yihua pushed a commit that referenced this pull request May 14, 2024
…lidator with ignoreFailed mode (#10960)

* [HUDI-7571] Add api to get exception details in HoodieMetadataTableValidator with ignoreFailed mode

* Address comments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-0.15.0 size:S PR with lines of changes in (10, 100]
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants