[HUDI-2537] Fix metadata table for flink #3774

Merged (1 commit) on Oct 10, 2021


danny0405 (Contributor)


What is the purpose of the pull request

(For example: This pull request adds quick-start document.)

Brief change log

(for example:)

  • Modify AnnotationLocation checkstyle rule in checkstyle.xml

Verify this pull request

(Please pick either of the following options)

This pull request is a trivial rework / code cleanup without any test coverage.

(or)

This pull request is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end.
  • Added HoodieClientWriteTest to verify the change.
  • Manually verified the change by running a job locally.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.


hudi-bot commented Oct 9, 2021

CI report:

  • 0653229b7b2d98c746a90d06c3a7621a7ba5f5be UNKNOWN
  • 4520df7 Azure: FAILURE

Bot commands: @hudi-bot supports the following commands:
  • @hudi-bot run travis: re-run the last Travis build
  • @hudi-bot run azure: re-run the last Azure build

danny0405 force-pushed the HUDI-2537 branch 3 times, most recently from d29e8e4 to a85e90a on October 9, 2021 11:10

danny0405 (Contributor Author)

@hudi-bot run azure

danny0405 merged commit ad63938 into apache:master on Oct 10, 2021
// Trigger compaction with suffixes based on the same instant time. This ensures that any future
// delta commits synced over will not have an instant time less than the last completed instant on the
// metadata table.
final String compactionInstantTime = latestDeltacommitTime + "001";
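The "001" suffix relies on instant times being ordered as plain strings. A minimal sketch, assuming lexicographic String.compareTo ordering (how this sketch models the timeline; the instant values are made up for illustration), shows that the suffixed compaction instant sorts after the delta commit it is derived from and still before a later delta commit:

```java
public class CompactionInstantOrdering {

  // Mirrors the snippet above: derive the compaction instant by appending "001".
  static String compactionInstantFor(String latestDeltacommitTime) {
    return latestDeltacommitTime + "001";
  }

  public static void main(String[] args) {
    String latestDeltacommit = "20211009110000"; // hypothetical delta commit time
    String laterDeltacommit = "20211009110500";  // hypothetical commit synced afterwards
    String compaction = compactionInstantFor(latestDeltacommit);

    // A longer string with an equal prefix sorts after the prefix itself.
    System.out.println(compaction.compareTo(latestDeltacommit) > 0); // true
    // A later delta commit still sorts after the compaction instant.
    System.out.println(laterDeltacommit.compareTo(compaction) > 0);  // true
  }
}
```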
Contributor

Hi @danny0405

I encountered a DateTimeParseException in this method today. This is the stack trace from running the test case TestMetadataTableWithSparkDataSource.testReadability with metrics enabled.

java.time.format.DateTimeParseException: Text '00000000000000001' could not be parsed: Invalid value for YearOfEra (valid values 1 - 999999999/1000000000): 0

	at java.time.format.DateTimeFormatter.createError(DateTimeFormatter.java:1920)
	at java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1855)
	at java.time.LocalDateTime.parse(LocalDateTime.java:492)
	at org.apache.hudi.common.table.timeline.HoodieInstantTimeGenerator.parseDateFromInstantTime(HoodieInstantTimeGenerator.java:102)
	at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.parseDateFromInstantTime(HoodieActiveTimeline.java:84)
	at org.apache.hudi.client.SparkRDDWriteClient.completeCompaction(SparkRDDWriteClient.java:322)
	at org.apache.hudi.client.SparkRDDWriteClient.completeTableService(SparkRDDWriteClient.java:461)
	at org.apache.hudi.client.SparkRDDWriteClient.compact(SparkRDDWriteClient.java:346)
...

The exception indicates an invalid date string, so I checked the code and found that it comes from here: when latestDeltacommitTime is "00000000000000", compactionInstantTime becomes "00000000000000001", which HoodieInstantTimeGenerator.parseDateFromInstantTime cannot parse as a date.
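The failure can be reproduced with plain java.time. A minimal sketch, assuming the 17-digit yyyyMMddHHmmssSSS instant format (the formatter below only approximates what HoodieInstantTimeGenerator parses; it is not Hudi's code), shows that a suffixed real commit time parses while the suffixed init timestamp is rejected:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeParseException;
import java.time.temporal.ChronoField;

public class InstantTimeParsing {

  // Assumed millisecond instant format. Milliseconds are appended as a fixed-width
  // field so that adjacent-value parsing of the unseparated digits works on Java 8.
  static final DateTimeFormatter INSTANT_FORMAT = new DateTimeFormatterBuilder()
      .appendPattern("yyyyMMddHHmmss")
      .appendValue(ChronoField.MILLI_OF_SECOND, 3)
      .toFormatter();

  // Returns true if the instant string resolves to a real date-time.
  static boolean isParsable(String instantTime) {
    try {
      LocalDateTime.parse(instantTime, INSTANT_FORMAT);
      return true;
    } catch (DateTimeParseException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(isParsable("20211009110000" + "001")); // true
    System.out.println(isParsable("00000000000000" + "001")); // false: year-of-era 0 is invalid
  }
}
```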

As I understand it, the value "00000000000000" in latestDeltacommitTime comes from the metadata table timeline, where it is the initial timestamp of the metadata table.

[Screenshot, 2022-08-22 11:40:03]

Would disabling Hudi metrics in metadataWriteConfig by default be a good solution, or do you have a better idea?

I found a similar issue earlier in PR #6000.

danny0405 (Contributor Author), Aug 22, 2022

I'm afraid this is a bug. Can you submit a fix here? For example, disable/skip sending metrics when the instant time starts with 00000000000000.
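The suggested workaround amounts to a prefix check before metrics are emitted. A minimal sketch; the class and the shouldReportMetrics name are hypothetical and not part of Hudi's API, and the init timestamp value is the one quoted in this thread:

```java
public class MetadataMetricsGuard {

  // Init instant of the metadata table timeline, as quoted in this thread.
  static final String INIT_INSTANT_TS = "00000000000000";

  // Hypothetical guard: report commit metrics only for instants that carry a real date.
  static boolean shouldReportMetrics(String instantTime) {
    return !instantTime.startsWith(INIT_INSTANT_TS);
  }

  public static void main(String[] args) {
    System.out.println(shouldReportMetrics("00000000000000001")); // false: skip metrics
    System.out.println(shouldReportMetrics("20211009110000001")); // true: report metrics
  }
}
```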

Contributor

OK, got it. I'll fix it here.

What do you think about using HoodieTimeline.METADATA_BOOTSTRAP_INSTANT_TS (value "00000000000001") as the compaction instant time when latestDeltacommitTime equals "00000000000000"? As of PR #6000, HoodieTimeline.METADATA_BOOTSTRAP_INSTANT_TS is already handled correctly.

For example:

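final String compactionInstantTime;
if (latestDeltacommitTime.equals(HoodieTimeline.INIT_INSTANT_TS)) {
  // "00000000000000" + "001" is not a parseable date, so use the bootstrap instant instead.
  compactionInstantTime = HoodieTimeline.METADATA_BOOTSTRAP_INSTANT_TS;
} else {
  compactionInstantTime = latestDeltacommitTime + "001";
}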
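A standalone sketch of the proposed selection logic. The two HoodieTimeline constants are replaced by local copies of the values quoted in this thread, an assumption made only so the example compiles without Hudi on the classpath:

```java
public class CompactionInstantTimeFix {

  // Local copies of the values quoted above for HoodieTimeline.INIT_INSTANT_TS
  // and HoodieTimeline.METADATA_BOOTSTRAP_INSTANT_TS.
  static final String INIT_INSTANT_TS = "00000000000000";
  static final String METADATA_BOOTSTRAP_INSTANT_TS = "00000000000001";

  // Picks the compaction instant as proposed: never suffix the init timestamp.
  static String compactionInstantTime(String latestDeltacommitTime) {
    if (latestDeltacommitTime.equals(INIT_INSTANT_TS)) {
      return METADATA_BOOTSTRAP_INSTANT_TS;
    }
    return latestDeltacommitTime + "001";
  }

  public static void main(String[] args) {
    System.out.println(compactionInstantTime("00000000000000")); // 00000000000001
    System.out.println(compactionInstantTime("20211009110000")); // 20211009110000001
  }
}
```

Returning early keeps the normal path identical to the original one-liner, so only the init-timestamp case changes behavior.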

Contributor Author

+1 for this approach.

Labels: None yet
Projects: None yet
Participants: 3