[HUDI-2537] Fix metadata table for flink #3774
Branch updated from d29e8e4 to a85e90a.
@hudi-bot run azure
// Trigger compaction with suffixes based on the same instant time. This ensures that any future
// delta commits synced over will not have an instant time lesser than the last completed instant on the
// metadata table.
final String compactionInstantTime = latestDeltacommitTime + "001";
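A minimal sketch of why the "001" suffix works, assuming instant times are ordered by plain string comparison (the class name and sample timestamps are hypothetical). The 17-digit compaction instant sorts after the 14-digit delta commit it is derived from, but before any delta commit from a later second:

```java
public class CompactionInstantOrdering {

    // Mirrors the quoted diff: a 14-digit (second-granularity) delta commit time
    // plus "001" yields a 17-digit (millisecond-granularity) instant.
    static String compactionInstantFor(String latestDeltacommitTime) {
        return latestDeltacommitTime + "001";
    }

    public static void main(String[] args) {
        String latestDeltaCommit = "20211010120000"; // hypothetical instant
        String nextDeltaCommit = "20211010120001";   // hypothetical later instant
        String compaction = compactionInstantFor(latestDeltaCommit);

        // Assumption: timeline instants are compared lexicographically as strings.
        System.out.println(compaction);                                  // 20211010120000001
        System.out.println(compaction.compareTo(latestDeltaCommit) > 0); // true
        System.out.println(compaction.compareTo(nextDeltaCommit) < 0);   // true
    }
}
```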
Hi @danny0405,
I encountered a DateTimeParseException in this method today. Here is the stack trace from running the test case TestMetadataTableWithSparkDataSource.testReadability with metrics enabled:
java.time.format.DateTimeParseException: Text '00000000000000001' could not be parsed: Invalid value for YearOfEra (valid values 1 - 999999999/1000000000): 0
at java.time.format.DateTimeFormatter.createError(DateTimeFormatter.java:1920)
at java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1855)
at java.time.LocalDateTime.parse(LocalDateTime.java:492)
at org.apache.hudi.common.table.timeline.HoodieInstantTimeGenerator.parseDateFromInstantTime(HoodieInstantTimeGenerator.java:102)
at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.parseDateFromInstantTime(HoodieActiveTimeline.java:84)
at org.apache.hudi.client.SparkRDDWriteClient.completeCompaction(SparkRDDWriteClient.java:322)
at org.apache.hudi.client.SparkRDDWriteClient.completeTableService(SparkRDDWriteClient.java:461)
at org.apache.hudi.client.SparkRDDWriteClient.compact(SparkRDDWriteClient.java:346)
...
The exception indicates an invalid date string. Checking the code, I found that it originates here: when latestDeltacommitTime is "00000000000000", compactionInstantTime becomes "00000000000000001", which HoodieInstantTimeGenerator.parseDateFromInstantTime cannot parse as a date.
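The failure can be reproduced with plain java.time. This is a minimal sketch assuming a 17-digit yyyyMMddHHmmssSSS millisecond instant format; the class name and formatter are illustrative and are not Hudi's actual HoodieInstantTimeGenerator code. A year of 0000 is rejected because year-of-era must be at least 1:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeParseException;
import java.time.temporal.ChronoField;

public class InstantParseRepro {

    // Assumed millisecond instant format. The millisecond field is appended
    // separately so adjacent-value parsing also works on Java 8.
    static final DateTimeFormatter MILLIS_INSTANT_FORMATTER = new DateTimeFormatterBuilder()
            .appendPattern("yyyyMMddHHmmss")
            .appendValue(ChronoField.MILLI_OF_SECOND, 3)
            .toFormatter();

    // Returns true if the instant time parses into a LocalDateTime.
    static boolean isParseable(String instantTime) {
        try {
            LocalDateTime.parse(instantTime, MILLIS_INSTANT_FORMATTER);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String initInstant = "00000000000000";          // metadata table init timestamp
        String compactionInstant = initInstant + "001"; // "00000000000000001"
        System.out.println(isParseable("20211010120000001")); // true
        System.out.println(isParseable(compactionInstant));   // false
    }
}
```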
As I understand it, the value "00000000000000" in latestDeltacommitTime comes from the metadata table timeline, where it is the initial timestamp.
Would it be a good solution to disable Hudi metrics in metadataWriteConfig by default? Or do you have a better idea?
I found a similar issue earlier in PR #6000.
I'm afraid this is a bug. Can you submit a fix here? For example, disable/skip sending the metrics when the instant time starts with 00000000000000.
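The suggested guard could look roughly like this. It is a sketch only: the class, constant, and shouldEmitCompactionMetrics helper are hypothetical and do not exist in Hudi; the constant simply mirrors the init timestamp value quoted in this thread:

```java
public class CompactionMetricsGuard {

    // Hypothetical local copy of the metadata table init timestamp.
    static final String INIT_INSTANT_TS = "00000000000000";

    // Hypothetical guard: skip metrics for instants derived from the init
    // timestamp, since they cannot be parsed into a date.
    static boolean shouldEmitCompactionMetrics(String compactionInstantTime) {
        return !compactionInstantTime.startsWith(INIT_INSTANT_TS);
    }

    public static void main(String[] args) {
        System.out.println(shouldEmitCompactionMetrics("00000000000000001")); // false
        System.out.println(shouldEmitCompactionMetrics("20211010120000001")); // true
    }
}
```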
OK, got it, I'll fix it here.
What do you think about using HoodieTimeline.METADATA_BOOTSTRAP_INSTANT_TS (value "00000000000001") as the compaction instant time when latestDeltacommitTime equals "00000000000000"? PR #6000 already handles HoodieTimeline.METADATA_BOOTSTRAP_INSTANT_TS correctly.
For example:
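final String compactionInstantTime;
if (latestDeltacommitTime.equals(HoodieTimeline.INIT_INSTANT_TS)) {
  // "00000000000000" + "001" is not a parseable date, so fall back to the bootstrap instant
  compactionInstantTime = HoodieTimeline.METADATA_BOOTSTRAP_INSTANT_TS;
} else {
  compactionInstantTime = latestDeltacommitTime + "001";
}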
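A self-contained sketch of the proposed branch, using local copies of the two HoodieTimeline constants with the values quoted in this thread (the class and method names are hypothetical). Assuming plain string comparison of instant times, it also checks that the fallback instant still sorts after the init instant, so the ordering guarantee from the original code comment is preserved:

```java
public class CompactionInstantSelection {

    // Local stand-ins for HoodieTimeline.INIT_INSTANT_TS and
    // HoodieTimeline.METADATA_BOOTSTRAP_INSTANT_TS (values as quoted above).
    static final String INIT_INSTANT_TS = "00000000000000";
    static final String METADATA_BOOTSTRAP_INSTANT_TS = "00000000000001";

    static String compactionInstantFor(String latestDeltacommitTime) {
        if (latestDeltacommitTime.equals(INIT_INSTANT_TS)) {
            // Reuse the bootstrap instant instead of the unparseable "...001" suffix.
            return METADATA_BOOTSTRAP_INSTANT_TS;
        }
        // Normal case: millisecond suffix on the latest delta commit time.
        return latestDeltacommitTime + "001";
    }

    public static void main(String[] args) {
        String fromInit = compactionInstantFor(INIT_INSTANT_TS);
        System.out.println(fromInit);                                // 00000000000001
        System.out.println(fromInit.compareTo(INIT_INSTANT_TS) > 0); // true
        System.out.println(compactionInstantFor("20211010120000"));  // 20211010120000001
    }
}
```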
+1 for this approach.