Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HUDI-5095] Flink: Stores a special watermark(flag) to identify the c… #7099

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

JerryYue-M
Copy link
Contributor

@JerryYue-M JerryYue-M commented Oct 31, 2022

Change Logs

Fixed #7098 Hold a event time instance in each write task to evaluate the progress for write data

Impact

No Impact ,User can disable it use config

Risk level (write none, low medium or high below)

low

Documentation Update

No

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@XuQianJin-Stars XuQianJin-Stars self-assigned this Nov 1, 2022
return extractTimestamp((HoodieAvroRecord) value);
}
return extractTimestamp((RowData) value);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move these logic into AbstractStreamWriteFunction.java.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason why put it here is BulkInsertWriteFunction only extends BulkInsertWriteFunction instead of AbstractStreamWriteFunction, if moving this code to AbstractStreamWriteFunction will make a specified way to deal with BulkInsertWriteFunction

@nsivabalan nsivabalan added flink Issues related to flink priority:major degraded perf; unable to move forward; potential bugs labels Nov 7, 2022
@JerryYue-M
Copy link
Contributor Author

@XuQianJin-Stars what do you think about this way to describe the progress of the streaming writer

@XuQianJin-Stars
Copy link
Contributor

@XuQianJin-Stars what do you think about this way to describe the progress of the streaming writer

+1, This is also the implementation method used in our internal production.

@yuzhaojing
Copy link
Contributor

Please fix checkstyle to pass ci.

@JerryYue-M
Copy link
Contributor Author

@yuzhaojing thanks for the review, I will fix the issue you mentioned

@codope
Copy link
Member

codope commented Nov 29, 2022

@JerryYue-M Gentle ping to fix the checkstyle issues. This is very close to merging.

@JerryYue-M
Copy link
Contributor Author

@codope @danny0405 @yuzhaojing @XuQianJin-Stars
all the discussion and check style issues had been fixed and CI passed already

@JerryYue-M
Copy link
Contributor Author

@hudi-bot run azure

3 similar comments
@JerryYue-M
Copy link
Contributor Author

@hudi-bot run azure

@JerryYue-M
Copy link
Contributor Author

@hudi-bot run azure

@XuQianJin-Stars
Copy link
Contributor

@hudi-bot run azure

Copy link
Contributor

@danny0405 danny0405 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we fetch the event time through the payload clazz event time field we the HoodieRecordPayload#getInsertValue is invoked ? And we can finalize the max timestamp
in the coordinator when commit the metadata.

@JerryYue-M
Copy link
Contributor Author

Can we fetch the event time through the payload clazz event time field we the HoodieRecordPayload#getInsertValue is invoked ? And we can finalize the max timestamp in the coordinator when commit the metadata.
@danny0405 thanks for your suggestion, it's import for me
the reason why i think we need implement in this way is :

  1. I think this is a lightweight and common behavior, no matter what type of payload it is used. if we implement this base on payload, it may be limited by payload class between flink and spark/presto eg.
  2. for later we may need to add some metric for the flink write function eg: processing latency metric , and also need to extract the timestamp from the record
  3. implement base payload may change some code in common module such as modify the HoodieAppendHandler or HoodieMergeHandler

@hudi-bot
Copy link

hudi-bot commented Dec 6, 2022

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@xushiyan xushiyan added priority:critical production down; pipelines stalled; Need help asap. and removed priority:major degraded perf; unable to move forward; potential bugs labels Mar 1, 2023
@github-actions github-actions bot added the size:L PR with lines of changes in (300, 1000] label Feb 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flink Issues related to flink priority:critical production down; pipelines stalled; Need help asap. size:L PR with lines of changes in (300, 1000]
Projects
Status: 🏗 Under discussion
Development

Successfully merging this pull request may close these issues.

None yet

8 participants