[HUDI-5865] Fix table service client to instantiate with timeline server #8080
@yuzhaojing @xushiyan The changes to the write client were made when the new table service client was introduced. Before that, as I understand it, the inline table services running alongside the regular write client shared the same timeline server. So I think that with the new table service client we should still follow the same convention. Is there anything I'm missing? When the table service manager is used, how do the timeline server and the table service manager interact? cc @nsivabalan Before we fully agree on the approach here, let's not merge this PR. Also, after the discussion, I'd like to add some tests to guard the expected behavior.
...nt/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieTableServiceClient.java
@yihua @danny0405 @xushiyan I'm sorry for this serious bug. I think the table service client should share the same timeline server as the regular write client. Here I think the following tests can be added to the table service client:
I'd like to hear your thoughts, and I apologize again!
Yeah, we need some basic UT for the service client.
Force-pushed: 7bbcd89 to cef6b97
@yuzhaojing Thanks for the suggestion.
I added three tests to make sure that the write config is not modified if a timeline server instance is passed in, and that the write client and the corresponding table service client use the same timeline server.
These are already covered by existing table service tests, so I feel we don't have to test the table service client independently. If new tests are really needed, you can put up a separate PR.
…ver (apache#8080) - Fixing singleton instance of timeline server usage between write client and table service client.
…ver (apache#8080) - Fixing singleton instance of timeline server usage between write client and table service client. # Conflicts: # hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/TestSparkRDDWriteClient.java
Change Logs
In 0.13.0 and the latest master, the table service client BaseHoodieTableServiceClient is instantiated without any timeline server instance, even if the regular write client has one. As a result, the table service client starts a new embedded timeline server and overwrites the write config passed in through the constructor so that the write config points to the newly started timeline server. The issue was introduced by #6732, which added the Hudi table service manager.
Because a regular write client such as SparkRDDWriteClient passes in the same writeConfig instance directly, the regular write client's write config is also affected. The regular write client therefore always uses the newly started embedded timeline server, instead of the timeline server instance passed in through the constructor or the one instantiated by the regular write client itself.
This means that the Deltastreamer's long-lived timeline server is never used because of this issue.
Note that this issue does not affect correctness. Nevertheless, because of it at least one additional timeline server is instantiated that is never used during the write transaction, wasting compute and memory resources.
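A minimal, self-contained sketch of the mechanism described above. It uses toy classes (ToyTimelineServer, ToyWriteConfig, ToyTableServiceClient, ToyWriteClient) that are all hypothetical and are not the actual Hudi classes. It only illustrates how a shared, mutable config can end up pointing at a server other than the one passed in:

```java
import java.util.Optional;

// Hypothetical toy model of the issue; none of these types are Hudi classes.
class SharedConfigOverwriteDemo {

    static final class ToyTimelineServer {
        private static int nextPort = 26754; // arbitrary starting port for the sketch
        final int port;
        ToyTimelineServer() { this.port = nextPort++; }
    }

    // Mutable on purpose: the issue relies on the config being shared and rewritten.
    static final class ToyWriteConfig {
        int timelineServerPort = -1;
    }

    static final class ToyTableServiceClient {
        final ToyTimelineServer server;

        // No server is passed in, so a new one is always started and the
        // shared config is redirected to it.
        ToyTableServiceClient(ToyWriteConfig config) {
            this.server = new ToyTimelineServer();
            config.timelineServerPort = server.port;
        }
    }

    static final class ToyWriteClient {
        final ToyWriteConfig config;
        final ToyTimelineServer server;
        final ToyTableServiceClient tableServiceClient;

        ToyWriteClient(ToyWriteConfig config, Optional<ToyTimelineServer> existing) {
            this.config = config;
            // Use the server passed in, or start one if none was provided.
            this.server = existing.orElseGet(ToyTimelineServer::new);
            config.timelineServerPort = server.port;
            // The same config instance is handed over, so the overwrite done by
            // the table service client is visible to the write client as well.
            this.tableServiceClient = new ToyTableServiceClient(config);
        }
    }

    public static void main(String[] args) {
        ToyTimelineServer longLived = new ToyTimelineServer();
        ToyWriteClient client = new ToyWriteClient(new ToyWriteConfig(), Optional.of(longLived));
        System.out.println("long-lived server port: " + longLived.port);
        System.out.println("port in write config:   " + client.config.timelineServerPort);
    }
}
```

In this toy model the two printed ports differ, which mirrors the write config no longer pointing at the server that was passed in.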
This PR fixes the issue by instantiating the table service client with the timeline server from the regular write client.
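The idea can be sketched with hypothetical toy classes (not the Hudi API, and not the code in this PR): the write client hands its timeline server to the table service client, which starts its own server only when none is provided and otherwise leaves the config untouched.

```java
import java.util.Optional;

// Hypothetical sketch of the sharing approach; none of these types are Hudi classes.
class TimelineServerReuseSketch {

    static final class ToyTimelineServer {
        private static int started = 0; // counts servers started in this JVM
        final int id;
        ToyTimelineServer() { this.id = ++started; }
        static int startedCount() { return started; }
    }

    static final class ToyWriteConfig {
        int timelineServerId = -1;
    }

    static final class ToyTableServiceClient {
        final ToyTimelineServer server;

        ToyTableServiceClient(ToyWriteConfig config, Optional<ToyTimelineServer> shared) {
            if (shared.isPresent()) {
                // Reuse the caller's server and do not touch the config.
                this.server = shared.get();
            } else {
                // Only start a server (and point the config at it) when none is given.
                this.server = new ToyTimelineServer();
                config.timelineServerId = server.id;
            }
        }
    }

    static final class ToyWriteClient {
        final ToyWriteConfig config;
        final ToyTimelineServer server;
        final ToyTableServiceClient tableServiceClient;

        ToyWriteClient(ToyWriteConfig config, Optional<ToyTimelineServer> existing) {
            this.config = config;
            this.server = existing.orElseGet(ToyTimelineServer::new);
            config.timelineServerId = server.id;
            // Pass the write client's own server down so both clients share it.
            this.tableServiceClient = new ToyTableServiceClient(config, Optional.of(server));
        }
    }

    public static void main(String[] args) {
        ToyTimelineServer longLived = new ToyTimelineServer();
        ToyWriteClient client = new ToyWriteClient(new ToyWriteConfig(), Optional.of(longLived));
        System.out.println("same server shared: " + (client.tableServiceClient.server == longLived));
        System.out.println("config unchanged:   " + (client.config.timelineServerId == longLived.id));
    }
}
```

The two checks in main correspond to the properties discussed in the conversation: the write config is not modified when a server is passed in, and both clients use the same server instance.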
Three new tests are added to make sure that the write client and the table service client use the expected timeline server:
TestSparkRDDWriteClient#testWriteClientAndTableServiceClientWithTimelineServer
TestFlinkWriteClient#testWriteClientAndTableServiceClientWithTimelineServer
TestHoodieJavaWriteClientInsert#testWriteClientAndTableServiceClientWithTimelineServer
Impact
This makes sure that the timeline server instance passed to the regular write client is used by the write transaction.
The new tests added guard the behavior:
TestSparkRDDWriteClient#testWriteClientAndTableServiceClientWithTimelineServer
TestFlinkWriteClient#testWriteClientAndTableServiceClientWithTimelineServer
TestHoodieJavaWriteClientInsert#testWriteClientAndTableServiceClientWithTimelineServer
Before this fix, these tests fail:
With this fix, all tests pass:
Risk level
low
Documentation Update
N/A
Contributor's checklist