Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HUDI-5865] Fix table service client to instantiate with timeline server #8080

Conversation

yihua
Copy link
Contributor

@yihua yihua commented Mar 1, 2023

Change Logs

In 0.13.0 and latest master, the table service client BaseHoodieTableServiceClient is instantiated without any timeline server instance, even if the regular write client has one. This causes the table service client to start a new embedded timeline server and overwrite the write config passed in from the constructor so that the write config points to the newly started timeline server. The issue is introduced by #6732, adding the Hudi table service manager.

As the regular write client such as SparkRDDWriteClient directly passes in the same writeConfig instance, the regular write client's write config is also affected, causing the regular write client to use the newly started embedded timeline server always, instead of the timeline server instance passed in from the constructor or the one instantiated by the regular write client itself.

This means that the Deltastreamer's long-lived timeline server is never going to be used because of this issue.

Note that, this issue does not cause a correctness issue; nevertheless, at least one additional timeline server is instantiated because of the issue, which is never used during the write transaction, wasting compute and memory resources.

This PR fixes the issue by properly initiating the table service client with the timeline server from the regular write client.

Three new tests are added to make sure the timeline server used by the write client and the table service client is expected:

  • TestSparkRDDWriteClient#testWriteClientAndTableServiceClientWithTimelineServer
  • TestFlinkWriteClient#testWriteClientAndTableServiceClientWithTimelineServer
  • TestHoodieJavaWriteClientInsert#testWriteClientAndTableServiceClientWithTimelineServer

Impact

This makes sure that the timeline server instance passed to the regular write client is used by the write transaction.

The new tests added guard the behavior:

  • TestSparkRDDWriteClient#testWriteClientAndTableServiceClientWithTimelineServer
  • TestFlinkWriteClient#testWriteClientAndTableServiceClientWithTimelineServer
  • TestHoodieJavaWriteClientInsert#testWriteClientAndTableServiceClientWithTimelineServer

Before this fix, these tests fail:
Screenshot 2023-03-17 at 23 11 05
Screenshot 2023-03-17 at 23 35 14
Screenshot 2023-03-17 at 23 49 26

With this fix, all tests pass:
Screenshot 2023-03-17 at 23 11 43
Screenshot 2023-03-17 at 23 35 34
Screenshot 2023-03-17 at 23 49 02

Risk level

low

Documentation Update

N/A

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@yihua yihua added priority:blocker writer-core Issues relating to core transactions/write actions table-service labels Mar 1, 2023
@yihua
Copy link
Contributor Author

yihua commented Mar 1, 2023

@yuzhaojing @xushiyan The changes to the write client are done when introducing the new table service client. Before that, based on my understanding, the inline table services running along with the regular write client share the same timeline server. So I think with the new table service client, we should still follow the same convention. Is there anything I miss? When the table service manager is used, how's the interplay between the timeline server and the table service manager?

cc @nsivabalan

Before we fully agree on the approach here, let's not merge this PR. Also, I'd like to add some tests to guard around the expected behavior, after the discussion.

@yuzhaojing
Copy link
Contributor

@yuzhaojing @xushiyan The changes to the write client are done when introducing the new table service client. Before that, based on my understanding, the inline table services running along with the regular write client share the same timeline server. So I think with the new table service client, we should still follow the same convention. Is there anything I miss? When the table service manager is used, how's the interplay between the timeline server and the table service manager?

cc @nsivabalan

Before we fully agree on the approach here, let's not merge this PR. Also, I'd like to add some tests to guard around the expected behavior, after the discussion.

@yihua @danny0405 @xushiyan I'm sorry for this serious bug. I think the table service client should share the same timeline server as the regular write client. Here I think the following tests can be added to the table service client:

  1. Add unit tests to confirm that the table service client has not made unexpected modifications to writeConfig.
  2. Confirm that the table service of the table service client is scheduled and executed normally.
  3. Call correctly after starting the managed service.

Want to hear your thoughts and apologize again!

@danny0405
Copy link
Contributor

@yuzhaojing @xushiyan The changes to the write client are done when introducing the new table service client. Before that, based on my understanding, the inline table services running along with the regular write client share the same timeline server. So I think with the new table service client, we should still follow the same convention. Is there anything I miss? When the table service manager is used, how's the interplay between the timeline server and the table service manager?
cc @nsivabalan
Before we fully agree on the approach here, let's not merge this PR. Also, I'd like to add some tests to guard around the expected behavior, after the discussion.

@yihua @danny0405 @xushiyan I'm sorry for this serious bug. I think the table service client should share the same timeline server as the regular write client. Here I think the following tests can be added to the table service client:

  1. Add unit tests to confirm that the table service client has not made unexpected modifications to writeConfig.
  2. Confirm that the table service of the table service client is scheduled and executed normally.
  3. Call correctly after starting the managed service.

Want to hear your thoughts and apologize again!

Yeah, we need some basic UT for the service client.

@yihua yihua force-pushed the HUDI-5865-fix-table-service-client-timeline-server branch from 7bbcd89 to cef6b97 Compare March 17, 2023 22:56
@yihua
Copy link
Contributor Author

yihua commented Mar 18, 2023

@yuzhaojing Thanks for the suggestion.

  1. Add unit tests to confirm that the table service client has not made unexpected modifications to writeConfig.

I added three tests to make sure that the write config is not modified if the timeline server instance is passed in and the timeline server used by the write client and corresponding table service client is the same.

  1. Confirm that the table service of the table service client is scheduled and executed normally.
  2. Call correctly after starting the managed service.

These are already covered by existing table service tests, so I feel we don't have to test the table service client independently. If new tests are really needed, you can put up a separate PR.

@hudi-bot
Copy link

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@nsivabalan nsivabalan merged commit 344df73 into apache:master Mar 18, 2023
nsivabalan pushed a commit to nsivabalan/hudi that referenced this pull request Mar 18, 2023
…ver (apache#8080)

- Fixing singleton instance of timeline server usage between write client and table service client.
fengjian428 pushed a commit to fengjian428/hudi that referenced this pull request Apr 5, 2023
…ver (apache#8080)

- Fixing singleton instance of timeline server usage between write client and table service client.
h1ap pushed a commit to h1ap/hudi that referenced this pull request Apr 12, 2023
…ver (apache#8080)

- Fixing singleton instance of timeline server usage between write client and table service client.

# Conflicts:
#	hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/TestSparkRDDWriteClient.java
stayrascal pushed a commit to stayrascal/hudi that referenced this pull request Apr 20, 2023
…ver (apache#8080)

- Fixing singleton instance of timeline server usage between write client and table service client.
KnightChess pushed a commit to KnightChess/hudi that referenced this pull request Jan 2, 2024
…ver (apache#8080)

- Fixing singleton instance of timeline server usage between write client and table service client.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority:blocker table-service writer-core Issues relating to core transactions/write actions
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

None yet

6 participants