Skip to content

Comments

[HUDI-2733] Add support for Thrift sync#4665

Open
stym06 wants to merge 3 commits intoapache:masterfrom
stym06:HUDI-2733
Open

[HUDI-2733] Add support for Thrift sync#4665
stym06 wants to merge 3 commits intoapache:masterfrom
stym06:HUDI-2733

Conversation

@stym06
Copy link
Contributor

@stym06 stym06 commented Jan 21, 2022

What is the purpose of the pull request

This PR adds support to sync Hudi metadata with Hive metastore using its Thrift client

Verify this pull request

(Please pick either of the following options)

This pull request is a trivial rework / code cleanup without any test coverage.

(or)

This pull request is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end.
  • Added HoodieClientWriteTest to verify the change.
  • Manually verified the change by running a job locally.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@stym06 stym06 changed the title Add support for Thrift sync [HUDI-2733] Add support for Thrift sync Jan 21, 2022
@stym06
Copy link
Contributor Author

stym06 commented Jan 24, 2022

@nsivabalan can someone see if this looks good?

@nsivabalan nsivabalan self-assigned this Jan 24, 2022
@hudi-bot
Copy link
Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

Copy link
Contributor

@nsivabalan nsivabalan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you confirm that you have tested this new hive sync mode and its working fine for all methods.

public class ThriftDDLExecutor implements DDLExecutor {

private static final Logger LOG = LogManager.getLogger(ThriftDDLExecutor.class);
private final HMSClient client;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we name this HMSThriftClient. We already have hms mode and calling this HMSClient and not being used with "hms" mode doesn't sit well.

}

@Override
public void close() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this class for the most part has overlap with HMSDDLExecutor. Can we create a base class and inherit in both executors. you can introduce abstract methods just for the client specific calls. lets try to reuse code as much as possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey @nsivabalan can you suggest how I should be restructuring the classes? Thanks.


String principal = conf.get(PRINCIPAL_KEY);
// TODO fixme
// String principal = null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we remove any residues from draft work.

* @throws IOException if fails connecting to metastore
* @throws InterruptedException if interrupted during kerberos setup
*/
private void getClient(@Nullable URI uri)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we fix method name. having "get" in the name, but method does not return anything.


if (useSasl) {
LOG.debug("Using SASL authentication");
HadoopThriftAuthBridge.Client authBridge =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have good knowledge on these. But I assume you have tested this and hence gonna skim through.

* Helper utilities. The Util class is just a placeholder for static methods,
* it should be never instantiated.
*/
public final class Util {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may be HiveSyncThriftModeUtils

@xushiyan xushiyan linked an issue Jan 31, 2022 that may be closed by this pull request
@nsivabalan
Copy link
Contributor

@stym06 : let me know once the feedback is addressed.

@nsivabalan
Copy link
Contributor

@stym06 : gentle ping.

@stym06
Copy link
Contributor Author

stym06 commented Mar 16, 2022

@nsivabalan apologies for the delayed response. let me check the comments.

@nsivabalan nsivabalan added the priority:high Significant impact; potential bugs label Mar 16, 2022
@xushiyan
Copy link
Member

@stym06 any chance you can rebase and update this PR?

@xushiyan xushiyan self-assigned this Oct 31, 2022
@xushiyan xushiyan added the component:catalog-sync Catalog-sync related label Oct 31, 2022
@github-actions github-actions bot added the size:XL PR with lines of changes > 1000 label Feb 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

component:catalog-sync Catalog-sync related priority:high Significant impact; potential bugs size:XL PR with lines of changes > 1000

Projects

Status: 🏗 Under discussion

Development

Successfully merging this pull request may close these issues.

[SUPPORT] Hive Sync process stuck and unable to exit

4 participants