Skip to content

[HUDI-5903]: Introduce Glue sync configs#8208

Open
pratyakshsharma wants to merge 5 commits intoapache:masterfrom
pratyakshsharma:hudi-5903
Open

[HUDI-5903]: Introduce Glue sync configs#8208
pratyakshsharma wants to merge 5 commits intoapache:masterfrom
pratyakshsharma:hudi-5903

Conversation

@pratyakshsharma
Copy link
Contributor

This PR is a good starting point for introducing glue sync specific configs. By default, AWS chooses the same region where other services like EMR or EC2 are running. There was no provision for selecting the AWS region so far. Please find more info here - https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/java-dg-region-selection.html.

Change Logs

  • Introduce glue sync specific configs

Impact

  • Users will be able to specifically set AWS region for syncing to glue metastore.
  • Enable users to configure the number of concurrent glue connections.

Risk level (write none, low medium or high below)

None

Documentation Update

New configs need to be added. Tracking jira - https://issues.apache.org/jira/browse/HUDI-5946

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@codope codope added component:catalog-sync Catalog-sync related priority:high Significant impact; potential bugs configs labels Mar 22, 2023
.withDocumentation("AWS session token");

public static final ConfigProperty<String> AWS_REGION = ConfigProperty
.key("hoodie.aws.region")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this config be specific to Glue ? How are we planning to use this config for pointing to S3 in different region ? Will that be a separate config ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original intent of adding this config was only for glue. If S3 and glue need to be pointed to different regions, then we need a separate config for S3.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved this to GlueSyncConfig class.

@pratyakshsharma
Copy link
Contributor Author

@hudi-bot run azure

@pratyakshsharma
Copy link
Contributor Author

@bvaradar @yihua This is good to review now.

@pratyakshsharma
Copy link
Contributor Author

@bvaradar @xushiyan @codope Can we review this anytime soon?

@pratyakshsharma
Copy link
Contributor Author

@bvaradar ping!

@bvaradar
Copy link
Contributor

bvaradar commented Oct 4, 2023

@pratyakshsharma : Can you resolve the conflicts and I will look into this.

@pratyakshsharma
Copy link
Contributor Author

@bvaradar I have resolved the conflicts, waiting for CI to pass. I have used these changelogs as reference for making all glue client related changes - https://github.com/aws/aws-sdk-java-v2/blob/master/docs/LaunchChangelog.md

@pratyakshsharma
Copy link
Contributor Author

@bvaradar Can we review and merge this please?

Copy link
Contributor

@bvaradar bvaradar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pratyakshsharma Otherwise looks ok. Can you also fix conflicts ?

groupName = ConfigGroups.Names.META_SYNC,
subGroupName = ConfigGroups.SubGroupNames.NONE,
description = "Configs that control Glue catalog sync based client.")
public class GlueSyncConfig extends HiveSyncConfig {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pratyakshsharma what is the reason for renaming this class from GlueCatalogSyncClientConfig to GlueSyncConfig ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had introduced GlueSyncConfig in my initial commit, and as part of resolving conflicts later, looks like I deleted the original class by mistake. Will revert the name change.
Thank you for pointing this out.

@pratyakshsharma
Copy link
Contributor Author

@hudi-bot run azure

@hudi-bot
Copy link
Collaborator

hudi-bot commented Jan 6, 2024

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@github-actions github-actions bot added the size:S PR with lines of changes in (10, 100] label Feb 26, 2024
@xushiyan xushiyan removed the configs label Dec 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

component:catalog-sync Catalog-sync related priority:high Significant impact; potential bugs size:S PR with lines of changes in (10, 100]

Projects

Status: 🏗 Under discussion

Development

Successfully merging this pull request may close these issues.

5 participants