Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spark: Parameterize backup suffix in migrate procedure #7121

Closed

Conversation

edgarRd
Copy link
Contributor

@edgarRd edgarRd commented Mar 16, 2023

This PR adds the ability to use a custom backup suffix during the migrate process. By default, the suffix __BACKUP__ is used, which sometimes gets represented as __backup__ in engines that lower case table names leading to confusion.

This change enables uses to optionally set a suffix for those backup tables. A follow up PR in docs would update the parameter list once merged.

PTAL Thanks.

@edgarRd edgarRd force-pushed the parameterize-backup-suffix-on-migrate branch 2 times, most recently from 058dbcc to 15c3421 Compare March 16, 2023 00:58
@edgarRd edgarRd force-pushed the parameterize-backup-suffix-on-migrate branch from 15c3421 to c284234 Compare March 16, 2023 01:00
Copy link
Contributor

@sririshindra sririshindra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made my comments in the spark3.3 folder. Could you also make the corresponding changes if for spark3.2 and spark3.1.

@edgarRd
Copy link
Contributor Author

edgarRd commented Mar 20, 2023

@sririshindra Thanks for the review. I believe I've made the changes requested, PTAL when you have a chance.

Copy link
Contributor

@sririshindra sririshindra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Please take a look at the comment I made about potentially removing Nullable and see if it makes sense. That is up to you as I am not really familiar with its standard usage.

@edgarRd
Copy link
Contributor Author

edgarRd commented Mar 21, 2023

PTAL @rdblue @RussellSpitzer - Thanks!

@RussellSpitzer
Copy link
Member

I'd rather not add another option, could we just change the suffix to backup? I just can't imagine the situation where folks need to change this dynamically

@edgarRd
Copy link
Contributor Author

edgarRd commented Mar 21, 2023

@RussellSpitzer I think backup is just too generic, specially when there are hundreds of tables in a catalog and different formats - is it a backup of the data (a copy) or just the table format? For instance, users may already use backup for other cases, not just for tables migrated to iceberg - this is quite a common case. Therefore, users may have different suffixes they would want to use, for instance _hive or any other. In order to not be prescriptive of the suffix - which right now is very generic - allowing users to optionally changing it should fill that gap.

@RussellSpitzer
Copy link
Member

I think you are just making the case for changing it to something like iceberg_migration_backup or something like that. Still don't see why a user would want it to be different that a more descriptive suffix?

@edgarRd
Copy link
Contributor Author

edgarRd commented Mar 21, 2023

Each user has a different migration strategy and they could use the suffix as a tagging mechanism in the old table name for different scenarios (this is what we do), having a fixed suffix - while it could be more descriptive - still does not fill the gap.

Other issues also arise with large table names failing to be migrated due to table name length character limitations where changing the suffix to a 1 char suffix helps.

@edgarRd
Copy link
Contributor Author

edgarRd commented Oct 9, 2023

I suppose we were okay with changing the backup name on migrate after all: https://github.com/apache/iceberg/pull/8227/files - closing this PR.

@edgarRd edgarRd closed this Oct 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants