
Using rewrite_table_path iceberg procedure on a backup #14606

@McKMarcBruchner

Query engine

Spark 3.4.3

Question

Hi Iceberg team,

I was wondering how best to use the rewrite_table_path procedure on a backup.

My situation is the following:

  • I have an S3 bucket on which Iceberg stores the data and metadata files
  • My catalog is a Hive Metastore backed by a Postgres database on RDS
  • I have a backup of that S3 bucket in another S3 bucket in another region, possibly even in another account
  • I also have a backup of the RDS database in the other account
  • Let's say my original S3 bucket gets corrupted or becomes unreachable, so I need to switch to the backup bucket and the backup RDS instance
  • Now I want to use rewrite_table_path and register_table to recreate the tables so that I can use them again

What I gather from the documentation:

  • rewrite_table_path needs a registered table to work, because the table name is specified in the CALL command
  • on the other hand, the documentation says that register_table should be run with the new metadata.json only after rewrite_table_path has run, which makes total sense to me
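The documented order (rewrite first, then register) can be sketched as the Spark SQL statements below. This is a minimal sketch, assuming the procedure argument names from the Iceberg Spark procedures documentation (table, source_prefix, target_prefix, optional staging_location for rewrite_table_path; table, metadata_file for register_table). The helper functions, the catalog names source_cat and target_cat, and the placeholder metadata file name are hypothetical. Note that step 1 assumes the table is still loadable in the source catalog, which is exactly the assumption that breaks in the disaster scenario.

```python
# Builds Spark SQL statements for the rewrite-then-register order.
# Helper names, catalog names, and the metadata file name are hypothetical.


def rewrite_table_path_sql(catalog, table, source_prefix, target_prefix,
                           staging_location=None):
    """CALL statement for Iceberg's rewrite_table_path procedure."""
    args = [
        f"table => '{table}'",
        f"source_prefix => '{source_prefix}'",
        f"target_prefix => '{target_prefix}'",
    ]
    if staging_location is not None:
        # Where the rewritten metadata files are written before copying.
        args.append(f"staging_location => '{staging_location}'")
    return f"CALL {catalog}.system.rewrite_table_path({', '.join(args)})"


def register_table_sql(catalog, table, metadata_file):
    """CALL statement for Iceberg's register_table procedure."""
    return (f"CALL {catalog}.system.register_table("
            f"table => '{table}', metadata_file => '{metadata_file}')")


# Step 1 runs against the catalog where the table is currently registered.
step1 = rewrite_table_path_sql(
    "source_cat", "db.test_table",
    "s3a://original-bucket/test_table/",
    "s3a://backup-bucket/test_table/",
)
# Step 2 (not shown): copy the files listed in the procedure's output
# (file_list_location) to the target bucket.
# Step 3 runs against the target catalog, using the rewritten metadata.json
# now stored under the target prefix (placeholder file name).
step3 = register_table_sql(
    "target_cat", "db.test_table",
    "s3a://backup-bucket/test_table/metadata/<rewritten>.metadata.json",
)
for stmt in (step1, step3):
    print(stmt)  # in a real job this would be passed to spark.sql(stmt)
```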

My problem is: how can I run rewrite_table_path without registering the table first? If I try, Spark returns "Couldn't load table", which makes sense, because the table does not exist.

If I register the table first instead, Spark returns a different error: "Path s3a://backup-bucket/test_table/metadata/v1.metadata.json does not start with s3a://original-bucket/test_table/".
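One way to read this error is through a simplified model of the prefix substitution: every path the procedure rewrites must start with source_prefix. This is a hypothetical sketch, not Iceberg's actual code; it assumes (based only on the wording of the error) that the check is a plain prefix test. Paths recorded inside the backed-up metadata still point at the original bucket and pass, but once the table is registered from the backup, the current metadata file itself lives under the backup bucket and fails.

```python
# Hypothetical model of the prefix check implied by the error message.
# Not Iceberg's implementation; rewrite_path is an illustrative name.


def rewrite_path(path: str, source_prefix: str, target_prefix: str) -> str:
    """Swap source_prefix for target_prefix, rejecting paths outside it."""
    if not path.startswith(source_prefix):
        raise ValueError(f"Path {path} does not start with {source_prefix}")
    return target_prefix + path[len(source_prefix):]


source = "s3a://original-bucket/test_table/"
target = "s3a://backup-bucket/test_table/"

# A data file path stored in the copied metadata still uses the original bucket.
print(rewrite_path(source + "data/00000-0.parquet", source, target))
# prints s3a://backup-bucket/test_table/data/00000-0.parquet

# The metadata file registered from the backup lives in the backup bucket.
try:
    rewrite_path(target + "metadata/v1.metadata.json", source, target)
except ValueError as err:
    print(err)
# prints Path s3a://backup-bucket/test_table/metadata/v1.metadata.json does not start with s3a://original-bucket/test_table/
```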

I understand how rewrite_table_path would work if I could run it on my original bucket with the existing table, then copy the data and metadata files to a new bucket and run register_table there. But that may not be possible if the old bucket has been destroyed, corrupted, or is otherwise unreachable.

In this blog post, they state that my approach should work, but I cannot execute step 3, "Check for File Path Changes Before Recovery", because of the problem described above.

I feel that I'm missing something very obvious. Please advise!
