importccl: create tables before ingesting data #37338
importccl: create tables before ingesting in IMPORT
This changes the order in which IMPORT creates tables and ingests data.
Previously, IMPORT computed the tables it would create, but kept them only in the job record during ingestion.
This change switches IMPORT to create the tables in a non-public INGESTING state first, then run the ingestion job, and then simply move them from ingesting to public when it finishes.
This has some advantages and disadvantages. Creating the descriptors early has some nice side-effects: it avoids a potential race where another table could be created, imported or restored with a conflicting name during ingestion, and things that try to map KVs to tables can actually see the table, so special cases like enforced splits between tables should work as expected during ingestion. This fixes the issue where an importing table could be merged into another table after the ingestion job (which suspends merges) finished but before the descriptors were written. Additionally, the admin UI or other places that surface tables can show them if they so choose, which could, e.g., help explain where disk space is being used by not-yet-public tables.
Unfortunately it also introduces some disadvantages. Most notably, writing the descriptors adds more ways by which traffic might end up in the ingesting range, and thus more code paths that need to be audited and must include the proper checks for non-public states. Writing any incomplete state before the job is finished also adds more that needs to be properly cleaned up on failure.
Overall though, importing into an existing table will need to solve the same downsides in any case, so making imports of new tables behave the same as imports into existing tables -- by importing into a just-created existing table -- will simplify the overall code.
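For readers skimming the description, the new ordering can be sketched roughly as follows. This is a toy in-memory model, not the importccl code: `Catalog`, `State`, `run_import` and the `ingest` callback are hypothetical names invented for illustration, and the real change operates on table descriptors and a job record rather than a dict.

```python
from enum import Enum


class State(Enum):
    INGESTING = "ingesting"  # descriptor written, but table is not yet public
    PUBLIC = "public"


class Catalog:
    """Hypothetical stand-in for the descriptor store."""

    def __init__(self):
        self.tables = {}  # table name -> State

    def create_ingesting(self, name):
        # Writing the descriptor first reserves the name, so a concurrent
        # create, import or restore with the same name fails here.
        if name in self.tables:
            raise ValueError(f"relation {name!r} already exists")
        self.tables[name] = State.INGESTING

    def publish(self, name):
        self.tables[name] = State.PUBLIC

    def drop(self, name):
        self.tables.pop(name, None)


def run_import(catalog, names, ingest):
    """Create tables as INGESTING, ingest, then flip them to PUBLIC."""
    created = []
    try:
        for name in names:
            catalog.create_ingesting(name)
            created.append(name)
        ingest(names)  # assumed to raise on failure
    except Exception:
        # Incomplete state was written before the job finished, so it has
        # to be cleaned up explicitly; only drop what this import created.
        for name in created:
            catalog.drop(name)
        raise
    for name in names:
        catalog.publish(name)
```

The sketch shows both sides of the trade-off: the name is protected for the whole ingestion, but the failure path now has descriptors to remove.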
Release note: none.
There's one special-case where IMPORT never creates tables, when running with the
Release note (sql change): remove 'transform' option from IMPORT.