-
Notifications
You must be signed in to change notification settings - Fork 181
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ArraySchemaEvolution::extend_enumeration #4445
Add ArraySchemaEvolution::extend_enumeration #4445
Conversation
This API allows users to create a new enumeration by appending values to an existing enumeration's value list. The new enumeration can then be passed to ArraySchemaEvolution::extend_enumeration.
This pull request has been linked to Shortcut Story #34799: Add ArraySchemaEvolution API specifically for Enumeration extension. |
This commit adds the new ArraySchemaEvolution::extend_enumeration API for extending enumerations during array schema evolution. Enumerations passed to this API should come from the result of a call to Enumeration::extend.
e71de78
to
8762778
Compare
* @param ctx The TileDB context. | ||
* @param array_schema_evolution The schema evolution. | ||
* @param enumeration The enumeration to be extended. This should be the result | ||
* of a call to tiledb_enumeration_extend. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Requiring the enumeration to come from a specific source might make this API confusing to use. How about we don't add tiledb_enumeration_extend
and merge it into tiledb_array_schema_evolution_extend_enumeration
like this?
TILEDB_EXPORT capi_return_t tiledb_array_schema_evolution_extend_enumeration(
tiledb_ctx_t* ctx,
tiledb_array_schema_evolution_t* array_schema_evolution,
const char* name,
const void* data,
uint64_t data_size,
const void* offsets,
uint64_t offsets_size) TILEDB_NOEXCEPT;
We could also rename extend_enumeration
to add_enumeration_members
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the first thing is that its not required, its just a tip on how to easily create an extended enumeration. You could just as easily build one from scratch as long as you ensure that the data and offsets are byte-for-byte prefixes. So at the very least I'll need to rewrite that API doc.
The initial reason I went with this approach is due to lazy loading though in the end I came up against the cold reality that is our C API's exposure of the array schema evolution implementation which in the end forced me to ensure that the relevant enumerations are loaded before the evolution is applied. So technically, we could go back and add this sort of API but in hind sight there are two reasons to avoid it.
The first reason I'd prefer to avoid this API is that it'll require a lot more changes to serialization. Right now we're passing the entire enumeration across the wire so it was easy to just re-use the existing enumeration code that's already tested. Adding these API's means I'm adding a whole new message type which is kind of a bear. Its not the end of the world if we decide its worth it, but I'd rather avoid it.
The second more important reason though is the validation of the extension itself. It turns out there's a surprising number of ways that specifying those two buffers can break an extension. A lot of the funky ones are related to fixed size with cell_var_num > 1, but its enough that I think having the two steps process is worth it so that errors from extension are orthogonal from errors during evolution.
As to renaming extend
to add_members
, I'd prefer to stick with extend
. The reason for that is because Enumerations are always const
(because ArraySchema is often const
) which means that all modifications of an Enumeration return a copy with the change rather than doing an in-place update. I know its a bit wishy-washy, but extend
returning a copy seems more reasonable than add_members
returning a copy instead of doing an in-place mutation.
This change adds a new API for allowing extension of enumerations without requiring that users drop the attribute and enumeration before recreating them from scratch. A general outline of the new API looks like such: ```cpp tiledb::Array array(ctx, array_uri, TILEDB_READ); auto old_enmr = array->get_enumeration("enumeraton_name"); std::vector<std::string> values = {"some", "values", "to", "extend"}; auto new_enmr = old_enmr->extend(values); tiledb::ArraySchemaEvolution ase(ctx); ase.extend_enumeration(new_enmr); ase.array_evolve(array_uri); ``` --- TYPE: FEATURE | API DESC: Add ArraySchemaEvolution::extend_enumeration --------- Co-authored-by: KiterLuc <67824247+KiterLuc@users.noreply.github.com> Co-authored-by: Luc Rancourt <lucrancourt@gmail.com>
This change adds a new API for allowing extension of enumerations without requiring that users drop the attribute and enumeration before recreating them from scratch. A general outline of the new API looks like such:
TYPE: FEATURE | API
DESC: Add ArraySchemaEvolution::extend_enumeration