Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ArraySchemaEvolution::extend_enumeration #4445

Merged
merged 4 commits into from
Oct 24, 2023

Conversation

davisp
Copy link
Contributor

@davisp davisp commented Oct 23, 2023

This change adds a new API for allowing extension of enumerations without requiring that users drop the attribute and enumeration before recreating them from scratch. A general outline of the new API looks like such:

tiledb::Array array(ctx, array_uri, TILEDB_READ);
auto old_enmr = array->get_enumeration("enumeraton_name");
std::vector<std::string> values = {"some", "values", "to", "extend"};
auto new_enmr = old_enmr->extend(values);

tiledb::ArraySchemaEvolution ase(ctx);
ase.extend_enumeration(new_enmr);
ase.array_evolve(array_uri);

TYPE: FEATURE | API
DESC: Add ArraySchemaEvolution::extend_enumeration

This API allows users to create a new enumeration by appending values to
an existing enumeration's value list. The new enumeration can then be
passed to ArraySchemaEvolution::extend_enumeration.
@shortcut-integration
Copy link

@davisp davisp requested a review from KiterLuc October 23, 2023 18:48
This commit adds the new ArraySchemaEvolution::extend_enumeration API
for extending enumerations during array schema evolution. Enumerations
passed to this API should come from the result of a call to
Enumeration::extend.
@davisp davisp force-pushed the pd/sc-34799/schema-evolve-enumeration-extension branch from e71de78 to 8762778 Compare October 23, 2023 18:49
@johnkerl
Copy link
Contributor

cc @nguyenv @eddelbuettel

* @param ctx The TileDB context.
* @param array_schema_evolution The schema evolution.
* @param enumeration The enumeration to be extended. This should be the result
* of a call to tiledb_enumeration_extend.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requiring the enumeration to come from a specific source might make this API confusing to use. How about we don't add tiledb_enumeration_extend and merge it into tiledb_array_schema_evolution_extend_enumeration like this?

TILEDB_EXPORT capi_return_t tiledb_array_schema_evolution_extend_enumeration(
    tiledb_ctx_t* ctx,
    tiledb_array_schema_evolution_t* array_schema_evolution,
    const char* name,
    const void* data,
    uint64_t data_size,
    const void* offsets,
    uint64_t offsets_size) TILEDB_NOEXCEPT;

We could also rename extend_enumeration to add_enumeration_members.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the first thing is that its not required, its just a tip on how to easily create an extended enumeration. You could just as easily build one from scratch as long as you ensure that the data and offsets are byte-for-byte prefixes. So at the very least I'll need to rewrite that API doc.

The initial reason I went with this approach is due to lazy loading though in the end I came up against the cold reality that is our C API's exposure of the array schema evolution implementation which in the end forced me to ensure that the relevant enumerations are loaded before the evolution is applied. So technically, we could go back and add this sort of API but in hind sight there are two reasons to avoid it.

The first reason I'd prefer to avoid this API is that it'll require a lot more changes to serialization. Right now we're passing the entire enumeration across the wire so it was easy to just re-use the existing enumeration code that's already tested. Adding these API's means I'm adding a whole new message type which is kind of a bear. Its not the end of the world if we decide its worth it, but I'd rather avoid it.

The second more important reason though is the validation of the extension itself. It turns out there's a surprising number of ways that specifying those two buffers can break an extension. A lot of the funky ones are related to fixed size with cell_var_num > 1, but its enough that I think having the two steps process is worth it so that errors from extension are orthogonal from errors during evolution.

As to renaming extend to add_members, I'd prefer to stick with extend. The reason for that is because Enumerations are always const (because ArraySchema is often const) which means that all modifications of an Enumeration return a copy with the change rather than doing an in-place update. I know its a bit wishy-washy, but extend returning a copy seems more reasonable than add_members returning a copy instead of doing an in-place mutation.

test/src/unit-cppapi-enumerations.cc Outdated Show resolved Hide resolved
test/src/unit-cppapi-enumerations.cc Outdated Show resolved Hide resolved
test/src/unit-cppapi-enumerations.cc Outdated Show resolved Hide resolved
test/src/unit-cppapi-enumerations.cc Outdated Show resolved Hide resolved
test/src/unit-enumerations.cc Outdated Show resolved Hide resolved
tiledb/sm/cpp_api/enumeration_experimental.h Outdated Show resolved Hide resolved
tiledb/sm/cpp_api/enumeration_experimental.h Show resolved Hide resolved
@KiterLuc KiterLuc merged commit 83e6589 into dev Oct 24, 2023
56 checks passed
@KiterLuc KiterLuc deleted the pd/sc-34799/schema-evolve-enumeration-extension branch October 24, 2023 10:48
KiterLuc added a commit that referenced this pull request Oct 24, 2023
This change adds a new API for allowing extension of enumerations
without requiring that users drop the attribute and enumeration before
recreating them from scratch. A general outline of the new API looks
like such:

```cpp
tiledb::Array array(ctx, array_uri, TILEDB_READ);
auto old_enmr = array->get_enumeration("enumeraton_name");
std::vector<std::string> values = {"some", "values", "to", "extend"};
auto new_enmr = old_enmr->extend(values);

tiledb::ArraySchemaEvolution ase(ctx);
ase.extend_enumeration(new_enmr);
ase.array_evolve(array_uri);
```

---
TYPE: FEATURE | API
DESC: Add ArraySchemaEvolution::extend_enumeration

---------

Co-authored-by: KiterLuc <67824247+KiterLuc@users.noreply.github.com>
Co-authored-by: Luc Rancourt <lucrancourt@gmail.com>
KiterLuc added a commit that referenced this pull request Oct 24, 2023
…#4445) (#4452)

Backport
83e6589
from #4445.

TYPE: FEATURE | API
DESC: Add ArraySchemaEvolution::extend_enumeration.

Co-authored-by: Paul J. Davis <paul.davis@tiledb.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants