Change array schema format for schema evolution #2258

Merged
merged 30 commits into dev from bd/schema-evolution-format-change on Jul 5, 2021


bdeng-xt
Contributor

@bdeng-xt bdeng-xt commented May 10, 2021

Schema evolution is a feature that allows an array schema to change over time. To implement it, we will use the same system used for fragments and metadata. The array schemas will be moved into the folder `__schema`, with file names in the format timestamp_timestamp_uuid. The details can be found in our "Schema Evolution" design paper.


TYPE: FEATURE
DESC: Store array schemas under __schema directory
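
To illustrate the timestamp_timestamp_uuid naming described above, here is a minimal sketch of building and parsing such a name. The `SchemaName` struct and both helper functions are hypothetical names introduced for illustration; they are not taken from the TileDB code base, and the sketch assumes the two timestamps are plain decimal integers separated from the UUID by underscores.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>

// Hypothetical holder for the three parts of a schema file name.
struct SchemaName {
  std::uint64_t t_start;
  std::uint64_t t_end;
  std::string uuid;
};

// Builds "<t_start>_<t_end>_<uuid>".
std::string make_schema_name(
    std::uint64_t t_start, std::uint64_t t_end, const std::string& uuid) {
  return std::to_string(t_start) + "_" + std::to_string(t_end) + "_" + uuid;
}

// True if `s` is a non-empty run of decimal digits short enough
// (at most 19 digits) to always fit in a 64-bit unsigned integer.
bool is_timestamp(const std::string& s) {
  return !s.empty() && s.size() <= 19 &&
         std::all_of(s.begin(), s.end(), [](char c) {
           return c >= '0' && c <= '9';
         });
}

// Splits a name at its first two underscores. Returns false (and leaves
// `out` untouched) if the name does not match the assumed format.
bool parse_schema_name(const std::string& name, SchemaName* out) {
  const std::string::size_type first = name.find('_');
  if (first == std::string::npos)
    return false;
  const std::string::size_type second = name.find('_', first + 1);
  if (second == std::string::npos)
    return false;

  const std::string a = name.substr(0, first);
  const std::string b = name.substr(first + 1, second - first - 1);
  const std::string uuid = name.substr(second + 1);
  if (!is_timestamp(a) || !is_timestamp(b) || uuid.empty())
    return false;

  out->t_start = std::stoull(a);
  out->t_end = std::stoull(b);
  out->uuid = uuid;
  return true;
}
```

Because the timestamps come first in the name, schema files can be ordered by time using only their names, which is the same property the fragment and metadata naming relies on.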

RETURN_NOT_OK(get_array_schema_uris(array_uri, &uris));
if (uris.size() == 0) {
return LOG_STATUS(Status::StorageManagerError(
"Can not get the latest array schema; Empty array schemas."));


Please change the message to "Cannot get the latest array schema; No array schemas found."

Contributor Author


Ok
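
As a rough sketch of the check under review, the following shows an empty-list guard that uses the reworded message. `SimpleStatus` and `latest_schema_uri` are hypothetical stand-ins, not TileDB's `Status` class or the actual storage manager method, and the sketch assumes the caller passes the URIs already sorted so that the newest schema comes last.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a status type; not TileDB's Status class.
struct SimpleStatus {
  bool ok;
  std::string message;
};

// Picks the last entry of `uris`, assuming the list is sorted so that
// the newest schema is at the back. Fails if the list is empty.
SimpleStatus latest_schema_uri(
    const std::vector<std::string>& uris, std::string* latest) {
  if (uris.empty()) {
    return {false,
            "Cannot get the latest array schema; No array schemas found."};
  }
  *latest = uris.back();
  return {true, ""};
}
```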

Contributor

@joe-maley joe-maley left a comment


misc comments, otherwise LGTM

// Check if the schema directory exists or not
bool is_dir = false;
// Since is_dir could return NOT Ok status, we will not use RETURN_NOT_OK here
vfs_->is_dir(uri.join_path(constants::array_schema_folder_name), &is_dir);
Contributor


Catch the return status and change `if (is_dir) {` to `if (!st.ok() || is_dir) {`.

Contributor Author


Changed to `if (!st.ok() || is_dir)`, but I'm not sure whether we should return true when the status is not OK.
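
The behavior in question can be made concrete with a small sketch. `ProbeStatus`, `FakeVfs`, and `schema_dir_assumed_present` are hypothetical names used only for illustration; they do not reproduce TileDB's VFS interface. The sketch captures the status instead of discarding it and shows that, with the suggested condition, a failed probe is treated the same as "directory exists".

```cpp
#include <string>

// Hypothetical stand-in for a status type; not TileDB's Status class.
struct ProbeStatus {
  bool ok;
  std::string message;
};

// Hypothetical directory probe that can be told to fail, standing in
// for a VFS call whose backend may return an error.
struct FakeVfs {
  bool fail;        // simulate a backend error
  bool dir_exists;  // result reported when the probe succeeds

  ProbeStatus is_dir(const std::string& /*path*/, bool* out) const {
    if (fail)
      return {false, "backend error"};
    *out = dir_exists;
    return {true, ""};
  }
};

// Mirrors the condition discussed in the thread: the result is true
// when the directory exists OR when the probe itself failed.
bool schema_dir_assumed_present(const FakeVfs& vfs, const std::string& path) {
  bool is_dir = false;
  const ProbeStatus st = vfs.is_dir(path, &is_dir);
  return !st.ok || is_dir;
}
```

With this condition, `FakeVfs{true, false}` (probe fails, directory absent) still yields true, which is exactly the case the author is unsure about. An alternative would be to propagate the failed status to the caller rather than fold it into the boolean.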

Member

@stavrospapadopoulos stavrospapadopoulos left a comment


LGTM, thanks. I'd like @Shelnutt2 to take another look as well and see how the changes might impact the REST performance in TileDB Cloud (e.g., in the various places where we retrieve the array schema).

@joe-maley
Contributor

> LGTM, thanks. I'd like @Shelnutt2 to take another look as well and see how the changes might impact the REST performance in TileDB Cloud (e.g., in the various places where we retrieve the array schema).

re @Shelnutt2

Member

@Shelnutt2 Shelnutt2 left a comment


@bdeng-xt I've left some comments here. Let's also schedule a few minutes to discuss the next steps. This is a great change to add the initial format changes. Next, I believe we need to support loading multiple schemas and passing the appropriate one to each fragment in a read, based on the array schema listed in the fragment metadata.
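
One possible shape for that next step, sketched under assumptions: all schemas are loaded into a map keyed by schema name, and each fragment records the name of the schema it was written with. The `Schema` and `Fragment` structs and `schema_for_fragment` are hypothetical and much simpler than the real TileDB classes; this is not the implementation planned in the follow-up PR.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical, heavily simplified schema: a name plus attribute names.
struct Schema {
  std::string name;
  std::vector<std::string> attributes;
};

// Hypothetical fragment that remembers which schema it was written with.
struct Fragment {
  std::string uri;
  std::string schema_name;
};

// Returns the schema listed by the fragment, or nullptr if that schema
// was not loaded.
const Schema* schema_for_fragment(
    const std::map<std::string, Schema>& schemas, const Fragment& fragment) {
  const auto it = schemas.find(fragment.schema_name);
  return it == schemas.end() ? nullptr : &it->second;
}
```

A read over several fragments would then call the lookup once per fragment, so fragments written before and after a schema change are each interpreted with their own schema.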

Member

@Shelnutt2 Shelnutt2 left a comment


@bdeng-xt thanks for the changes, this looks good now. I suggest we merge and move to the next item in a new PR which is the loading of multiple schemas and passing the appropriate one to each fragment object.

@Shelnutt2
Member

@bdeng-xt if you can rebase this so the new CI checks run then we can get this merged!

@Shelnutt2 Shelnutt2 merged commit 41e5e8f into dev Jul 5, 2021
@Shelnutt2 Shelnutt2 deleted the bd/schema-evolution-format-change branch July 5, 2021 13:02