Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: schema evolution not coercing with large arrow types #2305

Merged
merged 6 commits into from
Mar 21, 2024

Conversation

aersam
Copy link
Contributor

@aersam aersam commented Mar 20, 2024

Description

This fixes schema merge for large arrow types in combination with dictionary types. Basically we just allow merging any arrow data type that is the same delta type

Related Issue(s)

Fixes #2298

Documentation

@github-actions github-actions bot added the binding/rust Issues for the Rust crate label Mar 20, 2024
Copy link

ACTION NEEDED

delta-rs follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

@ion-elgreco
Copy link
Collaborator

@aersam wouldn't it maybe be better to have the schema merge function take in delta fields instead of arrow

@aersam
Copy link
Contributor Author

aersam commented Mar 20, 2024

@aersam wouldn't it maybe be better to have the schema merge function take in delta fields instead of arrow

Yes! But thats a topic throughout the whole crate, everything is based on arrow schemas instead of delta schemas.

@aersam aersam changed the title fix: schema evolution not coercing with large arrow types fix: schema evolution not coercing with large arrow types Mar 20, 2024
@ion-elgreco
Copy link
Collaborator

ion-elgreco commented Mar 20, 2024

@aersam can be addressed in a follow up if it's to big of a change now.

Btw, forgot to add that in the issue but it is also not working with large lists, and nested versions

@aersam
Copy link
Contributor Author

aersam commented Mar 20, 2024

@aersam can be addressed in a follow up if it's to big of a change now.

Btw, forgot to add that in the issue but it is also not working with large lists, and nested versions

To merge nested correctly, there's no other way than to really do the full thing on our own. So that's what I did now. Since we're still merging arrow schemas, I now do first a conversion to delta and then back to arrow again, which is not optimal, but still does work.

@aersam
Copy link
Contributor Author

aersam commented Mar 20, 2024

I'll adjust the tests 🙂

@aersam
Copy link
Contributor Author

aersam commented Mar 20, 2024

Should I move the delta schema merge code to kernel?

@ion-elgreco
Copy link
Collaborator

@aersam I believe kernel should only for stuff related to delta protocol implementation, and schema evolution is more on the operation side

@github-actions github-actions bot added the binding/python Issues for the Python package label Mar 20, 2024
@aersam
Copy link
Contributor Author

aersam commented Mar 20, 2024

Oh I guess I should merge metadata, too. Will add that

Copy link
Collaborator

@ion-elgreco ion-elgreco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aersam Great fix!! 😃

@ion-elgreco ion-elgreco merged commit 6a51971 into delta-io:main Mar 21, 2024
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
binding/python Issues for the Python package binding/rust Issues for the Rust crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Schema evolution not coercing with Large arrow types
2 participants