[C++/Python][Dataset] Support schema evolution for integer columns #24476

Closed
asfimport opened this issue Mar 31, 2020 · 5 comments

Comments

@asfimport

asfimport commented Mar 31, 2020

When reading a dataset whose schema specifies that column X is of type int64, but where a partition actually stores that column's data as int32, an upcast should be done.

Reporter: Uwe Korn / @xhochy

Related issues:

Note: This issue was originally created as ARROW-8282. Please see the migration documentation for further details.

@asfimport
Author

Francois Saint-Jacques / @fsaintjacques:
Once we have instantiated a Fragment, we can create a CastFragment composing an existing Fragment. Some formats support casting, e.g. CSV, and some don't, e.g. Parquet or IPC.

@asfimport
Author

Joris Van den Bossche / @jorisvandenbossche:
Do we need a separate Fragment type? We could also do the cast when scanning (e.g. we already do some edits at that point, like projection, adding null columns, etc.).

cc @bkietz
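
To make the two options in this exchange concrete, here is a minimal sketch in plain Python. It does not use Arrow's API: `ListFragment`, `CastFragment`, `cast_batch` and `scan_with_cast` are hypothetical names invented for illustration, and a "batch" is just a dict of `array.array` columns, where typecode `"i"` stands in for int32 and `"q"` for int64.

```python
from array import array
from typing import Dict, Iterator, List

# Stand-in for a record batch: column name -> typed array (hypothetical).
Batch = Dict[str, array]


class ListFragment:
    """Hypothetical fragment that yields in-memory batches as stored."""

    def __init__(self, batches: List[Batch]) -> None:
        self.batches = batches

    def scan(self) -> Iterator[Batch]:
        yield from self.batches


def cast_batch(batch: Batch, target: Dict[str, str]) -> Batch:
    """Return a copy of the batch with columns converted to the target typecodes."""
    out = {}
    for name, col in batch.items():
        code = target.get(name, col.typecode)
        # Rebuild the column only when the stored type differs from the target.
        out[name] = col if col.typecode == code else array(code, col.tolist())
    return out


class CastFragment:
    """Option A: a separate fragment type composing an existing fragment."""

    def __init__(self, inner: ListFragment, target: Dict[str, str]) -> None:
        self.inner = inner
        self.target = target

    def scan(self) -> Iterator[Batch]:
        for batch in self.inner.scan():
            yield cast_batch(batch, self.target)


def scan_with_cast(fragment: ListFragment, target: Dict[str, str]) -> Iterator[Batch]:
    """Option B: apply the cast as one more step of the scan itself."""
    for batch in fragment.scan():
        yield cast_batch(batch, target)


fragment = ListFragment([{"x": array("i", [1, 2, 3])}])
target = {"x": "q"}

wrapped = list(CastFragment(fragment, target).scan())
at_scan = list(scan_with_cast(fragment, target))
```

Both paths produce the same batches; the difference is only where the cast lives. Option A adds a type to the fragment hierarchy, while option B keeps the cast next to the other scan-time adjustments such as projection.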

@asfimport
Author

Antoine Pitrou / @pitrou:
@jorisvandenbossche Does this still need doing?

@asfimport
Author

Uwe Korn / @xhochy:
This is still an issue, especially in my context. I can have a look at it in the next two weeks.

@asfimport
Author

Uwe Korn / @xhochy:
This has been resolved on master in the meantime, so it will work starting with the 4.0 release.
