[C++/Python][Dataset] Support schema evolution for integer columns #24476

asfimport · 2020-03-31T12:30:07Z

When reading a dataset whose schema declares column X as int64 while a partition actually stores that column as int32, the data should be upcast to int64.

Reporter: Uwe Korn / @xhochy

Related issues:

[C++][Dataset] Untangle Dataset, Fragment and ScanOptions (is blocked by)
[C++][Dataset] Schema evolution in Dataset scanning (is related to)

Note: This issue was originally created as ARROW-8282. Please see the migration documentation for further details.


asfimport · 2020-03-31T21:29:21Z

Francois Saint-Jacques / @fsaintjacques:
Once we have instantiated a Fragment, we can create a CastFragment that wraps an existing Fragment. Some formats support casting on read, e.g. CSV, and some don't, e.g. Parquet or IPC.
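
A minimal, library-free sketch of the wrapping idea described above. It is not the Arrow API: `ListFragment`, the `scan()` method, and the use of `array` typecodes as stand-ins for Arrow types (`'i'` for int32 on common platforms, `'q'` for int64) are all assumptions made for illustration.

```python
from array import array


class ListFragment:
    """Hypothetical fragment that yields column batches exactly as stored."""

    def __init__(self, batches):
        self._batches = batches

    def scan(self):
        yield from self._batches


class CastFragment:
    """Hypothetical wrapper that widens every batch of an inner fragment."""

    def __init__(self, inner, target_typecode):
        self._inner = inner
        self._target = target_typecode

    def scan(self):
        target_size = array(self._target).itemsize
        for batch in self._inner.scan():
            # Only allow widening; narrowing could silently lose data.
            if target_size < batch.itemsize:
                raise TypeError("refusing to narrow an integer column")
            yield array(self._target, batch.tolist())


# The stored partition uses the narrower type; the wrapper exposes the wider one.
stored = ListFragment([array("i", [1, 2]), array("i", [3])])
widened = list(CastFragment(stored, "q").scan())
```

The wrapper leaves the inner fragment untouched, so formats that cannot cast while reading could still be adapted afterwards.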

asfimport · 2020-12-10T21:10:12Z

Joris Van den Bossche / @jorisvandenbossche:
Do we need a separate Fragment type? We could also do the cast when scanning (e.g. we already make some adjustments at that point, such as projection and adding null columns).
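
The scan-time alternative can be modelled roughly as follows. This is only a sketch under stated assumptions: `reconcile`, the type-name strings, and the dict-of-lists batch layout are hypothetical and do not reflect how Arrow represents schemas or batches. Since plain Python integers carry no width, the sketch only models the decision (project, fill nulls, accept a widening cast), not the physical conversion.

```python
# Bit widths of the signed integer types considered by this sketch.
INT_BITS = {"int8": 8, "int16": 16, "int32": 32, "int64": 64}


def reconcile(columns, physical_types, dataset_types, num_rows):
    """Shape one scanned batch to the dataset schema (hypothetical helper)."""
    out = {}
    for name, target in dataset_types.items():
        if name not in physical_types:
            # Column missing from this partition: fill with nulls.
            out[name] = [None] * num_rows
            continue
        stored = physical_types[name]
        widening = (
            stored in INT_BITS
            and target in INT_BITS
            and INT_BITS[stored] <= INT_BITS[target]
        )
        if stored != target and not widening:
            raise TypeError(f"cannot safely cast {name}: {stored} -> {target}")
        out[name] = list(columns[name])
    # Columns not named in dataset_types are dropped (projection).
    return out


result = reconcile(
    columns={"x": [1, 2, 3], "extra": [7, 8, 9]},
    physical_types={"x": "int32", "extra": "int8"},
    dataset_types={"x": "int64", "y": "int64"},
    num_rows=3,
)
```

Here `x` is accepted because int32 to int64 is a widening cast, `y` is filled with nulls, and `extra` is projected away.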

cc @bkietz

asfimport · 2021-03-22T17:02:24Z

Antoine Pitrou / @pitrou:
@jorisvandenbossche Does this still need doing?

asfimport · 2021-03-22T17:05:00Z

Uwe Korn / @xhochy:
This is still an issue, especially in my context. I can have a look at it in the next two weeks.

asfimport · 2021-04-15T12:28:24Z

Uwe Korn / @xhochy:
This has been resolved on master in the meantime, so it will work starting with the 4.0 release.

asfimport closed this as completed Apr 15, 2021

asfimport added this to the 4.0.0 milestone Jan 11, 2023

This was referenced Jan 11, 2023

[C++][Dataset] Untangle Dataset, Fragment and ScanOptions #24278 (Closed)
[C++][Dataset] Schema evolution in Dataset scanning #26923 (Open)
