It would be nice to be able to read multiple ORC files at once that contain different schemas. For instance, suppose the user has three files with the following structs:
struct<id, name, city, state>
struct<id, city, state>
struct<id, name, state>
and then runs the following code:
import dask.dataframe as dd
data = dd.read_orc(path, columns=['id', 'name', 'state'])
Dask should be able to read from all of the files and fill in nulls for any requested column that doesn't exist in a given file, instead of throwing ValueError: Incompatible schemas while parsing ORC files.
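A minimal sketch of the requested behavior, assuming each file has already been read into its own pandas DataFrame (the in-memory frames below are hypothetical stand-ins for the three files above, and align_to_columns is a hypothetical helper, not part of Dask). pandas DataFrame.reindex(columns=...) keeps the requested columns in order and adds any missing one filled with nulls (NaN), after which the per-file frames can be concatenated:

```python
import pandas as pd

# Hypothetical stand-ins for the three ORC files in the example above;
# in practice each frame would come from reading one file.
frames = [
    pd.DataFrame({"id": [1], "name": ["a"], "city": ["x"], "state": ["CA"]}),
    pd.DataFrame({"id": [2], "city": ["y"], "state": ["NY"]}),
    pd.DataFrame({"id": [3], "name": ["c"], "state": ["TX"]}),
]


def align_to_columns(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Hypothetical helper: keep only `columns`, adding any missing one as all-null."""
    # reindex drops columns not listed (e.g. "city") and inserts NaN
    # for listed columns the frame lacks (e.g. "name" in the second file).
    return df.reindex(columns=columns)


columns = ["id", "name", "state"]
combined = pd.concat(
    [align_to_columns(f, columns) for f in frames], ignore_index=True
)
print(combined)
```

For Dask specifically, one possible workaround (an untested assumption, not how dd.read_orc behaves today) would be to apply the same per-file alignment inside dask.delayed tasks and combine the results with dd.from_delayed, passing an explicit meta so every partition shares one schema.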
From my perspective, something like this sounds nice, but I suspect that it also introduces some complexity. I would not be surprised to see some resistance from people who think about this kind of thing, but in general I think that @rjzamora or others who are more familiar with this space will likely have better-informed thoughts.