This repository has been archived by the owner on Dec 20, 2018. It is now read-only.

Be permissive when reading avro files with inconsistent schema #31

Closed
falaki opened this issue Feb 23, 2015 · 9 comments

@falaki
Member

falaki commented Feb 23, 2015

If there are multiple files in a directory and some of them have additional fields, do not throw an exception as long as those fields are not accessed. Ideally, show a warning when loading.

On a related note, this could be controlled with a flag in the options.

@rxin
Contributor

rxin commented May 18, 2015

This affects #49 also.

@nlande

nlande commented Jun 12, 2015

Any workarounds? We have an evolving but backwards-compatible Avro schema and this is a blocker for us.

@FRosner

FRosner commented Dec 9, 2015

I would be happy to have this as I also ran into the same issue as #49. Anyone planning to pick it up?

@FRosner

FRosner commented Jan 7, 2016

@JoshRosen can you give an indication on whether someone is going to pick it up? If it is required I can also invest some time and see if I can fix it.

If someone can already point to the piece of code that has to be changed, that would be much appreciated.

@FRosner

FRosner commented Jan 7, 2016

@nlande did you find a workaround so far? My workaround was to load every file individually and then union them after selecting. However, I ran into huge performance problems with several thousand Avro files because the DAG got pretty big. When I tried to show it in the Spark UI, the UI even ran out of memory.
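
For readers who want the shape of this workaround, here is a minimal plain-Python sketch. It models each Avro file as a list of dict records rather than a Spark DataFrame, so `load_selected` and `union_all` are hypothetical helper names, not spark-avro API. The idea is the same: project every file onto the columns you need first, then union the projections.

```python
from itertools import chain


def load_selected(records, columns):
    """Project one file's records onto the requested columns.

    Extra fields that only some files carry are dropped here,
    so the later union never sees mismatched schemas.
    """
    return [{c: r[c] for c in columns} for r in records]


def union_all(files, columns):
    """Select first, then union: one projection per file, concatenated."""
    return list(chain.from_iterable(load_selected(f, columns) for f in files))


# Two "files" with inconsistent schemas: the second has an extra field.
old_file = [{"id": 1, "name": "a"}]
new_file = [{"id": 2, "name": "b", "email": "b@example.com"}]

rows = union_all([old_file, new_file], columns=["id", "name"])
```

In Spark, each per-file load would presumably become its own branch of the query plan, which is consistent with the very large DAG described above.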

@Gauravshah

Should be solvable by specifying a schema that is itself a union. Pull request #95 solves that.

@adilakhter

adilakhter commented Aug 23, 2016

Hi guys, is there any plan to fix this issue? We are currently facing the same problem.

@Gauravshah

Should be able to do this by specifying a schema while loading files: b078cca

@JoshRosen
Contributor

I think @Gauravshah is right: this should be addressed by the "custom read schema" support that is added in the forthcoming 3.1.0 release, so I'm going to go ahead and tentatively mark this as fixed.
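
To illustrate why reading every file against one schema helps here, below is a simplified plain-Python model of Avro-style schema resolution. A field missing from a record takes the reader schema's default, and a field the reader schema does not list is ignored. `READER_SCHEMA`, `NO_DEFAULT`, and `resolve` are hypothetical names chosen for this sketch. It is not the spark-avro implementation, and it leaves out type promotion, unions, and nested records.

```python
# Sentinel marking a reader field that has no default value.
NO_DEFAULT = object()

# Hypothetical reader schema: field name -> default value.
# "email" was added later, so it carries a default for older files.
READER_SCHEMA = {
    "id": NO_DEFAULT,
    "name": NO_DEFAULT,
    "email": None,
}


def resolve(record, reader_schema):
    """Return the record as seen through the reader schema."""
    out = {}
    for field, default in reader_schema.items():
        if field in record:
            # Field written by the file: keep its value.
            out[field] = record[field]
        elif default is not NO_DEFAULT:
            # Field absent from the file: fall back to the default.
            out[field] = default
        else:
            raise ValueError(f"record is missing required field {field!r}")
    # Fields present in the record but not in the reader schema are dropped.
    return out


old = resolve({"id": 1, "name": "a"}, READER_SCHEMA)
new = resolve(
    {"id": 2, "name": "b", "email": "b@example.com", "age": 30},
    READER_SCHEMA,
)
```

With this model, files written before and after a field was added both resolve to records of the same shape, so they can be read together without a per-file union.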

@JoshRosen JoshRosen added this to the 3.1.0 milestone Nov 27, 2016