This repository has been archived by the owner on Nov 11, 2022. It is now read-only.
Thanks for the report! From reading the linked threads, it seems likely that Dataflow will expose this underlying bug in Avro if Dataproc does so.
I know that your data is confidential, but do you think you could create a minimal reproduction that you can share? Ideally this would be a table shared with my @google.com address that I can export, or a script to create such a table in a BigQuery project I control. (I'd prefer a table rather than simply a copy of failing generated Avro files so we can experiment with the whole process.)
A few workarounds that come to mind:
You can disable our use of BigQuery's Avro exports by reverting the linked CL and recompiling, or by reverting to the 1.4.0 release of Dataflow.
Would you be willing to try upgrading to Avro 1.8.0? That was released very recently and may contain a fix (per the AVRO-695 summary, though I have not read the entire thread).
I see that this issue has been unresolved for several months... but Dataflow really likes the performance improvements we get from using Avro. I'd like to better understand the problem and get to a resolution.
Thanks for looking into this. Just shared a sample dataset with you. After exporting the table to GCS as Avro, avro-tools tojson <file> triggers the error.
Yes, I realized it's probably a bug in Avro rather than something specific to Dataflow or BigQuery, and I would love to resolve it and still get the other benefits of Avro.
I tried avro-tools 1.8.0 and got the same error. I haven't tried 1.8.0 in Dataflow yet, but I'm not sure it will make a difference.
Hi Neville, I believe the underlying bug in the BigQuery Avro file generator has been fixed. Thanks for the report and the reproduction -- this was crucial to successful resolution!
There's a change in Dataflow Java SDK 1.5.0 that exports BigQuery tables as Avro instead of JSON (4dce3c2), and that may cause a StackOverflowError with certain tables. It seems to be a defect in Avro:
http://stackoverflow.com/questions/24130615/circular-references-not-handled-in-avro
Related issue in Scio: spotify/scio#61
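For anyone following along, here is a minimal sketch of the general failure mode the linked Stack Overflow question describes: a recursive walk over a type graph that contains a cycle never terminates and ends in a StackOverflowError, while a walk that remembers visited nodes does terminate. This is not Avro's code, and nothing in this thread confirms it is the exact cause here. RecordType, naiveCount, safeCount, and naiveOverflows are hypothetical names made up for the illustration.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class CircularSchemaSketch {
    /** Hypothetical stand-in for a record schema; this is NOT Avro's Schema class. */
    static final class RecordType {
        final String name;
        final List<RecordType> fieldTypes = new ArrayList<>();

        RecordType(String name) {
            this.name = name;
        }
    }

    /** Naive walk: recurses into every field type, so a cycle recurses forever. */
    static int naiveCount(RecordType type) {
        int count = 1;
        for (RecordType field : type.fieldTypes) {
            count += naiveCount(field);
        }
        return count;
    }

    /** Cycle-safe walk: counts each distinct record type reachable from the root once. */
    static int safeCount(RecordType type) {
        return safeCount(type, new IdentityHashMap<>());
    }

    private static int safeCount(RecordType type, Map<RecordType, Boolean> seen) {
        if (seen.put(type, Boolean.TRUE) != null) {
            return 0; // already visited, so stop instead of recursing again
        }
        int count = 1;
        for (RecordType field : type.fieldTypes) {
            count += safeCount(field, seen);
        }
        return count;
    }

    /** Returns true if the naive walk blows the stack on the given type. */
    static boolean naiveOverflows(RecordType type) {
        try {
            naiveCount(type);
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Two record types that refer to each other: A -> B -> A.
        RecordType a = new RecordType("A");
        RecordType b = new RecordType("B");
        a.fieldTypes.add(b);
        b.fieldTypes.add(a);

        System.out.println("naive walk overflows: " + naiveOverflows(a));
        System.out.println("distinct record types: " + safeCount(a));
    }
}
```

The visited set is keyed by identity rather than by equals, since two record types with the same name are not necessarily the same node in the graph.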