This repository has been archived by the owner on Nov 11, 2022. It is now read-only.

java.lang.StackOverflowError with BigQuery input #152

Closed
nevillelyh opened this issue Mar 24, 2016 · 3 comments

@nevillelyh
Contributor

There's a change in Dataflow Java SDK 1.5.0 that exports BQ as Avro instead of JSON (4dce3c2) and that may cause StackOverflowError with certain tables. It seems to be a defect in Avro:
http://stackoverflow.com/questions/24130615/circular-references-not-handled-in-avro

Related issue in Scio: spotify/scio#61
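
The kind of failure described in the linked Stack Overflow question can be illustrated with a minimal sketch. It does not use Avro at all: `SchemaNode` and `CircularSchemaDemo` are hypothetical names introduced here, and whether the actual defect follows this exact pattern is an assumption. The sketch only shows how a naive recursive walk over a self-referential structure exhausts the stack, and how tracking visited nodes by identity avoids it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a record schema; this is NOT Avro's Schema class. */
final class SchemaNode {
    final String name;
    final List<SchemaNode> fields = new ArrayList<>();

    SchemaNode(String name) {
        this.name = name;
    }
}

final class CircularSchemaDemo {

    /** Naive depth-first count: recurses without bound if the graph has a cycle. */
    static int countNaive(SchemaNode node) {
        int total = 1;
        for (SchemaNode field : node.fields) {
            total += countNaive(field);
        }
        return total;
    }

    /** Cycle-safe count: each node is counted once, tracked by object identity. */
    static int countVisited(SchemaNode node, Set<SchemaNode> seen) {
        if (!seen.add(node)) {
            return 0; // already visited, so stop recursing here
        }
        int total = 1;
        for (SchemaNode field : node.fields) {
            total += countVisited(field, seen);
        }
        return total;
    }

    static int countDistinct(SchemaNode root) {
        Set<SchemaNode> seen =
            Collections.newSetFromMap(new IdentityHashMap<SchemaNode, Boolean>());
        return countVisited(root, seen);
    }

    /** Returns true if the naive traversal ends in a StackOverflowError. */
    static boolean naiveOverflows(SchemaNode root) {
        try {
            countNaive(root);
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        SchemaNode parent = new SchemaNode("parent");
        SchemaNode child = new SchemaNode("child");
        parent.fields.add(child);
        child.fields.add(parent); // circular reference back to the parent

        System.out.println("naive overflows: " + naiveOverflows(parent));
        System.out.println("distinct nodes: " + countDistinct(parent));
    }
}
```

The identity-based set is a deliberate choice in this sketch: it avoids calling `equals` or `hashCode` on the nodes, which could themselves recurse if they were defined structurally.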

dhalperi self-assigned this Mar 24, 2016
dhalperi added the bug label Mar 24, 2016

@dhalperi
Contributor

Hi Neville,

Thanks for the report! From reading the linked threads, it seems likely that Dataflow will expose this underlying bug in Avro if Dataproc does so.

I know that your data is confidential, but do you think you could create a minimal reproduction that you can share? Ideally this would be a table shared with my @google.com address that I can export, or a script to create such a table in a BigQuery project I control. (I'd prefer a table rather than simply a copy of failing generated Avro files so we can experiment with the whole process.)

A few workarounds that come to mind:

  • You can disable our use of BigQuery's Avro exports by reverting the linked CL and recompiling, or by going back to the 1.4.0 release of Dataflow.
  • Would you be willing to try upgrading to Avro 1.8.0? That was released very recently and may contain a fix (per AVRO-695 summary, though I have not read the entire thread).

I see that this issue has been unresolved for several months... but Dataflow really likes the performance improvements we get from using Avro. I'd like to better understand the problem and get to a resolution.

Thanks!
Dan

@nevillelyh
Contributor Author

Thanks for looking into this. Just shared a sample dataset with you. After exporting the table to GCS as Avro, avro-tools tojson <file> triggers the error.

Yes, I realized it's probably a bug in Avro rather than something specific to Dataflow or BigQuery. I would love to resolve it and still get the other benefits of Avro.

I tried avro-tools 1.8.0 and got the same error. I haven't tried Avro 1.8.0 in Dataflow yet, but I'm not sure it will make a difference.

@dhalperi
Contributor

Hi Neville, I believe the underlying bug in the BigQuery Avro file generator has been fixed. Thanks for the report and the reproduction -- this was crucial to successful resolution!
