Writes for Dataframes with more than 10000 partitions fail #248

Closed
davidrabinowitz opened this issue Sep 30, 2020 · 5 comments

@davidrabinowitz
Member

When trying to write DataFrames with more than 10,000 partitions, the load job fails with a "Too many sources provided: XXXX. Limit is 10000" error. This should be fixed by providing a better URI to the load job.
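
For context, a minimal sketch (an assumed example, not taken from this issue) of the kind of write that can hit the limit on the indirect write path, where each Spark partition is written as a file to a temporary GCS bucket and then loaded into BigQuery; the bucket and table names are placeholders:

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: a heavily partitioned DataFrame written through the connector's
// indirect (GCS + load job) path. Each partition becomes one file in GCS, and
// the BigQuery load job rejects more than 10,000 source URIs.
val spark = SparkSession.builder().appName("bq-write-sketch").getOrCreate()

// A DataFrame with well over 10,000 partitions.
val df = spark.range(0L, 100000000L).repartition(20000)

df.write
  .format("bigquery")
  .option("temporaryGcsBucket", "my-temp-bucket") // placeholder bucket name
  .save("my_dataset.my_table")                    // placeholder table name
```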

davidrabinowitz self-assigned this Sep 30, 2020
@anton-statutov-booking

Hello, I just stumbled upon this problem. Do you know a good workaround?

davidrabinowitz added a commit to davidrabinowitz/spark-bigquery-connector that referenced this issue Jan 11, 2021
@davidrabinowitz
Member Author

Fixed in version 0.18.1

@anton-statutov-booking

Thanks. I've tested it - the fix works.

@asiunov

asiunov commented Mar 18, 2022

The solution in #295 may not work in all cases, e.g. when there are many skipped partitions (Spark usually doesn't write empty parts). The easiest workaround is to do `df.coalesce(10000).write...`.
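
A sketch of that workaround, assuming `df` is the DataFrame being written and using the same placeholder bucket and table names as above:

```scala
// Workaround sketch: cap the number of output files at BigQuery's 10,000
// load-job source limit before writing. coalesce avoids a full shuffle;
// repartition(10000) would also work if the resulting partitions are too skewed.
df.coalesce(10000)
  .write
  .format("bigquery")
  .option("temporaryGcsBucket", "my-temp-bucket") // placeholder bucket name
  .save("my_dataset.my_table")                    // placeholder table name
```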

@yeshvantbhavnasi

I also see this error popping up with the latest version of the bigquery-connector.
