Writes for Dataframes with more than 10000 partitions fail #248

Closed
davidrabinowitz opened this issue Sep 30, 2020 · 5 comments

@davidrabinowitz
Member

When trying to write DataFrames with more than 10,000 partitions, the load job fails with a "Too many sources provided: XXXX. Limit is 10000" error. This should be fixed by providing a better URI to the load job.
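
For context, a minimal sketch (an assumed example, not taken from this issue) of the kind of write that can hit the limit on the indirect write path, where each Spark partition is written as a file to a temporary GCS bucket and then loaded into BigQuery; the bucket and table names are placeholders:

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: a heavily partitioned DataFrame written through the connector's
// indirect (GCS + load job) path. Each partition becomes one file in GCS, and
// the BigQuery load job rejects more than 10,000 source URIs.
val spark = SparkSession.builder().appName("bq-write-sketch").getOrCreate()

// A DataFrame with well over 10,000 partitions.
val df = spark.range(0L, 100000000L).repartition(20000)

df.write
  .format("bigquery")
  .option("temporaryGcsBucket", "my-temp-bucket") // placeholder bucket name
  .save("my_dataset.my_table")                    // placeholder table name
```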

davidrabinowitz self-assigned this Sep 30, 2020
@anton-statutov-booking

Hello, I just stumbled upon this problem. Do you know a good workaround?

davidrabinowitz added a commit to davidrabinowitz/spark-bigquery-connector that referenced this issue Jan 11, 2021
@davidrabinowitz
Member Author

Fixed in version 0.18.1

@anton-statutov-booking

Thanks. I've tested it - the fix works.

@asiunov

asiunov commented Mar 18, 2022

The solution in #295 may not work in all cases, e.g. when there are many skipped partitions (Spark usually doesn't write empty parts). The easiest workaround is to do `df.coalesce(10000).write...`.
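
A sketch of that workaround, assuming `df` is the DataFrame being written and using the same placeholder bucket and table names as above:

```scala
// Workaround sketch: cap the number of output files at BigQuery's 10,000
// load-job source limit before writing. coalesce avoids a full shuffle;
// repartition(10000) would also work if the resulting partitions are too skewed.
df.coalesce(10000)
  .write
  .format("bigquery")
  .option("temporaryGcsBucket", "my-temp-bucket") // placeholder bucket name
  .save("my_dataset.my_table")                    // placeholder table name
```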

@yeshvantbhavnasi

I also see this error popping up with the latest version of the bigquery-connector.
