
[BigQuery] appends to tables can be asynchronous when doing copy jobs #39120

Merged
stankiewicz merged 1 commit into apache:master from stankiewicz:improve_copy_tables on Jun 26, 2026


@stankiewicz

Contributor

When the write disposition is WRITE_APPEND, there is no need for each copy job to be synchronous.
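To make the reasoning concrete, here is a minimal plain-Python sketch of the per-destination decision. `plan_copy_jobs` and `CopyJobPlan` are hypothetical names, not Beam's actual copy-job code. The string values mirror Beam's `BigQueryDisposition` constants. The `else` branch, where later jobs for an already-observed table append without waiting, is an assumption, because this PR does not show it. The intuition is that appends to the same table can land in any order. A first job that replaces the table (WRITE_TRUNCATE) or requires it to be empty (WRITE_EMPTY) has to finish before the appends that follow it.

```python
from typing import Iterable, List, NamedTuple, Set

# Stand-ins for the string constants on Beam's BigQueryDisposition.
WRITE_APPEND = 'WRITE_APPEND'
WRITE_TRUNCATE = 'WRITE_TRUNCATE'
WRITE_EMPTY = 'WRITE_EMPTY'


class CopyJobPlan(NamedTuple):
  destination: str
  write_disposition: str
  wait_for_job: bool


def plan_copy_jobs(destinations: Iterable[str],
                   configured_disposition: str) -> List[CopyJobPlan]:
  """Hypothetical sketch: choose a disposition and wait flag per copy job."""
  observed: Set[str] = set()
  plans: List[CopyJobPlan] = []
  for dest in destinations:
    if dest not in observed:
      observed.add(dest)
      disposition = configured_disposition
      # The change in this PR: only a non-append first job must complete
      # before later jobs for the same table are issued.
      wait = disposition != WRITE_APPEND
    else:
      # Assumption: jobs for an already-seen table always append, no wait.
      disposition = WRITE_APPEND
      wait = False
    plans.append(CopyJobPlan(dest, disposition, wait))
  return plans


for plan in plan_copy_jobs(['t1', 't1', 't2'], WRITE_TRUNCATE):
  print(plan)
```

With WRITE_APPEND as the configured disposition, every plan comes back with `wait_for_job=False`. Under the old behavior, the first job for each table always waited.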


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests
Go tests

See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.

@stankiewicz stankiewicz requested a review from shunping June 26, 2026 08:10
@gemini-code-assist

Contributor

Summary of Changes

Hello, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request optimizes BigQuery data loading by removing the requirement for synchronous job completion when appending data to tables. By allowing these copy jobs to run asynchronously, the pipeline can improve throughput and reduce latency during write operations.

Highlights

  • Asynchronous BigQuery Copy Jobs: Updated the BigQuery file loads logic to allow asynchronous execution for copy jobs when the write disposition is set to 'WRITE_APPEND'.
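The following toy sketch shows where the throughput gain in "asynchronous execution" comes from. It uses a thread pool to stand in for BigQuery job submission. It assumes that any job not waited on inline is still waited on later, before the write is considered complete. `run_copy_jobs` and the `(destination, wait_for_job)` pairs are hypothetical. Beam polls real BigQuery jobs rather than using futures.

```python
import concurrent.futures
from typing import Callable, List, Sequence, Tuple

# Hypothetical job spec: (destination table, wait_for_job flag).
JobSpec = Tuple[str, bool]


def run_copy_jobs(specs: Sequence[JobSpec],
                  copy_fn: Callable[[str], str],
                  max_workers: int = 4) -> List[str]:
  """Submits every job, blocking inline only on jobs flagged wait_for_job."""
  results: List[str] = []
  pending = []
  with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
    for destination, wait_for_job in specs:
      future = pool.submit(copy_fn, destination)
      if wait_for_job:
        # Synchronous path: nothing else is submitted until this job is done.
        results.append(future.result())
      else:
        # Asynchronous path: keep submitting; collect the result afterwards.
        pending.append(future)
    # Wait for all jobs that were not waited on inline.
    results.extend(f.result() for f in pending)
  return results


print(run_copy_jobs([('t1', True), ('t2', False), ('t3', False)], str.upper))
```

When every flag is True, the jobs run strictly one after another. With WRITE_APPEND, the flags are False and the jobs can overlap.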

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Code Review (/gemini review): Performs a code review for the current pull request in its current state.
Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor


Code Review

This pull request updates the BigQuery file loads logic to determine whether to wait for a job based on the write disposition, specifically setting wait_for_job to false when the disposition is 'WRITE_APPEND'. It also triggers the Python post-commit test suite. The review feedback recommends using the BigQueryDisposition.WRITE_APPEND constant instead of a hardcoded string to avoid magic strings and maintain consistency.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

 if full_table_ref not in self._observed_tables:
   write_disposition = self.write_disposition
-  wait_for_job = True
+  wait_for_job = write_disposition != 'WRITE_APPEND'

Contributor


medium

Using the BigQueryDisposition.WRITE_APPEND constant instead of the hardcoded string 'WRITE_APPEND' is preferred to avoid magic strings and maintain consistency with the rest of the codebase.

Suggested change
- wait_for_job = write_disposition != 'WRITE_APPEND'
+ wait_for_job = write_disposition != BigQueryDisposition.WRITE_APPEND
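This sketch shows why the suggested constant is safer than the literal. The class below is a local stand-in that mirrors the string constants on Beam's `BigQueryDisposition`, which can be imported from `apache_beam.io.gcp.bigquery`. `should_wait_for_job` is a hypothetical helper. A misspelled literal such as 'WRITE_APEND' would silently compare unequal and turn every append back into a synchronous job. A misspelled attribute fails loudly.

```python
class BigQueryDisposition:
  """Local stand-in mirroring the constants on Beam's BigQueryDisposition."""
  WRITE_TRUNCATE = 'WRITE_TRUNCATE'
  WRITE_APPEND = 'WRITE_APPEND'
  WRITE_EMPTY = 'WRITE_EMPTY'


def should_wait_for_job(write_disposition: str) -> bool:
  # Comparing against the named constant keeps this check consistent with the
  # rest of the code base; a typo in the name raises AttributeError when run.
  return write_disposition != BigQueryDisposition.WRITE_APPEND


print(should_wait_for_job('WRITE_TRUNCATE'), should_wait_for_job('WRITE_APPEND'))

try:
  BigQueryDisposition.WRITE_APEND  # deliberate typo
except AttributeError as err:
  print('typo caught:', err)
```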

@codecov

codecov Bot commented Jun 26, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 58.67%. Comparing base (eef26ca) to head (97acf84).

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #39120      +/-   ##
============================================
+ Coverage     58.62%   58.67%   +0.05%     
  Complexity    15246    15246              
============================================
  Files          2769     2770       +1     
  Lines        275617   276050     +433     
  Branches      12163    12163              
============================================
+ Hits         161577   161974     +397     
- Misses       107621   107657      +36     
  Partials       6419     6419              
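The percentages above can be reproduced from the Hits, Misses and Partials rows. This sketch assumes Codecov's default settings: coverage is hits / (hits + misses + partials), kept to two decimal places and rounded down. Conventional rounding would report the head as 58.68% rather than 58.67%. `coverage_pct` is a hypothetical helper, not part of any Codecov API.

```python
from decimal import ROUND_DOWN, Decimal


def coverage_pct(hits: int, misses: int, partials: int) -> Decimal:
  """Coverage percentage, truncated to two decimals (assumed Codecov default)."""
  total = hits + misses + partials  # equals the "Lines" row
  pct = Decimal(hits) * 100 / Decimal(total)
  return pct.quantize(Decimal('0.01'), rounding=ROUND_DOWN)


base = coverage_pct(161577, 107621, 6419)  # master (eef26ca), 275617 lines
head = coverage_pct(161974, 107657, 6419)  # PR head (97acf84), 276050 lines
print(base, head, head - base)  # → 58.62 58.67 0.05
```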
Flag Coverage Δ
python 79.36% <100.00%> (+0.06%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.


@shunping shunping left a comment

Collaborator


LGTM, but waiting for the tests. Thanks!

@github-actions

Contributor

Assigning reviewers:

R: @shunping for label python.

Note: If you would like to opt out of this review, comment assign to next reviewer.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

@stankiewicz stankiewicz merged commit 0c51b0d into apache:master Jun 26, 2026
117 checks passed
ash6898 pushed a commit to ash6898/beam that referenced this pull request Jun 28, 2026
ash6898 pushed a commit to ash6898/beam that referenced this pull request Jun 29, 2026
stankiewicz added a commit to stankiewicz/beam that referenced this pull request Jun 29, 2026

2 participants