Apache Airflow Provider(s)
google
What happened?
BigQueryStreamingBufferEmptySensor (added in #66652) decides the streaming buffer is empty by checking table.streaming_buffer is None. That check is unreliable because BigQuery's streamingBuffer table-metadata field is eventually consistent.
For a window of several seconds after a streaming insert, the row is physically in the streaming buffer but table.streaming_buffer is still None. During that window the sensor reports the buffer empty — a false negative.
streaming_buffer is None is therefore ambiguous — it means either:
- "fully flushed / table never had streamed rows", or
- "rows were just streamed in, metadata hasn't caught up yet".
The sensor cannot tell these apart, so a DML task placed downstream of it (UPDATE/DELETE/MERGE) can still hit "UPDATE or DELETE statement over table ... would affect rows in the streaming buffer" — the exact error the sensor exists to prevent.
Evidence
Empirical timing check against real BigQuery — stream one row, then poll get_table().streaming_buffer:
t= 1.2s streaming_buffer = None -> sensor reports EMPTY (false)
t= 11.6s streaming_buffer present (estimated_rows=1) -> sensor WAITs (correct)
t= 22s..302s still present, estimated_rows=1 -> stays correct
There is a ~10–12s false-empty window. System tests run with the simulated executor (tasks run back-to-back, near-zero scheduling overhead), so the sensor's first poke fires ~1–2s after the upstream streaming-insert task finishes — squarely inside that window.
What you think should happen instead
The sensor should not report an empty buffer while rows are still buffered. Possible directions to evaluate:
- Require the sensor to observe a non-empty → empty transition rather than trusting a single None reading.
- Otherwise, clearly document the eventual-consistency limitation in the sensor docstring and treat it as best-effort, so users know to add their own guard between a streaming insert and the sensor.
Acceptance criteria: either the sensor no longer false-passes immediately after a streaming insert, or the limitation is clearly documented — at which point the workaround below can be removed.
Workaround currently in place
The example_bigquery_sensors system test's streaming_insert task polls get_table().streaming_buffer until it becomes non-None before completing, so the metadata is caught up before check_streaming_buffer_empty runs its first poke. This makes the system test deterministic but does not fix the sensor for real users. It is tagged in the test with a link back to this issue.
How to reproduce
- Create a BigQuery table.
- Stream a row into it (BigQueryHook.insert_all).
- Immediately call BigQueryStreamingBufferEmptySensor.poke() — it returns True (empty) although the row is in the streaming buffer.
Are you willing to submit PR?
Code of Conduct
Drafted-by: Claude Code (Opus 4.7) (human reviewed)