Skip to content

Bug/datafusion query#236

Merged
thorrester merged 3 commits intomainfrom
bug/datafusion-query
Mar 18, 2026
Merged

Bug/datafusion query#236
thorrester merged 3 commits intomainfrom
bug/datafusion-query

Conversation

@thorrester
Copy link
Copy Markdown
Member

@thorrester thorrester commented Mar 18, 2026

Pull Request

Short Summary

Fixes stale DataFusion query results caused by SessionContext caching file metadata after Delta Lake writes. After each write, update_datafusion_session() is now called to re-register the table provider with the latest Delta log state. Also refactors ObjectStore construction to use a shared instance across the trace engine, summary engine, and control table — preventing divergent caching on cloud backends.

Context

Root cause — stale session state after writes

DataFusion's SessionContext caches the Delta table's file metadata at registration time. Before this fix, each write would update the in-memory DeltaTable reference but leave the session holding a stale snapshot of the Delta log. Subsequent queries would silently return results from the pre-write state.

The fix calls update_datafusion_session() after every write in TraceSpanDBEngine, TraceSummaryDBEngine, and ControlTableEngine:

Before:

*table_guard = updated_table;

After:

updated_table.update_datafusion_session(&self.ctx.state())?;
*table_guard = updated_table;

Shared ObjectStore refactor

ObjectStore is now constructed once in TraceSpanDBEngine and passed down to TraceSummaryDBEngine and ControlTableEngine rather than each engine constructing its own from ObjectStorageSettings. On cloud backends (GCS/S3), each ObjectStore instance has its own caching layer — separate instances could return divergent views of the same bucket. Using a shared instance ensures all engines see consistent state.

TraceSpanService now exposes object_store: ObjectStore so callers (e.g., setup.rs) can pass the same instance to TraceSummaryService.

update_incremental removed from summary write path

table_guard.update_incremental() was removed from TraceSummaryDBEngine::write_summary(). This engine runs as a single-writer actor — no other process commits to its Delta table, so the in-memory state is always current. Calling update_incremental before a write could corrupt the DeltaTable snapshot in intermediate error states, producing stale reads until restart.

Additional fixes:

  • ControlTableEngine::write_task_update now propagates unexpected delete errors instead of silently swallowing them (only "no data / empty table" errors are suppressed as expected on a new table)
  • micros_to_datetime replaced Utc.timestamp_opt(...).unwrap() with DateTime::from_timestamp_micros(...).ok_or(...) — returns a proper error on out-of-range timestamps instead of panicking
  • extract_map_attributes in summary.rs replaces unwrap() calls with ?-style let-else returns with warn! logging
  • Duration calculation clamps to max(0) to avoid negative durations on clock skew
  • debug_recent_traces endpoint added at GET /trace/debug/recent — returns the last 10 traces from the past 24 hours (dev/debug aid)
  • type: ignore added to set_tracer_provider call in Python tracing module to suppress mypy false positive

Regression tests added:

  • test_span_write_visibility_across_multiple_writes — verifies span writes are immediately visible across sequential writes
  • test_summary_write_visibility_across_multiple_writes — same for summary writes
File Change
scouter_dataframe/src/parquet/control/engine.rs Accept &ObjectStore instead of &ObjectStorageSettings; propagate delete errors; error-returning Arrow helpers; debug_assert on task_name
scouter_dataframe/src/parquet/tracing/engine.rs Call update_datafusion_session() after every write; pass shared object_store to control engine
scouter_dataframe/src/parquet/tracing/service.rs Expose object_store field; add test_span_write_visibility_across_multiple_writes
scouter_dataframe/src/parquet/tracing/summary.rs Accept &ObjectStore; remove update_incremental; fix micros_to_datetime; replace unwrap in extract_map_attributes; add visibility regression test
scouter_server/src/api/routes/trace/route.rs Add debug_recent_traces handler and route
scouter_server/src/api/setup.rs Pass trace_service.object_store to TraceSummaryService
py-scouter/python/scouter/tracing/__init__.py Add # type: ignore to suppress mypy false positive
py-scouter/examples/tracing/README.md New tracing example README
py-scouter/examples/tracing/instrumentor_example.py New Python tracing instrumentation example

Is this a Breaking Change?

No. All public interfaces (TraceSpanService, TraceSummaryService) retain compatible signatures internally. The ObjectStore refactor is internal to the engine constructors. The new debug route is purely additive.


Open with Devin

@thorrester thorrester marked this pull request as ready for review March 18, 2026 01:51
Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 6 additional findings.

Open in Devin Review

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Mar 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.97%. Comparing base (f2f1582) to head (e838246).
⚠️ Report is 8 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #236      +/-   ##
==========================================
+ Coverage   77.38%   77.97%   +0.59%     
==========================================
  Files          19       19              
  Lines         336      336              
==========================================
+ Hits          260      262       +2     
+ Misses         76       74       -2     
Files with missing lines Coverage Δ
py-scouter/python/scouter/tracing/__init__.py 75.97% <100.00%> (ø)

... and 1 file with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@thorrester thorrester merged commit a5a0db3 into main Mar 18, 2026
24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants