Fix DuckDB staging table schema evolution support #605
Merged
- Updated the DuckDBTableManager to automatically add new columns from parquet files to the existing table schema during ingestion.
- Adjusted the DuckDBStager to set drop_on_retry to False for improved stability during retries.
Summary
Refactors the DuckDB table management layer to properly handle schema evolution in staging tables. Previously, staging tables could fail or produce incorrect results when the schema of incoming data diverged from the existing table definition. This change ensures that new or altered columns are gracefully incorporated without requiring manual table drops or migrations.
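The column-addition step described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `plan_add_columns` and `quote_ident` are hypothetical helpers, and the sketch assumes both schemas are already available as name-to-type mappings (in DuckDB these could be gathered with something like `DESCRIBE` on the table and on a `read_parquet(...)` query). It only covers new columns; type changes are not handled here.

```python
from typing import Dict, List


def quote_ident(name: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def plan_add_columns(
    table: str,
    existing: Dict[str, str],
    incoming: Dict[str, str],
) -> List[str]:
    """Return ALTER TABLE statements for columns present only in `incoming`.

    Hypothetical helper: `existing` and `incoming` map column name -> SQL type.
    Names are compared case-insensitively, since DuckDB identifiers are
    case-insensitive.
    """
    existing_lower = {name.lower() for name in existing}
    statements = []
    for name, col_type in incoming.items():
        if name.lower() not in existing_lower:
            statements.append(
                f"ALTER TABLE {quote_ident(table)} "
                f"ADD COLUMN {quote_ident(name)} {col_type}"
            )
    return statements


if __name__ == "__main__":
    # Illustrative schemas; table and column names are made up.
    existing = {"id": "BIGINT", "name": "VARCHAR"}
    incoming = {"id": "BIGINT", "name": "VARCHAR", "created_at": "TIMESTAMP"}
    for stmt in plan_add_columns("staging_entity", existing, incoming):
        print(stmt)
```

When the incoming schema matches the existing one, the plan is empty and the table is left untouched, which is consistent with the backward-compatibility claim in this PR.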
Key Accomplishments
- robosystems/graph_api/core/duckdb/manager.py: robust logic (~50 new lines) to detect schema mismatches between incoming data and existing tables, and to automatically reconcile them (e.g., adding new columns, handling type changes).
Breaking Changes
None expected. The changes are additive and backward-compatible: existing tables with matching schemas will continue to work as before. Tables with schema mismatches that previously would have errored will now be evolved automatically.
Testing Notes
Infrastructure Considerations
🤖 Generated with Claude Code
Branch Info:
bugfix/duckdb-staging-evolution -> main
Co-Authored-By: Claude <noreply@anthropic.com>