Skip to content

Import workflow fixes#1886

Merged
vish-cs merged 1 commit intodatacommonsorg:masterfrom
vish-cs:itest
Mar 13, 2026
Merged

Import workflow fixes#1886
vish-cs merged 1 commit intodatacommonsorg:masterfrom
vish-cs:itest

Conversation

@vish-cs
Copy link
Copy Markdown
Contributor

@vish-cs vish-cs commented Feb 10, 2026

  • Use PENDING state to track imports with failed ingestion for retry: update spanner-ingestion-workflow to mark imports as PENDING after a failure.
  • Update spanner client to advance the stale read timestamp only if there are no PENDING imports. Refactor update_ingestion_history into multiple methods updating each table.
  • Update get_import_info to support importList as override: this enables running ingestion for specific imports
  • Use cloud scheduler job to determine next_refresh instead of passing schedule string in the import config.
  • Handle duplicate events in CDA feed handler. Feed handler can have duplicate messages delivered so use GCS to detect duplicates
  • Reactored aggregation-helper/main.py to move helper functions to import_herlper.py
  • Update import automation workflow to add default resources and job name to avoid passing them in import config.
  • Add handling for provenance mcf generation: Copy provenance MCF files from GCS. Generate a default metadata file if none found
  • Add spanner schema for import status metadata tables
  • Add a field to track failed ingestion in IngestionHistory table for mixer to read stale read timestamp
  • Cloud build config fixes
  • Other clean up: Add python doc strings for various methods

@gemini-code-assist

This comment was marked as outdated.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request aims to fix and enhance the import workflow by adding retry logic and improving import list fetching. However, it introduces significant security vulnerabilities, including path traversal in StorageClient methods, which could allow writing blobs to arbitrary GCS locations, and an MCF injection vulnerability due to unsanitized user input in metadata generation. Furthermore, a critical bug in ingestion-helper/main.py prevents import status updates, and there's a medium-severity suggestion for code clarity. It is imperative to address these security flaws and functional issues to ensure the workflow's integrity and secure operation.

Comment thread import-automation/workflow/ingestion-helper/main.py
Comment thread import-automation/workflow/ingestion-helper/storage_client.py Outdated
Comment thread import-automation/workflow/ingestion-helper/storage_client.py Outdated
Comment thread import-automation/workflow/ingestion-helper/storage_client.py Outdated
Comment thread import-automation/workflow/import-helper/main.py Outdated
@vish-cs vish-cs force-pushed the itest branch 2 times, most recently from 78e3ba7 to 4ebba11 Compare February 11, 2026 08:10
@vish-cs vish-cs requested a review from ajaits February 11, 2026 09:30
@vish-cs vish-cs force-pushed the itest branch 4 times, most recently from 2787715 to 4f7ad69 Compare February 17, 2026 16:50
@vish-cs vish-cs force-pushed the itest branch 2 times, most recently from e87ff84 to a27bca7 Compare February 26, 2026 05:55
Comment thread import-automation/workflow/import-helper/import_helper.py
Comment thread import-automation/workflow/ingestion-helper/import_utils.py
Comment thread import-automation/workflow/ingestion-helper/storage_client.py Outdated
Comment thread import-automation/workflow/ingestion-helper/storage_client.py Outdated
Comment thread import-automation/workflow/ingestion-helper/spanner_client.py
Comment thread import-automation/workflow/ingestion-helper/storage_client.py Outdated
@vish-cs vish-cs merged commit b36ac62 into datacommonsorg:master Mar 13, 2026
9 checks passed
@vish-cs vish-cs deleted the itest branch April 27, 2026 10:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants