Skip to content

Conversation

@darynaishchenko
Copy link
Contributor

@darynaishchenko darynaishchenko commented Nov 14, 2025

Source SFTP Bulk overrides source uri and doesn't use default uri for this purposes -https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-sftp-bulk/source_sftp_bulk/stream_reader.py#L182

Summary by CodeRabbit

  • Refactor
    • Updated file reference handling to consistently track source URIs during file uploads.

@darynaishchenko darynaishchenko self-assigned this Nov 14, 2025
@github-actions github-actions bot added the enhancement New feature or request label Nov 14, 2025
@github-actions
Copy link

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@daryna/file-based/add-source-uri-to-uploadable-remote-file#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch daryna/file-based/add-source-uri-to-uploadable-remote-file

Helpful Resources

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /prerelease - Triggers a prerelease publish with default arguments
  • /poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
  • /poe <command> - Runs any poe command in the CDK environment

📝 Edit this welcome message.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 14, 2025

📝 Walkthrough

Walkthrough

Added a new source_uri property to UploadableRemoteFile that exposes the underlying URI, then updated AirbyteRecordMessageFileReference construction to use this new property instead of file.uri for populating the source URI field.

Changes

Cohort / File(s) Summary
Add source_uri property
airbyte_cdk/sources/file_based/remote_file.py
Added public source_uri property to UploadableRemoteFile that returns the same URI as self.uri
Update file reference construction
airbyte_cdk/sources/file_based/file_based_stream_reader.py
Updated AirbyteRecordMessageFileReference to populate source_uri field from file.source_uri instead of file.uri

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

  • The changes are minimal and straightforward: adding a simple property that mirrors existing functionality and updating a single reference point to use it
  • No complex logic or control flow changes involved

Possibly related PRs

Suggested reviewers

  • maxi297

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: adding a source_uri property to UploadableRemoteFile. It's specific, concise, and reflects the primary purpose of the changeset.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch daryna/file-based/add-source-uri-to-uploadable-remote-file

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5d9125f and a087a9a.

📒 Files selected for processing (2)
  • airbyte_cdk/sources/file_based/file_based_stream_reader.py (1 hunks)
  • airbyte_cdk/sources/file_based/remote_file.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-11-15T01:04:21.272Z
Learnt from: aaronsteers
Repo: airbytehq/airbyte-python-cdk PR: 58
File: airbyte_cdk/cli/source_declarative_manifest/_run.py:62-65
Timestamp: 2024-11-15T01:04:21.272Z
Learning: The files in `airbyte_cdk/cli/source_declarative_manifest/`, including `_run.py`, are imported from another repository, and changes to these files should be minimized or avoided when possible to maintain consistency.

Applied to files:

  • airbyte_cdk/sources/file_based/file_based_stream_reader.py
🧬 Code graph analysis (1)
airbyte_cdk/sources/file_based/file_based_stream_reader.py (1)
airbyte_cdk/sources/file_based/remote_file.py (1)
  • source_uri (60-64)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: Check: source-intercom
  • GitHub Check: Check: source-pokeapi
  • GitHub Check: Check: destination-motherduck
  • GitHub Check: Check: source-google-drive
  • GitHub Check: Check: source-shopify
  • GitHub Check: Check: source-hardcoded-records
  • GitHub Check: Manifest Server Docker Image Build
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.12, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.13, Ubuntu)
  • GitHub Check: Analyze (python)
🔇 Additional comments (2)
airbyte_cdk/sources/file_based/file_based_stream_reader.py (1)

210-210: Perfect! This correctly uses the new configurable source URI.

The change from file.uri to file.source_uri properly leverages the new property, allowing subclasses of UploadableRemoteFile to customize the source URI as needed.

airbyte_cdk/sources/file_based/remote_file.py (1)

58-64: LGTM! Nice solution for allowing Source SFTP Bulk to customize the source URI.

The property implementation looks good and maintains backward compatibility by defaulting to self.uri. I verified that file.uri usage across the codebase is consistent and contextually appropriate—mostly for file tracking, logging, and filtering—so this new property won't conflict with existing patterns.

Quick observation: This property is quite similar to file_uri_for_logging (lines 52-57). Would it be worth adding a brief comment distinguishing their purposes (e.g., one for logging output, the other for record data)? Just thinking it might help future maintainers understand when to use which, wdyt?


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link

PyTest Results (Fast)

3 813 tests  ±0   3 801 ✅ ±0   6m 36s ⏱️ +8s
    1 suites ±0      12 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit a087a9a. ± Comparison against base commit 5d9125f.

@github-actions
Copy link

PyTest Results (Full)

3 816 tests  ±0   3 804 ✅ ±0   10m 51s ⏱️ +4s
    1 suites ±0      12 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit a087a9a. ± Comparison against base commit 5d9125f.

@darynaishchenko darynaishchenko merged commit 80b7668 into main Nov 19, 2025
28 of 31 checks passed
@darynaishchenko darynaishchenko deleted the daryna/file-based/add-source-uri-to-uploadable-remote-file branch November 19, 2025 15:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants