Skip to content

Conversation

@darynaishchenko
Copy link
Contributor

@darynaishchenko darynaishchenko commented Nov 12, 2025

https://github.com/airbytehq/airbyte-internal-issues/issues/15119#issue-3583612496

Summary by CodeRabbit

  • Bug Fixes
    • Improved error messages when files exceed size limits to include the file URI in both user-facing and internal messages for clearer troubleshooting.

@darynaishchenko darynaishchenko self-assigned this Nov 12, 2025
@github-actions
Copy link

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@daryna/file-based/add-file-uri-to-size-limit-error-message#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch daryna/file-based/add-file-uri-to-size-limit-error-message

Helpful Resources

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /prerelease - Triggers a prerelease publish with default arguments
  • /poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
  • /poe <command> - Runs any poe command in the CDK environment

📝 Edit this welcome message.

@github-actions github-actions bot added the bug Something isn't working label Nov 12, 2025
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 12, 2025

📝 Walkthrough

Walkthrough

Error message in the file-based stream reader was updated to include the file URI when a file size exceeds the configured limit; exception type and control flow are unchanged.

Changes

Cohort / File(s) Summary
Error message enhancement
airbyte_cdk/sources/file_based/file_based_stream_reader.py
Appended "File URI: {file.uri}" to the size-exceeded error messages (user-facing and internal) without changing exception types or logic.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

  • Quick check that the URI interpolation produces expected output and avoids leaking sensitive data, wdyt?

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change—updating the error message for FileSizeLimitError to include file URI information, which aligns with the changeset and source branch name.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch daryna/file-based/add-file-uri-to-size-limit-error-message

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
airbyte_cdk/sources/file_based/file_based_stream_reader.py (2)

175-178: Consider using file_uri_for_logging for consistency?

I noticed that the logging statements on lines 189 and 197 use file.file_uri_for_logging instead of file.uri. Would it make sense to use file.file_uri_for_logging here as well for consistency, wdyt?

-            message = f"File size exceeds the {self.FILE_SIZE_LIMIT / 1e9} GB limit. File URI: {file.uri}"
+            message = f"File size exceeds the {self.FILE_SIZE_LIMIT / 1e9} GB limit. File URI: {file.file_uri_for_logging}"

175-178: Could we also include the actual file size in the error message?

Adding the actual file size alongside the limit might help users understand how much they're over the limit. Something like: "File size (X.XX GB) exceeds the 1.5 GB limit. File URI: ...", wdyt?

-            message = f"File size exceeds the {self.FILE_SIZE_LIMIT / 1e9} GB limit. File URI: {file.uri}"
+            message = f"File size ({file_size / 1e9:.2f} GB) exceeds the {self.FILE_SIZE_LIMIT / 1e9} GB limit. File URI: {file.uri}"
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f0443aa and 4a20e5e.

📒 Files selected for processing (1)
  • airbyte_cdk/sources/file_based/file_based_stream_reader.py (1 hunks)
🧰 Additional context used
🪛 GitHub Actions: Linters
airbyte_cdk/sources/file_based/file_based_stream_reader.py

[error] 172-178: ruff format check failed: 1 file would be reformatted, 773 files already formatted. Run 'ruff format .' to fix code style issues.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
  • GitHub Check: Check: source-intercom
  • GitHub Check: Check: source-pokeapi
  • GitHub Check: Check: source-google-drive
  • GitHub Check: Check: source-shopify
  • GitHub Check: Check: source-hardcoded-records
  • GitHub Check: Check: destination-motherduck
  • GitHub Check: preview_docs
  • GitHub Check: Pytest (All, Python 3.12, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.13, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Manifest Server Docker Image Build
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Analyze (python)

@github-actions
Copy link

github-actions bot commented Nov 12, 2025

PyTest Results (Fast)

3 811 tests  ±0   3 800 ✅ ±0   7m 12s ⏱️ ±0s
    1 suites ±0      11 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit e1415cb. ± Comparison against base commit f0443aa.

♻️ This comment has been updated with latest results.

@github-actions
Copy link

github-actions bot commented Nov 12, 2025

PyTest Results (Full)

3 814 tests  ±0   3 802 ✅ ±0   10m 58s ⏱️ -6s
    1 suites ±0      12 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit e1415cb. ± Comparison against base commit f0443aa.

♻️ This comment has been updated with latest results.

@darynaishchenko
Copy link
Contributor Author

darynaishchenko commented Nov 12, 2025

/autofix

Auto-Fix Job Info

This job attempts to auto-fix any linting or formating issues. If any fixes are made,
those changes will be automatically committed and pushed back to the PR.

Note: This job can only be run by maintainers. On PRs from forks, this command requires
that the PR author has enabled the Allow edits from maintainers option.

PR auto-fix job started... Check job output.

✅ Changes applied successfully.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
airbyte_cdk/sources/file_based/file_based_stream_reader.py (1)

175-180: Nice improvement for debugging! The file URI addition is helpful.

The error message now clearly identifies which file exceeded the size limit, which should make troubleshooting much easier. The change looks good!

One small thought: since you have both message (user-facing) and internal_message parameters, would it be useful to include additional context (like file_size) in the internal_message for internal logs while keeping the user message simpler? Just a thought - the current approach of keeping them the same is definitely simpler and consistent, wdyt?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4a20e5e and e1415cb.

📒 Files selected for processing (1)
  • airbyte_cdk/sources/file_based/file_based_stream_reader.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-11-10T04:50:11.914Z
Learnt from: aaronsteers
Repo: airbytehq/airbyte-python-cdk PR: 13
File: airbyte_cdk/connector.py:99-99
Timestamp: 2024-11-10T04:50:11.914Z
Learning: When a PR's goal is to run the autoformat task from `ruff`, avoid suggesting code changes beyond formatting to prevent potential negative side effects.

Applied to files:

  • airbyte_cdk/sources/file_based/file_based_stream_reader.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: Check: source-pokeapi
  • GitHub Check: Check: source-hardcoded-records
  • GitHub Check: Check: source-shopify
  • GitHub Check: Check: source-google-drive
  • GitHub Check: Check: source-intercom
  • GitHub Check: Check: destination-motherduck
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Manifest Server Docker Image Build
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.13, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.12, Ubuntu)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Analyze (python)

Copy link
Contributor

@maxi297 maxi297 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@darynaishchenko darynaishchenko merged commit c9a5cf9 into main Nov 13, 2025
29 of 31 checks passed
@darynaishchenko darynaishchenko deleted the daryna/file-based/add-file-uri-to-size-limit-error-message branch November 13, 2025 11:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants