Skip to content

feat(mcp): Add pagination parameters to get_cloud_sync_logs tool#888

Merged
Aaron ("AJ") Steers (aaronsteers) merged 8 commits intomainfrom
devin/1764717892-add-log-pagination-params
Dec 3, 2025
Merged

feat(mcp): Add pagination parameters to get_cloud_sync_logs tool#888
Aaron ("AJ") Steers (aaronsteers) merged 8 commits intomainfrom
devin/1764717892-add-log-pagination-params

Conversation

@aaronsteers
Copy link
Contributor

@aaronsteers Aaron ("AJ") Steers (aaronsteers) commented Dec 2, 2025

Summary

Adds optional parameters to the get_cloud_sync_logs MCP tool to prevent sync logs from overwhelming the context window. This addresses an issue where logs were returning ~8.2m tokens and filling up the context.

New parameters:

  • max_lines: Maximum lines to return (defaults to 4000 if not specified)
  • line_offset: Skip N lines from the beginning of logs
  • from_tail: Return lines from the end instead of beginning

Validation ensures line_offset and from_tail cannot be used together. When logs are truncated, metadata is appended showing how many lines were returned vs total.

Review & Testing Checklist for Human

  • Default behavior change: Previously returned all logs, now defaults to 4000 lines. Verify this default is appropriate for typical use cases.
  • Edge case: What happens when line_offset exceeds total lines? Currently returns empty result with truncation note - verify this is acceptable UX.
  • Test manually: Call get_cloud_sync_logs with various parameter combinations (no params, max_lines only, from_tail=True, line_offset=100) against a real connection with logs.

Notes

  • No unit tests were added for the new pagination logic
  • The truncation note format [Showing X of Y total lines] is appended to the log output

Requested by: AJ Steers (Aaron ("AJ") Steers (@aaronsteers))
Link to Devin run: https://app.devin.ai/sessions/2b0a9632ae694b73be67301524ed7a8d

Summary by CodeRabbit

  • New Features

    • Paginated cloud sync logs: limit lines (defaults to 4000, 0 = no limit), skip from start via offset, or retrieve from end.
    • Logs now return a structured paginated result with text, start line, returned line count, and total available lines.
    • Empty logs return an empty structured result instead of raw text.
  • Bug Fixes

    • Improved error handling for missing sync jobs/attempts and requested attempts.
    • Validation preventing conflicting pagination options (offset vs. tail).

✏️ Tip: You can customize this high-level summary in your review settings.

Add optional parameters to limit log output and prevent context window overflow:
- max_lines: Maximum number of lines to return (defaults to 4000)
- line_offset: Number of lines to skip from the beginning
- from_tail: Return lines from the end instead of the beginning

Validation ensures line_offset and from_tail cannot be used together.
Truncation metadata is appended when logs are limited.

Co-Authored-By: AJ Steers <aj@airbyte.io>
@devin-ai-integration
Copy link
Contributor

Original prompt from AJ Steers
SYSTEM:
=== BEGIN THREAD HISTORY (in #hydra-support-bot) ===
Aldo Alberto Jesus Gonzalez Campos (U070BMPDUHJ): <@U05AKF1BCC9> I was doing a test with Sonar backend and found a bug that was driving me crazy but actually was able to reproduce with Claude:

```&gt; can you check my sources in workspace e8b14943-1b39-4ea5-8d08-4792a74090de ? 

⏺ I'll check the sources in your Airbyte Cloud workspace.

⏺ I'll check the sources in your Airbyte Cloud workspace.

⏺ airbyte - list_deployed_cloud_source_connectors (MCP)(name_contains: "", max_items_limit: null, workspace_id: "e8b14943-1b39-4ea5-8d08-4792a74090de")
  ⎿  {                            
       "result": [
         {
     … +6 lines (ctrl+o to expand)

⏺ You have 1 source connector deployed in workspace e8b14943-1b39-4ea5-8d08-4792a74090de:

  Facebook Marketing
  - ID: de58b4dd-9dbb-4bf0-9fb5-a50a486b98e5
  - URL: <https://cloud.airbyte.com/workspaces/e8b14943-1b39-4ea5-8d08-4792a74090de/source/de58b4dd-9dbb-4bf0-9fb5-a50a486b98e5>

  Would you like me to get more details about this source connector, such as its configuration or which streams are available?

&gt; can you show me the logs from the last sync? 

⏺ I need to find the connection that uses this Facebook Marketing source first, then I can get the logs from the last sync.

⏺ I'll need to find the connection that uses this Facebook Marketing source first, then I can get the sync logs.

⏺ airbyte - list_deployed_cloud_connections (MCP)(name_contains: "", max_items_limit: null, workspace_id: "e8b14943-1b39-4ea5-8d08-4792a74090de")
  ⎿  {                            
       "result": [
         {
     … +8 lines (ctrl+o to expand)

⏺ airbyte - get_cloud_sync_logs (MCP)(connection_id: "8cac2cdb-57f5-42d5-ab70-74ed753c061d", workspace_id: "e8b14943-1b39-4ea5-8d08-4792a74090de")
  ⎿  ⚠ Large MCP response (~8.2m tokens), this can fill up context quickly                                                                                                                 ... (2538 chars truncated...)

@devin-ai-integration
Copy link
Contributor

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@github-actions
Copy link

github-actions bot commented Dec 2, 2025

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This PyAirbyte Version

You can test this version of PyAirbyte using the following:

# Run PyAirbyte CLI from this branch:
uvx --from 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1764717892-add-log-pagination-params' pyairbyte --help

# Install PyAirbyte from this branch for development:
pip install 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1764717892-add-log-pagination-params'

Helpful Resources

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /fix-pr - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test-pr - Runs tests with the updated PyAirbyte

Community Support

Questions? Join the #pyairbyte channel in our Slack workspace.

📝 Edit this welcome message.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 2, 2025

📝 Walkthrough

Walkthrough

Return type for get_cloud_sync_logs changed from raw string to a structured LogReadResult; new parameters (max_lines, from_tail, line_offset) with mutual-exclusion validation were added; missing-resource paths now raise AirbyteMissingResourceError; log slicing and metadata population implemented (default max 4000, 0 = unlimited).

Changes

Cohort / File(s) Change Summary
Log retrieval API enhancements
airbyte/mcp/cloud_ops.py
Added public LogReadResult pydantic model and changed get_cloud_sync_logs to return it. Added parameters max_lines, from_tail, and line_offset with mutual-exclusion validation and default behavior (from_tail=True when offset unspecified). Replaced various missing-log/string-return paths with AirbyteMissingResourceError. Implemented line-limiting logic (default 4000, 0 = no limit), head/offset vs tail slicing, and populated log_text, log_text_start_line, log_text_line_count, total_log_lines_available, plus job/attempt metadata.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Can you double-check validation around line_offset and from_tail for edge values (0, None, negative) to ensure behavior is explicit and documented, wdyt?
  • Can you verify off-by-one indexing for log_text_start_line and log_text_line_count when slicing from head vs tail (1-based start), wdyt?
  • Can you confirm the new AirbyteMissingResourceError usage preserves expected error semantics and identifiers expected by callers, wdyt?

Suggested reviewers

  • aldogonzalez8

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely describes the main change: adding pagination parameters to the get_cloud_sync_logs tool, which matches the core objective of the PR.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch devin/1764717892-add-log-pagination-params

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
airbyte/mcp/cloud_ops.py (2)

805-831: Nice parameter design! Consider adding validation for parameter ranges, wdyt?

The new pagination parameters are well-defined with clear descriptions. The mutual exclusivity constraint is properly documented.

However, there's no validation to ensure max_lines and line_offset are non-negative integers when provided. Negative values could lead to confusing behavior (e.g., max_lines=-10 would result in empty output due to Python slicing semantics).

Consider adding validation after line 838:

     if line_offset is not None and from_tail:
         raise PyAirbyteInputError(
             message="Cannot specify both 'line_offset' and 'from_tail' parameters."
         )
+
+    if max_lines is not None and max_lines < 0:
+        raise PyAirbyteInputError(
+            message="Parameter 'max_lines' must be non-negative."
+        )
+
+    if line_offset is not None and line_offset < 0:
+        raise PyAirbyteInputError(
+            message="Parameter 'line_offset' must be non-negative."
+        )

888-899: Small precision fix needed for the offset check, wdyt?

The truncation note logic is well-implemented and informative. However, there's a minor precision issue on line 896.

Line 896 uses elif line_offset: which relies on truthiness checking. If a user explicitly passes line_offset=0, this condition would be falsy, and the truncation note would omit the offset information. While line_offset=0 is effectively "no offset," if it's explicitly set, it would be more accurate to include it in the note.

Consider this fix for precision:

-        elif line_offset:
+        elif line_offset is not None:
             truncation_note += f" (offset: {line_offset})"

This ensures the offset is shown whenever it's explicitly provided, even if it's 0.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5b9f2fa and 42099e1.

📒 Files selected for processing (1)
  • airbyte/mcp/cloud_ops.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
airbyte/mcp/cloud_ops.py (1)
airbyte/exceptions.py (1)
  • PyAirbyteInputError (201-210)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Windows)
  • GitHub Check: Pytest (All, Python 3.10, Windows)
  • GitHub Check: Pytest (No Creds)
  • GitHub Check: Pytest (Fast)
🔇 Additional comments (3)
airbyte/mcp/cloud_ops.py (3)

834-838: Validation logic looks solid!

The mutual exclusivity check correctly prevents using line_offset with from_tail=True, while sensibly allowing line_offset with from_tail=False or from_tail=None. The error message is clear.


845-851: Excellent improvement to error messages!

The new specific error messages for missing sync jobs and attempts are much more informative than generic errors would be. They provide clear context to help users understand what went wrong.


873-886: Line limiting logic is correct!

The slicing logic properly handles both tail-based and offset-based line selection:

  • Tail mode correctly retrieves the last N lines
  • Offset mode correctly skips the specified number of lines from the beginning
  • The default of 4000 lines matches the parameter documentation

@github-actions
Copy link

github-actions bot commented Dec 2, 2025

PyTest Results (Fast Tests Only, No Creds)

320 tests  ±0   320 ✅ ±0   5m 55s ⏱️ +9s
  1 suites ±0     0 💤 ±0 
  1 files   ±0     0 ❌ ±0 

Results for commit f59b054. ± Comparison against base commit 5b9f2fa.

♻️ This comment has been updated with latest results.

…c_logs

Update the return type from str to LogReadResult with structured fields:
- log_text: The log text content
- log_text_start_line: 1-based line index of first line returned
- log_text_line_count: Count of lines returned
- total_log_lines_available: Total lines available for pagination

Also convert error cases from string returns to proper exceptions.

Co-Authored-By: AJ Steers <aj@airbyte.io>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
airbyte/mcp/cloud_ops.py (1)

818-844: Consider validating non-negative values for max_lines and line_offset?

Negative values could lead to unexpected Python slicing behavior. For instance, line_offset=-5 would cause lines[-5:...] which returns lines from the end rather than raising an error. Adding a quick validation alongside the mutual exclusivity check would make the API more robust, wdyt?

     # Validate that line_offset and from_tail are not both set
     if line_offset is not None and from_tail:
         raise PyAirbyteInputError(
             message="Cannot specify both 'line_offset' and 'from_tail' parameters."
         )
+    if max_lines is not None and max_lines < 0:
+        raise PyAirbyteInputError(
+            message="'max_lines' must be a non-negative integer."
+        )
+    if line_offset is not None and line_offset < 0:
+        raise PyAirbyteInputError(
+            message="'line_offset' must be a non-negative integer."
+        )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 42099e1 and 1505461.

📒 Files selected for processing (1)
  • airbyte/mcp/cloud_ops.py (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Windows)
  • GitHub Check: Pytest (All, Python 3.10, Windows)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (No Creds)
🔇 Additional comments (5)
airbyte/mcp/cloud_ops.py (5)

163-174: Nice addition of a structured result model!

The LogReadResult model is well-designed for pagination—having total_log_lines_available alongside the returned line count makes it easy for consumers to understand truncation and request additional pages.


847-851: Validation logic looks correct.

The mutual exclusivity check properly handles the case where from_tail=False (essentially a no-op), only raising when from_tail is truthy alongside a set line_offset.


858-883: Improved error handling with proper exceptions.

Switching from returning error strings to raising AirbyteMissingResourceError is a better pattern—consumers can now catch specific exceptions rather than parsing return values. Just a heads up that this is a breaking change for any existing callers that relied on string returns for error cases, wdyt if that's been considered?


889-896: Clean handling of the empty logs edge case.

Returning a structured LogReadResult with zeros rather than an empty string maintains consistency with the normal flow.


898-919: Line slicing logic looks solid.

The pagination math handles both head and tail modes correctly. The edge case where line_offset exceeds total lines gracefully returns an empty result with accurate metadata—nice touch. The 1-based conversion for log_text_start_line is also correct.

@github-actions
Copy link

github-actions bot commented Dec 2, 2025

PyTest Results (Full)

388 tests  ±0   372 ✅ ±0   23m 39s ⏱️ - 3m 7s
  1 suites ±0    16 💤 ±0 
  1 files   ±0     0 ❌ ±0 

Results for commit f59b054. ± Comparison against base commit 5b9f2fa.

♻️ This comment has been updated with latest results.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

APPROVED

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
airbyte/mcp/cloud_ops.py (2)

907-928: Pagination logic looks solid, minor edge case to consider.

The implementation handles both tail-reading and offset-reading well. One small edge case: if line_offset exceeds total_lines, the result will have log_text_line_count=0 but log_text_start_line will be line_offset + 1 (which is beyond the available lines). This might be slightly confusing but isn't technically wrong. WDYT about clamping start_index to total_lines in such cases, or is the current behavior intentional to signal "you asked for lines beyond what exists"?

Also, after line 858's fix, max_lines could be None, but it's typed as int in the signature (line 819). Should it be int | None to match the runtime behavior?


850-850: Consider updating the docstring to mention pagination.

The docstring could briefly note that logs are paginated by default (4000 lines) to prevent context overflow. WDYT about adding a sentence like "Returns paginated log results with a default limit of 4000 lines"?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1505461 and 983b7e0.

📒 Files selected for processing (1)
  • airbyte/mcp/cloud_ops.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
airbyte/mcp/cloud_ops.py (2)
airbyte/exceptions.py (2)
  • PyAirbyteInputError (201-210)
  • AirbyteMissingResourceError (505-509)
airbyte/cloud/sync_results.py (3)
  • SyncResult (218-495)
  • get_attempts (338-352)
  • get_full_log_text (182-214)
🪛 GitHub Actions: Generate Docs
airbyte/mcp/cloud_ops.py

[error] 858-858: SyntaxError: expected 'else' after 'if' expression (max_lines = None if max_lines = 0 else max_lines)

🪛 GitHub Actions: Run Linters
airbyte/mcp/cloud_ops.py

[error] 858-858: SyntaxError: Invalid assignment target at line 858. max_lines = None if max_lines = 0 else max_lines


[error] 858-858: SyntaxError: Expected 'else', found '=' in expression: max_lines = None if max_lines = 0 else max_lines


[error] 858-858: SyntaxError: The keyword is not allowed as a variable declaration name in: max_lines = None if max_lines = 0 else max_lines


[error] 858-858: SyntaxError: Simple statements must be separated by newlines or semicolons in the expression around line 858.


[error] Ruff check failed due to syntax errors in the codebase.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Pytest (All, Python 3.10, Windows)
  • GitHub Check: Pytest (All, Python 3.11, Windows)
🔇 Additional comments (5)
airbyte/mcp/cloud_ops.py (5)

163-174: LGTM! Nice structured result model.

The LogReadResult class provides a clean way to return paginated logs with metadata. The 1-based line indexing aligns well with how most editors display line numbers.


818-849: Nice pagination parameters, but heads up on the breaking change.

The new parameters are well-documented and the default of 4000 lines makes sense for preventing context overflow. Just noting that this changes the default behavior from returning all logs to capping at 4000 lines - existing users relying on complete logs might be surprised. WDYT about mentioning this in the docstring or release notes?


851-856: LGTM! Clean validation logic.

Good use of PyAirbyteInputError to enforce the mutual exclusivity constraint between line_offset and from_tail. The error context is helpful for debugging.


865-889: LGTM! Much better error handling.

Raising AirbyteMissingResourceError instead of returning error strings is a solid improvement. This makes errors more consistent and machine-readable, especially useful in an MCP context where structured errors are valuable.


896-905: LGTM! Good handling of the no-logs edge case.

Returning a structured LogReadResult even when logs are empty maintains consistency with the new return type. The descriptive message in log_text is clear and helpful.

The line 'max_lines = None if max_lines == 0 else max_lines' had a typo
(= instead of ==) and was also redundant since the logic at line 912
already handles max_lines=0 by using total_lines instead.

Co-Authored-By: AJ Steers <aj@airbyte.io>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
airbyte/mcp/cloud_ops.py (2)

818-828: Type annotation looks good, minor comment about default behavior.

The max_lines parameter has type int with a default of 4000, which means it will never be None at runtime. The logic at line 911 (max_lines or total_lines) correctly handles the 0 case for "no limit." Just noting that the comment on line 910 mentions "None" but that's unreachable given the type - would you like to simplify the comment to just mention 0, wdyt?

-    # If max_lines is 0 or None, we return all lines.
+    # If max_lines is 0, we return all lines (no limit).

795-927: Consider adding unit tests for the pagination logic.

The new pagination parameters introduce several code paths:

  • from_tail=True with various max_lines values
  • line_offset with edge cases (0, within range, exceeding total)
  • max_lines=0 for "no limit"
  • Empty logs scenario
  • Validation of mutual exclusivity

These would benefit from unit test coverage to prevent regressions. Would you like me to help generate test cases for these scenarios, wdyt?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 983b7e0 and 0ad5dd8.

📒 Files selected for processing (1)
  • airbyte/mcp/cloud_ops.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
airbyte/mcp/cloud_ops.py (3)
airbyte/exceptions.py (1)
  • AirbyteMissingResourceError (505-509)
airbyte/_util/api_util.py (1)
  • get_connection (336-363)
airbyte/cloud/sync_results.py (2)
  • get_attempts (338-352)
  • get_full_log_text (182-214)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Pytest (All, Python 3.11, Windows)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Windows)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (No Creds)
  • GitHub Check: Pytest (Fast)
🔇 Additional comments (6)
airbyte/mcp/cloud_ops.py (6)

163-174: Nice addition of a structured result model!

The LogReadResult model cleanly encapsulates all the pagination metadata, making it easier for callers to understand what portion of logs they're seeing. The 1-based indexing is well-documented. LGTM!


851-856: Mutual exclusivity validation looks solid.

The check correctly prevents combining line_offset with from_tail=True while allowing from_tail=False to pass through. The error message is clear and includes helpful context. Nice work!


864-875: Nice improvement: using structured errors instead of string returns.

The switch to AirbyteMissingResourceError aligns with the codebase pattern (e.g., get_connection in api_util.py). This makes error handling more explicit and programmatic.

Just want to confirm - since this changes the behavior from returning a string to raising an exception, have you verified that MCP tool consumers handle this appropriately, wdyt?


894-904: Good handling of the empty logs case.

The placeholder message clearly communicates the lack of logs. One small thought: setting log_text_start_line=1 when there are zero lines feels slightly semantic (what's line 1 of 0 lines?), but it's consistent with the 1-based indexing convention and avoids any potential 0-index confusion. Totally fine as-is!


913-920: Line slicing logic is correct.

The from_tail branch correctly calculates the start index and takes the last N lines. The offset branch handles the standard pagination case well.

One edge case to note: when line_offset >= total_lines, this returns an empty result with log_text_start_line = line_offset + 1 (which exceeds total_log_lines_available). The metadata fields make this unambiguous, but you might consider adding a note in the docstring about this behavior - or is the current behavior intuitive enough, wdyt?


922-927: Clean structured return.

The LogReadResult provides all the pagination metadata (log_text_line_count and total_log_lines_available) that callers need to understand truncation - much better than appending a text note. This keeps the log text clean while still communicating the context.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
airbyte/mcp/cloud_ops.py (1)

899-911: Consider making empty‑logs metadata and text more self‑consistent

For the no‑logs case you return a human‑friendly message in log_text but log_text_line_count=0. That’s perfectly workable, but it does mean line_count doesn’t reflect the number of newline‑separated lines in log_text itself.

Would you consider either (a) returning an empty log_text and relying on the metadata, or (b) setting log_text_line_count to the actual line count of the placeholder message, so the fields always describe the text payload? wdyt?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0ad5dd8 and 5b03a20.

📒 Files selected for processing (1)
  • airbyte/mcp/cloud_ops.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
airbyte/mcp/cloud_ops.py (2)
airbyte/exceptions.py (2)
  • PyAirbyteInputError (201-210)
  • AirbyteMissingResourceError (505-509)
airbyte/cloud/sync_results.py (1)
  • get_full_log_text (182-214)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Windows)
  • GitHub Check: Pytest (All, Python 3.10, Windows)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (No Creds)
  • GitHub Check: Pytest (Fast)
🔇 Additional comments (3)
airbyte/mcp/cloud_ops.py (3)

163-177: LogReadResult shape looks solid and future‑proof

The model fields and docs line up nicely with the pagination behavior and give downstream callers enough metadata to reason about truncation; this structure seems good to ship as‑is, wdyt?


856-861: Input‑validation guard for mutually exclusive params looks good

The mutual‑exclusion check for line_offset and from_tail is clear and uses PyAirbyteInputError consistently with the rest of the module; once the effective_from_tail refactor is in place, this should continue to behave as intended, wdyt?


869-893: Nice upgrade to structured missing‑resource errors

Raising AirbyteMissingResourceError for missing sync jobs/attempts (including the specific (job, attempt) identifier) is a good improvement over returning plain strings and should make debugging from agents easier; seems solid as‑is, wdyt?

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (2)
airbyte/mcp/cloud_ops.py (2)

822-864: Clarify pagination param behavior and consider guarding against invalid inputs

A couple of edge cases you might want to nail down here:

  • Because max_lines, from_tail, and line_offset don’t have Python-level defaults (=), they become required keyword-only params for direct Python callers even though Pydantic/Field provides MCP defaults. If anyone imports and calls get_cloud_sync_logs(...) directly today, this is a subtle breaking change. Do you want to add =4000, =None, and =None defaults at the function level to keep the Python API backward-friendly, wdyt?
  • Negative max_lines / line_offset would currently flow through and produce surprising slices/metadata (e.g., negative counts or “past the end” indexes). Would you be open to validating max_lines >= 0 and line_offset >= 0 up front with a PyAirbyteInputError so bad inputs fail loudly instead of behaving oddly, wdyt?

The mutual-exclusion check and the “default tail when both unset” behavior look good and match the docstring.

#!/bin/bash
# Inspect direct Python usages of get_cloud_sync_logs that might rely on old signature/defaults.
rg -n "get_cloud_sync_logs\(" -C3

915-937: Pagination logic is solid, but truncation signaling and extreme offsets might need tightening

The core slicing looks good—max_lines == 0 as “no limit”, honoring line_offset even when total_lines < max_lines, and tail vs head behavior all make sense. A couple of remaining nits:

  • When line_offset is greater than or equal to total_lines, start_index can point past the end (e.g., 101 for a 100‑line log) and we return log_text_line_count == 0. Would clamping start_index to total_lines (or otherwise making log_text_start_line never exceed total_log_lines_available + 1) make the metadata easier to reason about, wdyt?
  • The PR description mentions appending a [Showing X of Y total lines] note when truncation occurs, but the current implementation just drops lines silently. Would you be open to appending a final summary line when len(selected_lines) < (total_lines - start_index) (or len(selected_lines) < total_lines for tail mode) so LLMs can reliably see that truncation happened, wdyt?
  • Given the combinatorics here (tail/head, max_lines, line_offset, max_lines=0), would you consider adding a small unit-test matrix around this block (especially for: default call, small logs, large logs with truncation, line_offset near/over total_lines, and max_lines=0), wdyt? I’d be happy to help sketch those out.
🧹 Nitpick comments (1)
airbyte/mcp/cloud_ops.py (1)

901-913: Align empty-log metadata with the returned message?

In the “no logs” branch, log_text contains a human-readable message but log_text_line_count and total_log_lines_available are both 0. That’s defensible if we treat the message as metadata rather than “real” log lines, but it does slightly contradict the field docstring (“Count of lines we are returning”).

Would it be clearer either to (a) return an empty log_text and let consumers key off the 0 counts, or (b) keep the message but set log_text_line_count (and maybe total_log_lines_available) to match the actual line count of log_text, wdyt?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4091d23 and f59b054.

📒 Files selected for processing (1)
  • airbyte/mcp/cloud_ops.py (3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-11T19:53:44.427Z
Learnt from: yohannj
Repo: airbytehq/PyAirbyte PR: 716
File: airbyte/logs.py:384-402
Timestamp: 2025-07-11T19:53:44.427Z
Learning: In the PyAirbyte project, when reviewing PRs, maintain clear separation of concerns. Don't suggest changes that are outside the scope of the PR's main objective, even if they would improve consistency or fix other issues. This helps with reviewing changes and potential reverts.

Applied to files:

  • airbyte/mcp/cloud_ops.py
🧬 Code graph analysis (1)
airbyte/mcp/cloud_ops.py (2)
airbyte/exceptions.py (2)
  • PyAirbyteInputError (201-210)
  • AirbyteMissingResourceError (505-509)
airbyte/cloud/sync_results.py (2)
  • get_attempts (338-352)
  • get_full_log_text (182-214)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Pytest (All, Python 3.10, Windows)
  • GitHub Check: Pytest (All, Python 3.11, Windows)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (No Creds)
  • GitHub Check: Pytest (Fast)
🔇 Additional comments (2)
airbyte/mcp/cloud_ops.py (2)

163-178: LogReadResult shape looks appropriate for paginated log responses

The fields and docstrings line up well with how you consume them later (job/attempt IDs, 1‑based start, counts, total). I don’t see issues with this model as-is, so I’d keep it in this form unless downstream use-cases surface new needs, wdyt?


871-895: No breaking change concern—exceptions are already the module pattern and distinguish different error scenarios

The three exception cases (missing job, missing attempts, missing specific attempt) are properly distinguished from the "no logs" case at line 905, which still returns a LogReadResult with a message. Using AirbyteMissingResourceError for missing resources is already established in this module (see lines 1100, 1114). No callers were found that depend on string parsing from these specific resource-not-found scenarios.

Likely an incorrect or invalid review comment.

@aaronsteers Aaron ("AJ") Steers (aaronsteers) merged commit cf60864 into main Dec 3, 2025
22 checks passed
@aaronsteers Aaron ("AJ") Steers (aaronsteers) deleted the devin/1764717892-add-log-pagination-params branch December 3, 2025 01:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants