feat: Add preflight check to destination smoke test tool#1007
Conversation
- Run basic_types scenario before requested scenarios to validate credentials and connectivity - Return early with structured error if preflight fails - Add skip_preflight parameter to core function, MCP tool, and CLI - Improve error surfacing by walking exception chain to find the actual connector error message instead of generic wrapper - Add preflight_passed field to DestinationSmokeTestResult model Closes #1006 Co-Authored-By: AJ Steers <aj@airbyte.io>
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. 💡 Show Tips and TricksTesting This PyAirbyte VersionYou can test this version of PyAirbyte using the following: # Run PyAirbyte CLI from this branch:
uvx --from 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1774553037-preflight-smoke-test' pyairbyte --help
# Install PyAirbyte from this branch for development:
pip install 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1774553037-preflight-smoke-test'PR Slash CommandsAirbyte Maintainers can execute the following slash commands on your PR:
📚 Show Repo GuidanceHelpful ResourcesCommunity SupportQuestions? Join the #pyairbyte channel in our Slack workspace. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds an optional preflight (basic_types) write to destination smoke tests (skippable), surfaces connector internal errors from logs, returns Changes
Sequence Diagram(s)mermaid Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Would you like a brief changelog entry or an example CLI usage line for 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
airbyte/_util/destination_smoke_tests.py (1)
269-283: Consider handling non-PyAirbyte exceptions in the chain?The chain-walking logic is a nice improvement! One thing I noticed: if the deepest exception is a standard Python exception (like
ValueErrororConnectionError) withoutget_message(), we skip it and fall back to the wrapper's message. This could lose the actual root cause message.For example, with a chain like
AirbyteConnectorWriteError(original_exception=ConnectionError("Connection refused")), we'd still get the generic wrapper message instead of "Connection refused".What do you think about adding a fallback for non-PyAirbyte exceptions, something like this?
💡 Suggested enhancement
# If we found a deeper exception with a specific message, use it - if deepest is not ex and hasattr(deepest, "get_message"): + if deepest is not ex: + if hasattr(deepest, "get_message"): deep_msg = deepest.get_message() # Only use the deep message if it's more specific than the wrapper if deep_msg and hasattr(ex, "get_message") and deep_msg != ex.get_message(): return f"{type(ex).__name__}: {deep_msg}" + else: + # For standard exceptions, use str() to get the message + deep_msg = str(deepest) + if deep_msg and hasattr(ex, "get_message") and deep_msg != ex.get_message(): + return f"{type(ex).__name__}: {deep_msg}"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@airbyte/_util/destination_smoke_tests.py` around lines 269 - 283, The chain-walking currently prefers deepest exceptions only if they implement get_message(); update the logic in destination_smoke_tests.py around variables ex and deepest so that if deepest lacks get_message() but str(deepest) yields a non-empty, more specific message than ex.get_message(), you return f"{type(ex).__name__}: {str(deepest)}" (i.e., treat standard Python exceptions like ConnectionError/ValueError as valid root-cause messages); retain the existing fallback to ex.get_message() and final fallback to str(ex) if no get_message() exists.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@airbyte/_util/destination_smoke_tests.py`:
- Around line 269-283: The chain-walking currently prefers deepest exceptions
only if they implement get_message(); update the logic in
destination_smoke_tests.py around variables ex and deepest so that if deepest
lacks get_message() but str(deepest) yields a non-empty, more specific message
than ex.get_message(), you return f"{type(ex).__name__}: {str(deepest)}" (i.e.,
treat standard Python exceptions like ConnectionError/ValueError as valid
root-cause messages); retain the existing fallback to ex.get_message() and final
fallback to str(ex) if no get_message() exists.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 9307797b-c09e-41ef-b161-6182b4909d09
📒 Files selected for processing (3)
airbyte/_util/destination_smoke_tests.pyairbyte/cli/pyab.pyairbyte/mcp/local.py
…walking - Add 'warnings' field to DestinationSmokeTestResult to surface readback failures (e.g. cache connection issues) that cause table_statistics to be null - Improve _sanitize_error to handle standard Python exceptions (ConnectionError, ValueError, etc.) in the exception chain, not just PyAirbyte exceptions with get_message() Co-Authored-By: AJ Steers <aj@airbyte.io>
_run_preflight() now returns (bool, str | None) so the caller includes the sanitized connector error in the structured result instead of a generic 'verify credentials' message. Co-Authored-By: AJ Steers <aj@airbyte.io>
…ter error surfacing Co-Authored-By: AJ Steers <aj@airbyte.io>
…main run Co-Authored-By: AJ Steers <aj@airbyte.io>
The top-level import of PREDEFINED_SCENARIOS from airbyte.cli.smoke_test_source._scenarios caused a circular import that broke test collection. Replace with inline scenario data (mirrors the basic_types schema) to avoid the cycle. Co-Authored-By: AJ Steers <aj@airbyte.io>
There was a problem hiding this comment.
Pull request overview
Adds an automatic preflight connectivity/credentials check to the destination smoke test workflow and improves how failures and non-fatal readback issues are surfaced to callers (CLI + MCP).
Changes:
- Introduces
skip_preflightflag to CLI and MCP tool interfaces. - Adds preflight execution (writes a small
basic_types-derived scenario) and returns early with a structured error on failure. - Enhances error sanitization (log TRACE
internal_messageextraction, exception-chain selection) and surfaces readback failures aswarnings.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
airbyte/mcp/local.py |
Adds skip_preflight parameter to the MCP tool and forwards it into the shared smoke test runner. |
airbyte/cli/pyab.py |
Adds --skip-preflight CLI flag and forwards it into the shared smoke test runner. |
airbyte/_util/destination_smoke_tests.py |
Implements preflight logic, extends result model with preflight_passed/warnings, and improves error/warning surfacing. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…tatic logger template Co-Authored-By: AJ Steers <aj@airbyte.io>
… file reading Co-Authored-By: AJ Steers <aj@airbyte.io>
…ions Co-Authored-By: AJ Steers <aj@airbyte.io>
Summary
Closes #1006.
Adds an automatic preflight check to the destination smoke test tool. Before running the user-requested scenarios, the tool now writes
basic_typesto the destination as a lightweight connectivity/credential validation. If the preflight fails, the tool returns early with a structured error instead of running all scenarios against a broken destination.Also significantly improves error surfacing throughout the smoke test:
_sanitize_error()now resolves errors in priority order: (1) connector log file TRACE ERRORinternal_message, (2)original_exceptionchain, (3) top-level exception. This means failures like bad credentials now showjava.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available...instead of the genericAirbyteConnectorWriteError: Connector failed.table_statistics: null) are now captured in a newwarningsfield instead of being silently logged.Changes:
DestinationSmokeTestResultgetspreflight_passed: bool | None(True/False/None=skipped) andwarnings: list[str] | None_run_preflight()runsbasic_typeswrite, returns(passed, sanitized_error)_extract_trace_error_from_log()parses the connector's log file for TRACE ERROR JSON and extracts theinternal_messagefield_sanitize_error()uses a 3-tier resolution: log file → exception chain → top-level messagerun_destination_smoke_test()acceptsskip_preflight: bool = Falsewarningswith sanitized messagesskip_preflightparameter_prepare_destination_config()moved before preflight so both preflight and main run use the prepared configUpdates since last revision
_extract_trace_error_from_log()to parse the connector's log file for TRACE ERRORinternal_message. Many Java-based connectors emit a genericmessage(e.g. "Something went wrong in the connector") while the actionable detail lives only ininternal_message. This is now tried first in_sanitize_error()._run_preflight()returnstuple[bool, str | None]so the sanitized connector error is included in the result instead of being replaced by a generic wrapper.warningsfield on the result model, making it clear whytable_statisticsis null (e.g.PyAirbyteSecretNotFoundError: Secret not found.for Snowflake).Local test results
Postgres (Docker container,
destination-postgres):skip_preflight=True,basic_types— success, fulltable_statisticswith column statsbasic_types—preflight_passed: true, fulltable_statisticspreflight_passed: false, error:java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 60001msSnowflake (GSM creds,
destination-snowflake):skip_preflight=True,basic_types— success, 3 records deliveredpreflight_passed: true, 3 records deliveredtable_statistics: nullwith warning:PyAirbyteSecretNotFoundError: Secret not found.— pre-existing bug in_dest_to_cache.pywhere Snowflakecredentialsdict lacks apasswordattribute (outside scope of this PR, but now surfaced viawarnings)Review & Testing Checklist for Human
_extract_trace_error_from_logreads entire log file into memory: For long-running syncs with verbose logging, the log file could be large. The function reads all lines then scans in reverse. Confirm this is acceptable for expected log sizes._sanitize_errorprefers log file over exception chain: The log fileinternal_messageis tried first. If the log contains a TRACE ERROR from a different failure than the one that raised the exception, the wrong message could be surfaced. Verify this ordering is correct for the common case._run_preflightcatches broadException: Intentional (preflight should never crash the main flow), but verify this aligns with the project's exception handling philosophy.basic_typesmay be written twice: If the user's scenarios includebasic_types(e.g.fast), the preflight writes it first, then it's written again. Confirm append semantics are acceptable.pyab destination-smoke-test --destination=destination-postgreswith good and bad credentials. Verify the error message in the bad-credentials case is actionable (not "Connector failed.").Notes
table_statistics: nullissue is a pre-existing bug insnowflake_destination_to_cache()whereDestinationSnowflake.credentialsis parsed as a dict without apasswordattribute. This PR surfaces the issue viawarningsbut does not fix the root cause.elapsed_secondsis0.0in the preflight-failure early return since the preflight is not timed separately.Link to Devin session: https://app.devin.ai/sessions/e2af697f2eb64507b4bbbced88c9af6c
Requested by: Aaron ("AJ") Steers (@aaronsteers)
Summary by CodeRabbit
Release Notes
New Features
--skip-preflightCLI option to optionally disable preflight checksBug Fixes
Important
Auto-merge enabled.
This PR is set to merge automatically when all requirements are met.