Skip to content

Conversation

aaronsteers
Copy link
Contributor

@aaronsteers aaronsteers commented Sep 26, 2025

Summary by CodeRabbit

  • Bug Fixes

    • Improved CI secret masking to recursively find secrets in nested configurations (dicts/lists) and mask multi-line values line-by-line, preventing accidental exposure while preserving non-secret output and CI gating behavior.
  • Tests

    • Added comprehensive unit tests validating masking across flat, nested, list-based, multi-line and edge-case configurations.

@Copilot Copilot AI review requested due to automatic review settings September 26, 2025 04:54
@github-actions github-actions bot added bug Something isn't working security labels Sep 26, 2025
Copy link

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@aj/ci/fix/handle-multi-line-secrets-in-secrets-masking#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch aj/ci/fix/handle-multi-line-secrets-in-secrets-masking

Helpful Resources

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
  • /poe <command> - Runs any poe command in the CDK environment

📝 Edit this welcome message.

Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes secret masking functionality to properly handle multi-line secrets like private keys in GitHub CI environments. The change refactors the masking logic to process secrets line-by-line instead of as single strings.

  • Extracted secret value masking logic into a dedicated helper function
  • Added line-by-line processing for multi-line secrets using splitlines()
  • Fixed control flow logic by changing nested if to elif for better structure

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Copy link
Contributor

coderabbitai bot commented Sep 26, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Walkthrough

Adds two helpers to recursively locate and mask secret values in configs: _print_ci_secret_mask_for_string(secret: str) masks multi-line strings line-by-line; _print_ci_secret_mask_for_value(value: Any) traverses dicts/lists and delegates to the string masker. _print_ci_secrets_masks_for_config now uses the recursive helper for secrets.

Changes

Cohort / File(s) Summary
Secret masking helpers & config masking update
airbyte_cdk/cli/airbyte_cdk/_secrets.py
Added _print_ci_secret_mask_for_string(secret: str) and _print_ci_secret_mask_for_value(value: Any); updated _print_ci_secrets_masks_for_config to recurse into lists/dicts and delegate masking to the new helpers, emitting ::add-mask:: per line for scalar strings and preserving existing CI gating and error handling.
Unit tests for secret masking
unit_tests/cli/airbyte_cdk/test_secret_masks.py
Added parameterized tests covering flat/nested dicts, lists, multi-line secrets, deep nesting, and edge cases; tests mock the internal mask-printing function and assert expected masked-string invocations (order-insensitive).

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor CI as CI Runner
  participant CFG as _print_ci_secrets_masks_for_config
  participant VAL as _print_ci_secret_mask_for_value
  participant STR as _print_ci_secret_mask_for_string
  participant GHA as GitHub Actions Mask

  CI->>CFG: provide config, spec, connector_root
  CFG->>CFG: iterate properties/keys
  alt property marked as secret
    CFG->>VAL: mask(value)
    alt value is dict
      VAL->>VAL: recurse over key/value pairs
    else value is list
      VAL->>VAL: recurse over items
    else scalar/string
      VAL->>STR: mask(string)
      STR->>STR: split into lines
      loop per line
        STR->>GHA: ::add-mask::<line>
      end
    end
  else non-secret or no value
    CFG->>CFG: skip masking
  end
  CFG-->>CI: done
Loading

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Would you like a short note in docs or tests highlighting the multi-line masking behavior, wdyt?

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title clearly and concisely describes the core change of this pull request, which is fixing the handling of multi-line secrets in the secret masking logic, matching the implemented code updates and new helper functions.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch aj/ci/fix/handle-multi-line-secrets-in-secrets-masking

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
airbyte_cdk/cli/airbyte_cdk/_secrets.py (1)

467-479: Mask whole serialized containers too?

I noticed we only traverse nested members and emit masks for the leaf values, so composite secrets (dict/list) never have their serialized form masked even though the docstring says we handle the JSON stringified version. If an integrator logs the whole container in one go (e.g. json.dumps(secret_value)), punctuation and even delimiter-only lines would remain unmasked unless every leaf value happens to be split exactly the same way. Mind adding an extra mask pass for the serialized container (after the child recursion) so we cover that path as well, wdyt?

     if isinstance(value, dict):
         # For nested dicts, we also need to mask the json-stringified version
         for v in value.values():
             _print_ci_secret_mask_for_value(v)
 
+        json_mask = json.dumps(value, separators=(",", ":"))
+        for line in json_mask.splitlines() or [json_mask]:
+            print(f"::add-mask::{line}")
         return
 
     if isinstance(value, list):
         # For lists, we also need to mask the json-stringified version
         for list_item in value:
             _print_ci_secret_mask_for_value(list_item)
 
+        json_mask = json.dumps(value, separators=(",", ":"))
+        for line in json_mask.splitlines() or [json_mask]:
+            print(f"::add-mask::{line}")
         return
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5b10ee9 and 7e93206.

📒 Files selected for processing (1)
  • airbyte_cdk/cli/airbyte_cdk/_secrets.py (1 hunks)
🧰 Additional context used
🪛 GitHub Check: CodeQL
airbyte_cdk/cli/airbyte_cdk/_secrets.py

[failure] 486-486: Clear-text logging of sensitive information
This expression logs sensitive data (secret) as clear text.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: Check: source-intercom
  • GitHub Check: Check: source-pokeapi
  • GitHub Check: Check: destination-motherduck
  • GitHub Check: Check: source-shopify
  • GitHub Check: Check: source-hardcoded-records
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Manifest Server Docker Image Build
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.12, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.13, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)

Copy link

github-actions bot commented Sep 26, 2025

PyTest Results (Fast)

3 771 tests  +9   3 759 ✅ +9   6m 21s ⏱️ +5s
    1 suites ±0      12 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit 5a7ce56. ± Comparison against base commit 5b10ee9.

♻️ This comment has been updated with latest results.

Copy link

github-actions bot commented Sep 26, 2025

PyTest Results (Full)

3 774 tests  +9   3 762 ✅ +9   10m 57s ⏱️ -3s
    1 suites ±0      12 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit 5a7ce56. ± Comparison against base commit 5b10ee9.

♻️ This comment has been updated with latest results.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7e93206 and f36b41a.

📒 Files selected for processing (2)
  • airbyte_cdk/cli/airbyte_cdk/_secrets.py (1 hunks)
  • unit_tests/cli/airbyte_cdk/test_secret_masks.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
unit_tests/cli/airbyte_cdk/test_secret_masks.py (1)
airbyte_cdk/cli/airbyte_cdk/_secrets.py (1)
  • _print_ci_secrets_masks_for_config (497-513)
🪛 GitHub Actions: Linters
unit_tests/cli/airbyte_cdk/test_secret_masks.py

[error] 1-6: Ruff format failed due to formatting changes. 1 file would be reformatted; an extra blank line was added after the module docstring. Run 'ruff format' to fix the style issues.

🪛 GitHub Check: CodeQL
airbyte_cdk/cli/airbyte_cdk/_secrets.py

[failure] 469-469: Clear-text logging of sensitive information
This expression logs sensitive data (secret) as clear text.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: Check: source-intercom
  • GitHub Check: Check: source-pokeapi
  • GitHub Check: Check: source-hardcoded-records
  • GitHub Check: Check: destination-motherduck
  • GitHub Check: Check: source-shopify
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.13, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.12, Ubuntu)
  • GitHub Check: Manifest Server Docker Image Build
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: SDM Docker Image Build
🔇 Additional comments (1)
airbyte_cdk/cli/airbyte_cdk/_secrets.py (1)

477-488: Restore JSON-level masking on containers

By skipping the json.dumps branch we no longer register a mask for the serialized dict/list itself. That regresses prior behavior and leaks booleans or nulls when the config is emitted via json.dumps because we now only register "True"/"False" while the log prints true/false. Could we restore the container-level mask before recursing to keep the old coverage, wdyt?

Apply this diff to fix the regression:

     if isinstance(value, dict):
-        # For nested dicts, we call recursively on each value
-        for v in value.values():
+        _print_ci_secret_mask_for_string(json.dumps(value))
+        for v in value.values():
             _print_ci_secret_mask_for_value(v)
 
         return
 
     if isinstance(value, list):
-        # For lists, we call recursively on each list item
-        for list_item in value:
+        _print_ci_secret_mask_for_string(json.dumps(value))
+        for list_item in value:
             _print_ci_secret_mask_for_value(list_item)
 
         return

@aaronsteers aaronsteers requested a review from jpefaur September 26, 2025 05:31
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
airbyte_cdk/cli/airbyte_cdk/_secrets.py (1)

462-471: Multi-line masking is correct; add a defensive CI guard to avoid accidental leaks and quiet CodeQL?

This helper works as intended. Given the CodeQL “clear-text logging” alert on Line 470 and that this function could be called out of band in the future, would you add a CI check here as defense-in-depth (we already gate upstream) and to document intent, wdyt?

 def _print_ci_secret_mask_for_string(secret: str) -> None:
     """Print GitHub CI mask for a single secret string.

     We expect single-line secrets, but we also handle the case where the secret contains newlines.
     For multi-line secrets, we must print a secret mask for each line separately.
     """
+    # Defense-in-depth: only emit mask commands in CI environments
+    if not os.environ.get("CI"):
+        logger.debug("Skipping ::add-mask:: outside CI context")
+        return
     for line in secret.splitlines():
         if line.strip():  # Skip empty lines
             print(f"::add-mask::{line!s}")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f36b41a and acee4a8.

📒 Files selected for processing (1)
  • airbyte_cdk/cli/airbyte_cdk/_secrets.py (1 hunks)
🧰 Additional context used
🪛 GitHub Check: CodeQL
airbyte_cdk/cli/airbyte_cdk/_secrets.py

[failure] 470-470: Clear-text logging of sensitive information
This expression logs sensitive data (secret) as clear text.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Check: source-shopify
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.12, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.13, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
🔇 Additional comments (2)
airbyte_cdk/cli/airbyte_cdk/_secrets.py (2)

473-497: Remove double splitting and (optionally) also mask container JSON for dict/list values?

  • Minor: You’re splitting lines here and again inside _print_ci_secret_mask_for_string. You could delegate once to reduce duplication.
  • Optional/compat: Previous behavior masked json.dumps(value) for dicts/lists. If logs might include the entire container serialized, adding that back preserves coverage. It won’t hurt and is cheap. Wdyt?
 def _print_ci_secret_mask_for_value(value: Any) -> None:
     """Print GitHub CI mask for a single secret value.

     Call this function for any values identified as secrets, regardless of type.
     """
     if isinstance(value, dict):
-        # For nested dicts, we call recursively on each value
+        # Also mask the JSON-stringified container in case logs print the whole object
+        print(f"::add-mask::{json.dumps(value)}")
+        # Then recurse into values
         for v in value.values():
             _print_ci_secret_mask_for_value(v)

         return

     if isinstance(value, list):
-        # For lists, we call recursively on each list item
+        # Also mask the JSON-stringified container
+        print(f"::add-mask::{json.dumps(value)}")
+        # Then recurse into items
         for list_item in value:
             _print_ci_secret_mask_for_value(list_item)

         return

-    # For any other types besides dict and list, we convert to string and mask each line
-    # separately to handle multi-line secrets (e.g. private keys).
-    for line in str(value).splitlines():
-        if line.strip():  # Skip empty lines
-            _print_ci_secret_mask_for_string(line)
+    # For scalars, delegate to the string masker (handles multi-line)
+    _print_ci_secret_mask_for_string(str(value))

508-516: LGTM on recursive traversal and delegation.

The switch to delegating to _print_ci_secret_mask_for_value for matched keys and recursing into nested structures looks correct and aligns with the PR goal to cover multi-line and nested cases.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
unit_tests/cli/airbyte_cdk/test_secret_masks.py (1)

34-35: Broaden the config type hint?

Since a few parametrized cases feed lists into config, the current dict annotation is lying to type checkers. Could we widen it to something like Mapping[str, Any] | Sequence[Any] (or even Any) so the signature matches the data we pass, wdyt?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between acee4a8 and 5a7ce56.

📒 Files selected for processing (1)
  • unit_tests/cli/airbyte_cdk/test_secret_masks.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
unit_tests/cli/airbyte_cdk/test_secret_masks.py (1)
airbyte_cdk/cli/airbyte_cdk/_secrets.py (1)
  • _print_ci_secrets_masks_for_config (499-515)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Check: source-shopify
🔇 Additional comments (1)
unit_tests/cli/airbyte_cdk/test_secret_masks.py (1)

2-6: Drop the blank line after the module docstring

Ruff is still failing because it wants the first import immediately after the docstring (same thing the bot flagged earlier). Could we pull the import up and rerun ruff format so CI goes green, wdyt?

Copy link
Contributor

@dbgold17 dbgold17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

discussed online, approving!

@aaronsteers aaronsteers merged commit 90fdec5 into main Sep 26, 2025
28 of 29 checks passed
@aaronsteers aaronsteers deleted the aj/ci/fix/handle-multi-line-secrets-in-secrets-masking branch September 26, 2025 19:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working security

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants