@bosper351 (Contributor) commented Sep 9, 2025

The current implementation of `HttpClient` uses the `requests` library to create a `PreparedRequest` and send it, but it does not take into account the environment variables that `requests` itself honors (for example `REQUESTS_CA_BUNDLE`). This makes it impossible to use custom CA certificates with Airbyte. This PR adds support for these environment settings to `HttpClient`.

The implementation follows what the `requests` documentation recommends when working with `PreparedRequest` objects, so that custom (for example, self-signed) certificates are handled correctly: https://requests.readthedocs.io/en/latest/user/advanced/#prepared-requests
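For reference, here is a minimal sketch of the pattern from the requests "Prepared Requests" docs linked above. The helper name `env_aware_send_kwargs` is hypothetical, and the sketch returns the merged settings instead of calling `Session.send()` so that it runs without network access.

```python
from typing import Any, Dict

import requests


def env_aware_send_kwargs(url: str) -> Dict[str, Any]:
    """Return the kwargs Session.send() would need for a GET to *url*.

    Hypothetical helper following the "Prepared Requests" pattern from the
    requests docs; it returns the merged settings instead of sending.
    """
    session = requests.Session()
    prepped = session.prepare_request(requests.Request("GET", url))
    # Session.send() does not apply REQUESTS_CA_BUNDLE on its own, so merge
    # the environment settings (proxies, stream, verify, cert) explicitly.
    # The docs example passes {} for proxies.
    settings = session.merge_environment_settings(prepped.url, {}, None, None, None)
    # Real code would now call: session.send(prepped, **settings)
    return settings
```

With `REQUESTS_CA_BUNDLE` exported, `settings["verify"]` is that bundle path; without it, `verify` falls back to the session default (`True`).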

Summary by CodeRabbit

  • Bug Fixes

    • HTTP requests now honor environment-based session settings (e.g., proxies, custom CA bundles, SSL verification, client certificates), improving compatibility with enterprise networks and custom trust stores.
  • Tests

    • Added a unit test confirming that a custom CA bundle specified via environment variable is correctly applied during requests.

coderabbitai bot (Contributor) commented Sep 9, 2025

📝 Walkthrough

HttpClient.send_request now merges requests.Session environment settings into per-request kwargs before retry handling. A unit test asserts that REQUESTS_CA_BUNDLE from the environment is propagated as the verify parameter to the internal _send_with_retry call.

Changes

  • HTTP client env settings merge (airbyte_cdk/sources/streams/http/http_client.py): In send_request, calls Session.merge_environment_settings(...) with the request URL and merges the returned env_settings into request_kwargs before invoking the retry logic. No public signatures changed.
  • Unit tests (unit_tests/sources/streams/http/test_http_client.py): Adds a test ensuring REQUESTS_CA_BUNDLE is respected by verifying that the verify value from the environment is passed to _send_with_retry.

Sequence Diagram(s)

sequenceDiagram
  actor Caller
  participant HC as HttpClient
  participant RS as requests.Session
  participant Retry as _send_with_retry

  Caller->>HC: send_request(method, url, request_kwargs)
  HC->>RS: prepare_request(Request(...))
  RS-->>HC: PreparedRequest
  HC->>RS: merge_environment_settings(url, None, None, None, None)
  RS-->>HC: env_settings {proxies, verify, cert, ...}
  Note over HC: Merge env_settings into request_kwargs
  HC->>Retry: _send_with_retry(prepared_req, request_kwargs+env)
  Retry-->>HC: Response
  HC-->>Caller: Response
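The flow in the diagram can be sketched roughly as follows. `SketchHttpClient` is a hypothetical, stripped-down stand-in (not the CDK's `HttpClient`), its `_send_with_retry` is a stub that just returns the kwargs it receives, and it passes `{}` for proxies (as the requests docs example does) rather than the `None` shown in the diagram.

```python
from typing import Any, Dict

import requests


class SketchHttpClient:
    """Hypothetical stand-in for HttpClient that mirrors the diagram only."""

    def __init__(self) -> None:
        self._session = requests.Session()

    def _send_with_retry(
        self, request: requests.PreparedRequest, request_kwargs: Dict[str, Any]
    ) -> Dict[str, Any]:
        # Stub: a real client would retry around self._session.send(request, **request_kwargs).
        # Returning the kwargs makes the merge result easy to inspect.
        return request_kwargs

    def send_request(
        self, http_method: str, url: str, request_kwargs: Dict[str, Any]
    ) -> Dict[str, Any]:
        request = self._session.prepare_request(requests.Request(method=http_method, url=url))
        env_settings = self._session.merge_environment_settings(
            request.url, {}, None, None, None
        )
        # Same order as the diagram: env_settings is unpacked last, so it wins
        # on any key that request_kwargs also sets (e.g. verify, proxies).
        return self._send_with_retry(request, request_kwargs={**request_kwargs, **env_settings})
```

Because `env_settings` is unpacked last in this order, an explicit `verify=False` in `request_kwargs` is replaced by the bundle path whenever `REQUESTS_CA_BUNDLE` is set.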

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Would you like to also add a test for proxies or client certificates from merge_environment_settings to cover more environment-driven cases, wdyt?
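A sketch of what such tests could look like, written against `requests.Session` directly rather than `HttpClient` (test names, URL, and certificate paths are hypothetical). As far as I can tell from the requests source, `merge_environment_settings` reads proxies and the CA-bundle variables from the environment, while `cert` only comes from the session's own `cert` attribute, so the client-certificate case is a session setting rather than an environment one.

```python
import os
from unittest.mock import patch

import requests

URL = "https://api.example.com/items"  # hypothetical endpoint


@patch.dict(os.environ, {"HTTPS_PROXY": "http://proxy.internal:3128"}, clear=True)
def test_env_proxy_is_merged() -> None:
    settings = requests.Session().merge_environment_settings(URL, {}, None, None, None)
    # HTTPS_PROXY is exposed under the lower-case "https" key.
    assert settings["proxies"]["https"] == "http://proxy.internal:3128"


@patch.dict(os.environ, {}, clear=True)
def test_client_cert_comes_from_session() -> None:
    session = requests.Session()
    session.cert = ("/certs/client.crt", "/certs/client.key")  # hypothetical paths
    settings = session.merge_environment_settings(URL, {}, None, None, None)
    # No environment variable is consulted for cert; the session value is used.
    assert settings["cert"] == ("/certs/client.crt", "/certs/client.key")
```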

Pre-merge checks (2 passed, 1 warning)

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 66.67%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Title Check (✅ Passed): The title clearly summarizes the primary change by indicating that the pull request adds support for respecting the REQUESTS_CA_BUNDLE environment variable, and it follows a concise conventional commit style without extraneous details.
  • Description Check (✅ Passed): The description directly addresses the current limitation of HttpClient not honoring requests’ environment variables, explains the motivation for enabling custom CA certificates with Airbyte, and references the relevant documentation, making it clearly on-topic and informative.


@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
unit_tests/sources/streams/http/test_http_client.py (2)

4-4: Remove unused import to keep linters happy.

Looks like os isn’t used elsewhere in this file (the patch.dict decorator resolves the module). Shall we drop it, wdyt?

-import os

749-767: Add override-precedence test and fix formatting with Ruff.

Could we add a sibling test confirming that an explicit request_kwargs["verify"] overrides REQUESTS_CA_BUNDLE? For example:

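# Imports needed if this test is run on its own:
from unittest.mock import MagicMock, patch

from airbyte_cdk.sources.streams.http.http_client import HttpClient


@patch.dict("os.environ", {"REQUESTS_CA_BUNDLE": "/env/ca-bundle.crt"})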
def test_send_request_explicit_verify_overrides_env():
    http_client = HttpClient(name="test", logger=MagicMock())
    with patch.object(http_client, "_send_with_retry") as mock_send:
        http_client.send_request(
            http_method="GET",
            url="https://api.example.com",
            request_kwargs={"timeout": 10, "verify": False},
        )
        passed_kwargs = mock_send.call_args[1]["request_kwargs"]
        assert passed_kwargs["verify"] is False

The CI failure is due to Ruff formatting; running `poetry run ruff format .` (the formatter behind the failing `ruff format --diff` check) should resolve it, wdyt?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ae0e8aa and c2d0351.

📒 Files selected for processing (2)
  • airbyte_cdk/sources/streams/http/http_client.py (1 hunks)
  • unit_tests/sources/streams/http/test_http_client.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
airbyte_cdk/sources/streams/http/http_client.py (1)
airbyte_cdk/sources/streams/http/http.py (1)
  • request_kwargs (242-253)
unit_tests/sources/streams/http/test_http_client.py (1)
airbyte_cdk/sources/streams/http/http_client.py (2)
  • name (518-519)
  • send_request (521-558)
🪛 GitHub Actions: Linters
unit_tests/sources/streams/http/test_http_client.py

[error] 746-746: Ruff format --diff detected formatting changes. 1 file would be reformatted (unit_tests/sources/streams/http/test_http_client.py). Exit code 1. Command: 'poetry run ruff format --diff .'

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Manifest Server Docker Image Build
  • GitHub Check: Pytest (All, Python 3.12, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Pytest (All, Python 3.13, Ubuntu)
  • GitHub Check: Pytest (Fast)
🔇 Additional comments (1)
airbyte_cdk/sources/streams/http/http_client.py (1)

548-550: In http_client.py, let per-request kwargs override environment settings, wdyt?

-        env_settings = self._session.merge_environment_settings(request.url, None, None, None, None)
-        request_kwargs = {**request_kwargs, **env_settings}
+        env_settings = self._session.merge_environment_settings(
+            request.url,
+            request_kwargs.get("proxies"),
+            request_kwargs.get("stream"),
+            request_kwargs.get("verify"),
+            request_kwargs.get("cert"),
+        )
+        # Environment defaults first; explicit per-request kwargs take precedence.
+        request_kwargs = {**env_settings, **request_kwargs}

Could you confirm no existing call sites depend on the current merge order for keys like verify or proxies?
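A minimal sketch of the suggested merge, using a hypothetical helper `merge_with_env` and assuming `proxies`, when present, is a dict as `requests` expects. It copies the proxies dict before handing it over, because `requests` adds environment proxies to the dict it receives.

```python
from typing import Any, Dict

import requests


def merge_with_env(
    session: requests.Session, url: str, request_kwargs: Dict[str, Any]
) -> Dict[str, Any]:
    """Hypothetical helper mirroring the suggested diff.

    Explicit per-request values are passed to merge_environment_settings, so
    requests only falls back to the environment when they are unset; the final
    dict lets request_kwargs win on any remaining key collision.
    """
    env_settings = session.merge_environment_settings(
        url,
        # Copy so requests does not mutate the caller's proxies dict.
        dict(request_kwargs.get("proxies") or {}),
        request_kwargs.get("stream"),
        request_kwargs.get("verify"),
        request_kwargs.get("cert"),
    )
    return {**env_settings, **request_kwargs}
```

With `REQUESTS_CA_BUNDLE` set, `{"verify": False}` stays `False`, while a call that omits `verify` picks up the bundle path.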

@bosper351 bosper351 marked this pull request as draft September 9, 2025 13:59
@bosper351 (Contributor Author)

/autofix

aaronsteers (Contributor) commented Sep 17, 2025

/autofix

Auto-Fix Job Info

This job attempts to auto-fix any linting or formatting issues. If any fixes are made, those changes will be automatically committed and pushed back to the PR.

Note: This job can only be run by maintainers. On PRs from forks, this command requires that the PR author has enabled the Allow edits from maintainers option.

PR auto-fix job started... Check job output.

✅ Changes applied successfully.

aaronsteers (Contributor) commented Sep 17, 2025

/test

PR test job started... Check job output.

❌ Tests failed.

data=data,
)

env_settings = self._session.merge_environment_settings(request.url, None, None, None, None)
@aaronsteers (Contributor) commented Sep 17, 2025

@bosper351 - Thank you for this contribution! 🙏

Can you check on these two pytest failures?

FAILED unit_tests/sources/declarative/decoders/test_decoders_memory_usage.py::test_jsonl_decoder_memory_usage[type: JsonlDecoder] - assert 0 == (2000000 * 4)
FAILED unit_tests/sources/streams/http/test_http.py::test_request_kwargs_used - AssertionError: send(<ANY>, cert=None, proxies='google.com') call not found

@bosper351 bosper351 marked this pull request as ready for review September 18, 2025 14:49
@bosper351 (Contributor Author)

/autofix


list(stream.read_records(sync_mode=SyncMode.full_refresh))

stream._http_client._session.send.assert_any_call(ANY, **request_kwargs)
bosper351 (Contributor Author)

The proxies dict is converted to an OrderedDict somewhere under the hood, and this plain assertion fails due to the type mismatch between a dict and an OrderedDict. To avoid this mismatch, each key of proxies is now compared separately.

@bosper351 (Contributor Author)

@aaronsteers it's resolved. Please check it out when you have a moment.

aaronsteers (Contributor) commented Sep 20, 2025

/test

PR test job started... Check job output.

✅ Tests passed.

@aaronsteers aaronsteers merged commit c67c556 into airbytehq:main Sep 20, 2025
17 of 22 checks passed