Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ Source S3: Add HTTPS validation for S3 endpoint #32109

Merged
merged 12 commits into from Nov 10, 2023

Conversation

tolik0
Copy link
Contributor

@tolik0 tolik0 commented Nov 2, 2023

What

Enhanced security by ensuring S3 endpoints are HTTPS.

How

Implemented a check in the Config class that raises a ValidationError for endpoints starting with "http://".

Copy link

vercel bot commented Nov 2, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Comments Updated (UTC)
airbyte-docs ⬜️ Ignored (Inspect) Visit Preview Nov 9, 2023 10:29pm

Copy link
Contributor

github-actions bot commented Nov 2, 2023

Before Merging a Connector Pull Request

Wow! What a great pull request you have here! 🎉

To merge this PR, ensure the following has been done/considered for each connector added or updated:

  • PR name follows PR naming conventions
  • Breaking changes are considered. If a Breaking Change is being introduced, ensure an Airbyte engineer has created a Breaking Change Plan.
  • Connector version has been incremented in the Dockerfile and metadata.yaml according to our Semantic Versioning for Connectors guidelines
  • You've updated the connector's metadata.yaml file any other relevant changes, including a breakingChanges entry for major version bumps. See metadata.yaml docs
  • Secrets in the connector's spec are annotated with airbyte_secret
  • All documentation files are up to date. (README.md, bootstrap.md, docs.md, etc...)
  • Changelog updated in docs/integrations/<source or destination>/<name>.md with an entry for the new version. See changelog example
  • Migration guide updated in docs/integrations/<source or destination>/<name>-migrations.md with an entry for the new version, if the version is a breaking change. See migration guide example
  • If set, you've ensured the icon is present in the platform-internal repo. (Docs)

If the checklist is complete, but the CI check is failing,

  1. Check for hidden checklists in your PR description

  2. Toggle the github label checklist-action-run on/off to re-run the checklist CI.

@octavia-squidington-iii octavia-squidington-iii added the area/documentation Improvements or additions to documentation label Nov 2, 2023
@tolik0 tolik0 requested a review from brianjlai November 2, 2023 18:30
@tolik0 tolik0 marked this pull request as ready for review November 2, 2023 18:33
@octavia-squidington-iv octavia-squidington-iv requested a review from a team November 2, 2023 18:34
@brianjlai
Copy link
Contributor

The changes look good and thanks for adding some unit tests. It's probably not a super widely used feature, but I have a slight concern of the customers that are using this in an insecure way that we'll disrupt their syncs. I did a search through our config DB in cloud and no users are using http which is definitely good. Once we get a passing CI and tests will 👍

This could affect our OSS customers. but I think we can still move forward with this because this endpoint should only be used securely and for the case of a locally running Airbyte instance, if its in the network they could probably hit the S3 address directly too.

Copy link
Collaborator

@davydov-d davydov-d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also add pattern and examples to spec, and extend the description

@octavia-squidington-iv octavia-squidington-iv requested a review from a team November 2, 2023 18:50
@tolik0 tolik0 requested a review from davydov-d November 2, 2023 22:04
@tolik0 tolik0 force-pushed the tolik0/source-s3/http-endpoint-validation-error branch from e2b3123 to a7004b3 Compare November 3, 2023 11:54
Comment on lines 319 to 322
"description": "Endpoint to an S3 compatible service. Leave empty to use AWS. The custom endpoint must be secure, but the 'https' prefix is not required.",
"default": "",
"examples": ["my-s3-endpoint.com", "https://my-s3-endpoint.com"],
"pattern": "^(?!http://).*$",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really get why we allow http endpoint but raise a validation error if endpoint.startswith("http://")
I think it's fine to allow them on OSS but not on Cloud.
To enforce https on Cloud I suggest to use the is_cloud_environment function of the CDK.
You can check a PR for a similar use case here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alafanechere The regex pattern is designed to reject http:// prefixes. Using is_cloud_environment won't affect the check_connector_https_url_only test since the pattern and check are inside .py file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes sure. I meant that we could allow http on OSS and forbid on cloud with the is_cloud_environment, which would make the pattern not required here.
As you allow my-s3-endpoint.com (without prefix) I'm wondering what enforces the use of HTTPS. I can't spot a logic in your PR that validates that my-s3-endpoint.com is secured.

"default": "",
"examples": ["my-s3-endpoint.com", "https://my-s3-endpoint.com"],
"pattern": "^(?!http://).*$",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any case if you'd still like to allow http:// string here a minor change to this module is required to make the QA check pass.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alafanechere, can I instead modify QA check to pass if there's a specific comment at the end of the line indicating that it should be ignored?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes good idea :) something like #ignore-check-connector-https-url-only

Comment on lines 319 to 322
"description": "Endpoint to an S3 compatible service. Leave empty to use AWS. The custom endpoint must be secure, but the 'https' prefix is not required.",
"default": "",
"examples": ["my-s3-endpoint.com", "https://my-s3-endpoint.com"],
"pattern": "^(?!http://).*$",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes sure. I meant that we could allow http on OSS and forbid on cloud with the is_cloud_environment, which would make the pattern not required here.
As you allow my-s3-endpoint.com (without prefix) I'm wondering what enforces the use of HTTPS. I can't spot a logic in your PR that validates that my-s3-endpoint.com is secured.

Comment on lines 60 to 63
endpoint = values.get("endpoint")
if endpoint:
if endpoint.startswith("http://"):
raise ValidationError("The endpoint must be a secure HTTPS endpoint.", model=Config)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you allow endpoint without https:// prefix I don't think this is enough to validate that the endpoint is secured. I would suggst making a HEAD request to the endpoint with and https:// prefix and assert you get a 200 + a response with https:// prefix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We ensure secure connections by configuring the boto3 client to use SSL and verify certificates, as shown in the code here: boto3 SSL configuration. This aligns with the boto3 documentation on enforcing SSL usage: boto3 documentation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, and this client is used during check right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, the client is utilized during the check to attempt reading a line from each stream. You can see the implementation of the check method here: FileBasedSource.check and the associated availability strategy here: DefaultFileBasedAvailabilityStrategy.

@alafanechere alafanechere requested a review from a team November 7, 2023 10:25
Copy link
Contributor

@alafanechere alafanechere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification @tolik0 . I added a rule in the QA checks to ignore the S3 config file. Feel free to merge once the tests pass.

@@ -10,7 +10,7 @@ data:
connectorSubtype: file
connectorType: source
definitionId: 69589781-7828-43c5-9f63-8925b1c1ccc2
dockerImageTag: 4.1.4
dockerImageTag: 4.1.5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
dockerImageTag: 4.1.5
dockerImageTag: 4.2.0

Shouldn't this be a minor bump as its a new "feature"?

@alafanechere
Copy link
Contributor

alafanechere commented Nov 7, 2023

I reverted my QA checks changes. Feel free to pull in #32242 when merged and use your nice new comment feature!

@tolik0 tolik0 merged commit 21cfb2a into master Nov 10, 2023
26 checks passed
@tolik0 tolik0 deleted the tolik0/source-s3/http-endpoint-validation-error branch November 10, 2023 10:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation connectors/source/s3
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants