New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
✨ Source S3: Add HTTPS validation for S3 endpoint #32109
✨ Source S3: Add HTTPS validation for S3 endpoint #32109
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎ 1 Ignored Deployment
|
Before Merging a Connector Pull RequestWow! What a great pull request you have here! 🎉 To merge this PR, ensure the following has been done/considered for each connector added or updated:
If the checklist is complete, but the CI check is failing,
|
The changes look good and thanks for adding some unit tests. It's probably not a super widely used feature, but I have a slight concern of the customers that are using this in an insecure way that we'll disrupt their syncs. I did a search through our config DB in cloud and no users are using This could affect our OSS customers. but I think we can still move forward with this because this endpoint should only be used securely and for the case of a locally running Airbyte instance, if its in the network they could probably hit the S3 address directly too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's also add pattern and examples to spec, and extend the description
e2b3123
to
a7004b3
Compare
"description": "Endpoint to an S3 compatible service. Leave empty to use AWS. The custom endpoint must be secure, but the 'https' prefix is not required.", | ||
"default": "", | ||
"examples": ["my-s3-endpoint.com", "https://my-s3-endpoint.com"], | ||
"pattern": "^(?!http://).*$", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really get why we allow http
endpoint but raise a validation error if endpoint.startswith("http://")
I think it's fine to allow them on OSS but not on Cloud.
To enforce https on Cloud I suggest to use the is_cloud_environment
function of the CDK.
You can check a PR for a similar use case here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alafanechere The regex pattern is designed to reject http://
prefixes. Using is_cloud_environment
won't affect the check_connector_https_url_only
test since the pattern and check are inside .py
file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes sure. I meant that we could allow http
on OSS and forbid on cloud with the is_cloud_environment
, which would make the pattern not required here.
As you allow my-s3-endpoint.com
(without prefix) I'm wondering what enforces the use of HTTPS. I can't spot a logic in your PR that validates that my-s3-endpoint.com
is secured.
"default": "", | ||
"examples": ["my-s3-endpoint.com", "https://my-s3-endpoint.com"], | ||
"pattern": "^(?!http://).*$", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In any case if you'd still like to allow http://
string here a minor change to this module is required to make the QA check pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alafanechere, can I instead modify QA check to pass if there's a specific comment at the end of the line indicating that it should be ignored?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes good idea :) something like #ignore-check-connector-https-url-only
"description": "Endpoint to an S3 compatible service. Leave empty to use AWS. The custom endpoint must be secure, but the 'https' prefix is not required.", | ||
"default": "", | ||
"examples": ["my-s3-endpoint.com", "https://my-s3-endpoint.com"], | ||
"pattern": "^(?!http://).*$", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes sure. I meant that we could allow http
on OSS and forbid on cloud with the is_cloud_environment
, which would make the pattern not required here.
As you allow my-s3-endpoint.com
(without prefix) I'm wondering what enforces the use of HTTPS. I can't spot a logic in your PR that validates that my-s3-endpoint.com
is secured.
endpoint = values.get("endpoint") | ||
if endpoint: | ||
if endpoint.startswith("http://"): | ||
raise ValidationError("The endpoint must be a secure HTTPS endpoint.", model=Config) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you allow endpoint without https://
prefix I don't think this is enough to validate that the endpoint is secured. I would suggst making a HEAD
request to the endpoint with and https://
prefix and assert you get a 200 + a response with https://
prefix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We ensure secure connections by configuring the boto3 client to use SSL and verify certificates, as shown in the code here: boto3 SSL configuration. This aligns with the boto3 documentation on enforcing SSL usage: boto3 documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, and this client is used during check
right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, the client is utilized during the check to attempt reading a line from each stream. You can see the implementation of the check method here: FileBasedSource.check and the associated availability strategy here: DefaultFileBasedAvailabilityStrategy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the clarification @tolik0 . I added a rule in the QA checks to ignore the S3 config file. Feel free to merge once the tests pass.
@@ -10,7 +10,7 @@ data: | |||
connectorSubtype: file | |||
connectorType: source | |||
definitionId: 69589781-7828-43c5-9f63-8925b1c1ccc2 | |||
dockerImageTag: 4.1.4 | |||
dockerImageTag: 4.1.5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dockerImageTag: 4.1.5 | |
dockerImageTag: 4.2.0 |
Shouldn't this be a minor bump as its a new "feature"?
This reverts commit 0d52e98.
I reverted my QA checks changes. Feel free to pull in #32242 when merged and use your nice new comment feature! |
What
Enhanced security by ensuring S3 endpoints are HTTPS.
How
Implemented a check in the Config class that raises a ValidationError for endpoints starting with
"http://"
.