🚨 Add SSL documentation and check logic for S3 Destination 🚨 #17340

Merged
merged 9 commits into from
Oct 3, 2022

Conversation

ryankfu

@ryankfu ryankfu commented Sep 28, 2022

What

Closes #16301

Adds documentation for customers to secure their S3 destination connection. This follows concerns raised in this Slack thread, which also notes how other companies emphasize the need for customers to allow only encrypted traffic to their S3 buckets/clusters.

There is also a new "check" assertion that requires custom endpoints to be HTTPS-only, as referred to by the original spec.json and here

How

Adds documentation that walks users through the shared responsibility model AWS employs for securing an S3 destination endpoint, and removes the ability to use a custom endpoint that is not HTTPS-only.

Removing the line in constants.ts reverts the hiding of the destination connector in the UI. The S3 destination still existed; it was only hidden in the UI, so users could not update their connections if desired.

Recommended reading order

  1. S3BaseChecks.java
  2. S3DestinationTest.java
  3. s3.md
  4. constants.ts
  5. spec.json

🚨 User Impact 🚨

Are there any breaking changes? What is the end result perceived by the user? If yes, please merge this PR with the 🚨🚨 emoji so changelog authors can further highlight this if needed.

This can break users who were using a custom endpoint that is not HTTPS-only.

Pre-merge Checklist

Expand the relevant checklist and delete the others.

New Connector

Community member or Airbyter

  • Community member? Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
    • docs/integrations/README.md
    • airbyte-integrations/builds.md
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.
  • /test connector=connectors/<name> command is passing
  • New Connector version released on Dockerhub by running the /publish command described here
  • After the connector is published, connector added to connector index as described here
  • Seed specs have been re-generated by building the platform and committing the changes to the seed spec files, as described here
Updating a connector

Community member or Airbyter

  • Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • Changelog updated in docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.
  • /test connector=connectors/<name> command is passing
  • New Connector version released on Dockerhub and connector version bumped by running the /publish command described here
Connector Generator
  • Issue acceptance criteria met
  • PR name follows PR naming conventions
  • If adding a new generator, add it to the list of scaffold modules being tested
  • The generator test modules (all connectors with -scaffold in their name) have been updated with the latest scaffold by running ./gradlew :airbyte-integrations:connector-templates:generator:testScaffoldTemplates then checking in your changes
  • Documentation which references the generator is updated as needed

Tests

Unit

Put your unit tests output here.

Integration

Put your integration tests output here.

Acceptance

Put your acceptance tests output here.

@ryankfu ryankfu requested review from Amruta-Ranade and a team September 28, 2022 17:55
@ryankfu ryankfu requested a review from a team as a code owner September 28, 2022 17:55
@github-actions github-actions bot added area/connectors Connector related issues area/documentation Improvements or additions to documentation area/platform issues related to the platform area/frontend Related to the Airbyte webapp labels Sep 28, 2022
@github-actions

NOTE ⚠️ Changes in this PR affect the following connectors. Make sure to run corresponding integration tests:

  • destination-redshift
  • destination-gcs
  • destination-s3
  • destination-snowflake
  • destination-jdbc
  • destination-r2
  • destination-databricks
  • destination-bigquery-denormalized
  • destination-bigquery

@ryankfu

ryankfu commented Sep 28, 2022

/test connector=connectors/s3-destination

🕑 connectors/s3-destination https://github.com/airbytehq/airbyte/actions/runs/3145769061
❌ connectors/s3-destination https://github.com/airbytehq/airbyte/actions/runs/3145769061
🐛

Build Failed

Test summary info:

Could not find result summary

* @param endpoint URL string representing an accessible S3 bucket
*/
public static void testCustomEndpointSecured(final String endpoint) {
if (!endpoint.contains("s3-accesspoint")) {
Contributor

I think this might be overzealous and also not catch all problematic endpoints? It's possible to self-host an S3-compatible API, which is mostly what this config is used for. And that's considered secure, assuming the connection uses HTTPS. E.g. I might have https://mystorage.edgao.example.com.

But I could also (hypothetically) host my API at http://s3-accesspoint.edgao.example.com - which passes this check, but is not secured.
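To make the two cases above concrete, here is a minimal sketch. The class and method names are hypothetical and not code from this PR; the method only mirrors the `contains("s3-accesspoint")` condition under review.

```java
// Hypothetical illustration; names are not taken from the PR.
final class SubstringCheckDemo {

  // Mirrors the condition under review: an endpoint "passes" only if it
  // contains the substring "s3-accesspoint".
  static boolean passesSubstringCheck(final String endpoint) {
    return endpoint.contains("s3-accesspoint");
  }

  public static void main(final String[] args) {
    // Self-hosted endpoint over HTTPS: secure, yet it does not pass.
    System.out.println(passesSubstringCheck("https://mystorage.edgao.example.com"));
    // Plain-HTTP endpoint containing the substring: unsecured, yet it passes.
    System.out.println(passesSubstringCheck("http://s3-accesspoint.edgao.example.com"));
  }
}
```

The first call shows the false rejection and the second the false acceptance, which is why a check on the URL scheme fits the goal better than a check on the host name.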

Contributor

also, I think we should only be enforcing this check in cloud. I.e. OSS users might want to use unsecured connections intentionally, and we should allow that to happen. (maybe they're hosting airbyte and S3 within their own network, and are OK with having unsecured traffic within their network)

also also, where is this method actually invoked? I'm guessing you have a local unpushed commit or something?

Contributor Author

Updated the check to validate that the endpoint contains https://. Currently looking into using AdaptiveDestinationRunner to enable this check only in Cloud and not for OSS users.

Also added the method invocation within BaseS3Destination, since it accidentally got cleared during the overhaul of the S3 hierarchy 🤦‍♂️

@github-actions

NOTE ⚠️ Changes in this PR affect the following connectors. Make sure to run corresponding integration tests:

  • destination-s3
  • destination-r2
  • destination-bigquery
  • destination-databricks
  • destination-gcs
  • destination-snowflake
  • destination-jdbc
  • destination-redshift
  • destination-bigquery-denormalized

@github-actions

NOTE ⚠️ Changes in this PR affect the following connectors. Make sure to run corresponding integration tests:

  • destination-jdbc
  • destination-r2
  • destination-databricks
  • destination-bigquery-denormalized
  • destination-gcs
  • destination-redshift
  • destination-snowflake
  • destination-s3
  • destination-bigquery

@ryankfu

ryankfu commented Sep 29, 2022

/test connector=connectors/destination-s3

🕑 connectors/destination-s3 https://github.com/airbytehq/airbyte/actions/runs/3150488922
❌ connectors/destination-s3 https://github.com/airbytehq/airbyte/actions/runs/3150488922
🐛 https://gradle.com/s/jwotji4kys77o

Build Failed

Test summary info:

Could not find result summary

@@ -47,6 +47,7 @@ public AirbyteConnectionStatus check(JsonNode config) {
S3BaseChecks.attemptS3WriteAndDelete(storageOperations, destinationConfig, destinationConfig.getBucketName());
S3BaseChecks.testSingleUpload(s3Client, destinationConfig.getBucketName(), destinationConfig.getBucketPath());
S3BaseChecks.testMultipartUpload(s3Client, destinationConfig.getBucketName(), destinationConfig.getBucketPath());
S3BaseChecks.testCustomEndpointSecured(destinationConfig.getEndpoint());
Contributor

from what I understand, this will force all uses of S3 connector to use HTTPS even outside of Airbyte Cloud, which is not the intention of the issue. The intention is to force this only in Airbyte Cloud.

Contributor Author

Yes, I haven't pushed the code that uses AdaptiveDestinationRunner yet; it will enable the custom endpoint check only for Cloud. I'm currently getting a local MinIO S3 instance running to test that a custom endpoint doesn't fail for OSS users
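For readers following along, the idea of enabling the check only in Cloud can be sketched as below. This is a hypothetical illustration and not the AdaptiveDestinationRunner API: every type, the `forMode` helper, and the mode string "CLOUD" are assumptions made for the example.

```java
// Hypothetical sketch; none of these types are Airbyte's actual classes.
final class DeploymentAwareSketch {

  interface Destination {
    String check(String endpoint);
  }

  // OSS variant: accepts any custom endpoint, including plain HTTP
  // (for example a MinIO instance inside a private network).
  static final class OssDestination implements Destination {
    @Override
    public String check(final String endpoint) {
      return "SUCCEEDED";
    }
  }

  // Cloud variant: rejects a custom endpoint that does not start with https://.
  // An unset endpoint is accepted because no custom endpoint is in use.
  static final class CloudDestination implements Destination {
    @Override
    public String check(final String endpoint) {
      final boolean unset = endpoint == null || endpoint.isEmpty();
      return (unset || endpoint.startsWith("https://")) ? "SUCCEEDED" : "FAILED";
    }
  }

  // Chooses the implementation from a deployment-mode string.
  // Assumption: "CLOUD" selects the strict variant, anything else is treated as OSS.
  static Destination forMode(final String deploymentMode) {
    if ("CLOUD".equalsIgnoreCase(deploymentMode)) {
      return new CloudDestination();
    }
    return new OssDestination();
  }
}
```

With this split, `forMode("OSS").check("http://minio.local:9000")` succeeds while the same endpoint fails under `forMode("CLOUD")`, so the base destination never needs the HTTPS assertion itself.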

Contributor

I see you pushed the changes that use AdaptiveDestinationRunner, but this part is still here

Contributor Author

Oh good catch, I'll update this

@github-actions

NOTE ⚠️ Changes in this PR affect the following connectors. Make sure to run corresponding integration tests:

  • source-bigquery
  • destination-pubsub
  • destination-kafka
  • destination-snowflake
  • source-cockroachdb
  • destination-redshift
  • source-alloydb
  • destination-elasticsearch
  • destination-oracle-strict-encrypt
  • source-oracle
  • source-mysql
  • source-e2e-test-cloud
  • source-mongodb-v2
  • destination-mongodb
  • destination-bigquery
  • source-e2e-test
  • destination-mariadb-columnstore
  • destination-dynamodb
  • source-tidb
  • destination-mysql
  • destination-keen
  • source-db2-strict-encrypt
  • destination-local-json
  • destination-cassandra
  • destination-kinesis
  • destination-azure-blob-storage
  • source-kafka
  • destination-tidb
  • destination-rockset
  • source-mongodb-strict-encrypt
  • destination-mssql-strict-encrypt
  • source-mssql-strict-encrypt
  • destination-scylla
  • destination-clickhouse-strict-encrypt
  • destination-mongodb-strict-encrypt
  • source-clickhouse
  • destination-mqtt
  • destination-postgres
  • source-clickhouse-strict-encrypt
  • source-snowflake
  • destination-oracle
  • destination-e2e-test
  • destination-gcs
  • source-jdbc
  • destination-pulsar
  • destination-s3
  • source-redshift
  • destination-dev-null
  • destination-meilisearch
  • source-cockroachdb-strict-encrypt
  • source-mssql
  • destination-redis
  • source-mysql-strict-encrypt
  • source-relational-db
  • destination-databricks
  • destination-jdbc
  • destination-mysql-strict-encrypt
  • destination-postgres-strict-encrypt
  • source-alloydb-strict-encrypt
  • destination-mssql
  • source-sftp
  • source-oracle-strict-encrypt
  • destination-clickhouse
  • source-elasticsearch
  • destination-elasticsearch-strict-encrypt
  • destination-bigquery-denormalized
  • source-postgres
  • destination-csv
  • source-scaffold-java-jdbc
  • source-postgres-strict-encrypt
  • source-db2
  • destination-r2

@ryankfu

ryankfu commented Sep 30, 2022

/test connector=connectors/destination-s3

🕑 connectors/destination-s3 https://github.com/airbytehq/airbyte/actions/runs/3162204720
✅ connectors/destination-s3 https://github.com/airbytehq/airbyte/actions/runs/3162204720
No Python unittests run

Build Passed

Test summary info:

All Passed

@github-actions

NOTE ⚠️ Changes in this PR affect the following connectors. Make sure to run corresponding integration tests:

  • source-bigquery
  • destination-mssql
  • destination-scylla
  • destination-mariadb-columnstore
  • source-redshift
  • destination-clickhouse
  • destination-pubsub
  • source-sftp
  • destination-mongodb
  • source-db2
  • destination-oracle-strict-encrypt
  • destination-postgres-strict-encrypt
  • source-alloydb
  • destination-kinesis
  • source-tidb
  • source-mssql-strict-encrypt
  • source-elasticsearch
  • destination-databricks
  • source-cockroachdb-strict-encrypt
  • source-oracle
  • destination-kafka
  • source-oracle-strict-encrypt
  • source-snowflake
  • destination-csv
  • destination-e2e-test
  • source-postgres-strict-encrypt
  • destination-dynamodb
  • destination-postgres
  • destination-tidb
  • destination-meilisearch
  • source-mysql
  • destination-keen
  • destination-clickhouse-strict-encrypt
  • source-cockroachdb
  • destination-local-json
  • destination-dev-null
  • destination-bigquery-denormalized
  • destination-redshift
  • source-relational-db
  • destination-jdbc
  • source-clickhouse
  • destination-s3
  • destination-snowflake
  • destination-elasticsearch-strict-encrypt
  • destination-mongodb-strict-encrypt
  • source-kafka
  • destination-mqtt
  • destination-mysql-strict-encrypt
  • destination-gcs
  • destination-elasticsearch
  • destination-redis
  • source-clickhouse-strict-encrypt
  • destination-r2
  • source-mysql-strict-encrypt
  • destination-azure-blob-storage
  • source-mongodb-v2
  • destination-pulsar
  • source-e2e-test-cloud
  • source-mssql
  • destination-cassandra
  • source-postgres
  • source-mongodb-strict-encrypt
  • source-jdbc
  • destination-bigquery
  • destination-rockset
  • source-scaffold-java-jdbc
  • source-alloydb-strict-encrypt
  • destination-oracle
  • source-e2e-test
  • source-db2-strict-encrypt
  • destination-mssql-strict-encrypt
  • destination-mysql

@Amruta-Ranade

Doc looks good to me! Will let technical reviewers approve the PR :)

@edgao edgao left a comment

couple tiny nitpicks but otherwise :shipit:

(perfect timing @Amruta-Ranade 😂 )

if (endpoint == null || endpoint.length() == 0) {
return true;
} else {
return endpoint.contains("https://");
Contributor

nitpick:

Suggested change
- return endpoint.contains("https://");
+ return endpoint.startsWith("https://");

maybe we should be using a proper url parser, but that feels kind of overkill here
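For comparison, a parser-based variant could look like the sketch below. It uses `java.net.URI` from the standard library; the class and method names are hypothetical, and this is not the check that was merged. It keeps the same rule as the snippet above that a null or empty endpoint counts as secured.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical alternative to the startsWith check; not code from this PR.
final class EndpointSchemeCheck {

  static boolean isSecured(final String endpoint) {
    // No custom endpoint configured: nothing to validate.
    if (endpoint == null || endpoint.isEmpty()) {
      return true;
    }
    try {
      final URI uri = new URI(endpoint.trim());
      // Compare the scheme case-insensitively and require a parseable host.
      // getHost() is null when the authority is not a server-based host.
      return "https".equalsIgnoreCase(uri.getScheme()) && uri.getHost() != null;
    } catch (final URISyntaxException e) {
      // An endpoint that is not a valid URI is treated as not secured.
      return false;
    }
  }
}
```

Compared with `startsWith("https://")`, this accepts an upper-case scheme such as `HTTPS://` and rejects strings that merely contain `https://` somewhere after the start. The cost is stricter host parsing, which may or may not be wanted here.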

final S3DestinationConfig destinationConfig = this.configFactory.getS3DestinationConfig(config, super.storageProvider());

if (!S3BaseChecks.testCustomEndpointSecured(destinationConfig.getEndpoint())) {
return new AirbyteConnectionStatus()
Contributor

nitpick: add a comment explaining why we do an early return instead of returning the list of this status + super.check(config)
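As an illustration of the early-return shape being discussed, here is a minimal sketch. All names are hypothetical stand-ins, and the stated motivation (not contacting an endpoint already judged insecure) is an assumption for the example; the thread does not spell out the reason.

```java
// Hypothetical sketch of the early-return pattern; not the connector's code.
final class EarlyReturnCheckSketch {

  static final class Result {
    final boolean ok;
    final String message;

    Result(final boolean ok, final String message) {
      this.ok = ok;
      this.message = message;
    }
  }

  // Counts invocations of the stand-in base check so the early return is observable.
  static int baseCheckCalls = 0;

  // Stand-in for super.check(config); a real implementation would write to the bucket.
  static Result baseCheck(final String endpoint) {
    baseCheckCalls++;
    return new Result(true, "base checks passed");
  }

  static Result strictCheck(final String endpoint) {
    final boolean unset = endpoint == null || endpoint.isEmpty();
    if (!unset && !endpoint.startsWith("https://")) {
      // Assumed rationale: return before running the base checks so that no
      // request is sent to an endpoint already considered insecure.
      return new Result(false, "Custom endpoint does not use HTTPS: " + endpoint);
    }
    return baseCheck(endpoint);
  }
}
```

Calling `strictCheck("http://minio.local:9000")` returns a failed result without incrementing `baseCheckCalls`, whereas an `https://` endpoint falls through to the base check.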

import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;

public class S3DestinationStrictEncryptTest {
Contributor

nice, solid test cases :)

@ryankfu

ryankfu commented Oct 3, 2022

/publish connector=connectors/destination-s3 run-tests=false

🕑 Publishing the following connectors:
connectors/destination-s3
https://github.com/airbytehq/airbyte/actions/runs/3176939327


Connector Did it publish? Were definitions generated?
connectors/destination-s3

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

@github-actions

github-actions bot commented Oct 3, 2022

NOTE ⚠️ Changes in this PR affect the following connectors. Make sure to run corresponding integration tests:

  • source-jdbc
  • source-alloydb
  • destination-bigquery
  • destination-clickhouse-strict-encrypt
  • destination-clickhouse
  • destination-mariadb-columnstore
  • source-mysql
  • destination-kafka
  • destination-mqtt
  • destination-kinesis
  • destination-csv
  • source-relational-db
  • source-sftp
  • source-alloydb-strict-encrypt
  • destination-mssql-strict-encrypt
  • destination-e2e-test
  • source-mongodb-v2
  • destination-r2
  • source-clickhouse
  • source-cockroachdb
  • source-elasticsearch
  • destination-snowflake
  • destination-rockset
  • destination-mssql
  • destination-dev-null
  • source-postgres-strict-encrypt
  • source-snowflake
  • source-mssql-strict-encrypt
  • source-e2e-test-cloud
  • source-db2
  • destination-mysql
  • destination-elasticsearch-strict-encrypt
  • source-clickhouse-strict-encrypt
  • source-e2e-test
  • source-mssql
  • destination-dynamodb
  • source-oracle
  • destination-gcs
  • destination-postgres
  • source-bigquery
  • source-oracle-strict-encrypt
  • destination-pubsub
  • destination-s3
  • destination-postgres-strict-encrypt
  • destination-scylla
  • destination-redis
  • source-db2-strict-encrypt
  • destination-cassandra
  • source-tidb
  • destination-oracle
  • source-mysql-strict-encrypt
  • destination-tidb
  • source-postgres
  • source-scaffold-java-jdbc
  • destination-meilisearch
  • source-redshift
  • destination-local-json
  • destination-elasticsearch
  • source-kafka
  • destination-oracle-strict-encrypt
  • destination-bigquery-denormalized
  • destination-redshift
  • destination-mysql-strict-encrypt
  • destination-mongodb
  • destination-keen
  • destination-mongodb-strict-encrypt
  • destination-azure-blob-storage
  • destination-jdbc
  • destination-pulsar
  • destination-databricks
  • source-mongodb-strict-encrypt
  • source-cockroachdb-strict-encrypt

@ryankfu ryankfu merged commit 1d956df into master Oct 3, 2022
@ryankfu ryankfu deleted the ryan/add-ssl-s3-destination branch October 3, 2022 20:56
jhammarstedt pushed a commit to jhammarstedt/airbyte that referenced this pull request Oct 31, 2022
…hq#17340)

* Adds logic to fail upon non-deterministic custom S3 endpoint and documentation for insecure settings

* Reused config factory settings to a single static variable

* Updated error message and example in the spec.json to match expectation of secured endpoint

* Added validation check within the base s3

* Integrated AdaptiveDestinationRunner with S3Destination

* Reduced visibility for testing and fixed AdaptiveDestinationRunner issue

* Adds specific secure protocol with S3 and empty endpoint check

* Bumps docker version and adds comments and clearer string methods

* auto-bump connector version [ci skip]

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
@grishick grishick added the team/destinations Destinations team's backlog label Nov 8, 2022
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation area/frontend Related to the Airbyte webapp area/platform issues related to the platform connectors/destination/s3 team/destinations Destinations team's backlog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add ability to enforce SSL in S3 Destination connector
6 participants