File CDK: S3 connector testing & slow rollout #28137

Closed
5 tasks
clnoll opened this issue Jul 11, 2023 · 1 comment

Comments

Contributor

clnoll commented Jul 11, 2023

This ticket covers the testing and reconnaissance needed to ensure that the new S3 connector is backwards-compatible with existing syncs, and the deliberate, gradual rollout of the new connector so that we can keep an eye on those syncs.

This ticket is blocked on the S3 config adapter tickets.

If there are any breaking changes that cannot be avoided using the config adapter, we should identify the connectors that will be impacted and communicate this info to Support prior to the release of the S3 connector.

Acceptance Criteria

  • A new S3 connector that uses the File CDK is published.
  • The connector can be used to sync all file types (CSV, JSONL, Parquet, and Avro).
  • Test coverage exists for the old and new versions of the config.
  • Tests can be run using CATs.
  • Any unavoidable breaking changes are communicated to Support & on Slack, and we have an understanding of which connectors will be affected.
clnoll changed the title from "File CDK: S3 connector backwards compatibility" to "File CDK: S3 connector - ensure backwards compatibility" on Jul 11, 2023
clnoll changed the title from "File CDK: S3 connector - ensure backwards compatibility" to "File CDK: S3 connector with backwards compatibility" on Jul 11, 2023
Contributor Author

clnoll commented Jul 11, 2023

Grooming notes:

  • We will have the following breaking changes:
    • Removing the columns option for the Parquet file type; no Cloud customers are using it, but some open-source users might be.
    • newlines_in_values options for JSONL; these both have some adoption but supporting them would force us to use pyarrow for parsing both of these file types, which comes with memory-related drawbacks.
  • Review existing S3 CATs to ensure test coverage
  • Choose some (all?) existing S3 connectors and run syncs on them to verify they succeed, swapping out the old connector via LaunchDarkly
  • The idea is that this will be a slow rollout of the new connector to a few workspaces at a time.
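
As a rough illustration of how we might work out which connectors are impacted by the breaking changes above, here is a minimal sketch that scans legacy configs for the affected options. The config layout is an assumption: it supposes each config has a `format` object with a `filetype` key, and that the options appear under it as `columns` and `newlines_in_values`. The function names and connector ids are hypothetical.

```python
from typing import Any, Dict, List

# Assumed legacy option names, keyed by the assumed `filetype` value.
# Only the two breaking changes named in the grooming notes are listed.
BREAKING_OPTIONS: Dict[str, List[str]] = {
    "parquet": ["columns"],
    "jsonl": ["newlines_in_values"],
}


def find_breaking_options(config: Dict[str, Any]) -> List[str]:
    """Return the breaking-change options that a legacy config actually sets."""
    fmt = config.get("format") or {}
    filetype = fmt.get("filetype")
    hits = []
    for option in BREAKING_OPTIONS.get(filetype, []):
        # Treat None, False, and empty lists as "not set".
        if fmt.get(option):
            hits.append(f"{filetype}.{option}")
    return hits


def impacted_connectors(configs: Dict[str, Dict[str, Any]]) -> Dict[str, List[str]]:
    """Map connector id -> breaking options in use, skipping unaffected connectors."""
    report = {}
    for connector_id, config in configs.items():
        hits = find_breaking_options(config)
        if hits:
            report[connector_id] = hits
    return report
```

Running `impacted_connectors` over an export of existing connector configs would give a list to share with Support before the release; the real field names should be checked against the legacy spec first.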

clnoll changed the title from "File CDK: S3 connector with backwards compatibility" to "File CDK: S3 connector testing & slow rollout" on Aug 3, 2023
maxi297 closed this as completed on Sep 5, 2023