-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Source S3: infer schema of the first file only #23189
Source S3: infer schema of the first file only #23189
Conversation
/test connector=connectors/source-s3
Build FailedTest summary info:
|
airbyte-integrations/connectors/source-s3/integration_tests/integration_test_abstract.py
Show resolved
Hide resolved
airbyte-integrations/connectors/source-s3/acceptance-test-config.yml
Outdated
Show resolved
Hide resolved
airbyte-integrations/connectors/source-s3/source_s3/source_files_abstract/source.py
Show resolved
Hide resolved
airbyte-integrations/connectors/source-s3/source_s3/source_files_abstract/stream.py
Show resolved
Hide resolved
airbyte-integrations/connectors/source-s3/source_s3/source_files_abstract/stream.py
Show resolved
Hide resolved
☝️ the failing test is not related to this PR |
this PR fixes the error above |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGMT for the precise fix you did, but failing tests are not going to allow you to merge it the way it should be, right, maybe the order of operations should be different here and you might want to merge the avro fix first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank for the effort. The overall logic looks good to me but I have a couple of questions/suggestions.
airbyte-integrations/connectors/source-s3/acceptance-test-config.yml
Outdated
Show resolved
Hide resolved
airbyte-integrations/connectors/source-s3/source_s3/source_files_abstract/source.py
Show resolved
Hide resolved
airbyte-integrations/connectors/source-s3/integration_tests/config_minio.json
Outdated
Show resolved
Hide resolved
airbyte-integrations/connectors/source-s3/source_s3/source_files_abstract/stream.py
Show resolved
Hide resolved
airbyte-integrations/connectors/source-s3/source_s3/source_files_abstract/stream.py
Show resolved
Hide resolved
airbyte-integrations/connectors/source-s3/source_s3/source_files_abstract/stream.py
Outdated
Show resolved
Hide resolved
Hey @alafanechere can you please take a look at the comments I've left? |
…es' of github.com:airbytehq/airbyte into ddavydov/#1470-source-s3-do-not-infer-schema-of-all-files
/test connector=connectors/source-s3
Build FailedTest summary info:
|
/test connector=connectors/source-s3
Build PassedTest summary info:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank for the changes! I just have minor suggestions remaining.
Please sync with @airbytehq/cloud-support and @YowanR to decide if it's safe to deploy this potentially backward version to all customers now.
If not I would suggest pinning the version to 1.0.0 on Cloud and wait for our "feature flag" functionality to test this version on a subset of users.
airbyte-integrations/connectors/source-s3/source_s3/source_files_abstract/stream.py
Outdated
Show resolved
Hide resolved
@alafanechere thanks! What is the "feature flag" functionality? Do you have any links? |
@davydov-d It's a project @pedroslopez and @erohmensing will tackle pretty soon. It will allow us to deploy a preview version of a connector to a subset of workspaces. Here's the tech spec if you're curious. |
/test connector=connectors/source-s3
Build PassedTest summary info:
|
/publish connector=connectors/source-s3
if you have connectors that successfully published but failed definition generation, follow step 4 here |
* airbytehq#1470 Source S3: infer schema of the first file * airbytehq#1470 source s3: upd changelog * airbytehq#1470 source s3: review fixes * airbytehq#1470 source s3: review fixes * airbytehq#1470 source s3: bump version * airbytehq#1470 source s3: review fixes * auto-bump connector version --------- Co-authored-by: Serhii Lazebnyi <53845333+lazebnyi@users.noreply.github.com> Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
* #1470 Source S3: infer schema of the first file * #1470 source s3: upd changelog * #1470 source s3: review fixes * #1470 source s3: review fixes * #1470 source s3: bump version * #1470 source s3: review fixes * auto-bump connector version --------- Co-authored-by: Serhii Lazebnyi <53845333+lazebnyi@users.noreply.github.com> Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
* #1470 Source S3: infer schema of the first file * #1470 source s3: upd changelog * #1470 source s3: review fixes * #1470 source s3: review fixes * #1470 source s3: bump version * #1470 source s3: review fixes * auto-bump connector version --------- Co-authored-by: Serhii Lazebnyi <53845333+lazebnyi@users.noreply.github.com> Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
What
https://github.com/airbytehq/oncall/issues/1470
How
If user enforced schema is not provided - infer file schema based on the first file only, do not iterate over all the available files