Source s3 error on unsupported config #29541

clnoll · 2023-08-17T18:00:32Z

As suggested during sprint planning, raise an exception if the user sets additional_reader_options or advanced_options that aren't supported in v4.

github-actions · 2023-08-17T18:01:01Z

Before Merging a Connector Pull Request

Wow! What a great pull request you have here! 🎉

To merge this PR, ensure the following has been done/considered for each connector added or updated:

PR name follows PR naming conventions
Breaking changes are considered. If a Breaking Change is being introduced, ensure an Airbyte engineer has created a Breaking Change Plan.
Connector version has been incremented in the Dockerfile and metadata.yaml according to our Semantic Versioning for Connectors guidelines
You've updated the connector's metadata.yaml file with any other relevant changes, including a breakingChanges entry for major version bumps. See metadata.yaml docs
Secrets in the connector's spec are annotated with airbyte_secret
All documentation files are up to date. (README.md, bootstrap.md, docs.md, etc...)
Changelog updated in docs/integrations/<source or destination>/<name>.md with an entry for the new version. See changelog example
Migration guide updated in docs/integrations/<source or destination>/<name>-migrations.md with an entry for the new version, if the version is a breaking change. See migration guide example
If set, you've ensured the icon is present in the platform-internal repo. (Docs)

If the checklist is complete, but the CI check is failing,

Check for hidden checklists in your PR description
Toggle the github label checklist-action-run on/off to re-run the checklist CI.

octavia-squidington-iii · 2023-08-17T18:05:58Z

source-s3 test report (commit `5f57aaa5ba`) - ❌

⏲️ Total pipeline duration: 03mn18s

Step	Result
Validate airbyte-integrations/connectors/source-s3/metadata.yaml	✅
Connector version semver check	✅
Connector version increment check	❌
QA checks	✅
Code format checks	❌
Connector package install	✅
Build source-s3 docker image for platform linux/x86_64	✅
Unit tests	❌

Please note that tests are only run on PRs that are ready for review. Please set your PR to draft mode so that following commits do not flood the CI engine and upstream service.
You can run the same pipeline locally on this branch with the airbyte-ci tool, using the following command:

airbyte-ci connectors --name=source-s3 test

girarda · 2023-08-17T18:21:44Z

airbyte-integrations/connectors/source-s3/source_s3/v4/legacy_config_transformer.py

+            if autogenerate_column_names := advanced_options.pop("autogenerate_column_names", None):
+                csv_options["autogenerate_column_names"] = autogenerate_column_names
+
+            if advanced_options or additional_reader_options:


Do we want to allow auto_dict_encode and timestamp_parsers? I think @maxi297 verified they had no impact on the output, so we don't need to "deprecate" them.

There's also an instance with "check_utf8" set to False, which will fail.

Cool, I updated this so we ignore them now. @maxi297 can you confirm that was what you found?

Sorry @girarda I didn't see your second comment until I pushed my change. Fixing now.

And good catch!

Just to make sure we are on the same page:

For auto_dict_encode, the type returned by pyarrow, converted to a string, is dictionary<values=string, indices=int32, ordered=0>. Since we don't have this type here, we fall back to a string.

For timestamp_parsers, the type returned by pyarrow, converted to a string, is timestamp[s, tz=UTC]. Since we don't have this type here, we fall back to a string.

Therefore, I think there is no impact, since the value would have been a string anyway.

Makes sense to me @maxi297!
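
For readers following along, the fallback described above can be sketched roughly as below. This is only an illustration: the mapping table `PYARROW_TO_JSON_TYPE` and the helper `json_type_for` are hypothetical names, not the connector's actual code, and the entries in the table are assumptions chosen for the example.

```python
from typing import Dict

# Hypothetical mapping from pyarrow type strings to JSON-schema type names.
# The connector's real table is not shown in this thread.
PYARROW_TO_JSON_TYPE: Dict[str, str] = {
    "bool": "boolean",
    "int64": "integer",
    "double": "number",
    "string": "string",
}


def json_type_for(pyarrow_type: str) -> str:
    """Return the mapped type, falling back to "string" for unknown types."""
    return PYARROW_TO_JSON_TYPE.get(pyarrow_type, "string")


# The two type strings quoted in the comment above are not in the table,
# so both take the fallback branch.
dictionary_type = "dictionary<values=string, indices=int32, ordered=0>"
timestamp_type = "timestamp[s, tz=UTC]"
print(json_type_for(dictionary_type))
print(json_type_for(timestamp_type))
```

Under these assumptions, enabling either option changes the type pyarrow infers but not the type that is finally emitted, which is why both can be ignored safely.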

maxi297 · 2023-08-18T12:20:27Z

airbyte-integrations/connectors/source-s3/unit_tests/v4/test_legacy_config_transformer.py

+                "quote_char": '"',
+                "encoding": "utf8",
+                "double_quote": True,
+                "null_values": ["", "null", "NULL", "N/A", "NA", "NaN", "None"],


No action item yet but I did change those three default values in the rollout branch so this will have to be updated once we merge this into the rollout branch

I was going to merge it into my other PR and then into master, instead of into the rollout branch, to isolate the rollout branch to code related to the switchover instead of including bug fixes. WDYT?

girarda · 2023-08-18T14:12:39Z

airbyte-integrations/connectors/source-s3/source_s3/v4/legacy_config_transformer.py

+    @staticmethod
+    def _filter_legacy_noops(advanced_options: Dict[str, Any]):
+        ignore_all = ("auto_dict_encode", "timestamp_parsers")
+        ignore_by_value = (("check_utf8", False),)


Nice. Good idea not to silently ignore check_utf8=True.
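
A minimal sketch of the behavior discussed in this thread, assuming the filter drops the known no-op options and anything left over raises. Only the two tuples come from the diff above; the function bodies, the `validate_legacy_options` helper, and the `UnsupportedLegacyOptionsError` class are hypothetical and are not taken from the merged code.

```python
from typing import Any, Dict


class UnsupportedLegacyOptionsError(ValueError):
    """Hypothetical error type used only for this sketch."""


def filter_legacy_noops(advanced_options: Dict[str, Any]) -> None:
    """Remove, in place, legacy options that were found to have no effect."""
    ignore_all = ("auto_dict_encode", "timestamp_parsers")
    ignore_by_value = (("check_utf8", False),)

    # These options are dropped whatever value they carry.
    for option in ignore_all:
        advanced_options.pop(option, None)

    # These options are dropped only when set to the listed value.
    for option, value_to_ignore in ignore_by_value:
        if advanced_options.get(option) == value_to_ignore:
            advanced_options.pop(option)


def validate_legacy_options(
    advanced_options: Dict[str, Any],
    additional_reader_options: Dict[str, Any],
) -> None:
    """Raise if any option remains after the known no-ops are filtered out."""
    remaining = dict(advanced_options)  # copy so the caller's dict is untouched
    filter_legacy_noops(remaining)
    if remaining or additional_reader_options:
        leftover = sorted({**remaining, **additional_reader_options})
        raise UnsupportedLegacyOptionsError(
            f"Options not supported in v4: {leftover}"
        )


# Passes: every option here is a known no-op.
validate_legacy_options({"auto_dict_encode": True, "check_utf8": False}, {})

# Raises: check_utf8=True is not in the ignore list, so it is reported.
try:
    validate_legacy_options({"check_utf8": True}, {})
except UnsupportedLegacyOptionsError as error:
    print(error)
```

Matching on the (option, value) pair rather than the option name alone is what keeps check_utf8=True from being ignored silently.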

clnoll requested review from brianjlai and maxi297 August 17, 2023 18:00

octavia-squidington-iii added the area/connectors and connectors/source/s3 labels Aug 17, 2023

clnoll changed the base branch from master to fix-legacy-state-identification August 17, 2023 18:02

clnoll requested a review from girarda August 17, 2023 18:04

girarda reviewed Aug 17, 2023

clnoll added 2 commits August 17, 2023 14:35

Source S3: raise exception in v4 if deprecated additional_reader_opti…

083553d

…ons or advanced_options are set

Ignore options that were identified to be noops

1fdf926

clnoll force-pushed the source-s3-error-on-unsupported-config branch from 5f57aaa to 1fdf926 Compare August 17, 2023 19:03

clnoll added 2 commits August 17, 2023 15:17

formatting

dfb05c5

Handle case where we want to ignore only specific value

fcdf72f

maxi297 reviewed Aug 18, 2023

clnoll requested review from maxi297 and girarda August 18, 2023 12:47

girarda approved these changes Aug 18, 2023

clnoll merged commit cfdc274 into fix-legacy-state-identification Aug 18, 2023
23 of 26 checks passed

clnoll deleted the source-s3-error-on-unsupported-config branch August 18, 2023 14:54
