Source s3 error on unsupported config #29541

Merged

clnoll
Contributor

@clnoll commented Aug 17, 2023

Closes #29531.

As suggested during sprint planning, raise an exception if the user sets additional_reader_options or advanced_options that aren't supported in v4.
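A minimal sketch of the check described above. It assumes a hypothetical helper, `convert_legacy_csv_options`, that maps the one legacy option visible in this PR's diff (`autogenerate_column_names`) onto the v4 CSV options and raises on anything left over. The function name, the exception type, and the message are illustrative only, not the connector's actual code.

```python
from typing import Any, Dict


def convert_legacy_csv_options(
    advanced_options: Dict[str, Any],
    additional_reader_options: Dict[str, Any],
) -> Dict[str, Any]:
    """Map supported legacy options onto v4 CSV options; raise on anything left over.

    Hypothetical helper: only "autogenerate_column_names" is treated as supported here.
    """
    # Work on copies so pop() does not mutate the caller's config.
    advanced_options = dict(advanced_options)
    additional_reader_options = dict(additional_reader_options)
    csv_options: Dict[str, Any] = {}

    # Translate the one option known to have a v4 equivalent (taken from the diff in this PR).
    if autogenerate_column_names := advanced_options.pop("autogenerate_column_names", None):
        csv_options["autogenerate_column_names"] = autogenerate_column_names

    # Anything still present has no v4 equivalent, so fail loudly instead of dropping it.
    if advanced_options or additional_reader_options:
        raise ValueError(
            "Options not supported in v4: "
            f"advanced_options={advanced_options}, "
            f"additional_reader_options={additional_reader_options}"
        )
    return csv_options
```

With this sketch, `convert_legacy_csv_options({"autogenerate_column_names": True}, {})` returns `{"autogenerate_column_names": True}`, while passing any other key (for example `{"skip_rows": 1}` as an additional reader option) raises `ValueError`.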

@github-actions
Contributor

Before Merging a Connector Pull Request

Wow! What a great pull request you have here! 🎉

To merge this PR, ensure the following has been done/considered for each connector added or updated:

  • PR name follows PR naming conventions
  • Breaking changes are considered. If a Breaking Change is being introduced, ensure an Airbyte engineer has created a Breaking Change Plan.
  • Connector version has been incremented in the Dockerfile and metadata.yaml according to our Semantic Versioning for Connectors guidelines
  • You've updated the connector's metadata.yaml file with any other relevant changes, including a breakingChanges entry for major version bumps. See metadata.yaml docs
  • Secrets in the connector's spec are annotated with airbyte_secret
  • All documentation files are up to date (README.md, bootstrap.md, docs.md, etc.)
  • Changelog updated in docs/integrations/<source or destination>/<name>.md with an entry for the new version. See changelog example
  • Migration guide updated in docs/integrations/<source or destination>/<name>-migrations.md with an entry for the new version, if the version is a breaking change. See migration guide example
  • If set, you've ensured the icon is present in the platform-internal repo. (Docs)

If the checklist is complete but the CI check is failing:

  1. Check for hidden checklists in your PR description

  2. Toggle the GitHub label checklist-action-run on/off to re-run the checklist CI.

@clnoll changed the base branch from master to fix-legacy-state-identification August 17, 2023 18:02
@clnoll requested a review from girarda August 17, 2023 18:04
@octavia-squidington-iii
Collaborator

source-s3 test report (commit 5f57aaa5ba) - ❌

⏲️ Total pipeline duration: 03mn18s

Steps run:

  • Validate airbyte-integrations/connectors/source-s3/metadata.yaml
  • Connector version semver check
  • Connector version increment check
  • QA checks
  • Code format checks
  • Connector package install
  • Build source-s3 docker image for platform linux/x86_64
  • Unit tests

Please note that tests are only run on PRs that are ready for review. Please set your PR to draft mode to avoid flooding the CI engine and upstream services on subsequent commits.
You can run the same pipeline locally on this branch using the airbyte-ci tool with the following command:

airbyte-ci connectors --name=source-s3 test

if autogenerate_column_names := advanced_options.pop("autogenerate_column_names", None):
    csv_options["autogenerate_column_names"] = autogenerate_column_names

if advanced_options or additional_reader_options:
Contributor

Do we want to allow auto_dict_encode and timestamp_parsers? I think @maxi297 verified that they have no impact on the output, so we don't need to "deprecate" them.

Contributor

There's also an instance with "check_utf8" set to False, which will fail.

Contributor Author

Cool, I updated this so that we ignore them now. @maxi297, can you confirm that's what you found?

Contributor Author

Sorry @girarda, I didn't see your second comment until after I pushed my change. Fixing now.

Contributor Author

And good catch!

Contributor

Just to make sure we're on the same page:

  • For auto_dict_encode, the type returned by pyarrow, converted to a string, is dictionary<values=string, indices=int32, ordered=0>. Since we don't handle this type here, we fall back to string.
  • For timestamp_parsers, the type returned by pyarrow, converted to a string, is timestamp[s, tz=UTC]. Since we don't handle this type here, we fall back to string.

Therefore, I think there is no impact, since the value would have been a string anyway.
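To illustrate the fallback described above, here is a sketch that assumes a hypothetical lookup table from pyarrow type strings to JSON schema types; the connector's real mapping may contain different entries. Any type string missing from the table, such as the dictionary and timestamp types quoted above, resolves to "string".

```python
from typing import Dict

# Hypothetical lookup from pyarrow type strings to JSON schema types.
# The connector's real table may differ; only the fallback behaviour matters here.
PYARROW_TO_JSON_SCHEMA: Dict[str, str] = {
    "bool": "boolean",
    "int64": "integer",
    "double": "number",
    "string": "string",
}


def json_schema_type(pyarrow_type: str) -> str:
    """Return the JSON schema type for a pyarrow type string, defaulting to "string"."""
    return PYARROW_TO_JSON_SCHEMA.get(pyarrow_type, "string")


# Neither type string below is in the table, so both fall back to "string".
print(json_schema_type("dictionary<values=string, indices=int32, ordered=0>"))
print(json_schema_type("timestamp[s, tz=UTC]"))
```

Under this assumption, enabling either option changes the pyarrow type but not the resulting schema type, which is why ignoring them is safe.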

Contributor Author

Makes sense to me @maxi297!

@clnoll force-pushed the source-s3-error-on-unsupported-config branch from 5f57aaa to 1fdf926 August 17, 2023 19:03
"quote_char": '"',
"encoding": "utf8",
"double_quote": True,
"null_values": ["", "null", "NULL", "N/A", "NA", "NaN", "None"],
Contributor

No action item yet, but I changed those three default values in the rollout branch, so this will need to be updated once we merge this into the rollout branch.

Contributor Author

@clnoll Aug 18, 2023

I was planning to merge this into my other PR and then into master, rather than into the rollout branch, so that the rollout branch contains only code related to the switchover and not bug fixes. WDYT?

@clnoll requested review from maxi297 and girarda August 18, 2023 12:47
@staticmethod
def _filter_legacy_noops(advanced_options: Dict[str, Any]):
    ignore_all = ("auto_dict_encode", "timestamp_parsers")
    ignore_by_value = (("check_utf8", False),)
Contributor

Nice. Good idea not to silently ignore check_utf8=True.
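A standalone sketch of how the two ignore lists in `_filter_legacy_noops` could be applied. Only `ignore_all` and `ignore_by_value` come from this PR; the function name `filter_legacy_noops`, the choice to return a filtered copy rather than mutate the input, and the rest of the body are assumptions.

```python
from typing import Any, Dict


def filter_legacy_noops(advanced_options: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of advanced_options without legacy options known to be no-ops in v4."""
    # Ignored whatever their value (no effect on the output, per the discussion above).
    ignore_all = ("auto_dict_encode", "timestamp_parsers")
    # Ignored only for the listed value; check_utf8=True is deliberately kept.
    ignore_by_value = (("check_utf8", False),)

    filtered = {key: value for key, value in advanced_options.items() if key not in ignore_all}
    for key, noop_value in ignore_by_value:
        if key in filtered and filtered[key] == noop_value:
            del filtered[key]
    return filtered
```

For example, `filter_legacy_noops({"auto_dict_encode": True, "check_utf8": False})` returns `{}`, while `filter_legacy_noops({"check_utf8": True})` returns `{"check_utf8": True}`, so that option would still reach the unsupported-options check instead of being dropped silently.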

@clnoll merged commit cfdc274 into fix-legacy-state-identification Aug 18, 2023
23 of 26 checks passed
@clnoll deleted the source-s3-error-on-unsupported-config branch August 18, 2023 14:54

Successfully merging this pull request may close these issues.

Raise exception if config options that aren't supported in v4 are used
4 participants