New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Source s3 error on unsupported config #29541
Source s3 error on unsupported config #29541
Conversation
Before Merging a Connector Pull RequestWow! What a great pull request you have here! 🎉 To merge this PR, ensure the following has been done/considered for each connector added or updated:
If the checklist is complete, but the CI check is failing,
|
source-s3 test report (commit
|
Step | Result |
---|---|
Validate airbyte-integrations/connectors/source-s3/metadata.yaml | ✅ |
Connector version semver check | ✅ |
Connector version increment check | ❌ |
QA checks | ✅ |
Code format checks | ❌ |
Connector package install | ✅ |
Build source-s3 docker image for platform linux/x86_64 | ✅ |
Unit tests | ❌ |
☁️ View runs for commit in Dagger Cloud
Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command
airbyte-ci connectors --name=source-s3 test
if autogenerate_column_names := advanced_options.pop("autogenerate_column_names", None): | ||
csv_options["autogenerate_column_names"] = autogenerate_column_names | ||
|
||
if advanced_options or additional_reader_options: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we want to allow auto_dict_encode
, and timestamp_parsers
? I think @maxi297 verified they had no impact on the output so we don't need to "deprecate" them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's also an instance with "check_utf" set to False, which will fail
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool I updated this so we ignore them now. @maxi297 can you confirm that was what you found?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry @girarda I didn't see your second comment until I pushed my change. Fixing now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And good catch!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to make sure we are on the same page:
- For
auto_dict_encode
, the type returned by pyarrow converted as a string isdictionary<values=string, indices=int32, ordered=0>
. Since we don't have this type in here, we fallback as a string - For
timestamp_parsers
, the type returned by pyarrow converted as a string istimestamp[s, tz=UTC]
. Since we don't have this type in here, we fallback as a string
Therefore, I think there is no impact since the value would have been a string anyway
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me @maxi297!
…ons or advanced_options are set
5f57aaa
to
1fdf926
Compare
"quote_char": '"', | ||
"encoding": "utf8", | ||
"double_quote": True, | ||
"null_values": ["", "null", "NULL", "N/A", "NA", "NaN", "None"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No action item yet but I did change those three default values in the rollout branch so this will have to be updated once we merge this into the rollout branch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going to merge it into my other PR and then into master, instead of into the rollout branch, to isolate the rollout branch to code related to the switchover instead of including bug fixes. WDYT?
@staticmethod | ||
def _filter_legacy_noops(advanced_options: Dict[str, Any]): | ||
ignore_all = ("auto_dict_encode", "timestamp_parsers") | ||
ignore_by_value = (("check_utf8", False),) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice. good idea not to silently ignore check_utf=True
cfdc274
into
fix-legacy-state-identification
Closes #29531.
As suggested during sprint planning, raise an exception if the user sets
additional_reader_options
oradvanced_options
that aren't supported in v4.