-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
✨CAT: Add test to ensure all file types covered #33746
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎ 1 Ignored Deployment
|
Warning Soft code freeze is in effect until 2024-01-02. Please avoid merging to master. #freedom-and-responsibility |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @askarpets - just a few requests.
In addition to the requests inline in the code, can you please update the PR description with:
- An example of how a user can set unsupported file types
- An example of how a user can bypass this test
- A list of which connectors we expect to run this test by default (i.e. a list of file-based, certified connectors). If we expect any to fail because the test config needs to be updated to list the unsupported file types, please state that here and either open a ticket or go ahead and update the config.
file_types: Optional[FileTypesConfig] = Field( | ||
default_factory=FileTypesConfig, | ||
description="For file-based connectors, unsupported by source file types can be configured or a test can be skipped at all", | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if this should just be called unsupported_file_types
, and should be a list/set (similar to empty_streams
). WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @clnoll, thanks for the review!
I added this section as a complex object mostly because we need to have the ability to skip this test at all, and to do so, a user just need to set FileTypesConfig.skip_test = True
instead of listing all available file types. Do you think there could be a better approach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I see what you mean now. LGTM!
@staticmethod | ||
def _get_file_extension(file_name: str) -> str: | ||
_, file_extension = splitext(file_name) | ||
return file_extension |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return file_extension | |
return file_extension.casefold() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion, thanks! I also added this to _get_unsupported_file_types
method to avoid issues when users list unsupported file types in uppercase for some reason.
f"Please make sure you added files with all of supported structured types {structured_types} " | ||
f"and at least one with unstructured type {unstructured_types} to the test account." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you mind updating this to 1) list out the ones that are missing, and 2) provide instructions for marking a file type as unsupported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, please take a look
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! Looks great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
This reverts commit 5d0fc4d.
What
Add a CAT to ensure all file types covered
Resolves #33363
How
This test will run if connector's metadata has the following params:
connectorSubtype: file
ab_internal.ql >= 400
(i.e. certified file-based connectors) and verify that the sandbox account for this connector contains the following:
.avro
, .csv
,.jsonl
,.parquet
.pdf
,.doc
,.docx
,.ppt
,.pptx
,.md
unless otherwise specified in the connector's tests config (
acceptance-test-config.yml
).In case the connector does not support some of the file types listed above, there is a possibility to disable checks for them using
unsupported_types
section in config:Another option is to skip the test at all:
Important note
Next steps
The following connectors have
connectorSubtype: file
andab_internal.ql >= 400
and their configs need to be reviewed:source-file
source-google-sheets
source-s3
Recommended reading order
test_core.py
config.py
🚨 User Impact 🚨
No breaking changes
Pre-merge Actions
Updating the Python CDK
Airbyter
Before merging:
--use-local-cdk --name=source-<connector>
as optionsairbyte-ci connectors --use-local-cdk --name=source-<connector> test
After merging: