🎉 Airbyte CDK (File-based CDK): Stop the sync if the record could not be parsed #32589
airbyte-cdk/python/airbyte_cdk/sources/file_based/stream/default_file_based_stream.py
airbyte-cdk/python/airbyte_cdk/sources/file_based/config/avro_format.py
airbyte-cdk/python/airbyte_cdk/sources/file_based/file_types/avro_parser.py
airbyte-cdk/python/airbyte_cdk/sources/file_based/file_types/parquet_parser.py
airbyte-cdk/python/airbyte_cdk/sources/file_based/file_types/avro_parser.py
…se-RecordParseError
airbyte-cdk/python/unit_tests/sources/file_based/stream/test_default_file_based_stream.py
airbyte-cdk/python/unit_tests/sources/file_based/scenarios/csv_scenarios.py
airbyte-cdk/python/airbyte_cdk/sources/file_based/file_types/avro_parser.py
Looking good, just requesting a few small changes @bazarnov.
airbyte-cdk/python/airbyte_cdk/sources/file_based/file_types/csv_parser.py
airbyte-cdk/python/airbyte_cdk/sources/file_based/file_types/parquet_parser.py
Thanks @bazarnov, this is getting really close! In addition to the new comments, can you also upgrade the S3 connector (and other file-based connectors if it's not too much trouble) to use the version of the CDK from this branch, and verify that the CATs pass?
I'll do my best to test the
The CATs have passed successfully using: airbyte-ci connectors --use-local-cdk --name=source-s3 test
I'll create the follow-up issue to upgrade the
We have a
Which of these should be updated? @clnoll
airbyte-cdk/python/airbyte_cdk/sources/file_based/file_types/avro_parser.py
…se-RecordParseError
🚢
…se-RecordParseError
@bazarnov please start by migrating a couple of the less-used connectors (maybe azure-blob-storage first, then gcs) and monitor for errors before migrating the others.
@clnoll Thanks for the suggestion. We have already created follow-up issues for all connectors and will include them in upcoming sprints.
Great, thank you @lazebnyi!
What
Resolving: #31605
How
RecordParseError to stream.default_file_based_stream
.FileBasedErrorCollector to collect the parse_errors and yield them after all the streams are read, then raise the AirbyteTracebackException to highlight the issues with the sync, instead of silently succeeding with a bunch of logged errors.
🚨 User Impact 🚨
Some users whose syncs have succeeded so far will now hit an error, which should prompt them to fix their files on their end or change the file input to continue syncing. This is intentional, as mentioned in this comment:
#31605 (comment)
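The collect-then-raise behavior described under "How" can be illustrated with a minimal, self-contained sketch. This is not the CDK's code: every name here (ParseErrorCollector, SyncFailedError, parse_file, read_all, and the local RecordParseError stand-in) is hypothetical, and the toy "CSV" check only exists to trigger a parse failure. The idea it shows is that a parse error stops the affected file but not the other files, and the sync fails only after everything else has been read.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List, Union


class RecordParseError(Exception):
    """Hypothetical stand-in for a per-file parse failure (not the CDK class)."""


class SyncFailedError(Exception):
    """Hypothetical stand-in for the exception raised once all files are read."""


@dataclass
class ParseErrorCollector:
    """Accumulates parse errors so they can be surfaced at the end of the read."""

    errors: List[str] = field(default_factory=list)

    def collect(self, message: str) -> None:
        self.errors.append(message)

    def yield_and_raise_collected(self) -> Iterator[str]:
        # Emit every collected error message, then fail the sync loudly.
        if self.errors:
            yield from self.errors
            count = len(self.errors)
            self.errors.clear()
            raise SyncFailedError(f"{count} file(s) could not be parsed")


def parse_file(name: str, lines: Iterable[str]) -> Iterator[dict]:
    """Toy parser: each line must look like 'key,value'."""
    for line_no, line in enumerate(lines, start=1):
        if "," not in line:
            raise RecordParseError(f"{name}: line {line_no} is not valid CSV")
        key, value = line.split(",", 1)
        yield {"file": name, "key": key, "value": value}


def read_all(
    files: Dict[str, List[str]], collector: ParseErrorCollector
) -> Iterator[Union[dict, str]]:
    """Read every file; defer parse errors until all files have been attempted."""
    for name, lines in files.items():
        try:
            yield from parse_file(name, lines)
        except RecordParseError as exc:
            # Stop reading this file, remember the error, move on to the next.
            collector.collect(str(exc))
    # After all files: surface the errors and raise instead of succeeding silently.
    yield from collector.yield_and_raise_collected()
```

With this sketch, a caller iterating over read_all(...) receives the records from healthy files first, then the collected error messages, and finally a SyncFailedError if any file failed to parse. If no file fails, the iteration simply ends without raising.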