Source S3: choose between data types when merging master schema #16631

davydov-d
Collaborator

What

When the master schema is built from two or more JSON schemas, the same field can end up with mismatched types. Currently we log a warning and keep the type from the user-provided configuration. If the user did not provide a schema, an error is raised.

How

Where possible, choose the broader of the two data types. Otherwise, fall back to the previous behavior and raise an error.
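For reviewers, here is a rough sketch of the "choose the broader type" idea, not the connector's actual code. The widening order (`integer` < `number` < `string`) and the names `WIDENING_ORDER`, `broadest_type`, and `merge_schemas` are assumptions made for illustration. Schemas are simplified to flat `{column: type}` dicts, and the user-provided schema case is left out.

```python
from typing import Dict

# Assumed widening order, narrowest to broadest (hypothetical; the real
# connector may rank types differently or support more of them).
WIDENING_ORDER = ["integer", "number", "string"]


def broadest_type(first: str, second: str) -> str:
    """Return the broader of two type names, or raise if they cannot be compared."""
    if first == second:
        return first
    if first in WIDENING_ORDER and second in WIDENING_ORDER:
        # The type that appears later in the list is the broader one.
        return max(first, second, key=WIDENING_ORDER.index)
    # No safe widening exists (e.g. object vs. integer): keep the old behavior.
    raise ValueError(f"Cannot merge incompatible types: {first!r} and {second!r}")


def merge_schemas(master: Dict[str, str], new: Dict[str, str]) -> Dict[str, str]:
    """Merge a new {column: type} schema into the master schema."""
    merged = dict(master)
    for column, new_type in new.items():
        if column in merged:
            merged[column] = broadest_type(merged[column], new_type)
        else:
            merged[column] = new_type
    return merged
```

With this sketch, merging `{"id": "integer"}` with `{"id": "string", "name": "string"}` would widen `id` to `string`, while a pair such as `object` and `integer` would still raise.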

@github-actions github-actions bot added area/connectors Connector related issues area/documentation Improvements or additions to documentation labels Sep 13, 2022
davydov-d commented Sep 13, 2022

/test connector=connectors/source-s3

🕑 connectors/source-s3 https://github.com/airbytehq/airbyte/actions/runs/3044004798
✅ connectors/source-s3 https://github.com/airbytehq/airbyte/actions/runs/3044004798
Python tests coverage:

	 Name                                                 Stmts   Miss  Cover   Missing
	 ----------------------------------------------------------------------------------
	 source_acceptance_test/base.py                          10      4    60%   15-18
	 source_acceptance_test/config.py                        83      6    93%   78-80, 84-86
	 source_acceptance_test/conftest.py                     164    164     0%   6-282
	 source_acceptance_test/plugin.py                        48     48     0%   6-104
	 source_acceptance_test/tests/test_core.py              329    111    66%   39, 50-58, 63-70, 74-75, 79-80, 164, 202-219, 228-236, 240-245, 251, 284-289, 327-334, 374-376, 379, 439-448, 477-478, 484, 487, 520-530, 543-568, 573-577
	 source_acceptance_test/tests/test_full_refresh.py       52      2    96%   34, 65
	 source_acceptance_test/tests/test_incremental.py       121     25    79%   21-23, 29-31, 36-43, 48-61, 208-216
	 source_acceptance_test/utils/asserts.py                 37      2    95%   57-58
	 source_acceptance_test/utils/common.py                  77     17    78%   15-16, 24-30, 47-54, 64, 67
	 source_acceptance_test/utils/compare.py                 62     23    63%   21-51, 68, 97-99
	 source_acceptance_test/utils/connector_runner.py       110     48    56%   23-26, 32, 36, 39-64, 67-69, 72-74, 77-79, 82-84, 87-89, 92-110, 144-146
	 source_acceptance_test/utils/json_schema_helper.py     105     13    88%   30-31, 38, 41, 65-68, 96, 120, 190-192
	 ----------------------------------------------------------------------------------
	 TOTAL                                                 1325    463    65%
Name                                                              Stmts   Miss  Cover
-------------------------------------------------------------------------------------
source_s3/source_files_abstract/formats/parquet_spec.py               9      0   100%
source_s3/source_files_abstract/formats/jsonl_spec.py                13      0   100%
source_s3/source_files_abstract/formats/csv_spec.py                  16      0   100%
source_s3/source_files_abstract/formats/avro_spec.py                  5      0   100%
source_s3/s3file.py                                                  37      0   100%
source_s3/s3_utils.py                                                19      0   100%
source_s3/__init__.py                                                 2      0   100%
source_s3/source.py                                                  29      1    97%
source_s3/source_files_abstract/storagefile.py                       23      1    96%
source_s3/source_files_abstract/formats/abstract_file_parser.py      35      2    94%
source_s3/stream.py                                                  43      3    93%
source_s3/source_files_abstract/stream.py                           238     17    93%
source_s3/source_files_abstract/formats/csv_parser.py                76     18    76%
source_s3/source_files_abstract/file_info.py                         26      8    69%
source_s3/utils.py                                                   31     10    68%
source_s3/source_files_abstract/source.py                            37     14    62%
source_s3/source_files_abstract/spec.py                              44     22    50%
source_s3/source_files_abstract/formats/jsonl_parser.py              33     17    48%
source_s3/source_files_abstract/formats/avro_parser.py               38     25    34%
source_s3/source_files_abstract/formats/parquet_parser.py            61     44    28%
-------------------------------------------------------------------------------------
TOTAL                                                               815    182    78%
Name                                                              Stmts   Miss  Cover
-------------------------------------------------------------------------------------
source_s3/source_files_abstract/storagefile.py                       23      0   100%
source_s3/source_files_abstract/spec.py                              44      0   100%
source_s3/source_files_abstract/formats/parquet_spec.py               9      0   100%
source_s3/source_files_abstract/formats/jsonl_spec.py                13      0   100%
source_s3/source_files_abstract/formats/jsonl_parser.py              33      0   100%
source_s3/source_files_abstract/formats/csv_spec.py                  16      0   100%
source_s3/source_files_abstract/formats/avro_spec.py                  5      0   100%
source_s3/source_files_abstract/formats/abstract_file_parser.py      35      0   100%
source_s3/source.py                                                  29      0   100%
source_s3/s3file.py                                                  37      0   100%
source_s3/s3_utils.py                                                19      0   100%
source_s3/__init__.py                                                 2      0   100%
source_s3/source_files_abstract/formats/parquet_parser.py            61      1    98%
source_s3/stream.py                                                  43      1    98%
source_s3/source_files_abstract/source.py                            37      2    95%
source_s3/source_files_abstract/formats/avro_parser.py               38      3    92%
source_s3/source_files_abstract/file_info.py                         26      3    88%
source_s3/source_files_abstract/stream.py                           238     40    83%
source_s3/source_files_abstract/formats/csv_parser.py                76     18    76%
source_s3/utils.py                                                   31      8    74%
-------------------------------------------------------------------------------------
TOTAL                                                               815     76    91%

Build Passed

Test summary info:

All Passed

@davydov-d davydov-d merged commit 73ba7b6 into master Sep 19, 2022
@davydov-d davydov-d deleted the ddavydov/#422-source-s3-choose-broadest-data-type-when-merging-json-schemas branch September 19, 2022 07:50
@davydov-d davydov-d restored the ddavydov/#422-source-s3-choose-broadest-data-type-when-merging-json-schemas branch September 19, 2022 07:51
davydov-d commented Sep 19, 2022

/publish connector=connectors/source-s3

🕑 Publishing the following connectors:
connectors/source-s3
https://github.com/airbytehq/airbyte/actions/runs/3080752562


Connector Did it publish? Were definitions generated?
connectors/source-s3

If you have connectors that published successfully but failed definition generation, follow step 4 here ▶️

robbinhan pushed a commit to robbinhan/airbyte that referenced this pull request Sep 29, 2022
…ytehq#16631)

* airbytehq#422 source s3: choose broadest data type when there is a mismatch during merging json schemas

* airbytehq#422 source s3: upd changelog
jhammarstedt pushed a commit to jhammarstedt/airbyte that referenced this pull request Oct 31, 2022
…ytehq#16631)

* airbytehq#422 source s3: choose broadest data type when there is a mismatch during merging json schemas

* airbytehq#422 source s3: upd changelog
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation connectors/source/s3
5 participants