Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Source file: do not read whole file on check and discover #24278

Merged
merged 4 commits into from
Mar 22, 2023

Conversation

davydov-d
Copy link
Collaborator

What

https://github.com/airbytehq/oncall/issues/1681
Fix timing out check and discover commands

How

Read only the header of a file on check.
Read a single chunk of data on discover.

@davydov-d
Copy link
Collaborator Author

davydov-d commented Mar 21, 2023

/test connector=connectors/source-file

🕑 connectors/source-file https://github.com/airbytehq/airbyte/actions/runs/4478139235
✅ connectors/source-file https://github.com/airbytehq/airbyte/actions/runs/4478139235
Python tests coverage:

Name                      Stmts   Miss  Cover
---------------------------------------------
source_file/__init__.py       2      0   100%
source_file/utils.py         13      1    92%
source_file/source.py        81      7    91%
source_file/client.py       288     37    87%
---------------------------------------------
TOTAL                       384     45    88%
Name                      Stmts   Miss  Cover
---------------------------------------------
source_file/__init__.py       2      0   100%
source_file/client.py       288     43    85%
source_file/utils.py         13      8    38%
source_file/source.py        81     60    26%
---------------------------------------------
TOTAL                       384    111    71%

Build Passed

Test summary info:

=========================== short test summary info ============================
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/plugin.py:63: Skipping TestIncremental.test_two_sequential_reads: Incremental syncs are not supported on this connector.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:100: The previous and actual specifications are identical.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:509: The previous and actual discovered catalogs are identical.
======================== 32 passed, 3 skipped in 48.86s ========================

# this is to ensure we make all conditions under which the bug is reproduced, i.e.
# - chunk size < file size
# - column type in the last chunk is not `string`
@patch("source_file.client.Client.CSV_CHUNK_SIZE", 1)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dropped this test because it doesn't represent the expected behavior from now on

with client.reader.open():
list(client.streams)
return AirbyteConnectionStatus(status=Status.SUCCEEDED)
list(client.streams(empty_schema=True))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Read only file header when running check to ensure the connection succeeds

reader_options["chunksize"] = self.CSV_CHUNK_SIZE
if skip_data:
reader_options["nrows"] = 0
reader_options["index_col"] = 0
yield from reader(fp, **reader_options)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Read only self.CSV_CHUNK_SIZE bytes of data to generate schema. Otherwise a time out is possible in case a large file is read

@davydov-d
Copy link
Collaborator Author

davydov-d commented Mar 21, 2023

/publish connector=connectors/source-file

🕑 Publishing the following connectors:
connectors/source-file
https://github.com/airbytehq/airbyte/actions/runs/4483199836


Connector Did it publish? Were definitions generated?
connectors/source-file

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

@davydov-d
Copy link
Collaborator Author

davydov-d commented Mar 21, 2023

/publish connector=connectors/source-file-secure

🕑 Publishing the following connectors:
connectors/source-file-secure
https://github.com/airbytehq/airbyte/actions/runs/4483366192


Connector Did it publish? Were definitions generated?
connectors/source-file-secure

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

@davydov-d davydov-d merged commit 9fe07dc into master Mar 22, 2023
@davydov-d davydov-d deleted the ddavydov/#1681-source-file-fix-check branch March 22, 2023 08:06
erohmensing pushed a commit that referenced this pull request Mar 22, 2023
* #1681 source file: do not read whole file on check and discover

* #1681 source file: upd changelog

* #1681 source file: add allowed hosts

* auto-bump connector version

---------

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
erohmensing pushed a commit that referenced this pull request Mar 22, 2023
* #1681 source file: do not read whole file on check and discover

* #1681 source file: upd changelog

* #1681 source file: add allowed hosts

* auto-bump connector version

---------

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation connectors/source/file connectors/source/file-secure
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants