csv discovery takes too much memory (#2089) #2883

Merged
Zirochkaa merged 7 commits into master from oleh/2089-csv-discovery-takes-too-much-memory
Apr 16, 2021

Conversation

Zirochkaa
Contributor

@Zirochkaa Zirochkaa commented Apr 14, 2021

Use chunksize parameter when reading csv files.

What

This PR addresses issue #2089 (csv discovery takes too much memory).

How

Pass the `chunksize` parameter when reading CSV files, so that the file is loaded one chunk at a time instead of all at once. Each chunk holds 10,000 lines.
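
For readers unfamiliar with chunked reading, here is a minimal sketch of the idea, assuming the connector reads CSV through pandas (`chunksize` is a `pandas.read_csv` parameter). The in-memory CSV and the chunk size of 10 are made up for illustration; the PR itself uses 10,000 lines per chunk.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a large file (25 data rows).
csv_text = "id,name\n" + "\n".join(f"{i},row{i}" for i in range(25))

# With chunksize set, read_csv returns an iterator of DataFrames
# instead of a single DataFrame holding every row.
reader = pd.read_csv(io.StringIO(csv_text), chunksize=10)

chunk_sizes = []
for chunk in reader:
    # Only the current chunk (at most 10 rows here) is held in memory.
    chunk_sizes.append(len(chunk))

print(chunk_sizes)  # prints [10, 10, 5]
```

Peak memory then scales with the chunk size rather than with the file size, which is the point of the change.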

Pre-merge Checklist

  • Run integration tests
  • Publish Docker images

Recommended reading order

  1. airbyte-integrations/connectors/source-file/source_file/client.py

Use `chunksize` parameter when reading csv files.
@Zirochkaa
Contributor Author

Zirochkaa commented Apr 14, 2021

/test connector=source-file

🕑 source-file https://github.com/airbytehq/airbyte/actions/runs/748845557
❌ source-file https://github.com/airbytehq/airbyte/actions/runs/748845557

@Zirochkaa Zirochkaa linked an issue Apr 15, 2021 that may be closed by this pull request
Handle `None` value for `fields` variable in `client.read()` function.
@Zirochkaa
Contributor Author

Zirochkaa commented Apr 15, 2021

/test connector=source-file

🕑 source-file https://github.com/airbytehq/airbyte/actions/runs/751408029
✅ source-file https://github.com/airbytehq/airbyte/actions/runs/751408029

Contributor

@keu keu left a comment

See my comments.

Refactor code in `client.read()` function. Remove duplicate code.
@Zirochkaa
Contributor Author

Zirochkaa commented Apr 15, 2021

/test connector=source-file

🕑 source-file https://github.com/airbytehq/airbyte/actions/runs/751566871
✅ source-file https://github.com/airbytehq/airbyte/actions/runs/751566871

Contributor

@vitaliizazmic vitaliizazmic left a comment

The `load_dataframes` method requires changes, per @keu's comment.

Implement version with yield statements.
@Zirochkaa
Contributor Author

Zirochkaa commented Apr 16, 2021

/test connector=source-file

🕑 source-file https://github.com/airbytehq/airbyte/actions/runs/755903104
✅ source-file https://github.com/airbytehq/airbyte/actions/runs/755903104

Update return annotation for load_dataframes().
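
The last few commits (the `None` handling for `fields`, the `yield`-based version, and the return annotation) can be pictured roughly as below. This is only a sketch under assumptions, not the merged code: `load_chunks` and `read_records` are hypothetical stand-ins for `load_dataframes()` and `client.read()`, the `Iterable[pd.DataFrame]` annotation is a guess at what the update looked like, and treating `fields=None` as "keep every column" is an assumed behaviour.

```python
import io
from typing import Iterable, Iterator, List, Optional

import pandas as pd


def load_chunks(fp, chunksize: int = 10_000) -> Iterable[pd.DataFrame]:
    """Hypothetical stand-in for load_dataframes(): yield one DataFrame per chunk."""
    # read_csv with chunksize returns an iterator; yield from keeps it lazy.
    yield from pd.read_csv(fp, chunksize=chunksize)


def read_records(
    fp, fields: Optional[List[str]] = None, chunksize: int = 10_000
) -> Iterator[dict]:
    """Hypothetical stand-in for client.read(): yield one dict per CSV row."""
    for df in load_chunks(fp, chunksize):
        # Assumption: fields=None means "keep every column".
        if fields is None:
            columns = list(df.columns)
        else:
            columns = [c for c in fields if c in df.columns]
        yield from df[columns].to_dict(orient="records")


# Usage with a tiny in-memory file and a chunk size of 2.
sample = io.StringIO("a,b\n1,2\n3,4\n5,6\n")
for record in read_records(sample, fields=["a"], chunksize=2):
    print(record)
```

Because both functions are generators, nothing is read until the caller starts iterating, and at most one chunk is alive at a time.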
@Zirochkaa
Contributor Author

Zirochkaa commented Apr 16, 2021

/test connector=source-file

🕑 source-file https://github.com/airbytehq/airbyte/actions/runs/756013060
❌ source-file https://github.com/airbytehq/airbyte/actions/runs/756013060

@Zirochkaa Zirochkaa requested a review from keu April 16, 2021 14:56
@Zirochkaa
Contributor Author

Zirochkaa commented Apr 16, 2021

/test connector=source-file

🕑 source-file https://github.com/airbytehq/airbyte/actions/runs/756087460
✅ source-file https://github.com/airbytehq/airbyte/actions/runs/756087460

Update source-file connector version from `0.2.1` to `0.2.2`.
@Zirochkaa
Contributor Author

Zirochkaa commented Apr 16, 2021

/publish connector=connectors/source-file

🕑 connectors/source-file https://github.com/airbytehq/airbyte/actions/runs/756283124
✅ connectors/source-file https://github.com/airbytehq/airbyte/actions/runs/756283124

@Zirochkaa Zirochkaa merged commit df80593 into master Apr 16, 2021
@Zirochkaa Zirochkaa deleted the oleh/2089-csv-discovery-takes-too-much-memory branch April 16, 2021 16:51
Development

Successfully merging this pull request may close these issues.

csv discovery takes too much memory
4 participants