csv discovery takes too much memory (#2089) #2883

Zirochkaa · 2021-04-14T15:16:51Z

Use chunksize parameter when reading csv files.

What

This PR addresses issue #2089.

How

Pass the `chunksize` parameter when reading CSV files. Instead of loading the whole file into memory, the reader loads it one chunk at a time, with 10000 lines per chunk.
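A minimal sketch of the chunked read described above, assuming the connector reads CSV through `pandas.read_csv` (the PR text names only the `chunksize` parameter). The helper `chunk_row_counts` and the sample data are hypothetical and exist only to show that each iteration holds at most one chunk.

```python
import io

import pandas as pd

CHUNK_SIZE = 10_000  # rows per chunk, the figure given in the PR description


def chunk_row_counts(source, chunk_size=CHUNK_SIZE):
    """Return the number of rows in each chunk read from a CSV source.

    Hypothetical helper for illustration; it is not part of the connector.
    """
    counts = []
    # With chunksize set, read_csv returns an iterator of DataFrames
    # instead of a single DataFrame holding the whole file.
    for chunk in pd.read_csv(source, chunksize=chunk_size):
        counts.append(len(chunk))
    return counts


# 25 data rows, read 10 at a time (a small chunk size to keep the demo short).
sample = "id,name\n" + "".join(f"{i},row{i}\n" for i in range(25))
print(chunk_row_counts(io.StringIO(sample), chunk_size=10))  # [10, 10, 5]
```

Only the last chunk may be shorter than `chunk_size`; peak memory is bounded by the chunk size rather than by the file size.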

Pre-merge Checklist

Run integration tests
Publish Docker images

Recommended reading order

airbyte-integrations/connectors/source-file/source_file/client.py

Use `chunksize` parameter when reading csv files.

Zirochkaa · 2021-04-14T15:27:00Z

/test connector=source-file

🕑 source-file https://github.com/airbytehq/airbyte/actions/runs/748845557
❌ source-file https://github.com/airbytehq/airbyte/actions/runs/748845557

Handle `None` value for `fields` variable in `client.read()` function.

Zirochkaa · 2021-04-15T08:59:26Z

/test connector=source-file

🕑 source-file https://github.com/airbytehq/airbyte/actions/runs/751408029
✅ source-file https://github.com/airbytehq/airbyte/actions/runs/751408029

keu

see my comments

airbyte-integrations/connectors/source-file/source_file/client.py

Refactor code in `client.read()` function. Remove duplicate code.

Zirochkaa · 2021-04-15T09:52:23Z

/test connector=source-file

🕑 source-file https://github.com/airbytehq/airbyte/actions/runs/751566871
✅ source-file https://github.com/airbytehq/airbyte/actions/runs/751566871

vitaliizazmic

The `load_dataframes` method requires changes, per @keu's comment.

Implement version with yield statements.
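The yield-based version might look roughly like the sketch below. The names `load_dataframes` and `read` come from this PR, but the signatures, the `fields` filtering logic, and the use of `pandas.read_csv` are assumptions made for illustration, not the connector's actual code.

```python
import io
from typing import Iterable, Iterator, Optional

import pandas as pd


def load_dataframes(source, chunk_size: int = 10_000) -> Iterator[pd.DataFrame]:
    """Yield one DataFrame per chunk instead of returning a full list."""
    yield from pd.read_csv(source, chunksize=chunk_size)


def read(
    source,
    fields: Optional[Iterable[str]] = None,
    chunk_size: int = 10_000,
) -> Iterator[dict]:
    """Yield records lazily; fields=None is treated as "keep every column"."""
    wanted = set(fields) if fields else None
    for df in load_dataframes(source, chunk_size):
        if wanted is not None:
            # Keep only the requested columns, preserving file order.
            df = df[[col for col in df.columns if col in wanted]]
        yield from df.to_dict(orient="records")


csv_text = "a,b\n1,x\n2,y\n3,z\n"
for record in read(io.StringIO(csv_text), fields=["b"], chunk_size=2):
    print(record)
```

Because both functions are generators, nothing is read until the caller starts iterating, and only one chunk is materialized at a time.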

Zirochkaa · 2021-04-16T14:16:42Z

/test connector=source-file

🕑 source-file https://github.com/airbytehq/airbyte/actions/runs/755903104
✅ source-file https://github.com/airbytehq/airbyte/actions/runs/755903104

airbyte-integrations/connectors/source-file/source_file/client.py

Update return annotation for load_dataframes().

Zirochkaa · 2021-04-16T14:56:23Z

/test connector=source-file

🕑 source-file https://github.com/airbytehq/airbyte/actions/runs/756013060
❌ source-file https://github.com/airbytehq/airbyte/actions/runs/756013060

Remove unused import.

Zirochkaa · 2021-04-16T15:21:56Z

/test connector=source-file

🕑 source-file https://github.com/airbytehq/airbyte/actions/runs/756087460
✅ source-file https://github.com/airbytehq/airbyte/actions/runs/756087460

Update source-file connector version from `0.2.1` to `0.2.2`.

Zirochkaa · 2021-04-16T16:35:13Z

/publish connector=connectors/source-file

🕑 connectors/source-file https://github.com/airbytehq/airbyte/actions/runs/756283124
✅ connectors/source-file https://github.com/airbytehq/airbyte/actions/runs/756283124

csv discovery takes too much memory (#2089)

ee1208c

Use `chunksize` parameter when reading csv files.

Zirochkaa requested review from keu, arhip11, yevhenii-ldv and vitaliizazmic April 14, 2021 15:16

Zirochkaa self-assigned this Apr 14, 2021

auto-assign bot requested review from ChristopheDuong and jrhizor April 14, 2021 15:16

Zirochkaa removed request for jrhizor and ChristopheDuong April 14, 2021 15:26

Zirochkaa linked an issue Apr 15, 2021 that may be closed by this pull request

csv discovery takes too much memory #2089

Closed

csv discovery takes too much memory (#2089)

1efd444

Handle `None` value for `fields` variable in `client.read()` function.

keu suggested changes Apr 15, 2021


airbyte-integrations/connectors/source-file/source_file/client.py (outdated, resolved)

airbyte-integrations/connectors/source-file/source_file/client.py (resolved)

csv discovery takes too much memory (#2089)

391f455

Refactor code in `client.read()` function. Remove duplicate code.

vitaliizazmic reviewed Apr 16, 2021


csv discovery takes too much memory (#2089)

f465bd9

Implement version with yield statements.

Zirochkaa requested review from keu and vitaliizazmic April 16, 2021 14:26

keu suggested changes Apr 16, 2021


airbyte-integrations/connectors/source-file/source_file/client.py (resolved)

csv discovery takes too much memory (#2089)

bd5d2f0

Update return annotation for load_dataframes().

Zirochkaa requested a review from keu April 16, 2021 14:56

csv discovery takes too much memory (#2089)

3cc1e27

Remove unused import.

keu approved these changes Apr 16, 2021


Zirochkaa requested a review from sherifnada April 16, 2021 15:34

vitaliizazmic approved these changes Apr 16, 2021


sherifnada approved these changes Apr 16, 2021


csv discovery takes too much memory (#2089)

04e4052

Update source-file connector version from `0.2.1` to `0.2.2`.

Zirochkaa merged commit df80593 into master Apr 16, 2021

Zirochkaa deleted the oleh/2089-csv-discovery-takes-too-much-memory branch April 16, 2021 16:51
