csv discovery takes too much memory (#2089) #2883
Conversation
Use `chunksize` parameter when reading csv files.
/test connector=source-file
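A minimal sketch of the chunked reading this commit describes, assuming a plain pandas reader; the function name, path handling, and default chunk size here are illustrative, not the connector's actual code:

```python
import pandas as pd

def iter_csv_chunks(path: str, chunksize: int = 10_000):
    # With chunksize set, pd.read_csv returns an iterator of DataFrames
    # instead of a single DataFrame, so only one chunk is in memory at a time.
    for chunk in pd.read_csv(path, chunksize=chunksize):
        yield chunk
```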
Handle `None` value for `fields` variable in `client.read()` function.
/test connector=source-file
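A sketch of the `None` guard this commit describes; the `read()` signature and column handling are assumptions about the connector's code, not taken from it:

```python
from typing import Iterable, Optional

def read(self, fields: Optional[Iterable[str]] = None):
    for df in self.load_dataframes():
        # With fields=None, emit every column instead of failing
        # when trying to iterate over a None value.
        columns = df.columns.intersection(list(fields)) if fields is not None else df.columns
        yield from df[columns].to_dict(orient="records")
```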
see my comments
airbyte-integrations/connectors/source-file/source_file/client.py
Refactor code in `client.read()` function. Remove duplicate code.
/test connector=source-file
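The PR does not show the refactored code; one hypothetical shape of the deduplication it describes is to funnel every branch of `read()` through a single record-emitting helper (all names below are assumptions):

```python
def read(self, fields=None):
    # Single code path: each chunk goes through one shared helper
    # instead of repeating the record-emitting loop per branch.
    for df in self.load_dataframes():
        yield from self._to_records(df, fields)

def _to_records(self, df, fields):
    columns = df.columns.intersection(list(fields)) if fields is not None else df.columns
    yield from df[columns].to_dict(orient="records")
```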
The `load_dataframes` method requires changes according to @keu's comment.
Implement version with yield statements.
/test connector=source-file
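A minimal sketch of a yield-based `load_dataframes`, assuming a pandas backend; the real connector dispatches on file format and storage, which is omitted here:

```python
from typing import Iterator
import pandas as pd

def load_dataframes(self, fp) -> Iterator[pd.DataFrame]:
    # Yielding each chunk keeps peak memory at one chunk rather than
    # accumulating all chunks into a list before returning.
    yield from pd.read_csv(fp, chunksize=10_000)
```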
Update return annotation for load_dataframes().
/test connector=source-file
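A sketch of the annotation change (exact signatures assumed): once the method yields chunks, it returns an iterator of DataFrames rather than a list of them:

```python
from typing import Iterator
import pandas as pd

# Before (eager): def load_dataframes(self, fp) -> List[pd.DataFrame]: ...
# After (lazy, matching the yield-based body):
def load_dataframes(self, fp) -> Iterator[pd.DataFrame]:
    yield from pd.read_csv(fp, chunksize=10_000)
```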
Remove unused import.
/test connector=source-file
Update source-file connector version from `0.2.1` to `0.2.2`.
/publish connector=connectors/source-file
Use `chunksize` parameter when reading csv files.
What
This PR is for issue #2089.
How
Describe the solution
Use the `chunksize` parameter when reading csv files. Instead of loading the entire file at once, it reads one chunk at a time; each chunk holds 10000 lines (a minimal sketch of this behaviour follows the reading order below).
Pre-merge Checklist
Recommended reading order
airbyte-integrations/connectors/source-file/source_file/client.py
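An illustrative end-to-end sketch of the memory behaviour the PR targets, assuming a plain pandas reader; the file path and row counts are made up:

```python
import pandas as pd

total_rows = 0
for chunk in pd.read_csv("large.csv", chunksize=10_000):
    # Only ~10,000 rows are resident at a time, so memory stays bounded
    # regardless of the file's total size.
    total_rows += len(chunk)
print(f"processed {total_rows} rows")
```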