[Python] Add read_parquet, to_parquet and to_csv #6129

Merged: 16 commits, Feb 8, 2023

Tishj (Contributor) commented Feb 7, 2023

This PR adds several pandas-style methods to our Python API: read_parquet, to_parquet and to_csv.

It also deprecates the old write_csv method, which now becomes an alias for to_csv.
Likewise, write_parquet is added as an alias for to_parquet.

Mytherin (Collaborator) commented Feb 8, 2023

Looks like there are some non-std::moves in this PR - could you patch them out?

carlopi (Contributor) commented Feb 8, 2023

Somewhat related: I see a pattern here with a bunch of read_csv, read_parquet, read_??? methods. Would it make sense, at some point, to have a generic file-ingestion function that takes the kind of file as a parameter?

Something like read(fileLocation, 'csv') or read(fileLocation, 'json', {schema: "..."}), where an optional third parameter carries any format-specific options. And an equivalent for output.

On the other side, say you implement read_yourformat: it would be nice to be able to register it (at either compile time or INSTALL time) with the generic read() handler.
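
The idea of a generic read() with pluggable formats could be sketched roughly as below. This is plain Python using only the standard library, not DuckDB code: register_reader, read, and the two reader functions are hypothetical names chosen for illustration, and the registry stands in for whatever compile-time or INSTALL-time mechanism would actually be used.

```python
import csv
import json
from typing import Any, Callable, Dict, Optional

# A reader takes a file location plus an options dict and returns parsed data.
Reader = Callable[[str, Dict[str, Any]], Any]

# Hypothetical registry mapping a file kind ("csv", "json", ...) to its reader.
_READERS: Dict[str, Reader] = {}


def register_reader(kind: str, reader: Reader) -> None:
    """Register a reader so the generic read() can dispatch to it."""
    _READERS[kind] = reader


def read(file_location: str, kind: str, options: Optional[Dict[str, Any]] = None) -> Any:
    """Generic entry point: look up the reader for `kind` and call it."""
    try:
        reader = _READERS[kind]
    except KeyError:
        raise ValueError(f"no reader registered for {kind!r}") from None
    return reader(file_location, options or {})


def _read_csv(path: str, options: Dict[str, Any]) -> Any:
    # The optional third parameter carries format-specific settings.
    with open(path, newline="") as f:
        return list(csv.DictReader(f, delimiter=options.get("delimiter", ",")))


def _read_json(path: str, options: Dict[str, Any]) -> Any:
    with open(path) as f:
        return json.load(f)


# Built-in formats register themselves; a third-party format would do the same.
register_reader("csv", _read_csv)
register_reader("json", _read_json)
```

With this shape, read(path, "csv", {"delimiter": ";"}) and read(path, "json") go through the same entry point, and supporting a new format only requires one more register_reader call.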

I was discussing something similar regarding [de]compress with @samansmink. I'm unsure whether this has enough value to be considered (after some more thinking), probably at the library level so that it can then also be exposed to SQL, Python, and other bindings.

Tishj (Contributor, Author) commented Feb 8, 2023

@carlopi I think there is value in that, but definitely in addition to methods like these, not instead of them.
We already have the table_function relation, which can also be used to run the read_csv_auto table function; the methods added in this PR are just syntactic sugar that makes these functions easier to use.

Mytherin merged commit ff99a73 into duckdb:master on Feb 8, 2023

Mytherin (Collaborator) commented Feb 8, 2023

Thanks!

4 participants