Copying the data frame to disk #89
Comments
Do we need to add more to our README? Happy to review a PR! See:

> When such a data frame is processed with dplyr, the processing happens in DuckDB, using DuckDB memory. Only when a result is collected (either manually as above, or because the code requests an operation that can't yet be run in DuckDB) is the data materialized as a data frame. Materialization also emits a message by default.
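The behavior in that quote can be illustrated with a small pipeline (an untested sketch, assuming the `as_duckplyr_df()` entry point described in this issue):

```r
library(duckplyr)
library(dplyr)

out <- mtcars |>
  as_duckplyr_df() |>
  filter(cyl == 6) |>
  summarise(mean_mpg = mean(mpg))
# the pipeline above is executed by DuckDB; materializing the result as a
# plain data frame (with a message, by default) happens on collect():
collect(out)
```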
Thanks for the reply. So I understood
Parquet is the safest option for now. The DuckDB database that duckplyr uses is currently an opaque implementation detail, and supporting export to a different connection or database file is not straightforward. @Tmonster: please correct me if I'm misrepresenting this.
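As a hedged sketch of the Parquet route (the view name and output path are placeholders, and this goes through the duckdb DBI client directly rather than any duckplyr API):

```r
library(duckdb)

my_df <- mtcars  # stand-in for the data frame to persist

con <- dbConnect(duckdb())
# register the R data frame as a zero-copy view, then export it to Parquet
duckdb_register(con, "my_df_view", my_df)
dbExecute(con, "COPY my_df_view TO 'my_df.parquet' (FORMAT PARQUET)")
dbDisconnect(con, shutdown = TRUE)
```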
@krlmlr you are right, although creating a duckdb database for storing the data frame contents should be fairly straightforward:

```r
persistent_con <- dbConnect(duckdb(), dbdir = "persisted_table.db")
dbWriteTable(persistent_con, "my_persistent_table", my_df)
```

I haven't tested this though, so unsure if it really is this easy. Also working on a
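If that works, the persisted database file should be reopenable later without another round trip through R memory; a small untested sketch, reusing the hypothetical file and table names from above:

```r
library(duckdb)

# reopen the database file read-only and query the stored table from disk
con2 <- dbConnect(duckdb(), dbdir = "persisted_table.db", read_only = TRUE)
head(dbGetQuery(con2, "SELECT * FROM my_persistent_table"))
dbDisconnect(con2, shutdown = TRUE)
```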
How can we pour the results of a query specified by a relational object into a new table (possibly in a different database connection)? Do we need
I think in this case we would need a
If not, we could also convert the relation object to a normal data frame and then use `dbWriteTable()`. If we do that, you can write to different databases in the following (slightly hacky) way:

```r
con <- dbConnect(duckdb(), dbdir = "db1.db")
dbExecute(con, "ATTACH 'db2.db'")
dbExecute(con, "USE db2;")
dbWriteTable(con, "cars", mtcars)
dbExecute(con, "USE db1;")
dbWriteTable(con, "cars", mtcars)
```

You can then verify that both databases have a copy of `cars` by opening them individually. Unfortunately I don't think we have a relational way yet to change databases. Potentially also something I can add to the
Most importantly, this would save the roundtrip through R memory.
For writing to other databases, I suspect we can always attach them to our main database and write to a schema? Would that work?
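The attach-and-write idea can also be sketched without switching the active database, by using a qualified name (the alias `other` and both file names are hypothetical):

```r
library(duckdb)

con <- dbConnect(duckdb(), dbdir = "db1.db")
dbWriteTable(con, "cars", mtcars)
# attach a second database file and copy the table into it by qualified name
dbExecute(con, "ATTACH 'db2.db' AS other")
dbExecute(con, "CREATE TABLE other.cars AS SELECT * FROM cars")
dbDisconnect(con, shutdown = TRUE)
```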
Ah, I agree that would be helpful. I can look into this.
Yes, this will work. I can probably write a
My impression at the moment is that if you use `as_duckplyr_df()`, then duckdb is actually processing the data frame in R's allocated memory. This may or may not be correct. What I want to do is first save the data frame to disk in an efficient format (hopefully the DuckDB binary format, but Parquet would work as well), and then re-open the data frame for querying on disk. Is there currently a way to do this, or a plan for how to do this with `duckplyr`?
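The re-open-for-querying half of this can be sketched with DuckDB's ability to scan Parquet files in place (an untested sketch using plain DBI/duckdb rather than any duckplyr API; the file name is a placeholder):

```r
library(duckdb)

con <- dbConnect(duckdb())
# write example data to Parquet, then query the file from disk:
# rows stay on disk and only the aggregated result enters R memory
duckdb_register(con, "cars_view", mtcars)
dbExecute(con, "COPY cars_view TO 'cars.parquet' (FORMAT PARQUET)")
dbGetQuery(con, "SELECT cyl, avg(mpg) AS mean_mpg FROM 'cars.parquet' GROUP BY cyl")
dbDisconnect(con, shutdown = TRUE)
```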