[Python] Improvements to the Relational API #5407
Comments
I've looked into the Jupyter interactivity a bit, and it seems Jupyter does not support re-rendering output (like we do for the progress bar). To make the progress bars interactive we need to use ipywidgets.
So I've been working on this more, and I got it to render an ipywidgets progress bar.
Somehow I can now render the entire progress bar without excessively holding the GIL? I tried to make the background black and the bar yellow, but that much customization is sadly not supported for this widget without making your own.
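For anyone following along, here is a minimal sketch of the fallback idea discussed above: use an ipywidgets progress bar when the library is importable, and a plain text bar otherwise. The helper names text_bar and make_progress are hypothetical and not part of DuckDB; the only external calls assumed are ipywidgets.IntProgress and IPython.display.display.

```python
from typing import Callable


def text_bar(fraction: float, width: int = 20) -> str:
    """Render a plain-text progress bar for a fraction between 0 and 1."""
    fraction = min(max(fraction, 0.0), 1.0)
    filled = int(round(fraction * width))
    return "[" + "#" * filled + " " * (width - filled) + f"] {fraction:4.0%}"


def make_progress(total: int) -> Callable[[int], None]:
    """Return an update(done) callback (hypothetical helper, not DuckDB API)."""
    total = max(total, 1)  # avoid dividing by zero in the text fallback
    try:
        import ipywidgets as widgets
        from IPython.display import display
    except ImportError:
        # No widget support available: redraw a text bar on the same line.
        def update_text(done: int) -> None:
            print("\r" + text_bar(done / total), end="", flush=True)

        return update_text

    bar = widgets.IntProgress(min=0, max=total, value=0)
    display(bar)

    def update_widget(done: int) -> None:
        bar.value = done  # the widget re-renders itself when value changes

    return update_widget
```

Calling `update = make_progress(100)` once and then `update(n)` from the polling loop keeps the rendering choice in one place.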
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 30 days.
This issue was closed because it has been stale for 30 days with no activity.
Currently we have an API that goes from Connection -> Relation -> Result -> output_format,
where the output format is either native Python (fetchall, fetchmany, fetchone) or another package (arrow, pandas, numpy).
We want to improve the API with the following features/changes:
- Remove the Result portion entirely, and go straight from Relation -> output_format instead
Here we'll keep track of the progress of each of these points and the steps associated with them.
Getting rid of DuckDBPyResult
For some time now we have seen issues pop up regarding the implicit switch from relation->result and the duplication of methods across relation and result.
Not to mention the confusion/friction caused by having to fully materialize the result even when the desired output is - for example - a RecordBatchReader.
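The friction described above can be illustrated with a toy model. This is only a sketch in plain Python: ToyRelation, fetchall and fetch_batches are hypothetical stand-ins, not DuckDB classes. It shows why a consumer that only wants a stream of batches should not have to pay for collecting every row first.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Row = Tuple[int, ...]


class ToyRelation:
    """Hypothetical stand-in for a lazily evaluated relation."""

    def __init__(self, make_rows: Callable[[], Iterable[Row]]):
        # A factory is stored so the relation can be executed more than once.
        self._make_rows = make_rows

    def fetchall(self) -> List[Row]:
        # Fully materializes: every row is held in memory at once.
        return list(self._make_rows())

    def fetch_batches(self, batch_size: int) -> Iterator[List[Row]]:
        # Streams: at most batch_size rows are held at a time.
        batch: List[Row] = []
        for row in self._make_rows():
            batch.append(row)
            if len(batch) == batch_size:
                yield batch
                batch = []
        if batch:
            yield batch


rel = ToyRelation(lambda: ((i,) for i in range(5)))
first_batch = next(rel.fetch_batches(2))  # only the first two rows are produced
all_rows = rel.fetchall()                 # all five rows are produced
```

Going straight from the relation to the output format means a streaming consumer can take the fetch_batches-style path without an intermediate, fully collected result.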
- A materialize method on a Relation, to materialize the relation and store the results in a temporary table, returning a view over this temporary table as another relation.
What is a default connection?
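As a rough mental model, a default connection is a single module-level connection that module-level functions fall back to. The sketch below is a toy in plain Python, not DuckDB's implementation: ToyConnection, connect and execute are hypothetical, and set_default_connection only borrows the name proposed in this issue.

```python
from typing import List, Optional


class ToyConnection:
    """Hypothetical stand-in for a database connection."""

    def __init__(self, name: str):
        self.name = name
        self.executed: List[str] = []

    def execute(self, query: str) -> str:
        self.executed.append(query)
        return f"{self.name}: {query}"


_default_connection: Optional[ToyConnection] = None


def default_connection() -> ToyConnection:
    """Create the shared connection on first use, then keep returning it."""
    global _default_connection
    if _default_connection is None:
        _default_connection = ToyConnection("default")
    return _default_connection


def set_default_connection(con: ToyConnection) -> None:
    """Change which connection the module-level functions use."""
    global _default_connection
    _default_connection = con


def connect(database: Optional[str] = None) -> ToyConnection:
    """With no argument, hand back the default instead of creating a new one."""
    if database is None:
        return default_connection()
    return ToyConnection(database)


def execute(query: str) -> str:
    """Module-level shortcut that always goes through the default connection."""
    return default_connection().execute(query)
```

In this model, `connect()` and the module-level `execute` always agree on which connection is in use, which is the property the proposal is after.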
We would like to make it more obvious which connection you're using and give you control over setting it. We also want to help users avoid creating extra connections, which can cause confusion when switching back and forth between using a connection explicitly and executing queries through the main duckdb module handle (which uses the default_connection).
- Use shared_ptr everywhere, as recommended by https://pybind11.readthedocs.io/en/stable/advanced/smart_ptrs.html#std-shared-ptr
- Connecting without arguments (con = duckdb.connect()) will just return the set default_connection.
- A set_default_connection method which will change the connection used when queries are executed through the default duckdb module handle.
Interactive result rendering
When issuing commands through the interactive Python shell, we want to render progress bars and 'pretty-print' results. In other interactive environments, like Jupyter Notebooks, we want to adapt our rendering of these progress bars and result sets to the environment.
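One common heuristic for telling these environments apart is sketched below. It is an assumption about how the detection could work, not what DuckDB does: detect_environment is a hypothetical helper, and the check relies on IPython's get_ipython() and on the kernel shell class being named ZMQInteractiveShell, which is a widely used but informal convention.

```python
import sys


def detect_environment() -> str:
    """Best-effort guess: 'jupyter', 'ipython', 'python-repl' or 'script'."""
    # Only consult IPython if it has already been imported by the host process.
    ipython = sys.modules.get("IPython")
    if ipython is not None:
        shell = ipython.get_ipython()
        if shell is not None:
            # Notebook and other kernel frontends use a ZMQ-based shell class.
            if shell.__class__.__name__ == "ZMQInteractiveShell":
                return "jupyter"
            return "ipython"
    # sys.ps1 only exists while the interactive interpreter is running.
    if hasattr(sys, "ps1") or sys.flags.interactive:
        return "python-repl"
    return "script"
```

A renderer could branch on this value once, for example choosing a widget in 'jupyter', a redrawn text bar in a terminal, and no progress output at all in 'script'.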
Work smarter not harder
To support esoteric but useful utility methods, like exporting to Excel, we would like to follow in the footsteps of libraries like Pandas and Polars and use external libraries to perform the bulk of the work.
- excel/to_excel to output results to Excel format.
- Create a list of useful utility functions and libraries we can leverage to perform these tasks.
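A minimal sketch of the delegation pattern, assuming pandas does the actual writing: optional_import and this standalone to_excel are hypothetical helpers, not an existing DuckDB API. The only library call relied on is pandas' DataFrame.to_excel, which itself needs an Excel engine such as openpyxl to be installed.

```python
import importlib
from typing import Any, Iterable, Sequence


def optional_import(module_name: str, feature: str) -> Any:
    """Import an optional dependency, or fail with an actionable message."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{feature}' requires the optional package '{module_name}'; "
            "install it to use this feature"
        ) from exc


def to_excel(rows: Iterable[Sequence[Any]], columns: Sequence[str], path: str) -> None:
    """Write rows to an Excel file by handing the work to pandas."""
    pd = optional_import("pandas", "to_excel")
    frame = pd.DataFrame(list(rows), columns=list(columns))
    frame.to_excel(path, index=False)
```

Importing lazily inside the method keeps the optional package out of the import path for everyone who never calls the utility.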