[Python] Improvements to the Relational API #5407

Closed
3 of 9 tasks
Tishj opened this issue Nov 18, 2022 · 4 comments

Tishj commented Nov 18, 2022

Currently the API goes from Connection -> Relation -> Result -> output_format, where the output format is either native Python (fetchall, fetchmany, fetchone) or another package (Arrow, pandas, NumPy).

We want to improve the API with the following features/changes:

  • Do away with the intermediate Result portion entirely, and go straight from Relation -> output_format instead
  • Clear up confusion regarding implicit and explicit connections
  • Support interactive result and progress rendering
  • Support extra conversion methods backed by other libraries

Here we'll keep track of the progress of each of these points and the steps associated with them.

Getting rid of DuckDBPyResult

For some time now we have seen issues pop up regarding the implicit switch from relation -> result and the duplication of methods across relation and result.
There is also confusion and friction caused by having to fully materialize the result even when the desired output is, for example, a RecordBatchReader.

  • Merge the DuckDBPyResult module into the DuckDBPyRelation.
  • Create a materialize method on a Relation, to materialize the relation and store the results in a temporary table, returning a view over this temporary table as another relation.
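The difference between a lazy relation and a materialized one can be sketched in plain Python. This is only a toy model under stated assumptions: ToyRelation, its source callable, and the scan counter are hypothetical names invented for illustration, and an in-memory list stands in for the temporary table. It is not how DuckDBPyRelation is implemented.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Row = Tuple[object, ...]


class ToyRelation:
    """Hypothetical stand-in for a lazy relation (not DuckDB's implementation)."""

    def __init__(self, source: Callable[[], Iterable[Row]]):
        # The source is re-evaluated every time rows are requested.
        self._source = source

    def fetchall(self) -> List[Row]:
        # Output methods live directly on the relation; no separate result object.
        return list(self._source())

    def fetchone(self) -> Optional[Row]:
        return next(iter(self._source()), None)

    def filter(self, predicate: Callable[[Row], bool]) -> "ToyRelation":
        # Builds a new lazy relation; nothing is evaluated yet.
        return ToyRelation(lambda: (row for row in self._source() if predicate(row)))

    def materialize(self) -> "ToyRelation":
        # Evaluate once and keep the rows (the list plays the role of the
        # temporary table), then return a relation that reads the stored rows.
        stored = list(self._source())
        return ToyRelation(lambda: stored)


# Counter used to observe how often the underlying scan runs.
scan_calls = {"count": 0}


def expensive_scan() -> List[Row]:
    scan_calls["count"] += 1
    return [(1, "a"), (2, "b"), (3, "c")]


lazy = ToyRelation(expensive_scan)
snapshot = lazy.filter(lambda row: row[0] > 1).materialize()
```

Fetching from snapshot repeatedly does not re-run the scan, while every fetch on lazy does.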

What is a default connection?

We would like to make it more obvious which connection you're using and give you control over setting it. We also want to help users avoid creating extra connections, which can cause confusion when switching back and forth between using a connection explicitly and executing queries through the main duckdb module handle (which uses the default_connection).

  • Handle the DuckDBPyConnection as a shared_ptr everywhere, as recommended by https://pybind11.readthedocs.io/en/stable/advanced/smart_ptrs.html#std-shared-ptr
  • Don't create a new connection if connect is called without arguments (i.e. con = duckdb.connect()); it will just return the currently set default_connection.
  • Add a set_default_connection method which will change the connection used when queries are executed through the default duckdb module handle.
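The proposed semantics can be modelled with a small plain-Python sketch. Everything here is hypothetical: ToyConnection, the module-level _state holder, and the toy connect/sql functions only mimic the behaviour described in the bullets above and are not the duckdb module's actual code.

```python
from typing import List, Optional


class ToyConnection:
    """Hypothetical connection that just records the queries it receives."""

    def __init__(self, database: str = ":memory:"):
        self.database = database
        self.executed: List[str] = []

    def sql(self, query: str) -> None:
        self.executed.append(query)


# Holder for the module-wide default connection (assumption: created eagerly).
_state = {"default": ToyConnection()}


def connect(database: Optional[str] = None) -> ToyConnection:
    # Proposed behaviour: no arguments means "give me the default connection",
    # not "create a fresh one".
    if database is None:
        return _state["default"]
    return ToyConnection(database)


def set_default_connection(connection: ToyConnection) -> None:
    # Changes which connection the module-level handle uses from now on.
    _state["default"] = connection


def sql(query: str) -> None:
    # Module-level handle: always routes through the current default.
    _state["default"].sql(query)
```

With this model, con = connect() followed by sql("SELECT 1") runs on the same connection object, so explicit and implicit usage can no longer diverge by accident.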

Interactive result rendering

When issuing commands through the interactive Python shell, we want to render progress bars and 'pretty-print' results. When used in other interactive environments, like Jupyter Notebooks, we want to adapt our rendering of these progress bars and result sets to the environment.

  • Enable progress bars by default in an interactive environment.
  • Output HTML when executed in Jupyter Notebook (probably more customizations to come)
  • Output using the newly introduced Box-Renderer mode in an interactive python shell.
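A rough sketch of environment-dependent rendering follows. The detection functions are heuristics chosen for illustration, and ToyResult and pick_renderer are hypothetical names. The only established convention relied on is that Jupyter/IPython calls an object's _repr_html_ method when one exists, while the plain Python shell uses __repr__.

```python
import html
import sys
from typing import List, Sequence


def is_interactive_shell() -> bool:
    # sys.ps1 is only defined in interactive mode; `python -i` sets the flag.
    return hasattr(sys, "ps1") or bool(sys.flags.interactive)


def is_jupyter_kernel() -> bool:
    # Heuristic (assumption): a loaded ipykernel module suggests a notebook kernel.
    return "ipykernel" in sys.modules


def pick_renderer() -> str:
    if is_jupyter_kernel():
        return "html"
    if is_interactive_shell():
        return "box"
    return "plain"


class ToyResult:
    """Hypothetical result set that renders itself per environment."""

    def __init__(self, columns: Sequence[str], rows: Sequence[Sequence[object]]):
        self.columns = list(columns)
        self.rows = [list(row) for row in rows]

    def __repr__(self) -> str:
        # Box-style rendering for a terminal.
        table: List[List[str]] = [self.columns] + [[str(v) for v in r] for r in self.rows]
        widths = [max(len(cells[i]) for cells in table) for i in range(len(self.columns))]

        def border(left: str, mid: str, right: str) -> str:
            return left + mid.join("─" * (w + 2) for w in widths) + right

        def line(values: List[str]) -> str:
            return "│" + "│".join(f" {v.ljust(w)} " for v, w in zip(values, widths)) + "│"

        out = [border("┌", "┬", "┐"), line(table[0]), border("├", "┼", "┤")]
        out += [line(cells) for cells in table[1:]]
        out.append(border("└", "┴", "┘"))
        return "\n".join(out)

    def _repr_html_(self) -> str:
        # Picked up by Jupyter's rich display machinery.
        head = "".join(f"<th>{html.escape(c)}</th>" for c in self.columns)
        body = "".join(
            "<tr>" + "".join(f"<td>{html.escape(str(v))}</td>" for v in row) + "</tr>"
            for row in self.rows
        )
        return f"<table><tr>{head}</tr>{body}</table>"
```

Keeping both representations on the same object means the environment, rather than the caller, decides which one is shown.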

Work smarter not harder

To support esoteric but useful utility methods, like exporting to Excel, we would like to follow in the footsteps of libraries like Pandas and Polars and use external libraries to perform the bulk of the work.

  • Add utility functions like excel/to_excel to output results to Excel format.
  • Create a list of useful utility functions and libraries we can leverage to perform these tasks.
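One way such a utility could delegate to an external library is a lazy optional import, sketched below. The helper name require_optional and this free-standing to_excel are hypothetical; the sketch assumes pandas is the backing library and that an Excel writer engine such as openpyxl is installed when the function is actually called.

```python
import importlib
from types import ModuleType
from typing import Sequence


def require_optional(module_name: str, feature: str) -> ModuleType:
    """Import an optional dependency only when a feature needs it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{feature} requires the optional package '{module_name}'"
        ) from exc


def to_excel(columns: Sequence[str], rows: Sequence[Sequence[object]], path: str) -> None:
    # Assumption: pandas does the heavy lifting, as in the issue's
    # "use external libraries to perform the bulk of the work".
    pandas = require_optional("pandas", "to_excel")
    frame = pandas.DataFrame(list(rows), columns=list(columns))
    # DataFrame.to_excel needs an Excel engine (e.g. openpyxl) at runtime.
    frame.to_excel(path, index=False)
```

The import cost and the dependency are only incurred by users who call the utility, and a missing package produces a message naming the feature that needs it.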

Tishj commented Nov 24, 2022

I've looked into the Jupyter interactivity a bit, and it seems Jupyter does not support re-rendering of output (as we do for the progress bar).
It basically ignores the prints that start with \r and keeps only the first print.

To make the progress bars interactive we need to use ipywidgets.
EDIT: I have seen it update the bar, but our progress bar seems to permanently freeze the Jupyter notebook execution.
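For reference, the carriage-return technique behind a text-based progress bar looks roughly like this in Python. It is a generic sketch of \r re-rendering with hypothetical function names, not DuckDB's progress bar code: each update rewinds to the start of the line and overwrites the previous bar, which relies on the output surface honouring \r.

```python
import sys
import time
from typing import TextIO


def render_bar(fraction: float, width: int = 20) -> str:
    # Clamp to [0, 1] so rounding errors never overflow the bar.
    fraction = min(max(fraction, 0.0), 1.0)
    filled = int(round(fraction * width))
    return "[" + "#" * filled + " " * (width - filled) + f"] {fraction * 100:3.0f}%"


def show_progress(steps: int, delay: float = 0.0, stream: TextIO = sys.stdout) -> None:
    for step in range(steps + 1):
        # "\r" moves the cursor back to column 0 without starting a new line,
        # so the next write paints over the previous bar.
        stream.write("\r" + render_bar(step / steps))
        stream.flush()
        time.sleep(delay)
    # Finish with a newline so later output starts on a fresh line.
    stream.write("\n")
```

In a terminal this shows a single bar that fills up in place; a front end that does not interpret \r as "overwrite" will not show the same in-place update.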


Tishj commented Nov 25, 2022

So I've been working on this more, and I got it to render an ipywidgets progress bar.
But it has the same issue as the text-based bar: it causes the cell to hang indefinitely.
It didn't hang when I only called update within the do ... while (RESULT_NOT_READY) loop,
but when I added another call to update at the very end to finish the bar (it only got to 33%), it hung indefinitely.

I've added a scope that holds the GIL for the duration of the execution, and that fixed it.
We need to hold the GIL during execution on Jupyter Notebooks to use progress bars.

Hmm this might have something to do with the IPython.display method/object
https://ipython.readthedocs.io/en/stable/api/generated/IPython.display.html#IPython.display.DisplayHandle
I'll investigate this

Somehow I can now render the entire progress bar without excessively holding the GIL?

Tried to make the background black and the bar yellow, but that much customization is sadly not supported for this widget without making your own

@github-actions

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 30 days.

github-actions bot added the stale label on Jul 29, 2023
@github-actions

This issue was closed because it has been stale for 30 days with no activity.

github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) on Aug 28, 2023