Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cursor() unable to find Pandas df registered in parent connection #3980

Closed
wenboz1122 opened this issue Jun 28, 2022 · 7 comments
Closed

Cursor() unable to find Pandas df registered in parent connection #3980

wenboz1122 opened this issue Jun 28, 2022 · 7 comments
Labels

Comments

@wenboz1122
Copy link

What happens?

Same issue to #1736 The cursor used to work well with v0.3.4 but the issue happens again with v0.4.0. Seems a regression due to recent changes. We have to use cursor() instead of the connection object itself because our generic interface also supports other database APIs and cursor() is a common practice that follows https://peps.python.org/pep-0249/, which is also useful for parallized queries.

To Reproduce

See #1736

Environment (please complete the following information):

  • OS: (e.g. iOS)
  • DuckDB Version: [0.4.0]
  • DuckDB Client: [Python 3.9]

Identity Disclosure:

  • Full Name: [Wenbo Zhang]

If the above is not given and is not obvious from your GitHub profile page, we might close your issue without further review. Please refer to the reasoning behind this rule if you have questions.

@pdet
Copy link
Member

pdet commented Jun 28, 2022

Hmm, cursors now acquire a new connection to allow for multithreading in the Python API, it's possible they are not inheriting the registered objects properly. Will have a look in a bit.

@pdet
Copy link
Member

pdet commented Jun 28, 2022

I think the main question here is how we desire to treat Cursors in DuckDB.
As said previously they are currently new connections to the parent connection, so do not share temporary views or dependencies on python objects from the parent connection.
The problem here is that if we decide to share these views we need to decide how to handle these dependencies.
E.g., if in a cursor we register a new python object, should this be registered in the cursor's parent?
If we unregister in the parent, should it be replicated to the cursor?
Should cursors be allowed to birth new cursors?

I think we can create a new Python object with a type cursor, that inherits most of the PyConnection functionality, with the exception of cursor creation, and object register/unregister. That way we only need to maintain these in the original connection and we just have to invalidate the cursors upon the connection destruction.

@Mytherin any particular opinion on this?

@Mytherin
Copy link
Collaborator

I think cursors need to stay as separate connections. Perhaps an alternative solution would be to add support for creating "global" temporary objects in the catalog. This way when registering a data frame it could be created as a global temporary view, rather than as a connection-local temporary view.

@pdet
Copy link
Member

pdet commented Jun 28, 2022

I think cursors need to stay as separate connections. Perhaps an alternative solution would be to add support for creating "global" temporary objects in the catalog. This way when registering a data frame it could be created as a global temporary view, rather than as a connection-local temporary view.

I think this can be problematic in all sorts of ways with the object registration still. For example:

con = duckdb.connect()
con.register(df_1, 'df_1')
cur = con.cursor() # cur should be able to see df_1
cur.register(df_2, 'df_2') # Now both con and cur can see df_2, but cur holds the reference to df_2
cur.close()
del df_2
# at this point con still has a view of df_2 but df_2 has been properly deleted since its duckdb reference died with cur
con.query("select * from df_2") # woopsie

With my cursor suggestion, underneath they could still have a different duckdb connection. The main difference is that we don't output PyConnections from cursor calls and that PyCursors don't have registration, unregistration, and cursor methods.

@Mytherin
Copy link
Collaborator

Doesn't the Register create a view? Can't the view itself hold a py::obj reference to the DataFrame - similar to how replacement scans work? This way no references within connections are necessary - and "unregister" becomes a matter of dropping the view.

@github-actions
Copy link

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 30 days.

@github-actions github-actions bot added the stale label Jul 30, 2023
@github-actions
Copy link

This issue was closed because it has been stale for 30 days with no activity.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Aug 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants