Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RuntimeError: Cannot close AdbcConnection with open AdbcStatement #538

Closed
mavestergaard opened this issue Mar 24, 2023 · 24 comments
Closed

Comments

@mavestergaard
Copy link

I am using polars library to write to Azure Postgre SQL DB.

This is my code:
import polars as pl
from datetime import datetime
data = {
    'text': ['foo', 'bar', 'baz'],
    'bool': [True, False, True],
    'datetime': [datetime.now(), datetime.now(), datetime.now()],
    'date': ['2022-03-24', '2022-03-23', '2022-03-22'],
    'float': [1.1, 2.2, 3.3]
}
df = pl.DataFrame(data, schema={
    'text': pl.Utf8,
    'bool': pl.Boolean,
    'datetime': pl.Datetime('us', None),
    'date': pl.Date,
    'float': pl.Float64
})
connection_string = f"postgresql://{servername}:5432/{dbname}?user={username}&password={pw}"
df.write_database(
            table_name="schema_name.my_table",
            connection_uri=connection_string,
            if_exists='append',
            engine='adbc'
        )
This is the error:
    df.write_database(
  File "/home/mv/.virtualenvs/env_fsspec/lib/python3.10/site-packages/polars/dataframe/frame.py", line 2940, in write_database
    with _open_adbc_connection(connection_uri) as conn:
  File "/home/mv/.virtualenvs/env_fsspec/lib/python3.10/site-packages/adbc_driver_manager/dbapi.py", line 217, in __exit__
    self.close()
  File "/home/mv/.virtualenvs/env_fsspec/lib/python3.10/site-packages/adbc_driver_manager/dbapi.py", line 302, in close
    self._conn.close()
  File "adbc_driver_manager/_lib.pyx", line 851, in adbc_driver_manager._lib.AdbcConnection.close
  File "adbc_driver_manager/_lib.pyx", line 452, in adbc_driver_manager._lib._AdbcHandle._check_open_children
  File "adbc_driver_manager/_lib.pyx", line 454, in adbc_driver_manager._lib._AdbcHandle._check_open_children
RuntimeError: Cannot close AdbcConnection with open AdbcStatement
Versions I have installed:
adbc-driver-manager==0.3.0
adbc-driver-postgresql==0.3.0
pandas==1.5.3
polars==0.16.14
pyarrow==11.0.0

Please let me know if you need anything further.

Many thanks!

@lidavidm
Copy link
Member

I suppose you might want to pin 0.2.0 for now.

Is there more context? The Polars code in question appears to be closing the cursor correctly (though it should use a context manager), so I think this would only happen if there was an uncaught exception raised.

@mavestergaard
Copy link
Author

Unfortunately, it is not working with pinned version. And there seems to be something wrong with the Polars code.

It is doing:

cursor.close()
conn.commit()

Shouldn't it be somehting like (or with context manager as I am posting in the code below)

cursor.commit()
cursor.close()

With the below workaround I got a different error, which I can't find any context on. Can you help with that?

The error is:

File "adbc_driver_manager/_lib.pyx", line 371, in adbc_driver_manager._lib.check_error
adbc_driver_manager._lib.NotSupportedError: ADBC_STATUS_NOT_IMPLEMENTED (2): [libpq] Field #1 ('text') has unsupported parameter type 35

The text column in db is varchar(16)
The code that almost works:

import adbc_driver_postgresql.dbapi as adbc
with adbc.connect(connection_string) as conn:
    with conn.cursor() as cursor:
        cursor.adbc_ingest("schema.mytable", df.to_arrow(), "append")
print("insert completed")

Any help is really appreciated!!

@mavestergaard
Copy link
Author

Unfortunately, it is not working with pinned version. And there seems to be something wrong with the Polars code.

It is doing:

cursor.close()
conn.commit()

Shouldn't it be somehting like (or with context manager as I am posting in the code below)

cursor.commit()
cursor.close()

With the below workaround I got a different error, which I can't find any context on. Can you help with that?

The error is:

File "adbc_driver_manager/_lib.pyx", line 371, in adbc_driver_manager._lib.check_error
adbc_driver_manager._lib.NotSupportedError: ADBC_STATUS_NOT_IMPLEMENTED (2): [libpq] Field #1 ('text') has unsupported parameter type 35

The text column in db is varchar(16)
The code that almost works:

import adbc_driver_postgresql.dbapi as adbc
with adbc.connect(connection_string) as conn:
    with conn.cursor() as cursor:
        cursor.adbc_ingest("schema.mytable", df.to_arrow(), "append")
print("insert completed")

Any help is really appreciated!!

@mavestergaard mavestergaard reopened this Mar 24, 2023
@lidavidm
Copy link
Member

The driver is fairly new and I haven't yet had time to build out support for all the different Postgres types and do benchmarking. So right now, string columns just aren't supported yet, sorry :/

@mavestergaard
Copy link
Author

That's fair - thanks! Is there an open "issue" I can track to see when the support is added?

@lidavidm
Copy link
Member

Mostly listed here: #81
And this is related: #197

@lidavidm
Copy link
Member

We can leave this issue open too, I'll use this to also add integration tests with Polars (I hadn't realized they had done this - so thank you for bringing it up to me) and see if I can reproduce things/improve things on our side and Polars side (context manager, and supporting the "fail" mode they list)

@mcrumiller
Copy link

Hi @lidavidm is there any update on this? I see integration tests for writing to the database but not reading; I'm unable to do even a simple query in polars via postgres (note it claims syntax error but there is none in my query):

Traceback (most recent call last):
  File "<...>\site-packages\polars\io\database.py", line 140, in _read_sql_adbc
    cursor.execute(query)
  File "<...>\site-packages\adbc_driver_manager\dbapi.py", line 604, in execute
    handle, self._rowcount = self._stmt.execute_query()
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "adbc_driver_manager\_lib.pyx", line 991, in adbc_driver_manager._lib.AdbcStatement.execute_query
  File "adbc_driver_manager\_lib.pyx", line 385, in adbc_driver_manager._lib.check_error
adbc_driver_manager._lib.OperationalError: ADBC_STATUS_IO (10): [libpq] Failed to execute query: could not begin COPY: ERROR:  syntax error at or near "("
LINE 1: ...ore_alr_vw WHERE val_dt = '2023-05-10') TO STDOUT (FORMAT bi...
                                                             ^

Query was: COPY (SELECT p

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<...>\site-packages\polars\io\database.py", line 104, in read_database
    return _read_sql_adbc(query, connection_uri)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<...>\site-packages\polars\io\database.py", line 138, in _read_sql_adbc
    with _open_adbc_connection(connection_uri) as conn:
  File "<...>\site-packages\adbc_driver_manager\dbapi.py", line 217, in __exit__
    self.close()
  File "<...>\site-packages\adbc_driver_manager\dbapi.py", line 302, in close
    self._conn.close()
  File "adbc_driver_manager\_lib.pyx", line 855, in adbc_driver_manager._lib.AdbcConnection.close
  File "adbc_driver_manager\_lib.pyx", line 456, in adbc_driver_manager._lib._AdbcHandle._check_open_children
  File "adbc_driver_manager\_lib.pyx", line 458, in adbc_driver_manager._lib._AdbcHandle._check_open_children
RuntimeError: Cannot close AdbcConnection with open AdbcStatement

@lidavidm
Copy link
Member

What is the full query it generated?

@mcrumiller
Copy link

It was:

SELECT string_field
FROM my_table
WHERE date_field = '2023-05-10'
ORDER BY string_field

@lidavidm
Copy link
Member

Hmm, I'm not able to replicate this.

I created a table:

postgres=# CREATE TABLE my_table (string_field TEXT, date_field DATE);
CREATE TABLE
postgres=# INSERT INTO my_table VALUES ('a', '2023-05-10'), ('b', '2023-05-11'), ('c', '2023-05-12');
INSERT 0 3
postgres=# SELECT * FROM my_table;
 string_field | date_field 
--------------+------------
 a            | 2023-05-10
 b            | 2023-05-11
 c            | 2023-05-12
(3 rows)

Then I ran your query with polars:

>>> import polars
>>> import polars as pl
>>> uri = 'postgres://localhost:5432/postgres?user=postgres&password=password'
>>> query = '''SELECT string_field
... FROM my_table
... WHERE date_field = '2023-05-10'
... ORDER BY string_field'''
>>> pl.read_database(query, connection_uri=uri, engine='adbc')
shape: (1, 1)
┌──────────────┐
│ string_field │
│ ---          │
│ str          │
╞══════════════╡
│ a            │
└──────────────┘
>>> pl.__version__
'0.18.4'

@lidavidm
Copy link
Member

Given the error, it seems something differs about the query?

adbc_driver_manager._lib.OperationalError: ADBC_STATUS_IO (10): [libpq] Failed to execute query: could not begin COPY: ERROR:  syntax error at or near "("
LINE 1: ...ore_alr_vw WHERE val_dt = '2023-05-10') TO STDOUT (FORMAT bi...
                                                             ^

Query was: COPY (SELECT p

@mcrumiller
Copy link

mcrumiller commented Jun 27, 2023

What's your system? I'm running on Windows w/ Greenplum postgres, which I think may be the issue here. I set up the exact same table as you with the exact same query and I still get the error:

uri = f"postgresql://{username}:{password}@{server}:5432/{database}"
query = "SELECT string_field FROM my_table WHERE date_field = '2023-05-10'"
pl.read_database(query, connection_uri=uri, engine="adbc")
Traceback (most recent call last):
  File "C:\Projects\project-brr\.venv_brr\Lib\site-packages\polars\io\database.py", line 140, in _read_sql_adbc
    cursor.execute(query)
  File "C:\Projects\project-brr\.venv_brr\Lib\site-packages\adbc_driver_manager\dbapi.py", line 604, in execute
    handle, self._rowcount = self._stmt.execute_query()
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "adbc_driver_manager\_lib.pyx", line 991, in adbc_driver_manager._lib.AdbcStatement.execute_query
  File "adbc_driver_manager\_lib.pyx", line 385, in adbc_driver_manager._lib.check_error
adbc_driver_manager._lib.OperationalError: ADBC_STATUS_IO (10): [libpq] Failed to execute query: could not begin COPY: ERROR:  syntax error at or near "("
LINE 1: ..._table WHERE date_field = '2023-05-10') TO STDOUT (FORMAT bi...
                                                             ^

Query was: COPY (SELECT s

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Projects\project-brr\.venv_brr\Lib\site-packages\polars\io\database.py", line 104, in read_database
    return _read_sql_adbc(query, connection_uri)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\project-brr\.venv_brr\Lib\site-packages\polars\io\database.py", line 138, in _read_sql_adbc
    with _open_adbc_connection(connection_uri) as conn:
  File "C:\Projects\project-brr\.venv_brr\Lib\site-packages\adbc_driver_manager\dbapi.py", line 217, in __exit__
    self.close()
  File "C:\Projects\project-brr\.venv_brr\Lib\site-packages\adbc_driver_manager\dbapi.py", line 302, in close
    self._conn.close()
  File "adbc_driver_manager\_lib.pyx", line 855, in adbc_driver_manager._lib.AdbcConnection.close
  File "adbc_driver_manager\_lib.pyx", line 456, in adbc_driver_manager._lib._AdbcHandle._check_open_children
  File "adbc_driver_manager\_lib.pyx", line 458, in adbc_driver_manager._lib._AdbcHandle._check_open_children
RuntimeError: Cannot close AdbcConnection with open AdbcStatement

@lidavidm
Copy link
Member

Hmm, Ubuntu Linux with PostgreSQL 13.

Does Greenplum's fork retain COPY support? If not, the driver won't work.

@lidavidm
Copy link
Member

lidavidm commented Jun 27, 2023

e.g. does this work in psql?

COPY (SELECT * FROM my_table) TO STDOUT (FORMAT binary);

@mcrumiller
Copy link

mcrumiller commented Jun 27, 2023

Ahh, that's the issue. postgres COPY function is only allowed from the backend: https://www.postgresql.org/docs/current/sql-copy.html. See here:

Files named in a COPY command are read or written directly by the server, not by the client application. Therefore, they must reside on or be accessible to the database server machine, not the client.

I've run into the past. The workaround is to use the \copy function instead.

Do not confuse COPY with the psql instruction \copy. \copy invokes COPY FROM STDIN or COPY TO STDOUT, and then fetches/stores the data in a file accessible to the psql client. Thus, file accessibility and access rights depend on the client rather than the server when \copy is used.

And in the \copy documentation:

Performs a frontend (client) copy. This is an operation that runs an SQL COPY command, but instead of the server reading or writing the specified file, psql reads or writes the file and routes the data between the server and the local file system. This means that file accessibility and privileges are those of the local user, not the server, and no SQL superuser privileges are required.

@lidavidm
Copy link
Member

Right, but we are using COPY TO STDOUT as seen in the error message.

@mcrumiller
Copy link

mcrumiller commented Jun 27, 2023

Thanks for your help David. I think the issue is in the (FORMAT binary) part of the COPY TO statement.

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "adbc_driver_manager\_lib.pyx", line 991, in adbc_driver_manager._lib.AdbcStatement.execute_query
  File "adbc_driver_manager\_lib.pyx", line 385, in adbc_driver_manager._lib.check_error
adbc_driver_manager._lib.OperationalError: ADBC_STATUS_IO (10): [libpq] Failed to execute query: could not begin COPY: ERROR:  syntax error at or near "("
LINE 1: ...= '2023-05-10'::DATE ORDER BY plcy_nbr) TO STDOUT (FORMAT bi...
                                                             ^

When I run this in my handy DBeaver postgresql app, I get this error:

image

But note that when I run without the FORMAT binary, it parses correctly (but fails for a different reason, DBeaver uses jdbc):

image

@mcrumiller
Copy link

mcrumiller commented Jun 27, 2023

Hi David, I found the Greenplum documentation. My company is still on Greenplum 5 (6 is the latest, 7 in beta). It looks like in 6+ they support the FORMAT binary syntax, but prior to that the word FORMAT is omitted; I can get the parser to work correctly if I remove that:

image

So, yeah. Is there an easy way to get around this by altering the COPY command in the dbapi.execute() function? If not I think I'm out of luck here.

@mcrumiller
Copy link

mcrumiller commented Jun 27, 2023

By the way, this looks like it was a change in Postgres between Version 8.4, which didn't use the word FORMAT and Version 9.0, which does. So the syntax for COPY changed. This makes sense, considering on Wikipedia the Release History has the following:

image

This changed all the way back in 2010, so I'm guessing there's no much hope for the legacy support/syntax, and it probably wouldn't even return the correct format anyway. Thanks and I'm glad we were able to figure out my issue.

@lidavidm
Copy link
Member

There's not a way to adjust this (and postgres 8 is long EOL). Even if there was, the COPY mode is quite tied into postgres internals so there's no guarantee we can properly read the data. It's possible we could eventually support falling back to the non-COPY mode (we already have to do this with prepared statements - you could experiment with that?).

@mcrumiller
Copy link

Even if there was, the COPY mode is quite tied into postgres internals so there's no guarantee we can properly read the data

Right, I figured. I think I'll have to continue to use connectorx until my company decides it no longer wants to live with the dinosaurs. Thanks for your help.

@lidavidm
Copy link
Member

I filed #855 and and #856 for some minor improvements here

@lidavidm
Copy link
Member

I'm going to close this now - we added some integration tests with Polars and we now test with more Postgres versions, though still not version 8 (and the driver probably still won't work with that version).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants