Open
Description
What happens?
Reading with fetchmany on one DBAPI cursor while inserting through another cursor of the same connection, within a single process, causes a deadlock.
To Reproduce
from collections import deque, namedtuple
from typing import Sequence

import duckdb
import pyarrow
from loguru import logger
from tqdm import tqdm

schema = pyarrow.schema([
    ('id', pyarrow.int32()),
    ('name', pyarrow.string()),
    ('loves_food', pyarrow.bool_())
])

logger.info('Generating data')
data = deque()
for i in tqdm(range(4_000_000)):
    data.append({
        'id': i,
        'name': 'Namey McNameFace',
        'loves_food': True
    })
arrow_table = pyarrow.Table.from_pylist(data, schema=schema)

conn = duckdb.connect('test.db')
conn.execute('CREATE OR REPLACE TABLE test (id INTEGER, name VARCHAR, loves_food BOOLEAN)')
logger.info('Inserting data')
conn.execute('INSERT INTO test SELECT * FROM arrow_table')
conn.execute('CREATE OR REPLACE TABLE test_write (id INTEGER, name VARCHAR, loves_food BOOLEAN)')


def get_data_in_chunks():
    with conn.cursor() as cur:
        cur.execute('SELECT * FROM test')
        while True:
            rows = cur.fetchmany(10000)
            if not rows:
                break
            Row = namedtuple('Row', [c[0] for c in cur.description])
            yield [Row(*r) for r in rows]


def insert_data_in_chunks(ins_data: Sequence):
    with conn.cursor() as cur:
        ins_table = pyarrow.Table.from_pylist(ins_data, schema=schema)
        # This stops making progress at some point; the exact row count may be
        # system-dependent (on my Mac it is 2,240,000).
        # CPU usage nevertheless remains very high.
        cur.execute('INSERT INTO test_write SELECT * FROM ins_table')


num_processed = 0
for rs in get_data_in_chunks():
    num_processed += len(rs)
    logger.info(f'Got {len(rs)} records; {num_processed} processed so far')
    # Convert the namedtuples to dicts, which is what Table.from_pylist expects
    dict_data = [r._asdict() for r in rs]
    insert_data_in_chunks(dict_data)
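
Since the script above hangs rather than failing, it may help to capture where each thread is stuck. Below is a minimal sketch using only the standard library's faulthandler module. The helper name hang_watchdog and the demo function are hypothetical and not part of the original report, and the demo uses time.sleep as a stand-in for the stalled INSERT, so no DuckDB is involved.

```python
import faulthandler
import sys
import tempfile
import time
from contextlib import contextmanager


@contextmanager
def hang_watchdog(timeout_s, file=sys.stderr):
    """Dump the Python tracebacks of all threads to `file` if the
    wrapped block is still running after `timeout_s` seconds.

    Hypothetical helper for illustration; `file` must be backed by a
    real file descriptor.
    """
    faulthandler.dump_traceback_later(timeout_s, file=file)
    try:
        yield
    finally:
        # Disarm the watchdog once the block has finished (or raised).
        faulthandler.cancel_dump_traceback_later()


def demo():
    """Run one quick and one slow block; return what the watchdog wrote."""
    with tempfile.TemporaryFile() as log:
        # Finishes well before the timeout, so nothing should be written.
        with hang_watchdog(5.0, file=log):
            time.sleep(0.01)
        log.seek(0)
        quick_output = log.read()

        # Outlives the timeout, so the watchdog dumps the tracebacks.
        with hang_watchdog(0.1, file=log):
            time.sleep(0.5)
        log.seek(0)
        slow_output = log.read()
    return quick_output, slow_output


if __name__ == "__main__":
    quick, slow = demo()
    print("quick block wrote", len(quick), "bytes")
    print("slow block wrote", len(slow), "bytes")
```

In the repro, the cur.execute('INSERT INTO test_write ...') call could be wrapped in `with hang_watchdog(60):` (the 60-second value is an arbitrary assumption) so that a stall prints the Python stack of every thread to stderr. This only shows Python-level frames; inspecting the native DuckDB frames would still require a native debugger.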
OS:
macOS (aarch64), Windows 11 (x64)
DuckDB Package Version:
1.4.2
Python Version:
3.12
Full Name:
William Welch
Affiliation:
Self
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a stable release
Did you include all relevant data sets for reproducing the issue?
Yes
Did you include all code required to reproduce the issue?
- Yes, I have
Did you include all relevant configuration to reproduce the issue?
- Yes, I have