[duckdb] Decrease likelihood of failures due to connection locks on the db #18746

Open
jamiedemaria opened this issue Dec 14, 2023 Discussed in #18737 · 1 comment
Labels
integration: duckdb Related to DuckDB integrations type: troubleshooting Related to debugging and error messages

Comments

@jamiedemaria
Contributor

DuckDB allows only one write connection to a database at a time. We partly work around this by retrying when a connection is made, but that doesn't cover every case. One thing we could do is open a write connection only when we are actually writing to the DB. This would require updating the connect function of DBClient to accept a parameter specifying whether the connection is read or write. We can make this a kwarg with a default so that custom DBClients out there won't see breaking changes. Then, in the DuckDB IO manager, we can use it to determine what kind of connection to make.
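
To make the proposal concrete, here is a rough sketch of what a connect function with a read/write kwarg could look like. It is not the actual Dagster code: `SketchDbClient`, `Opener`, `load_input` and `handle_output` are hypothetical names used only for illustration, and the connection opener is injected so the sketch runs without DuckDB installed. With DuckDB, the opener could plausibly wrap `duckdb.connect(database, read_only=...)`.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator

# Hypothetical opener signature: opener(database, read_only) -> connection.
# With DuckDB this could be: lambda db, ro: duckdb.connect(db, read_only=ro)
Opener = Callable[[str, bool], Any]


class SketchDbClient:
    """Illustrative stand-in for a DB client; not the real Dagster class."""

    def __init__(self, database: str, opener: Opener) -> None:
        self.database = database
        self._opener = opener

    @contextmanager
    def connect(self, *, read_only: bool = False) -> Iterator[Any]:
        # read_only defaults to False, so existing callers that pass no
        # argument keep getting a write connection, as before.
        conn = self._opener(self.database, read_only)
        try:
            yield conn
        finally:
            conn.close()


def load_input(client: SketchDbClient, query: str) -> Any:
    # Loading an upstream asset only needs a read connection.
    with client.connect(read_only=True) as conn:
        return conn.execute(query)


def handle_output(client: SketchDbClient, statement: str) -> Any:
    # Storing an asset is the only place a write connection is requested.
    with client.connect() as conn:
        return conn.execute(statement)
```

Because the kwarg has a default, code that calls `connect()` with no arguments behaves the same as today; only the read path opts in to `read_only=True`.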

Discussed in #18737

Originally posted by EtienneT December 14, 2023
I use duckdb with dagster and it has been working great except for one thing. I sometimes get the following error message:

duckdb.duckdb.IOException: IO Error: Cannot open file "c:\dagster-home\storage\data.duckdb": The process cannot access the file because it is being used by another process.

But Dagster is the only process using this file. In most software-defined assets I use DuckDB through the IOManager, but in one asset I have to use DuckDBResource to be able to change the table schema (adding new columns on the fly).

From my Definitions.resources:

"duckdb": DuckDBPandasPolarsIOManager(database=str(storage / 'data.duckdb'), schema='main'),
"duckdb_resource": DuckDBResource(database=str(storage / 'data.duckdb'), schema='main'),

Could the IOManager and the DuckDBResource compete for the unique write connection to duckdb?

Thanks!
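
For anyone hitting the same error in the meantime, the retry idea from the issue description can be sketched roughly as below. This is an assumption-laden illustration, not what the IO manager actually does: `connect_with_retries` and all of its parameters are hypothetical, and the default `retry_on=(OSError,)` is only a placeholder. With DuckDB you would presumably pass the `duckdb.IOException` type shown in the traceback above.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def connect_with_retries(
    open_connection: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),  # placeholder default
    max_attempts: int = 5,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call open_connection until it succeeds or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        try:
            return open_connection()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last lock error
            # Exponential backoff: base_delay, then 2x, 4x, ... that delay.
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

Retries only shorten the window in which two writers collide; they do not remove the contention, which is why the issue proposes requesting write connections only when writing.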

@jamiedemaria jamiedemaria self-assigned this Dec 14, 2023
EtienneT added a commit to EtienneT/dagster that referenced this issue Dec 21, 2023
…t and a new way to pass a lambda to DbIOManager to calculate this kwargs dynamically.
@EtienneT

@jamiedemaria I submitted a quick PR if you ever have time to take a look. Thanks!
