[Java] IO Error: Connection error for HTTP HEAD #10047
Are there any options for me to control DuckDB's connection configuration?
@NewtonVan There's the keep-alive option, which might work as a workaround (#9648). To use it, run:

Edit: I just realized this feature was not included in 0.9.2, so it will currently only be available in the dev builds of DuckDB.
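The exact command was lost above; based on the option name `http_keep_alive` used later in this thread, disabling keep-alive would presumably look like the following sketch (only valid on builds that include #9648, so not on 0.9.2):

```sql
-- Hedged sketch: disable HTTP keep-alive so idle connections are not reused.
-- Requires a DuckDB build that ships #9648.
SET http_keep_alive = false;
```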
@NewtonVan quick question: does the
Facing the same issue when using Python:

```python
import duckdb

con = duckdb.connect()
con.install_extension("httpfs")
con.load_extension("httpfs")
con.execute("SELECT * FROM 'https://raw.githubusercontent.com/duckdb/duckdb-web/main/data/weather.csv';")
```

Failed after about 300 seconds of execution.
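One generic mitigation for a transient error like this is retrying the failing query, possibly on a fresh connection. A minimal sketch, purely illustrative: the `run_query` callable and the broad exception handling are stand-ins for the real DuckDB call, not DuckDB API.

```python
import time

def with_retries(run_query, attempts=3, delay=1.0):
    # run_query is a stand-in for the DuckDB call that intermittently
    # fails with "IO Error: Connection error for HTTP HEAD".
    last_err = None
    for attempt in range(attempts):
        try:
            return run_query()
        except Exception as err:  # in practice, catch DuckDB's IO error specifically
            last_err = err
            if attempt < attempts - 1:
                time.sleep(delay * (2 ** attempt))  # exponential backoff
    raise last_err
```

Note this only papers over transient failures; as reported elsewhere in this thread, recreating the connection does not always make the error go away.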
@samansmink
@samansmink It seems the problem is bigger than that: connections are not closed even when the connection object is garbage-collected in Python. The error persists even if a new DuckDB connection is created for every use.
Another observation: this error has to do with the number of threads. On a thread that had previously created a (single) connection, a single file read (in my case using
could you try:

sometimes the extensions for pre-releases aren't uploaded yet, but this should work. Could you provide us with a reproducible script? Right now I'm not really sure how to reproduce this. I tried running the Python query you provided for an extended time, but I could not get it to fail.
I might try to reproduce it later. It happened inside a very complex system: in a thread pool's operation (~30 threads), after reading ~100 Parquet files from a GCS backend using httpfs.
We see this when trying to read a Parquet glob from S3 too, so I don't think it's just JSON.
Facing the same issue when doing a read_parquet glob on a GCP bucket.
Folks,
I have done a little more investigating and I can definitely reproduce this, but only when I increase the thread count to pretty high numbers (~300). I guess this is expected behaviour though, and not really a bug. Also, disabling http_keep_alive completely solves the issue afaict. If people have a reproduction of this with a relatively low number of files and threads, I'm very interested; ideally I would get:

Otherwise, some detailed info on the dataset is also helpful:
Oh, and please test on v0.10.0 :) @mustafahasankhan do you know at what thread count this occurs, and whether
@samansmink I was able to reproduce it with a thread count of even 1, with 1200+ files.
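With file counts that large, it can help to narrow down whether the failure scales with the number of files opened per query. A hypothetical diagnostic sketch that splits a file list into smaller batches, each of which could be queried separately (the batching helper and the bucket paths are ours, not DuckDB API):

```python
def batches(items, size):
    # Split a long file list into fixed-size chunks so each query
    # opens at most `size` remote files.
    for i in range(0, len(items), size):
        yield items[i:i + size]

# Hypothetical file list; the report above mentions 1200+ files.
files = [f"gs://bucket/part-{i:04d}.parquet" for i in range(1200)]
chunks = list(batches(files, 100))
# Each chunk could then be passed to a separate read_parquet query.
```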
@mustafahasankhan thanks, that's interesting; that definitely sounds like a bug. Could you give me access to this dataset? If not, could you give me some more details with which I can reproduce it? You can ping me at sam@duckdblabs.com
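Since the maintainer could only reproduce this at very high thread counts (~300), one application-side mitigation is to cap how many reads run concurrently. A hedged sketch using a plain thread pool; `read_file` here is a placeholder for the real per-thread DuckDB query, and `MAX_WORKERS` is an assumed tuning knob, not a DuckDB setting:

```python
from concurrent.futures import ThreadPoolExecutor

# Keep live HTTP connections well below the ~300 where errors appeared.
MAX_WORKERS = 16

def read_file(path):
    # Placeholder: in the real system this would run a DuckDB query
    # against the remote file (e.g. over httpfs).
    return path

paths = [f"s3://bucket/part-{i}.parquet" for i in range(100)]
with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
    results = list(pool.map(read_file, paths))
```

DuckDB's own `threads` setting bounds query parallelism within one connection, but it does not limit how many connections an application-level pool like the one above opens; the two caps are independent.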
For those using MinIO in Docker, use the port mapping 9000:9000 instead of 9000:9001:

```shell
docker run -p 9000:9000 -p 9090:9090 --name minio_name \
  -e "MINIO_ROOT_USER=minio" -e "MINIO_ROOT_PASSWORD=minio123" \
  -v ${HOME}/minio/data:/data \
  quay.io/minio/minio server /data --console-address ":9090"
```

In my case that fixed it.
What happens?
It runs fine in the DuckDB CLI and in a demo (which uses the same API as the program that has the issue). In our own program we use the AWS Java API to upload files, and DuckDB is used to perform a row-to-column operation. It reports the following exception.
To Reproduce
I'm sorry, it's quite hard to reproduce. The same command runs well in both the DuckDB CLI and our unit tests. The only similar issues I found are #9232 and #9647. A reasonable guess is that our S3 client occupies some connection resources and DuckDB reaches its limit.
OS:
macOS, Linux
DuckDB Version:
0.9.2
DuckDB Client:
Java
Full Name:
yanhui chen
Affiliation:
ApeCloud
Have you tried this on the latest main branch?
I have tested with a main build
Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?