
NodeJS client - Memory mode - S3 connection --> client does not automatically reconnect #5929

Closed
dberardo-com opened this issue Jan 18, 2023 · 8 comments


dberardo-com commented Jan 18, 2023

What happens?

Long-running in-memory Node.js clients of DuckDB that fetch from S3 might experience network problems.

I have noticed that my remote DuckDB client was not able to "find" any Parquet data on S3, although a local development instance could.

I thought the reason could be a network problem that caused a disconnection, so I just restarted the application. After the restart everything worked fine, and the query that was "not finding" files before can now find them.

Is it possible that the Node.js client does not automatically reconnect to S3 after network failures? If so, how can this behavior be added?
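Until that is clarified, a common application-level workaround (not a DuckDB feature) is to retry a failed query and, after repeated failures, let the caller tear down and recreate the connection. A minimal sketch, where `runQuery` is a hypothetical async function wrapping the client's query call:

```javascript
// Sketch of an application-level retry wrapper. `runQuery` is a hypothetical
// async function supplied by the caller (e.g. wrapping a duckdb query);
// the names and defaults here are illustrative, not part of any DuckDB API.
async function withRetry(runQuery, { attempts = 3, delayMs = 1000 } = {}) {
  let lastErr;
  for (let i = 0; i < attempts; i++) {
    try {
      return await runQuery();
    } catch (err) {
      lastErr = err;
      // back off briefly before retrying
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  // all attempts failed: the caller can rebuild the DuckDB connection here
  throw lastErr;
}
```

This does not fix a stale connection by itself, but it gives the application a well-defined point at which to reconnect.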

To Reproduce

TBH, very hard to reproduce.

OS:

k8s

DuckDB Version:

0.6

DuckDB Client:

nodejs

Full Name:

check github profile

Affiliation:

check github profile

Have you tried this on the latest master branch?

  • I agree

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • I agree
Mause (Member) commented Jan 18, 2023

Can you please fill out the missing parts of the issue template? Which "latest as of 18-12-2022" version are you referring to? The latest master or release build?

You also said to check your profile for affiliation, but neither your name nor your company is listed there.

With regards to your issue, have you enabled object caching?

dberardo-com (Author) commented:

With regards to your issue, have you enabled object caching?

positive:
SET enable_object_cache=true;

tobilg commented Jan 23, 2023

@dberardo-com have you checked that your WebIdentity wasn't rotated? If you're running on EKS with WebIdentity, it might happen that the tokens are invalidated, and then subsequently you won't have access to S3 because of invalid/missing credentials. See https://medium.com/airwalk/how-a-pod-assumes-an-aws-identity-284fc6fda873

You can also inspect the token with jwt.io and determine if it's still valid.
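The same expiry check can be done locally instead of pasting the token into jwt.io. A JWT's payload is just base64url-encoded JSON, so a few lines of plain Node are enough (the function name and the throwaway token below are illustrative):

```javascript
// Sketch: decode a JWT's payload locally and check its `exp` claim.
// Works on any standard JWT; no signature verification is attempted.
function isJwtExpired(token) {
  const payloadB64 = token.split('.')[1];
  // base64url -> JSON (Node's Buffer supports the 'base64url' encoding)
  const payload = JSON.parse(Buffer.from(payloadB64, 'base64url').toString('utf8'));
  if (typeof payload.exp !== 'number') return false; // no expiry claim present
  return payload.exp * 1000 < Date.now(); // `exp` is in seconds, Date.now() in ms
}

// Demonstration with a throwaway unsigned token that expired a minute ago:
const header = Buffer.from(JSON.stringify({ alg: 'none' })).toString('base64url');
const body = Buffer.from(JSON.stringify({ exp: Math.floor(Date.now() / 1000) - 60 })).toString('base64url');
console.log(isJwtExpired(`${header}.${body}.`)); // → true
```

Never log or commit a real token; decode it only in a trusted environment.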

dberardo-com (Author) commented:

Hi @tobilg, I see your point. I am using a self-hosted MinIO installation, but I guess the auth logic is the same. Where can I find the JWT stored in DuckDB?

And how can this be prevented in a long-running application? I would expect the duckdb library to handle re-authentication by itself ... ?

tobilg commented Jan 23, 2023

Well, I don't know how this works with MinIO... With S3, you'd have to specify s3_access_key_id/s3_secret_access_key/s3_session_token (see https://duckdb.org/docs/extensions/httpfs). In EKS, the IAM permissions are usually provided via a service account role, and via the metadata service or webhook endpoint made available to the container you're running. I don't know about your setup, because you didn't explain that in your issue.

Re-authentication with potentially new credentials isn't done by DuckDB automatically, afaik (and this would hardly be possible, as it relies on the credentials that are given to the container...)
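In other words, when credentials rotate, the application has to re-issue the httpfs SET statements on its own connection. A minimal sketch of building those statements (the helper name and quoting function are illustrative; the setting names are the ones from the httpfs documentation linked above):

```javascript
// Sketch: build the SET statements a long-running app could re-run on its
// open DuckDB connection whenever its S3/MinIO credentials are rotated.
// `s3CredentialStatements` and `quote` are illustrative helpers, not DuckDB API.
function s3CredentialStatements({ accessKeyId, secretAccessKey, sessionToken }) {
  // escape single quotes for SQL string literals
  const quote = (v) => `'${String(v).replace(/'/g, "''")}'`;
  const stmts = [
    `SET s3_access_key_id=${quote(accessKeyId)};`,
    `SET s3_secret_access_key=${quote(secretAccessKey)};`,
  ];
  if (sessionToken) stmts.push(`SET s3_session_token=${quote(sessionToken)};`);
  return stmts;
}

// Placeholder values only; a refresh loop would then execute each statement,
// e.g. con.run(stmt) with the duckdb Node client.
console.log(s3CredentialStatements({ accessKeyId: 'AKIA...', secretAccessKey: 'secret' }));
```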

dberardo-com (Author) commented:

Currently I use the access-key/secret-key auth strategy, so no session token is involved (at least not in the configuration). I can't say whether auth is the problem; I have more of a feeling that this could be a timeout with a missing reconnection, so I think the comment from @Mause goes in the right direction.
But again, unfortunately the stdout log gives pretty much no useful information.

If there is any other place where I could look for more verbose logs, I would give it a try.

github-actions bot commented:

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 30 days.

@github-actions github-actions bot added the stale label Jul 29, 2023
github-actions bot commented:

This issue was closed because it has been stale for 30 days with no activity.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Aug 28, 2023