Mountpoint operations hang indefinitely #706
Comments
Hi @SamStudio8, thanks for reporting the issue. From your backtrace, Mountpoint seems to be stuck at […]. The backtrace you provided is very helpful, but it covers only one worker thread. Would it be possible for you to get backtraces from all worker threads in Mountpoint, so we can confirm the root cause?
Another thing we're interested in is a way to reproduce the issue. Could you share more about the access pattern that causes the hang, so we have a better idea of how to reproduce it?
@monthonk Thanks for taking a look! I am glad the backtraces were helpful. The machine that I produced these backtraces from has been terminated, but I will capture a backtrace of all workers the next time I see this manifest and update here. With regard to the access pattern, it is hard to characterise specifics that would be helpful for a repro, but there are several things going on:
Sorry that isn't much to go on!
@monthonk Backtrace of all threads from a new hang attached.
@SamStudio8 thanks again for providing more details on your access pattern! I will take a look at the full backtrace and let you know once we have any updates.
I have looked at the full backtraces you sent but didn't see any […]. Anyway, we have just released v1.4.0, which includes a bug fix for the problem with […].
Thanks @monthonk, way ahead of you on that front! I took 1.4.0 for a spin this morning, but we're getting some intermittent "bad file descriptor" errors (we haven't changed anything workload-related between 1.3.2 and 1.4.0). I am trying to isolate the cause to see what we're doing wrong.
Just FYI, I think I have associated the "bad file descriptor" error with an entry in the mount-s3 log that indicates we're reading from a closed file handle. I am not sure how this is happening, but I will investigate further, and then hopefully we'll be able to reap the benefits of 1.4.0!
Hey @SamStudio8, wondering whether you were able to try again with v1.4.0 and whether you are still facing this issue?
Hi @ahmarsuhail, thanks for checking in. I was able to circumvent the "bad file descriptor" error described above by copying the file in question so that it is read from EBS instead. I haven't seen this hanging issue in 1.4.0, so I am happy for you to close this issue for now. We can reopen it if I am able to reproduce it again.
Sounds good, thank you! Closing for now.
@SamStudio8 It may be worth subscribing to #749, since it looks like the same issue.
Mountpoint for Amazon S3 version
mount-s3 1.3.2
AWS Region
eu-west-1
Describe the running environment
Mountpoint installed to EC2 instances and a single client is started to share a bucket mount with ECS tasks.
Mountpoint options
What happened?
Occasionally, mount-s3 enters what appears to be a deadlock and hangs indefinitely. Any process attempting to access the mount is stuck in an uninterruptible sleep. I don't have a lot to go on, but I have attached the relevant logs in the hope that somebody who knows the internals of mount-s3 can see what is going on. I'm not sure whether it's useful, but I've also captured backtraces of the main client-s3 thread and one of the workers.
While our ECS tasks are stuck, I can still access the machine to read logs and run commands. The instance is otherwise usable: the EBS volume is responsive and the mount-s3 client continues to log (the same lines repeatedly). I do not have a reliable repro, but I can reproduce the issue intermittently, so I am happy to collect additional data if you have any further debugging suggestions.
Relevant log output