Move file_id_manager.db sqlite db off NFS #2246
Ref 2i2c-org#2245. Cause of 2i2c-org#2244. Reported upstream as jupyter-server/jupyter_server_fileid#60.
I think without this, images that upgrade to JupyterLab 3.6 might cause intermittent failures, even when not using RTC.
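For readers who want to see what kind of setting this is, here is a minimal sketch. It assumes the `db_path` trait on `BaseFileIdManager` from jupyter_server_fileid and a JSON-style config file picked up by Jupyter Server. The helper name `build_fileid_config` and the default directory are illustrative only and are not taken from this PR's diff.

```python
import json
import posixpath


def build_fileid_config(db_dir="/tmp"):
    """Return a Jupyter config dict that places the file ID sqlite db in db_dir.

    Assumption: jupyter_server_fileid reads BaseFileIdManager.db_path.
    db_dir should be on local (non-NFS) storage.
    """
    return {
        "BaseFileIdManager": {
            "db_path": posixpath.join(db_dir, "file_id_manager.db"),
        }
    }


if __name__ == "__main__":
    # Print the fragment so it can be merged into an existing config file.
    print(json.dumps(build_fileid_config(), indent=2))
```

Anything written under /tmp is typically lost when the user pod restarts, so this trades persistence of the file ID index for avoiding sqlite on NFS.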
/cc @ryanlovett who I know is also experimenting with RTC, in case this affects their install too
Merging this PR will trigger the following deployment actions: support and staging deployments, and production deployments.
I tested this, and can confirm that it puts the db under /tmp.
LGTM
As pointed out by @yuvipanda in 2i2c-org/infrastructure#2246. We don't need the YAML anchor right now, but I'm preserving it in case it is useful later.
Wiee!!! Nice work tracking this down!
I'd love to learn the debugging path to realize this was the issue!
@yuvipanda @pnasrat do you think this could have caused the slowness reported by the openscapes folks in the workshop they had in https://2i2c.freshdesk.com/a/tickets/439? Looking at the image they use, openscapes/python:main.
Ah nevermind, it seems that image has jupyter lab --version -> 3.5.3.
Co-authored-by: Erik Sundell <erik.i.sundell@gmail.com>
@consideRatio it could be, but I somehow doubt it, as the failure mode observed here was a total failure to save rather than a service degradation. Consistently making NFS ops on sqlite does cause service degradation for everyone, but only at scale: the nbsignatures db is still on NFS, and mostly fine. So I would suspect it is not.
As for debugging, @pnasrat was poking around with cool new functionality of kubectl / ephemeral debug pods that was pretty fun! While we were trying to figure out whether it was NFS or Jupyter, the fact that RStudio worked fine but Jupyter (neither classic nor lab) did not helped us narrow this down. After that, looking at the stacktrace (and using strace) to try to figure out which db was the problem led us to the issue at hand. I didn't know fileid was using an sqlite db... We put this information in the #managed_jupyte_inc_81 channel on Slack if you are interested, @consideRatio
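One part of that narrowing-down, checking whether a given db file lives on NFS, can be sketched as below. This is not the kubectl / strace process described above. It assumes a Linux `/proc/mounts`-style mount table, and the function names and example paths are hypothetical.

```python
import posixpath


def filesystem_type(path, mounts_text):
    """Return the fs type of the mount that contains path.

    mounts_text is text in /proc/mounts format: "device mountpoint fstype ...".
    The longest matching mount point wins; on a tie, the later entry wins.
    Escaped characters in mount points (e.g. \\040 for a space) are not decoded.
    """
    path = posixpath.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        # Match whole path components so /home does not match /homework.
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if len(mount_point) >= len(best_mount):
                best_mount, best_type = mount_point, fs_type
    return best_type


def is_on_nfs(path, mounts_text):
    """True if path sits on an nfs or nfs4 mount according to mounts_text."""
    fs_type = filesystem_type(path, mounts_text)
    return fs_type is not None and fs_type.startswith("nfs")


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        table = f.read()
    # Hypothetical candidate locations for a sqlite db.
    for candidate in ("/tmp/file_id_manager.db", "/home/jovyan/example.db"):
        print(candidate, filesystem_type(candidate, table))
```

Run inside the user container (or a debug container sharing its mounts), this prints the filesystem type backing each candidate path.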
Given I'll be riding out tomorrow, I'd appreciate it if someone else could take this one through.
@pnasrat could you follow this through?
🎉🎉🎉🎉 Monitor the deployment of the hubs here 👉 https://github.com/2i2c-org/infrastructure/actions/runs/4261593430
I'll get this merged so that we can start ruling out issues with this going forward.
@consideRatio sorry I didn't see the mention yesterday. Just for clarification, what does "take this one through" mean? Does it mean merge and deploy?
No worries @pnasrat!! Typically for changes to hub configuration and cloud infrastructure, we self-merge after approval. So I think what Yuvi meant was for someone to help merge this PR and ensure it gets deployed successfully without creating an incident.