Move file_id_manager.db sqlite db off NFS #2246

Merged 2 commits on Feb 24, 2023

yuvipanda

Ref #2245
Cause of #2244
Reported upstream as jupyter-server/jupyter_server_fileid#60

I think without this, images that upgrade to JupyterLab 3.6 might cause intermittent failures, even when not using RTC.

yuvipanda requested a review from a team on February 22, 2023 21:57

yuvipanda commented Feb 22, 2023

/cc @ryanlovett who I know is also experimenting with RTC, in case this affects their install too

github-actions bot commented Feb 22, 2023

Merging this PR will trigger the following deployment actions.

Support and Staging deployments

| Cloud Provider | Cluster Name | Upgrade Support? | Reason for Support Redeploy | Upgrade Staging? | Reason for Staging Redeploy |
| --- | --- | --- | --- | --- | --- |
| gcp | 2i2c-uk | No | | Yes | Core infrastructure has been modified |
| gcp | m2lines | No | | Yes | Core infrastructure has been modified |
| aws | ubc-eoas | No | | Yes | Core infrastructure has been modified |
| gcp | callysto | No | | Yes | Core infrastructure has been modified |
| aws | nasa-cryo | No | | Yes | Core infrastructure has been modified |
| gcp | leap | No | | Yes | Core infrastructure has been modified |
| aws | openscapes | No | | Yes | Core infrastructure has been modified |
| gcp | 2i2c | No | | Yes | Core infrastructure has been modified |
| aws | nasa-veda | No | | Yes | Core infrastructure has been modified |
| gcp | pangeo-hubs | No | | Yes | Core infrastructure has been modified |
| gcp | awi-ciroh | No | | Yes | Core infrastructure has been modified |
| kubeconfig | utoronto | No | | Yes | Core infrastructure has been modified |
| aws | 2i2c-aws-us | No | | Yes | Core infrastructure has been modified |
| gcp | meom-ige | No | | Yes | Core infrastructure has been modified |
| gcp | linked-earth | No | | Yes | Core infrastructure has been modified |
| gcp | cloudbank | No | | Yes | Core infrastructure has been modified |
| aws | gridsst | No | | Yes | Core infrastructure has been modified |
| aws | carbonplan | No | | Yes | Core infrastructure has been modified |
| aws | victor | No | | Yes | Core infrastructure has been modified |

Production deployments

| Cloud Provider | Cluster Name | Hub Name | Reason for Redeploy |
| --- | --- | --- | --- |
| gcp | 2i2c-uk | lis | Core infrastructure has been modified |
| gcp | m2lines | prod | Core infrastructure has been modified |
| aws | ubc-eoas | prod | Core infrastructure has been modified |
| gcp | callysto | prod | Core infrastructure has been modified |
| aws | nasa-cryo | prod | Core infrastructure has been modified |
| gcp | leap | prod | Core infrastructure has been modified |
| aws | openscapes | prod | Core infrastructure has been modified |
| gcp | 2i2c | demo | Core infrastructure has been modified |
| gcp | 2i2c | ohw | Core infrastructure has been modified |
| gcp | 2i2c | pfw | Core infrastructure has been modified |
| gcp | 2i2c | peddie | Core infrastructure has been modified |
| gcp | 2i2c | catalyst-cooperative | Core infrastructure has been modified |
| gcp | 2i2c | earthlab | Core infrastructure has been modified |
| gcp | 2i2c | paleohack2021 | Core infrastructure has been modified |
| gcp | 2i2c | aup | Core infrastructure has been modified |
| gcp | 2i2c | temple | Core infrastructure has been modified |
| gcp | 2i2c | ucmerced | Core infrastructure has been modified |
| aws | nasa-veda | prod | Core infrastructure has been modified |
| gcp | pangeo-hubs | prod | Core infrastructure has been modified |
| gcp | awi-ciroh | prod | Core infrastructure has been modified |
| kubeconfig | utoronto | prod | Core infrastructure has been modified |
| kubeconfig | utoronto | r-prod | Core infrastructure has been modified |
| aws | 2i2c-aws-us | researchdelight | Core infrastructure has been modified |
| gcp | meom-ige | prod | Core infrastructure has been modified |
| gcp | meom-ige | drakkar-demo | Core infrastructure has been modified |
| gcp | linked-earth | prod | Core infrastructure has been modified |
| gcp | cloudbank | ccsf | Core infrastructure has been modified |
| gcp | cloudbank | csm | Core infrastructure has been modified |
| gcp | cloudbank | elcamino | Core infrastructure has been modified |
| gcp | cloudbank | glendale | Core infrastructure has been modified |
| gcp | cloudbank | howard | Core infrastructure has been modified |
| gcp | cloudbank | miracosta | Core infrastructure has been modified |
| gcp | cloudbank | skyline | Core infrastructure has been modified |
| gcp | cloudbank | demo | Core infrastructure has been modified |
| gcp | cloudbank | fresno | Core infrastructure has been modified |
| gcp | cloudbank | lassen | Core infrastructure has been modified |
| gcp | cloudbank | sbcc | Core infrastructure has been modified |
| gcp | cloudbank | lacc | Core infrastructure has been modified |
| gcp | cloudbank | mills | Core infrastructure has been modified |
| gcp | cloudbank | palomar | Core infrastructure has been modified |
| gcp | cloudbank | pasadena | Core infrastructure has been modified |
| gcp | cloudbank | sjcc | Core infrastructure has been modified |
| gcp | cloudbank | tuskegee | Core infrastructure has been modified |
| gcp | cloudbank | avc | Core infrastructure has been modified |
| gcp | cloudbank | csu | Core infrastructure has been modified |
| aws | gridsst | prod | Core infrastructure has been modified |
| aws | carbonplan | prod | Core infrastructure has been modified |
| aws | victor | prod | Core infrastructure has been modified |

@yuvipanda

I tested this, and can confirm that it puts the db under /tmp
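
For readers who want to try something similar, here is a minimal sketch of a Jupyter Server config fragment that points the file ID database at /tmp. It assumes jupyter_server_fileid exposes a configurable `db_path` trait on `BaseFileIdManager`; that trait name and the helper `fileid_server_config` do not come from this thread, so check them against the installed version. The actual change in this PR lives in helm-charts/basehub/values.yaml and is not reproduced here.

```python
import json

# Assumption: jupyter_server_fileid reads `BaseFileIdManager.db_path`.
# Verify the trait name against the version installed in your image.
DB_PATH = "/tmp/file_id_manager.db"


def fileid_server_config(db_path: str = DB_PATH) -> dict:
    """Build a jupyter_server_config.json fragment that relocates the db.

    Hypothetical helper for illustration; it only builds a dict.
    """
    return {"BaseFileIdManager": {"db_path": db_path}}


if __name__ == "__main__":
    # Print the fragment so it can be merged into an existing config file.
    print(json.dumps(fileid_server_config(), indent=2))
```

Because /tmp is local to the user pod, a database stored there is recreated on each server start rather than persisted in the NFS home directory.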

pnasrat left a comment

LGTM

ryanlovett added a commit to ryanlovett/datahub that referenced this pull request Feb 22, 2023
As pointed out by @yuvipanda in 2i2c-org/infrastructure#2246. We don't need the YAML anchor right now, but I'm preserving it in case it is useful later.

consideRatio left a comment

Wiee!!! Nice work tracking this down!

I'd love to learn the debugging path to realize this was the issue!


@yuvipanda @pnasrat do you think this could have caused the slowness reported by the openscapes folks in the workshop they had in https://2i2c.freshdesk.com/a/tickets/439? They use the image openscapes/python:main.

Ah, never mind: in that image, jupyter lab --version reports 3.5.3.

helm-charts/basehub/values.yaml (review thread resolved)
Co-authored-by: Erik Sundell <erik.i.sundell@gmail.com>
@yuvipanda

@consideRatio it could be, but I doubt it: the failure mode observed here was a total failure to save rather than a service degradation. Constant NFS operations on sqlite do cause service degradation for everyone, but only at scale; the nbsignatures db is still on NFS and mostly fine. So I suspect this was not the cause.
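
One way to check whether a given database path sits on NFS is to match it against the mount table. The sketch below parses /proc/mounts-style text and returns the filesystem type of the longest matching mount point. The sample mount table and both paths are hypothetical, and this is not something the thread says was run.

```python
import os

# Hypothetical mount table for illustration; on Linux, read /proc/mounts instead.
SAMPLE_MOUNTS = """overlay / overlay rw,relatime 0 0
10.0.0.2:/homes/user /home/jovyan nfs4 rw,relatime 0 0
"""


def parse_mounts(mounts_text):
    """Return (mount_point, fs_type) pairs from /proc/mounts-style text."""
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            entries.append((fields[1], fields[2]))
    return entries


def fs_type_for(path, mounts_text):
    """Return the fs type of the longest mount point containing path.

    Octal escapes such as \\040 in mount points are not decoded here.
    """
    path = os.path.normpath(path)
    best_point, best_type = "", None
    for point, fs_type in parse_mounts(mounts_text):
        prefix = point.rstrip("/") + "/"
        inside = path == point or path.startswith(prefix)
        # ">=" lets a later mount on the same point shadow an earlier one.
        if inside and len(point) >= len(best_point):
            best_point, best_type = point, fs_type
    return best_type


def is_on_nfs(path, mounts_text):
    """True if path resolves to an nfs or nfs4 mount in mounts_text."""
    fs_type = fs_type_for(path, mounts_text)
    return fs_type is not None and fs_type.startswith("nfs")


if __name__ == "__main__":
    for p in ("/home/jovyan/.local/share/jupyter/file_id_manager.db",
              "/tmp/file_id_manager.db"):
        print(p, "->", fs_type_for(p, SAMPLE_MOUNTS))
```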

As for debugging, @pnasrat was poking around with the cool new kubectl functionality for ephemeral debug pods, which was pretty fun! While we were trying to figure out whether the problem was NFS or Jupyter, the fact that RStudio worked fine but Jupyter (neither classic nor lab) did not helped us narrow it down. After that, looking at the stacktrace (and using strace) to try to figure out which db was the problem led us to the issue at hand. I didn't know fileid was using an sqlite db...
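
The thread describes finding the database through the stack trace and strace. As a complementary sketch, not the method used here, the snippet below walks a directory and lists files that begin with the 16-byte SQLite header, which can reveal sqlite databases you did not know existed. The function names are hypothetical.

```python
import os
import sqlite3
import tempfile

# Every SQLite 3 database file starts with this 16-byte header string.
SQLITE_MAGIC = b"SQLite format 3\x00"


def is_sqlite_file(path):
    """Return True if the file at path begins with the SQLite 3 header."""
    try:
        with open(path, "rb") as f:
            return f.read(16) == SQLITE_MAGIC
    except OSError:
        # Unreadable files are treated as non-matches.
        return False


def find_sqlite_dbs(root):
    """Return a sorted list of SQLite database files found under root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full) and is_sqlite_file(full):
                found.append(full)
    return sorted(found)


if __name__ == "__main__":
    # Demo on a throwaway directory: one real database, one plain text file.
    with tempfile.TemporaryDirectory() as d:
        db = os.path.join(d, "example.db")
        conn = sqlite3.connect(db)
        conn.execute("CREATE TABLE t (x INTEGER)")
        conn.commit()
        conn.close()
        with open(os.path.join(d, "notes.txt"), "w") as f:
            f.write("not a database")
        print(find_sqlite_dbs(d))
```

Pointing `find_sqlite_dbs` at a user's home directory would be one way to enumerate candidate databases before checking which of them live on NFS.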

We put this information in the #managed_jupyte_inc_81 channel on slack if you are interested, @consideRatio

@yuvipanda

Given I'll be riding out tomorrow, I'd appreciate if someone else can take this one through?

@consideRatio

> Given I'll be riding out tomorrow, I'd appreciate if someone else can take this one through?

@pnasrat could you follow this through?

consideRatio merged commit bdc8eb1 into 2i2c-org:master on Feb 24, 2023
@github-actions

🎉🎉🎉🎉

Monitor the deployment of the hubs here 👉 https://github.com/2i2c-org/infrastructure/actions/runs/4261593430

@consideRatio

I'll get this merged so that we can start ruling this out as a cause of issues going forward.

pnasrat commented Feb 24, 2023

@consideRatio sorry, I didn't see the mention yesterday. Just for clarification, what does "take this one through" mean? Did it mean merge and deploy?

@consideRatio

No worries @pnasrat!!

Typically, for changes to hub configuration and cloud infrastructure, we self-merge after approval. So I think what Yuvi meant was for someone to merge this PR and make sure it deploys successfully without causing an incident.

ryanlovett added a commit to ryanlovett/datahub that referenced this pull request Feb 24, 2023
As pointed out by @yuvipanda in 2i2c-org/infrastructure#2246. We don't need the YAML anchor right now, but I'm preserving it in case it is useful later.