Move file_id_manager.db sqlite db off NFS #2246

yuvipanda · 2023-02-22T21:57:08Z

Ref #2245
Cause of #2244
Reported upstream as jupyter-server/jupyter_server_fileid#60

I think without this, images that upgrade to JupyterLab 3.6 might cause intermittent failures, even when not using RTC.

yuvipanda · 2023-02-22T21:57:33Z

/cc @ryanlovett who I know is also experimenting with RTC, in case this affects their install too

github-actions · 2023-02-22T21:58:02Z

Merging this PR will trigger the following deployment actions.

Support and Staging deployments

Cloud Provider	Cluster Name	Upgrade Support?	Upgrade Staging?	Reason for Staging Redeploy
gcp	2i2c-uk	No	Yes	Core infrastructure has been modified
gcp	m2lines	No	Yes	Core infrastructure has been modified
aws	ubc-eoas	No	Yes	Core infrastructure has been modified
gcp	callysto	No	Yes	Core infrastructure has been modified
aws	nasa-cryo	No	Yes	Core infrastructure has been modified
gcp	leap	No	Yes	Core infrastructure has been modified
aws	openscapes	No	Yes	Core infrastructure has been modified
gcp	2i2c	No	Yes	Core infrastructure has been modified
aws	nasa-veda	No	Yes	Core infrastructure has been modified
gcp	pangeo-hubs	No	Yes	Core infrastructure has been modified
gcp	awi-ciroh	No	Yes	Core infrastructure has been modified
kubeconfig	utoronto	No	Yes	Core infrastructure has been modified
aws	2i2c-aws-us	No	Yes	Core infrastructure has been modified
gcp	meom-ige	No	Yes	Core infrastructure has been modified
gcp	linked-earth	No	Yes	Core infrastructure has been modified
gcp	cloudbank	No	Yes	Core infrastructure has been modified
aws	gridsst	No	Yes	Core infrastructure has been modified
aws	carbonplan	No	Yes	Core infrastructure has been modified
aws	victor	No	Yes	Core infrastructure has been modified

Production deployments

Cloud Provider	Cluster Name	Hub Name	Reason for Redeploy
gcp	2i2c-uk	lis	Core infrastructure has been modified
gcp	m2lines	prod	Core infrastructure has been modified
aws	ubc-eoas	prod	Core infrastructure has been modified
gcp	callysto	prod	Core infrastructure has been modified
aws	nasa-cryo	prod	Core infrastructure has been modified
gcp	leap	prod	Core infrastructure has been modified
aws	openscapes	prod	Core infrastructure has been modified
gcp	2i2c	demo	Core infrastructure has been modified
gcp	2i2c	ohw	Core infrastructure has been modified
gcp	2i2c	pfw	Core infrastructure has been modified
gcp	2i2c	peddie	Core infrastructure has been modified
gcp	2i2c	catalyst-cooperative	Core infrastructure has been modified
gcp	2i2c	earthlab	Core infrastructure has been modified
gcp	2i2c	paleohack2021	Core infrastructure has been modified
gcp	2i2c	aup	Core infrastructure has been modified
gcp	2i2c	temple	Core infrastructure has been modified
gcp	2i2c	ucmerced	Core infrastructure has been modified
aws	nasa-veda	prod	Core infrastructure has been modified
gcp	pangeo-hubs	prod	Core infrastructure has been modified
gcp	awi-ciroh	prod	Core infrastructure has been modified
kubeconfig	utoronto	prod	Core infrastructure has been modified
kubeconfig	utoronto	r-prod	Core infrastructure has been modified
aws	2i2c-aws-us	researchdelight	Core infrastructure has been modified
gcp	meom-ige	prod	Core infrastructure has been modified
gcp	meom-ige	drakkar-demo	Core infrastructure has been modified
gcp	linked-earth	prod	Core infrastructure has been modified
gcp	cloudbank	ccsf	Core infrastructure has been modified
gcp	cloudbank	csm	Core infrastructure has been modified
gcp	cloudbank	elcamino	Core infrastructure has been modified
gcp	cloudbank	glendale	Core infrastructure has been modified
gcp	cloudbank	howard	Core infrastructure has been modified
gcp	cloudbank	miracosta	Core infrastructure has been modified
gcp	cloudbank	skyline	Core infrastructure has been modified
gcp	cloudbank	demo	Core infrastructure has been modified
gcp	cloudbank	fresno	Core infrastructure has been modified
gcp	cloudbank	lassen	Core infrastructure has been modified
gcp	cloudbank	sbcc	Core infrastructure has been modified
gcp	cloudbank	lacc	Core infrastructure has been modified
gcp	cloudbank	mills	Core infrastructure has been modified
gcp	cloudbank	palomar	Core infrastructure has been modified
gcp	cloudbank	pasadena	Core infrastructure has been modified
gcp	cloudbank	sjcc	Core infrastructure has been modified
gcp	cloudbank	tuskegee	Core infrastructure has been modified
gcp	cloudbank	avc	Core infrastructure has been modified
gcp	cloudbank	csu	Core infrastructure has been modified
aws	gridsst	prod	Core infrastructure has been modified
aws	carbonplan	prod	Core infrastructure has been modified
aws	victor	prod	Core infrastructure has been modified

yuvipanda · 2023-02-22T21:58:45Z

I tested this, and can confirm that it puts the db under /tmp
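
The diff itself is not shown in this thread, so the following is only a minimal sketch of the kind of setting involved. It assumes that `BaseFileIdManager.db_path` is the jupyter_server_fileid option that controls where `file_id_manager.db` is created, and that `/tmp` is node-local storage in the user pod rather than NFS. The function names and the `jupyter_server_config.json` target are illustrative choices, not taken from the PR.

```python
import json
from pathlib import Path

# Assumption: "BaseFileIdManager.db_path" is the jupyter_server_fileid setting
# that decides where file_id_manager.db lives.
# Assumption: /tmp is local to the user pod and is not an NFS mount.
FILEID_DB_PATH = "/tmp/file_id_manager.db"


def build_server_config(db_path: str = FILEID_DB_PATH) -> dict:
    """Return a Jupyter Server config dict that relocates the fileid database."""
    return {"BaseFileIdManager": {"db_path": db_path}}


def write_server_config(config_dir: Path, db_path: str = FILEID_DB_PATH) -> Path:
    """Write the config as JSON into config_dir and return the written file."""
    config_dir.mkdir(parents=True, exist_ok=True)
    target = config_dir / "jupyter_server_config.json"
    target.write_text(json.dumps(build_server_config(db_path), indent=2))
    return target
```

Because `/tmp` does not persist across server restarts in a typical pod, this trades persistence of the file ID database for keeping it off the shared home directory.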

pnasrat

LGTM

consideRatio

Wiee!!! Nice work tracking this down!

I'd love to learn the debugging path to realize this was the issue!

@yuvipanda @pnasrat do you think this could have caused the slowness the openscapes folks reported during their workshop in https://2i2c.freshdesk.com/a/tickets/439? They use the image openscapes/python:main.

Ah, never mind: in that image, jupyter lab --version reports 3.5.3, so it is not on JupyterLab 3.6 yet.

yuvipanda · 2023-02-22T23:04:21Z

@consideRatio it could be, but I doubt it: the failure mode observed here was a total failure to save, not a service degradation. Constantly running sqlite operations over NFS does degrade service for everyone, but only at scale. The nbsignatures db is still on NFS and is mostly fine. So I suspect this is not the cause.
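
For context on why WAL mode matters here (the upstream issue title names it), the sketch below shows what sqlite puts on disk in WAL mode. It uses only the standard library and a local temporary directory; the file name `demo.db` and the table are hypothetical. The extra shared-memory file is the part that, per the SQLite documentation, requires all processes to be on the same host, which is why WAL is not supported on network filesystems.

```python
import sqlite3
from pathlib import Path


def wal_sidecar_files(db_dir: Path) -> dict:
    """Open a database in WAL mode, write once, and report what is on disk."""
    db_path = db_dir / "demo.db"  # hypothetical file name, for illustration only
    conn = sqlite3.connect(db_path)
    try:
        # The PRAGMA returns the journal mode that is actually in effect.
        mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
        conn.execute("CREATE TABLE IF NOT EXISTS t (id INTEGER PRIMARY KEY, v TEXT)")
        conn.execute("INSERT INTO t (v) VALUES ('x')")
        conn.commit()
        # While a connection is open, WAL mode keeps extra files next to the
        # database: a write-ahead log (-wal) and a shared-memory index (-shm).
        files = sorted(p.name for p in db_dir.iterdir())
    finally:
        conn.close()
    return {"journal_mode": mode, "files": files}
```

Running this against a directory on local disk and then against a directory on the NFS home mount is one way to compare behaviour, though the result on NFS depends on the server and client setup.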

As for debugging, @pnasrat was poking around with the cool new kubectl functionality for ephemeral debug pods, which was pretty fun! While we were trying to figure out whether the problem was NFS or Jupyter, the fact that RStudio worked fine while Jupyter (both classic and lab) did not helped us narrow this down. After that, looking at the stacktrace (and using strace) to figure out which db was the problem led us to the issue at hand. I didn't know fileid was using an sqlite db...
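
Once a suspect database path is known, a quick check is whether it sits on an NFS mount. The helper below is a rough sketch, not what was used during the incident: it parses text in the format of Linux's `/proc/mounts` and picks the longest mount point containing the path. The paths and mount entries in the usage note are hypothetical, and escaped characters in mount points (such as `\040` for spaces) are not handled.

```python
import os


def filesystem_type(path: str, mounts_text: str) -> str:
    """Return the fs type of the longest mount point that contains path.

    mounts_text uses the /proc/mounts layout:
    "<device> <mount point> <fs type> <options> <dump> <pass>" per line.
    """
    path = os.path.abspath(path)
    best_mount, best_type = "", "unknown"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        # ">=" lets a later entry for the same mount point win, since a later
        # mount shadows an earlier one.
        if (path + "/").startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


def is_on_nfs(path: str, mounts_text: str) -> bool:
    """True if the path resolves to an nfs or nfs4 mount."""
    return filesystem_type(path, mounts_text).startswith("nfs")
```

Inside a user pod this could be called as `is_on_nfs(db_path, open("/proc/mounts").read())`, where `db_path` is whichever database file strace pointed at.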

We put this information in the #managed_jupyte_inc_81 channel on slack if you are interested, @consideRatio

yuvipanda · 2023-02-22T23:04:42Z

Given I'll be riding out tomorrow, I'd appreciate if someone else can take this one through?

consideRatio · 2023-02-22T23:09:15Z

> Given I'll be riding out tomorrow, I'd appreciate if someone else can take this one through?

@pnasrat could you follow this through?

github-actions · 2023-02-24T10:34:05Z

🎉🎉🎉🎉

Monitor the deployment of the hubs here 👉 https://github.com/2i2c-org/infrastructure/actions/runs/4261593430

consideRatio · 2023-02-24T10:34:17Z

I'll get this merged so that we can rule this out as a cause of issues going forward.

pnasrat · 2023-02-24T12:16:32Z

@consideRatio sorry, I didn't see the mention yesterday. Just for clarification, what does "take this one through" mean? Does it mean merge and deploy?

consideRatio · 2023-02-24T12:20:55Z

No worries @pnasrat!!

Typically, for changes to hub configuration and cloud infrastructure, we self-merge after approval. So I think what Yuvi meant was that someone should merge this PR and make sure it gets deployed successfully without creating an incident.

Move file_id_manager.db sqlite db off NFS

d4ac466

Ref 2i2c-org#2245 Cause of 2i2c-org#2244 Reported upstream as jupyter-server/jupyter_server_fileid#60 I think without this, images that upgrade to JupyterLab 3.6 might cause intermittent failures, even when not using RTC.

yuvipanda requested a review from a team February 22, 2023 21:57

yuvipanda mentioned this pull request Feb 22, 2023

Using sqlite in WAL mode causes file saving failure when used on JupyterHub on NFS jupyter-server/jupyter_server_fileid#60

Closed

pnasrat reviewed Feb 22, 2023

pnasrat approved these changes Feb 22, 2023

ryanlovett added a commit to ryanlovett/datahub that referenced this pull request Feb 22, 2023

Move fileid database to /tmp/.

88701dc

As pointed out by @yuvipanda in 2i2c-org/infrastructure#2246. We don't need the YAML anchor right now, but I'm preserving it in case it is useful later.

ryanlovett mentioned this pull request Feb 22, 2023

Move fileid database to /tmp/. berkeley-dsep-infra/datahub#4288

Merged

consideRatio approved these changes Feb 22, 2023

helm-charts/basehub/values.yaml

Add comment pointing to upstream discussion

787b38d

Co-authored-by: Erik Sundell <erik.i.sundell@gmail.com>

consideRatio merged commit bdc8eb1 into 2i2c-org:master Feb 24, 2023

ryanlovett added a commit to ryanlovett/datahub that referenced this pull request Feb 24, 2023

Move fileid database to /tmp/.

ebbc543

As pointed out by @yuvipanda in 2i2c-org/infrastructure#2246. We don't need the YAML anchor right now, but I'm preserving it in case it is useful later.

damianavila assigned yuvipanda Feb 27, 2023
