Custom tutorial notebooks #2
Changes from commits: 1012b69, b37b2a0, 3c659df, e165fb4
```sh
#!/bin/sh

# Notebook directory used by JupyterHub in birdhouse-deploy
NOTEBOOK_DIR="/notebook_dir/tutorial-notebooks"

# Download notebooks required for the base image and add them to the notebook directory
wget -O - https://github.com/Ouranosinc/pavics-sdi/archive/master.tar.gz | \
    tar -xz --wildcards -C $NOTEBOOK_DIR --strip=4 "*/docs/source/notebooks/jupyter_extensions.ipynb"

# Remove write permission on the tutorial-notebooks
chmod -R 555 $NOTEBOOK_DIR/*
```

The biggest missing feature of my previous implementation is that there is no pluggable/customizable list of notebooks, so that each org deployment of PAVICS can choose what to deploy. A default list should be included for bootstrapping, of course. I do not see this pluggable feature here. Or maybe you can add more docs explaining how it could be done? Each org overrides the …
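One hypothetical way to make the notebook list pluggable, as the comment above asks for, is to drive the same wget/tar download from an overridable list file. This is only a sketch: the file path, format, and function names are illustrative and not part of the PR.

```sh
#!/bin/sh
# Hypothetical sketch (not part of the PR): read one "archive-url glob" pair
# per line from an overridable list file, so each org deployment can choose
# which notebooks to deploy. A default list would ship in the image for
# bootstrapping; an org overrides it by mounting its own file.
NOTEBOOK_DIR="${NOTEBOOK_DIR:-/notebook_dir/tutorial-notebooks}"
NOTEBOOK_LIST="${NOTEBOOK_LIST:-/etc/tutorial-notebooks.list}"

fetch_archive() {
    # Same wget/tar dance as in the diff, parameterized by URL and glob
    wget -O - "$1" | tar -xz --wildcards -C "$NOTEBOOK_DIR" --strip=4 "$2"
}

deploy_notebooks() {
    while read -r url pattern; do
        # Skip blank lines and comments in the list file
        case "$url" in ""|\#*) continue ;; esac
        fetch_archive "$url" "$pattern"
    done < "$NOTEBOOK_LIST"
}

# An image entrypoint would then call: deploy_notebooks
```

The image's default list stays in `/etc/tutorial-notebooks.list` (a name chosen here for illustration), and a deployment only has to replace that file to change what gets deployed.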
OK, so the tutorial notebooks are downloaded only at the moment the personal Jupyter server starts. So users will only get notebook updates when they destroy and restart their personal server.

So we lose the auto notebook deploy feature, but in return it is standalone, with no dependency on an external process to fetch the notebooks. There are pros and cons; I think I see where you're going with this.
Why `--SingleUserNotebookApp.default_url=/lab`? Isn't that already the default when launched by JupyterHub? Is this Jupyter server meant to be launched standalone, outside of the JupyterHub in PAVICS? I am okay with this, it doesn't harm; I am just curious why it was needed.
Just to clarify for `--SingleUserNotebookApp.default_url=/lab`, although this code might change given the latest feedback: when I changed the `CMD` line in the Dockerfile to include the `download-notebooks.sh` script call, jupyter-notebook was started instead of jupyter-lab when I launched the image via the birdhouse-deploy stack.

I am not 100% clear on why just changing that line would affect using notebook vs lab in birdhouse, and I had trouble fixing it. Maybe it is just the way the `CMD` is written in the Dockerfile; I know the behavior differs depending on whether the `CMD` line is written in the exec form (like the original code) or the shell form.

Anyway, adding the argument `SingleUserNotebookApp.default_url` fixed the problem by making sure the right Jupyter frontend was used (lab instead of notebook).
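For context, the two Dockerfile `CMD` forms mentioned above do behave differently. A hypothetical sketch (the actual Dockerfile lines are not shown in this diff, and the command names are assumptions; only one `CMD` would be kept, they are shown together for comparison):

```dockerfile
# Shell form: runs under "/bin/sh -c", so "&&" chaining and variable expansion
# work, but the launched process is a child of the shell rather than PID 1.
CMD /usr/local/bin/download-notebooks.sh && \
    jupyterhub-singleuser --SingleUserNotebookApp.default_url=/lab

# Exec form: no shell is involved; the executable runs as PID 1 and receives
# signals directly, but there is no chaining or variable expansion.
CMD ["jupyterhub-singleuser", "--SingleUserNotebookApp.default_url=/lab"]
```

Switching from the exec form to the shell form to squeeze the script call in could plausibly change which launcher (and which defaults) ends up running, which may be why the explicit `default_url` argument was needed.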
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tlvu Actually this solution was made without considering autodeploy, since we thought fetching only at image startup would be enough. I'll admit that finding a compromise between standalone images and autodeploy would be wonderful.

After a great discussion with @dbyrns, the idea came up of triggering a scheduled task which calls the actual custom image with a "FETCH_NOTEBOOKS" switch. We would keep the image-specific dependencies inside the Docker image, while making sure the autodeploy updates the notebooks, say at midnight. Caching the notebooks on a shared volume would avoid useless requests, and the "FETCH_NOTEBOOKS" switch would ensure that only one fetch is done, instead of having 20 containers fetching the same data at the same time.
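The "FETCH_NOTEBOOKS" switch described above could be sketched as an entrypoint wrapper. Everything here is illustrative (script and variable names are not from the PR); only the switch idea comes from the discussion.

```sh
#!/bin/sh
# Hypothetical entrypoint sketch: fetch the tutorial notebooks only when
# FETCH_NOTEBOOKS=true, then hand off to the container command, so a single
# image serves both the scheduled fetch job and standalone users.
NOTEBOOK_DIR="${NOTEBOOK_DIR:-/notebook_dir/tutorial-notebooks}"

fetch_notebooks() {
    # Stand-in for the wget/tar download shown in the diff
    echo "fetching notebooks into $NOTEBOOK_DIR"
}

if [ "${FETCH_NOTEBOOKS:-false}" = "true" ]; then
    fetch_notebooks
fi

# Launch whatever was passed as the container command (e.g. the Jupyter
# single-user server); a no-op when no command is given.
exec "$@"
```

With this shape, the cronjob runs the image with `FETCH_NOTEBOOKS=true` and no command (fetch and exit), while JupyterHub spawns the same image without the flag and gets the server immediately.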
@matprov @dbyrns

I like this. So we remove the `download-notebooks.sh` script call from the Dockerfile `CMD`, but leave the script in the image and call it from a cronjob instead. Yeah, that would work, and it preserves "the burden of mapping notebooks with the right image is moved to the image curator" from #2 (comment), which I agree with as well.

The cronjob will write to `$JUPYTERHUB_USER_DATA_DIR/tutorial-notebooks/[eo,nlp]` (https://github.com/bird-house/birdhouse-deploy/blob/20d4f430005d6eb5e5680f05114eeaf936f7c38b/birdhouse/env.local.example#L232) on disk, which will be volume-mounted read-only to `/notebook_dir/tutorial-notebooks` inside the Jupyter environment and available to the end user.

Basically, instead of using our cronjob (https://github.com/bird-house/birdhouse-deploy/blob/20d4f430005d6eb5e5680f05114eeaf936f7c38b/birdhouse/components/scheduler/config.yml.template#L13-L27), which hardcodes our notebooks, you roll your own cronjob for your own notebooks. Yeah, that's perfect for now. Sorry my current notebook autodeploy is not pluggable, so you cannot easily add your own notebooks.

If possible, try to use the deploy-data mechanism (https://github.com/bird-house/birdhouse-deploy/blob/20d4f430005d6eb5e5680f05114eeaf936f7c38b/birdhouse/deployment/deploy-data), which was meant to "get some files from some git repos and put it somewhere". I was tired of repeatedly re-writing the same thing, so I created that generic script. It uses `git pull` instead of a direct wget or curl, so it is extremely bandwidth-efficient for big downloads (its first use case was to autodeploy xclim and raven testdata to Thredds, see https://github.com/bird-house/birdhouse-deploy/blob/20d4f430005d6eb5e5680f05114eeaf936f7c38b/birdhouse/env.local.example#L167-L189). Basically: get some testdata files from some git repos and put them somewhere for Thredds to see. Exactly the same use case as getting some notebooks from some git repos and putting them somewhere for Jupyter to see.

Also, you will have to volume-mount either sub-folder `eo` or `nlp` under `$JUPYTERHUB_USER_DATA_DIR/tutorial-notebooks` depending on the image, so you'll have to modify https://github.com/bird-house/birdhouse-deploy/blob/20d4f430005d6eb5e5680f05114eeaf936f7c38b/birdhouse/config/jupyterhub/jupyterhub_config.py.template#L52-L56 for this. I hope you can find a way so that this association also lives with "the image curator".
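An org-specific cronjob of the kind discussed above might look like the following. This is a plain crontab sketch, not the birdhouse scheduler syntax; the image name, host path, and schedule are all hypothetical.

```sh
# Hypothetical crontab entry: at midnight, run the custom image with the
# FETCH_NOTEBOOKS switch so it refreshes the notebooks on the shared volume
# (which JupyterHub mounts read-only into user containers) and then exits.
0 0 * * *  docker run --rm -e FETCH_NOTEBOOKS=true \
    -v /data/jupyterhub_user_data/tutorial-notebooks/eo:/notebook_dir/tutorial-notebooks \
    my-org/custom-jupyter:latest
```

In the PAVICS stack the equivalent entry would instead go through the scheduler component's config, as linked above.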
@matprov @dbyrns

By the way, I think we can have both standalone and autodeploy together, without one breaking the other.

That `FETCH_NOTEBOOKS` flag, if enabled on `docker run` of the image, will perform the fetch and then launch the personal Jupyter server! So for our PAVICS stack we will only use the `FETCH_NOTEBOOKS` flag in the cronjob, but other users running the image standalone outside of the PAVICS stack can enable the flag and get the notebooks, without autodeploy of course. Cheap for a quick demo!

I am thinking of https://mybinder.org/, and I did configure Binder for our Jupyter env + notebooks; check this out: https://mybinder.org/v2/gh/Ouranosinc/PAVICS-e2e-workflow-tests/master (it was mentioned in the README as well: https://github.com/Ouranosinc/PAVICS-e2e-workflow-tests#launch-jupyter-notebook-server-using-binder)
@ChaamC I think you probably want to ignore that post from me for now and focus only on getting the auto notebook deploy working. I was just getting ahead of myself, as usual. We can incrementally improve it later; just get the basics working first.