Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move jupyter notebooks out of the project-monai/monai repo #894

Closed
wyli opened this issue Aug 12, 2020 · 12 comments · Fixed by #917
Closed

Move jupyter notebooks out of the project-monai/monai repo #894

wyli opened this issue Aug 12, 2020 · 12 comments · Fixed by #917
Labels
Design discussions related to the generic API designs

Comments

@wyli
Copy link
Contributor

wyli commented Aug 12, 2020

Background
The Project-MONAI/MONAI repository now has 20 jupyter notebooks. They are mainly simple and intuitive working demos for the MONAI key features. With the upcoming MONAI educational events, MONAI will likely to have more demos in the form of jupyter notebooks.

Proposal
This ticket proposes to move the notebooks out of the project-MONAI/monai repo, to separate the maintenance efforts, and improve the usability. More specifically:

Motivation

  • The notebooks require different types of maintenance efforts compared with the rest of the codebase.
    it's currently unclear what/how we enforce the coding format, type hints, benchmarking, usability checks (see also https://github.com/Project-MONAI/MONAI/issues/872).
  • Git version control is not very compatible with the notebooks,
    this creates burdens for reviewing/verifying them (for example in pull request 881 Refine all the notebooks #884, git diff is a pain)
  • Some content requires additional packages such as matplotlib, it's unclear how these are managed and integration-tested. This should be decoupled from the main codebase.
  • Some notebooks contain multimedia content which increases the main codebase size
  • Currently, it's difficult to run notebooks with a pip-installed MONAI package. Because git-cloning notebooks will also automatically download the MONAI dev version (they are coupled). This potentially conflicts with the pip-installed monai at the system site-package level.

A new home for the notebooks

  • option 1: create a Tutorial repo under Project-MONAI org (following the Pytorch org: https://github.com/pytorch/tutorials).
  • option 2: convert the notebooks into html (with colab runnable links), deliver them via http://monai.io/tutorials pages
  • option 3: open to any other suggestions to address the issues mentioned in the previous section

--------- summary of the discussions so far------------------

  • can’t purely rely on colab (the internet access, python version, Numba issues)
  • free plugin: ReviewNB which can help make the git diff of notebook changes
  • notebooks and the core codebase have the tagging/versioning conflicts (need temp. workaround)
  • notebooks tests could/should be automated
  • key notebooks with the main repo, separate repo(s) for full tutorials
  • notebooks need manual checks (e.g. visually check the figures) before any milestones anyway
@Nic-Ma Nic-Ma added the Design discussions related to the generic API designs label Aug 12, 2020
@Nic-Ma
Copy link
Contributor

Nic-Ma commented Aug 12, 2020

Thanks @wyli for your summary and for raising this discussion.
I want to add some concerns about moving the notebooks out of current repo:

  1. China users can't access Google websites, including Colab, I think this is a big risk if we move all notebooks to Colab.
    And even with VPN, I feel it's very slow to open Colab web pages.
  2. Currently, all the notebooks have a button "run in Colab", if user clicks it, will open a Colab page to run this notebook directly, all the notebooks don't have external dependencies(Thanks for @benjamin-gorman 's help to add Colab support for all the notebooks). So we don't have any blocker to run the notebooks in Colab now.
  3. I think the main problem here is how to maintain notebooks and review PRs about them. If we move all the notebooks to Colab directly, I don't know how users submit PR to modify notebooks or add new notebooks, if we move all the notebooks to a new repo, I still don't quite understand what's the benefits(reduce the size of core repo?).
  4. To review jupyter notebook in Github, there is a free plugin: ReviewNB which can help make the diff more clear for review.
  5. Currently, all the notebooks will install the latest MONAI release version first if you run them, if users just cloned the MONAI source code and run it, it should not conflict with the dev version.

@ericspod , could you please also help share some comments?
Welcome a discussion about this topic.

Thanks.

@wyli
Copy link
Contributor Author

wyli commented Aug 12, 2020

@Nic-Ma good points, just wanted to clarify the fact that

there are external dependencies in the notebooks, which are not covered by monai/requirements.txt or monai/requirements-dev.txt.

Great to know that the notebooks use the pip installed monai release. The cloned core codebase is not actually needed/used when running the demos, which is another indicator that the notebooks shouldn't be part of the core codebase IMO.

@wyli
Copy link
Contributor Author

wyli commented Aug 12, 2020

just realised there's another issue of cyclic dependencies when releasing the package:
the notebooks require the tagging and releasing of the latest MONAI codebase,
but the tagging and releasing depend on the readiness of all components of the codebase which currently includes the notebook folders.

So, If the notebooks use the latest tagging release, the notebook and the core codebase should be managed with different development lifecycles -- the core codebase should be released first.

@Nic-Ma
Copy link
Contributor

Nic-Ma commented Aug 12, 2020

Hi @wyli ,

the notebooks require the tagging and releasing of the latest MONAI codebase

As we use pip install monai in notebooks now, notebooks don't have a dependency on explicit MONAI version.
I think we don't have this issue anymore?

Thanks.

@ericspod
Copy link
Member

There is an issue of dependency between the notebooks and the codebase, if we write a notebook to rely on the version of MONAI as committed with the notebook then that will be ahead of what we tag for pip. If what we choose to do is state that the notebooks should be for the most recent version installable via pip then they have to be explicitly written that way, which does exclude having experimental notebooks in the codebase.

For versioning using a tools like like ReviewNB makes sense, converting to html isn't great because you lose the dynamic aspect and it sometimes doesn't get layed out as well. If we create a new repo for notebooks it has to be something people can check out and immediately use with some information on what dependencies exist such as matplotlib.

We can't rely on Colab because of the access issue but also they're stuck on Python 3.6. The notebook I sent around about using Numba with Pytorch didn't work on Colab for example. I'm sure other things won't work either.

@wyli
Copy link
Contributor Author

wyli commented Aug 12, 2020

Hi @wyli ,

the notebooks require the tagging and releasing of the latest MONAI codebase

As we use pip install monai in notebooks now, notebooks don't have a dependency on explicit MONAI version.
I think we don't have this issue anymore?

Thanks.

for a concrete example, https://github.com/Project-MONAI/MONAI/blob/master/examples/notebooks/automatic_mixed_precision.ipynb demonstrates the latest monai AMP capability, but pip install monai currently installs v0.2. v0.2 doesn't claim this feature. At least this causes confusions.

@Nic-Ma
Copy link
Contributor

Nic-Ma commented Aug 12, 2020

Hi @ericspod and @wyli ,

Thanks for your sharing. About the dependency issue, currently, if the notebook doesn't have a dependency after MONAI v0.2, we use pip install monai in the notebook directly, if it depends on new features, we temporarily use the pip install -qU git+https://github.com/Project-MONAI/MONAI and I will remove this line when we tag and release v0.3. For example:
https://github.com/Project-MONAI/MONAI/blob/master/examples/notebooks/mednist_GAN_workflow.ipynb

The AMP notebook is newly added, I am adding above logic as all the other new notebooks.

Thanks.

@wyli
Copy link
Contributor Author

wyli commented Aug 12, 2020

Hi @ericspod and @wyli ,

Thanks for your sharing. About the dependency issue, currently, if the notebook doesn't have a dependency after MONAI v0.2, we use pip install monai in the notebook directly, if it depends on new features, we temporarily use the pip install -qU git+https://github.com/Project-MONAI/MONAI and I will remove this line when we tag and release v0.3. For example:
https://github.com/Project-MONAI/MONAI/blob/master/examples/notebooks/mednist_GAN_workflow.ipynb

The AMP notebook is newly added, I am adding above logic as all the other new notebooks.

Thanks.

One of the main goals of this feature request is to remove this temporary workaround from the core codebase, now I'm confused...

@Nic-Ma
Copy link
Contributor

Nic-Ma commented Aug 12, 2020

Hi @wyli ,

  1. No matter where we put the notebooks, I think we need to re-run all of them to double confirm before every release.
  2. And these 3 new notebooks can only run with the latest code before v0.3 release, so must install MONAI from master code even put notebooks into other repo or Colab, right?

Anyway, I already updated the AMP notebook in PR: #884.

Thanks.

@aylward
Copy link
Collaborator

aylward commented Aug 12, 2020

Perhaps it would be good to keep two or three key notebooks (getting started, spleen segmentation, and one other) in the MONAI repo. I think it is a great idea to start a (or many) separate repo(s) for tutorials and other notebooks.

Also, you can automate the testing of a jupyter notebook. See the script:
https://github.com/InsightSoftwareConsortium/ITKTubeTK/blob/master/CMake/EvaluateIPythonNotebook.py

@wyli
Copy link
Contributor Author

wyli commented Aug 18, 2020

thanks for the discussions, based on the feedback, we plan to migrate the notebooks to a separate repo, and retain a readme file in this repo with links to them (will categorise the links into essential tutorials, latest features etc.)

notebook quality checks will be addressed by a separate ticket

@wyli
Copy link
Contributor Author

wyli commented Aug 20, 2020

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Design discussions related to the generic API designs
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants