Instantly play with Notebooks shared by anyone #2

Open
amit1rrr opened this Issue Oct 23, 2018 · 7 comments

@amit1rrr
Contributor

amit1rrr commented Oct 23, 2018

Problem

There's a lot of content available in the form of Notebooks. But it's not always easy to just download a Notebook and start playing with it. You need to have the right environment (dependent packages, python version, env variables, data files etc.) to be able to execute Notebook code cells.

In the Jupyter world this is called the reproducibility problem, i.e. one cannot reproduce the Notebook's results given just the Notebook.

Solution

We'll ask Notebook authors to provide a Dockerfile in their repo. It's the easiest and most powerful format for specifying the complete environment required for Notebook execution.

Once the Dockerfile is available here's how the workflow will look for users:

  • User can browse any GitHub repository containing Notebooks
  • There's a badge/URL in the README file saying "Launch via ReviewNb" (similar to the Travis build status badge)
  • User clicks on the badge and a new Jupyter environment with all the Notebooks (in the repo) preloaded is available for the user to play with.
  • User can keep the Jupyter environment, share it with others, or kill it at the end of the session.
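To make the workflow above concrete, here's a minimal sketch of what such a Dockerfile might look like. The base image, paths, and requirements file are illustrative assumptions, not part of the proposal:

```dockerfile
# Illustrative sketch only: image name, user, and file layout are assumptions.
FROM jupyter/scipy-notebook:latest

# Copy the repo's notebooks and data files into the image
COPY . /home/jovyan/work

# Install any extra dependencies the notebooks need
RUN pip install --no-cache-dir -r /home/jovyan/work/requirements.txt
```

Because the whole environment (packages, data files, even OS-level setup) is baked into the image, anyone launching it gets exactly what the author ran.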

Benefits

  • Solves reproducibility problem of Notebooks
  • Users are more likely to actually get their hands dirty with the algorithms and data instead of just skimming over the Notebook HTML (learning by practice)
  • People can share not just dependencies but even required data files and such via Dockerfile.

Feel free to upvote/downvote the issue to indicate whether you think this is a useful feature or not. I also welcome additional questions/comments/discussion on the issue.

@amit1rrr amit1rrr self-assigned this Oct 23, 2018

@gnestor


gnestor commented Oct 29, 2018

The binder project was created to solve this very problem: https://github.com/jupyterhub/binderhub

@amit1rrr


Contributor

amit1rrr commented Oct 30, 2018

I'm aware of BinderHub. It's a fantastic project. I'm approaching the reproducibility problem a bit differently. Here are a few important differences for the end user:

  • Users can bring their own hardware. E.g. Launch this repo on my GPU
  • Support for easily pulling and running images locally. For lightweight notebooks that don't require the computational resources of the cloud, why spend $$?
  • Much easier for authors to set up their repo for interactivity. We scan the repo and auto-generate a Dockerfile (~80% there, see pipreqs etc.). Repo authors can review this Dockerfile, make a few changes if needed, and commit it to the repo. This is because:
    • It's a bit harder for a non-technical audience to generate a Dockerfile, so we automate a large part of that.
    • A Dockerfile is a cleaner, more powerful way to specify the environment than Repo2Docker.
  • Support for private repos. I'm not sure if private repos are supported in Binder without spinning up your own BinderHub.
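The auto-generation step mentioned above could start from something like the following sketch. The function name and scope are hypothetical; a real tool such as pipreqs also maps import names to PyPI package names and pins versions, which this deliberately skips:

```python
import ast
import json
from pathlib import Path

def imported_packages(repo_dir):
    """Collect top-level package names imported by the notebooks in a repo.

    Hypothetical first step of Dockerfile auto-generation: the resulting
    list would seed a `pip install` line for the author to review.
    """
    packages = set()
    for nb_path in Path(repo_dir).rglob("*.ipynb"):
        nb = json.loads(nb_path.read_text(encoding="utf-8"))
        for cell in nb.get("cells", []):
            if cell.get("cell_type") != "code":
                continue
            source = "".join(cell.get("source", []))
            try:
                tree = ast.parse(source)
            except SyntaxError:  # cells with IPython magics etc. won't parse
                continue
            for node in ast.walk(tree):
                if isinstance(node, ast.Import):
                    packages.update(a.name.split(".")[0] for a in node.names)
                elif isinstance(node, ast.ImportFrom) and node.module:
                    packages.add(node.module.split(".")[0])
    return sorted(packages)
```

A reviewable Dockerfile generated from this output keeps the author in the loop, which matters since static import scanning is only ever approximate.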
@gnestor


gnestor commented Oct 30, 2018

OK, yes, different use case. Mybinder.org (the hosted instance of BinderHub) doesn't support private repos, but a self-hosted instance could.

@NawfalTachfine


NawfalTachfine commented Nov 5, 2018

I highly recommend using pipenv for this. It's lighter than spawning entire containers for everything.

@amit1rrr


Contributor

amit1rrr commented Nov 7, 2018

I have been thinking about Docker vs. pipenv to power this feature, and pipenv has some limitations/oddities:

  • Limits us to working only with Python Jupyter Notebooks. Jupyter itself supports a lot of kernels, and it would be good for us to be language-agnostic & allow 1-click launch for any Jupyter Notebook.
  • If running anything requires specific low-level libraries or an OS, that information can't be captured in pipenv (but is very easily codified in a Dockerfile).
  • Pipenv is good for managing dependencies locally, but I haven't seen users committing their Pipfile or Pipfile.lock to the repo. In contrast, committing a Dockerfile to the repo is a very natural workflow for many.
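To illustrate the second point, here is a hedged sketch of OS-level setup that a Dockerfile can express but a Pipfile cannot. The base image and the specific system libraries are hypothetical examples, not requirements of the proposal:

```dockerfile
# Illustrative only: OS-level dependencies live outside pipenv's scope.
FROM python:3.7-slim

# System libraries (e.g. for geospatial or image-processing notebooks)
RUN apt-get update && apt-get install -y --no-install-recommends \
        libgdal-dev libjpeg-dev \
    && rm -rf /var/lib/apt/lists/*

RUN pip install --no-cache-dir jupyter
```

Nothing in a Pipfile can declare the `apt-get` line above; pinning Python packages alone leaves the environment underspecified.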

> It's lighter than spawning entire containers for everything.

I agree with the sentiment that Docker containers/images might feel heavyweight for the use case. That's why we should abstract away all the 'heaviness' from the user. They only need to care about having the Dockerfile in the repo (see my earlier comment about making Dockerfile generation semi-automated as well). The user doesn't need to know how/when we create images, how containers are spawned/destroyed, etc. Binder is pretty good at abstracting all that away from users and we should aspire to do the same.
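Behind that abstraction, the launcher service would just be assembling ordinary Docker invocations. A minimal sketch of what it might execute on the user's behalf (the image name, port, and token handling are assumptions, not ReviewNb's actual design):

```python
import shlex

def jupyter_run_command(image, host_port=8888, token="dev-token"):
    """Build the `docker run` command a launcher service might execute.

    Hypothetical sketch: runs the repo's image detached, auto-removed on
    exit, with Jupyter's default port mapped to the host.
    """
    args = [
        "docker", "run", "--rm", "-d",
        "-p", f"{host_port}:8888",           # expose Jupyter's default port
        image,
        "jupyter", "notebook",
        "--ip=0.0.0.0", f"--NotebookApp.token={token}",
    ]
    return shlex.join(args)
```

The user never sees this; they click the badge and get a URL, exactly as Binder hides its own image builds and pod scheduling.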

@NawfalTachfine


NawfalTachfine commented Nov 7, 2018

I think I was too quick to pull the trigger. I mainly meant to bring pipenv to your attention. In my own workflow, I actually use both. As you said, pipenv has its limitations, and it's more practical to have a base image and be able to tinker with Python libs as you experiment without rebuilding your image so often.

Custom kernels are a neat feature but I doubt that they are used much today, save perhaps for R. Have you seen adoption rise for more exotic kernels?

@amit1rrr


Contributor

amit1rrr commented Nov 13, 2018

> Custom kernels are a neat feature but I doubt that they are used much today, save perhaps for R. Have you seen adoption rise for more exotic kernels?

Not that much really. Mainly Python & R.
