CI Pipeline + test framework for Jupyter Notebooks #19

Open
amit1rrr opened this issue Mar 29, 2019 · 4 comments
Labels
Feature Request A new feature that's under consideration.

Comments

@amit1rrr
Member

amit1rrr commented Mar 29, 2019

Problem

Jupyter Notebooks have neither a dedicated test framework nor a CI pipeline, which results in:

  • Almost no tests written for Notebooks
  • Lack of confidence that a Notebook will run as expected on a fresh kernel
  • Inability to use Notebooks in production workflows

Users tend to jump around Notebook cells while developing, which leaves the kernel in an unpredictable state. Because Notebooks are so prone to this muddled state, testing and continuous integration matter far more for them.

Solution

Here is the bare minimum we need:

  • Every time a Notebook change is committed, run all cells from top to bottom and make sure no cell raises an error
  • The ability to write and execute unit tests for Notebook code
  • Given a set of inputs, the ability to specify and assert that the actual outputs of certain Notebook cells match the expected outputs
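To make the first and third points concrete, here is a minimal sketch, not the eventual framework. It assumes the notebook is available as `.ipynb` JSON text and runs each code cell with plain `exec` in one fresh namespace instead of a real Jupyter kernel, so IPython magics and rich outputs are out of scope. The names `run_cells_top_to_bottom` and `make_notebook` are hypothetical helpers invented for this illustration.

```python
import json


def run_cells_top_to_bottom(notebook_json):
    """Execute every code cell in order inside one fresh namespace.

    Returns (namespace, error). error is None on success, otherwise an
    (index, message) tuple describing the first cell that failed.
    Assumption: plain exec() stands in for a real kernel here.
    """
    notebook = json.loads(notebook_json)
    namespace = {}  # fresh state: nothing left over from an interactive session
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        if isinstance(source, list):  # .ipynb may store source as a list of lines
            source = "".join(source)
        try:
            exec(compile(source, f"<cell {index}>", "exec"), namespace)
        except Exception as exc:
            return namespace, (index, f"{type(exc).__name__}: {exc}")
    return namespace, None


def make_notebook(*sources):
    """Hypothetical helper: build a tiny notebook document from code strings."""
    cells = [{"cell_type": "code", "source": s} for s in sources]
    return json.dumps({"cells": cells})


# Cells in a valid order: runs cleanly, and a cell result can be asserted.
good = make_notebook("rate = 0.5", "total = 200 * rate")
good_namespace, good_error = run_cells_top_to_bottom(good)

# Cells that only worked because of out-of-order execution: fails on a fresh run.
bad = make_notebook("total = 200 * rate", "rate = 0.5")
bad_namespace, bad_error = run_cells_top_to_bottom(bad)
```

In this sketch `good_error` is `None` and `good_namespace["total"]` can be compared against an expected value, while `bad_error` points at the first cell, which is exactly the hidden-state failure a fresh-kernel run is meant to catch.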

All of the above is possible today by combining separate tools such as doctest, unittest, and papermill. We need to bring them together, fill the gaps, and build an open source framework dedicated to testing Jupyter Notebooks.
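As one illustration of the doctest route, the sketch below runs the docstring examples of functions defined in a single cell. It uses only the standard library `doctest` module; `CELL_SOURCE`, `normalize`, and `run_cell_doctests` are made-up names for this example, and executing the cell with `exec` is again a simplifying assumption in place of a real kernel.

```python
import doctest

# Hypothetical cell contents: a function whose docstring carries an example.
CELL_SOURCE = '''
def normalize(values):
    """Scale values so they sum to 1.

    >>> normalize([1, 1, 2])
    [0.25, 0.25, 0.5]
    """
    total = sum(values)
    return [v / total for v in values]
'''


def run_cell_doctests(cell_source, name="notebook_cell"):
    """Execute a cell's source, then run the docstring examples it defines."""
    namespace = {}
    exec(cell_source, namespace)
    parser = doctest.DocTestParser()
    runner = doctest.DocTestRunner(verbose=False)
    for obj_name, obj in list(namespace.items()):
        doc = getattr(obj, "__doc__", None)
        if obj_name.startswith("__") or not callable(obj) or not doc:
            continue  # only look at documented callables defined by the cell
        test = parser.get_doctest(doc, namespace, f"{name}.{obj_name}", None, 0)
        runner.run(test)
    # TestResults exposes .failed and .attempted counts.
    return runner.summarize(verbose=False)


results = run_cell_doctests(CELL_SOURCE)
```

A CI job could fail the build whenever `results.failed` is non-zero. A dedicated framework would still need to discover cells, manage kernels, and report results per notebook, which is the gap described above.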

As a last step, this testing framework can be integrated into the ReviewNB interface to run tests after each pushed commit, for users who choose to enable a CI pipeline on their repos.


Feel free to upvote or downvote this issue to indicate whether you think this is a useful feature. I also welcome questions, comments, and discussion.

@amit1rrr amit1rrr added the Feature Request A new feature that's under consideration. label Apr 3, 2019
@amit1rrr
Member Author

amit1rrr commented Apr 5, 2019

The test framework is now available here: https://github.com/ReviewNB/treon

@nickponvert

Hi, any estimate for when this will be part of ReviewNB? I am trying to decide whether to set up a different CI system to run treon or wait for the integration. Thanks! Awesome work!

@amit1rrr
Member Author

@nickponvert I would recommend setting up your CI to use treon for now, since realistically I'm a few months away from delivering a CI pipeline in ReviewNB. There are still some open questions around delivering this feature:

  1. How should we enable CI for all repositories (including free open source ones) when the compute cost would be significant?
  2. How do we detect and prevent abuse of our CI resources (e.g. someone mining bitcoin in their notebooks)?

I've considered making it a premium-only feature or BYOC (Bring Your Own Compute), but I'm not particularly happy with either. One viable path is to restrict free open source repositories to one build at a time, with a timeout and a cool-off period.
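A rough sketch of what that restriction could look like, purely as an illustration and not ReviewNB's implementation: `BuildGate`, its method names, and the default timeout and cool-off values are all hypothetical. Time is passed in explicitly as seconds so the policy is easy to test.

```python
class BuildGate:
    """Admit at most one running build per repository, with a timeout
    on each build and a cool-off period before the next one may start.

    All names and default values here are illustrative assumptions.
    """

    def __init__(self, timeout_s=600.0, cool_off_s=300.0):
        self.timeout_s = timeout_s
        self.cool_off_s = cool_off_s
        self._started = {}   # repo -> start time of the running build
        self._finished = {}  # repo -> end time of the most recent build

    def try_start(self, repo, now):
        """Return True and record the build if the repo may build now."""
        started = self._started.get(repo)
        if started is not None:
            if now - started < self.timeout_s:
                return False  # a build is still running
            # The running build exceeded the timeout: treat it as killed
            # at the moment the timeout expired.
            self.finish(repo, started + self.timeout_s)
        finished = self._finished.get(repo)
        if finished is not None and now - finished < self.cool_off_s:
            return False  # still inside the cool-off period
        self._started[repo] = now
        return True

    def finish(self, repo, now):
        """Mark the repo's running build, if any, as finished at `now`."""
        if self._started.pop(repo, None) is not None:
            self._finished[repo] = now


gate = BuildGate(timeout_s=10, cool_off_s=5)
first = gate.try_start("org/repo", now=0)    # admitted
second = gate.try_start("org/repo", now=1)   # rejected, build in progress
gate.finish("org/repo", now=2)
third = gate.try_start("org/repo", now=3)    # rejected, cooling off
fourth = gate.try_start("org/repo", now=7)   # admitted, cool-off elapsed
```

A policy like this caps the compute a single free repository can consume, but it does not by itself detect abusive workloads inside a build.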

If you or anyone from the community has input on this, I'm all ears!

@amit1rrr
Member Author

Saturncloud is running into the exact same problem here: https://discourse.jupyter.org/t/bitcoin-mining-abuse-security/1367

There are some solid best-practice recommendations in that thread.
