Integrate papermill for automated testing of Jupyter Notebooks #70

Open
vedran-kasalica opened this issue Mar 8, 2024 · 0 comments
Labels
enhancement New feature or request


@vedran-kasalica
Member

vedran-kasalica commented Mar 8, 2024

Jupyter Notebook Testing Overview

In data science and analysis, Jupyter Notebooks are a very handy tool because they support interactive computing and data visualization. But as projects scale and notebooks become part of complex workflows, they also need to be integrated into automated processes.

Here are four tools that enable testing of Jupyter Notebooks and integration with CI/CD pipelines: nbconvert, nbval, papermill, and pytest-notebook.

nbconvert

nbconvert converts Jupyter Notebooks to many formats, including Python scripts, HTML, PDF, and Markdown. This is useful for sharing analyses in different formats or for bringing notebooks into different stages of a development pipeline.

Key Features:

  • Converts notebooks to a wide range of formats, including Python scripts that can then be tested.
  • Can be used from the command line or programmatically.
  • Supports custom templates for conversions.
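A minimal sketch of the "convert, then test" route, assuming `nbformat` and `nbconvert` are installed: nbconvert's `PythonExporter` turns a notebook into a plain Python script, which an ordinary test can then execute. The helper names (`notebook_to_script`, `run_as_script`) and the example notebook are hypothetical. On the command line, `jupyter nbconvert --to script analysis.ipynb` performs the same conversion.

```python
import nbformat
from nbconvert import PythonExporter


def notebook_to_script(nb: nbformat.NotebookNode) -> str:
    """Convert an in-memory notebook to Python source using nbconvert."""
    source, _resources = PythonExporter().from_notebook_node(nb)
    return source


def run_as_script(nb: nbformat.NotebookNode) -> dict:
    """Convert the notebook, execute the resulting script, and return its globals.

    Any exception raised by a cell propagates, so a calling test fails.
    IPython magics are converted to get_ipython() calls, which do not work
    in plain Python, so this route suits notebooks without magics.
    """
    namespace: dict = {"__name__": "__converted_notebook__"}
    exec(compile(notebook_to_script(nb), "<notebook>", "exec"), namespace)
    return namespace


# Hypothetical two-cell notebook built in memory for illustration;
# a file on disk would be loaded with nbformat.read(path, as_version=4).
example_nb = nbformat.v4.new_notebook()
example_nb.cells = [
    nbformat.v4.new_code_cell("values = [1, 2, 3]"),
    nbformat.v4.new_code_cell("total = sum(values)"),
]
```

A test could then assert on the script's final state, for example that `run_as_script(example_nb)["total"]` equals 6.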

nbval

nbval is a pytest plugin that runs notebooks as tests within the full pytest framework. This makes it possible to check the reproducibility of an analysis and to include notebooks in the continuous integration loop.

Key Features:

  • Runs notebooks as tests, checking for errors.
  • Supports "pass" or "fail" outcomes based on the execution result.
  • Integrates seamlessly with existing pytest workflows.
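nbval is driven through pytest command-line options: `--nbval` re-runs each notebook and compares the fresh outputs with those saved in the file, while `--nbval-lax` mainly checks that the cells run without errors. A small sketch of invoking it from Python, assuming pytest and nbval are installed; the helper names and notebook paths are hypothetical. In CI, the equivalent is simply `pytest --nbval-lax notebooks/`.

```python
import pytest


def nbval_args(paths, lax=True):
    """Build the pytest arguments that run the given notebooks through nbval.

    lax=True  -> --nbval-lax: mainly check that cells execute without errors.
    lax=False -> --nbval: also compare outputs with those stored in the notebook.
    """
    flag = "--nbval-lax" if lax else "--nbval"
    return [flag, *map(str, paths)]


def run_nbval(paths, lax=True) -> int:
    """Run nbval in-process; returns pytest's exit code (0 means every cell passed)."""
    return int(pytest.main(nbval_args(paths, lax=lax)))
```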

papermill

papermill executes Jupyter Notebooks with parameters: the same notebook can be run with different inputs, which makes it very useful for batch processing, automated reporting, and parameterized analysis.

Key Features:

  • Enables parameterized execution of notebooks.
  • Facilitates automation of notebook execution.
  • Supports logging and output analysis for executed notebooks.

pytest-notebook

pytest-notebook is a pytest plugin for testing notebooks in more sophisticated ways. For example, it lets tests compare the outputs produced by re-executing a notebook with the expected outputs stored in it. At the time of writing, the latest version is 0.10.

Key Features:

  • Facilitates testing of notebooks with more granular control.
  • Compares outputs, including binary outputs, to expected results.
  • Integrates with the pytest framework, supporting its ecosystem of plugins and features.
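With pytest-notebook, output comparison can be written as an ordinary pytest test that uses the plugin's `nb_regression` fixture, whose `check()` method executes a notebook and reports differences from the stored outputs. A sketch assuming pytest-notebook is installed; the `notebooks/` directory and the `find_notebooks` helper are hypothetical.

```python
from pathlib import Path

import pytest

NOTEBOOK_DIR = Path("notebooks")  # hypothetical location of the project's notebooks


def find_notebooks(root) -> list:
    """All .ipynb files under root, skipping Jupyter's autosave checkpoints."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.rglob("*.ipynb")
        if ".ipynb_checkpoints" not in p.parts
    )


@pytest.mark.parametrize("nb_path", find_notebooks(NOTEBOOK_DIR), ids=str)
def test_notebook_outputs_unchanged(nb_regression, nb_path):
    # nb_regression is the fixture provided by pytest-notebook: it executes
    # the notebook and fails if the new outputs differ from the saved ones.
    nb_regression.check(str(nb_path))
```

Parametrizing over the discovered files gives one test per notebook, so a CI report shows exactly which notebook's outputs changed.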

Proposal: Adopting Papermill for Enhanced Notebook Testing

After reviewing the capabilities of these tools, I propose adopting papermill for our workflows. Its core strengths are stability and flexibility: it automates the parameterized execution of Jupyter Notebooks, and because every run is saved as an executed notebook, the recorded outputs can then be verified systematically. This combination of flexible execution and verifiable outputs makes it the best fit for our needs, particularly for advanced data analysis.

@vedran-kasalica vedran-kasalica added the enhancement New feature or request label Mar 8, 2024