# 🧐 `scilint`

*infuse quality into notebook-based workflows with a new type of build tool*

---

`scilint` aims to **bring a style and quality standard into notebook-based Data Science workflows**. Defining a quality notebook is difficult and somewhat subjective. The obvious meaning is being free of bugs, but legibility and ease of comprehension matter too.

`scilint` breaks a notebook down into aspects that are potentially relevant to quality and provides what we believe are sensible defaults that may correlate with higher-quality workflows. We also let users define the quality line as they see fit by configuring the existing thresholds and adding new metrics. As use of the library grows, we anticipate being able to statistically relate some of these quality-relevant attributes to key delivery metrics such as "change failure rate" or "lead time to production".

# Standing on the shoulders of giants - *an nbdev library*

> `scilint` is written on top of the excellent `nbdev` library. This library is revolutionary as it truly optimises all the benefits of notebooks and compensates for some of their weaker points. For more information on `nbdev` see the [homepage](https://nbdev.fast.ai/) or the [GitHub repo](https://github.com/fastai/nbdev).

# Getting Started

[WIP] - (reviewers) this requires that the library is published to PyPI, which has not yet happened

`pip install scilint`

`scilint` has the following main features/commands:
    
1. `scilint_tidy`: run an in-place opinionated flavour of [nbQA](https://github.com/nbQA-dev/nbQA) to tidy up your notebooks
2. `scilint_lint`: inspect the notebooks for potential quality correlates and report on the findings
3. `scilint_build`: the build command for notebooks: ensuring they all run, pass their tests and meet a consistent style/quality standard

## `scilint_tidy`

To get a consistent style across your notebooks you can run `scilint_tidy`; this currently runs `autoflake`, `black` and `isort` in-place across all of your notebooks.
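
As a rough illustration of what such a tidy pass involves, the sketch below builds the equivalent nbQA command lines for a notebook and runs them in the order listed above. The helper names `tidy_commands` and `tidy` are hypothetical, and the `autoflake` flags are an assumption; the options `scilint_tidy` actually passes may differ.

```python
import subprocess
from pathlib import Path


def tidy_commands(notebook):
    """Return the nbQA commands for one notebook (hypothetical helper)."""
    nb = str(notebook)
    return [
        # Assumed flags: rewrite the notebook in place and drop unused imports.
        ["nbqa", "autoflake", nb, "--in-place", "--remove-all-unused-imports"],
        ["nbqa", "black", nb],
        ["nbqa", "isort", nb],
    ]


def tidy(root="."):
    """Run every command against every notebook found under `root`."""
    for notebook in sorted(Path(root).rglob("*.ipynb")):
        for command in tidy_commands(notebook):
            # check=True stops at the first tool that exits with an error.
            subprocess.run(command, check=True)
```

Calling `tidy(".")` requires `nbqa`, `autoflake`, `black` and `isort` to be installed and modifies the notebooks on disk, so it is best tried on a clean git working tree.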

## `scilint_lint`

`scilint_lint` exposes potential quality issues within your notebooks using a set of pre-defined checks. Each check has a default threshold, which allows a build to be marked as passed or failed.

*[WIP] configuration of metrics thresholds and ability to disable individual checks will be coming soon.*

### What does this look like?

Image of a report in CSV & report output..

### Wait.. what is a test in a Notebook?

We view every `assert` statement as a test.
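
To make that definition concrete, here is a minimal sketch of counting `assert` statements in a code cell with Python's standard `ast` module. The function name `count_asserts` is hypothetical and this is not necessarily how `scilint` counts tests internally.

```python
import ast


def count_asserts(source: str) -> int:
    """Count every `assert` statement in a piece of Python source."""
    tree = ast.parse(source)
    # ast.walk visits nested nodes too, so asserts inside functions are included.
    return sum(isinstance(node, ast.Assert) for node in ast.walk(tree))


cell = """
def add(a, b):
    return a + b

assert add(1, 2) == 3
assert add(-1, 1) == 0
"""

print(count_asserts(cell))  # prints 2
```

Cells that contain IPython magics (for example `%matplotlib inline`) are not valid Python and would need those lines stripped before parsing.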

### Quality Metrics

* **Calls-Per-Function {Median, Mean}:** tbc
* **In-Function-Percent:** tbc
* **Asserts-Per-Function:** tbc
* **InlineAssertsPerFunction {Median, Mean}:** tbc
* **MarkdownToCodeRatio:** tbc
* **TotalCodeLen:** tbc

### Fail Threshold

> For now a very basic failure threshold is set by providing a number of warnings that will be accepted without failing the build. The default is 1 but this can be increased via the `fail-over` parameter. As the library matures we will revisit adding more nuanced options.
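
The rule described above can be sketched in a few lines. This assumes that a build passes while the number of warnings is at most the `fail-over` value and fails once it is exceeded; the function name `build_passes` is hypothetical and the exact comparison `scilint` uses is not confirmed here.

```python
def build_passes(num_warnings: int, fail_over: int = 1) -> bool:
    """Return True when the warning count is within the accepted limit.

    Assumption: warnings up to and including `fail_over` are accepted.
    """
    return num_warnings <= fail_over


print(build_passes(1))               # prints True
print(build_passes(2))               # prints False
print(build_passes(3, fail_over=5))  # prints True
```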

# Contributing

After you clone this repository, please run `nbdev_install_hooks` in your terminal. This sets up git hooks that clean the notebooks by removing extraneous metadata (e.g. which cells you ran) that would otherwise cause unnecessary merge conflicts.

To run the tests in parallel, launch `nbdev_test`.

Before submitting a PR, check that the local library and notebooks match.

If you made a change to the notebooks in one of the exported cells, you can export it to the library with `nbdev_prepare`.
If you made a change to the library, you can export it back to the notebooks with `nbdev_update`.