# pandas DataFrame Validation with Bulwark

[Bulwark](https://bulwark.readthedocs.io/en/stable/index.html) is a package for property-based testing of pandas DataFrames. The project was heavily influenced by the [Engarde](https://github.com/engarde-dev/engarde) library, which is no longer maintained.

## 1. Installation

``` console
$ pipenv install bulwark
Installing bulwark…
Adding bulwark to Pipfile's [packages]…
✔ Installation Succeeded
Locking [dev-packages] dependencies…
✔ Success!
Updated Pipfile.lock (0d075a)!
```

## 2. Use

### 2.1 Checks

With the [bulwark.checks](https://bulwark.readthedocs.io/en/v0.4.2/bulwark.html#module-bulwark.checks) module you can check many common assumptions, e.g.

* `has_columns` checks whether the specified columns exist in the DataFrame and, if required, whether they appear in the correct order
* `has_dtypes` checks the data types of columns
* `has_no_infs` checks if there are no [numpy.inf](https://numpy.org/doc/stable/reference/constants.html#numpy.inf) in the DataFrame
* `has_no_nans` checks if there are no [numpy.nan](https://numpy.org/doc/stable/reference/constants.html#numpy.nan) in the DataFrame
* `has_set_within_vals` checks if the values specified in a dict are a subset of the associated column
* `has_unique_index` checks if the index is unique
* `is_monotonic` checks whether values of a column are ascending or descending
* `one_to_many` checks whether there is an n:1 relationship between two columns

Applying a check is then straightforward. For example, to assert that a DataFrame contains no `numpy.nan`, pass the check to `pipe`:

```python
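import bulwark.checks as ck

# df is an existing pandas DataFrame. Pass the check function itself to pipe
# (no call parentheses); it raises an AssertionError if df contains NaN.
df.pipe(ck.has_no_nans)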
```
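To make the `pipe` pattern concrete, the following is a minimal sketch of what a pipe-compatible check looks like, written in plain pandas. It is not Bulwark's implementation; `assert_no_nans` is a hypothetical stand-in, and the only assumption is the convention that a check either raises an `AssertionError` or returns the DataFrame unchanged.

```python
import numpy as np
import pandas as pd


def assert_no_nans(df, columns=None):
    """Hypothetical stand-in for a check: raise on NaN, else return df."""
    subset = df if columns is None else df[columns]
    if subset.isna().any().any():
        raise AssertionError("DataFrame contains NaN values.")
    return df  # returning df is what makes the check chainable with pipe


clean = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})
dirty = pd.DataFrame({"a": [1, np.nan, 3], "b": [4.0, 5.0, 6.0]})

# A passing check hands the same DataFrame on to the next step.
result = clean.pipe(assert_no_nans).pipe(assert_no_nans, columns=["b"])

# A failing check stops the chain with an AssertionError.
try:
    dirty.pipe(assert_no_nans)
    failed = False
except AssertionError:
    failed = True
```

Because each check returns its input, checks can be placed between ordinary transformation steps in a method chain without changing the data that flows through it.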

### 2.2 Decorators

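For each check, Bulwark also provides a matching [decorator](https://bulwark.readthedocs.io/en/v0.4.2/bulwark.html#module-bulwark.decorators) in the `bulwark.decorators` module (imported here as `dc`), e.g. `@dc.IsShape((-1, 10))` or `@dc.IsMonotonic(strict=True)`. The decorator runs its check on the DataFrame returned by the decorated function.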
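The mechanism behind such decorators can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not Bulwark's source: `check_result` and `assert_n_columns` are hypothetical names, and the sketch assumes the check is applied to the function's return value.

```python
import functools

import pandas as pd


def check_result(check_func, *check_args, **check_kwargs):
    """Hypothetical decorator factory: run check_func on the returned DataFrame."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            df = func(*args, **kwargs)
            check_func(df, *check_args, **check_kwargs)  # raises on failure
            return df
        return wrapper
    return decorate


def assert_n_columns(df, n):
    """Hypothetical check: the DataFrame must have exactly n columns."""
    if df.shape[1] != n:
        raise AssertionError(f"expected {n} columns, got {df.shape[1]}")
    return df


@check_result(assert_n_columns, 2)
def drop_missing(df):
    # Drop every row that contains a missing value.
    return df.dropna()


out = drop_missing(pd.DataFrame({"a": [1.0, None], "b": [3.0, 4.0]}))
```

Attaching the check to the function rather than to a single call means every caller gets the validation, which suits pipeline steps that are reused in several places.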

### `CustomCheck`

You can also write your own check functions and apply them with `CustomCheck`, for example:

```python
import bulwark.checks as ck
import bulwark.decorators as dc
import numpy as np
import pandas as pd


def len_longer_than(df, l):
    if len(df) <= l:
        raise AssertionError("df is not as long as expected.")
    return df


@dc.CustomCheck(len_longer_than, 10)
def append_a_df(df, df2):
    return pd.concat([df, df2], ignore_index=True)


df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df2 = pd.DataFrame({"a": [1, np.nan, 3, 4], "b": [4, 5, 6, 7]})

append_a_df(df, df2)
```

``` console
AssertionError: len_longer_than is not true.
```

### `MultiCheck`

With `MultiCheck` you can run several checks in a single pass and see all the failures at once, for example:

```python
@dc.MultiCheck(
    checks={
        ck.has_no_nans: {"columns": None},
        len_longer_than: {"l": 6}
    },
    warn=False,
)
def append_a_df(df, df2):
    return pd.concat([df, df2], ignore_index=True)


df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df2 = pd.DataFrame({"a": [1, np.nan, 3, 4], "b": [4, 5, 6, 7]})

append_a_df(df, df2)
```

``` console
AssertionError: (4, 'a')
```
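The idea of collecting every failure before reporting can be sketched without Bulwark. The helper below is a hypothetical illustration, not the library's code: `run_checks`, `assert_no_nans`, and `assert_longer_than` are made-up names, and the sketch assumes the same `{check: keyword arguments}` mapping used for `checks` above.

```python
import numpy as np
import pandas as pd


def assert_no_nans(df):
    """Hypothetical check: fail if any cell is NaN."""
    if df.isna().any().any():
        raise AssertionError("DataFrame contains NaN values.")
    return df


def assert_longer_than(df, l):
    """Hypothetical check: fail unless df has more than l rows."""
    if len(df) <= l:
        raise AssertionError(f"expected more than {l} rows, got {len(df)}")
    return df


def run_checks(df, checks):
    """Hypothetical helper: run every check, then report all failures together."""
    errors = []
    for check_func, params in checks.items():
        try:
            check_func(df, **params)
        except AssertionError as exc:
            # Remember the failure and keep going instead of stopping here.
            errors.append(f"{check_func.__name__}: {exc}")
    if errors:
        raise AssertionError("\n".join(errors))
    return df


df = pd.DataFrame({"a": [1, np.nan, 3], "b": [4, 5, 6]})

try:
    run_checks(df, {assert_no_nans: {}, assert_longer_than: {"l": 10}})
    message = ""
except AssertionError as exc:
    message = str(exc)
```

Here both checks fail, so `message` holds one line per failed check. Reporting them together avoids the fix-one-rerun-find-the-next cycle that arises when the first failing check aborts the run.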