TARDIS generates a lot of numerical data in the form of NumPy arrays and pandas DataFrames. These objects are large and need to be saved in external files. The current TARDIS testing framework has grown organically, and not all parts of the code save and read data in the same way. The reference data is stored in the [tardis-refdata](https://github.com/tardis-sn/tardis-refdata) repository.

To make the code structure consistent everywhere, we started to use [pytest-arraydiff](https://github.com/astropy/pytest-arraydiff). The plugin works by saving the objects that are returned by test functions. However, a recent pytest release announced that returning objects from test functions will raise an error in the future.

Therefore we decided to move on from pytest-arraydiff and look into a new library. This notebook looks at two alternatives, starting with Syrupy. My personal opinions are listed near the end.
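
To make the problem concrete, here is a minimal sketch of the two test styles. `compute_spectrum` is a hypothetical stand-in for a TARDIS calculation, and the expected values are written out by hand purely for illustration.

```python
import numpy as np


def compute_spectrum():
    # Hypothetical stand-in for an expensive TARDIS calculation.
    return np.linspace(0.0, 1.0, 5) ** 2


def test_spectrum_return_style():
    # Return style: the plugin is expected to save or compare the returned
    # array. pytest itself objects to tests that return a value.
    return compute_spectrum()


def test_spectrum_assert_style():
    # Assert style: the comparison happens inside the test, nothing is returned.
    expected = np.array([0.0, 0.0625, 0.25, 0.5625, 1.0])
    np.testing.assert_allclose(compute_spectrum(), expected)
```

Both libraries examined below follow the second style: the comparison against stored data happens inside the test body.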


### Install the required dependencies with this command

In [1]:
# !pip install syrupy pytest-regressions ipytest

In [1]:
import pytest

In [2]:
import ipytest
ipytest.autoconfig()

In [3]:
!rm -rf ./t* __snapshots__/

### The current directory should be empty (apart from the notebook).

In [4]:
!tree -L 3

.
└── comp.ipynb

0 directories, 1 file


### The following test should fail, saying that no snapshots are present.

In [5]:
%%ipytest -vv 
pytest_plugins = "syrupy"

from typing import Any

import numpy as np
import pandas as pd
import pytest
from syrupy.data import SnapshotCollection
from syrupy.extensions.single_file import SingleFileSnapshotExtension
from syrupy.types import SerializableData


class NumpySnapshotExtenstion(SingleFileSnapshotExtension):
    _file_extension = "dat"

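    def matches(self, *, serialized_data, snapshot_data):
        # assert_allclose returns None on success and raises on a mismatch,
        # so the outcome is decided by whether an exception is raised.
        try:
            np.testing.assert_allclose(
                np.array(snapshot_data), np.array(serialized_data)
            )
        except Exception:
            return False
        return True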

    def _read_snapshot_data_from_location(
        self, *, snapshot_location: str, snapshot_name: str, session_id: str
    ):
        # see https://github.com/tophat/syrupy/blob/f4bc8453466af2cfa75cdda1d50d67bc8c4396c3/src/syrupy/extensions/base.py#L139
        try:
            return np.loadtxt(snapshot_location).tolist()
        except OSError:
            return None

    @classmethod
    def _write_snapshot_collection(
        cls, *, snapshot_collection: SnapshotCollection
    ) -> None:
        # see https://github.com/tophat/syrupy/blob/f4bc8453466af2cfa75cdda1d50d67bc8c4396c3/src/syrupy/extensions/base.py#L161
        
        filepath, data = (
            snapshot_collection.location,
            next(iter(snapshot_collection)).data,
        )
        np.savetxt(filepath, data)

    def serialize(self, data: SerializableData, **kwargs: Any) -> Any:
        # The array is passed through unchanged; np.savetxt does the writing.
        return data


class PandasSnapshotExtenstion(SingleFileSnapshotExtension):
    _file_extension = "hdf"

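    def matches(self, *, serialized_data, snapshot_data):
        # assert_frame_equal returns None on success and raises on a mismatch,
        # so the outcome is decided by whether an exception is raised.
        try:
            pd.testing.assert_frame_equal(serialized_data, snapshot_data)
        except Exception:
            return False
        return True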

    def _read_snapshot_data_from_location(
        self, *, snapshot_location: str, snapshot_name: str, session_id: str
    ):
        # see https://github.com/tophat/syrupy/blob/f4bc8453466af2cfa75cdda1d50d67bc8c4396c3/src/syrupy/extensions/base.py#L139
        try:
            return pd.read_hdf(snapshot_location)
        except OSError:
            return None

    @classmethod
    def _write_snapshot_collection(
        cls, *, snapshot_collection: SnapshotCollection
    ) -> None:
        # see https://github.com/tophat/syrupy/blob/f4bc8453466af2cfa75cdda1d50d67bc8c4396c3/src/syrupy/extensions/base.py#L161
        filepath, data = (
            snapshot_collection.location,
            next(iter(snapshot_collection)).data,
        )
        # The key name is arbitrary; pd.read_hdf can omit it because the file holds a single object.
        data.to_hdf(filepath, key="/blah")

    def serialize(self, data: SerializableData, **kwargs: Any) -> Any:
        # The DataFrame is passed through unchanged; to_hdf does the writing.
        return data


@pytest.fixture
def snapshot_pandas(snapshot):
    return snapshot.use_extension(PandasSnapshotExtenstion)

@pytest.fixture
def snapshot_numpy(snapshot):
    return snapshot.use_extension(NumpySnapshotExtenstion)


def test_pd(snapshot_pandas):
    data = [30,40,60]
    assert snapshot_pandas == pd.DataFrame(data, columns=['Numbers'])

@pytest.mark.parametrize('no', [1,2])
def test_pd3(snapshot_pandas, no):
    data = [30,40,60, no]
    assert snapshot_pandas == pd.DataFrame(data, columns=['Numbers'])

def test_np(snapshot_numpy):
    # assert snapshot_numpy == np.array([1, 3]).tolist()
    # assert snapshot_numpy == np.array([1, 2.5]).tolist()
    assert snapshot_numpy == np.array([1, 3])
    assert snapshot_numpy == np.array([1, 2.5])


platform linux -- Python 3.8.17, pytest-7.4.0, pluggy-1.2.0 -- /home/atharva/miniconda3/envs/tardis-devel/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.8.17', 'Platform': 'Linux-5.15.0-57-generic-x86_64-with-glibc2.10', 'Packages': {'pytest': '7.4.0', 'pluggy': '1.2.0'}, 'Plugins': {'datadir': '1.4.1', 'anyio': '3.7.1', 'cov': '4.1.0', 'arraydiff': '0.6.0a1', 'metadata': '3.0.0', 'html': '3.2.0', 'syrupy': '4.0.8', 'regressions': '2.4.2', 'doctestplus': '1.0.0'}}
rootdir: /home/atharva/workspace/code/tardis-main/syrupy-reg-comp
plugins: datadir-1.4.1, anyio-3.7.1, cov-4.1.0, arraydiff-0.6.0a1, metadata-3.0.0, html-3.2.0, syrupy-4.0.8, regressions-2.4.2, doctestplus-1.0.0
collecting ... collected 4 items

t_716e1676229c4afba3797ca226bf4bee.py::test_pd FAILED                                        [ 25%]
t_716e1676229c4afba3797ca226bf4bee.py::test_pd3[1] FAILED                                    [ 50%]
t_716e1676229c4afba3797ca226b

### That was expected. Let's update the snapshots.

In [6]:
%%ipytest -vv --snapshot-update

def test_pd(snapshot_pandas):
    data = [30,40,60]
    assert snapshot_pandas == pd.DataFrame(data, columns=['Numbers'])

@pytest.mark.parametrize('no', [1,2])
def test_pd3(snapshot_pandas, no):
    data = [30,40,60, no]
    assert snapshot_pandas == pd.DataFrame(data, columns=['Numbers'])

def test_np(snapshot_numpy):
    assert snapshot_numpy == np.array([1, 3]).tolist()
    assert snapshot_numpy == np.array([1, 2.5]).tolist()


platform linux -- Python 3.8.17, pytest-7.4.0, pluggy-1.2.0 -- /home/atharva/miniconda3/envs/tardis-devel/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.8.17', 'Platform': 'Linux-5.15.0-57-generic-x86_64-with-glibc2.10', 'Packages': {'pytest': '7.4.0', 'pluggy': '1.2.0'}, 'Plugins': {'datadir': '1.4.1', 'anyio': '3.7.1', 'cov': '4.1.0', 'arraydiff': '0.6.0a1', 'metadata': '3.0.0', 'html': '3.2.0', 'syrupy': '4.0.8', 'regressions': '2.4.2', 'doctestplus': '1.0.0'}}
rootdir: /home/atharva/workspace/code/tardis-main/syrupy-reg-comp
plugins: datadir-1.4.1, anyio-3.7.1, cov-4.1.0, arraydiff-0.6.0a1, metadata-3.0.0, html-3.2.0, syrupy-4.0.8, regressions-2.4.2, doctestplus-1.0.0
collecting ... collected 4 items

t_716e1676229c4afba3797ca226bf4bee.py::test_pd PASSED                                        [ 25%]
t_716e1676229c4afba3797ca226bf4bee.py::test_pd3[1] PASSED                                    [ 50%]
t_716e1676229c4afba3797ca226b

### There should be five snapshots.

In [7]:
!tree -L 3

.
├── comp.ipynb
└── __snapshots__
    └── t_716e1676229c4afba3797ca226bf4bee
        ├── test_np.1.dat
        ├── test_np.dat
        ├── test_pd3[1].hdf
        ├── test_pd3[2].hdf
        └── test_pd.hdf

2 directories, 6 files
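
The two `.dat` files correspond to the two assertions in `test_np`; they are written with `np.savetxt` and read back with `np.loadtxt(...).tolist()`. The sketch below shows that round trip in isolation, using a temporary directory and a hypothetical file name. Note that integers come back as floats, which is one reason a tolerance-based comparison is a reasonable fit here.

```python
import os
import tempfile

import numpy as np


def roundtrip(values):
    """Write values with np.savetxt and read them back with np.loadtxt."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.dat")  # hypothetical file name
        np.savetxt(path, values)
        return np.loadtxt(path).tolist()


restored_ints = roundtrip(np.array([1, 3]))
restored_mixed = roundtrip(np.array([1, 2.5]))
```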


In [8]:
%%ipytest -vv 

def test_pd(snapshot_pandas):
    data = [30,40,60]
    assert snapshot_pandas == pd.DataFrame(data, columns=['Numbers'])

@pytest.mark.parametrize('no', [1,2])
def test_pd3(snapshot_pandas, no):
    data = [30,40,60, no]
    assert snapshot_pandas == pd.DataFrame(data, columns=['Numbers'])

def test_np(snapshot_numpy):
    assert snapshot_numpy == np.array([1, 3]).tolist()
    assert snapshot_numpy == np.array([1, 2.5]).tolist()

platform linux -- Python 3.8.17, pytest-7.4.0, pluggy-1.2.0 -- /home/atharva/miniconda3/envs/tardis-devel/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.8.17', 'Platform': 'Linux-5.15.0-57-generic-x86_64-with-glibc2.10', 'Packages': {'pytest': '7.4.0', 'pluggy': '1.2.0'}, 'Plugins': {'datadir': '1.4.1', 'anyio': '3.7.1', 'cov': '4.1.0', 'arraydiff': '0.6.0a1', 'metadata': '3.0.0', 'html': '3.2.0', 'syrupy': '4.0.8', 'regressions': '2.4.2', 'doctestplus': '1.0.0'}}
rootdir: /home/atharva/workspace/code/tardis-main/syrupy-reg-comp
plugins: datadir-1.4.1, anyio-3.7.1, cov-4.1.0, arraydiff-0.6.0a1, metadata-3.0.0, html-3.2.0, syrupy-4.0.8, regressions-2.4.2, doctestplus-1.0.0
collecting ... collected 4 items

t_716e1676229c4afba3797ca226bf4bee.py::test_pd PASSED                                        [ 25%]
t_716e1676229c4afba3797ca226bf4bee.py::test_pd3[1] PASSED                                    [ 50%]
t_716e1676229c4afba3797ca226b
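
The comparison behind these runs relies on `np.testing.assert_allclose`, which returns `None` when the arrays agree and raises `AssertionError` when they do not. A minimal standalone sketch of that pattern follows; `arrays_match` is a hypothetical helper name and the default tolerance is only an example.

```python
import numpy as np


def arrays_match(expected, actual, rtol=1e-7):
    """Return True when actual agrees with expected within rtol."""
    try:
        np.testing.assert_allclose(
            np.asarray(actual), np.asarray(expected), rtol=rtol
        )
    except AssertionError:
        # A mismatch is reported as an exception, not as a return value.
        return False
    return True


same = arrays_match([1, 3], [1, 3])
different = arrays_match([1, 3], [1, 2.5])
nearly_same = arrays_match([1.0], [1.0 + 1e-9])
```

Here `same` and `nearly_same` are `True`, while `different` is `False`.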

### Let's change a few things now.

In [9]:
%%ipytest -vv 

def test_pd(snapshot_pandas):
    data = [30,40,20]
    assert snapshot_pandas == pd.DataFrame(data, columns=['Numbers'])

@pytest.mark.parametrize('no', [1,2])
def test_pd3(snapshot_pandas, no):
    data = [30,40,20, no]
    assert snapshot_pandas == pd.DataFrame(data, columns=['Numbers'])

def test_np(snapshot_numpy):
    assert snapshot_numpy == np.array([1, 2]).tolist()
    assert snapshot_numpy == np.array([1, 2]).tolist()

platform linux -- Python 3.8.17, pytest-7.4.0, pluggy-1.2.0 -- /home/atharva/miniconda3/envs/tardis-devel/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.8.17', 'Platform': 'Linux-5.15.0-57-generic-x86_64-with-glibc2.10', 'Packages': {'pytest': '7.4.0', 'pluggy': '1.2.0'}, 'Plugins': {'datadir': '1.4.1', 'anyio': '3.7.1', 'cov': '4.1.0', 'arraydiff': '0.6.0a1', 'metadata': '3.0.0', 'html': '3.2.0', 'syrupy': '4.0.8', 'regressions': '2.4.2', 'doctestplus': '1.0.0'}}
rootdir: /home/atharva/workspace/code/tardis-main/syrupy-reg-comp
plugins: datadir-1.4.1, anyio-3.7.1, cov-4.1.0, arraydiff-0.6.0a1, metadata-3.0.0, html-3.2.0, syrupy-4.0.8, regressions-2.4.2, doctestplus-1.0.0
collecting ... collected 4 items

t_716e1676229c4afba3797ca226bf4bee.py::test_pd FAILED                                        [ 25%]
t_716e1676229c4afba3797ca226bf4bee.py::test_pd3[1] FAILED                                    [ 50%]
t_716e1676229c4afba3797ca226b

### The methods used won't change until a major release. You can update the snapshots like this:

In [10]:
%%ipytest -vv --snapshot-update

def test_pd(snapshot_pandas):
    data = [30,40,20]
    assert snapshot_pandas == pd.DataFrame(data, columns=['Numbers'])

@pytest.mark.parametrize('no', [1,2])
def test_pd3(snapshot_pandas, no):
    data = [30,40,20, no]
    assert snapshot_pandas == pd.DataFrame(data, columns=['Numbers'])

def test_np(snapshot_numpy):
    assert snapshot_numpy == np.array([1, 2]).tolist()
    assert snapshot_numpy == np.array([1, 2]).tolist()

platform linux -- Python 3.8.17, pytest-7.4.0, pluggy-1.2.0 -- /home/atharva/miniconda3/envs/tardis-devel/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.8.17', 'Platform': 'Linux-5.15.0-57-generic-x86_64-with-glibc2.10', 'Packages': {'pytest': '7.4.0', 'pluggy': '1.2.0'}, 'Plugins': {'datadir': '1.4.1', 'anyio': '3.7.1', 'cov': '4.1.0', 'arraydiff': '0.6.0a1', 'metadata': '3.0.0', 'html': '3.2.0', 'syrupy': '4.0.8', 'regressions': '2.4.2', 'doctestplus': '1.0.0'}}
rootdir: /home/atharva/workspace/code/tardis-main/syrupy-reg-comp
plugins: datadir-1.4.1, anyio-3.7.1, cov-4.1.0, arraydiff-0.6.0a1, metadata-3.0.0, html-3.2.0, syrupy-4.0.8, regressions-2.4.2, doctestplus-1.0.0
collecting ... collected 4 items

t_716e1676229c4afba3797ca226bf4bee.py::test_pd PASSED                                        [ 25%]
t_716e1676229c4afba3797ca226bf4bee.py::test_pd3[1] PASSED                                    [ 50%]
t_716e1676229c4afba3797ca226b

### On to pytest-regressions.

### Let's look at the directory first. This will help us see what gets created in later steps.

In [11]:
!rm -rf ./t*

zsh:1: no matches found: ./t*


In [12]:
!tree -L 3

.
├── comp.ipynb
└── __snapshots__
    └── t_716e1676229c4afba3797ca226bf4bee
        ├── test_np.1.dat
        ├── test_np.dat
        ├── test_pd3[1].hdf
        ├── test_pd3[2].hdf
        └── test_pd.hdf

2 directories, 6 files


### This should fail on the first try (the reference files are created on that run) but should then succeed.

In [13]:
%%ipytest -vv 
pytest_plugins = "regressions"

# From the docs-

def summary_grids():
    return {
        "Main Grid": {
            "id": 0,
            "cell_count": 1000,
            "active_cells": 300,
            "properties": [
                {"name": "Temperature", "min": 75, "max": 85},
                {"name": "Porosity", "min": 0.3, "max": 0.4},
            ],
        },
        "Refin1": {
            "id": 1,
            "cell_count": 48,
            "active_cells": 44,
            "properties": [
                {"name": "Temperature", "min": 78, "max": 81},
                {"name": "Porosity", "min": 0.36, "max": 0.39},
            ],
        },
    }

def test_grids2(data_regression):
    data = summary_grids()
    data_regression.check(data)

def test_pd(dataframe_regression):
    # Tolerances, as well as the path and the name of the
    # reference file, can be passed as arguments to check().
    data = [30,40,60]
    dataframe_regression.check(pd.DataFrame(data, columns=['Numbers']))

def test_pd_multindex(dataframe_regression):
    arrays = [
        np.array(["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"]),
        np.array(["one", "two", "one", "two", "one", "two", "one", "two"]),
    ]
    # NOTE: the values are unseeded random numbers, so a re-run will not
    # match the stored file; seed the generator in real tests.
    df = pd.DataFrame(np.random.randn(8, 4), index=arrays)
    
    dataframe_regression.check(df)

def test_file(file_regression):
    stuff = "This needs to be a string."
    file_regression.check(stuff)


def test_num_dict(num_regression):
    num = np.array([1,2,3])
    # num2 = np.array([1,2,3,4]) # The arrays need to have the same shape, otherwise this raises:
    # TypeError: Checking multiple arrays with different shapes are not supported for non-float arrays

    num2 = np.array([4,5,6])
    num_dict = {
        "num": num,
        "num2": num2
    }

    # The input to num_regression.check has to be a dict.
    num_regression.check(num_dict)


def test_ndarray(ndarrays_regression):
    num = np.array([1,2,3])
    num2 = np.array([4,5,6, 7]) # here the arrays can have different shapes
    num_dict = {
        "num": num,
        "num2": num2
    }
    ndarrays_regression.check(num_dict)
    




platform linux -- Python 3.8.17, pytest-7.4.0, pluggy-1.2.0 -- /home/atharva/miniconda3/envs/tardis-devel/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.8.17', 'Platform': 'Linux-5.15.0-57-generic-x86_64-with-glibc2.10', 'Packages': {'pytest': '7.4.0', 'pluggy': '1.2.0'}, 'Plugins': {'datadir': '1.4.1', 'anyio': '3.7.1', 'cov': '4.1.0', 'arraydiff': '0.6.0a1', 'metadata': '3.0.0', 'html': '3.2.0', 'syrupy': '4.0.8', 'regressions': '2.4.2', 'doctestplus': '1.0.0'}}
rootdir: /home/atharva/workspace/code/tardis-main/syrupy-reg-comp
plugins: datadir-1.4.1, anyio-3.7.1, cov-4.1.0, arraydiff-0.6.0a1, metadata-3.0.0, html-3.2.0, syrupy-4.0.8, regressions-2.4.2, doctestplus-1.0.0
collecting ... collected 6 items

t_716e1676229c4afba3797ca226bf4bee.py::test_grids2 FAILED                                    [ 16%]
t_716e1676229c4afba3797ca226bf4bee.py::test_pd FAILED                                        [ 33%]
t_716e1676229c4afba3797ca226b
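
The failures above are expected on a first run: no reference files exist yet, so they are created and the tests are reported as failed. The following is a rough emulation of that "create on first run, compare afterwards" behaviour. It is not the plugin's implementation; `check_against_reference` and `run_twice` are hypothetical helpers, and JSON is used instead of YAML only to keep the sketch dependency-free.

```python
import json
import os
import tempfile


def check_against_reference(path, data):
    """Create the reference file if it is missing, otherwise compare to it."""
    if not os.path.exists(path):
        with open(path, "w") as handle:
            json.dump(data, handle, sort_keys=True)
        # The first run fails on purpose so the new file gets reviewed.
        raise AssertionError(f"reference file {path} did not exist; created it")
    with open(path) as handle:
        stored = json.load(handle)
    assert stored == data, "data differs from the stored reference"


def run_twice(data):
    """Run the same check twice against a fresh directory."""
    outcomes = []
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "reference.json")
        for _ in range(2):
            try:
                check_against_reference(path, data)
                outcomes.append("passed")
            except AssertionError:
                outcomes.append("failed")
    return outcomes


outcomes = run_twice({"cell_count": 1000, "active_cells": 300})
```

With unchanged data, the first call fails and the second one passes.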

### A directory named after the temporary test module should be created:

In [14]:
!tree -L 3

.
├── comp.ipynb
├── __snapshots__
│   └── t_716e1676229c4afba3797ca226bf4bee
│       ├── test_np.1.dat
│       ├── test_np.dat
│       ├── test_pd3[1].hdf
│       ├── test_pd3[2].hdf
│       └── test_pd.hdf
└── t_716e1676229c4afba3797ca226bf4bee
    ├── test_file.txt
    ├── test_grids2.yml
    ├── test_ndarray.npz
    ├── test_num_dict.csv
    ├── test_pd.csv
    └── test_pd_multindex.csv

3 directories, 12 files


### This is the file that was created for `test_grids2`:

In [15]:
!cat ./*/test_grids2.yml

Main Grid:
  active_cells: 300
  cell_count: 1000
  id: 0
  properties:
  - max: 85
    min: 75
    name: Temperature
  - max: 0.4
    min: 0.3
    name: Porosity
Refin1:
  active_cells: 44
  cell_count: 48
  id: 1
  properties:
  - max: 81
    min: 78
    name: Temperature
  - max: 0.39
    min: 0.36
    name: Porosity


In [16]:
!cat ./*/test_pd_multindex.csv

,,0,1,2,3
bar,one,0.60253427010792759,2.0023368950373688,-0.79621795385818739,0.3737527840596675
bar,two,-1.0035005757179127,-2.1063444753108524,1.1113714494156453,1.4021311984795786
baz,one,0.5472137975074961,0.86416981449180652,0.71977582673979712,-0.30678902359383164
baz,two,-0.85438767814522054,0.027840560034292638,1.037739523711543,-0.90381359744030576
foo,one,-0.372207078125116,1.3430840677911227,1.3806764812762617,-0.47313286398934107
foo,two,2.4038456499527769,1.9791235556350704,-0.53925073094450848,-1.4872594769790817
qux,one,-0.24098838356724259,-0.91322114484659833,-0.923455769345783,-0.97503944008970878
qux,two,1.5085487443296575,-1.0443438060463952,-1.6365687841038032,1.1633939085122509


### Let's change the code so that the test fails:

In [17]:
%%ipytest -vv 

def summary_grids():
    return {
        "Main Grid": {
            "id": 0,
            "cell_count": 100, # I changed this
            "active_cells": 300,
            "properties": [
                {"name": "Temperature", "min": 75, "max": 85},
                {"name": "Porosity", "min": 0.3, "max": 0.4},
            ],
        },
        "Refin1": {
            "id": 1,
            "cell_count": 48,
            "active_cells": 44,
            "properties": [
                {"name": "Temperature", "min": 78, "max": 81},
                {"name": "Porosity", "min": 0.36, "max": 0.39},
            ],
        },
    }

def test_grids2(data_regression):
    data = summary_grids()
    data_regression.check(data)

platform linux -- Python 3.8.17, pytest-7.4.0, pluggy-1.2.0 -- /home/atharva/miniconda3/envs/tardis-devel/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.8.17', 'Platform': 'Linux-5.15.0-57-generic-x86_64-with-glibc2.10', 'Packages': {'pytest': '7.4.0', 'pluggy': '1.2.0'}, 'Plugins': {'datadir': '1.4.1', 'anyio': '3.7.1', 'cov': '4.1.0', 'arraydiff': '0.6.0a1', 'metadata': '3.0.0', 'html': '3.2.0', 'syrupy': '4.0.8', 'regressions': '2.4.2', 'doctestplus': '1.0.0'}}
rootdir: /home/atharva/workspace/code/tardis-main/syrupy-reg-comp
plugins: datadir-1.4.1, anyio-3.7.1, cov-4.1.0, arraydiff-0.6.0a1, metadata-3.0.0, html-3.2.0, syrupy-4.0.8, regressions-2.4.2, doctestplus-1.0.0
collecting ... collected 1 item

t_716e1676229c4afba3797ca226bf4bee.py::test_grids2 FAILED                                    [100%]

___________________________________________ test_grids2 ____________________________________________

data_regression = <pytest_regres

### But you can also update the file when the test fails, like so:

In [18]:
%%ipytest -vv  --force-regen

# There is also this command when you want to update all files- when one file is causing others to fail
# %%ipytest -vv   --regen-all


def summary_grids():
    return {
        "Main Grid": {
            "id": 0,
            "cell_count": 100, # changed; with --force-regen the test still fails, but the file is rewritten
            "active_cells": 300,
            "properties": [
                {"name": "Temperature", "min": 75, "max": 85},
                {"name": "Porosity", "min": 0.3, "max": 0.4}
            ],
        },
        "Refin1": {
            "id": 1,
            "cell_count": 48,
            "active_cells": 44,
            "properties": [
                {"name": "Temperature", "min": 78, "max": 81},
                {"name": "Porosity", "min": 0.36, "max": 0.39},
            ],
        },
    }

def test_grids2(data_regression):
    data = summary_grids()
    data_regression.check(data)

platform linux -- Python 3.8.17, pytest-7.4.0, pluggy-1.2.0 -- /home/atharva/miniconda3/envs/tardis-devel/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.8.17', 'Platform': 'Linux-5.15.0-57-generic-x86_64-with-glibc2.10', 'Packages': {'pytest': '7.4.0', 'pluggy': '1.2.0'}, 'Plugins': {'datadir': '1.4.1', 'anyio': '3.7.1', 'cov': '4.1.0', 'arraydiff': '0.6.0a1', 'metadata': '3.0.0', 'html': '3.2.0', 'syrupy': '4.0.8', 'regressions': '2.4.2', 'doctestplus': '1.0.0'}}
rootdir: /home/atharva/workspace/code/tardis-main/syrupy-reg-comp
plugins: datadir-1.4.1, anyio-3.7.1, cov-4.1.0, arraydiff-0.6.0a1, metadata-3.0.0, html-3.2.0, syrupy-4.0.8, regressions-2.4.2, doctestplus-1.0.0
collecting ... collected 1 item

t_716e1676229c4afba3797ca226bf4bee.py::test_grids2 FAILED                                    [100%]

___________________________________________ test_grids2 ____________________________________________

E   AssertionError: FIL

### Things should match now

In [19]:
%%ipytest -vv  

# There is also this command when you want to update all files- when one file is causing others to fail
# %%ipytest -vv   --regen-all


def summary_grids():
    return {
        "Main Grid": {
            "id": 0,
            "cell_count": 100, # changed earlier; passes now that the file was regenerated
            "active_cells": 300,
            "properties": [
                {"name": "Temperature", "min": 75, "max": 85},
                {"name": "Porosity", "min": 0.3, "max": 0.4}
            ],
        },
        "Refin1": {
            "id": 1,
            "cell_count": 48,
            "active_cells": 44,
            "properties": [
                {"name": "Temperature", "min": 78, "max": 81},
                {"name": "Porosity", "min": 0.36, "max": 0.39},
            ],
        },
    }

def test_grids2(data_regression):
    data = summary_grids()
    data_regression.check(data)

platform linux -- Python 3.8.17, pytest-7.4.0, pluggy-1.2.0 -- /home/atharva/miniconda3/envs/tardis-devel/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.8.17', 'Platform': 'Linux-5.15.0-57-generic-x86_64-with-glibc2.10', 'Packages': {'pytest': '7.4.0', 'pluggy': '1.2.0'}, 'Plugins': {'datadir': '1.4.1', 'anyio': '3.7.1', 'cov': '4.1.0', 'arraydiff': '0.6.0a1', 'metadata': '3.0.0', 'html': '3.2.0', 'syrupy': '4.0.8', 'regressions': '2.4.2', 'doctestplus': '1.0.0'}}
rootdir: /home/atharva/workspace/code/tardis-main/syrupy-reg-comp
plugins: datadir-1.4.1, anyio-3.7.1, cov-4.1.0, arraydiff-0.6.0a1, metadata-3.0.0, html-3.2.0, syrupy-4.0.8, regressions-2.4.2, doctestplus-1.0.0
collecting ... collected 1 item

t_716e1676229c4afba3797ca226bf4bee.py::test_grids2 PASSED                                    [100%]

../../../../miniconda3/envs/tardis-devel/lib/python3.8/site-packages/_pytest/config/__init__.py:1204
    self._mark_plugins_for_rewrite(hook)



In [20]:
!cat ./*/test_grids2.yml

Main Grid:
  active_cells: 300
  cell_count: 100
  id: 0
  properties:
  - max: 85
    min: 75
    name: Temperature
  - max: 0.4
    min: 0.3
    name: Porosity
Refin1:
  active_cells: 44
  cell_count: 48
  id: 1
  properties:
  - max: 81
    min: 78
    name: Temperature
  - max: 0.39
    min: 0.36
    name: Porosity


### Multiple files can be created, if you want, using pytest parametrization (this should fail on the first attempt).

In [21]:
%%ipytest -vv  

def summary_grids_2():
    return {
        "Main Grid": {
            "id": 0,
            "cell_count": 200, # I changed this
            "active_cells": 300,
            "properties": [
                {"name": "Temperature", "min": 75, "max": 85},
                {"name": "Porosity", "min": 0.3, "max": 0.4},
            ],
        },
        "Refin1": {
            "id": 1,
            "cell_count": 48,
            "active_cells": 44,
            "properties": [
                {"name": "Temperature", "min": 78, "max": 81},
                {"name": "Porosity", "min": 0.36, "max": 0.39},
            ],
        },
    }



@pytest.mark.parametrize('data', [summary_grids(), summary_grids_2()])
def test_grids3(data_regression, data):
    data_regression.check(data)

platform linux -- Python 3.8.17, pytest-7.4.0, pluggy-1.2.0 -- /home/atharva/miniconda3/envs/tardis-devel/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.8.17', 'Platform': 'Linux-5.15.0-57-generic-x86_64-with-glibc2.10', 'Packages': {'pytest': '7.4.0', 'pluggy': '1.2.0'}, 'Plugins': {'datadir': '1.4.1', 'anyio': '3.7.1', 'cov': '4.1.0', 'arraydiff': '0.6.0a1', 'metadata': '3.0.0', 'html': '3.2.0', 'syrupy': '4.0.8', 'regressions': '2.4.2', 'doctestplus': '1.0.0'}}
rootdir: /home/atharva/workspace/code/tardis-main/syrupy-reg-comp
plugins: datadir-1.4.1, anyio-3.7.1, cov-4.1.0, arraydiff-0.6.0a1, metadata-3.0.0, html-3.2.0, syrupy-4.0.8, regressions-2.4.2, doctestplus-1.0.0
collecting ... collected 2 items

t_716e1676229c4afba3797ca226bf4bee.py::test_grids3[data0] FAILED                             [ 50%]
t_716e1676229c4afba3797ca226bf4bee.py::test_grids3[data1] FAILED                             [100%]

___________________

### But it should pass on the next run

In [22]:
%%ipytest -vv  
@pytest.mark.parametrize('data', [summary_grids(), summary_grids_2()])
def test_grids3(data_regression, data):
    data_regression.check(data)

platform linux -- Python 3.8.17, pytest-7.4.0, pluggy-1.2.0 -- /home/atharva/miniconda3/envs/tardis-devel/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.8.17', 'Platform': 'Linux-5.15.0-57-generic-x86_64-with-glibc2.10', 'Packages': {'pytest': '7.4.0', 'pluggy': '1.2.0'}, 'Plugins': {'datadir': '1.4.1', 'anyio': '3.7.1', 'cov': '4.1.0', 'arraydiff': '0.6.0a1', 'metadata': '3.0.0', 'html': '3.2.0', 'syrupy': '4.0.8', 'regressions': '2.4.2', 'doctestplus': '1.0.0'}}
rootdir: /home/atharva/workspace/code/tardis-main/syrupy-reg-comp
plugins: datadir-1.4.1, anyio-3.7.1, cov-4.1.0, arraydiff-0.6.0a1, metadata-3.0.0, html-3.2.0, syrupy-4.0.8, regressions-2.4.2, doctestplus-1.0.0
collecting ... collected 2 items

t_716e1676229c4afba3797ca226bf4bee.py::test_grids3[data0] PASSED                             [ 50%]
t_716e1676229c4afba3797ca226bf4bee.py::test_grids3[data1] PASSED                             [100%]

../../../../miniconda3/envs/

In [23]:
!tree -L 3

.
├── comp.ipynb
├── __snapshots__
│   └── t_716e1676229c4afba3797ca226bf4bee
│       ├── test_np.1.dat
│       ├── test_np.dat
│       ├── test_pd3[1].hdf
│       ├── test_pd3[2].hdf
│       └── test_pd.hdf
└── t_716e1676229c4afba3797ca226bf4bee
    ├── test_file.txt
    ├── test_grids2.yml
    ├── test_grids3_data0_.yml
    ├── test_grids3_data1_.yml
    ├── test_ndarray.npz
    ├── test_num_dict.csv
    ├── test_pd.csv
    └── test_pd_multindex.csv

3 directories, 14 files


In [24]:
pd.read_csv("./t_716e1676229c4afba3797ca226bf4bee/test_pd_multindex.csv")

Unnamed: 0.1,Unnamed: 0,Unnamed: 1,0,1,2,3
0,bar,one,0.602534,2.002337,-0.796218,0.373753
1,bar,two,-1.003501,-2.106344,1.111371,1.402131
2,baz,one,0.547214,0.86417,0.719776,-0.306789
3,baz,two,-0.854388,0.027841,1.03774,-0.903814
4,foo,one,-0.372207,1.343084,1.380676,-0.473133
5,foo,two,2.403846,1.979124,-0.539251,-1.487259
6,qux,one,-0.240988,-0.913221,-0.923456,-0.975039
7,qux,two,1.508549,-1.044344,-1.636569,1.163394
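
Reading the CSV back with a plain `pd.read_csv` flattens the two index levels into `Unnamed: ...` columns. A small sketch of how the MultiIndex could be recovered with `index_col` is given below. It uses an in-memory buffer and made-up values rather than the file written by the plugin, so treat it as an illustration only.

```python
import io

import numpy as np
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [["bar", "bar", "baz", "baz"], ["one", "two", "one", "two"]]
)
frame = pd.DataFrame(np.arange(8.0).reshape(4, 2), index=index)

# Write the frame to an in-memory CSV instead of a file on disk.
buffer = io.StringIO()
frame.to_csv(buffer)
csv_text = buffer.getvalue()

# Plain read: the index levels become ordinary "Unnamed: N" columns.
flat = pd.read_csv(io.StringIO(csv_text))

# Naming the first two columns as the index restores the two levels.
restored = pd.read_csv(io.StringIO(csv_text), index_col=[0, 1])
```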


### My opinions on which library to choose

[Syrupy](https://github.com/tophat/syrupy):

- Has more stars (342) and is frequently maintained. The maintainers were very quick (<24 hrs) to merge PRs and address my issues.
- Is maintained by a company called tophat. The first commit was back in Oct 2019.
- Is also used by quite big open source libraries ([langchain](https://github.com/langchain-ai/langchain) is listed as one of the [dependents](https://github.com/tophat/syrupy/network/dependents)).
- Provides different APIs for different [use cases](https://github.com/tophat/syrupy?tab=readme-ov-file#extending-syrupy): custom matchers (like the one I am using), [customised assert statements](https://github.com/tophat/syrupy?tab=readme-ov-file#assertion-options), multiple files or a single file in the same test function, saving files elsewhere, etc.
- Parametrising works as expected: multiple files are created.
- The API that I am using is a private method. The maintainers intend to make those methods public in the next major release, but have no immediate plans to do so (they said the code I wrote is pretty stable). For Syrupy to work the way we want, we need our own extension classes (see the 6th code cell).



[Regressions](https://github.com/ESSS/pytest-regressions):
- Simple and maintained by people from the pytest community.
- 164 stars. The first commit was in June 2018.
- [6 fixtures](https://pytest-regressions.readthedocs.io/en/latest/api.html) in total:
  - `data_regression`: for YAML-serializable dicts; creates `.yml` files
  - `dataframe_regression`: for DataFrames; creates text files (`.csv`)
  - `file_regression`: for text, passed as a string
  - `num_regression`: for dicts of same-sized arrays
  - `ndarrays_regression`: for dicts of NumPy arrays (which can have different sizes)
  - `image_regression`: for images
- Fixed API: custom file formats cannot be saved.
- It is possible to save data in external locations.
- One test function can't produce more than one file.
- Requires test code to be modular. Since one test function can only produce one file, we might have to parametrise fixtures to produce different types of inputs. Parametrising fixtures is not as easy as parametrising test functions: [the inputs have to be items in a list](https://docs.pytest.org/en/7.3.x/how-to/fixtures.html#fixture-parametrize).
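
As a rough illustration of the last point, a parametrised fixture could look like the sketch below. `grid_inputs`, `grid`, and the test itself are hypothetical names; the only pytest features assumed are `pytest.fixture(params=...)` and `request.param`.

```python
import pytest


def grid_inputs():
    """Hypothetical inputs; each list item becomes one parametrised run."""
    return [
        {"cell_count": 1000, "active_cells": 300},
        {"cell_count": 48, "active_cells": 44},
    ]


@pytest.fixture(params=grid_inputs())
def grid(request):
    # request.param holds the list item for the current run.
    return request.param


def test_active_cells_fit(grid):
    assert grid["active_cells"] <= grid["cell_count"]
```

Every test that requests `grid` is run once per list item, so a regression fixture used in such a test would write one file per item.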
