# Run the tests for Climate12K

This notebook runs the tests for the Climate12K database. To run the tests on the data, click on *Cell* &rarr; *Run all*. When the notebook has finished (i.e. when you see a table [at the bottom](#final)), you can download the Excel file with the aggregated results [here](../data/results.xlsx) and access the detailed test report formatted as HTML [here](../data/report.html).

This notebook downloads the database from http://lipdverse.org/globalHolocene/current_version, based on the version you specify in the `db_version` variable (see below). It was developed by Philipp Sommer (philipp.sommer@unil.ch). Please do not hesitate to get in touch if you run into any problems.

**Things you might want to adapt:**

- the database version string (`db_version`, see [here](#db_version))

In [None]:
import pandas as pd
import os
import os.path as osp
from urllib import request
import zipfile

<a id=db_version></a>Read in the LiPD data from http://lipdverse.org/globalHolocene/current_version

Set `db_version` to the latest version manually!

In [None]:
db_version = '0_30_1'

In [None]:
%%time
# Download and unpack the archive only if ../data does not exist yet
if not osp.exists('../data'):
    os.makedirs('../data')
    zipped = f'globalHolocene{db_version}.zip'
    uri = f'http://lipdverse.org/globalHolocene/current_version/{zipped}'
    target = osp.join('../data', zipped)
    print('downloading ' + uri)
    request.urlretrieve(uri, target)
    # Extract the LiPD files next to the downloaded archive
    with zipfile.ZipFile(target) as f:
        f.extractall('../data')
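
The cell above skips the download whenever `../data` already exists, so changing `db_version` afterwards does not fetch a new archive. One possible alternative is to check for the archive of the requested version instead. The following is a minimal sketch of that idea, not what the notebook does; the helper names `archive_name` and `needs_download` are hypothetical.

```python
import os.path as osp


def archive_name(db_version):
    """Return the zip file name used for a given database version."""
    return f'globalHolocene{db_version}.zip'


def needs_download(data_dir, db_version):
    """Return True if the archive for this version is not in data_dir yet."""
    return not osp.exists(osp.join(data_dir, archive_name(db_version)))
```

With such a check, `needs_download('../data', db_version)` could replace the `osp.exists('../data')` test, provided the data directory is created separately.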

Now we run the tests using the pytest command and serialize the data. This may produce a lot of output if the data contains multiple errors. But don't worry, you can access an aggregated test report as an Excel file [here](../data/results.xlsx) and you can view the detailed test results formatted as HTML [here](../data/report.html).

Note: Each test has its own nodeid. This nodeid (e.g. `tests/test_data.py::test_duplicated_ages[AMP112.vanderBilt.2016.LPDd2a984fe]`) appears in both the HTML report and the Excel file, so you can use it to cross-reference the two.
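
To look up one test in both files, it can help to split the nodeid into its parts. The sketch below assumes the usual pytest layout `path::test_name[parameter id]` without test classes; `split_nodeid` is a hypothetical helper, not part of the test suite.

```python
def split_nodeid(nodeid):
    """Split a pytest nodeid into (test file, test function, parameter id)."""
    path, _, rest = nodeid.partition('::')
    name, bracket, param = rest.partition('[')
    # Parametrized tests end in "[<id>]"; plain tests have no bracket part.
    param = param[:-1] if bracket and param.endswith(']') else None
    return path, name, param


nodeid = 'tests/test_data.py::test_duplicated_ages[AMP112.vanderBilt.2016.LPDd2a984fe]'
print(split_nodeid(nodeid))
# ('tests/test_data.py', 'test_duplicated_ages', 'AMP112.vanderBilt.2016.LPDd2a984fe')
```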

In [None]:
!pytest ../tests --html=../data/report.html --self-contained-html --serialize-lipds ../data/lipds.pkl

If you want to rerun the tests, pass `--lipd-data ../data/lipds.pkl` instead of `--serialize-lipds` to reuse the data serialized above and speed them up. You can rerun the tests by removing the `#` in the following line:

In [None]:
#!pytest ../tests --html=../data/report.html --self-contained-html --lipd-data ../data/lipds.pkl
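
Instead of commenting and uncommenting the two commands, the choice could be made from whether the pickle file already exists. This is a minimal sketch using only the options shown above; `pytest_args` is a hypothetical helper and the default paths are the ones used in this notebook.

```python
import os.path as osp


def pytest_args(pickle='../data/lipds.pkl', report='../data/report.html'):
    """Build the pytest argument list used in this notebook."""
    args = ['../tests', f'--html={report}', '--self-contained-html']
    if osp.exists(pickle):
        # Reuse the serialized data from a previous run.
        args += ['--lipd-data', pickle]
    else:
        # First run: read the LiPD files and serialize them.
        args += ['--serialize-lipds', pickle]
    return args
```

The resulting list can be passed on, for example with `subprocess.run([sys.executable, '-m', 'pytest', *pytest_args()])`.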

<a id="final"></a>Finally, let's have a look at the summary table of the test report.

In [None]:
summary = pd.read_excel('../data/results.xlsx', 'Summary')
summary