# Tutorial 2: Output files and metadata

Here, we take a quick look at the output files of the ``Scanner`` and ``Cluster`` classes.

**NOTE: This tutorial requires that the previous tutorial was already run**!

**NOTE FOR CONTRIBUTORS: Always clear all output before committing (``Cell`` > ``All Output`` > ``Clear``)**!

In [1]:
# Magic
%matplotlib inline
# Reload modules whenever they change
%load_ext autoreload
%autoreload 2

# Make bclustering package available even without installation
import sys
sys.path = ["../../"] + sys.path

In [6]:
import pandas
import json

## Scanner class

The call ``s.write(directory="output/cluster", name="tutorial_basics")`` of the last tutorial created two files:
* ``output/cluster/tutorial_basics_data.csv`` contains the distributions for all points in Wilson space
* ``output/cluster/tutorial_basics_metadata.json`` contains additional metadata about the scan
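
Before loading anything, it can be handy to check that both files are actually there. The following is a minimal sketch, not part of the package: ``output_paths`` is a hypothetical helper, and the naming pattern (``<name>_data.csv`` and ``<name>_metadata.json``) is simply assumed from the file names used in the commands of this tutorial.

```python
from pathlib import Path


def output_paths(directory, name):
    """Hypothetical helper: build the two output file paths.

    The naming pattern is an assumption taken from the file names
    that appear in this tutorial, not from the package itself.
    """
    directory = Path(directory)
    return {
        "data": directory / f"{name}_data.csv",
        "metadata": directory / f"{name}_metadata.json",
    }


paths = output_paths("output/cluster", "tutorial_basics")
for kind, path in paths.items():
    # Report whether each file exists instead of failing later on load
    status = "found" if path.exists() else "missing"
    print(f"{kind}: {path} ({status})")
```

If one of the files is reported as missing, the previous tutorial probably has not been run yet.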

### Loading the CSV data

This is what the output file for the data looks like:

In [19]:
!head output/cluster/tutorial_basics_data.csv

index,CVL_bctaunutau,CSL_bctaunutau,CT_bctaunutau,bin0,bin1,bin2,bin3,bin4,bin5,bin6,bin7,bin8,cluster,bpoint
0,-1.0,-1.0,-1.0,0.019218498441031788,0.07580410813701566,0.1246708359227077,0.15496778068703781,0.16685928655458301,0.16163110182874887,0.14038208485371928,0.10414009446322449,0.05232620911193127,0,False
1,-1.0,-1.0,-0.7777777777777778,0.019773890725310134,0.07705748254308728,0.12562499198025914,0.15532540359510516,0.16672624194488106,0.16117275308314305,0.13968223094664492,0.10321663356733266,0.051420371614236654,0,False
2,-1.0,-1.0,-0.5555555555555556,0.02091069886913175,0.07983716708445192,0.1279863021131987,0.15635940328668596,0.16642064419685929,0.15985881013134118,0.13775838580963498,0.10108864309214295,0.04977994541655343,0,False
3,-1.0,-1.0,-0.33333333333333337,0.02268079097294613,0.08432914872072125,0.13197526716773222,0.158200364146124,0.16591936014961814,0.1575255411720375,0.13438706988620103,0.09759335472112815,0.04738910306349156,0,False
4,-1.0,-1.0,-0.111111

If you want to directly load the dataframe from the output file, you can do so easily:

In [15]:
df = pandas.read_csv("output/cluster/tutorial_basics_data.csv")
df.set_index("index", inplace=True)

In [16]:
df.head()

index,CVL_bctaunutau,CSL_bctaunutau,CT_bctaunutau,bin0,bin1,bin2,bin3,bin4,bin5,bin6,bin7,bin8,cluster,bpoint
0,-1.0,-1.0,-1.0,0.019218,0.075804,0.124671,0.154968,0.166859,0.161631,0.140382,0.10414,0.052326,0,False
1,-1.0,-1.0,-0.777778,0.019774,0.077057,0.125625,0.155325,0.166726,0.161173,0.139682,0.103217,0.05142,0,False
2,-1.0,-1.0,-0.555556,0.020911,0.079837,0.127986,0.156359,0.166421,0.159859,0.137758,0.101089,0.04978,0,False
3,-1.0,-1.0,-0.333333,0.022681,0.084329,0.131975,0.1582,0.165919,0.157526,0.134387,0.097593,0.047389,0,False
4,-1.0,-1.0,-0.111111,0.024851,0.089976,0.137132,0.160655,0.165283,0.15442,0.129933,0.093153,0.044598,1,False
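
Once the dataframe is loaded, the columns can be split into groups by name. The sketch below assumes only the column naming visible in the output above (``bin*`` columns for the distribution, ``*_bctaunutau`` columns for the coefficients). To stay runnable without the file from the previous tutorial, it reads the five rows printed above from a string instead of from disk.

```python
import io

import pandas

# Excerpt of the output shown above, so this sketch runs on its own.
# With the real file, use pandas.read_csv("<path>", index_col="index").
csv_text = """index,CVL_bctaunutau,CSL_bctaunutau,CT_bctaunutau,bin0,bin1,bin2,bin3,bin4,bin5,bin6,bin7,bin8,cluster,bpoint
0,-1.0,-1.0,-1.0,0.019218,0.075804,0.124671,0.154968,0.166859,0.161631,0.140382,0.10414,0.052326,0,False
1,-1.0,-1.0,-0.777778,0.019774,0.077057,0.125625,0.155325,0.166726,0.161173,0.139682,0.103217,0.05142,0,False
2,-1.0,-1.0,-0.555556,0.020911,0.079837,0.127986,0.156359,0.166421,0.159859,0.137758,0.101089,0.04978,0,False
3,-1.0,-1.0,-0.333333,0.022681,0.084329,0.131975,0.1582,0.165919,0.157526,0.134387,0.097593,0.047389,0,False
4,-1.0,-1.0,-0.111111,0.024851,0.089976,0.137132,0.160655,0.165283,0.15442,0.129933,0.093153,0.044598,1,False
"""
df = pandas.read_csv(io.StringIO(csv_text), index_col="index")

# Group the columns by their naming pattern
bin_cols = [col for col in df.columns if col.startswith("bin")]
coeff_cols = [col for col in df.columns if col.endswith("_bctaunutau")]

# The distributions only, one row per point
distributions = df[bin_cols]

# Number of points assigned to each cluster
points_per_cluster = df["cluster"].value_counts().sort_index()

# Rows flagged in the bpoint column (none in this excerpt)
flagged_rows = df[df["bpoint"]]

print(coeff_cols)
print(points_per_cluster)
```

Selecting columns by prefix or suffix keeps the code independent of the number of bins, which may differ between scans.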


### Loading the metadata

The metadata file records how the data was produced, for example the clustering settings, the error settings and the git state of the code. The output file uses the JSON format:

In [28]:
!head -n 100 output/cluster/tutorial_basics_metadata.json

{
    "bpoint": {
        "bpoint": {
            "cluster_column": {},
            "metric": {
                "args": [],
                "kwargs": {}
            }
        }
    },
    "cluster": {
        "cluster": {
            "cluster_args": {
                "max_d": 0.04
            },
            "git": {
                "branch": "HEAD",
                "msg": "minor changes",
                "sha": "409ae62cf072089977af162c52b96ff2dbaa6733",
                "time": "Mon 25 Mar 2019 14:58"
            },
            "hierarchy": {
                "method": "complete",
                "optimal_ordering": false
            },
            "metric": {
                "args": [],
                "kwargs": {}
            },
            "n_clusters": 4,
            "time": "Mon 25 Mar 2019 15:10"
        }
    },
    "errors": {
        "abs_cov": null,
        "poisson": false,
        "rel_cov": null
    },
    "scan": {
        "dfunction"

If you want to load this directly without relying on the ``Scanner`` or ``Cluster`` class, you can do so as follows:

In [25]:
with open("output/cluster/tutorial_basics_metadata.json") as inputfile:
    metadata = json.load(inputfile)

Now you can use ``metadata`` like a nested Python dictionary, e.g. ``metadata["cluster"]`` contains the information about the clustering and ``metadata["scan"]["dfunction"]`` the information about the function that was used to generate the distributions.
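
To get an overview of a nested dictionary like this, it can help to list all key paths once. The sketch below is only an illustration: ``iter_key_paths`` is a hypothetical helper, and the JSON string is a shortened excerpt of the output printed above, so that the example runs without the file from the previous tutorial.

```python
import json

# Shortened excerpt of the JSON output shown above.
# With the real file, use json.load() on the opened file instead.
metadata_text = """
{
    "cluster": {
        "cluster": {
            "cluster_args": {"max_d": 0.04},
            "hierarchy": {"method": "complete", "optimal_ordering": false},
            "n_clusters": 4
        }
    },
    "errors": {"abs_cov": null, "poisson": false, "rel_cov": null}
}
"""
metadata = json.loads(metadata_text)


def iter_key_paths(dct, prefix=()):
    """Hypothetical helper: yield the key path of every leaf value."""
    for key, value in dct.items():
        path = prefix + (key,)
        if isinstance(value, dict) and value:
            # Descend into non-empty sub-dictionaries
            yield from iter_key_paths(value, path)
        else:
            yield path


paths = list(iter_key_paths(metadata))
for path in paths:
    print(" > ".join(path))

# Access single values by chaining keys
cluster_md = metadata["cluster"]["cluster"]
n_clusters = cluster_md["n_clusters"]
max_d = cluster_md["cluster_args"]["max_d"]

# Use .get() for sections that may be absent (here: "git" is not in the excerpt)
git_md = cluster_md.get("git", {})
```

Note that JSON ``null`` and ``false`` become ``None`` and ``False`` in Python.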

## Cluster class

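The output of the ``Cluster`` class works in basically the same way: the files have the same structure and can be loaded as shown above. The ``Cluster`` class just adds more metadata.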