# Basic Tutorial

This tutorial shows how to use this software from your own Python project or Jupyter notebook.
There is also a command line interface that lets you do the same with just two commands.

**NOTE FOR CONTRIBUTORS: Always clear all output before committing (``Cell`` > ``All Output`` > ``Clear``)**!

In [None]:
# Magic
%matplotlib inline
# Reload modules whenever they change
%load_ext autoreload
%autoreload 2

# Make bclustering package available even without installation
import sys
sys.path = ["../../"] + sys.path

In [None]:
import numpy as np
import flavio
import functools

In [None]:
import bclustering.physics.models.bdlnu.distribution as bdlnu

## Scanning

### Setting it up

In [None]:
from bclustering.scan import Scanner

Let's set up a scanner object and configure it.

In [None]:
s = Scanner()

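First we set up the function/distribution that we want to consider. Here we look at the differential decay rate with respect to $q^2$ of $B\longrightarrow D \tau \bar\nu_\tau$. Two ways of providing it are shown in the next two cells: the commented-out cell uses ``bdlnu.dGq2``, the active one uses ``flavio.np_prediction``.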

In [None]:
# s.set_dfunction(
#     bdlnu.dGq2,
#     binning=np.linspace(bdlnu.q2min, bdlnu.q2max, 3),
#     normalize=True
# )

In [None]:
s.set_dfunction(
    functools.partial(flavio.np_prediction, "dBR/dq2(B+->Dtaunu)"),
    binning=np.linspace(bdlnu.q2min, bdlnu.q2max, 3),
    normalize=True
)
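
To make the three arguments above more concrete, here is a minimal, self-contained sketch. ``toy_prediction`` and its coefficient dictionary are hypothetical stand-ins (they are not part of ``flavio`` or ``bclustering``), and integrating the function over each bin is only an assumption about how bin contents are obtained. The sketch shows that ``functools.partial`` fixes the observable name, that 3 bin edges define 2 bins, and what normalizing the bin contents means.

```python
import functools


def toy_prediction(observable, coefficients, q2):
    """Hypothetical stand-in for a prediction function such as flavio.np_prediction."""
    # Made-up q2 dependence, used only so that there is something to evaluate.
    return coefficients["scale"] * q2 * (10.0 - q2)


# Fix the first argument (the observable name); the result takes (coefficients, q2).
dfunction = functools.partial(toy_prediction, "toy observable")

# These are the values np.linspace(0.0, 10.0, 3) returns: 3 edges, hence 2 bins.
edges = [0.0, 5.0, 10.0]
bins = list(zip(edges[:-1], edges[1:]))


def integrate(f, lo, hi, steps=1000):
    """Midpoint-rule integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width


coefficients = {"scale": 2.0}  # hypothetical parameter point
contents = [
    integrate(lambda q2: dfunction(coefficients, q2), lo, hi) for lo, hi in bins
]

# Assumed meaning of normalize=True: the bin contents are rescaled to sum to 1.
total = sum(contents)
normalized = [c / total for c in contents]
print(bins)
print(normalized)
```

With normalization, only the shape of the distribution matters, so the overall factor ``scale`` drops out of ``normalized``.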

In [None]:
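from wilson import Wilson

# Example: a Wilson object with non-zero new physics coefficients
w = Wilson(
    {
        "CVL_bctaunutau": 10,
        "CSL_bctaunutau": 1,
        "CT_bctaunutau": 100
    },
    scale=5,
    eft='WET',
    basis='flavio'
)
# A Wilson object without any new physics coefficients
q = Wilson({}, scale=5, eft='WET', basis='flavio')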

Next, let's set up the Wilson coefficients (the "benchmark points") that need to be sampled. The Wilson coefficients are implemented using the Wilson package (https://wilson-eft.github.io/ ), which supports a variety of bases and EFTs and can match them to user-specified scales.
Using the example of $B\longrightarrow D \tau \bar\nu_\tau$, we sample the coefficients ``CVL_bctaunutau``, ``CSL_bctaunutau`` and ``CT_bctaunutau`` from the ``flavio`` basis with 4 equidistant points each between $-1$ and $1$:

In [None]:
s.set_bpoints_equidist(
    {
        "CVL_bctaunutau": (-1, 1, 4),
        "CSL_bctaunutau": (-1, 1, 4),
        "CT_bctaunutau": (-1, 1, 4)
    },
    scale=5,
    eft='WET',
    basis='flavio'
)
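
As a rough illustration of what such a grid looks like, the following sketch builds it by hand. It assumes that "equidistant" means evenly spaced values including both endpoints (as ``np.linspace`` would produce) and that every combination of the three coefficients becomes one benchmark point; ``equidistant`` is a hypothetical helper, not a ``bclustering`` function.

```python
import itertools


def equidistant(start, stop, num):
    """Hypothetical helper: num evenly spaced values from start to stop, inclusive."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


# Same (min, max, number of points) specification as in the cell above.
ranges = {
    "CVL_bctaunutau": (-1, 1, 4),
    "CSL_bctaunutau": (-1, 1, 4),
    "CT_bctaunutau": (-1, 1, 4),
}

# One list of values per coefficient ...
axes = {name: equidistant(*spec) for name, spec in ranges.items()}

# ... and the Cartesian product of these lists as the list of benchmark points.
bpoints = [
    dict(zip(axes, values)) for values in itertools.product(*axes.values())
]
print(len(bpoints))
print(bpoints[0])
```

Under these assumptions the scan covers $4 \times 4 \times 4 = 64$ benchmark points, so the number of points grows quickly with the number of coefficients and the points per coefficient.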

### Running it

In [None]:
# Run the scan with at most 3 workers
s.run(no_workers=3)

The results are saved in a dataframe, ``Scanner.df``. Let's have a look:

In [None]:
s.df

The configuration of the scanner is saved in a metadata object, which is a nested dictionary of config items.
For example, this lets us quickly look up the number of bins in $q^2$ later:

In [None]:
s.metadata["scan"]

The metadata also contains information about the source code version you're using (git hash, commit messages etc.).

### Output files

Now it's time to write out the results for later use.

In [None]:
# Write out results
s.write("output/scan/tutorial_basics")

This has created two files. ``output/scan/tutorial_basics_data.csv`` contains the data that we saw previously as a pandas dataframe, in CSV format:

In [None]:
!head output/scan/tutorial_basics_data.csv

The other one contains the configuration in JSON format:

In [None]:
!head -n 20 output/scan/tutorial_basics_metadata.json
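
Since both files use standard formats, they can also be read without the package. The sketch below uses only the Python standard library and writes small stand-in files to a temporary directory first, so that it runs on its own. The column names and the metadata keys below ``"scan"`` are hypothetical and do not reflect the real file contents; ``pandas.read_csv`` would work just as well for the CSV file.

```python
import csv
import json
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Same naming pattern as above: <base>_data.csv and <base>_metadata.json
    base = Path(tmp) / "tutorial_basics"
    data_path = Path(f"{base}_data.csv")
    metadata_path = Path(f"{base}_metadata.json")

    # Stand-in files with made-up content (hypothetical columns and keys).
    data_path.write_text("index,CVL,bin0,bin1\n0,-1.0,0.4,0.6\n1,1.0,0.7,0.3\n")
    metadata_path.write_text(json.dumps({"scan": {"dfunction": {"nbins": 2}}}))

    # Read the data file row by row; all values come back as strings.
    with data_path.open(newline="") as f:
        rows = list(csv.DictReader(f))

    # The metadata file loads directly into a nested dictionary.
    with metadata_path.open() as f:
        metadata = json.load(f)

nbins = metadata["scan"]["dfunction"]["nbins"]
print(rows[0])
print(nbins)
```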

## Clustering

### Setting it up

In [None]:
from bclustering.cluster import HierarchyCluster

In [None]:
c = HierarchyCluster("output/scan/tutorial_basics")

This has loaded the results from the previous step. The data is again the same dataframe as before:

In [None]:
c.df.head()

Similarly, the cluster object also contains the previous metadata:

In [None]:
c.metadata["scan"]

### Running it 

In [None]:
c.build_hierarchy()

In [None]:
c.cluster(max_d=0.2)
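
To give an idea of what a distance cut such as ``max_d`` does in hierarchical clustering, here is a small pure-Python sketch. It is not the implementation used by ``HierarchyCluster``: the Euclidean metric, the complete-linkage rule and the toy histograms are all assumptions made for illustration. Clusters are merged pairwise, closest first, until the closest remaining pair is further apart than ``max_d``.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two histograms (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_distance(c1, c2, points):
    """Complete linkage (assumed): largest distance between members of two clusters."""
    return max(euclidean(points[i], points[j]) for i in c1 for j in c2)


def hierarchical_clusters(points, max_d):
    """Merge the closest clusters until their distance exceeds max_d."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = cluster_distance(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > max_d:
            break  # all remaining clusters are further apart than the cut
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    # Turn the list of clusters into one label per point, starting at 1.
    labels = [0] * len(points)
    for label, members in enumerate(clusters, start=1):
        for i in members:
            labels[i] = label
    return labels


# Toy normalized two-bin histograms (made-up numbers).
histograms = [
    (0.50, 0.50),
    (0.52, 0.48),
    (0.90, 0.10),
    (0.88, 0.12),
    (0.10, 0.90),
]
labels = hierarchical_clusters(histograms, max_d=0.2)
print(labels)  # → [1, 1, 2, 2, 3]
```

A smaller ``max_d`` gives more and tighter clusters, a larger one gives fewer and broader clusters.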

The cluster numbers are directly added as a new column to the dataframe:

In [None]:
c.df.head()

In [None]:
c.write("output/cluster/tutorial_basics")