# CONTRE: Example

**Run this notebook in the folder `contre/example`.**  
This is a simple, abstract example that demonstrates how to use CONTRE (Continuum Reweighting).

In [None]:
import json
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from root_pandas import read_root
from contre.example.generate_data import generate_data

## Generation of Files

In this abstract example, we have the following components and variables:  
- `componentA` (analogous to Continuum MC),
- `componentB` (only on resonance),
- `variable1` (badly simulated for `componentA`),
- `variable2`,
- `__candidate__` (only candidates with `__candidate__ == 0` are selected),
- `EventType` (defines "signal" and "background" for the classifier).

In the following cell these samples are generated and stored in `example_input`.

In [None]:
size_mc=500000
size_data=10000
size_mc_offres=150000
size_data_offres=8000
frac_a=0.8

data, componentA, componentB, data_offres, componentA_offres = generate_data(
    size_mc, size_data, size_mc_offres, size_data_offres, frac_a)

### Histogram of the example data:  
You can also look at the other variables.

In [None]:
variable="variable1"
# variable="variable2"

In [None]:
# Scaling the MC to match the data
w = size_data / size_mc
w_offres = size_data_offres / size_mc_offres
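
As a quick sanity check of this scaling, the sketch below uses only the sample sizes defined earlier. It assumes that the on-resonance MC components together contain `size_mc` events and that every MC event carries the same constant weight; under these assumptions the weighted MC yield equals the number of data events.

```python
import math

# Sample sizes as defined in the generation cell of this notebook
size_mc = 500000
size_data = 10000
size_mc_offres = 150000
size_data_offres = 8000

# Constant per-event weights that scale the MC to the data
w = size_data / size_mc
w_offres = size_data_offres / size_mc_offres

# Summing a constant weight over all MC events gives the weighted yield
# (assumption: all on-resonance MC components together have size_mc events)
weighted_yield = w * size_mc
weighted_yield_offres = w_offres * size_mc_offres

print(math.isclose(weighted_yield, size_data))
print(math.isclose(weighted_yield_offres, size_data_offres))
```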

In [None]:
fig, ax = plt.subplots(1, 2, figsize=[12.8, 4.8])

# on-resonance histogram
count, edges = np.histogram(
    data[variable], bins=30, range=(0, 1))

bin_width = edges[1] - edges[0]
# bin centres: left edge plus half a bin width
bin_mids = edges[:-1] + bin_width / 2
ax[0].plot(
    bin_mids, count, color="black", marker='.', ls="",
    label="data")

ax[0].hist(
    [componentA[variable], componentB[variable]],
    bins=30, range=(0, 1), stacked=True,
    weights=[[w]*len(componentA), [w]*len(componentB)],
    label=["componentA", "componentB"])

ax[0].set_title("On resonance")
ax[0].legend()

# off-resonance histogram
count, edges = np.histogram(
    data_offres[variable], bins=30, range=(0, 1))
ax[1].plot(
    bin_mids, count, color="black", marker='.', ls="")

ax[1].hist(
    componentA_offres[variable], bins=30, range=(0, 1),
    weights=[w_offres]*len(componentA_offres))

ax[1].set_title("Off resonance")

plt.show()

`componentA` disagrees strongly with the data. It represents the Continuum MC and will be reweighted in the following.

## Setting the parameters

To start the training you need to set the parameters by writing them to a `yaml` file.
You can look at the example file `example_parameters.yaml`.

### `example_parameters.yaml` 

The file contains:

```yaml
# you can find your results in <result_path>/name=<name>
name: my_example
result_path: example_output

# path to all off-resonance data and MC ntuple files
off_res_files: 
    - example_input/data_offres.root
    - example_input/componentA_offres.root

# path to on-resonance MC to be reweighted (i.e. Continuum)
on_res_files:
    - example_input/componentA.root

# name of the tree in the ntuple root file
tree_name: variables

# List of the variables used for training
training_variables:
    - variable1

# to adjust the parameters of the training
training_parameters: 
    train_size: 0.9
    test_size: 0.1
    # the following variables change the fastBDT hyperparameters
    # they can be removed
    nTrees: 100
    shrinkage: 0.2
    nLevels: 3
```

### Additional Comments

1. Files:
    - All files given in `off_res_files` will be used for training.
    - The training will be applied to __all `on_res_files`__, i.e. weights will be calculated for each of them. You only need to give the MC components that also exist in the off-resonance MC (in this example, only `componentA`).
2. In `training_parameters` you can define:
    - `test_size` and `train_size`: your ntuple files will be split into a train and a test sample, e.g. with 90% of the events in the train sample and 10% in the test sample.
    - Hyperparameters of the BDT (these options can be removed).

3. `training_variables`: the variables used for training. They should be event-based. If you use other (non-event-based) variables, be aware that the program always selects `__candidate__ == 0` for training.

4. Normalisation of the weights:
    - The reweighted MC sample will correspond to the luminosity of the off-resonance data used.
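
The points above can be turned into a small pre-flight check on the loaded parameters. The following is only a sketch: `check_parameters` and `REQUIRED_KEYS` are hypothetical names, not part of CONTRE, and which keys CONTRE actually requires is an assumption based on the example file.

```python
# Hypothetical helper, not part of CONTRE: a plain-dict sanity check
# for parameters loaded from a file like example_parameters.yaml.

# Assumption: these top-level keys are taken from the example file above
REQUIRED_KEYS = (
    "name", "result_path", "off_res_files",
    "on_res_files", "tree_name", "training_variables",
)


def check_parameters(parameters):
    """Return a list of problems found; an empty list means none."""
    problems = [
        f"missing key: {key}"
        for key in REQUIRED_KEYS if key not in parameters
    ]
    training = parameters.get("training_parameters", {})
    train_size = training.get("train_size")
    test_size = training.get("test_size")
    # The two fractions split one sample, so together they cannot exceed 1
    if train_size is not None and test_size is not None:
        if train_size + test_size > 1.0 + 1e-9:
            problems.append("train_size + test_size exceeds 1")
    return problems


example = {
    "name": "my_example",
    "result_path": "example_output",
    "off_res_files": ["example_input/data_offres.root",
                      "example_input/componentA_offres.root"],
    "on_res_files": ["example_input/componentA.root"],
    "tree_name": "variables",
    "training_variables": ["variable1"],
    "training_parameters": {"train_size": 0.9, "test_size": 0.1},
}
print(check_parameters(example))
```

Returning a list of problems rather than raising on the first one lets you fix all issues in the file in one pass.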

## Starting the training
The training is implemented with `b2luigi` and can be started with the following run file.

`run_example.py` contains:  

```python
import yaml
import b2luigi
from contre.reweighting import DelegateReweighting

parameter_file = 'example_parameters.yaml'
with open(parameter_file) as f:
    parameters = yaml.safe_load(f)

b2luigi.set_setting(
    "result_path",
    parameters.get("result_path"),
)

b2luigi.process(
    DelegateReweighting(
        name=parameters.get("name"),
        parameter_file=parameter_file)
)

```

If your input files have changed and you want to rerun the training, remove the old output first:

In [None]:
! rm -r example_output/

In [None]:
%run run_example.py

## Finding and using the output
Output files of the reweighted test samples are listed in `<result_path>/name=<name>/validation_results.json`.  
Weights for the on-resonance files are listed in the same folder in the file `results.json`.
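
The output locations follow the pattern `<result_path>/name=<name>/` given in the comment of the parameter file. A minimal sketch of building these paths from the parameters instead of hard-coding them; `result_file` is a hypothetical helper, not part of CONTRE, and the dictionary stands in for the loaded YAML file.

```python
from pathlib import PurePosixPath


def result_file(result_path, name, filename):
    """Build '<result_path>/name=<name>/<filename>' as a string."""
    return str(PurePosixPath(result_path) / f"name={name}" / filename)


# Values copied from example_parameters.yaml
parameters = {"result_path": "example_output", "name": "my_example"}

validation_path = result_file(
    parameters["result_path"], parameters["name"],
    "validation_results.json")
weights_path = result_file(
    parameters["result_path"], parameters["name"], "results.json")

print(validation_path)
print(weights_path)
```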

In [None]:
with open("example_output/name=my_example/validation_results.json", "r") as f:
    validation_results = json.load(f)
test_samples = [read_root(sample) for sample in validation_results["test_samples"]]
validation_weights = read_root(validation_results["validation_weights"])

In [None]:
with open("example_output/name=my_example/results.json", "r") as f:
    results = json.load(f)
weights = read_root(results["weights"])

### Weights

- Stored in one file,
- ordered in the same order as the list of the on-resonance files (or test samples),
- for validation weights, the first part belongs to the off-resonance __data__ test samples,
- the reweighted MC samples correspond to the luminosity of the off-resonance data used,
- contain three columns:
    - `q`: classifier output
    - `EventType`: 0 for MC, 1 for data
    - `weight`: the weight belonging to the corresponding event in the reweighted sample
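
Because the weights of several samples are stored one after another in a single file, they have to be cut back into per-sample pieces by length. The sketch below illustrates this with a toy list; `split_by_sample` is a hypothetical helper, not part of CONTRE, and the toy numbers are made up for illustration only.

```python
from itertools import accumulate


def split_by_sample(weights, sample_lengths):
    """Cut a concatenated sequence into consecutive per-sample chunks."""
    if sum(sample_lengths) != len(weights):
        raise ValueError("sample lengths do not add up to the number of weights")
    stops = list(accumulate(sample_lengths))
    starts = [0] + stops[:-1]
    return [weights[start:stop] for start, stop in zip(starts, stops)]


# Toy input: 3 entries for a data test sample, then 2 for an MC test sample
toy_weights = [1.0, 1.0, 1.0, 0.5, 2.0]
data_part, mc_part = split_by_sample(toy_weights, [3, 2])

print(data_part)
print(mc_part)
```

The helper only uses `len` and positional slicing, so the same idea applies to the `weight` column converted to a NumPy array.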

In [None]:
weights.head()

In [None]:
data_offres_test = test_samples[0]
componentA_offres_test = test_samples[1]

In [None]:
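# Validation weights: the first len(data_offres_test) rows belong to the
# off-resonance data test sample, the remaining rows to the MC test sample
componentA_offres_test["contre_weight"] = (
    validation_weights["weight"].values[len(data_offres_test):])

# On-resonance weights: only componentA was listed in on_res_files
componentA["contre_weight"] = weights["weight"].values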

### Scaling of the weights
The reweighted test samples match the luminosity of the off-resonance data and don't need to be scaled anymore.

The reweighted on-resonance MC sample has the integrated luminosity   
$$L_{\mathrm{data,\,off\text{-}res.}}\cdot \frac{L_{\mathrm{MC,\,on\text{-}res.}}}{L_{\mathrm{MC,\,off\text{-}res.}}}\quad,$$ 
and needs to be scaled to match the on-resonance data.
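
As a worked example, the scale factor can be computed from the sample sizes of this notebook. The sketch assumes that the sample sizes are proportional to the luminosities, which holds for this toy example but is an assumption in general.

```python
# Sample sizes from the generation cell, used as stand-ins for luminosities
size_mc = 500000
size_data = 10000
size_mc_offres = 150000
size_data_offres = 8000

# Luminosity of the reweighted on-resonance MC:
# L_data,off-res. * L_MC,on-res. / L_MC,off-res.
lumi_reweighted = size_data_offres * size_mc / size_mc_offres

# Factor that brings the reweighted MC to the on-resonance data luminosity
scale = size_data / lumi_reweighted

# Same factor written as a product of two ratios
scale_as_ratios = size_data / size_data_offres * size_mc_offres / size_mc

print(round(scale, 6), round(scale_as_ratios, 6))
```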

In [None]:
componentA["contre_weight"] *= size_data / size_data_offres * size_mc_offres / size_mc

In [None]:
fig, ax = plt.subplots(1, 2, figsize=[12.8, 4.8])

# on-resonance histogram
count, edges = np.histogram(
    data[variable], bins=30, range=(0, 1))

bin_width = (edges[1] - edges[0]) / 2
bin_mids = edges[:-1]+bin_width
ax[0].plot(
    bin_mids, count, color="black", marker='.', ls="",
    label="data")

w = size_data/size_mc
ax[0].hist(
    [componentA[variable], componentB[variable]],
    bins=30, range=(0, 1), stacked=True,
    weights=[componentA["contre_weight"], [w]*len(componentB)],
    label=["componentA\n(reweighted)", "componentB"])

ax[0].set_title("On resonance")
ax[0].legend()

# off-resonance histogram
count, edges = np.histogram(
    data_offres_test[variable], bins=30, range=(0, 1))
ax[1].plot(
    bin_mids, count, color="black", marker='.', ls="")

ax[1].hist(
    componentA_offres_test[variable], bins=30, range=(0, 1),
    weights=componentA_offres_test["contre_weight"],
)

ax[1].set_title("Off resonance, test samples")

plt.show()