# CONTRE: Example

This is a simple, abstract example that shows how to use CONTRE (Continuum Reweighting).

In [None]:
import json

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from root_pandas import read_root

## Generation of Files

In this abstract example, we have the following components and variables:
- `componentA` (analogous to continuum MC),
- `componentB` (only present on resonance),
- `variable1` (badly simulated for `componentA`),
- `variable2`,
- `__candidate__` (only candidates with `__candidate__ == 0` are selected),
- `EventType` (defines "signal" and "background" for the classifier).

In the following cell these samples are generated and stored in `example_input`.

In [None]:
size_mc = 5e6
size_data = 5e5
frac_a = 0.8
frac_b = 1 - frac_a
frac_offres = 0.5

%run -i generate_data.py
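The `__candidate__ == 0` selection listed above keeps exactly one row per event. A minimal pandas sketch on a hypothetical toy table; the `__event__` column and all values are made up for illustration and are not produced by `generate_data.py`:

```python
import pandas as pd

# Hypothetical toy ntuple: two events with several candidates each.
toy = pd.DataFrame({
    "__event__": [1, 1, 1, 2, 2],      # assumed event identifier, illustration only
    "__candidate__": [0, 1, 2, 0, 1],
    "variable1": [0.1, 0.4, 0.7, 0.3, 0.9],
    "EventType": [0, 0, 0, 1, 1],
})

# Keep only the first candidate of every event, i.e. one row per event.
first_candidates = toy[toy["__candidate__"] == 0]
print(first_candidates)
```

Because only one candidate per event survives, event-based variables lose no information through this selection, while candidate-based variables are only seen for the first candidate.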

In [None]:
print(size_data*frac_a*0.7)

In [None]:
componentA.head()

### Histogram of the example data:  
You can also look at the other variables by changing `variable` in the next cell.

In [None]:
variable="variable2"
plot_histograms(variable)  # this function is defined in generate_data.py

## Setting the parameters

To start the training, you need to write the parameters to a JSON file.
The example file `example_parameters.json` is explained in the following.

1. Define the `"name"` and and `"result_path"` parameters. You can find a list of your results in `result_path/name=<name>/`
1. Define the paths of your on- and off-resonance files as lists.
    - All files given in `off_res_files` will be used for training.  
    - Files the training will be applied to all `on_res_files` and weights are calculated. In principle, only `componentA` is reweighted, but you can compare the classifier output between on resonance data and MC if you include `data`. `componentB` is not needed.
2. In `"training_parameters"` you can define:
    - `test_size` and `train_size`. Your ntuple files will be split into a test- and a train sample with e.g. 90% data in the train- and 10% data in the test sample.
    - `training_variables`. The variables used for training. The variables used should be eventbased. If you use other variables, be aware that the programm selects allways `__candidate__ == 0` for training.  
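To illustrate what `train_size` and `test_size` mean, here is a sketch of a random 90/10 split with NumPy and pandas. This is not CONTRE's own splitting code; the function name `split_train_test` and the fixed seed are assumptions for the example:

```python
import numpy as np
import pandas as pd


def split_train_test(df, train_size=0.9, seed=42):
    """Randomly split a DataFrame into disjoint train and test samples (illustrative only)."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(len(df))          # random order of row positions
    n_train = int(round(train_size * len(df)))   # number of rows in the train sample
    return df.iloc[shuffled[:n_train]], df.iloc[shuffled[n_train:]]


toy_sample = pd.DataFrame({"variable1": np.linspace(0, 1, 100)})
toy_train, toy_test = split_train_test(toy_sample, train_size=0.9)
print(len(toy_train), len(toy_test))  # 90 10
```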

`example_parameters.json` contains:
```json
{
    "name": "my_example",
    "result_path": "example_output",
    

    "off_res_files": [
        "example/data_offres.root",
        "example/componentA_offres.root"
    ],

    "training_parameters": {
        "train_size": 0.9,
        "test_size": 0.1,
        "training_variables": [
            "variable1"
        ]
    },

    "reweighting_parameters": {
        "normalize_to": 0
    }
}
```
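The parameter file can also be written from Python, which avoids JSON syntax errors such as missing commas. A sketch reproducing the example above; the output name `my_parameters.json` is an assumption (chosen so the example file is not overwritten), and `on_res_files` still has to be added with your own paths:

```python
import json

# Same content as example_parameters.json above.
# Add "on_res_files": [...] with your own on-resonance file paths.
parameters = {
    "name": "my_example",
    "result_path": "example_output",
    "off_res_files": [
        "example/data_offres.root",
        "example/componentA_offres.root",
    ],
    "training_parameters": {
        "train_size": 0.9,
        "test_size": 0.1,
        "training_variables": ["variable1"],
    },
    "reweighting_parameters": {"normalize_to": 0},
}

# Basic sanity checks before starting a long training.
tp = parameters["training_parameters"]
assert 0 < tp["train_size"] <= 1 and 0 < tp["test_size"] <= 1
assert tp["train_size"] + tp["test_size"] <= 1 + 1e-9, "train_size + test_size cannot exceed 1"

with open("my_parameters.json", "w") as f:
    json.dump(parameters, f, indent=4)
```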

## Starting the training
The training is implemented with `b2luigi`. Adapt the following run file to your setup, then run it to start the training.

`run_example.py` contains:  

```python
import json
import b2luigi
from contre.validation import DelegateValidation

parameter_file = 'example_parameters.json'
with open(parameter_file) as file:
    parameters = json.load(file)

b2luigi.set_setting(
    "result_path",
    parameters.get("result_path"),
)

b2luigi.process(
    DelegateValidation(
        name=parameters.get("name"),
        parameter_file=parameter_file)
)

```

`b2luigi` skips tasks whose output already exists, so remove your output if you want to rerun the training.

In [None]:
!rm -rf example_output/

In [None]:
%run run_example.py

## Finding and using the output
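All output files are stored in `<result_path>/name=<name>/`, here `example_output/name=my_example/`. The files `validation_results.json` and `results.json` in this directory list the paths of the test samples and of the weight files.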
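If you run several trainings, a small helper that builds the path from `result_path` and `name` avoids hard-coding it. A sketch assuming the `name=<name>` directory layout used in this example; `load_results` is a hypothetical helper, not part of CONTRE:

```python
import json
from pathlib import Path


def load_results(result_path, name, filename="results.json"):
    """Load a results JSON from <result_path>/name=<name>/ (layout as in this example)."""
    path = Path(result_path) / f"name={name}" / filename
    with open(path) as f:
        return json.load(f)


# Usage, after the training has finished:
# results = load_results("example_output", "my_example")
# validation_results = load_results("example_output", "my_example", "validation_results.json")
```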

In [None]:
with open("example_output/name=my_example/validation_results.json", "r") as f:
    results = json.load(f)
print(results)
test_samples = [read_root(sample) for sample in results["test_samples"]]
print(results["validation_weights"])
validation_weights = read_root(results["validation_weights"])

In [None]:
with open("example_output/name=my_example/results.json", "r") as f:
    results = json.load(f)
print(results)
print(results["weights"])
weights = read_root(results["weights"])

In [None]:
results

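The weights are stored in a single file, with the rows in the same order as the corresponding list of samples: the validation weights follow the test samples of the `off_res_files`, and the weights in `results["weights"]` follow the `on_res_files`. In this example, the first rows belong to the data sample and are not needed for reweighting. All remaining rows belong to `componentA` (for the validation weights, to its off-resonance test sample).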
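With more than two samples, slicing by hand gets error-prone. A sketch that splits a concatenated weight column into one array per sample, assuming the rows are ordered like the sample list as described above; `split_weights` is a hypothetical helper:

```python
import numpy as np
import pandas as pd


def split_weights(weights, sample_lengths):
    """Split a concatenated weight column into one array per input sample."""
    values = np.asarray(weights)
    if len(values) != sum(sample_lengths):
        raise ValueError("number of weights does not match the total sample length")
    boundaries = np.cumsum(sample_lengths)[:-1]  # start index of every sample after the first
    return np.split(values, boundaries)


# Toy example: 3 data rows followed by 4 componentA rows.
toy_weights = pd.DataFrame({"weight": [1.0, 1.0, 1.0, 0.5, 0.8, 1.2, 0.9]})
data_w, componentA_w = split_weights(toy_weights["weight"], [3, 4])
print(componentA_w)  # [0.5 0.8 1.2 0.9]
```

In this notebook, the equivalent call would be `split_weights(weights["weight"], [len(data), len(componentA)])`, assuming `on_res_files` lists the data file first and the `componentA` file second.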

In [None]:
# Sanity check: there should be one validation weight per test-sample row
print(len(validation_weights), sum(len(sample) for sample in test_samples))


In [None]:
data_offres_test = test_samples[0]
componentA_offres_test = test_samples[1]

In [None]:
print(len(data_offres_test), len(componentA_offres_test))

In [None]:
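# Drop the rows of the data test sample; the remaining rows belong to componentA (off resonance).
componentA_offres_test["contre_weight"] = validation_weights["weight"].values[len(data_offres_test):]

# Drop the rows of the on-resonance data; the remaining rows belong to componentA.
componentA["contre_weight"] = weights["weight"].values[len(data):]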

I wrote an import function for the weights in `import_results`.

### Normalizing the weights
The weights must be normalized to match the ratio of data to MC used in the training.

Because the same fraction (90%) of every sample was used for training, the data/MC ratio of the training samples equals that of the full samples, so we can normalize to `size_data/size_mc`.
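With `size_data = 5e5` and `size_mc = 5e6`, the target ratio is 0.1. The normalization rescales the weights so that their mean equals this ratio, which is what the commented-out line in the cell below does. A standalone sketch with made-up weights; `normalize_weights` is a hypothetical helper:

```python
import numpy as np


def normalize_weights(weights, target_mean):
    """Rescale weights so that their mean equals target_mean."""
    weights = np.asarray(weights, dtype=float)
    return weights * (target_mean / weights.mean())


example_weights = np.array([0.5, 1.0, 1.5, 2.0])  # made-up weights, mean 1.25
scaled = normalize_weights(example_weights, 5e5 / 5e6)
print(scaled.mean())
```

Only the overall scale changes; the relative weights between events, which carry the reweighting information, stay the same.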

In [None]:
componentA.head()

In [None]:
componentA_offres_test.head()

In [None]:
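# Compare the mean weight with the expected data/MC ratio
weight_mean = np.mean(componentA_offres_test["contre_weight"])
print(weight_mean)
print(size_data / size_mc)
# Uncomment to normalize the weights to the data/MC ratio:
# componentA_offres_test["contre_weight"] *= size_data / size_mc / weight_mean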

In [None]:
fig, ax = plt.subplots(1, 2, figsize=[12.8, 4.8])
bins, hist_range = 30, (0, 1)

# On resonance: data as points, reweighted componentA stacked with componentB
count, edges = np.histogram(data[variable], bins=bins, range=hist_range)
bin_mids = (edges[:-1] + edges[1:]) / 2
ax[0].plot(
    bin_mids, count, color="black", marker=".", ls="",
    label="data")

w = size_data / size_mc  # componentB is scaled by the data/MC size ratio
ax[0].hist(
    [componentA[variable], componentB[variable]],
    bins=bins, range=hist_range, stacked=True,
    weights=[componentA["contre_weight"], np.full(len(componentB), w)],
    label=["componentA\n(reweighted)", "componentB"])
ax[0].set_title("On resonance")
ax[0].set_xlabel(variable)
ax[0].legend()

# Off resonance, test samples: data vs. reweighted componentA (same binning as above)
count, _ = np.histogram(data_offres_test[variable], bins=bins, range=hist_range)
ax[1].plot(
    bin_mids, count, color="black", marker=".", ls="",
    label="data")
ax[1].hist(
    componentA_offres_test[variable], bins=bins, range=hist_range,
    weights=componentA_offres_test["contre_weight"],
    label="componentA\n(reweighted)")
ax[1].set_title("Off resonance, test samples")
ax[1].set_xlabel(variable)
ax[1].legend()

plt.show()
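To judge the agreement beyond the visual comparison, one can compute the per-bin data/MC ratio for the off-resonance test samples, together with the statistical uncertainty of the weighted MC (square root of the sum of squared weights per bin). A sketch with NumPy; `weighted_hist_ratio` is a hypothetical helper, not part of CONTRE:

```python
import numpy as np


def weighted_hist_ratio(data_values, mc_values, mc_weights, bins=30, hist_range=(0, 1)):
    """Per-bin data/MC ratio and the statistical uncertainty of the weighted MC histogram."""
    mc_weights = np.asarray(mc_weights, dtype=float)
    data_count, edges = np.histogram(data_values, bins=bins, range=hist_range)
    mc_count, _ = np.histogram(mc_values, bins=edges, weights=mc_weights)
    # Uncertainty of a weighted bin: sqrt of the sum of squared weights in that bin.
    mc_err = np.sqrt(np.histogram(mc_values, bins=edges, weights=np.square(mc_weights))[0])
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(mc_count > 0, data_count / mc_count, np.nan)  # NaN for empty MC bins
    return edges, ratio, mc_err
```

Usage in this notebook: `edges, ratio, mc_err = weighted_hist_ratio(data_offres_test[variable], componentA_offres_test[variable], componentA_offres_test["contre_weight"])`. A well-trained reweighting should give ratios compatible with 1 within the uncertainties.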