# Apply methods

In this notebook we show how to load the dataset saved in the previous notebook and apply a cut-based
analysis and a boosted decision tree to it. Finally, we use accuracy to evaluate their performance.

## Introduction to built-in methods

There are currently three categories of built-in methods: cuts, trees, and (neural) networks.

In a cut-based analysis, we apply a series of cuts to the input data to keep as many signal events
and as few background events as possible. Each cut removes background events but inevitably removes
some signal events as well. The goal is usually to find a set of cuts that maximizes the significance
$s/\sqrt{s+b}$, where $s$ and $b$ are the numbers of remaining signal and background events, respectively.
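
To illustrate the significance criterion, the sketch below scans a single one-sided cut $x > t$ on a toy sample and keeps the threshold with the largest $s/\sqrt{s+b}$. The Gaussian toy data, the threshold grid, and the helper name `significance` are assumptions made for illustration; this is not how `CutBasedAnalysis` is implemented.

```python
import numpy as np


def significance(s, b):
    """Return s / sqrt(s + b); defined as 0 when no events pass the cut."""
    total = s + b
    return s / np.sqrt(total) if total > 0 else 0.0


rng = np.random.default_rng(0)
# Hypothetical one-feature toy sample: the signal peaks at higher x than the background,
# and there are ten times more background events than signal events.
x_sig = rng.normal(loc=2.0, scale=1.0, size=1_000)
x_bkg = rng.normal(loc=0.0, scale=1.0, size=10_000)

thresholds = np.linspace(-3.0, 5.0, 161)
scores = []
for t in thresholds:
    s = np.count_nonzero(x_sig > t)  # signal events kept by the cut x > t
    b = np.count_nonzero(x_bkg > t)  # background events kept by the cut x > t
    scores.append(significance(s, b))

best = int(np.argmax(scores))
print(f"best cut: x > {thresholds[best]:.2f}, significance = {scores[best]:.1f}")
```

Tightening the cut lowers $s$ but lowers $b$ faster, so the significance first rises and then falls once too much signal is lost; the scan simply picks the peak.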

The boosted decision tree (BDT) is one of the tree methods. It is a machine learning method commonly used in
high energy physics. The name "boosted" comes from the idea of combining many weak classifiers into a strong
one.
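
To make the idea of boosting concrete, here is a small sketch using scikit-learn's `AdaBoostClassifier`, whose default weak learner is a depth-1 decision tree (a "stump"). It compares a single stump with an ensemble of boosted stumps on the toy `make_moons` dataset. AdaBoost is used only for illustration: this notebook does not say which boosting algorithm `BoostedDecisionTree` wraps, and the dataset and `n_estimators=200` are arbitrary choices.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

# Toy two-class problem with a curved decision boundary.
X, y = make_moons(n_samples=500, noise=0.2, random_state=0)

# A single weak classifier: one axis-aligned split.
stump = DecisionTreeClassifier(max_depth=1).fit(X, y)

# Many stumps combined by AdaBoost (its default weak learner is a depth-1 tree).
boosted = AdaBoostClassifier(n_estimators=200, random_state=0).fit(X, y)

stump_acc = accuracy_score(y, stump.predict(X))
boosted_acc = accuracy_score(y, boosted.predict(X))
print(f"single stump: {stump_acc:.3f}, boosted stumps: {boosted_acc:.3f}")
```

A single stump can only draw one straight, axis-aligned line, while the weighted vote of many boosted stumps forms a much more flexible boundary, so the ensemble should reach a noticeably higher accuracy.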

There are also neural network methods; they will be covered in the next version.

First, import the necessary packages.

In [1]:
from hml.datasets import Dataset
from hml.methods.cuts import CutBasedAnalysis
from hml.methods.trees import BoostedDecisionTree
import numpy as np
from sklearn.metrics import accuracy_score

To load the dataset, we use the class method `load` of the `Dataset` class. It takes the dataset
directory as input and returns a `Dataset` object.
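
A `load` class method is a common pattern: an alternative constructor that rebuilds an object from files on disk. The sketch below illustrates that pattern with a hypothetical `ToyDataset` class that stores `data.npy` and `target.npy` in a directory. It is not `hml`'s implementation; the class name, file names, and directory layout are assumptions made only for illustration.

```python
import tempfile
from pathlib import Path

import numpy as np


class ToyDataset:
    """Hypothetical stand-in for a dataset stored as a directory of .npy files."""

    def __init__(self, data, target):
        self.data = data
        self.target = target

    def save(self, dirpath):
        # Assumed layout: one .npy file for the features, one for the labels.
        path = Path(dirpath)
        path.mkdir(parents=True, exist_ok=True)
        np.save(path / "data.npy", self.data)
        np.save(path / "target.npy", self.target)

    @classmethod
    def load(cls, dirpath):
        # Alternative constructor: read the arrays back and build a new instance.
        path = Path(dirpath)
        return cls(np.load(path / "data.npy"), np.load(path / "target.npy"))


with tempfile.TemporaryDirectory() as tmp:
    original = ToyDataset(np.arange(6.0).reshape(3, 2), np.array([0, 1, 0]))
    original.save(Path(tmp) / "z_vs_qcd")
    restored = ToyDataset.load(Path(tmp) / "z_vs_qcd")

print(restored.data.shape, restored.target)
```

Because `load` is a class method, callers never construct an empty object first; they ask the class for a ready-to-use instance, as `Dataset.load("./data/z_vs_qcd")` does above.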

In [2]:
dataset = Dataset.load("./data/z_vs_qcd")

Next, we apply a boosted decision tree, which originally comes from the `scikit-learn` package. The `compile`
method takes a loss function name, an optimizer name, and a list of metrics as input. Here we call it without
arguments and keep the default parameters, as in `scikit-learn`.

In [3]:
method1 = BoostedDecisionTree()
method1.compile()  # no arguments: keep the default parameters
method1.fit(dataset.data, dataset.target)  # train on the features and their labels

In [4]:
# Predict on the same events used for fitting, so this is the training accuracy
y_true = dataset.target
y_pred = method1.predict(dataset.data)
accuracy_score(y_true, y_pred)

0.9834353173867358
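
Accuracy is simply the fraction of events whose predicted label matches the true label. The short sketch below, on made-up labels, checks `accuracy_score` against the equivalent NumPy expression.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Made-up labels for six events: 1 = signal, 0 = background.
y_true = np.array([1, 0, 1, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1])

acc_sklearn = accuracy_score(y_true, y_pred)
acc_manual = np.mean(y_true == y_pred)  # fraction of matching labels
print(acc_sklearn, acc_manual)
```

Here four of the six predictions are correct, so both expressions give 4/6.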

For the cut-based method, we use the `CutBasedAnalysis` class. The same workflow applies here as well.

In [5]:
method2 = CutBasedAnalysis()
method2.compile()  # no arguments: keep the default parameters
method2.fit(dataset.data, dataset.target)  # search for cuts using the labeled events

In [6]:
# Again evaluated on the training events
y_true = dataset.target
y_pred = method2.predict(dataset.data)
accuracy_score(y_true, y_pred)

0.7319364608443043
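
In the cells above, both methods are evaluated on the same events they were fitted on, so the reported numbers are training accuracies. A minimal sketch of a hold-out evaluation with scikit-learn's `train_test_split` is shown below. It uses synthetic data from `make_classification` and `GradientBoostingClassifier` as stand-ins, since it is an assumption here that `dataset.data` and `dataset.target` are array-like and could be split the same way before calling `fit`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a signal-vs-background dataset.
X, y = make_classification(n_samples=2_000, n_features=10, random_state=0)

# Keep 30% of the events aside; stratify so both splits have the same class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

clf = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
train_acc = accuracy_score(y_train, clf.predict(X_train))
test_acc = accuracy_score(y_test, clf.predict(X_test))
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
```

The test accuracy is the fairer estimate of performance on new events; a large gap between the two numbers is a sign of overfitting.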