# Tutorial on using TMVA methods

In this tutorial we will use the Boosted Decision Trees (BDT) method of the TMVA library, which is installed with ROOT, for a binary classification problem, and compare it with a Fisher discriminant. The input is a .root file with Higgs signal and background events.
First, we import the libraries we need:

In [1]:
from ROOT import TMVA, TFile, TTree, TCut
from subprocess import call
from os.path import isfile

Welcome to JupyROOT 6.11/02


Now we set up TMVA, specifying the output file and the analysis type of the TMVA factory.

In [2]:
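# Setup TMVA
TMVA.Tools.Instance()
# Initialise the Python-based (PyMVA) methods; not required for Fisher or BDT
TMVA.PyMethodBase.PyInitialize()

# Output file that will hold the training/test trees and the evaluation results
output = TFile.Open('TMVA.root', 'RECREATE')
factory = TMVA.Factory('TMVAClassification', output,
                       '!V:!Silent:Color:DrawProgressBar:AnalysisType=Classification')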

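TMVA options are passed as one colon-separated string: bare flags such as `Color`, negated flags such as `!V`, and `Key=Value` pairs. The small helper below is hypothetical (it is not part of TMVA); it is a sketch that assembles such strings from Python values and drops empty entries, so an accidental `::` cannot sneak in.

```python
def tmva_options(flags=(), **settings):
    """Hypothetical helper: join flags and Key=Value pairs with ':' like TMVA option strings."""
    parts = list(flags)
    parts += [f"{key}={value}" for key, value in settings.items()]
    # Skip empty entries so no stray '::' separator ends up in the string.
    return ":".join(p for p in parts if p)


factory_opts = tmva_options(["!V", "!Silent", "Color", "DrawProgressBar"],
                            AnalysisType="Classification")
print(factory_opts)  # → !V:!Silent:Color:DrawProgressBar:AnalysisType=Classification
```

The resulting string could then be passed as the third argument of `TMVA.Factory`, exactly like the literal string above.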

Next, we open the input ROOT file. It contains two trees, one for the signal events and one for the background, and both hold the same variables. TMVA handles the class labels itself once we specify which tree is signal and which is background. Finally, we choose the number of events of each class used to train the models (3000 signal and 3000 background events).

In [3]:
data = TFile.Open('higgs_small.root')
signal = data.Get('TreeS')      # signal events
background = data.Get('TreeB')  # background events

dataloader = TMVA.DataLoader('dataset')
# Use every branch of the signal tree as an input variable
for branch in signal.GetListOfBranches():
    dataloader.AddVariable(branch.GetName())

# Register both trees with a global weight of 1.0
dataloader.AddSignalTree(signal, 1.0)
dataloader.AddBackgroundTree(background, 1.0)
# No selection cut; 3000 randomly chosen training events per class
dataloader.PrepareTrainingAndTestTree(TCut(''),
                                      'nTrain_Signal=3000:nTrain_Background=3000:SplitMode=Random:NormMode=NumEvents:!V')

DataSetInfo              : [dataset] : Added class "Signal"
                         : Add Tree TreeS of type Signal with 5296 events
DataSetInfo              : [dataset] : Added class "Background"
                         : Add Tree TreeB of type Background with 4703 events
                         : Dataset[dataset] : Class index : 0  name : Signal
                         : Dataset[dataset] : Class index : 1  name : Background

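With `SplitMode=Random`, the training events are picked at random from each tree. Assuming the default `nTest_Signal=0` and `nTest_Background=0`, which TMVA documents as "use all remaining events", the event counts in the log above leave 5296 − 3000 = 2296 signal and 4703 − 3000 = 1703 background test events. The sketch below only illustrates that idea with a shuffled index split; `random_split` is a hypothetical helper, not TMVA's implementation.

```python
import random


def random_split(n_events, n_train, seed=0):
    """Hypothetical helper: shuffle event indices, take n_train for training, rest for testing."""
    if n_train > n_events:
        raise ValueError("requested more training events than available")
    indices = list(range(n_events))
    random.Random(seed).shuffle(indices)
    return indices[:n_train], indices[n_train:]


# Event counts taken from the DataSetInfo output above
for name, n_total in [("Signal", 5296), ("Background", 4703)]:
    train, test = random_split(n_total, 3000)
    print(f"{name}: {len(train)} train, {len(test)} test")
# → Signal: 3000 train, 2296 test
# → Background: 3000 train, 1703 test
```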

---
The next step is booking the built-in methods we want to use: a Fisher linear discriminant and a BDT with 600 trees (`NTrees=600`), each with a maximum depth of 5 (`MaxDepth=5`). With the `VarTransform` option we can apply transformations to the input variables. Here we use decorrelation (`D`) and gaussianization (`G`); note that the two bookings list them in opposite orders (`D,G` for Fisher, `G,D` for the BDT). Other available transformations are normalisation (`N`), principal component analysis (`P`) and uniformisation (`U`).
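To make the two transformations concrete, here is a pure-Python sketch for two variables on toy data (not the Higgs sample). Following our reading of the TMVA Users Guide, decorrelation multiplies the inputs by the inverse square root of their covariance matrix, and gaussianization first maps each variable to a uniform distribution through its cumulative distribution and then applies the inverse Gaussian CDF. TMVA's own implementation differs in detail (the log below shows it builds `D` "with events from all classes"), so treat this as an illustration of the idea only; all function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean


def gaussianize(values):
    """Empirical CDF (uniformisation) followed by the inverse standard-normal CDF."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()  # mean 0, sigma 1
    out = [0.0] * n
    for rank, i in enumerate(order):
        u = (rank + 0.5) / n  # strictly inside (0, 1), so inv_cdf is defined
        out[i] = std_normal.inv_cdf(u)
    return out


def covariance(xs, ys):
    """Sample covariance of two equally long lists."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def decorrelate(xs, ys):
    """Multiply each (x, y) by the inverse symmetric square root of the 2x2 covariance."""
    a, b, c = covariance(xs, xs), covariance(xs, ys), covariance(ys, ys)
    s = math.sqrt(a * c - b * b)  # sqrt(det C)
    t = math.sqrt(a + c + 2 * s)  # sqrt(trace C + 2 sqrt(det C))
    # Closed form for 2x2 positive-definite matrices: sqrt(C) = (C + s*I) / t
    p, q, r = (a + s) / t, b / t, (c + s) / t
    det = p * r - q * q
    ip, iq, ir = r / det, -q / det, p / det  # entries of sqrt(C)^-1
    return ([ip * x + iq * y for x, y in zip(xs, ys)],
            [iq * x + ir * y for x, y in zip(xs, ys)])


# Toy data: two strongly correlated, skewed variables
rng = random.Random(42)
xs = [rng.expovariate(1.0) for _ in range(2000)]
ys = [x + 0.5 * rng.gauss(0.0, 1.0) for x in xs]

dx, dy = decorrelate(xs, ys)
print(abs(covariance(dx, dy)) < 1e-9)  # → True: the pair is now uncorrelated
gx = gaussianize(xs)
print(abs(mean(gx)) < 1e-6)            # → True: centred like a standard normal
```

Because each step changes the distributions the next step sees, applying `D` then `G` is not in general the same as `G` then `D`.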

In [4]:
factory.BookMethod(dataloader, TMVA.Types.kFisher, 'Fisher',
                   '!H:!V:Fisher:VarTransform=D,G')

factory.BookMethod(dataloader, TMVA.Types.kBDT, 'BDT',
                   '!H:!V:VarTransform=G,D:CreateMVAPdfs=True:PDFInterpolMVAPdf=Spline2:NTrees=600:MaxDepth=5')

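The Fisher method books a linear discriminant. As a sketch of the underlying idea on two toy variables (not the Higgs data), the Fisher direction is w = W⁻¹(μ_S − μ_B), where W is the sum of the signal and background covariance matrices; TMVA's response is, up to a constant offset and overall scale, the projection w · x. The helper names below are hypothetical.

```python
import random
from statistics import mean


def cov2(xs, ys):
    """Sample covariance matrix of two variables, returned as (sxx, sxy, syy)."""
    mx, my = mean(xs), mean(ys)
    n = len(xs) - 1
    sxx = sum((x - mx) ** 2 for x in xs) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    return sxx, sxy, syy


def fisher_coefficients(sig, bkg):
    """Fisher direction w = W^-1 (mu_S - mu_B) for samples given as (xs, ys)."""
    s, b = cov2(*sig), cov2(*bkg)
    # Within-class matrix W: sum of the signal and background covariance matrices
    wxx, wxy, wyy = s[0] + b[0], s[1] + b[1], s[2] + b[2]
    det = wxx * wyy - wxy * wxy
    dx = mean(sig[0]) - mean(bkg[0])
    dy = mean(sig[1]) - mean(bkg[1])
    # Solve W w = (dx, dy) with the explicit inverse of the 2x2 matrix
    return ((wyy * dx - wxy * dy) / det, (wxx * dy - wxy * dx) / det)


def response(w, x, y):
    """Linear discriminant: larger values are more signal-like."""
    return w[0] * x + w[1] * y


# Toy samples: signal shifted by +1 in both variables relative to background
rng = random.Random(7)
sig = ([rng.gauss(1, 1) for _ in range(2000)], [rng.gauss(1, 1) for _ in range(2000)])
bkg = ([rng.gauss(0, 1) for _ in range(2000)], [rng.gauss(0, 1) for _ in range(2000)])
w = fisher_coefficients(sig, bkg)
print(w)  # both coefficients close to 0.5, since W is roughly 2*I here
```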
<ROOT.TMVA::MethodBDT object ("BDT") at 0x562c34c62cb0>

Factory                  : Booking method: Fisher
                         : 
Fisher                   : [dataset] : Create Transformation "D" with events from all classes.
                         : 
                         : Transformation, Variable selection : 
                         : Input : variable 'lepton_pT' <---> Output : variable 'lepton_pT'
                         : Input : variable 'lepton_eta' <---> Output : variable 'lepton_eta'
                         : Input : variable 'lepton_phi' <---> Output : variable 'lepton_phi'
                         : Input : variable 'missing_energy_magnitude' <---> Output : variable 'missing_energy_magnitude'
                         : Input : variable 'missing_energy_phi' <---> Output : variable 'missing_energy_phi'
                         : Input : variable 'jet_1_pt' <---> Output : variable 'jet_1_pt'
                         : Input : variable 'jet_1_eta' <---> Output : variable 'jet_1_eta'
                         : Input

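Before training, it may help to see what boosting does. The sketch below is a toy AdaBoost over depth-1 trees ("stumps"), whereas the BDT booked above grows 600 trees of depth up to 5. Following our reading of the TMVA Users Guide, AdaBoost multiplies the weights of misclassified events by ((1 − err)/err)^β (β is the `AdaBoostBeta` option, 0.5 here as an assumed default) and the forest's response is the vote of all trees weighted by ln of that factor. All names and the toy events are hypothetical, and this is not TMVA's code.

```python
import math


def fit_stump(X, y, w):
    """Exhaustively pick the cut (feature, threshold, polarity) with the lowest weighted error."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            for pol in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (pol if xi[f] > thr else -pol) != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, pol)
    return best


def stump_predict(stump, x):
    """+1 (signal-like) or -1 (background-like) for a single event x."""
    _, f, thr, pol = stump
    return pol if x[f] > thr else -pol


def adaboost(X, y, n_trees=10, beta=0.5):
    """AdaBoost over stumps; labels y are +1 (signal) or -1 (background)."""
    w = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(n_trees):
        stump = fit_stump(X, y, w)
        err = max(stump[0] / sum(w), 1e-10)  # guard against division by zero for a perfect cut
        if err >= 0.5:
            break  # the best cut is no better than guessing
        alpha = ((1 - err) / err) ** beta
        ensemble.append((math.log(alpha), stump))
        # Boost the weights of misclassified events, then renormalise to sum 1
        w = [wi * alpha if stump_predict(stump, xi) != yi else wi
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def bdt_response(ensemble, x):
    """Weighted vote of all trees; positive values are signal-like."""
    return sum(weight * stump_predict(stump, x) for weight, stump in ensemble)


# Toy events with two input variables each
X = [(0.1, 0.2), (0.4, 0.9), (0.8, 0.7), (0.9, 0.1), (0.3, 0.5), (0.7, 0.3)]
y = [-1, -1, 1, 1, -1, 1]
ensemble = adaboost(X, y, n_trees=5)
print([1 if bdt_response(ensemble, x) > 0 else -1 for x in X])  # → [-1, -1, 1, 1, -1, 1]
```

These toy events are separable by a single cut, so every round picks the same stump; on realistic data the reweighting makes later trees focus on the events earlier trees got wrong.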
In [5]:
# Run training, test and evaluation
factory.TrainAllMethods()
factory.TestAllMethods()
factory.EvaluateAllMethods()


Factory                  : [1mTrain all methods[0m
Factory                  : [dataset] : Create Transformation "I" with events from all classes.
                         : 
                         : Transformation, Variable selection : 
                         : Input : variable 'lepton_pT' <---> Output : variable 'lepton_pT'
                         : Input : variable 'lepton_eta' <---> Output : variable 'lepton_eta'
                         : Input : variable 'lepton_phi' <---> Output : variable 'lepton_phi'
                         : Input : variable 'missing_energy_magnitude' <---> Output : variable 'missing_energy_magnitude'
                         : Input : variable 'missing_energy_phi' <---> Output : variable 'missing_energy_phi'
                         : Input : variable 'jet_1_pt' <---> Output : variable 'jet_1_pt'
                         : Input : variable 'jet_1_eta' <---> Output : variable 'jet_1_eta'
                         : Input : variable 'jet_1_phi' <---> Out

0%, time left: unknown
7%, time left: 0 sec
13%, time left: 0 sec
19%, time left: 0 sec
25%, time left: 0 sec
32%, time left: 0 sec
38%, time left: 0 sec
44%, time left: 0 sec
50%, time left: 0 sec
57%, time left: 0 sec
63%, time left: 0 sec
69%, time left: 0 sec
75%, time left: 0 sec
82%, time left: 0 sec
88%, time left: 0 sec
94%, time left: 0 sec
0%, time left: unknown
6%, time left: 11 sec
12%, time left: 9 sec
19%, time left: 7 sec
25%, time left: 6 sec
31%, time left: 6 sec
37%, time left: 6 sec
44%, time left: 5 sec
50%, time left: 4 sec
56%, time left: 4 sec
62%, time left: 3 sec
69%, time left: 3 sec
75%, time left: 2 sec
81%, time left: 1 sec
87%, time left: 1 sec
94%, time left: 0 sec
0%, time left: unknown
7%, time left: 1 sec
13%, time left: 0 sec
19%, time left: 0 sec
25%, time left: 1 sec
32%, time left: 1 sec
38%, time left: 1 sec
44%, time left: 1 sec
50%, time left: 0 sec
57%, time left: 0 sec
63%, time left: 0 sec
69%, time left: 0 sec
75%, time left: 0 sec
82%, time

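Now `TMVA.root` contains all the useful information about the analysis. Make sure the output file has been closed (for example with `output.Close()` in the notebook) so that everything is written to disk. To browse and save the plots, start a ROOT session in a terminal and run `TMVA::TMVAGui("TMVA.root")`.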
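One of the figures of merit TMVA reports in its evaluation, and one of the plots available in the GUI, is the ROC curve (background rejection versus signal efficiency) and the area under it. As a sketch of what that number means, the area equals the probability that a randomly chosen signal event gets a higher classifier score than a randomly chosen background event. The function below and its scores are hypothetical toy values, not results from this analysis.

```python
def roc_auc(signal_scores, background_scores):
    """Fraction of (signal, background) pairs ranked correctly; ties count one half."""
    wins = 0.0
    for s in signal_scores:
        for b in background_scores:
            if s > b:
                wins += 1.0
            elif s == b:
                wins += 0.5
    return wins / (len(signal_scores) * len(background_scores))


print(roc_auc([0.9, 0.8, 0.4], [0.1, 0.5, 0.3]))  # → 0.8888888888888888
```

This pairwise loop is O(n·m) and only meant for small toy samples; a value of 0.5 corresponds to random guessing and 1.0 to perfect separation.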