
<img src="tmva_logo.gif" height="20%" width="20%">

# TMVA Classification 

This notebook is a basic example for training and testing TMVA classifiers. 

## Declare Factory class

Create the Factory class. Later you can choose the methods
whose performance you'd like to investigate. 

The factory is the major TMVA object you have to interact with. Here is the list of parameters you need to pass

 - The first argument is the base of the name of all the output
weightfiles in the directory weight/ that will be created with the 
method parameters 

 - The second argument is the output file for the training results
  
 - The third argument is a string option defining some general configuration for the TMVA session. For example all TMVA output can be suppressed by removing the "!" (not) in front of the "Silent" argument in the option string

In [1]:
TMVA::Tools::Instance();


auto outputFile = TFile::Open("TMVA_ClassificationOutput.root", "RECREATE");

TMVA::Factory factory("TMVAClassification", outputFile,
                      "!V:ROC:!Silent:Color:!DrawProgressBar:AnalysisType=Classification" ); 

## Define the input dataset

Define input data file consisting of signal and background trees

In [2]:
//TString inputFileName = "http://root.cern.ch/files/tmva_example.root";
TString inputFileNameS = "RSG_C10_M500.root";
TString inputFileNameB = "ZtautauB_221.root";

auto inputFileS = TFile::Open( inputFileNameS );
auto inputFileB = TFile::Open( inputFileNameB );

// --- Register the training and test trees

TTree *signalTree     = (TTree*)inputFileS->Get("Nominal");
TTree *backgroundTree = (TTree*)inputFileB->Get("Nominal");


## Create DataLoader class

The next step is to declare the DataLoader class which provides the interface from TMVA to the input data 


In [3]:
TMVA::DataLoader * loader = new TMVA::DataLoader("dataset");

In [4]:
// global event weights per tree (see below for setting event-wise weights)
Double_t signalWeight     = 1.0;
Double_t backgroundWeight = 1.0;
   
// You can add an arbitrary number of signal or background trees
loader->AddSignalTree    ( signalTree,     signalWeight     );
loader->AddBackgroundTree( backgroundTree, backgroundWeight );


DataSetInfo              : [dataset] : Added class "Signal"
                         : Add Tree Nominal of type Signal with 476 events
DataSetInfo              : [dataset] : Added class "Background"
                         : Add Tree Nominal of type Background with 18 events


## Define input variables

Through the DataLoader we define the input variables that will be used for the MVA training.
Note that we can also use variable expressions, which can be parsed by *TTree::Draw( "expression" )*

In [5]:
signalTree->Print();

******************************************************************************
*Tree    :Nominal   : Nominal                                                *
*Entries :      476 : Total =          138479 bytes  File  Size =      79858 *
*        :          : Tree compression factor =   1.47                       *
******************************************************************************
*Br    0 :sample    : string                                                 *
*Entries :      476 : Total  Size=      14381 bytes  File Size  =       1075 *
*Baskets :        1 : Basket Size=      32000 bytes  Compression=  12.92     *
*............................................................................*
*Br    1 :EventWeight : EventWeight/F                                        *
*Entries :      476 : Total  Size=       2494 bytes  File Size  =       1787 *
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.11     *
*...................................................

In [6]:
loader->AddVariable( "EventWeight", "EventWeight", "units", 'F' );
loader->AddVariable( "EventNumber", "EventNumber", "units", 'l' );
loader->AddVariable( "NJets", "NJets", "units", 'I' );
loader->AddVariable( "NJetsbtagged", "NJetsbtagged", "units", 'I' );
loader->AddVariable( "Tau1Pt", "Tau1Pt", "units", 'F' );
loader->AddVariable( "Tau1Eta", "Tau1Eta", "units", 'F' );
loader->AddVariable( "Tau1Phi", "Tau1Phi", "units", 'F' );
loader->AddVariable( "Tau2Pt", "Tau2Pt", "units", 'F' );
loader->AddVariable( "Tau2Eta", "Tau2Eta", "units", 'F' );
loader->AddVariable( "Tau2Phi", "Tau2Phi", "units", 'F' );
loader->AddVariable( "diTauVisM", "diTauVisM", "units", 'F' );
loader->AddVariable( "diTauVisPt", "diTauVisPt", "units", 'F' );
loader->AddVariable( "diTauVisEta", "diTauVisEta", "units", 'F' );
loader->AddVariable( "diTauVisPhi", "diTauVisPhi", "units", 'F' );
loader->AddVariable( "diTauMMCM", "diTauMMCM", "units", 'F' );
loader->AddVariable( "diTauMMCPt", "diTauMMCPt", "units", 'F' );
loader->AddVariable( "diTauMMCEta", "diTauMMCEta", "units", 'F' );
loader->AddVariable( "diTauMMCPhi", "diTauMMCPhi", "units", 'F' );
loader->AddVariable( "diTauDR", "diTauDR", "units", 'F' );
loader->AddVariable( "diTauDEta", "diTauDEta", "units", 'F' );
loader->AddVariable( "diTauDPhi", "diTauDPhi", "units", 'F' );
loader->AddVariable( "Jet1Pt", "Jet1Pt", "units", 'F' );
loader->AddVariable( "Jet1Eta", "Jet1Eta", "units", 'F' );
loader->AddVariable( "Jet1Phi", "Jet1Phi", "units", 'F' );
loader->AddVariable( "Jet1M", "Jet1M", "units", 'F' );
loader->AddVariable( "Jet2Pt", "Jet2Pt", "units", 'F' );
loader->AddVariable( "Jet2Eta", "Jet2Eta", "units", 'F' );
loader->AddVariable( "Jet2Phi", "Jet2Phi", "units", 'F' );
loader->AddVariable( "Jet2M", "Jet2M", "units", 'F' );
loader->AddVariable( "diJetM", "diJetM", "units", 'F' );
loader->AddVariable( "diJetPt", "diJetPt", "units", 'F' );
loader->AddVariable( "diJetEta", "diJetEta", "units", 'F' );
loader->AddVariable( "diJetPhi", "diJetPhi", "units", 'F' );
loader->AddVariable( "diJetDR", "diJetDR", "units", 'F' );
loader->AddVariable( "diJetDEta", "diJetDEta", "units", 'F' );
loader->AddVariable( "diJetDPhi", "diJetDPhi", "units", 'F' );
loader->AddVariable( "diHiggsMScaled", "diHiggsMScaled", "units", 'F' );
loader->AddVariable( "diHiggsM", "diHiggsM", "units", 'F' );
loader->AddVariable( "diHiggsPt", "diHiggsPt", "units", 'F' );
loader->AddVariable( "MTW_Max", "MTW_Max", "units", 'F' );
loader->AddVariable( "MTW_Clos", "MTW_Clos", "units", 'F' );
loader->AddVariable( "METCentrality", "METCentrality", "units", 'F' );
loader->AddVariable( "MET", "MET", "units", 'F' );

// You can add so-called "Spectator variables", which are not used in the MVA training,
// but will appear in the final "TestTree" produced by TMVA. This TestTree will contain the
// input variables, the response values of all trained MVAs, and the spectator variables
//loader->AddSpectator( "spec1 := var1*2",  "Spectator 1", "units", 'F' );
//loader->AddSpectator( "spec2 := var1*3",  "Spectator 2", "units", 'F' );


//  We can define also the event weights

// Set individual event weights (the variables must exist in the original TTree)
//    for signal    : factory->SetSignalWeightExpression    ("weight1*weight2");
//    for background: factory->SetBackgroundWeightExpression("weight1*weight2");
//loader->SetBackgroundWeightExpression( "weight" );


## Prepare data: split in training and test sample 

In [7]:
// Apply additional cuts on the signal and background samples (can be different)
TCut mycuts = ""; // for example: TCut mycuts = "abs(var1)<0.5 && abs(var2-0.5)<1";
TCut mycutb = ""; // for example: TCut mycutb = "abs(var1)<0.5";

// Tell the factory how to use the training and testing events
//
// If no numbers of events are given, half of the events in the tree are used 
// for training, and the other half for testing:
//    loader->PrepareTrainingAndTestTree( mycut, "SplitMode=random:!V" );
// To also specify the number of testing events, use:
//    loader->PrepareTrainingAndTestTree( mycut,
//                                         "NSigTrain=3000:NBkgTrain=3000:NSigTest=3000:NBkgTest=3000:SplitMode=Random:!V" );
loader->PrepareTrainingAndTestTree( mycuts, mycutb,
                                    "nTrain_Signal=476:nTrain_Background=15:SplitMode=Random:NormMode=NumEvents:!V" );


# Booking Classifiers Methods


We Book here the different MVA method we want to use. 
We specify the method using the appropriate enumeration, defined in *TMVA::Types*.
See the file *TMVA/Types.h* for all possible MVA methods available. 
In addition, we specify via an option string all the method parameters. For all possible options, default parameter values, see the corresponding documentation in the TMVA Users Guide. 

Note that with the booking one can also specify individual variable tranformations to be done before using the method.
For example *VarTransform=Decorrelate* will decorrelate the inputs.  

In [8]:
// Likelihood ("naive Bayes estimator")
factory.BookMethod(loader, TMVA::Types::kLikelihood, "Likelihood",
                           "H:!V:TransformOutput:PDFInterpol=Spline2:NSmoothSig[0]=20:NSmoothBkg[0]=20:NSmoothBkg[1]=10:NSmooth=1:NAvEvtPerBin=50" );

// Use a kernel density estimator to approximate the PDFs
factory.BookMethod(loader, TMVA::Types::kLikelihood, "LikelihoodKDE",
                           "!H:!V:!TransformOutput:VarTransform=D:PDFInterpol=KDE:KDEtype=Gauss:KDEiter=Adaptive:KDEFineFactor=0.3:KDEborder=None:NAvEvtPerBin=50" ); 


// Fisher discriminant (same as LD)
factory.BookMethod(loader, TMVA::Types::kFisher, "Fisher", "H:!V:Fisher:VarTransform=None:CreateMVAPdfs:PDFInterpolMVAPdf=Spline2:NbinsMVAPdf=50:NsmoothMVAPdf=10" );

//Boosted Decision Trees
factory.BookMethod(loader,TMVA::Types::kBDT, "BDT",
                   "!V:NTrees=200:MinNodeSize=2.5%:MaxDepth=2:BoostType=AdaBoost:AdaBoostBeta=0.5:UseBaggedBoost:BaggedSampleFraction=0.5:SeparationType=GiniIndex:nCuts=20" );

//Multi-Layer Perceptron (Neural Network)
factory.BookMethod(loader, TMVA::Types::kMLP, "MLP",
                   "!H:!V:NeuronType=tanh:VarTransform=N:NCycles=100:HiddenLayers=N+5:TestRate=5:!UseRegulator" );

Factory                  : Booking method: [1mLikelihood[0m
                         : 
Factory                  : Booking method: [1mLikelihoodKDE[0m
                         : 
LikelihoodKDE            : [dataset] : Create Transformation "D" with events from all classes.
                         : 
                         : Transformation, Variable selection : 
                         : Input : variable 'EventWeight' <---> Output : variable 'EventWeight'
                         : Input : variable 'EventNumber' <---> Output : variable 'EventNumber'
                         : Input : variable 'NJets' <---> Output : variable 'NJets'
                         : Input : variable 'NJetsbtagged' <---> Output : variable 'NJetsbtagged'
                         : Input : variable 'Tau1Pt' <---> Output : variable 'Tau1Pt'
                         : Input : variable 'Tau1Eta' <---> Output : variable 'Tau1Eta'
                         : Input : variable 'Tau1Phi' <---> Output : variable 'Ta

## Train all methods

Here we train all previously booked methods

In [9]:
factory.TrainAllMethods();

Factory                  : [1mTrain all methods[0m
Factory                  : [dataset] : Create Transformation "I" with events from all classes.
                         : 
                         : Transformation, Variable selection : 
                         : Input : variable 'EventWeight' <---> Output : variable 'EventWeight'
                         : Input : variable 'EventNumber' <---> Output : variable 'EventNumber'
                         : Input : variable 'NJets' <---> Output : variable 'NJets'
                         : Input : variable 'NJetsbtagged' <---> Output : variable 'NJetsbtagged'
                         : Input : variable 'Tau1Pt' <---> Output : variable 'Tau1Pt'
                         : Input : variable 'Tau1Eta' <---> Output : variable 'Tau1Eta'
                         : Input : variable 'Tau1Phi' <---> Output : variable 'Tau1Phi'
                         : Input : variable 'Tau2Pt' <---> Output : variable 'Tau2Pt'
                         : Input : va

                         : Ranking input variables (method unspecific)...
IdTransformation         : Ranking result (top variable is best ranked)
                         : ---------------------------------------
                         : Rank : Variable       : Separation
                         : ---------------------------------------
                         :    1 : EventNumber    : 1.000e+00
                         :    2 : EventWeight    : 1.000e+00
                         :    3 : diTauMMCPt     : 9.509e-01
                         :    4 : diTauDR        : 9.439e-01
                         :    5 : diTauVisPt     : 9.290e-01
                         :    6 : diTauDPhi      : 8.919e-01
                         :    7 : diHiggsPt      : 8.073e-01
                         :    8 : diJetPt        : 8.073e-01
                         :    9 : Jet1Pt         : 7.069e-01
                         :   10 : diTauVisEta    : 6.232e-01
                         :   11 : METCentrality 



                         : TMVA_ClassificationOutput.root:/dataset/Method_Likelihood/Likelihood
Factory                  : Training finished
                         : 
Factory                  : Train method: LikelihoodKDE for Classification
                         : 
                         : Preparing the Decorrelation transformation...
TFHandler_LikelihoodKDE  :       Variable              Mean              RMS      [        Min              Max ]
                         : -----------------------------------------------------------------------------------------
                         :    EventWeight:            -nan            -nan   [     1.7977e+308    -1.7977e+308 ]
                         :    EventNumber:            -nan            -nan   [     1.7977e+308    -1.7977e+308 ]
                         :          NJets:            -nan            -nan   [     1.7977e+308    -1.7977e+308 ]
                         :   NJetsbtagged:            -nan            -nan   [     1.7



                         : TMVA_ClassificationOutput.root:/dataset/Method_LikelihoodKDE/LikelihoodKDE
Factory                  : Training finished
                         : 
Factory                  : Train method: Fisher for Classification
                         : 
                         : 
                         : [1mH e l p   f o r   M V A   m e t h o d   [ Fisher ] :[0m
                         : 
                         : [1m--- Short description:[0m
                         : 
                         : Fisher discriminants select events by distinguishing the mean 
                         : values of the signal and background distributions in a trans- 
                         : formed variable space where linear correlations are removed.
                         : 
                         :    (More precisely: the "linear discriminator" determines
                         :     an axis in the (correlated) hyperspace of the input 
                         :     vari

                         : Elapsed time for training with 491 events: 3.51 sec         
MLP                      : [dataset] : Evaluation of MLP on training sample (491 events)
                         : Elapsed time for evaluation of 491 events: 0.011 sec       
                         : Creating xml weight file: [0;36mdataset/weights/TMVAClassification_MLP.weights.xml[0m
                         : Creating standalone class: [0;36mdataset/weights/TMVAClassification_MLP.class.C[0m
                         : Write special histos to file: TMVA_ClassificationOutput.root:/dataset/Method_MLP/MLP
Factory                  : Training finished
                         : 
                         : Ranking input variables (method specific)...
Likelihood               : Ranking result (top variable is best ranked)
                         : ---------------------------------------------
                         : Rank : Variable       : Delta Separation
                         : ------------

                         : Reading weight file: [0;36mdataset/weights/TMVAClassification_LikelihoodKDE.weights.xml[0m
                         : Reading weight file: [0;36mdataset/weights/TMVAClassification_Fisher.weights.xml[0m
                         : Reading weight file: [0;36mdataset/weights/TMVAClassification_BDT.weights.xml[0m
                         : Reading weight file: [0;36mdataset/weights/TMVAClassification_MLP.weights.xml[0m
MLP                      : Building Network. 
                         : Initializing weights


## Test  all methods

Here we test all methods using the test data set

In [10]:
factory.TestAllMethods();  

Factory                  : [1mTest all methods[0m
Factory                  : Test method: Likelihood for Classification performance
                         : 
Likelihood               : [dataset] : Evaluation of Likelihood on testing sample (3 events)
                         : Elapsed time for evaluation of 3 events: 0.000121 sec       
Factory                  : Test method: LikelihoodKDE for Classification performance
                         : 
LikelihoodKDE            : [dataset] : Evaluation of LikelihoodKDE on testing sample (3 events)
                         : Elapsed time for evaluation of 3 events: 0.000143 sec       
Factory                  : Test method: Fisher for Classification performance
                         : 
Fisher                   : [dataset] : Evaluation of Fisher on testing sample (3 events)
                         : Elapsed time for evaluation of 3 events: 1.38e-05 sec       
                         : Dataset[dataset] : Evaluation of Fisher on testing

## Evaluate  all methods

Here we evaluate all methods and compare their performances, computing efficiencies, ROC curves etc.. using both 
training and tetsing data sets. 
Several histograms are produced which can be examined with the TMVAGui or directly using the output file

In [11]:
factory.EvaluateAllMethods();  

Factory                  : [1mEvaluate all methods[0m
Factory                  : Evaluate classifier: Likelihood
                         : 
Likelihood               : [dataset] : Loop over test events and fill histograms with classifier response...
                         : 
[37;41;1m<FATAL>                         : Number of entries <= 0 (0 in histogram: MVA_Likelihood_S)[0m
***> abort program execution


Error in <TMVA::Tools::Mean>: sum of weights <= 0 ?! that's a bit too much of negative event weights :) 


## Plot ROC Curve

We plot here the produce ROC curve obtained on evaluating the methods on the test data set

In [14]:
//We enable JavaScript visualisation for the plots
//%jsroot on

In [15]:
auto c1 = factory.GetROCCurve(loader);
c1->Draw();

DataSetFactory           : [dataset] : Number of events in input trees
                         : Dataset[dataset] :     Signal     requirement: "abs(var1)<0.5 && abs(var2-0.5)<1"
                         : Dataset[dataset] :     Signal          -- number of events passed: 1502   / sum of weights: 1502 
                         : Dataset[dataset] :     Signal          -- efficiency             : 0.250333
                         : Dataset[dataset] :     Background requirement: "abs(var1)<0.5"
                         : Dataset[dataset] :     Background      -- number of events passed: 823    / sum of weights: 809.839
                         : Dataset[dataset] :     Background      -- efficiency             : 0.375207
                         : Dataset[dataset] :  you have opted for interpreting the requested number of training/testing events
                         :  to be the number of events AFTER your preselection cuts
                         : 
[37;41;1m<FATAL>                

####  Close outputfile to save all output information (evaluation result of methods)

In [16]:
outputFile->Close();