
<img src="tmva_logo.gif" height="20%" width="20%">

# TMVA Classification 

This notebook is a basic example for training and testing TMVA classifiers. 

## Declare Factory class

Create the Factory class. Later you can choose the methods
whose performance you'd like to investigate. 

The Factory is the main TMVA object you interact with. It takes the following arguments:

 - The first argument is the base name of all the weight files that will be created, together with the method parameters, in the weights/ directory.

 - The second argument is the output file for the training results.
  
 - The third argument is an option string that defines the general configuration of the TMVA session. For example, all TMVA output can be suppressed by removing the "!" (not) in front of the "Silent" argument in the option string.
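To make the structure of such an option string concrete, here is a minimal sketch in plain C++ (no ROOT needed). It assumes only what the string itself shows: tokens are separated by ":", a leading "!" negates a flag, and "Key=Value" sets a value. The helper `parseOptions` is hypothetical and is not the parser TMVA uses internally.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper (not part of TMVA): split a colon-separated option
// string into key/value pairs. A bare flag "Name" maps to "True",
// "!Name" maps to "False", and "Key=Value" maps Key to Value.
std::map<std::string, std::string> parseOptions(const std::string& options) {
    std::map<std::string, std::string> result;
    std::istringstream stream(options);
    std::string token;
    while (std::getline(stream, token, ':')) {
        if (token.empty()) continue;  // skip empty tokens such as "::"
        const auto eq = token.find('=');
        if (eq != std::string::npos) {
            result[token.substr(0, eq)] = token.substr(eq + 1);
        } else if (token[0] == '!') {
            result[token.substr(1)] = "False";
        } else {
            result[token] = "True";
        }
    }
    return result;
}
```

Applied to the option string passed to the Factory in this notebook, the sketch maps "Silent" to "False" and "AnalysisType" to "Classification"; removing the "!" in front of "Silent" would flip the former to "True".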

In [None]:
TMVA::Tools::Instance();


auto outputFile = TFile::Open("TMVA_ClassificationOutput.root", "RECREATE");

TMVA::Factory factory("TMVAClassification", outputFile,
                      "!V:ROC:!Silent:Color:!DrawProgressBar:AnalysisType=Classification" ); 

## Define the input dataset

Define the input data file, which contains the signal and background trees.

In [None]:
TString inputFileName = "http://root.cern.ch/files/tmva_class_example.root";
//TString inputFileName = "tmva_class_example.root";

auto inputFile = TFile::Open( inputFileName );

// --- Retrieve the signal and background trees from the input file

TTree *signalTree     = (TTree*)inputFile->Get("TreeS");
TTree *backgroundTree = (TTree*)inputFile->Get("TreeB");


## Create DataLoader class

The next step is to declare the DataLoader class, which provides the interface from TMVA to the input data.


In [None]:
TMVA::DataLoader * loader = new TMVA::DataLoader("dataset");

In [None]:
// global event weights per tree (see below for setting event-wise weights)
Double_t signalWeight     = 1.0;
Double_t backgroundWeight = 1.0;
   
// You can add an arbitrary number of signal or background trees
loader->AddSignalTree    ( signalTree,     signalWeight     );
loader->AddBackgroundTree( backgroundTree, backgroundWeight );


## Define input variables

Through the DataLoader we define the input variables that will be used for the MVA training.
Note that we can also use variable expressions, which can be parsed by *TTree::Draw( "expression" )*

In [None]:
signalTree->Print();

In [None]:
loader->AddVariable( "myvar1 := var1+var2", 'F' );
loader->AddVariable( "myvar2 := var1-var2", "Expression 2", "", 'F' );
loader->AddVariable( "var3",                "Variable 3", "units", 'F' );
loader->AddVariable( "var4",                "Variable 4", "units", 'F' );

// You can add so-called "Spectator variables", which are not used in the MVA training,
// but will appear in the final "TestTree" produced by TMVA. This TestTree will contain the
// input variables, the response values of all trained MVAs, and the spectator variables
loader->AddSpectator( "spec1 := var1*2",  "Spectator 1", "units", 'F' );
loader->AddSpectator( "spec2 := var1*3",  "Spectator 2", "units", 'F' );


// We can also define event weights.

// Set individual event weights (the variables must exist in the original TTree)
//    for signal    : loader->SetSignalWeightExpression    ("weight1*weight2");
//    for background: loader->SetBackgroundWeightExpression("weight1*weight2");
loader->SetBackgroundWeightExpression( "weight" );


## Prepare data: split in training and test sample 

In [None]:
// Apply additional cuts on the signal and background samples (can be different)
TCut mycuts = ""; // for example: TCut mycuts = "abs(var1)<0.5 && abs(var2-0.5)<1";
TCut mycutb = ""; // for example: TCut mycutb = "abs(var1)<0.5";

// Tell the DataLoader how to use the training and testing events
//
// If no numbers of events are given, half of the events in the tree are used 
// for training, and the other half for testing:
//    loader->PrepareTrainingAndTestTree( mycuts, mycutb, "SplitMode=Random:!V" );
// To specify the numbers of training and testing events explicitly, use:
//    loader->PrepareTrainingAndTestTree( mycuts, mycutb,
//                                        "nTrain_Signal=3000:nTrain_Background=3000:nTest_Signal=3000:nTest_Background=3000:SplitMode=Random:!V" );
loader->PrepareTrainingAndTestTree( mycuts, mycutb,
                                    "nTrain_Signal=4000:nTrain_Background=2000:SplitMode=Random:NormMode=NumEvents:!V" );
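
To illustrate what a random split does conceptually, here is a minimal sketch in plain C++. It assumes that the events not selected for training end up in the test sample; the helper `randomSplit`, its seed argument, and the shuffling strategy are hypothetical and do not reproduce TMVA's own implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper (not TMVA code): shuffle the event indices
// 0..nEvents-1 with a fixed seed, then use the first nTrain indices for
// training and all remaining indices for testing.
std::pair<std::vector<std::size_t>, std::vector<std::size_t>>
randomSplit(std::size_t nEvents, std::size_t nTrain, unsigned seed) {
    assert(nTrain <= nEvents);
    std::vector<std::size_t> indices(nEvents);
    std::iota(indices.begin(), indices.end(), 0);
    std::mt19937 rng(seed);
    std::shuffle(indices.begin(), indices.end(), rng);
    std::vector<std::size_t> train(indices.begin(), indices.begin() + nTrain);
    std::vector<std::size_t> test(indices.begin() + nTrain, indices.end());
    return {train, test};
}
```

For example, `randomSplit(nSignalEvents, 4000, 42)` would mirror the `nTrain_Signal=4000` setting above, where `nSignalEvents` stands for the number of signal events passing the cut. Every event index appears in exactly one of the two returned samples.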


## Book the classifier methods


Here we book the different MVA methods we want to use. 
We specify each method using the appropriate enumeration value defined in *TMVA::Types*.
See the file *TMVA/Types.h* for all available MVA methods. 
In addition, we specify all the method parameters via an option string. For all possible options and their default values, see the corresponding documentation in the TMVA Users Guide. 

Note that when booking a method one can also specify individual variable transformations to be applied before the method is used.
For example, *VarTransform=Decorrelate* will decorrelate the inputs.  
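
As a sketch of what a linear decorrelation does, assuming the square-root-of-covariance construction described in the TMVA Users Guide (the symbols below are introduced here for illustration): let $C$ be the covariance matrix of the input variables $\mathbf{x}$, $S$ the orthogonal matrix of its eigenvectors, and $D$ the diagonal matrix of its eigenvalues. Then

$$
D = S^{T} C S, \qquad C' = S \sqrt{D}\, S^{T}, \qquad \mathbf{x} \mapsto (C')^{-1}\, \mathbf{x}.
$$

Since $C'$ is symmetric and $C' C' = C$, the transformed variables have covariance $(C')^{-1} C\, (C')^{-1} = \mathbb{1}$. Linear correlations are therefore removed, while non-linear correlations are not.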

In [None]:
// Likelihood ("naive Bayes estimator")
factory.BookMethod(loader, TMVA::Types::kLikelihood, "Likelihood",
                           "H:!V:TransformOutput:PDFInterpol=Spline2:NSmoothSig[0]=20:NSmoothBkg[0]=20:NSmoothBkg[1]=10:NSmooth=1:NAvEvtPerBin=50" );

// Use a kernel density estimator to approximate the PDFs
factory.BookMethod(loader, TMVA::Types::kLikelihood, "LikelihoodKDE",
                           "!H:!V:!TransformOutput:VarTransform=D:PDFInterpol=KDE:KDEtype=Gauss:KDEiter=Adaptive:KDEFineFactor=0.3:KDEborder=None:NAvEvtPerBin=50" ); 


// Fisher discriminant (same as LD)
factory.BookMethod(loader, TMVA::Types::kFisher, "Fisher", "H:!V:Fisher:VarTransform=None:CreateMVAPdfs:PDFInterpolMVAPdf=Spline2:NbinsMVAPdf=50:NsmoothMVAPdf=10" );

//Boosted Decision Trees
factory.BookMethod(loader,TMVA::Types::kBDT, "BDT",
                   "!V:NTrees=200:MinNodeSize=2.5%:MaxDepth=2:BoostType=AdaBoost:AdaBoostBeta=0.5:UseBaggedBoost:BaggedSampleFraction=0.5:SeparationType=GiniIndex:nCuts=20" );

//Multi-Layer Perceptron (Neural Network)
factory.BookMethod(loader, TMVA::Types::kMLP, "MLP",
                   "!H:!V:NeuronType=tanh:VarTransform=N:NCycles=100:HiddenLayers=N+5:TestRate=5:!UseRegulator" );

## Train all methods

Here we train all previously booked methods.

In [None]:
factory.TrainAllMethods();

## Test all methods

Here we test all methods using the test data set.

In [None]:
factory.TestAllMethods();  

## Evaluate all methods

Here we evaluate all methods and compare their performance, computing efficiencies, ROC curves, etc., using both 
the training and testing data sets. 
Several histograms are produced, which can be examined with the TMVAGui or directly from the output file.

In [None]:
factory.EvaluateAllMethods();  

## Plot ROC Curve

Here we plot the ROC curves obtained by evaluating the methods on the test data set.

In [None]:
//We enable JavaScript visualisation for the plots
//%jsroot on

In [None]:
auto c1 = factory.GetROCCurve(loader);
c1->Draw();
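
A single number often quoted alongside a ROC curve is the area under it. The following minimal sketch in plain C++ uses the fact that this area equals the probability that a randomly chosen signal event receives a higher classifier score than a randomly chosen background event. The helper `rocIntegral` is hypothetical; it is not the routine TMVA uses, and it ignores event weights.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not TMVA code): area under the ROC curve computed as
// the fraction of (signal, background) pairs in which the signal event has
// the higher classifier score; ties count as one half.
double rocIntegral(const std::vector<double>& signalScores,
                   const std::vector<double>& backgroundScores) {
    assert(!signalScores.empty() && !backgroundScores.empty());
    double wins = 0.0;
    for (double s : signalScores) {
        for (double b : backgroundScores) {
            if (s > b) wins += 1.0;
            else if (s == b) wins += 0.5;
        }
    }
    const double nPairs = static_cast<double>(signalScores.size()) *
                          static_cast<double>(backgroundScores.size());
    return wins / nPairs;
}
```

A perfectly separating classifier gives 1.0 and a random one gives about 0.5. The double loop is quadratic in the number of events, which is fine for a sketch but would be replaced by a sort-based computation for large samples.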

#### Close the output file to save all output information (the evaluation results of the methods)

In [None]:
outputFile->Close();