## Example sPlot with data fitting using a simulated PDF

First you need to create some pseudo-data for fitting. To do this, run the notebook XXX.

Next, load the hs fit classes.

In [None]:
gROOT->ProcessLine(".x $HSCODE/hsfit2/LoadFit.C+")

In [None]:
gSystem->Setenv("DATA","$HSCODE/RooFitExamples2/sPlotEventsPDF/");

### Construct an sPlot object for running the fit

Set an output directory for saving the weights, the plots and the stdout log.

In [None]:
sPlot RF;
RF.SetUp().SetOutDir("/work/dump/out/");

### Set the observables

These are the fit variables. In general you can fit in as many dimensions as you need by supplying more variables.

Here we are going to fit the variable corresponding to the branch in the data tree called "Mmiss". We will only fit within the limits 0 to 9.5.

In [None]:
///////////////////////////////Load Variables
RF.SetUp().LoadVariable("Mmiss[0,9.5]");//should be same name as variable in tree

IMPORTANT: here we set the event ID variable. Each event in your tree should have a unique ID, which is used to synchronise the event with its weights.
This is useful as it allows us to break the data up, perform separate fits and then combine the weights afterwards.
The weights are not written into the input tree; rather, they are stored separately (as HS::Weights) and combined when required. This allows you to use various sets of weights when performing fits, or to regenerate the weights after some corrections etc.

In [None]:
RF.SetUp().SetIDBranchName("fgID");
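
To illustrate why a unique event ID matters, here is a minimal sketch in plain C++ of weights stored apart from the tree and combined afterwards. It is not the HS::Weights interface: `WeightMap`, `mergeWeights` and `weightFor` are hypothetical names invented for this illustration, and a `std::map` keyed by the ID stands in for whatever storage the real class uses.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical stand-in for one set of weights, keyed by the unique event ID
using WeightMap = std::map<long long, double>;

// Combine the weights produced by separate fits into a single map.
// Assumes every event ID appears in exactly one of the fits.
WeightMap mergeWeights(const std::vector<WeightMap>& perFit) {
    WeightMap all;
    for (const auto& fit : perFit) {
        for (const auto& entry : fit) {
            const bool isNew = all.insert(entry).second;
            assert(isNew && "event IDs must be unique across fits");
            (void)isNew; // avoids an unused-variable warning when NDEBUG is set
        }
    }
    return all;
}

// Look up the weight of one event; events without a weight get 0
double weightFor(const WeightMap& weights, long long id) {
    const auto it = weights.find(id);
    return it == weights.end() ? 0.0 : it->second;
}
```

Because the lookup goes through the ID rather than the position in the tree, the same tree can be paired with any number of weight sets, which is the flexibility described above.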

### Create a signal PDF

We use the PDF class RooHSEventsHistPDF to make a missing mass shape for the signal. This essentially fills a missing mass histogram with simulated events and uses this as a PDF shape. As simulations are not always a perfect match to the data, it allows some flexibility in the shape to account for any systematic differences in calibrations etc. The parameters are:

      alpha = convolution with Gaussian of width alpha
      off   = offset on the Mmiss axis (alignment)
      scale = scaling of the Mmiss axis (broader scale <1; narrower scale >1)
Note these parameters are constrained to be as close as possible to the actual simulated data. This is done by adding Gaussian constraint terms to the likelihood for each parameter. The widths and means are encoded in the parameter ranges: means = initial value given for alpha, off and scale; widths = range/5 for alpha, range/10 for off and scale.

The name we give the PDF is Signal (it could have been anything). This name is used to attach the simulated data file and to identify the output signal sWeights species. "Mmiss" is required to tell it the fit variable; alpha[0,0,20] creates a new fit parameter alpha (starting at 0 with range 0-20), and similarly for off and scale (see the note above on Gaussian constraints).

The LoadSpeciesPDF line adds the PDF Signal to the Extended Maximum Likelihood fit and creates a corresponding yield parameter Yld_Signal, which will give the number of signal events in the data after the fit.
Note, I did not have to load this PDF directly; I could have created more PDFs and combined them to make a more complicated PDF shape first.

In [None]:
//////////////////////////////Make signal PDF
RF.SetUp().FactoryPDF("RooHSEventsHistPDF::Signal(Mmiss,alpha[0,0,20],off[0,-2,2],scale[1,0.8,1.2])");
RF.SetUp().LoadSpeciesPDF("Signal",1); 
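
As a worked example of the constraint rule quoted above, the sketch below computes the Gaussian widths for the three signal parameters. It assumes that "range" means upper limit minus lower limit, and that each constraint adds the usual term 0.5*((value - mean)/width)^2 to the negative log-likelihood; neither detail is confirmed by the class itself. `ParSpec`, `constraintWidth` and `constraintPenalty` are hypothetical names, not part of the fit classes.

```cpp
#include <cassert>

// Hypothetical helper type: a parameter written as name[init,lo,hi]
struct ParSpec {
    double init; // initial value, used as the mean of the constraint
    double lo;   // lower limit
    double hi;   // upper limit
};

// Width of the Gaussian constraint, assuming width = (hi - lo) / divisor,
// with divisor = 5 for alpha and 10 for off and scale
double constraintWidth(const ParSpec& p, double divisor) {
    assert(divisor > 0);
    return (p.hi - p.lo) / divisor;
}

// Assumed penalty on the negative log-likelihood (constant term dropped)
double constraintPenalty(const ParSpec& p, double divisor, double value) {
    const double width = constraintWidth(p, divisor);
    assert(width > 0 && "a zero-width range is not handled by this sketch");
    const double pull = (value - p.init) / width;
    return 0.5 * pull * pull;
}
```

Under these assumptions alpha[0,0,20] gives mean 0 and width 4, off[0,-2,2] gives mean 0 and width 0.4, and scale[1,0.8,1.2] gives mean 1 and width 0.04. Widening a parameter range therefore also loosens its constraint.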

### Create a background PDF

This is the same as for the signal PDF. Now we give the name BG to connect to the output weights and input data, and create 3 new parameters: alphaB, offB and scaleB. Note I did not have to use new parameters here: if I felt the systematic differences should be the same for the signal and the background, I could just have written alpha instead of alphaB[0,0,5].

In [None]:
//////////////////////////////Make background PDF
RF.SetUp().FactoryPDF("RooHSEventsHistPDF::BG(Mmiss,alphaB[0,0,5],offB[0,0,0],scaleB[1.0,0.8,1.2])");
RF.SetUp().LoadSpeciesPDF("BG",1);
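
For reference, with these two species loaded the standard extended likelihood for $N$ events with missing masses $m_e$ has the form below. The notation is mine rather than the fit classes': $N_{\mathrm{Signal}}$ corresponds to Yld_Signal, $N_{\mathrm{BG}}$ to the analogous background yield, and $f_{\mathrm{Signal}}$, $f_{\mathrm{BG}}$ are the two normalised PDFs. The Gaussian constraint terms on the shape parameters multiply this expression and are omitted here.

$$
\mathcal{L} = \frac{e^{-(N_{\mathrm{Signal}}+N_{\mathrm{BG}})}}{N!}\prod_{e=1}^{N}\left[N_{\mathrm{Signal}}\,f_{\mathrm{Signal}}(m_e)+N_{\mathrm{BG}}\,f_{\mathrm{BG}}(m_e)\right]
$$

In the standard sPlot formalism the weight of event $e$ for species $s$ is then

$$
w_{s}(m_e)=\frac{\sum_{j} V_{sj}\,f_{j}(m_e)}{\sum_{k} N_{k}\,f_{k}(m_e)}
$$

where the sums run over the species (Signal and BG) and $V$ is the covariance matrix of the fitted yields. These weights can be negative for individual events.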

### Define Bins/Splits
Often you will want to split the data into distinct bins, in invariant mass or angle for example, and perform separate fits to each bin. We can set as many different variables to split the data on as we like. We must use the Binner class, accessed via RF.Bins(), to propagate the splits into the data and simulated events.

The function LoadBinVar takes the variable to bin in, which has to be in the data tree ("Eg"), the number of bins to make and the limits of the variable. Alternatively you can supply the variable name, the number of bins and an array of bin edges, as for TAxis.

In [None]:
RF.Bins().LoadBinVar("Eg",5,3,4);
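
To make the call above concrete, this sketch works out which edges 5 equal-width bins between 3 and 4 would have, and which bin a given Eg value falls in. It is a plain C++ illustration, not the Binner implementation: `uniformEdges` and `findBin` are hypothetical names, the bin index is 0-based here, and the lower-edge-inclusive convention is borrowed from TAxis on the assumption that the Binner follows it.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Edges of nbins equal-width bins between lo and hi (nbins + 1 values)
std::vector<double> uniformEdges(int nbins, double lo, double hi) {
    assert(nbins > 0 && hi > lo);
    std::vector<double> edges(nbins + 1);
    for (int i = 0; i <= nbins; ++i) {
        edges[i] = lo + i * (hi - lo) / nbins;
    }
    return edges;
}

// 0-based index of the bin containing x, or -1 if x is outside the range.
// Lower edges are inclusive and upper edges exclusive.
int findBin(const std::vector<double>& edges, double x) {
    if (x < edges.front() || x >= edges.back()) return -1;
    const auto it = std::upper_bound(edges.begin(), edges.end(), x);
    return static_cast<int>(it - edges.begin()) - 1;
}
```

For `uniformEdges(5, 3, 4)` the edges are approximately 3, 3.2, 3.4, 3.6, 3.8 and 4, so an event with Eg = 3.5 lands in the third bin (index 2 in this sketch), and events with Eg outside 3 to 4 are not fitted at all.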

### Load Data

In this case I need to load real (pseudo)data for fitting and simulated data for making the signal and background PDF shapes.

The tree name is MyModel and the data files are Data.root, SigData.root and BGData.root; these should be pregenerated. Note the use of "Signal" and "BG" to connect the simulated files to the PDFs.

In [None]:
///////////////////////////Load Data
RF.LoadData("MyModel","$DATA/Data.root");
RF.LoadSimulated("MyModel","$DATA/SigData.root", "Signal");
RF.LoadSimulated("MyModel","$DATA/BGData.root", "BG");

Turn on JavaScript ROOT (JSROOT) for interactive plots (even when saved as an HTML file!)

In [None]:
%jsroot

### Run the fit.
The text output will be redirected to your specified output directory (set with SetOutDir above) as the file logRooFit.txt. This will give full details of the fit.
Here we will just report the final result, the weights and the plots of the fit PDF overlaid on the data, as well as the fit residuals and pulls.

There are different options for running. 
For a standard run use Here::Go; the fits will be performed within this interactive session, sequentially in the case of bins.

Here::One(&RF,i) will run one single fit from the different bins produced with LoadBinVar, where i gives the index of the bin.

Use Proof::Go(&RF,N) and the fits will be run in parallel via PROOF with each split/bin running on a different core, where N = number of cores to use.
    

In [None]:
Here::Go(&RF);
//Here::One(&RF,3);
//Proof::Go(&RF,1);

### Draw some weighted variables

I can now use the resulting weights to make sPlots of the other variables in the tree, disentangling signal from background.
I use the DrawWeighted function, which takes the standard TTree::Draw expression as its first argument and the weight species as its second (i.e. here I could plot Signal or BG).

I also reload the original data, which has a flag Sig giving the truth of whether the event is signal or background, so I can compare histograms made using this flag with my weighted ones.

In [None]:
auto* can=new TCanvas;
can->Divide(2,1);
can->cd(1);
RF.DrawWeighted("M1>>M1(100,0,10)","Signal");
//compare to true signal
FiledTree::Read("MyModel","$DATA/Data.root")->Tree()->Draw("M1","Sig==1","same");

can->cd(2);
RF.DrawWeighted("M2>>M2(100,0,10)","Signal");
//compare to true signal
FiledTree::Read("MyModel","$DATA/Data.root")->Tree()->Draw("M2","Sig==1","same");

can->Draw();
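
Conceptually, a weighted draw of this kind fills each histogram bin with the sum of the sWeights of the events that land in it, instead of counting events. The following plain C++ sketch shows that idea under stated assumptions: `Event` and `weightedHist` are hypothetical names, the weights are held in a `std::map` keyed by event ID, and nothing here reflects how DrawWeighted is actually implemented.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <vector>

// One event: its unique ID and the value of the variable to plot
struct Event {
    long long id;
    double value;
};

// Fill nbins equal-width bins on [lo, hi) with the sum of the weights
std::vector<double> weightedHist(const std::vector<Event>& events,
                                 const std::map<long long, double>& weights,
                                 int nbins, double lo, double hi) {
    assert(nbins > 0 && hi > lo);
    std::vector<double> bins(nbins, 0.0);
    for (const auto& ev : events) {
        const auto w = weights.find(ev.id);
        if (w == weights.end()) continue;              // no weight for this event
        if (ev.value < lo || ev.value >= hi) continue; // outside the axis range
        int bin = static_cast<int>((ev.value - lo) / (hi - lo) * nbins);
        bin = std::min(bin, nbins - 1);                // guard against rounding up
        bins[bin] += w->second;
    }
    return bins;
}
```

Since individual sWeights can be negative, a bin in a sparsely populated region can end up with negative content; only the sums over many events are meaningful as signal or background yields.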