## Hypothesis Test Example with Toys


This notebook performs a hypothesis test and computes its significance (p-value), first using the asymptotic approximation of the profile likelihood function and then using pseudo-experiments (toys).

The test statistic used for the hypothesis test is

$$q_\mu = - 2 \log \frac { L( x \ | \ \mu , \hat{\hat{\nu} } ) }{  L( x \ | \ \hat{\mu} , \hat{\nu}  ) } \  \  \ \mathrm{for} \  \  
\hat{\mu}  >  0$$
$$q_\mu = 0     \hspace{3cm}     \ \mathrm{for} \  \  
\hat{\mu} \le  0$$
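
As a worked illustration of the definition above, here is a minimal sketch for the simplest possible case: a single-bin counting experiment with known background `b`, observed count `n`, no nuisance parameters, and the background-only hypothesis $\mu = 0$ as the null. This toy model and the function name `q0Counting` are assumptions made for illustration; they are not part of the workspace used in this notebook. For a Poisson likelihood the best-fit signal is $\hat{s} = n - b$, and the ratio reduces to a closed form.

```cpp
#include <cmath>

// Discovery test statistic for a hypothetical single-bin counting experiment
// with known background b (> 0) and observed count n. No nuisance parameters.
// Null hypothesis: signal = 0. Best-fit signal: s_hat = n - b.
double q0Counting(double n, double b) {
    if (n <= b) return 0.0;  // s_hat <= 0: no excess, test statistic is 0
    // -2 log [ L(n | 0) / L(n | s_hat) ] for a Poisson likelihood
    return 2.0 * (n * std::log(n / b) - (n - b));
}
```

For example, `q0Counting(120, 100)` is about 3.76, while any `n` at or below `b` gives exactly 0, which is the second branch of the definition.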

Pseudo-experiments are used to obtain the test statistic distribution for the two hypotheses.

In [None]:
using namespace RooStats;

In [None]:
HypoTestResult * result = nullptr;
ProfileLikelihoodTestStat * testStat = nullptr; 
ToyMCSampler * toymcs = nullptr; 
HypoTestPlot * plot = nullptr; 
// enable use of NLL offset for better minimizations
RooStats::UseNLLOffset(true);

#### Set the number of expected events used to compute the expected significance

In [None]:
int nexp_events = 250; 

TString fileName ="HiggsBinModel.root";  // for a simplified model
//TString fileName ="HiggsBinModelSimple.root";  
TString workspaceName = "w";
TString modelConfigName = "ModelConfig";
TString dataName = "data";
TString integrationType = "";  
//ROOT::Math::MinimizerOptions::SetDefaultMinimizer("Minuit2");

#### Reading the model (Workspace) from input file

The first part just opens the workspace file and retrieves the model and the data.

In [None]:
auto file = TFile::Open(fileName);
auto w =  (RooWorkspace*) file->Get(workspaceName);
w->Print();
auto sbModel = (RooStats::ModelConfig*) w->obj(modelConfigName);
auto  data = w->data(dataName);
auto poi = (RooRealVar*) sbModel->GetParametersOfInterest()->first();

##### Make the B model by cloning the S+B model and setting the parameter of interest to 0

In [None]:
auto bModel = (RooStats::ModelConfig*) sbModel->Clone();
sbModel->SetName("S+B Model");
poi->setVal(nexp_events);
sbModel->SetSnapshot( *poi);
bModel->SetName("B Model");
poi->setVal(0);
bModel->SetSnapshot( *poi  );
sbModel->Print();
bModel->Print();

We set the mass and the width constant.
We also fix the background parameters $a_1$ and $a_2$ to speed up the pseudo-experiment generation.

In [None]:
w->var("a1")->setConstant(true);
w->var("a2")->setConstant(true);

w->var("mass")->setConstant(true);
w->var("width")->setConstant(true);

### Run Asymptotic calculator to obtain asymptotic significance

In [None]:
RooStats::AsymptoticCalculator::SetPrintLevel(-1);  // to switch off print level 
RooStats::AsymptoticCalculator  asymCalc(*data, *sbModel, *bModel);

Configure the calculator

In [None]:
asymCalc.SetOneSidedDiscovery(true);  // for a one-sided discovery test
asymCalc.SetPrintLevel(-1);  // to suppress printed output

Run the calculator and get the result

In [None]:
result = asymCalc.GetHypoTest();
result->Print();

In [None]:
std::cout << "Asymptotic significance = " << result->Significance() << " for p-value = " << result->NullPValue() << std::endl; 
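
The significance and the p-value printed above are two ways of quoting the same number. The sketch below shows the standard one-sided Gaussian conversion, and the asymptotic relation $Z = \sqrt{q}$ for a one-sided discovery test statistic defined as $-2 \log$ of the likelihood ratio. The function names are hypothetical, and the code uses only the C++ standard library; it is meant as a cross-check of the relation, not as a description of the calculator's internals.

```cpp
#include <cmath>

// One-sided p-value for a significance Z: p = 1 - Phi(Z),
// where Phi is the standard normal cumulative distribution.
double pValueFromZ(double z) {
    return 0.5 * std::erfc(z / std::sqrt(2.0));
}

// Asymptotic discovery significance from the observed test statistic q,
// assuming q is defined as -2 log(likelihood ratio) and set to 0 for mu_hat <= 0.
double asymptoticZ(double q) {
    return q > 0.0 ? std::sqrt(q) : 0.0;
}
```

With these definitions, `Z = 0` corresponds to `p = 0.5` and `Z = 5` to a p-value of about `2.87e-7`.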

### Run the Frequentist Calculator to compute significance using toys

We now run the FrequentistCalculator on the same model. It uses the test statistic distributions obtained with pseudo-experiments.
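
Conceptually, the toy-based p-value is just the fraction of null-hypothesis pseudo-experiments whose test statistic is at least as large as the observed one. The sketch below illustrates that counting with a hypothetical helper, `toyPValue`, written in plain C++; treating the boundary as inclusive (`>=`) is an assumption of this sketch.

```cpp
#include <cstddef>
#include <vector>

// Estimate the null p-value as the fraction of null toys with a test
// statistic at least as large as the observed value qObs.
// Hypothetical helper for illustration only.
double toyPValue(const std::vector<double>& nullToys, double qObs) {
    if (nullToys.empty()) return 1.0;  // no toys: no evidence against the null
    std::size_t nAbove = 0;
    for (double q : nullToys) {
        if (q >= qObs) ++nAbove;
    }
    return static_cast<double>(nAbove) / static_cast<double>(nullToys.size());
}
```

One consequence of this counting is that with N null toys the smallest non-zero p-value that can be resolved is 1/N. For instance, 2000 null toys cannot resolve p-values below 5e-4, which is roughly 3.3 sigma one-sided.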

In [None]:
RooStats::FrequentistCalculator   fc(*data, *sbModel, *bModel);
// PROOF configuration (only used if SetProofConfig is called on the sampler)
RooStats::ProofConfig pc(*w, 0, "", kFALSE);

We configure the FrequentistCalculator by specifying the number of toys for the two hypotheses.

We also need to specify the test statistic. Here we use the profile likelihood test statistic.

In [None]:
testStat = new RooStats::ProfileLikelihoodTestStat(*sbModel->GetPdf());
// needed for PL test statistics
testStat->SetOneSidedDiscovery(true);
// to enable debug of fitting toys
// ((RooStats::ProfileLikelihoodTestStat *)testStat)->SetPrintLevel(1);

In [None]:
toymcs = (RooStats::ToyMCSampler*)fc.GetTestStatSampler();
toymcs->SetTestStatistic(testStat);
toymcs->SetGenerateBinned(true);
// toymcs->SetProofConfig(&pc);    // to use PROOF 

In [None]:
// for number counting experiments (i.e. when we have only one event per toy);
// shape analyses generally use an extended model, so this is not needed there
if (!sbModel->GetPdf()->canBeExtended())
    toymcs->SetNEventsPerToy(1);

#### Set the number of pseudo-experiments

In [None]:
fc.SetToys(2000,500);    // 2000 for null (B) and 500 for alt (S+B) 

#### Now run the calculator

This can take some time, so be patient.

In [None]:
tw = new TStopwatch(); tw->Start(); // to print the time
result = fc.GetHypoTest(); 
result->Print();
tw->Print();

Now plot the test statistic distributions.

In [None]:
plot = new RooStats::HypoTestPlot(*result);
plot->SetLogYaxis(true);
plot->Draw();
gPad->Draw();

We save the result in a file. We don't want to lose the result after running toys for some time.

In [None]:
fileOut = TFile::Open("HypoTestResult.root","RECREATE");
result->Write();
fileOut->Close();

#### Is the test statistic distribution like a chi-square distribution with n.d.f. = 1?

We fit the null test statistic distribution to check whether it is compatible with a chi-square distribution.

In [None]:
dist = result->GetNullDistribution();
vec = dist->GetSamplingDistribution();
cout << "number of null toys = " << vec.size() << endl;

hdist = new TH1D("hdist","Test Statistic distribution",200,0,10);

hdist->FillN(vec.size(),vec.data(),nullptr);
// merge all underflows (failing fits, stored in bin 0) into the first bin (bin 1)
hdist->SetBinContent(1, hdist->GetBinContent(0)+hdist->GetBinContent(1));

In [None]:
%jsroot off

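Create the fit function as one half of a chi-square distribution. The first bin (x < 0.05) is a special case: it holds the half of the toys expected at zero plus the chi-square integral over that bin.
The quantity plotted is the log-likelihood ratio and not 2 x the log-likelihood ratio, so the chi-square density is evaluated at 2x.
0.05 is the histogram bin width. The prefactor 0.05 in the function is 1/2 (half chi-square) times 2 x 0.05 (the bin width expressed in units of 2x).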
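
As a sanity check on the first-bin term, the sketch below evaluates the expected fraction of null toys below a given edge under the half chi-square hypothesis with one degree of freedom. It uses the closed form $F_{\chi^2_1}(x) = \mathrm{erf}(\sqrt{x/2})$ and only the C++ standard library; the function names are hypothetical and this is not the fit function used in the notebook.

```cpp
#include <cmath>

// CDF of a chi-square distribution with one degree of freedom.
double chi2Cdf1(double x) {
    return x > 0.0 ? std::erf(std::sqrt(0.5 * x)) : 0.0;
}

// Expected fraction of null toys with log-likelihood ratio below `edge`
// under the half chi-square hypothesis: half of the toys sit at zero and
// the other half follow a chi-square(1) distribution in 2 * (log-likelihood ratio).
double halfChi2FractionBelow(double edge) {
    return 0.5 + 0.5 * chi2Cdf1(2.0 * edge);
}
```

For the first bin of the histogram, `halfChi2FractionBelow(0.05)` is about 0.62, so under this hypothesis well over half of the null toys are expected to land in that single bin.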

In [None]:
fchi2 = new TF1("chi2","[](double*x,double*p){ if (x[0] < 0.05) { return 0.5*p[0]+ 0.5*p[0]*ROOT::Math::chisquared_cdf(0.1,p[1]); } else { return 0.05*p[0]*ROOT::Math::chisquared_pdf(2*x[0],p[1]); } }",0.,10.,2,1);

In [None]:
hdist->Draw();
fchi2->SetParameters(vec.size(),1);
fchi2->SetNpx(1000);
fchi2->SetLineColor(kGreen);
fchi2->DrawCopy("SAME");
fchi2->SetLineColor(kRed);
gPad->Draw();

In [None]:
// binned likelihood fit ("L") using the integral of the function over each bin ("I")
hdist->Fit(fchi2,"L I ","SAME");

In [None]:
gStyle->SetOptFit(1111);
gPad->Draw();

In [None]:
auto hdist2 = (TH1*) hdist->Clone();
hdist2->GetXaxis()->SetRange(1,20);
hdist2->Draw();
gPad->Draw();