# Critical Lambda and Limits Tutorial

This tutorial shows how to calculate the critical lambda curve and then the 90% confidence limit on the number of neutrinoless double beta decay (0nuBB) counts for the 2020 nEXO sensitivity paper. The two calculations are carried out by two scripts:

* `ComputeCriticalLambdaForFixedNumSignal.py`
* `Compute90PercentLimit_PythonCode.py`

Both scripts use the new Python-based code and return the fit results in pandas dataframes, which are then written to HDF5 files. Below I run one toy dataset with each and look at the output data.


In [1]:
# Import useful libraries for analysis

import pandas as pd
import histlite as hl
import numpy as np
from matplotlib import pyplot as plt

plt.rcParams.update({'font.size': 14})
plt.rcParams['figure.figsize'] = (7,6)

## Critical lambda calculation

First, we'll start with the critical lambda calculation. This requires simulating an ensemble of toy datasets under a specific hypothesis (some assumed number of signal counts), then computing the likelihood ratio for that hypothesis in each dataset by fitting with the signal fixed to the value used to generate it. Repeating this many times builds up a distribution of the likelihood ratio. The 90% quantile of that distribution is the critical lambda: a dataset whose lambda exceeds it has a p-value below 10% for that hypothesis. (For a better explanation of the statistics, check out the first nEXO sensitivity paper.)
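To make the procedure concrete, here is a minimal sketch using a one-bin Poisson counting experiment with a known background. This is not the nEXO fit: the model, the function names, and the convention lambda = 2 * (NLL_fixed - NLL_best) are all assumptions chosen for illustration.

```python
import numpy as np

def toy_lambda(n_obs, num_signal, num_background):
    """Test statistic 2*(NLL_fixed - NLL_best) for a one-bin Poisson model.

    Assumes num_background > 0 so that the logarithms are always defined.
    """
    n_obs = np.asarray(n_obs, dtype=float)
    mu_fixed = num_signal + num_background
    # Best-fit mean, with the signal constrained to be non-negative.
    mu_best = np.maximum(n_obs, num_background)
    # Poisson negative log-likelihood, dropping the ln(n!) term that cancels.
    nll_fixed = mu_fixed - n_obs * np.log(mu_fixed)
    nll_best = mu_best - n_obs * np.log(mu_best)
    return 2.0 * (nll_fixed - nll_best)

def toy_critical_lambda(num_signal, num_background, num_datasets,
                        quantile=0.9, seed=0):
    """Generate toys under the signal hypothesis and return the quantile of lambda."""
    rng = np.random.default_rng(seed)
    n_obs = rng.poisson(num_signal + num_background, size=num_datasets)
    lambdas = toy_lambda(n_obs, num_signal, num_background)
    return np.quantile(lambdas, quantile), lambdas

crit, lambdas = toy_critical_lambda(num_signal=10.0, num_background=5.0,
                                    num_datasets=20000)
print('Toy critical lambda (90% quantile): {:.3f}'.format(crit))
```

In the real calculation each toy is a full multi-dimensional fit, so far fewer toys are affordable per job, which is why the work is split across many parallel jobs.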

We've wrapped this up into a Python script that takes a few command line arguments and then runs all the necessary code to do this calculation. Let's take a look at that script now. First, we need to `cd` into the directory where the script lives, since it contains some hardcoded relative paths.


In [3]:
%cd ../SensitivityPaper2020_scripts/
%pwd

/g/g20/lenardo1/nEXO/sensitivity/work/SensitivityPaper2020_scripts


'/g/g20/lenardo1/nEXO/sensitivity/work/SensitivityPaper2020_scripts'

In [4]:
# Running the script with no arguments will produce an error, and a message showing what
# arguments need to be provided.
%run ComputeCriticalLambdaForFixedNumSignal.py



ERROR: ComputeCriticalLambdaForNumSignal.py requires 3 arguments
Usage:
	python ComputeCriticalLambdaForNumSignal.py <iteration_num> <input_num_signal> <num_datasets_to_generate> </path/to/output/directory/>


SystemExit: 


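Quick explanation of the arguments (the error message says 3, but the usage line lists four, and all four are passed in the call below):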
* `<iteration_num>` is just a number that gets appended to the output filename. This is used when submitting a bunch of parallel jobs, so the files don't overwrite each other. For now, we can use whatever we want.
* `<input_num_signal>` is the input hypothesis, which is a number of signal counts.
* `<num_datasets_to_generate>` is how many toy datasets will be created and fit.
* `</path/to/output/directory/>` is wherever you'd like the output to end up.
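As an illustration of how `<iteration_num>` keeps parallel jobs from overwriting each other, the sketch below builds one command line per job. The helper `build_job_commands` is hypothetical and is not part of the nEXO scripts. It only assembles the argument lists in the order given by the usage message; how they are submitted (batch system, `subprocess`, etc.) is left open.

```python
SCRIPT = 'ComputeCriticalLambdaForFixedNumSignal.py'

def build_job_commands(num_signal, num_jobs, num_datasets_per_job, output_dir):
    """Return one argument list per job, each with a distinct iteration number."""
    commands = []
    for iteration_num in range(num_jobs):
        commands.append([
            'python', SCRIPT,
            str(iteration_num),         # <iteration_num>
            str(num_signal),            # <input_num_signal>
            str(num_datasets_per_job),  # <num_datasets_to_generate>
            output_dir,                 # </path/to/output/directory/>
        ])
    return commands

job_commands = build_job_commands(10.0, num_jobs=3,
                                  num_datasets_per_job=100, output_dir='./')
for cmd in job_commands:
    print(' '.join(cmd))
```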


So let's try it:

In [5]:
%run ComputeCriticalLambdaForFixedNumSignal.py 1 10. 1 ./


Loading input data froma previously-generated components table....

Loaded dataframe with 130 components.
Contains the following quantities of interest:
	PDFName
	Component
	Isotope
	MC ID
	Histogram
	HistogramAxisNames
	Total Mass or Area
	Halflife
	SpecActiv
	SpecActivErr
	RawActiv
	RawActivErr
	Activity ID
	Expected Counts
	Expected Counts Err
	Expected Counts UL
	TotalHitEff_N
	TotalHitEff_K
	Group

Fit variables:
	['SS/MS', 'Energy (keV)', 'Standoff (mm)']

Creating grouped PDFs....
	Group:     	Expected Counts:
	Far        	        4882.22
	Vessel_U238 	       19053.03
	Vessel_Th232 	        2169.26
	Off        	      139350.25
	Internals_U238 	       46351.84
	Internals_Th232 	        8667.26
	FullTPC_Co60 	         216.19
	FullTPC_K40 	    32572615.75
	Rn222      	        9107.27
	FullLXeBb2n 	    27949377.02
	FullLXeBb0n 	           0.00
	Xe137      	          46.52
	Total Sum  	    60612486.34


Running dataset 0....

Variable name:        Value:       IsFixed:  FitError   I

your performance may suffer as PyTables will pickle object types that it cannot
map directly to c-types [inferred_type->mixed,key->block3_values] [items->['best_fit_errors', 'best_fit_parameters', 'fixed_fit_errors', 'fixed_fit_parameters', 'input_parameters']]

  return pytables.to_hdf(path_or_buf, key, self, **kwargs)


Looks like it worked! The output file will be called `critical_lambda_calculation_num_sig_10.0_file_1.h5`. 

The output data is stored in a `pandas` dataframe, which gets written to the output file using the `pandas.DataFrame.to_hdf()` method. Let's open that up and take a look:

In [19]:
df = pd.read_hdf('critical_lambda_calculation_num_sig_10.0_file_1.h5')
df.head()

| | best_fit_converged | best_fit_covar | best_fit_errors | best_fit_iterations | best_fit_parameters | fixed_fit_converged | fixed_fit_covar | fixed_fit_errors | fixed_fit_iterations | fixed_fit_parameters | input_parameters | lambda | num_signal |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | True | True | {'Num_Far': 1169.2487677269582, 'Num_Vessel_U2... | 1 | {'Num_Far': 4559.498409043134, 'Num_Vessel_U23... | True | True | {'Num_Far': 996.5507785848697, 'Num_Vessel_U23... | 2 | {'Num_Far': 4529.570475546009, 'Num_Vessel_U23... | {'Num_Far': 4882.215158717998, 'Num_Vessel_U23... | 0.855804 | 10.0 |


We only ran a single dataset, so the output dataframe has a single row. The `df.head()` table can be hard to read, so we can list the column names directly (these are equivalent to branches in a ROOT TTree):

In [20]:
# Look at the columns:
print(df.columns)

Index(['best_fit_converged', 'best_fit_covar', 'best_fit_errors',
       'best_fit_iterations', 'best_fit_parameters', 'fixed_fit_converged',
       'fixed_fit_covar', 'fixed_fit_errors', 'fixed_fit_iterations',
       'fixed_fit_parameters', 'input_parameters', 'lambda', 'num_signal'],
      dtype='object')
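Several of these columns hold one Python dict per row, which is why they are hard to read in the table view. One way to flatten such a column into ordinary numeric columns is sketched below. The dataframe `toy_df` and its values are made up for illustration, and `expand_dict_column` is a hypothetical helper, not part of the nEXO code.

```python
import pandas as pd

# Stand-in for the real output: two fake rows with a dict-valued column.
toy_df = pd.DataFrame({
    'fixed_fit_parameters': [
        {'Num_Far': 4500.0, 'Num_FullLXeBb0n': 10.0},
        {'Num_Far': 4900.0, 'Num_FullLXeBb0n': 10.0},
    ],
    'lambda': [0.9, 2.1],
})

def expand_dict_column(df, column):
    """Return a dataframe with one column per key of the dicts stored in `column`."""
    return pd.DataFrame(df[column].tolist(), index=df.index)

fixed_params = expand_dict_column(toy_df, 'fixed_fit_parameters')
print(fixed_params)
```

The result keeps the original row index, so it can be joined back onto the other columns if needed.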


Below I inspect a few of these:

In [21]:
# We can use the DataFrame.iloc[] indexer to look at the results from a single toy dataset.
this_evt_idx = 0
this_row = df.iloc[this_evt_idx]


# Did the best fit converge?
print('Best fit converged? {}'.format( this_row['best_fit_converged'] ) )
print('Best fit has accurate covariance matrix? {}'.format( this_row['best_fit_covar'] ) )



# For the fit with a fixed num_signal, what did the parameters converge to?
print('\n\nFixed fit parameter values:')
for key,val in (this_row['fixed_fit_parameters']).items():
    print('{:<23} {:4.4} counts'.format(key+':',val))
    
    
    
# What was lambda?
print('\n\nWhat was the profile likelihood ratio for ' +\
      'this hypothesis under the given dataset? \n' +\
      '\tLambda = {:4.4}'.format(this_row['lambda']) )




Best fit converged? True
Best fit has accurate covariance matrix? True


Fixed fit parameter values:
Num_Far:                4.53e+03 counts
Num_Vessel_U238:        2.281e+04 counts
Num_Vessel_Th232:       1.486e+03 counts
Num_Internals_U238:     3.969e+04 counts
Num_Internals_Th232:    9.056e+03 counts
Num_FullTPC_Co60:       216.2 counts
Num_FullTPC_K40:        3.257e+07 counts
Num_Rn222:              9.633e+03 counts
Num_FullLXeBb2n:        2.796e+07 counts
Num_FullLXeBb0n:        10.0 counts
Num_Xe137:              69.62 counts
Signal_Efficiency:      1.005 counts


What was the profile likelihood ratio for this hypothesis under the given dataset? 
	Lambda = 0.8558
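With many toy datasets instead of one, the `lambda` column could be reduced to a critical value by taking its 90% quantile, as described at the start of this section. The sketch below shows one way to do that. The function `critical_lambda_from_df`, the choice to drop toys where either fit failed to converge, and the fake values in `toy_results` are all assumptions for illustration, not the procedure used for the paper.

```python
import pandas as pd

def critical_lambda_from_df(df, quantile=0.9):
    """Return (critical lambda, number of toys used), keeping only converged fits."""
    good = df['best_fit_converged'] & df['fixed_fit_converged']
    return df.loc[good, 'lambda'].quantile(quantile), int(good.sum())

# Fake results standing in for many output rows. In practice the rows from
# several output files could be combined first, e.g. with
# pd.concat([pd.read_hdf(f) for f in list_of_files]).
toy_results = pd.DataFrame({
    'best_fit_converged': [True] * 10 + [False],
    'fixed_fit_converged': [True] * 11,
    'lambda': [0.1 * i for i in range(10)] + [99.0],
})

crit_lambda, num_used = critical_lambda_from_df(toy_results)
print('Critical lambda from {} toys: {:.2f}'.format(num_used, crit_lambda))
```

Repeating this for each value of `num_signal` would give one point per hypothesis on the critical lambda curve.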


## Computing the 90% confidence limit

This section is not written yet. In the next few days, I'll add the same kind of walkthrough here for `Compute90PercentLimit_PythonCode.py`.