# Statistical assessment on the congruency of decay chains of superheavy nuclei

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc" style="margin-top: 1em;"><ul class="toc-item"><li><span><a href="#Statistical-assessment-on-the-congruency-of-decay-chains-of-superheavy-nuclei" data-toc-modified-id="Statistical-assessment-on-the-congruency-of-decay-chains-of-superheavy-nuclei-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Statistical assessment on the congruency of decay chains of superheavy nuclei</a></span><ul class="toc-item"><li><span><a href="#Content" data-toc-modified-id="Content-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Content</a></span></li></ul></li><li><span><a href="#Examining-data---decay-chains" data-toc-modified-id="Examining-data---decay-chains-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Examining data - decay chains</a></span><ul class="toc-item"><li><span><a href="#Extracting-data-of-interest---lifetimes" data-toc-modified-id="Extracting-data-of-interest---lifetimes-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Extracting data of interest - lifetimes</a></span></li><li><span><a href="#Motivation---Do-all-decay-chains-in-a-set-have-a-common-origin?" 
data-toc-modified-id="Motivation---Do-all-decay-chains-in-a-set-have-a-common-origin?-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>Motivation - Do all decay chains in a set have a common origin?</a></span></li></ul></li><li><span><a href="#Schmidt-tests-on-the-set-of-short-E115-decay-chains" data-toc-modified-id="Schmidt-tests-on-the-set-of-short-E115-decay-chains-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Schmidt tests on the set of short E115 decay chains</a></span><ul class="toc-item"><li><span><a href="#The-Schmidt-test" data-toc-modified-id="The-Schmidt-test-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>The Schmidt test</a></span></li><li><span><a href="#Generalised-Schmidt-test" data-toc-modified-id="Generalised-Schmidt-test-3.2"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>Generalised Schmidt test</a></span></li><li><span><a href="#The-generalised-Schmidt-method-applied" data-toc-modified-id="The-generalised-Schmidt-method-applied-3.3"><span class="toc-item-num">3.3&nbsp;&nbsp;</span>The generalised Schmidt method applied</a></span><ul class="toc-item"><li><span><a href="#Simulation-of-expected-values-and-confidence-intervals" data-toc-modified-id="Simulation-of-expected-values-and-confidence-intervals-3.3.1"><span class="toc-item-num">3.3.1&nbsp;&nbsp;</span>Simulation of expected values and confidence intervals</a></span></li></ul></li><li><span><a href="#Summary-Generalised-Schmidt-Method" data-toc-modified-id="Summary-Generalised-Schmidt-Method-3.4"><span class="toc-item-num">3.4&nbsp;&nbsp;</span>Summary Generalised Schmidt Method</a></span></li></ul></li><li><span><a href="#Figure-of-Merit-(FoM)-method" data-toc-modified-id="Figure-of-Merit-(FoM)-method-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Figure-of-Merit (FoM) method</a></span><ul class="toc-item"><li><span><a href="#Simulations-of-expected-values-and-confidence-intervals" data-toc-modified-id="Simulations-of-expected-values-and-confidence-intervals-4.1"><span 
class="toc-item-num">4.1&nbsp;&nbsp;</span>Simulations of expected values and confidence intervals</a></span><ul class="toc-item"><li><span><a href="#Random-sampling-of-$\tau$-from-its-likelihood-function" data-toc-modified-id="Random-sampling-of-$\tau$-from-its-likelihood-function-4.1.1"><span class="toc-item-num">4.1.1&nbsp;&nbsp;</span>Random sampling of $\tau$ from its likelihood function</a></span></li></ul></li></ul></li></ul></div>

## Content

* Introduction to the notebook and a background to the analysis (motivate the need)
    * Include dispute and connection to naming? 
    * Present limitation to short chains. 
* Decay chains
    * What is a decay chain?
    * Representation here. 
    * Keep notation consistent with papers.
    * Files are yaml and here markdown table printing. 
* Problem formulation
    * The Schmidt test, its limitations and motivation for generalisation.
    * The generalised Schmidt test as proposed by Ulrika Forsberg.
    * Illustrate the need with an example. Could be two different origins and sampled data and compare the results. 
* Calculations
    * Reproducible tables from articles (markdowned), i.e. simply insert decay chain data into formulas.
    * Confidence limits from simulations (speed up?). 
    * Write data to file or just store final value? How to do this?
* Visualisations
    * What here?

**References**:
* Schmidt tests: [D. Rudolph et al., EPJ Web of Conferences](https://www.epj-conferences.org/articles/epjconf/pdf/2016/12/epjconf_nn2016_01001.pdf) 
* FoM method: [U. Forsberg et al., Nucl. Phys. A](http://www.sciencedirect.com/science/article/pii/S0375947416300768?via%3Dihub)
* The alleged link between element 117 and 115 decay chains: [U. Forsberg et al., Physics Letters B](http://www.sciencedirect.com/science/article/pii/S0370269316303495?via%3Dihub)
* [Ulrika Forsberg's PhD thesis](http://portal.research.lu.se/portal/files/7495513/thesis.pdf). 

In [1]:
import pandas as pd
import re
import numpy as np
import matplotlib.pyplot as plt

# Examining data - decay chains

A compiled table of all the short decay chains is presented in Table 1 in [U. Forsberg et al., Nucl. Phys. A](http://www.sciencedirect.com/science/article/pii/S0375947416300768?via%3Dihub). 

All data relevant for the above-mentioned articles has been compiled in a pandas `DataFrame`, which is stored in the file `data/ChainsDataFrame.p`. The data can be read and displayed in the following way: 

In [2]:
df = pd.read_pickle('data/ChainsDataFrame.p')
df

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Beam Energy (MeV),Implantation Energy (MeV),Implantation time (s),Pixel,Energy E117 (MeV),Energy E115 (MeV),Energy E113 (MeV),Energy Rg (MeV),Energy Mt (MeV),Energy Bh (MeV),...,$\sigma_E$ Db (keV),$\sigma_E$ Lr (keV),Lifetime E117 (s),Lifetime E115 (s),Lifetime E113 (s),Lifetime Rg (s),Lifetime Mt (s),Lifetime Bh (s),Lifetime Db (s),Lifetime Lr (s)
Element,Type,Lab,ID,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1
E115,3n,GSI,1,242.1,13.40,0.000000,465.0,,10.290,9.240,9.77,0.825,9.06,...,0.0,,,0.3660,0.9350,11.1000,0.515,15.300,17.40000,
E115,3n,GSI,2,242.1,13.90,0.000000,748.0,,10.480,10.000,0.00,0.000,9.07,...,0.0,,,0.1770,1.1800,0.0000,0.000,8.230,25.50000,
E115,3n,GSI,3,242.1,16.30,0.000000,557.0,,0.000,9.990,9.77,9.600,9.04,...,0.0,,,0.0000,0.2400,18.0000,0.600,2.700,39.70000,
E115,3n,GSI,4,242.1,16.10,0.000000,716.0,,10.370,9.990,9.50,9.650,9.05,...,0.0,,,0.2120,1.5000,26.4000,8.950,17.800,26.60000,
E115,3n,GSI,5,242.1,16.40,0.000000,331.0,,10.440,10.000,9.76,9.650,8.55,...,0.0,,,0.3630,0.2420,11.3000,0.539,29.800,48.90000,
E115,3n,GSI,6,242.1,15.40,0.000000,621.0,,3.000,9.850,0.00,0.000,9.06,...,0.0,,,0.6530,1.0100,0.0000,0.000,52.600,79.70000,
E115,3n,GSI,7,242.1,14.10,0.000000,368.0,,10.480,9.940,1.38,9.600,9.06,...,0.0,,,0.0663,3.3000,8.2100,0.055,8.950,1.98000,
E115,3n,GSI,8,242.1,12.50,0.000000,200.0,,0.609,9.960,9.75,0.000,0.00,...,0.0,,,0.5450,2.4000,19.1000,0.000,0.000,15.50000,
E115,3n,GSI,9,245.0,15.90,0.000000,38.0,,10.380,9.960,0.00,0.000,0.00,...,0.0,,,0.1520,2.0000,0.0000,0.000,0.000,0.90500,
E115,3n,GSI,10,245.0,14.50,0.000000,1007.0,,1.660,9.840,1.28,9.680,8.97,...,0.0,,,0.0702,3.0800,1.5600,0.371,24.500,95.60000,


Each row of the compiled table holds the experimental measurements of one decay chain. Each decay chain is identified by four index levels:
1. **Element**: The element studied in the experiment.
2. **Type**: For the element 115 chains the type refers to the groups _3n_, _4n_ or _Short_ chains as used in the papers. The 10 element 117 chains that are used in the third article have the type _Link_. 
3. **Lab**: The laboratory that conducted the experiment.
4. **ID**: All chains have an ID which matches the numbers used in the articles.

The column headers present specific measured properties of the chain.

`NaN` could indicate that a certain step is missing, that the chain has already ended or that the value is irrelevant (e.g. this is the case for all columns listing E117 values for an E115 chain).   

A $\sigma_E$ value equal to 0 indicates a detected fission and thus the end of the chain.
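Selecting chains on only some of the index levels can be done with `DataFrame.xs` or `pd.IndexSlice`. The following is a minimal sketch on a small made-up frame that assumes the same four index levels (`Element`, `Type`, `Lab`, `ID`); the lifetimes are placeholder values, not experimental data, and the names `toy`, `short_gsi` and `short_gsi_slice` are only used for illustration.

```python
import pandas as pd

# Toy frame with the same four index levels as the compiled table.
# The lifetimes are placeholder values, not experimental data.
index = pd.MultiIndex.from_tuples(
    [("E115", "Short", "GSI", 1),
     ("E115", "Short", "Dubna", 1),
     ("E115", "3n", "GSI", 1),
     ("E117", "Link", "Dubna", 1)],
    names=["Element", "Type", "Lab", "ID"],
)
toy = pd.DataFrame({"Lifetime E115 (s)": [0.2, 0.3, 0.4, 0.5]}, index=index)

# Select by named levels; drop_level=False keeps the full index in the result
short_gsi = toy.xs(("Short", "GSI"), level=["Type", "Lab"], drop_level=False)

# Equivalent selection with IndexSlice, which needs a sorted index
idx = pd.IndexSlice
short_gsi_slice = toy.sort_index().loc[idx[:, "Short", "GSI", :], :]
```

Both selections return the single toy chain with type `Short` measured at `GSI`, without having to spell out the element or the ID.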

## Extracting data of interest - lifetimes

In the context of this notebook it is of interest to e.g. extract lifetime data for the _short_ E115 chains. A new data frame with this content can be obtained as follows:  

In [3]:
#How can I locate indices on the basis of only the second and third index colum (i.e. Lab and Element)?
#df1, df2, df3 = df.loc[('Dubna', 'E115', 'Short')], df.loc[('GSI', 'E115', 'Short')], df.loc[('Berkeley', 'E115', 'Short')]
#df_short = pd.concat([df1, df2, df3])
df_short = df.loc[('E115', 'Short')]
df_short



Unnamed: 0_level_0,Unnamed: 1_level_0,Beam Energy (MeV),Implantation Energy (MeV),Implantation time (s),Pixel,Energy E117 (MeV),Energy E115 (MeV),Energy E113 (MeV),Energy Rg (MeV),Energy Mt (MeV),Energy Bh (MeV),...,$\sigma_E$ Db (keV),$\sigma_E$ Lr (keV),Lifetime E117 (s),Lifetime E115 (s),Lifetime E113 (s),Lifetime Rg (s),Lifetime Mt (s),Lifetime Bh (s),Lifetime Db (s),Lifetime Lr (s)
Lab,ID,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
GSI,1,245.0,12.3,0.0,268.0,,10.51,242.0,,,,...,,,,0.227,0.378,,,,,
GSI,2,242.1,16.2,0.0,425.0,,1.45,211.0,,,,...,,,,0.0645,0.366,,,,,
GSI,3,242.1,13.9,0.0,681.0,,10.54,9.95,196.0,,,...,,,,0.261,1.15,0.343,,,,
GSI,4,242.1,14.5,0.0,344.0,,10.34,9.89,218.0,,,...,,,,1.46,0.0262,0.432,,,,
GSI,5,242.1,13.8,0.0,554.0,,10.49,9.97,135.0,,,...,,,,0.345,0.369,14.4,,,,
GSI,6,245.0,14.5,0.0,205.0,,10.53,9.89,230.0,,,...,,,,0.21,1.05,8.27,,,,
GSI,7,245.0,11.9,0.0,128.0,,0.541,3.12,230.0,,,...,,,,0.815,2.33,2.89,,,,
Dubna,1,240.5,11.38,0.0,3.0,,10.377,9.886,215.7,,,...,,,,0.2562,1.4027,1.9775,,,,
Dubna,2,241.0,15.18,0.0,6.0,,10.54,9.916,214.9,,,...,,,,0.0661,1.55,2.3638,,,,
Dubna,3,241.0,9.04,0.0,2.0,,10.373,9.579,141.1,,,...,,,,2.3507,22.5822,60.1855,,,,


Extracting only the columns with lifetimes.

In [4]:
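# Raw string so that the escaped parentheses reach the regex engine unchanged
s_find = r"Lifetime .* \(s\)"
col_names = ",".join(df_short.columns)
print("Columns:", col_names)
# Keep every column whose full name matches the lifetime pattern
col_found = [col for col in df_short.columns if re.fullmatch(s_find, col)]
col_found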

Columns: Beam Energy (MeV),Implantation Energy (MeV),Implantation time (s),Pixel,Energy E117 (MeV),Energy E115 (MeV),Energy E113 (MeV),Energy Rg (MeV),Energy Mt (MeV),Energy Bh (MeV),Energy Db (MeV),Energy Lr (MeV),$\sigma_E$ E117 (keV),$\sigma_E$ E115 (keV),$\sigma_E$ E113 (keV),$\sigma_E$ Rg (keV),$\sigma_E$ Mt (keV),$\sigma_E$ Bh (keV),$\sigma_E$ Db (keV),$\sigma_E$ Lr (keV),Lifetime E117 (s),Lifetime E115 (s),Lifetime E113 (s),Lifetime Rg (s),Lifetime Mt (s),Lifetime Bh (s),Lifetime Db (s),Lifetime Lr (s)


['Lifetime E117 (s)',
 'Lifetime E115 (s)',
 'Lifetime E113 (s)',
 'Lifetime Rg (s)',
 'Lifetime Mt (s)',
 'Lifetime Bh (s)',
 'Lifetime Db (s)',
 'Lifetime Lr (s)']

In [5]:
df_short = df_short.loc[:, col_found]
df_short

Unnamed: 0_level_0,Unnamed: 1_level_0,Lifetime E117 (s),Lifetime E115 (s),Lifetime E113 (s),Lifetime Rg (s),Lifetime Mt (s),Lifetime Bh (s),Lifetime Db (s),Lifetime Lr (s)
Lab,ID,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
GSI,1,,0.227,0.378,,,,,
GSI,2,,0.0645,0.366,,,,,
GSI,3,,0.261,1.15,0.343,,,,
GSI,4,,1.46,0.0262,0.432,,,,
GSI,5,,0.345,0.369,14.4,,,,
GSI,6,,0.21,1.05,8.27,,,,
GSI,7,,0.815,2.33,2.89,,,,
Dubna,1,,0.2562,1.4027,1.9775,,,,
Dubna,2,,0.0661,1.55,2.3638,,,,
Dubna,3,,2.3507,22.5822,60.1855,,,,


Removing all columns with only `NaN`.

In [6]:
df_short = df_short.dropna(axis=1, how='all')
df_short

Unnamed: 0_level_0,Unnamed: 1_level_0,Lifetime E115 (s),Lifetime E113 (s),Lifetime Rg (s)
Lab,ID,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
GSI,1,0.227,0.378,
GSI,2,0.0645,0.366,
GSI,3,0.261,1.15,0.343
GSI,4,1.46,0.0262,0.432
GSI,5,0.345,0.369,14.4
GSI,6,0.21,1.05,8.27
GSI,7,0.815,2.33,2.89
Dubna,1,0.2562,1.4027,1.9775
Dubna,2,0.0661,1.55,2.3638
Dubna,3,2.3507,22.5822,60.1855


For comparison, let's store two data frames, one with the complete set of _Short_ E115 chains and one excluding Dubna 3. 

In [7]:
l_df_short = [df_short, df_short.drop(('Dubna', '3'))]

## Motivation - Do all decay chains in a set have a common origin? 

# Schmidt tests on the set of short E115 decay chains

## The Schmidt test

Around the turn of the millennium, [Schmidt et al.](https://link.springer.com/article/10.1007/s100500070129) suggested a new approach to assessing decay chains. 

In the so-called _Schmidt test_, the logarithm of the lifetimes, $\theta = \ln t$, of one decay step is considered instead of the lifetimes themselves, together with the standard deviation $\sigma_\theta$ of these logarithms. 
The fundamental principle of the test is that the shape of the probability density distribution of $\theta$, and consequently the expected value of $\sigma_\theta$, is independent of the decay constant $\lambda$ of the radioactive species under study.

For a large number of events, i.e. measured lifetimes, the expected value of the standard deviation is analytically $\pi/\sqrt{6} \approx 1.28$, and the uncertainty of the measured value shrinks with the number of events $m$: $\sigma_\theta = 1.28 \pm 2.15 / \sqrt{m}$.

The interpretation of the obtained standard deviation is as follows: 
* A too large value indicates that the set of lifetimes does not originate from one single radioactive species.
* A too small value indicates a lack of sensitivity in the measurements or that the data is unjustly pruned.
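The independence of $\sigma_\theta$ from the decay constant, and its sensitivity to a mixture of species, can be illustrated with a short simulation. This is only a sketch: the two mean lifetimes (0.1 s and 100 s), the sample size and the seed are arbitrary assumptions, and `sigma_theta` is a helper name introduced here.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # seed chosen arbitrarily for reproducibility

def sigma_theta(lifetimes):
    """Standard deviation of the logarithmic lifetimes, theta = ln(t)."""
    return np.std(np.log(lifetimes))

n_events = 200_000
# Two hypothetical species whose mean lifetimes differ by a factor of 1000
sigma_fast = sigma_theta(rng.exponential(scale=0.1, size=n_events))
sigma_slow = sigma_theta(rng.exponential(scale=100.0, size=n_events))

# Mixing the two species broadens the distribution of ln(t)
mixed = np.concatenate([rng.exponential(scale=0.1, size=n_events // 2),
                        rng.exponential(scale=100.0, size=n_events // 2)])
sigma_mixed = sigma_theta(mixed)

print(sigma_fast, sigma_slow, np.pi / np.sqrt(6), sigma_mixed)
```

Both single-species samples give a value close to $\pi/\sqrt{6}$ regardless of the mean lifetime, while the mixed sample gives a clearly larger value, which is the signature the test looks for.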

If the _short_ chains of E115 are considered, we arrive at the following values for the Schmidt test: 

In [8]:
Schmidt = np.nanstd(np.log(l_df_short[0]), axis=0)
# One label per decay step
col_names = ["Step "+str(i+1) for i in range(len(Schmidt))]
Schmidt = dict(zip(col_names, Schmidt))
df_Schmidt = pd.DataFrame(data=Schmidt, index=["Experimental Schmidt value"], columns=col_names)
df_Schmidt

Unnamed: 0,Step 1,Step 2,Step 3
Experimental Schmidt value,1.204459,1.75037,1.842351


Through Monte-Carlo simulations, the distribution of this standard deviation can be obtained as a function of the number of events measured. 
[Schmidt et al.](https://link.springer.com/article/10.1007/s100500070129) (Table 1) present expected values and 90% confidence intervals for $\sigma_\theta$ for various numbers of events $m$. 

Let's add these to the table above. 

In [9]:
Schmidt_conf = [[1.19, 1.19, 1.16], ["[0.73, 1.77]", "[0.73, 1.77]", "[0.65, 1.82]"]]
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the rows instead
pd.concat([df_Schmidt, pd.DataFrame(data=Schmidt_conf, index=[r"Expected value of $\sigma_θ$", "90% confidence interval"], columns=col_names)])

Unnamed: 0,Step 1,Step 2,Step 3
Experimental Schmidt value,1.20446,1.75037,1.84235
Expected value of $\sigma_θ$,1.19,1.19,1.16
90% confidence interval,"[0.73, 1.77]","[0.73, 1.77]","[0.65, 1.82]"


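The experimental Schmidt values for steps 1 and 2 lie inside the 90% confidence intervals, while the value for step 3 (1.84) lies marginally above the upper limit of 1.82. Taken one decay step at a time, the data is thus largely compatible with an origin from a single radioactive species, although step 3 is a borderline case. 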

## Generalised Schmidt test

The Schmidt method is limited to one decay step at a time, and hence not all available data is considered at once. If all steps of the decay chains have common origins, a stricter test is to group all steps of each chain together. This cannot be done with an arithmetic average over the steps, since that would not make a difference compared with the step-wise test; a geometric mean is used instead. 

A generalisation of the Schmidt test was proposed in [D. Rudolph et al., EPJ Web of Conferences](https://www.epj-conferences.org/articles/epjconf/pdf/2016/12/epjconf_nn2016_01001.pdf) and thoroughly described in [Ulrika Forsberg's PhD thesis](http://portal.research.lu.se/portal/files/7495513/thesis.pdf). 

Assume that $m$ chains have been observed and that each chain $i$ contains $n_i$ decay steps. Let $\theta_{i,j}$ be the logarithm of the $j$th lifetime in chain $i$ and $\bar{\theta}_j$ the mean of these logarithms for step $j$. Then the measure

\begin{equation}
\xi_{m, n} = \sqrt{\frac{1}{m} \sum\limits_{i=1}^m \sqrt[n_i]{\prod\limits_{j=1}^{n_i} \left( \theta_{i,j} - \bar{\theta}_j \right)^2 } }, \qquad \bar{\theta}_j = \frac{1}{m} \sum\limits_{i=1}^m \theta_{i,j}
\end{equation}

incorporates the correlation times along each decay chain, rather than only comparing single decay steps across a set of chains. 

## The generalised Schmidt method applied

To calculate the geometric mean of an array containing `NaN` values, the following (non-generic) function is defined. It multiplies the entries of each row up to the first `NaN` and takes the corresponding root.

In [10]:
def g_nan_mean(data):
    """Row-wise geometric mean over the leading non-NaN entries of each row."""
    data = np.asarray(data, dtype=float)
    if data.ndim == 1:
        # Only one step per chain: the geometric mean is the value itself
        return data
    ret = np.empty(data.shape[0])
    for i, row in enumerate(data):
        temp = 1.0
        steps = 0
        for value in row:
            if np.isnan(value):
                # A chain ends at its first missing step
                break
            temp *= value
            steps += 1
        # Rows without any valid step have no defined mean
        ret[i] = temp**(1./steps) if steps > 0 else np.nan
    return ret
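Since the Monte-Carlo loops call this helper many times, a vectorised variant may be worth considering. The sketch below is a hypothetical alternative, not the function used in this notebook: it relies on the identity that the geometric mean equals `exp(mean(log(x)))`, and it agrees with the loop-based helper only under the assumption that the `NaN` values sit at the end of each row.

```python
import numpy as np

def g_nan_mean_vectorised(data):
    """Row-wise geometric mean that skips NaN entries.

    Hypothetical alternative to the loop-based helper; it gives the same
    result only when the NaN values sit at the end of each row.
    """
    data = np.asarray(data, dtype=float)
    if data.ndim == 1:
        return data
    # exp(mean(log(x))) equals the geometric mean of x; nanmean skips NaN
    with np.errstate(divide="ignore"):
        return np.exp(np.nanmean(np.log(data), axis=1))

# Toy input: two complete rows and one row that ends after two steps
toy = np.array([[1.0, 4.0, 2.0],
                [2.0, 8.0, np.nan],
                [3.0, 3.0, 3.0]])
result = g_nan_mean_vectorised(toy)
```

For the toy input the three geometric means are the cube root of 8, the square root of 16 and 3.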

In [11]:
gen_Schmidt = []
for dfs in l_df_short:
    theta = np.log(dfs)
    theta_var = np.square(theta - np.nanmean(theta, axis=0))
    gen_Schmidt_temp = g_nan_mean(theta_var)
    gen_Schmidt.append(np.sqrt(np.mean(gen_Schmidt_temp)))

col_names = ["E115 Short chains", "Dubna 3 Excl."]
df_gen_Schmidt = dict(zip(col_names, gen_Schmidt))
df_gen_Schmidt = pd.DataFrame(data=df_gen_Schmidt, index=["Experimental Generalised Schmidt value"], columns=col_names)
df_gen_Schmidt

Unnamed: 0,E115 Short chains,Dubna 3 Excl.
Experimental Generalised Schmidt value,1.302724,1.032047


### Simulation of expected values and confidence intervals

Via Monte-Carlo simulations, the expected value and its confidence intervals can be obtained for the generalised Schmidt method. For this purpose, 10 000 sets of decay chains with the same structure as the experimental set extracted above are generated. Each decay step is simulated from an exponential distribution with an arbitrary decay constant $\lambda = 1$, after which exactly the same calculations as above are made. For completeness, different numbers of steps $n$ are included. Four of the 14 short chains have only two decay steps; to see the effect of this, another index $l$ is introduced, which represents the number of chains that have only two decay steps.  

In [12]:
sim_names = ["$m=14$, $n=1$", "$m=14$, $n=2$", "$m=14$, $n=3$", "$m=14$, $n=3$, $l=4$", "Excl. D3"]
nbr_sets = 10000
df_temp = np.zeros(np.shape(l_df_short[0]))
l_df_sim = [df_temp[:,0], df_temp[:,0:2], df_temp, l_df_short[0], l_df_short[1]]

sim_Schmidt = np.empty((nbr_sets, len(sim_names)))

for i, df_sim in enumerate(l_df_sim):
    print("Set: ", i, sim_names[i])
    shape = (nbr_sets, *np.shape(df_sim))
    # Lifetimes drawn from an exponential distribution with lambda = 1
    sim = np.random.exponential(scale=1, size=shape)
    # Blank out the steps that are missing in the template data set
    mask = np.isnan(np.asarray(df_sim, dtype=float))
    sim[:, mask] = np.nan
    theta = np.log(sim)
    theta_var = np.zeros(np.shape(theta))
    temp_Schmidt = np.zeros(np.shape(theta)[0:2])
    for j in range(nbr_sets):
        theta_var[j] = np.square(theta[j] - np.nanmean(theta[j], axis=0))
        temp_Schmidt[j] = g_nan_mean(theta_var[j])
    sim_Schmidt[:, i] = np.sqrt(np.mean(temp_Schmidt, axis=1))

sim_final_Schmidt = np.mean(sim_Schmidt, axis=0)
print(sim_final_Schmidt)
print("Done")

Set:  0 $m=14$, $n=1$
Set:  1 $m=14$, $n=2$
Set:  2 $m=14$, $n=3$
Set:  3 $m=14$, $n=3$, $l=4$
Set:  4 Excl. D3
[ 1.19902699  0.92899889  0.83874633  0.86586574  0.86242165]
Done


What follows is an illustration of the effect of the generalised Schmidt method. 

In [13]:
%matplotlib notebook
colors = plt.rcParams['axes.prop_cycle'].by_key()['color'];
plt.figure()
for i, l in enumerate(sim_names[0:-1]):
    plt.hist(sim_Schmidt[:,i], bins=100, color=colors[i], label=l, histtype='step', density=True)
plt.legend(loc='best',markerfirst=True, frameon=True,shadow=True,fancybox=True)
plt.xlabel(r"$\xi_{m,n,l}$")
plt.xlim((0.3,2.5))
plt.ylabel("Intensity")
plt.show()


The more steps that are included (larger $n$), the further the distribution of $\xi_{m,n,l}$ moves towards smaller values; the simulated means printed above drop from 1.20 for $n=1$ to 0.84 for $n=3$. Note that the blue curve ($n=1$) corresponds to the standard Schmidt test. 

## Summary Generalised Schmidt Method 

The table below compares the experimental values for the complete set of _Short_ E115 chains and for the set with the _Dubna 3_ chain excluded with the simulated expected values and confidence intervals.

In [14]:
s_gen_Schmidt = []
s_conf_int = "[{:1.2f}, {:1.2f}]"
s_gen_Schmidt.append(sim_final_Schmidt[3:])
s_gen_Schmidt.append([s_conf_int.format(np.percentile(sim_Schmidt[:,3], 5), np.percentile(sim_Schmidt[:,3], 95)), 
                     s_conf_int.format(np.percentile(sim_Schmidt[:,4], 5), np.percentile(sim_Schmidt[:,4], 95))])

# DataFrame.append was removed in pandas 2.0; pd.concat stacks the rows instead
pd.concat([df_gen_Schmidt, pd.DataFrame(data=s_gen_Schmidt, index=[r"Expected value of $\xi_{m,n,l}$", "90% confidence interval"], columns=df_gen_Schmidt.columns)])

Unnamed: 0,E115 Short chains,Dubna 3 Excl.
Experimental Generalised Schmidt value,1.30272,1.03205
"Expected value of $\xi_{m,n,l}$",0.865866,0.862422
90% confidence interval,"[0.63, 1.14]","[0.62, 1.15]"


The value for the complete set _E115 Short chains_ (1.30) lies above the upper limit of the 90% confidence interval (1.14). As stated in [D. Rudolph et al., EPJ Web of Conferences](https://www.epj-conferences.org/articles/epjconf/pdf/2016/12/epjconf_nn2016_01001.pdf): "Therefore, it is very unlikely that it is a mistake to conclude that not all chains originate from the same radioactive species". 

If the _Dubna 3_ chain is excluded from the set, $\xi_{m,n,l}$ (1.03) fits within the limits obtained from the simulation. In the article, exclusions of different chains are thoroughly examined.

# Figure-of-Merit (FoM) method 

Described in [U. Forsberg et al., Nucl. Phys. A](http://www.sciencedirect.com/science/article/pii/S0375947416300768?via%3Dihub):
> A FoM$^{(n)}_j$, defined for each correlation time $t^{(n)}_j$ decay step $j = 1, 2, 3$ of the chain identified by the number $n$, is calculated as the value of a probability density function for a reference data set. The geometric mean of FoM$^{(n)}_j$ over all available steps $j$ in chain $n$ defines the FoM$^{(n)}_{geom}$ for that chain. The arithmetic mean of FoM$^{(n)}_{geom}$ over all $N$ chains defines the FoM for the data set with respect to the interpretation under consideration. 

> If the individual chains all deviate strongly from the average data, the FoM value will be low. If the chains are all too similar to their average behaviour, the FoM will be high. The test is similar to the Schmidt test described above.  Note, however, that a low $\sigma_\theta$ corresponds to a large FoM and vice versa.


> The probability density function is obtained through the following: 

> 1. For each step $j$, the average experimental lifetime $\bar{t}_j$ calculated, and the number of available lifetimes $N_j$ is noted. 
> 2. For each step $j$, the likelihood function for the true lifetime $\tau_j$, given by $N_j$ and $\bar{t}_j$, is  determined. 
> 3. For each step $j$, a $\tau_j$ is selected with a probability governed by the likelihood function for $\tau_j$, and then a set of $N_j$ lifetimes are generated from the exponential distribution defined by this $\tau_j$. This procedure is repeated until a smooth histogram emerges.

The analytical expression for this _smeared_ PDF for step $j$, using a reference data set with $N_j$ data points and average lifetime $\bar{t}_j$ in step $j$, is

\begin{equation}
f(t) = t \, (N_j - 1) \frac{ (N_j \bar{t}_j)^{N_j-1} }{ (N_j \bar{t}_j + t)^{N_j} }
\end{equation}

It is obtained by weighting the exponential decay distribution on a logarithmic time axis, $g(t) = \frac{t}{\tau} e^{- \frac{t}{\tau} }$, with the normalised likelihood function for $\tau$:

\begin{equation}
h(\tau)= \frac{ N^{N-1} }{ (N-2)! }\frac{ \bar{t}^{N-1} }{ \tau^N } e^{- \frac{N\bar{t}}{\tau}}
\end{equation}
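For reference, the integral behind $f(t)$ can be written out as follows. The derivation uses only the two functions $g(t)$ and $h(\tau)$ quoted above; the shorthand $a = N\bar{t}$ and the substitution $u = (a+t)/\tau$ are introduced here for convenience.

\begin{align}
f(t) &= \int_0^\infty g(t)\, h(\tau)\, \mathrm{d}\tau
      = \frac{t\, a^{N-1}}{(N-2)!} \int_0^\infty \tau^{-(N+1)} e^{-\frac{a+t}{\tau}}\, \mathrm{d}\tau \\
     &= \frac{t\, a^{N-1}}{(N-2)!} \cdot \frac{1}{(a+t)^{N}} \int_0^\infty u^{N-1} e^{-u}\, \mathrm{d}u
      = \frac{t\, a^{N-1}}{(N-2)!} \cdot \frac{(N-1)!}{(a+t)^{N}}
      = t\,(N-1) \frac{a^{N-1}}{(a+t)^{N}}
\end{align}

The remaining integral over $u$ is the gamma function $\Gamma(N) = (N-1)!$, and inserting $a = N\bar{t}$ recovers the expression for $f(t)$.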


In [36]:
def pdf_smeared(times):
    """Evaluate the smeared PDF f(t) for every lifetime, column by column (step j)."""
    # np.asarray replaces DataFrame.as_matrix, which was removed in pandas 1.0
    times = np.asarray(times, dtype=float)
    N_j = np.count_nonzero(~np.isnan(times), axis=0)  # lifetimes available per step
    t_bar = np.nanmean(times, axis=0)                 # average lifetime per step
    prod = N_j * t_bar
    # f(t) = t (N_j - 1) (N_j t_bar)^(N_j - 1) / (N_j t_bar + t)^N_j, broadcast over rows
    return times * (N_j - 1) * prod**(N_j - 1) / (prod + times)**N_j
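A quick numerical sanity check of the smeared PDF is to integrate it over $\ln t$, which should give 1 if $f(t)$ is read as a density per unit logarithmic time. The sketch below is self-contained and uses a hypothetical reference set ($N = 14$, $\bar{t} = 0.5$ s); `smeared_pdf` is a scalar-parameter helper written for this check, not the notebook function.

```python
import numpy as np

def smeared_pdf(t, n_ref, t_bar):
    """Smeared density per unit ln(t) for n_ref reference lifetimes with mean t_bar."""
    a = n_ref * t_bar
    return t * (n_ref - 1) * a**(n_ref - 1) / (a + t)**n_ref

# Hypothetical reference set: 14 lifetimes with an average of 0.5 s
n_ref, t_bar = 14, 0.5

# Integrate over theta = ln(t) with the trapezoidal rule
theta = np.linspace(np.log(t_bar) - 30.0, np.log(t_bar) + 30.0, 20001)
values = smeared_pdf(np.exp(theta), n_ref, t_bar)
integral = np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(theta))

# Position of the maximum on the logarithmic time axis
t_peak = np.exp(theta[np.argmax(values)])
```

Setting the derivative with respect to $\ln t$ to zero places the maximum at $t = N\bar{t}/(N-1)$, slightly above the average lifetime of the reference set.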

Calculating FoM$_j$ and FoM$_{geom}$.

In [89]:
fom_j = pdf_smeared(l_df_short[0])
fom_geom = g_nan_mean(fom_j)
fom_final = np.mean(fom_geom)

Printing the data in a table.

In [92]:
s_cols = 'FoM$_{}$'
# Number the steps from 1 to match the notation j = 1, 2, 3 used in the text
col_names = [s_cols.format(j + 1) for j in range(len(fom_j[0]))]
col_names.append('FoM$_{geom}$')
data = dict(zip(col_names, np.append(fom_j, fom_geom[:,np.newaxis], axis=1).T))
df_fom = pd.DataFrame(data=data, index=l_df_short[0].index)
data = dict(zip(col_names, [ None, None, None, fom_final]))
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the rows instead
df_fom = pd.concat([df_fom, pd.DataFrame(data=data, index=[("FoM", '')], columns=col_names)])
df_fom = df_fom.round(decimals=3)
df_fom

Unnamed: 0_level_0,Unnamed: 1_level_0,FoM$_1$,FoM$_2$,FoM$_3$,FoM$_{geom}$
Lab,ID,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
GSI,1.0,0.282,0.124,,0.187
GSI,2.0,0.113,0.12,,0.117
GSI,3.0,0.302,0.276,0.03,0.136
GSI,4.0,0.168,0.01,0.038,0.04
GSI,5.0,0.336,0.121,0.336,0.239
GSI,6.0,0.271,0.262,0.337,0.288
GSI,7.0,0.31,0.352,0.198,0.278
Dubna,1.0,0.3,0.304,0.148,0.238
Dubna,2.0,0.116,0.317,0.17,0.184
Dubna,3.0,0.061,0.007,0.047,0.027


## Simulations of expected values and confidence intervals

Presented in [U. Forsberg et al., Nucl. Phys. A](http://www.sciencedirect.com/science/article/pii/S0375947416300768?via%3Dihub):

> 1. For the first decay step, a random $\tau_1$ was picked according to the $\tau$ likelihood function $h(\tau)$ (see Fig. 4). Fourteen random lifetimes from the exponential distribution $g(t)$ defined by this $\tau_1$ were generated. 
> 2. For the second decay step, a random $\tau_2$ was picked according to the $\tau$ likelihood function $h(\tau)$. Fourteen random lifetimes from the exponential distribution $g(t)$ defined by this $\tau_2$ were generated. 
> 3. For the third decay step, a random $\tau_3$ was picked according to the $\tau$ likelihood function $h(\tau)$. Ten random lifetimes from the exponential distribution $g(t)$ defined by this $\tau_3$ were  generated. 
> 4. The generated lifetimes were collected in fourteen chains – ten with three lifetimes, and four with two lifetimes.

The $\tau$ likelihood function is first defined. Note that the function in this case takes an input argument corresponding to the data frame of the lifetimes from e.g. the set of short E115 decay chains. In the following, all steps are treated simultaneously. 

In [109]:
from scipy.special import factorial
class tau_likelihood_func(object):
    """Normalised likelihood function h(tau), evaluated for all decay steps at once."""
    def __init__(self, times):
        # np.asarray replaces DataFrame.as_matrix, which was removed in pandas 1.0
        times = np.asarray(times, dtype=float)
        self.N_j = np.count_nonzero(~np.isnan(times), axis=0)
        self.t_bar = np.nanmean(times, axis=0)
        # Prefactor N^(N-1) * t_bar^(N-1) / (N-2)!; the float cast avoids integer overflow
        self.factor1 = np.divide(np.multiply(np.power(self.N_j.astype(float), self.N_j-1), np.power(self.t_bar, self.N_j-1)), factorial(self.N_j-2))
        # Numerator of the exponent, -N * t_bar
        self.factor2 = -np.multiply(self.N_j, self.t_bar)

    def __call__(self, tau):
        ret = np.multiply(np.divide(self.factor1, np.power(tau, self.N_j)), np.exp(np.divide(self.factor2, tau)))
        return ret

tau_likelihood = tau_likelihood_func(l_df_short[0])

### Random sampling of $\tau$ from its likelihood function

A numerical cumulative distribution function (CDF) of $\tau$ is created on a grid, from which the random samples of $\tau$ are generated. 

In [116]:
# Creating points for the numerical CDFs
nbr_points = 10000
tau = np.empty((nbr_points, len(l_df_short[0].columns)))
for i in range(len(tau[0])):
    tau[:,i] = np.linspace(0.00001, tau_likelihood.t_bar[i]*4, len(tau[:,0]))
tau_l = tau_likelihood(tau)

# Creating normalised CDFs 
cdf = np.cumsum(tau_l, axis=0)
cdf = np.divide(cdf, cdf[-1,:])

# Plotting the CDF
labels = ["Step "+str(i+1) for i in range(len(l_df_short[0].columns))]
plt.figure()
for i, l in enumerate(labels):
    plt.plot(tau[:,i], cdf[:,i], label=l)
plt.legend(loc='best',markerfirst=True, frameon=True,shadow=True,fancybox=True)
plt.xlabel(r"$\tau$ (s)")
plt.ylabel(r"CDF")
plt.show()


Random sampling from the numerical CDF for 100 000 sets of decay chains. 

In [149]:
nbr_sets = 100000

# Generating random numbers from uniform [0, 1)
rands = np.random.rand(nbr_sets, len(l_df_short[0].columns))

# Get the corresponding taus from the CDF
rand_taus = np.empty(np.shape(rands))
for i in range(len(l_df_short[0].columns)):
    inds = np.searchsorted(cdf[:,i], rands[:,i])
    rand_taus[:,i] = tau[inds,i]
  

f, axes = plt.subplots(3, 1)# sharex=True, sharey=True)

# add a big axes, hide frame
f.add_subplot(111, frameon=False)
# hide tick and tick label of the big axes
plt.tick_params(labelcolor='none', top=False, bottom=False, left=False, right=False)

# For bold title in legend
plt.rc('text', usetex=True)

# Plotting the sampled taus and their pdf:s
for i,ax in enumerate(axes):
    ax.plot(tau[:,i],tau_l[:,i], color='black', label="Real PDF")
    ax.hist(rand_taus[:,i], bins=100, color=colors[i], density=True, label="Sampled Distribution")
    #ax.annotate(an[i],xy=(0.75,.8), fontsize=24, xycoords='axes fraction', color='k')
    ax.legend(loc='upper right',markerfirst=False, title=r"\bf{"+labels[i]+"}", frameon=True, shadow=True, fancybox=True)
    #ax.set_xscale('log')
    #ax.set_xlim(0.2)
plt.ylabel("PDF")
plt.xlabel(r"$\tau$ (s)")
plt.show()
plt.rc('text', usetex=False)

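As a cross-check of the numerical CDF sampling, note that $h(\tau) \propto \tau^{-N} e^{-N\bar{t}/\tau}$ has the form of an inverse-gamma density with shape $N-1$ and scale $N\bar{t}$, so $\tau$ can also be drawn exactly without a grid or an upper cut-off. The sketch below is an alternative, not the procedure used in this notebook; `sample_tau`, the seed and the reference values ($N = 14$, $\bar{t} = 1$ s) are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=2)  # arbitrary seed for reproducibility

def sample_tau(n_ref, t_bar, size):
    """Draw tau from h(tau), which is an inverse-gamma density.

    h(tau) is proportional to tau**(-n_ref) * exp(-n_ref * t_bar / tau),
    i.e. inverse gamma with shape n_ref - 1 and scale n_ref * t_bar.
    """
    shape = n_ref - 1
    scale = n_ref * t_bar
    # If G ~ Gamma(shape, 1), then scale / G follows the inverse-gamma law
    return scale / rng.gamma(shape, 1.0, size=size)

# Hypothetical reference set: 14 lifetimes with an average of 1 s
taus = sample_tau(n_ref=14, t_bar=1.0, size=200_000)

# Analytical mean of the inverse gamma: scale / (shape - 1) = 14 / 12
expected_mean = 14.0 / 12.0
```

Comparing a histogram of such exact samples with the grid-based samples would show whether the truncation of the grid at four times the average lifetime has a visible effect.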

Simulating 100 000 sets of decay chains and calculating their FoM values. This could take a minute or two.

In [146]:
sim_names = ["E115 Short Chains"] #, "D3 Excl."]
df_temp = np.zeros(np.shape(l_df_short[0]))
l_df_sim = [l_df_short[0]]

sim_FoM = np.empty((nbr_sets, len(sim_names)))

#Looping over data sets as defined in l_df_sim
for i, df_sim in enumerate(l_df_sim):
    print("Set: ", i, sim_names[i])
    
    # Simulations of all sets of decay chains
    shape = (nbr_sets, *np.shape(df_sim))
    # One tau per simulated set and decay step, broadcast over the chains of each set
    sim = np.random.exponential(scale=rand_taus[:, np.newaxis, :], size=shape)
    # Blank out the steps that are missing in the experimental chains
    mask = np.isnan(np.asarray(df_sim, dtype=float))
    sim[:, mask] = np.nan
    print("    Simulated all sets of decay chains")
    # FoM calculations
    fom = np.empty(shape[1:])
    fom_geom = np.empty(shape[1])
    for j in range(nbr_sets):
        fom = pdf_smeared(sim[j])
        fom_geom = g_nan_mean(fom)
        sim_FoM[j,i] = np.mean(fom_geom)
print("Done")
#sim[0:2]

Set:  0 E115 Short Chains
    Simulated all sets of decay chains
Done



In [157]:
q5, q95 = np.percentile(sim_FoM, 5), np.percentile(sim_FoM, 95)
q2p5, q97p5 = np.percentile(sim_FoM, 2.5), np.percentile(sim_FoM, 97.5)
q1, q99 = np.percentile(sim_FoM, 1), np.percentile(sim_FoM, 99)
sim_FoM_mean = np.mean(sim_FoM)

col_names = ["E115 Short Chains"]
data = [sim_FoM_mean]
data.append("["+str(q5)+", "+str(q95)+"]")
data.append("["+str(q2p5)+", "+str(q97p5)+"]")
data.append("["+str(q1)+", "+str(q99)+"]")
print(data)

inds = ["Expected value FoM", "90% conf. interval", "95% conf. interval", "98% conf. interval"]
# Pass the whole list as one column; zipping it with col_names kept only the first entry
pd.DataFrame(data={col_names[0]: data}, index=inds)

[0.22960103231696591, '[0.194094230947, 0.263413143657]', '[0.18703012275, 0.269632832533]', '[0.179079114676, 0.276374890508]']


Unnamed: 0,E115 Short Chains
Expected value FoM,0.229601
90% conf. interval,"[0.194094230947, 0.263413143657]"
95% conf. interval,"[0.18703012275, 0.269632832533]"
98% conf. interval,"[0.179079114676, 0.276374890508]"


In [153]:
plt.figure()
ax = plt.gca()
plt.hist(sim_FoM, bins=100, density=True, label="FoM distribution")
ax.axvline(sim_FoM_mean, linestyle="--", color='k', label="Mean")
# Mark only the 5% and 95% percentiles; the mean is already drawn above
ax.vlines([q5, q95], ymin=0, ymax=15, linestyle="--", color=colors[1], label="90% confidence limits")
plt.xlabel("FoM")
plt.ylabel("Intensity")
plt.ylim((0, 25))
plt.xlim((0.15, 0.35))
plt.legend(loc='best',markerfirst=True, frameon=True,shadow=True,fancybox=True)
plt.show()
