# Statistical analysis on the congruency of decay chains of superheavy nuclei

## Content

* Introduction to the notebook and a background to the analysis (motivate the need)
    * Include dispute and connection to naming? 
    * Present limitation to short chains. 
* Decay chains
    * What is a decay chain?
    * Representation here. 
    * Keep notation consistent with papers.
    * Files are yaml and here markdown table printing. 
* Problem formulation
    * The Schmidt test, its limitations and motivation for generalisation.
    * The generalised Schmidt test as proposed by Ulrika Forsberg.
    * Illustrate the need with an example. Could be two different origins and sampled data and compare the results. 
* Calculations
    * Reproducible tables from articles (markdowned), i.e. simply insert decay chain data into formulas.
    * Confidence limits from simulations (speed up?). 
    * Write data to file or just store final value? How to do this?
* Visualisations
    * What here?

## Decay Chains

A compiled table of all the short decay chains is presented in Table 1 of [U. Forsberg et al., Physics Letters B](http://www.sciencedirect.com/science/article/pii/S0375947416300768?via%3Dihub).

All data relevant to the articles referenced in this notebook have been compiled in a pandas `DataFrame`, stored in the file `data/ChainsDataFrame.p`. The data can be read and displayed in the following way:

In [42]:
import pandas as pd
import re
import numpy as np

In [35]:
df = pd.read_pickle('data/ChainsDataFrame.p')
df

| Element | Type | Lab | ID | Beam Energy (MeV) | Implantation Energy (MeV) | Implantation time (s) | Pixel | Energy E117 (MeV) | Energy E115 (MeV) | Energy E113 (MeV) | Energy Rg (MeV) | Energy Mt (MeV) | Energy Bh (MeV) | ... | $\sigma_E$ Db (keV) | $\sigma_E$ Lr (keV) | Life time E117 (s) | Life time E115 (s) | Life time E113 (s) | Life time Rg (s) | Life time Mt (s) | Life time Bh (s) | Life time Db (s) | Life time Lr (s) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| E115 | 3n | GSI | 1 | 242.1 | 13.40 | 0.000000 | 465.0 | NaN | 10.290 | 9.240 | 9.77 | 0.825 | 9.06 | ... | 0.0 | NaN | NaN | 0.3660 | 0.9350 | 11.1000 | 0.515 | 15.300 | 17.40000 | NaN |
| E115 | 3n | GSI | 2 | 242.1 | 13.90 | 0.000000 | 748.0 | NaN | 10.480 | 10.000 | 0.00 | 0.000 | 9.07 | ... | 0.0 | NaN | NaN | 0.1770 | 1.1800 | 0.0000 | 0.000 | 8.230 | 25.50000 | NaN |
| E115 | 3n | GSI | 3 | 242.1 | 16.30 | 0.000000 | 557.0 | NaN | 0.000 | 9.990 | 9.77 | 9.600 | 9.04 | ... | 0.0 | NaN | NaN | 0.0000 | 0.2400 | 18.0000 | 0.600 | 2.700 | 39.70000 | NaN |
| E115 | 3n | GSI | 4 | 242.1 | 16.10 | 0.000000 | 716.0 | NaN | 10.370 | 9.990 | 9.50 | 9.650 | 9.05 | ... | 0.0 | NaN | NaN | 0.2120 | 1.5000 | 26.4000 | 8.950 | 17.800 | 26.60000 | NaN |
| E115 | 3n | GSI | 5 | 242.1 | 16.40 | 0.000000 | 331.0 | NaN | 10.440 | 10.000 | 9.76 | 9.650 | 8.55 | ... | 0.0 | NaN | NaN | 0.3630 | 0.2420 | 11.3000 | 0.539 | 29.800 | 48.90000 | NaN |
| E115 | 3n | GSI | 6 | 242.1 | 15.40 | 0.000000 | 621.0 | NaN | 3.000 | 9.850 | 0.00 | 0.000 | 9.06 | ... | 0.0 | NaN | NaN | 0.6530 | 1.0100 | 0.0000 | 0.000 | 52.600 | 79.70000 | NaN |
| E115 | 3n | GSI | 7 | 242.1 | 14.10 | 0.000000 | 368.0 | NaN | 10.480 | 9.940 | 1.38 | 9.600 | 9.06 | ... | 0.0 | NaN | NaN | 0.0663 | 3.3000 | 8.2100 | 0.055 | 8.950 | 1.98000 | NaN |
| E115 | 3n | GSI | 8 | 242.1 | 12.50 | 0.000000 | 200.0 | NaN | 0.609 | 9.960 | 9.75 | 0.000 | 0.00 | ... | 0.0 | NaN | NaN | 0.5450 | 2.4000 | 19.1000 | 0.000 | 0.000 | 15.50000 | NaN |
| E115 | 3n | GSI | 9 | 245.0 | 15.90 | 0.000000 | 38.0 | NaN | 10.380 | 9.960 | 0.00 | 0.000 | 0.00 | ... | 0.0 | NaN | NaN | 0.1520 | 2.0000 | 0.0000 | 0.000 | 0.000 | 0.90500 | NaN |
| E115 | 3n | GSI | 10 | 245.0 | 14.50 | 0.000000 | 1007.0 | NaN | 1.660 | 9.840 | 1.28 | 9.680 | 8.97 | ... | 0.0 | NaN | NaN | 0.0702 | 3.0800 | 1.5600 | 0.371 | 24.500 | 95.60000 | NaN |


Each row of the compiled data table represents the experimental measurements of one decay chain. Each decay chain is identified by four index levels:
1. **Lab**: The laboratory which conducted the experiment.
2. **Element**: The element studied in the experiment.
3. **Type**: For the element 115 chains the type refers to the groups _3n_, _4n_ or _Short_ chains as used in the papers. The 10 element 117 chains that are used in the third article have the type _Link_.
4. **ID**: All chains have an ID which matches the numbers used in the articles.

The column headers present specific measured properties of the chain.

`NaN` could indicate that a certain step is missing, that the chain has already ended or that the value is irrelevant (e.g. this is the case for all columns listing E117 values for an E115 chain).   

A $\sigma_E$ value equal to 0 indicates a detected fission and thus the end of the chain.
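
As a minimal sketch of how the `NaN` convention can be put to use, the number of recorded decay steps in each chain can be counted from the non-`NaN` lifetime entries. The toy values and the helper name `count_steps` are illustrative assumptions and are not taken from the data file.

```python
import numpy as np
import pandas as pd

# Toy lifetime table (hypothetical values); NaN marks a step that was not recorded.
toy = pd.DataFrame(
    {
        "Life time E115 (s)": [0.23, 0.06, 0.26],
        "Life time E113 (s)": [0.38, 0.37, 1.15],
        "Life time Rg (s)": [np.nan, np.nan, 0.34],
    },
    index=pd.Index(["chain 1", "chain 2", "chain 3"], name="ID"),
)


def count_steps(lifetimes):
    """Return the number of non-NaN lifetime entries in each row."""
    return lifetimes.notna().sum(axis=1)


n_steps = count_steps(toy)
print(n_steps.tolist())
```

The resulting counts correspond to the per-chain number of decay steps, which is needed whenever chains of different length are combined.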

## Extracting data of interest

In the context of this notebook it is of interest to e.g. extract lifetime data for the _short_ E115 chains. A new data frame with this content can be obtained as follows:  

In [4]:
df.index.values

array([('GSI', 'E115', '3n', '1'), ('GSI', 'E115', '3n', '2'),
       ('GSI', 'E115', '3n', '3'), ('GSI', 'E115', '3n', '4'),
       ('GSI', 'E115', '3n', '5'), ('GSI', 'E115', '3n', '6'),
       ('GSI', 'E115', '3n', '7'), ('GSI', 'E115', '3n', '8'),
       ('GSI', 'E115', '3n', '9'), ('GSI', 'E115', '3n', '10'),
       ('GSI', 'E115', '3n', '11'), ('GSI', 'E115', '3n', '12'),
       ('GSI', 'E115', '3n', '13'), ('GSI', 'E115', '3n', '14'),
       ('GSI', 'E115', '3n', '15'), ('GSI', 'E115', '3n', '16'),
       ('GSI', 'E115', '3n', '17'), ('GSI', 'E115', '3n', '18'),
       ('GSI', 'E115', '3n', '19'), ('GSI', 'E115', '3n', '20'),
       ('GSI', 'E115', '3n', '21'), ('GSI', 'E115', '3n', '22'),
       ('GSI', 'E115', 'Short', '1'), ('GSI', 'E115', 'Short', '2'),
       ('GSI', 'E115', 'Short', '3'), ('GSI', 'E115', 'Short', '4'),
       ('GSI', 'E115', 'Short', '5'), ('GSI', 'E115', 'Short', '6'),
       ('GSI', 'E115', 'Short', '7'), ('Dubna', 'E115', 'Short', '1'),
       ('Dubna',

In [37]:
# How can I locate indices on the basis of only the second and third index column (i.e. Lab and Element)?
#df1, df2, df3 = df.loc[('Dubna', 'E115', 'Short')], df.loc[('GSI', 'E115', 'Short')], df.loc[('Berkeley', 'E115', 'Short')]
#df_short = pd.concat([df1, df2, df3])
df_short = df.loc[('E115', 'Short')]
df_short



| Lab | ID | Beam Energy (MeV) | Implantation Energy (MeV) | Implantation time (s) | Pixel | Energy E117 (MeV) | Energy E115 (MeV) | Energy E113 (MeV) | Energy Rg (MeV) | Energy Mt (MeV) | Energy Bh (MeV) | ... | $\sigma_E$ Db (keV) | $\sigma_E$ Lr (keV) | Life time E117 (s) | Life time E115 (s) | Life time E113 (s) | Life time Rg (s) | Life time Mt (s) | Life time Bh (s) | Life time Db (s) | Life time Lr (s) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GSI | 1 | 245.0 | 12.3 | 0.0 | 268.0 | NaN | 10.51 | 242.0 | NaN | NaN | NaN | ... | NaN | NaN | NaN | 0.227 | 0.378 | NaN | NaN | NaN | NaN | NaN |
| GSI | 2 | 242.1 | 16.2 | 0.0 | 425.0 | NaN | 1.45 | 211.0 | NaN | NaN | NaN | ... | NaN | NaN | NaN | 0.0645 | 0.366 | NaN | NaN | NaN | NaN | NaN |
| GSI | 3 | 242.1 | 13.9 | 0.0 | 681.0 | NaN | 10.54 | 9.95 | 196.0 | NaN | NaN | ... | NaN | NaN | NaN | 0.261 | 1.15 | 0.343 | NaN | NaN | NaN | NaN |
| GSI | 4 | 242.1 | 14.5 | 0.0 | 344.0 | NaN | 10.34 | 9.89 | 218.0 | NaN | NaN | ... | NaN | NaN | NaN | 1.46 | 0.0262 | 0.432 | NaN | NaN | NaN | NaN |
| GSI | 5 | 242.1 | 13.8 | 0.0 | 554.0 | NaN | 10.49 | 9.97 | 135.0 | NaN | NaN | ... | NaN | NaN | NaN | 0.345 | 0.369 | 14.4 | NaN | NaN | NaN | NaN |
| GSI | 6 | 245.0 | 14.5 | 0.0 | 205.0 | NaN | 10.53 | 9.89 | 230.0 | NaN | NaN | ... | NaN | NaN | NaN | 0.21 | 1.05 | 8.27 | NaN | NaN | NaN | NaN |
| GSI | 7 | 245.0 | 11.9 | 0.0 | 128.0 | NaN | 0.541 | 3.12 | 230.0 | NaN | NaN | ... | NaN | NaN | NaN | 0.815 | 2.33 | 2.89 | NaN | NaN | NaN | NaN |
| Dubna | 1 | 240.5 | 11.38 | 0.0 | 3.0 | NaN | 10.377 | 9.886 | 215.7 | NaN | NaN | ... | NaN | NaN | NaN | 0.2562 | 1.4027 | 1.9775 | NaN | NaN | NaN | NaN |
| Dubna | 2 | 241.0 | 15.18 | 0.0 | 6.0 | NaN | 10.54 | 9.916 | 214.9 | NaN | NaN | ... | NaN | NaN | NaN | 0.0661 | 1.55 | 2.3638 | NaN | NaN | NaN | NaN |
| Dubna | 3 | 241.0 | 9.04 | 0.0 | 2.0 | NaN | 10.373 | 9.579 | 141.1 | NaN | NaN | ... | NaN | NaN | NaN | 2.3507 | 22.5822 | 60.1855 | NaN | NaN | NaN | NaN |
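
One possible answer to the question in the code comment above is `DataFrame.xs`, which selects on named index levels wherever they sit in the `MultiIndex`. The sketch below uses a made-up toy frame; it assumes the index levels are named `Element`, `Type`, `Lab` and `ID`, as in the table output.

```python
import pandas as pd

# Toy frame with the same four index levels as the chain table (values are made up).
index = pd.MultiIndex.from_tuples(
    [
        ("E115", "3n", "GSI", "1"),
        ("E115", "Short", "Dubna", "1"),
        ("E115", "Short", "GSI", "1"),
        ("E117", "Link", "Dubna", "1"),
    ],
    names=["Element", "Type", "Lab", "ID"],
)
toy = pd.DataFrame({"Beam Energy (MeV)": [242.1, 240.5, 245.0, 252.0]}, index=index)

# Cross-section on named levels; the selected levels are dropped from the result.
toy_short = toy.xs(("E115", "Short"), level=["Element", "Type"])
print(toy_short)
```

Selecting by level name keeps the selection valid even if the order of the index levels changes between versions of the data file.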


Extracting only the columns with lifetimes

In [38]:
s_find = r"Life time .* \(s\)"
print("Columns:", ",".join(df_short.columns))
# Keep every column whose full name matches the lifetime pattern.
col_found = [col for col in df_short.columns if re.fullmatch(s_find, col)]
col_found

Columns: Beam Energy (MeV),Implantation Energy (MeV),Implantation time (s),Pixel,Energy E117 (MeV),Energy E115 (MeV),Energy E113 (MeV),Energy Rg (MeV),Energy Mt (MeV),Energy Bh (MeV),Energy Db (MeV),Energy Lr (MeV),$\sigma_E$ E117 (keV),$\sigma_E$ E115 (keV),$\sigma_E$ E113 (keV),$\sigma_E$ Rg (keV),$\sigma_E$ Mt (keV),$\sigma_E$ Bh (keV),$\sigma_E$ Db (keV),$\sigma_E$ Lr (keV),Life time E117 (s),Life time E115 (s),Life time E113 (s),Life time Rg (s),Life time Mt (s),Life time Bh (s),Life time Db (s),Life time Lr (s)


['Life time E117 (s)',
 'Life time E115 (s)',
 'Life time E113 (s)',
 'Life time Rg (s)',
 'Life time Mt (s)',
 'Life time Bh (s)',
 'Life time Db (s)',
 'Life time Lr (s)']
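
As an alternative sketch, pandas can do the column selection directly with `DataFrame.filter` and a regular expression. The toy frame below is an assumption made for illustration; only the column naming style is taken from the table above.

```python
import pandas as pd

# Toy frame with a few column names in the same style as the chain table.
toy = pd.DataFrame(
    [[242.1, 10.29, 0.366, 0.935]],
    columns=["Beam Energy (MeV)", "Energy E115 (MeV)",
             "Life time E115 (s)", "Life time E113 (s)"],
)

# Keep only the columns whose name matches the pattern.
lifetimes = toy.filter(regex=r"^Life time .* \(s\)$", axis=1)
print(list(lifetimes.columns))
```

Anchoring the pattern with `^` and `$` avoids accidental matches on other columns that also end in `(s)`.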

In [39]:
df_short = df_short.loc[:, col_found]
df_short

| Lab | ID | Life time E117 (s) | Life time E115 (s) | Life time E113 (s) | Life time Rg (s) | Life time Mt (s) | Life time Bh (s) | Life time Db (s) | Life time Lr (s) |
|---|---|---|---|---|---|---|---|---|---|
| GSI | 1 | NaN | 0.227 | 0.378 | NaN | NaN | NaN | NaN | NaN |
| GSI | 2 | NaN | 0.0645 | 0.366 | NaN | NaN | NaN | NaN | NaN |
| GSI | 3 | NaN | 0.261 | 1.15 | 0.343 | NaN | NaN | NaN | NaN |
| GSI | 4 | NaN | 1.46 | 0.0262 | 0.432 | NaN | NaN | NaN | NaN |
| GSI | 5 | NaN | 0.345 | 0.369 | 14.4 | NaN | NaN | NaN | NaN |
| GSI | 6 | NaN | 0.21 | 1.05 | 8.27 | NaN | NaN | NaN | NaN |
| GSI | 7 | NaN | 0.815 | 2.33 | 2.89 | NaN | NaN | NaN | NaN |
| Dubna | 1 | NaN | 0.2562 | 1.4027 | 1.9775 | NaN | NaN | NaN | NaN |
| Dubna | 2 | NaN | 0.0661 | 1.55 | 2.3638 | NaN | NaN | NaN | NaN |
| Dubna | 3 | NaN | 2.3507 | 22.5822 | 60.1855 | NaN | NaN | NaN | NaN |


Removing all columns with only `NaN`.

In [40]:
df_short = df_short.dropna(axis=1, how='all')
df_short

| Lab | ID | Life time E115 (s) | Life time E113 (s) | Life time Rg (s) |
|---|---|---|---|---|
| GSI | 1 | 0.227 | 0.378 | NaN |
| GSI | 2 | 0.0645 | 0.366 | NaN |
| GSI | 3 | 0.261 | 1.15 | 0.343 |
| GSI | 4 | 1.46 | 0.0262 | 0.432 |
| GSI | 5 | 0.345 | 0.369 | 14.4 |
| GSI | 6 | 0.21 | 1.05 | 8.27 |
| GSI | 7 | 0.815 | 2.33 | 2.89 |
| Dubna | 1 | 0.2562 | 1.4027 | 1.9775 |
| Dubna | 2 | 0.0661 | 1.55 | 2.3638 |
| Dubna | 3 | 2.3507 | 22.5822 | 60.1855 |


## The Schmidt test

Around the turn of the millennium, [Schmidt et al.](https://link.springer.com/article/10.1007/s100500070129) suggested a new approach to assessing decay chains.

In the so-called _Schmidt test_, the logarithm of the lifetimes, $\theta = \ln t$, and the expected value of its standard deviation, $\sigma_\theta$, are considered for one decay step instead of the lifetimes themselves.
The fundamental principle of the test is that the shape of the probability density distribution of $\theta$, and consequently the expected value of its standard deviation, is independent of the decay constant $\lambda$ of the radioactive species under study.

For a large number of events, i.e. measured lifetimes, the expected value of the standard deviation is analytically $\sigma_\theta = 1.28 \pm 2.15 / \sqrt{m}$, where $m$ is the number of events and $1.28 \approx \pi/\sqrt{6}$. The uncertainty thus shrinks as more events are measured.
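
The independence of $\lambda$ can be illustrated numerically. The sketch below is not part of the original analysis; the decay constants, sample size and seed are arbitrary choices. It draws exponential lifetimes for three very different decay constants and compares the standard deviation of $\ln t$ with $\pi/\sqrt{6}$.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # fixed seed, chosen arbitrarily
n_events = 200_000
expected = np.pi / np.sqrt(6)

sigmas = []
for decay_constant in (0.01, 1.0, 100.0):
    # Exponentially distributed lifetimes with mean 1 / lambda.
    lifetimes = rng.exponential(scale=1.0 / decay_constant, size=n_events)
    sigma_theta = np.std(np.log(lifetimes))
    sigmas.append(sigma_theta)
    print(f"lambda = {decay_constant:>6}: sigma_theta = {sigma_theta:.3f}")

print(f"pi / sqrt(6) = {expected:.3f}")
```

Changing $\lambda$ only shifts the distribution of $\ln t$ by a constant, which is why its width stays the same.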

The interpretation of the obtained standard deviation is as follows: 
* A too large value indicates that the set of lifetimes does not originate from one single radioactive species.
* A too small value indicates a lack of sensitivity in the measurements or that the data have been unjustly pruned.

If the _short_ chains of E115 are considered, we arrive at the following values for the Schmidt test: 

In [73]:
# Standard deviation of ln(t) for each decay step, ignoring missing steps.
Schmidt = np.nanstd(np.log(df_short), axis=0)
col_names = ["Step " + str(i + 1) for i in range(len(Schmidt))]
Schmidt = dict(zip(col_names, Schmidt))
df_Schmidt = pd.DataFrame(data=Schmidt, index=["Exp. Schmidt value"], columns=col_names)
df_Schmidt

| | Step 1 | Step 2 | Step 3 |
|---|---|---|---|
| Exp. Schmidt value | 1.204459 | 1.75037 | 1.842351 |


Through Monte-Carlo simulations the distribution of this standard deviation can be obtained as a function of the number of events measured. 
[Schmidt et al.](https://link.springer.com/article/10.1007/s100500070129) (Table 1) present expected values and 90% confidence intervals for $\sigma_\theta$ for various numbers of events $m$. 

Let's add these to the table above. 

In [75]:
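Schmidt_conf = [[1.19, 1.19, 1.16], ["[0.73, 1.77]", "[0.73, 1.77]", "[0.65, 1.82]"]]
df_conf = pd.DataFrame(data=Schmidt_conf, index=[r"E($\sigma_θ)$", "90% confidence interval"], columns=col_names)
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result.
pd.concat([df_Schmidt, df_conf])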

| | Step 1 | Step 2 | Step 3 |
|---|---|---|---|
| Exp. Schmidt value | 1.20446 | 1.75037 | 1.84235 |
| E($\sigma_\theta$) | 1.19 | 1.19 | 1.16 |
| 90% confidence interval | [0.73, 1.77] | [0.73, 1.77] | [0.65, 1.82] |
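
Limits of this kind can be cross-checked with a short simulation. The sketch below is an assumption-laden illustration, not the procedure of the cited paper: the helper name `schmidt_interval`, the number of simulated sets, the seed and the use of a central percentile interval are all choices made here.

```python
import numpy as np


def schmidt_interval(m, n_sets=20_000, coverage=0.90, seed=0):
    """Simulate the mean and a central interval of sigma_theta for m events."""
    rng = np.random.default_rng(seed)
    theta = np.log(rng.exponential(scale=1.0, size=(n_sets, m)))
    sigma = np.std(theta, axis=1)  # population form (ddof=0), like np.nanstd
    tail = 100 * (1 - coverage) / 2
    low, high = np.percentile(sigma, [tail, 100 - tail])
    return sigma.mean(), low, high


mean, low, high = schmidt_interval(m=10)
print(f"m = 10: E = {mean:.2f}, 90% interval = [{low:.2f}, {high:.2f}]")
```

Small differences with respect to tabulated values are to be expected, since they depend on the interval definition and on the statistical precision of the simulation.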


The experimental Schmidt values for steps 1 and 2 lie within the 90% confidence interval, so for these steps it is justified to conclude that the data are compatible with an origin in a single radioactive species. The value for step 3 (1.84) lies marginally above the upper limit of 1.82. 

## Generalised Schmidt method

The Schmidt method is limited to one decay step at a time and hence not all available data are considered together. If all steps of the decay chains have common origins, a stricter test would be to group all steps of each chain together. This cannot be done with an arithmetic average of the squared deviations, since that would merely reproduce the average of the single-step variances and make no difference. 
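
One way to read the remark about the arithmetic average is sketched below with simulated toy data (14 chains with 3 steps each; sizes and seed are arbitrary assumptions). Averaging the squared deviations arithmetically along each chain gives exactly the number that follows from the single-step Schmidt values alone, so no information about the chains as units is added.

```python
import numpy as np

rng = np.random.default_rng(seed=2)  # arbitrary seed
# Toy set: 14 chains with 3 steps each, all drawn from exponential distributions.
theta = np.log(rng.exponential(scale=1.0, size=(14, 3)))
sq_dev = np.square(theta - theta.mean(axis=0))

# Arithmetic mean over the steps of each chain, then mean over the chains.
arithmetic = np.sqrt(sq_dev.mean(axis=1).mean())

# The same number follows from the single-step Schmidt values alone.
per_step_sigma = theta.std(axis=0)
from_single_steps = np.sqrt(np.mean(per_step_sigma**2))

print(arithmetic, from_single_steps)
```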

A generalisation of the Schmidt test was proposed in [D. Rudolph et al., EPJ Web of Conferences](https://www.epj-conferences.org/articles/epjconf/pdf/2016/12/epjconf_nn2016_01001.pdf) and thoroughly described in [Ulrika Forsberg's PhD thesis](http://portal.research.lu.se/portal/files/7495513/thesis.pdf). 

Assume that $m$ chains have been observed and that each chain $i$ contains $n_i$ decay steps. Let $\theta_{i_j}$ be the logarithm of the $j$:th lifetime in chain $i$. Then the measure

$$\xi_{m, n} = \sqrt{\frac{\sum\limits_{i=1}^m \sqrt[n_i]{\prod\limits_{j=1}^{n_i} \left( \theta_{i_j} - \bar{\theta}_j \right)^2 } }{m}}, \qquad \bar{\theta}_j = \frac{ \sum\limits_{i=1}^m \theta_{i_j} }{m}$$

incorporates correlation times along decay chains instead of only between single decay steps within a set of chains. 
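
To make the definition concrete, the following sketch evaluates $\xi$ for a toy set of two chains chosen so that the result can be checked by hand. The function name `generalised_schmidt` is an assumption made here, and the function uses every non-`NaN` step of a chain, which is a simplification for illustration.

```python
import numpy as np


def generalised_schmidt(lifetimes):
    """Compute xi for a 2-D array of lifetimes (rows: chains, NaN: missing step)."""
    theta = np.log(np.asarray(lifetimes, dtype=float))
    sq_dev = np.square(theta - np.nanmean(theta, axis=0))
    chain_terms = []
    for row in sq_dev:
        observed = row[~np.isnan(row)]
        # Geometric mean of the squared deviations of one chain.
        chain_terms.append(np.prod(observed) ** (1.0 / observed.size))
    return np.sqrt(np.mean(chain_terms))


# Two chains with two steps each, chosen so that theta = [[0, 1], [2, 3]].
toy = np.exp([[0.0, 1.0], [2.0, 3.0]])
xi = generalised_schmidt(toy)
print(xi)
```

Here the step means are $\bar{\theta}_1 = 1$ and $\bar{\theta}_2 = 2$, so every squared deviation equals 1, both geometric means equal 1, and $\xi$ comes out as 1 up to rounding.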

### Applying the generalised Schmidt method to the set of short decay chains

To calculate the geometric mean of an array containing `NaN` values, the following (non-generic) function is defined.

In [126]:
def g_nan_mean(data):
    """Row-wise geometric mean over the leading non-NaN entries of a 2-D array.

    A 1-D input is returned unchanged (each chain then has a single step).
    """
    data = np.asarray(data, dtype=float)
    if data.ndim == 1:
        return data
    ret = np.empty(data.shape[0])
    for i, row in enumerate(data):
        prod, steps = 1.0, 0
        for value in row:
            if np.isnan(value):
                break  # the chain has ended; later steps are missing
            prod *= value
            steps += 1
        ret[i] = prod ** (1.0 / steps)
    return ret

In [127]:
theta = np.log(df_short)
theta_var = np.square(theta - np.nanmean(theta, axis=0))
gen_Schmidt = g_nan_mean(theta_var)
gen_Schmidt = np.sqrt(np.mean(gen_Schmidt))
gen_Schmidt = {"E115 Short chains": gen_Schmidt}
df_gen_Schmidt = pd.DataFrame(data=gen_Schmidt, index=["Exp. Generalised Schmidt value"])
df_gen_Schmidt

| | E115 Short chains |
|---|---|
| Exp. Generalised Schmidt value | 1.302724 |


#### Simulating expected value and confidence interval

Via Monte-Carlo simulations the expected value and its confidence intervals can be obtained for the generalised Schmidt method. 10 000 sets of decay chains are simulated with the same structure as the experimental set extracted above. Each decay step is simulated from an exponential distribution with an arbitrary decay constant $\lambda = 1$. The exact same calculations as above are then made for each simulated set. For completeness, different numbers of steps $n$ are included. 4 out of the 14 short chains have only two decay steps; to see the effect of this, another index $l$ is introduced, which represents the number of chains that have only two decay steps.  

In [131]:
nbr_sets = 10000
m, n_max = np.shape(df_short)
# Lifetimes drawn from an exponential distribution with decay constant lambda = 1.
sim = np.random.exponential(scale=1, size=(nbr_sets, m, n_max))

# Each case: (label, number of steps n, number of chains l with only two steps).
# The last case is assumed to be n=3 with l=4 chains cut off after the second step.
cases = [("$m=14$, $n=1$", 1, 0), ("$m=14$, $n=2$", 2, 0),
         ("$m=14$, $n=3$", 3, 0), ("$m=14$, $n=3$, $l=4$", 3, 4)]
sim_names = [name for name, _, _ in cases]
sim_Schmidt = np.empty((nbr_sets, len(cases)))
for i, (_, n, l) in enumerate(cases):
    temp = sim[:, :, :n].copy()
    temp[:, :l, 2:] = np.nan  # the first l chains have no third step
    theta = np.log(temp)
    for j in range(nbr_sets):
        theta_var = np.square(theta[j] - np.nanmean(theta[j], axis=0))
        sim_Schmidt[j, i] = np.sqrt(np.mean(g_nan_mean(theta_var)))
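
A possible next step is to condense each column of simulated values into an expected value and a confidence interval. The helper `summarise_simulation` below is a hypothetical sketch: it uses a central percentile interval, and the stand-in array of standard Schmidt values only serves to make the example self-contained.

```python
import numpy as np
import pandas as pd


def summarise_simulation(values, names, coverage=0.90):
    """Tabulate mean and central interval for each column of simulated values."""
    tail = 100 * (1 - coverage) / 2
    low, high = np.percentile(values, [tail, 100 - tail], axis=0)
    return pd.DataFrame(
        {"Expected value": values.mean(axis=0), "Lower limit": low, "Upper limit": high},
        index=names,
    )


# Stand-in for the simulated statistics: standard Schmidt values for m = 14 events.
rng = np.random.default_rng(seed=3)
stand_in = np.std(np.log(rng.exponential(size=(5000, 14))), axis=1).reshape(-1, 1)
summary = summarise_simulation(stand_in, names=["$m=14$, $n=1$"])
print(summary)
```

Storing only this summary table, rather than all simulated values, would keep the output small while preserving the numbers needed for the comparison with experiment.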

Applying the method:
1. Calculate $\sigma_\theta$ on the basis of the experimentally measured lifetimes.
2. See where the obtained value lies in the distribution of $\sigma_\theta$.
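
The second step can be sketched as an empirical quantile lookup. Both helper names below are assumptions introduced for illustration, and the "simulated" values are simply the integers 1 to 100 so that the result is easy to verify.

```python
import numpy as np


def empirical_quantile(value, simulated):
    """Fraction of simulated values that are less than or equal to value."""
    simulated = np.asarray(simulated, dtype=float)
    return np.mean(simulated <= value)


def within_central_interval(value, simulated, coverage=0.90):
    """Check whether value lies inside the central interval of the simulation."""
    tail = (1 - coverage) / 2
    return bool(tail <= empirical_quantile(value, simulated) <= 1 - tail)


# Hypothetical simulated distribution: the integers 1..100.
simulated = np.arange(1, 101)
print(empirical_quantile(50, simulated), within_central_interval(50, simulated))
print(empirical_quantile(99, simulated), within_central_interval(99, simulated))
```

With real data, `simulated` would be one column of simulated statistics and `value` the corresponding experimental result.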

In [43]:
np.log(df_short)

| Lab | ID | Life time E115 (s) | Life time E113 (s) | Life time Rg (s) |
|---|---|---|---|---|
| GSI | 1 | -1.482805 | -0.972861 | NaN |
| GSI | 2 | -2.74109 | -1.005122 | NaN |
| GSI | 3 | -1.343235 | 0.139762 | -1.070025 |
| GSI | 4 | 0.378436 | -3.641996 | -0.83933 |
| GSI | 5 | -1.064211 | -0.996959 | 2.667228 |
| GSI | 6 | -1.560648 | 0.04879 | 2.112635 |
| GSI | 7 | -0.204567 | 0.845868 | 1.061257 |
| Dubna | 1 | -1.361797 | 0.338399 | 0.681833 |
| Dubna | 2 | -2.716587 | 0.438255 | 0.86027 |
| Dubna | 3 | 0.854713 | 3.117162 | 4.097431 |
