# Predicting Tritium Thermo-Desorption Spectra using twinLab

*** 
### What is tritium desorption?

The interior of a fusion reactor wall is exposed to tritium from the tritium breeding loop and from plasma bombardment, which results in tritium adsorbing onto the material and diffusing through it.

In addition to the mobile tritium, which diffuses through the material, there are regions of the atomic lattice called trapping sites, in which tritium can become trapped in a potential well of a given energy. Understanding both the diffusion of mobile tritium and the energy and density of these traps is essential for predicting how much tritium is retained within fusion reactor structures. This has implications for the required tritium breeding ratio, the tritium inventory, future waste classification and material degradation. Characterising these trapping sites is therefore essential for making engineering predictions about future fusion power plants.

If the material is heated above a certain temperature, the trapped tritium can gain enough thermal energy to escape from these trapping sites. This can be measured with an experiment known as thermal desorption spectroscopy: a piece of material that has been implanted with tritium is heated in incremental steps, and the amount of tritium released is measured as a function of temperature. The physical properties of the traps can then be determined from the characteristics of the resulting spectra.
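The link between trap energy and peak position can be sketched with a simplified single-trap, first-order desorption model (an Arrhenius detrapping rate under a linear temperature ramp). This is an illustrative toy, not the simulation used to generate the dataset in this notebook: the attempt frequency `nu = 1e13` s⁻¹, the heating rate `beta = 1` K/s and the function name `first_order_tds` are assumptions, and diffusion and retrapping, which a full model would include, are ignored.

```python
import numpy as np

K_B = 8.617e-5  # Boltzmann constant (eV/K)

def first_order_tds(E, T, nu=1e13, beta=1.0, n0=1.0):
    """Desorption rate from a single trap of energy E (eV) during a linear ramp.

    Toy first-order model (an assumption, not the dataset's simulator):
    dn/dt = -nu * exp(-E / (K_B * T)) * n, with T rising at beta K/s.
    """
    T = np.asarray(T, dtype=float)
    n = n0                                   # trapped inventory at the start of the ramp
    rate = np.empty_like(T)
    for j in range(len(T)):
        k = nu * np.exp(-E / (K_B * T[j]))   # detrapping rate constant (1/s)
        rate[j] = k * n                      # release rate at this temperature
        if j + 1 < len(T):
            dt = (T[j + 1] - T[j]) / beta    # time taken to reach the next grid point
            n *= np.exp(-k * dt)             # exact decay over the step at fixed k
    return rate

T = np.linspace(300, 800, 624)               # same grid as df_grid below
shallow = first_order_tds(1.0, T)            # shallower trap
deep = first_order_tds(1.4, T)               # deeper trap
print(T[np.argmax(shallow)], T[np.argmax(deep)])
```

Deeper traps (larger $E$) release their tritium at higher temperatures, which is why peak positions in a desorption spectrum carry information about the trap energies.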

*** 
### ML Challenge 

Computer simulations are required to model the behaviour of these materials over a range of parameters, but they are computationally expensive and cannot be run at every point in the parameter space under consideration. `twinLab` can be used to train simulation surrogate models using data from a sparse set of simulation runs. This allows for meaningful interpolation and extrapolation to unexplored regions of parameter space, together with a calibrated estimate of the uncertainty in the surrogate's predictions.

In this example, we look at the ability of `twinLab` Gaussian Processes to model tritium in the wall of a fusion reactor using tritium desorption spectra.

Given a training set of known correspondences between tritium desorption spectra (TDS) of the form $D(T)$, the desorption rate as a function of temperature, and the physical trapping properties ($E_1$, $E_2$, $E_3$, $n_1$, $n_2$), a Gaussian Process can be trained to predict the shape of a TDS for any new set of physical parameters, along with the uncertainty in these predictions.
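To make the idea concrete, here is a minimal zero-mean Gaussian Process regression written directly in NumPy. It uses a Matérn 3/2 kernel (which we assume is what `covar_module='M32'` selects later in this notebook) with fixed, hand-picked hyperparameters, so it is a sketch of the technique rather than of how `twinLab` implements it. The names `matern32` and `gp_predict` are hypothetical.

```python
import numpy as np

def matern32(X1, X2, lengthscale=1.0, variance=1.0):
    """Matérn 3/2 covariance between the rows of X1 (n, d) and X2 (m, d)."""
    d = np.sqrt(((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1))
    r = np.sqrt(3.0) * d / lengthscale
    return variance * (1.0 + r) * np.exp(-r)

def gp_predict(X_train, y_train, X_new, noise=1e-6, lengthscale=1.0, variance=1.0):
    """Posterior mean and standard deviation of a zero-mean GP at X_new."""
    K = matern32(X_train, X_train, lengthscale, variance) + noise * np.eye(len(X_train))
    K_s = matern32(X_train, X_new, lengthscale, variance)      # (n_train, n_new)
    L = np.linalg.cholesky(K)                                   # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))   # K^{-1} y
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = variance - np.sum(v ** 2, axis=0)                     # prior variance minus explained part
    return mean, np.sqrt(np.maximum(var, 0.0))

# Toy usage: 8 "simulation runs" of a 1-D function, predicted on a finer grid
X_train = np.linspace(0.0, 1.0, 8)[:, None]
y_train = np.sin(2 * np.pi * X_train[:, 0])
X_new = np.linspace(0.0, 1.0, 50)[:, None]
mean, std = gp_predict(X_train, y_train, X_new, lengthscale=0.3)
```

The predicted standard deviation shrinks to nearly zero at the training inputs and grows back towards the prior value far from them, which is the behaviour that gives a surrogate its uncertainty estimate.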

***
### Importing libraries

First, import the required libraries.

In [None]:
# Third-party imports
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Project imports
import twinlab as tl

We need to provide `twinLab` with the training data (as a `pandas` DataFrame), a chosen name for the dataset and emulator (here `tritium_desorption`), and the lists of input and output columns that we are going to use in our model.

***
### Importing data

In [None]:
data = tl.load_example_dataset('tritium-desorption')

In [None]:
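# Train on the first 900 rows and hold out the last 100 for testing
# (assumes the dataset has at least 1000 rows, so the two sets do not overlap)
df_train = data.iloc[:900, :]
df_test = data.iloc[-100:, :]

# Temperature grid (K) on which each spectrum is sampled: 624 points from 300 K to 800 K
df_grid = pd.DataFrame(np.linspace(300, 800, 624))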

***
### View the training data

In [None]:
df_train.head()

### Inputs

In [None]:
df_train.iloc[:,:5].head()

### Outputs

In [None]:
df_train.iloc[:,5:].head()

### Temperature grid

In [None]:
df_grid.head()

The five parameters $E_1$, $E_2$, $E_3$, $n_1$ and $n_2$ are physical properties of the material that characterise the trapping of tritium isotopes.

- $E_i$ is the detrapping energy of trap site $i$, in $\mathrm{eV}$.
- $n_i$ is the density of trap site $i$.

The index $i$ labels the distinct types of trapping site.

Note that the trapping density $n_3$, which would complement $E_3$, is not present. This third trap is an implantation trap created through irradiation damage, and its density is not included as an input to this model.

### View the test data

In [None]:
df_test.head()

***
### Uploading data to the `twinLab` cloud

In [None]:
dataset = tl.Dataset("tritium_desorption")

dataset.upload(df_train, verbose=True)

### Define the columns

In [None]:
inputs = ["E1", "E2", "E3", "n1", "n2"]
outputs = [f"y{i}" for i in range(len(df_grid))]
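Because `outputs` is built from the length of `df_grid`, it relies on the data having exactly one `y` column per grid point. A small check such as the hypothetical `check_columns` below (not part of `twinLab`) can catch a mismatch before training.

```python
def check_columns(columns, inputs, outputs, n_grid):
    """Hypothetical pre-training sanity check on column names.

    columns: column names present in the training DataFrame
    inputs, outputs: the lists passed to emulator.train
    n_grid: number of temperature points in the grid
    """
    available = set(columns)
    missing = [c for c in inputs + outputs if c not in available]
    if missing:
        raise KeyError(f"Columns not found in the data: {missing}")
    if len(outputs) != n_grid:
        raise ValueError(f"{len(outputs)} output columns but {n_grid} grid points")
    overlap = set(inputs) & set(outputs)
    if overlap:
        raise ValueError(f"Columns used as both input and output: {sorted(overlap)}")

# In this notebook the call would be:
# check_columns(list(df_train.columns), inputs, outputs, len(df_grid))
```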

### Check the dataset has uploaded

In [None]:
dataset_summary = dataset.summarise()
dataset_summary

### Dimensionality reduction

In [None]:
variance = dataset.analyse_output_variance(outputs)

In [None]:
variance.iloc[:10]

In [None]:
plt.plot(variance['Number of Dimensions'], variance['Cumulative Variance'], 'x--')
plt.xscale('log')
plt.ylim(0.9, 1.01)
plt.xlabel('Number of retained output dimensions')
plt.ylabel('Cumulative variance')
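The curve above shows how much of the variation across the training spectra is captured by a given number of output dimensions. A sketch of this kind of analysis, assuming it amounts to a principal component analysis of the centred output matrix (we have not confirmed how `analyse_output_variance` computes it), is:

```python
import numpy as np

def cumulative_explained_variance(Y):
    """Cumulative fraction of variance captured by the leading principal components.

    Y has shape (n_samples, n_outputs); each row is one spectrum on the temperature grid.
    """
    Yc = Y - Y.mean(axis=0)                   # centre each output column
    s = np.linalg.svd(Yc, compute_uv=False)   # singular values, largest first
    var = s ** 2                              # variance carried by each component (up to a constant)
    return np.cumsum(var) / var.sum()

# Toy data: 200 synthetic spectra, each a random mixture of 3 fixed peaks
rng = np.random.default_rng(0)
T = np.linspace(300, 800, 624)
peaks = np.stack([np.exp(-0.5 * ((T - c) / 30.0) ** 2) for c in (400, 550, 700)])
Y = rng.uniform(0.0, 1.0, size=(200, 3)) @ peaks
cev = cumulative_explained_variance(Y)
print(cev[:5])
```

Because these toy spectra are built from only three shapes, three components explain essentially all of their variance; real spectra need more, which is what the curve above quantifies.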

*** 
### Setting up an emulator

Here we are training a functional model, which means that we want to return a *function* at every point in parameter space ($E_1$, $E_2$, $E_3$, $n_1$, $n_2$). Our function describes the tritium desorption rate, $D$, of the reactor wall material as a function of temperature (the rate of tritium released per unit area of wall). The training of the surrogate is agnostic to the temperature values, $T$, so we must provide them by hand (`df_grid` above). The output of our model is therefore the function $D(T; E_1, E_2, E_3, n_1, n_2)$.

The `twinLab` model achieves this by predicting the value of $D$ at 624 regularly spaced points in $T$ between $300\,\mathrm{K}$ and $800\,\mathrm{K}$ (the grid in `df_grid`). The correlations between points adjacent in $T$ are incorporated naturally by the model, and `twinLab` provides a model uncertainty. Here we call the outputs `y` (columns `y0` to `y623`) rather than `D`, following the usual data-science convention.
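Since each output column `y{i}` corresponds to one temperature in `df_grid`, a predicted row only becomes the function $D(T)$ once it is paired with that grid. The sketch below evaluates a predicted spectrum at arbitrary temperatures and reads off its peak position, assuming linear interpolation between grid points is adequate (the grid spacing is about 0.8 K); `evaluate_on_grid` and `peak_temperature` are hypothetical helpers.

```python
import numpy as np

T_grid = np.linspace(300, 800, 624)   # same grid as df_grid

def evaluate_on_grid(y_row, T_query, T_grid=T_grid):
    """Linearly interpolate one predicted spectrum (values on T_grid) at T_query (K)."""
    T_query = np.asarray(T_query, dtype=float)
    if T_query.min() < T_grid[0] or T_query.max() > T_grid[-1]:
        raise ValueError("T_query lies outside the emulated temperature range")
    return np.interp(T_query, T_grid, y_row)

def peak_temperature(y_row, T_grid=T_grid):
    """Grid temperature at which a spectrum reaches its maximum."""
    return T_grid[np.argmax(y_row)]

# Toy spectrum standing in for one row of emulator output (e.g. df_mean.iloc[i])
y_row = np.exp(-0.5 * ((T_grid - 550.0) / 40.0) ** 2)
print(evaluate_on_grid(y_row, [450.0, 550.0]), peak_temperature(y_row))
```

The range check reflects that the emulator only provides values inside 300 K to 800 K; anything outside that range would be extrapolation in $T$ that the model was never trained on.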

### Instantiate emulator

In [None]:
emulator = tl.Emulator('tritium_desorption') 

### Set emulator parameters

In [None]:
estimator_params = tl.EstimatorParams(detrend=False, covar_module='M32', estimator_type='single_task_gp')
train_params = tl.TrainParams(output_retained_dimensions=5,
                              estimator_params=estimator_params)
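`output_retained_dimensions=5` keeps only five output dimensions out of 624, in line with the variance analysis above. Assuming this works like a PCA truncation (the exact `twinLab` implementation is not shown here), the sketch below compresses spectra onto the leading `k` principal components and maps them back; the reconstruction error shows how much is lost for a given `k`, and the emulator then only needs to predict `k` coefficients per parameter set instead of 624 values. The function names are hypothetical.

```python
import numpy as np

def fit_pca(Y, k):
    """Mean and leading k principal directions of Y (n_samples, n_outputs)."""
    mean = Y.mean(axis=0)
    _, _, Vt = np.linalg.svd(Y - mean, full_matrices=False)
    return mean, Vt[:k]                      # basis has shape (k, n_outputs)

def compress(Y, mean, basis):
    """Project spectra onto the basis: (n_samples, k) coefficients."""
    return (Y - mean) @ basis.T

def reconstruct(Z, mean, basis):
    """Map coefficients back to spectra on the full grid: (n_samples, n_outputs)."""
    return mean + Z @ basis

# Toy spectra: random mixtures of 4 fixed peaks on the 624-point grid
rng = np.random.default_rng(1)
T = np.linspace(300, 800, 624)
peaks = np.stack([np.exp(-0.5 * ((T - c) / 35.0) ** 2) for c in (380, 480, 600, 720)])
Y = rng.uniform(0.0, 1.0, size=(300, 4)) @ peaks

for k in (1, 2, 4):
    mean, basis = fit_pca(Y, k)
    err = np.abs(reconstruct(compress(Y, mean, basis), mean, basis) - Y).max()
    print(k, err)
```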

### Train the emulator 

In [None]:
emulator.train(dataset, inputs, outputs, train_params)

### Check the emulator score

In [None]:
print(f"MSE  = {emulator.score(tl.ScoreParams(metric='MSE', combined_score=True))}")

In [None]:
df_test[inputs].head()

Evaluate the trained emulator on the test inputs ($E_1, E_2, E_3, n_1, n_2$) from `df_test`:

In [None]:
df_mean, df_std = emulator.predict(df_test[inputs], verbose=False)

display(df_mean.head())
display(df_std.head())

*** 
### Test the emulator 

In [None]:
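i = 50  # index of the test sample to plot

if not 0 <= i < len(df_test):
    raise ValueError('The index given is out of the range of the test dataframe')

T = df_grid[0].to_numpy()                        # temperature grid (K)
df_mean_i = df_mean.iloc[i].to_numpy()           # emulator mean prediction
df_std_i = df_std.iloc[i].to_numpy()             # emulator standard deviation
df_test_i = df_test[outputs].iloc[i].to_numpy()  # true test spectrum
params = df_test[inputs].iloc[i].to_numpy()      # physical parameters of this sample

fig = plt.figure(figsize=(8, 6))
plt.plot(T[::5], df_test_i[::5], 'xr', label='Test data')  # every 5th point for clarity
plt.plot(T, df_mean_i, label='GP prediction')
plt.fill_between(T, df_mean_i - 2*df_std_i, df_mean_i + 2*df_std_i, alpha=0.4, label='±2σ')
plt.title(f'D(E1={round(params[0],2)},E2={round(params[1],2)},E3={round(params[2],2)}, n1={params[3]:.2e}, n2={params[4]:.2e})')
plt.xlabel('Temperature, T (K)')
plt.ylabel('Desorption rate, D')
plt.legend()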

### Load the Streamlit app

In [None]:
!streamlit run app.py