# Predicting Tritium Thermo-Desorption Spectra

___

### Context

Tritium is an isotope of hydrogen used as fuel in fusion research. During testing, the reactor walls are bombarded by tritium, which enters the metal lattice and moves through it by diffusion. Regions of the atomic lattice called "trapping sites" retain tritium atoms.

When a reactor wall is removed, the trapped tritium is a serious hazard: it slowly escapes the material and persists in the atmosphere, irradiating anything nearby. The estimated cost of decommissioning tritiated metal is $5,000,000 per gram of tritium.



### Thermal-Desorption Spectrometry

When tritiated metal is heated, tritium can escape the trapping sites and diffuse out of the metal. This process is called desorption, and it is measured using Thermal-Desorption Spectrometry (TDS). The tritium desorption rate can be modeled as a function of temperature.
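To make "desorption rate as a function of temperature" concrete, here is a minimal sketch of a much simpler model than the simulations used in this notebook: first-order release from a single trap type during a linear heating ramp, with no diffusion or retrapping. The attempt frequency `nu` (1e13 per second), the heating rate `beta` (1 K/s) and the helper `first_order_tds` are assumptions for illustration only; they are not part of `twinLab` or `fusion_energy`.

```python
import numpy as np

K_B = 8.617333262e-5  # Boltzmann constant (eV/K)

def first_order_tds(energy_ev, temperatures, nu=1e13, beta=1.0, theta0=1.0):
    """Release rate from one trap population during a linear ramp T(t) = T0 + beta * t.

    Simplified first-order model: d(theta)/dt = -nu * exp(-E / (k_B T)) * theta.
    Diffusion and retrapping are ignored; nu (1/s) and beta (K/s) are assumed values.
    """
    temperatures = np.asarray(temperatures, dtype=float)
    dt = np.diff(temperatures, prepend=temperatures[0]) / beta  # seconds spent on each step
    theta = theta0  # fraction of traps still occupied
    rates = np.empty_like(temperatures)
    for i, (T, step) in enumerate(zip(temperatures, dt)):
        k_detrap = nu * np.exp(-energy_ev / (K_B * T))  # detrapping rate constant (1/s)
        theta *= np.exp(-k_detrap * step)  # exact decay of the trapped fraction at fixed T
        rates[i] = k_detrap * theta  # instantaneous desorption rate
    return rates

T = np.linspace(300, 800, 624)
for E in (1.0, 1.5):
    rates = first_order_tds(E, T)
    print(f"E = {E} eV -> peak near {T[np.argmax(rates)]:.0f} K")
```

In this toy model a higher detrapping energy shifts the release peak to a higher temperature, which is why the trap properties make natural emulator inputs.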

*** 

### ML Challenge 

The computer simulations used to model tritium desorption spectra are computationally expensive. Rather than running a new simulation for every candidate design, we want to use `twinLab` emulators to predict Thermal-Desorption Spectra at new inputs and inform design choices under uncertainty.


***

### Imports

In [1]:
# Project imports
import twinlab as tl
from fusion_energy.plot import plot_test, style_axes

# Third-party imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


          Version     : 2.4.0
          Server      : https://twinlab.digilab.co.uk
          Environment : /Users/joe/repos/FusionEnergy/.env



***

### Downloading Data

The Thermal-Desorption Spectra data is available to you on the `twinLab` cloud as an example dataset. 

In [2]:
data = tl.load_example_dataset('tritium-desorption')

Dataframe downloaded successfully


***

### Inputs

The 5 simulation inputs are properties of the wall material that describe how tritium is trapped:

- $E_1, E_2, E_3$ - the detrapping energies of the tritium trap sites, in $\mathrm{eV}$.
- $n_1, n_2$ - the densities of the trapping sites.
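For a single first-order trap, the temperature $T_p$ of the desorption peak can be estimated from its detrapping energy $E$ with the Redhead condition $E/(k_B T_p^2) = (\nu/\beta)\exp(-E/(k_B T_p))$. The sketch below solves it by fixed-point iteration. The attempt frequency $\nu$, the heating rate $\beta$ and the helper name are assumptions for illustration; the full simulations also include diffusion, so their peaks need not sit exactly at these temperatures.

```python
import numpy as np

K_B = 8.617333262e-5  # Boltzmann constant (eV/K)

def redhead_peak_temperature(energy_ev, nu=1e13, beta=1.0, n_iter=50):
    """Peak temperature (K) of a first-order desorption peak (hypothetical helper).

    Solves E / (k_B Tp^2) = (nu / beta) * exp(-E / (k_B Tp)) by rewriting it as
    Tp = E / (k_B * ln(nu * k_B * Tp^2 / (beta * E))) and iterating.
    nu (1/s) and beta (K/s) are assumed values.
    """
    Tp = 500.0  # initial guess (K)
    for _ in range(n_iter):
        Tp = energy_ev / (K_B * np.log(nu * K_B * Tp**2 / (beta * energy_ev)))
    return Tp

for E in (1.0, 1.25, 1.5):
    print(f"E = {E} eV -> Tp = {redhead_peak_temperature(E):.0f} K")
```

The estimate depends on the assumed $\nu$ and $\beta$ only logarithmically, so it is mainly useful for seeing how peak positions order with $E$.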


In [3]:
inputs = data.columns[:5].to_list()
data[inputs][:3]

|   | E1 | E2 | E3 | n1 | n2 |
|---|---|---|---|---|---|
| 0 | 0.726163 | 1.248766 | 1.128671 | 0.002004 | 0.000307 |
| 1 | 0.787837 | 0.925174 | 1.226117 | 0.001564 | 0.000768 |
| 2 | 0.710103 | 0.902457 | 1.558539 | 0.00486 | 0.000613 |


___

### Outputs

Every simulation outputs $N$ tritium desorption rates $D_n$ as the temperature $T$ is increased from 300 K to 800 K. In our case $N = 624$, which means there are 624 desorption rates, one at each of 624 temperature values.


___

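In [4]:
# One desorption-rate column per temperature; defined here so this cell runs on its own
outputs = [f"D{n}" for n in range(624)]
temperatures = pd.DataFrame(np.linspace(300, 800, len(outputs))).T  # one row of evenly spaced temperatures (K)
temperatures.columns = [f"T{n}" for n in range(len(outputs))]
temperatures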

In [None]:
outputs = [f"D{n}" for n in range(624)]
# Keep the 5 input column names and label the 624 output columns D0 ... D623
data.columns = list(data.columns[:5]) + outputs

In [None]:
data[outputs]

***

### Uploading data

We'll create a copy of the example dataset in the `twinLab` cloud to work with.

In [None]:
dataset = tl.Dataset("tritium_desorption")
dataset.upload(data, verbose = False)

***

### Workflow



### Dimensionality reduction

Our dataset has 624 outputs, or ***dimensions***, which makes the problem high-dimensional and expensive to emulate directly. Luckily, we can make it simpler using `twinLab`.

`twinLab` can learn to represent our outputs using fewer dimensions. This makes training and evaluating our emulator faster and cheaper. `twinLab` then reconstructs the full outputs when you need them, and if enough dimensions are retained the reconstruction error is small.
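As an illustration of the idea, here is a sketch that uses principal component analysis (PCA, via NumPy's SVD) to compress each spectrum to a few coefficients and then reconstruct it. The method `twinLab` uses internally is not stated here, so treat PCA as an assumed stand-in; the Gaussian-peak spectra are synthetic, not the tritium dataset, and `pca_roundtrip` is a hypothetical helper.

```python
import numpy as np

def pca_roundtrip(Y, n_components):
    """Compress each row of Y to n_components coefficients, then reconstruct it."""
    mean = Y.mean(axis=0)
    _, _, Vt = np.linalg.svd(Y - mean, full_matrices=False)
    basis = Vt[:n_components]              # (k, n_outputs) principal directions
    coeffs = (Y - mean) @ basis.T          # (n_samples, k) reduced representation
    return coeffs, mean + coeffs @ basis   # reconstruction in the original output space

# Synthetic stand-in for desorption spectra: one Gaussian peak per sample
rng = np.random.default_rng(0)
T = np.linspace(300, 800, 624)
centres = rng.uniform(400, 700, size=(100, 1))
Y = np.exp(-0.5 * ((T - centres) / 40.0) ** 2)

for k in (1, 4, 8):
    coeffs, Y_hat = pca_roundtrip(Y, k)
    err = np.sqrt(np.mean((Y - Y_hat) ** 2))
    print(f"{k} dims -> coefficients {coeffs.shape}, reconstruction RMSE {err:.2e}")
```

Because a truncated SVD gives the best rank-k approximation of the centred data, the reconstruction error can only shrink as more dimensions are kept.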

To check if this will work for our problem, we need to:

1. See how well our data can be represented using fewer dimensions.
2. Select a sensible number of dimensions to use.

We can do this using the `analyse_variance` method of our `twinLab` dataset, which reports how much of the variance in the original outputs is captured (the cumulative variance) for a given number of dimensions.
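Assuming the dimensions are principal components, a cumulative-variance curve like the one `analyse_variance` reports can be computed from the singular values of the centred outputs. The sketch below also picks the smallest number of dimensions that reaches a chosen threshold; the 99.9% threshold, the synthetic data and both helper names are illustrative assumptions, not `twinLab` settings.

```python
import numpy as np

def cumulative_variance(Y):
    """Cumulative fraction of total variance captured by the first 1, 2, 3, ... components."""
    centred = Y - Y.mean(axis=0)
    singular_values = np.linalg.svd(centred, compute_uv=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    return np.cumsum(explained)

def dims_for_threshold(cum_var, threshold=0.999):
    """Smallest number of dimensions whose cumulative variance reaches the threshold."""
    return int(np.searchsorted(cum_var, threshold)) + 1

# Synthetic outputs built from exactly two underlying curves plus small noise
rng = np.random.default_rng(0)
T = np.linspace(300, 800, 624)
basis = np.vstack([np.exp(-0.5 * ((T - 450) / 40) ** 2),
                   np.exp(-0.5 * ((T - 650) / 40) ** 2)])
Y = rng.normal(size=(200, 2)) @ basis + 1e-4 * rng.normal(size=(200, 624))

cum_var = cumulative_variance(Y)
print(cum_var[:4].round(6), dims_for_threshold(cum_var))
```

Reading the threshold off the curve is one way to make step 2 above systematic; the right threshold depends on how much reconstruction error the application can tolerate.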

In [None]:
variance = dataset.analyse_variance(outputs)[1:11]
variance

In [None]:
# `variance` already holds rows 1 to 10, so no further slicing is needed
number_dimensions = variance['Number of Dimensions']
cumulative_variance = variance['Cumulative Variance']

In [None]:
plt.plot(number_dimensions, cumulative_variance, 'kx--')
plt.ylabel("Cumulative Variance")
plt.xlabel("Number of Dimensions")
style_axes(plt.gca())
plt.show()


*** 

### Emulation

Our emulator will use an input $(E_1, E_2, E_3, n_1, n_2)$ to predict the desorption rate, and its uncertainty, at every temperature. This is a functional emulator: each input corresponds to a whole output function, the desorption spectrum $D(T)$, rather than to a single value.
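A minimal NumPy sketch of one way a functional emulator can work: project the spectra onto a few principal components, fit a Gaussian process (GP) to the reduced coefficients, then map the predicted mean and variance back to every temperature. This is an assumed illustration, not `twinLab`'s implementation: it uses a fixed squared-exponential kernel shared by all components (no hyperparameter fitting), treats the components as independent, and ignores the truncation error from discarded dimensions. `functional_emulator` is a hypothetical helper.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale):
    """Squared-exponential covariance (unit variance) between the rows of a and b."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)

def functional_emulator(X_train, Y_train, X_new, n_components=3, lengthscale=0.5, jitter=1e-6):
    """Predict the mean and std of whole output curves at X_new."""
    y_mean = Y_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(Y_train - y_mean, full_matrices=False)
    V = Vt[:n_components]                     # (k, n_outputs) basis curves
    Z = (Y_train - y_mean) @ V.T              # (n_train, k) reduced coefficients
    z_scale = Z.std(axis=0)                   # standardise so a unit prior variance is sensible
    Zs = Z / z_scale

    K = rbf_kernel(X_train, X_train, lengthscale) + jitter * np.eye(len(X_train))
    K_s = rbf_kernel(X_new, X_train, lengthscale)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, Zs))   # K^{-1} Zs
    z_mean = (K_s @ alpha) * z_scale                        # (n_new, k) posterior means
    v = np.linalg.solve(L, K_s.T)
    z_var = np.clip(1.0 - (v**2).sum(axis=0), 0.0, None)    # (n_new,) shared posterior variance

    Y_pred = y_mean + z_mean @ V
    # Independent components: Var[Y_t] = sum_j var_j * V[j, t]^2
    Y_std = np.sqrt(np.outer(z_var, (z_scale**2) @ V**2))
    return Y_pred, Y_std

# Synthetic example: 2 inputs control the centre and width of one peak
rng = np.random.default_rng(1)
T = np.linspace(0.0, 1.0, 50)
X_train = rng.uniform(size=(30, 2))
centre = 0.3 + 0.4 * X_train[:, :1]
width = 0.05 + 0.05 * X_train[:, 1:]
Y_train = np.exp(-0.5 * ((T - centre) / width) ** 2)

X_new = np.array([[0.5, 0.5], [5.0, 5.0]])  # one inside, one far outside the training range
Y_pred, Y_std = functional_emulator(X_train, Y_train, X_new)
print(Y_pred.shape, Y_std.mean(axis=1))
```

The predicted uncertainty grows for inputs far from the training data, which is the behaviour that makes emulators useful for decisions under uncertainty.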

Before we can start training, we will set aside 20% of our data to test our emulator. This data won't be used in training and will be used to check how our emulator performs on "new" data.
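Conceptually, the split is a random partition of the rows. The sketch below shows one way to do it with NumPy; `train_test_indices` is a hypothetical helper, and it will not reproduce the exact rows `twinLab` selects with `seed = 42`.

```python
import numpy as np

def train_test_indices(n_samples, train_ratio=0.8, seed=42):
    """Shuffle row indices and split them into training and held-out test sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(train_ratio * n_samples))
    return order[:n_train], order[n_train:]

train_idx, test_idx = train_test_indices(100)
print(len(train_idx), len(test_idx))  # 80 20
```

Fixing the seed makes the split reproducible, so repeated runs are scored on the same held-out rows.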


To get started we will:
1. Create an emulator on the `twinLab` cloud.

In [None]:
emulator = tl.Emulator('tritium_desorption') 

2. Set our training parameters.

In [None]:
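train_test_ratio = 0.8  # 80% of samples for training, 20% held out for testing
output_retained_dimensions = 8  # number of reduced output dimensions to keep
# estimator_params = tl.EstimatorParams(covar_module='M32', estimator_type='single_task_gp')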

In [None]:
train_params = tl.TrainParams(
    train_test_ratio = train_test_ratio,
    output_retained_dimensions = output_retained_dimensions,
    # estimator_params = estimator_params,
    seed = 42
)

3. Start!

In [None]:
emulator.train(dataset, inputs, outputs, train_params, verbose = True)

### Score

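Now that the emulator is trained, we can use `emulator.score` to see how well it performs on the test data.
We're going to use the Root Mean Squared Error (RMSE), computed as the square root of the Mean Squared Error (MSE) returned by `emulator.score`.

In [None]:
mse = emulator.score(tl.ScoreParams(metric = 'MSE', combined_score = True))
rmse = np.sqrt(mse)  # RMSE has the same units as the desorption rate
print(f"RMSE  = {rmse:.3e}")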
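For reference, a pooled RMSE over all test samples and all outputs can be computed directly with NumPy. This is a sketch: whether `combined_score = True` pools the errors in exactly this way is an assumption, so small differences from `emulator.score` are possible. `combined_rmse` is a hypothetical helper.

```python
import numpy as np

def combined_rmse(y_true, y_pred):
    """RMSE pooled over every test sample and every output (temperature)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

print(combined_rmse([[1, 2], [3, 4]], [[1, 2], [3, 6]]))  # 1.0
```

Once test predictions are available, it could be called as `combined_rmse(test_data[outputs].to_numpy(), mean[outputs].to_numpy())`, assuming `mean` has one column per output name.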

In [None]:
test_data = emulator.view_test_data()

In [None]:
mean, std = emulator.predict(test_data[inputs], verbose=False)

*** 

### Test the emulator 

In [None]:
i = np.random.randint(0, test_data.shape[0])

In [None]:
plot_test(i, test_data, mean, std, temperatures)
plt.show()
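A single plot only checks one test sample. A rough check of the predicted uncertainty across the whole test set is the fraction of true values that fall inside mean ± 1.96 std; for well-calibrated Gaussian predictions this should be close to 95%. The helper below is a hypothetical sketch; it could be called as `interval_coverage(test_data[outputs].to_numpy(), mean[outputs].to_numpy(), std[outputs].to_numpy())`, assuming `mean` and `std` use the output names as columns.

```python
import numpy as np

def interval_coverage(y_true, mean, std, z=1.96):
    """Fraction of true values inside mean ± z * std (about 95% for z = 1.96 if errors are Gaussian)."""
    y_true, mean, std = (np.asarray(a, dtype=float) for a in (y_true, mean, std))
    inside = np.abs(y_true - mean) <= z * std
    return float(inside.mean())

# Sanity check on synthetic, perfectly calibrated predictions
rng = np.random.default_rng(0)
y = rng.normal(size=100_000)
print(interval_coverage(y, np.zeros_like(y), np.ones_like(y)))
```

Coverage well below 95% suggests over-confident predictions; coverage close to 100% suggests the uncertainty is too wide.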

### Calibration

### Load the Streamlit app

In [None]:
%%capture
# !streamlit run app.py