In [1]:
!pip install git+https://github.com/myerspat/pyMAISE.git

Collecting git+https://github.com/myerspat/pyMAISE.git
  Cloning https://github.com/myerspat/pyMAISE.git to /tmp/pip-req-build-j4ohvqnp
  Running command git clone --filter=blob:none --quiet https://github.com/myerspat/pyMAISE.git /tmp/pip-req-build-j4ohvqnp
  Resolved https://github.com/myerspat/pyMAISE.git to commit 3c3bf090ad6a32461aa80f7f1fdd6e0832d4c96c
  Preparing metadata (setup.py) ... done


This notebook is intended to introduce and demonstrate some of the features of pyMAISE and examine the performance of machine learning models on a nuclear engineering application. For further information on the capabilities of the classes and functions shown in this notebook, please refer to the [pyMAISE API reference documentation](https://pymaise.readthedocs.io/en/latest/pymaise_api.html).


# Thermal Storage Tank DT

## Context

This simulated dataset has been generated using a digital twin (DT) of the thermal energy delivery system (TEDS) at Idaho National Laboratory (INL) for a project which focuses on uncertainty quantification of this DT.

The dataset was generated for a sensitivity analysis using the Sobol and FAST methods, in order to identify the major sources of uncertainty in a Dymola physics-based model of TEDS. To achieve this, each parameter is perturbed within its specified boundary condition interval for each sample. The sensitivity study focuses only on the thermal storage tank; the remaining TEDS components, such as the heat exchanger, pipes, and heaters, are omitted. Additionally, previous analyses allowed two input parameters, outlet pressure and outlet temperature, to be eliminated.
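
A first-order Sobol index measures the fraction of the output variance explained by a single input on its own. As an illustration only, the sketch below estimates these indices with the Saltelli pick-freeze estimator for a hypothetical toy model $Y = X_1 + 2X_2$ with independent inputs uniform on $[0, 1]$, standing in for the Dymola model (FAST is not shown). Analytically, $\mathrm{Var}(X_1) = 1/12$ and $\mathrm{Var}(2X_2) = 4/12$, so $S_1 = 0.2$ and $S_2 = 0.8$.

```python
import numpy as np


def toy_model(x):
    # Hypothetical stand-in for the Dymola TEDS model: Y = X1 + 2*X2
    return x[:, 0] + 2.0 * x[:, 1]


def first_order_sobol(model, n_samples, n_inputs, seed=0):
    """Saltelli (2010) pick-freeze estimate of first-order Sobol indices,
    assuming independent inputs uniform on [0, 1]."""
    rng = np.random.default_rng(seed)
    a = rng.uniform(size=(n_samples, n_inputs))
    b = rng.uniform(size=(n_samples, n_inputs))
    f_a, f_b = model(a), model(b)
    total_var = np.var(np.concatenate([f_a, f_b]))
    indices = np.empty(n_inputs)
    for i in range(n_inputs):
        a_b = a.copy()
        a_b[:, i] = b[:, i]  # take column i from B, keep the others from A
        indices[i] = np.mean(f_b * (model(a_b) - f_a)) / total_var
    return indices


s = first_order_sobol(toy_model, n_samples=100_000, n_inputs=2)
print(s)  # should be close to the analytic values 0.2 and 0.8
```

This estimator needs $(k + 2)N$ model evaluations for $k$ inputs, which is why replacing an expensive physics-based model with a fast ML surrogate can make such analyses much cheaper.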

**Our goal is to identify optimal ML models with pyMAISE that can predict the outlet temperatures accurately given the previously specified input parameters and boundary conditions, and then use these ML models to accelerate the process of performing detailed sensitivity analysis.**

<br>

***

## Input

Input data is a 3D tensor with shape (1024, 46, 4) for (samples, timesteps, features).

There are 1024 simulations (samples), and each simulation is specified by 3 input parameters plus a time index (4 features in total). The "timesteps" axis gives the value of each input as a function of time over 46 timesteps of 400s each.
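
To make the (samples, timesteps, features) layout concrete, here is a minimal NumPy sketch on a placeholder array of zeros. The feature order comes from the list below; the assumption that the time index starts at 0 s is ours, not stated in the dataset description.

```python
import numpy as np

n_samples, n_timesteps = 1024, 46
feature_names = ["time", "massflow", "shape_factor", "porosity"]

# Placeholder tensor with the documented (samples, timesteps, features) layout
x_demo = np.zeros((n_samples, n_timesteps, len(feature_names)))

# Assumption: the time index starts at 0 s and advances 400 s per timestep
x_demo[:, :, feature_names.index("time")] = np.arange(n_timesteps) * 400.0

# massflow varies over time, so keep the whole timestep axis
massflow = x_demo[:, :, feature_names.index("massflow")]  # shape (1024, 46)

# shape_factor and porosity are constant per sample, so one timestep suffices
porosity = x_demo[:, 0, feature_names.index("porosity")]  # shape (1024,)

print(massflow.shape, porosity.shape)
```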

The features in order are:

- **time** : Elapsed experiment time (s)
- **massflow** : Inlet mass flow rate (boundary conditions : 0.0 - 2.5 $kg/s$)
- **shape_factor** : Constant factor for tank fillers' packing scheme (boundary conditions : 2 - 3)
- **porosity** : Constant ratio of void volume and total volume (boundary conditions : 0.2 - 0.9)

At each timestep, **massflow** adheres to specific boundary conditions, and its overall time-dependent trajectory follows this pattern:


                      peak due to sudden valve opening                                    
                     (t2,m2)                                                              
                           /\    reach steady                                             
                          /  \  (t3,m3)                                                   
                         /    \__________ (t4, m4)  charging finished, valve starts closing
                        /                \                                                
                       /                  \                                               
      (0, 0) _________/                    \____________                                  
                  (t1, 0)               (t5,0)     (10960,0)                              
                  start charging         flow reaches 0       getting into discharge      
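
The trajectory above is piecewise linear, so it can be sketched with `numpy.interp` through the labeled breakpoints. The breakpoint times and flow values below are hypothetical placeholders chosen within the 0.0 - 2.5 kg/s bound, and treating the plateau as flat ($m_3 = m_4$) is an assumption read off the diagram.

```python
import numpy as np

# Hypothetical breakpoints for one sample; the real t1..t5 and m2..m4 are
# not given in this notebook and vary from sample to sample
t1, t2, t3, t4, t5 = 2000.0, 2400.0, 3200.0, 7600.0, 8400.0
m2 = 2.4  # peak flow from the sudden valve opening (kg/s)
m3 = 1.5  # steady charging flow (kg/s)
m4 = m3   # assumption: the plateau between t3 and t4 is flat

knots_t = [0.0, t1, t2, t3, t4, t5, 10960.0]
knots_m = [0.0, 0.0, m2, m3, m4, 0.0, 0.0]


def massflow_at(t):
    """Piecewise-linear inlet mass flow rate (kg/s) through the breakpoints."""
    return np.interp(t, knots_t, knots_m)


print(massflow_at(np.array([t1, t2, 5000.0, t5, 10000.0])))
```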

<br>

***

## Output

Output data is a 3D tensor with shape (1024, 13, 6) for (samples, sensors, time_slices).

There are 1024 simulations (samples), and each simulation yields **inlet temperature (K)** measurements from 13 sensors distributed around the tank at 6 specific time points: 4000s, 6000s, 8000s, 10000s, 12000s, and 14000s.

The sensors are thermocouples with the following names:

![thermocline.png](attachment:thermocline.png)

<br>

The following standard packages and functions will prove helpful alongside the pyMAISE-specific functionality.

In [2]:
import xarray as xr
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
from scipy.stats import uniform, randint
from sklearn.preprocessing import MinMaxScaler

# Plot settings
matplotlib_settings = {
    "font.size": 12,
    "legend.fontsize": 11,
    "figure.figsize": (8, 8)
}
plt.rcParams.update(matplotlib_settings)

We need several pyMAISE functions and classes for machine learning tuning and analysis. First, we must split the data into training and testing sets and scale it; for this, we use the `pyMAISE.preprocessing` module. The remaining classes come directly from `pyMAISE`, which we import as `mai` for convenience.
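
For readers unfamiliar with the split-and-scale step, the sketch below shows the general idea with scikit-learn's `train_test_split` and `MinMaxScaler` rather than pyMAISE's own `train_test_split` and `scale_data`, whose signatures are not shown here. The placeholder data and the 30% test fraction are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(42)
# Placeholder data standing in for the TEDS inputs and outputs
x_demo = rng.uniform(size=(100, 4))
y_demo = rng.uniform(size=(100, 2))

# Hold out 30% of the samples for testing
x_train, x_test, y_train, y_test = train_test_split(
    x_demo, y_demo, test_size=0.3, random_state=42
)

# Fit the scaler on the training data only, then apply it to both splits,
# so no information from the test set leaks into the scaling
x_scaler = MinMaxScaler().fit(x_train)
x_train_scaled = x_scaler.transform(x_train)
x_test_scaled = x_scaler.transform(x_test)
```

Test-set values can fall slightly outside [0, 1] after scaling; that is expected when the scaler never sees them.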

In [3]:
from pyMAISE.preprocessing import correlation_matrix, train_test_split, scale_data
import pyMAISE as mai

## pyMAISE Initialization

Starting any pyMAISE job requires initialization. This includes the definition of global settings used throughout pyMAISE. These settings and their defaults include:

- `problem_type`: the problem type, either regression or classification, defined by `pyMAISE.ProblemType`,
- `verbosity=0`: the level of output from pyMAISE,
- `random_state=None`: the seed for the random number generator, which can be used to get reproducible results from pyMAISE,
- `num_configs_saved=5`: the number of top hyperparameter configurations saved for each model during tuning,
- `new_nn_architecture=True`: a boolean that dictates whether to use the current neural network tuning architecture (`True`) or the old, deprecated one (`False`),
- `cuda_visible_devices=None`: sets the `CUDA_VISIBLE_DEVICES` environment variable.
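
The `random_state` setting follows the usual seeding behavior: a fixed seed makes random draws repeatable, while `None` gives a different stream on each run. The sketch uses NumPy's `default_rng` only to illustrate that general idea; it is not pyMAISE's internal code, and `noisy_draws` is a hypothetical helper.

```python
import numpy as np


def noisy_draws(seed=None):
    # Hypothetical helper: a generator seeded with the same integer
    # yields the same sequence of draws
    rng = np.random.default_rng(seed)
    return rng.normal(size=3)


same_a = noisy_draws(seed=42)
same_b = noisy_draws(seed=42)
unseeded = noisy_draws()  # seed=None pulls fresh entropy from the OS

print(np.array_equal(same_a, same_b))  # True
```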

The only argument that must be specified is `problem_type`. We also pass `"-1"` to `cuda_visible_devices` to ensure that TensorFlow runs only on the CPU. This suits the problem because we will build relatively simple, dense feedforward neural networks on a reasonably small data set, for which running TensorFlow on a GPU may hurt performance rather than help it. We leave the other settings at their defaults, which gives us five hyperparameter configurations for each model, keeps the stochastic nature of some of the algorithms, and uses the current neural network hyperparameter tuning architecture.
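
As the settings list notes, `cuda_visible_devices` sets the `CUDA_VISIBLE_DEVICES` environment variable. Because -1 is not a valid device index, CUDA then makes no GPUs visible to the process. A minimal sketch of doing the same by hand, assuming TensorFlow is only imported afterwards:

```python
import os

# Hide every CUDA device from this process. Set this before TensorFlow is
# imported, because the CUDA runtime reads the variable when it initializes.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

# With TensorFlow installed, the following would then report no GPUs:
# import tensorflow as tf
# print(tf.config.list_physical_devices("GPU"))  # expected: []
print(os.environ["CUDA_VISIBLE_DEVICES"])
```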

In [4]:
global_settings = mai.init(
    problem_type=mai.ProblemType.REGRESSION,   # Define a regression problem
    cuda_visible_devices="-1"                  # Use CPU only
)

## Data Loading and Preprocessing

We load the TEDS CSV data from GitHub using `pandas.read_csv`, then convert each DataFrame to an `xarray.DataArray`, the format pyMAISE uses for data manipulation.

In [5]:
# Load the input and output DataFrames from GitHub
xdf = pd.read_csv("https://raw.githubusercontent.com/aims-umich/ners590data/main/teds_x.csv")
ydf = pd.read_csv("https://raw.githubusercontent.com/aims-umich/ners590data/main/teds_y.csv")

xnames = xdf.columns
ynames = ydf.columns

# Convert the pandas DataFrames into xarray DataArrays
x = xr.DataArray(xdf, name="Input data", dims=["Samples", "Features"], coords={"Features": xnames})
y = xr.DataArray(ydf, name="Target data", dims=["Samples", "Features"], coords={"Features": ynames})
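
Keeping the column names as the `Features` coordinate means individual inputs or outputs can be selected by label instead of by position. A small sketch on a placeholder frame (the values are made up; only the column names come from this dataset):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Small placeholder frame using a few of the TEDS input column names
demo_df = pd.DataFrame(
    np.arange(12, dtype=float).reshape(4, 3),
    columns=["sf", "por", "msf_6"],
)

demo_x = xr.DataArray(
    demo_df.to_numpy(),
    name="Input data",
    dims=["Samples", "Features"],
    coords={"Features": demo_df.columns.to_list()},
)

# Label-based selection along the "Features" dimension
porosity = demo_x.sel(Features="por")          # 1D, indexed by "Samples"
subset = demo_x.sel(Features=["sf", "msf_6"])  # 2D, shape (4, 2)
print(porosity.values)
```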

In [7]:
x

In [8]:
y

We end up with the following features, in order:

**Input**

- `sf`: Constant factor for tank fillers' packing scheme (boundary conditions : 2 - 3)
- `por`: Constant ratio of void volume and total volume (boundary conditions : 0.2 - 0.9)
- `msf_6` : Inlet mass flow rate after 7 timesteps of elapsed time (boundary conditions : 0.19 - 1.65 $kg/s$)
- `msf_7` : Inlet mass flow rate after 8 timesteps of elapsed time (boundary conditions : 0.66 - 2.48 $kg/s$)
<br>**.**
<br>**.**
<br>**.**
- `msf_22` : Inlet mass flow rate after 23 timesteps of elapsed time (boundary conditions : 0.06 - 0.21 $kg/s$)
- `msf_23` : Inlet mass flow rate after 24 timesteps of elapsed time (boundary conditions : 0.00 - 0.01 $kg/s$)
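
Because the time-dependent mass flow inputs share the `msf_` prefix, they can be separated from the two constant parameters by name. A sketch on a placeholder frame, assuming the columns run contiguously from `msf_6` to `msf_23` as the elided list suggests:

```python
import numpy as np
import pandas as pd

# Placeholder frame; assumes the msf_* columns run contiguously from msf_6 to msf_23
columns = ["sf", "por"] + [f"msf_{k}" for k in range(6, 24)]
demo_xdf = pd.DataFrame(np.zeros((5, len(columns))), columns=columns)

# Split the time-dependent mass flow columns from the constant parameters
msf = demo_xdf.filter(regex=r"^msf_\d+$")
static = demo_xdf[["sf", "por"]]

# Recover each column's timestep index from its name
msf_steps = [int(name.split("_")[1]) for name in msf.columns]
print(msf.shape, msf_steps[0], msf_steps[-1])
```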

**Output**

- `TE_4_t4000` : Inlet temperature reading from the fourth east thermocouple after 4000s of elapsed time ($K$)
- `TE_4_t6000` : Inlet temperature reading from the fourth east thermocouple after 6000s of elapsed time ($K$)
<br>**.**
<br>**.**
<br>**.**
- `TN_1_1_t12000` : Inlet temperature reading from the 1st north thermocouple after 12000s of elapsed time ($K$)
- `TN_1_1_t14000` : Inlet temperature reading from the 1st north thermocouple after 14000s of elapsed time ($K$)
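
If the flattened output columns cover all 13 sensors × 6 time slices (78 columns), they can be folded back into the (samples, sensors, time_slices) tensor described in the Output section. The listing above suggests the columns are ordered sensor by sensor with the six time slices innermost. The sketch assumes that ordering and uses hypothetical `sensor_k` names for the 11 sensors not shown, so check `ynames` before relying on it.

```python
import numpy as np
import pandas as pd

times = [4000, 6000, 8000, 10000, 12000, 14000]
# Hypothetical sensor names; only TE_4 and TN_1_1 appear in the listing above
sensors = ["TE_4"] + [f"sensor_{k}" for k in range(11)] + ["TN_1_1"]
columns = [f"{s}_t{t}" for s in sensors for t in times]  # sensor-major, time innermost

n_samples = 3
flat = pd.DataFrame(
    np.arange(n_samples * len(columns), dtype=float).reshape(n_samples, -1),
    columns=columns,
)

# Row-major reshape maps column (sensor i, time j) to position [:, i, j]
tensor = flat.to_numpy().reshape(n_samples, len(sensors), len(times))
print(tensor.shape)
```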



## Cut the notebook from here for Lab 8