In [None]:
import pandas as pd
import numpy as np
import os
import plotly.express as px
from pathlib import Path
pd.options.mode.chained_assignment = None  # default='warn'

---
# Characterisation method tutorial

The aim of this tutorial is to provide a complete workflow for identifying the physical properties of the materials of a reference wall using **Modelitool** (a Modelica simulator) and **CorrAI** models, by following these steps:

1. **Measurement import and verification**:
    - We will import the measurement dataframe into the notebook and check the measurements before selecting an analysis period.
    
2. **Model Creation/Import**:
    - We will either create a new mathematical model or import an existing model (FMU, OpenModelica).

3. **Loading Measurements**:
    - We will load the measurements or data needed for the sensitivity analysis and the identification of our model.

4. **Performing Sensitivity Analysis**:
    - Sensitivity analysis will be conducted to determine how different input variables affect the output of the model. This step helps in identifying the most influential parameters in the model.

5. **Model Identification**:
    - Using the results from the sensitivity analysis, we will identify the parameters that need to be adjusted to improve the model's accuracy. This involves fitting the model to the data and refining its parameters.

6. **Results Visualization and Interpretation**:
    - Finally, we will visualize the results of our analysis and interpret the findings. This includes plotting the sensitivity indices and comparing the model's predictions with the measurements to assess the reliability of the model.

---

# 1 Introduction


## 1.1 Use case presentation

A "real-scale" test bench is used. The **O3BET** test bench (in this example, the BEF test bench in Anglet, France) offers experimental conditions to evaluate building façade solutions. Heat exchanges in a cell are restricted on five of its faces, while the sixth face is dedicated to the tested solution. Internal temperature and humidity conditions can be controlled or monitored. External conditions, including temperature and solar radiation, are measured.

The technology tested here is a green façade coupled with insulation panels. The experimental setup is presented in the following pictures: one cell is equipped with the technology (right) and another serves as a reference (insulation only).

| Figure : Pictures of reference wall and Urban Canopee installation |
| :---: |
| <img src="images/cladding.png"  style="height:200px;">  <img src="images/BEF_facades.jpeg"  style="height:200px;"> |

Sensors (heat flux meters, thermocouples, RTDs) are positioned at several locations: in the middle of the insulation panels, between the insulation and concrete layers, between leaves, in substrates, and indoors. Climatic conditions (external temperature, incident solar radiation) are also monitored.


| Figure : Sensors installation scheme |
| :---: |
| <img src="images/sensors.png"  style="height:200px;">  <img src="images/Sensor_photo1.jpeg"  style="height:200px;"> <img src="images/Sensor_photo2.jpeg"  style="height:200px;">   | 

- The measurement campaign runs from April 2024 to October 2024.
- The acquisition time step is at least 60 seconds.

## 1.2 Identification framework
The following framework is proposed to identify the thermal conductivity of the **REFERENCE** wall. It provides:
- A physical model description using Python
- A sensitivity analysis to identify the material properties that influence the discrepancy between model outputs and measured phenomena
- Identification of the wall thermal conductivity using an optimization algorithm

## 1.3 Load measurement file
First, let's load into Python the reference cell measurement data that will be used as boundary conditions. Note that the data loaded here should be cleaned beforehand (see the **"MeasuredDat example"** tutorial if needed).

In [None]:
TUTORIAL_DIR = Path(os.getcwd()).as_posix()

In [None]:
reference_df = pd.read_csv(
    Path(TUTORIAL_DIR) / "resources/tuto_data_SA.csv",
    index_col=0,
    sep=";",
    decimal=".",
    parse_dates=True
)

In [None]:
reference_df.head()

In [None]:
reference_df.isna().any().any()

There seems to be no missing data. Let's first plot all the data to check that nothing stands out. Although the data were (supposedly) cleaned, some measurement errors and irregularities might have been missed in the process.

In [None]:
px.line(reference_df)

Here we can see that one of the temperature sensors installed in the insulation panels stopped measuring on September 7th. Only T_ins_2 will therefore be used as the insulation temperature.

In [None]:
reference_df.loc[:, "T_ins"] = reference_df["T_ins_2"]  # middle of insulation temperature
reference_df.loc[:, "T_int"] = reference_df[["Tint_1", "Tint_2"]].mean(axis=1)  # indoor temperature
reference_df.loc[:, "T_interface"] = reference_df[["T_interface_1", "T_interface_2"]].mean(axis=1)  # interface temperature, between insulation and concrete panels

For modeling purposes, we convert the temperatures from °C to kelvin.

In [None]:
temperatures = [
    'T_ins',
    'T_int',
    'T_interface', 
    'T_ext'
]
reference_df[temperatures] += 273.15

Also, for the first simulation, let's choose a short period with both sunny and cloudy days:

In [None]:
simulation_df = reference_df.loc["2024-09-04 00:00":"2024-09-09 00:00"]

# 2. Modeling approach and set-up

## 2.1 Proposed model 

For this example, we propose a resistance/capacitance (RC) approach. Based on the electrical circuit analogy, each layer of the wall is modeled by two resistances and a capacitance:


| Figure : RC model|
| :---: |
| <img src="images/RC_model.png"  style="height:400px;">   | 




The following is a brief description of the thermal model:

- Each wall layer is modeled by 2 thermal resistances and a capacitance (values per unit wall area):
    - Two resistances create a temperature gradient and give a better resolution of the heat flow distribution: $ R_1 = R_2 = \frac{ep_{layer}}{2 \lambda_{layer}} $
    - A capacitance in the middle of each layer represents its thermal mass and ability to store heat: $ C = ep_{layer} \times \rho_{layer} \times cap_{layer} $

- Inside and outside convection/conduction transfers are modeled as constant thermal resistances.

- Infrared (long-wave) exchanges are considered:
    - With the sky, with $ T_{sky} = 0.0552\,T_{ext}^{1.5} $ (temperatures in K), as the sky is a significant source of infrared radiation, especially at night. This radiation can have a considerable impact on the thermal behavior of the system, influencing both heating and cooling processes.
    - With the surroundings, considered to be at $ T_{ext} $, as the surrounding environment also emits infrared radiation.

- The short-wave solar heat flux is computed as $Sw_{gain} = Pyr \times \alpha_{coat}$, with $Pyr$ the measured solar radiation on the wall (W/m²) and $\alpha_{coat}$ the coating solar absorption coefficient.

- Temperatures $T_{ext}$ and $T_{int}$ are boundary conditions. $T_{int}$ represents the temperature within the controlled environment of the system.
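The radiative and solar terms above can be evaluated per square metre of wall with a few lines of Python. This is a minimal sketch that mirrors the expressions used in the `OpaqueWallSimple` class below (both long-wave terms use the same view factor `fview`, as in that class); the helper names and the example conditions are hypothetical.

```python
SIGMA = 5.67e-8  # Stefan-Boltzmann constant [W/(m2.K4)]


def sky_temperature(t_ext):
    """Sky temperature [K] from the correlation T_sky = 0.0552 * T_ext**1.5 (T_ext in K)."""
    return 0.0552 * t_ext**1.5


def external_radiative_gain(t_se, t_ext, pyr, alpha=0.2, epsilon=0.8, fview=0.5):
    """Net radiative gain at the external surface per unit area [W/m2] (positive heats the wall).

    Hypothetical helper reproducing the short-wave and long-wave terms of the model.
    """
    t_sky = sky_temperature(t_ext)
    sw_gain = alpha * pyr  # absorbed short-wave solar flux
    lw_sky = epsilon * fview * SIGMA * (t_se**4 - t_sky**4)  # long-wave loss to the sky
    lw_amb = epsilon * fview * SIGMA * (t_se**4 - t_ext**4)  # long-wave loss to the surroundings
    return sw_gain - lw_sky - lw_amb


# Night-time example: no sun, surface at air temperature
night = external_radiative_gain(t_se=288.15, t_ext=288.15, pyr=0.0)
# Daytime example: 600 W/m2 received by the wall
day = external_radiative_gain(t_se=288.15, t_ext=288.15, pyr=600.0)
print(f"T_sky for T_ext = 293.15 K: {sky_temperature(293.15):.2f} K")
print(f"Night: {night:.1f} W/m2, day: {day:.1f} W/m2")
```

At night the sky term alone makes the balance negative, which illustrates the night-time cooling mentioned above.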

Here are some theoretical parameters for the model:

In [None]:
# Surface of the tested wall
S_wall = 7  # m2

# Thickness of layers
ep_concrete = 0.200  # m
ep_ins = 0.140  # m

# Conductivity of layers
lambda_concrete = 0.13  # W/(m.K)
lambda_ins = 0.031  # W/(m.K)

# Density of layers
rho_concrete = 2400  # kg/m3
rho_ins = 32.5  # kg/m3

# Specific heat capacity of layers
sc_concrete = 880  # J/(kg.K)
sc_ins = 1000  # J/(kg.K)

# Radiative parameters
alpha = 0.2  # solar absorption coefficient
epsilon = 0.8  # emissivity
fview = 0.5  # view factor of tested wall

## 2.2 Define a simulator

You can either load a Modelica model or an FMU (using the **Modelitool** library, see: https://github.com/BuildingEnergySimulationTools/modelitool), or write a model directly in Python. In both cases, the following definitions apply.

For the model to work with the corrAI and modelitool libraries (sensitivity analyses, optimization, etc.), it is written as a class with:
- A simulation method
- Simulation options

How it works:

- Parameters: The wall's thermal resistances (R_ext, R_int, R_concrete, R_ins) and heat capacities (C_concrete, C_ins) are initialized with default values, which can be overridden by user inputs. Other physical parameters include the wall surface area (S_wall), the view factor (fview), the material emissivity (epsilon) and the solar absorption coefficient (alpha).
- Dataframe: The model takes as input a dataframe that includes the external and internal temperatures (T_ext, T_int), the time in seconds (time_sec), and the solar radiation (Pyr, from a pyranometer).
- Initial conditions: The temperatures of the external surface, insulation and interfaces are initialized from the external and internal temperatures at the first time step; the concrete temperature starts from a fixed value (299 K).
- Time-stepping: The simulation proceeds over time steps, updating the temperature of each layer.
- Radiative heat transfer: The model calculates the radiative exchanges of the wall surface with the sky and the surroundings, and the absorbed direct solar radiation.
- Temperature update: At each time step, temperatures are updated from the thermal resistances and capacitances of each layer using an explicit (forward Euler) finite-difference scheme.
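The capacitive node update is an explicit (forward Euler) step. The sketch below isolates one node linked to two neighbours through resistances of R/2, as in the model, and computes a conservative time-step bound, dt ≤ C / ΣG, which keeps a single node with fixed neighbours from oscillating. The helper names are hypothetical, the bound is a rule of thumb for this simplified node (not a check performed by the tutorial), and the example values are the nominal insulation values of this notebook.

```python
def explicit_node_update(t_node, t_left, t_right, r_layer, c_node, dt):
    """One forward-Euler step for a node linked to two neighbours through r_layer / 2 each."""
    r_half = r_layer / 2
    heat_flow = (t_left - t_node) / r_half + (t_right - t_node) / r_half  # net heat flow [W]
    return t_node + dt / c_node * heat_flow


def max_non_oscillating_dt(r_layer, c_node):
    """Conservative explicit time step [s]: C / sum of conductances, i.e. R * C / 4 here."""
    total_conductance = 2 / (r_layer / 2)  # two conductances of 1 / (R / 2) [W/K]
    return c_node / total_conductance


r_ins, c_ins = 0.322, 31850  # nominal insulation resistance [K/W] and capacity [J/K]
dt = 300  # 5-minute time step [s]
print(f"dt bound for the insulation node: {max_non_oscillating_dt(r_ins, c_ins):.0f} s")

# With neighbours fixed at 290 K and 300 K, the node relaxes towards their mean (295 K)
t_node = 280.0
for _ in range(200):
    t_node = explicit_node_update(t_node, 290.0, 300.0, r_ins, c_ins, dt)
print(f"Node temperature after 200 steps: {t_node:.3f} K")
```

With the 5-minute resampling used in this tutorial, the time step stays well below this bound for the insulation node; the concrete node, with a much larger capacitance, is even less restrictive.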


In [None]:
from corrai.base.model import Model

class OpaqueWallSimple(Model):

    def simulate(self, parameter_dict, simulation_options):
    
        default_parameters = {
                    "R_ext": 0.005,        
                    "R_int": 0.01,        
                    "R_concrete": 0.10,     
                    "R_ins": 0.32,         
                    "C_concrete": 2.95e6,   
                    "C_ins": 3.64e4,        
                    "alpha": 0.2,           
                    "S_wall": 7,            
                    "epsilon": 0.4,         
                    "fview": 0.5,        
                }
    
        parameters = {**default_parameters, **parameter_dict}

        R_ext = parameters["R_ext"]
        R_int = parameters["R_int"]
        R_concrete = parameters["R_concrete"]
        R_ins = parameters["R_ins"]

        C_concrete = parameters["C_concrete"]
        C_ins = parameters["C_ins"]
        alpha = parameters["alpha"]
        S_wall = parameters["S_wall"]
        epsilon = parameters["epsilon"]
        fview = parameters["fview"]

        
        sigma = 5.67e-8  # Stefan-Boltzmann constant in W/m^2/K^4

        # Extract simulation data
        df = simulation_options["dataframe"]
        time = df["time_sec"].values
        T_ext = df["T_ext"].values
        T_int = df["T_int"].values
        Q_rad = df["Pyr"].values

        # Extract simulation options
        startTime = simulation_options.get("startTime", time[0])
        stopTime = simulation_options.get("stopTime", time[-1])

        # Select the data within the specified time range
        mask = (time >= startTime) & (time <= stopTime)
        time = time[mask]
        T_ext = T_ext[mask]
        T_int = T_int[mask]
        Q_rad = Q_rad[mask]

        # Initialize temperature arrays
        T_se = np.zeros(len(time))  # External surface
        T_concrete = np.zeros(len(time))  # Concrete layer
        T_ins = np.zeros(len(time))  # Insulation layer
        T_interface = np.zeros(len(time))  # Insulation-concrete interface
        T_si = np.zeros(len(time))  # Internal surface interface
        T_sky = np.zeros(len(time))  # Sky temperature

        # Set initial conditions
        T_se[0] = T_ext[0]
        T_concrete[0] = 299  # fixed initial guess [K] (alternative: T_ext[0])
        T_ins[0] = T_int[0]
        T_interface[0] = (T_ins[0] + T_concrete[0]) / 2
        T_si[0] = T_int[0]
        T_sky[0] = T_int[0]

        # Perform simulation
        for t in range(1, len(time)):
            dt = time[t] - time[t - 1]

            # Calculate sky temperature
            T_sky[t] = 0.0552 * (T_ext[t] ** 1.5)

            # Calculate radiative heat flow
            Q_rad_sky = epsilon * fview * sigma * (T_se[t - 1] ** 4 - T_sky[t] ** 4) * S_wall
            Q_rad_amb = epsilon * fview * sigma * (T_se[t - 1] ** 4 - T_ext[t - 1] ** 4) * S_wall
            Q_rad_dir = Q_rad[t - 1] * alpha * S_wall

            # Calculate interface temperatures
            T_se[t] = (T_ext[t - 1] / R_ext + T_ins[t - 1] / (R_ins / 2)
                       + Q_rad_dir - Q_rad_sky - Q_rad_amb) / (1 / R_ext + 1 / (R_ins / 2))

            T_interface[t] = (T_ins[t - 1] / (R_ins / 2) + T_concrete[t - 1] / (R_concrete / 2)) / \
                             (1 / (R_concrete / 2) + 1 / (R_ins / 2))

            T_si[t] = (T_int[t - 1] / R_int + T_concrete[t - 1] / (R_concrete / 2)) \
                      / (1 / R_int + 1 / (R_concrete / 2))

            # Update temperatures based on capacitance
            T_ins[t] = T_ins[t - 1] + dt / C_ins * ((T_se[t] - T_ins[t - 1]) / (R_ins / 2)
                                                    + (T_interface[t] - T_ins[t - 1]) / (R_ins / 2))
            
            T_concrete[t] = T_concrete[t - 1] + dt / C_concrete * (
                        (T_interface[t] - T_concrete[t - 1]) / (R_concrete / 2)
                        + (T_si[t] - T_concrete[t - 1]) / (R_concrete / 2))

        # Prepare output dataframe
        df_out = pd.DataFrame({
            # "T_ext": T_ext,
            # "T_se": T_se,
            "T_concrete": T_concrete,
            "T_interface": T_interface,
            "T_ins": T_ins,
            # "T_si": T_si,
            # "T_int": T_int,
            # "T_sky": T_sky,
        }, index=df.index[mask])

        self.simulation_options = simulation_options

        return df_out

    def save(self, file_path):
        pass


The model expects time in seconds: we can use `datetime_to_seconds` from modelitool. Moreover, the data are sampled every minute, so we resample them to 5-minute averages.

In [None]:
from modelitool.combitabconvert import datetime_to_seconds

In [None]:
simulation_df_resample = simulation_df.resample("5min").mean()
# Compute the time column on the resampled index so that it matches second_index below
simulation_df_resample["time_sec"] = datetime_to_seconds(simulation_df_resample.index)
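Why compute the time column after resampling? Averaging a seconds column over 5-minute bins returns the mean time of the samples in each bin rather than the bin start. The self-contained sketch below shows this with synthetic 1-minute data; `seconds_since_start` is a hypothetical stand-in for modelitool's `datetime_to_seconds`, whose time origin may differ.

```python
import numpy as np
import pandas as pd

# Synthetic 1-minute series standing in for the measurements (hypothetical values)
index = pd.date_range("2024-09-04 00:00", periods=60, freq="1min")
raw = pd.DataFrame({"T_ext": np.linspace(288.0, 295.0, 60)}, index=index)


def seconds_since_start(dt_index):
    """Seconds elapsed since the first timestamp (hypothetical stand-in for datetime_to_seconds)."""
    return (dt_index - dt_index[0]).total_seconds().to_numpy()


# Resample first, then compute the time column on the new index
resampled = raw.resample("5min").mean()
resampled["time_sec"] = seconds_since_start(resampled.index)

# Pitfall: averaging a time column computed before resampling shifts it inside each bin
raw["time_sec"] = seconds_since_start(raw.index)
shifted = raw.resample("5min").mean()["time_sec"].to_numpy()

print(np.diff(resampled["time_sec"].to_numpy())[:3])  # regular 300 s steps
print(shifted[:3] - resampled["time_sec"].to_numpy()[:3])  # constant 120 s offset
```

`OpaqueWallSimple` only uses time differences and the comparison with the start and stop times, so what matters is that `time_sec` and `second_index` share the same origin and alignment.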

Now, let's define:
- the simulation options, with the start time, stop time, and a dataframe for boundary conditions
- a dictionary containing values for our different parameters

In [None]:
second_index = datetime_to_seconds(simulation_df_resample.index)

In [None]:
simulation_options_PYTH={
    "dataframe":simulation_df_resample,
    "startTime": second_index[0],
    "stopTime": second_index[-1],  # key read by OpaqueWallSimple.simulate
}

In [None]:
parameter_dict_PYTH = {
    "R_ext": 0.04/S_wall,       
    "R_int": 0.13/S_wall,      
    "R_concrete": 1 / (lambda_concrete / ep_concrete) / 2 / S_wall,   
    "R_ins": 1 / (lambda_ins / ep_ins) / 2 / S_wall, 
    "C_ins": rho_ins*ep_ins*S_wall*sc_ins,  
    "C_concrete": rho_concrete*ep_concrete*S_wall*sc_concrete,       
    "alpha": alpha,       
    "S_wall": S_wall,         
    "epsilon": epsilon,
    'fview': fview
}
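The nominal values in `parameter_dict_PYTH` can be checked by hand. The short script below repeats the same expressions (including the division by 2 used for the layer resistances) with the theoretical parameters listed above; it reproduces the centres of the intervals used later in the sensitivity analysis (0.109 K/W, 0.322 K/W, 31850 J/K and 2956800 J/K). The helper names are illustrative only.

```python
# Theoretical parameters (same values as above)
S_wall = 7  # m2
ep_concrete, ep_ins = 0.200, 0.140  # m
lambda_concrete, lambda_ins = 0.13, 0.031  # W/(m.K)
rho_concrete, rho_ins = 2400, 32.5  # kg/m3
sc_concrete, sc_ins = 880, 1000  # J/(kg.K)


def layer_resistance(ep, lam, surface):
    """Same expression as in parameter_dict_PYTH: 1 / (lambda / ep) / 2 / S_wall [K/W]."""
    return 1 / (lam / ep) / 2 / surface


def layer_capacity(ep, rho, sc, surface):
    """Heat capacity of a layer over the whole wall surface: rho * ep * S_wall * sc [J/K]."""
    return rho * ep * surface * sc


nominal = {
    "R_ext": 0.04 / S_wall,
    "R_int": 0.13 / S_wall,
    "R_concrete": layer_resistance(ep_concrete, lambda_concrete, S_wall),
    "R_ins": layer_resistance(ep_ins, lambda_ins, S_wall),
    "C_concrete": layer_capacity(ep_concrete, rho_concrete, sc_concrete, S_wall),
    "C_ins": layer_capacity(ep_ins, rho_ins, sc_ins, S_wall),
}
for name, value in nominal.items():
    print(f"{name:>10}: {value:.4g}")
```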

We can now instantiate the model and run the simulation. 

In [None]:
simu_PYTH = OpaqueWallSimple()

init_res_PYTH = simu_PYTH.simulate(
    parameter_dict=parameter_dict_PYTH, 
    simulation_options=simulation_options_PYTH
)

Let's compare results to measurement:

In [None]:
# Rename columns for the comparison (work on a copy to keep init_res_PYTH unchanged)
copy_res = init_res_PYTH.copy()
copy_res.index = copy_res.index.tz_localize(None)

copy_res = copy_res.rename(columns={
    "T_concrete": "T_concrete_PYTHON",
    "T_interface": "T_interface_PYTHON",
    "T_ins": "T_insulation_PYTHON",
})

In [None]:
measure_comp = pd.concat([
    simulation_df_resample[["T_interface", "T_ins"]], 
    copy_res[["T_interface_PYTHON", "T_insulation_PYTHON" ]]], axis = 1)

color_map = {
    "T_interface": "darkblue", 
    "T_interface_PYTHON": "blue", 
    "T_ins": "darkgreen", 
    "T_insulation_PYTHON": "green" 
}

fig = px.line(measure_comp)

for trace in fig.data:
    trace_name = trace.name
    if trace_name in color_map:
        trace.line.color = color_map[trace_name]
        
fig.show()


The fit is not good. A sensitivity analysis should be performed to rank the parameters by their influence on the error between the measured temperature and the model prediction.


# 3. Sensitivity analysis

## 3.1. Error function
The chosen error function is the CV_RMSE. The formula for CV_RMSE is given by:

$$
CV\_RMSE = \frac{RMSE}{\bar{y}}
$$

Where:
- $RMSE$ is the root mean squared error,
- $\bar{y}$ is the mean of the observed values.

The RMSE is calculated as:

$$
RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
$$

Where:
- $n$ is the number of observations,
- $y_i$ is the observed value for the $i$-th observation,
- $\hat{y}_i$ is the predicted value for the $i$-th observation.

The CV_RMSE measures the variation of the RMSE relative to the mean of the observed values. It provides a standardized measure of the error, which can be useful for comparing the performance of different models across different datasets.
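The two formulas above translate directly into NumPy. This is a minimal sketch of the definition given here, not the source of `corrai.metrics.cv_rmse`, which may differ in details (for instance argument order, handling of missing values, or expressing the result as a percentage).

```python
import numpy as np


def cv_rmse_sketch(y_obs, y_pred):
    """CV_RMSE = RMSE / mean(observed values), following the definitions above."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_obs - y_pred) ** 2))
    return rmse / np.mean(y_obs)


# Toy example: one prediction off by 1 among three observations
toy_value = cv_rmse_sketch([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
print(toy_value)
```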


## 3.2. Tested parameters

The chosen parameters are all the model parameters $\theta = (R_{concrete}, R_{ins}, C_{ins}, C_{concrete}, \alpha, \epsilon, R_{ext}, R_{int})$. They must be described using a list of dictionaries.
As seen in the previous comparison, the initial simulation results are inaccurate and predict temperatures that are too high. We should use the measurements to obtain optimal parameter values.

In [None]:
from corrai.base.parameter import Parameter

In [None]:
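high_conf = 0.25  # +/- 25 % around nominal values (high confidence in the parameters)
low_conf = 0.8  # +/- 80 % around nominal values (low confidence in the parameters)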

coef = high_conf

params = [
    {
        Parameter.NAME: 'R_concrete',
        Parameter.INTERVAL: ((1-coef)*0.109, (1+coef) * 0.109),
        Parameter.TYPE: "Real",
    },
    {
        Parameter.NAME: 'R_ins',
        Parameter.INTERVAL: ((1-coef) * 0.322, (1+coef)  * 0.322),
        Parameter.TYPE: "Real",
    },
    {
        Parameter.NAME: 'C_ins',
        Parameter.INTERVAL: ((1-coef) * 31850, (1+coef)  * 31850),
        Parameter.TYPE: "Real",
    },
    {
        Parameter.NAME: 'C_concrete',
        Parameter.INTERVAL: ((1-coef) * 2956800, (1+coef)  * 2956800),
        Parameter.TYPE: "Real",
    },

    {
        Parameter.NAME: 'alpha',
        Parameter.INTERVAL: (0.1,0.6),
        Parameter.TYPE: "Real",
    },
    {
        Parameter.NAME: 'epsilon',
        Parameter.INTERVAL: (0.2, 0.9),
        Parameter.TYPE: "Real",
    }, 
    {
        Parameter.NAME: 'R_ext',
        Parameter.INTERVAL: ((1-coef) * 0.04/S_wall, (1+coef)  * 0.04/S_wall),
        Parameter.TYPE: "Real",
    }, 
    {
        Parameter.NAME: 'R_int',
        Parameter.INTERVAL: ((1-coef) * 0.13/S_wall, (1+coef)  * 0.13/S_wall),
        Parameter.TYPE: "Real",
    }, 
]
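Most intervals above follow the same pattern, ±`coef` around a nominal value: `coef = high_conf` gives ±25 % and `coef = low_conf` would give ±80 %. A hypothetical helper makes this explicit; it returns plain tuples rather than corrai `Parameter` dictionaries so that it runs on its own.

```python
def relative_interval(nominal, coef):
    """(lower, upper) bounds at +/- coef (relative) around a nominal value."""
    return ((1 - coef) * nominal, (1 + coef) * nominal)


nominal_values = {"R_concrete": 0.109, "R_ins": 0.322, "C_ins": 31850, "C_concrete": 2956800}
intervals = {name: relative_interval(value, 0.25) for name, value in nominal_values.items()}
for name, (low, high) in intervals.items():
    print(f"{name:>10}: [{low:.4g}, {high:.4g}]")
```

`alpha` and `epsilon` are given absolute intervals instead, since they are bounded dimensionless coefficients.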

## 3.3. Problem description
We can now use a <code>SAnalysis</code> to set up the study, which requires a sensitivity method. 

*Note: for now, only <code>SOBOL</code>, <code>FAST</code>, <code>RBD_FAST</code>, 
and <code>MORRIS</code> methods are implemented.*

In [None]:
from corrai.sensitivity import SAnalysis, Method

### 3.3.A MORRIS method

Let's try a first screening using <code>MORRIS</code> method.

In [None]:
sa_study = SAnalysis(
    parameters_list=params,
    method=Method.MORRIS,
)

Then we draw a sample of parameters to be simulated. Each method has its own sampling scheme. See the SALib documentation for further explanation (https://salib.readthedocs.io/en/latest/index.html).

Note that:
- The convergence properties of the Sobol' sequence are only valid if the number of samples `N` is a power of 2 (`N = 2^n`), ideally with `N` <= `skip_values`.
- The convergence properties of the FAST method are only valid if the sample size satisfies N > 4M^2 (M = 4 by default).
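These conditions are easy to check before launching a study. The helpers below are hypothetical; the Morris design size assumes SALib's usual layout of `N * (D + 1)` rows for `N` trajectories and `D` parameters (without groups), and that corrai forwards `n` as SALib's `N`.

```python
def is_power_of_two(n):
    """True if n is a positive power of 2 (Sobol' sequence requirement)."""
    return n > 0 and (n & (n - 1)) == 0


def fast_sample_is_valid(n, m=4):
    """FAST condition N > 4 * M**2 (M = 4 by default)."""
    return n > 4 * m**2


def morris_sample_rows(n_trajectories, n_params):
    """Number of rows of a Morris design without groups: N * (D + 1)."""
    return n_trajectories * (n_params + 1)


print(is_power_of_two(2**8), is_power_of_two(100))  # Sobol' sample sizes
print(fast_sample_is_valid(64), fast_sample_is_valid(65))  # FAST with M = 4
print(morris_sample_rows(20, 8))  # 20 trajectories, 8 parameters
```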

In [None]:
sa_study.draw_sample(
    n=20, 
)

In [None]:
len(sa_study.sample)

The sample is available as a 2d array <code>sa_study.sample</code>. Lines are simulations
to run and columns are parameters values.

Let's run the simulations. **Careful: depending on your computer and on the number of samples, this can take a long time.**

In [None]:
import warnings

sa_study.evaluate(
    model = simu_PYTH, 
    simulation_options=simulation_options_PYTH,
)

We can plot all simulations in one graph and compare the simulated insulation temperature to the measured T_ins. The argument <code>show_legends</code> can be set to True to see the associated parameter values.

In [None]:
from corrai.sensitivity import plot_sample

In [None]:
plot_sample(
    sample_results=sa_study.sample_results,
    ref=simulation_df_resample["T_ins"],
    indicator="T_ins",
    show_legends=False,
    y_label="Temperature [K]",
    x_label="Time [s]",
)

This graph is instructive: at some moments, all simulations are far from the measurements. This shows that whatever the values of the parameters, the model still does not fit reality: this is due either to a measurement problem or to the modeling approach (physical inconsistency, physical phenomena not properly taken into account, etc.).

We can also look at the results using a parallel coordinate plot with <code>plot_pcp</code>, for all parameter values and one indicator. It displays how the variation of the parameters impacts one of the model outputs.

The chosen indicator (for instance the output T_wall_i.T, the calculated temperature in one of the hybrid wall layers) must be aggregated through the aggregation method argument (defaults to the mean).

Now that all simulations are run, we can analyze these results regarding an indicator with the method <code>analyze</code>. We can either choose an aggregation method on the indicator (for instance the average temperature over the simulation time range), or an aggregation function between predicted and measured temperatures.

In [None]:
from corrai.metrics import cv_rmse, nmbe

sa_study.analyze(
    indicator="T_ins",
    reference_df=simulation_df_resample["T_ins"],
    agg_method=cv_rmse,
)

We can now have a look at the sensitivity analysis results.
They are stored in <code>sensitivity_results</code>. It holds the output formatted
by <code>SALib</code>.

In [None]:
sa_study.sensitivity_results

Depending on the method used, the first-order or total-order indices can be summed. You can do it manually or use the method <code>calculate_sensitivity_indicators</code>.

In [None]:
sa_study.calculate_sensitivity_indicators()

For the <code>MORRIS</code> method, two indices are calculated from the elementary effects $E_{ij}$ of parameter $j$ along trajectory $i$ ($r$ trajectories in total): $\mu_j^*$, the mean of the absolute values of these effects, and $\sigma_j$, the standard deviation of these effects:

$$
\mu_{j}^{*} = \frac{1}{r} \sum_{i=1}^{r} \left| E_{ij} \right|
$$

$$
\sigma_{j} = \sqrt{\frac{1}{r-1} \sum_{i=1}^{r} (E_{ij} - \mu_{j})^2}, \quad \mu_{j} = \frac{1}{r} \sum_{i=1}^{r} E_{ij}
$$

The higher $\mu_{j}^{*}$, the more parameter $j$ contributes to the output uncertainty; the higher $\sigma_{j}$, the more pronounced the nonlinear or interaction effects involving parameter $j$. Plotting $\sigma_{j}$ against $\mu_{j}^{*}$ is often used to distinguish parameters with negligible, linear, and/or interaction effects.
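The two Morris indices can be computed directly from a matrix of elementary effects. This NumPy sketch assumes one row per trajectory and one column per parameter; SALib performs this step internally, so it only illustrates the formulas above.

```python
import numpy as np


def morris_indices(elementary_effects):
    """Return (mu_star, sigma) per parameter from an (r, D) array of elementary effects."""
    ee = np.asarray(elementary_effects, dtype=float)
    mu_star = np.mean(np.abs(ee), axis=0)  # mean of absolute effects
    sigma = np.std(ee, axis=0, ddof=1)  # standard deviation around the signed mean, 1/(r-1)
    return mu_star, sigma


# Two trajectories, two parameters: the effect of the second parameter changes sign
ee = np.array([[1.0, -2.0], [3.0, 2.0]])
mu_star, sigma = morris_indices(ee)
print(mu_star, sigma)
```

Both parameters get the same $\mu_j^*$, but the second one has a larger $\sigma_j$ because its effect changes sign from one trajectory to the other, which is the kind of nonlinear or interaction behaviour the scatter plot helps to spot.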

In [None]:
from corrai.sensitivity import plot_morris_scatter
# The analyzed indicator is a CV_RMSE, hence dimensionless
plot_morris_scatter(salib_res=sa_study.sensitivity_results, title='Elementary effects', unit='-', autosize=True)

Besides this visual interpretation, it is also possible to calculate the Euclidean distance to the origin, $d_j = \sqrt{\mu_j^{*2} + \sigma_j^2}$, to obtain the total effect of each uncertain parameter:

In [None]:
# from corrai.sensitivity import plot_morris_st_bar
# plot_morris_st_bar(sa_study.sensitivity_results)
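As an alternative to the commented-out plot, the distance can be computed from the Morris results. This sketch assumes a SALib-style dictionary with `names`, `mu_star` and `sigma` entries (keys returned by `SALib.analyze.morris.analyze`); the example values are made up for illustration.

```python
import numpy as np


def morris_distance_ranking(morris_res):
    """Rank parameters by d = sqrt(mu_star**2 + sigma**2), largest first."""
    mu_star = np.asarray(morris_res["mu_star"], dtype=float)
    sigma = np.asarray(morris_res["sigma"], dtype=float)
    distance = np.sqrt(mu_star**2 + sigma**2)
    order = np.argsort(distance)[::-1]  # indices sorted by decreasing distance
    return [(morris_res["names"][i], float(distance[i])) for i in order]


# Illustrative values only, not results from this tutorial
example = {"names": ["p1", "p2", "p3"], "mu_star": [3.0, 0.5, 1.0], "sigma": [4.0, 0.0, 1.0]}
ranking = morris_distance_ranking(example)
print(ranking)
```

In this notebook, `sa_study.sensitivity_results` could be passed instead of `example`, provided it exposes these keys.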

Five parameters seem to have more impact on the error than the others: $\alpha$, $R_{ins}$, $\epsilon$, $R_{ext}$, and $R_{concrete}$. 
Let's check whether this is consistent with the <code>SOBOL</code> method.

### 3.3.B SOBOL method

In [None]:
sa_study = SAnalysis(
    parameters_list=params,
    method=Method.SOBOL,
)

In [None]:
sa_study.draw_sample(
    n=2**8, 
)
len(sa_study.sample)

In [None]:
sa_study.evaluate(
    model = simu_PYTH, 
    simulation_options=simulation_options_PYTH,
)

In [None]:
sa_study.analyze(
    indicator="T_ins",
    reference_df=simulation_df_resample["T_ins"],
    agg_method=cv_rmse,
)

In [None]:
sa_study.calculate_sensitivity_indicators()

The sum of the first-order indices should be close to 1 (a sum well below 1 indicates interactions between parameters). Also, the mean confidence interval should be low. In that case, the results of the sensitivity analysis can be considered robust.
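These checks can be scripted. The sketch assumes SALib-style Sobol results with `S1`, `S1_conf` and `ST` entries (keys returned by `SALib.analyze.sobol.analyze`); the example numbers are made up for illustration.

```python
import numpy as np


def sobol_summary(sobol_res):
    """Sums of first- and total-order indices, and mean first-order confidence interval."""
    s1 = np.asarray(sobol_res["S1"], dtype=float)
    st = np.asarray(sobol_res["ST"], dtype=float)
    s1_conf = np.asarray(sobol_res["S1_conf"], dtype=float)
    return {
        "sum_S1": float(s1.sum()),  # close to 1 when interactions are small
        "sum_ST": float(st.sum()),  # in theory >= sum_S1; the gap reflects interactions
        "mean_S1_conf": float(s1_conf.mean()),  # should be small for robust results
    }


example = {"S1": [0.6, 0.3, 0.05], "ST": [0.62, 0.33, 0.06], "S1_conf": [0.02, 0.03, 0.01]}
summary = sobol_summary(example)
print(summary)
```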

In [None]:
from corrai.sensitivity import plot_sobol_st_bar
plot_sobol_st_bar(sa_study.sensitivity_results)

For analyses other than Sobol or Morris, you can plot results using: 

In [None]:
# import plotly.graph_objects as go

# # Retrieve results
# S1 = sa_study.sensitivity_results['S1']
# S1_conf = sa_study.sensitivity_results['S1_conf']
# names = sa_study.sensitivity_results['names']

# fig = go.Figure()

# fig.add_trace(go.Bar(
#     x=names,
#     y=S1,
#     error_y=dict(
#         type='data',
#         array=S1_conf,
#         visible=True
#     ),
#     marker=dict(color='orange')  
# ))

# fig.update_layout(
#     title='Sensitivity indices with error bars',
#     xaxis=dict(title='Parameter'),
#     yaxis=dict(title='Indices')
# )

# # Display the figure
# fig.show()


## 3.4 Conclusion on sensitivity analysis

The sensitivity analysis allows us to rank the influence of the uncertain parameters
on an indicator. 

The Sobol results are consistent with Morris: $\alpha$ is the most influential parameter, followed by $R_{ins}$, $\epsilon$, $R_{ext}$, and $R_{concrete}$.

In the following section, we will see how to use corrai to identify the
optimal values for these parameters in order to fit the measurement.

# 4. Identification
Now, we proceed to finding optimal values for these parameters by minimizing the coefficient of variation of root mean square error (cv_rmse) between one or several measured nodes, and one or several relevant outputs of our model.


### Step-by-Step Process

1. **Define the model and parameters**: define `OpaqueWallSimple` (already done) and specify the parameters to be identified. Each parameter includes a name, an interval of possible values, a type, and an initial value.

2. **Instantiate an objective function**: We create an instance of the `ObjectiveFunction` class, providing the model, simulation options, list of parameters, and indicators. The `scipy_obj_function` method of `ObjectiveFunction` will be used as the objective function for optimization. This method calculates the cv_rmse for the given parameter values.

3. **Perform optimization**: We use the `minimize_scalar` function from `scipy.optimize` to minimize the objective function. Different methods can be chosen for optimization, such as Brent, bounded, and golden.


## 4.1. Objective function
In parameter optimization, we aim to adjust certain model parameters to minimize the difference between simulated and observed data. The objective function is a scalar function that quantifies this difference. In this case, the CV_RMSE (Coefficient of Variation of Root Mean Square Error) is used as a measure of how well the model output matches the reference measurements.

**How the ObjectiveFunction class works**: the `ObjectiveFunction` class simplifies the optimization of model parameters by providing a structured way to:

- Run simulations: for a given set of parameters, the model is simulated over the input data.
- Calculate error metrics: the model output is compared to the reference measurements, and a scalar error metric (such as the CV_RMSE) is calculated.
- Optimize: the class can then be used with optimization algorithms (like `scipy.optimize` or `pymoo`) to adjust the model parameters in order to minimize the error metric.

To do so, the `ObjectiveFunction` class encapsulates the logic for simulation and indicator calculation. It takes a model, simulation options, a list of parameters to be calibrated, and a list of indicators as input, and provides methods to compute the objective function, which optimization routines can then use to find the optimal parameters.
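To make the idea concrete, here is a self-contained sketch of the same pattern with a toy model: a parameter value goes in, the model is simulated, and a scalar error comes out for `scipy.optimize.minimize_scalar`. The `toy_model` and `make_objective` names are hypothetical, and this is not the `ObjectiveFunction` implementation.

```python
import numpy as np
from scipy.optimize import minimize_scalar


def toy_model(alpha, pyr, t_ext):
    """Hypothetical model: T_ext plus a gain proportional to the absorbed radiation."""
    return t_ext + 0.05 * alpha * pyr


def toy_cv_rmse(y_obs, y_pred):
    """RMSE divided by the mean of the observations."""
    return np.sqrt(np.mean((y_obs - y_pred) ** 2)) / np.mean(y_obs)


def make_objective(pyr, t_ext, t_measured):
    """Wrap 'simulate, then compare to measurements' into a scalar function of one parameter."""
    def objective(alpha):
        return toy_cv_rmse(t_measured, toy_model(alpha, pyr, t_ext))
    return objective


# Synthetic "measurements" generated with alpha = 0.3
toy_pyr = np.linspace(0.0, 800.0, 50)
toy_t_ext = np.full(50, 290.0)
toy_measured = toy_model(0.3, toy_pyr, toy_t_ext)

toy_objective = make_objective(toy_pyr, toy_t_ext, toy_measured)
toy_result = minimize_scalar(toy_objective, bounds=(0.1, 0.6), method="bounded")
print(f"alpha = {toy_result.x:.4f}, CV_RMSE = {toy_result.fun:.2e}")
```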

### Attributes

- **model**: The model to be calibrated.
- **simulation_options**: A dictionary containing simulation options, including input data.
- **param_list**: A list of dictionaries specifying the parameters to be calibrated.
- **indicators**: A list of indicators to be used in the objective function.
- **agg_methods_dict**: A dictionary specifying aggregation methods for each indicator (optional).
- **reference_dict**: A dictionary mapping indicators to reference columns in `reference_df` (optional).
- **reference_df**: A DataFrame containing reference data for the indicators (optional).
- **custom_ind_dict**: A dictionary for custom indicators (optional).

Before using optimization functions, we need to instantiate `ObjectiveFunction` with the appropriate parameters and indicators.
Let's import `ObjectiveFunction` first.


In [None]:
from corrai.sensitivity import ObjectiveFunction

Then define the analysis date range (here, the whole resampled simulation period):

In [None]:
feat_train = simulation_df_resample

## 4.2. One-dimensional optimization problem

First, we can try scalar function optimization from SciPy (methods: Brent, bounded, golden, etc.; see the SciPy documentation).
With SciPy, the objective function is minimized:
- Here we choose as indicator the temperature calculated and measured within the wall insulation. Note that this could be another node (a heat flux density, another temperature node).
- The identified parameter is $\alpha$.

Let's define a reference dictionary mapping each model output (indicator) to the measured column used for the CV_RMSE calculation. (Here both have the same name, which can be confusing.)

In [None]:
reference_dict = {
    "T_ins": "T_ins",
}

An aggregation method for this node: 

In [None]:
from corrai.metrics import cv_rmse

agg_methods_dict = {
    "T_ins": cv_rmse,
}

We also need to set which parameter should be calibrated.

In [None]:
calibration_params = [
    {
        Parameter.NAME: 'alpha',
        Parameter.INTERVAL: (0.1,0.6),
        Parameter.TYPE: "Real",
    },
]

We can now instantiate an objective function using `ObjectiveFunction`.

In [None]:
obj_func = ObjectiveFunction(
    model=simu_PYTH,
    simulation_options=simulation_options_PYTH,
    param_list=calibration_params,
    agg_methods_dict=agg_methods_dict,
    indicators=["T_ins"],
    reference_df=feat_train,
    reference_dict=reference_dict,
)

The `minimize_scalar` function in `scipy.optimize` is used for scalar function minimization, specifically for one-dimensional optimization problems. This function finds the minimum value of a scalar function over a specified interval. The main methods are `Brent`, `bounded`, and `golden`.

The function returns an optimization result object that contains information about the optimization process and the final solution.

- **Brent**: uses Brent's algorithm, which combines parabolic interpolation with the golden section search. This method does not require interval bounds.
- **Golden**: employs the golden section search method, which reduces the interval of uncertainty using the golden ratio. Simple and reliable for unimodal functions, but may be slower than Brent's method.
- **Bounded**: restricts the search to the specified bounds using a combination of golden section search and parabolic interpolation. It ensures that the solution remains within the given bounds, making it well suited to constrained problems.
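The practical difference between these methods shows up when the unconstrained minimum lies outside the interval of interest. In this toy example (a hypothetical quadratic with its minimum at -1), `brent` and `golden` treat the two values as a starting bracket and move to the unconstrained minimum, while `bounded` stays inside [0.1, 0.6].

```python
from scipy.optimize import minimize_scalar


def toy_quadratic(x):
    """Toy objective with its unconstrained minimum at x = -1."""
    return (x + 1.0) ** 2


res_brent = minimize_scalar(toy_quadratic, bracket=(0.1, 0.6), method="brent")
res_golden = minimize_scalar(toy_quadratic, bracket=(0.1, 0.6), method="golden")
res_bounded = minimize_scalar(toy_quadratic, bounds=(0.1, 0.6), method="bounded")

for name, res in [("brent", res_brent), ("golden", res_golden), ("bounded", res_bounded)]:
    print(f"{name:>8}: x = {res.x:.4f}")
```

This is why `bounded` is used with `obj_func.bounds` in the cell below: the identified value of `alpha` stays within the interval defined in `calibration_params`.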

In [None]:
from scipy.optimize import minimize_scalar

In [None]:
import warnings
warnings.filterwarnings('ignore', category=RuntimeWarning)

result = minimize_scalar(
    obj_func.scipy_obj_function, 
    bounds=obj_func.bounds,
    method="Bounded"
)

result

A solution is found with a value of 0.12 for alpha and 0.36 for the CV_RMSE. Let's check if the parameter value is close to the boundaries.

In [None]:
obj_func.bounds

It is close to the lower bound (0.1) but not at the limit. Let's now run the simulation using this parameter value and compare it with the initial simulation.

In [None]:
parameter_names = [param[Parameter.NAME] for param in calibration_params]
# minimize_scalar returns a single scalar optimum, mapped to the (single) calibrated parameter
parameter_dict1 = {param_name: result.x for param_name in parameter_names}

result_optim = simu_PYTH.simulate(
    parameter_dict=parameter_dict1, 
    simulation_options=simulation_options_PYTH
)

In [None]:
import plotly.graph_objects as go

fig = go.Figure()

fig.add_trace(go.Scatter(
    x=init_res_PYTH.index,
    y=simulation_df_resample["T_ins"],
    fill=None,
    mode='lines',
    line_color='green',
    name="T_insulation - Measurement"
))

fig.add_trace(go.Scatter(
    x=init_res_PYTH.index,
    y=init_res_PYTH["T_ins"],
    fill=None,
    mode='lines',
    line_color='orange',
    name="T_insulation - Initial results"
))

fig.add_trace(go.Scatter(
    x=init_res_PYTH.index,
    y=result_optim["T_ins"],
    fill=None,
    mode='lines',
    line_color='brown',
    name="T_insulation - Optimization results"
))


fig.update_layout(
    title='Optimization vs. Measurement ',
    xaxis_title='Date',
    yaxis_title='Temperature [K]')

fig.show()

The optimized results are closer to the measurements, but a noticeable discrepancy remains.

## 4.3. Multi-objective and multi-parameter optimization

Let's use Pymoo, which is integrated into the `MyProblem` class of CorrAI.
Note that in Pymoo as well, each objective function is minimized, and each constraint must be expressed in the form ≤ 0.
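The two conventions can be sketched in plain Python: a quantity to maximize is negated, and a constraint written as `lhs >= rhs` or `lhs <= rhs` is turned into a value that is satisfied when it is ≤ 0. The helper names and the example constraint on `R_ins` are hypothetical; they only illustrate the sign conventions, not the CorrAI or Pymoo API.

```python
def objective_for_pymoo(value, sense="min"):
    """Return the objective value in Pymoo's convention (always minimized)."""
    if sense == "min":
        return value
    if sense == "max":
        return -value  # maximizing f is the same as minimizing -f
    raise ValueError(f"unknown sense: {sense!r}")


def constraint_for_pymoo(lhs, op, rhs):
    """Return g such that the constraint `lhs op rhs` holds exactly when g <= 0."""
    if op == "<=":
        return lhs - rhs
    if op == ">=":
        return rhs - lhs
    raise ValueError(f"unknown operator: {op!r}")


# Hypothetical examples: maximize a goodness-of-fit score of 0.8,
# and require R_ins >= 0.2 for two candidate values
print(objective_for_pymoo(0.8, sense="max"))      # -0.8
print(constraint_for_pymoo(0.3, ">=", 0.2) <= 0)  # True  (satisfied)
print(constraint_for_pymoo(0.1, ">=", 0.2) <= 0)  # False (violated)
```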

For multiple objectives and new parameters, we need to redefine the objective function.
Here, we add the interface temperature (between the insulation and concrete panels) as a second observation.

In [None]:
from corrai.metrics import cv_rmse

agg_methods_dict = {
    "T_ins": cv_rmse,
    "T_interface": cv_rmse,
}
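The CV(RMSE) scores each indicator as the root mean square error divided by the mean of the measurements. A minimal NumPy sketch of this common definition follows; the function name `cv_rmse_sketch` and its `(y_pred, y_true)` argument order are assumptions, and CorrAI's `cv_rmse` may differ in details such as a degrees-of-freedom correction.

```python
import numpy as np


def cv_rmse_sketch(y_pred, y_true):
    """CV(RMSE): root mean square error divided by the mean of the measured values."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))
    return rmse / np.mean(y_true)


measured = [20.0, 22.0, 24.0]   # hypothetical temperatures in degC
simulated = [21.0, 21.0, 25.0]  # errors of +1, -1, +1 -> RMSE = 1
score = cv_rmse_sketch(simulated, measured)
print(round(score, 4))  # 0.0455 (1 / 22)

# Same errors, temperatures expressed in kelvin
score_K = cv_rmse_sketch(np.add(simulated, 273.15), np.add(measured, 273.15))
print(round(score_K, 4))
```

Because the error is divided by the mean measured value, the score depends on the temperature unit: the same error yields a much smaller CV(RMSE) in kelvin than in degrees Celsius, so scores should only be compared in a consistent unit.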

In [None]:
reference_dict = {
    "T_ins": "T_ins",
    "T_interface": "T_interface"
}

### 4.3.1. Few parameters

In [None]:
coef = 0.5

calibration_params = [
    {
        Parameter.NAME: 'R_concrete',
        Parameter.INTERVAL: ((1-coef)*0.109, (1+coef) * 0.109),
        Parameter.TYPE: "Real",
    },
    {
        Parameter.NAME: 'R_ins',
        Parameter.INTERVAL: ((1-coef) * 0.322, (1+coef)  * 0.322),
        Parameter.TYPE: "Real",
    },
    {
        Parameter.NAME: 'R_ext',
        Parameter.INTERVAL: ((1-coef) * 0.04/S_wall, (1+coef)  * 0.04/S_wall),
        Parameter.TYPE: "Real",
    },  

    {
        Parameter.NAME: 'alpha',
        Parameter.INTERVAL: (0.1,0.3), ## shorter interval, as 0.12 was found with scipy
        Parameter.TYPE: "Real",
    },
]

In [None]:
obj_func = ObjectiveFunction(
    model=simu_PYTH,
    simulation_options=simulation_options_PYTH,
    param_list=calibration_params,
    agg_methods_dict=agg_methods_dict,
    indicators=["T_ins", "T_interface"],
    reference_df=feat_train,
    reference_dict=reference_dict,
)

Now let's instantiate `MyProblem`. For mixed parameters (integer, binary, choice between specific values, ...), you can use `MyMixedProblem` instead; such parameters must be declared in `calibration_params` with the appropriate `Parameter.TYPE`.

In [None]:
from corrai.multi_optimize import MyProblem 

problem = MyProblem(
    parameters=calibration_params,
    obj_func_list=[obj_func],
    func_list=[],
    function_names=["T_ins", "T_interface"], #measurement
    constraint_names=[],
)

For this two-objective problem, we choose **NSGA2**, a well-known multi-objective optimization algorithm based on non-dominated sorting and crowding distance.
The full list of algorithms available in Pymoo is given at https://pymoo.org/algorithms/list.html#nb-algorithms-list.
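The two ingredients of NSGA2 mentioned above can be sketched with NumPy: sorting the population into successive Pareto fronts, and ranking the points of a front by crowding distance (boundary points get an infinite distance so they are always kept). This is a simplified illustration written for readability, not Pymoo's implementation, and the objective values in `F_example` are made up.

```python
import numpy as np


def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return bool(np.all(a <= b) and np.any(a < b))


def non_dominated_fronts(F):
    """Split the rows of F (n_points x n_obj) into successive Pareto fronts.

    Straightforward version for readability; NSGA2 uses a faster bookkeeping
    scheme that yields the same fronts.
    """
    remaining = list(range(len(F)))
    fronts = []
    while remaining:
        # points of `remaining` that no other remaining point dominates
        front = [i for i in remaining
                 if not any(dominates(F[j], F[i]) for j in remaining if j != i)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts


def crowding_distance(F_front):
    """Crowding distance of each point in one front (larger = more isolated)."""
    n, n_obj = F_front.shape
    if n <= 2:
        return np.full(n, np.inf)
    dist = np.zeros(n)
    for k in range(n_obj):
        order = np.argsort(F_front[:, k])
        f_sorted = F_front[order, k]
        # boundary points of each objective are always kept
        dist[order[0]] = np.inf
        dist[order[-1]] = np.inf
        span = f_sorted[-1] - f_sorted[0]
        if span == 0:
            continue
        # normalized gap between the two neighbours of each interior point
        dist[order[1:-1]] += (f_sorted[2:] - f_sorted[:-2]) / span
    return dist


F_example = np.array([
    [0.1, 0.9],
    [0.5, 0.5],
    [0.9, 0.1],
    [0.6, 0.6],
    [1.0, 1.0],
])
fronts = non_dominated_fronts(F_example)
print(fronts)  # [[0, 1, 2], [3], [4]]
print(crowding_distance(F_example[fronts[0]]))
```

In NSGA2, survivors are taken front by front, and the last front that does not fit entirely is truncated by keeping the points with the largest crowding distance, which preserves diversity along the front.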

With `verbose=True`, progress is printed during the algorithm's execution; the printout varies from algorithm to algorithm. Here, we execute NSGA2 on a problem where pymoo has no knowledge about the optimum. Each line represents one iteration. The first two columns are the current generation counter and the number of evaluations so far. For constrained problems, the next two columns show the minimum constraint violation (`cv (min)`) and the average constraint violation (`cv (avg)`) in the current population. These are followed by the number of non-dominated solutions (`n_nds`) and two more metrics that represent the movement in the objective space.

In [None]:
from pymoo.algorithms.moo.nsga2 import NSGA2
from pymoo.optimize import minimize
from pymoo.termination import get_termination
from pymoo.operators.crossover.sbx import SBX
from pymoo.operators.mutation.pm import PM
from pymoo.operators.sampling.rnd import FloatRandomSampling  # optional, see commented line below

In [None]:
algorithm = NSGA2(
    pop_size=50,
    #n_offsprings=10,
    #sampling=FloatRandomSampling(),
    crossover=SBX(prob=0.9, eta=15),
    mutation=PM(eta=20),
    eliminate_duplicates=True
)

Here, we run 15 generations with a population of 50.
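Since each candidate evaluation requires a model simulation, it is worth estimating the computational budget before launching the optimization. The hypothetical helper below assumes that the first generation evaluates the initial population and that every subsequent generation evaluates `n_offsprings` new individuals, defaulting to `pop_size` (our understanding of Pymoo's genetic algorithms); the `n_eval` column of the verbose output gives the actual count.

```python
def ga_evaluation_budget(pop_size, n_gen, n_offsprings=None):
    """Expected number of objective evaluations (i.e. model simulations) for a GA run.

    Assumes generation 1 evaluates the initial population and each later
    generation evaluates `n_offsprings` new individuals (default: pop_size).
    """
    if n_offsprings is None:
        n_offsprings = pop_size
    return pop_size + (n_gen - 1) * n_offsprings


print(ga_evaluation_budget(pop_size=50, n_gen=15))  # 750
print(ga_evaluation_budget(pop_size=50, n_gen=15, n_offsprings=10))  # 190
```

Uncommenting `n_offsprings=10` in the algorithm definition would therefore reduce the cost per generation, at the price of a slower renewal of the population.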

In [None]:
termination = get_termination("n_gen", 15)

res = minimize(problem,
               algorithm,
               termination,
               seed=42,
               verbose=True)

print("Best solution found: \nX = %s\nF = %s" % (res.X, res.F))

Let's visualize the objective function values of the solutions found (the approximated Pareto front).

In [None]:
from pymoo.visualization.scatter import Scatter
Scatter().add(res.F).show()

For a bi-objective problem, to help us choose the best set of parameter values, we then use the decomposition method called Augmented Scalarization Function (ASF), a well-known metric in the multi-objective optimization literature.
The objectives are first normalized with the approximate ideal and nadir points; we then assume that both objectives are equally important by setting the weights to 0.5 and 0.5.
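The normalization and weighted selection can be written out in plain NumPy. The sketch below scores each normalized point by its largest weighted objective, a Chebyshev-style achievement scalarization; our reading is that `ASF().do(nF, 1/weights)` produces the same ranking when the reference point is the origin, but this is an assumption, and `select_compromise` is a hypothetical helper, not part of Pymoo.

```python
import numpy as np


def select_compromise(F, weights):
    """Index of the row of F (n_points x n_obj, all minimized) with the best weighted score.

    Objectives are normalized to [0, 1] with the approximate ideal (column minimum)
    and nadir (column maximum) points; each point is then scored by
    max_k(weights[k] * nF[k]) and the lowest score is selected.
    """
    F = np.asarray(F, dtype=float)
    weights = np.asarray(weights, dtype=float)
    ideal, nadir = F.min(axis=0), F.max(axis=0)
    span = np.where(nadir > ideal, nadir - ideal, 1.0)  # guard against a flat objective
    nF = (F - ideal) / span
    scores = (nF * weights).max(axis=1)
    return int(scores.argmin())


# Hypothetical front: columns are the T_ins and T_interface objectives
F_front = np.array([
    [0.30, 0.10],
    [0.20, 0.20],
    [0.10, 0.40],
])
print(select_compromise(F_front, [0.5, 0.5]))  # 1: balanced trade-off
print(select_compromise(F_front, [0.9, 0.1]))  # 2: best on the first objective
```

With equal weights the balanced middle point is chosen, whereas a 0.9/0.1 weighting, as used later for the all-parameter identification, shifts the choice toward the point with the lowest first objective.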

In [None]:
from pymoo.decomposition.asf import ASF

F = res.F
# Normalize objectives with the approximate ideal (best) and nadir (worst) points
approx_ideal = F.min(axis=0)
approx_nadir = F.max(axis=0)
nF = (F - approx_ideal) / (approx_nadir - approx_ideal)

# Equal importance for T_ins and T_interface
weights = np.array([0.5, 0.5])
decomp = ASF()

# Index of the solution with the lowest ASF value
i = decomp.do(nF, 1/weights).argmin()

parameter_names = [param[Parameter.NAME] for param in calibration_params]
parameter_dict = {param_name: res.X[i][j] for j, param_name in enumerate(parameter_names)}

print(
    "Best regarding ASF: Point \ni = %s\nF = %s" % (i,  F[i]),
    parameter_dict
)


The estimated parameters seem consistent with our expectations. We can compare the measured insulation and interface temperature profiles with the outputs that the model predicts given the identified optimal parameters.

In [None]:
result_optim = simu_PYTH.simulate(
    parameter_dict=parameter_dict, 
    simulation_options=simulation_options_PYTH,
)

In [None]:
import plotly.graph_objects as go

fig = go.Figure()

fig.add_trace(go.Scatter(
    x=init_res_PYTH.index,
    y=simulation_df_resample["T_ins"],
    fill=None,
    mode='lines',
    line_color='green',
    name="T_insulation - Measurement"
))

fig.add_trace(go.Scatter(
    x=init_res_PYTH.index,
    y=init_res_PYTH["T_ins"],
    fill=None,
    mode='lines',
    line_color='orange',
    name="T_insulation - Initial results"
))


fig.add_trace(go.Scatter(
    x=init_res_PYTH.index,
    y=result_optim["T_ins"],
    fill=None,
    mode='lines',
    line_color='brown',
    name="T_insulation - Optimization results"
))


fig.update_layout(
    title='Optimization vs. Measurement ',
    xaxis_title='Date',
    yaxis_title='Temperature [K]')

fig.show()

In [None]:
import plotly.graph_objects as go

fig = go.Figure()


fig.add_trace(go.Scatter(
    x=init_res_PYTH.index,
    y=simulation_df_resample["T_interface"],
    fill=None,
    mode='lines',
    line_color='green',
    name="T_interface - Measurement"
))

fig.add_trace(go.Scatter(
    x=init_res_PYTH.index,
    y=init_res_PYTH["T_interface"],
    fill=None,
    mode='lines',
    line_color='orange',
    name="T_interface - Initial results"
))

fig.add_trace(go.Scatter(
    x=init_res_PYTH.index,
    y=result_optim["T_interface"],
    fill=None,
    mode='lines',
    line_color='brown',
    name="T_interface - Optimization results"
))

fig.update_layout(
    title='Optimization vs. Measurement ',
    xaxis_title='Date',
    yaxis_title='Temperature [K]')

fig.show()

### 4.3.2. All parameters

We can try to identify all parameters at once, with wider intervals (±99% around the nominal values of the resistances and capacities).
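The intervals are built as `((1 - coef) * nominal, (1 + coef) * nominal)`. The hypothetical helper below reproduces this arithmetic and shows how much wider the search space becomes when `coef` goes from 0.5 to 0.99, using the nominal `R_ins` value of 0.322. The check that `coef < 1` is our own addition, to keep a strictly positive lower bound for physical quantities such as resistances and capacities.

```python
def relative_interval(nominal, coef):
    """Search interval ((1 - coef) * nominal, (1 + coef) * nominal) around a nominal value."""
    if not 0 <= coef < 1:
        # our own guard: keeps the lower bound strictly positive
        raise ValueError("coef must be in [0, 1)")
    return ((1 - coef) * nominal, (1 + coef) * nominal)


for coef in (0.5, 0.99):
    low, high = relative_interval(0.322, coef)  # nominal R_ins value
    print(f"coef={coef}: ({low:.5f}, {high:.5f}), ratio high/low = {high / low:.0f}")
```

With `coef = 0.99`, each bound spans from 1% to 199% of the nominal value, so the upper bound is about 199 times the lower one, compared with a factor of 3 for `coef = 0.5`.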

In [None]:
coef = 0.99

calibration_params = [
    {
        Parameter.NAME: 'R_concrete',
        Parameter.INTERVAL: ((1-coef)*0.109, (1+coef) * 0.109),
        Parameter.TYPE: "Real",
    },
    {
        Parameter.NAME: 'R_ins',
        Parameter.INTERVAL: ((1-coef) * 0.322, (1+coef)  * 0.322),
        Parameter.TYPE: "Real",
    },
    {
        Parameter.NAME: 'C_ins',
        Parameter.INTERVAL: ((1-coef) * 31850, (1+coef)  * 31850),
        Parameter.TYPE: "Real",
    },
    {
        Parameter.NAME: 'C_concrete',
        Parameter.INTERVAL: ((1-coef) * 2956800, (1+coef)  * 2956800),
        Parameter.TYPE: "Real",
    },

    {
        Parameter.NAME: 'alpha',
        Parameter.INTERVAL: (0.1,0.6),
        Parameter.TYPE: "Real",
    },
    {
        Parameter.NAME: 'epsilon',
        Parameter.INTERVAL: (0.2, 0.9),
        Parameter.TYPE: "Real",
    }, 
    {
        Parameter.NAME: 'R_ext',
        Parameter.INTERVAL: ((1-coef) * 0.04/S_wall, (1+coef)  * 0.04/S_wall),
        Parameter.TYPE: "Real",
    }, 
    {
        Parameter.NAME: 'R_int',
        Parameter.INTERVAL: ((1-coef) * 0.13/S_wall, (1+coef)  * 0.13/S_wall),
        Parameter.TYPE: "Real",
    }, 
]

obj_func = ObjectiveFunction(
    model=simu_PYTH,
    simulation_options=simulation_options_PYTH,
    param_list=calibration_params,
    agg_methods_dict=agg_methods_dict,
    indicators=["T_ins", "T_interface"],
    reference_df=feat_train,
    reference_dict=reference_dict,
)


problem = MyProblem(
    parameters=calibration_params,
    obj_func_list=[obj_func],
    func_list=[],
    function_names=["T_ins", "T_interface"], #measurement
    constraint_names=[],
)

algorithm = NSGA2(
    pop_size=50,
    #n_offsprings=10,
    #sampling=FloatRandomSampling(),
    crossover=SBX(prob=0.9, eta=15),
    mutation=PM(eta=20),
    eliminate_duplicates=True
)

termination = get_termination("n_gen", 15)

res = minimize(problem,
               algorithm,
               termination,
               seed=42,  # same seed as the previous run, for reproducibility
               verbose=True)

print("Best solution found: \nX = %s\nF = %s" % (res.X, res.F))

In [None]:
from pymoo.visualization.scatter import Scatter
Scatter().add(res.F).show()

In [None]:
from pymoo.decomposition.asf import ASF
F = res.F
approx_ideal = F.min(axis=0)
approx_nadir = F.max(axis=0)
nF = (F - approx_ideal) / (approx_nadir - approx_ideal)

# More weight on the T_ins objective (0.9) than on T_interface (0.1)
weights = np.array([0.9, 0.1])
decomp = ASF()

i = decomp.do(nF, 1/weights).argmin()

parameter_names = [param[Parameter.NAME] for param in calibration_params]
parameter_dict2 = {param_name: res.X[i][j] for j, param_name in enumerate(parameter_names)}

print("Best regarding ASF: Point \ni = %s\nF = %s" % (i, F[i]),
      parameter_dict2
     )

In [None]:
result_optim2 = simu_PYTH.simulate(
    parameter_dict=parameter_dict2, 
    simulation_options=simulation_options_PYTH,
)

In [None]:
import plotly.graph_objects as go

fig = go.Figure()

fig.add_trace(go.Scatter(
    x=init_res_PYTH.index,
    y=simulation_df_resample["T_ins"],
    fill=None,
    mode='lines',
    line_color='green',
    name="T_insulation - Measurement"
))

fig.add_trace(go.Scatter(
    x=init_res_PYTH.index,
    y=init_res_PYTH["T_ins"],
    fill=None,
    mode='lines',
    line_color='orange',
    name="T_insulation - Initial results"
))


fig.add_trace(go.Scatter(
    x=init_res_PYTH.index,
    y=result_optim["T_ins"],
    fill=None,
    mode='lines',
    line_color='brown',
    name="T_insulation - Optimization results 1"
))

fig.add_trace(go.Scatter(
    x=init_res_PYTH.index,
    y=result_optim2["T_ins"],
    fill=None,
    mode='lines',
    line_color='black',
    name="T_insulation - Optimization results 2"
))

fig.update_layout(
    title='Optimization vs. Measurement ',
    xaxis_title='Date',
    yaxis_title='Temperature [K]')

fig.show()

### Validation set
An important step is to check the identified parameters on a validation set, i.e. a period that was not used for calibration. Let's simulate a new period using the last identified values.
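When a time column in seconds is built before `resample('5min').mean()`, it is averaged like any other column and ends up in the middle of each bin rather than at the bin label. The pandas sketch below shows the difference on hypothetical 1-minute data; the elapsed-seconds computation is only a stand-in for `datetime_to_seconds`, whose exact time origin we do not assume.

```python
import pandas as pd

# One hour of hypothetical 1-minute data
index = pd.date_range("2024-09-09 00:00", periods=60, freq="1min")
df = pd.DataFrame({"T_ins": 20.0}, index=index)

# Seconds elapsed since the first timestamp (stand-in for datetime_to_seconds)
elapsed = (df.index - df.index[0]).total_seconds().to_numpy()

# Option A: build the time column first, then average it with the data
averaged = df.assign(time_sec=elapsed).resample("5min").mean()

# Option B: resample first, then derive the time column from the new index
recomputed = df.resample("5min").mean()
recomputed["time_sec"] = (recomputed.index - recomputed.index[0]).total_seconds().to_numpy()

print(averaged["time_sec"].iloc[0], recomputed["time_sec"].iloc[0])  # 120.0 0.0
```

Ordering the steps as "resample, then compute seconds" keeps the time column aligned with the timestamps used for `startTime` and `endTime`.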

In [None]:
validation_set = reference_df.loc["2024-09-09 00:00":"2024-09-14 00:00"]

# Resample first, then derive the time column from the resampled index,
# so that time_sec matches each 5-min timestamp instead of being averaged
validation_set = validation_set.resample('5min').mean()
second_index = datetime_to_seconds(validation_set.index)
validation_set["time_sec"] = second_index

new_simulation_options_PYTH={
    "dataframe":validation_set,
    "startTime": second_index[0],
    "endTime": second_index[-1],  
}

In [None]:
validation_results = simu_PYTH.simulate(
    parameter_dict=parameter_dict2,
    simulation_options=new_simulation_options_PYTH,
)

In [None]:
import plotly.graph_objects as go

fig = go.Figure()

fig.add_trace(go.Scatter(
    x=validation_results.index,
    y=validation_results["T_ins"] ,
    fill=None,
    mode='lines',
    line_color='orange',
    name="T_insulation - Validation result"
))

fig.add_trace(go.Scatter(
    x=validation_results.index,
    y=validation_set["T_ins"],
    fill=None,
    mode='lines',
    line_color='green',
    name="T_insulation - Measurement"
))

fig.update_layout(
    title='Simulation vs. Measurement ',
    xaxis_title='Date',
    yaxis_title='Temperature [K]')

fig.show()

In [None]:
cv_rmse(
    validation_results["T_ins"],
    validation_set["T_ins"]
)