In [11]:
from pathlib import Path

***Notebooks are written for Jupyter and might not display well in GitLab***


# Tutorial: material property identification using Modelitool

The aim of this tutorial is to provide a complete workflow from measured data treatment to material physical properties identification using **Modelitool** and **OpenModelica**.

### _The use case is an example and the validity of the physical model or of the scientific approach is not discussed here_

## Use case presentation

The objective of the study is to identify the thermal conductivity of the insulation material "ETICS"
using an experimental setup.

A full-scale test bench is used. The Nobatek **BEF** (Banc d'Essais Façade)
 provides experimental cells to test building façade solutions. Heat exchanges
 are limited on 5 of a cell's faces. The 6th face is dedicated to the tested solution.
  Internal temperature and hygrometry conditions can be controlled or monitored.
External conditions are measured (temperatures and solar radiation).

The experimental setup is presented in the following pictures:

| Figure 1: picture of the test bench | Figure 2: wall layers from the inside (right) to the outside (left) |
| :---: | :---: |
|<img src="images/etics_pict.png"  height="300"> | <img src="images/etics_sch.png"  height="300"> |

The stars represent the sensor positions. Two sensors are installed at each position, providing two measurements.

- The measurement campaign spans from 07/06/2017 to 20/06/2017
- The acquisition timestep is probably 1 min

## Identification framework
The following framework is proposed to identify the ETICS thermal conductivity:
- Measured data analysis and correction
- Physical model description using OpenModelica
- Sensitivity analysis to identify the influence of material properties on the discrepancy between model outputs and measured phenomena
- ETICS thermal conductivity identification using an optimization algorithm

All these steps are addressed using the **Modelitool** library.

# Measured data analysis and correction

Measured data are loaded using the <code>pandas</code> Python library.

In [None]:
import pandas as pd

In [None]:
raw_data = pd.read_csv(
    Path(r"ressources/tuto_data.csv"),
    sep=",",
    index_col=0,
    parse_dates=True
)

Plotting the raw temperatures gives valuable information on the dataset.

In [None]:
# raw_data[raw_data.columns[:3]].plot()
raw_data['T_ext'].plot()

At first sight, the dataset looks OK. That doesn't mean it is clean: missing values or incorrect variations are not always visible on a graph.
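A quick numerical check can complement the plot. The sketch below uses a small made-up series (the name `T_demo` and its values are illustrative, not the tutorial data) to show how plain pandas can report the share of missing values and the largest jump between consecutive samples.

```python
import numpy as np
import pandas as pd

# Hypothetical 1-minute series with two missing values (not the tutorial data)
index = pd.date_range("2018-03-22", periods=10, freq="min")
demo = pd.DataFrame(
    {"T_demo": [20.0, 20.1, np.nan, 20.3, 20.4, np.nan, 20.6, 20.7, 20.8, 20.9]},
    index=index,
)

# Share of missing values per column, in percent
percent_missing = demo.isna().mean() * 100

# Largest absolute change between two consecutive valid samples
max_step = demo["T_demo"].diff().abs().max()

print(percent_missing)
print(max_step)
```

The same two lines could be applied to `raw_data` to get a first impression before configuring any correction.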

To be fed to the physical model (and to get results that are as accurate as possible), data must be processed and corrected. The following steps are proposed.

#### 1- Identify anomalies:
- __upper__ and __lower__ values as boundaries. Measured values outside this interval are considered wrong
- upper and lower "__rates__". Measured values whose rate of change is above the upper rate or below the lower rate are considered wrong

These boundaries are set depending on the measured physical phenomenon.
Of course power and temperature will be configured differently.
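To make the two rules concrete, here is a minimal pandas sketch on made-up values. It is only an illustration of the idea, not Modelitool's implementation: in particular, the rate is assumed here to be the absolute change per timestep, and only the upper rate is applied.

```python
import pandas as pd

# Hypothetical temperature samples: one out-of-range value and one sudden jump
index = pd.date_range("2018-03-22", periods=6, freq="min")
temp = pd.Series([20.0, 20.5, 150.0, 21.0, 27.0, 21.5], index=index)

upper, lower = 100, -20   # boundaries, same values as the tutorial configuration
upper_rate = 2            # assumed unit: maximum absolute change per timestep

# Rule 1: values outside [lower, upper] become NaN
in_bounds = temp.where((temp >= lower) & (temp <= upper))

# Rule 2: values reached through a change larger than upper_rate become NaN.
# Note that the sample following a spike is flagged too, because the jump
# back to normal also exceeds the threshold.
step = in_bounds.diff().abs()
cleaned = in_bounds.mask(step > upper_rate)

print(cleaned)
```

Invalid values are replaced by `NaN` rather than dropped, so the time index stays regular for the interpolation step.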

#### 2- Missing data interpolation
Physical models don't like missing values, so for each sensor we provide a method
to interpolate wrong data. Here we choose a linear interpolation between missing points.
 Errors at the beginning or at the end of the time series are filled with the first or last correct value.
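The filling strategy described above can be sketched with plain pandas. The series is made up, and the chain of calls is an assumption about what the `"linear_interpolation"`, `"bfill"`, `"ffill"` options amount to, not Modelitool's actual code.

```python
import numpy as np
import pandas as pd

# Hypothetical series with gaps at the start, in the middle and at the end
index = pd.date_range("2018-03-22", periods=6, freq="min")
series = pd.Series([np.nan, 20.0, np.nan, np.nan, 23.0, np.nan], index=index)

filled = (
    series
    .interpolate(method="linear", limit_area="inside")  # gaps between valid points
    .bfill()                                            # leading gap: first valid value
    .ffill()                                            # trailing gap: last valid value
)

print(filled)
```

Using `limit_area="inside"` keeps the interpolation from touching the edges, so the role of each step stays visible.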

#### 3- Reducing dataset size
Finally, a 1-minute acquisition timestep produces a heavy dataset.
Moreover, such a small timestep is not required to identify the physical phenomena.
An aggregation method must therefore be provided to _resample_ the dataset.
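As an illustration of what mean resampling does, the following sketch aggregates a made-up 1-minute series into 5-minute bins with pandas.

```python
import pandas as pd

# Hypothetical 1-minute series: values 0, 1, ..., 9
index = pd.date_range("2018-03-22", periods=10, freq="min")
series = pd.Series(range(10), index=index, dtype=float)

# Each 5-minute bin is replaced by the mean of the samples it contains
resampled = series.resample("5min").mean()

print(resampled)
```

Ten samples become two, each being the average of five original values. Other aggregations (sum, max, last) may suit other physical quantities better.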

We use the class <code>MeasuredDats</code> from **Modelitool** to do so.


In [None]:
from modelitool.measure import MeasuredDats

In [None]:
my_data = MeasuredDats(
    data = raw_data,
    data_type_dict = {
        "temperatures": [
            'T_Wall_Ins_1', 'T_Wall_Ins_2', 'T_Ins_Ins_1', 'T_Ins_Ins_2',
            'T_Ins_Coat_1', 'T_Ins_Coat_2', 'T_int_1', 'T_int_2', 'T_ext', 'T_garde'
        ],
        "illuminance": ["Lux_CW"],
        "radiation": ["Sol_rad"]
    },
    corr_dict = {
        "temperatures": {
            "minmax": {
                "upper": 100,
                "lower": -20
            },
            "derivative": {
                "upper_rate": 2,
                "lower_rate": 0,
            },
            "fill_nan": [
                "linear_interpolation",
                "bfill",
                "ffill"
            ],
            "resample": 'mean',
        },
        "illuminance": {
            "minmax": {
                "upper": 1000,
                "lower": 0,
            },
            "derivative": {
                "upper_rate": 10E8, # Specifying high value is a way to discard correction
                "lower_rate": -1, # Specifying negative value is a way to discard correction
            },
            "fill_nan": [
                "linear_interpolation",
                "bfill",
                "ffill"
            ],
            "resample": 'mean',
        },
        "radiation": {
            "minmax": {
                "upper": 1000,
                "lower": 0,
            },
            "derivative": {
                "upper_rate": 10E8, # Specifying high value is a way to discard correction
                "lower_rate": -1, # Specifying negative value is a way to discard correction
            },
            "fill_nan": [
                "linear_interpolation",
                "bfill",
                "ffill"
            ],
            "resample": 'mean',
        }
    }
)

The <code>plot</code> method can be used to plot the data.

Provide a <code>list</code> to the argument <code>cols</code> to specify the entry you want to plot.

A new y axis will be created for each data type.

In [None]:
my_data.plot(
    cols=['T_Wall_Ins_1', 'Sol_rad', 'Lux_CW'],
    begin='2018-04-15',
    end='2018-04-18',
    title='Plot uncorrected data',
)

Plotted data are the <code>corrected_data</code>. Set <code>plot_raw=True</code> to display raw data. This is useful to assess the impact of the correction and of the resampling methods.

For now no corrections have been applied, so <code>corrected_data</code> is equal to <code>data</code>


The object <code>my_data</code> contains the original dataset and the method configuration for the correction.

The <code>correction_journal</code> property holds information on the data.

Let's have a look.

In [None]:
my_data.correction_journal

Before correction, the journal shows that ~2% of the data are missing for the temperature sensors and ~3% for external temperature, "garde" temperature and solar radiation. This corresponds to rows that have a timestamp but no value. In this specific case, this is not related to sensor errors. Two distinct acquisition devices were used to perform the measurements, and merging the data from the two devices caused timestamp alignment problems. Measurement also stopped a bit earlier for the second device.

#### 1- Identify anomalies:
Now let's apply the <code>remove_anomalies()</code> method to delete invalid data according to the specifications.

In [None]:
my_data.remove_anomalies()

Let's have a look at the <code>correction_journal</code>.
Not all of it, as it stores the effect of every correction and gets big rapidly.
First we want to see the new percentage of missing data after correction.

In [None]:
my_data.correction_journal["remove_anomalies"]["missing_values"]["Percent_of_missing"]

It looks like the applied corrections removed a fair amount of data.
For example, the sensors measuring the cell internal temperature now have up to __4.5%__ of missing data.

Few corrections were applied to the outside temperature sensor.

The journal of correction holds further information on the gaps of data.
For example if we want to know more about the missing values of <code>T_int_1</code>

In [None]:
my_data.correction_journal["remove_anomalies"]["gaps_stats"]["T_int_1"]

- There are 11233 gaps.
- The size of 75% of these gaps does not exceed 1 timestep (~1 min)
- The biggest gap lasts 1 h
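For readers curious about how such gap statistics can be obtained, here is a minimal pandas sketch on a made-up series. It is one possible approach, not necessarily the one used by Modelitool to build the journal.

```python
import numpy as np
import pandas as pd

# Hypothetical 1-minute series containing three gaps of 1, 3 and 1 samples
index = pd.date_range("2018-03-22", periods=10, freq="min")
series = pd.Series(
    [20.0, np.nan, 20.2, np.nan, np.nan, np.nan, 20.6, 20.7, np.nan, 20.9],
    index=index,
)

is_missing = series.isna()

# New run identifier every time the missing/valid state changes
run_id = is_missing.astype(int).diff().ne(0).cumsum()

# Number of consecutive missing samples in each gap
gap_lengths = is_missing[is_missing].groupby(run_id[is_missing]).size()

n_gaps = len(gap_lengths)
longest_gap = pd.Timedelta(minutes=int(gap_lengths.max()))

print(n_gaps, longest_gap)
```

Calling `gap_lengths.describe()` would then give quartiles comparable to those listed above.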

It is also possible to "aggregate" the gaps in order to know when at least one of the variables is missing.

In [None]:
my_data.correction_journal["remove_anomalies"]["gaps_stats"]["combination"]

- There are 28066 gaps (~10% of the dataset).
- The size of 75% of these gaps does not exceed 1 timestep (~1 min)
- The biggest gap lasts 1 h

There is not a lot of difference. It looks like the values are missing at the same timestamps.

This is good news: it means that there are a lot of periods with all data available.

The plotting method <code>plot_gaps</code> can be used to visualize where the gaps happened.

This dataset holds a lot of values, so we just plot the entry <code>'T_int_1'</code>, which is supposed to have the most gaps.

We are interested in gaps lasting more than 15 minutes.

In [None]:
import datetime as dt
my_data.plot_gaps(cols=['T_int_1'], gaps_timestep=dt.timedelta(minutes=15))

There seems to be only 1 gap greater than 15 minutes, and it is a bit hard to see. If you zoom in on 2018-03-25, you will notice a gap between ~02:00 and ~03:00.
This is the gap we saw in the correction journal.

We may want to access the new corrected dataset to perform further investigations. It is available as the <code>corrected_data</code> attribute of the <code>MeasuredDats</code> object.

_Note that the original data set is left untouched in <code>data</code>_

#### 2- Missing data interpolation
Fill the missing data with the <code>fill_nan()</code> method, which applies the specified interpolation methods.

In [None]:
my_data.fill_nan()

Once again, let's have a look at the <code>correction_journal</code>.

In [None]:
my_data.correction_journal["fill_nan"]["missing_values"]["Percent_of_missing"]

Wow, perfect dataset!

Be careful: 0 missing data doesn't mean 0 problems.
 If you had a crappy dataset, it is still crappy.
 You just filled the gaps by copying values or drawing lines between (_what seem to be_) valid points.
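One way to keep this caveat in mind is to remember which points were filled. The sketch below, on made-up values, stores a boolean mask before interpolating. It is a suggestion using plain pandas, not a Modelitool feature.

```python
import numpy as np
import pandas as pd

# Hypothetical series with a two-sample gap
index = pd.date_range("2018-03-22", periods=5, freq="min")
series = pd.Series([20.0, np.nan, np.nan, 20.6, 20.8], index=index)

# Remember which samples were missing before filling them
was_missing = series.isna()
filled = series.interpolate(method="linear")

# Share of reconstructed (non-measured) samples, in percent
share_filled = was_missing.mean() * 100

print(filled[was_missing])
print(share_filled)
```

Such a mask makes it possible to exclude reconstructed samples later, for example when computing an error indicator against the model output.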

#### 3- Reducing dataset size
As we said earlier, a 1 min timestep is too small.
Given the physical phenomena involved here, a 5 min timestep is sufficient.

So let's resample the dataset to this value.

In [None]:
my_data.resample("5T")

Let's have a look at our corrected data versus the raw data.

We select a period around the gap we identified (2018-03-25, from 00:00 to 05:00).

In [None]:
my_data.plot(
    title="Raw data versus corrected data",
    cols=['T_int_1'],
    begin='2018-03-25 00:00:00',
    end='2018-03-25 05:00:00',
    plot_raw=True)

On the above graph you can see the effects of mean resampling, which reduces the number of points and smooths out the data.

The gap has been filled using linear interpolation at the required timestep.

It is important to compare your data before and after applying the correction methods. For example, resampling with a large timestep can lead to a loss of information.

And that's it, we have fairly clean data to work with.

We could export them directly into a **combiTimetable** file, but given the big gaps
 and the measurement campaign spanning several months,
  we probably want to identify an appropriate period for the identification.

In our case, given the problem, we decide that this period shall be:
- short: not more than 7 days
- with a high temperature difference between <code>T_int</code> and <code>T_Ins_Coat</code>, which maximises the heat transfers in the wall
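The second criterion can also be checked numerically. The sketch below ranks days by the mean absolute difference between two temperature columns. The data are made up (only the column names mirror the tutorial), and the daily mean is just one possible indicator.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly data over 3 days: constant indoor temperature,
# coating temperature changing from day to day
index = pd.date_range("2018-03-22", periods=3 * 24, freq="60min")
demo = pd.DataFrame(
    {
        "T_int": 26.0,
        "T_Ins_Coat": np.repeat([45.0, 21.0, 30.0], 24),
    },
    index=index,
)

# Daily mean of the absolute temperature difference across the wall
delta = (demo["T_Ins_Coat"] - demo["T_int"]).abs()
daily_delta = delta.resample("D").mean()

# Day with the largest mean difference
best_day = daily_delta.idxmax()

print(daily_delta)
print(best_day)
```

Applied to the corrected and combined measurements, the same indicator would give a shortlist of candidate days to inspect visually.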

The following figure is designed to help us select the appropriate period. <code>Lux_CW</code> and <code>Sol_rad</code> are not shown, for clarity.

In [None]:
# Figure code
import plotly.graph_objects as go

fig = go.Figure()

for temp in my_data.corrected_data.columns[:-2]:
    fig.add_trace(go.Scatter(
        x=my_data.corrected_data.index,
        y=my_data.corrected_data[temp],
        name = temp
    ))

# Edit the layout
fig.update_layout(
    title='Measured temperatures [°C]',
    xaxis_title='Time',
    yaxis_title='Temperature [°C]')


fig.show()

We decide to select 2 consecutive days for the identification (22/03 and 23/03)
 and 2 other days for the validation (26/03 and 28/03).

During the first day, the coating temperature rises up to 50 °C while the cell
 temperature is controlled at ~26 °C and the external temperature does not rise above 13 °C.
 This is characteristic of a day with strong solar irradiation, with heat transfers driven by the interior/exterior temperature difference.

During the second day, the coating temperature only reaches ~21 °C (7 °C above the external temperature),
 which is characteristic of a fairly cloudy day. _Note that we could check solar radiation on another graph_

**Regarding measurement confidence:**
- There is a 1 °C gap between the two measurements of indoor air temperature.
- The gap between the temperatures measured at the interface of the two layers of insulation reaches 4 °C.

This problem is not observed with the remaining sensors.
 This discrepancy is not negligible and should be investigated further.
 However, in the scope of this tutorial, we will just take the mean of the sensor values.

In [None]:
# Pandas lines to combine sensors measures
combined_measure = my_data.corrected_data[['T_ext', 'Sol_rad']].copy()
combined_measure['T_Wall_Ins'] = my_data.corrected_data[
    ['T_Wall_Ins_1', 'T_Wall_Ins_2']].mean(axis=1)
combined_measure['T_Ins_Ins'] = my_data.corrected_data[
    ['T_Ins_Ins_1', 'T_Ins_Ins_2']].mean(axis=1)
combined_measure['T_Ins_Coat'] = my_data.corrected_data[
    ['T_Ins_Coat_1', 'T_Ins_Coat_2']].mean(axis=1)
combined_measure['T_int'] = my_data.corrected_data[
    ['T_int_1', 'T_int_2']].mean(axis=1)

We save the corrected measurements for the period of interest,
so that we can load them in the next chapters.

In [None]:
combined_measure.loc["2018-03-22":"2018-03-28"].to_csv(
    "ressources/study_df.csv")

Part of the measurements will be used as boundary conditions for the simulation.
__Modelitool__ allows you to pass a <code>DataFrame</code> to the <code>Simulator</code> that handles OpenModelica models.
This way you don't have to worry about generating a boundary text file.


But if you want to generate your own boundary file (to simulate using the OpenModelica editor, for example),
you can use <code>combitabconvert.df_to_combitimetable()</code> to generate the boundary
 condition file.
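To give an idea of what such a boundary file looks like, the sketch below builds the text table layout documented for the Modelica Standard Library `CombiTimeTable` (a `#1` header, a `double name(rows,cols)` declaration, then one row per timestep with time in the first column). The helper `to_combitimetable_text` is hypothetical and written for illustration only. The file produced by Modelitool may differ in naming and number formatting.

```python
import pandas as pd


def to_combitimetable_text(df, table_name="table1"):
    """Hypothetical helper: format a DataFrame as a Modelica text table."""
    # Time column: seconds elapsed since the first timestamp
    seconds = (df.index - df.index[0]).total_seconds()
    header = f"double {table_name}({len(df)},{df.shape[1] + 1})"
    lines = ["#1", header + "  # time[s] " + " ".join(df.columns)]
    for t, row in zip(seconds, df.to_numpy()):
        values = " ".join(f"{v:.4f}" for v in row)
        lines.append(f"{t:.1f} {values}")
    return "\n".join(lines) + "\n"


# Made-up 5-minute boundary data
index = pd.date_range("2018-03-22", periods=3, freq="5min")
demo = pd.DataFrame({"T_ext": [10.0, 10.5, 11.0]}, index=index)

text = to_combitimetable_text(demo)
print(text)
```

The order of the columns matters, since the Modelica model selects signals by column number.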

In [None]:
from modelitool.combitabconvert import df_to_combitimetable


In [None]:
df_to_combitimetable(
    df=combined_measure.loc["2018-03-22":"2018-03-23"],
    filename="ressources/boundary_temp.txt"
)

It is now time to build the physical model using Modelica!

**_Note: you will have to manually configure the file path in
the <code>combiTimetable</code> of your Modelica model._**
