<img src=../figures/Brown_logo.svg width=50%>

## Data-Driven Design & Analyses of Structures & Materials (3dasm)

## Lecture 21

### Martin van der Schelling | <a href="mailto:m.p.vanderschelling@tudelft.nl">m.p.vanderschelling@tudelft.nl</a> | Doctoral Candidate

**What:** A lecture of the "3dasm" course

**Where:** This notebook comes from this [repository](https://github.com/bessagroup/3dasm_course)

**Reference for entire course:** Murphy, Kevin P. *Probabilistic machine learning: an introduction*. MIT press, 2022. Available online [here](https://probml.github.io/pml-book/book1.html)

**How:** We try to follow Murphy's book closely, but the sequence of Chapters and Sections is different. The intention is to use notebooks as an introduction to the topic and Murphy's book as a resource.
* If working offline: Go through this notebook and read the book.
* If attending class in person: listen to me (!) but also go through the notebook on your laptop at the same time. Read the book.
* If attending lectures remotely: listen to me (!) via Zoom and (ideally) use two screens, with the notebook open on one screen and the lecture on the other. Read the book.

## **OPTION 1**. Run this notebook **locally on your computer**:
1. Confirm that you have the '3dasm' mamba (or conda) environment (see Lecture 1).
2. Go to the 3dasm_course folder on your computer and pull the latest updates of the [repository](https://github.com/bessagroup/3dasm_course):
```
git pull
```
    - Note: if you can't pull the repo due to conflicts that you cannot resolve, the following command makes your local repo identical to the one online. Use it with **caution**, because it discards your local changes:
        ```
        git reset --hard origin/main
        ```
3. Open a command window and launch Jupyter Notebook (it will open in your web browser):
```
jupyter notebook
```
4. Open the notebook of this Lecture and choose the '3dasm' kernel.

## **OPTION 2**. Use **Google's Colab** (no installation required, but times out if idle):

1. Go to https://colab.research.google.com
2. Log in
3. File > Open notebook
4. Click on GitHub (no need to log in or authorize anything)
5. Paste the git link: https://github.com/bessagroup/3dasm_course
6. Click search and then click on the notebook for this Lecture.

In [1]:
# Basic plotting tools needed in Python.

import matplotlib.pyplot as plt # import plotting tools to create figures
import numpy as np # import numpy to handle a lot of things!

%config InlineBackend.figure_format = "retina" # render higher resolution images in the notebook
plt.rcParams["figure.figsize"] = (8,4) # rescale figure size appropriately for slides

# To limit the number of rows to show in a dataframe, for presentation purposes:
import pandas as pd

pd.set_option('display.max_rows', 10)

## Outline for today

* Introducing advanced usage of the `f3dasm.datageneration` submodule

**Reading material**: This notebook

### Installing `f3dasm`

You can install `f3dasm` with pip:

_Make sure you install the correct version (1.5.3)_

In [2]:
try:
    import f3dasm
except ModuleNotFoundError: # If f3dasm is not found in current environment, install the correct version from pip
    %pip install f3dasm==1.5.3 --quiet
    import f3dasm

Alternatively, you can install from source:

```
git clone https://github.com/bessagroup/f3dasm
cd f3dasm
pip install -e .
```

For more installation instructions, see the [installation documentation](https://github.com/bessagroup/f3dasm).

### `f3dasm`: streamlining your data-driven process!

The **f**ramework for **d**ata-**d**riven **d**esign and **a**nalysis of **s**tructures and **m**aterials (`f3dasm`) aims to generalize the data-driven workflow shown below through interfaces (code templates that you fill in).


<center><img src="../figures/f3dasm_overview.svg" title="f3dasm Car stopping distance" width="70%"></center>

### Advanced data generation with `f3dasm`

First, we load the `ExperimentData` with our car velocities from the previous lecture:

In [3]:
from f3dasm import ExperimentData
experimentdata_raw = ExperimentData.from_file('../f3dasm_lecture_1/your_data')

Previously, we saw that we can pass a function to the `ExperimentData` object to evaluate each design:

In [4]:
from scipy.stats import norm # import the normal dist, as we learned before!

# Define our car stopping distance function
def compute_distance(x):
    z = norm.rvs(1.5, 0.5, size=1) # randomly draw 1 sample from the normal dist.
    y = z*x + 0.1*x**2 # compute the stopping distance
    return y
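Because `z` is drawn from a normal distribution with mean 1.5 and standard deviation 0.5, the stopping distance at a fixed velocity $x$ is itself normally distributed, with mean $1.5x + 0.1x^2$ and standard deviation $0.5x$. The sketch below checks this numerically using only the standard library. Note the assumptions: `random.gauss` stands in for `norm.rvs`, and the helper name `compute_distance_stdlib`, the seed and the sample size are arbitrary choices made for illustration.

```python
import random
import statistics


def compute_distance_stdlib(x, mu_z=1.5, sigma_z=0.5, rng=random):
    """Same formula as compute_distance, with random.gauss instead of norm.rvs."""
    z = rng.gauss(mu_z, sigma_z)  # one draw of the random coefficient z
    return z * x + 0.1 * x**2     # stopping distance for velocity x


rng = random.Random(0)  # fixed seed so the check is reproducible
x = 8.0
samples = [compute_distance_stdlib(x, rng=rng) for _ in range(100_000)]

theory_mean = 1.5 * x + 0.1 * x**2  # 18.4 for x = 8
theory_std = 0.5 * x                # 4.0 for x = 8

print(f"sample mean: {statistics.fmean(samples):.2f} (theory: {theory_mean:.2f})")
print(f"sample std:  {statistics.stdev(samples):.2f} (theory: {theory_std:.2f})")
```

The sample statistics should land close to the theoretical values, which is why repeated evaluations of the same design give different stopping distances.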

#### Creating a `DataGenerator` class

However, a custom class with its own attributes and methods gives more flexibility over the data generation process.

To do this, create a new class that inherits from the `f3dasm.datageneration.DataGenerator` class:

In [5]:
from f3dasm.datageneration import DataGenerator

In [6]:
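class CarStoppingDistance(DataGenerator):
    def __init__(self, mu_z: float, sigma_z: float):
        self.mu_z = mu_z  # mean of the random coefficient z
        self.sigma_z = sigma_z  # standard deviation of z

    def execute(self):
        # Read the velocity x of the current experiment
        x = self.experiment_sample.input_data['x']
        # Randomly draw 1 sample of z from the normal dist.
        z = norm.rvs(self.mu_z, self.sigma_z, size=1)
        y = z*x + 0.1*x**2  # compute the stopping distance
        # Store the result as output 'y' of the current experiment
        self.experiment_sample.store(name='y', object=y)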

#### Creating the `execute()` method

- The custom class can have any methods or attributes you want, but it must implement the `execute()` method.
- This method is called for every experiment in the `ExperimentData` object.
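One way to picture this contract is the toy sketch below. It is not `f3dasm`'s implementation: `ToySample`, `ToyDataGenerator`, `ToyCarStoppingDistance` and `run_all` are hypothetical stand-ins written with the standard library only, meant to show why a subclass only has to fill in `execute()` while the surrounding machinery decides when it is called.

```python
import random
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class ToySample:
    """Hypothetical stand-in for one row of the experiment table."""
    input_data: dict
    output_data: dict = field(default_factory=dict)

    def store(self, name, object):
        # Keep the result under the given output name
        # (argument names mirror the store() call used in this notebook).
        self.output_data[name] = object


class ToyDataGenerator(ABC):
    """Hypothetical base class: subclasses only implement execute()."""

    @abstractmethod
    def execute(self):
        ...

    def run(self, sample):
        # Attach the current sample, then call the user's code.
        self.experiment_sample = sample
        self.execute()
        return sample


class ToyCarStoppingDistance(ToyDataGenerator):
    def __init__(self, mu_z, sigma_z, seed=0):
        self.mu_z = mu_z
        self.sigma_z = sigma_z
        self.rng = random.Random(seed)  # seeded for reproducibility

    def execute(self):
        x = self.experiment_sample.input_data['x']
        z = self.rng.gauss(self.mu_z, self.sigma_z)
        self.experiment_sample.store(name='y', object=z * x + 0.1 * x**2)


def run_all(generator, samples):
    # execute() is invoked once per experiment.
    return [generator.run(s) for s in samples]


samples = [ToySample({'x': x}) for x in (3.0, 5.5, 8.0)]
results = run_all(ToyCarStoppingDistance(mu_z=1.5, sigma_z=0.5), samples)
for s in results:
    print(s.input_data, s.output_data)
```

Keeping the loop in the base machinery and the physics in `execute()` means the same generator class can be reused on any table of experiments that provides an `x` input.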

#### The `ExperimentSample` object
- Each experiment will be converted into an `ExperimentSample` object
- The `ExperimentSample` can be seen as a 'row' in the table of experiments
- It can be retrieved in the `execute()` method as `self.experiment_sample`

In [7]:
experimentdata_raw

| index | jobs | input `x` | output `y` | output `y_pred` |
|---|---|---|---|---|
| 0 | finished | 3.0 | 4.630869 | -94.300043 |
| 1 | finished | 5.5 | 11.547948 | -69.566919 |
| 2 | finished | 8.0 | 18.945815 | -44.833796 |
| 3 | finished | 10.5 | 24.441391 | -20.100673 |
| 4 | finished | 13.0 | 59.016256 | 4.63245 |
| ... | ... | ... | ... | ... |
| 28 | finished | 73.0 | 589.310748 | 598.227409 |
| 29 | finished | 75.5 | 666.446058 | 622.960532 |
| 30 | finished | 78.0 | 659.64853 | 647.693655 |
| 31 | finished | 80.5 | 795.758157 | 672.426778 |


In [8]:
my_experimentsample = experimentdata_raw.get_experiment_sample(2)
print(my_experimentsample)

ExperimentSample(2 (finished) :{'x': 8.0} - {'y': 18.94581494134064, 'y_pred': -44.83379623022947})


We can retrieve the current `input_data`, `output_data` and `job_number` from the experiment sample:

In [9]:
print(f" Input data: {my_experimentsample.input_data}")
print(f" Output data: {my_experimentsample.output_data}")
print(f" Job number: {my_experimentsample.job_number}")

 Input data: {'x': 8.0}
 Output data: {'y': 18.94581494134064, 'y_pred': -44.83379623022947}
 Job number: 2


Alternatively, the `get()` method can also retrieve individual parameters:

In [10]:
my_experimentsample.get('x')

8.0

Another useful feature is to convert the experiment sample object to a tuple of numpy arrays:

In [11]:
print(my_experimentsample.to_numpy())

(array([8.]), array([ 18.94581494, -44.83379623]))


Objects can be stored back into the `ExperimentData` object with the `store()` method:

In [12]:
my_experimentsample.store(name='y', object=3.4)
my_experimentsample

ExperimentSample(2 (finished) :{'x': 8.0} - {'y': 3.4, 'y_pred': -44.83379623022947})

#### Running the data generator on your experiment data

Since we already computed the stopping distance with the functional approach, we retrieve only the input data:

In [13]:
my_experimentdata = experimentdata_raw.get_input_data()

To evaluate the experiments again, we mark all jobs as `'open'`:

In [14]:
my_experimentdata.mark_all('open')

In [15]:
my_experimentdata

| index | jobs | input `x` |
|---|---|---|
| 0 | open | 3.0 |
| 1 | open | 5.5 |
| 2 | open | 8.0 |
| 3 | open | 10.5 |
| 4 | open | 13.0 |
| ... | ... | ... |
| 28 | open | 73.0 |
| 29 | open | 75.5 |
| 30 | open | 78.0 |
| 31 | open | 80.5 |


In [16]:
csd = CarStoppingDistance(mu_z=1.5, sigma_z=0.5)

In [17]:
my_experimentdata.evaluate(csd)

In [18]:
my_experimentdata

| index | jobs | input `x` | output `y` |
|---|---|---|---|
| 0 | finished | 3.0 | 6.899893 |
| 1 | finished | 5.5 | 13.424319 |
| 2 | finished | 8.0 | 17.21366 |
| 3 | finished | 10.5 | 33.899676 |
| 4 | finished | 13.0 | 40.123885 |
| ... | ... | ... | ... |
| 28 | finished | 73.0 | 642.028542 |
| 29 | finished | 75.5 | 662.317426 |
| 30 | finished | 78.0 | 753.002283 |
| 31 | finished | 80.5 | 771.796369 |
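The change from `open` to `finished` in the `jobs` column suggests a simple bookkeeping scheme: only experiments that are still open get evaluated, and each one is marked finished afterwards. The toy sketch below illustrates that idea with plain dictionaries. It is an assumption about the behaviour made for illustration, not `f3dasm` code: `mark_all_rows` and `evaluate_open_rows` are hypothetical helper names, and a deterministic function replaces the random stopping distance to keep the example predictable.

```python
def mark_all_rows(rows, status):
    """Set the job status of every row (toy counterpart of marking all jobs)."""
    for row in rows:
        row['jobs'] = status


def evaluate_open_rows(rows, func):
    """Evaluate func on rows whose job is 'open' and mark them 'finished'."""
    evaluated = 0
    for row in rows:
        if row['jobs'] != 'open':
            continue  # skip rows that were already evaluated
        row['y'] = func(row['x'])
        row['jobs'] = 'finished'
        evaluated += 1
    return evaluated


rows = [{'jobs': 'finished', 'x': x, 'y': None} for x in (3.0, 5.5, 8.0)]

# Nothing is open yet, so nothing is evaluated.
print(evaluate_open_rows(rows, lambda x: 0.1 * x**2))  # prints 0

# After reopening every job, all three rows are evaluated.
mark_all_rows(rows, 'open')
print(evaluate_open_rows(rows, lambda x: 0.1 * x**2))  # prints 3
print([row['jobs'] for row in rows])
```

Tracking a status per row makes it cheap to rerun an evaluation: rows that are already finished are left untouched unless they are explicitly reopened.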


In [19]:
my_experimentdata.domain

Domain(space={'x': _ContinuousParameter(lower_bound=3.0, upper_bound=83.0, log=False)}, output_space={'y': _OutputParameter(to_disk=False)})