<img src=../figures/Brown_logo.svg width=50%>

## Data-Driven Design & Analyses of Structures & Materials (3dasm)

## Lecture 19

### Martin van der Schelling | <a href="mailto:m.p.vanderschelling@tudelft.nl">m.p.vanderschelling@tudelft.nl</a> | Doctoral Candidate

**What:** A lecture of the "3dasm" course

**Where:** This notebook comes from this [repository](https://github.com/bessagroup/3dasm_course)

**Reference for entire course:** Murphy, Kevin P. *Probabilistic machine learning: an introduction*. MIT press, 2022. Available online [here](https://probml.github.io/pml-book/book1.html)

**How:** We try to follow Murphy's book closely, but the sequence of Chapters and Sections is different. The intention is to use notebooks as an introduction to the topic and Murphy's book as a resource.
* If working offline: Go through this notebook and read the book.
* If attending class in person: listen to me (!) but also go through the notebook on your laptop at the same time. Read the book.
* If attending lectures remotely: listen to me (!) via Zoom and (ideally) use two screens where you have the notebook open in 1 screen and you see the lectures on the other. Read the book.

## **OPTION 1**. Run this notebook **locally on your computer**:
1. Confirm that you have the '3dasm' mamba (or conda) environment (see Lecture 1).
2. Go to the 3dasm_course folder on your computer and pull the latest updates of the [repository](https://github.com/bessagroup/3dasm_course):
```
git pull
```
    - Note: if you can't pull the repo due to conflicts (and you can't handle these conflicts), use this command (with **caution**!) and your repo becomes the same as the one online:
        ```
        git reset --hard origin/main
        ```
3. Open a command window and launch Jupyter Notebook (it will open in your web browser):
```
jupyter notebook
```
4. Open the notebook for this lecture and choose the '3dasm' kernel.

## **OPTION 2**. Use **Google's Colab** (no installation required, but times out if idle):

1. Go to https://colab.research.google.com
2. Log in
3. File > Open notebook
4. Click on GitHub (no need to log in or authorize anything)
5. Paste the git link: https://github.com/bessagroup/3dasm_course
6. Click search and then click on the notebook for this lecture.

In [4]:
# Basic plotting tools needed in Python.

import matplotlib.pyplot as plt # import plotting tools to create figures
import numpy as np # import numpy to handle a lot of things!

%config InlineBackend.figure_format = "retina" # render higher resolution images in the notebook
plt.rcParams["figure.figsize"] = (8,4) # rescale figure size appropriately for slides

# To limit the number of rows to show in a dataframe, for presentation purposes:
import pandas as pd

pd.set_option('display.max_rows', 10)

## Outline for today

* Using `f3dasm` to do hyperparameter search

**Reading material**: This notebook + `f3dasm` documentation page ([link](https://f3dasm.readthedocs.io/en/latest/))

### Installing `f3dasm`

You can install `f3dasm` with pip:

_Make sure you install the correct version (1.5.4)_

In [5]:
try:
    import f3dasm
except ModuleNotFoundError: # If f3dasm is not found in current environment, install the correct version from pip
    %pip install f3dasm==1.5.4 --quiet
    import f3dasm

Alternatively, you can install from source:

```
git clone https://github.com/bessagroup/f3dasm
cd f3dasm
pip install -e .
```

For more installation instructions, see the [installation documentation](https://github.com/bessagroup/f3dasm).

### Linear regression model with polynomial features

In this notebook, we will use `f3dasm` to train a Linear Regression model with polynomials of various degrees. While you've done a similar exercise before, you'll find that `f3dasm` simplifies the workflow.

First, we define the function to learn and evaluate it at 50 uniformly spaced data points:

In [52]:
def f(x):
    return x * np.sin(x)

Data_x = np.linspace(0., 10., 50) # Creating a number line of 50 values from 0 to 10
Data_y = f(Data_x) # Evaluating Data_x on the function f(x)=x*sin(x)

We use the `train_test_split` function from `scikit-learn` to split the data into a training set and a testing set:

In [53]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(Data_x.reshape(-1, 1), Data_y, random_state=123)
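
As a quick check of what the split produces: when no `test_size` is given, `train_test_split` holds out 25% of the samples (rounded up), so the 50 points become 37 training points and 13 testing points. A minimal sketch that rebuilds the same data so it runs on its own and inspects the shapes:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Rebuild the dataset from above so this cell is self-contained
Data_x = np.linspace(0., 10., 50)
Data_y = Data_x * np.sin(Data_x)

# With no test_size given, scikit-learn holds out 25% of the samples
X_train, X_test, y_train, y_test = train_test_split(
    Data_x.reshape(-1, 1), Data_y, random_state=123
)

print(X_train.shape, X_test.shape)  # (37, 1) (13, 1)
```

The `reshape(-1, 1)` is needed because scikit-learn estimators expect a 2D array of shape `(n_samples, n_features)`.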

Normally, we would create a `LinearRegression` model with `PolynomialFeatures`, fit it on `X_train` and `y_train`, and predict on the testing data `X_test`:

In [65]:
from sklearn.preprocessing import PolynomialFeatures # For Polynomial fit
from sklearn.linear_model import LinearRegression # For Least Squares
from sklearn.pipeline import make_pipeline # to link different objects

# Linear regression model with polynomial features of degree 2
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()) 

# Fit the training data
model.fit(X_train, y_train)

# Predicting on the testing data
y_pred = model.predict(X_test)
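
To see what the `PolynomialFeatures` step contributes to the pipeline, it can help to apply it to a couple of numbers. The sketch below uses two toy inputs (not the lecture data): with `degree=2`, each input $x$ is expanded into the columns $[1, x, x^2]$, and `LinearRegression` then fits a linear model on those columns.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X_small = np.array([[2.0], [3.0]])  # two toy inputs, one feature each

# degree=2 expands each x into the columns [1, x, x^2]
features = PolynomialFeatures(degree=2).fit_transform(X_small)
print(features)
# [[1. 2. 4.]
#  [1. 3. 9.]]
```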

We can evaluate how good the predictions are by evaluating some performance metrics:

In [62]:
from sklearn.metrics import r2_score, mean_squared_error

r2 = r2_score(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)

print(f"R^2 score: {r2:.2f}")
print(f"MSE: {mse:.2f}")


R^2 score: 0.93
MSE: 0.50
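
For reference, both metrics are simple to compute by hand: the MSE is the mean of the squared residuals, and $R^2 = 1 - SS_{res}/SS_{tot}$ compares the squared residuals with the spread of the targets around their mean. The sketch below uses toy numbers (not the lecture data), and `mse_and_r2` is a hypothetical helper name:

```python
import numpy as np

def mse_and_r2(y_true, y_pred):
    """Return (MSE, R^2) computed directly from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # squared residuals
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # spread around the mean
    return ss_res / y_true.size, 1.0 - ss_res / ss_tot

# Toy targets and predictions, chosen only for illustration
mse_toy, r2_toy = mse_and_r2([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
print(f"MSE: {mse_toy:.3f}, R^2: {r2_toy:.3f}")  # MSE: 0.025, R^2: 0.980
```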


### Vary the degree of the polynomial

In order to investigate the influence of the degree of the polynomial features, we are going to create a function that accepts the degree as an input, trains the model and returns the performance metrics that we are interested in:

In [70]:
def evaluate_degree_polynomial(degree: int):
    
    # Linear regression model with polynomial features of the given degree
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    # Fit the training data
    model.fit(X_train, y_train)

    # Predicting on the testing data
    y_pred = model.predict(X_test)
    
    # Calculate the performance metrics
    r2 = r2_score(y_test, y_pred)
    mse = mean_squared_error(y_test, y_pred)
    
    return r2, mse

Calling this function will output the $R^2$ value and MSE for a particular degree:

In [71]:
evaluate_degree_polynomial(3)

(-1.0244557026452443, 15.229091145654158)
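
A negative $R^2$, as returned here for degree 3, means that the model's squared error on the test set is larger than that of a constant prediction equal to the mean of the test targets. A small sketch with toy numbers (not the lecture data; `r2_by_hand` is a hypothetical name):

```python
import numpy as np

def r2_by_hand(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

y_true = np.array([1.0, 2.0, 3.0, 4.0])

# Always predicting the mean gives exactly R^2 = 0
mean_baseline = np.full_like(y_true, y_true.mean())
# A model that gets the trend backwards does worse than the mean
reversed_model = np.array([4.0, 3.0, 2.0, 1.0])

print(r2_by_hand(y_true, mean_baseline))   # 0.0
print(r2_by_hand(y_true, reversed_model))  # -3.0
```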

### Structuring a hyperparameter search with `f3dasm`

We can investigate different values of our hyperparameter in a structured way with `f3dasm`.

First, we create a `Domain()` object with the degree as a discrete parameter:

In [73]:
from f3dasm.design import Domain

domain_degree = Domain()
# Upper bound set to 11 so that it covers the largest degree evaluated below
domain_degree.add_int(name='degree', low=0, high=11)

Then, we can create our design-of-experiments by creating an `ExperimentData` object with different values of degree:

In [84]:
from f3dasm import ExperimentData

experiment_data = ExperimentData(input_data=np.array([1, 3, 5, 7, 11]), domain=domain_degree)
experiment_data

| | jobs | degree (input) |
|---|---|---|
| 0 | open | 1 |
| 1 | open | 3 |
| 2 | open | 5 |
| 3 | open | 7 |
| 4 | open | 11 |


Now we evaluate our designs on the `evaluate_degree_polynomial` function:

In [85]:
experiment_data.evaluate(evaluate_degree_polynomial, output_names=['r2', 'mse'])
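
Roughly speaking, this step calls the function once per design, stores the returned values under `output_names`, and marks each job as `finished`. The sketch below mimics that bookkeeping with a plain pandas DataFrame. It is an illustration only, not `f3dasm`'s implementation; `toy_metric` and `evaluate_rows` are hypothetical names, and the assumption that only rows with an `open` job are visited is ours.

```python
import pandas as pd

def toy_metric(degree: int):
    # Hypothetical stand-in for evaluate_degree_polynomial: returns two numbers
    return degree * 2, degree ** 2

def evaluate_rows(table: pd.DataFrame, func, output_names):
    table = table.copy()
    for name in output_names:
        table[name] = float("nan")  # empty output columns
    open_rows = table.loc[table["jobs"] == "open"].index
    for idx in open_rows:
        outputs = func(int(table.at[idx, "degree"]))
        for name, value in zip(output_names, outputs):
            table.at[idx, name] = value
        table.at[idx, "jobs"] = "finished"
    return table

designs = pd.DataFrame({"jobs": ["open"] * 3, "degree": [1, 3, 5]})
results = evaluate_rows(designs, toy_metric, output_names=["double", "square"])
print(results)
```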

In [86]:
experiment_data

| | jobs | degree (input) | r2 (output) | mse (output) |
|---|---|---|---|---|
| 0 | finished | 1 | -0.190814 | 8.957968 |
| 1 | finished | 3 | -1.024456 | 15.229091 |
| 2 | finished | 5 | 0.759018 | 1.812801 |
| 3 | finished | 7 | 0.98481 | 0.11427 |
| 4 | finished | 11 | 0.999999 | 1.1e-05 |
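
With the results collected, selecting the hyperparameter reduces to a lookup. A minimal sketch, with the values copied by hand from the table above into a pandas DataFrame (exporting them directly from `ExperimentData` is not shown here):

```python
import pandas as pd

# Values copied from the results table above
results = pd.DataFrame({
    "degree": [1, 3, 5, 7, 11],
    "r2": [-0.190814, -1.024456, 0.759018, 0.98481, 0.999999],
    "mse": [8.957968, 15.229091, 1.812801, 0.11427, 1.1e-05],
})

# Pick the row with the lowest test MSE
best = results.loc[results["mse"].idxmin()]
print(int(best["degree"]))  # 11
```

Keep in mind that these scores come from a single train/test split, so the selected degree reflects this particular split.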
