# Data-Driven Design and Analysis of Structures and Materials

### Instructor: Miguel A. Bessa

## Homework 7

Coding exercises to explore the data-driven process and the [`f3dasm`](https://f3dasm.readthedocs.io/en/latest/) package.

**General instructions**:

- Read the questions and answer in the cells under the "PUT YOUR CODE IN THE CELL BELOW" message.
- Work through the notebook and make sure you fill in any place that says `YOUR CODE HERE` or `YOUR ANSWER HERE`. You should start by removing the `raise NotImplementedError()` line in those cells.
- After "END OF YOUR CODE", there is a cell that contains simple tests (with `assert` statements) to check whether you did the exercise correctly. Not all exercises have tests! If you run the cell containing the tests and no error is raised, you have successfully solved the exercise!
- Make sure you have the right version of `f3dasm` (1.5.4).

- **ONLY WORK ON THE EXERCISE IN A JUPYTER NOTEBOOK ENVIRONMENT**

> The homework assignments are generated and automatically graded by the `nbgrader` extension. If you open and save the notebook in Google Colab, metadata from Colab will be added, and the `nbgrader` metadata will be altered. As a result, `nbgrader` will be unable to automatically grade your homework. Therefore, please work on the notebook in Jupyter Notebook.

- **DO NOT ADD OR REMOVE CELLS IN THE NOTEBOOK**

> Most cells containing tests are set to read-only, but VS Code can bypass this restriction. Modifying or removing cells in the notebook may disrupt the `nbgrader` system, preventing automatic grading of your homework.

**Instructions for handing in the homework**

- Upload the Jupyter Notebook (`.ipynb` file) to Canvas.

If you have any questions about the homework, send an email to Miguel or Martin (m.p.vanderschelling@tudelft.nl).

The cell below checks if you have the right version of `f3dasm` (1.5.4):

In [None]:
import f3dasm
import IPython

# Check if the f3dasm version is correct
assert f3dasm.__version__ == '1.5.4', "Your version of f3dasm is incorrect, please update it."

# Check if the IPython version is at least 3
assert IPython.version_info[0] >= 3, "Your version of IPython is too old, please update it."

# If no assert statements are triggered, print a success message
print("Your environment is set up correctly :)")

You can put your name in the cell below:

In [None]:
NAME = ""

---

### Homework 7

In this homework you will explore the basic usage of the `f3dasm` package.

At the end of this homework you will know:
- how to do Design of Experiments, including creating a `Domain` object and creating an `ExperimentData` object from a numpy array and a pandas DataFrame.
- how to define the Data Generation module, including creating your custom evaluation function.
- how to save your `ExperimentData` and later retrieve it from disk.
- how to use the package to do simple model selection.

In [None]:
# Import some packages we might need later
import numpy as np

---
### Exercise 1

1.1 Consider the function $f(x) = x \, \sin(x)$ in the domain $x \in [0, 10]$.

Do the Design of Experiments manually by creating a `f3dasm.ExperimentData` dataset with $50$ input points that are equally spaced within those bounds.

You are going to do this step-by-step!

---

- Create a `Domain` object called `my_domain` and add the input parameter $x$ to it. Make sure the lower bound of the variable is $0.0$ and the upper bound is $10.0$.

PUT YOUR CODE IN THE CELL BELOW

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

END OF YOUR CODE

In [None]:
# This cell checks if you did the exercise correctly
assert isinstance(my_domain, Domain), "my_domain is not an instance of the Domain class!"
assert 'x' in my_domain.space, "There is no parameter named 'x' in your domain!"
assert my_domain.space['x'].lower_bound == 0.0, "The lower bound of the parameter x is not 0.0"
assert my_domain.space['x'].upper_bound == 10.0, "The upper bound of the parameter x is not 10.0"

---

- Create a `numpy` vector `x_data` of $50$ points that are equally spaced between $0$ and $10$.
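As a small illustration of what "equally spaced" means here, the sketch below uses a hypothetical 5-point grid on $[0, 1]$ (not the grid asked for in this exercise). It assumes `np.linspace`, which includes both endpoints by default, so $n$ points on $[a, b]$ are separated by $(b - a)/(n - 1)$.

```python
import numpy as np

# Hypothetical example grid: 5 points on [0, 1], both endpoints included
grid = np.linspace(0.0, 1.0, 5)

# With both endpoints included, the spacing is (1 - 0) / (5 - 1) = 0.25
step = grid[1] - grid[0]

print(grid)
print(step)
```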

PUT YOUR CODE IN THE CELL BELOW

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

END OF YOUR CODE

In [None]:
# This cell checks if you did the exercise correctly
assert np.isclose(x_data[3], 0.61224), "The value of x_data in position 3 is not correct!"
assert np.isclose(x_data[6], 1.22448), "The value of x_data in position 6 is not correct!"
assert np.isclose(x_data[-1], 10.0), "The value of x_data in the last position is not correct!"

---

- Create a new `ExperimentData` object called `my_experimentdata` with the input data `x_data` and the domain object created in the first step.

PUT YOUR CODE IN THE CELL BELOW

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

END OF YOUR CODE

---

- Create a function $f(x)$ that computes $x \, \sin(x)$.

PUT YOUR CODE IN THE CELL BELOW

In [None]:
def f(x: float) -> float:
    # YOUR CODE HERE
    raise NotImplementedError()
    return y

END OF YOUR CODE

In [None]:
# This cell checks if you did the exercise correctly
assert np.isclose(f(1), 0.84147), "When prompting the function with input x=1, the output is not correct!"
assert np.isclose(f(2), 1.81859), "When prompting the function with input x=2, the output is not correct!"

---

- Evaluate the input data of the `f3dasm.ExperimentData` object with the function $f(x)$ and name the output `y`.

PUT YOUR CODE IN THE CELL BELOW

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

END OF YOUR CODE

In [None]:
# This cell checks if you did the exercise correctly
assert all(np.isclose(my_experimentdata.to_numpy()[1].ravel(), f(x_data))), "The output of the experimentdata is not correct!"

---

1.2 Plot the function using the 50 points that you defined. Label the x-axis as "x" and the y-axis as "y", and give the plot the title "Exercise 1".

PUT YOUR CODE IN THE CELL BELOW

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

END OF YOUR CODE

---

1.3. Make a new folder called "HW7_exercise1" and save the `ExperimentData` object to this folder.

> Note: Make sure you put a relative path as the storing location!
>
> So: `./HW7_exercise1` = good
>
> But: `/home/username/Documents/GitHub/3dasm_course/Assignments/your_Assignments/HW7_exercise1` = bad
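The difference between the two kinds of path can be checked programmatically. The sketch below uses the standard-library `pathlib` with POSIX path rules; the absolute path shown is a shortened, hypothetical stand-in for the one in the note above. A relative path is resolved against the folder the notebook runs in, so it keeps working when the notebook is moved to another machine.

```python
from pathlib import PurePosixPath

# Relative path: resolved against the current working directory
relative = PurePosixPath("./HW7_exercise1")

# Hypothetical absolute path: tied to one specific machine and user
absolute = PurePosixPath("/home/username/HW7_exercise1")

print(relative.is_absolute())
print(absolute.is_absolute())
```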

PUT YOUR CODE IN THE CELL BELOW

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

END OF YOUR CODE

---

1.4. Load the `ExperimentData` object you saved previously into a variable called `my_loaded_experimentdata`. Print this `ExperimentData` object and check that it is the same one you saved.

PUT YOUR CODE IN THE CELL BELOW

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

END OF YOUR CODE

In [None]:
# This cell checks if you did the exercise correctly
assert my_loaded_experimentdata == my_experimentdata, "The experimentdata in memory and the reloaded experimentdata are not the same!" 

---
### Exercise 2

In this exercise, you will use `f3dasm` to train different Gaussian Process Regressor (GPR) models with different kernels. While you've done a similar exercise before, you'll find that `f3dasm` simplifies the workflow, making the process more efficient.

2.1 Convert `my_experimentdata` from the previous exercise into two `numpy` arrays, `X` and `Y`, where `X` contains the input data and `Y` contains the output data.

PUT YOUR CODE IN THE CELL BELOW

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

END OF YOUR CODE

In [None]:
# This cell checks if you did the exercise correctly
assert isinstance(X, np.ndarray), "X is not a numpy array"
assert isinstance(Y, np.ndarray), "Y is not a numpy array"
assert X.shape == (50, 1), "The shape of X is not correct"

---

Let's add Gaussian noise ($z$) to the output $y$, such that:

$$
z \sim \mathcal{N}(0, \sigma_z^2), \quad \sigma_z \sim \mathcal{U}(0.5, 1.5)
$$




In [None]:
rng = np.random.default_rng(seed=123) # Create a new random number generator
std_devs = rng.uniform(0.5, 1.5, size=Y.shape) # Generate random standard deviations between 0.5 and 1.5
z = rng.normal(0.0, std_devs) # Sample from a Gaussian distribution with the standard deviations
Y += z # Add Gaussian noise to the data output

---

2.2 With the `train_test_split` function of `scikit-learn`, split the dataset into a train and a test set with an 80/20 ratio, using $123$ as the random seed. Name the resulting arrays `X_train`, `X_test`, `y_train` and `y_test`.
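For reference, here is a minimal sketch of how `train_test_split` behaves on hypothetical toy data, with a different ratio and seed than the ones asked for above. All variable names in it are illustrative. The `test_size` argument sets the held-out fraction and `random_state` makes the shuffle reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples with one feature and one target each
X_toy = np.arange(10, dtype=float).reshape(-1, 1)
y_toy = 2.0 * X_toy

# Hold out 30% of the samples; random_state fixes the shuffle
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=0
)

print(X_tr.shape, X_te.shape)
```

Inputs and targets are shuffled together, so each row of `y_tr` still belongs to the same sample as the matching row of `X_tr`.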

PUT YOUR CODE IN THE CELL BELOW

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

END OF YOUR CODE

In [None]:
# This cell checks if you did the exercise correctly
assert (X_train.shape == y_train.shape == (40, 1)), "The shape of X_train or y_train is not correct!"
assert (X_test.shape == y_test.shape == (10, 1)), "The shape of X_test or y_test is not correct"

2.3 Create a dictionary called `kernels` where the keys are `'RBF'`, `'Matern'` and `'ExpSineSquared'` and the values are the respective kernel functions from [`sklearn.gaussian_process.kernels`](https://scikit-learn.org/1.5/api/sklearn.gaussian_process.html#module-sklearn.gaussian_process.kernels).

PUT YOUR CODE IN THE CELL BELOW

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

END OF YOUR CODE

In [None]:
# This cell checks if you did the exercise correctly
from sklearn.gaussian_process.kernels import Kernel
assert isinstance(kernels, dict)
assert all(kernel_name in kernels for kernel_name in ['RBF', 'Matern', 'ExpSineSquared'])
assert all(isinstance(kernel, Kernel) for kernel in kernels.values()), "Not all values in kernels are Kernel instances!"

2.4 Create a new domain object named `domain_kernel` and add a categorical parameter called `kernel`. Add a list of the dictionary keys, `list(kernels.keys())`, as the available categories.

PUT YOUR CODE IN THE CELL BELOW

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

END OF YOUR CODE

In [None]:
# This cell checks if you did the exercise correctly
assert isinstance(domain_kernel, Domain), "domain_kernel is not a Domain instance!"
assert 'kernel' in domain_kernel.space, "There is no parameter named 'kernel' in the domain!"

2.5 Create a function `evaluate_regressor` that takes the `kernel` variable (the name of the kernel) as input.
- The function creates a Gaussian process regressor with the kernel given by the input of the function
- The model is trained on the training data and predicts the testing data
- The function returns the $R^2$ and MSE on the test data
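As a reminder of what the two metrics measure, the sketch below computes them by hand with `numpy` on hypothetical toy values. It uses the standard definitions $\text{MSE} = \frac{1}{n}\sum_i (y_i - \hat{y}_i)^2$ and $R^2 = 1 - SS_{res}/SS_{tot}$. Computing them this way is only an illustration, not necessarily how you should implement the exercise.

```python
import numpy as np

# Hypothetical true and predicted values
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 2.0, 2.5, 4.0])

# Mean squared error: average of the squared residuals
mse = np.mean((y_true - y_pred) ** 2)

# R^2: one minus the ratio of residual to total sum of squares
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(mse, r2)
```

Note that $R^2$ becomes negative whenever the model's residuals are larger than those of a constant prediction equal to the mean of `y_true`, so a negative score on the test set is possible for a poorly suited kernel.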

PUT YOUR CODE IN THE CELL BELOW

In [None]:
# You might want to import some functions ..
# YOUR CODE HERE
raise NotImplementedError()

def evaluate_regressor(kernel: str):
    # YOUR CODE HERE
    raise NotImplementedError()
    return r2, mse

END OF YOUR CODE

2.6 Create an `ExperimentData` object called `experimentdata_gpr` and add the experiments in `Data_kernels` (this array is given below). Evaluate the experiments with the `evaluate_regressor` function.

PUT YOUR CODE IN THE CELL BELOW

In [None]:
# The array of kernel names
Data_kernels = np.array(['RBF', 'Matern', 'ExpSineSquared'])

# YOUR CODE HERE
raise NotImplementedError()

END OF YOUR CODE

In [None]:
assert np.isclose(experimentdata_gpr.to_pandas()[1].astype(float).to_numpy(),
                  np.array([[-3.39304466, 38.55734305],
                           [-0.06724966,  9.36715066],
                           [ 0.6915303 ,  2.70740975]])
                 ).all(), "The values of R2 and MSE are not correct"

End of the homework!

---