# ENGN2350 Data-Driven Design and Analysis of Structures and Materials

_Homeworks for fall semester 2025-2026_

Coding exercises to explore the [`f3dasm`](https://f3dasm.readthedocs.io/en/latest/) package.

**General instructions**:

- Read the questions and answer in the cells under the "PUT YOUR CODE IN THE CELL BELOW" message.
- Work through the notebook and make sure you fill in any place that says `YOUR CODE HERE` or `YOUR ANSWER HERE`. You can remove the `raise NotImplementedError()` line.
- After "END OF YOUR CODE", there is a cell that contains simple tests (with `assert` statements) to check whether you did the exercise correctly. Not all exercises have tests! If you run the cell containing the tests and no error is raised, you have successfully solved the exercise!
- Make sure you have the right version of `f3dasm` (2.1.0).

> You can check your `f3dasm` version by running `pip show f3dasm`

- **ONLY WORK ON THE EXERCISE IN A JUPYTER NOTEBOOK ENVIRONMENT**

> The homework assignments are generated and automatically graded by the `nbgrader` extension. If you open and save the notebook in Google Colab, metadata from Colab will be added, and the `nbgrader` metadata will be altered. As a result, `nbgrader` will be unable to automatically grade your homework. Therefore, we kindly ask students to only work on the notebook in Jupyter Notebook.

- **DO NOT ADD OR REMOVE CELLS IN THE NOTEBOOK**

> Most cells containing tests are set to read-only, but VS Code can bypass this restriction. Modifying or removing cells in the notebook may disrupt the `nbgrader` system, preventing automatic grading of your homework.
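
The `f3dasm` version check mentioned above can also be done from inside the notebook. This is a minimal sketch using only the Python standard library; the helper name `installed_version` is ours for illustration and is not part of `f3dasm`:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# The homework expects "2.1.0"; None means f3dasm is not installed in this environment
print(installed_version("f3dasm"))
```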

**Instructions for handing in the homework**

- Upload the Jupyter Notebook (`.ipynb` file) to Canvas.

If you have any questions about the homework, send an email to Samik (samik_mukhopadhyay@brown.edu) or Elvis (elvis_alexander_aguero_vera@brown.edu).

**Grading**

- In each homework, you can obtain a maximum of 20 points.
- Next to each subquestion, the maximum number of obtainable points is listed.

Good luck!

You can put your name in the cell below:

In [None]:
NAME = ""

---

## Homework 8

In this homework you will explore advanced usage of the `f3dasm` package.

At the end of this homework you will know:
- how to use the built-in defaults of `f3dasm`
- how to evaluate different sampling techniques in a structured way with `f3dasm`
- how to use `f3dasm` for hyperparameter analysis for classification methods


In [None]:
# Import some packages we might need later
import numpy as np
import f3dasm

---
### Exercise 1

We will explore the default built-in implementations that are available in `f3dasm`.
You can learn more about those in the [documentation page](https://f3dasm.readthedocs.io/en/latest/rst_doc_files/defaults.html).

---

1.1 _(1 point)_ Create a `Domain` with one array parameter named `'x'` with bounds $-32.768$ and $32.768$ and shape `(3,)`, and name the domain `domain_3d`:

PUT YOUR CODE IN THE CELL BELOW

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

END OF YOUR CODE

In [None]:
# This cell checks if you did the exercise correctly
assert isinstance(domain_3d, Domain)
assert 'x' in domain_3d.array.input_names
assert domain_3d.input_space['x'].shape == (3,)
assert np.allclose(domain_3d.input_space['x'].lower_bound, -32.768)
assert np.allclose(domain_3d.input_space['x'].upper_bound, 32.768)

---

1.2 _(1 point)_ Use the built-in Latin hypercube sampler to create 20 design points and name the result `experimentdata_3d`. Set the seed to $123$.
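
For intuition about what a Latin hypercube design looks like, here is a minimal sketch that uses `scipy.stats.qmc` instead of `f3dasm` (an assumption made purely for illustration; the exercise itself should use the `f3dasm` built-in sampler, whose points will differ). Each dimension is divided into as many equal strata as there are points, and every stratum receives exactly one point:

```python
import numpy as np
from scipy.stats import qmc

n_points, n_dims = 20, 3

# Draw a Latin hypercube design in the unit cube
sampler = qmc.LatinHypercube(d=n_dims, seed=123)
unit_samples = sampler.random(n=n_points)

# Rescale the unit-cube design to the bounds used in this exercise
samples = qmc.scale(unit_samples, [-32.768] * n_dims, [32.768] * n_dims)

# Stratum index (0..19) of every point, computed per dimension
strata = np.floor(unit_samples * n_points).astype(int)

print(samples.shape)
print(np.sort(strata, axis=0)[:, 0])  # every stratum index appears exactly once
```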

PUT YOUR CODE IN THE CELL BELOW

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

END OF YOUR CODE

---

1.3 _(2 points)_ Plot all the points in 2D scatter plots for all possible combinations of the input features, i.e. every pair of features as $x$ and $y$ of the scatter plots. Observe the sampling points and how they are distributed in the domain.
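
With three input features there are three unordered pairs to plot. A small sketch of enumerating them with `itertools.combinations` follows; the labels `x0`, `x1`, `x2` are hypothetical placeholders for the three components of `'x'`, not names defined by `f3dasm`:

```python
from itertools import combinations

feature_names = ["x0", "x1", "x2"]  # hypothetical labels for the 3 components of 'x'

# Every unordered pair of feature indices corresponds to one 2D scatter plot
pairs = list(combinations(range(len(feature_names)), 2))

for i, j in pairs:
    print(f"scatter plot: {feature_names[i]} (horizontal) vs {feature_names[j]} (vertical)")
```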

PUT YOUR CODE IN THE CELL BELOW

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

END OF YOUR CODE

---

1.4 _(1 point)_ Evaluate all the designs with the built-in ['Ackley'](https://www.sfu.ca/~ssurjano/ackley.html) function
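
For reference, the linked page defines the Ackley function for a $d$-dimensional input $\mathbf{x}$, with recommended constants $a = 20$, $b = 0.2$ and $c = 2\pi$, as

$$
f(\mathbf{x}) = -a \exp\left(-b \sqrt{\frac{1}{d}\sum_{i=1}^{d} x_i^2}\right) - \exp\left(\frac{1}{d}\sum_{i=1}^{d} \cos(c\,x_i)\right) + a + e
$$

The NumPy sketch below implements this textbook form so you can sanity-check values by hand. It is not the `f3dasm` implementation, which may apply its own scaling or offsets:

```python
import numpy as np


def ackley(x, a: float = 20.0, b: float = 0.2, c: float = 2 * np.pi) -> float:
    """Textbook Ackley function for a single d-dimensional point x."""
    x = np.asarray(x, dtype=float)
    d = x.size
    term1 = -a * np.exp(-b * np.sqrt(np.sum(x**2) / d))
    term2 = -np.exp(np.sum(np.cos(c * x)) / d)
    return float(term1 + term2 + a + np.e)


# The global minimum lies at the origin, where the value is 0 up to rounding
print(ackley(np.zeros(3)))
print(ackley([1.0, 1.0, 1.0]))
```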

PUT YOUR CODE IN THE CELL BELOW

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

END OF YOUR CODE

---

1.5 _(1 point)_ Optimize this experiment with the built-in Nelder-Mead optimizer for $50$ iterations:
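
As a point of comparison outside `f3dasm`, the Nelder-Mead simplex method is also available in SciPy. The sketch below caps it at 50 iterations on a simple quadratic, which is a stand-in objective chosen for illustration and not the function used in this exercise:

```python
import numpy as np
from scipy.optimize import minimize


def sphere(x) -> float:
    """Simple convex test objective with its minimum of 0 at the origin."""
    return float(np.sum(np.asarray(x, dtype=float) ** 2))


x0 = np.array([1.0, 1.0, 1.0])

# maxiter limits the number of simplex iterations, not the number of function evaluations
result = minimize(sphere, x0, method="Nelder-Mead", options={"maxiter": 50})

print(result.nit, result.fun)
```

A derivative-free method like this only needs function values, which is why it can be applied directly to sampled designs.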

PUT YOUR CODE IN THE CELL BELOW

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

END OF YOUR CODE

---
### Exercise 2

In this exercise, we are going to train a Gaussian process regressor with data from built-in benchmark functions.
We are going to vary the benchmark function, the sampling technique, and the number of training points.

Before we start exploring, we want to create a dataset with the built-in benchmark functions.

We would like to:
- .. create a domain with an array parameter named `'x'` with bounds $x \in [-5, 5]$ ;
- .. sample $N$ points with a given sampler, using $123$ as the seed for the random number generator;
- .. evaluate the samples on a given benchmark function and use `'y'` as the output name

2.1 _(2 points)_ Write a function `create_data` that takes the number of sampling points, the type of built-in sampler, and the built-in benchmark function to be evaluated. The function returns an `f3dasm.ExperimentData` object with those samples, evaluated by the benchmark function.

PUT YOUR CODE IN THE CELL BELOW

In [None]:
# YOUR CODE HERE
raise NotImplementedError()
def create_data(n_samples: int, sampler: str, benchmark_fn: str) -> f3dasm.ExperimentData:
    # YOUR CODE HERE
    raise NotImplementedError()
    return experiment_data

END OF YOUR CODE

In [None]:
# This cell checks if you did the exercise correctly 
import numpy as np
data_rastrigin = create_data(n_samples=300, sampler='random', benchmark_fn='rastrigin')
data_ackley = create_data(n_samples=300, sampler='latin', benchmark_fn='ackley')

---
2.2 _(1 point)_ Next, we are going to create a helper function that trains the model and returns error metrics in order to evaluate the predictions.


Create a function `evaluate_regressor` that takes the training data `X_train` and `y_train`, the testing data `X_test` and `y_test`, and the scikit-learn model `model`.

The function should:

- standard scale both the input and the output data first
- use the `model.fit` function to fit `X_train` and `y_train`
- predict on `X_test` to create `y_pred`
- calculate the $R^2$ and MSE on the testing data and return the two metrics as a tuple of two floats
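
For reference, with test targets $y_i$, predictions $\hat{y}_i$ and test-set mean $\bar{y}$ over $n$ test points, the two metrics are commonly defined as follows. These are the standard definitions, which also correspond to `sklearn.metrics.mean_squared_error` and `sklearn.metrics.r2_score`:

$$
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
$$

If the metrics are computed on standard-scaled outputs, the MSE is expressed in scaled (unitless) terms, whereas $R^2$ is unchanged when the same linear rescaling is applied to both targets and predictions.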

*Hint*: if you want to test your code, you can export the resulting `ExperimentData` from a `create_data` function call to a `numpy` array. However, the resulting `numpy` arrays are actually a nested list of numpy arrays. In order to stack the arrays, you can use the following code:

```python
X = np.stack(X.flatten())
```
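
To see what the stacking step in the hint does, here is a toy sketch. The `(3, 1)` object array below only mimics the nested structure described above; the exact shape returned by `f3dasm` is an assumption here:

```python
import numpy as np

# Build a (3, 1) object array whose entries are themselves 1D arrays,
# mimicking the nested structure described in the hint
nested = np.empty((3, 1), dtype=object)
for i in range(3):
    nested[i, 0] = np.array([float(i), float(i) + 0.5])

# Flatten to a 1D sequence of arrays, then stack them into one regular 2D array
stacked = np.stack(nested.flatten())

print(nested.shape, nested.dtype)
print(stacked.shape, stacked.dtype)
```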

---

PUT YOUR CODE IN THE CELL BELOW

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

def evaluate_regressor(X_train: np.ndarray, y_train: np.ndarray, X_test: np.ndarray, y_test: np.ndarray, model):
    # YOUR CODE HERE
    raise NotImplementedError()
    return r2, mse

END OF YOUR CODE

In [None]:
# This cell checks if you did the exercise correctly 
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.model_selection import train_test_split
X, y = data_ackley.to_numpy()

# X and y are nested list of arrays. The following operations make a single array for all samples and reshapes
X = np.stack(X.flatten())
y = np.stack(y.flatten())


X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=123)
model = GaussianProcessRegressor(random_state=123)
r2, mse = evaluate_regressor(X_train, y_train, X_test, y_test, model)

---
2.3 _(4 points)_ Now we are putting everything together in a custom `DataGenerator` class:

1. Create a new `Domain` object with 3 parameters:

- `n_samples`, the number of training samples, as a categorical variable with possible values $12$, $25$ and $150$
- `sampler`, the chosen built-in sampler, as a categorical variable with possible values `'random'`, `'sobol'` and `'latin'`
- `benchmark_fn`, the chosen built-in benchmark function, as a categorical variable with possible values `'Styblinski Tang'`, `'Branin'` and `'Sphere'`

2. Create an `ExperimentData` object from the domain and use the built-in `'grid'` sampler to create all possible combinations of `n_samples`, `sampler` and `benchmark_fn`
> Hint: you should have an ExperimentData object with 27 different experiments

3. Create an `EvaluateRegressor` class that inherits from `f3dasm.datageneration.DataGenerator`
- Inside the `execute()` method:
  - extract the values `n_samples` (type=`int`), `sampler` (type=`str`) and `benchmark_fn` (type=`str`) from the `experiment_sample` argument
  - Create `X_train` and `y_train` by calling the `create_data` function (created earlier) with the extracted values
  - Create `X_test` and `y_test` by calling the `create_data` function, but set `n_samples=100` and `sampler='random'`
  - Create a GaussianProcessRegressor from `sklearn` with random seed $123$
  - Call the `evaluate_regressor` function with the training data, testing data and GPR model
  - Store the $R^2$ and the MSE values in the `experiment_sample` with names `r2` and `mse`

4. Create an instance of the `EvaluateRegressor` class and `evaluate` the `ExperimentData` object.
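
The count of 27 experiments in the hint for step 2 follows from taking every combination of the three categorical parameters ($3 \times 3 \times 3$). A plain-Python sketch of the same full-factorial enumeration, independent of `f3dasm`:

```python
from itertools import product

n_samples_options = [12, 25, 150]
sampler_options = ["random", "sobol", "latin"]
benchmark_options = ["Styblinski Tang", "Branin", "Sphere"]

# Full-factorial grid: one tuple per (n_samples, sampler, benchmark_fn) combination
grid = list(product(n_samples_options, sampler_options, benchmark_options))

print(len(grid))
print(grid[0])
```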

---

PUT YOUR CODE IN THE CELL BELOW

In [None]:
# You might want to import some classes/functions
# YOUR CODE HERE
raise NotImplementedError()

# YOUR CODE HERE
raise NotImplementedError()

END OF YOUR CODE

---
2.4 _(1 point)_ Reflect on your findings: Can you explain why the performance differs when changing the number of training points and the sampling technique?

---

PUT YOUR WRITTEN ANSWER IN THE CELL BELOW

YOUR ANSWER HERE

END OF YOUR WRITTEN ANSWER

---
### Exercise 3

We can also do hyperparameter investigation effectively with `f3dasm`.

For the following exercise, we are going to investigate the influence of the regularization parameter $C$ of the [Support Vector Classifier](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html) on the [iris dataset](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html#sklearn.datasets.load_iris).

3.1 _(1 point)_ Load the iris dataset from `scikit-learn`, select only the first two features ('sepal length (cm)' and 'sepal width (cm)') and split the data into training and test sets with a 75/25 ratio. Set the seed for the random number generator to $123$.

---

PUT YOUR CODE IN THE CELL BELOW

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

END OF YOUR CODE

In [None]:
# This cell checks if you did the exercise correctly
assert (X_train[0] == [5.4, 3.9]).all()

---

3.2 _(1 point)_ Create a domain object called `domain` with a continuous parameter `C` and construct an `ExperimentData` object called `experiment_data` with `Data_c` (given below).
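
The `Data_c` values given in the cell below are logarithmically spaced, running from $10^{-2.3} \approx 0.005$ to $10^{2.5} \approx 316$. A short NumPy sketch that makes this spacing explicit:

```python
import numpy as np

Data_c = np.logspace(-2.3, 2.5, 40)

# logspace spaces the exponents evenly, so consecutive values share a constant ratio
ratios = Data_c[1:] / Data_c[:-1]

print(Data_c[0], Data_c[-1])
print(np.allclose(ratios, ratios[0]))
```

Because the values span several orders of magnitude, they are naturally viewed on a logarithmic axis.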

---

PUT YOUR CODE IN THE CELL BELOW

In [None]:
Data_c = np.logspace(-2.3, 2.5, 40)
# YOUR CODE HERE
raise NotImplementedError()

END OF YOUR CODE

In [None]:
# This cell checks if you did the exercise correctly
assert isinstance(domain, Domain)
assert 'C' in domain.input_names
assert domain.input_space['C']
assert isinstance(experiment_data, ExperimentData)

---

3.3 _(2 points)_ Fill in the `evaluate_classifier` and `execute` methods of the `EvaluateClassifier` class below:
- The `evaluate_classifier` method takes in the model, fits it to `self.X_train` and `self.y_train`, predicts the labels of `self.X_test` and returns the accuracy of the predictions
- The `execute` method extracts the `C` parameter from the `experiment_sample` argument, creates the Support Vector Classifier with the extracted $C$ value and `random_state=123`, calls the `evaluate_classifier` method and stores the accuracy value back in the `experiment_sample`.
  

---

PUT YOUR CODE IN THE CELL BELOW

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

class EvaluateClassifier(f3dasm.datageneration.DataGenerator):
    def __init__(self, X_train, y_train, X_test, y_test):
        self.X_train = X_train
        self.y_train = y_train
        self.X_test = X_test
        self.y_test = y_test
        
    def evaluate_classifier(self, model) -> float:
        # YOUR CODE HERE
        raise NotImplementedError()
        return accuracy
    
    def execute(self, experiment_sample) -> None:
        # YOUR CODE HERE
        raise NotImplementedError()
        return experiment_sample

END OF YOUR CODE

---

3.4 _(2 points)_ Evaluate all the experiments and plot the accuracy with respect to the regularization parameter $C$. Set the scale of the x-axis to be logarithmic.

---

PUT YOUR CODE IN THE CELL BELOW

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

END OF YOUR CODE

---

End of the homework!

---