# Homework 4

For this assignment, you will be developing an artificial neural network to classify data given in the __[Dry Beans Data Set](https://archive.ics.uci.edu/ml/datasets/Dry+Bean+Dataset#)__. This data set was obtained as part of a research study at Selcuk University, Turkey, in which a computer vision system was developed to distinguish seven different registered varieties of dry beans with similar features. More details on the study can be found in the following __[research paper](https://www.sciencedirect.com/science/article/pii/S0168169919311573)__. <br>
### **Make sure to use the lecture notebooks introducing Keras and cross validation, located [here](https://colab.research.google.com/drive/1ksEGL7SJ_wutCIyPYx7Loe5EPdOij6dJ?usp=sharing) and [here](https://colab.research.google.com/drive/1C9Mwf1J2ril1Q4l6n2BjQMb8YaFySG5_?usp=sharing)**.

## About the Data Set
Seven different types of dry beans were used in a study at Selcuk University, Turkey, taking into account features such as form, shape, type, and structure according to the market situation. A computer vision system was developed to distinguish seven different registered varieties of dry beans with similar features in order to obtain uniform seed classification. For the **classification** model, images of 13611 grains of 7 different registered dry bean varieties were taken with a high-resolution camera. The bean images obtained by the computer vision system were subjected to segmentation and feature extraction stages, and a total of 16 features (12 dimensions and 4 shape forms) were obtained from the grains.

Number of Instances (records in the data set): __13611__

Number of Attributes (fields within each record, including the class): __17__

### Data Set Attribute Information:

1. __Area (A)__ : The area of a bean zone, measured as the number of pixels within its boundaries.
2. __Perimeter (P)__ : The bean circumference, defined as the length of its border.
3. __Major axis length (L)__ : The distance between the ends of the longest line that can be drawn from a bean.
4. __Minor axis length (l)__ : The longest line that can be drawn from the bean while standing perpendicular to the main axis.
5. __Aspect ratio (K)__ : The ratio of the major to the minor axis length, $K = L/l$.
6. __Eccentricity (Ec)__ : Eccentricity of the ellipse having the same moments as the region.
7. __Convex area (C)__ : Number of pixels in the smallest convex polygon that can contain the area of a bean seed.
8. __Equivalent diameter (Ed)__ : The diameter of a circle having the same area as a bean seed area.
9. __Extent (Ex)__ : The ratio of the bean area to the area (pixel count) of its bounding box.
10. __Solidity (S)__ : Also known as convexity. The ratio of the pixels in the bean to those in its convex shell, $S = A/C$.
11. __Roundness (R)__ : Calculated with the following formula: $R = \frac{4\pi A}{P^2}$
12. __Compactness (CO)__ : Measures the roundness of an object: $CO = \frac{Ed}{L}$
13. __ShapeFactor1 (SF1)__
14. __ShapeFactor2 (SF2)__
15. __ShapeFactor3 (SF3)__
16. __ShapeFactor4 (SF4)__

17. __Classes : *Seker, Barbunya, Bombay, Cali, Dermason, Horoz, Sira*__
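
Several of the attributes above are derived from the basic measurements, so the definitions can be sanity-checked against the first row of the data set (index 0, class SIRA) shown in the Exercise 1.1 output. The sketch below recomputes them in plain Python. The eccentricity formula $\sqrt{1 - (l/L)^2}$ is an assumption on our part (the standard formula for an ellipse with axis lengths $L$ and $l$); Extent and the shape factors are skipped because the bounding box and the shape-factor formulas are not given here.

```python
import math

# First row of Dry_Beans_Dataset.csv as displayed in the notebook (index 0, SIRA)
A = 44830              # Area
P = 814.955            # Perimeter
L_major = 320.731947   # MajorAxisLength (L)
l_minor = 178.405838   # MinorAxisLength (l)
C = 45297              # ConvexArea

derived = {
    "AspectRation": L_major / l_minor,                       # K = L / l
    "Eccentricity": math.sqrt(1 - (l_minor / L_major) ** 2),  # assumed ellipse formula
    "EquivDiameter": math.sqrt(4 * A / math.pi),             # circle with the same area
    "Solidity": A / C,                                       # S = A / C
    "roundness": 4 * math.pi * A / P ** 2,                   # R = 4*pi*A / P^2
}
derived["Compactness"] = derived["EquivDiameter"] / L_major  # CO = Ed / L

# Values stored in the CSV for the same row
reported = {
    "AspectRation": 1.797766, "Eccentricity": 0.831018, "EquivDiameter": 238.912806,
    "Solidity": 0.989690, "roundness": 0.848226, "Compactness": 0.744899,
}

for name, value in derived.items():
    print(f"{name:>13}: computed {value:.6f}, reported {reported[name]:.6f}")
```

Every recomputed value agrees with the stored one to about six significant digits, which supports the corrected Extent and Solidity definitions (both ratios are below 1 in the data).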

### Libraries that can be used :
- NumPy, SciPy, Pandas, scikit-learn, TensorFlow, Keras. You may also use PyTorch (though support may be limited).
- Any other library used during the lectures and discussion sessions.

### Other Notes
- Don't worry about not being able to achieve high accuracy; it is neither the goal nor the grading standard of this assignment.
- Discussion and lecture materials should be helpful for completing the assignments.
- The homework submission should be a .ipynb file.


In [14]:
!git clone https://github.com/ucsd-cse151a-ss25/hw4.git



In [15]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, MinMaxScaler


## Exercise 1 : Building a Feed-Forward Neural Network (50 points)

### Exercise 1.1 : Data Preprocessing (10 points)

- As the classes are categorical, use one-hot encoding to represent the set of classes. You will find this useful when developing the output layer of the neural network.
- Split the data into training and testing set by __90:10__ and use the training set for training the model and the test set to evaluate the model performance. Please set verbose=0 to suppress output during training.
- Normalize each field of the input data using the min-max normalization technique.

__Notes:__

- Splitting of the dataset should be done __before__ the normalization step and __after__ the one-hot encoding.
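
The ordering in the note matters because min-max normalization, $x' = (x - x_{\min}) / (x_{\max} - x_{\min})$ per column, must use statistics from the training split only; otherwise the test set leaks into preprocessing. A minimal NumPy sketch with made-up toy arrays (in the notebook, `MinMaxScaler` does this job) shows the visible consequence: test rows can land outside $[0, 1]$, which is expected and harmless.

```python
import numpy as np

def minmax_fit(X_train):
    """Per-column minimum and maximum, computed from the training rows only."""
    return X_train.min(axis=0), X_train.max(axis=0)

def minmax_transform(X, col_min, col_max):
    """Scale each column with the training statistics: (x - min) / (max - min)."""
    span = np.where(col_max > col_min, col_max - col_min, 1.0)  # guard constant columns
    return (X - col_min) / span

# Hypothetical toy data: 3 training rows and 1 test row, 2 features
X_train = np.array([[1.0, 10.0],
                    [3.0, 20.0],
                    [5.0, 30.0]])
X_test = np.array([[6.0, 5.0]])

col_min, col_max = minmax_fit(X_train)
X_train_scaled = minmax_transform(X_train, col_min, col_max)
X_test_scaled = minmax_transform(X_test, col_min, col_max)
print(X_train_scaled)  # rows [0, 0], [0.5, 0.5], [1, 1]
print(X_test_scaled)   # 1.25 and -0.25: outside [0, 1] because min/max come from training
```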

In [16]:
df = pd.read_csv("hw4/Dry_Beans_Dataset.csv")

In [17]:
df

| | Area | Perimeter | MajorAxisLength | MinorAxisLength | AspectRation | Eccentricity | ConvexArea | EquivDiameter | Extent | Solidity | roundness | Compactness | ShapeFactor1 | ShapeFactor2 | ShapeFactor3 | ShapeFactor4 | Class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 44830 | 814.955 | 320.731947 | 178.405838 | 1.797766 | 0.831018 | 45297 | 238.912806 | 0.658877 | 0.989690 | 0.848226 | 0.744899 | 0.007154 | 0.001359 | 0.554874 | 0.997534 | SIRA |
| 1 | 33476 | 691.826 | 258.837971 | 165.220760 | 1.566619 | 0.769773 | 33907 | 206.453305 | 0.721155 | 0.987289 | 0.878921 | 0.797616 | 0.007732 | 0.001930 | 0.636191 | 0.996669 | DERMASON |
| 2 | 27057 | 606.138 | 227.460904 | 151.860320 | 1.497830 | 0.744491 | 27358 | 185.607226 | 0.801831 | 0.988998 | 0.925436 | 0.815996 | 0.008407 | 0.002299 | 0.665850 | 0.997330 | DERMASON |
| 3 | 49483 | 844.283 | 326.602913 | 194.689529 | 1.677558 | 0.802907 | 50289 | 251.005403 | 0.680179 | 0.983973 | 0.872348 | 0.768534 | 0.006600 | 0.001420 | 0.590644 | 0.990840 | SIRA |
| 4 | 22461 | 544.584 | 192.801303 | 148.541136 | 1.297966 | 0.637517 | 22699 | 169.110122 | 0.774731 | 0.989515 | 0.951720 | 0.877121 | 0.008584 | 0.003134 | 0.769342 | 0.998579 | DERMASON |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 13606 | 39956 | 745.166 | 273.867402 | 186.564001 | 1.467954 | 0.732079 | 40504 | 225.551678 | 0.796000 | 0.986470 | 0.904244 | 0.823580 | 0.006854 | 0.001945 | 0.678284 | 0.995690 | DERMASON |
| 13607 | 171914 | 1595.676 | 598.541646 | 368.358372 | 1.624889 | 0.788194 | 174673 | 467.854361 | 0.815980 | 0.984205 | 0.848461 | 0.781657 | 0.003482 | 0.000802 | 0.610988 | 0.992788 | BOMBAY |
| 13608 | 48266 | 817.340 | 304.682706 | 202.282198 | 1.506226 | 0.747812 | 48780 | 247.899536 | 0.807232 | 0.989463 | 0.907916 | 0.813632 | 0.006313 | 0.001706 | 0.661997 | 0.997117 | SIRA |
| 13609 | 43279 | 843.066 | 336.280446 | 164.667135 | 2.042183 | 0.871907 | 43813 | 234.743550 | 0.614566 | 0.987812 | 0.765181 | 0.698059 | 0.007770 | 0.001138 | 0.487286 | 0.995128 | HOROZ |


In [18]:
# 1. One-hot encode the class labels
class_encoder = OneHotEncoder(sparse_output=False)
y_onehot = class_encoder.fit_transform(df[['Class']])

# 2. Train-test split (before normalization)
X = df.drop(columns=['Class']).values  # shape (13611, 16)
X_train, X_test, y_train, y_test = train_test_split(X, y_onehot, test_size=0.10, random_state=42, shuffle=True)

# 3. Normalize using min-max (fit only on X_train)
scaler = MinMaxScaler()
X_train_norm = scaler.fit_transform(X_train)
X_test_norm = scaler.transform(X_test)

### Exercise 1.2 : Training and Testing the Neural Network (40 points)

Design an artificial deep neural network with 3 hidden layers (not counting the input and output layers), specifically a feed-forward multilayer perceptron using the sigmoid activation function, to classify the type of 'Dry Bean' from the other attributes in the data set, similar to the one described in the paper above. Note that this is a **multi-class classification** problem, so choose the number of nodes in the input and output layers accordingly.

Consider the following hyperparameters while developing your model:

- Model type: Keras Sequential
- Make sure your input layer matches the size of your X matrix
- Number and type of hidden layers: 3 and Dense
- Number of nodes in each hidden layer: 12
- Learning rate should be 0.3
- Number of epochs should be 100
- The sigmoid function is to be used as the activation function in each layer
- Your output layer has to use a sigmoid function and the number of outputs should match the shape of your y
- Your loss function should be MSE
- Stochastic Gradient Descent should be used to minimize the loss
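
As a quick check on the size of the network these hyperparameters describe: a Dense layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. Assuming 16 input features, three hidden layers of 12 nodes, and 7 output classes, the arithmetic below counts the trainable parameters; Keras's `model.summary()` should report the same total for the model built later.

```python
def dense_params(n_in, n_out):
    """Trainable parameters of a fully connected layer: weights plus biases."""
    return n_in * n_out + n_out

# input features -> 3 hidden layers of 12 -> 7 one-hot outputs
layer_sizes = [16, 12, 12, 12, 7]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)
print(per_layer)  # [204, 156, 156, 91]
print(total)      # 607
```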

**Note:** We are having you use MSE as the loss function for this model. Is this a good choice? Why or why not? If not, what should you use instead in future models? Answer below.

__Requirements once the model has been trained :__

- A confusion matrix for all classes, specifying the true positive, true negative, false positive, and false negative cases for each category in the class
- Since the outputs are one-hot encoded (multi-class output), you will need to either reshape or argmax them. Make sure they have been thresholded as well, i.e., check that yhat contains only 1s and 0s.
- The accuracy and mean squared error (MSE) of the model
- The precision and recall for each label in the class
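
Why argmax rather than a fixed 0.5 cutoff: with seven independent sigmoid outputs, a row can have no value above 0.5, or several. The small NumPy sketch below uses hypothetical probabilities to show that thresholding at 0.5 can yield zero or two "predicted" classes per row, while argmax always yields exactly one, which is what a one-hot comparison against `y_test` needs.

```python
import numpy as np

# Hypothetical sigmoid outputs for 2 samples and 3 classes
probs = np.array([[0.20, 0.40, 0.30],
                  [0.60, 0.70, 0.10]])

# Fixed 0.5 threshold: row 0 gets no class, row 1 gets two classes
by_threshold = (probs >= 0.5).astype(int)

# Argmax: pick the single most likely class, then turn it back into a one-hot row
by_argmax = np.eye(probs.shape[1], dtype=int)[probs.argmax(axis=1)]

print(by_threshold)
print(by_argmax)
```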

__Notes :__

- The mean squared error (MSE) values obtained __should be positive__.


In [19]:
import tensorflow as tf
import keras
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD

input_dim = X_train_norm.shape[1]
output_dim = y_train.shape[1]

model = Sequential([
    keras.Input(shape=(input_dim,)),  # explicit input layer (avoids the input_shape warning)
    Dense(12, activation='sigmoid'),
    Dense(12, activation='sigmoid'),
    Dense(12, activation='sigmoid'),
    Dense(output_dim, activation='sigmoid')
])

optimizer = SGD(learning_rate=0.3)
model.compile(optimizer=optimizer, loss='mean_squared_error', metrics=['accuracy'])




In [20]:
history = model.fit(
    X_train_norm, y_train,
    epochs=100,
    batch_size=32,
    verbose=0,      # Suppress output as required
    validation_data=(X_test_norm, y_test)
)

In [21]:
import numpy as np

# Predict on test data
yhat_prob = model.predict(X_test_norm, verbose=0)

# Convert probabilities to hard 1/0 using argmax for multiclass OHE
yhat_classes = np.zeros_like(yhat_prob)
yhat_classes[np.arange(len(yhat_prob)), yhat_prob.argmax(axis=1)] = 1

# For metrics below, also get the integer class labels
y_test_class = y_test.argmax(axis=1)
yhat_class = yhat_prob.argmax(axis=1)

[1m43/43[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 3ms/step


In [22]:
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score, mean_squared_error

cm = confusion_matrix(y_test_class, yhat_class)
print("Confusion Matrix:\n", cm)


Confusion Matrix:
 [[  0   0   0  11  72   0  54]
 [  0   0   0   0  63   0   0]
 [  0   0   0   1 188   0   6]
 [  0   0   0 329   1   6   6]
 [  0   0   0   0 177   0   4]
 [  0   0   0 112   0  87   1]
 [  0   0   0 145  19   0  80]]
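
The requirements ask for true positives, true negatives, false positives, and false negatives per category, which the confusion matrix holds implicitly. A short NumPy sketch, assuming scikit-learn's convention (rows are true classes, columns are predicted classes) and the alphabetical class order produced by `OneHotEncoder`, extracts them from the matrix printed above:

```python
import numpy as np

def per_class_counts(cm):
    """Return (TP, TN, FP, FN) arrays, one entry per class, from a square confusion matrix."""
    cm = np.asarray(cm)
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp   # true class k, predicted as something else
    fp = cm.sum(axis=0) - tp   # predicted as k, true class is something else
    tn = cm.sum() - tp - fn - fp
    return tp, tn, fp, fn

# Confusion matrix printed above
classes = ["BARBUNYA", "BOMBAY", "CALI", "DERMASON", "HOROZ", "SEKER", "SIRA"]
cm = [[0, 0, 0, 11, 72, 0, 54],
      [0, 0, 0, 0, 63, 0, 0],
      [0, 0, 0, 1, 188, 0, 6],
      [0, 0, 0, 329, 1, 6, 6],
      [0, 0, 0, 0, 177, 0, 4],
      [0, 0, 0, 112, 0, 87, 1],
      [0, 0, 0, 145, 19, 0, 80]]

tp, tn, fp, fn = per_class_counts(cm)
for name, a, b, c, d in zip(classes, tp, tn, fp, fn):
    print(f"{name:>9}: TP={int(a):4d} TN={int(b):4d} FP={int(c):4d} FN={int(d):4d}")
```

Precision is then `tp / (tp + fp)` and recall `tp / (tp + fn)`; for DERMASON that gives 329/598 ≈ 0.55 and 329/342 ≈ 0.96, matching the classification report. The first three classes are never predicted, so their precision is undefined (0/0), which is what scikit-learn warns about.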


In [23]:
print(classification_report(y_test_class, yhat_class, target_names=class_encoder.categories_[0], zero_division=0))

              precision    recall  f1-score   support

    BARBUNYA       0.00      0.00      0.00       137
      BOMBAY       0.00      0.00      0.00        63
        CALI       0.00      0.00      0.00       195
    DERMASON       0.55      0.96      0.70       342
       HOROZ       0.34      0.98      0.50       181
       SEKER       0.94      0.43      0.59       200
        SIRA       0.53      0.33      0.41       244

    accuracy                           0.49      1362
   macro avg       0.34      0.39      0.31      1362
weighted avg       0.42      0.49      0.40      1362





In [24]:
acc = accuracy_score(y_test_class, yhat_class)
mse = mean_squared_error(y_test, yhat_classes)
print(f"Accuracy: {acc:.4f}")
print(f"Mean Squared Error: {mse:.4f}")

Accuracy: 0.4941
Mean Squared Error: 0.1445
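
Because this MSE is computed on the thresholded one-hot predictions (`yhat_classes`), it carries no information beyond accuracy: a misclassified row differs from its target in exactly two of the $K = 7$ entries (a 1 where a 0 belongs and vice versa), so $\text{MSE} = 2(1 - \text{accuracy})/K$. The arithmetic below checks this against the printed values.

```python
def mse_from_accuracy(acc, n_classes):
    """MSE between hard one-hot predictions and one-hot targets.

    Each wrong row contributes 2 squared errors of 1 out of n_classes entries.
    """
    return 2.0 * (1.0 - acc) / n_classes

print(round(mse_from_accuracy(0.4941, 7), 4))  # 0.1445, matching the output above
```

Computing MSE on `yhat_prob` instead would also reflect how confident the wrong predictions were.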


Why is MSE not the optimal loss for this problem?

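MSE works, but it is not a good fit for classification, especially multi-class classification.

MSE penalizes the squared distance between the predicted and target vectors, but it does not treat the outputs as a probability distribution over classes; with independent sigmoid outputs, the predicted vector is not even constrained to sum to 1. Combined with a sigmoid output, the MSE gradient also contains the sigmoid derivative $\sigma'(z) = \sigma(z)(1 - \sigma(z))$, so it becomes very small when an output saturates at the wrong value, and learning slows down.

Better choice: use `categorical_crossentropy` as the loss together with a softmax output layer. This loss is designed for probability outputs and one-hot labels in multi-class settings, directly maximizes the log-probability of the correct class, and its gradient with respect to the output logits is simply $\hat{y} - y$.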
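
To make the saturation argument concrete, the sketch below compares the gradient with respect to the pre-activation $z$ for a single sigmoid unit under squared error and under binary cross-entropy. The single-unit binary setting is a simplification we chose for illustration; for softmax with categorical cross-entropy the analogous gradient with respect to the logits is also $\hat{y} - y$.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mse_grad(z, y):
    """d/dz of (sigmoid(z) - y)**2 = 2 * (s - y) * s * (1 - s)."""
    s = sigmoid(z)
    return 2.0 * (s - y) * s * (1.0 - s)

def ce_grad(z, y):
    """d/dz of the binary cross-entropy -[y*log(s) + (1-y)*log(1-s)] = s - y."""
    return sigmoid(z) - y

# The true label is 1, but the unit is saturated near 0 (confidently wrong)
z, y = -5.0, 1.0
print(f"MSE gradient: {mse_grad(z, y):.4f}")  # about -0.0132
print(f"CE gradient:  {ce_grad(z, y):.4f}")   # about -0.9933
```

The cross-entropy gradient is roughly 75 times larger here, so a confidently wrong output is corrected quickly instead of stalling.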



## Exercise 2 : k-fold Cross Validation (20 points)

In order to avoid **using biased models**, use 10-fold cross validation to estimate how well the model from Ex1.2 generalizes on the given data set. You can choose an n_repeats value between 1 and 5.

__Requirements :__
- Print the accuracy value for each iteration (fold) of the **cross validation**, not for each epoch
- Print the overall average accuracy for each n_fold value; see the documentation for the scoring parameter
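
The bean classes are imbalanced (BOMBAY has far fewer samples than DERMASON), so plain `KFold` can produce folds with noticeably different class mixes. scikit-learn's `RepeatedStratifiedKFold` is one possible alternative to looping over `KFold` with different seeds; it needs integer labels (for this notebook, e.g. `y_train.argmax(axis=1)`). The sketch uses hypothetical toy labels to show that every validation fold keeps the class ratio.

```python
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold

# Hypothetical imbalanced labels standing in for the integer-encoded bean classes
y = np.repeat([0, 1, 2], [100, 50, 30])
X = np.arange(len(y), dtype=float).reshape(-1, 1)

rskf = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=0)
fold_class_counts = []
for train_idx, val_idx in rskf.split(X, y):
    # Count how many samples of each class land in this validation fold
    fold_class_counts.append(np.bincount(y[val_idx], minlength=3).tolist())

print(rskf.get_n_splits())   # 30 = 10 folds x 3 repeats
print(fold_class_counts[0])  # [10, 5, 3]: the 100:50:30 ratio is preserved
```

In the notebook, the model would be built and trained inside the loop exactly as in the `KFold` version, indexing `X_train_norm` and `y_train` with `train_idx` and `val_idx`.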



In [25]:
def build_model(input_dim, output_dim):
    # Same architecture as Ex 1.2; a fresh model is built for every fold
    model = Sequential([
        keras.Input(shape=(input_dim,)),
        Dense(12, activation='sigmoid'),
        Dense(12, activation='sigmoid'),
        Dense(12, activation='sigmoid'),
        Dense(output_dim, activation='sigmoid')
    ])
    optimizer = SGD(learning_rate=0.3)
    model.compile(optimizer=optimizer, loss='mean_squared_error', metrics=['accuracy'])
    return model

In [26]:
from sklearn.model_selection import KFold
from sklearn.metrics import accuracy_score

In [27]:
n_splits = 10
n_repeats = 3  # you may choose 1-5 as instructed
all_fold_accuracies = []

for repeat in range(n_repeats):
    print(f"\n=== Repeat {repeat+1} ===")
    fold_accuracies = []
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=repeat)

    for fold, (train_index, test_index) in enumerate(kf.split(X_train_norm)):
        X_tr, X_val = X_train_norm[train_index], X_train_norm[test_index]
        y_tr, y_val = y_train[train_index], y_train[test_index]

        model = build_model(X_tr.shape[1], y_tr.shape[1])
        model.fit(X_tr, y_tr, epochs=100, batch_size=32, verbose=0)

        y_pred = model.predict(X_val, verbose=0)
        y_pred_labels = np.argmax(y_pred, axis=1)
        y_true_labels = np.argmax(y_val, axis=1)

        acc = accuracy_score(y_true_labels, y_pred_labels)
        fold_accuracies.append(acc)
        print(f"Fold {fold+1}: Accuracy = {acc:.4f}")

    mean_acc = np.mean(fold_accuracies)
    print(f"Average accuracy for repeat {repeat+1}: {mean_acc:.4f}")
    all_fold_accuracies.append(mean_acc)

overall_mean_acc = np.mean(all_fold_accuracies)
print(f"\nOverall average accuracy across repeats: {overall_mean_acc:.4f}")


=== Repeat 1 ===
Fold 1: Accuracy = 0.5282
Fold 2: Accuracy = 0.5004
Fold 3: Accuracy = 0.3755
Fold 4: Accuracy = 0.5176
Fold 5: Accuracy = 0.2767
Fold 6: Accuracy = 0.5902
Fold 7: Accuracy = 0.5012
Fold 8: Accuracy = 0.4286
Fold 9: Accuracy = 0.8229
Fold 10: Accuracy = 0.5891
Average accuracy for repeat 1: 0.5130

=== Repeat 2 ===
Fold 1: Accuracy = 0.5918
Fold 2: Accuracy = 0.4792
Fold 3: Accuracy = 0.6065
Fold 4: Accuracy = 0.5453
Fold 5: Accuracy = 0.5624
Fold 6: Accuracy = 0.2482
Fold 7: Accuracy = 0.4808
Fold 8: Accuracy = 0.2522
Fold 9: Accuracy = 0.6000
Fold 10: Accuracy = 0.5384
Average accuracy for repeat 2: 0.4905

=== Repeat 3 ===
Fold 1: Accuracy = 0.5518
Fold 2: Accuracy = 0.7029
Fold 3: Accuracy = 0.2580
Fold 4: Accuracy = 0.4661
Fold 5: Accuracy = 0.4702
Fold 6: Accuracy = 0.5771
Fold 7: Accuracy = 0.5543
Fold 8: Accuracy = 0.4392
Fold 9: Accuracy = 0.4408
Fold 10: Accuracy = 0.2663
Average accuracy for repeat 3: 0.4727

Overall average accuracy across repeats: 0.4921
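
The fold accuracies above range from about 0.25 to 0.82, so the mean alone hides how unstable this sigmoid/MSE/SGD configuration is. A small sketch using Python's `statistics` module, with the 30 values copied from this run's output, reports the spread alongside the mean:

```python
import statistics

# Fold accuracies copied from the output above (3 repeats x 10 folds)
fold_acc = [
    0.5282, 0.5004, 0.3755, 0.5176, 0.2767, 0.5902, 0.5012, 0.4286, 0.8229, 0.5891,
    0.5918, 0.4792, 0.6065, 0.5453, 0.5624, 0.2482, 0.4808, 0.2522, 0.6000, 0.5384,
    0.5518, 0.7029, 0.2580, 0.4661, 0.4702, 0.5771, 0.5543, 0.4392, 0.4408, 0.2663,
]

mean = statistics.mean(fold_acc)
std = statistics.stdev(fold_acc)  # sample standard deviation across all folds
print(f"mean = {mean:.4f}, std = {std:.4f}, range = [{min(fold_acc)}, {max(fold_acc)}]")
```

The mean reproduces the overall average printed above (0.4921), since every repeat has the same number of folds.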


## Exercise 3 : Hyperparameter Tuning (25 points)

Use either grid search or random search to find the optimal number of nodes in each hidden layer, as well as the optimal learning rate, activation function, and optimizer (see the [keras_tuner examples](https://keras.io/guides/keras_tuner/getting_started/)), such that the accuracy of the model on the given data set is maximized.

__Requirements :__
- The set of optimal hyperparameters
- Try your best to maximize accuracy using this set of optimal hyperparameters

__Note :__ Hyperparameter tuning takes a long time to execute. Choose an appropriate number of values for each hyperparameter (preferably 3 each) and allocate enough time to run your code. Tune at least three hyperparameters with at least three options each; feel free to experiment with more, but recognize that the running time grows exponentially with the number of hyperparameters.
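
The exponential growth is easy to see by enumerating the search space used in the tuning cells below: four hyperparameters with three options each already give $3^4 = 81$ combinations, each requiring a full training run under grid search. A random search with a fixed trial budget trains only a sample. The sketch below illustrates this with the standard library only; keras_tuner's `RandomSearch` does its own sampling, so this is an illustration of the idea rather than its implementation.

```python
import random
from itertools import product

# The four hyperparameters tuned below, three options each
search_space = {
    "units": [8, 12, 16],
    "activation": ["sigmoid", "relu", "tanh"],
    "optimizer": ["SGD", "Adam", "RMSprop"],
    "learning_rate": [0.01, 0.1, 0.3],
}

# Grid search would train every combination
grid = [dict(zip(search_space, combo)) for combo in product(*search_space.values())]
print(len(grid))  # 81 = 3 ** 4

# A random search with an 18-trial budget samples distinct combinations instead
trials = random.Random(0).sample(grid, 18)
print(len(trials))  # 18
```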

In [30]:
!pip install keras-tuner -q

[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/129.1 kB[0m [31m?[0m eta [36m-:--:--[0m[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━[0m [32m122.9/129.1 kB[0m [31m4.6 MB/s[0m eta [36m0:00:01[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m129.1/129.1 kB[0m [31m3.0 MB/s[0m eta [36m0:00:00[0m
[?25h

In [31]:
import keras_tuner as kt
from tensorflow.keras.optimizers import SGD, Adam, RMSprop

def build_model(hp):
    # Reusing the names 'units' and 'activation' makes all three hidden layers
    # share one sampled value, as reported in the results below.
    units = hp.Choice('units', [8, 12, 16])
    activation = hp.Choice('activation', ['sigmoid', 'relu', 'tanh'])

    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(X_train_norm.shape[1],)))
    for _ in range(3):  # three hidden layers in total
        model.add(tf.keras.layers.Dense(units, activation=activation))
    model.add(tf.keras.layers.Dense(y_train.shape[1], activation='softmax'))

    opt_choice = hp.Choice('optimizer', ['SGD', 'Adam', 'RMSprop'])
    lr_choice = hp.Choice('learning_rate', [0.01, 0.1, 0.3])
    optimizer_cls = {'SGD': SGD, 'Adam': Adam, 'RMSprop': RMSprop}[opt_choice]
    model.compile(optimizer=optimizer_cls(learning_rate=lr_choice),
                  loss='categorical_crossentropy', metrics=['accuracy'])
    return model

In [32]:
tuner = kt.RandomSearch(
    build_model,
    objective='val_accuracy',
    max_trials=18,  # random sample of the 3*3*3*3 = 81 possible combinations
    executions_per_trial=1,
    directory='drybean_tuning',
    project_name='bean_hpsearch'
)




In [33]:
tuner.search(
    X_train_norm, y_train,
    epochs=50,
    validation_split=0.1,
    verbose=0
)

In [34]:
best_hp = tuner.get_best_hyperparameters(num_trials=1)[0]
print("Optimal Hyperparameters:")
print(f"Units per hidden layer: {best_hp.get('units')}")
print(f"Activation function: {best_hp.get('activation')}")
print(f"Optimizer: {best_hp.get('optimizer')}")
print(f"Learning rate: {best_hp.get('learning_rate')}")

Optimal Hyperparameters:
Units per hidden layer: 8
Activation function: relu
Optimizer: RMSprop
Learning rate: 0.01


In [35]:
best_model = tuner.hypermodel.build(best_hp)
history = best_model.fit(X_train_norm, y_train, epochs=100, validation_data=(X_test_norm, y_test), verbose=0)
test_acc = best_model.evaluate(X_test_norm, y_test, verbose=0)[1]
print(f"Test set accuracy with optimal hyperparameters: {test_acc:.4f}")

Test set accuracy with optimal hyperparameters: 0.8957


## Exercise 4 - Collaborative Statement (5 points)

It is mandatory to include a Statement of Collaboration in each submission, that follows the guidelines below.
Include the names of everyone involved in the discussions (especially in-person ones), and what was discussed.
All students are required to follow the academic honesty guidelines posted on the course website. For
programming assignments in particular, I encourage students to organize (perhaps using Piazza) to discuss the
task descriptions, requirements, possible bugs in the support code, and the relevant technical content before they
start working on it. However, you should not discuss the specific solutions, and as a guiding principle, you are
not allowed to take anything written or drawn away from these discussions (no photographs of the blackboard,
written notes, referring to Piazza, etc.). Especially after you have started working on the assignment, try to restrict
the discussion to Piazza as much as possible, so that there is no doubt as to the extent of your collaboration.

Even if you did not use any outside resources or collaborate with anyone, please state that explicitly in the space below.

In [None]:
z