
Objective:
Design and implement an MLP using Keras that incorporates both residual and additional skip connections. Your model will be trained to perfectly overfit a single batch (batch size = 128) from a large dataset while performing poorly on validation data. Additionally, you will visualize your network architecture using the Netron app and include the exported diagram. The final submission must be uploaded to GitHub, and the submission text must start with the GitHub links.









In [None]:
# Import statements
import tensorflow
from tensorflow.keras.utils import to_categorical
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Embedding, Flatten, Dropout
from tensorflow.keras.initializers import HeNormal
from tensorflow.keras.callbacks import EarlyStopping


Dataset & Preprocessing:

Use a large dataset such as the UCI Covertype Dataset, but you can use your own (e.g. from project work).
Preprocess the data by:
Handling missing values.
Normalizing numerical features.
Encoding categorical variables.
Split the dataset into training and validation sets.
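
The preprocessing steps listed above can be sketched on a small synthetic array. This is only an illustration under stated assumptions: the toy data, the mean-fill strategy for missing values, and names such as `X_train_toy` are hypothetical and are not taken from the Covertype dataset. The sketch splits before computing any statistics, so that the validation rows do not influence the fill values or the scaling.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 10 rows, 2 numeric features and 1 categorical code (0..2).
numeric = rng.normal(loc=50.0, scale=10.0, size=(10, 2))
numeric[3, 1] = np.nan                      # simulate one missing value
category = rng.integers(0, 3, size=10)

# 1) Split first so that no validation statistics leak into preprocessing.
idx = rng.permutation(10)
train_idx, val_idx = idx[:8], idx[8:]

# 2) Handle missing values: fill with the training-set column mean.
col_mean = np.nanmean(numeric[train_idx], axis=0)
filled = np.where(np.isnan(numeric), col_mean, numeric)

# 3) Normalize numeric features with training-set mean and standard deviation.
mean = filled[train_idx].mean(axis=0)
std = filled[train_idx].std(axis=0)
scaled = (filled - mean) / std

# 4) One-hot encode the categorical column.
one_hot = np.eye(3)[category]

features = np.hstack([scaled, one_hot])
X_train_toy, X_val_toy = features[train_idx], features[val_idx]
```
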

In [None]:
pip install ucimlrepo

Collecting ucimlrepo
  Downloading ucimlrepo-0.0.7-py3-none-any.whl.metadata (5.5 kB)
Downloading ucimlrepo-0.0.7-py3-none-any.whl (8.0 kB)
Installing collected packages: ucimlrepo
Successfully installed ucimlrepo-0.0.7


In [None]:
# Load in UCI Dataset

from ucimlrepo import fetch_ucirepo

# fetch dataset
covertype = fetch_ucirepo(id=31)




581012
   Cover_Type
0           5
1           5
2           2
3           2
4           5


In [None]:
# data (as pandas dataframes)
X = covertype.data.features
y = covertype.data.targets

print(X.head())
print(y.head())


In [None]:
# According to website, no missing or NA values are in this dataset
print(y.head())

   Cover_Type
0           5
1           5
2           2
3           2
4           5


In [None]:
# Split into training and validation sets
from sklearn.model_selection import train_test_split

X_train, X_val, y_train, y_val = train_test_split(
    X, y,
    test_size=0.2,
    random_state=42,  # for reproducibility
    shuffle=True

)

In [None]:
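# normalize numerical cols
numerical_cols = [
    "Elevation", "Aspect", "Slope",
    "Horizontal_Distance_To_Hydrology", "Vertical_Distance_To_Hydrology",
    "Horizontal_Distance_To_Roadways", "Hillshade_9am",
    "Hillshade_Noon", "Hillshade_3pm", "Horizontal_Distance_To_Fire_Points"
]

# Work on copies so the assignments below do not trigger pandas'
# SettingWithCopyWarning on the frames returned by train_test_split.
X_train = X_train.copy()
X_val = X_val.copy()

# Use training-set statistics for both splits so no validation data leaks in.
mean = X_train[numerical_cols].mean()
std = X_train[numerical_cols].std(ddof=0)  # population std, matching np.std

X_train[numerical_cols] = (X_train[numerical_cols] - mean) / std
X_val[numerical_cols] = (X_val[numerical_cols] - mean) / std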

In [None]:
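# One-hot encode the labels (Cover_Type runs 1..7, so shift to 0..6 first).
# to_categorical returns an (n_samples, 7) array, which cannot be stored in a
# single DataFrame column, so y_train and y_val become NumPy arrays here.
y_train = to_categorical(y_train["Cover_Type"].to_numpy() - 1, num_classes=7)
y_val = to_categorical(y_val["Cover_Type"].to_numpy() - 1, num_classes=7)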
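
To make the label encoding concrete, here is a minimal NumPy sketch of what one-hot encoding produces for the first five `Cover_Type` values printed earlier (5, 5, 2, 2, 5). It uses `np.eye` as a stand-in for illustration and is not the Keras implementation; the variable names are hypothetical.

```python
import numpy as np

labels = np.array([5, 5, 2, 2, 5])      # first five Cover_Type values shown above
num_classes = 7

# Shift labels from 1..7 to 0..6, then pick the matching rows of the identity matrix.
one_hot = np.eye(num_classes, dtype="float32")[labels - 1]

# argmax recovers the zero-based class index; adding 1 restores the original label.
recovered = one_hot.argmax(axis=1) + 1
```

Each label becomes a row of length 7 with a single 1, which is the target shape that `categorical_crossentropy` expects.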

Model Architecture: keep the number of trainable parameters as low as possible. Define the following neural network:

Initial Layers: Build an MLP in Keras to process the input features.
Custom Residual Block:
Using the Keras Functional API, create a block with at least two Dense layers with ReLU activations.
Implement a residual connection by adding the block’s input to its output (apply a linear projection with an extra Dense layer if the dimensions differ).
Additional Skip Connection:
Implement an extra skip connection that bypasses one or more intermediate layers outside the residual block.
Final Layers:
Add further Dense layers.
Include an output layer appropriate for the task (e.g., a single unit with sigmoid activation for binary classification).
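
The projection case mentioned above (input and output widths differ) can be illustrated with a framework-free forward pass. This is a sketch of the arithmetic only, written in NumPy rather than Keras, with random weights and hypothetical names (`dense`, `w_proj`, `block_output`); in Keras the same idea would use a `Dense` layer without activation on the shortcut, followed by `Add`. In the notebook's own model the two added tensors already have width 7, so no projection is needed there.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def dense(x, w, b, activation=None):
    """Affine layer x @ w + b, optionally followed by an activation."""
    z = x @ w + b
    return activation(z) if activation else z

in_dim, width, batch = 54, 7, 4           # 54 inputs and width 7 mirror the notebook
x = rng.normal(size=(batch, in_dim))

# Two Dense + ReLU layers form the body of the block.
w1, b1 = rng.normal(size=(in_dim, width)) * 0.1, np.zeros(width)
w2, b2 = rng.normal(size=(width, width)) * 0.1, np.zeros(width)
body = dense(dense(x, w1, b1, relu), w2, b2, relu)

# The input has 54 features but the body outputs 7, so the shortcut needs
# a linear projection (a Dense layer with no activation) before the addition.
w_proj, b_proj = rng.normal(size=(in_dim, width)) * 0.1, np.zeros(width)
shortcut = dense(x, w_proj, b_proj)

block_output = body + shortcut            # element-wise add, as in keras.layers.Add
```
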


In [61]:

from tensorflow.keras.layers import Input, Dense, Add
from tensorflow.keras.models import Model


# We have 54 inputs (one per feature)
inputs = Input(shape=(54,))

dense_layer1 = Dense(7, activation='relu')(inputs)

dense_layer2 = Dense(7, activation='relu')(dense_layer1)

dense_layer3 = Dense(7, activation='relu')(dense_layer2)

# Residual connection
res_connection = Add()([dense_layer1, dense_layer3])

dense_layer4 = Dense(7, activation='relu')(res_connection)

# Skip connection outside of residual block
skip_connection = Add()([dense_layer4, dense_layer1])

dense_layer5 = Dense(7, activation='relu')(skip_connection)

dense_layer6 = Dense(7, activation='relu')(dense_layer5)

# Output layer with 7 neurons (one per cover type)
outputs = Dense(7, activation='softmax')(dense_layer6)

model = Model(inputs=inputs, outputs=outputs)


model.summary()
model.save('HW_overfitting_model.h5')
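
As a cross-check on what `model.summary()` reports, the trainable parameter count can be worked out by hand. The sketch below assumes the layer widths defined in the cell above (54 inputs, six hidden Dense layers of width 7, and a 7-unit softmax output); `Add` layers contribute no parameters. The helper name `dense_params` is hypothetical.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

# Layer widths in order: 54 inputs, six hidden Dense(7) layers, Dense(7) softmax output.
widths = [54, 7, 7, 7, 7, 7, 7, 7]
total = sum(dense_params(a, b) for a, b in zip(widths, widths[1:]))
print(total)  # 721
```

The first layer accounts for 385 of the parameters (54 * 7 + 7), and each of the remaining six layers adds 56.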



Save your complete model (e.g., as a .h5 file or in JSON format).
Open the saved model in the Netron app (https://netron.app/) and export the network diagram as an image.
Ensure that the exported image clearly shows all parts of your architecture, including both residual and skip connections.

In [62]:
# loss function and optimizer
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

Training & Evaluation:

Overfitting Experiment:
Select a single batch of 128 samples from the training set.
Train your model exclusively on this batch until you approach 0 loss.
Validation Check:
Evaluate the overfitted model on the validation set to confirm that it performs poorly, demonstrating a lack of generalization.
Conclusions:
At the end of your code, print the following information:
Number of parameters:
Final training loss:
Final validation loss:
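
To see why a memorized batch can reach near-zero loss while unseen data gives a large one, it may help to compute categorical cross-entropy by hand. The following is a minimal NumPy sketch with three hypothetical classes and made-up probabilities; it is not the Keras implementation, only the same formula applied to toy numbers.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -sum(y_true * log(y_pred)) over the batch."""
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

y_true = np.eye(3)[[0, 1, 2]]             # three samples, three hypothetical classes

# Memorized batch: almost all probability mass on the correct class.
confident_right = np.array([[0.99, 0.005, 0.005],
                            [0.005, 0.99, 0.005],
                            [0.005, 0.005, 0.99]])

# Unseen data: the same confidence placed on a wrong class.
confident_wrong = np.roll(confident_right, 1, axis=1)

low_loss = categorical_crossentropy(y_true, confident_right)    # -log(0.99)
high_loss = categorical_crossentropy(y_true, confident_wrong)   # -log(0.005)
```

Here `low_loss` is about 0.01 and `high_loss` is about 5.3: a confident wrong prediction is penalized far more heavily than an uncertain one, so an overfitted model can show a high validation loss even when its validation accuracy is well above chance.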


In [76]:
# First, let's make our subset of 128 samples
X_sample = X_train[:128]
y_sample = y_train[:128]

network_history = model.fit(X_sample, y_sample,
                            validation_data=(X_val,y_val),
                            batch_size=128,
                            epochs=1000,
                            verbose=2)

Epoch 1/1000
1/1 - 2s - 2s/step - accuracy: 1.0000 - loss: 0.0096 - val_accuracy: 0.5502 - val_loss: 5.4068
Epoch 2/1000
1/1 - 3s - 3s/step - accuracy: 1.0000 - loss: 0.0095 - val_accuracy: 0.5501 - val_loss: 5.4125
Epoch 3/1000
1/1 - 5s - 5s/step - accuracy: 1.0000 - loss: 0.0094 - val_accuracy: 0.5501 - val_loss: 5.4181
Epoch 4/1000
1/1 - 3s - 3s/step - accuracy: 1.0000 - loss: 0.0093 - val_accuracy: 0.5501 - val_loss: 5.4234
Epoch 5/1000


KeyboardInterrupt: 

In [77]:
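# Evaluating (hopefully observing overfitting with validation set)
# Compare predicted and true class indices over the whole validation set.
# Both are converted to NumPy arrays first, because iterating over a pandas
# DataFrame yields its column labels rather than its rows.
probs = model.predict(np.asarray(X_val, dtype="float32"), verbose=0)
pred_classes = np.argmax(probs, axis=1)
true_classes = np.argmax(np.asarray(y_val), axis=1)  # assumes one-hot y_val

correct = int(np.sum(pred_classes == true_classes))

print("number correct:", correct)
print("validation accuracy:", correct / len(true_classes))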


# Conclusions

Number of parameters: 721

Final Training Loss: 0.0093

Final Validation Loss: 5.4234
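
The assignment asks for these figures to be printed from code. A minimal sketch of that is given below, assuming a dictionary shaped like `network_history.history`; here it is a hypothetical stand-in filled with the values from the training log above, because an interrupted `fit` call may leave `network_history` unassigned. The function name `summarize` is also hypothetical.

```python
# Hypothetical stand-in for network_history.history, filled with the values
# from the training log in this notebook.
history = {
    "loss": [0.0096, 0.0095, 0.0094, 0.0093],
    "val_loss": [5.4068, 5.4125, 5.4181, 5.4234],
}
n_params = 721  # in the notebook this would come from model.count_params()

def summarize(history, n_params):
    """Format the three figures the assignment asks for."""
    return (
        f"Number of parameters: {n_params}\n"
        f"Final training loss: {history['loss'][-1]:.4f}\n"
        f"Final validation loss: {history['val_loss'][-1]:.4f}"
    )

report = summarize(history, n_params)
print(report)
```
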