

Objective:

Design and implement an MLP using Keras that incorporates both residual and additional skip connections. Your model will be trained to perfectly overfit a single batch (batch size = 128) from a large dataset while performing poorly on validation data. Additionally, you will visualize your network architecture using the Netron app and include the exported diagram. The final submission must be uploaded to GitHub, and the submission text must start with the GitHub links.

Task Description:

   - Dataset & Preprocessing:
        - Use a large dataset such as the UCI Covertype Dataset, or your own dataset (e.g., from project work).
        - Preprocess the data by:
            - Handling missing values.
            - Normalizing numerical features.
            - Encoding categorical variables.
        - Split the dataset into training and validation sets.
   - Model Architecture: Keep the number of trainable parameters as low as possible. Define the following neural network:
        - Initial Layers: Build an MLP in Keras to process the input features.
        - Custom Residual Block:
            - Using the Keras Functional API, create a block with at least two Dense layers with ReLU activations.
            - Implement a residual connection by adding the block’s input to its output (apply a linear projection with an extra Dense layer if the dimensions differ).
        - Additional Skip Connection:
            - Implement an extra skip connection that bypasses one or more intermediate layers outside the residual block.
        - Final Layers:
            - Add further Dense layers.
            - Include an output layer appropriate for the task (e.g., a single unit with sigmoid activation for binary classification).
   - Visualization:
        - Save your complete model (e.g., as a .h5 file or in JSON format).
        - Open the saved model in the Netron app (https://netron.app/) and export the network diagram as an image.
        - Ensure that the exported image clearly shows all parts of your architecture, including both residual and skip connections.
   - Training & Evaluation:
        - Overfitting Experiment:
            - Select a single batch of 128 samples from the training set.
            - Train your model exclusively on this batch until you approach 0 loss.
        - Validation Check:
            - Evaluate the overfitted model on the validation set to confirm that it performs poorly, demonstrating a lack of generalization.
        - Conclusions:
            - At the end of your code, print the following information:
                - Number of parameters:
                - Final training loss:
                - Final validation loss:
   - Submission Requirements:
        - Upload your complete code and the exported network diagram image to GitHub.
        - Submission Text Format:
            - Line 1: Provide the link to the Jupyter Notebook file containing your code, with all outputs saved in the file (do not submit a plain Python script).
            - Line 2: Provide the link to the image of the architecture.


In [32]:
import pandas as pd
import gzip
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from tensorflow.keras.layers import Input, Dense, Add
from tensorflow.keras.models import Model

In [33]:
# DATASET & PREPROCESSING

In [34]:
column_names = [
    "Elevation", "Aspect", "Slope", "Horizontal_Distance_To_Hydrology",
    "Vertical_Distance_To_Hydrology", "Horizontal_Distance_To_Roadways",
    "Hillshade_9am", "Hillshade_Noon", "Hillshade_3pm",
    "Horizontal_Distance_To_Fire_Points"
] + [f"Wilderness_Area_{i}" for i in range(4)] + [f"Soil_Type_{i}" for i in range(40)] + ["Cover_Type"]

with gzip.open("covertype/covtype.data.gz", 'rt') as f:
    covtype = pd.read_csv(f, header=None, names=column_names)

# covtype.head()

In [35]:
numerical_cols = [
    "Elevation", "Aspect", "Slope", "Horizontal_Distance_To_Hydrology",
    "Vertical_Distance_To_Hydrology", "Horizontal_Distance_To_Roadways",
    "Hillshade_9am", "Hillshade_Noon", "Hillshade_3pm",
    "Horizontal_Distance_To_Fire_Points"
]

scaler = StandardScaler()
covtype[numerical_cols] = scaler.fit_transform(covtype[numerical_cols])

covtype.head()

Unnamed: 0,Elevation,Aspect,Slope,Horizontal_Distance_To_Hydrology,Vertical_Distance_To_Hydrology,Horizontal_Distance_To_Roadways,Hillshade_9am,Hillshade_Noon,Hillshade_3pm,Horizontal_Distance_To_Fire_Points,...,Soil_Type_31,Soil_Type_32,Soil_Type_33,Soil_Type_34,Soil_Type_35,Soil_Type_36,Soil_Type_37,Soil_Type_38,Soil_Type_39,Cover_Type
0,-1.297805,-0.935157,-1.48282,-0.053767,-0.796273,-1.180146,0.330743,0.439143,0.14296,3.246283,...,0,0,0,0,0,0,0,0,0,5
1,-1.319235,-0.89048,-1.616363,-0.270188,-0.899197,-1.257106,0.293388,0.590899,0.221342,3.205504,...,0,0,0,0,0,0,0,0,0,5
2,-0.554907,-0.148836,-0.681563,-0.006719,0.318742,0.532212,0.816364,0.742654,-0.196691,3.126965,...,0,0,0,0,0,0,0,0,0,2
3,-0.622768,-0.005869,0.520322,-0.129044,1.227908,0.474492,0.965786,0.742654,-0.536343,3.194931,...,0,0,0,0,0,0,0,0,0,2
4,-1.301377,-0.98877,-1.616363,-0.547771,-0.813427,-1.256464,0.293388,0.540313,0.195215,3.165479,...,0,0,0,0,0,0,0,0,0,5
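
The task list asks for missing-value handling and categorical encoding, but the cells above only scale the numerical columns. The UCI Covertype data is documented as having no missing values, and its wilderness and soil columns are already one-hot encoded, so a quick check may be enough. A minimal sketch of such a check on a toy frame (the frame `toy` and its values are illustrative assumptions, not rows taken from the notebook's run):

```python
import pandas as pd

# Toy stand-in for `covtype` (assumption: three rows, one-hot group trimmed to 2 columns).
toy = pd.DataFrame({
    "Elevation": [2596, 2590, 2804],
    "Wilderness_Area_0": [1, 0, 0],
    "Wilderness_Area_1": [0, 1, 1],
    "Cover_Type": [5, 5, 2],
})

# Total number of missing cells; 0 means no imputation is needed.
n_missing = int(toy.isna().sum().sum())

# Each row should activate exactly one column of a one-hot group.
wilderness_cols = [c for c in toy.columns if c.startswith("Wilderness_Area_")]
one_hot_ok = bool((toy[wilderness_cols].sum(axis=1) == 1).all())

print(n_missing, one_hot_ok)
```

Running the same two checks on the real `covtype` frame would document why no imputation or encoding step appears in the notebook.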


In [36]:
X = covtype.drop("Cover_Type", axis=1)
y = covtype["Cover_Type"]

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, stratify=y, random_state=16)
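
In the earlier scaling cell the scaler is fitted on the whole table before this split, so validation rows contribute to the means and standard deviations. That has little effect on the overfitting experiment, but a leakage-free variant would fit the statistics on the training rows only and reuse them for validation. A minimal NumPy sketch on synthetic data (the array names and the toy data are assumptions, not part of the notebook):

```python
import numpy as np

rng = np.random.default_rng(16)
# Synthetic stand-in for the 10 numerical Covertype columns (assumption: toy data).
X_num = rng.normal(loc=100.0, scale=25.0, size=(1000, 10))

# Hold out the last 20% as validation (the notebook uses a stratified split instead).
split = int(0.8 * len(X_num))
X_num_train, X_num_val = X_num[:split], X_num[split:]

# Fit the statistics on the training rows only ...
mean = X_num_train.mean(axis=0)
std = X_num_train.std(axis=0)  # population std, matching StandardScaler

# ... and reuse them unchanged for the validation rows.
X_num_train_scaled = (X_num_train - mean) / std
X_num_val_scaled = (X_num_val - mean) / std

print(X_num_train_scaled.shape, X_num_val_scaled.shape)
```

With scikit-learn the same idea corresponds to calling `fit_transform` on the training columns and `transform` on the validation columns.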

In [37]:
# MODEL ARCHITECTURE

In [38]:
def custom_residual_block(x, units=64):
    residual = x
    x = Dense(units, activation='relu')(x) # Dense layer # 1 with ReLU activation
    x = Dense(units, activation='relu')(x) # Dense layer # 2 with ReLU activation

    if residual.shape[-1] != units: # Check dimensions
        residual = Dense(units)(residual)

    x = Add()([x, residual]) # Residual connection
    return x

inputs = Input(shape=(X_train.shape[1],)) # Input (named `inputs` to avoid shadowing the built-in `input`)

x_initial = Dense(64, activation='relu')(inputs) # Initial layers

x_residual = custom_residual_block(x_initial, units=64) # Custom residual block

x_intermediate = Dense(64, activation='relu')(x_residual) # Intermediate layers

x_skip = Add()([x_intermediate, x_initial]) # Additional skip connection

x_final = Dense(32, activation='relu')(x_skip) # Final layers

output = Dense(7, activation='softmax')(x_final) # Output

model = Model(inputs=inputs, outputs=output)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.save("mlp.h5")
model.summary()
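
The summary output is not preserved here, so the parameter count can be checked by hand. The sketch below assumes 54 input features (10 numerical, 4 wilderness, 40 soil columns) and that the residual block needs no projection layer because both sides already have 64 units; the `Add` layers carry no parameters. The helper `dense_params` is a hypothetical name introduced for illustration.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

n_features = 54  # 10 numerical + 4 wilderness + 40 soil columns

layers = {
    "initial Dense(64)": dense_params(n_features, 64),
    "residual Dense(64) #1": dense_params(64, 64),
    "residual Dense(64) #2": dense_params(64, 64),
    "intermediate Dense(64)": dense_params(64, 64),
    "final Dense(32)": dense_params(64, 32),
    "output Dense(7)": dense_params(32, 7),
}

total = sum(layers.values())
for name, count in layers.items():
    print(f"{name}: {count}")
print(f"total: {total}")
```

The total agrees with the 18311 parameters printed at the end of the notebook.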



In [39]:
# TRAINING & EVALUATION

In [40]:
batch = 128
X_batch = X_train.sample(n=batch, random_state=16)
y_batch = y_train.loc[X_batch.index]
y_batch = y_batch - 1 # Fix zero indexing

model_run = model.fit(X_batch, y_batch, epochs=500, batch_size=batch, verbose=1)

Epoch 1/500
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 477ms/step - accuracy: 0.2891 - loss: 1.8086
Epoch 2/500
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 17ms/step - accuracy: 0.4141 - loss: 1.7190
Epoch 3/500
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 16ms/step - accuracy: 0.4609 - loss: 1.6317
Epoch 4/500
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.5000 - loss: 1.5481
Epoch 5/500
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 14ms/step - accuracy: 0.5156 - loss: 1.4693
Epoch 6/500
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step - accuracy: 0.5391 - loss: 1.3944
Epoch 7/500
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step - accuracy: 0.5391 - loss: 1.3240
Epoch 8/500
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 14ms/step - accuracy: 0.5547 - loss: 1.2597
Epoch 9/500
[output truncated]

In [41]:
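# Cover_Type is labelled 1..7, while the loss expects 0..6.
# A separate name keeps the shift from being applied twice if this cell is re-run.
y_val_zero_based = y_val - 1

val_loss, val_accuracy = model.evaluate(X_val, y_val_zero_based, verbose=0)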

number_params = model.count_params()
train_loss = model_run.history['loss'][-1]

print(f"Number of parameters: {number_params}")
print(f"Final Training Loss: {train_loss}")
print(f"Final Validation Loss: {val_loss}")

Number of parameters: 18311
Final Training Loss: 0.0001559455704409629
Final Validation Loss: 3.819810390472412
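
To judge how poor the validation loss is, it helps to compare it with the loss of a model that predicts all 7 classes with equal probability, which is ln(7). The sketch below recomputes sparse categorical cross-entropy by hand in NumPy; the function and the toy labels are illustrative assumptions, not Keras code, and the validation loss is copied from the output above.

```python
import math
import numpy as np

num_classes = 7
chance_loss = math.log(num_classes)  # cross-entropy of a uniform prediction

def sparse_categorical_crossentropy(probs, labels):
    """Mean negative log-probability assigned to the true (zero-based) labels."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    return float(-np.log(probs[np.arange(len(labels)), labels]).mean())

# A uniform predictor on four toy samples (labels are arbitrary, zero-based).
uniform = np.full((4, num_classes), 1.0 / num_classes)
labels = np.array([0, 3, 6, 1])
uniform_loss = sparse_categorical_crossentropy(uniform, labels)

reported_val_loss = 3.819810390472412  # value printed by the notebook above
print(f"chance-level loss: {chance_loss:.4f}")
print(f"validation loss exceeds chance: {reported_val_loss > chance_loss}")
```

A validation loss above the uniform baseline suggests the overfitted model is confidently wrong on many unseen samples, which is the lack of generalization the experiment is meant to show.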
