# Project 2

In this project, I investigate how initializing all of a model's weights to zero affects its performance.

## Importing the libraries

In [1]:
import tensorflow as tf
from tensorflow import keras

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt

from sklearn.preprocessing import StandardScaler

from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV



# turn off warnings for final notebook
import warnings
warnings.filterwarnings('ignore')

%matplotlib inline

## Load and Prepare Data

In [2]:
# Load the CIFAR-10 dataset
(X_train, y_train), (X_test, y_test) = keras.datasets.cifar10.load_data()

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
170498071/170498071 ━━━━━━━━━━━━━━━━━━━━ 2s 0us/step


We want to split the data using our own ratio. To do this, we first concatenate the default training and test splits.

In [3]:
# Concatenate the default training and test splits
X = np.concatenate((X_train, X_test))
y = np.concatenate((y_train, y_test))

In [4]:
y[:10]

array([[6],
       [9],
       [9],
       [4],
       [1],
       [1],
       [2],
       [7],
       [8],
       [3]], dtype=uint8)

In [5]:
# Split the data into a training set and a test set
X_train_full, X_test, y_train_full, y_test = train_test_split(X, y, test_size=0.15, random_state=42)

# Scale pixel values from [0, 255] to [0, 1]
X_train_full, X_test = X_train_full / 255.0, X_test / 255.0

# Split the training data into a training set and a validation set
X_train, X_valid, y_train, y_valid = train_test_split(X_train_full, y_train_full, test_size=0.15, random_state=42)
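As a quick sanity check on the two 15% splits, the sketch below works out the resulting set sizes by hand. It assumes the standard CIFAR-10 sizes (50,000 training and 10,000 test images) and the Keras default batch size of 32. The variable names are illustrative and are not part of the notebook.

```python
# Assumed sizes: CIFAR-10 ships 50,000 training and 10,000 test images.
n_total = 50_000 + 10_000

# First split: 15% held out as the test set.
n_test = n_total * 15 // 100
n_train_full = n_total - n_test

# Second split: 15% of the remainder held out for validation.
n_valid = n_train_full * 15 // 100
n_train = n_train_full - n_valid

# Keras uses batch_size=32 when none is given; the last batch may be partial.
batch_size = 32
steps_per_epoch = -(-n_train // batch_size)  # ceiling division
eval_steps = -(-n_test // batch_size)

print(n_train, n_valid, n_test)     # 43350 7650 9000
print(steps_per_epoch, eval_steps)  # 1355 282
```

These step counts match the `1355/1355` and `282/282` progress counters in the training and evaluation logs.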

## Define Model

In [9]:
# define zero initializer
zero_init = tf.keras.initializers.Zeros()

model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[32, 32, 3]),
    keras.layers.Dense(50, activation='relu', kernel_initializer=zero_init),
    keras.layers.Dense(50, activation='relu', kernel_initializer=zero_init),
    keras.layers.Dense(10, activation='softmax', kernel_initializer=zero_init)
])

In [10]:
model.summary()

In [13]:
print("model weights before training: ")
print()
print(model.get_weights())

model weights before training: 

[array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]], dtype=float32), array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
      dtype=float32), array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]], dtype=float32), array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,

## Compile Model

In [15]:
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

## Fit the Model

In [16]:
history = model.fit(X_train, y_train, epochs=50, validation_data = (X_valid, y_valid))

Epoch 1/50
1355/1355 ━━━━━━━━━━━━━━━━━━━━ 6s 4ms/step - accuracy: 0.0952 - loss: 2.3027 - val_accuracy: 0.0942 - val_loss: 2.3027
Epoch 2/50
1355/1355 ━━━━━━━━━━━━━━━━━━━━ 9s 3ms/step - accuracy: 0.1028 - loss: 2.3027 - val_accuracy: 0.1005 - val_loss: 2.3026
Epoch 3/50
1355/1355 ━━━━━━━━━━━━━━━━━━━━ 6s 4ms/step - accuracy: 0.0988 - loss: 2.3027 - val_accuracy: 0.1020 - val_loss: 2.3026
Epoch 4/50
1355/1355 ━━━━━━━━━━━━━━━━━━━━ 10s 4ms/step - accuracy: 0.0957 - loss: 2.3028 - val_accuracy: 0.0942 - val_loss: 2.3029
Epoch 5/50
1355/1355 ━━━━━━━━━━━━━━━━━━━━ 10s 4ms/step - accuracy: 0.1002 - loss: 2.3027 - val_accuracy: 0.1024 - val_loss: 2.3027
Epoch 6/50
1355/1355 ━━━━━━━━━━━━━━━━━━━━ 4s 3ms/step - accuracy: 0.0971 - loss: 2.3027 - val_accuracy: 0.0942 - val_loss: 2.3026
Epoch 7/50
...

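**Training and validation accuracy** remained nearly constant at ~10% throughout the 50 epochs. With every weight at zero, all neurons in a layer produce the same output and receive the same gradient. With ReLU activations that shared output is exactly zero, so the gradient with respect to every weight matrix is zero as well, and only the output-layer bias can change. As a result, symmetry is never broken and the model cannot learn from the input images.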
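To make the gradient argument concrete, here is a minimal NumPy sketch of one forward and backward pass through a network with the same layer sizes, with all weights and biases set to zero. It is a hand-written illustration and not the Keras internals. The random batch `X`, `y` is a hypothetical stand-in for CIFAR-10 data, and the ReLU derivative at 0 is taken to be 0.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in batch: 8 flattened 32x32x3 images, labels in 0..9.
X = rng.random((8, 3072))
y = rng.integers(0, 10, size=8)

# Same layer sizes as the model above, all weights and biases zero.
W1, b1 = np.zeros((3072, 50)), np.zeros(50)
W2, b2 = np.zeros((50, 50)), np.zeros(50)
W3, b3 = np.zeros((50, 10)), np.zeros(10)

# Forward pass: ReLU hidden layers, softmax output.
z1 = X @ W1 + b1
h1 = np.maximum(z1, 0)
z2 = h1 @ W2 + b2
h2 = np.maximum(z2, 0)
z3 = h2 @ W3 + b3
p = np.exp(z3 - z3.max(axis=1, keepdims=True))
p /= p.sum(axis=1, keepdims=True)

# Backward pass for the mean sparse categorical cross-entropy.
d3 = p.copy()
d3[np.arange(len(y)), y] -= 1.0
d3 /= len(y)
gW3, gb3 = h2.T @ d3, d3.sum(axis=0)
d2 = (d3 @ W3.T) * (z2 > 0)  # ReLU derivative taken as 0 at z = 0
gW2, gb2 = h1.T @ d2, d2.sum(axis=0)
d1 = (d2 @ W2.T) * (z1 > 0)
gW1, gb1 = X.T @ d1, d1.sum(axis=0)

grads = {"W1": gW1, "b1": gb1, "W2": gW2, "b2": gb2, "W3": gW3, "b3": gb3}
for name, g in grads.items():
    print(name, "max |grad| =", np.abs(g).max())
```

In this sketch only `b3` receives a non-zero gradient. The hidden activations stay at zero on every later step too, so the weight matrices never move away from zero.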

In [19]:
# printing layer 1 weights after training
layer1_weights = model.layers[1].get_weights()[0]
print("layer 1 weights after 50 epochs: ")
print(layer1_weights)

# Final accuracy on the test set
final_accuracy = model.evaluate(X_test, y_test)[1]
print(f"\n final_accuracy: {final_accuracy:.4f}")


layer 1 weights after 50 epochs: 
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
282/282 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.0953 - loss: 2.3028

 final_accuracy: 0.0964


We can see that all the weights of layer 1 remained zero and that the final accuracy of the model is roughly 10%. This means the model did not learn anything and performs no better than random guessing across the 10 classes of CIFAR-10.
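The reported loss of about 2.3027 is consistent with this reading. As a small check, the sketch below computes the cross-entropy of a prediction that assigns equal probability to each of the 10 classes, which is ln 10. The variable names are illustrative.

```python
import numpy as np

num_classes = 10
uniform_probs = np.full(num_classes, 1.0 / num_classes)

# Cross-entropy for one sample: -log of the probability given to the true class.
true_class = 3  # arbitrary; every class gets the same probability
loss = -np.log(uniform_probs[true_class])

print(round(float(loss), 4))  # 2.3026
```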

## Conclusion

In this experiment, the weights of all neurons were initialized to zero.
As a result, the model was unable to learn meaningful features from the data.
The training and validation accuracy remained nearly constant at around 10%, which is equivalent to random guessing across the 10 classes of CIFAR-10.
This outcome highlights the importance of proper weight initialization.
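For contrast, the sketch below compares the first hidden layer's output under zero initialization with a He-normal-style random initialization. It uses plain NumPy and a hypothetical random batch in place of CIFAR-10 images. The plain Gaussian with standard deviation sqrt(2 / fan_in) is a simplification, not an exact reproduction of any Keras initializer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in batch: 8 flattened 32x32x3 images scaled to [0, 1).
X = rng.random((8, 3072))
fan_in, units = 3072, 50

# Zero initialization, as in the experiment.
W_zero = np.zeros((fan_in, units))
# He-normal-style initialization (assumption: untruncated Gaussian).
W_he = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, units))

# ReLU outputs of the first hidden layer (biases are zero in both cases).
h_zero = np.maximum(X @ W_zero, 0)
h_he = np.maximum(X @ W_he, 0)

print("zero init: active units =", int((h_zero > 0).sum()))
print("He init:   active units =", int((h_he > 0).sum()))
```

With random weights, some hidden units are active and differ from one another, so gradients can reach the weight matrices. With zero weights, no unit is ever active.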