# Why We Should Not Use These Weight Initialization Techniques
- **1. Zero Initialization**
    - **Problem:** If all weights in a layer are initialized to zero, every neuron in that layer computes the same output and receives the same gradient, so all of them learn the same feature during training. This is the symmetry problem: neurons in the same layer stay identical and cannot learn different features. With ReLU hidden layers and zero biases, the hidden activations are also all zero, so the hidden-layer weights receive no gradient at all.
    - **Consequence:** The network becomes far less expressive, because each layer behaves like a single neuron. As a result, the model performs poorly and cannot capture the complexity of the data.
- **2. Constant Initialization**
    - **Problem:** Initializing all weights to the same non-zero constant has the same symmetry problem as zero initialization. A non-zero constant lets signals and gradients flow through the network, but it does not break symmetry: all neurons in a layer start identical, receive identical updates, and therefore learn the same feature.
    - **Consequence:** The model cannot exploit a diversity of features across neurons, so it tends to converge to a suboptimal solution and perform poorly.
- **3. Random Weight Initialization from Small Values**
    - **Problem:** If weights are initialized with very small random values, each layer scales its input down, so pre-activations shrink toward zero as the network gets deeper. Tanh outputs then approach 0 and sigmoid outputs approach 0.5. This is a vanishing signal, not saturation.
    - **Vanishing Gradients:** During backpropagation, gradients are multiplied by the same small weights at every layer, so they become too small to update the early layers. This makes it hard for deep networks to learn.
    - **Consequence:** Unlike zero and constant initialization, small random weights do break symmetry, but learning is very slow and convergence is poor.
- **4. Random Weight Initialization from Large Values**
    - **Problem:** Large random weights produce large pre-activations, which saturate activation functions such as sigmoid and tanh, so neurons output values very close to their limits. With sigmoid, for example, large positive or negative pre-activations push the output to 1 or 0, where the derivative is close to zero (vanishing gradients). With ReLU, which does not saturate for positive inputs, large weights instead produce very large activations and gradients.
    - **Consequence:** Saturation makes the backpropagated signal very weak, leading to slow convergence or even a complete failure to learn during training.
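
The points above can be illustrated with a small NumPy sketch. It is not part of the experiment below: the toy data, the 3 -> 4 -> 1 tanh network without biases, and the helper names `first_layer_grad` and `mean_abs_activation` are all assumptions made for illustration. The first part compares the first-layer gradient under constant and random initialization; the second part passes random inputs through ten tanh layers with small and large weight scales.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 3))   # toy data: 8 samples, 3 features
y = rng.standard_normal((8, 1))

def first_layer_grad(W1, W2):
    """Gradient of the MSE loss w.r.t. W1 for a 3 -> 4 -> 1 tanh network (no biases)."""
    h = np.tanh(X @ W1)                   # hidden activations, shape (8, 4)
    err = h @ W2 - y                      # prediction error, shape (8, 1)
    dh = (err @ W2.T) * (1.0 - h ** 2)    # backpropagate through tanh
    return 2.0 * X.T @ dh / len(X)        # shape (3, 4): one column per hidden unit

# Constant initialization: every hidden unit gets the same gradient column.
g_const = first_layer_grad(np.full((3, 4), 0.1), np.full((4, 1), 0.1))
print(np.allclose(g_const, g_const[:, :1]))   # True

# Random initialization: the gradient columns differ, so units can specialize.
g_rand = first_layer_grad(rng.normal(0, 0.5, (3, 4)), rng.normal(0, 0.5, (4, 1)))
print(np.allclose(g_rand, g_rand[:, :1]))     # False

def mean_abs_activation(stddev, depth=10, width=64):
    """Mean |activation| after `depth` tanh layers with N(0, stddev^2) weights."""
    h = rng.standard_normal((256, width))
    for _ in range(depth):
        h = np.tanh(h @ rng.normal(0.0, stddev, (width, width)))
    return np.abs(h).mean()

small = mean_abs_activation(0.01)   # signal shrinks toward 0 layer by layer
large = mean_abs_activation(5.0)    # tanh saturates, outputs sit near -1 or +1
print(small, large)
```

In this sketch the constant case produces identical gradient columns, the small-weight case drives activations toward zero, and the large-weight case pins tanh outputs near its limits.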

# 1. Import necessary libraries and Load Dataset

In [107]:
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow import keras
from keras import Sequential
from keras.layers import Dense

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))


/kaggle/input/bostonhoustingmlnd/housing.csv


In [108]:
df = pd.read_csv("/kaggle/input/bostonhoustingmlnd/housing.csv")

In [109]:
df.head()

      RM  LSTAT  PTRATIO      MEDV
0  6.575   4.98     15.3  504000.0
1  6.421   9.14     17.8  453600.0
2  7.185   4.03     17.8  728700.0
3  6.998   2.94     18.7  701400.0
4  7.147   5.33     18.7  760200.0


In [110]:
df.shape

(489, 4)

In [111]:
# Check for null values
df.isnull().sum()

RM         0
LSTAT      0
PTRATIO    0
MEDV       0
dtype: int64

In [112]:
# Check for duplicate rows
df.duplicated().sum()

0

In [113]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 489 entries, 0 to 488
Data columns (total 4 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   RM       489 non-null    float64
 1   LSTAT    489 non-null    float64
 2   PTRATIO  489 non-null    float64
 3   MEDV     489 non-null    float64
dtypes: float64(4)
memory usage: 15.4 KB


# 2. Preprocessing

In [114]:
X = df.drop('MEDV',axis=1) # feature
y = df['MEDV'] # target variable

In [115]:
X.shape,y.shape

((489, 3), (489,))

In [116]:
# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [117]:
X_train

        RM  LSTAT  PTRATIO
325  5.869   9.80     20.2
140  6.174  24.16     21.2
433  6.749  17.44     20.2
416  6.436  16.22     20.2
487  6.794   6.48     21.0
..     ...    ...      ...
106  5.836  18.66     20.9
270  7.820   3.76     14.9
348  6.112  12.67     20.2
435  6.297  17.27     20.2


In [118]:
X_train.shape

(391, 3)

In [119]:
# Standardize the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
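
As a rough NumPy equivalent of the standardization step, the sketch below computes the per-feature mean and standard deviation from the training rows only and reuses them for the test rows. The small `train` and `test` arrays are toy values chosen for illustration, not the notebook's actual split.

```python
import numpy as np

# Toy arrays for illustration only (not the notebook's actual split).
train = np.array([[5.869, 9.80, 20.2],
                  [6.174, 24.16, 21.2],
                  [6.749, 17.44, 20.2],
                  [6.794, 6.48, 21.0]])
test = np.array([[6.436, 16.22, 20.2]])

mean = train.mean(axis=0)   # per-feature mean, from training rows only
std = train.std(axis=0)     # population std (ddof=0), as StandardScaler uses

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std   # reuse the training statistics on test rows

print(train_scaled.mean(axis=0).round(6))   # approximately 0 per feature
print(train_scaled.std(axis=0).round(6))    # approximately 1 per feature
```

Fitting on the training rows alone keeps information from the test set out of the preprocessing step.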

In [120]:
X_train

array([[-0.57729555, -0.42657057,  0.80699513],
       [-0.12427302,  1.59067125,  1.26588852],
       [ 0.72978585,  0.64666951,  0.80699513],
       ...,
       [-0.21636285, -0.02340316,  0.80699513],
       [ 0.05842131,  0.62278851,  0.80699513],
       [ 0.21883585, -0.30997512,  1.1282205 ]])

# 3. Define a Function to Build the Model with Different Initializers

In [121]:
import tensorflow as tf
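# Define the neural network: 3 inputs -> 10 -> 5 -> 1 output
def build_model(initializer):
    """Build and compile a small regression network whose hidden layers use `initializer`."""
    model = tf.keras.Sequential()
    # Hidden layers: kernels come from `initializer`; biases keep the Keras default (zeros)
    model.add(tf.keras.layers.Dense(10, activation='relu', kernel_initializer=initializer, input_shape=(X_train.shape[1],)))
    model.add(tf.keras.layers.Dense(5, activation='relu', kernel_initializer=initializer))
    # Output layer for regression; note it uses the Keras default kernel initializer, not `initializer`
    model.add(tf.keras.layers.Dense(1))
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model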
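
One way to check the symmetry claim on a trained model is to test whether all columns of a Dense kernel (one column per unit) are identical. The helper below is a hypothetical sketch, not part of the original notebook, and is demonstrated on plain NumPy arrays so that it runs on its own. For a model returned by `build_model`, the first-layer kernel should be available as `model.layers[0].get_weights()[0]`.

```python
import numpy as np

def units_are_identical(kernel, atol=1e-6):
    """Return True if every column (one per unit) of a Dense kernel equals the first column."""
    kernel = np.asarray(kernel)
    return bool(np.allclose(kernel, kernel[:, :1], atol=atol))

# A constant kernel: all 10 units share the same weights.
print(units_are_identical(np.full((3, 10), 0.1)))             # True

# A small random kernel: the units differ.
random_kernel = np.random.default_rng(0).normal(0.0, 0.01, (3, 10))
print(units_are_identical(random_kernel))                     # False

# Hypothetical usage on a trained Keras model (not executed here):
# units_are_identical(model.layers[0].get_weights()[0])
```

If symmetry persists through training, the check would be expected to return `True` for the first layer of a constant-initialized model and `False` for a randomly initialized one.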

# 4. Weight Initialization Techniques
- a. Initialize Weights from Zero
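
To see what zero initialization does to this architecture, the sketch below runs one forward and backward pass by hand in NumPy for a 3 -> 10 -> 5 -> 1 ReLU network. It makes several assumptions: the data is random toy data on a MEDV-like scale, the output kernel `W3` is drawn from a uniform range as a stand-in for the framework default (since `build_model` does not pass the initializer to the output layer), and the ReLU derivative at exactly 0 is taken to be 0.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((16, 3))            # toy standardized features
y = rng.uniform(4e5, 8e5, size=(16, 1))     # toy targets on a MEDV-like scale

# Hidden kernels and all biases start at zero; the output kernel is random.
W1, b1 = np.zeros((3, 10)), np.zeros(10)
W2, b2 = np.zeros((10, 5)), np.zeros(5)
W3, b3 = rng.uniform(-0.5, 0.5, (5, 1)), np.zeros(1)

# Forward pass
z1 = X @ W1 + b1
h1 = np.maximum(z1, 0)
z2 = h1 @ W2 + b2
h2 = np.maximum(z2, 0)
pred = h2 @ W3 + b3

# Backward pass for the MSE loss
d_pred = 2 * (pred - y) / len(X)
grads = {"W3": h2.T @ d_pred, "b3": d_pred.sum(axis=0)}
d_z2 = (d_pred @ W3.T) * (z2 > 0)           # ReLU derivative at 0 assumed to be 0
grads["W2"], grads["b2"] = h1.T @ d_z2, d_z2.sum(axis=0)
d_z1 = (d_z2 @ W2.T) * (z1 > 0)
grads["W1"], grads["b1"] = X.T @ d_z1, d_z1.sum(axis=0)

# List the parameters that receive a non-zero gradient.
nonzero = [name for name, g in grads.items() if np.any(g != 0)]
print(nonzero)   # ['b3']
```

Under these assumptions only the output bias receives a gradient, so the network can do no better than predict a single constant for every input.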

In [123]:
model_zero = build_model(tf.keras.initializers.Zeros())
model_zero.fit(X_train, y_train, epochs=100, verbose=0)

<keras.src.callbacks.history.History at 0x7edbd8156950>

In [126]:
model_constant = build_model(tf.keras.initializers.Constant(value=0.1))
model_constant.fit(X_train, y_train, epochs=100, verbose=0)

<keras.src.callbacks.history.History at 0x7edc02889360>

In [127]:
model_random_small = build_model(tf.keras.initializers.RandomNormal(mean=0.0, stddev=0.01))
model_random_small.fit(X_train, y_train, epochs=100, verbose=0)

<keras.src.callbacks.history.History at 0x7edbf1647ac0>

In [128]:
model_random_large_normal = build_model(tf.keras.initializers.RandomNormal(mean=0.0, stddev=5.0))
model_random_large_normal.fit(X_train, y_train, epochs=100, verbose=0)

<keras.src.callbacks.history.History at 0x7edbf0c377c0>

In [129]:
mse_zero = model_zero.evaluate(X_test, y_test, verbose=0)
mse_constant = model_constant.evaluate(X_test, y_test, verbose=0)
mse_random_small = model_random_small.evaluate(X_test, y_test, verbose=0)
mse_random_large_normal = model_random_large_normal.evaluate(X_test, y_test, verbose=0)

print(f'MSE with Zero Initialization: {mse_zero}')
print(f'MSE with Constant Initialization: {mse_constant}')
print(f'MSE with Random Small Values Initialization: {mse_random_small}')
print(f'MSE with Random Normal Initialization (Large Values): {mse_random_large_normal}')

MSE with Zero Initialization: 202395140096.0
MSE with Constant Initialization: 201631563776.0
MSE with Random Small Values Initialization: 201870786560.0
MSE with Random Normal Initialization (Large Values): 201914023936.0
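
The MSE values are easier to read as RMSE, which is in the same units as `MEDV`. The short sketch below takes the four printed values, converts them, and reports how far apart they are. The dictionary keys are labels chosen here for illustration.

```python
import math

# Test-set MSE values copied from the printed output above.
mse = {
    "zero": 202395140096.0,
    "constant": 201631563776.0,
    "random_small": 201870786560.0,
    "random_large": 201914023936.0,
}

# RMSE is expressed in the same units as the target (MEDV).
rmse = {name: math.sqrt(value) for name, value in mse.items()}
for name, value in rmse.items():
    print(f"RMSE with {name} initialization: {value:,.0f}")

# Relative gap between the worst and the best run.
spread = (max(rmse.values()) - min(rmse.values())) / min(rmse.values())
print(f"Relative spread: {spread:.2%}")
```

All four RMSE values come out near 450,000, the same order of magnitude as the `MEDV` values shown by `df.head()`, and they differ from one another by well under 1%. This suggests that, with this setup, the test error alone does not separate the four initializers.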


In [125]:
mse_zero

202395140096.0

In [92]:
# `model` is not defined in any cell shown above (note the earlier execution count)
model.get_weights()

[array([[ 0.30117166,  0.4882027 , -0.65265244,  0.54126215,  0.48530948,
          0.20859343,  0.61264193, -0.54155123,  0.6586546 ,  0.5061133 ],
        [-0.11177373, -0.58637124, -0.19706395,  0.5533215 , -0.35316822,
         -0.3621001 ,  0.23664027, -0.16318911, -0.4795492 ,  0.5085734 ],
        [-0.356909  , -0.06509054,  0.31833792, -0.03454667, -0.47390473,
          0.26749474, -0.41945362,  0.13260967,  0.5204623 ,  0.07378364]],
       dtype=float32),
 array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32),
 array([[-0.05104458],
        [-0.29337302],
        [ 0.5298676 ],
        [-0.40404063],
        [-0.7077781 ],
        [ 0.1824584 ],
        [ 0.42178053],
        [ 0.4736392 ],
        [ 0.6513776 ],
        [-0.35008767]], dtype=float32),
 array([0.], dtype=float32)]

In [93]:
initial_weights = model.get_weights()