# Dying ReLU

The dying ReLU problem occurs when many neurons output zero for every input. A ReLU unit outputs zero whenever its pre-activation (the weighted sum of its inputs plus the bias) is negative. This gives ReLU networks a useful sparsity, but it becomes a serious problem when the pre-activations of most neurons are negative for most inputs. In the worst case, the entire network dies and only a constant function remains.

When a neuron outputs zero, the gradient of ReLU at that point is also zero, so no gradient flows back through it and its incoming weights are not updated. When most of the neurons are in this state, the network stops learning.
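To make the gradient argument concrete, here is a minimal NumPy sketch of ReLU and its derivative. The helper names `relu` and `relu_grad` are illustrative and not part of the notebook; the derivative at exactly zero is taken to be 0, which is a common convention.

```python
import numpy as np

def relu(z):
    # ReLU passes positive values through and clamps the rest to zero
    return np.maximum(z, 0.0)

def relu_grad(z):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise (0 chosen at z == 0)
    return (z > 0).astype(float)

z = np.array([-2.0, -0.5, 0.0, 1.5])
print("pre-activation:", z)
print("relu output:   ", relu(z))
print("relu gradient: ", relu_grad(z))
```

Every non-positive pre-activation gives both a zero output and a zero gradient, so a unit whose pre-activation is negative for all training inputs receives no weight updates.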

In [1]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras import initializers

In [2]:
# Generate some synthetic training data
np.random.seed(42)
X_train = np.random.rand(1000, 10)
y_train = np.random.randint(2, size=(1000, 1))

#### ReLU Activation
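We create a simple model with two hidden layers and an output layer using the Keras functional API. Both hidden layers use the ReLU activation function. We initialize the kernel weights with constant values: 0.5 for the first hidden layer and -0.5 for the second. Because the outputs of the first hidden layer are non-negative, the negative weights of the second hidden layer push its pre-activations below zero, which is the condition that kills its ReLU units.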

In [3]:
# Model using ReLU
inputs = Input(shape=(10,))
hidden1 = Dense(10, activation='relu', kernel_initializer=initializers.Constant(0.5))(inputs)
hidden2 = Dense(10, activation='relu', kernel_initializer=initializers.Constant(-0.5))(hidden1)
outputs = Dense(1, activation='sigmoid')(hidden2)
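The effect of these initial weights can be checked by hand. The following is a rough NumPy sketch of the forward pass through the two hidden layers at initialization, assuming zero biases (the default for `Dense`) and inputs drawn from [0, 1) like `X_train`. The variable names (`W1`, `W2`, `h1`, `z2`, `h2`) are illustrative only, and the random data here is not the same array as `X_train`.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.random((1000, 10))          # inputs in [0, 1), same shape as X_train

W1 = np.full((10, 10), 0.5)         # hidden1 kernel: constant 0.5
W2 = np.full((10, 10), -0.5)        # hidden2 kernel: constant -0.5

h1 = np.maximum(X @ W1, 0.0)        # hidden1 output, non-negative
z2 = h1 @ W2                        # hidden2 pre-activation, never positive
h2 = np.maximum(z2, 0.0)            # hidden2 output after ReLU

print("largest hidden2 pre-activation:", z2.max())
print("fraction of zero outputs in hidden2:", np.mean(h2 == 0))
```

Under these assumptions every unit in the second hidden layer outputs zero for every sample, so no gradient reaches either hidden layer's kernel and only the output layer can still change during training.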

#### Swish Activation

In [6]:
# Model using Swish
inputs = Input(shape=(10,))
hidden1 = Dense(10, activation='swish', kernel_initializer=initializers.Constant(0.5))(inputs)
hidden2 = Dense(10, activation='swish', kernel_initializer=initializers.Constant(-0.5))(hidden1)
outputs = Dense(1, activation='sigmoid')(hidden2)
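Swish is commonly defined as x * sigmoid(x). Unlike ReLU, it returns small negative values for negative inputs, and its gradient there is generally nonzero. A minimal NumPy sketch, with illustrative helper names (`sigmoid`, `swish`, `swish_grad`) that are not part of the notebook:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def swish(z):
    # Swish with beta = 1: z * sigmoid(z)
    return z * sigmoid(z)

def swish_grad(z):
    # d/dz [z * s(z)] = s(z) + z * s(z) * (1 - s(z))
    s = sigmoid(z)
    return s + z * s * (1.0 - s)

z = np.array([-4.0, -2.0, -0.5])
print("swish output:  ", swish(z))
print("swish gradient:", swish_grad(z))
print("relu gradient: ", (z > 0).astype(float))
```

For these sample inputs the Swish gradient is small but nonzero, while the ReLU gradient is exactly zero. This is why the same constant initialization does not freeze the Swish model in the way it can freeze the ReLU model.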

In [7]:
model = Model(inputs=inputs, outputs=outputs)

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy')

# Train the model
model.fit(X_train, y_train, epochs=10, batch_size=32)

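# Extract and print the outputs of the two hidden layers.
# A sub-model is used here because tf.keras.backend.function is not available in Keras 3.
activation_model = Model(inputs=model.input, outputs=[model.layers[1].output, model.layers[2].output])
layer1_values, layer2_values = activation_model.predict(X_train, verbose=0)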

print("Layer 1 neuron values:")
print(layer1_values)
print("Layer 2 neuron values:")
print(layer2_values)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Layer 1 neuron values:
[[2.2742062 2.2742062 2.2742062 ... 2.2742062 2.2742062 2.2742062]
 [1.6155307 1.6155307 1.6155307 ... 1.6155307 1.6155307 1.6155307]
 [1.5957775 1.5957775 1.5957775 ... 1.5957775 1.5957775 1.5957775]
 ...
 [2.4228406 2.4228406 2.4228406 ... 2.4228406 2.4228406 2.4228406]
 [2.1356766 2.1356766 2.1356766 ... 2.1356766 2.1356766 2.1356766]
 [2.4301813 2.4301813 2.4301813 ... 2.4301813 2.4301813 2.4301813]]
Layer 2 neuron values:
[[-2.05692311e-04 -5.63098292e-05 -3.58903140e-04 ... -5.16848377e-05
  -3.84886516e-04 -5.09663987e-05]
 [-3.43180611e-03 -1.39147195e-03 -5.04527893e-03 ... -1.31058076e-03
  -5.29633462e-03 -1.29781757e-03]
 [-3.72623908e-03 -1.52883097e-03 -5.44990459e-03 ... -1.44107360e-03
  -5.71740652e-03 -1.42722193e-03]
 ...
 [-1.07474742e-04 -2.69329885e-05 -1.94892811e-04 ... -2.45776027e-05
  -2.10005921e-04 -2.42129445e-05]
 [-3.75216