<a href="https://colab.research.google.com/github/PrateekKaushal15/Deep-Learning-and-Data-Analytics-Lab-2025/blob/main/Experiment_no_05_Dropout.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Implementing the dropout function for a single layer. Draw samples from the uniform distribution U[0, 1]. Keep the nodes whose sample is greater than the dropout probability p and drop the rest. The `dropout_layer` function drops each element of the input tensor X with probability `dropout_prob` and rescales the remaining elements by 1 / (1 - dropout_prob). The layer is then tested on a small example.

In [None]:
import numpy as np

def dropout_layer(X, dropout_prob):
    """
    Applies dropout to the input tensor X.

    Parameters:
    X (numpy.ndarray): Input tensor.
    dropout_prob (float): Dropout probability (probability of dropping a unit).

    Returns:
    numpy.ndarray: Output tensor with dropout applied.
    """
    assert 0 <= dropout_prob < 1, "Dropout probability must be in the range [0, 1)."

    # Create a mask of the same shape as X, with values drawn from Uniform[0,1]
    mask = np.random.uniform(0, 1, X.shape) > dropout_prob

    # Apply mask and rescale
    return (X * mask) / (1 - dropout_prob) if dropout_prob > 0 else X

# Test the dropout function with a few examples
np.random.seed(42)  # For reproducibility

X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

dropout_prob = 0.3  # 30% probability of dropping a node
output = dropout_layer(X, dropout_prob)

print("Input:")
print(X)
print("\nDropout Output:")
print(output)

Input:
[[1. 2. 3.]
 [4. 5. 6.]]

Dropout Output:
[[1.42857143 2.85714286 4.28571429]
 [5.71428571 0.         0.        ]]
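The rescaling can be sanity-checked with a short sketch: each element is kept with probability 1 - p and scaled by 1 / (1 - p), so averaging the output over many independent masks should recover the input. The trial count and tolerance here are arbitrary choices, and this version takes a local `numpy.random.default_rng` generator as an extra argument instead of relying on the global seed.

```python
import numpy as np

def dropout_layer(X, dropout_prob, rng):
    """Inverted dropout: zero each element with probability dropout_prob, rescale the rest."""
    assert 0 <= dropout_prob < 1, "Dropout probability must be in the range [0, 1)."
    if dropout_prob == 0:
        return X
    mask = rng.uniform(0, 1, X.shape) > dropout_prob
    return (X * mask) / (1 - dropout_prob)

rng = np.random.default_rng(0)
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# Average the output over many independent masks (the trial count is arbitrary).
n_trials = 20000
mean_output = sum(dropout_layer(X, 0.3, rng) for _ in range(n_trials)) / n_trials

print(np.round(mean_output, 2))                  # close to X, up to sampling noise
print(np.allclose(mean_output, X, rtol=0.05))    # loose tolerance for the Monte Carlo estimate
```

A single call can look very different from X, as in the output above, but the expectation is unchanged, which is what lets the layer be switched off at test time.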


# Implement a dropout layer in the neural network model after every fully connected layer

Datasets used:

1. MNIST: a standard dataset of handwritten digits.

2. Reuters-RCV1: a collection of Reuters newswire articles.

3. CIFAR10: a collection of color images. It contains 60,000 32 × 32 images in 10 different classes.

In [None]:


import numpy as np
import tensorflow as tf
import wandb

# 1. Dropout Function (using TensorFlow's Dropout layer)
# We'll use tf.keras.layers.Dropout within the model definition

# 2. Neural Network with Dropout after every Fully Connected Layer
def create_model(dropout_rate=0.5, dataset="mnist"):
    """Creates a model with dropout after each dense layer."""
    model = tf.keras.models.Sequential()

    # Input layer (adjust based on dataset)
    if dataset == "mnist":
        model.add(tf.keras.layers.Flatten(input_shape=(28, 28)))
    elif dataset == "cifar10":
        model.add(tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
        model.add(tf.keras.layers.MaxPooling2D((2, 2)))
        model.add(tf.keras.layers.Conv2D(64, (3, 3), activation='relu'))
        model.add(tf.keras.layers.MaxPooling2D((2, 2)))
        model.add(tf.keras.layers.Conv2D(64, (3, 3), activation='relu'))
        model.add(tf.keras.layers.Flatten())
    elif dataset == "reuters":
        # Assuming preprocessed input with vocabulary size and max sequence length
        vocab_size = 10000
        max_length = 100
        model.add(tf.keras.layers.Embedding(vocab_size, 128, input_length=max_length))
        model.add(tf.keras.layers.GlobalAveragePooling1D())
    else:
        raise ValueError("Invalid dataset name. Choose from 'mnist', 'cifar10', or 'reuters'.")

    # Hidden layers with dropout
    model.add(tf.keras.layers.Dense(128, activation='relu'))
    model.add(tf.keras.layers.Dropout(dropout_rate))
    model.add(tf.keras.layers.Dense(64, activation='relu'))
    model.add(tf.keras.layers.Dropout(dropout_rate))

    # Output layer (adjust based on dataset)
    if dataset in ["mnist", "cifar10"]:
        model.add(tf.keras.layers.Dense(10, activation='softmax'))
    elif dataset == "reuters":
        num_classes = 46  # Replace with your number of classes
        model.add(tf.keras.layers.Dense(num_classes, activation='softmax'))

    return model

# 3. Weight Initialization Strategies
def initialize_weights(model, strategy="random", pretraining_path=None, threshold=None):
    if strategy == "random":
        # Already initialized randomly by default
        pass
    elif strategy == "pretraining":
        if pretraining_path:
            model.load_weights(pretraining_path)
        else:
            print("Error: Pretraining path not provided.")
    elif strategy == "threshold":
        # Clip weights to a threshold
        for layer in model.layers:
            if isinstance(layer, tf.keras.layers.Dense):  # Apply to Dense layers only
                weights = layer.get_weights()
                clipped_weights = [np.clip(w, -threshold, threshold) for w in weights]
                layer.set_weights(clipped_weights)
    else:
        print("Error: Invalid weight initialization strategy.")

# 4. Training and Visualization
def train_and_visualize(config):
    wandb.init(project="dropout-experiment", config=config)

    model = create_model(config['dropout_rate'], config['dataset'])
    initialize_weights(model, strategy=config.get('weight_init_strategy', 'random'),
                      threshold=config.get('weight_threshold'))  # Default to 'random'

    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

    # Data loading (adjust based on dataset)
    if config['dataset'] == "mnist":
        (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
        x_train, x_test = x_train / 255.0, x_test / 255.0
    elif config['dataset'] == "cifar10":
        (x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
        x_train, x_test = x_train / 255.0, x_test / 255.0
    elif config['dataset'] == "reuters":
        # Load and preprocess Reuters data here

        raise NotImplementedError("Reuters data loading not implemented in this example.")
    else:
        raise ValueError("Invalid dataset name.")

    history = model.fit(x_train, y_train, epochs=config['epochs'],
                        validation_data=(x_test, y_test))

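    # Log all metrics for an epoch in one call: wandb steps must not decrease,
    # so logging one metric at a time loses the history of every metric after the first
    num_epochs = len(history.history['loss'])
    for epoch in range(num_epochs):
        wandb.log({metric: values[epoch] for metric, values in history.history.items()},
                  step=epoch)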

    wandb.finish()

# Example Configuration
config = {
    "dropout_rate": 0.5,
    "dataset": "mnist",  # or "cifar10" or "reuters"
    "epochs": 10,
    "weight_init_strategy": "random",  # or "pretraining" or "threshold"
    "weight_threshold": 1.0  # Only used if weight_init_strategy is "threshold"
}

train_and_visualize(config)

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


Epoch 1/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 15s 6ms/step - accuracy: 0.7069 - loss: 0.8968 - val_accuracy: 0.9398 - val_loss: 0.1910
Epoch 2/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 11s 6ms/step - accuracy: 0.9067 - loss: 0.3331 - val_accuracy: 0.9542 - val_loss: 0.1508
Epoch 3/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 21s 6ms/step - accuracy: 0.9249 - loss: 0.2715 - val_accuracy: 0.9614 - val_loss: 0.1313
Epoch 4/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 20s 6ms/step - accuracy: 0.9329 - loss: 0.2427 - val_accuracy: 0.9649 - val_loss: 0.1197
Epoch 5/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 12s 7ms/step - accuracy: 0.9381 - loss: 0.2163 - val_accuracy: 0.9664 - val_loss: 0.1130
Epoch 6/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 12s 6ms/step -

Run history:
accuracy      ▁▆▇▇▇█████
loss          ▁
val_accuracy  ▁
val_loss      ▁

Run summary:
accuracy      0.94902
loss          0.17381
val_accuracy  0.9712
val_loss      0.10123


# Conducting experiments on different configurations

| Model | Activation | Layers | Units |
|---|---|---|---|
| StandardNeuralNet | Logistic | 2 | 100 |
| StandardNeuralNet | Logistic | 2 | 800 |
| DropoutNN | Logistic | 3 | 1024 |
| DropoutNN | ReLU | 3 | 1024 |

Apply dropout:

1) randomly at all hidden layers with probability p

2) after every fully connected layer with probability p

3) at the input and the first hidden layer, with a drop rate of 20%-50% in each update cycle

4) DropConnect approach

5) DropBlock

6) MaxDropout

7) Biased dropout

8) Flipover



In [None]:
import numpy as np
import tensorflow as tf
import wandb

# 1. Dropout Function (using TensorFlow's Dropout layer)
# We'll use tf.keras.layers.Dropout within the model definition

# 2. Neural Network with Dropout after every Fully Connected Layer
def create_model(dropout_rate=0.5, dataset="mnist", dropout_type="standard", num_layers=2, units=100, activation="sigmoid"):
    """Creates a model with different dropout options."""
    model = tf.keras.models.Sequential()

    # Input layer (adjust based on dataset)
    if dataset == "mnist":
        model.add(tf.keras.layers.Flatten(input_shape=(28, 28)))
    elif dataset == "cifar10":
        model.add(tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
        model.add(tf.keras.layers.MaxPooling2D((2, 2)))
        model.add(tf.keras.layers.Conv2D(64, (3, 3), activation='relu'))
        model.add(tf.keras.layers.MaxPooling2D((2, 2)))
        model.add(tf.keras.layers.Conv2D(64, (3, 3), activation='relu'))
        model.add(tf.keras.layers.Flatten())
    elif dataset == "reuters":
        # Assuming preprocessed input with vocabulary size and max sequence length
        vocab_size = 10000
        max_length = 100
        model.add(tf.keras.layers.Embedding(vocab_size, 128, input_length=max_length))
        model.add(tf.keras.layers.GlobalAveragePooling1D())
    else:
        raise ValueError("Invalid dataset name. Choose from 'mnist', 'cifar10', or 'reuters'.")

    # Dropout on the input features, i.e. the output of the Flatten/pooling layer above
    if dropout_type == "input_first_hidden":
        model.add(tf.keras.layers.Dropout(dropout_rate))

    # Hidden layers with dropout
    for i in range(num_layers):
        model.add(tf.keras.layers.Dense(units, activation=activation))
        # "input_first_hidden" only drops units after the first hidden layer
        drop_here = (dropout_type in ["standard", "all_hidden"]
                     or (dropout_type == "input_first_hidden" and i == 0))
        if drop_here:
            model.add(tf.keras.layers.Dropout(dropout_rate))

    # Output layer (adjust based on dataset)
    if dataset in ["mnist", "cifar10"]:
        model.add(tf.keras.layers.Dense(10, activation='softmax'))
    elif dataset == "reuters":
        num_classes = 46  # Replace with your number of classes
        model.add(tf.keras.layers.Dense(num_classes, activation='softmax'))

    return model

#  3. Weight Initialization Strategies
def initialize_weights(model, strategy="random", pretraining_path=None, threshold=None):
    if strategy == "random":
        # Already initialized randomly by default
        pass
    elif strategy == "pretraining":
        if pretraining_path:
            model.load_weights(pretraining_path)
        else:
            print("Error: Pretraining path not provided.")
    elif strategy == "threshold":
        # Clip weights to a threshold
        for layer in model.layers:
            if isinstance(layer, tf.keras.layers.Dense):  # Apply to Dense layers only
                weights = layer.get_weights()
                clipped_weights = [np.clip(w, -threshold, threshold) for w in weights]
                layer.set_weights(clipped_weights)
    else:
        print("Error: Invalid weight initialization strategy.")

# 4. Training and Visualization
def train_and_visualize(config):
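    # Name the run after the experiment when the config provides a name
    wandb.init(project="dropout-experiment", config=config,
               name=config.get('experiment_name'))

    # Pass every architecture option from the config; otherwise create_model
    # silently falls back to its defaults (standard dropout, 2 sigmoid layers of 100 units)
    model = create_model(dropout_rate=config['dropout_rate'],
                         dataset=config['dataset'],
                         dropout_type=config.get('dropout_type', 'standard'),
                         num_layers=config.get('num_layers', 2),
                         units=config.get('units', 100),
                         activation=config.get('activation', 'sigmoid'))
    initialize_weights(model, strategy=config.get('weight_init_strategy', 'random'),
                      threshold=config.get('weight_threshold'))  # Default to 'random'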

    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

    # Data loading (adjust based on dataset)
    if config['dataset'] == "mnist":
        (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
        x_train, x_test = x_train / 255.0, x_test / 255.0
    elif config['dataset'] == "cifar10":
        (x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
        x_train, x_test = x_train / 255.0, x_test / 255.0
    elif config['dataset'] == "reuters":
        # Load and preprocess Reuters data here

        raise NotImplementedError("Reuters data loading not implemented in this example.")
    else:
        raise ValueError("Invalid dataset name.")

    history = model.fit(x_train, y_train, epochs=config['epochs'],
                        validation_data=(x_test, y_test))

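    # Log all metrics for an epoch in one call: wandb steps must not decrease,
    # so logging one metric at a time loses the history of every metric after the first
    num_epochs = len(history.history['loss'])
    for epoch in range(num_epochs):
        wandb.log({metric: values[epoch] for metric, values in history.history.items()},
                  step=epoch)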

    wandb.finish()

In [None]:
# Example experiment with StandardNeuralNet, Logistic, 2 layers, 100 units
config = {
    "dropout_rate": 0.5,
    "dataset": "mnist",
    "epochs": 10,
    "dropout_type": "standard",  # or "input_first_hidden", "all_hidden", etc.
    "num_layers": 2,
    "units": 100,
    "activation": "sigmoid"  # Logistic activation
}

train_and_visualize(config)

Epoch 1/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 12s 5ms/step - accuracy: 0.5557 - loss: 1.3327 - val_accuracy: 0.9148 - val_loss: 0.2949
Epoch 2/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 12s 6ms/step - accuracy: 0.8707 - loss: 0.4485 - val_accuracy: 0.9307 - val_loss: 0.2331
Epoch 3/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 19s 5ms/step - accuracy: 0.8984 - loss: 0.3508 - val_accuracy: 0.9434 - val_loss: 0.1902
Epoch 4/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 10s 5ms/step - accuracy: 0.9114 - loss: 0.3091 - val_accuracy: 0.9489 - val_loss: 0.1702
Epoch 5/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 12s 6ms/step - accuracy: 0.9215 - loss: 0.2770 - val_accuracy: 0.9530 - val_loss: 0.1523
Epoch 6/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 18s 5ms/step - accuracy: 0.9273 - loss: 0.2536 - val_accuracy: 0.9548 - val_loss: 0.1428
Epoch 7/10

Run history:
accuracy      ▁▆▇▇▇█████
loss          ▁
val_accuracy  ▁
val_loss      ▁

Run summary:
accuracy      0.93818
loss          0.21089
val_accuracy  0.9623
val_loss      0.11759


In [None]:
# Example experiment on CIFAR10 with dropout on the input and first hidden layer
config = {
    "dropout_rate": 0.5,
    "dataset": "cifar10",
    "epochs": 10,
    "dropout_type": "input_first_hidden",  # or "standard", "all_hidden"
    "num_layers": 2,
    "units": 100,
    "activation": "sigmoid"  # Logistic activation
}

train_and_visualize(config)

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz


Epoch 1/10
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 84s 52ms/step - accuracy: 0.1889 - loss: 2.1563 - val_accuracy: 0.4554 - val_loss: 1.4987
Epoch 2/10
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 81s 52ms/step - accuracy: 0.4260 - loss: 1.5666 - val_accuracy: 0.5140 - val_loss: 1.3191
Epoch 3/10
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 82s 52ms/step - accuracy: 0.5074 - loss: 1.3782 - val_accuracy: 0.5774 - val_loss: 1.1768
Epoch 4/10
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 88s 56ms/step - accuracy: 0.5572 - loss: 1.2550 - val_accuracy: 0.6079 - val_loss: 1.0985
Epoch 5/10
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 142s 57ms/step - accuracy: 0.5938 - loss: 1.1656 - val_accuracy: 0.6227 - val_loss: 1.0616
Epoch 6/10
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 137s 5

Run history:
accuracy      ▁▄▅▆▆▇▇▇██
loss          ▁
val_accuracy  ▁
val_loss      ▁

Run summary:
accuracy      0.69068
loss          0.90745
val_accuracy  0.6863
val_loss      0.90599


In [None]:
# Configurations
configs = [
    {
        "dropout_rate": 0,  # No dropout for StandardNeuralNet
        "dataset": "mnist",  # or "cifar10"
        "epochs": 10,
        "dropout_type": "standard",
        "num_layers": 2,
        "units": 100,
        "activation": "sigmoid",
        "experiment_name": "StandardNeuralNet_Logistic_2layers_100units"
    },
    {
        "dropout_rate": 0,  # No dropout for StandardNeuralNet
        "dataset": "mnist",  # or "cifar10"
        "epochs": 10,
        "dropout_type": "standard",
        "num_layers": 2,
        "units": 800,
        "activation": "sigmoid",
        "experiment_name": "StandardNeuralNet_Logistic_2layers_800units"
    },
    {
        "dropout_rate": 0.5,  # Adjust dropout rate as needed
        "dataset": "mnist",  # or "cifar10"
        "epochs": 10,
        "dropout_type": "standard",
        "num_layers": 3,
        "units": 1024,
        "activation": "sigmoid",
        "experiment_name": "DropoutNN_Logistic_3layers_1024units"
    },
    {
        "dropout_rate": 0.5,  # Adjust dropout rate as needed
        "dataset": "mnist",  # or "cifar10"
        "epochs": 10,
        "dropout_type": "standard",
        "num_layers": 3,
        "units": 1024,
        "activation": "relu",
        "experiment_name": "DropoutNN_ReLU_3layers_1024units"
    }
]

# Run experiments
for config in configs:
    train_and_visualize(config)

Epoch 1/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 12s 5ms/step - accuracy: 0.7870 - loss: 0.8384 - val_accuracy: 0.9369 - val_loss: 0.2072
Epoch 2/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 10s 5ms/step - accuracy: 0.9423 - loss: 0.1962 - val_accuracy: 0.9541 - val_loss: 0.1512
Epoch 3/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 9s 4ms/step - accuracy: 0.9602 - loss: 0.1359 - val_accuracy: 0.9641 - val_loss: 0.1188
Epoch 4/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 10s 5ms/step - accuracy: 0.9703 - loss: 0.0993 - val_accuracy: 0.9700 - val_loss: 0.0951
Epoch 5/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 11s 6ms/step - accuracy: 0.9774 - loss: 0.0761 - val_accuracy: 0.9722 - val_loss: 0.0902
Epoch 6/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 8s 4ms/step - accuracy: 0.9822 - loss: 0.0602 - val_accuracy: 0.9707 - val_loss: 0.0905
Epoch 7/10

Run history:
accuracy      ▁▅▆▇▇▇████
loss          ▁
val_accuracy  ▁
val_loss      ▁

Run summary:
accuracy      0.9924
loss          0.02716
val_accuracy  0.9764
val_loss      0.07761


Epoch 1/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 12s 5ms/step - accuracy: 0.7730 - loss: 0.8522 - val_accuracy: 0.9370 - val_loss: 0.2111
Epoch 2/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 11s 6ms/step - accuracy: 0.9442 - loss: 0.1914 - val_accuracy: 0.9556 - val_loss: 0.1476
Epoch 3/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 9s 5ms/step - accuracy: 0.9619 - loss: 0.1283 - val_accuracy: 0.9639 - val_loss: 0.1165
Epoch 4/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 10s 5ms/step - accuracy: 0.9716 - loss: 0.0923 - val_accuracy: 0.9677 - val_loss: 0.1018
Epoch 5/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 11s 6ms/step - accuracy: 0.9791 - loss: 0.0721 - val_accuracy: 0.9726 - val_loss: 0.0871
Epoch 6/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 19s 5ms/step - accuracy: 0.9834 - loss: 0.0575 - val_accuracy: 0.9738 - val_loss: 0.0829
Epoch 7/10

Run history:
accuracy      ▁▅▆▇▇▇████
loss          ▁
val_accuracy  ▁
val_loss      ▁

Run summary:
accuracy      0.99278
loss          0.02582
val_accuracy  0.9759
val_loss      0.08068


Epoch 1/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 12s 6ms/step - accuracy: 0.5498 - loss: 1.3551 - val_accuracy: 0.9121 - val_loss: 0.2997
Epoch 2/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 19s 5ms/step - accuracy: 0.8723 - loss: 0.4404 - val_accuracy: 0.9340 - val_loss: 0.2253
Epoch 3/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 10s 5ms/step - accuracy: 0.8963 - loss: 0.3600 - val_accuracy: 0.9435 - val_loss: 0.1892
Epoch 4/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 11s 6ms/step - accuracy: 0.9098 - loss: 0.3092 - val_accuracy: 0.9494 - val_loss: 0.1667
Epoch 5/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 9s 5ms/step - accuracy: 0.9217 - loss: 0.2740 - val_accuracy: 0.9528 - val_loss: 0.1548
Epoch 6/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 12s 6ms/step - accuracy: 0.9257 - loss: 0.2542 - val_accuracy: 0.9559 - val_loss: 0.1408
Epoch 7/10

Run history:
accuracy      ▁▆▇▇▇█████
loss          ▁
val_accuracy  ▁
val_loss      ▁

Run summary:
accuracy      0.93718
loss          0.21363
val_accuracy  0.9618
val_loss      0.12082


Epoch 1/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 12s 6ms/step - accuracy: 0.5530 - loss: 1.3299 - val_accuracy: 0.9172 - val_loss: 0.2950
Epoch 2/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 21s 6ms/step - accuracy: 0.8723 - loss: 0.4363 - val_accuracy: 0.9323 - val_loss: 0.2246
Epoch 3/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 21s 6ms/step - accuracy: 0.8980 - loss: 0.3540 - val_accuracy: 0.9414 - val_loss: 0.1926
Epoch 4/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 10s 5ms/step - accuracy: 0.9090 - loss: 0.3090 - val_accuracy: 0.9488 - val_loss: 0.1667
Epoch 5/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 12s 6ms/step - accuracy: 0.9183 - loss: 0.2779 - val_accuracy: 0.9534 - val_loss: 0.1518
Epoch 6/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 18s 5ms/step - accuracy: 0.9248 - loss: 0.2523 - val_accuracy: 0.9552 - val_loss: 0.1454
Epoch 7/10

Run history:
accuracy      ▁▆▇▇▇█████
loss          ▁
val_accuracy  ▁
val_loss      ▁

Run summary:
accuracy      0.93913
loss          0.20942
val_accuracy  0.9626
val_loss      0.11913


# Output analysis

If we change the dropout probabilities for different layers:

**Generalization:** Dropout improves generalization by reducing overfitting. Setting a different dropout probability for each layer gives finer control over how strongly each layer is regularized.

**Early Layers:** Dropout on the input and early layers forces the network to spread its reliance over many inputs, which encourages more robust features. Too high a rate here discards raw input information, which is why the drop rate listed above for the input and first hidden layer is limited to 20%-50%.

**Later Layers:** Lower dropout probabilities in later layers can help preserve the information needed for the final classification.

**Optimal Configuration:** The best combination of dropout probabilities depends on the dataset and the model architecture, so it has to be found by experiment.
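As a rough illustration of per-layer dropout probabilities, the sketch below runs a NumPy forward pass through a small ReLU network with one drop rate per layer. The function names, layer sizes, weight scale, and the rates 0.2 and 0.5 are all hypothetical choices for illustration, not values taken from the experiments in this notebook.

```python
import numpy as np

def dropout(h, rate, rng):
    """Inverted dropout on activations h."""
    if rate == 0:
        return h
    keep = rng.uniform(size=h.shape) > rate
    return h * keep / (1.0 - rate)

def forward_with_dropout(x, weights, rates, rng, training=True):
    """Forward pass of a ReLU MLP; rates[i] is the drop rate applied to the input of layer i."""
    h = x
    for W, rate in zip(weights, rates):
        if training:
            h = dropout(h, rate, rng)
        h = np.maximum(h @ W, 0.0)
    return h

rng = np.random.default_rng(0)
# Hypothetical layer sizes: 784 inputs -> 256 -> 64
weights = [rng.normal(0, 0.05, (784, 256)), rng.normal(0, 0.05, (256, 64))]
rates = [0.2, 0.5]  # illustrative: lighter dropout on the input, heavier on the hidden layer

x = rng.uniform(size=(8, 784))
train_out = forward_with_dropout(x, weights, rates, rng, training=True)
eval_out = forward_with_dropout(x, weights, rates, rng, training=False)
print(train_out.shape, eval_out.shape)
```

In Keras the same idea amounts to passing a different `rate` to each `tf.keras.layers.Dropout` layer instead of reusing one `dropout_rate`.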

# Changing the number of epochs

Analysis:

Qualitative observations:

Generalization: Dropout is expected to improve generalization, and the benefit tends to show as training runs for more epochs. Look for this in the validation accuracy curves.

Overfitting: Dropout should reduce overfitting. Compare the training and validation loss curves: a model with dropout should show a smaller gap between them. In the runs recorded above, the runs with dropout_rate 0 finish 10 epochs at about 0.99 training accuracy and 0.976 validation accuracy, while the runs with dropout_rate 0.5 finish at about 0.94 and 0.962. Keras computes training metrics with dropout active, which is why training accuracy can sit below validation accuracy.

Quantitative comparison:

Final accuracy: Compare the final validation accuracy of the models with and without dropout, and quantify the improvement (if any) achieved by dropout.

Training time: Check whether dropout has a significant effect on training time.
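One simple way to put a number on the gap between training and validation metrics is sketched below. The helper name `final_gap` is hypothetical; the values are the final-epoch numbers copied from two of the run summaries recorded in this notebook (one run with dropout_rate 0, one with dropout_rate 0.5).

```python
def final_gap(summary):
    """Validation minus training value for loss and accuracy from a run summary."""
    return {
        "loss_gap": summary["val_loss"] - summary["loss"],
        "accuracy_gap": summary["val_accuracy"] - summary["accuracy"],
    }

# Final-epoch values copied from the run summaries recorded in this notebook.
no_dropout = {"accuracy": 0.9924, "loss": 0.02716, "val_accuracy": 0.9764, "val_loss": 0.07761}
with_dropout = {"accuracy": 0.93718, "loss": 0.21363, "val_accuracy": 0.9618, "val_loss": 0.12082}

for name, summary in [("dropout_rate=0", no_dropout), ("dropout_rate=0.5", with_dropout)]:
    gap = final_gap(summary)
    print(f"{name}: loss gap {gap['loss_gap']:+.4f}, accuracy gap {gap['accuracy_gap']:+.4f}")
```

A positive loss gap means the model does worse on validation data than on training data. A negative gap with dropout is not evidence of better-than-training generalization by itself, because the training metrics are measured with units dropped.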

# What is the variance of the activations in each hidden layer when dropout is and is not applied?

Without dropout: The activations of a layer are a deterministic function of the input, so their variance comes only from the data and the learned weights.

With dropout: During training, inverted dropout keeps the mean of each activation unchanged but increases its variance. If h is an activation and h' is its value after dropout with probability p, then Var(h') = Var(h) / (1 - p) + (p / (1 - p)) · E[h]^2, which is never smaller than Var(h). At test time dropout is disabled, so no extra variance is added.
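The effect of dropout on the variance of a fixed set of activations can be checked numerically. For inverted dropout with drop probability p, the mean is preserved and the variance becomes Var(h) / (1 - p) + (p / (1 - p)) · E[h]^2. The sketch below uses synthetic post-ReLU values as a stand-in for a hidden layer; they are an assumption for illustration, not activations from the trained models above.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.5  # drop probability

# Stand-in for one layer's post-ReLU activations (synthetic, not from a trained model).
h = np.maximum(rng.normal(0.5, 1.0, size=1_000_000), 0.0)

# Inverted dropout: zero with probability p, rescale the survivors by 1 / (1 - p).
mask = rng.uniform(size=h.shape) > p
h_drop = h * mask / (1 - p)

# Variance predicted by the identity stated above.
predicted_var = h.var() / (1 - p) + p / (1 - p) * h.mean() ** 2

print(f"mean without / with dropout: {h.mean():.4f} / {h_drop.mean():.4f}")
print(f"var  without / with dropout: {h.var():.4f} / {h_drop.var():.4f}")
print(f"var predicted by the formula: {predicted_var:.4f}")
```

To measure this on a real model, the same statistics would be computed on the outputs of each hidden layer with the layer called in training mode and in inference mode.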

# Why is dropout not typically used at test time?

Dropout is a valuable technique for improving generalization during training, but it is typically not used at test time because:

1. We want deterministic predictions: the same input should always give the same output.

2. We want to use the full capacity of the network.

3. We want to approximate the averaging effect of dropout over many thinned networks without introducing randomness.

Instead of applying dropout during testing, its effect is accounted for by scaling. In the original formulation the weights are multiplied by the keep probability 1 - p at test time. With inverted dropout, as in `dropout_layer` above and in `tf.keras.layers.Dropout`, the kept activations are already divided by 1 - p during training, so the network is used unchanged at test time.
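The difference between training and test behaviour can be sketched by adding a `training` flag to the NumPy dropout function. The flag and the `rng` argument are additions for illustration and are not part of the `dropout_layer` defined earlier in this notebook.

```python
import numpy as np

def dropout_layer(X, dropout_prob, rng, training=True):
    """Inverted dropout that becomes the identity when training=False."""
    if not training or dropout_prob == 0:
        return X
    mask = rng.uniform(size=X.shape) > dropout_prob
    return X * mask / (1 - dropout_prob)

rng = np.random.default_rng(0)
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# Two training-mode calls draw independent masks, so the outputs generally differ.
train_a = dropout_layer(X, 0.5, rng, training=True)
train_b = dropout_layer(X, 0.5, rng, training=True)

# Inference-mode calls are deterministic and need no extra scaling.
test_a = dropout_layer(X, 0.5, rng, training=False)
test_b = dropout_layer(X, 0.5, rng, training=False)

print("training outputs equal:", np.array_equal(train_a, train_b))
print("inference outputs equal:", np.array_equal(test_a, test_b))
```

Keras layers follow the same pattern: `model.fit` runs Dropout layers in training mode, while `model.predict` and `model.evaluate` run them in inference mode.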

# Compare the effects of using dropout and weight decay. What happens when dropout and weight decay are used at the same time? Are the results additive? Are there diminished returns (or worse)? Do they cancel each other out?

Dropout and weight decay are both regularization techniques that can reduce overfitting.

They work through different mechanisms: dropout injects noise into the activations, which discourages co-adaptation between units, while weight decay penalizes large weights, which limits model complexity.

Because the mechanisms differ, the two can be combined, and their effects can add up to a further improvement in generalization. The gains are not guaranteed to be additive: if one technique already regularizes the model enough, adding the other can give diminished returns or cause underfitting.

The best combination of dropout rate and weight decay strength has to be determined empirically through experimentation.
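To make the two mechanisms concrete, here is a minimal sketch of a single SGD step on a linear model that applies both: dropout perturbs the inputs in the forward pass, and weight decay adds an L2 term to the gradient. The function name `sgd_step`, the synthetic data, and all hyperparameter values are assumptions chosen for illustration.

```python
import numpy as np

def sgd_step(w, X, y, lr, weight_decay, dropout_prob, rng):
    """One SGD step on mean squared error with input dropout and an L2 penalty."""
    if dropout_prob > 0:
        mask = rng.uniform(size=X.shape) > dropout_prob
        X = X * mask / (1 - dropout_prob)      # dropout perturbs the forward pass
    residual = X @ w - y
    grad = 2 * X.T @ residual / len(y)         # gradient of the data loss
    grad = grad + 2 * weight_decay * w         # gradient of weight_decay * ||w||^2
    return w - lr * grad

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 10))
true_w = rng.normal(size=10)
y = X @ true_w + 0.1 * rng.normal(size=256)

# Train the same model under the four combinations and compare the weight norms.
for wd, p in [(0.0, 0.0), (0.01, 0.0), (0.0, 0.5), (0.01, 0.5)]:
    w = np.zeros(10)
    for _ in range(500):
        w = sgd_step(w, X, y, lr=0.05, weight_decay=wd, dropout_prob=p, rng=rng)
    print(f"weight_decay={wd}, dropout={p}: ||w|| = {np.linalg.norm(w):.3f}")
```

Comparing the four printed norms, together with validation error on held-out data, is one way to see whether the two regularizers add up or overlap for a given problem. In Keras the equivalent setup would combine `Dropout` layers with a `kernel_regularizer` on the `Dense` layers.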

# What happens if we apply dropout to the individual weights of the weight matrix rather than the activations?

This technique is known as DropConnect.

DropConnect vs. Dropout:

Dropout: Randomly sets activations (outputs of neurons) to zero during training.

DropConnect: Randomly sets individual weights in the weight matrix to zero during training.

Effects of DropConnect:

Sparsity: During training, each forward pass uses a randomly sparsified weight matrix. The sparsity is temporary: the full weight matrix is kept and used at inference, so the trained model is not smaller.

Regularization: Similar to dropout, DropConnect acts as a regularizer, reducing overfitting by limiting co-adaptation between neurons.

Ensemble Effect: DropConnect can be viewed as training an ensemble of networks with different weight matrices, similar to how dropout trains an ensemble of networks with different neuron subsets. This ensemble effect can improve generalization performance.

Robustness: By randomly dropping connections between neurons, DropConnect forces the network to learn features that are less sensitive to any individual weight.
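A minimal NumPy sketch of a DropConnect-style dense layer is shown below. It is a simplification under stated assumptions: one weight mask is drawn per call (shared by the whole batch), the surviving weights are rescaled by 1 / (1 - drop_prob) by analogy with inverted dropout, and inference simply uses the full weights. The original DropConnect formulation treats masking and inference differently, so this is an illustration of the idea only. The function name and sizes are hypothetical.

```python
import numpy as np

def dropconnect_dense(X, W, b, drop_prob, rng, training=True):
    """Dense layer whose individual weights are dropped with probability drop_prob in training."""
    if not training or drop_prob == 0:
        return X @ W + b
    mask = rng.uniform(size=W.shape) > drop_prob
    W_masked = W * mask / (1 - drop_prob)   # inverted scaling, assumed by analogy with dropout
    return X @ W_masked + b

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 6))     # batch of 4 examples, 6 features (illustrative sizes)
W = rng.normal(size=(6, 3))
b = np.zeros(3)

train_out = dropconnect_dense(X, W, b, drop_prob=0.5, rng=rng, training=True)
test_out = dropconnect_dense(X, W, b, drop_prob=0.5, rng=rng, training=False)
print(train_out.shape, test_out.shape)
```

The contrast with dropout is where the mask is applied: dropout masks the rows of the activation matrix element-wise after the layer, while this layer masks entries of `W` before the matrix product.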