## CV_Assignment_4
1. What is the concept of cyclical momentum?
2. What callback keeps track of hyperparameter values (along with other data) during training?
3. In the color dim plot, what does one column of pixels represent?
4. In color dim, what does "poor teaching" look like? What is the reason for this?
5. Does a batch normalization layer have any trainable parameters?
6. In batch normalization during training, what statistics are used to normalize? What about during the validation process?
7. Why do batch normalization layers help models generalize better?
8. Explain the difference between MAX POOLING and AVERAGE POOLING.
9. What is the purpose of the POOLING LAYER?
10. Why do we end up with FULLY CONNECTED LAYERS?
11. What do you mean by PARAMETERS?
12. What formulas are used to measure these PARAMETERS?

In [None]:
'''Ans 1:- Cyclical momentum is a variation of the momentum
optimization technique used in training neural networks. Unlike
traditional momentum, which maintains a constant momentum coefficient
throughout training, cyclical momentum changes the momentum
coefficient during training in a cyclical pattern. It is usually
paired with cyclical learning rates and moved in the opposite
direction: momentum is lowered while the learning rate rises and
raised again while the learning rate falls. A high learning rate
with lower momentum lets the optimizer explore the loss landscape,
while a low learning rate with higher momentum helps it settle,
potentially leading to faster convergence and improved generalization.'''
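
The inverse relationship between learning rate and momentum can be sketched in a few lines. This is only an illustration under assumed values: the function name `one_cycle`, the triangular shape, and the ranges (learning rate 0.001 to 0.01, momentum 0.85 to 0.95) are hypothetical choices, not settings taken from this assignment or from any particular library.

```python
def one_cycle(step, total_steps, lr_min=0.001, lr_max=0.01,
              mom_min=0.85, mom_max=0.95):
    """Return (learning_rate, momentum) for a simple triangular cycle.

    All default values are illustrative assumptions.
    """
    half = total_steps / 2
    # pos rises from 0 to 1 over the first half, then falls back to 0
    pos = step / half if step <= half else (total_steps - step) / half
    lr = lr_min + pos * (lr_max - lr_min)            # learning rate goes up, then down
    momentum = mom_max - pos * (mom_max - mom_min)   # momentum goes down, then up
    return lr, momentum


for step in range(11):
    lr, momentum = one_cycle(step, total_steps=10)
    print(f"step {step:2d}: lr={lr:.4f}  momentum={momentum:.3f}")
```

At the midpoint of the cycle the learning rate is at its maximum and the momentum at its minimum; at both ends the situation is reversed.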

In [None]:
'''Ans 2:- In fastai, the Recorder callback keeps track of
hyperparameter values such as the learning rate and momentum, along
with losses and metrics, during training. In Keras, the TensorBoard
callback is commonly used for a similar purpose: it logs metrics such
as loss and accuracy and allows real-time visualization. This
logging helps monitor the training process, fine-tune
hyperparameters, and identify issues in the model, making it a valuable
tool for deep learning practitioners.'''

In [None]:
'''Ans 3:- In the color dim plot (fastai's color_dim, built from
activation statistics), one column of pixels represents the
histogram of a layer's activations for a single batch. The vertical
position corresponds to the activation value bin, and the color
intensity shows how many activations fall into that bin. Columns
are stacked left to right in batch order, so the plot shows how
the distribution of activations evolves over the course of
training.'''

In [None]:
'''Ans 4:- In a color dim plot, "poor teaching" (that is, bad
training) shows up as a repeating pattern: almost all activations
start near zero, the number of nonzero activations grows rapidly
over a few batches, and then the activations collapse back to
near zero, as if training had restarted from scratch. This cycle
may repeat several times before the activations finally spread
over a wider range. The reason is unstable training, for example
a learning rate that is too high or poor initialization without
normalization: for much of training most activations are zero,
so those units contribute little and the model learns
inefficiently.'''

In [None]:
'''Ans 5:- Yes, a batch normalization layer in a neural network has
trainable parameters. It includes two learnable parameters per
feature/channel: scale (gamma) and shift (beta). These parameters are
learned during training to scale and shift the normalized
activations, allowing the network to adapt and maintain the desired
mean and variance for each feature, aiding in stable and
efficient training.'''
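
The role of the two learnable parameters can be seen in a small pure-Python sketch. It normalizes one feature over a batch and then applies the scale and shift; the function name `batch_norm_1d` and the sample numbers are hypothetical, chosen only for illustration.

```python
import math


def batch_norm_1d(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a batch, then scale by gamma and shift by beta."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)  # biased variance
    normalized = [(v - mean) / math.sqrt(var + eps) for v in values]
    return [gamma * n + beta for n in normalized]


batch = [2.0, 4.0, 6.0, 8.0]
out = batch_norm_1d(batch, gamma=3.0, beta=10.0)

out_mean = sum(out) / len(out)
out_std = math.sqrt(sum((o - out_mean) ** 2 for o in out) / len(out))
print(f"output mean = {out_mean:.4f}, output std = {out_std:.4f}")
```

After normalization the output mean is set by beta and the output standard deviation (up to the small `eps` term) by gamma, which is why these two values per feature are the quantities the network learns.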

In [1]:
'''Ans 6:- During training, batch normalization calculates the mean
and variance of the activations within the current mini-batch and
uses them to normalize the data. These mini-batch statistics help
stabilize and speed up training. At the same time, the layer
keeps a running average of the mean and variance it has seen.
During the validation process, the network may be evaluated on
individual examples or small batches, so batch statistics are
not recalculated. Instead, the running statistics accumulated
during training are used for normalization. This keeps the
output for each example independent of the other examples in
the batch and ensures consistent behavior during inference.'''

import torch
import torch.nn as nn

# Batch size: 32, Channels: 64, Height: 28, Width: 28
sample_data = torch.randn(32, 64, 28, 28)  

# Define a neural network with batch normalization
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(64, 128, kernel_size=3)
        self.bn1 = nn.BatchNorm2d(128)
        self.fc = nn.Linear(128 * 26 * 26, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = torch.relu(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        return x

# Create an instance of the network
net = Net()

print(net)

Net(
  (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1))
  (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (fc): Linear(in_features=86528, out_features=10, bias=True)
)
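
The difference between training and validation behavior can be sketched without any framework. The class below is a hypothetical single-feature layer written for illustration; it uses `momentum=0.1` as in the printed `BatchNorm2d` above, and it simplifies details that real libraries handle differently (for example, which variance estimate is stored).

```python
import math


class TinyBatchNorm:
    """Hypothetical single-feature batch norm, for illustration only."""

    def __init__(self, momentum=0.1, eps=1e-5):
        self.momentum = momentum
        self.eps = eps
        self.running_mean = 0.0
        self.running_var = 1.0
        self.training = True

    def __call__(self, batch):
        if self.training:
            # Training: normalize with the statistics of the current mini-batch
            mean = sum(batch) / len(batch)
            var = sum((x - mean) ** 2 for x in batch) / len(batch)
            # ...and fold them into the running averages
            m = self.momentum
            self.running_mean = (1 - m) * self.running_mean + m * mean
            self.running_var = (1 - m) * self.running_var + m * var
        else:
            # Validation: use the running statistics, leave them unchanged
            mean, var = self.running_mean, self.running_var
        return [(x - mean) / math.sqrt(var + self.eps) for x in batch]


bn = TinyBatchNorm()
train_out = bn([1.0, 2.0, 3.0])   # uses batch mean 2.0
print("running mean after one training batch:", bn.running_mean)

bn.training = False
eval_out = bn([100.0])            # a single example works in eval mode
print("running mean after validation:", bn.running_mean)
```

In validation mode a batch of one example is fine, because no batch statistics are needed and the stored running values do not change.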


In [None]:
'''Ans 7:- Batch normalization layers help models generalize
better for several reasons:-

1. Stabilizing Activation Distributions: By normalizing
activations, batch normalization reduces internal covariate shift
during training, making it easier for the model to learn and
converge.

2. Regularization Effect: The noise introduced by batch
normalization during training acts as a form of regularization, reducing
overfitting.

3. Allowing Higher Learning Rates: It enables the use of
higher learning rates, accelerating convergence without
instability.

4. Maintaining Consistent Statistics: During inference, batch
normalization uses accumulated statistics, ensuring consistent behavior
and improving generalization to unseen data.'''

In [13]:
'''Ans 8:- Max pooling and average pooling are two common pooling
operations in convolutional neural networks. In max pooling, the
maximum value within each pooling window is retained, highlighting
the most significant feature. In average pooling, the average
value is computed, giving a smoother representation. Max pooling
is effective for detecting whether a strong feature is present
anywhere in the window, while average pooling summarizes the whole
window, so every activation contributes to the output. The choice
depends on the specific task and the desired trade-off between
robustness and detail retention.'''

import numpy as np
import tensorflow as tf

input_data = np.array([
    [1, 28, 3, 45],
    [15, 6, 77, 8],
    [7, 10, 55, 12],
    [113, 914, 15, 186]
], dtype=np.float32)

input_data = input_data.reshape((1, 4, 4, 1))

# Create two single-layer Keras models: one with max pooling, one with average pooling
max_pooling_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4, 4, 1)),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='valid')
])

average_pooling_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4, 4, 1)),
    tf.keras.layers.AveragePooling2D(pool_size=(2, 2), strides=(2, 2), padding='valid')
])

# Perform pooling
max_pooled_data = max_pooling_model.predict(input_data)
average_pooled_data = average_pooling_model.predict(input_data)

print("Original Input Data:")
print(input_data.reshape(4, 4))
print("\nMax Pooled Output:")
print(max_pooled_data.reshape(2, 2))
print("\nAverage Pooled Output:")
print(average_pooled_data.reshape(2, 2))

Original Input Data:
[[  1.  28.   3.  45.]
 [ 15.   6.  77.   8.]
 [  7.  10.  55.  12.]
 [113. 914.  15. 186.]]

Max Pooled Output:
[[ 28.  77.]
 [914. 186.]]

Average Pooled Output:
[[ 12.5   33.25]
 [261.    67.  ]]


In [2]:
'''Ans 9:- The pooling layer in a convolutional neural network serves
several purposes:-

1. Downsampling: It reduces the spatial dimensions of the
feature maps, decreasing computational complexity.

2. Feature Selection: By selecting the most important
information within each pooling window (e.g., max pooling), it retains
key features while discarding less relevant details.

3. Translation Invariance: Pooling helps the network
recognize features regardless of their precise location in the
input.

4. Reduction of Overfitting: Pooling can act as a form of
regularization by reducing the model's sensitivity to noise and minor
variations in the data.

The example below applies max pooling with a 2x2 window and a
stride of 2 to a 4x4 input. Here's how the operation works:

The input data is divided into non-overlapping 2x2 windows:-

Window 1: [[1, 2], [5, 6]]
Window 2: [[3, 4], [7, 8]]
Window 3: [[9, 10], [13, 14]]
Window 4: [[11, 12], [15, 16]]

For each window, the maximum value is extracted:

Max of Window 1: 6
Max of Window 2: 8
Max of Window 3: 14
Max of Window 4: 16
'''

import numpy as np

# Sample 2D input data (4x4)
input_data = np.array([[1, 2, 3, 4],
                      [5, 6, 7, 8],
                      [9, 10, 11, 12],
                      [13, 14, 15, 16]])

# Define max pooling parameters (2x2 window, stride of 2)
window_size = (2, 2)
stride = 2

# Perform max pooling
output_data = []
for i in range(0, input_data.shape[0], stride):
    for j in range(0, input_data.shape[1], stride):
        window = input_data[i:i+window_size[0], j:j+window_size[1]]
        max_value = np.max(window)
        output_data.append(max_value)

# Reshape the output
output_data = np.array(output_data).reshape((2, 2))

print("Input Data:")
print(input_data)
print("\nMax Pooled Output:")
print(output_data)

Input Data:
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]
 [13 14 15 16]]

Max Pooled Output:
[[ 6  8]
 [14 16]]
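
The amount of downsampling follows from a standard formula: for input size n, window k, stride s and padding p, the output size is floor((n + 2p - k) / s) + 1 along each spatial dimension. A minimal sketch, with the helper name `pooled_size` chosen here for illustration:

```python
def pooled_size(input_size, window, stride, padding=0):
    """Output length along one spatial dimension for pooling (or convolution)."""
    return (input_size + 2 * padding - window) // stride + 1


print(pooled_size(4, window=2, stride=2))    # the 4x4 example above
print(pooled_size(28, window=2, stride=2))   # a 28x28 feature map
print(pooled_size(28, window=3, stride=1))   # a 3x3 convolution with stride 1
```

For the 4x4 input above this gives 2, matching the 2x2 output. The same formula gives 26 for a 3x3 convolution with stride 1 on a 28x28 input, which is where the `128 * 26 * 26` in the earlier `Net` example comes from.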


In [4]:
'''Ans 10:- Fully connected layers, also known as dense layers, are
often used at the end of a neural network architecture to make
predictions or perform classification tasks. Earlier layers such
as convolution and pooling extract local features; a fully
connected layer connects every neuron from the previous layer to
each neuron in the current layer, so it can combine all of the
extracted features into a final prediction.

In the example below, the Dense layers are the fully connected
layers, and input_dim and output_dim specify the input and
output dimensions, respectively. These layers enable the model to
learn from the extracted features and make predictions for
various tasks, such as image classification or regression.

The code imports the necessary classes from Keras, creates a
sequential model with Sequential(), and adds dense layers with
different numbers of units and activation functions. Finally,
model.summary() prints the architecture of the model, which
includes the layer names, output shapes, and the number of
trainable parameters.'''

from keras.models import Sequential
from keras.layers import Dense

# Create a Sequential model
model = Sequential()

input_dim = 64 
output_dim = 10 

# Add a dense layer with 128 units and ReLU activation function
model.add(Dense(128, activation='relu', input_shape=(input_dim,)))

# Add another dense layer with 64 units and ReLU activation function
model.add(Dense(64, activation='relu'))

# Add a final dense layer with the desired output units and activation function
model.add(Dense(output_dim, activation='softmax'))

model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 128)               8320      
                                                                 
 dense_1 (Dense)             (None, 64)                8256      
                                                                 
 dense_2 (Dense)             (None, 10)                650       
                                                                 
Total params: 17226 (67.29 KB)
Trainable params: 17226 (67.29 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
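
The `Param #` column can be reproduced by hand. A dense layer has one weight per input-output pair plus one bias per output, and a 2D convolution has one weight per kernel position, input channel and output channel, plus one bias per output channel. The helper names below are hypothetical; the layer sizes are the ones used in this notebook.

```python
def dense_params(in_features, out_features, bias=True):
    """Weights (in * out) plus one bias per output unit."""
    return in_features * out_features + (out_features if bias else 0)


def conv2d_params(in_channels, out_channels, kernel_h, kernel_w, bias=True):
    """One kernel of size kh x kw x in_channels per output channel, plus biases."""
    weights = kernel_h * kernel_w * in_channels * out_channels
    return weights + (out_channels if bias else 0)


# The three Dense layers from the summary above: 64 -> 128 -> 64 -> 10
layer_shapes = [(64, 128), (128, 64), (64, 10)]
counts = [dense_params(i, o) for i, o in layer_shapes]
print(counts, "total:", sum(counts))

# The Conv2d(64, 128, kernel_size=3) layer from the batch normalization example
print(conv2d_params(64, 128, 3, 3))
```

The dense counts agree with the 8320, 8256 and 650 reported by `model.summary()`, for a total of 17226.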


In [5]:
'''Ans 11:- In a neural network, parameters are the internal variables
that the model learns from the training data. They include
weights and biases associated with each neuron in the network.
These parameters are adjusted during training to minimize the
loss function, enabling the model to make accurate predictions.
Here's a code example illustrating parameters in a simple linear
regression model.

In this example, model.parameters() returns an iterator over the
parameters, which list() turns into a list containing the weight
matrix and the bias term of the linear regression model.'''

import torch.nn as nn

# linear regression model
model = nn.Linear(in_features=1, out_features=1)

# Parameters (weights and bias) of the model
params = list(model.parameters())
print(params)

[Parameter containing:
tensor([[0.5465]], requires_grad=True), Parameter containing:
tensor([0.5374], requires_grad=True)]


In [9]:
'''Ans 12:- In a neural network, parameter values are computed by
update formulas applied during training. The primary formulas
include:-

1. Weight Update: Parameters are updated using gradient
descent or its variants, where the new value is the old value
minus the learning rate times the gradient of the loss function
with respect to the parameter:
    w_new = w_old - learning_rate * dLoss/dw

2. Bias Update: Similarly, bias terms are updated using gradient descent:
    b_new = b_old - learning_rate * dLoss/db

In this code, the optimizer updates the parameters
(model.parameters()) based on the gradients computed during the
backpropagation step (loss.backward()), which follows the loss
computation. The specific formulas used depend on the
optimization algorithm employed.'''

import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np

# Sample data
np.random.seed(0)
input_data = torch.tensor(np.random.rand(100, 2), dtype=torch.float32)
target_data = (2 * input_data[:, 0] + 3 * input_data[:, 1] + 1).view(-1, 1).detach()

# Define a simple linear regression model
model = nn.Linear(in_features=2, out_features=1)

# Define a loss function (mean squared error)
loss_fn = nn.MSELoss()

# Define an optimizer (stochastic gradient descent)
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Define the number of training epochs
num_epochs = 1000

# Training loop
for epoch in range(num_epochs):
    # Forward pass
    predictions = model(input_data)
    
    # Compute the loss
    loss = loss_fn(predictions, target_data)
    
    # Backpropagation to compute gradients
    optimizer.zero_grad()
    loss.backward()
    
    # Update parameters
    optimizer.step()
    
    # Print the loss every 100 epochs
    if (epoch + 1) % 100 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item()}')

# Print the final trained parameters
print('Trained Parameters:')
for name, param in model.named_parameters():
    if param.requires_grad:
        print(f'{name}: {param.data.numpy()}')

Epoch [100/1000], Loss: 0.30929169058799744
Epoch [200/1000], Loss: 0.2134174406528473
Epoch [300/1000], Loss: 0.16733044385910034
Epoch [400/1000], Loss: 0.13184557855129242
Epoch [500/1000], Loss: 0.10433553904294968
Epoch [600/1000], Loss: 0.08289802819490433
Epoch [700/1000], Loss: 0.06610986590385437
Epoch [800/1000], Loss: 0.052900925278663635
Epoch [900/1000], Loss: 0.04246228560805321
Epoch [1000/1000], Loss: 0.03417882323265076
Trained Parameters:
weight: [[1.6925193 2.4134233]]
bias: [1.4666103]
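
What `optimizer.step()` does for plain SGD can be written out by hand. The sketch below fits y = 2x + 1 with a single weight and bias, using the gradients of the mean squared error directly; the data, learning rate and function name `sgd_step` are illustrative assumptions, not the setup used in the cell above.

```python
def sgd_step(w, b, xs, ys, lr):
    """One gradient descent step for y = w * x + b with mean squared error."""
    n = len(xs)
    preds = [w * x + b for x in xs]
    # d/dw mean((pred - y)^2) = mean(2 * (pred - y) * x)
    grad_w = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / n
    # d/db mean((pred - y)^2) = mean(2 * (pred - y))
    grad_b = sum(2 * (p - y) for p, y in zip(preds, ys)) / n
    # Update rule: new value = old value - learning rate * gradient
    return w - lr * grad_w, b - lr * grad_b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [2 * x + 1 for x in xs]

w, b = 0.0, 0.0
for _ in range(2000):
    w, b = sgd_step(w, b, xs, ys, lr=0.05)

print(f"w = {w:.4f}, b = {b:.4f}")
```

With enough steps the weight and bias approach the values 2 and 1 used to generate the data. Optimizers such as momentum SGD or Adam change how the gradient is turned into an update, but the overall pattern stays the same.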
