## Non-programming Assignment

### Q1. What is Hadamard matrix product?

Answer:

The Hadamard matrix product, also known as the Hadamard product or element-wise product, is a mathematical operation performed on two matrices or arrays of the same shape. In this operation, each element of one matrix is multiplied by the corresponding element of the other matrix. The result is a new matrix or array of the same shape, where each element at position (i, j) is the product of the elements at the same position in the original matrices.

Mathematically, if you have two matrices A and B of the same shape, the Hadamard product (denoted by ⊙) is calculated as follows:

![image](1.png)

The Hadamard product is different from matrix multiplication (dot product), where the elements of the resulting matrix are computed based on a combination of elements from both input matrices using summation. In contrast, the Hadamard product operates element-wise and is used in various mathematical and computational applications, including image processing, element-wise operations in neural networks, and more.
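As a minimal sketch of the distinction above, assuming NumPy is available: the matrices `A` and `B` below are made-up examples, `*` computes the Hadamard product, and `@` computes the matrix product.

```python
import numpy as np

# Two example matrices of the same shape (values chosen only for illustration)
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Hadamard (element-wise) product: entry (i, j) is A[i, j] * B[i, j]
hadamard = A * B   # equivalent to np.multiply(A, B)

# Matrix product: entry (i, j) sums over row i of A and column j of B
matmul = A @ B     # equivalent to np.matmul(A, B)

print(hadamard.tolist())  # [[5, 12], [21, 32]]
print(matmul.tolist())    # [[19, 22], [43, 50]]
```

The two results differ even though the inputs are the same, which is why the two operations should not be confused.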

### Q2. Describe matrix multiplication? 

Answer:

Matrix multiplication, also known as the matrix product, is a fundamental operation in linear algebra that combines two matrices to produce a third. It is defined only for matrices of compatible dimensions: the number of columns in the first matrix must equal the number of rows in the second. If matrices A and B are multiplied to produce matrix C, then A is of shape m×n and B is of shape n×p, where m, n, and p are positive integers. The resulting matrix C has dimensions m×p. Each element C[i][j] is the sum of the products of the elements in the ith row of A with the corresponding elements in the jth column of B:

![image](2.png)

Here's a step-by-step description of how matrix multiplication is performed:

1. Take the element in the ith row and jth column of matrix C.
2. Multiply each element in the ith row of matrix A by the corresponding element in the jth column of matrix B.
3. Sum all the products obtained in step 2 to calculate the value of C[i][j].

Matrix multiplication is not commutative, meaning that the order in which matrices are multiplied matters. In other words, A⋅B is not necessarily equal to B⋅A unless special conditions are met. It is an essential operation in various mathematical and scientific applications, including solving systems of linear equations, transformations, computer graphics, and machine learning, particularly in neural networks and deep learning.
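The three steps above can be sketched directly as nested loops. This is an illustrative version only: `matmul_naive` is a hypothetical helper name, the example matrices are made up, and in practice NumPy's `@` operator would be used instead.

```python
import numpy as np

def matmul_naive(A, B):
    """Multiply an m x n matrix by an n x p matrix, one output element at a time."""
    m, n = A.shape
    n_b, p = B.shape
    if n != n_b:
        raise ValueError("columns of A must equal rows of B")
    C = np.zeros((m, p))
    for i in range(m):
        for j in range(p):
            # Multiply row i of A by column j of B element by element, then sum
            C[i, j] = sum(A[i, k] * B[k, j] for k in range(n))
    return C

A = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])    # shape 2 x 3
B = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])  # shape 3 x 2

C = matmul_naive(A, B)  # shape 2 x 2
D = matmul_naive(B, A)  # shape 3 x 3
print(C.shape, D.shape)
```

Here A⋅B and B⋅A do not even have the same shape, a simple illustration of why the order of the operands matters.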

### Q3. What is transpose matrix and vector?

Answer:

A transpose matrix, also known as the transpose of a matrix, is obtained by interchanging the rows and columns of the original matrix. If you have a matrix A of dimensions m×n, the transpose of A, also denoted as A^T or A', will have dimensions n×m, and its elements will be arranged such that the rows of the original matrix become columns in the transposed matrix, and vice versa.

For each element A[i][j] in the original matrix A, the corresponding element in the transposed matrix A' is A'[j][i]. In other words, the element at row i and column j in A becomes the element at row j and column i in A'.

Here's a simple example to illustrate matrix transposition:

![image](3.png)

Matrix transposition is a fundamental operation in linear algebra with various applications in mathematics, physics, engineering, and computer science.

A transposed vector is a vector whose orientation has been switched. A vector is typically represented as a column vector, meaning a single column of numbers. Transposing it produces a row vector, a single row containing the same numbers in the same order.

For example, consider the following column vector \(v\):

\[v = \begin{bmatrix}
1 \\
2 \\
3
\end{bmatrix}\]

The transposed vector \(v^T\) is a row vector:

\[v^T = \begin{bmatrix}
1 & 2 & 3
\end{bmatrix}\]

The notation \(v^T\) indicates that the vector \(v\) has been transposed into a row vector.

Transpose operations are commonly used in linear algebra, matrix calculations, and various mathematical and computational applications. They play a crucial role in transformations, solving systems of linear equations, and many other mathematical operations.
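A short NumPy sketch of both cases, assuming the column vector is stored as a 2-D array of shape (3, 1); the matrix `A` is a made-up example.

```python
import numpy as np

# Matrix transpose: a 2 x 3 matrix becomes 3 x 2, and A_T[j, i] == A[i, j]
A = np.array([[1, 2, 3],
              [4, 5, 6]])
A_T = A.T

# Vector transpose: a column vector (3 x 1) becomes a row vector (1 x 3)
v = np.array([[1], [2], [3]])
v_T = v.T

print(A_T.shape)     # (3, 2)
print(v_T.tolist())  # [[1, 2, 3]]
```

Note that `.T` leaves a 1-D NumPy array unchanged, so the column or row orientation only exists when the vector is stored with two dimensions.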

### Q4. Describe loss (cost or error) function in neural network

Answer:

In a neural network, a loss function, also known as a cost function or error function, is a mathematical function that quantifies the difference between the predicted output (or activations) of the network and the actual target values for a given set of input data. The primary purpose of a loss function is to measure how well or poorly the neural network is performing in terms of its ability to make accurate predictions.

The choice of the appropriate loss function depends on the type of task the neural network is designed for. Different tasks, such as classification, regression, or generative modeling, require different types of loss functions. Here are some common types of loss functions used in neural networks:

1. Mean Squared Error (MSE) Loss:

Used for regression tasks.
Calculates the average of the squared differences between the predicted values and the actual target values.

![image](4.png)

where \(y_i\) is the actual target, \(\hat{y}_i\) is the predicted value, and \(n\) is the number of samples.

2. Cross-Entropy Loss (Log Loss):

Used for classification tasks, especially in binary and multiclass classification.
Measures the dissimilarity between the predicted class probabilities and the true class labels.
It is often used with softmax activation in the output layer.

![image](5.png)

where \(y_i\) is the true class probability, \(\hat{y}_i\) is the predicted class probability, and \(C\) is the number of classes.

3. Hinge Loss (SVM Loss):

Used in support vector machines (SVMs) and binary classification tasks.
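Encourages the score of the correct class to exceed the score of each incorrect class by at least a fixed margin. In the binary case, the loss is zero only when the true label and the predicted score agree in sign and the score clears the margin.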

![image](6.png)

where \(y\) is the true class label (-1 or 1), and \(\hat{y}\) is the predicted score.

4. Kullback-Leibler (KL) Divergence Loss:

Used in probabilistic models and variational autoencoders (VAEs).
Measures the difference between two probability distributions, such as a learned distribution and a target distribution.

![image](7.png)

where \(p_i\) is the probability of outcome \(i\) under the true distribution, and \(q_i\) is its probability under the predicted distribution.

The goal during training is to minimize the value of the loss function. Backpropagation computes the gradients of the loss with respect to the network's parameters (weights and biases), and gradient descent or another optimization algorithm uses those gradients to adjust the parameters. The choice of the loss function and optimization method depends on the specific problem being solved and the architecture of the neural network.

In summary, a loss function quantifies the error or discrepancy between the neural network's predictions and the actual target values, guiding the training process to improve the model's accuracy and performance.
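The four losses listed above can be sketched in NumPy as follows. These are common textbook forms and may differ from the formulas in the figures by constant factors or by averaging over samples. The function names and the constant `EPS` are assumptions introduced for this example.

```python
import numpy as np

EPS = 1e-12  # assumed small constant that keeps log() away from zero

def mse(y_true, y_pred):
    # Average of squared differences between targets and predictions
    return np.mean((y_true - y_pred) ** 2)

def cross_entropy(y_true, y_pred):
    # y_true: one-hot (or probability) vector over C classes
    # y_pred: predicted class probabilities, e.g. a softmax output
    return -np.sum(y_true * np.log(y_pred + EPS))

def hinge(y_true, score):
    # y_true is -1 or +1; score is the raw predicted score, margin fixed at 1
    return np.maximum(0.0, 1.0 - y_true * score)

def kl_divergence(p, q):
    # Divergence of the predicted distribution q from the true distribution p
    return np.sum(p * np.log((p + EPS) / (q + EPS)))

# Example values, chosen only for illustration
print(mse(np.array([1.0, 2.0, 3.0]), np.array([1.0, 2.0, 5.0])))
print(cross_entropy(np.array([0.0, 1.0, 0.0]), np.array([0.2, 0.7, 0.1])))
print(hinge(1, 0.3), hinge(-1, -2.0))
print(kl_divergence(np.array([0.5, 0.5]), np.array([0.9, 0.1])))
```

In each case a smaller value means the predictions are closer to the targets, which is what makes these functions usable as training objectives.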

### Q5. Describe the foundations of neural network supervised training.

Answer:

The foundations of supervised training in neural networks form the basis for how these models learn from data to make predictions. Supervised training is a process where the neural network is provided with a labeled dataset, consisting of input data and corresponding target labels or values. The goal is for the network to learn a mapping from inputs to outputs that can generalize to unseen data. Here are the key components of supervised training in neural networks:

Dataset Preparation:

In supervised learning, you start with a labeled dataset that consists of input samples (features) and their corresponding target labels or values.
The dataset is typically divided into a training set used to fit the model and a validation or test set used to evaluate its performance.

Neural Network Architecture:

You define the architecture of the neural network, which includes the number and structure of layers, the types of activation functions, and the number of neurons or units in each layer.
The architecture depends on the specific problem you are trying to solve (e.g., classification, regression) and the complexity of the data.

Initialization:

Neural network parameters, including weights and biases, are initialized with small random values or specific techniques like Xavier/Glorot initialization.
Proper initialization breaks the symmetry between neurons and helps keep gradients from vanishing or exploding early in training.

Forward Propagation:

During training, input data is passed through the neural network in a forward direction.
Each neuron computes a weighted sum of its inputs, applies an activation function, and passes the result to the next layer.
The final layer's output represents the predicted values or class probabilities.

Loss Function:

A loss function (also known as a cost function or error function) quantifies the difference between the predicted outputs and the actual target labels.
Common loss functions include mean squared error (MSE) for regression and cross-entropy loss for classification.

Backpropagation:

After forward propagation, the network calculates the loss.
Backpropagation is the process of computing the gradients of the loss with respect to the model's parameters (weights and biases) using the chain rule.
These gradients guide the optimization algorithm to update the parameters in a way that reduces the loss.

Optimization Algorithm:

An optimization algorithm (e.g., gradient descent, Adam, RMSprop) adjusts the model's parameters to minimize the loss.
The learning rate determines the step size for parameter updates, and other hyperparameters may influence the optimization process.

Training Iterations (Epochs):

The training process consists of multiple passes over the training data, called epochs.
During each epoch, the entire training dataset is processed by the network, and the parameters are updated.
The number of epochs is a hyperparameter that affects how long the training process continues.

Validation and Testing:

Periodically, the model's performance is evaluated on a separate validation or test dataset that it has not seen during training.
Metrics such as accuracy, mean squared error, or others are used to assess the model's generalization to new data.

Early Stopping:

To prevent overfitting, you can implement early stopping, where training is halted when the validation loss stops improving or starts increasing.

Model Deployment:

Once the model achieves satisfactory performance on the validation set, it can be deployed to make predictions on new, unseen data.

In summary, supervised training in neural networks involves setting up the network architecture, initializing parameters, propagating data forward through the network, computing a loss, backpropagating gradients, and iteratively optimizing the model. The process continues until the model converges to a solution or meets predefined stopping criteria. The key is to find a balance between model complexity and generalization to achieve the best performance on unseen data.
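To make the steps above concrete, here is a minimal sketch on the simplest possible "network", a single linear unit trained with gradient descent. The synthetic dataset, the split sizes, the learning rate, and the `patience` value are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dataset preparation: synthetic data y = 3x + 1 plus noise, split 160/40
X = rng.uniform(-1, 1, size=(200, 1))
y = 3.0 * X[:, 0] + 1.0 + 0.05 * rng.normal(size=200)
X_train, y_train = X[:160], y[:160]
X_val, y_val = X[160:], y[160:]

# Architecture and initialization: one linear unit with a small random weight
w = rng.normal(scale=0.1, size=1)
b = 0.0

def predict(X, w, b):
    return X @ w + b

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

learning_rate = 0.1
max_epochs = 500
patience = 10  # epochs to wait for a validation improvement before stopping
best_val = np.inf
best_params = (w.copy(), b)
epochs_without_improvement = 0

for epoch in range(max_epochs):
    # Forward propagation and gradients of the MSE loss
    error = predict(X_train, w, b) - y_train
    grad_w = 2 * X_train.T @ error / len(y_train)
    grad_b = 2 * error.mean()

    # Optimization step (plain gradient descent)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

    # Validation and early stopping
    val_loss = mse(y_val, predict(X_val, w, b))
    if val_loss < best_val - 1e-9:
        best_val = val_loss
        best_params = (w.copy(), b)
        epochs_without_improvement = 0
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            break

w, b = best_params
print("learned weight and bias:", w[0], b)
print("best validation loss:", best_val)
```

The learned parameters should land close to the values used to generate the data (3 and 1). The same loop structure carries over to deeper networks, with backpropagation supplying the gradients.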

### Q6. Describe forward propagation and backpropagation.

Answer:

Forward propagation and backpropagation are fundamental processes in training neural networks. They are part of the supervised learning framework and are used to update the model's parameters (weights and biases) based on the provided input data and target labels. Here's a detailed description of both forward propagation and backpropagation:

Forward Propagation:

Input Data: Forward propagation begins with the input data. Each input sample is represented as a feature vector, and these vectors are fed into the neural network.

Layer Computation: The input data is passed through the network layer by layer, from the input layer to the output layer.

For each layer, the following steps are performed:

1. Weighted Sum: The input data is linearly transformed by computing the weighted sum of the inputs. Each neuron in the layer has its own set of weights, and a bias term is added to the sum.
2. Activation Function: The weighted sum is passed through an activation function, which introduces non-linearity into the network. Common activation functions include ReLU, sigmoid, and tanh.
3. Output: The result of the activation function becomes the output of the current layer and serves as input to the next layer.

Final Output: After propagating through all layers, the final output of the neural network is obtained. This output can represent predictions for tasks like classification or regression.

Loss Calculation: A loss function (also known as a cost function) is used to compute the error between the predicted output and the actual target labels. The loss function quantifies how far off the predictions are from the truth.

Backpropagation:

Gradient Calculation: Backpropagation starts with the calculation of gradients of the loss with respect to the model's parameters (weights and biases). These gradients measure how much the loss would change if each parameter were adjusted slightly.

Error Propagation: The gradients are propagated backward through the network, starting from the output layer and moving toward the input layer. This is done using the chain rule of calculus.

For each layer, the following steps are performed:

1. Gradient of Activation: The gradient of the activation function is computed for the layer's output. This measures how sensitive the output is to small changes in the weighted sum.
2. Error Propagation: The gradients arriving from the following layer (the one closer to the output) are combined with the gradient of the activation to calculate the gradients for the weights and biases in the current layer.
3. Parameter Update: The model's parameters (weights and biases) are updated using the computed gradients. The learning rate determines the step size of these updates.

Iterative Process: The forward pass, backward pass, and parameter update are repeated for each training sample or mini-batch in the dataset. Updating the parameters after each sample or mini-batch, rather than after the whole dataset, is known as stochastic (or mini-batch) gradient descent (SGD).

Epochs: The entire dataset is passed through the network multiple times (epochs), allowing the model to learn from the data and gradually reduce the loss.

Optimization Algorithm: The optimization algorithm (e.g., gradient descent, Adam) specifies how the parameters are updated based on the gradients. It can include techniques like momentum and adaptive learning rates.

Convergence: The training process continues until a stopping criterion is met, such as a maximum number of epochs, early stopping based on validation loss, or achieving a desired level of accuracy.

In summary, forward propagation computes predictions based on input data, and backpropagation calculates gradients to update the model's parameters. These processes are repeated iteratively until the neural network converges to a state where the loss is minimized, and the model can make accurate predictions on new, unseen data.
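The two passes can be sketched for a small network with one hidden layer. This is an illustrative version that assumes sigmoid activations and a squared-error loss. The layer sizes, learning rate, and function names are made up for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Forward propagation: weighted sum, then activation, layer by layer."""
    a1 = sigmoid(W1 @ x + b1)   # hidden layer output
    a2 = sigmoid(W2 @ a1 + b2)  # network output
    return a1, a2

def loss_fn(a2, y):
    return 0.5 * np.sum((a2 - y) ** 2)

def backward(x, y, a1, a2, W2):
    """Backpropagation: chain rule applied from the output layer back to the input."""
    delta2 = (a2 - y) * a2 * (1 - a2)         # dLoss/d(weighted sum), output layer
    grad_W2 = np.outer(delta2, a1)
    grad_b2 = delta2
    delta1 = (W2.T @ delta2) * a1 * (1 - a1)  # error propagated to the hidden layer
    grad_W1 = np.outer(delta1, x)
    grad_b1 = delta1
    return grad_W1, grad_b1, grad_W2, grad_b2

rng = np.random.default_rng(0)
x = np.array([0.1, 0.2, 0.3])
y = np.array([0.4, 0.5])
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)

learning_rate = 0.5
losses = []
for step in range(200):
    a1, a2 = forward(x, W1, b1, W2, b2)
    losses.append(loss_fn(a2, y))
    gW1, gb1, gW2, gb2 = backward(x, y, a1, a2, W2)
    W1 -= learning_rate * gW1
    b1 -= learning_rate * gb1
    W2 -= learning_rate * gW2
    b2 -= learning_rate * gb2

print("first loss:", losses[0], "last loss:", losses[-1])
```

A useful sanity check for any hand-written backward pass is to compare its gradients against finite differences of the loss on a few parameters.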

## Programming Assignment

In [1]:
# Activation class
# Define activation functions (e.g., sigmoid, ReLU)

import numpy as np

class Activation:
    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def relu(self, x):
        return np.maximum(0, x)

In [2]:
# Neuron class
# Define neuron properties and operations

class Neuron:
    def __init__(self, input_size, activation_func):
        self.weights = np.random.randn(input_size)
        self.bias = np.random.randn()
        self.activation_func = activation_func

    def forward(self, inputs):
        weighted_sum = np.dot(inputs, self.weights) + self.bias
        return self.activation_func(weighted_sum)

In [3]:
# Layer class
# Define a layer of neurons and manage forward/backward passes

class Layer:
    def __init__(self, input_size, output_size, activation_func):
        self.neurons = [Neuron(input_size, activation_func) for _ in range(output_size)]

    def forward(self, inputs):
        return [neuron.forward(inputs) for neuron in self.neurons]

In [4]:
# Parameters class
# Store and manage weights and biases

class Parameters:
    def __init__(self):
        self.weights = []
        self.biases = []

In [5]:
# Model class
# Define the neural network architecture and methods for forward/backward passes

class Model:
    def __init__(self):
        self.layers = []

    def add_layer(self, layer):
        self.layers.append(layer)

    def forward(self, inputs):
        for layer in self.layers:
            inputs = layer.forward(inputs)
        return inputs

In [6]:
# LossFunction class
# Implement loss functions (e.g., MSE, Cross-Entropy)

class LossFunction:
    def mean_squared_error(self, y_true, y_pred):
        return np.mean((y_true - y_pred) ** 2)

In [7]:
# ForwardProp class
# Implement forward propagation logic

class ForwardProp:
    def forward_pass(self, model, inputs):
        return model.forward(inputs)

In [8]:
# BackProp class
# Implement backpropagation logic

class BackProp:
    def backward_pass(self, model, loss_func, inputs, targets, learning_rate):
        # Assumes sigmoid activations (derivative a * (1 - a)) and an MSE-style loss.
        # Forward pass that caches the input and output of every layer
        activations = [np.asarray(inputs, dtype=float)]
        for layer in model.layers:
            activations.append(np.asarray(layer.forward(activations[-1])))

        # Gradient of the loss with respect to the network output
        # (the constant factor of the MSE is absorbed into the learning rate)
        grad_output = activations[-1] - np.asarray(targets, dtype=float)

        # Walk the layers from output to input, applying the chain rule
        for index in reversed(range(len(model.layers))):
            layer = model.layers[index]
            layer_input = activations[index]
            layer_output = activations[index + 1]

            # Gradient with respect to each neuron's weighted sum
            deltas = grad_output * layer_output * (1 - layer_output)

            # Gradient passed to the previous layer, computed before the weights change
            grad_output = sum(delta * neuron.weights
                              for delta, neuron in zip(deltas, layer.neurons))

            # Update weights and biases
            for delta, neuron in zip(deltas, layer.neurons):
                neuron.weights -= learning_rate * delta * layer_input
                neuron.bias -= learning_rate * delta

In [10]:
# GradDescent class
# Implement gradient descent or other optimization algorithms

class GradDescent:
    def stochastic_gradient_descent(self, model, loss_func, inputs, targets, learning_rate, epochs):
        for _ in range(epochs):
            # Forward pass
            predictions = ForwardProp().forward_pass(model, inputs)

            # Calculate loss (kept for monitoring; not used in the update itself)
            loss = loss_func.mean_squared_error(targets, predictions)

            # Backward pass and weight updates
            BackProp().backward_pass(model, loss_func, inputs, targets, learning_rate)

In [19]:
class Training:
    def train(self, model, loss_func, optimizer, inputs, targets, learning_rate, epochs):
        # Note: the optimizer argument is accepted but not used by this loop
        for epoch in range(epochs):
            predictions = ForwardProp().forward_pass(model, inputs)
            loss = loss_func.mean_squared_error(targets, predictions)
            print(f"Epoch {epoch + 1}/{epochs}, Loss: {loss}")
            BackProp().backward_pass(model, loss_func, inputs, targets, learning_rate)

if __name__ == "__main__":
    # Create and train the neural network
    input_size = 3
    output_size = 2
    activation_func = Activation()

    model = Model()
    model.add_layer(Layer(input_size, 4, activation_func.sigmoid))
    model.add_layer(Layer(4, output_size, activation_func.sigmoid))

    inputs = np.array([0.1, 0.2, 0.3])
    targets = np.array([0.4, 0.5])

    optimizer = GradDescent()  # passed to Training.train, which does not use it
    loss_func = LossFunction()
    trainer = Training()

    trainer.train(model, loss_func, optimizer, inputs, targets, learning_rate=0.01, epochs=100)

    # Use the trained model for predictions
    test_inputs = np.array([0.2, 0.3, 0.4])
    predictions = model.forward(test_inputs)
    print("Predictions:", predictions)

## Class hierarchy design and software development

So, now let's discuss the class hierarchy design and software development for the neural network implementation we have provided.

Class Hierarchy Design:

#### Activation:

This class represents activation functions such as sigmoid, ReLU, etc.
Provides methods for various activation functions.

#### Neuron:

Represents an individual neuron in a neural network.
Has attributes for weights, bias, and an activation function.
Performs the forward pass for a single neuron.

#### Layer:

Represents a layer of neurons.
Contains multiple Neuron instances.
Allows the creation of hidden layers in the neural network.

#### Parameters:

Could be used to store and manage weights and biases for the entire network.
Currently an empty container (two empty lists) that the other classes do not use.

#### Model:

Represents the neural network architecture.
Contains multiple Layer instances.
Performs the forward pass for the entire network.

#### LossFunction:

Represents the loss or error function used to measure the network's performance.
Provides methods for calculating loss, e.g., mean squared error.

#### ForwardProp:

Implements the forward propagation logic for the network.
Propagates inputs through the layers to generate predictions.

#### BackProp:

Implements the backpropagation logic for updating weights and biases.
Calculates gradients and updates neuron parameters.

#### GradDescent:

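Provides a stochastic_gradient_descent method that repeats the forward pass, loss calculation, and backward pass for a given number of epochs.
Not used by the main example, where the Training class runs an equivalent loop.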

#### Training:

Manages the training loop over epochs.
Uses forward and backward propagation to update model parameters.

Software Development:

#### Object-Oriented Approach:

The code follows an object-oriented approach, which is a good practice for organizing and encapsulating functionality.

#### Modular Design:

Each class is designed to have a specific responsibility, making the code modular and easier to maintain.

#### Flexibility:

The design allows flexibility in choosing activation functions, network architecture, and loss functions.

#### Training Loop:

The Training class manages the training process, including the forward and backward passes.
It iterates over epochs, calculates loss, and updates model parameters.

#### Extensibility:

You can easily extend this code by adding new activation functions, loss functions, or optimization algorithms.

#### Debugging:

The modular structure makes it easier to debug and test individual components of the neural network.

#### Usage Example:

The code includes a usage example in the main section, demonstrating how to create and train a neural network.

Overall, the provided class hierarchy design and software development approach are well-structured and suitable for building and training neural networks.
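As a hedged illustration of the extensibility point, a new activation function could be added by subclassing. `ExtendedActivation` and its `tanh` method are hypothetical names that do not appear in the assignment code. The `Activation` class is repeated here only so the sketch is self-contained.

```python
import numpy as np

class Activation:
    # Same two methods as the Activation class in the programming assignment
    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def relu(self, x):
        return np.maximum(0, x)

class ExtendedActivation(Activation):
    # Hypothetical extension: adds tanh without modifying the existing class
    def tanh(self, x):
        return np.tanh(x)

act = ExtendedActivation()
values = act.tanh(np.array([-1.0, 0.0, 1.0]))
print(values)

# Inherited methods remain available
print(act.relu(np.array([-2.0, 3.0])))
```

One caveat: the backward pass in the assignment hard-codes the sigmoid derivative, so a layer that uses a new activation would also need a matching derivative before it could be trained correctly.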