## Fundamental Principles, Assumptions, and Equations Involved

A neural network is a computational model loosely inspired by how biological neurons in the brain process information. It consists of interconnected nodes (neurons) organized in layers: an input layer, one or more hidden layers, and an output layer. Its main components and training steps are:

- *Neurons and Layers*:
  - Each neuron computes a weighted sum of its inputs, adds a bias, applies an activation function, and passes the result to the next layer.
  - The input layer holds the raw features, the hidden layers compute intermediate representations, and the output layer produces the prediction.
  
- *Weights and Biases*:
  - Each connection between neurons has a weight that determines the strength and direction of the connection.
  - Each neuron also has an associated bias that shifts the activation function.
  
- *Activation Function*:
  - The activation function introduces non-linearity into the model. Common activation functions include the sigmoid, tanh, and ReLU (Rectified Linear Unit).
  
- *Forward Propagation*:
  - In forward propagation, inputs are passed through the network layer by layer to generate the output.
  
- *Loss Function*:
  - The loss function quantifies the difference between the predicted output and the true output. For binary classification, the cross-entropy loss function is commonly used.
  
- *Backpropagation and Gradient Descent*:
  - Backpropagation calculates the gradient of the loss function with respect to each weight by the chain rule, layer by layer backward from the output layer to the input layer.
  - Gradient descent then moves each weight and bias a small step, scaled by the learning rate, in the direction opposite to its gradient, which reduces the loss.

## Mathematical Equations

### Forward Propagation

For a neural network with one hidden layer:

$$
\mathbf{z}^{(1)} = \mathbf{X} \mathbf{W}^{(1)} + \mathbf{b}^{(1)}
$$

$$
\mathbf{a}^{(1)} = \sigma(\mathbf{z}^{(1)})
$$

$$
\mathbf{z}^{(2)} = \mathbf{a}^{(1)} \mathbf{W}^{(2)} + \mathbf{b}^{(2)}
$$

$$
\mathbf{a}^{(2)} = \sigma(\mathbf{z}^{(2)})
$$

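where:
- $\mathbf{X}$ is the $N \times d$ input matrix, with one sample per row.
- $\mathbf{W}^{(1)}$ and $\mathbf{W}^{(2)}$ are the weight matrices of the hidden and output layers, respectively.
- $\mathbf{b}^{(1)}$ and $\mathbf{b}^{(2)}$ are the bias vectors of the hidden and output layers, respectively, added to every row.
- $\sigma$ is the activation function (e.g., the sigmoid function). The implementation later in this document uses tanh in the hidden layer and sigmoid in the output layer, and stores samples as columns, so it computes $\mathbf{W}\mathbf{X} + \mathbf{b}$ instead; the two conventions are transposes of each other.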
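
The forward-pass equations above can be sketched in NumPy using the same row-per-sample convention. This is only an illustration: the function name `forward_rows` and the sizes `N`, `d`, `h` are assumptions chosen for the example, not part of the implementation that follows.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_rows(X, W1, b1, W2, b2):
    """Forward pass with one sample per row, mirroring the equations above."""
    z1 = X @ W1 + b1      # shape (N, h)
    a1 = sigmoid(z1)      # hidden activations
    z2 = a1 @ W2 + b2     # shape (N, 1)
    a2 = sigmoid(z2)      # predicted probabilities
    return z1, a1, z2, a2

# Illustrative sizes (assumed): 5 samples, 2 features, 3 hidden units
rng = np.random.default_rng(0)
N, d, h = 5, 2, 3
X = rng.normal(size=(N, d))
W1 = rng.normal(size=(d, h)) * 0.1
b1 = np.zeros(h)
W2 = rng.normal(size=(h, 1)) * 0.1
b2 = np.zeros(1)

z1, a1, z2, a2 = forward_rows(X, W1, b1, W2, b2)
print(a1.shape, a2.shape)  # (5, 3) (5, 1)
```

Each row of `a2` is the predicted probability for the corresponding row of `X`.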

### Cross-Entropy Loss

$$
L(\mathbf{y}, \mathbf{\hat{y}}) = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]
$$

where:
- $\mathbf{y}$ is the vector of true labels, with each $y_i \in \{0, 1\}$.
- $\mathbf{\hat{y}}$ is the vector of predicted probabilities, with each $\hat{y}_i \in (0, 1)$.
- $N$ is the number of samples.
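
As a small worked example of the loss formula, the sketch below evaluates it for three samples. The helper name `binary_cross_entropy` and the sample values are made up for illustration.

```python
import numpy as np

def binary_cross_entropy(y, y_hat):
    """Mean binary cross-entropy over N samples."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

y = [1, 0, 1]              # true labels (illustrative)
y_hat = [0.9, 0.2, 0.6]    # predicted probabilities (illustrative)

# -(ln 0.9 + ln 0.8 + ln 0.6) / 3
loss = binary_cross_entropy(y, y_hat)
print(f"{loss:.4f}")  # 0.2798

# A confident but wrong prediction is penalised much more heavily
loss_wrong = binary_cross_entropy([1], [0.01])
```

Only the term matching the true label contributes for each sample: $\log(\hat{y}_i)$ when $y_i = 1$ and $\log(1 - \hat{y}_i)$ when $y_i = 0$.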

### Backpropagation

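For a sigmoid output layer trained with the cross-entropy loss above, the output-layer error simplifies to:

$$
\delta^{(2)} = \mathbf{a}^{(2)} - \mathbf{y}
$$

$$
\frac{\partial L}{\partial \mathbf{W}^{(2)}} = \frac{1}{N} \mathbf{a}^{(1)T} \delta^{(2)}
$$

$$
\frac{\partial L}{\partial \mathbf{b}^{(2)}} = \frac{1}{N} \sum_{i=1}^{N} \delta^{(2)}_i
$$

$$
\delta^{(1)} = (\delta^{(2)} \mathbf{W}^{(2)T}) \odot \sigma'(\mathbf{z}^{(1)})
$$

$$
\frac{\partial L}{\partial \mathbf{W}^{(1)}} = \frac{1}{N} \mathbf{X}^T \delta^{(1)}
$$

$$
\frac{\partial L}{\partial \mathbf{b}^{(1)}} = \frac{1}{N} \sum_{i=1}^{N} \delta^{(1)}_i
$$

where $\odot$ denotes element-wise multiplication, $\delta_i$ is the $i$-th row of $\delta$, and the factor $\frac{1}{N}$ comes from the average in the loss.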
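
One way to check gradient formulas like these is to compare them against central finite differences. The sketch below does this for a small network in the row-per-sample convention, assuming sigmoid activations in both layers and gradients averaged over the $N$ samples. All names (`loss_fn`, `analytic_grads`, `numeric_grad`) and sizes are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_fn(X, y, W1, b1, W2, b2):
    a1 = sigmoid(X @ W1 + b1)
    a2 = sigmoid(a1 @ W2 + b2)
    return -np.mean(y * np.log(a2) + (1 - y) * np.log(1 - a2))

def analytic_grads(X, y, W1, b1, W2, b2):
    N = X.shape[0]
    a1 = sigmoid(X @ W1 + b1)
    a2 = sigmoid(a1 @ W2 + b2)
    delta2 = a2 - y                             # output error, shape (N, 1)
    dW2 = a1.T @ delta2 / N
    db2 = delta2.sum(axis=0) / N
    delta1 = (delta2 @ W2.T) * a1 * (1 - a1)    # sigma'(z1) = a1 * (1 - a1)
    dW1 = X.T @ delta1 / N
    db1 = delta1.sum(axis=0) / N
    return dW1, db1, dW2, db2

def numeric_grad(f, P, eps=1e-6):
    """Central-difference gradient of f() with respect to array P (perturbed in place)."""
    G = np.zeros_like(P)
    for idx in np.ndindex(P.shape):
        old = P[idx]
        P[idx] = old + eps
        f_plus = f()
        P[idx] = old - eps
        f_minus = f()
        P[idx] = old
        G[idx] = (f_plus - f_minus) / (2 * eps)
    return G

rng = np.random.default_rng(0)
N, d, h = 6, 2, 3                               # illustrative sizes
X = rng.normal(size=(N, d))
y = rng.integers(0, 2, size=(N, 1)).astype(float)
W1, b1 = rng.normal(size=(d, h)), rng.normal(size=h)
W2, b2 = rng.normal(size=(h, 1)), rng.normal(size=1)

f = lambda: loss_fn(X, y, W1, b1, W2, b2)
max_err = max(
    np.max(np.abs(g - numeric_grad(f, P)))
    for g, P in zip(analytic_grads(X, y, W1, b1, W2, b2), (W1, b1, W2, b2))
)
print(f"largest deviation: {max_err:.2e}")
```

A deviation many orders of magnitude smaller than the gradients themselves suggests the analytic expressions and the loss agree; a large one usually points to a missing transpose, a missing element-wise product, or a missing $\frac{1}{N}$.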

## How the Model Learns from Data and Makes Predictions

1. *Initialization*:
   - Initialize weights and biases randomly or using a specific initialization method.

2. *Forward Pass*:
   - Pass input data through the network to get predictions.

3. *Compute Loss*:
   - Calculate the loss using the cross-entropy loss function.

4. *Backward Pass*:
   - Perform backpropagation to compute gradients of the loss with respect to weights and biases.

5. *Update Weights*:
   - Update weights and biases using gradient descent or other optimization algorithms.

6. *Repeat*:
   - Repeat steps 2-5 for a number of epochs or until convergence.


Import the required libraries

In [72]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

Functions to calculate sigmoid and its derivative

In [73]:
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(z):
    return sigmoid(z) * (1 - sigmoid(z))
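
For inputs with a large negative value, `np.exp(-z)` overflows and NumPy emits a `RuntimeWarning`, even though the result still rounds to 0. A possible alternative, sketched below under the hypothetical name `stable_sigmoid`, evaluates each sign with a form whose exponential never grows.

```python
import numpy as np

def stable_sigmoid(z):
    """Sigmoid that only ever exponentiates non-positive numbers."""
    z = np.atleast_1d(np.asarray(z, dtype=float))
    out = np.empty_like(z)
    pos = z >= 0
    # For z >= 0: 1 / (1 + exp(-z)), where exp(-z) <= 1
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    # For z < 0: exp(z) / (1 + exp(z)), where exp(z) < 1
    ez = np.exp(z[~pos])
    out[~pos] = ez / (1.0 + ez)
    return out

values = stable_sigmoid(np.array([-1000.0, 0.0, 1000.0]))
print(values)  # [0.  0.5 1. ]
```

For moderate inputs this agrees with the plain `1 / (1 + np.exp(-z))` form; the difference only matters at the extremes.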

Function to initialise the weights with small random values and the biases with zeros

In [74]:
def initialize_parameters(input_dim, hidden_dim, output_dim):
    np.random.seed(42)
    W1 = np.random.randn(hidden_dim, input_dim) * 0.01
    b1 = np.zeros((hidden_dim, 1))
    W2 = np.random.randn(output_dim, hidden_dim) * 0.01
    b2 = np.zeros((output_dim, 1))
    return W1, b1, W2, b2
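
The reason for random rather than constant initial weights is symmetry: if every hidden unit starts with the same weights, every unit receives the same gradient and they never differentiate. The sketch below illustrates this on made-up data, using the same tanh/sigmoid architecture in the (features, samples) layout. The helper `train_from` and all sizes are assumptions for the demonstration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def train_from(W1, b1, W2, b2, X, Y, lr=0.1, n_iters=200):
    """Plain gradient descent starting from the given parameters."""
    m = X.shape[1]
    for _ in range(n_iters):
        A1 = np.tanh(W1 @ X + b1)
        A2 = sigmoid(W2 @ A1 + b2)
        dZ2 = A2 - Y
        dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)
        W2 = W2 - lr * (dZ2 @ A1.T) / m
        b2 = b2 - lr * dZ2.sum(axis=1, keepdims=True) / m
        W1 = W1 - lr * (dZ1 @ X.T) / m
        b1 = b1 - lr * dZ1.sum(axis=1, keepdims=True) / m
    return W1, b1, W2, b2

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 50))                     # toy data, 2 features
Y = (X[0:1] + X[1:2] > 0).astype(float)          # toy labels, shape (1, 50)
hidden = 4

# Constant initialisation: every hidden unit starts identical
W1_const, _, _, _ = train_from(
    np.full((hidden, 2), 0.5), np.zeros((hidden, 1)),
    np.full((1, hidden), 0.5), np.zeros((1, 1)), X, Y)

# Small random initialisation, as in initialize_parameters
W1_rand, _, _, _ = train_from(
    rng.normal(size=(hidden, 2)) * 0.01, np.zeros((hidden, 1)),
    rng.normal(size=(1, hidden)) * 0.01, np.zeros((1, 1)), X, Y)

const_rows_identical = np.allclose(W1_const, W1_const[0])
rand_rows_identical = np.allclose(W1_rand, W1_rand[0])
print(const_rows_identical, rand_rows_identical)
```

With constant initialisation the rows of `W1` remain copies of each other after training, so the hidden layer behaves like a single unit regardless of its size.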

Function for forward propagation (tanh in the hidden layer, sigmoid in the output layer)

In [75]:
def forward_propagation(X, W1, b1, W2, b2):
    Z1 = np.dot(W1, X) + b1
    A1 = np.tanh(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = sigmoid(Z2)
    return Z1, A1, Z2, A2

Function to compute the loss

In [76]:
def compute_loss(A2, Y):
    m = Y.shape[1]
    # Ensure Y has the same shape as A2 for element-wise multiplication
    Y = Y.reshape(A2.shape)  # reshape Y to match the shape of A2
    logprobs = np.multiply(np.log(A2), Y) + np.multiply(np.log(1 - A2), 1 - Y)
    loss = -np.sum(logprobs) / m
    return loss
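
If a prediction in `A2` reaches exactly 0 or 1, `np.log` returns `-inf` and the product with a zero label term becomes `nan`. A common safeguard is to clip the probabilities before taking logarithms. The variant below is a sketch; the name `compute_loss_clipped` and the value of `eps` are assumptions.

```python
import numpy as np

def compute_loss_clipped(A2, Y, eps=1e-12):
    """Cross-entropy loss with predictions clipped away from 0 and 1."""
    m = Y.shape[1]
    Y = Y.reshape(A2.shape)
    A2 = np.clip(A2, eps, 1 - eps)   # keeps both logarithms finite
    logprobs = Y * np.log(A2) + (1 - Y) * np.log(1 - A2)
    return -np.sum(logprobs) / m

# Two saturated (and correct) predictions plus one ordinary prediction
A2 = np.array([[1.0, 0.0, 0.7]])
Y = np.array([[1, 0, 1]])
loss = compute_loss_clipped(A2, Y)
print(np.isfinite(loss))  # True
```

The clipping changes the loss only negligibly for saturated predictions, and ordinary predictions are unaffected.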

Function for backward propagation

In [77]:
def backward_propagation(X, Y, Z1, A1, Z2, A2, W2):
    m = X.shape[1]  # number of samples
    Y = Y.reshape(A2.shape)
    dZ2 = A2 - Y  # output error for sigmoid + cross-entropy
    dW2 = np.dot(dZ2, A1.T) / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    dZ1 = np.dot(W2.T, dZ2) * (1 - np.power(A1, 2))  # tanh'(Z1) = 1 - A1**2
    dW1 = np.dot(dZ1, X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    return dW1, db1, dW2, db2


Function to update the weights and bias to minimise loss

In [78]:
def update_parameters(W1, b1, W2, b2, dW1, db1, dW2, db2, learning_rate):
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2
    return W1, b1, W2, b2

Reading the csv file into a dataframe

In [79]:
file_path="C:/Users/DELL/Downloads/data2_train.csv"
df=pd.read_csv(file_path)

In [80]:
X = df[['Feature_1', 'Feature_2']].values
y = df[['Target']].values.T

Split the dataset into training and testing sets

In [81]:
X_train, X_test, y_train, y_test = train_test_split(X, y.T, test_size=0.3, random_state=42)
X_train, X_test = X_train.T, X_test.T  # Transpose to be (features, samples)
y_train, y_test = y_train.T, y_test.T  # Transpose to be (1, samples)

Standardize the features

In [82]:
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train.T).T  
X_test = scaler.transform(X_test.T).T

Initialize network parameters

In [83]:
input_dim = X_train.shape[0]
hidden_dim = 10
output_dim = 1
W1, b1, W2, b2 = initialize_parameters(input_dim, hidden_dim, output_dim)

In [84]:
lr = [0.01,0.05,0.005]
no_iters = [10000, 20000, 30000]
max_acc=0

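Tune the hyperparameters (learning rate and number of iterations) and report the accuracy on the held-out 30% split, which serves as a validation set here. Note that the loop does not re-initialize `W1`, `b1`, `W2`, and `b2` between configurations, so each configuration continues training from the weights left by the previous one. The reported accuracies are therefore cumulative rather than independent comparisons, which is a likely reason why every configuration reports the same value.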

In [85]:
for learning_rate in lr:
    for num_iterations in no_iters:
        for i in range(num_iterations):
            Z1, A1, Z2, A2 = forward_propagation(X_train, W1, b1, W2, b2)
            loss = compute_loss(A2, y_train)
            dW1, db1, dW2, db2 = backward_propagation(X_train, y_train, Z1, A1, Z2, A2, W2)
            W1, b1, W2, b2 = update_parameters(W1, b1, W2, b2, dW1, db1, dW2, db2, learning_rate)
        
            # if i % 1000 == 0:
            #     print(f'Iteration {i}, Loss: {loss:.4f}')

        # Predict on the held-out split
        _, _, _, A2_test = forward_propagation(X_test, W1, b1, W2, b2)
        predictions = (A2_test > 0.5).astype(int)

        # Calculate accuracy and keep the best configuration seen so far
        accuracy = np.mean(predictions == y_test)
        if accuracy > max_acc:
            max_acc = accuracy
            final_lr = learning_rate
            final_no_iters = num_iterations
        print(f'Accuracy for learning rate={learning_rate} and no. of iterations={num_iterations}: {accuracy * 100:.2f}%')

Accuracy for learning rate=0.01 and no. of iterations=10000: 99.17%
Accuracy for learning rate=0.01 and no. of iterations=20000: 99.17%
Accuracy for learning rate=0.01 and no. of iterations=30000: 99.17%
Accuracy for learning rate=0.05 and no. of iterations=10000: 99.17%
Accuracy for learning rate=0.05 and no. of iterations=20000: 99.17%
Accuracy for learning rate=0.05 and no. of iterations=30000: 99.17%
Accuracy for learning rate=0.005 and no. of iterations=10000: 99.17%
Accuracy for learning rate=0.005 and no. of iterations=20000: 99.17%
Accuracy for learning rate=0.005 and no. of iterations=30000: 99.17%
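
For the configurations to be compared independently, each one would need to start from freshly initialized parameters. The sketch below shows one way to structure such a search. It is not the code that produced the output above: it uses synthetic two-blob data in place of the CSV file, shorter iteration counts to keep it fast, and hypothetical helpers `init_params`, `train`, and `accuracy`.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def init_params(n_in, n_hidden, n_out, seed=42):
    rng = np.random.default_rng(seed)
    return (rng.normal(size=(n_hidden, n_in)) * 0.01, np.zeros((n_hidden, 1)),
            rng.normal(size=(n_out, n_hidden)) * 0.01, np.zeros((n_out, 1)))

def train(X, Y, lr, n_iters, n_hidden=10):
    """Train from a fresh initialisation; X is (features, samples)."""
    W1, b1, W2, b2 = init_params(X.shape[0], n_hidden, 1)
    m = X.shape[1]
    for _ in range(n_iters):
        A1 = np.tanh(W1 @ X + b1)
        A2 = sigmoid(W2 @ A1 + b2)
        dZ2 = A2 - Y
        dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)   # uses W2 before it is updated
        W2 -= lr * (dZ2 @ A1.T) / m
        b2 -= lr * dZ2.sum(axis=1, keepdims=True) / m
        W1 -= lr * (dZ1 @ X.T) / m
        b1 -= lr * dZ1.sum(axis=1, keepdims=True) / m
    return W1, b1, W2, b2

def accuracy(X, Y, params):
    W1, b1, W2, b2 = params
    A2 = sigmoid(W2 @ np.tanh(W1 @ X + b1) + b2)
    return np.mean((A2 > 0.5).astype(int) == Y)

# Synthetic stand-in data (assumed): two Gaussian blobs, 2 features
rng = np.random.default_rng(0)
n = 200
X = np.hstack([rng.normal(-2, 1, size=(2, n)), rng.normal(2, 1, size=(2, n))])
Y = np.hstack([np.zeros((1, n)), np.ones((1, n))])
perm = rng.permutation(2 * n)
X, Y = X[:, perm], Y[:, perm]
X_tr, Y_tr = X[:, :300], Y[:, :300]      # training part
X_val, Y_val = X[:, 300:], Y[:, 300:]    # validation part

results = {}
for lr in [0.01, 0.05, 0.005]:
    for n_iters in [500, 1000]:
        params = train(X_tr, Y_tr, lr, n_iters)   # fresh parameters every time
        results[(lr, n_iters)] = accuracy(X_val, Y_val, params)

best_lr, best_iters = max(results, key=results.get)
print(f"best: lr={best_lr}, iterations={best_iters}")
```

Because `train` re-initializes the parameters internally, the entry stored for each `(lr, n_iters)` pair reflects only that configuration.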


Train the custom neural network on the full training file with the selected hyperparameters, then evaluate it on the separate test file and calculate the accuracy

In [86]:
file_path="C:/Users/DELL/Downloads/data2_train.csv"
df=pd.read_csv(file_path)
X_train = df[['Feature_1', 'Feature_2']].values.T
y_train = df[['Target']].values.T

file_path="C:/Users/DELL/Downloads/data2_test.csv"
df=pd.read_csv(file_path)
X_test = df[['Feature_1', 'Feature_2']].values.T
y_test = df[['Target']].values.T

scaler = StandardScaler()
# StandardScaler expects (samples, features), so transpose in and back out
X_train = scaler.fit_transform(X_train.T).T
X_test = scaler.transform(X_test.T).T  # reuse the training statistics

# Initialize network parameters
input_dim = X_train.shape[0]
hidden_dim = 10
output_dim = 1
W1, b1, W2, b2 = initialize_parameters(input_dim, hidden_dim, output_dim)

for i in range(final_no_iters):
    Z1, A1, Z2, A2 = forward_propagation(X_train, W1, b1, W2, b2)
    loss = compute_loss(A2, y_train)
    dW1, db1, dW2, db2 = backward_propagation(X_train, y_train, Z1, A1, Z2, A2, W2)
    W1, b1, W2, b2 = update_parameters(W1, b1, W2, b2, dW1, db1, dW2, db2, final_lr)

    #if i % 1000 == 0:
     #   print(f'Iteration {i}, Loss: {loss:.4f}')

# Predict on test set
_, _, _, A2_test = forward_propagation(X_test, W1, b1, W2, b2)
predictions = (A2_test > 0.5).astype(int)

# Calculate accuracy
accuracy = np.mean(predictions == y_test)
print(f'Custom Neural Network Accuracy: {accuracy * 100:.2f}%')

Custom Neural Network Accuracy: 99.00%


Train scikit-learn's MLPClassifier on the same training and testing data and calculate its accuracy

In [87]:

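# MLPClassifier expects (samples, features) inputs and 1-D label arrays
X_train = X_train.T
X_test = X_test.T
y_train = y_train.T.ravel()
y_test = y_test.T.ravel()

# The features were already standardized in the previous cell, so refitting
# the scaler here leaves them essentially unchanged
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Initialize and train the MLPClassifier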
mlp = MLPClassifier(hidden_layer_sizes=(10,), max_iter=10000, random_state=42)
mlp.fit(X_train, y_train)

# Predict on test set
mlp_predictions = mlp.predict(X_test)

# Calculate accuracy
mlp_accuracy = accuracy_score(y_test, mlp_predictions)
print(f'Scikit-learn MLPClassifier Accuracy: {mlp_accuracy * 100:.2f}%')

Scikit-learn MLPClassifier Accuracy: 99.00%
