## Fundamental Principles, Assumptions, and Equations Involved

A neural network is a computational model inspired by the way biological neural networks in the human brain process information. It consists of interconnected nodes (neurons) organized in layers: an input layer, one or more hidden layers, and an output layer. The fundamental principles of a neural network involve the following components and steps:

- *Neurons and Layers*:
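  - Each neuron receives one or more inputs, computes a weighted sum plus a bias, applies an activation function, and passes the result to the next layer.
  - The input layer holds the features, the hidden layers compute intermediate representations, and the output layer produces the prediction.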
  
- *Weights and Biases*:
  - Each connection between neurons has a weight that determines the strength and direction of the connection.
  - Each neuron also has an associated bias that shifts the activation function.
  
- *Activation Function*:
  - The activation function introduces non-linearity into the model. Common activation functions include the sigmoid, tanh, and ReLU (Rectified Linear Unit).
  
- *Forward Propagation*:
  - In forward propagation, inputs are passed through the network layer by layer to generate the output.
  
- *Loss Function*:
  - The loss function quantifies the difference between the predicted output and the true output. For binary classification, the cross-entropy loss function is commonly used.
  
- *Backpropagation and Gradient Descent*:
  - Backpropagation calculates the gradient of the loss function with respect to each weight by the chain rule, layer by layer backward from the output layer to the input layer.
  - Gradient descent then moves each weight and bias a small step against its gradient, scaled by the learning rate, to reduce the loss.
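
To make the first three components concrete, here is a minimal sketch of a single neuron: a weighted sum of the inputs plus a bias, passed through each of the activation functions named above. The input, weight, and bias values are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    """Rectified Linear Unit: passes positives through, zeroes out negatives."""
    return np.maximum(0.0, z)

# Hypothetical inputs, weights, and bias for one neuron (illustrative values).
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.4, 0.3, -0.2])
b = 0.1

# Pre-activation: weighted sum of the inputs plus the bias.
# 0.5*0.4 + (-1.0)*0.3 + 2.0*(-0.2) + 0.1 = -0.4
z = np.dot(w, x) + b

# The same pre-activation under three different activation functions.
outputs = {
    "sigmoid": sigmoid(z),
    "tanh": np.tanh(z),
    "relu": relu(z),
}
for name, value in outputs.items():
    print(f"{name}: {value:.4f}")
```

Because the pre-activation is negative here, ReLU outputs exactly zero, while sigmoid and tanh return small values on their respective sides of 0.5 and 0.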

## Mathematical Equations

### Forward Propagation

For a neural network with one hidden layer:

$$
\mathbf{z}^{(1)} = \mathbf{X} \mathbf{W}^{(1)} + \mathbf{b}^{(1)}
$$

$$
\mathbf{a}^{(1)} = \sigma(\mathbf{z}^{(1)})
$$

$$
\mathbf{z}^{(2)} = \mathbf{a}^{(1)} \mathbf{W}^{(2)} + \mathbf{b}^{(2)}
$$

$$
\mathbf{a}^{(2)} = \sigma(\mathbf{z}^{(2)})
$$

where:
- $\mathbf{X}$ is the input matrix, with one sample per row and one feature per column.
- $\mathbf{W}^{(1)}$ and $\mathbf{W}^{(2)}$ are weight matrices for the hidden and output layers, respectively.
- $\mathbf{b}^{(1)}$ and $\mathbf{b}^{(2)}$ are bias vectors for the hidden and output layers, respectively, added to every row.
- $\sigma$ is the activation function (e.g., sigmoid function).
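
The four equations above translate almost line for line into NumPy. The sketch below assumes sigmoid activations in both layers and uses hypothetical layer sizes and random weights; the helper name `forward` is illustrative, not part of the notebook.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, b1, W2, b2):
    """Forward pass through one hidden layer, mirroring the equations above."""
    z1 = X @ W1 + b1      # hidden pre-activation, shape (N, h)
    a1 = sigmoid(z1)      # hidden activation
    z2 = a1 @ W2 + b2     # output pre-activation, shape (N, k)
    a2 = sigmoid(z2)      # output activation (a probability per sample)
    return z1, a1, z2, a2

# Hypothetical sizes: 4 samples, 3 features, 5 hidden units, 1 output unit.
rng = np.random.default_rng(0)
N, d, h, k = 4, 3, 5, 1
X = rng.normal(size=(N, d))
W1 = rng.normal(scale=0.1, size=(d, h))
b1 = np.zeros(h)
W2 = rng.normal(scale=0.1, size=(h, k))
b2 = np.zeros(k)

z1, a1, z2, a2 = forward(X, W1, b1, W2, b2)
print(a1.shape, a2.shape)
```

The bias vectors have one entry per unit; NumPy broadcasting adds them to every row of the batch.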

### Cross-Entropy Loss

$$
L(\mathbf{y}, \mathbf{\hat{y}}) = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]
$$

where:
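- $\mathbf{y}$ is the vector of true labels, with $y_i \in \{0, 1\}$ for sample $i$.
- $\mathbf{\hat{y}}$ is the vector of predicted probabilities, with $\hat{y}_i \in (0, 1)$ the predicted probability that sample $i$ belongs to class 1.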
- $N$ is the number of samples.
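
A small numeric example shows how this loss behaves. The sketch below assumes binary labels and predicted probabilities; the clipping constant `eps` is an added safeguard against `log(0)` and is not part of the formula, and the sample values are hypothetical.

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

y_true = np.array([1, 0, 1, 0])

# Predictions that lean toward the correct class.
mostly_right = np.array([0.9, 0.1, 0.8, 0.2])
# The same confidence levels, but pointing at the wrong class.
mostly_wrong = np.array([0.1, 0.9, 0.2, 0.8])
# No information at all: every probability is 0.5.
uninformed = np.full(4, 0.5)

loss_right = binary_cross_entropy(y_true, mostly_right)
loss_wrong = binary_cross_entropy(y_true, mostly_wrong)
loss_uninformed = binary_cross_entropy(y_true, uninformed)

print(f"mostly right: {loss_right:.4f}")
print(f"uninformed:   {loss_uninformed:.4f}")
print(f"mostly wrong: {loss_wrong:.4f}")
```

Confident wrong predictions are penalized far more heavily than confident correct ones are rewarded, and the uninformed baseline sits at $\ln 2 \approx 0.693$.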

### Backpropagation

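For a sigmoid output layer trained with the cross-entropy loss above, the output error simplifies to the difference between prediction and target:

$$
\delta^{(2)} = \mathbf{a}^{(2)} - \mathbf{y}
$$

$$
\frac{\partial L}{\partial \mathbf{W}^{(2)}} = \frac{1}{N} \mathbf{a}^{(1)T} \delta^{(2)}
$$

$$
\frac{\partial L}{\partial \mathbf{b}^{(2)}} = \frac{1}{N} \sum_{i=1}^{N} \delta^{(2)}_i
$$

$$
\delta^{(1)} = \left( \delta^{(2)} \mathbf{W}^{(2)T} \right) \odot \sigma'(\mathbf{z}^{(1)})
$$

$$
\frac{\partial L}{\partial \mathbf{W}^{(1)}} = \frac{1}{N} \mathbf{X}^T \delta^{(1)}
$$

$$
\frac{\partial L}{\partial \mathbf{b}^{(1)}} = \frac{1}{N} \sum_{i=1}^{N} \delta^{(1)}_i
$$

where $\odot$ is the elementwise product, $\delta^{(l)}_i$ is the row of $\delta^{(l)}$ for sample $i$, and the factor $\frac{1}{N}$ comes from the mean in the loss.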
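
One way to gain confidence in backpropagation formulas is to compare them with a finite-difference estimate. The sketch below assumes sigmoid activations in both layers and the mean cross-entropy loss; the hidden-layer error uses the transpose of $\mathbf{W}^{(2)}$ and an elementwise product so that the shapes agree. The function names and random data are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_fn(X, y, W1, b1, W2, b2):
    """Mean binary cross-entropy of a one-hidden-layer sigmoid network."""
    a1 = sigmoid(X @ W1 + b1)
    a2 = sigmoid(a1 @ W2 + b2)
    return -np.mean(y * np.log(a2) + (1 - y) * np.log(1 - a2))

def gradients(X, y, W1, b1, W2, b2):
    """Analytic gradients from backpropagation."""
    N = X.shape[0]
    a1 = sigmoid(X @ W1 + b1)
    a2 = sigmoid(a1 @ W2 + b2)
    delta2 = a2 - y                            # output error, shape (N, 1)
    dW2 = a1.T @ delta2 / N
    db2 = delta2.sum(axis=0) / N
    # For the sigmoid, sigma'(z1) = a1 * (1 - a1).
    delta1 = (delta2 @ W2.T) * a1 * (1 - a1)   # hidden error, shape (N, h)
    dW1 = X.T @ delta1 / N
    db1 = delta1.sum(axis=0) / N
    return dW1, db1, dW2, db2

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
y = rng.integers(0, 2, size=(6, 1)).astype(float)
W1 = rng.normal(scale=0.5, size=(3, 4))
b1 = np.zeros(4)
W2 = rng.normal(scale=0.5, size=(4, 1))
b2 = np.zeros(1)

dW1, db1, dW2, db2 = gradients(X, y, W1, b1, W2, b2)

# Central finite difference on a single entry of W1.
eps = 1e-6
W1_plus, W1_minus = W1.copy(), W1.copy()
W1_plus[0, 0] += eps
W1_minus[0, 0] -= eps
numeric = (loss_fn(X, y, W1_plus, b1, W2, b2)
           - loss_fn(X, y, W1_minus, b1, W2, b2)) / (2 * eps)

gap = abs(numeric - dW1[0, 0])
print(f"analytic={dW1[0, 0]:.8f} numeric={numeric:.8f} gap={gap:.2e}")
```

If the transpose or the elementwise product were dropped, either the shapes would fail to match or the gap would be orders of magnitude larger.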

## How the Model Learns from Data and Makes Predictions

1. *Initialization*:
   - Initialize weights and biases randomly or using a specific initialization method.

2. *Forward Pass*:
   - Pass input data through the network to get predictions.

3. *Compute Loss*:
   - Calculate the loss using the cross-entropy loss function.

4. *Backward Pass*:
   - Perform backpropagation to compute gradients of the loss with respect to weights and biases.

5. *Update Weights*:
   - Update weights and biases using gradient descent or other optimization algorithms.

6. *Repeat*:
   - Repeat steps 2-5 for a number of epochs or until convergence.
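
The six steps can be sketched end to end on a toy problem. This is a minimal illustration under stated assumptions, not the model trained later in the notebook: the data are synthetic, the network has one hidden layer of four sigmoid units, and the learning rate and epoch count are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(42)

# Toy data (hypothetical): label is 1 when the two features sum to a positive number.
X = rng.normal(size=(200, 2))
y = (X.sum(axis=1, keepdims=True) > 0).astype(float)
N = X.shape[0]

# 1. Initialization: small random weights, zero biases.
W1 = rng.normal(scale=0.5, size=(2, 4))
b1 = np.zeros(4)
W2 = rng.normal(scale=0.5, size=(4, 1))
b2 = np.zeros(1)

learning_rate, n_epochs = 0.5, 500
losses = []
for epoch in range(n_epochs):
    # 2. Forward pass.
    a1 = sigmoid(X @ W1 + b1)
    a2 = sigmoid(a1 @ W2 + b2)

    # 3. Compute the mean cross-entropy loss (clipped to avoid log(0)).
    p = np.clip(a2, 1e-12, 1 - 1e-12)
    losses.append(float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))))

    # 4. Backward pass: both errors are computed before any weight changes.
    delta2 = a2 - y
    delta1 = (delta2 @ W2.T) * a1 * (1 - a1)

    # 5. Gradient descent update.
    W2 -= learning_rate * (a1.T @ delta2) / N
    b2 -= learning_rate * delta2.sum(axis=0) / N
    W1 -= learning_rate * (X.T @ delta1) / N
    b1 -= learning_rate * delta1.sum(axis=0) / N
    # 6. The loop repeats steps 2-5 for n_epochs.

print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Recording the loss each epoch makes it easy to confirm that training is moving in the right direction before trusting any accuracy figure.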


Importing the required libraries

In [74]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression


Reading and processing the data 

In [75]:
# Load the dataset
file_path = "C:/Users/DELL/Downloads/data1_train.csv"
df = pd.read_csv(file_path)

# Extract features and target
X = df[['Feature_1', 'Feature_2', 'Feature_3']].values
y = df['Target'].values

# One-hot encode the target
y_onehot = np.zeros((y.size, y.max() + 1))
y_onehot[np.arange(y.size), y] = 1
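
The fancy-indexing step above can be checked on a tiny label vector. This sketch assumes integer class labels starting at 0 (the values are hypothetical) and also shows an equivalent one-liner based on `np.eye`.

```python
import numpy as np

# Hypothetical integer labels for four samples and three classes.
y = np.array([0, 2, 1, 2])
n_classes = y.max() + 1

# Same approach as the notebook: start from zeros, then set one entry per row.
y_onehot = np.zeros((y.size, n_classes))
y_onehot[np.arange(y.size), y] = 1

# Equivalent form: pick rows of the identity matrix by label.
y_onehot_eye = np.eye(n_classes)[y]

# argmax along each row recovers the original labels.
recovered = np.argmax(y_onehot, axis=1)

print(y_onehot)
```

Note that `y.max() + 1` only gives the right number of columns when the highest class actually appears in `y`.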


In [76]:
# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y_onehot, test_size=0.2, random_state=1234)

# Standardize features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
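
What `fit_transform` followed by `transform` does can be written out by hand: the mean and standard deviation come from the training split only and are then reused on the test split. A minimal NumPy sketch on synthetic data (the arrays are hypothetical stand-ins for the real features):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the real train and test feature matrices.
X_train = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
X_test = rng.normal(loc=5.0, scale=2.0, size=(20, 3))

# Statistics are estimated on the training split only.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)  # population standard deviation (ddof=0), as StandardScaler uses

# The same shift and scale are applied to both splits.
X_train_std = (X_train - mean) / std
X_test_std = (X_test - mean) / std

print("train mean:", X_train_std.mean(axis=0).round(6))
print("train std: ", X_train_std.std(axis=0).round(6))
print("test mean: ", X_test_std.mean(axis=0).round(3))
```

The standardized training features have mean 0 and standard deviation 1 by construction; the test features generally do not, which is expected because no test statistics were used.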


Softmax Function

In [77]:
# Softmax function
def softmax(z):
    exp_z = np.exp(z - np.max(z, axis=1, keepdims=True))
    return exp_z / np.sum(exp_z, axis=1, keepdims=True)
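
Subtracting the row maximum before exponentiating does not change the result, because softmax is invariant to adding a constant to every logit in a row, but it prevents overflow. The sketch below contrasts the function above with a naive version on deliberately large, hypothetical logits.

```python
import numpy as np

def softmax(z):
    """Stable softmax, as defined above: shift each row by its maximum."""
    exp_z = np.exp(z - np.max(z, axis=1, keepdims=True))
    return exp_z / np.sum(exp_z, axis=1, keepdims=True)

def naive_softmax(z):
    """Textbook formula with no shift; overflows for large logits."""
    exp_z = np.exp(z)
    return exp_z / np.sum(exp_z, axis=1, keepdims=True)

# Hypothetical logits large enough that exp() overflows in float64.
z_large = np.array([[1000.0, 1001.0, 1002.0]])
z_small = np.array([[0.0, 1.0, 2.0]])  # same logits shifted down by 1000

with np.errstate(over="ignore", invalid="ignore"):
    naive = naive_softmax(z_large)   # exp(1000) is inf, and inf / inf is nan

stable = softmax(z_large)

print("naive: ", naive)
print("stable:", stable)
```

The stable version returns the same probabilities for `z_large` as for `z_small`, while the naive version returns only `nan`.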


Function to train multinomial (softmax) logistic regression

In [78]:

# Training function using gradient descent
def fit(X_train, y_train, learning_rate, n_iters):
    n_samples, n_features = X_train.shape
    n_classes = y_train.shape[1]
    
    weights = np.zeros((n_features, n_classes))
    bias = np.zeros(n_classes)
    
    for i in range(n_iters):
        # Compute linear model
        linear_model = np.dot(X_train, weights) + bias
        # Apply softmax
        y_pred = softmax(linear_model)
        
        # Compute gradients
        dw = (1 / n_samples) * np.dot(X_train.T, (y_pred - y_train))
        db = (1 / n_samples) * np.sum(y_pred - y_train, axis=0)
        
        # Update weights and bias
        weights -= learning_rate * dw
        bias -= learning_rate * db
        
        if i % 1000 == 0:
            # Compute loss (cross-entropy loss)
            loss = -np.mean(np.sum(y_train * np.log(y_pred + 1e-15), axis=1))
            print(f"Iteration {i}: Loss = {loss}")
    
    return weights, bias
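
A quick sanity check on this training function: with all-zero weights and biases, softmax assigns probability $1/K$ to each of the $K$ classes, so the cross-entropy loss before any learning has taken effect is $\ln K$. For three classes that is $\ln 3 \approx 1.0986$. The sketch below reproduces that value on hypothetical random data, using the same loss expression as `fit`.

```python
import numpy as np

def softmax(z):
    exp_z = np.exp(z - np.max(z, axis=1, keepdims=True))
    return exp_z / np.sum(exp_z, axis=1, keepdims=True)

# Hypothetical sizes and random data; the result does not depend on them.
n_samples, n_features, n_classes = 5, 3, 3
rng = np.random.default_rng(0)
X = rng.normal(size=(n_samples, n_features))
labels = rng.integers(0, n_classes, size=n_samples)
y_onehot = np.eye(n_classes)[labels]

# Zero initialization, as in fit.
weights = np.zeros((n_features, n_classes))
bias = np.zeros(n_classes)

# Every logit is 0, so every row of y_pred is [1/3, 1/3, 1/3].
y_pred = softmax(np.dot(X, weights) + bias)

# Same loss expression as in fit.
loss = -np.mean(np.sum(y_onehot * np.log(y_pred + 1e-15), axis=1))

print(f"loss at zero weights: {loss:.6f}")
print(f"ln(n_classes):        {np.log(n_classes):.6f}")
```

A first printed loss that differs noticeably from $\ln K$ usually points to a problem in the one-hot encoding or the softmax axis.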


Function to predict the labels

In [79]:
# Prediction function
def predict(X, weights, bias):
    linear_model = np.dot(X, weights) + bias
    y_pred = softmax(linear_model)
    return np.argmax(y_pred, axis=1)
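
Because softmax preserves the ordering of the logits within each row, the predicted class is the same whether `argmax` is taken over the probabilities or over the raw linear outputs. A small check on hypothetical random logits:

```python
import numpy as np

def softmax(z):
    exp_z = np.exp(z - np.max(z, axis=1, keepdims=True))
    return exp_z / np.sum(exp_z, axis=1, keepdims=True)

# Hypothetical logits for 8 samples and 3 classes.
rng = np.random.default_rng(1)
logits = rng.normal(size=(8, 3))

from_probs = np.argmax(softmax(logits), axis=1)   # what predict does
from_logits = np.argmax(logits, axis=1)           # skipping the softmax

print(from_probs)
print(from_logits)
```

Applying softmax in `predict` is therefore only needed when the probabilities themselves are of interest, for example to report confidence.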


Tuning the learning rate and training the model

In [80]:
# Fit the model once per candidate learning rate and keep the best one
lr = [0.01, 0.03, 0.05]
n_iters = 10000
max_acc = 0
final_lr = None
for learning_rate in lr:
    weights, bias = fit(X_train, y_train, learning_rate, n_iters)
    # Make predictions on test set
    y_test_pred = predict(X_test, weights, bias)
    y_test_actual = np.argmax(y_test, axis=1)

    # Calculate accuracy
    accuracy = np.mean(y_test_pred == y_test_actual) * 100
    if max_acc < accuracy:
        max_acc = accuracy
        final_lr = learning_rate
    # The learning rate is a plain number, not a percentage
    print(f"Accuracy for learning rate={learning_rate} is: {accuracy:.2f}%")


Iteration 0: Loss = 1.098612288668107
Iteration 1000: Loss = 0.20506980722622078
Iteration 2000: Loss = 0.14928654862289742
Iteration 3000: Loss = 0.12746718569356624
Iteration 4000: Loss = 0.11544076298101484
Iteration 5000: Loss = 0.10770540788906888
Iteration 6000: Loss = 0.10226815873993218
Iteration 7000: Loss = 0.0982185142200932
Iteration 8000: Loss = 0.09507671734832063
Iteration 9000: Loss = 0.09256411429074338
Accuracy for learning rate=0.01 is: 98.12%
Iteration 0: Loss = 1.098612288668107
Iteration 1000: Loss = 0.12742374167283205
Iteration 2000: Loss = 0.10225250695101529
Iteration 3000: Loss = 0.09255567429721977
Iteration 4000: Loss = 0.08733181453722975
Iteration 5000: Loss = 0.0840585508737296
Iteration 6000: Loss = 0.08182007580445042
Iteration 7000: Loss = 0.08019895033066428
Iteration 8000: Loss = 0.07897633157938785
Iteration 9000: Loss = 0.07802596329653252
Accuracy for learning rate=0.03 is: 98.12%
Iteration 0: Loss = 1.098612288668107
Iteration 1000: Loss = 0.1
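
Selecting the learning rate by test accuracy tends to make the reported test accuracy optimistic. One common alternative is to hold out a separate validation split for the selection and touch the test split only once at the end. The sketch below illustrates the idea under stated assumptions: the data are synthetic Gaussian blobs, the 60/20/20 split and the iteration count are arbitrary, and `fit` is a compact stand-in for the notebook's training function without the loss printing.

```python
import numpy as np

def softmax(z):
    exp_z = np.exp(z - np.max(z, axis=1, keepdims=True))
    return exp_z / np.sum(exp_z, axis=1, keepdims=True)

def fit(X, y_onehot, learning_rate, n_iters):
    """Compact softmax regression trained with batch gradient descent."""
    weights = np.zeros((X.shape[1], y_onehot.shape[1]))
    bias = np.zeros(y_onehot.shape[1])
    for _ in range(n_iters):
        error = softmax(X @ weights + bias) - y_onehot
        weights -= learning_rate * (X.T @ error) / X.shape[0]
        bias -= learning_rate * error.mean(axis=0)
    return weights, bias

def accuracy(X, labels, weights, bias):
    return np.mean(np.argmax(X @ weights + bias, axis=1) == labels) * 100

# Synthetic stand-in data (hypothetical): three Gaussian blobs, one per class.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0, 0.0], [3.0, 3.0, 0.0], [0.0, 3.0, 3.0]])
labels = rng.integers(0, 3, size=300)
X = centers[labels] + rng.normal(size=(300, 3))
Y = np.eye(3)[labels]

# Shuffle once, then split 60/20/20 into train / validation / test indices.
order = rng.permutation(300)
train_idx, val_idx, test_idx = order[:180], order[180:240], order[240:]

# Choose the learning rate using validation accuracy only.
candidates = [0.01, 0.03, 0.05]
best_lr, best_val_acc = None, -1.0
for lr in candidates:
    w, b = fit(X[train_idx], Y[train_idx], lr, n_iters=500)
    val_acc = accuracy(X[val_idx], labels[val_idx], w, b)
    if val_acc > best_val_acc:
        best_lr, best_val_acc = lr, val_acc

# The test split is used once, after the learning rate has been fixed.
w, b = fit(X[train_idx], Y[train_idx], best_lr, n_iters=500)
test_acc = accuracy(X[test_idx], labels[test_idx], w, b)
print(f"best lr={best_lr}, validation={best_val_acc:.2f}%, test={test_acc:.2f}%")
```

With `train_test_split`, the same effect can be obtained by calling it twice: once to set aside the test split and once more on the remainder to carve out the validation split.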

Training the multinomial logistic regression on the training file and calculating accuracy on the separate test file

In [81]:

# Uses the softmax, fit, and predict functions defined in the cells above

# Load training data
file_path_train = "C:/Users/DELL/Downloads/data1_train.csv"
df_train = pd.read_csv(file_path_train)

X_train = df_train[['Feature_1', 'Feature_2', 'Feature_3']].values
y_train = df_train['Target'].values

# Load test data
file_path_test = "C:/Users/DELL/Downloads/data1_test.csv"
df_test = pd.read_csv(file_path_test)

X_test = df_test[['Feature_1', 'Feature_2', 'Feature_3']].values
y_test = df_test['Target'].values

# One-hot encode the training target (fit expects one column per class)
y_train_onehot = np.zeros((y_train.size, y_train.max() + 1))
y_train_onehot[np.arange(y_train.size), y_train] = 1

# Standardize features using statistics from the training set only
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Hyperparameters for the custom implementation
learning_rate = 0.01
n_iters = 10000

# Fit the model
weights, bias = fit(X_train, y_train_onehot, learning_rate, n_iters)

# Make predictions on the test set; predict returns integer class labels,
# so they can be compared with y_test directly
y_test_pred = predict(X_test, weights, bias)

# Calculate accuracy
accuracy = np.mean(y_test_pred == y_test) * 100
print(f"Custom Logistic Regression Accuracy: {accuracy:.2f}%")


Iteration 0: Loss = 1.098612288668107
Iteration 1000: Loss = 0.19947257359384912
Iteration 2000: Loss = 0.14323139422245404
Iteration 3000: Loss = 0.12134719846601814
Iteration 4000: Loss = 0.10932632763466288
Iteration 5000: Loss = 0.1016084294723415
Iteration 6000: Loss = 0.09618614796911679
Iteration 7000: Loss = 0.09214558276939476
Iteration 8000: Loss = 0.08900677098411297
Iteration 9000: Loss = 0.08649177222575251
Custom Logistic Regression Accuracy: 98.00%


Training the multinomial logistic regression with scikit-learn on the same train and test data and calculating accuracy

In [82]:
model = LogisticRegression(multi_class='multinomial', solver='lbfgs', max_iter=10000)
model.fit(X_train, y_train)
sklearn_accuracy = model.score(X_test, y_test) * 100
print(f"Sklearn Logistic Regression Accuracy: {sklearn_accuracy:.2f}%")


Sklearn Logistic Regression Accuracy: 98.00%


