## Fundamental Principles, Assumptions, and Equations Involved

A neural network is a computational model inspired by the way biological neural networks in the human brain process information. It consists of interconnected nodes (neurons) organized in layers: an input layer, one or more hidden layers, and an output layer. The fundamental principles of a neural network involve the following components and steps:

- *Neurons and Layers*:
  - Each neuron receives one or more inputs, processes them, and passes the output to the next layer.
  - The neurons are organized in layers: input layer, hidden layers, and output layer.
  
- *Weights and Biases*:
  - Each connection between neurons has a weight that determines the strength and direction of the connection.
  - Each neuron also has an associated bias that shifts the activation function.
  
- *Activation Function*:
  - The activation function introduces non-linearity into the model. Common activation functions include the sigmoid, tanh, and ReLU (Rectified Linear Unit).
  
- *Forward Propagation*:
  - In forward propagation, inputs are passed through the network layer by layer to generate the output.
  
- *Loss Function*:
  - The loss function quantifies the difference between the predicted output and the true output. For binary classification, the cross-entropy loss function is commonly used.
  
- *Backpropagation and Gradient Descent*:
  - Backpropagation calculates the gradient of the loss function with respect to each weight by the chain rule, layer by layer backward from the output layer to the input layer.
  - Gradient descent updates the weights to minimize the loss function.

## Mathematical Equations

### Forward Propagation

For a neural network with one hidden layer:

$$
\mathbf{z}^{(1)} = \mathbf{X} \mathbf{W}^{(1)} + \mathbf{b}^{(1)}
$$

$$
\mathbf{a}^{(1)} = \sigma(\mathbf{z}^{(1)})
$$

$$
\mathbf{z}^{(2)} = \mathbf{a}^{(1)} \mathbf{W}^{(2)} + \mathbf{b}^{(2)}
$$

$$
\mathbf{a}^{(2)} = \sigma(\mathbf{z}^{(2)})
$$

where:
- $\mathbf{X}$ is the input matrix.
- $\mathbf{W}^{(1)}$ and $\mathbf{W}^{(2)}$ are weight matrices for the hidden and output layers, respectively.
- $\mathbf{b}^{(1)}$ and $\mathbf{b}^{(2)}$ are bias vectors for the hidden and output layers, respectively.
- $\sigma$ is the activation function (e.g., sigmoid function).

### Cross-Entropy Loss

$$
L(\mathbf{y}, \mathbf{\hat{y}}) = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]
$$

where:
- $\mathbf{y}$ is the true label.
- $\mathbf{\hat{y}}$ is the predicted label.
- $N$ is the number of samples.

### Backpropagation

$$
\delta^{(2)} = \mathbf{a}^{(2)} - \mathbf{y}
$$

$$
\frac{\partial L}{\partial \mathbf{W}^{(2)}} = \mathbf{a}^{(1)T} \delta^{(2)}
$$

$$
\frac{\partial L}{\partial \mathbf{b}^{(2)}} = \sum \delta^{(2)}
$$

$$
\delta^{(1)} = (\delta^{(2)} \mathbf{W}^{(2)}) \sigma'(\mathbf{z}^{(1)})
$$

$$
\frac{\partial L}{\partial \mathbf{W}^{(1)}} = \mathbf{X}^T \delta^{(1)}
$$

$$
\frac{\partial L}{\partial \mathbf{b}^{(1)}} = \sum \delta^{(1)}
$$

## How the Model Learns from Data and Makes Predictions

1. *Initialization*:
   - Initialize weights and biases randomly or using a specific initialization method.

2. *Forward Pass*:
   - Pass input data through the network to get predictions.

3. *Compute Loss*:
   - Calculate the loss using the cross-entropy loss function.

4. *Backward Pass*:
   - Perform backpropagation to compute gradients of the loss with respect to weights and biases.

5. *Update Weights*:
   - Update weights and biases using gradient descent or other optimization algorithms.

6. *Repeat*:
   - Repeat steps 2-5 for a number of epochs or until convergence.


Importing important libraries

In [204]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


Reading the data

In [205]:
file_path="C:/Users/DELL/Downloads/data2_train.csv"
df=pd.read_csv(file_path)

In [206]:
df.shape

(800, 3)

In [207]:
df.head()

Unnamed: 0,Feature_1,Feature_2,Target
0,8.160646,88.799326,0
1,31.149536,102.335826,0
2,13.103383,92.902908,0
3,15.950445,77.412565,0
4,35.856965,94.44155,0


In [208]:
# Check for class balance
print(df['Target'].value_counts())

Target
0    419
1    381
Name: count, dtype: int64


Function to define sigmoid function

In [209]:
def sigmoid(x):
    return 1/(1+np.exp(-x))

Function to implement Logistic Regression

In [210]:
def fit(X, Y, lr, n_iters=10000):
    n_samples, n_features=X.shape
    weights=np.zeros(n_features)
    bias=0

    for _ in range(n_iters):
        linear_pred=np.dot(X, weights) + bias
        predictions=sigmoid(linear_pred)

        dw= (1/n_samples) * np.dot(X.T, (predictions-Y))
        db= (1/n_samples) * np.sum(predictions-Y)

        weights = weights-lr*dw
        bias = bias-lr*db
    return weights, bias

Function to predict the labels

In [211]:
def predict(X, weights, bias):
    linear_pred=np.dot(X, weights) + bias
    y_pred=sigmoid(linear_pred)
    class_pred=[0 if y<=0.5 else 1 for y in y_pred]
    return class_pred

In [212]:
X = df[['Feature_1', 'Feature_2']].values
Y = df[['Target']].values.flatten()
X_train, X_test, Y_train, Y_test= train_test_split(X,Y, test_size=0.2, random_state=1234)



Standardize features

In [213]:
scaler = StandardScaler()
X = scaler.fit_transform(X)


Tuning of hyperparameter (learning rate) and training the data

In [214]:
learning_rate=[0.01,0.03,0.05]
max_acc=0
for lr in learning_rate:
    weights, bias=fit(X_train, Y_train,lr)
    y_pred=predict(X_test, weights, bias)
    accuracy=np.sum(y_pred==Y_test)/len(Y_test)
    if max_acc<accuracy:
        max_acc=accuracy
        final_lr=lr
    print(f"accuracy when learning_rate={lr} is: {accuracy}")


accuracy when learning_rate=0.01 is: 0.975
accuracy when learning_rate=0.03 is: 0.975
accuracy when learning_rate=0.05 is: 0.9


Implementing the Logistic Regression on test data and train data and calculating accuracy

In [215]:
file_path="C:/Users/DELL/Downloads/data2_train.csv"
df=pd.read_csv(file_path)

X_train = df[['Feature_1', 'Feature_2']].values
Y_train = df[['Target']].values.flatten()

file_path="C:/Users/DELL/Downloads/data2_test.csv"
df_test=pd.read_csv(file_path)

X_test = df_test[['Feature_1', 'Feature_2']].values
Y_test = df_test[['Target']].values.flatten()

# Standardize features
scaler = StandardScaler()
X = scaler.fit_transform(X)

weights, bias=fit(X_train, Y_train,final_lr)
y_pred=predict(X_test, weights, bias)

df2=pd.DataFrame(X_test, columns=['Feature_1', 'Feature_2'])
df2['Target']=Y_test
df2['Predicted']=y_pred

accuracy=np.sum(y_pred==Y_test)/len(Y_test)
print (accuracy)

0.93


Implementing the Logistic Regression using scikit learn on test data and train data and calculating accuracy

In [216]:
from sklearn.linear_model import LogisticRegression


# Standardize features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

model = LogisticRegression(solver='lbfgs', max_iter=10000)
model.fit(X_train, Y_train)

# Make predictions on the test set
y_test_pred = model.predict(X_test)

# Compute accuracy
accuracy=np.sum(y_pred==Y_test)/len(Y_test)
print(f"Accuracy: {accuracy:.2f}")

Accuracy: 0.93
