Problem Statement: 
    
    You have a synthetic dataset containing information about individuals, including features like age, BMI, blood pressure, and more. The target variable is 'Outcome', which indicates whether the individual has diabetes (1) or not (0). You have to develop a basic neural network from scratch to perform binary classification. 

Tasks to Perform:

    * Load the synthetic diabetes dataset.
    * Split the dataset into two parts: a training set and a testing set.
    * Create a function to initialize the neural network parameters (weights and biases).
    * Define the sigmoid activation function.
    * Implement the forward propagation step.
    * Train the neural network: develop the training logic that updates its parameters.
    * Use the trained neural network to make predictions on the test set (X_test).
    * Calculate the accuracy and generate a classification report that includes precision, recall, F1-score.

Goals:

    * Ensure you have a CSV file named 'synthetic_diabetes_dataset.csv' in the same directory as your Python script, containing your dataset.
    * Import necessary libraries: Pandas, NumPy, scikit-learn functions for data splitting and evaluation.
    * Load the dataset.
    * Split data into features (X) and target (y).
    * Further split into training and testing sets (80% training, 20% testing) using train_test_split.
    * Initialize parameters (weights and biases) using initialize_parameters.
    * The sigmoid activation function is ready to use as sigmoid(z).
    * The predict function calculates predictions based on parameters and input data.
    * Update parameters iteratively to minimize the chosen loss function.
    * After training (or using pre-trained parameters), predict on the test set (X_test) using the predict function.
    * Calculate accuracy with accuracy_score by comparing y_test and y_pred.
    * Generate a classification report using classification_report for precision, recall, F1-score, and support metrics.
    * By following these concise steps, you will execute the code and obtain the output, including model accuracy and a classification report for your diabetes prediction model.

Given: Our dataset has a total of 8 columns, 1 of which (Outcome) is removed to serve as our y target. The neural network therefore needs 7 input nodes and 1 output node.

Need to create a perceptron class with methods to: 
1. Initialize random weights and biases
2. Take inputs: a list of 7 feature values and 1 'ground truth' outcome
3. Pass the 7 feature values into the network, while keeping the ground truth hidden
4. Make a prediction by computing a weighted sum of those 7 inputs with their respective weights, plus the bias
5. Pass the weighted sum to our activation function. AF --> sigmoid: 1 / (1 + e^(-x)) --> maps x to values in the open interval (0, 1)
6. Calculate the error by comparing the output with the ground truth: yTrue - yPred --> (ground truth of 1 - predict 1 : 0 error), (ground truth of 1 - predict 0 : error of 1)
7. Perform back-propagation by tweaking our hidden layer's weights based on the error (Wi = Wi + (Learning Rate) * (Error * Xi))
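
Steps 4 to 7 can be sketched for a single data point as follows. This is only a minimal illustration, not the class we build below: the sample values, the learning rate of 0.1, and the random generator seed are all made up for the example, and the bias is updated as if it were a weight on a constant input of 1.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real z into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical, already-scaled sample with 7 feature values and its label.
x = np.array([0.2, -1.0, 0.5, 0.0, 1.5, -0.3, 0.8])
y_true = 1

rng = np.random.default_rng(0)
weights = rng.random(7)   # one weight per input feature
bias = rng.random()
learning_rate = 0.1       # assumed value, for illustration only

z = np.dot(weights, x) + bias   # step 4: weighted sum
y_pred = sigmoid(z)             # step 5: activation
error = y_true - y_pred         # step 6: error

# Step 7: Wi = Wi + learning_rate * error * Xi (the bias uses Xi = 1).
new_weights = weights + learning_rate * error * x
new_bias = bias + learning_rate * error

new_pred = sigmoid(np.dot(new_weights, x) + new_bias)
print(f"prediction before update: {y_pred:.4f}, after update: {new_pred:.4f}")
```

Because the ground truth is 1 and the error is positive, the update nudges the prediction upward, toward the label.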



In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score, classification_report


data = pd.read_csv('synthetic_diabetes_dataset.csv')

#Split the data into features (X) and the target variable (y)
y = data['Outcome']
X = data.drop('Outcome', axis=1)


EDA

In [2]:
X = X.astype('float32') 
X['Pregnancies'] = X['Pregnancies'].astype('int8') 

X.dtypes
X

           Age        BMI  BloodPressure     Glucose     Insulin  Pregnancies  PedigreeFunction
0    67.640526  27.223850      97.006187  123.899101   98.400475            0          0.372299
1    54.001572  28.569895      94.320450  108.530838  135.311005            2          0.183971
2    59.787380  23.310741     120.692024   98.282692   88.132729            2          0.848308
3    72.408936  25.418856     105.624382  103.774452  114.093430            2          0.755758
4    68.675583  25.912212     118.787827   81.837166   30.000000            2          1.378158
...        ...        ...            ...         ...         ...          ...               ...
995  54.128708  25.391003     151.187653   92.441650   76.958771            4          1.663665
996  48.016010  30.606094     106.388008  126.693390  102.399971            3          1.997590
997  50.941921  25.633736     117.113937   99.541412  107.875458            5          0.734910
998  38.523891  20.432394     101.812263  123.656326   86.882538            3          1.021854


In [3]:
y = y.astype('int8') 
y

0      0
1      0
2      0
3      0
4      0
      ..
995    0
996    0
997    0
998    1
999    0
Name: Outcome, Length: 1000, dtype: int8

In [4]:
#Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size = 0.8)

In [5]:
X_train.shape

(800, 7)
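
The features sit on very different scales (Insulin and BloodPressure are around 100, PedigreeFunction is around 1) and the Outcome classes are imbalanced, so one option is to stratify the split and standardize the features before training. The sketch below is an assumption about how that could look, not something this notebook does: `X_demo` and `y_demo` are hypothetical stand-ins generated at random, `random_state=0` is an arbitrary choice, and `StandardScaler` is a scikit-learn helper that is not used anywhere else in the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the notebook's X and y (the real CSV is not read here).
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    "Age": rng.normal(50, 10, 1000),
    "Glucose": rng.normal(100, 20, 1000),
    "Insulin": rng.normal(100, 30, 1000),
})
y_demo = pd.Series((rng.random(1000) < 0.15).astype("int8"), name="Outcome")

# stratify keeps the share of positive cases roughly equal in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, train_size=0.8, stratify=y_demo, random_state=0
)

# Fit the scaler on the training split only, then apply it to both splits.
scaler = StandardScaler().fit(X_tr)
X_tr_scaled = pd.DataFrame(scaler.transform(X_tr), columns=X_tr.columns, index=X_tr.index)
X_te_scaled = pd.DataFrame(scaler.transform(X_te), columns=X_te.columns, index=X_te.index)

print(X_tr_scaled.shape, X_te_scaled.shape)
print(f"positive share: train={y_tr.mean():.3f}, test={y_te.mean():.3f}")
```

Fitting the scaler on the training split alone keeps information from the test rows out of the preprocessing step.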

In [None]:
class NeuralNetBinaryClassification:
    
    def __init__(self, n_input_nodes, n_hidden_layers=1, seed=0, debug=False):
        
        self.weights = {}
        self.bias = []
        
        self.n_input_nodes = n_input_nodes
        self.n_hidden_layers = n_hidden_layers
        
        self.seed = seed
        self.debug = debug
        
        
    def initialize_parameters(self):
        # Set the seed for repeatability
        np.random.seed(self.seed)
        
        # One random bias in [0, 1) per hidden layer
        self.bias = [np.random.random_sample(size=self.n_hidden_layers)]
        
        # Populate self.weights with one key per input node (X1, X2, ... Xn).
        # Each key holds one randomly generated weight in [0, 1) per hidden layer.
        for i in range(1, self.n_input_nodes+1, 1):
            self.weights[f'X{i}'] = [list()] * self.n_hidden_layers 
            
            for j in range(len(self.weights[f'X{i}'])):
                self.weights[f'X{i}'][j] = np.random.random_sample(size=1)
    
    
    def train(self, x_train_input, y_train_output, number_epochs):
        #Epochs is the number of times to train on the training data
        for epoch in range(number_epochs):
            
            #if(self.debug): 
            print(f'Epoch #: {epoch+1}')
            
            
            output_list = []
            
            #Train on the training data
            for data_point in range(x_train_input.shape[0]):
                
                Input = x_train_input.iloc[data_point, :]

                output = self.feed_forward(Input)
                output_list.append(output)
                
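                # Calculate the error
                error = y_train_output.iloc[data_point] - output
                
                # Pass through sigmoid.
                # NOTE: this adjustment is only printed in debug mode; it is never
                # applied to self.weights or self.bias, so the parameters do not change.
                adjustment = self.sigmoidFunction(Input.T * error * output)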
                
                if(self.debug): 
                    print(f'Input: \n{Input}')
                    print(f'Output: \n{output}')
                    print(f'Error: \n{error}')
                    print(f'\n Adjustment:\n{adjustment} \n')
                
            self.loss_MSE(z_True=y_train_output, z_Pred=output_list)
            
    
    
    # Define sigmoid activation function
    def sigmoidFunction(self, Z):
        return (1 / (1 + np.exp(-Z)))
    
    
    def sigmoidFunctionDerivative(self, Z):
        return np.exp(-Z)/((1 + np.exp(-Z))**2)
        
        
    def feed_forward(self, X):

        A0 = X.T
        pass_num = 0
        while pass_num < self.n_hidden_layers:
            
            # Update / instantiate initial parameters
            temp_bias = float()
            if(self.debug): 
                print(pass_num)
                print(self.bias[0][pass_num])  
            
            
            temp_bias = self.bias[0][pass_num]
            temp_weights = np.empty(shape = self.n_input_nodes)
            for i in range(1, self.n_input_nodes+1, 1):
                temp_weights[i-1] = self.weights[f'X{i}'][pass_num]
            
            
            # Forward propagation
            if(self.debug): 
                print("\nTemp Weights: ", temp_weights)
                print('\n Forward propagation \n')
                
            Z = np.dot(temp_weights, A0) + temp_bias
            A0 = self.sigmoidFunction(Z)
            
            pass_num +=1
        
        return A0
    
    def loss_MSE(self, z_True, z_Pred):
        front = 1/z_True.shape[0]
        
        if(self.debug): 
            print(z_True.shape)
            print(len(z_Pred))
        
        summation = 0
        for i in range(z_True.shape[0]):
            summation += ((z_True.iloc[i] - z_Pred[i])**2)
        
        ret_val = summation * front  # MSE = (1/n) * sum of squared errors
        print(f'Loss Using MSE: {ret_val}')
            
        return ret_val
        
    def predict(self, input_X_test, input_y_test):
        
        pred_A0 = input_X_test.T
        pass_num = 0
        while pass_num < self.n_hidden_layers:
            
            # Update / instantiate initial parameters
            pred_bias = float()
            if(self.debug): 
                print(pass_num)
                print(self.bias[0][pass_num])  
            
            
            pred_bias = self.bias[0][pass_num]
            pred_weights = np.empty(shape = self.n_input_nodes)
            
            for i in range(1, self.n_input_nodes+1, 1):
                pred_weights[i-1] = self.weights[f'X{i}'][pass_num]
            
            
            # Forward push
            if(self.debug): 
                print("\nPred Weights: ", pred_weights)
                print('\n Forward push \n')
                
            Z = np.dot(pred_weights, pred_A0) + pred_bias
            pred_A0 = self.sigmoidFunction(Z)
            
            pass_num +=1
        
        return pred_A0
        
        
    def backProp(self):
        pass
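
The backProp method is left as a stub, and train never writes back to the weights. As a rough sketch of what the missing update could look like for a single sigmoid unit, the function below applies the rule Wi = Wi + (Learning Rate) * (Error * Xi), averaged over all rows in each epoch. Everything here is an assumption made for illustration: `train_single_unit` and `mse` are hypothetical names, the data is random and already scaled, and the learning rate and epoch count are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_single_unit(X, y, learning_rate=0.1, n_epochs=200, seed=0):
    """Fit one sigmoid unit with the averaged update w += lr * X^T (y - y_pred) / n."""
    rng = np.random.default_rng(seed)
    weights = rng.random(X.shape[1])   # one weight per input feature
    bias = rng.random()
    for _ in range(n_epochs):
        y_pred = sigmoid(X @ weights + bias)            # forward pass for all rows
        error = y - y_pred                              # yTrue - yPred
        weights += learning_rate * (X.T @ error) / len(y)
        bias += learning_rate * error.mean()
    return weights, bias

def mse(y_true, y_pred):
    """Mean squared error: (1/n) * sum of squared differences."""
    return np.mean((y_true - y_pred) ** 2)

# Hypothetical, already-scaled data with 7 features and a simple linear rule.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 7))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(float)

# n_epochs=0 returns the untouched initial parameters for the same seed.
w_init, b_init = train_single_unit(X_demo, y_demo, n_epochs=0)
w_fit, b_fit = train_single_unit(X_demo, y_demo, n_epochs=200)

loss_before = mse(y_demo, sigmoid(X_demo @ w_init + b_init))
loss_after = mse(y_demo, sigmoid(X_demo @ w_fit + b_fit))
print(f"MSE before training: {loss_before:.4f}, after training: {loss_after:.4f}")
```

Working on whole arrays instead of one row at a time also avoids the per-row dictionary lookups that feed_forward performs.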
    

In [9]:
nnbc = NeuralNetBinaryClassification(n_input_nodes=7, n_hidden_layers=1, seed=0, debug=True)
nnbc.initialize_parameters()
print(nnbc.weights, nnbc.bias)

nnbc.train(x_train_input = X_train, y_train_output = y_train, number_epochs=1)

{'X1': [array([0.71518937])], 'X2': [array([0.60276338])], 'X3': [array([0.54488318])], 'X4': [array([0.4236548])], 'X5': [array([0.64589411])], 'X6': [array([0.43758721])], 'X7': [array([0.891773])]} [array([0.5488135])]
Epoch #: 1
0
0.5488135039273248

Temp Weights:  [0.71518937 0.60276338 0.54488318 0.4236548  0.64589411 0.43758721
 0.891773  ]

 Forward propagation 

Input: 
Age                  43.475914
BMI                  26.690321
BloodPressure       127.449226
Glucose              86.003830
Insulin              99.631805
Pregnancies           0.000000
PedigreeFunction      0.864138
Name: 274, dtype: float32
Output: 
1.0
Error: 
-1.0

 Adjustment:
Age                 1.314167e-19
BMI                 2.561774e-12
BloodPressure       0.000000e+00
Glucose             4.456678e-38
Insulin             0.000000e+00
Pregnancies         5.000000e-01
PedigreeFunction    2.964756e-01
Name: 274, dtype: float32 

0
0.5488135039273248

Temp Weights:  [0.71518937 0.60276338 0.54488318 0.423

  result = getattr(ufunc, method)(*inputs, **kwargs)
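
The debug output already hints at why every prediction comes out as 1.0: with unscaled features, the weighted sum is so large that the sigmoid saturates. The check below simply recomputes the first forward pass from the weights, bias, and input row printed above. The last line reproduces the PedigreeFunction entry of the printed adjustment, which suggests that the adjustment is just the sigmoid of the negated input (error is -1.0 and output is 1.0).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Weights, bias, and the first training row, copied from the debug output above.
weights = np.array([0.71518937, 0.60276338, 0.54488318, 0.4236548,
                    0.64589411, 0.43758721, 0.891773])
bias = 0.5488135039273248
x = np.array([43.475914, 26.690321, 127.449226, 86.003830,
              99.631805, 0.0, 0.864138])

z = np.dot(weights, x) + bias
print(z)            # a large positive pre-activation
print(sigmoid(z))   # indistinguishable from 1.0 in float64

# Adjustment for PedigreeFunction: sigmoid(x * error * output) with error=-1, output=1.
print(sigmoid(x[6] * -1.0 * 1.0))
```

The truncated warning line at the end of the output is presumably the tail of an overflow warning from np.exp, since the float32 inputs near 100 and above push exp past the float32 range.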


In [10]:
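# Predict on the test set
y_prob = nnbc.predict(X_test, y_test)
print(y_prob)

# predict returns sigmoid activations, so threshold them at 0.5 to get class labels
y_pred = (y_prob >= 0.5).astype(int)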

accuracy = accuracy_score(y_true=y_test, y_pred=y_pred)
report = classification_report(y_true=y_test, y_pred=y_pred)

print(f'Accuracy: {accuracy * 100:.2f}%')
print(report)

0
0.5488135039273248

Pred Weights:  [0.71518937 0.60276338 0.54488318 0.4236548  0.64589411 0.43758721
 0.891773  ]

 Forward push 

[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1.]
Accuracy: 14.50%
              precision    recall  f1-score   support

           0       0.00      0.00      0.00       171
           1       0.14      1.00      0.25        29

    accuracy                           0.14       200
   macro avg       0.

  _warn_prf(average, modifier, msg_start, len(result))
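
The `_warn_prf` lines come from scikit-learn warning that precision is undefined for class 0, because the model never predicts it. The sketch below rebuilds the same situation from the class counts in the report (171 negatives, 29 positives) rather than from the real test set, so the arrays are hypothetical. It passes `zero_division=0` to silence the warning and compares the result with a baseline that always predicts 0.

```python
import numpy as np
from sklearn.metrics import accuracy_score, classification_report

# Hypothetical labels mirroring the class counts in the report above.
y_true = np.array([0] * 171 + [1] * 29)

# Saturated sigmoid outputs: every activation is 1.0, as in the printed array.
y_prob = np.ones(200)
y_pred = (y_prob >= 0.5).astype(int)   # threshold activations into class labels

model_acc = accuracy_score(y_true, y_pred)
baseline_acc = accuracy_score(y_true, np.zeros(200, dtype=int))  # always predict 0

print(f"All-ones model accuracy: {model_acc * 100:.2f}%")
print(f"Always-0 baseline accuracy: {baseline_acc * 100:.2f}%")

# zero_division=0 reports undefined precision as 0.0 instead of raising a warning.
print(classification_report(y_true, y_pred, zero_division=0))
```

With classes this imbalanced, accuracy alone says little: a model that never predicts diabetes would score far higher than the untrained network while finding no positive cases at all, which is why the per-class recall and F1-score matter here.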
