# Shallow Neural Network Implementation

## Importing Required Libraries

First, import the necessary libraries. Note that in this assignment, you are only allowed to use the libraries provided in the notebook.

In [37]:
import numpy as np
import pandas as pd

## Dataset

In this exercise, we will use the simple yet famous **Pima Indians Diabetes** dataset. It contains records of **768 Native American women** from the Pima tribe, collected to study the risk factors for developing type 2 diabetes. The features, listed in the table below, include the number of pregnancies, blood glucose level, blood pressure, skin thickness, insulin level, body mass index, family history of diabetes, and age.

<center>
<div style="line-height:200%; font-size:medium">
    
| Column | Description |
|:------:|:-----------:|
|Pregnancies|Number of pregnancies|
|Glucose|Blood glucose level (mg/dL)|
|BloodPressure|Diastolic blood pressure (mmHg)|
|SkinThickness|Triceps skinfold thickness (mm)|
|Insulin|Blood insulin level (μU/mL)|
|BMI|Body mass index (kg/m²)|
|DiabetesPedigreeFunction|Function representing family history of diabetes|
|Age|Age of the woman (years)|
|Outcome|Non-diabetic (0) or diabetic (1)|

</div>
</center>

### Reading the Dataset

First, you need to read the dataset file. You can read the training data from the file `diabetes_train.csv` located in the `data` folder and use the samples in it to train the model. The model's performance will be evaluated on `diabetes_test.csv`, which has the same structure as the training data except that the `Outcome` column is removed.

In [38]:
train_data = pd.read_csv('./data/diabetes_train.csv')
test_data = pd.read_csv('./data/diabetes_test.csv')
# test_data.head()
train_data.head()

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1


## Preprocessing and Feature Engineering

First, store the target column (`Outcome`) in a separate variable, then remove it from the `train_data` DataFrame. This separates the data into the feature matrix $X$ and the target vector $y$.


In [39]:
train_data_outcome = train_data['Outcome'].copy()
train_data = train_data.drop('Outcome', axis=1)

One of the crucial preprocessing steps is feature scaling. Here we use standardization (also called z-score normalization), which shifts and rescales each feature without changing the shape of its distribution. Putting all features on a comparable scale helps reduce large weight fluctuations and accelerates model convergence. In this assignment, you should normalize each feature so that its mean is `0` and its variance is `1`. This can be done using the following formula:

For a data series `X = [x_1, x_2, ..., x_n]`, subtract the mean ($\bar{x}$) from each data sample (`x_i`) and divide by the standard deviation ($\sigma$) to obtain the normalized data series.

$$ z_i = \frac{x_i - \bar{x}}{\sigma} $$

**Note:** Since we only have access to the training data when building the model, use the mean and standard deviation from the training samples to normalize the test samples as well.


In [40]:
for column in train_data.columns:
  mean = train_data[column].mean()
  std = train_data[column].std()
  train_data[column] = (train_data[column] - mean) / std
  test_data[column] = (test_data[column] - mean) / std
  
train_data.head()

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age
0,0.649833,0.854539,0.166518,0.90088,-0.687695,0.222281,0.438405,1.443781
1,-0.835754,-1.096441,-0.140758,0.526362,-0.687695,-0.672046,-0.370035,-0.178571
2,1.244068,1.938416,-0.243184,-1.283807,-0.687695,-1.093658,0.570216,-0.093184
3,-0.835754,-0.972569,-0.140758,0.151844,0.123855,-0.480405,-0.908995,-1.032441
4,-1.132872,0.513891,-1.47229,0.90088,0.762734,1.436011,5.303692,-0.007797
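
The normalization step can be checked on a toy example. The sketch below uses made-up numbers and plain NumPy rather than the assignment's DataFrames; `ddof=1` is passed so that the standard deviation matches the default of pandas' `.std()`, which is what the cell above relies on.

```python
import numpy as np

# Made-up data: rows are samples, columns are features.
train = np.array([[1.0, 10.0],
                  [2.0, 20.0],
                  [3.0, 60.0]])
test = np.array([[2.0, 30.0]])

# Statistics are computed from the training samples only.
mean = train.mean(axis=0)
std = train.std(axis=0, ddof=1)  # sample standard deviation, as in pandas

train_z = (train - mean) / std
test_z = (test - mean) / std     # reuse the training mean and std

print(train_z.mean(axis=0))          # approximately 0 for every feature
print(train_z.std(axis=0, ddof=1))   # approximately 1 for every feature
print(test_z)
```

Because the test samples are scaled with the training statistics, their mean and variance will generally not be exactly `0` and `1`; that is expected.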


Next, add a bias term to the DataFrame. To do this, add a column named `bias` in which every value is `1`. The code below appends it as the last column of both the training and test data.


In [41]:
train_bias = pd.Series(1 , index=train_data.index)
train_data['bias'] = train_bias

test_bias = pd.Series(1 , index=test_data.index)
test_data['bias'] = test_bias

test_data.head()

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,bias
0,0.649833,-0.693858,-0.55046,0.776041,0.952672,0.273386,-0.138634,0.846073,1
1,1.541186,1.040346,0.473794,0.588782,0.175656,-0.122674,-0.917783,1.016847,1
2,0.649833,1.380993,-0.038333,0.339103,0.762734,0.222281,0.450122,1.358395,1
3,-0.835754,-0.66289,-0.55046,-0.659611,-0.687695,-0.825359,0.215791,-1.032441,1
4,1.838303,-1.622896,1.907751,0.151844,-0.264653,0.465027,-0.563358,1.187621,1


Before designing and training the model, convert the DataFrames `train_data` and `train_data_outcome` to NumPy arrays. Then use the `train_test_split` function to split the data into training and validation sets, holding out a fraction of `0.2` of the samples for validation.

**Note:** Following the convention from previous lectures, each **row** of the input matrix represents a **feature**, and each **column** represents a **sample**. Therefore, you need to transpose the feature matrix. The same step applies to the target variable (`train_data_outcome`), although transposing a one-dimensional array leaves it unchanged.


In [42]:
from sklearn.model_selection import train_test_split

X = train_data.to_numpy()
y = train_data_outcome.to_numpy().T



X_train, X_validation, y_train, y_validation = train_test_split(X, y, test_size=0.2)
X_train = np.array(X_train).T
X_validation = np.array(X_validation).T
y_train = np.array(y_train)
y_validation = np.array(y_validation)
test_data_numpy = test_data.to_numpy().T

# print(X_train)
# print(y_train)
# print(X_validation)
# print(y_validation)
#test_data_numpy

To ensure the correctness of input and output settings, running the next cell should produce the following output:

```
X_train.shape:(9, 534), y_train.shape:(534,)
X_validation.shape:(9, 134), y_validation.shape:(134,)
test_data_numpy.shape:(9, 100)
```

In [54]:
print(f'X_train.shape:{X_train.shape}, y_train.shape:{y_train.shape}')
print(f'X_validation.shape:{X_validation.shape}, y_validation.shape:{y_validation.shape}')
print(f'test_data_numpy.shape:{test_data_numpy.shape}')

X_train.shape:(9, 534), y_train.shape:(534,)
X_validation.shape:(9, 134), y_validation.shape:(134,)
test_data_numpy.shape:(9, 100)
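
To see where these shapes come from, here is a small sketch with placeholder arrays. It uses a hand-rolled NumPy shuffle as a stand-in for `train_test_split` (this is an illustration, not how that function is implemented), and the sizes `668` and `9` are inferred from the shapes printed above.

```python
import math
import numpy as np

n_samples, n_features = 668, 9            # inferred from the expected shapes
X = np.zeros((n_samples, n_features))     # placeholder feature matrix (rows = samples)
y = np.zeros(n_samples)                   # placeholder targets

# Shuffle the sample indices and hold out 20% for validation.
rng = np.random.default_rng(0)
order = rng.permutation(n_samples)
n_val = math.ceil(0.2 * n_samples)        # rounds 133.6 up to 134
val_idx, train_idx = order[:n_val], order[n_val:]

# Transpose after splitting so that rows are features and columns are samples.
X_train, X_val = X[train_idx].T, X[val_idx].T
y_train, y_val = y[train_idx], y[val_idx]

print(X_train.shape, y_train.shape)   # (9, 534) (534,)
print(X_val.shape, y_val.shape)       # (9, 134) (134,)
```

The split is done along the sample axis first; transposing before splitting would divide the features instead of the samples.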


## Modeling

Now that the data is processed and ready, it's time for the main part: building the model. You are required to implement, from scratch, a simple shallow neural network trained with gradient descent. We will explain each component of this model step by step to guide you through its implementation.

This model is a shallow neural network with one hidden layer containing `1000` neurons. The activation function for this layer is the Rectified Linear Unit (ReLU), which you are familiar with from the activation function lectures. The activation function for the output layer is the sigmoid function. Note that the required formulas for each part are provided below. We have also implemented the two activation functions for you.

```python
sigmoid_Z = 1 / (1 + np.exp(-Z))
```

```python
ReLU_Z = np.maximum(0, Z)
```

**Note:** Only use the NumPy library for mathematical operations and computations, and define your lists as NumPy arrays.

### Reminder: Sigmoid Function

| Sigmoid Function | Derivative of Sigmoid Function |
| :---: | :--: |
| $f(z) = \frac{1}{1 + e^{-z}}$ | $f'(z) = f(z)(1-f(z))$ |


### Reminder: ReLU Activation Function

| ReLU Function  | Derivative of ReLU Function  |
| :---: | :--: |
|$$f(z) = \begin{cases} 0 & \text{if } z < 0 \\ z & \text{if } z \geq 0\end{cases}$$|$$f'(z) = \begin{cases} 0 & \text{if } z < 0 \\ 1 & \text{if } z \geq 0\end{cases}$$|
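
The derivative formulas in both tables can be sanity-checked numerically. The following is a minimal sketch: the function names are illustrative (not part of the assignment's required interface), and the test points deliberately avoid $z = 0$, where the ReLU derivative is only a convention.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative_from_output(a):
    # Takes a = sigmoid(z), i.e. the activation, not z itself.
    return a * (1.0 - a)

def relu(z):
    return np.maximum(0, z)

def relu_derivative(z):
    return np.where(z > 0, 1, 0)

z = np.array([-2.0, -0.5, 0.5, 2.0])
eps = 1e-6

# Central finite differences approximate the true derivatives.
numeric_sigmoid = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
numeric_relu = (relu(z + eps) - relu(z - eps)) / (2 * eps)

print(np.allclose(numeric_sigmoid, sigmoid_derivative_from_output(sigmoid(z))))  # True
print(np.allclose(numeric_relu, relu_derivative(z)))                             # True
```

Note that the sigmoid derivative is written in terms of the activation $f(z)$, so it can be computed from a layer's output without keeping $z$ around.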

### `Model` Class Construction

Create a class named `Model` that contains the following four methods. We will explain each method in detail below.

In [44]:
class Model:
    def __init__(self, input_size, hidden_size, output_size):
        pass

    def predict(self, inputs):
        pass

    def update_weights_for_one_epoch(self, inputs, outputs, learning_rate):
        pass

    def fit(self, inputs, outputs, learning_rate, epochs=64):
        pass

#### `__init__` Method

In the `__init__(self, input_size, hidden_size, output_size)` method, initialize the weights of the hidden and output layers (`w1` and `w2`) randomly with a mean of `0` and a standard deviation of `0.01`. You can use the `np.random.randn` function for this purpose. Because `np.random.randn` draws from a standard normal distribution (mean `0`, standard deviation `1`), you need to scale its output to meet this requirement. To match the formulas used in `predict`, `w1` should have shape `(hidden_size, input_size)` and `w2` should have shape `(output_size, hidden_size)`.
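
As a rough illustration of this scaling, the sketch below draws a weight matrix with `np.random.randn`, multiplies it by `0.01`, and compares it with `np.random.rand`, which samples uniformly from `[0, 1)` and is therefore not interchangeable. The seed and the variable names are chosen only for this example.

```python
import numpy as np

np.random.seed(0)                      # fixed seed, only to make the example repeatable
hidden_size, input_size = 1000, 9

# Standard normal samples scaled to a standard deviation of 0.01.
w1 = np.random.randn(hidden_size, input_size) * 0.01
print(w1.shape)                        # (1000, 9)
print(w1.mean(), w1.std())             # close to 0 and 0.01

# For contrast: uniform samples are never negative and have mean near 0.005.
u = np.random.rand(hidden_size, input_size) * 0.01
print(u.min() >= 0, u.mean())
```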

#### `predict` Method

The `predict(self, inputs)` method takes the inputs and sequentially returns the outputs of both layers (`A_1` and `A_2`). Implement this process according to the following formulas:

$$Z^{[1]}=W^{[1]}X$$
$$A^{[1]}=ReLU(Z^{[1]})$$
$$Z^{[2]}=W^{[2]}A^{[1]}$$
$$A^{[2]}=\sigma(Z^{[2]})=\frac{1}{1+e^{-Z^{[2]}}}=Y_{pred}$$

**Hint:** To multiply two matrices, use the `arr1.dot(arr2)` method. For example, the formula $Z^{[1]}=W^{[1]}X$ corresponds to `W_1.dot(X)` in Python. Alternatively, you can use the `@` operator: `W_1 @ X`.
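
The forward pass described by these formulas can be sketched as a standalone function. This is only an illustration with small, hypothetical layer sizes and a free function named `forward`; in the assignment the same steps belong in the `predict` method and use the instance's `w1` and `w2`.

```python
import numpy as np

def forward(w1, w2, x):
    z1 = w1 @ x                        # hidden layer input,  shape (hidden, samples)
    a1 = np.maximum(0, z1)             # ReLU activation
    z2 = w2 @ a1                       # output layer input,  shape (1, samples)
    a2 = 1.0 / (1.0 + np.exp(-z2))     # sigmoid activation = predicted probability
    return a1, a2

np.random.seed(0)
n_features, n_hidden, n_samples = 9, 4, 5      # hypothetical sizes for the demo
x = np.random.randn(n_features, n_samples)     # columns are samples
w1 = np.random.randn(n_hidden, n_features) * 0.01
w2 = np.random.randn(1, n_hidden) * 0.01

a1, a2 = forward(w1, w2, x)
print(a1.shape, a2.shape)   # (4, 5) (1, 5)
```

Each column of `a2` is the predicted probability for the corresponding column (sample) of `x`.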


#### `update_weights_for_one_epoch` Method

In the `update_weights_for_one_epoch(self, inputs, outputs, learning_rate)` method, update the network's weights for one epoch, that is, one gradient descent step computed over all samples in `inputs`. The `learning_rate` argument corresponds to $\alpha$ in the formulas below, and $n$ is the number of samples. In the next chapter, we will explain in detail how these formulas are derived.

**Weight Update for `w2`:**

$$W^{[2]} = W^{[2]} + \Delta W^{[2]}$$
$$\Delta W^{[2]} = - \alpha \frac{\partial cost}{\partial W^{[2]}}$$
$$\frac{\partial cost}{\partial W^{[2]}} = \left(\frac{-2}{n}(Y_{true}-A^{[2]}) \odot A^{[2]} \odot (1-A^{[2]})\right) \bullet A^{[1]T}$$
$$W^{[2]} = W^{[2]} + \left(\frac{2 \alpha}{n}(Y_{true}-A^{[2]}) \odot A^{[2]} \odot (1-A^{[2]})\right) \bullet A^{[1]T}$$

**Weight Update for `w1`:**

$$W^{[1]} = W^{[1]} + \Delta W^{[1]}$$
$$\Delta W^{[1]} = - \alpha \frac{\partial cost}{\partial W^{[1]}}$$

$$\frac{\partial cost}{\partial W^{[1]}} = \left(\left(\left(\frac{-2}{n}(Y_{true}-A^{[2]}) \odot A^{[2]} \odot (1-A^{[2]})\right)^T \bullet W^{[2]}\right)^T \odot \frac{\partial A^{[1]}}{\partial Z^{[1]}}\right) \bullet X^T$$

$$W^{[1]} = W^{[1]} + \left(\left(\left(\frac{2 \alpha}{n}(Y_{true}-A^{[2]}) \odot A^{[2]} \odot (1-A^{[2]})\right)^T \bullet W^{[2]}\right)^T \odot \frac{\partial A^{[1]}}{\partial Z^{[1]}}\right) \bullet X^T$$

**Note:** The symbol $\odot$ represents element-wise multiplication, and the symbol $\bullet$ represents matrix multiplication.

To obtain the value of $\frac{\partial A^{[1]}}{\partial Z^{[1]}}$, which is the derivative of the ReLU function, use the following code snippet. It produces a matrix of the same size as $Z^{[1]}$, composed of `0` and `1`, where cells corresponding to $Z^{[1]} > 0$ have a value of `1`, and `0` otherwise. Passing `A_1` instead of `Z_1` gives the same result, because the ReLU output is positive exactly where its input is positive. At exactly $Z^{[1]} = 0$ the derivative is a matter of convention; the snippet assigns `0` there.

```python
relu_gradient = np.where(A_1 > 0, 1, 0)
```

**Important:** Part of $\Delta W^{[1]}$ is already computed in $\Delta W^{[2]}$. By storing it, you can avoid redundant calculations.
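
One way to gain confidence in these formulas is a numerical gradient check. The sketch below assumes the cost is the mean squared error $\frac{1}{n}\sum (Y_{true}-A^{[2]})^2$, which is consistent with the $\frac{-2}{n}$ factor above but is not stated explicitly in this notebook. All function names and the small layer sizes are hypothetical.

```python
import numpy as np

def forward(w1, w2, x):
    a1 = np.maximum(0, w1 @ x)
    a2 = 1.0 / (1.0 + np.exp(-(w2 @ a1)))
    return a1, a2

def mse_cost(w1, w2, x, y):
    # Assumed cost: mean squared error between targets and predictions.
    _, a2 = forward(w1, w2, x)
    return np.mean((y - a2) ** 2)

def gradients(w1, w2, x, y):
    n = x.shape[1]
    a1, a2 = forward(w1, w2, x)
    # Term shared by both gradients, shape (1, n); computed once and reused.
    delta2 = (-2.0 / n) * (y - a2) * a2 * (1 - a2)
    grad_w2 = delta2 @ a1.T
    relu_gradient = np.where(a1 > 0, 1, 0)
    grad_w1 = ((delta2.T @ w2).T * relu_gradient) @ x.T
    return grad_w1, grad_w2

def numeric_grad(w, cost_fn, eps=1e-6):
    # Central finite differences, perturbing one weight at a time in place.
    g = np.zeros_like(w)
    for idx in np.ndindex(w.shape):
        old = w[idx]
        w[idx] = old + eps
        up = cost_fn()
        w[idx] = old - eps
        down = cost_fn()
        w[idx] = old
        g[idx] = (up - down) / (2 * eps)
    return g

np.random.seed(0)
x = np.random.randn(3, 6)                 # 3 features, 6 samples
y = np.array([0, 1, 1, 0, 1, 0])
w1 = np.random.randn(4, 3) * 0.5
w2 = np.random.randn(1, 4) * 0.5

grad_w1, grad_w2 = gradients(w1, w2, x, y)
num_w1 = numeric_grad(w1, lambda: mse_cost(w1, w2, x, y))
num_w2 = numeric_grad(w2, lambda: mse_cost(w1, w2, x, y))

print(np.max(np.abs(grad_w1 - num_w1)))   # should be very small
print(np.max(np.abs(grad_w2 - num_w2)))   # should be very small
```

A gradient descent step then subtracts `learning_rate * grad` from each weight matrix. In this sketch both gradients are computed from the same weights before either matrix is changed, which is what the formulas describe.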


#### `fit` Method

The `fit(self, inputs, outputs, learning_rate, epochs=64)` method updates the network's weights for the specified number of epochs (`epochs`). You do not need to make any changes to this method; simply use it in the subsequent steps.


### Model Class Implementation

In [None]:
class Model:

    def __init__(self, input_size , hidden_size , output_size):
        # randn samples a standard normal; scaling by 0.01 gives mean 0 and std 0.01
        self.w1 = np.random.randn(hidden_size, input_size) * 0.01
        self.w2 = np.random.randn(output_size, hidden_size) * 0.01
 
    def predict(self, inputs):
        x = inputs
        
        Z_1 = self.w1.dot(x) # hidden layer input
        A_1 = self.ReLU(Z_1) # hidden layer output

        Z_2 = self.w2.dot(A_1) # output layer input
        A_2 = self.Sigmoid(Z_2) # output layer output - predicted y

        return A_1, A_2

    def update_weights_for_one_epoch(self, inputs, outputs, learning_rate):
        x, y_true = inputs, outputs
        A_1, A_2 = self.predict(inputs)

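        n = x.shape[1]  # number of samples (columns of x)

        # Term shared by both updates: (2 * alpha / n) * (Y_true - A_2) * sigmoid'(Z_2)
        shared_coefficient = (2 * learning_rate / n) * (y_true - A_2) * self.Sigmoid_Derivative(A_2)
        relu_gradient = np.where(A_1 > 0, 1, 0)  # derivative of ReLU

        # Compute both updates from the current weights before changing either one
        dW2 = shared_coefficient.dot(A_1.T)
        dA_1 = self.w2.T.dot(shared_coefficient)
        dZ_1 = dA_1 * relu_gradient
        dW1 = dZ_1.dot(x.T)

        self.w2 += dW2
        self.w1 += dW1
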

    def fit(self, inputs, outputs, learning_rate, epochs=64):
        for i in range(epochs):
            self.update_weights_for_one_epoch(inputs, outputs, learning_rate)
            
    def Sigmoid(self , x):
        return 1 / (1 + np.exp(-x)) 
    
    def Sigmoid_Derivative(self , x):
        return x * (1 - x)    
    
    def ReLU(self, x):
        return np.maximum(0 , x)
    
    def ReLU_Derivative(self, x):
        return np.where(x > 0, 1, 0)

### Training and Evaluation

After designing the network structure, you can create an instance of the `Model()` class and then call the `fit` method with appropriate arguments to start training the model. It is recommended to experiment with different learning rates (such as `0.1`, `0.01`, `0.001`, etc.) and different numbers of training epochs, and compare the results on the validation samples.

To assess the model's accuracy, you can use the `evaluation(model, inputs, outputs)` function.


In [52]:
def evaluation(model, inputs, outputs):
  _, A_2 = model.predict(inputs)
  prediction = (A_2 > 0.5)
  return np.mean(prediction == outputs) * 100

input_size = X_train.shape[0]
hidden_size = pow(10,3)
output_size = 1


model = Model(input_size , hidden_size, output_size)
model.fit(X_train, y_train, learning_rate = 0.1, epochs = 200)

accuracy  = evaluation(model, X_validation, y_validation)
print(f"Your model accuracy on the given set: {round(accuracy , 2)}%") 



Your model accuracy on the given set: 78.36%


## Prediction on Test Data and Output

Finally, you need to compute the model's output for the test samples. First, obtain the model's output on the test data, and then if the model predicts a higher probability that an individual has diabetes (output greater than `0.5`), classify the individual as diabetic; otherwise, classify them as non-diabetic.

Therefore, in the `prediction` variable, which is a NumPy array, you will have `True` and `False` values. Note that this variable will also be evaluated by the grading system.
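
The thresholding step can be illustrated with a few made-up probabilities. The labels `y_true` exist only in this toy example, since the test file has no `Outcome` column.

```python
import numpy as np

# Made-up model outputs with shape (1, number_of_samples).
a2 = np.array([[0.10, 0.75, 0.50, 0.93]])

# Strictly greater than 0.5 counts as diabetic, so 0.50 maps to False.
prediction = a2 > 0.5
print(prediction.dtype, prediction.shape)   # bool (1, 4)

# With known labels, accuracy is the percentage of matching entries.
y_true = np.array([0, 1, 1, 1])
accuracy = np.mean(prediction == y_true) * 100
print(accuracy)   # 75.0
```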


In [51]:
# The test set has no `Outcome` column, so accuracy cannot be computed here.
_, A_2 = model.predict(test_data_numpy)
prediction = A_2 > 0.5  # boolean array: True = diabetic, False = non-diabetic

## Assignment Grading Procedure

The accuracy of your model on the test data, specifically the `prediction` variable, will also be evaluated, with a minimum acceptable accuracy of **65%**.

Additionally, the `test_data` DataFrame will be checked to ensure the correctness of your data normalization process.
