<h2 align="center"><font color='black'>Neural Networks from Scratch, No TF or Pytorch</font></h2>


## Introduction
Neural networks, the fundamental building blocks of artificial intelligence, have transformed the landscape of technology and our daily lives. These powerful algorithms are designed to mimic the human brain's way of learning and decision-making, making them adept at tackling complex tasks and providing accurate predictions.

In this tutorial, we'll delve into the basics of neural networks and their significance. We'll embark on a journey to create a simple 3-layer neural network, inspired by the insightful work of Samson. This tutorial offers a hands-on introduction to the essential steps involved in building a neural network from scratch.

![image](https://media.geeksforgeeks.org/wp-content/cdn-uploads/20230602113310/Neural-Networks-Architecture.png)

This notebook draws inspiration from Samson's insightful work I found online, which includes a valuable video and a [Kaggle notebook](https://www.kaggle.com/code/wwsalmon/simple-mnist-nn-from-scratch-numpy-no-tf-keras) providing deeper insights into the intricacies of neural network development. 

Although you might never go this deep when building models for your own use case, it's always better to know what's going on under the hood. Let's dive in and explore the world of neural networks together.

## About the Dataset

According to the World Health Organization (WHO), stroke is the 2nd leading cause of death globally, responsible for approximately 11% of total deaths. This dataset is used to predict whether a patient is likely to have a stroke based on input parameters such as gender, age, various diseases, and smoking status. Each row in the data provides relevant information about the patient.

**Attribute Information**

1. **id:** Unique identifier
2. **gender:** "Male", "Female", or "Other"
3. **age:** Age of the patient
4. **hypertension:** 0 if the patient doesn't have hypertension, 1 if the patient has hypertension
5. **heart_disease:** 0 if the patient doesn't have any heart diseases, 1 if the patient has a heart disease
6. **ever_married:** "No" or "Yes"
7. **work_type:** "Children", "Govt_job", "Never_worked", "Private", or "Self-employed"
8. **Residence_type:** "Rural" or "Urban"
9. **avg_glucose_level:** Average glucose level in blood
10. **bmi:** Body mass index
11. **smoking_status:** "Formerly smoked", "Never smoked", "Smokes", or "Unknown"*
12. **stroke:** 1 if the patient had a stroke, 0 if not

*Note: "Unknown" in smoking_status indicates that smoking information is unavailable for the patient.

## Imports and Reading Data

- While the deep learning landscape is dominated by powerful frameworks like `TensorFlow` and `PyTorch`, we'll take a different approach. We won't rely on these deep learning libraries; instead, we'll opt for a more hands-on exploration using basic tools.

- Our toolkit for this tutorial primarily includes the versatile `NumPy` library for numerical computations and some other helpful libraries like `sklearn`, `pandas` for data manipulation and `seaborn` and `matplotlib` for visualization. NumPy provides us with an array-like structure that is efficient for performing mathematical operations on large datasets.

In [None]:
import warnings
warnings.filterwarnings("ignore")
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from imblearn.over_sampling import RandomOverSampler
from sklearn.metrics import confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['font.size'] = 14
plt.rcParams['figure.figsize'] = (22, 5)
plt.rcParams['figure.dpi'] = 100

In [None]:
import sqlite3
conn = sqlite3.connect('playground.db')

In [None]:
train = pd.read_csv('/kaggle/input/playground-series-s3e2/train.csv')
train.head()

In [None]:
train.to_sql('data', conn, if_exists='replace', index=False)

## Exploratory Data Analysis

In [None]:
fig, ax = plt.subplots()
N, bins, patches = ax.hist(np.array(train.avg_glucose_level), edgecolor='white', color='lightgray', linewidth=5, alpha=0.7)
# Highlight the second bin
patches[1].set_facecolor('orange')
ax.set_title('Avg Glucose Level Histogram', fontsize=18)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.set_xlabel('avg_glucose_level')
ax.set_ylabel('Count')
# Dashed vertical line marks the mean glucose level
ax.axvline(train.avg_glucose_level.mean(), linestyle='--', lw=2, zorder=1, color='blue')
ax.annotate(' mean', (90, 7500), fontsize=14, color='black')
plt.show()

In [None]:
query = '''
select gender, 
    round(sum(case when stroke=0 then 1 else 0 end)*100.0/(select count(*) from data),2) as No_Stroke,
    round(sum(case when stroke=1 then 1 else 0 end)*100.0/(select count(*) from data),2) as Stroke
from data
group by 1
'''
Gender_stroke = pd.read_sql(query, conn)
Gender_stroke

Observation

- The data shows that **female patients with a stroke make up 2.4% of all records, compared with 1.73% for male patients**. Because the query divides by the total row count, these figures reflect how many females and males are in the dataset as well as their stroke risk; they are not stroke rates within each gender. Meanwhile, the “Other” category has negligible representation, with no recorded instances of stroke.

So What?

- If the gap persists once rates are computed within each gender, it could reflect biological, lifestyle, or healthcare access differences, possibly influenced by factors such as hormonal changes, longer life expectancy, or different symptom recognition and management rates. However, without further context, it’s challenging to determine if this finding is due to inherent biological differences, healthcare biases, or other environmental factors.

Action

- To better understand gender differences in stroke risk, analyze additional demographic and health factors, such as age, lifestyle choices, and comorbid conditions like hypertension and diabetes. Also, exploring healthcare utilization patterns could reveal disparities in treatment or early warning sign recognition between genders. This deeper analysis can lead to more gender-tailored stroke prevention and intervention strategies.
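
The query above divides by the total row count, so its percentages mix group size with risk. The sketch below contrasts that share with a within-group rate on a tiny in-memory table. The table name `demo` and its rows are made up for illustration and are not taken from the dataset.

```python
import sqlite3

# Tiny illustrative table; the rows are made up, not taken from the dataset.
conn_demo = sqlite3.connect(":memory:")
conn_demo.execute("create table demo (gender text, stroke integer)")
rows = [("Female", 0)] * 6 + [("Female", 1)] * 2 + [("Male", 0)] * 1 + [("Male", 1)] * 1
conn_demo.executemany("insert into demo values (?, ?)", rows)

query = """
select gender,
    round(sum(stroke) * 100.0 / (select count(*) from demo), 2) as share_of_all_rows,
    round(avg(stroke) * 100.0, 2) as rate_within_group
from demo
group by gender
order by gender
"""
result = conn_demo.execute(query).fetchall()
for row in result:
    print(row)
```

In this toy table the larger group has the bigger share of all rows but the lower within-group rate, which is why the two measures should not be read interchangeably.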

In [None]:
query = '''
select smoking_status, 
    round(sum(case when stroke=0 then 1 else 0 end)*100.0/(select count(*) from data),2) as No_Stroke,
    round(sum(case when stroke=1 then 1 else 0 end)*100.0/(select count(*) from data),2) as Stroke
from data
group by 1
'''
smoking_status_stroke = pd.read_sql(query, conn)
smoking_status_stroke

Observation

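- **Never-smokers who had a stroke make up 1.68% of all records, which is higher than former smokers (1.04%)** and current smokers (0.71%). At first glance, this seems counterintuitive, as we typically expect smoking to increase stroke risk. Note, however, that these are shares of the whole dataset rather than stroke rates within each smoking category, so the ordering also reflects how many people fall into each category.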

So What?

- This unexpected pattern suggests the influence of other risk factors that may not be captured in the smoking status alone. Factors such as age, lifestyle, diet, and genetic predispositions could be playing a larger role among non-smokers in this dataset, potentially skewing the stroke rates. Additionally, the large “Unknown” category for smoking status (28.98%) might limit the accuracy of the insight, as unreported smoking behavior could be masking true correlations.

Action

- To clarify the relationship, further analysis is needed, controlling for other variables. Conducting a multivariable logistic regression that includes age, lifestyle habits, and medical history could help isolate smoking’s direct impact on stroke risk. Segmenting this analysis by age or other demographic details will also provide more reliable insights, enabling more accurately targeted health recommendations.

In [None]:
query = '''
select work_type, 
    round(sum(case when stroke=0 then 1 else 0 end)*100.0/(select count(*) from data),2) as No_Stroke,
    round(sum(case when stroke=1 then 1 else 0 end)*100.0/(select count(*) from data),2) as Stroke
from data
group by 1
'''
work_stroke = pd.read_sql(query, conn)
work_stroke

Observation

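- **Private-sector employees who had a stroke make up the largest share of all records at 2.64%, followed by self-employed individuals at 1.03% and government employees at 0.45%**. Those who have never worked or are classified as children show minimal to no recorded cases of stroke. Because these percentages are taken over the whole dataset, the ranking also reflects how many people fall into each work type.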

So What?

- The higher stroke incidence in private-sector employees and self-employed individuals could reflect the potential impact of workplace stress, lifestyle factors, or limited access to health benefits often associated with these work types. The comparatively lower stroke rates in government employees might suggest better access to healthcare, stable work conditions, or early health screenings.

Action

- This insight suggests a need for targeted health programs focused on stroke prevention and stress management in private and self-employed sectors. Companies could benefit from integrating workplace wellness initiatives, such as stress management workshops, regular health check-ups, and awareness campaigns on stroke symptoms and prevention strategies.

In [None]:
query = '''
select smoking_status, 'Private' as work_type,
    round(sum(case when work_type='Private' and stroke = 0 then 1 else 0 end)*100.0/(select count(*) from data),2) as no_stroke,
    round(sum(case when work_type='Private' and stroke = 1 then 1 else 0 end)*100.0/(select count(*) from data),2) as stroke
from data
group by smoking_status
'''
smoking_status_pvt = pd.read_sql(query, conn)
smoking_status_pvt

Observation

- Among private-sector employees, those who have never smoked account for the largest no-stroke share (28.50% of all records). Former smokers account for 9.59% without a stroke and 0.63% with a stroke, while current smokers with a stroke account for 0.47%.

So What?

- These shares alone do not establish how smoking status affects stroke risk, because each figure also depends on how many private-sector employees fall into each smoking category. Stroke rates would need to be compared within each smoking category before concluding that smoking cessation efforts could reduce stroke incidence in this group.

Action

- Health initiatives should focus on promoting smoking cessation programs, especially targeting private sector employees. Educational campaigns highlighting the direct correlation between smoking and stroke risk can be effective in reducing overall stroke incidence within this demographic.

In [None]:
query = '''
WITH cte AS (
    SELECT *,
           CASE 
               WHEN age BETWEEN 0 AND 17 THEN '0 - 17'   -- Ages 0 to 17
               WHEN age BETWEEN 18 AND 50 THEN '18 - 50'  -- Ages 18 to 50
               WHEN age BETWEEN 51 AND 64 THEN '51 - 64'  -- Ages 51 to 64
               WHEN age >= 65 THEN '65+'                  -- Age 65 and older
               ELSE 'Unknown'                              -- For NULL or unexpected values
           END AS age_category
    FROM data
)

SELECT age_category, 
       ROUND(SUM(CASE WHEN stroke = 0 THEN 1 ELSE 0 END) * 100.0 / (SELECT COUNT(*) FROM data), 2) AS No_Stroke,
       ROUND(SUM(CASE WHEN stroke = 1 THEN 1 ELSE 0 END) * 100.0 / (SELECT COUNT(*) FROM data), 2) AS Stroke
FROM cte
GROUP BY age_category
ORDER BY 
    CASE 
        WHEN age_category = '0 - 17' THEN 1 
        WHEN age_category = '18 - 50' THEN 2 
        WHEN age_category = '51 - 64' THEN 3 
        WHEN age_category = '65+' THEN 4
        ELSE 5          
    END;
'''
age_stroke = pd.read_sql(query, conn)
age_stroke

Observations

- In the age category of 0 - 17, 16.73% of individuals did not experience a stroke, while only 0.01% were affected. For the 18 - 50 age group, 45.65% were stroke-free, with a minor incidence of 0.25% for strokes. In the 51 - 64 category, 22.20% had no stroke history, with a slightly higher incidence of 1.22%. Lastly, the 65+ category shows 11.29% without a stroke and a more significant stroke rate of 2.65%.

So What?

- The data shows a very low stroke risk in the youth demographic (0 - 17), indicating minimal need for preventive efforts. Young adults (18 - 50) have a low but notable incidence, suggesting lifestyle-focused health campaigns. **The middle-aged group (51 - 64) faces a higher risk, requiring stronger preventive measures**. Finally, seniors (65+) have the highest stroke incidence, highlighting the need for targeted health interventions.

Action

- Health organizations should develop educational programs for young adults and middle-aged individuals to raise awareness of stroke risk factors and promote lifestyle changes. Regular health screenings for those aged 51 and older are essential for early risk detection. Additionally, enhancing initiatives for seniors with regular health checks and lifestyle counseling will help manage stroke risk effectively. Tailoring strategies to these age groups can significantly reduce stroke incidence.

In [None]:
query = '''
WITH cte AS (
    SELECT *,
           CASE 
               WHEN age BETWEEN 0 AND 17 THEN '0 - 17'   -- Ages 0 to 17
               WHEN age BETWEEN 18 AND 50 THEN '18 - 50'  -- Ages 18 to 50
               WHEN age BETWEEN 51 AND 64 THEN '51 - 64'  -- Ages 51 to 64
               WHEN age >= 65 THEN '65+'                  -- Age 65 and older
               ELSE 'Unknown'                              -- For NULL or unexpected values
           END AS age_category
    FROM data
)

SELECT '65+' as age_category, work_type, 
       ROUND(SUM(CASE WHEN stroke = 0 THEN 1 ELSE 0 END) * 100.0 / (SELECT COUNT(*) FROM data), 2) AS No_Stroke,
       ROUND(SUM(CASE WHEN stroke = 1 THEN 1 ELSE 0 END) * 100.0 / (SELECT COUNT(*) FROM data), 2) AS Stroke
FROM cte
where age_category = '65+'
GROUP BY work_type
order by Stroke desc

'''
age_category_work_type = pd.read_sql(query, conn)
age_category_work_type

- **Observation**: In the senior demographic (65+), private sector employees show a higher stroke incidence (1.57%) compared to their self-employed (0.88%) and government job counterparts (0.21%).

- **So What?** : This suggests that **seniors in private employment may be at a greater risk of strokes**, potentially due to lifestyle factors or workplace stressors.

- **Action** : Healthcare providers should focus on tailored interventions for older adults in the private sector, such as stress management programs and regular health screenings, to mitigate their stroke risk.

In [None]:
query = '''
select ever_married, 
    Residence_type,
    round(sum(case when stroke=0 then 1 else 0 end)*100.0/(select count(*) from data),2) as No_Stroke,
    round(sum(case when stroke=1 then 1 else 0 end)*100.0/(select count(*) from data),2) as Stroke
from data
group by ever_married, Residence_type
'''
ever_married_stroke = pd.read_sql(query, conn)
ever_married_stroke

Observation

- Among individuals who have never married, stroke incidence is slightly lower in rural areas (0.12) than in urban areas (0.15). In contrast, those who have ever married show a higher incidence of strokes in both rural (1.95) and urban (1.91) settings, with rural residents marginally higher.

So What?

- Marital status and residence type influence stroke risk, with **married individuals at a higher risk in both rural and urban residence types**. This may suggest that factors such as shared lifestyle choices or stressors in married life could contribute to increased risk.

Action

- Healthcare strategies should consider marital status and living conditions when targeting stroke prevention. Specific campaigns for married individuals, especially in rural and urban settings, could emphasize healthy lifestyle choices and stress management techniques to lower stroke risk.

In [None]:
query = '''
select stroke,round(avg(bmi),2) as Avg_BMI, round(avg(avg_glucose_level),2) as Avg_Glucose from data group by 1
'''
stroke_bmi_glucose = pd.read_sql(query, conn)
stroke_bmi_glucose

In [None]:
sns.barplot(data=stroke_bmi_glucose,x= 'stroke',y='Avg_Glucose', hue='stroke', ci=None,dodge=False)
plt.gca().spines['top'].set_visible(False)
plt.gca().spines['right'].set_visible(False)
plt.title(f'Avg Glucose Levels')
plt.ylabel('Avg Glucose Level')
plt.xticks([0,1], ['No Stroke', 'Stroke'])
plt.show()

In [None]:
sns.barplot(data=stroke_bmi_glucose,x= 'stroke',y='Avg_BMI', hue='stroke', ci=None,dodge=False)
plt.gca().spines['top'].set_visible(False)
plt.gca().spines['right'].set_visible(False)
plt.title(f'Avg_BMI , Stroke Wise')
plt.ylabel('Avg_BMI')
plt.xticks([0,1], ['No Stroke', 'Stroke'])
plt.xlabel('Stroke')
plt.show()

In [None]:
sns.scatterplot(y=train['bmi'], x=train['avg_glucose_level'], hue=train['stroke'], alpha=0.5)
plt.gca().spines['top'].set_visible(False)
plt.gca().spines['right'].set_visible(False)

plt.axvline(train['avg_glucose_level'].mean(), linestyle='--', lw=2, zorder=1, color='black')
plt.annotate(f'mean', (80, 80), fontsize=14, color='black')

plt.title('avg_glucose_level & bmi relation', fontsize=18)
plt.xlabel('avg_glucose_level')
plt.ylabel('bmi')

plt.show()

People with a lower BMI appear less likely to have a stroke, and those with high glucose levels appear more likely to have one.

In [None]:
query = '''
select stroke,
case when age <= 30 then 'Young'
when age between 31 and 50 then 'Middle Aged' else 'Old' end as Age_Group, 
round(avg(avg_glucose_level),2) as Avg_Glucose from data group by 1,2
'''
age_df = pd.read_sql(query, conn)
age_df

Looking at the stats above, we cannot say that age and glucose level are related. A higher glucose level usually indicates a higher chance of stroke.

In [None]:
sns.scatterplot(x=train['age'], y=train['avg_glucose_level'], hue=train['stroke'], alpha=0.5)
plt.gca().spines['top'].set_visible(False)
plt.gca().spines['right'].set_visible(False)

plt.axhline(train['avg_glucose_level'].mean(), linestyle='--', lw=2, zorder=1, color='red')
plt.annotate(f'avg_glucose_level', (70, 100), fontsize=14, color='black')

plt.title('avg_glucose_level & age relation', fontsize=18)
plt.ylabel('avg_glucose_level')
plt.xlabel('age')
plt.show()

People aged around 50 tend to have higher glucose levels, which might lead to a stroke.

In [None]:
# Define the columns and create subplots
cols = ['gender', 'hypertension', 'heart_disease', 'ever_married', 
        'work_type', 'Residence_type', 'smoking_status', 'stroke']

fig, axes = plt.subplots(4, 2, figsize=(20, 13))
axes = axes.flatten()  # Flatten the 2D array of axes to easily iterate

# Loop through the columns and create bar plots in subplots
for ax, col in zip(axes, cols):
    sns.barplot(data=train, x=col, y='avg_glucose_level', hue='stroke', ax=ax, ci=None)
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    ax.axhline(train['avg_glucose_level'].mean(), linestyle='--', lw=2, zorder=1, color='black')
    ax.set_title(f'Avg Glucose Levels, {col} status wise')
    ax.set_ylabel('Avg Glucose Level')
    ax.set_xlabel(col)
    ax.annotate('Avg Glucose Level', (0.2, 80))

# Adjust layout
plt.tight_layout()
plt.show()

With the charts above, we are trying to understand which factors drive the average glucose level.

Is it the **hypertension** or **marital status** or the **residence type** status?

With neural networks, one should always keep in mind that model explainability is very low, so they are best used where explainability is not required or the stakeholders don't need an explanation.

## Data Preprocessing

In this section we will transform the data. We will:

1. Convert the categorical data into numbers
2. Scale the data so that our Neural Network Converges quickly
3. Split the data for training and testing purposes
4. Fix class Imbalance

In [None]:
train.bmi = train.bmi.fillna(round(train.bmi.mean(),2), axis=0)
# Drop the id and stroke columns; copy so later column assignments modify train_df itself
train_df = train[train.columns[1:-1]].copy()
train_df.head()

### Converting Categorical Data Into Numbers

First, we pick out the categorical columns into a new dataframe and convert the entire dataframe to the `category` dtype. We then get the integer codes for each column and assign the values back to the original dataframe columns.

In [None]:
cat_df = train_df[['gender', 'ever_married','work_type', 'Residence_type','smoking_status']]
cat_df = cat_df.astype('category')
cat_df = cat_df.apply(lambda x : x.cat.codes)
cat_df.head()

The `train_df` is now our dataframe with the X values

In [None]:
train_df[cat_df.columns] = cat_df.copy()
train_df.head()
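
To make the encoding step above concrete, here is a minimal sketch on a hypothetical three-row frame (the values are invented for illustration). It shows which integer each category receives, and it assumes the default behaviour in which pandas sorts string categories alphabetically.

```python
import pandas as pd

# Hypothetical three-row frame, only to show how category codes are assigned.
demo = pd.DataFrame({"smoking_status": ["smokes", "never smoked", "smokes"],
                     "ever_married": ["Yes", "No", "Yes"]})

as_cat = demo.astype("category")
codes = as_cat.apply(lambda col: col.cat.codes)

# Categories are sorted alphabetically, and the codes follow that order.
mapping = {col: dict(enumerate(as_cat[col].cat.categories)) for col in as_cat.columns}
print(codes)
print(mapping)
```

Keeping a `mapping` like this around makes it possible to translate the codes back later. Note that integer codes impose an arbitrary order on nominal categories; one-hot encoding is a common alternative when that order could mislead a model.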

### Fixing Class Imbalance

**Class imbalance** refers to a situation in machine learning where the distribution of classes in the training dataset is not equal. In other words, one class has significantly more instances (samples) than another class or classes. Class imbalance is a common issue in many real-world machine learning problems, such as fraud detection, medical diagnosis, and text classification, where one class (usually the minority class) is of more interest, but it has fewer examples to learn from compared to the majority class.

**Why Class Imbalance Is a Problem:**

Class imbalance can cause machine learning models to perform poorly, especially when the model is biased towards the majority class. Here are some challenges posed by class imbalance:

1. **Biased Models:** A model trained on imbalanced data may become biased towards the majority class. It may predict the majority class accurately but perform poorly on the minority class.

2. **Poor Generalization:** Imbalanced datasets can lead to models that do not generalize well to unseen data, as they are often overly skewed towards the majority class.

3. **Misleading Evaluation:** Traditional accuracy metrics can be misleading in imbalanced datasets because a model that predicts the majority class most of the time can still have high accuracy but perform poorly on the minority class.
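
The last point can be illustrated with a small sketch. The labels below are made up (96 negatives, 4 positives) and do not reflect the real class ratio; the "model" simply predicts the majority class every time.

```python
import numpy as np

# Made-up labels: 96 negatives and 4 positives (not the real class ratio).
y_true = np.array([0] * 96 + [1] * 4)
y_pred = np.zeros_like(y_true)  # a "model" that always predicts the majority class

accuracy = np.mean(y_pred == y_true)
true_positives = np.sum((y_pred == 1) & (y_true == 1))
recall = true_positives / np.sum(y_true == 1)

print(f"accuracy={accuracy:.2f}, recall={recall:.2f}")
```

Accuracy looks high even though not a single positive case is found, which is why recall or a confusion matrix is worth checking alongside accuracy.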

**Ways to Address Class Imbalance:**

Although there are various ways of addressing this issue, we will look at resampling techniques, more specifically oversampling.

   a. **Oversampling:** Increase the number of instances in the minority class by duplicating or generating synthetic samples. Techniques like Synthetic Minority Over-sampling Technique (SMOTE) are commonly used.

   b. **Undersampling:** Reduce the number of instances in the majority class by randomly removing samples. This can help balance class distribution.
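
In the spirit of building things from scratch, the idea behind random oversampling can be sketched in plain NumPy. The function `random_oversample` and the toy arrays are hypothetical names for illustration; this is not how `imblearn` implements it internally, only the same basic idea of duplicating minority rows.

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Duplicate random rows of each smaller class until all classes match the largest one."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    keep = [np.arange(len(y))]  # keep every original row
    for cls, count in zip(classes, counts):
        if count < target:
            cls_idx = np.flatnonzero(y == cls)
            # Sample with replacement to make up the difference.
            keep.append(rng.choice(cls_idx, size=target - count, replace=True))
    idx = np.concatenate(keep)
    return X[idx], y[idx]

# Toy data: 8 rows of class 0 and 2 rows of class 1.
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.array([0] * 8 + [1] * 2)
X_bal, y_bal = random_oversample(X_demo, y_demo)
print(np.bincount(y_bal))
```

Because the added rows are exact copies, no new information is created; the model simply sees the minority examples more often.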


Before moving any further let's check if our classes are imbalanced or not.

In [None]:
plt.figure(figsize=(22, 6))
train['stroke'].value_counts().plot(kind='bar')
yticks = plt.gca().get_yticks()
ylabels = [f"{round(i / 1000)}k" if i != 0 else "0" for i in yticks]
plt.gca().set_yticklabels(ylabels)
plt.ylabel('Count')
plt.title('Target Class Countplot',  fontsize=15)
plt.gca().spines['top'].set_visible(False)
plt.gca().spines['right'].set_visible(False)
plt.xticks([0,1], ['No Stroke', 'Stroke'])
plt.xticks(rotation=None)
plt.show()

It seems they are imbalanced, so we will use `RandomOverSampler` from the `imblearn` library. Before that, let's define our X and y.

In [None]:
X = train_df.values
y = train.stroke.values

In [None]:
#Resampling: randomly duplicate minority-class rows until both classes are the same size
ros = RandomOverSampler(random_state=0)
X_resampled, y_resampled = ros.fit_resample(X,y)

In [None]:
class_counts = {i : len(y_resampled[y_resampled==i]) for i in np.unique(y_resampled)}
print(f'Instances of the class after re-sampling : {tuple(class_counts.items())}')

Using sklearn's `train_test_split` function, we will split the data into train and test sets for performance evaluation later on.

In [None]:
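#Splitting Data (use the resampled arrays so the oversampling above takes effect)
#Note: oversampling before the split means duplicated rows can appear in both the train and test sets
X_train, X_test, y_train, y_test = train_test_split(X_resampled, y_resampled, test_size=0.2, random_state=42)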

Let's now initialize the `StandardScaler` object and transform our data

In [None]:
#Scaling the values
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
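
What the scaling step does can be sketched in plain NumPy. The arrays `X_tr` and `X_te` below are small hypothetical stand-ins for the train and test features; the key point is that the mean and standard deviation come from the training data only and are then reused on the test data.

```python
import numpy as np

# Hypothetical small arrays standing in for the train and test features.
X_tr = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_te = np.array([[2.0, 40.0]])

mean = X_tr.mean(axis=0)
std = X_tr.std(axis=0)  # population standard deviation (ddof=0), as StandardScaler uses

X_tr_scaled = (X_tr - mean) / std
X_te_scaled = (X_te - mean) / std  # reuse the training statistics

print(X_tr_scaled.mean(axis=0), X_tr_scaled.std(axis=0))
print(X_te_scaled)
```

After scaling, each training column has mean 0 and standard deviation 1, while the test row is expressed in the same units without influencing the statistics.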

In [None]:
print("Training data shapes:")
print("X_train:", X_train.shape)
print("y_train:", y_train.shape)

print("\nTesting data shapes:")
print("X_test:", X_test.shape)
print("y_test:", y_test.shape)

## Model Architecture

The neural network we will be working with has three layers of 10 nodes each, followed by a single classification neuron:

1. **Input Layer:** 10 nodes that receive the initial data.
2. **Hidden Layer:** 10 nodes that process information from the input.
3. **Output Layer:** 10 nodes that produce intermediate results.
4. **Classification Neuron:** Positioned within the Output Layer to finalize binary classification using the 10 intermediate results.
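
As a rough sanity check on the layer sizes listed above, the sketch below counts the trainable parameters. It assumes each weight matrix is shaped `(nodes_out, nodes_in)` with one bias per receiving node; the variable names are illustrative.

```python
import numpy as np

layer_sizes = [10, 10, 10, 1]  # input, hidden, output, classification neuron

total = 0
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    weights = np.zeros((n_out, n_in))  # one row per receiving node
    biases = np.zeros((n_out, 1))
    total += weights.size + biases.size
    print(f"{n_in:>2} -> {n_out:>2}: W {weights.shape}, b {biases.shape}")

print("trainable parameters:", total)
```

With these sizes the network is small (110 + 110 + 11 parameters), which keeps training in plain NumPy fast.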

Let's plot our architecture below:

In [None]:
##Plotting the Architecture of the network

# Define the architecture of the neural network
input_nodes = 10
hidden_nodes = 10
output_nodes = 10

# Create a figure and axis for plotting
fig, ax = plt.subplots(figsize=(16, 6))

# Plot input layer nodes
for i in range(input_nodes):
    ax.scatter(0, i, color='blue', label='Input Layer' if i == 0 else "")

# Plot hidden layer nodes
for i in range(hidden_nodes):
    ax.scatter(1, i, color='orange', label='Hidden Layer' if i == 0 else "")

# Plot output layer nodes
for i in range(output_nodes):
    ax.scatter(2, i, color='green', label='Output Layer' if i == 0 else "")

# Draw connections between layers
for i in range(input_nodes):
    for j in range(hidden_nodes):
        ax.plot([0, 1], [i, j], color='gray', alpha=0.5)

for i in range(hidden_nodes):
    for j in range(output_nodes):
        ax.plot([1, 2], [i, j], color='gray', alpha=0.5)

# Draw connection to final classification neuron
for i in range(output_nodes):
    ax.plot([2, 3], [i, 5], color='gray', alpha=0.5)

# Add labels to layers
ax.text(-0.1, input_nodes // 2, 'Input', fontsize=12, va='center', ha='right')
ax.text(1, hidden_nodes // 2, 'Hidden', fontsize=12, va='center', ha='right')
ax.text(2.1, output_nodes // 2, 'Output', fontsize=12, va='center', ha='left')
ax.text(3, 5, 'Classification', fontsize=12, va='center', ha='left')

# Set axis properties
ax.axis('off')

# Add legend
ax.legend(bbox_to_anchor=(1, 1))

# Set title
plt.title("3-Layer Neural Network Architecture")

# Show the plot
plt.tight_layout()
plt.show()

We will now write the functions that build a simple 3-layer neural network to predict the target. Let's break down what's happening at a high level:

**Network Architecture:**

The neural network architecture consists of three layers: an input layer with 10 nodes, a hidden layer with 10 nodes, and an output layer with 10 nodes. The purpose of the hidden layer is to capture complex relationships within the data. The network's operation can be summarized as follows:

1. **Initializing Parameters:** The weights are initialized with small random values so that neurons in the same layer start out different and can learn different features (identical weights would receive identical updates). The biases are initialized to zero.

2. **Activation Functions:** Two activation functions are employed:
   - The Rectified Linear Unit (ReLU) is used in the hidden layer and the output layer, introducing non-linearity to the model.
   - The sigmoid function is used in the final classification neuron, producing values between 0 and 1 for binary classification.

3. **Forward Propagation:** During forward propagation, input data (`X`) passes through the layers using the initialized parameters and activation functions. Activations are computed for the hidden layer (`a_hidden`), the output layer (`a_output`), and the classification neuron (`a_classify`). Each dot product requires compatible matrix dimensions, so `X` is expected with shape `(input_size, batch_size)`, and each activation function is applied after the dot product and bias addition.

4. **Activation Derivatives:** To calculate gradients during backpropagation, derivatives of the sigmoid and ReLU functions (`sigmoid_derivative` and `relu_derivative`) are used.

5. **Backpropagation:** Gradients of the loss with respect to the parameters are computed using the chain rule. These gradients guide parameter updates in the direction that minimizes the loss.

6. **Updating Parameters:** The `update_parameters` function modifies the weights and biases based on the calculated gradients and a specified learning rate.

7. **Loss and Accuracy:** The `compute_loss` function determines the binary cross-entropy loss, quantifying the disparity between predicted and actual labels. The `compute_accuracy` function assesses prediction accuracy. Mathematically, the loss is the binary cross-entropy, where $y_i$ is the true label, $\hat{y}_i$ is the predicted probability, and $n$ is the number of samples:

$$ L = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_{i} \log \hat{y}_i + (1-y_{i}) \log (1-\hat{y}_i) \right] $$
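
The binary cross-entropy described above, and the gradient that backpropagation starts from, can be checked numerically. The sketch below is an illustration under stated assumptions: `bce_loss` is a hypothetical helper (not the notebook's `compute_loss`), the inputs are made up, and the small `eps` clip is an added safeguard against `log(0)`. It compares the analytic gradient of the mean loss with respect to the sigmoid's input, `(sigmoid(z) - y) / n`, with a finite-difference estimate.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy; eps keeps log() away from zero."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))

# Made-up pre-activations and labels for four samples, shape (1, batch_size).
z = np.array([[0.5, -1.0, 2.0, 0.0]])
y = np.array([[1.0, 0.0, 1.0, 0.0]])

loss = bce_loss(y, sigmoid(z))

# Analytic gradient of the mean loss with respect to z ...
analytic = (sigmoid(z) - y) / z.shape[1]

# ... compared with a central finite-difference estimate.
h = 1e-6
numeric = np.zeros_like(z)
for j in range(z.shape[1]):
    step = np.zeros_like(z)
    step[0, j] = h
    numeric[0, j] = (bce_loss(y, sigmoid(z + step)) - bce_loss(y, sigmoid(z - step))) / (2 * h)

print(f"loss={loss:.4f}")
print("max gradient gap:", np.max(np.abs(analytic - numeric)))
```

The two gradients should agree closely, which is why the first backpropagation step for a sigmoid output with cross-entropy loss reduces to the prediction minus the label.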

In [None]:
# Network architecture
input_size = input_nodes
hidden_size = hidden_nodes
output_size = output_nodes  # The output layer before the final classification neuron

# Initialize weights and biases
def initialize_parameters(input_size, hidden_size, output_size):
    """
    Initialize weights and biases for the neural network.

    Parameters:
    input_size -- Number of input nodes
    hidden_size -- Number of nodes in the hidden layer
    output_size -- Number of output nodes

    Returns:
    w_input_hidden -- Initialized weights for input-hidden layer
    b_hidden -- Initialized biases for hidden layer
    w_hidden_output -- Initialized weights for hidden-output layer
    b_output -- Initialized biases for output layer
    w_output_classify -- Initialized weights for output-classification neuron
    b_classify -- Initialized bias for classification neuron
    """
    np.random.seed(42)  # For reproducibility
    
    # Initialize weights with small random values
    w_input_hidden = np.random.randn(hidden_size, input_size) * 0.01
    w_hidden_output = np.random.randn(output_size, hidden_size) * 0.01
    w_output_classify = np.random.randn(1, output_size) * 0.01
    
    # Initialize biases as zeros
    b_hidden = np.zeros((hidden_size, 1))
    b_output = np.zeros((output_size, 1))
    b_classify = np.zeros((1, 1))
    
    return w_input_hidden, b_hidden, w_hidden_output,b_output,w_output_classify, b_classify

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def relu(Z):
    return np.maximum(0, Z)

def forward_propagation(X, w_input_hidden, b_hidden, w_hidden_output, b_output, w_output_classify, b_classify):
    """
    Perform forward propagation through the neural network.

    Parameters:
    X -- Input data (features) of shape (input_size, batch_size)
    w_input_hidden -- Weights for input-hidden layer
    b_hidden -- Biases for hidden layer
    w_hidden_output -- Weights for hidden-output layer
    b_output -- Biases for output layer
    w_output_classify -- Weights for output-classification neuron
    b_classify -- Bias for classification neuron

    Returns:
    a_hidden -- Activations of the hidden layer after ReLU activation
    a_output -- Activations of the output layer before classification after ReLU activation
    a_classify -- Activations of the classification neuron after sigmoid activation
    """
    # Input to Hidden Layer
    z_hidden = np.dot(w_input_hidden, X) + b_hidden
    a_hidden = relu(z_hidden)  # Use ReLU activation
    
    # Hidden to Output Layer (before classification)
    z_output = np.dot(w_hidden_output, a_hidden) + b_output
    a_output = relu(z_output)
    
    # Output (before classification) to Classification Neuron
    z_classify = np.dot(w_output_classify, a_output) + b_classify
    a_classify = sigmoid(z_classify)
    
    return a_hidden, a_output, a_classify

def sigmoid_derivative(Z):
    return sigmoid(Z) * (1 - sigmoid(Z))

def relu_derivative(Z):
    return np.where(Z > 0, 1, 0)

def backpropagation(X, Y, a_hidden, a_output, a_classify,
                    w_input_hidden, w_hidden_output, w_output_classify):
    """
    Perform backpropagation to calculate gradients of the loss with respect to parameters.

    Parameters:
    X -- Input data (features) of shape (input_size, batch_size)
    Y -- True labels (ground truth) of shape (1, batch_size)
    a_hidden -- Activations of the hidden layer after forward propagation
    a_output -- Activations of the output layer before classification after forward propagation
    a_classify -- Activations of the classification neuron after forward propagation
    w_input_hidden -- Weights for input-hidden layer
    w_hidden_output -- Weights for hidden-output layer
    w_output_classify -- Weights for output-classification neuron

    Returns:
    dw_input_hidden -- Gradients of weights for input-hidden layer
    db_hidden -- Gradients of biases for hidden layer
    dw_hidden_output -- Gradients of weights for hidden-output layer
    db_output -- Gradients of biases for output layer
    dw_output_classify -- Gradients of weights for output-classification neuron
    db_classify -- Gradient of bias for classification neuron
    """
    # Compute gradients
    dz_classify = a_classify - Y
    dw_output_classify = np.dot(dz_classify, a_output.T) / X.shape[1]
    db_classify = np.sum(dz_classify, axis=1, keepdims=True) / X.shape[1]

    # The output layer uses ReLU in forward_propagation, so its gradient uses the ReLU derivative
    dz_output = np.dot(w_output_classify.T, dz_classify) * relu_derivative(a_output)
    dw_hidden_output = np.dot(dz_output, a_hidden.T) / X.shape[1]
    db_output = np.sum(dz_output, axis=1, keepdims=True) / X.shape[1]

    dz_hidden = np.dot(w_hidden_output.T, dz_output) * relu_derivative(a_hidden)
    dw_input_hidden = np.dot(dz_hidden, X.T) / X.shape[1]
    db_hidden = np.sum(dz_hidden, axis=1, keepdims=True) / X.shape[1]

    return dw_input_hidden, db_hidden, dw_hidden_output, db_output, dw_output_classify, db_classify

def update_parameters(w_input_hidden, b_hidden, w_hidden_output, b_output, w_output_classify, b_classify,
                      dw_input_hidden, db_hidden, dw_hidden_output, db_output, dw_output_classify, db_classify,
                      learning_rate):
    """
    Update parameters using gradient descent.
    
    Parameters:
    w_input_hidden -- weights for input-hidden layer
    b_hidden -- biases for hidden layer
    w_hidden_output -- weights for hidden-output layer
    b_output -- biases for output layer
    w_output_classify -- weights for output-classification neuron
    b_classify -- bias for classification neuron
    dw_input_hidden -- gradients of weights for input-hidden layer
    db_hidden -- gradients of biases for hidden layer
    dw_hidden_output -- gradients of weights for hidden-output layer
    db_output -- gradients of biases for output layer
    dw_output_classify -- gradients of weights for output-classification neuron
    db_classify -- gradient of bias for classification neuron
    learning_rate -- learning rate for gradient descent
    
    Returns:
    updated_w_input_hidden -- updated weights for input-hidden layer
    updated_b_hidden -- updated biases for hidden layer
    updated_w_hidden_output -- updated weights for hidden-output layer
    updated_b_output -- updated biases for output layer
    updated_w_output_classify -- updated weights for output-classification neuron
    updated_b_classify -- updated bias for classification neuron
    """
    updated_w_input_hidden = w_input_hidden - learning_rate * dw_input_hidden
    updated_b_hidden = b_hidden - learning_rate * db_hidden
    updated_w_hidden_output = w_hidden_output - learning_rate * dw_hidden_output
    updated_b_output = b_output - learning_rate * db_output
    updated_w_output_classify = w_output_classify - learning_rate * dw_output_classify
    updated_b_classify = b_classify - learning_rate * db_classify
    
    return updated_w_input_hidden, updated_b_hidden, updated_w_hidden_output, updated_b_output,updated_w_output_classify, updated_b_classify

def compute_loss(Y_true, Y_pred):
    """
    Compute binary cross-entropy loss.
    
    Parameters:
    Y_true -- true labels (ground truth)
    Y_pred -- predicted labels
    
    Returns:
    loss -- computed loss
    """
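    m = Y_true.shape[1]
    # Clip predictions away from exactly 0 and 1 so np.log never receives 0
    Y_pred = np.clip(Y_pred, 1e-12, 1 - 1e-12)
    loss = -1/m * np.sum(Y_true * np.log(Y_pred) + (1 - Y_true) * np.log(1 - Y_pred))
    return loss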

def compute_accuracy(Y_true, Y_pred):
    """
    Compute accuracy.
    
    Parameters:
    Y_true -- true labels (ground truth)
    Y_pred -- predicted labels
    
    Returns:
    accuracy -- computed accuracy
    """
    m = Y_true.shape[1]
    predictions = (Y_pred > 0.5).astype(int)
    accuracy = np.sum(predictions == Y_true) / m
    return accuracy

Below is an animation that shows a single pass through the network:

![Image](https://yogayu.github.io/DeepLearningCourse/03/video/layer.gif)


**Forward Propagation**

The `forward_propagation` function is responsible for carrying input data through the neural network, ultimately generating predictions. It executes the following sequential steps:

1. **Input to Hidden Layer (z_hidden and a_hidden):**
   - `z_hidden`: This represents the linear combination of input features `X` with the weights `w_input_hidden` connecting the input layer to the hidden layer, along with the addition of the bias term `b_hidden`.
   - `a_hidden`: The result `z_hidden` is then passed through the ReLU (Rectified Linear Unit) function. This activation function sets negative values to zero and preserves positive values.

2. **Hidden to Output Layer (z_output and a_output):**
   - `z_output`: This signifies the linear combination of the activations from the hidden layer, `a_hidden`, with the weights `w_hidden_output` connecting the hidden layer to the output layer. It also incorporates the bias term `b_output`.
   - `a_output`: Similar to the hidden layer, the ReLU activation function is applied to `z_output`. This introduces non-linearity to the network's output.

3. **Output to Classification Neuron (z_classify and a_classify):**
   - `z_classify`: It represents the linear combination of activations from the output layer, `a_output`, with the weights `w_output_classify` connecting the output layer to the classification neuron. The bias term `b_classify` is added as well.
   - `a_classify`: The final network output is derived by applying the sigmoid activation function to `z_classify`. The sigmoid function confines the output to the range of 0 to 1, so it can be read as the probability of the positive class, which makes it suitable for binary classification tasks.
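
To make the shapes concrete, here is a minimal sketch of the same three steps on random data. The layer sizes (4 inputs, 5 hidden units, 3 output units, 1 classification neuron), the batch size of 8, and the local `relu` and `sigmoid` helpers are assumptions chosen for illustration, not the values used in this notebook.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)
n_in, n_hidden, n_out, batch = 4, 5, 3, 8  # illustrative sizes only

# Data is laid out as (features, batch_size), matching forward_propagation
X = rng.normal(size=(n_in, batch))
w1, b1 = rng.normal(size=(n_hidden, n_in)), np.zeros((n_hidden, 1))
w2, b2 = rng.normal(size=(n_out, n_hidden)), np.zeros((n_out, 1))
w3, b3 = rng.normal(size=(1, n_out)), np.zeros((1, 1))

a_hidden = relu(np.dot(w1, X) + b1)              # input -> hidden
a_output = relu(np.dot(w2, a_hidden) + b2)       # hidden -> output
a_classify = sigmoid(np.dot(w3, a_output) + b3)  # output -> classification neuron

print(a_hidden.shape, a_output.shape, a_classify.shape)  # (5, 8) (3, 8) (1, 8)
```

Each weight matrix has shape `(units_in_this_layer, units_in_previous_layer)`, and each bias is a column vector that NumPy broadcasts across the batch dimension.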


**Backpropagation**


![image.png](https://miro.medium.com/v2/resize:fit:1280/1*VF9xl3cZr2_qyoLfDJajZw.gif)

Let's go over backpropagation in a bit more detail, since it is a crucial part of neural networks. The `backpropagation` function is an integral part of training a neural network. It computes the gradients of the loss function with respect to the network's parameters (weights and biases) so that those parameters can be updated using gradient descent. Let's break down the different steps in this function:


1. **Compute Gradients for Output-Classification Layer:**
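   - `dz_classify`: Gradient of the loss with respect to the pre-activation `z_classify` of the classification neuron. For a sigmoid output combined with binary cross-entropy, this simplifies to the prediction minus the label.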
     ```
     dz_classify = a_classify - Y
     ```

   - `dw_output_classify`: Gradient of the loss with respect to the weights connecting the output layer to the classification neuron.
     ```
     dw_output_classify = (1/m) * dz_classify * a_output^T
     ```

   - `db_classify`: Gradient of the loss with respect to the bias of the classification neuron.
     ```
     db_classify = (1/m) * Σ(dz_classify)
     ```

2. **Compute Gradients for Hidden-Output Layer:**
   - `dz_output`: Gradient of the loss with respect to the pre-activation `z_output` of the output layer.
     ```
     dz_output = (w_output_classify^T * dz_classify) * g'(z_output)
     ```
     where `g'` is the derivative of the ReLU activation function, since the output layer uses ReLU in the forward pass.

   - `dw_hidden_output`: Gradient of the loss with respect to the weights connecting the hidden layer to the output layer.
     ```
     dw_hidden_output = (1/m) * dz_output * a_hidden^T
     ```

   - `db_output`: Gradient of the loss with respect to the bias of the output layer.
     ```
     db_output = (1/m) * Σ(dz_output)
     ```

3. **Compute Gradients for Input-Hidden Layer:**
   - `dz_hidden`: Gradient of the loss with respect to the pre-activation `z_hidden` of the hidden layer.
     ```
     dz_hidden = (w_hidden_output^T * dz_output) * g'(z_hidden)
     ```
     where `g'` is the derivative of the ReLU activation function.

   - `dw_input_hidden`: Gradient of the loss with respect to the weights connecting the input layer to the hidden layer.
     ```
     dw_input_hidden = (1/m) * dz_hidden * X^T
     ```

   - `db_hidden`: Gradient of the loss with respect to the bias of the hidden layer.
     ```
     db_hidden = (1/m) * Σ(dz_hidden)
     ```

The calculated gradients are then used to update the parameters of the network in the `update_parameters` function. 

> This iterative process of **forward propagation** and **backpropagation** helps the neural network learn the appropriate weights and biases that minimize the loss and improve its performance on the given task.
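
A common way to gain confidence in gradient formulas like these is a numerical gradient check: nudge one weight up and down by a tiny amount, measure how the loss changes, and compare that slope with the analytic gradient. The sketch below does this for one entry of the hidden-to-output weights on a tiny random network. The helper names (`forward`, `bce`), the layer sizes, and the 0.5 weight scale are assumptions made for this example only.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward(X, w1, b1, w2, b2, w3, b3):
    a1 = relu(w1 @ X + b1)
    a2 = relu(w2 @ a1 + b2)
    a3 = sigmoid(w3 @ a2 + b3)
    return a1, a2, a3

def bce(Y, a3):
    # Binary cross-entropy averaged over the batch
    m = Y.shape[1]
    return -np.sum(Y * np.log(a3) + (1 - Y) * np.log(1 - a3)) / m

rng = np.random.default_rng(1)
X = rng.normal(size=(3, 6))
Y = rng.integers(0, 2, size=(1, 6)).astype(float)
w1, b1 = 0.5 * rng.normal(size=(4, 3)), 0.5 * rng.normal(size=(4, 1))
w2, b2 = 0.5 * rng.normal(size=(4, 4)), 0.5 * rng.normal(size=(4, 1))
w3, b3 = 0.5 * rng.normal(size=(1, 4)), 0.5 * rng.normal(size=(1, 1))

# Analytic gradient of the loss with respect to w2, using the formulas above
a1, a2, a3 = forward(X, w1, b1, w2, b2, w3, b3)
m = X.shape[1]
dz3 = a3 - Y
dz2 = (w3.T @ dz3) * (a2 > 0)  # ReLU derivative as a 0/1 mask
dw2 = dz2 @ a1.T / m

# Numerical gradient for a single entry of w2 via central differences
eps = 1e-6
i, j = 0, 1
w2_plus, w2_minus = w2.copy(), w2.copy()
w2_plus[i, j] += eps
w2_minus[i, j] -= eps
loss_plus = bce(Y, forward(X, w1, b1, w2_plus, b2, w3, b3)[2])
loss_minus = bce(Y, forward(X, w1, b1, w2_minus, b2, w3, b3)[2])
numeric = (loss_plus - loss_minus) / (2 * eps)

print(f"analytic={dw2[i, j]:.8f}  numeric={numeric:.8f}")
```

If the two numbers disagree beyond small floating-point noise, one of the gradient formulas (or the activation derivative used for a layer) is likely wrong.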

**Derivatives**

In the specific case of activation functions like sigmoid and ReLU, let's understand why their derivatives are important:

1. **Sigmoid Activation Function:**
   - The sigmoid function maps any input value to a value between 0 and 1. It has a smooth curve and was commonly used in the past to introduce non-linearity in neural networks.
   - The derivative of the sigmoid function, implemented as `sigmoid_derivative`, is calculated as `sigmoid(x) * (1 - sigmoid(x))`. It peaks at 0.25 when the input is 0 and approaches zero when the input is very large or very negative, which is why gradients can vanish in saturated sigmoid units.
   
   
   The derivative of a sigmoid function, such as the logistic sigmoid function, is given by:
   

$$f'(x) = f(x) \cdot (1 - f(x))$$

   where $f(x)$ is the sigmoid function itself:

$$f(x) = \frac{1}{1 + e^{-x}}$$

   So, the derivative of a sigmoid function at any point $x$ can be computed using the sigmoid function's value at that point.
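
As a quick sanity check, the closed-form derivative can be compared against a finite-difference estimate. This is a small self-contained sketch; the sample points and the step size `h` are arbitrary choices.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1 - s)

x = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
h = 1e-5

# Central-difference approximation of the slope at each point
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)

print(sigmoid_derivative(x))
print(np.allclose(numeric, sigmoid_derivative(x), atol=1e-8))  # True
```

The largest value appears at `x = 0`, where the derivative equals 0.25, and the values at `x = -10` and `x = 10` are close to zero.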

2. **ReLU (Rectified Linear Unit) Function:**
   - The ReLU function outputs the input value as is if it's positive, and outputs zero for negative input values.
   - The derivative of the ReLU function, denoted as `relu_derivative`, is 1 for positive input values and 0 for negative input values.
   - The derivative of a Rectified Linear Unit (ReLU) function, ReLU(x), is:

$$f'(x) = \begin{cases}
1 & \text{if } x > 0 \\
0 & \text{if } x \leq 0
\end{cases}$$
    
   In this mathematical expression, the derivative of ReLU(x) is equal to 1 for $x$ greater than 0 and equal to 0 for $x$ less than or equal to 0.
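
The short sketch below evaluates the ReLU derivative on a few sample values. It also shows why `backpropagation` can pass the stored activations (for example `a_hidden`) to `relu_derivative` instead of the pre-activations: an activation is positive exactly when its pre-activation is positive, so both give the same 0/1 mask. The sample values are arbitrary.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def relu_derivative(z):
    return np.where(z > 0, 1, 0)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])  # pre-activations
a = relu(z)                                 # activations

print(relu_derivative(z))  # [0 0 0 1 1]
print(np.array_equal(relu_derivative(z), relu_derivative(a)))  # True
```

Note that this shortcut is specific to ReLU. For sigmoid, the derivative has to be computed from the pre-activation, or written directly in terms of the activation as `a * (1 - a)`.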

## Training Loop

The loop iterates through the specified number of epochs, updating the network's parameters based on the calculated gradients. The printed accuracy and loss values provide insight into the model's performance during training.

1. **Initialize Parameters:** Initialize the weights and biases for all layers using the `initialize_parameters` function.

2. **Hyperparameters:** Set hyperparameters like the learning rate and the number of epochs.

3. **Lists for Tracking:** Create lists `accuracy_list` and `loss_list` to store accuracy and loss values for each epoch.

4. **Training Loop:** Loop through each epoch.

5. **Forward Propagation:** Perform forward propagation on the entire training set to obtain predictions (`a_classify`).

6. **Calculate Loss and Accuracy:** Calculate the loss using the predicted values and the ground truth (`y_train`). Also, calculate `accuracy` using the same values.

7. **Backpropagation:** Perform backpropagation to compute gradients of the loss with respect to the parameters.

8. **Update Parameters:** Update the parameters (weights and biases) using the computed gradients and the learning rate.

9. **Print Progress:** Print accuracy and loss values every few epochs.

10. **Save Values:** Append the accuracy and loss values to their respective lists.
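
Before running the full loop, it can help to see step 6 on a handful of numbers. The sketch below computes binary cross-entropy and accuracy by hand for four made-up predictions. The labels and probabilities are invented for illustration and are not taken from the dataset.

```python
import numpy as np

# Shape (1, m), the same layout the training loop passes to compute_loss
y_true = np.array([[1, 0, 1, 0]])
y_pred = np.array([[0.9, 0.1, 0.6, 0.4]])

m = y_true.shape[1]
loss = -np.sum(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)) / m
accuracy = np.sum((y_pred > 0.5).astype(int) == y_true) / m

print(f"loss={loss:.4f}, accuracy={accuracy:.2f}")  # loss=0.3081, accuracy=1.00
```

All four examples are classified correctly, yet the loss is not zero: the less confident predictions (0.6 and 0.4) contribute more to the loss than the confident ones (0.9 and 0.1). This is why loss can keep improving even when accuracy has stopped changing.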

In [None]:
# Initialize parameters
w_input_hidden, b_hidden, w_hidden_output, b_output, w_output_classify, b_classify = initialize_parameters(input_size, hidden_size, output_size)

# Hyperparameters
learning_rate = 0.001
num_epochs = 10

# Lists to store accuracy, loss
accuracy_list = []
loss_list = []
#Dictionary to store weights
Weights = {}
# Decay rate: every epoch the learning rate is multiplied by 1 / (1 + decay_rate)
decay_rate = 5

for epoch in range(num_epochs):
    # Forward propagation on the entire training set
    a_hidden, a_output, a_classify = forward_propagation(X_train.T, w_input_hidden, b_hidden,
                                                         w_hidden_output, b_output, w_output_classify, b_classify)
    
    # Calculate loss
    loss = compute_loss(y_train.reshape(1, -1), a_classify)
    
    # Calculate accuracy
    accuracy = compute_accuracy(y_train.reshape(1, -1), a_classify)
    
    # Backpropagation
    dw_input_hidden, db_hidden, dw_hidden_output, db_output, dw_output_classify, db_classify = backpropagation(X_train.T, y_train.reshape(1, -1), 
                                                    a_hidden, a_output, a_classify, w_input_hidden, w_hidden_output, w_output_classify)
    
    learning_rate = (1 / (1 + decay_rate)) * learning_rate

    
    # Update parameters
    w_input_hidden, b_hidden, w_hidden_output, b_output, w_output_classify, b_classify = update_parameters(w_input_hidden, b_hidden, w_hidden_output, 
                            b_output, w_output_classify, b_classify, dw_input_hidden, db_hidden, dw_hidden_output, db_output, dw_output_classify, 
                                         db_classify, learning_rate)
    
    # Print accuracy and loss 
    if (epoch + 1) % 4 == 0:
        print(f"Epoch {epoch + 1}/{num_epochs} == accuracy : {accuracy:.4f} - loss : {loss:.4f}  - learning_rate : {learning_rate:.2e}")
    
    # Save accuracy and loss values
    accuracy_list.append(accuracy)
    loss_list.append(loss)
    
    #Saving the weights at the last epoch
    if epoch == num_epochs-1:
        Weights[f'epoch_{epoch}'] = {
        'w_input_hidden': w_input_hidden,
        'b_hidden': b_hidden,
        'w_hidden_output': w_hidden_output,
        'b_output': b_output,
        'w_output_classify': w_output_classify,
        'b_classify': b_classify
                                    }

## Loss and Accuracy Plots

In [None]:
# Create a figure with two subplots
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(20, 6))
fig.subplots_adjust(wspace=0.3)

# Plot Loss
ax1.plot(loss_list, color='tab:blue')
ax1.set_title("Loss Plot")
ax1.set_xlabel('Epoch')
ax1.set_ylabel('Loss')
ax1.spines['top'].set_visible(False)
ax1.spines['right'].set_visible(False)

# Plot Accuracy
ax2.plot(accuracy_list, color='tab:orange')
ax2.set_title("Accuracy Plot")
ax2.set_xlabel('Epoch')
ax2.set_ylabel('Accuracy')
ax2.spines['top'].set_visible(False)
ax2.spines['right'].set_visible(False)

# Add a suptitle
plt.suptitle("Training Progress", fontsize=16)

# Show the plots
plt.tight_layout()
plt.show()

- We can see a steep increase in the accuracy, which is not a sign of stable training and may be caused by the learning rate. The learning rate is a hyperparameter that controls how much to change the model in response to the estimated error each time the model weights are updated.

- Choosing the learning rate is challenging: a value too small may result in a long training process that could get stuck, whereas a value too large may result in learning a sub-optimal set of weights too fast or in an unstable training process. `TensorFlow` and `PyTorch` both provide learning rate schedulers that implement [learning rate decay](https://machinelearningmastery.com/understand-the-dynamics-of-learning-rate-on-deep-learning-neural-networks/), which helps avoid steep changes in the metrics that can cause the network to overshoot the minimum and follow a haphazard path while searching for good weights.
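
To see how strongly a decay schedule affects the step size, the sketch below reproduces the update used in the training loop (multiplying the rate by `1 / (1 + decay_rate)` every epoch) and compares it with time-based decay computed from the initial rate. The time-based variant and its constant `decay_k` are assumptions added for comparison; they are not part of this notebook's training loop.

```python
initial_lr = 0.001
decay_rate = 5
num_epochs = 10

# Schedule used in the training loop: the rate is multiplied by
# 1 / (1 + decay_rate) every epoch, so it shrinks geometrically.
lr = initial_lr
geometric = []
for epoch in range(num_epochs):
    lr = (1 / (1 + decay_rate)) * lr
    geometric.append(lr)

# A gentler alternative (an assumption, not the notebook's method):
# time-based decay, always computed from the initial rate.
decay_k = 0.1  # hypothetical decay constant
time_based = [initial_lr / (1 + decay_k * epoch) for epoch in range(num_epochs)]

print(f"geometric, last epoch:  {geometric[-1]:.2e}")
print(f"time-based, last epoch: {time_based[-1]:.2e}")
```

With `decay_rate = 5` the geometric schedule divides the rate by 6 each epoch, so after 10 epochs it is roughly `1.65e-11` and the weights barely move in the later epochs. The time-based variant with `decay_k = 0.1` ends at about `5.26e-04`.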

In [None]:
print(f'The accuracy after training the neural network is {round(accuracy_list[-1]*100,2)}%')

## Predictions


- During forward propagation, the neural network calculates, for each input example, the probability that it belongs to the positive class. Because the network ends in a single sigmoid neuron, `a_classify` has one row of shape `(1, batch_size)`, so the predicted class is obtained by thresholding that probability at 0.5, the same rule `compute_accuracy` uses during training.

- Let's now make predictions. But how do we do that? Well, time to define more functions. The `get_predictions` function takes the array of probabilities as input and returns 1 where the probability is greater than 0.5 and 0 otherwise. Taking `np.argmax` along axis 0 would not work here: with a single output row it returns 0 for every example. The `make_predictions` function takes the following inputs:
    - `X_test`: The input data (features) on which predictions need to be made.
    - Weights and biases for different layers of the neural network: `w_input_hidden`, `b_hidden`, `w_hidden_output`, `b_output`, `w_output_classify`, `b_classify`.


Here's the process that takes place within the `make_predictions` function:

1. **Forward Propagation:** The `forward_propagation` function is called using the provided weights and biases, along with the `X_test` data. This calculates the activations of the different layers in the neural network.

2. **Getting Probabilities:** The output of the forward propagation contains the activations of the last layer, which represent the predicted probability of the positive class. The variable `_` is used to capture the intermediate activations that we don't need for predictions. The variable `probs` contains the probability value for each example.

3. **Getting Predictions:** The `get_predictions` function is called with the `probs` array as input. This function converts the probability values into predicted class labels (0 or 1) by applying the 0.5 threshold.

4. **Returning Predictions:** The predicted class labels are stored in the `predictions` variable and then returned as the output of the `make_predictions` function.

In [None]:
def get_predictions(probs, threshold=0.5):
    # probs has shape (1, n_examples): threshold the sigmoid output and flatten to (n_examples,)
    return (probs > threshold).astype(int).ravel()

def make_predictions(X_test, w_input_hidden, b_hidden, w_hidden_output, b_output, w_output_classify, b_classify):
    # Only the final activations (probabilities) are needed for prediction
    _, _, probs = forward_propagation(X_test.T, w_input_hidden, b_hidden,
                                      w_hidden_output, b_output, w_output_classify, b_classify)
    predictions = get_predictions(probs)
    return predictions

In [None]:
preds = make_predictions(X_test, w_input_hidden, b_hidden, w_hidden_output, b_output, w_output_classify, b_classify)

## Evaluating the Performance

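We can see the neural network misclassifies only about 4% of the test cases. Keep in mind that on an imbalanced dataset a low error rate alone can be misleading, which is why the true and false positive rates are checked further below. Feel free to go ahead and play around with the architecture or the hyperparameters to see how far you can take this.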

In [None]:
misclassified_count = len(y_test[y_test != preds])
total_cases = len(y_test)
error_rate = misclassified_count / total_cases * 100
print(f"{misclassified_count} misclassified cases out of {total_cases}, error rate : {round(error_rate,2)}%")

In [None]:
misclassified = y_test[y_test != preds]
misclassification_counts = {}
for class_label in set(y_test):
    misclassification_counts[class_label] = np.sum(misclassified == class_label)
print('Misclassified Classes Value Counts')
misclassification_counts

Another way to evaluate the model is to look at how many of the actual positive cases it classifies correctly, and how many negative cases it wrongly flags as positive.

- True Positive Rate (TPR), also known as Sensitivity or Recall, measures a model's ability to correctly identify positive cases: TP / (TP + FN).
- False Positive Rate (FPR) quantifies how often the model incorrectly labels negative cases as positive: FP / (FP + TN).
- These metrics play a crucial role in evaluating binary classification models, particularly when dealing with imbalanced datasets.
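
The small sketch below works through both rates on ten made-up labels, two of which are positive. The arrays are invented for illustration and are not taken from the stroke dataset.

```python
import numpy as np

y_true = np.array([1, 1, 0, 0, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0])

# Count each cell of the confusion matrix, treating 1 as the positive class
tp = np.sum((y_true == 1) & (y_pred == 1))
fn = np.sum((y_true == 1) & (y_pred == 0))
fp = np.sum((y_true == 0) & (y_pred == 1))
tn = np.sum((y_true == 0) & (y_pred == 0))

tpr = tp / (tp + fn)
fpr = fp / (fp + tn)
accuracy = (tp + tn) / len(y_true)

print(f"accuracy={accuracy:.2f}, TPR={tpr:.2f}, FPR={fpr:.2f}")
# accuracy=0.90, TPR=0.50, FPR=0.00
```

Accuracy is 90% even though the model finds only half of the positive cases, which is exactly the situation these two rates are meant to expose.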

In [None]:
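from sklearn.metrics import confusion_matrix

def evaluate(y, y_preds):
    # Use the function arguments rather than the global y_test / preds.
    # With labels=[0, 1], scikit-learn lays the matrix out as [[TN, FP], [FN, TP]].
    cf = confusion_matrix(y, y_preds, labels=[0, 1])
    TN, FP, FN, TP = cf.ravel()
    TPR = TP / (TP + FN) if (TP + FN) > 0 else 0  # True Positive Rate
    FPR = FP / (FP + TN) if (FP + TN) > 0 else 0  # False Positive Rate
    print(f'True Positive Rate : {round(TPR,4)}')
    print(f'False Positive Rate : {round(FPR,4)}')

In [None]:
y_train_preds = make_predictions(X_train, w_input_hidden, b_hidden, w_hidden_output, b_output, 
                                 w_output_classify, b_classify)
print('Model Evaluation, Train Dataset')
# Pass the true training labels, not the features
evaluate(y_train, y_train_preds)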

In [None]:
y_preds = make_predictions(X_test, w_input_hidden, b_hidden, w_hidden_output, b_output, w_output_classify, b_classify)
print('Model Evaluation, Test Dataset')
evaluate(y_test, y_preds)

An ideal case is a TPR equal to 1 and an FPR equal to 0.

## Conclusion

- In conclusion, this endeavor into the realm of neural networks without the aid of mainstream deep learning libraries has provided a foundational understanding of their mechanics and importance. While the world of artificial intelligence and machine learning has evolved significantly, it is imperative to grasp the basics of neural networks, as they have become integral to our daily lives. 

- Neural networks, with their ability to recognize patterns and relationships within complex datasets, underpin a multitude of applications that touch various aspects of modern living. From image and speech recognition to recommendation systems and even autonomous vehicles, neural networks have revolutionized industries and transformed the way we interact with technology.

- By using numpy and other auxiliary libraries, we built a rudimentary three-layer neural network to predict whether a patient is likely to have a stroke. This provided us with a glimpse into the core concepts of data preprocessing, parameter initialization, activation functions, forward and backward propagation, and the significance of hyperparameters.

- While this exploration offered only a glimpse into the vast world of neural networks, it lays a solid foundation for delving further into the intricacies of deep learning. As AI continues to reshape industries and societies, a fundamental understanding of neural networks becomes pivotal for harnessing their potential and contributing to the ongoing technological evolution.

*P.S.* I also have a notebook where I implemented [Gradient Descent from Scratch](https://www.kaggle.com/code/bhatnagardaksh/gradient-descent-from-scratch), which you might want to look at if you're interested.

Thanks for taking the time to go through the notebook. Please consider leaving an upvote and a follow if you liked my work.