## Artificial Neural Networks

Neural networks lie at the heart of deep learning algorithms. An artificial neural network (ANN) is a machine learning method that mimics the structure and working of neurons in the human brain.

ANNs are built from layers of nodes: an input layer, one or more hidden layers, and an output layer. Each node, or artificial neuron, is connected to nodes in the next layer; each connection has an associated weight, and each neuron has a threshold. On receiving its inputs, a neuron compares their weighted sum to its threshold. If the sum exceeds the threshold, the node is activated and passes its output on to the neurons it connects to in the next layer.
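As a rough illustration of the thresholding described above, here is a minimal sketch of a single neuron in plain Python. The function name `neuron_output` and all input, weight and threshold values are made up for the example.

```python
def neuron_output(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum > threshold else 0

# Hypothetical values, chosen only for illustration.
inputs = [1.0, 0.5, -1.0]
weights = [0.8, 0.4, 0.3]   # weighted sum = 0.8 + 0.2 - 0.3 = 0.7

fires = neuron_output(inputs, weights, threshold=0.5)      # 0.7 > 0.5
stays_off = neuron_output(inputs, weights, threshold=1.0)  # 0.7 <= 1.0
print(fires, stays_off)
```

Modern networks usually replace the hard threshold with a smooth activation function and a learned bias, which is what makes gradient-based training possible.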

### Key Terminology

1. **Neurons (Nodes):** Basic units, receive input, process with weights, produce output, organized in layers.

2. **Layers:** Neural network organization: input, hidden, output; processes data sequentially.

3. **Weights and Biases:** Weights represent the strength of the connections between neurons and are adjusted during training to minimize the error between predicted and correct output; a bias is added to each neuron's weighted sum to shift its activation threshold, and is learned in the same way.

4. **Activation Functions:** Applied to the weighted sum of a neuron's inputs plus its bias to determine whether, and how strongly, the neuron is activated; they introduce the non-linearity that lets the network model complex relationships; e.g., sigmoid, tanh, ReLU.

5. **Feedforward and Backpropagation:** During the training phase, data is fed forward through the network to produce predictions. The error between predicted and actual output is then propagated backward through the network to compute how each weight and bias should change, a process called backpropagation. This is repeated iteratively to minimize the error and improve the network's performance.

6. **Deep Learning:** Neural networks with multiple layers; deep learning learns hierarchical representations for complex tasks.

7. **CNNs and RNNs:** Specialized neural networks: CNNs for images, RNNs for sequences.

### Steps

1. **Initialization:**
   - Set initial values for weights and biases.
   - Random initialization is common to break symmetry.

2. **Forward Propagation:**
   - Input data is fed forward through the network.
   - Neurons in each layer process inputs using weights and biases.
   - Activation function introduces non-linearity.

3. **Loss Calculation:**
   - Compare the network's output with the actual target values.
   - Calculate the loss (error) using a predefined loss function.

4. **Backpropagation:**
   - Calculate gradients of the loss with respect to weights and biases.
   - Update weights and biases to minimize the loss.
   - Utilize optimization algorithms like gradient descent.

5. **Iterative Training:**
   - Repeat forward propagation, loss calculation, and backpropagation.
   - Adjust weights and biases iteratively to improve performance.

6. **Validation:**
   - Use a separate validation dataset to assess generalization.
   - Prevent overfitting by monitoring performance on unseen data.

7. **Hyperparameter Tuning:**
   - Adjust hyperparameters (learning rate, batch size, etc.) for optimization.
   - Fine-tune to balance convergence speed and model stability.

8. **Testing:**
   - Evaluate the trained model on a separate test dataset.
   - Assess the model's performance on previously unseen data.
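Steps 1 to 5 can be sketched in a few lines of NumPy. This is a toy example under stated assumptions, not the model used later in this notebook: the data is synthetic, the network has one hidden layer of 4 tanh units, and the learning rate and epoch count are arbitrary. Validation, tuning and testing (steps 6 to 8) are left out.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 2 features, label is 1 when their sum is positive.
X = rng.normal(size=(200, 2))
y = (X.sum(axis=1) > 0).astype(float).reshape(-1, 1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Step 1: random weights break symmetry; biases start at zero.
W1, b1 = rng.normal(scale=0.5, size=(2, 4)), np.zeros((1, 4))
W2, b2 = rng.normal(scale=0.5, size=(4, 1)), np.zeros((1, 1))

learning_rate = 0.5
losses = []
for epoch in range(200):  # Step 5: repeat steps 2-4
    # Step 2: forward propagation
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)

    # Step 3: binary cross-entropy loss
    p_safe = np.clip(p, 1e-12, 1 - 1e-12)
    loss = -np.mean(y * np.log(p_safe) + (1 - y) * np.log(1 - p_safe))
    losses.append(loss)

    # Step 4: backpropagation (chain rule, output layer first)
    dz2 = (p - y) / len(X)
    dW2, db2 = h.T @ dz2, dz2.sum(axis=0, keepdims=True)
    dz1 = (dz2 @ W2.T) * (1 - h ** 2)   # derivative of tanh is 1 - tanh^2
    dW1, db1 = X.T @ dz1, dz1.sum(axis=0, keepdims=True)

    # Gradient descent update
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

The printed loss should fall over the 200 epochs; frameworks such as Keras automate the gradient computation and update shown here.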

## Activation functions

Identity function
* linear
* passes the incoming signal through as the outgoing signal without change
* g(x) = x

Non-Linear Functions

Step
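* domain: real numbers
* range: {0, 1}
* $g(x) = \begin{cases} 1, & \text{if } x > \theta \\ 0, & \text{otherwise} \end{cases}$, where $x$ is the net input and $\theta$ is the threshold (often 0)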

Sigmoid

* S-shaped continuous function
* domain: real numbers
* range: (0, 1)
* $g(x) = \frac{1}{1 + e^{-x}}$
* also known as the logistic function
* In a machine learning model, the output of a linear or non-linear function is passed through a sigmoid to obtain a predicted probability for a binary classification problem.
* The sigmoid is monotonic, continuous and differentiable. Three properties make it especially helpful:
    1. **It maps the feature space to probabilities:** it transforms the output of a linear or non-linear function (e.g. the weighted sum $\sum_i w_i x_i$) into a predicted probability for a binary classification problem.

        --> as x approaches +infinity (very large), the output approaches 1,

        --> as x approaches -infinity, the output approaches 0,

        --> at x = 0, the output is 0.5
    2. **The exponential establishes a non-linear relationship:** the curve saturates, so inputs of large magnitude produce outputs close to 1 or 0.
    3. **It is differentiable:** this allows gradient calculation, which is important for optimization (gradient descent in general ML and backpropagation in neural networks).
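
Since differentiability is what makes the sigmoid usable with gradient-based training, it may help to see the derivative written out. A short derivation, using the same $g(x)$ as above:

$$
g'(x) = \frac{d}{dx}\left(1 + e^{-x}\right)^{-1} = \frac{e^{-x}}{\left(1 + e^{-x}\right)^{2}} = \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}} = g(x)\,\bigl(1 - g(x)\bigr)
$$

The derivative peaks at $x = 0$, where $g'(0) = 0.5 \times 0.5 = 0.25$, and approaches 0 as $|x|$ grows. That shrinking gradient in the saturated regions is the source of the vanishing gradient problem in deep sigmoid networks.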

Tanh

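* domain: real numbers
* range: (-1, 1)
* often preferred over sigmoid in hidden layers because:
 - the output is symmetric (zero-centered) around 0
 - gradients are larger near 0 (maximum slope 1 versus 0.25 for sigmoid), although tanh still saturates, so vanishing gradients are reduced rather than eliminated
 - negative inputs map to negative outputs and positive inputs to positive outputs
* $g(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$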

ReLU

* very commonly used
* range: [0, inf)
* easy to implement, computationally efficient
* promotes sparsity (sets negative values to 0) and often converges faster
* g(x) = max(0, x)
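
The activation functions listed in this section can be written in a few lines of NumPy. This is a sketch for illustration only; the function names and the sample input are my own, and Keras ships its own implementations.

```python
import numpy as np

def identity(x):
    return x

def step(x, threshold=0.0):
    # 1 where the input exceeds the threshold, 0 elsewhere
    return np.where(x > threshold, 1.0, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0.0, x)

x = np.array([-2.0, 0.0, 2.0])   # hypothetical sample inputs
for fn in (identity, step, sigmoid, tanh, relu):
    print(f"{fn.__name__:>8}: {fn(x)}")
```

One relationship worth noting is that tanh is a rescaled sigmoid, `tanh(x) == 2 * sigmoid(2 * x) - 1`, which is why the two curves share the same S shape.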


## Steps to implement a NN using Keras

1. Define your model. Create a `Sequential` model and add layers.

2. Compile your model. Specify the loss function and optimizer.

3. Fit your model. Train the model on your data.

4. Make predictions. Use the model to generate predictions on new data.

### Import libraries and dataset

dataset src: https://www.kaggle.com/datasets/fedesoriano/stroke-prediction-dataset/data

In [276]:
import pandas as pd
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras import optimizers
from keras.optimizers import Adam
from keras.callbacks import EarlyStopping
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.impute import KNNImputer
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from imblearn.over_sampling import SMOTE

In [231]:
data = pd.read_csv("/content/healthcare-dataset-stroke-data.csv")

### EDA

In [232]:
data.stroke.value_counts()

0    4861
1     249
Name: stroke, dtype: int64
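
The counts above show a heavily imbalanced target. A quick calculation (plain Python, using only the two counts printed by `value_counts()`) makes the consequence concrete: a model that always predicts "no stroke" would already score about 95% accuracy, so accuracy alone is a poor guide here.

```python
counts = {0: 4861, 1: 249}   # copied from the value_counts() output above
total = sum(counts.values())

positive_rate = counts[1] / total               # share of stroke cases
majority_baseline_accuracy = counts[0] / total  # always predicting class 0

print(f"stroke cases: {positive_rate:.2%}")
print(f"always-predict-0 accuracy: {majority_baseline_accuracy:.2%}")
```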

In [233]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5110 entries, 0 to 5109
Data columns (total 12 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   id                 5110 non-null   int64  
 1   gender             5110 non-null   object 
 2   age                5110 non-null   float64
 3   hypertension       5110 non-null   int64  
 4   heart_disease      5110 non-null   int64  
 5   ever_married       5110 non-null   object 
 6   work_type          5110 non-null   object 
 7   Residence_type     5110 non-null   object 
 8   avg_glucose_level  5110 non-null   float64
 9   bmi                4909 non-null   float64
 10  smoking_status     5110 non-null   object 
 11  stroke             5110 non-null   int64  
dtypes: float64(3), int64(4), object(5)
memory usage: 479.2+ KB


In [234]:
data.describe()

| | id | age | hypertension | heart_disease | avg_glucose_level | bmi | stroke |
|---|---|---|---|---|---|---|---|
| count | 5110.0 | 5110.0 | 5110.0 | 5110.0 | 5110.0 | 4909.0 | 5110.0 |
| mean | 36517.829354 | 43.226614 | 0.097456 | 0.054012 | 106.147677 | 28.893237 | 0.048728 |
| std | 21161.721625 | 22.612647 | 0.296607 | 0.226063 | 45.28356 | 7.854067 | 0.21532 |
| min | 67.0 | 0.08 | 0.0 | 0.0 | 55.12 | 10.3 | 0.0 |
| 25% | 17741.25 | 25.0 | 0.0 | 0.0 | 77.245 | 23.5 | 0.0 |
| 50% | 36932.0 | 45.0 | 0.0 | 0.0 | 91.885 | 28.1 | 0.0 |
| 75% | 54682.0 | 61.0 | 0.0 | 0.0 | 114.09 | 33.1 | 0.0 |
| max | 72940.0 | 82.0 | 1.0 | 1.0 | 271.74 | 97.6 | 1.0 |


In [237]:
data.shape

(5110, 12)

### Preprocessing

In [239]:
# One-hot encode the categorical columns
categorical_cols = ['gender', 'ever_married', 'work_type', 'Residence_type', 'smoking_status']
encoded_data = pd.get_dummies(data, columns=categorical_cols)

In [240]:
encoded_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5110 entries, 0 to 5109
Data columns (total 23 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   id                              5110 non-null   int64  
 1   age                             5110 non-null   float64
 2   hypertension                    5110 non-null   int64  
 3   heart_disease                   5110 non-null   int64  
 4   avg_glucose_level               5110 non-null   float64
 5   bmi                             4909 non-null   float64
 6   stroke                          5110 non-null   int64  
 7   gender_Female                   5110 non-null   uint8  
 8   gender_Male                     5110 non-null   uint8  
 9   gender_Other                    5110 non-null   uint8  
 10  ever_married_No                 5110 non-null   uint8  
 11  ever_married_Yes                5110 non-null   uint8  
 12  work_type_Govt_job              51

In [241]:
encoded_data.shape

(5110, 23)

In [242]:
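# Split features and target value
# Note: X still contains the `id` column, a row identifier rather than a
# clinical feature; dropping it would change the shapes below (22 -> 21).
y = encoded_data['stroke']
X = encoded_data.drop(columns=['stroke'])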

In [243]:
X.shape

(5110, 22)

In [246]:
# Handling na values in bmi
knn_imp = KNNImputer(n_neighbors=5)
X = knn_imp.fit_transform(X)

In [261]:
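# Handle class imbalance with synthetic oversampling (SMOTE)
# Caveat: resampling before the train/test split means a synthetic point and
# the real rows it was interpolated from can land on opposite sides of the
# split; resampling only the training portion after splitting is safer.
smote = SMOTE(random_state=42)
X_resampled, y_resampled = smote.fit_resample(X, y)
X_resampled = pd.DataFrame(X_resampled)
y_resampled = pd.DataFrame(y_resampled)
resampled_data = pd.concat([X_resampled, y_resampled], axis=1)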
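
To give a feel for what SMOTE does, here is a simplified sketch of its core idea: a synthetic minority sample is placed at a random position on the line segment between a minority point and one of its nearest minority neighbours. The function `smote_like_sample` and the toy points are hypothetical, and the sketch assumes the points are distinct; the imbalanced-learn implementation handles more cases and produces many samples at once.

```python
import numpy as np

def smote_like_sample(minority, k=2, rng=None):
    """Create one synthetic point by interpolating between a random minority
    sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng() if rng is None else rng
    base = minority[rng.integers(len(minority))]
    dists = np.linalg.norm(minority - base, axis=1)
    neighbour_idx = np.argsort(dists)[1:k + 1]   # index 0 is the point itself
    neighbour = minority[rng.choice(neighbour_idx)]
    gap = rng.random()                           # uniform in [0, 1)
    return base + gap * (neighbour - base)

# Hypothetical minority-class points in a 2-feature space.
minority = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 3.0], [3.0, 2.5]])
synthetic = smote_like_sample(minority, k=2, rng=np.random.default_rng(42))
print(synthetic)
```

Because every synthetic point is a blend of two real ones, it always lies inside the region already covered by the minority class.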

In [262]:
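# Standardization (zero mean, unit variance per feature)
# Caveat: the scaler is fit on all rows before the split, so test rows influence the scaling.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_resampled)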
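
For reference, standardization is just `(x - mean) / std` per column. The sketch below reproduces it in NumPy on made-up arrays and shows a common alternative ordering in which the statistics come from the training rows only and are then reused for the test rows. The array values are hypothetical.

```python
import numpy as np

# Hypothetical data: 3 training rows and 1 test row, 2 features each.
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test = np.array([[2.0, 40.0]])

# "Fit": learn per-column statistics from the training rows only.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)   # population std (ddof=0), which StandardScaler also uses

# "Transform": apply the same statistics to both sets.
X_train_scaled = (X_train - mean) / std
X_test_scaled = (X_test - mean) / std

print(X_train_scaled)
print(X_test_scaled)
```

With scikit-learn the equivalent would be `scaler.fit_transform(X_train)` followed by `scaler.transform(X_test)`.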

In [263]:
#Train test split
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y_resampled, test_size = 0.2, random_state=42)

### Model Creation, compilation and training

In [268]:
X_train.shape

(7777, 22)

In [269]:
# Model creation
model = Sequential()
# First hidden layer; input_shape declares the 22 input features
model.add(Dense(64, activation='relu', input_shape=(22,)))
model.add(Dropout(0.6))  # randomly zero 60% of activations during training
# Further hidden layers, each followed by dropout
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.6))
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.6))
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.6))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.6))
model.add(Dense(32, activation='relu'))
model.add(Dropout(0.6))
model.add(Dense(1, activation='sigmoid'))  # output layer: predicted probability of stroke
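
The `Dropout(0.6)` layers above can be pictured with a small NumPy sketch of "inverted" dropout, the variant Keras applies: during training a random 60% of activations are zeroed and the survivors are scaled up by `1 / (1 - rate)` so the expected sum is unchanged; at inference the layer does nothing. The helper `dropout` and the all-ones input are illustrative only.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero a `rate` fraction of units and rescale the rest."""
    if not training or rate == 0.0:
        return activations
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob   # True = keep the unit
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
a = np.ones((1000, 64))                  # hypothetical batch of activations
dropped = dropout(a, rate=0.6, rng=rng)

print(f"fraction zeroed: {(dropped == 0).mean():.3f}")
print(f"mean activation: {dropped.mean():.3f}")
```

A rate of 0.6 after every layer is fairly aggressive; lower rates are a reasonable thing to try if the network underfits.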

In [270]:
model.summary() #display model contents

Model: "sequential_13"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_67 (Dense)            (None, 64)                1472      
                                                                 
 dropout_42 (Dropout)        (None, 64)                0         
                                                                 
 dense_68 (Dense)            (None, 128)               8320      
                                                                 
 dropout_43 (Dropout)        (None, 128)               0         
                                                                 
 dense_69 (Dense)            (None, 256)               33024     
                                                                 
 dropout_44 (Dropout)        (None, 256)               0         
                                                                 
 dense_70 (Dense)            (None, 128)             
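
The `Param #` column can be checked by hand: a Dense layer has one weight per input-unit pair plus one bias per unit. The sketch below applies that formula to the layer sizes defined in the model. The first three values match the summary rows shown above; the remaining ones are computed from the formula, not read from the truncated output.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# 22 input features followed by the Dense layer widths from the model definition.
layer_sizes = [22, 64, 128, 256, 128, 64, 32, 1]
params = [dense_params(i, o) for i, o in zip(layer_sizes, layer_sizes[1:])]

for (i, o), p in zip(zip(layer_sizes, layer_sizes[1:]), params):
    print(f"Dense {i:>3} -> {o:<3}: {p} parameters")
print("total:", sum(params))
```

Dropout layers add no parameters, which is why they show 0 in the summary.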

In [271]:
#Model Compilation
model.compile(optimizer=Adam(learning_rate=0.0001), loss='binary_crossentropy', metrics=['acc'])
early_stop = EarlyStopping(monitor='val_loss', patience=30, restore_best_weights=True)
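
The `binary_crossentropy` loss named above penalizes confident wrong predictions far more than confident right ones. A minimal pure-Python sketch of the formula, averaged over samples, is shown below; the function and the example probabilities are illustrative, and the clipping constant `eps` is an assumption to avoid `log(0)`.

```python
import math

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)   # keep p away from exactly 0 or 1
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Hypothetical predictions for one positive and one negative sample.
confident_right = binary_crossentropy([1, 0], [0.9, 0.1])
confident_wrong = binary_crossentropy([1, 0], [0.1, 0.9])
print(f"{confident_right:.4f} vs {confident_wrong:.4f}")
```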

In [272]:
model.fit(X_train, y_train, epochs=1000, callbacks=[early_stop], validation_split=0.2)

Epoch 1/1000
...
Epoch 69/1000


<keras.src.callbacks.History at 0x7a103302c4f0>
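
Training ended at epoch 69 of a possible 1000, which is the early-stopping callback at work. The sketch below imitates the patience logic in plain Python, assuming the default behaviour where any decrease in validation loss counts as an improvement. The function `early_stopping_epoch` and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 1-indexed.

    Training stops once `patience` consecutive epochs pass without a new
    best validation loss.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch

# Hypothetical validation losses: the best value occurs at epoch 3.
val_losses = [0.9, 0.7, 0.6, 0.65, 0.62, 0.61, 0.63]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=3)
print(stop_epoch, best_epoch)
```

Under this logic, stopping at epoch 69 with `patience=30` would suggest the best validation loss was reached around epoch 39, and `restore_best_weights=True` rolls the model back to that point.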

### Model evaluation

In [281]:
y_pred_prob = model.predict(X_test)
y_pred = np.round(y_pred_prob).astype(int).ravel()



In [282]:
# Calculate evaluation metrics
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

# Print the evaluation metrics
print(f"Accuracy: {accuracy}")
print(f"Precision: {precision}")
print(f"Recall: {recall}")
print(f"F1-Score: {f1}")

Accuracy: 0.6976863753213368
Precision: 0.6243489583333334
Recall: 0.988659793814433
F1-Score: 0.7653631284916201
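
To see how the four numbers relate, the sketch below recomputes them from confusion-matrix counts. The counts are not printed anywhere in the notebook; they are values I inferred as consistent with the reported metrics and a 1945-row test set, so treat them as an assumption.

```python
# Inferred counts (an assumption, not notebook output): true/false positives/negatives.
tp, fp, fn, tn = 959, 577, 11, 398

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)   # of predicted strokes, how many were real
recall = tp / (tp + fn)      # of real strokes, how many were caught
f1 = 2 * precision * recall / (precision + recall)

print(f"Accuracy:  {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall:    {recall:.4f}")
print(f"F1-Score:  {f1:.4f}")
```

Read this way, the model flags almost every stroke case (very high recall) but also raises many false alarms (modest precision), which is what pulls accuracy down to about 70%.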
