# Neural Networks: Theory, Concepts, and Implementation

<b><h3>Introduction:</h3></b>
Neural networks, inspired by the intricate structure and functioning of the human brain, stand as a powerful paradigm in the realm of artificial intelligence and machine learning.

At their core, neural networks are computational models designed to mimic the interconnected neurons that characterize the human brain's neural architecture. These networks have proven to be exceptionally adept at tackling complex tasks, particularly in areas such as image and speech recognition, natural language processing, and pattern analysis.

At a fundamental level, a neural network consists of layers of interconnected nodes, or "neurons," each contributing to the transformation and interpretation of input data. The network learns from examples, adjusting its parameters through a process known as training. This learning process enables neural networks to discern patterns, make predictions, and generalize insights from the data they are exposed to.

The versatility of neural networks shows in their ability to adapt to diverse tasks, ranging from simple classifications to intricate decision-making processes. Using calculus (to compute gradients) and linear algebra (to express layers as matrix operations), neural networks iteratively refine their internal representations during training, which leads to increasingly accurate predictions.

As technological advancements continue to push the boundaries of artificial intelligence, neural networks have become instrumental in powering innovations across industries, from healthcare and finance to autonomous vehicles and beyond. Understanding the principles and mechanisms that govern neural networks is pivotal for grasping the potential they hold in transforming the landscape of intelligent computing. This introduction sets the stage for a deeper exploration into the intricate workings, applications, and evolving frontiers of neural networks in the pursuit of creating intelligent systems that can learn and adapt.

<b><h3>Neural Network Basics:</h3></b>

* Neurons: The basic building blocks of neural networks. Neurons receive inputs, apply weights to them, and pass the result through an activation function to produce an output.
* Layers: Neurons are organized into layers. The input layer receives the features of the input data, hidden layers process this information, and the output layer produces the final result.
* Weights and Biases: Neural networks learn by adjusting weights and biases during training. Weights control the influence of input features, and biases shift the output.
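
As a minimal sketch of the neuron described above, the snippet below computes a weighted sum of the inputs, adds a bias, and applies a sigmoid activation. The function name `neuron_output` and the numeric values are made up for illustration and do not come from any dataset.

```python
import math

def neuron_output(inputs, weights, bias):
    """Weighted sum of inputs plus bias, passed through a sigmoid activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative values only (assumptions, not taken from the Titanic data).
inputs = [0.5, -1.0, 2.0]
weights = [0.4, 0.3, -0.2]
bias = 0.1

# z = 0.2 - 0.3 - 0.4 + 0.1 = -0.4, so the output is sigmoid(-0.4), roughly 0.40
print(neuron_output(inputs, weights, bias))
```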

<b><h3>Key Concepts:</h3></b>

1. Activation Functions:

    Sigmoid Function:
    $$\sigma(z) = \frac{1}{1 + e^{-z}}$$
    Maps the weighted sum of inputs to values between 0 and 1. Used in the output layer for binary classification.

    Other Activation Functions: ReLU (Rectified Linear Unit), Tanh, Softmax for different scenarios.
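
The activation functions named above can be sketched in a few lines of plain Python. This is an illustrative version for scalars and small lists; real frameworks apply the same formulas to whole arrays.

```python
import math

def sigmoid(z):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    """Pass positive values through unchanged and clamp negatives to 0."""
    return max(0.0, z)

def tanh(z):
    """Squash a real number into the range (-1, 1)."""
    return math.tanh(z)

def softmax(zs):
    """Turn a list of scores into probabilities that sum to 1."""
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

print(sigmoid(0.0))   # 0.5
print(relu(-2.0))     # 0.0
print(tanh(0.0))      # 0.0
print(softmax([1.0, 2.0, 3.0]))  # three probabilities, largest for the score 3.0
```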

2. Binary Cross-Entropy Loss:

  The Binary Cross-Entropy Loss, also known as log loss or logistic loss, measures how well a binary classification model predicts the probability of the positive class. Averaged over $N$ instances, with true labels $y_i$ and predicted probabilities $\hat{y}_i$, it is given by:

  $$\text{Binary Cross-Entropy Loss} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i)\right]$$

  Theoretical Insights:

    * The Binary Cross-Entropy Loss is designed to measure how well the predicted probabilities align with the true labels, penalizing the model more severely for confident yet incorrect predictions.

    * The loss is higher when the predicted probability diverges from the true label. It is particularly effective for models where the goal is to estimate probabilities, such as in logistic regression or the output layer of a neural network for binary classification.

    * During training, the goal is to minimize this loss by adjusting the model's parameters (weights and biases) using optimization algorithms like gradient descent.

    * The formula is derived from information theory and maximum likelihood estimation, aiming to maximize the likelihood of the observed labels given the predicted probabilities.
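
To make the penalty for confident but wrong predictions concrete, here is a small sketch of the loss in plain Python. The `eps` clipping is an added assumption to keep `log()` finite, and the example labels and probabilities are invented.

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy over N instances.

    eps clips predictions away from exactly 0 or 1 so log() stays finite.
    """
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)

y_true = [1, 0, 1, 0]
confident_right = [0.9, 0.1, 0.8, 0.2]
confident_wrong = [0.1, 0.9, 0.2, 0.8]

print(binary_cross_entropy(y_true, confident_right))  # small loss, about 0.16
print(binary_cross_entropy(y_true, confident_wrong))  # much larger loss, about 1.96
```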

3. Optimization Algorithm:
  * Gradient Descent:

    Objective: Adjusts weights and biases to minimize the loss function.
    Learning Rate: A hyperparameter that determines the step size in the weight update. It influences the convergence speed and stability of the algorithm.

    Batch Gradient Descent: Computes the gradient of the loss with respect to each parameter using the entire training dataset.
    Batch Size: Number of training examples used in one iteration. Larger batches provide more accurate gradients but require more computation.

  * Stochastic Gradient Descent (SGD):

    Objective: A variant of gradient descent that speeds up training by estimating the gradient from a single randomly chosen training example at a time instead of the full dataset. Because parameters are updated far more often, it is particularly effective for large datasets, although individual updates are noisier.

  * Adam (Adaptive Moment Estimation):

    Objective: Adaptive optimization algorithm that combines ideas from momentum and RMSprop.

  * RMSprop (Root Mean Square Propagation):

    Objective: Adaptive optimization algorithm that adjusts the learning rates for each parameter.

Additional Insights:

  * The choice of optimizer depends on the specific problem, dataset size, and computational resources. Adam and RMSprop are often preferred for their adaptability to different scenarios.

  * Parameters like learning rate need careful tuning for optimal performance.

  * Adaptive optimizers often converge faster than traditional gradient descent but may require more careful tuning.

Understanding the nuances of optimization algorithms is crucial for training neural networks effectively, as they play a pivotal role in adjusting model parameters to minimize the loss and improve predictive accuracy.
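
The difference between batch gradient descent, mini-batch updates, and SGD comes down to how many examples feed each parameter update. The sketch below assumes NumPy, a logistic-regression model, and synthetic toy data; `train_logistic` is a hypothetical helper, not part of any library.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, y, learning_rate=0.1, batch_size=None, epochs=200, seed=0):
    """Fit logistic regression with (mini-)batch gradient descent.

    batch_size=None uses the full dataset per update (batch gradient descent);
    batch_size=1 gives stochastic gradient descent.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    batch_size = m if batch_size is None else batch_size
    for _ in range(epochs):
        order = rng.permutation(m)  # reshuffle the examples every epoch
        for start in range(0, m, batch_size):
            idx = order[start:start + batch_size]
            error = sigmoid(X[idx] @ w + b) - y[idx]  # dLoss/dz for sigmoid + BCE
            grad_w = X[idx].T @ error / len(idx)
            grad_b = error.mean()
            w -= learning_rate * grad_w  # step against the gradient
            b -= learning_rate * grad_b
    return w, b

# Toy, synthetic data: the label is 1 when the single feature is positive.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 1))
y = (X[:, 0] > 0).astype(float)

for name, bs in [("batch", None), ("mini-batch", 32), ("SGD", 1)]:
    w, b = train_logistic(X, y, batch_size=bs, epochs=50)
    acc = ((sigmoid(X @ w + b) > 0.5) == (y == 1)).mean()
    print(f"{name:10s} accuracy={acc:.3f}")
```

With the same number of epochs, smaller batches perform many more updates, which is the trade-off between update frequency and gradient noise described above.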

4. Backpropagation:

    Error Backpropagation:

    Backpropagation, short for "backward propagation of errors," is an algorithm used to train neural networks by computing the gradients of the loss function with respect to the weights and biases. Each training iteration involves three steps: the forward pass, the loss computation, and the backward pass.

  * Forward Pass:

    Purpose: During the forward pass, input data is fed through the neural network, and the output is computed layer by layer until the final output is obtained.

    Steps:
        Input data is multiplied by weights and added to biases to compute the net input.
        The net input is passed through an activation function (e.g., sigmoid) to produce the output.

  * Compute Loss:

    Purpose: The difference between the predicted output and the true label is measured using the chosen loss function (e.g., Binary Cross-Entropy Loss).
    
    Steps:
        Calculate the loss by comparing the predicted output with the true label for each instance in the training dataset.

  * Backward Pass (Error Backpropagation):

    Purpose: Gradients of the loss with respect to weights and biases are computed using the chain rule, and the parameters are updated to minimize the loss.
    
    Steps:
        Compute the gradient of the loss with respect to the output, the output with respect to the net input, and the net input with respect to the weights.
        Update the weights using the computed gradients and the learning rate α.

Additional Insights:

  * Backpropagation leverages the chain rule of calculus to calculate the gradients efficiently by propagating the error backward through the network.

  * Gradients are often averaged over a batch of training examples so that the weights are updated once per batch, which makes better use of vectorized computation and gives a less noisy gradient estimate.

  * The forward pass, loss calculation, and backward pass are repeated iteratively through multiple epochs until the model converges.

Backpropagation forms the backbone of training neural networks, enabling them to learn and adjust their parameters to minimize the difference between predicted and true values. Understanding the mathematics behind backpropagation is crucial for effectively implementing and training neural networks for various tasks.
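
The forward pass, loss computation, and backward pass can be sketched for a tiny network with one hidden layer. This is an illustrative NumPy version that assumes sigmoid activations in both layers and binary cross-entropy loss; the layer sizes, random data, and learning rate are arbitrary choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, params):
    """Forward pass: input -> hidden layer (sigmoid) -> output (sigmoid)."""
    W1, b1, W2, b2 = params
    a1 = sigmoid(X @ W1 + b1)
    y_hat = sigmoid(a1 @ W2 + b2)
    return a1, y_hat

def bce_loss(y, y_hat):
    """Mean binary cross-entropy between labels and predicted probabilities."""
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def backward(X, y, params):
    """Backward pass: apply the chain rule from the loss back to each parameter."""
    W1, b1, W2, b2 = params
    m = X.shape[0]
    a1, y_hat = forward(X, params)
    dz2 = (y_hat - y) / m                # dL/dz2 for sigmoid output + BCE
    dW2 = a1.T @ dz2
    db2 = dz2.sum(axis=0)
    dz1 = (dz2 @ W2.T) * a1 * (1 - a1)   # propagate through the hidden sigmoid
    dW1 = X.T @ dz1
    db1 = dz1.sum(axis=0)
    return dW1, db1, dW2, db2

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))                          # 8 samples, 3 features
y = rng.integers(0, 2, size=(8, 1)).astype(float)    # binary labels
params = [rng.normal(size=(3, 4)), np.zeros(4),
          rng.normal(size=(4, 1)), np.zeros(1)]

# One gradient-descent update with learning rate alpha.
alpha = 0.1
loss_before = bce_loss(y, forward(X, params)[1])
grads = backward(X, y, params)
params = [p - alpha * g for p, g in zip(params, grads)]
loss_after = bce_loss(y, forward(X, params)[1])
print(loss_before, loss_after)
```

Repeating the last four lines in a loop over many epochs is the training procedure described in this section.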

5. Training Data:

  In the context of neural networks, training data is a crucial component for teaching the model to make accurate predictions. It consists of input features and corresponding labels.

  * Input Features (X):

    Represented as X with dimensions (m, n), where m is the number of samples (data points) and n is the number of features.
    These features are the characteristics or attributes of the data that the model uses to make predictions.

  * Labels (y):

    Denotation: Represented as y with dimensions (m, 1), containing binary labels (0 or 1).
    Role: Labels are the ground truth or actual outcomes corresponding to each set of input features.
  
    In a binary classification task (e.g., spam detection), labels could be 0 for non-spam and 1 for spam. In a multi-class classification task (e.g., recognizing handwritten digits), labels may represent the digit class (0 to 9).

  * Dataset Splitting:

    * Training Set: The portion of the data used for training the neural network. The model learns patterns and relationships from this set.
    * Validation Set: A subset of the data used to tune hyperparameters and evaluate the model's performance during training.
    * Test Set: A separate subset used to assess the final performance of the trained model. It should not be seen by the model during training.

  * Normalization and Preprocessing:

    Scaling input features to a standard range (e.g., between 0 and 1) facilitates convergence during training. Preprocessing also covers handling missing data, encoding categorical variables, and other steps that prepare the data for training.

  * Data Imbalance:

    Data imbalance occurs when one class has significantly more or fewer instances than the other. Techniques such as oversampling the minority class or undersampling the majority class can be used to address it.

  * Importance of Quality Data:

    Garbage In, Garbage Out: The quality of the training data directly impacts the performance of the model. Clean, representative, and diverse data is essential for effective learning.
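
The splitting and scaling steps above can be sketched with NumPy alone. The helpers `train_val_test_split` and `fit_min_max` are hypothetical names used only for this example, and the data is synthetic; in practice scikit-learn's `train_test_split` and its scaler classes cover the same ground. The scaling statistics are learned from the training set only, so nothing about the validation or test rows leaks into preprocessing.

```python
import numpy as np

def train_val_test_split(X, y, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle the rows once, then cut them into train / validation / test sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_test = int(len(X) * test_fraction)
    n_val = int(len(X) * val_fraction)
    test_idx = order[:n_test]
    val_idx = order[n_test:n_test + n_val]
    train_idx = order[n_test + n_val:]
    return (X[train_idx], y[train_idx]), (X[val_idx], y[val_idx]), (X[test_idx], y[test_idx])

def fit_min_max(X_train):
    """Learn the per-feature minimum and range from the training set only."""
    lo = X_train.min(axis=0)
    span = X_train.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid dividing by zero for constant features
    return lo, span

# Synthetic data: m=100 samples, n=3 features on very different scales.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3)) * np.array([1.0, 50.0, 1000.0])
y = rng.integers(0, 2, size=100)

(train_X, train_y), (val_X, val_y), (test_X, test_y) = train_val_test_split(X, y)
lo, span = fit_min_max(train_X)
train_X = (train_X - lo) / span   # lies in [0, 1] by construction
val_X = (val_X - lo) / span       # may fall slightly outside [0, 1]
test_X = (test_X - lo) / span

print(train_X.shape, val_X.shape, test_X.shape)  # (70, 3) (15, 3) (15, 3)
```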


6. Testing and Prediction:

    Forward Pass: After training the neural network, the model is used to make predictions on new, unseen data.

  * Provide the new data as input to the trained network.
  * Conduct a forward pass through the network to obtain predictions.
  * The final layer's output represents the model's prediction for the given input.

7. Evaluation Metrics:
* Accuracy: Measures the proportion of correctly classified instances.

* Precision: Quantifies the accuracy of positive predictions.

* Recall (Sensitivity or True Positive Rate): Measures the ability of the model to capture all positive instances.

* F1 Score: A harmonic mean of precision and recall, providing a balanced metric.
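
As a worked example of the four metrics, the sketch below counts true and false positives and negatives for a small set of made-up labels and predictions. `classification_metrics` is a hypothetical helper written for this illustration.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Made-up labels: 3 true positives, 3 true negatives, 1 false positive, 1 false negative.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(classification_metrics(y_true, y_pred))
# {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75, 'f1': 0.75}
```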

8. Hyperparameter Tuning:

    Define the hyperparameter grid:

              param_grid = {
                      'learning_rate': [0.001, 0.01, 0.1],
                      'neurons_layer1': [32, 64, 128],
                      'neurons_layer2': [16, 32, 64],
                      'dropout_rate': [0.2, 0.5, 0.8]
                      }
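
The text does not show how this grid is searched, so the following is only a sketch of one common approach: enumerate every combination and evaluate each on the validation set. `expand_grid` is a hypothetical helper; the model-building and scoring step is left as a comment because it depends on the training code.

```python
from itertools import product

param_grid = {
    'learning_rate': [0.001, 0.01, 0.1],
    'neurons_layer1': [32, 64, 128],
    'neurons_layer2': [16, 32, 64],
    'dropout_rate': [0.2, 0.5, 0.8],
}

def expand_grid(grid):
    """Yield one dict per combination of hyperparameter values."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))

combinations = list(expand_grid(param_grid))
print(len(combinations))   # 3 * 3 * 3 * 3 = 81
print(combinations[0])

# For each combination, one would build and train a model with those settings,
# score it on the validation set, and keep the best-scoring combination.
```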

9. Regularization:
* Dropout: Mitigates overfitting by randomly deactivating (dropping out) a fraction of neurons during each training iteration. A dropout rate is defined (e.g., 0.5), representing the probability of dropping out a neuron. Different neurons are dropped out in each training iteration.

* L2 Regularization: Adds a penalty term to the loss function based on the squared magnitude of the weights. During backpropagation, this term contributes an additional component to each weight's gradient, which pushes the weights toward smaller values.
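
Both techniques can be sketched in NumPy. The dropout below uses the "inverted" form, which rescales the surviving activations during training so nothing needs to change at inference. The L2 term is written as (λ/2)·Σw², which is one common convention and an assumption here; some libraries omit the factor of 1/2.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest."""
    if not training or rate == 0.0:
        return activations
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob  # new random mask on every call
    return activations * mask / keep_prob

def l2_penalty(weights, lam):
    """L2 term added to the loss: (lam / 2) * sum of squared weights."""
    return 0.5 * lam * sum(np.sum(W ** 2) for W in weights)

def l2_gradient(W, lam):
    """Extra gradient contributed by the L2 term for one weight matrix."""
    return lam * W

rng = np.random.default_rng(0)
a = np.ones((4, 5))
print(dropout(a, rate=0.5, rng=rng))                   # entries are either 0.0 or 2.0
print(dropout(a, rate=0.5, rng=rng, training=False))   # unchanged at inference

W = np.array([[1.0, -2.0], [0.5, 0.0]])
print(l2_penalty([W], lam=0.01))   # approximately 0.02625 (0.5 * 0.01 * 5.25)
print(l2_gradient(W, lam=0.01))    # added to dLoss/dW during backpropagation
```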

## Implementation

In [39]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

In [2]:
df = pd.read_csv('/content/titanic.csv')

In [3]:
df.head()

| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | NaN | S |


In [4]:
df.shape

(891, 12)

In [5]:
df.columns

Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
       'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
      dtype='object')

In [6]:
df.info()  # call the method; a bare `df.info` only echoes the bound method and the DataFrame repr

The DataFrame has 12 columns, each representing a different attribute or feature, and 891 rows, one per data point (sample).

  * Column Data Types:
     * Numeric Types (int64, float64): Columns like 'PassengerId', 'Survived', 'Pclass', 'Age', 'SibSp', 'Parch', 'Fare' contain numerical data.
     * Object Type (usually strings): Columns like 'Name', 'Sex', 'Ticket', 'Cabin', 'Embarked' contain non-numeric (object) data.

Additional Notes:

    * 'Age' column has missing values
    * 'Embarked' and 'Cabin' columns also have missing values.
    * 'Sex' column contains categorical data (male/female).



In [7]:
df.isnull().sum()

PassengerId      0
Survived         0
Pclass           0
Name             0
Sex              0
Age            177
SibSp            0
Parch            0
Ticket           0
Fare             0
Cabin          687
Embarked         2
dtype: int64

In [8]:
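# Handle missing values
# Assign the result back instead of calling fillna(inplace=True) on a column selection,
# which newer pandas versions warn about.
df["Age"] = df["Age"].fillna(df["Age"].median())
df = df.drop(columns="Cabin")  # Cabin is missing for 687 of 891 rows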

In [9]:
# Encode categorical variables
# The 2 rows with a missing Embarked value get 0 in both Embarked dummy columns.
df = pd.get_dummies(df, columns=["Sex", "Embarked"], drop_first=True)


In [10]:
df.head()

| | PassengerId | Survived | Pclass | Name | Age | SibSp | Parch | Ticket | Fare | Sex_male | Embarked_Q | Embarked_S |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | 22.0 | 1 | 0 | A/5 21171 | 7.25 | 1 | 0 | 1 |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | 38.0 | 1 | 0 | PC 17599 | 71.2833 | 0 | 0 | 0 |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | 0 | 0 | 1 |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | 35.0 | 1 | 0 | 113803 | 53.1 | 0 | 0 | 1 |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | 35.0 | 0 | 0 | 373450 | 8.05 | 1 | 0 | 1 |


In [19]:
# Take an explicit copy so the scaling assignment below does not raise SettingWithCopyWarning
df = df[['Age', 'Survived', 'Pclass', 'SibSp', 'Parch', 'Fare', 'Sex_male', 'Embarked_Q', 'Embarked_S']].copy()

In [20]:
# Feature scaling
scaler = StandardScaler()
df[["Age", "Fare"]] = scaler.fit_transform(df[["Age", "Fare"]])


In [23]:
# Split the dataset
X = df.drop("Survived", axis=1)
y = df["Survived"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)

In [22]:
X

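| | Age | Pclass | SibSp | Parch | Fare | Sex_male | Embarked_Q | Embarked_S |
|---|---|---|---|---|---|---|---|---|
| 0 | -0.565736 | 3 | 1 | 0 | -0.502445 | 1 | 0 | 1 |
| 1 | 0.663861 | 1 | 1 | 0 | 0.786845 | 0 | 0 | 0 |
| 2 | -0.258337 | 3 | 0 | 0 | -0.488854 | 0 | 0 | 1 |
| 3 | 0.433312 | 1 | 1 | 0 | 0.420730 | 0 | 0 | 1 |
| 4 | 0.433312 | 3 | 0 | 0 | -0.486337 | 1 | 0 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 886 | -0.181487 | 2 | 0 | 0 | -0.386671 | 1 | 0 | 1 |
| 887 | -0.796286 | 1 | 0 | 0 | -0.044381 | 0 | 0 | 1 |
| 888 | -0.104637 | 3 | 1 | 2 | -0.176263 | 0 | 0 | 1 |
| 889 | -0.258337 | 1 | 0 | 0 | -0.044381 | 1 | 0 | 0 |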


In [24]:
# Neural Network Model
model = Sequential()
model.add(Dense(32, input_dim=X_train.shape[1], activation='relu'))
model.add(Dense(16, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

In [25]:
# Model Compilation
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])


In [26]:
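# Model Training
# Note: validation_data reuses the test set here. If the validation metrics guide any tuning,
# hold out a separate validation split (e.g. validation_split=0.2) so the test set stays unseen.
model.fit(X_train, y_train, epochs=50, batch_size=32, validation_data=(X_test, y_test))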

Epoch 1/50
...
Epoch 50/50


<keras.src.callbacks.History at 0x7be30c9ea770>

In [27]:
# Model Evaluation
loss, accuracy = model.evaluate(X_test, y_test)
print(f"Test Loss: {loss}, Test Accuracy: {accuracy}")

Test Loss: 0.4108079969882965, Test Accuracy: 0.832402229309082


In [28]:
# Make Predictions
predictions = model.predict(X_test)



In [30]:
# Print the predicted probabilities
# print(predictions)

# If you want to convert probabilities to binary predictions (0 or 1)
# binary_predictions = (predictions > 0.5).astype(int)
# print(binary_predictions)