## Neural Network - ReLU and Adam

#### Description
This neural network model uses the ReLU activation function and the Adam optimizer.

The goal of this model is to perform binary classification with a basic feedforward neural network.

Note : A **binary classification problem** is a type of predictive modeling problem where the goal is to **predict one of two possible outcomes** for each input instance. The two possible outcomes are typically referred to as classes, and are often represented by the values 0 and 1.


In [1]:
%pip install numpy
%pip install tensorflow


Note: you may need to restart the kernel to use updated packages.


#### Step 1: Importing Necessary Libraries

In [2]:
import numpy as np
from tensorflow import keras  

**NumPy** is a Python library that provides a high-performance multidimensional array object. It is the fundamental package for scientific computing in Python.

**Keras** is a high-level neural networks API. Keras provides a simple and intuitive API for defining neural networks, as well as a wide range of tools for training and evaluating networks.

#### Step 2: Load and Preprocess the Data

We will use the breast cancer binary classification dataset that ships with scikit-learn. Each sample belongs to one of two classes: malignant (0) or benign (1).

In [3]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

##### Import Explanations

```
from sklearn.model_selection import train_test_split
```
This line imports the `train_test_split` function, which divides the dataset into a training set and a test set.


```
from sklearn.preprocessing import StandardScaler
```
This line imports the StandardScaler() class from the sklearn.preprocessing module. This class is used to standardize the features of a dataset.
 
 **Mean:** The mean is the average of all the values in a dataset. It is calculated by adding up all the values and dividing by the number of values.
 
 **Standard deviation:** The standard deviation is a measure of how spread out the values in a dataset are. It is calculated by taking the square root of the variance. The variance is calculated by taking the average of the squared differences between each value in the dataset and the mean.

Standardization is a process of transforming the features of a dataset so that they have a mean of 0 and a standard deviation of 1. This makes the features more comparable to each other, which can improve the performance of machine learning models.

Example for understanding mean, standard deviation and standardization: a dataset with two features, age (in years) and height (in inches).

  - The age feature has a mean of 30 and a standard deviation of 10.
  This means that values of the age feature typically fall between 20 and 40,
  although ages can in principle range from 0 to about 100 years.

  - The height feature has a mean of 60 and a standard deviation of 15.
  This means that values of the height feature typically fall between 45 and 75.

  - The two features are measured in different units and on different scales, so their raw values are not directly comparable. A difference of 10 is one full standard deviation for age, but less than one standard deviation for height.

If you don't standardize these features, the feature with the larger numeric range can dominate the other, which can make it harder for machine learning models to learn from the data. After standardizing the features, the age feature will have a mean of 0 and a standard deviation of 1. The height feature will also have a mean of 0 and a standard deviation of 1.
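The standardization described above can be sketched in a few lines of NumPy. This is a minimal illustration on made-up numbers (the `people` array is hypothetical, not the breast cancer data), using the population standard deviation.

```python
import numpy as np

# Hypothetical toy data: each row is one person (age in years, height in inches).
people = np.array([
    [20.0, 45.0],
    [30.0, 60.0],
    [40.0, 75.0],
])

# Column-wise mean and (population) standard deviation of each feature.
mean = people.mean(axis=0)
std = people.std(axis=0)

# Standardize: subtract the mean, then divide by the standard deviation.
standardized = (people - mean) / std

print(standardized.mean(axis=0))  # approximately [0. 0.]
print(standardized.std(axis=0))   # approximately [1. 1.]
```

After this step both columns are on the same scale, even though the raw ages and heights were not.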



#### Step 2: Continued

In [4]:
# Loading the Dataset
data = load_breast_cancer()
# X = data   = features of the dataset samples
# y = target = target values, here binary 1 and 0
X = data.data
y = data.target

# Splitting the Data into Training and Testing Sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features to have mean 0 and standard deviation 1
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

##### Train_test_split()

Here, the `train_test_split()` function is called with four arguments:

- X:            The features of the dataset.
- y:            The target variable.
- test_size:    The fraction of the data that should be used for the testing set.
- random_state: The seed for the random number generator.

In this case, 20% of the data will be used for the testing set. The random_state parameter is set to 42, which ensures that the data is split in the same way each time the code is run.

The line of code splits the dataset into four variables:

- X_train: The training features.
- X_test: The testing features.
- y_train: The training target variable.
- y_test: The testing target variable.
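As a rough check of what a 20% split means in numbers, the sizes of the two sets can be worked out by hand. This sketch assumes the dataset has 569 samples (a property of the dataset that the text does not state) and assumes the test count is rounded up.

```python
import math

n_samples = 569   # assumption: number of samples in the breast cancer dataset
test_size = 0.2   # fraction reserved for testing

# Assumption: the number of test samples is rounded up to a whole sample.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)  # 455 114
```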

The `random_state` parameter is similar to a Minecraft world seed, where the same seed generates the same world each time. Similarly, the same `random_state` integer produces the same shuffle, so the same rows land in the training and testing sets each time the code is run on the same data.
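The seed idea can be illustrated with a seeded NumPy shuffle. This is only a sketch of the concept, not scikit-learn's internal implementation, and `shuffled_indices` is a hypothetical helper written for this example.

```python
import numpy as np

def shuffled_indices(n_samples, seed):
    """Hypothetical helper: return a seeded random ordering of row indices."""
    rng = np.random.default_rng(seed)
    return rng.permutation(n_samples)

first = shuffled_indices(10, seed=42)
second = shuffled_indices(10, seed=42)

# The same seed reproduces exactly the same ordering.
print(np.array_equal(first, second))  # True

# The ordering is still a permutation: every row index appears exactly once.
print(sorted(first.tolist()) == list(range(10)))  # True
```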

Why **shuffle** before splitting?
By default, `train_test_split` shuffles the data before splitting, and `random_state` makes that shuffle reproducible. Shuffling does not change the proportions of the split. It prevents the split from depending on the order in which the samples happen to be stored.
**For example,** let's say that we have a dataset of 100 points, and we want to split the data into a training set and a testing set with an 80/20 split. If the data is sorted by class and we do not shuffle it, then it is possible that the training set will contain only points from one class, and the testing set will contain only points from the other class. This makes it difficult for the model to generalize to new data, and it also makes it difficult to evaluate the performance of the model.

In machine learning, a class is a group of data points that share a common characteristic. For example, in a dataset of images of cats and dogs, the classes would be "cat" and "dog".
So if shuffling was not done, it is possible that the training set would contain all of the points from the "cat" class, and the testing set would contain all of the points from the "dog" class.
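The cat and dog scenario can be reproduced on made-up labels. The sketch below assumes a hypothetical dataset stored in sorted order (80 cats followed by 20 dogs) and compares an unshuffled cut with a shuffled one.

```python
import numpy as np

# Hypothetical labels stored in sorted order: 80 cats (0) followed by 20 dogs (1).
labels = np.array([0] * 80 + [1] * 20)

# Unshuffled 80/20 split: simply cut the array at position 80.
train_unshuffled, test_unshuffled = labels[:80], labels[80:]
print(np.unique(train_unshuffled))  # [0]  -> only cats in training
print(np.unique(test_unshuffled))   # [1]  -> only dogs in testing

# Shuffled split: permute the row order first, then cut at the same position.
rng = np.random.default_rng(42)
order = rng.permutation(len(labels))
train_shuffled, test_shuffled = labels[order[:80]], labels[order[80:]]
print(len(train_shuffled), len(test_shuffled))  # 80 20
```

With shuffling, the dogs are spread across both sets instead of all ending up in the testing set.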

The `StandardScaler` lines standardize the features so that they have a mean of 0 and a standard deviation of 1. This matters because features on a common scale usually help neural networks train faster and more reliably.
The **fit_transform** method fits the scaler to the training data and then transforms the training data. The **transform** method then transforms the testing data using the scaler that was fit to the training data.

##### Fit_Transform

```
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```
Standardization rescales each feature to a mean of 0 and a standard deviation of 1. Here, the transformation being applied is standardization.
- **Fit** is used to learn the parameters of the transformation (here, the mean and standard deviation of each feature).
- **Transform** is used to apply the transformation to the data.
- **Fit_transform** is a combination of fit and transform. It learns the parameters of the transformation on the training data and then applies the transformation.

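!!! We only call `fit_transform` on the training data. The scaler learns the mean and standard deviation from the training data only, and the same transformation is then applied to the testing data with `transform`. Fitting the scaler on the testing data would leak information about the test set into the preprocessing.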
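The difference between fitting and transforming can be made concrete with a small stand-in class. `SimpleScaler` below is a hypothetical, simplified illustration and not the real `StandardScaler`; the toy arrays are made up.

```python
import numpy as np

class SimpleScaler:
    """Hypothetical, simplified stand-in for StandardScaler (illustration only)."""

    def fit(self, X):
        # Learn the per-feature mean and standard deviation.
        self.mean_ = X.mean(axis=0)
        self.std_ = X.std(axis=0)
        return self

    def transform(self, X):
        # Apply the previously learned parameters; nothing is re-learned here.
        return (X - self.mean_) / self.std_

    def fit_transform(self, X):
        return self.fit(X).transform(X)

toy_train = np.array([[1.0], [2.0], [3.0]])
toy_test = np.array([[2.0], [4.0]])

scaler = SimpleScaler()
toy_train_scaled = scaler.fit_transform(toy_train)  # learns mean and std from training data
toy_test_scaled = scaler.transform(toy_test)        # reuses the training mean and std

print(scaler.mean_)             # [2.]
print(toy_test_scaled.ravel())  # the test value 2.0 maps to 0.0
```

Note that the scaled test set does not itself have mean 0, because it was scaled with the training statistics. That is expected.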

#### Step 3: Build the Neural Network

In [5]:
# The Sequential model is a linear stack of layers
model = keras.models.Sequential()

# Add the first hidden layer with ReLU activation function
model.add(keras.layers.Dense(30, activation='relu', input_shape=(30,)))

# Add another hidden layer
model.add(keras.layers.Dense(15, activation='relu'))

# Add the output layer. Since this is a binary classification problem, we'll use the sigmoid activation function
model.add(keras.layers.Dense(1, activation='sigmoid'))

1. `model = keras.models.Sequential()`: This line creates a new sequential model. The sequential model is a linear stack of layers, where you can easily add and remove layers. It is a common choice for building neural networks in Keras.

2. `model.add(keras.layers.Dense(30, activation='relu', input_shape=(30,)))`: This line adds the first hidden layer to the model. The `Dense` layer represents a fully connected layer, where each neuron is connected to every neuron in the *previous* layer. The layer has 30 units/neurons. The `activation='relu'` argument sets the Rectified Linear Unit (ReLU) as the activation function for this layer. The `input_shape=(30,)` argument defines the shape of the input data expected by this layer. Since it is the first layer in the model, it must be told the input shape, and each sample in this dataset has 30 features.

3. `model.add(keras.layers.Dense(15, activation='relu'))`: This line adds another hidden layer to the model. It is similar to the previous line but with 15 units/neurons instead of 30. The input shape is not specified explicitly here since it is automatically inferred from the previous layer's output shape.

4. `model.add(keras.layers.Dense(1, activation='sigmoid'))`: This line adds the output layer to the model. It is a single neuron layer since this is a binary classification problem. The `activation='sigmoid'` argument sets the sigmoid activation function, which squashes the output between 0 and 1, representing the probability of the positive class. The output layer provides the final prediction of the model.

Each `Dense` layer in the model is fully connected, meaning each neuron in a layer is connected to every neuron in the previous layer. The number of units in each layer determines the complexity and expressive power of the model. The activation functions introduce non-linearities into the model, allowing it to learn complex relationships between features and make accurate predictions.
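To see what the three `Dense` layers compute, here is a minimal NumPy sketch of the forward pass with the same layer sizes. The weights are random and untrained, so the outputs are meaningless as predictions. The sketch only shows the shapes, the activations, and the parameter count.

```python
import numpy as np

def relu(z):
    # ReLU keeps positive values and replaces negative values with 0.
    return np.maximum(0.0, z)

def sigmoid(z):
    # Sigmoid squashes any real number into the interval (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
layer_sizes = [30, 30, 15, 1]  # input features, hidden 1, hidden 2, output

# Randomly initialized weights and zero biases (illustrative, untrained).
weights = [rng.normal(0.0, 0.1, size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def forward(x):
    h = relu(x @ weights[0] + biases[0])        # first hidden layer, 30 units
    h = relu(h @ weights[1] + biases[1])        # second hidden layer, 15 units
    return sigmoid(h @ weights[2] + biases[2])  # output layer, 1 unit

batch = rng.normal(size=(4, 30))  # 4 hypothetical standardized samples
probs = forward(batch)
print(probs.shape)  # (4, 1)

# Each Dense layer has (inputs * units) weights plus one bias per unit.
n_params = sum(w.size + b.size for w, b in zip(weights, biases))
print(n_params)  # 1411
```

The count breaks down as 930 + 465 + 16 trainable parameters for the three layers.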

#### Step 4: Compile the Neural Network

In [6]:
# Compile the model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

The `compile` method in Keras is used to configure the model for training :

1. `loss='binary_crossentropy'`: The `loss` argument specifies the loss function to be used during training. In this case, `'binary_crossentropy'` is used as the loss function. It is commonly used for binary classification problems, where the goal is to minimize the binary cross-entropy between the true labels and the predicted probabilities. The binary cross-entropy loss is well-suited for problems with two classes.

2. `optimizer='adam'`: The `optimizer` argument specifies the optimization algorithm to be used during training. In this case, `'adam'` is used as the optimizer. Adam (short for Adaptive Moment Estimation) is a popular optimization algorithm that keeps running estimates of the mean and the uncentered variance of each parameter's gradient and uses them to scale that parameter's update. It combines the advantages of two other optimization methods, AdaGrad and RMSProp, to achieve fast convergence and handle sparse gradients efficiently.

3. `metrics=['accuracy']`: The `metrics` argument specifies the evaluation metric(s) to be used during training and testing. Here, `['accuracy']` is provided as the metric. The accuracy metric measures the fraction of correctly predicted samples compared to the total number of samples. It is a commonly used metric for classification tasks, providing an intuitive understanding of the model's performance in terms of correct predictions.
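To give a feel for what the Adam optimizer does on each update, here is a minimal NumPy sketch of the standard Adam update rule applied to a toy one-parameter problem. The `adam_step` helper is hypothetical, and the hyperparameter defaults are commonly used values assumed for illustration. Keras handles all of this internally.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """Hypothetical helper: one Adam update for a parameter array."""
    m = beta1 * m + (1 - beta1) * grad        # running mean of the gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # running mean of the squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Toy problem: minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w = np.array([0.0])
m = np.zeros_like(w)
v = np.zeros_like(w)

for t in range(1, 2001):
    grad = 2.0 * (w - 3.0)
    w, m, v = adam_step(w, grad, m, v, t, lr=0.05)

print(w)  # close to [3.]
```

Because the update divides by the square root of the squared-gradient average, the very first step has a size of roughly `lr` regardless of how large the gradient is.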

#### Step 5: Train the Model

In [7]:
# Train the model for 50 epochs
model.fit(X_train, y_train, epochs=50, batch_size=10)

Epoch 1/50
...
Epoch 50/50


<keras.callbacks.History at 0x1e1ae8472e0>

The `fit` method in Keras is used to train the model on a given dataset:

1. `X_train` and `y_train`: These arguments represent the training data.
`X_train` refers to the input features (often represented as a NumPy array or a Pandas DataFrame), and
`y_train` refers to the corresponding target labels (the expected outputs). The model will be trained to learn patterns and relationships between the input features and the target labels.

2. `epochs=50`: The `epochs` argument specifies the number of times the model will iterate over the entire training dataset. In this case, the model will be trained for 50 epochs. One epoch is defined as a complete pass through the entire training dataset.

3. `batch_size=10`: The `batch_size` argument specifies the number of samples that will be propagated through the model at once. In each epoch, the training dataset is divided into multiple batches, and the model's parameters are updated after each batch. A smaller batch size allows for more frequent updates of the model's parameters but can increase the training time. Here, a batch size of 10 is set.
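The relationship between epochs, batch size and the number of parameter updates can be worked out with simple arithmetic. This sketch assumes 455 training samples (80% of an assumed 569-sample dataset) and assumes the last, smaller batch of each epoch is still used.

```python
import math

n_train = 455     # assumption: training samples after the 80/20 split
batch_size = 10
epochs = 50

# Number of batches (parameter updates) in one pass over the training data.
steps_per_epoch = math.ceil(n_train / batch_size)

# Total number of parameter updates over the whole training run.
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, total_updates)  # 46 2300
```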

During training, the model will perform forward propagation to compute predictions based on the input data, and then backpropagation to calculate the gradients and update the model's parameters (weights and biases) using the optimization algorithm specified during compilation.

The model aims to minimize the loss function specified during compilation. The specified metrics are only computed and reported to monitor progress; they are not optimized directly. With this call, training runs for the full 50 epochs, since no early-stopping callback is used.

After training, the model will have learned the patterns and relationships in the training data and will be able to make predictions on new, unseen data.
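The forward pass, backward pass and parameter update described above can be sketched for the simplest possible case: a single sigmoid neuron trained with mini-batches. This is an illustration on made-up data, and it uses plain gradient descent instead of Adam to keep the code short. It is not what Keras runs internally.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Hypothetical toy data: the label is 1 when the single feature is positive.
X = rng.normal(size=(200, 1))
y = (X[:, 0] > 0).astype(float)

w = np.zeros(1)   # weight of the single neuron
b = 0.0           # bias of the single neuron
lr = 0.5          # learning rate for plain gradient descent
batch_size = 10

for epoch in range(50):
    order = rng.permutation(len(X))           # reshuffle the samples every epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        xb, yb = X[idx], y[idx]
        p = sigmoid(xb @ w + b)               # forward pass: predicted probabilities
        error = p - yb                        # gradient of the loss w.r.t. the pre-activation
        grad_w = xb.T @ error / len(idx)      # backward pass: gradient for the weight
        grad_b = error.mean()                 # backward pass: gradient for the bias
        w -= lr * grad_w                      # parameter update
        b -= lr * grad_b

accuracy = ((sigmoid(X @ w + b) > 0.5) == (y == 1)).mean()
print(accuracy)
```

The `error = p - yb` shortcut holds for the combination of a sigmoid output and binary cross-entropy loss, which is the same pairing the model in this notebook uses.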

#### Step 6: Evaluate the Model

In [8]:
# Evaluate the model on the test data
score = model.evaluate(X_test, y_test)

print("Test Loss:", score[0])
print("Test Accuracy:", score[1])

Test Loss: 0.08829933404922485
Test Accuracy: 0.9736841917037964


The `evaluate` method in Keras is used to evaluate the trained model on a separate test dataset:

1. `score = model.evaluate(X_test, y_test)`: This line evaluates the trained model on the test dataset. `X_test` holds the input features of the test dataset, and `y_test` holds the corresponding target labels. The model makes predictions on the test data and compares them with the true labels to compute the loss and the metrics specified during model compilation (here, binary cross-entropy and accuracy). The `evaluate` method returns these values as a list, with the loss first.

2. `print("Test Loss:", score[0])`: This line prints the test loss, which is the value of the loss function computed on the test dataset. The loss function measures the discrepancy between the predicted outputs and the true labels. A lower test loss indicates better performance of the model on the unseen data.

3. `print("Test Accuracy:", score[1])`: This line prints the test accuracy, which is the value of the accuracy metric computed on the test dataset. The accuracy metric is the fraction of correctly predicted samples out of the total number of samples in the test dataset, so a value of about 0.974 means that roughly 97.4% of the test samples were classified correctly. A higher test accuracy indicates a better ability of the model to make accurate predictions on unseen data.

By evaluating the model on the test data, you can assess its performance and determine how well it generalizes to new, unseen samples. This evaluation helps you understand how the model is likely to perform in real-world scenarios and can guide decisions on model selection and deployment.
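To make the two reported numbers less abstract, here is a small pure-Python sketch of how binary cross-entropy and accuracy are computed from true labels and predicted probabilities. The helper functions and the four example predictions are hypothetical, and the 0.5 threshold is an assumption for illustration.

```python
import math

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Hypothetical helper: mean binary cross-entropy over all samples."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

def accuracy(y_true, y_prob, threshold=0.5):
    """Hypothetical helper: fraction of samples whose thresholded prediction is correct."""
    correct = sum(int(p > threshold) == t for t, p in zip(y_true, y_prob))
    return correct / len(y_true)

y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.1]  # made-up sigmoid outputs

loss = binary_crossentropy(y_true, y_prob)
acc = accuracy(y_true, y_prob)

print(round(loss, 4))  # 0.3375
print(acc)             # 0.75
```

The third sample is the only misclassified one (probability 0.4 for a true label of 1), and it also contributes the largest share of the loss. If the test set has 114 samples, as assumed for a 20% split of 569, the reported accuracy of about 0.9737 would correspond to 111 correct predictions.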