
Welcome to your assignment about concepts covered in Chapter 7 of *Essential Math for Data Science* by Thomas Nield. You will be exploring neural networks in this assignment.

Please read each question carefully and provide detailed explanations for your answers, including any relevant calculations or work. You are also required to provide Python solutions for the technical problems in each question.

# Problem 1 Forward Propagation

Implement the forward propagation algorithm for a simple neural network with one hidden layer. The neural network has the following specifications:

1. Input layer with 3 features

2. Hidden layer with 4 units, using the ReLU activation function

3. Output layer with 2 units, using the softmax activation function

Write a Python function forward_propagation that takes an input array X of shape (m, 3), where m is the number of examples, and returns the output predictions of shape (m, 2). Assume the weights and biases of the neural network are pre-defined.


Notes (more detailed explanation of the question):

 The forward propagation algorithm for a simple neural network with one hidden layer involves passing the input data through the network to make predictions. This neural network has three layers: an input layer with 3 features, a hidden layer with 4 units (neurons), and an output layer with 2 units. Each unit in the hidden layer uses the ReLU activation function, while each unit in the output layer uses the softmax activation function.

The purpose of the forward propagation algorithm is to take the input data and compute the predicted output of the neural network. The input data comes in the form of an array called X, where each row represents an example, and there are three columns representing the three features. The goal is to calculate the predictions for each example and return the results in an array of shape (m, 2), where m is the number of examples, and 2 represents the two units in the output layer (the classes for the binary classification problem).

To implement this algorithm in Python, you can create a function called forward_propagation. This function takes the input array X as its input, and you should assume that the weights and biases of the neural network have been pre-defined. These weights and biases determine how the input data will be transformed as it passes through the neural network to produce the predictions.

The steps involved in forward propagation are as follows:

1. Use the weights and biases of the connections between the input layer and the hidden layer to compute the weighted sums for the hidden units from the input X.

2. Apply the ReLU activation function to those weighted sums to get the values of the hidden units.

3. Use the weights and biases of the connections between the hidden layer and the output layer to compute the output scores, then apply the softmax activation function to turn them into predictions.

4. Return the predictions for each example in the form of an array of shape (m, 2).

After implementing the forward_propagation function, you can use it to make predictions on new data using the pre-defined weights and biases of the neural network.

Step 1. Import numpy as np. *Nothing to change in the code below.*

In [None]:
import numpy as np

Step 2. Define the weights and biases in your forward propagation. We have predefined weights and biases below.

In [None]:
def forward_propagation(X):
    # Define the weights and biases
    W1 = np.array([[0.1, 0.2, 0.3, 0.4],
                   [0.5, 0.6, 0.7, 0.8],
                   [0.9, 1.0, 1.1, 1.2]])

    b1 = np.array([0.1, 0.2, 0.3, 0.4])

    W2 = np.array([[0.5, 0.6],
                   [0.7, 0.8],
                   [0.9, 1.0],
                   [1.1, 1.2]])

    b2 = np.array([0.5, 0.6])
    Z1 = np.dot(X, W1) + b1
    A1 = np.maximum(0, Z1)  # ReLU activation

    Z2 = np.dot(A1, W2) + b2
    exp_scores = np.exp(Z2)
    A2 = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)  # Softmax activation

    return A2
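
The softmax in the function above exponentiates the scores directly, which is fine for the small inputs used in this assignment but can overflow for very large scores. As a minimal sketch (not part of the assignment, and `stable_softmax` is a hypothetical helper name), a common variation subtracts each row's maximum before exponentiating; the resulting probabilities are mathematically unchanged.

```python
import numpy as np

def stable_softmax(Z):
    # Subtract the row-wise maximum so the largest exponent is exp(0) = 1
    shifted = Z - np.max(Z, axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    # Normalize each row so its entries sum to 1
    return exp_scores / np.sum(exp_scores, axis=1, keepdims=True)

# Second row would overflow with a plain np.exp(Z)
Z = np.array([[17.04, 19.12],
              [1000.0, 1001.0]])
probs = stable_softmax(Z)
print(probs)
```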

Step 3. Enter different numbers (ranging from 1-10) in the arrays (3 each) below to see what the prediction will be. Feel free to experiment by repeating an array to see what happens with the prediction.

In [None]:
X = np.array([[1,2,3 ],
              [10,9,8]])
predictions1 = forward_propagation(X)
print(predictions1)

[[0.11105597 0.88894403]
 [0.00100677 0.99899323]]
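
To see where the first printed row comes from, the following sketch repeats the computation for the single example [1, 2, 3] step by step, using the same weights and biases as forward_propagation. It also uses the fact that a two-class softmax reduces to a sigmoid of the difference between the two scores.

```python
import numpy as np

# Weights and biases copied from forward_propagation
W1 = np.array([[0.1, 0.2, 0.3, 0.4],
               [0.5, 0.6, 0.7, 0.8],
               [0.9, 1.0, 1.1, 1.2]])
b1 = np.array([0.1, 0.2, 0.3, 0.4])
W2 = np.array([[0.5, 0.6],
               [0.7, 0.8],
               [0.9, 1.0],
               [1.1, 1.2]])
b2 = np.array([0.5, 0.6])

x = np.array([[1, 2, 3]])
Z1 = x @ W1 + b1          # hidden weighted sums: [[3.9, 4.6, 5.3, 6.0]]
A1 = np.maximum(0, Z1)    # ReLU leaves positive values unchanged
Z2 = A1 @ W2 + b2         # output scores: [[17.04, 19.12]]

# Two-class softmax: p(class 0) = 1 / (1 + exp(score_1 - score_0))
p0 = 1.0 / (1.0 + np.exp(Z2[0, 1] - Z2[0, 0]))
print(Z1, Z2, p0, 1 - p0)
```

The score gap of 2.08 is what produces the roughly 0.11 / 0.89 split in the first row.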


In [None]:
X = np.array([[4,5,6 ],
              [10,9,8 ]])
predictions2 = forward_propagation(X)
print(predictions2)

[[0.01189113 0.98810887]
 [0.00100677 0.99899323]]


In [None]:
X = np.array([[1,8,7 ],
              [7,7,7 ]])
predictions3 = forward_propagation(X)
print(predictions3)

[[0.00486893 0.99513107]
 [0.00347043 0.99652957]]


# Problem 2
Task: Use a neural network in Python to solve a classification problem using the given dataset. Follow the example in this link: https://www.analyticsvidhya.com/blog/2021/10/implementing-artificial-neural-networkclassification-in-python-from-scratch/
A PDF of the article is available under Supportive Learning Materials in Week 8.

Dataset Description:
We have a dataset from the finance domain with 10,000 records and 14 dimensions. The dimensions include RowNumber, CustomerId, Surname, CreditScore, Geography, Gender, Age, Tenure, Balance, NumOfProducts, HasCrCard, IsActiveMember, EstimatedSalary, and Exited. The goal is to create an artificial neural network that predicts whether a customer will exit the bank or not based on the given independent variables.

We have provided the Churn_Modelling.csv file. Upload it to your Google Drive so that you can access it for this problem. The file was downloaded from:
https://www.kaggle.com/datasets/aakash50897/churn-modellingcsv?resource=download

Follow the steps provided, making changes to the code when requested. Then change the output layer activation to compare the accuracies of these three output layer activations: sigmoid, softmax, and exponential.

Step 1: Import the necessary libraries: numpy, pandas, and tensorflow. NumPy is a library for numerical operations in Python that provides efficient support for arrays and matrices. Pandas is a library for data manipulation and analysis that makes structured data easy to handle. TensorFlow is an open-source machine learning library developed by Google, used for building and training neural networks. The cell also imports the scikit-learn helpers used in later steps.


In [None]:
#Importing necessary Libraries
import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split

Step 2: Load the dataset using the read_csv() method from pandas. In this step, we read the dataset from a CSV file into a pandas DataFrame using the read_csv() method. The dataset is usually organized in a tabular format with rows representing examples (data points) and columns representing features.

In [None]:
#this is to set the file path for Python to access the dataset under Files in your Google Drive
# You need to upload the Churn_Modelling.csv into your "My Drive", on the upper left corner of your Google Drive
from google.colab import drive
drive.mount('/content/drive')
#Loading Dataset
file_path = '/content/drive/MyDrive/Churn_Modelling.csv'
data = pd.read_csv(file_path)
# Displaying the loaded data
print(data.head())

Mounted at /content/drive
   RowNumber  CustomerId   Surname  CreditScore Geography  Gender  Age  \
0          1    15634602  Hargrave          619    France  Female   42   
1          2    15647311      Hill          608     Spain  Female   41   
2          3    15619304      Onio          502    France  Female   42   
3          4    15701354      Boni          699    France  Female   39   
4          5    15737888  Mitchell          850     Spain  Female   43   

   Tenure    Balance  NumOfProducts  HasCrCard  IsActiveMember  \
0       2       0.00              1          1               1   
1       1   83807.86              1          0               1   
2       8  159660.80              3          1               0   
3       1       0.00              2          0               0   
4       2  125510.82              1          1               1   

   EstimatedSalary  Exited  
0        101348.88       1  
1        112542.58       0  
2        113931.57       1  
3         93826.

Step 3: Generate the matrix of features (X) by excluding the first three columns and the last column from the dataset. The first three columns contain identifiers that carry no predictive information, and the last column contains the dependent variable, the one we want to predict.
*We are using "iloc" from the pandas DataFrame to fetch the desired columns by position. If you look at the data, you will see that the first three columns are RowNumber, CustomerId, and Surname. None of these columns are necessary in the analysis, so we ignore them and start with CreditScore (position 3). We also don't want to include our target variable (the last column) in the matrix, so we use -1 as the end of the slice to exclude it*.

In [None]:
#Generating Matrix of Features (X)
X = data.iloc[:, 3:-1].values
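
As a small illustration of the positional slicing used here, the sketch below applies the same `iloc` pattern to a tiny made-up DataFrame (the column subset and values are illustrative only, not taken from the real file).

```python
import pandas as pd

# Toy frame: three identifier columns, two features, one target
toy = pd.DataFrame({
    "RowNumber": [1, 2],
    "CustomerId": [101, 102],
    "Surname": ["A", "B"],
    "CreditScore": [619, 608],
    "Age": [42, 41],
    "Exited": [1, 0],
})

features = toy.iloc[:, 3:-1]   # all rows, columns from position 3 up to (not including) the last
target = toy.iloc[:, -1]       # all rows, last column only
print(list(features.columns))
print(target.tolist())
```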


Step 4: Generate the dependent variable vector (Y) by selecting the last column from the dataset. This column represents the target variable, that is, the labels we want to predict. *We are using iloc to pick the last column, "Exited", as our target variable; the index -1 refers to the last column*. *Nothing to change in the code below.*


In [None]:
#Generating Dependent Variable Vector(Y)
Y = data.iloc[:, -1].values

Step 5: Encode the categorical variable "Gender" using label encoding. Label encoding converts categorical data (like "Gender") into numerical format by assigning a unique integer to each category. Neural networks need numeric inputs rather than strings, so we encode the column to make sure the model can use it. "Gender" has two categories, so the encoder uses two integers, 0 and 1. LabelEncoder sorts the categories alphabetically, so Female becomes 0 and Male becomes 1. A column with more categories would get more integers, one per category.

In [None]:
#Encoding Categorical Variable Gender
LE1 = LabelEncoder()
X[:, 2] = np.array(LE1.fit_transform(X[:, 2]))
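
To check which integer each category receives, the following sketch runs LabelEncoder on a few made-up gender values and prints the learned class order.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

genders = np.array(["Female", "Male", "Female", "Male"])  # illustrative values
le = LabelEncoder()
codes = le.fit_transform(genders)

# classes_ lists the categories in sorted order; the position is the assigned integer
print(le.classes_)
print(codes)
```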


Step 6: Encode the categorical variable "Geography" using one-hot encoding. One-hot encoding is another technique for handling categorical data. It creates one binary column for each category in the "Geography" column, representing the presence or absence of that category. OneHotEncoder orders the categories alphabetically, so France is encoded as 1 0 0, Germany as 0 1 0, and Spain as 0 0 1. This prevents the neural network from treating the categories as ordered or from thinking that higher numbers are more important. We don't want to introduce bias accidentally.

In [None]:
#Encoding Categorical Variable Geography
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [1])], remainder="passthrough")
X = np.array(ct.fit_transform(X))
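
The column order produced by one-hot encoding matters when you later type in a customer by hand. This sketch, using three made-up rows, shows the order OneHotEncoder assigns to the Geography values.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

geo = np.array([["France"], ["Spain"], ["Germany"]])  # illustrative rows
enc = OneHotEncoder()
onehot = enc.fit_transform(geo).toarray()  # the encoder returns a sparse matrix by default

print(enc.categories_[0])  # column order of the one-hot block
print(onehot)
```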


Step 7: Split the dataset into training and testing datasets using an 80:20 ratio. To evaluate the performance of the neural network, we need to split the dataset into training and testing datasets. The training dataset is used to train the model, while the testing dataset is used to evaluate its performance on unseen data.


In [None]:
#Splitting Dataset into Training and Testing Dataset
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=.2, random_state=0)
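
As a quick sanity check of the 80:20 ratio, this sketch splits a made-up array of 10 examples and confirms that 8 go to training and 2 to testing, with no example lost or duplicated.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X_toy = np.arange(20).reshape(10, 2)  # 10 illustrative examples, 2 features
y_toy = np.arange(10)

X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.2, random_state=0)
print(X_tr.shape, X_te.shape)
```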

Step 8: Perform feature scaling on the training and testing datasets using standardization. Feature scaling is a process of normalizing the features to ensure they all have a similar scale. Standardization is a common method that scales the features to have a mean of 0 and a standard deviation of 1. It helps the neural network to converge faster during training.

In [None]:
#Performing Feature Scaling
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
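
Note that the scaler is fitted on the training data only and then reused on the test data. The sketch below, on made-up numbers, shows that `transform` applies the training mean and standard deviation, which is the same as computing (x - mean) / std by hand.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])  # illustrative values
test = np.array([[4.0, 400.0]])

sc_toy = StandardScaler()
train_scaled = sc_toy.fit_transform(train)  # learns mean and std from the training rows
test_scaled = sc_toy.transform(test)        # reuses the training statistics

# Manual standardization with the training mean and (population) standard deviation
manual = (test - train.mean(axis=0)) / train.std(axis=0)
print(test_scaled, manual)
```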

Step 9: Initialize the artificial neural network using the Sequential class from the Keras library. In this step, we create an instance of the Sequential class from the Keras library. The Sequential class allows us to build a neural network by stacking layers on top of each other in a linear fashion.

In [None]:
#Initializing Artificial Neural Network
ann = tf.keras.models.Sequential()

Step 10: Create two hidden layers with 6 neurons each and "relu" activation function. In the neural network, we define two hidden layers, each with 6 neurons. The "relu" (rectified linear unit) activation function is used to introduce non-linearity in the model, allowing it to learn complex patterns in the data.

In [None]:
#Adding First Hidden Layer
ann.add(tf.keras.layers.Dense(units=6, activation="relu"))

#Adding Second Hidden Layer
ann.add(tf.keras.layers.Dense(units=6, activation="relu"))


Step 11: Create the output layer with 1 neuron and "sigmoid" activation function. The output layer has 1 neuron because we are performing binary classification (predicting one of two classes). The "sigmoid" activation function is used here to squash the output between 0 and 1, representing the probability of the positive class.

In [None]:
#Adding Output Layer
ann.add(tf.keras.layers.Dense(units=1, activation="sigmoid"))
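
Before swapping the output activation, it may help to see what each of the three candidates does to a single output unit. The NumPy sketch below is an illustration under the assumption that softmax normalizes across the units of the layer; with only one unit, that normalization makes every output equal to 1.

```python
import numpy as np

z = np.array([[-2.0], [0.0], [3.0]])  # made-up pre-activation scores, shape (m, 1)

# Sigmoid squashes each score into (0, 1)
sigmoid_out = 1.0 / (1.0 + np.exp(-z))

# Softmax across a single unit divides each value by itself
exp_scores = np.exp(z - z.max(axis=1, keepdims=True))
softmax_out = exp_scores / exp_scores.sum(axis=1, keepdims=True)

# Exponential is positive but unbounded, so it is not a probability
exponential_out = np.exp(z)

print(sigmoid_out.ravel(), softmax_out.ravel(), exponential_out.ravel())
```

This is worth keeping in mind when comparing accuracies: a single-unit softmax cannot distinguish between examples.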

Step 12: Compile the neural network using the "adam" optimizer, "binary_crossentropy" loss function, and "accuracy" as the performance metric. Before training the neural network, we need to compile it. Here, we specify the optimizer, loss function, and performance metric to be used during the training process.

In [None]:
#Compiling ANN
ann.compile(optimizer="adam", loss="binary_crossentropy", metrics=['accuracy'])
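
For intuition about the loss being minimized, here is a minimal NumPy sketch of binary cross-entropy. The function name and the clipping constant `eps` are assumptions for illustration, not the library's internals.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    # Clip to avoid log(0) for probabilities of exactly 0 or 1
    y_prob = np.clip(y_prob, eps, 1 - eps)
    losses = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(np.mean(losses))

y_true = np.array([1.0, 0.0])
good = binary_crossentropy(y_true, np.array([0.9, 0.1]))  # confident and correct
bad = binary_crossentropy(y_true, np.array([0.1, 0.9]))   # confident and wrong
print(good, bad)
```

Confident wrong predictions are penalized far more heavily than confident correct ones are rewarded.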

Step 13: Fit the neural network on the training dataset with a batch size of 32 and 10 epochs. The next step is to train the neural network using the fit() method. We provide the training dataset, the batch size (the number of samples used in each weight update), and the number of epochs (the number of times the model iterates over the entire training dataset). During training, the neural network adjusts its weights and biases to make more accurate predictions on new data.

In [None]:
#Fitting ANN
ann.fit(X_train, Y_train, batch_size=32, epochs=10)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.src.callbacks.History at 0x7b7de7011c90>

Step 14. It's your turn to experiment with what happens when you enter a set of inputs. You will need to enter 12 values that are reasonable for the dataset. As an example:

The values 1, 0, 0, 600, 1, 40, 3, 60000, 2, 1, 1, 50000
would be a customer from France (the first three numbers are 1, 0, 0) with a credit score of 600, male (1, because LabelEncoder assigns Female = 0 and Male = 1), 40 years old, who has been with the bank for 3 years, has a balance of 60,000, has 2 products, has a credit card (1), is an active member (1), and has an estimated salary of 50,000.

Change the other numbers to see if you can find a customer who might leave (True).

In [None]:
#Predicting result for Single Observation
print(ann.predict(sc.transform([[1, 0, 0, 600, 1, 40, 3, 60000, 2, 1, 1,50000]])) > 0.5)

[[False]]
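
The `> 0.5` comparison turns the predicted probability into a True/False label. The same thresholding, shown here on made-up probabilities and labels, is how predictions are compared against known answers to compute an accuracy.

```python
import numpy as np

probs = np.array([[0.12], [0.81], [0.50], [0.67]])  # made-up model outputs
labels = np.array([0, 1, 1, 0])                     # made-up true labels

# A probability must be strictly greater than 0.5 to count as True (1)
preds = (probs > 0.5).astype(int).ravel()
accuracy = np.mean(preds == labels)
print(preds, accuracy)
```

Note that a probability of exactly 0.5 is classified as False under this rule.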


Copy the code above and show 5 experiments with different inputs with at least 1 true prediction.

In [None]:
print(ann.predict(sc.transform([[1, 0, 0, 300, 1, 40, 3, 1000, 0, 1, 1,100]])) > 0.5) #Experiment 1

[[ True]]


In [None]:
print(ann.predict(sc.transform([[1, 0, 0, 700, 1, 25, 3, 100000, 1, 1, 1,1000]])) > 0.5) #Experiment 2

[[False]]


In [None]:
print(ann.predict(sc.transform([[0, 0, 0, 450, 1, 65, 3, 5000, 1, 0, 1,20000]])) > 0.5) #Experiment 3

[[False]]


In [None]:
print(ann.predict(sc.transform([[1, 0, 0, 600, 1, 40, 3, 60000, 0, 1, 1,50000]])) > 0.5) #Experiment 4

[[ True]]


In [None]:
print(ann.predict(sc.transform([[1, 0, 0, 670, 1, 33, 3, 500, 1, 1, 1,100]])) > 0.5) #Experiment 5

[[False]]
