# Programming Assignment 1
* CSCI-5931 : Deep Learning
* Spring 2024
* Instructor: Ashis Kumer Biswas
* Student name: Bhavishya Vudatha


## Little Background about the problem

`Customer churn`, also known as customer attrition, occurs when customers stop doing business with a company. It is also referred to as loss of clients or customers.

You are given sensitive information on 9,000 customers of a European bank, EBQ. Your task is to build an Artificial Neural Network (ANN) based on the dataset so that the ANN model can later predict correctly who is going to leave next. This predictive analysis is vital for the EBQ bank to revise its business strategy towards customer retention. What do you think?

Anyway, you are recruited by the bank to do the data science. And, the head of the bank only trusts heads, i.e., brains…. I mean neural networks for making any decisions. And luckily you were in Dr. B’s class and you know something(?) about the ANN that you could successfully convince the head of the bank during the interview. He has put a lot of faith in you. Now, can you solve his problem?

Tasks:

1. Please download the zip file, `PA1-deliverables.zip`. Unzip it in your workspace. Below is the file hierarchy of the "PA1-deliverables/" folder:

```
PA1-deliverables
├── 2024-Spring-DL-PA1-assignment.ipynb
├── dataset
│   └── datasetX.csv
├── figures
│   ├── le.png
│   ├── nn-1.png
│   ├── nn-1.svg
│   ├── nn-2.png
│   ├── nn-2.svg
│   ├── nn-3.png
│   ├── nn-3.svg
│   └── ohe.png
└── saved_models
```
As you can see, you will mostly be working with `2024-Spring-DL-PA1-assignment.ipynb`, i.e., the Jupyter notebook. The notebook accesses the dataset file `dataset/datasetX.csv`, which contains customer information and is labeled (i.e., the target column, `Exited`, is present). Below is a brief summary of the features you will find in the dataset:

* `CustomerId`: a unique identifier for each customer within the dataset. These values are not ordered sequentially within the dataset, and are only used to identify a specific customer. It typically does not have any influence on whether a customer leaves the business.
* `Surname`: A string used to identify the customer in the dataset. Surname may be distinct amidst all or most customers. Because of this, it most likely won't affect the target variable. 
* `CreditScore`: a numeric representation of the customer's individual fiscal credit score. Typically used to indicate eligibility for loans. Current credit scores use a range from 300 to 850, but the FICO auto score range uses 250-900. This feature likely determines retention rate of customers. 
* `Geography`: this feature contains a categorical string representing the name of a country the customer is from originally. 
* `Gender`: this feature contains a categorical string representing the gender of the customer ("Male"/"Female"). 
* `Age`: a numerical integer representation of a customer's age. Intuition suggests that older customers are likely to have higher retention than younger customers.
* `Tenure`: a numerical integer representation. It is assumed that this feature represents the number of total years the customer has been retained. It is likely that customers which have been retained longer will continue to be retained.
* `Balance`: a numerical floating point number (to two decimal places of precision) indicating the customer's current bank balance (assumed total across all accounts). Customers with a greater balance may be less likely to exit the account due to difficulty of transfer. 
* `NumOfProducts`: numeric integer value. It is assumed that this value represents the number of accounts (products) that this customer has open. Further evaluation of this feature would be needed to determine the usefulness of this feature, but at face-value, intuition dictates that a customer with more products is less likely to exit. 
* `HasCrCard`: boolean flag (0 or 1) representing whether the customer has a credit card or not. 
* `IsActiveMember`: boolean flag (0 or 1) representing whether the customer is an active member of the bank. It is assumed this indicates whether the customer has transactions on the regular banking statement. Intuition dictates that inactive members are more likely to exit. 
* `EstimatedSalary`: numerical floating point representation of the customer's predicted salary (to two decimal places). Intuition dictates that customers with different incomes may behave differently with respect to retention rate. 
* `Exited`: boolean flag (0 or 1) representing whether the customer has exited their account. This is the target variable for the dataset. It should not be dropped, but should not be included as the training input (X), and should instead be separated as the target label (y). 

You will also see an empty directory `saved_models/`, that is for you to save all the models you'd train in this assignment.

The `figures/` directory contains a few image files used to properly document this assignment. Please do not delete them and, when possible, move them together with this Jupyter notebook so that its contents display properly.

> In this Jupyter notebook please write your solutions / codes in the cells marked with `#Your solution goes here...`. You may add additional code cells after that cell if you desire. But, please do not remove any cell originally given in the notebook.

> After you solve the assignment in the jupyter notebook, be sure to execute and save it so that execution/results/printouts are also saved with it.
> Finally, submit only the saved Jupyter notebook (`2024-Spring-DL-PA1-assignment.ipynb`) in Canvas to receive a grade. For this assignment, Canvas will only accept the Jupyter notebook with the "*.ipynb" extension.

## Task 1 : (10 points)
* Define a function named `summarize_dataset` that takes only one argument: `csv_file`, where `csv_file` is the name of the given `csv` file with this assignment, i.e., `datasetX.csv`. 
  * The function is expected to summarize the given dataset in the following way:
```
total number of rows = a
total number of columns = b
number of columns having non-numeric values = c
columns with missing values = [ (d1, e1)  (d2, e2), ... ]
gender based summary of exited column = [ (f1, g1)  (f2, g2), ... ]
age based summary of exited column = [ ('below or equal to 40', h1)  ('above 40', h2) ]
credit score summary =  i +/- j 
```
  
where,

* `a` is total number of rows in the dataset.
* `b` is total number of columns in the dataset.
* `c` is number of columns having non-numeric values.
* $(d_i, e_i)$ (i.e., a pair/tuple entry) represents column name ($d_i$) and number of missing values present in that column ($e_i$). If number of missing values in a column is zero (0), you do not need to list it. Please sort the tuple entries in descending order of $e_i$ values.
* $g_i$ represents the percentage of gender $f_i$ who exited. Please sort the tuple entries in descending order of $g_i$ values. Also, print the percentages with 2 digits after the decimal point, followed by the `%` symbol.
* $h_1$ and $h_2$ represent the percentage of $\leq 40$ year olds who exited and the percentage of $>40$ year olds who exited, respectively. Also, print the percentages with 2 digits after the decimal point, followed by the `%` symbol.
* `i` and `j` are the average and standard deviation of credit scores among the data samples, respectively. Please print them the way it is shown above, with both values to 2 digits after the decimal point.
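
The sorting and formatting rules above can be sketched in plain Python. The names and numbers below are made up for illustration and are not taken from `datasetX.csv`.

```python
# Hypothetical (name, value) pairs; not real values from datasetX.csv.
missing = [("ColA", 3), ("ColB", 12), ("ColC", 0)]

# Keep only columns that actually have missing values, largest count first.
missing_sorted = sorted(
    [(name, count) for name, count in missing if count > 0],
    key=lambda pair: pair[1],
    reverse=True,
)

# Percentages: always two digits after the decimal point, followed by '%'.
rates = [("GroupX", 12.5), ("GroupY", 30.0)]
rates_sorted = sorted(rates, key=lambda pair: pair[1], reverse=True)
rates_formatted = [(name, f"{value:.2f}%") for name, value in rates_sorted]

print(missing_sorted)   # [('ColB', 12), ('ColA', 3)]
print(rates_formatted)  # [('GroupY', '30.00%'), ('GroupX', '12.50%')]
```

A format specifier such as `:.2f` keeps trailing zeros, whereas `str(round(30.0, 2))` would print `30.0`.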


In [1]:
import pandas as pd
def summarize_dataset(file):
    df=pd.read_csv(file)
    
    #total number of rows
    a=len(df)
    
    #total number of columns
    b=len(df.columns)
    
    # number of columns having non-numeric values
    c=df.select_dtypes(exclude=['number']).shape[1]
    
    
     # Columns with missing values
    missing = df.isnull().sum()
    missing = missing.loc[missing > 0].sort_values(ascending=False)
    d = list(zip(missing.index, missing.values))
    
    
    # Gender based summary of exited column (percent exited, highest first)
    gender = (df.groupby('Gender')['Exited'].mean() * 100).sort_values(ascending=False)
    f = [(name, f"{pct:.2f}%") for name, pct in gender.items()]
    
    # Age based summary of exited column: the bins are (0, 40] and (40, inf]
    age = df.groupby(pd.cut(df['Age'], bins=[0, 40, float('inf')]))['Exited'].mean() * 100
    h1, h2 = age.iloc[0], age.iloc[1]
    # Format with exactly two digits after the decimal point
    age_summary = [('below or equal to 40', f"{h1:.2f}%"), ('above 40', f"{h2:.2f}%")]

    
    # Credit score summary
    i = df['CreditScore'].mean()
    j = df['CreditScore'].std()
    creditscore_summary = f"{i:.2f} +/- {j:.2f}"
    
    # Print the summaries
    print("Total number of rows =", a)
    print("Total number of columns =", b)
    print("Number of columns having non-numeric values =", c)
    print("Columns with missing values =", d)
    print("Gender based summary of exited column =", f)
    print("Age based summary of exited column =", age_summary)
    print("Credit score summary =", creditscore_summary)
    
summarize_dataset("datasetX.csv")



Total number of rows = 9000
Total number of columns = 13
Number of columns having non-numeric values = 3
Columns with missing values = [('Age', 397), ('CreditScore', 26)]
Gender based summary of exited column = [('Female', '24.77%'), ('Male', '16.63%')]
Age based summary of exited column = [('below or equal to 40', '10.94%'), ('above 40', '37.63%')]
Credit score summary = 650.25 +/- 96.75


## Task 2
* Preprocessing the given dataset for the model training.

### Task 2.1 (10 points)

* First preprocessing that we are going to do on the dataset is dropping two features (i.e., columns) that, I think, are irrelevant and would not make any meaningful relationship with the `Exited` feature. The features are: `CustomerId` and `Surname`.
* Make sure to create a variable called `dataset_dropped` that will store the revised dataset.
* Please print the name of the columns of the revised dataset.

In [32]:
dataset = pd.read_csv('datasetX.csv')
dataset_dropped = dataset.drop(columns=['CustomerId', 'Surname']).dropna()
dataset_dropped.columns



Index(['CreditScore', 'Geography', 'Gender', 'Age', 'Tenure', 'Balance',
       'NumOfProducts', 'HasCrCard', 'IsActiveMember', 'EstimatedSalary',
       'Exited'],
      dtype='object')

### Task 2.2 (10 points)
* Second Preprocessing that we are going to do is *Shuffle Rows* of the dataset obtained from `Task 2.1`.
* "It is extremely important to shuffle the training data, so that you do not obtain entire minibatches of highly correlated examples. As long as the data has been shuffled, everything should work OK. Different random orderings will perform slightly differently from each other but this will be a small factor that does not matter much." -- [Ian Goodfellow](https://qr.ae/pGBgw8)
* Use a random seed value `4321` in case you will call any stochastic method.
* Make sure to create a variable called `dataset_shuffled` that will store the revised dataset.


In [33]:
random_seed = 4321
dataset_shuffled = dataset_dropped.sample(frac=1, random_state=random_seed)



### Task 2.3: (10 points)

* Third Preprocessing that we will do is X-y Partitioning of the dataset obtained from `Task 2.2`.
* In its current state, the dataset contains both independent (input, `X`) and the target (output, `y`) features within the same dataframe. For ease of training, we need to partition the training features from the target feature into two separate dataframes. 
* Make sure, the following cell contains at least two variables: `X` and `y`:
  * `X` contains part of the dataset with only independent features, and 
  * `y` having only the dependent/target feature.

In [34]:
X = dataset_shuffled.drop(columns=['Exited'])
y = dataset_shuffled['Exited']


### Task 2.4 (10 points)
* Fourth Preprocessing that we will do is Train-Test Split of X, y obtained from `Task 2.3`.
* Now that we have X and y tables with appropriate feature pruning performed, we must split the data into a training partition (`X_train, y_train`) and a testing partition (`X_test, y_test`). 
* The training partitions (`X_train, y_train`) will be used to train your model, while the test partition (`X_test, y_test`) will be set aside during the training steps, and will only be used to evaluate the trained model. 
* Training and test splits should be mutually exclusive to the datasets... i.e., a sample can not be both in training and test sets.
* Please perform an 80-20 split, meaning 80% of the (X, y) dataset will be in the (X_train, y_train) split, while the remaining 20% will be in the (X_test, y_test) split. 
* Please use random seed `4321` prior to calling any stochastic methods.
* Make sure the following cell contains at least 4 variables: `X_train`, `y_train`, `X_test`, `y_test`.

In [35]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8, random_state=random_seed)
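
To make the "mutually exclusive" requirement concrete, here is a minimal sketch of a seeded 80-20 split over row indices. It is not how `train_test_split` is implemented; `split_indices` is a hypothetical helper, and it only illustrates that shuffling once and cutting the shuffled order yields two disjoint, reproducible partitions.

```python
import random

def split_indices(n_rows, train_fraction=0.8, seed=4321):
    """Return disjoint (train, test) index lists that together cover range(n_rows)."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded, so the split is reproducible
    cut = int(n_rows * train_fraction)
    return indices[:cut], indices[cut:]

train_idx, test_idx = split_indices(10)
print(len(train_idx), len(test_idx))        # 8 2
print(set(train_idx) & set(test_idx))       # set(): no row is in both partitions
```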

### Task 2.5 (10 points)

* Fifth preprocessing that we will do is the *Conversion of Categorical features to Numerical*
* Please adopt the `One Hot Encoding` method instead of `Label Encoding` while converting the categorical features. 
* Make sure the following cell contains a variable named `X_train_ohe` that would contain one hot encoded `X_train` data; on the two categorical columns: 'Geography','Gender'. Please save the encoder for later use; e.g., encode `X_test` dataset, or any future test sample given to you. Under any circumstance, you must not encode `X_test` independently like you would do for `X_train`.
* Now, encode the `X_test` data using the one hot encoder you saved while you encoded the `X_train`, and name the variable `X_test_ohe`.


* **Both encoding techniques are outlined below**:
> A little background first: Categorical features are features that contain values that are not numeric. It would be absurd to work with non-numeric features if you ask neurons in your ANN to compute the weighted sum of inputs, and then pass through activation function, right? These maths are undefined. An obvious solution you may be intrigued to do is dropping the features! Aha! Wrong!! Every piece of data is precious... may present with valuable insights of the data samples to find the patterns to map inputs with output/targets. So, we should include them. But, how?

The answer is via "Encoding". 

Several types of encodings are used in practice. Here below are just 2 popular ones:
1. **Label Encoding**, where labels are encoded as consecutive numbers. Say, a categorical feature named "Category" with three categorical values, {“Cat”, “Dog”, “Zebra”}, can be encoded as "0", "1", "2" respectively, as in the figure below. The issue is that this type of encoding may unintentionally impose an ordering on the categories, which may add bias to the training.


![label-encoding](figures/le.png)

2. **One Hot Encoding** ignores the ordering of the categories altogether. With one-hot, we convert each categorical value into a new column and assign a binary value of 1 or 0 to those columns. Each value is thus represented as a binary vector in which all entries are zero except the one at the index of that category, which is marked with a 1. Also, don't forget to remove the original categorical features. Below is an example of how to convert the categorical feature called "Category", having the values {“Cat”, “Dog”, “Zebra”}, into three new binary features: "Cat", "Dog", "Zebra".

![label-encoding](figures/ohe.png)
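
The two encodings can be sketched in a few lines of plain Python. The helper names (`fit_categories`, `label_encode`, `one_hot_encode`) are hypothetical and only mirror the idea; the key point is that the category list is learned once from the training values and then reused for any later sample.

```python
def fit_categories(values):
    """Learn the sorted list of distinct categories from the training values."""
    return sorted(set(values))

def label_encode(values, categories):
    """Map each value to the position of its category."""
    return [categories.index(v) for v in values]

def one_hot_encode(values, categories):
    """Map each value to a binary vector with a single 1 at its category."""
    return [[1 if v == c else 0 for c in categories] for v in values]

train = ["Cat", "Dog", "Zebra", "Dog"]
categories = fit_categories(train)

print(categories)                                     # ['Cat', 'Dog', 'Zebra']
print(label_encode(train, categories))                # [0, 1, 2, 1]
# A later sample is encoded with the categories learned from the training data.
print(one_hot_encode(["Zebra", "Cat"], categories))   # [[0, 0, 1], [1, 0, 0]]
```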

**A note on the Dummy Variable Trap**
The Dummy Variable Trap occurs when two or more dummy variables created by one-hot encoding are highly correlated (i.e., they become multi-collinear). This means that one variable can be predicted from the others, making it difficult to interpret the fitted coefficients in regression models. In other words, the individual effect of the dummy variables on the prediction model cannot be interpreted well because of multicollinearity.

Using the one-hot encoding method, a new dummy variable is created for each value of a categorical variable to represent the presence (1) or absence (0) of that value. For example, if tree species is a categorical variable made up of the values pine or oak, then tree species can be represented with dummy variables by converting each value to a one-hot vector. This means that a separate column is obtained for each category, where the first column represents whether the tree is pine and the second column represents whether the tree is oak. Each column contains a 1 if the tree in question is of the column's species and a 0 otherwise. These two columns are multi-collinear, since if a tree is pine then we know it is not oak, and vice versa. Machine learning models trained on a dataset having this multi-collinearity can suffer. A remedy is to drop the first (or any one) of the dummy (i.e., one-hot) features created.
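
The redundancy described above is easy to see on the tree example. This is a minimal sketch with made-up rows: every one-hot row sums to 1, so any single column can be reconstructed from the others, and dropping the first column loses no information.

```python
species = ["pine", "oak", "oak", "pine"]
categories = ["oak", "pine"]  # sorted order, so "oak" is the first dummy column

# Full one-hot encoding: one column per category.
full = [[1 if s == c else 0 for c in categories] for s in species]
print(full)     # [[0, 1], [1, 0], [1, 0], [0, 1]]

# Each row sums to 1, so the "oak" column is always 1 - "pine".
row_sums = [sum(row) for row in full]
print(row_sums)  # [1, 1, 1, 1]

# Remedy: drop the first dummy column and keep the rest.
reduced = [row[1:] for row in full]
print(reduced)  # [[1], [0], [0], [1]]
```

For a feature with three values, dropping the first dummy leaves two columns, which is why only `Geography_Germany` and `Geography_Spain` appear among the encoded features later in this notebook.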

In [36]:
from sklearn.preprocessing import OneHotEncoder
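# Note: scikit-learn 1.2 renamed `sparse` to `sparse_output`; newer releases need sparse_output=False
encoder = OneHotEncoder(sparse=False, drop='first')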
X_train_encoded = encoder.fit_transform(X_train[['Geography', 'Gender']])
X_train_encoded_df = pd.DataFrame(X_train_encoded, columns=encoder.get_feature_names_out(['Geography', 'Gender']))
X_train_ohe = pd.concat([X_train.drop(columns=['Geography', 'Gender']).reset_index(drop=True), X_train_encoded_df], axis=1)
X_test_encoded = encoder.transform(X_test[['Geography', 'Gender']])
X_test_encoded_df = pd.DataFrame(X_test_encoded, columns=encoder.get_feature_names_out(['Geography', 'Gender']))
X_test_ohe = pd.concat([X_test.drop(columns=['Geography', 'Gender']).reset_index(drop=True), X_test_encoded_df], axis=1)





### Task 2.6: (10 points)

* Sixth Preprocessing that we are going to do is *Normalization of X_train_ohe, and X_test_ohe*

* Now that we have all-numerical training and test datasets, `X_train_ohe` and `X_test_ohe` respectively, we can normalize each feature in both datasets. **Normalization** is just one of the ways to scale a feature. In class you'll learn a ton of other ways to scale. For this task, let's resort to **Normalization**.

> "The rule of thumb for scaling datasets, is we scale training dataset first, then using the statistics that we learn during the scaling process, we scale the test dataset. We do not learn any new statistics while we scale the test dataset."

* Also, scaling is commonly performed column-wise, and never sample/row wise.

* Make sure the following cell contains the two scaled variables: `X_train_scaled` and `X_test_scaled` based on the requirements mentioned above.

In [37]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train_ohe)
X_test_scaled = scaler.transform(X_test_ohe)
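
The rule of thumb quoted above can be sketched for a single column in plain Python. The helpers `fit_standardizer` and `standardize` are hypothetical names, and the numbers are made up; the sketch uses the population standard deviation, which is also what `StandardScaler` uses.

```python
import statistics

def fit_standardizer(train_column):
    """Learn the mean and (population) standard deviation from the training column only."""
    return statistics.mean(train_column), statistics.pstdev(train_column)

def standardize(column, mean, std):
    """Scale a column with previously learned statistics; nothing is re-learned here."""
    return [(x - mean) / std for x in column]

train_col = [1.0, 3.0]
test_col = [2.0, 5.0]

mean, std = fit_standardizer(train_col)        # statistics come from the training data
train_scaled = standardize(train_col, mean, std)
test_scaled = standardize(test_col, mean, std)  # the same statistics are reused

print(train_scaled)  # [-1.0, 1.0]
print(test_scaled)   # [0.0, 3.0]
```

Note that the scaled test values need not have zero mean or unit variance, because they are expressed in the training set's units.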


## Task 3: (10 points)
* *Designing your first Artificial Neural Network (ANN) based classifier* using i) **Micrograd**, and ii) **Tensorflow** or **PyTorch**:

  > **Micrograd** by Andrej Karpathy [[video](https://youtu.be/VMj-3S1tku0?si=D5m1IJW5AkJzhvLE)][[git-repo](https://github.com/karpathy/micrograd.git)]

  > **Keras/Tensorflow** @ Python reference:  please take a look here [https://keras.io/getting-started/sequential-model-guide/](https://keras.io/getting-started/sequential-model-guide/). 

  > **PyTorch** @Python reference: Please take a look at [Deep Learning with PyTorch Guide](https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html)
  
  
### Step 1: The ANN architecture
* Let's design the first artificial network architecture for the classifier we would like to build. Here below is one. How did I get this architecture? Maybe in my dream! Haha. Someday you will get one too. Until that, let's follow the architecture below:
  ![Task 3 ANN architecture](figures/nn-1.png)
  * **Input layer** will have 11 units as the dimension of training set: `X_train_scaled` (i.e, number of columns = 11).
  * **First hidden layer** will have 5 neurons, each with "Rectified Linear Unit (`ReLU`)" as activation function.
  * **Second hidden layer** will have 4 neurons, each with "`ReLU`" as activation function.
  * **Output layer** will have just 1 neuron, with `sigmoid` activation function. 
    * The reason behind a single neuron with `sigmoid` activation at the output layer is that, output of this neuron will tell the probability score of the target outcome: "Exited" True or False. If the output neuron produces value above 0.5, we will say the neural network predicted "True", otherwise, False. This is the beauty of using sigmoid function at the output layer as we can interpret the output value of the neuron as probability score.
* The architecture will come to life when you initiate the training process with training data.
  * The training process needs a **gradient descent based optimizer**, and a convex-looking **loss function**.  
  * For this task, let's choose the `adam` optimizer, and the `binary_crossentropy` as the loss function.
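
As a minimal sketch of the output layer's behaviour, the snippet below applies a sigmoid to a few made-up output pre-activations, thresholds the result at 0.5, and evaluates the binary cross-entropy. The `eps` clipping is an assumption added here to keep the logarithm finite; it is not part of the assignment text.

```python
import math

def sigmoid(z):
    """Squash a real number into (0, 1) so it can be read as a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

logits = [-2.0, 0.0, 3.0]                      # made-up pre-activations
probs = [sigmoid(z) for z in logits]
preds = [1 if p > 0.5 else 0 for p in probs]   # strictly above 0.5 means "Exited"

print(preds)                                   # [0, 0, 1]
print(binary_crossentropy([1], [0.5]))         # ln(2), about 0.6931
```
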
### Step 2: The Training process
* Let's start the training process with the training dataset, `X_train_scaled`.
  * Gradient descent based optimization runs in iterations. Once the optimizer has processed every training sample once, we say that `1 epoch` has passed. Let's continue the training for `25 epochs`, although you are welcome to run longer than this. There are, however, simple ways to decide whether you should stop the training early. 
    * (Optional) Can you extract information about optimization in each epoch? If so, draw a epoch-loss plot, where X-axis needs to show epoch numbers, and Y-axis will show the `binary_crossentropy` loss value in that particular epoch iteration.
* Don't forget to save the model into a file in the `saved_models/` directory so that you can re-use it later for further prediction. Let's give it a name: `model-ann-11-5-4-1-xx` with an extension of your choosing, where `xx` must be replaced by one of `{mc,pt,tf}`: `mc` denotes a micrograd based model, `pt` a PyTorch model, and `tf` a TensorFlow model.
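
The relationship between iterations and epochs can be sketched as follows. The sample count and batch sizes are made up, and `steps_per_epoch` and `minibatches` are hypothetical helpers: one epoch is one full pass over the training samples, so the number of optimizer updates per epoch depends on the batch size (with a batch size of 1, it is one update per training sample).

```python
import math

def steps_per_epoch(n_samples, batch_size):
    """Number of optimizer updates needed to see every sample once."""
    return math.ceil(n_samples / batch_size)

def minibatches(n_samples, batch_size):
    """Yield index ranges that together cover all samples exactly once."""
    for start in range(0, n_samples, batch_size):
        yield range(start, min(start + batch_size, n_samples))

print(steps_per_epoch(100, 1))    # 100
print(steps_per_epoch(100, 32))   # 4 (three full batches and one of 4 samples)
print([len(b) for b in minibatches(100, 32)])  # [32, 32, 32, 4]
```
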

### Step 3: The Evaluation

#### (part 3.1) Evaluating your model with the entire test dataset:

* Load your trained model `model-ann-11-5-4-1-xx` from the file, and have it predict the entire test set you have at hand (`X_test_scaled`). Luckily, for each test sample in the set, you also have the ground truth `Exited` value in `y_test`. 
* Please report/print your model's predictive performance on the test set in terms of `accuracy`, `precision`, `recall`, and `F1 scores`.
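
For reference, the four metrics can be computed from the confusion counts as sketched below. The labels are made up, and `classification_metrics` is a hypothetical helper; in practice `sklearn.metrics` offers equivalent functions.

```python
def classification_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall, f1) for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0   # of predicted exits, how many were real
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # of real exits, how many were found
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return accuracy, precision, recall, f1

y_true = [1, 0, 1, 1, 0, 0]   # made-up ground truth
y_pred = [1, 0, 0, 1, 1, 0]   # made-up predictions: tp=2, tn=2, fp=1, fn=1

accuracy, precision, recall, f1 = classification_metrics(y_true, y_pred)
print(f"{accuracy:.4f} {precision:.4f} {recall:.4f} {f1:.4f}")  # 0.6667 0.6667 0.6667 0.6667
```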

#### (part 3.2) Evaluating your model with 1 test sample with known Exited value

* Here is a single test sample for which we know the ground truth `Exited` value:

| CustomerId | Surname | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember |EstimatedSalary | Exited |
| :---        |    :----:   |          ---: | :---        |    :----:   |          ---: | :---        |    :----:   |          ---: | :---        |    :----:   |          ---: |          ---: |
| 55443322 | Reynolds |709|Germany|Male|30|9|115479.48|2|1|1|134732.99|0|

* Load your trained model `model-ann-11-5-4-1-xx` from the file, and have it predict the test sample above. Please don't forget to preprocess this test sample so that it is compliant with the input and model requirements.
* Please report whether it predicts a 0 or 1 for the `Exited` target, and also comment whether your model makes a mistake or predicts correctly.

#### (part 3.3) Evaluating your model with 1 test sample without known Exited value

* Here is a single test sample for which we **do not know** the ground truth `Exited` value:

| CustomerId | Surname | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember |EstimatedSalary | 
| :---        |    :----:   |          ---: | :---        |    :----:   |          ---: | :---        |    :----:   |          ---: | :---        |    :----:   |          ---: |
| 55443323 | Nguyen |603|France|Female|76|20|123456.78|5|1|1|55000.00|


* Load your trained model `model-ann-11-5-4-1-xx` from the file, and have it predict the test sample above. Please don't forget to preprocess this test sample so that it is compliant with the input and model requirements.
* Please report whether it predicts a 0 or 1 for the `Exited` target. Can you comment on this data sample whether your model captured the pattern in the population?



In [28]:
import numpy as np

# Define activation functions
class ReLU:
    def __call__(self, x):
        return np.maximum(0, x)

    def gradient(self, x):
        return np.where(x > 0, 1, 0)

class Sigmoid:
    def __call__(self, x):
        return 1 / (1 + np.exp(-x))

    def gradient(self, x):
        return self.__call__(x) * (1 - self.__call__(x))

# Define layer class
class Dense:
    def __init__(self, input_size, output_size, activation):
        self.weights = np.random.randn(input_size, output_size)
        self.bias = np.zeros((1, output_size))
        self.activation = activation()

    def __call__(self, x):
        self.input = x
        self.output = np.dot(x, self.weights) + self.bias
        return self.activation(self.output)

    def backward(self, grad_output, learning_rate):
        grad_input = np.dot(grad_output, self.weights.T)
        grad_weights = np.dot(self.input.T, grad_output)
        grad_bias = np.sum(grad_output, axis=0, keepdims=True)

        self.weights -= learning_rate * grad_weights
        self.bias -= learning_rate * grad_bias

        return grad_input

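# Define binary crossentropy loss
def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip predictions away from 0 and 1 so that np.log never receives 0
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))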

# Define neural network class
class ANN:
    def __init__(self):
        self.layer1 = Dense(11, 5, ReLU)
        self.layer2 = Dense(5, 4, ReLU)
        self.output_layer = Dense(4, 1, Sigmoid)

    def __call__(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.output_layer(x)
        return x

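    def backward(self, grad_output, learning_rate):
        # grad_output is the gradient w.r.t. the output layer's pre-activation
        grad = self.output_layer.backward(grad_output, learning_rate)
        # Chain rule: scale by the ReLU derivative at each hidden layer's pre-activation
        grad = grad * self.layer2.activation.gradient(self.layer2.output)
        grad = self.layer2.backward(grad, learning_rate)
        grad = grad * self.layer1.activation.gradient(self.layer1.output)
        self.layer1.backward(grad, learning_rate)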

    def save_model(self, file_path):
        np.savez(file_path, layer1_weights=self.layer1.weights, layer1_bias=self.layer1.bias,
                 layer2_weights=self.layer2.weights, layer2_bias=self.layer2.bias,
                 output_weights=self.output_layer.weights, output_bias=self.output_layer.bias)

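# Training loop
def train(X_train, y_train, epochs, learning_rate):
    model = ANN()
    # Reshape the labels to a column vector so they line up with the (n, 1) predictions
    y_true = np.asarray(y_train, dtype=float).reshape(-1, 1)
    for epoch in range(epochs):
        # Forward pass
        y_pred = model(X_train)
        
        # Compute loss
        loss = binary_crossentropy(y_true, y_pred)
        
        # Backward pass: for a sigmoid output with binary cross-entropy, the gradient
        # w.r.t. the output pre-activation is (y_pred - y_true), averaged over the samples
        grad_output = (y_pred - y_true) / len(y_true)
        model.backward(grad_output, learning_rate)
        
        # Print loss
        print(f"Epoch {epoch}: Loss {loss}")

    # Save model after training (np.savez appends ".npz" to the file name)
    model.save_model("saved_models/model-ann-11-5-4-1-mc.h5")


# Call train at module level; calling it from inside its own body would recurse
train(X_train_scaled, y_train, epochs=25, learning_rate=0.01)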
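
The `y_pred - y` form of the output gradient used in the training loop above follows from a short derivation. The notation here is introduced only for this sketch: $y$ is the label of one sample, $z$ is the pre-activation of the output neuron, and $\hat{y} = \sigma(z)$ is the sigmoid output.

$$
\begin{aligned}
L &= -\left[\, y \log \hat{y} + (1 - y) \log (1 - \hat{y}) \,\right], \\
\frac{\partial L}{\partial \hat{y}} &= \frac{\hat{y} - y}{\hat{y}\,(1 - \hat{y})}, \qquad
\frac{\partial \hat{y}}{\partial z} = \hat{y}\,(1 - \hat{y}), \\
\frac{\partial L}{\partial z} &= \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z} = \hat{y} - y .
\end{aligned}
$$

When the loss is the mean over $n$ samples, each sample's gradient is additionally divided by $n$. For a hidden layer $l$ with ReLU activation, the chain rule also requires the activation's derivative; in the row-vector convention of the code above, $\delta^{(l)} = \left(\delta^{(l+1)} W^{(l+1)\top}\right) \odot \mathrm{ReLU}'\!\left(z^{(l)}\right)$, where $\delta^{(l)}$ denotes the gradient with respect to the pre-activation $z^{(l)}$.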


In [38]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

def create_model():
   
    model = Sequential()
    model.add(Dense(5, input_dim=11, activation='relu'))
    model.add(Dense(4, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    
    return model

model = create_model()

model.summary()

Model: "sequential_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_14 (Dense)            (None, 5)                 60        
                                                                 
 dense_15 (Dense)            (None, 4)                 24        
                                                                 
 dense_16 (Dense)            (None, 1)                 5         
                                                                 
Total params: 89 (356.00 Byte)
Trainable params: 89 (356.00 Byte)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
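
The `Param #` column above can be checked by hand: a dense layer with `n_in` inputs and `n_out` neurons has `n_in * n_out` weights plus `n_out` biases. A minimal sketch, where `dense_param_counts` is a hypothetical helper:

```python
def dense_param_counts(layer_sizes):
    """Parameters per dense layer: (inputs + 1 bias) * neurons."""
    return [(n_in + 1) * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

task3_counts = dense_param_counts([11, 5, 4, 1])
print(task3_counts, sum(task3_counts))   # [60, 24, 5] 89

# The same arithmetic for the 11-8-8-8-1 architecture of Task 4.
task4_counts = dense_param_counts([11, 8, 8, 8, 1])
print(task4_counts, sum(task4_counts))   # [96, 72, 72, 9] 249
```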


In [39]:
import tensorflow as tf
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

history = model.fit(X_train_scaled, y_train, epochs=25,verbose=1)
model_name = "saved_models/model-ann-11-5-4-1-tf.h5"
model.save(model_name)

model_loaded = tf.keras.models.load_model(model_name)
y_pred = model_loaded.predict(X_test_scaled)
y_pred_binary = (y_pred > 0.5).astype(float) 

accuracy = accuracy_score(y_test, y_pred_binary)
precision = precision_score(y_test, y_pred_binary)
recall = recall_score(y_test, y_pred_binary)
f1 = f1_score(y_test, y_pred_binary)

# Print the evaluation metrics
print("Model Performance on Test Set:")
print(f"Accuracy: {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1 Score: {f1:.4f}")



Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25
Epoch 6/25
Epoch 7/25
Epoch 8/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
Epoch 12/25
Epoch 13/25
Epoch 14/25
Epoch 15/25
Epoch 16/25
Epoch 17/25
Epoch 18/25
Epoch 19/25
Epoch 20/25
Epoch 21/25
Epoch 22/25
Epoch 23/25
Epoch 24/25
Epoch 25/25


Model Performance on Test Set:
Accuracy: 0.8456
Precision: 0.7175
Recall: 0.4420
F1 Score: 0.5470
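
As a quick sanity check on the numbers printed above, the F1 score is the harmonic mean of precision and recall, so it can be recomputed from the two reported values (which are themselves rounded to four decimals):

```python
precision = 0.7175   # reported above
recall = 0.4420      # reported above

# Harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
print(f"{f1:.4f}")   # 0.5470
```

The gap between precision and recall shows that the model misses more than half of the customers who actually exited, even though overall accuracy is about 85%.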


In [70]:
import numpy as np

# Preprocess the test sample
test_sample = {
    'CreditScore': [709],
    'Age': [30],
    'Tenure': [9],
    'Balance': [115479.48],
    'NumOfProducts': [2],
    'HasCrCard': [1],
    'IsActiveMember': [1],
    'EstimatedSalary': [134732.99],
    'Geography_Germany': [1.0],
    'Geography_Spain': [0.0],
    'Gender_Male': [1.0]
}

# Convert the test sample to a DataFrame
test_df = pd.DataFrame(test_sample)

# Scale the test sample with the scaler already fitted on the training data (do not refit)
test_scaled = scaler.transform(test_df)

# Load the trained model
loaded_model = tf.keras.models.load_model("saved_models/model-ann-11-5-4-1-tf.h5")

# Predict the outcome for the test sample
prediction = loaded_model.predict(test_scaled)

# Convert the prediction to binary (0 or 1)
predicted_exit = np.round(prediction)[0][0]

# Ground truth Exited value for the test sample
true_exit = 0

# Report the prediction
print("Predicted Exit:", int(predicted_exit))
print("True Exit:", true_exit)

# Comment on the prediction
if predicted_exit == true_exit:
    print("Model predicts correctly!")
else:
    print("Model makes a mistake.")


Predicted Exit: 0
True Exit: 0
Model predicts correctly!


In [71]:
import numpy as np

# Preprocess the test sample
test_sample1 = {
    'CreditScore': [603],
    'Age': [76],
    'Tenure': [20],
    'Balance': [123456.78],
    'NumOfProducts': [5],
    'HasCrCard': [1],
    'IsActiveMember': [1],
    'EstimatedSalary': [55000.00],
    'Geography_Germany': [0.0],
    'Geography_Spain': [0.0],
    'Gender_Male': [0.0]
}

# Convert the test sample to a DataFrame
test_df = pd.DataFrame(test_sample1)

# Scale the test sample with the scaler already fitted on the training data (do not refit)
test_scaled = scaler.transform(test_df)

# Load the trained model
loaded_model = tf.keras.models.load_model("saved_models/model-ann-11-5-4-1-tf.h5")

# Predict the outcome for the test sample
prediction = loaded_model.predict(test_scaled)

# Convert the prediction to binary (0 or 1)
predicted_exit = np.round(prediction)[0][0]

# Ground true Exited value for the test sample
true_exit = 0

# Report the prediction
print("Predicted Exit:", int(predicted_exit))
print("True Exit:", true_exit)

# Comment on the prediction
if predicted_exit == true_exit:
    print("Model predicts correctly!")
else:
    print("Model makes a mistake.")


Predicted Exit: 0
True Exit: 0
Model predicts correctly!
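
When a single sample is scaled, the mean and spread should come from the training data, not from the sample itself. The sketch below illustrates the difference with plain NumPy. The `train` and `sample` arrays are made-up numbers rather than rows of `datasetX.csv`, the `standardize` helper is hypothetical, and it assumes mean/standard-deviation scaling, which may differ from how the notebook's `scaler` is configured.

```python
import numpy as np

# Hypothetical training data: 4 customers x 2 features (CreditScore, Age)
train = np.array([[600.0, 30.0],
                  [700.0, 40.0],
                  [650.0, 50.0],
                  [750.0, 60.0]])
sample = np.array([[709.0, 30.0]])  # one new customer

def standardize(x, mean, std):
    # Guard against zero spread (a constant column) to avoid dividing by zero
    safe_std = np.where(std == 0, 1.0, std)
    return (x - mean) / safe_std

# Reuse the statistics learned from the training data
scaled_with_train_stats = standardize(sample, train.mean(axis=0), train.std(axis=0))

# Pitfall: statistics computed from the single row itself
scaled_with_own_stats = standardize(sample, sample.mean(axis=0), sample.std(axis=0))

print(scaled_with_train_stats)
print(scaled_with_own_stats)  # every feature collapses to 0
```

With statistics taken from the single row, every customer would look identical to the network, so the prediction would no longer depend on the input values.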


## Task 4: (10 points)

* Repeat Task 3 with the following new architecture of the neural network:

![Task 4 ANN architecture](figures/nn-2.png)

* The input layer still has 11 units, matching the number of columns in the training set.
* Hidden-layer-1: 8 neurons with ReLU activation.
* Hidden-layer-2: 8 neurons with ReLU activation.
* Hidden-layer-3: 8 neurons with ReLU activation.
* Output-layer: 1 neuron with sigmoid activation.
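
The size of this architecture can be worked out by hand: each fully connected layer has one weight per input-output pair plus one bias per output unit. A minimal sketch of that arithmetic follows; the helper name `dense_param_count` is illustrative and not part of the assignment.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the units per layer, input first, e.g. [11, 8, 8, 8, 1].
    """
    per_layer = [n_in * n_out + n_out  # weight matrix plus one bias per unit
                 for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
    return per_layer, sum(per_layer)

per_layer, total = dense_param_count([11, 8, 8, 8, 1])
print(per_layer, total)  # [96, 72, 72, 9] 249
```

These counts can be compared against the `Param #` column that `model.summary()` prints for the Keras version of the network.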



In [None]:
import numpy as np

# Define activation functions
class ReLU:
    def __call__(self, x):
        return np.maximum(0, x)

    def gradient(self, x):
        return np.where(x > 0, 1, 0)

class Sigmoid:
    def __call__(self, x):
        return 1 / (1 + np.exp(-x))

    def gradient(self, x):
        return self.__call__(x) * (1 - self.__call__(x))

# Define layer class
class Dense:
    def __init__(self, input_size, output_size, activation):
        self.weights = np.random.randn(input_size, output_size)
        self.bias = np.zeros((1, output_size))
        self.activation = activation()

    def __call__(self, x):
        self.input = x
        self.output = np.dot(x, self.weights) + self.bias
        return self.activation(self.output)

    def backward(self, grad_output, learning_rate):
        grad_input = np.dot(grad_output, self.weights.T)
        grad_weights = np.dot(self.input.T, grad_output)
        grad_bias = np.sum(grad_output, axis=0, keepdims=True)

        self.weights -= learning_rate * grad_weights
        self.bias -= learning_rate * grad_bias

        return grad_input

# Define binary crossentropy loss
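def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip predictions away from 0 and 1 so that np.log never receives 0
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))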

# Define neural network class
class ANN:
    def __init__(self):
        self.layer1 = Dense(11, 8, ReLU)
        self.layer2 = Dense(8, 8, ReLU)
        self.layer3 = Dense(8, 8, ReLU)
        self.output_layer = Dense(8, 1, Sigmoid)

    def __call__(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.output_layer(x)
        return x

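    def backward(self, grad_output, learning_rate):
        # grad_output is dL/dz at the output layer (sigmoid + binary crossentropy).
        # Each Dense.backward returns dL/d(its input); multiplying by the previous
        # layer's activation gradient turns that into the previous layer's dL/dz.
        grad = self.output_layer.backward(grad_output, learning_rate)
        for layer in (self.layer3, self.layer2, self.layer1):
            grad = grad * layer.activation.gradient(layer.output)
            grad = layer.backward(grad, learning_rate)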

    def save_model(self, file_path):
        # Save the parameters of every layer, including layer3
        np.savez(file_path, layer1_weights=self.layer1.weights, layer1_bias=self.layer1.bias,
                 layer2_weights=self.layer2.weights, layer2_bias=self.layer2.bias,
                 layer3_weights=self.layer3.weights, layer3_bias=self.layer3.bias,
                 output_weights=self.output_layer.weights, output_bias=self.output_layer.bias)

# Training loop
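def train(X_train, y_train, epochs, learning_rate):
    model = ANN()
    # Use a column vector so the labels line up with the (N, 1) predictions
    y_train = np.asarray(y_train, dtype=float).reshape(-1, 1)
    for epoch in range(epochs):
        # Forward pass
        y_pred = model(X_train)

        # Compute loss
        loss = binary_crossentropy(y_train, y_pred)

        # Backward pass: for a sigmoid output with mean binary crossentropy,
        # dL/dz = (y_pred - y_true) / N
        grad_output = (y_pred - y_train) / len(y_train)
        model.backward(grad_output, learning_rate)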
        
        # Print loss
        
        print(f"Epoch {epoch}: Loss {loss}")

    # Save model after training
    model.save_model("saved_models/model-ann-11-5-4-1-mc.h5")


train(X_train_scaled, y_train, epochs=25, learning_rate=0.01)
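
The training loop seeds backpropagation with the difference between predictions and labels. This works because, for a sigmoid output trained with mean binary crossentropy, the gradient of the loss with respect to the pre-activation `z` is `(sigmoid(z) - y) / N`. The sketch below checks that identity numerically with central finite differences. The values of `z` and `y` and the helper names are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mean_bce(z, y):
    # Mean binary crossentropy as a function of the pre-activation z
    p = sigmoid(z)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Made-up pre-activations and labels for four samples
z = np.array([[-1.2], [0.3], [2.0], [-0.5]])
y = np.array([[0.0], [1.0], [1.0], [0.0]])

# Analytic gradient: (sigmoid(z) - y) / N
analytic = (sigmoid(z) - y) / len(y)

# Numerical gradient via central differences, one entry at a time
eps = 1e-6
numeric = np.zeros_like(z)
for i in range(len(z)):
    z_plus, z_minus = z.copy(), z.copy()
    z_plus[i, 0] += eps
    z_minus[i, 0] -= eps
    numeric[i, 0] = (mean_bce(z_plus, y) - mean_bce(z_minus, y)) / (2 * eps)

max_diff = np.max(np.abs(analytic - numeric))
print(max_diff)  # should be close to zero
```

The same finite-difference idea can be applied to individual weights of a `Dense` layer to check a hand-written backward pass before trusting the training loss.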


In [40]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

def create_model():
   
    model = Sequential()
    model.add(Dense(8, input_dim=11, activation='relu'))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    
    return model

model = create_model()

model.summary()


Model: "sequential_5"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_17 (Dense)            (None, 8)                 96        
                                                                 
 dense_18 (Dense)            (None, 8)                 72        
                                                                 
 dense_19 (Dense)            (None, 8)                 72        
                                                                 
 dense_20 (Dense)            (None, 1)                 9         
                                                                 
Total params: 249 (996.00 Byte)
Trainable params: 249 (996.00 Byte)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________


In [41]:
import tensorflow as tf
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

history = model.fit(X_train_scaled, y_train, epochs=25,verbose=1)
model_name = "saved_models/model-ann-11-5-4-1-tf.h5"
model.save(model_name)

model_loaded = tf.keras.models.load_model(model_name)
y_pred = model_loaded.predict(X_test_scaled)
y_pred_binary = (y_pred > 0.5).astype(float) 

accuracy = accuracy_score(y_test, y_pred_binary)
precision = precision_score(y_test, y_pred_binary)
recall = recall_score(y_test, y_pred_binary)
f1 = f1_score(y_test, y_pred_binary)

# Print the evaluation metrics
print("Model Performance on Test Set:")
print(f"Accuracy: {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1 Score: {f1:.4f}")



Epoch 1/25
...
Epoch 25/25


Model Performance on Test Set:
Accuracy: 0.8415
Precision: 0.6744
Recall: 0.4807
F1 Score: 0.5613


In [72]:
import numpy as np

# Preprocess the test sample
test_sample = {
    'CreditScore': [709],
    'Age': [30],
    'Tenure': [9],
    'Balance': [115479.48],
    'NumOfProducts': [2],
    'HasCrCard': [1],
    'IsActiveMember': [1],
    'EstimatedSalary': [134732.99],
    'Geography_Germany': [1.0],
    'Geography_Spain': [0.0],
    'Gender_Male': [1.0]
}

# Convert the test sample to a DataFrame
test_df = pd.DataFrame(test_sample)

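# Scale the test sample with the scaler already fitted on the training data
# (refitting on a single row would discard the training statistics)
test_scaled = scaler.transform(test_df)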

# Load the trained model
loaded_model = tf.keras.models.load_model("saved_models/model-ann-11-5-4-1-tf.h5")

# Predict the outcome for the test sample
prediction = loaded_model.predict(test_scaled)

# Convert the prediction to binary (0 or 1)
predicted_exit = np.round(prediction)[0][0]

# Ground-truth Exited value for the test sample
true_exit = 0

# Report the prediction
print("Predicted Exit:", int(predicted_exit))
print("True Exit:", true_exit)

# Comment on the prediction
if predicted_exit == true_exit:
    print("Model predicts correctly!")
else:
    print("Model makes a mistake.")


Predicted Exit: 0
True Exit: 0
Model predicts correctly!


In [73]:
import numpy as np

# Preprocess the test sample
test_sample1 = {
    'CreditScore': [603],
    'Age': [76],
    'Tenure': [20],
    'Balance': [123456.78],
    'NumOfProducts': [5],
    'HasCrCard': [1],
    'IsActiveMember': [1],
    'EstimatedSalary': [55000.00],
    'Geography_Germany': [0.0],
    'Geography_Spain': [0.0],
    'Gender_Male': [0.0]
}

# Convert the test sample to a DataFrame
test_df = pd.DataFrame(test_sample1)

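# Scale the test sample with the scaler already fitted on the training data
# (refitting on a single row would discard the training statistics)
test_scaled = scaler.transform(test_df)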

# Load the trained model
loaded_model = tf.keras.models.load_model("saved_models/model-ann-11-5-4-1-tf.h5")

# Predict the outcome for the test sample
prediction = loaded_model.predict(test_scaled)

# Convert the prediction to binary (0 or 1)
predicted_exit = np.round(prediction)[0][0]

# Ground-truth Exited value for the test sample
true_exit = 0

# Report the prediction
print("Predicted Exit:", int(predicted_exit))
print("True Exit:", true_exit)

# Comment on the prediction
if predicted_exit == true_exit:
    print("Model predicts correctly!")
else:
    print("Model makes a mistake.")


Predicted Exit: 0
True Exit: 0
Model predicts correctly!


## Task 5: (10 points)

* Repeat Task 3 with the following new architecture of the neural network:

![Task 5 ANN architecture](figures/nn-3.png)

* The input layer still has 11 units, matching the number of columns in the training set.
* Hidden-layer-1: 8 neurons with ReLU activation.
* Hidden-layer-2: 4 neurons with ReLU activation.
* Hidden-layer-3: 2 neurons with ReLU activation.
* Output-layer: 1 neuron with sigmoid activation.
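
To make the funnel shape of this architecture concrete, the sketch below pushes a small batch through randomly initialised layers and records the activation shape after each one. It is only a shape trace under assumed random weights and made-up inputs, not a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [11, 8, 4, 2, 1]   # input, three hidden layers, output
x = rng.normal(size=(5, 11))     # 5 made-up samples with 11 features

shapes = [x.shape]
for i, (n_in, n_out) in enumerate(zip(layer_sizes[:-1], layer_sizes[1:])):
    W = rng.normal(size=(n_in, n_out))
    b = np.zeros((1, n_out))
    z = x @ W + b
    is_output = (i == len(layer_sizes) - 2)
    # Sigmoid on the output layer, ReLU on the hidden layers
    x = 1.0 / (1.0 + np.exp(-z)) if is_output else np.maximum(0, z)
    shapes.append(x.shape)

print(shapes)  # [(5, 11), (5, 8), (5, 4), (5, 2), (5, 1)]
```

Each hidden layer halves the width, so the last hidden layer hands only two values per customer to the sigmoid output unit.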



In [None]:
import numpy as np

# Define activation functions
class ReLU:
    def __call__(self, x):
        return np.maximum(0, x)

    def gradient(self, x):
        return np.where(x > 0, 1, 0)

class Sigmoid:
    def __call__(self, x):
        return 1 / (1 + np.exp(-x))

    def gradient(self, x):
        return self.__call__(x) * (1 - self.__call__(x))

# Define layer class
class Dense:
    def __init__(self, input_size, output_size, activation):
        self.weights = np.random.randn(input_size, output_size)
        self.bias = np.zeros((1, output_size))
        self.activation = activation()

    def __call__(self, x):
        self.input = x
        self.output = np.dot(x, self.weights) + self.bias
        return self.activation(self.output)

    def backward(self, grad_output, learning_rate):
        grad_input = np.dot(grad_output, self.weights.T)
        grad_weights = np.dot(self.input.T, grad_output)
        grad_bias = np.sum(grad_output, axis=0, keepdims=True)

        self.weights -= learning_rate * grad_weights
        self.bias -= learning_rate * grad_bias

        return grad_input

# Define binary crossentropy loss
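def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip predictions away from 0 and 1 so that np.log never receives 0
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))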

# Define neural network class
class ANN:
    def __init__(self):
        self.layer1 = Dense(11, 8, ReLU)
        self.layer2 = Dense(8, 4, ReLU)
        self.layer3 = Dense(4, 2, ReLU)
        self.output_layer = Dense(2, 1, Sigmoid)

    def __call__(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.output_layer(x)
        return x

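    def backward(self, grad_output, learning_rate):
        # grad_output is dL/dz at the output layer (sigmoid + binary crossentropy).
        # Each Dense.backward returns dL/d(its input); multiplying by the previous
        # layer's activation gradient turns that into the previous layer's dL/dz.
        grad = self.output_layer.backward(grad_output, learning_rate)
        for layer in (self.layer3, self.layer2, self.layer1):
            grad = grad * layer.activation.gradient(layer.output)
            grad = layer.backward(grad, learning_rate)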

    def save_model(self, file_path):
        # Save the parameters of every layer, including layer3
        np.savez(file_path, layer1_weights=self.layer1.weights, layer1_bias=self.layer1.bias,
                 layer2_weights=self.layer2.weights, layer2_bias=self.layer2.bias,
                 layer3_weights=self.layer3.weights, layer3_bias=self.layer3.bias,
                 output_weights=self.output_layer.weights, output_bias=self.output_layer.bias)

# Training loop
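def train(X_train, y_train, epochs, learning_rate):
    model = ANN()
    # Use a column vector so the labels line up with the (N, 1) predictions
    y_train = np.asarray(y_train, dtype=float).reshape(-1, 1)
    for epoch in range(epochs):
        # Forward pass
        y_pred = model(X_train)

        # Compute loss
        loss = binary_crossentropy(y_train, y_pred)

        # Backward pass: for a sigmoid output with mean binary crossentropy,
        # dL/dz = (y_pred - y_true) / N
        grad_output = (y_pred - y_train) / len(y_train)
        model.backward(grad_output, learning_rate)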
        
        # Print loss
        
        print(f"Epoch {epoch}: Loss {loss}")

    # Save model after training
    model.save_model("saved_models/model-ann-11-5-4-1-mc.h5")


train(X_train_scaled, y_train, epochs=25, learning_rate=0.01)


In [42]:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

def create_model():
   
    model = Sequential()
    model.add(Dense(8, input_dim=11, activation='relu'))
    model.add(Dense(4, activation='relu'))
    model.add(Dense(2, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    
    return model

model = create_model()

model.summary()


Model: "sequential_6"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_21 (Dense)            (None, 8)                 96        
                                                                 
 dense_22 (Dense)            (None, 4)                 36        
                                                                 
 dense_23 (Dense)            (None, 2)                 10        
                                                                 
 dense_24 (Dense)            (None, 1)                 3         
                                                                 
Total params: 145 (580.00 Byte)
Trainable params: 145 (580.00 Byte)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________


In [43]:
import tensorflow as tf
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

history = model.fit(X_train_scaled, y_train, epochs=25,verbose=1)
model_name = "saved_models/model-ann-11-5-4-1-tf.h5"
model.save(model_name)

model_loaded = tf.keras.models.load_model(model_name)
y_pred = model_loaded.predict(X_test_scaled)
y_pred_binary = (y_pred > 0.5).astype(float) 

accuracy = accuracy_score(y_test, y_pred_binary)
precision = precision_score(y_test, y_pred_binary)
recall = recall_score(y_test, y_pred_binary)
f1 = f1_score(y_test, y_pred_binary)

# Print the evaluation metrics
print("Model Performance on Test Set:")
print(f"Accuracy: {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1 Score: {f1:.4f}")


Epoch 1/25
...
Epoch 25/25


Model Performance on Test Set:
Accuracy: 0.8479
Precision: 0.7205
Recall: 0.4558
F1 Score: 0.5584
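
As a sanity check on the reported metrics, the F1 score is the harmonic mean of precision and recall. The sketch below recomputes it from the precision and recall printed for the 11-8-8-8-1 and 11-8-4-2-1 models. The helper name `f1_from_pr` is illustrative, and because the inputs are already rounded to four decimals, the result can only be expected to agree with the printed F1 up to rounding.

```python
def f1_from_pr(precision, recall):
    # Harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)

reported = {
    "11-8-8-8-1": (0.6744, 0.4807),  # Task 4 precision, recall
    "11-8-4-2-1": (0.7205, 0.4558),  # Task 5 precision, recall
}
f1_scores = {name: round(f1_from_pr(p, r), 4) for name, (p, r) in reported.items()}
print(f1_scores)
```

On these numbers the narrower network trades a little recall for higher precision, and the two F1 scores end up close to each other.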


In [74]:
import numpy as np

# Preprocess the test sample
test_sample = {
    'CreditScore': [709],
    'Age': [30],
    'Tenure': [9],
    'Balance': [115479.48],
    'NumOfProducts': [2],
    'HasCrCard': [1],
    'IsActiveMember': [1],
    'EstimatedSalary': [134732.99],
    'Geography_Germany': [1.0],
    'Geography_Spain': [0.0],
    'Gender_Male': [1.0]
}

# Convert the test sample to a DataFrame
test_df = pd.DataFrame(test_sample)

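# Scale the test sample with the scaler already fitted on the training data
# (refitting on a single row would discard the training statistics)
test_scaled = scaler.transform(test_df)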

# Load the trained model
loaded_model = tf.keras.models.load_model("saved_models/model-ann-11-5-4-1-tf.h5")

# Predict the outcome for the test sample
prediction = loaded_model.predict(test_scaled)

# Convert the prediction to binary (0 or 1)
predicted_exit = np.round(prediction)[0][0]

# Ground-truth Exited value for the test sample
true_exit = 0

# Report the prediction
print("Predicted Exit:", int(predicted_exit))
print("True Exit:", true_exit)

# Comment on the prediction
if predicted_exit == true_exit:
    print("Model predicts correctly!")
else:
    print("Model makes a mistake.")


Predicted Exit: 0
True Exit: 0
Model predicts correctly!


In [75]:
import numpy as np

# Preprocess the test sample
test_sample1 = {
    'CreditScore': [603],
    'Age': [76],
    'Tenure': [20],
    'Balance': [123456.78],
    'NumOfProducts': [5],
    'HasCrCard': [1],
    'IsActiveMember': [1],
    'EstimatedSalary': [55000.00],
    'Geography_Germany': [0.0],
    'Geography_Spain': [0.0],
    'Gender_Male': [0.0]
}

# Convert the test sample to a DataFrame
test_df = pd.DataFrame(test_sample1)

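# Scale the test sample with the scaler already fitted on the training data
# (refitting on a single row would discard the training statistics)
test_scaled = scaler.transform(test_df)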

# Load the trained model
loaded_model = tf.keras.models.load_model("saved_models/model-ann-11-5-4-1-tf.h5")

# Predict the outcome for the test sample
prediction = loaded_model.predict(test_scaled)

# Convert the prediction to binary (0 or 1)
predicted_exit = np.round(prediction)[0][0]

# Ground-truth Exited value for the test sample
true_exit = 0

# Report the prediction
print("Predicted Exit:", int(predicted_exit))
print("True Exit:", true_exit)

# Comment on the prediction
if predicted_exit == true_exit:
    print("Model predicts correctly!")
else:
    print("Model makes a mistake.")


Predicted Exit: 0
True Exit: 0
Model predicts correctly!
