## Introduction

In this Kaggle notebook, we tackle the practical challenge of predicting customer churn within the banking sector. With a dataset of 10,000 records at hand, our focus is on uncovering insights and patterns that indicate a customer's likelihood of leaving the bank. By utilizing deep learning techniques, we aim to provide a data-driven approach to identifying potential churn and contributing to effective retention strategies.

To do that, we will go through the following steps:
1. Getting The Data
2. Cleaning Our Data
3. Data Preprocessing And Feature Engineering
4. Building Our Neural Network Model 
5. Making Predictions 

## Imports

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
import tensorflow as tf
from tensorflow.keras.callbacks import EarlyStopping
from sklearn.model_selection import GridSearchCV
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

## Getting The Data

Since our data is in CSV format, a good choice is to load it into a DataFrame using **read_csv**.

In [None]:
data = pd.read_csv("/kaggle/input/deep-learning-az-ann/Churn_Modelling.csv")
data.head()

## Data Statistics

Now that we have our data, our job is to take a look at it and understand its structure.

In [None]:
data.info()

In [None]:
data.describe()

In [None]:
fig, axes = plt.subplots(3, 2, figsize=(24,18))
axes = axes.flatten()
features = ["Geography","Gender","Tenure","NumOfProducts","HasCrCard","IsActiveMember"]
for i in range(len(features)):
    sns.countplot(x=features[i],data=data, palette="Set2",ax=axes[i])
    
plt.show()

In [None]:
sns.countplot(x="Exited",data=data, palette="Set2")

Looking at the info and the plotted charts, we can see that the dataset has **10000 observations** with **0 missing values**, 13 features/inputs, and a single target/output, **Exited**.

We can also notice that we have 3 categorical features to encode, while the rest are numerical. We will cover that in the cleaning phase of this notebook.

We can conclude from the charts that the dataset is imbalanced: roughly 8000 customers stayed and roughly 2000 left. This is expected, because a balanced dataset would mean that half of the bank's customers had left.
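
Because of this imbalance, accuracy alone can be misleading. As a rough sketch, using the rounded counts quoted above (8000 stayed, 2000 left) rather than the exact dataset values, a trivial model that always predicts "stayed" would already be right most of the time:

```python
import pandas as pd

# Rounded class counts taken from the text above (illustrative, not exact values)
exited = pd.Series([0] * 8000 + [1] * 2000, name="Exited")

# Share of each class in the target column
class_share = exited.value_counts(normalize=True)
print(class_share)

# Accuracy of a trivial model that always predicts the majority class
baseline_accuracy = class_share.max()
print(f"Majority-class baseline accuracy: {baseline_accuracy:.2f}")
```

On the notebook's DataFrame, the equivalent call would be `data["Exited"].value_counts(normalize=True)`.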

Looking at the other plots, we can see that most customers are from France, are male, and have a credit card. Tenure and member activity are fairly evenly distributed. About 5000 members are inactive, and inactive members may be more likely to leave over time, which could partly explain the quit rate.

In [None]:
print("Gender Count :")
print(data['Gender'].value_counts())
print("\n")
print("Female Quit Rate :")
print(data[data["Gender"]=="Female"]["Exited"].value_counts())
print("\n")
print("Male Quit Rate :")
print(data[data["Gender"]=="Male"]["Exited"].value_counts())

In [None]:
sns.countplot(data=data, x="Gender", hue="Exited", palette="pastel")

From this chart we can see that females are more likely to quit than males: about 25% of female customers left, compared with only about 16% of male customers.
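
Rates like these can also be computed directly instead of being read off the raw counts. The sketch below uses a tiny made-up sample (the `toy` frame is hypothetical and only reuses the dataset's column names) to show the idea:

```python
import pandas as pd

# Tiny made-up sample with the same column names as the dataset (not real data)
toy = pd.DataFrame({
    "Gender": ["Female", "Female", "Female", "Female", "Male", "Male", "Male", "Male"],
    "Exited": [1, 0, 0, 0, 0, 0, 1, 1],
})

# Because Exited is 0/1, its mean per group is the share of customers who left
churn_rate = toy.groupby("Gender")["Exited"].mean()
print(churn_rate)
```

On the notebook's DataFrame, the same pattern would be `data.groupby("Gender")["Exited"].mean()`, and it works the same way for any other categorical column such as `Geography`.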

In [None]:
print("Geography Count :")
print(data['Geography'].value_counts())
print("\n")
l =["France","Spain","Germany"]
for x in l:
    print(f"{x} Quit Rate :")
    print(data[data["Geography"]==x]["Exited"].value_counts())
    print("\n")

In [None]:
sns.countplot(data=data, x="Geography", hue="Exited", palette="pastel")

Looking at this plot, we can conclude that about 16% of customers in France and Spain quit, whereas Germany has a quit rate of about 32%, the highest among the 3 countries.

## Data Cleaning

It's clear that the **RowNumber**, **CustomerId** and **Surname** columns have no predictive power, since they only identify a client, so it makes sense to drop them using the **drop** method of pandas DataFrames.
Notice that we set the **inplace** argument to **True** so that the change is applied to the current DataFrame.

In [None]:
data.drop(columns=["RowNumber","CustomerId","Surname"],inplace=True)
data.head()

We still have 2 categorical features. We will simply use 2 maps to transform them into numerical features.

In [None]:
map_gen ={"Male":1,"Female":0}
map_geo ={"France":0,"Spain":1,"Germany":2}
data["Gender"]=data["Gender"].map(map_gen)
data["Geography"]=data["Geography"].map(map_geo)
data.head()
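
Note that an integer map such as France=0, Spain=1, Germany=2 implies an order between countries that does not really exist. One possible alternative, shown here only as a sketch on a made-up sample (the `toy` frame is hypothetical), is one-hot encoding with `pd.get_dummies`:

```python
import pandas as pd

# Made-up sample that only reuses the dataset's column name
toy = pd.DataFrame({"Geography": ["France", "Spain", "Germany", "France"]})

# One 0/1 column per country; dtype=int avoids boolean columns in recent pandas
one_hot = pd.get_dummies(toy, columns=["Geography"], dtype=int)
print(one_hot)
```

If this were used in the notebook, the number of input features would grow from 10 to 12, so the `input_shape` of the first layer would have to change accordingly.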

## Feature Engineering

We will explore our data and create some new features to try to improve the model's efficiency.

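Let's start by classifying the credit score: we can turn the **CreditScore** column from a range of numeric values into a categorical column that is easier to interpret.

So we have 5 classes:
* Exceptional (800 and above, encoded as 4)
* Very Good (740 to 799, encoded as 3)
* Good (670 to 739, encoded as 2)
* Fair (580 to 669, encoded as 1)
* Poor (579 and below, encoded as 0)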

In [None]:
data.loc[ data['CreditScore'] <= 579, 'CreditScore'] = 0
data.loc[(data['CreditScore'] >= 580) & (data['CreditScore'] <= 669), 'CreditScore'] = 1
data.loc[(data['CreditScore'] >= 670) & (data['CreditScore'] <= 739), 'CreditScore']   = 2
data.loc[(data['CreditScore'] >= 740) & (data['CreditScore'] <= 799), 'CreditScore']   = 3
data.loc[ data['CreditScore'] >= 800, 'CreditScore'] = 4
data["CreditScore"]=data["CreditScore"].astype(int)
data.head()
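
The same binning can be written more compactly with `pd.cut`. This is only a sketch of an equivalent approach on made-up scores (the `scores` series is hypothetical), using the same band edges as the cell above:

```python
import pandas as pd

# Made-up scores chosen to land on both edges of each band (not taken from the dataset)
scores = pd.Series([500, 579, 580, 669, 670, 739, 740, 799, 800, 850])

# Right-closed bins: (-inf, 579], (579, 669], (669, 739], (739, 799], (799, inf)
bins = [-float("inf"), 579, 669, 739, 799, float("inf")]
labels = [0, 1, 2, 3, 4]  # 0 = Poor ... 4 = Exceptional

credit_class = pd.cut(scores, bins=bins, labels=labels).astype(int)
print(credit_class.tolist())
```

Compared with a chain of `data.loc[...]` assignments, this keeps all band edges in one list and does not rely on the order in which the assignments run.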

Now we are going to do the same with the **Age** column.

In [None]:
data.loc[ data['Age'] <= 32, 'Age'] = 0
data.loc[(data['Age'] > 32) & (data['Age'] <= 37), 'Age'] = 1
data.loc[(data['Age'] > 37) & (data['Age'] <= 44), 'Age']   = 2
data.loc[ data['Age'] > 44, 'Age'] = 3
data["Age"]=data["Age"].astype(int)
data.head()

In [None]:
sns.countplot(x="Age",data=data, palette="Set2")

## Data Preprocessing

Scaling the data is important when training deep learning models. Normalizing the input features to a common range keeps features with large values (such as **Balance** or **EstimatedSalary**) from dominating the others and helps stabilize the gradients during training. For that, we are going to use **MinMaxScaler**, but first we split our data into train, validation and test samples using **train_test_split**. The scaler is then fitted on the training set only, so that no information from the validation or test sets leaks into training.

In [None]:
feature_matrix=data.drop("Exited",axis=1)
target =data["Exited"]

In [None]:
X_train, X_temp, y_train, y_temp = train_test_split(feature_matrix,target , test_size=0.3, random_state=42,stratify=target)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42,stratify=y_temp)

In [None]:
y_train.value_counts()

In [None]:
scaler = MinMaxScaler()
X_train=scaler.fit_transform(X_train)
X_val=scaler.transform(X_val)
X_test=scaler.transform(X_test)
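
To make the fit/transform distinction concrete, here is a minimal NumPy sketch of what min-max scaling computes, on made-up values for a single feature. It illustrates the formula only and is not scikit-learn's actual implementation:

```python
import numpy as np

# Made-up train/test values for a single feature (illustrative only)
train = np.array([[10.0], [20.0], [30.0]])
test = np.array([[15.0], [40.0]])

# "fit": learn the min and max from the training data only
col_min = train.min(axis=0)
col_max = train.max(axis=0)

# "transform": apply x_scaled = (x - min) / (max - min) with the training statistics
train_scaled = (train - col_min) / (col_max - col_min)
test_scaled = (test - col_min) / (col_max - col_min)

print(train_scaled.ravel())  # training values end up inside [0, 1]
print(test_scaled.ravel())   # unseen values can fall outside [0, 1]
```

This is why only `fit_transform` is called on the training set, while the validation and test sets go through `transform` alone.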

## Building Our Model

Now that we are done preparing our data, we can start building our neural network model. For that we will use the **Sequential API** of Keras, which is part of the TensorFlow library.

Building a sequential model means defining a neural network as a linear stack of layers. Information flows from input to output through these layers, enabling the model to learn patterns and features at increasing levels of complexity.

In [None]:
model = tf.keras.models.Sequential([
                                 
            # The first layer must always specify the input shape (10 features here)
            tf.keras.layers.Dense(32, activation='relu', input_shape=(10,)),
            tf.keras.layers.Dense(16, activation='relu'),

            # Binary classification: a single sigmoid unit outputs the churn probability,
            # which is what binary_crossentropy expects by default
            tf.keras.layers.Dense(1, activation='sigmoid')

])
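
To illustrate how information flows through such a stack, here is a rough NumPy sketch of a forward pass with the same layer sizes (10 inputs, then 32, 16 and 1 units). The weights are random, the inputs are made up, and a sigmoid on the output is assumed so that the result can be read as a probability. It shows the computation only, not how Keras implements it:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Randomly initialised weights; shapes follow 10 -> 32 -> 16 -> 1
W1, b1 = rng.normal(scale=0.1, size=(10, 32)), np.zeros(32)
W2, b2 = rng.normal(scale=0.1, size=(32, 16)), np.zeros(16)
W3, b3 = rng.normal(scale=0.1, size=(16, 1)), np.zeros(1)

x = rng.random(size=(4, 10))  # 4 made-up customers, 10 scaled features each

# Each layer is a matrix product plus a bias, followed by an activation
h1 = relu(x @ W1 + b1)
h2 = relu(h1 @ W2 + b2)
churn_probability = sigmoid(h2 @ W3 + b3)

print(churn_probability.shape)
```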

To compile the model we will use **binary_crossentropy** as our loss function, because our task is to predict a binary value, and we will set our optimizer to **adam**.

In [None]:
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
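
As a reminder of what this loss measures, binary cross-entropy is the mean of `-[y*log(p) + (1-y)*log(1-p)]` over the samples, where `p` is the predicted probability of class 1. The NumPy sketch below is a simplified illustration on made-up values (the helper name `binary_crossentropy` and the `eps` clipping value are choices made here, not Keras internals):

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)  # avoid log(0)
    losses = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(np.mean(losses))

y_true = np.array([1, 0, 1, 0])
confident_right = np.array([0.9, 0.1, 0.8, 0.2])  # probabilities close to the labels
confident_wrong = np.array([0.1, 0.9, 0.2, 0.8])  # probabilities far from the labels

loss_right = binary_crossentropy(y_true, confident_right)
loss_wrong = binary_crossentropy(y_true, confident_wrong)
print(loss_right, loss_wrong)
```

The loss is small when the predicted probabilities agree with the labels and grows quickly when the model is confidently wrong.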

In [None]:
early_stopping = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)
history = model.fit(X_train, y_train, validation_data = (X_val, y_val), epochs=50,callbacks=[early_stopping])
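
The idea behind early stopping with `patience=3` can be sketched in plain Python: training stops once the validation loss has failed to improve for 3 epochs in a row, and the best epoch is remembered so that its weights can be restored. This is a simplified illustration on made-up losses (the function `early_stopping_epoch` is hypothetical), not the Keras callback itself:

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (stop_epoch, best_epoch), both 0-based, for a list of validation losses."""
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best validation loss: remember it and reset the counter
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch  # training stops here
    return len(val_losses) - 1, best_epoch  # ran through all epochs


# Made-up validation losses: improvement stalls after epoch 2
losses = [0.50, 0.45, 0.42, 0.43, 0.44, 0.42, 0.41]
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=3)
print(stop_epoch, best_epoch)
```

In this sketch the last loss of 0.41 is never reached, which shows the trade-off of a small patience value: training is cut short even though a later epoch would have improved.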

In [None]:
test_loss, test_accuracy = model.evaluate(X_test,y_test)
print(f"Test Loss: {test_loss:.4f} - Test Accuracy: {test_accuracy:.4f}")

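That's great: we reached about 86% accuracy on the test set, compared with the roughly 80% that always predicting "stayed" would give on this imbalanced dataset.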
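
Accuracy can be complemented with precision and recall for the churn class, which show how many predicted churners really left and how many actual churners were caught. The sketch below computes them by hand on made-up labels and predictions (these are not results from the model above):

```python
import numpy as np

# Made-up labels and predictions for 10 customers (1 = exited)
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 0])

# Confusion-matrix counts for the positive (churn) class
tp = int(np.sum((y_true == 1) & (y_pred == 1)))
fp = int(np.sum((y_true == 0) & (y_pred == 1)))
fn = int(np.sum((y_true == 1) & (y_pred == 0)))
tn = int(np.sum((y_true == 0) & (y_pred == 0)))

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)  # share of predicted churners who really left
recall = tp / (tp + fn)     # share of actual churners the model caught

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

For the notebook's model, and assuming a sigmoid output layer, hard predictions could be obtained with something like `(model.predict(X_test) > 0.5).astype(int).ravel()` before computing these counts.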

### Accuracy variation

In [None]:
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.title('Training and Validation Accuracy')
plt.legend()
plt.show()