This dataset contains information about bank customers and can be used to train an artificial neural network (ANN) that predicts customer churn.
The columns are described below:

RowNumber: A unique identifier for each row.

CustomerId: Unique identifier for each customer.

Surname: Last name of the customer.

CreditScore: The credit score of the customer, indicating their creditworthiness.

Geography: The country where the customer resides.

Gender: Gender of the customer.

Age: Age of the customer.

Tenure: Number of years the customer has been with the bank.

Balance: The amount of money present in the customer's account.

NumOfProducts: Number of bank products the customer has (e.g., accounts, loans).

HasCrCard: Whether the customer has a credit card (1 for yes, 0 for no).

IsActiveMember: Whether the customer is an active member of the bank (1 for yes, 0 for no).

EstimatedSalary: The estimated salary of the customer.

Exited: Whether the customer has churned (left the bank) (1 for yes, 0 for no).

This dataset can be used to build a predictive model to determine the likelihood of a customer leaving the bank (churn prediction). Features such as credit score, age, tenure, balance, number of products, and activity level can be used as input features for the ANN, while the target variable would be the "Exited" column.


Dataset: https://github.com/ezioauditore-tech/AI/blob/main/datasets/Finance_Dataset.csv

In [100]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.metrics import accuracy_score, classification_report
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Dropout

df = pd.read_csv('https://raw.githubusercontent.com/ezioauditore-tech/AI/main/datasets/Finance_Dataset.csv')


In [101]:
df

Unnamed: 0,RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,1,15634602,Hargrave,619,France,Female,42,2,0.00,1,1,1,101348.88,1
1,2,15647311,Hill,608,Spain,Female,41,1,83807.86,1,0,1,112542.58,0
2,3,15619304,Onio,502,France,Female,42,8,159660.80,3,1,0,113931.57,1
3,4,15701354,Boni,699,France,Female,39,1,0.00,2,0,0,93826.63,0
4,5,15737888,Mitchell,850,Spain,Female,43,2,125510.82,1,1,1,79084.10,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9995,9996,15606229,Obijiaku,771,France,Male,39,5,0.00,2,1,0,96270.64,0
9996,9997,15569892,Johnstone,516,France,Male,35,10,57369.61,1,1,1,101699.77,0
9997,9998,15584532,Liu,709,France,Female,36,7,0.00,1,0,1,42085.58,1
9998,9999,15682355,Sabbatini,772,Germany,Male,42,3,75075.31,2,1,0,92888.52,1


## Dropping Null values

In [102]:
df = df.dropna()

In [103]:
df

Unnamed: 0,RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,1,15634602,Hargrave,619,France,Female,42,2,0.00,1,1,1,101348.88,1
1,2,15647311,Hill,608,Spain,Female,41,1,83807.86,1,0,1,112542.58,0
2,3,15619304,Onio,502,France,Female,42,8,159660.80,3,1,0,113931.57,1
3,4,15701354,Boni,699,France,Female,39,1,0.00,2,0,0,93826.63,0
4,5,15737888,Mitchell,850,Spain,Female,43,2,125510.82,1,1,1,79084.10,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9995,9996,15606229,Obijiaku,771,France,Male,39,5,0.00,2,1,0,96270.64,0
9996,9997,15569892,Johnstone,516,France,Male,35,10,57369.61,1,1,1,101699.77,0
9997,9998,15584532,Liu,709,France,Female,36,7,0.00,1,0,1,42085.58,1
9998,9999,15682355,Sabbatini,772,Germany,Male,42,3,75075.31,2,1,0,92888.52,1


In [104]:
# df['Gender']  = df['Gender'].replace(to_replace=['Male', 'Female'], value=[1, 0])

## Dropping Unnecessary columns

In [105]:
df.drop(["RowNumber", "CustomerId", "Surname"], axis=1, inplace=True)

## Encoding Categorical Variables

In [106]:
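# Use one encoder per column so each keeps its own class mapping (classes_).
# LabelEncoder assigns integer codes in sorted class order.
geography_encoder = LabelEncoder()
gender_encoder = LabelEncoder()
df["Geography"] = geography_encoder.fit_transform(df["Geography"])
df["Gender"] = gender_encoder.fit_transform(df["Gender"])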
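As a rough illustration of what this encoding step produces, the sketch below mimics the behaviour of `LabelEncoder` in plain Python: the distinct values are sorted and mapped to consecutive integers. The helper name `label_encode` and the sample values are assumptions made for illustration; they are not part of the notebook.

```python
def label_encode(values):
    """Map each distinct value to an integer code, using sorted class order."""
    classes = sorted(set(values))
    mapping = {label: code for code, label in enumerate(classes)}
    return [mapping[v] for v in values], mapping


# Hypothetical sample values taken from the column descriptions above.
geo_codes, geo_mapping = label_encode(["France", "Spain", "France", "Germany"])
gender_codes, gender_mapping = label_encode(["Female", "Male", "Female"])

print(geo_mapping)     # {'France': 0, 'Germany': 1, 'Spain': 2}
print(gender_mapping)  # {'Female': 0, 'Male': 1}
```

Integer codes imply an ordering (France < Germany < Spain) that has no real meaning for a nominal column such as Geography. One-hot encoding is a common alternative when that implied order is a concern.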


## Split data into features and target variable

In [107]:
X = df.drop("Exited", axis=1)
y = df["Exited"]


## Split data into training and testing sets

In [108]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

## Feature Scaling

In [109]:
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
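The scaling step can be sketched in plain Python to show why the scaler is fitted on the training data only. The mean and (population) standard deviation are learned from the training values and then reused unchanged on the test values. The function names and the sample ages below are hypothetical and serve only as an illustration.

```python
from statistics import fmean, pstdev


def fit_scaler(column):
    """Learn the mean and population standard deviation from training values."""
    return fmean(column), pstdev(column)


def transform(column, mean, std):
    """Standardise values with statistics learned elsewhere: z = (x - mean) / std."""
    return [(x - mean) / std for x in column]


train_age = [42, 41, 39, 43]  # hypothetical training values
test_age = [36, 35]           # hypothetical test values

mean, std = fit_scaler(train_age)                # statistics come from the training set only
train_scaled = transform(train_age, mean, std)
test_scaled = transform(test_age, mean, std)     # test set reuses the training statistics
```

Reusing the training statistics keeps information about the test set out of preprocessing, which is why the notebook calls `fit_transform` on `X_train` but only `transform` on `X_test`.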


## Building a Neural network

In [110]:
model = Sequential()

model.add(Dense(units=30,activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(units=20,activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(units=10,activation='relu'))
model.add(Dropout(0.2))

model.add(Dense(units=1,activation='sigmoid'))

# For a binary classification problem
model.compile(loss='binary_crossentropy', optimizer='adam')
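To get a feel for the size of this network, the number of trainable parameters can be worked out by hand. The sketch below assumes 10 input features (the columns left after dropping the identifiers and the target); if the loaded frame carries an extra index column, the first layer would be correspondingly larger. Dropout layers add no parameters.

```python
def dense_params(n_inputs, n_units):
    """A Dense layer has one weight per input-unit pair plus one bias per unit."""
    return n_inputs * n_units + n_units


n_features = 10                # assumption: 10 feature columns reach the network
layer_units = [30, 20, 10, 1]  # Dense layer sizes used in the model above

total_params = 0
previous = n_features
for units in layer_units:
    total_params += dense_params(previous, units)
    previous = units

print(total_params)  # 1171 under the 10-feature assumption
```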

In [111]:
model.fit(x=X_train_scaled,y=y_train, validation_data=(X_test_scaled, y_test), batch_size=450, epochs=600)

Epoch 1/600
Epoch 2/600
...
Epoch 77/600
Epoch 78

<keras.src.callbacks.History at 0x7f14e26a35b0>

In [112]:
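# Convert predicted probabilities to class labels.
# Note: the cutoff used here is 0.05, far below the conventional 0.5, so most
# customers are labelled as churned. The outputs below reflect this cutoff.
predictions = (model.predict(X_test_scaled) > 0.05).astype("int32")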
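The cutoff applied to the sigmoid output strongly influences which customers are flagged as churned. The following sketch compares a cutoff of 0.05 with the conventional 0.5 on a handful of made-up probabilities; the helper `to_labels` and the values in `probs` are hypothetical.

```python
def to_labels(probabilities, threshold):
    """Label a sample 1 when its predicted probability exceeds the threshold."""
    return [int(p > threshold) for p in probabilities]


probs = [0.02, 0.08, 0.30, 0.55, 0.91]  # hypothetical model outputs

low_cutoff = to_labels(probs, 0.05)
default_cutoff = to_labels(probs, 0.5)

print(low_cutoff)      # [0, 1, 1, 1, 1]
print(default_cutoff)  # [0, 0, 0, 1, 1]
```

A lower cutoff raises recall for the churn class at the cost of many more false positives, so the choice depends on how costly a missed churner is compared with a false alarm.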



In [113]:
from sklearn.metrics import classification_report,confusion_matrix, accuracy_score

In [114]:
print(confusion_matrix(y_test,predictions))

[[ 867 1549]
 [  27  557]]


In [115]:
print(classification_report(y_test,predictions))

              precision    recall  f1-score   support

           0       0.97      0.36      0.52      2416
           1       0.26      0.95      0.41       584

    accuracy                           0.47      3000
   macro avg       0.62      0.66      0.47      3000
weighted avg       0.83      0.47      0.50      3000
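The figures in the report can be reproduced from the confusion matrix printed earlier, where rows are true classes and columns are predicted classes. The sketch below recomputes them in plain Python and adds a majority-class baseline (always predicting "not churned") for comparison; the variable names are illustrative.

```python
# Confusion matrix from the notebook output: rows = true class, columns = predicted class.
cm = [[867, 1549],
      [27, 557]]
(tn, fp), (fn, tp) = cm
total = tn + fp + fn + tp

accuracy = (tn + tp) / total
precision_1 = tp / (tp + fp)  # of customers flagged as churned, how many actually left
recall_1 = tp / (tp + fn)     # of customers who left, how many were flagged
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

# Accuracy obtained by always predicting class 0 (the majority class).
majority_baseline = (tn + fp) / total

print(round(accuracy, 2), round(precision_1, 2), round(recall_1, 2), round(f1_1, 2))
print(round(majority_baseline, 2))
```

With these numbers the model's accuracy (about 0.47) falls well below the majority-class baseline (about 0.81), which suggests that the decision cutoff, rather than the network alone, deserves a closer look.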

