### Churn Prediction  

The aim is to predict whether a bank's customers will leave the bank. A customer who has closed their bank account is counted as having left (churned).

### Dataset:  

1) RowNumber: the record (row) number; it has no effect on the output.  
2) CustomerId: contains random values and has no effect on whether a customer leaves the bank.  
3) Surname: a customer's surname has no impact on their decision to leave the bank.  
4) CreditScore: may affect churn, since a customer with a higher credit score is presumably less likely to leave the bank.  
5) Geography: a customer's location can affect their decision to leave the bank.  
6) Gender: worth exploring to see whether gender plays a role in a customer leaving the bank.  
7) Age: likely relevant, since older customers tend to be less likely to leave their bank than younger ones.  

In [1]:
import pickle
import numpy as np
import pandas as pd 

from sklearn.model_selection import train_test_split 
from sklearn.preprocessing import StandardScaler, LabelEncoder, OneHotEncoder


In [2]:
df = pd.read_csv(r'C:\Users\Nitin Flavier\Desktop\Data Nexus\Data Science\ML_BootCamp\ML_Projects\Churn_Modeling_ANN\datasets\churn.csv')
df.head()

Unnamed: 0,RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,1,15634602,Hargrave,619,France,Female,42,2,0.0,1,1,1,101348.88,1
1,2,15647311,Hill,608,Spain,Female,41,1,83807.86,1,0,1,112542.58,0
2,3,15619304,Onio,502,France,Female,42,8,159660.8,3,1,0,113931.57,1
3,4,15701354,Boni,699,France,Female,39,1,0.0,2,0,0,93826.63,0
4,5,15737888,Mitchell,850,Spain,Female,43,2,125510.82,1,1,1,79084.1,0


In [3]:
# Drop the identifier columns, which carry no predictive signal
df.drop(['RowNumber', 'CustomerId', 'Surname'], axis=1, inplace=True)
df.head()

Unnamed: 0,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,619,France,Female,42,2,0.0,1,1,1,101348.88,1
1,608,Spain,Female,41,1,83807.86,1,0,1,112542.58,0
2,502,France,Female,42,8,159660.8,3,1,0,113931.57,1
3,699,France,Female,39,1,0.0,2,0,0,93826.63,0
4,850,Spain,Female,43,2,125510.82,1,1,1,79084.1,0


In [4]:
# Encode Gender as integers; LabelEncoder assigns codes in sorted class order
encode_gender = LabelEncoder()
df['Gender'] = encode_gender.fit_transform(df['Gender'])
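
To check which integer each category received, the fitted encoder's `classes_` attribute can be inspected. The following is a minimal sketch on a toy `Gender` column (the frame `toy` and the name `gender_mapping` are illustrative and not part of the notebook):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy column for illustration only (not the real churn data)
toy = pd.DataFrame({"Gender": ["Female", "Male", "Female", "Male"]})

encoder = LabelEncoder()
toy["Gender_encoded"] = encoder.fit_transform(toy["Gender"])

# classes_ is stored in sorted order, so position == encoded integer
gender_mapping = {label: code for code, label in enumerate(encoder.classes_)}
print(gender_mapping)
```

Because the classes are sorted, `Female` maps to 0 and `Male` to 1, which matches the `Gender` values shown in the later outputs.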

In [5]:
# sparse_output=False returns a dense NumPy array instead of a SciPy sparse matrix
encode_geography = OneHotEncoder(sparse_output=False)

tranformed_geo = encode_geography.fit_transform(df[['Geography']])
print(tranformed_geo)
print(encode_geography.get_feature_names_out(['Geography']))

tranformed_geo_dataframe = pd.DataFrame(tranformed_geo,columns=encode_geography.get_feature_names_out(['Geography']))
tranformed_geo_dataframe.head()

[[1. 0. 0.]
 [0. 0. 1.]
 [1. 0. 0.]
 ...
 [1. 0. 0.]
 [0. 1. 0.]
 [1. 0. 0.]]
['Geography_France' 'Geography_Germany' 'Geography_Spain']


Unnamed: 0,Geography_France,Geography_Germany,Geography_Spain
0,1.0,0.0,0.0
1,0.0,0.0,1.0
2,1.0,0.0,0.0
3,1.0,0.0,0.0
4,0.0,0.0,1.0


In [6]:
# Append the one-hot columns; concat aligns rows on the index (the default 0..n-1 in both frames)
df = pd.concat([df,tranformed_geo_dataframe],axis=1)
df.head()

Unnamed: 0,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited,Geography_France,Geography_Germany,Geography_Spain
0,619,France,0,42,2,0.0,1,1,1,101348.88,1,1.0,0.0,0.0
1,608,Spain,0,41,1,83807.86,1,0,1,112542.58,0,0.0,0.0,1.0
2,502,France,0,42,8,159660.8,3,1,0,113931.57,1,1.0,0.0,0.0
3,699,France,0,39,1,0.0,2,0,0,93826.63,0,1.0,0.0,0.0
4,850,Spain,0,43,2,125510.82,1,1,1,79084.1,0,0.0,0.0,1.0
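
The column-wise `pd.concat` above works because both frames share the default 0..n-1 index. If rows had been filtered beforehand, the indices would no longer match and the result would contain misaligned rows padded with `NaN`. A small sketch of the pitfall and one possible guard, using hypothetical toy frames (`df_demo`, `onehot`):

```python
import pandas as pd

# Hypothetical frame whose index is not 0..n-1 (e.g. after filtering rows)
df_demo = pd.DataFrame({"Age": [42, 41, 39]}, index=[10, 11, 12])
onehot = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
cols = ["Geography_France", "Geography_Spain"]

# A fresh frame gets index 0, 1, 2, which does not match 10, 11, 12
misaligned = pd.concat([df_demo, pd.DataFrame(onehot, columns=cols)], axis=1)

# Reusing the original index keeps each encoded row next to its source row
aligned = pd.concat(
    [df_demo, pd.DataFrame(onehot, columns=cols, index=df_demo.index)],
    axis=1,
)

print(misaligned.shape, aligned.shape)
```

Passing `index=df.index` when building the encoded frame is a cheap safeguard even when the indices already agree.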


In [7]:
# The original Geography column is now redundant
df.drop(['Geography'],axis=1,inplace=True)
df.head()

Unnamed: 0,CreditScore,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited,Geography_France,Geography_Germany,Geography_Spain
0,619,0,42,2,0.0,1,1,1,101348.88,1,1.0,0.0,0.0
1,608,0,41,1,83807.86,1,0,1,112542.58,0,0.0,0.0,1.0
2,502,0,42,8,159660.8,3,1,0,113931.57,1,1.0,0.0,0.0
3,699,0,39,1,0.0,2,0,0,93826.63,0,1.0,0.0,0.0
4,850,0,43,2,125510.82,1,1,1,79084.1,0,0.0,0.0,1.0


In [8]:
# Persist the fitted encoders so the same mappings can be reused at inference time
with open(r'..\pickle_files\label_encoder_gender.pkl', 'wb') as file_obj:
    pickle.dump(encode_gender, file_obj)

with open(r'..\pickle_files\ohe_encoder_geography.pkl','wb') as file_obj:
    pickle.dump(encode_geography,file_obj)

In [9]:
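# Write the encoded data; index_label=False omits the index column's header field but still writes the index values
df.to_csv(r'C:\Users\Nitin Flavier\Desktop\Data Nexus\Data Science\ML_BootCamp\ML_Projects\Churn_Modeling_ANN\datasets\transformed_churn_data.csv',index_label=False)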
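
The effect of `index_label=False` is easy to confuse with `index=False`. The sketch below writes a toy frame to in-memory buffers (the frame `demo` and the buffer names are illustrative) to compare the two options:

```python
import io

import pandas as pd

# Toy frame for illustration only
demo = pd.DataFrame({"CreditScore": [619, 608], "Exited": [1, 0]})

# index_label=False: index values are written, but the header has no field for them
buf_label = io.StringIO()
demo.to_csv(buf_label, index_label=False)
lines_label = buf_label.getvalue().splitlines()

# index=False: the index is not written at all
buf_noindex = io.StringIO()
demo.to_csv(buf_noindex, index=False)
lines_noindex = buf_noindex.getvalue().splitlines()

print(lines_label[:2])
print(lines_noindex[:2])
```

With `index_label=False` the header row has one field fewer than the data rows, so whether the first column is read back as an index depends on the reader. If the index is not needed downstream, `index=False` may be the simpler choice.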