## Bank Churn Prediction: Practical Insights into Preprocessing, Model Building, and Predictive Modeling.
In banking, one of the biggest challenges is keeping customers satisfied so that they stay with the bank. Retaining an existing customer is usually cheaper than acquiring a new one. Still, some customers decide to leave, and banks want to know why. That's where this project comes in: we use machine learning to estimate which customers are likely to leave.

We start by looking at the data the bank has collected over time. Then we clean it, so the information is accurate and ready for analysis, and identify the most relevant columns. After that, we use feature engineering to derive more informative inputs from the raw data. This helps us uncover patterns and trends that may predict whether a customer will leave.

Once everything is prepared, we build machine learning models using simple but effective methods to predict which customers might leave the bank. But we're not stopping there. We also evaluate these models on separate, held-out data to check how well they perform on customers they have not seen. By doing all this, we hope to give banks tools that help them keep their customers satisfied and loyal over the long term.

### Reading and Analyzing Bank Customer Churn Data.

In [1]:
import pandas as pd

df = pd.read_csv('churn.csv')
df

Unnamed: 0,RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,1,15634602,Hargrave,619,France,Female,42,2,0.00,1,1,1,101348.88,1
1,2,15647311,Hill,608,Spain,Female,41,1,83807.86,1,0,1,112542.58,0
2,3,15619304,Onio,502,France,Female,42,8,159660.80,3,1,0,113931.57,1
3,4,15701354,Boni,699,France,Female,39,1,0.00,2,0,0,93826.63,0
4,5,15737888,Mitchell,850,Spain,Female,43,2,125510.82,1,1,1,79084.10,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7495,7496,15589541,Sutherland,557,France,Female,27,2,0.00,2,0,1,4497.55,0
7496,7497,15608804,Allan,824,Germany,Male,49,8,133231.48,1,1,1,67885.37,0
7497,7498,15645820,Folliero,698,France,Male,27,7,0.00,2,1,0,111471.55,0
7498,7499,15659031,Mordvinova,630,France,Female,36,8,126598.99,2,1,1,134407.93,0
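
Before cleaning, it can help to glance at each column's type and how many distinct values it holds. The following is a minimal sketch, not part of the original notebook: the helper name `overview` is hypothetical, and the first three rows from the output above are inlined so the example runs without `churn.csv`.

```python
import io
import pandas as pd

# First three rows from the output above, inlined so the sketch is self-contained.
SAMPLE_CSV = """RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
1,15634602,Hargrave,619,France,Female,42,2,0.00,1,1,1,101348.88,1
2,15647311,Hill,608,Spain,Female,41,1,83807.86,1,0,1,112542.58,0
3,15619304,Onio,502,France,Female,42,8,159660.80,3,1,0,113931.57,1
"""

def overview(frame: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with its dtype and number of distinct values."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "n_unique": frame.nunique(),
    })

sample = pd.read_csv(io.StringIO(SAMPLE_CSV))
summary = overview(sample)

print(sample.shape)  # (rows, columns) of the inlined sample
print(summary)
```

On the full dataset the same helper could be called as `overview(df)`. Columns with only a few distinct values, such as `Geography` or `Gender`, are likely candidates for categorical handling later on.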


### Counting Null Values in the Data.

In [2]:
null_values = df.isnull().sum()
null_values

RowNumber          0
CustomerId         0
Surname            0
CreditScore        0
Geography          0
Gender             0
Age                0
Tenure             0
Balance            0
NumOfProducts      0
HasCrCard          0
IsActiveMember     0
EstimatedSalary    0
Exited             0
dtype: int64
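
Raw counts are easier to judge next to the share of rows they represent. Below is a small sketch under stated assumptions: `missing_report` is a hypothetical helper, and the toy frame with deliberately missing entries is illustrative only, not taken from the bank data. Note that `isnull()` only flags true missing markers such as `NaN` or `None`; placeholder values like empty strings would not be counted.

```python
import numpy as np
import pandas as pd

def missing_report(frame: pd.DataFrame) -> pd.DataFrame:
    """Count missing values per column and express them as a percentage of rows."""
    counts = frame.isnull().sum()
    percent = (counts / len(frame) * 100).round(2)
    return pd.DataFrame({"missing": counts, "percent": percent})

# Illustrative toy data with gaps (hypothetical values).
toy = pd.DataFrame({
    "Age": [42, np.nan, 39, 43],
    "Balance": [0.0, 83807.86, np.nan, np.nan],
    "Exited": [1, 0, 0, 0],
})

report = missing_report(toy)
print(report)
```

Applied to the real data as `missing_report(df)`, every row of the report would show zero, matching the output above.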

### Counting Duplicates.
Our dataset does not have any null values. Next, we need to make sure it does not contain duplicate rows either, since duplicates would distort counts and statistics. Let's go ahead and check for them.

In [3]:
duplicates = df.duplicated().sum()
duplicates

0
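
One caveat: `df.duplicated()` compares entire rows. If `RowNumber` is a running counter, as the sample suggests, two records for the same customer would still differ in that column and would not be flagged. The sketch below illustrates the difference on a hypothetical toy frame in which one `CustomerId` is repeated on purpose; treating `CustomerId` as the key is an assumption about the data, not something the source states.

```python
import pandas as pd

# Hypothetical toy data: the third row repeats the first customer's id.
toy = pd.DataFrame({
    "RowNumber": [1, 2, 3],
    "CustomerId": [15634602, 15647311, 15634602],
    "CreditScore": [619, 608, 619],
})

# Full-row comparison: RowNumber differs, so nothing is flagged.
full_row_dupes = int(toy.duplicated().sum())

# Comparison restricted to the assumed key column.
key_dupes = int(toy.duplicated(subset=["CustomerId"]).sum())

# Keep the first record per customer and discard later repeats.
deduped = toy.drop_duplicates(subset=["CustomerId"], keep="first")

print(full_row_dupes, key_dupes, len(deduped))
```

Running the key-based check on the real data, for example `df.duplicated(subset=['CustomerId']).sum()`, would confirm whether each customer appears only once.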

### Exited Customer Distribution Analysis.
Next, we need to check the distribution of 'Exited' customers in our dataset.

In [4]:
values = df['Exited'].value_counts()
values

0    5954
1    1546
Name: Exited, dtype: int64
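
The counts above imply that roughly one customer in five exited (1546 of 7500, about 20.6%), so the classes are imbalanced. The sketch below rebuilds a series with exactly those counts and computes the shares with `value_counts(normalize=True)`. It assumes that `Exited = 1` marks customers who left, and the variable names are illustrative.

```python
import pandas as pd

# Rebuild a label column with the counts reported above (5954 zeros, 1546 ones).
exited = pd.Series([0] * 5954 + [1] * 1546, name="Exited")

counts = exited.value_counts()
shares = exited.value_counts(normalize=True)

# Accuracy of a trivial model that always predicts the majority class.
baseline_accuracy = shares.max()

print(counts)
print(shares.round(4))
print(round(baseline_accuracy, 4))
```

A model that always predicts the majority class would already reach about 79.4% accuracy here, so accuracy alone says little about how well churners are identified; recall or precision on the `Exited = 1` class is usually more informative.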

### Removing Unnecessary Columns
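Now that we know the distribution, we need to drop the irrelevant columns: 'RowNumber', 'CustomerId', and 'Surname'. These are identifiers rather than customer characteristics, so removing them makes the dataset clearer and keeps them out of the model.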

In [5]:
df.drop(columns=['RowNumber', 'CustomerId', 'Surname'], inplace=True)
df

Unnamed: 0,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,619,France,Female,42,2,0.00,1,1,1,101348.88,1
1,608,Spain,Female,41,1,83807.86,1,0,1,112542.58,0
2,502,France,Female,42,8,159660.80,3,1,0,113931.57,1
3,699,France,Female,39,1,0.00,2,0,0,93826.63,0
4,850,Spain,Female,43,2,125510.82,1,1,1,79084.10,0
...,...,...,...,...,...,...,...,...,...,...,...
7495,557,France,Female,27,2,0.00,2,0,1,4497.55,0
7496,824,Germany,Male,49,8,133231.48,1,1,1,67885.37,0
7497,698,France,Male,27,7,0.00,2,1,0,111471.55,0
7498,630,France,Female,36,8,126598.99,2,1,1,134407.93,0


In [None]:
### Categorizing Dataset Columns.