### Customer Segmentation (Bank Churn)
Customer segmentation helps in understanding which types of customers are more likely to leave the bank and why. This analysis relies on segmentation because different customer groups may show different churn behavior.
#### Objective
Develop a database, extract and analyze customer data to gain insights into their behavior and provide key findings along with strategic recommendations to help retain customers who are at risk of leaving.
#### How Customer Segmentation Relates to Bank Churn Analysis
#### Demographic Segmentation
* Age groups (young, middle-aged, senior)
* Gender (Male vs. Female churn rates)
* Geography (Churn rates by country/city/branch)
#### Behavioral Segmentation
* Transaction frequency
* Number of products held (credit cards, loans, savings accounts, etc.)
* Tenure with the bank (new vs. long-term customers)
* Account balance (low vs. high-balance customers)
* Salary/Income

The dataset is available in link and includes the following features:

* **CustomerId:** A unique identifier for each customer.
* **Surname:** The customer's last name.
* **CreditScore:** The customer's credit score.
* **Geography:** The country where the customer resides (France, Spain, or Germany).
* **Gender:** The customer's gender (Male or Female).
* **Age:** The customer's age.
* **Tenure:** The number of years the customer has been with the bank.
* **Balance:** The customer's account balance.
* **NumOfProducts:** The number of bank products the customer uses (e.g., savings account, credit card).
* **HasCrCard:** Whether the customer has a credit card (1 = yes, 0 = no).
* **IsActiveMember:** Whether the customer is an active member (1 = yes, 0 = no).
* **EstimatedSalary:** The customer's estimated salary.
* **Exited:** Whether the customer has churned (1 = yes, 0 = no).

In [1]:
#importing libraries
import sqlite3
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
#importing dataset
df=pd.read_csv("Bank_Churn.csv")
display(df.head())

#### Data Exploration

In [3]:
print("\nDataset information:")
df.info()  # info() prints its report directly and returns None, so it is not embedded in an f-string
print(f"\nMissing values present:\n{df.isna().any()}")
print(f"\nDuplicated rows present: {df.duplicated().any()}")
print(f"\nData types:\n{df.dtypes}")

Index(['CustomerId', 'Surname', 'CreditScore', 'Geography', 'Gender', 'Age',
       'Tenure', 'Balance', 'NumOfProducts', 'HasCrCard', 'IsActiveMember',
       'EstimatedSalary', 'Exited'],
      dtype='object')

#### Building the Database
Let's create a database called **Bank** that will have one table called **Clients_Profile**.

In [7]:
# Defining the path of the database
db_path = r"C:\Users\USER\Desktop\My Work\Database-SQLite\Bank.db"
# Connecting to the database
conn = sqlite3.connect(db_path)
cursor = conn.cursor()
#creating the table
table = """
CREATE TABLE IF NOT EXISTS Clients_Profile (
    CustomerId INTEGER PRIMARY KEY,
    Surname TEXT,
    CreditScore INTEGER,
    Geography TEXT,
    Gender TEXT,
    Age INTEGER,
    Tenure INTEGER,
    Balance REAL,
    NumOfProducts INTEGER,
    HasCrCard INTEGER,
    IsActiveMember INTEGER,
    EstimatedSalary REAL,
    Exited INTEGER
);
"""
cursor.execute(table)
conn.commit()
print("Table 'Clients_Profile' created successfully")
print("===" * 20)
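# Importing the dataframe into the database
# Note: if_exists="replace" drops the table created above and rebuilds it from the DataFrame's dtypes
df.to_sql("Clients_Profile", conn, if_exists="replace", index=False)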
print("Data uploaded to 'Clients_Profile' table successfully!")

Table 'Clients_Profile' created successfully
Data uploaded to 'Clients_Profile' table successfully!
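
One thing worth checking after the upload: `to_sql(..., if_exists="replace")` drops an existing table and recreates it from the DataFrame, so the column types and primary key declared in `CREATE TABLE` do not survive. The sketch below reproduces this on an in-memory database with a tiny hypothetical frame (`toy`, not the real data) and shows `if_exists="append"` as one possible way to keep the declared schema. The helper `primary_key_columns` is an illustrative name, not part of the notebook.

```python
import sqlite3
import pandas as pd

# Tiny stand-in for the real data (hypothetical rows, illustration only)
toy = pd.DataFrame({"CustomerId": [1, 2], "Balance": [0.0, 83807.86], "Exited": [1, 0]})

ddl = """
CREATE TABLE IF NOT EXISTS Clients_Profile (
    CustomerId INTEGER PRIMARY KEY,
    Balance REAL,
    Exited INTEGER
);
"""

def primary_key_columns(conn, table):
    """Return the names of columns flagged as primary key by PRAGMA table_info."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk)
    return [name for _, name, _, _, _, pk in rows if pk]

conn = sqlite3.connect(":memory:")

# Case 1: "replace" drops the table just created and rebuilds it from the DataFrame
conn.execute(ddl)
toy.to_sql("Clients_Profile", conn, if_exists="replace", index=False)
pk_after_replace = primary_key_columns(conn, "Clients_Profile")

# Case 2: "append" inserts into the existing table, so the declared schema is kept
conn.execute("DROP TABLE Clients_Profile")
conn.execute(ddl)
toy.to_sql("Clients_Profile", conn, if_exists="append", index=False)
pk_after_append = primary_key_columns(conn, "Clients_Profile")

print("Primary key after replace:", pk_after_replace)
print("Primary key after append: ", pk_after_append)
conn.close()
```

With `append`, re-running the cell against a populated table would fail on duplicate `CustomerId` values, so the table would need to be emptied first.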


#### Let's view the first 5 rows of the table

In [8]:
query = "SELECT * FROM Clients_Profile LIMIT 5"
dff = pd.read_sql(query, conn)
display(dff)

| | CustomerId | Surname | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 15634602 | Hargrave | 619 | France | Female | 42 | 2 | 0.0 | 1 | 1 | 1 | 101348.88 | 1 |
| 1 | 15647311 | Hill | 608 | Spain | Female | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 |
| 2 | 15619304 | Onio | 502 | France | Female | 42 | 8 | 159660.8 | 3 | 1 | 0 | 113931.57 | 1 |
| 3 | 15701354 | Boni | 699 | France | Female | 39 | 1 | 0.0 | 2 | 0 | 0 | 93826.63 | 0 |
| 4 | 15737888 | Mitchell | 850 | Spain | Female | 43 | 2 | 125510.82 | 1 | 1 | 1 | 79084.1 | 0 |


#### a) How many clients does the Bank serve?

In [9]:
cursor.execute("SELECT COUNT(DISTINCT CustomerId) FROM Clients_Profile")
distinct_count =cursor.fetchone()[0]
print(f"\nThe bank serves {distinct_count} customers")

The bank serves 10000 customers


#### b) How many customers have exited (churned), and what is the churn rate?

In [10]:
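# Count churned customers in SQL instead of fetching every row into Python
cursor.execute("SELECT COUNT(*) FROM Clients_Profile WHERE Exited = 1")
exited = cursor.fetchone()[0]
print(f"Out of {distinct_count} customers, {exited} have exited the bank")
print()
# Use the counted total rather than a hardcoded 10000
print(f"Churn Rate: {(exited / distinct_count) * 100:.2f}%")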

Out of 10000 customers, 2037 have exited the bank

Churn Rate: 20.37%
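
The total, the churned count, and the churn rate can also be obtained in a single query, since `Exited` is a 0/1 flag and `SUM(Exited)` therefore counts churned customers. The following is a minimal sketch on a hypothetical five-row in-memory table, not the real `Bank.db`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Clients_Profile (CustomerId INTEGER PRIMARY KEY, Exited INTEGER)")
# Hypothetical toy rows: 5 customers, 1 of whom has churned
conn.executemany(
    "INSERT INTO Clients_Profile VALUES (?, ?)",
    [(1, 0), (2, 1), (3, 0), (4, 0), (5, 0)],
)

# One pass over the table returns all three figures
total, churned, churn_rate = conn.execute(
    """
    SELECT COUNT(*),
           SUM(Exited),
           ROUND(100.0 * SUM(Exited) / COUNT(*), 2)
    FROM Clients_Profile
    """
).fetchone()

print(f"Out of {total} customers, {churned} have exited the bank")
print(f"Churn Rate: {churn_rate:.2f}%")
conn.close()
```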


#### c) Churn Rate by Geography

In [11]:
query ="""
SELECT 
    Geography,
    COUNT(*) as Total_Customers,
    SUM(Exited) AS Churned_Count,
    ROUND(100.0 * SUM(Exited) / COUNT(*), 2) AS "Churn_Rate(%)"
FROM Clients_Profile
GROUP BY Geography
ORDER BY "Churn_Rate(%)" DESC
"""
dff =pd.read_sql(query,conn)
display(dff)

| | Geography | Total_Customers | Churned_Count | Churn_Rate(%) |
|---|---|---|---|---|
| 0 | Germany | 2509 | 814 | 32.44 |
| 1 | Spain | 2477 | 413 | 16.67 |
| 2 | France | 5014 | 810 | 16.15 |


* Germany has the highest churn rate (32.44%), with 814 of its 2,509 customers leaving the bank.
* France (16.15%) and Spain (16.67%) have similar churn rates, roughly half of Germany's, indicating stronger retention in those markets.
* France has the largest customer base (5,014) yet the lowest churn rate, which may reflect better customer satisfaction or service there.
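
The same count, sum, and rate query is repeated for several columns in this notebook, so it could be wrapped in a small helper. The sketch below is one way to do that. `churn_by` and `GROUPABLE` are hypothetical names, and the demo table holds made-up rows. Column names cannot be bound as SQL parameters, so the helper checks the name against an allow-list before interpolating it.

```python
import sqlite3
import pandas as pd

# Columns allowed for grouping; checked before the name is placed into the SQL text
GROUPABLE = {"Geography", "Gender", "NumOfProducts", "HasCrCard", "IsActiveMember"}

def churn_by(conn, column):
    """Return total customers, churned count and churn rate (%) per value of `column`."""
    if column not in GROUPABLE:
        raise ValueError(f"Unsupported grouping column: {column!r}")
    query = f"""
    SELECT {column},
           COUNT(*) AS Total_Customers,
           SUM(Exited) AS Churned_Count,
           ROUND(100.0 * SUM(Exited) / COUNT(*), 2) AS "Churn_Rate(%)"
    FROM Clients_Profile
    GROUP BY {column}
    ORDER BY "Churn_Rate(%)" DESC
    """
    return pd.read_sql(query, conn)

# Demo on a hypothetical in-memory table
demo_conn = sqlite3.connect(":memory:")
pd.DataFrame({
    "Geography": ["France", "France", "Germany", "Germany", "Spain"],
    "Exited":    [0, 0, 1, 0, 0],
}).to_sql("Clients_Profile", demo_conn, index=False)

result = churn_by(demo_conn, "Geography")
print(result)
demo_conn.close()
```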

#### d) Churn Rate by Gender

In [12]:
query ="""
SELECT Gender,
       COUNT(*) AS Total_Customers,
       SUM(Exited) AS Churned_Count,
       ROUND(100.0 * SUM(Exited) / COUNT(*), 2) AS "Churn_Rate(%)"
FROM Clients_Profile
GROUP BY Gender
ORDER BY "Churn_Rate(%)" DESC
"""
dff =pd.read_sql(query,conn)
display(dff.head())

| | Gender | Total_Customers | Churned_Count | Churn_Rate(%) |
|---|---|---|---|---|
| 0 | Female | 4543 | 1139 | 25.07 |
| 1 | Male | 5457 | 898 | 16.46 |


* The majority of customers are male (5,457 vs. 4,543 female), yet female customers have a higher churn rate (25.07%) than male customers (16.46%).
* The bank may need to explore why female customers are leaving; possible factors include service preferences, product offerings, or customer experience.

#### e) Age distribution and churn rate

In [13]:
cursor.execute("SELECT MAX(Age) FROM Clients_Profile")
max_age =cursor.fetchone()[0]
cursor.execute("SELECT MIN(Age) FROM Clients_Profile")
min_age =cursor.fetchone()[0]
print(f"\nThe oldest customer is {max_age} years old")
print(f"\nThe youngest customer is {min_age} years old")


The oldest customer is 92 years old

The youngest customer is 18 years old


In [14]:
query ="""
SELECT 
    CASE
        WHEN Age BETWEEN 18 AND 30 THEN '18-30'
        WHEN Age BETWEEN 31 AND 50 THEN '31-50'
        WHEN Age BETWEEN 51 AND 60 THEN '51-60'
        ELSE '60+'
    END AS Age_Group,
    COUNT(*) AS Total_Customers,
    SUM(Exited) AS Churn_Count,
    ROUND(100.0 * SUM(Exited) / COUNT(*),2) AS "Churn_Rate(%)"
FROM Clients_Profile
GROUP BY Age_Group
ORDER BY "Churn_Rate(%)" DESC
"""
dff =pd.read_sql(query,conn)
display(dff.head())

| | Age_Group | Total_Customers | Churn_Count | Churn_Rate(%) |
|---|---|---|---|---|
| 0 | 51-60 | 797 | 448 | 56.21 |
| 1 | 60+ | 464 | 115 | 24.78 |
| 2 | 31-50 | 6771 | 1326 | 19.58 |
| 3 | 18-30 | 1968 | 148 | 7.52 |


* Customers aged 51-60 have the highest churn rate (56.21%), with more than half of them leaving the bank.
* Customers over 60 have a much lower churn rate (24.78%) than the 51-60 group, possibly indicating greater loyalty.
* The 31-50 group is the largest segment (6,771 customers) and has a moderate churn rate of 19.58%.
* Churn is lowest among the youngest customers: the 18-30 group has a churn rate of 7.52%.
* The bank should investigate why customers aged 51-60 leave at such high rates and tailor retention strategies accordingly.
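
The age bands produced by the SQL `CASE` expression can also be built in pandas with `pd.cut`. The sketch below uses hypothetical sample rows and mirrors the four bands above. The bin edges are an assumption chosen so that the right-closed intervals line up with 18-30, 31-50, 51-60, and over 60.

```python
import pandas as pd

# Hypothetical sample rows; the real data would come from Clients_Profile
sample = pd.DataFrame({
    "Age":    [18, 30, 31, 50, 51, 60, 61, 92],
    "Exited": [0,  0,  0,  1,  1,  1,  0,  1],
})

# Right-closed bins: (17, 30], (30, 50], (50, 60], (60, 120]
sample["Age_Group"] = pd.cut(
    sample["Age"],
    bins=[17, 30, 50, 60, 120],
    labels=["18-30", "31-50", "51-60", "60+"],
)

# Same aggregates as the SQL query: group size, churned count, churn rate
summary = (
    sample.groupby("Age_Group", observed=True)["Exited"]
    .agg(Total_Customers="count", Churn_Count="sum")
)
summary["Churn_Rate(%)"] = (100 * summary["Churn_Count"] / summary["Total_Customers"]).round(2)
print(summary)
```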

#### Salary Distribution

In [15]:
cursor.execute("SELECT MAX(EstimatedSalary) FROM Clients_Profile")
max_salary =cursor.fetchone()[0]
cursor.execute("SELECT MIN(EstimatedSalary) FROM Clients_Profile")
min_salary =cursor.fetchone()[0]
print(f"\nThe highest salary of a customer is ${max_salary}")
print(f"\nThe lowest salary of a customer is ${min_salary}")


The highest salary of a customer is $199992.48

The lowest salary of a customer is $11.58


In [16]:
query ="""
SELECT
    CASE
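        -- Note: BETWEEN is inclusive, so these bounds leave gaps (e.g. 19,000-20,000);
        -- salaries that fall in a gap are caught by ELSE and labelled '150K-200K'.
        WHEN EstimatedSalary BETWEEN 10 AND 19000 THEN '<20K'
        WHEN EstimatedSalary BETWEEN 20000 AND 49000 THEN '20K-50K'
        WHEN EstimatedSalary BETWEEN 50000 AND 99000 THEN '50K-100K'
        WHEN EstimatedSalary BETWEEN 100000 AND 149000 THEN '100K-150K'
        ELSE '150K-200K'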
    END AS Salary_Group,
    COUNT(*) AS Total_Customers,
    SUM(Exited) AS Churn_Count,
    ROUND(100.0 * SUM(Exited) / COUNT(*),2) AS "Churn_Rate(%)"
FROM Clients_Profile
GROUP BY Salary_Group
ORDER BY "Churn_Rate(%)" DESC
"""
dff =pd.read_sql(query,conn)
display(dff.head())

| | Salary_Group | Total_Customers | Churn_Count | Churn_Rate(%) |
|---|---|---|---|---|
| 0 | 150K-200K | 2666 | 567 | 21.27 |
| 1 | <20K | 931 | 190 | 20.41 |
| 2 | 100K-150K | 2494 | 503 | 20.17 |
| 3 | 50K-100K | 2495 | 498 | 19.96 |
| 4 | 20K-50K | 1414 | 279 | 19.73 |


* Customers earning 150K-200K have the highest churn rate (21.27%), possibly because high-income customers have more banking options. The gap to the lowest group is only about 1.5 percentage points.
* Lowest-income customers (<20K) also have a relatively high churn rate (20.41%), possibly due to financial instability.
* Middle-income groups (20K-150K) have similar churn rates (~19-20%), indicating churn is not strongly influenced by salary within this range.
* Retention strategies could still give attention to high-income customers (who may seek better services) and low-income customers (who may need financial support), although salary alone appears to be a weak churn signal.
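
A caveat on the bucketing: `BETWEEN` is inclusive on both ends, so bounds such as `10 AND 19000` followed by `20000 AND 49000` leave gaps, and a salary like 19,500 matches no `WHEN` branch and falls through to `ELSE`. The group counts above may therefore shift somewhat with gap-free boundaries. The sketch below shows one way to write contiguous buckets using ordered `<` comparisons. The table contents are hypothetical values placed near each boundary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Clients_Profile (CustomerId INTEGER PRIMARY KEY, EstimatedSalary REAL)")
# Hypothetical salaries chosen to sit close to each bucket boundary
conn.executemany(
    "INSERT INTO Clients_Profile VALUES (?, ?)",
    [(1, 11.58), (2, 19500.00), (3, 49999.99), (4, 99500.00), (5, 149999.00), (6, 199992.48)],
)

# CASE evaluates branches in order, so each row lands in the first bucket
# whose upper bound it is below; no value can fall between two buckets.
query = """
SELECT CustomerId,
       EstimatedSalary,
       CASE
           WHEN EstimatedSalary < 20000  THEN '<20K'
           WHEN EstimatedSalary < 50000  THEN '20K-50K'
           WHEN EstimatedSalary < 100000 THEN '50K-100K'
           WHEN EstimatedSalary < 150000 THEN '100K-150K'
           ELSE '150K-200K'
       END AS Salary_Group
FROM Clients_Profile
ORDER BY CustomerId
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```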

#### Credit Score Distribution

In [29]:
cursor.execute("SELECT MAX(CreditScore) FROM Clients_Profile")
max_score =cursor.fetchone()[0]
cursor.execute("SELECT MIN(CreditScore) FROM Clients_Profile")
min_score =cursor.fetchone()[0]
cursor.execute("SELECT AVG(CreditScore) FROM Clients_Profile")
avg_score =cursor.fetchone()[0]
print(f"\nThe highest credit score is {max_score}")
print(f"\nThe lowest credit score is {min_score}")
print(f"\nThe average credit score is {avg_score:.0f}")


The highest credit score is 850

The lowest credit score is 350

The average credit score is 651


In [18]:
query ="""
SELECT
    CASE
        WHEN CreditScore BETWEEN 350 AND 499 THEN 'Low'
        WHEN CreditScore BETWEEN 500 AND 699 THEN 'Average'
        ELSE 'High'
    END AS CreditScore_Group,
    COUNT(*) AS Total_Customers,
    SUM(Exited) AS Churn_Count,
    ROUND(100.0 * SUM(Exited) / COUNT(*),2) AS "Churn_Rate(%)"
FROM Clients_Profile
GROUP BY CreditScore_Group
ORDER BY "Churn_Rate(%)" DESC
"""
dff =pd.read_sql(query,conn)
display(dff.head())

| | CreditScore_Group | Total_Customers | Churn_Count | Churn_Rate(%) |
|---|---|---|---|---|
| 0 | Low | 632 | 150 | 23.73 |
| 1 | Average | 6220 | 1263 | 20.31 |
| 2 | High | 3148 | 624 | 19.82 |


* Customers with low credit scores have the highest churn rate (23.73%), which may reflect financial instability or difficulty accessing bank services.
* Average and high credit score groups have similar churn rates (~20%), suggesting credit score alone isn't a strong churn predictor.
* The bank may need targeted retention strategies for low-credit customers, such as financial education or tailored loan options.

#### Tenure Distribution 

In [28]:
cursor.execute("SELECT MAX(Tenure) FROM Clients_Profile")
max_tenure =cursor.fetchone()[0]
cursor.execute("SELECT MIN(Tenure) FROM Clients_Profile")
min_tenure =cursor.fetchone()[0]
cursor.execute("SELECT AVG(Tenure) FROM Clients_Profile")
avg_tenure =cursor.fetchone()[0]
print(f"\nThe longest years a customer has stayed with the bank is {max_tenure}")
print(f"\nThe shortest years a customer has stayed with the bank is {min_tenure}")
print(f"\nThe average tenure of a customer is {avg_tenure:.0f}")


The longest years a customer has stayed with the bank is 10

The shortest years a customer has stayed with the bank is 0

The average tenure of a customer is 5


In [20]:
query ="""
SELECT
    CASE
        WHEN Tenure BETWEEN 0 AND 3 THEN '0-3'
        WHEN Tenure BETWEEN 4 AND 7 THEN '4-7'
        ELSE '8-10'
    END AS Tenure_Group,
    COUNT(*) AS Total_Customers,
    SUM(Exited) AS Churn_Count,
    ROUND(100.0 * SUM(Exited) / COUNT(*),2) AS "Churn_Rate(%)"
FROM Clients_Profile
GROUP BY Tenure_Group
ORDER BY "Churn_Rate(%)" DESC
"""
dff =pd.read_sql(query,conn)
display(dff)

| | Tenure_Group | Total_Customers | Churn_Count | Churn_Rate(%) |
|---|---|---|---|---|
| 0 | 0-3 | 3505 | 741 | 21.14 |
| 1 | 8-10 | 2499 | 511 | 20.45 |
| 2 | 4-7 | 3996 | 785 | 19.64 |


* Churn is highest (21.14%) among customers with a short tenure (0-3 years), which may indicate early-stage dissatisfaction or unmet expectations.
* Customers with medium tenure (4-7 years) have the lowest churn rate (19.64%), suggesting somewhat higher engagement or satisfaction.
* Longer-tenured customers (8-10 years) still experience notable churn (20.45%), possibly due to evolving needs or better offers from competitors.
* The three groups differ by only about 1.5 percentage points, so tenure is a weak signal on its own; even so, improving the early customer experience is a reasonable way to reduce churn in the first 3 years.

#### Number of Products Distribution

In [31]:
cursor.execute("SELECT MAX(NumOfProducts) FROM Clients_Profile")
products =cursor.fetchone()[0]
cursor.execute("SELECT AVG(NumOfProducts) FROM Clients_Profile")
avg_products =cursor.fetchone()[0]
print(f"\nThe maximum number of products held by a customer is {products}")
print(f"\nThe customers hold an average of {avg_products:.2f} products")


The maximum number of products held by a customer is 4

The customers hold an average of 1.53 products


In [22]:
query ="""
SELECT
    NumOfProducts,
    COUNT(*) AS Total_Customers,
    SUM(Exited) AS Churn_Count,
    ROUND(100.0 * SUM(Exited) / COUNT(*),2) AS "Churn_Rate(%)"
FROM Clients_Profile
GROUP BY NumOfProducts
ORDER BY "Churn_Rate(%)" DESC
"""
dff =pd.read_sql(query,conn)
display(dff.head())

| | NumOfProducts | Total_Customers | Churn_Count | Churn_Rate(%) |
|---|---|---|---|---|
| 0 | 4 | 60 | 60 | 100.0 |
| 1 | 3 | 266 | 220 | 82.71 |
| 2 | 1 | 5084 | 1409 | 27.71 |
| 3 | 2 | 4590 | 348 | 7.58 |


* All 60 customers with 4 products churned (100%). This is a small group, and the cause cannot be determined from this table alone, but it may point to dissatisfaction among heavily cross-sold customers.
* Customers with 3 products also have an extremely high churn rate (82.71%), suggesting that having more than 2 products may not improve loyalty.
* Customers with 1 product have a significantly higher churn rate (27.71%) compared to those with 2 products (7.58%), implying that multi-product engagement enhances retention.
* The bank should focus on encouraging customers with 1 product to adopt a second one while investigating why those with 3+ products churn at such high rates.
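
Because the 3- and 4-product groups are small, it can help to attach an uncertainty range to each churn rate before acting on it. The sketch below computes an approximate 95% Wilson score interval from the counts in the table above. `wilson_interval` is a hypothetical helper written for this illustration, and the Wilson method is a choice made here, not something used in the original analysis.

```python
import math

def wilson_interval(churned, total, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (z = 1.96)."""
    p = churned / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half_width, centre + half_width

# (churned, total) pairs copied from the NumOfProducts table above
segments = {4: (60, 60), 3: (220, 266), 1: (1409, 5084), 2: (348, 4590)}

for products, (churned, total) in segments.items():
    low, high = wilson_interval(churned, total)
    print(f"{products} product(s): {100 * churned / total:.2f}% "
          f"(95% CI roughly {100 * low:.1f}% to {100 * high:.1f}%)")
```

Smaller groups produce wider intervals, which gives a rough sense of how much weight each rate can bear.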

#### Has Credit Card Distribution

In [23]:
query ="""
SELECT 
    HasCrCard,
    COUNT(*) AS Total_Customers,
    SUM(Exited) AS Churn_Count,
    ROUND(100.0 * SUM(Exited) / COUNT(*),2) AS "Churn_Rate(%)"
FROM Clients_Profile
GROUP BY HasCrCard
ORDER BY "Churn_Rate(%)" DESC
"""
dff =pd.read_sql(query,conn)
display(dff.head())

| | HasCrCard | Total_Customers | Churn_Count | Churn_Rate(%) |
|---|---|---|---|---|
| 0 | 0 | 2945 | 613 | 20.81 |
| 1 | 1 | 7055 | 1424 | 20.18 |


* Churn rates are nearly identical for customers with (20.18%) and without (20.81%) a credit card, suggesting that having a credit card does not significantly impact retention.
* Since both groups churn at similar rates, other factors (e.g., products, tenure, geography) likely have a stronger influence on customer exits.
* The bank may need to explore how credit card usage, rather than just ownership, affects customer loyalty.
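
Whether two churn rates differ by more than chance can be checked roughly with a two-proportion z-test. The sketch below applies it to the credit card counts above and, for contrast, to the gender counts from section d). `two_proportion_z` is a hypothetical helper, and the |z| > 1.96 threshold assumes a conventional 5% two-sided significance level.

```python
import math

def two_proportion_z(churned_a, total_a, churned_b, total_b):
    """z statistic for the difference between two churn proportions (pooled standard error)."""
    p_a, p_b = churned_a / total_a, churned_b / total_b
    pooled = (churned_a + churned_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    return (p_a - p_b) / se

# Counts copied from the HasCrCard table: no card vs. card
z_card = two_proportion_z(613, 2945, 1424, 7055)
# Counts copied from the Gender table: female vs. male
z_gender = two_proportion_z(1139, 4543, 898, 5457)

for label, z in [("HasCrCard", z_card), ("Gender", z_gender)]:
    verdict = "exceeds" if abs(z) > 1.96 else "does not exceed"
    print(f"{label}: z = {z:.2f}, which {verdict} the 1.96 threshold")
```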

#### Active Member Distribution

In [24]:
query ="""
SELECT 
    IsActiveMember,
    COUNT(*) AS Total_Customers,
    SUM(Exited) AS Churn_Count,
    ROUND(100.0 * SUM(Exited) / COUNT(*),2) AS "Churn_Rate(%)"
FROM Clients_Profile
GROUP BY IsActiveMember
ORDER BY "Churn_Rate(%)" DESC
"""
dff =pd.read_sql(query,conn)
display(dff.head())

| | IsActiveMember | Total_Customers | Churn_Count | Churn_Rate(%) |
|---|---|---|---|---|
| 0 | 0 | 4849 | 1302 | 26.85 |
| 1 | 1 | 5151 | 735 | 14.27 |


* Inactive members have a much higher churn rate (26.85%) compared to active members (14.27%), indicating that engagement strongly influences retention.
* Despite a similar number of total customers, inactive members account for nearly twice as many churned customers (1,302 vs. 735).
* The bank should implement engagement strategies (e.g., personalized offers, loyalty programs) to encourage inactive members to become more active and reduce churn.

#### Recommendations
* **Improve Retention for Customers Aged 51-60:**
Focus on understanding the specific reasons for high churn in this age group. Consider offering tailored financial products or services that address their unique needs and challenges.
* **Enhance Female Customer Retention:**
Investigate the reasons behind the higher churn rate among female customers and adjust services, communication, or product offerings to better meet their expectations.
* **Target High-Income and Low-Income Customers:**
For high-income customers (150K-200K), consider offering premium services or loyalty programs to increase retention.
For low-income customers (<20K), implement financial support initiatives, such as budgeting tools or low-interest loans, to foster loyalty.
* **Focus on Short Tenure Customers (0-3 years):**
Improve early customer experiences, possibly through onboarding programs, personalized offers, or financial education to reduce churn in the first few years.
* **Encourage Multi-Product Usage:**
Promote bundling of products to customers with only one product, offering incentives for adopting additional products to increase retention. Investigate the high churn rates among customers with 3 or more products to identify pain points.
* **Leverage Credit Cards to Increase Engagement:**
While credit card ownership doesn’t directly reduce churn, focus on making credit card usage more attractive (e.g., rewards, benefits, lower interest rates) to keep customers engaged.
* **Engage Inactive Members:**
Develop re-engagement campaigns for inactive members (e.g., personalized offers, reminders, or product recommendations) to lower churn and boost retention.
* **Segment Retention Strategies Based on Age and Tenure:**
Offer targeted products or retention strategies for different age groups and tenure levels, e.g., offering retirement-focused products for older customers or specialized loans for customers with longer tenures.
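
To act on these recommendations, the bank would need a list of current customers who fall into the higher-churn segments. The sketch below is one hypothetical way to build that list: it counts how many of three risk conditions from the findings (aged 51-60, not holding exactly 2 products, inactive) apply to each customer who has not yet exited. `Risk_Flags` and the choice of conditions are illustrative assumptions, not a validated scoring model, and the rows are made up.

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")
# Hypothetical customers for illustration only
pd.DataFrame({
    "CustomerId":     [1, 2, 3, 4],
    "Age":            [55, 28, 45, 58],
    "NumOfProducts":  [1, 2, 1, 3],
    "IsActiveMember": [0, 1, 1, 0],
    "Exited":         [0, 0, 0, 1],
}).to_sql("Clients_Profile", conn, index=False)

# In SQLite a comparison evaluates to 1 or 0, so the conditions can be summed
query = """
SELECT CustomerId,
       (Age BETWEEN 51 AND 60)
     + (NumOfProducts <> 2)
     + (IsActiveMember = 0) AS Risk_Flags
FROM Clients_Profile
WHERE Exited = 0
ORDER BY Risk_Flags DESC, CustomerId
"""
at_risk = pd.read_sql(query, conn)
print(at_risk)
conn.close()
```

Customers with the most flags would be natural first candidates for the re-engagement and cross-sell campaigns described above.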