## 📊 Notebook 1: SQL Queries for Exploratory Data Analysis

#### Goal: Structured SQL queries to extract business-ready insights from a cleaned PostgreSQL database.

#### **BASIC CHURN OVERVIEW QUERIES**

#### 📊 Query 1: Basic Churn Distribution

In [2]:
import pandas as pd
from sqlalchemy import create_engine

# PostgreSQL connection details
DB_NAME = "telco_churn_db"
DB_USER = "hridyanshkatal"   # replace with your mac username
DB_HOST = "localhost"
DB_PORT = "5432"

# ✅ Connection string
engine = create_engine(f'postgresql://{DB_USER}@{DB_HOST}:{DB_PORT}/{DB_NAME}')

# ✅ SQL Query (same as your first EDA query)
query = """
SELECT 
    Churn,
    COUNT(*) AS total_customers,
    ROUND(COUNT(*) * 100.0 / (SELECT COUNT(*) FROM telco_churn), 2) AS churn_percentage
FROM telco_churn
GROUP BY Churn;
"""

# ✅ Fetch into DataFrame
df_churn_distribution = pd.read_sql(query, engine)

print(df_churn_distribution)


  churn  total_customers  churn_percentage
0    No             5174             73.46
1   Yes             1869             26.54


**Insight:** Overall churn rate is 26.54% (1,869 of 7,043 customers).

**Reason:** A significant proportion of the customer base is disengaging.

**Takeaway:** Reducing churn by even 5-10% can significantly impact profitability.
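
As a rough check on the figures above, the sketch below recomputes the churn rate from the Query 1 counts and shows what a 5-10% reduction would mean in customers. It assumes the "5-10%" is a relative cut in the number of churners, which the notebook does not state explicitly.

```python
# Counts copied from the Query 1 output above.
counts = {"No": 5174, "Yes": 1869}

total = sum(counts.values())
churn_rate = round(counts["Yes"] * 100.0 / total, 2)

# Assumption: "5-10%" means a relative reduction in the number of churners.
saved_at_5 = round(counts["Yes"] * 0.05)
saved_at_10 = round(counts["Yes"] * 0.10)

print(f"total customers: {total}, churn rate: {churn_rate}%")
print(f"customers retained at 5%: {saved_at_5}, at 10%: {saved_at_10}")
```

Under that reading, the takeaway corresponds to keeping roughly 90 to 190 additional customers.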

#### 📊 Query 2: Gender vs Churn

In [3]:
query2 = """
SELECT 
    gender,
    Churn,
    COUNT(*) AS customer_count,
    ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (PARTITION BY gender), 2) AS churn_percentage
FROM telco_churn
GROUP BY gender, Churn
ORDER BY gender, Churn;
"""

df_gender_churn = pd.read_sql(query2, engine)
print(df_gender_churn)


   gender churn  customer_count  churn_percentage
0  Female    No            2549             73.08
1  Female   Yes             939             26.92
2    Male    No            2625             73.84
3    Male   Yes             930             26.16


**Insight:** Gender has negligible impact; churn rate is similar for female (26.92%) and male (26.16%) customers.

**Reason:** Service factors likely outweigh gender-based preferences.

**Takeaway:** Focus on service quality rather than demographic targeting by gender.

#### 📊 Query 3: Senior Citizen vs Churn

In [4]:
query3 = """
SELECT 
    SeniorCitizen,
    Churn,
    COUNT(*) AS customer_count,
    ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (PARTITION BY SeniorCitizen), 2) AS churn_percentage
FROM telco_churn
GROUP BY SeniorCitizen, Churn
ORDER BY SeniorCitizen, Churn;
"""

df_senior_churn = pd.read_sql(query3, engine)
print(df_senior_churn)


   seniorcitizen churn  customer_count  churn_percentage
0              0    No            4508             76.39
1              0   Yes            1393             23.61
2              1    No             666             58.32
3              1   Yes             476             41.68


**Insight:** Senior citizens have a churn rate of ~42%, significantly higher than others (~24%).

**Reason:** Possibly less tech-savvy or more price-sensitive.

**Takeaway:** Targeted retention programs (e.g., assisted support) should be introduced for senior citizens.
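
The `SUM(COUNT(*)) OVER (PARTITION BY ...)` expression in Queries 2 and 3 divides each group count by the total of its partition. A minimal plain-Python sketch of the same arithmetic, using the counts from the Query 3 output (the variable names are illustrative, not part of the notebook):

```python
from collections import defaultdict

# (SeniorCitizen, Churn) -> customer count, copied from the Query 3 output.
grouped = {(0, "No"): 4508, (0, "Yes"): 1393, (1, "No"): 666, (1, "Yes"): 476}

# Equivalent of SUM(COUNT(*)) OVER (PARTITION BY SeniorCitizen):
# one running total per SeniorCitizen value.
partition_totals = defaultdict(int)
for (senior, _churn), n in grouped.items():
    partition_totals[senior] += n

# Share of each (SeniorCitizen, Churn) group within its partition, in percent.
share = {
    key: round(n * 100.0 / partition_totals[key[0]], 2)
    for key, n in grouped.items()
}

for key, pct in sorted(share.items()):
    print(key, pct)
```

Each percentage is relative to its own segment, so the two rows for a given `SeniorCitizen` value sum to 100.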

### 📊 Query 4: Churn Rate by Contract

In [6]:
query4 = """
SELECT 
    Contract,
    ROUND(SUM(CASE WHEN Churn = 'Yes' THEN 1 ELSE 0 END) * 100.0 / COUNT(*), 2) AS churn_percentage,
    COUNT(*) AS total_customers
FROM telco_churn
GROUP BY Contract
ORDER BY churn_percentage DESC;
"""

df_contract_churn = pd.read_sql(query4, engine)
print(df_contract_churn)
df_contract_churn.to_csv('../data/contract_churn.csv', index=False)



         contract  churn_percentage  total_customers
0  Month-to-month             42.71             3875
1        One year             11.27             1473
2        Two year              2.83             1695


**Insight:** Month-to-month contracts have the highest churn (42.71%), while two-year contracts are the most stable (2.83%). The longer the contract, the less likely customers are to churn.
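
The conditional-aggregation pattern `SUM(CASE WHEN Churn = 'Yes' THEN 1 ELSE 0 END) * 100.0 / COUNT(*)` is reused in most of the queries that follow. The sketch below runs the Query 4 statement against an in-memory SQLite table as a stand-in for PostgreSQL, so it can be tried without a database server. The churner counts are an assumption rebuilt from the output above (for example, 42.71% of 3875 is 1655), not values read from the real table.

```python
import sqlite3

# contract -> (churned, total); churned counts are rebuilt from the
# Query 4 percentages and are an assumption, not source data.
segments = {
    "Month-to-month": (1655, 3875),
    "One year": (166, 1473),
    "Two year": (48, 1695),
}

conn = sqlite3.connect(":memory:")  # SQLite stand-in for PostgreSQL
conn.execute("CREATE TABLE telco_churn (Contract TEXT, Churn TEXT)")
for contract, (churned, total) in segments.items():
    rows = [(contract, "Yes")] * churned + [(contract, "No")] * (total - churned)
    conn.executemany("INSERT INTO telco_churn VALUES (?, ?)", rows)

result = conn.execute("""
    SELECT Contract,
           ROUND(SUM(CASE WHEN Churn = 'Yes' THEN 1 ELSE 0 END) * 100.0 / COUNT(*), 2)
               AS churn_percentage,
           COUNT(*) AS total_customers
    FROM telco_churn
    GROUP BY Contract
    ORDER BY churn_percentage DESC
""").fetchall()
conn.close()

for row in result:
    print(row)
```

Multiplying by `100.0` rather than `100` forces floating-point division; with integer operands both SQLite and PostgreSQL would truncate the result.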

### 📊 Query 5: Churn Rate by Tenure Range
We'll group tenure into categories to see how customer churn varies based on how long they've been with the company.

In [44]:
query5 = """
SELECT 
    CASE 
        WHEN tenure BETWEEN 0 AND 6 THEN '0-6 Months'
        WHEN tenure BETWEEN 7 AND 12 THEN '6-12 Months'
        WHEN tenure BETWEEN 13 AND 24 THEN '1-2 Years'
        WHEN tenure BETWEEN 25 AND 48 THEN '2-4 Years'
        WHEN tenure > 48 THEN '4+ Years'
    END AS tenure_group,
    ROUND(SUM(CASE WHEN Churn='Yes' THEN 1 ELSE 0 END)*100.0/COUNT(*), 2) AS churn_percentage,
    COUNT(*) AS total_customers
FROM telco_churn
GROUP BY tenure_group
ORDER BY churn_percentage DESC;
"""

df_tenure_churn = pd.read_sql(query5, engine)
df_tenure_churn


   tenure_group  churn_percentage  total_customers
0    0-6 Months             52.94             1481
1   6-12 Months             35.89              705
2     1-2 Years             28.71             1024
3     2-4 Years             20.39             1594
4      4+ Years              9.51             2239


**Insight**:
- **0-6 month customers churn at the highest rate (52.94%)**, more than five times the rate of customers with 4+ years of tenure (9.51%).
- This segment is the most **vulnerable period for losing customers**.

**Reason**:
Early customers are still forming opinions → **bad onboarding**, **pricing shock**, or **service dissatisfaction** can lead to quick churn.

**Takeaway**:
Implement **“First 90-day save” strategies** like welcome offers, onboarding support, and proactive outreach.
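
To make the bucket boundaries in `query5` explicit, here is a small Python sketch that mirrors the `CASE` expression. The function name `tenure_group` is illustrative; the labels are copied from the query, including `'6-12 Months'`, which actually covers months 7 to 12.

```python
def tenure_group(tenure):
    """Return the query5 bucket label for a tenure given in whole months."""
    if tenure is None:
        return None              # a CASE without ELSE yields NULL
    if 0 <= tenure <= 6:
        return "0-6 Months"
    if 7 <= tenure <= 12:
        return "6-12 Months"     # label as in query5; the range is 7-12
    if 13 <= tenure <= 24:
        return "1-2 Years"
    if 25 <= tenure <= 48:
        return "2-4 Years"
    if tenure > 48:
        return "4+ Years"
    return None                  # e.g. negative values match no branch


for months in (0, 6, 7, 12, 13, 48, 49, 72):
    print(months, tenure_group(months))
```

Because `BETWEEN` is inclusive on both ends, each whole-month value falls into exactly one bucket.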


### 📊 Query 6: Churn Rate by Monthly Charges Buckets
Here we analyze if customers who pay more monthly tend to churn more frequently.


In [14]:
query6 = """
SELECT 
    CASE 
        WHEN MonthlyCharges < 35 THEN 'Low (<35)'
        WHEN MonthlyCharges BETWEEN 35 AND 75 THEN 'Medium (35-75)'
        WHEN MonthlyCharges > 75 THEN 'High (>75)'
    END AS charges_group,
    ROUND(SUM(CASE WHEN Churn='Yes' THEN 1 ELSE 0 END)*100.0/COUNT(*), 2) AS churn_percentage,
    COUNT(*) AS total_customers
FROM telco_churn
GROUP BY charges_group
ORDER BY churn_percentage DESC;
"""

df_charges_churn = pd.read_sql(query6, engine)
df_charges_churn


    charges_group  churn_percentage  total_customers
0      High (>75)             34.65             3120
1  Medium (35-75)             27.37             2192
2       Low (<35)             10.86             1731


**Insight**:  
**High spenders (>75)** churn the most (34.65%), followed by **Medium** (27.37%) and **Low** spenders (10.86%). Monthly charges are therefore an important factor.


### 📊 Query 7: Churn by Number of Services Subscribed
Let's see if customers with more services (Internet, Streaming, etc.) are less likely to churn.


In [16]:
query7 = """
SELECT 
    ( 
        (CASE WHEN PhoneService = 'Yes' THEN 1 ELSE 0 END) +
        (CASE WHEN InternetService != 'No' THEN 1 ELSE 0 END) +
        (CASE WHEN OnlineSecurity = 'Yes' THEN 1 ELSE 0 END) +
        (CASE WHEN OnlineBackup = 'Yes' THEN 1 ELSE 0 END) +
        (CASE WHEN DeviceProtection = 'Yes' THEN 1 ELSE 0 END) +
        (CASE WHEN TechSupport = 'Yes' THEN 1 ELSE 0 END) +
        (CASE WHEN StreamingTV = 'Yes' THEN 1 ELSE 0 END) +
        (CASE WHEN StreamingMovies = 'Yes' THEN 1 ELSE 0 END)
    ) AS total_services,
    ROUND(SUM(CASE WHEN Churn='Yes' THEN 1 ELSE 0 END)*100.0/COUNT(*), 2) AS churn_percentage,
    COUNT(*) AS total_customers
FROM telco_churn
GROUP BY total_services
ORDER BY churn_percentage DESC;
"""

df_services_churn = pd.read_sql(query7, engine)
df_services_churn


   total_services  churn_percentage  total_customers
0               2             51.58              727
1               3             43.47              996
2               4             34.68             1041
3               5             27.21             1062
4               6             22.01              827
5               7             12.57              525
6               1              9.22             1606
7               8              5.79              259


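**Insight**: Among customers with 2 or more services, more services → **less churn**: churn falls steadily from 51.58% at 2 services to 5.79% at 8. Customers with only **2-3 services churn rapidly**, while multi-service customers are more loyal. The exception is customers with a **single service**, who churn at only 9.22%; most of them are probably phone-only customers without internet.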

**Reason**:
Multi-service customers find it **inconvenient to switch**, plus **bundled benefits** reduce churn.

**Takeaway**:
Promote **cross-selling** (especially within first 6 months), encourage **bundle upgrades** to boost service counts → reduce churn.
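
The `total_services` expression in `query7` can be sketched per customer in plain Python, which makes it easier to see which column values count as a service. The two customer records below are hypothetical and only illustrate the counting rule.

```python
# Columns that count as one service when their value is exactly 'Yes' (as in query7).
YES_COLUMNS = [
    "PhoneService", "OnlineSecurity", "OnlineBackup", "DeviceProtection",
    "TechSupport", "StreamingTV", "StreamingMovies",
]


def total_services(row):
    """Count subscribed services for one customer record (a dict)."""
    count = sum(1 for col in YES_COLUMNS if row.get(col) == "Yes")
    # InternetService counts for any value other than 'No' (DSL, Fiber optic).
    # A missing value is skipped, matching SQL where NULL != 'No' is not true.
    if row.get("InternetService") not in (None, "No"):
        count += 1
    return count


# Hypothetical customers, for illustration only.
phone_only = {
    "PhoneService": "Yes", "InternetService": "No",
    "OnlineSecurity": "No internet service", "StreamingTV": "No internet service",
}
fiber_bundle = {
    "PhoneService": "Yes", "InternetService": "Fiber optic",
    "OnlineSecurity": "No", "TechSupport": "Yes",
    "StreamingTV": "Yes", "StreamingMovies": "Yes",
}

print(total_services(phone_only))
print(total_services(fiber_bundle))
```

Values such as `'No internet service'` contribute nothing, so a phone-only customer ends up with a count of 1.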


### 📊 Query 8: Churn Rate by Internet Service Type
We'll analyze churn behavior based on internet service type (DSL, Fiber optic, None).


In [19]:
query8 = """
SELECT 
    InternetService,
    ROUND(SUM(CASE WHEN Churn='Yes' THEN 1 ELSE 0 END)*100.0/COUNT(*), 2) AS churn_percentage,
    COUNT(*) AS total_customers
FROM telco_churn
GROUP BY InternetService
ORDER BY churn_percentage DESC;
"""

df_internet_churn = pd.read_sql(query8, engine)
df_internet_churn


  internetservice  churn_percentage  total_customers
0     Fiber optic             41.89             3096
1             DSL             18.96             2421
2              No              7.40             1526


**Insight**:
**Fiber optic users churn the most**, **DSL users in the middle**, and **no internet users churn the least**.

**Possible Reason**: 
Fiber optic customers might churn due to higher cost or service issues, which makes them an important segment for targeted retention strategies.


### 📊 Query 9: Churn Rate by Payment Method
We'll analyze which payment methods have the highest churn rates.


In [21]:
query9 = """
SELECT 
    PaymentMethod,
    ROUND(SUM(CASE WHEN Churn='Yes' THEN 1 ELSE 0 END)*100.0/COUNT(*), 2) AS churn_percentage,
    COUNT(*) AS total_customers
FROM telco_churn
GROUP BY PaymentMethod
ORDER BY churn_percentage DESC;
"""

df_payment_churn = pd.read_sql(query9, engine)
df_payment_churn


               paymentmethod  churn_percentage  total_customers
0           Electronic check             45.29             2365
1               Mailed check             19.11             1612
2  Bank transfer (automatic)             16.71             1544
3    Credit card (automatic)             15.24             1522


**Insight**:
**Electronic check customers churn by far the most** (45.29%). **Mailed check** customers (19.11%) churn only slightly more than **Bank Transfer (AutoPay)** (16.71%) and **Credit Card (AutoPay)** (15.24%) customers.

**Takeaway**: 
Promoting **AutoPay**, especially to electronic check customers, could reduce churn.


In [22]:
df_payment_churn.to_csv('../data/payment_churn.csv', index=False)
df_internet_churn.to_csv('../data/internet_churn.csv', index=False)


### 📊 Query 10: Churn Rate by Online Security Subscription
We'll see if having online security service impacts churn rates.

In [23]:
query10 = """
SELECT 
    OnlineSecurity,
    ROUND(SUM(CASE WHEN Churn='Yes' THEN 1 ELSE 0 END)*100.0/COUNT(*), 2) AS churn_percentage,
    COUNT(*) AS total_customers
FROM telco_churn
GROUP BY OnlineSecurity
ORDER BY churn_percentage DESC;
"""

df_onlinesecurity_churn = pd.read_sql(query10, engine)
df_onlinesecurity_churn


        onlinesecurity  churn_percentage  total_customers
0                   No             41.77             3498
1                  Yes             14.61             2019
2  No internet service              7.40             1526


**Insight**: Customers **without Online Security churn at a much higher rate**.

**Possible Reason**: Online Security is often part of higher-tier packages or bundles; those who don't have it may be **low-engagement customers**.

**Takeaway**: Promoting **Online Security bundles** during onboarding may **reduce churn**.


### 📊 Query 11: Churn Rate by Tech Support Subscription
We analyze if access to tech support correlates with lower churn.


In [24]:
query11 = """
SELECT 
    TechSupport,
    ROUND(SUM(CASE WHEN Churn='Yes' THEN 1 ELSE 0 END)*100.0/COUNT(*), 2) AS churn_percentage,
    COUNT(*) AS total_customers
FROM telco_churn
GROUP BY TechSupport
ORDER BY churn_percentage DESC;
"""

df_techsupport_churn = pd.read_sql(query11, engine)
df_techsupport_churn


           techsupport  churn_percentage  total_customers
0                   No             41.64             3473
1                  Yes             15.17             2044
2  No internet service              7.40             1526


**Insight**: Customers **without Tech Support churn significantly more**.

**Possible Reason**: Lack of help when issues arise pushes customers to competitors, especially in broadband services.

**Takeaway**: **Tech Support could be provided without cost as a loyalty booster** for new customers for a few months.




### 📊 Query 12: Churn Rate by StreamingTV + StreamingMovies Subscription


In [25]:
query12 = """
SELECT 
    StreamingTV,
    StreamingMovies,
    ROUND(SUM(CASE WHEN Churn='Yes' THEN 1 ELSE 0 END)*100.0/COUNT(*), 2) AS churn_percentage,
    COUNT(*) AS total_customers
FROM telco_churn
GROUP BY StreamingTV, StreamingMovies
ORDER BY churn_percentage DESC;
"""

df_streamingtv_churn = pd.read_sql(query12, engine)
df_streamingtv_churn


           streamingtv      streamingmovies  churn_percentage  total_customers
0                   No                   No             34.44             2018
1                  Yes                   No             31.68              767
2                   No                  Yes             31.19              792
3                  Yes                  Yes             29.43             1940
4  No internet service  No internet service              7.40             1526


**Insight**:
- Among internet customers, those **without any streaming services churn the most** (34.44%).
- Those with **either StreamingTV or StreamingMovies churn ~31%**.
- Customers with **both streaming services churn the least (29%)**, except for those without internet (~7%).

**Possible Reason**:
Customers with **multiple services (especially entertainment services)** are more engaged and feel they are getting more value, which reduces their chances to leave.

**Takeaway**:
The company should **bundle StreamingTV + Movies with contracts or promotions** to increase retention.  
It should also **target non-streaming users** for upgrades, since they churn about 5 percentage points more than customers with both services.


### 📊 Query 13: Churn Rate by Paperless Billing
We'll check if customers using Paperless Billing churn more than those who don't, broken down by contract type.


In [30]:
query13 = """
SELECT 
    PaperlessBilling,
    Contract,
    ROUND(SUM(CASE WHEN Churn='Yes' THEN 1 ELSE 0 END)*100.0/COUNT(*), 2) AS churn_percentage,
    COUNT(*) AS total_customers
FROM telco_churn
GROUP BY PaperlessBilling, Contract
ORDER BY churn_percentage DESC;
"""

df_paperless_churn = pd.read_sql(query13, engine)
df_paperless_churn


  paperlessbilling        contract  churn_percentage  total_customers
0              Yes  Month-to-month             48.30             2586
1               No  Month-to-month             31.50             1289
2              Yes        One year             14.75              800
3               No        One year              7.13              673
4              Yes        Two year              4.20              785
5               No        Two year              1.65              910


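**Insight**:
Customers with **Paperless Billing churn more within every contract type**: 48.30% vs 31.50% on month-to-month, 14.75% vs 7.13% on one-year, and 4.20% vs 1.65% on two-year contracts. Weighted by customer count, that is roughly 34% overall for paperless customers against about 16% for the rest.

**Possible Reason**:
Paperless Billing is most common among **month-to-month customers** (2,586 of the 4,171 paperless customers) and may also attract **price-sensitive segments** who are more likely to churn.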

**Takeaway**:
Instead of focusing only on Paperless Billing adoption, **target Paperless customers with retention offers**, especially those on short contracts.
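
Since `query13` splits the result by contract, the overall churn rate per billing type has to be recovered as a weighted average. A minimal sketch, assuming the churner counts can be rebuilt by rounding `churn_percentage * total_customers / 100` for each row of the output above:

```python
from collections import defaultdict

# (PaperlessBilling, Contract, churn_percentage, total_customers) from Query 13.
rows = [
    ("Yes", "Month-to-month", 48.30, 2586),
    ("No", "Month-to-month", 31.50, 1289),
    ("Yes", "One year", 14.75, 800),
    ("No", "One year", 7.13, 673),
    ("Yes", "Two year", 4.20, 785),
    ("No", "Two year", 1.65, 910),
]

churners = defaultdict(int)
totals = defaultdict(int)
for paperless, _contract, pct, n in rows:
    churners[paperless] += round(pct * n / 100)  # rebuilt churner count (assumption)
    totals[paperless] += n

# Weighted overall churn rate per PaperlessBilling value, in percent.
overall = {k: round(churners[k] * 100.0 / totals[k], 2) for k in totals}
print(overall)
```

A simple mean of the three percentages would be misleading here, because the contract groups differ widely in size.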


### 📊 Query 14: Churn Rate by Online Backup Subscription
Let's see if subscribing to Online Backup reduces churn.


In [32]:
query14 = """
SELECT 
    OnlineBackup,
    ROUND(SUM(CASE WHEN Churn='Yes' THEN 1 ELSE 0 END)*100.0/COUNT(*), 2) AS churn_percentage,
    COUNT(*) AS total_customers
FROM telco_churn
GROUP BY OnlineBackup
ORDER BY churn_percentage DESC;
"""

df_backup_churn = pd.read_sql(query14, engine)
df_backup_churn


          onlinebackup  churn_percentage  total_customers
0                   No             39.93             3088
1                  Yes             21.53             2429
2  No internet service              7.40             1526


**Insight**:
Customers **without Online Backup churn more** than those with it.

**Possible Reason**:
Customers using **extra services** like Online Backup have **higher engagement** and **perceived value**, reducing churn risk.

**Takeaway**:
Focus on **cross-selling Online Backup** especially to new or at-risk customers.


### 📊 Query 15: Churn Rate by Device Protection Subscription
We'll explore if device protection subscriptions correlate with lower churn.


In [33]:
query15 = """
SELECT 
    DeviceProtection,
    ROUND(SUM(CASE WHEN Churn='Yes' THEN 1 ELSE 0 END)*100.0/COUNT(*), 2) AS churn_percentage,
    COUNT(*) AS total_customers
FROM telco_churn
GROUP BY DeviceProtection
ORDER BY churn_percentage DESC;
"""

df_deviceprotection_churn = pd.read_sql(query15, engine)
df_deviceprotection_churn


      deviceprotection  churn_percentage  total_customers
0                   No             39.13             3095
1                  Yes             22.50             2422
2  No internet service              7.40             1526


**Insight**:
Customers **without Device Protection churn considerably more**.

**Possible Reason**:
Device Protection is typically part of **premium service bundles**, so customers who opt in are likely more committed.

**Takeaway**:
Encourage **Device Protection subscriptions** in early customer lifecycle to enhance stickiness and reduce churn.


### 📊 Query 16: Churn Rate by Phone Service and Multiple Lines
We'll check if customers with **PhoneService** and **MultipleLines** churn more, less, or stay the same.


In [34]:
query16 = """
SELECT 
    PhoneService,
    MultipleLines,
    ROUND(SUM(CASE WHEN Churn='Yes' THEN 1 ELSE 0 END)*100.0/COUNT(*), 2) AS churn_percentage,
    COUNT(*) AS total_customers
FROM telco_churn
GROUP BY PhoneService, MultipleLines
ORDER BY churn_percentage DESC;
"""

df_phone_multiline_churn = pd.read_sql(query16, engine)
df_phone_multiline_churn


  phoneservice     multiplelines  churn_percentage  total_customers
0          Yes               Yes             28.61             2971
1          Yes                No             25.04             3390
2           No  No phone service             24.93              682


**Insight**:
- Customers with **no PhoneService** (24.93%) and customers with a **single line** (25.04%) churn at nearly the same rate.
- **MultipleLines** does not reduce churn: customers with multiple lines churn slightly more (28.61%).

**Possible Reason**:
Phone Service and Multiple Lines are **basic utilities**, but **additional lines without bundled benefits** may not motivate customer retention.

**Takeaway**:
Focus less on selling **Multiple Lines** as a churn reducer; instead, **bundle multiple lines with entertainment or support services** to increase retention.


### 📊 Query 17: Churn Rate by Total Charges Range
We'll group customers into **spending brackets** based on their **TotalCharges** and see how overall lifetime spend relates to churn rate.


In [35]:
query17 = """
SELECT 
    CASE 
        WHEN TotalCharges < 1000 THEN 'Low (<1k)'
        WHEN TotalCharges BETWEEN 1000 AND 2500 THEN 'Medium (1k-2.5k)'
        WHEN TotalCharges BETWEEN 2500 AND 5000 THEN 'High (2.5k-5k)'
        WHEN TotalCharges BETWEEN 5000 AND 8000 THEN 'Premium (5k-8k)'
        WHEN TotalCharges > 8000 THEN 'Very Premium (>8k)'
        ELSE 'Unknown'
    END AS totalcharges_group,
    ROUND(SUM(CASE WHEN Churn = 'Yes' THEN 1 ELSE 0 END) * 100.0 / COUNT(*), 2) AS churn_percentage,
    COUNT(*) AS total_customers
FROM telco_churn
WHERE TotalCharges IS NOT NULL
GROUP BY totalcharges_group
ORDER BY churn_percentage DESC;
"""

df_totalcharges_churn = pd.read_sql(query17, engine)
df_totalcharges_churn


   totalcharges_group  churn_percentage  total_customers
0           Low (<1k)             36.99             2893
1    Medium (1k-2.5k)             22.59             1642
2      High (2.5k-5k)             19.90             1362
3     Premium (5k-8k)             14.57             1057
4  Very Premium (>8k)              3.85               78


**Insight**:
- **Low-spending customers (<1000 TotalCharges)** churn the most (36.99%).
- **Higher-spending customers (>5000)** churn significantly less (14.57% for 5k-8k, 3.85% above 8k).

**Possible Reason**:
Lower-spending customers are often **new customers** or **low-engagement users**, leading to higher churn. High-spenders are **long-term loyal customers** with higher perceived value.

**Takeaway**:
Focus retention strategies on **early-stage customers (low TotalCharges)**; examples include onboarding offers, first-90-day discounts, and personalized engagement.


#### **ADVANCED INSIGHTS**

### 📊 Query 18: Churn Rate by Contract Type and Payment Method
We'll explore how **contract length + payment method combination** affects churn.


In [36]:
query18 = """
SELECT 
    Contract, 
    PaymentMethod,
    ROUND(SUM(CASE WHEN Churn='Yes' THEN 1 ELSE 0 END)*100.0/COUNT(*), 2) AS churn_percentage,
    COUNT(*) AS total_customers
FROM telco_churn
GROUP BY Contract, PaymentMethod
ORDER BY churn_percentage DESC;
"""

df_contract_payment = pd.read_sql(query18, engine)
df_contract_payment


         contract              paymentmethod  churn_percentage  total_customers
0  Month-to-month           Electronic check             53.73             1850
1  Month-to-month  Bank transfer (automatic)             34.13              589
2  Month-to-month    Credit card (automatic)             32.78              543
3  Month-to-month               Mailed check             31.58              893
4        One year           Electronic check             18.44              347
5        One year    Credit card (automatic)             10.30              398
6        One year  Bank transfer (automatic)              9.72              391
7        Two year           Electronic check              7.74              168
8        One year               Mailed check              6.82              337
9        Two year  Bank transfer (automatic)              3.37              564


**Insight**: 
- Highest churn among **Month-to-month + Electronic Check** users (53.73%).
- Lowest churn in **Two year + AutoPay (Credit Card/Bank Transfer)** users.

**Reason**: 
Short contracts + manual payments → **flexible to leave** anytime.

**Takeaway**: 
**Push longer contracts with AutoPay schemes** to reduce churn risk.


### 📊 Query 19: Who Are the Most At-Risk Churners? (Tenure + Cost)


In [39]:
query19 = """
SELECT 
    CASE 
        WHEN tenure <= 12 THEN '0-1 Year'
        WHEN tenure BETWEEN 13 AND 24 THEN '1-2 Years'
        WHEN tenure BETWEEN 25 AND 48 THEN '2-4 Years'
        ELSE '4+ Years' 
    END AS tenure_group,
    CASE 
        WHEN MonthlyCharges < 35 THEN 'Low (<35)'
        WHEN MonthlyCharges BETWEEN 35 AND 70 THEN 'Medium (35-70)'
        ELSE 'High (>70)'
    END AS monthly_charges_group,
    ROUND(SUM(CASE WHEN Churn='Yes' THEN 1 ELSE 0 END)*100.0/COUNT(*), 2) AS churn_percentage,
    COUNT(*) AS total_customers
FROM telco_churn
GROUP BY tenure_group, monthly_charges_group
ORDER BY churn_percentage DESC;
"""

df_risk_segments = pd.read_sql(query19, engine)
df_risk_segments


  tenure_group  monthly_charges_group  churn_percentage  total_customers
0     0-1 Year             High (>70)             67.89              875
1    1-2 Years             High (>70)             45.61              478
2     0-1 Year         Medium (35-70)             43.30              672
3    2-4 Years             High (>70)             32.06              839
4     0-1 Year              Low (<35)             23.79              639
5    1-2 Years         Medium (35-70)             22.46              276
6     4+ Years             High (>70)             13.37             1391
7    2-4 Years         Medium (35-70)             10.50              381
8     4+ Years         Medium (35-70)              5.25              400
9    1-2 Years              Low (<35)              5.19              270


**Insight**: 
- Highest churn → **New customers (0-1 year) + High Monthly Charges** (67.89%).
- Lowest churn → **Old customers (4+ years) + Low Monthly Charges**.

**Reason**: 
**Early dissatisfaction with high charges** causes quick drop-off.

**Takeaway**: 
**Special onboarding focus** on **high-paying new customers**; targeted retention within first year.
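
One way to prioritize these segments is to compare each one with the overall churn rate and estimate how many churners it contains. The sketch below does this for the three highest-churn rows of the Query 19 output. "Lift" (segment rate divided by the 26.54% baseline from Query 1) is a metric introduced here for illustration, not one the notebook computes.

```python
BASELINE = 26.54  # overall churn rate from Query 1, in percent

# (tenure_group, monthly_charges_group, churn_percentage, total_customers)
top_segments = [
    ("0-1 Year", "High (>70)", 67.89, 875),
    ("1-2 Years", "High (>70)", 45.61, 478),
    ("0-1 Year", "Medium (35-70)", 43.30, 672),
]

results = []
for tenure, charges, pct, n in top_segments:
    lift = round(pct / BASELINE, 2)        # how many times the baseline rate
    est_churners = round(pct * n / 100)    # estimated churners in the segment
    results.append((tenure, charges, lift, est_churners))

for row in results:
    print(row)
```

A segment with high lift but few customers may matter less in absolute terms than a larger segment with moderate lift, so both numbers are worth reading together.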


### 📊 Query 20: Churn Rate by Engagement Score Levels
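We'll group customers by **Engagement Score Levels** and see how churn varies across these segments.
**Engagement Score** = number of services subscribed + tenure/10 + MonthlyCharges/20. Note that if `tenure` is stored as an integer, PostgreSQL evaluates `tenure/10` with integer division.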

In [43]:
query20 = """
WITH engagement_data AS (
    SELECT 
        customerID,
        (
            (CASE WHEN PhoneService='Yes' THEN 1 ELSE 0 END) +
            (CASE WHEN InternetService!='No' THEN 1 ELSE 0 END) +
            (CASE WHEN OnlineSecurity='Yes' THEN 1 ELSE 0 END) +
            (CASE WHEN OnlineBackup='Yes' THEN 1 ELSE 0 END) +
            (CASE WHEN DeviceProtection='Yes' THEN 1 ELSE 0 END) +
            (CASE WHEN TechSupport='Yes' THEN 1 ELSE 0 END) +
            (CASE WHEN StreamingTV='Yes' THEN 1 ELSE 0 END) +
            (CASE WHEN StreamingMovies='Yes' THEN 1 ELSE 0 END) +
            (tenure/10) +
            (MonthlyCharges/20)
        ) AS engagement_score,
        Churn
    FROM telco_churn
),
bucketed_data AS (
    SELECT 
        CASE 
            WHEN engagement_score < 5 THEN 'Very Low'
            WHEN engagement_score BETWEEN 5 AND 8 THEN 'Low'
            WHEN engagement_score BETWEEN 8 AND 11 THEN 'Medium'
            WHEN engagement_score BETWEEN 11 AND 14 THEN 'High'
            ELSE 'Very High'
        END AS engagement_level,
        Churn
    FROM engagement_data
)
SELECT 
    engagement_level,
    ROUND(SUM(CASE WHEN Churn='Yes' THEN 1 ELSE 0 END)*100.0/COUNT(*), 2) AS churn_percentage,
    COUNT(*) AS total_customers
FROM bucketed_data
GROUP BY engagement_level
ORDER BY churn_percentage DESC;
"""

df_engagement_bucket = pd.read_sql(query20, engine)
df_engagement_bucket


  engagement_level  churn_percentage  total_customers
0              Low             37.14             1653
1           Medium             33.78             1498
2             High             24.59             1049
3         Very Low             21.81             1206
4        Very High             13.93             1637


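**Insight**:
- Churn does **not** fall monotonically with engagement score: the **Low** segment churns the most (37.14%), followed by **Medium** (33.78%) and **High** (24.59%).
- **Very Low** engagement customers churn at only 21.81%, while **Very High** engagement customers churn the least (13.93%).

**Reason**:
From the Low level upward, more services + longer tenure + higher monthly spend → **stronger ecosystem lock-in** → **less churn risk**. The Very Low segment probably contains many low-cost, phone-only customers, who churn little in the earlier queries.

**Takeaway**:
Focus churn reduction strategies on the **Low** and **Medium engagement segments**, where churn is highest.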
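
To see how individual customers land in these levels, the score and bucketing logic of `query20` can be sketched in Python. This assumes `tenure` is an integer column, so that `tenure/10` truncates in PostgreSQL (modelled with `//`); the three customers are hypothetical.

```python
def engagement_score(services, tenure, monthly_charges):
    """Score as in query20: services + tenure/10 + MonthlyCharges/20."""
    # Assumption: tenure is an INTEGER column, so tenure/10 truncates.
    return services + tenure // 10 + monthly_charges / 20


def engagement_level(score):
    """Mirror the CASE in query20; the first matching branch wins."""
    if score < 5:
        return "Very Low"
    if score <= 8:       # BETWEEN 5 AND 8
        return "Low"
    if score <= 11:      # BETWEEN 8 AND 11; exactly 8 was already 'Low'
        return "Medium"
    if score <= 14:      # BETWEEN 11 AND 14
        return "High"
    return "Very High"


# Hypothetical customers: (services, tenure in months, MonthlyCharges).
examples = {
    "new fiber customer": (2, 1, 70.0),
    "long-tenure phone-only": (1, 72, 20.0),
    "mid-tenure bundle": (3, 25, 60.0),
}
for name, args in examples.items():
    score = engagement_score(*args)
    print(name, score, engagement_level(score))
```

Because the `BETWEEN` ranges share their endpoints, a score of exactly 8, 11, or 14 falls into the lower of the two adjacent levels.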


### 📊 Query 21: Do Senior Citizens with/without Key Services Churn More?

We'll check if **Senior Citizens** churn more when **TechSupport** or **Streaming Services** are active/inactive.


In [46]:
query21 = """
SELECT 
    SeniorCitizen,
    TechSupport,
    StreamingTV,
    ROUND(SUM(CASE WHEN Churn = 'Yes' THEN 1 ELSE 0 END) * 100.0 / COUNT(*), 2) AS churn_percentage,
    COUNT(*) AS total_customers
FROM telco_churn
GROUP BY SeniorCitizen, TechSupport, StreamingTV
ORDER BY churn_percentage DESC;
"""

df_senior_services = pd.read_sql(query21, engine)
df_senior_services


   seniorcitizen          techsupport          streamingtv  churn_percentage  total_customers
0              1                   No                   No             52.03              444
1              1                   No                  Yes             48.96              386
2              0                   No                  Yes             39.53             1103
3              0                   No                   No             38.31             1540
4              1                  Yes                  Yes             21.51              186
5              1                  Yes                   No             14.86               74
6              0                  Yes                   No             14.63              752
7              0                  Yes                  Yes             14.44             1032
8              1  No internet service  No internet service              9.62               52
9              0  No internet service  No internet service              7.33             1474


**Insight**:
- **Senior Citizens without TechSupport and StreamingTV churn the most** (52.03%).
- Senior Citizens with **TechSupport enabled** churn significantly less (14.86-21.51%).

**Reason**:
Older customers may **struggle with technical issues** → absence of TechSupport frustrates them → increases churn.
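StreamingTV may also act as an **engagement tool**, although its effect in this table is mixed: seniors without TechSupport churn slightly less with StreamingTV (48.96% vs 52.03%), while seniors with TechSupport churn more with it (21.51% vs 14.86%).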

**Takeaway**:
**Target Senior Citizens** without TechSupport and Streaming with **personalized retention offers**.
**Add TechSupport as a churn barrier strategy** for vulnerable senior segments.


### 📊 Query 22: Does Paying More Per Service Lead to Higher Churn?

We'll calculate **MonthlyCharges per active service** to see if **overpaying customers churn more**.


In [53]:
query22 = """
WITH service_count AS (
    SELECT 
        customerID,
        MonthlyCharges,
        Churn,
        (
            (CASE WHEN PhoneService = 'Yes' THEN 1 ELSE 0 END) +
            (CASE WHEN MultipleLines = 'Yes' THEN 1 ELSE 0 END) +
            (CASE WHEN InternetService != 'No' THEN 1 ELSE 0 END) +
            (CASE WHEN OnlineSecurity = 'Yes' THEN 1 ELSE 0 END) +
            (CASE WHEN OnlineBackup = 'Yes' THEN 1 ELSE 0 END) +
            (CASE WHEN DeviceProtection = 'Yes' THEN 1 ELSE 0 END) +
            (CASE WHEN TechSupport = 'Yes' THEN 1 ELSE 0 END) +
            (CASE WHEN StreamingTV = 'Yes' THEN 1 ELSE 0 END) +
            (CASE WHEN StreamingMovies = 'Yes' THEN 1 ELSE 0 END)
        ) AS total_services
    FROM telco_churn
),
charges_per_service AS (
    SELECT 
        customerID,
        Churn,
        ROUND((MonthlyCharges / NULLIF(total_services, 0))::numeric, 2) AS charges_per_service
    FROM service_count
    WHERE total_services > 0
),
bucketed_charges AS (
    SELECT 
        CASE 
            WHEN charges_per_service <= 10 THEN '≤ 10/service'
            WHEN charges_per_service BETWEEN 10 AND 20 THEN '10-20/service'
            WHEN charges_per_service BETWEEN 20 AND 30 THEN '20-30/service'
            ELSE '> 30/service'
        END AS cost_bucket,
        ROUND(SUM(CASE WHEN Churn = 'Yes' THEN 1 ELSE 0 END)*100.0/COUNT(*), 2) AS churn_percentage,
        COUNT(*) AS total_customers
    FROM charges_per_service
    GROUP BY cost_bucket
)
SELECT * FROM bucketed_charges
ORDER BY churn_percentage DESC;
"""

df_charges_per_service = pd.read_sql(query22, engine)
df_charges_per_service


     cost_bucket  churn_percentage  total_customers
0   > 30/service             59.23              233
1  20-30/service             38.54             1627
2  10-20/service             21.92             4987
3   ≤ 10/service              5.61              196


**Insight**:
- Customers paying **higher per-service charges** churn **significantly more**: 59.23% above 30 per service versus 5.61% at or below 10.
- **Lower per-service cost segments churn less**, which may indicate better perceived value.

**Reason**:
Perceived **unfair pricing or low value-for-money** leads to customer dissatisfaction → churn.

**Takeaway**:
Focus on **value optimization** for **high per-service cost segments**.
Retention strategy → **discounted bundles**, **value add-ons**, or **personalized offers** to improve **perceived value**.
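
The per-service cost and its buckets from `query22` can be sketched in Python as follows. The helper names are illustrative and the sample inputs are hypothetical; Python's `round` may differ from PostgreSQL's `numeric` rounding on exact halves, so treat this as an approximation of the SQL.

```python
def charges_per_service(monthly_charges, total_services):
    """MonthlyCharges divided by the service count, or None if the count is 0."""
    if total_services == 0:
        return None  # plays the role of NULLIF(total_services, 0)
    return round(monthly_charges / total_services, 2)


def cost_bucket(cps):
    """Mirror the CASE in query22; the first matching branch wins."""
    if cps <= 10:
        return "≤ 10/service"
    if cps <= 20:        # BETWEEN 10 AND 20
        return "10-20/service"
    if cps <= 30:        # BETWEEN 20 AND 30
        return "20-30/service"
    return "> 30/service"


# Hypothetical inputs: (MonthlyCharges, total_services).
for charges, services in [(70.70, 2), (20.05, 1), (110.0, 9), (19.0, 2)]:
    cps = charges_per_service(charges, services)
    print(charges, services, cps, cost_bucket(cps))
```

Customers with zero counted services are excluded by the `WHERE total_services > 0` filter in the query, which is why `cost_bucket` is only called on non-`None` values here.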


### **CONCLUSION:**

#### High Churn Segments: 
Month-to-month contract, Electronic Check payment, new customers (0-6 months), high-spenders (>75 MonthlyCharges).

#### Loyal Segments: 
Long tenure (4+ years), low monthly spend, customers on 2-year contracts, AutoPay users.

#### Key Recommendations:
- Promote 1-2 year contracts with AutoPay options.
- Cross-sell bundles (Online Security, StreamingTV).
- Focus on first 6 months onboarding experience.
- Target high-paying new customers with retention offers.

