# 📉 Customer Churn in Telecommunications

##  Introduction: What is Customer Churn?

**Customer churn**, also known as *customer attrition* or *customer turnover*, refers to the phenomenon where customers stop doing business with a company or service provider.

In the context of **telecommunications (telco)**, churn occurs when subscribers:
- Cancel their service contracts
- Switch to competitors
- Discontinue their subscriptions entirely

---

##  Key Characteristics of Customer Churn

###  Voluntary vs. Involuntary Churn

- **Voluntary Churn**:  
  Customers actively decide to leave due to:
  - Dissatisfaction
  - Better offers elsewhere
  - Changing needs

- **Involuntary Churn**:  
  Customers are disconnected due to:
  - Non-payment
  - Fraud
  - Policy violations

###  Churn Rate Calculation

Churn Rate = (Number of customers lost during period / Total customers at start of period) × 100
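As a quick worked example of the formula above, the sketch below computes a churn rate from two counts. The function name `churn_rate` and the sample figures are illustrative assumptions, not values taken from the dataset.

```python
def churn_rate(customers_lost: int, customers_at_start: int) -> float:
    """Return churn as a percentage of the customers present at period start."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    return customers_lost / customers_at_start * 100


# Hypothetical figures: 1,000 subscribers at the start of the month, 25 lost.
rate = churn_rate(25, 1000)
print(f"Monthly churn rate: {rate:.1f}%")
```

Guarding against a zero or negative starting count keeps the function from silently returning a meaningless value.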

---

##  Why Customer Churn Matters in Telecommunications

###  Financial Impact

- Acquiring a new customer is commonly estimated to cost **5–25x** more than retaining an existing one
- Acquisition cost in telecom is often put at roughly **$300–$400** per new customer
- Lost revenue from churned customers directly impacts **profitability**
- Reduces **Customer Lifetime Value (CLV)**

###  Competitive Landscape

- The telecom industry is **highly competitive**
- **Low switching costs** make it easy for customers to leave
- **Market saturation** means growth depends on winning over competitors’ customers

###  Business Sustainability

- High churn rates often reflect **service or operational issues**
- Sustainable growth = **Customer Acquisition + Retention**
- Churn prediction enables **proactive customer retention strategies**

---

##  Common Reasons for Telco Customer Churn

###  Service-Related Factors

- Poor **network coverage** or **call quality**
- Frequent **service outages**
- Slow **internet speeds**
- Delayed or ineffective **customer service**

###  Pricing and Value Perception

- Better pricing from competitors
- **Perceived low value** for money
- Unexpected charges or **billing disputes**
- Lack of **flexible pricing options**

###  Customer Experience Issues

- Complicated **billing systems**
- Rigid or unclear **contract terms**
- Poor **customer support** experiences
- Lack of **personalized services**

###  External Factors

- Economic downturns impacting budgets
- Regulatory changes in telecom
- Tech disruptions (e.g., VoIP, OTT services)
- Geographic moves to non-serviceable areas

---



##  Objective

To develop a classification model that predicts whether a customer is likely to churn, and understand the key drivers behind it.
This notebook showcases an end-to-end machine learning project to predict customer churn using the Telco Customer Churn dataset. 

We aim to explore the patterns behind customer attrition and build predictive models that help businesses retain valuable customers.

##  Business Questions

- Which customer segments are most at risk of churning?
- What are the major factors influencing churn?
- Can we predict churn with good precision and recall?
- How can businesses take proactive steps to reduce churn?

##  What We’ll Do

- Perform exploratory data analysis (EDA) to identify churn patterns
- Engineer features and prepare the data for modeling
- Train a baseline model using Logistic Regression
- Evaluate model performance using precision, recall, F1-score, and ROC-AUC
- Save the model pipeline for deployment in a Streamlit app
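The steps listed above can be sketched end to end on a small synthetic table. This is only a minimal illustration under stated assumptions: the data is randomly generated (it is not the Telco dataset), only three feature columns are used, and the choice of `StandardScaler` plus `OneHotEncoder` is one reasonable preprocessing setup rather than the notebook's actual configuration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data (hypothetical), NOT the Telco dataset.
rng = np.random.default_rng(42)
n = 400
toy = pd.DataFrame({
    "tenure": rng.integers(0, 73, size=n),
    "MonthlyCharges": rng.uniform(20, 110, size=n).round(2),
    "Contract": rng.choice(["Month-to-month", "One year", "Two year"], size=n),
})
# Make churn more likely for short-tenure, month-to-month customers.
logit = 0.5 - 0.05 * toy["tenure"] + 1.5 * (toy["Contract"] == "Month-to-month")
toy["Churn"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X, y = toy.drop(columns="Churn"), toy["Churn"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Scale numeric columns and one-hot encode the categorical one.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["tenure", "MonthlyCharges"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["Contract"]),
])
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)

# Precision, recall and F1 per class, then ROC-AUC from predicted probabilities.
proba = model.predict_proba(X_test)[:, 1]
print(classification_report(y_test, model.predict(X_test), zero_division=0))
print(f"ROC-AUC: {roc_auc_score(y_test, proba):.3f}")
```

Keeping preprocessing and the classifier in a single `Pipeline` means the whole object can later be serialized in one piece (for example with `joblib.dump`) and loaded by an app without re-implementing the encoding steps.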


In [3]:
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.io as pio
from sklearn.preprocessing import LabelEncoder
pio.renderers.default = 'iframe'



df=pd.read_csv('./data/WA_Fn-UseC_-Telco-Customer-Churn.csv')
df

# #loading data
# df = pd.read_csv('../input/telco-customer-churn/WA_Fn-UseC_-Telco-Customer-Churn.csv')

,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,7590-VHVEG,Female,0,Yes,No,1,No,No phone service,DSL,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,29.85,29.85,No
1,5575-GNVDE,Male,0,No,No,34,Yes,No,DSL,Yes,...,Yes,No,No,No,One year,No,Mailed check,56.95,1889.5,No
2,3668-QPYBK,Male,0,No,No,2,Yes,No,DSL,Yes,...,No,No,No,No,Month-to-month,Yes,Mailed check,53.85,108.15,Yes
3,7795-CFOCW,Male,0,No,No,45,No,No phone service,DSL,Yes,...,Yes,Yes,No,No,One year,No,Bank transfer (automatic),42.30,1840.75,No
4,9237-HQITU,Female,0,No,No,2,Yes,No,Fiber optic,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,70.70,151.65,Yes
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7038,6840-RESVB,Male,0,Yes,Yes,24,Yes,Yes,DSL,Yes,...,Yes,Yes,Yes,Yes,One year,Yes,Mailed check,84.80,1990.5,No
7039,2234-XADUH,Female,0,Yes,Yes,72,Yes,Yes,Fiber optic,No,...,Yes,No,Yes,Yes,One year,Yes,Credit card (automatic),103.20,7362.9,No
7040,4801-JZAZL,Female,0,Yes,Yes,11,No,No phone service,DSL,Yes,...,No,No,No,No,Month-to-month,Yes,Electronic check,29.60,346.45,No
7041,8361-LTMKD,Male,1,Yes,No,4,Yes,Yes,Fiber optic,No,...,No,No,No,No,Month-to-month,Yes,Mailed check,74.40,306.6,Yes


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   customerID        7043 non-null   object 
 1   gender            7043 non-null   object 
 2   SeniorCitizen     7043 non-null   int64  
 3   Partner           7043 non-null   object 
 4   Dependents        7043 non-null   object 
 5   tenure            7043 non-null   int64  
 6   PhoneService      7043 non-null   object 
 7   MultipleLines     7043 non-null   object 
 8   InternetService   7043 non-null   object 
 9   OnlineSecurity    7043 non-null   object 
 10  OnlineBackup      7043 non-null   object 
 11  DeviceProtection  7043 non-null   object 
 12  TechSupport       7043 non-null   object 
 13  StreamingTV       7043 non-null   object 
 14  StreamingMovies   7043 non-null   object 
 15  Contract          7043 non-null   object 
 16  PaperlessBilling  7043 non-null   object 


In [5]:
# Check if any duplicates exist (returns True/False)
has_duplicates = df.duplicated().any()
print(f"Dataset has duplicates: {has_duplicates}")

Dataset has duplicates: False


In [6]:
df.dtypes

customerID           object
gender               object
SeniorCitizen         int64
Partner              object
Dependents           object
tenure                int64
PhoneService         object
MultipleLines        object
InternetService      object
OnlineSecurity       object
OnlineBackup         object
DeviceProtection     object
TechSupport          object
StreamingTV          object
StreamingMovies      object
Contract             object
PaperlessBilling     object
PaymentMethod        object
MonthlyCharges      float64
TotalCharges         object
Churn                object
dtype: object

    TotalCharges is stored as object, but it should be numeric. Let's convert it to float.

In [8]:
# Replace blanks with NaN
df['TotalCharges']=df['TotalCharges'].replace(' ',np.nan)
# Convert to float
df['TotalCharges'] = df['TotalCharges'].astype('float')
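An alternative to the replace-then-cast approach above is `pd.to_numeric` with `errors="coerce"`, which converts any unparseable entry to `NaN` in one step. The sketch below uses a hypothetical four-value series rather than the real column.

```python
import pandas as pd

# Hypothetical miniature of the TotalCharges column, including a blank entry.
raw = pd.Series(["29.85", "1889.5", " ", "108.15"], name="TotalCharges")

# errors="coerce" turns anything that cannot be parsed as a number into NaN.
converted = pd.to_numeric(raw, errors="coerce")

print(converted.dtype)
print(converted.isna().sum())
```

The trade-off is that coercion also hides unexpected bad values (not just blanks), so it is worth counting the resulting `NaN`s, as the notebook does in the next cell.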

In [9]:
df.isnull().sum()

customerID           0
gender               0
SeniorCitizen        0
Partner              0
Dependents           0
tenure               0
PhoneService         0
MultipleLines        0
InternetService      0
OnlineSecurity       0
OnlineBackup         0
DeviceProtection     0
TechSupport          0
StreamingTV          0
StreamingMovies      0
Contract             0
PaperlessBilling     0
PaymentMethod        0
MonthlyCharges       0
TotalCharges        11
Churn                0
dtype: int64

In [10]:
df['TotalCharges'].isna().sum()

11

In [11]:
df['TotalCharges'].describe()

count    7032.000000
mean     2283.300441
std      2266.771362
min        18.800000
25%       401.450000
50%      1397.475000
75%      3794.737500
max      8684.800000
Name: TotalCharges, dtype: float64

In [12]:
mask = df['TotalCharges'].isna()
df.loc[mask, ['tenure', 'TotalCharges']].head()

,tenure,TotalCharges
488,0,
753,0,
936,0,
1082,0,
1340,0,


    All 11 missing rows correspond to tenure == 0. For those customers the company hasn’t billed anything yet, so TotalCharges should logically be 0.
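Because `head()` only displays the first five rows, the claim about all missing rows is worth checking programmatically. The sketch below shows one way to do that on a hypothetical toy frame; in the notebook the same check would run against `df` and `mask`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame mirroring the pattern seen above.
toy = pd.DataFrame({
    "tenure": [0, 12, 0, 34],
    "TotalCharges": [np.nan, 650.4, np.nan, 1889.5],
})

mask = toy["TotalCharges"].isna()

# True only if every row with a missing TotalCharges has tenure == 0.
all_new_customers = (toy.loc[mask, "tenure"] == 0).all()
print(all_new_customers)

# Fill only after the check passes, so the imputation stays justified.
if all_new_customers:
    toy.loc[mask, "TotalCharges"] = 0.0
```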

In [14]:
# Customers with tenure == 0 have not been billed yet, so set TotalCharges to 0
df.loc[mask, 'TotalCharges'] = 0.0

In [15]:
df[df['TotalCharges']==0.0]

,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
488,4472-LVYGI,Female,0,Yes,Yes,0,No,No phone service,DSL,Yes,...,Yes,Yes,Yes,No,Two year,Yes,Bank transfer (automatic),52.55,0.0,No
753,3115-CZMZD,Male,0,No,Yes,0,Yes,No,No,No internet service,...,No internet service,No internet service,No internet service,No internet service,Two year,No,Mailed check,20.25,0.0,No
936,5709-LVOEQ,Female,0,Yes,Yes,0,Yes,No,DSL,Yes,...,Yes,No,Yes,Yes,Two year,No,Mailed check,80.85,0.0,No
1082,4367-NUYAO,Male,0,Yes,Yes,0,Yes,Yes,No,No internet service,...,No internet service,No internet service,No internet service,No internet service,Two year,No,Mailed check,25.75,0.0,No
1340,1371-DWPAZ,Female,0,Yes,Yes,0,No,No phone service,DSL,Yes,...,Yes,Yes,Yes,No,Two year,No,Credit card (automatic),56.05,0.0,No
3331,7644-OMVMY,Male,0,Yes,Yes,0,Yes,No,No,No internet service,...,No internet service,No internet service,No internet service,No internet service,Two year,No,Mailed check,19.85,0.0,No
3826,3213-VVOLG,Male,0,Yes,Yes,0,Yes,Yes,No,No internet service,...,No internet service,No internet service,No internet service,No internet service,Two year,No,Mailed check,25.35,0.0,No
4380,2520-SGTTA,Female,0,Yes,Yes,0,Yes,No,No,No internet service,...,No internet service,No internet service,No internet service,No internet service,Two year,No,Mailed check,20.0,0.0,No
5218,2923-ARZLG,Male,0,Yes,Yes,0,Yes,No,No,No internet service,...,No internet service,No internet service,No internet service,No internet service,One year,Yes,Mailed check,19.7,0.0,No
6670,4075-WKNIU,Female,0,Yes,Yes,0,Yes,Yes,DSL,No,...,Yes,Yes,Yes,No,Two year,No,Mailed check,73.35,0.0,No


    The customerID column can be dropped; it is only an identifier and carries no predictive signal.

In [17]:
df.drop('customerID',axis=1,inplace=True)

In [18]:
df.MultipleLines.unique()

array(['No phone service', 'No', 'Yes'], dtype=object)

In [19]:
df.InternetService.unique()

array(['DSL', 'Fiber optic', 'No'], dtype=object)

    We will check later whether any other columns need fixing.

In [21]:
df.SeniorCitizen.unique()

array([0, 1], dtype=int64)

    Let's convert the SeniorCitizen column to a categorical (Yes/No) column.

In [23]:
df['SeniorCitizen']=df['SeniorCitizen'].map({0:'No',1:'Yes'})

In [24]:
df.SeniorCitizen.unique()

array(['No', 'Yes'], dtype=object)

## Business-Focused Visualizations for Customer Churn

**Gender and Churn Distribution**

In [26]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots


# Calculate value counts and percentages for gender
gender_counts = df['gender'].value_counts()
gender_percentages = (gender_counts / len(df)) * 100

# Calculate value counts and percentages for churn
churn_counts = df['Churn'].value_counts()
churn_percentages = (churn_counts / len(df)) * 100

# Create subplots with 1 row and 2 columns
fig = make_subplots(
    rows=1, cols=2,
    specs=[[{"type": "pie"}, {"type": "pie"}]],
    subplot_titles=("Gender", "Churn"),
    horizontal_spacing=0.1
)

# Gender distribution
gender_labels = gender_counts.index.tolist()
gender_values = gender_percentages.tolist()
gender_colors = ['#4472C4', '#E55A4E']  # Blue for Male, Red for Female

# Churn distribution
churn_labels = churn_counts.index.tolist()
churn_values = churn_percentages.tolist()
churn_colors = ['#70AD47', '#9966CC']  # Green for No, Purple for Yes

# Add gender pie chart
fig.add_trace(
    go.Pie(
        labels=gender_labels,
        values=gender_values,
        hole=0.4,
        marker=dict(colors=gender_colors),
        textinfo='label+percent',
        textposition='outside',
        showlegend=True,
        legendgroup='gender',
        name='Gender'
    ),
    row=1, col=1
)

# Add churn pie chart
fig.add_trace(
    go.Pie(
        labels=churn_labels,
        values=churn_values,
        hole=0.4,
        marker=dict(colors=churn_colors),
        textinfo='label+percent',
        textposition='outside',
        showlegend=True,
        legendgroup='churn',
        name='Churn'
    ),
    row=1, col=2
)

# Update layout
fig.update_layout(
    title={
        'text': 'Gender and Churn Distribution',
        'x': 0.5,
        'xanchor': 'center',
        'font': {'size': 20}
    },
    showlegend=True,
    legend=dict(
        orientation="v",
        yanchor="middle",
        y=0.5,
        xanchor="left",
        x=1.05
    ),
    width=800,
    height=500
)

# Show the plot
fig.show()

# Print the actual percentages for verification
print("Gender Distribution:")
print(gender_percentages)
print("\nChurn Distribution:")
print(churn_percentages)



Gender Distribution:
gender
Male      50.47565
Female    49.52435
Name: count, dtype: float64

Churn Distribution:
Churn
No     73.463013
Yes    26.536987
Name: count, dtype: float64


#### Based on this dual pie chart analysis, here are key business observations:
- **Customer Demographics & Retention:**  The customer base is nearly perfectly balanced by gender (50.5% male, 49.5% female), indicating broad market appeal across gender lines. This balanced demographic suggests the product or service doesn't skew heavily toward one gender.

- **Churn Rate Analysis:** The 26.5% churn rate means more than 1 in 4 customers has left, which indicates significant revenue leakage and clear room to improve retention.

##### Strategic Implications:

- **Retention Focus:** With 73.5% retention, the foundation is solid, but there's clear room for improvement. A 5-10% reduction in churn could significantly impact revenue

- **Investigation Needed:** The business should investigate what's driving the 26.5% churn - is it price sensitivity, product satisfaction, competitive pressures, or service issues?

**Churn by gender**

In [29]:
fig = px.histogram(df, x='gender', color='Churn',
                   barmode='group',
                   title='Churn by gender',
                   color_discrete_sequence=['#636EFA', '#EF553B'])
fig.show()

#### Key Business Observation:

- **Gender-Neutral Churn Pattern:** Male and female customers show nearly identical churn rates (approximately 26-27%, in line with the 26.5% overall rate), indicating that gender is not a significant factor in retention. Churn drivers are more likely related to service quality, pricing, or other factors than to gender-specific preferences.
- **Balanced Customer Base:** The similar customer volumes across genders confirm the earlier observation of a well-balanced demographic split, suggesting the service appeals about equally to both segments.

##### Strategic Implication:
- Since gender doesn't influence churn behavior, retention strategies should focus on universal factors like service quality, pricing, contract terms, and customer experience rather than gender-targeted approaches. Resources can be allocated to addressing the broader churn drivers identified in other analyses (contract type, internet service quality, pricing sensitivity) rather than demographic segmentation.
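Per-segment churn rates like the ones quoted above can be computed directly rather than read off the chart. The sketch below does this on hypothetical toy data; the helper name `churn_rate_by` is an assumption for illustration, and in the notebook it would be called with `df` and any categorical column.

```python
import pandas as pd

# Hypothetical toy data; the notebook would pass `df` instead.
toy = pd.DataFrame({
    "gender": ["Female", "Male", "Female", "Male", "Female", "Male"],
    "Churn":  ["Yes",    "No",   "No",     "Yes",  "No",     "No"],
})


def churn_rate_by(frame: pd.DataFrame, column: str) -> pd.Series:
    """Percentage of rows with Churn == 'Yes' within each level of `column`."""
    churned = frame["Churn"].eq("Yes")
    return churned.groupby(frame[column]).mean().mul(100).round(1)


rates = churn_rate_by(toy, "gender")
print(rates)
```

The same helper works for `SeniorCitizen`, `Contract`, or `InternetService`, which makes it easy to verify each percentage cited in the observations.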

**Churn by Senior Citizen Status**

In [32]:
fig = px.histogram(df, x='SeniorCitizen', color='Churn',
                   barmode='group',
                   title='Churn by Senior Citizen Status',
                   color_discrete_sequence=['#00CC96', '#EF553B'])
fig.show()

#### Key Business Observation:

- **Senior Citizens Show Higher Churn Rates:** Senior citizens churn at a markedly higher rate (approximately 41%) than non-senior customers (roughly 23%). That is nearly double, which suggests that factors associated with age influence retention, although the chart alone does not establish the cause.

- **Smaller but Higher-Risk Segment:** While senior citizens represent a smaller portion of the customer base, they constitute a high-risk retention segment that requires targeted attention.

- **Potential Age-Related Churn Drivers:** The higher senior churn rate could indicate issues with technology adoption, price sensitivity on fixed incomes, difficulty with customer service processes, or competitive offerings better suited to senior needs.

##### Strategic Implication:
- The business should develop senior-specific retention strategies, including simplified service options, dedicated senior customer support, pricing plans for fixed incomes, or enhanced onboarding to address technology barriers.

##### Revenue Impact:
- Given the higher churn rate in this demographic, there's significant opportunity to improve overall retention metrics by focusing on senior citizen customer experience improvements.

**Churn by Contract Type**

In [35]:
fig = px.histogram(df, x='Contract', color='Churn',
                   barmode='group',
                   title='Churn by Contract Type',
                   color_discrete_sequence=['#636EFA', '#EF553B'])
fig.show()

#### Based on this churn analysis by contract type, here are the key business observations:
- **Contract Length Strongly Correlates with Retention:** There is a clear inverse relationship between contract length and churn. Month-to-month contracts show the highest churn rate (around 43%), while two-year contracts show very little churn (under 5%).

- **Month-to-Month Contracts Are High Risk:** The month-to-month segment represents the largest customer volume but also the highest churn risk. This flexible arrangement, while attractive for customer acquisition, creates significant revenue volatility and customer instability.

- **Annual Contracts Show Moderate Success:** One-year contracts show a much lower churn rate (roughly 11%) than month-to-month contracts, suggesting that even modest commitment periods are associated with markedly better retention.

- **Two-Year Contracts Are Highly Effective:** The minimal churn among two-year contracts shows that longer commitments go hand in hand with strong retention. Customer volume is lower, though, which suggests fewer customers are willing to commit for that long, and those who do may already be the more loyal ones.

##### Strategic Implications:
- **Pricing Strategy:** Consider incentivizing longer-term contracts through significant discounts to shift customers away from month-to-month plans
- **Customer Lifecycle Management:** Focus intensive retention efforts on month-to-month customers who represent the highest churn risk
- **Revenue Predictability:** Promoting annual and two-year contracts could dramatically improve revenue forecasting and reduce customer acquisition costs
- **Market Positioning:** The willingness to commit longer-term may indicate higher customer satisfaction and product-market fit



**Tenure Distribution by Churn**

In [38]:
fig = px.histogram(df, x='tenure', color='Churn',
                   nbins=30, barmode='overlay',
                   opacity=0.6,
                   title='Tenure Distribution by Churn',
                   # Map colors explicitly so churned customers ('Yes') are always red
                   color_discrete_map={'No': '#00CC96', 'Yes': '#EF553B'})
fig.show()

#### Based on this tenure distribution analysis, here are key business observations:
- **Critical Early Customer Period:** The highest churn risk occurs within the first 10 months of customer tenure, with churn dramatically decreasing after this initial period. This suggests a critical "make-or-break" window where customers decide whether to stay long-term.

- **Customer Lifecycle Pattern:** There's a clear inverse relationship between tenure and churn probability. New customers (0-10 months) show the highest churn rates, while long-term customers (60+ months) demonstrate strong loyalty with minimal churn.

- **The "Loyalty Cliff":** Around the 10-month mark, there appears to be a significant drop in churn rates, suggesting customers who survive this initial period are much more likely to become long-term, stable customers.

- **Long-term Customer Value:** Customers with 60+ months of tenure show exceptional loyalty, with very low churn rates. This indicates high customer lifetime value for those who reach this stage.

##### Strategic Implications:

- **Onboarding Focus:** Invest heavily in the first 10 months of customer experience through enhanced onboarding, proactive support, and engagement programs
- **Early Warning System:** Implement predictive analytics to identify at-risk customers in their first year
- **Customer Success Programs:** Create milestone-based retention programs, particularly targeting the 6-10 month danger zone
- **Long-term Relationship Building:** Develop loyalty programs and exclusive benefits for customers who reach the 1-2 year mark

**Scatter plot Monthly Charges vs Total Charges by Churn**

In [41]:
fig = px.scatter(df, x='MonthlyCharges', y='TotalCharges', color='Churn',
                 title='Monthly Charges vs Total Charges by Churn',
                 # Explicit mapping: retained ('No') in blue, churned ('Yes') in red
                 color_discrete_map={'No': '#636EFA', 'Yes': '#EF553B'})
fig.show()

#### Key Business Observations:

- **Price Sensitivity Impact on Churn:** Higher monthly charges (roughly $80 and above) show increased churn risk, with churned customers (red dots) becoming more prominent in the higher price ranges. This suggests price sensitivity is a significant churn driver.

- **High-Value Customer Retention Challenge:** The scatter shows that customers with higher total lifetime charges (indicating longer tenure and higher value) still churn at elevated monthly charge levels, representing significant revenue loss.

- **Sweet Spot Identification:** The $20-60 monthly charge range shows better retention patterns, with fewer red dots relative to blue ones, suggesting this may be the optimal pricing zone for customer retention.

- **Revenue vs. Retention Trade-off:** There is a clear tension between maximizing monthly revenue per customer and maintaining low churn rates. Higher-paying customers are more likely to leave.

##### Strategic Implication:
- The business should consider value-based pricing strategies, loyalty discounts for high-tenure customers, or tiered service offerings to retain higher-value customers while maintaining profitability in the competitive pricing sweet spot.

**Stacked Churn % by Internet Service**

In [44]:
churn_pct = pd.crosstab(df['InternetService'], df['Churn'], normalize='index') * 100
churn_pct = churn_pct.reset_index().melt(id_vars='InternetService', value_name='Percent', var_name='Churn')

fig = px.bar(churn_pct, x='InternetService', y='Percent', color='Churn',
             title='Churn % by Internet Service (Stacked)',
             text_auto='.1f',
             color_discrete_sequence=['#00CC96', '#EF553B'])
fig.update_layout(barmode='stack')
fig.show()

#### Key Business Observations:

- **Fiber Optic Service Has Critical Churn Problem:** Fiber optic customers show an alarming 41.9% churn rate - more than double the rates of DSL (19%) and customers without internet service (7.4%). This suggests serious issues with fiber service delivery, pricing, or customer satisfaction.

- **Premium Service Paradox:** The company's likely premium offering (fiber optic) is driving the highest customer loss, indicating a fundamental disconnect between service promise and delivery or value perception.

- **Non-Internet Customers Are Most Loyal:** Customers without internet service show exceptional retention (92.6%), suggesting the core non-internet services are solid and well-received.

- **Service Quality vs. Technology Gap:** The dramatic difference in churn rates suggests either fiber optic service has technical issues, is overpriced for the value delivered, or faces intense competition from other fiber providers.

  
##### Strategic Implication:
- The business needs urgent intervention in its fiber optic service - whether through pricing adjustments, service quality improvements, customer support enhancement, or competitive positioning. The current fiber offering is destroying customer value rather than creating it.

**Monthly Charges Box Plot by Churn**

In [47]:
fig = px.box(df, x='Churn', y='MonthlyCharges', color='Churn',
             title='Monthly Charges Distribution by Churn',
             color_discrete_sequence=['#00CC96', '#EF553B'])
fig.show()

#### Key Business Observation:

- **Clear Price-Churn Relationship:** Customers who churn have significantly higher monthly charges (median ~$80) compared to retained customers (median ~$65). The churned customer distribution is shifted upward, indicating price sensitivity is a major churn driver.

- **Pricing Sweet Spot Identified:** Non-churned customers cluster in the $20-90 range with a lower median, while churned customers show a tighter distribution around higher price points ($55-95), suggesting there's a pricing threshold beyond which churn risk increases dramatically.

- **Revenue vs. Retention Trade-off:** The overlap between the two distributions shows that while higher prices drive some churn, many customers do stay at elevated price points, indicating value perception varies among customers.

##### Strategic Implication:
- The business should analyze what differentiates high-paying loyal customers from high-paying churners - likely service quality, contract terms, or perceived value. This could inform targeted retention offers for high-value at-risk customers.
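The medians and ranges read off the box plot can also be tabulated. The sketch below computes the quartiles of `MonthlyCharges` per churn group on hypothetical toy values; the column labels `q1`, `median`, and `q3` are names chosen here for readability.

```python
import pandas as pd

# Hypothetical toy values; the notebook would use df instead.
toy = pd.DataFrame({
    "MonthlyCharges": [20.0, 55.0, 65.0, 90.0, 70.0, 80.0, 95.0],
    "Churn":          ["No", "No", "No", "No", "Yes", "Yes", "Yes"],
})

# Quartiles per churn group: the same statistics a box plot draws.
summary = (
    toy.groupby("Churn")["MonthlyCharges"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)
summary.columns = ["q1", "median", "q3"]
print(summary)
```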

In [49]:
# Map Churn to 0/1 (run once: mapping an already-numeric column would yield NaN)
df_corr = df.copy()
df_corr['Churn'] = df_corr['Churn'].map({'No': 0, 'Yes': 1})

# One-hot encode categorical columns (dtype=int keeps the dummies numeric for corr())
df_encoded = pd.get_dummies(df_corr, drop_first=True, dtype=int)

# Absolute correlation with churn: ranks strength only, the sign (direction) is discarded
churn_corr = df_encoded.corr()['Churn'].drop('Churn')
churn_corr_sorted = churn_corr.abs().sort_values(ascending=False).head(15)

# Plot
fig = px.bar(x=churn_corr_sorted.index,
             y=churn_corr_sorted.values,
             title='Top 15 Features Correlated with Churn',
             labels={'x': 'Feature', 'y': 'Absolute Correlation with Churn'},
             text=churn_corr_sorted.round(2),
             color=churn_corr_sorted.values,
             color_continuous_scale='RdBu')
fig.update_layout(xaxis_tickangle=45)
fig.show()
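The bar chart above ranks features by absolute correlation, so it does not show whether a feature is associated with more or less churn. A minimal sketch of keeping the sign while still ordering by strength, on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data with Churn already mapped to 0/1.
toy = pd.DataFrame({
    "tenure":         [1, 2, 5, 40, 60, 70],
    "MonthlyCharges": [90.0, 85.0, 70.0, 50.0, 30.0, 25.0],
    "Churn":          [1, 1, 1, 0, 0, 0],
})

corr = toy.corr()["Churn"].drop("Churn")

# Order by strength, but keep the sign so the direction stays visible.
signed = corr.reindex(corr.abs().sort_values(ascending=False).index)
print(signed.round(2))
```

With signed values, a diverging color scale such as `RdBu` becomes meaningful, since negative and positive correlations fall on opposite sides of zero.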

In [50]:

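# Work on a copy so the original DataFrame stays unchanged
df_temp = df.copy()

# Temporarily label-encode categorical columns.
# Note: label encoding imposes an arbitrary order on nominal categories,
# so correlations for those columns are only indicative.
le = LabelEncoder()
for col in df_temp.columns:
    if df_temp[col].dtype == 'object':
        df_temp[col] = le.fit_transform(df_temp[col])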

# Correlation heatmap
fig = px.imshow(df_temp.corr(),
                text_auto=True,
                title="Correlation Heatmap (Raw + Label Encoded)",
                color_continuous_scale='RdBu_r')
fig.update_layout(height=700, width=1200)
fig.show()

In [249]:
df.to_csv("cleaned_churn_data.csv", index=False)
print("✅ Cleaned data saved successfully!")


✅ Cleaned data saved successfully!
