
# 📊 Telco Customer Churn Analysis

## 🎯 Objective
Exploratory analysis and predictive modeling to identify and forecast customer churn for a telecommunications company.

---



## 📥 Data Loading and Initial Cleaning

We load the Telco dataset, drop the rows whose `TotalCharges` value is blank, cast that column to float, and remove the `customerID` identifier.


In [None]:
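import pandas as pd
import numpy as np

df = pd.read_csv('WA_Fn-UseC_-Telco-Customer-Churn.csv')

# Count true NaNs, then blank strings per column (missing totals are stored as ' ')
print(df.isna().sum().sum())
print((df == ' ').sum())

# Turn blanks into NaN, drop those rows, and cast the column to float
df['TotalCharges'] = df['TotalCharges'].replace(' ', np.nan)
df = df.dropna(subset=['TotalCharges']).copy()
df['TotalCharges'] = df['TotalCharges'].astype(float)

# customerID is a unique identifier, not a predictor
df = df.drop('customerID', axis=1)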
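
The blank-to-NaN conversion above can also be done in one step with `pd.to_numeric`. The sketch below runs on a small made-up frame (the `toy_charges` values are illustrative, not taken from the dataset) and is only one possible way to write the cleaning step.

```python
import pandas as pd

# Toy frame standing in for the Telco data (values are illustrative only).
toy_charges = pd.DataFrame({
    "tenure": [0, 12, 24],
    "TotalCharges": [" ", "350.5", "1200"],
})

# errors="coerce" turns anything that cannot be parsed as a number into NaN.
toy_charges["TotalCharges"] = pd.to_numeric(toy_charges["TotalCharges"], errors="coerce")

# Drop rows with a missing total; .copy() gives an independent frame.
clean_charges = toy_charges.dropna(subset=["TotalCharges"]).copy()

print(clean_charges.dtypes)
print(len(toy_charges) - len(clean_charges), "row(s) dropped")
```

Printing how many rows were dropped makes it easy to confirm that the cleaning removed only the handful of rows expected.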



## 📊 Target Variable Distribution: 'Churn'

We plot the number of customers who churned versus those who remained.


In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Bar chart of churned ('Yes') versus retained ('No') customers
sns.countplot(x='Churn', data=df)
plt.title('Distribution of Churn')
plt.show()



## 🔤 Encoding Categorical Variables

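Categorical columns, including the target `Churn`, are converted to integer codes with `LabelEncoder` to prepare the data for machine learning. Codes follow the sorted order of each column's labels, so for features with more than two categories the numeric order carries no meaning.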


In [None]:

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
cols_to_encode = [
    'gender', 'Partner', 'Dependents', 'PhoneService', 'MultipleLines',
    'InternetService', 'OnlineSecurity', 'OnlineBackup', 'DeviceProtection',
    'TechSupport', 'StreamingTV', 'StreamingMovies', 'Contract',
    'PaperlessBilling', 'PaymentMethod', 'Churn'
]
for col in cols_to_encode:
    df[col] = le.fit_transform(df[col])
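
To make the integer codes interpretable later, it helps to inspect the mapping that `LabelEncoder` produced. The sketch below uses a short illustrative list of contract labels rather than the real column; `contract_values` and `contract_mapping` are names introduced here for the example.

```python
from sklearn.preprocessing import LabelEncoder

# Illustrative values resembling the 'Contract' column.
contract_values = ["Month-to-month", "Two year", "One year", "Month-to-month"]

contract_encoder = LabelEncoder()
contract_codes = contract_encoder.fit_transform(contract_values)

# classes_ is sorted, so the position of each label is its integer code.
contract_mapping = {label: code for code, label in enumerate(contract_encoder.classes_)}
print(contract_mapping)

# inverse_transform converts codes back to the original labels.
print(contract_encoder.inverse_transform([2]))
```

Because the notebook reuses a single `le` object, after the loop it only remembers the mapping of the last column (`Churn`). Keeping one encoder per column, for example in a dictionary, is one way to preserve every mapping.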



## 🧪 Chi-Square Test for Categorical Feature Association with Churn

We test the statistical association between each categorical feature and churn using the Chi-Square test of independence, then plot the p-values against the 0.05 significance threshold.


In [None]:

from scipy.stats import chi2_contingency

variables = []
pvalues = []
for col in cols_to_encode:
    if col == 'Churn':
        continue
    # Contingency table of feature level versus churn outcome
    table = pd.crosstab(df[col], df['Churn'])
    chi2, p, _, _ = chi2_contingency(table)
    print(f"{col}: p-value = {p:.4f}")
    variables.append(col)
    pvalues.append(p)

# Bars to the left of the dashed line (p < 0.05) indicate a significant association
plt.figure(figsize=(8, len(variables)))
plt.barh(variables, pvalues)
plt.axvline(0.05, color='red', linestyle='--')
plt.xlabel('p-value')
plt.title('Chi-Square Test with Churn')
plt.tight_layout()
plt.show()
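
With several thousand rows, even a weak association can yield a very small p-value, so an effect-size measure is a useful complement. The sketch below computes Cramér's V on a made-up contingency table; `cramers_v` is a hypothetical helper written for this example (without bias correction), not part of SciPy.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency


def cramers_v(x, y):
    """Return (Cramér's V, p-value) for two categorical series.

    Hypothetical helper: uses the plain formula sqrt(chi2 / (n * (k - 1))),
    where k is the smaller of the number of rows and columns.
    """
    table = pd.crosstab(x, y)
    chi2, p, dof, expected = chi2_contingency(table)
    n = table.to_numpy().sum()
    k = min(table.shape) - 1
    return np.sqrt(chi2 / (n * k)), p


# Made-up data: monthly ("M") contracts churn far more often than yearly ("Y") ones.
toy_contract = pd.DataFrame({
    "Contract": ["M"] * 60 + ["Y"] * 40,
    "Churn": [1] * 40 + [0] * 20 + [1] * 5 + [0] * 35,
})

v, p = cramers_v(toy_contract["Contract"], toy_contract["Churn"])
print(f"Cramér's V = {v:.3f}, p-value = {p:.4g}")
```

V ranges from 0 (no association) to 1 (perfect association), which makes features easier to rank than p-values alone.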



## 📈 Correlation with Churn (Numerical Features)

We compute the Pearson correlation between churn and the numerical features, with the encoded `gender` column included for comparison.


In [None]:

selected = df[['Churn', 'tenure', 'MonthlyCharges', 'TotalCharges', 'gender']]
correlation = selected.corr(numeric_only=True)['Churn'].drop('Churn')
print("Correlation with Churn:")
print(correlation)



### 📝 Interpretation

- ⏳ **Tenure**: -0.35 (moderate negative). The longer a customer has stayed, the less likely they are to churn.
- 💸 **MonthlyCharges**: +0.19 (weak positive). Higher monthly bills are associated with a slightly higher churn risk.
- 🚻 **Gender**: 0.00 (no linear correlation). Gender shows no linear association with churn.
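
Because `Churn` is binary, the Pearson coefficient reported above is the same quantity as the point-biserial correlation, which SciPy returns together with a p-value. The sketch below checks this on synthetic data; the numbers are generated for illustration and do not come from the Telco dataset.

```python
import numpy as np
import pandas as pd
from scipy.stats import pointbiserialr

rng = np.random.default_rng(0)

# Synthetic data: churners (1) tend to have shorter tenure than non-churners (0).
syn_churn = rng.integers(0, 2, size=200)
syn_tenure = rng.normal(loc=40 - 15 * syn_churn, scale=10)

syn = pd.DataFrame({"Churn": syn_churn, "tenure": syn_tenure})

# Pearson correlation as computed in the notebook.
pearson_r = syn["tenure"].corr(syn["Churn"])

# Point-biserial correlation: same value, plus a significance test.
r_pb, p_value = pointbiserialr(syn["Churn"], syn["tenure"])

print(f"Pearson r = {pearson_r:.4f}, point-biserial r = {r_pb:.4f}")
```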



## 🧹 Data Preparation

The first 19 columns are the predictors and the last column, `Churn`, is the target. We hold out 30% of the rows as a test set.


In [None]:

from sklearn.model_selection import train_test_split

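# After the cleaning step, columns 0-18 hold the 19 features and column 19 holds 'Churn'
X = df.iloc[:, 0:19].values
y = df.iloc[:, 19].values
# Hold out 30% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)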
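
When the target is imbalanced, passing `stratify` keeps the class proportions roughly equal in both subsets. The following sketch shows the effect on a toy array with 25% positives; the `X_toy` and `y_toy` names are illustrative and this is an optional variation, not the split used above.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 100 rows, 25 of them in the positive class.
X_toy = np.arange(200).reshape(100, 2)
y_toy = np.array([1] * 25 + [0] * 75)

X_toy_train, X_toy_test, y_toy_train, y_toy_test = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=42, stratify=y_toy
)

# Both shares should be close to the overall 25%.
print("Positive share in train:", y_toy_train.mean())
print("Positive share in test:", y_toy_test.mean())
```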



## 🤖 Model Training and Evaluation

We train a Logistic Regression model and evaluate its performance using accuracy, confusion matrix, and classification report.


In [None]:

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

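# max_iter is raised from the default of 100 because the features are unscaled
# and the default solver may otherwise stop before converging
model = LogisticRegression(max_iter=1000)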
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print("🎯 Accuracy:", accuracy_score(y_test, y_pred))
print("\n📉 Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))
print("\n📋 Classification Report:")
print(classification_report(y_test, y_pred))
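
Logistic regression usually converges more reliably when features share a similar scale. One common pattern, sketched here on synthetic data from `make_classification` (not the Telco data), is to chain a `StandardScaler` and the classifier in a pipeline so that scaling is learned from the training set only. The class weights below are an assumption chosen to give an imbalanced target.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced binary problem (illustrative only).
X_syn, y_syn = make_classification(
    n_samples=1000, n_features=10, weights=[0.73, 0.27], random_state=42
)
X_syn_train, X_syn_test, y_syn_train, y_syn_test = train_test_split(
    X_syn, y_syn, test_size=0.3, random_state=42, stratify=y_syn
)

# The scaler is fitted on the training data only, then applied before the classifier.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X_syn_train, y_syn_train)

syn_pred = pipe.predict(X_syn_test)
print("Accuracy:", pipe.score(X_syn_test, y_syn_test))
print("Recall for the positive class:", recall_score(y_syn_test, syn_pred))
```

For churn, recall on the positive class is often more informative than accuracy, since a model that predicts "no churn" for everyone still scores well on an imbalanced target.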



## 🧠 Predict Churn for New Customer (Example)

We simulate the churn prediction for a new customer based on their service and contract attributes. The values must be encoded the same way as the training data and listed in the same column order.


In [None]:

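# Values follow the column order of X: gender, SeniorCitizen, Partner, Dependents, tenure,
# PhoneService, MultipleLines, InternetService, OnlineSecurity, OnlineBackup, DeviceProtection, TechSupport,
# StreamingTV, StreamingMovies, Contract, PaperlessBilling, PaymentMethod, MonthlyCharges, TotalCharges
new_customer = np.array([[1, 0, 0, 0, 5, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 2, 70.35, 350.0]])
prediction = model.predict(new_customer)
print("🔮 Churn prediction (0=No, 1=Yes):", prediction[0])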
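
Positional arrays are easy to misalign, so one option is to describe the customer by feature name and let pandas put the values in the training order. The sketch below uses a shortened, hypothetical `feature_order` list; in the notebook the equivalent would be the names of the 19 predictor columns of `df`.

```python
import pandas as pd

# Hypothetical, shortened feature order (the notebook has 19 predictor columns).
feature_order = ["gender", "SeniorCitizen", "tenure", "MonthlyCharges", "TotalCharges"]

# Attributes can be given in any order because they are matched by name.
customer = {
    "tenure": 5,
    "MonthlyCharges": 70.35,
    "TotalCharges": 350.0,
    "gender": 1,
    "SeniorCitizen": 0,
}

# reindex arranges the columns in the expected order; a missing key would become NaN.
customer_row = pd.DataFrame([customer]).reindex(columns=feature_order)
if customer_row.isna().any().any():
    raise ValueError("a feature is missing from the customer dictionary")

customer_array = customer_row.to_numpy()
print(customer_array)
```

The resulting array has shape `(1, n_features)` and can be passed to a fitted model's `predict` method.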



## ✅ Conclusion

In this notebook:

- We performed data cleaning and preparation.
- We explored categorical associations using Chi-Square tests.
- We analyzed correlations between numerical features and churn.
- We trained a Logistic Regression model to predict churn.
- We applied the model to a new customer example.

### 🎓 Knowledge Gained

We learned how statistical tests (Chi-Square) help uncover important categorical predictors of churn. We also saw how numerical correlations provide insight into trends like tenure and billing. Logistic Regression enabled us to make meaningful predictions.

### 📌 Business Value

By identifying customers likely to churn, the company can take preventive actions to retain them, reducing revenue loss and improving customer loyalty.
