# Customer Churn Prediction (Telecom)
**Goal:** Predict which telecom customers are likely to churn using logistic regression. We clean the dataset, explore patterns, and build a classification model.

## Step 1: Import Libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score, roc_curve

## Step 2: Load Dataset

In [None]:
df = pd.read_csv('https://raw.githubusercontent.com/blastchar/telco-customer-churn/master/WA_Fn-UseC_-Telco-Customer-Churn.csv')
df.head()

## Step 3: EDA & Preprocessing

In [None]:
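df.info()
# TotalCharges is read as text; convert it and turn unparseable entries into NaN
df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce')
# Drop the rows whose TotalCharges could not be converted
df.dropna(inplace=True)
# customerID is a unique identifier and carries no predictive signal
df.drop('customerID', axis=1, inplace=True)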
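
To make the effect of `errors='coerce'` concrete, here is a minimal sketch on a hypothetical toy column (`toy` is not part of the Telco data). It assumes the non-numeric entries are blank strings, which is the usual reason a numeric column gets read as text.

```python
import pandas as pd

# Hypothetical toy column: a numeric field read as text, with two blank entries
toy = pd.DataFrame({"TotalCharges": [" ", "29.85", "108.15", " "]})

# Entries that cannot be parsed become NaN instead of raising an error
toy["TotalCharges"] = pd.to_numeric(toy["TotalCharges"], errors="coerce")

n_missing = int(toy["TotalCharges"].isna().sum())  # rows that failed to parse
cleaned = toy.dropna()                             # keep only the valid rows
print(n_missing, len(cleaned))
```

Checking `isna().sum()` before calling `dropna` shows how many rows are about to be discarded, which is worth knowing before they silently disappear.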

In [None]:
sns.countplot(x='Churn', data=df)
plt.title('Class Distribution: Churn')
plt.show()

In [None]:
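# One-hot encode all categorical columns; drop_first removes one redundant level per column (Churn becomes Churn_Yes)
df = pd.get_dummies(df, drop_first=True)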
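
The column name `Churn_Yes` used in later steps comes from this encoding. A small sketch on a hypothetical three-row frame (the values are made up for illustration) shows which columns `drop_first=True` keeps:

```python
import pandas as pd

# Hypothetical miniature of the dataset: one numeric and two categorical columns
toy = pd.DataFrame({
    "tenure": [1, 34, 2],
    "Contract": ["Month-to-month", "One year", "Two year"],
    "Churn": ["Yes", "No", "Yes"],
})

# For each categorical column the first level (alphabetically) is dropped
encoded = pd.get_dummies(toy, drop_first=True)
encoded_columns = list(encoded.columns)
print(encoded_columns)
```

The dropped level (`Month-to-month` for `Contract`, `No` for `Churn`) becomes the implicit baseline: a row with all remaining dummies at zero belongs to it.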

In [None]:
plt.figure(figsize=(12, 8))
# Rank by absolute correlation so strongly negative relationships are not cut off
corr = df.corr()['Churn_Yes'].abs().sort_values(ascending=False)[:10]
sns.heatmap(df[corr.index].corr(), annot=True, cmap='coolwarm')
plt.title('Top Correlated Features to Churn')
plt.show()

## Step 4: Train-Test Split

In [None]:
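# Features are every column except the encoded target
X = df.drop('Churn_Yes', axis=1)
y = df['Churn_Yes']
# Hold out 20% of the rows for evaluation; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)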
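
Because churners are usually the minority class, passing `stratify=y` to `train_test_split` can help keep the churn rate the same in both splits. The sketch below uses hypothetical labels with 25% positives, not the Telco data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 250 positives, 750 negatives
y_toy = np.array([1] * 250 + [0] * 750)
X_toy = np.arange(1000).reshape(-1, 1)

# stratify keeps the class proportions of y_toy in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

train_rate = y_tr.mean()
test_rate = y_te.mean()
print(train_rate, test_rate)
```

Without stratification the test-set churn rate can drift from the overall rate by chance, which makes precision and recall harder to compare across runs.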

## Step 5: Logistic Regression Model

In [None]:
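# max_iter is raised above the default of 100 to give the solver room to converge on unscaled features
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
# Hard class labels (0.5 threshold) and predicted probabilities for the churn class
y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]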
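
Logistic regression solvers tend to converge more easily when features share a similar scale. One possible variation, sketched here on synthetic data from `make_classification` rather than the churn features, wraps the model in a pipeline with `StandardScaler`. The names `X_demo`, `y_demo`, and `scaled_model` are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the churn features (assumption: not the Telco data)
X_demo, y_demo = make_classification(n_samples=500, n_features=8, random_state=0)
# Inflate one column to imitate a large-valued feature such as TotalCharges
X_demo[:, 0] *= 1000

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# The scaler is fitted on the training rows only and reused on the test rows
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_model.fit(X_tr, y_tr)

demo_prob = scaled_model.predict_proba(X_te)[:, 1]  # probability of class 1
print(demo_prob.shape)
```

Keeping the scaler inside the pipeline avoids leaking test-set statistics into training, since `fit` only ever sees the training rows.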

## Step 6: Model Evaluation

In [None]:
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
print('ROC AUC Score:', roc_auc_score(y_test, y_prob))
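
For reading the printed confusion matrix: scikit-learn puts true labels on the rows and predicted labels on the columns, so for a binary problem the flattened order is TN, FP, FN, TP. A small sketch with hypothetical labels shows how precision and recall for the positive class follow from those four counts:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels for illustration: 1 = churn, 0 = no churn
y_true_toy = [0, 0, 0, 0, 1, 1, 1, 0]
y_pred_toy = [0, 0, 1, 0, 1, 0, 1, 0]

# ravel() flattens the 2x2 matrix row by row: TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true_toy, y_pred_toy).ravel()

precision = tp / (tp + fp)  # share of predicted churners who actually churned
recall = tp / (tp + fn)     # share of actual churners the model caught
print(tn, fp, fn, tp, precision, recall)
```

In a churn setting recall on the positive class is often the number to watch, because a missed churner (a false negative) usually costs more than an unnecessary retention offer.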

In [None]:
fpr, tpr, _ = roc_curve(y_test, y_prob)
plt.figure(figsize=(8, 5))
plt.plot(fpr, tpr, label='Logistic Regression')
plt.plot([0, 1], [0, 1], linestyle='--', label='Random guess')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.legend()
plt.grid(True)
plt.show()

## 📌 Conclusion
- Logistic regression provides a reasonable baseline for churn prediction using only the one-hot-encoded customer attributes, with no feature engineering.
- Contract type and tenure are among the strongest churn indicators.
- Next steps could include SMOTE resampling to address class imbalance, feature selection, or tree-based models such as Random Forest.
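
As a rough sketch of how such a follow-up comparison might look, the block below scores a class-weighted logistic regression against a Random Forest using cross-validated ROC AUC. Two assumptions apply: the data is synthetic (`make_classification` with a 75/25 class split, not the Telco dataset), and `class_weight='balanced'` is swapped in for SMOTE, which lives in the separate `imbalanced-learn` package.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic imbalanced data (assumption: stand-in for the encoded churn features)
X_demo, y_demo = make_classification(
    n_samples=600, n_features=10, weights=[0.75, 0.25], random_state=0
)

# class_weight="balanced" reweights classes inversely to their frequency
candidates = {
    "logreg_balanced": LogisticRegression(max_iter=1000, class_weight="balanced"),
    "random_forest": RandomForestClassifier(
        n_estimators=100, class_weight="balanced", random_state=0
    ),
}

# Mean ROC AUC over 5 cross-validation folds for each candidate
auc_by_model = {
    name: cross_val_score(est, X_demo, y_demo, cv=5, scoring="roc_auc").mean()
    for name, est in candidates.items()
}
for name, auc in auc_by_model.items():
    print(f"{name}: {auc:.3f}")
```

Scores on synthetic data say nothing about the churn problem itself; the point is only the comparison scaffold, which can be rerun with `X` and `y` from Step 4.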