# Telco Customer Churn Prediction

## Task 1: Exploratory Data Analysis (EDA)
This notebook covers the step-by-step process of analyzing the Telco Customer Churn dataset, preprocessing the data, building classification models (Decision Tree and Neural Network), and evaluating them.

### 1.1 Import Libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report, roc_curve, auc
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

# Set visualization style
sns.set_style('whitegrid')
%matplotlib inline

import warnings
warnings.filterwarnings('ignore')

### 1.2 Load Dataset

In [None]:
df = pd.read_csv('WA_Fn-UseC_-Telco-Customer-Churn.csv')
print(f"Dataset Shape: {df.shape}")
df.head()

### 1.3 Data Cleaning & Inspection

In [None]:
df.info()

In [None]:
# 'TotalCharges' is object but should be numeric. Coerce errors to NaN.
df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce')

# Check for missing values
print("Missing values per column:")
print(df.isnull().sum())

In [None]:
# Drop only the rows where TotalCharges is missing (usually very few)
df.dropna(subset=['TotalCharges'], inplace=True)

# Remove 'customerID' as it's not a feature
if 'customerID' in df.columns:
    df.drop(columns=['customerID'], inplace=True)

print("New Shape after cleaning:", df.shape)

### 1.4 Visualization

In [None]:
# Target Variable Distribution
plt.figure(figsize=(6,4))
sns.countplot(x='Churn', data=df, palette='viridis')
plt.title('Distribution of Churn')
plt.show()

In [None]:
# Numerical Features Distributions
fig, axes = plt.subplots(1, 3, figsize=(18, 5))

sns.histplot(df['tenure'], kde=True, ax=axes[0], color='skyblue')
axes[0].set_title('Tenure Distribution')

sns.histplot(df['MonthlyCharges'], kde=True, ax=axes[1], color='salmon')
axes[1].set_title('Monthly Charges Distribution')

sns.histplot(df['TotalCharges'], kde=True, ax=axes[2], color='green')
axes[2].set_title('Total Charges Distribution')

plt.tight_layout()
plt.show()

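Note: Column names are case-sensitive, and this dataset mixes conventions (for example, 'tenure' is lowercase while 'MonthlyCharges' is not). The next cell lowercases all column names so that later cells can reference them consistently.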

In [None]:
# Standardizing column names to lowercase for ease
df.columns = [c.lower() for c in df.columns]
print(df.columns)

In [None]:
# Churn vs Contract Type
plt.figure(figsize=(8,5))
sns.countplot(x='contract', hue='churn', data=df, palette='pastel')
plt.title('Churn Count by Contract Type')
plt.show()

In [None]:
# Correlation Matrix
# Convert Churn to binary for correlation visualization
df_corr = df.copy()
df_corr['churn'] = df_corr['churn'].apply(lambda x: 1 if x == 'Yes' else 0)
numeric_df = df_corr.select_dtypes(include=['number'])

plt.figure(figsize=(10,8))
sns.heatmap(numeric_df.corr(), annot=True, cmap='coolwarm', fmt='.2f')
plt.title('Correlation Matrix')
plt.show()

## Task 2: Model Implementation

### 2.1 Data Preprocessing
- Encoding Categorical Variables
- Feature Scaling
- Train-Test Split

In [None]:
# 1. Separate features and target
X = df.drop('churn', axis=1)
y = df['churn'].apply(lambda x: 1 if x == 'Yes' else 0)

# 2. Encoding Categorical Variables
# One-hot encode categorical features; drop_first avoids multicollinearity.
# dtype=float keeps every column numeric so Keras can convert X to a tensor.
X = pd.get_dummies(X, drop_first=True, dtype=float)

# 3. Train-Test Split
# Split before scaling so that test-set statistics cannot leak into the scaler
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
X_train, X_test = X_train.copy(), X_test.copy()

# 4. Scaling Numerical Features
# Fit the scaler on the training set only, then apply it to both sets
num_cols = ['tenure', 'monthlycharges', 'totalcharges']
scaler = StandardScaler()
X_train[num_cols] = scaler.fit_transform(X_train[num_cols])
X_test[num_cols] = scaler.transform(X_test[num_cols])

print(f"Training Shape: {X_train.shape}")
print(f"Testing Shape: {X_test.shape}")

### 2.2 Decision Tree Classifier
- Implementation
- Hyperparameter Tuning (GridSearchCV)

In [None]:
# Init Model
dt = DecisionTreeClassifier(random_state=42)

# Hyperparameter Grid
param_grid = {
    'max_depth': [3, 5, 7, 10, None],
    'min_samples_split': [2, 5, 10],
    'criterion': ['gini', 'entropy']
}

# Grid Search
grid_search = GridSearchCV(estimator=dt, param_grid=param_grid, cv=5, scoring='accuracy', n_jobs=-1)
grid_search.fit(X_train, y_train)

best_dt = grid_search.best_estimator_
print("Best Parameters for Decision Tree:", grid_search.best_params_)

# Predictions
y_pred_dt = best_dt.predict(X_test)

# Evaluation
print("Decision Tree Accuracy:", accuracy_score(y_test, y_pred_dt))
print("\nClassification Report:\n", classification_report(y_test, y_pred_dt))

### 2.3 Neural Network Classifier
- Implementation using TensorFlow/Keras
- Model Architecture: Input -> Dense(ReLU) -> Dropout -> Dense(ReLU) -> Output(Sigmoid)

In [None]:
model = Sequential()

# Input Layer & 1st Hidden Layer
model.add(Dense(32, activation='relu', input_shape=(X_train.shape[1],)))
model.add(Dropout(0.1))

# 2nd Hidden Layer
model.add(Dense(16, activation='relu'))
model.add(Dropout(0.1))

# Output Layer
model.add(Dense(1, activation='sigmoid'))

# Compile
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train
history = model.fit(X_train, y_train, epochs=50, batch_size=32, validation_split=0.1, verbose=1)

In [None]:
# Plot Training History
plt.figure(figsize=(10,4))
plt.plot(history.history['accuracy'], label='Train Accuracy')
plt.plot(history.history['val_accuracy'], label='Val Accuracy')
plt.title('Neural Network Training History')
plt.legend()
plt.show()

In [None]:
# Evaluation
# model.predict returns shape (n, 1); flatten it to match y_test
y_pred_nn = (model.predict(X_test) > 0.5).astype("int32").ravel()
print("Neural Network Accuracy:", accuracy_score(y_test, y_pred_nn))
print("\nClassification Report:\n", classification_report(y_test, y_pred_nn))

### 2.4 Model Comparison (ROC-AUC)

In [None]:
# Decision Tree Probabilities
y_prob_dt = best_dt.predict_proba(X_test)[:, 1]
fpr_dt, tpr_dt, _ = roc_curve(y_test, y_prob_dt)
auc_dt = auc(fpr_dt, tpr_dt)

# Neural Network Probabilities
y_prob_nn = model.predict(X_test).ravel()
fpr_nn, tpr_nn, _ = roc_curve(y_test, y_prob_nn)
auc_nn = auc(fpr_nn, tpr_nn)

plt.figure(figsize=(8,6))
plt.plot(fpr_dt, tpr_dt, label=f'Decision Tree (AUC = {auc_dt:.2f})')
plt.plot(fpr_nn, tpr_nn, label=f'Neural Network (AUC = {auc_nn:.2f})')
plt.plot([0, 1], [0, 1], 'k--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve Comparison')
plt.legend()
plt.show()

## Task 3: Ethics & Post-Deployment Strategy

### 3.1 AI Ethics Strategies
**1. Fairness & Bias Mitigation:**
- **Strategy:** Analyze the model's performance across different demographic groups (e.g., gender, senior citizen status) to ensure it doesn't systematically disadvantage a specific group.
- **Implementation:** Use metrics like Disparate Impact or Equal Opportunity difference.
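
As a rough illustration of the two metrics named above, the sketch below computes them with NumPy for a binary group attribute. The function names, the toy arrays, and the choice of `group == 1` as the group under scrutiny are assumptions made for this example; they are not part of the notebook. Whether a positive prediction (predicted churn) is a favorable or unfavorable outcome depends on how the prediction is used, so the ratio should be read with that context in mind.

```python
import numpy as np

def selection_rate(y_pred, mask):
    """Share of the group that the model flags as positive (predicted churn)."""
    return float(np.mean(y_pred[mask]))

def true_positive_rate(y_true, y_pred, mask):
    """Recall within the group: flagged churners / actual churners."""
    actual_pos = mask & (y_true == 1)
    return float(np.mean(y_pred[actual_pos]))

def fairness_report(y_true, y_pred, group):
    """Compare group == 1 (e.g. senior citizens) against group == 0."""
    y_true, y_pred, group = (np.asarray(a) for a in (y_true, y_pred, group))
    a, b = group == 1, group == 0
    # Disparate impact: ratio of selection rates between the two groups
    di = selection_rate(y_pred, a) / selection_rate(y_pred, b)
    # Equal opportunity difference: gap in true positive rates
    eod = true_positive_rate(y_true, y_pred, a) - true_positive_rate(y_true, y_pred, b)
    return {"disparate_impact": di, "equal_opportunity_diff": eod}

# Toy data (hypothetical, not drawn from the Telco dataset)
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
group  = [1, 1, 1, 1, 0, 0, 0, 0]
report = fairness_report(y_true, y_pred, group)
print(report)
```

In the notebook, `y_true` and `y_pred` would be `y_test` and a model's test predictions, and `group` could be the `seniorcitizen` column of `X_test`. The sketch does not guard against a group with no positive predictions or no actual churners, which would cause a division by zero or a NaN.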

**2. Transparency & Explainability:**
- **Strategy:** Ensure stakeholders understand *why* a customer is predicted to churn.
- **Implementation:** The Decision Tree is interpretable by construction: its split rules can be read directly, and its `feature_importances_` attribute ranks the features. For the Neural Network, post-hoc techniques such as SHAP or LIME can be used.
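
SHAP and LIME need their own libraries, so the sketch below swaps in a simpler model-agnostic technique, permutation importance, to show the general idea of explaining a black-box model: shuffle one feature at a time and measure how much accuracy drops. It is not SHAP or LIME. The `toy_predict` function and the synthetic data are hypothetical stand-ins for a trained model and the test set.

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled."""
    rng = np.random.default_rng(seed)
    baseline = np.mean(predict(X) == y)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Break the link between feature j and the target
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops.append(baseline - np.mean(predict(X_perm) == y))
        importances[j] = np.mean(drops)
    return importances

# Hypothetical stand-in for a trained model: it only looks at column 0
def toy_predict(X):
    return (X[:, 0] > 0).astype(int)

rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 3))
y_demo = (X_demo[:, 0] > 0).astype(int)
importances = permutation_importance(toy_predict, X_demo, y_demo)
print(importances)
```

For the Keras model in this notebook, `predict` could plausibly be a wrapper such as `lambda X: (model.predict(X) > 0.5).astype(int).ravel()`, with `X_test.to_numpy()` and `y_test.to_numpy()` as inputs.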

**3. Privacy & Data Protection:**
- **Strategy:** Anonymize PII (Personally Identifiable Information) before training. The dataset's 'customerID' is a pseudonymous identifier and is dropped before modeling; the remaining features should still be reviewed so that no sensitive data leaks into the model.
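
If an identifier has to be kept (for example, to join predictions back to customers later), one common option is to replace it with a keyed hash. The sketch below uses the standard-library `hmac` and `hashlib` modules. Note that this is pseudonymization rather than full anonymization, since anyone holding the key can recompute the mapping. The key value and the customer IDs are hypothetical placeholders.

```python
import hashlib
import hmac

def pseudonymize(customer_id: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed SHA-256 digest (HMAC)."""
    return hmac.new(secret_key, customer_id.encode("utf-8"), hashlib.sha256).hexdigest()

# Hypothetical key; in practice it would come from a secrets manager
SECRET_KEY = b"replace-with-a-real-secret"

# Hypothetical customer IDs
token_a = pseudonymize("0001-AAAAA", SECRET_KEY)
token_b = pseudonymize("0001-AAAAA", SECRET_KEY)
token_c = pseudonymize("0002-BBBBB", SECRET_KEY)

print(token_a == token_b)  # same ID and key give the same token
print(token_a == token_c)  # different IDs give different tokens
```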

### 3.2 Post-Deployment Strategy
**1. Model Monitoring:**
- **Metric Tracking:** Continuously monitor Accuracy, Recall, and Precision in production. A sustained drop in any of them suggests that the model no longer matches current customer behavior and should be investigated.
- **Data Drift Detection:** Compare the statistical distribution of incoming live data with the training data. If the 'MonthlyCharges' distribution shifts significantly, the model may need retraining.
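
One way to compare a live feature distribution with its training distribution is the Population Stability Index (PSI). The sketch below is a minimal NumPy version, assuming quantile bins derived from the training sample; the function name, bin count, and synthetic data are assumptions for illustration, and other drift tests (such as a Kolmogorov-Smirnov test) would serve the same purpose.

```python
import numpy as np

def population_stability_index(expected, actual, n_bins=10):
    """PSI between a reference sample and a live sample of one feature."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)
    # Interior bin edges come from quantiles of the reference (training) data
    inner_edges = np.quantile(expected, np.linspace(0, 1, n_bins + 1))[1:-1]

    def bin_fractions(values):
        # Values below the first edge land in bin 0, above the last in bin n_bins - 1
        idx = np.searchsorted(inner_edges, values, side="right")
        return np.bincount(idx, minlength=n_bins) / len(values)

    # A small floor avoids log(0) and division by zero for empty bins
    exp_frac = np.clip(bin_fractions(expected), 1e-6, None)
    act_frac = np.clip(bin_fractions(actual), 1e-6, None)
    return float(np.sum((act_frac - exp_frac) * np.log(act_frac / exp_frac)))

# Synthetic stand-ins for the training and live 'MonthlyCharges' values
rng = np.random.default_rng(0)
train_charges = rng.normal(65, 30, size=5000)
live_same = rng.normal(65, 30, size=5000)
live_shifted = rng.normal(85, 30, size=5000)

psi_same = population_stability_index(train_charges, live_same)
psi_shifted = population_stability_index(train_charges, live_shifted)
print(f"PSI (no shift): {psi_same:.4f}")
print(f"PSI (shifted):  {psi_shifted:.4f}")
```

A commonly quoted rule of thumb treats a PSI below 0.1 as stable and above 0.25 as a significant shift, but these cut-offs are conventions rather than guarantees and should be tuned to the application.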

**2. Retraining Pipeline:**
- Establish a schedule (e.g., monthly) or a trigger-based system (e.g., a tracked metric falls below 80%) to retrain the model with the latest data.
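
The schedule-or-trigger policy can be expressed as a small decision function. The sketch below is one possible shape for it, assuming recall is the tracked metric, a 30-day maximum model age, and the 80% threshold mentioned above; the function name, parameters, and dates are all hypothetical.

```python
from datetime import date, timedelta

def should_retrain(last_trained, today, recent_recall,
                   max_age_days=30, min_recall=0.80):
    """Return (decision, reason) under a schedule-or-trigger policy."""
    # Performance trigger: the tracked metric fell below the threshold
    if recent_recall < min_recall:
        return True, "performance"
    # Schedule trigger: the model is older than the allowed age
    if (today - last_trained) >= timedelta(days=max_age_days):
        return True, "schedule"
    return False, "none"

decision_perf = should_retrain(date(2024, 1, 1), date(2024, 1, 10), recent_recall=0.72)
decision_age = should_retrain(date(2024, 1, 1), date(2024, 2, 15), recent_recall=0.85)
decision_ok = should_retrain(date(2024, 1, 1), date(2024, 1, 10), recent_recall=0.85)
print(decision_perf, decision_age, decision_ok)
```

Returning the reason alongside the decision makes it easier to log why a retraining job was started.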

**3. Feedback Loop:**
- Incorporate actual churn results back into the dataset to refine future predictions.