# **Welcome to the Final Project of Winter Bootcamp**

## Project: Telco Customer Churn Prediction

In the telecommunications industry, customers can choose among multiple service providers and actively switch from one operator to another. In this highly competitive market, the industry experiences an average annual churn rate of 15-25%.

- The Business Pain Point:
Acquiring a new customer is estimated to be 5 to 25 times more expensive than retaining an existing one. Therefore, for a Telco company, Customer Retention is the most critical strategy to maximize profit.

- The Goal:
The management wants to reduce customer churn by identifying customers who are likely to leave before they actually leave. If we can predict who is at risk, the marketing team can offer them special discounts or better plans to keep them.

2. Problem Statement

- Objective: Develop a Machine Learning solution to predict whether a customer will Churn (leave the company) or Stay based on their account information, demographic details, and service usage.

3. The Dataset

We will use the Telco Customer Churn dataset.

Source: [Kaggle - Telco Customer Churn](https://www.kaggle.com/datasets/blastchar/telco-customer-churn)



**Importing the Dependencies**

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

**Task: Reading and Exploring the Data**

In [None]:
df = pd.read_csv('Telco-Customer-Churn.csv')

In [None]:
df.head()

**TASK: Use the .info() method to quickly confirm the datatypes and non-null counts in your dataframe.**

In [None]:
df.info()

**TASK: Get a quick statistical summary of the numeric columns with .describe(). You should notice that many columns are categorical, meaning you will eventually need to convert them to dummy variables.**

In [None]:
df.describe()

## General Feature Exploration

**TASK: Confirm that there are no NaN cells by displaying NaN values per feature column.**

In [None]:
df.isna().sum()
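
A caveat on the check above: `isna()` only counts true missing values. If `TotalCharges` was read as text (which can happen with the raw Kaggle file, where a few entries are blank strings), the blanks will not show up as NaN. The sketch below uses a small made-up frame (`toy` is a hypothetical stand-in, not the real data) to show one way to surface them; dropping the affected rows is just one possible policy.

```python
import pandas as pd

# Toy stand-in for the real data: one blank TotalCharges entry, stored as text.
toy = pd.DataFrame({
    "tenure": [0, 12, 24],
    "TotalCharges": [" ", "350.5", "1200.0"],
})

# errors="coerce" turns anything unparseable (here the blank string) into NaN.
toy["TotalCharges"] = pd.to_numeric(toy["TotalCharges"], errors="coerce")

# Now the blank entry is visible to isna().
n_missing = int(toy["TotalCharges"].isna().sum())
print(n_missing)

# One possible policy (an assumption, not the only choice): drop those rows.
toy_clean = toy.dropna(subset=["TotalCharges"])
print(toy_clean)
```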

**TASK: Display the balance of the class labels (Churn) with a Count Plot.**

In [None]:
sns.countplot(data=df,x='Churn')
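
The count plot shows the imbalance visually; the same information can be read off as numbers with `value_counts`. A minimal sketch on a made-up Series (`toy_churn` is hypothetical, and its proportions are not those of the real dataset):

```python
import pandas as pd

# Hypothetical labels, only to illustrate the calls.
toy_churn = pd.Series(["No", "No", "No", "Yes"], name="Churn")

counts = toy_churn.value_counts()                 # absolute counts per class
shares = toy_churn.value_counts(normalize=True)   # fraction of rows per class

print(counts)
print(shares.round(2))
```

On the real data, the equivalent call would be `df['Churn'].value_counts(normalize=True)`.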

**TASK: Explore the distribution of TotalCharges between Churn categories with a Box Plot or Violin Plot.**

In [None]:
sns.violinplot(data=df,x='Churn',y='TotalCharges')

The Violin Plot reveals that **Total Charges** is a strong indicator of churn risk, acting as a proxy for customer loyalty/tenure.

| Inference | Churn = No (Stayed) | Churn = Yes (Left) |
| :--- | :--- | :--- |
| **Density Peak** | The distribution peaks at **high Total Charges** (approx. \$2,000 - \$4,000). | The distribution is heavily concentrated at **very low Total Charges** (primarily under \$1,000). |
| **Correlation** | Customers with accumulated **high total charges** over their subscription lifetime are **less likely to churn**. | The majority of customers who churn are **low-value** or **new** customers. |
| **Actionable Insight** | High total charges create a strong **retention barrier**. The highest churn risk is clustered within the **new or short-term** customer segment. | |

***

### Conclusion:

There is a clear **negative correlation** between the accumulated amount a customer has paid and their likelihood of churning.
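
One way to back this visual reading with numbers is to compare group medians and to correlate `TotalCharges` with a 0/1 churn flag. The sketch below runs on made-up values (`toy` is hypothetical), so the figures it prints say nothing about the real dataset; it only shows the calls.

```python
import pandas as pd

# Hypothetical data: churners have low accumulated charges.
toy = pd.DataFrame({
    "Churn": ["Yes", "Yes", "No", "No", "No"],
    "TotalCharges": [100.0, 300.0, 2000.0, 3500.0, 1500.0],
})

# Median TotalCharges for each churn class.
medians = toy.groupby("Churn")["TotalCharges"].median()

# Pearson correlation between a 0/1 churn flag and TotalCharges.
churn_flag = (toy["Churn"] == "Yes").astype(int)
corr = churn_flag.corr(toy["TotalCharges"])

print(medians)
print(round(corr, 3))
```

A negative `corr` means higher total charges go together with fewer churners, which is the direction the violin plot suggests.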

**TASK: Create a boxplot showing the distribution of TotalCharges per Contract type, also add in a hue coloring based on the Churn class.**

In [None]:
plt.figure(figsize=(10,4),dpi=200)
sns.boxplot(data=df,y='TotalCharges',x='Contract',hue='Churn')
plt.legend(loc=(1.1,0.5))

This Box Plot shows that the length of the customer's contract is **strongly associated** with the total revenue accumulated over the customer's lifetime.

| Inference | Month-to-month | One year | Two year |
| :--- | :--- | :--- | :--- |
| **Median Value** | **Lowest** (Median $\approx \$400$) | Mid-Range (Median $\approx \$1,600$) | **Highest** (Median $\approx \$3,000$) |
| **Customer Value** | **Low-Value Segment.** Most customers here accumulate very few charges before ending their contract (high churn risk). | **Mid-Value Segment.** Shows a wider range of charges, indicating moderate loyalty. | **High-Value Segment.** Represents the company's most loyal and revenue-producing customer base. |
| **Spread (IQR)** | **Most Concentrated.** The box (Interquartile Range) is very short, meaning most customers are clustered at the low end. | Wide IQR, indicating high variability in total charges. | **Widest Spread.** The largest IQR shows the highest overall range in customer value, reflecting longer tenure. |
| **Outliers** | Shows many high outliers, meaning the few long-term customers who stay month-to-month are **exceptions** to the rule. | Very few high outliers, as most high-value customers are captured within the main box. | |

***

### Conclusion:

Contract length acts as a strong measure of **customer loyalty and retention**. Shifting customers from 'Month-to-month' plans to 'One year' or 'Two year' plans is the most effective strategy to significantly increase customer lifetime value.

**TASK: Create a bar plot showing the correlation of the following features to the class label. Keep in mind, for the categorical features, you will need to convert them into dummy variables first, as you can only calculate correlation for numeric features.**

    ['gender', 'SeniorCitizen', 'Partner', 'Dependents','PhoneService', 'MultipleLines',
     'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport', 'InternetService',
       'StreamingTV', 'StreamingMovies', 'Contract', 'PaperlessBilling', 'PaymentMethod']

***Note: we specifically listed only the features above. You should not check the correlation for every feature, as some features, such as customerID, have too many unique values for such an analysis.***

In [None]:
corr_df  = pd.get_dummies(df[['gender', 'SeniorCitizen', 'Partner', 'Dependents','PhoneService', 'MultipleLines', 'InternetService',
       'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport','StreamingTV', 'StreamingMovies', 'Contract', 'PaperlessBilling',
       'PaymentMethod','Churn']]).corr()

In [None]:
corr_df['Churn_Yes'].sort_values().iloc[1:-1]

In [None]:
plt.figure(figsize=(10,4),dpi=200)
sns.barplot(x=corr_df['Churn_Yes'].sort_values().iloc[1:-1].index,y=corr_df['Churn_Yes'].sort_values().iloc[1:-1].values)
plt.title("Feature Correlation to Yes Churn")
plt.xticks(rotation=90);
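
The `.iloc[1:-1]` slice works because, after sorting, `Churn_No` (correlation -1) sits first and `Churn_Yes` (correlation 1) sits last. A slightly more explicit variant drops those two columns by name. The sketch below uses a tiny made-up frame (`toy` is hypothetical) so the result is easy to verify by eye.

```python
import pandas as pd

# Hypothetical data in which contract type perfectly separates churn.
toy = pd.DataFrame({
    "Contract": ["Month-to-month", "Month-to-month", "Two year", "Two year"],
    "Churn": ["Yes", "Yes", "No", "No"],
})

# dtype=int gives 0/1 columns instead of booleans.
dummies = pd.get_dummies(toy, dtype=int)

# Correlation of every dummy column with Churn_Yes, minus the label columns.
corr = dummies.corr()["Churn_Yes"].drop(["Churn_Yes", "Churn_No"]).sort_values()
print(corr)
```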

# Churn Analysis

**This section focuses on segmenting customers based on their tenure, creating "cohorts", allowing us to examine differences between customer cohort segments.**

**TASK: What are the 3 contract types available?**

In [None]:
df['Contract'].unique()

**TASK: Create a histogram displaying the distribution of the 'tenure' column, which is the number of months a customer was or has been a customer.**

In [None]:
plt.figure(figsize=(10,4),dpi=200)
sns.histplot(data=df,x='tenure',hue='Churn',bins=60)

**TASK: Now use the seaborn documentation as a guide to create histograms separated by two additional features, Churn and Contract.**

In [None]:
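# displot is a figure-level function and creates its own figure,
# so a preceding plt.figure() call would only produce an empty extra figure.
sns.displot(data=df,x='tenure',bins=70,col='Contract',row='Churn');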

**TASK: Display a scatter plot of Total Charges versus Monthly Charges, and color hue by Churn.**

In [None]:
plt.figure(figsize=(10,4),dpi=200)
sns.scatterplot(data=df,x='MonthlyCharges',y='TotalCharges',hue='Churn', linewidth=0.5,alpha=0.5,palette='Dark2')

### Creating Cohorts based on Tenure

**Let's begin by treating each unique tenure length (1 month, 2 months, 3 months ... N months) as its own cohort.**

**TASK: Treating each unique tenure group as a cohort, calculate the Churn rate (percentage that had Yes Churn) per cohort. For example, the cohort that has had a tenure of 1 month should have a Churn rate of 61.99%. You should have cohorts for 1-72 months, with a general trend that the longer the tenure of the cohort, the lower the churn rate. This makes sense, as you are less likely to stop service the longer you've had it.**

In [None]:
no_churn = df.groupby(['Churn','tenure']).count().transpose()['No']
yes_churn = df.groupby(['Churn','tenure']).count().transpose()['Yes']

In [None]:
no_churn

In [None]:
churn_rate = 100 * yes_churn / (no_churn+yes_churn)

In [None]:
churn_rate.transpose()['customerID']
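
The count-and-transpose approach above works, but the same rate can be computed more directly: the mean of a boolean "churned" flag within each tenure group is exactly the share of churners. A minimal sketch on made-up rows (`toy` is hypothetical, and the rates it prints are not those of the real data):

```python
import pandas as pd

# Hypothetical data: 3 of 4 one-month customers churned, 1 of 2 two-month customers.
toy = pd.DataFrame({
    "tenure": [1, 1, 1, 1, 2, 2],
    "Churn": ["Yes", "Yes", "Yes", "No", "No", "Yes"],
})

# Mean of a True/False flag per group = fraction of True values; times 100 for percent.
churn_rate_by_tenure = toy["Churn"].eq("Yes").groupby(toy["tenure"]).mean() * 100
print(churn_rate_by_tenure)
```

The result is a Series indexed by tenure, so it can be passed straight to a line plot.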

**TASK: Now that you have the Churn Rate per tenure group (1-72 months), create a plot showing churn rate per month of tenure.**

In [None]:
plt.figure(figsize=(10,4),dpi=200)
sns.lineplot(data=churn_rate.iloc[0])
plt.ylabel('Churn Percentage');

## Feature Engineering

### Broader Cohort Groups
**TASK: Based on the tenure column values, create a new column called Tenure Cohort with 4 separate categories:**
   * '0-12 Months'
   * '12-24 Months'
   * '24-48 Months'
   * 'Over 48 Months'

In [None]:
def cohort(tenure):
    if tenure < 13:
        return '0-12 Months'
    elif tenure < 25:
        return '12-24 Months'
    elif tenure < 49:
        return '24-48 Months'
    else:
        return "Over 48 Months"

In [None]:
df['Tenure Cohort'] = df['tenure'].apply(cohort)
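
As an alternative to a hand-written function, `pd.cut` can assign the same four cohorts from a list of bin edges. The sketch below is one possible formulation, checked on a handful of made-up tenure values (`tenure` here is a hypothetical Series, not the real column); the bin edges are chosen to reproduce the boundaries of `cohort()` above.

```python
import numpy as np
import pandas as pd

# Hypothetical tenure values sitting on each side of every boundary.
tenure = pd.Series([0, 1, 12, 13, 24, 25, 48, 49, 72])

# Right-closed bins: [0, 12], (12, 24], (24, 48], (48, inf).
# include_lowest=True makes the first bin include 0.
cohorts = pd.cut(
    tenure,
    bins=[0, 12, 24, 48, np.inf],
    labels=["0-12 Months", "12-24 Months", "24-48 Months", "Over 48 Months"],
    include_lowest=True,
)
print(cohorts.tolist())
```

Note that `pd.cut` returns an ordered categorical column rather than plain strings, which keeps the cohorts in a sensible order when plotting.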

**Task**
- Create 'Family' column
- Logic: Partner (1/0) + Dependents (1/0)
- Result: 0 = Single, 1 = Has Partner OR Child, 2 = Has Both

In [None]:
df['Family'] = (df['Partner'] == 'Yes').astype(int) + (df['Dependents'] == 'Yes').astype(int)

**Task**
- Create 'ServiceCount' column (How many products or services do they buy?)

In [None]:
services = ['OnlineSecurity', 'OnlineBackup', 'DeviceProtection',
            'TechSupport', 'StreamingTV', 'StreamingMovies']

df['ServiceCount'] = df[services].apply(lambda x: (x == 'Yes').sum(), axis=1)
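
The row-wise `apply` above can also be written as a vectorized comparison, which tends to be faster on larger frames and gives the same counts. A small sketch on two made-up customers (`toy` is hypothetical); note that values such as 'No internet service' are simply not counted, since only an exact 'Yes' matches.

```python
import pandas as pd

services = ["OnlineSecurity", "OnlineBackup", "DeviceProtection",
            "TechSupport", "StreamingTV", "StreamingMovies"]

# Two hypothetical customers: one with three add-ons, one with none.
toy = pd.DataFrame(
    [["Yes", "No", "Yes", "No internet service", "No", "Yes"],
     ["No"] * 6],
    columns=services,
)

# eq('Yes') gives a True/False frame; summing across columns counts the True values.
service_count = toy[services].eq("Yes").sum(axis=1)
print(service_count.tolist())
```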

**TASK: Create a scatterplot of Total Charges versus Monthly Charges, colored by the Tenure Cohort defined in the previous task.**

In [None]:
plt.figure(figsize=(10,4),dpi=200)
sns.scatterplot(data=df,x='MonthlyCharges',y='TotalCharges',hue='Tenure Cohort',alpha=0.5,palette='Dark2')

**TASK: Create a count plot showing the churn count per cohort.**

In [None]:
plt.figure(figsize=(10,4),dpi=200)
sns.countplot(data=df,x='Tenure Cohort',hue='Churn')

In [None]:
df.columns

In [None]:
df['Churn'] = df['Churn'].apply(lambda x: 1 if x == 'Yes' else 0)

# Part 4: Predictive Modeling

**Let's explore different classification-based methods, including a single Decision Tree, Random Forest, AdaBoost, and Gradient Boosting. Feel free to add any other supervised learning models to your comparisons!**


In [None]:
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

In [None]:
X = df.drop(['customerID','Churn'],axis=1)
y = df['Churn']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
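
Because the churn classes are imbalanced, a purely random split can leave the test set with a somewhat different churn share than the training set. Passing `stratify=` to `train_test_split` keeps the class proportions equal in both parts. The sketch below demonstrates this on made-up arrays (`X_toy` and `y_toy` are hypothetical, with an assumed 80/20 class mix); whether to stratify the real split is a design choice, not something the notebook prescribes.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical labels: 80 non-churners (0) and 20 churners (1).
y_toy = np.array([0] * 80 + [1] * 20)
X_toy = np.arange(100).reshape(-1, 1)

# stratify=y_toy preserves the 80/20 class mix in both train and test.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

print(y_tr.mean(), y_te.mean())
```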

In [None]:
numeric_cols = X.select_dtypes(include=['int64', 'float64']).columns
categorical_cols = X.select_dtypes(include=['object', 'category']).columns

In [None]:
preprocessor = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), numeric_cols),
        ('cat', OneHotEncoder(sparse_output=False), categorical_cols)
    ]
)

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, GradientBoostingClassifier
from xgboost import XGBClassifier
from sklearn.metrics import recall_score, precision_score

model_params = {
    'Logistic Regression': {
        'model': LogisticRegression(random_state=42, class_weight='balanced', solver='liblinear'),
        'params': {
            # C: Inverse of regularization strength.
            # Smaller values = stronger regularization (simpler model). Larger values = try to fit data perfectly.
            'classifier__C': [0.01, 0.1, 1, 10, 100],

            # Penalty: How we punish complex models.
            # 'l1' (Lasso) = can remove features (set coef to 0). 'l2' (Ridge) = just shrinks coefs.
            'classifier__penalty': ['l1', 'l2']
        }
    },
    'Naive Bayes': {
        'model': GaussianNB(),
        'params': {
            # Var Smoothing: Adds a tiny value to variances to prevent division by zero errors.
            # Helps when a feature has 0 variance in a specific class.
            'classifier__var_smoothing': [1e-9, 1e-8, 1e-7]
        }
    },
    'SVM': {
        'model': SVC(random_state=42, class_weight='balanced', probability=True),
        'params': {
            # C: Penalty for misclassifying a point.
            # High C = Strict (risk of overfitting). Low C = Soft margin (allows some errors).
            'classifier__C': [0.1, 1, 10],

            # Kernel: The math used to project data into higher dimensions.
            'classifier__kernel': ['rbf', 'poly'],

            # Gamma: Defines how far the influence of a single training example reaches.
            # High Gamma = Only close points matter (islands). Low Gamma = Far points matter.
            'classifier__gamma': ['scale', 'auto', 0.1]
        }
    },
    'KNN': {
        'model': KNeighborsClassifier(),
        'params': {
            # N Neighbors: The 'K' in KNN.
            # Low K = sensitive to noise. High K = smoother decision boundary.
            'classifier__n_neighbors': [3, 5, 9, 15],

            # Weights: How much vote does each neighbor get?
            # 'uniform' = all equal. 'distance' = closer neighbors have more influence.
            'classifier__weights': ['uniform', 'distance'],

            # P: The distance metric.
            # 1 = Manhattan (City block distance). 2 = Euclidean (Straight line).
            'classifier__p': [1, 2]
        }
    },
    'Decision Tree': {
        'model': DecisionTreeClassifier(random_state=42, class_weight='balanced'),
        'params': {
            # Max Depth: How deep the tree can grow.
            # None = unlimited (overfitting risk). Numbers restrict height.
            'classifier__max_depth': [5, 10, 20, None],

            # Min Samples Split: How many samples needed to justify splitting a node.
            # Higher = prevents creating tiny, specific branches.
            'classifier__min_samples_split': [2, 10, 20],

            # Min Samples Leaf: Minimum samples required at a leaf node (end point).
            # Crucial for smoothing the model.
            'classifier__min_samples_leaf': [1, 5, 10]
        }
    },
    'Random Forest': {
        'model': RandomForestClassifier(random_state=42, class_weight='balanced'),
        'params': {
            # N Estimators: Number of trees in the forest.
            # More is usually better, but slower.
            'classifier__n_estimators': [100, 200],

            # Max Features: How many features to look at when splitting.
            # 'sqrt' is standard. 'log2' looks at fewer features (more randomness).
            'classifier__max_features': ['sqrt', 'log2'],

            # (Same as Decision Tree above)
            'classifier__max_depth': [10, 20, None],
            'classifier__min_samples_leaf': [1, 4]
        }
    },
    'AdaBoost': {
        'model': AdaBoostClassifier(random_state=42),
        'params': {
            # Learning Rate: How much each tree contributes to the final answer.
            # Low rate = need more trees, but usually better accuracy.
            'classifier__learning_rate': [0.01, 0.1, 1.0],
            'classifier__n_estimators': [50, 100, 200]
        }
    },
    'Gradient Boosting': {
        'model': GradientBoostingClassifier(random_state=42),
        'params': {
            # Subsample: Fraction of samples used for fitting the individual base learners.
            # < 1.0 results in Stochastic Gradient Boosting (reduces variance).
            'classifier__subsample': [0.8, 1.0],

            'classifier__n_estimators': [100, 200],
            'classifier__learning_rate': [0.01, 0.1, 0.2],
            'classifier__max_depth': [3, 5, 8]
        }
    },
    'XGBoost': {
        'model': XGBClassifier(random_state=42, scale_pos_weight=3),
        'params': {
            # Gamma: Minimum loss reduction required to make a further partition.
            # Acts as a regularization parameter (higher = more conservative).
            'classifier__gamma': [0, 0.1, 0.2],

            # Colsample Bytree: Subsample ratio of columns when constructing each tree.
            # Similar to 'max_features' in Random Forest.
            'classifier__colsample_bytree': [0.7, 1.0],

            'classifier__n_estimators': [100, 200],
            'classifier__learning_rate': [0.01, 0.1, 0.2],
            'classifier__max_depth': [3, 6, 10]
        }
    }
}

scores = []

for model_name, mp in model_params.items():
    print(f"  > Tuning {model_name}...")

    pipe = Pipeline(steps=[('preprocessor', preprocessor), ('classifier', mp['model'])])

    clf = GridSearchCV(pipe, mp['params'], cv=3, scoring='recall', n_jobs=-1)
    clf.fit(X_train, y_train)

    y_pred = clf.predict(X_test)
    # Compute each metric once and reuse it.
    recall = recall_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred)
    # F1 is the harmonic mean of precision and recall; guard against 0/0.
    f1 = 0.0 if (recall + precision) == 0 else 2 * recall * precision / (recall + precision)
    scores.append({
        'Model': model_name,
        'Recall': recall,
        'Precision': precision,
        'F1-Score': f1,
        'Best Params': str(clf.best_params_)
    })
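
For reference, here is a small worked example of the three metrics collected in the loop, using made-up labels (`y_true` and `y_hat` are hypothetical). With 3 true positives, 1 false negative and 2 false positives, recall is 3/4, precision is 3/5, and F1 is their harmonic mean. The sketch also checks the manual F1 formula against scikit-learn's `f1_score`.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical labels: 4 actual churners (1) and 6 non-churners (0).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
# Predictions: 3 churners caught, 1 missed, 2 false alarms.
y_hat = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

recall = recall_score(y_true, y_hat)        # TP / (TP + FN) = 3 / 4
precision = precision_score(y_true, y_hat)  # TP / (TP + FP) = 3 / 5

# Harmonic mean of precision and recall.
f1_manual = 2 * recall * precision / (recall + precision)
f1_sklearn = f1_score(y_true, y_hat)

print(recall, precision, round(f1_manual, 4))
```

Optimizing for recall, as the grid search does, favors catching churners even at the cost of more false alarms, which is why precision is reported alongside it.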


In [None]:
final_df = pd.DataFrame(scores).sort_values(by='Recall', ascending=False)

print(final_df[['Model', 'Recall', 'Precision', 'F1-Score']])

plt.figure(figsize=(14, 8))

melted_df = final_df.melt(id_vars='Model', value_vars=['Recall', 'Precision'], var_name='Metric', value_name='Score')

sns.barplot(x='Score', y='Model', hue='Metric', data=melted_df, palette='viridis')
plt.title('The Ultimate Model Showdown: Precision vs. Recall')
plt.axvline(0.80, color='red', linestyle='--', label='Target Recall (80%)')
plt.legend()
plt.show()