# StreamFlex customer churn predictions using Decision Trees and Random Forests

- **Demographics**: Age, Location, Subscription Length
- **Usage Behavior**: Watch Time, Number of Logins, Preferred Content Type
- **Subscription Details**: Membership Type (Basic, Standard, Premium), Payment Method, Payment Issues
- **Customer Support Interactions**: Number of Complaints, Resolution Time

The goal is to analyze and model customer churn using Decision Trees and Random Forests,
and to evaluate model performance using appropriate classification metrics.

## Dataset columns
```CustomerID,Age,Subscription_Length_Months,Watch_Time_Hours,Number_of_Logins,Preferred_Content_Type,Membership_Type,Payment_Method,Payment_Issues,Number_of_Complaints,Resolution_Time_Days,Churn```


## Section 1: Application of Decision Trees in Business

```
Why are decision trees useful in customer churn prediction?

    Interpretability – Decision trees provide a clear, visual representation of how different factors contribute to customer churn.
    Handling Non-Linearity – They can capture complex relationships between variables without assuming a linear relationship.
    Feature Importance – Decision trees highlight the most influential factors driving customer churn.
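    Handling Missing Data – Some decision tree implementations can handle incomplete data directly (for example through surrogate splits); others need missing values imputed first, as done in Task 1 below.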
    Decision-Making Support – They offer actionable insights, helping businesses prioritize interventions.

What business actions can be taken based on decision tree predictions?

    Targeted Retention Campaigns: If high "Resolution_Time_Days" correlates with churn, customer support efficiency should improve.
    Personalized Offers: If "Subscription_Length_Months" impacts churn, offering discounts for longer commitments may help.
    Content Strategy Adjustments: If "Preferred_Content_Type" affects churn, optimize content offerings accordingly.
    Billing Optimization: If "Payment_Issues" are a key factor, smoother billing processes or alternative payment methods should be implemented.
```
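
To make the interpretability and feature-importance points concrete, the sketch below shows how a tree can score a candidate split with Gini impurity (the default criterion of scikit-learn's `DecisionTreeClassifier`). The toy `resolution_days` and `churn` values are made up for illustration and are not taken from the StreamFlex dataset; the helper names `gini` and `split_impurity` are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(values, labels, threshold):
    """Weighted Gini impurity after splitting on `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


# Made-up toy data: resolution time in days and churn label (1 = churned)
resolution_days = [1, 2, 2, 3, 6, 7, 8, 9]
churn = [0, 0, 0, 0, 1, 1, 0, 1]

parent = gini(churn)

# Candidate thresholds: midpoints between consecutive distinct feature values
distinct = sorted(set(resolution_days))
candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]

# The tree keeps the threshold that yields the lowest weighted impurity
best = min(candidates, key=lambda t: split_impurity(resolution_days, churn, t))
best_impurity = split_impurity(resolution_days, churn, best)

print(f"parent impurity: {parent:.4f}")
print(f"best threshold: {best}, impurity after split: {best_impurity:.4f}")
```

A split rule such as "Resolution_Time_Days <= 4.5" is what a stakeholder reads directly off the plotted tree, which is where the interpretability benefit comes from.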

## Section 2: Python Implementation - Building the Model

### Task 1: Data Preparation and Exploration

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Load dataset
df = pd.read_csv("customer_churn.csv")

# Basic info (df.info() prints its summary itself and returns None)
df.info()
print(df.describe())

# Check for missing values
print(df.isnull().sum())

# Handle missing values: median for numeric columns, then mode for the remaining columns
df = df.fillna(df.median(numeric_only=True))
df = df.fillna(df.mode().iloc[0])

# Convert categorical variables to integer codes using Label Encoding,
# since scikit-learn's tree-based models expect numeric inputs
label_encoders = {}
categorical_cols = ["Preferred_Content_Type", "Membership_Type", "Payment_Method"]

for col in categorical_cols:
    le = LabelEncoder()
    df[col] = le.fit_transform(df[col])
    label_encoders[col] = le  # kept so codes can be mapped back to labels later

# Correlation heatmap (numeric columns only; CustomerID is an identifier, not a feature)
plt.figure(figsize=(10, 6))
sns.heatmap(df.drop(columns=["CustomerID"]).corr(numeric_only=True), annot=True, cmap="coolwarm")
plt.title("Feature Correlation Heatmap")
plt.show()

# Visualize churn distribution to check for class imbalance
sns.countplot(x="Churn", data=df)
plt.title("Churn Distribution")
plt.show()
```
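
One caveat on the encoding step: `LabelEncoder` assigns codes in sorted (alphabetical) class order, which does not match the natural tier order Basic < Standard < Premium. The sketch below contrasts the two encodings in plain Python; the sample values and the `membership_order` mapping are assumptions for illustration, not part of the original notebook.

```python
# Hypothetical sample of raw Membership_Type values
memberships = ["Standard", "Basic", "Premium", "Basic", "Premium"]

# Alphabetical label encoding: Basic=0, Premium=1, Standard=2
alphabetical = {label: code for code, label in enumerate(sorted(set(memberships)))}

# Explicit ordinal mapping that preserves the tier order
membership_order = {"Basic": 0, "Standard": 1, "Premium": 2}

alphabetical_codes = [alphabetical[m] for m in memberships]
ordinal_codes = [membership_order[m] for m in memberships]

print("alphabetical:", alphabetical_codes)
print("ordinal:     ", ordinal_codes)
```

With pandas, the same mapping could be applied to the column via `df["Membership_Type"].map(membership_order)` in place of the label encoder for that one column. Whether this changes the results here is untested; it mainly keeps threshold splits on the tier meaningful.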


### Task 2: Building a Decision Tree Classifier

```python
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix
from sklearn.model_selection import GridSearchCV

# Split features and target; CustomerID is an identifier and is dropped
X = df.drop(columns=["CustomerID", "Churn"])
y = df["Churn"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Decision Tree model and the hyperparameter grid to search
dt = DecisionTreeClassifier(random_state=42)
params = {"max_depth": [3, 5, 10], "min_samples_split": [2, 5, 10]}

# Grid search with 5-fold cross-validation for hyperparameter tuning
grid_search = GridSearchCV(dt, param_grid=params, cv=5, scoring="accuracy")
grid_search.fit(X_train, y_train)

# Best model: GridSearchCV refits the best parameter combination on the full training set.
# Trees are grown by greedy recursive splitting; no gradients or weights are involved.
best_dt = grid_search.best_estimator_
print("Best parameters:", grid_search.best_params_)

# Predict class labels on the held-out test set
y_pred = best_dt.predict(X_test)

# Evaluation metrics
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall:", recall_score(y_test, y_pred))
print("F1 Score:", f1_score(y_test, y_pred))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))

# Visualize the fitted tree: each node shows its split rule, impurity, sample counts, and majority class.
# class_names assumes Churn is coded 0 = no churn, 1 = churn.
plt.figure(figsize=(15, 8))
plot_tree(best_dt, feature_names=list(X.columns), class_names=["No Churn", "Churn"], filled=True)
plt.show()
```
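
To show how the printed metrics relate to the confusion matrix, here is a minimal sketch that recomputes them by hand. It assumes the scikit-learn layout for a binary problem, `[[TN, FP], [FN, TP]]` (rows are true labels, columns are predictions). The counts in `example_cm` are invented for illustration and are not results from this dataset; `metrics_from_confusion` is a hypothetical helper.

```python
def metrics_from_confusion(cm):
    """Compute accuracy, precision, recall and F1 from a 2x2 confusion matrix.

    Expects the layout [[TN, FP], [FN, TP]].
    """
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    # Precision: of the customers flagged as churners, how many really churned
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the customers who really churned, how many were flagged
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Invented counts for illustration only
example_cm = [[70, 10], [8, 12]]
example_metrics = metrics_from_confusion(example_cm)

for name, value in example_metrics.items():
    print(f"{name}: {value:.3f}")
```

In this invented example accuracy looks healthy while recall is only 0.6, which is why accuracy alone can be misleading when churners are the minority class.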


### Task 3: Improving Performance and Reducing Overfitting with Random Forests

```python
from sklearn.ensemble import RandomForestClassifier

# Train Random Forest (an ensemble of 100 trees, each fit on a bootstrap sample)
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

# Predictions
y_pred_rf = rf.predict(X_test)

# Evaluate Random Forest
print("Random Forest Accuracy:", accuracy_score(y_test, y_pred_rf))
print("Random Forest Precision:", precision_score(y_test, y_pred_rf))
print("Random Forest Recall:", recall_score(y_test, y_pred_rf))
print("Random Forest F1 Score:", f1_score(y_test, y_pred_rf))
print("Random Forest Confusion Matrix:\n", confusion_matrix(y_test, y_pred_rf))

# Feature importance (impurity-based), top 10 features
feature_importances = pd.Series(rf.feature_importances_, index=X.columns)
feature_importances.nlargest(10).plot(kind="barh")
plt.title("Feature Importances in Random Forest")
plt.show()
```
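
A single 80/20 split can make the comparison between the two models noisy. One way to check whether the forest really generalizes better is to compare cross-validated scores, sketched below. The data comes from `make_classification` as a synthetic stand-in (an assumption, not the StreamFlex dataset), and the `max_depth=5` tree is just one plausible configuration rather than the tuned model from Task 2.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the churn features and labels (not the real dataset)
X_demo, y_demo = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=42
)

models = {
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=42),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# 5-fold cross-validated F1 for each model
cv_scores = {}
for name, model in models.items():
    scores = cross_val_score(model, X_demo, y_demo, cv=5, scoring="f1")
    cv_scores[name] = scores
    print(f"{name}: mean F1 = {scores.mean():.3f} (std {scores.std():.3f})")
```

On the real data, `X_demo` and `y_demo` would be replaced by `X_train` and `y_train`, keeping the test set untouched for the final evaluation.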


### Task 4: Business Insights and Recommendations

```
Key Factors Contributing to Customer Churn

    Resolution Time: If delays in issue resolution lead to churn, improving customer service response times is critical.
    Subscription Length: Shorter subscriptions may correlate with higher churn rates, suggesting incentives for long-term plans.
    Payment Issues: Frequent payment failures increase churn risk, necessitating a smoother payment experience.
    Watch Time & Logins: Low engagement could indicate dissatisfaction; personalized recommendations may improve retention.

Three Concrete Business Strategies

    Loyalty & Subscription Incentives
        Offer discounts on annual plans to increase retention.
        Implement a rewards system for continued engagement.

    Customer Support Optimization
        Reduce issue resolution time through AI chatbots or 24/7 support.
        Provide self-service options to resolve common issues.

    Personalized Content & Engagement
        Use AI-driven recommendations to improve watch time.
        Send targeted notifications for user-preferred content.
```
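
To act on these strategies, predictions need to be turned into a shortlist of customers to contact. The sketch below ranks customers by predicted churn probability and keeps the riskiest ones. The function `top_risk_customers`, the customer IDs, the probabilities, and the 0.5 cutoff are all hypothetical; in the workflow above the probabilities would come from something like `rf.predict_proba(X_test)[:, 1]`.

```python
def top_risk_customers(customer_ids, churn_probabilities, k=3, min_probability=0.5):
    """Return up to k (customer_id, probability) pairs, highest churn risk first.

    Customers below `min_probability` are excluded so that retention budget
    is not spent on customers who are unlikely to leave.
    """
    ranked = sorted(
        zip(customer_ids, churn_probabilities),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return [(cid, p) for cid, p in ranked if p >= min_probability][:k]


# Invented IDs and probabilities for illustration only
ids = ["C001", "C002", "C003", "C004", "C005"]
probs = [0.12, 0.81, 0.47, 0.66, 0.93]

shortlist = top_risk_customers(ids, probs, k=2)
print(shortlist)
```

The cutoff and `k` are business decisions: a lower cutoff catches more potential churners at the cost of offering incentives to customers who would have stayed anyway.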