# *KEEPING CUSTOMERS CONNECTED - AND NOT DISCONNECTED!* 
 ## THE SYRIATEL ANALYSIS

# 1. BUSINESS UNDERSTANDING

## **1.1 BUSINESS OVERVIEW**

According to this [article](https://www.sciencedirect.com/topics/social-sciences/telecommunications-industry) published in 2011, a telecommunications company is an organization that provides services for long-distance communication. These companies build and maintain the physical networks, such as cell towers, that transmit signals to individuals and businesses. They facilitate essential services like internet access, phone calls and messaging, and they make money through customer subscriptions and usage fees for these services. SyriaTel is a telecom company that provides call, text and data services to customers.
One advantage of working within the telecommunications sector is that it is a high-performing sector that contributes to economic growth, potentially increasing returns for investors. Telecommunication is also an essential service with steady demand, making it a stable and valuable industry to be part of.
However, the telecom industry is highly competitive and customers can easily switch to other providers if they're dissatisfied. This creates a high risk of customer churn, which can reduce revenue and undermine investor confidence if not properly managed.

<img src="telecomm.webp" alt="Churn Heatmap" width="600">





## **1.2 PROBLEM STATEMENT**
SyriaTel is losing customers to competitors. By analysing customer data, we can predict churn and uncover the reasons why customers leave, so that SyriaTel can take action to reduce churn and improve customer retention.

Churn is costly because:

*Revenue loss:* Each customer lost means recurring revenue lost.

*High acquisition cost:* It is more expensive to acquire a new customer than to retain an existing one.

*Competitive pressure:* In a competitive market, reducing churn is critical for survival and growth.

If we can predict which customers are likely to leave, SyriaTel can take action early, e.g. by giving offers, improving services, or solving problems, to make those customers stay.

So the goal is to reduce churn and keep loyal customers.


## **1.3 BUSINESS OBJECTIVES**

 ## 1.3.1 *Main objective:*
To predict customer churn and provide insights that help SyriaTel keep its customers and reduce revenue loss.

 ## 1.3.2 *Specific objectives:*

1. To develop a model that predicts whether a customer will churn or stay.
2. To identify the key factors, e.g. call charges, service quality or customer complaints, that influence the probability that a customer churns.
3. To provide insights that SyriaTel can use to design strategies for reducing churn and improving customer satisfaction.
4. To determine the state with the highest churning rate.



 ## 1.3.3 *Research questions:*
1. Can we accurately predict which SyriaTel customers are likely to churn using their demographic and usage data?
2. What are the main factors that influence customer churn?
3. How can SyriaTel use the model's prediction and insights to design strategies that reduce churn and retain more customers?
4. What is the state with the highest churning rate?

## **1.4 SUCCESS CRITERIA**
 ***Model performance***

The churn prediction model achieves a good level of accuracy and balances correctly identifying customers who churn and those who stay.

 ***Insights gained***

The analysis clearly identifies the key factors that contribute to churn, e.g. high call charges and frequent complaints.

 ***Business value***

SyriaTel can use the model's results to take practical actions, such as designing loyalty offers or improving customer service, which can help reduce customer churn.


# 2. DATA UNDERSTANDING
The SyriaTel customer churn dataset we are working with is from [Kaggle](https://www.kaggle.com/becksddf/churn-in-telecoms-dataset). SyriaTel is a telecommunications company, and the raw dataset has a total of 21 columns and 3333 rows. After data cleaning we decided to work with the columns below, where `churn` is our dependent variable.
    
`state` – U.S. state where the customer lives.
    
`account length` – Number of days the customer has had the account.
    
`area code` – Telephone area code.
    
`phone number` – Customer’s phone number (serves as an identifier, not useful for prediction).
    
`international plan` – Whether the customer has an international calling plan (yes/no).
    
`log_vmail_messages` – Log-transformed (`log1p`) number of voicemail messages the customer has.
    
`customer service calls` – Number of calls made to customer service.
    
**`churn`** – Whether the customer left the company (True = churned, False = stayed). **This is our dependent variable.**
    
`total_calls` - The total number of calls.
    
`total_minutes` - The total number of minutes for all calls.
    
`total_charge` - The total charges for all calls.

     
We are merging the columns `total day minutes`, `total eve minutes` and `total night minutes` into one column named `total_minutes`. We are also merging `total day calls`, `total eve calls` and `total night calls` into one column named `total_calls`. The columns `total day charge`, `total eve charge` and `total night charge` are also being merged into one column called `total_charge`.



# 3. DATA EXPLORATION

## 3.1 Loading a dataset

In [None]:
import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt 
%matplotlib inline 
import seaborn as sns 

In [None]:
df = pd.read_csv("Syria_Tel.csv")
df.head()

In [None]:
df.shape

Our dataset has *3333* rows and *21* columns.

In [None]:
df.info()

## 3.2 Data cleaning

Let's check for any missing values in our dataset.

In [None]:
df.isnull().sum()

Since our dataset doesn't have any missing values, we don't need to drop or fill in anything.

Let's drop the `phone number` column since it is not useful in our prediction.

In [None]:
#dropping a column
df = df.drop(columns =['phone number'])


In [None]:
df.shape

We need to check for categorical data in our dataset so that we can perform **one-hot encoding**, which is an important step before we can build machine learning models and make predictions.

In [None]:
#checking for categorical values
categorical_cols = df.select_dtypes(include=["object"]).columns
print(categorical_cols)


`state`, `international plan` and `voice mail plan` are the categorical columns that need to be encoded.

In [None]:
df["international plan"].value_counts()

In [None]:
df["voice mail plan"].value_counts()

In [None]:
# One-hot encode both binary categorical columns
df_encoded = pd.get_dummies(df,columns=["international plan", "voice mail plan"],drop_first=True,dtype=int)
df_encoded.head()



One-hot encoding `state` would add roughly 50 dummy columns, which would make the model harder to interpret and could introduce **multicollinearity**, so we took a different approach for this column.

In [None]:
df_encoded["state"].value_counts().head(7)

In [None]:
df_encoded["churn"].value_counts()

For uniformity, let's change the contents of this column to 0 and 1 to match the newly encoded columns.

In [None]:
#convert entries
df_encoded["churn"] = df_encoded["churn"].map({True: 1, False: 0})


In [None]:
df_encoded["churn"].unique()

Let's check for class imbalance in our dependent variable `churn`

In [None]:
df_encoded["churn"].value_counts(normalize = True)

There is a clear class imbalance in this column: **85.5%** of customers belong to class 0 (stay) while only **14.49%** belong to class 1 (churn). A model trained on this data could look accurate simply by favouring the majority class, so accuracy alone would misrepresent how well it predicts churn.
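
To make the effect of the imbalance concrete, consider a trivial classifier that predicts "stay" for every customer. The sketch below works through the arithmetic using the row count and churn share quoted above; the churner count is derived by rounding, so treat it as approximate.

```python
# Majority-class baseline: predict "stay" (0) for every customer.
n_rows = 3333          # rows in the dataset, as reported above
churn_share = 0.1449   # share of churners, as reported above

n_churn = round(n_rows * churn_share)   # derived by rounding, approximate
n_stay = n_rows - n_churn

# Every "stay" customer is classified correctly, every churner is missed.
accuracy = n_stay / n_rows
recall = 0 / n_churn   # no churner is ever flagged

print(f"churners: {n_churn}, stayers: {n_stay}")
print(f"baseline accuracy: {accuracy:.3f}, baseline recall: {recall:.1f}")
```

Such a baseline scores well on accuracy while catching no churners at all, which is why accuracy should not be read on its own for this target.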

Let's check for feature distributions and decide whether to perform log transformation or other normalizations before modelling.

In [None]:
df_encoded.describe()

Most columns appear to be evenly distributed, with little to no skewness. But let's take a look at `number vmail messages`, where most customers have 0 messages but some have up to 51 messages.

In [None]:
sns.histplot(df_encoded["number vmail messages"], kde=True, bins=30)
plt.title("Distribution of Number of Voicemail Messages")
plt.show()

In [None]:
df_encoded["log_vmail_messages"] = np.log1p(df_encoded["number vmail messages"])

In [None]:
sns.histplot(df_encoded["log_vmail_messages"], kde=True, bins=30)
plt.title("Log-Transformed Distribution")
plt.show()
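
The transform used above is `log1p`, i.e. log(1 + x), which stays defined for the many customers with 0 messages. The short sketch below uses illustrative counts (51 is the maximum mentioned earlier) and only the standard library; it also shows that `expm1` undoes the transform.

```python
import math

counts = [0, 1, 10, 51]  # illustrative voicemail counts, not rows from the dataset

for c in counts:
    transformed = math.log1p(c)          # log(1 + c), defined at c = 0
    recovered = math.expm1(transformed)  # inverse transform
    print(f"{c:>2} -> {transformed:.3f} -> {recovered:.1f}")
```

The zero count stays at zero while large counts are compressed, which is the behaviour the second histogram is meant to show.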

## 3.3 Feature engineering

Let's check how the columns correlate with each other before deciding which features to use and which ones to engineer.

In [None]:
#let's check for multicollinearity
corr = df_encoded.corr(numeric_only = True)

In [None]:
plt.figure(figsize=(12,8))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="RdBu_r", center=0, cbar=True)
plt.title("Correlation Heatmap of All Numeric Features",fontsize = 14)
plt.show()


Some of these features are highly correlated and give the same insights. For example, `total day minutes`, `total eve minutes` and `total night minutes` can be summed to give `total_minutes`, the minutes used over a 24-hour period. The same applies to `total_calls` and `total_charge`.

In [None]:
df_encoded["total_minutes"] = df_encoded["total day minutes"] + df_encoded["total eve minutes"] + df_encoded["total night minutes"]
df_encoded["total_calls"] = df_encoded["total day calls"] + df_encoded["total eve calls"] + df_encoded["total night calls"] 
df_encoded["total_charge"] = df_encoded["total day charge"] + df_encoded["total eve charge"] + df_encoded["total night charge"] 



In [None]:
cols_to_drop =["total day minutes", "total eve minutes", "total night minutes", "total day calls", "total eve calls", "total night calls",
    "total day charge", "total eve charge", "total night charge"]

In [None]:
df_encoded = df_encoded.drop(columns = cols_to_drop)

Since `voice mail plan_yes` and `number vmail messages` carry essentially the same information and have a high correlation of **0.96**, let's drop the column `voice mail plan_yes`, because a customer can't send or receive voicemail without a plan.

In [None]:
df_encoded = df_encoded.drop(columns = ["voice mail plan_yes"])

In [None]:
#cleaned dataframe
df_encoded.head()

# 4. EXPLORATORY DATA ANALYSIS

Let's do a bit of exploratory data analysis before we move on to building our models.

## 4.1 Top 5 and bottom 5 states by churn rate

In [None]:
# Group by state and churn counts
state_churn = df_encoded.groupby(["state", "churn"]).size().unstack(fill_value=0)

# Add churn rate per state
state_churn["churn_rate"] = state_churn[1] / (state_churn[0] + state_churn[1])

# Sort by churn rate (descending) and keep the 5 highest
highest_churn = state_churn.sort_values(by="churn_rate", ascending=False).head(5)

# Sort by churn rate (ascending) and keep the 5 lowest
least_churn = state_churn.sort_values(by="churn_rate", ascending=True).head(5)

print("Top 5 states with the highest churn rate:")
print(highest_churn)  # show the 5 states with the highest churn rate
print("Bottom 5 states with the lowest churn rate:")
print(least_churn)
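
As a cross-check on the table above: when `churn` is stored as 0/1, the per-state churn rate is simply the group mean. The sketch below uses toy rows built from two of the state counts quoted later in this section (NJ and HI); the `toy` frame and the `rate` name are illustrative only.

```python
import pandas as pd

# Toy data built from two of the state counts reported in this section.
toy = pd.DataFrame({
    "state": ["NJ"] * 68 + ["HI"] * 53,
    "churn": [1] * 18 + [0] * 50 + [1] * 3 + [0] * 50,
})

# The mean of a 0/1 column is the share of 1s, i.e. the churn rate.
rate = toy.groupby("state")["churn"].mean()

print(rate.nlargest(1))   # state with the highest churn rate
print(rate.nsmallest(1))  # state with the lowest churn rate
```

On the full dataset the same `groupby(...).mean()` call should reproduce the `churn_rate` column computed above, which makes it a cheap consistency check.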


In [None]:
plt.figure(figsize=(10,6))
highest_churn["churn_rate"].plot(kind="barh", color="cyan")
plt.title("Top 5 States by Churn Rate")
plt.xlabel("Churn Rate")
plt.ylabel("State")
plt.xticks(rotation=45)
plt.show()


**New Jersey (NJ)**: Out of 68 customers, 18 churned, giving a 26.5% churn rate.

**California (CA)**: Out of 34 customers, 9 churned, giving a 26.5% churn rate.

**Texas (TX)**: Out of 72 customers, 18 churned, a 25% churn rate.

**Maryland (MD)**: Out of 70 customers, 17 churned, a 24.3% churn rate.

**South Carolina (SC)**: Out of 60 customers, 14 churned, a 23.3% churn rate.

This might be due to factors such as strong competition or higher customer expectations in these states.

In [None]:
plt.figure(figsize=(10,6))
least_churn["churn_rate"].plot(kind="barh", color="pink")
plt.title("Bottom 5 States by Churn Rate")
plt.xlabel("Churn Rate")
plt.ylabel("State")
plt.xticks(rotation=45)
plt.show()

**Hawaii (HI)**: Out of 53 customers, only 3 churned, a 5.7% churn rate.

**Alaska (AK)**: Out of 52 customers, 3 churned, a 5.8% churn rate.

**Arizona (AZ)**: Out of 64 customers, 4 churned, a 6.3% churn rate.

**Virginia (VA)**: Out of 77 customers, 5 churned, a 6.5% churn rate.

**Iowa (IA)**: Out of 44 customers, 3 churned, a 6.8% churn rate.

This tells the company where its loyal customers are and where it has a stronger market.

# 5. MODELLING

## 5.1 BASELINE MODEL

### 5.1.1 LOGISTIC REGRESSION

Let's move on to creating models, starting with a logistic regression model, because our problem is a binary classification and we also want to answer the question **what is the probability that a customer churns?** given various features.

Before modelling, we first do **feature selection** and take another look at our target variable `churn`, so that we can create our baseline model.

In [None]:
df_encoded['churn'].value_counts(normalize =True)*100

In [None]:
sns.countplot(x="churn", data=df_encoded,color ="tomato")
plt.show()

As you can see from the above visual, the target variable is highly imbalanced. Class 0 accounts for **85.5%** of customers (stayed) while class 1 accounts for **14.49%** (churned). We must address this during model training to avoid biased predictions.

Since `churn` is encoded as 0/1, we can compute the **Pearson correlation** between churn and the other features before fitting the logistic regression.

In [None]:
# Only keep numeric columns
numeric_df = df_encoded.select_dtypes(include=[np.number])

# Correlation with churn
churn_corr = numeric_df.corr()["churn"].sort_values(ascending=False)
print(churn_corr)


Let's visualize the correlation of each feature with `churn`, our dependent variable.

In [None]:
#convert to Dataframe for heatmap
churn_corr_df = churn_corr.to_frame()
#plot heatmap
plt.figure(figsize=(6,10))
sns.heatmap(churn_corr_df, annot=True, cmap="coolwarm", center=0, cbar=True)
plt.title("Correlation of Features with Churn", fontsize=14)
plt.show()

We will use the features with the strongest positive correlation, because they indicate who is likely to churn, while negatively correlated features indicate who is likely to stay.

In [None]:
#sklearn model import required libraries
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score, confusion_matrix, classification_report,roc_curve

In [None]:
# top 4 features by positive correlation with churn
selected_features =["international plan_yes","total_charge" ,"customer service calls" ,"total_minutes"]           

In [None]:
# independent and dependent variables (reuse the list defined above)
X = df_encoded[selected_features]
y = df_encoded["churn"]

In [None]:
# train and test split; stratify=y ensures the churn ratio is preserved in both train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
log_reg = LogisticRegression(max_iter=1000, class_weight="balanced", random_state=42)
#fit the model
log_reg.fit(X_train, y_train)
#let's predict
y_pred = log_reg.predict(X_test)
y_proba = log_reg.predict_proba(X_test)[:, 1]  # probability of churn (class 1)


## 5.2 LOGISTIC REGRESSION WITH ALL FEATURES

We want to check whether adding more information improves the model compared to the baseline. As features, we use all the predictors available in the dataset, excluding the target variable. We first one-hot encode the multi-category variable `state`.

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score


Let's split and train our model.

In [None]:
# Separate features and target
X = df_encoded.drop("churn", axis=1)
y = df_encoded["churn"]

# Split
X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.2, random_state=42, stratify=y)


We **one-hot encode** the multi-category variable `state` after the split, so that the encoder is fitted on the training data only and no information leaks from the test set.

In [None]:
#encode state 
encoder = OneHotEncoder(handle_unknown="ignore", sparse_output=False)

# Fit on train, transform both train & test
X_train_state = encoder.fit_transform(X_train[["state"]])
X_test_state = encoder.transform(X_test[["state"]])

Now scale the numeric features after encoding `state`.

In [None]:
# all numeric features
numeric_features = ['account length', 'area code', 'number vmail messages',
       'total intl minutes', 'total intl calls', 'total intl charge','log_vmail_messages',
       'customer service calls', 'international plan_yes','total_minutes',
       'total_calls', 'total_charge']

In [None]:
scaler = StandardScaler()
X_train_num = scaler.fit_transform(X_train[numeric_features])
X_test_num = scaler.transform(X_test[numeric_features])

In [None]:
#combine encoded vs numeric 
X_train_final = np.hstack([X_train_state, X_train_num])
X_test_final = np.hstack([X_test_state, X_test_num])

#fit the model
model = LogisticRegression(max_iter=1000, random_state=42,class_weight="balanced")
model.fit(X_train_final, y_train)

y_pred = model.predict(X_test_final)
y_pred_proba = model.predict_proba(X_test_final)[:,1]
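
The manual encode, scale and stack steps above can also be bundled into a single estimator. The following is a minimal sketch, assuming a `ColumnTransformer` inside a `Pipeline` (both are imported above); it runs on synthetic stand-in data, and the names `demo`, `preprocess` and `clf` are hypothetical rather than part of the original notebook.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data (hypothetical values, not the SyriaTel dataset).
rng = np.random.default_rng(42)
n = 400
demo = pd.DataFrame({
    "state": rng.choice(["NJ", "CA", "HI", "AK"], size=n),
    "total_charge": rng.normal(60, 10, size=n),
    "customer service calls": rng.integers(0, 8, size=n),
})
# Make churn loosely depend on service calls so the model has some signal.
noise = rng.normal(0, 1.5, size=n)
demo["churn"] = (demo["customer service calls"] + noise > 4).astype(int)

X_demo = demo.drop(columns="churn")
y_demo = demo["churn"]
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# One-hot encode the categorical column, scale the numeric ones.
preprocess = ColumnTransformer([
    ("state", OneHotEncoder(handle_unknown="ignore"), ["state"]),
    ("num", StandardScaler(), ["total_charge", "customer service calls"]),
])

clf = Pipeline([
    ("preprocess", preprocess),
    ("model", LogisticRegression(max_iter=1000, class_weight="balanced", random_state=42)),
])

# fit() learns encoder categories and scaler statistics from the training rows only.
clf.fit(X_tr, y_tr)
proba_te = clf.predict_proba(X_te)[:, 1]
print(proba_te[:5])
```

Because the preprocessing lives inside the pipeline, the same object could be passed to cross-validation or `GridSearchCV` without re-fitting the encoder or scaler on held-out rows.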

## 5.3 DECISION TREE CLASSIFIER

Let's build another classification model, in this case a DecisionTreeClassifier.

In [None]:
from sklearn.preprocessing import OrdinalEncoder
enc = OrdinalEncoder()
df_encoded["state_encoded"] = enc.fit_transform(df_encoded[["state"]])


In [None]:
df_encoded = df_encoded.drop(["state"],axis=1)
df_encoded.head()

In [None]:
X = df_encoded.drop(columns=["churn"])
y = df_encoded['churn']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)

In [None]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV

# Initialize the model
dt = DecisionTreeClassifier(random_state=42, class_weight="balanced")

In [None]:
# parameter grid
param_grid = {
    "max_depth": [3, 5, 7, 10,],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
    "criterion": ["gini", "entropy"]
}
#GridSearchCV will try different tree depths, splits, and leaf sizes, then pick the best configuration.
grid_search = GridSearchCV(estimator=dt,param_grid=param_grid,cv=5,scoring="recall",n_jobs=-1)

# Run grid search on training set
grid_search.fit(X_train, y_train)

# Best parameters
print("Best Parameters:", grid_search.best_params_)

# Evaluate on test set
y_pred = grid_search.predict(X_test)
y_pred_proba = grid_search.predict_proba(X_test)[:, 1]

## 5.4 RANDOM FOREST MODEL

Next, we use a Random Forest to build another model.

In [None]:
from sklearn.ensemble import RandomForestClassifier

In [None]:
# churn = target variable

X = df_encoded.drop("churn", axis=1)
y = df_encoded["churn"]

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)


In [None]:
rf = RandomForestClassifier(random_state=42, class_weight="balanced")
# hyperparameter grid to search over
param_grid = {
    "n_estimators": [100, 200],
    "max_depth": [5, 10, None],
    "min_samples_split": [2, 5],
    "min_samples_leaf": [1, 2]
}
grid_search = GridSearchCV(
    estimator=rf,
    param_grid=param_grid,
    scoring="recall",   # optimize for recall
    cv=5,
    n_jobs=-1,
    refit=True          # refits the best model on the whole training set
)
#fit grid search
grid_search.fit(X_train, y_train)

best_rf = grid_search.best_estimator_
print("Best Parameters:", grid_search.best_params_)

In [None]:
y_pred = best_rf.predict(X_test)
y_proba = best_rf.predict_proba(X_test)[:, 1]


# 6. EVALUATION

In this section we evaluate our models to determine which performs best at predicting churning customers. We will compare the models and ultimately choose the best performer as our recommended model. Let's begin.

We will use Recall and ROC-AUC as our success metrics:

**Recall** 
*  Recall measures how many actual churners the model correctly identifies.
*  In churn prediction, missing a churner is costly, because it means losing a customer and revenue.
*  By optimizing high recall, we ensure the model captures most at-risk customers, even if it occasionally flags a few non-churners.

**ROC-AUC**
* ROC-AUC measures the model’s ability to discriminate between churners and non-churners across all thresholds.
* ROC-AUC is threshold-independent, so it evaluates the model’s overall ranking ability.
* A high ROC-AUC means the model is reliable in assigning higher churn probabilities to churners than to non-churners, which is critical for making informed business decisions.

Together they align with our business objectives and the problem we are trying to solve.
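
Both metrics can be computed by hand on a small example. The sketch below uses hypothetical labels and scores and plain Python: recall is counted from the thresholded predictions, and ROC-AUC is computed from its ranking interpretation (the share of churner and non-churner pairs in which the churner receives the higher score).

```python
# Toy labels and scores (hypothetical, for illustration only).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
threshold = 0.5

y_hat = [int(s > threshold) for s in scores]

# Recall = true positives / all actual churners.
tp = sum(1 for t, p in zip(y_true, y_hat) if t == 1 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_hat) if t == 1 and p == 0)
recall = tp / (tp + fn)

# ROC-AUC as a ranking probability: the share of (churner, non-churner)
# pairs in which the churner gets the higher score (ties count as half).
pos = [s for t, s in zip(y_true, scores) if t == 1]
neg = [s for t, s in zip(y_true, scores) if t == 0]
wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
auc = wins / (len(pos) * len(neg))

print(f"recall = {recall:.3f}, ROC-AUC = {auc:.3f}")
```

Note that recall changes if `threshold` is moved, while the pairwise AUC depends only on how the scores are ordered.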

## 6.1 Logistic regression baseline model

We start with our baseline logistic regression model with 4 features.

In [None]:
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall:", recall_score(y_test, y_pred))
print("F1 Score:", f1_score(y_test, y_pred))
print("ROC-AUC:", roc_auc_score(y_test, y_proba))

print("\nClassification Report:\n", classification_report(y_test, y_pred))

**Accuracy score:**

We have an accuracy score of 76%, which means about 76% of predictions are correct. But because the dataset is imbalanced, accuracy alone is misleading.

**Recall:**

The model correctly identifies 75% of true churners, meaning it is good at catching churners even though it misses about 25%.

**Precision:**

Out of all the customers predicted as churners, only 35% actually churn, meaning the model produces many false positives: it often predicts churn for customers who stay.

This model is better at finding churners (high recall) than at being precise about them (low precision).
This means:

We will catch most customers who are likely to churn, but also flag many who wouldn’t have churned (false alarms).
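
To put the two reported numbers on one scale, the sketch below derives the F1 score (the harmonic mean of precision and recall) and the expected number of false alarms per 100 flagged customers. The inputs are the rounded figures quoted above, so the outputs are approximate.

```python
precision = 0.35  # reported above for the baseline model
recall = 0.75     # reported above for the baseline model

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# With 35% precision, about 65 of every 100 flagged customers are false alarms.
flagged = 100
false_alarms = round(flagged * (1 - precision))

print(f"F1 = {f1:.3f}, false alarms per {flagged} flagged = {false_alarms}")
```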


Let's visualize our ROC-AUC for more understanding of our model.

In [None]:
# Compute ROC curve
fpr, tpr, thresholds = roc_curve(y_test, y_proba)

# Compute AUC
roc_auc_base = roc_auc_score(y_test, y_proba)

# Plot
plt.figure(figsize=(8,6))
plt.plot(fpr, tpr, label=f"ROC Curve (AUC = {roc_auc_base:.2f})", color="tomato")
plt.plot([0,1], [0,1], linestyle="--", color="black")  # Random baseline
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate (Recall)")
plt.title("ROC-AUC Curve for Churn Prediction Model")
plt.legend(loc="lower right")
plt.show()

Our ROC curve lies well above the diagonal, meaning the model does a good job of distinguishing churners from non-churners. It has an AUC of 0.81, which shows that the model is predictive but has room for improvement.

In [None]:
#confusion matrix
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt="d", cmap="Greens",
            xticklabels=["Not Churn", "Churn"],
            yticklabels=["Not Churn", "Churn"])
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()

Top-left cell: customers correctly predicted as “stay” (true negatives).

Bottom-right cell: customers correctly predicted as “churn” (true positives).

Top-right cell: customers predicted to churn but actually stayed (false positives).

Bottom-left cell: customers predicted to stay but actually churned (false negatives).
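
The cell layout can be checked on a tiny example. scikit-learn's `confusion_matrix` puts actual classes on the rows and predicted classes on the columns, with labels in ascending order, so with 0 = stay and 1 = churn the flattened matrix reads TN, FP, FN, TP. The labels below are hypothetical.

```python
from sklearn.metrics import confusion_matrix

# Toy labels (hypothetical): 0 = stay, 1 = churn.
actual    = [0, 0, 0, 0, 1, 1, 1, 0]
predicted = [0, 0, 1, 0, 1, 0, 1, 0]

cm_demo = confusion_matrix(actual, predicted)  # rows = actual, columns = predicted
tn, fp, fn, tp = cm_demo.ravel()

print(cm_demo)
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")
```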

**6.1.1 Feature importance**

In [None]:
feature_importance = pd.DataFrame({
    "Feature": X_train.columns,
    "Coefficient": log_reg.coef_[0]
}).sort_values(by="Coefficient", ascending=False)

print(feature_importance)

In [None]:
plt.figure(figsize=(8,5))

# Horizontal bar plot
sns.barplot(data =feature_importance,x="Coefficient",y="Feature")
plt.title("Feature Importance from Logistic Regression")
plt.xlabel("Coefficient Value (Impact on Churn)")
plt.ylabel("Features")
plt.show()


**Impact on churn**

Let's look at how the features of our baseline model affect churn.

`international plan_yes` (2.33): Customers with an international plan are much more likely to churn.

`customer service calls` (0.62): The more times a customer calls customer service, the higher the chance they churn.

`total_charge` (0.079): As charges increase, churn likelihood increases slightly; this is a weak effect.

`total_minutes` (-0.0007): Customers who use more minutes are very slightly less likely to churn.
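
Logistic regression coefficients are on the log-odds scale, so exponentiating them gives odds ratios, which are often easier to communicate. The sketch below applies this to the coefficients quoted above. Because the baseline features were not standardized, each ratio refers to a one-unit change in the raw feature (one extra call, one unit of charge, one minute), so the sizes are not directly comparable across features.

```python
import math

# Coefficients as reported above for the baseline model.
coefficients = {
    "international plan_yes": 2.33,
    "customer service calls": 0.62,
    "total_charge": 0.079,
    "total_minutes": -0.0007,
}

# exp(coefficient) is the multiplicative change in the odds of churn
# for a one-unit increase in the feature, holding the others fixed.
odds_ratios = {name: math.exp(b) for name, b in coefficients.items()}

for name, ratio in odds_ratios.items():
    print(f"{name:<24} odds ratio = {ratio:.3f}")
```

A ratio above 1 raises the odds of churn and a ratio below 1 lowers them; a ratio close to 1 indicates little change per unit.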

## 6.2 Logistic regression model with all features

Let's first start by checking for **overfitting** in our model since we have used all features.

In [None]:
# Train predictions
y_train_pred = model.predict(X_train_final)
y_train_prob = model.predict_proba(X_train_final)[:,1]

# Test predictions
y_test_pred = model.predict(X_test_final)
y_test_prob = model.predict_proba(X_test_final)[:,1]

print("Train Accuracy:", accuracy_score(y_train, y_train_pred))
print("Test Accuracy:", accuracy_score(y_test, y_test_pred))

print("Train ROC-AUC:", roc_auc_score(y_train, y_train_prob))
print("Test ROC-AUC:", roc_auc_score(y_test, y_test_prob))

Our logistic regression with all features does not show strong signs of overfitting. Performance is slightly better on the training set but close to the test-set performance, suggesting good generalization.

Let's take a look at the model's general performance.

In [None]:
# Print metrics
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall:", recall_score(y_test, y_pred))
print("F1-score:", f1_score(y_test, y_pred))
print("ROC-AUC:", roc_auc_score(y_test, y_pred_proba))

print("\nClassification Report:\n", classification_report(y_test, y_pred))

In [None]:
# Confusion matrix
cm = confusion_matrix(y_test, y_pred)

# Plot
plt.figure(figsize=(6,4))
sns.heatmap(cm, annot=True, fmt='d', cmap="Blues", xticklabels=["No Churn", "Churn"], yticklabels=["No Churn", "Churn"])
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title("Confusion Matrix")
plt.show()

## 6.3 DecisionTreeClassifier with all features

Our logistic regression model with all features did not improve on the baseline. This might be because logistic regression cannot fully capture nonlinear patterns in our data, so we try a DecisionTreeClassifier.

In [None]:
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall:", recall_score(y_test, y_pred))
print("F1 Score:", f1_score(y_test, y_pred))
print("ROC-AUC:",roc_auc_score(y_test, y_pred_proba))

print("\nClassification Report:\n", classification_report(y_test, y_pred))

In [None]:
# Plot ROC curve
#roc_auc = roc_auc_score(y_test, y_proba)
fpr, tpr, _ = roc_curve(y_test, y_pred_proba)
# Compute AUC score
roc_auc_tr = roc_auc_score(y_test, y_pred_proba)

plt.plot(fpr, tpr, label=f"Decision Tree (AUC = {roc_auc_tr:.2f})")
plt.plot([0,1],[0,1],'--',color='gray')
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate (Recall)")
plt.title("ROC-AUC Curve - Decision Tree")
plt.legend()
plt.show()

In [None]:
y_pred = grid_search.predict(X_test)
# Confusion matrix
cm = confusion_matrix(y_test, y_pred)

# Plot heatmap
plt.figure(figsize=(6,4))
sns.heatmap(cm, annot=True, fmt='d', cmap="Blues",
            xticklabels=["No Churn", "Churn"],
            yticklabels=["No Churn", "Churn"])
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title("Confusion Matrix - Decision Tree")
plt.show()

In [None]:
from sklearn import tree
# Get the best estimator from grid search
best_dt = grid_search.best_estimator_

# Plot the tree
plt.figure(figsize=(20,10))
tree.plot_tree(
    best_dt, 
    feature_names=X_train.columns,    # Names of your features
    class_names=["No Churn", "Churn"],  # Target classes
    filled=True,                      # Color nodes by class
    rounded=True,                     # Rounded boxes
    fontsize=12
)
plt.title("Decision Tree from GridSearchCV")
plt.show()

The Decision Tree performs much better than the baseline logistic regression across all metrics.

**Recall**: Slightly improved from 0.753 to 0.808, meaning the tree catches more actual churners.

**ROC-AUC**: A higher AUC of 0.90 means the tree has a much better overall ability to discriminate churners from non-churners.

The Decision Tree clearly outperforms the baseline logistic regression on this dataset.

In [None]:
# Feature importances
feat_importance = pd.DataFrame({
    "Feature": X_train.columns,
    "Importance": best_dt.feature_importances_
}).sort_values(by="Importance", ascending=False)

print(feat_importance)

In [None]:
# Plot top features
plt.figure(figsize=(10,6))
plt.barh(feat_importance["Feature"], feat_importance["Importance"], color="royalblue")
plt.gca().invert_yaxis()  # largest on top
plt.xlabel("Importance")
plt.title("Feature Importance - Decision Tree")
plt.show()

**Top 3 features**

`total_charge` (importance 0.34) is the strongest predictor of churn; higher charges may indicate higher risk or dissatisfaction.

`customer service calls` (0.21): The number of calls to customer service strongly signals churn, possibly reflecting more complaints.

`international plan_yes` (0.195): Having an international plan contributes significantly to predicting churn, perhaps because of cost concerns.

## 6.4 Random forest with all features

Let's take a look at how our Random Forest model worked compared to the others.

In [None]:
# Metrics
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall:", recall_score(y_test, y_pred))
print("F1-score:", f1_score(y_test, y_pred))
print("ROC-AUC:", roc_auc_score(y_test, y_proba))

print("\nClassification Report:\n", classification_report(y_test, y_pred))


Random Forest outperforms both the baseline logistic regression and the Decision Tree in accuracy, precision, F1-score, and ROC-AUC.

**Recall**, the ability to detect actual churners, is slightly lower than for the tuned Decision Tree (0.807 vs 0.814), but precision and overall F1 improved significantly, meaning fewer false positives.

**ROC-AUC**: Random Forest has the highest ROC-AUC (0.924), indicating that it discriminates churners from non-churners better than the other models.

Random Forest gives the best overall performance on this dataset.


Tuning the probability threshold is a common way to adjust the trade-off between precision and recall, especially on imbalanced datasets like churn prediction. We want a high recall so that we catch most churners; the price is more false alarms, which is a trade-off we are willing to make.

In [None]:
thresholds = np.arange(0.1, 1.0, 0.05)
for t in thresholds:
    y_pred_t = (y_proba > t).astype(int)
    print(f"Threshold: {t:.2f} | Precision: {precision_score(y_test, y_pred_t):.2f} | Recall: {recall_score(y_test, y_pred_t):.2f} | F1: {f1_score(y_test, y_pred_t):.2f}")


In [None]:
best_threshold = 0.35  # example from tuning
y_pred_final = (y_proba > best_threshold).astype(int)


In [None]:
# Metrics
print("Recall:", recall_score(y_test, y_pred_final))
print("ROC-AUC:", roc_auc_score(y_test, y_proba))
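
The threshold of 0.35 above was chosen by reading the printed table. One possible way to make that choice reproducible is to fix a target recall and keep the highest threshold that still reaches it, which limits false alarms. The sketch below does this on hypothetical probabilities; `target_recall` is an assumed business target, not a value taken from the analysis.

```python
import numpy as np

# Toy labels and predicted churn probabilities (hypothetical values).
y_true_demo = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_proba_demo = np.array([0.90, 0.55, 0.42, 0.22, 0.60, 0.37, 0.30, 0.15, 0.10, 0.05])

def recall_at(threshold):
    """Recall when customers with probability above `threshold` are flagged."""
    flagged = y_proba_demo > threshold
    return (flagged & (y_true_demo == 1)).sum() / (y_true_demo == 1).sum()

target_recall = 0.75  # assumed business target, not taken from the analysis
candidates = np.arange(0.10, 1.00, 0.05)

# Keep the highest threshold that still meets the target recall.
meeting = [t for t in candidates if recall_at(t) >= target_recall]
chosen = max(meeting)

print(f"chosen threshold = {chosen:.2f}, recall = {recall_at(chosen):.2f}")
```

Ideally the threshold would be selected on a validation split rather than on the test set, so that the reported test recall stays an honest estimate.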

Let's compare ROC-AUC and Recall across the models.

Our model correctly identifies 84.8% of the actual churners. A high recall ensures we catch more churners, which is important in churn prediction because missed churners (false negatives) lead to lost revenue. ROC-AUC measures the model’s ability to discriminate between churners and non-churners across all possible thresholds. Our score of 0.917 indicates that if you randomly pick a churner and a non-churner, the model assigns the higher churn probability to the churner 91.7% of the time.
We can conclude that this model is very good at separating churners from non-churners compared to our baseline model and the other models.
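
Since each model in Section 5 is fitted on its own split and reuses names such as `y_pred` and `y_test`, one way to keep a side-by-side comparison unambiguous is to store each model's test labels, predictions and probabilities under its own key and build the table from that. The sketch below uses hypothetical results for two placeholder models (`model_a`, `model_b`); in the notebook these entries would come from each fitted model.

```python
import pandas as pd
from sklearn.metrics import recall_score, roc_auc_score

# Hypothetical per-model results: (true labels, predicted labels, churn probabilities).
results = {
    "model_a": ([1, 1, 0, 0, 0, 1], [1, 0, 0, 0, 1, 1], [0.9, 0.4, 0.2, 0.1, 0.6, 0.8]),
    "model_b": ([1, 1, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1], [0.9, 0.7, 0.2, 0.1, 0.3, 0.8]),
}

rows = []
for name, (y_true_m, y_pred_m, y_score_m) in results.items():
    rows.append({
        "model": name,
        "recall": recall_score(y_true_m, y_pred_m),
        "roc_auc": roc_auc_score(y_true_m, y_score_m),
    })

# One row per model, best recall first.
comparison = pd.DataFrame(rows).set_index("model").sort_values("recall", ascending=False)
print(comparison)
```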

In [None]:
# Feature importances
feat_importance = pd.DataFrame({
    "Feature": X_train.columns,
    "Importance": best_rf.feature_importances_  # from your fitted GridSearchCV best model
}).sort_values(by="Importance", ascending=False)
print(feat_importance)

In [None]:
palette = sns.color_palette("viridis", len(feat_importance))
# Plot
plt.figure(figsize=(10,6))
plt.barh(feat_importance["Feature"], feat_importance["Importance"], color=palette)
plt.gca().invert_yaxis()
plt.xlabel("Importance")
plt.title("Random Forest Feature Importance")
plt.savefig("feature_importance.png", dpi=300, bbox_inches='tight')
plt.show()

## HOW FEATURES INFLUENCE CHURN
This is how the features influence churn in the model we seek to deploy:

**total_charge** Customers with higher total charges are more likely to churn. High spending may indicate dissatisfaction with value or plan costs.

**customer service calls** Frequent calls to customer service strongly predict churn. This likely reflects unresolved issues or a poor service experience.

**total_minutes** Customers with higher total minutes usage may be at risk; possibly they are testing services or comparing alternatives.

**international plan** Having an international plan increases churn risk. Possibly due to cost or underuse of the plan.

**area code** Area code does not influence churn prediction.

**state** Geographic location contributes very little.

# 7. CONCLUSION
1. New Jersey (NJ) and California (CA) are the states with the highest churn rate, at 26.5% each.
2. Hawaii (HI) and Alaska (AK) have the most loyal customers, with low churn rates of 5.7% and 5.8% respectively.
3. The best performing model has a recall of 84.5% and an ROC-AUC score of 0.917.
4. Total charge is the most influential feature for churn, with an importance of 0.305.

# 8. RECOMMENDATIONS
1. Stakeholders should focus on the states with the highest churn, New Jersey (NJ) and California (CA), with special offers, loyalty programs, or improved customer service in order to retain customers.
2. High charges, frequent customer service calls and international plans are the key factors driving churn. By targeting retention efforts at these customers, the company can maximize revenue retention.
3. Stakeholders should de-prioritize voicemail usage, account length, state, and area code when designing interventions, because they have minimal predictive value.
4. Stakeholders should investigate complaints or service issues in high-churn states to reduce dissatisfaction, and also pay attention to the areas that customers complain about in order to increase customer satisfaction.
5. Stakeholders should reallocate resources to higher-risk states such as New Jersey, California and Texas, and replicate the strategies used in low-churn states such as Hawaii and Alaska to improve the others.