
***Telecom Churn Prediction Project***

1. ***Introduction***
In this project, we aim to predict churn among high-value customers in a telecom dataset. We'll focus on:

1. Identifying high-value customers.
2. Tagging churners based on usage data.
3. Building models to predict churn and interpret important features.

2. ***Data Loading and Exploration***:-
Load Libraries and Dataset

In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, accuracy_score
from sklearn.preprocessing import StandardScaler
from imblearn.over_sampling import SMOTE
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset
data = pd.read_csv('/mnt/data/telecom_churn_data.csv')

# Display basic information
data.info()
data.head()


***Basic Data Cleaning***:-
Check for and handle missing values if necessary.

In [None]:
# Check for missing values
missing_values = data.isnull().sum()
print(missing_values[missing_values > 0])
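
The check above only reports missing values. As a minimal sketch of one way to handle them, the toy example below drops columns that are mostly empty and fills the remaining numeric gaps with each column's median. The toy column names and the 50% cut-off are assumptions for illustration, not values taken from the telecom dataset.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the telecom data (column names are illustrative only)
toy = pd.DataFrame({
    'mostly_missing': [np.nan, np.nan, np.nan, 4.0],
    'some_missing': [1.0, np.nan, 3.0, 5.0],
    'complete': [10, 20, 30, 40],
})

# Share of missing values per column
missing_share = toy.isnull().mean()

# Assumption: drop columns with more than 50% missing values
max_missing_share = 0.5
kept = toy.loc[:, missing_share <= max_missing_share]

# Fill the remaining numeric gaps with each column's median
cleaned = kept.fillna(kept.median(numeric_only=True))
print(cleaned)
```

Median imputation is only one option. For usage columns, a missing value may simply mean "no usage", in which case filling with 0 could be more appropriate.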


3. ***Define High-Value Customers***:-
Calculate Average Recharge and Filter

In [None]:
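# Calculate average recharge amount for the good phase (first two months)
# NOTE: 'recharge_amt_1' and 'recharge_amt_2' must match the recharge columns in your dataset
data['avg_recharge_good_phase'] = data[['recharge_amt_1', 'recharge_amt_2']].mean(axis=1)

# Define high-value customers as those at or above the 70th percentile
# (named recharge_threshold so it does not clash with the feature matrix X used later)
recharge_threshold = data['avg_recharge_good_phase'].quantile(0.70)
# .copy() avoids a SettingWithCopyWarning when the 'churn' column is added later
high_value_customers = data[data['avg_recharge_good_phase'] >= recharge_threshold].copy()
print(f"Number of high-value customers: {len(high_value_customers)}")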
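
To make the percentile filter concrete, here is a small worked example on ten made-up customers. The recharge values are invented for illustration; only the column names follow the cell above.

```python
import pandas as pd

# Ten toy customers with identical recharge amounts in both good-phase months
toy = pd.DataFrame({
    'recharge_amt_1': [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000],
    'recharge_amt_2': [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000],
})

# Average recharge over the two good-phase months
toy['avg_recharge_good_phase'] = toy[['recharge_amt_1', 'recharge_amt_2']].mean(axis=1)

# 70th percentile (pandas interpolates linearly between the 7th and 8th values)
threshold = toy['avg_recharge_good_phase'].quantile(0.70)

# Keep customers at or above the threshold
high_value = toy[toy['avg_recharge_good_phase'] >= threshold].copy()
print(f"Threshold: {threshold:.1f}, high-value customers: {len(high_value)}")
```

With these values the threshold falls between 700 and 800, so the three customers with the largest recharges are kept, which is roughly the top 30%.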


4. ***Tag Churners and Remove Churn Phase Attributes***
Tagging Churners :- Tag churners based on the conditions provided for month 9.

In [None]:
# Tag churners: churn = 1 if no incoming/outgoing calls and no data usage in month 9
high_value_customers['churn'] = ((high_value_customers['total_ic_mou_9'] == 0) &
                                 (high_value_customers['total_og_mou_9'] == 0) &
                                 (high_value_customers['vol_2g_mb_9'] == 0) &
                                 (high_value_customers['vol_3g_mb_9'] == 0)).astype(int)
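
The tagging rule can be checked on a few hand-made rows. This is a sketch with invented usage values; it expresses the same four conditions as a single `.all(axis=1)` test over the month-9 usage columns.

```python
import pandas as pd

# Month-9 usage columns used to define churn
usage_cols = ['total_ic_mou_9', 'total_og_mou_9', 'vol_2g_mb_9', 'vol_3g_mb_9']

# Three toy customers: no usage at all, incoming calls only, 3G data only
toy = pd.DataFrame({
    'total_ic_mou_9': [0.0, 12.5, 0.0],
    'total_og_mou_9': [0.0, 0.0, 0.0],
    'vol_2g_mb_9': [0.0, 0.0, 0.0],
    'vol_3g_mb_9': [0.0, 0.0, 3.2],
})

# churn = 1 only when every usage column is exactly zero
toy['churn'] = (toy[usage_cols] == 0).all(axis=1).astype(int)
churn_rate = toy['churn'].mean()
print(toy['churn'].tolist(), f"churn rate: {churn_rate:.2f}")
```

Note that a missing value compares as not equal to zero, so a customer with `NaN` in any of these columns is tagged as a non-churner under this rule. Printing the churn rate at this point also shows how imbalanced the target is.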


***Remove Churn Phase Attributes***:-
Remove all attributes related to month 9.

In [None]:
# Drop every churn-phase (month 9) column so the models cannot see the data used to define the label
churn_phase_cols = [col for col in high_value_customers.columns if col.endswith('_9')]
high_value_customers = high_value_customers.drop(columns=churn_phase_cols)
print(f"Dropped {len(churn_phase_cols)} month-9 columns")


5. ***Model Building***:-
Split Data:- Separate features and target, and split into training and test sets.

In [None]:
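# Define features (X) and target (y)
# Keep numeric columns only and fill missing values: SMOTE and the models below cannot handle text or NaN
# (filling with 0 is a simplifying assumption; choose an imputation that suits the data)
X = high_value_customers.drop(columns=['churn']).select_dtypes(include='number').fillna(0)
y = high_value_customers['churn']

# Split into training and test sets, stratifying to keep the churn rate similar in both
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=42)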


***Handle Class Imbalance with SMOTE***

In [None]:
# Apply SMOTE to balance the target classes
smote = SMOTE(random_state=42)
X_train_smote, y_train_smote = smote.fit_resample(X_train, y_train)
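
For intuition about what SMOTE does, the sketch below builds one synthetic minority point by interpolating between a minority sample and its nearest minority neighbour. This is a simplified illustration in plain NumPy, not imblearn's implementation: the real algorithm picks at random among several nearest neighbours, and the `synthesize` helper and toy points are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

# Three minority-class points in a 2-D feature space (toy values)
minority = np.array([[1.0, 1.0], [2.0, 1.5], [5.0, 5.0]])


def synthesize(points, index, rng):
    """Create one synthetic point between points[index] and its nearest neighbour."""
    base = points[index]
    distances = np.linalg.norm(points - base, axis=1)
    distances[index] = np.inf  # exclude the point itself
    neighbour = points[np.argmin(distances)]
    gap = rng.random()  # uniform in [0, 1)
    return base + gap * (neighbour - base)


synthetic = synthesize(minority, 0, rng)
print(synthetic)
```

The synthetic point always lies on the line segment between the two real points. Because these points are artificial, oversampling is applied to the training split only, as in the cell above, so the test set keeps the real class balance.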


6. ***Model Training and Evaluation***:-
Logistic Regression Model for Feature Importance

In [None]:
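# Train a Logistic Regression model to identify important predictors.
# Features are standardized first so that coefficient magnitudes are comparable across features.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train_smote)
X_test_scaled = scaler.transform(X_test)  # reuse the scaling learned on the training data

log_model = LogisticRegression(max_iter=1000)
log_model.fit(X_train_scaled, y_train_smote)

# Make predictions and evaluate
y_pred_log = log_model.predict(X_test_scaled)
print("Logistic Regression Classification Report:\n", classification_report(y_test, y_pred_log))

# Feature Importance (Logistic Regression coefficients; the sign gives the direction of the effect)
feature_importance_log = pd.Series(log_model.coef_[0], index=X.columns).sort_values(ascending=False)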
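
Sorting coefficients by their signed value puts strong negative predictors at the bottom of the list. As a small sketch, ranking by absolute value surfaces the strongest predictors in either direction. The feature names and coefficient values below are made up for illustration.

```python
import pandas as pd

# Hypothetical coefficients, as they might come out of log_model.coef_[0]
coefficients = pd.Series({
    'feature_a': 0.8,
    'feature_b': -1.5,
    'feature_c': 0.1,
    'feature_d': -0.3,
})

# Order features by the magnitude of their coefficient, keeping the original sign
order = coefficients.abs().sort_values(ascending=False).index
top_features = coefficients.reindex(order).head(3)
print(top_features)
```

A negative coefficient lowers the predicted churn probability and a positive one raises it, so both ends of the ranking are worth reading.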


***Random Forest Model for Churn Prediction***:-

In [None]:
# Train a Random Forest model for churn prediction
rf_model = RandomForestClassifier(random_state=42)
rf_model.fit(X_train_smote, y_train_smote)

# Make predictions and evaluate
y_pred_rf = rf_model.predict(X_test)
print("Random Forest Classification Report:\n", classification_report(y_test, y_pred_rf))
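
Because churners are a small minority, accuracy alone can look good even when few churners are caught. The sketch below runs on synthetic data from `make_classification` (not the telecom dataset) and shows how recall, ROC AUC, and the confusion matrix can be reported alongside accuracy. All variable names here are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced data: roughly 8% positives standing in for churners
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=10, weights=[0.92, 0.08], random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_tr, y_tr)

y_pred = model.predict(X_te)
y_prob = model.predict_proba(X_te)[:, 1]  # predicted probability of the positive class

metrics = {
    'accuracy': accuracy_score(y_te, y_pred),
    'recall': recall_score(y_te, y_pred),    # share of true positives that were caught
    'roc_auc': roc_auc_score(y_te, y_prob),  # ranking quality, independent of threshold
}
cm = confusion_matrix(y_te, y_pred)  # rows: actual class, columns: predicted class
print(metrics)
print(cm)
```

For churn, recall on the positive class is usually the number to watch, since a missed churner tends to cost more than an unnecessary retention offer.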


7. ***Feature Importance Visualization***:-

**Logistic Regression Coefficients**

In [None]:
# Plot feature importance from Logistic Regression
plt.figure(figsize=(10, 8))
feature_importance_log.plot(kind='bar')
plt.title("Feature Importance - Logistic Regression")
plt.xlabel("Features")
plt.ylabel("Coefficient")
plt.show()


***Random Forest Feature Importances***

In [None]:
# Plot feature importance from Random Forest
feature_importance_rf = pd.Series(rf_model.feature_importances_, index=X.columns).sort_values(ascending=False)
plt.figure(figsize=(10, 8))
feature_importance_rf.plot(kind='bar')
plt.title("Feature Importance - Random Forest")
plt.xlabel("Features")
plt.ylabel("Importance")
plt.show()


8. ***Recommendations to Manage Customer Churn***:-
Based on the identified important features and model findings, we can recommend the following actions:

1. **Target high-risk customers**: For customers who show signs of reduced engagement (low call or data usage), offer tailored retention plans or discounts.
2. **Monitor usage patterns**: Monitor changes in key usage variables, such as call minutes and data consumption. Significant drops may indicate increased churn risk.
3. **Offer customized plans**: For customers who show heavy usage of certain services (e.g., data-heavy users), provide special data packages or loyalty bonuses to maintain engagement.
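
The usage-monitoring recommendation can be sketched as a simple rule that compares recent usage with a customer's good-phase average and flags large relative drops. The column names `usage_good_phase_avg` and `usage_latest_month` and the 50% threshold are hypothetical and would need to be mapped to real usage columns.

```python
import pandas as pd

# Toy usage table (hypothetical columns and values)
usage = pd.DataFrame({
    'customer_id': [101, 102, 103],
    'usage_good_phase_avg': [400.0, 250.0, 0.0],
    'usage_latest_month': [120.0, 240.0, 0.0],
})

# Assumption: a drop of more than 50% relative to the good phase signals risk
drop_threshold = 0.5

# Treat a zero baseline as undefined to avoid dividing by zero
baseline = usage['usage_good_phase_avg'].where(usage['usage_good_phase_avg'] > 0)
usage['relative_drop'] = (baseline - usage['usage_latest_month']) / baseline

# NaN compares as False, so customers without a baseline are not flagged
usage['at_risk'] = usage['relative_drop'] > drop_threshold
print(usage[['customer_id', 'relative_drop', 'at_risk']])
```

A rule like this is no substitute for the models above, but it gives retention teams an early, easily explained signal.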

***Conclusion***:- This analysis offers valuable insights into customer behavior and supports targeted interventions to reduce churn among high-value customers. By implementing personalized retention strategies, telecom companies can strengthen customer loyalty, enhance revenue stability, and reduce the cost associated with customer acquisition. With ongoing monitoring and refinement, these predictive models can continue to offer meaningful support to customer retention efforts.

This project provides a foundation for further analysis, including deeper exploration of customer segments and a more granular understanding of usage patterns, which can lead to even more customized and effective retention strategies.