
# 1. Introduction

In the telecommunications industry, predicting customer churn is a key step toward reducing it. By using predictive analytics and data mining techniques, telecom providers can identify customers who are likely to churn before they leave and target them with retention strategies. This project aims to develop predictive models for customer churn using the IBM Telco Customer Churn dataset.

The machine learning models used for this project are:
- Decision Trees
- Logistic Regression
- Support Vector Machines (SVM)


# 2. Data Collection & Preprocessing

In this section, we will load the dataset, handle missing values, encode categorical variables, normalize data, and address any class imbalance issues.

## 2.1 Load Dataset


In [None]:
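import pandas as pd
from google.colab import drive
drive.mount('/content/drive')  # Make files stored in Google Drive reachable from Colab
df = pd.read_csv('/content/drive/MyDrive/Telco-Customer-Churn.csv')  # Hypothetical path: point this at your copy of the dataset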



## 2.2 Handle Missing Values


In [None]:
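# TotalCharges may be read as text because of blank entries; coerce it to numeric first
df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce')

# Check for missing values
print(df.isnull().sum())

# Fill missing values in numeric columns with each column's median
df.fillna(df.median(numeric_only=True), inplace=True)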
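
In the IBM Telco CSV, `TotalCharges` is typically read as text because a few rows contain blank strings, and a text column is ignored by a numeric median fill. The following is a minimal sketch, on a toy frame with made-up values, of coercing such a column and then imputing it:

```python
import pandas as pd

# Toy frame standing in for the real data (values are illustrative only)
toy = pd.DataFrame({
    "tenure": [1, 34, 0, 45],
    "TotalCharges": ["29.85", "1889.5", " ", "1840.75"],  # blank string = missing
})

# Coerce text to numbers; anything unparseable (such as " ") becomes NaN
toy["TotalCharges"] = pd.to_numeric(toy["TotalCharges"], errors="coerce")
n_missing = int(toy["TotalCharges"].isna().sum())

# Fill numeric gaps with the column median
toy = toy.fillna(toy.median(numeric_only=True))
print(n_missing, toy["TotalCharges"].tolist())
```

Counting the missing entries before filling makes it easy to confirm that only a small number of rows were imputed.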


## 2.3 Encode Categorical Variables


In [None]:
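# Map the target to 0/1 first so it keeps the name 'Churn' (assumes 'Yes'/'No' labels, as in the IBM Telco CSV)
df['Churn'] = df['Churn'].map({'Yes': 1, 'No': 0})
# Drop the customerID identifier if present, then one-hot encode the remaining categorical columns
df = pd.get_dummies(df.drop(columns=['customerID'], errors='ignore'), drop_first=True)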


## 2.4 Normalize Data


In [None]:
from sklearn.preprocessing import StandardScaler

# Normalizing the data
scaler = StandardScaler()
df_scaled = scaler.fit_transform(df.drop('Churn', axis=1))  # Exclude the target column
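
`StandardScaler` rescales each feature to zero mean and unit variance, i.e. z = (x - mean) / std. The sketch below uses synthetic data (not the churn dataset) to show the same transform computed by hand. It also fits the scaler on the training rows only, which is one common way to keep test-set statistics out of preprocessing; all variable names here are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(200, 3))  # synthetic features
y = rng.integers(0, 2, size=200)                     # synthetic binary target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

demo_scaler = StandardScaler().fit(X_tr)  # learn mean and std from training rows only
X_tr_s = demo_scaler.transform(X_tr)
X_te_s = demo_scaler.transform(X_te)      # reuse the training statistics

# Same result by hand: z = (x - mean) / std, using the population std (ddof=0)
manual = (X_tr - X_tr.mean(axis=0)) / X_tr.std(axis=0)
print(np.allclose(X_tr_s, manual))
```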


## 2.5 Address Class Imbalance


In [None]:
from imblearn.over_sampling import SMOTE

# Apply SMOTE to balance the classes
smote = SMOTE(random_state=42)  # Fixed seed so the resampling is reproducible
X_res, y_res = smote.fit_resample(df_scaled, df['Churn'])
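
SMOTE balances the classes by creating synthetic minority rows that lie on the line segment between a minority sample and one of its nearest minority neighbours. The sketch below illustrates that interpolation idea in plain NumPy; `smote_like` is a hypothetical helper written for this example, not imblearn's implementation.

```python
import numpy as np

def smote_like(X_min, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between minority neighbours."""
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                # a point is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]  # k nearest minority neighbours per row
    base = rng.integers(0, len(X_min), size=n_new)            # pick base points
    pick = neighbours[base, rng.integers(0, k, size=n_new)]   # pick one neighbour each
    gap = rng.random((n_new, 1))               # random position along the segment
    return X_min[base] + gap * (X_min[pick] - X_min[base])

# Four minority points on the corners of the unit square
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_like(X_min, n_new=6)
print(synthetic.shape)
```

Because synthetic rows are derived from existing ones, oversampling is usually applied to the training split only, so that the test set keeps the original class balance.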


# 3. Exploratory Data Analysis (EDA) & Feature Engineering

In this section, we will explore the dataset, visualize important patterns, and select relevant features.

## 3.1 Visualize Churn Distribution


In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Plot churn distribution
sns.countplot(x='Churn', data=df)
plt.title('Churn Distribution')
plt.show()


## 3.2 Correlation Matrix


In [None]:
# Correlation matrix to find important features
corr_matrix = df.corr()
plt.figure(figsize=(10, 8))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', fmt='.2f')
plt.title('Correlation Matrix')
plt.show()


## 3.3 Feature Selection


In [None]:
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

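# Feature selection using Recursive Feature Elimination (RFE)
model = LogisticRegression(max_iter=1000)  # Extra iterations help the solver converge
selector = RFE(model, n_features_to_select=10)
selector = selector.fit(df_scaled, df['Churn'])
# support_ has one entry per feature column, so exclude the target before indexing
selected_features = df.drop('Churn', axis=1).columns[selector.support_]
print("Selected Features:", selected_features)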
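
RFE repeatedly fits the estimator and discards the weakest feature until the requested number remains. The resulting boolean mask `support_` has one entry per feature column, so it must be applied to the feature names only. A small sketch on synthetic data, with hypothetical column names `f0` to `f7`:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the churn features
X_arr, y_arr = make_classification(
    n_samples=300, n_features=8, n_informative=3, random_state=0
)
X_df = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(8)])

rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=3).fit(X_df, y_arr)
chosen = X_df.columns[rfe.support_]  # mask aligned with the feature columns
coefs = rfe.estimator_.coef_[0]      # coefficients of the model refit on the chosen features
print(dict(zip(chosen, coefs.round(3))))
```

The fitted `estimator_` is trained on the reduced feature set, so its coefficients line up one-to-one with the selected names.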


# 4. Modeling & Evaluation

In this section, we will train the models using Decision Trees, Logistic Regression, and Support Vector Machines (SVM) and evaluate their performance using K-Fold cross-validation.

## 4.1 Model Initialization


In [None]:
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_res, y_res, test_size=0.2, random_state=42)

# Initialize models
models = {
    'Decision Tree': DecisionTreeClassifier(),
    'Logistic Regression': LogisticRegression(),
    'SVM': SVC()
}


## 4.2 Model Training & Cross-Validation


In [None]:
# Train and evaluate each model using cross-validation
for model_name, model in models.items():
    scores = cross_val_score(model, X_train, y_train, cv=5, scoring='accuracy')
    print(f"{model_name} Accuracy: {scores.mean():.4f}")
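
Accuracy alone can look good on imbalanced churn data even when few churners are caught. As a hedged sketch, the snippet below runs stratified 5-fold cross-validation with several metrics at once via `cross_validate`; the data comes from `make_classification` as a stand-in, and the choice of metrics is an assumption rather than part of the original analysis.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic, imbalanced stand-in for the churn data (about 25% positives)
X_demo, y_demo = make_classification(
    n_samples=500, n_features=10, weights=[0.75, 0.25], random_state=42
)

# Stratified folds keep the class ratio roughly constant in every split
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
results = cross_validate(
    LogisticRegression(max_iter=1000), X_demo, y_demo,
    cv=cv, scoring=["accuracy", "f1", "roc_auc"],
)
for metric in ["accuracy", "f1", "roc_auc"]:
    print(f"{metric}: {results['test_' + metric].mean():.4f}")
```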


## 4.3 Model Evaluation Metrics


In [None]:
from sklearn.metrics import classification_report, confusion_matrix

# Choose the best model (e.g., SVM) for evaluation
best_model = SVC()
best_model.fit(X_train, y_train)

# Predict and evaluate the model
y_pred = best_model.predict(X_test)
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
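
The figures in the classification report follow directly from the confusion matrix: precision is TP / (TP + FP) and recall is TP / (TP + FN). A small worked example with made-up labels (1 = churn) shows the arithmetic:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Made-up labels for illustration only
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 0])
y_hat = np.array([0, 0, 1, 0, 1, 0, 1, 0, 1, 0])

# For binary labels, ravel() yields the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()
precision = tp / (tp + fp)  # share of predicted churners who really churned
recall = tp / (tp + fn)     # share of actual churners the model caught
print(tn, fp, fn, tp, precision, recall)
```

For churn, recall on the positive class is often the number to watch, since a missed churner cannot be targeted with a retention offer.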


# 5. Visualization Dashboard

In this section, we will build a simple interactive visualization of the features that contribute most to customer churn. The example uses **Plotly**; wrapping the figure in a **Dash** app is an optional extension.

## 5.1 Build Dashboard (Optional)


In [None]:
# Example code for a simple Plotly chart (optional, based on project scope)
import plotly.express as px

# SVC with its default RBF kernel exposes no coef_, so plot the coefficients of the
# logistic regression that RFE refit on the selected features instead
fig = px.bar(x=selected_features, y=selector.estimator_.coef_[0])
fig.update_layout(title="Feature Importance", xaxis_title="Features", yaxis_title="Coefficient")
fig.show()


# 6. Conclusion

This project aimed to build predictive models for customer churn in the telecommunications industry. By using data mining techniques, we successfully identified key predictors of churn and developed a classification model to predict customer attrition.

Future work could involve exploring deep learning techniques or improving the dashboard for better business insights.
