
# **Customer Churn Prediction**

## **Step 1: Import all the required libraries**

In [293]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.preprocessing import OrdinalEncoder
from imblearn.over_sampling import SMOTE

## **Step 2: Load the dataset and check the Data**

In [294]:
df = pd.read_csv('/content/Telco-Customer-Churn.csv')
df.sample(10)

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
120,1091-SOZGA,Female,0,Yes,Yes,56,Yes,Yes,Fiber optic,No,...,Yes,No,Yes,Yes,One year,Yes,Credit card (automatic),99.8,5515.45,No
6911,0508-SQWPL,Female,0,Yes,Yes,57,Yes,No,No,No internet service,...,No internet service,No internet service,No internet service,No internet service,Two year,Yes,Bank transfer (automatic),20.1,1087.7,No
4453,1455-UGQVH,Male,0,Yes,No,10,Yes,Yes,Fiber optic,No,...,Yes,No,Yes,Yes,Month-to-month,Yes,Electronic check,98.5,1037.75,Yes
2182,2530-FMFXO,Male,0,Yes,Yes,56,Yes,Yes,Fiber optic,No,...,Yes,Yes,Yes,Yes,Two year,Yes,Electronic check,103.2,5873.75,No
5949,8756-RDDLT,Female,0,No,No,68,No,No phone service,DSL,No,...,No,Yes,No,Yes,Month-to-month,No,Electronic check,44.95,3085.35,No
894,7997-EASSD,Female,0,Yes,No,63,Yes,Yes,Fiber optic,No,...,No,Yes,No,No,One year,Yes,Credit card (automatic),81.2,4965.1,No
5587,1707-HABPF,Female,1,No,No,46,Yes,No,Fiber optic,No,...,Yes,No,Yes,No,One year,Yes,Bank transfer (automatic),91.3,4126.35,No
3712,9209-NWPGU,Male,0,No,No,44,Yes,No,DSL,Yes,...,No,Yes,Yes,No,One year,No,Electronic check,65.4,2774.55,No
643,4908-XAXAY,Female,1,No,No,49,Yes,No,Fiber optic,No,...,Yes,No,No,Yes,One year,Yes,Bank transfer (automatic),89.85,4287.2,No
6166,4077-CROMM,Female,0,Yes,Yes,31,Yes,Yes,Fiber optic,No,...,Yes,Yes,Yes,Yes,Month-to-month,Yes,Electronic check,104.2,3243.45,Yes


## **Step 3: Convert columns to the correct data types**
Convert TotalCharges to numeric (values that cannot be parsed become NaN), and cast the text columns and SeniorCitizen (0 or 1) to category.

In [295]:
df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce')

categorical_cols = [
    'gender', 'Partner', 'Dependents', 'PhoneService', 'MultipleLines',
    'InternetService', 'OnlineSecurity', 'OnlineBackup', 'DeviceProtection',
    'TechSupport', 'StreamingTV', 'StreamingMovies', 'Contract',
    'PaperlessBilling', 'PaymentMethod', 'Churn'
]
df[categorical_cols] = df[categorical_cols].astype('category')
df['SeniorCitizen'] = df['SeniorCitizen'].astype('category')
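
The `errors='coerce'` argument is what makes the missing values visible later: any entry that cannot be parsed as a number becomes NaN instead of raising an error. A minimal sketch on a made-up series (the values are illustrative, not taken from the CSV, and the blank string is only an assumed example of an unparseable entry):

```python
import pandas as pd

# Illustrative values only; the blank string stands in for an unparseable entry.
raw = pd.Series(["29.85", "1889.5", " ", "108.15"])

# errors="coerce" turns anything that cannot be parsed into NaN instead of raising.
charges = pd.to_numeric(raw, errors="coerce")

print(charges.dtype)         # float64
print(charges.isna().sum())  # 1
```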

In [296]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
 #   Column            Non-Null Count  Dtype   
---  ------            --------------  -----   
 0   customerID        7043 non-null   object  
 1   gender            7043 non-null   category
 2   SeniorCitizen     7043 non-null   category
 3   Partner           7043 non-null   category
 4   Dependents        7043 non-null   category
 5   tenure            7043 non-null   int64   
 6   PhoneService      7043 non-null   category
 7   MultipleLines     7043 non-null   category
 8   InternetService   7043 non-null   category
 9   OnlineSecurity    7043 non-null   category
 10  OnlineBackup      7043 non-null   category
 11  DeviceProtection  7043 non-null   category
 12  TechSupport       7043 non-null   category
 13  StreamingTV       7043 non-null   category
 14  StreamingMovies   7043 non-null   category
 15  Contract          7043 non-null   category
 16  PaperlessBilling  7043 n

## **Step 4: Check which columns contain NaN values and how many**

In [297]:
na_cols = df.isna().sum()
print(na_cols[na_cols > 0])

TotalCharges    11
dtype: int64


## **Step 5: Drop the 11 rows with NaN in the TotalCharges column, since they are few compared with the full dataset (7043 rows)**

In [298]:
df = df.dropna(subset=['TotalCharges'])

## **Step 6: Remove the customerID column, since it is only an identifier and is not used by the ML model (keep a copy of customerID so that discount offers can be applied to customers with *Churn == 1*)**

In [299]:
customer_ids = df['customerID']
df = df.drop(columns='customerID')  # reassign instead of inplace to avoid SettingWithCopyWarning

## **Step 7: Use OrdinalEncoder to map categorical values to numbers so the model can use them**

In [300]:
encoder = OrdinalEncoder()
cat_cols_for_encoding = df.select_dtypes(include=['category', 'object']).columns
df[cat_cols_for_encoding] = encoder.fit_transform(df[cat_cols_for_encoding])
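
To see which number each label receives, the fitted encoder can be inspected. The sketch below uses a toy frame (column names mirror the dataset, but the rows are made up): `categories_` lists the sorted labels per column, and a label's position in that list is its code. This is why Churn appears as 0.0 and 1.0 in the evaluation report.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Toy frame; column names mirror the dataset, rows are made up.
toy = pd.DataFrame({
    "Contract": ["Month-to-month", "Two year", "One year", "Month-to-month"],
    "Churn": ["Yes", "No", "No", "Yes"],
})

enc = OrdinalEncoder()
codes = enc.fit_transform(toy)

# categories_ holds the sorted labels per column; a label's position is its code.
for col, cats in zip(toy.columns, enc.categories_):
    print(col, dict(zip(cats, range(len(cats)))))

# inverse_transform maps the codes back to the original labels.
restored = enc.inverse_transform(codes)
```

Keeping the fitted `encoder` object around is useful for the same reason: new customer records have to be encoded with the same mapping before they are passed to the model.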

## **Step 8: Split features and target**

In [301]:
X = df.drop('Churn', axis=1)
y = df['Churn']

## **Step 9: Balance dataset using SMOTE**

In [302]:
smote = SMOTE(random_state=42)
X, y = smote.fit_resample(X, y)

## **Step 10: Train-test split with a 20% test size**

In [303]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

## **Step 11: Train tuned Random Forest model**

In [304]:
model = RandomForestClassifier(
    n_estimators=200,
    max_depth=10,
    min_samples_split=5,
    min_samples_leaf=3,
    random_state=42
)
model.fit(X_train, y_train)

## **Step 12: Prediction**

In [305]:
y_pred = model.predict(X_test)

## **Step 13: Evaluate model**

In [306]:
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
print("\nClassification Report:\n", classification_report(y_test, y_pred))

Accuracy: 0.8620522749273959

Classification Report:
               precision    recall  f1-score   support

         0.0       0.87      0.85      0.86      1037
         1.0       0.85      0.87      0.86      1029

    accuracy                           0.86      2066
   macro avg       0.86      0.86      0.86      2066
weighted avg       0.86      0.86      0.86      2066
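
Accuracy and the per-class report can be complemented with a confusion matrix and a ROC AUC score computed from predicted probabilities. The sketch below runs on synthetic stand-in data with a smaller forest (both are assumptions made to keep it self-contained); in the notebook, the same calls would take `y_test`, `model.predict(X_test)` and `model.predict_proba(X_test)`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption); the notebook would use its own split.
X_demo, y_demo = make_classification(n_samples=1000, n_features=10, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

clf = RandomForestClassifier(n_estimators=50, max_depth=10, random_state=42)
clf.fit(X_tr, y_tr)

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_te, clf.predict(X_te))

# ROC AUC uses the predicted probability of the positive class (column 1).
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])

print(cm)
print("ROC AUC:", round(auc, 3))
```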



## **Step 14: Save the trained model to a file with pickle**

In [308]:
import pickle
# Use a context manager so the file is closed once the model is written.
with open('churn-prediction-model.pkl', 'wb') as f:
    pickle.dump(model, f)
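
A saved model is loaded back with `pickle.load` and can then predict without retraining. The round trip below is a sketch on a small stand-in model trained on synthetic data (an assumption), written to a temporary file with a hypothetical name so the notebook's own `.pkl` is not touched:

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small stand-in model on synthetic data (assumption).
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=42)
clf = RandomForestClassifier(n_estimators=10, random_state=42).fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name, used only for this demo.
    path = os.path.join(tmp, "demo-model.pkl")

    with open(path, "wb") as f:
        pickle.dump(clf, f)

    # Load the model back from disk.
    with open(path, "rb") as f:
        loaded = pickle.load(f)

# The loaded model should predict exactly like the original.
same = bool((loaded.predict(X_demo) == clf.predict(X_demo)).all())
print("predictions match:", same)
```

Only unpickle files from a trusted source, since loading a pickle can execute arbitrary code, and load the model with the same scikit-learn version that saved it.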