# Project 2 - Churn - Predictor

- Predict churn using the saved dataset, preprocessor, and models

- Course Name :         Applied Machine Learning
- Course instructor :   Sohail Tehranipour
- Student Name :        Afshin Masoudi Ashtiani
- Project :             2 - Churn
- Date :                September 2024

## Install Required Libraries

In [15]:
%pip install pandas numpy joblib 
%pip install scikit-learn imbalanced-learn
%pip install lightgbm xgboost 

Note: you may need to restart the kernel to use updated packages.
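The saved `.joblib` files pickle scikit-learn, LightGBM, and XGBoost objects, and unpickling them under a different library version than the one used for training can produce warnings or errors. A minimal sketch, using only the standard library's `importlib.metadata`, that reports which of the packages installed above are present and at what version (the helper name `installed_versions` is hypothetical):

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Distribution names exactly as passed to %pip install above
REQUIRED_PACKAGES = [
    "pandas", "numpy", "joblib",
    "scikit-learn", "imbalanced-learn",
    "lightgbm", "xgboost",
]


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is missing."""
    result: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


for pkg, ver in installed_versions(REQUIRED_PACKAGES).items():
    print(f"{pkg:<18} {ver or 'NOT INSTALLED'}")
```

Comparing this listing with the versions used in the training notebook is a cheap way to catch incompatibilities before calling `joblib.load`.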


## Step 1: Load the data, preprocessor and models

In [16]:
# Load the preprocessor, models, and dataset
import os
import pandas as pd
import joblib


# Machine-specific paths: adjust these to your own environment
model_dir = 'C:\\Users\\Afshin\\Desktop\\10_Projects\\Project_2_Churn\\models'
data_dir = 'C:\\Users\\Afshin\\Desktop\\10_Projects\\Project_2_Churn\\datasets'

preprocessor_path = os.path.join(model_dir, 'churn_preprocessor.joblib')
loaded_preprocessor = joblib.load(preprocessor_path)

model_names = [
    'Ada Boost Classifier',
    'Extra Trees Classifier',
    'Gradient Boosting Classifier',
    'LGBM Classifier', 
    'Logistic Regression',
    'Random Forest Classifier',
    'XGBoost Classifier', 
]
model_paths = {name: os.path.join(model_dir, f"{name.replace(' ', '')}.joblib") for name in model_names}

# Load each model; report (instead of raising) any model that fails to load
models = {}
for name, path in model_paths.items():
    try:
        models[name] = joblib.load(path)
    except Exception as e:
        print(f"Error loading model {name} from {path}: {str(e)}")
        
# Load dataset
data_path = os.path.join(data_dir, 'cleaned_IT_customer_churn.csv')
df = pd.read_csv(data_path)

# Prepare features and target
X = df.drop(columns=['Churn'])
y = df['Churn']
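Because a model that fails to load is only printed and skipped, `models` can silently end up with fewer than seven entries. A small stdlib-only sketch that rebuilds the same file-name convention (spaces removed, `.joblib` suffix) and lists which expected files are missing before any loading is attempted; `expected_model_files` and `missing_model_files` are hypothetical helper names:

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Sequence


def expected_model_files(model_dir: str, names: Sequence[str]) -> Dict[str, Path]:
    """Same convention as the notebook: 'Random Forest Classifier' -> 'RandomForestClassifier.joblib'."""
    return {name: Path(model_dir) / f"{name.replace(' ', '')}.joblib" for name in names}


def missing_model_files(paths: Dict[str, Path]) -> List[str]:
    """Return the display names whose model file does not exist on disk."""
    return [name for name, path in paths.items() if not path.is_file()]


# Demo in a throwaway directory where only one of the two files exists
demo_names = ["Random Forest Classifier", "XGBoost Classifier"]
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "RandomForestClassifier.joblib").touch()
    print(missing_model_files(expected_model_files(tmp, demo_names)))  # ['XGBoost Classifier']
```

In the notebook itself, `missing_model_files(expected_model_files(model_dir, model_names))` could be called right after `model_names` is defined.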

## Step 2: Make Predictions

- Calculate metrics

In [17]:
# Function for calculating metrics
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

def calculate_metrics(y_true, y_pred):
    """Calculate and return accuracy, recall, F1, and precision scores."""
    acc = accuracy_score(y_true, y_pred)
    rec = recall_score(y_true, y_pred)
    f1 = f1_score(y_true, y_pred)
    prec = precision_score(y_true, y_pred)
    return acc, rec, f1, prec
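For the binary churn label (1 = churned, which is the positive class `recall_score`, `precision_score`, and `f1_score` use by default), the four scores reduce to counts of true/false positives and negatives. A dependency-free sketch on a made-up five-customer example; `confusion_counts` and `metrics_from_counts` are illustrative helpers, not part of scikit-learn:

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) with 1 as the positive (churn) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def metrics_from_counts(tp: int, fp: int, fn: int, tn: int) -> Tuple[float, float, float, float]:
    """Return (accuracy, recall, f1, precision), the same order as calculate_metrics.

    A zero denominator yields 0.0, which is also the value scikit-learn returns
    by default (it additionally emits a warning).
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, recall, f1, precision


# Made-up labels for five customers (1 = churned)
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
counts = confusion_counts(y_true, y_pred)
print(counts)  # (2, 1, 1, 1)
print(metrics_from_counts(*counts))
```

Here accuracy is 3/5 = 0.6 and recall, precision, and F1 all equal 2/3; passing the same two lists to `calculate_metrics` should give the same four numbers.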

- Predict churn for a sample with every loaded model

In [18]:
from imblearn.over_sampling import SMOTE

def predict(sample):
    """Predict churn for `sample` with every loaded model and report each model's metrics.

    Note: the metrics are computed on the full dataset `X` after SMOTE balancing,
    not on a held-out test set, so they are identical for every sample and are
    likely optimistic if the models were trained on this same data.
    """
    results = []

    try:
        # Transform the sample and the full dataset with the loaded (already fitted) preprocessor
        sample_trans = loaded_preprocessor.transform(sample)
        X_trans = loaded_preprocessor.transform(X)

        # Balance the transformed dataset with SMOTE (reproducible thanks to random_state=42)
        X_resampled, y_resampled = SMOTE(random_state=42).fit_resample(X_trans, y)

        for name, model in models.items():
            # Predict churn for the sample and for the balanced dataset
            churn_pred = model.predict(sample_trans)
            y_resampled_pred = model.predict(X_resampled)

            # Score the model on the balanced dataset
            acc, rec, f1, prec = calculate_metrics(y_resampled, y_resampled_pred)

            results.append({
                'Model': name,
                'Accuracy': round(acc * 100, 2),
                'Recall': round(rec * 100, 2),
                'F1': round(f1 * 100, 2),
                'Precision': round(prec * 100, 2),
                'Predicted Churn': 'Yes' if churn_pred[0] == 1 else 'No',
            })

    except Exception as e:
        print(f"An error occurred during preprocessing or prediction: {e}")
        return pd.DataFrame()

    return pd.DataFrame(results).sort_values(by='Accuracy', ascending=False)
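As written, `predict` re-transforms the whole dataset, reruns SMOTE, and rescores every model on each call, even though those scores never depend on `sample` (which is why both result tables below report identical metrics). A minimal pure-Python sketch of the alternative, scoring once and then only predicting per sample; `ThresholdModel` is a hypothetical stand-in for a fitted estimator with a scikit-learn-style `predict`, and only accuracy is computed to keep it short:

```python
from typing import Dict, List, Sequence


class ThresholdModel:
    """Hypothetical stand-in for a fitted classifier: predicts 1 when feature 0 exceeds a threshold."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold

    def predict(self, rows: Sequence[Sequence[float]]) -> List[int]:
        return [1 if row[0] > self.threshold else 0 for row in rows]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def score_models(models: Dict[str, ThresholdModel],
                 X_eval: Sequence[Sequence[float]],
                 y_eval: Sequence[int]) -> Dict[str, float]:
    """Score every model once; the result does not depend on any sample to be predicted."""
    return {name: accuracy(y_eval, model.predict(X_eval)) for name, model in models.items()}


def predict_sample(models: Dict[str, ThresholdModel],
                   scores: Dict[str, float],
                   sample_row: Sequence[float]) -> List[dict]:
    """Predict one row with every model, reusing the precomputed scores."""
    rows = [
        {
            "Model": name,
            "Accuracy": round(scores[name] * 100, 2),
            "Predicted Churn": "Yes" if model.predict([sample_row])[0] == 1 else "No",
        }
        for name, model in models.items()
    ]
    return sorted(rows, key=lambda r: r["Accuracy"], reverse=True)


# Demo names chosen so they do not overwrite the notebook's `models` dict
demo_models = {"Model A": ThresholdModel(0.5), "Model B": ThresholdModel(2.0)}
X_demo = [[0.0], [1.0], [3.0], [0.2]]
y_demo = [0, 1, 1, 0]

demo_scores = score_models(demo_models, X_demo, y_demo)  # computed once
print(demo_scores)  # {'Model A': 1.0, 'Model B': 0.75}
print(predict_sample(demo_models, demo_scores, [1.5]))
```

In the notebook, the analogue would be to move the `transform(X)`, SMOTE, and `calculate_metrics` calls into a one-time step (ideally on a held-out test set rather than the full dataset) and keep only `model.predict(sample_trans)` inside `predict`.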

- Predict a random sample drawn from the dataset

In [19]:
# Draw one random row from the dataset and split it into features and label
sample = df.sample(1)
X_sample = sample.drop(columns=['Churn'])
y_sample = sample['Churn']

print(f'> Churn: {y_sample.values[0]}')
print('>>>> The result of prediction :')
predict(X_sample)

> Churn: 1
>>>> The result of prediction :


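|   | Model | Accuracy (%) | Recall (%) | F1 (%) | Precision (%) | Predicted Churn |
|---|-------|--------------|------------|--------|---------------|-----------------|
| 5 | Random Forest Classifier | 97.08 | 97.05 | 97.08 | 97.11 | Yes |
| 1 | Extra Trees Classifier | 96.26 | 97.98 | 96.33 | 94.73 | Yes |
| 3 | LGBM Classifier | 89.31 | 89.62 | 89.34 | 89.06 | Yes |
| 2 | Gradient Boosting Classifier | 88.9 | 89.38 | 88.95 | 88.53 | Yes |
| 6 | XGBoost Classifier | 86.97 | 87.17 | 87.0 | 86.82 | Yes |
| 0 | Ada Boost Classifier | 84.93 | 87.52 | 85.31 | 83.21 | Yes |
| 4 | Logistic Regression | 77.96 | 82.24 | 78.87 | 75.76 | Yes |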


- Predict churn for a new customer record

In [20]:
# Make predictions on new data
new_data = pd.DataFrame([{
    'gender': 0,
    'SeniorCitizen': 1,
    'Partner': 1,
    'Dependents': 0,
    'tenure': 67,
    'PhoneService': 1,
    'MultipleLines': 1,
    'InternetService': 'Fiber optic',
    'OnlineSecurity': 1,
    'OnlineBackup': 1,
    'DeviceProtection': 1,
    'TechSupport': 1,
    'StreamingTV': 0,
    'StreamingMovies': 1,
    'Contract': 'One year',
    'PaperlessBilling': 1,
    'PaymentMethod': 'Credit card (automatic)',
    'MonthlyCharges': 105.4,
    'TotalCharges': 7035.6,
}])

print('>>>> The result of prediction :')
predict(new_data)

>>>> The result of prediction :


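|   | Model | Accuracy (%) | Recall (%) | F1 (%) | Precision (%) | Predicted Churn |
|---|-------|--------------|------------|--------|---------------|-----------------|
| 5 | Random Forest Classifier | 97.08 | 97.05 | 97.08 | 97.11 | No |
| 1 | Extra Trees Classifier | 96.26 | 97.98 | 96.33 | 94.73 | No |
| 3 | LGBM Classifier | 89.31 | 89.62 | 89.34 | 89.06 | No |
| 2 | Gradient Boosting Classifier | 88.9 | 89.38 | 88.95 | 88.53 | No |
| 6 | XGBoost Classifier | 86.97 | 87.17 | 87.0 | 86.82 | No |
| 0 | Ada Boost Classifier | 84.93 | 87.52 | 85.31 | 83.21 | No |
| 4 | Logistic Regression | 77.96 | 82.24 | 78.87 | 75.76 | No |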
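`new_data` must contain the same feature columns the preprocessor was fitted on (the columns of `X`); a missing or misspelled key would only surface as the generic error message printed by `predict`. A stdlib-only sketch that compares a record's keys with the expected column list; `check_record_columns` is a hypothetical helper, and in the notebook `expected` would be `list(X.columns)`:

```python
from typing import Iterable, List, Mapping, Tuple


def check_record_columns(record: Mapping[str, object],
                         expected: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected) column names for one input record."""
    expected_list = list(expected)
    expected_set = set(expected_list)
    missing = [col for col in expected_list if col not in record]
    unexpected = [col for col in record if col not in expected_set]
    return missing, unexpected


# Shortened column list for illustration; 'tenure' is misspelled in the record
expected_columns = ["gender", "SeniorCitizen", "tenure", "MonthlyCharges"]
record = {"gender": 0, "SeniorCitizen": 1, "tenur": 67, "MonthlyCharges": 105.4}

print(check_record_columns(record, expected_columns))  # (['tenure'], ['tenur'])
```

Running such a check before `predict(new_data)` turns a vague failure into a specific list of columns to fix.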
