# **Day 53: Implementing a Customer Churn Prediction Model** 🚀

Customer churn prediction is a machine learning task that helps businesses identify at-risk customers early enough to retain them. This guide covers the key concepts first, then walks through a practical churn prediction pipeline.

---

## **What is Customer Churn?**
- **Customer Churn** refers to the loss of customers over a specific time period.  
- Predicting churn helps businesses take **proactive measures** to retain customers, saving costs and improving profitability.
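
As a small illustration, churn rate over a period is commonly computed as the number of customers lost divided by the number of customers at the start of the period. The sketch below assumes that definition; the function name and the numbers are made up for the example.

```python
def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Fraction of the starting customer base lost during the period."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    return customers_lost / customers_at_start


# Hypothetical numbers: 1,000 customers at the start of the month, 50 cancelled.
rate = churn_rate(1000, 50)
print(f"Monthly churn rate: {rate:.1%}")  # prints "Monthly churn rate: 5.0%"
```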

---

## **Key Steps in Churn Prediction**

### **1. Data Preprocessing**  
- **Data Cleaning**: Handle missing values to ensure the dataset is complete.  
- **Encoding**: Convert categorical variables (e.g., contract type, payment method) into numerical format.  
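- **Scaling**: Standardize numerical features (e.g., monthly charges, tenure) so they share a comparable scale. This matters for models such as Logistic Regression; tree-based models are largely insensitive to it.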
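
The three preprocessing steps above can be sketched with scikit-learn's `ColumnTransformer`. This is one possible arrangement, not the only one: the tiny dataset and its column names are assumptions for illustration, and the imputation strategies (median, most frequent) are arbitrary choices.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny made-up dataset; column names are illustrative assumptions.
df = pd.DataFrame({
    "Contract_Type": ["Monthly", "Yearly", np.nan, "Monthly"],
    "Monthly_Charges": [70.0, 45.5, 89.9, np.nan],
    "Tenure": [2, 36, 12, 5],
})

numeric_cols = ["Monthly_Charges", "Tenure"]
categorical_cols = ["Contract_Type"]

# Numeric columns: fill gaps with the median, then standardize.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical columns: fill gaps with the most frequent value, then one-hot encode.
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

preprocessor = ColumnTransformer([
    ("num", numeric_steps, numeric_cols),
    ("cat", categorical_steps, categorical_cols),
])

features = preprocessor.fit_transform(df)
print(features.shape)  # 2 scaled numeric columns + 2 one-hot columns
```

Bundling the steps this way means the same fitted transformations can later be applied to new data with `preprocessor.transform(...)`.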

---

### **2. Feature Engineering**  
- **Identify Relevant Features**: Select features like `contract type`, `tenure`, `monthly charges`, etc., that are likely to influence churn.  
- **Feature Creation**: Derive new features, such as **average monthly spend** or **tenure-to-charges ratio**, for better insights.  
- **Feature Selection**: Use methods like **correlation analysis** or **feature importance** to retain the most predictive features.
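
A minimal sketch of feature creation and correlation-based ranking with pandas. The records and column names (including `Total_Charges`) are made up for illustration, and the ratio assumes `Tenure` is never zero.

```python
import pandas as pd

# Made-up records; column names are assumptions for illustration.
df = pd.DataFrame({
    "Tenure": [1, 24, 6, 48, 3, 60],  # months as a customer
    "Total_Charges": [80.0, 1200.0, 450.0, 2400.0, 270.0, 3300.0],
    "Monthly_Charges": [80.0, 50.0, 75.0, 50.0, 90.0, 55.0],
    "Churn": [1, 0, 1, 0, 1, 0],  # 1 = churned, 0 = stayed
})

# Feature creation: average spend per month of tenure (assumes Tenure > 0).
df["Avg_Monthly_Spend"] = df["Total_Charges"] / df["Tenure"]

# Feature selection: rank features by absolute correlation with the churn label.
correlations = (
    df.drop(columns="Churn")
      .corrwith(df["Churn"])
      .abs()
      .sort_values(ascending=False)
)
print(correlations)
```

Correlation only captures linear relationships, so a low value here does not prove a feature is useless.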

---

### **3. Model Selection and Training**  
- Use **supervised learning algorithms** such as:  
  - **Logistic Regression**: For interpretable, probabilistic predictions.  
  - **Decision Trees**: For straightforward, rule-based insights.  
  - **Random Forests**: For robust predictions by combining multiple decision trees.  
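
One way to choose among these candidates is to compare them with cross-validation. The sketch below uses synthetic data from `make_classification` as a stand-in for a real churn dataset; the hyperparameters shown are arbitrary assumptions, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a churn dataset: roughly 20% positive (churn) class.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4,
    weights=[0.8, 0.2], random_state=42,
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(max_depth=4, random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# 5-fold cross-validated ROC-AUC for each candidate model.
cv_scores = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    cv_scores[name] = scores.mean()
    print(f"{name}: mean ROC-AUC = {scores.mean():.3f}")
```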

---

### **4. Model Evaluation**  
Evaluate the model on unseen data using key metrics:  
- **Accuracy**: Proportion of all customers (churn and non-churn) classified correctly. It can be misleading when churners are a small minority.  
- **Precision**: Proportion of predicted churners who actually churned.  
- **Recall**: Proportion of actual churners the model identifies.  
- **F1-Score**: Harmonic mean of precision and recall.  
- **ROC-AUC**: Measures the model’s capability to distinguish between churn and non-churn customers.
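
To make the definitions concrete, the sketch below computes the first four metrics by hand from a confusion matrix and cross-checks them against scikit-learn. The labels are made up for the example.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Made-up labels: 1 = churned, 0 = stayed.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

# For binary labels, ravel() yields the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f}")    # accuracy=0.700
print(f"precision={precision:.3f}")  # precision=0.667
print(f"recall={recall:.3f}")        # recall=0.500
print(f"f1={f1:.3f}")                # f1=0.571

# The hand-computed values match scikit-learn's implementations.
assert abs(accuracy - accuracy_score(y_true, y_pred)) < 1e-12
assert abs(precision - precision_score(y_true, y_pred)) < 1e-12
assert abs(recall - recall_score(y_true, y_pred)) < 1e-12
assert abs(f1 - f1_score(y_true, y_pred)) < 1e-12
```

Here the model reaches 70% accuracy while finding only half of the churners, which is why recall and F1 are worth reporting alongside accuracy.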

---

## Practical: Building the Churn Prediction Pipeline
### Steps to Build the Pipeline:
#### 1. Import Required Libraries:

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.preprocessing import StandardScaler, LabelEncoder

#### 2. Load the Dataset:

In [None]:
data = pd.read_csv("customer_churn.csv")
print(data.head())

#### 3. Data Preprocessing:

- Handle missing values.
- Encode categorical variables using LabelEncoder or pd.get_dummies.
- Scale numerical features using StandardScaler.

In [None]:
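# Forward-fill missing values (fillna(method='ffill') is deprecated in recent pandas).
data = data.ffill()

# Encode contract type as integer codes. LabelEncoder assigns an arbitrary order,
# so pd.get_dummies is a common alternative for nominal categories.
le = LabelEncoder()
data['Contract_Type'] = le.fit_transform(data['Contract_Type'])

# Standardize numeric columns to zero mean and unit variance.
# Note: fitting the scaler before the train/test split lets test-set statistics
# leak into training; a stricter setup fits it on the training split only.
scaler = StandardScaler()
num_cols = ['Monthly_Charges', 'Tenure']
data[num_cols] = scaler.fit_transform(data[num_cols])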

#### 4. Feature Engineering:

- Identify important features or use all available features.
- Select features based on correlation or feature importance from a model like Random Forest.

In [None]:
X = data.drop(['Churn'], axis=1)
y = data['Churn']
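
A minimal sketch of feature selection using Random Forest importances. It runs on synthetic data so it is self-contained; the feature names and the cutoff of three features are hypothetical choices for illustration.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data; the feature names are hypothetical.
X_arr, y_demo = make_classification(
    n_samples=400, n_features=6, n_informative=3, n_redundant=0,
    random_state=42,
)
feature_names = [f"feature_{i}" for i in range(X_arr.shape[1])]
X_demo = pd.DataFrame(X_arr, columns=feature_names)

forest = RandomForestClassifier(n_estimators=200, random_state=42)
forest.fit(X_demo, y_demo)

# Impurity-based importances sum to 1; higher means more useful for splits.
importances = pd.Series(forest.feature_importances_, index=feature_names)
importances = importances.sort_values(ascending=False)
print(importances)

# Keep the top 3 features (the cutoff of 3 is an arbitrary choice here).
top_features = importances.head(3).index.tolist()
X_selected = X_demo[top_features]
```

In a real project the importances would usually be computed on the training split only, so that the test set plays no part in choosing features.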

#### 5. Split the Data:

In [None]:
# stratify=y keeps the churn/non-churn ratio the same in the train and test sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
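
Two details of splitting are easy to get wrong with imbalanced churn data: keeping the class ratio the same in both splits, and fitting the scaler on the training split only. The sketch below shows both on synthetic data; the array shapes and the 20% churn rate are assumptions for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Synthetic stand-in: 1,000 customers, 2 numeric features, 20% churners.
X_demo = rng.normal(loc=50.0, scale=10.0, size=(1000, 2))
y_demo = np.array([1] * 200 + [0] * 800)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Stratification keeps the churn rate the same in both splits.
print(y_tr.mean(), y_te.mean())

# Fit the scaler on the training split only, then reuse it on the test split.
demo_scaler = StandardScaler()
X_tr_scaled = demo_scaler.fit_transform(X_tr)
X_te_scaled = demo_scaler.transform(X_te)
```

Because the test split is transformed with statistics learned from the training split, its columns will be close to, but not exactly, zero mean and unit variance.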

#### 6. Train a Model:

In [None]:
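# Logistic Regression: raise max_iter so the solver has room to converge.
log_reg = LogisticRegression(max_iter=1000, random_state=42)
log_reg.fit(X_train, y_train)

# Random Forest: an ensemble of decision trees, seeded for reproducibility.
rf_model = RandomForestClassifier(random_state=42)
rf_model.fit(X_train, y_train)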
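
The imports above include `GridSearchCV`, which can tune hyperparameters with cross-validation. A minimal sketch on synthetic data follows; the parameter grid is a small, arbitrary example rather than a recommended search space.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for a churn dataset with roughly 20% churners.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=4,
    weights=[0.8, 0.2], random_state=42,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Small, arbitrary grid: 2 x 2 = 4 candidate settings.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, None],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    scoring="roc_auc",  # rank candidates by cross-validated ROC-AUC
    cv=3,
)
search.fit(X_tr, y_tr)

print("Best parameters:", search.best_params_)
print("Best CV ROC-AUC:", round(search.best_score_, 3))

# The best setting is refit on the full training split by default.
best_rf = search.best_estimator_
```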

#### 7. Evaluate the Model:

In [None]:
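# Evaluate both trained models on the held-out test set.
for name, model in [("Logistic Regression", log_reg), ("Random Forest", rf_model)]:
    y_pred = model.predict(X_test)
    print(name)
    print(classification_report(y_test, y_pred))

    # ROC-AUC uses the predicted probability of the positive (churn) class.
    y_pred_proba = model.predict_proba(X_test)[:, 1]
    print("ROC-AUC Score:", roc_auc_score(y_test, y_pred_proba))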