# Customer Churn Prediction at a Telecom Company

This document outlines the process of predicting customer churn using logistic regression. We'll handle both binary and multi-class classification scenarios to identify different types of churn.

## Scenario 1: Binary Classification

### Dataset Creation

Create a dataset containing the following features and a binary target variable for churn:

| Feature         | Description                                              |
|-----------------|----------------------------------------------------------|
| Age             | Customer's age                                           |
| MonthlyCharges  | Monthly billing amount                                   |
| ContractType    | Type of customer contract (Month-to-month, One year, Two year) |
| Tenure          | Number of months the customer has been with the company  |
| Churn           | Binary indicator (1 for churned, 0 for not churned)      |

### Logistic Regression Model Implementation

#### Steps to Implement the Model

1. **Normalize the continuous features**: Scale the features Age, MonthlyCharges, and Tenure to have a mean of 0 and a standard deviation of 1.
2. **Initialize the parameters randomly**: Start with random weights and bias for the logistic regression model.
3. **Gradient Descent**: Use gradient descent to minimize the logistic cost function iteratively.
4. **Determine Optimal Parameters**: Find the set of parameters that yields the lowest value of the cost function.
5. **Model Evaluation**:
   - Use accuracy and confusion matrix to evaluate the model.
   - Implement cross-validation to assess the model's performance reliably.
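
For reference, steps 3 and 4 can be written out explicitly. The notation here is a standard formulation and an assumption about what the outline intends: $\beta$ is the parameter vector (with $\beta_0$ the bias), $x^{(i)}$ is the $i$-th feature vector with a leading 1, $m$ is the number of training examples, and $\alpha$ is the learning rate.

$$
\hat{y}^{(i)} = \sigma\left(\beta^\top x^{(i)}\right), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}
$$

$$
J(\beta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log \hat{y}^{(i)} + \left(1 - y^{(i)}\right) \log\left(1 - \hat{y}^{(i)}\right) \right]
$$

$$
\frac{\partial J}{\partial \beta_k} = \frac{1}{m} \sum_{i=1}^{m} \left(\hat{y}^{(i)} - y^{(i)}\right) x_k^{(i)}, \qquad \beta_k \leftarrow \beta_k - \alpha \, \frac{\partial J}{\partial \beta_k}
$$

Each gradient descent iteration evaluates the gradient over all $m$ examples and then updates every $\beta_k$ simultaneously.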

## Scenario 2: Multi-Class Classification

### Dataset Modification

Extend the dataset to include three classes of churn:

| Feature         | Description                                              |
|-----------------|----------------------------------------------------------|
| Churn Type      | Categorical variable (No Churn, Voluntary Churn, Involuntary Churn) |

### Logistic Regression for Multi-Class

#### Adaptations for Multi-Class

1. **Adjust the Model for a One-vs-Rest Strategy**: Implement a one-vs-rest (OvR) strategy, also called one-vs-all, in which a separate logistic regression classifier is trained for each class against all other classes. At prediction time, the class whose classifier reports the highest probability is chosen, which extends logistic regression to multi-class problems.
2. **Training and Optimization**: Train each classifier using the same steps as in the binary classification scenario.
3. **Evaluation**: Extend the evaluation metrics to include class-specific accuracy, overall accuracy, and a multi-class confusion matrix.
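
The one-vs-rest idea can be sketched in a few lines. The helper names `ovr_labels` and `ovr_predict` and the toy probabilities are illustrative assumptions, not part of the original notebook.

```python
def ovr_labels(y, positive_class):
    """Relabel a multi-class target as 1 for `positive_class` and 0 otherwise."""
    return [1 if label == positive_class else 0 for label in y]


def ovr_predict(class_probabilities):
    """Pick the class whose binary classifier reports the highest probability.

    `class_probabilities` maps class label -> probability from that class's
    one-vs-rest classifier. Ties go to the first class in insertion order.
    """
    return max(class_probabilities, key=class_probabilities.get)


# Toy example: three churn types encoded as 0, 1, 2.
y = [0, 2, 1, 0, 2]
labels_for_class_2 = ovr_labels(y, positive_class=2)

# Hypothetical outputs of the three binary classifiers for one customer.
predicted = ovr_predict({0: 0.20, 1: 0.35, 2: 0.60})
print(labels_for_class_2, predicted)
```

Because the three classifiers are trained independently, their probabilities need not sum to 1; only their ranking matters for the prediction.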

## Conclusion

The ability to predict and analyze customer churn can significantly help in devising better retention strategies and understanding customer behavior. This predictive modeling process is crucial for maintaining competitive advantage in the telecom industry.


In [1]:
data = {
    'Age': [25, 30, 35, 40, 45, 50, 55, 60, 20, 22, 28, 33, 38, 43, 48, 53, 58, 63, 23, 26, 31, 36, 41, 46, 51, 56, 61, 66, 21, 24, 29, 34, 39, 44, 49, 54, 59, 64, 27, 32, 37, 42, 47, 52, 57, 62, 67, 68, 69, 65],
    'MonthlyCharges': [29.99, 56.95, 42.30, 89.10, 115.50, 99.45, 85.65, 75.00, 49.95, 60.00, 70.05, 90.45, 101.10, 109.45, 58.75, 64.95, 80.65, 96.75, 102.55, 111.85, 50.30, 65.10, 79.90, 93.70, 107.50, 55.20, 69.40, 83.60, 97.80, 30.40, 71.55, 85.75, 100.95, 60.85, 75.05, 89.25, 103.45, 54.15, 68.35, 82.55, 31.75, 45.95, 77.15, 91.35, 105.55, 119.75, 34.35, 48.55, 62.75, 84.95],
    'ContractType': [1, 1, 2, 3, 3, 2, 1, 3, 1, 1, 2, 2, 3, 3, 1, 1, 2, 3, 3, 1, 2, 2, 3, 3, 1, 1, 2, 3, 3, 1, 2, 2, 3, 3, 1, 1, 2, 3, 3, 1, 2, 2, 3, 3, 1, 1, 2, 3, 3, 1],
    'Tenure': [2, 12, 24, 36, 48, 60, 72, 30, 4, 6, 18, 29, 41, 53, 10, 22, 34, 46, 58, 3, 15, 27, 39, 51, 63, 5, 17, 40, 52, 8, 20, 32, 44, 56, 7, 19, 31, 43, 9, 21, 33, 45, 57, 11, 23, 35, 47, 59, 14, 25],
    'Churn': [0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0]
}

In [2]:
import pandas as pd
df = pd.DataFrame(data)
df.head()

,Age,MonthlyCharges,ContractType,Tenure,Churn
0,25,29.99,1,2,0
1,30,56.95,1,12,0
2,35,42.3,2,24,0
3,40,89.1,3,36,0
4,45,115.5,3,48,1


In [3]:
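# Min-max scale the continuous features to [0, 1].
# Churn (target) and ContractType (categorical code) are left unchanged.
# Note: this is min-max scaling, not the mean-0 / std-1 standardization
# described in step 1 of the outline.
def normalize(df):
    result = df.copy()
    for feature in df.columns:
        if feature == 'Churn' or feature == 'ContractType':
            continue
        max_value = df[feature].max()
        min_value = df[feature].min()
        result[feature] = (df[feature] - min_value)/(max_value - min_value)
    return result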

df = normalize(df)
df.head()

,Age,MonthlyCharges,ContractType,Tenure,Churn
0,0.102041,0.0,1,0.0,0
1,0.204082,0.300357,1,0.142857,0
2,0.306122,0.137143,2,0.314286,0
3,0.408163,0.658534,3,0.485714,0
4,0.510204,0.952652,3,0.657143,1
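
The cell above maps each continuous column to the range [0, 1]. Step 1 of the outline asks instead for a mean of 0 and a standard deviation of 1; a minimal sketch of that alternative follows. The function name `standardize` is hypothetical, and using the population standard deviation is an assumption.

```python
import statistics


def standardize(values):
    """Return z-scores: (value - mean) / population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# First five ages from the dataset, used only as a small illustration.
ages = [25, 30, 35, 40, 45]
scaled = standardize(ages)
print([round(v, 3) for v in scaled])
```

To avoid leaking test-set information, the mean and standard deviation would normally be computed on the training rows only and then reused to transform the test rows.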


In [4]:
X_train = df[['Age', 'MonthlyCharges', 'ContractType', 'Tenure']][0: 35]
y_train = df['Churn'][0: 35]
X_test = df[['Age', 'MonthlyCharges', 'ContractType', 'Tenure']][35: 50]
y_test = df['Churn'][35: 50]
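
The split above takes the first 35 rows for training and the last 15 for testing, without shuffling. Step 5 of the outline also calls for cross-validation; one way to generate shuffled k-fold index splits is sketched here. The name `k_fold_indices`, the fold count, and the seed are illustrative assumptions.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) index lists for k shuffled folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(50, k=5))
for train_idx, test_idx in folds:
    print(len(train_idx), len(test_idx))
```

With pandas, `X.iloc[train_idx]` and `X.iloc[test_idx]` select the rows for each fold. Averaging the accuracy over the folds gives a steadier estimate than a single 35/15 split.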

In [5]:
# initializing the parameters randomly
import random
beta_0 = random.random()
beta_1 = random.random()
beta_2 = random.random()
beta_3 = random.random()
beta_4 = random.random()
print(f" beta 0: {beta_0} beta 1: {beta_1} beta 2: {beta_2} beta 3: {beta_3} beta 4: {beta_4}")

 beta 0: 0.9778149759647619 beta 1: 0.6429788331088573 beta 2: 0.48057910848577534 beta 3: 0.3232664705153617 beta 4: 0.027957316048658365


In [6]:
# Batch gradient descent for logistic regression: minimizes the average
# cross-entropy (log-loss) cost over the training set.
import math
def gradient_descent(X, y, alpha, n_iter, beta_0, beta_1, beta_2, beta_3, beta_4):
    m = len(y)
    cost_func_prev = math.inf
    for i in range(n_iter):
        # accumulate the log-likelihood and the gradient over all m samples
        cost_func = 0
        grad_0 = 0
        grad_1 = 0
        grad_2 = 0
        grad_3 = 0
        grad_4 = 0
        for j in range(m):
            z = beta_0 + beta_1 * X['Age'].iloc[j] + beta_2 * X['MonthlyCharges'].iloc[j] + beta_3 * X['ContractType'].iloc[j] + beta_4 * X['Tenure'].iloc[j]
            y_pred = 1/(1 + math.exp(-z))  # sigmoid
            cost_func += y.iloc[j] * math.log(y_pred) + (1 - y.iloc[j]) * math.log(1 - y_pred)
            grad_0 += (y_pred - y.iloc[j])
            grad_1 += (y_pred - y.iloc[j])*X['Age'].iloc[j]
            grad_2 += (y_pred - y.iloc[j])*X['MonthlyCharges'].iloc[j]
            grad_3 += (y_pred - y.iloc[j])*X['ContractType'].iloc[j]
            grad_4 += (y_pred - y.iloc[j])*X['Tenure'].iloc[j]
        cost_func = -cost_func/m  # average negative log-likelihood

        # simultaneous update of all parameters
        beta_0 = beta_0 - alpha * grad_0/m
        beta_1 = beta_1 - alpha * grad_1/m
        beta_2 = beta_2 - alpha * grad_2/m
        beta_3 = beta_3 - alpha * grad_3/m
        beta_4 = beta_4 - alpha * grad_4/m
        # NOTE: cost_func_prev is never reassigned, so this difference is always
        # infinite, the test never passes, and all n_iter iterations run.
        if cost_func_prev - cost_func < 0.01:
            break

    # Tenure value on the decision boundary (z = 0) given each row's other features
    decision_boundary = - (beta_0/beta_4 + beta_1/beta_4 * X['Age'] + beta_2/beta_4 * X['MonthlyCharges'] + beta_3/beta_4 * X['ContractType'])
    return beta_0, beta_1, beta_2, beta_3, beta_4, decision_boundary
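
The loop above updates each coefficient separately. The same batch update can be written with NumPy arrays; the sketch here is an assumed equivalent, and the function name `gradient_descent_vectorized`, the zero initialization, the tolerance, and the toy data are not from the notebook. It records the cost at every iteration, so the early-stopping comparison uses the actual previous cost.

```python
import numpy as np


def gradient_descent_vectorized(X, y, alpha=0.1, n_iter=5000, tol=1e-8):
    """Batch gradient descent for logistic regression.

    X: (m, n) feature array, y: (m,) array of 0/1 labels.
    Returns the coefficient vector (intercept first) and the cost history.
    """
    m = X.shape[0]
    Xb = np.hstack([np.ones((m, 1)), X])       # column of ones for the intercept
    beta = np.zeros(Xb.shape[1])
    costs = []
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ beta))   # predicted probabilities
        p = np.clip(p, 1e-12, 1 - 1e-12)       # keep log() finite
        cost = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
        costs.append(cost)
        beta -= alpha * (Xb.T @ (p - y)) / m   # gradient step on all coefficients
        # Stop once the cost improvement falls below the tolerance.
        if len(costs) > 1 and costs[-2] - costs[-1] < tol:
            break
    return beta, costs


# Toy data: one feature, label is 1 when the feature is large.
X_toy = np.array([[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]])
y_toy = np.array([0, 0, 0, 1, 1, 1])
beta, costs = gradient_descent_vectorized(X_toy, y_toy)
print(beta, costs[0], costs[-1])
```

For the notebook's data, `X_train.to_numpy()` and `y_train.to_numpy()` would supply the arrays.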
    

In [7]:
# determining the optimal parameters
beta_0, beta_1, beta_2, beta_3, beta_4, decision_boundary = gradient_descent(X_train, y_train, 0.01, 10000, beta_0, beta_1, beta_2, beta_3, beta_4)
print(f"Optimal parameters are: beta 0: {beta_0} beta 1: {beta_1} beta 2: {beta_2} beta 3: {beta_3} beta 4: {beta_4}")

Optimal parameters are: beta 0: -1.1385903520471954 beta 1: -1.4609631743502787 beta 2: 0.04235458788333859 beta 3: 0.3958846800292466 beta 4: 1.4886498288821421


In [8]:
y_predict = []
for i in range(len(y_test)):
    z = beta_0 + beta_1 * X_test['Age'].iloc[i] + beta_2 * X_test['MonthlyCharges'].iloc[i] + beta_3 * X_test['ContractType'].iloc[i] + beta_4 * X_test['Tenure'].iloc[i]
    y_pred = 1/(1 + math.exp(-z))
    if y_pred >= 0.5:
        y_predict.append(1)
    else:
        y_predict.append(0)

print(f"Predicted values: {y_predict}")

Predicted values: [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
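
`math.exp(-z)` raises `OverflowError` when `z` is a large negative number (below roughly -709). That does not occur with the small coefficients found here, but a numerically stable sigmoid is a cheap safeguard. The helper names in this sketch are hypothetical.

```python
import math


def stable_sigmoid(z):
    """Sigmoid that avoids overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0, so exp(z) lies in [0, 1) and cannot overflow
    return e / (1.0 + e)


def predict_label(features, betas, threshold=0.5):
    """Return 1 if the predicted probability reaches `threshold`, else 0.

    `betas[0]` is the intercept; `betas[1:]` align with `features`.
    """
    z = betas[0] + sum(b * x for b, x in zip(betas[1:], features))
    return 1 if stable_sigmoid(z) >= threshold else 0


print(stable_sigmoid(-1000.0), stable_sigmoid(0.0), stable_sigmoid(1000.0))
```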


In [9]:
# evaluating the model's performance using a confusion matrix and accuracy
def confusion_matrix(y_test, y_predict):
    TP = 0
    TN = 0
    FP = 0
    FN = 0
    for i in range(len(y_test)):
        if y_test.iloc[i] == 1 and y_predict[i] == 1:
            TP += 1
        elif y_test.iloc[i] == 0 and y_predict[i] == 0:
            TN += 1
        elif y_test.iloc[i] == 0 and y_predict[i] == 1:
            FP += 1
        else:
            FN += 1
    return TP, TN, FP, FN

TP, TN, FP, FN = confusion_matrix(y_test, y_predict)
accuracy = (TP + TN)/(TP + TN + FP + FN)
print(f"Accuracy: {accuracy}")
print(f"Confusion matrix: TP: {TP} TN: {TN} FP: {FP} FN: {FN}")

Accuracy: 0.5333333333333333
Confusion matrix: TP: 1 TN: 7 FP: 1 FN: 6
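
Accuracy alone hides how the errors are distributed. Using the counts printed above (TP 1, TN 7, FP 1, FN 6), precision, recall, and F1 can be derived as in this sketch. The function name `precision_recall_f1` is an illustrative assumption.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts copied from the confusion matrix output above.
precision, recall, f1 = precision_recall_f1(tp=1, fp=1, fn=6)
print(f"Precision: {precision:.3f} Recall: {recall:.3f} F1: {f1:.3f}")
```

With these counts the model flags only 1 of the 7 customers who actually churned in the test set, which the 0.53 accuracy does not make obvious.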


In [10]:
# making the dataset for 3 classes
data2 = {
    'Age': [25, 30, 35, 40, 45, 50, 55, 60, 20, 22, 28, 33, 38, 43, 48, 53, 58, 63, 23, 26, 31, 36, 41, 46, 51, 56, 61, 66, 21, 24, 29, 34, 39, 44, 49, 54, 59, 64, 27, 32, 37, 42, 47, 52, 57, 62, 67, 68, 69, 65],
    'MonthlyCharges': [29.99, 56.95, 42.30, 89.10, 115.50, 99.45, 85.65, 75.00, 49.95, 60.00, 70.05, 90.45, 101.10, 109.45, 58.75, 64.95, 80.65, 96.75, 102.55, 111.85, 50.30, 65.10, 79.90, 93.70, 107.50, 55.20, 69.40, 83.60, 97.80, 30.40, 71.55, 85.75, 100.95, 60.85, 75.05, 89.25, 103.45, 54.15, 68.35, 82.55, 31.75, 45.95, 77.15, 91.35, 105.55, 119.75, 34.35, 48.55, 62.75, 84.95],
    'ContractType': [1, 1, 2, 3, 3, 2, 1, 3, 1, 1, 2, 2, 3, 3, 1, 1, 2, 3, 3, 1, 2, 2, 3, 3, 1, 1, 2, 3, 3, 1, 2, 2, 3, 3, 1, 1, 2, 3, 3, 1, 2, 2, 3, 3, 1, 1, 2, 3, 3, 1],
    'Tenure': [2, 12, 24, 36, 48, 60, 72, 30, 4, 6, 18, 29, 41, 53, 10, 22, 34, 46, 58, 3, 15, 27, 39, 51, 63, 5, 17, 40, 52, 8, 20, 32, 44, 56, 7, 19, 31, 43, 9, 21, 33, 45, 57, 11, 23, 35, 47, 59, 14, 25],
    'Churn': [0, 0, 2, 0, 1, 0, 1, 0, 2, 0, 2, 1, 0, 1, 2, 0, 1, 0, 2, 0, 2, 1, 2, 1, 0, 2, 1, 0, 1, 0, 1, 1, 2, 1, 2, 0, 2, 2, 1, 0, 2, 1, 2, 2, 0, 0, 2, 2, 1, 0]
}

In [11]:
df2 = pd.DataFrame(data2)
df2.head()

,Age,MonthlyCharges,ContractType,Tenure,Churn
0,25,29.99,1,2,0
1,30,56.95,1,12,0
2,35,42.3,2,24,2
3,40,89.1,3,36,0
4,45,115.5,3,48,1


In [12]:
df2 = normalize(df2)
df2.head()

,Age,MonthlyCharges,ContractType,Tenure,Churn
0,0.102041,0.0,1,0.0,0
1,0.204082,0.300357,1,0.142857,0
2,0.306122,0.137143,2,0.314286,2
3,0.408163,0.658534,3,0.485714,0
4,0.510204,0.952652,3,0.657143,1


In [13]:
X_train2 = df2[['Age', 'MonthlyCharges', 'ContractType', 'Tenure']][0: 35]
y_train2 = df2['Churn'][0: 35]
X_test2 = df2[['Age', 'MonthlyCharges', 'ContractType', 'Tenure']][35: 50]
y_test2 = df2['Churn'][35: 50]

In [14]:
# Build one-vs-rest targets: 1 where y equals class0, 0 otherwise.
# class1 and class2 are not used; they only document which classes form the "rest".
def class_change(class0, class1, class2, y):
    y_new = []
    for i in range(len(y)):
        if y.iloc[i] == class0:
            y_new.append(1)
        else:
            y_new.append(0)
    return y_new
y_train2_0_vs_all = pd.Series(class_change(0, 1, 2, y_train2))
y_train2_1_vs_all = pd.Series(class_change(1, 0, 2, y_train2))
y_train2_2_vs_all = pd.Series(class_change(2, 0, 1, y_train2))

In [15]:
y_predict_new = []
beta_0_new_class0, beta_1_new_class0, beta_2_new_class0, beta_3_new_class0, beta_4_new_class0, decision_boundary_new_class0 = gradient_descent(X_train2, y_train2_0_vs_all, 0.01, 10000, beta_0, beta_1, beta_2, beta_3, beta_4)
beta_0_new_class1, beta_1_new_class1, beta_2_new_class1, beta_3_new_class1, beta_4_new_class1, decision_boundary_new_class1 = gradient_descent(X_train2, y_train2_1_vs_all, 0.01, 10000, beta_0, beta_1, beta_2, beta_3, beta_4)
beta_0_new_class2, beta_1_new_class2, beta_2_new_class2, beta_3_new_class2, beta_4_new_class2, decision_boundary_new_class2 = gradient_descent(X_train2, y_train2_2_vs_all, 0.01, 10000, beta_0, beta_1, beta_2, beta_3, beta_4)
for i in range(len(y_test2)):
    z_class0 = beta_0_new_class0 + beta_1_new_class0 * X_test2['Age'].iloc[i] + beta_2_new_class0 * X_test2['MonthlyCharges'].iloc[i] + beta_3_new_class0 * X_test2['ContractType'].iloc[i] + beta_4_new_class0 * X_test2['Tenure'].iloc[i]
    z_class1 = beta_0_new_class1 + beta_1_new_class1 * X_test2['Age'].iloc[i] + beta_2_new_class1 * X_test2['MonthlyCharges'].iloc[i] + beta_3_new_class1 * X_test2['ContractType'].iloc[i] + beta_4_new_class1 * X_test2['Tenure'].iloc[i]
    z_class2 = beta_0_new_class2 + beta_1_new_class2 * X_test2['Age'].iloc[i] + beta_2_new_class2 * X_test2['MonthlyCharges'].iloc[i] + beta_3_new_class2 * X_test2['ContractType'].iloc[i] + beta_4_new_class2 * X_test2['Tenure'].iloc[i]
    y_pred_class0 = 1/(1 + math.exp(-z_class0))
    y_pred_class1 = 1/(1 + math.exp(-z_class1))
    y_pred_class2 = 1/(1 + math.exp(-z_class2))
    if max(y_pred_class0, y_pred_class1, y_pred_class2) == y_pred_class0:
        y_predict_new.append(0)
    elif max(y_pred_class0, y_pred_class1, y_pred_class2) == y_pred_class1:
        y_predict_new.append(1)
    else:
        y_predict_new.append(2)
print(f"Predicted values: {y_predict_new}")


Predicted values: [0, 0, 1, 2, 0, 2, 1, 1, 0, 0, 0, 0, 1, 0, 0]


In [16]:
# evaluating the model's performance with a per-class (one-vs-rest) confusion matrix and accuracy
def confusion_matrix(y_test, y_predict, label):
    # `label` is treated as the positive class; every other class is negative.
    TP = 0
    TN = 0
    FP = 0
    FN = 0
    for i in range(len(y_test)):
        actual_is_label = y_test.iloc[i] == label
        predicted_is_label = y_predict[i] == label
        if actual_is_label and predicted_is_label:
            TP += 1
        elif not actual_is_label and not predicted_is_label:
            TN += 1
        elif not actual_is_label and predicted_is_label:
            FP += 1
        else:
            FN += 1
    return TP, TN, FP, FN

TP_class0, TN_class0, FP_class0, FN_class0 = confusion_matrix(y_test2, y_predict_new, 0)
accuracy_class0 = (TP_class0 + TN_class0)/(TP_class0 + TN_class0 + FP_class0 + FN_class0)
print(f"Accuracy class 0: {accuracy_class0}")
print(f"Confusion matrix class 0: TP: {TP_class0} TN: {TN_class0} FP: {FP_class0} FN: {FN_class0}")
TP_class1, TN_class1, FP_class1, FN_class1 = confusion_matrix(y_test2, y_predict_new, 1)
accuracy_class1 = (TP_class1 + TN_class1)/(TP_class1 + TN_class1 + FP_class1 + FN_class1)
print(f"Accuracy class 1: {accuracy_class1}")
print(f"Confusion matrix class 1: TP: {TP_class1} TN: {TN_class1} FP: {FP_class1} FN: {FN_class1}")
TP_class2, TN_class2, FP_class2, FN_class2 = confusion_matrix(y_test2, y_predict_new, 2)
accuracy_class2 = (TP_class2 + TN_class2)/(TP_class2 + TN_class2 + FP_class2 + FN_class2)
print(f"Accuracy class 2: {accuracy_class2}")
print(f"Confusion matrix class 2: TP: {TP_class2} TN: {TN_class2} FP: {FP_class2} FN: {FN_class2}")
        


Accuracy class 0: 0.7333333333333333
Confusion matrix class 0: TP: 5 TN: 6 FP: 4 FN: 0
Accuracy class 1: 0.6666666666666666
Confusion matrix class 1: TP: 1 TN: 9 FP: 3 FN: 2
Accuracy class 2: 0.5333333333333333
Confusion matrix class 2: TP: 1 TN: 7 FP: 1 FN: 6
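
The outline also asks for overall accuracy and a multi-class confusion matrix. A minimal sketch follows, using the test labels from `data2` (rows 35 to 49) and the predictions printed earlier, copied in as literals so the block runs on its own. `multiclass_confusion_matrix` is a hypothetical helper.

```python
def multiclass_confusion_matrix(y_true, y_pred, n_classes):
    """Rows are actual classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual][predicted] += 1
    return matrix


# Actual churn types for test rows 35..49 and the one-vs-rest predictions.
y_true = [0, 2, 2, 1, 0, 2, 1, 2, 2, 0, 0, 2, 2, 1, 0]
y_pred = [0, 0, 1, 2, 0, 2, 1, 1, 0, 0, 0, 0, 1, 0, 0]

matrix = multiclass_confusion_matrix(y_true, y_pred, n_classes=3)
# Correct predictions lie on the diagonal.
overall_accuracy = sum(matrix[c][c] for c in range(3)) / len(y_true)

for row in matrix:
    print(row)
print(f"Overall accuracy: {overall_accuracy:.4f}")
```

The diagonal sums to 7 of 15 test rows, so overall accuracy is about 0.47. This is lower than any of the per-class accuracies because those also count true negatives.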
