<a href="https://colab.research.google.com/github/Superbom99/MADT8101-SEMINAR-IN-ADVANCED-ANALYTICS/blob/main/Churn_Scoring.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Churn Scoring

## **Introduction**

Customer churn, when customers stop doing business with a company, is a major problem for businesses of all sizes. It leads to lost revenue and higher marketing costs to replace the customers who leave, and it is often a sign of declining customer satisfaction. By predicting which customers are most likely to churn, businesses can target retention efforts at those customers before they leave.

Churn scoring is a machine learning technique for predicting customer churn. It involves training a classification model on historical customer data, labeled with whether each customer churned, and then using the model to assign each customer a score, typically the predicted probability of churning. The higher the score, the more likely the customer is to churn.
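To make the idea of a churn score concrete, here is a minimal sketch on synthetic data. The two features, their ranges, and the rule that generates the churn labels are all invented for illustration. The sketch trains a classifier on past customers with known outcomes. It then uses scikit-learn's `predict_proba` to score new customers, taking the score to be the predicted probability of the churn class (1).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic history (invented): tenure in months and monthly charges for 500 past customers.
tenure = rng.integers(1, 73, size=500)
monthly_charges = rng.uniform(20, 120, size=500)
X_hist = np.column_stack([tenure, monthly_charges])

# Made-up labeling rule: short-tenure, high-charge customers are more likely to churn.
churn_prob = 1 / (1 + np.exp(0.08 * tenure - 0.03 * monthly_charges))
y_hist = (rng.uniform(size=500) < churn_prob).astype(int)

demo_model = RandomForestClassifier(n_estimators=100, random_state=0)
demo_model.fit(X_hist, y_hist)

# Score three current customers: column 1 of predict_proba is P(churn = 1).
X_current = np.array([[2, 110.0], [36, 60.0], [70, 25.0]])
scores = demo_model.predict_proba(X_current)[:, 1]

# Rank customers from most to least likely to churn.
ranking = np.argsort(-scores)
for i in ranking:
    print(f'customer {i}: churn score {scores[i]:.2f}')
```

The ranked list is what a retention team would typically work from, starting with the highest scores.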

In this blog post, we will walk through the steps involved in churn scoring using Python. We will use a public dataset of customer churn data from a telecommunications company.

### **Step 1: Import the libraries**

The first step is to import the libraries that we will need. These include:

* pandas: for data manipulation
* numpy: for mathematical operations
* scikit-learn: for machine learning

In [None]:
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestClassifier

### **Step 2: Load the dataset**

The next step is to load the dataset. We can do this using the `read_csv()` function from the pandas library.

In [None]:
dataset = pd.read_csv('churn_data.csv')
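To see what `read_csv()` returns without having `churn_data.csv` on disk, here is a sketch that reads a tiny in-memory CSV through `io.StringIO`. The three rows and their values are invented. The column names mirror the ones used later in this post, plus an assumed `churn` target column.

```python
import io

import pandas as pd

# A made-up three-customer CSV; with real data you would pass a path such as 'churn_data.csv'.
csv_text = """customer_id,tenure,monthly_charges,contract_type,churn
C001,2,89.5,Month-to-month,1
C002,40,55.0,One year,0
C003,71,20.3,Two year,0
"""

toy_dataset = pd.read_csv(io.StringIO(csv_text))

# read_csv infers a dtype for each column from its text values.
print(toy_dataset.dtypes)
```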

### **Step 3: Explore the dataset**

Before we start building our model, it is important to explore the dataset. This will help us to understand the data and identify any potential problems.

We can explore the dataset using the following commands:

* `head()`: to display the first few rows of the dataset
* `describe()`: to get a summary of the statistical distribution of the data
* `info()`: to get information about the data types and missing values

In [None]:
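from IPython.display import display

# A notebook only auto-displays a cell's last expression, so display() the first two.
display(dataset.head())      # first five rows
display(dataset.describe())  # summary statistics for the numeric columns

# Column names, non-null counts, and data types (info() prints its report directly).
dataset.info()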
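Two further checks are often worth running at this stage: how many values are missing in each column, and how balanced the churn label is. The sketch below runs both on a small made-up DataFrame. The values and the `churn` column name are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Invented example data with one missing value in each of two columns.
sample_df = pd.DataFrame({
    'tenure': [1, 24, 60, np.nan, 12, 48, 5, 70],
    'monthly_charges': [95.0, 60.5, np.nan, 80.0, 99.9, 30.0, 85.0, 20.0],
    'churn': [1, 0, 0, 1, 1, 0, 0, 0],
})

# Number of missing values per column.
missing = sample_df.isna().sum()
print(missing)

# Share of customers in each churn class; a strong imbalance means accuracy alone can mislead.
class_share = sample_df['churn'].value_counts(normalize=True)
print(class_share)
```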

### **Step 4: Select the features**

The next step is to select the features that we will use to build our model. We need to select features that are predictive of customer churn.

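We can select the features using the following steps:

1. Identify the features that are related to customer churn.
2. Remove features that are not relevant (such as identifiers) or that are too noisy.
3. Encode categorical features, such as the contract type and payment method, as numeric columns.
4. Split the data into a training set and a held-out test set.
5. Normalize the numeric features so that they have a similar scale, using statistics from the training set only. Tree-based models such as random forests do not need this step, because their splits depend only on the ordering of values. It does no harm here, though, and it matters for scale-sensitive models such as logistic regression or k-nearest neighbors.

In [None]:
from sklearn.model_selection import train_test_split

# Identify the features that are related to customer churn.
churn_related_features = [
    'tenure',
    'monthly_charges',
    'contract_type',
    'payment_method',
    'number_of_calls',
    'number_of_texts',
    'number_of_data_usage'
]
categorical_features = ['contract_type', 'payment_method']
numeric_features = [f for f in churn_related_features if f not in categorical_features]

# Remove features that are not relevant or that are too noisy.
not_relevant_features = ['customer_id', 'gender']
noisy_features = ['customer_name']
dataset = dataset.drop(columns=not_relevant_features + noisy_features, errors='ignore')

# One-hot encode the categorical features (string columns cannot be normalized directly).
X = pd.get_dummies(dataset[churn_related_features], columns=categorical_features)
# Assumption: the target column is named 'churn' and coded 1 = churned, 0 = stayed.
y = dataset['churn']

# Hold out 20% of customers for evaluation, keeping the churn rate the same in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Normalize the numeric features with the training set's mean and standard deviation.
X_train, X_test = X_train.copy(), X_test.copy()
means, stds = X_train[numeric_features].mean(), X_train[numeric_features].std()
X_train[numeric_features] = (X_train[numeric_features] - means) / stds
X_test[numeric_features] = (X_test[numeric_features] - means) / stds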
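To see what encoding and normalization do to the data, here is a sketch on a four-row table with invented values. `pd.get_dummies` turns each category into its own 0/1 column. The z-score formula, (x - mean) / std, leaves a numeric column with mean 0 and standard deviation 1.

```python
import pandas as pd

# Invented example rows using two of the feature columns discussed above.
toy_df = pd.DataFrame({
    'tenure': [2, 10, 30, 70],
    'contract_type': ['Month-to-month', 'One year', 'Month-to-month', 'Two year'],
})

# One-hot encode the categorical column; dtype=int gives 0/1 instead of True/False.
encoded = pd.get_dummies(toy_df, columns=['contract_type'], dtype=int)

# Z-score standardization: subtract the column mean, divide by the sample standard deviation.
encoded['tenure'] = (encoded['tenure'] - encoded['tenure'].mean()) / encoded['tenure'].std()

print(encoded)
```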


### **Step 5: Build the model**

Now that we have selected the features, we can build the model. We will use a random forest classifier for this task.

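A random forest classifier is an ensemble learning algorithm that trains many decision trees and then combines their predictions to make a final decision. Each tree is trained on a bootstrap sample of the training rows and considers a random subset of the features at each split. In scikit-learn, the forest predicts the class with the highest probability averaged across the trees.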
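The random forest idea can be sketched by hand. We train several decision trees, each on a bootstrap resample of the rows and with a random subset of features considered at each split (`max_features='sqrt'`). We then average their predicted churn probabilities. This is a simplified illustration on synthetic data from `make_classification`, not a replacement for `RandomForestClassifier`, which does the same job more efficiently and with more options.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data standing in for churn (1) vs. no churn (0).
X_demo, y_demo = make_classification(n_samples=300, n_features=6, random_state=0)

rng = np.random.default_rng(0)
trees = []
for _ in range(25):
    # Bootstrap sample: draw as many rows as the data has, with replacement.
    idx = rng.integers(0, len(X_demo), size=len(X_demo))
    # max_features='sqrt': each split considers a random subset of int(sqrt(6)) = 2 features.
    tree = DecisionTreeClassifier(max_features='sqrt', random_state=int(rng.integers(1_000_000)))
    tree.fit(X_demo[idx], y_demo[idx])
    trees.append(tree)


def forest_churn_proba(X_new):
    """Average each tree's predicted probability of class 1 (churn)."""
    probs = []
    for tree in trees:
        classes = list(tree.classes_)
        if 1 in classes:
            probs.append(tree.predict_proba(X_new)[:, classes.index(1)])
        else:
            # A bootstrap sample with no churners predicts zero churn probability.
            probs.append(np.zeros(len(X_new)))
    return np.mean(probs, axis=0)


scores = forest_churn_proba(X_demo[:5])
# Predict churn when the averaged probability exceeds 0.5.
predictions = (scores > 0.5).astype(int)
print(scores, predictions)
```

Averaging many trees that each overfit a different resample is what reduces the variance of a single decision tree.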

In [None]:
# Create a random forest classifier with 100 trees (random_state fixed for reproducible results).
model = RandomForestClassifier(n_estimators=100, random_state=42)

# Fit the model to the training data.
model.fit(X_train, y_train)
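Once a forest is fitted, its `feature_importances_` attribute gives a rough view of which features the trees relied on. That view can feed back into the feature selection of Step 4. The sketch below uses synthetic data in which churn is, by construction, driven by tenure and monthly charges, while a third column is pure noise. All names and the labeling rule are invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
n = 400

# Synthetic features: churn depends on tenure and monthly charges; 'noise' is unrelated.
features = pd.DataFrame({
    'tenure': rng.integers(1, 73, size=n),
    'monthly_charges': rng.uniform(20, 120, size=n),
    'noise': rng.normal(size=n),
})
# Made-up rule: new customers on expensive plans churn.
churn = ((features['tenure'] < 12) & (features['monthly_charges'] > 70)).astype(int)

demo_model = RandomForestClassifier(n_estimators=100, random_state=0).fit(features, churn)

# Impurity-based importances, normalized to sum to 1 across features.
importances = pd.Series(demo_model.feature_importances_, index=features.columns)
print(importances.sort_values(ascending=False))
```

Impurity-based importances can overstate continuous, high-cardinality features, so they are best treated as a hint rather than a final answer.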

### **Step 6: Evaluate the model**

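Once the model is trained, we need to evaluate its performance on the held-out test set. We can do this using the following metrics:

* Accuracy: the percentage of predictions that are correct
* Precision: of the customers predicted to churn, the percentage who actually churn
* Recall: of the customers who actually churn, the percentage the model predicts as churners

Churners are often a minority of customers. In that case accuracy can look high even when the model misses many of them, so precision and recall are worth checking alongside it.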
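The three definitions can be checked with a small worked example. The counts are invented: 1,000 test customers, of whom the model flags 50 as churners; 30 of those actually churn, and 70 churners go unflagged.

```python
# Invented confusion-matrix counts for 1,000 test customers.
tp = 30  # predicted churn, actually churned
fp = 20  # predicted churn, actually stayed
fn = 70  # predicted stay, actually churned
tn = 1000 - tp - fp - fn  # predicted stay, actually stayed (880)

accuracy = (tp + tn) / (tp + fp + fn + tn)  # 910 / 1000
precision = tp / (tp + fp)                  # 30 / 50
recall = tp / (tp + fn)                     # 30 / 100

print(f'accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}')
# prints accuracy=0.91 precision=0.60 recall=0.30
```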

In [None]:
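# Evaluate the model on the held-out test data.
y_pred = model.predict(X_test)
y_true = np.asarray(y_test)

accuracy = np.mean(y_pred == y_true)
# Precision: among customers predicted to churn, the fraction who actually churned.
precision = np.mean(y_true[y_pred == 1])
# Recall: among customers who actually churned, the fraction the model predicted as churners.
recall = np.mean(y_pred[y_true == 1])
print(f'Accuracy: {accuracy:.3f}, Precision: {precision:.3f}, Recall: {recall:.3f}')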
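Because the goal is a churn score, it can help to work with `predict_proba` directly instead of `predict`. For a binary `RandomForestClassifier`, `predict` returns the class with the higher mean probability, which amounts to roughly a 0.5 cut-off on the churn score. Lowering the cut-off flags more customers, which can only raise (or keep) recall and usually lowers precision. The sketch below compares two thresholds using scikit-learn's `precision_score` and `recall_score`. It runs on synthetic imbalanced data from `make_classification`, and the roughly 20% churn rate is an assumption.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for churn data: roughly 20% of customers churn (class 1).
X_syn, y_syn = make_classification(n_samples=2000, n_features=8, weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.25, stratify=y_syn, random_state=0)

demo_model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Churn score = predicted probability of class 1.
churn_scores = demo_model.predict_proba(X_te)[:, 1]

results = {}
for threshold in (0.5, 0.3):
    y_hat = (churn_scores >= threshold).astype(int)
    results[threshold] = (precision_score(y_te, y_hat), recall_score(y_te, y_hat))
    print(f'threshold={threshold}: precision={results[threshold][0]:.3f}, '
          f'recall={results[threshold][1]:.3f}')
```

Which threshold to use is a business decision. It depends on the cost of a retention offer compared with the value of a customer who would otherwise leave.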