# Building a Decision Tree to Predict Customer Churn

<p>
Imagine you are a data analyst at a telecom company. The marketing department has noticed an increase in customer churn and needs your help to identify which customers are most likely to leave next month.
</p>

<p><b>Your Task</b> <br>
In this exercise, you will build a decision tree model to predict customer churn for a telecom company. Customer churn occurs when a customer stops doing business with a company. Predicting churn helps a business retain customers by addressing their issues proactively.</p>

## PROJECT STEP
<p>
<b>Dataset Description</b> <br>
We will use a synthetic dataset for this exercise. The dataset contains the following columns:</p>

- CustomerID: A unique identifier for each customer.
- Age: The age of the customer.
- MonthlyCharge: The monthly bill amount for the customer.
- CustomerServiceCalls: The number of times the customer contacted customer service.
- Churn: The target variable, indicating whether the customer churned (Yes) or not (No).

### Step-by-Step Instructions
- Set up the environment:
  - Import the necessary libraries: Pandas for data manipulation, Scikit-learn for machine learning, and Matplotlib for visualization.
- Create the dataset:
  - Use Python to create a synthetic dataset. We'll keep the dataset small for simplicity.
- Prepare the data:
  - Split the data into features (X) and the target variable (y).
  - Split the dataset further into training and testing sets.
- Build the decision tree model:
  - Use Scikit-learn to create a DecisionTreeClassifier.
  - Train the model on the training data.
- Evaluate the model:
  - Make predictions on the test set.
  - Calculate the accuracy of the model.
- Visualize the decision tree:
  - Use Matplotlib to visualize how the decision tree makes decisions.
- Discuss the results:
  - Interpret the decision tree.
  - Discuss how the company can use it to reduce customer churn.

In [2]:
# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import warnings
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn import tree
warnings.filterwarnings('ignore')

In [3]:
# Creating a synthetic dataset
# This dataset simulates customer data for a telecom company
data = {
    'CustomerID': range(1, 101),
    'Age': [25, 34, 45, 23, 36, 50, 29, 40, 31, 28] * 10,
    'MonthlyCharge': [50, 60, 70, 80, 90, 100, 110, 120, 130, 140] * 10,
    'CustomerServiceCalls': [1, 2, 3, 4, 0, 1, 2, 3, 4, 0] * 10,  # Number of customer service calls
    'Churn': ['No', 'No', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'No', 'Yes'] * 10  # Churn status
}

In [4]:
df = pd.DataFrame(data)
df.head()

| | CustomerID | Age | MonthlyCharge | CustomerServiceCalls | Churn |
|---|---|---|---|---|---|
| 0 | 1 | 25 | 50 | 1 | No |
| 1 | 2 | 34 | 60 | 2 | No |
| 2 | 3 | 45 | 70 | 3 | Yes |
| 3 | 4 | 23 | 80 | 4 | No |
| 4 | 5 | 36 | 90 | 0 | Yes |
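
Because each column in the synthetic data is a 10-value pattern repeated ten times, it can be worth checking how many distinct customers the dataset really contains before modeling. The following is a minimal sketch that rebuilds the same DataFrame so it runs on its own; `feature_cols` and `n_unique` are names introduced here for illustration only.

```python
import pandas as pd

# Same synthetic data as in the notebook: each list is a 10-value pattern repeated 10 times
df = pd.DataFrame({
    'CustomerID': range(1, 101),
    'Age': [25, 34, 45, 23, 36, 50, 29, 40, 31, 28] * 10,
    'MonthlyCharge': [50, 60, 70, 80, 90, 100, 110, 120, 130, 140] * 10,
    'CustomerServiceCalls': [1, 2, 3, 4, 0, 1, 2, 3, 4, 0] * 10,
    'Churn': ['No', 'No', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'No', 'Yes'] * 10,
})

# Hypothetical helper name: the three predictive columns, excluding the identifier
feature_cols = ['Age', 'MonthlyCharge', 'CustomerServiceCalls']

# Count distinct feature combinations, ignoring CustomerID
n_unique = len(df[feature_cols].drop_duplicates())
print(f"Rows: {len(df)}, distinct feature combinations: {n_unique}")

# Class balance of the target variable
print(df['Churn'].value_counts())
```

With so few distinct feature combinations, most test rows will have exact copies in the training set, so any accuracy measured on this data is best read as a walkthrough of the workflow rather than as evidence of predictive power.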


In [None]:
# Splitting the dataset into features and target variable
# Features include age, monthly charge, and customer service calls
# CustomerID is dropped too: it is an identifier, not a predictor
# The target variable is churn (Yes or No)
X = df.drop(['CustomerID', 'Churn'], axis=1)
y = df['Churn']

In [6]:
# Splitting the dataset into training and testing sets
# 70% of the data is used for training and 30% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
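
A quick look at the resulting split can confirm the 70/30 sizes and show whether both classes are represented in similar proportions. The sketch below is self-contained and selects the three feature columns named in the comments above. The `stratify=y` variation is an optional argument of `train_test_split` that the original notebook does not use; it is shown only as one possible way to keep class proportions equal across the two sets. The `_s` suffixed names are introduced here for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Rebuild the synthetic dataset so this example runs on its own
df = pd.DataFrame({
    'CustomerID': range(1, 101),
    'Age': [25, 34, 45, 23, 36, 50, 29, 40, 31, 28] * 10,
    'MonthlyCharge': [50, 60, 70, 80, 90, 100, 110, 120, 130, 140] * 10,
    'CustomerServiceCalls': [1, 2, 3, 4, 0, 1, 2, 3, 4, 0] * 10,
    'Churn': ['No', 'No', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'No', 'Yes'] * 10,
})
X = df[['Age', 'MonthlyCharge', 'CustomerServiceCalls']]
y = df['Churn']

# Same split parameters as the notebook cell above
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
print("Train shape:", X_train.shape, "Test shape:", X_test.shape)
print(y_train.value_counts(normalize=True))  # class proportions in the training set

# Optional variation (an assumption, not part of the original notebook):
# stratify on y so both sets keep the overall Yes/No ratio
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
print(y_test_s.value_counts())
```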

In [None]:
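# Training the Decision Tree model
# random_state fixes how ties between equally good splits are broken, so results are reproducible
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)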
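
Once the tree is fitted, a few inspection calls can confirm that training worked as expected. The sketch below is self-contained, uses the three feature columns named in the comments above, and sets `random_state=42` on the classifier as an assumption for reproducibility. `get_depth`, `get_n_leaves`, `score`, and `sklearn.tree.export_text` are existing Scikit-learn calls; `feature_cols` and `rules` are names introduced here for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Rebuild the synthetic dataset so this example runs on its own
df = pd.DataFrame({
    'CustomerID': range(1, 101),
    'Age': [25, 34, 45, 23, 36, 50, 29, 40, 31, 28] * 10,
    'MonthlyCharge': [50, 60, 70, 80, 90, 100, 110, 120, 130, 140] * 10,
    'CustomerServiceCalls': [1, 2, 3, 4, 0, 1, 2, 3, 4, 0] * 10,
    'Churn': ['No', 'No', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'No', 'Yes'] * 10,
})
feature_cols = ['Age', 'MonthlyCharge', 'CustomerServiceCalls']
X = df[feature_cols]
y = df['Churn']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Fit an unconstrained tree (no max_depth), as in the notebook
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

# Basic structure of the fitted tree
print("Depth:", clf.get_depth())
print("Leaves:", clf.get_n_leaves())

# Accuracy on the training data itself; an unconstrained tree usually fits it perfectly
print("Training accuracy:", clf.score(X_train, y_train))

# Plain-text view of the learned split rules
rules = export_text(clf, feature_names=feature_cols)
print(rules)
```

A perfect training score here says little about generalization: an unconstrained tree can memorize its training rows, which is why the held-out test set is evaluated separately.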