<a href="https://colab.research.google.com/github/Subhajit53/Teleom-Churn-Analysis/blob/main/Telecom_Churn_Analysis_Capstone_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## <b> Orange S.A., formerly France Télécom S.A., is a French multinational telecommunications corporation. The Orange Telecom Churn Dataset consists of cleaned customer activity data (features), along with a churn label specifying whether a customer canceled the subscription. </b>

## <b> Explore and analyze the data to discover key factors responsible for customer churn and come up with ways/recommendations to ensure customer retention. </b>

#<b>Introduction</b>
#####"We see our customers as invited guests to a party, and we are the hosts. It's our job every day to make every important aspect of the customer experience a little better."
##### The above quote from Jeff Bezos, founder of Amazon, says a lot about customer retention in this competitive world. If you don't serve your customers well, there are many hungry companies waiting to take that bite out of your mouth.
##### That is why customer retention is one of the chief goals of any customer-facing company. Unsatisfied customers not only cause a loss to the business but also create a negative impression in the market.
##### In this telecom-dependent era, many emerging companies give tough competition to existing telecom operators, whether in pricing, data speed, call connectivity, or international charges.
##### Several factors might drive customer churn for a specific company. Naturally, Orange S.A. also wants to find out why its customers are leaving its services. By finding that out, it can take precautionary measures to retain them.

#<b> The Talking Data </b>
##### Understanding the data is the first step towards any analysis. Without an idea of what we have on our plate, we can't take a single step. The dataset has 20 columns and 3333 rows. Although no data dictionary was provided with it, let's try to figure out what each feature represents.
1. <b> State: </b> State in which the customer lives.
2. <b> Account Length: </b> Number of days the customer has been using the service.
3. <b> Area Code: </b> An identifier for the area the customer lives in.
4. <b> International plan: </b> A binary indicator of whether the customer has opted for an international plan.
5. <b> Voice mail plan: </b> A binary indicator of whether the customer has opted for a voice mail plan.
6. <b> Number vmail messages: </b> Number of voicemail messages sent or received.
7. <b> Total day minutes: </b> How many minutes the customer has talked on the phone in the daytime.
8. <b> Total day calls: </b> How many calls the customer has made in the daytime.
9. <b> Total day charge: </b> How much the customer was charged for daytime calls.
10. <b> Total eve minutes: </b> How many minutes the customer has talked on the phone in the evening.
11. <b> Total eve calls: </b> How many calls the customer has made in the evening.
12. <b> Total eve charge: </b> How much the customer was charged for evening calls.
13. <b> Total night minutes: </b> How many minutes the customer has talked on the phone at night.
14. <b> Total night calls: </b> How many calls the customer has made at night.
15. <b> Total night charge: </b> How much the customer was charged for night calls.
16. <b> Total intl minutes: </b> How many minutes the customer has talked on international calls.
17. <b> Total intl calls: </b> How many international calls the customer has made.
18. <b> Total intl charge: </b> How much the customer was charged for international calls.
19. <b> Customer service calls: </b> How many calls the customer has made to customer service.
20. <b> Churn: </b> A binary indicator of whether the customer has churned.
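Since the definitions above are educated guesses, a quick sanity check can help. The sketch below is only an illustration: it rebuilds a few columns of the five rows shown by `telecom_df.head()` further down in this notebook (the name `sample_df` is made up here), checks that the plan columns really hold two labels, and checks that the day charge behaves like a per-minute bill for the day minutes.

```python
import pandas as pd

# The five rows shown by telecom_df.head(), restricted to a few columns.
# sample_df is a hypothetical name used only for this illustration.
sample_df = pd.DataFrame({
    "International plan": ["No", "No", "No", "Yes", "Yes"],
    "Voice mail plan": ["Yes", "Yes", "No", "No", "No"],
    "Total day minutes": [265.1, 161.6, 243.4, 299.4, 166.7],
    "Total day charge": [45.07, 27.47, 41.38, 50.90, 28.34],
})

# Binary plan columns should contain only two distinct labels
plan_labels = {
    col: sorted(sample_df[col].unique())
    for col in ["International plan", "Voice mail plan"]
}

# If the charge is the bill for the minutes, charge / minutes should be
# a (nearly) constant per-minute rate across customers
day_rate = (sample_df["Total day charge"] / sample_df["Total day minutes"]).round(2)

print(plan_labels)
print(day_rate.unique())
```

On these five rows the ratio rounds to the same rate for every customer, which is consistent with reading "Total day charge" as the amount billed for "Total day minutes". The same check could be repeated on the full `telecom_df` and on the evening, night, and international columns.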

# <b> Approach </b>
Before moving on to the main analysis, let us discuss how we are going to approach the problem.

#### <b> 1. Data Cleaning: </b>
##### The first and most crucial task in analysing any dataset is to clean it. Messy data gives messy outputs. As we are not feeding the data to any ML model at this stage, we shall only check for null values and unrealistic values. We shall draw conclusions about outliers during the analysis itself.
#### <b> 2. Univariate Analysis: </b>
##### We shall make some plots and draw conclusions about each variable individually.
#### <b> 3. Bivariate Analysis: </b>
##### We shall make some plots and draw conclusions about pairs of variables. We can also assess the relationship between the Churn variable and the other variables.
#### <b> 4. Multivariate Analysis: </b>
##### We shall make a correlation heatmap and draw conclusions about the strength of the relationship between Churn and the other variables.
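To make the univariate, bivariate, and multivariate steps concrete, here is a minimal sketch of the numbers that would sit behind the plots. It runs on a toy frame with made-up values (`toy_df` is not taken from the Orange dataset) and uses plain pandas only. The notebook itself may well compute things differently.

```python
import pandas as pd

# Toy, made-up rows (NOT from the Orange dataset), only to show the mechanics
toy_df = pd.DataFrame({
    "Customer service calls": [1, 0, 4, 2, 5, 1],
    "Total day minutes": [180.0, 150.5, 260.3, 199.9, 275.0, 120.4],
    "Churn": [False, False, True, False, True, False],
})

# Univariate: distribution of a single variable (share of each Churn label)
churn_share = toy_df["Churn"].value_counts(normalize=True)

# Bivariate: compare one feature across the two churn groups
mean_calls_by_churn = toy_df.groupby("Churn")["Customer service calls"].mean()

# Multivariate: correlation of every numeric column with Churn.
# Churn is cast from bool to int so that it enters the correlation matrix.
corr_with_churn = toy_df.astype({"Churn": int}).corr()["Churn"].drop("Churn")

print(churn_share)
print(mean_calls_by_churn)
print(corr_with_churn)
```

The full correlation matrix returned by `.corr()` is what a heatmap would display. Keeping only the `Churn` column is a shortcut for reading off how strongly each feature moves with churn.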

In [2]:
# Importing essential libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

### <b> Reading the dataset and exploring it </b>

In [1]:
# Mount the drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
# Read the dataset
telecom_df = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/Telecom Churn Analysis Notebook/Telecom Churn.csv')

In [4]:
# Let's see what the data looks like
telecom_df.head()

,State,Account length,Area code,International plan,Voice mail plan,Number vmail messages,Total day minutes,Total day calls,Total day charge,Total eve minutes,Total eve calls,Total eve charge,Total night minutes,Total night calls,Total night charge,Total intl minutes,Total intl calls,Total intl charge,Customer service calls,Churn
0,KS,128,415,No,Yes,25,265.1,110,45.07,197.4,99,16.78,244.7,91,11.01,10.0,3,2.7,1,False
1,OH,107,415,No,Yes,26,161.6,123,27.47,195.5,103,16.62,254.4,103,11.45,13.7,3,3.7,1,False
2,NJ,137,415,No,No,0,243.4,114,41.38,121.2,110,10.3,162.6,104,7.32,12.2,5,3.29,0,False
3,OH,84,408,Yes,No,0,299.4,71,50.9,61.9,88,5.26,196.9,89,8.86,6.6,7,1.78,2,False
4,OK,75,415,Yes,No,0,166.7,113,28.34,148.3,122,12.61,186.9,121,8.41,10.1,3,2.73,3,False


In [5]:
telecom_df.tail()

,State,Account length,Area code,International plan,Voice mail plan,Number vmail messages,Total day minutes,Total day calls,Total day charge,Total eve minutes,Total eve calls,Total eve charge,Total night minutes,Total night calls,Total night charge,Total intl minutes,Total intl calls,Total intl charge,Customer service calls,Churn
3328,AZ,192,415,No,Yes,36,156.2,77,26.55,215.5,126,18.32,279.1,83,12.56,9.9,6,2.67,2,False
3329,WV,68,415,No,No,0,231.1,57,39.29,153.4,55,13.04,191.3,123,8.61,9.6,4,2.59,3,False
3330,RI,28,510,No,No,0,180.8,109,30.74,288.8,58,24.55,191.9,91,8.64,14.1,6,3.81,2,False
3331,CT,184,510,Yes,No,0,213.8,105,36.35,159.6,84,13.57,139.2,137,6.26,5.0,10,1.35,2,False
3332,TN,74,415,No,Yes,25,234.4,113,39.85,265.9,82,22.6,241.4,77,10.86,13.7,4,3.7,0,False


In [6]:
telecom_df.shape

(3333, 20)

In [7]:
telecom_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3333 entries, 0 to 3332
Data columns (total 20 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   State                   3333 non-null   object 
 1   Account length          3333 non-null   int64  
 2   Area code               3333 non-null   int64  
 3   International plan      3333 non-null   object 
 4   Voice mail plan         3333 non-null   object 
 5   Number vmail messages   3333 non-null   int64  
 6   Total day minutes       3333 non-null   float64
 7   Total day calls         3333 non-null   int64  
 8   Total day charge        3333 non-null   float64
 9   Total eve minutes       3333 non-null   float64
 10  Total eve calls         3333 non-null   int64  
 11  Total eve charge        3333 non-null   float64
 12  Total night minutes     3333 non-null   float64
 13  Total night calls       3333 non-null   int64  
 14  Total night charge      3333 non-null   

In [8]:
telecom_df.describe()

,Account length,Area code,Number vmail messages,Total day minutes,Total day calls,Total day charge,Total eve minutes,Total eve calls,Total eve charge,Total night minutes,Total night calls,Total night charge,Total intl minutes,Total intl calls,Total intl charge,Customer service calls
count,3333.0,3333.0,3333.0,3333.0,3333.0,3333.0,3333.0,3333.0,3333.0,3333.0,3333.0,3333.0,3333.0,3333.0,3333.0,3333.0
mean,101.064806,437.182418,8.09901,179.775098,100.435644,30.562307,200.980348,100.114311,17.08354,200.872037,100.107711,9.039325,10.237294,4.479448,2.764581,1.562856
std,39.822106,42.37129,13.688365,54.467389,20.069084,9.259435,50.713844,19.922625,4.310668,50.573847,19.568609,2.275873,2.79184,2.461214,0.753773,1.315491
min,1.0,408.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,23.2,33.0,1.04,0.0,0.0,0.0,0.0
25%,74.0,408.0,0.0,143.7,87.0,24.43,166.6,87.0,14.16,167.0,87.0,7.52,8.5,3.0,2.3,1.0
50%,101.0,415.0,0.0,179.4,101.0,30.5,201.4,100.0,17.12,201.2,100.0,9.05,10.3,4.0,2.78,1.0
75%,127.0,510.0,20.0,216.4,114.0,36.79,235.3,114.0,20.0,235.3,113.0,10.59,12.1,6.0,3.27,2.0
max,243.0,510.0,51.0,350.8,165.0,59.64,363.7,170.0,30.91,395.0,175.0,17.77,20.0,20.0,5.4,9.0


### <b>1. Data Cleaning: </b>
##### While exploring the data, we saw that our dataframe has 3333 observations and every column has 3333 non-null values. Hence we are spared the headache of dealing with the demonic nulls!
##### Likewise, the describe() output shows that the numerical columns have no unrealistic values: no count, duration, or charge is negative.
##### It seems the data is a very good boy and has shown us some mercy by lessening our work! Now it's time to tie the data to a chair and beat it until it spits out some information.
##### Sorry, data! Being a good boy doesn't always help!
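The two cleaning checks described in this section can also be run explicitly rather than read off `info()` and `describe()`. The sketch below is one possible way to do it, shown on the first three rows of `telecom_df.head()` with only a few columns. The names `sample_df`, `null_counts`, and `negative_counts` are illustrative, and treating "unrealistic" as "negative" is an assumption made for this example.

```python
import pandas as pd

# First three rows of telecom_df.head(), restricted to a few columns.
# sample_df is a hypothetical name used only for this illustration.
sample_df = pd.DataFrame({
    "State": ["KS", "OH", "NJ"],
    "Account length": [128, 107, 137],
    "Total night minutes": [244.7, 254.4, 162.6],
    "Customer service calls": [1, 1, 0],
    "Churn": [False, False, False],
})

# Null check: number of missing values per column
null_counts = sample_df.isnull().sum()

# Unrealistic-value check (assumption: "unrealistic" means negative here).
# Counts, durations and charges should never be below zero.
numeric_cols = sample_df.select_dtypes(include="number")
negative_counts = (numeric_cols < 0).sum()

print(null_counts)
print(negative_counts)
```

Replacing `sample_df` with the full `telecom_df` applies the same checks to all 3333 rows. A column of zeros in both results matches the conclusion drawn above.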