<a href="https://colab.research.google.com/github/ashfaquesayyed/telecom-churn-analysis/blob/main/telecom_churn_analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


Orange S.A., formerly France Télécom S.A., is a French multinational telecommunications corporation. The Orange Telecom Churn Dataset consists of cleaned customer activity data (features) along with a churn label that specifies whether a customer canceled their subscription.

The goal is to explore and analyze the data to discover the key factors responsible for customer churn, and to recommend ways to improve customer retention.

#Telecom Churn Analysis
**Submission by**
Ashfaque Sayyed (Cohort - Geneva)

##What is Customer Churn?

Customer attrition, also known as customer churn, customer turnover, or customer defection, is the loss of clients or customers. Telephone service companies, Internet service providers, pay TV companies, and insurance firms often treat customer attrition analysis and attrition rates as key business metrics, because retaining an existing customer costs far less than acquiring a new one. Companies in these sectors often have customer service branches that try to win back defecting clients, because recovered long-term customers can be worth much more to a company than newly recruited ones.
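The attrition (churn) rate mentioned above is commonly computed as the share of customers who left. A minimal sketch, assuming a boolean `Churn` column like the one in this dataset; the five-row frame is made-up toy data, not values from the dataset.

```python
import pandas as pd

# Toy data (hypothetical): True means the customer churned
toy = pd.DataFrame({"Churn": [False, True, False, False, True]})

# Churn rate = churned customers / all customers.
# The mean of a boolean column is exactly that fraction.
churn_rate = toy["Churn"].mean()
print(f"Churn rate: {churn_rate:.1%}")  # Churn rate: 40.0%
```

On the real data the same expression, `df["Churn"].mean()`, would give the overall churn rate.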

In [1]:
#importing libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [2]:
#mounting google drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [4]:
#providing data file directory and loading the file as a dataframe
df = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/churn analysis/Telecom Churn.csv')

##Data Set Information

To understand the data, we can look at each column and consider its meaning and relevance to this problem.

State: US state of the customer, 51 distinct values (Categorical)

Account Length: How long the account has been active (Numerical)

Area Code: Area code of the customer; 3 distinct values: 408, 415 and 510 (Numerical)

International Plan: "Yes" if the customer subscribes to the international plan, "No" otherwise (Categorical)

Voice Mail Plan: "Yes" if the customer subscribes to the voice mail plan, "No" otherwise (Categorical)

Number vmail messages: Number of voice mail messages (Numerical)

Total day minutes: Total minutes of calls made during the day (Numerical)

Total day calls: Total number of calls made during the day (Numerical)

Total day charge: Total charge to the customer for daytime calls (Numerical)

Total eve minutes: Total minutes of calls made in the evening (Numerical)

Total eve calls: Total number of calls made in the evening (Numerical)

Total eve charge: Total charge to the customer for evening calls (Numerical)

Total night minutes: Total minutes of calls made at night (Numerical)

Total night calls: Total number of calls made at night (Numerical)

Total night charge: Total charge to the customer for night calls (Numerical)

Total International minutes: Total minutes of international calls (Numerical)

Total International calls: Total number of international calls (Numerical)

Total International charge: Total charge to the customer for international calls (Numerical)

Customer Service Calls: Number of calls the customer made to customer service (Numerical)

Churn: True if the customer left the operator, False otherwise (Categorical)
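The two plan columns store the strings "Yes"/"No", while `Churn` is already boolean. Mapping the plan columns to booleans can make group comparisons simpler. A minimal sketch on a hypothetical three-row frame that reuses this dataset's column names; the `yes_no` mapping is an illustrative choice, not part of the original notebook.

```python
import pandas as pd

# Hypothetical sample rows using this dataset's column names
sample = pd.DataFrame({
    "International plan": ["No", "Yes", "No"],
    "Voice mail plan": ["Yes", "No", "No"],
    "Churn": [False, True, False],
})

# Map "Yes"/"No" strings to booleans so they match the type of Churn
yes_no = {"Yes": True, "No": False}
for col in ["International plan", "Voice mail plan"]:
    sample[col] = sample[col].map(yes_no)

print(sample.dtypes)
```

Any value missing from the mapping (for example a stray "yes") would become NaN, which makes unexpected labels easy to spot.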

In [5]:
# first five rows of the data
df.head()

| | State | Account length | Area code | International plan | Voice mail plan | Number vmail messages | Total day minutes | Total day calls | Total day charge | Total eve minutes | Total eve calls | Total eve charge | Total night minutes | Total night calls | Total night charge | Total intl minutes | Total intl calls | Total intl charge | Customer service calls | Churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | KS | 128 | 415 | No | Yes | 25 | 265.1 | 110 | 45.07 | 197.4 | 99 | 16.78 | 244.7 | 91 | 11.01 | 10.0 | 3 | 2.7 | 1 | False |
| 1 | OH | 107 | 415 | No | Yes | 26 | 161.6 | 123 | 27.47 | 195.5 | 103 | 16.62 | 254.4 | 103 | 11.45 | 13.7 | 3 | 3.7 | 1 | False |
| 2 | NJ | 137 | 415 | No | No | 0 | 243.4 | 114 | 41.38 | 121.2 | 110 | 10.3 | 162.6 | 104 | 7.32 | 12.2 | 5 | 3.29 | 0 | False |
| 3 | OH | 84 | 408 | Yes | No | 0 | 299.4 | 71 | 50.9 | 61.9 | 88 | 5.26 | 196.9 | 89 | 8.86 | 6.6 | 7 | 1.78 | 2 | False |
| 4 | OK | 75 | 415 | Yes | No | 0 | 166.7 | 113 | 28.34 | 148.3 | 122 | 12.61 | 186.9 | 121 | 8.41 | 10.1 | 3 | 2.73 | 3 | False |


In [6]:
# number of unique values in each column
df.nunique()

State                       51
Account length             212
Area code                    3
International plan           2
Voice mail plan              2
Number vmail messages       46
Total day minutes         1667
Total day calls            119
Total day charge          1667
Total eve minutes         1611
Total eve calls            123
Total eve charge          1440
Total night minutes       1591
Total night calls          120
Total night charge         933
Total intl minutes         162
Total intl calls            21
Total intl charge          162
Customer service calls      10
Churn                        2
dtype: int64
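The unique-value counts above hint at which columns behave as categories: `Area code` has only 3 distinct values even though it is stored as a number. A minimal sketch that flags low-cardinality columns; the helper `low_cardinality_columns`, its threshold of 5, and the toy frame (partly made-up values) are illustrative assumptions.

```python
import pandas as pd

def low_cardinality_columns(frame: pd.DataFrame, max_unique: int = 5) -> list:
    """Hypothetical helper: return columns with at most `max_unique` distinct values."""
    counts = frame.nunique()
    return counts[counts <= max_unique].index.tolist()

# Toy frame (illustrative values only)
toy = pd.DataFrame({
    "Area code": [408, 415, 510, 415, 408, 415],
    "Total day minutes": [265.1, 161.6, 243.4, 299.4, 166.7, 180.0],
    "Churn": [False, False, True, False, False, True],
})

print(low_cardinality_columns(toy))  # ['Area code', 'Churn']
```

The threshold is a judgment call: on the full data, `Customer service calls` has 10 distinct values but is still a count, not a category.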

In [7]:
# count duplicated rows (the count is repeated once per column)
df[df.duplicated()].count()

State                     0
Account length            0
Area code                 0
International plan        0
Voice mail plan           0
Number vmail messages     0
Total day minutes         0
Total day calls           0
Total day charge          0
Total eve minutes         0
Total eve calls           0
Total eve charge          0
Total night minutes       0
Total night calls         0
Total night charge        0
Total intl minutes        0
Total intl calls          0
Total intl charge         0
Customer service calls    0
Churn                     0
dtype: int64
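`df[df.duplicated()].count()` reports the number of duplicate rows once per column. A single number comes from summing the boolean mask that `DataFrame.duplicated()` returns. A minimal sketch on a hypothetical frame containing one repeated row:

```python
import pandas as pd

# Hypothetical frame: the third row repeats the first
toy = pd.DataFrame({
    "State": ["KS", "OH", "KS"],
    "Account length": [128, 107, 128],
})

# duplicated() marks every repeat of an earlier row as True (keep="first" by default)
n_duplicates = toy.duplicated().sum()
print(n_duplicates)  # 1
```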

In [8]:
df.info()
# There are no null values in this data

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3333 entries, 0 to 3332
Data columns (total 20 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   State                   3333 non-null   object 
 1   Account length          3333 non-null   int64  
 2   Area code               3333 non-null   int64  
 3   International plan      3333 non-null   object 
 4   Voice mail plan         3333 non-null   object 
 5   Number vmail messages   3333 non-null   int64  
 6   Total day minutes       3333 non-null   float64
 7   Total day calls         3333 non-null   int64  
 8   Total day charge        3333 non-null   float64
 9   Total eve minutes       3333 non-null   float64
 10  Total eve calls         3333 non-null   int64  
 11  Total eve charge        3333 non-null   float64
 12  Total night minutes     3333 non-null   float64
 13  Total night calls       3333 non-null   int64  
 14  Total night charge      3333 non-null   
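`df.info()` shows missing values only indirectly, through the non-null counts. A more direct check counts missing values per column with `isnull().sum()`. A minimal sketch; the toy frame contains a deliberately missing value and is illustrative, not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Toy frame with one deliberately missing value
toy = pd.DataFrame({
    "Total night charge": [11.01, np.nan, 7.32],
    "Churn": [False, False, True],
})

# Number of missing values in each column
missing = toy.isnull().sum()
print(missing)
print("Any missing values:", bool(missing.any()))
```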

In [9]:
df.describe(include='all').T

| | count | unique | top | freq | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|---|---|---|
| State | 3333.0 | 51.0 | WV | 106.0 | | | | | | | |
| Account length | 3333.0 | | | | 101.064806 | 39.822106 | 1.0 | 74.0 | 101.0 | 127.0 | 243.0 |
| Area code | 3333.0 | | | | 437.182418 | 42.37129 | 408.0 | 408.0 | 415.0 | 510.0 | 510.0 |
| International plan | 3333.0 | 2.0 | No | 3010.0 | | | | | | | |
| Voice mail plan | 3333.0 | 2.0 | No | 2411.0 | | | | | | | |
| Number vmail messages | 3333.0 | | | | 8.09901 | 13.688365 | 0.0 | 0.0 | 0.0 | 20.0 | 51.0 |
| Total day minutes | 3333.0 | | | | 179.775098 | 54.467389 | 0.0 | 143.7 | 179.4 | 216.4 | 350.8 |
| Total day calls | 3333.0 | | | | 100.435644 | 20.069084 | 0.0 | 87.0 | 101.0 | 114.0 | 165.0 |
| Total day charge | 3333.0 | | | | 30.562307 | 9.259435 | 0.0 | 24.43 | 30.5 | 36.79 | 59.64 |
| Total eve minutes | 3333.0 | | | | 200.980348 | 50.713844 | 0.0 | 166.6 | 201.4 | 235.3 | 363.7 |


In [10]:
# all the column names in the data
df.columns

Index(['State', 'Account length', 'Area code', 'International plan',
       'Voice mail plan', 'Number vmail messages', 'Total day minutes',
       'Total day calls', 'Total day charge', 'Total eve minutes',
       'Total eve calls', 'Total eve charge', 'Total night minutes',
       'Total night calls', 'Total night charge', 'Total intl minutes',
       'Total intl calls', 'Total intl charge', 'Customer service calls',
       'Churn'],
      dtype='object')
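The column list can be split along the Numerical/Categorical labels from the data dictionary with `DataFrame.select_dtypes`. Dtype alone does not capture meaning, though: `Area code` is stored as an integer but acts as a label. A minimal sketch on a made-up two-row frame that reuses some of this dataset's column names:

```python
import pandas as pd

# Made-up rows using a subset of this dataset's column names
toy = pd.DataFrame({
    "State": ["KS", "OH"],
    "International plan": ["No", "No"],
    "Account length": [128, 107],
    "Total day charge": [45.07, 27.47],
})

# "number" selects integer and float columns; everything else is treated as categorical here
numerical = toy.select_dtypes(include="number").columns.tolist()
categorical = toy.select_dtypes(exclude="number").columns.tolist()
print("Numerical:", numerical)
print("Categorical:", categorical)
```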