<a href="https://colab.research.google.com/github/abhishek0478/EDA-Telecom-churn-analysis/blob/main/Telecom_Data_EDA.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Telecom Churn Analysis



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Team Member 1 -** Abhishek

# **Project Summary -**

Telecom companies often struggle with customer churn, which refers to the number of customers who leave the company over a given period. In this project, we aimed to analyze the churn rate of a telecom company and identify the factors that contribute to customer churn.




# **GitHub Link -**

https://github.com/abhishek0478/EDA-Telecom-churn-analysis

# **Problem Statement**


Orange S.A., formerly France Télécom S.A., is a French multinational telecommunications corporation. The Orange Telecom Churn Dataset consists of cleaned customer activity data (features), along with a churn label specifying whether a customer cancelled the subscription. Explore and analyze the data to discover the key factors responsible for customer churn, and come up with recommendations to ensure customer retention.



#### **Define Your Business Objective?**

*   Identify the key causes of customer churn.
*   Provide steps to retain valuable customers.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception handling, production-grade code, and deployment-ready code will be a plus. Students who provide them will be awarded additional credits.

     The additional credits will give an advantage over other students during Star Student selection.

     [ Note: Deployment-ready code is defined as a .ipynb notebook that executes in one go without a single error logged. ]

3.   Each piece of logic should have proper comments.
4.   You may add as many charts as you want. For each chart, answer the following questions in this format:
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints: Do the visualization in a structured way, following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

### Dataset Loading

In [None]:
# Load Dataset


In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
path= '/content/drive/MyDrive/Telecom Churn.csv'

In [None]:
df = pd.read_csv(path)
print(df)

### Dataset First View

In [None]:
# Dataset First Look
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
print('Total number of rows in Telecom Churn.csv :-', df.shape[0])
print('Total number of columns in Telecom Churn.csv :-', df.shape[1])

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
print("No. of duplicate rows :", df.duplicated().sum())

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isnull().sum()

In [None]:
# Visualizing the missing values
sns.heatmap(df.isnull())
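The missing-value and duplicate checks above can be wrapped into one helper so the notebook reports them in a single line. A minimal sketch, assuming a hypothetical helper named `data_quality_report` and a tiny made-up frame standing in for the real CSV:

```python
import pandas as pd

def data_quality_report(frame: pd.DataFrame) -> dict:
    """Summarise shape, missing cells and duplicate rows of a DataFrame."""
    return {
        "rows": frame.shape[0],
        "columns": frame.shape[1],
        # Total number of missing cells across all columns
        "missing_cells": int(frame.isnull().sum().sum()),
        # Number of rows that exactly repeat an earlier row
        "duplicate_rows": int(frame.duplicated().sum()),
    }

# Hypothetical sample: rows 1 and 2 are identical, row 3 has a missing value
sample = pd.DataFrame({
    "State": ["KS", "OH", "OH", "NJ"],
    "Account length": [128, 107, 107, None],
    "Churn": [False, True, True, False],
})
report = data_quality_report(sample)
print(report)
```

On the real data, `data_quality_report(df)` should report 0 missing cells and 0 duplicate rows if the observations in the next section hold.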

### What did you know about your dataset?

The dataset comes from the telecom industry. It consists of 3333 rows and 20 columns of customer activity data, such as area code, plan details and call details, along with a churn label specifying whether the customer cancelled the subscription. There are no missing or duplicate values in the dataset.




## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe()

### Variables Description



*   State: All 51 states
*   Account length: How long the account has been active
*   Area code: Area code number
*   International plan: International plan activated (Yes, No)
*   Voice mail plan: Voice mail plan activated (Yes, No)
*   Number vmail messages: Number of voice mail messages
*   Total day minutes: Total day minutes used
*   Total day calls: Total day calls made
*   Total day charge: Total day charge
*   Total eve minutes: Total evening minutes
*   Total eve calls: Total evening calls
*   Total eve charge: Total evening charge
*   Total night minutes: Total night minutes
*   Total night calls: Total night calls
*   Total night charge: Total night charge
*   Total intl minutes: Total international minutes used
*   Total intl calls: Total international calls made
*   Total intl charge: Total international charge
*   Customer service calls: Number of customer service calls made
*   Churn: Customer churn (target variable; True = churned, False = retained)



### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
df.nunique()

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
# Total number of customers who churned (Churn is a boolean column)
print("No. of customers churning :", df['Churn'].sum())
# Subset of churned customers, reused in later cells
churn_df = df[df['Churn'] == True]

# Total number of unique area codes
print("No. of unique area codes :", df['Area code'].nunique())
# Customers with the international plan activated
print("No. of customers with international plan activated :", (df['International plan'] == 'Yes').sum())
# Customers with the voice mail plan activated
print("No. of customers with voice mail plan activated :", (df['Voice mail plan'] == 'Yes').sum())
churn_df

In [None]:
# % of total customers churning
print("Total no. of customers :", df['Churn'].count())
perc_churn = churn_df['Churn'].count() / df['Churn'].count() * 100
print(f"Percentage of customers churning : {round(perc_churn, 2)}%")
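Because `Churn` is stored as a boolean, the percentage computed above is also just the mean of that column times 100, since `True` counts as 1 and `False` as 0. A minimal sketch on a made-up frame (the values are illustrative, not from the dataset):

```python
import pandas as pd

# Hypothetical stand-in for the churn data: Churn is a boolean column
sample = pd.DataFrame({"Churn": [False, True, False, False, True, False, False, False]})

# The mean of a boolean column is the fraction of True values,
# so the churn rate can be computed without filtering first
churn_rate = sample["Churn"].mean() * 100
print(f"Percentage of customers churning: {churn_rate:.2f}%")
```

On the real data, `df['Churn'].mean() * 100` should match `perc_churn`.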



In [None]:
# State-wise count of churned customers
state_cust_churn = churn_df.groupby(['State'])['Churn'].value_counts().reset_index(name='Churn_customer')
print("Total churned customers :", state_cust_churn['Churn_customer'].sum())
state_cust_churn

In [None]:
# Account length wise churn data
acc_len_churn = churn_df.groupby(['Account length'])['Churn'].value_counts().reset_index(name='values')
print("Total churned customers :", acc_len_churn['values'].sum())
acc_len_churn

In [None]:
# Area Code wise churn Percentage
Area_code_churn_perc = (df.groupby(['Area code'])['Churn'].mean() * 100).reset_index()
Area_code_churn_perc
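The area-code cells compute the churn percentage and the churned count separately. One way to get both, plus the group size, in a single table is a pandas named aggregation. A sketch on made-up data; `churn_rate_by` is a hypothetical helper, not part of the original notebook:

```python
import pandas as pd

def churn_rate_by(frame: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return churned count, customer count and churn % for each value of `column`."""
    summary = frame.groupby(column)["Churn"].agg(churned="sum", customers="count")
    summary["churn_pct"] = summary["churned"] / summary["customers"] * 100
    return summary.reset_index()

# Hypothetical sample: three area codes with made-up churn labels
sample = pd.DataFrame({
    "Area code": [408, 408, 415, 415, 415, 510],
    "Churn": [True, False, False, False, True, True],
})
print(churn_rate_by(sample, "Area code"))
```

The same call works for any categorical column, e.g. `churn_rate_by(df, 'State')` or `churn_rate_by(df, 'International plan')`.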



In [None]:
# Area code wise Churn Count
Area_code_churn_count = churn_df.groupby(['Area code'])['Churn'].value_counts().reset_index(name='Counts')
Area_code_churn_count



In [None]:
# States with churned customers in each area code (used below to look for states with possible poor connectivity)
Area_state_churn = churn_df.groupby(['Area code'])['State'].unique().reset_index(name='Unique state')
Area_state_churn

In [None]:
# Assigning the list of states for each area code
Area_408_state_churn = Area_state_churn.loc[0, 'Unique state']
Area_415_state_churn = Area_state_churn.loc[1, 'Unique state']
Area_510_state_churn = Area_state_churn.loc[2, 'Unique state']
# States with churned customers in all three area codes
inter_1 = set(Area_408_state_churn).intersection(set(Area_415_state_churn))
poor_connectivity_states = inter_1.intersection(set(Area_510_state_churn))
print(f"Intersecting the churned states across area codes gives {len(poor_connectivity_states)} possible poor-connectivity states")
print("Try to narrow down the list of states with other data to identify poor-connectivity states more precisely")
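The three-way intersection above uses one variable per area code. The same idea extends to any number of groups by intersecting one set per group. A sketch on made-up data; `values_common_to_all_groups` is a hypothetical name:

```python
import pandas as pd

def values_common_to_all_groups(frame: pd.DataFrame, group_col: str, value_col: str) -> set:
    """Return the values of `value_col` that appear in every group of `group_col`."""
    per_group = frame.groupby(group_col)[value_col].unique()
    # One set per group, intersected all at once
    return set.intersection(*(set(values) for values in per_group))

# Hypothetical churned customers: state and area code only
churned = pd.DataFrame({
    "Area code": [408, 408, 415, 415, 510, 510],
    "State": ["NJ", "TX", "NJ", "MD", "NJ", "TX"],
})
common = values_common_to_all_groups(churned, "Area code", "State")
print(common)
```

Applied as `values_common_to_all_groups(churn_df, 'Area code', 'State')`, it should give the same set as `poor_connectivity_states`, without hard-coding the row positions 0, 1 and 2.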



In [None]:
# Percentage of customers with and without the international plan who churned
intl_plan_churn = (df.groupby(['International plan'])['Churn'].mean() * 100).reset_index(name='Churn %')
print(intl_plan_churn)
# Total number of churned customers with the international plan
churn_intl_yes = churn_df[churn_df['International plan'] == 'Yes']
print(f"No. of customers churning with the international plan : {len(churn_intl_yes)}")
# Total number of churned customers without the international plan
churn_intl_no = churn_df[churn_df['International plan'] == 'No']
print(f"No. of customers churning without the international plan : {len(churn_intl_no)}")

In [None]:
# Percentage of customers with and without the voice mail plan who churned
vmail_plan_churn = (df.groupby(['Voice mail plan'])['Churn'].mean() * 100).reset_index(name='Churn %')
print(vmail_plan_churn)
# Total number of churned customers with the voice mail plan
churn_vmail_yes = churn_df[churn_df['Voice mail plan'] == 'Yes']
print(f"No. of customers churning with the voice mail plan : {len(churn_vmail_yes)}")
# Total number of churned customers without the voice mail plan
churn_vmail_no = churn_df[churn_df['Voice mail plan'] == 'No']
print(f"No. of customers churning without the voice mail plan : {len(churn_vmail_no)}")
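The plan-wise cells compute the churn percentage and the churned counts in separate steps. `pd.crosstab` gives both tables directly. A minimal sketch on a made-up sample, with the boolean `Churn` labels mapped to readable strings (the mapping is an addition for display only):

```python
import pandas as pd

# Hypothetical sample with the same column names as the churn data
sample = pd.DataFrame({
    "International plan": ["No", "No", "Yes", "Yes", "No", "Yes"],
    "Churn": [False, True, True, True, False, False],
})

churn_label = sample["Churn"].map({True: "Churned", False: "Retained"})
# Counts of churned / retained customers for each plan value
counts = pd.crosstab(sample["International plan"], churn_label)
# Row-normalised version: churn percentage within each plan value
row_pct = pd.crosstab(sample["International plan"], churn_label, normalize="index") * 100
print(counts)
print(row_pct.round(2))
```

Swapping in `df` and `'Voice mail plan'` gives the voice mail version of the same two tables.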

In [None]:
df.columns

In [None]:
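# Poor Connectivity States
# Churned customers split by the four combinations of international and voice mail plan
# (assumed: these subsets are drawn from churn_df, like the area code analysis above)
intl_vmail_yes = churn_df[(churn_df['International plan'] == 'Yes') & (churn_df['Voice mail plan'] == 'Yes')]
intl_vmail_no = churn_df[(churn_df['International plan'] == 'No') & (churn_df['Voice mail plan'] == 'No')]
intl_yes_vmail_no = churn_df[(churn_df['International plan'] == 'Yes') & (churn_df['Voice mail plan'] == 'No')]
intl_no_vmail_yes = churn_df[(churn_df['International plan'] == 'No') & (churn_df['Voice mail plan'] == 'Yes')]

# States sorted with respect to international and voice mail plan
state_intl_vmail_yes = intl_vmail_yes['State'].unique()
state_intl_vmail_no = intl_vmail_no['State'].unique()
state_intl_yes_vmail_no = intl_yes_vmail_no['State'].unique()
state_intl_no_vmail_yes = intl_no_vmail_yes['State'].unique()

# Intersection of the 4 plan combinations
inter_1 = set(state_intl_vmail_yes).intersection(set(state_intl_vmail_no))
inter_2 = set(state_intl_yes_vmail_no).intersection(set(state_intl_no_vmail_yes))
Intersection = inter_1.intersection(inter_2)
print(f"Possible poor connectivity states : {list(Intersection)}")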
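The four-way intersection above asks which states have churned customers under every combination of International plan and Voice mail plan. The same question can be answered by counting distinct plan combinations per state. A sketch on made-up data, assuming (as above) that the input is the churned customers only:

```python
import pandas as pd

# Hypothetical churned customers with both plan columns
churned = pd.DataFrame({
    "State": ["NJ", "NJ", "NJ", "NJ", "TX", "TX"],
    "International plan": ["Yes", "Yes", "No", "No", "Yes", "No"],
    "Voice mail plan": ["Yes", "No", "Yes", "No", "Yes", "No"],
})

plan_cols = ["International plan", "Voice mail plan"]
# Number of distinct plan combinations seen among churned customers in each state
combos_per_state = churned.drop_duplicates(subset=["State"] + plan_cols).groupby("State").size()
# Number of plan combinations present in the data overall (at most 4)
n_combos = len(churned[plan_cols].drop_duplicates())
# States that have churned customers under every combination
states_all_combos = sorted(combos_per_state[combos_per_state == n_combos].index)
print(states_all_combos)
```

This avoids creating one filtered frame per combination, and it keeps working if a plan column gains a third value.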

In [None]:
# Average number of voice mail messages for churned / not-churned customers
vmail_mssg_churn = df.groupby(['Churn'])['Number vmail messages'].mean().reset_index(name='avg_vmail_mssg')
vmail_mssg_churn


In [None]:
# Day Data
# Average day minutes for churned / not-churned customers
df.groupby(['Churn'])['Total day minutes'].mean().reset_index(name='avg_day_mins')


In [None]:
# Evening Data
# Average evening minutes, calls and charge for churned / not-churned customers
df.groupby(['Churn'])['Total eve minutes'].mean().reset_index(name='avg_eve_mins')


In [None]:
df.groupby(['Churn'])['Total eve calls'].mean().reset_index(name='avg_eve_calls')

In [None]:
df.groupby(['Churn'])['Total eve charge'].mean().reset_index(name='avg_eve_charge')

In [None]:
# Night Data
# Average night minutes, calls and charge for churned / not-churned customers
df.groupby(['Churn'])['Total night minutes'].mean().reset_index(name='avg_night_mins')


In [None]:
df.groupby(['Churn'])['Total night calls'].mean().reset_index(name='avg_night_calls')


In [None]:
df.groupby(['Churn'])['Total night charge'].mean().reset_index(name='avg_night_charge')



In [None]:
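# International Data
# Average international minutes, calls and charge for churned / not-churned customers
df.groupby(['Churn'])['Total intl minutes'].mean().reset_index(name='avg_intl_mins')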


In [None]:
df.groupby(['Churn'])['Total intl calls'].mean().reset_index(name='avg_intl_calls')


In [None]:
df.groupby(['Churn'])['Total intl charge'].mean().reset_index(name='avg_intl_charge')
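The cells above compare churners and non-churners one column at a time. A single `groupby(...).mean()` over a list of columns gives the same averages in one table. A sketch with made-up numbers; the relative-difference column is an addition, not part of the original analysis:

```python
import pandas as pd

# Hypothetical sample with a few of the usage columns
sample = pd.DataFrame({
    "Churn": [False, False, True, True],
    "Total day minutes": [180.0, 170.0, 250.0, 230.0],
    "Total eve minutes": [200.0, 190.0, 210.0, 220.0],
    "Total intl calls": [4, 5, 3, 4],
})

usage_cols = ["Total day minutes", "Total eve minutes", "Total intl calls"]
# One row per usage column, one column per churn label
avg_by_churn = (
    sample.groupby("Churn")[usage_cols].mean().T
    .rename(columns={False: "Not churned", True: "Churned"})
)
# Relative difference of churners vs non-churners, in percent
avg_by_churn["diff_pct"] = (avg_by_churn["Churned"] / avg_by_churn["Not churned"] - 1) * 100
print(avg_by_churn.round(2))
```

On `df`, `usage_cols` can list all twelve day/evening/night/international columns, so the largest gaps between the two groups stand out in one table.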

In [None]:
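# Combining day, evening and night calls, minutes and charges
df['Total calls'] = df.loc[:, ['Total day calls', 'Total eve calls', 'Total night calls']].sum(axis=1)
df['Total mins'] = df.loc[:, ['Total day minutes', 'Total eve minutes', 'Total night minutes']].sum(axis=1)
df['Total charge'] = df.loc[:, ['Total day charge', 'Total eve charge', 'Total night charge']].sum(axis=1)

# Denominators with 0 replaced by NaN, so customers with no calls/minutes give NaN instead of inf
# (inf values would break the histograms drawn later)
total_calls_nz = df['Total calls'].replace(0, np.nan)
total_mins_nz = df['Total mins'].replace(0, np.nan)
intl_calls_nz = df['Total intl calls'].replace(0, np.nan)
intl_mins_nz = df['Total intl minutes'].replace(0, np.nan)

# Minutes per call
df['min_per_call'] = df['Total mins'] / total_calls_nz

# Charge per minute
df['charge_per_min'] = df['Total charge'] / total_mins_nz

# International minutes per call
df['Intl_min_per_call'] = df['Total intl minutes'] / intl_calls_nz

# International charge per minute
df['Intl_charge_per_min'] = df['Total intl charge'] / intl_mins_nz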
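The recommendations at the end of this notebook state that the International Plan's charge is the same as the regular one. The per-minute international charge computed above can test that by comparing its average across the two plan values. A sketch on made-up rows; the rate of 0.27 per minute is invented, so the equal rates here hold by construction and only the real `df` can confirm the claim:

```python
import numpy as np
import pandas as pd

# Hypothetical rows; the flat rate 0.27 per minute is made up for illustration
sample = pd.DataFrame({
    "International plan": ["Yes", "No", "Yes", "No"],
    "Total intl minutes": [10.0, 12.0, 0.0, 8.0],
})
sample["Total intl charge"] = (sample["Total intl minutes"] * 0.27).round(2)

# Guard the 0-minute row so it yields NaN rather than 0/0
rate = sample["Total intl charge"] / sample["Total intl minutes"].replace(0, np.nan)
sample["Intl_charge_per_min"] = rate

# Average per-minute international charge with and without the plan (NaN rows are skipped)
rate_by_plan = sample.groupby("International plan")["Intl_charge_per_min"].mean()
print(rate_by_plan.round(3))
```

On the notebook data the equivalent call is `df.groupby('International plan')['Intl_charge_per_min'].mean()`.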



In [None]:
# Customer Service Call data
print("No of unique service calls made :", df['Customer service calls'].nunique())

# Percentage of churning based on the customer service calls made
(df.groupby(['Customer service calls'])['Churn'].mean()*100).reset_index(name='Perc_churned')
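The conclusion singles out customers with four or more service calls. That cut-off can be checked directly by grouping on a boolean flag. A sketch on made-up data, where the threshold of 4 comes from the conclusion and everything else is illustrative:

```python
import pandas as pd

# Hypothetical sample of customer service calls and churn labels
sample = pd.DataFrame({
    "Customer service calls": [0, 1, 2, 4, 5, 1, 4, 3],
    "Churn": [False, False, True, True, True, False, False, False],
})

# Flag customers at or above the threshold (4 calls, as in the conclusion)
high_calls = sample["Customer service calls"] >= 4
# Churn percentage for customers below (False) vs at/above (True) the threshold
rate_by_flag = sample.groupby(high_calls)["Churn"].mean() * 100
print(rate_by_flag)
```

Moving the threshold (3, 4, 5, ...) and rerunning on `df` shows where the jump in churn actually occurs.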


In [None]:
print(df.dtypes)


In [None]:
df.info()

In [None]:
df.sample(50)

In [None]:
df['Total call minutes'] = df['Total day minutes'] + df['Total eve minutes']+ df['Total night minutes']



In [None]:
df

### What all manipulations have you done and insights you found?

Answer Here.

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code

In [None]:
# Chart - 1: Dependent Column Value Counts
print(df['Churn'].value_counts(), "\n")

# Dependent Variable Column Visualization
df['Churn'].value_counts().plot.pie(figsize=(10, 6), autopct="%1.1f%%", startangle=50, shadow=True, labels=['Not Churn(%)', 'Churn(%)'], colors=['yellow', 'blue'], explode=[0.12, 0])
plt.title('Total Percentage of Churn')
plt.show()


##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 2

In [None]:
# Chart - 2 visualization code

In [None]:
# Chart - 2: Histogram & boxplot for each numeric column to show its distribution
for col in df.describe().columns:
    fig,axes = plt.subplots(nrows=1,ncols=2,figsize=(18,6))
    sns.histplot(df[col], ax = axes[0],kde = True)
    sns.boxplot(df[col], ax = axes[1],orient='h',showmeans=True,color='pink')
    fig.suptitle("Distribution plot of "+ col, fontsize = 15)
    plt.show()


##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 3

In [None]:
# Chart - 3 visualization code

In [None]:
# Visualizing the top 10 states by churn percentage
plt.figure(figsize=(10, 5))
bar1 = (df.groupby('State')['Churn'].mean() * 100).sort_values(ascending=False).head(10).reset_index(name="Average True Churn ")
plots = sns.barplot(data=bar1, x='State', y="Average True Churn ")

plt.title("States with the highest churn percentage", fontsize=20)
plt.xlabel('State', fontsize=15)
plt.ylabel('Percentage (%)', fontsize=15)
plt.ylim(0, 30)
plt.show()


In [None]:
# Visualizing the bottom 10 states by churn percentage
plt.figure(figsize=(10, 5))
bar1 = (df.groupby('State')['Churn'].mean() * 100).sort_values().head(10).reset_index(name="Average True Churn ")
plots = sns.barplot(data=bar1, x='State', y="Average True Churn ")

plt.title("States with the lowest churn percentage", fontsize=20)
plt.xlabel('State', fontsize=15)
plt.ylabel('Percentage (%)', fontsize=15)
plt.ylim(0, 10)
plt.show()
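The top-10 and bottom-10 cells sort the per-state churn percentage and take the head. `Series.nlargest` and `Series.nsmallest` express the same step directly. A sketch on a made-up four-state sample:

```python
import pandas as pd

# Hypothetical sample: churn labels for four states
sample = pd.DataFrame({
    "State": ["NJ"] * 3 + ["TX"] * 2 + ["HI"] * 4 + ["AK"] * 4,
    "Churn": [True, True, True,
              True, False,
              False, False, False, False,
              True, False, False, False],
})

state_churn_pct = sample.groupby("State")["Churn"].mean() * 100
# nlargest/nsmallest are shorthand for sort_values(...).head(n)
top_states = state_churn_pct.nlargest(2)
bottom_states = state_churn_pct.nsmallest(2)
print(top_states, bottom_states, sep="\n\n")
```

With `df` and `n=10`, these two Series can be passed straight to `sns.barplot(x=top_states.index, y=top_states.values)`.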


##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 4

In [None]:
# Chart - 4 visualization code

In [None]:
# Chart - 4: One Digit Account Length
one_length = df[df['Account length'] <= 9]['Churn'].value_counts()
print(one_length, "\n")

# Visualizing One Digit Account Length Based on Churn percentage
plt.figure(figsize=(15, 7))
colors = sns.color_palette('pastel')[0:7]
textprops = {'fontsize': 13}

plt.pie(one_length, labels=['Not Churn(%)', 'Churn(%)'], startangle=90, colors=colors, autopct="%1.1f%%", textprops=textprops)
plt.title('One Digit Account Length churn rate', fontsize=18)
plt.show()


In [None]:
# Two Digit Account Length
two_account = df[(df['Account length'] >= 10) & (df['Account length'] <= 99)]['Churn'].value_counts()
print(two_account, "\n")

# Visualizing Two Digit Account Length Based on Churn percentage
plt.figure(figsize=(15, 7))
colors = sns.color_palette('pastel')[0:7]
textprops = {'fontsize': 13}

plt.pie(two_account, labels=['Not Churn(%)', 'Churn(%)'], startangle=90, colors=colors, autopct="%1.1f%%", textprops=textprops)
plt.title('Two Digit Account Length churn rate', fontsize=18)
plt.show()


In [None]:
# Three Digit Account Length
three_account = df[(df['Account length'] >= 100)]['Churn'].value_counts()
print(three_account, "\n")

# Visualizing Three Digit Account Length Based on Churn percentage
plt.figure(figsize=(15, 7))
colors = sns.color_palette('pastel')[0:7]
textprops = {'fontsize': 13}

plt.pie(three_account, labels=['Not Churn(%)', 'Churn(%)'], startangle=90, colors=colors, autopct="%1.1f%%", textprops=textprops)
plt.title('Three Digit Account Length churn rate', fontsize=18)
plt.show()


In [None]:
# Box Plot for Account Length attribute
plt.figure(figsize=(10,8))
sns.boxplot(data=df, x='Churn', y='Account length', showmeans = True)
plt.title('Account Length Boxplot with Churn', fontsize=18)
plt.show()
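The three pie charts split account length into one-, two- and three-digit groups by hand. `pd.cut` can assign the same buckets in one step, so the churn rate per bucket comes from a single `groupby`. A sketch on made-up values; the bucket edges follow the `<= 9`, `10-99` and `>= 100` filters above:

```python
import pandas as pd

# Hypothetical account lengths and churn labels
sample = pd.DataFrame({
    "Account length": [5, 9, 10, 45, 99, 100, 150, 243],
    "Churn": [True, False, False, True, False, False, True, True],
})

# Buckets matching the one-, two- and three-digit groups (include_lowest keeps 0 in the first bin)
bins = [0, 9, 99, float("inf")]
labels = ["1 digit (0-9)", "2 digits (10-99)", "3 digits (100+)"]
sample["Length group"] = pd.cut(sample["Account length"], bins=bins, labels=labels, include_lowest=True)

# Churn percentage per bucket (observed=True keeps only non-empty buckets)
group_churn = sample.groupby("Length group", observed=True)["Churn"].mean() * 100
print(group_churn.round(1))
```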


##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 5

In [None]:
# Chart - 5 visualization code

# Data for international plan
inter_plan = df['International plan'].value_counts()
print(inter_plan, "\n")

# Visualizing percentage of customers who have taken the international plan
plt.figure(figsize=(15, 7))
colors = sns.color_palette('husl')[0:7]
textprops = {'fontsize': 13}

# Labels taken from the value_counts index so they always match the slice order
plt.pie(inter_plan, labels=inter_plan.index, startangle=90, colors=colors, autopct="%1.1f%%", textprops=textprops)
plt.title('International Plan', fontsize=18)
plt.show()


##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 6

In [None]:
# Chart - 6 visualization code

# Data for voice mail plan
voice = df['Voice mail plan'].value_counts()

# Visualizing percentage of customers having voice mail plan
plt.figure(figsize=(9, 7))
palette_color = sns.color_palette('pastel')
textprops = {'fontsize': 13}

# Plotting chart of voice mail (labels follow the value_counts order)
plt.pie(voice, labels=voice.index, startangle=90, colors=palette_color, autopct="%1.1f%%", textprops=textprops)
plt.title('Distribution of customers having voice mail plan', fontsize=18)
plt.show()


In [None]:
# Visualizing churn percentage for customers with and without the voice mail plan
# Churn % per plan value; index and values are used together so labels cannot drift
b = df.groupby('Voice mail plan')['Churn'].mean() * 100

plt.figure(figsize=(6, 8))
plots = sns.barplot(x=b.index, y=b.values)
plt.title("Percentage of customer churn by voice mail plan", fontsize=20)
plt.xlabel('Voice mail plan', fontsize=15)
plt.ylabel('Percentage (%)', fontsize=15)
plt.show()


##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 7

In [None]:
# Chart - 7 visualization code
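# Visualizing Area Code wise average churn percentage
# groupby sorts the area codes, so its own index is used for the x labels
# (df['Area code'].unique() follows row order and would mislabel the bars)
b1 = df.groupby('Area code')['Churn'].mean() * 100

plt.figure(figsize=(6, 5))
plots = sns.barplot(x=b1.index, y=b1.values)

plt.title('Area Code vs Churn Percentage', fontsize=20)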
plt.xlabel('Area code', fontsize = 15)
plt.ylabel('Churn percentage (%)', fontsize = 15)
plt.ylim(0,17)
plt.show()



##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 8

In [None]:
# Chart - 8 visualization code
# Average of total day calls, total day minutes & total day charge of churn
cn_dcalls = pd.DataFrame(df.groupby(["Churn"])['Total day calls'].mean())
print(cn_dcalls)
print('')
cn_dm = pd.DataFrame(df.groupby(["Churn"])['Total day minutes'].mean())
print(cn_dm)
print('')
cn_dc = pd.DataFrame(df.groupby(["Churn"])['Total day charge'].mean())
print(cn_dc)


In [None]:
# Visualizing total day minutes vs total day charge
plt.figure(figsize=(7,8))
sns.scatterplot(data=df, x="Total day minutes", y="Total day charge", hue="Churn")
plt.title('Total Day Minutes vs Total Day Charge', fontsize=18)
plt.xlabel('Total day minutes',fontsize = 13)
plt.ylabel('Total day charges',fontsize = 13)
plt.show()
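The scatter plot suggests that day charge is a straight-line function of day minutes, which the conclusion states for all four charge fields. A least-squares fit with `np.polyfit` puts a number on that. The rate of 0.17 below is made up; running the same two calls on `df['Total day minutes']` and `df['Total day charge']` gives the actual slope:

```python
import numpy as np

# Hypothetical minutes and a made-up flat rate of 0.17 per minute, rounded to cents
minutes = np.array([100.0, 150.0, 200.0, 265.1])
charge = np.round(minutes * 0.17, 2)

# Fit charge = slope * minutes + intercept by least squares
slope, intercept = np.polyfit(minutes, charge, deg=1)
# Pearson correlation between the two fields
r = np.corrcoef(minutes, charge)[0, 1]
print(f"slope={slope:.4f}, intercept={intercept:.4f}, r={r:.6f}")
```

An intercept near 0 together with r close to 1 means the charge column carries no information beyond the minutes column.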


##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 9

In [None]:
# Chart - 9 visualization code

In [None]:
# Chart - 9 visualization code
# data for customer service calls
service = pd.DataFrame(df.groupby('Customer service calls')['Churn'].mean()*100)

# Visualizing churn rate per customer service calls
plt.figure(figsize=(12,9))
plots = sns.barplot(x=service.index, y=service['Churn'])

plt.title("Churn rate per service call", fontsize = 20)
plt.xlabel('No. of customer service calls', fontsize = 15)
plt.ylabel('Percentage (%)', fontsize = 15)
plt.show()


##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 10

In [None]:
# Chart - 10 visualization code

In [None]:

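# Correlation Heatmap visualization code
# numeric_only=True skips text columns (State, plan columns), which pandas 2.x would otherwise reject
corr = df.corr(numeric_only=True)
# Mask the upper triangle so each pair of variables is shown once
mask = np.triu(np.ones_like(corr, dtype=bool))

with sns.axes_style("white"):
    f, ax = plt.subplots(figsize=(18, 12))
    ax = sns.heatmap(corr, mask=mask, vmin=-1, vmax=1, annot=True, cmap="YlGnBu")
plt.show()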
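The heatmap shows every pairwise correlation at once. To see which variables move most with churn, take the `Churn` column of the same matrix and sort it by absolute value. A sketch on a small made-up frame; `numeric_only=True` drops the text column, while the boolean `Churn` column is kept as 0/1:

```python
import pandas as pd

# Hypothetical sample mixing numeric, boolean and text columns
sample = pd.DataFrame({
    "State": ["KS", "OH", "NJ", "OH", "TX", "NJ"],
    "Customer service calls": [1, 4, 0, 5, 2, 4],
    "Total day minutes": [150.0, 260.0, 140.0, 270.0, 160.0, 240.0],
    "Account length": [100, 90, 120, 110, 95, 105],
    "Churn": [False, True, False, True, False, True],
})

# numeric_only=True skips text columns such as State; the bool Churn column is kept
churn_corr = sample.corr(numeric_only=True)["Churn"].drop("Churn")
# Rank features by the strength (absolute value) of their correlation with churn
ranked = churn_corr.reindex(churn_corr.abs().sort_values(ascending=False).index)
print(ranked.round(3))
```

Correlation only captures linear association, so a step pattern like the customer service calls threshold may rank lower here than the bar chart suggests.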


##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

## ***5. Solution to Business Objective***

#### What do you suggest the client do to achieve the business objective?
Explain briefly.

Solutions to reduce customer churn:

*   Revise the International Plan, since its per-minute charge is the same as for customers without the plan.
*   Be proactive with communication.
*   Ask for feedback often.
*   Periodically run offers to retain customers.
*   Investigate the problems faced by customers in the states with the highest churn.
*   Lean into the best customers.
*   Carry out regular server maintenance.
*   Solve poor network connectivity issues.
*   Define a roadmap for new customers.
*   Analyze churn when it happens.
*   Stay competitive.

# **Conclusion**

The telecommunications market is already well-established, and new customer growth is slow. As a result, companies in this industry prioritize retaining existing customers and reducing churn. This project analyzed a churn dataset to identify the main factors contributing to churn. The exploratory data analysis produced the following insights:

1. Each of the four charge fields (day, evening, night, international) is directly proportional to its corresponding minutes field.

2. The area code may not be relevant and can be excluded.

3. Customers with the International Plan tend to churn more often.

4. Customers who have had four or more customer service calls churn significantly more than other customers.

5. Customers with high day and evening minute usage tend to churn at a higher rate.

6. There is no clear relationship between churn and the variables such as day calls, evening calls, night calls, international calls, night minutes, international minutes, account length, or voice mail messages.



### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***