
# **Project Name**    - Exploratory Data Analysis of Telecom Churn



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Member -** Souvik Chakraborty


# **Project Summary -**

Telecom companies often struggle with customer churn, which refers to the number of customers who leave the company over a given period. In this project, we aimed to analyze the churn rate of a telecom company and identify the factors that contribute to customer churn.

# **GitHub Link -**

https://github.com/SouvikChakraborty472/Exploratory_Data_Analysis

# **Problem Statement**


Orange S.A., formerly France Telecom S.A., is a French multinational telecommunications corporation. The Orange Telecom Churn Dataset consists of cleaned customer activity data, along with a churn label specifying whether a customer cancelled the subscription. Explore and analyze the data to discover the key factors responsible for customer churn and come up with recommendations to ensure customer retention.

#### **Define Your Business Objective**

*   Identify the key causes of customer churn.
*   Provide steps to retain valuable customers.



# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.

     The additional credits will give these students an advantage over others during Star Student selection.

     [ Note: Deployment Ready Code means that the whole .ipynb notebook can be executed in one go without a single error being logged. ]

3.   Every piece of logic should have proper comments.
4.   You may add as many charts as you want. For each chart, make sure the following questions are answered.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
*   Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with a specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints: Do the visualization in a structured way, following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
import ast

### Dataset Loading

In [None]:
# Mount Drive
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Load Dataset
telecom_data = pd.read_csv('/content/drive/MyDrive/TelecomChurn.csv')
telecom_data

### Dataset First View

In [None]:
# Dataset First Look
telecom_data.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
num_rows, num_columns = telecom_data.shape

print("Number of rows:", num_rows)
print("Number of columns:", num_columns)

### Dataset Information

In [None]:
# Dataset Info
telecom_data.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
num_duplicates = telecom_data.duplicated().sum()
print("Number of duplicates:", num_duplicates)

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
telecom_data.isnull().sum()

In [None]:
# Visualizing the missing values
sns.heatmap(telecom_data.isnull())

### What did you know about your dataset?

The dataset relates to customer churn in the telecommunications industry. It has 3,333 rows and 20 columns and contains no null or duplicate values. The data records customer activity during the day, evening and night, the types of plans each customer subscribes to, the area each customer belongs to, and a column named Churn, which shows whether the customer cancelled the subscription.
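The checks above (shape, duplicates and missing values) can be wrapped into a single helper so they are easy to re-run after each wrangling step. This is a minimal sketch, assuming a pandas DataFrame; the function name `data_quality_summary` and the toy rows are hypothetical and not part of the original notebook.

```python
import pandas as pd


def data_quality_summary(df: pd.DataFrame) -> dict:
    """Return row/column counts plus duplicate-row and missing-value totals."""
    n_rows, n_cols = df.shape
    return {
        "rows": n_rows,
        "columns": n_cols,
        "duplicate_rows": int(df.duplicated().sum()),
        "missing_values": int(df.isnull().sum().sum()),
    }


# Hypothetical toy frame standing in for telecom_data
toy = pd.DataFrame({
    "State": ["KS", "OH", "NJ", "OH"],
    "Account length": [128, 107, 137, 84],
    "Churn": [False, False, False, True],
})

summary = data_quality_summary(toy)
print(summary)
```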

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
telecom_data.columns

In [None]:
# Dataset Describe
telecom_data.describe()

### Variables Description

* **State**: Categorical variable for the customer's U.S. state

* **Account length**: Number of days the account has been active

* **Area code**: Area code number; each area code includes customers from several states

* **International plan**: International plan activated (Yes, No)

* **Voice mail plan**: Voice mail plan activated (Yes, No)

* **Number vmail messages**: Number of voice mail messages

* **Total day minutes**: Total day minutes used

* **Total day calls**: Total day calls made

* **Total day charge**: Total day charge

* **Total eve minutes**: Total evening minutes

* **Total eve calls**: Total evening calls

* **Total eve charge**: Total evening charge

* **Total night minutes**: Total night minutes

* **Total night calls**: Total night calls

* **Total night charge**: Total night charge

* **Total intl minutes**: Total international minutes used

* **Total intl calls**: Total international calls made

* **Total intl charge**: Total international charge

* **Customer service calls**: Number of customer service calls made

* **Churn**: Customer churn, the target variable (True = 1, False = 0)

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for i in telecom_data.columns:
  print("No. of unique values in ",i,"is",telecom_data[i].nunique())

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
print("No. of Customers Churning: ", telecom_data[telecom_data['Churn']==True].Churn.count())

# Variable assigning to churned data
churn_data = telecom_data[telecom_data['Churn'] == True]

# Total No. of Unique Area code
print("No. of Unique Area code: ", telecom_data['Area code'].nunique())

# Total No. of international plan activated
print("No. of customer with international plan: ", telecom_data[telecom_data['International plan']=='Yes']['International plan'].count())

# Total No. of voice mail plan activated
print("No. of customer with voice mail plan: ", telecom_data[telecom_data['Voice mail plan']=='Yes']['Voice mail plan'].count())

churn_data

In [None]:
# Customer churning Percentage
print("Total no. of customers: ", telecom_data.Churn.count())
percent_churn = churn_data.Churn.count()/telecom_data.Churn.count()*100
print(f"Percentage of customers churning: {round(percent_churn, 2)}%")

In [None]:
# Customer churn per state
state_cust_churn = churn_data.groupby(['State'])['Churn'].value_counts().reset_index(name='Churn_customer')
print(state_cust_churn.sum())
state_cust_churn

In [None]:
# Account length wise churn data
acc_len_churn = churn_data.groupby(['Account length'])['Churn'].value_counts().reset_index(name='values')
print(acc_len_churn.sum())
acc_len_churn

In [None]:
# Churn percentage per Area code
Area_code_churn_percent = (telecom_data.groupby(['Area code'])['Churn'].mean()*100).reset_index()
Area_code_churn_percent

In [None]:
# No. of churn per Area code
Area_code_churn_count = churn_data.groupby(['Area code'])['Churn'].value_counts().reset_index(name='Counts')
Area_code_churn_count

In [None]:
# Poor connectivity: states that appear among churned customers in every area code
Area_state_churn = churn_data.groupby(['Area code'])['State'].unique().reset_index(name='Unique state')
Area_state_churn

In [None]:
# Assigning the list of state of each area
Area_408_state_churn = Area_state_churn.loc[0, 'Unique state']
Area_415_state_churn = Area_state_churn.loc[1, 'Unique state']
Area_510_state_churn = Area_state_churn.loc[2, 'Unique state']

inter_1 = set(Area_408_state_churn).intersection(set(Area_415_state_churn))
poor_connectivity_states = set(inter_1).intersection(set(Area_510_state_churn))
print(len(poor_connectivity_states))

In [None]:
# Percentage of Churned customers with & without international plan
inter_plan_churn = (telecom_data.groupby(['International plan'])['Churn'].mean()*100).reset_index(name = 'Churn %')
print(inter_plan_churn)

# Churn customer with International plan
Churn_inter_yes = churn_data[churn_data['International plan'] == 'Yes']
print(Churn_inter_yes.Churn.count())

# Churn customer without international plan
Churn_inter_no = churn_data[churn_data['International plan'] == 'No']
print(Churn_inter_no.Churn.count())

In [None]:
# Percentage of Churned customer with and without vmail plan
vmail_plan_churn = (telecom_data.groupby(['Voice mail plan'])['Churn'].mean()*100).reset_index(name = 'Churn %')
print(vmail_plan_churn)

# Churn customer with vmail plan
Churn_vmail_yes = churn_data[churn_data['Voice mail plan'] == 'Yes']
print(Churn_vmail_yes.Churn.count())

# Churn customer without vmail plan
Churn_vmail_no = churn_data[churn_data['Voice mail plan'] == 'No']
print(Churn_vmail_no.Churn.count())

In [None]:
# Combining International & Voice mail plan

# Churned customer having both plans
Churn_both_plan = churn_data[(churn_data['International plan'] == 'Yes') & (churn_data['Voice mail plan'] == 'Yes')]
print(Churn_both_plan.Churn.count())

# Churned customer without having both plans
Churn_no_both_plan = churn_data[(churn_data['International plan'] == 'No') & (churn_data['Voice mail plan'] == 'No')]
print(Churn_no_both_plan.Churn.count())

# Churned customer having International plan but no vmail plan
Churn_inter_no_vmail = churn_data[(churn_data['International plan'] == 'Yes') & (churn_data['Voice mail plan'] == 'No')]
print(Churn_inter_no_vmail.Churn.count())

# Churned customer having vmail plan but no International plan
Churn_vmail_no_inter = churn_data[(churn_data['International plan'] == 'No') & (churn_data['Voice mail plan'] == 'Yes')]
print(Churn_vmail_no_inter.Churn.count())

In [None]:
# States with Poor Connectivity

# States sorted with respect to international and vmail plan
state_inter_vmail = Churn_both_plan['State'].unique()
state_inter_vmail_no = Churn_no_both_plan['State'].unique()
state_vmail_no_inter = Churn_vmail_no_inter['State'].unique()
state_inter_no_vmail = Churn_inter_no_vmail['State'].unique()

# States that appear among churned customers in all 4 plan combinations
inter_vmail = set(state_inter_vmail).intersection(set(state_inter_vmail_no))
vmail_no_inter = set(state_vmail_no_inter).intersection(set(state_inter_no_vmail))
Intersection = set(inter_vmail).intersection(set(vmail_no_inter))
print(list(Intersection))


In [None]:
# Average number of voice mail messages for churned vs. non-churned customers
vmail_mssg_churn = telecom_data.groupby(['Churn'])['Number vmail messages'].mean().reset_index(name='perc_vmail_mssg')
vmail_mssg_churn

In [None]:
# Day data (averages for churned vs. non-churned customers)
telecom_data.groupby(['Churn'])['Total day minutes'].mean().reset_index(name='mean_day_mins')

In [None]:
telecom_data.groupby(['Churn'])['Total day calls'].mean().reset_index(name='mean_day_calls')

In [None]:
telecom_data.groupby(['Churn'])['Total day charge'].mean().reset_index(name='mean_day_charge')

In [None]:
# Evening data (averages for churned vs. non-churned customers)
telecom_data.groupby(['Churn'])['Total eve minutes'].mean().reset_index(name='mean_eve_mins')

In [None]:
telecom_data.groupby(['Churn'])['Total eve calls'].mean().reset_index(name='mean_eve_calls')

In [None]:
telecom_data.groupby(['Churn'])['Total eve charge'].mean().reset_index(name='mean_eve_charge')

In [None]:
# Night data (averages for churned vs. non-churned customers)
telecom_data.groupby(['Churn'])['Total night minutes'].mean().reset_index(name='mean_night_mins')

In [None]:
telecom_data.groupby(['Churn'])['Total night calls'].mean().reset_index(name='mean_night_calls')

In [None]:
telecom_data.groupby(['Churn'])['Total night charge'].mean().reset_index(name='mean_night_charge')

In [None]:
# International call data (averages for churned vs. non-churned customers)
telecom_data.groupby(['Churn'])['Total intl minutes'].mean().reset_index(name='mean_intl_mins')

In [None]:
telecom_data.groupby(['Churn'])['Total intl calls'].mean().reset_index(name='mean_intl_calls')

In [None]:
telecom_data.groupby(['Churn'])['Total intl charge'].mean().reset_index(name='mean_intl_charge')

In [None]:
# Combining day, evening and night calls, minutes and charges
telecom_data['Total calls'] = telecom_data.loc[:,['Total day calls','Total eve calls', 'Total night calls']].sum(axis=1)
telecom_data['Total mins'] = telecom_data.loc[:,['Total day minutes','Total eve minutes', 'Total night minutes']].sum(axis=1)
telecom_data['Total charge'] = telecom_data.loc[:,['Total day charge','Total eve charge', 'Total night charge']].sum(axis=1)

# Zero denominators are replaced with NaN so the ratios are NaN (not inf) for customers with no usage
# Minutes per call
telecom_data['min_per_call'] = telecom_data['Total mins']/telecom_data['Total calls'].replace(0, np.nan)

# Charge per minute
telecom_data['charge_per_min'] = telecom_data['Total charge']/telecom_data['Total mins'].replace(0, np.nan)

# International minutes per call
telecom_data['Intl_min_per_call'] = telecom_data['Total intl minutes']/telecom_data['Total intl calls'].replace(0, np.nan)

# International charge per minute
telecom_data['Intl_charge_per_min'] = telecom_data['Total intl charge']/telecom_data['Total intl minutes'].replace(0, np.nan)

In [None]:
# Customer Service Call data
print("No of unique service calls made :", telecom_data['Customer service calls'].nunique())

# Percentage of churning based on the customer service calls made
(telecom_data.groupby(['Customer service calls'])['Churn'].mean()*100).reset_index(name='Perc_churned')

### What all manipulations have you done and insights you found?

In our data wrangling process, we first examined the churn rate to determine the proportion of customers who have churned compared to those who have not. We found that approximately 14.5% of customers have churned.

There are 483 churned customers in total. The largest number of them are from area code 415, but the churn rate is almost the same across all area codes.
Customers with an international plan churn at a much higher rate: 42.41% of them have churned.

The largest group of churned customers, around 62% of them, have neither an international plan nor a voice mail plan.
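Two different kinds of percentage appear above: the 42.41% figure is a churn *rate* within the international-plan group, whereas the ~62% figure is a *share* of all churned customers. A minimal pandas sketch on hypothetical toy data (not the real dataset) shows how each is computed:

```python
import pandas as pd

# Hypothetical toy data: 10 customers, not the real dataset
df = pd.DataFrame({
    "International plan": ["Yes", "Yes", "No", "No", "No", "No", "No", "No", "No", "No"],
    "Churn": [True, False, True, True, False, False, False, False, False, False],
})

# Churn RATE: fraction of each group's customers who churned
churn_rate = df.groupby("International plan")["Churn"].mean() * 100

# SHARE of churned customers: how the churned customers split across groups
churn_share = df.loc[df["Churn"], "International plan"].value_counts(normalize=True) * 100

print(churn_rate)
print(churn_share)
```

A group can hold most of the churned customers simply because it is large, even when its churn rate is lower than that of a smaller group.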

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1 Visualization of Dependent Variable

In [None]:
# Chart - 1  Dependent Column Value Counts
print(telecom_data.Churn.value_counts())
print(" ")

# Dependent Variable Column Visualization
telecom_data['Churn'].value_counts().plot(kind='pie', figsize=(10, 6),
                                          autopct="%1.1f%%",
                                          startangle=50,
                                          shadow=True,
                                          labels=['Not Churn(%)', 'Churn(%)'],
                                          colors=['green', 'red'],
                                          explode=[0.12, 0])
plt.title('Total Percentage of Churn')

# Displaying chart
plt.show()

##### 1. Why did you pick the specific chart?

Pie charts excel at illustrating the distribution of parts within a whole, particularly when the data is already presented as percentages. They offer a visually engaging and straightforward method for showcasing proportions or percentages within a dataset.

##### 2. What is/are the insight(s) found from the chart?

Based on the chart, we observe that out of the total customer dataset, 2,850 customers, or 85.5%, remain active, while 483 customers, equivalent to 14.5%, have churned. Although 14.5% might seem insignificant initially, it's crucial to note that this percentage has surged from 1.45% in the past. Therefore, urgent intervention is warranted to address this escalating churn rate.
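The percentages drawn by `autopct` can also be computed directly, which is handy for reporting them without a chart. A small sketch that rebuilds a `Churn` column from the counts stated above (2,850 False, 483 True) and uses `value_counts(normalize=True)`:

```python
import pandas as pd

# Rebuild a Churn column from the counts reported above
churn = pd.Series([False] * 2850 + [True] * 483, name="Churn")

# Same proportions the pie chart's autopct="%1.1f%%" displays
percentages = churn.value_counts(normalize=True).mul(100).round(1)
print(percentages)
```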

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.


In the telecom industry, the expected customer churn rate typically falls between 15% and 20%. While our current churn rate of 14.5% is not notably high, any level of customer churn can adversely affect business operations. Acquiring new customers is far more demanding than retaining existing ones. Additionally, the ripple effect of a single churned customer can result in the loss of three to four potential customers who could have been acquired through persuasive communication strategies. Thus, understanding and leveraging insights from the churn rate is imperative for making informed decisions moving forward.

#### Chart - 2 State vs Average True churn percentage

In [None]:
# Chart - 2  State vs. average true churn percentage visualization code

# Visualizing the top 10 states by churn percentage
plt.figure(figsize=(10,5))
bar1 = ((telecom_data.groupby(['State'])['Churn'].mean()*100).sort_values(ascending = False).reset_index(name="Average True Churn ").head(10))
plots = sns.barplot(data = bar1, x = 'State', y="Average True Churn ")
for bar in plots.patches:
  plots.annotate(format(bar.get_height(),'.2f'),
                   (bar.get_x() + bar.get_width() / 2,
                    bar.get_height()), ha='center', va='center',
                   size=12, xytext=(0, 8),
                   textcoords='offset points')

plt.title("States with the highest churn percentage", fontsize = 20)
plt.xlabel('State', fontsize = 15)
plt.ylabel('Percentage (%)', fontsize = 15)
# Setting limit of the y axis from 0 to 30
plt.ylim(0,30)
plt.show()

In [None]:
# Visualizing the bottom 10 states by churn percentage
plt.figure(figsize=(10,5))
bar1 = ((telecom_data.groupby(['State'])['Churn'].mean()*100).sort_values(ascending = True).reset_index(name="Average True Churn ").head(10))
plots = sns.barplot(data = bar1, x = 'State', y="Average True Churn ")
for bar in plots.patches:
  plots.annotate(format(bar.get_height(),'.2f'),
                   (bar.get_x() + bar.get_width() / 2,
                    bar.get_height()), ha='center', va='center',
                   size=12, xytext=(0, 8),
                   textcoords='offset points')
plt.title("States with the lowest churn percentage", fontsize = 20)
plt.xlabel('State', fontsize = 15)
plt.ylabel('Percentage (%)', fontsize = 15)
# Setting limit of y axis from 0 to 10
plt.ylim(0,10)
plt.show()

##### 1. Why did you pick the specific chart?


Bar charts serve as effective tools for comparing the magnitudes or occurrences of various categories or data groups. They prove invaluable for juxtaposing data across diverse categories and efficiently presenting extensive datasets within a confined area. In our case, we've employed a bar chart to illustrate the average percentage of true churn relative to different states.

##### 2. What is/are the insight(s) found from the chart?

Among the 51 states analyzed, 10 states (CA, NJ, TX, MD, SC, MI, MS, NV, WA and ME) have churn rates above 20%, well above the overall churn rate of 14.5%. Conversely, another 10 states (HI, AK, AZ, VA, IA, LA, NE, IL, WI and RI) have churn rates below 10%.
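Rather than reading the top and bottom 10 off two separate charts, every state can be tagged against the 20% and 10% thresholds used above. A sketch with `pd.cut`; the state labels and values are hypothetical placeholders for `telecom_data.groupby('State')['Churn'].mean()*100`:

```python
import pandas as pd

# Hypothetical per-state churn percentages (placeholder labels and values)
state_churn = pd.Series({"S1": 27.0, "S2": 22.0, "S3": 15.0, "S4": 9.0, "S5": 6.0}, name="churn_pct")

HIGH, LOW = 20.0, 10.0  # thresholds discussed in the text

# Sort once, then label each state by the band its churn rate falls in:
# [0, 10) -> low, [10, 20) -> medium, [20, inf) -> high
ranked = state_churn.sort_values(ascending=False).to_frame()
ranked["band"] = pd.cut(
    ranked["churn_pct"],
    bins=[0, LOW, HIGH, float("inf")],
    labels=["low", "medium", "high"],
    right=False,
)
print(ranked)
```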

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Indeed, the state-wise churn data highlights a significant number of states with an average churn rate exceeding 20%. This necessitates a thorough examination and further analysis to identify the underlying factors contributing to this high churn rate. By delving into these factors, we can gain insights crucial for devising strategies aimed at mitigating churn and enhancing customer retention in those states.

#### Chart - 3 Histogram and Box Plot (Univariate) per column

In [None]:
# Chart - 3 Visualizing code of histogram plot & boxplot for each columns to know the data distribution
for col in telecom_data.describe().columns:
    fig,axes = plt.subplots(nrows=1,ncols=2,figsize=(18,6))
    sns.histplot(telecom_data[col], ax = axes[0],kde = True)
    sns.boxplot(telecom_data[col], ax = axes[1],orient='h',showmeans=True,color='pink')
    fig.suptitle("Distribution plot of "+ col, fontsize = 15)
    plt.show()

##### 1. Why did you pick the specific chart?

When you aim to visualize the distribution of a single variable, histograms come into play. They're a specific type of bar chart designed to showcase the frequency or count of data points falling within predefined ranges. Histograms are particularly handy for representing continuous data like age, height, weight, or income. They offer insights into the distribution's shape, revealing any skewness or outliers and enabling the identification of patterns or trends.

On the other hand, box plots, also known as box-and-whisker plots, serve a different purpose. They excel in comparing the distributions of two or more datasets, providing a visual summary of their essential statistical properties. Box plots offer a glimpse into differences or similarities in the spread and central tendency of the data, aiding in the comparison and analysis of multiple datasets.
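To make the box plot concrete, the sketch below computes the quartiles and the 1.5 × IQR fences that matplotlib (and therefore seaborn) uses by default. Whiskers are drawn to the most extreme data point inside the fences, and points outside them are plotted as outliers. The helper name and sample values are hypothetical.

```python
import numpy as np


def box_plot_stats(values):
    """Quartiles, 1.5*IQR fences and outliers, as used for a standard box plot."""
    arr = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(arr, [25, 50, 75])
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = arr[(arr < lower_fence) | (arr > upper_fence)]
    return {
        "q1": q1, "median": median, "q3": q3,
        "lower_fence": lower_fence, "upper_fence": upper_fence,
        "outliers": outliers.tolist(),
    }


# Hypothetical sample: mostly 0-3 customer service calls plus one extreme value
stats = box_plot_stats([0, 1, 1, 1, 2, 2, 3, 9])
print(stats)
```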

##### 2. What is/are the insight(s) found from the chart?

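Based on the distribution charts, most of the usage columns (day, evening, night and international minutes and charges, and the day, evening and night call counts) have roughly symmetric distributions, with mean and median values approximately equal. The exceptions are Number vmail messages, which is concentrated at zero because most customers have no voice mail plan, and Customer service calls and Total intl calls, which are right-skewed.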

However, in the case of the "Area code" column, which contains only three distinct values, it will be treated as a categorical variable rather than a numerical one. Categorical variables represent discrete categories or groups and are typically analyzed differently from numerical variables. Therefore, for analytical purposes, we'll categorize "Area code" accordingly.
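The symmetry check and the Area code conversion described above can be sketched as follows, assuming pandas; the toy values are hypothetical. A skewness near 0 with mean close to median suggests a symmetric column, while a clearly positive skewness indicates a long right tail.

```python
import pandas as pd

# Hypothetical toy columns; in the notebook these come from telecom_data
df = pd.DataFrame({
    "Total day minutes": [150.0, 180.0, 179.0, 181.0, 210.0],
    "Customer service calls": [0, 1, 1, 1, 6],
    "Area code": [415, 408, 415, 510, 415],
})

# Mean vs. median and sample skewness for each numeric column
numeric = df[["Total day minutes", "Customer service calls"]]
summary_stats = pd.DataFrame({
    "mean": numeric.mean(),
    "median": numeric.median(),
    "skew": numeric.skew(),
})
print(summary_stats)

# Area code has only three distinct values, so store it as a categorical
df["Area code"] = df["Area code"].astype("category")
print(df["Area code"].cat.categories.tolist())
```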

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Absolutely, histograms and box plots offer valuable insights into the distribution of data within a dataset. They provide visual representations that help us understand the spread, central tendency, and any potential outliers or patterns present in the data. However, they don't provide a comprehensive view of the entire dataset or capture all aspects of its characteristics.

While histograms and box plots offer essential information about the distribution of individual variables, additional analysis techniques and statistical methods may be necessary to gain a deeper understanding of the data, such as correlation analysis, regression modeling, or hypothesis testing. These techniques can help uncover relationships between variables, identify trends, and make more informed decisions based on the data.
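As a first step toward the correlation analysis mentioned above, the correlation of each numeric column with churn can be computed after casting the boolean `Churn` column to 0/1. A Pearson correlation against a binary variable is the point-biserial correlation. A sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data; Churn is boolean in the notebook, so it is cast to int below
df = pd.DataFrame({
    "Customer service calls": [0, 1, 1, 2, 4, 5],
    "Account length": [100, 90, 120, 80, 110, 95],
    "Churn": [False, False, False, False, True, True],
})

# Pearson correlation of each numeric column with the 0/1 churn flag, strongest first
corr_with_churn = (
    df.assign(Churn=df["Churn"].astype(int))
      .corr()["Churn"]
      .drop("Churn")
      .sort_values(ascending=False)
)
print(corr_with_churn)
```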

#### Chart - 4 Account Length with Churn

In [None]:
# Chart - 4  One Digit Account Length
one_length = telecom_data[telecom_data['Account length']<=9].loc[:,['Churn']].value_counts()
print(one_length)
print(" ")

# Visualizing One Digit Account Length Based on Churn percentage
#color palette selection
colors = sns.color_palette('pastel')[0:7]
textprops = {'fontsize':13}

plt.figure(figsize=(15,7))
# plotting pie chart
plt.pie(one_length, labels=['Not Churn(%)','Churn(%)'], startangle=90, colors=colors, autopct="%1.1f%%",textprops = textprops)
plt.title('One Digit Account Length churn rate', fontsize=18)
plt.show()

In [None]:
# Two Digit Account Length
two_account=telecom_data[(telecom_data['Account length']<=99) & (telecom_data['Account length']>=10)].loc[:,['Churn']].value_counts()
print(two_account)
print(" ")

# Visualizing Two Digit Account Length Based on Churn percentage
#color palette selection
colors = sns.color_palette('pastel')[0:7]
textprops = {'fontsize':13}

plt.figure(figsize=(15,7))
# plotting pie chart
plt.pie(two_account, labels=['Not Churn(%)','Churn(%)'], startangle=90, colors=colors, autopct="%1.1f%%", textprops = textprops)
plt.title('Two Digit Account Length churn rate', fontsize=18)
plt.show()

In [None]:
# Three Digit Account Length
three_account=telecom_data[(telecom_data['Account length']<=telecom_data['Account length'].max()) & (telecom_data['Account length']>=100)].loc[:,['Churn']].value_counts()
print(three_account)
print(" ")

# Visualizing Three Digit Account Length Based on Churn percentage
#color palette selection
colors = sns.color_palette('pastel')[0:7]
textprops = {'fontsize':13}

plt.figure(figsize=(15,7))
# plotting pie chart
plt.pie(three_account, labels=['Not Churn(%)','Churn(%)'],startangle=90 , colors=colors, autopct="%1.1f%%",textprops = textprops)
plt.title('Three Digit Account Length churn rate', fontsize=18)
plt.show()

In [None]:
# Box Plot for Account Length attribute
plt.figure(figsize=(10,8))
sns.boxplot(data=telecom_data, x='Churn', y='Account length', showmeans = True)
plt.title('Account Length Boxplot with Churn', fontsize=18)
plt.show()

##### 1. Why did you pick the specific chart?

Pie charts are typically utilized to illustrate the proportions of a whole, making them especially effective for presenting data that has already been converted into percentages. They offer a clear visual representation of how different parts contribute to the overall dataset.


So, I used a pie chart to facilitate a percentage comparison of the churn rates based on account length.

A box plot, on the other hand, is employed to summarize the key statistical characteristics of a dataset, including the median, quartiles, and range, all within a single visualization. Box plots are valuable for detecting outliers, comparing distributions across multiple datasets, and understanding the data's dispersion. They are frequently utilized in statistical analysis and data visualization.

In this case, I used a box plot to determine the maximum and minimum values, identify well-segregated outliers, and clearly define the mean and median, as illustrated in the box plot graph.

##### 2. What is/are the insight(s) found from the chart?

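Customers with two-digit account lengths account for 225 churned customers, and those with three-digit account lengths account for 256, so almost all of the 483 churned customers fall into these two groups. These counts largely reflect how many customers are in each group, so the churn percentages in the pie charts are the better basis for comparing the groups.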

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Account length represents the number of days customers have been active. For new customers (account length below 10 days), the churn rate is low at around 8.3%, with only 2 customers churning. These customers might have been testing the telecom service and were not satisfied, leading to their churn.

Customers with account lengths between 10 and 99 days have a churn rate of 14%. Those with account lengths below 50 days can still be considered relatively new, while those between 50 and 99 days may not be receiving sufficient benefits from their current plans, leading to higher churn rates.

Customers with account lengths exceeding 100 days are considered long-term customers. They may be churning due to a lack of additional offers or benefits, such as power-plus plans or other incentives.

Therefore, analyzing account length provides valuable insights into the reasons behind customer churn and highlights areas for potential improvement in customer retention strategies.
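The three pie charts above can be condensed into one table by binning `Account length` with `pd.cut` and computing the churn percentage per bin. A sketch on hypothetical toy data; the bin edges (1-9, 10-99, 100 and above) mirror the one-, two- and three-digit split used above.

```python
import pandas as pd

# Hypothetical toy data standing in for telecom_data[['Account length', 'Churn']]
df = pd.DataFrame({
    "Account length": [5, 8, 45, 60, 99, 101, 150, 230],
    "Churn": [False, False, True, False, False, True, False, True],
})

# Bins (0, 9], (9, 99], (99, inf); include_lowest keeps a length of 0 in the first bin
df["length_group"] = pd.cut(
    df["Account length"],
    bins=[0, 9, 99, float("inf")],
    labels=["one-digit", "two-digit", "three-digit"],
    include_lowest=True,
)

# Churn percentage and customer count per group
group_summary = df.groupby("length_group", observed=True)["Churn"].agg(
    churn_pct=lambda s: s.mean() * 100,
    customers="size",
)
print(group_summary)
```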

#### Chart - 5 Voice Mail

In [None]:
# Chart - 5 visualization code

# data for voice mail plan
voice = telecom_data['Voice mail plan'].value_counts()

# vizualizing code for customers percentage having voice mail plan

#color palette selection
palette_color = sns.color_palette('pastel')
textprops = {'fontsize':13}

# plotting chart of voice mail
plt.figure(figsize=(9,7))
plt.pie(voice, labels=['No','Yes'],startangle=90 , colors=palette_color, autopct="%1.1f%%",textprops = textprops)
plt.title('Distribution of customers having voice mail plan', fontsize=18)
plt.show()

In [None]:
# Vizualizing code for customers churning while having voice mail plan

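# Churn percentage for customers without / with a voice mail plan
cc2=telecom_data.groupby('Voice mail plan')['Churn'].mean()*100
# x labels taken from the grouped index so they always line up with the values in cc2
cc1=cc2.index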

plt.figure(figsize=(6,8))
plots = sns.barplot(x=cc1,y=cc2)
for bar in plots.patches:
  plots.annotate(format(bar.get_height(),'.2f'),
                   (bar.get_x() + bar.get_width() / 2,
                    bar.get_height()), ha='center', va='center',
                   size=12, xytext=(0, 8),
                   textcoords='offset points')

plt.title(" Percentage of customer churn on basis of voice mail plan", fontsize = 20)
plt.xlabel('Voice mail plan', fontsize = 15)
plt.ylabel('Percentage (%)', fontsize = 15)
plt.show()

##### 1. Why did you pick the specific chart?

Pie charts are generally used to illustrate the proportions of a whole, making them particularly effective for displaying data that has been calculated as percentages of the entire dataset. Therefore, we used a pie chart to show the percentage of customers with a voice mail plan.

Bar charts, on the other hand, are ideal for comparing the size or frequency of different categories or groups of data. They are useful for comparing data across various categories and can efficiently display a large amount of data in a compact space. Hence, we utilized a bar chart to display the percentage of customers who have churned and possess a voice mail plan.

##### 2. What is/are the insight(s) found from the chart?

A total of 2,411 customers do not have a voice mail plan, while 922 customers do have a voice mail plan. Among those without a voice mail plan, 16.7% have churned. In contrast, among those with a voice mail plan, only 8.7% have churned. This indicates that customers with a voice mail plan tend to churn less frequently.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

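Yes, partially. Customers with a voice mail plan churn at roughly half the rate of those without one (8.7% vs. 16.7%), so promoting the voice mail plan may support retention. However, this is an association, so the plan alone should not be assumed to cause the lower churn.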
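Whether the gap between 16.7% and 8.7% is larger than chance would explain can be checked with a chi-square test of independence. The sketch below is a hand-rolled Pearson chi-square for a 2×2 table (no continuity correction) using only the standard library. The counts are reconstructed from the totals and percentages above (403 ≈ 16.7% of 2,411 and 80 ≈ 8.7% of 922), so treat them as approximate. `scipy.stats.chi2_contingency` is the usual library alternative; note that by default it applies Yates' continuity correction to 2×2 tables.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = (a, b, c, d)
    # Expected counts under independence, in the same order as `observed`
    expected = [r * col / n for r in row_totals for col in col_totals]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: no voice mail plan, voice mail plan; columns: stayed, churned
chi2, p = chi_square_2x2([[2008, 403], [842, 80]])
print(f"chi2 = {chi2:.2f}, p = {p:.2e}")
```

A small p-value indicates that churn and voice mail plan status are associated; it does not show that the plan causes customers to stay.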

#### Chart - 6 Area Code

In [None]:
# Chart - 6 visualization code
# Visualizing code for Area Code wise average churn percentage
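b1= telecom_data.groupby('Area code')['Churn'].mean()*100
# x labels from b1's index so each label is paired with its own churn rate
# (unique() returns values in order of appearance, while groupby sorts them)
a1= b1.index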

plt.figure(figsize=(6,5))
plots = sns.barplot(x=a1, y=b1)
for bar in plots.patches:
  plots.annotate(format(bar.get_height(),'.2f'),
                   (bar.get_x() + bar.get_width() / 2,
                    bar.get_height()), ha='center', va='center',
                   size=12, xytext=(0, 8),
                   textcoords='offset points')
plt.title('Area Code vs Churn Percentage',fontsize=20)
plt.xlabel('Area code', fontsize = 15)
plt.ylabel('Churn percentage (%)', fontsize = 15)
plt.ylim(0,17)
plt.show()

##### 1. Why did you pick the specific chart?


Bar charts are used to compare the size or frequency of different categories or groups of data. They are useful for comparing data across various categories and can efficiently display a large amount of data in a compact space.

To illustrate the average percentage of true churn concerning Area Code, we have utilized a bar chart.

##### 2. What is/are the insight(s) found from the chart?

All three area codes have a churn rate of around 14%, so area code on its own does not appear to influence churn.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Area Code does not significantly impact the churn rate, as the churn rate remains consistent at 14% across all area codes. However, when we further segregate the area codes by their respective states, we can better analyze and identify the specific states where issues are occurring. Thus, while the Area Code itself may not help create a business impact, a state-wise analysis within those area codes can provide valuable insights for addressing churn-related issues.
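The state-wise breakdown within each area code suggested above can be produced with a pivot table. A sketch on hypothetical toy rows; the state labels `S1`/`S2` are placeholders:

```python
import pandas as pd

# Hypothetical toy rows standing in for telecom_data[['Area code', 'State', 'Churn']]
df = pd.DataFrame({
    "Area code": [408, 408, 415, 415, 415, 510, 510, 510],
    "State": ["S1", "S2", "S1", "S1", "S2", "S1", "S2", "S2"],
    "Churn": [True, False, False, True, False, False, True, True],
})

# Churn percentage for every (state, area code) pair; NaN where a pair has no customers
state_area_churn = pd.pivot_table(
    df.assign(Churn=df["Churn"].astype(int)),
    index="State",
    columns="Area code",
    values="Churn",
    aggfunc="mean",
) * 100
print(state_area_churn.round(1))
```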

#### Chart - 7 International Plan

In [None]:
# Chart - 7 visualization code

# Number of customers with and without an international plan
inter_plan = telecom_data['International plan'].value_counts()
print(inter_plan)
print(" ")

# Visualizing the percentage of customers who have taken an international plan
# Colour palette selection (one colour per slice)
colors = sns.color_palette('husl', len(inter_plan))
textprops = {'fontsize':13}

plt.figure(figsize=(15,7))
# Pie chart; labels come from value_counts() so they always match the slice sizes
plt.pie(inter_plan, labels=inter_plan.index, startangle=90, colors=colors, autopct="%1.1f%%", textprops=textprops)
plt.title('International Plan', fontsize=18)
plt.show()

In [None]:
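# Churn percentage per International plan value (groupby sorts the keys)
i2 = telecom_data.groupby('International plan')['Churn'].mean()*100
# Average international charge per International plan value
i3 = telecom_data.groupby(['International plan'])['Total intl charge'].mean()
# Average international minutes per churn status (note: grouped by Churn, not by plan)
i4 = telecom_data.groupby(["Churn"])['Total intl minutes'].mean()
# Plan values in the same sorted order as the groupby results above
i1 = sorted(telecom_data['International plan'].unique())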

In [None]:
# Visualizing the churn percentage for customers with and without an international plan
plt.figure(figsize=(6,7))
# i2 is indexed by plan value, so labels and heights stay aligned
plots = sns.barplot(x=i2.index, y=i2.values)
for bar in plots.patches:
    plots.annotate(format(bar.get_height(),'.2f'),
                   (bar.get_x() + bar.get_width() / 2, bar.get_height()),
                   ha='center', va='center', size=12,
                   xytext=(0, 8), textcoords='offset points')

plt.title("Percentage of customer churn on basis of International plan", fontsize = 20)
plt.xlabel('International plan', fontsize = 15)
plt.ylabel('Churn percentage (%)', fontsize = 15)
plt.ylim(0,45)
plt.show()

In [None]:
# Visualizing the average international charge of customers with and without an international plan
plt.figure(figsize=(6,7))
# i3 is indexed by plan value, so labels and heights stay aligned
plots = sns.barplot(x=i3.index, y=i3.values)
for bar in plots.patches:
    plots.annotate(format(bar.get_height(),'.2f'),
                   (bar.get_x() + bar.get_width() / 2, bar.get_height()),
                   ha='center', va='center', size=12,
                   xytext=(0, 8), textcoords='offset points')
plt.title("Average international charge on the basis of International plan", fontsize = 20)
plt.xlabel('International plan', fontsize = 15)
plt.ylabel('Average intl charge', fontsize = 15)
plt.ylim(0,3.5)
plt.show()

In [None]:
# Visualizing the average international minutes by churn status
# (i4 is grouped by Churn, so the x-axis shows churn status, not the plan)
plt.figure(figsize=(6,7))
plots = sns.barplot(x=i4.index, y=i4.values)
for bar in plots.patches:
    plots.annotate(format(bar.get_height(),'.2f'),
                   (bar.get_x() + bar.get_width() / 2, bar.get_height()),
                   ha='center', va='center', size=12,
                   xytext=(0, 8), textcoords='offset points')
plt.title("Average international minutes on the basis of churn", fontsize = 20)
plt.xlabel('Churn', fontsize = 15)
plt.ylabel('Minutes', fontsize = 15)
plt.ylim(0,12)
plt.show()

##### 1. Why did you pick the specific chart?

Pie charts illustrate the proportions of a whole and work well for data expressed as percentages, so we used a pie chart to show the share of customers who have taken an international plan.

Bar charts compare the size or frequency of different categories, so we used bar charts to show the churn percentage for customers with and without an international plan, as well as the average international charges and minutes.
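Each bar chart in this section repeats the same annotation loop. One way to avoid the duplication is a small helper; `annotate_bars` below is a hypothetical name, and the sketch simply wraps the loop already used in the notebook around a Matplotlib `Axes`. The bar heights are the churn percentages reported for the two international plan groups.

```python
import matplotlib.pyplot as plt

def annotate_bars(ax, fmt=".2f", offset=8, size=12):
    """Hypothetical helper: write each bar's height just above the bar."""
    for bar in ax.patches:
        ax.annotate(format(bar.get_height(), fmt),
                    (bar.get_x() + bar.get_width() / 2, bar.get_height()),
                    ha="center", va="center", size=size,
                    xytext=(0, offset), textcoords="offset points")
    return ax

# Churn percentages per international plan group, as reported in this section
fig, ax = plt.subplots(figsize=(6, 7))
ax.bar(["No", "Yes"], [11.4, 42.4])
annotate_bars(ax)

labels = [text.get_text() for text in ax.texts]
print(labels)  # ['11.40', '42.40']
plt.close(fig)
```

Matplotlib 3.4 and later also provide `Axes.bar_label`, which labels a bar container directly, for example `ax.bar_label(ax.containers[0], fmt='%.2f')`.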

##### 2. What is/are the insight(s) found from the chart?

Of all customers, 3010 do not have an international plan and 323 do. Among customers with an international plan, 42.4% churned, compared with only 11.4% of those without one.
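The counts and percentages above come from `value_counts()` and a per-group churn rate. A row-normalised `pd.crosstab` gives the churned and retained shares for every plan group in one table. The rows here are hypothetical; only the column names follow the notebook.

```python
import pandas as pd

# Hypothetical toy rows
df = pd.DataFrame({
    "International plan": ["No", "No", "No", "No", "Yes", "Yes"],
    "Churn": [False, False, False, True, True, False],
})

# Customer counts per plan (the input to the pie chart)
counts = df["International plan"].value_counts()

# normalize="index" divides each row by its total, giving within-group shares
churn_share = pd.crosstab(df["International plan"], df["Churn"], normalize="index") * 100
print(counts)
print(churn_share.round(1))
```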

Customers with an international plan pay an average international charge of $2.87, compared with $2.75 for those without one. Average international minutes differ only slightly by churn status: 10.7 minutes for churned customers versus 10.16 for retained customers.

The much higher churn rate among international plan customers suggests dissatisfaction, possibly because they pay roughly the same for international calls as customers without the plan. These customers may feel the plan does not deliver the benefits they expected, which would explain their higher churn rate.
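One way to test the "same price" explanation is to compare the effective international rate per minute in each plan group, that is, total charge divided by total minutes. In this sketch the rows are invented and both groups are billed at the same assumed rate, which is the pattern the explanation above would predict; on the real data the two rates would need to be computed rather than assumed.

```python
import pandas as pd

# Hypothetical rows; both groups are billed at the same per-minute rate here
df = pd.DataFrame({
    "International plan": ["No", "No", "Yes", "Yes"],
    "Total intl minutes": [10.0, 8.0, 12.0, 9.0],
    "Total intl charge":  [2.70, 2.16, 3.24, 2.43],
})

# Effective rate = total charge / total minutes within each plan group
cols = ["Total intl charge", "Total intl minutes"]
totals = df.groupby("International plan")[cols].sum()
rate_per_minute = totals["Total intl charge"] / totals["Total intl minutes"]

# Equal rates would mean the plan gives no per-minute discount
print(rate_per_minute.round(3))
```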

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. International plan subscribers pay for the plan but do not get noticeably cheaper international calls than customers without it, and this mismatch could be a major driver of their higher churn rate. Adjusting the plan's pricing or adding incentives for subscribers could improve customer satisfaction and retention.

#### Chart - 8 Overall Calls

In [None]:
# Chart - 8 visualization code
# Average of total day calls, total day minutes & total day charge of churn
cn_dcalls = pd.DataFrame(telecom_data.groupby(["Churn"])['Total day calls'].mean())
print(cn_dcalls)
print('')
cn_dm = pd.DataFrame(telecom_data.groupby(["Churn"])['Total day minutes'].mean())
print(cn_dm)
print('')
cn_dc = pd.DataFrame(telecom_data.groupby(["Churn"])['Total day charge'].mean())
print(cn_dc)

In [None]:
# Visualizing total day minutes vs total day charge
plt.figure(figsize=(7,8))
sns.scatterplot(data=telecom_data, x="Total day minutes", y="Total day charge", hue="Churn")
plt.title('Total Day Minutes vs Total Day Charge', fontsize=18)
plt.xlabel('Total day minutes',fontsize = 13)
plt.ylabel('Total day charges',fontsize = 13)
plt.show()


In [None]:
# Average of total eve calls, total eve minutes & total evening charge of churn
cn_ecalls = pd.DataFrame(telecom_data.groupby(["Churn"])['Total eve calls'].mean())
print(cn_ecalls)
print(" ")
cn_em = pd.DataFrame(telecom_data.groupby(["Churn"])['Total eve minutes'].mean())
print(cn_em)
print(" ")
cn_ec = pd.DataFrame(telecom_data.groupby(["Churn"])['Total eve charge'].mean())
print(cn_ec)

In [None]:
# Visualizing total evening minutes vs total evening charge
plt.figure(figsize=(7,8))
sns.scatterplot(x="Total eve minutes", y="Total eve charge", hue="Churn", data=telecom_data)
plt.title('Total evening minutes vs Total evening charge', fontsize=18)
plt.xlabel('Total eve minutes',fontsize = 13)
plt.ylabel('Total eve charges',fontsize = 13)
plt.show()

In [None]:
# Average of total night calls, total night minutes & total night charge of churn
cn_ncalls = pd.DataFrame(telecom_data.groupby(["Churn"])['Total night calls'].mean())
print(cn_ncalls)
print(" ")
cn_nm = pd.DataFrame(telecom_data.groupby(["Churn"])['Total night minutes'].mean())
print(cn_nm)
print(" ")
cn_nc = pd.DataFrame(telecom_data.groupby(["Churn"])['Total night charge'].mean())
print(cn_nc)


In [None]:
# Visualizing total night minutes vs total night charge
plt.figure(figsize=(7,8))
sns.scatterplot(x="Total night minutes", y="Total night charge", hue="Churn", data=telecom_data)
plt.title('Total night minutes vs Total night charge', fontsize=18)
plt.xlabel('Total night minutes',fontsize = 13)
plt.ylabel('Total night charges',fontsize = 13)
plt.show()

In [None]:
# Table of average calls of total day, eve & night on basis of churn
cn_calls = pd.merge(pd.merge(cn_dcalls,cn_ecalls, on = 'Churn'),cn_ncalls,on = 'Churn').round(2).T
cn_calls

In [None]:
# Bar plot of above table
# DataFrame.plot() creates its own figure, so the size is passed to it directly
cn_calls.plot(kind='bar', figsize=(9,6))
plt.title('Average of calls on the basis of churn', fontsize=18)
plt.xlabel("Calls", fontsize = 13)
plt.ylabel('Average of calls',fontsize = 13)
plt.show()

In [None]:
# Table of average minutes of total day, eve & night on basis of churn
cn_minutes = pd.merge(pd.merge(cn_dm,cn_em, on = 'Churn'),cn_nm,on = 'Churn').round(2).T
cn_minutes

In [None]:
# Bar plot of above table
# DataFrame.plot() creates its own figure, so the size is passed to it directly
cn_minutes.plot(kind='bar', figsize=(9,6))
plt.title('Average of minutes on the basis of churn', fontsize=18)
plt.xlabel("Minutes", fontsize = 13)
plt.ylabel('Average of minutes',fontsize = 13)
plt.show()

In [None]:
# Table of average charges of total day, eve & night on basis of churn
cn_charges = pd.merge(pd.merge(cn_dc,cn_ec, on = 'Churn'),cn_nc,on = 'Churn').round(2).T
cn_charges

In [None]:
# Bar plot of above table
# DataFrame.plot() creates its own figure, so the size is passed to it directly
cn_charges.plot(kind='bar', figsize=(9,7))
plt.title('Average of charges on the basis of churn', fontsize=18)
plt.xlabel("Charges", fontsize = 13)
plt.ylabel('Average of charges',fontsize = 13)
plt.show()

##### 1. Why did you pick the specific chart?


Scatter plots show the relationship between two numerical variables and help reveal patterns and trends, so we used scatter plots of minutes against charges for the day, evening, and night periods, coloured by churn status.
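The scatter plots form straight lines, which suggests each charge is a fixed multiple of the corresponding minutes. A degree-1 least-squares fit with `np.polyfit` recovers that multiple as the slope. The rows below are generated at an assumed flat rate of 0.17 per minute purely for illustration; it is not a value taken from the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical rows generated at an assumed flat rate of 0.17 per minute
minutes = np.array([100.0, 150.0, 200.0, 250.0])
df = pd.DataFrame({"Total day minutes": minutes,
                   "Total day charge": minutes * 0.17})

# Degree-1 least-squares fit: charge ≈ slope * minutes + intercept
slope, intercept = np.polyfit(df["Total day minutes"], df["Total day charge"], deg=1)

# Pearson correlation; exactly linear data gives r = 1 (up to rounding)
r = np.corrcoef(df["Total day minutes"], df["Total day charge"])[0, 1]
print(f"slope={slope:.4f}, intercept={intercept:.4f}, r={r:.4f}")
```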

Bar charts compare the size or frequency of different categories, so we used bar plots of the average calls, minutes, and charges for the day, evening, and night periods, split by churn status.
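The averaged tables behind these bar plots are built from three separate `groupby` calls and two merges per metric. An equivalent sketch, assuming the same column names, uses a single `groupby` over a list of columns and then transposes; the rows are hypothetical.

```python
import pandas as pd

# Hypothetical rows with the three 'calls' columns used in Chart 8
df = pd.DataFrame({
    "Churn": [False, False, True, True],
    "Total day calls":   [100, 110, 90, 120],
    "Total eve calls":   [95, 105, 100, 110],
    "Total night calls": [80, 100, 120, 100],
})

call_cols = ["Total day calls", "Total eve calls", "Total night calls"]

# One groupby over several columns replaces three groupbys plus two merges;
# .T puts the periods on the rows and the churn groups on the columns
cn_calls = df.groupby("Churn")[call_cols].mean().round(2).T
print(cn_calls)
```

The result can be plotted the same way as before, for example `cn_calls.plot(kind='bar', figsize=(9, 6))`.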

##### 2. What is/are the insight(s) found from the chart?

On average, churned customers talk for more minutes during the day, evening, and night than non-churned customers, and consequently pay higher charges.
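To compare overall talk time rather than each period separately, the three minute columns can be summed per customer and then averaged by churn status. The rows and the `Total minutes` column name are hypothetical.

```python
import pandas as pd

# Hypothetical rows with the three 'minutes' columns used in Chart 8
df = pd.DataFrame({
    "Churn": [False, False, True, True],
    "Total day minutes":   [150.0, 170.0, 240.0, 220.0],
    "Total eve minutes":   [190.0, 210.0, 230.0, 210.0],
    "Total night minutes": [200.0, 200.0, 205.0, 195.0],
})

minute_cols = ["Total day minutes", "Total eve minutes", "Total night minutes"]

# Combined talk time per customer across the three periods
df["Total minutes"] = df[minute_cols].sum(axis=1)

# Average combined talk time for each churn group
avg_total = df.groupby("Churn")["Total minutes"].mean()
print(avg_total)
```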

Introducing a master plan that offers discounts or additional free minutes to customers who regularly exceed their allotted talk time could help retain these high-usage customers. Giving them better value for their usage should raise satisfaction, reduce churn, and build long-term loyalty.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Overall, by strategically optimizing voice call plans and tailoring offers to meet the diverse needs of different customer segments, the telecom service provider can improve customer satisfaction, increase retention rates, and ultimately drive positive business outcomes.

#### Chart - 9 Customer Service Calls

In [None]:
# Chart - 9 visualization code
# Churn percentage for each number of customer service calls
service = pd.DataFrame(telecom_data.groupby('Customer service calls')['Churn'].mean()*100)

# Visualizing churn rate per number of customer service calls
plt.figure(figsize=(12,9))
plots = sns.barplot(x=service.index, y=service['Churn'])
# Write each bar's value just above it
for bar in plots.patches:
    plots.annotate(format(bar.get_height(),'.2f'),
                   (bar.get_x() + bar.get_width() / 2, bar.get_height()),
                   ha='center', va='center', size=12,
                   xytext=(0, 8), textcoords='offset points')
plt.title("Churn rate per number of customer service calls", fontsize = 20)
plt.xlabel('Number of customer service calls', fontsize = 15)
plt.ylabel('Churn percentage (%)', fontsize = 15)
plt.show()

##### 1. Why did you pick the specific chart?

A bar plot suits this comparison because bar charts compare values across categories, here the churn rate at each number of customer service calls. The chart shows how customer service interactions relate to churn, which helps identify where service needs to improve and where targeted retention strategies are needed.

##### 2. What is/are the insight(s) found from the chart?

Customers who made four or more customer service calls churn more than four times as often as those who made fewer calls. Retention efforts should therefore focus on this segment, with targeted strategies to resolve their issues and build long-term loyalty.
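The "four or more calls" comparison can be made explicit by splitting customers at that threshold and dividing the two churn rates. This sketch uses hypothetical rows, so the ratio it prints is not the notebook's result.

```python
import pandas as pd

# Hypothetical toy rows
df = pd.DataFrame({
    "Customer service calls": [0, 1, 2, 3, 4, 5, 1, 4],
    "Churn": [False, False, True, False, True, True, False, False],
})

# Split customers at the threshold of four service calls
heavy = df["Customer service calls"] >= 4

# Churn percentage on each side of the threshold
rate_heavy = df.loc[heavy, "Churn"].mean() * 100
rate_light = df.loc[~heavy, "Churn"].mean() * 100

# How many times more often heavy callers churn
ratio = rate_heavy / rate_light
print(round(rate_heavy, 2), round(rate_light, 2), round(ratio, 2))
```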

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. Prioritizing customer service excellence through efficient query resolution, proactive issue management, and recognition of effective agents can contribute significantly to the success and growth of the business.

#### Chart - 10 Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
# Identify non-numerical columns; 'State' is left out because one-hot encoding
# its many values would make the heatmap unreadable
non_numeric_cols = telecom_data.select_dtypes(exclude=['number']).columns.drop('State', errors='ignore')

# One-hot encode the remaining categorical columns
# (drop_first avoids mirror columns such as a Yes/No pair)
telecom_data_encoded = pd.get_dummies(telecom_data.drop(columns=['State'], errors='ignore'),
                                      columns=non_numeric_cols, drop_first=True)

# Calculate correlations on the encoded DataFrame
corr = telecom_data_encoded.corr()

# Hide the upper triangle, which mirrors the lower one
mask = np.triu(np.ones_like(corr, dtype=bool))

with sns.axes_style("white"):
    f, ax = plt.subplots(figsize=(18, 12))
    ax = sns.heatmap(corr, mask=mask, vmin=-1, vmax=1, annot=True, fmt='.2f', cmap="YlGnBu")
plt.show()

##### 1. Why did you pick the specific chart?

The correlation coefficient measures the strength and direction of the linear relationship between two variables and ranges from -1 to 1: 1 indicates a perfect positive linear relationship, -1 a perfect negative linear relationship, and 0 no linear relationship. A correlation matrix collects these coefficients for every pair of variables, and a heatmap makes the matrix easy to scan, which helps both data exploration and variable selection for modelling.
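To make the definition concrete, Pearson's r is the covariance of two variables divided by the product of their standard deviations. This sketch computes it from centred values on made-up numbers and checks it against `np.corrcoef` and `DataFrame.corr()`, which uses the Pearson method by default; `pearson_r` is an illustrative name.

```python
import numpy as np
import pandas as pd

# Made-up paired observations
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 6.2, 7.9, 10.1])

def pearson_r(a, b):
    """r = sum of centred cross-products / sqrt(product of centred sums of squares)."""
    a_c = a - a.mean()
    b_c = b - b.mean()
    return (a_c * b_c).sum() / np.sqrt((a_c ** 2).sum() * (b_c ** 2).sum())

r_manual = pearson_r(x, y)
r_numpy = np.corrcoef(x, y)[0, 1]
r_pandas = pd.DataFrame({"x": x, "y": y}).corr().loc["x", "y"]
print(round(r_manual, 4), round(r_numpy, 4), round(r_pandas, 4))

# A perfectly decreasing linear relationship gives r = -1
r_negative = pearson_r(x, -2 * x + 3)
print(round(r_negative, 4))
```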

##### 2. What is/are the insight(s) found from the chart?

The heatmap shows which variables move together. Most notably, each minutes column is almost perfectly correlated with its corresponding charge column (day, evening, night, and international), consistent with charges being billed per minute.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

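Yes. Because each charge column is essentially a linear function of its minutes column, one column from each pair adds no extra information; keeping only one of them simplifies further analysis and modelling without losing information.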
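Redundant column pairs can also be listed programmatically instead of read off the heatmap. This sketch uses synthetic data (the 0.17 rate and the 0.95 cutoff are assumptions, not values from the notebook) and keeps the strict upper triangle of the absolute correlation matrix so each pair appears once.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
minutes = rng.uniform(50, 300, size=200)

# Synthetic frame: charge is a fixed multiple of minutes, calls are unrelated
df = pd.DataFrame({
    "Total day minutes": minutes,
    "Total day charge": minutes * 0.17,
    "Total day calls": rng.integers(50, 150, size=200),
})

corr = df.corr().abs()

# Strict upper triangle (k=1 excludes the diagonal), so each pair is listed once
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Column pairs whose absolute correlation exceeds the assumed 0.95 cutoff
pairs = upper.stack()
redundant = pairs[pairs > 0.95]
print(redundant)
```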

#### Chart - 11

In [None]:
# Chart - 11 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 12

In [None]:
# Chart - 12 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 13

In [None]:
# Chart - 13 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

Solution to Reduce Customer Churn

Modify the International Plan, since its subscribers are charged about the same for international calls as customers without the plan.

*   Be proactive with communication.
*   Ask for feedback often.
*   Periodically offer promotions to retain customers.
*   Look into the problems customers face in the states with the highest churn.
*   Focus on retaining the most valuable customers.
*   Perform regular server maintenance.
*   Resolve poor network connectivity issues.
*   Define a roadmap for new customers.
*   Analyze churn when it happens.
*   Stay competitive.

# **Conclusion**

The analysis of the churn dataset yields the following key findings:

1. The charge fields are directly related to the minute fields, indicating a strong correlation between usage and charges.
2. The relevance of the area code appears minimal and may be excluded from further analysis.
3. Customers with the International Plan exhibit higher churn rates, suggesting dissatisfaction or other issues related to this plan.
4. Customers who have had four or more customer service calls demonstrate significantly higher churn rates, highlighting the importance of addressing customer concerns effectively.
5. High day and evening minute usage correlate with higher churn rates, indicating potential dissatisfaction or other factors influencing churn behavior.
6. No clear relationship exists between churn and variables such as day calls, evening calls, night calls, international calls, night minutes, international minutes, account length, or voice mail messages, suggesting these factors may have minimal impact on churn.

These insights provide valuable guidance for developing targeted retention strategies and addressing key drivers of churn in the telecommunications industry. By focusing on mitigating the identified risk factors and enhancing customer satisfaction, companies can improve retention rates and foster long-term customer loyalty.
