# **Project Name**    - 



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Name** - Anurag Taiskar

# **Project Summary -**

Telecom churn analysis is the process of identifying customers who are likely to cancel their service or switch to a different service provider. This is an important problem for telecom companies, as churn can have a significant impact on their revenue and profitability. Orange Telecom's Churn Dataset consists of cleaned customer activity data and a label specifying whether a customer has churned. The goal of the analysis is to understand the factors that contribute to churn and develop strategies to reduce churn by targeting those factors. The first step in conducting an exploratory data analysis (EDA) for telecom churn analysis is data acquisition, which involves obtaining a representative sample of data from the telecom company including customer demographic information, usage patterns, and churn status. The next step is data cleaning, which involves removing any missing or incomplete data and ensuring that the data is in a format that can be easily analyzed. Data visualization involves using plots and charts to visualize the data and identify trends and patterns, while data summarization involves using statistical techniques to summarize the data and understand the relationships between different variables.

To reduce churn and improve customer retention, it is important to take a proactive approach. One effective strategy is to modify the International Plan so that the charges are the same as the normal plan. This will help address any potential dissatisfaction with higher charges for international usage. Communication and asking for feedback often can help to identify and address any issues that may lead to churn. Offering promotions periodically can also help to retain customers, as can focusing on customers experiencing problems in states with high churn rates. Paying attention to your best customers and ensuring they receive the support they need is also important. Regular server maintenance and addressing poor network connectivity issues can also help to reduce churn. Developing a roadmap for onboarding new customers can help to ensure a smooth onboarding process and reduce the likelihood of churn. Analyzing churn when it occurs can provide valuable insights into the factors contributing to churn, which can inform strategies for reducing churn. Finally, it is important to stay competitive by keeping up with industry trends and continuously improving the customer experience.

Through EDA of the churn dataset, it was found that the charge fields are directly related to the minute fields, the area code may not be relevant and can be excluded, customers with the International Plan tend to churn more often, customers who have had four or more customer service calls churn significantly more than other customers, customers with high day and evening minute usage tend to churn at a higher rate, and there is no clear relationship between churn and the number of calls made or received. In conclusion, to reduce churn and improve customer retention, telecom companies should focus on modifying the International Plan, being proactive with communication and asking for feedback, offering promotions, focusing on customers experiencing problems in states with high churn rates, paying attention to their best customers, and continuously improving the customer experience.

# **GitHub Link -**

https://github.com/anuragtaiskar/EDA-on-Telecom-churn

# **Problem Statement**


Telecom churn analysis is the process of identifying customers who are likely to cancel their service or switch to a different service provider. This is an important problem for telecom companies, as churn can have a significant impact on their revenue and profitability.

Orange S.A., formerly France Télécom S.A., is a French multinational telecommunications corporation. The Orange Telecom Churn Dataset consists of cleaned customer activity data (features), along with a churn label specifying whether a customer canceled the subscription. The task is to explore and analyze the data to discover the key factors responsible for customer churn and to recommend ways to ensure customer retention.

#### **Define Your Business Objective?**

* Identify the key causes of customer churn.

* Recommend steps to retain valuable customers.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required. 
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits. 
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that each and every chart follows the format below.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the visualization in a structured way while following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np

# Import Visualization Libraries
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

# Import warnings
import warnings
warnings.filterwarnings('ignore')

### Dataset Loading

In [None]:
# Load Dataset
from google.colab import drive
drive.mount('/content/drive')
churn_df=pd.read_csv('/content/drive/MyDrive/EDA on telecom churn/Telecom Churn.csv')
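
The guidelines above ask for exception handling. One way to make the loading step fail early with a readable message is sketched below. `load_churn_data` and its `required_columns` argument are hypothetical names introduced for illustration; they are not part of the original notebook.

```python
from pathlib import Path

import pandas as pd


def load_churn_data(csv_path, required_columns=("Churn",)):
    """Read the churn CSV and fail early with a readable message.

    Hypothetical helper: the name and the required-column check are
    assumptions made for illustration.
    """
    path = Path(csv_path)
    if not path.is_file():
        raise FileNotFoundError(f"Dataset not found at: {path}")
    df = pd.read_csv(path)
    # Check that the columns the analysis depends on are present.
    missing = [col for col in required_columns if col not in df.columns]
    if missing:
        raise ValueError(f"Missing expected column(s): {missing}")
    return df
```

In the notebook this could be called as `churn_df = load_churn_data('/content/drive/MyDrive/EDA on telecom churn/Telecom Churn.csv')` once the drive is mounted.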

### Dataset First View

In [None]:
# Display the first 10 rows of the dataset
churn_df.head(10)

In [None]:
# Display the last 10 rows of the dataset
churn_df.tail(10)

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
#below given code returns a tuple representing the dimensionality of the DataFrame.
print(churn_df.shape)
print("Number of rows are: ",churn_df.shape[0])
print("Number of columns are: ",churn_df.shape[1])

### Dataset Information

In [None]:
# Dataset Info
churn_df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
churn_df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
churn_df.isnull().sum()

No missing values are present, so there is nothing to visualize and no missing-value imputation is needed.

### What did you know about your dataset?

This dataset is from the telecom industry. Customer churn is a threat to the company, so we have to analyze the churn, draw insights from it, and determine why it is happening.

The dataset has 3333 rows and 20 columns. There are no missing values and no duplicate rows.
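
The separate checks above (shape, duplicates, missing values) can be gathered into one summary. The sketch below is only an illustration: `quality_report` is a hypothetical helper and the small frame is toy data, not the real dataset.

```python
import pandas as pd


def quality_report(df):
    """Summarise shape, duplicate rows and missing values in one dictionary."""
    return {
        "rows": df.shape[0],
        "columns": df.shape[1],
        "duplicate_rows": int(df.duplicated().sum()),
        "missing_values": int(df.isnull().sum().sum()),
    }


# Toy frame for illustration only; not the real dataset.
toy = pd.DataFrame({"State": ["KS", "OH", "OH"], "Churn": [False, True, True]})
report = quality_report(toy)
print(report)
```

Calling `quality_report(churn_df)` in the notebook would be expected to report the figures stated above.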

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
columns_list=list(churn_df.columns)
print(columns_list)

In [None]:
# Dataset Describe
churn_df.describe()

### Variables Description 

State : Categorical variable for the 50 states and the District of Columbia (DC).

Account Length : Number of days the account has been active.

Area Code : Area code number; each area code covers several states.

International Plan : International Plan activated ( yes, no )

Voice Mail Plan : Voice Mail Plan activated ( yes, no )

Voice Mail Message : Count of voice mail messages sent.

Day Minutes : Total minutes used during day time

Day Calls : Total number of calls during day time

Day Charge : Total charge during day time

Evening Minutes : Total minutes used during evening time

Evening Calls : Total number of calls during evening time

Evening Charge : Total charge during evening time

Night Minutes : Total minutes used during night time

Night Calls : Total number of calls during night time

Night Charge : Total charge during night time

International Minutes : Total minutes used on international calls

International Calls : Total number of international calls

International Charge : Total charge for international calls

Customer Service Calls : Number of calls to customer service

Churn : Customer churn (target variable; True = 1, False = 0)

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for i in columns_list:
  print("No. of unique values in",i,"is",churn_df[i].nunique())

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
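# Mean of each numeric column, grouped by churn status
churn_count1 = churn_df.groupby('Churn')
# numeric_only=True skips text columns such as State, which cannot be averaged
churn_count1.mean(numeric_only=True)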

In [None]:
# Calculate the average charge per minute for each kind of call
day_min_charges = churn_df['Total day charge'].mean()/churn_df['Total day minutes'].mean()
eve_min_charges = churn_df['Total eve charge'].mean()/churn_df['Total eve minutes'].mean()
night_min_charges = churn_df['Total night charge'].mean()/churn_df['Total night minutes'].mean()
int_min_charges = churn_df['Total intl charge'].mean()/churn_df['Total intl minutes'].mean()
print(f'day_min_charge: {day_min_charges}')
print(f'eve_min_charge: {eve_min_charges}')
print(f'night_min_charge: {night_min_charges}')
print(f'intl_min_charge: {int_min_charges}')
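
The ratio of means above gives an average rate. To check whether the rate is the same for every customer, the ratio can also be computed row by row. The sketch below uses toy data with a made-up flat rate of 0.17 per minute (an assumption for illustration only); on the real data, `toy` would be replaced by `churn_df`.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only: charges generated at a made-up flat rate of 0.17 per minute.
toy = pd.DataFrame({"Total day minutes": [100.0, 250.5, 0.0, 180.2]})
toy["Total day charge"] = (toy["Total day minutes"] * 0.17).round(2)

# Per-row rate; rows with zero minutes are skipped to avoid division by zero.
used = toy[toy["Total day minutes"] > 0]
row_rates = used["Total day charge"] / used["Total day minutes"]

# If every row rate sits close to the mean rate, charge is just a rescaled copy of minutes.
is_flat_rate = bool(np.isclose(row_rates, row_rates.mean(), atol=1e-3).all())
print(row_rates.round(3).tolist())
print(is_flat_rate)
```

If the check holds on the real data, the charge columns carry no information beyond the minute columns and one of each pair could be dropped.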

In [None]:
# percentage of customer churned out of total customer statewise
State_data = pd.crosstab(churn_df["State"],churn_df["Churn"])
State_data['Churn_%'] = State_data.apply(lambda x : x[1]*100/(x[0]+x[1]),
                                         axis = 1)
print(State_data)

In [None]:
# Percentage of customers who churned, out of all customers with the international plan ON and OFF
International_plan = pd.crosstab(churn_df["International plan"],churn_df["Churn"])
International_plan['Churn_%'] = International_plan.apply(lambda x : x[1]*100/(x[0]+x[1]),axis = 1)
print(International_plan)

In [None]:
# Churn rate by area code: percentage of customers within each area code who churned
churn_area = pd.crosstab(churn_df['Area code'], churn_df['Churn'])
churn_area['Churn_%'] = round(churn_df.groupby('Area code')['Churn'].mean()*100, 2)
churn_area

In [None]:
# Churn rate for customers with and without a voice mail plan
Voice_mail = pd.crosstab(churn_df["Voice mail plan"],churn_df["Churn"])
Voice_mail['Churn_%'] = Voice_mail.apply(lambda x : x[1]*100/(x[0]+x[1]),axis = 1)
print(Voice_mail)

### What all manipulations have you done and insights you found?

**Customers churned out of total customers, statewise**

From the above analysis, CA, NJ, TX, MD, SC and MI have churn rates higher than 21%.
Churn is not approximately the same across states, so package price (charges) cannot be the major factor. The higher churn rate in a particular state may be due to low coverage of the cellular network there.

**Churn on basis of international plan**

Customers who have the international plan churn at a much higher rate: about 42.41% of them churned.

**Churn on basis of voice mail plan**

There is no clear relation between having a voice mail plan and churning.
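
The same churn-percentage calculation is repeated for state, international plan and voice mail plan. A small reusable function can express it once. The sketch below is an illustration under assumptions: `churn_rate_by` is a hypothetical helper and the frame is toy data.

```python
import pandas as pd


def churn_rate_by(df, column, target="Churn"):
    """Return customer count and churn percentage for each value of `column`."""
    # The mean of a boolean column is the share of True values.
    summary = df.groupby(column)[target].agg(customers="size", churn_pct="mean")
    summary["churn_pct"] = (summary["churn_pct"] * 100).round(2)
    return summary.sort_values("churn_pct", ascending=False)


# Toy data for illustration only.
toy = pd.DataFrame({
    "International plan": ["Yes", "Yes", "No", "No", "No", "No"],
    "Churn": [True, False, False, False, True, False],
})
plan_summary = churn_rate_by(toy, "International plan")
print(plan_summary)
```

On the real data, `churn_rate_by(churn_df, "State")` would list the states from highest to lowest churn percentage.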



## ***4. Data Vizualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Pie chart of the Churn column (in percentage)
data = churn_df['Churn'].value_counts()
explode = (0, 0.2)
plt.pie(data, explode=explode, autopct='%1.1f%%', shadow=True, radius=3.0,
        labels=['Not churned customer', 'Churned customer'],
        colors=['green', 'red'])
plt.show()

##### 1. Why did you pick the specific chart?

Pie charts are especially useful for showing how a whole splits into a small number of parts. Here the whole is the customer base and the two parts are churned and non-churned customers.

##### 2. What is/are the insight(s) found from the chart?

Of the 3333 customers in the dataset, 2850 (85.5%) have not churned and 483 (14.5%) have churned.
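
The percentages follow directly from the two counts quoted above, as this short check shows.

```python
# Counts quoted in the text above.
not_churned = 2850
churned = 483
total = not_churned + churned

churn_pct = round(churned * 100 / total, 1)
retained_pct = round(not_churned * 100 / total, 1)
print(total, retained_pct, churn_pct)  # 3333 85.5 14.5
```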

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

It is easy to lose customers but difficult to acquire them. A churned customer may also discourage prospective customers from joining the provider. So the churn rate insight is definitely helpful for further decisions.

#### Chart - 2

In [None]:
# Histogram for each variable
churn_df.hist(figsize=(16, 20), bins=50, xlabelsize=8, ylabelsize=8)

##### 1. Why did you pick the specific chart?

A histogram displays the distribution of a dataset by showing how often each value or group of values occurs. Histograms are useful for understanding the shape of a distribution and identifying patterns in the data, particularly for large datasets (more than 100 observations). They can also help detect unusual observations (outliers) or gaps in the data. I used histograms to check whether each variable's distribution over the whole dataset is symmetric or not.

##### 2. What is/are the insight(s) found from the chart?

Most numerical columns appear roughly symmetrically distributed, with the mean close to the median. Area code will be treated as a categorical variable, as it takes only 3 distinct values.
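
Symmetry can also be checked numerically rather than by eye, by comparing the mean with the median and looking at the skewness. The sketch below uses synthetic columns with placeholder names (assumptions for illustration); on the real data the same three calls could be applied to the numeric columns of `churn_df`.

```python
import numpy as np
import pandas as pd

# Synthetic columns for illustration only; the names are placeholders.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "roughly_symmetric": rng.normal(loc=200, scale=50, size=5000),
    "right_skewed": rng.exponential(scale=2.0, size=5000),
})

# Skewness near 0 suggests symmetry; a clearly positive value means a long right tail.
summary = pd.DataFrame({
    "mean": toy.mean(),
    "median": toy.median(),
    "skew": toy.skew(),
})
print(summary.round(2))
```

A column whose mean sits well above its median and whose skewness is clearly positive would not count as symmetric.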

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Histograms alone cannot give the whole picture of the data. They only provide an overview of how each column is distributed across the dataset, so they do not lead directly to a business insight.

#### Chart - 3

In [None]:
# Visualizing total day minutes vs total day charge
sns.scatterplot(x="Total day minutes", y="Total day charge", hue="Churn", 
                data=churn_df)

In [None]:
# Visualizing total eve minutes vs total eve charge
sns.scatterplot(x="Total eve minutes", y="Total eve charge", hue="Churn", 
                data=churn_df)

In [None]:
# Visualizing total night minutes vs total night charge
sns.scatterplot(x="Total night minutes", y="Total night charge", hue="Churn", 
                data=churn_df)

##### 1. Why did you pick the specific chart?

Scatter plots show the relationship between two numerical variables and are useful for identifying patterns and trends in data.
We used scatter plots to depict the relationship between minutes and charge for day, evening and night calls.

##### 2. What is/are the insight(s) found from the chart?

Churned customers talk for more minutes than non-churned customers during the day, evening and night. Hence they pay higher charges than non-churned customers.

We may be able to retain such customers by introducing a master plan. Under this plan, a customer who talks for more minutes would be charged a slightly lower rate, or would get a discount or a few additional free minutes.

This could keep customers who are about to churn satisfied, so that they do not leave the company.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

For a telecom service provider, calling and messaging are the two essential product plans, so optimizing voice call plans will definitely create a business impact. Customers who use only the calling service should be given additional offers, either in talk time or through a power plus plan. Customers who use voice calls mainly at night might be offered special plans from midnight to 6 a.m. Customers with a higher account length should also receive attractive offers, as they are our loyal customers. Churn among customers with a higher account length will have a negative impact on the business.

#### Chart - 4

In [None]:
# Customer Service Calls 
pd.crosstab(churn_df['Churn'], churn_df["Customer service calls"], margins=True)

In [None]:
# Count plot of customer service calls grouped by churn
plt.rcParams['figure.figsize'] = (8, 6)
sns.countplot(x="Customer service calls", hue='Churn', data=churn_df);

##### 1. Why did you pick the specific chart?

Count plots are used to compare the size or frequency of different categories or groups of data. They are useful for comparing data across categories and can display a large amount of data in a small space. We used a count plot to show how churn varies with the number of customer service calls.

##### 2. What is/are the insight(s) found from the chart?

The number of customer service calls per customer varies from 0 to 9.

Customers who make more service calls have a higher probability of leaving.

As the graph shows, churn rises sharply once a customer has made four or more service calls.

Hence the queries of these customers should be resolved immediately, and they should be given better service so that they don't leave the company.

Customers with four or more customer service calls churn more than four times as often as the other customers.
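
The comparison between customers below and above the four-call threshold can be computed directly. The sketch below uses toy data, so the numbers it prints are illustrative only and are not the rates in the real dataset.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only; the real column is "Customer service calls".
toy = pd.DataFrame({
    "Customer service calls": [0, 1, 1, 2, 3, 4, 4, 5, 6, 2],
    "Churn": [False, False, False, False, True, True, False, True, True, False],
})

# Split customers at the threshold discussed in the text.
threshold = 4
toy["caller_group"] = np.where(
    toy["Customer service calls"] >= threshold, "4+ calls", "0-3 calls"
)

# Mean of a boolean column is the share of True values, i.e. the churn rate.
rates = toy.groupby("caller_group")["Churn"].mean() * 100
ratio = rates["4+ calls"] / rates["0-3 calls"]
print(rates.round(1))
print(round(ratio, 2))
```

Running the same grouping on `churn_df` would give the actual ratio behind the "more than four times" statement.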

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Customer service is an essential factor for every business, so good customer service will definitely have a positive impact. We have to track customer calls and how long it takes to resolve each customer query, and reduce that resolution time. If the same type of issue is reported by more than 5 customers, a root cause analysis should be done and the issue resolved for everyone. The number of calls per customer should be reduced, ideally so that a customer is satisfied within a single call. Customer service agents should be rewarded or recognized for strong performance in resolving customer issues.

#### Chart - 5

In [None]:
# statewise data regarding the no. of churned and not churned customers
sns.set(style="darkgrid")                                                    
plt.figure(figsize=(15,8))
ax = sns.countplot(x='State', hue="Churn", data=churn_df)
plt.show()

##### 1. Why did you pick the specific chart?

Count plots are used to compare the size or frequency of different categories or groups of data, and they can display a large amount of data in a small space. We used a count plot to show the number of churned and non-churned customers in each state.

##### 2. What is/are the insight(s) found from the chart?

The State column has 51 values (50 states plus DC), each with a different churn rate.

CA, NJ, TX, MD, SC, MI, MS, NV, WA and ME have churn rates above 20%, well above the overall churn rate of about 14.5%.

HI, AK, AZ, VA, IA, LA, NE, IL, WI and RI have lower churn rates, below 10%.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. The statewise data shows that several states have a churn rate above 20%. These states need further study to determine which factors are causing the churn.

#### Chart - 6

In [None]:
#One Digit Account Length 
one_length = churn_df[churn_df['Account length']<=9].loc[:,['Churn']].value_counts()
print(one_length)
print(" ")

# Visualizing One Digit Account Length Based on Churn percentage
#color palette selection
colors = sns.color_palette('pastel')[0:7]
textprops = {'fontsize':13}

plt.figure(figsize=(15,7))
# plotting pie chart
plt.pie(one_length, labels=['Not Churn(%)','Churn(%)'], startangle=90, colors=('green','red'), autopct="%1.1f%%",textprops = textprops)
plt.title('One Digit Account Length churn rate', fontsize=18)
plt.show()

In [None]:
# Two Digit Account Length 
two_account=churn_df[(churn_df['Account length']<=99) & (churn_df['Account length']>=10)].loc[:,['Churn']].value_counts()
print(two_account)
print(" ")

# Visualizing Two Digit Account Length Based on Churn percentage
#color palette selection
colors = sns.color_palette('pastel')[0:7]
textprops = {'fontsize':13}

plt.figure(figsize=(15,7))
# plotting pie chart
plt.pie(two_account, labels=['Not Churn(%)','Churn(%)'], startangle=90, colors=('green','red'), autopct="%1.1f%%", textprops = textprops)
plt.title('Two Digit Account Length churn rate', fontsize=18)
plt.show()

In [None]:
# Three Digit Account Length 
three_account=churn_df[(churn_df['Account length']<=churn_df['Account length'].max()) & (churn_df['Account length']>=100)].loc[:,['Churn']].value_counts()
print(three_account)
print(" ")

# Visualizing Three Digit Account Length Based on Churn percentage
#color palette selection
colors = sns.color_palette('pastel')[0:7]
textprops = {'fontsize':13}

plt.figure(figsize=(15,7))
# plotting pie chart
plt.pie(three_account, labels=['Not Churn(%)','Churn(%)'],startangle=90 , colors=('green','red'), autopct="%1.1f%%",textprops = textprops)
plt.title('Three Digit Account Length churn rate', fontsize=18)
plt.show()

In [None]:
# Box Plot for Account Length attribute
plt.figure(figsize=(10,8))
sns.boxplot(data=churn_df, x='Churn', y='Account length', showmeans = True)
plt.title('Account Length Boxplot with Churn', fontsize=18)
plt.show()

##### 1. Why did you pick the specific chart?

Pie charts are generally used to show the proportions of a whole, and are especially useful for displaying data that has already been calculated as a percentage of the whole.

So, I used pie charts to compare the churn percentage across account length groups.

A boxplot summarizes the key statistical characteristics of a dataset, including the median, quartiles and range, in a single plot. Boxplots are useful for identifying outliers, comparing the distributions of multiple groups, and understanding the dispersion of the data.

So, I used a box plot to see the minimum and maximum values, clearly separated outliers, and the mean and median of account length for each churn group.

##### 2. What is/are the insight(s) found from the chart?

We can see that 225 customers with a two-digit account length and 256 customers with a three-digit account length churned.

So, these two groups account for almost all churned customers.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Account length is the number of days the customer has been active. For new customers (one-digit account length) the churn rate is low, around 8.3%, which corresponds to only 2 customers. They might have been trying the telecom service to experience its benefits, were not satisfied with it, and churned.

Customers whose account length is between 10 and 99 days have a churn rate of about 14%. Those below 50 days might still be treated as new customers, while those between 55 and 99 days might not be getting enough benefit from the plan they have taken.

Customers whose account length is more than 100 days are long-standing customers, and they might be churning because no additional offers, such as a power plus plan or other benefits, are given to them.

So yes, account length also gives a clear view of churn reasons and insights.
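
The three account-length groups can also be built in one step with `pd.cut` instead of three separate filters. The sketch below uses toy data and assumes account length is at least 1 day; the bucket labels are names chosen for illustration.

```python
import pandas as pd

# Toy data for illustration only.
toy = pd.DataFrame({
    "Account length": [3, 9, 10, 45, 99, 100, 150, 210],
    "Churn": [False, True, False, False, True, False, True, False],
})

# Right-inclusive bins: (0, 9], (9, 99], (99, max]. Assumes account length is at least 1.
bins = [0, 9, 99, toy["Account length"].max()]
labels = ["one digit", "two digit", "three digit"]
toy["length_bucket"] = pd.cut(toy["Account length"], bins=bins, labels=labels)

# Customer count, churned count and churn percentage per bucket.
bucket_summary = toy.groupby("length_bucket", observed=True)["Churn"].agg(
    customers="size", churned="sum", churn_pct="mean"
)
bucket_summary["churn_pct"] = (bucket_summary["churn_pct"] * 100).round(1)
print(bucket_summary)
```

Having counts and percentages side by side makes it easier to see that a bucket with many churned customers does not necessarily have a high churn rate.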

#### Chart - 7

In [None]:
# calculate count of customer with and without international plan
churn_df['International plan'].value_counts()

In [None]:
# Pie chart of the percentage of customers with the international plan ON/OFF
data = churn_df['International plan'].value_counts()
explode = (0, 0.2)
plt.pie(data, explode=explode, autopct='%1.1f%%', radius=2.0,
        labels=['International plan: OFF', 'International plan: ON'],
        colors=['skyblue', 'orange'])
plt.show()


In [None]:
# Percentage of customers who churned, out of all customers with the international plan ON and OFF
International_plan = pd.crosstab(churn_df["International plan"],churn_df["Churn"])
International_plan['Churn_%'] = International_plan.apply(lambda x : x[1]*100/(x[0]+x[1]),axis = 1)
print(International_plan)

In [None]:
# Analysing churn by international plan using a count plot
sns.countplot(x='International plan',hue="Churn",data = churn_df)

##### 1. Why did you pick the specific chart?

Pie charts are generally used to show the proportions of a whole, and are especially useful for displaying data that has already been calculated as a percentage of the whole.

Thus, we used a pie chart to show the percentage of customers who have taken the international plan.

Bar charts are used to compare the size or frequency of different categories or groups of data. They are useful for comparing data across categories and can display a large amount of data in a small space.

##### 2. What is/are the insight(s) found from the chart?

The data above shows that 42.4% of customers with the international plan churn. This could be due to high charges or connectivity issues. Customers already pay a higher tariff for international calls than for normal domestic calls, so if they also face connectivity issues they are likely to be dissatisfied, which leads to churn.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

The insights found will definitely help create a positive business impact. Customers with the international plan pay additional charges to get the plan, but their talk time charge is the same as that of customers without the plan. That might be a major reason for the higher churn among customers with the international plan.

#### Chart - 8

In [None]:
# Churn rate for customers with and without a voice mail plan
Voice_mail = pd.crosstab(churn_df["Voice mail plan"],churn_df["Churn"])
Voice_mail['Churn_%'] = Voice_mail.apply(lambda x : x[1]*100/(x[0]+x[1]),axis = 1)
print(Voice_mail)

In [None]:
'''Detailed plot of churned and non churned customer vs. customer with voice 
mail plan ''' 
sns.countplot(x='Voice mail plan',hue="Churn",data = churn_df)

##### 1. Why did you pick the specific chart?

A count plot shows the number of observations in each category using bars.

##### 2. What is/are the insight(s) found from the chart?

Here, there is no clear relation between having a voice mail plan and churning.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, partially. The voice mail plan can be kept as a secondary factor in further analysis.

#### Chart - 9

In [None]:
# Customer Service Calls 
pd.crosstab(churn_df['Churn'], churn_df["Customer service calls"], margins=True)

In [None]:
#churn rate with respect to no .of customer service calls.
Customer_service = pd.crosstab(churn_df['Customer service calls'],churn_df["Churn"])
Customer_service['Churn_%'] = Customer_service.apply(lambda x : x[1]*100/(x[0]+x[1]),axis = 1)
print(Customer_service)

In [None]:
# Count plot of customer service calls grouped by churn
plt.rcParams['figure.figsize'] = (8, 6)
sns.countplot(x="Customer service calls", hue='Churn', data=churn_df);

##### 1. Why did you pick the specific chart?

A count plot shows the number of observations in each category using bars.

##### 2. What is/are the insight(s) found from the chart?

The churn rate increases from the fourth call to the service center onward. Customers who have called customer service three or fewer times have a markedly lower churn rate than customers who have called four or more times.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Customer service is an essential factor for every business, so good customer service will definitely have a positive impact. We have to track customer calls and how long it takes to resolve each customer query, and reduce that resolution time. If the same type of issue is reported by more than 5 customers, a root cause analysis should be done and the issue resolved for everyone. The number of calls per customer should be reduced, ideally so that a customer is satisfied within a single call. Customer service agents should be rewarded or recognized for strong performance in resolving customer issues.

#### Chart - 14 - Correlation Heatmap

In [None]:
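# Find the pairwise correlation between all numeric columns of the dataframe.
# Correlation Plot
# numeric_only=True skips text columns such as State (requires pandas >= 1.5)
corr = churn_df.corr(numeric_only=True)
# Styler.format(precision=...) replaces set_precision, which was removed in pandas 2.0
corr.style.background_gradient().format(precision=1)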

In [None]:
# The same data can also be displayed in a more detailed and colourful way with the code below
plt.figure(figsize=(20,10))
churn_heatmap = churn_df.corr(numeric_only=True)
# abs() shows the strength of each correlation regardless of sign
sns.heatmap(abs(churn_heatmap), annot=True)

##### 1. Why did you pick the specific chart?

The correlation coefficient is a measure of the strength and direction of a linear relationship between two variables. A correlation matrix is used to summarize the relationships among a set of variables and is an important tool for data exploration and for selecting which variables to include in a model. The range of correlation is [-1,1].

##### 2. What is/are the insight(s) found from the chart?

From the above correlation heatmap, we can see that total day charge and total day minutes, total evening charge and total evening minutes, and total night charge and total night minutes are each highly positively correlated, with a value of 1.

Customer service calls is positively correlated only with area code and negatively correlated with the remaining variables.
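
Pairs of columns with a correlation close to 1 can be listed programmatically rather than read off the heatmap. The sketch below uses synthetic data with a made-up per-minute rate of 0.17, and the 0.95 cut-off is an arbitrary threshold chosen for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic data for illustration only; 0.17 is a made-up per-minute rate.
rng = np.random.default_rng(42)
minutes = rng.normal(180, 50, size=500)
toy = pd.DataFrame({
    "Total day minutes": minutes,
    "Total day charge": minutes * 0.17,
    "Total day calls": rng.integers(50, 150, size=500),
})

corr = toy.corr().abs()

# Keep only the upper triangle so each pair is listed once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
pairs = upper.stack()
redundant_pairs = pairs[pairs > 0.95]
print(redundant_pairs)
```

For each pair found this way, one of the two columns could be dropped without losing information.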

#### Chart - 15 - Pair Plot 

In [None]:
# Pair Plot
sns.pairplot(churn_df, hue="Churn")
plt.show()

##### 1. Why did you pick the specific chart?

A pairplot, also known as a scatterplot matrix, is a visualization that allows you to visualize the relationships between all pairs of variables in a dataset. It is a useful tool for data exploration because it allows you to quickly see how all of the variables in a dataset are related to one another.

##### 2. What is/are the insight(s) found from the chart?

From the above chart we can see that there are few linear relationships between variables and that the data points are not linearly separable. The churned customers' data is clustered and overlaps with the non-churned data. The non-churned data is fairly symmetric, while the churned data is noticeably less symmetric. The pair plot also shows the role of area code, and the number of churned customers with respect to the different features is insightful. Further insights can be read from the graph above.

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the Business Objective?
Explain Briefly.

* Improve the coverage area and solve network issues (both domestic and international).
* Give a discount or create a plan in which the per-minute tariff decreases once day call minutes rise above a certain level, so that heavy users are charged less than the normal per-minute tariff.
* Lower the international plan tariff or provide some discounts.
* Provide better customer service and better problem resolution, take customer feedback, and act on it.
* Modify the International Plan so that its charges are the same as the normal plan's.
* Be proactive with communication.
* Ask for feedback often.
* Periodically run offers to retain customers.
* Look at the customers facing problems in the states with the most churn.
* Lean into the best customers.
* Perform regular server maintenance.
* Solve poor network connectivity issues.
* Define a roadmap for new customers.
* Analyze churn when it happens.
* Stay competitive.

# **Conclusion**

From the above exploratory data analysis, the following conclusions can be drawn:
* Some states have a higher churn rate than others. Network issues could be the reason: if a competitor's lower call tariffs were the cause, most states would show approximately the same churn rate.
* Area code and account length have no relation with churn rate, so these columns can be omitted or treated as redundant.
* Customers with the international plan ON have a higher churn rate than customers with the international plan OFF. This could be because they are unhappy with the high tariff or with network issues.
* Customers with more than about 20 voice mail messages appear to have a higher churn rate.
* Customers with higher day call minutes have a higher churn rate than others, possibly because of the higher charges they incur; frequent callers might have found another company offering lower tariffs.
* No relation with churn could be found for other variables such as evening and night calls.
* The churn rate increases as the number of calls to the service center increases. Customers who have called customer service three or fewer times have a markedly lower churn rate than customers who have called four or more times.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***