<a href="https://colab.research.google.com/github/Negiamit034/EDA-Telecom-Churn-Analysis-Project/blob/main/EDA_Telecom_Churn_Analysis_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    -**EDA Telecom Churn Analysis Project**



##### **Project Type**    - EDA
##### **Contribution**    - Team
##### **Ananda M**
##### **Amit Negi**

# **Project Summary -**

The project involved analyzing a telecom customer churn dataset to gain insights and provide recommendations for reducing churn. The first step was to familiarize ourselves with the data by importing the necessary libraries, loading the dataset, and examining its structure and information. The dataset consisted of 20 columns (23 after three cost-per-minute columns were engineered), including variables such as state, account length, international plan, voice mail plan, call durations, charges, customer service calls, and churn status.
After understanding the dataset, the next step was to explore and understand the variables. Descriptive statistics were analyzed to get a sense of the distribution and range of values for each variable. Additionally, unique values were examined to identify any anomalies or patterns within the data. Data wrangling was performed to manipulate the dataset for further analysis. Column renaming was done to improve clarity, and new columns were created, such as cost per minute, to better understand price distributions across different states. The analysis of the dataset revealed several interesting insights. The account length distribution indicated that most customers had an account length between 90 and 105 days, with a noticeable drop in retention after 105 days. Certain states, including WV, MN, NY, VA, and WY, had higher account lengths, suggesting a correlation between state and customer loyalty. These states also had a larger customer base, indicating a strong customer presence.
Furthermore, states with higher account lengths also had a higher number of customer service calls, implying that these states may be facing more issues requiring resolution. The relatively low churn rate of 16% suggested that the majority of customers had trust in the telecom company. The dataset also provided insights into the usage of international plans and voice mail plans. Only around 10% of customers subscribed to the international plan, suggesting potential areas for improvement in international plan offerings. On the other hand, the data indicated that most customers made use of the voice mail plan, indicating its popularity among the user base.
Correlation analysis revealed that the total day minutes, total day calls, and total day charges were highly correlated, indicating that customers tended to make more calls and have longer conversations during the day. Similar patterns were observed for evening and night durations and charges. Interestingly, the usage of international services did not exhibit a strong correlation with the presence of an international plan, suggesting that other factors might influence international usage. The distribution of customer service calls showed that the majority of customers made a low number of calls, but there were outliers who made significantly higher numbers of calls, possibly indicating more complex or unresolved issues.
Overall, these insights provided valuable information for the telecom company to focus on improving international plans, addressing customer service concerns in specific states, and potentially enhancing the voice mail plan further. The findings also highlighted the importance of customer engagement during the day and the need to ensure excellent service and issue resolution.
By utilizing these insights, the telecom company can develop strategies to reduce customer churn. This may include offering personalized discounts or incentives to at-risk customers, enhancing customer service quality and responsiveness, conducting regular customer surveys to gather feedback, implementing loyalty programs, providing additional value-added services, and adopting targeted marketing campaigns. Furthermore, data-driven approaches can be employed to identify early warning signs of potential churn and take proactive measures to retain customers.
In conclusion, the project analysis of the telecom customer churn dataset provided valuable insights and recommendations for reducing churn. By understanding the customer behavior, preferences, and pain points, the telecom company can make informed decisions to enhance customer satisfaction, improve retention, and ultimately drive business growth.

# **GitHub Link -**

https://github.com/Negiamit034

# **Problem Statement**


**BUSINESS PROBLEM OVERVIEW**

Customer churn prediction is extremely important for any business as it recognizes the clients who are likely to stop using their services.

In the telecom industry, customers are able to choose from multiple service providers and actively switch from one operator to another. In this highly competitive market, the telecommunications industry experiences an average of 15-25% annual churn rate. Given the fact that it costs 5-10 times more to acquire a new customer than to retain an existing one, customer retention has now become even more important than customer acquisition.

For many incumbent operators, retaining highly profitable customers is the number one business goal. To reduce customer churn, telecom companies need to predict which customers are at high risk of churn. In this project, you will analyse customer-level data of a leading telecom firm and do exploratory data analysis to identify the main indicators of why customers are leaving the company.

#### **Define Your Business Objective?**

**Reducing Customer Churn Rate**

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that each chart answers the questions in the following format.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
*   Will the gained insights help creating a positive business impact? Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

### Dataset Loading

In [None]:
from google.colab import drive   #import dataset from google drive
drive.mount('/content/drive')

In [None]:
telecom_df=pd.read_csv('/content/drive/MyDrive/Capstone EDA Telecom Churn Analysis Project /Telecom Churn.csv') #load the dataset

### Dataset First View

In [None]:
telecom_df  # Dataset First Look

In [None]:
telecom_df.head()

In [None]:
telecom_df.tail()

### Dataset Rows & Columns count

In [None]:
telecom_df.shape   # Dataset Rows & Columns count

### Dataset Information

In [None]:
telecom_df.info()  # Dataset Info

#### Duplicate Values

In [None]:
telecom_df.duplicated().sum()    # Dataset Duplicate Value Count (number of rows that repeat an earlier row)

#### Missing Values/Null Values

In [None]:
null_telecom_df=telecom_df.isnull()

In [None]:
null_sum_telecom_df=telecom_df.isnull().sum()   # Missing Values/Null Values Count per column
null_sum_telecom_df

In [None]:
plt.figure(figsize=(10, 6))
sns.heatmap(null_telecom_df, cmap='YlGnBu', cbar=False)
plt.title('Null Values in Dataset')
plt.show() #Visualizing the missing values

### What did you know about your dataset?

The dataset comes from the telecommunications industry, and the task is to analyze customer churn and the insights behind it.
Churn prediction is the analytical study of how likely a customer is to abandon a product or service. The goal is to understand this and take steps to change it before the customer gives up the product or service.

**Info-**
So far we can see that there are 3333 entries and 20 columns in the Telecom dataset. There is 1 Boolean column, 8 float columns, 8 integer columns, and 3 text/object columns.

**Duplicates-**
There are no duplicate values in this dataset.

**Missing/Null Values-**
There are no null values in the dataset, which helps us draw insights reliably.
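
The checks summarized above (shape, data types, duplicates, and nulls) can be gathered in one place. The following is a minimal sketch on a made-up toy frame, not the real telecom data; the helper name `summarize_quality` is hypothetical and introduced here only for illustration.

```python
import pandas as pd

def summarize_quality(df: pd.DataFrame) -> dict:
    """Return basic structure and data-quality counts for a DataFrame."""
    return {
        "rows": df.shape[0],
        "columns": df.shape[1],
        # rows that exactly repeat an earlier row
        "duplicate_rows": int(df.duplicated().sum()),
        # total number of NaN/None cells across all columns
        "missing_cells": int(df.isnull().sum().sum()),
        # how many columns there are of each dtype
        "dtype_counts": df.dtypes.astype(str).value_counts().to_dict(),
    }

# Toy frame for illustration only (values are invented)
toy = pd.DataFrame({
    "State": ["KS", "OH", "KS", "NJ"],
    "Account length": [128, 107, 128, 137],
    "Total day minutes": [265.1, 161.6, 265.1, None],
    "Churn": [False, False, False, True],
})

summary = summarize_quality(toy)
print(summary)
```

Calling `summarize_quality(telecom_df)` on the loaded dataset should report the same row, column, duplicate, and null counts described in this section.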


## ***2. Understanding Your Variables***

In [None]:
telecom_df.columns  # Dataset Columns

In [None]:
telecom_df.describe(include='all') # Dataset Describe

### Variables Description

The variable descriptions for the columns in the `telecom_df` DataFrame:

1. State: The state in the United States where the customer resides.
2. Account length: The number of days the customer has been an account holder.
3. Area code: The three-digit area code of the customer's phone number.
4. International plan: Whether the customer has an international calling plan (yes/no).
5. Voice mail plan: Whether the customer has a voicemail plan (yes/no).
6. Number vmail messages: The number of voicemail messages the customer has.
7. Total day minutes: The total number of minutes the customer used during the day.
8. Total day calls: The total number of calls the customer made during the day.
9. Total day charge: The total charge in dollars for the customer's day usage.
10. Total eve minutes: The total number of minutes the customer used during the evening.
11. Total eve calls: The total number of calls the customer made during the evening.
12. Total eve charge: The total charge in dollars for the customer's evening usage.
13. Total night minutes: The total number of minutes the customer used during the night.
14. Total night calls: The total number of calls the customer made during the night.
15. Total night charge: The total charge in dollars for the customer's night usage.
16. Total intl minutes: The total number of international minutes the customer used.
17. Total intl calls: The total number of international calls the customer made.
18. Total intl charge: The total charge in dollars for the customer's international usage.
19. Customer service calls: The number of customer service calls made by the customer.
20. Churn: Whether the customer churned (cancelled the service) (True/False).


### Check Unique Values for each variable.

In [None]:
telecom_df.nunique() #Check Unique Values for each variable.

In [None]:
telecom_df['State'].unique()

In [None]:
telecom_df['Account length'].unique()

In [None]:
telecom_df['Area code'].unique()

In [None]:
telecom_df['International plan'].unique()

In [None]:
telecom_df['Voice mail plan'].unique()

In [None]:
telecom_df['Total day minutes'].unique()

In [None]:
telecom_df['Total day calls'].unique()

In [None]:
telecom_df['Total day charge'].unique()

In [None]:
telecom_df['Total eve minutes'].unique()

In [None]:
telecom_df['Total eve calls'].unique()

In [None]:
telecom_df['Total eve charge'].unique()

In [None]:
telecom_df['Total night minutes'].unique()

In [None]:
telecom_df['Total intl minutes'].unique()

In [None]:
telecom_df['Total intl calls'].unique()

In [None]:
telecom_df['Total intl charge'].unique()

In [None]:
telecom_df['Customer service calls'].unique()

In [None]:
telecom_df['Churn'].unique()

In [None]:
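# 'Day cost per minute' is only created in the Data Wrangling section; skip it if it does not exist yet
telecom_df['Day cost per minute'].unique() if 'Day cost per minute' in telecom_df.columns else None

In [None]:
# 'Eve cost per minute' is only created in the Data Wrangling section; skip it if it does not exist yet
telecom_df['Eve cost per minute'].unique() if 'Eve cost per minute' in telecom_df.columns else None

In [None]:
# 'Night cost per minute' is only created in the Data Wrangling section; skip it if it does not exist yet
telecom_df['Night cost per minute'].unique() if 'Night cost per minute' in telecom_df.columns else None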
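
The per-column `unique()` cells above can also be condensed into a single overview table. This is a minimal sketch on a made-up toy frame; the helper name `unique_overview` and its `max_values` parameter are hypothetical and not part of the original notebook.

```python
import pandas as pd

def unique_overview(df: pd.DataFrame, max_values: int = 5) -> pd.DataFrame:
    """One row per column: number of distinct values and a short sample of them."""
    rows = []
    for col in df.columns:
        values = df[col].dropna().unique()  # distinct non-null values, in order of appearance
        rows.append({
            "column": col,
            "n_unique": len(values),
            "sample": list(values[:max_values]),
        })
    return pd.DataFrame(rows).set_index("column")

# Toy frame for illustration only (values are invented)
toy = pd.DataFrame({
    "State": ["KS", "OH", "KS", "NJ"],
    "Area code": [415, 415, 408, 510],
    "International plan": ["No", "No", "Yes", "No"],
})

overview = unique_overview(toy)
print(overview)
```

Because the helper iterates over whatever columns exist at call time, it does not fail on columns that have not been engineered yet.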

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
telecom_df['Account length'].value_counts().head() #the 5 most frequent account lengths

In [None]:
telecom_df['Account length'].value_counts().tail() #the 5 least frequent account lengths

In [None]:
telecom_df.groupby(['State']).agg({'Account length':'sum'}).sort_values(by='Account length', ascending=False).head(10) #total account length statewise for top 10 states


In [None]:
x=telecom_df.groupby(['State']).agg({'State':'count'}) #number of customers per state, used to find the top 10 states where Orange Telecom's service is used most
x['State'].sort_values(ascending=False).head(10)


In [None]:
telecom_df.groupby(['State']).agg({'Customer service calls':'sum'}).sort_values(by='Customer service calls', ascending=False).head(10)  #total customer service calls made by the customers in each state (top 10); 'sum' adds up the calls, whereas 'count' would only count customers

In [None]:
telecom_df['Churn'].value_counts()  #total no. of churn

In [None]:
telecom_df['International plan'].value_counts() #counting the international plan used by the customers

In [None]:
telecom_df['Voice mail plan'].value_counts() #counting the voice mail plan used by the customers

In [None]:
st_ch_intpln=telecom_df.groupby(['State','Churn','International plan']).agg({'Churn': 'count'})  #count of customers for each combination of state, churn status, and international plan

In [None]:
st_ch_intpln['Churn'].sort_values(ascending=False)  #sorting the counts in descending order

In [None]:
telecom_df.groupby(['State']).agg({'Total day charge': 'mean','Total eve charge':'mean','Total night charge':'mean','Total intl charge':'mean'})   #Total day charge,Total eve charge,Total night charge,Total intl charge mean values state wise

In [None]:
telecom_df.corr(numeric_only=True) #to find the correlation between the numeric columns (text columns such as State are skipped)
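
A full correlation matrix is hard to scan by eye, so it can help to list only the strongest pairs. The following is a minimal sketch on invented toy values; the helper name `top_correlated_pairs` is hypothetical, and the toy charge column is deliberately built as a fixed multiple of the minutes column so that the pair correlates perfectly.

```python
import numpy as np
import pandas as pd

def top_correlated_pairs(df: pd.DataFrame, n: int = 5) -> pd.Series:
    """Return the n column pairs with the largest absolute Pearson correlation."""
    corr = df.corr(numeric_only=True)
    # Keep only the upper triangle (k=1 drops the diagonal) so each pair appears once
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    order = pairs.abs().sort_values(ascending=False).index
    return pairs.reindex(order).head(n)

# Toy frame for illustration only (values are invented)
toy = pd.DataFrame({
    "Total day minutes": [100.0, 200.0, 150.0, 250.0],
    "Total day charge": [17.0, 34.0, 25.5, 42.5],   # 0.17 * minutes in this toy example
    "Total day calls": [90, 110, 120, 80],
})

top = top_correlated_pairs(toy, n=3)
print(top)
```

On the real data, `top_correlated_pairs(telecom_df)` would show which minute, call, and charge columns move together most strongly.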

**Renaming the Columns**

In [None]:
telecom_df.rename(columns={'Number vmail messages':'Number Of Mail Messages'},inplace=True) #renaming the number vmail messages

In [None]:
telecom_df.columns #to check the columns name

**Feature Engineering**

In [None]:
telecom_df.head()

In [None]:
telecom_df['Day cost per minute']=round(telecom_df['Total day charge']/telecom_df['Total day minutes'],2) #created a column day cost per minute (charge divided by minutes)

In [None]:
telecom_df['Day cost per minute'].value_counts() #to check the column value counts

In [None]:
telecom_df['Eve cost per minute']=round(telecom_df['Total eve charge']/telecom_df['Total eve minutes'],2) #created a column eve cost per minute (charge divided by minutes)

In [None]:
telecom_df['Eve cost per minute'].value_counts() #to check the value counts

In [None]:
telecom_df['Night cost per minute']=round(telecom_df['Total night charge']/telecom_df['Total night minutes'],2) #created a column night cost per minute (charge divided by minutes)

In [None]:
telecom_df['Night cost per minute'].value_counts() #to check the value counts of the column

In [None]:
telecom_df[telecom_df['Day cost per minute'].isna()] #to check the null values in the new column

In [None]:
x=0  #after analysing the column filled the null values with 0
telecom_df['Day cost per minute']=telecom_df['Day cost per minute'].fillna(x) #assign the result back instead of using inplace on a selected column

In [None]:
telecom_df.info() #to check the null values

In [None]:
telecom_df[telecom_df['Eve cost per minute'].isna()] #to check the null values in the new column

In [None]:
x=0  #after analysing the column filled the null values with 0
telecom_df['Eve cost per minute']=telecom_df['Eve cost per minute'].fillna(x) #assign the result back instead of using inplace on a selected column

In [None]:
telecom_df.info()  #to check the null values
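
The three cost-per-minute columns follow the same pattern, so the calculation and the null handling can be wrapped in one helper. This is a minimal sketch that takes cost per minute to mean charge divided by minutes and assumes customers with zero minutes should get a rate of 0; the helper name `cost_per_minute` is hypothetical and the toy values are invented.

```python
import numpy as np
import pandas as pd

def cost_per_minute(charge: pd.Series, minutes: pd.Series, decimals: int = 2) -> pd.Series:
    """Charge divided by minutes, with 0 where no minutes were used."""
    # Turn zero minutes into NaN so the division yields NaN rather than 0/0 or inf
    rate = charge / minutes.replace(0, np.nan)
    return rate.fillna(0).round(decimals)

# Toy frame for illustration only (values are invented)
toy = pd.DataFrame({
    "Total day minutes": [100.0, 0.0, 250.0],
    "Total day charge": [17.0, 0.0, 42.5],
})

toy["Day cost per minute"] = cost_per_minute(toy["Total day charge"], toy["Total day minutes"])
print(toy)
```

The same call can be repeated for the evening and night columns, which keeps the null handling consistent across all three.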

### What all manipulations have you done and insights you found?

The manipulations performed on this dataset were renaming a column and creating new cost-per-minute columns, which help us identify the distribution of prices across different states.

The analysis of the dataset reveals interesting insights. The account length distribution suggests that most customers have an account length between 90-105 days, with a significant drop in retention after 105 days. States such as WV, MN, NY, VA, and WY have higher account lengths, indicating a correlation between state and customer loyalty. These states also have a higher number of users, indicating a strong customer base.

Furthermore, the states with higher account lengths also have a higher number of customer service calls, suggesting that these states may be facing more issues that require resolution. The relatively low churn rate of 16% indicates that the majority of customers trust the telecom company.

Additionally, only around 10% of customers subscribe to the international plan, indicating potential areas for improvement in international plan offerings. On the other hand, the data suggests that customers generally make use of the voice mail plan, highlighting its popularity.

These insights provide valuable information for the telecom company to focus on improving international plans, addressing customer service concerns in specific states, and potentially enhancing the voice mail plan further.

Upon further analysis of the dataset, it is evident that the total day minutes, total day calls, and total day charge columns are highly correlated, indicating that customers tend to make more calls and have longer conversations during the day. A similar pattern can be observed for the evening and night durations and charges.

Interestingly, the total international minutes and total international calls do not exhibit a strong correlation with the international plan column, suggesting that usage of international services may not solely depend on having an international plan.

The distribution of customer service calls indicates that the majority of customers make a low number of calls. However, there are a few outliers who make a significantly higher number of calls, potentially indicating more complex or unresolved issues.

Overall, these additional insights shed light on the calling patterns, international usage, and customer service interactions within the telecom company. Further exploration and analysis of these factors could provide valuable information for refining service offerings, addressing customer needs, and improving overall customer satisfaction.
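
The percentages quoted in this section (churn rate and plan uptake) come down to simple proportions. The following is a minimal sketch on an invented toy frame showing how such shares could be computed; the numbers it produces are not the dataset's real figures.

```python
import pandas as pd

# Toy frame for illustration only (values are invented)
toy = pd.DataFrame({
    "International plan": ["No", "No", "Yes", "No", "No"],
    "Voice mail plan": ["Yes", "No", "No", "No", "Yes"],
    "Churn": [False, False, True, False, False],
})

# The mean of a boolean column is the fraction of True values, i.e. the churn rate
churn_rate = toy["Churn"].mean()

# normalize=True returns proportions instead of raw counts
intl_share = toy["International plan"].value_counts(normalize=True)
vmail_share = toy["Voice mail plan"].value_counts(normalize=True)

print(f"Churn rate: {churn_rate:.1%}")
print(intl_share)
print(vmail_share)
```

Running the same three expressions on `telecom_df` gives a direct check of the churn rate and plan shares stated above.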

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

**Histogram Chart-Distribution of Total Day Minutes(Univariate)**


In [None]:
plt.figure(figsize=(8, 6)) #visualisation code
plt.hist(telecom_df['Total day minutes'], bins=20, color='skyblue', edgecolor='black')
plt.title('Distribution of Total Day Minutes')
plt.xlabel('Total Day Minutes')
plt.ylabel('Frequency')
plt.show()

##### 1. Why did you pick the specific chart?

This histogram was chosen to show the distribution of the total day minutes.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that total day minutes are concentrated between 150 and 200. One possible explanation is that the daytime cost per minute is lower, so customers prefer daytime calls over calls at other times.

##### 3. Will the gained insights help creating a positive business impact?


Yes, it will help in creating a positive business impact.

**Pie Chart-Distribution of International Plan(Univariate)**

In [None]:
telecom_df['International plan'].value_counts().plot(kind='pie', autopct='%1.1f%%')  #visualisation code
plt.title('Distribution of International Plan')
plt.ylabel('')
plt.show()

##### 1. Why did you pick the specific chart?

This chart was chosen to show the proportion of customers with and without an international plan (Yes vs. No).

##### 2. What is/are the insight(s) found from the chart?

As the chart shows, 9.7% of customers chose the international plan and 90.3% did not.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

These insights can lead to positive growth: for customers who are interested in international calls, we can adjust the plan by giving them more offers or charging a lower cost per minute.

#### Bar Chart-Count of Customers per State(Univariate)

In [None]:
#visualization code
plt.figure(figsize=(20,10))  # Adjust the figure size as needed
sns.countplot(data=telecom_df, x='State')
plt.title('Count of Customers per State')
plt.xlabel('State')
plt.ylabel('Count')
plt.xticks(rotation=45)
plt.show()

##### 1. Why did you pick the specific chart?

This chart was chosen to show the number of customers in each state, which helps us identify the states where we need to focus more.

##### 2. What is/are the insight(s) found from the chart?

As the chart shows, the state "WV" has the most customers compared to the others, while "CA" has very few.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from analyzing the top states where Orange Telecom's service is used the most can potentially create a positive business impact. By focusing on these states, the company can allocate resources, tailor marketing strategies, and provide localized offerings to effectively target the customer base. However, it is important to consider factors such as market saturation, competition, customer satisfaction, and external influences that could lead to negative growth. Careful analysis and proactive measures are necessary to address these challenges and ensure sustained positive growth in the telecom market.

#### Box Plot-Total Day Charge by Churn Status(Bivariate)

In [None]:
sns.boxplot(data=telecom_df, x='Churn', y='Total day charge') #visualization code
plt.title('Total Day Charge by Churn Status')
plt.xlabel('Churn')
plt.ylabel('Total Day Charge')
plt.show()

##### 1. Why did you pick the specific chart?

The reason behind using Box Plot is to gain insights about the distribution, central tendency, variability, and potential outliers in the data.

##### 2. What is/are the insight(s) found from the chart?

A symmetrical box indicates a relatively balanced distribution, while a skewed box suggests an asymmetrical one. The position of the median line within the box shows the central tendency of the data: if the median is closer to the bottom of the box, the distribution is positively skewed, and if it is closer to the top, the distribution is negatively skewed.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The skewed distribution of Total Day Charge by Churn Status, specifically for customers with a "False" churn status, suggests a potential negative impact on growth. Higher day charges for these customers may indicate dissatisfaction or increased costs, potentially leading to a higher churn rate. It is crucial to investigate the underlying reasons behind the skewed distribution and take corrective actions. By addressing concerns related to service quality, pricing, or customer support, the company can mitigate the risk of negative growth and work towards creating a positive business impact.
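
The features read off a box plot (median position, box width, skew) can also be computed directly. The following is a minimal sketch on invented toy values; the helper name `box_stats` is hypothetical and not part of the original notebook.

```python
import pandas as pd

def box_stats(df: pd.DataFrame, value_col: str, group_col: str) -> pd.DataFrame:
    """Quartiles, interquartile range, and skewness of value_col for each group."""
    grouped = df.groupby(group_col)[value_col]
    stats = grouped.quantile([0.25, 0.5, 0.75]).unstack()  # one column per quantile
    stats.columns = ["q1", "median", "q3"]
    stats["iqr"] = stats["q3"] - stats["q1"]   # height of the box
    stats["skew"] = grouped.skew()             # > 0: right-skewed, < 0: left-skewed
    return stats

# Toy frame for illustration only (values are invented)
toy = pd.DataFrame({
    "Churn": [False] * 5 + [True] * 4,
    "Total day charge": [20.0, 25.0, 30.0, 35.0, 40.0, 30.0, 45.0, 50.0, 52.0],
})

stats = box_stats(toy, "Total day charge", "Churn")
print(stats)
```

Comparing the `median` and `skew` columns between the two churn groups gives a numeric counterpart to the visual reading of the box plot.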

#### Bar Chart-Churn Status by State(Bivariate)

In [None]:
plt.figure(figsize=(12, 10))
sns.countplot(data=telecom_df, x='State', hue='Churn', palette='viridis')
plt.title('Churn Status by State', fontsize=16)
plt.xlabel('State', fontsize=14)
plt.ylabel('Count', fontsize=14)
plt.xticks(rotation=90, fontsize=12)
plt.legend(title='Churn', loc='upper right', fontsize=12)  # keep the labels seaborn derives from the hue column
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

The specific reason to pick the chart is to compare the churn status state wise.



##### 2. What is/are the insight(s) found from the chart?

From the chart showing the churn count by state, one insight that can be observed is that the churn count is relatively lower in the state of Iowa (IA) compared to other states. This suggests that customers from Iowa have a relatively lower likelihood of churning compared to customers from other states. This insight can be valuable for the business to understand the factors or strategies that contribute to higher customer retention in Iowa, which can then be leveraged to improve customer retention in other states.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insight that customers from Iowa have a lower churn count can potentially create a positive business impact by understanding factors for higher customer retention. However, without further analysis, it is challenging to identify specific insights leading to negative growth. Multiple factors like customer satisfaction, service quality, and pricing need consideration to determine if any insights from the chart would have a negative impact on growth.
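
Raw churn counts per state depend on how many customers each state has, so a rate is often easier to compare. The following is a minimal sketch on invented toy values; the helper name `churn_rate_by` is hypothetical.

```python
import pandas as pd

def churn_rate_by(df: pd.DataFrame, group_col: str, churn_col: str = "Churn") -> pd.DataFrame:
    """Customers, churned customers, and churn rate for each group, highest rate first."""
    out = df.groupby(group_col)[churn_col].agg(
        customers="count",   # number of rows in the group
        churned="sum",       # True counts as 1
        churn_rate="mean",   # fraction of True values
    )
    return out.sort_values("churn_rate", ascending=False)

# Toy frame for illustration only (values are invented)
toy = pd.DataFrame({
    "State": ["WV", "WV", "WV", "WV", "IA", "IA", "NJ", "NJ"],
    "Churn": [True, False, False, False, False, False, True, False],
})

rates = churn_rate_by(toy, "State")
print(rates)
```

In this toy example WV has as many churned customers as NJ but a lower churn rate, which is the kind of difference a count plot alone does not show.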

#### Line Chart-Average Total Day Minutes by State(Bivariate)

In [None]:
avg_day_minutes = telecom_df.groupby('State')['Total day minutes'].mean() #code for the line chart; the result is indexed by state in sorted order
plt.figure(figsize=(20,10))
avg_day_minutes.plot(kind='line', marker='o')
plt.title('Average Total Day Minutes by State')
plt.xlabel('State')
plt.ylabel('Average Total Day Minutes')
plt.xticks(np.arange(len(avg_day_minutes)), avg_day_minutes.index, rotation=45) #take tick labels from the plotted index so each label matches its point
plt.show()

##### 1. Why did you pick the specific chart?

A line plot was chosen to visualize the average total day minutes by state because it allows for easy comparison between states, shows variations in values, and can reveal trends or patterns.

##### 2. What is/are the insight(s) found from the chart?

Insights from the chart indicate that Vermont (VT), Arizona (AZ), and Wisconsin (WI) have higher average total day minutes, while North Carolina (NC) and Washington (WA) have lower average total day minutes. These variations in call durations across states can inform targeted marketing strategies and service offerings to cater to customer preferences in each region.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insights on variations in average total day minutes by state can positively impact the business by enabling targeted strategies.

#### Scatter Plot-Total Day Minutes vs. Total Eve Minutes(Bivariate)

In [None]:
sns.set(style='whitegrid', font_scale=1.2)
# Create the scatter plot with customized settings
plt.figure(figsize=(10, 8))  # Adjust the figure size as needed
sns.scatterplot(data=telecom_df, x='Total day minutes', y='Total eve minutes', alpha=0.7, edgecolor='k')
plt.title('Total Day Minutes vs. Total Eve Minutes', fontsize=16)
plt.xlabel('Total Day Minutes')
plt.ylabel('Total Eve Minutes')
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.grid(True, linestyle='--')

# Add a regression line
sns.regplot(data=telecom_df, x='Total day minutes', y='Total eve minutes', scatter=False, color='r')

# Show the plot
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

The scatter plot with a regression line was chosen to visualize the relationship between 'Total Day Minutes' and 'Total Eve Minutes' and to spot any outliers. The plot helps identify patterns or correlations between the variables, and the fitted line provides a reference for the overall trend. This chart supports insights into usage patterns, outlier detection, and decisions about service offerings or customer engagement strategies.

##### 2. What is/are the insight(s) found from the chart?

The scatter plot comparing 'Total Day Minutes' and 'Total Eve Minutes' indicates a positive correlation between the two variables. The red line is the fitted regression line, and its slope shows the direction and steepness of the linear trend. Outliers can be identified as points far from the main cloud, and the plot provides insights into the distribution and relationship between the variables, helping to understand usage patterns and deviations from typical evening minutes.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, it helps to gain insights. From the chart we can see that the two variables are correlated with each other, so both should be considered before designing plans.
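
The strength of the relationship seen in a scatter plot can be quantified with the Pearson correlation coefficient and the slope of a least-squares line. The following is a minimal sketch on invented toy values, so the numbers it prints say nothing about the real dataset.

```python
import numpy as np
import pandas as pd

# Toy frame for illustration only (values are invented)
toy = pd.DataFrame({
    "Total day minutes": [120.0, 180.0, 150.0, 210.0, 90.0],
    "Total eve minutes": [200.0, 215.0, 205.0, 220.0, 190.0],
})

# Pearson correlation: +1 is a perfect increasing line, 0 is no linear relationship
r = toy["Total day minutes"].corr(toy["Total eve minutes"])

# Degree-1 polynomial fit gives the slope and intercept of the least-squares line
slope, intercept = np.polyfit(toy["Total day minutes"], toy["Total eve minutes"], deg=1)

print(f"r = {r:.3f}, slope = {slope:.3f}, intercept = {intercept:.1f}")
```

A value of `r` close to 0 on the real columns would mean the regression line is nearly flat and the two usage measures are largely independent.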

**Line Chart-Mean Total Night Charge by Area Code(Bivariate)**

In [None]:
plt.figure(figsize=(8, 6)) #data visualisation
mean_night_charge = telecom_df.groupby('Area code')['Total night charge'].mean()
plt.plot(mean_night_charge.index, mean_night_charge, marker='o')
plt.title('Mean Total Night Charge by Area Code')
plt.xlabel('Area Code')
plt.ylabel('Mean Total Night Charge')
plt.xticks(mean_night_charge.index)
plt.grid(True, linestyle='--')
plt.show()

##### 1. Why did you pick the specific chart?

The line chart was chosen to show the mean total night charge by area code and to reveal how it varies across area codes.

##### 2. What is/are the insight(s) found from the chart?

The chart reveals that the mean total night charge for area code 415 is higher compared to area codes 510 and 408, indicating an imbalance in pricing or usage patterns. This suggests the need to equalize the mean total night charge among the area codes to ensure fairness. Further analysis can be done to understand the factors contributing to the disparity and develop strategies to achieve pricing consistency across all area codes.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insight of the disparity in mean total night charges can positively impact the business by promoting pricing fairness and enhancing customer satisfaction. However, without considering other factors, it is challenging to identify specific insights that may lead to negative growth. A comprehensive analysis is necessary to assess potential negative growth implications, considering various factors like market dynamics, customer demand, and competitive landscape.

#### Bar Chart-Average Customer Service Calls by Churn Status(Bivariate)

In [None]:
plt.figure(figsize=(8, 6))   #data visualisation code
churn_counts = telecom_df.groupby('Churn')['Customer service calls'].mean()
bar_width = 0.35
index = np.arange(len(churn_counts.index))
plt.bar(index, churn_counts, width=bar_width, label='Customer Service Calls')
plt.title('Average Customer Service Calls by Churn Status')
plt.xlabel('Churn')
plt.ylabel('Average Customer Service Calls')
plt.xticks(index, churn_counts.index)
plt.legend()
plt.grid(True, linestyle='--')
plt.show()

##### 1. Why did you pick the specific chart?

The bar chart was chosen to facilitate easy comparison of the average customer service calls between the two churn statuses. This chart effectively represents the categorical churn status variable and the numerical average customer service calls. Its simplicity and familiar format allow for easy interpretation and understanding by a wide range of audiences.

##### 2. What is/are the insight(s) found from the chart?

The insight from the chart is that, on average, customers who churn (churn=True) have higher customer service call volumes compared to customers who do not churn (churn=False). However, the difference in average customer service calls between the two churn statuses is relatively moderate, with a range of only 1.7.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insight that churned customers make more customer service calls can help improve retention; no specific negative-growth insights were identified.
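
Beyond comparing the two averages, it can be useful to see how the churn rate changes with each additional customer service call. The following is a minimal sketch on invented toy values; the resulting numbers are illustrative only.

```python
import pandas as pd

# Toy frame for illustration only (values are invented)
toy = pd.DataFrame({
    "Customer service calls": [1, 0, 2, 1, 4, 5, 1, 3],
    "Churn": [False, False, False, False, True, True, False, True],
})

# Average number of calls for retained (False) and churned (True) customers
mean_calls = toy.groupby("Churn")["Customer service calls"].mean()
gap = mean_calls.iloc[-1] - mean_calls.iloc[0]  # churned minus retained (groups sort as False, True)

# Churn rate at each call count shows whether risk rises with the number of calls
rate_by_calls = toy.groupby("Customer service calls")["Churn"].mean()

print(mean_calls)
print(f"Gap in average calls: {gap:.2f}")
print(rate_by_calls)
```

If the churn rate on the real data jumps above some call count, that count could serve as an early warning threshold for retention outreach.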

**Area Chart-Cumulative Sum of Total Intl Minutes by State(Bivariate)**

In [None]:
df = telecom_df.groupby('State')['Total intl minutes'].sum().sort_values(ascending=False)  # visualization code
df.cumsum().plot(kind='area')
plt.title('Cumulative Sum of Total Intl Minutes by State')
plt.xlabel('States')
plt.ylabel('Cumulative Sum of Total Intl Minutes')
plt.show()

##### 1. Why did you pick the specific chart?

The area plot was chosen to visualize the cumulative sum of total international minutes by state as it effectively showcases the progression and accumulation of minutes over the different states. The filled area under the curve emphasizes the overall magnitude and allows for easy comparison of the cumulative sums between states.

##### 2. What is/are the insight(s) found from the chart?

The insight from the chart is that states like CA, DC, NE, MS, MD, and WV have a higher cumulative sum of total international minutes. This indicates potential higher demand or usage of international calling services in these states compared to others.
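As a rough illustration of what the area chart plots, the sketch below rebuilds the per-state totals and their running sum on made-up data. The values in `toy_df` are assumptions for illustration only; the column names `State` and `Total intl minutes` are the ones used in the notebook.

```python
import pandas as pd

# Hypothetical toy data; in the notebook this would be telecom_df.
toy_df = pd.DataFrame({
    'State': ['CA', 'CA', 'NE', 'WV', 'NE', 'WV'],
    'Total intl minutes': [10.0, 12.0, 8.0, 5.0, 7.0, 3.0],
})

# Total international minutes per state, largest first.
state_totals = (
    toy_df.groupby('State')['Total intl minutes']
    .sum()
    .sort_values(ascending=False)
)

# Running sum across the sorted states, and the share of the
# overall total reached after each state.
cumulative = state_totals.cumsum()
cumulative_share = cumulative / state_totals.sum()

print(state_totals)
print(cumulative_share.round(3))
```

Because a cumulative sum rises by construction, the per-state totals in `state_totals` are what rank the states by usage; the cumulative curve shows how quickly the leading states add up to the overall total.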

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insight of higher cumulative international minutes in certain states can potentially help create a positive business impact by highlighting regions with higher demand. No specific insights indicating negative growth are identified from the given information.

#### Stacked Bar Chart-Churn by Voice Mail Plan and International Plan(Multivariate)

In [None]:
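# Count customers per (Churn, Voice mail plan) pair; unstack() turns International plan into columns
df = telecom_df.groupby(['Churn', 'Voice mail plan', 'International plan']).size().unstack()
df.plot(kind='bar', stacked=True)  # one bar per (Churn, Voice mail plan), stacked by International plan
plt.title('Churn by Voice Mail Plan and International Plan')
plt.xlabel('(Churn, Voice mail plan)')
plt.ylabel('Count')
plt.legend(title='International plan', loc='upper right')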
plt.xticks(rotation=0)
plt.show()


##### 1. Why did you pick the specific chart?

The specific chart, a stacked bar chart comparing churn by voice mail plan and international plan, was chosen to visually depict the count of each plan combination and their distribution across churn status. It effectively illustrates the relationships between churn and the presence/absence of voice mail and international plans.

##### 2. What is/are the insight(s) found from the chart?

Each bar corresponds to a (Churn, Voice mail plan) pair, and the stacked segments split that bar by International plan. The tallest bar is (False, No): customers who did not churn and do not have a voice mail plan. This suggests that a significant portion of retained customers opt not to subscribe to voice mail. The counts then decrease progressively from (False, Yes) to (True, No) to (True, Yes), which reflects both the smaller number of churned customers and the lower adoption of the voice mail plan. Overall, the chart shows how plan combinations are distributed across churn status; churn rates within each plan combination, rather than raw counts, are needed before concluding that a plan influences churn.
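To make the structure of this chart explicit, here is a minimal sketch on made-up data. The rows in `toy_df` are illustrative assumptions; the three column names are the ones used in the notebook. It shows which variables end up on the bars and which in the stacked segments, and adds a churn rate per plan combination as one possible complement to the raw counts.

```python
import pandas as pd

# Hypothetical toy data; in the notebook this would be telecom_df.
toy_df = pd.DataFrame({
    'Churn': [False, False, False, True, False, True, True, False],
    'Voice mail plan': ['No', 'No', 'Yes', 'No', 'Yes', 'No', 'Yes', 'No'],
    'International plan': ['No', 'No', 'No', 'Yes', 'No', 'Yes', 'No', 'Yes'],
})

# Same shape as the plotted table: rows are (Churn, Voice mail plan),
# columns are the International plan values (the stacked segments).
counts = (
    toy_df.groupby(['Churn', 'Voice mail plan', 'International plan'])
    .size()
    .unstack(fill_value=0)
)

# Share of churned customers within each plan combination.
# The mean of a boolean column is the fraction of True values.
churn_rate = toy_df.groupby(['Voice mail plan', 'International plan'])['Churn'].mean()

print(counts)
print(churn_rate.round(2))
```

Raw counts are dominated by the size of each group, so a rate like `churn_rate` is usually easier to compare across plan combinations.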

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insights can help create a positive business impact by identifying plan preferences.

**Bar Chart-Comparison of Cost per Minute(Multivariate)**

In [None]:
cost_per_minute = ['Day cost per minute', 'Eve cost per minute', 'Night cost per minute']  # Data for comparison
values = [telecom_df['Day cost per minute'].mean(),
          telecom_df['Eve cost per minute'].mean(),
          telecom_df['Night cost per minute'].mean()]
# Setting the figure size
plt.figure(figsize=(10, 6))
# Creating the bar chart using Seaborn
sns.barplot(x=cost_per_minute, y=values)
plt.xlabel('Cost per minute')
plt.ylabel('Average cost')
plt.title('Comparison of Cost per Minute')
plt.show()

##### 1. Why did you pick the specific chart?

The specific bar chart was chosen to compare the average cost per minute for different time periods (day, evening, and night) in a visually clear and concise manner.

##### 2. What is/are the insight(s) found from the chart?

The insight from the chart is that the average cost per minute follows a pattern, with the night cost per minute being the highest, followed by the evening cost per minute, and the day cost per minute being the lowest. This indicates that the cost of phone calls tends to be higher during the night hours compared to the evening and day hours.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insights regarding the varying average cost per minute for different time periods can potentially help create a positive business impact. Businesses can strategically adjust pricing or promotional offers based on these insights to optimize revenue and customer satisfaction.
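How the cost-per-minute columns are derived affects how this comparison reads. The sketch below shows one plausible definition, charge divided by minutes, on made-up numbers. The helper `cost_per_minute`, the values in `toy_df`, and the column names `Total day minutes` and `Total day charge` are assumptions for illustration; the notebook's own derivation may differ.

```python
import numpy as np
import pandas as pd

def cost_per_minute(charge: pd.Series, minutes: pd.Series) -> pd.Series:
    """Charge divided by minutes; rows with zero minutes become NaN."""
    return charge / minutes.replace(0, np.nan)

# Hypothetical toy data; in the notebook this would be telecom_df.
toy_df = pd.DataFrame({
    'Total day minutes': [100.0, 200.0, 0.0],
    'Total day charge': [17.0, 34.0, 0.0],
})

toy_df['Day cost per minute'] = cost_per_minute(
    toy_df['Total day charge'], toy_df['Total day minutes']
)

# mean() skips NaN, so customers with no usage do not distort the average.
avg_day_cost = toy_df['Day cost per minute'].mean()
print(f"Average day cost per minute: {avg_day_cost:.3f}")
```

If the ratio were computed the other way round (minutes divided by charge), the ordering of the three periods in the chart would be reversed, so it is worth confirming the definition before interpreting which period is most expensive.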

**Line Chart-Total day calls,Total eve calls,Total night calls for top 10 States**

In [None]:
fig, axes = plt.subplots(3, 1, figsize=(12, 10))

# Selecting the top 10 states based on total call counts
top_10_states = telecom_df.groupby('State')[['Total day calls', 'Total eve calls', 'Total night calls']].sum().sum(axis=1).nlargest(10).index

# Filtering the data for the top 10 states
top_10_data = telecom_df.loc[telecom_df['State'].isin(top_10_states)]

# Grouping the filtered data by state and calculating the mean of call counts
call_counts = top_10_data.groupby('State')[['Total day calls', 'Total eve calls', 'Total night calls']].mean()

# Plotting the line charts for each call type
for i, call_type in enumerate(call_counts.columns):
    sns.lineplot(data=call_counts, x=call_counts.index, y=call_type, ax=axes[i])
    axes[i].set_title(f'Average {call_type} by State (Top 10 States)')
    axes[i].set_xlabel('State')
    axes[i].set_ylabel('Average Call Count')
    axes[i].set_xticks(range(len(call_counts.index)))
    axes[i].set_xticklabels(call_counts.index, rotation=45)

plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

I picked the specific chart, which is a line chart, because it allows for the comparison of trends or changes over time or different categories. In this case, the line chart is used to compare the average call counts across different call types (total day calls, total eve calls, and total night calls) for the top 10 states. It helps visualize any variations or patterns in call counts for each call type across the selected states.

##### 2. What is/are the insight(s) found from the chart?

The insights from the chart show the average call counts for 'Total day calls', 'Total eve calls', and 'Total night calls' across the top 10 states, allowing for comparisons of call activity between different call types and states.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. These insights can have a positive impact by informing targeted strategies for call optimization.

**Correlation Heatmap-Correlation Matrix for Numeric Columns(Multivariate)**

In [None]:
# Correlation Heatmap visualization code
plt.figure(figsize=(20, 10))  # Adjust the figure size as needed
numeric_columns = telecom_df.select_dtypes(include=['float64', 'int64']).columns
sns.heatmap(telecom_df[numeric_columns].corr(), cmap='coolwarm', annot=True)
plt.title('Correlation Matrix for Numeric Columns')
plt.show()

##### 1. Why did you pick the specific chart?

The correlation matrix provides an overview of the relationships between variables in the dataset. It helps identify patterns and dependencies.

##### 2. What is/are the insight(s) found from the chart?

From the correlation matrix, we can observe the strength and direction of the relationships between different variables. For example, there is a strong positive correlation between total day minutes and total day charge, indicating that more minutes of usage result in higher charges. Note that the matrix only includes float64 and int64 columns, so Churn, which takes the values True and False, is not part of it. The relationship between customer service calls and churn is instead visible in the earlier bar chart, where churned customers make more customer service calls on average.
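The heatmap code selects only `float64` and `int64` columns, so a churn column stored as True/False would not appear in it. A minimal sketch of one way to bring churn into the correlation analysis follows, assuming `Churn` is boolean; the values in `toy_df` are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data; in the notebook this would be telecom_df.
toy_df = pd.DataFrame({
    'Churn': [False, False, False, True, True, False],
    'Customer service calls': [1, 0, 2, 5, 4, 1],
    'Total day minutes': [120.0, 150.0, 180.0, 260.0, 240.0, 160.0],
})

# The same filter as the heatmap: the boolean Churn column is dropped.
numeric_only = toy_df.select_dtypes(include=['float64', 'int64'])

# Add Churn back as 0/1 so it takes part in the correlation.
with_churn = numeric_only.assign(Churn=toy_df['Churn'].astype(int))

# Correlation of every numeric column with churn, strongest first.
corr_with_churn = (
    with_churn.corr()['Churn']
    .drop('Churn')
    .sort_values(ascending=False)
)
print(corr_with_churn.round(2))
```

A positive value for a column in `corr_with_churn` means that higher values of that column go together with more churn.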

#### Pair Plot-Pairwise Relationships of Numeric Columns(Multivariate)

In [None]:
numeric_columns = telecom_df.select_dtypes(include=['float64', 'int64']).columns

# Set style and context for improved aesthetics
sns.set(style='ticks', font_scale=1.2)

# Create the pair plot with customized settings
g = sns.pairplot(telecom_df[numeric_columns], diag_kind='kde', plot_kws={'alpha': 0.6, 's': 30}, corner=True)

# Customize plot titles and labels
g.fig.suptitle('Pairwise Relationships of Numeric Columns', y=1.03, fontsize=16)
g.set(xlabel='', ylabel='')

# Adjust subplot spacing
g.fig.subplots_adjust(top=0.92, bottom=0.08, left=0.08, right=0.92, hspace=0.2, wspace=0.2)

# Save the plot as a PNG image
g.savefig('pairplot.png', dpi=300)

# Show a message indicating the image is saved
print("Pair plot image saved as 'pairplot.png'")

##### 1. Why did you pick the specific chart?

The pair plot was chosen because it allows for visual exploration of pairwise relationships between multiple numeric variables, helping to identify potential correlations and patterns in the data.

##### 2. What is/are the insight(s) found from the chart?

The pair plot provides insights into the relationships between different numeric variables in the dataset. It helps identify correlations, such as a positive relationship between total day minutes and total day charge, and can reveal patterns or clusters that may exist within the data.

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

**Solution to Reduce Customer Churn**

1. Offer personalized discounts or incentives to customers who are at a higher risk of churning.
2. Enhance the quality and reliability of customer service to ensure prompt issue resolution and customer satisfaction.
3. Conduct customer surveys and implement improvements based on feedback to address pain points and enhance overall experience.
4. Implement a loyalty program to reward long-term customers and encourage their continued loyalty.
5. Provide additional value-added services or features that differentiate the company's offering from competitors.
6. Offer flexible pricing plans or options that cater to the diverse needs and budgets of customers.
7. Implement targeted marketing campaigns to re-engage inactive or at-risk customers.
8. Develop a customer retention team dedicated to monitoring and proactively addressing customer churn.
9. Leverage data analytics to identify early warning signs of potential churn and take preventive measures.
10. Foster a customer-centric culture within the organization, focusing on building strong relationships and delivering exceptional customer experiences.

These strategies can help reduce customer churn by improving customer satisfaction, addressing pain points, and providing added value to retain customers in a highly competitive market.

# **Conclusion**

1. Charge fields exhibit a direct and predictable relationship with minute fields, indicating a linear dependency between these variables.

2. The area code field and/or the state field display unusual patterns and can be excluded from the analysis without significant impact on the overall results.

3. Customers who have opted for the International Plan are observed to have a higher likelihood of churning compared to those who haven't chosen this plan.

4. Notably, customers who make four or more customer service calls exhibit a significantly higher churn rate, more than four times that of other customers.

5. Customers with elevated day minutes and evening minutes experience a notably higher churn rate when compared to other customers.

6. Surprisingly, no apparent correlation between churn and variables such as day calls, evening calls, night calls, international calls, night minutes, international minutes, account length, or voice mail messages has been observed. These variables do not seem to directly influence customer churn in a discernible manner.
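The customer service call finding in point 4 can be reproduced with a short comparison of churn rates. This is a minimal sketch on made-up data: the values in `toy_df` are illustrative assumptions, and the threshold of four calls is taken from the conclusion above.

```python
import pandas as pd

# Hypothetical toy data; in the notebook this would be telecom_df.
toy_df = pd.DataFrame({
    'Churn': [False, False, False, False, True, True, False, True, False, False],
    'Customer service calls': [0, 1, 2, 3, 4, 5, 4, 1, 2, 0],
})

# Customers with four or more customer service calls.
high_contact = toy_df['Customer service calls'] >= 4

# Churn rate in each group (mean of a boolean column = fraction of True).
rate_high = toy_df.loc[high_contact, 'Churn'].mean()
rate_other = toy_df.loc[~high_contact, 'Churn'].mean()

# How many times higher the churn rate is for high-contact customers.
ratio = rate_high / rate_other
print(f"High-contact: {rate_high:.2f}, others: {rate_other:.2f}, ratio: {ratio:.2f}")
```

Running the same steps on `telecom_df` would give the actual ratio to report alongside the "more than four times" statement.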
