
# **Project Name**    -  **Telecom Churn Analysis**



##### **Project Type**    - EDA
##### **Contribution**    - Individual

# **Project Summary -**

**Introduction:**

Orange S.A., a prominent player in the telecommunications industry, faces the challenge of retaining its customer base amidst fierce competition and evolving consumer preferences. In an era where customer retention directly impacts revenue and market share, understanding the factors influencing churn is crucial. This project aims to delve into Orange Telecom's Churn Dataset, leveraging data analysis techniques to uncover insights and devise effective strategies for enhancing customer retention.

**Data Exploration and Analysis:**

The dataset contains cleaned customer activity data: usage and billing features alongside a churn label indicating whether the customer cancelled their subscription. Through exploratory data analysis (EDA), we identified patterns and trends associated with churn behavior, including call usage by time of day, international and voice mail plan subscriptions, customer service call frequency, and geography (state and area code).

# Project Activities :-

*   Defining the Problem Statement
*   Defining Business Objective
*   Knowing and Understanding the Data
*   Understanding the Variables
*   Data Wrangling
*   Data Visualization, Storytelling & Experimenting with charts: Understand the relationships between variables
*   Solution to Business Objective
*   Conclusion


# **GitHub Link -**

https://github.com/Sahariya55/EDA_of_Telecom_Churn

# **Problem Statement**


In the fiercely competitive telecommunications landscape, Orange S.A. faces a critical challenge: customer churn. As subscribers opt to discontinue their services, the company experiences not only a loss in revenue but also a dent in its market share and brand reputation. To address this pressing issue, Orange S.A. requires a thorough understanding of the factors driving churn and effective strategies to mitigate it.

#### **Define Your Business Objective**

The primary objective is to reduce customer churn and enhance retention rates within Orange Telecom's subscriber base. This entails:

1. Identifying the key factors contributing to customer churn through comprehensive data analysis.
2. Developing actionable insights to inform targeted retention strategies.
3. Implementing initiatives aimed at improving customer satisfaction, loyalty, and long-term engagement.
4. Ultimately, increasing customer lifetime value (CLV) and fortifying Orange S.A.'s position in the telecommunications market.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.

     These additional credits will give them an advantage over other students during Star Student selection.

     [ Note: Deployment Ready Code means the whole .ipynb notebook can be executed in one go without a single error logged. ]

3.   Every piece of logic should have proper comments.
4. You may add as many charts as you want. For every chart, answer the following questions in this format:
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
*   Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with a specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints: Do the visualization in a structured way, following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
#import visualization packages
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

### Dataset Loading

In [None]:
#mount the drive
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Load Dataset
# Path to the data file on the mounted Drive
working_dir_path='/content/drive/MyDrive/Colab Notebooks/Personal_Project/Telecom Churn.csv'
telecom_df= pd.read_csv(working_dir_path)

### Dataset First View

In [None]:
# Dataset First Look
# View the top 5 rows to get a glimpse of the data
telecom_df.head(5)

In [None]:
# View the bottom 5 rows to get a glimpse of the data
telecom_df.tail(5)

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
telecom_df.shape

### Dataset Information

In [None]:
# Dataset Info
telecom_df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
telecom_df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
null_val = telecom_df.isnull().sum()
print(null_val)

In [None]:
# Visualizing the missing values
sns.heatmap(telecom_df.isnull(),cbar=False)

### What did you know about your dataset?


1. **Rows & Columns Count:** The dataset consists of 3333 rows and 20 columns.

2. **Data Types:** The dataset contains a mix of data types:
   - 1 boolean column (`Churn`)
   - 3 object columns (`State`, `International plan`, `Voice mail plan`)
   - 8 integer columns (`Account length`, `Area code`, `Number vmail messages`, `Total day calls`, `Total eve calls`, `Total night calls`, `Total intl calls`, `Customer service calls`)
   - 8 float columns (`Total day minutes`, `Total day charge`, `Total eve minutes`, `Total eve charge`, `Total night minutes`, `Total night charge`, `Total intl minutes`, `Total intl charge`)

3. **Missing Values:** There are no missing or null values in the dataset: `isnull().sum()` returns 0 for every column.

4. **Duplicate Values:** There are no duplicate rows in the dataset, so no records need to be dropped before analysis.

5. **Target Variable:** The target variable for analysis and prediction is the `Churn` column, which indicates whether a customer has churned (True/False).

6. **Features:** The dataset contains various features such as account length, usage statistics (daytime, evening, and nighttime), international calling details, voicemail usage, customer service calls, and geographic information (state and area code).

7. **Objective:** The objective of analyzing this dataset is to uncover insights into customer churn behavior and identify factors influencing churn. This will facilitate the development of effective retention strategies aimed at reducing churn and improving customer satisfaction and loyalty.

Overall, the dataset appears to be well-structured, with no missing or duplicate values, providing a solid foundation for exploratory data analysis and predictive modeling to address the business challenge of customer churn.
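The checks summarized above (shape, missing values, duplicates, data types) can be bundled into one reusable helper. A minimal sketch, assuming a pandas DataFrame; the function name `data_quality_summary` and the sample rows are hypothetical illustrations, not part of the original notebook:

```python
import pandas as pd


def data_quality_summary(df: pd.DataFrame) -> dict:
    """Collect basic data-quality facts: shape, missing cells, duplicate rows, dtype counts."""
    return {
        "rows": df.shape[0],
        "columns": df.shape[1],
        # Total number of missing cells across all columns
        "missing_cells": int(df.isnull().sum().sum()),
        # Number of rows that exactly repeat an earlier row
        "duplicate_rows": int(df.duplicated().sum()),
        # How many columns share each dtype (e.g. int64, float64, object, bool)
        "dtype_counts": {str(k): int(v) for k, v in df.dtypes.value_counts().items()},
    }


# Tiny made-up frame standing in for telecom_df (values are illustrative only)
sample = pd.DataFrame({
    "State": ["KS", "OH", "NJ"],
    "Account length": [128, 107, 137],
    "Total day minutes": [265.1, 161.6, 243.4],
    "Churn": [False, False, True],
})
summary = data_quality_summary(sample)
print(summary)
```

Calling `data_quality_summary(telecom_df)` should report the figures listed above in a single dictionary, which makes the check easy to rerun if the source file changes.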

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
telecom_df.columns

In [None]:
# Dataset Describe
telecom_df.describe(include='all')

### Variables Description

**State:** US state of the customer (51 unique values)

**Account length:** How long the account has been active

**Area code:** Area code of the customer's phone number (only three distinct values in this dataset)

**International plan:** "Yes" if the customer subscribes to the international plan, "No" otherwise

**Voice mail plan:** "Yes" if the customer subscribes to the voice mail plan, "No" otherwise

**Number vmail messages:** Number of voice mail messages, ranging from 0 to 50

**Total day minutes:** Total minutes of calls made during the day

**Total day calls:** Total number of calls made during the day

**Total day charge:** Total amount charged to the customer for daytime calls

**Total eve minutes:** Total minutes of calls made in the evening

**Total eve calls:** Total number of calls made in the evening

**Total eve charge:** Total amount charged to the customer for evening calls

**Total night minutes:** Total minutes of calls made at night

**Total night calls:** Total number of calls made at night

**Total night charge:** Total amount charged to the customer for night calls

**Total intl minutes:** Total minutes of international calls

**Total intl calls:** Total number of international calls

**Total intl charge:** Total amount charged to the customer for international calls

**Customer service calls:** Number of calls the customer made to customer service

**Churn:** True if the customer cancelled their subscription, False otherwise
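Several of the descriptions above depend on how many distinct values a column takes (51 states, three area codes). A minimal sketch of flagging integer-coded columns that behave like categories; the helper name `low_cardinality_columns` and the threshold `max_unique=5` are arbitrary choices for illustration:

```python
import pandas as pd


def low_cardinality_columns(df: pd.DataFrame, max_unique: int = 5) -> list:
    """Return numeric columns with few distinct values, which are better treated as categories."""
    numeric = df.select_dtypes(include="number")
    counts = numeric.nunique()
    return counts[counts <= max_unique].index.tolist()


# Illustrative data: 'Area code' is stored as an integer but takes only a handful of values
sample = pd.DataFrame({
    "Area code": [415, 408, 510, 415, 510, 408],
    "Account length": [128, 107, 137, 84, 75, 118],
})
print(sample.nunique())                 # distinct values per column
print(low_cardinality_columns(sample))  # ['Area code']
```

Treating such columns as categorical (for example, plotting them with a bar chart rather than a histogram) avoids reading meaning into the numeric gaps between codes.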


### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for item in telecom_df.columns:
  print(f"{item}: {telecom_df[item].nunique()} unique values")

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
# make a copy  of the dataset
telecom_df_copy = telecom_df.copy()

In [None]:
telecom_df_copy.head()

In [None]:
telecom_df_copy.shape

### What all manipulations have you done and insights you found?



*   We made a copy of the original dataset to work on, so the raw data stays untouched.
*   The dataset is already clean: there are no missing (null) values and no duplicate rows.



## ***4. Data Visualization, Storytelling & Experimenting with charts: Understand the relationships between variables***

# **Does account length impact churn rate?**

#### Chart - 1 - **Visualize the distribution of account lengths for churned and non-churned customers**

In [None]:
# Filter the dataset into churned and non-churned customers
churned_customers = telecom_df_copy[telecom_df_copy['Churn'] == True]
non_churned_customers = telecom_df_copy[telecom_df_copy['Churn'] == False]
print(churned_customers)
print(non_churned_customers)

In [None]:
# Chart - 1 visualization code

# Set up the figure and axes
fig, ax = plt.subplots(figsize=(10, 6))

# Plot histograms for account lengths of churned and non-churned customers
sns.histplot(data=churned_customers, x='Account length', color='red', label='Churned', kde=True, ax=ax)
sns.histplot(data=non_churned_customers, x='Account length', color='blue', label='Non-Churned', kde=True, ax=ax)

# Add labels and title
ax.set_xlabel('Account Length')
ax.set_ylabel('Frequency')
ax.set_title('Distribution of Account Lengths for Churned and Non-Churned Customers')

# Add legend
plt.legend()

# Show plot
plt.show()

##### 1. Why did you pick the specific chart?

Histograms are effective for comparing the distributions of numerical variables across different groups. In this case, we want to compare the distribution of account lengths between churned and non-churned customers.
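Non-churned customers outnumber churned ones roughly six to one (85.5% vs. 14.5%, Chart 4), so in an overlaid count histogram the churned bars are shorter everywhere simply because the group is smaller. A minimal sketch of comparing shapes fairly, using synthetic values in place of the real `Account length` column: put both groups on shared bin edges and divide each group's counts by its size. The helper name `normalized_histograms` is hypothetical; seaborn's `histplot` offers a similar normalization through its `stat='probability'` argument.

```python
import numpy as np


def normalized_histograms(churned, retained, bins=10):
    """Histogram both groups on shared bin edges, scaled so each group's bars sum to 1."""
    all_values = np.concatenate([churned, retained])
    edges = np.histogram_bin_edges(all_values, bins=bins)
    churned_counts, _ = np.histogram(churned, bins=edges)
    retained_counts, _ = np.histogram(retained, bins=edges)
    # Dividing by group size removes the effect of one group being much larger
    return edges, churned_counts / len(churned), retained_counts / len(retained)


# Synthetic account lengths; the group sizes mimic a roughly 1:6 imbalance
rng = np.random.default_rng(0)
churned = np.clip(rng.normal(100, 40, size=150), 1, None)
retained = np.clip(rng.normal(100, 40, size=900), 1, None)
edges, churned_share, retained_share = normalized_histograms(churned, retained)
print(np.round(churned_share, 3))
print(np.round(retained_share, 3))
```

Shared edges matter as much as the scaling: two separate `histplot` calls pick their own bins, which can make identical distributions look different.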

##### 2. What is/are the insight(s) found from the chart?

1. **Account Length Distribution:**
   - Churned and non-churned customers span a similar range of account lengths.
   - Because non-churned customers outnumber churned customers roughly six to one (85.5% vs. 14.5%, see Chart 4), the non-churned histogram is taller at almost every account length. Differences in shape should therefore be judged on normalized densities or group medians rather than raw frequencies.

2. **Churned vs. Non-Churned Customers:**
   - The overlaid histogram highlights differences in the account length distributions of the two groups.
   - Churned customers appear to be relatively more concentrated at shorter account lengths than non-churned customers.
   - Non-churned customers appear more evenly distributed across account lengths, with a slight decrease in frequency for longer accounts.

3. **Potential Implications:**
   - If confirmed, a concentration of churned customers at shorter account lengths would indicate that customers who have been subscribed for a shorter time are more likely to churn.
   - Longer account lengths among non-churned customers could reflect greater loyalty or satisfaction with the service, leading to higher retention rates.
   - Understanding these differences in account length distributions can help Orange Telecom tailor retention strategies to address the needs and preferences of customers at different stages of their subscription.
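The readings above rest on visual impressions of the histograms. A quick numeric cross-check is to compare group counts, means, and quartiles. A minimal sketch with made-up values; the helper name `summarize_by_churn` is hypothetical, and `telecom_df_copy` can be substituted for the sample to check the real data:

```python
import pandas as pd


def summarize_by_churn(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Count, mean, and quartiles of `column` for each churn group."""
    return df.groupby("Churn")[column].describe()[["count", "mean", "25%", "50%", "75%"]]


# Illustrative rows only
sample = pd.DataFrame({
    "Account length": [50, 60, 70, 120, 130, 140],
    "Churn": [True, True, True, False, False, False],
})
stats = summarize_by_churn(sample, "Account length")
print(stats)
# A clearly lower median for Churn=True would support the "shorter tenure churns more" reading
```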



##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

1. **Tailored Retention Strategies:** Understanding that customers with shorter account lengths are more likely to churn allows Orange Telecom to focus its retention efforts on this segment of customers. By identifying and addressing the specific needs and pain points of new subscribers, Orange Telecom can implement targeted retention strategies aimed at improving customer satisfaction and loyalty during the early stages of their subscription.

2. **Enhanced Customer Engagement:** Recognizing the differences in account length distributions between churned and non-churned customers enables Orange Telecom to tailor its communication and engagement strategies accordingly. For example, Orange Telecom can implement personalized onboarding processes and proactive outreach campaigns to engage new subscribers and foster stronger relationships from the outset.

3. **Improved Customer Lifetime Value (CLV):** By reducing churn among customers with shorter account lengths, Orange Telecom can increase the average customer lifetime value (CLV) over time. Longer-lasting customer relationships contribute to higher revenue streams and profitability for the company, as loyal customers continue to utilize Orange Telecom services and potentially upgrade to higher-value plans or packages.

4. **Competitive Advantage:** Utilizing data-driven insights to optimize retention strategies and improve customer satisfaction can give Orange Telecom a competitive edge in the telecommunications industry. By proactively addressing churn risk factors and enhancing the overall customer experience, Orange Telecom can differentiate itself from competitors and attract and retain more customers in the long term.



#### Chart - 2 - **Plot side-by-side histograms to compare the account lengths between the two groups**

In [None]:
# Filter the dataset into non-churned customers
non_churned_account_lengths = telecom_df_copy[telecom_df_copy['Churn'] == False]['Account length']

print(non_churned_account_lengths)

In [None]:
# Filter the dataset into churned
churned_account_lengths = telecom_df_copy[telecom_df_copy['Churn'] == True]['Account length']

print(churned_account_lengths)


In [None]:
# Chart - 2 visualization code

# Set up the figure and axes
fig, axes = plt.subplots(1, 2, figsize=(15, 6), sharey=True)

# Plot histogram for churned customers
sns.histplot(churned_account_lengths, ax=axes[0], color='red', bins=20)
axes[0].set_title('Account Length Distribution for Churned Customers')
axes[0].set_xlabel('Account Length')
axes[0].set_ylabel('Frequency')

# Plot histogram for non-churned customers
sns.histplot(non_churned_account_lengths, ax=axes[1], color='blue', bins=20)
axes[1].set_title('Account Length Distribution for Non-Churned Customers')
axes[1].set_xlabel('Account Length')
axes[1].set_ylabel('Frequency')

# Show plot
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

Placing the two histograms in separate panels lets the shape of the smaller churned group be read without being overlapped by the non-churned bars, as happens in the overlaid plot of Chart 1. Note that the panels share a y-axis (`sharey=True`), so bar heights still reflect the difference in group size.

##### 2. What is/are the insight(s) found from the chart?

1. **Account Length Distribution:** Both churned and non-churned customers have a similar range of account lengths, but the distribution patterns differ. Non-churned customers exhibit a more evenly distributed pattern across various account lengths, while churned customers are more concentrated towards shorter account lengths.

2. **Churn Risk:** The higher concentration of churned customers with shorter account lengths suggests that customers who have been subscribed for a shorter duration are more likely to churn. This highlights the importance of early customer engagement and retention efforts to prevent churn among new subscribers.

3. **Retention Strategies:** Orange Telecom may need to implement targeted retention strategies aimed at new subscribers with shorter account lengths to improve customer retention rates. Engaging these customers early on and providing personalized incentives or offers may help increase loyalty and reduce churn risk.
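Rather than reading concentration off bar heights, the claim can be checked directly by computing the churn rate within account-length bands. A minimal sketch using `pd.cut`; the helper name, band edges, and sample rows are arbitrary illustrations:

```python
import pandas as pd


def churn_rate_by_bin(df: pd.DataFrame, column: str, bins) -> pd.DataFrame:
    """Churn rate (%) and customer count within each bin of `column`."""
    binned = pd.cut(df[column], bins=bins)
    grouped = df.groupby(binned, observed=False)["Churn"]
    return pd.DataFrame({"customers": grouped.size(), "churn_rate_%": grouped.mean() * 100})


sample = pd.DataFrame({
    "Account length": [10, 30, 45, 80, 95, 150, 170, 200],
    "Churn": [True, False, True, False, False, False, True, False],
})
table = churn_rate_by_bin(sample, "Account length", bins=[0, 50, 100, 250])
print(table)
```

If churn really concentrates among newer accounts, the churn rate should fall as the bands move to longer account lengths; reporting the customer count alongside each rate shows which bands are too small to trust.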



##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

1. **Improved Customer Retention:** By understanding that customers with shorter account lengths are at higher risk of churning, Orange Telecom can focus its retention efforts on this segment. Implementing targeted retention strategies, such as personalized offers or enhanced customer support, for new subscribers with shorter account lengths can help improve retention rates and reduce churn.

2. **Enhanced Customer Engagement:** Engaging new subscribers early on and providing them with a positive experience can lead to greater customer satisfaction and loyalty. By proactively reaching out to customers with shorter account lengths and offering them incentives to stay, Orange Telecom can build stronger relationships and foster long-term loyalty.

3. **Increased Customer Lifetime Value (CLV):** Retaining customers with shorter account lengths can contribute to higher CLV over time. By preventing churn among new subscribers, Orange Telecom can maximize the revenue potential of these customers and increase their lifetime value to the company.

4. **Competitive Advantage:** Implementing effective retention strategies based on insights gained from analyzing account length distribution can give Orange Telecom a competitive edge in the telecommunications industry. By demonstrating a commitment to customer satisfaction and loyalty, Orange Telecom can differentiate itself from competitors and attract and retain more customers in the long term.



# **Are customers with international plans more likely to churn?**

#### Chart - 3 - **Create a bar chart showing the churn rates for customers with and without international plans**

In [None]:
# Calculate churn rates for customers with and without international plans
churn_rate_intl_plan = telecom_df_copy.groupby('International plan')['Churn'].mean() * 100
print(churn_rate_intl_plan)

In [None]:
# Chart - 3 visualization code
# Create a bar chart
plt.figure(figsize=(8, 6))
churn_rate_intl_plan.plot(kind='bar', color=['skyblue', 'orange'])
plt.title('Churn Rates for Customers with and without International Plans')
plt.xlabel('International Plan')
plt.ylabel('Churn Rate (%)')
plt.xticks(rotation=0)
plt.grid(axis='y', linestyle='--', alpha=0.7)

# Show plot
plt.show()

##### 1. Why did you pick the specific chart?

Bar charts are well-suited for comparing categorical data, such as the presence or absence of international plans in this case. Each bar represents a category, making it easy to compare values between different groups.
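The rates plotted in Chart 3 come from `groupby(...)['Churn'].mean()`, which works because the mean of a boolean column is the share of `True` values. A small sketch with made-up rows showing that `pd.crosstab(..., normalize='index')` yields the same churn percentages alongside the retained share:

```python
import pandas as pd

sample = pd.DataFrame({
    "International plan": ["Yes", "No", "No", "Yes", "No", "No"],
    "Churn": [True, False, False, False, True, False],
})

# The mean of a boolean column is the share of True values, i.e. the churn rate
rate_via_mean = sample.groupby("International plan")["Churn"].mean() * 100

# crosstab with normalize="index" turns each row into percentages of that group
rate_table = (
    pd.crosstab(sample["International plan"], sample["Churn"], normalize="index") * 100
).rename(columns={False: "retained_%", True: "churned_%"})
print(rate_via_mean)
print(rate_table)
```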

##### 2. What is/are the insight(s) found from the chart?

1. **Higher Churn Rate for Customers with International Plans:** Customers who have subscribed to international plans have a significantly higher churn rate compared to those without international plans. This suggests that the presence of an international plan may not necessarily contribute to greater customer loyalty or retention.
  
2. **Retention Opportunity:** There is an opportunity for Orange Telecom to improve retention strategies for customers with international plans. Addressing the factors contributing to churn among this group, such as service quality, pricing, or customer satisfaction, may help reduce churn rates and increase overall customer retention.

3. **Differentiated Approach:** Given the significant disparity in churn rates between customers with and without international plans, Orange Telecom may need to adopt a differentiated approach to customer retention. Tailoring retention strategies to address the specific needs and concerns of customers with international plans can help mitigate churn risk and improve overall customer satisfaction and loyalty.
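The insights describe the gap as significant. Whether it is more than sampling noise can be checked with a chi-square test of independence on the plan-by-churn table. A self-contained sketch for the 2x2 case, using synthetic counts rather than the real data; the helper name `chi_square_2x2` is hypothetical:

```python
import math

import pandas as pd


def chi_square_2x2(df: pd.DataFrame, group_col: str, target_col: str = "Churn"):
    """Pearson chi-square test of independence for a 2x2 table (no continuity correction)."""
    observed = pd.crosstab(df[group_col], df[target_col]).to_numpy().astype(float)
    if observed.shape != (2, 2):
        raise ValueError("expected exactly two groups and two outcomes")
    row_totals = observed.sum(axis=1, keepdims=True)
    col_totals = observed.sum(axis=0, keepdims=True)
    # Expected counts if churn were independent of the group
    expected = row_totals @ col_totals / observed.sum()
    chi2 = float(((observed - expected) ** 2 / expected).sum())
    # With 1 degree of freedom, chi2 is a squared standard normal, so
    # P(X >= chi2) = P(|Z| >= sqrt(chi2)) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Synthetic counts: 40 of 100 plan holders churn vs. 100 of 900 others
sample = pd.DataFrame({
    "International plan": ["Yes"] * 100 + ["No"] * 900,
    "Churn": [True] * 40 + [False] * 60 + [True] * 100 + [False] * 800,
})
chi2, p_value = chi_square_2x2(sample, "International plan")
print(f"chi2 = {chi2:.1f}, p = {p_value:.2e}")
```

`scipy.stats.chi2_contingency` with `correction=False` should give the same statistic and also handles tables larger than 2x2, if SciPy is available.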


##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

1. **Informed Decision-Making:** Understanding the higher churn rate among customers with international plans provides Orange Telecom with valuable insights into customer behavior. This knowledge allows the company to make informed decisions regarding retention strategies and resource allocation to address churn effectively.

2. **Tailored Retention Strategies:** Armed with the knowledge that customers with international plans are more likely to churn, Orange Telecom can develop targeted retention strategies specifically designed to address the needs and concerns of this customer segment. By focusing on improving service quality, enhancing customer satisfaction, and offering personalized incentives, Orange Telecom can mitigate churn risk and increase customer retention rates among this group.

3. **Enhanced Customer Experience:** Implementing retention strategies aimed at reducing churn among customers with international plans can lead to an overall improvement in the customer experience. By addressing factors such as service quality and pricing, Orange Telecom can enhance customer satisfaction, loyalty, and long-term value, resulting in a positive impact on the business.

4. **Competitive Advantage:** By effectively managing churn among customers with international plans, Orange Telecom can differentiate itself from competitors and strengthen its position in the telecommunications market. Providing superior customer service and retaining high-value customers can contribute to sustainable growth and profitability for the company.



# **What is the distribution of churned and non-churned customers in the dataset?**

#### Chart - 4 - **The Donut Plot to analyze churn**

In [None]:
# Prepare data
data = telecom_df_copy['Churn'].value_counts()
print(data)

In [None]:
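# Chart - 4 visualization code
# Build labels from the value_counts index so each label always matches its slice
labels = ['Churned customer' if churned else 'Not churned customer' for churned in data.index]
# value_counts sorts by frequency, so the second slice is the smaller (churned) group; pull it out
explode = (0, 0.2)
# Plot: wedgeprops width turns the pie into a donut, replacing the manual white circle
plt.figure(figsize=(6, 6))
plt.pie(data, explode=explode, labels=labels, autopct='%1.1f%%', pctdistance=0.75,
        shadow=True, colors=['royalblue', 'lime'], wedgeprops=dict(width=0.5))

# Add title
plt.title('Donut Plot for Churn')

# Show plot
plt.show()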

##### 1. Why did you pick the specific chart?

I chose a donut plot because it shows the proportion of churned and non-churned customers at a glance. Distinct colors and the hole in the center make the relative size of each category easy to read, and the exploded, shadowed slice draws attention to the churned segment. Overall, it is an engaging and informative way to illustrate the distribution of churn in the dataset.

##### 2. What is/are the insight(s) found from the chart?

- The majority of customers in the dataset are not churned, making up around 85.5% of the total, while approximately 14.5% of customers have churned.
- This insight provides a clear visualization of the churn distribution, which is valuable for understanding the overall churn rate and informing strategies aimed at reducing churn and retaining customers.
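The percentages that `autopct` prints on the donut can also be read directly from the data. A minimal sketch on a made-up label series: `value_counts(normalize=True)` returns each class's share instead of its count.

```python
import pandas as pd

# Illustrative churn labels: 3 churned out of 20 customers
churn = pd.Series([True] * 3 + [False] * 17, name="Churn")

# normalize=True returns proportions instead of counts
shares = churn.value_counts(normalize=True) * 100
print(shares.round(1))
```

Applied to `telecom_df_copy['Churn']`, this should reproduce the 85.5% / 14.5% split shown in the chart.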

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

1. **Understanding Churn Distribution:** Knowing that the majority of customers are not churned (85.5%) provides reassurance about the overall customer retention rate. This insight can bolster confidence in the company's ability to retain customers and maintain steady revenue streams.

2. **Identifying Churned Customers:** By visualizing the churned customers as a distinct portion of the plot (14.5%), businesses can focus their attention on this segment to understand why customers are leaving and implement targeted strategies to mitigate churn.

3. **Informing Retention Strategies:** Armed with insights into churn distribution, businesses can tailor their retention strategies more effectively. For instance, they can allocate resources towards initiatives aimed at retaining customers who are at risk of churning, such as improving service quality, offering personalized incentives, or enhancing customer support.

4. **Measuring Impact:** Over time, monitoring changes in churn distribution through similar visualizations can help businesses gauge the effectiveness of their retention efforts. If the proportion of churned customers decreases, it indicates that the implemented strategies are yielding positive results.


# **How does total day minutes usage vary between churned and non-churned customers?**

#### Chart - 5 - **Visualize the Relationship (Box Plot)**

In [None]:
# Prepare Data
data = telecom_df_copy[['Churn', 'Total day minutes']]
print(data)

In [None]:
# Chart - 5 visualization code
# Visualize the Relationship (Box Plot)
plt.figure(figsize=(10, 6))
sns.boxplot(x='Churn', y='Total day minutes', data=data)
plt.title('Total Day Minutes by Churn Status')
plt.xlabel('Churn')
plt.ylabel('Total Day Minutes')
plt.show()

##### 1. Why did you pick the specific chart?

I chose a box plot because it compares the distribution of total day minutes between churned and non-churned customers. It shows the median, quartiles, and potential outliers in each group, making differences in usage patterns easy to spot, and it is well suited to comparing a numerical variable across categories such as churn status.

##### 2. What is/are the insight(s) found from the chart?

- **Median Total Day Minutes:** The median total day minutes for churned customers appears to be slightly higher than that for non-churned customers.
  
- **Interquartile Range (IQR):** The box plot shows that the IQR for churned customers is slightly wider than for non-churned customers, indicating a greater variability in total day minutes among churned customers.

- **Outliers:** There are outliers present in both churned and non-churned groups, particularly among churned customers, suggesting that some customers, regardless of churn status, have unusually high or low total day minutes compared to the rest of the group.
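The median, IQR, and outlier observations can be quantified rather than read off the plot. A minimal sketch with made-up minutes that computes the statistics a default box plot draws (whiskers at 1.5 x IQR, which is seaborn's default `whis`); the helper name `box_stats` is hypothetical:

```python
import pandas as pd


def box_stats(values: pd.Series, whis: float = 1.5) -> dict:
    """Median, IQR, and count of points beyond the box-plot whiskers (1.5 x IQR rule)."""
    q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1
    low, high = q1 - whis * iqr, q3 + whis * iqr
    outliers = values[(values < low) | (values > high)]
    return {"median": median, "iqr": iqr, "n_outliers": int(outliers.size)}


sample = pd.DataFrame({
    "Total day minutes": [150, 160, 170, 180, 190, 400, 200, 220, 240, 260, 280, 20],
    "Churn": [False] * 6 + [True] * 6,
})
per_group = {churn: box_stats(group["Total day minutes"])
             for churn, group in sample.groupby("Churn")}
print(per_group)
```

Running the same helper on `telecom_df_copy` would turn "slightly higher median" and "slightly wider IQR" into concrete numbers.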



##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

1. **Identifying Usage Patterns:** Understanding the differences in total day minutes between churned and non-churned customers can help telecom companies identify distinct usage patterns associated with customer churn. For instance, if churned customers consistently exhibit higher total day minutes, it may indicate dissatisfaction or overutilization of services, providing insights into areas that need improvement.

2. **Tailored Marketing and Service Strategies:** Armed with insights into usage patterns, telecom companies can tailor their marketing and service strategies to address the needs and preferences of different customer segments. For instance, they can offer personalized plans or incentives to high-usage customers to improve satisfaction and loyalty, thereby reducing churn rates.

3. **Retention Efforts:** By proactively identifying customers with high churn propensity based on their usage patterns, telecom companies can implement targeted retention efforts. For example, they can reach out to customers with unusually low or high usage to understand their needs better and offer solutions to prevent them from churning.

4. **Operational Efficiency:** Insights into usage patterns can also inform operational decisions, such as network capacity planning and resource allocation. Understanding when and how customers use telecom services can help optimize network performance and resource utilization, leading to cost savings and improved service quality.



# **How does the average churn percentage vary across different states?**

#### Chart - 6 - **Visualize average churn percentage by state**

In [None]:
# Calculate average churn percentage by state
average_churn_percentage_by_state = telecom_df_copy.groupby('State')['Churn'].mean() * 100

# Sort states by average churn percentage
average_churn_percentage_by_state = average_churn_percentage_by_state.sort_values(ascending=False)
print(average_churn_percentage_by_state)

In [None]:
# Chart - 6 visualization code
# Plot
plt.figure(figsize=(15, 6))
average_churn_percentage_by_state.plot(kind='bar', color='skyblue')
plt.title('Average Churn Percentage by State')
plt.xlabel('State')
plt.ylabel('Average Churn Percentage (%)')
plt.xticks(rotation=90)  # Rotate x-axis labels for better readability
plt.grid(axis='y', linestyle='--', alpha=0.7)  # Add horizontal grid lines
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

I chose a bar chart because it effectively visualizes the average churn percentage for each state. Each bar represents a state, making it easy to compare churn rates across geographic regions. Sorting the bars by churn rate, rotating the x-axis labels, and adding horizontal grid lines make the chart easier to read and interpret.

##### 2. What is/are the insight(s) found from the chart?

1. **Regional Variation in Churn Rates:** The churn rates vary significantly across different states. Some states, such as New Jersey (NJ) and California (CA), have relatively high average churn percentages, indicating a higher proportion of customers switching telecom providers.
   
2. **Identifying High-Churn States:** States like New Jersey, California, Texas, and Maryland exhibit notably high average churn percentages, suggesting potential challenges in customer retention or satisfaction within these regions.
   
3. **Opportunities for Targeted Strategies:** States with lower churn rates, such as Hawaii (HI) and Alaska (AK), may present opportunities for telecom companies to study and implement successful retention strategies that could be applied to higher-churn areas.

4. **Importance of Localized Insights:** Understanding regional variations in churn rates is crucial for telecom companies to tailor their marketing, customer service, and retention efforts according to the specific needs and preferences of customers in different geographic areas.
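With 51 states and 3333 customers, each state contributes only about 65 customers on average, so a handful of churners more or less can move a state's rate by several percentage points. A sketch of reporting each state's rate together with its customer count and a 95% Wilson score interval; the sample rows are invented, and the interval method is a standard choice added here, not part of the original analysis:

```python
import math

import pandas as pd


def wilson_interval(churned: int, total: int, z: float = 1.96):
    """Approximate 95% Wilson score interval for the proportion churned / total."""
    p = churned / total
    denom = 1 + z ** 2 / total
    center = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return center - half, center + half


def churn_by_state(df: pd.DataFrame) -> pd.DataFrame:
    """Churn rate per state with customer counts and a 95% interval, highest rate first."""
    g = df.groupby("State")["Churn"].agg(customers="size", churned="sum")
    g["churn_rate_%"] = g["churned"] / g["customers"] * 100
    bounds = [wilson_interval(int(c), int(n)) for c, n in zip(g["churned"], g["customers"])]
    g["ci_low_%"] = [low * 100 for low, _ in bounds]
    g["ci_high_%"] = [high * 100 for _, high in bounds]
    return g.sort_values("churn_rate_%", ascending=False)


sample = pd.DataFrame({
    "State": ["NJ"] * 10 + ["HI"] * 10,
    "Churn": [True] * 3 + [False] * 7 + [True] * 1 + [False] * 9,
})
result = churn_by_state(sample)
print(result.round(1))
```

States whose intervals overlap heavily may not truly differ, so overlapping intervals are a reason to be cautious before targeting a state as "high-churn".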


##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

1. **Strategic Decision-Making:** By understanding regional variations in churn rates, telecom companies can make more informed decisions regarding resource allocation, marketing strategies, and service offerings. They can prioritize regions with higher churn rates for targeted interventions aimed at improving customer satisfaction and retention.

2. **Customer Retention:** Identifying high-churn states allows companies to focus on implementing tailored retention strategies in those regions. By addressing the specific challenges contributing to churn in these areas, such as service quality issues or pricing concerns, companies can work towards reducing churn and increasing customer loyalty.

3. **Market Expansion:** States with lower churn rates present opportunities for telecom companies to expand their customer base and market presence. By analyzing successful retention strategies in these regions, companies can replicate them in other areas with higher churn rates, potentially leading to business growth and increased market share.

4. **Enhanced Customer Experience:** Tailoring marketing, customer service, and retention efforts based on localized insights allows companies to provide a more personalized and satisfactory experience to their customers. This can lead to higher customer satisfaction, improved brand loyalty, and ultimately, a positive impact on business performance.


# **How does the distribution of numeric features vary across the dataset?**

#### Chart - 7 - **Plot histograms for each numeric column**

In [None]:
# Select only numeric columns
numeric_columns = telecom_df.select_dtypes(include=['int64', 'float64'])

# Count the numeric columns (one subplot row per column in the next cell)
num_cols = len(numeric_columns.columns)
print(numeric_columns)
print(num_cols)


In [None]:
# Chart - 7 visualization code
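# One row of axes per numeric column; squeeze=False keeps `axes` 2-D even when there is only one column
fig, axes = plt.subplots(num_cols, 1, figsize=(8, 6*num_cols), squeeze=False)

# Plot a histogram with a KDE overlay for each numeric column
for ax, col in zip(axes[:, 0], numeric_columns.columns):
    sns.histplot(data=telecom_df, x=col, kde=True, ax=ax)
    ax.set_title(f'Histogram of {col}')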

plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

I chose to create a histogram for each numeric column in the dataset because histograms are effective for visualizing the distribution of numerical data. By plotting a histogram for each column, we can quickly see the range, central tendency, and spread of each feature, which reveals the dataset's structure and potential patterns. Stacking the histograms as subplots in a single figure also keeps the presentation organized and makes it easier to compare distributions across features.

##### 2. What is/are the insight(s) found from the chart?

1. **Account Length:** The histogram shows that the distribution of account lengths is relatively uniform, with most values spread across a wide range.

2. **Area Code:** The histogram indicates that the area code is discrete and categorical rather than continuous, as it comprises only three distinct values.

3. **Number of Voicemail Messages:** There is a spike at zero voicemail messages, indicating that a significant portion of customers does not use voicemail.

4. **Total Day Minutes, Total Evening Minutes, Total Night Minutes, Total International Minutes:** These histograms show that the distribution of minutes used during different times of the day varies, with some skewness towards higher values.

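5. **Total Day Calls, Total Evening Calls, Total Night Calls, Total International Calls:** The histograms for day, evening, and night call counts show relatively symmetric distributions, whereas international call counts are right-skewed, with most customers making only a few international calls.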

6. **Total Day Charge, Total Evening Charge, Total Night Charge, Total International Charge:** The histograms of charges exhibit similar patterns to the histograms of minutes used, as charges are typically calculated based on usage.

7. **Customer Service Calls:** The histogram of customer service calls shows that most customers make fewer than five service calls, with a few outliers making significantly more calls.
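Several of the insights above (only three distinct area codes, a spike at zero voicemail messages, a long tail of service calls) can also be checked numerically rather than by eye. Below is a minimal sketch, assuming pandas. The helper name `distribution_summary` is hypothetical, and the toy rows reuse the dataset's column names but contain made-up values.

```python
import pandas as pd

def distribution_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Per numeric column: distinct values, sample skewness, and share of exact zeros."""
    numeric = df.select_dtypes(include="number")
    return pd.DataFrame({
        "n_unique": numeric.nunique(),        # very few distinct values hints at a categorical code
        "skew": numeric.skew(),               # > 0 means a long right tail
        "share_zero": (numeric == 0).mean(),  # a spike at zero shows up as a large share here
    })

# Illustrative toy rows only; values are made up
toy = pd.DataFrame({
    "Area code": [408, 415, 510, 415, 408, 510],
    "Number vmail messages": [0, 0, 0, 25, 0, 31],
    "Customer service calls": [1, 0, 2, 1, 4, 9],
})
print(distribution_summary(toy))
print("Share with 5+ service calls:", (toy["Customer service calls"] >= 5).mean())
```

Running `distribution_summary(telecom_df)` would give the same three figures for every numeric column of the real data.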



##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

1. **Understanding Customer Behavior:** Insights into the distribution of features like total minutes used during different times of the day, voicemail usage, and customer service calls can provide valuable information about customer behavior and preferences. This understanding can guide the development of targeted marketing strategies and service offerings tailored to specific customer needs.

2. **Identifying Service Usage Patterns:** By analyzing the distribution of features related to call minutes and charges, telecom companies can identify usage patterns among their customers. This information can be used to optimize service plans, pricing structures, and network capacity to better meet customer demand and improve satisfaction.

3. **Improving Customer Experience:** Insights from histograms can highlight areas where customers may be experiencing issues or dissatisfaction, such as high numbers of customer service calls or unusually high charges. Identifying these pain points enables companies to implement improvements in service delivery, customer support, and billing processes to enhance the overall customer experience.

4. **Retention and Churn Prediction:** Understanding patterns in customer behavior, such as the frequency of service calls or usage of different features, can help identify customers at risk of churn. By proactively addressing customer concerns and offering targeted retention strategies, telecom companies can reduce churn rates and retain more customers, leading to increased revenue and long-term business growth.
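To connect service-call frequency with churn risk, a first check is to compare churn rates on either side of a service-call threshold. Below is a minimal sketch, assuming pandas, a `Customer service calls` column, and a boolean `Churn` column. The threshold of 4 and the helper name `churn_by_service_calls` are hypothetical choices, and the toy rows are invented.

```python
import pandas as pd

def churn_by_service_calls(df: pd.DataFrame, threshold: int = 4,
                           calls_col: str = "Customer service calls",
                           churn_col: str = "Churn") -> pd.Series:
    """Churn percentage for customers below vs. at/above a service-call threshold."""
    # Label each customer by which side of the (hypothetical) threshold they fall on
    bucket = (df[calls_col] >= threshold).map({False: f"< {threshold} calls",
                                               True: f">= {threshold} calls"})
    # Mean of a boolean churn flag = share of churned customers in each bucket
    return df.groupby(bucket)[churn_col].mean() * 100

# Illustrative toy rows only (made-up values)
toy = pd.DataFrame({
    "Customer service calls": [0, 1, 2, 4, 5, 1, 6, 3],
    "Churn": [False, False, True, True, True, False, False, False],
})
print(churn_by_service_calls(toy))
```

Trying several thresholds on the real data, or grouping by the exact call count, shows whether churn rises gradually or jumps after a particular number of calls.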



# **Correlation Heatmap**

#### Chart - 8 - **Correlation Heatmap**

In [None]:
# Select only numeric columns
numeric_columns = telecom_df_copy.select_dtypes(include=['int64', 'float64'])

# Calculate the correlation matrix
correlation_matrix = numeric_columns.corr()


print(correlation_matrix)


In [None]:
# Correlation Heatmap visualization code

# Plot the heatmap
plt.figure(figsize=(12, 10))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt=".2f", annot_kws={"size": 10})
plt.title('Correlation Heatmap')
plt.show()

##### 1. Why did you pick the specific chart?


I picked a correlation heatmap because it's a powerful visualization tool to explore relationships between variables in a dataset. In this case, the correlation heatmap allows us to quickly identify which pairs of numeric variables are strongly correlated, positively or negatively. By including correlation coefficients as annotations in each cell, the heatmap provides quantitative insights into the strength and direction of these relationships.
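Each annotated cell is a correlation coefficient. `DataFrame.corr()` uses the Pearson coefficient by default (`method='pearson'`). For two columns $x$ and $y$ with $n$ paired observations and means $\bar{x}$ and $\bar{y}$, it is defined as:

$$
r_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}
$$

The value lies between $-1$ and $1$. It equals $1$ exactly when $y = a x + b$ with $a > 0$, so a charge computed at a fixed per-minute rate would show $1.00$ against its minutes. Pearson's $r$ measures only linear association, so a value near $0$ does not rule out a non-linear relationship.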

##### 2. What is/are the insight(s) found from the chart?

1. **Positive Correlations:**
   - There is a strong positive correlation between variables such as "Total day minutes" and "Total day charge," "Total eve minutes" and "Total eve charge," and "Total night minutes" and "Total night charge." This indicates that as the number of minutes spent during each period increases, the corresponding charges also increase proportionally.
   - The day, evening, and night minute totals, by contrast, are only weakly related to one another ("Total day minutes" vs. "Total eve minutes," "Total day minutes" vs. "Total night minutes," and "Total eve minutes" vs. "Total night minutes" all have coefficients near zero). A customer who uses many minutes in one period does not necessarily use many in another.

2. **Negative Correlations:**
   - The number of customer service calls has at most a very weak negative correlation with usage metrics such as "Total day minutes," "Total eve minutes," and "Total night minutes" (coefficients close to zero). The heatmap therefore does not show that customers who call support more often use the service less, and any link between service calls and dissatisfaction has to be checked against the churn label.

3. **Weak Correlations:**
   - Some variables, such as "Account length," "Number vmail messages," and "Total intl calls," show weak correlations with other variables. This suggests that these variables may not significantly influence other aspects of customer behavior or usage patterns in the dataset.
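Reading every annotated cell of a large heatmap is error-prone, so it can help to list the strongest pairs directly from a matrix such as `correlation_matrix`. Below is a minimal sketch, assuming pandas and NumPy. The helper name `top_correlated_pairs` is hypothetical, and the toy columns reuse dataset names but contain made-up values. The toy charge column is built as an exact multiple of minutes.

```python
import numpy as np
import pandas as pd

def top_correlated_pairs(corr: pd.DataFrame, n: int = 5) -> pd.Series:
    """Return the n variable pairs with the largest absolute correlation, each pair once."""
    # Keep only the strict upper triangle: drops the diagonal and the mirrored (b, a) duplicates
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    # Rank by strength regardless of sign
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(n)

# Illustrative toy data only (made-up values)
toy = pd.DataFrame({
    "Total day minutes": [100.0, 150.0, 200.0, 250.0, 300.0],
    "Total eve minutes": [210.0, 190.0, 205.0, 180.0, 200.0],
    "Customer service calls": [1, 3, 0, 2, 1],
})
toy["Total day charge"] = toy["Total day minutes"] * 0.17
print(top_correlated_pairs(toy.corr(), n=3))
```

On the notebook's data this would be `top_correlated_pairs(correlation_matrix)`.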


# **Pair Plot**

#### Chart - 9 - **Pair Plot**

In [None]:
# Importing necessary libraries
import matplotlib.pyplot as plt
import seaborn as sns

# Selecting only numeric columns
numeric_columns = telecom_df_copy.select_dtypes(include=['int64', 'float64'])

print(numeric_columns)

In [None]:
# Pair Plot visualization code
# Generating pair plot
sns.pairplot(numeric_columns)
plt.show()

##### 1. Why did you pick the specific chart?

The pair plot is a suitable choice for exploring relationships between multiple numeric variables in a dataset. It allows for a visual examination of pairwise relationships, including distributions along the diagonal and scatterplots for each pair of variables off the diagonal. This visualization facilitates the identification of potential correlations or patterns between variables, which can be valuable for understanding the underlying structure of the data and informing further analysis.

##### 2. What is/are the insight(s) found from the chart?

1. **Diagonal Plots (Histograms):**
   - The diagonal plots show the distribution of each variable. For example, 'Total day minutes', 'Total eve minutes', and 'Total night minutes' appear to be normally distributed, while others, like 'Number vmail messages' and 'Total intl calls', are heavily skewed.

2. **Scatterplots (Off-diagonal Plots):**
   - Scatterplots represent the relationship between pairs of variables. For instance, there seems to be no clear linear relationship between most pairs of variables, indicating weak correlations.
   - However, there are some exceptions. For example, 'Total day minutes' and 'Total day charge' exhibit a strong positive linear relationship, which is expected since charge is calculated based on minutes.
   - Similarly, 'Total night minutes' and 'Total night charge' also show a strong positive linear relationship.

3. **Insights from Correlation:**
   - Looking at the correlation coefficients from the previous heatmap, variables like 'Total day minutes' and 'Total day charge' have a correlation coefficient of 1. This indicates a perfect positive linear relationship, which is reflected in the pair plot.
   - Conversely, variables like 'Total day minutes' and 'Customer service calls' have a correlation coefficient close to zero, indicating a weak linear relationship, which is also evident in the scatterplot.
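The near-perfect correlation between minutes and charge can be confirmed by checking whether charge divided by minutes is (almost) constant. Below is a minimal sketch, assuming pandas. The helper name `per_minute_rate` is hypothetical, and the 0.17 rate in the toy data is invented, not taken from the dataset. Rows with zero minutes are skipped to avoid dividing by zero.

```python
import pandas as pd

def per_minute_rate(df: pd.DataFrame, minutes_col: str, charge_col: str) -> pd.Series:
    """Min, max and standard deviation of charge / minutes, ignoring zero-minute rows."""
    used = df[df[minutes_col] > 0]  # avoid dividing by zero minutes
    rate = used[charge_col] / used[minutes_col]
    return rate.agg(["min", "max", "std"])

# Illustrative toy data: the 0.17 rate is made up
toy = pd.DataFrame({"Total day minutes": [0.0, 100.0, 150.0, 200.0]})
toy["Total day charge"] = toy["Total day minutes"] * 0.17
print(per_minute_rate(toy, "Total day minutes", "Total day charge"))
```

On the real data, for example `per_minute_rate(telecom_df_copy, "Total day minutes", "Total day charge")`, a small spread is expected if charges are rounded to cents. A near-constant rate means each minutes/charge pair carries the same information, so one column of each pair could be dropped before modelling.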

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the business objective?
Explain briefly.

1. **Modify International Plan Pricing:**
   - Consider adjusting the pricing or offering incentives for international plans to make them more attractive compared to standard plans. This can help in retaining customers who frequently use international services.

2. **Proactive Communication:**
   - Initiate proactive communication with customers to understand their needs, address concerns, and provide assistance before issues escalate. Regularly reaching out to customers demonstrates attentiveness and care, fostering loyalty.

3. **Solicit Feedback Regularly:**
   - Encourage customers to provide feedback through surveys, reviews, or direct communication channels. Actively listening to customer feedback allows the company to identify pain points, improve service quality, and tailor offerings to meet customer expectations.

4. **Offer Retention Incentives:**
   - Periodically offer personalized discounts, upgrades, or bonuses to incentivize customers to stay with the service. Tailor retention offers based on individual usage patterns, preferences, and tenure to maximize effectiveness.

5. **Focus on High-Churn States:**
   - Identify and prioritize efforts to retain customers in regions with the highest churn rates. Analyze the root causes of churn specific to these areas and develop targeted strategies to address local challenges and preferences.

6. **Engage with High-Value Customers:**
   - Identify and engage with high-value customers who contribute significantly to revenue and loyalty. Offer exclusive benefits, personalized support, and rewards to nurture these relationships and minimize churn among this segment.

7. **Ensure Service Reliability:**
   - Conduct regular maintenance of network infrastructure and servers to ensure optimal performance and reliability. Addressing issues related to poor network connectivity or service disruptions promptly can prevent customer dissatisfaction and churn.

8. **New Customer Onboarding:**
   - Define a structured onboarding process for new customers to familiarize them with the service features, benefits, and support channels. Providing clear guidance and support during the initial stages of the customer journey can improve satisfaction and retention.

9. **Churn Analysis and Prevention:**
   - Continuously analyze churn data to identify trends, patterns, and predictive indicators of churn. Utilize advanced analytics and predictive modeling to proactively identify at-risk customers and intervene with targeted retention strategies.

10. **Stay Competitive:**
    - Monitor the competitive landscape regularly to stay informed about industry trends, pricing strategies, and competitor offerings. Adapt and innovate service offerings to remain competitive and retain customers in a dynamic market.
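Point 9 mentions predictive modelling to flag at-risk customers. Below is a minimal sketch of one common choice, a scaled logistic regression, assuming scikit-learn is available. The feature list, the `Yes`/`No` encoding of `International plan`, the boolean `Churn` label, and the helper names are assumptions for illustration, not the project's method. The toy rows are invented.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumed column names, taken from the charts above; adjust to the actual dataset
FEATURES = ["Total day minutes", "Customer service calls", "International plan"]

def prepare_features(df: pd.DataFrame) -> pd.DataFrame:
    """Select model inputs and encode the (assumed) Yes/No international-plan flag as 0/1."""
    X = df[FEATURES].copy()
    X["International plan"] = (X["International plan"] == "Yes").astype(int)
    return X

def fit_churn_model(df: pd.DataFrame):
    """Fit a standardized logistic regression that predicts the Churn label."""
    model = make_pipeline(StandardScaler(), LogisticRegression())
    model.fit(prepare_features(df), df["Churn"].astype(int))
    return model

def rank_at_risk(model, df: pd.DataFrame, top_n: int = 10) -> pd.DataFrame:
    """Attach a churn probability to each customer and return the riskiest first."""
    scored = df.assign(churn_risk=model.predict_proba(prepare_features(df))[:, 1])
    return scored.sort_values("churn_risk", ascending=False).head(top_n)

# Illustrative toy data only (made-up values)
toy = pd.DataFrame({
    "Total day minutes": [180.0, 220.0, 150.0, 200.0, 170.0, 210.0,
                          190.0, 160.0, 230.0, 175.0, 205.0, 185.0],
    "Customer service calls": [0, 1, 2, 1, 0, 2, 5, 6, 4, 5, 1, 4],
    "International plan": ["No", "No", "Yes", "No", "No", "Yes",
                           "No", "Yes", "No", "No", "No", "Yes"],
})
toy["Churn"] = toy["Customer service calls"] >= 4
model = fit_churn_model(toy)
print(rank_at_risk(model, toy, top_n=3))
```

In practice the model would be fitted on a training split (for example with `sklearn.model_selection.train_test_split`) and evaluated on held-out customers before its scores are used to target retention offers. The ranked list could also be combined with each customer's total charges to focus on the high-value customers in point 6.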


# **Conclusion**

1. **Understanding Churn Patterns:** Analyzing churn data reveals distinct patterns and trends, such as regional variations in churn rates and differences in usage behavior among churned and non-churned customers.

2. **Identifying High-Churn Segments:** By segmenting customers based on various factors like geographic location, usage patterns, and service preferences, telecom companies can pinpoint high-churn segments that require targeted retention efforts.

3. **Tailoring Retention Strategies:** Armed with insights into customer preferences, pain points, and satisfaction levels, telecom companies can tailor retention strategies to address specific customer needs and concerns effectively.

4. **Proactive Customer Engagement:** Proactively engaging with customers through personalized communication, feedback solicitation, and retention offers can foster stronger relationships and reduce the likelihood of churn.

5. **Service Quality and Reliability:** Ensuring high service quality, network reliability, and responsive customer support are crucial for minimizing customer dissatisfaction and churn due to service-related issues.

6. **Continuous Improvement:** By continuously analyzing churn data, monitoring market trends, and refining retention strategies based on feedback and performance metrics, telecom companies can adapt and evolve their approach to churn management over time.

In conclusion, leveraging insights from churn analysis enables telecom companies to implement targeted, data-driven strategies that enhance customer satisfaction, loyalty, and long-term profitability in a competitive market landscape.
