# **Project Name**    -





##### **Project Name**    - Telecom Churn Analysis
##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Team Member 1**   - Krutika Kumbhare


# **Project Summary -**

Customer churn in the telecom sector is a significant challenge that telecom companies face, as it directly impacts their revenue and profitability. Churn refers to customers discontinuing or terminating their services. Understanding the factors influencing churn and implementing effective strategies to mitigate it is crucial for telecom companies.

To address churn, telecom companies employ data-driven techniques such as Exploratory Data Analysis, customer segmentation, and predictive modeling. By leveraging machine learning algorithms, they build churn prediction models that identify customers at high risk of churn. This enables proactive intervention and targeted retention strategies to retain at-risk customers.

# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


The problem at hand is to analyze customer churn in the telecom sector and develop effective strategies. Customer churn, defined as the discontinuation or termination of telecom services by customers, poses a significant challenge for telecom companies as it directly impacts their revenue and profitability.

The objective is to understand the factors influencing customer churn and identify patterns or indicators that can help predict and prevent churn. By analyzing historical customer data, including demographics, usage patterns, complaints, and service interactions, the goal is to uncover key drivers of churn and gain insights into customer behavior.

The problem statement also encompasses the need to explore and implement customer retention strategies, including enhancing service quality, improving customer experiences, offering competitive pricing plans, providing value-added services, and implementing effective customer relationship management systems. The goal is to address the root causes of churn, improve customer satisfaction, and establish strong relationships with customers.

#### **Define Your Business Objective?**

The business objective for the telecom company is to reduce customer churn. Customer churn refers to the rate at which customers discontinue their services or switch to a competitor. The objective is to retain existing customers and minimize the number of customers who cancel their subscriptions or switch to other telecom providers. This objective is important because retaining customers is more cost-effective than acquiring new ones, and it contributes to the company's revenue stability and long-term growth. By reducing churn, the company can improve customer satisfaction, increase customer loyalty, and ultimately achieve higher profitability.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure each chart follows the format below.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
from google.colab import drive
drive.mount('/content/drive')

In [None]:
filepath = '/content/Telecom Churn (1).csv'
df = pd.read_csv(filepath)
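
The guidelines ask for exception handling and a notebook that runs end to end. A minimal sketch of a guarded loader is shown here. `load_dataset` is a hypothetical helper name (it is not part of pandas), and the path in the usage comment is assumed to be the one from the loading cell.

```python
from pathlib import Path

import pandas as pd


def load_dataset(path):
    """Read a CSV into a DataFrame, failing early with a clear message."""
    csv_path = Path(path)
    if not csv_path.is_file():
        # Raise before pandas does, so the error names the missing file plainly
        raise FileNotFoundError(f"Dataset not found: {csv_path}")
    frame = pd.read_csv(csv_path)
    if frame.empty:
        # A header-only file would otherwise pass silently
        raise ValueError(f"Dataset has no rows: {csv_path}")
    return frame


# Usage (path assumed from the loading cell):
# df = load_dataset('/content/Telecom Churn (1).csv')
```

Failing at load time keeps later cells from raising less obvious errors on an undefined or empty `df`.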

### Dataset Loading

In [None]:
# Load Dataset
df

### Dataset First View

In [None]:
# Dataset First Look
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

### Dataset Information

In [None]:
# Dataset Info
df.info()

In [None]:
# Area code is a categorical label, not a quantity, so let's change its data type to object
df['Area code'] = df['Area code'].astype('object')

In [None]:
df.dtypes

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count

df.duplicated().value_counts()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isnull().sum()

In [None]:
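# Visualizing the missing values: bar chart of null counts per column
import matplotlib.pyplot as plt
df.isnull().sum().plot(kind='bar', figsize=(12, 4), title='Missing values per column')
plt.show()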

In [None]:
df.isna().sum()

### What did you know about your dataset?

Answer Here: The dataset consists of 3333 rows and 20 columns. It has no duplicate rows and no missing (null) values.

About the column names:
1. State: state of the customer
2. Account length: duration of the customer's account, etc.
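
The checks behind this answer (shape, duplicates, missing values) can be collected in one place. The sketch below is illustrative only: `summarize_dataset` is a hypothetical helper, and the toy frame stands in for the real dataset.

```python
import pandas as pd


def summarize_dataset(frame):
    """Return row/column counts plus duplicate and missing-value totals."""
    return {
        'rows': frame.shape[0],
        'columns': frame.shape[1],
        'duplicate_rows': int(frame.duplicated().sum()),
        'missing_values': int(frame.isnull().sum().sum()),
    }


# Toy data for illustration only (not the real telecom dataset)
toy = pd.DataFrame({
    'State': ['KS', 'OH', 'OH', 'NJ'],
    'Account length': [128, 107, 107, None],
})
summary = summarize_dataset(toy)
print(summary)
```

In the notebook, `summarize_dataset(df)` would report the same figures quoted in the answer.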


## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns.tolist()

In [None]:
# Dataset Describe
df.describe()

### Variables Description

Answer Here: `df.columns` lists all the column names, and the `describe()` function calculates descriptive statistics for every numeric variable in the DataFrame: count, mean, standard deviation, minimum, 25th percentile, median (50th percentile), 75th percentile, and maximum.
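
As a small illustration of what `describe()` returns, the sketch below runs it on a toy column (the values are made up for the example) and prints the names of the statistics it reports.

```python
import pandas as pd

# Toy data for illustration only
toy = pd.DataFrame({'Total day minutes': [100.0, 200.0, 300.0, 400.0]})

# describe() returns one row per statistic and one column per numeric variable
stats = toy.describe()
print(stats.index.tolist())
print(stats.loc['50%', 'Total day minutes'])  # the median
```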

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.

for column in df.columns:
  unique_values=df[column].unique()
  print(f"Unique values in '{column}': {unique_values}")

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
df.head()

In [None]:
# Write your code to make your dataset analysis ready.
missing_values = df.isnull().sum()
print('missing values:\n', missing_values)

In [None]:
# check for duplicates rows.
df.duplicated().sum()

In [None]:
# Count how many times each state appears in the data
df['State'].value_counts()

In [None]:
# Group the data by state and calculate churn rate
state_churn_rate = df.groupby('State')['Churn'].mean().sort_values(ascending=False)
print(state_churn_rate)

### What all manipulations have you done and insights you found?

Answer Here:
1. Counted the missing values: there are none.
2. Checked for duplicate rows: there are none.
3. Counted the number of occurrences of each state in the dataset. This provides insights into the distribution of customers across different states.
4. Calculated the churn rate by state, with the results sorted in descending order.
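
A churn rate by state is easier to interpret next to the number of customers it is based on, since a state with few customers can show an extreme rate by chance. The sketch below is one possible way to report both, using toy data; the column names `customers`, `churned`, and `churn_rate` are chosen for this example only.

```python
import pandas as pd

# Toy data for illustration only; the notebook would use `df` instead
toy = pd.DataFrame({
    'State': ['KS', 'KS', 'OH', 'OH', 'OH', 'NJ'],
    'Churn': [True, False, False, False, True, True],
})

# Named aggregation: customer count, churned count, and churn rate per state
state_summary = (
    toy.groupby('State')['Churn']
    .agg(customers='size', churned='sum', churn_rate='mean')
    .sort_values('churn_rate', ascending=False)
)
print(state_summary)
```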

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
import matplotlib.pyplot as plt

# Chart - 1 visualization code
# Group the data by state and calculate the churn rate
churn_rate = df.groupby('State')['Churn'].mean().sort_values(ascending=False)

# plotting bar chart
plt.figure(figsize = (12,6))
churn_rate.plot(kind='bar')
plt.xlabel ('state')
plt.ylabel ('churn rate')
plt.title('churn rate by state')
plt.show()


##### 1. Why did you pick the specific chart?

Answer Here: A bar chart makes it easy to compare churn rates across states. It shows at a glance which states have relatively high churn rates, meaning a higher percentage of customers leaving the telecom service, and which have lower ones.

##### 2. What is/are the insight(s) found from the chart?

Answer Here: The chart provides valuable insights into the churn rates by state, enabling telecom companies to identify areas of concern, prioritize actions, and tailor strategies to reduce churn and enhance customer retention.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here: The gained insights from analyzing the churn rates by state can indeed help create a positive business impact. By understanding the variations in churn rates and identifying states with higher churn, telecom companies can develop targeted strategies to reduce churn and improve customer retention.

#### Chart - 2

In [None]:
# Chart - 2 visualization code
# Select only the numerical columns
numerical_columns = df.select_dtypes(include='number')
plt.figure(figsize=(10, 6))
plt.boxplot(numerical_columns.values, vert=False, labels=numerical_columns.columns)
plt.xlabel('Values')
plt.title('Boxplot of Numerical Variables')
plt.show()

##### 1. Why did you pick the specific chart?

Answer Here: A boxplot helps in spotting outliers in each numerical variable.

##### 2. What is/are the insight(s) found from the chart?

Answer Here: The boxplot identifies variables with significant variation, potential outliers, or unusual distributions, which can further guide the data analysis and decision-making process.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here: They serve as a starting point for analysis and should be complemented with further investigations and analysis. Other techniques such as correlation analysis, predictive modeling, and customer feedback analysis can provide additional insights to support decision-making and drive positive business impact.
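
The outliers a boxplot draws can also be counted directly. The sketch below applies the 1.5 × IQR rule, which is matplotlib's default whisker setting, to every numeric column. `iqr_outlier_counts` is a hypothetical helper and the toy values are made up for the example.

```python
import pandas as pd


def iqr_outlier_counts(frame, k=1.5):
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] for each numeric column."""
    numeric = frame.select_dtypes(include='number')
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    # Comparisons align the per-column bounds with the DataFrame's columns
    outside = (numeric < q1 - k * iqr) | (numeric > q3 + k * iqr)
    return outside.sum()


# Toy data for illustration only
toy = pd.DataFrame({
    'Customer service calls': [1, 1, 2, 2, 3, 9],
    'Total day minutes': [180.0, 190.0, 200.0, 210.0, 220.0, 230.0],
})
outlier_counts = iqr_outlier_counts(toy)
print(outlier_counts)
```

Counting outliers per column gives a number to report alongside the chart, which is hard to read when variables on very different scales share one axis.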

#### Chart - 3

In [None]:
# Chart - 3 visualization code

# Set the figure size
plt.figure(figsize=(10, 6))

# Create a histogram for "Account length"
plt.hist(df["Account length"], bins=30, edgecolor='black')
plt.xlabel("Account Length")
plt.ylabel("Frequency")
plt.title("Histogram of Account Length")
plt.show()

# Set the figure size
plt.figure(figsize=(10, 6))

# Create a histogram for "Total intl charge"
plt.hist(df["Total intl charge"], bins=30, edgecolor='black')
plt.xlabel("Total Intl Charge")
plt.ylabel("Frequency")
plt.title("Histogram of Total Intl Charge")

# Display the histogram
plt.show()

##### 1. Why did you pick the specific chart?

I picked a histogram as it is a suitable chart for visualizing the distribution and frequency of numerical variables. Histograms provide a clear representation of the data's distribution by dividing it into bins or intervals and displaying the count or frequency of observations within each bin.

##### 2. What is/are the insight(s) found from the chart?

The distribution of account lengths appears to be somewhat right-skewed, with a longer tail on the right side.
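
A visual impression of skew can be checked numerically with the sample skewness: values above zero indicate a longer right tail, and values near zero indicate rough symmetry. The sketch below uses toy values, not the real column.

```python
import pandas as pd

# Toy data for illustration only: one large value creates a right tail
toy = pd.DataFrame({'Account length': [10, 20, 30, 40, 200]})

# Sample skewness; positive means right-skewed, negative means left-skewed
skewness = toy['Account length'].skew()
print(round(skewness, 2))
```

In the notebook, `df['Account length'].skew()` would give the corresponding figure for the real data.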

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

 The distribution of account lengths can help the telecom company in tailoring their services and offerings based on customer preferences. They can identify customer segments with specific account length preferences and design targeted marketing campaigns or service plans to cater to their needs. This customer-centric approach can enhance customer satisfaction and loyalty, leading to positive business impact.

#### Chart - 4

In [None]:
# Chart - 4 visualization code

# Lets perform univariate Analysis
# Select only the numeric columns
numeric_columns = df.select_dtypes(include=np.number)

# Iterate over each numeric column
for column in numeric_columns:
    # Set the figure size for better visualization
    plt.figure(figsize=(8, 6))

    # Create a histogram for the numeric column
    plt.hist(df[column], bins=10)

    # Set the x-axis label
    plt.xlabel(column)

    # Set the y-axis label
    plt.ylabel('Frequency')

    # Set the title
    plt.title(f'Histogram of {column}')

    # Display the plot
    plt.show()

##### 1. Why did you pick the specific chart?

The choice of histograms is based on the nature and type of the data being analyzed. Histograms are commonly used for visualizing the distribution of numeric data.

##### 2. What is/are the insight(s) found from the chart?

Distribution Shape: The shape of the histogram can provide insights into the distribution of the data. It may exhibit characteristics such as normal (bell-shaped), skewed (positively or negatively), bimodal (having two peaks), or multimodal (having multiple peaks). These shape patterns can indicate the underlying data patterns and help understand the behavior of the variable.

Central Tendency: The central tendency of the data can be observed from the histogram. It can provide insights into the mean, median, and mode of the distribution. For normally distributed data, the peak of the histogram aligns with the mean value, while skewed distributions may have the peak shifted towards one side.

Outliers: Histograms can help identify outliers in the data. Outliers are data points that deviate significantly from the majority of the data. They appear as isolated bars or bins that are far away from the main distribution. Identifying outliers is important as they can impact statistical analysis and decision-making.

Spread and Variability: The width and height of the histogram bins can provide insights into the spread and variability of the data. A wider distribution indicates higher variability, while a narrower distribution suggests lower variability.

Data Range: The range of the data can be observed from the histogram. It shows the minimum and maximum values covered by the variable, providing insights into the data's extent.

These insights help in understanding the characteristics of the numeric variables and can guide further analysis, decision-making, and modeling processes. It is important to interpret the histograms in the context of the specific dataset and domain knowledge.



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Detecting Anomalies: Histograms can help identify outliers in the data, which may indicate unusual or exceptional customer behavior. These outliers could represent potential fraud, system errors, or other abnormal activities. Detecting and addressing these anomalies promptly can minimize financial losses, maintain data integrity, and ensure a positive customer experience.

#### Chart - 5

In [None]:
# Chart - 5 visualization code

plt.scatter(df['Total day minutes'], df['Total day charge'])
plt.xlabel('Total day minutes')
plt.ylabel('Total day charge')
plt.title('Scatter Plot: Total day minutes vs Total day charge')
plt.show()


##### 1. Why did you pick the specific chart?

A scatter plot is selected to visualize the relationship between two numerical variables, here total day minutes and total day charge.

##### 2. What is/are the insight(s) found from the chart?

The scatter plot shows the overall relationship between the two variables: whether it is positive or negative, how close it is to linear, and whether any points deviate from the general pattern.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, understanding the relationship between variables can assist businesses in making strategic decisions. For example, if the scatter plot shows that charges rise steadily with usage, the company knows that its heaviest users face the highest bills and can review pricing for that segment.
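
The strength and shape of the relationship in a scatter plot can be quantified with a correlation coefficient and a straight-line fit. The sketch below uses toy data in which the charge is generated at a fixed, hypothetical rate of 0.2 per minute. It does not claim anything about the real tariff.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only: charge generated at an assumed fixed rate
toy = pd.DataFrame({'Total day minutes': [100.0, 150.0, 200.0, 250.0]})
toy['Total day charge'] = toy['Total day minutes'] * 0.2

# Pearson correlation: close to 1 means a strong positive linear relationship
pearson_r = toy['Total day minutes'].corr(toy['Total day charge'])

# Degree-1 polynomial fit returns the slope (charge per minute) and intercept
slope, intercept = np.polyfit(toy['Total day minutes'], toy['Total day charge'], deg=1)
print(pearson_r, slope, intercept)
```

If the correlation on the real data turns out to be essentially 1, the two columns carry the same information, and one of them could be dropped from later analysis.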

#### Chart - 6

In [None]:
# Chart - 6 visualization code

# Calculate the number of churned and non-churned customers
churn_counts = df['Churn'].value_counts()

# Create labels from the index so each label matches its count (assumes Churn is boolean)
labels = ['Churned' if flag else 'Non-Churned' for flag in churn_counts.index]

# Create a pie chart
plt.pie(churn_counts, labels=labels, autopct='%1.1f%%', startangle=90)

# Add a title to the chart
plt.title('Churned vs Non-Churned Customers')

# Display the chart
plt.show()

##### 1. Why did you pick the specific chart?

I picked the pie chart to represent the churned vs non-churned customers because it effectively displays the proportion or distribution of two categories as parts of a whole. The pie chart allows for easy visualization of the relative sizes of each category and provides a clear comparison between them.

##### 2. What is/are the insight(s) found from the chart?

It shows what percentage of customers churned and what percentage did not.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Understanding Churn Rate: Knowing the overall churn rate is crucial for a telecom company. It helps in assessing the health of the customer base and identifying potential areas for improvement. If the churn rate is high, it indicates a need to focus on customer retention strategies and improving customer satisfaction to reduce churn.
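
The overall churn rate shown in the pie chart can also be computed directly. A minimal sketch on toy data, assuming `Churn` is a boolean column as in the cells above:

```python
import pandas as pd

# Toy data for illustration only: one churned customer out of four
toy = pd.DataFrame({'Churn': [False, False, False, True]})

# Share of each class as a percentage
churn_share = toy['Churn'].value_counts(normalize=True) * 100

# The mean of a boolean column is the fraction of True values
overall_churn_pct = toy['Churn'].mean() * 100
print(churn_share)
print(overall_churn_pct)
```

In the notebook, `df['Churn'].mean() * 100` would give the overall churn percentage for the real dataset.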

#### Chart - 7

In [None]:
# Chart - 7 visualization code
import seaborn as sns

# Plot churned vs non-churned customer counts by International plan
plt.figure(figsize=(8, 6))
sns.countplot(x='International plan', hue='Churn', data=df)
plt.title('Churn Counts by International plan')
plt.xlabel('International plan')
plt.ylabel('Count')
plt.show()

##### 1. Why did you pick the specific chart?

A grouped count plot places churned and non-churned customers side by side for each category, which helps the telecom company see how churn varies with a categorical variable and informs strategic decision-making.

##### 2. What is/are the insight(s) found from the chart?

International plan: By comparing the churn rates for customers with and without an international plan, you can observe if there are any significant differences.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

These visualizations can help the telecom company identify any variations in churn rates based on categorical variables and inform strategic decision-making. If there are notable differences in churn rates across different plans, the company can focus on improving those aspects of the service to retain more customers and enhance customer satisfaction.
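
A count plot shows raw counts, so groups of different sizes are hard to compare by eye. One way to obtain actual churn rates per plan is a row-normalized cross-tabulation, sketched below on toy data. The labels `Churned` and `Retained` are chosen for this example, and the same pattern would apply to `Voice mail plan`.

```python
import pandas as pd

# Toy data for illustration only
toy = pd.DataFrame({
    'International plan': ['No', 'No', 'No', 'No', 'Yes', 'Yes'],
    'Churn': [False, False, False, True, True, False],
})

# Readable column labels instead of True/False
churn_label = toy['Churn'].map({True: 'Churned', False: 'Retained'})

# normalize='index' makes each row sum to 1, giving rates within each plan group
plan_rates = pd.crosstab(toy['International plan'], churn_label, normalize='index')
print(plan_rates)
```

Rates computed this way can confirm or correct statements about which plan group churns more before they are used in recommendations.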

#### Chart - 8

In [None]:
# Chart - 8 visualization code

# Plot churned vs non-churned customer counts by Voice mail plan
plt.figure(figsize=(8, 6))
sns.countplot(x='Voice mail plan', hue='Churn', data=df)
plt.title('Churn Counts by Voice mail plan')
plt.xlabel('Voice mail plan')
plt.ylabel('Count')
plt.show()

##### 1. Why did you pick the specific chart?

A grouped count plot was used again so that customers with and without a voice mail plan can be compared by churn status. A higher churn rate among customers with a voice mail plan may suggest issues with voice mail functionality, usage patterns, or customer preferences.

##### 2. What is/are the insight(s) found from the chart?

Voice mail plan: Similarly, comparing the churn rates for customers with and without a voice mail plan can provide insights into the impact of voice mail services on customer retention.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

These visualizations can help the telecom company identify any variations in churn rates based on categorical variables and inform strategic decision-making. If there are notable differences in churn rates across different plans, the company can focus on improving those aspects of the service to retain more customers and enhance customer satisfaction.

#### Chart - 9

In [None]:
# Chart - 9 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 10

In [None]:
# Chart - 10 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 11

In [None]:
# Chart - 11 visualization code

# Group the data by State and calculate the total number of customer service calls and churn count
state_data = df.groupby('State').agg({'Customer service calls': 'sum', 'Churn': 'sum'}).reset_index()

# Sort the data by the total number of customer service calls in descending order
state_data = state_data.sort_values('Customer service calls', ascending=False)

# Set the figure size
plt.figure(figsize=(12, 6))

# Create a bar chart for the total number of customer service calls
plt.bar(state_data['State'], state_data['Customer service calls'], label='Customer Service Calls')

# Overlay the churn count on the same axes (drawn in front of the calls bars, not stacked on top)
plt.bar(state_data['State'], state_data['Churn'], label='Churn', color='red')

# Set the x-axis label
plt.xlabel('State')

# Set the y-axis label
plt.ylabel('Count')

# Set the title
plt.title('Customer Service Calls and Churn by State')

# Add a legend
plt.legend()

# Rotate the x-axis labels for better readability
plt.xticks(rotation=90)

# Display the plot
plt.show()

##### 1. Why did you pick the specific chart?



This visualization will show the total number of customer service calls and churn count for each state. The bars represent the customer service calls, and the red portion of the bars represents the churn count. By comparing the lengths of the bars and the red portions, you can identify the states with higher customer service calls and higher churn rates.

##### 2. What is/are the insight(s) found from the chart?

The chart suggests a correlation between higher customer service calls and higher churn, emphasizing the importance of addressing customer concerns promptly to reduce churn rates.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. The correlation between higher customer service calls and higher churn suggests that customers who have more concerns or issues are more likely to churn. This highlights the importance of providing effective and efficient customer service to address customer needs and resolve any issues they may have. By addressing customer concerns promptly and effectively, businesses can improve customer satisfaction, increase customer loyalty, and ultimately reduce churn rates, leading to a positive impact on the business.
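
State totals grow with the number of customers in a state, so two summed columns can move together for that reason alone. A hedged way to check the relationship is to compare per-customer figures instead: average calls per customer against churn rate, by state. The sketch below uses toy data, and the column names `calls_per_customer` and `churn_rate` are chosen for this example.

```python
import pandas as pd

# Toy data for illustration only
toy = pd.DataFrame({
    'State': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Customer service calls': [1, 1, 2, 4, 5, 5],
    'Churn': [False, False, False, True, True, True],
})

# Per-state averages remove the effect of state size
per_state = toy.groupby('State').agg(
    calls_per_customer=('Customer service calls', 'mean'),
    churn_rate=('Churn', 'mean'),
)

# Correlation between the two per-state measures
state_corr = per_state['calls_per_customer'].corr(per_state['churn_rate'])
print(per_state)
print(state_corr)
```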

#### Chart - 12 - Pair Plot

In [None]:
# Pair Plot visualization code

# Select the relevant columns for the scatter plot matrix
columns = ['Total day minutes', 'Total eve minutes', 'Total night minutes', 'Total intl minutes']

# Create a scatter plot matrix
sns.pairplot(df[columns])

# Display the plot
plt.show()


##### 1. Why did you pick the specific chart?


I picked the scatter plot matrix for multivariate analysis because it allows us to visualize the relationships between multiple variables simultaneously. In the scatter plot matrix, each variable is plotted against every other variable, resulting in a grid of scatter plots. This helps us to identify patterns, correlations, and trends between the variables.

##### 2. What is/are the insight(s) found from the chart?

By using a scatter plot matrix, we can gain insights into the relationships between different numerical variables in the dataset. It helps us understand how variables interact with each other and if there are any apparent associations or dependencies among them. This visualization is particularly useful in identifying potential patterns or clusters in the data and can assist in identifying variables that may have a significant impact on the target variable, such as customer churn in this case.
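
The pairwise relationships in a scatter plot matrix can be summarized numerically with a correlation matrix over the same columns. The sketch below uses made-up values for the four columns plotted above.

```python
import pandas as pd

columns = ['Total day minutes', 'Total eve minutes', 'Total night minutes', 'Total intl minutes']

# Toy data for illustration only
toy = pd.DataFrame({
    'Total day minutes': [180.0, 120.0, 240.0, 200.0],
    'Total eve minutes': [200.0, 210.0, 190.0, 220.0],
    'Total night minutes': [240.0, 250.0, 160.0, 200.0],
    'Total intl minutes': [10.0, 13.0, 12.0, 7.0],
})

# Pearson correlation for every pair of columns; the diagonal is always 1
corr_matrix = toy[columns].corr()
print(corr_matrix.round(2))
```

Values close to zero off the diagonal would indicate that the usage variables are largely independent of one another, which matches a pair plot made of shapeless point clouds.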

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

Based on the analysis and insights gained from the data, I would suggest the following recommendations to the client to achieve their business objectives:

Improve Customer Service: The analysis showed a correlation between higher customer service calls and higher churn. It is important to focus on enhancing the customer service experience, addressing customer concerns promptly, and ensuring customer satisfaction to reduce churn rates.

Offer Competitive Pricing: The analysis revealed that higher charges can be a contributing factor to customer churn. It is advisable to review pricing strategies and consider offering competitive pricing plans or discounts to retain customers and attract new ones.

Enhance International Plan Features: The analysis indicated that customers with international plans have a higher churn rate. The client can evaluate the features and offerings of their international plans and consider enhancing them to provide more value to customers.

Evaluate Voice Mail Plan Usage: The analysis showed that customers with a voice mail plan have a slightly higher churn rate. It would be beneficial to assess the usage patterns and benefits of the voice mail plan and make adjustments or improvements as necessary.

Monitor and Address Regional Differences: The analysis highlighted regional variations in churn rates. It is important to closely monitor customer behavior and preferences in different geographical regions and tailor marketing strategies and retention efforts accordingly.

By implementing these recommendations, the client can aim to improve customer retention, reduce churn rates, and ultimately achieve their business objective of increasing customer loyalty and profitability.

# **Conclusion**

In conclusion, the analysis of the telecom company's customer churn data provided valuable insights and recommendations to address the business objective of reducing churn and improving customer retention. Some key conclusions from the analysis are:

The overall churn rate in the company is X%, indicating a significant number of customers are leaving the company.

Certain factors such as international plan usage, customer service calls, and total charges have shown correlations with churn, suggesting their influence on customer retention.

Customers with international plans and those making higher customer service calls have higher churn rates, indicating the need to focus on improving service quality and value for these customer segments.

Pricing plays a role in customer churn, as higher charges are associated with increased likelihood of churn. Evaluating pricing strategies and offering competitive plans could help retain customers.

Geographical differences in churn rates suggest the need for targeted marketing and retention efforts tailored to specific regions.

Overall, the gained insights provide actionable steps to reduce churn and improve business performance. By addressing customer concerns, enhancing service quality, offering competitive pricing, and catering to regional preferences, the telecom company can work towards increasing customer loyalty and achieving positive business impact. Regular monitoring of customer behavior and continuous improvement efforts will be essential to ensure long-term success in reducing churn and improving customer satisfaction.



### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***