<a href="https://colab.research.google.com/github/Patrick-1324/Almabetter/blob/main/EDA_Prateek_Joshi.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Telecom Churn Analysis



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Team Member 1 -** Prateek Joshi


# **Project Summary -**

Telecom churn analysis is a critical process for understanding customer attrition and its impact on companies in the telecom industry. Churn, the rate at which customers discontinue services or switch to competitors, poses significant challenges to telecom companies like Orange S.A., a multinational telecommunications corporation.

To gain insights into customer behavior and devise effective retention strategies, an exploratory data analysis was conducted on Orange Telecom's churn dataset. This dataset combines customer activity data with a churn label indicating whether the subscription was cancelled. It covers 51 regions of the United States (the 50 states plus the District of Columbia), which makes it well suited to uncovering meaningful patterns.

By studying customer demographics, service usage patterns, and other relevant indicators, we aimed to identify factors contributing to customer churn. Revenue loss, decreased profitability, market share erosion, and compromised brand image are some consequences of high churn rates. These effects create competitive disadvantages, forcing companies to invest significantly in customer acquisition and retention efforts.

Our analysis of the Orange Telecom churn dataset sought to extract valuable insights that inform targeted retention strategies. By identifying correlations and patterns, we developed tailored approaches such as personalized offers, loyalty programs, and enhanced customer support. These strategies aim to boost satisfaction, minimize churn, and bolster the company's bottom line.

Through the EDA process, we aim to empower Orange S.A. and other telecom companies with knowledge to mitigate churn's detrimental effects. Understanding the factors driving customer attrition allows companies to enhance service quality, improve customer experience, and foster long-term loyalty.

In summary, telecom churn analysis plays a crucial role in the industry. By analyzing customer behavior and employing data-driven strategies, telecom companies can combat churn, mitigate its adverse effects, and ensure sustainable growth in a competitive market.

# **GitHub Link -**

https://github.com/Patrick-1324/Almabetter/blob/main/EDA_Prateek_Joshi.ipynb

# **Problem Statement**


As a data analyst, my aim is to perform exploratory data analysis (EDA) on telecom customer data to identify the factors associated with customers leaving the service, and to lay the groundwork for predicting which customers are likely to churn. The dataset contains customer information, including features such as call minutes, call charges, plan subscriptions, customer service calls, and churn status.

#### **Define Your Business Objective?**

The objective of this project is to run an exploratory data analysis (EDA) of telecom customer data and uncover insights that can improve customer retention strategies. The aim is to understand the factors that affect churn and to identify important patterns and relationships in the data.

Perform univariate analysis to understand the distribution and characteristics of variables such as customer demographics, call duration, service usage patterns, and other relevant factors. Visualize these variables using appropriate charts, histograms, or box plots. Conduct bivariate analysis to explore relationships between variables. For example, analyze the relationship between churn and factors such as call duration, customer tenure, or service plans. Use visualizations like scatter plots or correlation matrices to identify any patterns or correlations.
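The univariate and bivariate steps described above can be sketched with plain pandas before any chart is drawn. The snippet below is only an illustration: `toy_df` is a made-up six-row frame whose column names mirror the churn dataset, and the values are invented for the example.

```python
import pandas as pd

# Hypothetical toy data; column names mirror the churn dataset, values are made up.
toy_df = pd.DataFrame({
    "International plan": ["No", "No", "Yes", "Yes", "No", "Yes"],
    "Total day minutes": [120.0, 180.5, 250.0, 300.2, 95.3, 210.0],
    "Customer service calls": [1, 0, 4, 5, 2, 1],
    "Churn": [False, False, True, True, False, False],
})

# Univariate: summary statistics for one numeric variable.
day_minutes_summary = toy_df["Total day minutes"].describe()

# Univariate: frequency of each category.
plan_counts = toy_df["International plan"].value_counts()

# Bivariate (categorical vs. churn): the mean of a boolean column is the churn rate.
churn_rate_by_plan = toy_df.groupby("International plan")["Churn"].mean()

# Bivariate (numeric vs. churn): compare averages of churned and retained customers.
mean_minutes_by_churn = toy_df.groupby("Churn")["Total day minutes"].mean()

print(day_minutes_summary)
print(plan_counts)
print(churn_rate_by_plan)
print(mean_minutes_by_churn)
```

Each of these results maps directly onto a chart type: the summary onto a histogram or box plot, the counts and churn rates onto bar charts.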

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure the following questions are answered for each chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [1]:
# Import Libraries
import numpy as np
import pandas as pd

### Dataset Loading

In [2]:
# Load Dataset
url = 'https://raw.githubusercontent.com/Patrick-1324/Almabetter/124210d0194dae68b6a271a556806dc4bb4ce0dc/Telecom%20Churn.csv'
telecom_df = pd.read_csv(url)
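
Since the guidelines ask for exception handling, the loading step could be wrapped as in the sketch below. `load_csv_safely` is a hypothetical helper name, not part of the original notebook, and the example reads from an in-memory string rather than the real URL so that it runs offline.

```python
import io
import pandas as pd


def load_csv_safely(source):
    """Return a DataFrame read from `source`, or None if loading fails.

    Hypothetical helper for illustration; `source` may be a path, URL, or file-like object.
    """
    try:
        return pd.read_csv(source)
    except (OSError, pd.errors.ParserError, pd.errors.EmptyDataError) as exc:
        # OSError covers missing files and, for URLs, urllib's URLError (an OSError subclass).
        print(f"Could not load dataset: {exc}")
        return None


# In-memory stand-in for the CSV file (made-up rows).
sample_csv = io.StringIO("State,Churn\nKS,False\nOH,True\n")
sample_df = load_csv_safely(sample_csv)

# A path that does not exist returns None instead of stopping the notebook.
missing_df = load_csv_safely("this_file_does_not_exist.csv")
```

Returning `None` keeps the notebook running; raising a clearer error instead would be an equally reasonable choice.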

### Dataset First View

In [3]:
# Dataset First Look
print("Top 5 rows of the DataFrame: ")
telecom_df.head()

Top 5 rows of the DataFrame: 


Unnamed: 0,State,Account length,Area code,International plan,Voice mail plan,Number vmail messages,Total day minutes,Total day calls,Total day charge,Total eve minutes,Total eve calls,Total eve charge,Total night minutes,Total night calls,Total night charge,Total intl minutes,Total intl calls,Total intl charge,Customer service calls,Churn
0,KS,128,415,No,Yes,25,265.1,110,45.07,197.4,99,16.78,244.7,91,11.01,10.0,3,2.7,1,False
1,OH,107,415,No,Yes,26,161.6,123,27.47,195.5,103,16.62,254.4,103,11.45,13.7,3,3.7,1,False
2,NJ,137,415,No,No,0,243.4,114,41.38,121.2,110,10.3,162.6,104,7.32,12.2,5,3.29,0,False
3,OH,84,408,Yes,No,0,299.4,71,50.9,61.9,88,5.26,196.9,89,8.86,6.6,7,1.78,2,False
4,OK,75,415,Yes,No,0,166.7,113,28.34,148.3,122,12.61,186.9,121,8.41,10.1,3,2.73,3,False


In [4]:
print("last 5 rows of the DataFrame: ")
telecom_df.tail()

last 5 rows of the DataFrame: 


Unnamed: 0,State,Account length,Area code,International plan,Voice mail plan,Number vmail messages,Total day minutes,Total day calls,Total day charge,Total eve minutes,Total eve calls,Total eve charge,Total night minutes,Total night calls,Total night charge,Total intl minutes,Total intl calls,Total intl charge,Customer service calls,Churn
3328,AZ,192,415,No,Yes,36,156.2,77,26.55,215.5,126,18.32,279.1,83,12.56,9.9,6,2.67,2,False
3329,WV,68,415,No,No,0,231.1,57,39.29,153.4,55,13.04,191.3,123,8.61,9.6,4,2.59,3,False
3330,RI,28,510,No,No,0,180.8,109,30.74,288.8,58,24.55,191.9,91,8.64,14.1,6,3.81,2,False
3331,CT,184,510,Yes,No,0,213.8,105,36.35,159.6,84,13.57,139.2,137,6.26,5.0,10,1.35,2,False
3332,TN,74,415,No,Yes,25,234.4,113,39.85,265.9,82,22.6,241.4,77,10.86,13.7,4,3.7,0,False


### Dataset Rows & Columns count

In [5]:
# Dataset Rows & Columns count
print("Number of rows:", telecom_df.shape[0])
print("Number of columns:", telecom_df.shape[1])

Number of rows: 3333
Number of columns: 20


### Dataset Information

In [6]:
# Dataset Info
print("Column DataTypes and Non-null Counts:")
print(telecom_df.info())  # View column data types and non-null values

Column DataTypes and Non-null Counts:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3333 entries, 0 to 3332
Data columns (total 20 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   State                   3333 non-null   object 
 1   Account length          3333 non-null   int64  
 2   Area code               3333 non-null   int64  
 3   International plan      3333 non-null   object 
 4   Voice mail plan         3333 non-null   object 
 5   Number vmail messages   3333 non-null   int64  
 6   Total day minutes       3333 non-null   float64
 7   Total day calls         3333 non-null   int64  
 8   Total day charge        3333 non-null   float64
 9   Total eve minutes       3333 non-null   float64
 10  Total eve calls         3333 non-null   int64  
 11  Total eve charge        3333 non-null   float64
 12  Total night minutes     3333 non-null   float64
 13  Total night calls       3333 non-null   int64  
 14  To

#### Duplicate Values

In [7]:
# Dataset Duplicate Value Count
duplicate_count = telecom_df.duplicated().sum()
print("Number of duplicate values:", duplicate_count)

Number of duplicate values: 0


#### Missing Values/Null Values

In [8]:
# Missing Values/Null Values Count
print("Columns and their missing values:")
print(telecom_df.isnull().sum())
# Count how many columns contain at least one missing value
miss_count = (telecom_df.isnull().sum() > 0).sum()
print("\nColumns with missing values:", miss_count)

Columns and their missing values:
State                     0
Account length            0
Area code                 0
International plan        0
Voice mail plan           0
Number vmail messages     0
Total day minutes         0
Total day calls           0
Total day charge          0
Total eve minutes         0
Total eve calls           0
Total eve charge          0
Total night minutes       0
Total night calls         0
Total night charge        0
Total intl minutes        0
Total intl calls          0
Total intl charge         0
Customer service calls    0
Churn                     0
dtype: int64

Columns with missing values: 0


In [None]:
# Visualizing the missing values
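
The real dataset has no missing values, so there is nothing to plot here. For a dataset that does have gaps, a per-column summary like the sketch below is the table a missing-value chart would be built from. `toy_df` is a hypothetical frame with deliberately injected gaps.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with injected gaps; the real churn dataset has none.
toy_df = pd.DataFrame({
    "Total day minutes": [265.1, np.nan, 243.4, 299.4],
    "Total day calls": [110, 123, 114, 71],
    "Voice mail plan": ["Yes", None, None, "No"],
})

# Missing cells per column, as a count and as a percentage of rows.
missing_count = toy_df.isnull().sum()
missing_pct = toy_df.isnull().mean() * 100

missing_summary = pd.DataFrame({"missing": missing_count, "percent": missing_pct})
print(missing_summary)

# With matplotlib installed, missing_summary["percent"].plot.bar() would chart these numbers.
```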

### What did you know about your dataset?

1. The dataset provided by Orange S.A. contains 20 columns and 3333 rows. The dataset consists of four major data types: bool (1), float64 (8), int64 (8), and object (3).

2. During the initial inspection, it was determined that there are no null values present in the dataset, indicating that it is complete in terms of missing data.

3. The numerical data in the dataset is represented by float64 (8) and int64 (8) data types. These numerical variables likely contain information such as call durations, charges, and other numerical measurements.

4. Additionally, there are three object columns that store categorical information: state code, international plan status, and voice mail plan status. These variables provide context about each customer's location and subscribed plans.

5. By having this dataset, we can analyze various aspects of the calls made throughout a 24-hour day and explore relationships between different variables. This information can be valuable for understanding customer behavior, service usage patterns, and potentially identifying areas for improvement or optimization in the service provided by Orange S.A.

6. In industry terms, this initial dataset overview provides a concise summary of the dataset's structure, including the data types, dimensions, presence of null values, and the types of information contained in the numerical and categorical variables.
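The data-type breakdown summarised above can be reproduced programmatically rather than read off `info()`. A minimal sketch on a hypothetical four-column frame (`toy_df`, built from two of the rows shown earlier):

```python
import pandas as pd

# Hypothetical mini-frame with one column of each kind found in the dataset.
toy_df = pd.DataFrame({
    "State": ["KS", "OH"],
    "Account length": [128, 107],
    "Total day minutes": [265.1, 161.6],
    "Churn": [False, False],
})

# Number of columns per dtype.
dtype_counts = toy_df.dtypes.value_counts()

# Split column names into numeric and non-numeric (bool is not treated as a number here).
numeric_cols = toy_df.select_dtypes(include="number").columns.tolist()
non_numeric_cols = toy_df.select_dtypes(exclude="number").columns.tolist()

print(dtype_counts)
print("Numeric:", numeric_cols)
print("Non-numeric:", non_numeric_cols)
```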

## ***2. Understanding Your Variables***

In [9]:
# Dataset Columns
print("Column list:")
print(telecom_df.columns)

Column list:
Index(['State', 'Account length', 'Area code', 'International plan',
       'Voice mail plan', 'Number vmail messages', 'Total day minutes',
       'Total day calls', 'Total day charge', 'Total eve minutes',
       'Total eve calls', 'Total eve charge', 'Total night minutes',
       'Total night calls', 'Total night charge', 'Total intl minutes',
       'Total intl calls', 'Total intl charge', 'Customer service calls',
       'Churn'],
      dtype='object')


In [10]:
# Dataset Describe
telecom_df.describe()

Unnamed: 0,Account length,Area code,Number vmail messages,Total day minutes,Total day calls,Total day charge,Total eve minutes,Total eve calls,Total eve charge,Total night minutes,Total night calls,Total night charge,Total intl minutes,Total intl calls,Total intl charge,Customer service calls
count,3333.0,3333.0,3333.0,3333.0,3333.0,3333.0,3333.0,3333.0,3333.0,3333.0,3333.0,3333.0,3333.0,3333.0,3333.0,3333.0
mean,101.064806,437.182418,8.09901,179.775098,100.435644,30.562307,200.980348,100.114311,17.08354,200.872037,100.107711,9.039325,10.237294,4.479448,2.764581,1.562856
std,39.822106,42.37129,13.688365,54.467389,20.069084,9.259435,50.713844,19.922625,4.310668,50.573847,19.568609,2.275873,2.79184,2.461214,0.753773,1.315491
min,1.0,408.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,23.2,33.0,1.04,0.0,0.0,0.0,0.0
25%,74.0,408.0,0.0,143.7,87.0,24.43,166.6,87.0,14.16,167.0,87.0,7.52,8.5,3.0,2.3,1.0
50%,101.0,415.0,0.0,179.4,101.0,30.5,201.4,100.0,17.12,201.2,100.0,9.05,10.3,4.0,2.78,1.0
75%,127.0,510.0,20.0,216.4,114.0,36.79,235.3,114.0,20.0,235.3,113.0,10.59,12.1,6.0,3.27,2.0
max,243.0,510.0,51.0,350.8,165.0,59.64,363.7,170.0,30.91,395.0,175.0,17.77,20.0,20.0,5.4,9.0


In [11]:
# Count the number of unique values in each column
print(telecom_df.nunique())

State                       51
Account length             212
Area code                    3
International plan           2
Voice mail plan              2
Number vmail messages       46
Total day minutes         1667
Total day calls            119
Total day charge          1667
Total eve minutes         1611
Total eve calls            123
Total eve charge          1440
Total night minutes       1591
Total night calls          120
Total night charge         933
Total intl minutes         162
Total intl calls            21
Total intl charge          162
Customer service calls      10
Churn                        2
dtype: int64


In [12]:
# State count DataFrame (compute value_counts once and reuse it)
state_counts = telecom_df['State'].value_counts()
state_count = pd.DataFrame({'State': state_counts.index, 'Count': state_counts.values})
print(state_count.head())

  State  Count
0    WV    106
1    MN     84
2    NY     83
3    AL     80
4    WI     78


### Variables Description

### **Name: State**
Data Type: object

Description: Two-letter code of the customer's state; the dataset covers 51 codes (the 50 U.S. states plus the District of Columbia)

Example: 'OH' stands for Ohio

Potential Insights: This data can be used to understand how customers are distributed across the USA

### **Name: Account Length**
Data Type: int64

Description: How long the customer has held the account (the dataset does not state the unit)

Example: 243

Potential Insights: Determine whether customer tenure influences churn rate, i.e. whether longer or shorter account lengths are associated with higher churn

### **Name: Area Code**
Data Type: int64

Description: The telephone area code associated with the customer's account (three distinct values: 408, 415, 510)

Example: 415 relates to San Francisco

Potential Insights: Identifying regions with higher or lower churn rates.

### **Name: International Plan**
Data Type: object

Description: Indicates whether the customer has subscribed to an international calling plan

Example: Yes

Potential Insights: Whether customers with international plans have different churn behaviour; whether international plan subscription affects churn likelihood.

### **Name: Voice mail plan**
Data Type: object

Description: Indicates whether the customer has an active voicemail plan

Example: No

Potential Insights: Whether customers with voice mail plans exhibit different churn patterns; whether voice mail plan subscription influences churn rates.

### **Name: Number Vmail messages**
Data Type: int64

Description: Number of voicemail messages, ranging from 0 to 51

Example: 25

Potential Insights: Whether the number of voicemail messages affects churn rates; whether higher or lower usage of voicemail impacts customer retention.

### **Name: Total day minutes**
Data Type: float64

Description: Total number of minutes the customer spent on calls during the day

Example: 265.1

Potential Insights: Whether customers with high daytime usage are more likely to churn

### **Name: Total day calls**
Data Type: int64

Description: Total number of calls the customer made during the day

Example: 110

Potential Insights: Whether customers with high daytime usage are more likely to churn

### **Name: Total day charge**
Data Type: float64

Description: Total charge to the customer for daytime calls

Example: 45.07

Potential Insights: Whether customers with high daytime usage are more likely to churn

### **Name: Total eve minutes**
Data Type: float64

Description: Total number of minutes the customer spent on calls in the evening

Example: 197.4

Potential Insights: Whether customers with high evening usage are more likely to churn

### **Name: Total eve calls**
Data Type: int64

Description: Total number of calls the customer made in the evening

Example: 99

Potential Insights: Whether customers with high evening usage are more likely to churn

### **Name: Total eve charge**
Data Type: float64

Description: Total charge to the customer for evening calls

Example: 16.78

Potential Insights: Whether customers with high evening usage are more likely to churn

### **Name: Total night minutes**
Data Type: float64

Description: Total number of minutes the customer spent on calls at night

Example: 244.7

Potential Insights: Whether higher or lower nighttime usage influences customer retention

### **Name: Total night calls**
Data Type: int64

Description: Total number of calls the customer made at night

Example: 91

Potential Insights: Whether higher or lower nighttime usage influences customer retention

### **Name: Total night charge**
Data Type: float64

Description: Total charge to the customer for nighttime calls

Example: 11.01

Potential Insights: Whether higher or lower nighttime usage influences customer retention

### **Name: Total intl minutes**
Data Type: float64

Description: Total number of minutes the customer spent on international calls

Example: 10.0

Potential Insights: Whether higher or lower international usage influences customer retention

### **Name: Total intl calls**
Data Type: int64

Description: Total number of international calls made by the customer

Example: 3

Potential Insights: Whether international call usage affects churn rates

### **Name: Total intl charge**
Data Type: float64

Description: Total charge to the customer for international calls

Example: 2.7

Potential Insights: Whether international call charges affect churn rates

### **Name: Customer service calls**
Data Type: int64

Description: Total number of customer-support calls made by the customer (0 to 9 in this dataset)

Example: 1

Potential Insights: Whether customers who contact customer service more frequently are more likely to churn

### **Name: Churn**
Data Type: bool

Description: Indicates whether the customer has churned (True) or not (False)

Example: False

Potential Insights: Analyze the overall churn rate and investigate the factors influencing churn
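
One relationship between the minutes and charge variables described above is worth checking early. In the five rows shown in the first-view output, each charge divided by its minutes gives the same per-minute rate. The sketch below reproduces that check on those five rows only; whether it holds for all 3333 rows is an assumption to verify on the full data.

```python
import pandas as pd

# First five rows as displayed in the "Dataset First View" output.
head_df = pd.DataFrame({
    "Total day minutes": [265.1, 161.6, 243.4, 299.4, 166.7],
    "Total day charge": [45.07, 27.47, 41.38, 50.90, 28.34],
    "Total intl minutes": [10.0, 13.7, 12.2, 6.6, 10.1],
    "Total intl charge": [2.70, 3.70, 3.29, 1.78, 2.73],
})

# Per-minute rate implied by each row, rounded to three decimals.
day_rate = (head_df["Total day charge"] / head_df["Total day minutes"]).round(3)
intl_rate = (head_df["Total intl charge"] / head_df["Total intl minutes"]).round(3)

print(day_rate.tolist())
print(intl_rate.tolist())
```

If the rate is constant across the full dataset, each charge column carries the same information as its minutes column, which matters when reading a correlation heatmap.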

### Check Unique Values for each variable.

In [14]:
# Check Unique Values for each variable.
categorical_columns = ['State', 'Area code', 'International plan', 'Voice mail plan', 'Churn']

for column in categorical_columns:
    unique_values = telecom_df[column].unique()
    print(f"Unique values for column '{column}':")
    print(unique_values)
    print()

Unique values for column 'State':
['KS' 'OH' 'NJ' 'OK' 'AL' 'MA' 'MO' 'LA' 'WV' 'IN' 'RI' 'IA' 'MT' 'NY'
 'ID' 'VT' 'VA' 'TX' 'FL' 'CO' 'AZ' 'SC' 'NE' 'WY' 'HI' 'IL' 'NH' 'GA'
 'AK' 'MD' 'AR' 'WI' 'OR' 'MI' 'DE' 'UT' 'CA' 'MN' 'SD' 'NC' 'WA' 'NM'
 'NV' 'DC' 'KY' 'ME' 'MS' 'TN' 'PA' 'CT' 'ND']

Unique values for column 'Area code':
[415 408 510]

Unique values for column 'International plan':
['No' 'Yes']

Unique values for column 'Voice mail plan':
['Yes' 'No']

Unique values for column 'Churn':
[False  True]



## 3. ***Data Wrangling***

### Data Wrangling Code

In [15]:
# Write your code to make your dataset analysis ready.
# Work on a copy (df_copy) so the original telecom_df is not affected
df_copy = telecom_df.copy()


# 1. State: This data can be used to know the customer distribution throughout the USA
# A. Count the number of customers in each state
state_counts = telecom_df['State'].value_counts()
state_counts_df = pd.DataFrame({'State': state_counts.index, 'Count': state_counts.values})

# B. Calculate the churn rate for each state
state_wise_churn_rate = telecom_df.groupby('State')['Churn'].mean().reset_index()
# Sort the data by churn rate in descending order for better visualization
state_wise_churn_rate = state_wise_churn_rate.sort_values(by='Churn', ascending=False)
statesBychurn_fst = state_wise_churn_rate.head()
statesBychurn_lst = state_wise_churn_rate.tail()
# Filter the data for the state "CA" (California)
ca_df = telecom_df[telecom_df['State'] == 'CA']
churn_counts_ca = ca_df['Churn'].value_counts()
# Filter the data for the state "HI" (Hawaii)
hi_df = telecom_df[telecom_df['State'] == 'HI']
churn_counts_hi = hi_df['Churn'].value_counts()



# 2. Account Length: Determine if customer tenure influences churn rate, if longer or shorter account lengths are associated with higher churn
churnBylength = telecom_df.groupby('Account length')['Churn'].mean()
# Convert churnBylength to a DataFrame for easier plotting
churn_df = pd.DataFrame({'Account length': churnBylength.index, 'Churn Rate': churnBylength.values})



# 3. Area Code: Identifying regions with higher or lower churn rates.
# Calculate the average churn rate for each unique area code
churnByAreaCode = telecom_df.groupby('Area code')['Churn'].mean()
# Convert churnByAreaCode to a DataFrame
churnByarea_df = pd.DataFrame({'Area code': churnByAreaCode.index, 'Churn Rate': churnByAreaCode.values})
# Sort the DataFrame based on churn rate
churnByarea_df.sort_values(by='Churn Rate', ascending=True, inplace=True)


# 4. International Plan: If customers with international plans have different churn behaviour; if international plan subscription affects churn likelihood.
churnbyiplan_df = telecom_df.groupby('International plan')['Churn'].mean().reset_index()
# churnbyiplan_df


# 5. Voice mail plan: If customers with voice mail plans exhibit different churn patterns; if voice mail plan subscription influences churn rates.
churnByvmp = telecom_df.groupby('Voice mail plan')['Churn'].mean()
# churnByvmp


# 6. Number Vmail messages : If the number of voicemail messages affects churn rates; if higher or lower usage of voicemail impacts customer retention.
bin_edges = [0, 10, 20, 30, float('inf')]
bin_labels = ['Low', 'Medium', 'High', 'Very High']
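# include_lowest=True keeps customers with 0 messages in 'Low' (pd.cut bins are right-closed by default)
df_copy['Voicemail Usage Group'] = pd.cut(df_copy['Number vmail messages'], bins=bin_edges, labels=bin_labels, include_lowest=True)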
churnByvmm = df_copy.groupby('Voicemail Usage Group')['Churn'].mean()
# churnByvmm


# 7. Total day minutes : if customers with high daytime usage are more likely to churn
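# Open-ended top bin: the observed maximum (350.8) exceeds 350; include_lowest keeps 0.0 in 'Low'
bin_edges = [0, 143, 179, 216, float('inf')]
bin_labels = ['Low', 'Medium', 'High', 'Very High']
df_copy['Daytime min Usage Group'] = pd.cut(df_copy['Total day minutes'], bins=bin_edges, labels=bin_labels, include_lowest=True)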
churnBytdm = df_copy.groupby('Daytime min Usage Group')['Churn'].mean()
# churnBytdm


# 8. Total day calls : if customers with high daytime usage are more likely to churn
bin_edges = [0, 87, 101, 114, 165]
bin_labels = ['Low', 'Medium', 'High', 'Very High']
# include_lowest=True keeps customers with 0 day calls in 'Low'
df_copy['Daytime calls Usage Group'] = pd.cut(df_copy['Total day calls'], bins=bin_edges, labels=bin_labels, include_lowest=True)
churnBytdc = df_copy.groupby('Daytime calls Usage Group')['Churn'].mean()
# churnBytdc


# 9. Total day charge : if customers with high daytime usage are more likely to churn
bin_edges = [0, 25, 30, 37, 60]
bin_labels = ['Low', 'Medium', 'High', 'Very High']
# include_lowest=True keeps customers with a 0.0 day charge in 'Low'
df_copy['Daytime charge Usage Group'] = pd.cut(df_copy['Total day charge'], bins=bin_edges, labels=bin_labels, include_lowest=True)
churnBytdch = df_copy.groupby('Daytime charge Usage Group')['Churn'].mean()
# churnBytdch


# 10. Total eve minutes : if customers with high evening usage are more likely to churn
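# Open-ended top bin: the observed maximum (363.7) exceeds 360; include_lowest keeps 0.0 in 'Low'
bin_edges = [0, 166, 201, 235, float('inf')]
bin_labels = ['Low', 'Medium', 'High', 'Very High']
df_copy['Evetime mins Usage Group'] = pd.cut(df_copy['Total eve minutes'], bins=bin_edges, labels=bin_labels, include_lowest=True)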
churnBytem = df_copy.groupby('Evetime mins Usage Group')['Churn'].mean()
# churnBytem


# 11. Total eve calls : if customers with high evening usage are more likely to churn
bin_edges = [0, 87, 100, 114, 170]
bin_labels = ['Low', 'Medium', 'High', 'Very High']
# include_lowest=True keeps customers with 0 evening calls in 'Low'
df_copy['Evetime calls Usage Group'] = pd.cut(df_copy['Total eve calls'], bins=bin_edges, labels=bin_labels, include_lowest=True)
churnBytec = df_copy.groupby('Evetime calls Usage Group')['Churn'].mean()
# churnBytec


# 12. Total eve charge : if customers with high evening usage are more likely to churn
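# Open-ended top bin: the observed maximum (30.91) exceeds 30; include_lowest keeps 0.0 in 'Low'
bin_edges = [0, 10, 16, 20, float('inf')]
bin_labels = ['Low', 'Medium', 'High', 'Very High']
df_copy['Evetime charge Usage Group'] = pd.cut(df_copy['Total eve charge'], bins=bin_edges, labels=bin_labels, include_lowest=True)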
churnBytech = df_copy.groupby('Evetime charge Usage Group')['Churn'].mean()
# churnBytech


# 13. Total night minutes : if higher or lower nighttime usage influences customer retention
bin_edges = [0, 167, 201, 235, 395]
bin_labels = ['Low', 'Medium', 'High', 'Very High']
df_copy['Nighttime mins Usage Group'] = pd.cut(df_copy['Total night minutes'], bins=bin_edges, labels=bin_labels)
churnBytnm = df_copy.groupby('Nighttime mins Usage Group')['Churn'].mean()
# churnBytnm


# 14. Total night calls : if higher or lower nighttime usage influences customer retention
bin_edges = [0, 87, 100, 113, 175]
bin_labels = ['Low', 'Medium', 'High', 'Very High']
df_copy['Nighttime calls Usage Group'] = pd.cut(df_copy['Total night calls'], bins=bin_edges, labels=bin_labels)
churnBytnc = df_copy.groupby('Nighttime calls Usage Group')['Churn'].mean()
# churnBytnc


# 15. Total night charge : if higher or lower nighttime usage influences customer retention
bin_edges = [0, 8, 9, 11, 18]
bin_labels = ['Low', 'Medium', 'High', 'Very High']
df_copy['Nighttime charge Usage Group'] = pd.cut(df_copy['Total night charge'], bins=bin_edges, labels=bin_labels)
churnBytnch = df_copy.groupby('Nighttime charge Usage Group')['Churn'].mean()
# churnBytnch


# 16. Total intl minutes : if higher or lower international usage influences customer retention
bin_edges = [0, 8, 10, 12, 20]
bin_labels = ['Low', 'Medium', 'High', 'Very High']
# include_lowest=True keeps customers with 0.0 international minutes in 'Low'
df_copy['Intl mins Usage Group'] = pd.cut(df_copy['Total intl minutes'], bins=bin_edges, labels=bin_labels, include_lowest=True)
churnByitm = df_copy.groupby('Intl mins Usage Group')['Churn'].mean()
# churnByitm


# 17. Total intl calls : if international call usage affects churn rates
# bin_edges = [0, 3, 4, 6, 20]
bin_edges = [0, 5, 10, 15, 20]
bin_labels = ['Low', 'Medium', 'High', 'Very High']
# include_lowest=True keeps customers with 0 international calls in 'Low'
df_copy['Intl calls Usage Group'] = pd.cut(df_copy['Total intl calls'], bins=bin_edges, labels=bin_labels, include_lowest=True)
churnByitc = df_copy.groupby('Intl calls Usage Group')['Churn'].mean()
# churnByitc


# 18. Total intl charge : if international call charges affect churn rates
bin_edges = [0, 2, 3, 4, 6]
bin_labels = ['Low', 'Medium', 'High', 'Very High']
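# include_lowest=True keeps customers with a 0.0 international charge in 'Low'
df_copy['Intl charge Usage Group'] = pd.cut(df_copy['Total intl charge'], bins=bin_edges, labels=bin_labels, include_lowest=True)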
churnByitch = df_copy.groupby('Intl charge Usage Group')['Churn'].mean()
# churnByitch

# 19. Customer service calls : if customers who contact customer service more frequently are more likely to churn
# Use df_copy (not telecom_df) so the original DataFrame stays untouched;
# include_lowest=True keeps customers with 0 service calls in 'Low'
bin_edges = [0, 2, 4, 6, 9]
bin_labels = ['Low', 'Medium', 'High', 'Very High']
df_copy['Customer calls Usage Group'] = pd.cut(df_copy['Customer service calls'], bins=bin_edges, labels=bin_labels, include_lowest=True)
churnBycsc = df_copy.groupby('Customer calls Usage Group')['Churn'].mean()
# churnBycsc

df_copy.head(5)

Unnamed: 0,State,Account length,Area code,International plan,Voice mail plan,Number vmail messages,Total day minutes,Total day calls,Total day charge,Total eve minutes,...,Evetime mins Usage Group,Evetime calls Usage Group,Evetime charge Usage Group,Nighttime mins Usage Group,Nighttime calls Usage Group,Nighttime charge Usage Group,Intl mins Usage Group,Intl calls Usage Group,Intl charge Usage Group,Customer calls Usage Group
0,KS,128,415,No,Yes,25,265.1,110,45.07,197.4,...,Medium,Medium,High,Very High,Medium,Very High,Medium,Low,Medium,Low
1,OH,107,415,No,Yes,26,161.6,123,27.47,195.5,...,Medium,High,High,Very High,High,Very High,Very High,Low,High,Low
2,NJ,137,415,No,No,0,243.4,114,41.38,121.2,...,Low,High,Medium,Low,High,Low,Very High,Low,High,Low
3,OH,84,408,Yes,No,0,299.4,71,50.9,61.9,...,Low,Medium,Low,Medium,Medium,Medium,Low,Medium,Low,Low
4,OK,75,415,Yes,No,0,166.7,113,28.34,148.3,...,Low,Very High,Medium,Medium,Very High,Medium,High,Low,Medium,Medium
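
One detail of `pd.cut` matters for usage-group binning like the above: bins are right-closed by default, so a value equal to the lowest edge (here 0) falls outside every bin and becomes `NaN` unless `include_lowest=True` is passed. The sketch below shows the difference on a small hypothetical series, using the customer-service-call edges.

```python
import pandas as pd

# Hypothetical call counts, including the boundary values 0 and 9.
calls = pd.Series([0, 1, 2, 3, 9], name="Customer service calls")
bin_edges = [0, 2, 4, 6, 9]
bin_labels = ["Low", "Medium", "High", "Very High"]

# Default: intervals are (0, 2], (2, 4], (4, 6], (6, 9], so 0 is not in any bin.
default_bins = pd.cut(calls, bins=bin_edges, labels=bin_labels)

# include_lowest=True makes the first interval closed on the left as well.
inclusive_bins = pd.cut(calls, bins=bin_edges, labels=bin_labels, include_lowest=True)

print(pd.DataFrame({"calls": calls, "default": default_bins, "include_lowest": inclusive_bins}))
```

Rows that end up as `NaN` are silently dropped by `groupby`, so the churn rate of the lowest group would be computed without them.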


### What all manipulations have you done and insights you found?

Answer Here.

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 2

In [None]:
# Chart - 2 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 3

In [None]:
# Chart - 3 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 4

In [None]:
# Chart - 4 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 5

In [None]:
# Chart - 5 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 6

In [None]:
# Chart - 6 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 7

In [None]:
# Chart - 7 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 8

In [None]:
# Chart - 8 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 9

In [None]:
# Chart - 9 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 10

In [None]:
# Chart - 10 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 11

In [None]:
# Chart - 11 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 12

In [None]:
# Chart - 12 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 13

In [None]:
# Chart - 13 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
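
A minimal sketch of the correlation matrix that a heatmap in this cell would display. `toy_df` is hypothetical made-up data with column names borrowed from the dataset; the boolean `Churn` column is cast to 0/1 so that it takes part in the correlation.

```python
import pandas as pd

# Hypothetical toy data; values are invented for illustration.
toy_df = pd.DataFrame({
    "Total day minutes": [120.0, 180.5, 250.0, 300.2, 95.3, 210.0],
    "Total day charge": [20.40, 30.69, 42.50, 51.03, 16.20, 35.70],
    "Customer service calls": [1, 0, 4, 5, 2, 1],
    "Churn": [False, False, True, True, False, False],
})

# Cast the boolean target to 0/1, then compute pairwise Pearson correlations.
corr = toy_df.assign(Churn=toy_df["Churn"].astype(int)).corr()
print(corr.round(2))

# With seaborn installed, sns.heatmap(corr, annot=True) would render this matrix.
```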

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

Answer Here.

# **Conclusion**

Write the conclusion here.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***