# Importing Necessary Libraries

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Importing Dataset 

In [4]:
df_raw = pd.read_csv('No_Churn_Telecom_Dataset.csv')
df_raw

Unnamed: 0,columns1,columns2,columns3,columns4,columns5,columns6,columns7,columns8,columns9,columns10,...,columns12,columns13,columns14,columns15,columns16,columns17,columns18,columns19,columns20,columns21
0,KS,128,415,382-4657,no,yes,25,265.1,110,45.07,...,99,16.78,244.7,91,11.01,10.0,3,2.70,1,False.
1,OH,107,415,371-7191,no,yes,26,161.6,123,27.47,...,103,16.62,254.4,103,11.45,13.7,3,3.70,1,False.
2,NJ,137,415,358-1921,no,no,0,243.4,114,41.38,...,110,10.30,162.6,104,7.32,12.2,5,3.29,0,False.
3,OH,84,408,375-9999,yes,no,0,299.4,71,50.90,...,88,5.26,196.9,89,8.86,6.6,7,1.78,2,False.
4,OK,75,415,330-6626,yes,no,0,166.7,113,28.34,...,122,12.61,186.9,121,8.41,10.1,3,2.73,3,False.
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,NV,94,510,379-8805,no,no,0,190.6,108,32.40,...,95,12.95,144.7,97,6.51,7.5,5,2.03,1,False.
996,IL,179,510,348-2150,no,no,0,116.1,101,19.74,...,99,17.15,181.9,103,8.19,11.6,5,3.13,0,False.
997,MS,116,415,417-9128,no,no,0,217.3,91,36.94,...,95,18.37,148.1,76,6.66,11.3,3,3.05,2,False.
998,ND,59,510,351-4226,no,no,0,179.4,80,30.50,...,99,19.76,175.8,105,7.91,14.7,3,3.97,0,False.


### Rename the columns as per the business case

In [6]:
df = df_raw.rename(columns={
    'columns1': 'State',
    'columns2': 'Account Length',
    'columns3': 'Area Code',
    'columns4': 'Phone',
    'columns5': 'International Plan',
    'columns6': 'VMail Plan',
    'columns7': 'VMail Message',
    'columns8': 'Day Mins',
    'columns9': 'Day Calls',
    'columns10': 'Day Charge',
    'columns11': 'Eve Mins',
    'columns12': 'Eve Calls',
    'columns13': 'Eve Charge',
    'columns14': 'Night Mins',
    'columns15': 'Night Calls',
    'columns16': 'Night Charge',
    'columns17': 'International Mins',
    'columns18': 'International calls',
    'columns19': 'International Charge',
    'columns20': 'CustServ Calls',
    'columns21': 'Churn'
})
df

Unnamed: 0,State,Account Length,Area Code,Phone,International Plan,VMail Plan,VMail Message,Day Mins,Day Calls,Day Charge,...,Eve Calls,Eve Charge,Night Mins,Night Calls,Night Charge,International Mins,International calls,International Charge,CustServ Calls,Churn
0,KS,128,415,382-4657,no,yes,25,265.1,110,45.07,...,99,16.78,244.7,91,11.01,10.0,3,2.70,1,False.
1,OH,107,415,371-7191,no,yes,26,161.6,123,27.47,...,103,16.62,254.4,103,11.45,13.7,3,3.70,1,False.
2,NJ,137,415,358-1921,no,no,0,243.4,114,41.38,...,110,10.30,162.6,104,7.32,12.2,5,3.29,0,False.
3,OH,84,408,375-9999,yes,no,0,299.4,71,50.90,...,88,5.26,196.9,89,8.86,6.6,7,1.78,2,False.
4,OK,75,415,330-6626,yes,no,0,166.7,113,28.34,...,122,12.61,186.9,121,8.41,10.1,3,2.73,3,False.
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,NV,94,510,379-8805,no,no,0,190.6,108,32.40,...,95,12.95,144.7,97,6.51,7.5,5,2.03,1,False.
996,IL,179,510,348-2150,no,no,0,116.1,101,19.74,...,99,17.15,181.9,103,8.19,11.6,5,3.13,0,False.
997,MS,116,415,417-9128,no,no,0,217.3,91,36.94,...,95,18.37,148.1,76,6.66,11.3,3,3.05,2,False.
998,ND,59,510,351-4226,no,no,0,179.4,80,30.50,...,99,19.76,175.8,105,7.91,14.7,3,3.97,0,False.


# Domain Study

## Business Context:
#### No-Churn Telecom is a telecom company in Europe that has been operating for over ten years. Recently, new companies have entered the market, making it harder for No-Churn to keep its customers from switching to competitors.
####
## Problem:
#### No-Churn Telecom has been trying to keep its customers happy by reducing prices and offering more deals. However, many customers (more than 10%) are still switching to other telecom companies. This high number of customers leaving is a big concern for No-Churn Telecom.
####
## Project Goals:
### 1) Understanding why Customers Leave:
#### Identify the key factors that make customers decide to leave No-Churn. By knowing these reasons, the company can address the issues and improve customer satisfaction.
### 2) Creating Churn Risk Scores:
#### Develop a model that assigns a "churn risk score" to each customer. This score indicates how likely a customer is to leave. Customers with higher scores are at greater risk of leaving.
### 3) Introducing CHURN-FLAG:
#### Create a new column called "CHURN-FLAG" in the customer data, with values YES (1) or NO (0). This flag will indicate whether a customer is likely to leave (YES) or stay (NO). Customers with a YES flag can be targeted with special offers to encourage them to stay.
####
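As a rough illustration of goals 2 and 3, the sketch below turns a churn risk score into a CHURN-FLAG column. The customer IDs, the scores, the column name `churn_risk_score`, and the 0.5 cut-off are all made up for illustration; in practice the scores would come from a fitted model and the threshold would be chosen from the business trade-off.

```python
import pandas as pd

# Hypothetical churn risk scores (probabilities between 0 and 1).
# The IDs and values are invented; real scores would come from a fitted classifier.
scores = pd.DataFrame({
    "customer_id": ["C1", "C2", "C3", "C4", "C5"],
    "churn_risk_score": [0.08, 0.12, 0.35, 0.81, 0.64],
})

# Assumed cut-off: customers at or above this score are flagged as likely to leave.
THRESHOLD = 0.5

# CHURN-FLAG is 1 (YES) when the score reaches the threshold, otherwise 0 (NO).
scores["CHURN-FLAG"] = (scores["churn_risk_score"] >= THRESHOLD).astype(int)

# Customers to target with retention offers, highest risk first.
at_risk = scores[scores["CHURN-FLAG"] == 1].sort_values(
    "churn_risk_score", ascending=False
)
print(at_risk)
```

A lower threshold flags more customers at the cost of sending offers to some who would have stayed anyway.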
## Features Information for understanding
#### 1) State: The state in which the customer resides. This is a categorical variable.
#### 2) Account Length: The duration (in days or months) that the customer has been with the company.
#### 3) Area Code: The area code of the customer's phone number.
#### 4) Phone: The customer's phone number.
#### 5) International Plan: Indicates whether the customer has an international plan (Yes/No).
#### 6) VMail Plan: Indicates whether the customer has a voice mail plan (Yes/No).
#### 7) VMail Message: The number of voice mail messages the customer has.
#### 8) Day Mins: The total number of minutes the customer spent on daytime calls.
#### 9) Day Calls: The total number of daytime calls made by the customer.
#### 10) Day Charge: The total charges for daytime calls.
#### 11) Eve Mins: The total number of minutes the customer spent on evening calls.
#### 12) Eve Calls: The total number of evening calls made by the customer.
#### 13) Eve Charge: The total charges for evening calls.
#### 14) Night Mins: The total number of minutes the customer spent on nighttime calls.
#### 15) Night Calls: The total number of nighttime calls made by the customer.
#### 16) Night Charge: The total charges for nighttime calls.
#### 17) International Mins: The total number of minutes the customer spent on international calls.
#### 18) International calls: The total number of international calls made by the customer.
#### 19) International Charge: The total charges for international calls.
#### 20) CustServ Calls: The number of calls made by the customer to customer service.
#### 21) Churn: Indicates whether the customer has churned. This is the target variable. In the raw data it is stored as text with a trailing period (for example, `False.`).
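Because the Churn values are stored as text such as `False.`, it can help to convert them to booleans before analysis. The sketch below runs on a small stand-in column rather than the real data, and it assumes that churned customers are labelled `True.` and that some labels may carry stray whitespace; neither is confirmed by the rows shown above.

```python
import pandas as pd

# Small stand-in for the Churn column. "True." as the churned label and the
# leading space in one value are assumptions made for illustration.
sample = pd.DataFrame({"Churn": ["False.", "True.", " False.", "False."]})

# Strip surrounding whitespace and the trailing period, then map to booleans.
cleaned = sample["Churn"].str.strip().str.rstrip(".")
sample["Churn"] = cleaned.map({"True": True, "False": False})

# Any label outside the mapping becomes NaN, so confirm nothing was lost.
unmapped = int(sample["Churn"].isna().sum())
print("Unmapped labels:", unmapped)
print(sample["Churn"].value_counts())
```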


# Basic Checks

In [10]:
df.head()

Unnamed: 0,State,Account Length,Area Code,Phone,International Plan,VMail Plan,VMail Message,Day Mins,Day Calls,Day Charge,...,Eve Calls,Eve Charge,Night Mins,Night Calls,Night Charge,International Mins,International calls,International Charge,CustServ Calls,Churn
0,KS,128,415,382-4657,no,yes,25,265.1,110,45.07,...,99,16.78,244.7,91,11.01,10.0,3,2.7,1,False.
1,OH,107,415,371-7191,no,yes,26,161.6,123,27.47,...,103,16.62,254.4,103,11.45,13.7,3,3.7,1,False.
2,NJ,137,415,358-1921,no,no,0,243.4,114,41.38,...,110,10.3,162.6,104,7.32,12.2,5,3.29,0,False.
3,OH,84,408,375-9999,yes,no,0,299.4,71,50.9,...,88,5.26,196.9,89,8.86,6.6,7,1.78,2,False.
4,OK,75,415,330-6626,yes,no,0,166.7,113,28.34,...,122,12.61,186.9,121,8.41,10.1,3,2.73,3,False.


In [11]:
df.tail()

Unnamed: 0,State,Account Length,Area Code,Phone,International Plan,VMail Plan,VMail Message,Day Mins,Day Calls,Day Charge,...,Eve Calls,Eve Charge,Night Mins,Night Calls,Night Charge,International Mins,International calls,International Charge,CustServ Calls,Churn
995,NV,94,510,379-8805,no,no,0,190.6,108,32.4,...,95,12.95,144.7,97,6.51,7.5,5,2.03,1,False.
996,IL,179,510,348-2150,no,no,0,116.1,101,19.74,...,99,17.15,181.9,103,8.19,11.6,5,3.13,0,False.
997,MS,116,415,417-9128,no,no,0,217.3,91,36.94,...,95,18.37,148.1,76,6.66,11.3,3,3.05,2,False.
998,ND,59,510,351-4226,no,no,0,179.4,80,30.5,...,99,19.76,175.8,105,7.91,14.7,3,3.97,0,False.
999,NC,165,415,330-6630,no,no,0,207.7,109,35.31,...,94,14.01,54.5,91,2.45,7.9,3,2.13,0,False.


In [12]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 21 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   State                 1000 non-null   object 
 1   Account Length        1000 non-null   int64  
 2   Area Code             1000 non-null   int64  
 3   Phone                 1000 non-null   object 
 4   International Plan    1000 non-null   object 
 5   VMail Plan            1000 non-null   object 
 6   VMail Message         1000 non-null   int64  
 7   Day Mins              1000 non-null   float64
 8   Day Calls             1000 non-null   int64  
 9   Day Charge            1000 non-null   float64
 10  Eve Mins              1000 non-null   float64
 11  Eve Calls             1000 non-null   int64  
 12  Eve Charge            1000 non-null   float64
 13  Night Mins            1000 non-null   float64
 14  Night Calls           1000 non-null   int64  
 15  Night Charge          

#### There are 16 numeric and 5 non-numeric features in the data.
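The numeric and non-numeric counts can also be obtained programmatically instead of being read off the `info()` output. The sketch below uses a tiny stand-in frame built from a few values in the first two rows; applying the same two `select_dtypes` calls to `df` should give the counts for the full dataset.

```python
import pandas as pd

# Tiny stand-in frame with the same kinds of columns as the dataset.
sample = pd.DataFrame({
    "State": ["KS", "OH"],
    "Account Length": [128, 107],
    "Day Mins": [265.1, 161.6],
    "Churn": ["False.", "False."],
})

# Split the columns by dtype: integers and floats count as numeric.
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
non_numeric_cols = sample.select_dtypes(exclude="number").columns.tolist()

print(len(numeric_cols), "numeric:", numeric_cols)
print(len(non_numeric_cols), "non-numeric:", non_numeric_cols)
```

Note that a numeric dtype does not make a column a quantity: Area Code is stored as an integer but behaves like a category.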

In [14]:
df.isnull().sum()

State                   0
Account Length          0
Area Code               0
Phone                   0
International Plan      0
VMail Plan              0
VMail Message           0
Day Mins                0
Day Calls               0
Day Charge              0
Eve Mins                0
Eve Calls               0
Eve Charge              0
Night Mins              0
Night Calls             0
Night Charge            0
International Mins      0
International calls     0
International Charge    0
CustServ Calls          0
Churn                   0
dtype: int64

#### There are no null values present in the data.

In [16]:
df.duplicated().sum()

0

#### There are no duplicates present in the data.
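`df.duplicated()` only flags rows that match on every column. A stricter check is to look for repeats in a column that should identify a customer, such as Phone. The sketch below uses an invented three-row frame in which one phone number is deliberately repeated; treating Phone as the identifier is an assumption.

```python
import pandas as pd

# Invented example: the third row repeats the first phone number
# but differs in the other column, so it is not a full-row duplicate.
sample = pd.DataFrame({
    "Phone": ["382-4657", "371-7191", "382-4657"],
    "Day Mins": [265.1, 161.6, 243.4],
})

# Rows identical across all columns.
full_row_dupes = int(sample.duplicated().sum())

# Rows that repeat an earlier value of the assumed identifier column.
phone_dupes = int(sample.duplicated(subset=["Phone"]).sum())

print("Full-row duplicates:", full_row_dupes)
print("Repeated phone numbers:", phone_dupes)
```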

# EDA

## 1) Univariate Analysis

In [44]:
df.describe()

Unnamed: 0,Account Length,Area Code,VMail Message,Day Mins,Day Calls,Day Charge,Eve Mins,Eve Calls,Eve Charge,Night Mins,Night Calls,Night Charge,International Mins,International calls,International Charge,CustServ Calls
count,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0
mean,99.273,437.894,8.404,182.6167,100.558,31.04537,200.0627,99.79,17.00553,200.9016,99.967,9.04061,10.255,4.55,2.76946,1.553
std,39.482291,42.858413,13.779865,54.601866,19.702559,9.282283,53.490201,19.982322,4.546632,51.222937,19.976811,2.304919,2.821708,2.566727,0.761825,1.317166
min,1.0,408.0,0.0,30.9,36.0,5.25,31.2,12.0,2.65,45.0,42.0,2.03,0.0,0.0,0.0,0.0
25%,73.0,408.0,0.0,148.575,88.0,25.2575,163.85,87.0,13.925,167.2,86.0,7.52,8.5,3.0,2.3,1.0
50%,97.0,415.0,0.0,183.6,101.0,31.21,200.2,101.0,17.02,201.3,101.0,9.055,10.3,4.0,2.78,1.0
75%,127.0,510.0,22.0,217.125,113.25,36.9125,235.925,114.0,20.0525,236.825,114.0,10.66,12.1,6.0,3.27,2.0
max,243.0,510.0,51.0,350.8,163.0,59.64,351.6,168.0,29.89,364.3,175.0,16.39,20.0,19.0,5.4,9.0


In [46]:
modes = df.mode().iloc[0] 
mode_counts = df.apply(lambda x: x.value_counts().iloc[0])  # Get the count of the mode for each column

mode_summary = pd.DataFrame({'Mode': modes, 'Count': mode_counts})
print("Mode and count of each column:")
print(mode_summary)

Mode and count of each column:
                           Mode  Count
State                        AL     29
Account Length             74.0     16
Area Code                 415.0    482
Phone                  327-1319      1
International Plan           no    900
VMail Plan                   no    713
VMail Message               0.0    713
Day Mins                  154.0      5
Day Calls                  97.0     28
Day Charge                26.18      5
Eve Mins                  129.4      4
Eve Calls                  94.0     25
Eve Charge                 11.0      4
Night Mins                168.9      4
Night Calls                91.0     26
Night Charge               7.52      6
International Mins          9.5     22
International calls         3.0    209
International Charge       2.57     22
CustServ Calls              1.0    368
Churn                    False.    871


### Interpretation:
#### 1) The mean account length is about 99, with a standard deviation of about 39. If the distribution is roughly normal, approximately 68% of customers fall within one standard deviation of the mean, that is, between about 60 and 139. The dataset does not state the unit; if account length is measured in days, this range corresponds to roughly 2 to 5 months, which would mean most customers are relatively new.
####
#### 2) The most common area code is 415, which accounts for 482 of the 1,000 customers.
####
#### 3) The most common state is 'AL', with 29 customers.
####
#### 4) Most customers have neither an international plan (900 of 1,000 have none) nor a VMail plan (713 of 1,000 have none), which suggests that most customers do not need or want these plans.
#### 
#### 5) The majority of customers (713) have no VMail messages, which matches the number of customers without a VMail plan. The remaining customers show a wide variation in the number of VMail messages, up to a maximum of 51.
#### 6) On average, daytime call duration is 182.62 minutes, with a standard deviation of 54.60 minutes. Daytime call durations within one standard deviation of the mean range from 128.01 to 237.22 minutes.
#### 
#### 7) On average, customers make around 101 daytime calls in total, with a standard deviation of about 20 calls. The minimum and maximum values are 36 and 163, respectively. Note that Day Calls is the total number of daytime calls, not a count per day. Day call counts within one standard deviation of the mean fall between approximately 81 and 120 calls, with some customers making as few as 36 or as many as 163 calls.
####
#### 8) On average, customers are charged around 31.05 units for daytime calls, with a standard deviation of 9.28 units. The charges range from a minimum of 5.25 units to a maximum of 59.64 units. Daytime charges within one standard deviation of the mean fall between approximately 21.76 and 40.33 units, showing moderate variability in the charges.
#### 
#### 9) On average, customers spend around 200.06 minutes on evening calls, with a standard deviation of 53.49 minutes. The minimum time spent on evening calls is 31.2 minutes, while the maximum is 351.6 minutes. Evening call durations within one standard deviation of the mean fall between approximately 146.57 and 253.55 minutes, showing considerable variability in usage patterns.
#### 
#### 10) On average, customers make around 100 evening calls, with a standard deviation of 20 calls. The number of evening calls ranges from a minimum of 12 to a maximum of 168 calls. This indicates that most customers' evening call counts fall within a range of approximately 80 to 120 calls, showing some variability in their calling patterns.
####
#### 11) On average, customers are charged around 17.01 units for evening calls, with a standard deviation of 4.55 units. The charges range from a minimum of 2.65 units to a maximum of 29.89 units. Evening charges within one standard deviation of the mean fall between approximately 12.46 and 21.55 units, showing moderate variability in the charges.
#### 
#### 12) On average, customers spend around 200.90 minutes on nighttime calls, with a standard deviation of 51.22 minutes. The minimum time spent on nighttime calls is 45 minutes, while the maximum is 364.3 minutes. Nighttime call durations within one standard deviation of the mean fall between approximately 149.68 and 252.12 minutes, showing considerable variability in usage patterns.
#### 
#### 13) On average, customers make around 100 nighttime calls, with a standard deviation of 20 calls. The number of nighttime calls ranges from a minimum of 42 to a maximum of 175 calls. This indicates that most customers' nighttime call counts fall within a range of approximately 80 to 120 calls, showing some variability in their calling patterns.
####
#### 14) On average, customers are charged around 9.04 units for nighttime calls, with a standard deviation of 2.30 units. The charges range from a minimum of 2.03 units to a maximum of 16.39 units. This indicates that most customers' nighttime charges fall within a range of approximately 6.74 to 11.34 units, showing moderate variability in the charges.
#### 
#### 15) On average, customers spend around 10.26 minutes on international calls, with a standard deviation of 2.82 minutes. The minimum time spent on international calls is 0 minutes, while the maximum is 20 minutes. International usage is low compared with daytime, evening, and nighttime usage (around 180 to 200 minutes each). However, the 25th percentile is 8.5 minutes, so most customers do make some international calls even though 900 of 1,000 have no international plan.
####
#### 16) On average, customers make around 4.55 international calls, with a standard deviation of 2.57 calls, and are charged around 2.77 units, with a standard deviation of 0.76 units. The number of international calls ranges from 0 to 19, and the charges range from 0 to 5.40 units. International charges are low compared with the other call types, but the 25th percentile of 3 calls shows that most customers make at least a few international calls.
####
#### 17) On average, customers make around 1.55 calls to customer service, with a standard deviation of 1.32 calls; the median and the mode are both 1 call. The number of customer service calls ranges from a minimum of 0 to a maximum of 9. Three quarters of customers make 2 or fewer calls, so most customers do not need extensive support. The small group of customers with many calls may be facing recurring issues and is worth examining as a possible churn signal.
####
#### 18) The mode for churn is "False.", which occurs in 871 of the 1,000 rows (87.1%), so the majority of customers remain with the telecom company. The remaining 129 rows (12.9%) are consistent with the churn rate of more than 10% described in the problem statement. It is important to investigate the factors influencing this minority of customers in order to improve retention strategies.
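The one-standard-deviation ranges quoted in the interpretation can be recomputed, and the 68% rule of thumb checked against the data, with a small helper. The sketch below is illustrative: `one_std_summary` is a hypothetical helper name, and the stand-in series holds only the five Account Length values from `df.head()`, so its numbers differ from the full-data statistics.

```python
import pandas as pd

def one_std_summary(values: pd.Series) -> dict:
    """Mean, std, the mean +/- 1 std interval, and the share of values inside it."""
    mean = values.mean()
    std = values.std()  # sample standard deviation (ddof=1), as in describe()
    low, high = mean - std, mean + std
    share_within = values.between(low, high).mean()  # bounds are inclusive
    return {"mean": mean, "std": std, "low": low, "high": high,
            "share_within": share_within}

# Stand-in data: the five Account Length values shown by df.head().
account_length = pd.Series([128, 107, 137, 84, 75])
summary = one_std_summary(account_length)
print(summary)

# The same interval from the statistics reported by df.describe().
reported_mean, reported_std = 99.273, 39.482291
reported_low = reported_mean - reported_std
reported_high = reported_mean + reported_std
print(round(reported_low, 2), round(reported_high, 2))
```

Calling `one_std_summary(df["Account Length"])` on the full data would show how close the actual share is to 68%; a share far from that value suggests the column is not approximately normal.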