### Name: Emilly Murugi Njue

## **BUSINESS OVERVIEW**

#### INTRODUCTION
SyriaTel is a telecommunications company facing customer churn: customers discontinuing their service or switching to a competitor. To limit the resulting revenue loss and improve retention, SyriaTel wants a classifier that predicts whether a customer is likely to churn in the near future. If churn follows predictable patterns in customer behavior, SyriaTel can direct retention efforts at the customers most at risk.

#### BUSINESS PROBLEM
The business problem is to build a classifier that predicts whether a customer will "soon" stop doing business with SyriaTel. This is a binary classification problem (churn or no churn), and a working classifier lets SyriaTel address churn proactively and reduce its financial impact.

#### OBJECTIVES
1. To identify patterns in customer behavior that indicate a likelihood of churn.
2. To determine the specific factors that contribute to customer churn in the telecommunications industry.
3. To predict which customers are likely to churn in the near future with a high degree of accuracy.
4. To develop a classifier model that assists in predicting customer churn and improving customer retention rates.
5. To identify specific customer segments or demographics with a higher propensity for churn and tailor effective strategies to retain these customers.

In [41]:
import pandas as pd

data = pd.read_csv('bigml_59c28831336c6604c800002a.csv')
data.head()

,state,account length,area code,phone number,international plan,voice mail plan,number vmail messages,total day minutes,total day calls,total day charge,...,total eve calls,total eve charge,total night minutes,total night calls,total night charge,total intl minutes,total intl calls,total intl charge,customer service calls,churn
0,KS,128,415,382-4657,no,yes,25,265.1,110,45.07,...,99,16.78,244.7,91,11.01,10.0,3,2.7,1,False
1,OH,107,415,371-7191,no,yes,26,161.6,123,27.47,...,103,16.62,254.4,103,11.45,13.7,3,3.7,1,False
2,NJ,137,415,358-1921,no,no,0,243.4,114,41.38,...,110,10.3,162.6,104,7.32,12.2,5,3.29,0,False
3,OH,84,408,375-9999,yes,no,0,299.4,71,50.9,...,88,5.26,196.9,89,8.86,6.6,7,1.78,2,False
4,OK,75,415,330-6626,yes,no,0,166.7,113,28.34,...,122,12.61,186.9,121,8.41,10.1,3,2.73,3,False


#### DATA UNDERSTANDING

In [42]:
# Get the column names
print("Column names: ")
print()
print(data.columns)

Column names: 

Index(['state', 'account length', 'area code', 'phone number',
       'international plan', 'voice mail plan', 'number vmail messages',
       'total day minutes', 'total day calls', 'total day charge',
       'total eve minutes', 'total eve calls', 'total eve charge',
       'total night minutes', 'total night calls', 'total night charge',
       'total intl minutes', 'total intl calls', 'total intl charge',
       'customer service calls', 'churn'],
      dtype='object')


In [43]:
# Get the data type of each column
print("Data types: ")
print()
print(data.dtypes)

Data types: 

state                      object
account length              int64
area code                   int64
phone number               object
international plan         object
voice mail plan            object
number vmail messages       int64
total day minutes         float64
total day calls             int64
total day charge          float64
total eve minutes         float64
total eve calls             int64
total eve charge          float64
total night minutes       float64
total night calls           int64
total night charge        float64
total intl minutes        float64
total intl calls            int64
total intl charge         float64
customer service calls      int64
churn                        bool
dtype: object
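
One detail in the listing above is worth a note: `area code` is stored as `int64`, although an area code is a label rather than a quantity. The sketch below shows one way to treat it as a categorical column. It runs on a small toy frame with hypothetical values (not the SyriaTel data), and whether the conversion is useful depends on the modeling choices made later.

```python
import pandas as pd

# Toy frame with hypothetical values, mirroring three columns of the dataset.
toy = pd.DataFrame({
    "area code": [415, 408, 510, 415],
    "account length": [128, 107, 137, 84],
    "churn": [False, False, True, False],
})

# Treat area code as a label instead of a number.
toy["area code"] = toy["area code"].astype("category")

print(toy.dtypes)
```

After the cast, numeric summaries such as the mean no longer apply to `area code`, which guards against reading it as a magnitude.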


In [44]:
# Check for missing values
print("Missing values: ")
print()
print(data.isna().sum())

Missing values: 

state                     0
account length            0
area code                 0
phone number              0
international plan        0
voice mail plan           0
number vmail messages     0
total day minutes         0
total day calls           0
total day charge          0
total eve minutes         0
total eve calls           0
total eve charge          0
total night minutes       0
total night calls         0
total night charge        0
total intl minutes        0
total intl calls          0
total intl charge         0
customer service calls    0
churn                     0
dtype: int64


In [45]:
# Check for duplicated rows
print("Duplicated rows: ", data.duplicated().sum())

Duplicated rows:  0


##### This dataset has no missing values and no duplicated rows.
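
The two checks above can be combined into one reusable summary. The following is a minimal sketch: `quality_report` is a hypothetical helper name (not part of pandas), and the toy frame uses made-up rows that deliberately contain one missing value and one duplicated row so that both counts are non-zero.

```python
import pandas as pd


def quality_report(df):
    """Return the row count, total missing cells, and duplicated rows of df."""
    return {
        "rows": len(df),
        "missing_cells": int(df.isna().sum().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
    }


# Toy frame with hypothetical values: row 2 repeats row 1, row 3 has a gap.
toy = pd.DataFrame({
    "state": ["KS", "OH", "OH", "NJ"],
    "total day minutes": [265.1, 161.6, 161.6, None],
})

report = quality_report(toy)
print(report)
```

Calling `quality_report(data)` on the notebook's DataFrame should report zero for both counts, consistent with the outputs above.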

In [46]:
# Select the names of the non-numeric (object and boolean) columns
categorical_columns = data.select_dtypes(include=['object', 'bool']).columns

# Print the list of categorical columns
print("Categorical Columns:")
for column in categorical_columns:
    print(column)


Categorical Columns:
state
phone number
international plan
voice mail plan
churn
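
Not every column in this list is necessarily a useful categorical feature. A column whose values are all distinct, as `phone number` appears to be, acts as an identifier rather than a category. The sketch below counts distinct values per column to flag such cases. It uses the first four rows shown in the `data.head()` output, so the result only illustrates the idea and says nothing definitive about the full dataset.

```python
import pandas as pd

# First four rows from the head() output above (three columns only).
toy = pd.DataFrame({
    "state": ["KS", "OH", "NJ", "OH"],
    "phone number": ["382-4657", "371-7191", "358-1921", "375-9999"],
    "international plan": ["no", "no", "no", "yes"],
})

# Number of distinct values in each column.
cardinality = toy.nunique()

# Columns with one distinct value per row behave like identifiers.
id_like = cardinality[cardinality == len(toy)].index.tolist()

print(cardinality)
print("Identifier-like columns:", id_like)
```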


In [47]:
print("Churn values: ", data['churn'].unique())
print()
print("Voice Mail Plan values: ", data['voice mail plan'].unique())
print()
print("International Plan values: ", data['international plan'].unique())

Churn values:  [False  True]

Voice Mail Plan values:  ['yes' 'no']

International Plan values:  ['no' 'yes']
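
Since each of these three columns takes exactly two values, they can be encoded as 0/1 before modeling. The following is a minimal sketch on a toy frame with hypothetical values. The mapping `yes_no` and the name `encoded` are illustrative choices, not steps taken from this notebook.

```python
import pandas as pd

# Toy frame with hypothetical values for the three binary columns.
toy = pd.DataFrame({
    "international plan": ["no", "no", "no", "yes", "yes"],
    "voice mail plan": ["yes", "yes", "no", "no", "no"],
    "churn": [False, False, False, True, False],
})

encoded = toy.copy()

# Map the yes/no strings to 1/0.
yes_no = {"no": 0, "yes": 1}
for col in ["international plan", "voice mail plan"]:
    encoded[col] = encoded[col].map(yes_no)

# Booleans convert directly: False -> 0, True -> 1.
encoded["churn"] = encoded["churn"].astype(int)

# The mean of a 0/1 column is the share of positive cases.
churn_rate = encoded["churn"].mean()

print(encoded)
print("Churn rate:", churn_rate)
```

Applied to the real data, the same mean gives the overall churn rate, which is a quick way to see how balanced the two classes are.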
