# <font color='darkred'>Telco Churn Prediction</font>

## Table of Contents
- Click on the links:
>1. <a href=#imports>Imports</a>
>2. <a href=#load>Load Data</a>
>3. <a href=#Formating>Data Formatting</a>
>4. <a href=#valid>Validate Assumption (optional, if any)</a>
>5. <a href=#EDA1>Exploratory Data Analysis (univariate)</a>
>6. <a href=#EDA2>Exploratory Data Analysis (multivariate relationships)</a>
>7. <a href=#ques>Questions & Insights</a>
>8. <a href=#con>Conclusion</a>

### Company Background:

Saudi Telecom Company (STC) is a Saudi Arabia-based digital company that offers telecommunications services, landline, mobile, Internet services, enterprise digital solutions, entertainment, fintech, and computer networks.

### Motivation:
STC has difficulty retaining its customers as new competitors (Virgin, Mobily, etc.) emerge in the market. This analysis tries to diagnose the behavior of churned customers so that we can first address the underlying problems as organizational challenges to improve our services, and then identify behaviors that churned customers have in common.



### Data Description

>I will be using the Telco Customer Churn (“Focused customer retention programs”) dataset from Kaggle, which can be found [here](https://www.kaggle.com/blastchar/telco-customer-churn).
It was uploaded for examining customer retention and predicting churn, so it is well suited to this study.

The data contains **7043** rows, each representing a customer, and **21** columns included in our analysis.
#### Features
>- customerID: A unique identifier for each customer
>- gender: Whether the customer is a male or a female
>- SeniorCitizen: Whether the customer is a senior citizen or not (1, 0)
>- Partner: Whether the customer has a partner or not (Yes, No)
>- Dependents: Whether the customer has dependents or not (Yes, No)
>- tenure: Number of months the customer has stayed with the company
>- PhoneService: Whether the customer has a phone service or not (Yes, No)
>- MultipleLines: Whether the customer has multiple lines or not (Yes, No, No phone service)
>- InternetService: Customer’s internet service provider (DSL, Fiber optic, No)
>- OnlineSecurity: Whether the customer has online security or not (Yes, No, No internet service)
>- OnlineBackup: Whether the customer has online backup or not (Yes, No, No internet service)
>- DeviceProtection: Whether the customer has device protection or not (Yes, No, No internet service)
>- TechSupport: Whether the customer has tech support or not (Yes, No, No internet service)
>- StreamingTV: Whether the customer has streaming TV or not (Yes, No, No internet service)
>- StreamingMovies: Whether the customer has streaming movies or not (Yes, No, No internet service)
>- Contract: The contract term of the customer (Month-to-month, One year, Two year)
>- PaperlessBilling: Whether the customer has paperless billing or not (Yes, No)
>- PaymentMethod: The customer’s payment method (Electronic check, Mailed check, Bank transfer (automatic), Credit card (automatic))
>- MonthlyCharges: The amount charged to the customer monthly
>- TotalCharges: The total amount charged to the customer
>- Churn: Whether the customer churned or not (Yes or No)

The target label is imbalanced: 1869 customers churned, while 5174 did not.
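
With an imbalanced target, class shares are usually easier to read than raw counts. The sketch below shows one way to compute both with `value_counts`. The `churn` series is a toy stand-in with made-up values, not the real column.

```python
import pandas as pd

# Toy stand-in for the Churn column (illustrative values only, not the real data)
churn = pd.Series(["No", "No", "No", "Yes"], name="Churn")

counts = churn.value_counts()                # absolute number of rows per class
shares = churn.value_counts(normalize=True)  # fraction of rows per class

print(counts.to_dict())
print(shares.round(2).to_dict())
```

On the real data the same two calls would be applied to `df["Churn"]`.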

# <a name='imports' >Library Imports</a>

In [1]:
# Data Analysis Libs
print("Importing.....", end="", flush=True)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# Suppress warnings 
import warnings
warnings.filterwarnings('ignore')
print("[Done]")

Importing.....[Done]


# <a name='LoadData' >Load Data</a>

In [13]:
df = pd.read_csv("WA_Fn-UseC_-Telco-Customer-Churn.csv", na_values="Naaaa")
df.head()

|  | customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | ... | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7590-VHVEG | Female | 0 | Yes | No | 1.0 | No | No phone service | DSL | No | ... | No | NaN | No | No | Month-to-month | Yes | Electronic check | 29.85 | $29.85 | No |
| 1 | 5575-GNVDE | Male | 0 | No | No | 34.0 | Yes | No | DSL | Yes | ... | Yes | NaN | No | No | One year | No | Mailed check | 56.95 | $1,889.50 | No |
| 2 | 3668-QPYBK | Male | 0 | No | No | 2.0 | Yes | No | DSL | Yes | ... | No | NaN | No | No | Month-to-month | Yes | Mailed check | 53.85 | $108.15 | Yes |
| 3 | 7795-CFOCW | Male | 0 | No | No | 45.0 | No | No phone service | DSL | Yes | ... | Yes | NaN | No | No | One year | No | Bank transfer (automatic) | 42.3 | $1,840.75 | No |
| 4 | 9237-HQITU | Female | 0 | No | No | 2.0 | Yes | No | Fiber optic | No | ... | No | NaN | No | No | Month-to-month | Yes | Electronic check | 70.7 | $151.65 | Yes |
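
To illustrate what `na_values="Naaaa"` does in the load cell, here is a small sketch that reads the same text twice, with and without the option. The CSV text and its column subset are hypothetical. Only the `"Naaaa"` placeholder comes from this notebook.

```python
import io
import pandas as pd

# Hypothetical two-row extract; only the "Naaaa" placeholder is taken from the notebook
csv_text = "customerID,Partner,tenure\nA-1,Yes,1\nB-2,Naaaa,Naaaa\n"

raw = pd.read_csv(io.StringIO(csv_text))                        # placeholder kept as text
clean = pd.read_csv(io.StringIO(csv_text), na_values="Naaaa")   # placeholder read as NaN

print(raw.isna().sum().to_dict())    # no missing values detected
print(clean.isna().sum().to_dict())  # one missing value each in Partner and tenure
print(clean["tenure"].dtype)         # float64, because NaN forces a float column
```

This would also explain why `tenure` appears as a float (`1.0`, `34.0`) in the preview above: a numeric column that contains missing values is stored as `float64`.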


In [14]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   customerID        7043 non-null   object 
 1   gender            7043 non-null   object 
 2   SeniorCitizen     7043 non-null   int64  
 3   Partner           6948 non-null   object 
 4   Dependents        6948 non-null   object 
 5   tenure            6948 non-null   float64
 6   PhoneService      6948 non-null   object 
 7   MultipleLines     6948 non-null   object 
 8   InternetService   6948 non-null   object 
 9   OnlineSecurity    6948 non-null   object 
 10  OnlineBackup      6948 non-null   object 
 11  DeviceProtection  6948 non-null   object 
 12  TechSupport       410 non-null    object 
 13  StreamingTV       6948 non-null   object 
 14  StreamingMovies   6948 non-null   object 
 15  Contract          6948 non-null   object 
 16  PaperlessBilling  6948 non-null   object 
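
The non-null counts above imply missing values: 7043 - 6948 = 95 rows have no value in columns such as `Partner`, and `TechSupport` has only 410 non-null entries. One way to tabulate this per column is sketched below. The `toy` frame is an illustrative stand-in, not the real data.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for df; values are made up
toy = pd.DataFrame({
    "Partner": ["Yes", np.nan, "No", "Yes"],
    "TechSupport": [np.nan, np.nan, np.nan, "No"],
})

missing = toy.isna().sum()             # number of missing values per column
missing_pct = toy.isna().mean() * 100  # share of missing values per column, in percent

summary = pd.DataFrame({"missing": missing, "missing_pct": missing_pct})
summary = summary.sort_values("missing", ascending=False)
print(summary)
```

Sorting by the missing count puts the sparsest columns first, which makes them easier to spot before deciding how to handle them.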


# <a name='Formating' >Data Formatting</a>

In [25]:
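# Strip the currency symbol and thousands separators (literal match, not regex)
df["TotalCharges"] = df["TotalCharges"].str.replace("$", "", regex=False).str.replace(",", "", regex=False)
# Convert to numbers; values that still cannot be parsed become NaN
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")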
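
As a quick check of this cleaning step, the sketch below applies the same idea to a toy series. The sample strings are illustrative, and `"n/a"` is a made-up unparseable value. `regex=False` is passed explicitly so that `$` is treated as a literal character rather than a regular-expression anchor, since the default for `regex` has differed between pandas versions.

```python
import pandas as pd

# Toy currency strings (illustrative); "n/a" is a made-up unparseable entry
s = pd.Series(["$29.85", "$1,889.50", "n/a", None])

stripped = (
    s.str.replace("$", "", regex=False)   # drop the currency symbol
     .str.replace(",", "", regex=False)   # drop thousands separators
)
numeric = pd.to_numeric(stripped, errors="coerce")  # unparseable entries become NaN

print(numeric.tolist())
print(numeric.isna().sum())
```

Comparing the number of missing values before and after `pd.to_numeric` is a simple way to see how many entries were coerced to `NaN`.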

In [28]:
df['SeniorCitizen'] = df["SeniorCitizen"].map({0: 'No', 1: 'Yes'},na_action='ignore')
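
When recoding with `map`, any value missing from the dictionary silently becomes `NaN`. The sketch below shows one possible check for such values. The series and the invalid code `2` are hypothetical.

```python
import pandas as pd

# Hypothetical codes; 2 is a deliberately invalid value
senior = pd.Series([0, 1, 0, 2])

mapped = senior.map({0: "No", 1: "Yes"})           # values not in the dict become NaN
unmapped = senior[mapped.isna() & senior.notna()]  # original values that were lost

print(mapped.tolist())
print(unmapped.tolist())
```

An empty `unmapped` result would suggest that the dictionary covers every code present in the column.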

In [29]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   customerID        7043 non-null   object 
 1   gender            7043 non-null   object 
 2   SeniorCitizen     7043 non-null   object 
 3   Partner           6948 non-null   object 
 4   Dependents        6948 non-null   object 
 5   tenure            6948 non-null   float64
 6   PhoneService      6948 non-null   object 
 7   MultipleLines     6948 non-null   object 
 8   InternetService   6948 non-null   object 
 9   OnlineSecurity    6948 non-null   object 
 10  OnlineBackup      6948 non-null   object 
 11  DeviceProtection  6948 non-null   object 
 12  TechSupport       410 non-null    object 
 13  StreamingTV       6948 non-null   object 
 14  StreamingMovies   6948 non-null   object 
 15  Contract          6948 non-null   object 
 16  PaperlessBilling  6948 non-null   object 


# <a name='valid' >Validate Assumption</a>

In [9]:
df["Contract"].nunique()

4
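
The data description lists three contract terms, so a count of 4 points to an extra category. The unique-value listing further down shows it is the `'Naaaa'` placeholder. The sketch below, on a toy series, shows how `nunique` behaves once the placeholder is turned into a missing value. Using `mask` here is an illustrative choice, not necessarily how the notebook handles it.

```python
import pandas as pd

# Toy series with the three documented contract terms plus the placeholder
contract = pd.Series(["Month-to-month", "One year", "Two year", "Naaaa"])

print(contract.nunique())                # 4: the placeholder counts as a category

cleaned = contract.mask(contract == "Naaaa")  # replace the placeholder with NaN
print(cleaned.nunique())                 # 3: NaN is excluded by default
print(cleaned.nunique(dropna=False))     # 4: NaN counted when dropna=False
```

Loading the file with `na_values="Naaaa"`, as in the Load Data cell, would have the same effect on this count.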

In [12]:
for col in df.columns.to_list():
    num_uniq = df[col].nunique()
    if num_uniq <= 10:
        uniq_list = df[col].unique()
    else:
        uniq_list = "more than 10"
    print(col, ":", num_uniq, ",", uniq_list)

customerID : 7043 , more than 10
gender : 2 , ['Female' 'Male']
SeniorCitizen : 2 , [0 1]
Partner : 3 , ['Yes' 'No' 'Naaaa']
Dependents : 3 , ['No' 'Yes' 'Naaaa']
tenure : 74 , more than 10
PhoneService : 3 , ['No' 'Yes' 'Naaaa']
MultipleLines : 4 , ['No phone service' 'No' 'Yes' 'Naaaa']
InternetService : 4 , ['DSL' 'Fiber optic' 'No' 'Naaaa']
OnlineSecurity : 4 , ['No' 'Yes' 'No internet service' 'Naaaa']
OnlineBackup : 4 , ['Yes' 'No' 'No internet service' 'Naaaa']
DeviceProtection : 4 , ['No' 'Yes' 'No internet service' 'Naaaa']
TechSupport : 4 , ['Naaaa' 'No' 'No internet service' 'Yes']
StreamingTV : 4 , ['No' 'Yes' 'No internet service' 'Naaaa']
StreamingMovies : 4 , ['No' 'Yes' 'No internet service' 'Naaaa']
Contract : 4 , ['Month-to-month' 'One year' 'Two year' 'Naaaa']
PaperlessBilling : 3 , ['Yes' 'No' 'Naaaa']
PaymentMethod : 5 , ['Electronic check' 'Mailed check' 'Bank transfer (automatic)'
 'Credit card (automatic)' 'Naaaa']
MonthlyCharges : 1581 , more than 10
TotalCharg