<a href="https://www.kaggle.com/code/hssanshahid/telco-customer-churn?scriptVersionId=131362634" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

<div style="padding:20px;
            color:white;
            margin:10;
            font-size:170%;
            text-align:left;
            display:fill;
            border-radius:5px;
            background-color:#64A36F;
            overflow:hidden;
            font-weight:700;
            font-weight:bold;"><span style='color:#CDA63A'>|</span> Introduction</div>

Telecom companies need to acquire new customers while keeping churn low, because every lost customer is costly. The challenge is twofold: predict accurately whether an individual customer will churn, and identify the main factors that drive churn.

<div style="padding:20px;
            color:white;
            margin:10;
            font-size:170%;
            text-align:left;
            display:fill;
            border-radius:5px;
            background-color:#64A36F;
            overflow:hidden;
            font-weight:700"><span style='color:#CDA63A'>|</span> Table of Contents</div>


<a id="toc"></a>
- [1. Set-up](#1)
    - [1.1 Import Libraries](#1.1)
    - [1.2 Import Data](#1.2)
    - [1.3 Understanding data set characteristics](#1.3)
    - [1.4 Identifying dataset attributes](#1.4)
- [2. Data preprocessing](#2)
    - [2.1 Dealing with missing values](#2.1)
    - [2.2 Data type (categorical data)](#2.2)
    - [2.3 Label Encoding](#2.3)
- [3. Data Visualization](#3)
    - [Matplotlib](#3.1)
    - [Plotly](#3.2)
- [4. Models](#4)
    - [Applying Models](#4.1)
    - [Using Flask](#4.2)

<div style="padding:20px;
            color:white;
            margin:10;
            font-size:170%;
            text-align:left;
            display:fill;
            border-radius:5px;
            background-color:#64A36F;
            overflow:hidden;
            font-weight:700;
            font-weight:bold;"><span style='color:#CDA63A'>|</span> Set-Up</div>

<a id="1.1"></a>
## <b>1.1 <span>Import Libraries</span></b> 

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns

<a id="1.2"></a>
## <b>1.2 <span>Import Data</span></b> 

In [2]:
df_ = pd.read_csv("/kaggle/input/telco-customer-churn/WA_Fn-UseC_-Telco-Customer-Churn.csv")
df = df_.copy()

<a id="1.3"></a>
## <b>1.3 <span>Understanding data set characteristics</span></b> 

Each row of the dataset represents one customer, and the columns describe that customer. The columns fall into four groups:

* Churn: A single column indicating whether the customer terminated their service within the past month.

* Services: The services each customer has signed up for, including phone, multiple lines, internet, online security, online backup, device protection, tech support, and streaming TV and movies.

* Customer account information: Details of the customer's account, such as tenure, contract type, payment method, paperless billing preference, monthly charges, and total charges.

* Demographic information: The customer's demographic characteristics, such as gender, age range, and whether they have partners and dependents.
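
The grouping above can be written down as a small lookup, which makes it easy to select related columns later. This is only a sketch: the two-row `sample` frame is a hypothetical stand-in built from the first rows shown in this notebook, and the group names in `column_groups` are illustrative labels, not part of the dataset.

```python
import pandas as pd

# Hypothetical two-row sample holding a subset of the dataset's columns.
sample = pd.DataFrame({
    "customerID": ["7590-VHVEG", "5575-GNVDE"],
    "gender": ["Female", "Male"],
    "SeniorCitizen": [0, 0],
    "Partner": ["Yes", "No"],
    "tenure": [1, 34],
    "PhoneService": ["No", "Yes"],
    "InternetService": ["DSL", "DSL"],
    "Contract": ["Month-to-month", "One year"],
    "MonthlyCharges": [29.85, 56.95],
    "Churn": ["No", "No"],
})

# Illustrative grouping; the group names are assumptions made for this sketch.
column_groups = {
    "target": ["Churn"],
    "services": ["PhoneService", "InternetService"],
    "account": ["tenure", "Contract", "MonthlyCharges"],
    "demographics": ["gender", "SeniorCitizen", "Partner"],
}

# Show which columns of each group are present in the frame.
for group, cols in column_groups.items():
    present = [c for c in cols if c in sample.columns]
    print(group, "->", present)

# Columns that belong to no group (here only the identifier).
grouped = {c for cols in column_groups.values() for c in cols}
ungrouped = [c for c in sample.columns if c not in grouped]
print("ungrouped ->", ungrouped)
```

On the full data, the same dictionary could be extended with the remaining service and account columns and used as `df[column_groups["services"]]`.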

<a id="1.4"></a>
## <b>1.4 <span>Identifying dataset attributes</span></b> 

<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.15.3/css/all.min.css" />

<table>
  <tr>
    <th>Attribute</th>
    <th>Icon</th>
    <th>Description</th>
  </tr>
  <tr>
    <td>customerID</td>
    <td><i class="fas fa-id-card"></i></td>
    <td>Customer ID</td>
  </tr>
  <tr>
    <td>gender</td>
    <td><i class="fas fa-venus-mars"></i></td>
    <td>Whether the customer is a male or a female</td>
  </tr>
  <tr>
    <td>SeniorCitizen</td>
    <td><i class="fas fa-user-alt"></i></td>
    <td>Whether the customer is a senior citizen (1, 0)</td>
  </tr>
  <tr>
    <td>Partner</td>
    <td><i class="fas fa-users"></i></td>
    <td>Whether the customer has a partner (Yes, No)</td>
  </tr>
  <tr>
    <td>Dependents</td>
    <td><i class="fas fa-child"></i></td>
    <td>Whether the customer has dependents (Yes, No)</td>
  </tr>
  <tr>
    <td>tenure</td>
    <td><i class="fas fa-hourglass-half"></i></td>
    <td>Number of months the customer has stayed with the company</td>
  </tr>
  <tr>
    <td>PhoneService</td>
    <td><i class="fas fa-phone"></i></td>
    <td>Whether the customer has a phone service (Yes, No)</td>
  </tr>
  <tr>
    <td>MultipleLines</td>
    <td><i class="fas fa-project-diagram"></i></td>
    <td>Whether the customer has multiple lines (Yes, No, No phone service)</td>
  </tr>
  <tr>
    <td>InternetService</td>
    <td><i class="fas fa-wifi"></i></td>
    <td>Customer’s internet service provider (DSL, Fiber optic, No)</td>
  </tr>
  <tr>
    <td>OnlineSecurity</td>
    <td><i class="fas fa-shield-alt"></i></td>
    <td>Whether the customer has online security (Yes, No, No internet service)</td>
  </tr>
  <tr>
    <td>OnlineBackup</td>
    <td><i class="fas fa-hdd"></i></td>
    <td>Whether the customer has online backup or not (Yes, No, No internet service)</td>
  </tr>
  <tr>
    <td>DeviceProtection</td>
    <td><i class="fas fa-shield-virus"></i></td>
    <td>Whether the customer has device protection (Yes, No, No internet service)</td>
  </tr>
  <tr>
    <td>TechSupport</td>
    <td><i class="fas fa-headset"></i></td>
    <td>Whether the customer has tech support (Yes, No, No internet service)</td>
  </tr>
  <tr>
    <td>StreamingTV</td>
    <td><i class="fas fa-tv"></i></td>
    <td>Whether the customer has streaming TV service (Yes, No, No internet service)</td>
  </tr>
  <tr>
    <td>StreamingMovies</td>
    <td><i class="fas fa-film"></i></td>
    <td>Whether the customer has streaming movies service (Yes, No, No internet service)</td>
  </tr>
  <tr>
    <td>Contract</td>
    <td><i class="fas fa-file-contract"></i></td>
    <td>Indicates the type of contract (Month-to-month, One year, Two year)</td>
  </tr>
  <tr>
    <td>PaperlessBilling</td>
    <td><i class="fas fa-file-invoice-dollar"></i></td>
    <td>Whether the customer has paperless billing (Yes, No)</td>
  </tr>
  <tr>
    <td>PaymentMethod</td>
    <td><i class="fas fa-credit-card"></i></td>
    <td>Indicates the payment method (Electronic check, Mailed check, Bank transfer (automatic), Credit card (automatic))</td>
  </tr>
  <tr>
    <td>MonthlyCharges</td>
    <td><i class="fas fa-dollar-sign"></i></td>
    <td>Indicates the current monthly subscription cost of the customer</td>
  </tr>
  <tr>
    <td>TotalCharges</td>
    <td><i class="fas fa-dollar-sign"></i></td>
    <td>Indicates the total charges paid by the customer so far</td>
  </tr>
  <tr>
    <td>Churn</td>
    <td><i class="fas fa-sign-out-alt"></i></td>
    <td>Indicates whether the customer churned</td>
  </tr>
</table>
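
The value lists in the table can double as a quick sanity check on the data. The sketch below is one possible way to do that, not something the notebook itself runs: `documented_values` copies a few rows of the table, `unexpected_values` is a hypothetical helper, and `sample` is a made-up frame with one deliberately wrong entry.

```python
import pandas as pd

# Allowed values copied from the attribute table (subset of columns).
documented_values = {
    "SeniorCitizen": {0, 1},
    "Partner": {"Yes", "No"},
    "MultipleLines": {"Yes", "No", "No phone service"},
    "InternetService": {"DSL", "Fiber optic", "No"},
    "Contract": {"Month-to-month", "One year", "Two year"},
}

def unexpected_values(frame, allowed):
    """Return {column: set of observed values that are not in `allowed`}."""
    problems = {}
    for col, values in allowed.items():
        if col not in frame.columns:
            continue
        observed = set(frame[col].dropna().unique().tolist())
        extra = observed - values
        if extra:
            problems[col] = extra
    return problems

# Made-up sample; "Monthly" is a deliberate error to trigger the check.
sample = pd.DataFrame({
    "SeniorCitizen": [0, 1],
    "Partner": ["Yes", "No"],
    "MultipleLines": ["No phone service", "Yes"],
    "InternetService": ["DSL", "Fiber optic"],
    "Contract": ["Month-to-month", "Monthly"],
})

report = unexpected_values(sample, documented_values)
print(report)
```

An empty dictionary would mean every checked column contains only documented values.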


<div style="padding:20px;
            color:white;
            margin:10;
            font-size:170%;
            text-align:left;
            display:fill;
            border-radius:5px;
            background-color:#64A36F;
            overflow:hidden;
            font-weight:700;
            font-weight:bold;"><span style='color:#CDA63A'>|</span> Exploring Data Set</div>

In [3]:
df.head()

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,7590-VHVEG,Female,0,Yes,No,1,No,No phone service,DSL,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,29.85,29.85,No
1,5575-GNVDE,Male,0,No,No,34,Yes,No,DSL,Yes,...,Yes,No,No,No,One year,No,Mailed check,56.95,1889.5,No
2,3668-QPYBK,Male,0,No,No,2,Yes,No,DSL,Yes,...,No,No,No,No,Month-to-month,Yes,Mailed check,53.85,108.15,Yes
3,7795-CFOCW,Male,0,No,No,45,No,No phone service,DSL,Yes,...,Yes,Yes,No,No,One year,No,Bank transfer (automatic),42.3,1840.75,No
4,9237-HQITU,Female,0,No,No,2,Yes,No,Fiber optic,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,70.7,151.65,Yes


In [4]:
df.tail()

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
7038,6840-RESVB,Male,0,Yes,Yes,24,Yes,Yes,DSL,Yes,...,Yes,Yes,Yes,Yes,One year,Yes,Mailed check,84.8,1990.5,No
7039,2234-XADUH,Female,0,Yes,Yes,72,Yes,Yes,Fiber optic,No,...,Yes,No,Yes,Yes,One year,Yes,Credit card (automatic),103.2,7362.9,No
7040,4801-JZAZL,Female,0,Yes,Yes,11,No,No phone service,DSL,Yes,...,No,No,No,No,Month-to-month,Yes,Electronic check,29.6,346.45,No
7041,8361-LTMKD,Male,1,Yes,No,4,Yes,Yes,Fiber optic,No,...,No,No,No,No,Month-to-month,Yes,Mailed check,74.4,306.6,Yes
7042,3186-AJIEK,Male,0,No,No,66,Yes,No,Fiber optic,Yes,...,Yes,Yes,Yes,Yes,Two year,Yes,Bank transfer (automatic),105.65,6844.5,No


In [5]:
# Display a specific column
df.customerID

0       7590-VHVEG
1       5575-GNVDE
2       3668-QPYBK
3       7795-CFOCW
4       9237-HQITU
           ...    
7038    6840-RESVB
7039    2234-XADUH
7040    4801-JZAZL
7041    8361-LTMKD
7042    3186-AJIEK
Name: customerID, Length: 7043, dtype: object

In [6]:
# Display all column names
df.columns

Index(['customerID', 'gender', 'SeniorCitizen', 'Partner', 'Dependents',
       'tenure', 'PhoneService', 'MultipleLines', 'InternetService',
       'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport',
       'StreamingTV', 'StreamingMovies', 'Contract', 'PaperlessBilling',
       'PaymentMethod', 'MonthlyCharges', 'TotalCharges', 'Churn'],
      dtype='object')

In [7]:
# Drop a column (returns a new DataFrame; df itself is left unchanged)
df.drop('customerID', axis=1)

Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,Female,0,Yes,No,1,No,No phone service,DSL,No,Yes,No,No,No,No,Month-to-month,Yes,Electronic check,29.85,29.85,No
1,Male,0,No,No,34,Yes,No,DSL,Yes,No,Yes,No,No,No,One year,No,Mailed check,56.95,1889.5,No
2,Male,0,No,No,2,Yes,No,DSL,Yes,Yes,No,No,No,No,Month-to-month,Yes,Mailed check,53.85,108.15,Yes
3,Male,0,No,No,45,No,No phone service,DSL,Yes,No,Yes,Yes,No,No,One year,No,Bank transfer (automatic),42.30,1840.75,No
4,Female,0,No,No,2,Yes,No,Fiber optic,No,No,No,No,No,No,Month-to-month,Yes,Electronic check,70.70,151.65,Yes
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7038,Male,0,Yes,Yes,24,Yes,Yes,DSL,Yes,No,Yes,Yes,Yes,Yes,One year,Yes,Mailed check,84.80,1990.5,No
7039,Female,0,Yes,Yes,72,Yes,Yes,Fiber optic,No,Yes,Yes,No,Yes,Yes,One year,Yes,Credit card (automatic),103.20,7362.9,No
7040,Female,0,Yes,Yes,11,No,No phone service,DSL,Yes,No,No,No,No,No,Month-to-month,Yes,Electronic check,29.60,346.45,No
7041,Male,1,Yes,No,4,Yes,Yes,Fiber optic,No,No,No,No,No,No,Month-to-month,Yes,Mailed check,74.40,306.6,Yes
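
Note that `drop` returns a new DataFrame rather than modifying the original, which is why `customerID` still appears in later cells. A minimal sketch on a hypothetical three-column `sample` frame:

```python
import pandas as pd

# Hypothetical sample with a few of the dataset's columns.
sample = pd.DataFrame({
    "customerID": ["7590-VHVEG", "5575-GNVDE"],
    "tenure": [1, 34],
    "Churn": ["No", "No"],
})

# drop() returns a new DataFrame; `sample` keeps its customerID column.
without_id = sample.drop(columns="customerID")

print(list(sample.columns))
print(list(without_id.columns))
```

To make the removal permanent, assign the result back, for example `df = df.drop(columns="customerID")`.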


<a id="2.1"></a>
## <b>2.1 <span>Dealing with missing values</span></b> 

In [8]:
# Check every cell for missing values (True marks a missing entry)
print(df.isnull())

      customerID  gender  SeniorCitizen  Partner  Dependents  tenure  \
0          False   False          False    False       False   False   
1          False   False          False    False       False   False   
2          False   False          False    False       False   False   
3          False   False          False    False       False   False   
4          False   False          False    False       False   False   
...          ...     ...            ...      ...         ...     ...   
7038       False   False          False    False       False   False   
7039       False   False          False    False       False   False   
7040       False   False          False    False       False   False   
7041       False   False          False    False       False   False   
7042       False   False          False    False       False   False   

      PhoneService  MultipleLines  InternetService  OnlineSecurity  ...  \
0            False          False            False          
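
Printing the full boolean mask is hard to scan on several thousand rows. A more compact per-row check is sketched below; the `sample` frame and its missing entries are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up sample with two incomplete rows.
sample = pd.DataFrame({
    "tenure": [1, 34, np.nan],
    "MonthlyCharges": [29.85, np.nan, 53.85],
    "Churn": ["No", "No", "Yes"],
})

# True for every row that has at least one missing value.
rows_with_missing = sample.isnull().any(axis=1)

# Count the incomplete rows and pull them out for inspection.
n_incomplete = int(rows_with_missing.sum())
incomplete_rows = sample[rows_with_missing]

print(n_incomplete)
print(incomplete_rows)
```

The same two lines applied to `df` would list only the rows that need attention.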

In [9]:
# customerID is a string identifier, so a mean-based fill does not apply to it.
# Count the missing values in each column instead.
null_values = df.isnull().sum()
print(null_values)

customerID          0
gender              0
SeniorCitizen       0
Partner             0
Dependents          0
tenure              0
PhoneService        0
MultipleLines       0
InternetService     0
OnlineSecurity      0
OnlineBackup        0
DeviceProtection    0
TechSupport         0
StreamingTV         0
StreamingMovies     0
Contract            0
PaperlessBilling    0
PaymentMethod       0
MonthlyCharges      0
TotalCharges        0
Churn               0
dtype: int64
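
A count of zero only means that pandas found no `NaN` entries. Values stored as blank strings are not counted by `isnull()`. Whether the file loaded here contains such entries is not shown in the output above, so treat the following as a sketch under that assumption: the `sample` frame is hypothetical and places a blank string in `TotalCharges` to show how `pd.to_numeric` with `errors="coerce"` exposes it.

```python
import pandas as pd

# Hypothetical sample: the second TotalCharges entry is a blank string.
sample = pd.DataFrame({
    "tenure": [1, 0, 2],
    "MonthlyCharges": [29.85, 52.55, 53.85],
    "TotalCharges": ["29.85", " ", "108.15"],
})

# isnull() sees a non-null string, so it reports nothing missing.
before = int(sample["TotalCharges"].isnull().sum())

# errors="coerce" turns entries that cannot be parsed as numbers into NaN.
sample["TotalCharges"] = pd.to_numeric(sample["TotalCharges"], errors="coerce")
after = int(sample["TotalCharges"].isnull().sum())

print(before, after)
```

Checking `df.dtypes` first is a cheap way to spot this: a numeric column that is stored as text may contain such blanks.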


In [10]:
print(df.isnull())

      customerID  gender  SeniorCitizen  Partner  Dependents  tenure  \
0          False   False          False    False       False   False   
1          False   False          False    False       False   False   
2          False   False          False    False       False   False   
3          False   False          False    False       False   False   
4          False   False          False    False       False   False   
...          ...     ...            ...      ...         ...     ...   
7038       False   False          False    False       False   False   
7039       False   False          False    False       False   False   
7040       False   False          False    False       False   False   
7041       False   False          False    False       False   False   
7042       False   False          False    False       False   False   

      PhoneService  MultipleLines  InternetService  OnlineSecurity  ...  \
0            False          False            False          