## CUSTOMER CHURN PREDICTION - Vodafone Corporation

### Introduction:
Customer churn prediction is a critical task for businesses in subscription-based industries such as telecommunications, software as a service (SaaS), and retail. Churn occurs when customers end their relationship with a company by unsubscribing or otherwise ceasing to use its services. Identifying and understanding the factors that contribute to churn lets a business take proactive measures to retain customers and maximize revenue.

### Business Understanding:
#### **Project Scenario**

In today's world of machine learning, many companies build classification models to perform churn analysis on their customers. We have been tasked with creating a prediction model for a telecommunications company to predict whether a customer will churn. We are also to help the company understand its data and identify the factors that affect the rate at which customers stop using its network.

#### **Hypothesis**
Null hypothesis: There is a dominant attribute that affects churn.

Alternative hypothesis: There is no dominant attribute that affects churn.

#### **Analytical Questions**
 
1. What is the overall churn rate?
2. How do the Boolean attributes affect churn?
3. How do the non-Boolean attributes affect churn?
4. How do the numerical attributes affect churn?
5. How does tenure affect churn?
6. How does the type of internet service affect churn?
7. How does the type of contract affect churn?
8. How does the payment method affect churn?

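Most of these questions reduce to computing a churn proportion, either overall or per group. The following is a minimal sketch with pandas, assuming a Boolean `Churn` column. The small frame `toy` is made up for illustration and does not come from the project data.

```python
import pandas as pd

# Hypothetical toy data; not taken from the project datasets.
toy = pd.DataFrame({
    "Contract": ["Month-to-month", "Month-to-month", "One year", "Two year"],
    "Churn": [True, False, False, False],
})

# Overall churn rate: the mean of a Boolean column is the share of True values.
overall_churn_rate = toy["Churn"].mean()

# Churn rate per contract type (question 7); other groupings work the same way.
churn_by_contract = toy.groupby("Contract")["Churn"].mean()
```

Swapping `"Contract"` for another column name gives the per-group rate for that attribute.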
### Data Understanding:

The first dataset consists of the first 3000 records of the company's customer churn data. It is stored in a remote Microsoft SQL Server database. Accessing it requires a connection established through an Open Database Connectivity (ODBC) library such as pyodbc, or an Object-Relational Mapping (ORM) library such as SQLAlchemy, after which the database can be queried and the records retrieved.

The second dataset is a CSV file with 2000 records, and the third dataset is the test dataset.

The following describes the columns present in the data.

| Variable         | Description                                       | Data Type | Possible Values                          |
|------------------|---------------------------------------------------|-----------|------------------------------------------|
| customerID       | Unique identifier for each customer               | String    |                                          |
| gender           | Gender of the customer                            | String    | 'Male', 'Female'                        |
| SeniorCitizen    | Indicates if the customer is a senior citizen     | Boolean   | True (1), False (0)                     |
| Partner          | Indicates if the customer has a partner           | Boolean   | True (1), False (0)                     |
| Dependents       | Indicates if the customer has dependents          | Boolean   | True (1), False (0)                     |
| tenure           | Number of months the customer has been with the company | Integer |                                          |
| PhoneService     | Indicates if the customer has phone service       | Boolean   | True (1), False (0)                     |
| MultipleLines    | Indicates if the customer has multiple lines      | String    | 'Yes', 'No', 'No phone service'        |
| InternetService  | Type of internet service                          | String    | 'DSL', 'Fiber optic', 'No'             |
| OnlineSecurity   | Indicates if the customer has online security     | String    | 'Yes', 'No', 'No internet service'     |
| DeviceProtection | Indicates if the customer has device protection   | String    | 'Yes', 'No', 'No internet service'     |
| TechSupport      | Indicates if the customer has tech support        | String    | 'Yes', 'No', 'No internet service'     |
| StreamingTV      | Indicates if the customer has streaming TV        | String    | 'Yes', 'No', 'No internet service'     |
| StreamingMovies  | Indicates if the customer has streaming movies    | String    | 'Yes', 'No', 'No internet service'     |
| Contract         | Type of contract                                  | String    | 'Month-to-month', 'One year', 'Two year' |
| PaperlessBilling | Indicates if the customer has paperless billing  | Boolean   | True (1), False (0)                     |
| PaymentMethod    | Payment method                                    | String    | 'Electronic check', 'Mailed check', 'Bank transfer (automatic)', 'Credit card (automatic)' |
| MonthlyCharges   | Monthly charges                                   | Float     |                                          |
| TotalCharges     | Total charges                                     | Float     |                                          |
| Churn            | Indicates if the customer churned                | Boolean   | True (1), False (0)                     |

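The "Possible Values" column can double as a quick sanity check once the data is loaded. The sketch below is one possible way to do that, not part of the original workflow: the helper `find_unexpected_values` and the frame `sample` are hypothetical, and only two columns from the table are included to keep it short.

```python
import pandas as pd

# Allowed values copied from the data dictionary above (subset shown).
ALLOWED_VALUES = {
    "InternetService": {"DSL", "Fiber optic", "No"},
    "Contract": {"Month-to-month", "One year", "Two year"},
}

def find_unexpected_values(df, allowed):
    """Return {column: set of observed values not listed in `allowed`}."""
    unexpected = {}
    for column, valid in allowed.items():
        if column not in df.columns:
            continue  # column absent from this frame; nothing to check
        observed = set(df[column].dropna().unique())
        extra = observed - valid
        if extra:
            unexpected[column] = extra
    return unexpected

# Hypothetical rows for illustration only; "fiber" is a deliberate mismatch.
sample = pd.DataFrame({
    "InternetService": ["DSL", "Fiber optic", "fiber"],
    "Contract": ["One year", "Two year", "Month-to-month"],
})
issues = find_unexpected_values(sample, ALLOWED_VALUES)
```

An empty result means every checked column contains only values listed in the table.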
#### **Data Connection**

In [6]:
# Import the necessary packages

# Data handling
import pandas as pd
import numpy as np

# For creating the database connection
import pyodbc

# For loading environment variables
from dotenv import dotenv_values
import dotenv


# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Other packages
import os
import warnings
warnings.filterwarnings('ignore')

In [7]:
# Load environment variables from .env file into a dictionary
environment_variables = dotenv_values('.env')

In [8]:
# Get the values for the credentials you set in the '.env' file
server = environment_variables.get("SERVER")
database = environment_variables.get("DATABASE")
username = environment_variables.get("USERNAME")
password = environment_variables.get("PASSWORD")

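Because `.get()` returns `None` for any key missing from the `.env` file, a missing credential would only surface later as a driver error. A small guard can catch this earlier. This is a sketch under the assumption that all four keys are required; the helper name `require_settings` and the placeholder values are hypothetical.

```python
REQUIRED_KEYS = ("SERVER", "DATABASE", "USERNAME", "PASSWORD")

def require_settings(settings, required=REQUIRED_KEYS):
    """Return the required settings, or raise if any is missing or empty."""
    missing = [key for key in required if not settings.get(key)]
    if missing:
        raise KeyError(f"Missing values in .env: {', '.join(missing)}")
    return {key: settings[key] for key in required}

# Stand-in for the dictionary returned by dotenv_values('.env');
# the values are placeholders, not real credentials.
example_settings = {
    "SERVER": "example-server",
    "DATABASE": "example-db",
    "USERNAME": "example-user",
    "PASSWORD": "example-password",
}
checked = require_settings(example_settings)
```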
In [9]:
# Create a connection string for the SQL Server database
connection_string = f"DRIVER={{SQL Server}};SERVER={server};DATABASE={database};UID={username};PWD={password};MARS_Connection=yes;MinProtocolVersion=TLSv1.2;"

In [10]:
# Create the connection using pyodbc
connection = pyodbc.connect(connection_string)

In [11]:
# Load the first dataset from the database
query = "Select * from dbo.LP2_Telco_churn_first_3000"

data1 = pd.read_sql(query, connection)
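
The query-then-load pattern can be tried without access to the SQL Server instance. The sketch below uses an in-memory SQLite database as a stand-in, an assumption made only so the example runs anywhere. The table name `telco_sample` and its rows are made up. `contextlib.closing` ensures the connection is closed once the data has been loaded.

```python
import sqlite3
from contextlib import closing

import pandas as pd

with closing(sqlite3.connect(":memory:")) as demo_connection:
    # Create and fill a tiny illustrative table.
    demo_connection.execute(
        "CREATE TABLE telco_sample (customerID TEXT, tenure INTEGER, Churn INTEGER)"
    )
    demo_connection.executemany(
        "INSERT INTO telco_sample VALUES (?, ?, ?)",
        [("A-001", 1, 1), ("A-002", 34, 0), ("A-003", 2, 1)],
    )
    # Same call shape as pd.read_sql(query, connection) with pyodbc.
    demo_data = pd.read_sql("SELECT * FROM telco_sample", demo_connection)

# Basic shape check on the loaded frame.
row_count, column_count = demo_data.shape
```

Checking the shape and column names right after loading is a cheap way to confirm the query returned what was expected.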