## Customer Churn Prediction



## `Business Understanding`



**Business Objective:** Vodafone Corporation's main aim is to reduce customer attrition, a common problem in many industries and particularly in telecommunications. The goal is to predict the probability that a customer will discontinue their service, identify the main factors that drive attrition, and devise strategies to retain customers.

**Current Situation:** Vodafone currently holds customer data provided by the business development unit and the marketing and sales team. This data will be used to build machine learning models that forecast customer attrition.

**Data Mining Goals:** The data mining goal is to build a classification model that predicts whether a customer is likely to discontinue their service. This is a supervised learning problem: the model learns from the labelled customer data and makes predictions on new data. The model should also identify the key attributes or indicators that lead to customer attrition.

**Project Plan:** The project covers data preparation, selection of appropriate machine learning algorithms, model training and testing, evaluation of model performance, and finally deployment of the model to predict customer attrition. The key indicators identified by the model can then be used to formulate effective customer retention strategies.

### Hypotheses

**Hypothesis 1:** Customers with higher tenure are more likely to churn.

**Hypothesis 2:** Customers with higher monthly charges are more likely to churn.

**Hypothesis 3:** Customers with higher total charges are more likely to churn.

**Null Hypothesis (H0):** There is no relationship between customers having a bad experience with the service (such as poor network quality or customer service) and the likelihood of them churning. In other words, the churn rate is the same for customers regardless of their experience with the service.

**Alternative Hypothesis (H1):** Customers who have had a bad experience with the service (such as poor network quality or customer service) are more likely to churn. In other words, the churn rate is higher for customers who have had a bad experience with the service.

**Null Hypothesis (H0):** Billing issues or finding the service too expensive compared to competitors has no effect on a customer’s decision to churn. In other words, customers with these concerns are just as likely to stay as those without these concerns.

**Alternative Hypothesis (H1):** Customers who have had billing issues or find the service too expensive compared to competitors are more likely to churn. 

**Null Hypothesis (H0):** There is no significant correlation between bad service or customer satisfaction and customer churn.

**Alternative Hypothesis (H1):** There is a significant correlation between bad service or customer satisfaction and customer churn.
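
One common way to test null/alternative pairs like the ones above is a chi-square test of independence on a contingency table. The following is a minimal sketch under stated assumptions: the counts are invented purely for illustration (they are not taken from the Vodafone data), the function name `chi_square_statistic` is hypothetical, and the decision uses the standard critical value 3.841 for one degree of freedom at a 5% significance level.

```python
# Hypothetical 2x2 contingency table (counts are invented for illustration only):
# rows = had a billing issue (yes / no), columns = churned (yes / no)
observed = [
    [30, 70],   # billing issue: 30 churned, 70 stayed
    [20, 180],  # no billing issue: 20 churned, 180 stayed
]


def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count if the row and column variables were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - expected) ** 2 / expected
    return statistic


# Chi-square critical value for 1 degree of freedom at alpha = 0.05
CRITICAL_VALUE_DF1_ALPHA_05 = 3.841

statistic = chi_square_statistic(observed)
reject_h0 = statistic > CRITICAL_VALUE_DF1_ALPHA_05
print(f"chi-square = {statistic:.2f}, reject H0: {reject_h0}")
```

In the notebook itself, `scipy.stats.chi2_contingency` performs the same kind of test and also returns a p-value; note that by default it applies Yates' continuity correction to 2x2 tables, so its statistic can differ from the uncorrected value computed here.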

### Analytical Questions

**Question 1:** What are the most important factors that influence customer churn?

**Question 2:** What are the key characteristics of customers who churn? 

**Question 3:** What is the proportion of customers who churn compared to those who remain? 

**Question 4:** What is the impact of billing issues on customer churn? 

**Question 5:** Are there specific customer segments that are more likely to churn? 

**Question 6:** What is the churn rate among new customers compared to long-term customers? 

**Question 7:** What is the churn rate among customers with higher tenure?

**Question 8:** Does the frequency of top-ups/recharges correlate with customer churn? 
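
Questions 3 and 6 come down to simple proportions. The sketch below shows the arithmetic on the first five `tenure`/`Churn` pairs visible in the data preview of this notebook. It is dependency-free so the calculation is easy to follow. Two things are assumptions rather than facts from the source: that `tenure` is measured in months, and that a "new" customer is one with at most 12 months of tenure. Five rows are far too few to draw conclusions from.

```python
# First five (tenure, Churn) pairs as shown in the data preview of this notebook.
records = [
    {"tenure": 1, "Churn": False},
    {"tenure": 34, "Churn": False},
    {"tenure": 2, "Churn": True},
    {"tenure": 45, "Churn": False},
    {"tenure": 2, "Churn": True},
]

# Assumption: tenure is in months and "new" means at most 12 months
NEW_CUSTOMER_MAX_TENURE = 12


def churn_rate(rows):
    """Share of rows with Churn == True; None when there are no rows."""
    if not rows:
        return None
    return sum(row["Churn"] for row in rows) / len(rows)


new_customers = [r for r in records if r["tenure"] <= NEW_CUSTOMER_MAX_TENURE]
long_term = [r for r in records if r["tenure"] > NEW_CUSTOMER_MAX_TENURE]

overall_rate = churn_rate(records)        # Question 3: churned vs. all customers
new_rate = churn_rate(new_customers)      # Question 6: new customers
long_term_rate = churn_rate(long_term)    # Question 6: long-term customers

print(overall_rate, new_rate, long_term_rate)
```

On the full DataFrame, an expression along the lines of `data_1.groupby(data_1["tenure"] <= 12)["Churn"].mean()` should give the same comparison, provided `Churn` holds boolean values.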

## `Data Understanding`






#### Importations

In [1]:
# Packages for database access, data manipulation, and visualization

import pyodbc
from dotenv import dotenv_values
import pandas as pd
import numpy as np
import scipy.stats as stats
import seaborn as sns
import matplotlib.pyplot as plt
import os
import warnings


# Suppress all warnings to keep the notebook output readable
warnings.filterwarnings('ignore')

#### Extracting Data

In [2]:
# Load from SQL Database source

# Load environment variables from .env file into a dictionary
environment_variables = dotenv_values('.env')

# Get the values for the credentials you set in the '.env' file
database = environment_variables.get("DATABASE")
server = environment_variables.get("SERVER")
username = environment_variables.get("USERNAME")
password = environment_variables.get("PASSWORD")
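
Because `.get()` returns `None` for a key that is absent from the `.env` file, a missing credential would only surface later as a confusing connection error. A minimal sketch of an early check follows. The helper `missing_settings` and the example values are hypothetical and not part of the original notebook.

```python
REQUIRED_KEYS = ("DATABASE", "SERVER", "USERNAME", "PASSWORD")


def missing_settings(settings, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in a settings mapping."""
    return [key for key in required if not settings.get(key)]


# Hypothetical stand-in for the dictionary returned by dotenv_values('.env')
example_settings = {
    "DATABASE": "example_db",
    "SERVER": "example.server",
    "USERNAME": "analyst",
    "PASSWORD": "",  # deliberately left empty to show the check firing
}

missing = missing_settings(example_settings)
if missing:
    print(f"Missing or empty .env entries: {', '.join(missing)}")
```

In the notebook, the same function could be called on `environment_variables` before the connection string is built.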

In [7]:
# Construct the connection string (the driver name is wrapped in braces because it contains spaces)
connection_string = f"DRIVER={{ODBC Driver 17 for SQL Server}};SERVER={server};DATABASE={database};UID={username};PWD={password}"


# Connect to the database
connection = pyodbc.connect(connection_string)
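
A password that contains a character such as `;` would break a connection string assembled by plain interpolation. The sketch below shows one way to build the string more defensively. It assumes the usual ODBC convention that a value wrapped in braces is taken literally and that a closing brace inside it is escaped by doubling. The helpers `odbc_quote` and `build_connection_string` are hypothetical names, the credentials are made up, and the escaping rule should be verified against the driver documentation before relying on it.

```python
def odbc_quote(value):
    """Wrap a value in braces, doubling any closing brace, so ';' is taken literally."""
    return "{" + str(value).replace("}", "}}") + "}"


def build_connection_string(server, database, username, password,
                            driver="ODBC Driver 17 for SQL Server"):
    """Assemble an ODBC connection string with the driver and password brace-quoted."""
    parts = {
        "DRIVER": odbc_quote(driver),
        "SERVER": server,
        "DATABASE": database,
        "UID": username,
        "PWD": odbc_quote(password),
    }
    return ";".join(f"{key}={value}" for key, value in parts.items())


# Hypothetical credentials for illustration only
example = build_connection_string("example.server", "example_db", "analyst", "p;ss}word")
print(example)
```

The resulting string can be passed to `pyodbc.connect` in the same way as `connection_string` above.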

In [8]:
# SQL query to extract the data from Table1
query = "SELECT * FROM dbo.LP2_Telco_churn_first_3000"

# Execute the SQL query and load the data from Table1 into a pandas DataFrame
data_1 = pd.read_sql(query, connection)

data_1

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,7590-VHVEG,Female,False,True,False,1,False,,DSL,False,...,False,False,False,False,Month-to-month,True,Electronic check,29.850000,29.850000,False
1,5575-GNVDE,Male,False,False,False,34,True,False,DSL,True,...,True,False,False,False,One year,False,Mailed check,56.950001,1889.500000,False
2,3668-QPYBK,Male,False,False,False,2,True,False,DSL,True,...,False,False,False,False,Month-to-month,True,Mailed check,53.849998,108.150002,True
3,7795-CFOCW,Male,False,False,False,45,False,,DSL,True,...,True,True,False,False,One year,False,Bank transfer (automatic),42.299999,1840.750000,False
4,9237-HQITU,Female,False,False,False,2,True,False,Fiber optic,False,...,False,False,False,False,Month-to-month,True,Electronic check,70.699997,151.649994,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2995,2209-XADXF,Female,False,False,False,1,False,,DSL,False,...,False,False,False,False,Month-to-month,False,Bank transfer (automatic),25.250000,25.250000,False
2996,6620-JDYNW,Female,False,False,False,18,True,True,DSL,True,...,True,False,False,False,Month-to-month,True,Mailed check,60.599998,1156.349976,False
2997,1891-FZYSA,Male,True,True,False,69,True,True,Fiber optic,False,...,False,False,True,False,Month-to-month,True,Electronic check,89.949997,6143.149902,True
2998,4770-UEZOX,Male,False,False,False,2,True,False,Fiber optic,False,...,False,False,False,False,Month-to-month,True,Electronic check,74.750000,144.800003,False
