### **Customer Churn Prediction**

### **Business Understanding**

#### Customer attrition is one of the biggest costs an organization faces. Customer churn, also known as customer attrition or customer turnover, is the percentage of customers who stop using a company's product or service within a specified timeframe.

#### For instance, if you began the year with 500 customers and ended it with 480, with no new customers acquired in between, 20 customers left and the churn rate would be 20 / 500 = 4%. If we could figure out why and when customers leave with reasonable accuracy, it would greatly help the organization plan its retention initiatives.
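The 4% figure is the number of customers lost divided by the number at the start of the period. A minimal sketch of that arithmetic, assuming (as the example implicitly does) that no new customers joined during the year; `churn_rate` is a hypothetical helper name, not part of any library:

```python
def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Share of the starting customers who left during the period."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    return customers_lost / customers_at_start


# Example from the text: 500 customers at the start of the year, 480 at the end,
# assuming no new customers were acquired in between
start, end = 500, 480
lost = start - end
rate = churn_rate(start, lost)
print(f"Churn rate: {rate:.0%}")  # Churn rate: 4%
```

If new customers did join during the year, `start - end` would understate the number who left, so the count of departures should be taken directly from the records instead.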

#### In this project, we aim to estimate the likelihood of a customer leaving the organization, identify the key indicators of churn, and propose retention strategies that can be implemented to avert this problem.


##### **Problem Statement**: A telecommunications company intends to understand the lifetime value of each customer and the factors that affect the rate at which customers stop using its network. The company would like to build a model that predicts whether or not a customer will churn.

##### **Goal**: To build a machine learning model that predicts whether a customer will churn.

#### **Hypothesis**

##### **Null Hypothesis**: There is no statistically significant relationship between tenure and customer churn.
##### **Alternative Hypothesis**: There is a statistically significant relationship between tenure and customer churn.
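Because tenure is numeric and churn is binary, one way to test these hypotheses is a permutation test on the difference in mean tenure between churned and retained customers. The sketch below is an illustration under stated assumptions, not the project's chosen method: the tenure values are invented toy numbers, the significance level α = 0.05 is assumed, and `permutation_test_mean_diff` is a hypothetical helper. `scipy.stats.mannwhitneyu` or a two-sample t-test are common alternatives.

```python
import numpy as np


def permutation_test_mean_diff(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means between groups a and b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    rng = np.random.default_rng(seed)
    count = 0
    for _ in range(n_permutations):
        # Randomly reassign group labels and recompute the difference in means
        shuffled = rng.permutation(pooled)
        diff = abs(shuffled[: len(a)].mean() - shuffled[len(a):].mean())
        if diff >= observed:
            count += 1
    # Add-one correction keeps the estimated p-value strictly positive
    return observed, (count + 1) / (n_permutations + 1)


# Toy tenure values (months), invented for illustration only
tenure_churned = [1, 2, 2, 5, 8, 3, 1, 12, 4, 6]
tenure_retained = [34, 45, 20, 60, 15, 50, 28, 70, 41, 9]

diff, p_value = permutation_test_mean_diff(tenure_churned, tenure_retained)
alpha = 0.05  # assumed significance level
print(f"Mean tenure difference: {diff:.1f} months, p = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

On the real data, the two lists would come from splitting the `tenure` column by the `Churn` label; a p-value below α would lead us to reject the null hypothesis in favor of the alternative.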

#### Analytical Questions
1. How do monthly charges and total charges affect customer churn?
2. How does tech support influence the likelihood that a customer churns?
3. How does tenure affect customer churn?
4. Which factors most strongly influence the likelihood that a customer churns?
5. Which factors have the least influence on the likelihood that a customer churns?
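Questions 1 and 3 can be explored by comparing churn rates and charges across customer segments. A minimal pandas sketch, assuming a DataFrame with `tenure`, `MonthlyCharges` and a boolean `Churn` column (the values below are invented for illustration) and assuming tenure is measured in months; the tenure band edges are an arbitrary choice:

```python
import pandas as pd

# Toy data; in the notebook this would be the loaded churn DataFrame
# with Churn encoded as True/False
df = pd.DataFrame({
    "tenure": [1, 2, 5, 14, 20, 30, 45, 60, 70, 3],
    "MonthlyCharges": [70.7, 53.85, 89.1, 56.95, 42.3, 99.65, 42.3, 25.2, 110.5, 74.35],
    "Churn": [True, True, True, False, False, True, False, False, False, True],
})

# Bucket tenure into bands: (0, 12], (12, 24], (24, 48], (48, 72]
bands = pd.cut(df["tenure"], bins=[0, 12, 24, 48, 72],
               labels=["0-12", "13-24", "25-48", "49-72"])

# The mean of a boolean column within each band is that band's churn rate
churn_by_tenure = df.groupby(bands, observed=False)["Churn"].mean()
print(churn_by_tenure)

# Average monthly charge for churned vs retained customers
churned_mean = df.loc[df["Churn"], "MonthlyCharges"].mean()
retained_mean = df.loc[~df["Churn"], "MonthlyCharges"].mean()
print(f"Churned: {churned_mean:.2f}, retained: {retained_mean:.2f}")
```

Passing `observed=False` keeps every tenure band in the result even if a band happens to contain no customers.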
 

#### Project Timeline
- Modeling - 1 week
- Strategy & business insights - 1 week

## **Data Understanding**

#### **Import Packages**

```python
import warnings

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pyodbc
import seaborn as sns
from dotenv import dotenv_values

warnings.filterwarnings('ignore')
```

#### **Data Loading**

```python
# Database connection details; the password is read from a local .env file
# (the key name PASSWORD is an assumption) instead of being hard-coded here
environment_variables = dotenv_values('.env')

server = 'dap-projects-database.database.windows.net'
database = 'dapDB'
username = 'LP2_project'
password = environment_variables.get('PASSWORD')

connection_string = f"DRIVER={{SQL Server}};SERVER={server};DATABASE={database};UID={username};PWD={password};MARS_Connection=yes;MinProtocolVersion=TLSv1.2;"

# Connect to the database
connection = pyodbc.connect(connection_string)
```


```python
# Query all rows of the first dataset
query = "SELECT * FROM dbo.LP2_Telco_churn_first_3000"

data = pd.read_sql(query, connection)

# Display the first five rows
data.head(5)
```

| | customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | ... | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7590-VHVEG | Female | False | True | False | 1 | False | | DSL | False | ... | False | False | False | False | Month-to-month | True | Electronic check | 29.85 | 29.85 | False |
| 1 | 5575-GNVDE | Male | False | False | False | 34 | True | False | DSL | True | ... | True | False | False | False | One year | False | Mailed check | 56.950001 | 1889.5 | False |
| 2 | 3668-QPYBK | Male | False | False | False | 2 | True | False | DSL | True | ... | False | False | False | False | Month-to-month | True | Mailed check | 53.849998 | 108.150002 | True |
| 3 | 7795-CFOCW | Male | False | False | False | 45 | False | | DSL | True | ... | True | True | False | False | One year | False | Bank transfer (automatic) | 42.299999 | 1840.75 | False |
| 4 | 9237-HQITU | Female | False | False | False | 2 | True | False | Fiber optic | False | ... | False | False | False | False | Month-to-month | True | Electronic check | 70.699997 | 151.649994 | True |


```python
# Load the second dataset from a CSV file
second_data = pd.read_csv("C:/Users/Zawadi/Desktop/Azubi/Customer-Churn-Prediction/Data/LP2_Telco-churn-second-2000.csv")

# Display the first five rows of the data
second_data.head(5)
```


| | customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | ... | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5600-PDUJF | Male | 0 | No | No | 6 | Yes | No | DSL | No | ... | No | Yes | No | No | Month-to-month | Yes | Credit card (automatic) | 49.5 | 312.7 | No |
| 1 | 8292-TYSPY | Male | 0 | No | No | 19 | Yes | No | DSL | No | ... | Yes | Yes | No | No | Month-to-month | Yes | Credit card (automatic) | 55.0 | 1046.5 | Yes |
| 2 | 0567-XRHCU | Female | 0 | Yes | Yes | 69 | No | No phone service | DSL | Yes | ... | Yes | No | No | Yes | Two year | Yes | Credit card (automatic) | 43.95 | 2960.1 | No |
| 3 | 1867-BDVFH | Male | 0 | Yes | Yes | 11 | Yes | Yes | Fiber optic | No | ... | No | No | No | No | Month-to-month | Yes | Electronic check | 74.35 | 834.2 | Yes |
| 4 | 2067-QYTCF | Female | 0 | Yes | No | 64 | Yes | Yes | Fiber optic | No | ... | Yes | Yes | Yes | Yes | Month-to-month | Yes | Electronic check | 111.15 | 6953.4 | No |
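The two samples above encode the same columns differently. The database extract uses `True`/`False`, with an empty `MultipleLines` value where `PhoneService` is `False`. The CSV uses `Yes`/`No`, stores `SeniorCitizen` as `0`/`1`, and labels that case `No phone service`. Before the two datasets can be concatenated, one has to be converted to the other's convention. The sketch below converts the database extract to the CSV convention. The helper names are hypothetical, and the toy frames reuse a few customer IDs and columns from the rows shown. The rules are inferred only from those five rows per dataset, so the full data should be checked (for example with `value_counts()`) before relying on them.

```python
import numpy as np
import pandas as pd

# Mapping from the database's boolean encoding to the CSV's text labels
YES_NO = {True: "Yes", False: "No"}


def is_boolean_column(series: pd.Series) -> bool:
    """Return True if every non-missing value is a Python or NumPy bool."""
    values = series.dropna()
    return bool(len(values) > 0 and values.map(lambda v: isinstance(v, (bool, np.bool_))).all())


def harmonise_first_dataset(df: pd.DataFrame) -> pd.DataFrame:
    """Re-encode the database extract so it matches the CSV's conventions."""
    out = df.copy()
    # The CSV stores SeniorCitizen as 0/1 (assumes no missing values here)
    out["SeniorCitizen"] = out["SeniorCitizen"].astype(int)
    # Every other all-boolean column becomes "Yes"/"No"; missing values stay missing
    for col in out.columns:
        if col != "SeniorCitizen" and is_boolean_column(out[col]):
            out[col] = out[col].map(YES_NO)
    # In the rows shown, MultipleLines is empty where PhoneService is False,
    # whereas the CSV labels that case "No phone service"
    no_phone = out["PhoneService"].eq("No") & out["MultipleLines"].isna()
    out.loc[no_phone, "MultipleLines"] = "No phone service"
    return out


# Toy frames: a few customer IDs and columns taken from the samples above
first = pd.DataFrame({
    "customerID": ["7590-VHVEG", "5575-GNVDE"],
    "SeniorCitizen": [False, False],
    "PhoneService": [False, True],
    "MultipleLines": [None, False],
    "Churn": [False, False],
})
second = pd.DataFrame({
    "customerID": ["5600-PDUJF", "0567-XRHCU"],
    "SeniorCitizen": [0, 0],
    "PhoneService": ["Yes", "No"],
    "MultipleLines": ["No", "No phone service"],
    "Churn": ["No", "No"],
})

combined = pd.concat([harmonise_first_dataset(first), second], ignore_index=True)
print(combined)
```

Converting toward the CSV's text labels keeps the `No phone service` category distinct from a plain `No`. Other service columns may need a similar check once the full data is loaded.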
