## **Customer Churn Predictor**


### **Business Understanding**

##### **Problem Statement**
Customer retention is central to most business models because it directly affects revenue and profit margins. Many companies now use machine learning classification models to analyze customer churn. In the highly competitive telecommunications industry, retaining customers is especially important. This project involves accessing and analyzing customer churn data from multiple sources and building a robust classification model that predicts which customers are likely to churn. The objective is to help a telecommunications company understand customer churn and its impact on profitability, and to improve its retention strategies.

##### **Goal and Objectives**

•    To understand the current customer churn rate.

•    To identify the factors (such as demographics and usage patterns) that influence customer churn, in order to gain a deeper understanding of customer behavior.

•    To build a machine learning model that predicts which customers of a telecommunications company are likely to churn.

##### **Stakeholders**
•	Company Executives and Management

•	Data Science and Analytics Team

•	Customer Service and Support Teams

•	Marketing, Sales, and Advertising Teams

•	Finance Team

•	Legal and Compliance Team

##### **Key Metrics and Success Criteria**

•  Accuracy Requirement: The model must achieve an accuracy of at least 85% when evaluated on balanced data, ensuring a high proportion of correct predictions.

•  F1 Score Benchmark: Models should attain an F1 score greater than 0.80 (80%), indicating a strong balance between precision and recall, which is crucial for handling both false positives and false negatives effectively.

•  ROC Curve Standard: An area under the ROC curve (AUC) of at least 0.80 is desired, indicating that the model separates churners from non-churners well across classification thresholds and balances sensitivity and specificity.

•  Baseline Models Requirement: At least four different baseline models should be developed to serve as benchmarks. These could include logistic regression, decision trees, support vector machines, and k-nearest neighbors, providing a range of reference points for comparison.

•  Hyperparameter Tuning Condition: Hyperparameter tuning will be conducted only on baseline models that achieve an F1 score above the 0.80 threshold. This concentrates tuning effort on models that show initial promise and meet the performance criteria.
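
The criteria above can be made concrete with a small sketch. It derives accuracy, precision, recall, and F1 from confusion-matrix counts and then applies the tuning condition. The function names, the confusion-matrix counts, and the per-model F1 values are hypothetical placeholders for illustration, not results from this project.

```python
F1_THRESHOLD = 0.80  # tuning threshold stated in the success criteria


def classification_scores(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def models_to_tune(f1_by_model, threshold=F1_THRESHOLD):
    """Return the names of baseline models whose F1 is strictly above the threshold."""
    return [name for name, f1 in f1_by_model.items() if f1 > threshold]


# Hypothetical confusion-matrix counts for a single model
scores = classification_scores(tp=80, fp=20, fn=20, tn=80)

# Hypothetical baseline F1 scores for the four suggested model families
baseline_f1 = {
    "logistic_regression": 0.82,
    "decision_tree": 0.74,
    "svm": 0.81,
    "knn": 0.80,
}
selected = models_to_tune(baseline_f1)
```

In practice, `sklearn.metrics` offers `accuracy_score`, `f1_score`, and `roc_auc_score` for the same quantities. The comparison here is strict, so a model sitting exactly at 0.80 is not selected for tuning.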


##### **Hypothesis**

•  Null Hypothesis (Ho): There is no significant difference in churn rates between customers who have received promotional offers and those who have not.

•  Alternate Hypothesis (Ha): Customers who have not received any promotional offers in the past year have a higher churn rate than those who have.
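
One possible way to test a directional hypothesis like this is a one-sided two-proportion z-test, sketched below with the standard library only. The function name and the counts are hypothetical; the data previews in this notebook do not show a promotional-offer field, so the grouping itself is an assumption.

```python
import math


def two_proportion_z_test(churned_a, total_a, churned_b, total_b):
    """One-sided z-test for H1: churn rate of group A > churn rate of group B.

    Assumes both groups are non-empty and the pooled churn rate is not 0 or 1.
    """
    rate_a = churned_a / total_a
    rate_b = churned_b / total_b
    # Pooled churn rate under the assumption that both groups share one rate
    pooled = (churned_a + churned_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (rate_a - rate_b) / std_err
    # Upper-tail probability of the standard normal distribution, P(Z >= z)
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


# Hypothetical counts: group A = no promotional offer, group B = received an offer
z, p_value = two_proportion_z_test(churned_a=300, total_a=1000,
                                   churned_b=200, total_b=1000)
print(f"z = {z:.3f}, one-sided p-value = {p_value:.2e}")
```

A small p-value (for example below 0.05) would support the alternate hypothesis; a large one means the data give no evidence that the no-offer group churns more.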

     




##### **Analytical Questions**
I.	What are the key demographic and behavioral characteristics of customers who churn compared to those who stay, and how do these characteristics vary across different customer segments?

II.	What are the financial implications of churn, and how can retention strategies be optimized to maximize customer lifetime value (CLV) across different customer segments?

III.	Which factors have the highest influence on customer churn, and how do they interact with each other?

IV.	How do external factors such as economic conditions, competitor actions, and market trends influence customer churn rates?

V.	Is there a correlation between the customer’s payment method and churn rate?
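
As a starting point for question V, the churn rate can be compared across payment methods. The sketch below uses only the standard library; the column names `PaymentMethod` and `Churn`, the `"Yes"`/`"No"` encoding, and the sample rows are assumptions for illustration and are not taken from the data previews in this notebook.

```python
from collections import defaultdict


def churn_rate_by(records, group_key, churn_key="Churn"):
    """Return the share of churned customers for each value of group_key."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for row in records:
        group = row[group_key]
        totals[group] += 1
        if row[churn_key] == "Yes":
            churned[group] += 1
    return {group: churned[group] / totals[group] for group in totals}


# Hypothetical customer records
customers = [
    {"PaymentMethod": "Electronic check", "Churn": "Yes"},
    {"PaymentMethod": "Electronic check", "Churn": "Yes"},
    {"PaymentMethod": "Electronic check", "Churn": "No"},
    {"PaymentMethod": "Mailed check", "Churn": "No"},
    {"PaymentMethod": "Mailed check", "Churn": "Yes"},
    {"PaymentMethod": "Credit card", "Churn": "No"},
]
rates = churn_rate_by(customers, "PaymentMethod")
```

A difference in rates is only descriptive. Whether it reflects a real association would still need a statistical test, such as a chi-square test of independence.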

##### **Scope and Constraints**
Constraints of this project include computational resources, model complexity, time limitations, stakeholder expectations, and ethical and legal considerations.

##### **Additional Information**

This project is to be completed within 4 weeks.




### **Data Understanding**

In [1]:
# Import libraries
# Data manipulation and analysis
import pandas as pd
import numpy as np
 
# Database connectivity
import pyodbc
 
# Database ORM (optional)
from sqlalchemy import create_engine
 
# Data visualization
import matplotlib.pyplot as plt
import seaborn as sns
 
# Machine learning 
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
 


#### Database connectivity

In [2]:
# Connect to the first data source (SQL Server database)
# The SQL query used to fetch the data is defined further below.
# Define the connection string
server = 'dap-projects-database.database.windows.net'
database = 'dapDB'
username = 'LP2_project'
password = 'Stat$AndD@t@Rul3'
conn_str = f'DRIVER={{ODBC Driver 17 for SQL Server}};SERVER={server};DATABASE={database};UID={username};PWD={password}'

# Connect to the database
conn = pyodbc.connect(conn_str)

# Query the data
query = "SELECT * FROM dbo.LP2_Telco_churn_first_3000"
df = pd.read_sql(query, conn)

# Close the connection
conn.close()

# Display the data
print(df.head())



   customerID  gender  SeniorCitizen  Partner  Dependents  tenure  \
0  7590-VHVEG  Female          False     True       False       1   
1  5575-GNVDE    Male          False    False       False      34   
2  3668-QPYBK    Male          False    False       False       2   
3  7795-CFOCW    Male          False    False       False      45   
4  9237-HQITU  Female          False    False       False       2   

   PhoneService MultipleLines InternetService OnlineSecurity  ...  \
0         False          None             DSL          False  ...   
1          True         False             DSL           True  ...   
2          True         False             DSL           True  ...   
3         False          None             DSL           True  ...   
4          True         False     Fiber optic          False  ...   

  DeviceProtection TechSupport StreamingTV StreamingMovies        Contract  \
0            False       False       False           False  Month-to-month   
1             
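
The connection cell above keeps the credentials inline. A common alternative is to read them from environment variables so they stay out of the notebook. The sketch below shows one way to build the same ODBC connection string; the function name and the environment variable names (`CHURN_DB_SERVER` and so on) are hypothetical choices, not part of the original setup.

```python
import os


def build_conn_str(env=os.environ):
    """Build an ODBC connection string from environment variables.

    The variable names used here are hypothetical; a missing one raises KeyError.
    """
    server = env["CHURN_DB_SERVER"]
    database = env["CHURN_DB_NAME"]
    username = env["CHURN_DB_USER"]
    password = env["CHURN_DB_PASSWORD"]
    return (
        "DRIVER={ODBC Driver 17 for SQL Server};"
        f"SERVER={server};DATABASE={database};UID={username};PWD={password}"
    )


# Example with placeholder values passed explicitly instead of real credentials
example = build_conn_str({
    "CHURN_DB_SERVER": "example.database.windows.net",
    "CHURN_DB_NAME": "exampleDB",
    "CHURN_DB_USER": "example_user",
    "CHURN_DB_PASSWORD": "example_password",
})
```

The resulting string can be passed to `pyodbc.connect` exactly as in the cell above.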

In [3]:
# Load the second dataset (CSV file hosted on GitHub)

# URL of the CSV file
data2 = "https://raw.githubusercontent.com/Azubi-Africa/Career_Accelerator_LP2-Classifcation/main/LP2_Telco-churn-second-2000.csv"

# Load the dataframe from the URL
# Note: this reassigns df, replacing the first dataset loaded above
df = pd.read_csv(data2)

# Display the first few rows to verify
print(df.head())


   customerID  gender  SeniorCitizen Partner Dependents  tenure PhoneService  \
0  5600-PDUJF    Male              0      No         No       6          Yes   
1  8292-TYSPY    Male              0      No         No      19          Yes   
2  0567-XRHCU  Female              0     Yes        Yes      69           No   
3  1867-BDVFH    Male              0     Yes        Yes      11          Yes   
4  2067-QYTCF  Female              0     Yes         No      64          Yes   

      MultipleLines InternetService OnlineSecurity  ... DeviceProtection  \
0                No             DSL             No  ...               No   
1                No             DSL             No  ...              Yes   
2  No phone service             DSL            Yes  ...              Yes   
3               Yes     Fiber optic             No  ...               No   
4               Yes     Fiber optic             No  ...              Yes   

  TechSupport StreamingTV StreamingMovies        Contract Pape
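
The first two previews encode the same fields differently: the SQL table uses `True`/`False`/`None`, while the CSV uses `Yes`/`No` plus labels such as `No phone service`. If the two sources are to be combined, one possible approach is to map the boolean values to `Yes`/`No` before concatenating. The sketch below uses small hypothetical frames (`first`, `second`) with a few of the columns shown above; the helper name `to_yes_no` is also an assumption.

```python
import numpy as np
import pandas as pd


def to_yes_no(series):
    """Map boolean values to 'Yes'/'No'; leave every other value unchanged."""
    return series.map(
        lambda v: ("Yes" if v else "No") if isinstance(v, (bool, np.bool_)) else v
    )


# Hypothetical miniature versions of the two sources
first = pd.DataFrame({
    "customerID": ["7590-VHVEG", "5575-GNVDE"],
    "Partner": [True, False],
    "MultipleLines": [None, False],
})
second = pd.DataFrame({
    "customerID": ["5600-PDUJF", "0567-XRHCU"],
    "Partner": ["No", "Yes"],
    "MultipleLines": ["No", "No phone service"],
})

# Harmonise the boolean-encoded columns, then stack the two sources
first_clean = first.assign(
    Partner=to_yes_no(first["Partner"]),
    MultipleLines=to_yes_no(first["MultipleLines"]),
)
combined = pd.concat([first_clean, second], ignore_index=True)
print(combined)
```

Missing values are left untouched here, so deciding how to treat them (for example, whether a missing `MultipleLines` corresponds to `No phone service`) remains a separate cleaning step.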

In [4]:
# Load the third dataset (local Excel file)

# File path of the Excel file
file_path = r"C:\Users\USER\Desktop\Telco-churn-last-2000.xlsx"

# Load Excel file into a DataFrame
Test_data = pd.read_excel(file_path)

# Display the first few rows of the DataFrame
print(Test_data.head())


   customerID gender  SeniorCitizen Partner Dependents  tenure PhoneService  \
0  7613-LLQFO   Male              0      No         No      12          Yes   
1  4568-TTZRT   Male              0      No         No       9          Yes   
2  9513-DXHDA   Male              0      No         No      27          Yes   
3  2640-PMGFL   Male              0      No        Yes      27          Yes   
4  3801-HMYNL   Male              0     Yes        Yes       1          Yes   

  MultipleLines InternetService       OnlineSecurity         OnlineBackup  \
0           Yes     Fiber optic                   No                   No   
1            No              No  No internet service  No internet service   
2            No             DSL                  Yes                   No   
3           Yes     Fiber optic                   No                   No   
4            No     Fiber optic                   No                   No   

      DeviceProtection          TechSupport          Streaming