# **Telco Customer Churn Project**

### Team: Team Namibia
#### Author: Brian Siaw

## Table of Contents


[**Business Understanding**](#Business-Understanding)

[**Data Understanding**](#Data-Understanding)

[**Exploratory Data Analysis**](#exploratory-data-analysis)

[**Data Preparation**](#Check-Data-Quality)

[**Hypothesis Testing**](#hypothesis-testing)

[**Analytical Questions**](#analytical-questions)
 



## Business Understanding
This project analyzes and predicts customer churn rate for a telecommunications company using Python and machine learning. Customer churn refers to the rate at which customers stop using the company's services. Identifying key factors influencing churn allows the company to implement strategies for customer retention and churn reduction.


#### Problem Statement:
The telecommunications company is experiencing a high rate of customer churn, which negatively impacts revenue and growth. Despite efforts to retain customers, the company lacks a systematic and data-driven approach to identifying the key factors driving churn and predicting which customers are most likely to leave. This project aims to address this issue by leveraging machine learning techniques to analyze customer data, identify patterns and predictors of churn, and develop a predictive model to help the company implement targeted retention strategies.



#### Goal and Objectives
##### Goal
 Identify factors contributing to customer churn and develop a predictive model to forecast churn accurately. This enables the company to take proactive measures to improve customer retention.

##### Objectives
1. Data Collection & Cleaning: Gather and preprocess customer data from various sources.
2. Exploratory Data Analysis (EDA): Understand underlying patterns and trends in the data.
3. Feature Engineering: Create relevant features to improve the model's predictive power.
4. Model Building: Develop and train machine learning models to predict customer churn.
5. Model Evaluation: Evaluate model performance using appropriate metrics.
6. Recommendations: Provide actionable insights and recommendations to the Telco company to reduce churn.

#### Stakeholders
1. Management: Interested in overall churn rates and revenue impact.
2. Marketing Team: Needs to understand customer segments at higher churn risk.
3. Customer Service Team: Can use insights to improve customer support and retention strategies.
4. Product Development Team: Can use feedback to enhance service offerings.

#### Key Metrics and Success Criteria
#### Key Metrics:
1. Churn Rate: Percentage of customers who stop using the service within a specific period.
2. Accuracy: Proportion of all customers whose churn status (churned or retained) is predicted correctly.
3. Precision & Recall: Precision is the share of customers predicted to churn who actually churn, while recall is the share of actual churners that the model identifies.
4. F1 Score: Harmonic mean of precision and recall.
5. ROC-AUC: Area Under the Receiver Operating Characteristic Curve, indicating the model's ability to distinguish between classes.
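
The metrics listed above can be computed directly from true labels, predicted labels, and predicted scores. The following is a minimal sketch in plain Python, not the project's evaluation code. The labels and scores are made-up toy values, the 0.5 decision threshold is an assumption, and the function names `classification_metrics` and `roc_auc` are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = churned, 0 = stayed)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # churners correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-churners correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed churners

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true, y_score):
    """Chance that a random churner is scored above a random non-churner (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data for illustration only (not taken from the Telco datasets)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1, 0.7, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # assumed 0.5 threshold

churn_rate = sum(y_true) / len(y_true)
metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, y_score)
print(churn_rate, metrics, auc)
```

Accuracy, precision, recall and F1 depend on the chosen threshold, whereas ROC-AUC is computed from the scores alone. This is one reason to report it alongside the other metrics.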

#### Success Criteria:
1. Achieve a predictive model with at least 85% accuracy.
2. High precision and recall scores (above 80%).
3. Implement recommendations that result in a measurable decrease in churn rate over the next year.

#### Hypothesis (Null and Alternate)
Null Hypothesis (H0): There is no significant relationship between customer features (contract type, monthly charges, tenure, dependents, etc.) and customer churn.

Alternate Hypothesis (H1): There is a significant relationship between the selected customer features and customer churn.
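
One common way to test such a hypothesis for a categorical feature is a chi-square test of independence. The sketch below assumes that test; this section does not state which test the project uses. The contingency counts are invented for illustration and `chi_square_independence` is a hypothetical helper name.

```python
import math


def chi_square_independence(table):
    """Return the chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the feature and churn were independent
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Made-up counts: rows are contract types, columns are (churned, stayed)
observed = [
    [40, 60],  # month-to-month
    [10, 90],  # one year
    [10, 90],  # two year
]

statistic, dof = chi_square_independence(observed)

# With exactly 2 degrees of freedom the chi-square tail probability is exp(-x / 2)
p_value = math.exp(-statistic / 2)

alpha = 0.05  # assumed significance level
print(statistic, dof, p_value, p_value < alpha)
```

A p-value below the chosen significance level would lead to rejecting H0 for that feature. The closed-form p-value used here only holds for 2 degrees of freedom, so other table shapes need a general chi-square distribution function.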

#### Business Analytical Questions
1. What are the primary factors influencing customer churn in the telecommunications industry?
Objective: Identify and analyze key factors like service quality, pricing, customer service interactions, and contract types that contribute to customer churn. Look for trends and patterns in these factors to understand their impact on customer retention.

2. How does contract type affect the likelihood of customer churn?
Objective: Examine the relationship between different contract types (e.g., month-to-month, one-year, two-year contracts) and churn rates. Determine if certain contract types are associated with higher or lower churn.

3. Is there a correlation between customer service interactions and churn rates?
Objective: Investigate whether the frequency and nature of customer service interactions (e.g., number of service requests, resolution time, satisfaction levels) correlate with customer churn rates.

4. How do demographic factors (age, income) impact customer churn?
Objective: Assess the influence of demographic factors such as age and income on customer churn. Determine if certain demographic segments are more likely to churn and explore potential reasons.

5. What is the impact of monthly charges and tenure on the likelihood of a customer churning?
Objective: Analyze how monthly charges and customer tenure affect churn rates. Identify pricing thresholds that lead to higher churn and understand how customer loyalty changes over time.

6. Are specific customer segments at higher risk of churning?
Objective: Segment the customer base to identify groups at higher risk of churning. Use clustering and predictive modeling to determine high-risk segments and their characteristics.

#### Scope and Constraints
#### Scope:
1. The analysis is limited to customer data provided by the telecommunications company.
2. The focus is on developing a machine learning model to predict churn and on providing actionable insights based on the model's findings.

#### Constraints:
1. Data availability and quality: The analysis depends on the accuracy and completeness of the customer data.
2. Resource limitations: Time and computational resources for data processing and model training.
3. Privacy concerns: Ensuring customer data is handled securely in order to maintain confidentiality and privacy.



## Data Understanding

The data used for this project comes from three sources. The first two datasets will be used to train the model and the third will be used to test it.

- The first dataset is hosted in a database on Microsoft SQL Server
- The second dataset was obtained from a GitHub repository
- The third dataset was obtained from OneDrive

### Load Data

#### Install the required packages if necessary

In [None]:
# For creating a connection
!pip install pyodbc 

# For loading environment variables
!pip install python-dotenv  

# For creating visualizations
!pip install matplotlib
!pip install seaborn

# For statistical model analysis
!pip install statsmodels

# For reading .xlsx files
!pip install openpyxl


#### Import the necessary packages

In [6]:
# Import the pyodbc library to handle ODBC database connections
import pyodbc 

# Import the dotenv function to load environment variables from a .env file
from dotenv import dotenv_values 

# Import the pandas library for data manipulation and analysis
import pandas as pd 
import numpy as np

# Import Matplotlib for visualizations in Python
import matplotlib.pyplot as plt

# Import Seaborn for statistical data visualization based on Matplotlib
import seaborn as sns

# Import the warnings library to handle warning messages
import warnings

# Filter out (ignore) any warnings that are raised
warnings.filterwarnings('ignore')

# Import re for string manipulation (searching, matching, and modifying strings based on specific patterns)
import re

# Import for statistical model analysis
import statsmodels.api as sm
from statsmodels.formula.api import ols

#### Establishing a connection to the SQL database

In [7]:
# Load environment variables from .env file into a dictionary
environment_variables = dotenv_values('.env')

# Get the values for the credentials you set in the '.env' file
database = environment_variables.get('DATABASE')
server = environment_variables.get('SERVER')
username = environment_variables.get('UID')
password = environment_variables.get('PWD')

connection_string = f"DRIVER={{SQL Server}};SERVER={server};DATABASE={database};UID={username};PWD={password}"

In [8]:
print(connection_string)

DRIVER={SQL Server};SERVER=None;DATABASE=None;UID=None;PWD=None
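
The printed string above shows `None` for every field, which suggests the `.env` file was not found or does not define these keys. Printing the full string would also expose the password once the values do load. The sketch below is one possible way to fail early on missing settings and to mask the password before printing. `build_connection_string` and `mask_password` are hypothetical helpers, and the example values are placeholders, not real credentials.

```python
def build_connection_string(env, driver="SQL Server"):
    """Build an ODBC connection string, raising if any required setting is missing."""
    required = ("SERVER", "DATABASE", "UID", "PWD")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise ValueError(f"Missing settings in .env: {', '.join(missing)}")
    return (
        f"DRIVER={{{driver}}};SERVER={env['SERVER']};DATABASE={env['DATABASE']};"
        f"UID={env['UID']};PWD={env['PWD']}"
    )


def mask_password(connection_string):
    """Replace the PWD value so the string can be printed or logged safely."""
    parts = [
        "PWD=***" if part.startswith("PWD=") else part
        for part in connection_string.split(";")
    ]
    return ";".join(parts)


# Placeholder values for illustration; in the notebook this dictionary would
# be the result of dotenv_values('.env')
example_env = {
    "SERVER": "example-server",
    "DATABASE": "example_db",
    "UID": "example_user",
    "PWD": "example_password",
}

conn_str = build_connection_string(example_env)
print(mask_password(conn_str))
```

This simple masking assumes the password itself contains no semicolon.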


In [None]:
# Use the connect method of the pyodbc library and pass in the connection string.
connection = pyodbc.connect(connection_string)

#### Load Dataset 1

In [None]:
query = 'SELECT * FROM dbo.LP2_Telco_churn_first_3000'
df1 = pd.read_sql(query, connection)
df1.head()



#### Load Dataset 2

In [None]:
df2 = pd.read_csv(r'\\Mac\Home\Downloads\Customer_Churn_ML_Prediction\dataset\LP2_Telco-churn-second-2000.csv')
df2.head()
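
The file-loading cells use absolute paths tied to one machine, so the notebook will not run elsewhere without editing them. A possible alternative, sketched below, resolves files relative to a project data folder. The `dataset` folder name is taken from the path above, while running the notebook from the project root is an assumption. `dataset_path` is a hypothetical helper.

```python
from pathlib import Path

# Assumption: the notebook is run from the project root, which contains
# a "dataset" folder holding the downloaded files
DATA_DIR = Path("dataset")


def dataset_path(filename, data_dir=DATA_DIR):
    """Return the path to a dataset file, failing early with a clear message if absent."""
    path = Path(data_dir) / filename
    if not path.is_file():
        raise FileNotFoundError(f"Expected dataset at {path.resolve()}")
    return path


# Intended use in the notebook (requires pandas and the actual file):
# df2 = pd.read_csv(dataset_path('LP2_Telco-churn-second-2000.csv'))
```

Checking for the file before reading it gives a clearer error than the one raised inside the reader when a path is wrong.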

#### Load Dataset 3

In [None]:
df3 = pd.read_excel(r'\\Mac\Home\Downloads\Customer_Churn_ML_Prediction\dataset\Telco-churn-last-2000.xlsx')
df3.head()

