### Title: Telco Churn Classification Project

In this project, we aim to estimate the likelihood of a customer leaving an organization, identify the key indicators of churn, and propose retention strategies that can reduce it. To do so, we build classification models to perform churn analysis on customer data, a critical task for companies looking to grow revenue by retaining customers.

## 1.0 Business Understanding

### 1.1 Introduction
Customer churn is a significant problem in the telecom industry because it reduces profit margins and undermines long-term sustainability.
Churn, which refers to customers discontinuing their service and moving to a competitor, can be driven by various factors such as charges, customer service quality, network coverage, and the competitiveness of offerings. The implications of high churn rates are:

* Revenue Loss
* Decreased ROI on marketing
* Reputational Damage due to Customer dissatisfaction
* Reduced Market Share and Growth
* Lower Employee Morale
* Financial Uncertainty

To address this, machine learning and advanced analytics provide the tools to transform raw data into insights and actions. We will employ classification models to obtain actionable insights.

Classification in machine learning is a supervised learning approach in which a program learns from labeled data and then assigns new observations to classes. Various classification algorithms such as logistic regression, decision trees, random forests, and gradient boosting will be explored to identify the most effective model for the given dataset.
The **objective** is to determine the class or category into which new data points will fall.
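To make the idea concrete, here is a minimal sketch of logistic regression, one of the algorithms named above, written in plain Python with a single feature. The tenure values and labels are invented for illustration and do not come from the project dataset; the function names are hypothetical.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit P(y=1 | x) = sigmoid(w*x + b) with batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the average log-loss with respect to w and b
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(x, w, b, threshold=0.5):
    """Return 1 (churn) when the predicted probability reaches the threshold."""
    return int(sigmoid(w * x + b) >= threshold)


# Made-up training data: tenure in years -> churned (1) or stayed (0)
tenure_years = [0.1, 0.3, 0.5, 0.8, 3.0, 4.0, 5.0, 6.0]
churned = [1, 1, 1, 1, 0, 0, 0, 0]

w, b = fit_logistic(tenure_years, churned)
new_customer_short = predict(0.2, w, b)  # short tenure
new_customer_long = predict(5.0, w, b)   # long tenure
print(new_customer_short, new_customer_long)
```

In practice a library implementation with several features would be used; the sketch only shows the learn-then-classify pattern.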

In this project, several models will be trained and compared for predicting customer churn in a telecom company. The analysis follows the **CRISP-DM framework**, ensuring a structured and systematic approach to model development and evaluation.

### 1.2 Project Objective
The objective of this project is to develop a classification model that predicts whether customers are likely to leave or continue their relationship with the company. By identifying customers at risk of churning, the company can take proactive measures to retain them, thus increasing revenue and profit margins.

### 1.3 Data Description
The project will use historical data containing customer behaviour and transaction details.
The datasets will be retrieved from several sources: a database, a GitHub repository and OneDrive.


### 1.4 Success metrics
- `Good:` an F1 score (the harmonic mean of precision and recall) of at least 75% when predicting churn.

- `Excellent:` an F1 score of at least 80% when predicting churn.
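As a sketch of how these thresholds could be checked, the snippet below computes the F1 score from confusion-matrix counts and maps it to the two levels. The counts and the helper names (`f1_from_counts`, `rate_model`) are hypothetical and only illustrate the arithmetic.

```python
def f1_from_counts(tp, fp, fn):
    """F1 = harmonic mean of precision and recall, from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def rate_model(f1):
    """Map an F1 score to the success levels defined in this section."""
    if f1 >= 0.80:
        return "Excellent"
    if f1 >= 0.75:
        return "Good"
    return "Below target"


# Hypothetical counts: 90 churners caught, 10 false alarms, 30 churners missed
f1 = f1_from_counts(tp=90, fp=10, fn=30)
rating = rate_model(f1)
print(round(f1, 3), rating)
```

Here precision is 0.90 and recall is 0.75, so the harmonic mean sits between them and closer to the lower value.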


### 1.5 Hypothesis
**Hypothesis 1**

`Null Hypothesis (Ho):` There is no significant difference in churn rates between customers with higher and lower monthly charges.

`Alternative Hypothesis (Ha):` There is a significant difference in churn rates between customers with higher and lower monthly charges.
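One possible way to test this hypothesis is a two-proportion z-test on the churn rates of the two groups. This section does not specify which test the project uses, so the sketch below is an assumption, as are the group counts and the idea of splitting customers at some monthly-charge cut-off such as the median.

```python
import math


def two_proportion_z_test(churned_a, n_a, churned_b, n_b):
    """Two-sided z-test for a difference between two churn rates."""
    p_a = churned_a / n_a
    p_b = churned_b / n_b
    # Pooled churn rate under the null hypothesis of no difference
    p_pool = (churned_a + churned_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: higher-charge group vs. lower-charge group
z, p_value = two_proportion_z_test(churned_a=300, n_a=1000,
                                   churned_b=200, n_b=1000)
reject_null = p_value < 0.05  # 5% significance level, chosen for illustration
print(f"z = {z:.2f}, p = {p_value:.2e}, reject Ho: {reject_null}")
```

A chi-square test of independence on the same 2x2 table would be an equivalent alternative.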


### 1.6 Business Questions
1. What is the average tenure of customers who churned compared to those who stayed?
2. Do customers with partners or dependents have a lower churn rate?
3. Is there a correlation between the contract term (Contract) and customer churn?
4. What are the common payment methods (Payment Method) among customers who churned?
5. How does the availability of tech-related services (e.g., OnlineSecurity, TechSupport) impact churn rates?
6. What percentage of customers who churned had streaming services (StreamingTV, StreamingMovies)?
7. How does the total amount charged to customers (TotalCharges) correlate with churn behavior?
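Questions like the first two can be answered with simple group-wise aggregations. The sketch below uses pandas on a tiny invented sample; the column names `tenure`, `Partner` and `Churn`, and the `Yes`/`No` encoding, are assumptions about the dataset rather than confirmed details.

```python
import pandas as pd

# Invented sample rows, for illustration only
df = pd.DataFrame({
    "tenure": [1, 3, 5, 40, 60, 72],
    "Partner": ["No", "No", "Yes", "Yes", "Yes", "No"],
    "Churn": ["Yes", "Yes", "Yes", "No", "No", "No"],
})

# Question 1: average tenure of churned vs. retained customers
avg_tenure = df.groupby("Churn")["tenure"].mean()

# Question 2: churn rate (share of "Yes") for each partner status
churn_rate_by_partner = (df["Churn"] == "Yes").groupby(df["Partner"]).mean()

print(avg_tenure)
print(churn_rate_by_partner)
```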


## 2.0 Data Understanding

### 2.1 Inspecting the dataset in depth

A. Data Quality Assessment (info, duplicates, null values, describe, etc.)

B. Univariate Analysis: Explore, analyze and visualize key variables independently of others

C. Bivariate Analysis: Explore, analyze and visualize the relationships between pairs of variables

D. Multivariate Analysis: Explore, analyze and visualize the relationships among several variables at once

E. Answer Analytical Questions

F. Test Hypothesis

G. Provide Insights
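Step A can be sketched with a few standard pandas calls. The DataFrame below is invented for illustration, and its column names (`customerID`, `MonthlyCharges`, `Churn`) are assumptions about the real data.

```python
import numpy as np
import pandas as pd

# Invented sample with one duplicated row and one missing value
df = pd.DataFrame({
    "customerID": ["A1", "A2", "A2", "A3"],
    "MonthlyCharges": [29.85, 56.95, 56.95, np.nan],
    "Churn": ["No", "Yes", "Yes", "No"],
})

df.info()                             # column dtypes and non-null counts
n_duplicates = df.duplicated().sum()  # number of fully duplicated rows
null_counts = df.isna().sum()         # missing values per column
summary = df.describe()               # summary statistics for numeric columns

print(n_duplicates)
print(null_counts)
print(summary)
```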

### 2.2 Installing and importing necessary libraries

In [None]:
# A package for creating a connection
%pip install pyodbc

In [None]:
# Standard library
import os
import warnings

# Database connection and data handling
import pyodbc
import pandas as pd
import numpy as np

# Suppress all warnings
warnings.filterwarnings('ignore')

# Set display options to view all columns
pd.set_option("display.max_columns", None)

In [None]:
# Project paths, built relative to the parent of the working directory
BASE_DIR = '../'
ENV_FILE = os.path.join(BASE_DIR, '.env')
# Raw (untouched) datasets
SECOND_FILE = os.path.join(BASE_DIR, 'data/untouched/LP2_Telco-churn-second-2000.csv')
TEST_FILE = os.path.join(BASE_DIR, 'data/untouched/Telco-churn-last-2000.xlsx')
TRAIN_FILE = os.path.join(BASE_DIR, 'data/untouched/df_train.csv')
# Cleaned training data and folder for saved models
TRAIN_FILE_CLEANED = os.path.join(BASE_DIR, 'data/cleaned/df_train.csv')
SAVE_MODELS = os.path.join(BASE_DIR, 'models/')