# Customer Churn Analysis for a Telecommunication Company

## Introduction

## 1. Business Understanding


### Description 

The goal of this project is to develop a machine learning model to predict customer churn for a telecommunications company. Churn refers to customers who stop using the company’s services. By identifying factors that influence churn, the company can develop strategies to retain customers, thus increasing revenue and profitability.

### Objectives

- Understand the data and identify key factors that affect customer churn.
- Build and evaluate a classification model to predict churn.
- Provide actionable insights to the company to reduce churn rates.



### Hypothesis Statements

**Null Hypothesis (H0)**: There is no significant difference in the features (demographics, usage patterns, etc.) between customers who churn and those who do not.

**Alternative Hypothesis (H1)**: There is a significant difference in the features (demographics, usage patterns, etc.) between customers who churn and those who do not.
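
One way to make H0 and H1 testable is to check features one at a time. The sketch below assumes a chi-square test of independence between a single categorical feature and `Churn`. The helper name `churn_association`, the 0.05 significance level, and the toy data are illustrative assumptions and are not part of the original analysis. Numeric features such as `tenure` would need a different test.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Toy data invented purely for illustration; it is NOT drawn from the project dataset.
toy = pd.DataFrame({
    "Contract": ["Month-to-month"] * 60 + ["One year"] * 40,
    "Churn": [True] * 30 + [False] * 30 + [True] * 5 + [False] * 35,
})


def churn_association(df, feature, target="Churn", alpha=0.05):
    """Chi-square test of independence between one categorical feature and churn."""
    # Contingency table: rows are feature categories, columns are churn outcomes
    table = pd.crosstab(df[feature], df[target])
    chi2, p_value, dof, _expected = chi2_contingency(table)
    return {
        "feature": feature,
        "chi2": float(chi2),
        "p_value": float(p_value),
        "dof": int(dof),
        # Assumption: H0 is rejected for this feature when p_value < alpha
        "reject_h0": bool(p_value < alpha),
    }


result = churn_association(toy, "Contract")
print(result)
```

Because H0 as stated covers all features, a per-feature test like this is only one possible reading; running it across many features would also call for a multiple-comparison correction.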

### Analytical Questions

**1. What are the key factors contributing to customer churn?**

This question seeks to identify the main features that influence whether a customer will leave the service. Understanding these factors can help in formulating retention strategies.

**2. Can we predict the likelihood of a customer churning?**

This question aims to develop a predictive model to forecast which customers are at high risk of churning. This prediction can allow proactive measures to retain these customers.

**3. How does customer lifetime value vary between churned and retained customers?**

This question examines the financial impact of customer churn by comparing the lifetime value of churned customers versus those who stay. Insights from this analysis can help prioritize retention efforts based on customer value.
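
For the third question, a rough first comparison could group customers by `Churn` and summarize a value column. The sketch below assumes `TotalCharges` is an acceptable stand-in for lifetime value, which the source does not state. The helper name `value_by_churn` is hypothetical, and the five rows are copied from the data preview shown later in this notebook, so they are far too few to support any conclusion.

```python
import pandas as pd

# Five rows copied from the data preview later in this notebook (illustration only)
sample = pd.DataFrame({
    "customerID": ["7590-VHVEG", "5575-GNVDE", "3668-QPYBK", "7795-CFOCW", "9237-HQITU"],
    "tenure": [1, 34, 2, 45, 2],
    "TotalCharges": [29.85, 1889.5, 108.150002, 1840.75, 151.649994],
    "Churn": [False, False, True, False, True],
})


def value_by_churn(df, value_col="TotalCharges", target="Churn"):
    """Summarize a value column separately for retained and churned customers."""
    # Assumption: value_col is used here as a proxy for customer lifetime value
    return df.groupby(target)[value_col].agg(["count", "mean", "median"])


summary = value_by_churn(sample)
print(summary)
```

On the full dataset the same call would take the loaded DataFrame in place of `sample`, provided `TotalCharges` has been converted to a numeric type first.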

## 2. Data Understanding

In this section, I import the required libraries and datasets and perform exploratory data analysis (EDA) to understand the data better. This includes visualizing the distribution of variables, identifying patterns, and detecting anomalies.

**Imports**

In [21]:
# Import library packages

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

import warnings

# Suppress warnings to keep the notebook output readable
warnings.filterwarnings('ignore')


#### Load Datasets 

In [19]:
# Database connection setup

from dotenv import dotenv_values

import pyodbc
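
The next cell reads four keys from `.env` (`server_name`, `database_name`, `user`, `password`) and passes them straight into the connection string. If any key is absent, `.get()` returns `None` and the failure only surfaces later as a driver error. A minimal sketch of an early check is shown below; the helper name `build_connection_string` and the placeholder values are hypothetical, and the `SQL Server` driver name is taken from the notebook.

```python
REQUIRED_KEYS = ("server_name", "database_name", "user", "password")


def build_connection_string(env, driver="SQL Server"):
    """Build an ODBC connection string, failing early if a credential is missing."""
    # Treat absent keys and empty values alike as missing
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise KeyError(f"Missing .env entries: {', '.join(missing)}")
    return (
        f"DRIVER={{{driver}}};"
        f"SERVER={env['server_name']};"
        f"DATABASE={env['database_name']};"
        f"UID={env['user']};"
        f"PWD={env['password']}"
    )


# Placeholder values for illustration only; real values would come from dotenv_values(".env")
example_env = {
    "server_name": "example-server",
    "database_name": "example_db",
    "user": "example_user",
    "password": "example_password",
}
conn_str = build_connection_string(example_env)
```

In the notebook, the dictionary returned by `dotenv_values(".env")` could be passed in place of `example_env`, and the result handed to `pyodbc.connect`.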
            

In [24]:
# Load environment variables from the .env file into a dictionary
environment_variables = dotenv_values(".env")

# Get the credential values from the .env file
server = environment_variables.get("server_name")
database = environment_variables.get("database_name")
username = environment_variables.get("user")
password = environment_variables.get("password")

# Create the connection string
connection_string = f"DRIVER={{SQL Server}};SERVER={server};DATABASE={database};UID={username};PWD={password}"

connection = pyodbc.connect(connection_string)

# Load the first 3,000 records (2020 data) into a DataFrame and preview it
query = "SELECT * FROM dbo.LP2_Telco_churn_first_3000"
data_1 = pd.read_sql(query, connection)

data_1.head()
            

| | customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | ... | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7590-VHVEG | Female | False | True | False | 1 | False | | DSL | False | ... | False | False | False | False | Month-to-month | True | Electronic check | 29.85 | 29.85 | False |
| 1 | 5575-GNVDE | Male | False | False | False | 34 | True | False | DSL | True | ... | True | False | False | False | One year | False | Mailed check | 56.950001 | 1889.5 | False |
| 2 | 3668-QPYBK | Male | False | False | False | 2 | True | False | DSL | True | ... | False | False | False | False | Month-to-month | True | Mailed check | 53.849998 | 108.150002 | True |
| 3 | 7795-CFOCW | Male | False | False | False | 45 | False | | DSL | True | ... | True | True | False | False | One year | False | Bank transfer (automatic) | 42.299999 | 1840.75 | False |
| 4 | 9237-HQITU | Female | False | False | False | 2 | True | False | Fiber optic | False | ... | False | False | False | False | Month-to-month | True | Electronic check | 70.699997 | 151.649994 | True |
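
A first look at a freshly loaded table usually covers its shape, missing values, duplicate identifiers, and the balance of the target. The sketch below runs those checks on a few columns of the five preview rows above. The helper name `first_look` is hypothetical, and reading the blank `MultipleLines` cells as missing values is an assumption based on how the preview renders.

```python
import pandas as pd

# Selected columns from the five preview rows above (illustration only)
preview = pd.DataFrame({
    "customerID": ["7590-VHVEG", "5575-GNVDE", "3668-QPYBK", "7795-CFOCW", "9237-HQITU"],
    "tenure": [1, 34, 2, 45, 2],
    # Assumption: blank cells in the preview are missing values
    "MultipleLines": [None, False, False, None, False],
    "Contract": ["Month-to-month", "One year", "Month-to-month", "One year", "Month-to-month"],
    "MonthlyCharges": [29.85, 56.950001, 53.849998, 42.299999, 70.699997],
    "Churn": [False, False, True, False, True],
})


def first_look(df, id_col="customerID", target="Churn"):
    """Collect basic structural checks for a newly loaded DataFrame."""
    return {
        "shape": df.shape,
        # Number of missing values per column
        "missing": df.isna().sum().to_dict(),
        # Repeated customer IDs would suggest duplicated records
        "duplicate_ids": int(df[id_col].duplicated().sum()),
        # Share of each target class, to gauge class imbalance
        "target_share": df[target].value_counts(normalize=True).to_dict(),
    }


report = first_look(preview)
print(report)
```

The same helper could be applied to `data_1` once it is loaded; five rows are enough to show the mechanics but not to draw conclusions about churn.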
