Customer Churn Classification Project

Overview:
This project analyzes customer churn at a telecommunications company using classification models. We explore the provided datasets to identify the factors that influence customer retention. The goal is to build machine learning models that predict customer churn and to give the company insights that support strategic decision-making.

1.0 Business Understanding:

   1.1 Objective:
   The main objective is to identify and understand the factors that influence customer churn in the telecommunications industry. We aim to build an effective classification model that predicts whether a customer will churn.

1.1 Explanation of features

The following describes the columns present in the data.
1.	Gender -- Whether the customer is male or female
2.	SeniorCitizen -- Whether the customer is a senior citizen or not
3.	Partner -- Whether the customer has a partner or not (Yes, No)
4.	Dependents -- Whether the customer has dependents or not (Yes, No)
5.	Tenure -- Number of months the customer has stayed with the company
6.	PhoneService -- Whether the customer has a phone service or not (Yes, No)
7.	MultipleLines -- Whether the customer has multiple lines or not
8.	InternetService -- The customer's type of internet service (DSL, Fiber Optic, No)
9.	OnlineSecurity -- Whether the customer has online security or not (Yes, No, No internet service)
10.	OnlineBackup -- Whether the customer has online backup or not (Yes, No, No internet service)
11.	DeviceProtection -- Whether the customer has device protection or not (Yes, No, No internet service)
12.	TechSupport -- Whether the customer has tech support or not (Yes, No, No internet service)
13.	StreamingTV -- Whether the customer has streaming TV or not (Yes, No, No internet service)
14.	StreamingMovies -- Whether the customer has streaming movies or not (Yes, No, No internet service)
15.	Contract -- The contract term of the customer (Month-to-Month, One year, Two year)
16.	PaperlessBilling -- Whether the customer has paperless billing or not (Yes, No)
17.	PaymentMethod -- The customer's payment method (Electronic check, Mailed check, Bank transfer (automatic), Credit card (automatic))
18.	MonthlyCharges -- The amount charged to the customer monthly
19.	TotalCharges -- The total amount charged to the customer
20.	Churn -- Whether the customer churned or not (Yes or No)
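
As a quick sanity check, the column list above can be compared with the columns of a loaded table. The following is a minimal sketch, not part of the original analysis: `EXPECTED_COLUMNS` follows the names in the list (with spaces removed), `missing_columns` is a hypothetical helper, and the exact spelling and casing in the real data may differ, which is why the comparison ignores case and spaces.

```python
# Column names taken from the feature list above (spaces removed).
# Assumption: the real table may use different casing or spacing.
EXPECTED_COLUMNS = [
    "Gender", "SeniorCitizen", "Partner", "Dependents", "Tenure",
    "PhoneService", "MultipleLines", "InternetService", "OnlineSecurity",
    "OnlineBackup", "DeviceProtection", "TechSupport", "StreamingTV",
    "StreamingMovies", "Contract", "PaperlessBilling", "PaymentMethod",
    "MonthlyCharges", "TotalCharges", "Churn",
]


def missing_columns(actual_columns, expected=EXPECTED_COLUMNS):
    """Return the expected names that are absent, ignoring case and spaces."""
    def normalize(name):
        return str(name).replace(" ", "").lower()

    present = {normalize(name) for name in actual_columns}
    return [name for name in expected if normalize(name) not in present]


# Toy example: every column except the target, written in lower case.
toy_columns = [name.lower() for name in EXPECTED_COLUMNS if name != "Churn"]
print(missing_columns(toy_columns))  # ['Churn']
```

With a pandas DataFrame `df`, the same check would be `missing_columns(df.columns)`.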

1.2 Hypothesis (Null and Alternate):
   Hypothesis 1:
   - Null Hypothesis (H0): There is no significant relationship between TotalCharges and customer churn.
   - Alternate Hypothesis (H1): There is a significant relationship between TotalCharges and customer churn.
   Hypothesis 2:
   - Null Hypothesis (H0): There is no significant relationship between gender or senior-citizen status (SeniorCitizen) and the churn rate.
   - Alternate Hypothesis (H1): There is a significant relationship between gender or senior-citizen status (SeniorCitizen) and the churn rate.
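
The hypotheses above do not name a statistical test, so the following is only one possible sketch. It assumes a Mann-Whitney U test for the numeric feature (TotalCharges) and a chi-square test of independence for a categorical feature (Gender), with a significance level of 0.05. The data frame `toy` is synthetic stand-in data, not the project data, and `ALPHA` is an assumed threshold.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency, mannwhitneyu

ALPHA = 0.05  # assumed significance level; the text does not specify one

# Synthetic stand-in data (NOT the project data), only to make the sketch runnable.
rng = np.random.default_rng(seed=0)
n = 200
toy = pd.DataFrame({
    "Gender": rng.choice(["Male", "Female"], size=n),
    "TotalCharges": rng.gamma(shape=2.0, scale=800.0, size=n),
    "Churn": rng.choice(["Yes", "No"], size=n, p=[0.3, 0.7]),
})

# Numeric feature: compare TotalCharges of churners and non-churners.
churned = toy.loc[toy["Churn"] == "Yes", "TotalCharges"]
stayed = toy.loc[toy["Churn"] == "No", "TotalCharges"]
u_stat, p_charges = mannwhitneyu(churned, stayed, alternative="two-sided")

# Categorical feature: test independence of Gender and Churn.
contingency = pd.crosstab(toy["Gender"], toy["Churn"])
chi2, p_gender, dof, expected = chi2_contingency(contingency.to_numpy())

for label, p_value in [("TotalCharges vs Churn", p_charges),
                       ("Gender vs Churn", p_gender)]:
    decision = "reject H0" if p_value < ALPHA else "fail to reject H0"
    print(f"{label}: p = {p_value:.3f} -> {decision}")
```

The same pattern applies to SeniorCitizen by swapping the column passed to `pd.crosstab`.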

1.3 Research/Analytical Questions:
  
   - How do different services (PhoneService, MultipleLines, InternetService, OnlineSecurity, OnlineBackup, etc.) contribute to customer churn?
   - Is there any correlation between gender, age (SeniorCitizen), having a partner, having dependents, and customer churn?
   - How do the contract type (Contract) and billing preferences (PaperlessBilling, PaymentMethod) influence customer churn?
   - Is there a correlation between monthly charges (MonthlyCharges), total charges (TotalCharges), and customer churn?
   - What insights can be gained from the analysis to improve customer retention?
   - Is there a relationship between gender and the churn rate?
   - How does internet service affect the churn rate?
   - Are senior citizens more likely to churn?
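
Several of the questions above reduce to comparing the churn rate across the categories of one feature. A minimal sketch of that comparison is shown here, assuming a pandas DataFrame with a `Churn` column holding "Yes"/"No". The helper `churn_rate_by` and the small `toy` table are hypothetical and exist only for illustration; they are not the project data.

```python
import pandas as pd

# Tiny hand-made sample (NOT the project data) so the sketch runs on its own.
toy = pd.DataFrame({
    "Contract": ["Month-to-Month", "Month-to-Month", "One year", "Two year",
                 "Month-to-Month", "One year", "Two year", "Month-to-Month"],
    "InternetService": ["Fiber Optic", "DSL", "DSL", "No",
                        "No", "Fiber Optic", "DSL", "Fiber Optic"],
    "Churn": ["Yes", "Yes", "No", "No", "No", "Yes", "No", "Yes"],
})


def churn_rate_by(df, column, target="Churn", positive="Yes"):
    """Return the share of churned customers within each category of `column`."""
    churned = df[target].eq(positive)  # boolean Series: True where churned
    return churned.groupby(df[column]).mean().sort_values(ascending=False)


contract_rates = churn_rate_by(toy, "Contract")
internet_rates = churn_rate_by(toy, "InternetService")
print(contract_rates)
print(internet_rates)
```

Group-wise rates like these are descriptive only; whether a difference is significant is a separate question for the hypothesis tests.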


2.0 Data Understanding:
 
2.1 Importation:
   Import necessary libraries and modules for data analysis and machine learning.

In [1]:
# Data manipulation packages
import pandas as pd
import numpy as np

# Data visualization packages
import matplotlib.pyplot as plt
import seaborn as sns

# Operating system utilities
import os

# Statistics packages
# (scipy.stats.stats is a deprecated private module, so import scipy.stats itself)
from scipy import stats
from scipy.stats import pearsonr

# Machine learning packages: preprocessing and pipelines
from sklearn import set_config
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler
from sklearn.preprocessing import OneHotEncoder, LabelEncoder, OrdinalEncoder
from sklearn.preprocessing import FunctionTransformer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.datasets import make_classification

# Machine learning packages: models
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB

# Machine learning packages: model selection and evaluation
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.metrics import roc_auc_score, roc_curve, auc

# Class imbalance handling packages
from imblearn.under_sampling import RandomUnderSampler
from imblearn.over_sampling import RandomOverSampler, SMOTE
from imblearn.pipeline import Pipeline as imbpipeline

# Database connection package
import pyodbc

# Ignore warnings (optional)
import warnings
warnings.filterwarnings("ignore")



In [1]:
# dotenv_values comes from the python-dotenv package and must be imported before use
from dotenv import dotenv_values

# Load environment variables from the .env file into a dictionary
environment_variables = dotenv_values('.env')

# Get the values for the credentials set in the '.env' file
server = environment_variables.get("SERVER")
database = environment_variables.get("DATABASE")
username = environment_variables.get("USERNAME")
password = environment_variables.get("PASSWORD")

# Build the ODBC connection string and connect to the database
connection_string = f"DRIVER={{SQL Server}};SERVER={server};DATABASE={database};UID={username};PWD={password}"
conn = pyodbc.connect(connection_string)
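
If a key is missing from the `.env` file, `environment_variables.get(...)` returns `None`, and the text `None` is silently formatted into the connection string. The sketch below shows one way to fail early instead. `REQUIRED_KEYS` and `build_connection_string` are hypothetical names introduced for illustration, and the example values are placeholders, not real credentials.

```python
# Keys the connection cell above reads from the .env dictionary.
REQUIRED_KEYS = ("SERVER", "DATABASE", "USERNAME", "PASSWORD")


def build_connection_string(env, driver="SQL Server"):
    """Build the ODBC connection string, raising if any credential is missing."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise ValueError(f"Missing values in .env: {', '.join(missing)}")
    return (
        f"DRIVER={{{driver}}};SERVER={env['SERVER']};DATABASE={env['DATABASE']};"
        f"UID={env['USERNAME']};PWD={env['PASSWORD']}"
    )


# Placeholder dictionary standing in for dotenv_values('.env').
incomplete_env = {"SERVER": "example-host", "DATABASE": "example_db"}
try:
    build_connection_string(incomplete_env)
except ValueError as error:
    print(error)  # Missing values in .env: USERNAME, PASSWORD
```

In the notebook, the result of `build_connection_string(environment_variables)` could then be passed to `pyodbc.connect`.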