# Customer Churn Prediction - Telecom Dataset

## Project Overview
This project analyzes customer data from a telecommunications company to predict **customer churn**, that is, to identify customers who are likely to leave the service. Understanding the factors that influence churn lets the company improve its retention strategies and customer satisfaction.

## Dataset Source
- **Name:** Telco Customer Churn Dataset
- **Source:** [Telco Customer Churn - Kaggle](https://www.kaggle.com/datasets/blastchar/telco-customer-churn)
- **Number of Records:** 7,043 customers
- **Number of Columns:** 21 (including the `customerID` identifier and the `Churn` target)

## Data Dictionary
| Column Name         | Description                                                                                      |
|--------------------|--------------------------------------------------------------------------------------------------|
| **customerID**      | Unique identifier for each customer.                                                             |
| **gender**          | Gender of the customer — either Male or Female.                                                   |
| **SeniorCitizen**   | Indicates if the customer is a senior citizen (1 = Yes, 0 = No).                                 |
| **Partner**         | Whether the customer has a partner — Yes or No.                                                   |
| **Dependents**      | Whether the customer has dependents — Yes or No.                                                  |
| **tenure**          | Number of months the customer has been with the company.                                          |
| **PhoneService**    | Whether the customer has phone service — Yes or No.                                               |
| **MultipleLines**   | Whether the customer has multiple phone lines — Yes, No, or No phone service.                     |
| **InternetService** | Type of internet service (DSL, Fiber optic, or No).                                              |
| **OnlineSecurity**  | Whether the customer subscribes to online security services — Yes, No, or No internet service.    |
| **OnlineBackup**    | Whether the customer subscribes to online backup services — Yes, No, or No internet service.      |
| **DeviceProtection**| Whether the customer subscribes to device protection services — Yes, No, or No internet service.  |
| **TechSupport**     | Whether the customer subscribes to tech support services — Yes, No, or No internet service.       |
| **StreamingTV**     | Whether the customer subscribes to streaming TV services — Yes, No, or No internet service.       |
| **StreamingMovies** | Whether the customer subscribes to streaming movie services — Yes, No, or No internet service.    |
| **Contract**        | Type of contract — Month-to-month, One year, or Two year.                                        |
| **PaperlessBilling**| Whether the customer uses paperless billing — Yes or No.                                          |
| **PaymentMethod**   | Payment method used by the customer: Electronic check, Mailed check, Bank transfer (automatic), or Credit card (automatic). |
| **MonthlyCharges**  | Monthly charges incurred by the customer.                                                         |
| **TotalCharges**    | Total charges incurred during the customer’s tenure.                                              |
| **Churn**           | Whether the customer churned (left the service) — Yes or No.                                      |
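
The dictionary above can double as a lightweight schema check. The sketch below is only an illustration, assuming the category values listed in the table for a handful of columns; the `ALLOWED_VALUES` mapping, the `find_unexpected_values` helper, and the sample rows are hypothetical and not part of the original notebook.

```python
import pandas as pd

# Allowed categories for a few columns, copied from the data dictionary.
# (Hypothetical structure; extend it to the remaining columns as needed.)
ALLOWED_VALUES = {
    "gender": {"Male", "Female"},
    "Partner": {"Yes", "No"},
    "Contract": {"Month-to-month", "One year", "Two year"},
    "Churn": {"Yes", "No"},
}


def find_unexpected_values(df: pd.DataFrame) -> dict:
    """Return {column: sorted list of values not listed in ALLOWED_VALUES}."""
    problems = {}
    for column, allowed in ALLOWED_VALUES.items():
        if column not in df.columns:
            continue  # skip columns the frame does not contain
        unexpected = set(df[column].dropna().unique()) - allowed
        if unexpected:
            problems[column] = sorted(unexpected)
    return problems


# Tiny made-up frame for illustration only (not rows from the real dataset).
sample = pd.DataFrame(
    {
        "gender": ["Male", "Female", "female"],
        "Contract": ["One year", "Two year", "Month-to-month"],
        "Churn": ["Yes", "No", "No"],
    }
)
print(find_unexpected_values(sample))
```

An empty result would suggest the checked columns match the dictionary; any reported value points to a typo, a casing difference, or a category the dictionary does not document.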

## Project Objectives
- Perform **Exploratory Data Analysis (EDA)** to uncover trends and patterns related to customer churn.
- Identify the **key factors driving customer churn**.
- Build and evaluate **classification models** to predict which customers are likely to churn.
- Provide **business recommendations** to help reduce customer churn and improve retention.

## Tools & Libraries
- **Python**
- **Pandas, NumPy** (Data Manipulation)
- **Matplotlib, Seaborn** (Data Visualization)
- **Scikit-learn** (Machine Learning)
- **Imbalanced-learn** (SMOTE Oversampling)

## Project Workflow
1. **Data Understanding & Loading**
2. **Data Cleaning & Preprocessing**
3. **Exploratory Data Analysis (EDA)**
4. **Feature Engineering (if needed)**
5. **Model Training & Evaluation**
6. **Insights & Business Recommendations**
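
To make step 5 concrete, here is a minimal sketch of a split, fit, and evaluate loop. It runs on synthetic stand-in data rather than the Telco dataset, and the choice of features, the logistic-regression model, and every parameter value are assumptions made for illustration, not the configuration used later in this project.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the Telco dataset): two numeric features and a
# binary target whose probability rises as tenure falls.
rng = np.random.default_rng(42)
n = 500
toy = pd.DataFrame(
    {
        "tenure": rng.integers(0, 73, size=n),
        "MonthlyCharges": rng.uniform(20, 120, size=n),
    }
)
churn_probability = 1 / (1 + np.exp(0.08 * (toy["tenure"] - 20)))
toy["Churn"] = (rng.random(n) < churn_probability).astype(int)

X = toy[["tenure", "MonthlyCharges"]]
y = toy["Churn"]

# Stratify so both splits keep roughly the same churn rate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scaling lives inside the pipeline, so it is fit on the training split only.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"Test ROC AUC on synthetic data: {test_auc:.3f}")
```

Keeping the scaler inside the pipeline is a deliberate choice here: it avoids leaking test-set statistics into training, which matters once cross-validation or grid search is added.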


## Importing Libraries

In this section, we import the libraries needed for data loading, exploration, visualization, modeling, and evaluation.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    auc,
    classification_report,
    confusion_matrix,
    roc_auc_score,
    roc_curve,
)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
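
Because the imports above depend on several third-party packages, a quick availability check can save a confusing `ImportError` later. The snippet below is a small sketch using only the standard library; the `REQUIRED_MODULES` list and the `missing_modules` helper are hypothetical names introduced for this example.

```python
import importlib.util

# Import names used in the cell above; "imblearn" is provided by the
# imbalanced-learn package, and "sklearn" by scikit-learn.
REQUIRED_MODULES = ["pandas", "numpy", "matplotlib", "seaborn", "sklearn", "imblearn"]


def missing_modules(names):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are available.")
```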

## Data Loading and Initial Exploration

In this section, we load the dataset and perform basic data checks to understand its structure, spot potential issues (like missing values), and plan the next steps for data cleaning and preprocessing.


In [4]:
# Load the dataset from a local CSV file (adjust the path for your machine)
csv_path = r"C:\Users\attafuro\Desktop\Customer Churn Prediction\WA_Fn-UseC_-Telco-Customer-Churn.csv"
df = pd.read_csv(csv_path)

# First look at the data
print("First 5 Rows of the Dataset:")
display(df.head())

# Basic info: column names, non-null counts, and dtypes
print("\n Basic Information About the Dataset:")
df.info()

# Check for missing values
print("\n Missing Values Per Column:")
print(df.isnull().sum())

# Check shape
print(f"\n Dataset contains {df.shape[0]} rows and {df.shape[1]} columns.")

# Check unique values in each column 
print("\n Unique Values Per Column:")
for col in df.columns:
    print(f"{col}: {df[col].nunique()} unique values")
    
# Check class distribution: absolute counts, then percentages
print("\n Class Distribution (Target - Churn):")
print(df['Churn'].value_counts())
print(df['Churn'].value_counts(normalize=True) * 100)


First 5 Rows of the Dataset:


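|   | customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | ... | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
|---|------------|--------|---------------|---------|------------|--------|--------------|---------------|-----------------|----------------|-----|------------------|-------------|-------------|-----------------|----------|------------------|---------------|----------------|--------------|-------|
| 0 | 7590-VHVEG | Female | 0 | Yes | No | 1 | No | No phone service | DSL | No | ... | No | No | No | No | Month-to-month | Yes | Electronic check | 29.85 | 29.85 | No |
| 1 | 5575-GNVDE | Male | 0 | No | No | 34 | Yes | No | DSL | Yes | ... | Yes | No | No | No | One year | No | Mailed check | 56.95 | 1889.5 | No |
| 2 | 3668-QPYBK | Male | 0 | No | No | 2 | Yes | No | DSL | Yes | ... | No | No | No | No | Month-to-month | Yes | Mailed check | 53.85 | 108.15 | Yes |
| 3 | 7795-CFOCW | Male | 0 | No | No | 45 | No | No phone service | DSL | Yes | ... | Yes | Yes | No | No | One year | No | Bank transfer (automatic) | 42.3 | 1840.75 | No |
| 4 | 9237-HQITU | Female | 0 | No | No | 2 | Yes | No | Fiber optic | No | ... | No | No | No | No | Month-to-month | Yes | Electronic check | 70.7 | 151.65 | Yes |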



 Basic Information About the Dataset:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   customerID        7043 non-null   object 
 1   gender            7043 non-null   object 
 2   SeniorCitizen     7043 non-null   int64  
 3   Partner           7043 non-null   object 
 4   Dependents        7043 non-null   object 
 5   tenure            7043 non-null   int64  
 6   PhoneService      7043 non-null   object 
 7   MultipleLines     7043 non-null   object 
 8   InternetService   7043 non-null   object 
 9   OnlineSecurity    7043 non-null   object 
 10  OnlineBackup      7043 non-null   object 
 11  DeviceProtection  7043 non-null   object 
 12  TechSupport       7043 non-null   object 
 13  StreamingTV       7043 non-null   object 
 14  StreamingMovies   7043 non-null   object 
 15  Contract          7043 non-null   object 
 16  Pap
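
One caveat about the missing-value check in the cell above: `isnull()` only counts real `NaN`/`None` entries, so a column read as text can still hide empty or whitespace-only strings. The sketch below shows one way to look for them, assuming such blanks may be present; `count_blank_strings` is a hypothetical helper and the toy frame is made up for illustration.

```python
import pandas as pd


def count_blank_strings(df: pd.DataFrame) -> pd.Series:
    """Count empty or whitespace-only entries per column."""
    # Casting to str makes the check dtype-agnostic: numbers become digits and
    # NaN becomes "nan", so only genuinely blank text matches "".
    return df.apply(lambda column: column.astype(str).str.strip().eq("").sum())


# Made-up rows for illustration only (not taken from the real dataset).
toy = pd.DataFrame(
    {
        "customerID": ["A-1", "B-2", "C-3"],
        "TotalCharges": ["29.85", " ", "108.15"],
        "tenure": [1, 0, 2],
    }
)
print(toy.isnull().sum())        # reports no missing values
print(count_blank_strings(toy))  # flags the blank TotalCharges entry
```

Running the same helper on `df` would show whether any column needs blanks converted to proper missing values before modeling.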

## Cleaning the Data