# Telecom Customer Churn Prediction

![Churn Overview](images/telc.jpg)

## Project Overview
This project explores customer churn in a telecom company. The goal is to analyze customer behavior, identify patterns associated with churn, and build a model that predicts which customers are likely to leave.

## Business Problem
**As a stakeholder:**  
In a growing telecom company, I’ve observed a troubling pattern: we’re losing customers at an increasing rate each month. Despite competitive pricing and a wide range of services, churn continues to rise, cutting into our recurring revenue and increasing customer acquisition costs.

Our current customer data shows that 1,869 of 7,043 customers have churned, roughly 26.5% of the customer base. This is a significant red flag.

Several internal reviews have made it clear that retaining existing customers is more cost-effective than acquiring new ones. However, we currently lack a systematic way to identify which customers are likely to leave, and why.


### Objectives:
- Understand which factors most influence churn
- Build a model to predict the likelihood of churn
- Provide actionable recommendations to reduce churn

## Dataset Description

**Source:** [Kaggle - Telco Customer Churn](https://www.kaggle.com/datasets/blastchar/telco-customer-churn)

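The dataset contains 7,043 rows (one per customer, keyed by `customerID`) and 21 columns covering:
- Customer demographics (`gender`, `SeniorCitizen`, `Partner`, `Dependents`)
- Account information (`tenure`, `Contract`, `PaperlessBilling`, `PaymentMethod`)
- Services subscribed (phone, internet, and add-ons such as `OnlineSecurity` and `StreamingTV`)
- Charges (`MonthlyCharges`, `TotalCharges`)
- Whether the customer churned (`Churn`)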
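As a quick sanity check on the churn figure quoted in the Business Problem section, the class balance can be read straight from the `Churn` column. This sketch assumes `Churn` holds the strings `"Yes"`/`"No"` (as in the `df.head()` preview); `churn_summary` is a hypothetical helper, and the demo Series is rebuilt from the quoted counts (1,869 churned out of 7,043) rather than loaded from the CSV.

```python
import pandas as pd


def churn_summary(churn: pd.Series) -> pd.DataFrame:
    """Return the count and percentage of customers for each Churn label."""
    counts = churn.value_counts()
    percent = churn.value_counts(normalize=True).mul(100).round(1)
    return pd.DataFrame({"customers": counts, "percent": percent})


# Demo Series rebuilt from the counts quoted above (not read from the CSV)
churn = pd.Series(["Yes"] * 1869 + ["No"] * (7043 - 1869), name="Churn")
summary = churn_summary(churn)
print(summary)
```

On the real data the call would be `churn_summary(df["Churn"])`; if the quoted counts are right, the `Yes` row shows 1,869 customers and 26.5%.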

## Load and Inspect the Data

```python
# Import the necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# Plot style: white background with solid horizontal (y-axis) grid lines only
sns.set_theme(style="whitegrid", rc={
    'axes.grid': True,
    'axes.grid.axis': 'y',
    'grid.color': 'dimgray',
    'grid.linestyle': '-',
    'grid.linewidth': 0.7
})

# Jupyter magic: render figures inline in the notebook
%matplotlib inline
```

```python
# Load the dataset (forward slashes work on Windows, macOS, and Linux)
df = pd.read_csv("Data/WA_Fn-UseC_-Telco-Customer-Churn.csv")

print("Shape of dataset:", df.shape)
df.head()
```

```
Shape of dataset: (7043, 21)
```


| | customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | ... | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7590-VHVEG | Female | 0 | Yes | No | 1 | No | No phone service | DSL | No | ... | No | No | No | No | Month-to-month | Yes | Electronic check | 29.85 | 29.85 | No |
| 1 | 5575-GNVDE | Male | 0 | No | No | 34 | Yes | No | DSL | Yes | ... | Yes | No | No | No | One year | No | Mailed check | 56.95 | 1889.5 | No |
| 2 | 3668-QPYBK | Male | 0 | No | No | 2 | Yes | No | DSL | Yes | ... | No | No | No | No | Month-to-month | Yes | Mailed check | 53.85 | 108.15 | Yes |
| 3 | 7795-CFOCW | Male | 0 | No | No | 45 | No | No phone service | DSL | Yes | ... | Yes | Yes | No | No | One year | No | Bank transfer (automatic) | 42.3 | 1840.75 | No |
| 4 | 9237-HQITU | Female | 0 | No | No | 2 | Yes | No | Fiber optic | No | ... | No | No | No | No | Month-to-month | Yes | Electronic check | 70.7 | 151.65 | Yes |
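The `...` column in this preview is pandas truncating a wide frame for display, not missing data; the hidden column here is `OnlineBackup`, which appears between `OnlineSecurity` and `DeviceProtection` in the column list of the missing-value check. `DataFrame.to_string()` renders every column unless `max_cols` is set, so it is one way to inspect all fields at once. `show_all_columns` is a hypothetical helper, and the toy frame only stands in for `df`.

```python
import numpy as np
import pandas as pd


def show_all_columns(frame: pd.DataFrame, n: int = 5) -> str:
    """Render the first n rows as plain text with no column truncation."""
    # to_string() shows every column unless max_cols is passed explicitly
    return frame.head(n).to_string()


# Toy stand-in for df: 25 columns, wide enough to be truncated by a default notebook preview
wide = pd.DataFrame(np.zeros((3, 25)), columns=[f"col{i}" for i in range(25)])
text = show_all_columns(wide)
print(text)
```

In a notebook, wrapping `df.head()` in `pd.option_context("display.max_columns", None)` has the same effect on the rich HTML view.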


## Cleaning and EDA

### Cleaning

```python
# Check for missing values and duplicate rows
print(df.isna().sum(), '\n')
print(f'Number of duplicated rows: {df.duplicated().sum()}')
```

```
customerID          0
gender              0
SeniorCitizen       0
Partner             0
Dependents          0
tenure              0
PhoneService        0
MultipleLines       0
InternetService     0
OnlineSecurity      0
OnlineBackup        0
DeviceProtection    0
TechSupport         0
StreamingTV         0
StreamingMovies     0
Contract            0
PaperlessBilling    0
PaymentMethod       0
MonthlyCharges      0
TotalCharges        0
Churn               0
dtype: int64

Number of duplicated rows: 0
```
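Two caveats on these checks. `isna()` only counts real missing values (`NaN`/`None`), so a text column can still hold blank strings such as `" "` and report zero. And if `customerID` is unique per row, as an identifier should be, a full-row `duplicated()` check reports 0 even when two customers match on every other field (whether such rows are errors or simply similar customers is a judgment call). The sketch below checks both; `blank_string_counts` and `duplicates_ignoring` are hypothetical helpers, and the toy frame is illustrative, not drawn from the dataset.

```python
import pandas as pd


def blank_string_counts(frame: pd.DataFrame) -> pd.Series:
    """Count empty or whitespace-only cells in each non-numeric column."""
    text_cols = [c for c in frame.columns if not pd.api.types.is_numeric_dtype(frame[c])]
    counts = {c: int(frame[c].astype(str).str.strip().eq("").sum()) for c in text_cols}
    return pd.Series(counts, dtype="int64")


def duplicates_ignoring(frame: pd.DataFrame, id_col: str) -> int:
    """Count rows that repeat an earlier row in every column except id_col."""
    return int(frame.drop(columns=[id_col]).duplicated().sum())


# Toy frame (hypothetical values): one blank TotalCharges, two rows identical apart from the ID
toy = pd.DataFrame({
    "customerID": ["A-1", "A-2", "A-3"],
    "tenure": [0, 5, 5],
    "TotalCharges": [" ", "99.5", "99.5"],
})
print(toy.isna().sum().sum())  # isna() sees nothing missing
print(blank_string_counts(toy))
print(toy.duplicated().sum(), duplicates_ignoring(toy, "customerID"))
```

On the real data the calls would be `blank_string_counts(df)` and `duplicates_ignoring(df, "customerID")`.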


```python
df.describe()
```

| | SeniorCitizen | tenure | MonthlyCharges |
|---|---|---|---|
| count | 7043.0 | 7043.0 | 7043.0 |
| mean | 0.162147 | 32.371149 | 64.761692 |
| std | 0.368612 | 24.559481 | 30.090047 |
| min | 0.0 | 0.0 | 18.25 |
| 25% | 0.0 | 9.0 | 35.5 |
| 50% | 0.0 | 29.0 | 70.35 |
| 75% | 0.0 | 55.0 | 89.85 |
| max | 1.0 | 72.0 | 118.75 |
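`describe()` summarises only `SeniorCitizen`, `tenure`, and `MonthlyCharges`, even though `TotalCharges` also holds amounts. That suggests pandas read `TotalCharges` as text, most likely because some cells do not parse as numbers (the minimum `tenure` of 0 hints at brand-new customers who may not have been billed yet). A minimal sketch, assuming the column should be numeric: `pd.to_numeric(..., errors="coerce")` turns unparseable strings into `NaN`, which can then be counted and dropped or filled deliberately. `coerce_total_charges` is a hypothetical helper and the toy rows are illustrative.

```python
import pandas as pd


def coerce_total_charges(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with TotalCharges converted to float; unparseable text becomes NaN."""
    out = frame.copy()
    out["TotalCharges"] = pd.to_numeric(out["TotalCharges"], errors="coerce")
    return out


# Toy rows (illustrative): a blank total for a zero-tenure customer, plus two normal rows
toy = pd.DataFrame({
    "tenure": [0, 1, 34],
    "MonthlyCharges": [20.0, 29.85, 56.95],
    "TotalCharges": [" ", "29.85", "1889.5"],
})
clean = coerce_total_charges(toy)
print(clean.dtypes)
# Rows that failed to parse, to inspect before deciding whether to drop or fill them
print(clean.loc[clean["TotalCharges"].isna(), ["tenure", "MonthlyCharges"]])
print(clean.describe())
```

Applied to `df`, `coerce_total_charges(df).describe()` should then include a `TotalCharges` column alongside the other three.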
