### Telecom Customer Churn Analysis and Prediction

This project aims to analyze customer churn behavior in a telecom company, identify key factors leading to churn, and build a machine learning model to predict high-risk customers.
The insights are visualized using a Tableau dashboard to support data-driven retention strategies.

### Business Problem

The telecom industry faces high customer churn due to intense competition and low switching costs. Customer churn leads to significant revenue loss, as acquiring new customers is more expensive than retaining existing ones. The objective of this project is to understand churn patterns and predict customers who are likely to leave the service.

### Business Objectives
- Identify key factors influencing customer churn
- Analyze churn patterns across customer segments
- Build a machine learning model to predict churn
- Present insights through an interactive Tableau dashboard

## Machine Learning Problem Framing
- Problem Type: Classification
- Learning Type: Supervised Learning
- Target Variable: Churn
- Output: Yes / No
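
The framing above implies that the Yes / No labels need a numeric form before most classifiers can use them. The following is a minimal sketch of one way to do that, not necessarily the encoding used later in this project. The five labels mirror the `Churn` column of the first rows shown in the data preview; the names `churn_encoded` and `churn_rate` are illustrative.

```python
import pandas as pd

# Churn labels from the first five rows of the data preview
churn = pd.Series(["No", "No", "Yes", "No", "Yes"], name="Churn")

# Map the binary labels to integers: 0 = stayed, 1 = churned
churn_encoded = churn.map({"No": 0, "Yes": 1})

# The mean of a 0/1 column is the share of churned customers in the sample
churn_rate = churn_encoded.mean()

print(churn_encoded.tolist())
print(f"Churn rate in this sample: {churn_rate:.0%}")
```

On the full dataset the same mapping would be applied to `df["Churn"]`.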

### Dataset

In [11]:
import pandas as pd
import numpy as np

# Load the Telco customer churn CSV (path separators made consistent)
df = pd.read_csv(r'C:\Users\AKSHAT\Telecom_Churn_Project\Data\Telco-Customer-Churn.csv')
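
The absolute path above only resolves on the original machine. As a hedged alternative, the sketch below builds the path from a project root with `pathlib`, so the same code can run from any location. The helper name `load_churn_csv` and its `project_root` parameter are hypothetical; the `Data/Telco-Customer-Churn.csv` layout is taken from the path above.

```python
from pathlib import Path

import pandas as pd


def load_churn_csv(project_root):
    """Read the churn CSV from <project_root>/Data/Telco-Customer-Churn.csv."""
    csv_path = Path(project_root) / "Data" / "Telco-Customer-Churn.csv"
    return pd.read_csv(csv_path)


# Example call (assumes the project folder exists on this machine):
# df = load_churn_csv(r"C:\Users\AKSHAT\Telecom_Churn_Project")
```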

### Data Preview

In [12]:
df.head()

| | customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | ... | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7590-VHVEG | Female | 0 | Yes | No | 1 | No | No phone service | DSL | No | ... | No | No | No | No | Month-to-month | Yes | Electronic check | 29.85 | 29.85 | No |
| 1 | 5575-GNVDE | Male | 0 | No | No | 34 | Yes | No | DSL | Yes | ... | Yes | No | No | No | One year | No | Mailed check | 56.95 | 1889.5 | No |
| 2 | 3668-QPYBK | Male | 0 | No | No | 2 | Yes | No | DSL | Yes | ... | No | No | No | No | Month-to-month | Yes | Mailed check | 53.85 | 108.15 | Yes |
| 3 | 7795-CFOCW | Male | 0 | No | No | 45 | No | No phone service | DSL | Yes | ... | Yes | Yes | No | No | One year | No | Bank transfer (automatic) | 42.3 | 1840.75 | No |
| 4 | 9237-HQITU | Female | 0 | No | No | 2 | Yes | No | Fiber optic | No | ... | No | No | No | No | Month-to-month | Yes | Electronic check | 70.7 | 151.65 | Yes |


In [14]:
### Dataset shape
df.shape

(7043, 21)

The dataset contains 7,043 customer-level records and 21 columns covering demographic, service, and billing attributes.

### Column overview

## Feature Description

### Customer Demographics
- gender
- SeniorCitizen
- Partner
- Dependents

### Service Information
- PhoneService
- InternetService
- OnlineSecurity
- StreamingTV

### Account Information
- tenure
- Contract
- PaymentMethod

### Billing Information
- MonthlyCharges
- TotalCharges

### Target Variable
- Churn
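
The grouping above can also be kept in code, which makes it easy to confirm that every listed column is present before analysis. This is a minimal sketch: the dictionary `feature_groups` and the helper `missing_columns` are illustrative names, and the one-row frame stands in for `df` using values from the first preview row.

```python
import pandas as pd

# Feature groups exactly as listed in the description above
feature_groups = {
    "demographics": ["gender", "SeniorCitizen", "Partner", "Dependents"],
    "service": ["PhoneService", "InternetService", "OnlineSecurity", "StreamingTV"],
    "account": ["tenure", "Contract", "PaymentMethod"],
    "billing": ["MonthlyCharges", "TotalCharges"],
}
target = "Churn"


def missing_columns(frame, groups, target_col):
    """Return the expected columns that are absent from the frame."""
    expected = [col for cols in groups.values() for col in cols] + [target_col]
    return [col for col in expected if col not in frame.columns]


# Stand-in for df: the first preview row, restricted to the listed columns
sample = pd.DataFrame([{
    "gender": "Female", "SeniorCitizen": 0, "Partner": "Yes", "Dependents": "No",
    "PhoneService": "No", "InternetService": "DSL", "OnlineSecurity": "No",
    "StreamingTV": "No", "tenure": 1, "Contract": "Month-to-month",
    "PaymentMethod": "Electronic check", "MonthlyCharges": 29.85,
    "TotalCharges": "29.85", "Churn": "No",
}])

print(missing_columns(sample, feature_groups, target))
```

An empty list means all listed features and the target are available.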

In [17]:
### Data types
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   customerID        7043 non-null   object 
 1   gender            7043 non-null   object 
 2   SeniorCitizen     7043 non-null   int64  
 3   Partner           7043 non-null   object 
 4   Dependents        7043 non-null   object 
 5   tenure            7043 non-null   int64  
 6   PhoneService      7043 non-null   object 
 7   MultipleLines     7043 non-null   object 
 8   InternetService   7043 non-null   object 
 9   OnlineSecurity    7043 non-null   object 
 10  OnlineBackup      7043 non-null   object 
 11  DeviceProtection  7043 non-null   object 
 12  TechSupport       7043 non-null   object 
 13  StreamingTV       7043 non-null   object 
 14  StreamingMovies   7043 non-null   object 
 15  Contract          7043 non-null   object 
 16  PaperlessBilling  7043 non-null   object 


### Observations and next steps

## Initial Observations
- The dataset contains both categorical and numerical features
- The target variable Churn is binary
- TotalCharges is stored as an object (string) column even though it holds billing amounts, so it must be converted to a numeric type
- Data cleaning is required before analysis and modeling
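
One way to see why an amount column ends up as `object` is to attempt a numeric parse and list the entries that fail. The sketch below does this with `pd.to_numeric(..., errors="coerce")`. The first three values come from the data preview; the blank string in the last position is a hypothetical entry added only to demonstrate the check, not a value confirmed by the output above.

```python
import pandas as pd

# First three values are from the preview; the blank entry is hypothetical
total_charges = pd.Series(["29.85", "1889.5", "108.15", " "], name="TotalCharges")

# Unparseable entries become NaN instead of raising an error
parsed = pd.to_numeric(total_charges, errors="coerce")

# Entries that were present in the raw data but failed to parse
bad_mask = parsed.isna() & total_charges.notna()
bad_values = total_charges[bad_mask].tolist()

print(parsed.dtype)
print(bad_values)
```

Running the same check on `df["TotalCharges"]` would show which raw values, if any, block the conversion.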

## Next Steps
- Handle data type inconsistencies
- Handle missing values
- Prepare data for exploratory data analysis
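
The steps above can be sketched as a small cleaning routine. This is an illustration under stated assumptions rather than the project's final pipeline: the function names `clean_churn_frame` and `split_feature_types` are hypothetical, rows with unparseable `TotalCharges` are dropped (imputation is an equally valid choice), and the last sample row is invented to exercise the missing-value path.

```python
import pandas as pd


def clean_churn_frame(frame):
    """Return a cleaned copy; the input frame is left unchanged."""
    cleaned = frame.copy()
    # Step 1: fix the data type inconsistency in TotalCharges
    cleaned["TotalCharges"] = pd.to_numeric(cleaned["TotalCharges"], errors="coerce")
    # Step 2: drop rows whose TotalCharges could not be parsed (assumption)
    return cleaned.dropna(subset=["TotalCharges"]).reset_index(drop=True)


def split_feature_types(frame, target="Churn"):
    """Step 3: separate numeric and categorical features for EDA."""
    features = frame.drop(columns=[target])
    numeric_cols = features.select_dtypes(include="number").columns.tolist()
    categorical_cols = features.select_dtypes(exclude="number").columns.tolist()
    return numeric_cols, categorical_cols


# First three rows follow the preview; the fourth row is hypothetical
sample = pd.DataFrame({
    "tenure": [1, 34, 2, 0],
    "Contract": ["Month-to-month", "One year", "Month-to-month", "Two year"],
    "MonthlyCharges": [29.85, 56.95, 53.85, 50.0],
    "TotalCharges": ["29.85", "1889.5", "108.15", " "],
    "Churn": ["No", "No", "Yes", "No"],
})

cleaned = clean_churn_frame(sample)
numeric_cols, categorical_cols = split_feature_types(cleaned)

print(len(cleaned), numeric_cols, categorical_cols)
```

Keeping the cleaning in a function that returns a copy leaves the raw `df` available for comparison during exploratory analysis.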