# Phase 1: Data Loading & Cleaning
**Objective:** Prepare the raw Telco dataset for analysis by fixing data types and handling missing values.

### 1.1 Load Libraries & Dataset
First, we import the necessary libraries for data manipulation (Pandas, NumPy) and visualization (Matplotlib, Seaborn). We then load the dataset to begin our inspection.

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset
df = pd.read_csv('WA_Fn-UseC_-Telco-Customer-Churn.csv')

# Display the first 5 rows to verify loading
df.head()

| | customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | ... | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7590-VHVEG | Female | 0 | Yes | No | 1 | No | No phone service | DSL | No | ... | No | No | No | No | Month-to-month | Yes | Electronic check | 29.85 | 29.85 | No |
| 1 | 5575-GNVDE | Male | 0 | No | No | 34 | Yes | No | DSL | Yes | ... | Yes | No | No | No | One year | No | Mailed check | 56.95 | 1889.5 | No |
| 2 | 3668-QPYBK | Male | 0 | No | No | 2 | Yes | No | DSL | Yes | ... | No | No | No | No | Month-to-month | Yes | Mailed check | 53.85 | 108.15 | Yes |
| 3 | 7795-CFOCW | Male | 0 | No | No | 45 | No | No phone service | DSL | Yes | ... | Yes | Yes | No | No | One year | No | Bank transfer (automatic) | 42.3 | 1840.75 | No |
| 4 | 9237-HQITU | Female | 0 | No | No | 2 | Yes | No | Fiber optic | No | ... | No | No | No | No | Month-to-month | Yes | Electronic check | 70.7 | 151.65 | Yes |


### 1.2 Data Cleaning: The `TotalCharges` Fix
**Critical Issue:** The `TotalCharges` column is read as `object` (string) rather than a numeric type because it contains blank strings (`" "`) for customers with a tenure of 0. Left unconverted, this numeric predictive feature would be treated as categorical text and could not be used in numeric calculations.
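To make the issue concrete, here is a minimal sketch of how a single blank value keeps a column from being parsed as numeric. The three-row CSV and the name `sample` are hypothetical stand-ins for the real file, chosen only to reproduce the behaviour described above.

```python
import io

import pandas as pd

# Hypothetical three-row sample (not the real dataset): the second row has
# a blank TotalCharges value, mimicking a customer with tenure 0.
csv_text = (
    "customerID,tenure,MonthlyCharges,TotalCharges\n"
    "A,1,29.85,29.85\n"
    "B,0,52.55, \n"
    "C,34,56.95,1889.5\n"
)
sample = pd.read_csv(io.StringIO(csv_text))

# MonthlyCharges is parsed as a float, but TotalCharges is not numeric
# because one of its values is a blank string.
print(sample.dtypes)

# Locate the rows whose TotalCharges is only whitespace.
blank_mask = sample["TotalCharges"].str.strip() == ""
print(sample.loc[blank_mask, ["customerID", "tenure"]])
```

On the real data, the same two checks (`df.dtypes` and a whitespace mask on `df['TotalCharges']`) can be run before any conversion to confirm where the blanks are.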

**Action:**
1. Force-convert `TotalCharges` to numeric, turning errors into `NaN`.
2. Identify and count these missing values.
3. Impute `NaN` with `0`, as these customers have not yet been charged.

In [3]:
# Force convert to numeric, turning errors (blank strings) into NaN
df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce')

# Check how many missing values we created (these were the blank strings)
missing_count = df['TotalCharges'].isnull().sum()
print(f"Missing values in TotalCharges: {missing_count}")

# Fill missing values with 0
df['TotalCharges'] = df['TotalCharges'].fillna(0)

Missing values in TotalCharges: 11
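Imputing with `0` rests on the assumption that every missing `TotalCharges` value belongs to a customer with a tenure of 0. A minimal sketch of how that assumption could be checked before filling is shown below; the four-row frame `toy` is hypothetical and only illustrates the pattern, which on the real data would be applied to `df` between the conversion and the `fillna` call.

```python
import pandas as pd

# Hypothetical frame (not the real dataset): two brand-new customers have
# blank TotalCharges values.
toy = pd.DataFrame({
    "tenure": [0, 1, 34, 0],
    "MonthlyCharges": [52.55, 29.85, 56.95, 20.25],
    "TotalCharges": [" ", "29.85", "1889.5", " "],
})

# Same conversion as in the cell above: unparseable values become NaN.
toy["TotalCharges"] = pd.to_numeric(toy["TotalCharges"], errors="coerce")

# Check that every missing value belongs to a customer with tenure 0.
missing = toy["TotalCharges"].isna()
all_new_customers = bool((toy.loc[missing, "tenure"] == 0).all())
print(f"Missing rows: {missing.sum()}, all with tenure 0: {all_new_customers}")

# Only impute with 0 when the assumption holds.
if all_new_customers:
    toy["TotalCharges"] = toy["TotalCharges"].fillna(0)
```

If the check failed, filling with `0` would understate charges for established customers, and a different strategy would be needed.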
