In [1]:
import pandas as pd

# Load dataset
df = pd.read_csv("../data/customer_churn.csv")

print("\n--- CHECK MISSING VALUES (BEFORE CLEANING) ---")
print(df.isnull().sum())

# Convert TotalCharges to numeric; entries that cannot be parsed (e.g. blank
# strings) become NaN, so they only count as missing after this step
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")

# Separate numeric and categorical columns
numeric_columns = df.select_dtypes(include=["int64", "float64"]).columns
categorical_columns = df.select_dtypes(include=["object"]).columns

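# Fill numeric missing values with the column median.
# Assigning the result back avoids the chained inplace call
# (df[col].fillna(..., inplace=True)) that pandas warns about.
for col in numeric_columns:
    df[col] = df[col].fillna(df[col].median())

# Fill categorical missing values with the most frequent value
for col in categorical_columns:
    df[col] = df[col].fillna(df[col].mode()[0])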

print("\n--- CHECK MISSING VALUES (AFTER CLEANING) ---")
print(df.isnull().sum())

# Save cleaned dataset
df.to_csv("../data/customer_churn_cleaned.csv", index=False)

print("\nDAY 4 COMPLETED: Missing values handled successfully")



--- CHECK MISSING VALUES (BEFORE CLEANING) ---
customerID          0
gender              0
SeniorCitizen       0
Partner             0
Dependents          0
tenure              0
PhoneService        0
MultipleLines       0
InternetService     0
OnlineSecurity      0
OnlineBackup        0
DeviceProtection    0
TechSupport         0
StreamingTV         0
StreamingMovies     0
Contract            0
PaperlessBilling    0
PaymentMethod       0
MonthlyCharges      0
TotalCharges        0
Churn               0
dtype: int64
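
Note that the zero reported for TotalCharges above is measured before the pd.to_numeric conversion. If the raw column stores blanks as strings, isnull() does not count them until they are coerced to NaN. A minimal sketch on a made-up three-row frame (the values are illustrative assumptions, not taken from the dataset):

```python
import pandas as pd

# Hypothetical toy data: one blank string stands in for a missing charge.
toy = pd.DataFrame({"TotalCharges": ["29.85", " ", "108.15"]})

# Before conversion the blank is an ordinary string, so nothing is null.
missing_before = int(toy["TotalCharges"].isnull().sum())

# errors="coerce" turns entries that cannot be parsed into NaN.
toy["TotalCharges"] = pd.to_numeric(toy["TotalCharges"], errors="coerce")
missing_after = int(toy["TotalCharges"].isnull().sum())

print(missing_before, missing_after)
```

Running the missing-value check a second time right after the conversion would show whether the real column is affected in the same way.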

--- CHECK MISSING VALUES (AFTER CLEANING) ---
customerID          0
gender              0
SeniorCitizen       0
Partner             0
Dependents          0
tenure              0
PhoneService        0
MultipleLines       0
InternetService     0
OnlineSecurity      0
OnlineBackup        0
DeviceProtection    0
TechSupport         0
StreamingTV         0
StreamingMovies     0
Contract            0
PaperlessBilling    0
PaymentMethod       0
MonthlyCharges      0
TotalCharges

The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  df[col].fillna(df[col].median(), inplace=True)
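
The two alternatives named in the warning can be sketched as follows, on a made-up frame (the column values are illustrative assumptions). Both avoid calling an inplace method on the intermediate df[col] object:

```python
import pandas as pd

# Hypothetical toy frame with one missing value per column.
toy = pd.DataFrame({
    "MonthlyCharges": [20.0, None, 40.0],
    "Contract": ["One year", None, "One year"],
})

# Option 1: assign the filled column back to the frame.
toy["MonthlyCharges"] = toy["MonthlyCharges"].fillna(toy["MonthlyCharges"].median())

# Option 2: call fillna on the frame itself with a {column: value} mapping.
toy = toy.fillna({"Contract": toy["Contract"].mode()[0]})

# Count the missing values left across the whole frame.
remaining = int(toy.isnull().sum().sum())
```

Either form operates on the original frame, so the result does not depend on whether df[col] behaves as a view or a copy.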