# Assignment - Data Analytics Process and Interpretation
### **Business Domain -** Telecommunications
### **Dataset -** Telco Customer Churn (Kaggle)
### **Analytical Goal -** To identify the key drivers of customer attrition and provide data-driven recommendations to reduce the churn rate.

## 1. Initial Setup and Data Ingestion
We begin by importing the necessary libraries for data manipulation, statistical testing, and visualization.

In [5]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import chi2_contingency

### 1.1 Visual styling

In [6]:
sns.set_theme(style="whitegrid", context="notebook", font_scale=1.1)
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['axes.titlesize'] = 16
pd.set_option('display.max_columns', None)

### 1.2 Import data

In [7]:
from google.colab import drive
drive.mount('/content/drive')

path = "/content/drive/MyDrive/Telco-Customer-Churn.csv"
df = pd.read_csv(path)

print("Data loaded successfully.")

Mounted at /content/drive
Data loaded successfully.


## 2. Data Cleaning & Type Formatting
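Before analysis, we must handle structural issues. `TotalCharges` is stored as text because it contains 11 empty strings, all belonging to customers with a tenure of 0. We convert the column to numeric and set these 11 values to 0.0, since a customer with no tenure has not yet been billed.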
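The claim that the blanks coincide with zero tenure is worth checking before imputing. Below is a minimal sketch on a made-up four-row frame (the `toy` data and the `all_zero_tenure` name are illustrative assumptions, not part of the dataset); it also assigns the filled column back rather than calling `fillna(..., inplace=True)` on a column selection.

```python
import pandas as pd

# Toy stand-in for the real data: blank TotalCharges where tenure is 0
toy = pd.DataFrame({
    "tenure": [0, 12, 0, 5],
    "MonthlyCharges": [20.0, 50.0, 80.0, 30.0],
    "TotalCharges": [" ", "600.5", " ", "150"],
})

# Coerce to numeric; blank strings cannot be parsed and become NaN
total = pd.to_numeric(toy["TotalCharges"], errors="coerce")

# Confirm every missing value belongs to a zero-tenure customer
blank_rows = toy.loc[total.isna()]
all_zero_tenure = bool((blank_rows["tenure"] == 0).all())

# Assign the filled column back to the frame
toy["TotalCharges"] = total.fillna(0.0)
```

If `all_zero_tenure` came out `False` on the real data, imputing 0 would understate charges for some customers, and a different strategy (for example `tenure * MonthlyCharges`) might be more appropriate.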

In [8]:
# Drop the CustomerID
df.drop('customerID', axis=1, inplace=True)

# Convert TotalCharges to numeric, coercing errors to NaN
df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce')

# Check for missing values
missing_val_count = df['TotalCharges'].isnull().sum()
print(f"Missing values in TotalCharges: {missing_val_count} \n")

# Impute missing values with 0 (consistent with 0 tenure)
df['TotalCharges'] = df['TotalCharges'].fillna(0)

# Verify Types
df.info()


Missing values in TotalCharges: 11 

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 20 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   gender            7043 non-null   object 
 1   SeniorCitizen     7043 non-null   int64  
 2   Partner           7043 non-null   object 
 3   Dependents        7043 non-null   object 
 4   tenure            7043 non-null   int64  
 5   PhoneService      7043 non-null   object 
 6   MultipleLines     7043 non-null   object 
 7   InternetService   7043 non-null   object 
 8   OnlineSecurity    7043 non-null   object 
 9   OnlineBackup      7043 non-null   object 
 10  DeviceProtection  7043 non-null   object 
 11  TechSupport       7043 non-null   object 
 12  StreamingTV       7043 non-null   object 
 13  StreamingMovies   7043 non-null   object 
 14  Contract          7043 non-null   object 
 15  PaperlessBilling  7043 non-null   object 
 16  Payme



In [9]:
# Convert binary categorical features to 0/1 integer codes

df['gender'] = df['gender'].map({'Female': 1, 'Male': 0})

replace_cols = ['Partner', 'Dependents', 'PhoneService', 'MultipleLines', 'OnlineSecurity', 'OnlineBackup', 'DeviceProtection',
                'TechSupport', 'StreamingTV', 'StreamingMovies', 'PaperlessBilling', 'Churn']

# 'No internet service' and 'No phone service' are treated the same as 'No'
binary_map = {'No internet service': 0, 'No phone service': 0, 'No': 0, 'Yes': 1}
for col in replace_cols:
    # map() yields integer columns directly instead of relying on replace() to downcast object columns
    df[col] = df[col].map(binary_map)
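Because this step collapses 'No internet service' and 'No phone service' into 0, any label outside the expected four would otherwise pass through unnoticed. The following is a minimal sketch, on a made-up three-row frame, of the same encoding with an explicit check for unmapped labels; the `toy` data and the `binary_map` and `encoded` names are assumptions for illustration.

```python
import pandas as pd

# Toy frame with the four labels that occur in the binary service columns
toy = pd.DataFrame({
    "Partner": ["Yes", "No", "No"],
    "MultipleLines": ["No phone service", "Yes", "No"],
    "OnlineSecurity": ["No internet service", "No", "Yes"],
})

binary_map = {"Yes": 1, "No": 0, "No internet service": 0, "No phone service": 0}

encoded = toy.copy()
for col in encoded.columns:
    # Fail loudly if a column holds a label the mapping does not cover
    unexpected = set(encoded[col].unique()) - set(binary_map)
    if unexpected:
        raise ValueError(f"{col} has unmapped labels: {unexpected}")
    encoded[col] = encoded[col].map(binary_map).astype("int64")
```

After encoding, a 0 in `OnlineSecurity` no longer distinguishes "has internet but no security add-on" from "has no internet service"; that information remains available only through `InternetService`.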



In [10]:
# Keep an independent copy of the multi-class columns for later plots.
df_original_multicats = df[['InternetService','Contract','PaymentMethod']].copy()

# One-hot encode the remaining multi-class categorical features
df = pd.get_dummies(df)
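To illustrate what this step produces, here is a small sketch on a made-up frame (the `toy`, `labels`, and `encoded` names are assumptions). It shows that `pd.get_dummies` expands only the text columns and passes numeric columns through unchanged. Passing `dtype=int` is an optional choice made here so the indicators are 0/1 integers like the other encoded features; the cell above uses the default.

```python
import pandas as pd

# Toy frame: one multi-class text column plus two numeric columns
toy = pd.DataFrame({
    "Contract": ["Month-to-month", "One year", "Two year"],
    "tenure": [1, 24, 60],
    "Churn": [1, 0, 0],
})

# .copy() keeps the original labels in a frame independent of `toy`
labels = toy[["Contract"]].copy()

# Text columns become one indicator column per category;
# numeric columns are kept as they are and come first in the result
encoded = pd.get_dummies(toy, dtype=int)
```

Each original category becomes a column named `<column>_<category>`, so `Contract` yields three indicator columns while `tenure` and `Churn` are left untouched.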