# Phase 3 – Data Exploration & Feature Engineering
## Objectives
Analyze the collected data to understand its structure, uncover meaningful insights, and engineer useful features for modeling.

## Tasks

### 1. Exploratory Data Analysis (EDA)
- Compute descriptive statistics (mean, median, standard deviation, etc.).
- Visualize distributions and relationships between variables using charts (histograms, scatter plots, correlation heatmaps, box plots, etc.).
- Identify trends, outliers, anomalies, and potential issues.
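
The first and third bullets can be sketched as follows, assuming a pandas DataFrame with numeric columns. The toy values and the helper name `iqr_outliers` are illustrative and not taken from the project data. The 1.5 × IQR rule used here is one common convention for flagging outliers, not the only option.

```python
import pandas as pd

# Toy stand-in data (hypothetical values, not the project dataset).
toy = pd.DataFrame({
    "tenure": [1, 34, 2, 45, 2, 8, 22, 10],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30, 70.70, 99.65, 89.10, 300.00],
})

# Descriptive statistics: count, mean, std, min, quartiles, max.
summary = toy.describe()
# describe() reports the median as the "50%" row; add it under an explicit name.
summary.loc["median"] = toy.median()
print(summary.round(2))


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask marking values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# Rows whose MonthlyCharges value falls outside the IQR fences.
outlier_mask = iqr_outliers(toy["MonthlyCharges"])
print(toy[outlier_mask])

# Pairwise Pearson correlations: the matrix a correlation heatmap would display.
corr = toy.corr()
print(corr.round(2))
```

A flagged value is a candidate for inspection rather than automatic removal; whether it is an error or a genuine extreme depends on the domain.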

### 2. Insight Extraction
- Highlight important patterns or relationships relevant to your research problem.
- Describe how these findings guide your next steps (modeling or deeper analysis).
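
One way to surface a pattern relevant to a churn problem is to compare the target rate across the levels of a categorical variable. The sketch below assumes a yes/no `churn` label and a `Contract` column; the rows are made-up toy values, and `churn_flag` is a hypothetical helper column.

```python
import pandas as pd

# Toy stand-in data (hypothetical values, not the project dataset).
toy = pd.DataFrame({
    "Contract": ["month-to-month", "one year", "month-to-month",
                 "one year", "month-to-month", "two year"],
    "churn": ["no", "no", "yes", "no", "yes", "no"],
})

# Convert the yes/no label to 1/0 so that a group mean equals the churn rate.
toy["churn_flag"] = (toy["churn"] == "yes").astype(int)

# Churn rate and group size per contract type, highest rate first.
churn_by_contract = (
    toy.groupby("Contract")["churn_flag"]
       .agg(churn_rate="mean", customers="count")
       .sort_values("churn_rate", ascending=False)
)
print(churn_by_contract)
```

Reporting the group size next to the rate helps avoid over-reading a high rate that rests on only a handful of customers.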

### 3. Feature Engineering
- Create new attributes from existing raw features (e.g., ratios, aggregated variables, domain-based transformations).
- Encode categorical variables if needed.
- Scale/normalize features where appropriate.
- Justify why each engineered feature might improve model performance.
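
The steps above can be sketched with plain pandas, under a few assumptions: the column names follow the Telco churn layout, the feature name `avg_charge_per_month` is hypothetical, and z-score standardization stands in for whichever scaling method suits the chosen model.

```python
import pandas as pd

# Toy stand-in data (illustrative values, not the full project dataset).
toy = pd.DataFrame({
    "tenure": [1, 34, 2, 45],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30],
    "TotalCharges": [29.85, 1889.50, 108.15, 1840.75],
    "Contract": ["month-to-month", "one year", "month-to-month", "one year"],
})

features = toy.copy()

# Ratio feature: average charge per month of tenure (hypothetical name).
# clip(lower=1) avoids division by zero for customers with tenure 0.
features["avg_charge_per_month"] = (
    features["TotalCharges"] / features["tenure"].clip(lower=1)
)

# One-hot encode the categorical column; drop_first removes a redundant level.
features = pd.get_dummies(features, columns=["Contract"], drop_first=True, dtype=int)

# Standardize numeric columns to zero mean and unit variance (z-score).
numeric_cols = ["tenure", "MonthlyCharges", "TotalCharges", "avg_charge_per_month"]
features[numeric_cols] = (
    features[numeric_cols] - features[numeric_cols].mean()
) / features[numeric_cols].std(ddof=0)

print(features.round(2))
```

In a modeling pipeline the mean and standard deviation would normally be computed on the training split only and then reused on validation and test data, so that no information leaks from held-out rows.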

## Deliverables
A concise EDA and Feature Engineering Report including:
- Key statistics and summary tables
- Visualizations with clear explanations
- A list of extracted insights
- A table of selected features with description and justification

### 1. Exploratory Data Analysis (EDA)

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
```

```python
# Load the cleaned stage-2 dataset
data_file = "datasets/telco_churn_clean_stage_2.csv"
df = pd.read_csv(data_file)

# Report the dimensions (rows, columns), then preview the first five rows
print("The shape of the dataset is: ", df.shape, "\n")
df.head()
```

The shape of the dataset is:  (5901, 21)

| | customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | ... | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7590-vhveg | female | 0 | yes | no | 1 | no | no phone service | dsl | no | ... | no | no | no | no | month-to-month | yes | electronic check | 29.85 | 29.85 | no |
| 1 | 5575-gnvde | male | 0 | no | no | 34 | yes | no | dsl | yes | ... | yes | no | no | no | one year | no | mailed check | 56.95 | 1889.5 | no |
| 2 | 3668-qpybk | male | 0 | no | no | 2 | yes | no | dsl | yes | ... | no | no | no | no | month-to-month | yes | mailed check | 53.85 | 108.15 | yes |
| 3 | 7795-cfocw | male | 0 | no | no | 45 | no | no phone service | dsl | yes | ... | yes | yes | no | no | one year | no | bank transfer (automatic) | 42.3 | 1840.75 | no |
| 4 | 9237-hqitu | female | 0 | no | no | 2 | yes | no | fiber optic | no | ... | no | no | no | no | month-to-month | yes | electronic check | 70.7 | 151.65 | yes |
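
Before computing statistics, a quick structural check of the loaded frame can catch parsing problems early. The sketch below is self-contained and uses a small in-memory CSV in the style of the preview above; the last row, including the ID `0000-xxxxx` and its blank `TotalCharges`, is hypothetical and added only to show how a missing value surfaces.

```python
import io

import pandas as pd

# Small in-memory CSV standing in for the real file (last row is hypothetical).
csv_text = """customerID,tenure,MonthlyCharges,TotalCharges,churn
7590-vhveg,1,29.85,29.85,no
5575-gnvde,34,56.95,1889.5,no
3668-qpybk,2,53.85,108.15,yes
0000-xxxxx,0,20.00,,no
"""
sample = pd.read_csv(io.StringIO(csv_text))

# Column types as parsed by pandas; numeric columns should not appear as object.
print(sample.dtypes)

# Missing values per column.
missing = sample.isna().sum()
print(missing)

# Number of repeated customer IDs.
n_duplicate_ids = int(sample.duplicated("customerID").sum())
print(n_duplicate_ids)

# Class balance of the target as proportions.
balance = sample["churn"].value_counts(normalize=True)
print(balance)
```

The same four checks can be run on `df` directly once the real file is loaded.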
