# Task 5: Data Cleaning with Pandas
**Author:** Aditya Singh Tomar
**Objective:** Load the Titanic dataset, identify data quality issues (missing values and duplicates), and clean the data to prepare it for analysis.

## 1. Loading the Dataset
First, we import the necessary libraries (`pandas` and `numpy`) and load the Titanic dataset. We will display the first few rows to understand the column structure and data types.
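A minimal sketch of the loading step is shown here. The source file name is not given in this write-up, so the example reads a tiny in-memory CSV instead. Its rows are invented placeholders, and every column other than `Age`, `Embarked`, `Cabin`, `SibSp`, and `Parch` is an assumption about the dataset layout.

```python
import io

import numpy as np  # imported alongside pandas, as described above
import pandas as pd

# Hypothetical stand-in for the Titanic CSV: invented rows, assumed column set.
csv_text = """PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Fare,Cabin,Embarked
1,0,3,Passenger A,male,22.0,1,0,7.25,,S
2,1,1,Passenger B,female,38.0,1,0,71.28,C85,C
3,1,3,Passenger C,female,,0,0,7.92,,S
"""

# With a real file, pass its path to pd.read_csv instead of a StringIO buffer.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())   # first rows, to inspect the column structure
print(df.dtypes)   # data type that pandas inferred for each column
```

Empty fields in the CSV are read as `NaN`, which is what the next step counts.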

## 2. Identifying Missing Values
Before cleaning, we must identify which columns have null values. Using `.isnull().sum()`, we can see the count of missing entries for each feature.
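As a rough illustration, the snippet below runs `.isnull().sum()` on a small invented DataFrame (the values are placeholders, not real passenger records). It also computes the share of missing entries per column, which is an optional extra that can help when deciding how to treat a column.

```python
import numpy as np
import pandas as pd

# Invented sample with gaps in three columns.
df = pd.DataFrame({
    "Age": [22.0, np.nan, 38.0, np.nan],
    "Embarked": ["S", "C", None, "S"],
    "Cabin": [None, "C85", None, None],
    "Fare": [7.25, 71.28, 7.92, 8.05],
})

# Count of missing entries per column.
missing_counts = df.isnull().sum()
print(missing_counts)

# Fraction of missing entries per column (mean of the boolean mask).
missing_share = df.isnull().mean()
print(missing_share)
```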

## 3. Handling Missing Values
**Strategy:**
* **Age:** Age is numerical, so we fill missing values with the **median** rather than the mean, because the median is less sensitive to outliers.
* **Embarked:** This is a categorical column, so we fill missing values with the **mode** (the most frequent port).
* **Cabin:** This column has too many missing values to be useful, so we **drop** it entirely.
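The strategy above could be sketched as follows. The DataFrame is a small invented sample, so the median and mode it produces are illustrative only.

```python
import numpy as np
import pandas as pd

# Invented sample containing the three columns discussed above.
df = pd.DataFrame({
    "Age": [22.0, np.nan, 38.0, 30.0],
    "Embarked": ["S", "C", None, "S"],
    "Cabin": [None, "C85", None, None],
})

# Age: fill gaps with the median of the observed values.
age_median = df["Age"].median()
df["Age"] = df["Age"].fillna(age_median)

# Embarked: fill gaps with the most frequent value.
# .mode() returns a Series (there can be ties), so take its first entry.
embarked_mode = df["Embarked"].mode()[0]
df["Embarked"] = df["Embarked"].fillna(embarked_mode)

# Cabin: drop the column entirely.
df = df.drop(columns=["Cabin"])

print(df)
```

Assigning the result of `fillna` back to the column, rather than using `inplace=True` on a column selection, avoids ambiguity about whether the original DataFrame was actually modified.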

## 4. Removing Duplicates
Duplicate rows can skew analysis and lead to incorrect statistical results. We use `.drop_duplicates()` to remove rows that are exact copies of an earlier row, so that each remaining row is unique.
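A short sketch of this step on invented data: `.duplicated()` flags the repeated rows first, so we can see how many will be removed, and `.drop_duplicates()` then keeps the first occurrence of each. Resetting the index afterwards is optional and is included here only to keep the row labels contiguous.

```python
import pandas as pd

# Invented sample in which the second record appears twice.
df = pd.DataFrame({
    "PassengerId": [1, 2, 2, 3],
    "Name": ["Passenger A", "Passenger B", "Passenger B", "Passenger C"],
    "Age": [22.0, 38.0, 38.0, 26.0],
})

# Number of rows that exactly repeat an earlier row.
n_duplicates = df.duplicated().sum()
print(f"Duplicate rows found: {n_duplicates}")

# Keep the first occurrence of each row and renumber the index.
df = df.drop_duplicates().reset_index(drop=True)
print(df)
```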

## 5. Datatype Conversion
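The `Age` column is stored as a float (e.g., 22.0), but age is conceptually an integer. Once the missing values have been filled, we convert it using `.astype(int)` to standardize the format. Note that this conversion truncates fractional ages (such as those recorded for infants) rather than rounding them, and it raises an error if any missing values remain.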
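The example below is a small sketch of the cast on invented values. Two of the ages are deliberately fractional to show what `.astype(int)` does to them.

```python
import pandas as pd

# Invented ages, including two fractional values.
df = pd.DataFrame({"Age": [22.0, 38.0, 0.75, 28.5]})

# Cast float to integer; the fractional part is discarded, not rounded.
df["Age"] = df["Age"].astype(int)

print(df["Age"].tolist())
print(df["Age"].dtype)
```

If rounding is preferred over truncation, calling `.round()` before the cast is one alternative.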

## 6. Feature Engineering (Creating New Columns)
We can extract more value from the data by creating new features. Here, we create a **`FamilySize`** column by summing `SibSp` (siblings/spouses aboard) and `Parch` (parents/children aboard) and adding 1 for the passenger themselves.
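In code, the new feature is a single column-wise sum. The sketch below uses invented counts for `SibSp` and `Parch`.

```python
import pandas as pd

# Invented sample: relatives travelling with each passenger.
df = pd.DataFrame({
    "SibSp": [1, 0, 3],
    "Parch": [0, 0, 2],
})

# Family size = siblings/spouses + parents/children + the passenger.
df["FamilySize"] = df["SibSp"] + df["Parch"] + 1

print(df)
```

A passenger travelling alone therefore has a `FamilySize` of 1, not 0.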

## 7. Saving the Cleaned Data
Finally, we save the processed DataFrame to a new CSV file (`cleaned_titanic_data.csv`), which is then ready for visualization or modeling.
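A minimal sketch of the save step follows. The DataFrame is an invented stand-in for the cleaned data, and writing into a temporary directory is an assumption made only to keep the example self-contained. In practice the file would go wherever the project keeps its outputs.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Invented stand-in for the cleaned DataFrame.
df = pd.DataFrame({
    "Age": [22, 38, 26],
    "Embarked": ["S", "C", "S"],
    "FamilySize": [2, 2, 1],
})

# Assumed output location: a fresh temporary directory.
out_path = Path(tempfile.mkdtemp()) / "cleaned_titanic_data.csv"

# index=False keeps the row index from being written as an extra column.
df.to_csv(out_path, index=False)

# Read the file back to confirm that it was written as expected.
reloaded = pd.read_csv(out_path)
print(reloaded)
```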