# 🧪 Mini AI Project: Data Cleaning & Exploration
Welcome to your team-based mini-project! In this notebook, you'll work together to explore and clean a small dataset. This is a practical introduction to the data preparation phase in an AI project pipeline.

👉 **Goal**: Identify and clean common issues in a real-world dataset.
👉 **Skills practiced**: data loading, exploration, cleaning, and documentation.


In [None]:
# 📦 Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns


In [None]:
# 📂 Load the dataset
# You can replace this path with any dataset of your choice
url = 'https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv'
df = pd.read_csv(url)
df.head()

In [None]:
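# 🔍 Quick data overview
# info() prints each column's dtype and non-null count
df.info()
# describe() summarizes the numeric columns (count, mean, std, quartiles)
df.describe()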
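
By default `describe()` covers only numeric columns, so text columns need a separate look. The sketch below is one possible way to do that. The helper name `categorical_overview` and the toy frame are made up for illustration and are not part of the original notebook.

```python
import pandas as pd


def categorical_overview(frame: pd.DataFrame) -> pd.DataFrame:
    """Summarize the non-numeric columns: dtype and number of distinct values."""
    non_numeric = frame.select_dtypes(exclude="number")
    return pd.DataFrame({
        "dtype": non_numeric.dtypes.astype(str),
        # nunique() ignores missing values by default
        "unique": non_numeric.nunique(),
    })


# Toy frame for illustration only; not the Titanic data.
toy = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0],
    "Sex": ["male", "female", "female"],
    "Embarked": ["S", "C", "S"],
})
toy_overview = categorical_overview(toy)
print(toy_overview)
```

In the notebook, the same helper could be called as `categorical_overview(df)` once the dataset is loaded.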

In [None]:
# 🕳️ Check for missing values
df.isnull().sum()
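
Raw counts are easier to judge next to the share of rows they represent. The following is a minimal sketch of a summary table with both. The helper name `missing_summary` and the toy frame are assumptions introduced here, not part of the original notebook.

```python
import numpy as np
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return missing-value counts and percentages for columns that have gaps."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(1),
    })
    # Keep only columns with at least one missing value, worst first.
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Toy frame for illustration only; not the Titanic data.
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 35.0, np.nan],
    "Embarked": ["S", "C", None, "S"],
    "Fare": [7.25, 71.28, 8.05, 53.10],
})
toy_summary = missing_summary(toy)
print(toy_summary)
```

Calling `missing_summary(df)` in the notebook would give the team a quick basis for deciding which columns to fill, drop, or leave alone.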

In [None]:
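# 🧼 Example cleaning: fill missing Age values with the median age
# Assign the result back to the column; recent pandas versions warn when
# fillna(..., inplace=True) is called on a selected column like df['Age']
df['Age'] = df['Age'].fillna(df['Age'].median())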
# Drop rows with missing Embarked
df.dropna(subset=['Embarked'], inplace=True)
df.isnull().sum()
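
The cleaning cell above changes `df` directly, which makes it hard to compare before and after. One alternative, sketched here under the assumption that the frame has `Age` and `Embarked` columns, is to wrap the same two steps in a function that returns a cleaned copy. The name `clean_passengers` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def clean_passengers(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy; the input frame is left unchanged."""
    cleaned = frame.copy()
    # The median is computed from the observed (non-missing) ages.
    cleaned["Age"] = cleaned["Age"].fillna(cleaned["Age"].median())
    # Drop rows where the port of embarkation is unknown.
    cleaned = cleaned.dropna(subset=["Embarked"])
    return cleaned.reset_index(drop=True)


# Toy frame for illustration only; not the Titanic data.
toy = pd.DataFrame({
    "Age": [20.0, np.nan, 40.0, 30.0],
    "Embarked": ["S", "C", None, "Q"],
})
toy_clean = clean_passengers(toy)
print(f"rows before: {len(toy)}, rows after: {len(toy_clean)}")
```

Keeping the raw frame intact lets the team record how many rows and values each cleaning decision affected, which supports the documentation goal of this project.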

In [None]:
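# 📊 Visualize the distribution of passenger ages
# Note: if missing ages were filled with the median, expect a spike in that bin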
sns.histplot(df['Age'], bins=20)
plt.title('Distribution of Age')
plt.show()

## 🧠 Team Reflection
- What kinds of data issues did you find?
- How did you decide what to clean or keep?
- What ethical concerns might arise from this data?
