# 📂 Notebook 03: Loading and Exploring Data

Welcome to the real world — where data lives in messy CSVs and your job is to make sense of it.

In this notebook, you’ll:
- Load a CSV file using `pd.read_csv()`
- Use `.head()`, `.tail()`, `.info()`, and `.describe()` to explore your data
- Identify potential issues (missing values, bad types, oddball rows)

Let’s get our hands dirty.

---

In [None]:
import pandas as pd

## 📥 Load a CSV

To load your own data, pass a CSV path or URL to `pd.read_csv()`. For testing, the cell below uses one of seaborn's built-in datasets instead.

In [1]:
# Example with seaborn's Titanic dataset
import seaborn as sns
df = sns.load_dataset("titanic")
df.head()

,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
0,0,3,male,22.0,1,0,7.25,S,Third,man,True,,Southampton,no,False
1,1,1,female,38.0,1,0,71.2833,C,First,woman,False,C,Cherbourg,yes,False
2,1,3,female,26.0,0,0,7.925,S,Third,woman,False,,Southampton,yes,True
3,1,1,female,35.0,1,0,53.1,S,First,woman,False,C,Southampton,yes,False
4,0,3,male,35.0,0,0,8.05,S,Third,man,True,,Southampton,no,True
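
If you would rather practice `pd.read_csv()` itself, here is a minimal sketch. The inline CSV text and the `people` name are made up for illustration; `io.StringIO` just stands in for a real file path or URL.

```python
import io
import pandas as pd

# Hypothetical inline CSV standing in for a real file path or URL
csv_text = """name,age,city
Ada,36,London
Grace,,New York
Linus,28,Helsinki
"""

# read_csv accepts a path, a URL, or any file-like object
people = pd.read_csv(io.StringIO(csv_text))
print(people.head())
```

Notice the empty `age` for Grace: pandas reads it as a missing value, which is exactly the kind of thing the rest of this notebook helps you spot.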


## 📊 Peek at the Data

In [None]:
# First and last few rows
print("First 5 rows:")
print(df.head())

print("\nLast 5 rows:")
print(df.tail())
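
Both `.head()` and `.tail()` default to 5 rows, but they take an optional row count. A small sketch on a made-up ten-row frame (the `toy` name is hypothetical):

```python
import pandas as pd

# Hypothetical toy frame with rows numbered 0 through 9
toy = pd.DataFrame({"n": range(10)})

first_three = toy.head(3)  # first 3 rows instead of the default 5
last_two = toy.tail(2)     # last 2 rows

print(first_three)
print(last_two)
```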

## 🧠 Understand Structure

In [None]:
# Dimensions and columns
print("Shape:", df.shape)
print("\nColumns:", df.columns.tolist())

## 🧼 Data Types & Missing Values

In [None]:
# Data types and null counts
print("\nInfo:")
df.info()

# How many missing values per column?
print("\nMissing values per column:")
print(df.isnull().sum())
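
The `Dtype` column of `.info()` is also where bad types show up: a column of numbers stored as text usually means a stray non-numeric entry snuck in. One common way to handle that is `pd.to_numeric()` with `errors="coerce"`, sketched here on made-up data (the `toy` frame and `price` column are hypothetical, not part of the Titanic dataset):

```python
import pandas as pd

# Hypothetical toy column: prices stored as text because of one bad entry
toy = pd.DataFrame({"price": ["7.25", "71.28", "unknown", "8.05"]})
print(toy["price"].dtype)  # a text dtype, not a number

# errors="coerce" turns unparseable entries into NaN instead of raising
toy["price_num"] = pd.to_numeric(toy["price"], errors="coerce")

print(toy["price_num"].dtype)
print("Entries that failed to parse:", toy["price_num"].isnull().sum())
```

Coercing is a judgment call: it hides the original bad value, so it is worth looking at the offending rows before overwriting anything.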

## 📈 Summary Statistics

In [None]:
df.describe(include="all")
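
With `include="all"`, numeric and non-numeric columns share one table, so many cells come back as `NaN` where a statistic does not apply. If that gets noisy, one option is to summarize the two groups separately. A rough sketch on a made-up frame (`toy` is hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one numeric and one text column
toy = pd.DataFrame({
    "fare": [7.25, 71.28, 8.05],
    "sex": ["male", "female", "male"],
})

# Default describe(): numeric columns only (count, mean, std, quartiles, ...)
numeric_summary = toy.describe()

# Excluding numbers leaves the text columns (count, unique, top, freq)
categorical_summary = toy.describe(exclude=[np.number])

print(numeric_summary)
print(categorical_summary)
```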

## ✏️ Renaming Columns (Optional but fun)

In [None]:
# Rename 'sex' to 'gender' for clarity
df.rename(columns={"sex": "gender"}, inplace=True)
df.head(2)
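
`inplace=True` changes `df` itself. If you want to keep the original around, `.rename()` without `inplace` returns a renamed copy instead. A minimal sketch on a made-up frame (`toy` and `renamed` are hypothetical names):

```python
import pandas as pd

# Hypothetical toy frame
toy = pd.DataFrame({"sex": ["male", "female"], "age": [22.0, 38.0]})

# Without inplace=True, rename() returns a new DataFrame
renamed = toy.rename(columns={"sex": "gender"})

print(toy.columns.tolist())      # original is untouched
print(renamed.columns.tolist())  # copy carries the new name
```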

---
## 🔍 Your Turn

1. Load a dataset of your choice (`pd.read_csv()` or `sns.load_dataset()`).
2. Print the first and last 5 rows.
3. Show `.info()` and `.describe()` results.
4. Print the column names. Rename one of them.

🎯 **Bonus:** What percentage of rows have *any* missing values?

```python
# HINT:
df.isnull().any(axis=1).mean() * 100  # percent of rows with any NaNs
```
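
To see why the hint works, here it is step by step on a tiny made-up frame where the answer is easy to check by eye (`toy` is hypothetical; two of its four rows have a gap):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: rows 1 and 2 each have one missing value
toy = pd.DataFrame({
    "age": [22.0, np.nan, 26.0, 35.0],
    "deck": ["A", "C", None, "B"],
})

# True for each row that has at least one missing value
rows_with_gap = toy.isnull().any(axis=1)

# The mean of a boolean Series is the fraction of True values
pct = rows_with_gap.mean() * 100
print(pct)
```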


In [11]:
# Your exploratory code playground!
import pandas as pd

# This URL points at seaborn's tips dataset, so the variable is named to match
tips = pd.read_csv("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv")
print(tips.head())
print(tips.tail())
tips.info()  # info() prints its own report and returns None, so no print() needed
print(tips.describe())
print(tips.columns.tolist())
tips.rename(columns={'total_bill': 'bill'}, inplace=True)
print(tips)
print(tips.isnull().any(axis=1).mean() * 100)  # percent of rows with any NaNs

   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4
     total_bill   tip     sex smoker   day    time  size
239       29.03  5.92    Male     No   Sat  Dinner     3
240       27.18  2.00  Female    Yes   Sat  Dinner     2
241       22.67  2.00    Male    Yes   Sat  Dinner     2
242       17.82  1.75    Male     No   Sat  Dinner     2
243       18.78  3.00  Female     No  Thur  Dinner     2
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 244 entries, 0 to 243
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   total_bill  244 non-null    float64
 1   tip         244 non-null    float64
 2   sex         244 non-null    object 
 3   smoker      244 non-

---
## 🎓 Why This Matters

Every data science project begins here — importing and inspecting data. If your data is a mess (spoiler: it always is), you need to know how to check it, clean it, and prep it.

Next up: slicing and dicing — using `.loc[]`, `.iloc[]`, and boolean masks to select what you want and ignore what you don't.