# Week 2 — pandas Foundations (Series, DataFrames, Indexing)

**Goal:** Build confidence with pandas fundamentals: loading data, inspecting dtypes, selecting rows/columns, filtering, assigning new columns, and basic aggregations.

**Use ONE of these stable datasets (raw CSV links):**
- **Penguins** (clean, small):
  `https://raw.githubusercontent.com/allisonhorst/palmerpenguins/master/inst/extdata/penguins.csv`
- **Drinks by country** (FiveThirtyEight):
  `https://raw.githubusercontent.com/fivethirtyeight/data/master/alcohol-consumption/drinks.csv`


## Setup

In [8]:
# Pick ONE dataset and set DATA_URL to its raw CSV link (a local copy of the file, as here, also works)
DATA_URL = r"C:\Users\taylorlile\projects\data-science-career-accelerator\week_2\datasets\penguins.csv"

import pandas as pd
df = pd.read_csv(DATA_URL)
df.head()

|   | species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex | year |
|---|---|---|---|---|---|---|---|---|
| 0 | Adelie | Torgersen | 39.1 | 18.7 | 181.0 | 3750.0 | male | 2007 |
| 1 | Adelie | Torgersen | 39.5 | 17.4 | 186.0 | 3800.0 | female | 2007 |
| 2 | Adelie | Torgersen | 40.3 | 18.0 | 195.0 | 3250.0 | female | 2007 |
| 3 | Adelie | Torgersen | NaN | NaN | NaN | NaN | NaN | 2007 |
| 4 | Adelie | Torgersen | 36.7 | 19.3 | 193.0 | 3450.0 | female | 2007 |


## Task 1 — Load & Inspect

**Expectations:**
- Show shape, columns, dtypes, and the first 5 rows.
- Identify any obvious missing values.


In [22]:
# Shape
print('df shape:', df.shape)

# Columns
print('Columns:', list(df.columns))

# dtypes


# 5 rows
df.head()

df shape: (344, 8)
Columns: ['species', 'island', 'bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g', 'sex', 'year']


|   | species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex | year |
|---|---|---|---|---|---|---|---|---|
| 0 | Adelie | Torgersen | 39.1 | 18.7 | 181.0 | 3750.0 | male | 2007 |
| 1 | Adelie | Torgersen | 39.5 | 17.4 | 186.0 | 3800.0 | female | 2007 |
| 2 | Adelie | Torgersen | 40.3 | 18.0 | 195.0 | 3250.0 | female | 2007 |
| 3 | Adelie | Torgersen | NaN | NaN | NaN | NaN | NaN | 2007 |
| 4 | Adelie | Torgersen | 36.7 | 19.3 | 193.0 | 3450.0 | female | 2007 |
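
The cell above leaves the dtypes step empty and does not yet check for missing values. A minimal sketch of both follows, assuming the five rows shown in the output stand in for the full file; `SAMPLE_CSV` and `missing` are illustrative names that are not part of the original notebook.

```python
import io

import pandas as pd

# Assumption: the five rows from the head() output stand in for the full CSV.
SAMPLE_CSV = """species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year
Adelie,Torgersen,39.1,18.7,181.0,3750.0,male,2007
Adelie,Torgersen,39.5,17.4,186.0,3800.0,female,2007
Adelie,Torgersen,40.3,18.0,195.0,3250.0,female,2007
Adelie,Torgersen,,,,,,2007
Adelie,Torgersen,36.7,19.3,193.0,3450.0,female,2007
"""
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# dtypes: one entry per column
print(df.dtypes)

# Missing values: count NaNs per column and keep only the columns that have any
missing = df.isna().sum()
print(missing[missing > 0])
```

In the notebook itself, the last four statements can be run directly on the `df` loaded in Setup.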


## Task 2 — Column Selection & Renaming

**Expectations:**
- Select 3–5 columns relevant to simple questions you want to answer.
- If needed, rename columns to snake_case for consistency.
- Briefly explain *why* you chose these columns.

In [None]:
# Your code here
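
One possible approach to the selection and renaming steps is sketched below, assuming the five sample rows from the Setup output. The penguins columns are already snake_case, so the renaming step is demonstrated on hypothetical messy headers (`"Body Mass (g)"`, `"Flipper Length mm"`) invented purely for illustration.

```python
import io

import pandas as pd

# Assumption: the five rows from the Setup output stand in for the full CSV.
SAMPLE_CSV = """species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year
Adelie,Torgersen,39.1,18.7,181.0,3750.0,male,2007
Adelie,Torgersen,39.5,17.4,186.0,3800.0,female,2007
Adelie,Torgersen,40.3,18.0,195.0,3250.0,female,2007
Adelie,Torgersen,,,,,,2007
Adelie,Torgersen,36.7,19.3,193.0,3450.0,female,2007
"""
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Select a handful of columns by passing a list of names
selected = df[["species", "sex", "flipper_length_mm", "body_mass_g"]]

# Hypothetical messy headers, created here only to demonstrate the cleanup
messy = selected.rename(
    columns={"body_mass_g": "Body Mass (g)", "flipper_length_mm": "Flipper Length mm"}
)

# Normalize to snake_case: lowercase, runs of non-alphanumerics become "_"
messy.columns = (
    messy.columns.str.strip()
    .str.lower()
    .str.replace(r"[^0-9a-z]+", "_", regex=True)
    .str.strip("_")
)
print(list(messy.columns))
```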

## Task 3 — Row Filtering

**Expectations:**
- Create at least 3 different filters (e.g., equality, string contains, numeric threshold, date-based).
- Show the count of rows that match each filter.
- Keep outputs tidy (head() and counts; avoid flooding the screen).

In [None]:
# Your code here
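
As a rough illustration of the three filter types, the sketch below builds boolean masks and counts matches, assuming the five sample rows from the Setup output. The mask names and thresholds are arbitrary choices for the example.

```python
import io

import pandas as pd

# Assumption: the five rows from the Setup output stand in for the full CSV.
SAMPLE_CSV = """species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year
Adelie,Torgersen,39.1,18.7,181.0,3750.0,male,2007
Adelie,Torgersen,39.5,17.4,186.0,3800.0,female,2007
Adelie,Torgersen,40.3,18.0,195.0,3250.0,female,2007
Adelie,Torgersen,,,,,,2007
Adelie,Torgersen,36.7,19.3,193.0,3450.0,female,2007
"""
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Each filter is a boolean Series; rows with missing values evaluate to False
filters = {
    "sex == female": df["sex"] == "female",                          # equality
    "body_mass_g > 3500": df["body_mass_g"] > 3500,                  # numeric threshold
    "island contains 'Torg'": df["island"].str.contains("Torg", na=False),  # string contains
}

# Count matches per filter (a True counts as 1 when summed)
counts = {name: int(mask.sum()) for name, mask in filters.items()}
print(counts)

# Masks combine with & (and) and | (or); show only a few rows to keep output tidy
heavy_females = df[filters["sex == female"] & filters["body_mass_g > 3500"]]
print(heavy_females.head())
```

On the full dataset the counts will differ; the pattern of building a mask, summing it, and then indexing with it stays the same.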

## Task 4 — Derived Columns (Assign)

**Expectations:**
- Create 1–2 new columns derived from existing ones.
- Explain the logic in 1–2 sentences.

In [None]:
# Your code here
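
A minimal sketch of derived columns using `DataFrame.assign`, again assuming the five sample rows from the Setup output. The column names `body_mass_kg` and `bill_ratio` are illustrative choices, not requirements of the task.

```python
import io

import pandas as pd

# Assumption: the five rows from the Setup output stand in for the full CSV.
SAMPLE_CSV = """species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year
Adelie,Torgersen,39.1,18.7,181.0,3750.0,male,2007
Adelie,Torgersen,39.5,17.4,186.0,3800.0,female,2007
Adelie,Torgersen,40.3,18.0,195.0,3250.0,female,2007
Adelie,Torgersen,,,,,,2007
Adelie,Torgersen,36.7,19.3,193.0,3450.0,female,2007
"""
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# assign returns a new DataFrame; the lambdas receive the frame being built
derived = df.assign(
    body_mass_kg=lambda d: d["body_mass_g"] / 1000,                 # grams to kilograms
    bill_ratio=lambda d: d["bill_length_mm"] / d["bill_depth_mm"],  # length relative to depth
)
print(derived[["body_mass_g", "body_mass_kg", "bill_ratio"]].head())
```

Rows with missing inputs produce `NaN` in the derived columns, which is worth mentioning in the 1–2 sentence explanation.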

## Task 5 — Basic Aggregations

**Expectations:**
- Compute simple aggregates (count, mean/median, min/max).
- If the dataset is mostly categorical, compute frequencies instead.
- Show results in a small, readable table.

In [None]:
# Your code here
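
One way to produce a small summary table is sketched below, assuming the five sample rows from the Setup output. The choice of columns to summarize is arbitrary and only for illustration.

```python
import io

import pandas as pd

# Assumption: the five rows from the Setup output stand in for the full CSV.
SAMPLE_CSV = """species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year
Adelie,Torgersen,39.1,18.7,181.0,3750.0,male,2007
Adelie,Torgersen,39.5,17.4,186.0,3800.0,female,2007
Adelie,Torgersen,40.3,18.0,195.0,3250.0,female,2007
Adelie,Torgersen,,,,,,2007
Adelie,Torgersen,36.7,19.3,193.0,3450.0,female,2007
"""
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Numeric columns: one row per statistic, one column per variable (NaNs are skipped)
summary = df[["flipper_length_mm", "body_mass_g"]].agg(
    ["count", "mean", "median", "min", "max"]
)
print(summary)

# Categorical column: frequencies, keeping missing values visible
sex_counts = df["sex"].value_counts(dropna=False)
print(sex_counts)
```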

## Task 6 — Groupby Warm‑Up

**Expectations:**
- Group by 1 categorical column and compute 1–2 aggregates.
- Sort the result and show the top/bottom few groups.
- Add one short sentence interpreting what you see.

In [None]:
# Your code here
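
The groupby steps could look roughly like the sketch below, assuming the five sample rows from the Setup output. Because those rows are all one species, the example groups by `sex`; on the full dataset `species` or `island` would be equally reasonable. The output names `n` and `mean_mass_g` are illustrative.

```python
import io

import pandas as pd

# Assumption: the five rows from the Setup output stand in for the full CSV.
SAMPLE_CSV = """species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year
Adelie,Torgersen,39.1,18.7,181.0,3750.0,male,2007
Adelie,Torgersen,39.5,17.4,186.0,3800.0,female,2007
Adelie,Torgersen,40.3,18.0,195.0,3250.0,female,2007
Adelie,Torgersen,,,,,,2007
Adelie,Torgersen,36.7,19.3,193.0,3450.0,female,2007
"""
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Group by one categorical column; rows with a missing key are dropped by default
by_sex = (
    df.groupby("sex")
    .agg(
        n=("body_mass_g", "count"),          # non-missing body mass values per group
        mean_mass_g=("body_mass_g", "mean"),
    )
    .sort_values("mean_mass_g", ascending=False)
)
print(by_sex.head())
```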

## (Optional) Task 7 — Tiny Visualization

**Expectations:**
- Make a small bar/line plot from your groupby output.
- Keep it simple; axis labels and a title are required.

In [None]:
# Your code here
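
A small bar chart from a groupby result might be sketched as follows, assuming matplotlib is installed and using the five sample rows from the Setup output. The `Agg` backend line is only there so the snippet runs outside a notebook; the labels and title text are placeholders.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Assumption: the five rows from the Setup output stand in for the full CSV.
SAMPLE_CSV = """species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year
Adelie,Torgersen,39.1,18.7,181.0,3750.0,male,2007
Adelie,Torgersen,39.5,17.4,186.0,3800.0,female,2007
Adelie,Torgersen,40.3,18.0,195.0,3250.0,female,2007
Adelie,Torgersen,,,,,,2007
Adelie,Torgersen,36.7,19.3,193.0,3450.0,female,2007
"""
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Groupby output to plot: mean body mass per sex
mean_mass = df.groupby("sex")["body_mass_g"].mean()

# One bar per group, with the required axis labels and title
fig, ax = plt.subplots(figsize=(4, 3))
ax.bar(list(mean_mass.index), mean_mass.to_numpy())
ax.set_xlabel("sex")
ax.set_ylabel("mean body mass (g)")
ax.set_title("Mean body mass by sex (sample rows)")
fig.tight_layout()
```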

## Wrap‑Up — Summary

In 3–5 bullets, summarize what you learned about the dataset and any data quality issues you noticed.