
# Data 101 — Module 5, Session 2
## Pandas Basics (Demo Notebook)

This notebook follows the Session 2 slide outline:
- Importing Pandas
- Series and DataFrames
- Loading a CSV
- Selecting columns and rows (`loc` vs `iloc`)
- Column slicing with `loc`
- Boolean filtering
- Modifying data
- Summary statistics and grouping


## Import Pandas

In [None]:

import pandas as pd
import numpy as np

print("pandas:", pd.__version__)
print("numpy:", np.__version__)


## Series

In [None]:
s = pd.Series([10, 20, 30], index=["a", "b", "c"])
display(s)

In [None]:
print("Access by label:", s["b"])

In [None]:
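# Use .iloc for positional access; s[1] on a string-labeled Series is deprecated in recent pandas
print("Access by position:", s.iloc[1])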
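
The difference between label and position matters most when the index itself holds integers. The sketch below is a minimal illustration; the Series `t` and its labels are made up for this example and are not part of the course data.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])
by_label = s.loc["b"]       # label-based lookup
by_position = s.iloc[1]     # position-based lookup
print(by_label, by_position)

# Hypothetical Series with integer labels: plain [] is treated as a label here.
t = pd.Series([10, 20, 30], index=[1, 2, 3])
print(t[1])        # label 1
print(t.iloc[1])   # position 1 (the second element)
```

Using `.loc` and `.iloc` explicitly avoids having to remember which rule plain `[]` applies.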

## DataFrame

In [None]:

data = {"Name": ["Alex", "Jamie"], "GPA": [3.4, 3.8]}
df_small = pd.DataFrame(data)
display(df_small)


## Load Data from CSV

In [None]:
# A sample CSV is saved at: ./data/students.csv
df = pd.read_csv("./data/students.csv")
display(df.head())
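
If `./data/students.csv` is not available, a small in-memory CSV can stand in for it. This is only a sketch: the columns `Name`, `Major`, and `GPA` appear later in this notebook, but the `StudentID` column, the column order, and every row below are invented for illustration.

```python
import io
import pandas as pd

# Hypothetical stand-in for ./data/students.csv; the layout and all rows are invented.
csv_text = """StudentID,Name,Major,GPA
1,Alex,CS,3.4
2,Jamie,Math,3.8
3,Riley,Biology,2.9
4,Morgan,CS,3.6
"""

# read_csv accepts any file-like object, so StringIO works in place of a path.
df_demo = pd.read_csv(io.StringIO(csv_text))
print(df_demo.shape)
print(df_demo.dtypes)
```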


In [None]:
print("\nInfo:")
df.info()  # info() prints its report and returns None, so wrapping it in display() would also show "None"

In [None]:
print("\nDescribe:")
display(df.describe())

In [None]:
print("\nDescribe (all columns, including non-numeric):")
display(df.describe(include="all"))

## Selecting Columns

In [None]:
gpa_series = df["GPA"]
display(gpa_series.head())


In [None]:
name_gpa_df = df[["Name", "GPA"]]
display(name_gpa_df.head())
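
Single brackets with one column name give a Series, while a list of names (double brackets) gives a DataFrame, even when the list has only one entry. A minimal sketch using a made-up two-row frame:

```python
import pandas as pd

# Hypothetical frame for illustration only.
df_demo = pd.DataFrame({"Name": ["Alex", "Jamie"], "GPA": [3.4, 3.8]})

as_series = df_demo["GPA"]     # one name -> Series (1-D)
as_frame = df_demo[["GPA"]]    # list of names -> DataFrame (2-D)

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
```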

## Selecting Rows with `.loc` (label-based)

In [None]:

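# Reset to the default 0..n-1 row labels so the label-based examples line up with positions
df = df.reset_index(drop=True)

row_label_0 = df.loc[0]          # the row whose label is 0
rows_0_to_2 = df.loc[0:2]        # label slice: BOTH ends included (labels 0, 1, 2)
gpa_all_rows = df.loc[:, "GPA"]  # all rows, column "GPA"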

display(row_label_0)
display(rows_0_to_2)
display(gpa_all_rows.head())


## Selecting Rows with `.iloc` (position-based)

In [None]:

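first_row = df.iloc[0]                 # row at position 0
first_three_rows = df.iloc[0:3]        # position slice: end excluded (positions 0, 1, 2)
second_column_all_rows = df.iloc[:, 1] # all rows, column at position 1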

display(first_row)
display(first_three_rows)
display(second_column_all_rows.head())


## `loc` vs `iloc` example

In [None]:

example_loc = df.loc[2, "GPA"]            # row label 2, column "GPA"
gpa_pos = df.columns.get_loc("GPA")       # look up the column position instead of hard-coding it
example_iloc = df.iloc[2, gpa_pos]        # third row (0-based), "GPA" column by position
print("df.loc[2, 'GPA']       ->", example_loc)
print("df.iloc[2, gpa_pos]    ->", example_iloc)
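
With the default 0..n-1 index, labels and positions happen to coincide, which can hide the difference between `.loc` and `.iloc`. The sketch below uses a made-up frame with row labels 10, 20, 30, 40 (an assumption for illustration) to make the two behaviors diverge.

```python
import pandas as pd

# Hypothetical frame whose row labels differ from row positions.
df_demo = pd.DataFrame(
    {"Name": ["Alex", "Jamie", "Riley", "Morgan"], "GPA": [3.4, 3.8, 2.9, 3.6]},
    index=[10, 20, 30, 40],
)

loc_slice = df_demo.loc[10:30]    # labels 10 through 30, end included
iloc_slice = df_demo.iloc[0:2]    # positions 0 and 1, end excluded

# The same cell reached by label and by position.
gpa_pos = df_demo.columns.get_loc("GPA")
same_cell = df_demo.loc[30, "GPA"] == df_demo.iloc[2, gpa_pos]

print(len(loc_slice), len(iloc_slice), same_cell)
```

Note that `df_demo.loc[0]` would raise a `KeyError` here, because no row carries the label 0.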


## Column Slicing with `loc` and a common pitfall

In [None]:

one_col = df.loc[:, "GPA"]
multi_cols = df.loc[:, ["Name", "GPA"]]
range_cols = df.loc[:, "Name":"GPA"]

display(one_col.head())
display(multi_cols.head())
display(range_cols.head())

try:
    # Plain [] does not accept [rows, columns]; use df.loc[:, "GPA"] instead
    bad = df[:, "GPA"]
except Exception as e:
    print("Using df[:, 'GPA'] raises ->", repr(e))


## Boolean Filtering

In [None]:

high_gpa = df[df["GPA"] > 3.5]
cs_or_math = df[(df["Major"] == "CS") | (df["Major"] == "Math")]
display(high_gpa)
display(cs_or_math)
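
A few related filtering patterns, sketched on a made-up frame (the rows are invented for illustration): `isin` as a shorter form of chained `|` comparisons, `&` for "and", and `~` for "not". Each comparison needs its own parentheses because `&`, `|`, and `~` bind more tightly than `==` and `>`.

```python
import pandas as pd

# Hypothetical data for illustration only.
df_demo = pd.DataFrame({
    "Name": ["Alex", "Jamie", "Riley", "Morgan"],
    "Major": ["CS", "Math", "Biology", "CS"],
    "GPA": [3.4, 3.8, 2.9, 3.6],
})

cs_or_math = df_demo[df_demo["Major"].isin(["CS", "Math"])]             # membership test
high_cs = df_demo[(df_demo["Major"] == "CS") & (df_demo["GPA"] > 3.5)]  # both conditions
not_cs = df_demo[~(df_demo["Major"] == "CS")]                           # negation

print(cs_or_math["Name"].tolist())
print(high_cs["Name"].tolist())
print(not_cs["Name"].tolist())
```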


## Modifying Data

In [None]:

df["Passed"] = df["GPA"] >= 2.0
df["GPA_rounded"] = df["GPA"].round(1)
display(df.head())
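
To change only some rows, one common approach is a boolean mask inside `.loc` on the left-hand side of the assignment. The sketch below assumes a made-up frame; the `Standing` and `Honors` columns and their values are hypothetical names chosen for this example.

```python
import pandas as pd

# Hypothetical data for illustration only.
df_demo = pd.DataFrame({
    "Name": ["Alex", "Jamie", "Riley", "Morgan"],
    "GPA": [3.4, 3.8, 1.9, 3.6],
})

# Set a default, then overwrite only the rows that match the mask.
df_demo["Standing"] = "Good"
df_demo.loc[df_demo["GPA"] < 2.0, "Standing"] = "Probation"

# Take an explicit copy before adding columns to a filtered subset,
# so the original frame is left untouched.
honors = df_demo[df_demo["GPA"] >= 3.5].copy()
honors["Honors"] = True

print(df_demo)
print(honors)
```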


## Summary Statistics

In [None]:

print("mean:", df["GPA"].mean())
print("median:", df["GPA"].median())
print("mode:", df["GPA"].mode().tolist())
print("min:", df["GPA"].min())
print("max:", df["GPA"].max())
print("std:", df["GPA"].std())

display(df.describe())
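
Two details of the statistics above are easy to miss. `Series.std()` computes the sample standard deviation (`ddof=1`) by default, whereas `numpy.std` defaults to the population form (`ddof=0`). And `mode()` returns a Series rather than a single number because several values can tie. A small sketch with made-up GPA values:

```python
import numpy as np
import pandas as pd

# Hypothetical values for illustration only.
gpa = pd.Series([3.4, 3.8, 2.9, 3.6])

sample_std = gpa.std()               # pandas default: ddof=1
population_std = gpa.std(ddof=0)     # population form
numpy_std = np.std(gpa.to_numpy())   # NumPy default: ddof=0

print(sample_std, population_std, numpy_std)

# Two values tie for most frequent, so mode() returns both.
tied_modes = pd.Series([3.4, 3.8, 3.4, 3.8]).mode().tolist()
print(tied_modes)
```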


## Grouping

In [None]:

avg_gpa_by_major = df.groupby("Major")["GPA"].mean().reset_index(name="Avg_GPA")
display(avg_gpa_by_major)
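
Several statistics per group can be computed in one pass with `agg` and named outputs. This is a sketch on made-up data; the output column names `Avg_GPA`, `Max_GPA`, and `Students` are arbitrary choices for this example.

```python
import pandas as pd

# Hypothetical data for illustration only.
df_demo = pd.DataFrame({
    "Name": ["Alex", "Jamie", "Riley", "Morgan"],
    "Major": ["CS", "Math", "Biology", "CS"],
    "GPA": [3.4, 3.8, 2.9, 3.6],
})

# Each keyword becomes an output column; its value names the aggregation to apply.
summary = (
    df_demo.groupby("Major")["GPA"]
    .agg(Avg_GPA="mean", Max_GPA="max", Students="count")
    .reset_index()
)
print(summary)
```

By default `groupby` sorts the group keys, so the rows come back in alphabetical order of `Major`.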



## Summary
- `pd.Series` and `pd.DataFrame` are core pandas objects.
- Use `.loc` for label-based and `.iloc` for position-based selection.
- With `.loc` and `.iloc`, select as `[rows, columns]`; plain `df[...]` does not accept that form.
- Filter rows with boolean conditions.
- Create or transform columns for analysis.
- Compute descriptive statistics and group summaries.
