# 🐼 Notebook 02: Pandas Series and DataFrames

Congratulations. You've graduated from lists and dictionaries, and it's time to wield a more powerful tool: **Pandas**.

This notebook introduces you to the two core data structures in Pandas:

- `Series`: a one-dimensional labeled array (like a dictionary and list had a well-organized child)
- `DataFrame`: a two-dimensional table with labeled axes (basically Excel's smarter cousin)

---

In [None]:
import pandas as pd

## 🔢 Series from a List
Simple but powerful: labels give it superpowers.

In [None]:
# A Series from a list: just like a list, but fancier and labeled
prices = pd.Series([2.99, 4.49, 1.99], index=["Apple", "Milk", "Bread"])
print("Grocery Prices:")
print(prices)

# Accessing elements like a dictionary
print("\nPrice of Milk:", prices["Milk"])
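
Dictionary-style brackets are not the only way in. As a minimal sketch reusing the grocery prices above (the names `by_label`, `by_position`, and `subset` are just illustrative), `.loc` looks things up by label and `.iloc` by integer position:

```python
import pandas as pd

prices = pd.Series([2.99, 4.49, 1.99], index=["Apple", "Milk", "Bread"])

# .loc looks up by label, .iloc by integer position
by_label = prices.loc["Milk"]
by_position = prices.iloc[1]
print(by_label, by_position)

# Passing a list of labels returns a smaller Series
subset = prices.loc[["Apple", "Bread"]]
print(subset)
```

Being explicit about label versus position tends to matter more once the index itself is made of integers.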

## 🗺️ Series from a Dictionary
Keys become the index. Pretty intuitive, right?

In [2]:
import pandas as pd
population = {
    "Texas": 29_000_000,
    "California": 39_000_000,
    "New York": 19_000_000
}

state_pop = pd.Series(population)
print("State Populations:")
print(state_pop)

# Series support math!
print("\nPopulation in millions:")
print(state_pop / 1_000_000)

State Populations:
Texas         29000000
California    39000000
New York      19000000
dtype: int64

Population in millions:
Texas         29.0
California    39.0
New York      19.0
dtype: float64
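
Math between two Series matches values by index label rather than by position. The sketch below assumes a second Series, `new_residents`, filled with made-up numbers purely to show the alignment; a label that appears in only one Series comes out as `NaN`.

```python
import pandas as pd

state_pop = pd.Series({
    "Texas": 29_000_000,
    "California": 39_000_000,
    "New York": 19_000_000,
})

# Hypothetical numbers, listed in a different order and missing New York
new_residents = pd.Series({"California": 100_000, "Texas": 400_000})

# Values are matched by label; New York has no partner, so it becomes NaN
updated = state_pop + new_residents
print(updated)

# Comparisons are element-wise too, and the result can filter the Series
large_states = state_pop[state_pop > 20_000_000]
print(large_states)
```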


## 📋 Creating a DataFrame
Like a spreadsheet, but you control the universe.

In [None]:
# Creating a DataFrame: the bread and butter of Pandas
data = {
    "Name": ["Alice", "Bob", "Charlie", "Diana"],
    "GPA": [3.9, 2.7, 3.4, 3.8],
    "Credits": [90, 45, 60, 120],
    "Graduating": [False, False, False, True]
}

students = pd.DataFrame(data)
print("Student DataFrame:")
print(students)
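
A dictionary of columns is one way to build a table; a list of row dictionaries is another. Here is a small sketch using two of the students above (the names `rows` and `students_from_rows` are illustrative):

```python
import pandas as pd

# Each dictionary is one row; the keys become the column names
rows = [
    {"Name": "Alice", "GPA": 3.9, "Credits": 90, "Graduating": False},
    {"Name": "Bob", "GPA": 2.7, "Credits": 45, "Graduating": False},
]

students_from_rows = pd.DataFrame(rows)
print(students_from_rows)

# Pandas infers a dtype for every column
print(students_from_rows.dtypes)
```

Which shape to start from mostly depends on how the raw data arrives: column by column, or record by record.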

## 🎯 Column Access + Row Access

In [None]:
# Access a column
print("\nGPA column:")
print(students["GPA"])

# Access a row
print("\nCharlie's record:")
print(students.loc[2])
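
With the default index, `.loc[2]` works because the row labels happen to be 0, 1, 2, 3. A sketch of two alternatives, assuming a trimmed-down copy of the student table: `.iloc` for positions, and `set_index` to make the names the row labels.

```python
import pandas as pd

students = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Diana"],
    "GPA": [3.9, 2.7, 3.4, 3.8],
})

# With the default index, label 2 and position 2 are the same row
row_by_label = students.loc[2]
row_by_position = students.iloc[2]
print(row_by_label["Name"], row_by_position["Name"])

# Using names as the index makes label lookups more readable
by_name = students.set_index("Name")
charlie_gpa = by_name.loc["Charlie", "GPA"]
print(charlie_gpa)
```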

## 🧪 Inspect the DataFrame

In [None]:
# Shape and structure
print("\nShape:", students.shape)
print("\nColumns:", students.columns.tolist())

# Info dump
print("\nDataFrame Info:")
students.info()  # info() prints its own report and returns None

# Stats summary
print("\nDataFrame Summary Stats:")
print(students.describe())
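
Two details of these inspection tools are easy to miss, so here is a short sketch on the same student data: `info()` prints its report itself and returns `None`, and `describe()` summarizes only the numeric columns by default.

```python
import pandas as pd

students = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Diana"],
    "GPA": [3.9, 2.7, 3.4, 3.8],
    "Credits": [90, 45, 60, 120],
    "Graduating": [False, False, False, True],
})

# info() writes to the screen and hands back None
result = students.info()
print(result)

# describe() skips the text and boolean columns by default
summary = students.describe()
print(summary.columns.tolist())

# dtypes lists each column's type as a Series
print(students.dtypes)
```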

## 🧼 Rename a Column

In [None]:
students.rename(columns={"Graduating": "Is_Graduating"}, inplace=True)
print("\nRenamed column:")
print(students.head())
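
The `inplace=True` flag changes the existing DataFrame. As a sketch of the alternative (on a two-row stand-in table, with `renamed` as an illustrative name), leaving the flag off makes `rename()` return a new DataFrame and keeps the original untouched:

```python
import pandas as pd

students = pd.DataFrame({"Name": ["Alice", "Bob"], "Graduating": [False, True]})

# Without inplace=True, rename() returns a new DataFrame
renamed = students.rename(columns={"Graduating": "Is_Graduating"})

print(students.columns.tolist())  # the original keeps its column names
print(renamed.columns.tolist())   # the copy has the new name
```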

---
## 🔍 Your Turn

1. Create a `Series` from a dictionary of state abbreviations to population (use fake or real data).
2. Create a `DataFrame` for 5 students with columns: `Name`, `GPA`, `Credits`, `Graduating` (True/False).
3. Try accessing a row using `.loc[]` and a column using bracket notation.
4. Print the `.shape`, `.columns`, and `.info()` of your DataFrame.

🎯 Bonus: Rename a column just to mess with the future grader.

In [3]:
# Your data science destiny begins here.
import pandas as pd
population = {
    "TX": 29_000_000,
    "CA": 39_000_000,
    "NY": 19_000_000,
    "FL": 20_000_000,
    "IL": 12_000_000
}

state_pop = pd.Series(population)
print("State Populations:")
print(state_pop)


State Populations:
TX    29000000
CA    39000000
NY    19000000
FL    20000000
IL    12000000
dtype: int64


In [4]:
import pandas as pd

student_data = {
    "Name": ["Emily", "Frank", "Grace", "Henry", "Ivy"],
    "GPA": [3.5, 2.8, 4.0, 3.2, 3.9],
    "Credits": [75, 50, 100, 80, 110],
    "Graduating": [False, False, True, False, True]
}

students_df = pd.DataFrame(student_data)
print("Students DataFrame:")
print(students_df)

Students DataFrame:
    Name  GPA  Credits  Graduating
0  Emily  3.5       75       False
1  Frank  2.8       50       False
2  Grace  4.0      100        True
3  Henry  3.2       80       False
4    Ivy  3.9      110        True


In [5]:
# Access the 'GPA' column
print("\nGPA column:")
print(students_df["GPA"])

# Access the row at index 2 (Grace's record)
print("\nGrace's record:")
print(students_df.loc[2])


GPA column:
0    3.5
1    2.8
2    4.0
3    3.2
4    3.9
Name: GPA, dtype: float64

Grace's record:
Name          Grace
GPA             4.0
Credits         100
Graduating     True
Name: 2, dtype: object


In [6]:
# Print shape, columns, and info of the DataFrame
print("\nShape:", students_df.shape)
print("\nColumns:", students_df.columns.tolist())
print("\nDataFrame Info:")
students_df.info()


Shape: (5, 4)

Columns: ['Name', 'GPA', 'Credits', 'Graduating']

DataFrame Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Name        5 non-null      object 
 1   GPA         5 non-null      float64
 2   Credits     5 non-null      int64  
 3   Graduating  5 non-null      bool   
dtypes: bool(1), float64(1), int64(1), object(1)
memory usage: 257.0+ bytes


---
## 📎 Side Notes
- A `Series` is like a single column of a spreadsheet.
- A `DataFrame` is like the full spreadsheet.
- Rows and columns can both have labels (called the **index** and **columns**).
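
The notes above can be checked directly. A minimal sketch on a two-row stand-in table: pulling out one column gives a `Series`, and that Series carries the same row labels as the DataFrame it came from.

```python
import pandas as pd

students = pd.DataFrame({
    "Name": ["Alice", "Bob"],
    "GPA": [3.9, 2.7],
})

# One column of a DataFrame is a Series
gpa = students["GPA"]
print(type(gpa).__name__)

# Row labels live in .index, column labels in .columns
print(students.index.tolist())
print(students.columns.tolist())

# The column shares the DataFrame's row index
print(gpa.index.equals(students.index))
```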

Next stop: loading data from the real world. Brace yourself for CSVs.