# **AI TECH INSTITUTE** · *Intermediate AI & Data Science*
### Week 01 · Notebook 01 — Introduction to Pandas & Series
**Instructor:** Amir Charkhi  |  **Goal:** Master the transition from Python basics to Pandas data structures.

> Format: short theory → quick practice → build understanding → mini-challenges.


---
## Learning Objectives
- Understand why we need Pandas for data analysis
- Master Series creation and manipulation
- Connect Week 0 Python concepts to Pandas operations
- Prepare for DataFrames

## 1. Why Pandas? From Lists to Series
Remember our Week 0 lists? Let's see why we need something more powerful.

In [1]:
# Week 0 way - calculating average temperature
temps = [22.5, 23.1, 21.8, 24.2, 22.9]
avg_temp = sum(temps) / len(temps)
print(f"Average (Python list): {avg_temp:.2f}°C")

# What if we want temps above 23?
above_23 = [t for t in temps if t > 23]
print(f"Temps > 23: {above_23}")

Average (Python list): 22.90°C
Temps > 23: [23.1, 24.2]


In [2]:
import pandas as pd
import numpy as np

In [3]:
# Week 1 way - with Pandas!
# Create a Series (like a smart list with superpowers)
temps_series = pd.Series([22.5, 23.1, 21.8, 24.2, 22.9])
print("Pandas Series:")
print(temps_series)
print(f"\nAverage (Pandas): {temps_series.mean():.2f}°C")
print(f"\nTemps > 23:")
print(temps_series[temps_series > 23])

Pandas Series:
0    22.5
1    23.1
2    21.8
3    24.2
4    22.9
dtype: float64

Average (Pandas): 22.90°C

Temps > 23:
1    23.1
3    24.2
dtype: float64
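
The filter `temps_series[temps_series > 23]` works in two steps: the comparison builds a Series of True/False values (a boolean mask), and indexing with that mask keeps only the entries marked True. A minimal sketch using the same temperatures; the variable name `mask` is only for illustration:

```python
import pandas as pd

temps_series = pd.Series([22.5, 23.1, 21.8, 24.2, 22.9])

# Step 1: the comparison runs on every element and returns booleans
mask = temps_series > 23
print(mask)

# Step 2: indexing with the mask keeps only the entries where it is True
above_23 = temps_series[mask]
print(above_23)

# True counts as 1, so summing the mask counts the matches
print(f"Number of temps > 23: {mask.sum()}")
```

Note that the original index labels (1 and 3) survive the filter, which a list comprehension would not give you.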


**Exercise 1 — Feel the Difference (easy)**  
Create a Series of 5 student scores and find: mean, max, min, and scores above 80.


In [4]:
# Your turn
score_series = pd.Series([87.3, 65.8, 75.6, 92.8, 72.4])
print("Score Series:")
print(score_series)
print(f"\nHighest (Score): {score_series.max():.2f}")
print(f"\nAverage (Score): {score_series.mean():.2f}")
print(f"\nLowest (Score): {score_series.min():.2f}")
print(f"\nScore > 80:")
print(score_series[score_series > 80])

Score Series:
0    87.3
1    65.8
2    75.6
3    92.8
4    72.4
dtype: float64

Highest (Score): 92.80

Average (Score): 78.78

Lowest (Score): 65.80

Score > 80:
0    87.3
3    92.8
dtype: float64


<details>
<summary><b>Solution</b></summary>

```python
scores = pd.Series([75, 82, 91, 68, 87])
print(f"Mean: {scores.mean():.1f}")
print(f"Max: {scores.max()}")
print(f"Min: {scores.min()}")
print("\nScores > 80:")
print(scores[scores > 80])
```
</details>

## 2. Series with Index Labels
Unlike lists, which are indexed only by position, a Series can carry a meaningful label for each value!

In [None]:
# Create a Series with custom index
cities = ['Perth', 'Sydney', 'Melbourne', 'Brisbane', 'Adelaide']
populations = [2.1, 5.3, 5.0, 2.6, 1.4]  # in millions

pop_series = pd.Series(populations, index=cities, name='Population (M)')
print(pop_series)
print(f"\nPerth population: {pop_series['Perth']}M")
print(f"\nCities over 3M:")
print(pop_series[pop_series > 3])
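
A labelled Series can still be accessed by position. The sketch below contrasts label-based access (`.loc`) with position-based access (`.iloc`) on the same population data; it is one way to keep the two styles explicit rather than relying on plain `[]`:

```python
import pandas as pd

pop_series = pd.Series(
    [2.1, 5.3, 5.0, 2.6, 1.4],
    index=['Perth', 'Sydney', 'Melbourne', 'Brisbane', 'Adelaide'],
    name='Population (M)',
)

# .loc selects by label
print(pop_series.loc['Sydney'])

# .iloc selects by 0-based position, like a list
print(pop_series.iloc[0])

# A list of labels returns a smaller Series
print(pop_series.loc[['Perth', 'Adelaide']])

# Label slices include the end label, unlike list slices
east_coast = pop_series.loc['Sydney':'Brisbane']
print(east_coast)
```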

**Exercise 2 — Product Inventory (medium)**  
Create a Series for product inventory: iPhone:45, iPad:32, MacBook:18, AirPods:67.
Find products with stock < 40.


In [9]:
# Your turn
product = ["iPhone", "iPad", "MacBook", "AirPods"]
stock = [45, 32, 18, 67]

stock_series = pd.Series(stock, index=product, name="Stock (items)")
print("\nProduct inventory stock:")
print(stock_series)
print("\nProducts with fewer than 40 items:")
print(stock_series[stock_series < 40])


Product inventory stock:
iPhone     45
iPad       32
MacBook    18
AirPods    67
Name: Stock (items), dtype: int64

Products with fewer than 40 items:
iPad       32
MacBook    18
Name: Stock (items), dtype: int64


<details>
<summary><b>Solution</b></summary>

```python
inventory = pd.Series(
    {'iPhone': 45, 'iPad': 32, 'MacBook': 18, 'AirPods': 67},
    name='Stock Count'
)
print("Current Inventory:")
print(inventory)
print("\nLow stock items (< 40):")
print(inventory[inventory < 40])
```
</details>

## 3. Series Operations & Methods

In [None]:
# Mathematical operations work element-wise
prices = pd.Series([99.99, 149.99, 199.99, 79.99], 
                   index=['Basic', 'Standard', 'Premium', 'Student'])

# Apply 20% discount
discounted = prices * 0.8
print("Original prices:")
print(prices)
print("\nAfter 20% discount:")
print(discounted.round(2))

# Useful methods
print(f"\nPrice stats:")
print(f"Mean: ${prices.mean():.2f}")
print(f"Median: ${prices.median():.2f}")
print(f"Std Dev: ${prices.std():.2f}")
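
Beyond individual statistics, a few other Series methods are handy for a quick look at the data. The following sketch reuses the same `prices` Series; the names `summary` and `ranked` are illustrative only:

```python
import pandas as pd

prices = pd.Series([99.99, 149.99, 199.99, 79.99],
                   index=['Basic', 'Standard', 'Premium', 'Student'])

# describe() bundles count, mean, std, min, quartiles and max in one call
summary = prices.describe()
print(summary.round(2))

# idxmax() / idxmin() return the index label, not the value
print(f"Most expensive plan: {prices.idxmax()}")
print(f"Cheapest plan: {prices.idxmin()}")

# sort_values() returns a new Series; prices itself is unchanged
ranked = prices.sort_values(ascending=False)
print(ranked)
```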

## 4. Handling Missing Data

In [None]:
# Real data often has missing values
sales = pd.Series([1200, None, 1450, 980, None, 1680],
                  index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat'])
print("Sales with missing data:")
print(sales)
print(f"\nCount of missing: {sales.isna().sum()}")
print(f"Mean (ignoring NaN): ${sales.mean():.2f}")

# Fill missing values
sales_filled = sales.fillna(sales.mean())
print("\nAfter filling with mean:")
print(sales_filled.round(2))
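
It helps to see what Pandas does with `None` behind the scenes. In the sketch below (same `sales` data; `manual_mean` and `sales_known` are illustrative names), the missing entries are stored as `NaN`, which turns the Series into `float64`, and aggregations such as `mean()` skip them by default:

```python
import pandas as pd

sales = pd.Series([1200, None, 1450, 980, None, 1680],
                  index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat'])

# None is stored as NaN, so the integers are converted to floats
print(sales.dtype)

# len() counts every entry; count() counts only the non-missing ones
print(f"Entries: {len(sales)}, non-missing: {sales.count()}")

# mean() skips NaN, so it matches sum / count of the known values
manual_mean = sales.sum() / sales.count()
print(f"Manual mean: {manual_mean:.2f}, sales.mean(): {sales.mean():.2f}")

# Alternative to filling: drop the missing days altogether
sales_known = sales.dropna()
print(sales_known)
```

Whether to fill or drop depends on the analysis: filling keeps all six days but invents values, while dropping keeps only what was actually recorded.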

**Exercise 3 — Temperature Analysis (medium)**  
Given a week of temperatures with some missing values, fill them with the median and find days above average.


In [18]:
# Your turn
temps = pd.Series([22.5, None, 24.1, 23.8, None, 25.2, 21.9],
                  index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'])

print(f"Last week daily temp (degC): \n{temps}")
print(f"Missing temp data count: {temps.isna().sum()} days")

temp_fill = temps.fillna(temps.median())
print("\nWeek temp with filled median (degC):")
print(temp_fill.round(1))

print(f"\nWeekly average temp {temp_fill.mean().round(2)}")
print("Days with temp above average (degC):")
print(temp_fill.round(1)[temp_fill > temp_fill.mean()])

Last week daily temp (degC): 
Mon    22.5
Tue     NaN
Wed    24.1
Thu    23.8
Fri     NaN
Sat    25.2
Sun    21.9
dtype: float64
Missing temp data count: 2 days

Week temp with filled median (degC):
Mon    22.5
Tue    23.8
Wed    24.1
Thu    23.8
Fri    23.8
Sat    25.2
Sun    21.9
dtype: float64

Weekly average temp 23.59
Days with temp above average (degC):
Tue    23.8
Wed    24.1
Thu    23.8
Fri    23.8
Sat    25.2
dtype: float64


<details>
<summary><b>Solution</b></summary>

```python
temps = pd.Series([22.5, None, 24.1, 23.8, None, 25.2, 21.9],
                  index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'])
print("Original temperatures:")
print(temps)

# Fill with median
temps_filled = temps.fillna(temps.median())
print("\nFilled temperatures:")
print(temps_filled)

# Find above average days
avg_temp = temps_filled.mean()
print(f"\nAverage: {avg_temp:.1f}°C")
print("\nDays above average:")
print(temps_filled[temps_filled > avg_temp])
```
</details>

## 5. Series Alignment & Combining

In [None]:
# Pandas aligns by index automatically!
q1_sales = pd.Series({'Product_A': 100, 'Product_B': 150, 'Product_C': 200})
q2_sales = pd.Series({'Product_B': 180, 'Product_C': 220, 'Product_D': 90})

print("Q1 Sales:")
print(q1_sales)
print("\nQ2 Sales:")
print(q2_sales)

# Addition aligns by index
total_sales = q1_sales.add(q2_sales, fill_value=0)
print("\nTotal Sales (Q1 + Q2):")
print(total_sales)
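
To see why `fill_value=0` matters, compare it with the plain `+` operator on the same two Series. This is a small sketch; `plain_sum` and `filled_sum` are illustrative names:

```python
import pandas as pd

q1_sales = pd.Series({'Product_A': 100, 'Product_B': 150, 'Product_C': 200})
q2_sales = pd.Series({'Product_B': 180, 'Product_C': 220, 'Product_D': 90})

# Plain + aligns by label; a label missing on either side becomes NaN
plain_sum = q1_sales + q2_sales
print(plain_sum)

# add(..., fill_value=0) treats a label missing on one side as 0
filled_sum = q1_sales.add(q2_sales, fill_value=0)
print(filled_sum)
```

With plain `+`, Product_A and Product_D come out as `NaN` because each appears in only one quarter; with `fill_value=0` they keep their single-quarter totals.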

**Exercise 4 — Revenue Calculator (hard)**  
Given prices and quantities sold, calculate total revenue per product and overall total.


In [19]:
# Your turn
prices = pd.Series({'Laptop': 1200, 'Mouse': 25, 'Keyboard': 80, 'Monitor': 350})
quantities = pd.Series({'Laptop': 5, 'Mouse': 45, 'Keyboard': 30, 'Webcam': 15})

revenue = prices * quantities
total_rev = revenue.sum()

print(f"Revenue for each product: \n{revenue}")
print("\nOverall total revenue:", total_rev)

Revenue for each product: 
Keyboard    2400.0
Laptop      6000.0
Monitor        NaN
Mouse       1125.0
Webcam         NaN
dtype: float64

Overall total revenue: 9525.0


<details>
<summary><b>Solution</b></summary>

```python
prices = pd.Series({'Laptop': 1200, 'Mouse': 25, 'Keyboard': 80, 'Monitor': 350})
quantities = pd.Series({'Laptop': 5, 'Mouse': 45, 'Keyboard': 30, 'Webcam': 15})

# Calculate revenue (products missing from either Series become NaN)
revenue = prices * quantities
print("Revenue per product:")
print(revenue.dropna())  # Drop products we can't calculate

print(f"\nTotal revenue: ${revenue.sum():.2f}")
print(f"Best seller: {revenue.idxmax()} (${revenue.max():.2f})")
```
</details>
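
If you want to know up front which products cannot be calculated, you can compare the two indexes before multiplying. A possible sketch, using the index set methods `intersection` and `difference`; the names `common`, `no_quantity`, and `no_price` are made up for this example:

```python
import pandas as pd

prices = pd.Series({'Laptop': 1200, 'Mouse': 25, 'Keyboard': 80, 'Monitor': 350})
quantities = pd.Series({'Laptop': 5, 'Mouse': 45, 'Keyboard': 30, 'Webcam': 15})

# Labels present in both Series: revenue can be calculated for these
common = prices.index.intersection(quantities.index)

# Labels with a price but no quantity, and the reverse
no_quantity = prices.index.difference(quantities.index)
no_price = quantities.index.difference(prices.index)

print("Computable:", list(common))
print("Missing quantity:", list(no_quantity))
print("Missing price:", list(no_price))

# Restricting both Series to the common labels avoids NaN entirely
revenue = prices.loc[common] * quantities.loc[common]
print(revenue)
print(f"Total revenue: ${revenue.sum():.2f}")
```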

## 6. Mini-Challenges
- **M1 (easy):** Create a Series of 10 random numbers and find values > mean
- **M2 (medium):** Create a grade Series, convert letter grades to numeric (A=4, B=3, etc.)
- **M3 (hard):** Combine two Series with different indices and calculate percentage change

In [21]:
# Your turn - try the challenges!
#M1 challenge:
numbers = pd.Series(np.random.randint(0, 100, size=10))

mean_value = numbers.mean()
above_mean = numbers[numbers > mean_value]

print("Series of random numbers:")
print(numbers)
print("\nMean value:", mean_value)
print("\nValues greater than the mean:")
print(above_mean)

Series of random numbers:
0    79
1    50
2    68
3    57
4    89
5    16
6    17
7    19
8    72
9    84
dtype: int32

Mean value: 55.1

Values greater than the mean:
0    79
2    68
3    57
4    89
8    72
9    84
dtype: int32


<details>
<summary><b>Solutions</b></summary>

```python
# M1
random_series = pd.Series(np.random.randn(10))
print(random_series[random_series > random_series.mean()])

# M2
grades = pd.Series(['A', 'B', 'A', 'C', 'B', 'D'])
grade_map = {'A': 4, 'B': 3, 'C': 2, 'D': 1, 'F': 0}
numeric_grades = grades.map(grade_map)
print(f"GPA: {numeric_grades.mean():.2f}")

# M3
jan = pd.Series({'A': 100, 'B': 200, 'C': 150})
feb = pd.Series({'B': 220, 'C': 140, 'D': 80})
pct_change = ((feb - jan) / jan * 100).round(2)
print(pct_change.dropna())
```
</details>
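
One thing to watch with `map` in M2: any value that has no key in the mapping silently becomes `NaN`. The sketch below uses a deliberately messy, made-up `raw_grades` Series to show the effect and one way to spot the unmapped entries:

```python
import pandas as pd

grade_map = {'A': 4, 'B': 3, 'C': 2, 'D': 1, 'F': 0}
raw_grades = pd.Series(['A', 'b', 'C', 'E', 'F'])  # hypothetical messy input

# Values with no key in the mapping become NaN
numeric = raw_grades.map(grade_map)
print(numeric)

# Normalising case first rescues 'b'; 'E' is still unmapped
cleaned = raw_grades.str.upper().map(grade_map)
print(cleaned)

# Use the NaN positions to list the original values that failed to map
unmapped = raw_grades[cleaned.isna()].tolist()
print(f"Unmapped grades: {unmapped}")
print(f"GPA (ignoring unmapped): {cleaned.mean():.2f}")
```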

## Wrap-Up & Next Steps
✅ You've mastered Series - the building block of DataFrames!  
✅ You can create, filter, and manipulate data efficiently  
✅ You understand index alignment and missing data handling  

**Next:** DataFrames - think of them as multiple Series combined into a table!
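
As a small preview of that idea, here is a sketch that places two of the Series from Section 5 side by side. The column names `Q1` and `Q2` and the variable `sales_table` are chosen just for this example:

```python
import pandas as pd

q1_sales = pd.Series({'Product_A': 100, 'Product_B': 150, 'Product_C': 200})
q2_sales = pd.Series({'Product_B': 180, 'Product_C': 220, 'Product_D': 90})

# Each Series becomes a column; the dict keys become the column names
sales_table = pd.DataFrame({'Q1': q1_sales, 'Q2': q2_sales})
print(sales_table)

# Selecting one column gives back a Series
print(type(sales_table['Q1']))
```

The same index alignment applies here: products missing from a quarter show up as `NaN` in that column.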
