<a href="https://colab.research.google.com/github/c-marq/CAP3321C-Data-Wrangling/blob/main/exercises/chapter-07/exercise_7_1_solution.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Exercise 7-1: Prepare the Forest Fires Data

## 🔑 INSTRUCTOR SOLUTION KEY

**CAP3321C - Data Wrangling**

---

## Read the Data

In [None]:
import pandas as pd

In [None]:
# Download the data file from GitHub
!wget -q https://raw.githubusercontent.com/c-marq/CAP3321C-Data-Wrangling/main/data/fires_by_month.pkl
print("Data file downloaded successfully!")

In [None]:
# Load the fires data
fires_by_month = pd.read_pickle('fires_by_month.pkl')
print("Data shape:", fires_by_month.shape)

### Task 4: Display the First Five Rows

In [None]:
# ✅ SOLUTION
fires_by_month.head()

---

## Part 1: Add and Modify Columns

### Task 5: Add Mean Acres per Day Column

In [None]:
# ✅ SOLUTION
fires_by_month['mean_acres_per_day'] = fires_by_month['acres_burned'] / fires_by_month['days_burning']
fires_by_month.head()

#### 📝 Instructor Notes - Task 5

**Key Teaching Points:**
- Direct column math is the simplest approach
- Division by zero does not raise an error: a nonzero value divided by 0 produces `inf` (infinity), and 0 divided by 0 produces `NaN`
- This is why Task 6 uses a lambda with a conditional guard against zero

**Common Student Errors:**
- Forgetting to assign back to the DataFrame
- Column name typos
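
The division-by-zero behavior noted above can be checked on a small made-up DataFrame. This is a minimal sketch: the values are hypothetical and are not taken from `fires_by_month`; only the column names match the exercise.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not from fires_by_month.pkl)
toy = pd.DataFrame({
    'acres_burned': [100.0, 50.0, 0.0],
    'days_burning': [4, 0, 0],
})

# Direct column math: pandas does not raise on division by zero
toy['mean_acres_per_day'] = toy['acres_burned'] / toy['days_burning']
print(toy)

# Row 0: 100 / 4 gives 25.0
# Row 1: 50 / 0 gives inf
# Row 2: 0 / 0 gives NaN
print(np.isinf(toy['mean_acres_per_day']).sum(), "inf value(s)")
print(toy['mean_acres_per_day'].isna().sum(), "NaN value(s)")
```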

### Task 6: Add Column Using Lambda Expression

In [None]:
# ✅ SOLUTION
fires_by_month['mean_acres_per_day_lambda'] = fires_by_month.apply(
    lambda x: x.acres_burned / x.days_burning if x.days_burning != 0 else 0, 
    axis=1
)
fires_by_month.head()

#### 📝 Instructor Notes - Task 6

**Key Teaching Points:**
- Lambda syntax: `lambda x: expression`
- `axis=1` means apply to each row (row-wise)
- Ternary operator: `value_if_true if condition else value_if_false`
- The condition avoids the division entirely when `days_burning` is 0 and returns 0 instead

**Acceptable Variations:**
```python
# Using x['column'] syntax instead of x.column
lambda x: x['acres_burned'] / x['days_burning'] if x['days_burning'] != 0 else 0
```

**Common Student Errors:**
- Forgetting `axis=1`
- Wrong ternary operator syntax
- Using `== 0` instead of `!= 0` (logic reversed)
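
To demonstrate the "forgetting `axis=1`" error in class, a small sketch like the following may help. The DataFrame `toy` and the flag `missing_axis_error` are hypothetical names used only for illustration.

```python
import pandas as pd

# Hypothetical toy data (not from fires_by_month.pkl)
toy = pd.DataFrame({
    'acres_burned': [100.0, 50.0],
    'days_burning': [4, 0],
})

def safe_ratio(x):
    # Same logic as the Task 6 lambda, written as a named function
    return x.acres_burned / x.days_burning if x.days_burning != 0 else 0

# With axis=1, each row is passed in as a Series
toy['mean_acres_per_day_lambda'] = toy.apply(safe_ratio, axis=1)
print(toy)  # 25.0 for the first row, 0 for the zero-day row

# Without axis=1, each COLUMN is passed in, so x.days_burning does not exist
missing_axis_error = None
try:
    toy.apply(safe_ratio)
except AttributeError as err:
    missing_axis_error = err
    print("Without axis=1:", err)
```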

### Task 7: Write a Function to Convert Month Number to Name

In [None]:
# ✅ SOLUTION
def convert_month(row):
    months = {
        1: 'Jan', 2: 'Feb', 3: 'Mar', 4: 'Apr',
        5: 'May', 6: 'Jun', 7: 'Jul', 8: 'Aug',
        9: 'Sep', 10: 'Oct', 11: 'Nov', 12: 'Dec'
    }
    return months[row.fire_month]

#### 📝 Instructor Notes - Task 7

**Key Teaching Points:**
- Function takes a row as parameter
- Access row values with `row.column_name` or `row['column_name']`
- Dictionary lookup is clean and readable

**Acceptable Variations:**
```python
# Using a list instead of dict
def convert_month(row):
    months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 
              'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
    return months[row.fire_month - 1]  # -1 because lists are 0-indexed

# Using calendar module
import calendar
def convert_month(row):
    return calendar.month_abbr[row.fire_month]
```

### Task 8: Apply the Function to fire_month Column

In [None]:
# ✅ SOLUTION
fires_by_month['fire_month'] = fires_by_month.apply(convert_month, axis=1)

In [None]:
# Verify the conversion
fires_by_month.head()

#### 📝 Instructor Notes - Task 8

**Key Teaching Points:**
- `apply(function_name, axis=1)` - no parentheses after function name
- This overwrites the original column
- Could also create a new column instead

**Common Student Errors:**
- Using `convert_month()` instead of `convert_month` (don't call the function)
- Forgetting `axis=1`
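
The "create a new column instead" option can be sketched as follows. The toy data and the column name `fire_month_name` are assumptions for illustration. One practical difference: because the numeric column is left intact, the cell can be re-run, whereas re-running the overwriting version would look up `'Jan'` in a dictionary keyed by integers and raise a `KeyError`.

```python
import pandas as pd

def convert_month(row):
    months = {
        1: 'Jan', 2: 'Feb', 3: 'Mar', 4: 'Apr',
        5: 'May', 6: 'Jun', 7: 'Jul', 8: 'Aug',
        9: 'Sep', 10: 'Oct', 11: 'Nov', 12: 'Dec'
    }
    return months[row.fire_month]

# Hypothetical toy data (not from fires_by_month.pkl)
toy = pd.DataFrame({
    'state': ['AK', 'CA', 'CA'],
    'fire_month': [1, 6, 12],
})

# Pass the function object itself (no parentheses) and keep the original column
toy['fire_month_name'] = toy.apply(convert_month, axis=1)
print(toy)
```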

---

## Part 2: Work with Indexes

### Task 9: Set a Multi-Level Index

In [None]:
# ✅ SOLUTION
fires_by_month = fires_by_month.set_index(['state', 'fire_year', 'fire_month'])
fires_by_month.head()

#### 📝 Instructor Notes - Task 9

**Key Teaching Points:**
- Multi-level (hierarchical) indexes enable powerful grouping and selection
- Order of columns in the list matters
- `set_index()` returns a new DataFrame, so reassign the result or pass `inplace=True`
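
A short sketch of what the hierarchical index makes possible, using hypothetical toy data rather than the real file. The variable names `toy` and `indexed` are illustrative.

```python
import pandas as pd

# Hypothetical toy data (not from fires_by_month.pkl)
toy = pd.DataFrame({
    'state': ['AK', 'AK', 'CA'],
    'fire_year': [2020, 2021, 2021],
    'fire_month': ['Jun', 'Jul', 'Jun'],
    'acres_burned': [500, 200, 1000],
})

# set_index returns a new DataFrame; toy itself is unchanged
indexed = toy.set_index(['state', 'fire_year', 'fire_month'])

# Index levels follow the order of the list passed to set_index
print(list(indexed.index.names))

# Select every row for one value of the outermost level
print(indexed.loc['AK'])

# Select a single value by giving one key per level
print(indexed.loc[('CA', 2021, 'Jun'), 'acres_burned'])
```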

### Task 10: Unstack the fire_month Column

In [None]:
# ✅ SOLUTION
fires_by_month_wide = fires_by_month.unstack('fire_month')

In [None]:
# View the wide format
fires_by_month_wide.head()

#### 📝 Instructor Notes - Task 10

**Key Teaching Points:**
- `unstack()` pivots an index level to become columns
- Creates hierarchical column index (metric, month)
- NaN appears where data doesn't exist for that combination
- Important: store in NEW variable, don't overwrite fires_by_month
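
The points above can be shown on a three-row example, where one state has no data for one of the months. The data is made up and the names `toy` and `wide` are illustrative.

```python
import pandas as pd

# Hypothetical toy data: CA has no 'Jul' row
toy = pd.DataFrame({
    'state': ['AK', 'AK', 'CA'],
    'fire_year': [2021, 2021, 2021],
    'fire_month': ['Jun', 'Jul', 'Jun'],
    'acres_burned': [500, 200, 1000],
}).set_index(['state', 'fire_year', 'fire_month'])

# Pivot the fire_month level into columns; store the result in a new variable
wide = toy.unstack('fire_month')
print(wide)

# Columns are now (metric, month) pairs
print(wide.columns.tolist())

# The missing (CA, 2021, Jul) combination shows up as NaN
print(wide.loc[('CA', 2021), ('acres_burned', 'Jul')])
```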

### Task 11: Reset the Index

In [None]:
# ✅ SOLUTION
fires_by_month = fires_by_month.reset_index()
fires_by_month.head()

#### 📝 Instructor Notes - Task 11

**Key Teaching Points:**
- `reset_index()` moves index levels back to columns
- Without `drop=True`, all index levels become columns
- Creates a new numeric index 0, 1, 2, ...
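
A side-by-side sketch of `reset_index()` with and without `drop=True`, on hypothetical toy data. The names `indexed`, `restored`, and `dropped` are illustrative.

```python
import pandas as pd

# Hypothetical toy data with a two-level index
indexed = pd.DataFrame({
    'state': ['AK', 'CA'],
    'fire_year': [2020, 2021],
    'acres_burned': [500, 1000],
}).set_index(['state', 'fire_year'])

# Default: index levels move back into ordinary columns
restored = indexed.reset_index()
print(restored.columns.tolist())

# drop=True: index levels are discarded instead of kept as columns
dropped = indexed.reset_index(drop=True)
print(dropped.columns.tolist())

# Both results get a fresh 0, 1, 2, ... index
print(restored.index.tolist(), dropped.index.tolist())
```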

---

## Part 3: Add a Row of Data

### Task 12: Create New Fire Data (PRE-FILLED)

In [None]:
# PRE-FILLED: Create new fire data
new_fire = pd.DataFrame(
    data=[['CA', 2021, 'Jun', 1000, 100, 1, 10, 10]], 
    columns=fires_by_month.columns
)
new_fire

### Task 13: Add New Fire Data to Original DataFrame

In [None]:
# ✅ SOLUTION
fires_by_month = pd.concat([fires_by_month, new_fire])

#### 📝 Instructor Notes - Task 13

**Key Teaching Points:**
- `pd.concat()` combines DataFrames vertically (by default)
- Without `ignore_index=True`, the new row keeps its original index (0)
- This creates a duplicate index issue (two rows with index 0)
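
The duplicate-index issue is easy to see on a small example. The data below is hypothetical, and `existing`, `new_row`, and `combined` are illustrative names.

```python
import pandas as pd

# Hypothetical toy data (not from fires_by_month.pkl)
existing = pd.DataFrame({
    'state': ['AK', 'CA'],
    'acres_burned': [500, 1000],
})
new_row = pd.DataFrame(data=[['TX', 250]], columns=existing.columns)

# Default concat keeps each input's own index labels
combined = pd.concat([existing, new_row])
print(combined.index.tolist())   # [0, 1, 0]
print(combined.index.is_unique)  # False

# Label-based lookup for 0 now returns two rows instead of one
print(combined.loc[0])
```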

### Task 14: Display Last Five Rows

In [None]:
# ✅ SOLUTION
fires_by_month.tail()

# Note: The last row has index 0, not 9300

### Task 15: Reset Index and Drop Old Index

In [None]:
# ✅ SOLUTION
fires_by_month = fires_by_month.reset_index(drop=True)

#### 📝 Instructor Notes - Task 15

**Key Teaching Points:**
- `drop=True` discards the old index instead of making it a column
- Creates a clean sequential index
- Alternative: use `ignore_index=True` in `concat()` from the start
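
A sketch comparing the two routes to a clean index, using hypothetical toy data. The names `one_step` and `two_step` are illustrative.

```python
import pandas as pd

# Hypothetical toy data (not from fires_by_month.pkl)
existing = pd.DataFrame({
    'state': ['AK', 'CA'],
    'acres_burned': [500, 1000],
})
new_row = pd.DataFrame(data=[['TX', 250]], columns=existing.columns)

# Route 1: renumber while concatenating
one_step = pd.concat([existing, new_row], ignore_index=True)

# Route 2 (the exercise's approach): concatenate, then reset and drop the old index
two_step = pd.concat([existing, new_row]).reset_index(drop=True)

print(one_step.index.tolist())    # [0, 1, 2]
print(one_step.equals(two_step))  # True
```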

### Task 16: Display Last Five Rows Again

In [None]:
# ✅ SOLUTION
fires_by_month.tail()

# Note: The last row now has index 9300

---

## Part 4: Fix the SettingWithCopyWarning

### Task 17: Run the Cell That Causes the Warning (PRE-FILLED)

In [None]:
# PRE-FILLED: This cell causes a SettingWithCopyWarning
fires_ak = fires_by_month.query('state == "AK"')
fires_ak.mean_acres_per_day = fires_ak.mean_acres_per_day.round()
fires_ak.head()

### Task 18: Check the Original DataFrame (PRE-FILLED)

In [None]:
# PRE-FILLED: Check if original was affected
fires_by_month.head()

# Note: The original DataFrame's mean_acres_per_day column is NOT rounded,
# because the assignment only changed the filtered result.
# The warning was harmless here, but it still flags ambiguous code.

### Task 19: Fix the Warning with copy()

In [None]:
# ✅ SOLUTION
fires_ak = fires_by_month.query('state == "AK"').copy()
fires_ak.mean_acres_per_day = fires_ak.mean_acres_per_day.round()
fires_ak.head()

# No warning this time!

#### 📝 Instructor Notes - Task 19

**Key Teaching Points:**
- The warning occurs because pandas isn't sure if you're modifying a view or a copy
- `.copy()` explicitly creates a copy, removing ambiguity
- Best practice: always use `.copy()` when you plan to modify a filtered DataFrame

**When to care about this warning:**
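- If you WANT to modify the original: assign through `.loc[]` or `.iloc[]` on the original DataFrame
- If you DON'T want to modify the original: use `.copy()`
- If you're only reading from the filtered DataFrame, no fix is needed: the warning is raised on assignment, not on reads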
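
The first two cases can be sketched on hypothetical toy data. The names `toy`, `toy_ak`, `is_ak`, and `original_after_copy_edit` are illustrative and do not appear in the exercise.

```python
import pandas as pd

# Hypothetical toy data (not from fires_by_month.pkl)
toy = pd.DataFrame({
    'state': ['AK', 'AK', 'CA'],
    'mean_acres_per_day': [12.4, 7.6, 3.3],
})

# Case 1: you do NOT want to touch the original, so work on an explicit copy
toy_ak = toy.query('state == "AK"').copy()
toy_ak['mean_acres_per_day'] = toy_ak['mean_acres_per_day'].round()
original_after_copy_edit = toy['mean_acres_per_day'].tolist()
print(original_after_copy_edit)  # original values are unchanged

# Case 2: you DO want to change the original, so assign through .loc on it
is_ak = toy['state'] == 'AK'
toy.loc[is_ak, 'mean_acres_per_day'] = toy.loc[is_ak, 'mean_acres_per_day'].round()
print(toy)  # AK rows are rounded, the CA row is untouched
```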

---

## Summary

In this exercise, you practiced data preparation techniques:

**Adding and Modifying Columns:**
- Direct column calculations
- Lambda expressions with `apply()`
- User-defined functions with `apply()`

**Working with Indexes:**
- `set_index()` - Create hierarchical indexes
- `unstack()` - Reshape data from long to wide
- `reset_index()` - Convert index back to columns

**Combining Data:**
- `pd.concat()` - Add rows to a DataFrame
- `reset_index(drop=True)` - Fix index after concatenation

**Avoiding Warnings:**
- `copy()` - Create explicit copies to avoid SettingWithCopyWarning