<a href="https://colab.research.google.com/github/c-marq/CAP3321C-Data-Wrangling/blob/main/exercises/chapter-07/exercise_7_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Exercise 7-1: Prepare the Forest Fires Data

**CAP3321C - Data Wrangling**

---

## Overview

In this exercise, you'll prepare the Forest Fires data for analysis. You'll practice using lambda expressions and user-defined functions, working with indexes, and combining data.

**Group Members:**
- Name 1:
- Name 2:
- Name 3:
- Name 4:

---

## Read the Data

Run these cells to load the data.

In [None]:
import pandas as pd

In [None]:
# Download the data file from GitHub
!wget -q https://raw.githubusercontent.com/c-marq/CAP3321C-Data-Wrangling/main/data/fires_by_month.pkl
print("Data file downloaded successfully!")

In [None]:
# Load the fires data
fires_by_month = pd.read_pickle('fires_by_month.pkl')
print("Data shape:", fires_by_month.shape)

### Task 4: Display the First Five Rows (YOUR CODE)

Display the first five rows of the DataFrame.

**Expected output:** Columns include state, fire_year, fire_month, acres_burned, days_burning, fire_count

In [None]:
# YOUR CODE HERE - display the first 5 rows


---

## Part 1: Add and Modify Columns

Practice creating calculated columns using different methods.

### Task 5: Add Mean Acres per Day Column (YOUR CODE)

Add a column for the mean number of acres burned per day for each row. Name it `mean_acres_per_day`.

**Hint:** Divide `acres_burned` by `days_burning`.

**Example syntax:**
```python
df['new_col'] = df['col1'] / df['col2']
```

**Expected output:** New column added with calculated values

In [None]:
# YOUR CODE HERE - add mean_acres_per_day column


### Task 6: Add Column Using Lambda Expression (YOUR CODE)

Add a column for the mean number of acres burned per day for each row by applying a **lambda expression**. Use an if-else expression in the lambda to handle division by zero (rows where `days_burning` is 0).

Name this column `mean_acres_per_day_lambda`.

**Hint:** Use `apply()` with a lambda. The lambda should check if `days_burning` is 0 and return 0 in that case.

**Example syntax:**
```python
df['new_col'] = df.apply(lambda x: x.col1 / x.col2 if x.col2 != 0 else 0, axis=1)
```

**Expected output:** New column that matches the Task 5 column, except that rows where `days_burning` is 0 contain 0
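
To see why the guard matters, here is a minimal sketch on a small made-up DataFrame (the names `toy`, `amount`, and `n_days` are hypothetical and not part of the fires data). It compares plain column division with a guarded lambda.

```python
import pandas as pd

# Hypothetical toy data, not the fires dataset
toy = pd.DataFrame({'amount': [10.0, 5.0, 0.0], 'n_days': [2, 0, 0]})

# Plain column division raises no exception here,
# but x / 0 produces inf and 0 / 0 produces NaN
toy['plain'] = toy['amount'] / toy['n_days']

# Row-wise lambda with an if-else guard: rows with n_days == 0 get 0 instead
toy['guarded'] = toy.apply(
    lambda row: row.amount / row.n_days if row.n_days != 0 else 0, axis=1)

print(toy)
```

In this sketch, the `plain` column holds `inf` and `NaN` in the rows where `n_days` is 0, while the `guarded` column holds 0 in those rows.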

In [None]:
# YOUR CODE HERE - add mean_acres_per_day_lambda column using lambda


### Task 7: Write a Function to Convert Month Number to Name (YOUR CODE)

Write a function that accepts a row and converts the numeric value in the `fire_month` column to a string value such as 'Jan', 'Feb', 'Mar', etc.

**Hint:** Create a dictionary mapping month numbers to abbreviations, then look up the value.

**Example syntax:**
```python
def convert_month(row):
    months = {1: 'Jan', 2: 'Feb', 3: 'Mar', ...}
    return months[row.fire_month]
```

**Expected output:** A function that can be applied to rows

In [None]:
# YOUR CODE HERE - write the convert_month function


### Task 8: Apply the Function to fire_month Column (YOUR CODE)

Apply the function you wrote in Task 7 to every row in the DataFrame. Store the result back in the `fire_month` column.

**Hint:** Use `apply()` with `axis=1` to apply to each row.

**Example syntax:**
```python
df['column'] = df.apply(function_name, axis=1)
```

**Expected output:** fire_month column now contains 'Jan', 'Feb', etc. instead of 1, 2, etc.
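
As an illustration of the same pattern on different data, the sketch below converts a numeric code to a label with a dictionary lookup. The DataFrame `toy`, the column `day_num`, and the function `convert_day` are made up for this example.

```python
import pandas as pd

# Hypothetical toy data with a numeric code column
toy = pd.DataFrame({'store': ['A', 'B', 'C'], 'day_num': [1, 3, 2]})

def convert_day(row):
    """Return the day abbreviation for the row's day_num value."""
    days = {1: 'Mon', 2: 'Tue', 3: 'Wed'}
    return days[row.day_num]

# axis=1 passes each row to the function; the result overwrites the column
toy['day_num'] = toy.apply(convert_day, axis=1)
print(toy)
```

Because the result is stored back in the same column, running the cell a second time would fail with a `KeyError`: the column then holds strings, which aren't keys in the dictionary.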

In [None]:
# YOUR CODE HERE - apply the function to the fire_month column


In [None]:
# Verify the conversion
fires_by_month.head()

---

## Part 2: Work with Indexes

Practice setting indexes and reshaping data with unstack.

### Task 9: Set a Multi-Level Index (YOUR CODE)

Set an index on the `state`, `fire_year`, and `fire_month` columns.

**Hint:** Use `set_index()` with a list of column names.

**Example syntax:**
```python
df = df.set_index(['col1', 'col2', 'col3'])
```

**Expected output:** DataFrame with a 3-level hierarchical index

In [None]:
# YOUR CODE HERE - set index on state, fire_year, fire_month


### Task 10: Unstack the fire_month Column (YOUR CODE)

Unstack the `fire_month` column and store the resulting DataFrame in a **different variable** named `fires_by_month_wide`.

**Hint:** Use `unstack()` with the column name or level to unstack.

**Example syntax:**
```python
df_wide = df.unstack('column_name')
```

**Expected output:** A wide DataFrame with months as columns
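
The following sketch shows what `unstack()` does to a small made-up DataFrame with a two-level index. The names `toy`, `region`, `quarter`, and `sales` are hypothetical.

```python
import pandas as pd

# Hypothetical toy data in long format: one row per region and quarter
toy = pd.DataFrame({
    'region': ['N', 'N', 'S', 'S'],
    'quarter': ['Q1', 'Q2', 'Q1', 'Q2'],
    'sales': [10, 20, 30, 40],
})

# Build a two-level index, then move the quarter level into the columns
long_df = toy.set_index(['region', 'quarter'])
wide_df = long_df.unstack('quarter')

print(wide_df)
print(wide_df.shape)
```

The wide result has one row per region and one column per combination of data column and quarter, so its columns form a two-level index.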

In [None]:
# YOUR CODE HERE - unstack fire_month into fires_by_month_wide


In [None]:
# View the wide format
fires_by_month_wide.head()

### Task 11: Reset the Index (YOUR CODE)

Reset the index for the `fires_by_month` DataFrame, but don't drop any columns. This should add a numeric index from 0 to 9299 to the DataFrame.

**Hint:** Use `reset_index()` without the `drop` parameter.

**Example syntax:**
```python
df = df.reset_index()
```

**Expected output:** DataFrame with state, fire_year, fire_month back as regular columns
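
As a small sketch of how the `drop` parameter changes the result, the code below resets a two-level index both ways on made-up data (`toy`, `region`, `quarter`, and `sales` are hypothetical names).

```python
import pandas as pd

# Hypothetical toy data with a two-level index
toy = pd.DataFrame({
    'region': ['N', 'N', 'S', 'S'],
    'quarter': ['Q1', 'Q2', 'Q1', 'Q2'],
    'sales': [10, 20, 30, 40],
}).set_index(['region', 'quarter'])

# Default: the index levels come back as regular columns
kept = toy.reset_index()
print(kept.columns.tolist())

# drop=True: the index levels are discarded
dropped = toy.reset_index(drop=True)
print(dropped.columns.tolist())
```

Both results get a new numeric index that starts at 0. Only the default call keeps the `region` and `quarter` values.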

In [None]:
# YOUR CODE HERE - reset the index for fires_by_month


---

## Part 3: Add a Row of Data

Practice combining DataFrames with concat.

### Task 12: Create New Fire Data (PRE-FILLED)

Run the cell that creates and displays the DataFrame named `new_fire`.

In [None]:
# PRE-FILLED: Create new fire data
new_fire = pd.DataFrame(
    data=[['CA', 2021, 'Jun', 1000, 100, 1, 10, 10]], 
    columns=fires_by_month.columns
)
new_fire

### Task 13: Add New Fire Data to Original DataFrame (YOUR CODE)

Add the row in the `new_fire` DataFrame to the end of the original DataFrame. When you do that, **don't use the `ignore_index` parameter**.

**Hint:** Use `pd.concat()` to combine DataFrames.

**Example syntax:**
```python
df = pd.concat([df, new_row_df])
```

**Expected output:** fires_by_month now has one more row

In [None]:
# YOUR CODE HERE - add new_fire to fires_by_month using concat


### Task 14: Display Last Five Rows (YOUR CODE)

Display the last five rows of the original DataFrame. The last row should include the new fire, but the index label for this row won't be correct (it will be 0 instead of 9300).

**Expected output:** Last row shows CA 2021 data with index 0
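
The sketch below reproduces this behavior on a tiny made-up DataFrame (`base` and `extra` are hypothetical names). Without `ignore_index`, `pd.concat()` keeps each input's original index labels, so labels can repeat.

```python
import pandas as pd

# Hypothetical toy data: three existing rows and one new row
base = pd.DataFrame({'state': ['AK', 'AZ', 'CA'], 'fire_count': [3, 5, 8]})
extra = pd.DataFrame([['CA', 1]], columns=base.columns)

# The new row keeps its own label (0), which duplicates a label in base
combined = pd.concat([base, extra])
print(combined.index.tolist())

# Selecting by the duplicated label returns both rows
print(combined.loc[0])
```

A duplicated label like this is one reason to reset the index after concatenating.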

In [None]:
# YOUR CODE HERE - display the last 5 rows


### Task 15: Reset Index and Drop Old Index (YOUR CODE)

Use the `reset_index()` method to reset the index, and use the `drop` parameter to drop the old numeric index, which is no longer needed.

**Hint:** Use `drop=True` to avoid keeping the old index as a column.

**Example syntax:**
```python
df = df.reset_index(drop=True)
```

**Expected output:** DataFrame with a clean sequential index 0 to 9300

In [None]:
# YOUR CODE HERE - reset index with drop=True


### Task 16: Display Last Five Rows Again (YOUR CODE)

Display the last five rows of the original DataFrame again. This time, the index for the last row should be correct (9300).

**Expected output:** Last row now has index 9300

In [None]:
# YOUR CODE HERE - display the last 5 rows again


---

## Part 4: Fix the SettingWithCopyWarning

Learn about a common pandas warning and how to fix it.

### Task 17: Run the Cell That Causes the Warning (PRE-FILLED)

Run the cell below. Note that it successfully rounds the values in the `mean_acres_per_day` column of the `fires_ak` DataFrame, but it also generates a `SettingWithCopyWarning`.

In [None]:
# PRE-FILLED: This cell causes a SettingWithCopyWarning
fires_ak = fires_by_month.query('state == "AK"')
fires_ak.mean_acres_per_day = fires_ak.mean_acres_per_day.round()
fires_ak.head()

### Task 18: Check the Original DataFrame (PRE-FILLED)

Run the cell that displays the `fires_by_month` DataFrame. Note that the values in the `mean_acres_per_day` column aren't rounded. This shows that the assignment changed only `fires_ak`, so in this case you can ignore the warning if you want.

In [None]:
# PRE-FILLED: Check if original was affected
fires_by_month.head()

### Task 19: Fix the Warning with copy() (YOUR CODE)

Fix the warning by adding the `copy()` method to the statement that creates the `fires_ak` DataFrame.

**Hint:** Add `.copy()` after the query.

**Example syntax:**
```python
df_subset = df.query('condition').copy()
```

**Expected output:** No warning when running the code
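
As a minimal sketch on made-up data (`toy`, `subset`, and `value` are hypothetical names), the code below shows that a subset created with `copy()` can be modified independently of the DataFrame it came from.

```python
import pandas as pd

# Hypothetical toy data, not the fires dataset
toy = pd.DataFrame({'state': ['AK', 'AK', 'CA'], 'value': [1.4, 2.6, 3.5]})

# copy() makes the subset an independent DataFrame
subset = toy.query('state == "AK"').copy()
subset['value'] = subset['value'].round()

print(subset['value'].tolist())  # rounded values in the subset
print(toy['value'].tolist())     # original values are unchanged
```

Calling `copy()` makes the intent explicit: later changes are meant for the subset only.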

In [None]:
# YOUR CODE HERE - fix the warning by using .copy()


---

## Summary

In this exercise, you practiced data preparation techniques:

**Adding and Modifying Columns:**
- Direct column calculations
- Lambda expressions with `apply()`
- User-defined functions with `apply()`

**Working with Indexes:**
- `set_index()` - Create hierarchical indexes
- `unstack()` - Reshape data from long to wide
- `reset_index()` - Convert index back to columns

**Combining Data:**
- `pd.concat()` - Add rows to a DataFrame
- `reset_index(drop=True)` - Fix index after concatenation

**Avoiding Warnings:**
- `copy()` - Create explicit copies to avoid SettingWithCopyWarning

---

**Submission:** Save this notebook and submit to Canvas before the deadline.