<a href="https://colab.research.google.com/github/c-marq/CAP3321C-Data-Wrangling/blob/main/exercises/chapter-08/exercise_8_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Exercise 8-1: Analyze the Forest Fires Data

**CAP3321C - Data Wrangling**

---

## Overview

This exercise walks you through analyzing the Forest Fires data. You'll practice grouping and aggregating data, reshaping it with pivot tables, and binning values into categories.

**Group Members:**
- Name 1:
- Name 2:
- Name 3:
- Name 4:

---

## Read the Data

In [None]:
import pandas as pd

In [None]:
# Download the data file from GitHub
!wget -q https://raw.githubusercontent.com/c-marq/CAP3321C-Data-Wrangling/main/data/fires_by_month.pkl
print("Data file downloaded successfully!")

In [None]:
# Load the fires data
fires_by_month = pd.read_pickle('fires_by_month.pkl')
print("Data shape:", fires_by_month.shape)

### Task 4: Display the First Five Rows (YOUR CODE)

Display the first five rows of the DataFrame.

**Expected output:** Columns include state, fire_year, fire_month, acres_burned, days_burning, fire_count

In [None]:
# YOUR CODE HERE - display the first 5 rows


---

## Part 1: Group and Aggregate the Data

Practice using groupby() to aggregate data.

### Task 5: Group the Data by State and Year (YOUR CODE)

Group the data by `state` and `fire_year` and assign it to a variable called `fires_grouped`.

**Hint:** Use `groupby()` with a list of column names.

**Example syntax:**
```python
grouped = df.groupby(['col1', 'col2'])
```

**Expected output:** A DataFrameGroupBy object
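
To see what a grouped object looks like before you try it on the fires data, here is a minimal sketch on a made-up table. The `toy` DataFrame, its column names, and its values are invented for illustration and are not part of the dataset. Grouping alone computes nothing; it returns a `DataFrameGroupBy` object that waits for an aggregation.

```python
import pandas as pd

# Hypothetical toy data (not the fires dataset)
toy = pd.DataFrame({
    'region': ['A', 'A', 'B', 'B'],
    'year': [2001, 2001, 2001, 2002],
    'units': [10, 5, 7, 3],
})

# Grouping by a list of columns returns a GroupBy object, not a DataFrame
toy_grouped = toy.groupby(['region', 'year'])
print(type(toy_grouped).__name__)

# ngroups counts the distinct (region, year) combinations
print(toy_grouped.ngroups)
```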

In [None]:
# YOUR CODE HERE - group data by state and fire_year


### Task 6: Sum the Grouped Data (YOUR CODE)

Sum the grouped data and assign the DataFrame that's returned to the variable `fires_by_year`.

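**Hint:** Use the `sum()` method on the grouped object. Add `numeric_only=True` so that only numeric columns are summed; depending on your pandas version, omitting it can trigger a warning or concatenate text columns.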

**Example syntax:**
```python
df_summed = grouped.sum(numeric_only=True)
```

**Expected output:** DataFrame with summed values for each state/year combination
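
The sketch below shows the effect of `numeric_only=True` on a small made-up table. The `toy` DataFrame and its `note` column are hypothetical and exist only to show how a text column is handled; they are not part of the fires data.

```python
import pandas as pd

# Hypothetical toy data with one non-numeric column
toy = pd.DataFrame({
    'region': ['A', 'A', 'B'],
    'year': [2001, 2001, 2001],
    'note': ['x', 'y', 'z'],   # text column, excluded from the sum
    'units': [10, 5, 7],
})

# The grouping columns become the index; only numeric columns are summed
toy_summed = toy.groupby(['region', 'year']).sum(numeric_only=True)
print(toy_summed)
```

Notice that the grouping columns move into a two-level index. That is why later tasks call `reset_index()`.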

In [None]:
# YOUR CODE HERE - sum the grouped data into fires_by_year


### Task 7: Drop the fire_month Column (YOUR CODE)

Drop the `fire_month` column. After summing across all months, it holds a sum of month numbers, which has no meaning.

**Hint:** Use `drop(columns=[...])` and reassign to `fires_by_year`.

**Expected output:** fires_by_year without the fire_month column

In [None]:
# YOUR CODE HERE - drop the fire_month column


In [None]:
# Verify
fires_by_year.head()

---

## Part 2: Use Pivot Tables

Practice reshaping data with pivot() and pivot_table().

### Task 8: Select Recent Data (YOUR CODE)

Use the `query()` method on `fires_by_year` to select all the data for the years **2013 and later**. Then reset the index of the DataFrame that's returned, so `state` and `fire_year` become regular columns, and assign the result to a variable named `fires_recent`.

**Hint:** 
- Use `query('fire_year >= 2013')`
- Chain `.reset_index()` after the query

**Example syntax:**
```python
df_filtered = df.query('column >= value').reset_index()
```

**Expected output:** DataFrame with only years 2013-2016
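
After a grouped sum, the grouping columns live in the index rather than in regular columns. As a minimal sketch on invented data (the `toy` DataFrame and its level names are hypothetical), `query()` can refer to a named index level the same way it refers to a column, and `reset_index()` then moves the levels back into columns:

```python
import pandas as pd

# Hypothetical toy data with a two-level index, like a grouped-and-summed result
toy = pd.DataFrame(
    {'units': [10, 5, 7, 3]},
    index=pd.MultiIndex.from_tuples(
        [('A', 2011), ('A', 2013), ('B', 2012), ('B', 2014)],
        names=['region', 'year'],
    ),
)

# 'year' is an index level here, but query() can still filter on it by name
toy_recent = toy.query('year >= 2013').reset_index()
print(toy_recent)
```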

In [None]:
# YOUR CODE HERE - filter for 2013+ and reset index


### Task 9: Use the pivot() Method (YOUR CODE)

Use the `pivot()` method to pivot the data so:
- The `state` column provides the values for the **row labels**
- The `fire_year` column provides the values for the **column labels**
- The `acres_burned` column provides the **data** for the table

**Hint:** Use `pivot(index=, columns=, values=)`.

**Example syntax:**
```python
df.pivot(index='row_col', columns='col_col', values='data_col')
```

**Expected output:** Wide DataFrame with states as rows, years as columns, acres_burned as values
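
Here is a minimal sketch of the long-to-wide reshape on made-up data. The `toy_long` DataFrame and its values are invented for illustration. Each `region`/`year` pair appears exactly once, which is what `pivot()` expects.

```python
import pandas as pd

# Hypothetical long-format data: one row per (region, year) pair
toy_long = pd.DataFrame({
    'region': ['A', 'A', 'B', 'B'],
    'year': [2013, 2014, 2013, 2014],
    'units': [10, 5, 7, 3],
})

# Rows come from 'region', columns from 'year', cell values from 'units'
toy_wide = toy_long.pivot(index='region', columns='year', values='units')
print(toy_wide)
```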

In [None]:
# YOUR CODE HERE - pivot the fires_recent data


### Task 10: Use the pivot_table() Method (YOUR CODE)

Use the `pivot_table()` method with the **fires_by_month** DataFrame to get the same result as the previous step. Note how this saves you the separate group and sum steps, because `pivot_table()` aggregates as it reshapes. You still need to limit the data to 2013 and later.

**Hint:** 
- Use `pivot_table()` on the original `fires_by_month` DataFrame
- Add `aggfunc='sum'` to aggregate
- Filter for years >= 2013 with `query()` first, or pivot the full data and select the year columns afterward

**Example syntax:**
```python
df.query('fire_year >= 2013').pivot_table(
    index='row_col', 
    columns='col_col', 
    values='data_col',
    aggfunc='sum'
)
```

**Expected output:** Same result as Task 9
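
The difference between the two methods shows up when a row/column pair appears more than once. In this sketch on invented data (`toy_monthly` is hypothetical), `pivot()` raises a `ValueError` because region A has two rows for 2013, while `pivot_table()` sums them:

```python
import pandas as pd

# Hypothetical un-aggregated data: region A has two rows for 2013
toy_monthly = pd.DataFrame({
    'region': ['A', 'A', 'A', 'B'],
    'year': [2013, 2013, 2014, 2013],
    'units': [10, 5, 7, 3],
})

# pivot() cannot decide which of the duplicate values to keep
pivot_failed = False
try:
    toy_monthly.pivot(index='region', columns='year', values='units')
except ValueError as err:
    pivot_failed = True
    print('pivot() failed:', err)

# pivot_table() aggregates the duplicates with the function you pass
toy_table = toy_monthly.pivot_table(
    index='region', columns='year', values='units', aggfunc='sum'
)
print(toy_table)
```

Combinations with no data, such as region B in 2014 here, appear as `NaN` in the result.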

In [None]:
# YOUR CODE HERE - use pivot_table on fires_by_month


---

## Part 3: Work with Bins

Practice using `pd.cut()` to bin continuous data into categories.

### Task 11: Reset the Index for fires_by_year (YOUR CODE)

Reset the index for the DataFrame named `fires_by_year`.

**Expected output:** fires_by_year with state and fire_year as regular columns

In [None]:
# YOUR CODE HERE - reset index for fires_by_year


### Task 12: Bin the Rows by Decade (YOUR CODE)

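Use the `pd.cut()` function to bin the rows by decade and store the results in a new column named `decade`.

**Hint:** 
- The data spans 1992-2016, so the bin edges should be: 1990, 2000, 2010, 2020
- Use the `labels` parameter to name the bins: '1990s', '2000s', '2010s'
- By default, `pd.cut()` includes the right edge of each bin, so 2000 would land in '1990s'. Pass `right=False` so that each bin includes its left edge instead.

**Example syntax:**
```python
# right=False makes each bin include its left edge: [1990, 2000), [2000, 2010), ...
df['decade'] = pd.cut(df['fire_year'], 
                       bins=[1990, 2000, 2010, 2020],
                       labels=['1990s', '2000s', '2010s'],
                       right=False)
```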

**Expected output:** New 'decade' column with categorical values

In [None]:
# YOUR CODE HERE - bin by decade using cut()


### Task 13: Double-Check the Edge Values (YOUR CODE)

Double-check the values on the edge of each bin to make sure that they are binned properly. To do that, display the first 25 or so rows of the DataFrame.

**Hint:** Look for years like 1999, 2000, 2009, 2010 to verify binning.

**Expected output:** First 25 rows showing correct decade assignments
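
To see why the edge years matter, here is a small sketch on a standalone Series. The `years`, `edges`, and `names` variables are invented for illustration. It compares the default `right=True` behavior of `pd.cut()`, where each bin includes its right edge, with `right=False`, where each bin includes its left edge.

```python
import pandas as pd

# Hypothetical edge-case years to test
years = pd.Series([1999, 2000, 2009, 2010])
edges = [1990, 2000, 2010, 2020]
names = ['1990s', '2000s', '2010s']

# Default: bins are (1990, 2000], (2000, 2010], (2010, 2020]
right_closed = pd.cut(years, bins=edges, labels=names)

# right=False: bins are [1990, 2000), [2000, 2010), [2010, 2020)
left_closed = pd.cut(years, bins=edges, labels=names, right=False)

check = pd.DataFrame({
    'year': years,
    'right=True': right_closed,
    'right=False': left_closed,
})
print(check)
```

With the default setting, 2000 is labeled '1990s' and 2010 is labeled '2000s'. With `right=False`, each year falls into the decade it belongs to.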

In [None]:
# YOUR CODE HERE - display first 25 rows to verify binning


### Task 14: Drop fire_year and Create fires_by_decade (YOUR CODE)

Drop the `fire_year` column and assign the DataFrame that's returned to a variable named `fires_by_decade`.

**Expected output:** New DataFrame without fire_year column

In [None]:
# YOUR CODE HERE - drop fire_year into fires_by_decade


### Task 15: Group by State and Decade (YOUR CODE)

Group the DataFrame by the `state` and `decade` columns and sum the data.

**Expected output:** Aggregated data by state and decade
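
One thing to watch for: the column that `pd.cut()` creates is categorical. When you group by a categorical column, pandas may include every category combination, even ones that never occur in the data, and the default for this differs between pandas versions. The sketch below uses invented data (`toy` is hypothetical) and passes `observed=True` explicitly, so that only combinations present in the data are returned.

```python
import pandas as pd

# Hypothetical toy data: region B has no rows in the 1990s
toy = pd.DataFrame({
    'region': ['A', 'A', 'B'],
    'year': [1995, 2005, 2005],
    'units': [10, 5, 7],
})
toy['decade'] = pd.cut(toy['year'],
                       bins=[1990, 2000, 2010, 2020],
                       labels=['1990s', '2000s', '2010s'],
                       right=False)

# observed=True keeps only the (region, decade) pairs that actually occur
toy_by_decade = (toy.drop(columns=['year'])
                    .groupby(['region', 'decade'], observed=True)
                    .sum())
print(toy_by_decade)
```

Whether you want the unobserved combinations is a judgment call. Leaving them in gives every state a row for every decade, which some reports prefer.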

In [None]:
# YOUR CODE HERE - group by state and decade, then sum


---

## Summary

In this exercise, you practiced data analysis techniques:

**Grouping and Aggregating:**
- `groupby()` - Group data by one or more columns
- `sum()` - Aggregate with sum

**Pivot Tables:**
- `pivot()` - Reshape data (requires exactly one row per index/column pair, so aggregate first)
- `pivot_table()` - Reshape and aggregate in one step

**Binning:**
- `pd.cut()` - Bin continuous data into categories
- Use `labels` parameter to name the bins

---

**Submission:** Save this notebook and submit to Canvas before the deadline.