![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

# Callysto’s Weekly Data Visualization

## Wildfires

### Recommended Grade levels: 9-12
<br>

### Instructions

Click "Cell" and select "Run All".

This will import the data and run all the code, so you can see this week's data visualization. Scroll back to the top after you’ve run the cells.

![instructions](https://github.com/callysto/data-viz-of-the-week/blob/main/images/instructions.png?raw=true)

**You don't need to do any coding to view the visualizations**.

The plots generated in this notebook are interactive. You can hover over and click on elements to see more information. 

Email contact@callysto.ca if you experience issues.

### About this Notebook

Callysto's Weekly Data Visualization is a learning resource that aims to develop data literacy skills. We provide Grades 5-12 teachers and students with a data visualization, like a graph, to interpret. This companion resource walks learners through how the data visualization is created and interpreted by a data scientist. 

The steps of the data analysis process are listed below and applied to each weekly topic.

1. Question - What are we trying to answer?
2. Gather - Find the data source(s) you will need. 
3. Organize - Arrange the data, so that you can easily explore it. 
4. Explore - Examine the data to look for evidence to answer the question. This includes creating visualizations. 
5. Interpret - Describe what's happening in the data visualization. 
6. Communicate - Explain how the evidence answers the question. 

# Question

Wildfires have recently become a *hot* topic in Canada. They are a significant natural disaster that threatens ecosystems, human lives, and the environment. Canada, known for its vast forests and diverse ecosystems, has experienced its fair share of devastating wildfires in recent years. These fires not only have a profound impact on Canada but also affect nearby countries and have broader implications for the climate.

Wildfires in Canada have a significant impact on the environment, particularly in heavily forested regions that receive little rain. The primary damage done by fires is the loss of trees, which are essential carbon sinks that help absorb greenhouse gases and mitigate climate change. Burning these forests can release a substantial amount of carbon dioxide back into the atmosphere, contributing to global warming.

The consequences of Canadian wildfires extend beyond the country's borders, and their impact on nearby countries such as the United States has made global news. Due to the raging wildfires in northwest Quebec, New York City recorded some of its worst air quality to date, reported as being equivalent to smoking 25-30 cigarettes. This smoke continues to travel through the United States, degrading air quality in cities along its path.

### Goal

Our goal is to look at the different causes of wildfires in Canada and discover trends, such as where wildfires have historically occurred and the main factors that cause them. With this information, readers can develop a more informed perspective on the main causes of wildfires in Canada.

The datasets used in this notebook are taken from the [National Forestry Database](http://nfdp.ccfm.org/en/data/fires.php#tab322) and contain information on Canadian wildfire events from 1990-2021.

# Gather

### Code: 

Run the code cells below to import the libraries we need for this project. Libraries are collections of pre-made code that make it easier to analyze our data.

In [105]:
import pandas as pd
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go
import warnings
import matplotlib.pyplot as plt
warnings.simplefilter(action='ignore', category=FutureWarning)
print("Libraries imported")

Libraries imported


### Data

In this notebook, we will use 4 different datasets from the [National Forestry Database](http://nfdp.ccfm.org/en/data/fires.php#tab322). To measure the main causes of wildfires, we'll use `areabygroup` and `numfiresbygroup`, which record, for each *cause*, the number of hectares of land burned by wildfires and the number of fires started, respectively.

The datasets `areabyclass` and `numfiresbymonth` will also be used later in this notebook. They describe the area burned by fires of each size class and the number of fires recorded each month, respectively.

### Import the data

In [106]:
# Import data
areabygroup = pd.read_excel('https://raw.githubusercontent.com/callysto/data-files/main/data-viz-of-the-week/wildfires/areaburnedbygroup.xlsx', skiprows=1)
areabyclass = pd.read_excel('https://raw.githubusercontent.com/callysto/data-files/main/data-viz-of-the-week/wildfires/areaburnedfireclass.xlsx', skiprows=1)
numfiresbygroup = pd.read_excel('https://raw.githubusercontent.com/callysto/data-files/main/data-viz-of-the-week/wildfires/numfiresbygroup.xlsx')
numfiresbymonth = pd.read_excel('https://raw.githubusercontent.com/callysto/data-files/main/data-viz-of-the-week/wildfires/numfiresbymonth.xlsx', skiprows=1)

In [107]:
display(areabygroup.head())
display(areabyclass.head())
display(numfiresbygroup.head())
display(numfiresbymonth.head())

Unnamed: 0,Jurisdiction,Cause,Data Qualifier,1990,1991,1992,1993,1994,1995,1996,...,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021
0,Alberta,Human activity,a,2394.0,2461.0,2002.0,1490.0,2485.0,7887.0,655.0,...,7553.0,2101.0,1204.0,38618.0,1134.0,6834.0,1892.0,3026.0,2949.0,
1,Alberta,Lightning,a,55483.0,4118.0,1089.0,26170.0,94199.0,331046.0,14917.0,...,644115.0,48634.0,286625.0,801712.0,27284.0,116704.0,69845.0,164062.0,393.0,
2,Alberta,Prescribed burn,a,,,,,,,,...,2.0,167.0,201.0,2972.0,838.0,867.0,339.0,141.0,1250.0,
3,Alberta,Reburn,a,,,,,,,,...,,,,,,,,,,
4,Alberta,Unspecified,a,1009.0,135.0,471.0,32.0,43.0,3047.0,24.0,...,414.0,4830.0,19.0,570.0,485908.0,25364.0,12941.0,836602.0,206.0,


Unnamed: 0,Jurisdiction,Fire size class,Data Qualifier,1990,1991,1992,1993,1994,1995,1996,...,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021
0,Alberta,0.11 - 1.0 ha,a,171.0,123.0,131.0,135.0,148.0,118.0,59.0,...,137.0,85.0,122.0,183.0,142.0,102.0,117.0,92.0,43.0,
1,Alberta,1.1 - 10 ha,a,504.0,499.0,457.0,595.0,579.0,627.0,248.0,...,388.0,332.0,443.0,713.0,449.0,307.0,472.0,375.0,144.0,
2,Alberta,10.1 - 100 ha,a,1768.0,1575.0,1179.0,1986.0,1721.0,1644.0,497.0,...,1249.0,2043.0,1461.0,3240.0,1875.0,1905.0,1740.0,1447.0,427.0,
3,Alberta,100.1 - 1 000 ha,a,8179.0,2911.0,1731.0,6413.0,6407.0,3271.0,1207.0,...,7758.0,6435.0,3007.0,23604.0,5910.0,7966.0,4050.0,9659.0,648.0,
4,Alberta,Up to 0.1 ha,a,74.0,47.0,63.0,35.0,22.0,17.0,9.0,...,28.0,20.0,27.0,36.0,27.0,25.0,28.0,20.0,15.0,


Unnamed: 0,Jurisdiction,Cause,Data Qualifier,1990,1991,1992,1993,1994,1995,1996,...,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021
0,Alberta,Human activity,a,379.0,433.0,398.0,306.0,361.0,414.0,143.0,...,1133.0,892.0,865.0,1033.0,878.0,834.0,782.0,728.0,681.0,
1,Alberta,Lightning,a,971.0,484.0,631.0,547.0,551.0,366.0,239.0,...,470.0,337.0,609.0,829.0,537.0,448.0,535.0,318.0,89.0,
2,Alberta,Prescribed burn,a,,,,,,,,...,1.0,7.0,3.0,7.0,7.0,3.0,2.0,2.0,2.0,
3,Alberta,Reburn,a,,,,,,,,...,,,,,,,,,,
4,Alberta,Unspecified,a,16.0,27.0,32.0,23.0,19.0,28.0,16.0,...,42.0,43.0,29.0,83.0,63.0,45.0,63.0,33.0,28.0,


Unnamed: 0,Jurisdiction,Month,Data Qualifier,1990,1991,1992,1993,1994,1995,1996,...,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021
0,Alberta,January,a,1.0,9.0,14.0,8.0,4.0,10.0,1.0,...,14.0,1.0,9.0,2.0,7.0,4.0,4.0,17.0,1.0,
1,Alberta,February,a,5.0,4.0,11.0,12.0,,6.0,,...,3.0,3.0,,4.0,12.0,4.0,,1.0,1.0,
2,Alberta,March,a,8.0,8.0,61.0,29.0,12.0,29.0,1.0,...,32.0,9.0,11.0,32.0,62.0,14.0,19.0,34.0,4.0,
3,Alberta,April,a,26.0,111.0,110.0,52.0,64.0,57.0,26.0,...,123.0,37.0,81.0,277.0,287.0,101.0,97.0,188.0,68.0,
4,Alberta,May,a,114.0,201.0,91.0,242.0,132.0,215.0,56.0,...,456.0,529.0,360.0,505.0,250.0,348.0,454.0,315.0,209.0,


### Comment on the data
Notice that our datasets share several columns. `Jurisdiction` refers to the province or territory (or agency, such as Parks Canada) in which the wildfire occurred. The column `Data Qualifier` is not important for our purposes and can be ignored. The columns `1990`-`2021` hold either the area burned in hectares or the number of wildfires that occurred in that year, depending on the dataset.

The columns that are different in each dataset are `Cause` referring to how the wildfire was started, `Fire size class` referring to the size of wildfire, and `Month` referring to the month that a particular number of wildfires started in.
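
The year columns make these tables "wide". As a minimal sketch of how such a table could be reshaped into a "long" format (one row per jurisdiction, cause, and year), the example below rebuilds two Alberta rows from the `areabygroup` preview above, using only the 1990 and 1991 columns. The names `sample` and `long_format` are illustrative and are not used elsewhere in this notebook.

```python
import pandas as pd

# Two rows copied from the areabygroup preview (Alberta, 1990 and 1991 only)
sample = pd.DataFrame({
    'Jurisdiction': ['Alberta', 'Alberta'],
    'Cause': ['Human activity', 'Lightning'],
    1990: [2394.0, 55483.0],
    1991: [2461.0, 4118.0],
})

# melt() turns each year column into rows: one row per (Jurisdiction, Cause, Year)
long_format = sample.melt(
    id_vars=['Jurisdiction', 'Cause'],
    var_name='Year',
    value_name='Area burned (ha)',
)
print(long_format)
```

The rest of this notebook keeps the wide layout, because each year column can be plotted directly as its own bar series.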

# Organize
Let's organize our data so that it contains only the information relevant to our question. For the `areabygroup` and `numfiresbygroup` datasets, let's sum the values for each cause in the `Cause` column across all jurisdictions to identify common causes of wildfires. For the `areabyclass` dataset, let's remove any classes that are noted as *Unspecified*, since they cannot be assigned to a size category.

Let's also organize a pleasant colour scheme which most of our visualizations will use throughout the notebook.

Note: We'll be constantly organizing our data to make it useful to the visualization at hand.
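
To illustrate what summing by cause does, here is a small sketch on a toy table. The Alberta values come from the 1990 column of the preview above; the `Province B` rows are invented purely for illustration, and `toy` and `toy_causes` are hypothetical names.

```python
import pandas as pd

toy = pd.DataFrame({
    'Jurisdiction': ['Alberta', 'Alberta', 'Province B', 'Province B'],
    'Cause': ['Human activity', 'Lightning', 'Human activity', 'Lightning'],
    1990: [2394.0, 55483.0, 100.0, 500.0],  # Province B values are invented
})

# Group rows that share a Cause and add up the numeric columns.
# numeric_only=True skips text columns such as Jurisdiction.
toy_causes = toy.groupby('Cause').sum(numeric_only=True).reset_index()
print(toy_causes)
```

Each cause ends up as a single row whose 1990 value is the total across both jurisdictions.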

In [None]:
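# Sum only the numeric columns; text columns such as Jurisdiction cannot be meaningfully added
areabygroup_causes = areabygroup.groupby(['Cause']).sum(numeric_only=True).reset_index()
numfiresbygroup_causes = numfiresbygroup.groupby(['Cause']).sum(numeric_only=True).reset_index()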
areabyclass = areabyclass[areabyclass['Fire size class'] != 'Unspecified']

In [108]:
years = list(range(1990, 2022))

# Generate a diverging color scheme from red to blue
color_scheme = plt.get_cmap('RdYlBu', len(years))

color_map = {}

for i, year in enumerate(years):
    color = color_scheme(i)
    hex_color = '#{:02x}{:02x}{:02x}'.format(int(color[0]*255), int(color[1]*255), int(color[2]*255))
    color_map[year] = hex_color
print(color_map)

{1990: '#a50026', 1991: '#b50f26', 1992: '#c51e26', 1993: '#d52e26', 1994: '#df412f', 1995: '#e85538', 1996: '#f26941', 1997: '#f67d4a', 1998: '#f99254', 1999: '#fca75e', 2000: '#fdb96b', 2001: '#fdc97a', 2002: '#fdd989', 2003: '#fee599', 2004: '#fef0a8', 2005: '#fefab7', 2006: '#fafdc8', 2007: '#f0f9da', 2008: '#e6f5ec', 2009: '#d9eff6', 2010: '#c8e7f1', 2011: '#b6deec', 2012: '#a5d4e6', 2013: '#93c6de', 2014: '#82b8d7', 2015: '#70a9cf', 2016: '#6197c5', 2017: '#5285bc', 2018: '#4472b3', 2019: '#3d5ea9', 2020: '#374a9f', 2021: '#313695'}


# Explore

### Visualizations of Wildfire Causes

With our new datasets `areabygroup_causes` and `numfiresbygroup_causes`, let's visualize the trends from 1990-2021 and see whether there are any apparent differences between the figures.

Clarifications on the `Cause` of wildfires can be found below.

| **Cause** | **Description** |
| :- |:------------- | 
| Human Activity | Wildfires caused by human activity refer to fires ignited as a result of human actions. These can include accidental causes like discarded cigarettes, unattended campfires, sparks from equipment or machinery, power line failures, or intentional acts of arson. Human-caused wildfires can occur in both urban and rural areas and can have various impacts depending on the location, climate, and fuel availability. |
| Lightning | Lightning-caused wildfires originate from natural electrical discharges during dry thunderstorms. Lightning strikes can ignite fires when they come into contact with dry vegetation, especially in regions with dry and windy conditions. These wildfires are a natural occurrence and can be prevalent in remote or forested areas where lightning strikes are more likely to occur. |
| Prescribed Burn | A prescribed burn, also known as a controlled burn or prescribed fire, is a planned fire intentionally set by fire management authorities. These burns are carefully controlled and conducted under specific conditions to reduce the risk of uncontrolled wildfires. Prescribed burns are often used for ecological purposes, such as promoting forest health, reducing fuel loads, managing vegetation, or restoring natural fire regimes. |
| Reburn | Reburn refers to a situation where a previously burned area reignites due to residual heat or smoldering embers from a previous wildfire. Even after a fire has been extinguished, heat can remain trapped within logs, root systems, or organic material. Under certain conditions, such as dry and windy weather, these residual heat sources can reignite, leading to a new fire within the previously burned area. Reburns can be challenging to detect and control, as they can occur unexpectedly and often in areas where firefighting resources have been recently deployed.  |
| Unspecified | Wildfires whose cause is unknown or unspecified, mainly due to a lack of evidence. |

In [109]:
cause_figs = make_subplots(rows=2, cols=1, subplot_titles=['Area Burned from Wildfires by Cause', 'Number of Fires by Cause'])

for i in range(1990,2022):

    cause_figs.add_bar(x=areabygroup_causes['Cause'], y=areabygroup_causes[i], name=i, marker=dict(color=color_map[i]), row=1, col=1)
    cause_figs.add_bar(x=numfiresbygroup_causes['Cause'], y=numfiresbygroup_causes[i], name=i, marker=dict(color=color_map[i]), row=2, col=1)
    
cause_figs.update_layout(height=800, width=1780)
cause_figs.update_xaxes(title_text='Cause', row=1, col=1)
cause_figs.update_xaxes(title_text='Cause', row=2, col=1)
cause_figs.update_yaxes(title_text='Area Burned (hectares)', row=1, col=1)
cause_figs.update_yaxes(title_text='Number of Fires', row=2, col=1).show()

Starting with the figure *Area Burned from Wildfires by Cause*, `Lightning` appears to be the main factor in terms of area burned by wildfires. In the *Number of Fires by Cause* figure, by comparison, `Human Activity` and `Lightning` are clear front-runners in terms of the number of fires started. Something that also stands out is that, despite the frequency of fires started by `Human Activity`, those fires burn a drastically smaller area. Furthermore, due to the sheer scale of `Human Activity` and `Lightning`, categories like `Prescribed burn` and `Reburn` become hard to see.

Some possible reasons why lightning-caused fires burn more hectares than fires caused by human activity are:
1. Location: Where a wildfire begins can influence its behavior and the extent of the burned area. Human-caused fires often occur in more populated areas, where vegetation may be managed or more fragmented, leading to smaller, localized fires. In contrast, lightning-caused fires can occur in remote and inaccessible areas with abundant and continuous fuel, allowing the fire to spread over larger areas.
   
2. Natural Factors: Lightning-caused fires are often associated with thunderstorms, which can occur in regions with favorable atmospheric conditions for fire spread, such as dry and windy conditions. Lightning strikes during these conditions can ignite wildfires that are more likely to spread rapidly and cover extensive areas.
   
3. Human Intervention: As mentioned earlier, many fires related to human activity begin in populated areas. As a result, there is often a prompt response and intervention to limit the spread of the fire. Well-established firefighting infrastructure and strategies, such as firebreaks, water sources, and aerial resources, also significantly reduce the impact of human-caused fires.
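
One way to check this reasoning against the data is to compare the average area burned per fire. The sketch below uses only the Alberta 1990 figures shown in the dataset previews earlier (area in hectares from `areabygroup`, fire counts from `numfiresbygroup`). It covers a single province and year, so treat it as an illustration rather than a general result. The variable names are hypothetical.

```python
# Alberta, 1990: values copied from the dataset previews earlier in the notebook
area_burned_ha = {'Human activity': 2394.0, 'Lightning': 55483.0}
number_of_fires = {'Human activity': 379.0, 'Lightning': 971.0}

# Average hectares burned per fire for each cause
average_ha_per_fire = {
    cause: area_burned_ha[cause] / number_of_fires[cause]
    for cause in area_burned_ha
}

for cause, average in average_ha_per_fire.items():
    print(f"{cause}: {average:.1f} ha per fire")
```

For this one example, lightning-caused fires burned roughly 57 hectares per fire, compared with roughly 6 hectares per fire for human activity.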

Due to the sheer scale of wildfires started by `Human Activity` and `Lightning`, let's look only at `Prescribed Burns` and `Reburns` to reveal any noticeable trends.

In [110]:
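# Select the prescribed burn and reburn rows by name rather than by position
burn_causes = ['Prescribed burn', 'Reburn']
burn_reburn = areabygroup_causes[areabygroup_causes['Cause'].isin(burn_causes)]
num_burn_reburn = numfiresbygroup_causes[numfiresbygroup_causes['Cause'].isin(burn_causes)]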

burns = make_subplots(rows=2, cols=1, subplot_titles=['Area Burned from Wildfires for Prescribed Burns and Reburns', 'Number of Fires caused by Prescribed Burns and Reburns'])

for i in range(1990,2022):

    burns.add_bar(x=burn_reburn['Cause'], y=burn_reburn[i], name=i, marker=dict(color=color_map[i]), row=1, col=1)
    burns.add_bar(x=num_burn_reburn['Cause'], y=num_burn_reburn[i], name=i, marker=dict(color=color_map[i]), row=2, col=1)

burns.update_layout(height=800, width=1780)
burns.update_xaxes(title_text='Cause', row=1, col=1)
burns.update_xaxes(title_text='Cause', row=2, col=1)
burns.update_yaxes(title_text='Area Burned (hectares)', row=1, col=1)
burns.update_yaxes(title_text='Number of Fires', row=2, col=1).show()

In general, there seem to be consistent trends between the number of fires started by `Prescribed Burns` and `Reburns` and the area burned. However, it should be noted that there do not seem to be any recorded cases of area burned for `Reburns`.

Note: One possible reason why there are no recorded areas burned for reburns is that reburns are uncommon. When they do occur, the area burned may not have been measured, in which case the recorded value is left as **None**/**NaN**.
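
A related detail worth knowing: by default, the pandas `sum()` method skips missing values, so a group whose values are all `NaN` adds up to `0` rather than `NaN`. This may be why `Reburn` appears with zero area in the chart. Here is a minimal sketch with a made-up two-row table (the names `toy`, `default_sum`, and `strict_sum` are illustrative):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'Cause': ['Reburn', 'Reburn'],
    2020: [np.nan, np.nan],  # no area recorded in either row
})

# Default behaviour: NaN values are skipped, so the total is 0.0
default_sum = toy.groupby('Cause')[2020].sum()

# min_count=1 requires at least one real value, otherwise the result stays NaN
strict_sum = toy.groupby('Cause')[2020].sum(min_count=1)

print(default_sum['Reburn'], strict_sum['Reburn'])
```

Using `min_count=1` is one way to tell "nothing was recorded" apart from "zero hectares burned".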

Next, let's visualize which causes are most frequent in particular provinces and identify any reasons why this could be.

In [111]:
# Sum the year columns by label (1990 through 2021) rather than by position
areabygroup['Sum'] = areabygroup.loc[:, 1990:2021].sum(axis=1)
summationareabygrp = go.Figure()

unique_causes = areabygroup['Cause'].unique()
colors = ['blue', 'red', 'green', 'orange', 'purple']  

cause_color_map = {cause: color for cause, color in zip(unique_causes, colors)}

# Create a list of colors based on the cause of each bar
bar_colors = [cause_color_map[cause] for cause in areabygroup['Cause']]
summationareabygrp.add_bar(x=[areabygroup['Cause'],areabygroup['Jurisdiction']], y=areabygroup['Sum'],marker=dict(color=bar_colors))
summationareabygrp.update_layout(yaxis_title="Area Burned (hectares)")

In [112]:
provincial_fires = areabygroup.groupby(['Jurisdiction', 'Cause']).sum(numeric_only=True).reset_index()

# Change this value to change the year
# For example, changing 2020 to 2019 will show Area Burned based on Causes in 2019
year_to_check = 2020

# Build the figure first, then display it, so the variable keeps the figure object
provincial_fires_fig = px.bar(provincial_fires, x='Jurisdiction', y=year_to_check, color='Cause', title='Area Burned from Wildfires by Province')
provincial_fires_fig.update_layout(yaxis_title="Area Burned (hectares)")
provincial_fires_fig.show()

In [113]:
# Find the maximum value for each year
max_values = areabyclass.loc[:, 1990:2021].max()

# Find the corresponding rows with the maximum values
corresponding_rows = areabyclass.loc[areabyclass.isin(max_values.values).any(axis=1)]
corresponding_rows = corresponding_rows.reset_index()
corresponding_rows = corresponding_rows.melt(id_vars='index', value_vars=list(max_values.index),
                                             var_name='Year', value_name='MaxValue')

# Merge with areabyclass to get Jurisdiction and Fire size class
corresponding_rows = pd.merge(corresponding_rows, areabyclass[['Jurisdiction', 'Fire size class']], left_on='index', right_index=True)

corresponding_rows = corresponding_rows[['Year', 'MaxValue', 'Jurisdiction', 'Fire size class']]
corresponding_rows_1 = corresponding_rows.query("`Fire size class` == 'Up to 0.1 ha' | `Fire size class` == '0.11 - 1.0 ha' | `Fire size class` == '1.1 - 10 ha'").reset_index()
corresponding_rows_2 = corresponding_rows.query("`Fire size class` == '10.1 - 100 ha' | `Fire size class` == '100.1 - 1 000 ha' | `Fire size class` == '1000.1 - 10 000 ha'").reset_index()
corresponding_rows_3 = corresponding_rows.query("`Fire size class` == '10 000.1 - 100 000 ha' | `Fire size class` == 'Over 100 000 ha'").reset_index()
all_dataframes = [corresponding_rows_1, corresponding_rows_2, corresponding_rows_3]

In [114]:
for i in all_dataframes:
    all_figs = px.scatter(i, x='Year', y='MaxValue',
                    hover_data={'Year': True, 'Jurisdiction': True, 'Fire size class': True}, color='Fire size class')

    all_figs.update_layout(xaxis_title='Year', yaxis_title='Highest Value (in Hectares)').show() 

In [115]:
# Get unique jurisdictions excluding "Parks Canada"
jurisdictions = numfiresbymonth[numfiresbymonth['Jurisdiction'] != 'Parks Canada']['Jurisdiction'].unique()

# One subplot row per jurisdiction, so the layout adapts to the data
fig = make_subplots(rows=len(jurisdictions), cols=1, subplot_titles=list(jurisdictions))

for i, jurisdiction in enumerate(jurisdictions, start=1):
    # Filter data for the current jurisdiction and drop any "Unspecified" month rows
    jurisdiction_data = numfiresbymonth[numfiresbymonth['Jurisdiction'] == jurisdiction]
    jurisdiction_data_filtered = jurisdiction_data[jurisdiction_data['Month'] != "Unspecified"]

    # Add one bar series per year to this jurisdiction's subplot
    for year in range(1990, 2022):
        fig.add_trace(
            go.Bar(x=jurisdiction_data_filtered['Month'], y=jurisdiction_data_filtered[year], name=str(year)),
            row=i, col=1
        )

fig.update_layout(height=1800, width=1780, showlegend=False, title_text="Trends of Number of Wildfires each Month from 1990-2021 by Province").show()