![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

# Callysto’s Weekly Data Visualization

## Wildfires

### Recommended Grade levels: 5-9
<br>

### Instructions

Click "Cell" and select "Run All".

This will import the data and run all the code, so you can see this week's data visualization. Scroll back to the top after you’ve run the cells.

![instructions](https://github.com/callysto/data-viz-of-the-week/blob/main/images/instructions.png?raw=true)

**You don't need to do any coding to view the visualizations**.

The plots generated in this notebook are interactive. You can hover over and click on elements to see more information. 

Email contact@callysto.ca if you experience issues.

### About this Notebook

Callysto's Weekly Data Visualization is a learning resource that aims to develop data literacy skills. We provide Grades 5-12 teachers and students with a data visualization, like a graph, to interpret. This companion resource walks learners through how the data visualization is created and interpreted by a data scientist. 

The steps of the data analysis process are listed below and applied to each weekly topic.

1. Question - What are we trying to answer?
2. Gather - Find the data source(s) you will need. 
3. Organize - Arrange the data, so that you can easily explore it. 
4. Explore - Examine the data to look for evidence to answer the question. This includes creating visualizations. 
5. Interpret - Describe what's happening in the data visualization. 
6. Communicate - Explain how the evidence answers the question. 

# Question

Recently in Canada, wildfires have become a *hot* topic. They are significant natural disasters that threaten ecosystems, human lives, and the environment. Canada, known for its vast forests and diverse ecosystems, has experienced its fair share of devastating wildfires in recent years. These fires not only have a profound impact on Canada but also affect nearby countries and have broader implications for the climate.

Wildfires in Canada have a significant impact on the environment, particularly in heavily forested regions with little rain. The primary damage done by fires is the loss of trees, which are essential carbon sinks that absorb greenhouse gases and help mitigate climate change. When these forests burn, they can release a substantial amount of carbon dioxide back into the atmosphere, contributing to global warming.

The consequences of Canadian wildfires reach beyond the country's borders, affecting nearby countries such as the United States, and have made global news. Because of the raging wildfires in northwest Quebec, New York City recorded some of its worst air quality to date, equivalent to smoking 25-30 cigarettes. This smoke continued to travel across the United States, bringing poor air quality to other cities.

### Goal

Our goal is to look at the different causes of wildfires in Canada and discover trends, such as where wildfires have historically started and the main factors that cause them. With this information, readers can build a better-informed perspective on the main causes of wildfires in Canada.

The datasets used in this notebook are taken from the [National Forestry Database](http://nfdp.ccfm.org/en/data/fires.php#tab322) and contain information on Canadian wildfire events from 1990 to 2021.

# Gather

### Code: 

Run the code cells below to import the libraries we need for this project. Libraries are pre-made code that make it easier to analyze our data.

In [None]:
import pandas as pd
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go
import warnings
import matplotlib.pyplot as plt
# Hide FutureWarning messages so the output stays readable
warnings.simplefilter(action='ignore', category=FutureWarning)
print("Libraries imported")

### Data

In this notebook, we will use 4 different datasets from the [National Forestry Database](http://nfdp.ccfm.org/en/data/fires.php#tab322). To measure the main causes of wildfires, we'll use `areabygroup` and `numfiresbygroup`, which record, respectively, the area burned (in hectares) and the number of fires, broken down by *cause*.

The datasets `areabyclass` and `numfiresbymonth` are used later in this notebook: `areabyclass` records the area burned by fire size, and `numfiresbymonth` records the number of fires that started each month.

### Import the data

In [None]:
# Import data
areabygroup = pd.read_excel('https://raw.githubusercontent.com/callysto/data-files/main/data-viz-of-the-week/wildfires/areaburnedbygroup.xlsx', skiprows=1)
areabyclass = pd.read_excel('https://raw.githubusercontent.com/callysto/data-files/main/data-viz-of-the-week/wildfires/areaburnedfireclass.xlsx', skiprows=1)
numfiresbygroup = pd.read_excel('https://raw.githubusercontent.com/callysto/data-files/main/data-viz-of-the-week/wildfires/numfiresbygroup.xlsx')
numfiresbymonth = pd.read_excel('https://raw.githubusercontent.com/callysto/data-files/main/data-viz-of-the-week/wildfires/numfiresbymonth.xlsx', skiprows=1)

In [None]:
display(areabygroup.head())
display(areabyclass.head())
display(numfiresbygroup.head())
display(numfiresbymonth.head())

### Comment on the data
Notice that our datasets share several columns. The first column, `Jurisdiction`, refers to the province or territory where the wildfire occurred (the data also lists Parks Canada as a separate jurisdiction). The column `Data Qualifier` is not needed for our analysis and can be ignored. The columns `1990`-`2021` hold either the area burned (in hectares) or the number of wildfires for that year.

The columns that differ between datasets are `Cause`, which describes how the wildfire started; `Fire size class`, which gives the size of the wildfire; and `Month`, which gives the month in which the wildfires started.
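This year-per-column layout is often called *wide* format. A minimal sketch with made-up numbers (not values from the National Forestry Database) shows how the shared columns and the integer-labelled year columns fit together, and how pandas can total the year columns for each row:

```python
import pandas as pd

# Hypothetical toy data in the same wide layout: one row per
# jurisdiction and cause, one column per year (integer column labels)
toy = pd.DataFrame({
    "Jurisdiction": ["Alberta", "Alberta", "Yukon"],
    "Cause": ["Lightning", "Human Activity", "Lightning"],
    1990: [100.0, 20.0, 50.0],
    1991: [300.0, 10.0, None],  # None is stored as NaN (a missing value)
})

# Pick out the year columns by checking which labels are integers
year_cols = [col for col in toy.columns if isinstance(col, int)]

# Total across the years for each row; NaN is skipped by default
toy["Total"] = toy[year_cols].sum(axis=1)
print(toy[["Jurisdiction", "Cause", "Total"]])
```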

# Organize
Let's organize our data so it contains only the information we need. For the `areabygroup` and `numfiresbygroup` datasets, let's group the rows by `Cause` and add them up, so each cause has one row totalled across all jurisdictions. This makes it easy to identify the most common causes of wildfires. For the `areabyclass` dataset, let's remove any rows whose fire size class is *Unspecified*, since those fires can't be placed in a size category.

Let's also set up a colour scheme that most of our visualizations will use throughout the notebook.

Note: We'll keep reorganizing our data throughout the notebook to suit each visualization.

In [None]:
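# One row per cause: add up the year columns across all jurisdictions
# (numeric_only=True skips text columns such as Jurisdiction)
areabygroup_causes = areabygroup.groupby(['Cause']).sum(numeric_only=True).reset_index()
numfiresbygroup_causes = numfiresbygroup.groupby(['Cause']).sum(numeric_only=True).reset_index()
# Drop fires whose size class was not recorded
areabyclass = areabyclass[areabyclass['Fire size class'] != 'Unspecified']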
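The grouping step is easier to see on a tiny made-up table. `groupby('Cause').sum(numeric_only=True)` adds up every numeric column within each cause, so rows from different jurisdictions collapse into one row per cause, while text columns such as `Jurisdiction` are dropped because they can't be added. The numbers below are illustrative only:

```python
import pandas as pd

# Hypothetical toy data: two jurisdictions, two causes, two years
toy = pd.DataFrame({
    "Jurisdiction": ["Alberta", "Alberta", "Ontario", "Ontario"],
    "Cause": ["Lightning", "Human Activity", "Lightning", "Human Activity"],
    1990: [100, 20, 40, 5],
    1991: [300, 10, 60, 15],
})

# One row per cause; year columns are summed, the text column is dropped.
# groupby sorts the groups, so causes come out in alphabetical order.
by_cause = toy.groupby("Cause").sum(numeric_only=True).reset_index()
print(by_cause)
```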

In [None]:
years = list(range(1990, 2022))

# Sample the red-yellow-blue diverging colour map at one colour per year
# (plt.cm.get_cmap was removed in Matplotlib 3.9; plt.get_cmap still works)
color_scheme = plt.get_cmap('RdYlBu', len(years))

color_map = {}

for i, year in enumerate(years):
    # color is an (r, g, b, a) tuple of floats between 0 and 1
    color = color_scheme(i)
    hex_color = '#{:02x}{:02x}{:02x}'.format(int(color[0]*255), int(color[1]*255), int(color[2]*255))
    color_map[year] = hex_color
print(color_map)
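The hex conversion in the loop above scales each colour channel from the 0-1 range that Matplotlib uses to 0-255 and writes it as two hexadecimal digits. Here is a minimal standalone version of that step; the function name `rgba_to_hex` is our own, not part of Matplotlib. Matplotlib also provides `matplotlib.colors.to_hex`, which does a similar conversion but rounds instead of truncating, so a channel can occasionally differ by one.

```python
def rgba_to_hex(rgba):
    """Convert an (r, g, b, a) tuple of floats in [0, 1] to a '#rrggbb' string.

    Mirrors the notebook's approach: each channel is scaled to 0-255 and
    truncated with int(); the alpha channel is ignored.
    """
    r, g, b = (int(channel * 255) for channel in rgba[:3])
    return "#{:02x}{:02x}{:02x}".format(r, g, b)

print(rgba_to_hex((1.0, 0.0, 0.0, 1.0)))  # → #ff0000 (pure red)
print(rgba_to_hex((0.5, 0.5, 0.5, 1.0)))  # → #7f7f7f (127.5 is truncated to 127)
```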

# Explore

### Visualizations on Causations of Wildfires

With our new datasets `areabygroup_causes` and `numfiresbygroup_causes`, let's visualize the trends from 1990-2021 and see whether there are any clear differences between the two figures.

Each `Cause` category is described in the table below.

| **Cause** | **Description** |
| :- |:------------- | 
| Human Activity | Wildfires caused by human activity refer to fires ignited as a result of human actions. These can include accidental causes like discarded cigarettes, unattended campfires, sparks from equipment or machinery, power line failures, or intentional acts of arson. Human-caused wildfires can occur in both urban and rural areas and can have various impacts depending on the location, climate, and fuel availability. |
| Lightning | Lightning-caused wildfires originate from natural electrical discharges during dry thunderstorms. Lightning strikes can ignite fires when they come into contact with dry vegetation, especially in regions with dry and windy conditions. These wildfires are a natural occurrence and can be prevalent in remote or forested areas where lightning strikes are more likely to occur. |
| Prescribed Burn | A prescribed burn, also known as a controlled burn or prescribed fire, is a planned fire intentionally set by fire management authorities. These burns are carefully controlled and conducted under specific conditions to reduce the risk of uncontrolled wildfires. Prescribed burns are often used for ecological purposes, such as promoting forest health, reducing fuel loads, managing vegetation, or restoring natural fire regimes. |
| Reburn | Reburn refers to a situation where a previously burned area reignites due to residual heat or smoldering embers from a previous wildfire. Even after a fire has been extinguished, heat can remain trapped within logs, root systems, or organic material. Under certain conditions, such as dry and windy weather, these residual heat sources can reignite, leading to a new fire within the previously burned area. Reburns can be challenging to detect and control, as they can occur unexpectedly and often in areas where firefighting resources have been recently deployed.  |
| Unspecified | Causes where the source of wildfire is unknown or unspecified. Mainly due to a lack of evidence. |

In [None]:
cause_figs = make_subplots(rows=2, cols=1, subplot_titles=['Area Burned from Wildfires by Cause', 'Number of Fires by Cause'])

for i in range(1990,2022):

    cause_figs.add_bar(x=areabygroup_causes['Cause'], y=areabygroup_causes[i], name=i, marker=dict(color=color_map[i]), row=1, col=1)
    cause_figs.add_bar(x=numfiresbygroup_causes['Cause'], y=numfiresbygroup_causes[i], name=i, marker=dict(color=color_map[i]), row=2, col=1)
    
cause_figs.update_layout(showlegend=False, height=800)
cause_figs.update_xaxes(title_text='Cause', row=1, col=1)
cause_figs.update_xaxes(title_text='Cause', row=2, col=1)
cause_figs.update_yaxes(title_text='Area Burned (hectares)', row=1, col=1)
cause_figs.update_yaxes(title_text='Number of Fires', row=2, col=1).show()

# Interpret

### Explore `Prescribed Burn` and `Reburn` Causes

Starting with the figure *Area Burned from Wildfires by Cause*, `Lightning` appears to be the main cause of area burned by wildfires. In the *Number of Fires by Cause* figure, `Human Activity` and `Lightning` are the clear front-runners in the number of fires started. Something else stands out: although `Human Activity` starts many fires, those fires burn far less area. Finally, because the values for `Human Activity` and `Lightning` are so large, the bars for `Prescribed Burn` and `Reburn` become hard to see.

Some possible reasons why lightning-caused fires burn more hectares than human-caused fires are:
1. Location: Where a wildfire begins influences how it behaves and how much area it burns. Human-caused fires often occur in more populated areas, where vegetation may be managed or more fragmented, leading to smaller, localized fires. In contrast, lightning-caused fires can occur in remote and inaccessible areas with abundant and continuous fuel, allowing the fire to spread over larger areas.
   
2. Natural Factors: Lightning-caused fires are often associated with thunderstorms, which can occur in regions with favorable atmospheric conditions for fire spread, such as dry and windy conditions. Lightning strikes during these conditions can ignite wildfires that are more likely to spread rapidly and cover extensive areas.
   
3. Human Intervention: As mentioned earlier, many human-caused fires begin in populated areas, so there is often a prompt response to stop the fire from spreading. Well-established firefighting infrastructure and strategies, such as firebreaks, water sources, and aerial resources, also greatly reduce the impact of human-caused fires.
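One way to test the idea that human-caused fires tend to stay small is to divide the area burned by the number of fires for each cause, giving the average area per fire. The sketch below uses made-up totals, not the real dataset values. With the notebook's data, you could divide a year column of `areabygroup_causes` by the same column of `numfiresbygroup_causes`, after setting `Cause` as the index of both tables so the rows line up by label.

```python
import pandas as pd

# Hypothetical totals for one year (illustrative numbers only)
area = pd.Series({"Human Activity": 50_000.0, "Lightning": 900_000.0})  # hectares
fires = pd.Series({"Human Activity": 2_500, "Lightning": 1_500})         # number of fires

# Average hectares burned per fire; pandas matches the rows by cause label
avg_area_per_fire = area / fires
print(avg_area_per_fire)
```

In this made-up example, human activity starts more fires, but each one burns far less area on average (20 ha versus 600 ha).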

Because of the sheer scale of wildfires started by `Human Activity` and `Lightning`, let's look only at `Prescribed Burn` and `Reburn` to see whether there are any noticeable trends.

In [None]:
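# Causes are sorted alphabetically (Human Activity, Lightning, Prescribed Burn, Reburn, Unspecified),
# so dropping the last row and keeping the last two leaves Prescribed Burn and Reburn
burn_reburn = areabygroup_causes.iloc[:-1, :].tail(2)
num_burn_reburn = numfiresbygroup_causes.iloc[:-1, :].tail(2)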
burns = make_subplots(rows=2, cols=1, subplot_titles=['Area Burned from Wildfires for Prescribed Burns and Reburns', 'Number of Fires caused by Prescribed Burns and Reburns'])

for i in range(1990,2022):

    burns.add_bar(x=burn_reburn['Cause'], y=burn_reburn[i], name=i, marker=dict(color=color_map[i]), row=1, col=1)
    burns.add_bar(x=num_burn_reburn['Cause'], y=num_burn_reburn[i], name=i, marker=dict(color=color_map[i]), row=2, col=1)

burns.update_layout(showlegend=False, height=800)
burns.update_xaxes(title_text='Cause', row=1, col=1)
burns.update_xaxes(title_text='Cause', row=2, col=1)
burns.update_yaxes(title_text='Area Burned (hectares)', row=1, col=1)
burns.update_yaxes(title_text='Number of Fires', row=2, col=1).show()
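The cell above picks out the two causes by position, which relies on `groupby` sorting the causes alphabetically and on `Unspecified` being the last row. An alternative is to select the rows by their label instead. The sketch below uses made-up data and assumes the cause names match the table above exactly (`'Prescribed Burn'` and `'Reburn'`):

```python
import pandas as pd

# Hypothetical summed table with one row per cause
causes = pd.DataFrame({
    "Cause": ["Human Activity", "Lightning", "Prescribed Burn", "Reburn", "Unspecified"],
    2020: [1000, 5000, 30, 2, 100],
})

# Keep only the rows whose Cause is in the list, regardless of row order
wanted = ["Prescribed Burn", "Reburn"]
burn_reburn_by_label = causes[causes["Cause"].isin(wanted)]
print(burn_reburn_by_label)
```

Selecting by label keeps working even if a new cause is added or the row order changes.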

In general, the number of fires started by `Prescribed Burn` and `Reburn` and the area they burned seem to follow consistent trends. However, there do not seem to be any recorded cases of area burned for `Reburn`.

Note: One possible reason for the missing area values is that reburns are uncommon. When they do occur, the people recording them may not have the tools to measure how many hectares burned, so they enter **None**/**NaN** as the recorded value.
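Missing values can also hide in the plots themselves. When pandas adds up a column, it skips `NaN` entries by default, so a cause with *no* recorded area sums to `0` rather than showing as missing. Passing `min_count=1` keeps the result as `NaN` when there is nothing to add, which makes it easier to tell "no data" apart from "zero hectares". The values below are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical area values for Reburn across three years, all missing
reburn_area = pd.Series([np.nan, np.nan, np.nan])

print(reburn_area.sum())             # 0.0: NaN values are skipped
print(reburn_area.sum(min_count=1))  # nan: there was no real value to add
```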

### Separating Provinces: Exploring Wildfire Trends by Cause

Next, let's visualize which causes are most frequent in particular provinces and consider possible reasons why.

In [None]:
# Total each row across the year columns (column positions 4 to 34)
areabygroup['Sum'] = areabygroup.iloc[:, 4:35].sum(axis=1)
numfiresbygroup['Sum'] = numfiresbygroup.iloc[:, 4:35].sum(axis=1)

# Sunburst charts: the inner ring is Jurisdiction, the outer ring is Cause
province_area = px.sunburst(areabygroup, path=['Jurisdiction', 'Cause'], values='Sum')
province_area.update_layout(title='Area Burned of each Province by Cause').show()
province_num = px.sunburst(numfiresbygroup, path=['Jurisdiction', 'Cause'], values='Sum')
province_num.update_layout(title='Number of Fires of each Province by Cause').show()
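In a sunburst chart built with `path=['Jurisdiction', 'Cause']`, each outer slice is one jurisdiction-cause row, and Plotly Express sizes each inner slice by adding up the values of the slices beneath it. The same totals, and the fraction of its parent slice that each outer slice covers, can be computed directly with pandas. This sketch uses made-up numbers:

```python
import pandas as pd

# Hypothetical per-row totals (like the notebook's 'Sum' column)
rows = pd.DataFrame({
    "Jurisdiction": ["Alberta", "Alberta", "Yukon", "Yukon"],
    "Cause": ["Lightning", "Human Activity", "Lightning", "Human Activity"],
    "Sum": [800, 200, 450, 50],
})

# Inner ring: total per jurisdiction
inner = rows.groupby("Jurisdiction")["Sum"].sum()

# Outer ring: each cause's share of its own jurisdiction's total
rows["Share of jurisdiction"] = rows["Sum"] / rows["Jurisdiction"].map(inner)
print(inner)
print(rows)
```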

As in the earlier plots, `Human Activity` and `Lightning` are the main causes of wildfires: lightning accounts for more of the total area burned, while human activity accounts for a larger share of fires started than of area burned.

Provinces such as *Alberta*, *British Columbia*, and *Ontario* stand out with a high number of fires started compared to provinces such as *Prince Edward Island*. One likely reason is population: Alberta, B.C., and Ontario are among the most populous provinces in Canada, with larger urban centres and more human activity. More people, along with more infrastructure such as roads, power lines, and recreational areas, can make human-caused fires more likely.

An interesting observation in the `Lightning` cause is that although the *Northwest Territories* and *Saskatchewan* have an average number of fires started, they have the highest and second-highest area burned, respectively. One possible reason is the landscape of these regions. Saskatchewan and the Northwest Territories have extensive areas of grasslands, boreal forests, and tundra, which are highly flammable during dry periods. These regions have a significant amount of fuel available to burn, leading to larger fires once ignited. In contrast, B.C. and Alberta have a more diverse landscape, including mountainous terrain and a mix of forest types, which can act as natural barriers and limit the spread of wildfires. Another significant factor is climate and weather. These regions often have dry summers, and the combination of low rainfall, high temperatures, and prolonged drought increases the potential for fires to ignite and spread rapidly.

Because some values are so much larger than others, comparisons can be hard to see in these figures. The figure below shows the causes of area burned in each province for a single year. Change the value of `year_to_check` to the year you are interested in.

Note: The dataset only covers 1990-2021, and the *2021* data does not identify the causes of wildfires.

In [None]:
provincial_fires = areabygroup.groupby(['Jurisdiction', 'Cause']).sum(numeric_only=True).reset_index()

# Change this value to change the year (Note: this data covers 1990-2021)
# For example, changing 2020 to 2019 will show Area Burned based on Causes in 2019
year_to_check = 2020

if year_to_check < 1990 or year_to_check > 2021:
    print("Please enter a valid year from 1990-2021")
else:
    provincial_fires_fig = px.bar(provincial_fires, x='Jurisdiction', y=year_to_check, color='Cause',
                                  title=f'Area Burned from Wildfires based on Province in {year_to_check}')
    provincial_fires_fig.update_layout(yaxis_title="Area Burned (hectares)").show()
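The range check in the cell above can be packaged as a small function so the same rule is applied everywhere. This is a sketch: the function name `is_valid_year` and its default bounds (taken from the 1990-2021 range of the dataset) are our own choices.

```python
def is_valid_year(year, first=1990, last=2021):
    """Return True if `year` is an integer within the dataset's range."""
    return isinstance(year, int) and first <= year <= last

for candidate in (1989, 1990, 2020, 2022):
    print(candidate, is_valid_year(candidate))
```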

### Exploring Fire Size Differences

Now that we have a clear sense of the main causes of wildfires in Canada, including causes that are more common in certain provinces, let's visualize how much area is burned by fires of different *sizes*. To avoid an overwhelming number of data points, we'll take a subset of the data: the rows that hold the largest area burned for at least one year.

In [None]:
# Find the maximum value for each year
max_values = areabyclass.loc[:, 1990:2021].max()

# Keep every row that holds the maximum of at least one year
corresponding_rows = areabyclass.loc[areabyclass.isin(max_values.values).any(axis=1)]
corresponding_rows = corresponding_rows.reset_index()
# Reshape to long format: one row per (original row, year)
corresponding_rows = corresponding_rows.melt(id_vars='index', value_vars=list(max_values.index),
                                             var_name='Year', value_name='MaxValue')

# Merge with areabyclass to get Jurisdiction and Fire size class
corresponding_rows = pd.merge(corresponding_rows, areabyclass[['Jurisdiction', 'Fire size class']], left_on='index', right_index=True)
corresponding_rows = corresponding_rows[['Year', 'MaxValue', 'Jurisdiction', 'Fire size class']]

# Split into small, medium, and large fire size classes (one plot each)
small = ['Up to 0.1 ha', '0.11 - 1.0 ha', '1.1 - 10 ha']
medium = ['10.1 - 100 ha', '100.1 - 1 000 ha', '1000.1 - 10 000 ha']
large = ['10 000.1 - 100 000 ha', 'Over 100 000 ha']
corresponding_rows_1 = corresponding_rows[corresponding_rows['Fire size class'].isin(small)].reset_index()
corresponding_rows_2 = corresponding_rows[corresponding_rows['Fire size class'].isin(medium)].reset_index()
corresponding_rows_3 = corresponding_rows[corresponding_rows['Fire size class'].isin(large)].reset_index()
all_dataframes = [corresponding_rows_1, corresponding_rows_2, corresponding_rows_3]
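For comparison, pandas can find the row holding each year's largest value directly with `DataFrame.idxmax`, which returns the index label of the maximum in each column. This sketch uses made-up numbers. It is a simpler variant, not what the cell above does: that cell keeps *every* year of any row that was a yearly maximum.

```python
import pandas as pd

# Hypothetical area burned (hectares) by jurisdiction and fire size class
toy = pd.DataFrame({
    "Jurisdiction": ["Alberta", "Yukon", "Ontario"],
    "Fire size class": ["Over 100 000 ha", "Over 100 000 ha", "10.1 - 100 ha"],
    2019: [800_000, 120_000, 3_000],
    2020: [5_000, 60_000, 9_000],
})

years = [2019, 2020]
# For each year column, the index label of the row with the largest value
max_row_labels = toy[years].idxmax()

# Build one row per year with the matching value, jurisdiction, and class
yearly_max = pd.DataFrame({
    "Year": years,
    "MaxValue": [toy.loc[max_row_labels[y], y] for y in years],
    "Jurisdiction": [toy.loc[max_row_labels[y], "Jurisdiction"] for y in years],
    "Fire size class": [toy.loc[max_row_labels[y], "Fire size class"] for y in years],
})
print(yearly_max)
```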

In [None]:
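# One scatter plot per group of fire size classes (figures 1 to 3)
for n, df in enumerate(all_dataframes, start=1):
    fig_size = px.scatter(df, x='Year', y='MaxValue',
                          hover_data={'Year': True, 'Jurisdiction': True, 'Fire size class': True},
                          color='Fire size class', title=f'Figure {n}: Area Burned by Fire Size Class')

    fig_size.update_layout(xaxis_title='Year', yaxis_title='Highest Value (in Hectares)').show()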
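To get a feel for the largest value discussed below (2,447,000 ha in the Northwest Territories), we can convert hectares into football fields. This sketch assumes an American football field of 120 by 53⅓ yards, end zones included (about 0.535 ha); a Canadian field is larger, so the count would be smaller.

```python
# Unit conversions (exact by definition)
METRES_PER_YARD = 0.9144
SQ_METRES_PER_HECTARE = 10_000

# Assumed field size: American football, end zones included
field_length_yd = 120
field_width_yd = 160 / 3  # 53 1/3 yards

field_area_m2 = (field_length_yd * METRES_PER_YARD) * (field_width_yd * METRES_PER_YARD)
field_area_ha = field_area_m2 / SQ_METRES_PER_HECTARE

area_burned_ha = 2_447_000  # largest value described in this notebook
fields = area_burned_ha / field_area_ha
print(f"One field is about {field_area_ha:.3f} ha")
print(f"{area_burned_ha:,} ha is about {fields / 1e6:.1f} million fields")
```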

Looking at the figures above, one clear trend is that the larger the fire size class, the more area is burned. The largest point, in *figure 3*, comes from the Northwest Territories, with 2,447,000 hectares (about 2.4 million hectares) of land burned. That's roughly 4.6 million American football fields, a huge area to picture. At the other extreme, and perhaps surprisingly, some wildfires were recorded with no land burned at all, even though they were still counted as wildfires. One possible reason is that these fires were put out almost immediately after they started, so they were entered into the fire database even though almost no land burned.

### Provincial Trends: Separating Differences by Months

In our final visualization, let's look at which months wildfires are most and least likely to start in.

In [None]:
# Get unique jurisdictions excluding "Parks Canada"
jurisdictions = numfiresbymonth[numfiresbymonth['Jurisdiction'] != 'Parks Canada']['Jurisdiction'].unique()

fig = make_subplots(rows=12, cols=1, subplot_titles=jurisdictions)

for i, jurisdiction in enumerate(jurisdictions, start=1):
    # Filter data for the current jurisdiction
    jurisdiction_data = numfiresbymonth[numfiresbymonth['Jurisdiction'] == jurisdiction]
    
    for year in range(1990, 2022):
        # Check if "Unspecified" month exists in the data
        if "Unspecified" in jurisdiction_data['Month'].values:
            jurisdiction_data_filtered = jurisdiction_data[jurisdiction_data['Month'] != "Unspecified"]
        else:
            jurisdiction_data_filtered = jurisdiction_data
        
        fig.add_trace(
            go.Bar(x=jurisdiction_data_filtered['Month'], y=jurisdiction_data_filtered[year], name=str(year)),
            row=i, col=1
        )

fig.update_layout(height=1800, width=1200, showlegend=False, title_text="Trends of Number of Wildfires each Month from 1990-2021 by Province").show()

Starting with general trends found in many provinces, it appears that most wildfires start during the months of **April-September**, peaking in August. This makes sense: during the spring and summer months, many provinces experience higher temperatures, lower humidity, and less precipitation. These conditions create a drier environment, which increases the risk of wildfires. Dry vegetation, such as grasses, shrubs, and trees, becomes more flammable and easier to ignite from various sources. During fall and winter, cooler temperatures and higher humidity keep more moisture in vegetation, making it less prone to ignition and slower to burn. Furthermore, as noted before, *lightning* is the leading cause of area burned in Canada, and in certain provinces thunderstorms are more common during the summer months, leading to more lightning strikes that can ignite fires.
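The claim that fires peak in a particular month can be checked by adding up each month across all years and asking pandas which month has the largest total with `idxmax`. Here is a sketch with made-up counts for a single hypothetical jurisdiction:

```python
import pandas as pd

# Hypothetical monthly fire counts for one jurisdiction over three years
monthly = pd.DataFrame({
    "Month": ["April", "May", "June", "July", "August", "September"],
    2019: [10, 40, 60, 90, 110, 30],
    2020: [15, 35, 70, 95, 100, 25],
    2021: [5, 50, 65, 120, 105, 20],
}).set_index("Month")

# Total fires per month across all years, then the month with the most
totals = monthly.sum(axis=1)
peak_month = totals.idxmax()
print(totals)
print("Peak month:", peak_month)
```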

Neighbouring provinces and territories with similar landscapes, such as *Alberta* and *Saskatchewan*, or the *Northwest Territories* and *Yukon*, show similar wildfire trends. This makes sense: regions that are close to each other tend to share environmental conditions, so we would expect their wildfire patterns to be alike.
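One rough way to put a number on "similar wildfire patterns" is to turn each region's monthly counts into shares of its own total and then compute the correlation between regions' monthly shares. This is a sketch with made-up numbers for hypothetical regions, and Pearson correlation is only one of several reasonable similarity measures:

```python
import pandas as pd

months = ["April", "May", "June", "July", "August", "September"]

# Hypothetical total fires per month for three regions
counts = pd.DataFrame({
    "Region A": [20, 60, 120, 200, 180, 40],
    "Region B": [10, 35, 70, 110, 95, 20],
    "Region C": [90, 80, 30, 20, 15, 10],
}, index=months)

# Convert counts to each region's share of its own total
shares = counts / counts.sum()

# Pearson correlation between every pair of regions' monthly shares
similarity = shares.corr()
print(similarity.round(2))
```

Regions A and B peak in the same months and come out strongly correlated, while Region C, which peaks in spring, is negatively correlated with both.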

Among the smaller differences in this visualization, some Maritime provinces, such as *P.E.I.* and *New Brunswick*, peaked earlier, closer to **April-May**. The Maritime provinces also recorded very few wildfires overall. This is likely related to their cooler and more humid climate compared to other regions in Canada. The proximity to the Atlantic Ocean also influences their weather, resulting in milder summers with more precipitation.

# Communicate

Below are some writing prompts to help you reflect on the new information that is presented from the data. When we look at the evidence, think about what you perceive about the information. Is this perception based on what the evidence shows? If others were to view it, what perceptions might they have?

- I used to think ____________________but now I know____________________. 
- I wish I knew more about ____________________. 
- This visualization reminds me of ____________________. 
- I really like ____________________.


[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)