![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

# Callysto’s Weekly Data Visualization

## Wildfires

### Recommended Grade levels: 9-12
<br>

### Instructions

Click "Cell" and select "Run All".

This will import the data and run all the code, so you can see this week's data visualization. Scroll back to the top after you’ve run the cells.

![instructions](https://github.com/callysto/data-viz-of-the-week/blob/main/images/instructions.png?raw=true)

**You don't need to do any coding to view the visualizations**.

The plots generated in this notebook are interactive. You can hover over and click on elements to see more information. 

Email contact@callysto.ca if you experience issues.

### About this Notebook

Callysto's Weekly Data Visualization is a learning resource that aims to develop data literacy skills. We provide Grades 5-12 teachers and students with a data visualization, like a graph, to interpret. This companion resource walks learners through how the data visualization is created and interpreted by a data scientist. 

The steps of the data analysis process are listed below and applied to each weekly topic.

1. Question - What are we trying to answer?
2. Gather - Find the data source(s) you will need. 
3. Organize - Arrange the data, so that you can easily explore it. 
4. Explore - Examine the data to look for evidence to answer the question. This includes creating visualizations. 
5. Interpret - Describe what's happening in the data visualization. 
6. Communicate - Explain how the evidence answers the question. 

## Question

What are the primary causes of wildfires and when do they occur?

### Goal

Our goal is to show which causes account for the most wildfires and the largest area burned, and to use visualizations to discover patterns in when wildfires occur.

The dataset is taken from [Public Safety Canada](https://www.publicsafety.gc.ca/cnt/rsrcs/cndn-dsstr-dtbs/index-en.aspx), and contains information on Canadian natural disaster events from the years 1900 to 2019.

### Background

Weather events and natural disasters have the potential to cause huge amounts of damage to property. Have you ever wondered what the most expensive natural disasters and weather events are in Canada? We are going to explore the costliest natural disasters of the 2010s in this notebook.


## Gather

### Code: 

Run the code cells below to import the libraries we need for this project. Libraries are pre-made code that make it easier to analyze our data.

In [20]:
import pandas as pd
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go
import warnings
import matplotlib.pyplot as plt
warnings.simplefilter(action='ignore', category=FutureWarning)

In [21]:
areabygroup = pd.read_excel('https://raw.githubusercontent.com/callysto/data-files/main/data-viz-of-the-week/wildfires/areaburnedbygroup.xlsx', skiprows=1)
areabymonth = pd.read_excel('https://raw.githubusercontent.com/callysto/data-files/main/data-viz-of-the-week/wildfires/areaburnedbymonth.xlsx', skiprows=1)
areabyclass = pd.read_excel('https://raw.githubusercontent.com/callysto/data-files/main/data-viz-of-the-week/wildfires/areaburnedfireclass.xlsx', skiprows=1)
numfiresbyclass = pd.read_excel('https://raw.githubusercontent.com/callysto/data-files/main/data-viz-of-the-week/wildfires/numfiresbyfiresize.xlsx', skiprows=1)
numfiresbygroup = pd.read_excel('https://raw.githubusercontent.com/callysto/data-files/main/data-viz-of-the-week/wildfires/numfiresbygroup.xlsx')
numfiresbymonth = pd.read_excel('https://raw.githubusercontent.com/callysto/data-files/main/data-viz-of-the-week/wildfires/numfiresbymonth.xlsx', skiprows=1)

In [22]:
areabygroup.columns

Index([  'Jurisdiction',          'Cause', 'Data Qualifier',             1990,
                   1991,             1992,             1993,             1994,
                   1995,             1996,             1997,             1998,
                   1999,             2000,             2001,             2002,
                   2003,             2004,             2005,             2006,
                   2007,             2008,             2009,             2010,
                   2011,             2012,             2013,             2014,
                   2015,             2016,             2017,             2018,
                   2019,             2020,             2021],
      dtype='object')

In [23]:
#display(areabygroup)
display(areabymonth.head())
#display(areabyclass.head())
#display(numfiresbyclass.head())
#display(numfiresbygroup.head())
display(numfiresbymonth.head()) 

Unnamed: 0,Jurisdiction,Month,Data Qualifier,1990,1991,1992,1993,1994,1995,1996,...,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021
0,Alberta,January,a,0.0,86.0,12.0,3.0,15.0,1.0,0.0,...,1.0,0.0,11.0,1.0,0.0,1.0,0.0,2.0,0.0,
1,Alberta,February,a,3.0,1.0,54.0,39.0,,10.0,,...,0.0,7.0,,0.0,6.0,90.0,,0.0,0.0,
2,Alberta,March,a,7.0,118.0,248.0,577.0,32.0,289.0,1.0,...,165.0,2.0,1.0,13.0,642.0,15.0,2.0,14.0,0.0,
3,Alberta,April,a,25.0,1158.0,1254.0,301.0,868.0,579.0,93.0,...,125.0,144.0,435.0,2235.0,2709.0,1380.0,117.0,628.0,77.0,
4,Alberta,May,a,636.0,3678.0,420.0,6476.0,3460.0,330669.0,522.0,...,122256.0,7215.0,505.0,263450.0,498402.0,27072.0,42831.0,896887.0,515.0,


Unnamed: 0,Jurisdiction,Month,Data Qualifier,1990,1991,1992,1993,1994,1995,1996,...,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021
0,Alberta,January,a,1.0,9.0,14.0,8.0,4.0,10.0,1.0,...,14.0,1.0,9.0,2.0,7.0,4.0,4.0,17.0,1.0,
1,Alberta,February,a,5.0,4.0,11.0,12.0,,6.0,,...,3.0,3.0,,4.0,12.0,4.0,,1.0,1.0,
2,Alberta,March,a,8.0,8.0,61.0,29.0,12.0,29.0,1.0,...,32.0,9.0,11.0,32.0,62.0,14.0,19.0,34.0,4.0,
3,Alberta,April,a,26.0,111.0,110.0,52.0,64.0,57.0,26.0,...,123.0,37.0,81.0,277.0,287.0,101.0,97.0,188.0,68.0,
4,Alberta,May,a,114.0,201.0,91.0,242.0,132.0,215.0,56.0,...,456.0,529.0,360.0,505.0,250.0,348.0,454.0,315.0,209.0,
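
The tables above are in "wide" format, with one column per year. As a minimal sketch of the Organize step, the example below reshapes a small table into "long" format with one row per jurisdiction, month, and year. The miniature table and the names `wide`, `long_df`, and `Fires` are assumptions made for illustration; only the four values are copied from the Alberta rows shown above.

```python
import pandas as pd

# Hypothetical miniature of the wide table (values copied from the Alberta rows above)
wide = pd.DataFrame({
    "Jurisdiction": ["Alberta", "Alberta"],
    "Month": ["April", "May"],
    2019: [188.0, 315.0],
    2020: [68.0, 209.0],
})

# Reshape to long format: one row per (Jurisdiction, Month, Year)
long_df = wide.melt(
    id_vars=["Jurisdiction", "Month"],
    var_name="Year",
    value_name="Fires",
)

# Total number of fires per year across the listed months
totals = long_df.groupby("Year")["Fires"].sum()
print(totals)
```

Long format tends to be easier to filter and group, while the wide format used in this notebook is convenient for plotting one year at a time.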


In [24]:
years = list(range(1990, 2022))

# Generate a diverging color scheme from red to blue, with one color per year
color_scheme = plt.get_cmap('RdYlBu', len(years))

color_map = {}

for i, year in enumerate(years):
    color = color_scheme(i)
    hex_color = '#{:02x}{:02x}{:02x}'.format(int(color[0]*255), int(color[1]*255), int(color[2]*255))
    color_map[year] = hex_color
print(color_map)

{1990: '#a50026', 1991: '#b50f26', 1992: '#c51e26', 1993: '#d52e26', 1994: '#df412f', 1995: '#e85538', 1996: '#f26941', 1997: '#f67d4a', 1998: '#f99254', 1999: '#fca75e', 2000: '#fdb96b', 2001: '#fdc97a', 2002: '#fdd989', 2003: '#fee599', 2004: '#fef0a8', 2005: '#fefab7', 2006: '#fafdc8', 2007: '#f0f9da', 2008: '#e6f5ec', 2009: '#d9eff6', 2010: '#c8e7f1', 2011: '#b6deec', 2012: '#a5d4e6', 2013: '#93c6de', 2014: '#82b8d7', 2015: '#70a9cf', 2016: '#6197c5', 2017: '#5285bc', 2018: '#4472b3', 2019: '#3d5ea9', 2020: '#374a9f', 2021: '#313695'}
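
A year-to-color dictionary like this one can also be built with Matplotlib's `to_hex` helper. This is a sketch of an alternative, not the method used in this notebook: `to_hex` rounds each color channel instead of truncating it, so some hex codes may differ slightly from the ones printed above.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex

years = list(range(1990, 2022))

# Sample the red-yellow-blue colormap at one point per year
cmap = plt.get_cmap("RdYlBu", len(years))

# Map each year to a '#rrggbb' string
color_map = {year: to_hex(cmap(i)) for i, year in enumerate(years)}
print(color_map[1990], color_map[2021])
```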


In [25]:
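# Total each year's values per cause across all jurisdictions (numeric columns only)
areabygroup_causes = areabygroup.groupby(['Cause']).sum(numeric_only=True).reset_index()
numfiresbygroup_causes = numfiresbygroup.groupby(['Cause']).sum(numeric_only=True).reset_index()
cause_figs = make_subplots(rows=2, cols=1, x_title='Cause', subplot_titles=['Area Burned from Wildfires by Cause', 'Number of Fires by Cause'])
# Add one bar trace per year to each subplot, colored by year
for i in range(1990,2022):
    cause_figs.add_bar(x=areabygroup_causes['Cause'], y=areabygroup_causes[i], name=str(i), marker=dict(color=color_map[i]), row=1, col=1)
    cause_figs.add_bar(x=numfiresbygroup_causes['Cause'], y=numfiresbygroup_causes[i], name=str(i), marker=dict(color=color_map[i]), row=2, col=1)
# Label each y-axis separately: the top plot shows area, the bottom plot shows counts
cause_figs.update_yaxes(title_text='Area Burned (hectares)', row=1, col=1)
cause_figs.update_yaxes(title_text='Number of Fires', row=2, col=1)
cause_figs.update_layout(height=800, width=1780).show()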

In [26]:
# Drop the last row, then keep the two rows before it.
# This selects by position, so it relies on the causes being sorted alphabetically by groupby.
burn_reburn = areabygroup_causes.iloc[:-1, :]
burn_reburn = burn_reburn.tail(2)
num_burn_reburn = numfiresbygroup_causes.iloc[:-1, :]
num_burn_reburn = num_burn_reburn.tail(2)

burns = make_subplots(rows=2, cols=1, x_title='Cause', subplot_titles=['Area Burned from Wildfires for Prescribed Burns and Reburns', 'Number of Fires Caused by Prescribed Burns and Reburns'])
for i in range(1990,2022):
    burns.add_bar(x=burn_reburn['Cause'], y=burn_reburn[i], name=str(i), marker=dict(color=color_map[i]), row=1, col=1)
    burns.add_bar(x=num_burn_reburn['Cause'], y=num_burn_reburn[i], name=str(i), marker=dict(color=color_map[i]), row=2, col=1)
# Label each y-axis separately: the top plot shows area, the bottom plot shows counts
burns.update_yaxes(title_text='Area Burned (hectares)', row=1, col=1)
burns.update_yaxes(title_text='Number of Fires', row=2, col=1)
burns.update_layout(height=800, width=1780).show()
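
Selecting rows by position works only while the row order stays the same. As a hedged alternative, the sketch below selects rows by their cause label instead. The table and its numbers are invented for illustration, and the cause labels are an assumption based on the plot titles, not values read from the real file.

```python
import pandas as pd

# Hypothetical per-cause totals; the labels and numbers are assumed for illustration
causes = pd.DataFrame({
    "Cause": ["Human activity", "Lightning", "Prescribed burn", "Reburn", "Unspecified"],
    2020: [10.0, 20.0, 3.0, 1.0, 5.0],
})

# Keep only the rows whose cause is in the list, regardless of row order
wanted = ["Prescribed burn", "Reburn"]
subset = causes[causes["Cause"].isin(wanted)]
print(subset)
```

If the labels in the real data are spelled differently, the `wanted` list would need to match them exactly.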

In [27]:
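# Sum every year column from 1990 to 2021, selected by label so that no year is skipped
areabygroup['Sum'] = areabygroup[list(range(1990, 2022))].sum(axis=1)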
summationareabygrp = go.Figure()

unique_causes = areabygroup['Cause'].unique()
colors = ['blue', 'red', 'green', 'orange', 'purple']  # Specify your desired colors here

# Create a dictionary to map each cause to its corresponding color
cause_color_map = {cause: color for cause, color in zip(unique_causes, colors)}

# Create a list of colors based on the cause of each bar
bar_colors = [cause_color_map[cause] for cause in areabygroup['Cause']]
summationareabygrp.add_bar(x=[areabygroup['Cause'],areabygroup['Jurisdiction']], y=areabygroup['Sum'],marker=dict(color=bar_colors))
summationareabygrp.update_layout(yaxis_title="Area Burned (hectares)")

In [28]:
provincial_fires = areabygroup.groupby(['Jurisdiction', 'Cause']).sum(numeric_only=True).reset_index()

# Change this value to change the year
# For example, changing 2020 to 2019 will show Area Burned based on Causes in 2019
year_to_check = 2020

# Build the figure first, then display it, so the variable holds the figure itself
provincial_fires_fig = px.bar(provincial_fires, x='Jurisdiction', y=year_to_check, color='Cause', title='Area Burned from Wildfires based on Province')
provincial_fires_fig.update_layout(yaxis_title="Area Burned (hectares)").show()

In [29]:
display(numfiresbymonth.head()) 
display(areabymonth.head())

Unnamed: 0,Jurisdiction,Month,Data Qualifier,1990,1991,1992,1993,1994,1995,1996,...,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021
0,Alberta,January,a,1.0,9.0,14.0,8.0,4.0,10.0,1.0,...,14.0,1.0,9.0,2.0,7.0,4.0,4.0,17.0,1.0,
1,Alberta,February,a,5.0,4.0,11.0,12.0,,6.0,,...,3.0,3.0,,4.0,12.0,4.0,,1.0,1.0,
2,Alberta,March,a,8.0,8.0,61.0,29.0,12.0,29.0,1.0,...,32.0,9.0,11.0,32.0,62.0,14.0,19.0,34.0,4.0,
3,Alberta,April,a,26.0,111.0,110.0,52.0,64.0,57.0,26.0,...,123.0,37.0,81.0,277.0,287.0,101.0,97.0,188.0,68.0,
4,Alberta,May,a,114.0,201.0,91.0,242.0,132.0,215.0,56.0,...,456.0,529.0,360.0,505.0,250.0,348.0,454.0,315.0,209.0,


Unnamed: 0,Jurisdiction,Month,Data Qualifier,1990,1991,1992,1993,1994,1995,1996,...,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021
0,Alberta,January,a,0.0,86.0,12.0,3.0,15.0,1.0,0.0,...,1.0,0.0,11.0,1.0,0.0,1.0,0.0,2.0,0.0,
1,Alberta,February,a,3.0,1.0,54.0,39.0,,10.0,,...,0.0,7.0,,0.0,6.0,90.0,,0.0,0.0,
2,Alberta,March,a,7.0,118.0,248.0,577.0,32.0,289.0,1.0,...,165.0,2.0,1.0,13.0,642.0,15.0,2.0,14.0,0.0,
3,Alberta,April,a,25.0,1158.0,1254.0,301.0,868.0,579.0,93.0,...,125.0,144.0,435.0,2235.0,2709.0,1380.0,117.0,628.0,77.0,
4,Alberta,May,a,636.0,3678.0,420.0,6476.0,3460.0,330669.0,522.0,...,122256.0,7215.0,505.0,263450.0,498402.0,27072.0,42831.0,896887.0,515.0,


In [51]:
# Get unique jurisdictions excluding "Parks Canada"
jurisdictions = numfiresbymonth.loc[numfiresbymonth['Jurisdiction'] != 'Parks Canada', 'Jurisdiction'].unique().tolist()

# Create subplots with one row per jurisdiction (excluding "Parks Canada")
fig = make_subplots(rows=len(jurisdictions), cols=1, subplot_titles=jurisdictions)

# Iterate over jurisdictions and create a bar plot for each
for i, jurisdiction in enumerate(jurisdictions, start=1):
    # Filter data for the current jurisdiction, excluding the "Unspecified" month (error value)
    jurisdiction_data = numfiresbymonth[
        (numfiresbymonth['Jurisdiction'] == jurisdiction) & (numfiresbymonth['Month'] != 'Unspecified')
    ]

    # Create a bar trace for each year
    for year in range(1990, 2022):
        fig.add_trace(
            go.Bar(x=jurisdiction_data['Month'], y=jurisdiction_data[year], name=str(year)),
            row=i, col=1
        )

# Update layout
fig.update_layout(height=1800, width=1780, showlegend=False, title_text="Number of Wildfires each Month from 1990-2021 by Province").show()
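
To help answer the "when do they occur" part of the question, one possible follow-up is to add up the counts for each month and look for the peak. The sketch below does this for a small, hypothetical subset: the values are copied from the Alberta rows shown earlier (2019 and 2020 only), and the names `sample`, `month_order`, and `peak_month` are assumptions for illustration.

```python
import pandas as pd

month_order = ["January", "February", "March", "April", "May"]

# Hypothetical subset of the number-of-fires table (Alberta, 2019 and 2020 only)
sample = pd.DataFrame({
    "Jurisdiction": ["Alberta"] * 5,
    "Month": month_order,
    2019: [17.0, 1.0, 34.0, 188.0, 315.0],
    2020: [1.0, 1.0, 4.0, 68.0, 209.0],
})

# Add up the fires in each month across both years, keeping calendar order
year_cols = [2019, 2020]
monthly = sample.groupby("Month")[year_cols].sum().sum(axis=1).reindex(month_order)

# Find the month with the most fires in this subset
peak_month = monthly.idxmax()
print(monthly)
print("Peak month:", peak_month)
```

With the full dataset, the same idea would use all twelve months, every jurisdiction, and all year columns, so the peak month may differ from this small sample.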