![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

# Callysto's Weekly Data Visualization 

## Costliest Natural Disasters

### Recommended Grade levels: 5-12

### Instructions

#### "Run" the cells to see the graphs

Click "Cell" and select "Run All".

This will import the data and run all the code, so you can see this week's data visualization. Scroll to the top after you’ve run the cells.

![instructions](https://github.com/callysto/data-viz-of-the-week/blob/main/images/instructions.png?raw=true)

**You don't need to do any coding to view the visualizations**.

The plots generated in this notebook are interactive. You can hover over and click on elements to see more information. 

Email contact@callysto.ca if you experience issues.

### About this Notebook

Callysto's Weekly Data Visualization is a learning resource that aims to develop data literacy skills. We provide Grades 5-12 teachers and students with a data visualization, like a graph, to interpret. This companion resource walks learners through how the data visualization is created and interpreted by a data scientist. 

The steps of the data analysis process are listed below and applied to each weekly topic.

1. Question - What are we trying to answer?
2. Gather - Find the data source(s) you will need. 
3. Organize - Arrange the data, so that you can easily explore it. 
4. Explore - Examine the data to look for evidence to answer the question. This includes creating visualizations. 
5. Interpret - Describe what's happening in the data visualization. 
6. Communicate - Explain how the evidence answers the question. 

## Question

Which natural disasters caused the greatest costs? 

### Goal

Our goal in this visualization is to show which natural disasters lead to the greatest financial costs, and to use the visualization to discover any patterns in their impact.

The dataset is taken from Public Safety Canada, and contains information on Canadian natural disaster events from the years 1900 to 2019.

### Background

Weather events and natural disasters have the potential to cause huge amounts of damage to property. Have you ever wondered what the most expensive natural disasters and weather events are in Canada? In this notebook, we are going to explore the costliest natural disasters of the 2010s. The floods in Calgary, Alberta in 2013 were an example of a very costly natural disaster. The estimated cost of this flooding event was [five billion dollars](https://www.calgary.ca/water/flooding/history-calgary.html).

Run the code cell below to watch a video about the floods in Calgary. After watching the video, discuss the questions that follow with classmates close to you or with your whole class.

In [None]:
from IPython.display import YouTubeVideo
YouTubeVideo('jgw06p4jeh8')

After watching the video, reflect on the following questions:

* What images from the video stood out to you?
* Why did those images stick in your mind?

# Gather

### Code: 

Now that we have stated what this project will be about, we need to set up the rest of our notebook. To set up this notebook, run the code cells below to import the libraries we need for this project. In short, libraries are pre-made code that makes it easier to analyze our data.

In [None]:
import pandas as pd
import plotly.express as px

Pandas is a library that helps us with data analysis, and Plotly Express is a library that helps us to make visualizations. Without importing these libraries we would have to use much more code to analyze our data and generate visualizations. We import the libraries with abbreviations, or aliases, so that we have less typing to do in each line of our code below. 

### Data
We are using data from [Public Safety Canada](https://www.publicsafety.gc.ca/cnt/rsrcs/cndn-dsstr-dtbs/index-en.aspx) on natural disasters. Run the code below to populate the data into a dataframe.

#### Import the Data

In [None]:
data = pd.read_csv('data/CDD.txt', sep='\t')
data

### Comment on the data

As we can see from the numbers below the dataframe itself, this dataset has 867 rows and 23 columns. Each row represents a disaster, and each column describes an aspect of that disaster. In the next step, we'll dive into more detail about exactly what the data contains.

Look at the columns to see what values are available in this data set. Which ones are interesting to you? 

The `COMMENTS` field may be of particular interest; when we read that column we can get more specific information on some of the events. Do any of the events mentioned in `COMMENTS` seem familiar to you? Under `EVENT TYPE`, we can see some categories that the disasters might belong to. We can focus on these aspects of the data, and more, in the next steps.
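The row and column counts shown under the dataframe can also be read directly in code. The sketch below is a minimal illustration, not part of the original notebook: it reads a tiny tab-separated sample and prints its shape and column names. The three sample rows are made up. Only the column names come from the dataset.

```python
import io
import pandas as pd

# Made-up sample rows (NOT real data), tab-separated like data/CDD.txt
sample_text = (
    "EVENT TYPE\tPLACE\tNORMALIZED TOTAL COST\n"
    "Flood\tTown A\t1000\n"
    "Wildfire\tTown B\t2500\n"
    "Storm\tTown C\t\n"
)

# io.StringIO lets read_csv treat the text as if it were a file
sample = pd.read_csv(io.StringIO(sample_text), sep="\t")

n_rows, n_cols = sample.shape        # (number of rows, number of columns)
column_names = list(sample.columns)  # the column headers as a plain list

print(f"{n_rows} rows and {n_cols} columns")
print(column_names)
```

Running `data.shape` and `list(data.columns)` on the full dataframe should give the same kind of summary for the real data.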

# Organize

An important part of the data science process is cleaning up and organizing your data so that it is useful for making observations. Part of cleaning involves identifying and possibly removing missing data, ensuring the data is all in the same format, and identifying and dealing with outliers. This particular dataset has many values where data is missing. In pandas, missing data appears as `NaN` ('Not a Number'), so we want to see how much of our dataframe contains missing data. We do this by asking for the fields where the data is 'non-null'. Non-null essentially means those fields have actual data in them, or that they are *not* missing values.

Let's look at what the column names are and how much non-null data each contains. The `info()` method lists all of the column names, along with the number of non-null values inside each column:

In [None]:
data.info()

Many of our fields have data in them, or are non-null. Take note of which fields have a higher number of non-null cells, as that number varies quite a bit by column. A few of the fields have very few non-null cells, meaning most of the data is missing.
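Another way to look at the same information is to count the missing values directly. The sketch below is a small illustration with made-up numbers (not real disaster data): `isna().sum()` counts the missing cells in each column, and `notna().sum()` gives the non-null counts.

```python
import numpy as np
import pandas as pd

# Made-up sample (NOT real data); np.nan marks a missing value
sample = pd.DataFrame({
    "EVENT TYPE": ["Flood", "Wildfire", "Storm", "Flood"],
    "ESTIMATED TOTAL COST": [1000.0, np.nan, np.nan, 400.0],
    "NORMALIZED TOTAL COST": [1200.0, 2500.0, np.nan, 450.0],
})

missing_counts = sample.isna().sum()     # missing cells in each column
non_null_counts = sample.notna().sum()   # cells that contain actual data

print(missing_counts)
print(non_null_counts)
```

For every column, the two counts add up to the total number of rows.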

Because financial cost is at the heart of our question, we are using the `NORMALIZED TOTAL COST` column for our analysis and visualization. `NORMALIZED TOTAL COST` differs from `ESTIMATED TOTAL COST` by taking inflation into account. The data spans the years 1900 to 2019, and the real value of money has steadily decreased over that time, so we need to account for that. The year 2016 is the last year for which the data lets us normalize costs. Because this is the column we are most curious about, we want to omit any rows that don't include an amount in it:

In [None]:
data = data[data['NORMALIZED TOTAL COST'].notna()]
data.info()
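The filter above keeps only the rows where `notna()` is true. pandas also has a `dropna` method that does the same job when given the column name through its `subset` parameter. The sketch below compares the two approaches on a made-up three-row sample (not real data).

```python
import numpy as np
import pandas as pd

# Made-up sample (NOT real data) with one missing cost
sample = pd.DataFrame({
    "EVENT TYPE": ["Flood", "Wildfire", "Storm"],
    "NORMALIZED TOTAL COST": [1200.0, np.nan, 450.0],
})

# Approach used in this notebook: keep rows where the cost is not missing
filtered_mask = sample[sample["NORMALIZED TOTAL COST"].notna()]

# Alternative: drop rows that are missing a value in the chosen column
filtered_dropna = sample.dropna(subset=["NORMALIZED TOTAL COST"])

# Check that both approaches keep exactly the same rows
print(filtered_mask.equals(filtered_dropna))
```

Either form works. The `dropna` version can be easier to read when several columns need to be checked at once.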

We're also interested in the types of events that are included in the data. We can look at the `EVENT SUBGROUP` to see what types of events exist, and the code below extracts the unique values in that column: 

In [None]:
list(data['EVENT SUBGROUP'].unique())

It makes sense that events would be 'Meteorological - Hydrological', but what does '25' mean? Let's check out rows that have that value for `EVENT SUBGROUP`:

In [None]:
data[data['EVENT SUBGROUP']!='Meteorological - Hydrological']
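To see how common each subgroup value is, we could also count them. The sketch below uses `value_counts()` on a made-up four-row sample (not real data) that copies the two subgroup values seen above.

```python
import pandas as pd

# Made-up sample (NOT real data) using the two subgroup values seen above
sample = pd.DataFrame({
    "EVENT SUBGROUP": [
        "Meteorological - Hydrological",
        "Meteorological - Hydrological",
        "25",
        "Meteorological - Hydrological",
    ]
})

# Count how many rows carry each subgroup value, most common first
subgroup_counts = sample["EVENT SUBGROUP"].value_counts()

print(subgroup_counts)
```

A value that appears only once or twice, next to one that appears many times, is a hint that it may be a data-entry problem rather than a real category.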


Now we can get rid of any data values where the `EVENT SUBGROUP` is not 'Meteorological - Hydrological'. This process removes the one outlier of our data that does not fit this category and makes our data cleaner to work with. 

In [None]:
data = data[data['EVENT SUBGROUP']=='Meteorological - Hydrological']
data

You can now see that our dataframe only includes events where the `EVENT SUBGROUP` is equal to 'Meteorological - Hydrological'. Look under `EVENT TYPE` to see more information about what each of these events was.
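One way to look more closely at `EVENT TYPE` is to group the rows by type and summarize them. The sketch below is an illustration on made-up numbers (not real data): for each event type, it counts the events and adds up their normalized cost, then sorts from most to least costly.

```python
import pandas as pd

# Made-up sample (NOT real data)
sample = pd.DataFrame({
    "EVENT TYPE": ["Flood", "Wildfire", "Flood", "Storm"],
    "NORMALIZED TOTAL COST": [1200.0, 2500.0, 300.0, 450.0],
})

# For each event type: how many events, and their combined cost
summary = (
    sample.groupby("EVENT TYPE")["NORMALIZED TOTAL COST"]
    .agg(["count", "sum"])
    .sort_values("sum", ascending=False)
)

print(summary)
```

The same lines should work on the full dataframe by replacing `sample` with `data`.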
 

# Explore

The next part of the data science process is generating a visualization to help us answer our question. This part of the data science process is really exciting! A visualization is often a graph, but it can be any way that we can visually show our data. In our case, we are going to use a scatter plot. A visualization helps to understand what kind of story our data is telling us. 

Run the code below to generate a scatter plot from the data that will help us to answer our question. Each point represents a specific event.

The size of the points represents the estimated cost, and the color represents the total insurance payments paid out for that particular event.

In [None]:
fig = px.scatter(data, x="EVENT TYPE", y="NORMALIZED TOTAL COST", 
                 title='Disasters Compared to their Total Cost', 
                 hover_data=["PLACE", "COMMENTS"],
                 size='ESTIMATED TOTAL COST',
                 color='INSURANCE PAYMENTS',
                 height=600)

fig.show()
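To back up what the scatter plot shows with exact numbers, we could also list the most expensive events in a table. The sketch below uses `nlargest` on a made-up sample (the places and costs are not real data) to pick the two rows with the highest normalized cost.

```python
import pandas as pd

# Made-up sample (NOT real data)
sample = pd.DataFrame({
    "EVENT TYPE": ["Flood", "Wildfire", "Storm", "Flood"],
    "PLACE": ["Town A", "Town B", "Town C", "Town D"],
    "NORMALIZED TOTAL COST": [1200.0, 2500.0, 450.0, 3100.0],
})

# Keep the 2 rows with the highest normalized cost, highest first
top_events = sample.nlargest(2, "NORMALIZED TOTAL COST")

print(top_events[["EVENT TYPE", "PLACE", "NORMALIZED TOTAL COST"]])
```

On the full dataframe, something like `data.nlargest(10, 'NORMALIZED TOTAL COST')` would list the ten costliest events.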

# Interpret


# Reflect on what you see

After making your visualization, the next step is to use the data and your visualization to answer the question. Look at and interact with the visualization above. When you hover your mouse over the points, you’ll notice more information appears.

#### Think about the following questions.

* What do you notice about these graphs?
* What do you wonder about the data?
* What kind of inferences can you make based on this data?

#### Use the fill-in-the-blank prompts to summarize your thoughts.
* "I used to think _______"
* "Now I think _______"
* "I wish I knew more about _______"
* "These data visualizations remind me of _______"
* "I really like _______"

# Communicate

If you have not yet done so, use the plot to answer our question about which natural disaster was the most expensive.
Once we understand the costs of natural disasters, how can we use that information?

How can you communicate that information? What kind of product could you create to share that information with your school community and wider community?

Consider tagging Callysto on [Twitter](https://twitter.com/callysto_canada), [YouTube](https://www.youtube.com/Callysto), [TikTok](https://www.tiktok.com/@callysto_canada), [Facebook](https://www.facebook.com/callystocanada/), or [LinkedIn](https://www.linkedin.com/company/callysto-canada/) if you decide to share your reflections or projects on social media.

# Further Resources

For more information on the costliest weather events between 2012 and 2016, check out this article from the [Weather Network](https://www.theweathernetwork.com/ca/news/article/the-top-five-costliest-canadian-natural-disasters-of-the-2010s).

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)