# Explore and Analyze Tornadoes from 2013 using MATLAB in Jupyter

## MATLAB in Jupyter

Welcome to your first Jupyter Notebook running MATLAB as the kernel.  

The only difference between this Jupyter notebook and ones you may have used before is that ALL of the code shown here is MATLAB code.  Everything else is **exactly the same**, including:

* How you evaluate cells - Shift+Enter or Ctrl+Enter
* Double clicking on text cells (such as this one) to edit Markdown

## Discussion of our problem

Add context

## Importing the data

### Here we'll use a Parquet file

The data is stored in the file `StormEvents.parquet`, which is an example of a Parquet file. Parquet is an open-source, column-oriented data storage format maintained as an Apache Software Foundation project, and it is a popular choice for 'Big Data' analysis.


In [None]:
events = parquetread("/workspaces/computational_notebooks_workshop/data/StormEvents.parquet");
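
Because Parquet is column-oriented, a reader can load only the columns it needs. The sketch below illustrates this with `parquetinfo` and the `SelectedVariableNames` option of `parquetread`. It does not touch the workshop file: the table `T` and the temporary file are hypothetical toy data invented for illustration.

```matlab
% Hypothetical toy table (not the workshop data)
T = table(["Tornado"; "Hail"; "Tornado"], [1000; NaN; 250], [0; 50; NaN], ...
    'VariableNames', {'Event_Type', 'Property_Cost', 'Crop_Cost'});

% Write it to a temporary Parquet file
tmpFile = [tempname '.parquet'];
parquetwrite(tmpFile, T);

% Inspect the file's column names without loading the data
info = parquetinfo(tmpFile);
disp(info.VariableNames)

% Read back only two of the three columns
costs = parquetread(tmpFile, "SelectedVariableNames", ["Event_Type", "Property_Cost"]);
disp(costs)

% Clean up the temporary file
delete(tmpFile);
```

For a wide file, selecting columns up front like this can reduce both load time and memory use.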

The `parquetread` function reads the file into a table called `events`, which has 59,985 rows and 16 columns. We can confirm this for ourselves by running the following commands:

In [None]:
class(events)

In [None]:
size(events)

Displaying such large tables in the notebook can be problematic, particularly when each row takes up multiple lines, so let's just look at the first 7 columns.  Note that only the first few rows are displayed.

In [None]:
events(:,1:7)
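
If you would rather control exactly how many rows are shown, `head` and `tail` return the first or last rows of a table. This is a minimal sketch on a hypothetical toy table `T`, not the workshop data.

```matlab
% Hypothetical toy table with 10 rows
T = table((1:10)', (10:10:100)', 'VariableNames', {'Id', 'Value'});

firstRows = head(T, 3);   % first 3 rows
lastRows  = tail(T, 2);   % last 2 rows

disp(firstRows)
disp(lastRows)
```

The same pattern should apply to the real data, for example `head(events(:,1:7), 5)`.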

The list of column headings can be found as follows

In [None]:
events.Properties.VariableNames

We can also select columns by name:

In [None]:
events(:,["Month","Begin_Date_Time","Property_Cost","Crop_Cost"])
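
Parentheses, as used above, always return a table. Dot notation and curly braces instead extract the underlying data. The sketch below contrasts the three forms on a hypothetical toy table; the values are made up for illustration.

```matlab
% Hypothetical toy table
T = table(["June"; "July"], [100; 200], [5; NaN], ...
    'VariableNames', {'Month', 'Property_Cost', 'Crop_Cost'});

sub = T(:, ["Month", "Property_Cost"]);       % parentheses: a smaller table
col = T.Property_Cost;                        % dot notation: one column as an array
mat = T{:, ["Property_Cost", "Crop_Cost"]};   % braces: columns combined into a matrix

disp(class(sub))   % table
disp(class(col))   % double
disp(size(mat))    % 2 rows, 2 columns
```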

Some of the data is missing, encoded as `NaN`.  Here are the rows with missing `Property_Cost` values.

In [None]:
events(ismissing(events.Property_Cost),["Month","Begin_Date_Time","Property_Cost","Crop_Cost"])
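
To get a sense of how much data is missing, `ismissing` can also be applied to a whole table and its result summed. This is a sketch on hypothetical toy data, not the workshop file.

```matlab
% Hypothetical toy table with some missing costs
T = table([100; NaN; 250; NaN], [NaN; 0; 10; 5], ...
    'VariableNames', {'Property_Cost', 'Crop_Cost'});

missingMask = ismissing(T);                    % logical array, one entry per table element
nMissingPerColumn = sum(missingMask, 1);       % count of missing values in each column
rowsWithMissing = T(any(missingMask, 2), :);   % rows with at least one missing value

disp(nMissingPerColumn)
disp(rowsWithMissing)
```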

## Data Processing steps

ADD CONTEXT

In [None]:
% Put months in correct order
monthOrder = ["January", "February", "March", "April", "May", "June", "July",...
    "August", "September", "October", "November", "December"];
events.Month = reordercats(events.Month, monthOrder);

% Set missing Property and Crop Cost to $0
events.Property_Cost(ismissing(events.Property_Cost)) = 0;
events.Crop_Cost(ismissing(events.Crop_Cost)) = 0;

% Add total damage to the table
events.Total_Damage = events.Property_Cost + events.Crop_Cost;

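The code above performs three steps:

* Reorders the categories of `Month` into calendar order, so that grouped results and plots run from January to December.
* Replaces missing `Property_Cost` and `Crop_Cost` values with 0, so that missing entries do not turn sums into `NaN`.
* Adds a `Total_Damage` column, the sum of `Property_Cost` and `Crop_Cost`.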
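
The reordering and missing-value steps can be tried in isolation on toy data. In this sketch the values are hypothetical, and `fillmissing` is shown as one alternative to the logical indexing used above; it is not the approach the notebook itself takes.

```matlab
% Hypothetical month labels; categories default to sorted (alphabetical) order
Month = categorical(["March"; "January"; "February"; "January"]);
disp(categories(Month)')

% Put the categories into calendar order
Month = reordercats(Month, ["January", "February", "March"]);
disp(categories(Month)')

% Replace missing costs with 0 using fillmissing
cost = [100; NaN; 250; NaN];
cost = fillmissing(cost, "constant", 0);
disp(cost')
```

Note that `reordercats` changes only the order of the categories, not the data values themselves.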

## Visualize the Locations of Tornadoes

### Plot all tornadoes above a damage threshold

ADD CONTEXT 

In [None]:
minDamage = 0;    % modify this to change which tornadoes are included in the plot

% Select tornadoes above a damage threshold set by the minDamage variable
tornadoes = events(events.Event_Type == "Tornado" & events.Total_Damage >= minDamage, :);
% Plot the results on a map
geobubble(tornadoes.Begin_Lat,tornadoes.Begin_Lon,tornadoes.Total_Damage, tornadoes.Month);
title("Tornadoes with cost >= $" + minDamage)
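
The row selection above combines two logical conditions with `&`. The sketch below shows the same pattern on a hypothetical toy table, assuming `Event_Type` is stored as a categorical (the comparison with a string also works if it is a string column).

```matlab
% Hypothetical toy table
T = table(categorical(["Tornado"; "Hail"; "Tornado"; "Tornado"]), ...
    [0; 5000; 1200; 80000], ...
    'VariableNames', {'Event_Type', 'Total_Damage'});

minDamage = 1000;

% One logical value per row: true only where both conditions hold
isSelected = T.Event_Type == "Tornado" & T.Total_Damage >= minDamage;

% Keep the selected rows and all columns
selected = T(isSelected, :);
disp(selected)
```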

In [None]:
eventMonth = "June";   % Change this and re-run the cell

tornadoes = events(events.Event_Type == "Tornado", :);
tornadoes = tornadoes(tornadoes.Month == eventMonth, :);
geobubble(tornadoes.Begin_Lat,tornadoes.Begin_Lon);
title("Tornadoes in the month of " + eventMonth)
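
To compare months numerically rather than visually, the number of events in each month can be counted with `countcats`. This is a sketch on hypothetical toy data; it assumes, as in the processing step earlier, that `Month` is a categorical.

```matlab
% Hypothetical month labels with an explicit category order
Month = categorical(["June"; "May"; "June"; "April"; "June"], ...
    ["April", "May", "June"]);

counts = countcats(Month);    % one count per category, in category order
names  = categories(Month);

% Find the month with the most events
[~, idx] = max(counts);
busiestMonth = names{idx};
disp(busiestMonth)
```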

In [None]:
% Tornado damage by month, summarized with the selected statistic

stat = "Max"   % One of "Sum", "Mean" or "Max"
tornadoes = events(events.Event_Type == "Tornado",:);
% Group tornadoes by month and calculate the selected stat
cost = groupsummary(tornadoes, "Month", stat, "Total_Damage")

% View the results using a bar chart
bar(cost.Month, cost{:, end})
title(stat + " of tornado cost by month")
ylabel("Cost in dollars")
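
It can help to see what `groupsummary` returns before plotting it. The sketch below uses a hypothetical toy table and the lowercase method name `"max"`. The output contains the grouping variable, a `GroupCount` column, and a column named after the method and the data variable.

```matlab
% Hypothetical toy table with an explicit month order
T = table(categorical(["May"; "June"; "May"; "June"], ["May", "June"]), ...
    [100; 50; 300; 20], ...
    'VariableNames', {'Month', 'Total_Damage'});

% One row per month: Month, GroupCount, max_Total_Damage
G = groupsummary(T, "Month", "max", "Total_Damage");
disp(G)
```

Because the statistic is the last column of the result, indexing with `{:, end}` works regardless of which method was chosen.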