## NOAA Storm Events Database ##

This database contains data from January 1950 to May 2025, as entered by NOAA's National Weather Service (NWS) [1][1].

CSV files are accessible through FTP:  
`ftp://ftp.ncei.noaa.gov/pub/data/swdi/stormevents/csvfiles/`

Detailed information about the fields/columns:  
`ftp://ftp.ncei.noaa.gov/pub/data/swdi/stormevents/csvfiles/Storm-Data-Bulk-csv-Format.pdf`

Documentation on the file naming convention:  
`ftp://ftp.ncei.noaa.gov/pub/data/swdi/stormevents/csvfiles/README`
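
File names in this directory look like `StormEvents_details-ftp_v1.0_d1972_c20140508.csv.gz`. The following is a minimal sketch of splitting such a name into its parts, assuming `d` marks the data year and `c` the file creation date (the README is the authority on this). `parse_storm_filename` and `select_files` are hypothetical helpers written for illustration; they are not part of `db_tools`.

```python
import re

# Pattern inferred from names such as
# StormEvents_details-ftp_v1.0_d1972_c20140508.csv.gz
# Assumption: "d" is the data year and "c" is the creation date (YYYYMMDD).
FILENAME_RE = re.compile(
    r"^StormEvents_(?P<file_type>[A-Za-z]+)-ftp_v(?P<version>[0-9.]+)"
    r"_d(?P<year>\d{4})_c(?P<created>\d{8})\.csv\.gz$"
)


def parse_storm_filename(name):
    """Split a file name into its parts; return None if it does not match."""
    match = FILENAME_RE.match(name)
    if match is None:
        return None
    parts = match.groupdict()
    parts["year"] = int(parts["year"])
    return parts


def select_files(names, file_type, first_year, last_year):
    """Keep names of the given type whose data year is within the range."""
    selected = []
    for name in names:
        parts = parse_storm_filename(name)
        if parts is None:
            continue
        if parts["file_type"] == file_type and first_year <= parts["year"] <= last_year:
            selected.append(name)
    return selected


names = [
    "README",
    "StormEvents_details-ftp_v1.0_d1950_c20250520.csv.gz",
    "StormEvents_details-ftp_v1.0_d2024_c20250818.csv.gz",
]
print(parse_storm_filename(names[1]))
print(select_files(names, "details", 1999, 2025))
```

Filtering on the parsed year rather than on a substring avoids accidentally matching the creation date, which also contains a four-digit year.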


### Notebook sections
- [Data loading and exploration](#data-loading-and-exploration)
- [Cleaning Strategy](#cleaning-strategy)


[1]: https://www.ncei.noaa.gov/stormevents/ftp.jsp

In [1]:
# import libraries
# NOTE: global_vars should be edited to include local paths and credentials before use.
# If global_vars.py is created in the root dir remove the ignore/ prefix in the import statement below.
import ignore.global_vars as gv
import db_tools as dbt
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import urllib.request
import re

## Data loading and exploration
- browse files available on the FTP server
- print README
- download sample CSV to see structure


In [2]:
#  Explore the FTP site to see what files are available.
dbt.browse_ftp("ftp://ftp.ncei.noaa.gov", "/pub/data/swdi/stormevents/csvfiles/");


Contents of /pub/data/swdi/stormevents/csvfiles/:
drwxrwxr-x   2 ftp      ftp          4096 May 14  2014 legacy
-rw----r-x   1 ftp      ftp          2020 May 14  2014 README
-rw-r-xr-x   1 ftp      ftp        147087 Jul 30  2024 Storm-Data-Bulk-csv-Format.pdf
-rw----r-x   1 ftp      ftp        150527 Jul 30  2024 Storm-Data-Export-Format.pdf
-rw-rw-r--   1 ftp      ftp         10597 Jul  2 14:22 StormEvents_details-ftp_v1.0_d1950_c20250520.csv.gz
-rw-rw-r--   1 ftp      ftp         12020 May 20 12:35 StormEvents_details-ftp_v1.0_d1951_c20250520.csv.gz
-rw-rw-r--   1 ftp      ftp         12634 May 20 12:35 StormEvents_details-ftp_v1.0_d1952_c20250520.csv.gz
-rw-rw-r--   1 ftp      ftp         21804 May 20 12:35 StormEvents_details-ftp_v1.0_d1953_c20250520.csv.gz
-rw-rw-r--   1 ftp      ftp         26220 May 20 12:35 StormEvents_details-ftp_v1.0_d1954_c20250520.csv.gz
-rw-rw-r--   1 ftp      ftp         53699 May 20 12:35 StormEvents_details-ftp_v1.0_d1955_c20250520.csv.gz
-rw-rw-r--   1

In [3]:
# Download and print the README file from the NOAA Storm Events database
readme_url = "ftp://ftp.ncei.noaa.gov/pub/data/swdi/stormevents/csvfiles/README"
with urllib.request.urlopen(readme_url) as response:
    readme_content = response.read().decode("utf-8")

print(readme_content)


---------------------------------------------------------------
-- README:                                                   --
-- Storm Events Database, Bulk Download                      --
---------------------------------------------------------------

This directory contains CSV (Comma-Separated Values) text files
which represent a dump or export of the Storm Events Database.


Update: 5/14/2014
Data from 1950 to 1996 has been added to the database and 
exported to CSV files in this directory.  Data from 1996 to 
present is available in the legacy CSV format but will be
reprocessed to the new data format by the end of May 2014.
The file naming convention has changed and the data are now 
compressed.  However, the contents of the files are similar.

Example file name:
StormEvents_details-ftp_v1.0_d1972_c20140508.csv.gz

The file is compressed with GZIP compression.  This compression
type is widely supported but custom software, such as 
'7-zip' (http://www.7-zip.org/), may be neede

In [4]:
# Download a sample CSV file from the NOAA Storm Events database and load it into a DataFrame
df_sample = dbt.ftp_to_df(
    "ftp://ftp.ncei.noaa.gov/pub/data/swdi/stormevents/csvfiles/StormEvents_details-ftp_v1.0_d2024_c20250818.csv.gz",
    compression="gzip",
)

Streamed StormEvents_details-ftp_v1.0_d2024_c20250818.csv.gz: 69493 rows, 51 columns
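
`dbt.ftp_to_df` is a project helper whose implementation is not shown here. As a rough sketch of the same idea, assuming only pandas, gzip-compressed CSV bytes held in memory can be decoded directly by `pd.read_csv`. The `read_gzip_csv` name and the two-row sample are made up for illustration.

```python
import gzip
import io

import pandas as pd


def read_gzip_csv(raw_bytes):
    """Decode gzip-compressed CSV bytes into a DataFrame."""
    return pd.read_csv(io.BytesIO(raw_bytes), compression="gzip")


# Made-up two-row sample standing in for a downloaded .csv.gz payload.
csv_text = "EVENT_ID,STATE,STATE_FIPS\n1,OKLAHOMA,40\n2,TEXAS,48\n"
payload = gzip.compress(csv_text.encode("utf-8"))

df_demo = read_gzip_csv(payload)
print(f"{len(df_demo)} rows, {df_demo.shape[1]} columns")
```

In real use the bytes would come from the FTP response, for example the result of `urllib.request.urlopen(url).read()`, instead of the in-memory sample.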


In [5]:
df_sample.head()

Unnamed: 0,BEGIN_YEARMONTH,BEGIN_DAY,BEGIN_TIME,END_YEARMONTH,END_DAY,END_TIME,EPISODE_ID,EVENT_ID,STATE,STATE_FIPS,...,END_RANGE,END_AZIMUTH,END_LOCATION,BEGIN_LAT,BEGIN_LON,END_LAT,END_LON,EPISODE_NARRATIVE,EVENT_NARRATIVE,DATA_SOURCE
0,202404,30,2033,202404,30,2033,189851,1174463,OKLAHOMA,40,...,0.0,SSW,FREDERICK ARPT,34.3444,-98.983,34.3444,-98.983,A rather nebulous upper air pattern existed ac...,Frederick Municipal Airport (KFDR) observation.,CSV
1,202407,1,0,202407,5,900,193486,1195301,LOUISIANA,22,...,,,,,,,,An upper ridge of high pressure built in acros...,,CSV
2,202411,16,230,202411,18,1421,197838,1223377,OREGON,41,...,,,,,,,,A series of cold fronts the weekend of Nov. 16...,The Hog Pass SNOTEL reported an estimated 12 i...,CSV
3,202405,22,1230,202405,22,1615,191723,1184135,TEXAS,48,...,,,,,,,,A strong upper-level subtropical ridge/heat do...,Harlingen Valley International Airport (KHRL) ...,CSV
4,202405,21,1200,202405,21,1530,191723,1184133,TEXAS,48,...,,,,,,,,A strong upper-level subtropical ridge/heat do...,"By proxy, between locations in northern Kenedy...",CSV



## Cleaning Strategy

NOAA storm data cleaning

- Make list of files from FTP where type=StormEvents_details and year=1999:2025

- Download files from list, concatenate and make dataframe `all_storm_data`

- Select the following columns from the df and make new df `df_all_storms_drop`

['BEGIN_YEARMONTH', 'BEGIN_DAY', 'EPISODE_ID', 'EVENT_ID', 'EVENT_TYPE', 'CZ_FIPS', 'STATE_FIPS', 'INJURIES_DIRECT', 'INJURIES_INDIRECT', 'DEATHS_DIRECT', 'DEATHS_INDIRECT', 'DAMAGE_PROPERTY']

- Combine the discrete columns __BEGIN_YEARMONTH__ and __BEGIN_DAY__ and convert them to a datetime column __DATE__. Make a new year column. Drop the original columns and make new df `df_all_storms_comb`
- Combine __STATE_FIPS__ and __CZ_FIPS__ into __CO_FIPS__
- Clean __CO_FIPS__ to remove historical and non-populated areas (marine sanctuaries etc.). Load into new df `df_clean`
    - Keep rows where __CO_FIPS__ is >= '01001' and <= '56045'; remove codes that start with '99'
- Load df `df_clean` into db `disaster_db` as table `NOAA_STORM_EVENTS`
- Filter data to keep only events with direct deaths or direct injuries (count > 0), create new df `severe_events`
- Group by CO_FIPS, year and EPISODE_ID, sum injuries and deaths, create df `county_episodes`
- Count unique episodes by county-year for the Poisson parameter later, create df `annual_episodes` with cols:
	- county_fips
	- year
	- event_count

- Load df `annual_episodes` into the PostgreSQL database as table `NOAA_STORM_EPISODES`
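
The filtering and aggregation steps in this list can be sketched in pandas as follows. This is an illustration on made-up rows, not the notebook's final code. It assumes `CO_FIPS` is already a five-character string, reads the FIPS rule as "keep codes between '01001' and '56045' and drop codes starting with '99'", and treats a severe event as one with at least one direct injury or direct death. The database load steps are left out because they depend on local credentials.

```python
import pandas as pd

# Made-up rows standing in for df_all_storms_comb after CO_FIPS is built.
df = pd.DataFrame(
    {
        "CO_FIPS": ["40141", "40141", "40141", "48061", "99001", "22071"],
        "YEAR": [2024, 2024, 2024, 2024, 2024, 2024],
        "EPISODE_ID": [1, 1, 2, 3, 4, 5],
        "INJURIES_DIRECT": [2, 0, 1, 0, 3, 0],
        "DEATHS_DIRECT": [0, 1, 0, 0, 0, 0],
    }
)

# Keep county codes in the assumed valid range and drop the "99" prefix.
in_range = df["CO_FIPS"].between("01001", "56045")
df_clean = df[in_range & ~df["CO_FIPS"].str.startswith("99")]

# Keep events with at least one direct injury or direct death.
severe_events = df_clean[
    (df_clean["INJURIES_DIRECT"] > 0) | (df_clean["DEATHS_DIRECT"] > 0)
]

# One row per county, year and episode, with summed casualties.
county_episodes = severe_events.groupby(
    ["CO_FIPS", "YEAR", "EPISODE_ID"], as_index=False
)[["INJURIES_DIRECT", "DEATHS_DIRECT"]].sum()

# Count distinct episodes per county-year.
annual_episodes = (
    county_episodes.groupby(["CO_FIPS", "YEAR"])["EPISODE_ID"]
    .nunique()
    .reset_index(name="event_count")
    .rename(columns={"CO_FIPS": "county_fips", "YEAR": "year"})
)
print(annual_episodes)
```

Comparing FIPS codes as zero-padded strings keeps leading zeros intact, which matters for states such as Alabama ('01').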


#### StormEvents_details ####

- BEGIN_YEARMONTH
- BEGIN_DAY
    - Combine these two into a datetime column; do not bring in the time
- EPISODE_ID
- EVENT_ID
- EVENT_TYPE
- STATE_FIPS
- CZ_FIPS
    - Combine state and CZ FIPS
- INJURIES_DIRECT
- INJURIES_INDIRECT
- DEATHS_DIRECT
- DEATHS_INDIRECT
- DAMAGE_PROPERTY
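
A minimal sketch of the column selection and the two combine steps noted in this list, run on made-up rows. It assumes the FIPS and date columns load as integers; if missing values turn them into floats, they would need converting before the string padding. The `DATE`, `YEAR` and `CO_FIPS` names follow the cleaning strategy above, and the sample values are illustrative only.

```python
import pandas as pd

KEEP_COLS = [
    "BEGIN_YEARMONTH", "BEGIN_DAY", "EPISODE_ID", "EVENT_ID", "EVENT_TYPE",
    "CZ_FIPS", "STATE_FIPS", "INJURIES_DIRECT", "INJURIES_INDIRECT",
    "DEATHS_DIRECT", "DEATHS_INDIRECT", "DAMAGE_PROPERTY",
]

# Made-up rows; DATA_SOURCE is included only to show that it gets dropped.
all_storm_data = pd.DataFrame(
    {
        "BEGIN_YEARMONTH": [202404, 202407],
        "BEGIN_DAY": [30, 1],
        "EPISODE_ID": [189851, 193486],
        "EVENT_ID": [1174463, 1195301],
        "EVENT_TYPE": ["Type A", "Type B"],
        "CZ_FIPS": [141, 71],
        "STATE_FIPS": [40, 22],
        "INJURIES_DIRECT": [0, 0],
        "INJURIES_INDIRECT": [0, 0],
        "DEATHS_DIRECT": [0, 0],
        "DEATHS_INDIRECT": [0, 0],
        "DAMAGE_PROPERTY": [None, None],
        "DATA_SOURCE": ["CSV", "CSV"],
    }
)

# Keep only the columns of interest.
df_all_storms_drop = all_storm_data[KEEP_COLS].copy()

# Build YYYYMMDD text from year-month and zero-padded day, then parse it.
df_all_storms_comb = df_all_storms_drop.copy()
date_text = (
    df_all_storms_comb["BEGIN_YEARMONTH"].astype(str)
    + df_all_storms_comb["BEGIN_DAY"].astype(str).str.zfill(2)
)
df_all_storms_comb["DATE"] = pd.to_datetime(date_text, format="%Y%m%d")
df_all_storms_comb["YEAR"] = df_all_storms_comb["DATE"].dt.year
df_all_storms_comb = df_all_storms_comb.drop(columns=["BEGIN_YEARMONTH", "BEGIN_DAY"])

# Two-digit state code plus three-digit county/zone code gives CO_FIPS.
df_all_storms_comb["CO_FIPS"] = (
    df_all_storms_comb["STATE_FIPS"].astype(str).str.zfill(2)
    + df_all_storms_comb["CZ_FIPS"].astype(str).str.zfill(3)
)
print(df_all_storms_comb[["DATE", "YEAR", "CO_FIPS"]])
```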


<svg xmlns="http://www.w3.org/2000/svg" style="cursor:pointer;max-width:100%;max-height:511px;" xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1" width="733px" viewBox="-0.5 -0.5 733 511" content="&lt;mxfile&gt;&#10;  &lt;diagram id=&quot;VjGDywkHgqa1gjS1SHQR&quot; name=&quot;Page-1&quot;&gt;&#10;    &lt;mxGraphModel dx=&quot;1681&quot; dy=&quot;941&quot; grid=&quot;1&quot; gridSize=&quot;10&quot; guides=&quot;1&quot; tooltips=&quot;1&quot; connect=&quot;1&quot; arrows=&quot;1&quot; fold=&quot;1&quot; page=&quot;1&quot; pageScale=&quot;1&quot; pageWidth=&quot;850&quot; pageHeight=&quot;1100&quot; math=&quot;0&quot; shadow=&quot;0&quot;&gt;&#10;      &lt;root&gt;&#10;        &lt;mxCell id=&quot;0&quot; /&gt;&#10;        &lt;mxCell id=&quot;1&quot; parent=&quot;0&quot; /&gt;&#10;        &lt;mxCell id=&quot;5&quot; style=&quot;edgeStyle=none;html=1;exitX=1;exitY=0.5;exitDx=0;exitDy=0;exitPerimeter=0;entryX=0;entryY=0.5;entryDx=0;entryDy=0;&quot; edge=&quot;1&quot; parent=&quot;1&quot; source=&quot;2&quot; target=&quot;4&quot;&gt;&#10;          &lt;mxGeometry relative=&quot;1&quot; as=&quot;geometry&quot; /&gt;&#10;        &lt;/mxCell&gt;&#10;        &lt;mxCell id=&quot;2&quot; value=&quot;NOAA FTP&quot; style=&quot;shape=cylinder3;whiteSpace=wrap;html=1;boundedLbl=1;backgroundOutline=1;size=15;fillColor=#ffcccc;strokeColor=#36393d;&quot; vertex=&quot;1&quot; parent=&quot;1&quot;&gt;&#10;          &lt;mxGeometry x=&quot;70&quot; y=&quot;250&quot; width=&quot;80&quot; height=&quot;80&quot; as=&quot;geometry&quot; /&gt;&#10;        &lt;/mxCell&gt;&#10;        &lt;mxCell id=&quot;7&quot; style=&quot;edgeStyle=none;html=1;exitX=1;exitY=0.5;exitDx=0;exitDy=0;entryX=0;entryY=0.5;entryDx=0;entryDy=0;&quot; edge=&quot;1&quot; parent=&quot;1&quot; source=&quot;4&quot; target=&quot;6&quot;&gt;&#10;          &lt;mxGeometry relative=&quot;1&quot; as=&quot;geometry&quot; /&gt;&#10;        &lt;/mxCell&gt;&#10;        &lt;mxCell id=&quot;4&quot; 
value=&quot;type=stormevents&amp;lt;div&amp;gt;&amp;amp;amp;&amp;lt;/div&amp;gt;&amp;lt;div&amp;gt;1999:2025&amp;lt;/div&amp;gt;&quot; style=&quot;shape=hexagon;perimeter=hexagonPerimeter2;whiteSpace=wrap;html=1;fixedSize=1;fillColor=#ffcc99;strokeColor=#36393d;&quot; vertex=&quot;1&quot; parent=&quot;1&quot;&gt;&#10;          &lt;mxGeometry x=&quot;190&quot; y=&quot;250&quot; width=&quot;120&quot; height=&quot;80&quot; as=&quot;geometry&quot; /&gt;&#10;        &lt;/mxCell&gt;&#10;        &lt;mxCell id=&quot;10&quot; style=&quot;edgeStyle=none;html=1;exitX=1;exitY=0.5;exitDx=0;exitDy=0;&quot; edge=&quot;1&quot; parent=&quot;1&quot; source=&quot;6&quot; target=&quot;9&quot;&gt;&#10;          &lt;mxGeometry relative=&quot;1&quot; as=&quot;geometry&quot; /&gt;&#10;        &lt;/mxCell&gt;&#10;        &lt;mxCell id=&quot;6&quot; value=&quot;df: all_storm_data&quot; style=&quot;rounded=1;whiteSpace=wrap;html=1;fillColor=#cce5ff;strokeColor=#36393d;&quot; vertex=&quot;1&quot; parent=&quot;1&quot;&gt;&#10;          &lt;mxGeometry x=&quot;340&quot; y=&quot;260&quot; width=&quot;120&quot; height=&quot;60&quot; as=&quot;geometry&quot; /&gt;&#10;        &lt;/mxCell&gt;&#10;        &lt;mxCell id=&quot;41&quot; style=&quot;edgeStyle=orthogonalEdgeStyle;html=1;exitX=1;exitY=0.5;exitDx=0;exitDy=0;entryX=0;entryY=0.5;entryDx=0;entryDy=0;&quot; edge=&quot;1&quot; parent=&quot;1&quot; source=&quot;8&quot; target=&quot;12&quot;&gt;&#10;          &lt;mxGeometry relative=&quot;1&quot; as=&quot;geometry&quot; /&gt;&#10;        &lt;/mxCell&gt;&#10;        &lt;mxCell id=&quot;8&quot; value=&quot;df: df_all_storms_drop&quot; style=&quot;rounded=1;whiteSpace=wrap;html=1;fillColor=#cce5ff;strokeColor=#36393d;&quot; vertex=&quot;1&quot; parent=&quot;1&quot;&gt;&#10;          &lt;mxGeometry x=&quot;620&quot; y=&quot;260&quot; width=&quot;120&quot; height=&quot;60&quot; as=&quot;geometry&quot; /&gt;&#10;        &lt;/mxCell&gt;&#10;        &lt;mxCell id=&quot;11&quot; 
style=&quot;edgeStyle=none;html=1;exitX=1;exitY=0.5;exitDx=0;exitDy=0;entryX=0;entryY=0.5;entryDx=0;entryDy=0;&quot; edge=&quot;1&quot; parent=&quot;1&quot; source=&quot;9&quot; target=&quot;8&quot;&gt;&#10;          &lt;mxGeometry relative=&quot;1&quot; as=&quot;geometry&quot; /&gt;&#10;        &lt;/mxCell&gt;&#10;        &lt;mxCell id=&quot;9&quot; value=&quot;drop unused cols&quot; style=&quot;shape=hexagon;perimeter=hexagonPerimeter2;whiteSpace=wrap;html=1;fixedSize=1;fillColor=#ffcc99;strokeColor=#36393d;&quot; vertex=&quot;1&quot; parent=&quot;1&quot;&gt;&#10;          &lt;mxGeometry x=&quot;480&quot; y=&quot;250&quot; width=&quot;120&quot; height=&quot;80&quot; as=&quot;geometry&quot; /&gt;&#10;        &lt;/mxCell&gt;&#10;        &lt;mxCell id=&quot;16&quot; style=&quot;edgeStyle=none;html=1;exitX=1;exitY=0.5;exitDx=0;exitDy=0;entryX=0;entryY=0.5;entryDx=0;entryDy=0;&quot; edge=&quot;1&quot; parent=&quot;1&quot; source=&quot;12&quot; target=&quot;13&quot;&gt;&#10;          &lt;mxGeometry relative=&quot;1&quot; as=&quot;geometry&quot; /&gt;&#10;        &lt;/mxCell&gt;&#10;        &lt;mxCell id=&quot;12&quot; value=&quot;combine dates, convert and split year&quot; style=&quot;shape=hexagon;perimeter=hexagonPerimeter2;whiteSpace=wrap;html=1;fixedSize=1;fillColor=#ffcc99;strokeColor=#36393d;&quot; vertex=&quot;1&quot; parent=&quot;1&quot;&gt;&#10;          &lt;mxGeometry x=&quot;50&quot; y=&quot;350&quot; width=&quot;140&quot; height=&quot;80&quot; as=&quot;geometry&quot; /&gt;&#10;        &lt;/mxCell&gt;&#10;        &lt;mxCell id=&quot;17&quot; style=&quot;edgeStyle=none;html=1;exitX=1;exitY=0.5;exitDx=0;exitDy=0;entryX=0;entryY=0.5;entryDx=0;entryDy=0;&quot; edge=&quot;1&quot; parent=&quot;1&quot; source=&quot;13&quot; target=&quot;14&quot;&gt;&#10;          &lt;mxGeometry relative=&quot;1&quot; as=&quot;geometry&quot; /&gt;&#10;        &lt;/mxCell&gt;&#10;        &lt;mxCell id=&quot;13&quot; value=&quot;df: df_all_storms_comb&quot; 
style=&quot;rounded=1;whiteSpace=wrap;html=1;fillColor=#cce5ff;strokeColor=#36393d;&quot; vertex=&quot;1&quot; parent=&quot;1&quot;&gt;&#10;          &lt;mxGeometry x=&quot;210&quot; y=&quot;360&quot; width=&quot;120&quot; height=&quot;60&quot; as=&quot;geometry&quot; /&gt;&#10;        &lt;/mxCell&gt;&#10;        &lt;mxCell id=&quot;18&quot; style=&quot;edgeStyle=none;html=1;exitX=1;exitY=0.5;exitDx=0;exitDy=0;entryX=0;entryY=0.5;entryDx=0;entryDy=0;&quot; edge=&quot;1&quot; parent=&quot;1&quot; source=&quot;14&quot; target=&quot;15&quot;&gt;&#10;          &lt;mxGeometry relative=&quot;1&quot; as=&quot;geometry&quot; /&gt;&#10;        &lt;/mxCell&gt;&#10;        &lt;mxCell id=&quot;14&quot; value=&quot;combine FIPS, clean irrelavent FIPS&quot; style=&quot;shape=hexagon;perimeter=hexagonPerimeter2;whiteSpace=wrap;html=1;fixedSize=1;fillColor=#ffcc99;strokeColor=#36393d;&quot; vertex=&quot;1&quot; parent=&quot;1&quot;&gt;&#10;          &lt;mxGeometry x=&quot;360&quot; y=&quot;350&quot; width=&quot;120&quot; height=&quot;80&quot; as=&quot;geometry&quot; /&gt;&#10;        &lt;/mxCell&gt;&#10;        &lt;mxCell id=&quot;20&quot; style=&quot;edgeStyle=orthogonalEdgeStyle;html=1;exitX=0.5;exitY=1;exitDx=0;exitDy=0;entryX=0.5;entryY=0;entryDx=0;entryDy=0;&quot; edge=&quot;1&quot; parent=&quot;1&quot; source=&quot;15&quot; target=&quot;19&quot;&gt;&#10;          &lt;mxGeometry relative=&quot;1&quot; as=&quot;geometry&quot; /&gt;&#10;        &lt;/mxCell&gt;&#10;        &lt;mxCell id=&quot;39&quot; style=&quot;edgeStyle=orthogonalEdgeStyle;html=1;exitX=1;exitY=0.5;exitDx=0;exitDy=0;entryX=0.5;entryY=0;entryDx=0;entryDy=0;&quot; edge=&quot;1&quot; parent=&quot;1&quot; source=&quot;15&quot; target=&quot;38&quot;&gt;&#10;          &lt;mxGeometry relative=&quot;1&quot; as=&quot;geometry&quot; /&gt;&#10;        &lt;/mxCell&gt;&#10;        &lt;mxCell id=&quot;15&quot; value=&quot;df: df_clean&quot; 
style=&quot;rounded=1;whiteSpace=wrap;html=1;fillColor=#cce5ff;strokeColor=#36393d;&quot; vertex=&quot;1&quot; parent=&quot;1&quot;&gt;&#10;          &lt;mxGeometry x=&quot;500&quot; y=&quot;360&quot; width=&quot;120&quot; height=&quot;60&quot; as=&quot;geometry&quot; /&gt;&#10;        &lt;/mxCell&gt;&#10;        &lt;mxCell id=&quot;23&quot; style=&quot;edgeStyle=none;html=1;exitX=1;exitY=0.5;exitDx=0;exitDy=0;entryX=0;entryY=0.5;entryDx=0;entryDy=0;&quot; edge=&quot;1&quot; parent=&quot;1&quot; source=&quot;19&quot; target=&quot;22&quot;&gt;&#10;          &lt;mxGeometry relative=&quot;1&quot; as=&quot;geometry&quot; /&gt;&#10;        &lt;/mxCell&gt;&#10;        &lt;mxCell id=&quot;19&quot; value=&quot;remove all data except where injury/death &amp;amp;gt;0&quot; style=&quot;shape=hexagon;perimeter=hexagonPerimeter2;whiteSpace=wrap;html=1;fixedSize=1;fillColor=#ffcc99;strokeColor=#36393d;&quot; vertex=&quot;1&quot; parent=&quot;1&quot;&gt;&#10;          &lt;mxGeometry x=&quot;50&quot; y=&quot;470&quot; width=&quot;120&quot; height=&quot;80&quot; as=&quot;geometry&quot; /&gt;&#10;        &lt;/mxCell&gt;&#10;        &lt;mxCell id=&quot;32&quot; style=&quot;edgeStyle=orthogonalEdgeStyle;html=1;exitX=1;exitY=0.5;exitDx=0;exitDy=0;entryX=0;entryY=0.5;entryDx=0;entryDy=0;&quot; edge=&quot;1&quot; parent=&quot;1&quot; source=&quot;21&quot; target=&quot;24&quot;&gt;&#10;          &lt;mxGeometry relative=&quot;1&quot; as=&quot;geometry&quot; /&gt;&#10;        &lt;/mxCell&gt;&#10;        &lt;mxCell id=&quot;21&quot; value=&quot;df: county_episodes&quot; style=&quot;rounded=1;whiteSpace=wrap;html=1;fillColor=#cce5ff;strokeColor=#36393d;&quot; vertex=&quot;1&quot; parent=&quot;1&quot;&gt;&#10;          &lt;mxGeometry x=&quot;480&quot; y=&quot;480&quot; width=&quot;120&quot; height=&quot;60&quot; as=&quot;geometry&quot; /&gt;&#10;        &lt;/mxCell&gt;&#10;        &lt;mxCell id=&quot;27&quot; 
style=&quot;edgeStyle=none;html=1;exitX=1;exitY=0.5;exitDx=0;exitDy=0;entryX=0;entryY=0.5;entryDx=0;entryDy=0;&quot; edge=&quot;1&quot; parent=&quot;1&quot; source=&quot;22&quot; target=&quot;26&quot;&gt;&#10;          &lt;mxGeometry relative=&quot;1&quot; as=&quot;geometry&quot; /&gt;&#10;        &lt;/mxCell&gt;&#10;        &lt;mxCell id=&quot;22&quot; value=&quot;df: severe_events&quot; style=&quot;rounded=1;whiteSpace=wrap;html=1;fillColor=#cce5ff;strokeColor=#36393d;&quot; vertex=&quot;1&quot; parent=&quot;1&quot;&gt;&#10;          &lt;mxGeometry x=&quot;190&quot; y=&quot;480&quot; width=&quot;120&quot; height=&quot;60&quot; as=&quot;geometry&quot; /&gt;&#10;        &lt;/mxCell&gt;&#10;        &lt;mxCell id=&quot;33&quot; style=&quot;edgeStyle=none;html=1;exitX=1;exitY=0.5;exitDx=0;exitDy=0;entryX=0;entryY=0.5;entryDx=0;entryDy=0;&quot; edge=&quot;1&quot; parent=&quot;1&quot; source=&quot;24&quot; target=&quot;29&quot;&gt;&#10;          &lt;mxGeometry relative=&quot;1&quot; as=&quot;geometry&quot; /&gt;&#10;        &lt;/mxCell&gt;&#10;        &lt;mxCell id=&quot;24&quot; value=&quot;count episodes and transform&quot; style=&quot;shape=parallelogram;perimeter=parallelogramPerimeter;whiteSpace=wrap;html=1;fixedSize=1;fillColor=#ffff88;strokeColor=#36393d;&quot; vertex=&quot;1&quot; parent=&quot;1&quot;&gt;&#10;          &lt;mxGeometry x=&quot;45&quot; y=&quot;570&quot; width=&quot;130&quot; height=&quot;60&quot; as=&quot;geometry&quot; /&gt;&#10;        &lt;/mxCell&gt;&#10;        &lt;mxCell id=&quot;28&quot; style=&quot;edgeStyle=none;html=1;exitX=1;exitY=0.5;exitDx=0;exitDy=0;entryX=0;entryY=0.5;entryDx=0;entryDy=0;&quot; edge=&quot;1&quot; parent=&quot;1&quot; source=&quot;26&quot; target=&quot;21&quot;&gt;&#10;          &lt;mxGeometry relative=&quot;1&quot; as=&quot;geometry&quot; /&gt;&#10;        &lt;/mxCell&gt;&#10;        &lt;mxCell id=&quot;26&quot; value=&quot;group by and sum injury/death&quot; 
style=&quot;shape=hexagon;perimeter=hexagonPerimeter2;whiteSpace=wrap;html=1;fixedSize=1;fillColor=#ffcc99;strokeColor=#36393d;&quot; vertex=&quot;1&quot; parent=&quot;1&quot;&gt;&#10;          &lt;mxGeometry x=&quot;330&quot; y=&quot;470&quot; width=&quot;120&quot; height=&quot;80&quot; as=&quot;geometry&quot; /&gt;&#10;        &lt;/mxCell&gt;&#10;        &lt;mxCell id=&quot;36&quot; style=&quot;edgeStyle=none;html=1;exitX=1;exitY=0.5;exitDx=0;exitDy=0;entryX=0;entryY=0.5;entryDx=0;entryDy=0;&quot; edge=&quot;1&quot; parent=&quot;1&quot; source=&quot;29&quot; target=&quot;35&quot;&gt;&#10;          &lt;mxGeometry relative=&quot;1&quot; as=&quot;geometry&quot; /&gt;&#10;        &lt;/mxCell&gt;&#10;        &lt;mxCell id=&quot;29&quot; value=&quot;df: annual_episodes&quot; style=&quot;rounded=1;whiteSpace=wrap;html=1;fillColor=#cce5ff;strokeColor=#36393d;&quot; vertex=&quot;1&quot; parent=&quot;1&quot;&gt;&#10;          &lt;mxGeometry x=&quot;190&quot; y=&quot;570&quot; width=&quot;120&quot; height=&quot;60&quot; as=&quot;geometry&quot; /&gt;&#10;        &lt;/mxCell&gt;&#10;        &lt;mxCell id=&quot;31&quot; value=&quot;&amp;lt;div&amp;gt;PostgresQL: disaster_db&amp;lt;/div&amp;gt;&quot; style=&quot;shape=cylinder3;whiteSpace=wrap;html=1;boundedLbl=1;backgroundOutline=1;size=15;fillColor=#ffcccc;strokeColor=#36393d;&quot; vertex=&quot;1&quot; parent=&quot;1&quot;&gt;&#10;          &lt;mxGeometry x=&quot;345&quot; y=&quot;680&quot; width=&quot;140&quot; height=&quot;80&quot; as=&quot;geometry&quot; /&gt;&#10;        &lt;/mxCell&gt;&#10;        &lt;mxCell id=&quot;37&quot; style=&quot;edgeStyle=none;html=1;exitX=0.5;exitY=1;exitDx=0;exitDy=0;&quot; edge=&quot;1&quot; parent=&quot;1&quot; source=&quot;35&quot; target=&quot;31&quot;&gt;&#10;          &lt;mxGeometry relative=&quot;1&quot; as=&quot;geometry&quot; /&gt;&#10;        &lt;/mxCell&gt;&#10;        &lt;mxCell id=&quot;35&quot; value=&quot;Load into database as table:&amp;lt;br&amp;gt;NOAA_STORM_EPISODES&quot; 
style=&quot;shape=hexagon;perimeter=hexagonPerimeter2;whiteSpace=wrap;html=1;fixedSize=1;fillColor=#ffcc99;strokeColor=#36393d;&quot; vertex=&quot;1&quot; parent=&quot;1&quot;&gt;&#10;          &lt;mxGeometry x=&quot;330&quot; y=&quot;560&quot; width=&quot;170&quot; height=&quot;80&quot; as=&quot;geometry&quot; /&gt;&#10;        &lt;/mxCell&gt;&#10;        &lt;mxCell id=&quot;38&quot; value=&quot;Load into database as table:&amp;lt;br&amp;gt;NOAA_STORM_EVENTS&quot; style=&quot;shape=hexagon;perimeter=hexagonPerimeter2;whiteSpace=wrap;html=1;fixedSize=1;fillColor=#ffcc99;strokeColor=#36393d;&quot; vertex=&quot;1&quot; parent=&quot;1&quot;&gt;&#10;          &lt;mxGeometry x=&quot;595&quot; y=&quot;560&quot; width=&quot;170&quot; height=&quot;80&quot; as=&quot;geometry&quot; /&gt;&#10;        &lt;/mxCell&gt;&#10;        &lt;mxCell id=&quot;40&quot; style=&quot;edgeStyle=orthogonalEdgeStyle;html=1;exitX=0;exitY=0.5;exitDx=0;exitDy=0;entryX=1;entryY=0.5;entryDx=0;entryDy=0;entryPerimeter=0;&quot; edge=&quot;1&quot; parent=&quot;1&quot; source=&quot;38&quot; target=&quot;31&quot;&gt;&#10;          &lt;mxGeometry relative=&quot;1&quot; as=&quot;geometry&quot; /&gt;&#10;        &lt;/mxCell&gt;&#10;      &lt;/root&gt;&#10;    &lt;/mxGraphModel&gt;&#10;  &lt;/diagram&gt;&#10;&lt;/mxfile&gt;&#10;" onclick="(function(svg){var src=window.event.target||window.event.srcElement;while (src!=null&amp;&amp;src.nodeName.toLowerCase()!='a'){src=src.parentNode;}if(src==null){if(svg.wnd!=null&amp;&amp;!svg.wnd.closed){svg.wnd.focus();}else{var r=function(evt){if(evt.data=='ready'&amp;&amp;evt.source==svg.wnd){svg.wnd.postMessage(decodeURIComponent(svg.getAttribute('content')),'*');window.removeEventListener('message',r);}};window.addEventListener('message',r);svg.wnd=window.open('https://viewer.diagrams.net/?client=1&amp;page=0&amp;edit=_blank');}}})(this);"><defs/><g><g><path d="M 117 40 L 150.63 40" fill="none" stroke="#000000" stroke-miterlimit="10" pointer-events="stroke" 
style="stroke: light-dark(rgb(0, 0, 0), rgb(255, 255, 255));"/><path d="M 155.88 40 L 148.88 43.5 L 150.63 40 L 148.88 36.5 Z" fill="#000000" stroke="#000000" stroke-miterlimit="10" pointer-events="all" style="fill: light-dark(rgb(0, 0, 0), rgb(255, 255, 255)); stroke: light-dark(rgb(0, 0, 0), rgb(255, 255, 255));"/></g><g><path d="M 37 15 C 37 6.72 54.91 0 77 0 C 87.61 0 97.78 1.58 105.28 4.39 C 112.79 7.21 117 11.02 117 15 L 117 65 C 117 73.28 99.09 80 77 80 C 54.91 80 37 73.28 37 65 Z" fill="#ffcccc" stroke="#36393d" stroke-miterlimit="10" pointer-events="all" style="fill: light-dark(rgb(255, 204, 204), rgb(87, 43, 43)); stroke: light-dark(rgb(54, 57, 61), rgb(186, 189, 192));"/><path d="M 117 15 C 117 23.28 99.09 30 77 30 C 54.91 30 37 23.28 37 15" fill="none" stroke="#36393d" stroke-miterlimit="10" pointer-events="all" style="stroke: light-dark(rgb(54, 57, 61), rgb(186, 189, 192));"/></g><g><g transform="translate(-0.5 -0.5)"><switch><foreignObject style="overflow: visible; text-align: left;" pointer-events="none" width="100%" height="100%" requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility"><div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; align-items: unsafe center; justify-content: unsafe center; width: 78px; height: 1px; padding-top: 53px; margin-left: 38px;"><div style="box-sizing: border-box; font-size: 0; text-align: center; color: #000000; "><div style="display: inline-block; font-size: 12px; font-family: &quot;Helvetica&quot;; color: light-dark(#000000, #ffffff); line-height: 1.2; pointer-events: all; white-space: normal; word-wrap: normal; ">NOAA FTP</div></div></div></foreignObject><text x="77" y="56" fill="light-dark(#000000, #ffffff)" font-family="&quot;Helvetica&quot;" font-size="12px" text-anchor="middle">NOAA FTP</text></switch></g></g><g><path d="M 277 40 L 300.63 40" fill="none" stroke="#000000" stroke-miterlimit="10" pointer-events="stroke" style="stroke: light-dark(rgb(0, 0, 0), rgb(255, 255, 255));"/><path 
d="M 305.88 40 L 298.88 43.5 L 300.63 40 L 298.88 36.5 Z" fill="#000000" stroke="#000000" stroke-miterlimit="10" pointer-events="all" style="fill: light-dark(rgb(0, 0, 0), rgb(255, 255, 255)); stroke: light-dark(rgb(0, 0, 0), rgb(255, 255, 255));"/></g><g><path d="M 177 0 L 257 0 L 277 40 L 257 80 L 177 80 L 157 40 Z" fill="#ffcc99" stroke="#36393d" stroke-miterlimit="10" pointer-events="all" style="fill: light-dark(rgb(255, 204, 153), rgb(94, 50, 6)); stroke: light-dark(rgb(54, 57, 61), rgb(186, 189, 192));"/></g><g><g transform="translate(-0.5 -0.5)"><switch><foreignObject style="overflow: visible; text-align: left;" pointer-events="none" width="100%" height="100%" requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility"><div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; align-items: unsafe center; justify-content: unsafe center; width: 118px; height: 1px; padding-top: 40px; margin-left: 158px;"><div style="box-sizing: border-box; font-size: 0; text-align: center; color: #000000; "><div style="display: inline-block; font-size: 12px; font-family: &quot;Helvetica&quot;; color: light-dark(#000000, #ffffff); line-height: 1.2; pointer-events: all; white-space: normal; word-wrap: normal; ">type=stormevents<div>&amp;</div><div>1999:2025</div></div></div></div></foreignObject><text x="217" y="44" fill="light-dark(#000000, #ffffff)" font-family="&quot;Helvetica&quot;" font-size="12px" text-anchor="middle">type=stormevents...</text></switch></g></g><g><path d="M 427 40 L 440.63 40" fill="none" stroke="#000000" stroke-miterlimit="10" pointer-events="stroke" style="stroke: light-dark(rgb(0, 0, 0), rgb(255, 255, 255));"/><path d="M 445.88 40 L 438.88 43.5 L 440.63 40 L 438.88 36.5 Z" fill="#000000" stroke="#000000" stroke-miterlimit="10" pointer-events="all" style="fill: light-dark(rgb(0, 0, 0), rgb(255, 255, 255)); stroke: light-dark(rgb(0, 0, 0), rgb(255, 255, 255));"/></g><g><rect x="307" y="10" width="120" height="60" rx="9" ry="9" fill="#cce5ff" 
stroke="#36393d" pointer-events="all" style="fill: light-dark(rgb(204, 229, 255), rgb(24, 46, 68)); stroke: light-dark(rgb(54, 57, 61), rgb(186, 189, 192));"/></g><g><g transform="translate(-0.5 -0.5)"><switch><foreignObject style="overflow: visible; text-align: left;" pointer-events="none" width="100%" height="100%" requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility"><div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; align-items: unsafe center; justify-content: unsafe center; width: 118px; height: 1px; padding-top: 40px; margin-left: 308px;"><div style="box-sizing: border-box; font-size: 0; text-align: center; color: #000000; "><div style="display: inline-block; font-size: 12px; font-family: &quot;Helvetica&quot;; color: light-dark(#000000, #ffffff); line-height: 1.2; pointer-events: all; white-space: normal; word-wrap: normal; ">df: all_storm_data</div></div></div></foreignObject><text x="367" y="44" fill="light-dark(#000000, #ffffff)" font-family="&quot;Helvetica&quot;" font-size="12px" text-anchor="middle">df: all_storm_data</text></switch></g></g><g><path d="M 707 40 L 712 40 Q 717 40 717 50 L 717 75 Q 717 85 707 85 L 17 85 Q 7 85 7 95 L 7 130 Q 7 140 8.82 140 L 10.63 140" fill="none" stroke="#000000" stroke-miterlimit="10" pointer-events="stroke" style="stroke: light-dark(rgb(0, 0, 0), rgb(255, 255, 255));"/><path d="M 15.88 140 L 8.88 143.5 L 10.63 140 L 8.88 136.5 Z" fill="#000000" stroke="#000000" stroke-miterlimit="10" pointer-events="all" style="fill: light-dark(rgb(0, 0, 0), rgb(255, 255, 255)); stroke: light-dark(rgb(0, 0, 0), rgb(255, 255, 255));"/></g><g><rect x="587" y="10" width="120" height="60" rx="9" ry="9" fill="#cce5ff" stroke="#36393d" pointer-events="all" style="fill: light-dark(rgb(204, 229, 255), rgb(24, 46, 68)); stroke: light-dark(rgb(54, 57, 61), rgb(186, 189, 192));"/></g><g><g transform="translate(-0.5 -0.5)"><switch><foreignObject style="overflow: visible; text-align: left;" pointer-events="none" 
width="100%" height="100%" requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility"><div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; align-items: unsafe center; justify-content: unsafe center; width: 118px; height: 1px; padding-top: 40px; margin-left: 588px;"><div style="box-sizing: border-box; font-size: 0; text-align: center; color: #000000; "><div style="display: inline-block; font-size: 12px; font-family: &quot;Helvetica&quot;; color: light-dark(#000000, #ffffff); line-height: 1.2; pointer-events: all; white-space: normal; word-wrap: normal; ">df: df_all_storms_drop</div></div></div></foreignObject><text x="647" y="44" fill="light-dark(#000000, #ffffff)" font-family="&quot;Helvetica&quot;" font-size="12px" text-anchor="middle">df: df_all_storms_dr...</text></switch></g></g><g><path d="M 567 40 L 580.63 40" fill="none" stroke="#000000" stroke-miterlimit="10" pointer-events="stroke" style="stroke: light-dark(rgb(0, 0, 0), rgb(255, 255, 255));"/><path d="M 585.88 40 L 578.88 43.5 L 580.63 40 L 578.88 36.5 Z" fill="#000000" stroke="#000000" stroke-miterlimit="10" pointer-events="all" style="fill: light-dark(rgb(0, 0, 0), rgb(255, 255, 255)); stroke: light-dark(rgb(0, 0, 0), rgb(255, 255, 255));"/></g><g><path d="M 467 0 L 547 0 L 567 40 L 547 80 L 467 80 L 447 40 Z" fill="#ffcc99" stroke="#36393d" stroke-miterlimit="10" pointer-events="all" style="fill: light-dark(rgb(255, 204, 153), rgb(94, 50, 6)); stroke: light-dark(rgb(54, 57, 61), rgb(186, 189, 192));"/></g><g><g transform="translate(-0.5 -0.5)"><switch><foreignObject style="overflow: visible; text-align: left;" pointer-events="none" width="100%" height="100%" requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility"><div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; align-items: unsafe center; justify-content: unsafe center; width: 118px; height: 1px; padding-top: 40px; margin-left: 448px;"><div style="box-sizing: border-box; font-size: 0; text-align: center; 
color: #000000; "><div style="display: inline-block; font-size: 12px; font-family: &quot;Helvetica&quot;; color: light-dark(#000000, #ffffff); line-height: 1.2; pointer-events: all; white-space: normal; word-wrap: normal; ">drop unused cols</div></div></div></foreignObject><text x="507" y="44" fill="light-dark(#000000, #ffffff)" font-family="&quot;Helvetica&quot;" font-size="12px" text-anchor="middle">drop unused cols</text></switch></g></g><g><path d="M 157 140 L 170.63 140" fill="none" stroke="#000000" stroke-miterlimit="10" pointer-events="stroke" style="stroke: light-dark(rgb(0, 0, 0), rgb(255, 255, 255));"/><path d="M 175.88 140 L 168.88 143.5 L 170.63 140 L 168.88 136.5 Z" fill="#000000" stroke="#000000" stroke-miterlimit="10" pointer-events="all" style="fill: light-dark(rgb(0, 0, 0), rgb(255, 255, 255)); stroke: light-dark(rgb(0, 0, 0), rgb(255, 255, 255));"/></g><g><path d="M 37 100 L 137 100 L 157 140 L 137 180 L 37 180 L 17 140 Z" fill="#ffcc99" stroke="#36393d" stroke-miterlimit="10" pointer-events="all" style="fill: light-dark(rgb(255, 204, 153), rgb(94, 50, 6)); stroke: light-dark(rgb(54, 57, 61), rgb(186, 189, 192));"/></g><g><g transform="translate(-0.5 -0.5)"><switch><foreignObject style="overflow: visible; text-align: left;" pointer-events="none" width="100%" height="100%" requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility"><div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; align-items: unsafe center; justify-content: unsafe center; width: 138px; height: 1px; padding-top: 140px; margin-left: 18px;"><div style="box-sizing: border-box; font-size: 0; text-align: center; color: #000000; "><div style="display: inline-block; font-size: 12px; font-family: &quot;Helvetica&quot;; color: light-dark(#000000, #ffffff); line-height: 1.2; pointer-events: all; white-space: normal; word-wrap: normal; ">combine dates, convert and split year</div></div></div></foreignObject><text x="87" y="144" fill="light-dark(#000000, #ffffff)" 
font-family="&quot;Helvetica&quot;" font-size="12px" text-anchor="middle">combine dates, convert...</text></switch></g></g><g><path d="M 297 140 L 320.63 140" fill="none" stroke="#000000" stroke-miterlimit="10" pointer-events="stroke" style="stroke: light-dark(rgb(0, 0, 0), rgb(255, 255, 255));"/><path d="M 325.88 140 L 318.88 143.5 L 320.63 140 L 318.88 136.5 Z" fill="#000000" stroke="#000000" stroke-miterlimit="10" pointer-events="all" style="fill: light-dark(rgb(0, 0, 0), rgb(255, 255, 255)); stroke: light-dark(rgb(0, 0, 0), rgb(255, 255, 255));"/></g><g><rect x="177" y="110" width="120" height="60" rx="9" ry="9" fill="#cce5ff" stroke="#36393d" pointer-events="all" style="fill: light-dark(rgb(204, 229, 255), rgb(24, 46, 68)); stroke: light-dark(rgb(54, 57, 61), rgb(186, 189, 192));"/></g><g><g transform="translate(-0.5 -0.5)"><switch><foreignObject style="overflow: visible; text-align: left;" pointer-events="none" width="100%" height="100%" requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility"><div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; align-items: unsafe center; justify-content: unsafe center; width: 118px; height: 1px; padding-top: 140px; margin-left: 178px;"><div style="box-sizing: border-box; font-size: 0; text-align: center; color: #000000; "><div style="display: inline-block; font-size: 12px; font-family: &quot;Helvetica&quot;; color: light-dark(#000000, #ffffff); line-height: 1.2; pointer-events: all; white-space: normal; word-wrap: normal; ">df: df_all_storms_comb</div></div></div></foreignObject><text x="237" y="144" fill="light-dark(#000000, #ffffff)" font-family="&quot;Helvetica&quot;" font-size="12px" text-anchor="middle">df: df_all_storms_co...</text></switch></g></g><g><path d="M 447 140 L 460.63 140" fill="none" stroke="#000000" stroke-miterlimit="10" pointer-events="stroke" style="stroke: light-dark(rgb(0, 0, 0), rgb(255, 255, 255));"/><path d="M 465.88 140 L 458.88 143.5 L 460.63 140 L 458.88 136.5 Z" 
fill="#000000" stroke="#000000" stroke-miterlimit="10" pointer-events="all" style="fill: light-dark(rgb(0, 0, 0), rgb(255, 255, 255)); stroke: light-dark(rgb(0, 0, 0), rgb(255, 255, 255));"/></g><g><path d="M 347 100 L 427 100 L 447 140 L 427 180 L 347 180 L 327 140 Z" fill="#ffcc99" stroke="#36393d" stroke-miterlimit="10" pointer-events="all" style="fill: light-dark(rgb(255, 204, 153), rgb(94, 50, 6)); stroke: light-dark(rgb(54, 57, 61), rgb(186, 189, 192));"/></g><g><g transform="translate(-0.5 -0.5)"><switch><foreignObject style="overflow: visible; text-align: left;" pointer-events="none" width="100%" height="100%" requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility"><div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; align-items: unsafe center; justify-content: unsafe center; width: 118px; height: 1px; padding-top: 140px; margin-left: 328px;"><div style="box-sizing: border-box; font-size: 0; text-align: center; color: #000000; "><div style="display: inline-block; font-size: 12px; font-family: &quot;Helvetica&quot;; color: light-dark(#000000, #ffffff); line-height: 1.2; pointer-events: all; white-space: normal; word-wrap: normal; ">combine FIPS, clean irrelavent FIPS</div></div></div></foreignObject><text x="387" y="144" fill="light-dark(#000000, #ffffff)" font-family="&quot;Helvetica&quot;" font-size="12px" text-anchor="middle">combine FIPS, clean...</text></switch></g></g><g><path d="M 527 170 L 527 185 Q 527 195 517 195 L 87 195 Q 77 195 77 204.32 L 77 213.63" fill="none" stroke="#000000" stroke-miterlimit="10" pointer-events="stroke" style="stroke: light-dark(rgb(0, 0, 0), rgb(255, 255, 255));"/><path d="M 77 218.88 L 73.5 211.88 L 77 213.63 L 80.5 211.88 Z" fill="#000000" stroke="#000000" stroke-miterlimit="10" pointer-events="all" style="fill: light-dark(rgb(0, 0, 0), rgb(255, 255, 255)); stroke: light-dark(rgb(0, 0, 0), rgb(255, 255, 255));"/></g><g><path d="M 587 140 L 637 140 Q 647 140 647 150 L 647 303.63" fill="none" 
stroke="#000000" stroke-miterlimit="10" pointer-events="stroke" style="stroke: light-dark(rgb(0, 0, 0), rgb(255, 255, 255));"/><path d="M 647 308.88 L 643.5 301.88 L 647 303.63 L 650.5 301.88 Z" fill="#000000" stroke="#000000" stroke-miterlimit="10" pointer-events="all" style="fill: light-dark(rgb(0, 0, 0), rgb(255, 255, 255)); stroke: light-dark(rgb(0, 0, 0), rgb(255, 255, 255));"/></g><g><rect x="467" y="110" width="120" height="60" rx="9" ry="9" fill="#cce5ff" stroke="#36393d" pointer-events="all" style="fill: light-dark(rgb(204, 229, 255), rgb(24, 46, 68)); stroke: light-dark(rgb(54, 57, 61), rgb(186, 189, 192));"/></g><g><g transform="translate(-0.5 -0.5)"><switch><foreignObject style="overflow: visible; text-align: left;" pointer-events="none" width="100%" height="100%" requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility"><div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; align-items: unsafe center; justify-content: unsafe center; width: 118px; height: 1px; padding-top: 140px; margin-left: 468px;"><div style="box-sizing: border-box; font-size: 0; text-align: center; color: #000000; "><div style="display: inline-block; font-size: 12px; font-family: &quot;Helvetica&quot;; color: light-dark(#000000, #ffffff); line-height: 1.2; pointer-events: all; white-space: normal; word-wrap: normal; ">df: df_clean</div></div></div></foreignObject><text x="527" y="144" fill="light-dark(#000000, #ffffff)" font-family="&quot;Helvetica&quot;" font-size="12px" text-anchor="middle">df: df_clean</text></switch></g></g><g><path d="M 137 260 L 150.63 260" fill="none" stroke="#000000" stroke-miterlimit="10" pointer-events="stroke" style="stroke: light-dark(rgb(0, 0, 0), rgb(255, 255, 255));"/><path d="M 155.88 260 L 148.88 263.5 L 150.63 260 L 148.88 256.5 Z" fill="#000000" stroke="#000000" stroke-miterlimit="10" pointer-events="all" style="fill: light-dark(rgb(0, 0, 0), rgb(255, 255, 255)); stroke: light-dark(rgb(0, 0, 0), rgb(255, 255, 
255));"/></g><g><path d="M 37 220 L 117 220 L 137 260 L 117 300 L 37 300 L 17 260 Z" fill="#ffcc99" stroke="#36393d" stroke-miterlimit="10" pointer-events="all" style="fill: light-dark(rgb(255, 204, 153), rgb(94, 50, 6)); stroke: light-dark(rgb(54, 57, 61), rgb(186, 189, 192));"/></g><g><g transform="translate(-0.5 -0.5)"><switch><foreignObject style="overflow: visible; text-align: left;" pointer-events="none" width="100%" height="100%" requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility"><div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; align-items: unsafe center; justify-content: unsafe center; width: 118px; height: 1px; padding-top: 260px; margin-left: 18px;"><div style="box-sizing: border-box; font-size: 0; text-align: center; color: #000000; "><div style="display: inline-block; font-size: 12px; font-family: &quot;Helvetica&quot;; color: light-dark(#000000, #ffffff); line-height: 1.2; pointer-events: all; white-space: normal; word-wrap: normal; ">remove all data except where injury/death &gt;0</div></div></div></foreignObject><text x="77" y="264" fill="light-dark(#000000, #ffffff)" font-family="&quot;Helvetica&quot;" font-size="12px" text-anchor="middle">remove all data exce...</text></switch></g></g><g><path d="M 567 260 L 572 260 Q 577 260 577 270 L 577 295 Q 577 305 567 305 L 32 305 Q 22 305 22 315 L 22 343.63" fill="none" stroke="#000000" stroke-miterlimit="10" pointer-events="stroke" style="stroke: light-dark(rgb(0, 0, 0), rgb(255, 255, 255));"/><path d="M 22 348.88 L 18.5 341.88 L 22 343.63 L 25.5 341.88 Z" fill="#000000" stroke="#000000" stroke-miterlimit="10" pointer-events="all" style="fill: light-dark(rgb(0, 0, 0), rgb(255, 255, 255)); stroke: light-dark(rgb(0, 0, 0), rgb(255, 255, 255));"/></g><g><rect x="447" y="230" width="120" height="60" rx="9" ry="9" fill="#cce5ff" stroke="#36393d" pointer-events="all" style="fill: light-dark(rgb(204, 229, 255), rgb(24, 46, 68)); stroke: light-dark(rgb(54, 57, 61), rgb(186, 189, 
192));"/></g><g><g transform="translate(-0.5 -0.5)"><switch><foreignObject style="overflow: visible; text-align: left;" pointer-events="none" width="100%" height="100%" requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility"><div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; align-items: unsafe center; justify-content: unsafe center; width: 118px; height: 1px; padding-top: 260px; margin-left: 448px;"><div style="box-sizing: border-box; font-size: 0; text-align: center; color: #000000; "><div style="display: inline-block; font-size: 12px; font-family: &quot;Helvetica&quot;; color: light-dark(#000000, #ffffff); line-height: 1.2; pointer-events: all; white-space: normal; word-wrap: normal; ">df: county_episodes</div></div></div></foreignObject><text x="507" y="264" fill="light-dark(#000000, #ffffff)" font-family="&quot;Helvetica&quot;" font-size="12px" text-anchor="middle">df: county_episodes</text></switch></g></g><g><path d="M 277 260 L 290.63 260" fill="none" stroke="#000000" stroke-miterlimit="10" pointer-events="stroke" style="stroke: light-dark(rgb(0, 0, 0), rgb(255, 255, 255));"/><path d="M 295.88 260 L 288.88 263.5 L 290.63 260 L 288.88 256.5 Z" fill="#000000" stroke="#000000" stroke-miterlimit="10" pointer-events="all" style="fill: light-dark(rgb(0, 0, 0), rgb(255, 255, 255)); stroke: light-dark(rgb(0, 0, 0), rgb(255, 255, 255));"/></g><g><rect x="157" y="230" width="120" height="60" rx="9" ry="9" fill="#cce5ff" stroke="#36393d" pointer-events="all" style="fill: light-dark(rgb(204, 229, 255), rgb(24, 46, 68)); stroke: light-dark(rgb(54, 57, 61), rgb(186, 189, 192));"/></g><g><g transform="translate(-0.5 -0.5)"><switch><foreignObject style="overflow: visible; text-align: left;" pointer-events="none" width="100%" height="100%" requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility"><div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; align-items: unsafe center; justify-content: unsafe center; width: 118px; height: 
1px; padding-top: 260px; margin-left: 158px;"><div style="box-sizing: border-box; font-size: 0; text-align: center; color: #000000; "><div style="display: inline-block; font-size: 12px; font-family: &quot;Helvetica&quot;; color: light-dark(#000000, #ffffff); line-height: 1.2; pointer-events: all; white-space: normal; word-wrap: normal; ">df: severe_events</div></div></div></foreignObject><text x="217" y="264" fill="light-dark(#000000, #ffffff)" font-family="&quot;Helvetica&quot;" font-size="12px" text-anchor="middle">df: severe_events</text></switch></g></g><g><path d="M 132 350 L 150.63 350" fill="none" stroke="#000000" stroke-miterlimit="10" pointer-events="stroke" style="stroke: light-dark(rgb(0, 0, 0), rgb(255, 255, 255));"/><path d="M 155.88 350 L 148.88 353.5 L 150.63 350 L 148.88 346.5 Z" fill="#000000" stroke="#000000" stroke-miterlimit="10" pointer-events="all" style="fill: light-dark(rgb(0, 0, 0), rgb(255, 255, 255)); stroke: light-dark(rgb(0, 0, 0), rgb(255, 255, 255));"/></g><g><path d="M 12 380 L 32 320 L 142 320 L 122 380 Z" fill="#ffff88" stroke="#36393d" stroke-miterlimit="10" pointer-events="all" style="fill: light-dark(rgb(255, 255, 136), rgb(33, 33, 0)); stroke: light-dark(rgb(54, 57, 61), rgb(186, 189, 192));"/></g><g><g transform="translate(-0.5 -0.5)"><switch><foreignObject style="overflow: visible; text-align: left;" pointer-events="none" width="100%" height="100%" requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility"><div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; align-items: unsafe center; justify-content: unsafe center; width: 128px; height: 1px; padding-top: 350px; margin-left: 13px;"><div style="box-sizing: border-box; font-size: 0; text-align: center; color: #000000; "><div style="display: inline-block; font-size: 12px; font-family: &quot;Helvetica&quot;; color: light-dark(#000000, #ffffff); line-height: 1.2; pointer-events: all; white-space: normal; word-wrap: normal; ">count episodes and 
transform</div></div></div></foreignObject><text x="77" y="354" fill="light-dark(#000000, #ffffff)" font-family="&quot;Helvetica&quot;" font-size="12px" text-anchor="middle">count episodes and tr...</text></switch></g></g><g><path d="M 417 260 L 440.63 260" fill="none" stroke="#000000" stroke-miterlimit="10" pointer-events="stroke" style="stroke: light-dark(rgb(0, 0, 0), rgb(255, 255, 255));"/><path d="M 445.88 260 L 438.88 263.5 L 440.63 260 L 438.88 256.5 Z" fill="#000000" stroke="#000000" stroke-miterlimit="10" pointer-events="all" style="fill: light-dark(rgb(0, 0, 0), rgb(255, 255, 255)); stroke: light-dark(rgb(0, 0, 0), rgb(255, 255, 255));"/></g><g><path d="M 317 220 L 397 220 L 417 260 L 397 300 L 317 300 L 297 260 Z" fill="#ffcc99" stroke="#36393d" stroke-miterlimit="10" pointer-events="all" style="fill: light-dark(rgb(255, 204, 153), rgb(94, 50, 6)); stroke: light-dark(rgb(54, 57, 61), rgb(186, 189, 192));"/></g><g><g transform="translate(-0.5 -0.5)"><switch><foreignObject style="overflow: visible; text-align: left;" pointer-events="none" width="100%" height="100%" requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility"><div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; align-items: unsafe center; justify-content: unsafe center; width: 118px; height: 1px; padding-top: 260px; margin-left: 298px;"><div style="box-sizing: border-box; font-size: 0; text-align: center; color: #000000; "><div style="display: inline-block; font-size: 12px; font-family: &quot;Helvetica&quot;; color: light-dark(#000000, #ffffff); line-height: 1.2; pointer-events: all; white-space: normal; word-wrap: normal; ">group by and sum injury/death</div></div></div></foreignObject><text x="357" y="264" fill="light-dark(#000000, #ffffff)" font-family="&quot;Helvetica&quot;" font-size="12px" text-anchor="middle">group by and sum inj...</text></switch></g></g><g><path d="M 277 350 L 290.63 350" fill="none" stroke="#000000" stroke-miterlimit="10" pointer-events="stroke" 
style="stroke: light-dark(rgb(0, 0, 0), rgb(255, 255, 255));"/><path d="M 295.88 350 L 288.88 353.5 L 290.63 350 L 288.88 346.5 Z" fill="#000000" stroke="#000000" stroke-miterlimit="10" pointer-events="all" style="fill: light-dark(rgb(0, 0, 0), rgb(255, 255, 255)); stroke: light-dark(rgb(0, 0, 0), rgb(255, 255, 255));"/></g><g><rect x="157" y="320" width="120" height="60" rx="9" ry="9" fill="#cce5ff" stroke="#36393d" pointer-events="all" style="fill: light-dark(rgb(204, 229, 255), rgb(24, 46, 68)); stroke: light-dark(rgb(54, 57, 61), rgb(186, 189, 192));"/></g><g><g transform="translate(-0.5 -0.5)"><switch><foreignObject style="overflow: visible; text-align: left;" pointer-events="none" width="100%" height="100%" requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility"><div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; align-items: unsafe center; justify-content: unsafe center; width: 118px; height: 1px; padding-top: 350px; margin-left: 158px;"><div style="box-sizing: border-box; font-size: 0; text-align: center; color: #000000; "><div style="display: inline-block; font-size: 12px; font-family: &quot;Helvetica&quot;; color: light-dark(#000000, #ffffff); line-height: 1.2; pointer-events: all; white-space: normal; word-wrap: normal; ">df: annual_episodes</div></div></div></foreignObject><text x="217" y="354" fill="light-dark(#000000, #ffffff)" font-family="&quot;Helvetica&quot;" font-size="12px" text-anchor="middle">df: annual_episodes</text></switch></g></g><g><path d="M 312 445 C 312 436.72 343.34 430 382 430 C 400.57 430 418.37 431.58 431.5 434.39 C 444.63 437.21 452 441.02 452 445 L 452 495 C 452 503.28 420.66 510 382 510 C 343.34 510 312 503.28 312 495 Z" fill="#ffcccc" stroke="#36393d" stroke-miterlimit="10" pointer-events="all" style="fill: light-dark(rgb(255, 204, 204), rgb(87, 43, 43)); stroke: light-dark(rgb(54, 57, 61), rgb(186, 189, 192));"/><path d="M 452 445 C 452 453.28 420.66 460 382 460 C 343.34 460 312 453.28 312 445" 
fill="none" stroke="#36393d" stroke-miterlimit="10" pointer-events="all" style="stroke: light-dark(rgb(54, 57, 61), rgb(186, 189, 192));"/></g><g><g transform="translate(-0.5 -0.5)"><switch><foreignObject style="overflow: visible; text-align: left;" pointer-events="none" width="100%" height="100%" requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility"><div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; align-items: unsafe center; justify-content: unsafe center; width: 138px; height: 1px; padding-top: 483px; margin-left: 313px;"><div style="box-sizing: border-box; font-size: 0; text-align: center; color: #000000; "><div style="display: inline-block; font-size: 12px; font-family: &quot;Helvetica&quot;; color: light-dark(#000000, #ffffff); line-height: 1.2; pointer-events: all; white-space: normal; word-wrap: normal; "><div>PostgresQL: disaster_db</div></div></div></div></foreignObject><text x="382" y="486" fill="light-dark(#000000, #ffffff)" font-family="&quot;Helvetica&quot;" font-size="12px" text-anchor="middle">PostgresQL: disaster_db</text></switch></g></g><g><path d="M 382 390 L 382 423.63" fill="none" stroke="#000000" stroke-miterlimit="10" pointer-events="stroke" style="stroke: light-dark(rgb(0, 0, 0), rgb(255, 255, 255));"/><path d="M 382 428.88 L 378.5 421.88 L 382 423.63 L 385.5 421.88 Z" fill="#000000" stroke="#000000" stroke-miterlimit="10" pointer-events="all" style="fill: light-dark(rgb(0, 0, 0), rgb(255, 255, 255)); stroke: light-dark(rgb(0, 0, 0), rgb(255, 255, 255));"/></g><g><path d="M 317 310 L 447 310 L 467 350 L 447 390 L 317 390 L 297 350 Z" fill="#ffcc99" stroke="#36393d" stroke-miterlimit="10" pointer-events="all" style="fill: light-dark(rgb(255, 204, 153), rgb(94, 50, 6)); stroke: light-dark(rgb(54, 57, 61), rgb(186, 189, 192));"/></g><g><g transform="translate(-0.5 -0.5)"><switch><foreignObject style="overflow: visible; text-align: left;" pointer-events="none" width="100%" height="100%" 
requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility"><div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; align-items: unsafe center; justify-content: unsafe center; width: 168px; height: 1px; padding-top: 350px; margin-left: 298px;"><div style="box-sizing: border-box; font-size: 0; text-align: center; color: #000000; "><div style="display: inline-block; font-size: 12px; font-family: &quot;Helvetica&quot;; color: light-dark(#000000, #ffffff); line-height: 1.2; pointer-events: all; white-space: normal; word-wrap: normal; ">Load into database as table:<br />NOAA_STORM_EPISODES</div></div></div></foreignObject><text x="382" y="354" fill="light-dark(#000000, #ffffff)" font-family="&quot;Helvetica&quot;" font-size="12px" text-anchor="middle">Load into database as table:...</text></switch></g></g><g><path d="M 582 310 L 712 310 L 732 350 L 712 390 L 582 390 L 562 350 Z" fill="#ffcc99" stroke="#36393d" stroke-miterlimit="10" pointer-events="all" style="fill: light-dark(rgb(255, 204, 153), rgb(94, 50, 6)); stroke: light-dark(rgb(54, 57, 61), rgb(186, 189, 192));"/></g><g><g transform="translate(-0.5 -0.5)"><switch><foreignObject style="overflow: visible; text-align: left;" pointer-events="none" width="100%" height="100%" requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility"><div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; align-items: unsafe center; justify-content: unsafe center; width: 168px; height: 1px; padding-top: 350px; margin-left: 563px;"><div style="box-sizing: border-box; font-size: 0; text-align: center; color: #000000; "><div style="display: inline-block; font-size: 12px; font-family: &quot;Helvetica&quot;; color: light-dark(#000000, #ffffff); line-height: 1.2; pointer-events: all; white-space: normal; word-wrap: normal; ">Load into database as table:<br />NOAA_STORM_EVENTS</div></div></div></foreignObject><text x="647" y="354" fill="light-dark(#000000, #ffffff)" font-family="&quot;Helvetica&quot;" 
font-size="12px" text-anchor="middle">Load into database as table:...</text></switch></g></g><g><path d="M 562 350 L 517 350 Q 507 350 507 360 L 507 460 Q 507 470 497 470 L 458.37 470" fill="none" stroke="#000000" stroke-miterlimit="10" pointer-events="stroke" style="stroke: light-dark(rgb(0, 0, 0), rgb(255, 255, 255));"/><path d="M 453.12 470 L 460.12 466.5 L 458.37 470 L 460.12 473.5 Z" fill="#000000" stroke="#000000" stroke-miterlimit="10" pointer-events="all" style="fill: light-dark(rgb(0, 0, 0), rgb(255, 255, 255)); stroke: light-dark(rgb(0, 0, 0), rgb(255, 255, 255));"/></g></g><switch><g requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility"/><a transform="translate(0,-5)" xlink:href="https://www.drawio.com/doc/faq/svg-export-text-problems" target="_blank"><text text-anchor="middle" font-size="10px" x="50%" y="100%">Text is not SVG - cannot display</text></a></switch></svg>

In [6]:
# Get file list where type is StormEvents_details and year is 1999-2025
files = dbt.get_ftp_filenames("ftp://ftp.ncei.noaa.gov", "/pub/data/swdi/stormevents/csvfiles/")

# Filter for StormEvents_details files from 1999-2025
pattern = r'StormEvents_details-ftp_v1\.0_d(\d{4})_c.*\.csv\.gz'
selected_files = []
    
for file in files:
    match = re.match(pattern, file)
    if match:
        year = int(match.group(1))
        if 1999 <= year <= 2025:
            selected_files.append(file)


# Report the selected files
print(f"Selected {len(selected_files)} StormEvents_details files:")
for filename in selected_files:
    print(filename)

Selected 27 StormEvents_details files:
StormEvents_details-ftp_v1.0_d2006_c20250520.csv.gz
StormEvents_details-ftp_v1.0_d2013_c20250520.csv.gz
StormEvents_details-ftp_v1.0_d2020_c20250702.csv.gz
StormEvents_details-ftp_v1.0_d2016_c20250818.csv.gz
StormEvents_details-ftp_v1.0_d2018_c20250520.csv.gz
StormEvents_details-ftp_v1.0_d2024_c20250818.csv.gz
StormEvents_details-ftp_v1.0_d2015_c20250818.csv.gz
StormEvents_details-ftp_v1.0_d2017_c20250520.csv.gz
StormEvents_details-ftp_v1.0_d2021_c20250520.csv.gz
StormEvents_details-ftp_v1.0_d2025_c20250818.csv.gz
StormEvents_details-ftp_v1.0_d2019_c20250520.csv.gz
StormEvents_details-ftp_v1.0_d1999_c20250520.csv.gz
StormEvents_details-ftp_v1.0_d2014_c20250520.csv.gz
StormEvents_details-ftp_v1.0_d2000_c20250520.csv.gz
StormEvents_details-ftp_v1.0_d2012_c20250520.csv.gz
StormEvents_details-ftp_v1.0_d2001_c20250520.csv.gz
StormEvents_details-ftp_v1.0_d2011_c20250520.csv.gz
StormEvents_details-ftp_v1.0_d2002_c20250520.csv.gz
StormEvents_details-ftp_v
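
The filenames carry a four-digit year after `d` (the value the cell above filters on) and an eight-digit date after `c`, which looks like a file creation date. The listing also comes back in no particular order. As a minimal sketch, both fields could be parsed so the selection can be sorted by year; `parse_storm_filename` and `select_details_files` are hypothetical names and not part of `db_tools`.

```python
import re
from datetime import date, datetime

# Hypothetical helpers for illustration; not part of db_tools.
DETAILS_RE = re.compile(
    r"StormEvents_details-ftp_v1\.0_d(\d{4})_c(\d{8})\.csv\.gz$"
)


def parse_storm_filename(filename):
    """Return (data_year, creation_date), or None if the name does not match."""
    match = DETAILS_RE.match(filename)
    if match is None:
        return None
    year = int(match.group(1))
    created = datetime.strptime(match.group(2), "%Y%m%d").date()
    return year, created


def select_details_files(filenames, first_year, last_year):
    """Keep details files within [first_year, last_year], sorted by data year."""
    parsed = []
    for name in filenames:
        info = parse_storm_filename(name)
        if info is not None and first_year <= info[0] <= last_year:
            parsed.append((info[0], name))
    return [name for _, name in sorted(parsed)]


sample = [
    "StormEvents_details-ftp_v1.0_d2006_c20250520.csv.gz",
    "StormEvents_details-ftp_v1.0_d1999_c20250520.csv.gz",
    "StormEvents_details-ftp_v1.0_d1950_c20250520.csv.gz",
    "README",
]
print(select_details_files(sample, 1999, 2025))
```

Sorting is optional for the concatenation that follows, but it makes the progress log easier to scan.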

In [7]:
# Download every file in selected_files and collect the results in one DataFrame for cleaning

base_url = "ftp://ftp.ncei.noaa.gov/pub/data/swdi/stormevents/csvfiles/"
all_storm_data = []

print(f"Processing {len(selected_files)} Storm Events files...")

for i, filename in enumerate(selected_files, 1):
    try:
        # Construct full URL
        full_url = base_url + filename

        # Stream file to DataFrame
        df = dbt.ftp_to_df(full_url, compression="gzip")

        if not df.empty:
            # Add year column for reference
            year = re.search(r"d(\d{4})", filename).group(1)
            df["FILE_YEAR"] = int(year)

            all_storm_data.append(df)
            print(f"{i:2d}/{len(selected_files)}: {year} - {len(df)} rows")
        else:
            print(f"{i:2d}/{len(selected_files)}: {filename} - No data")

    except Exception as e:
        print(f"Error processing {filename}: {e}")

# Concatenate all DataFrames
if all_storm_data:
    df_all_storms = pd.concat(all_storm_data, ignore_index=True)
    print(
        f"\nCombined DataFrame: {len(df_all_storms)} total rows, {len(df_all_storms.columns)} columns"
    )
    print(
        f"Years covered: {df_all_storms['FILE_YEAR'].min()} - {df_all_storms['FILE_YEAR'].max()}"
    )

else:
    print("No data was successfully loaded")

Processing 27 Storm Events files...


  df = pd.read_csv(bio, **kwargs)


Streamed StormEvents_details-ftp_v1.0_d2006_c20250520.csv.gz: 56400 rows, 51 columns
 1/27: 2006 - 56400 rows
Streamed StormEvents_details-ftp_v1.0_d2013_c20250520.csv.gz: 59986 rows, 51 columns
 2/27: 2013 - 59986 rows
Streamed StormEvents_details-ftp_v1.0_d2020_c20250702.csv.gz: 61278 rows, 51 columns
 3/27: 2020 - 61278 rows
Streamed StormEvents_details-ftp_v1.0_d2016_c20250818.csv.gz: 56005 rows, 51 columns
 4/27: 2016 - 56005 rows
Streamed StormEvents_details-ftp_v1.0_d2018_c20250520.csv.gz: 62697 rows, 51 columns
 5/27: 2018 - 62697 rows
Streamed StormEvents_details-ftp_v1.0_d2024_c20250818.csv.gz: 69493 rows, 51 columns
 6/27: 2024 - 69493 rows
Streamed StormEvents_details-ftp_v1.0_d2015_c20250818.csv.gz: 57907 rows, 51 columns
 7/27: 2015 - 57907 rows
Streamed StormEvents_details-ftp_v1.0_d2017_c20250520.csv.gz: 57029 rows, 51 columns
 8/27: 2017 - 57029 rows
Streamed StormEvents_details-ftp_v1.0_d2021_c20250520.csv.gz: 61389 rows, 51 columns
 9/27: 2021 - 61389 rows
Streamed S
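
`dbt.ftp_to_df` is a project helper whose source is not shown in this notebook. The output above suggests it reads each download into an in-memory buffer and hands it to `pd.read_csv`. A rough sketch of that pattern, under that assumption, might look like the following. `gz_bytes_to_df` and `fetch_to_df` are hypothetical names, and `low_memory=False` is one commonly used option when pandas warns about mixed column types in large files.

```python
import gzip
import io
import urllib.request

import pandas as pd


def gz_bytes_to_df(raw, **kwargs):
    """Decompress gzip-compressed CSV bytes and parse them with pandas."""
    bio = io.BytesIO(gzip.decompress(raw))
    return pd.read_csv(bio, **kwargs)


def fetch_to_df(url, **kwargs):
    """Download a .csv.gz file into memory and return it as a DataFrame."""
    with urllib.request.urlopen(url) as response:
        raw = response.read()
    return gz_bytes_to_df(raw, **kwargs)


# Offline demonstration with a tiny in-memory file (made-up rows).
csv_text = "EVENT_ID,STATE_FIPS,CZ_FIPS\n1,18,51\n2,8,2\n"
demo = gz_bytes_to_df(gzip.compress(csv_text.encode("utf-8")), low_memory=False)
print(demo.shape)
```

Splitting the download from the parsing keeps the parsing step testable without network access.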

In [8]:
# Show basic info
df_all_storms.head()

Unnamed: 0,BEGIN_YEARMONTH,BEGIN_DAY,BEGIN_TIME,END_YEARMONTH,END_DAY,END_TIME,EPISODE_ID,EVENT_ID,STATE,STATE_FIPS,...,END_AZIMUTH,END_LOCATION,BEGIN_LAT,BEGIN_LON,END_LAT,END_LON,EPISODE_NARRATIVE,EVENT_NARRATIVE,DATA_SOURCE,FILE_YEAR
0,200604,7,1515,200604,7,1515,1207534,5501658,INDIANA,18.0,...,E,PATOKA,38.41667,-87.51667,38.33333,-87.35,,"At Wheeling, the windows were blown out of a c...",PDS,2006
1,200601,1,0,200601,31,2359,1202408,5482463,COLORADO,8.0,...,,,,,,,The storm track favored northwest Colorado wit...,,PDS,2006
2,200601,1,0,200601,31,2359,1202408,5482464,COLORADO,8.0,...,,,,,,,The storm track favored northwest Colorado wit...,,PDS,2006
3,200601,1,0,200601,31,2359,1202408,5482465,COLORADO,8.0,...,,,,,,,The storm track favored northwest Colorado wit...,,PDS,2006
4,200601,1,0,200601,31,2359,1202408,5482466,COLORADO,8.0,...,,,,,,,The storm track favored northwest Colorado wit...,,PDS,2006


In [9]:
# Keep only the columns needed downstream to reduce memory usage
keep_cols = [
    'BEGIN_YEARMONTH', 'BEGIN_DAY', 'EPISODE_ID', 'EVENT_ID', 'EVENT_TYPE',
    'CZ_FIPS', 'STATE_FIPS', 'INJURIES_DIRECT', 'INJURIES_INDIRECT',
    'DEATHS_DIRECT', 'DEATHS_INDIRECT', 'DAMAGE_PROPERTY',
]
df_all_storms_drop = df_all_storms[keep_cols]
df_all_storms_drop.head()

Unnamed: 0,BEGIN_YEARMONTH,BEGIN_DAY,EPISODE_ID,EVENT_ID,EVENT_TYPE,CZ_FIPS,STATE_FIPS,INJURIES_DIRECT,INJURIES_INDIRECT,DEATHS_DIRECT,DEATHS_INDIRECT,DAMAGE_PROPERTY
0,200604,7,1207534,5501658,Thunderstorm Wind,51,18.0,0,0,0,0,60K
1,200601,1,1202408,5482463,Drought,2,8.0,0,0,0,0,
2,200601,1,1202408,5482464,Drought,7,8.0,0,0,0,0,
3,200601,1,1202408,5482465,Drought,4,8.0,0,0,0,0,
4,200601,1,1202408,5482466,Drought,13,8.0,0,0,0,0,


In [10]:
# combine BEGIN_YEARMONTH and BEGIN_DAY into a single DATE column and convert to datetime, drop original columns
# create YEAR column for filtering later

df_all_storms_comb = df_all_storms_drop.copy()

df_all_storms_comb['BEGIN_YEARMONTH'] = df_all_storms_comb['BEGIN_YEARMONTH'].astype(str)
df_all_storms_comb['BEGIN_DAY'] = df_all_storms_comb['BEGIN_DAY'].astype(str).str.zfill(2)
df_all_storms_comb['DATE'] = df_all_storms_comb['BEGIN_YEARMONTH'] + df_all_storms_comb['BEGIN_DAY']
df_all_storms_comb['DATE'] = pd.to_datetime(df_all_storms_comb['DATE'], format='%Y%m%d')
df_all_storms_comb.drop(columns=['BEGIN_YEARMONTH', 'BEGIN_DAY'], inplace=True)
df_all_storms_comb['YEAR'] = df_all_storms_comb['DATE'].dt.year
df_all_storms_comb.head()

Unnamed: 0,EPISODE_ID,EVENT_ID,EVENT_TYPE,CZ_FIPS,STATE_FIPS,INJURIES_DIRECT,INJURIES_INDIRECT,DEATHS_DIRECT,DEATHS_INDIRECT,DAMAGE_PROPERTY,DATE,YEAR
0,1207534,5501658,Thunderstorm Wind,51,18.0,0,0,0,0,60K,2006-04-07,2006
1,1202408,5482463,Drought,2,8.0,0,0,0,0,,2006-01-01,2006
2,1202408,5482464,Drought,7,8.0,0,0,0,0,,2006-01-01,2006
3,1202408,5482465,Drought,4,8.0,0,0,0,0,,2006-01-01,2006
4,1202408,5482466,Drought,13,8.0,0,0,0,0,,2006-01-01,2006


In [11]:
# Combine state and county FIPS into a single 5-digit code (CO_FIPS).
# Missing values become 99 (state) and 999 (county), so an unknown county reads 99999.
# STATE_FIPS and CZ_FIPS are kept, but overwritten in place as zero-padded strings.
df_all_storms_comb["STATE_FIPS"] = (
    pd.to_numeric(df_all_storms_comb["STATE_FIPS"], errors="coerce")
    .fillna(99)
    .astype(int)
    .astype(str)
    .str.zfill(2)
)
df_all_storms_comb["CZ_FIPS"] = (
    pd.to_numeric(df_all_storms_comb["CZ_FIPS"], errors="coerce")
    .fillna(999)
    .astype(int)
    .astype(str)
    .str.zfill(3)
)
df_all_storms_comb["CO_FIPS"] = (
    df_all_storms_comb["STATE_FIPS"] + df_all_storms_comb["CZ_FIPS"]
)
df_all_storms_comb.head()

Unnamed: 0,EPISODE_ID,EVENT_ID,EVENT_TYPE,CZ_FIPS,STATE_FIPS,INJURIES_DIRECT,INJURIES_INDIRECT,DEATHS_DIRECT,DEATHS_INDIRECT,DAMAGE_PROPERTY,DATE,YEAR,CO_FIPS
0,1207534,5501658,Thunderstorm Wind,51,18,0,0,0,0,60K,2006-04-07,2006,18051
1,1202408,5482463,Drought,2,8,0,0,0,0,,2006-01-01,2006,8002
2,1202408,5482464,Drought,7,8,0,0,0,0,,2006-01-01,2006,8007
3,1202408,5482465,Drought,4,8,0,0,0,0,,2006-01-01,2006,8004
4,1202408,5482466,Drought,13,8,0,0,0,0,,2006-01-01,2006,8013


In [12]:
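# Clean FIPS codes: keep only values from 01001 to 56045 and drop the 99-prefixed unknown placeholders.
# This removes codes affected by historical changes and non-populated areas (marine, unincorporated, etc.).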

df_clean = df_all_storms_comb.copy()
df_clean = df_clean[
    (df_clean['CO_FIPS'] >= '01001') & 
    (df_clean['CO_FIPS'] <= '56045') &
    (~df_clean['CO_FIPS'].str.startswith('99'))
].copy()
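
The range filter compares `CO_FIPS` values as strings. That behaves like a numeric range check only because every code was zero-padded to five characters in the previous cell. A small standalone illustration follows; the function names are hypothetical and the values are examples.

```python
def make_co_fips(state_fips, county_fips):
    """Build a 5-character code: 2-digit state + 3-digit county, zero-padded."""
    return f"{int(state_fips):02d}{int(county_fips):03d}"


def in_county_range(co_fips, low="01001", high="56045"):
    """String comparison; valid only for equal-length, zero-padded codes."""
    return low <= co_fips <= high


colorado = make_co_fips(8.0, 2)  # STATE_FIPS arrives as a float in the raw data
unknown = make_co_fips(99, 999)  # placeholder used for missing values

print(colorado, in_county_range(colorado))
print(unknown, in_county_range(unknown))
# Without padding the comparison goes wrong: "8002" sorts after "56045".
print(in_county_range("8002"))
```

Since any string starting with `99` already sorts above `56045`, the `startswith('99')` condition in the cell above is redundant but harmless.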

In [13]:
# Filter data to only include events with direct deaths or injuries
severe_events = df_clean[
    (df_clean["DEATHS_DIRECT"] > 0) | (df_clean["INJURIES_DIRECT"] > 0)
].copy()

# Group by county FIPS, episode and year so each county-episode appears exactly once
county_episodes = (severe_events.groupby(["CO_FIPS", "EPISODE_ID", "YEAR"]).agg(
        {
            "DEATHS_DIRECT": "sum",  # Total deaths in this episode for this county
            "INJURIES_DIRECT": "sum",  # Total injuries in this episode for this county
            "EVENT_TYPE": lambda x: ", ".join(sorted(set(x))),  # Combined event types
            "DATE": "first",  # Representative date
        }
    )
    .reset_index()
)

# count unique episodes per county-year for Poisson λ parameter
annual_episodes = county_episodes.groupby(["CO_FIPS", "YEAR"]).size().reset_index(name='event_count')
annual_episodes.columns = ['county_fips', 'year', 'event_count']
annual_episodes.sample(10, random_state=36)

Unnamed: 0,county_fips,year,event_count
562,2204,2008,1
1536,6061,2012,1
9251,37107,2016,1
3114,12202,2018,3
5251,21035,2000,1
322,1077,2000,1
3491,13191,2008,1
3115,12202,2020,3
1517,6059,2002,4
3864,17021,2007,1
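
Because `annual_episodes` is built from `severe_events`, it only contains county-years with at least one qualifying episode. If the counts are later used to estimate a Poisson rate λ per county (the sample mean of the yearly counts is the maximum-likelihood estimate), years with no episodes probably need to be added back as zeros. Otherwise λ is biased upward. A minimal sketch with made-up numbers, assuming the study window is a fixed range of years:

```python
import pandas as pd

# Made-up counts in the same shape as annual_episodes.
annual = pd.DataFrame(
    {
        "county_fips": ["06059", "06059", "12086"],
        "year": [2000, 2002, 2001],
        "event_count": [4, 1, 3],
    }
)

years = range(2000, 2004)  # assumed study window: 2000 through 2003
counties = annual["county_fips"].unique()

# Build every county-year pair and fill the missing ones with zero.
full_index = pd.MultiIndex.from_product(
    [counties, years], names=["county_fips", "year"]
)
filled = (
    annual.set_index(["county_fips", "year"])["event_count"]
    .reindex(full_index, fill_value=0)
    .reset_index()
)

# Poisson MLE for each county: mean episodes per year, zeros included.
lam = filled.groupby("county_fips")["event_count"].mean()
print(lam)
```

Counties with no qualifying episode in any year are absent from `annual_episodes` altogether, so they would have to come from a separate county list.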


In [14]:
annual_episodes.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13791 entries, 0 to 13790
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   county_fips  13791 non-null  object
 1   year         13791 non-null  int32 
 2   event_count  13791 non-null  int64 
dtypes: int32(1), int64(1), object(1)
memory usage: 269.5+ KB


In [15]:
annual_episodes.describe()

Unnamed: 0,year,event_count
count,13791.0,13791.0
mean,2010.548619,1.319049
std,7.648862,1.053403
min,1999.0,1.0
25%,2004.0,1.0
50%,2010.0,1.0
75%,2017.0,1.0
max,2025.0,21.0


In [16]:

# Load both DataFrames into the database, replacing any existing tables of the same name
dbt.load_data(annual_episodes, "NOAA_STORM_EPISODES", if_exists="replace")
dbt.load_data(df_clean, "NOAA_STORM_EVENTS", if_exists="replace")

Created SQLAlchemy engine for disaster_db
Data loaded successfully into NOAA_STORM_EPISODES
Created SQLAlchemy engine for disaster_db
Data loaded successfully into NOAA_STORM_EVENTS
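
`dbt.load_data` is also a project helper. The log lines suggest it creates a SQLAlchemy engine and writes the DataFrame to a table. A comparable sketch using `DataFrame.to_sql` is shown below, with an in-memory SQLite connection standing in for the PostgreSQL `disaster_db`. That substitution is an assumption made so the example runs without credentials, and the rows are made up.

```python
import sqlite3

import pandas as pd

# Made-up rows in the same shape as annual_episodes.
episodes = pd.DataFrame(
    {
        "county_fips": ["06059", "01077"],
        "year": [2002, 2000],
        "event_count": [4, 1],
    }
)

# In-memory SQLite stands in for the real database connection.
conn = sqlite3.connect(":memory:")
episodes.to_sql("NOAA_STORM_EPISODES", conn, if_exists="replace", index=False)

# Read the table back to confirm what was stored.
check = pd.read_sql_query(
    'SELECT county_fips, year, event_count FROM "NOAA_STORM_EPISODES" ORDER BY year',
    conn,
)
print(check)
conn.close()
```

Storing `county_fips` as text keeps leading zeros such as `01077`; an integer column would drop them.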
