## Exploring Flash Flood Data

This notebook provides basic information on the flood severity dataset (relevant to task two). This data is taken from the 

In [53]:
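import os

import bokeh as bk
import chartify
import pandas as pd

# Data directory: everything before the first "flood_forecast" in the working directory, plus "data".
data_path = os.path.abspath("").split("flood_forecast")[0] + "data"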

In [54]:
flood_severity_df = pd.read_csv(os.path.join(data_path,"usgs_event/shave_all.csv"))
flood_severity_df.tail()

Unnamed: 0,ID_TAG,START_UTC,START_UNIX,END_UTC,END_UNIX,START_LAT,START_LON,IMPACT1,IMPACT2,IMPACT3,...,FLOOD_EVAC,CONTACT_RE,FLOOD_FREQ,LANDUSE_CO,LAND_USE,POPULATION,SLOPE,FLOW_ACCU,CTI,HU
9364,1219183208KM,8/19/2008 9:58:00 PM,1219183000.0,8/19/2008 9:58:00 PM,1219183000.0,33.903,-98.4806,1,1,1,...,,Questionable data point - time,0,13,urban_and_built-up,341.534,0.19,342.0,11.54,11
9365,1219183230TM,8/19/2008 9:59:00 PM,1219183000.0,8/19/2008 9:59:00 PM,1219183000.0,33.9457,-98.4461,1,1,1,...,,Questionable data point - time,0,11,cropland,31.7786,0.56,2.0,5.72,11
9366,1219183352KM,8/19/2008 10:00:00 PM,1219183000.0,8/19/2008 10:00:00 PM,1219183000.0,33.9027,-98.4501,1,1,1,...,,Questionable data point - time,0,10,grassland,118.329,0.25,1.0,6.11,11
9367,1219183395KM,8/19/2008 10:02:00 PM,1219183000.0,8/19/2008 10:02:00 PM,1219183000.0,33.9212,-98.4308,1,1,1,...,,Questionable data point - time,0,11,cropland,18.5963,0.36,0.0,5.06,11
9368,1219183397TM,8/19/2008 10:02:00 PM,1219183000.0,8/19/2008 10:02:00 PM,1219183000.0,33.9815,-98.5284,1,1,1,...,,Questionable data point - time,0,11,cropland,1082.92,0.13,0.0,6.13,11
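
The START_UTC and END_UTC columns in the preview above are strings in a 12-hour month/day/year format, while START_UNIX and END_UNIX hold seconds since the epoch. A minimal sketch of converting both to timezone-aware datetimes with the standard library is given here. The format string is inferred from the five rows shown and may not hold for every row, and the helper names are hypothetical.

```python
from datetime import datetime, timezone

# Format inferred from the rows shown, e.g. "8/19/2008 9:58:00 PM" (assumption).
SHAVE_TIME_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def parse_shave_utc(text):
    """Parse a START_UTC/END_UTC string into a timezone-aware UTC datetime."""
    return datetime.strptime(text, SHAVE_TIME_FORMAT).replace(tzinfo=timezone.utc)


def from_unix(seconds):
    """Convert a START_UNIX/END_UNIX value (seconds since the epoch) to UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


start = parse_shave_utc("8/19/2008 9:58:00 PM")
approx = from_unix(1219183000.0)  # value as displayed in the preview

print(start.isoformat())
print(approx.isoformat())
```

The Unix value shown in the preview appears to be rounded for display, so the two results agree on the date but need not match to the minute.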


Taken from https://www.nssl.noaa.gov/projects/flash/database/2016v1/shave_impacts_metadata.txt

Id_tag				Denotes the call time and caller (for internal use) [In 2010, this field was changed to serve as a unique report identifier containing a number denoting when each report was taken in chronological order]

Start_UTC			Event start date and time in UTC

Start_UNIX			Event start time in Unix time

End_UTC				Event end date and time in UTC

End_UNIX			Event end time in Unix time

Start_lat			Latitude of report

Start_lon			Longitude of report

Impact1				Most severe recorded impact. 
				
				Impact categories are classified from the least to the most severe as follows:
				1 = no impact (SHAVE 'null report')
				2 = other (unclassified or unknown impact)
				3 = overflow (streams out of their banks)
				4 = greenlands flooding (includes cropland, pastures, yards, grassland)
				5 = street/road flooding
				6 = road closure (or impassable)
				7 = inundation (floodwaters in buildings/homes, including basements)
				8 = evacuation
				9 = stranded cars (e.g., moved by floodwaters, stalled in ditches, ...)
				10 = rescue, fatality, or injury

Impact2				Second most severe recorded impact

Impact3				Third most severe recorded impact

Metr_comments			Additional comments about the call (pertaining to meteorological events)

Report_type			2 = severe
				1 = non-severe
				0 = null
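
The integer codes described above are easier to read in tables and plots once they are mapped to short labels. The sketch below transcribes the two code lists into dictionaries. The dictionary and function names are hypothetical, and the labels are shortened versions of the metadata descriptions.

```python
# Labels transcribed (and shortened) from the metadata listing above.
IMPACT_LABELS = {
    1: "no impact",
    2: "other",
    3: "overflow",
    4: "greenlands flooding",
    5: "street/road flooding",
    6: "road closure",
    7: "inundation",
    8: "evacuation",
    9: "stranded cars",
    10: "rescue, fatality or injury",
}

REPORT_TYPE_LABELS = {0: "null", 1: "non-severe", 2: "severe"}


def describe_impacts(impact1, impact2, impact3):
    """Return labels for a report's three impact codes, most severe first."""
    return [IMPACT_LABELS[code] for code in (impact1, impact2, impact3)]


# The rows in the preview all have IMPACT1 = IMPACT2 = IMPACT3 = 1.
print(describe_impacts(1, 1, 1))
```

In the notebook, passing the same dictionary to `flood_severity_df["IMPACT1"].map(IMPACT_LABELS)` would give a readable label column.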

In [37]:
impacted_grouped_df = flood_severity_df.groupby('IMPACT1')
count_impacted = impacted_grouped_df["ID_TAG"].count().reset_index()
count_impacted

Unnamed: 0,IMPACT1,ID_TAG
0,1,6872
1,2,374
2,3,238
3,4,962
4,5,233
5,6,215
6,7,371
7,8,40
8,9,33
9,10,31


In [47]:
#TODO add bar chart

As the counts above show, the dataset is heavily imbalanced: 6,872 of the 9,369 reports (about 73%) have an IMPACT1 level of 1 (no impact).
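
To put numbers on the imbalance, the sketch below turns the IMPACT1 counts from the output above into shares of the total. The counts are copied by hand from that output, and the ratio at the end is just one rough way to summarise imbalance, not a measure the notebook itself uses.

```python
# IMPACT1 counts copied from the grouped output above.
impact_counts = {
    1: 6872, 2: 374, 3: 238, 4: 962, 5: 233,
    6: 215, 7: 371, 8: 40, 9: 33, 10: 31,
}

total = sum(impact_counts.values())
shares = {level: count / total for level, count in impact_counts.items()}

for level, share in shares.items():
    print(f"IMPACT1={level:>2}: {share:6.1%}")

# Ratio between the most and least frequent levels (rough imbalance measure).
imbalance_ratio = max(impact_counts.values()) / min(impact_counts.values())
print(f"most/least frequent: {imbalance_ratio:.0f}x")
```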

In [20]:
flood_severity_df.groupby('LAND_USE')["ID_TAG"].count()

LAND_USE
closed_shrubland                 54
cropland                       4130
deciduous_broadleaf_forest      363
evergreen_needleleaf_forest      80
grassland                       715
mixed_forest                    224
open_shrubland                   90
urban_and_built-up             1139
water                            52
wooded_grassland               1793
woodland                        729
Name: ID_TAG, dtype: int64

In [24]:
flood_severity_df.groupby(['LAND_USE', "IMPACT1"])['START_UTC'].count()

LAND_USE                     IMPACT1
closed_shrubland             1            34
                             2             6
                             3             3
                             4             3
                             5             4
                             6             1
                             7             3
cropland                     1          2963
                             2           141
                             3           114
                             4           527
                             5           102
                             6           109
                             7           134
                             8            14
                             9            11
                             10           15
deciduous_broadleaf_forest   1           260
                             2            17
                             3            15
                             4            32
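
The long grouped output above is easier to compare across land-use classes when pivoted into one row per class and one column per impact level. The sketch below does this for the two classes whose counts appear in full. The variable names are hypothetical, and treating a missing combination as a count of zero is an assumption.

```python
import pandas as pd

# Counts copied from the grouped output above; only the two land-use
# classes that are listed in full are included.
rows = [
    ("closed_shrubland", 1, 34), ("closed_shrubland", 2, 6),
    ("closed_shrubland", 3, 3), ("closed_shrubland", 4, 3),
    ("closed_shrubland", 5, 4), ("closed_shrubland", 6, 1),
    ("closed_shrubland", 7, 3),
    ("cropland", 1, 2963), ("cropland", 2, 141), ("cropland", 3, 114),
    ("cropland", 4, 527), ("cropland", 5, 102), ("cropland", 6, 109),
    ("cropland", 7, 134), ("cropland", 8, 14), ("cropland", 9, 11),
    ("cropland", 10, 15),
]
long_df = pd.DataFrame(rows, columns=["LAND_USE", "IMPACT1", "count"])

# One row per land-use class, one column per impact level; combinations
# absent from the listing are filled with 0 (assumption).
wide = (
    long_df.pivot(index="LAND_USE", columns="IMPACT1", values="count")
    .fillna(0)
    .astype(int)
)

# Share of each impact level within a land-use class.
row_share = wide.div(wide.sum(axis=1), axis=0)
print(row_share.round(3))
```

On the full DataFrame, `pd.crosstab(flood_severity_df["LAND_USE"], flood_severity_df["IMPACT1"], normalize="index")` should produce the same kind of table directly.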
                  

In [52]:
usgs_event_df = pd.read_csv(os.path.join(data_path,"usgs_event/01.csv"))
usgs_event_df.tail()

Unnamed: 0,GaugeID,Lat,Lon,Start Time (UTC),End Time (UTC),Peak Q (cms),Peak Time (UTC),Delta Time (h)
4395,1100000,42.645833,-71.298889,2011-03-13 14:00:00,2011-03-14 21:00:00,1039.2283,2011-03-14 03:30:00,13.5
4396,1100000,42.645833,-71.298889,2011-03-19 14:00:00,2011-03-21 11:15:00,1084.5352,2011-03-20 12:45:00,22.75
4397,1100000,42.645833,-71.298889,2011-04-18 16:45:00,2011-04-20 01:30:00,1047.7233,2011-04-19 05:15:00,12.5
4398,1100000,42.645833,-71.298889,2014-04-01 04:30:00,2014-04-03 00:30:00,1070.3768,2014-04-01 18:00:00,13.5
4399,1100000,42.645833,-71.298889,2014-04-10 00:45:00,2014-04-11 02:00:00,1019.4065,2014-04-10 09:45:00,9.0
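
In the five rows shown above, `Delta Time (h)` matches the number of hours between `Start Time (UTC)` and `Peak Time (UTC)`. The sketch below recomputes it from the timestamps as a consistency check. This reading of the column is inferred from these rows only, not from documentation, and `hours_to_peak` is a hypothetical helper.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"

# (start, peak, reported Delta Time in hours), copied from the rows above.
events = [
    ("2011-03-13 14:00:00", "2011-03-14 03:30:00", 13.5),
    ("2011-03-19 14:00:00", "2011-03-20 12:45:00", 22.75),
    ("2011-04-18 16:45:00", "2011-04-19 05:15:00", 12.5),
    ("2014-04-01 04:30:00", "2014-04-01 18:00:00", 13.5),
    ("2014-04-10 00:45:00", "2014-04-10 09:45:00", 9.0),
]


def hours_to_peak(start, peak):
    """Hours elapsed between the event start and its peak discharge."""
    start_dt = datetime.strptime(start, TIME_FORMAT)
    peak_dt = datetime.strptime(peak, TIME_FORMAT)
    return (peak_dt - start_dt).total_seconds() / 3600


for start, peak, reported in events:
    print(start, hours_to_peak(start, peak), reported)
```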


In [55]:
usgs_meta_df = pd.read_csv(os.path.join(data_path,"usgs_meta_csv/01.csv"))

In [56]:
usgs_meta_df.tail()

Unnamed: 0,GaugeID,Lat,Lon,HUC,Agency,Regulated,Gauge Name,Drainage Area (km2),Contributing Drainage Area (km2),Q_2 (cms),...,Q_10 (cms),Q_25 (cms),Q_50 (cms),Q_100 (cms),Q_200 (cms),Q_500 (cms),Action (cms),Minor (cms),Moderate (cms),Major (cms)
374,1209700,41.163611,-73.419722,1100006,USGS,Undefined,"NORWALK R AT SOUTH WILTON, CT.",77.6996,77.6996,33.9802,...,90.6139,130.2575,164.2377,198.2179,240.6932,339.8022,-1.0,-1.0,-1.0,-1.0
375,1209761,41.174444,-73.511944,1100006,USGS,Undefined,"FIVEMILE RIVER NEAR NEW CANAAN, CT.",2.59,0.0,-1.0,...,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0
376,1209901,41.065556,-73.549722,1100006,USGS,Undefined,"RIPPOWAM RIVER AT STAMFORD, CT.",88.0596,0.0,-1.0,...,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0
377,4296000,44.868889,-72.270556,1110000,USGS,Undefined,"BLACK RIVER AT COVENTRY, VT",315.9785,0.0,59.7485,...,86.9327,100.808,111.002,121.4793,131.9565,146.3981,75.0396,107.8872,-0.0283,-0.0283
378,4296500,44.940278,-72.189722,1110000,USGS,Undefined,"CLYDE RIVER AT NEWPORT, VT",367.7783,0.0,-1.0,...,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0
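
Many of the discharge columns above contain -1.0 (and, in one row, -0.0283), which look like placeholders rather than real flows. The sketch below replaces negative discharge values with NaN so they do not distort summary statistics. Treating negatives as "not available" is an assumption that the source does not confirm, and the small frame is a hand-copied subset of the preview.

```python
import pandas as pd

# Three rows and a subset of columns copied from the preview above.
meta = pd.DataFrame(
    {
        "GaugeID": [1209700, 1209761, 4296000],
        "Q_100 (cms)": [198.2179, -1.0, 121.4793],
        "Action (cms)": [-1.0, -1.0, 75.0396],
        "Major (cms)": [-1.0, -1.0, -0.0283],
    }
)

# Discharge columns are identified by their "(cms)" unit suffix.
flow_columns = [col for col in meta.columns if col.endswith("(cms)")]

# Assumption: a negative discharge marks a missing value.
cleaned = meta.copy()
cleaned[flow_columns] = cleaned[flow_columns].mask(cleaned[flow_columns] < 0)

print(cleaned)
print(cleaned[flow_columns].notna().sum())
```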
