# Flood Stats
All data

## Goals
10/31 Meeting
- Clean data
- Get daily max values (creek values)
- Separate into 3 flood levels, plus a minimum operating limit:
    - Minor flood stage: 8.5 ft
    - Moderate flood stage: 9.5 ft
    - Major flood stage: 11 ft
    - Minimum operating limit: 0.5 ft
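
One possible way to apply the stage thresholds above is to label each gauge reading with `pd.cut`. This is a minimal sketch, not the notebook's method: the helper name `label_flood_stage`, the "none" label, and the sample readings are assumptions made for illustration. It also assumes each stage includes its lower bound (8.5 ft counts as minor).

```python
import pandas as pd

# Stage thresholds in feet, taken from the goals list
MINOR_FT, MODERATE_FT, MAJOR_FT = 8.5, 9.5, 11.0

def label_flood_stage(gauge: pd.Series) -> pd.Series:
    """Label each gauge reading with a flood stage (hypothetical helper)."""
    bins = [-float("inf"), MINOR_FT, MODERATE_FT, MAJOR_FT, float("inf")]
    labels = ["none", "minor", "moderate", "major"]
    # right=False closes each bin on the left, so 8.5 ft falls in "minor"
    return pd.cut(gauge, bins=bins, labels=labels, right=False)

# Made-up readings for illustration only
sample = pd.Series([2.39, 8.5, 9.49, 9.5, 11.0, 15.12])
stages = label_flood_stage(sample)
print(stages.tolist())
```

Labeling in one pass keeps the three stages mutually exclusive, which is harder to guarantee when each subset is filtered separately.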

### Setup

In [35]:
import pandas as pd
from pandas import DataFrame
import numpy as np
import seaborn as sns
import matplotlib
from matplotlib import pyplot as plt
%matplotlib inline

In [36]:
# Display entire dataframe
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)

In [37]:
cd '/Users/shifraisaacs/Documents/Externship/cgi_flood_prediction_mitigation'

/Users/shifraisaacs/Documents/Externship/cgi_flood_prediction_mitigation


### Load Data

In [38]:
combined = pd.read_csv('combined_rainfall_discharge_gauge.csv', index_col=0)
print(combined.shape)
combined.head()

(131199, 7)


Unnamed: 0,agency_cd_x,guage,discharge,Date,hour,StID,rainfall
0,USGS,2.39,23.2,2007-10-01,1,RA101,0.0
1,USGS,2.38,22.6,2007-10-01,2,RA101,0.0
2,USGS,2.37,21.9,2007-10-01,3,RA101,0.0
3,USGS,2.35,20.6,2007-10-01,4,RA101,0.0
4,USGS,2.32,18.9,2007-10-01,5,RA101,0.0


### Data Cleaning and Manipulation

In [39]:
# Lowercase the column names, reorder the columns, and fix the 'guage' spelling mistake
combined.columns = combined.columns.str.lower()
combined = combined[['date', 'stid', 'guage', 'discharge', 'rainfall']]
combined = combined.rename(columns={'guage': 'gauge'})
combined.head()

Unnamed: 0,date,stid,gauge,discharge,rainfall
0,2007-10-01,RA101,2.39,23.2,0.0
1,2007-10-01,RA101,2.38,22.6,0.0
2,2007-10-01,RA101,2.37,21.9,0.0
3,2007-10-01,RA101,2.35,20.6,0.0
4,2007-10-01,RA101,2.32,18.9,0.0


In [40]:
# Convert inches to feet; rounding to 2 decimals sends readings below about 0.06 in to 0
combined['rainfall_ft'] = round(combined['rainfall']/12, 2)

In [41]:
combined['date'] = pd.to_datetime(combined['date'], infer_datetime_format=True)

In [42]:
combined.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 131199 entries, 0 to 131198
Data columns (total 6 columns):
 #   Column       Non-Null Count   Dtype         
---  ------       --------------   -----         
 0   date         131199 non-null  datetime64[ns]
 1   stid         131199 non-null  object        
 2   gauge        131199 non-null  float64       
 3   discharge    131199 non-null  float64       
 4   rainfall     131199 non-null  float64       
 5   rainfall_ft  131199 non-null  float64       
dtypes: datetime64[ns](1), float64(4), object(1)
memory usage: 7.0+ MB


## Data Dictionary
- date: date of measurement
- hour: hour of measurement (in the raw file; dropped from the combined dataframe during cleaning)
- stid: regional location
- gauge: water level in feet
- discharge: discharge level in cubic feet per second
- rainfall: rainfall level in inches
- rainfall_ft: rainfall level in feet

### Data Sources
- [Gauge and Discharge](https://waterdata.usgs.gov/monitoring-location/01464000/#parameterCode=00060&startDT=2005-10-01&endDT=2022-10-18)
- [Rainfall](https://njdep.rutgers.edu/rainfall/)

In [43]:
# Confirm no null values
combined.isnull().sum()

date           0
stid           0
gauge          0
discharge      0
rainfall       0
rainfall_ft    0
dtype: int64

### Descriptive Stats

In [44]:
combined.describe()

Unnamed: 0,gauge,discharge,rainfall,rainfall_ft
count,131199.0,131199.0,131199.0,131199.0
mean,3.242591,160.276956,0.005587,0.000373
std,0.758934,213.167199,0.034964,0.002962
min,2.17,10.1,0.0,0.0
25%,2.77,58.4,0.0,0.0
50%,3.08,103.0,0.0,0.0
75%,3.47,179.0,0.0,0.0
max,15.12,5820.0,1.7,0.14


### Correlational Analysis
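Gauge and discharge are strongly correlated (r about 0.95). Rainfall is only weakly correlated with either (r about 0.17). The correlation between rainfall and rainfall_ft is 0.97 rather than 1.0 because rainfall_ft is rounded to two decimals.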

In [45]:
combined.corr()

Unnamed: 0,gauge,discharge,rainfall,rainfall_ft
gauge,1.0,0.951102,0.174789,0.159878
discharge,0.951102,1.0,0.166233,0.152832
rainfall,0.174789,0.166233,1.0,0.967911
rainfall_ft,0.159878,0.152832,0.967911,1.0
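
The effect of rounding on a unit conversion can be checked on a small example. The sketch below uses made-up rainfall readings (an assumption, not values from the dataset) and compares the correlation of inches against feet with and without rounding to two decimals.

```python
import pandas as pd

# Made-up hourly rainfall readings in inches, for illustration only
rain_in = pd.Series([0.0, 0.01, 0.03, 0.05, 0.2, 0.8, 1.7])

rain_ft_rounded = (rain_in / 12).round(2)  # rounded to two decimals
rain_ft_exact = rain_in / 12               # no rounding

corr_exact = rain_in.corr(rain_ft_exact)
corr_rounded = rain_in.corr(rain_ft_rounded)

print(rain_ft_rounded.tolist())
print(corr_exact, corr_rounded)
```

Without rounding, the conversion is a pure rescaling and the correlation is 1 up to floating-point error. With rounding, the small readings collapse to zero and the correlation drops below 1.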


### Group data to get max levels for each day

In [46]:
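# Aggregate with max() to match the goal of daily maxima; the preview below came from a mean() run and shows daily averages
daily_max = combined.groupby(by='date')[['gauge', 'discharge', 'rainfall', 'rainfall_ft']].max().reset_index()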
daily_max.head()

Unnamed: 0,date,gauge,discharge,rainfall,rainfall_ft
0,2007-10-01,2.355652,21.147826,0.0,0.0
1,2007-10-02,2.342083,20.345833,0.0,0.0
2,2007-10-03,2.329583,19.633333,0.0,0.0
3,2007-10-04,2.34125,20.3125,0.0,0.0
4,2007-10-05,2.3625,21.629167,0.0,0.0


In [48]:
daily_max.to_csv('Data/Daily_Max_Vals.csv')
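
To make the difference between a daily maximum and a daily mean concrete, here is a minimal sketch on made-up hourly readings. The frame `hourly` and the output column names are assumptions for illustration. It uses pandas named aggregation so both statistics sit side by side.

```python
import pandas as pd

# Made-up hourly readings for two days, for illustration only
hourly = pd.DataFrame({
    "date": pd.to_datetime(["2007-10-01", "2007-10-01", "2007-10-02", "2007-10-02"]),
    "gauge": [2.39, 2.32, 2.41, 2.35],
    "discharge": [23.2, 18.9, 24.0, 20.6],
})

# as_index=False keeps 'date' as a regular column, like reset_index()
daily = hourly.groupby("date", as_index=False).agg(
    gauge_max=("gauge", "max"),
    gauge_mean=("gauge", "mean"),
    discharge_max=("discharge", "max"),
)
print(daily)
```

A daily maximum always matches a value that was actually recorded, while a daily mean usually carries more decimal places than the readings. That makes the two easy to tell apart in a preview.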

## Create data subsets

In [49]:
# Minimum operating limit
op_limit = combined[combined['gauge'] >= 0.5]
op_limit.head()

Unnamed: 0,date,stid,gauge,discharge,rainfall,rainfall_ft
0,2007-10-01,RA101,2.39,23.2,0.0,0.0
1,2007-10-01,RA101,2.38,22.6,0.0,0.0
2,2007-10-01,RA101,2.37,21.9,0.0,0.0
3,2007-10-01,RA101,2.35,20.6,0.0,0.0
4,2007-10-01,RA101,2.32,18.9,0.0,0.0


In [51]:
# Minor flood stage: at least 8.5 ft and below 9.5 ft
# pandas combines conditions with & on boolean masks; && is not valid Python
minor_flood = combined[(combined['gauge'] >= 8.5) & (combined['gauge'] < 9.5)]
minor_flood.head()
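
The three stage subsets can be built with boolean masks. In pandas, conditions are combined with `&` (element-wise AND) rather than `&&`, and each condition needs its own parentheses. The sketch below runs on made-up gauge readings. The frame `df` and the half-open ranges (lower bound included, upper bound excluded) are assumptions for illustration.

```python
import pandas as pd

# Made-up gauge readings in feet, for illustration only
df = pd.DataFrame({"gauge": [2.39, 8.5, 9.0, 9.5, 10.99, 11.0, 15.12]})

# Each comparison yields a boolean Series; & combines them row by row
minor = df[(df["gauge"] >= 8.5) & (df["gauge"] < 9.5)]
moderate = df[(df["gauge"] >= 9.5) & (df["gauge"] < 11)]
major = df[df["gauge"] >= 11]

print(len(minor), len(moderate), len(major))
```

Because the ranges do not overlap, every reading at or above 8.5 ft lands in exactly one subset.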

## Data Visualization