# Flood Prediction Model 1 - Hourly Discharge at Caboolture River

The objective of this notebook is to build a flood prediction model for the Caboolture River at Upper Caboolture. The target time series is hourly discharge, and a single hourly rainfall time series is the input variable. The model's predictions will be compared with those of [RORB](https://www.monash.edu/engineering/departments/civil/research/themes/water/rorb).

## Benchmark Model

The [RORB](https://www.monash.edu/engineering/departments/civil/research/themes/water/rorb) model is generally used to calculate design flood discharges. It relies on many assumptions and is manually calibrated to a single flooding event. It serves as the benchmark model for comparison.

## Data Set
The hydrological data comes from the [Queensland Water Monitoring Information Portal](https://water-monitoring.information.qld.gov.au/). Hourly and daily water flow data are available at various stations, and rainfall data is available at some of them. For the Caboolture River, only a single discharge and rainfall station is available.



### Workflow

* Preprocessing and exploring the data
* Creating training and test sets of time series
* Formatting data as JSON files and uploading to S3
* Instantiating and training a DeepAR estimator
* Deploying a model and creating a predictor
* Comparing the predictor's performance with RORB's
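
The training/test split and JSON formatting steps listed above can be sketched as follows. This is a minimal illustration, not the notebook's actual implementation: the helper names (`to_deepar_record`, `train_test_records`) and the toy values are hypothetical. It assumes the JSON Lines layout that DeepAR accepts (`start`, `target`, and an optional `dynamic_feat` list), with rainfall as the single dynamic feature, and the common convention of keeping the full series as the test set while dropping the last `prediction_length` points for training.

```python
import json
from datetime import datetime


def to_deepar_record(start, discharge, rainfall):
    """Build one record: start timestamp, target series, one dynamic feature."""
    if len(discharge) != len(rainfall):
        raise ValueError("target and dynamic feature must have the same length")
    return {
        "start": start.strftime("%Y-%m-%d %H:%M:%S"),
        "target": list(discharge),
        "dynamic_feat": [list(rainfall)],  # one inner list per feature
    }


def train_test_records(start, discharge, rainfall, prediction_length):
    """Test record keeps the full series; training record drops the tail."""
    if not 0 < prediction_length < len(discharge):
        raise ValueError("prediction_length must be between 1 and len(series) - 1")
    test = to_deepar_record(start, discharge, rainfall)
    train = to_deepar_record(
        start,
        discharge[:-prediction_length],
        rainfall[:-prediction_length],
    )
    return train, test


# Hypothetical toy values (not real station data)
start = datetime(2019, 1, 1, 0)
discharge = [0.5, 0.6, 1.2, 3.4, 2.1, 1.0]   # hourly discharge
rainfall = [0.0, 2.0, 8.5, 1.0, 0.0, 0.0]    # hourly rainfall

train, test = train_test_records(start, discharge, rainfall, prediction_length=2)

# Each time series becomes one line of a JSON Lines file
train_line = json.dumps(train)
test_line = json.dumps(test)
```

Writing one such line per time series to a file gives the JSON Lines input that would then be uploaded to S3.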

# Preprocessing and exploring the data

The raw data for the Caboolture River is in the `raw_data/Caboolture` folder.

In [5]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline


In [48]:
# Caboolture directory
data_dir = './raw_data/Caboolture'
target_name = '142001A_20191103_flow' # folder name containing target time series

# All entries in the Caboolture folder (zip archives are filtered in the loop below)
zip_files = os.listdir(data_dir)

# List to store path to the csv files
other_csv_paths = [] 
target_csv_paths = []

for zip_file in zip_files:
    if zip_file.endswith('.zip'):
        # Directory name to extract contents of zip file
        zip_dir = data_dir + '/' + zip_file.split('.')[0]
        
        # Delete directory if it exists
        if os.path.exists(zip_dir):
            !rm -r {zip_dir}
        
        # Create dir and unzip
        ! mkdir {zip_dir}
        ! unzip {data_dir}/{zip_file} -d {zip_dir}
        print('unzipped {}!'.format(zip_file))
        
        # Save path to csv
        for file_name in os.listdir(zip_dir):
            if file_name.endswith('.csv'):
                if target_name in zip_dir:
                    target_csv_paths.append('{}/{}'.format(zip_dir,file_name))
                else:
                    other_csv_paths.append('{}/{}'.format(zip_dir,file_name))

print(target_csv_paths)
print(other_csv_paths)

Archive:  ./raw_data/Caboolture/142001A_20191103_rain.zip
  inflating: ./raw_data/Caboolture/142001A_20191103_rain/142001A.csv  
  inflating: ./raw_data/Caboolture/142001A_20191103_rain/Copyright.pdf  
  inflating: ./raw_data/Caboolture/142001A_20191103_rain/Disclaimer.pdf  
  inflating: ./raw_data/Caboolture/142001A_20191103_rain/webglossary.pdf  
unzipped 142001A_20191103_rain.zip!
Archive:  ./raw_data/Caboolture/142001A_20191103_flow.zip
  inflating: ./raw_data/Caboolture/142001A_20191103_flow/142001A.csv  
  inflating: ./raw_data/Caboolture/142001A_20191103_flow/Copyright.pdf  
  inflating: ./raw_data/Caboolture/142001A_20191103_flow/Disclaimer.pdf  
  inflating: ./raw_data/Caboolture/142001A_20191103_flow/webglossary.pdf  
unzipped 142001A_20191103_flow.zip!
['./raw_data/Caboolture/142001A_20191103_flow/142001A.csv']
['./raw_data/Caboolture/142001A_20191103_rain/142001A.csv']
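
The extraction cell relies on the `rm`, `mkdir` and `unzip` shell commands, so it only runs inside IPython on a system that provides them. As a hedged alternative, the same steps can be sketched with the standard library alone. The function name `extract_csv_paths` is hypothetical and not part of the original notebook; it assumes the same folder layout and the same rule that archives whose folder name contains `target_name` hold the target series.

```python
import os
import shutil
import zipfile


def extract_csv_paths(data_dir, target_name):
    """Unzip every archive in data_dir and collect the paths of the CSV files.

    Returns (target_csv_paths, other_csv_paths).
    """
    target_csv_paths, other_csv_paths = [], []

    for zip_file in sorted(os.listdir(data_dir)):
        if not zip_file.endswith('.zip'):
            continue

        # Extract into a folder named after the archive (without ".zip")
        zip_dir = os.path.join(data_dir, os.path.splitext(zip_file)[0])

        # Start from a clean folder if a previous run left one behind
        if os.path.exists(zip_dir):
            shutil.rmtree(zip_dir)

        # extractall creates zip_dir if it does not exist
        with zipfile.ZipFile(os.path.join(data_dir, zip_file)) as archive:
            archive.extractall(zip_dir)

        # Sort each CSV into the target list or the list of other inputs
        for file_name in sorted(os.listdir(zip_dir)):
            if file_name.endswith('.csv'):
                csv_path = os.path.join(zip_dir, file_name)
                if target_name in zip_dir:
                    target_csv_paths.append(csv_path)
                else:
                    other_csv_paths.append(csv_path)

    return target_csv_paths, other_csv_paths
```

With the variables defined in the cell above, it would be called as `target_csv_paths, other_csv_paths = extract_csv_paths(data_dir, target_name)`.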


In [49]:


target_raw_data = pd.read_csv(target_csv_paths[0])


True