Hydrological Reference Stations (HRS) Example
==
This notebook demonstrates loading time series data into a PhilDB instance.
It shows how timeseries, measurand and source records are added to the database before the data are loaded as time series instances.

All instructions in this notebook are relative to the examples/hrs/ directory of the phildb project.

Before running the code, the HRS data set needs to be downloaded:

In [1]:
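import os
import requests

# Create the download directory (no error if it already exists, so the cell can be re-run).
os.makedirs('hrs_data', exist_ok=True)

# Station metadata; each entry in ['stations']['features'] describes one station.
r = requests.get('http://www.bom.gov.au/water/hrs/content/config/site_config.json')
r.raise_for_status()
hrs_metadata = r.json()

for station in hrs_metadata['stations']['features']:
    station_id = station['properties']['AWRC_ID']
    filename = '{0}_daily_ts.csv'.format(station_id)

    url = "http://www.bom.gov.au/water/hrs/content/data/{0}/{1}".format(station_id, filename)

    #print("Downloading: {0}".format(url))

    # Fail loudly rather than saving an HTTP error page as a CSV file.
    csv_response = requests.get(url)
    csv_response.raise_for_status()
    with open(os.path.join('hrs_data', filename), 'w') as f:
        f.write(csv_response.text)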
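
The URL construction in the download loop depends only on the `AWRC_ID` property of each station feature, so it can be pulled out into a pure function and checked without touching the network. This is a minimal sketch: `hrs_download_urls` is a hypothetical helper, and the sample dictionary is made up to contain only the keys the loop reads.

```python
def hrs_download_urls(site_config):
    """Map each AWRC station ID in an HRS site_config dict to its daily CSV URL.

    Hypothetical helper: it relies only on the
    ['stations']['features'][i]['properties']['AWRC_ID'] path used in the loop.
    """
    base = "http://www.bom.gov.au/water/hrs/content/data/{0}/{1}"
    urls = {}
    for station in site_config['stations']['features']:
        station_id = station['properties']['AWRC_ID']
        filename = '{0}_daily_ts.csv'.format(station_id)
        urls[station_id] = base.format(station_id, filename)
    return urls


# Made-up config containing only the keys the download loop reads.
sample_config = {'stations': {'features': [
    {'properties': {'AWRC_ID': 'X0001'}},
    {'properties': {'AWRC_ID': 'X0002'}},
]}}
urls = hrs_download_urls(sample_config)
print(urls['X0002'])
```

Keeping the URL logic separate like this makes it easy to print or test the download list before fetching a few hundred files.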
        

Now that we have the data, we can do some standard imports:

In [2]:
import os
import datetime
import pandas as pd

Create a new PhilDB database
--

The next snippet shows how to create a PhilDB database using the `create` function. As an alternative to the code below, the `phil-create` command-line tool can be used (e.g. `phil-create hrs_db`).

In [3]:
from phildb.create import create
create('hrs_db')

Accessing a PhilDB instance
--

Once a PhilDB database has been created, it can be accessed using the `PhilDB` database class. After importing the class, create a database instance (stored here in the `db` variable) as follows:

In [4]:
from phildb.database import PhilDB
db = PhilDB('hrs_db')

Initialise source and measurand attributes
--

Now that the 'hrs_db' database has been created and connected to, we can initialise the source and measurand attributes used to identify HRS time series instances.

In [5]:
db.add_measurand('Q', 'STREAMFLOW', 'Streamflow')
db.add_source('BOM_HRS', 'Bureau of Meteorology; Hydrological Reference Stations dataset.')

Set a frequency variable `freq` to 'D' to indicate daily data, and an `hrs_header_len` variable giving the number of header lines at the top of each CSV file (26 here) so that they can be handled later:

In [6]:
freq = 'D'
hrs_header_len = 26
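
Because the header length is hard-coded, a change to the CSV layout would silently shift the parsing. One option is to detect it. The data are later read with `index_col='Date'`, so the column-header row must contain a `Date` column, and the number of lines before that row is the header length. This sketch assumes that row begins with the literal text `Date`. `find_header_len` is a hypothetical helper, and the sample lines are invented.

```python
def find_header_len(lines):
    """Return the number of lines preceding the first line that starts with 'Date'.

    Assumes (hypothetically) that the CSV column-header row begins with 'Date',
    consistent with the index_col='Date' used when parsing.
    """
    for i, line in enumerate(lines):
        if line.startswith('Date'):
            return i
    raise ValueError("no line starting with 'Date' found")


# Invented example: two metadata lines, then the column-header row, then data.
sample_lines = [
    '# example metadata line 1\n',
    '# example metadata line 2\n',
    'Date,Flow (ML)\n',
    '1990-01-01,1.5\n',
]
print(find_header_len(sample_lines))  # 2
```

Because an open text file iterates over its lines, the same function can be applied directly to a real file, e.g. `with open(path) as f: hrs_header_len = find_header_len(f)`.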

Create a function that reads an HRS CSV file and returns its header text together with the daily flow as a pandas Series (this simplifies the for loop that does the actual data loading):

In [7]:
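def read_hrs_series(filename):
    """Return (header text, daily flow Series) for one HRS CSV file."""
    with open(filename) as datafile:
        # Keep the leading header lines verbatim so they can be passed to add_timeseries_instance.
        header = ''.join(next(datafile) for _ in range(hrs_header_len))

    # Parse the remaining rows, using the 'Date' column as a DatetimeIndex.
    df = pd.read_csv(filename, parse_dates=True, index_col='Date', skiprows=hrs_header_len)

    return header, df['Flow (ML)']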
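
To see what the function returns without downloading anything, the same parsing steps can be applied to an in-memory string. This is a sketch: `parse_hrs_text` is a hypothetical variant of `read_hrs_series` that takes the text directly, and the two-line header and flow values are invented.

```python
import io

import pandas as pd


def parse_hrs_text(text, header_len):
    """Split HRS-style CSV text into (header string, flow Series).

    Hypothetical in-memory variant of read_hrs_series, for illustration only.
    """
    lines = text.splitlines(keepends=True)
    header = ''.join(lines[:header_len])
    df = pd.read_csv(io.StringIO(text), parse_dates=True,
                     index_col='Date', skiprows=header_len)
    return header, df['Flow (ML)']


# Invented sample: two header lines, a column-header row, then three days of data.
sample = (
    '# example header line 1\n'
    '# example header line 2\n'
    'Date,Flow (ML)\n'
    '2000-01-01,10.0\n'
    '2000-01-02,12.5\n'
    '2000-01-03,11.0\n'
)
header, flow = parse_hrs_text(sample, header_len=2)
print(header)
print(flow)
```

The header comes back as a single string and the flow as a Series indexed by date, which is the shape the loading loop passes on to PhilDB.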

Get a list of the daily CSV files in the hrs_data directory:

In [8]:
datafiles = [f for f in os.listdir('hrs_data') if f.endswith('_daily_ts.csv')]
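
The loading loop derives each station ID from the part of the filename before the first underscore. The same file listing and ID extraction can be written with `pathlib`. This is a sketch: `station_files` is a hypothetical helper, and the temporary directory and file names exist only for the demonstration.

```python
import pathlib
import tempfile


def station_files(data_dir):
    """Return {station_id: path} for every '*_daily_ts.csv' file in data_dir."""
    files = {}
    for path in sorted(pathlib.Path(data_dir).glob('*_daily_ts.csv')):
        # Station ID is everything before the first underscore in the filename.
        files[path.name.split('_')[0]] = path
    return files


with tempfile.TemporaryDirectory() as tmp:
    # Invented file names; the .txt file should be ignored by the glob.
    for name in ['X0001_daily_ts.csv', 'X0002_daily_ts.csv', 'notes.txt']:
        (pathlib.Path(tmp) / name).write_text('')
    found = station_files(tmp)
    print(sorted(found))  # ['X0001', 'X0002']
```

This relies on station IDs never containing an underscore, the same assumption the `filename.split('_')[0]` call makes.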

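For each file, take the station ID from the filename, read the header and flow series, register the station as a timeseries, add a daily streamflow (`Q`) instance for it from the `BOM_HRS` source, and write the flow data to that instance:

In [9]: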
for filename in datafiles:
    #print("Processing file: ", filename, '...')
    station_id = filename.split('_')[0]
    #print("Using station ID: ", station_id, '...')

    header, streamflow = read_hrs_series(os.path.join('hrs_data', filename))
    db.add_timeseries(station_id)
    db.add_timeseries_instance(station_id, freq, header, measurand='Q', source='BOM_HRS')
    db.write(station_id, freq, streamflow, measurand='Q', source='BOM_HRS')
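
Each instance is declared with daily frequency (`freq = 'D'`), so it can be worth checking a series before writing it. Duplicate dates or missing days would suggest the file did not parse as expected. This sketch uses plain pandas. `daily_gaps` is a hypothetical helper, not part of PhilDB, and the sample series is invented.

```python
import pandas as pd


def daily_gaps(series):
    """Return the dates missing from a daily series between its first and last date.

    Raises ValueError if the index contains duplicate dates.
    """
    if not series.index.is_unique:
        raise ValueError('duplicate dates in index')
    full_range = pd.date_range(series.index.min(), series.index.max(), freq='D')
    return full_range.difference(series.index)


# Invented series with 2000-01-03 missing.
s = pd.Series([1.0, 2.0, 4.0],
              index=pd.to_datetime(['2000-01-01', '2000-01-02', '2000-01-04']))
print(list(daily_gaps(s)))  # one missing date: 2000-01-03
```

Inside the loading loop, a call such as `daily_gaps(streamflow)` before `db.write` would flag stations worth inspecting.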

Open the newly created PhilDB for interactive exploration:

    phil hrs_db

An example script, autocorr.py, shows how analysis of the data in a PhilDB
instance can be automated. The script iterates over all of the available
streamflow time series instances from the BOM_HRS source and calls the pandas
`Series.autocorr` method on each one. Stations with an autocorrelation greater
than or equal to 0.95 are then printed.

The script can be run with:

    python autocorr.py hrs_db

Such analysis could be performed from the interactive phil shell as well.
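
The selection step that autocorr.py is described as performing can be sketched with pandas alone. This is not the contents of autocorr.py. It assumes the series have already been read into a dict keyed by station ID and uses `Series.autocorr` with its default lag of 1. `high_autocorr_stations` is a hypothetical name, and the example series are invented.

```python
import pandas as pd


def high_autocorr_stations(series_by_station, threshold=0.95, lag=1):
    """Return station IDs whose series has lag-`lag` autocorrelation >= threshold."""
    selected = []
    for station_id, series in series_by_station.items():
        # Series.autocorr is the Pearson correlation of the series with itself shifted by `lag`.
        if series.autocorr(lag=lag) >= threshold:
            selected.append(station_id)
    return selected


example = {
    'smooth': pd.Series([float(x) for x in range(10)]),  # steadily rising: autocorr ~1.0
    'alternating': pd.Series([1.0, -1.0] * 5),           # flips sign each step: autocorr ~-1.0
}
print(high_autocorr_stations(example))  # ['smooth']
```

A constant series has an undefined (NaN) autocorrelation, and since `NaN >= threshold` is False such stations are simply not selected.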