# Prediction metrics for Flow with Rain

This notebook contains a baseline model together with metrics for measuring its performance.
It is based on a dataset with two series:
 * Water flow
 * Rainfall data
 
This dataset has 36 months of data. 

Our model will predict the flow for the next 24 hours. We train the model on all data up to the predicted day, and then validate the prediction on that day, which the model has not seen during training.
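The evaluation scheme described above can be sketched as a loop over days. This is only an illustration under stated assumptions: the toy data, the `evaluate_day` helper, the mean-flow predictor and the choice of mean absolute error are all hypothetical stand-ins, not the model or metric used in this notebook.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real dataset: 5 days at 5-minute resolution.
time = pd.date_range('2015-04-21', periods=5 * 288, freq='5min')
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    'time': time,
    'rainfall': np.zeros(len(time)),
    'flow': 100 + rng.normal(0, 5, len(time)),
})

def evaluate_day(df, day):
    """Train on everything before `day`, score the 24 hours starting at `day`."""
    next_day = day + pd.Timedelta(1, 'D')
    train = df[df.time < day]
    test = df[(df.time >= day) & (df.time < next_day)]
    # Placeholder model (assumption): predict the mean flow of the training period.
    prediction = np.full(len(test), train['flow'].mean())
    # Mean absolute error, in the same units as the flow series.
    return np.mean(np.abs(test['flow'].to_numpy() - prediction))

# Score three consecutive days, each time training only on earlier data.
days = pd.date_range('2015-04-23', '2015-04-25', freq='D')
scores = pd.Series([evaluate_day(toy, d) for d in days], index=days)
print(scores)
```

Because every day is scored by a model that saw only earlier data, the per-day errors can be averaged without leaking future information into training.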

In [6]:
import datetime
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import rcParams
rcParams['figure.figsize'] = 12, 4

## Load dataset

In [5]:
dataset = pd.read_csv('../datasets/flow-rain.csv.gz', compression='gzip', parse_dates=['time'])
dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 315648 entries, 0 to 315647
Data columns (total 3 columns):
time        315648 non-null datetime64[ns]
rainfall    310775 non-null float64
flow        315609 non-null float64
dtypes: datetime64[ns](1), float64(2)
memory usage: 7.2 MB
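The `info()` output above reports fewer non-null values than rows for both `rainfall` and `flow`, so some measurements are missing. A minimal sketch of how such gaps could be counted is shown below; the `toy` frame is made-up data for illustration, and only the three counts in the final lines come from the output above.

```python
import numpy as np
import pandas as pd

# Tiny made-up frame with a few gaps, standing in for the real dataset.
toy = pd.DataFrame({
    'time': pd.date_range('2015-04-25', periods=6, freq='5min'),
    'rainfall': [0.0, np.nan, 0.2, 0.0, np.nan, 0.0],
    'flow': [101.0, 102.5, np.nan, 99.0, 98.5, 100.0],
})

# Number of missing values per column.
missing = toy[['rainfall', 'flow']].isna().sum()
print(missing)

# The same arithmetic applied to the counts reported by dataset.info().
total_rows = 315648
print(total_rows - 310775)  # missing rainfall values: 4873
print(total_rows - 315609)  # missing flow values: 39
```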


## Helper functions

In [17]:
def split(df, day):
    """
    Split the dataset into a training set and a test set on a given day.

    All rows before `day` go into the training set, while rows from `day`
    itself (a 24-hour window) are used to create the test set.
    """
    # End of the test window (exclusive)
    next_day = day + pd.Timedelta(1, 'D')
    train = df[df.time < day]
    test = df[(df.time >= day) & (df.time < next_day)]
    # Features: timestamp and rainfall; target: flow
    X_train = train[['time', 'rainfall']]
    Y_train = train['flow']
    X_test = test[['time', 'rainfall']]
    Y_test = test['flow']
    return X_train, Y_train, X_test, Y_test

X_train, Y_train, X_test, Y_test = split(dataset, pd.Timestamp('2015-04-25'))
X_test.tail()

                     time  rainfall
50683 2015-04-25 23:35:00       0.0
50684 2015-04-25 23:40:00       0.0
50685 2015-04-25 23:45:00       0.0
50686 2015-04-25 23:50:00       0.0
50687 2015-04-25 23:55:00       0.0
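The tail above ends at 23:55, which suggests the test window covers exactly one day at 5-minute steps. A quick sanity check of the window boundaries might look like the sketch below. It runs on synthetic data (the `toy` frame is an assumption, not the real dataset) and repeats the `split` logic only so that the block runs on its own.

```python
import numpy as np
import pandas as pd

def split(df, day):
    # Same logic as the split() helper above, repeated to keep this block standalone.
    next_day = day + pd.Timedelta(1, 'D')
    train = df[df.time < day]
    test = df[(df.time >= day) & (df.time < next_day)]
    return (train[['time', 'rainfall']], train['flow'],
            test[['time', 'rainfall']], test['flow'])

# Synthetic data: 3 days at 5-minute steps, starting the day before the split.
time = pd.date_range('2015-04-24', periods=3 * 288, freq='5min')
toy = pd.DataFrame({
    'time': time,
    'rainfall': np.zeros(len(time)),
    'flow': np.linspace(90, 110, len(time)),
})

day = pd.Timestamp('2015-04-25')
X_train, Y_train, X_test, Y_test = split(toy, day)

# One day at 5-minute steps should give 24 * 12 = 288 test rows.
print(len(X_test))
# Training data must end strictly before the test day starts.
print(X_train.time.max())
print(X_test.time.min(), X_test.time.max())
```

If the real series has gaps in its timestamps, the row count for a day can be lower than 288, so a check like this is best read as a diagnostic rather than a hard requirement.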
