# Basic Traffic Forecasting Tutorial
In this tutorial we use the Flow Forecast library to perform some basic traffic flow forecasting. In other notebooks we go over how to use saved models and more complex parameter configurations.

Flow Forecast is a general-purpose deep learning package for time series forecasting, written in PyTorch.

In [None]:
!git clone https://github.com/AIStream-Peelout/flow-forecast  # add a branch flag (e.g. -b remove_versions) to clone a custom branch
import os 
os.chdir('flow-forecast')
!pip install -r requirements.txt
!python setup.py install develop
from flood_forecast.trainer import train_function

## Step One: Install and authenticate
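In this first step we install the library (the cell above) and authenticate with Weights and Biases (W&B). Our code also has built-in Google Cloud Platform (GCP) integration: setting the environment variables below stores model weights and JSON config files in a GCP bucket automatically.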

In [None]:
#!pip install --upgrade --force-reinstall wandb
!wandb login
# To store your model weights and JSON config files in GCP automatically, uncomment these lines:
# os.environ["MODEL_BUCKET"] = "my-gcp-bucket-name"
# os.environ["ENVIRONMENT_GCP"] = "Colab"
# os.environ["GCP_PROJECT"] = "project_id"


In [None]:
!wget -O train.csv https://raw.githubusercontent.com/xiaochus/TrafficFlowPrediction/master/data/train.csv 

--2020-10-16 17:36:22--  https://raw.githubusercontent.com/xiaochus/TrafficFlowPrediction/master/data/train.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 199681 (195K) [text/plain]
Saving to: ‘train.csv’


2020-10-16 17:36:23 (3.77 MB/s) - ‘train.csv’ saved [199681/199681]



In [None]:
# We perform only minimal preprocessing: adding the day of the week as a feature.
import pandas as pd
import datetime

df = pd.read_csv("train.csv")
# Timestamps are day-first, e.g. "01/03/2016 00:00" is 1 March 2016
df["day_of_week"] = df["5 Minutes"].map(lambda x: datetime.datetime.strptime(x, "%d/%m/%Y %H:%M").weekday())
# Keep a copy of the timestamps in a column named "datetime"
df["datetime"] = df["5 Minutes"]
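The `strptime` lambda above parses one row at a time. A vectorized alternative, sketched below under the assumption that every value in `5 Minutes` uses the day-first `%d/%m/%Y %H:%M` format, parses the whole column once with `pd.to_datetime` and an explicit `format`. The explicit format matters: in this dataset `01/03/2016` is 1 March, while a month-first reading turns it into 3 January. The two sample rows are made up for illustration.

```python
import pandas as pd

# Two illustrative rows in the same day-first format as the "5 Minutes" column
sample = pd.DataFrame({"5 Minutes": ["01/03/2016 00:00", "24/02/2016 12:05"]})

# Parse the column once with an explicit day-first format instead of calling strptime per row
parsed = pd.to_datetime(sample["5 Minutes"], format="%d/%m/%Y %H:%M")
sample["day_of_week"] = parsed.dt.weekday  # Monday=0 ... Sunday=6

# Reading the same strings month-first gives a different date, or NaT when the "month" is invalid
month_first = pd.to_datetime(sample["5 Minutes"], format="%m/%d/%Y %H:%M", errors="coerce")

print(sample)
print(month_first.tolist())
```

Both approaches produce the same `day_of_week` values; the vectorized one simply avoids a Python-level loop over every row.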

In [None]:
# Overwrite train.csv so it includes the new day_of_week and datetime columns
df.to_csv('train.csv')

## Step Two: Define the Configuration File
Now that everything is installed and our data is prepared, we need to define a configuration file. The configuration file is composed of several major required sub-parts: `model_params`, `dataset_params`, `training_params`, and `inference_params`. It must also specify the name of the model (`model_name`) and the model type (`model_type`).

Flow Forecast uses configuration files because they make results reproducible. With a JSON file you can easily see every parameter you pass to your model, and the configuration is logged to W&B and/or saved locally. This is a deliberate design choice: in many other libraries it becomes difficult to keep track of parameters, which leads to results that cannot be reproduced.
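Because the configuration is a plain Python dictionary, one simple way to keep a local copy is to dump it to JSON and load it back later. The sketch below uses only the standard `json` module; the file name `config_snapshot.json` and the trimmed-down dictionary are hypothetical, and this is not Flow Forecast's own saving mechanism.

```python
import json
from pathlib import Path

# A trimmed-down, hypothetical config with the same top-level layout as the tutorial's
config = {
    "model_name": "MultiAttnHeadSimple",
    "model_type": "PyTorch",
    "model_params": {"number_time_series": 2, "seq_len": 3, "output_seq_len": 1},
    "training_params": {"criterion": "MSE", "optimizer": "Adam", "lr": 0.01, "epochs": 10},
}

path = Path("config_snapshot.json")  # hypothetical file name
path.write_text(json.dumps(config, indent=2, sort_keys=True))

# Reloading the file gives back exactly the parameters used for the run
restored = json.loads(path.read_text())
print(restored == config)
```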

In [None]:
import wandb

def make_config_file(file_path, train_end, valid_end):
  # Inside a sweep, wandb.config holds the hyperparameters sampled for this run
  run = wandb.init(project="library_demos")
  wandb_config = wandb.config
  config_default = {
    "model_name": "MultiAttnHeadSimple",
    "model_type": "PyTorch",
    "model_params": {
      "number_time_series": 2,  # matches the two columns in relevant_cols
      "seq_len": wandb_config["forecast_history"],
      "output_seq_len": wandb_config["out_seq_length"],
      "forecast_length": wandb_config["out_seq_length"]
    },
    "dataset_params": {
      "class": "default",
      "training_path": file_path,
      "validation_path": file_path,
      "test_path": file_path,
      "batch_size": wandb_config["batch_size"],
      "forecast_history": wandb_config["forecast_history"],
      "forecast_length": wandb_config["out_seq_length"],
      # Row boundaries that split the single CSV into train/validation/test ranges
      "train_end": train_end,
      "valid_start": int(train_end + 1),
      "valid_end": int(valid_end),
      "test_start": int(valid_end) + 1,
      "target_col": ["Lane 1 Flow (Veh/5 Minutes)"],
      "relevant_cols": ["Lane 1 Flow (Veh/5 Minutes)", "day_of_week"],
      "scaler": "StandardScaler",
      "interpolate": False
    },
    "training_params": {
      "criterion": "MSE",
      "optimizer": "Adam",
      "optim_params": {},
      "lr": wandb_config["lr"],
      "epochs": 10,
      "batch_size": wandb_config["batch_size"]
    },
    "GCS": False,
    "sweep": True,
    "wandb": False,
    "forward_params": {},
    "metrics": ["MSE"],
    "inference_params": {
      "datetime_start": "2016-02-24",
      "hours_to_forecast": 150,
      "test_csv_path": file_path,
      "decoder_params": {
        "decoder_function": "simple_decode",
        "unsqueeze_dim": 1
      },
      "dataset_params": {
        "file_path": file_path,
        "forecast_history": wandb_config["forecast_history"],
        "forecast_length": wandb_config["out_seq_length"],
        "relevant_cols": ["Lane 1 Flow (Veh/5 Minutes)", "day_of_week"],
        "target_col": ["Lane 1 Flow (Veh/5 Minutes)"],
        "scaling": "StandardScaler",
        "interpolate_param": False
      }
    }
  }
  # Log the full config to W&B alongside the swept values
  wandb.config.update(config_default)
  return config_default

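Here is a brief overview of what this config file contains:

- `model_name` and `model_type` select the `MultiAttnHeadSimple` model and the PyTorch backend.
- `model_params` configure the model. `number_time_series` is 2, matching the two columns in `relevant_cols` (`Lane 1 Flow (Veh/5 Minutes)` and `day_of_week`); `seq_len` is the number of past time steps the model sees, and `output_seq_len`/`forecast_length` is the number of steps it predicts.
- `dataset_params` point to the CSV file, list the target and input columns, choose the scaler, and use `train_end`, `valid_start`, `valid_end`, and `test_start` to split the rows of `train.csv` into training, validation, and test ranges.
- `training_params` set the loss (`MSE`), the optimizer (`Adam`), the learning rate, and the number of epochs.
- `inference_params` control forecasting after training: the start date (`datetime_start`), the forecast horizon (`hours_to_forecast`), and the decoder function.

Values such as `batch_size`, `lr`, `forecast_history`, and `out_seq_length` are read from `wandb.config`, so the W&B sweep defined in the next step can vary them between runs.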
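The split boundaries in `dataset_params` are derived from the two integers passed to `make_config_file`, and `forecast_history`/`out_seq_length` set how many past steps the model sees and how many it predicts. The sketch below mirrors the boundary arithmetic from the config and illustrates the two window lengths on a toy list. `split_params` and `sliding_windows` are hypothetical helpers, and the windowing is a conceptual illustration rather than Flow Forecast's actual data loader.

```python
def split_params(train_end: int, valid_end: int) -> dict:
    """Mirror how make_config_file derives the split keys in dataset_params."""
    if not 0 < train_end < valid_end:
        raise ValueError("expected 0 < train_end < valid_end")
    return {
        "train_end": train_end,
        "valid_start": train_end + 1,
        "valid_end": valid_end,
        "test_start": valid_end + 1,
    }


def sliding_windows(series, forecast_history: int, forecast_length: int):
    """Pair each window of past values with the values that immediately follow it.

    Conceptual sketch only; it illustrates what the two lengths mean.
    """
    pairs = []
    last_start = len(series) - forecast_history - forecast_length
    for start in range(last_start + 1):
        history = series[start : start + forecast_history]
        target = series[start + forecast_history : start + forecast_history + forecast_length]
        pairs.append((history, target))
    return pairs


print(split_params(4500, 6000))
for history, target in sliding_windows([10, 20, 30, 40, 50, 60], forecast_history=3, forecast_length=1):
    print(history, "->", target)
```

With `forecast_history=3` and `forecast_length=1`, each training example is three consecutive flow values followed by the single next value, which is the combination sampled in the run logged below.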

## Step Three: Define the W&B Sweep Config
Now that we have our global configuration, we define a second configuration containing the values we want to sweep over. You can find out more about Weights and Biases sweeps on their website. This dictionary lists every parameter we want to sweep over, along with its candidate values; the keys are the same names that `make_config_file` reads from `wandb.config`.

In [None]:
sweep_config = {
  "name": "Default sweep",
  "method": "random",
  "parameters": {
        "batch_size": {
            "values": [2, 3, 4]
        },
        "lr":{
            "values":[0.001, 0.01]
        },
        "forecast_history":{
            "values":[1, 2, 3, 5]
        },
        "out_seq_length":{
            "values":[1, 2, 3, 4]
        }
    }
}
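With `"method": "random"`, each run of the sweep picks one value for every parameter from its `values` list. The sketch below counts the size of this grid and imitates that kind of draw with Python's `random` module, assuming a uniform choice per parameter. It is a conceptual stand-in, not W&B's sampling code, and `sample_config` is a hypothetical name.

```python
import itertools
import random

# Same parameter grid as sweep_config["parameters"] above
parameters = {
    "batch_size": {"values": [2, 3, 4]},
    "lr": {"values": [0.001, 0.01]},
    "forecast_history": {"values": [1, 2, 3, 5]},
    "out_seq_length": {"values": [1, 2, 3, 4]},
}

# Size of the full grid: 3 * 2 * 4 * 4 combinations
grid = list(itertools.product(*(spec["values"] for spec in parameters.values())))
print(len(grid))  # 96


def sample_config(params: dict, rng: random.Random) -> dict:
    """Draw one value uniformly at random from each parameter's list of values."""
    return {name: rng.choice(spec["values"]) for name, spec in params.items()}


rng = random.Random(0)  # seeded so the draws are repeatable
for _ in range(3):
    print(sample_config(parameters, rng))
```

Random search samples from these 96 combinations rather than trying every one, which keeps the number of runs under your control.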

## Step Four: Run code and log results
Now that we have both configurations, it is time to train our model and log the results to Weights and Biases for later analysis.

In [None]:
import os
import wandb
from flood_forecast.trainer import train_function

sweep_id = wandb.sweep(sweep_config)
os.environ["SWEEP_ID"] = sweep_id
#!wandb agent $SWEEP_ID  # alternative: start the agent from the shell
# Each run calls make_config_file, which builds the config from that run's sampled hyperparameters
wandb.agent(sweep_id, lambda: train_function("PyTorch", make_config_file("train.csv", 4500, 6000)))

Create sweep with ID: du7xfwi9
Sweep URL: https://wandb.ai/igodfried/uncategorized/sweeps/du7xfwi9


[34m[1mwandb[0m: Agent Starting Run: yroy12mh with config:
[34m[1mwandb[0m: 	batch_size: 3
[34m[1mwandb[0m: 	forecast_history: 3
[34m[1mwandb[0m: 	lr: 0.01
[34m[1mwandb[0m: 	out_seq_length: 1
[34m[1mwandb[0m: Currently logged in as: [33migodfried[0m (use `wandb login --relogin` to force relogin)


interpolate should be below
Now loading and scaling train.csv
interpolate should be below
Now loading and scaling train.csv
interpolate should be below
Now loading and scaling train.csv
Using Wandb config:
{'batch_size': 3, 'forecast_history': 3, 'lr': 0.01, 'out_seq_length': 1, 'model_name': 'MultiAttnHeadSimple', 'model_type': 'PyTorch', 'model_params': {'number_time_series': 2, 'seq_len': 3, 'output_seq_len': 1, 'forecast_length': 1}, 'dataset_params': {'class': 'default', 'training_path': 'train.csv', 'validation_path': 'train.csv', 'test_path': 'train.csv', 'batch_size': 3, 'forecast_history': 3, 'forecast_length': 1, 'train_end': 4500, 'valid_start': 4501, 'valid_end': 6000, 'test_start': 6001, 'target_col': ['Lane 1 Flow (Veh/5 Minutes)'], 'relevant_cols': ['Lane 1 Flow (Veh/5 Minutes)', 'day_of_week'], 'scaler': 'StandardScaler', 'interpolate': False}, 'training_params': {'criterion': 'MSE', 'optimizer': 'Adam', 'optim_params': {}, 'lr': 0.01, 'epochs': 10, 'batch_size': 3}, 'G


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

plotting with CI now

