## Transformer Bottleneck Forecasting COVID-19
In this notebook we apply the model from the [transformer bottleneck paper](https://arxiv.org/abs/1907.00235) to forecasting COVID-19 cases. We also try some custom evaluation metrics, such as DILATE loss and MASE loss.

In [None]:
from google.colab import auth
from datetime import datetime
import os
auth.authenticate_user()
!git clone https://github.com/AIStream-Peelout/flow-forecast.git

### Setup/Preliminary Preprocessing

In [None]:
import os
os.chdir('flow-forecast')
!pip install -r  requirements.txt
!python setup.py develop
!mkdir data

In [None]:
!gsutil cp gs://flow_datasets/miami_dade.csv . 
!gsutil cp gs://flow_datasets/palm_beach.csv .
!gsutil cp gs://flow_datasets/miami_weather.csv . 

In [None]:
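import pandas as pd

# Load the Miami data (miami_weather.csv) and the Palm Beach data.
df = pd.read_csv("/content/flow-forecast/miami_weather.csv")
df2 = pd.read_csv("/content/flow-forecast/palm_beach.csv")

# Daily new cases from the cumulative counts, smoothed with a 7-day triangular window.
df["datetime"] = df["date"]
df["new_cases"] = df["cases"].diff()
df["rolling_7"] = df["new_cases"].rolling(7, win_type='triang').mean()
df2["datetime"] = df2["date"]
df2["new_cases"] = df2["cases"].diff()
df2["prolling_7"] = df2["new_cases"].rolling(7, win_type='triang').mean()

# Drop the leading rows, which are NaN after diff() and the 7-day window.
df = df[8:]
# Attach the Palm Beach rolling average as an extra feature, joined on date.
df = df.merge(df2[["prolling_7", "date"]], on="date")
df.to_csv("miami_f.csv")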

In [None]:
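df = pd.read_csv("/content/miami_dade_d.csv")
df["datetime"] = df["date"]
# Daily new cases from the cumulative count; the first row is NaN after diff().
df["new_cases"] = df["cases"].diff()
# 7-day triangular rolling means; the leading rows are NaN until the window fills.
df["rolling_7_cases"] = df["new_cases"].rolling(7, win_type='triang').mean()
# Note: "deaths" is cumulative here, so this smooths the running total, not daily deaths.
df["rolling_7_deaths"] = df["deaths"].rolling(7, win_type='triang').mean()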

In [None]:
df.to_csv("miami_f.csv")
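
Both `diff()` and a 7-row rolling window leave NaN values in the first rows of the frame, and the sweep log further down reports NaN values in the data. The sketch below uses synthetic data and hypothetical names (`demo`, `clean`) to show one way to count and drop those rows before saving. It uses an unweighted window so that it runs without SciPy; the leading NaN rows should arise the same way with `win_type='triang'`. Whether dropping or interpolating is the better fix for this dataset is left open.

```python
import numpy as np
import pandas as pd

# Synthetic cumulative counts standing in for the real "cases" column (assumption).
demo = pd.DataFrame({
    "date": pd.date_range("2020-03-01", periods=12, freq="D").astype(str),
    "cases": np.arange(12).cumsum(),
})

# Same two steps as the preprocessing cell: daily differences, then a 7-day mean.
demo["new_cases"] = demo["cases"].diff()
demo["rolling_7_cases"] = demo["new_cases"].rolling(7).mean()

# Count the rows that are still NaN, then drop them.
n_missing = int(demo["rolling_7_cases"].isna().sum())
clean = demo.dropna(subset=["rolling_7_cases"]).reset_index(drop=True)

print(f"{n_missing} leading rows dropped, {len(clean)} rows kept")
```

If rows are dropped this way, the `train_start`, `train_end`, and related indices in the config would refer to positions in the shortened file.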

In [None]:
df.sort_values(by="date")[-20:]

Unnamed: 0.1,Unnamed: 0,date,level_0,index,UID,sub_region,region,country,lat,long,Combined_Key,cases,level,deaths,mobility_retail_recreation,mobility_grocery_pharmacy,mobility_parks,mobility_transit_stations,mobility_workplaces,mobility_residential,lon,avg_temperature,min_temperature,max_temperature,relative_humidity,specific_humidity,pressure,datetime,new_cases,rolling_7_cases,rolling_7_deaths
323,323,2021-01-03,767551,1159362,84012086,Miami-Dade County,Florida,United States,25.611236,-80.551706,"Miami-Dade, Florida, US",305734,sub_region,4251,-26.0,-21.0,-40.0,-41.0,-19.0,6.0,-80.551706,23.11,21.27,25.67,85.78,14.98,101.73,2021-01-03,1547.0,2492.125,4189.6875
324,324,2021-01-04,769590,1162702,84012086,Miami-Dade County,Florida,United States,25.611236,-80.551706,"Miami-Dade, Florida, US",308259,sub_region,4256,-22.0,-12.0,-46.0,-39.0,-31.0,11.0,-80.551706,19.68,17.25,21.66,77.53,10.95,101.65,2021-01-04,2525.0,2418.3125,4205.8125
325,325,2021-01-05,772199,1166042,84012086,Miami-Dade County,Florida,United States,25.611236,-80.551706,"Miami-Dade, Florida, US",311606,sub_region,4257,-21.0,-10.0,-44.0,-42.0,-31.0,11.0,-80.551706,16.88,13.88,20.02,66.32,7.84,101.7,2021-01-05,3347.0,2682.25,4224.1875
326,326,2021-01-06,774829,1169382,84012086,Miami-Dade County,Florida,United States,25.611236,-80.551706,"Miami-Dade, Florida, US",314742,sub_region,4260,-25.0,-15.0,-47.0,-43.0,-30.0,12.0,-80.551706,16.92,13.38,19.92,64.54,7.64,101.82,2021-01-06,3136.0,2654.5,4239.1875
327,327,2021-01-07,777460,1172722,84012086,Miami-Dade County,Florida,United States,25.611236,-80.551706,"Miami-Dade, Florida, US",318115,sub_region,4297,-25.0,-15.0,-45.0,-42.0,-30.0,12.0,-80.551706,20.28,17.64,22.95,72.23,10.59,101.64,2021-01-07,3373.0,2815.9375,4251.9375
328,328,2021-01-08,780079,1176062,84012086,Miami-Dade County,Florida,United States,25.611236,-80.551706,"Miami-Dade, Florida, US",321555,sub_region,4332,-25.0,-12.0,-48.0,-43.0,-28.0,12.0,-80.551706,20.56,17.59,22.52,82.24,12.31,101.44,2021-01-08,3440.0,3060.3125,4265.1875
329,329,2021-01-09,782621,1179402,84012086,Miami-Dade County,Florida,United States,25.611236,-80.551706,"Miami-Dade, Florida, US",324260,sub_region,4365,-24.0,-13.0,-49.0,-41.0,-17.0,7.0,-80.551706,15.64,12.69,17.68,74.19,8.1,101.87,2021-01-09,2705.0,3055.375,4280.875
330,330,2021-01-10,784704,1182742,84012086,Miami-Dade County,Florida,United States,25.611236,-80.551706,"Miami-Dade, Florida, US",326607,sub_region,4413,-24.0,-16.0,-47.0,-43.0,-18.0,7.0,-80.551706,13.65,9.21,17.72,73.79,7.05,102.21,2021-01-10,2347.0,3137.25,4304.8125
331,331,2021-01-11,786746,1186082,84012086,Miami-Dade County,Florida,United States,25.611236,-80.551706,"Miami-Dade, Florida, US",328701,sub_region,4441,-24.0,-15.0,-46.0,-41.0,-29.0,11.0,-80.551706,20.07,16.32,22.57,78.31,11.29,102.1,2021-01-11,2094.0,3025.0625,4334.875
332,332,2021-01-12,789358,1189422,84012086,Miami-Dade County,Florida,United States,25.611236,-80.551706,"Miami-Dade, Florida, US",331649,sub_region,4452,-24.0,-13.0,-47.0,-43.0,-31.0,12.0,-80.551706,20.57,18.16,22.69,85.85,12.78,102.07,2021-01-12,2948.0,2824.9375,4367.6875


## Model setup

In [None]:
!wandb login
os.environ['MODEL_BUCKET'] = "coronaviruspublicdata"
os.environ["ENVIRONMENT_GCP"] = "Colab"
os.environ["GCP_PROJECT"] = "gmap-997"

wandb: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter: 
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc


In [None]:
import json
import os
import subprocess as subp
import traceback

import wandb
from flood_forecast.trainer import train_function
from flood_forecast.long_train import split_on_letter

def make_config_file(flow_file_path, gage_id, station_id, weight_path=None, pretrained=None):
  # Start a wandb run; under a sweep, run.config holds the hyperparameters for this run.
  run = wandb.init(project="experiment_results_covid")
  wandb_config = run.config
  the_wandb_c = run.config
  print(wandb_config)
  the_config4 = {
    "model_name": "DecoderTransformer",
    "model_type": "PyTorch",
    "model_params": {
        "n_time_series":18,  # 14 relevant_cols + 4 cyclical datetime features
        "n_head": 8,
        "forecast_history":wandb_config["forecast_history"],
        "n_embd":wandb_config["n_embedding"],
        "num_layer": wandb_config["num_layer"],
        "q_len": 1,
        "dropout": wandb_config["dropout"],
        "forecast_length":wandb_config["forecast_length"],
        "additional_params":{}
     }, 
    "dataset_params":
    {  "class": "default",
       "num_workers":5,
       "pin_memory": True,
       "training_path": flow_file_path,
       "validation_path": flow_file_path,
       "test_path": flow_file_path,
       "batch_size":wandb_config["batch_size"],
       "forecast_history":wandb_config["forecast_history"],
       "forecast_length":wandb_config["forecast_length"],
       "scaler": "StandardScaler", 
       "train_start":0,
       "train_end": 170,
       "valid_start":170,
       "valid_end": 310,
       "sort_column": "date",
       "test_start": 170,
       "test_end":310,
       "target_col": ["rolling_7_cases", "rolling_7_deaths"],
       "relevant_cols": ['rolling_7_cases', "rolling_7_deaths",'mobility_retail_recreation', 'mobility_grocery_pharmacy',
                          'mobility_parks', 'mobility_transit_stations', 'mobility_workplaces',
                          'mobility_residential', "avg_temperature",	"min_temperature",	"max_temperature",	"relative_humidity",	"specific_humidity",	"pressure"], 
       "feature_param":{
           "datetime_params":{
               "day_of_week":"cyclical",
               "month":"cyclical"
           }
       },
       "interpolate":False
    },
    "training_params":
    {
       "criterion":"MASELoss",
       "optimizer": wandb_config["optimizer"],
       "criterion_params":{"baseline_method":"mean"},
    "optim_params":{
       "lr": the_wandb_c["lr"]
    },
       "epochs": 10,
       "batch_size":wandb_config["batch_size"]
    },
    "early_stopping":{
        "patience":3
    },
    "GCS": True,
    "sweep":True,
    "wandb":False,
    "forward_params":{},
   "metrics":["MSE", "DilateLoss"],
   "inference_params":
   {     
         "datetime_start":"2020-12-14",
          "hours_to_forecast":18, 
          "num_prediction_samples": 20,
          "test_csv_path":flow_file_path,
          "decoder_params":{
            "decoder_function": "simple_decode", 
            "unsqueeze_dim": 1},
          "dataset_params":{
             "file_path": flow_file_path,
             "sort_column": "date",
             "scaling": "StandardScaler",
             "forecast_history": wandb_config["forecast_history"],
             "forecast_length":wandb_config["forecast_length"],
             "relevant_cols":["rolling_7_cases", "rolling_7_deaths", 'mobility_retail_recreation', 'mobility_grocery_pharmacy',
                          'mobility_parks', 'mobility_transit_stations', 'mobility_workplaces',
                          'mobility_residential', "avg_temperature",	"min_temperature",	"max_temperature",	"relative_humidity",	"specific_humidity",	"pressure"],
             "target_col": ["rolling_7_cases", "rolling_7_deaths"],
             "interpolate_param":False,
             "feature_params":{
              "datetime_params":{
               "day_of_week":"cyclical",
               "month":"cyclical"
           }
       }
          }
          } 
    }

      
  if weight_path:
    the_config4["weight_path"] = weight_path
  wandb.config.update(the_config4)
  return the_config4
  
sweep_config = {
  "name": "Default sweep",
  "method": "grid",
  "parameters": {
        "forecast_history":
        {
            "values": [5, 8, 10, 20]
        },
        "batch_size": {
            "values": [2, 4, 10, 20]
        },
        "lr":{
            "values":[.00001, .001]
        },
        "forecast_length":{
            "values":[1, 2, 5]
        },
        "num_layer":{
            "values":[2, 5, 10]
        },
        "n_embedding":{
            "values":[32, 64, 128]
        },
        "optimizer":{
            "values":["SGD"]
        },
        "dropout":{
            "values": [.3, .5, .7]
        }
        #"scaling"{
            #values=["RobustScaler", "StandardScaler"]
       #}
    }
}
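
The `datetime_params` entries in the config ask for `day_of_week` and `month` to be encoded as `"cyclical"`, and the run log lists the resulting columns as `cos_day_of_week`, `sin_day_of_week`, `cos_month`, and `sin_month`. The library's own implementation is not shown in this notebook; the following is a minimal sketch of the usual sine/cosine encoding, with a hypothetical helper `add_cyclical`. It also checks the `n_time_series` arithmetic: 14 `relevant_cols` plus 4 encoded columns gives 18.

```python
import numpy as np
import pandas as pd

def add_cyclical(frame, column, period):
    """Add sin_/cos_ columns that place `column` on a circle of length `period`."""
    angle = 2 * np.pi * frame[column] / period
    frame["sin_" + column] = np.sin(angle)
    frame["cos_" + column] = np.cos(angle)
    return frame

# Three dates taken from the table above; the first and last are both Sundays.
demo = pd.DataFrame({"datetime": pd.to_datetime(["2021-01-03", "2021-01-04", "2021-01-10"])})
demo["day_of_week"] = demo["datetime"].dt.dayofweek  # Monday=0 ... Sunday=6
demo["month"] = demo["datetime"].dt.month
demo = add_cyclical(demo, "day_of_week", 7)
demo = add_cyclical(demo, "month", 12)

# Column names copied from the config's relevant_cols list.
relevant_cols = [
    "rolling_7_cases", "rolling_7_deaths",
    "mobility_retail_recreation", "mobility_grocery_pharmacy", "mobility_parks",
    "mobility_transit_stations", "mobility_workplaces", "mobility_residential",
    "avg_temperature", "min_temperature", "max_temperature",
    "relative_humidity", "specific_humidity", "pressure",
]
encoded_cols = [c for c in demo.columns if c.startswith(("sin_", "cos_"))]
n_time_series = len(relevant_cols) + len(encoded_cols)
print(n_time_series)
```

The point of the encoding is that Sunday and Monday end up close together on the circle, which a plain 0 to 6 integer would not capture.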

In [None]:
import wandb
sweep_full = wandb.sweep(sweep_config, project="covid_icml")
#sweep_id = "21i08e3p"
wandb.agent(sweep_full, lambda:train_function("PyTorch", make_config_file("/content/flow-forecast/miami_f.csv", "miami_dade", "rolling")))
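
With `"method": "grid"`, the sweep covers every combination of the listed values, so it is worth counting the combinations before launching. The sketch below copies the value lists from `sweep_config` into a hypothetical `sweep_values` dict and multiplies the list lengths. Enumerating the keys in sorted order happens to reproduce the first three runs in the log (only `num_layer` changes); that ordering is an observation from this log, not a documented guarantee.

```python
import math
from itertools import islice, product

# Value lists copied from sweep_config["parameters"].
sweep_values = {
    "forecast_history": [5, 8, 10, 20],
    "batch_size": [2, 4, 10, 20],
    "lr": [0.00001, 0.001],
    "forecast_length": [1, 2, 5],
    "num_layer": [2, 5, 10],
    "n_embedding": [32, 64, 128],
    "optimizer": ["SGD"],
    "dropout": [0.3, 0.5, 0.7],
}

# Total number of grid points is the product of the list lengths.
n_runs = math.prod(len(values) for values in sweep_values.values())

# Preview the first few combinations, iterating keys alphabetically.
keys = sorted(sweep_values)
first_three = [
    dict(zip(keys, combo))
    for combo in islice(product(*(sweep_values[k] for k in keys)), 3)
]

print(n_runs)
for combo in first_three:
    print(combo)
```

At 10 epochs per run, a grid of this size may be more than is needed; a `"random"` or `"bayes"` sweep method is an alternative when the full grid is too expensive.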


Create sweep with ID: ttnyy89p
Sweep URL: https://wandb.ai/igodfried/covid_icml/sweeps/ttnyy89p


wandb: Agent Starting Run: aydzdtla with config:
wandb: 	batch_size: 2
wandb: 	dropout: 0.3
wandb: 	forecast_history: 5
wandb: 	forecast_length: 1
wandb: 	lr: 1e-05
wandb: 	n_embedding: 32
wandb: 	num_layer: 2
wandb: 	optimizer: SGD
wandb: Currently logged in as: igodfried (use `wandb login --relogin` to force relogin)


{'batch_size': 2, 'dropout': 0.3, 'forecast_history': 5, 'forecast_length': 1, 'lr': 1e-05, 'n_embedding': 32, 'num_layer': 2, 'optimizer': 'SGD'}
interpolate should be below
running feature fix code s
Relevant cols are
['cos_day_of_week', 'sin_day_of_week', 'cos_month', 'sin_month']
Now loading/content/flow-forecast/miami_f.csv
scaling now
2
Error nan values detected in data. Please run interpolate ffill or bfill on data
interpolate should be below
running feature fix code s
Relevant cols are
['cos_day_of_week', 'sin_day_of_week', 'cos_month', 'sin_month']
Now loading/content/flow-forecast/miami_f.csv
scaling now
2
interpolate should be below
running feature fix code s
Relevant cols are
['cos_day_of_week', 'sin_day_of_week', 'cos_month', 'sin_month']
Now loading/content/flow-forecast/miami_f.csv
scaling now
2
Using Wandb config:
{'batch_size': 2, 'dropout': 0.3, 'forecast_history': 5, 'forecast_length': 1, 'lr': 1e-05, 'n_embedding': 32, 'num_layer': 2, 'optimizer': 'SGD', 'model_name

To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
Using a target size (torch.Size([1, 2, 1])) that is different to the input size (torch.Size([2, 2, 18])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.


Run aydzdtla errored: ValueError('Error infinite or NaN loss detected. Try normalizing data or performing interpolation',)
wandb: Agent Starting Run: 1rzscg9t with config:
wandb: 	batch_size: 2
wandb: 	dropout: 0.3
wandb: 	forecast_history: 5
wandb: 	forecast_length: 1
wandb: 	lr: 1e-05
wandb: 	n_embedding: 32
wandb: 	num_layer: 5
wandb: 	optimizer: SGD


{'batch_size': 2, 'dropout': 0.3, 'forecast_history': 5, 'forecast_length': 1, 'lr': 1e-05, 'n_embedding': 32, 'num_layer': 5, 'optimizer': 'SGD'}
interpolate should be below
running feature fix code s
Relevant cols are
['cos_day_of_week', 'sin_day_of_week', 'cos_month', 'sin_month']
Now loading/content/flow-forecast/miami_f.csv
scaling now
2
Error nan values detected in data. Please run interpolate ffill or bfill on data
interpolate should be below
running feature fix code s
Relevant cols are
['cos_day_of_week', 'sin_day_of_week', 'cos_month', 'sin_month']
Now loading/content/flow-forecast/miami_f.csv
scaling now
2
interpolate should be below
running feature fix code s
Relevant cols are
['cos_day_of_week', 'sin_day_of_week', 'cos_month', 'sin_month']
Now loading/content/flow-forecast/miami_f.csv
scaling now
2
Using Wandb config:
{'batch_size': 2, 'dropout': 0.3, 'forecast_history': 5, 'forecast_length': 1, 'lr': 1e-05, 'n_embedding': 32, 'num_layer': 5, 'optimizer': 'SGD', 'model_name

To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
Using a target size (torch.Size([1, 2, 1])) that is different to the input size (torch.Size([2, 2, 18])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.


Run 1rzscg9t errored: ValueError('Error infinite or NaN loss detected. Try normalizing data or performing interpolation',)
wandb: Agent Starting Run: 7sw202hw with config:
wandb: 	batch_size: 2
wandb: 	dropout: 0.3
wandb: 	forecast_history: 5
wandb: 	forecast_length: 1
wandb: 	lr: 1e-05
wandb: 	n_embedding: 32
wandb: 	num_layer: 10
wandb: 	optimizer: SGD


{'batch_size': 2, 'dropout': 0.3, 'forecast_history': 5, 'forecast_length': 1, 'lr': 1e-05, 'n_embedding': 32, 'num_layer': 10, 'optimizer': 'SGD'}
interpolate should be below
running feature fix code s
Relevant cols are
['cos_day_of_week', 'sin_day_of_week', 'cos_month', 'sin_month']
Now loading/content/flow-forecast/miami_f.csv
scaling now
2
Error nan values detected in data. Please run interpolate ffill or bfill on data
interpolate should be below
running feature fix code s
Relevant cols are
['cos_day_of_week', 'sin_day_of_week', 'cos_month', 'sin_month']
Now loading/content/flow-forecast/miami_f.csv
scaling now
2
interpolate should be below
running feature fix code s
Relevant cols are
['cos_day_of_week', 'sin_day_of_week', 'cos_month', 'sin_month']
Now loading/content/flow-forecast/miami_f.csv
scaling now
2
Using Wandb config:
{'batch_size': 2, 'dropout': 0.3, 'forecast_history': 5, 'forecast_length': 1, 'lr': 1e-05, 'n_embedding': 32, 'num_layer': 10, 'optimizer': 'SGD', 'model_na

To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
Using a target size (torch.Size([1, 2, 1])) that is different to the input size (torch.Size([2, 2, 18])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.


Run 7sw202hw errored: ValueError('Error infinite or NaN loss detected. Try normalizing data or performing interpolation',)
Detected 3 failed runs in the first 60 seconds, killing sweep.
wandb: To disable this check set WANDB_AGENT_DISABLE_FLAPPING=true
