# Forecast POC Guide

This notebook walks you through building a custom collection of models with Amazon Forecast from the time series data you have for your problem.

## Overview

1. Introduction to Amazon Forecast
1. Obtaining Your Data
1. Fitting the Data to Forecast
1. Determining Your Forecast Horizon (1st pass)
1. Building Your First Few Predictors
1. Visualizing Predictors
1. Making Decisions
1. Adding Related Time Series Data
1. Evaluations Again
1. Next Steps


## Introduction to Amazon Forecast

If you are not familiar with Amazon Forecast, you can learn more about it on these pages:

* Product Page: https://aws.amazon.com/forecast/
* GitHub Sample Notebooks: https://github.com/aws-samples/amazon-forecast-samples
* Product Docs: https://docs.aws.amazon.com/forecast/latest/dg/what-is-forecast.html


## Obtaining Your Data

Using Amazon Forecast requires access to time series data for your selected use case. To learn more about time series data:

1. Wikipedia: https://en.wikipedia.org/wiki/Time_series
1. Towards Data Science Primer: https://towardsdatascience.com/the-complete-guide-to-time-series-analysis-and-forecasting-70d476bfe775
1. O'Reilly Book: https://www.amazon.com/gp/product/1492041653/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&psc=1

As an example for this POC guide, we will use a dataset from the UCI repository of machine learning datasets, which is a great resource for finding datasets for various problems. In this case, it is traffic data for a given section of interstate highway. More information on the dataset can be found here: https://archive.ics.uci.edu/ml/datasets/Metro+Interstate+Traffic+Volume

To begin, the cell below creates a data folder, downloads our dataset into it, and extracts the data into a CSV file we can edit locally.







In [3]:
!mkdir -p data
!cd data && wget https://archive.ics.uci.edu/ml/machine-learning-databases/00492/Metro_Interstate_Traffic_Volume.csv.gz
!gunzip data/Metro_Interstate_Traffic_Volume.csv.gz

--2019-12-24 19:58:58--  https://archive.ics.uci.edu/ml/machine-learning-databases/00492/Metro_Interstate_Traffic_Volume.csv.gz
Resolving archive.ics.uci.edu (archive.ics.uci.edu)... 128.195.10.252
Connecting to archive.ics.uci.edu (archive.ics.uci.edu)|128.195.10.252|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 405373 (396K) [application/x-httpd-php]
Saving to: ‘Metro_Interstate_Traffic_Volume.csv.gz’


2019-12-24 19:58:58 (1.41 MB/s) - ‘Metro_Interstate_Traffic_Volume.csv.gz’ saved [405373/405373]
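
For reruns, or for environments without `wget` and `gunzip`, the same download and extract steps can be sketched in plain Python. This is a minimal sketch and not part of the original notebook. The helper names `fetch_dataset` and `extract_gzip` are hypothetical, and the code uses only the standard library. Unlike `gunzip`, it leaves the downloaded archive in place.

```python
import gzip
import shutil
import urllib.request
from pathlib import Path

# Same URL as the wget command in the cell above.
DATA_URL = (
    "https://archive.ics.uci.edu/ml/machine-learning-databases/"
    "00492/Metro_Interstate_Traffic_Volume.csv.gz"
)


def extract_gzip(archive: Path) -> Path:
    """Decompress `archive` next to itself and return the extracted path."""
    target = archive.with_suffix("")  # drops the trailing ".gz"
    with gzip.open(archive, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target


def fetch_dataset(url: str = DATA_URL, data_dir: str = "data") -> Path:
    """Download and extract the dataset, skipping steps that are already done."""
    folder = Path(data_dir)
    folder.mkdir(parents=True, exist_ok=True)  # no error if the folder exists
    archive = folder / url.rsplit("/", 1)[-1]
    csv_path = archive.with_suffix("")
    if not csv_path.exists():
        if not archive.exists():
            urllib.request.urlretrieve(url, archive)
        extract_gzip(archive)
    return csv_path
```

Calling `fetch_dataset()` with its defaults should leave the extracted file at `data/Metro_Interstate_Traffic_Volume.csv`, the same path the following cells read from.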



With the data downloaded, we now import the Pandas library and a few other tools to inspect the information.

In [4]:
import boto3
from time import sleep
import subprocess
import pandas as pd
import json
import time

In [5]:
original_data = pd.read_csv('data/Metro_Interstate_Traffic_Volume.csv')
original_data.head(5)

| | holiday | temp | rain_1h | snow_1h | clouds_all | weather_main | weather_description | date_time | traffic_volume |
|---|---|---|---|---|---|---|---|---|---|
| 0 | | 288.28 | 0.0 | 0.0 | 40 | Clouds | scattered clouds | 2012-10-02 09:00:00 | 5545 |
| 1 | | 289.36 | 0.0 | 0.0 | 75 | Clouds | broken clouds | 2012-10-02 10:00:00 | 4516 |
| 2 | | 289.58 | 0.0 | 0.0 | 90 | Clouds | overcast clouds | 2012-10-02 11:00:00 | 4767 |
| 3 | | 290.13 | 0.0 | 0.0 | 90 | Clouds | overcast clouds | 2012-10-02 12:00:00 | 5026 |
| 4 | | 291.14 | 0.0 | 0.0 | 75 | Clouds | broken clouds | 2012-10-02 13:00:00 | 4918 |


At this point we can see a few things about the data:

* Holidays seem to be specified
* There is a value for temp but the units are unclear
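
These observations can be checked with a few lines of Pandas. The sketch below is illustrative only: it runs on the five rows shown in the output above, so in the notebook you would apply the same calls to `original_data`. The Kelvin conversion is an assumption. Values near 288 would be about 15 °C if the column is in Kelvin, but confirm this against the dataset description before relying on it. The column name `temp_celsius` is hypothetical.

```python
import io

import pandas as pd

# Rows copied from the head() output above; use original_data in the notebook.
sample_csv = """holiday,temp,rain_1h,snow_1h,clouds_all,weather_main,weather_description,date_time,traffic_volume
,288.28,0.0,0.0,40,Clouds,scattered clouds,2012-10-02 09:00:00,5545
,289.36,0.0,0.0,75,Clouds,broken clouds,2012-10-02 10:00:00,4516
,289.58,0.0,0.0,90,Clouds,overcast clouds,2012-10-02 11:00:00,4767
,290.13,0.0,0.0,90,Clouds,overcast clouds,2012-10-02 12:00:00,5026
,291.14,0.0,0.0,75,Clouds,broken clouds,2012-10-02 13:00:00,4918
"""
sample = pd.read_csv(io.StringIO(sample_csv), parse_dates=["date_time"])

# How many rows carry a holiday label versus none at all?
holiday_counts = sample["holiday"].value_counts(dropna=False)
print(holiday_counts)

# ASSUMPTION: temp is in Kelvin, so subtract 273.15 to get Celsius.
sample["temp_celsius"] = sample["temp"] - 273.15
print(sample[["temp", "temp_celsius"]].describe())

# Spacing between consecutive timestamps, to see the sampling frequency.
steps = sample["date_time"].diff().dropna()
print(steps.value_counts())
```

If the converted values look like plausible outdoor temperatures for the location, that supports the Kelvin assumption. The timestamp spacing gives a first hint at the frequency the data will need when it is fitted to Forecast.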
