<a href="https://colab.research.google.com/github/duongtrung/Pytorch-tutorials/blob/main/11_AWS_forecast_quick_start_up.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## AWS forecast quick start

Working with Amazon Forecast involves the following three steps:

<img src="resources/11-aws-quick-start-forecasting-overview.png" width=900 alt="Overview of the Amazon Forecast workflow"/>

Imagine we are trying to solve the forecasting problem for a ride-hailing service and we want to predict how many pick-ups are expected in specific areas of New York. For this exercise, we will use the yellow taxi trip records from [NYC Taxi and Limousine Commission (TLC)](https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page).

We will start by importing the historical data from December 2017 to January 2019 ([taxi-dec2017-jan2019.csv](https://raw.githubusercontent.com/aws-samples/amazon-forecast-samples/main/notebooks/basic/Getting_Started/data/taxi-dec2017-jan2019.csv)). Next, we will train a Predictor using this data. Finally, we will generate a forecast for February 2019 and compare it with the actual data for that month ([taxi-feb2019.csv](https://raw.githubusercontent.com/aws-samples/amazon-forecast-samples/main/notebooks/basic/Getting_Started/data/taxi-feb2019.csv)).

Download both datasets and place them in your local data folder before running the notebook.

### Pre-requisites

Before we get started, let's set up the notebook environment, the AWS SDK client for Amazon Forecast, and the IAM role that Amazon Forecast uses to access your data.


In [1]:
# The following installation commands were used with Anaconda on 64-bit Windows 11.
# I recommend creating a dedicated conda environment for your experiments.
# On another OS or in Docker, look up the equivalent setup steps for your platform.

# conda install -c conda-forge s3fs
# conda install -c anaconda ipywidgets
# conda install -c anaconda boto3

# https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html
# https://docs.aws.amazon.com/cli/latest/userguide/getting-started-prereqs.html

# aws --version
# aws-cli/2.7.33 Python/3.9.11 Windows/10 exe/AMD64 prompt/off

In [2]:
import sys
import os

sys.path.insert( 0, os.path.abspath("common-aws-forecast") )

import json
import util
import boto3
import s3fs
import pandas as pd

### Create an instance of AWS SDK client

In [3]:
region = 'eu-central-1' # Europe (Frankfurt)
session = boto3.Session(region_name=region) 
forecast = session.client(service_name='forecast')
forecastquery = session.client(service_name='forecastquery')

# Checking to make sure we can communicate with Amazon Forecast
assert forecast.list_predictors()

### Set up the IAM role used by Amazon Forecast to access your data

In [4]:
role_name = "ForecastNotebookRole-Basic"
print(f"Creating Role {role_name}...")
role_arn = util.get_or_create_iam_role( role_name = role_name )

# Print only the role name, not the full ARN, so the account ID is not echoed
print(f"Success! Created role = {role_arn.split('/')[1]}")

Creating Role ForecastNotebookRole-Basic...
The role ForecastNotebookRole-Basic already exists, skipping creation
Done.
Success! Created role = ForecastNotebookRole-Basic
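
The helper `util.get_or_create_iam_role` is not shown in this notebook. As a rough sketch of what such a role needs, Amazon Forecast can only assume a role whose trust policy names the `forecast.amazonaws.com` service principal. The function name `build_forecast_trust_policy` below is hypothetical and is not part of `util`; the role would also need S3 read permissions, which are left out here.

```python
import json


def build_forecast_trust_policy():
    """Return an IAM trust policy (JSON string) that lets Amazon Forecast assume the role."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "forecast.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }
    return json.dumps(policy)


trust_policy = build_forecast_trust_policy()
print(trust_policy)

# With credentials that are allowed to manage IAM, the policy could be used like this:
# iam = boto3.client("iam")
# iam.create_role(RoleName="ForecastNotebookRole-Basic",
#                 AssumeRolePolicyDocument=trust_policy)
```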


## Step 1: Import your data.

In this step, we will create a Dataset and import the December 2017 to January 2019 data from S3 into Amazon Forecast. Training a Predictor requires a DatasetGroup that groups the input Datasets, so we will end this step by creating a DatasetGroup that contains the imported Dataset.

Peek at the data and upload it to S3.

The taxi dataset has the following 3 columns:

1. **timestamp**: Timestamp at which pick-ups are requested.
2. **item_id**: Pick-up location ID.
3. **target_value**: Number of pick-ups requested around the timestamp at the pick-up location.
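
The three columns above map directly onto the schema that Amazon Forecast expects for a target time series dataset. The sketch below shows how the Dataset, the import job, and the DatasetGroup from this step might be created with boto3. The resource names, the `CUSTOM` domain, the hourly frequency `"H"`, and the timestamp format are assumptions for illustration, not values taken from this notebook. The function is only defined, not called, because it needs AWS credentials and an uploaded file.

```python
# Schema attributes must be listed in the same order as the CSV columns,
# because the data file has no header row.
TAXI_SCHEMA = {
    "Attributes": [
        {"AttributeName": "timestamp", "AttributeType": "timestamp"},
        {"AttributeName": "item_id", "AttributeType": "string"},
        {"AttributeName": "target_value", "AttributeType": "float"},
    ]
}


def import_taxi_data(forecast, s3_path, role_arn, prefix="taxi_demo"):
    """Create a Dataset, start an import job from S3, and create a DatasetGroup.

    `forecast` is a boto3 "forecast" client; `prefix` is a hypothetical naming prefix.
    """
    dataset_arn = forecast.create_dataset(
        DatasetName=f"{prefix}_ts",
        Domain="CUSTOM",                 # assumption
        DatasetType="TARGET_TIME_SERIES",
        DataFrequency="H",               # assumption: hourly observations
        Schema=TAXI_SCHEMA,
    )["DatasetArn"]

    import_job_arn = forecast.create_dataset_import_job(
        DatasetImportJobName=f"{prefix}_import",
        DatasetArn=dataset_arn,
        DataSource={"S3Config": {"Path": s3_path, "RoleArn": role_arn}},
        TimestampFormat="yyyy-MM-dd HH:mm:ss",  # assumption about the file's timestamps
    )["DatasetImportJobArn"]

    dataset_group_arn = forecast.create_dataset_group(
        DatasetGroupName=f"{prefix}_group",
        Domain="CUSTOM",
        DatasetArns=[dataset_arn],
    )["DatasetGroupArn"]

    return dataset_arn, import_job_arn, dataset_group_arn
```

The import job runs asynchronously, so its status would need to be polled (for example with `describe_dataset_import_job`) before a Predictor is trained on the data.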

In [5]:
key="D:/gdrive/aws-taxi/taxi-dec2017-jan2019.csv"   # replace with your path

taxi_df = pd.read_csv(key, dtype = object, names=['timestamp','item_id','target_value']) # note that the column names are not in the data file
display(taxi_df.head(5))

| | timestamp | item_id | target_value |
|---|---|---|---|
| 0 | 2017-12-01 00:00:00 | 4 | 27 |
| 1 | 2017-12-01 00:00:00 | 7 | 36 |
| 2 | 2017-12-01 00:00:00 | 10 | 2 |
| 3 | 2017-12-01 00:00:00 | 12 | 1 |
| 4 | 2017-12-01 00:00:00 | 13 | 61 |
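
Because the file is read with `dtype = object`, every column arrives as a string. A quick sanity check before uploading might convert the columns to their intended types and confirm that the values look plausible. The sketch below works on an in-memory copy of the five rows displayed above rather than on the full file; the helper name `check_taxi_frame` is hypothetical, and the non-negativity check is an assumption about what valid pick-up counts look like.

```python
import io

import pandas as pd

# The five rows displayed above, in the headerless CSV layout of the data file.
sample_csv = io.StringIO(
    "2017-12-01 00:00:00,4,27\n"
    "2017-12-01 00:00:00,7,36\n"
    "2017-12-01 00:00:00,10,2\n"
    "2017-12-01 00:00:00,12,1\n"
    "2017-12-01 00:00:00,13,61\n"
)


def check_taxi_frame(df):
    """Return a typed copy of the frame after basic plausibility checks."""
    out = df.copy()
    out["timestamp"] = pd.to_datetime(out["timestamp"], format="%Y-%m-%d %H:%M:%S")
    out["target_value"] = pd.to_numeric(out["target_value"])
    # item_id stays a string: it identifies a location, it is not a quantity.
    assert not out.isna().any().any(), "unexpected missing values"
    assert out["target_value"].ge(0).all(), "pick-up counts should be non-negative"
    return out


sample_df = pd.read_csv(
    sample_csv, dtype=object, names=["timestamp", "item_id", "target_value"]
)
checked = check_taxi_frame(sample_df)
print(checked.dtypes)
```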


In [6]:
bucket_name = input("\nEnter S3 bucket name for uploading the data and hit ENTER key:")
print(f"\nAttempting to upload the data to the S3 bucket '{bucket_name}' at key '{key}' ...")

# Reuse the session from In [3] so the bucket is created in the configured `region`
s3 = session.resource('s3')
bucket = s3.Bucket(bucket_name)
if not bucket.creation_date:
    if region != "us-east-1":
        s3.create_bucket(Bucket=bucket_name, CreateBucketConfiguration={'LocationConstraint': region})
    else:
        s3.create_bucket(Bucket=bucket_name)

s3.Bucket(bucket_name).Object(key).upload_file(key)
ts_s3_path = f"s3://{bucket_name}/{key}"

print(f"\nDone, the dataset is uploaded to S3 at {ts_s3_path}.")


Enter S3 bucket name for uploading the data and hit ENTER key: aws-nghia-taxi-forecast



Attempting to upload the data to the S3 bucket 'aws-nghia-taxi-forecast' at key 'D:/gdrive/aws-taxi/taxi-dec2017-jan2019.csv' ...

Done, the dataset is uploaded to S3 at s3://aws-nghia-taxi-forecast/D:/gdrive/aws-taxi/taxi-dec2017-jan2019.csv.
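
Note that the upload above reuses the local Windows path as the S3 object key, which is why the object ends up under `D:/gdrive/aws-taxi/` inside the bucket. One way to avoid this is to derive the key from the file name only. The sketch below illustrates the idea; the helper name `s3_key_for` and the `taxi-data` prefix are hypothetical.

```python
from pathlib import PureWindowsPath


def s3_key_for(local_path, prefix="taxi-data"):
    """Build an S3 object key from a local path by keeping only the file name."""
    # PureWindowsPath accepts both "/" and "\" as separators and needs no file access.
    file_name = PureWindowsPath(local_path).name
    return f"{prefix}/{file_name}"


local_path = "D:/gdrive/aws-taxi/taxi-dec2017-jan2019.csv"
s3_key = s3_key_for(local_path)
print(s3_key)

# The upload would then use the two values separately:
# s3.Bucket(bucket_name).Object(s3_key).upload_file(local_path)
```

Keeping the local path and the object key in separate variables also makes the resulting `s3://` URI shorter and independent of the machine the notebook runs on.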
