#   Amazon Forecast Sample Bike Share Example

This notebook shows how to use Amazon Forecast to make predictions from bike-sharing data.

Prerequisites:
[AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/installing.html).

For more information about the APIs, see the [documentation](https://docs.aws.amazon.com/forecast/latest/dg/what-is-forecast.html).

## Table Of Contents
* [Setting up](#setup)
* [Test Setup - Running first API](#hello)
* [Forecasting Example with Amazon Forecast](#forecastingExample)

**Read Every Cell FULLY before executing it**


## Set up Preview SDK<a class="anchor" id="setup"></a>

In [31]:
# Configure the AWS CLI to recognize the service models for the upcoming Amazon Forecast service
!aws configure add-model --service-model file://../sdk/forecastquery-2018-06-26.normal.json --service-name forecastquery
!aws configure add-model --service-model file://../sdk/forecast-2018-06-26.normal.json --service-name forecast

In [32]:
# Prerequisites: one-time install only. Uncomment the lines below to run them.
#!pip install boto3
#!pip install pandas

In [33]:
import boto3
from time import sleep
import subprocess

In [34]:
session = boto3.Session(region_name='us-west-2') #us-east-1 is also supported

forecast = session.client(service_name='forecast')
forecastquery = session.client(service_name='forecastquery')

## Test Setup <a class="anchor" id="hello"></a>
Let's say hi to Amazon Forecast by calling a simple API, ListRecipes. The API returns a list of the global recipes that Forecast offers and that you could use as part of your forecasting solution.

In [35]:
forecast.list_recipes()

{'RecipeNames': ['forecast_ARIMA',
  'forecast_DEEP_AR',
  'forecast_DEEP_AR_PLUS',
  'forecast_ETS',
  'forecast_MDN',
  'forecast_MQRNN',
  'forecast_NPTS',
  'forecast_PROPHET',
  'forecast_SQF'],
 'ResponseMetadata': {'RequestId': 'f4c776a6-4d18-45c3-bb47-8b65022c67db',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'content-type': 'application/x-amz-json-1.1',
   'date': 'Fri, 11 Jan 2019 16:36:27 GMT',
   'x-amzn-requestid': 'f4c776a6-4d18-45c3-bb47-8b65022c67db',
   'content-length': '174',
   'connection': 'keep-alive'},
  'RetryAttempts': 0}}

If this ran successfully, kudos! If you see any errors when running `list_recipes`, please contact us at the [AWS support forum](https://forums.aws.amazon.com/forum.jspa?forumID=327).
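Before relying on a particular recipe later on, it can help to confirm that it appears in the response. The sketch below is a minimal check; `has_recipe` is a hypothetical helper, and the sample dictionary is a trimmed copy of the response printed above rather than a live call.

```python
def has_recipe(list_recipes_response, recipe_name):
    """Return True if the ListRecipes response contains recipe_name."""
    return recipe_name in list_recipes_response.get("RecipeNames", [])

# Trimmed copy of the response shown above (not a live API call).
sample_response = {
    "RecipeNames": ["forecast_ARIMA", "forecast_DEEP_AR", "forecast_MQRNN"]
}

print(has_recipe(sample_response, "forecast_MQRNN"))
print(has_recipe(sample_response, "forecast_UNKNOWN"))
```

In the notebook the same helper could be called as `has_recipe(forecast.list_recipes(), 'forecast_MQRNN')`.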

## Forecasting with Amazon Forecast<a class="anchor" id="forecastingExample"></a>
### Preparing your Data

In Amazon Forecast, a dataset is a collection of one or more files that contain data relevant to a forecasting task. A dataset must conform to a schema provided by Amazon Forecast.

The data comes from a Bike Share dataset and consists of hourly rental data spanning many years. The training set comprises the first 19 days of each month, while the test set covers the 20th to the end of the month.
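The split rule described above can be sketched in a few lines. This is only an illustration of the day-of-month rule; `split_by_day_of_month` is a hypothetical helper and the sample timestamps are made up, since the notebook itself does not perform the split.

```python
from datetime import datetime

def split_by_day_of_month(timestamps, last_train_day=19):
    """Split timestamps into (train, test) using the day of the month."""
    train = [ts for ts in timestamps if ts.day <= last_train_day]
    test = [ts for ts in timestamps if ts.day > last_train_day]
    return train, test

# Made-up sample timestamps for illustration only.
sample = [
    datetime(2014, 1, 5, 1),
    datetime(2014, 1, 19, 23),
    datetime(2014, 1, 20, 0),
    datetime(2014, 2, 28, 12),
]

train, test = split_by_day_of_month(sample)
print(len(train), len(test))
```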

#### Data Type

Amazon Forecast can import data from Amazon S3. We first explore the data locally to see the fields.

In [36]:
import pandas as pd
df = pd.read_csv("../data/bike.csv", dtype = object)
df.head(3)

Unnamed: 0,1/1/2014 1:00,38.34991708,bike_12
0,1/1/2014 2:00,33.58208955,bike_12
1,1/1/2014 3:00,34.41127695,bike_12
2,1/1/2014 4:00,39.80099502,bike_12
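The preview above shows the first data row (`1/1/2014 1:00`, `38.34991708`, `bike_12`) in the header position, which suggests that the CSV file has no header row. A minimal sketch of reading such a file without losing the first row follows. The in-memory sample stands in for `../data/bike.csv`, and the column names are assumptions chosen to match the schema used later in this notebook.

```python
import io
import pandas as pd

# Two rows copied from the preview above, standing in for ../data/bike.csv.
raw = (
    "1/1/2014 1:00,38.34991708,bike_12\n"
    "1/1/2014 2:00,33.58208955,bike_12\n"
)

# header=None keeps the first row as data. The names are assumed labels,
# not something stored in the file itself.
df_named = pd.read_csv(
    io.StringIO(raw),
    header=None,
    names=["timestamp", "target_value", "item_id"],
    dtype=object,
)
print(df_named.shape)
```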


Now upload the data to S3. Before doing that, go to the AWS Console, select S3 as the service, and create a new bucket in the `Oregon` (`us-west-2`) region. Follow the bucket naming convention `amazon-forecast-unique-value-data`. The name must be globally unique; if you get an error, adjust the name until it works, then update the `bucketName` cell below.
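If you prefer to create the bucket from code instead of the console, the S3 client's `create_bucket` call can do it. The sketch below only builds the keyword arguments so that it runs without AWS credentials. `create_bucket_kwargs` is a hypothetical helper and the bucket name is a placeholder.

```python
def create_bucket_kwargs(bucket_name, region):
    """Build keyword arguments for the S3 client's create_bucket call."""
    kwargs = {"Bucket": bucket_name}
    # us-east-1 is the default location and is not passed as a constraint.
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs

# Placeholder bucket name for illustration only.
kwargs = create_bucket_kwargs("amazon-forecast-example-data", "us-west-2")
print(kwargs)
```

Once an S3 client exists (it is created as `s3` in the next cell), the call would be `s3.create_bucket(**kwargs)`.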

In [37]:
s3 = session.client('s3')

In [38]:
accountId = session.client('sts').get_caller_identity().get('Account')

In [39]:
bucketName = 'amazon-forecast-data-{0}'.format(accountId) # Update the unique-value bit here.
key="bikeshare/bike.csv"

In [40]:
s3.upload_file(Filename="../data/bike.csv", Bucket=bucketName, Key=key)

In [41]:
# One-time setup only: the following command creates the role to provide to Amazon Forecast.
# Save the generated role for all future calls to use for importing or exporting data.

cmd = ['python', '../setup_forecast_permissions.py', bucketName]
# Wait for the script to finish so the role exists before later cells use it.
p = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

In [42]:
roleArn = 'arn:aws:iam::%s:role/amazonforecast'%accountId

### CreateDataset

More details about `Domain` and dataset types can be found in the [documentation](https://docs.aws.amazon.com/forecast/latest/dg/howitworks-domains-ds-types.html). For this example, we use the [CUSTOM](https://docs.aws.amazon.com/forecast/latest/dg/custom-domain.html) domain with 3 required attributes: `timestamp`, `target_value` and `item_id`. Also, update the project name to reflect your own name, in lowercase.

In [43]:
DATASET_FREQUENCY = "H" 
TIMESTAMP_FORMAT = "yyyy-MM-dd HH:mm:ss"  # HH is the 24-hour clock; hh would be the 12-hour clock
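The raw rows previewed earlier use timestamps such as `1/1/2014 1:00`, which do not follow a `yyyy-MM-dd` date pattern. If the import later reports that no records were ingested, converting the timestamps before upload may be worth trying. The sketch below assumes the raw values are month/day/year with a 24-hour time, which the preview alone cannot confirm; `to_forecast_timestamp` is a hypothetical helper.

```python
from datetime import datetime

def to_forecast_timestamp(raw, source_format="%m/%d/%Y %H:%M"):
    """Convert a raw timestamp string to 'YYYY-MM-DD HH:MM:SS' (24-hour clock)."""
    parsed = datetime.strptime(raw, source_format)
    return parsed.strftime("%Y-%m-%d %H:%M:%S")

# Assumes month/day/year order in the raw data.
print(to_forecast_timestamp("1/1/2014 1:00"))  # 2014-01-01 01:00:00
```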

In [44]:
project = 'bike_forecastdemo' # Replace this with a unique name here, make sure the entire name is < 30 characters.
datasetName= project+'_bike_ds'
datasetGroupName= project +'_gp'
s3DataPath = "s3://"+bucketName+"/"+key

In [45]:
datasetName

'bike_forecastdemo_bike_ds'
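Since the names derived from `project` need to stay under 30 characters and in lowercase, a quick check before calling the API can save a round trip. This is a sketch only: `check_resource_name` is a hypothetical helper, and the allowed character set (lowercase letters, digits and underscores) is an assumption rather than a documented rule.

```python
import re

def check_resource_name(name, max_length=30):
    """Return a list of problems; an empty list means the name passes."""
    problems = []
    if len(name) >= max_length:
        problems.append(
            f"{len(name)} characters, expected fewer than {max_length}"
        )
    # Assumed character set: lowercase letters, digits and underscores.
    if not re.fullmatch(r"[a-z0-9_]+", name):
        problems.append("use only lowercase letters, digits and underscores")
    return problems

print(check_resource_name("bike_forecastdemo_bike_ds"))
print(check_resource_name("Bike_Forecast_Demo_With_A_Very_Long_Name"))
```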

In [46]:
# Specify the schema of your dataset here. Make sure the order of columns matches the raw data files.

schema ={
   "Attributes":[
      {  "AttributeName":"timestamp",     "AttributeType":"timestamp"      },
      {  "AttributeName":"target_value",  "AttributeType":"float"          },
      {  "AttributeName":"item_id",       "AttributeType":"string"         }
   ]   
}

response=forecast.create_dataset(
                    Domain="CUSTOM",
                    DatasetType='TARGET_TIME_SERIES',
                    DataFormat='CSV',
                    DatasetName=datasetName,
                    DataFrequency=DATASET_FREQUENCY, 
                    TimeStampFormat=TIMESTAMP_FORMAT,
                    Schema = schema
                   )
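Because the column order in the schema has to match the raw file, a local check on a few rows can catch a mismatch before the import job runs. The sketch below is a rough check under stated assumptions: it only verifies the field count and that `float` attributes parse as numbers, and `row_matches_schema` and `schema_example` are hypothetical names.

```python
import csv
import io

# Copy of the schema defined above, kept local so the sketch is self-contained.
schema_example = {
    "Attributes": [
        {"AttributeName": "timestamp", "AttributeType": "timestamp"},
        {"AttributeName": "target_value", "AttributeType": "float"},
        {"AttributeName": "item_id", "AttributeType": "string"},
    ]
}

def row_matches_schema(row, schema):
    """Check the field count and that float attributes parse as numbers."""
    attributes = schema["Attributes"]
    if len(row) != len(attributes):
        return False
    for value, attribute in zip(row, attributes):
        if attribute["AttributeType"] == "float":
            try:
                float(value)
            except ValueError:
                return False
    return True

# One row copied from the data preview earlier in the notebook.
rows = list(csv.reader(io.StringIO("1/1/2014 1:00,38.34991708,bike_12\n")))
print(all(row_matches_schema(row, schema_example) for row in rows))
```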

In [47]:
forecast.describe_dataset(DatasetName=datasetName)

{'DatasetName': 'bike_forecastdemo_bike_ds',
 'DatasetType': 'TARGET_TIME_SERIES',
 'DataFormat': 'CSV',
 'Domain': 'CUSTOM',
 'ScheduleExpression': 'none',
 'DatasetArn': 'arn:aws:forecast:us-west-2:983739021977:ds/bike_forecastdemo_bike_ds',
 'Status': 'ACTIVE',
 'ResponseMetadata': {'RequestId': '7b382a05-3c95-4ab1-a80b-114bfa5d1b6c',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'content-type': 'application/x-amz-json-1.1',
   'date': 'Fri, 11 Jan 2019 16:36:50 GMT',
   'x-amzn-requestid': '7b382a05-3c95-4ab1-a80b-114bfa5d1b6c',
   'content-length': '245',
   'connection': 'keep-alive'},
  'RetryAttempts': 0}}

In [48]:
forecast.create_dataset_group(DatasetGroupName=datasetGroupName,RoleArn=roleArn,DatasetNames=[datasetName])

{'DatasetGroupName': 'bike_forecastdemo_gp',
 'DatasetGroupArn': 'arn:aws:forecast:us-west-2:983739021977:dsgroup/bike_forecastdemo_gp',
 'ResponseMetadata': {'RequestId': '0d10f49b-68d5-462f-af4b-6fd4ec77c22b',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'content-type': 'application/x-amz-json-1.1',
   'date': 'Fri, 11 Jan 2019 16:36:52 GMT',
   'x-amzn-requestid': '0d10f49b-68d5-462f-af4b-6fd4ec77c22b',
   'content-length': '132',
   'connection': 'keep-alive'},
  'RetryAttempts': 0}}

If you have an existing dataset group, you can update it. Describe the group to check which datasets and role it uses.

In [49]:
forecast.describe_dataset_group(DatasetGroupName=datasetGroupName)

{'DatasetGroupName': 'bike_forecastdemo_gp',
 'DatasetGroupArn': 'arn:aws:forecast:us-west-2:983739021977:dsgroup/bike_forecastdemo_gp',
 'Datasets': ['bike_forecastdemo_bike_ds'],
 'RoleArn': 'arn:aws:iam::983739021977:role/amazonforecast',
 'ResponseMetadata': {'RequestId': 'ffbc2f94-6a9f-4377-9b32-e564c9828ec7',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'content-type': 'application/x-amz-json-1.1',
   'date': 'Fri, 11 Jan 2019 16:36:53 GMT',
   'x-amzn-requestid': 'ffbc2f94-6a9f-4377-9b32-e564c9828ec7',
   'content-length': '231',
   'connection': 'keep-alive'},
  'RetryAttempts': 0}}

### Create Data Import Job
This brings the raw data into Amazon Forecast so that it is ready for forecasting.

In [50]:
ds_import_job_response = forecast.create_dataset_import_job(DatasetName=datasetName, Delimiter=',', DatasetGroupName=datasetGroupName, S3Uri=s3DataPath)


In [51]:
ds_versionId=ds_import_job_response['VersionId']
print(ds_versionId)

58438778


Check the status of the dataset import. When the status changes from **CREATING** to **ACTIVE**, we can continue to the next steps. Depending on the data size, this can take 5 to 10 minutes.

In [52]:
while True:
    dataImportStatus = forecast.describe_dataset_import_job(DatasetName=datasetName,VersionId=ds_versionId)['Status']
    print(dataImportStatus)
    if dataImportStatus != 'ACTIVE' and dataImportStatus != 'FAILED':
        sleep(30)
    else:
        break

QUEUED
CREATING
CREATING
CREATING
CREATING
CREATING
CREATING
CREATING
CREATING
CREATING
CREATING
CREATING
CREATING
CREATING
CREATING
CREATING
CREATING
CREATING
CREATING
CREATING
CREATING
FAILED
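The polling loop above recurs several times in this notebook, so it can be factored into a helper with a timeout. The following is a sketch, not part of the Forecast SDK: `wait_for_status` is a hypothetical function that takes any callable returning a status string, and the example replays a fixed list of statuses instead of calling AWS.

```python
import time

def wait_for_status(get_status, done=("ACTIVE", "FAILED"), poll_seconds=30,
                    timeout_seconds=3600, sleep=time.sleep):
    """Poll get_status() until it returns a terminal status or time runs out."""
    waited = 0
    while True:
        status = get_status()
        if status in done:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"still {status} after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds

# Stand-in for a describe_* call: replays a fixed sequence of statuses.
statuses = iter(["QUEUED", "CREATING", "CREATING", "ACTIVE"])
result = wait_for_status(lambda: next(statuses), sleep=lambda seconds: None)
print(result)
```

With the notebook's objects, the callable could be `lambda: forecast.describe_dataset_import_job(DatasetName=datasetName, VersionId=ds_versionId)['Status']`.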


In [53]:
forecast.describe_dataset_import_job(DatasetName=datasetName,VersionId=ds_versionId)

{'DatasetArn': 'arn:aws:forecast:us-west-2:983739021977:ds/bike_forecastdemo_bike_ds',
 'DatasetName': 'bike_forecastdemo_bike_ds',
 'VersionId': '58438778',
 'Status': 'FAILED',
 'FieldStatistics': {'date': {'Count': 0, 'CountDistinct': 0, 'CountNull': 0},
  'item': {'Count': 0, 'CountDistinct': 0, 'CountNull': 0},
  'target': {'Count': 0, 'CountDistinct': 0, 'CountNull': 0}},
 'Message': 'No records were ingested for this data. Please make sure that the data exists, it conforms to the specified schema or headers, and has the correct delimiter.',
 'ResponseMetadata': {'RequestId': '87e717c7-9e7e-4fc1-810b-b68fad1d3961',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'content-type': 'application/x-amz-json-1.1',
   'date': 'Fri, 11 Jan 2019 16:48:12 GMT',
   'x-amzn-requestid': '87e717c7-9e7e-4fc1-810b-b68fad1d3961',
   'content-length': '513',
   'connection': 'keep-alive'},
  'RetryAttempts': 0}}
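When an import ends in `FAILED`, the `FieldStatistics` and `Message` fields of the response above carry the useful details. The sketch below pulls them into a short summary. `summarize_import` is a hypothetical helper, and the sample dictionary copies the relevant fields from the response printed above.

```python
def summarize_import(response):
    """Summarize status, largest per-field count, and message of an import job."""
    stats = response.get("FieldStatistics", {})
    counts = [field.get("Count", 0) for field in stats.values()]
    return {
        "status": response.get("Status"),
        "max_field_count": max(counts, default=0),
        "message": response.get("Message", ""),
    }

# Relevant fields copied from the describe_dataset_import_job output above.
failed_response = {
    "Status": "FAILED",
    "FieldStatistics": {
        "date": {"Count": 0, "CountDistinct": 0, "CountNull": 0},
        "item": {"Count": 0, "CountDistinct": 0, "CountNull": 0},
        "target": {"Count": 0, "CountDistinct": 0, "CountNull": 0},
    },
    "Message": "No records were ingested for this data. Please make sure "
               "that the data exists, it conforms to the specified schema or "
               "headers, and has the correct delimiter.",
}

summary = summarize_import(failed_response)
print(summary["status"], summary["max_field_count"])
```

A count of zero for every field, as here, indicates that nothing was parsed at all, so the schema, timestamp format and delimiter are the first things to check, as the message itself suggests.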

### Recipe

In [None]:
forecast.describe_recipe(RecipeName='forecast_MQRNN')

### Create Solution with custom forecast horizon

The forecast horizon is how far into the future the forecast should predict, counted in time steps at the dataset frequency. For weekly data, a value of 12 means 12 weeks. Our example uses hourly data and we want to forecast the next day, so we set it to 24.
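To make the horizon arithmetic concrete: the horizon counts steps at the dataset frequency, so 24 hourly steps cover one day and 12 weekly steps cover 12 weeks. The sketch below works this out with the standard library. `horizon_end` and the `STEP` table are hypothetical, and the `D` and `W` codes are assumed by analogy with the `H` code used in this notebook.

```python
from datetime import datetime, timedelta

# Assumed mapping from frequency code to step length; only "H" appears
# in this notebook.
STEP = {
    "H": timedelta(hours=1),
    "D": timedelta(days=1),
    "W": timedelta(weeks=1),
}

def horizon_end(last_observation, frequency, horizon):
    """Return the timestamp of the last forecast step."""
    return last_observation + horizon * STEP[frequency]

last = datetime(2014, 1, 1, 4)
print(horizon_end(last, "H", 24))         # 2014-01-02 04:00:00
print(horizon_end(last, "W", 12) - last)  # 84 days, 0:00:00
```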

In [None]:
predictorName= project+'_mqrnn'

In [None]:
forecastHorizon = 24

In [None]:
createPredictorResponse=forecast.create_predictor(RecipeName='forecast_MQRNN',DatasetGroupName= datasetGroupName ,PredictorName=predictorName, 
  ForecastHorizon = forecastHorizon)

In [None]:
predictorVerionId=createPredictorResponse['VersionId']

In [None]:
forecast.list_predictor_versions(PredictorName=predictorName)

Check the status of the solution. When the status changes from **CREATING** to **ACTIVE**, we can continue to the next steps. Depending on data size, model selection and hyperparameters, it can take from 10 minutes to more than one hour to become **ACTIVE**.

In [None]:
while True:
    predictorStatus = forecast.describe_predictor(PredictorName=predictorName,VersionId=predictorVerionId)['Status']
    print(predictorStatus)
    if predictorStatus != 'ACTIVE' and predictorStatus != 'FAILED':
        sleep(30)
    else:
        break

### Get Error Metrics

In [None]:
forecastquery.get_accuracy_metrics(PredictorName=predictorName)

### Deploy Predictor

In [None]:
forecast.deploy_predictor(PredictorName=predictorName)

In [None]:
deployedPredictorsResponse=forecast.list_deployed_predictors()
print(deployedPredictorsResponse)

Please note that the deployed predictor can also take 10 minutes or more to become fully operational. There may be no new output for a while, but that is fine as long as the `*` is shown next to the cell.

In [None]:
while True:
    deployedPredictorStatus = forecast.describe_deployed_predictor(PredictorName=predictorName)['Status']
    print(deployedPredictorStatus)
    if deployedPredictorStatus != 'ACTIVE' and deployedPredictorStatus != 'FAILED':
        sleep(30)
    else:
        break
print(deployedPredictorStatus)

### Get Forecast

When the solution is deployed and forecast results are ready, you can view them. 

In [None]:
forecastResponse = forecastquery.get_forecast(
    PredictorName=predictorName,
    Interval="hour",
    Filters={"item_id":"bike_12"}  # item IDs in this dataset look like bike_12
)
print(forecastResponse)

### Export Forecast

You can batch export the forecast to an S3 bucket. Doing so requires a role with S3 put access, which has already been created.

In [None]:
forecastInfoList= forecast.list_forecasts(PredictorName=predictorName)['ForecastInfoList']
forecastId= forecastInfoList[0]['ForecastId']

In [None]:
outputPath="s3://"+bucketName+"/output"

In [None]:
forecastExportResponse = forecast.create_forecast_export_job(ForecastId=forecastId, OutputPath={"S3Uri": outputPath,"RoleArn":roleArn})

In [None]:
forecastExportJobId = forecastExportResponse['ForecastExportJobId']

In [None]:
while True:
    forecastExportStatus = forecast.describe_forecast_export_job(ForecastExportJobId=forecastExportJobId)['Status']
    print(forecastExportStatus)
    if forecastExportStatus != 'ACTIVE' and forecastExportStatus != 'FAILED':
        sleep(30)
    else:
        break

Check the S3 bucket for results.

In [None]:
s3.list_objects(Bucket=bucketName,Prefix="output")
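The `list_objects` response is a dictionary, and the object keys sit under `Contents`, which is absent when nothing matches the prefix. A small sketch for pulling out just the keys follows. `object_keys` is a hypothetical helper, and the sample response and its file name are made up for illustration.

```python
def object_keys(list_objects_response):
    """Return the object keys from an S3 list_objects response."""
    # "Contents" is missing when no objects match the prefix.
    return [obj["Key"] for obj in list_objects_response.get("Contents", [])]

# Made-up response for illustration; the file name is hypothetical.
sample_listing = {"Contents": [{"Key": "output/part-0.csv", "Size": 1024}]}

print(object_keys(sample_listing))
print(object_keys({}))
```

In the notebook this could be called as `object_keys(s3.list_objects(Bucket=bucketName, Prefix="output"))`.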

### Cleanup

While Forecast is in preview there are no charges for using it. To future-proof this work, the cells below clean up your workspace.

In [None]:
# Delete Deployed Predictor 
forecast.delete_deployed_predictor(PredictorName=predictorName)

In [None]:
# Delete the Predictor: 
forecast.delete_predictor(PredictorName=predictorName)

In [None]:
# Delete Import
forecast.delete_dataset_import(DatasetName=datasetName)

In [None]:
# Delete Dataset Group
forecast.delete_dataset_group(DatasetGroupName=datasetGroupName)