# NYC Taxi Amazon Forecast Quick-Start

> *This notebook should work well in the `Python 3 (Data Science)` kernel in SageMaker Studio, or `conda_python3` in SageMaker Notebook Instances*

This notebook presents a first look at Amazon Forecast, focusing on setting up initial experiments through the AWS Console.

## Pre-requisites

The following assumes:

- You've already deployed the [Improving Forecast Accuracy with Machine Learning Solution](https://aws.amazon.com/solutions/implementations/improving-forecast-accuracy-with-machine-learning/) from AWS Solutions, including the default New York City taxi demo dataset.
- The SageMaker Studio user (or notebook instance) you're running this notebook on has read access to AWS CloudFormation and to your solution's Forecast data bucket.

## Libraries and imports

> ℹ️ The below `pip install` upgrades are required at the time of writing for running the notebook in SageMaker Studio, to avoid bugs in some specific features including reading pandas DataFrames from Amazon S3, and producing DataFrame summaries.

You need to run this cell first. If you modify any installs after `import`ing affected libraries, you may see unexpected errors and need to restart your notebook kernel to ensure a self-consistent state.

In [None]:
!pip install -U numpy pandas matplotlib "s3fs>=2022.01.0"

With the required upgrades complete, we're ready to import the libraries used by this notebook and create clients for the various AWS Services to be used:

In [None]:
%load_ext autoreload
%autoreload 2

# Python Built-Ins:
from io import BytesIO

# External Dependencies:
import boto3  # The AWS SDK for Python
import pandas as pd  # DataFrame (tabular data) utilities

# Local Dependencies:
import util  # Local helper utilities (code in the util/ folder)


# Initialize connectors for AWS services:
forecast = boto3.client("forecast")
forecastquery = boto3.client("forecastquery")
s3 = boto3.resource("s3")
s3client = boto3.client("s3")

## Loading prepared data to Amazon Forecast

Careful, well-engineered data curation and preparation is probably the **most important step** you can take to build highly accurate forecasts, so it's important to dive deep on this topic to prepare for a successful PoC.

**However**, training models and producing results takes time.

So first, let's walk through the process of actually loading data into Amazon Forecast - and revisit data engineering later.

### About the task

In this example, we'll use the New York City Taxi trip record dataset (originally sourced from [nyc.gov](https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page) and also hosted by AWS [on the AWS Open Data Registry](https://registry.opendata.aws/nyc-tlc-trip-records-pds/)).

The forecasting goal will be to predict the **number of pickups in the next 7 days, per hour and per pickup zone** - using yellow taxi data from 2018-12 through 2020-02, to avoid COVID effects.

### Locating the pre-prepared data

If you've deployed the [Improving Forecast Accuracy with Machine Learning Solution](https://aws.amazon.com/solutions/implementations/improving-forecast-accuracy-with-machine-learning/) in your account with the demo stack (NYC Taxi Data Downloader) enabled, your account will already contain a snapshot of this data, prepared and feature-engineered for Amazon Forecast.

This will include:

- The **Target Time Series (TTS)**: The historical data for the actual quantity you want to forecast (number of pickups, by hour and zone)
- The **Item Metadata**: Static metadata for each "item" in the dataset (in this case, pickup zones)
- The **Related Time Series (RTS)**: Time-varying input variables to improve the forecast (in this case, just an extra time featurization)
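
The Solution has already done this preparation for you, but to illustrate the TTS shape, here is a minimal sketch that rolls raw trip records up into one row per hour per zone. It assumes raw records with `tpep_pickup_datetime` and `PULocationID` columns (as in the public yellow taxi files) and leaves out the `geolocation` column; the Solution's actual pipeline may work differently.

```python
import pandas as pd


def trips_to_tts(trips: pd.DataFrame) -> pd.DataFrame:
    """Aggregate raw trip records to hourly pickup counts per zone (illustrative sketch)."""
    hourly = (
        trips.assign(
            # Truncate each pickup time to the start of its hour:
            timestamp=pd.to_datetime(trips["tpep_pickup_datetime"]).dt.floor("60min"),
            # Use the pickup zone ID as the Forecast item_id:
            item_id=trips["PULocationID"].astype(str),
        )
        .groupby(["timestamp", "item_id"])
        .size()
        .rename("target_value")
        .reset_index()
    )
    # Build the full grid of (hour, zone) pairs, so hours with no pickups become explicit zeros:
    full_index = pd.MultiIndex.from_product(
        [
            pd.date_range(hourly["timestamp"].min(), hourly["timestamp"].max(), freq="60min"),
            hourly["item_id"].unique(),
        ],
        names=["timestamp", "item_id"],
    )
    return (
        hourly.set_index(["timestamp", "item_id"])
        .reindex(full_index, fill_value=0)
        .reset_index()
    )


# Tiny hypothetical example: two pickups in zone 79 around midnight, one in zone 132 at 2am
trips = pd.DataFrame({
    "tpep_pickup_datetime": ["2019-01-01 00:05:00", "2019-01-01 00:40:00", "2019-01-01 02:10:00"],
    "PULocationID": [79, 79, 132],
})
print(trips_to_tts(trips))
```

Zero-filling empty hours is a design choice: it is only appropriate if "no record" genuinely means "no pickups" in your source data.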

The code cell below will find these datasets in your environment and check that the files exist on [Amazon S3](https://s3.console.aws.amazon.com/s3/home):

In [None]:
from botocore.exceptions import ClientError  # Raised by boto3 for S3 errors such as "not found"

bucket_name = util.config.find_solution_data_bucket()

tts_s3uri = f"s3://{bucket_name}/train/nyctaxi_weather_auto.csv"
rts_s3uri = f"s3://{bucket_name}/train/nyctaxi_weather_auto.related.csv"
metadata_s3uri = f"s3://{bucket_name}/train/nyctaxi_weather_auto.metadata.csv"


# Check these configured objects exist in S3:
for uri in [tts_s3uri, metadata_s3uri, rts_s3uri]:
    if uri:
        # Check each URI's own bucket and key (not always the TTS file):
        uri_bucket, _, uri_key = uri[len("s3://"):].partition("/")
        try:
            s3.Object(uri_bucket, uri_key).load()
            print(f"Found: {uri}")
        except ClientError as e:
            if e.response["Error"]["Code"] == "404":
                raise ValueError(f"{uri} does not exist in S3!") from e
            else:
                raise

### Create a dataset group

The first step when using Amazon Forecast is to create a [dataset group (DSG)](https://docs.aws.amazon.com/forecast/latest/dg/howitworks-datasets-groups.html#howitworks-datasetgroup), which is like a container for your project.

Each DSG has **one 'slot' per dataset type**: TTS, RTS, and Metadata. So although you can create multiple predictors (models) and forecasts within a DSG, any parallel experiments that compare different data schemas or data extracts will need to run in separate DSGs.

▶️ **Open** the [Amazon Forecast Console](https://console.aws.amazon.com/forecast/home)
  - Check you're in the right **AWS Region** for your experiment (top right)
  - If necessary, go to "Dataset Groups" in the collapsible left sidebar
  - You should see one pre-existing dataset group, created for you by the deployed solution, as shown below:

![](static/imgs/NYC/01-Dataset-Groups.png "Screenshot of Amazon Forecast console showing one pre-created Dataset Group")

▶️ **Create** a **new dataset group** by clicking on the orange button as shown above. Select:
  - **Name:** `nyctaxi_manual`
  - **Domain:** `Custom`

### Define and import the Target Time Series (TTS)

Because the TTS is the only *mandatory* dataset to use Amazon Forecast, you should be automatically directed to define it after creating your Dataset Group.

So what is our example dataset's schema? Let's take a look by running the code cell below:

In [None]:
tts_df = pd.read_csv(
    tts_s3uri,
    header=None,
    names=["timestamp", "item_id", "geolocation", "target_value"]
)
tts_df.head()

▶️ **Complete** the "dataset details" to define your Target Time Series in line with the data:

- **Name:** `nyctaxi_manual_tts`
- **Frequency:** `1 hour`
- **Schema:** ⚠️ Note that this must match the data *including column order!*
    1. Field `timestamp`, of type `timestamp` (with format including the time HH:mm:ss)
    2. Field `item_id`, of type `string`
    3. Field `geolocation`, of type `geolocation` (Lat/Long decimal degrees)
    4. Field `target_value`, of type `float`

![](static/imgs/NYC/02-TTS-Schema.png "Screenshot of Amazon Forecast Console TTS data setup showing entered schema")
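
If you later want to script this step, the same definition maps onto the Forecast `CreateDataset` API. A sketch that builds the request as a plain dictionary mirroring the console settings above; as in the console, the `Attributes` order must match the CSV columns. Sending it assumes the `forecast` client created earlier, so that call is left commented out.

```python
def tts_dataset_request(name: str = "nyctaxi_manual_tts") -> dict:
    """Build CreateDataset arguments matching the TTS console settings above (sketch)."""
    # (column name, Forecast attribute type), in the same order as the CSV columns:
    fields = [
        ("timestamp", "timestamp"),
        ("item_id", "string"),
        ("geolocation", "geolocation"),
        ("target_value", "float"),
    ]
    return {
        "DatasetName": name,
        "Domain": "CUSTOM",
        "DatasetType": "TARGET_TIME_SERIES",
        "DataFrequency": "H",  # 1 hour
        "Schema": {
            "Attributes": [{"AttributeName": n, "AttributeType": t} for n, t in fields]
        },
    }


request = tts_dataset_request()
print(request["Schema"])

# To actually create the dataset (not run here; uses the `forecast` client from earlier):
# dataset_arn = forecast.create_dataset(**request)["DatasetArn"]
# The console also attaches the new dataset to your DSG. Through the API that is a separate
# UpdateDatasetGroup call, which replaces the group's full list of dataset ARNs.
```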

Next, you should automatically be prompted to import the data.

▶️ **Set up** your dataset import as follows, and then click "Start" to kick it off.

- **Name:** `nyctaxi_manual_tts_1` (doesn't matter too much, so long as it's unique)
- **Timezone:** Set time zone `America/New_York` (although for this dataset, sync with geolocation should be equivalent)
- **Data Location:** The `s3://...` URI of the TTS, as shown above
- **IAM Role:** For simplicity, you can `Create a new role` and grant access to `Any S3 bucket`

![](static/imgs/NYC/03-Create-TTS-Import.png)
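
Similarly, these import settings correspond to the `CreateDatasetImportJob` API. In this sketch the dataset ARN, S3 path and role ARN are placeholders to substitute, and the `TimestampFormat` value is an assumption based on the `HH:mm:ss` timestamps shown earlier.

```python
def tts_import_request(dataset_arn: str, s3_uri: str, role_arn: str,
                       name: str = "nyctaxi_manual_tts_1") -> dict:
    """Build CreateDatasetImportJob arguments mirroring the console settings above (sketch)."""
    return {
        "DatasetImportJobName": name,  # Must be unique, as in the console
        "DatasetArn": dataset_arn,
        "DataSource": {"S3Config": {"Path": s3_uri, "RoleArn": role_arn}},
        "TimestampFormat": "yyyy-MM-dd HH:mm:ss",  # Assumed from the data's timestamp format
        "TimeZone": "America/New_York",
        "GeolocationFormat": "LAT_LONG",  # Lat/Long decimal degrees
    }


# Placeholder ARNs and path, for illustration only:
request = tts_import_request(
    dataset_arn="arn:aws:forecast:???:???:dataset/???",
    s3_uri="s3://data-bucket-???/train/nyctaxi_weather_auto.csv",
    role_arn="arn:aws:iam::???:role/???",
)
# forecast.create_dataset_import_job(**request)  # Not run here
```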

### Define and import Item Metadata

While it's possible to start training models (predictors) with only a target time-series, adding additional *relevant* data of other types can help improve your forecast.

So next, we'll import the other optional dataset types, starting with **item metadata**.

This dataset enables you to define **static** metadata for each "item": Whatever that means for your use case (such as product SKUs for retail demand forecasting, or pickup zones in this example case).

This metadata helps Amazon Forecast understand **connections** between item IDs in your dataset, to improve performance when using algorithms like DeepAR+ or CNN-QR which can learn global (cross-item) patterns.

As shown below, the example metadata includes the city borough where each pickup zone is located, and a category for each zone:

In [None]:
metadata_df = pd.read_csv(
    metadata_s3uri,
    header=None,
    names=["item_id", "pickup_borough", "binned_max_item"]
)
metadata_df.head()
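
Item metadata only helps for the items it actually describes, so a quick consistency check between the two files can be worthwhile before importing. A minimal sketch, assuming DataFrames shaped like `tts_df` and `metadata_df` (each with an `item_id` column); it only reports mismatches and changes nothing.

```python
import pandas as pd


def compare_item_ids(tts: pd.DataFrame, metadata: pd.DataFrame) -> dict:
    """Report item IDs that appear in only one of the TTS and the item metadata."""
    # Compare as strings, since IDs may be parsed as int in one file and str in another:
    tts_ids = set(tts["item_id"].astype(str))
    meta_ids = set(metadata["item_id"].astype(str))
    return {
        "missing_metadata": sorted(tts_ids - meta_ids),
        "metadata_only": sorted(meta_ids - tts_ids),
        # Items described by more than one metadata row:
        "duplicated_metadata": sorted(
            metadata.loc[metadata["item_id"].duplicated(), "item_id"].astype(str).unique()
        ),
    }


# Usage with the DataFrames loaded in this notebook:
# compare_item_ids(tts_df, metadata_df)
```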

▶️ **Open** your newly created `nyctaxi_manual` DSG and select **Datasets** from the left sidebar to view the details of available dataset slots:

![](static/imgs/NYC/04-Datasets-TTSOnly.png "Screenshot of DSG 'datasets' page showing unused metadata and RTS slots")

▶️ **Click** the "Upload dataset" button and select `ITEM_METADATA`

▶️ **Configure** your dataset details as below:

- **Name**: `nyctaxi_manual_metadata`
- **Schema**: (With order matching the dataset above, remember!)
  1. Field `item_id` of type `string`
  2. Field `pickup_borough` of type `string`
  3. Field `binned_max_item` of type `string`

![](static/imgs/NYC/05-Metadata-Schema.png "Screenshot of Item Metadata dataset schema definition")

▶️ **Set up** your dataset import as follows, and then click "Start" to kick it off.

- **Name**: `nyctaxi_manual_metadata_1`
- **Data Location**: The s3://... URI of the **Item Metadata**, as shown earlier
- **IAM Role**: Use the same role you created earlier (should show in drop-down)

![](static/imgs/NYC/06-Create-Metadata-Import.png "Screenshot of Item Metadata import job settings")

### Define and import Related Time Series (RTS)

The **Related Time Series** is where we can define **time-varying** input variables for the forecast: For example describing item prices and discounts in retail, or custom featurizations of factors such as economic environment, competitor pressure indicators, or public health concerns.

Although some algorithms in Amazon Forecast are able to benefit from [historical-only Related Time Series](https://docs.aws.amazon.com/forecast/latest/dg/related-time-series-datasets.html#related-time-series-historical-futurelooking), RTS are generally **much more valuable when we also provide them throughout the future forecast window**.

This means the end date of your RTS dataset will usually be later than that of your TTS, since it includes known or predicted inputs over the period you're trying to forecast.
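
A quick way to confirm an RTS actually extends through the forecast window is to compare end dates. A minimal sketch, assuming DataFrames shaped like `tts_df` and `rts_df` (each with a `timestamp` column) and the 168-hour horizon used in this example; it checks the dataset as a whole, so a stricter version might repeat the check per `item_id`.

```python
import pandas as pd


def rts_covers_horizon(tts: pd.DataFrame, rts: pd.DataFrame, horizon_hours: int = 168) -> bool:
    """Check the RTS extends at least `horizon_hours` beyond the last TTS timestamp."""
    tts_end = pd.to_datetime(tts["timestamp"]).max()
    rts_end = pd.to_datetime(rts["timestamp"]).max()
    return rts_end >= tts_end + pd.Timedelta(hours=horizon_hours)


# Usage once `rts_df` is loaded below:
# rts_covers_horizon(tts_df, rts_df)
```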

One other idea you might consider for a Related Time Series is *weather* data... But in this example, we'll use the already-provided [Amazon Forecast Weather Index](https://docs.aws.amazon.com/forecast/latest/dg/weather.html).

So for the example dataset, our RTS includes just one field - a custom time-of-day featurization, as shown below:

In [None]:
rts_df = pd.read_csv(
    rts_s3uri,
    header=None,
    names=["timestamp", "item_id", "geolocation", "day_hour_name"]
)
rts_df.head()

▶️ **Click** the "Upload dataset" button from the *Datasets* screen as we showed before, and this time select `RELATED_TIME_SERIES` (if prompted)

▶️ **Configure** your dataset details as below:

- **Name**: `nyctaxi_manual_rts`
- **Frequency**: `1 hour`
- **Fields:** (With order matching the dataset above, remember!)
    1. Field `timestamp` of type `timestamp` (including HH:mm:ss)
    2. Field `item_id` of type `string`
    3. Field `geolocation` of type `geolocation` (Lat/Long decimal)
    4. Field `day_hour_name` of type `string`

![](static/imgs/NYC/07-RTS-Schema.png "Screenshot of Related Time Series schema definition")

▶️ **Set up** your dataset import as follows, and then click "Start" to kick it off.

- **Name**: `nyctaxi_manual_rts_1`
- **Data Location**: The s3://... URI of the **RTS**, as shown earlier
- **IAM Role**: Use the same role you created earlier (should show in drop-down)

![](static/imgs/NYC/08-Create-RTS-Import.png "Screenshot of Related Time Series import job settings")

### Wait for all import jobs to complete

After defining schemas and starting import jobs for all 3 dataset types (TTS, Metadata, and RTS) - you'll be able to check on import status from the **Datasets** list within your Dataset Group:

![](static/imgs/NYC/09-Waiting-For-RTS-Import.png "Screenshot of datasets showing RTS not yet finished importing")

▶️ **Wait** for all 3 datasets to show as `Active` **AND** for their "Last import status" to be `Active` also, before moving on to start training predictors.

- ⏰ Dataset imports may take several minutes to complete, and do not scale linearly with dataset size (because of overheads in creating the jobs, and the scalability of underlying infrastructure for large jobs)

> ℹ️ **Note:** You can click on the name hyperlink of your datasets to view more details: Including summary statistics, the history of import jobs, and also an ability to **export** the data to S3 again:

![](static/imgs/NYC/10-RTS-Dataset-Details.png "Screenshot of RTS dataset details page showing statistics and export option")

## Train Forecast models ('Predictors')

Once your datasets are defined and imported, you're ready to start training forecasting models.

▶️ If the `Predictors` item in the left sidebar is available, you can select that and click the orange "Train new predictor" button to get started. Otherwise, go to your DSG's `Dashboard` and click **Start predictor training**

![](static/imgs/NYC/11-Dashboard-Start-Predictors.png "DSG Dashboard screenshot showing 'start' button for predictor training")

### Train an AutoPredictor

Amazon Forecast [AutoPredictors](https://docs.aws.amazon.com/forecast/latest/dg/howitworks-predictor.html) automatically train multiple [forecasting algorithms](https://docs.aws.amazon.com/forecast/latest/dg/aws-forecast-choosing-recipes.html) on your dataset and produce an **ensemble model** combining these algorithms to deliver the most accurate results for each item in your forecast.

▶️ **Configure** your Predictor as below and then click 'Create'

- **Name**: `nyctaxi_manual_autopredictor`
- **Forecast Frequency**: `1 hour`
- **Forecast Horizon**: `168` (7 days at 24 hours per day)
- **Forecast Dimensions**: Add `geolocation`
- **Forecast Quantiles**: Leave as default `0.1, 0.5, 0.9`
- **AutoPredictor** `Enabled`
- **Optimization Metric**: Blank or `MAPE`
- **Explainability**: `Enabled`
- **Weather Index**: `Enabled`
- **Holidays**: `Enabled` (United States)

![](static/imgs/NYC/12-AutoPredictor-Part-1.png "Screenshot of AutoPredictor configuration - first section")

![](static/imgs/NYC/13-AutoPredictor-Part-2.png "Screenshot of AutoPredictor configuration - second section")
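
For reference, these AutoPredictor settings roughly correspond to the `CreateAutoPredictor` API. A sketch of the request as a plain dictionary, assuming you substitute your own dataset group ARN; the holiday and weather options are expressed as "additional datasets" in this API.

```python
def auto_predictor_request(dataset_group_arn: str) -> dict:
    """Build CreateAutoPredictor arguments mirroring the console settings above (sketch)."""
    return {
        "PredictorName": "nyctaxi_manual_autopredictor",
        "ForecastFrequency": "H",
        "ForecastHorizon": 7 * 24,  # 7 days of hourly steps = 168
        "ForecastDimensions": ["geolocation"],
        "ForecastTypes": ["0.1", "0.5", "0.9"],  # Default quantiles
        "OptimizationMetric": "MAPE",
        "ExplainPredictor": True,
        "DataConfig": {
            "DatasetGroupArn": dataset_group_arn,
            "AdditionalDatasets": [
                {"Name": "holiday", "Configuration": {"CountryCode": ["US"]}},
                {"Name": "weather"},
            ],
        },
    }


request = auto_predictor_request("arn:aws:forecast:???:???:dataset-group/???")  # Placeholder ARN
# forecast.create_auto_predictor(**request)  # Not run here
```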

### Train a Legacy Predictor (Prophet Algorithm)

Amazon Forecast [Legacy Predictors](https://docs.aws.amazon.com/forecast/latest/dg/howitworks-predictor.html#legacy-predictors) select a single algorithm for prediction, rather than producing an ensemble model: Either manually selected or found via an AutoML process somewhat similar to AutoPredictor.

While AutoPredictors generally deliver [higher accuracy and also support explainability](https://aws.amazon.com/blogs/machine-learning/new-amazon-forecast-api-that-creates-up-to-40-more-accurate-forecasts-and-provides-explainability/), Legacy Predictors are of interest to us here because being able to manually specify an algorithm will help us demonstrate a (less good) trained model **faster**.

> ℹ️ **Note:** AutoPredictors and Legacy Predictors also carry different [pricing](https://aws.amazon.com/forecast/pricing/), reflecting their different resource utilizations.

For this example, we'll train a Legacy Predictor with the statistical, non-deep-learning [Prophet algorithm](https://docs.aws.amazon.com/forecast/latest/dg/aws-forecast-recipe-prophet.html) to demonstrate some quick results.

▶️ **Click** 'Train Predictor' again from your DSG Dashboard, or go to 'Predictors' from the left sidebar and click 'Train new predictor'

▶️ **Configure** your Predictor as below and then click 'Create'

- **Name**: `nyctaxi_manual_prophet`
- **Forecast Frequency**: `1 hour`
- **Forecast Horizon**: `168` (7 days at 24 hours per day)
- **Forecast Dimensions**: Add `geolocation`
- **Forecast Quantiles**: Leave as default `0.1, 0.5, 0.9`
- **AutoPredictor** `DISABLED`
- **Optimization Metric**: Blank or `MAPE`
- **Number of Backtest Windows:** `3`
- **Backtest Window Offset:** `168` (Same as Forecast Horizon)
- **Weather Index**: `Enabled`
- **Holidays**: `Enabled` (United States)

> ⚠️ **NOTE: If you no longer see the option to disable AutoPredictor**
>
> If you don't see any UI option to disable AutoPredictor and create a legacy predictor, you'll need to perform this step through the APIs instead. You can run the code cell after these screenshots to create your predictor programmatically.

![](static/imgs/NYC/14-Prophet-Part-1.png "Screenshot of Prophet predictor configuration - first section")

![](static/imgs/NYC/15-Prophet-Part-2.png "Screenshot of Prophet predictor configuration - second section")

![](static/imgs/NYC/16-Prophet-Part-3.png "Screenshot of Prophet predictor configuration - third section")

> ⚠️ **Alternative, programmatic method for creating legacy predictors**
>
> If you don't see a UI option to create a legacy predictor (as shown above), you can create one programmatically as follows:
>
> 1. From the [Dataset groups list](https://console.aws.amazon.com/forecast/home?#datasetGroups), find **your dataset group's ARN** and copy/paste it over the placeholder below
> 2. Run the code cell below to create the predictor: After which you should see it appear in your DSG's "predictors" list alongside the AutoPredictor we created earlier.

In [None]:
# IF YOU CREATED YOUR LEGACY PREDICTOR THROUGH THE AWS CONSOLE, YOU DO NOT NEED TO RUN THIS CELL
# (See details above)

dataset_group_arn = "arn:aws:forecast:???:???:dataset-group/???"  # TODO: Replace with your DSG ARN

prophet_create_predictor_response = forecast.create_predictor(
    PredictorName="nyctaxi_manual_prophet",
    AlgorithmArn="arn:aws:forecast:::algorithm/Prophet",
    ForecastHorizon=168,
    PerformAutoML=False,
    PerformHPO=False,
    EvaluationParameters={
        "NumberOfBacktestWindows": 3,
        "BackTestWindowOffset": 168,
    },
    InputDataConfig={
        "DatasetGroupArn": dataset_group_arn,
        "SupplementaryFeatures": [
            { "Name": "holiday", "Value": "US" },
            { "Name": "weather", "Value": "true" },
        ],
    },
    FeaturizationConfig={
        "ForecastFrequency": "H",
        "ForecastDimensions": ["geolocation"],
        "Featurizations": [
            {
                "AttributeName": "target_value",
                "FeaturizationPipeline": [
                    {
                        "FeaturizationMethodName": "filling",
                        "FeaturizationMethodParameters": {
                            "frontfill": "none",
                            "middlefill": "zero",
                            "backfill": "zero",
                        },
                    },
                ],
            },
        ],
    },
)
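
Training can take a considerable time. A small polling helper, sketched here as an assumption rather than part of the Solution, waits for a Forecast resource to become `ACTIVE`. It takes any zero-argument function returning a `describe_*` response with a `Status` field, so the same helper could wrap `describe_predictor`, `describe_dataset_import_job` or `describe_forecast`.

```python
import time
from typing import Callable


def wait_until_active(describe: Callable[[], dict], poll_seconds: float = 60,
                      timeout_seconds: float = 6 * 3600) -> dict:
    """Poll a Forecast describe_* call until Status is ACTIVE; raise on failure or timeout."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        description = describe()
        status = description["Status"]
        if status == "ACTIVE":
            return description
        # Terminal non-success states, e.g. CREATE_FAILED or CREATE_STOPPED:
        if status.endswith(("FAILED", "STOPPED")):
            raise RuntimeError(f"Resource entered {status}: {description.get('Message', '')}")
        if time.monotonic() > deadline:
            raise TimeoutError(f"Still {status} after {timeout_seconds} seconds")
        time.sleep(poll_seconds)


# Usage with the predictor created above (not run here):
# predictor_arn = prophet_create_predictor_response["PredictorArn"]
# wait_until_active(lambda: forecast.describe_predictor(PredictorArn=predictor_arn))
```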

### (Optional) Train other predictors

You could also explore training Legacy Predictors for other algorithms, such as the neural algorithms DeepAR+ and CNN-QR.

However, note that if the AWS Solution Demo is deployed successfully in your account, you'll probably *already have* a trained DeepAR+ predictor in the **pre-created `nyctaxi_weather_auto` dataset group**:

![](static/imgs/NYC/17-Pre-Created-DeepAR.png "Screenshot of automatically created DSG with DeepAR+ predictor")

## Evaluating predictor accuracy with backtests

You might consider holding out the final section of your historical data during preparation, and reconciling produced forecasts to this to calculate model accuracy: A process called [backtesting](https://en.wikipedia.org/wiki/Backtesting).

However, Amazon Forecast already performs backtesting internally (to produce [predictor metrics](https://docs.aws.amazon.com/forecast/latest/dg/metrics.html) and compare algorithms) - so you can instead **export the backtest data** to avoid duplicating this effort.
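
Forecast computes accuracy metrics for you, but to make "backtest accuracy" concrete, here is a plain-Python sketch of three of them, following our reading of the definitions on the predictor metrics page linked above: RMSE, WAPE (total absolute error relative to total actual demand), and weighted quantile loss (wQL), which scores a quantile forecast by weighting under-forecasts by τ and over-forecasts by 1 - τ. Check the official page for details such as which forecast type each metric is computed on.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def wape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Weighted absolute percentage error: total absolute error over total absolute actuals."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / sum(abs(a) for a in actual)


def weighted_quantile_loss(actual: Sequence[float], quantile_pred: Sequence[float],
                           tau: float) -> float:
    """wQL at quantile tau: under-forecasts weighted by tau, over-forecasts by (1 - tau)."""
    loss = sum(
        tau * max(a - q, 0) + (1 - tau) * max(q - a, 0)
        for a, q in zip(actual, quantile_pred)
    )
    return 2 * loss / sum(abs(a) for a in actual)


actual = [10, 20, 30]
predicted = [12, 18, 30]
print(rmse(actual, predicted), wape(actual, predicted), weighted_quantile_loss(actual, predicted, 0.5))
```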

Your automatically-created predictor (as shown above) should **already have backtest results exported** - so let's explore those to start.

▶️ **Click** on the `nyctaxi_weather_auto` predictor's name hyperlink as shown above, to open the predictor detail page

▶️ **Scroll down** to the "Predictor backtest exports" section, where you should find an `Active` (completed) job.

▶️ **Replace** the dummy S3 URI below with the "Location" of this export, and run the cell.

In [None]:
backtest_s3uri = "s3://data-bucket-???/exports/???"  # TODO: Replace with your completed export

![](static/imgs/NYC/18-Export-Location.png "Screenshot showing backtest export location on Predictor details page")

Backtest exports include both detailed accuracy metric breakdowns, and individual predictions versus actual values. Both are partitioned across multiple CSV files in Amazon S3.

For example, to explore metric details:

In [None]:
def df_from_s3_prefix(prefix_s3uri: str) -> pd.DataFrame:
    """Read all .csv files under an S3 URI prefix into one pandas DataFrame."""
    bucket_name, _, prefix = prefix_s3uri[len("s3://"):].partition("/")
    prefix_objs = s3.Bucket(bucket_name).objects.filter(Prefix=prefix)
    prefix_dfs = []
    for obj in prefix_objs:
        key = obj.key
        if not key.lower().endswith(".csv"):
            print(f"Skipping file {key}")
            continue
        body = obj.get()["Body"].read()
        prefix_dfs.append(pd.read_csv(BytesIO(body), encoding="utf8"))
    if not prefix_dfs:
        raise ValueError(f"No .csv files found under {prefix_s3uri}")
    # ignore_index avoids repeated row labels from the separate CSV shards:
    return pd.concat(prefix_dfs, ignore_index=True)


backtest_metrics_df = df_from_s3_prefix(f"{backtest_s3uri}/accuracy-metrics-values")

In [None]:
print("Items by RMSE aggregated across all backtest windows:\n")
backtest_summaries_by_rmse = backtest_metrics_df[
    backtest_metrics_df["backtest_window"] == "Summary"
].sort_values(["RMSE"], ascending=True)

print("Best:")
display(backtest_summaries_by_rmse.head())

print("Worst:")
display(backtest_summaries_by_rmse.tail())

Or alternatively, to produce plots of predicted values for particular items:

In [None]:
backtest_values_df = df_from_s3_prefix(f"{backtest_s3uri}/forecasted-values")
backtest_values_df["timestamp"] = pd.to_datetime(backtest_values_df["timestamp"])

print("\nBacktest window start timestamps:")
backtest_window_starts = backtest_values_df["backtestwindow_start_time"].unique().tolist()
print(backtest_window_starts)

In [None]:
target_item_id = 79  # Change this to explore different items
window_start_time = backtest_window_starts[0]  # Change this to explore different backtests

item_backtest_values = backtest_values_df[
    (backtest_values_df["item_id"] == target_item_id)
    & (backtest_values_df["backtestwindow_start_time"] == window_start_time)
].set_index("timestamp")
item_backtest_values.head()

In [None]:
util.plot_forecasts(
    item_backtest_values,
    actuals=item_backtest_values.rename(columns={ "target_value": "actual" }),
    ylabel="Number of Pickups"
)

### Create your own backtest exports

Above, we explored backtest results that were created automatically in the pre-built dataset group. Once your own predictors finish training (showing `Active` status in the AWS Console), how can you produce such exports yourself?

▶️ **Open** the "Predictors" list for your dataset group and **Check** that your target predictor is showing in `Active` status (finished training) - and not still `Create in progress` or similar.

▶️ **Click** on your predictor's name to open its detail page and then find the **Export backtest results** button a little way down the page:

![](static/imgs/NYC/19-Export-Backtest-Button.png "Screenshot of Prophet Predictor detail page showing 'Export backtest' button")

In [None]:
print("You'll export your backtests under:")
print(f"s3://{bucket_name}/exports/[...]")

▶️ **Configure** your export as follows and then click "Start"

- **Name**: Just like dataset import jobs, this needs to be unique. For example, you could use `nyctaxi_manual_prophet_backtest_1`
- **IAM Role**: The same IAM Role we created before should be available in the drop-down
- **KMS Key**: Can be left blank in this example
- **Export Location**: Specify a unique folder under `/exports/` in your data bucket, such as `/exports/nyctaxi_manual_prophet_backtest_1` (No trailing slash)
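
The same export can be requested through the `CreatePredictorBacktestExportJob` API. A sketch of the request, assuming placeholder predictor and role ARNs; it builds the export path under `/exports/` with no trailing slash, as recommended above.

```python
def backtest_export_request(predictor_arn: str, bucket: str, role_arn: str,
                            name: str = "nyctaxi_manual_prophet_backtest_1") -> dict:
    """Build CreatePredictorBacktestExportJob arguments mirroring the settings above (sketch)."""
    return {
        "PredictorBacktestExportJobName": name,  # Must be unique
        "PredictorArn": predictor_arn,
        "Destination": {
            "S3Config": {
                # Unique folder under /exports/, named after the job, no trailing slash:
                "Path": f"s3://{bucket}/exports/{name}",
                "RoleArn": role_arn,
            }
        },
    }


request = backtest_export_request(
    "arn:aws:forecast:???:???:predictor/???", "data-bucket-???", "arn:aws:iam::???:role/???"
)
# forecast.create_predictor_backtest_export_job(**request)  # Not run here
```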

Once your export job shows as "Active" in the console (like the pre-created job we saw before), you can re-use the code above to visualize the results.

## Inference: Forecasts and forecast exports

So far we've evaluated our models based on past data, but what about when you're ready to actually forecast the future?

Let's continue with your manually-created Prophet predictor as an example:

### Create a Forecast

▶️ **Click** "Create a Forecast", either by selecting your predictor from the Dataset Group's "Predictors" list or from the Dataset Group dashboard.

▶️ **Configure** your Forecast as follows and click "Start":

- **Name**: `nyctaxi_manual_prophet_forecast_1`
- **Predictor**: Ensure your `Prophet` predictor is selected
- **Quantiles**: Enter `0.1, 0.5, 0.9, mean`

![](static/imgs/NYC/20-Create-Prophet-Forecast.png "Create Forecast configuration for Prophet predictor")
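
Programmatically, this step corresponds to the `CreateForecast` API. A minimal sketch, assuming a placeholder predictor ARN; quantiles are passed as strings, and `"mean"` requests the mean forecast alongside them.

```python
def forecast_request(predictor_arn: str, name: str = "nyctaxi_manual_prophet_forecast_1") -> dict:
    """Build CreateForecast arguments mirroring the console settings above (sketch)."""
    return {
        "ForecastName": name,
        "PredictorArn": predictor_arn,
        "ForecastTypes": ["0.1", "0.5", "0.9", "mean"],
    }


request = forecast_request("arn:aws:forecast:???:???:predictor/???")  # Placeholder ARN
# forecast.create_forecast(**request)  # Not run here
```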

> ⏰ This operation creates a forward-looking forecast for each item in your dataset, and can also take some time. It's actually possible to update DSG datasets and re-forecast *without re-training* a predictor, although accuracy will generally be best if you re-train each time.

When a Forecast is `Active`, you can query it in real time using the Amazon Forecast [QueryForecast API](https://docs.aws.amazon.com/forecast/latest/dg/API_forecastquery_QueryForecast.html).

(While waiting for your Prophet forecast to complete, you can try out the code below with the pre-created forecast in the `nyctaxi_weather_auto` dataset group.)

> ℹ️ **Note:** The "Forecast ARN" required below is listed near the top of the detail page for your particular Forecast

In [None]:
forecast_arn = "arn:aws:forecast:???:???:forecast/???"  # TODO: Replace with your Forecast ARN
item_id = 79  # Change this to explore different items

In [None]:
fcstresult = forecastquery.query_forecast(
    ForecastArn=forecast_arn,
    Filters={ "item_id": str(item_id) },
)

quantiles = [k for k in fcstresult["Forecast"]["Predictions"]]
print(f"Got forecast quantiles {quantiles} for item {item_id}")

In [None]:
fcst_df = pd.DataFrame({
    "timestamp": [
        o["Timestamp"]
        for o in fcstresult["Forecast"]["Predictions"][quantiles[0]]
    ],
    "item_id": item_id,
    **{
        k: [o["Value"] for o in fcstresult["Forecast"]["Predictions"][k]]
        for k in quantiles
    },
})
fcst_df["timestamp"] = pd.to_datetime(fcst_df["timestamp"])
fcst_df.set_index("timestamp", inplace=True)

display(fcst_df.head())

util.plot_forecasts(
    fcst_df,
    ylabel="Number of Pickups"
)

### Exporting Forecasts

While filtering and querying forecasts in real time through the API may be useful in some cases, other applications may want to export the forecast results in bulk.

Once a Forecast is ready, Amazon Forecast supports creating a [Forecast Export Job](https://docs.aws.amazon.com/forecast/latest/dg/API_CreateForecastExportJob.html) to save the results to partitioned CSV files in Amazon S3, much like we saw earlier with backtest exports.

> ⚠️ **Note:** Because active forecasts are available through real-time APIs, a [quota limit](https://docs.aws.amazon.com/forecast/latest/dg/limits.html) applies to how many forecasts you can have "active" at one time. If you intend to consume your results only via batch processes, you can delete your active Forecasts once their export jobs are complete.

When your Forecast is ready, it will show as "Active" in the Forecasts list as below:

![](static/imgs/NYC/21-Prophet-Forecast-Active.png "Screenshot of DSG 'Forecasts' list showing active Prophet forecast")

In [None]:
print("You'll export your forecasts under:")
print(f"s3://{bucket_name}/exports/[...]")
print("(Like we used for backtests earlier)")

▶️ **Select** your forecast from the list and click `Create forecast export` to create an export job.

▶️ **Configure** your export job as follows, and click 'Start':

- **Name**: `nyctaxi_manual_prophet_forecast_1` - This should be unique, and (since we're using the same S3 folder here) not confusable with backtest results.
- **IAM Role**: Same as earlier steps
- **Export Location**: Use the job name under the `/exports/` folder as we did earlier: E.g. `/exports/nyctaxi_manual_prophet_forecast_1`.

![](static/imgs/NYC/22-Create-Forecast-Export.png "Screenshot of forecast export job settings")
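
Through the API, the export is a `CreateForecastExportJob` request. A sketch assuming placeholder forecast and role ARNs; the commented-out `delete_forecast` call reflects the quota note above and should only be run once the export job is `Active` and you no longer need real-time queries.

```python
def forecast_export_request(forecast_arn: str, bucket: str, role_arn: str,
                            name: str = "nyctaxi_manual_prophet_forecast_1") -> dict:
    """Build CreateForecastExportJob arguments mirroring the settings above (sketch)."""
    return {
        "ForecastExportJobName": name,
        "ForecastArn": forecast_arn,
        # Same /exports/ convention as the backtest exports:
        "Destination": {
            "S3Config": {"Path": f"s3://{bucket}/exports/{name}", "RoleArn": role_arn}
        },
    }


request = forecast_export_request(
    "arn:aws:forecast:???:???:forecast/???", "data-bucket-???", "arn:aws:iam::???:role/???"
)
# forecast.create_forecast_export_job(**request)  # Not run here
# For batch-only consumption, after the export completes (not run here):
# forecast.delete_forecast(ForecastArn=request["ForecastArn"])
```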

Once created, the status of Forecast Export jobs can be checked from each Forecast's detail page:

![](static/imgs/NYC/23-Forecast-Export-Ongoing.png "Screenshot of Prophet forecast detail page showing in-progress export job")

When the job shows as "Active", the export is complete.

As with backtest exports earlier, the result is typically multiple sharded CSV files under your target prefix in S3. While waiting for your export to complete, you could explore the previously-created forecast export in the `nyctaxi_weather_auto` dataset group instead:

In [None]:
forecast_export_s3uri = "s3://data-bucket-???/exports/???/"  # TODO: Replace with your completed export location

In [None]:
forecast_export_df = df_from_s3_prefix(forecast_export_s3uri)
forecast_export_df["date"] = pd.to_datetime(forecast_export_df["date"])
forecast_export_df.rename(columns={ "date": "timestamp" }, inplace=True)
forecast_export_df.head()

In [None]:
item_id = 79

util.plot_forecasts(
    forecast_export_df[forecast_export_df["item_id"] == item_id].set_index("timestamp"),
    ylabel="Number of Pickups"
)

## Wrapping up and next steps

In this walkthrough we skimmed over data preparation and formatting, to help you get familiar with using the Amazon Forecast service through the AWS Console.

As you've probably realised from our work with the auto-created dataset group, it's possible to automate all of these tasks through the [Amazon Forecast APIs](https://docs.aws.amazon.com/forecast/latest/dg/api-reference.html) instead - and the [AWS Solution Implementation 'Improving Forecast Accuracy with Machine Learning'](https://aws.amazon.com/solutions/implementations/improving-forecast-accuracy-with-machine-learning/) provides an example implementation chaining these operations together in a workflow orchestrated by [AWS Step Functions](https://aws.amazon.com/step-functions/). If you open up the [Step Functions Console](https://console.aws.amazon.com/states/home?#/statemachines) you should be able to view the pipeline executed for this automated setup.

Now that you have a high-level overview of working with Amazon Forecast, and a reference implementation for automating the workflow, it's likely the most important topic to explore further is how to prepare your data to work correctly with the service and deliver the best model accuracy possible.

Good resources for this include:

- The [Importing Datasets section of the Amazon Forecast Developer Guide](https://docs.aws.amazon.com/forecast/latest/dg/howitworks-datasets-groups.html)
- The [pre-PoC workshop notebooks](https://github.com/aws-samples/amazon-forecast-samples/tree/master/workshops/pre_POC_workshop) on the official [Amazon Forecast Samples GitHub repository](https://github.com/aws-samples/amazon-forecast-samples/tree/master)
- The 'Data Diagnostic' notebook in this repository - which can help analyze your TTS for common problems like sparsity of data points.