# SageMaker JumpStart - Deploy Chronos endpoints to AWS for production use

In this demo notebook, we will walk through the process of using the **SageMaker Python SDK** to deploy a **Chronos** model to a cloud endpoint on AWS. To simplify deployment, we will leverage **SageMaker JumpStart**.

### Why Deploy to an Endpoint?
So far, we’ve seen how to run models locally, which is useful for experimentation. However, in a production setting, a forecasting model is typically just one component of a larger system. Running models locally doesn’t scale well and lacks the reliability needed for real-world applications.

To address this, we deploy models as **endpoints** on AWS. An endpoint acts as a **hosted service**—we can send it requests (containing time series data), and it returns forecasts in response. This allows seamless integration into production workflows, ensuring scalability and real-time inference.

## Deploy the model

First, update the SageMaker SDK to access the latest models:

In [1]:
!pip install -U -q sagemaker

We create a `JumpStartModel` with the necessary configuration based on the model ID. The key parameters are:
- `model_id`: Specifies the model to use. Here, we choose the [Chronos-Bolt (Base)](https://huggingface.co/amazon/chronos-bolt-base) model. Currently, the following model IDs are supported:
  - `autogluon-forecasting-chronos-bolt-base` - [Chronos-Bolt (Base)](https://huggingface.co/amazon/chronos-bolt-base).
  - `autogluon-forecasting-chronos-bolt-small` - [Chronos-Bolt (Small)](https://huggingface.co/amazon/chronos-bolt-small).
  - [Original Chronos models](https://huggingface.co/amazon/chronos-t5-small) in sizes `small`, `base` and `large` can be accessed, e.g., as `autogluon-forecasting-chronos-t5-small`. Note that these models require a GPU to run, are much slower and don't support covariates. Therefore, for most practical purposes we recommend using Chronos-Bolt models instead.
- `instance_type`: Defines the AWS instance for serving the endpoint. We use `ml.c5.2xlarge` to run the model on CPU. Other CPU options include `ml.m5.xlarge` and `ml.m5.4xlarge`; to run on a GPU, select an instance such as `ml.g5.2xlarge`. Pricing for the different SageMaker instance types for real-time inference is listed on the [SageMaker AI pricing page](https://aws.amazon.com/sagemaker-ai/pricing/).

The `JumpStartModel` will automatically set the necessary attributes such as `image_uri` based on the chosen `model_id` and `instance_type`.

In [7]:
from sagemaker.jumpstart.model import JumpStartModel

model_id = "autogluon-forecasting-chronos-bolt-base"

model = JumpStartModel(
    model_id=model_id,
    instance_type="ml.c5.2xlarge",
)

Next, we deploy the model and create an endpoint. Deployment typically takes a few minutes, as SageMaker provisions the instance, loads the model, and sets up the endpoint for inference.


In [None]:
predictor = model.deploy()

> **Note:** Once the endpoint is deployed, it remains active and incurs charges on your AWS account until it is deleted. The cost depends on factors such as the instance type, the region where the endpoint is hosted, and the duration it remains running. To avoid unnecessary charges, make sure to delete the endpoint when it is no longer needed. For detailed pricing information, refer to the [SageMaker AI pricing page](https://aws.amazon.com/sagemaker-ai/pricing/).


If the previous step results in an error, you may need to update the model configuration. For example, explicitly passing a SageMaker execution role (an IAM role name or ARN) as `role` when creating the `JumpStartModel` ensures that SageMaker can access the necessary AWS resources.

In [None]:
# model = JumpStartModel(role="your-sagemaker-execution-role", model_id=model_id, instance_type="ml.c5.2xlarge")

Alternatively, you can create a predictor for an existing endpoint.

In [None]:
# from sagemaker.predictor import retrieve_default

# endpoint_name = "NAME-OF-EXISTING-ENDPOINT"
# predictor = retrieve_default(endpoint_name)

## Querying the endpoint

We can now invoke the endpoint to make a forecast. We send a **payload** to the endpoint, which includes historical time series values and configuration parameters, such as the prediction length. The endpoint processes this input and returns a **response** containing the forecasted values based on the provided data.
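The `predictor.predict` call used in this section is a convenience wrapper from the SageMaker Python SDK. A production service that talks to the endpoint directly could send the same JSON payload through the `invoke_endpoint` operation of the boto3 `sagemaker-runtime` client. The sketch below assumes the endpoint accepts and returns `application/json`, consistent with the JSON payloads used throughout this notebook. `invoke_chronos_endpoint` is a hypothetical helper, and the client is passed in as an argument so the function can be exercised without AWS credentials.

```python
import json
from typing import Any, Protocol


class RuntimeClient(Protocol):
    """Minimal interface of the boto3 `sagemaker-runtime` client used below."""

    def invoke_endpoint(self, **kwargs: Any) -> dict: ...


def invoke_chronos_endpoint(client: RuntimeClient, endpoint_name: str, payload: dict) -> dict:
    """Send `payload` as JSON to a SageMaker endpoint and decode the JSON response."""
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",  # assumption: JSON request body
        Accept="application/json",  # assumption: JSON response body
        Body=json.dumps(payload),
    )
    # boto3 returns the body as a stream; read() yields bytes, which json.loads accepts
    return json.loads(response["Body"].read())


# Example usage with a real client (requires AWS credentials):
# import boto3
# client = boto3.client("sagemaker-runtime")
# forecast = invoke_chronos_endpoint(client, predictor.endpoint_name, payload)
```

Injecting the client keeps the request-building logic testable and lets the caller configure region, retries, and credentials on the boto3 client itself.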

In [2]:
# Define a utility function to print the response in a pretty format
from pprint import pformat


def nested_round(data, decimals=2):
    """Round numbers, including those nested in dicts and lists."""
    if isinstance(data, float):
        return round(data, decimals)
    elif isinstance(data, list):
        return [nested_round(item, decimals) for item in data]
    elif isinstance(data, dict):
        return {key: nested_round(value, decimals) for key, value in data.items()}
    else:
        return data


def pretty_format(data):
    return pformat(nested_round(data), width=150, sort_dicts=False)

In [3]:
payload = {
    "inputs": [
        {"target": [0.0, 4.0, 5.0, 1.5, -3.0, -5.0, -3.0, 1.5, 5.0, 4.0, 0.0, -4.0, -5.0, -1.5, 3.0, 5.0, 3.0, -1.5, -5.0, -4.0]},
    ],
    "parameters": {
        "prediction_length": 10
    }
}
response = predictor.predict(payload)
print(pretty_format(response))

{'predictions': [{'mean': [-1.58, 0.52, 1.88, 1.39, -1.03, -3.34, -2.67, -0.64, 0.96, 1.59],
                  '0.1': [-4.17, -2.71, -1.7, -2.35, -4.79, -6.98, -6.59, -4.87, -3.45, -2.89],
                  '0.5': [-1.58, 0.52, 1.88, 1.39, -1.03, -3.34, -2.67, -0.64, 0.96, 1.59],
                  '0.9': [1.47, 4.47, 6.27, 5.98, 3.5, 1.11, 2.06, 4.47, 6.41, 7.17]}]}
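In the response, each quantile forecast is stored under its quantile level written as a string (`'0.1'`, `'0.5'`, `'0.9'`), next to `'mean'`. A minimal sketch for pulling out the quantiles and computing the width of the 80% prediction interval (between the 0.1 and 0.9 quantiles), using the first three steps of the output above; `quantile_forecasts` is a hypothetical helper.

```python
def quantile_forecasts(prediction: dict) -> dict:
    """Map each quantile level (as a float) to its forecast values.

    Quantile keys such as "0.1" are parsed as floats; other keys like
    "mean", "item_id" or "start" are skipped.
    """
    quantiles = {}
    for key, values in prediction.items():
        try:
            level = float(key)
        except ValueError:
            continue  # not a quantile key
        quantiles[level] = values
    return dict(sorted(quantiles.items()))


# First three forecast steps of the response printed above
prediction = {
    "mean": [-1.58, 0.52, 1.88],
    "0.1": [-4.17, -2.71, -1.7],
    "0.5": [-1.58, 0.52, 1.88],
    "0.9": [1.47, 4.47, 6.27],
}
quantiles = quantile_forecasts(prediction)
lower, upper = quantiles[0.1], quantiles[0.9]
# Width of the 80% prediction interval at each forecast step
widths = [round(hi - lo, 2) for lo, hi in zip(lower, upper)]
print(widths)  # [5.64, 7.18, 7.97]
```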


A payload may also contain **multiple time series**, potentially including `start` and `item_id` fields.

In [4]:
payload = {
    "inputs": [
        {
            "target": [1.0, 2.0, 3.0, 2.0, 0.5, 2.0, 3.0, 2.0, 1.0],
            "item_id": "product_A",
            "start": "2024-01-01T01:00:00",
        },
        {
            "target": [5.4, 3.0, 3.0, 2.0, 1.5, 2.0, -1.0],
            "item_id": "product_B",
            "start": "2024-02-02T03:00:00",
        },
    ],
    "parameters": {
        "prediction_length": 5,
        "freq": "1h",
        "quantile_levels": [0.1, 0.5, 0.9],
        "batch_size": 2,
    },
}
response = predictor.predict(payload)
print(pretty_format(response))

{'predictions': [{'mean': [1.41, 1.5, 1.49, 1.45, 1.51],
                  '0.1': [0.12, -0.08, -0.25, -0.41, -0.45],
                  '0.5': [1.41, 1.5, 1.49, 1.45, 1.51],
                  '0.9': [3.29, 3.82, 4.09, 4.3, 4.56],
                  'item_id': 'product_A',
                  'start': '2024-01-01T10:00:00'},
                 {'mean': [-1.22, -1.3, -1.3, -1.14, -1.13],
                  '0.1': [-4.51, -5.48, -6.12, -6.5, -7.1],
                  '0.5': [-1.22, -1.3, -1.3, -1.14, -1.13],
                  '0.9': [2.84, 4.02, 4.92, 5.99, 6.79],
                  'item_id': 'product_B',
                  'start': '2024-02-02T10:00:00'}]}
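The `start` values in this response are consistent with the forecast beginning one `freq` step after the last observation: `product_A` starts at 01:00 with 9 hourly values, so its forecast starts at 10:00. A minimal pandas sketch of that computation, assuming regular series without gaps; `forecast_start` is a hypothetical helper.

```python
import pandas as pd


def forecast_start(start: str, num_observations: int, freq: str) -> pd.Timestamp:
    """Timestamp of the first forecast step: one step after the last observation."""
    # periods=num_observations + 1 covers the history plus the first future step
    return pd.date_range(start=start, periods=num_observations + 1, freq=freq)[-1]


print(forecast_start("2024-01-01T01:00:00", 9, "1h"))  # 2024-01-01 10:00:00
print(forecast_start("2024-02-02T03:00:00", 7, "1h"))  # 2024-02-02 10:00:00
```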


Chronos-Bolt models also support forecasting with covariates (a.k.a. exogenous features or related time series). These can be provided using the `past_covariates` and `future_covariates` keys.

In [5]:
payload = {
    "inputs": [
        {
            "target": [1.0, 2.0, 3.0, 2.0, 0.5, 2.0, 3.0, 2.0, 1.0],
            # past_covariates must have the same length as "target"
            "past_covariates": {
                "feat_1": [3.0, 6.0, 9.0, 6.0, 1.5, 6.0, 9.0, 6.0, 3.0],
                "feat_2": ["A", "B", "B", "B", "A", "A", "A", "A", "B"],
            },
            # future_covariates must have length equal to "prediction_length"
            "future_covariates": {
                "feat_1": [2.5, 2.2, 3.3],
                "feat_2": ["B", "A", "A"],
            },
        },
        {
            "target": [5.4, 3.0, 3.0, 2.0, 1.5, 2.0, -1.0],
            "past_covariates": {
                "feat_1": [0.6, 1.2, 1.8, 1.2, 0.3, 1.2, 1.8],
                "feat_2": ["A", "B", "B", "B", "A", "A", "A"],
            },
            "future_covariates": {
                "feat_1": [1.2, 0.3, 4.4],
                "feat_2": ["A", "B", "A"],
            },
        },
    ],
    "parameters": {
        "prediction_length": 3,
        "quantile_levels": [0.1, 0.5, 0.9],
    },
}
response = predictor.predict(payload)
print(pretty_format(response))

{'predictions': [{'mean': [1.41, 1.5, 1.49], '0.1': [0.12, -0.08, -0.25], '0.5': [1.41, 1.5, 1.49], '0.9': [3.29, 3.82, 4.09]},
                 {'mean': [-1.22, -1.3, -1.3], '0.1': [-4.51, -5.48, -6.12], '0.5': [-1.22, -1.3, -1.3], '0.9': [2.84, 4.02, 4.92]}]}


## Endpoint API
So far, we have explored several examples of querying the endpoint with different payload structures. Below is a comprehensive API specification detailing all supported parameters, their meanings, and how they affect the model’s predictions.

* **inputs** (required): List with at most 1000 time series that need to be forecasted. Each time series is represented by a dictionary with the following keys:
    * **target** (required): List of observed numeric time series values. 
        - It is recommended that each time series contains at least 30 observations.
        - If any time series contains fewer than 5 observations, an error will be raised.
    * **item_id**: String that uniquely identifies each time series. 
        - If provided, the ID must be unique for each time series.
        - If provided, then the endpoint response will also include the **item_id** field for each forecast.
    * **start**: Timestamp of the first time series observation in ISO format (`YYYY-MM-DD` or `YYYY-MM-DDThh:mm:ss`). 
        - If the **start** field is provided, then **freq** must also be provided as part of **parameters**.
        - If provided, then the endpoint response will also include the **start** field indicating the first timestamp of each forecast.
    * **past_covariates**: Dictionary containing the past values of the covariates for this time series.
        - If the **past_covariates** field is provided, then **future_covariates** must be provided as well, with the same keys.
        - Each key in **past_covariates** corresponds to the name of a covariate. Each value must be an array consisting of all-numeric or all-string values, with length equal to the length of **target**.
    * **future_covariates**: Dictionary containing the future values of the covariates for this time series (values during the forecast horizon).
        - If the **future_covariates** field is provided, then **past_covariates** must be provided as well, with the same keys.
        - Each key in **future_covariates** corresponds to the name of a covariate. Each value must be an array consisting of all-numeric or all-string values, with length equal to **prediction_length**.
        - If both **past_covariates** and **future_covariates** are provided, a regression model specified by **covariate_model** will be used to incorporate the covariate information into the forecast.
* **parameters**: Optional parameters to configure the model.
    * **prediction_length**: Integer corresponding to the number of future time series values that need to be predicted. Defaults to `1`.
        - It is recommended to keep `prediction_length` <= 64, since larger values lead to inaccurate quantile forecasts. Values above 1000 raise an error.
    * **quantile_levels**: List of floats in the range (0, 1) specifying which quantiles should be included in the probabilistic forecast. Defaults to `[0.1, 0.5, 0.9]`.
        - Note that Chronos-Bolt cannot produce quantiles outside the [0.1, 0.9] range (predictions outside the range will be clipped).
    * **freq**: Frequency of the time series observations in [pandas-compatible format](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases). For example, `1h` for hourly data or `2W` for bi-weekly data. 
        - If **freq** is provided, then **start** must also be provided for each time series in **inputs**.
    * **batch_size**: Number of time series processed in parallel by the model. Larger values speed up inference but may lead to out-of-memory errors. Defaults to `256`.
    * **covariate_model**: Name of the tabular regression model applied to the covariates. Possible options: `GBM` (LightGBM), `LR` (linear regression), `RF` (random forest), `CAT` (CatBoost), `XGB` (XGBoost). Defaults to `GBM`.

All keys not marked with (required) are optional.
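Several of these rules can be checked on the client before a request is sent, which gives clearer error messages in a production pipeline. The sketch below mirrors the constraints listed above; it is not part of the SageMaker SDK, `validate_payload` is a hypothetical helper, and treating a `prediction_length` below 1 as invalid is an assumption. The endpoint itself remains the source of truth for what it accepts.

```python
def _is_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_payload(payload: dict) -> None:
    """Raise ValueError if `payload` breaks one of the API rules listed above."""
    inputs = payload.get("inputs")
    if not isinstance(inputs, list) or not inputs:
        raise ValueError("'inputs' must be a non-empty list")
    if len(inputs) > 1000:
        raise ValueError("'inputs' may contain at most 1000 time series")

    params = payload.get("parameters", {})
    prediction_length = params.get("prediction_length", 1)
    # Assumption: values below 1 are invalid; the documented hard limit is 1000
    if not 1 <= prediction_length <= 1000:
        raise ValueError("'prediction_length' must be between 1 and 1000")
    if not all(0 < q < 1 for q in params.get("quantile_levels", [0.1, 0.5, 0.9])):
        raise ValueError("'quantile_levels' must lie in the open interval (0, 1)")
    has_freq = "freq" in params

    item_ids = [ts["item_id"] for ts in inputs if "item_id" in ts]
    if len(item_ids) != len(set(item_ids)):
        raise ValueError("'item_id' values must be unique")

    for i, ts in enumerate(inputs):
        target = ts.get("target")
        if target is None or len(target) < 5:
            raise ValueError(f"inputs[{i}]: 'target' needs at least 5 observations")
        # 'start' on a series and 'freq' in parameters require each other
        if ("start" in ts) != has_freq:
            raise ValueError(f"inputs[{i}]: 'start' and parameters 'freq' must be provided together")

        past, future = ts.get("past_covariates"), ts.get("future_covariates")
        if (past is None) != (future is None):
            raise ValueError(f"inputs[{i}]: past and future covariates must be provided together")
        if past is None:
            continue
        if set(past) != set(future):
            raise ValueError(f"inputs[{i}]: past and future covariates must have the same keys")
        for name in past:
            if len(past[name]) != len(target) or len(future[name]) != prediction_length:
                raise ValueError(f"inputs[{i}]: covariate '{name}' has the wrong length")
            for values in (past[name], future[name]):
                if not (all(isinstance(v, str) for v in values) or all(_is_number(v) for v in values)):
                    raise ValueError(f"inputs[{i}]: covariate '{name}' must be all-numeric or all-string")


example = {
    "inputs": [{"target": [1.0, 2.0, 3.0, 2.0, 0.5], "start": "2024-01-01"}],
    "parameters": {"prediction_length": 2},  # "freq" is missing
}
try:
    validate_payload(example)
except ValueError as err:
    print(err)  # inputs[0]: 'start' and parameters 'freq' must be provided together
```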

The endpoint response contains the probabilistic (quantile) forecast for each time series included in the request.

## Working with long-format data frames

The endpoint communicates using JSON format for both input and output. However, in practice, time series data is often stored in a **long-format data frame** (where each row represents a timestamp for a specific item).

In the following example, we demonstrate how to:

1. Convert a long-format data frame into the JSON payload format required by the endpoint.
2. Send the request and retrieve predictions.
3. Convert the response back into a long-format data frame for further analysis.

First, we load an example dataset in long data frame format.

In [6]:
import pandas as pd

df = pd.read_csv(
    "https://autogluon.s3.amazonaws.com/datasets/timeseries/grocery_sales/test.csv",
    parse_dates=["timestamp"],
)
df.head()

|   | item_id | timestamp | scaled_price | promotion_email | promotion_homepage | unit_sales |
|---|---|---|---|---|---|---|
| 0 | 1062_101 | 2018-01-01 | 0.87913 | 0.0 | 0.0 | 636.0 |
| 1 | 1062_101 | 2018-01-08 | 0.994517 | 0.0 | 0.0 | 123.0 |
| 2 | 1062_101 | 2018-01-15 | 1.005513 | 0.0 | 0.0 | 391.0 |
| 3 | 1062_101 | 2018-01-22 | 1.0 | 0.0 | 0.0 | 339.0 |
| 4 | 1062_101 | 2018-01-29 | 0.883309 | 0.0 | 0.0 | 661.0 |


We split the data into two parts:
- Past data, including historic values of the target column and the covariates.
- Future data that contains the future values of the covariates during the forecast horizon.

In [7]:
prediction_length = 8
target_col = "unit_sales"
freq = pd.infer_freq(df[df.item_id == df.item_id[0]]["timestamp"])

past_df = df.groupby("item_id").head(-prediction_length)
future_df = df.groupby("item_id").tail(prediction_length).drop(columns=[target_col])
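The split above relies on `GroupBy.head` with a negative `n`, which keeps every row of each group except the last `n` (supported in pandas 1.4 and later), while `GroupBy.tail(n)` keeps exactly the last `n` rows of each group. A small illustration on a hypothetical toy frame with two items of different lengths:

```python
import pandas as pd

# Hypothetical toy data: item "A" has 5 weekly observations, item "B" has 4
toy = pd.DataFrame({
    "item_id": ["A"] * 5 + ["B"] * 4,
    "timestamp": list(pd.date_range("2018-01-01", periods=5, freq="W-MON"))
    + list(pd.date_range("2018-01-01", periods=4, freq="W-MON")),
    "unit_sales": [1.0, 2.0, 3.0, 4.0, 5.0, 10.0, 20.0, 30.0, 40.0],
})
horizon = 2

# head(-n): all but the last n rows of every group, in the original row order
toy_past = toy.groupby("item_id").head(-horizon)
# tail(n): the last n rows of every group, with the target column removed
toy_future = toy.groupby("item_id").tail(horizon).drop(columns=["unit_sales"])

print(toy_past["unit_sales"].tolist())  # [1.0, 2.0, 3.0, 10.0, 20.0]
print(toy_future.index.tolist())  # [3, 4, 7, 8]
```

Each item keeps its own history length in the past part, and the future part always has exactly `horizon` rows per item, which is what `future_covariates` requires.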

In [8]:
past_df.head()

|   | item_id | timestamp | scaled_price | promotion_email | promotion_homepage | unit_sales |
|---|---|---|---|---|---|---|
| 0 | 1062_101 | 2018-01-01 | 0.87913 | 0.0 | 0.0 | 636.0 |
| 1 | 1062_101 | 2018-01-08 | 0.994517 | 0.0 | 0.0 | 123.0 |
| 2 | 1062_101 | 2018-01-15 | 1.005513 | 0.0 | 0.0 | 391.0 |
| 3 | 1062_101 | 2018-01-22 | 1.0 | 0.0 | 0.0 | 339.0 |
| 4 | 1062_101 | 2018-01-29 | 0.883309 | 0.0 | 0.0 | 661.0 |


In [9]:
future_df.head()

|    | item_id | timestamp | scaled_price | promotion_email | promotion_homepage |
|---|---|---|---|---|---|
| 23 | 1062_101 | 2018-06-11 | 1.005425 | 0.0 | 0.0 |
| 24 | 1062_101 | 2018-06-18 | 1.005454 | 0.0 | 0.0 |
| 25 | 1062_101 | 2018-06-25 | 1.0 | 0.0 | 0.0 |
| 26 | 1062_101 | 2018-07-02 | 1.005513 | 0.0 | 0.0 |
| 27 | 1062_101 | 2018-07-09 | 1.0 | 0.0 | 0.0 |


We can now convert this data into a JSON payload.

In [10]:
def convert_df_to_payload(
    past_df,
    future_df=None,
    prediction_length=1,
    freq="D",
    target_col="target",
    id_col="item_id",
    timestamp_col="timestamp",
):
    """
    Converts past and future DataFrames into JSON payload format for the Chronos endpoint.

    Args:
        past_df (pd.DataFrame): Historical data with `target_col`, `timestamp_col`, and `id_col`.
        future_df (pd.DataFrame, optional): Future covariates with `timestamp_col` and `id_col`.
        prediction_length (int): Number of future time steps to predict.
        freq (str): Pandas-compatible frequency of the time series.
        target_col (str): Column name for target values.
        id_col (str): Column name for item IDs.
        timestamp_col (str): Column name for timestamps.

    Returns:
        dict: JSON payload formatted for the Chronos endpoint.
    """
    past_df = past_df.sort_values([id_col, timestamp_col])
    if future_df is not None:
        future_df = future_df.sort_values([id_col, timestamp_col])

    covariate_cols = list(past_df.columns.drop([target_col, id_col, timestamp_col]))
    if covariate_cols and (future_df is None or not set(covariate_cols).issubset(future_df.columns)):
        raise ValueError(f"If past_df contains covariates {covariate_cols}, they should also be present in future_df")

    inputs = []
    for item_id, past_group in past_df.groupby(id_col):
        target_values = past_group[target_col].tolist()

        if len(target_values) < 5:
            raise ValueError(f"Time series '{item_id}' has fewer than 5 observations.")

        series_dict = {
            "target": target_values,
            "item_id": str(item_id),
            "start": past_group[timestamp_col].iloc[0].isoformat(),
        }

        if covariate_cols:
            series_dict["past_covariates"] = past_group[covariate_cols].to_dict(orient="list")
            future_group = future_df[future_df[id_col] == item_id]
            if len(future_group) != prediction_length:
                raise ValueError(
                    f"future_df must contain exactly {prediction_length=} values for each item_id from past_df "
                    f"(got {len(future_group)=}) for {item_id=}"
                )
            series_dict["future_covariates"] = future_group[covariate_cols].to_dict(orient="list")

        inputs.append(series_dict)


    return {
        "inputs": inputs,
        "parameters": {"prediction_length": prediction_length, "freq": freq},
    }

In [11]:
payload = convert_df_to_payload(
    past_df,
    future_df,
    prediction_length=prediction_length,
    freq=freq,
    target_col="unit_sales",
)
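The API accepts at most 1000 time series per request. The grocery dataset fits into a single request, but for larger data frames one option is to split the payload into chunks and call the endpoint once per chunk. A minimal sketch with hypothetical helper names; it assumes the endpoint returns predictions in the same order as the inputs, which matches the examples above.

```python
def split_payload(payload: dict, max_series: int = 1000) -> list:
    """Split `payload` into payloads with at most `max_series` time series each.

    Every chunk reuses the original "parameters", so all chunks are forecast
    with the same settings.
    """
    inputs = payload["inputs"]
    parameters = payload.get("parameters", {})
    return [
        {"inputs": inputs[i : i + max_series], "parameters": parameters}
        for i in range(0, len(inputs), max_series)
    ]


def predict_in_chunks(predict_fn, payload: dict, max_series: int = 1000) -> dict:
    """Call `predict_fn` once per chunk and concatenate the returned predictions.

    Assumes each response lists its predictions in the same order as the inputs.
    """
    predictions = []
    for chunk in split_payload(payload, max_series):
        predictions.extend(predict_fn(chunk)["predictions"])
    return {"predictions": predictions}


# With the predictor from this notebook:
# response = predict_in_chunks(predictor.predict, payload)
```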

We can now send the payload to the endpoint.

In [12]:
response = predictor.predict(payload)

Note that Chronos-Bolt generated predictions for more than 300 time series in the dataset (with covariates!) in less than 2 seconds, even when running on a small CPU instance.

Finally, we can convert the response back to a long-format data frame.

In [13]:
def convert_response_to_df(response, freq="D"):
    """
    Converts a JSON response from the Chronos endpoint into a long-format DataFrame.

    Args:
        response (dict): JSON response containing forecasts.
        freq (str): Pandas-compatible frequency of the time series.

    Returns:
        pd.DataFrame: Long-format DataFrame with timestamps, item_id, and forecasted values.
    """
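    dfs = []
    for forecast in response["predictions"]:
        # "start" is only returned if it was included in the request payload
        forecast_df = pd.DataFrame(forecast).drop(columns=["start"])
        forecast_df["timestamp"] = pd.date_range(forecast["start"], freq=freq, periods=len(forecast_df))
        dfs.append(forecast_df)
    # ignore_index=True gives the combined frame a unique index instead of repeating 0..N-1 per item
    return pd.concat(dfs, ignore_index=True)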

In [14]:
forecast_df = convert_response_to_df(response, freq=freq)
forecast_df.head()

|   | mean | 0.1 | 0.5 | 0.9 | item_id | timestamp |
|---|---|---|---|---|---|---|
| 0 | 315.504037 | 210.074945 | 315.504037 | 487.484408 | 1062_101 | 2018-06-11 |
| 1 | 315.364478 | 200.272695 | 315.364478 | 508.14585 | 1062_101 | 2018-06-18 |
| 2 | 310.507265 | 193.90263 | 310.507265 | 511.55974 | 1062_101 | 2018-06-25 |
| 3 | 317.322873 | 200.051215 | 317.322873 | 525.01383 | 1062_101 | 2018-07-02 |
| 4 | 319.089405 | 199.634549 | 319.089405 | 534.102518 | 1062_101 | 2018-07-09 |
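Because the last `prediction_length` values of `unit_sales` were held out when building `past_df`, the long-format forecasts can be compared with the actual values still present in `df`. The sketch below uses a hypothetical helper `evaluate_forecast` and toy numbers (not results from the endpoint). It joins forecasts and actuals on `item_id` and `timestamp`, then reports the mean absolute error of the median forecast and the fraction of actual values that fall inside the 0.1 to 0.9 quantile range.

```python
import pandas as pd


def evaluate_forecast(forecast_df, actuals_df, target_col="target", id_col="item_id", timestamp_col="timestamp"):
    """Compare quantile forecasts with held-out actual values.

    Returns the mean absolute error (MAE) of the median ("0.5") forecast and the
    fraction of actual values inside the [0.1, 0.9] quantile interval.
    """
    # Inner join keeps only the (item_id, timestamp) pairs present in both frames
    merged = forecast_df.merge(
        actuals_df[[id_col, timestamp_col, target_col]], on=[id_col, timestamp_col], how="inner"
    )
    abs_error = (merged["0.5"] - merged[target_col]).abs()
    inside = merged[target_col].between(merged["0.1"], merged["0.9"])
    return {"mae": float(abs_error.mean()), "coverage_80": float(inside.mean())}


# Toy numbers for illustration only
forecast = pd.DataFrame({
    "0.1": [8.0, 9.0], "0.5": [10.0, 12.0], "0.9": [12.0, 15.0],
    "item_id": ["A", "A"], "timestamp": pd.to_datetime(["2018-06-11", "2018-06-18"]),
})
actuals = pd.DataFrame({
    "item_id": ["A", "A"], "timestamp": pd.to_datetime(["2018-06-11", "2018-06-18"]),
    "unit_sales": [11.0, 16.0],
})
print(evaluate_forecast(forecast, actuals, target_col="unit_sales"))  # {'mae': 2.5, 'coverage_80': 0.5}

# With the objects from this notebook, the held-out actuals are the last rows of each item:
# evaluate_forecast(forecast_df, df.groupby("item_id").tail(prediction_length), target_col=target_col)
```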


## Clean up the endpoint
Don't forget to clean up resources when finished to avoid unnecessary charges.

In [None]:
predictor.delete_predictor()