# Synthefy Forecasting API: A Comprehensive Guide

Welcome to the Synthefy Forecasting API! This notebook demonstrates how to use our powerful forecasting capabilities to predict time series data with advanced features like context-aware forecasting and external data integration.

## What is Synthefy?

Synthefy is an advanced forecasting API that leverages foundation models to provide accurate predictions for your time series data. Our API offers:

- **Simple Integration**: Easy-to-use Python interface
- **Context-Aware Forecasting**: Incorporate future known data into your predictions
- **External Data Integration**: Leverage Haver Analytics data for enhanced predictions
- **Flexible Input**: Support for multiple targets and covariates

## Getting Started

First, let's set up our environment and import the necessary libraries.
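The setup cell below expects an API key. Rather than pasting the key into the notebook, you can keep it in an environment variable. The variable name `SYNTHEFY_API_KEY` and the helper `load_api_key` are our own hypothetical choices, not something the API requires. This is a minimal sketch:

```python
import os


def load_api_key(var_name: str = "SYNTHEFY_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    SYNTHEFY_API_KEY is a hypothetical name chosen for this notebook.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable to your Synthefy API key")
    return key


# Usage, assuming you exported SYNTHEFY_API_KEY in your shell first:
# X_API_KEY = load_api_key()
```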


In [10]:
import json
from typing import Any, Dict, List, Optional, Tuple

import httpx
import pandas as pd
from pydantic import BaseModel, Field, field_validator

# --------------------------------------------------- Setup API ---------------------------------------------------
X_API_KEY = "YOUR_API_KEY"  # Replace with your API key
BASE_URL = "https://dev.synthefy.com"
client = httpx.Client(base_url=BASE_URL, timeout=100.0)
ENDPOINT = "api/foundation_models/forecast/stream"

# --------------------------------------------------- End API Setup ---------------------------------------------------

# ## Understanding the Data Models
#
# Our API uses Pydantic models to ensure data validation and type safety. Let's explore the key components:


# --------------------------------------------------- Data Model(s) ---------------------------------------------------
class HaverMetadataAccessInfo(BaseModel):
    """Model for accessing Haver Analytics data.

    This model allows you to specify which external data sources you want to use
    for enhancing your forecasts.
    """

    db_path_info: Optional[str] = None
    access_info: Dict[str, Any]

    @field_validator("access_info")
    @classmethod
    def validate_access_info(cls, v: Dict[str, Any]):
        if not v:
            raise ValueError("access_info must not be empty")
        if not (("file_name" in v) or ("databaseName" in v and "name" in v)):
            raise ValueError(
                "access_info must contain either 'file_name' or both 'databaseName' and 'name'"
            )
        return v


class FoundationModelForecastStreamRequest(BaseModel):
    # --------------- User uploaded data ---------------
    historical_timestamps: List[str]

    # from df.to_dict(orient='list')
    historical_timeseries_data: Dict[str, List[Any]]
    targets: List[str]  # must be present as keys in historical_timeseries_data

    # must be present in historical_timeseries_data if provided
    covariates: List[str] = Field(default_factory=list)
    # --------------- End user uploaded data ---------------

    # --------------- Synthefy Database context ---------------
    synthefy_metadata_info_combined: List[HaverMetadataAccessInfo] | None
    # Indices into synthefy_metadata_info_combined; the selected sources are also leaked into the future data
    synthefy_metadata_leak_idxs: Optional[List[int]] = None
    # --------------- End Synthefy Database context ---------------

    # --------------- Data for Forecasting ---------------
    # the timestamps for which we want to predict the targets' values
    forecast_timestamps: List[str]
    # from df.to_dict(orient='list'); future metadata that will be used
    future_timeseries_data: Dict[str, List[Any]] | None
    # --------------- End Data for Forecasting ---------------

    # Dict used to add constant context (will be same for each timestamp/repeated for the dfs)
    static_context: Dict[str, float | int | str] | None
    prompt: str | None  # Prompt/description of the task/data/etc

    quantiles: List[float] | None  # which quantiles to return
    model_type: str = Field(
        default="tabpfn",
        description="Model type to use for forecasting. Options: 'tabpfn', 'synthefy-forecasting'",
    )



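def make_api_call(request: FoundationModelForecastStreamRequest) -> httpx.Response:
    """POST a forecast request to the Synthefy endpoint and return the HTTP response."""
    response = client.post(
        ENDPOINT,
        json=request.model_dump(),
        headers={"X-API-Key": X_API_KEY},
    )
    # Fail fast on 4xx/5xx instead of failing later while parsing the body
    response.raise_for_status()
    return response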


# ## Helper Functions
#
# We provide a comprehensive helper function to convert your data into the format our API expects:


def convert_df_to_synthefy_request_with_leaking_context(
    df: pd.DataFrame,
    future_df: pd.DataFrame | None,
    target_cols: List[str],
    forecast_timestamps: List[str],
    timestamp_col: Optional[str] = None,  # auto-detect if not provided
    covariate_cols: Optional[List[str]] = None,
    synthefy_metadata_info_combined: List[HaverMetadataAccessInfo] | None = None,
    synthefy_metadata_leak_idxs: Optional[List[int]] = None,
    model_type: str = "synthefy-forecasting",
) -> FoundationModelForecastStreamRequest:
    """Convert pandas DataFrames into a Synthefy API request.

    This function handles all the data preparation needed to make a forecast request.
    It supports:
    - Automatic timestamp detection
    - Multiple target variables
    - Optional covariates
    - Known future data
    - External data integration

    Args:
        df: Historical data DataFrame
        future_df: Known future data for the forecast horizon, or None
        target_cols: Columns to forecast
        forecast_timestamps: Timestamps to forecast for
        timestamp_col: Column containing timestamps (auto-detected if None).
            When given, the column is converted to datetimes in place in df
            (and in future_df, if provided).
        covariate_cols: Additional columns to use as features (default: none).
            Only these columns are sent from future_df; target columns never are.
        synthefy_metadata_info_combined: External (Haver) data sources to use
        synthefy_metadata_leak_idxs: Indices into synthefy_metadata_info_combined
            selecting the sources that are leaked into the future data
        model_type: Model to use for forecasting

    Returns:
        A properly formatted request object for the Synthefy API
    """
    # Avoid a shared mutable default argument
    covariate_cols = list(covariate_cols or [])
    df_copy = df.copy()
    # auto-detect timestamp column if not provided
    if timestamp_col is None:
        for col in df.columns:
            if pd.api.types.is_datetime64_any_dtype(df[col]):
                timestamp_col = col
                break
    else:
        df.loc[:, timestamp_col] = pd.to_datetime(df[timestamp_col])
        if future_df is not None:
            future_df.loc[:, timestamp_col] = pd.to_datetime(
                future_df[timestamp_col]
            )
    if not timestamp_col:
        raise ValueError("No timestamp column found")

    historical_timestamps = [
        ts.isoformat() for ts in df[timestamp_col].tolist()
    ]

    # drop timestamp column from df
    df_copy = df_copy.drop(columns=[timestamp_col])

    df_copy = df_copy[target_cols + covariate_cols]
    historical_timeseries_data = df_copy.to_dict(orient="list")
    historical_timeseries_data = {
        str(k): v for k, v in historical_timeseries_data.items()
    }

    # Get the future_timeseries_data (don't give the target columns)
    future_timeseries_data = None
    if future_df is not None:
        future_timeseries_data = future_df[covariate_cols].to_dict(
            orient="list"
        )
        future_timeseries_data = {
            str(k): v for k, v in future_timeseries_data.items()
        }

    # create request object
    request = FoundationModelForecastStreamRequest(
        historical_timestamps=historical_timestamps,
        historical_timeseries_data=historical_timeseries_data,
        targets=target_cols,
        covariates=covariate_cols,
        synthefy_metadata_info_combined=synthefy_metadata_info_combined,
        synthefy_metadata_leak_idxs=synthefy_metadata_leak_idxs,
        forecast_timestamps=forecast_timestamps,
        future_timeseries_data=future_timeseries_data,
        static_context=None,  # Not yet supported
        prompt=None,  # Not yet supported
        quantiles=None,
        model_type=model_type,
    )

    return request


def convert_response_to_df(
    response: Dict[str, Any],
) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Convert API response to pandas DataFrames for easy analysis.

    Args:
        response: The API response dictionary

    Returns:
        Tuple of (forecast_df, quantiles_df) where:
        - forecast_df contains the point forecasts
        - quantiles_df contains the forecast quantiles
    """
    forecast_df = pd.DataFrame(response["forecast"])
    forecast_df["timestamp"] = response["forecast_timestamps"]
    quantiles_df = pd.DataFrame(response["forecast_quantiles"])
    quantiles_df["timestamp"] = response["forecast_timestamps"]
    return forecast_df, quantiles_df






In [11]:
df = pd.read_csv("fixed_walmart.csv")  # path to your copy of the Walmart weekly sales CSV
df


| Unnamed: 0 | Store | Date | Weekly_Sales | Holiday_Flag | Temperature | Fuel_Price | CPI | Unemployment |
|---|---|---|---|---|---|---|---|---|
| 0 | store_1 | 2010-02-05 | 1643690.90 | 0 | 42.31 | 2.572 | 211.096358 | 8.106 |
| 1 | store_1 | 2010-02-12 | 1641957.44 | 1 | 38.51 | 2.548 | 211.242170 | 8.106 |
| 2 | store_1 | 2010-02-19 | 1611968.17 | 0 | 39.93 | 2.514 | 211.289143 | 8.106 |
| 3 | store_1 | 2010-02-26 | 1409727.59 | 0 | 46.63 | 2.561 | 211.319643 | 8.106 |
| 4 | store_1 | 2010-03-05 | 1554806.68 | 0 | 46.50 | 2.625 | 211.350143 | 8.106 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 6430 | store_45 | 2012-09-28 | 713173.95 | 0 | 64.88 | 3.997 | 192.013558 | 8.684 |
| 6431 | store_45 | 2012-10-05 | 733455.07 | 0 | 64.89 | 3.985 | 192.170412 | 8.667 |
| 6432 | store_45 | 2012-10-12 | 734464.36 | 0 | 54.47 | 4.000 | 192.327265 | 8.667 |
| 6433 | store_45 | 2012-10-19 | 718125.53 | 0 | 56.47 | 3.969 | 192.330854 | 8.667 |



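## Example 1: A Basic Forecast

Let's forecast weekly sales for a single store using only its own history.

In [13]:
# The data ends on 2012-10-26, so we forecast the next 4 weeks.
# Keep only store_1; .copy() makes this an independent DataFrame rather than a view of df
df = df[df["Store"] == "store_1"].copy()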

# Create a basic forecast request
request = convert_df_to_synthefy_request_with_leaking_context(
    df=df,
    future_df=None,  # No future data in this example
    target_cols=["Weekly_Sales"],
    forecast_timestamps=[
        "2012-11-02",
        "2012-11-09",
        "2012-11-16",
        "2012-11-23",
    ],
    timestamp_col="Date",
    covariate_cols=[],
    synthefy_metadata_info_combined=None,
    synthefy_metadata_leak_idxs=None,
)
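The four forecast dates above are typed by hand. If you would rather derive them from the last historical date, a small helper built on pandas' `date_range` can generate a weekly horizon. `next_weekly_timestamps` is a hypothetical name of our own, not part of the Synthefy API, and the sketch assumes evenly spaced weekly data as in this dataset:

```python
import pandas as pd


def next_weekly_timestamps(last_date, horizon: int) -> list[str]:
    """Return ISO-8601 strings for the `horizon` weeks that follow `last_date`."""
    start = pd.Timestamp(last_date) + pd.Timedelta(weeks=1)
    # freq="7D" steps exactly one week at a time from `start`
    return [ts.isoformat() for ts in pd.date_range(start=start, periods=horizon, freq="7D")]


print(next_weekly_timestamps("2012-10-26", 4))
# ['2012-11-02T00:00:00', '2012-11-09T00:00:00', '2012-11-16T00:00:00', '2012-11-23T00:00:00']
```

In this notebook you could pass `forecast_timestamps=next_weekly_timestamps(df["Date"].max(), 4)`; `pd.Timestamp` accepts either a date string or a `Timestamp`.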

In [14]:
# Let's examine our request
print(json.dumps(request.model_dump(), indent=4))


{
    "historical_timestamps": [
        "2010-02-05T00:00:00",
        "2010-02-12T00:00:00",
        "2010-02-19T00:00:00",
        "2010-02-26T00:00:00",
        "2010-03-05T00:00:00",
        "2010-03-12T00:00:00",
        "2010-03-19T00:00:00",
        "2010-03-26T00:00:00",
        "2010-04-02T00:00:00",
        "2010-04-09T00:00:00",
        "2010-04-16T00:00:00",
        "2010-04-23T00:00:00",
        "2010-04-30T00:00:00",
        "2010-05-07T00:00:00",
        "2010-05-14T00:00:00",
        "2010-05-21T00:00:00",
        "2010-05-28T00:00:00",
        "2010-06-04T00:00:00",
        "2010-06-11T00:00:00",
        "2010-06-18T00:00:00",
        "2010-06-25T00:00:00",
        "2010-07-02T00:00:00",
        "2010-07-09T00:00:00",
        "2010-07-16T00:00:00",
        "2010-07-23T00:00:00",
        "2010-07-30T00:00:00",
        "2010-08-06T00:00:00",
        "2010-08-13T00:00:00",
        "2010-08-20T00:00:00",
        "2010-08-27T00:00:00",
        "2010-09-03T00:00:00",
      

In [15]:
# Verify that we are using the Synthefy foundation model
request.model_type

'synthefy-forecasting'
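Dumping the full payload prints every historical timestamp. For a quicker look, a helper like the hypothetical `summarize_payload` below condenses `request.model_dump()` into counts, the covered date span, and any declared columns that are missing or whose length differs from the number of historical timestamps. The model's comments only state that targets and covariates must be keys of `historical_timeseries_data`; the length check is our own sanity check.

```python
def summarize_payload(payload: dict) -> dict:
    """Condense a forecast request payload (e.g. request.model_dump()) into a short overview."""
    timestamps = payload["historical_timestamps"]
    series = payload["historical_timeseries_data"]
    declared = list(payload["targets"]) + list(payload.get("covariates") or [])
    return {
        "n_history": len(timestamps),
        "history_span": (timestamps[0], timestamps[-1]) if timestamps else None,
        "targets": list(payload["targets"]),
        # Declared targets/covariates that are absent from the historical data
        "missing_columns": [c for c in declared if c not in series],
        # Series whose length does not match the historical timestamps
        "length_mismatches": [k for k, v in series.items() if len(v) != len(timestamps)],
        "n_forecast": len(payload["forecast_timestamps"]),
    }


# Usage in this notebook: summarize_payload(request.model_dump())
```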

In [16]:
# Make the API call
response = make_api_call(request)

print(response.json())


# Convert response to DataFrames for analysis
forecast_df, quantiles_df = convert_response_to_df(response.json())
print("\nPoint Forecasts:")
print(forecast_df.head())
print("\nForecast Quantiles:")
print(quantiles_df.head())


{'forecast_timestamps': ['2012-11-02T00:00:00.000000000', '2012-11-09T00:00:00.000000000', '2012-11-16T00:00:00.000000000', '2012-11-23T00:00:00.000000000'], 'forecast': {'Weekly_Sales': [1630241.0, 1599859.625, 1570059.875, 1569942.625]}, 'forecast_quantiles': {'Weekly_Sales_p10': [0.0, 0.0, 0.0, 0.0], 'Weekly_Sales_p90': [0.0, 0.0, 0.0, 0.0]}}

Point Forecasts:
   Weekly_Sales                      timestamp
0   1630241.000  2012-11-02T00:00:00.000000000
1   1599859.625  2012-11-09T00:00:00.000000000
2   1570059.875  2012-11-16T00:00:00.000000000
3   1569942.625  2012-11-23T00:00:00.000000000

Forecast Quantiles:
   Weekly_Sales_p10  Weekly_Sales_p90                      timestamp
0               0.0               0.0  2012-11-02T00:00:00.000000000
1               0.0               0.0  2012-11-09T00:00:00.000000000
2               0.0               0.0  2012-11-16T00:00:00.000000000
3               0.0               0.0  2012-11-23T00:00:00.000000000
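In this run every p10 and p90 value came back as 0.0, and the request was built with `quantiles=None`, so these columns should not be read as an uncertainty band. A hypothetical helper like the one below flags that case before you plot or use the quantiles. It relies only on the `<target>_p10` / `<target>_p90` column names seen in the output above, and the "usable" criteria are our own heuristic:

```python
import pandas as pd


def quantiles_look_usable(quantiles_df: pd.DataFrame, target: str) -> bool:
    """Heuristic: True if the target's p10/p90 columns are not all zero and p10 <= p90 everywhere."""
    low = quantiles_df[f"{target}_p10"]
    high = quantiles_df[f"{target}_p90"]
    not_all_zero = bool(((low != 0) | (high != 0)).any())
    ordered = bool((low <= high).all())
    return not_all_zero and ordered


# With the output above, quantiles_look_usable(quantiles_df, "Weekly_Sales") returns False.
```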


## Example 2: Using Future Known Data

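Now, let's enhance our forecast by incorporating known future data. This is useful when you have
information about future events that might affect your target variable. In this example we treat the rows from 2012-09-01 onward as the forecast horizon, forecast both Weekly_Sales and Unemployment, and supply Holiday_Flag and CPI as covariates whose future values are known.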


In [17]:
# Split our data into historical and future portions
# (parse Date explicitly so this cell does not depend on earlier cells having converted it)
dates = pd.to_datetime(df["Date"])
cutoff = pd.Timestamp("2012-09-01")
historical_df = df[dates < cutoff].copy()
future_df = df[dates >= cutoff].copy()
print(historical_df.tail())
print(future_df.head())

# Create a request with known future data
request = convert_df_to_synthefy_request_with_leaking_context(
    df=historical_df,
    future_df=future_df,
    target_cols=["Weekly_Sales", "Unemployment"],
    forecast_timestamps=[ts.isoformat() for ts in dates[dates >= cutoff]],
    timestamp_col="Date",
    covariate_cols=["Holiday_Flag", "CPI"],
    synthefy_metadata_info_combined=None,
    synthefy_metadata_leak_idxs=None,
)


       Store                 Date  Weekly_Sales  Holiday_Flag  Temperature  \
130  store_1  2012-08-03 00:00:00    1631135.79             0        86.11   
131  store_1  2012-08-10 00:00:00    1592409.97             0        85.05   
132  store_1  2012-08-17 00:00:00    1597868.05             0        84.85   
133  store_1  2012-08-24 00:00:00    1494122.38             0        77.66   
134  store_1  2012-08-31 00:00:00    1582083.40             0        80.49   

     Fuel_Price         CPI  Unemployment  
130       3.417  221.949864         6.908  
131       3.494  221.958433         6.908  
132       3.571  222.038411         6.908  
133       3.620  222.171946         6.908  
134       3.638  222.305480         6.908  
       Store                 Date  Weekly_Sales  Holiday_Flag  Temperature  \
135  store_1  2012-09-07 00:00:00    1661767.33             1        83.96   
136  store_1  2012-09-14 00:00:00    1517428.87             0        74.97   
137  store_1  2012-09-21 00:00:00
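In this example the future rows still hold the actual Weekly_Sales and Unemployment values, so it is worth confirming that only covariates reach `future_timeseries_data`. The hypothetical check below works on `request.model_dump()`. The "one value per forecast timestamp" rule is our assumption about what the API expects, consistent with the helper building both fields from the same `future_df`.

```python
def check_future_payload(payload: dict) -> list[str]:
    """Return human-readable problems in the forecast-horizon part of a request payload."""
    problems = []
    horizon = len(payload["forecast_timestamps"])
    future = payload.get("future_timeseries_data") or {}
    targets = set(payload["targets"])
    for col, values in future.items():
        # A target in the future data would hand the model the answer
        if col in targets:
            problems.append(f"target {col!r} appears in future_timeseries_data")
        # Assumed rule: one future value per forecast timestamp
        if len(values) != horizon:
            problems.append(f"{col!r} has {len(values)} values for {horizon} forecast timestamps")
    return problems


# Usage in this notebook: check_future_payload(request.model_dump()) is expected to return [].
```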

In [18]:
# Make the API call
response = make_api_call(request)

# Convert response to DataFrames for analysis
forecast_df, quantiles_df = convert_response_to_df(response.json())
print("\nPoint Forecasts:")
print(forecast_df.head())
print("\nForecast Quantiles:")
print(quantiles_df.head())




Point Forecasts:
   Weekly_Sales  Unemployment                      timestamp
0   1631153.750      7.771542  2012-09-07T00:00:00.000000000
1   1573854.250      7.781507  2012-09-14T00:00:00.000000000
2   1567910.250      7.741852  2012-09-21T00:00:00.000000000
3   1565830.625      7.734254  2012-09-28T00:00:00.000000000
4   1563406.000      7.766744  2012-10-05T00:00:00.000000000

Forecast Quantiles:
   Weekly_Sales_p10  Weekly_Sales_p90  Unemployment_p10  Unemployment_p90  \
0               0.0               0.0               0.0               0.0   
1               0.0               0.0               0.0               0.0   
2               0.0               0.0               0.0               0.0   
3               0.0               0.0               0.0               0.0   
4               0.0               0.0               0.0               0.0   

                       timestamp  
0  2012-09-07T00:00:00.000000000  
1  2012-09-14T00:00:00.000000000  
2  2012-09-21T00:00:00.0000
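Because `future_df` still contains the actual values for the forecast dates, you can score the point forecasts directly. The hypothetical helper below joins forecasts and actuals on the calendar date (the API returns timestamps with nanosecond precision, while `Date` holds plain dates) and reports mean absolute error and mean absolute percentage error. These metrics are our choice for illustration; the API does not compute them.

```python
import pandas as pd


def score_against_actuals(
    forecast_df: pd.DataFrame,
    actual_df: pd.DataFrame,
    target: str,
    date_col: str = "Date",
) -> dict:
    """Join point forecasts to held-out actuals by calendar date; return n, MAE and MAPE (%)."""
    forecasts = pd.DataFrame({
        # API timestamps look like '2012-09-07T00:00:00.000000000'; reduce them to 'YYYY-MM-DD'
        "date": pd.to_datetime(forecast_df["timestamp"]).dt.strftime("%Y-%m-%d").to_numpy(),
        "forecast": forecast_df[target].to_numpy(),
    })
    actuals = pd.DataFrame({
        "date": pd.to_datetime(actual_df[date_col]).dt.strftime("%Y-%m-%d").to_numpy(),
        "actual": actual_df[target].to_numpy(),
    })
    merged = forecasts.merge(actuals, on="date", how="inner")
    abs_err = (merged["forecast"] - merged["actual"]).abs()
    return {
        "n": int(len(merged)),
        "mae": float(abs_err.mean()),
        "mape_pct": float((abs_err / merged["actual"].abs()).mean() * 100),
    }


# Usage in this notebook: score_against_actuals(forecast_df, future_df, "Weekly_Sales")
```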

## Next Steps
Now that you've seen the basics of using the Synthefy Forecasting API, you can:
1. Try different combinations of targets and covariates
2. Experiment with different external data sources
3. Adjust the forecast horizon
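None of the examples above uses external data. As a starting point, the sketch below builds entries for `synthefy_metadata_info_combined` as plain dicts shaped like `HaverMetadataAccessInfo` (a `db_path_info` field plus an `access_info` dict with `'databaseName'` and `'name'`), and checks that `synthefy_metadata_leak_idxs` point at existing entries. The helper names are hypothetical, and the database and series names are placeholders; which Haver series your account can reach is not covered in this notebook.

```python
from typing import Optional


def make_haver_entry(database_name: str, series_name: str, db_path_info: Optional[str] = None) -> dict:
    """Build one synthefy_metadata_info_combined entry as a plain dict (fields from HaverMetadataAccessInfo)."""
    return {
        "db_path_info": db_path_info,
        # Satisfies the validator's "both 'databaseName' and 'name'" branch
        "access_info": {"databaseName": database_name, "name": series_name},
    }


def check_leak_idxs(entries: list, leak_idxs: list) -> None:
    """Raise ValueError if any leak index does not point at an entry."""
    bad = [i for i in leak_idxs if not 0 <= i < len(entries)]
    if bad:
        raise ValueError(f"leak indices out of range: {bad}")


# Placeholder names: substitute real Haver database and series codes
entries = [
    make_haver_entry("YOUR_DATABASE", "YOUR_SERIES_A"),
    make_haver_entry("YOUR_DATABASE", "YOUR_SERIES_B"),
]
leak_idxs = [0]  # leak only the first source into the future data
check_leak_idxs(entries, leak_idxs)
```

To use them, pass `[HaverMetadataAccessInfo(**e) for e in entries]` as `synthefy_metadata_info_combined` and `leak_idxs` as `synthefy_metadata_leak_idxs`.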
