<center><img src="https://storage.googleapis.com/arize-assets/arize-logo-white.jpg" width="200"/></center>

# Batch Ingestion for Time Series

This example walks through the Arize `pandas` batch SDK for ingesting time series data. Guides for all model types are available [here](https://docs.arize.com/arize/sending-data-to-arize/model-types).

## Install and Import Dependencies

In [None]:
!pip install -q arize

import datetime

from arize.pandas.logger import Client
from arize.utils.types import ModelTypes, Environments, Schema
import numpy as np
import pandas as pd

## Download and Display Data

In [None]:
df = pd.read_csv(
    "https://storage.googleapis.com/arize-assets/documentation-sample-data/data-ingestion/time-series-assets/time_series_data.csv?raw=true",
    index_col=False,
)
df.head()

## Add Forecast Timestamp, Run Date and Lag

In Arize, time series models are a subset of the [regression model type](https://docs.arize.com/arize/model-types/regression) and are characterized by three fields:
- The _forecast timestamp_ describes the date and time of the predicted event or observation and is passed into the timestamp field.
- The _run date_ describes the date on which the model was run and the prediction was generated and is optionally passed in as a tag.
- The _lag_ describes the number of days between the forecast timestamp and run date and is optionally passed in as a tag.

For example, if you run a model on Monday to predict the temperature on Friday, the run date would be Monday's date, the forecast timestamp would be a timestamp for a time on Friday and the lag would be four days.
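The same arithmetic in Python, as a minimal sketch: the dates below are hypothetical (2024-01-01 happens to be a Monday), and the lag is taken as the whole number of days in the difference. The next cell computes it the same way, with `timedelta.days`.

```python
import datetime

# Hypothetical example dates: Monday, January 1, 2024 and Friday, January 5, 2024.
run_date = datetime.date(2024, 1, 1)
forecast_timestamp = datetime.datetime(2024, 1, 5, 12, 0)  # noon on Friday

# Lag = whole days between midnight on the run date and the forecast timestamp.
run_datetime = datetime.datetime.combine(run_date, datetime.time.min)
lag = (forecast_timestamp - run_datetime).days

print(lag)  # 4
```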

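The following cell spreads the forecast timestamps evenly across the past 30 days and assumes that the model generating the predictions was run 30 days ago, at midnight. The forecast timestamps are stored as integer Unix timestamps in seconds, and the lag is the number of whole days between each forecast timestamp and the run date.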

In [None]:
num_samples = df.shape[0]
current_datetime = datetime.datetime.now()
run_datetime = current_datetime - datetime.timedelta(days=30)
run_datetime = datetime.datetime(run_datetime.year, run_datetime.month, run_datetime.day)  # Truncate run datetime to midnight
forecast_timestamps = np.linspace(run_datetime.timestamp(), current_datetime.timestamp(), num_samples).astype(int)  # Evenly spaced Unix timestamps (seconds)
forecast_datetimes = [datetime.datetime.fromtimestamp(ts) for ts in forecast_timestamps]  # Convert back to datetime objects
num_lag_days = [(dt - run_datetime).days for dt in forecast_datetimes]  # Lag in whole days (partial days are floored)

df.insert(1, "forecast_ts", forecast_timestamps)
df.insert(2, "run_date", run_datetime.strftime("%Y-%m-%d"))
df.insert(3, "lag", num_lag_days)
df[["forecast_ts", "run_date", "lag"]]
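For readers who want to see the cell's logic without NumPy, here is an illustrative plain-Python sketch. It is not part of the Arize SDK: the helper names `evenly_spaced_forecasts` and `lag_days` are hypothetical, and the sketch uses timezone-aware UTC datetimes (the cell above uses naive local time) so the result does not depend on the machine's time zone. It also shows that `timedelta.days` rounds down, so a forecast 23 hours after the run still has a lag of 0.

```python
import datetime

UTC = datetime.timezone.utc


def evenly_spaced_forecasts(run_dt, end_dt, n):
    """Return n integer Unix timestamps (seconds) spaced evenly from run_dt to end_dt.

    Roughly equivalent to np.linspace(start, end, n).astype(int) for n >= 2.
    """
    start, end = run_dt.timestamp(), end_dt.timestamp()
    step = (end - start) / (n - 1)
    return [int(start + i * step) for i in range(n)]


def lag_days(run_dt, ts):
    """Whole days between the run datetime and a forecast Unix timestamp (floors partial days)."""
    forecast_dt = datetime.datetime.fromtimestamp(ts, tz=UTC)
    return (forecast_dt - run_dt).days


# Hypothetical run: midnight UTC on 2024-01-01, forecasts spread over the next 30 days.
run_dt = datetime.datetime(2024, 1, 1, tzinfo=UTC)
end_dt = run_dt + datetime.timedelta(days=30)

timestamps = evenly_spaced_forecasts(run_dt, end_dt, 31)
lags = [lag_days(run_dt, ts) for ts in timestamps]
print(lags[:3], lags[-1])  # [0, 1, 2] 30

# Partial days are floored: 23 hours after the run still has a lag of 0.
print(lag_days(run_dt, int(run_dt.timestamp()) + 23 * 3600))  # 0
```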

## Create Arize Client

Sign up for or log in to your Arize account [here](https://app.arize.com/auth/login), find your [space and API keys](https://docs.arize.com/arize/sending-data/sdk-reference/python-sdk/arize.init#retrieving-space-and-api-keys), and paste them into the cell below.

In [None]:
SPACE_KEY = "SPACE_KEY"  # Change this line.
API_KEY = "API_KEY"  # Change this line.

# Check the keys before creating the client, so the placeholder values are never used.
if SPACE_KEY == "SPACE_KEY" or API_KEY == "API_KEY":
    raise ValueError("❌ CHANGE SPACE AND API KEYS")

arize_client = Client(space_key=SPACE_KEY, api_key=API_KEY)
print("✅ Arize client setup done! Now you can start using Arize!")
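Hard-coding keys in a notebook makes them easy to leak when the notebook is shared. One common alternative, sketched here under assumptions, is to read them from environment variables. The variable names `ARIZE_SPACE_KEY` and `ARIZE_API_KEY` and the helper `load_arize_keys` are hypothetical choices for this example, not names the Arize SDK looks for; the returned values would then be passed to `Client(space_key=..., api_key=...)` as in the cell above.

```python
import os


def load_arize_keys(environ=os.environ):
    """Read space and API keys from environment variables (hypothetical names).

    Raises ValueError if either key is missing or empty, like the check in the cell above.
    """
    space_key = environ.get("ARIZE_SPACE_KEY", "")  # hypothetical variable name
    api_key = environ.get("ARIZE_API_KEY", "")  # hypothetical variable name
    if not space_key or not api_key:
        raise ValueError("❌ Set ARIZE_SPACE_KEY and ARIZE_API_KEY before creating the client")
    return space_key, api_key


# Usage with an explicit mapping; in a notebook you would rely on the default os.environ.
space_key, api_key = load_arize_keys({"ARIZE_SPACE_KEY": "my-space", "ARIZE_API_KEY": "my-api"})
print(space_key, api_key)  # my-space my-api
```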

## Define Schema

Create your [model schema](https://docs.arize.com/arize/sending-data-to-arize/model-schema-reference).

In [None]:
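# Assumes column 0 is prediction_id, columns 1-3 are the inserted forecast_ts, run_date, and lag,
# and the last two columns are predicted_thermal and reported_thermal; everything in between is a feature.
feature_column_names = list(df.columns[4:-2])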
schema = Schema(
    prediction_id_column_name="prediction_id",
    feature_column_names=feature_column_names,
    timestamp_column_name="forecast_ts",
    prediction_label_column_name="predicted_thermal",
    actual_label_column_name="reported_thermal",
    tag_column_names=[
        "run_date",
        "lag",
    ],
)
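Because `feature_column_names` is derived positionally with `df.columns[4:-2]`, a change in column order would silently change which columns are treated as features. A minimal sanity check is sketched below with a hypothetical column list (the feature names `feature_a` and `feature_b` are placeholders, not the CSV's real columns): confirm what the slice selects and that every column the schema names actually exists. With the real data, `columns` would be `list(df.columns)`.

```python
# Hypothetical column layout after the three df.insert calls; feature names are placeholders.
columns = [
    "prediction_id", "forecast_ts", "run_date", "lag",
    "feature_a", "feature_b",
    "predicted_thermal", "reported_thermal",
]

# Same positional slice as the cell above: skip the first four columns and the last two.
feature_column_names = columns[4:-2]

# Non-feature columns referenced by the Schema above.
non_feature_columns = {"prediction_id", "forecast_ts", "run_date", "lag",
                       "predicted_thermal", "reported_thermal"}

# Every schema column must exist, and no non-feature column should leak into the features.
missing = [name for name in sorted(non_feature_columns) + feature_column_names if name not in columns]
overlap = set(feature_column_names) & non_feature_columns

print(feature_column_names)  # ['feature_a', 'feature_b']
print(missing, overlap)  # [] set()
```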

## Log Data to Arize

Log the DataFrame using the [pandas API](https://docs.arize.com/arize/sending-data-to-arize/data-ingestion-methods/sdk-reference/python-sdk/arize.pandas).

In [None]:
response = arize_client.log(
    dataframe=df,
    schema=schema,
    model_id="time-series-batch-ingestion-tutorial",
    model_version="1.0.0",
    model_type=ModelTypes.NUMERIC,
    environment=Environments.PRODUCTION,
)

if response.status_code == 200:
    print("✅ Successfully logged data to Arize!")
else:
    print(f'❌ Logging failed with status code {response.status_code} and message "{response.text}"')