# Batch forecasting on Ray Tune

Batch training and tuning are common tasks in simple machine learning use cases such as time series forecasting. They require fitting simple models on multiple data batches corresponding to locations, products, etc.

**'Batch training'** is a workload that trains model(s) on subsets of a dataset. This notebook showcases how to conduct batch training using [Ray Tune](https://docs.ray.io/en/latest/tune/index.html).

![Batch training diagram](../../data/examples/images/batch-training.svg)

For the data, we will use the [NYC Taxi dataset](https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page). This popular tabular dataset contains historical taxi pickups by timestamp and location in NYC.

In this notebook, we will split the data by pickup location and train a separate forecasting model to predict the number of pickups at each location in NYC at a monthly level for the next 2 months. Specifically, we will use the `pickup_location_id` column in the dataset to group the dataset into data batches. Then we will fit a separate model for each batch and evaluate it.
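The split-then-fit pattern can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the notebook's implementation: the records are a toy list of `(location, month, pickups)` tuples, and `fit_mean_model()` is a hypothetical stand-in that forecasts each location's historical mean instead of fitting Prophet.

```python
from collections import defaultdict
from statistics import mean

# Toy records: (pickup_location_id, month, number_of_pickups). Illustrative only.
records = [
    (1, "2019-02", 34), (1, "2019-03", 40),
    (10, "2019-02", 2298), (10, "2019-03", 2608),
]

def fit_mean_model(history):
    """Hypothetical stand-in model: always forecasts the mean of its history."""
    level = mean(history)
    return lambda periods: [level] * periods

# 1. Split the data into batches keyed by location.
batches = defaultdict(list)
for location, month, pickups in records:
    batches[location].append(pickups)

# 2. Fit one independent model per batch.
models = {location: fit_mean_model(ys) for location, ys in batches.items()}

# 3. Forecast the next 2 months for each location.
forecasts = {location: model(2) for location, model in models.items()}
print(forecasts)
```

Because the per-location fits share no state, they can run in parallel; the rest of this notebook hands that parallelism to Ray Tune.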

# Contents

In this tutorial, you will learn about:
 1. [Define how to load and prepare Parquet data](#load_data)
 2. [Define your Ray Tune Search Space and Search Algorithm](#define_search_space)
 3. [Define a Trainable (callable) function](#define_trainable)
 4. [Run batch training on Ray Tune](#run_tune_search)
 5. [Load a model from checkpoint and perform inference](#load_checkpoint)


# Walkthrough

```{tip}
Prerequisite for this notebook: Read the [Key Concepts](https://docs.ray.io/en/latest/tune/key-concepts.html) page for Ray Tune.
```

Let us start by importing a few required libraries, including open-source [Ray](https://github.com/ray-project/ray) itself!

In [1]:
import os
print(f'Number of CPUs in this system: {os.cpu_count()}')
from typing import Tuple, List, Union, Optional, Callable
import time
import pandas as pd
import numpy as np
import pyarrow
import pyarrow.parquet as pq
import pyarrow.dataset as pds
print(f"pyarrow: {pyarrow.__version__}")

Number of CPUs in this system: 8
pyarrow: 10.0.0


In [2]:
import ray

if ray.is_initialized():
    ray.shutdown()
ray.init()

2022-11-15 10:53:13,671	INFO worker.py:1230 -- Using address localhost:9031 set in the environment variable RAY_ADDRESS
2022-11-15 10:53:13,672	INFO worker.py:1342 -- Connecting to existing Ray cluster at address: 172.31.79.31:9031...
2022-11-15 10:53:13,707	INFO worker.py:1519 -- Connected to Ray cluster. View the dashboard at https://console.anyscale-staging.com/api/v2/sessions/ses_b5q8xHd42BTdukSgFqTxejLT/services?redirect_to=dashboard


| Python version | Ray version | Dashboard |
|---|---|---|
| 3.8.13 | 2.1.0 | http://console.anyscale-staging.com/api/v2/sessions/ses_b5q8xHd42BTdukSgFqTxejLT/services?redirect_to=dashboard |


In [3]:
print(ray.cluster_resources())

{'memory': 66320753459.0, 'CPU': 24.0, 'object_store_memory': 27553038336.0, 'node:172.31.79.31': 1.0, 'node:172.31.84.43': 1.0}


In [4]:
# import forecasting libraries
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import prophet
from prophet import Prophet
print(f"numpy: {np.__version__}")
print(f"prophet: {prophet.__version__}")

# import ray libraries
from ray import air, tune
from ray.air import session
from ray.air.checkpoint import Checkpoint

# set a global NumPy random seed for reproducibility
np.random.seed(415)

numpy: 1.23.4
prophet: 1.1.1


In [5]:
# For benchmarking purposes, we can print the times of various operations.
# In order to reduce clutter in the output, this is set to False by default.
PRINT_TIMES = False

def print_time(msg: str):
    if PRINT_TIMES:
        print(msg)
        
# To speed things up, we'll only use a small subset of the full dataset: the last five monthly files (2019-02 through 2019-06).
# You can use the full dataset (2018-01 through 2019-06) by setting the SMOKE_TEST variable to False.
SMOKE_TEST = True


## Define how to load and prepare Parquet data <a class="anchor" id="load_data"></a>

First, we need to load some data. Since the NYC Taxi dataset is fairly large, we first use a PyArrow dataset to list and select the Parquet files. Then, in the next cell, we filter the data on read into a PyArrow table and convert that to a pandas DataFrame.

```{tip}
Use PyArrow datasets and tables for reading or writing large Parquet files, since PyArrow's native multithreaded C++ adapter is typically faster than pandas `read_parquet`, even when using `engine="pyarrow"`.
```
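The `filters` argument that `read_data()` passes to `pq.read_table()` is a list of `(column, op, value)` tuples, which PyArrow combines with a logical AND: only rows that satisfy every predicate are returned. The plain-Python sketch below mimics that semantics for a subset of those filters on a few hypothetical in-memory rows. It only illustrates the meaning of the filter list; PyArrow itself applies the filters while reading the files.

```python
import operator

# Map the filter operators used in read_data() to Python comparisons.
OPS = {
    ">": operator.gt,
    "=": operator.eq,
    "not in": lambda value, options: value not in options,
}

def matches(row, filters):
    """True if the row satisfies every (column, op, value) predicate (logical AND)."""
    return all(OPS[op](row[column], value) for column, op, value in filters)

filters = [
    ("passenger_count", ">", 0),
    ("trip_distance", ">", 0),
    ("pickup_location_id", "not in", [264, 265]),
    ("pickup_location_id", "=", 10),
]

# Hypothetical rows, not taken from the real dataset.
rows = [
    {"passenger_count": 1, "trip_distance": 2.5, "pickup_location_id": 10},
    {"passenger_count": 0, "trip_distance": 1.0, "pickup_location_id": 10},   # no passengers
    {"passenger_count": 2, "trip_distance": 3.0, "pickup_location_id": 264},  # excluded zone
    {"passenger_count": 1, "trip_distance": 0.8, "pickup_location_id": 1},    # other location
]

kept = [row for row in rows if matches(row, filters)]
print(len(kept), kept)
```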

In [6]:
# Define some global variables.
TARGET = "trip_duration"
FORECAST_LENGTH = 6
s3_partitions = pds.dataset(
    "s3://anonymous@air-example-data/ursa-labs-taxi-data/by_year/",
    partitioning=["year", "month"],
)
s3_files = [f"s3://anonymous@{file}" for file in s3_partitions.files]

# Obtain all location IDs
all_location_ids = (
    pq.read_table(s3_files[0], columns=["dropoff_location_id"])["dropoff_location_id"]
    .unique()
    .to_pylist()
)
# Drop location IDs 264 and 265 (they are also excluded by the read filters below).
all_location_ids.remove(264)
all_location_ids.remove(265)

# Use smoke testing or not.
starting_idx = -5 if SMOKE_TEST else 0
# Location 199 is included as an error-handling test; its trials are expected to fail.
sample_locations = [1, 10, 199] if SMOKE_TEST else all_location_ids

# Display what data will be used.
s3_files = s3_files[starting_idx:]
print(f"NYC Taxi using {len(s3_files)} file(s)!")
print(f"s3_files: {s3_files}")
print(f"Locations: {sample_locations}")


NYC Taxi using 5 file(s)!
s3_files: ['s3://anonymous@air-example-data/ursa-labs-taxi-data/by_year/2019/02/data.parquet/5bc40cf9bc1145cbb0867d39064daa01_000000.parquet', 's3://anonymous@air-example-data/ursa-labs-taxi-data/by_year/2019/03/data.parquet/8b894872a484458cbd5a6cd0425b77df_000000.parquet', 's3://anonymous@air-example-data/ursa-labs-taxi-data/by_year/2019/04/data.parquet/7e490662e39c4bfe8c64c6a2c45c9e8b_000000.parquet', 's3://anonymous@air-example-data/ursa-labs-taxi-data/by_year/2019/05/data.parquet/359c21b3e28f40328e68cf66f7ba40e2_000000.parquet', 's3://anonymous@air-example-data/ursa-labs-taxi-data/by_year/2019/06/data.parquet/ab5b9d2b8cc94be19346e260b543ec35_000000.parquet']
Locations: [1, 10, 199]


In [7]:
# Function to read one Parquet file into a pandas DataFrame, keeping only
# valid trips that start at a single pickup location
def read_data(file: str, sample_id: int) -> pd.DataFrame:
    
    df = pq.read_table(
        file,
        filters=[
            ("passenger_count", ">", 0),
            ("trip_distance", ">", 0),
            ("fare_amount", ">", 0),
            ("pickup_location_id", "not in", [264, 265]),
            ("dropoff_location_id", "not in", [264, 265]), 
            ("pickup_location_id", "=", sample_id)
        ],
        columns=[
            "pickup_at",
            "dropoff_at",
            "pickup_location_id",
            "dropoff_location_id",
            "passenger_count",
            "trip_distance",
            "fare_amount",
        ],
    ).to_pandas()

    return df

# Function to transform a pandas dataframe
def transform_df(the_df: pd.DataFrame) -> pd.DataFrame:
    df = the_df.copy()

    # Calculate trip_duration in seconds. Use total_seconds(): .dt.seconds
    # ignores whole days, which would defeat the 24-hour filter below.
    df["trip_duration"] = (df["dropoff_at"] - df["pickup_at"]).dt.total_seconds()
    # Keep trips longer than 1 minute and shorter than 24 hours.
    df = df[(df["trip_duration"] > 60) & (df["trip_duration"] < 24 * 60 * 60)].copy()

    # Prophet requires the time column to be named 'ds' and the target 'y'.

    # Add the month start date ('ds') and a unique location_year_month column
    # to use as the groupby key.
    df['ds'] = df['pickup_at'].dt.to_period('M').dt.to_timestamp()
    df['loc_year_month'] = (df['pickup_location_id'].astype(str) + "_"
                            + df["pickup_at"].dt.year.astype(str) + "_"
                            + df["pickup_at"].dt.month.astype(str))
    # Add a target value of 1 per trip, so that summing it counts the trips.
    df['y'] = 1
    # Drop unnecessary columns.
    df.drop(["dropoff_at", "pickup_at", "dropoff_location_id", "fare_amount",
             "passenger_count", "trip_distance", "trip_duration"],
            axis=1, inplace=True)

    # Group by location and month, and count the trips.
    g = df.groupby("loc_year_month").agg({'pickup_location_id': 'min',
                                          'ds': 'min',
                                          'y': 'sum'})
    # Keep only months with more than 10 trips.
    g = g[g['y'] > 10].copy()

    # Drop the groupby key, which we do not need anymore, and sort by date:
    # the string key sorts as text (e.g. "10_2018_10" before "10_2018_2").
    g = g.reset_index(drop=True).sort_values("ds").reset_index(drop=True)

    return g
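A note on computing `trip_duration`: for a timedelta, `.seconds` is only the seconds component (0 to 86,399) and ignores whole days, whereas `total_seconds()` returns the full duration. pandas' `Series.dt.seconds` and `Series.dt.total_seconds()` follow the same semantics as Python's `datetime.timedelta`, which the stdlib sketch below uses for a hypothetical trip lasting 1 day and 2 hours.

```python
from datetime import datetime

# A hypothetical trip lasting 1 day and 2 hours.
pickup_at = datetime(2019, 2, 1, 8, 0, 0)
dropoff_at = datetime(2019, 2, 2, 10, 0, 0)
duration = dropoff_at - pickup_at

# .seconds drops the whole day, so the trip looks like it took 2 hours.
seconds_component = duration.seconds
# total_seconds() keeps the full length of the trip.
full_seconds = duration.total_seconds()

def keep(trip_seconds):
    """The notebook's filter: longer than 1 minute and shorter than 24 hours."""
    return 60 < trip_seconds < 24 * 60 * 60

# With .seconds the >24 h trip wrongly passes the filter; with total_seconds() it does not.
print(seconds_component, full_seconds, keep(seconds_component), keep(full_seconds))
```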

In [8]:
%%time

# Test reading data.
the_location = 10
df_list = [read_data(f, the_location) for f in s3_files] 
df_raw = pd.concat(df_list, ignore_index=True)
df = transform_df(df_raw)
print(df.dtypes)
df.head()

# # # without groupby
# # CPU times: user 6.23 s, sys: 2.76 s, total: 9 s
# # Wall time: 7.66 s

# # # with groupby
# # CPU times: user 6.36 s, sys: 2.86 s, total: 9.22 s
# # Wall time: 7.01 s

pickup_location_id             int32
ds                    datetime64[ns]
y                              int64
dtype: object
CPU times: user 9.98 s, sys: 4.31 s, total: 14.3 s
Wall time: 15.6 s


| | pickup_location_id | ds | y |
|---|---|---|---|
| 0 | 10 | 2019-02-01 | 2298 |
| 1 | 10 | 2019-03-01 | 2608 |
| 2 | 10 | 2019-04-01 | 2084 |
| 3 | 10 | 2019-05-01 | 2382 |
| 4 | 10 | 2019-06-01 | 2296 |
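`transform_df()` groups on a string key such as `"10_2019_2"`, and pandas sorts `groupby` results by the group key by default. String keys sort as text rather than chronologically once the month has two digits. In the smoke test (months 2 to 6) the two orders coincide, but with the full 2018-2019 data they would not, so any code that pairs predictions with actuals by row position could compare different months. Sorting or joining on the `ds` date column avoids this. A stdlib illustration, using hypothetical months for location 10:

```python
from datetime import date

# Group keys in the same "<location>_<year>_<month>" format built by transform_df().
months = [(2018, m) for m in (1, 2, 10, 11)]
keys = [f"10_{year}_{month}" for year, month in months]

# Sorting the string keys orders them as text: "10_2018_10" sorts before "10_2018_2".
text_order = sorted(keys)

# Sorting by the month start date (what the 'ds' column holds) is chronological.
chrono_order = [f"10_{d.year}_{d.month}" for d in sorted(date(y, m, 1) for y, m in months)]

print(text_order)
print(chrono_order)
```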


In [9]:
# The monthly groupby aggregation is done inside transform_df() above.

# Optionally, check that we don't have any missing or 0 y-values:
# df.describe()

In [10]:
# # plot a timeseries
# plt.figure(figsize=(8, 5))
# ax = plt.gca()
# df.plot(x="ds", y="y", ax=ax, label=f"pickup_location_id={the_location}");

## Define your Ray Tune Search Space and Search Algorithm <a class="anchor" id="define_search_space"></a>

In this notebook, we will use Ray Tune to run parallel training jobs per pickup location. The training jobs will be defined using a search space and a simple grid search. Depending on your needs, more sophisticated search spaces and search algorithms are possible with Tune.

**First, define a search space of experiment trials to run.**  
> The typical use case for Tune search spaces is hyperparameter tuning. In our case, we define a Tune search space so that the training jobs are conducted automatically. Each training job runs on one data partition (a taxi pickup location) with one model configuration.

Common search algorithms include grid search, random search, and Bayesian optimization.  For more details, see [Working with Tune Search Spaces](https://docs.ray.io/en/master/tune/tutorials/tune-search-spaces.html#tune-search-space-tutorial).  Deciding the best combination of search space and search algorithm is part of the art of being a Data Scientist and depends on the data, algorithm, and problem being solved!  

**Next, define a search algorithm.**
>Ray Tune will use the search space and the specified search algorithm to generate multiple configurations, each of which will be evaluated in a separate Trial on a Ray Cluster. Ray Tune takes care of orchestrating those Trials automatically. Specifically, for each Trial, Ray Tune calls the Trainable function with that Trial's config dictionary.

**Below, our search space consists of:**
- 2 different Prophet models (multiplicative and additive seasonality modes)
- Some or all NYC taxi pickup locations.

**Our search algorithm is:**
- Grid search.

This means every model will be trained on every NYC Taxi pickup location.
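A grid search simply enumerates the Cartesian product of the listed values. The sketch below reproduces that enumeration with `itertools.product`, using strings as stand-ins for the two Prophet objects. It illustrates the idea only; it is not Tune's implementation, and Tune may order the trials differently.

```python
from itertools import product

# Stand-in grid: two model settings x the smoke-test locations.
grid = {
    "model": ["multiplicative", "additive"],
    "location": [1, 10, 199],
}

# A grid search evaluates every combination of the listed values (Cartesian product).
configs = [dict(zip(grid, values)) for values in product(*grid.values())]

for config in configs:
    print(config)
print(f"Number of trials: {len(configs)}")
```

With 2 model settings and 3 locations this yields 6 configurations, which matches the 6 trials Tune runs in the smoke test below.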

In [11]:
# Test a Prophet model.
model = Prophet(seasonality_mode="multiplicative")
print(model)

# Keep only columns Prophet needs
model.fit(df[['ds', 'y']])

INFO:prophet:Disabling yearly seasonality. Run prophet with yearly_seasonality=True to override this.
INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this.
INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
INFO:prophet:n_changepoints greater than number of observations. Using 3.
INFO:cmdstanpy:start chain 1
INFO:cmdstanpy:finish chain 1


<prophet.forecaster.Prophet object at 0x7f808c164a90>


<prophet.forecaster.Prophet at 0x7f808c164a90>

In [12]:
# Test Prophet prediction.
# Note: this overrides the FORECAST_LENGTH of 6 set earlier.
FORECAST_LENGTH = 1
future_dates = model.make_future_dataframe(periods=FORECAST_LENGTH, freq='MS')
# Predict the historical months plus FORECAST_LENGTH future months.
future = model.predict(future_dates)
print(type(future))

# Join predicted ('yhat') and actual ('y') values on the date column, so each
# prediction is compared with the same month. Future months have no actuals,
# so the inner join drops them.
temp = future[['ds', 'yhat']].merge(df[['ds', 'y']], on='ds', how='inner')
temp.columns = ['ds', 'pred_y', 'test_y']

# Calculate the mean absolute forecast error.
temp['forecast_error'] = np.abs(temp['test_y'] - temp['pred_y'])
print(temp)
mean_absolute_error = np.mean(temp['forecast_error'])
print(f"mean_absolute_error: {mean_absolute_error}")
mean_absolute_error = np.mean(temp['forecast_error'])
print(f"mean_absolute_error: {mean_absolute_error}")

<class 'pandas.core.frame.DataFrame'>
          ds       pred_y  test_y  forecast_error
0 2019-02-01  2380.707584  2298.0       82.707584
1 2019-03-01  2358.389807  2608.0      249.610193
2 2019-04-01  2333.680851  2084.0      249.680851
3 2019-05-01  2309.768960  2382.0       72.231040
4 2019-06-01  2285.059997  2296.0       10.940003
mean_absolute_error: 133.03393416093377
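The mean absolute error (MAE) is the average of |actual - predicted| over the months that have both values. The sketch below recomputes the MAE from the table above in plain Python, pairing values by date rather than by row position, so a missing or reordered month cannot shift the comparison. The numbers are copied from the printed output.

```python
from statistics import mean

# Predicted and actual monthly pickups for location 10, copied from the output above.
predicted = {
    "2019-02-01": 2380.707584, "2019-03-01": 2358.389807, "2019-04-01": 2333.680851,
    "2019-05-01": 2309.768960, "2019-06-01": 2285.059997,
}
actual = {
    "2019-02-01": 2298, "2019-03-01": 2608, "2019-04-01": 2084,
    "2019-05-01": 2382, "2019-06-01": 2296,
}

# Pair values by date (not by position); dates missing on either side are skipped.
common_dates = sorted(predicted.keys() & actual.keys())
abs_errors = [abs(actual[d] - predicted[d]) for d in common_dates]

# MAE = (1/n) * sum over t of |actual_t - predicted_t|
mae = mean(abs_errors)
print(f"MAE: {mae:.4f}")
```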


In [13]:
# 1. Define a search space.

# TODO: 1. add longer forecast window; 2. keep additive, 3. try kats additive instead
search_space = {
    "model": tune.grid_search([Prophet(seasonality_mode="multiplicative"), 
                               Prophet(seasonality_mode="additive")]),
    "location": tune.grid_search(sample_locations),
}
search_space

{'model': {'grid_search': [<prophet.forecaster.Prophet at 0x7f807c71aa60>,
   <prophet.forecaster.Prophet at 0x7f808c03be80>]},
 'location': {'grid_search': [1, 10, 199]}}
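Prophet instances can only be fit once: calling `fit()` again on the same instance raises an exception. Putting Prophet objects in `search_space` therefore relies on each trial working on its own copy. Since trials run as separate Ray actors, the config is, as far as we understand, serialized for each trial, which gives every trial an independent copy. The sketch below imitates this with a hypothetical `FitOnceModel` class, using `copy.deepcopy` as a stand-in for serialization; it illustrates the idea and is not Tune's actual mechanism.

```python
import copy

class FitOnceModel:
    """Hypothetical stand-in that, like Prophet, refuses to be fit twice."""

    def __init__(self, seasonality_mode):
        self.seasonality_mode = seasonality_mode
        self.history = None

    def fit(self, ys):
        if self.history is not None:
            raise RuntimeError("Model can only be fit once. Instantiate a new object.")
        self.history = list(ys)
        return self

# Sharing one instance across trials: the second fit fails.
shared = FitOnceModel("multiplicative")
shared.fit([34, 40])
try:
    shared.fit([2298, 2608])
    second_fit_failed = False
except RuntimeError:
    second_fit_failed = True

# Giving each trial its own copy of an unfitted instance works.
template = FitOnceModel("multiplicative")
per_trial = [copy.deepcopy(template) for _ in range(3)]
fitted = [m.fit(ys) for m, ys in zip(per_trial, [[34, 40], [2298, 2608], [5, 7]])]

print(second_fit_failed, template.history is None, [m.history for m in fitted])
```

A common alternative design is to put only plain values (such as the seasonality mode string) in the search space and construct the model inside the trainable.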

## Define a Trainable (callable) function <a class="anchor" id="define_trainable"></a>

📈 Typically when you are running Data Science experiments, you want to be able to keep track of summary metrics for each trial, so you can decide at the end which trials were best.  That way, you can decide which model to deploy.

🇫 Next, we define a trainable function to train and evaluate a Prophet model on a data partition. Tune calls this function once per trial and runs the trials in parallel. Inside this trainable function, we will:
- Add detailed metrics we want to report (each model's loss or error). 
- Checkpoint each model for easy deployment later.

📖 **The metrics defined inside the trainable function will appear in the Ray Tune experiment summary table.**
```{tip}
Ray Tune has two ways of defining a trainable, namely the [Function API](https://docs.ray.io/en/latest/tune/api_docs/trainable.html#trainable-docs) and the Class API. Both are valid ways of defining a trainable, but *the Function API is generally recommended*.
```

**In the cell below, we define a "Trainable" function called `train_model()`**.
- The input is a config dictionary argument.
- The output is a dictionary of metrics, which the function reports back to Tune with `session.report()`.
- In addition to reporting each trial's metrics, we save each model as a [checkpoint](https://docs.ray.io/en/master/ray-air/key-concepts.html#checkpoints).
  > For checkpointing, we use `ray.air.checkpoint.Checkpoint`. `Checkpoint.from_dict()` stores the fitted Prophet model, its forecast, and the location ID in a plain dictionary, so we can restore the model later without writing any Prophet-specific serialization code.
- Since we are using **grid search**, `train_model()` will run *in parallel for every combination* in the Tune search space!

In [31]:
# 2. Define a custom train function
def train_model(config: dict):

    model = config['model']
    the_location = config['location']
    
    # Load data.
    df_list = [read_data(f, the_location) for f in s3_files]   
    df_raw = pd.concat(df_list, ignore_index=True)
    df = transform_df(df_raw)

    # Train model.
    model = model.fit(df[['ds', 'y']])
    
    ### START move to evaluate_model()
    # TODO - move this to error, test_df, pred_df = evaluate_model(model)
    # Inference model.
    future_dates = model.make_future_dataframe(periods=FORECAST_LENGTH, freq='MS')
    future = model.predict(future_dates)

    # assemble actual vs predicted values
    test_y = df.loc[(df.ds.isin(future_dates.ds)), :]
    pred_y = future[['ds', 'trend']].iloc[0:-FORECAST_LENGTH]
    temp = pd.concat([pred_y, test_y[['y']]], 
                     axis=1, ignore_index=True)
    temp.columns = ['ds', 'pred_y', 'test_y']

    # Evaluate mean absolute forecast error.
    temp['forecast_error'] = np.abs(temp['test_y'] - temp['pred_y'])
    error = np.mean(temp['forecast_error'])

    # calculate mean absolute forecast error
    temp['forecast_error'] = np.abs(temp['test_y'] - temp['pred_y'])
    mean_absolute_error = np.mean(temp['forecast_error'])
    ### END move to evaluate_model()
    
    # Define a model checkpoint using AIR API.  
    # https://docs.ray.io/en/latest/tune/tutorials/tune-checkpoints.html
    checkpoint = ray.air.checkpoint.Checkpoint.from_dict({
        "model": model, 
        "forecast_df": future,
        "location_id": the_location})

    # Save checkpoint and report back metrics, using ray.air.session.report()
    # The metrics you specify here will appear in Tune summary table.
    # They will also be recorded in Tune results under `metrics`.
    metrics = dict(error = error)
    session.report(
            metrics, 
            checkpoint=checkpoint)

## Run batch training on Ray Tune <a class="anchor" id="run_tune_search"></a>

**In the cell below, we configure the resources allocated per trial.** 

Tune uses this resources allocation to control the parallelism. For example, if each trial was configured to use 4 CPUs, and the cluster had only 32 CPUs, then Tune will limit the number of concurrent trials to 8 to avoid overloading the cluster. For more information, see [A Guide To Parallelism and Resources](https://docs.ray.io/en/master/tune/tutorials/tune-resources.html#tune-parallelism).
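The concurrency limit in that example is integer division of the cluster's CPUs by the CPUs requested per trial. A simplified sketch, assuming CPU is the only constrained resource (it ignores GPUs, memory, custom resources, and CPUs already used by other tasks); `max_concurrent_trials()` is a hypothetical helper, not a Tune API:

```python
def max_concurrent_trials(cluster_cpus: float, cpus_per_trial: float) -> int:
    """Upper bound on concurrent trials when CPU is the only limiting resource."""
    return int(cluster_cpus // cpus_per_trial)

# The example from the text: 32 CPUs and 4 CPUs per trial.
print(max_concurrent_trials(32, 4))

# This notebook's cluster reports 24 CPUs and each trial requests 1 CPU,
# so all 6 smoke-test trials can run at the same time.
print(min(6, max_concurrent_trials(24, 1)))
```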

In [15]:
# 3. Customize resources per trial, here we set 1 CPU each.
train_model = tune.with_resources(train_model, {"cpu": 1})

<br>

**Now we are ready to kick off a Ray Tune experiment!**  

Recall that, at a high level, we are training several different models per pickup location. We use Ray Tune so we can run all these trials in parallel. At the end, we will inspect the results of the experiment and deploy only the best model per pickup location.

**In the cell below, we use AIR configs and run the experiment using `tuner.fit()`.** 

Tune will report on experiment status, and after the experiment finishes, you can inspect the results. 

In the AIR config below, we specify a local directory `my_Tune_logs` for logging instead of the default `~/ray_results` directory. Giving your logs a project name makes them easier to find, and using a relative path means you can see your logs inside the Jupyter file browser. Learn more about logging Tune results at [How to configure logging in Tune](https://docs.ray.io/en/master/tune/tutorials/tune-output.html#tune-logging).

Tune can [retry failed trials automatically](https://docs.ray.io/en/master/tune/tutorials/tune-stopping.html#tune-stopping-guide), as well as resume entire experiments. This is useful in case a node on your remote cluster fails (when running on a cloud such as AWS or GCP).
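In Ray AIR, trial retries are configured with `air.FailureConfig(max_failures=...)` passed to `air.RunConfig(failure_config=...)`. The plain-Python sketch below only illustrates retry semantics and is not Tune's implementation; `run_with_retries()` and the two trial functions are hypothetical. It shows why retries help with transient failures such as a lost node, but not with deterministic errors that recur on every attempt, such as the failing location 199 trials in the output below.

```python
def run_with_retries(fn, max_failures):
    """Call fn(); on an exception, retry up to max_failures more times."""
    attempts = 0
    while True:
        attempts += 1
        try:
            return fn(), attempts
        except Exception:
            if attempts > max_failures:
                raise

# A transient failure (e.g. a lost node) succeeds on the second attempt.
state = {"calls": 0}
def flaky_trial():
    state["calls"] += 1
    if state["calls"] == 1:
        raise ConnectionError("node lost")
    return {"error": 2.23}

# A deterministic failure fails on every attempt.
empty_calls = {"n": 0}
def empty_data_trial():
    empty_calls["n"] += 1
    raise ValueError("not enough rows to fit a model")

result = run_with_retries(flaky_trial, max_failures=2)
print(result)

failed_after_retries = False
try:
    run_with_retries(empty_data_trial, max_failures=2)
except ValueError as exc:
    failed_after_retries = True
    print(f"Still failing after {empty_calls['n']} attempts: {exc}")
```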

💡 Right-click on the cell below and choose "Enable Scrolling for Outputs"! This will make it easier to view, since model training output can be very long!

**The output below is from the smoke test (6 models). On the full dataset, 518 models using 18 NYC Taxi S3 files dating from 2018/01 to 2019/06 (split into partitions of approx. 7 GiB each) were trained simultaneously on a 23-node AWS cluster of [m5.4xlarge](https://aws.amazon.com/ec2/instance-types/m5/) instances within 37 minutes.**

In [32]:
# Define a tuner object using Ray AIR Tuner API
tuner = tune.Tuner(
    train_model, 
    param_space=search_space,
    run_config=air.RunConfig(
        
        #redirect logs to relative path instead of default ~/ray_results/
        local_dir = "my_Tune_logs",
        name = "batch_tuning",

        # Set Ray Tune verbosity.  Print summary table only with levels 2 or 3.
        verbose=2,
        ),
)

# 4. Run the experiment with Ray Tune
start = time.time()
results = tuner.fit()
total_time_taken = time.time() - start

# Print some training stats
print(f"Total number of models: {len(results)}")
print(f"TOTAL TIME TAKEN: {total_time_taken:.2f} seconds")
best_result = results.get_best_result(metric="error", mode="min").config
print(f"Best result: {best_result}")

| Current time | Running for | Memory |
|---|---|---|
| 2022-11-15 10:59:28 | 00:00:24.55 | 2.6/30.9 GiB |

| Trial name | # failures | error file |
|---|---|---|
| train_model_92abc_00002 | 1 | /home/ray/christy-air/my_Tune_logs/batch_tuning/train_model_92abc_00002_2_location=199,model=prophet_forecaster_Prophet_object_at_0x7f802de31d90_2022-11-15_10-59-06/error.txt |
| train_model_92abc_00005 | 1 | /home/ray/christy-air/my_Tune_logs/batch_tuning/train_model_92abc_00005_5_location=199,model=prophet_forecaster_Prophet_object_at_0x7f802c23ebe0_2022-11-15_10-59-06/error.txt |

| Trial name | status | loc | location | model | iter | total time (s) | error |
|---|---|---|---|---|---|---|---|
| train_model_92abc_00000 | TERMINATED | 172.31.84.43:1865 | 1 | `<prophet.foreca_11f0` | 1.0 | 10.1811 | 2.2321 |
| train_model_92abc_00001 | TERMINATED | 172.31.84.43:1896 | 10 | `<prophet.foreca_1b80` | 1.0 | 9.49834 | 133.034 |
| train_model_92abc_00003 | TERMINATED | 172.31.84.43:1898 | 1 | `<prophet.foreca_48e0` | 1.0 | 9.53897 | 2.2321 |
| train_model_92abc_00004 | TERMINATED | 172.31.84.43:1899 | 10 | `<prophet.foreca_c190` | 1.0 | 9.57942 | 133.034 |
| train_model_92abc_00002 | ERROR | 172.31.84.43:1897 | 199 | `<prophet.foreca_1d90` | | | |
| train_model_92abc_00005 | ERROR | 172.31.84.43:1900 | 199 | `<prophet.foreca_ebe0` | | | |

(train_model pid=1865, ip=172.31.84.43) INFO:cmdstanpy:start chain 1
(train_model pid=1865, ip=172.31.84.43) INFO:cmdstanpy:finish chain 1

| Trial name | error | should_checkpoint |
|---|---|---|
| train_model_92abc_00000 | 2.2321022029248 | True |
| train_model_92abc_00001 | 133.03393416093377 | True |
| train_model_92abc_00002 | | |
| train_model_92abc_00003 | 2.2321022029248 | True |
| train_model_92abc_00004 | 133.03393416093377 | True |
| train_model_92abc_00005 | | |

(train_model pid=1899, ip=172.31.84.43) INFO:cmdstanpy:start chain 1
(train_model pid=1896, ip=172.31.84.43) INFO:cmdstanpy:start chain 1
(train_model pid=1896, ip=172.31.84.43) INFO:cmdstanpy:finish chain 1
(train_model pid=1898, ip=172.31.84.43) INFO:cmdstanpy:start chain 1
(train_model pid=1899, ip=172.31.84.43) INFO:cmdstanpy:finish chain 1
(train_model pid=1898, ip=172.31.84.43) INFO:cmdstanpy:finish chain 1
2022-11-15 10:59:19,412	ERROR trial_runner.py:993 -- Trial train_model_92abc_00005: Error processing event.
ray.exceptions.RayTaskError(ValueError): ray::ImplicitFunc.train() (pid=1900, ip=172.31.84.43, repr=train_model)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/tune/trainable/trainable.py", line 355, in train
    raise skipped from exception_cause(skipped)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/tune/trainable/function_trainable.py", line 325, in entry

Total number of models: 6
TOTAL TIME TAKEN: 24.69 seconds
Best result: {'model': <prophet.forecaster.Prophet object at 0x7f8322b59670>, 'location': 1}


<br>

**After the Tune experiment has run, select the best model per pickup location.**

We can assemble the Tune results ([ResultGrid object](https://docs.ray.io/en/master/tune/examples/tune_analyze_results.html)) into a pandas dataframe, then select the model with the minimum error for each pickup location.
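Selecting the best model per location is a group-wise minimum. Here is a plain-Python sketch of the same selection over a list of trial records. The errors are copied from the results above, while the `model_a`/`model_b` labels are hypothetical because the printed output does not show which Prophet configuration each trial used. On ties, the first trial seen is kept, which matches how pandas `idxmin()` resolves ties.

```python
# Trial records shaped like the results table above; failed trials have no error.
trials = [
    {"location_id": 1, "model": "model_a", "error": 2.2321},
    {"location_id": 10, "model": "model_a", "error": 133.0339},
    {"location_id": 199, "model": "model_a", "error": None},
    {"location_id": 1, "model": "model_b", "error": 2.2321},
    {"location_id": 10, "model": "model_b", "error": 133.0339},
    {"location_id": 199, "model": "model_b", "error": None},
]

best = {}
for trial in trials:
    if trial["error"] is None:  # skip failed trials
        continue
    loc = trial["location_id"]
    # Strict "<" keeps the first trial on ties.
    if loc not in best or trial["error"] < best[loc]["error"]:
        best[loc] = trial

# Print the winners sorted by error, like final_df below.
for loc, trial in sorted(best.items(), key=lambda kv: kv[1]["error"]):
    print(loc, trial["model"], trial["error"])
```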

In [33]:
# Collect per-trial values from the Tune results. Failed trials (e.g. location 199)
# have no 'error' metric, so they get a large sentinel error and no checkpoint.
errors = [r.metrics.get('error', 10000.0) for r in results]
checkpoints = [r.checkpoint for r in results]
locations = [r.config['location'] for r in results]
models = [r.config['model'] for r in results]

# Assemble a pandas dataframe from Tune results
results_df = pd.DataFrame(
    zip(locations, models, errors, checkpoints),
    columns=['location_id', 'model', 'error', 'checkpoint'],
)
print(results_df.dtypes)
results_df.head()

location_id      int64
model           object
error          float64
checkpoint      object
dtype: object


| | location_id | model | error | checkpoint |
|---|---|---|---|---|
| 0 | 1 | `<prophet.forecaster.Prophet object at 0x7f8322...` | 2.232102 | Checkpoint(local_path=/home/ray/christy-air/my... |
| 1 | 10 | `<prophet.forecaster.Prophet object at 0x7f802c...` | 133.033934 | Checkpoint(local_path=/home/ray/christy-air/my... |
| 2 | 199 | `<prophet.forecaster.Prophet object at 0x7f8322...` | 10000.0 | |
| 3 | 1 | `<prophet.forecaster.Prophet object at 0x7f802c...` | 2.232102 | Checkpoint(local_path=/home/ray/christy-air/my... |
| 4 | 10 | `<prophet.forecaster.Prophet object at 0x7f802c...` | 133.033934 | Checkpoint(local_path=/home/ray/christy-air/my... |


In [34]:
type(results_df.model[0])

prophet.forecaster.Prophet

In [35]:
# Drop failed trials (their checkpoint is None), then keep only 1 model
# per location_id: the one with the minimum error.
final_df = results_df.dropna()
final_df = final_df.loc[final_df.groupby('location_id')['error'].idxmin()].copy()
final_df.sort_values(by=["error"], inplace=True)
final_df.set_index('location_id', inplace=True, drop=True)
print(final_df.dtypes)
final_df

model          object
error         float64
checkpoint     object
dtype: object


| location_id | model | error | checkpoint |
|---|---|---|---|
| 1 | `<prophet.forecaster.Prophet object at 0x7f8322...` | 2.232102 | Checkpoint(local_path=/home/ray/christy-air/my... |
| 10 | `<prophet.forecaster.Prophet object at 0x7f802c...` | 133.033934 | Checkpoint(local_path=/home/ray/christy-air/my... |


## Load a model from checkpoint and perform inference  <a class="anchor" id="load_checkpoint"></a>

```{tip}
[Ray AIR Predictors](https://docs.ray.io/en/latest/ray-air/predictors.html) make batch inference easy since they have internal logic to parallelize the inference.
```
  
In this notebook, we will restore a single Prophet model directly from a checkpoint and demonstrate that it can be used for inference.

Below, we can easily obtain AIR Checkpoint objects from the Tune results. 

In [36]:
# Get a pickup location
the_location = final_df.index[0]
the_location

1

In [37]:
# Get a checkpoint directly from the pandas dataframe of Tune results
checkpoint = final_df.checkpoint[the_location]
print(type(checkpoint))

# Restore a model from checkpoint
model = checkpoint.to_dict()['model']
print(type(model))

<class 'ray.air.checkpoint.Checkpoint'>
<class 'prophet.forecaster.Prophet'>


In [38]:
# Create some test data
df_list = [read_data(f, the_location) for f in s3_files[:1]]   
df_raw = pd.concat(df_list, ignore_index=True)
df = transform_df(df_raw)
df.head()

| | pickup_location_id | ds | y |
|---|---|---|---|
| 0 | 1 | 2019-02-01 | 34 |


In [None]:
# Perform inference using the model restored from the checkpoint.
future_dates = model.make_future_dataframe(periods=FORECAST_LENGTH, freq='MS')
future = model.predict(future_dates)

# Assemble actual vs. predicted values side by side, joined on the date column,
# to inspect them. Months without test data show NaN in the 'y' column.
compare_df = future[['ds', 'yhat']].merge(df[['ds', 'y']], on='ds', how='left')

# TODO create plot training + backtesting w/actuals, CIs, and future preds
compare_df

**Compare validation and test error.**

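During model training, we reported the in-sample mean absolute error, computed on the same months each model was fit on. Below, we compute the error on a pretend "test" data set: the first monthly file of data for the same location. Note that this month was also part of the training data, so this is a sanity check rather than a true holdout evaluation.

Do a quick check that both errors are reasonably close together.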
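A stricter check, common in forecasting, is a chronological holdout: fit on the earlier months and measure the error on the last few months, which the model never saw. A minimal sketch, assuming the location 10 monthly counts from the smoke-test output earlier and a hypothetical naive model that forecasts the training mean (swapped in for Prophet to keep the example self-contained):

```python
from statistics import mean

# Monthly pickups for location 10 from the smoke-test output (2019-02 to 2019-06).
history = [
    ("2019-02-01", 2298), ("2019-03-01", 2608), ("2019-04-01", 2084),
    ("2019-05-01", 2382), ("2019-06-01", 2296),
]

# Hold out the last `horizon` months; fit only on the earlier months.
horizon = 2
train, test = history[:-horizon], history[-horizon:]

# Hypothetical naive model: forecast the mean of the training months.
forecast = mean(y for _, y in train)

# Out-of-sample MAE on months the model never saw.
holdout_mae = mean(abs(y - forecast) for _, y in test)
print(f"forecast={forecast}, holdout MAE={holdout_mae}")
```

Slicing by position is safe here only because `history` is already in date order; with real data, sort by date before splitting.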

In [None]:
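# Evaluate the restored model on the test data: join predictions and actuals
# on the date column, then compute the mean absolute error.
eval_df = future[['ds', 'yhat']].merge(df[['ds', 'y']], on='ds', how='inner')
error = np.mean(np.abs(eval_df['y'] - eval_df['yhat']))
print(f"Test error: {error}")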

In [None]:
# Compare test error with training validation error
print(f"Validation error: {final_df.error[the_location]}")

# Validation and test errors should be reasonably close together.