### Overview
Making predictions with a linear regression model is a common use case in ML. In this tutorial, we build a model that predicts whether a driver will complete a trip, based on a number of features ingested into Feast.

The basic local mode lets you try Feast quickly, while the advanced mode shows how you can use Feast in a production setting, in particular on Google Cloud Platform (GCP).

This tutorial uses Feast with scikit-learn to:

* Train a model locally using data from BigQuery
* Test the model for online inference using SQLite (for fast iteration)
* Test the model for online inference using Firestore (to represent production)


## Step 1: Install Feast and scikit-learn

Install Feast, its GCP dependencies, and scikit-learn.


In [None]:
!pip install feast scikit-learn 'feast[gcp]'

Collecting feast
  Downloading feast-0.49.0-py2.py3-none-any.whl.metadata (36 kB)
Collecting colorama<1,>=0.3.9 (from feast)
  Downloading colorama-0.4.6-py2.py3-none-any.whl.metadata (17 kB)
Collecting mmh3 (from feast)
  Downloading mmh3-5.1.0-cp311-cp311-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (16 kB)
Collecting numpy<2,>=1.22 (from feast)
  Downloading numpy-1.26.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
Collecting pyarrow<=17.0.0 (from feast)
  Downloading pyarrow-17.0.0-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (3.3 kB)
Collecting pydantic==2.10.6 (from feast)
  Downloading pydantic-2.10.6-py3-none-any.whl.metadata (30 kB)
Collecting tenacity<9,>=7 (from feast)
  Downloading tenacity-8.5.0-py3-none-any.whl.metadata (1.2 kB)
Collecting uvicorn==0.34.0 (from uvicorn[sta

### Check the Feast version

In [None]:
!feast version

Feast SDK Version: "0.49.0"


## Step 2: Clone the Git repo

Clone the driver ranking Git repo into your Colab folder.

In [None]:
!git clone https://github.com/feast-dev/feast-driver-ranking-tutorial.git

Cloning into 'feast-driver-ranking-tutorial'...
remote: Enumerating objects: 65, done.
remote: Counting objects: 100% (65/65), done.
remote: Compressing objects: 100% (49/49), done.
remote: Total 65 (delta 26), reused 43 (delta 14), pack-reused 0 (from 0)
Receiving objects: 100% (65/65), 21.31 KiB | 5.33 MiB/s, done.
Resolving deltas: 100% (26/26), done.


## Step 3: Set up your Google Cloud Platform (GCP) configuration

### Authenticate into GCP
This lets you complete the advanced section of the tutorial, where you materialize features remotely on GCP.
Feast spins up infrastructure on GCP using the credentials in your environment. Run the following cell to log into GCP:

In [None]:
from google.colab import auth
auth.authenticate_user()

MessageError: Error: credential propagation was unsuccessful

### Set configurations
Set the following configuration values, which are used throughout the tutorial:

* `PROJECT_ID`: Your GCP project ID.
* `BUCKET_NAME`: The name of the bucket used to store the feature store registry and model artifacts.
* `BIGQUERY_DATASET_NAME`: The name of the dataset in which tables containing features are created.
* `AI_PLATFORM_MODEL_NAME`: The name of the model that will be created in AI Platform.

In [None]:
PROJECT_ID= "dulcet-bastion-452612-v4" #@param {type:"string"}
BUCKET_NAME= "dulcet-bastion-452612-v4-driver_ranking_tutorial" #@param {type:"string"} custom
BIGQUERY_DATASET_NAME="feast_driver_ranking_tutorial" #@param {type:"string"} custom
AI_PLATFORM_MODEL_NAME="feast_driver_rankin_jsd_model" #@param {type:"string"}

! gcloud config set project $PROJECT_ID
%env GOOGLE_CLOUD_PROJECT=$PROJECT_ID
!echo project_id = $PROJECT_ID > ~/.bigqueryrc

Updated property [core/project].
env: GOOGLE_CLOUD_PROJECT=dulcet-bastion-452612-v4


In [None]:
# Only run if your bucket doesn't already exist!
! gsutil mb gs://$BUCKET_NAME

Creating gs://dulcet-bastion-452612-v4-driver_ranking_tutorial/...
ServiceException: 409 A Cloud Storage bucket named 'dulcet-bastion-452612-v4-driver_ranking_tutorial' already exists. Try another name. Bucket names must be globally unique across all Google Cloud projects, including those outside of your organization.
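
The 409 above is what `gsutil mb` reports when the bucket already exists. One way to make the cell safe to re-run is to probe for the bucket first. The following is a minimal sketch, assuming `gsutil` is on the path; `ensure_bucket` and its `run` parameter are hypothetical helpers introduced here for illustration and are not part of the tutorial repo.

```python
import subprocess


def ensure_bucket(bucket_name, run=subprocess.run):
    """Create gs://<bucket_name> only if `gsutil ls -b` cannot find it.

    `run` is injectable (an assumption of this sketch) so the logic can be
    exercised without gsutil installed.
    Returns True if a create was attempted, False if the bucket was found.
    """
    url = f"gs://{bucket_name}"
    # `gsutil ls -b` exits non-zero when the bucket is missing or inaccessible.
    probe = run(["gsutil", "ls", "-b", url], capture_output=True, text=True)
    if probe.returncode == 0:
        print(f"{url} already exists, skipping creation")
        return False
    # check=True raises CalledProcessError if the create itself fails.
    run(["gsutil", "mb", url], check=True)
    return True


# In the notebook this would be called as: ensure_bucket(BUCKET_NAME)
```

Note that a bucket name already taken by another project also fails the probe, so the create step can still return a 409 in that case.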


## Step 4: Apply and deploy feature definitions

`feast apply` scans Python files in the current directory for feature definitions and deploys infrastructure according to `feature_store.yaml`.

In [None]:
%%shell
cd /content/feast-driver-ranking-tutorial/driver_ranking/
feast apply

No project found in the repository. Using project name driver_ranking defined in feature_store.yaml
Applying changes for project driver_ranking
Deploying infrastructure for driver_hourly_stats




### Inspect the files created under your local folder

In [None]:
%%shell
cd /content/feast-driver-ranking-tutorial/driver_ranking/data/
ls -l

total 20
-rw-r--r-- 1 root root 16384 Mar 20 02:49 online.db
-rw-r--r-- 1 root root   787 Mar 20 02:57 registry.db




## Step 5: Train your model

In [None]:
!pip install "numpy<2" "pandas==2.2.2"



In [None]:
import feast
from joblib import dump
import pandas as pd
from sklearn.linear_model import LinearRegression

# Load driver order data
orders = pd.read_csv("/content/feast-driver-ranking-tutorial/driver_orders.csv", sep="\t")
orders["event_timestamp"] = pd.to_datetime(orders["event_timestamp"])

# Connect to your feature store provider
fs = feast.FeatureStore(repo_path="/content/feast-driver-ranking-tutorial/driver_ranking")

# Retrieve training data from BigQuery
training_df = fs.get_historical_features(
    entity_df=orders,
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
).to_df()

print("----- Feature schema -----\n")
print(training_df.info())

print()
print("----- Example features -----\n")
print(training_df.head())

# Train model
target = "trip_completed"

reg = LinearRegression()
train_X = training_df[training_df.columns.drop(target).drop("event_timestamp")]
train_Y = training_df.loc[:, target]
reg.fit(train_X[sorted(train_X)], train_Y)

# Save model
dump(reg, "driver_model.bin")



----- Feature schema -----

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 6 columns):
 #   Column           Non-Null Count  Dtype              
---  ------           --------------  -----              
 0   event_timestamp  10 non-null     datetime64[us, UTC]
 1   driver_id        10 non-null     int64              
 2   trip_completed   10 non-null     int64              
 3   conv_rate        10 non-null     float64            
 4   acc_rate         10 non-null     float64            
 5   avg_daily_trips  10 non-null     int64              
dtypes: datetime64[us, UTC](1), float64(2), int64(3)
memory usage: 612.0 bytes
None

----- Example features -----

            event_timestamp  driver_id  trip_completed  conv_rate  acc_rate  \
0 2021-04-17 04:29:28+00:00       1002               0   0.586277  0.374841   
1 2021-04-19 04:29:28+00:00       1002               0   0.586277  0.374841   
2 2021-04-18 04:29:28+00:00       1002               0 

['driver_model.bin']
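
`get_historical_features` joins each row of the entity dataframe with the feature values that were known at that row's `event_timestamp`, which keeps later data from leaking into training. The following is a simplified, pure-Python sketch of that idea and not Feast's implementation: it ignores the feature view TTL, `point_in_time_join` and `ts` are hypothetical helpers, and the feature values are made up for illustration.

```python
from datetime import datetime, timezone


def point_in_time_join(entity_rows, feature_rows, feature_names):
    """For each entity row, attach the latest feature values observed at or
    before that row's event_timestamp for the same driver_id."""
    joined = []
    for entity in entity_rows:
        candidates = [
            f for f in feature_rows
            if f["driver_id"] == entity["driver_id"]
            and f["event_timestamp"] <= entity["event_timestamp"]
        ]
        latest = max(candidates, key=lambda f: f["event_timestamp"], default=None)
        row = dict(entity)
        for name in feature_names:
            # No earlier feature row means the feature is missing for this order.
            row[name] = latest[name] if latest else None
        joined.append(row)
    return joined


def ts(day, hour):
    """Build a UTC timestamp in April 2021 (illustrative dates only)."""
    return datetime(2021, 4, day, hour, tzinfo=timezone.utc)


orders = [{"driver_id": 1002, "event_timestamp": ts(17, 4), "trip_completed": 0}]
stats = [
    {"driver_id": 1002, "event_timestamp": ts(16, 12), "conv_rate": 0.25},
    {"driver_id": 1002, "event_timestamp": ts(17, 3), "conv_rate": 0.5},
    # Recorded after the order, so it must not be used for training.
    {"driver_id": 1002, "event_timestamp": ts(17, 9), "conv_rate": 0.75},
]

training_rows = point_in_time_join(orders, stats, ["conv_rate"])
print(training_rows[0]["conv_rate"])  # 0.5
```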

## Step 6: Materialize your online store
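Materialize feature data into the online store. With the local configuration used in this run, the online store is SQLite; the Firestore setup represents production.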

In [None]:
!cd /content/feast-driver-ranking-tutorial/driver_ranking/ && feast materialize 2021-01-01T00:00:00 2022-01-01T00:00:00

Materializing 1 feature views from 2021-01-01 00:00:00+00:00 to 2022-01-01 00:00:00+00:00 into the sqlite online store.

driver_hourly_stats:
100%|██████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 376.42it/s]
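
The two arguments to `feast materialize` are the start and end of the time window, written as ISO 8601 timestamps. If you prefer to compute the window rather than hard-code it, a small helper can build the command string. This is a sketch only; `materialize_command` is a hypothetical name introduced here.

```python
from datetime import datetime, timedelta


def materialize_command(end, days):
    """Build a `feast materialize START END` command covering `days` days
    up to `end`, with both timestamps in ISO 8601 format."""
    start = end - timedelta(days=days)
    start_iso = start.isoformat(timespec="seconds")
    end_iso = end.isoformat(timespec="seconds")
    return f"feast materialize {start_iso} {end_iso}"


cmd = materialize_command(datetime(2022, 1, 1), days=365)
print(cmd)  # feast materialize 2021-01-01T00:00:00 2022-01-01T00:00:00
```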


## Step 7: Make a prediction

In [None]:
import pandas as pd
import feast
from joblib import load


class DriverRankingModel:
    def __init__(self):
        # Load model
        self.model = load("/content/driver_model.bin")

        # Set up feature store
        self.fs = feast.FeatureStore(repo_path="/content/feast-driver-ranking-tutorial/driver_ranking/")

    def predict(self, driver_ids):
        # Read features from Feast
        driver_features = self.fs.get_online_features(
            entity_rows=[{"driver_id": driver_id} for driver_id in driver_ids],
            features=[
                "driver_hourly_stats:conv_rate",
                "driver_hourly_stats:acc_rate",
                "driver_hourly_stats:avg_daily_trips",
            ],
        )
        df = pd.DataFrame.from_dict(driver_features.to_dict())

        # Make prediction
        df["prediction"] = self.model.predict(df[sorted(df)])

        # Choose best driver
        best_driver_id = df["driver_id"].iloc[df["prediction"].argmax()]

        # Return the best driver
        return best_driver_id
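
Both the training cell and `predict` index the dataframe with `sorted(...)`, which orders the columns alphabetically by name. This matters because a fitted scikit-learn model maps inputs to coefficients by position, so training and serving must present the columns in the same order. The sketch below shows the same idea with plain dictionaries; `to_feature_vector` is a hypothetical helper and the values are made up. As written, the tutorial's training code drops only the target and `event_timestamp`, so `driver_id` is part of the input columns in both places.

```python
def to_feature_vector(row):
    """Return the row's values ordered by column name, mirroring df[sorted(df)]."""
    return [row[name] for name in sorted(row)]


# Same values, different key order, as can happen between the offline and
# online retrieval paths (illustrative numbers only).
training_row = {"driver_id": 1002, "conv_rate": 0.5, "acc_rate": 0.25, "avg_daily_trips": 100}
online_row = {"avg_daily_trips": 100, "acc_rate": 0.25, "driver_id": 1002, "conv_rate": 0.5}

print(sorted(training_row))            # ['acc_rate', 'avg_daily_trips', 'conv_rate', 'driver_id']
print(to_feature_vector(training_row)) # [0.25, 100, 0.5, 1002]
print(to_feature_vector(online_row))   # [0.25, 100, 0.5, 1002]
```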

In [None]:
def make_drivers_prediction():
    drivers = [1001, 1002, 1003, 1004]
    model = DriverRankingModel()
    best_driver = model.predict(drivers)
    print(f"Prediction for best driver id: {best_driver}")

In [None]:
make_drivers_prediction()

Prediction for best driver id: 1001
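
`predict` returns only the top-scoring driver. If you want the full ranking instead, you can sort the driver IDs by predicted score. The following is a minimal sketch; `rank_drivers` is a hypothetical helper and the scores are invented for illustration, not output from the trained model.

```python
def rank_drivers(driver_ids, predictions):
    """Return driver IDs ordered from highest to lowest predicted score."""
    scored = sorted(zip(driver_ids, predictions), key=lambda pair: pair[1], reverse=True)
    return [driver_id for driver_id, _ in scored]


# Illustrative scores only.
ranking = rank_drivers([1001, 1002, 1003, 1004], [0.75, 0.25, 0.5, 0.125])
print(ranking)     # [1001, 1003, 1002, 1004]
print(ranking[0])  # 1001
```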


