<a href="https://colab.research.google.com/github/chandan8349/feast/blob/main/notebooks/Driver_Ranking_Tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Overview
Making a prediction using a linear regression model is a common use case in ML. In this guide tutorial, we build the model that predicts if a driver will complete a trip based on a number of features ingested into Feast.

The basic local mode gives you ability to quickly try Feast, while the advanced mode shows how you can use Feast in a production setting, in particular for the Google Cloud Platform (GCP) cloud.

This tutorial uses Feast with scikit learn to:

* Train a model locally using data from BigQuery
* Test the model for online inference using SQLite (for fast iteration)
* Test the model for online inference using Firestore (to represent production)


## Step 1: Install feast, scikit-learn

Install feast, gcp dependencies and scikit-learn


In [1]:
!pip install feast scikit-learn 'feast[gcp]'



#### Check feast version

In [2]:
!feast version

2025-07-26 07:26:00.398177: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1753514760.417277    2837 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1753514760.423911    2837 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-07-26 07:26:00.447809: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
[1m[34mFeast SDK Version: [1m[32m"0.51.0"


## Step 2: Clone the Git repo

Clone the Driver Ranking Git repo into your Colab Folder

In [3]:
!git clone https://github.com/feast-dev/feast-driver-ranking-tutorial.git

Cloning into 'feast-driver-ranking-tutorial'...
remote: Enumerating objects: 65, done.[K
remote: Counting objects: 100% (65/65), done.[K
remote: Compressing objects: 100% (49/49), done.[K
remote: Total 65 (delta 26), reused 43 (delta 14), pack-reused 0 (from 0)[K
Receiving objects: 100% (65/65), 21.31 KiB | 10.66 MiB/s, done.
Resolving deltas: 100% (26/26), done.


## Step 3: Set up your Goggle Cloud Platform (GCP) Configurations

## Authenticate into GCP
This will allow you to do the advanced section of the tutorial, where you materialize remotely on a GCP
Feast spins up infrastructure on GCP using the credentials in our environment. Run the following cell to log into GCP:

In [16]:
from google.colab import auth
auth.authenticate_user()

Set configurations
Set the following configuration, which we'll be using throughout the tutorial:

PROJECT_ID: Your project.
BUCKET_NAME: The name of a bucket which will be used to store the feature store registry and model artifacts.
BIGQUERY_DATASET_NAME: The name of a dataset which will be used to create tables containing features.
AI_PLATFORM_MODEL_NAME: The name of a model name which will be created in AI Platform.

In [20]:
!gcloud projects list

PROJECT_ID                  NAME              PROJECT_NUMBER
applied-buckeye-382114      My First Project  820347523003
echotube-ad7if              EchoTube          990426747743
essential-text-426918-q9    My First Project  82469949418
fridge-chef-577n2           Fridge Chef       1064257934476
gen-lang-client-0621683441  Gemini API        946470080760
neat-fin-382005             My First Project  73886607326
pro-adapter-463813-u8       kf-feast          996193945591


In [21]:
!gcloud config set project pro-adapter-463813-u8

Updated property [core/project].


In [22]:
!gcloud auth list
!gcloud projects get-iam-policy pro-adapter-463813-u8 --filter="bindings.members:$(gcloud config get-value account)" --flatten="bindings[].members" --format="table(bindings.role)"


       Credentialed Accounts
ACTIVE  ACCOUNT
*       chandansahu834902@gmail.com

To set the active account, run:
    $ gcloud config set account `ACCOUNT`

ROLE
roles/owner


In [31]:
PROJECT_ID= "pro-adapter-463813-u8" #@param {type:"string"}
BUCKET_NAME= "driver_ranking_tutorial-chandan123"  #@param {type:"string"} custom
BIGQUERY_DATASET_NAME="feast_driver_ranking_tutorial" #@param {type:"string"} custom
AI_PLATFORM_MODEL_NAME="feast_driver_rankin_jsd_model" #@param {type:"string"

! gcloud config set project $PROJECT_ID
%env GOOGLE_CLOUD_PROJECT=$PROJECT_ID
!echo project_id = $PROJECT_ID > ~/.bigqueryrc

Updated property [core/project].
env: GOOGLE_CLOUD_PROJECT=pro-adapter-463813-u8


In [29]:
# Only run if your bucket doesn't already exist!
! gsutil mb gs://$BUCKET_NAME

Creating gs://driver_ranking_tutorial-chandan123/...


## Step 4: Apply and deploy feature definitions

`feast apply` scans python files in the current directory for feature definitions and deploys infrastructure according to `feature_store.yaml`

In [32]:
%%shell
cd /content/feast-driver-ranking-tutorial/driver_ranking/
feast apply

2025-07-26 08:03:22.666246: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1753517002.689080   12768 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1753517002.696069   12768 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-07-26 08:03:22.718053: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
  File "/usr/local/bin/feast", line 8, in <module>
    sys.exit(cli())
             ^^^^^


CalledProcessError: Command 'cd /content/feast-driver-ranking-tutorial/driver_ranking/
feast apply
' returned non-zero exit status 1.

### Inspect the files created under your local folder

In [None]:
%%shell
cd /content/feast-driver-ranking-tutorial/driver_ranking/data/
ls -l

total 20
-rw-r--r-- 1 root root 16384 Jul 26 20:43 online.db
-rw-r--r-- 1 root root   310 Jul 26 20:43 registry.db




## Step 5: Train your model

In [None]:
import feast
from joblib import dump
import pandas as pd
from sklearn.linear_model import LinearRegression

# Load driver order data
orders = pd.read_csv("/content/feast-driver-ranking-tutorial/driver_orders.csv", sep="\t")
orders["event_timestamp"] = pd.to_datetime(orders["event_timestamp"])

# Connect to your feature store provider
fs = feast.FeatureStore(repo_path="/content/feast-driver-ranking-tutorial/driver_ranking")

# Retrieve training data from BigQuery
training_df = fs.get_historical_features(
    entity_df=orders,
    feature_refs=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
).to_df()

print("----- Feature schema -----\n")
print(training_df.info())

print()
print("----- Example features -----\n")
print(training_df.head())

# Train model
target = "trip_completed"

reg = LinearRegression()
train_X = training_df[training_df.columns.drop(target).drop("event_timestamp")]
train_Y = training_df.loc[:, target]
reg.fit(train_X[sorted(train_X)], train_Y)

# Save model
dump(reg, "driver_model.bin")

----- Feature schema -----

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 6 columns):
 #   Column                                Non-Null Count  Dtype              
---  ------                                --------------  -----              
 0   event_timestamp                       10 non-null     datetime64[ns, UTC]
 1   driver_id                             10 non-null     int64              
 2   trip_completed                        10 non-null     int64              
 3   driver_hourly_stats__conv_rate        10 non-null     float64            
 4   driver_hourly_stats__acc_rate         10 non-null     float64            
 5   driver_hourly_stats__avg_daily_trips  10 non-null     int64              
dtypes: datetime64[ns, UTC](1), float64(2), int64(3)
memory usage: 608.0 bytes
None

----- Example features -----

            event_timestamp  ...  driver_hourly_stats__avg_daily_trips
0 2021-04-17 04:29:28+00:00  ...                     

['driver_model.bin']

## Step 6: Materialize your online store
Apply and materialize data to Firestore

In [None]:
!cd /content/feast-driver-ranking-tutorial/driver_ranking/ && feast materialize-incremental 2022-01-01T00:00:00

Materializing [1m[32m1[0m feature views to [1m[32m2022-01-01 00:00:00+00:00[0m into the [1m[32mdatastore[0m online store.

[1m[32mdriver_hourly_stats[0m from [1m[32m2020-07-27 20:45:14+00:00[0m to [1m[32m2022-01-01 00:00:00+00:00[0m:
100%|███████████████████████████████████████████████████████████████| 10/10 [00:01<00:00,  6.16it/s]


### Step 7:  Make Prediction

In [None]:
import pandas as pd
import feast
from joblib import load


class DriverRankingModel:
    def __init__(self):
        # Load model
        self.model = load("/content/driver_model.bin")

        # Set up feature store
        self.fs = feast.FeatureStore(repo_path="/content/feast-driver-ranking-tutorial/driver_ranking/")

    def predict(self, driver_ids):
        # Read features from Feast
        driver_features = self.fs.get_online_features(
            entity_rows=[{"driver_id": driver_id} for driver_id in driver_ids],
            features=[
                "driver_hourly_stats:conv_rate",
                "driver_hourly_stats:acc_rate",
                "driver_hourly_stats:avg_daily_trips",
            ],
        )
        df = pd.DataFrame.from_dict(driver_features.to_dict())

        # Make prediction
        df["prediction"] = self.model.predict(df[sorted(df)])

        # Choose best driver
        best_driver_id = df["driver_id"].iloc[df["prediction"].argmax()]

        # return best driver
        return best_driver_id

In [None]:
def make_drivers_prediction():
    drivers = [1001, 1002, 1003, 1004]
    model = DriverRankingModel()
    best_driver = model.predict(drivers)
    print(f"Prediction for best driver id: {best_driver}")

In [None]:
make_drivers_prediction()

Prediction for best driver id: 1001
