### Overview
Making a prediction using a linear regression model is a common use case in ML. In this tutorial, we build a model that predicts whether a driver will complete a trip, based on a number of features ingested into Feast.

The basic local mode lets you try Feast quickly, while the advanced mode shows how you can use Feast in a production setting, in particular on Google Cloud Platform (GCP).

This tutorial uses Feast with scikit-learn to:

* Train a model locally using data from BigQuery
* Test the model for online inference using SQLite (for fast iteration)
* Test the model for online inference using Firestore (to represent production)


## Step 1: Install feast, scikit-learn

Install Feast, its GCP dependencies, and scikit-learn.


In [1]:
!pip install feast scikit-learn 'feast[gcp]'



#### Check feast version

In [2]:
!feast version

[1m[34mFeast SDK Version: [1m[32m"0.54.0"


## Step 2: Clone the Git repo

Clone the Driver Ranking Git repo into your Colab folder.

In [3]:
!git clone https://github.com/feast-dev/feast-driver-ranking-tutorial.git

fatal: destination path 'feast-driver-ranking-tutorial' already exists and is not an empty directory.
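The `fatal` message above only means the repo was already cloned on an earlier run. One way to make this step safe to re-run is to skip the clone when the destination exists. The sketch below assumes `git` is on the `PATH`; `clone_if_missing` is a hypothetical helper, not something the tutorial repo provides.

```python
import subprocess
from pathlib import Path

REPO_URL = "https://github.com/feast-dev/feast-driver-ranking-tutorial.git"


def clone_if_missing(url: str, dest: Path) -> bool:
    """Clone `url` into `dest` unless `dest` already exists.

    Returns True if a clone was performed, False if it was skipped.
    """
    if dest.exists():
        print(f"{dest} already exists, skipping clone")
        return False
    # check=True raises CalledProcessError if git exits with an error.
    subprocess.run(["git", "clone", url, str(dest)], check=True)
    return True
```

Calling `clone_if_missing(REPO_URL, Path("feast-driver-ranking-tutorial"))` would then clone on the first run and do nothing afterwards.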


## Step 3: Set up your Google Cloud Platform (GCP) Configurations

### Authenticate into GCP
This lets you complete the advanced section of the tutorial, where you materialize features remotely on GCP.
Feast spins up infrastructure on GCP using the credentials in your environment, so log into GCP before continuing.

### Set configurations
Set the following configuration values, which are used throughout the tutorial:

* `PROJECT_ID`: Your GCP project ID.
* `BUCKET_NAME`: The name of the bucket used to store the feature store registry and model artifacts.
* `BIGQUERY_DATASET_NAME`: The name of the dataset in which tables containing features are created.
* `AI_PLATFORM_MODEL_NAME`: The name of the model that will be created in AI Platform.

In [4]:
PROJECT_ID= "dulcet-bastion-452612-v4" #@param {type:"string"}
BUCKET_NAME= "dulcet-bastion-452612-v4-driver_ranking_tutorial" #@param {type:"string"} custom
BIGQUERY_DATASET_NAME="feast_driver_ranking_tutorial" #@param {type:"string"} custom
AI_PLATFORM_MODEL_NAME="feast_driver_rankin_jsd_model" #@param {type:"string"}

# Add the --quiet flag to automatically confirm
! gcloud config set project $PROJECT_ID --quiet
%env GOOGLE_CLOUD_PROJECT=$PROJECT_ID
!echo project_id = $PROJECT_ID > ~/.bigqueryrc

Updated property [core/project].
env: GOOGLE_CLOUD_PROJECT=dulcet-bastion-452612-v4
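Because every later step depends on these values, it can help to fail early when one of them is empty. The following sketch is illustrative only: `missing_settings` is a hypothetical helper and the values in `settings` are placeholders, not the ones used in this notebook.

```python
import os


def missing_settings(settings: dict) -> list:
    """Return the names of settings that are unset, empty, or whitespace only."""
    return [name for name, value in settings.items() if not str(value or "").strip()]


# Placeholder values for illustration; substitute your own.
settings = {
    "PROJECT_ID": os.environ.get("GOOGLE_CLOUD_PROJECT", ""),
    "BUCKET_NAME": "my-bucket",
    "BIGQUERY_DATASET_NAME": "feast_driver_ranking_tutorial",
    "AI_PLATFORM_MODEL_NAME": "my_model",
}

problems = missing_settings(settings)
print("Missing settings:", problems or "none")
```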


In [5]:
# Only run if your bucket doesn't already exist!
! gsutil mb gs://$BUCKET_NAME

Creating gs://dulcet-bastion-452612-v4-driver_ranking_tutorial/...
AccessDeniedException: 403 592784929565-compute@developer.gserviceaccount.com does not have storage.buckets.create access to the Google Cloud project. Permission 'storage.buckets.create' denied on resource (or it may not exist).


## Step 4: Apply and deploy feature definitions

`feast apply` scans Python files in the current directory for feature definitions and deploys infrastructure according to `feature_store.yaml`.

In [6]:
!cd feast-driver-ranking-tutorial/driver_ranking && feast apply

No project found in the repository. Using project name driver_ranking defined in feature_store.yaml
Applying changes for project driver_ranking
Traceback (most recent call last):
  File "/home/ssahana1608/feast-venv/bin/feast", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File "/home/ssahana1608/feast-venv/lib/python3.11/site-packages/click/core.py", line 1462, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ssahana1608/feast-venv/lib/python3.11/site-packages/click/core.py", line 1383, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/home/ssahana1608/feast-venv/lib/python3.11/site-packages/click/core.py", line 1850, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ssahana1608/feast-venv/lib/python3.11/site-packages/click/core.py", line 1246, in invoke
    return ctx.invoke(self.callback, **ctx.params)
   

### Inspect the files created under your local folder

In [7]:
# Each `!` command runs in its own subshell, so `cd` and `ls` must be chained on one line.
!cd /content/feast-driver-ranking-tutorial/driver_ranking/data/ && ls -l

/bin/bash: line 1: cd: /content/feast-driver-ranking-tutorial/driver_ranking/data/: No such file or directory
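A `cd` issued with `!` lasts only for that one command, and the `/content` prefix exists only on Colab, which is why the folder was not found here. As an alternative, the folder can be listed from Python with a path relative to the notebook's working directory. This is a sketch under that assumption; `list_files` is a hypothetical helper.

```python
from pathlib import Path


def list_files(folder: Path) -> list:
    """Return the sorted entry names in `folder`, or an empty list if it does not exist."""
    if not folder.is_dir():
        print(f"{folder} not found; check where the repo was cloned")
        return []
    return sorted(p.name for p in folder.iterdir())


# Path relative to the notebook's working directory (an assumption).
data_dir = Path("feast-driver-ranking-tutorial") / "driver_ranking" / "data"
for name in list_files(data_dir):
    print(name)
```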


## Step 5: Train your model

In [8]:
!pip install "numpy<2" "pandas==2.2.2"

Collecting numpy<2
  Downloading numpy-1.26.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.3 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m18.3/18.3 MB[0m [31m33.9 MB/s[0m eta [36m0:00:00[0m00:01[0m00:01[0m
[?25hCollecting pandas==2.2.2
  Downloading pandas-2.2.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.0 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m13.0/13.0 MB[0m [31m38.0 MB/s[0m eta [36m0:00:00[0m00:01[0m00:01[0m
Installing collected packages: numpy, pandas
  Attempting uninstall: numpy
    Found existing installation: numpy 2.3.3
    Uninstalling numpy-2.3.3:
      Successfully uninstalled numpy-2.3.3
  Attempting uninstall: pandas
    Found existing installation: pandas 2.3.3
    Uninstalling pandas-2.3.3:
      Successfully uninstalled pandas-2.3.3
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the s

In [9]:
import feast
from joblib import dump
import pandas as pd
from sklearn.linear_model import LinearRegression

# Load driver order data
orders = pd.read_csv("/content/feast-driver-ranking-tutorial/driver_orders.csv", sep="\t")
orders["event_timestamp"] = pd.to_datetime(orders["event_timestamp"])

# Connect to your feature store provider
fs = feast.FeatureStore(repo_path="/content/feast-driver-ranking-tutorial/driver_ranking")

# Retrieve training data from BigQuery
training_df = fs.get_historical_features(
    entity_df=orders,
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
).to_df()

print("----- Feature schema -----\n")
print(training_df.info())

print()
print("----- Example features -----\n")
print(training_df.head())

# Train model
target = "trip_completed"

reg = LinearRegression()
train_X = training_df[training_df.columns.drop(target).drop("event_timestamp")]
train_Y = training_df.loc[:, target]
reg.fit(train_X[sorted(train_X)], train_Y)

# Save model
dump(reg, "driver_model.bin")

FileNotFoundError: [Errno 2] No such file or directory: '/content/feast-driver-ranking-tutorial/driver_orders.csv'
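The `get_historical_features` call in the training cell performs a point-in-time join: each order is matched with the feature values that were current at the order's `event_timestamp`, so no future data leaks into training. The pure-Python sketch below illustrates the idea on made-up rows. It is a simplification (for example, it ignores the feature view's TTL) and not Feast's implementation; `point_in_time_join` is a hypothetical name.

```python
from datetime import datetime


def point_in_time_join(entity_rows, feature_rows):
    """For each entity row, attach the latest feature row at or before its timestamp."""
    joined = []
    for entity in entity_rows:
        candidates = [
            f for f in feature_rows
            if f["driver_id"] == entity["driver_id"]
            and f["event_timestamp"] <= entity["event_timestamp"]
        ]
        # Pick the most recent candidate, or None when no feature row qualifies.
        latest = max(candidates, key=lambda f: f["event_timestamp"], default=None)
        row = dict(entity)
        row["conv_rate"] = latest["conv_rate"] if latest else None
        joined.append(row)
    return joined


# Made-up rows for illustration only.
orders = [
    {"driver_id": 1001, "event_timestamp": datetime(2021, 4, 12, 10), "trip_completed": 1},
]
features = [
    {"driver_id": 1001, "event_timestamp": datetime(2021, 4, 12, 8), "conv_rate": 0.5},
    {"driver_id": 1001, "event_timestamp": datetime(2021, 4, 12, 9), "conv_rate": 0.7},
    {"driver_id": 1001, "event_timestamp": datetime(2021, 4, 12, 11), "conv_rate": 0.9},
]
print(point_in_time_join(orders, features))
```

The order at 10:00 picks up the 09:00 feature row; the 11:00 row is ignored because it lies in the order's future.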

## Step 6: Materialize your online store
Apply and materialize data to Firestore

In [10]:
!cd /content/feast-driver-ranking-tutorial/driver_ranking/ && feast materialize 2021-01-01T00:00:00 2022-01-01T00:00:00

/bin/bash: line 1: cd: /content/feast-driver-ranking-tutorial/driver_ranking/: No such file or directory
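Conceptually, `feast materialize` copies the most recent feature values within the given time window from the offline store into the online store, where they can be read with low latency at inference time. The sketch below mimics that with a plain dictionary standing in for the online store. The rows are made up, the inclusive boundary handling is an assumption, and `materialize` here is a hypothetical function, not the Feast API.

```python
from datetime import datetime


def materialize(feature_rows, start, end):
    """Build an in-memory 'online store': the latest row per driver within [start, end]."""
    online_store = {}
    for row in feature_rows:
        # Skip rows outside the materialization window.
        if not (start <= row["event_timestamp"] <= end):
            continue
        current = online_store.get(row["driver_id"])
        if current is None or row["event_timestamp"] > current["event_timestamp"]:
            online_store[row["driver_id"]] = row
    return online_store


# Made-up rows for illustration only.
rows = [
    {"driver_id": 1001, "event_timestamp": datetime(2021, 6, 1), "conv_rate": 0.5},
    {"driver_id": 1001, "event_timestamp": datetime(2021, 9, 1), "conv_rate": 0.7},
    {"driver_id": 1001, "event_timestamp": datetime(2022, 3, 1), "conv_rate": 0.9},
    {"driver_id": 1002, "event_timestamp": datetime(2021, 2, 1), "conv_rate": 0.4},
]
store = materialize(rows, datetime(2021, 1, 1), datetime(2022, 1, 1))
for driver_id, row in sorted(store.items()):
    print(driver_id, row["conv_rate"])
```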


## Step 7: Make a prediction

In [11]:
import pandas as pd
import feast
from joblib import load


class DriverRankingModel:
    def __init__(self):
        # Load model
        self.model = load("/content/driver_model.bin")

        # Set up feature store
        self.fs = feast.FeatureStore(repo_path="/content/feast-driver-ranking-tutorial/driver_ranking/")

    def predict(self, driver_ids):
        # Read features from Feast
        driver_features = self.fs.get_online_features(
            entity_rows=[{"driver_id": driver_id} for driver_id in driver_ids],
            features=[
                "driver_hourly_stats:conv_rate",
                "driver_hourly_stats:acc_rate",
                "driver_hourly_stats:avg_daily_trips",
            ],
        )
        df = pd.DataFrame.from_dict(driver_features.to_dict())

        # Make prediction
        df["prediction"] = self.model.predict(df[sorted(df)])

        # Choose best driver
        best_driver_id = df["driver_id"].iloc[df["prediction"].argmax()]

        # return best driver
        return best_driver_id
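
Both the training cell and `predict` select columns with `sorted(...)`. This matters because the fitted model only sees positions, not names, so the feature columns must arrive in the same order at inference time as they did during training. A small standard-library illustration of the idea, with made-up values and a hypothetical `to_vector` helper:

```python
def to_vector(features: dict) -> list:
    """Order feature values by sorted feature name so the layout is deterministic."""
    return [features[name] for name in sorted(features)]


# The same features, delivered in two different key orders.
train_row = {"conv_rate": 0.7, "acc_rate": 0.9, "avg_daily_trips": 120, "driver_id": 1001}
online_row = {"driver_id": 1001, "avg_daily_trips": 120, "acc_rate": 0.9, "conv_rate": 0.7}

# Sorting by name yields the same vector regardless of the original key order.
print(to_vector(train_row) == to_vector(online_row))
```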

In [12]:
def make_drivers_prediction():
    drivers = [1001, 1002, 1003, 1004]
    model = DriverRankingModel()
    best_driver = model.predict(drivers)
    print(f"Prediction for best driver id: {best_driver}")

In [13]:
make_drivers_prediction()

FileNotFoundError: [Errno 2] No such file or directory: '/content/driver_model.bin'
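
Once the model file and online features are available, the final step of `predict` reduces to picking the driver with the highest predicted score. A minimal sketch of that selection without pandas; the scores are made up and `best_driver` is a hypothetical helper:

```python
def best_driver(predictions: dict) -> int:
    """Return the driver id with the highest predicted score."""
    if not predictions:
        raise ValueError("no predictions to rank")
    return max(predictions, key=predictions.get)


# Made-up scores for the four driver ids used above.
scores = {1001: 0.42, 1002: 0.87, 1003: 0.65, 1004: 0.31}
print(f"Prediction for best driver id: {best_driver(scores)}")
```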