# 👩🏻‍🔬 Offline inference pipeline: Computing item embeddings

In this notebook, you will compute the candidate embeddings and upload them to BigQuery, so they can back a Vertex AI feature view with an embedding index.

In [1]:
%load_ext autoreload
%autoreload 2

import warnings

warnings.filterwarnings("ignore")

from loguru import logger
from recsys.config import settings
from recsys.gcp.vertex_ai import model_registry
from recsys.gcp.bigquery import client as bq_client
from recsys.gcp.feature_store import client as fs_client
from recsys.core.embeddings.computation import compute_embeddings
from recsys.gcp.feature_store.datasets import create_training_dataset
from recsys.core.embeddings.preprocessing import preprocess_candidates
from recsys.data.preprocessing.splitting import train_validation_test_split

## ☁️ Connect to Vertex AI Feature Online Store

In [None]:
fos = fs_client.get_client()

In [3]:
trans_fv, articles_fv, customers_fv, _ = fs_client.get_feature_views(fos)

# Computing candidate embeddings

You start by computing candidate embeddings for all items in the training data.

First, you load your candidate model. Recall that you uploaded it to the Vertex AI Model Registry in previous steps:

In [None]:
candidate_model, candidate_features = model_registry.get_model(
    model_name="candidate_tower_v1",
    download_model=True
)

### Get candidates data

Now, you fetch the retrieval training data, which contains all the features required by the candidate embedding model.

In [None]:
training_data = create_training_dataset(trans_fv, articles_fv, customers_fv)

In [None]:
train_df, val_df, test_df, _, _, _ = train_validation_test_split(
    df=training_data,
    validation_size=settings.TWO_TOWER_DATASET_VALIDATION_SPLIT_SIZE,
    test_size=settings.TWO_TOWER_DATASET_TEST_SPLIT_SIZE,
)

In [None]:
train_df.head(3)
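
For intuition, a random three-way split driven by two size fractions can be sketched as follows. This is only an illustration and not the actual `train_validation_test_split` implementation from `recsys`: the helper name `split_rows`, the fixed seed, and the example fractions are all assumptions.

```python
import random


def split_rows(rows, validation_size, test_size, seed=42):
    """Hypothetical helper: shuffle rows, then cut them into train/val/test."""
    if not 0 <= validation_size + test_size < 1:
        raise ValueError("validation_size + test_size must be in [0, 1)")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_val = round(len(shuffled) * validation_size)
    n_test = round(len(shuffled) * test_size)
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]  # everything left over is training data
    return train, val, test


# Example fractions are made up; the notebook reads the real ones from settings.
train, val, test = split_rows(range(100), validation_size=0.1, test_size=0.1)
print(len(train), len(val), len(test))
```

Only the training portion is used in the rest of this notebook, so the embeddings cover the items the retrieval model saw during training.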

### Compute embeddings

Next, you compute the embeddings of all candidate items that were used to train the retrieval model.

In [None]:
item_df = preprocess_candidates(train_df, candidate_features)
item_df.head(3)

In [None]:
embeddings_df = compute_embeddings(item_df, candidate_model)
embeddings_df.head()
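
The two steps above (reduce the training rows to one row per candidate item, then run the candidate model over them) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the code behind `preprocess_candidates` or `compute_embeddings`: the column names `item_id`, `category`, and `embeddings`, the helper names, and `toy_model` (a stand-in for the trained candidate tower) are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Sequence

Row = Dict[str, object]


def unique_candidates(rows: Iterable[Row], id_key: str, features: Sequence[str]) -> List[Row]:
    """Keep the first row per item ID, restricted to the candidate features.

    `features` is assumed to include `id_key` so the ID survives the projection.
    """
    seen = set()
    unique: List[Row] = []
    for row in rows:
        if row[id_key] in seen:
            continue
        seen.add(row[id_key])
        unique.append({name: row[name] for name in features})
    return unique


def embed_in_batches(
    items: Sequence[Row],
    model: Callable[[Sequence[Row]], List[List[float]]],
    id_key: str,
    batch_size: int = 2,
) -> List[Row]:
    """Run the model batch by batch and pair each item ID with its vector."""
    records: List[Row] = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        vectors = model(batch)
        for item, vector in zip(batch, vectors):
            records.append({id_key: item[id_key], "embeddings": list(vector)})
    return records


def toy_model(batch: Sequence[Row]) -> List[List[float]]:
    """Stand-in for the candidate tower: returns made-up 2-d vectors."""
    return [[float(len(str(item["category"]))), 1.0] for item in batch]


# Made-up transactions: item 10 appears twice but should be embedded once.
transactions = [
    {"customer_id": "c1", "item_id": 10, "category": "socks"},
    {"customer_id": "c2", "item_id": 10, "category": "socks"},
    {"customer_id": "c1", "item_id": 11, "category": "jackets"},
    {"customer_id": "c3", "item_id": 12, "category": "hats"},
]
candidates = unique_candidates(transactions, "item_id", ["item_id", "category"])
records = embed_in_batches(candidates, toy_model, "item_id")
for record in records:
    print(record)
```

Deduplicating before inference matters because the training data has one row per transaction, while the index needs exactly one vector per item.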

# <span style="color:#ff5f27">Create Vertex AI Embedding Index </span>

Now you are ready to create a feature group for your candidate embeddings.

To begin with, you need to create your embedding index, where you specify the name of the embeddings feature and the embedding dimension (the length of each vector).
Then you attach this index to the feature view (FV).
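
To make the two index parameters concrete, here is a small sketch that derives them from a set of embedding records and then runs a brute-force similarity search, which is the exact result an approximate vector index tries to reproduce. It does not use the Vertex AI API: `EmbeddingIndexSpec`, `infer_spec`, `nearest`, and the `embeddings` column name are hypothetical names chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class EmbeddingIndexSpec:
    """Hypothetical index description: which column, and how many dimensions."""
    embedding_column: str
    dimension: int


def infer_spec(records: Sequence[Dict], embedding_column: str) -> EmbeddingIndexSpec:
    """Check that every vector has the same length and return that length."""
    dims = {len(record[embedding_column]) for record in records}
    if len(dims) != 1:
        raise ValueError(f"expected one embedding length, found {sorted(dims)}")
    return EmbeddingIndexSpec(embedding_column, dims.pop())


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; defined as 0.0 when either vector has zero norm."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def nearest(records: Sequence[Dict], query: Sequence[float],
            spec: EmbeddingIndexSpec, k: int = 1) -> List[Dict]:
    """Brute-force top-k search by cosine similarity."""
    if len(query) != spec.dimension:
        raise ValueError(f"query has length {len(query)}, index expects {spec.dimension}")
    ranked = sorted(records,
                    key=lambda r: cosine(r[spec.embedding_column], query),
                    reverse=True)
    return ranked[:k]


# Made-up 2-d embeddings for two items.
example_records = [
    {"item_id": 1, "embeddings": [1.0, 0.0]},
    {"item_id": 2, "embeddings": [0.0, 1.0]},
]
spec = infer_spec(example_records, "embeddings")
best = nearest(example_records, [0.9, 0.1], spec)
print(spec, best[0]["item_id"])
```

Checking the dimension before uploading is cheap and catches mismatches between the model that produced the vectors and the index configuration early.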

In [None]:
logger.info("Uploading 'candidates' Feature to BigQuery.")
bq_client.load_features(candidates_df=embeddings_df)
logger.info("✅ Uploaded 'candidates' Feature to BigQuery!")