![tracker](https://us-central1-vertex-ai-mlops-369716.cloudfunctions.net/pixel-tracking?path=statmike%2Fvertex-ai-mlops%2FMLOps%2FFeature+Store&file=Feature+Store+-+Embeddings.ipynb)
<!--- header table --->
<table align="left">
  <td style="text-align: center">
    <a href="https://github.com/statmike/vertex-ai-mlops/blob/main/MLOps/Feature%20Store/Feature%20Store%20-%20Embeddings.ipynb">
      <img width="32px" src="https://www.svgrepo.com/download/217753/github.svg" alt="GitHub logo">
      <br>View on<br>GitHub
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/statmike/vertex-ai-mlops/blob/main/MLOps/Feature%20Store/Feature%20Store%20-%20Embeddings.ipynb">
      <img width="32px" src="https://www.gstatic.com/pantheon/images/bigquery/welcome_page/colab-logo.svg" alt="Google Colaboratory logo">
      <br>Run in<br>Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https%3A%2F%2Fraw.githubusercontent.com%2Fstatmike%2Fvertex-ai-mlops%2Fmain%2FMLOps%2FFeature%2520Store%2FFeature%2520Store%2520-%2520Embeddings.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo">
      <br>Run in<br>Colab Enterprise
    </a>
  </td>      
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/bigquery/import?url=https://github.com/statmike/vertex-ai-mlops/blob/main/MLOps/Feature%20Store/Feature%20Store%20-%20Embeddings.ipynb">
      <img width="32px" src="https://www.gstatic.com/images/branding/gcpiconscolors/bigquery/v1/32px.svg" alt="BigQuery logo">
      <br>Open in<br>BigQuery Studio
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/statmike/vertex-ai-mlops/main/MLOps/Feature%20Store/Feature%20Store%20-%20Embeddings.ipynb">
      <img width="32px" src="https://www.gstatic.com/images/branding/gcpiconscolors/vertexai/v1/32px.svg" alt="Vertex AI logo">
      <br>Open in<br>Vertex AI Workbench
    </a>
  </td>
</table>

---

**File Move Notices**

This file moved locations:
- On 09/08/2024 (mm/dd/yyyy)
	- From: `Feature Store/Feature Store - Embeddings.ipynb`
	- To: `MLOps/Feature Store/Feature Store - Embeddings.ipynb`

---
<!---end of move notices--->

# Feature Store - Embeddings

[Vertex AI Feature Store](https://cloud.google.com/vertex-ai/docs/featurestore/latest/overview) is built for managing ML features for both training and serving. For a complete, workflow-based overview of Feature Store, first review [Feature Store](./Feature%20Store.ipynb).

Embeddings are vectors that condense information from a larger space. They can be as simple as converting categorical data to one-hot encodings, or addresses to lat/long (for locations, consider an ECEF projection). With the rise of large language models and search applications, there are many new text embedding models that convert text into embeddings representing the semantic meaning of the text in a structured form.
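To illustrate the ECEF suggestion: raw latitude/longitude values jump at ±180° longitude and distort distances near the poles, while Earth-Centered, Earth-Fixed (ECEF) coordinates place every location in one 3D Cartesian space. The sketch below uses the standard WGS84 ellipsoid constants; the function name `latlon_to_ecef` is hypothetical and not part of any Feature Store API.

```python
import math

# WGS84 ellipsoid constants
WGS84_A = 6378137.0                  # semi-major axis in meters
WGS84_F = 1 / 298.257223563          # flattening
WGS84_E2 = WGS84_F * (2 - WGS84_F)   # first eccentricity squared


def latlon_to_ecef(lat_deg: float, lon_deg: float, alt_m: float = 0.0) -> tuple[float, float, float]:
    """Convert geodetic latitude/longitude/altitude to ECEF (x, y, z) in meters."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    # prime vertical radius of curvature at this latitude
    n = WGS84_A / math.sqrt(1 - WGS84_E2 * math.sin(lat) ** 2)
    x = (n + alt_m) * math.cos(lat) * math.cos(lon)
    y = (n + alt_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1 - WGS84_E2) + alt_m) * math.sin(lat)
    return x, y, z


# Longitude -180 and 180 are the same place; ECEF maps them to the same point
print(latlon_to_ecef(0.0, 180.0))
print(latlon_to_ecef(0.0, -180.0))
```

The three ECEF values can then be stored as features, and Euclidean distance between them is the straight-line distance through the Earth.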

Embeddings are features: arrays of floats. Feature Store can retrieve matching entities for a query embedding, and it can also retrieve matching entities for a query entity ID. [Reference](https://cloud.google.com/vertex-ai/docs/featurestore/latest/embeddings-search) 
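The two query modes can be sketched in memory. This is a hypothetical illustration, not the Feature Store implementation: the `store` contents are made up, similarity is a plain dot product (matching the distance measure this notebook configures later), and excluding the query entity from its own results is a choice made for this sketch.

```python
import numpy as np

# Hypothetical in-memory "feature view": entity id -> embedding (values are made up)
store = {
    'a': np.array([1.0, 0.0]),
    'b': np.array([0.8, 0.6]),
    'c': np.array([0.0, 1.0]),
}


def search_by_embedding(query, k, exclude=None):
    """Return the k entity ids whose embeddings have the largest dot product with query."""
    scores = [
        (entity_id, float(np.dot(query, emb)))
        for entity_id, emb in store.items()
        if entity_id != exclude
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [entity_id for entity_id, _ in scores[:k]]


def search_by_entity(entity_id, k):
    """Look up the entity's stored embedding, then search with it (excluding the entity itself)."""
    return search_by_embedding(store[entity_id], k, exclude=entity_id)


print(search_by_embedding(np.array([1.0, 0.0]), 2))
print(search_by_entity('c', 2))
```

Searching by entity ID is just a lookup of the stored embedding followed by a search by embedding, which is why both modes can share one index.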

This workflow expands on the previous [feature store overview](./Feature%20Store.ipynb) by adding a feature view with an embedding feature. The embedding was created by training an autoencoder on the tabular data and using its latent (encoding) layer as the embedding. Review and run the prerequisite notebook [BQML Autoencoder As Table Embedding](../../../Working%20With/Embeddings/BQML%20Autoencoder%20As%20Table%20Embedding.ipynb).

**Prerequisites:**
-  [01 - BigQuery - Table Data Source](../../01%20-%20Data%20Sources/01%20-%20BigQuery%20-%20Table%20Data%20Source.ipynb)
    - the data source for the examples
- [Feature Store](./Feature%20Store.ipynb)
    - while not required, this notebook covers Feature Store in detail and sets up an instance that will be reused by this notebook
- [BQML Autoencoder As Table Embedding](../../../Working%20With/Embeddings/BQML%20Autoencoder%20As%20Table%20Embedding.ipynb)
    - create embeddings for rows of data in the data source
    
**References:**
- [Feature Store Documentation](https://cloud.google.com/vertex-ai/docs/featurestore/latest/overview)
- [Manage Embeddings in Feature Store](https://cloud.google.com/vertex-ai/docs/featurestore/latest/embeddings-search)


---
## Colab Setup

When running this notebook in [Colab](https://colab.google/) or [Colab Enterprise](https://cloud.google.com/colab/docs/introduction), this section will authenticate to GCP (follow prompts in the popup) and set the current project for the session.

In [1]:
PROJECT_ID = 'statmike-mlops-349915' # replace with project ID

In [2]:
try:
    from google.colab import auth
    auth.authenticate_user(project_id = PROJECT_ID)
    print('Colab authorized to GCP')
except Exception:
    print('Not a Colab Environment')
    pass

Not a Colab Environment


---
## Installs

The list `packages` contains tuples of an import name, an install name, and optionally a version. If the import name is not found, the install name is used to install the package quietly for the current user.

In [3]:
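# tuples of (import name, install name, optional version)
packages = [
    ('google.cloud.aiplatform', 'google-cloud-aiplatform', '1.62.0'),
    ('google.cloud.bigquery', 'google-cloud-bigquery'),
    ('bigframes', 'bigframes'),
]

import importlib.util

def is_installed(import_name):
    try:
        return importlib.util.find_spec(import_name) is not None
    except ModuleNotFoundError:
        # find_spec raises when a parent package (for example `google.cloud`) is missing
        return False

install = False
for package in packages:
    if not is_installed(package[0]):
        # pin the version when a third element is provided
        install_name = f'{package[1]}=={package[2]}' if len(package) > 2 else package[1]
        print(f'installing package {install_name}')
        install = True
        !pip install {install_name} -U -q --user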

### Restart Kernel (If Installs Occurred)

After a kernel restart, continue running the notebook from the cell after this one.

In [4]:
if install:
    import IPython
    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)

---
## Setup

inputs:

In [5]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'statmike-mlops-349915'

In [6]:
REGION = 'us-central1'
EXPERIMENT = 'embeddings'
SERIES = 'feature-store'

# source data
BQ_PROJECT = PROJECT_ID
BQ_DATASET = 'fraud'
BQ_TABLE = 'fraud_prepped_embedding'

packages:

In [7]:
from google.cloud import aiplatform
import time
import numpy as np
import asyncio
from google.cloud import bigquery
import bigframes.pandas as bpd

clients:

In [8]:
aiplatform.init(project = PROJECT_ID, location = REGION)
bq = bigquery.Client(project = PROJECT_ID)
#bpd.options.bigquery.project = PROJECT_ID

---
## Review Source Data

The data source here was prepared in [01 - BigQuery - Table Data Source](../../01%20-%20Data%20Sources/01%20-%20BigQuery%20-%20Table%20Data%20Source.ipynb).

This is a table of 284,807 credit card transactions classified as fraudulent or normal in the column `Class`. To protect confidentiality, the original features have been transformed using [principal component analysis (PCA)](https://en.wikipedia.org/wiki/Principal_component_analysis) into 28 features named `V1, V2, ... V28` (float). Two descriptive features are provided without transformation by PCA:
- `Time` (integer) is the seconds elapsed between the transaction and the earliest transaction in the table
- `Amount` (float) is the value of the transaction

The data preparation added a `splits` column for machine learning, with 80% of rows for training (`TRAIN`), 10% for validation (`VALIDATE`), and 10% for testing (`TEST`). A unique identifier, `transaction_id`, was also added to each transaction.  

The additional workflow ["BQML Autoencoder As Table Embedding"](../../Applied%20GenAI/Embeddings/BQML%20Autoencoder%20As%20Table%20Embedding.ipynb) augmented the data with an embedding, built by training an autoencoder and using the latent space from the encoder.


### Connect to the BigQuery table:

In [9]:
source_data = bpd.read_gbq(f'{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}')

### Review a sample of the data:

In [10]:
source_data.head()

| | transaction_id | Time | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | ... | V23 | V24 | V25 | V26 | V27 | V28 | Amount | Class | splits | embedding |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 51698445-2df7-4b30-8c05-661d0f934411 | 87453 | -0.913532 | 1.488683 | -0.055828 | -1.073554 | 0.641097 | 0.668031 | -0.346768 | -3.139927 | ... | -0.239166 | 0.189976 | 1.906899 | 0.629313 | 0.115069 | 0.146527 | 35.0 | 0 | TRAIN | [0.5977821652479549, 0.0, 0.36559913211568446,... |
| 1 | b47162eb-0720-4b5d-9dfb-18d239a2b35a | 163504 | -0.642694 | 0.657615 | -1.219919 | -0.536793 | 0.982125 | 0.7905 | 0.913977 | 0.67696 | ... | 0.334306 | -1.562848 | -1.040489 | -0.165496 | 0.16578 | 0.210512 | 145.9 | 0 | TRAIN | [0.4127544656891617, 0.0, 0.25944730398086335,... |
| 2 | 2fba541e-a9ff-4946-8b57-0fd235fb40bd | 37367 | 1.016735 | 0.644522 | 1.346616 | 3.574517 | -0.116084 | 0.695501 | -0.313215 | 0.27986 | ... | -0.009286 | 0.237722 | 0.304148 | 0.088105 | 0.028592 | 0.023834 | 6.93 | 0 | TRAIN | [0.48196315087096875, 0.0, 0.39905086264143763... |
| 3 | f470444d-a2af-4de6-b08f-df37ef7d2315 | 140817 | 1.973879 | -0.509845 | -1.227686 | 0.081128 | -0.208654 | -0.508661 | -0.163037 | -0.005953 | ... | 0.139703 | -0.525682 | -0.102873 | -0.07585 | -0.046174 | -0.069476 | 34.66 | 0 | TEST | [0.15399287512902307, 0.0, 0.5188780427569708,... |
| 4 | b76e73da-dee3-44bd-a7b5-d5be17214ef0 | 80660 | 1.31612 | 0.428729 | -0.014443 | 0.485672 | 0.150964 | -0.569457 | 0.160334 | -0.227616 | ... | 0.009039 | -0.471929 | 0.372731 | 0.142581 | -0.01537 | 0.025051 | 1.79 | 0 | TEST | [0.15605789043098536, 0.0, 0.6220855584587272,... |


In [11]:
source_data.dtypes

transaction_id                         string
Time                                    Int64
V1                                    Float64
V2                                    Float64
V3                                    Float64
V4                                    Float64
V5                                    Float64
V6                                    Float64
V7                                    Float64
V8                                    Float64
V9                                    Float64
V10                                   Float64
V11                                   Float64
V12                                   Float64
V13                                   Float64
V14                                   Float64
V15                                   Float64
V16                                   Float64
V17                                   Float64
V18                                   Float64
V19                                   Float64
V20                               

### Get Row Level IDs

Grab a short list of `transaction_id` values from the source data to use in examples and testing throughout this workflow.

In [12]:
transaction_ids = list(source_data['transaction_id'].head(10))

In [13]:
transaction_ids

['51698445-2df7-4b30-8c05-661d0f934411',
 'b47162eb-0720-4b5d-9dfb-18d239a2b35a',
 '2fba541e-a9ff-4946-8b57-0fd235fb40bd',
 'f470444d-a2af-4de6-b08f-df37ef7d2315',
 'b76e73da-dee3-44bd-a7b5-d5be17214ef0',
 'e8a3d7f9-be9d-4117-832e-ab889ee0f52e',
 '9dc134af-0eb0-48b4-b468-41235c4383dd',
 '446f0f5f-a2c5-46f7-ba6b-5536a5b431b1',
 '563d577a-5021-465f-8469-1b7375295ddf',
 'c96ae226-c169-4bca-8a03-d9117771e05b']

---
## Feature Store For Vector Search

Vertex AI Feature Store can also be used for vector matching and retrieval. To enable this capability:

- Use an [optimized online serving](https://cloud.google.com/vertex-ai/docs/featurestore/latest/online-serving-types) instance
- Create a feature view with a vector search (index) configuration

### Feature Online Store Admin Client

Used to create online stores and feature views

In [14]:
online_admin_client = aiplatform.gapic.FeatureOnlineStoreAdminServiceClient(client_options = dict(api_endpoint = f'{REGION}-aiplatform.googleapis.com'))

### Create/Retrieve Online Store

**NOTE:** This can take around 10 minutes if creating a new feature store instance

**Reference:**
- [Online Serving Types](https://cloud.google.com/vertex-ai/docs/featurestore/latest/online-serving-types)
- [Create an Online Store Instance](https://cloud.google.com/vertex-ai/docs/featurestore/latest/create-onlinestore)

In [15]:
FEATURE_ONLINE_STORE_NAME = 'featurestore_optimized'

In [16]:
try:
    online_store = online_admin_client.get_feature_online_store(name = f'projects/{PROJECT_ID}/locations/{REGION}/featureOnlineStores/{FEATURE_ONLINE_STORE_NAME}')
except Exception:
    create_online_store = online_admin_client.create_feature_online_store(
        request = aiplatform.gapic.CreateFeatureOnlineStoreRequest(
            parent = f'projects/{PROJECT_ID}/locations/{REGION}',
            feature_online_store_id = FEATURE_ONLINE_STORE_NAME,
            feature_online_store = aiplatform.gapic.FeatureOnlineStore(
                optimized = aiplatform.gapic.FeatureOnlineStore.Optimized()
            )
        )
    )
    online_store = create_online_store.result()
    
online_store.name

'projects/1026793852137/locations/us-central1/featureOnlineStores/featurestore_optimized'

In [17]:
print(f'Review in the console:\n\nhttps://console.cloud.google.com/vertex-ai/locations/{REGION}/online-stores/{FEATURE_ONLINE_STORE_NAME}?project={PROJECT_ID}')

Review in the console:

https://console.cloud.google.com/vertex-ai/locations/us-central1/online-stores/featurestore_optimized?project=statmike-mlops-349915


### Create Feature View: From BigQuery Source

Create a feature view directly from the BigQuery table that contains the `embedding` column (`BQ_TABLE`).

**Reference:**
- [Create a feature view from a BigQuery source](https://cloud.google.com/vertex-ai/docs/featurestore/latest/create-featureview#create_from_bq)
- API Link for [`aiplatform.gapic.FeatureView.IndexConfig()`](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform_v1.types.FeatureView.IndexConfig)

In [18]:
BQ_FEATURE_VIEW_NAME = 'bq_embedding_view'

In [19]:
try:
    bq_view = online_admin_client.get_feature_view(name = f'{online_store.name}/featureViews/{BQ_FEATURE_VIEW_NAME}')
except Exception:
    create_bq_view = online_admin_client.create_feature_view(
        request = aiplatform.gapic.CreateFeatureViewRequest(
            parent = online_store.name,
            feature_view_id = BQ_FEATURE_VIEW_NAME,
            feature_view = aiplatform.gapic.FeatureView(
                big_query_source = aiplatform.gapic.FeatureView.BigQuerySource(
                    uri = f'bq://{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}',
                    entity_id_columns = ['transaction_id']
                ),
                sync_config = aiplatform.gapic.FeatureView.SyncConfig(cron = 'TZ=America/New_York 10 * * * *'),
                index_config = aiplatform.gapic.FeatureView.IndexConfig(
                    embedding_column = 'embedding',
                    embedding_dimension = 8,
                    filter_columns = ['splits'],
                    crowding_column = 'Class',
                    tree_ah_config = aiplatform.gapic.FeatureView.IndexConfig.TreeAHConfig(
                        leaf_node_embedding_count = 500
                    ),
                    distance_measure_type = aiplatform.gapic.FeatureView.IndexConfig.DistanceMeasureType.DOT_PRODUCT_DISTANCE  # enum value 3
                )
            ),
            run_sync_immediately = True
        )
    )
    bq_view = create_bq_view.result()
    
bq_view.name

'projects/1026793852137/locations/us-central1/featureOnlineStores/featurestore_optimized/featureViews/bq_embedding_view'

In [20]:
print(f'Review in the console:\n\nhttps://console.cloud.google.com/vertex-ai/locations/{REGION}/online-stores/{FEATURE_ONLINE_STORE_NAME}/feature-views/{BQ_FEATURE_VIEW_NAME}?project={PROJECT_ID}')

Review in the console:

https://console.cloud.google.com/vertex-ai/locations/us-central1/online-stores/featurestore_optimized/feature-views/bq_embedding_view?project=statmike-mlops-349915


### Start/Get Sync Manually: BQ View

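Because the feature view was created with `run_sync_immediately = True`, a sync is already running. Retrieve the sync for the BigQuery-sourced feature view (the last entry returned by `list_feature_view_syncs`) and wait for it to finish. To start a new sync manually, call `online_admin_client.sync_feature_view(feature_view = bq_view.name)`.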

**References:**
- [Sync feature data to online store](https://cloud.google.com/vertex-ai/docs/featurestore/latest/sync-data)
- [List sync operations](https://cloud.google.com/vertex-ai/docs/featurestore/latest/list-data-syncs)

In [21]:
bq_sync = list(online_admin_client.list_feature_view_syncs(parent = bq_view.name))[-1]

In [22]:
bq_sync.name

'projects/1026793852137/locations/us-central1/featureOnlineStores/featurestore_optimized/featureViews/bq_embedding_view/featureViewSyncs/7924924894091935744'

In [23]:
while True:
    feature_view_sync = online_admin_client.get_feature_view_sync(name = bq_sync.name)
    if feature_view_sync.run_time.end_time.seconds > 0:
        status = feature_view_sync.final_status.code
        break
    else:
        print('waiting for 20 seconds...')
    time.sleep(20)
    
if status == 0: print('Succeeded!')
else: print('Failed!')

Succeeded!
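The polling loop above waits forever if a sync hangs. A minimal sketch of a reusable helper with a timeout follows; `wait_for_sync` and its parameters are hypothetical, and it only assumes the client behaves as in the loop above (`get_feature_view_sync(name = ...)` returning `run_time.end_time.seconds`, which is 0 while running, and `final_status.code`, which is 0 on success).

```python
import time


def wait_for_sync(client, sync_name, poll_seconds=20, timeout_seconds=3600):
    """Poll a feature view sync until it finishes; return True if it succeeded.

    Raises TimeoutError if the sync is still running after timeout_seconds.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        sync = client.get_feature_view_sync(name=sync_name)
        if sync.run_time.end_time.seconds > 0:
            # a non-zero end time means the sync finished; code 0 means OK
            return sync.final_status.code == 0
        if time.monotonic() >= deadline:
            raise TimeoutError(f'sync {sync_name} still running after {timeout_seconds} seconds')
        print(f'waiting for {poll_seconds} seconds...')
        time.sleep(poll_seconds)
```

Usage would look like `print('Succeeded!' if wait_for_sync(online_admin_client, bq_sync.name) else 'Failed!')`, and the same helper serves the brute-force view later.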


In [24]:
online_admin_client.list_feature_view_syncs(
    request = dict(
        parent = bq_view.name,
        page_size = 1,
        #filter = f'create_time > "{(datetime.now() - timedelta(hours = 9)).strftime("%Y-%m-%dT%X")}"'
    )
)

ListFeatureViewSyncsPager<feature_view_syncs {
  name: "projects/1026793852137/locations/us-central1/featureOnlineStores/featurestore_optimized/featureViews/bq_embedding_view/featureViewSyncs/7924924894091935744"
  create_time {
    seconds: 1714825378
    nanos: 64941000
  }
  final_status {
  }
  run_time {
    start_time {
      seconds: 1714825378
      nanos: 64941000
    }
    end_time {
      seconds: 1714826447
      nanos: 124028000
    }
  }
}
>

In [25]:
print(f'Review in the console:\n\nhttps://console.cloud.google.com/vertex-ai/locations/{REGION}/online-stores/{FEATURE_ONLINE_STORE_NAME}/feature-views/{BQ_FEATURE_VIEW_NAME}?project={PROJECT_ID}')

Review in the console:

https://console.cloud.google.com/vertex-ai/locations/us-central1/online-stores/featurestore_optimized/feature-views/bq_embedding_view?project=statmike-mlops-349915


---
## Online Serving From Feature Views

The feature view holds both the embeddings and the other features. This section shows several ways to retrieve data: the features for an entity, nearest-neighbor matches for an entity with and without their full features, and matches for a query embedding. Searches are also shown with filters and crowding.
    
**References:**
- [Serve feature values](https://cloud.google.com/vertex-ai/docs/featurestore/latest/serve-feature-values)
- **NEW SDK**:
    - https://github.com/googleapis/python-aiplatform/blob/main/vertexai/resources/preview/feature_store/feature_view.py

In [26]:
transaction_ids

['51698445-2df7-4b30-8c05-661d0f934411',
 'b47162eb-0720-4b5d-9dfb-18d239a2b35a',
 '2fba541e-a9ff-4946-8b57-0fd235fb40bd',
 'f470444d-a2af-4de6-b08f-df37ef7d2315',
 'b76e73da-dee3-44bd-a7b5-d5be17214ef0',
 'e8a3d7f9-be9d-4117-832e-ab889ee0f52e',
 '9dc134af-0eb0-48b4-b468-41235c4383dd',
 '446f0f5f-a2c5-46f7-ba6b-5536a5b431b1',
 '563d577a-5021-465f-8469-1b7375295ddf',
 'c96ae226-c169-4bca-8a03-d9117771e05b']

### Setup New SDK

In [27]:
from vertexai.resources.preview.feature_store.feature_view import FeatureView

In [28]:
fsv = FeatureView(name = bq_view.name)

### Retrieve Features For An Entity

**NOTE:** The embedding is also retrieved.

In [29]:
fsv.read(key = transaction_ids[0:1]).to_dict()

Public endpoint for the optimized online store featurestore_optimized is 1762965740069060608.us-central1-1026793852137.featurestore.vertexai.goog


{'features': [{'name': 'Time', 'value': {'int64_value': '87453'}},
  {'name': 'V1', 'value': {'double_value': -0.9135316526409949}},
  {'name': 'V2', 'value': {'double_value': 1.48868345547945}},
  {'name': 'V3', 'value': {'double_value': -0.0558275127733335}},
  {'name': 'V4', 'value': {'double_value': -1.0735543425487502}},
  {'name': 'V5', 'value': {'double_value': 0.641097083307115}},
  {'name': 'V6', 'value': {'double_value': 0.668030764339843}},
  {'name': 'V7', 'value': {'double_value': -0.346768028815742}},
  {'name': 'V8', 'value': {'double_value': -3.13992719016926}},
  {'name': 'V9', 'value': {'double_value': -0.251660311329974}},
  {'name': 'V10', 'value': {'double_value': -1.1054730243276598}},
  {'name': 'V11', 'value': {'double_value': 0.860519358053083}},
  {'name': 'V12', 'value': {'double_value': 0.965772150844481}},
  {'name': 'V13', 'value': {'double_value': 0.502979347618886}},
  {'name': 'V14', 'value': {'double_value': -1.0579731390941}},
  {'name': 'V15', 'value
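The `to_dict()` output nests each value under a type key (`int64_value`, `double_value`, ...). A small helper can flatten it into `{name: value}`. This is a sketch that assumes each `value` dict holds at most one typed entry, as in the output above; `flatten_features` is a hypothetical name, and `int64_value` is converted from string back to `int` because it is returned as a string.

```python
def flatten_features(read_dict):
    """Turn {'features': [{'name': ..., 'value': {<type>: <v>}}, ...]} into {name: v}."""
    flat = {}
    for feature in read_dict.get('features', []):
        typed = feature.get('value', {})
        if not typed:
            flat[feature['name']] = None  # no value stored for this feature
            continue
        value_type, value = next(iter(typed.items()))
        # int64 values arrive as strings in the dict form of the response
        flat[feature['name']] = int(value) if value_type == 'int64_value' else value
    return flat


sample = {'features': [
    {'name': 'Time', 'value': {'int64_value': '87453'}},
    {'name': 'V1', 'value': {'double_value': -0.9135316526409949}},
]}
print(flatten_features(sample))
```

With the live view this would be `flatten_features(fsv.read(key = transaction_ids[0:1]).to_dict())`.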

### Search For Matches Based On Entity

In [31]:
fsv.search(
    entity_id = transaction_ids[0],
    neighbor_count = 5,
    return_full_entity = False
)

SearchNearestEntitiesResponse(_response=nearest_neighbors {
  neighbors {
    entity_id: "6410b2ea-d053-4b31-b3a9-9ffdf15934c1"
    distance: -0.9982097148895264
  }
  neighbors {
    entity_id: "628e0548-55c8-4491-94db-e3a4fb168ab7"
    distance: -0.9976215362548828
  }
  neighbors {
    entity_id: "8ab474f0-ca7f-42f9-bfdd-e7140446ba4a"
    distance: -0.9963568449020386
  }
  neighbors {
    entity_id: "2ca52b14-1de9-4242-bf3a-959834bcc605"
    distance: -0.9961671829223633
  }
  neighbors {
    entity_id: "f56fd7cd-714b-4b1b-af28-13cec7c60745"
    distance: -0.9953159093856812
  }
}
)
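The distances above are negative because the view was created with distance measure type 3, which in the `FeatureView.IndexConfig.DistanceMeasureType` enum is `DOT_PRODUCT_DISTANCE`; Vertex AI Vector Search defines this as the negative of the dot product, so smaller (more negative) means more similar. A minimal sketch of that ordering with made-up vectors:

```python
import numpy as np


def dot_product_distance(a, b):
    """Negative dot product: more similar vectors get a smaller (more negative) distance."""
    return -float(np.dot(a, b))


query = np.array([0.6, 0.8])
candidates = {
    'near': np.array([0.8, 0.6]),
    'far': np.array([1.0, 0.0]),
    'opposite': np.array([-0.6, -0.8]),
}
# sort ascending by distance, which puts the most similar candidate first
ranked = sorted(candidates, key=lambda name: dot_product_distance(query, candidates[name]))
print(ranked)
```

Results are therefore listed from the most negative distance upward, as in the response above.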

### Search For Matches: Return All Features

In [32]:
fsv.search(
    entity_id = transaction_ids[0],
    neighbor_count = 1,
    return_full_entity = True
).to_dict()

{'neighbors': [{'entity_id': '6410b2ea-d053-4b31-b3a9-9ffdf15934c1',
   'distance': -0.9982097148895264,
   'entity_key_values': {'key_values': {'features': [{'name': 'Time',
       'value': {'int64_value': '41012'}},
      {'name': 'V1', 'value': {'double_value': -5.5127138509665095}},
      {'name': 'V2', 'value': {'double_value': -5.6601811649875495}},
      {'name': 'V3', 'value': {'double_value': 1.2347485263843898}},
      {'name': 'V4', 'value': {'double_value': 0.11835837075961302}},
      {'name': 'V5', 'value': {'double_value': 3.69773525010983}},
      {'name': 'V6', 'value': {'double_value': -2.5178590657829703}},
      {'name': 'V7', 'value': {'double_value': -2.78345734820181}},
      {'name': 'V8', 'value': {'double_value': 1.19022483423165}},
      {'name': 'V9', 'value': {'double_value': 0.548229655299798}},
      {'name': 'V10', 'value': {'double_value': -1.55683141932904}},
      {'name': 'V11', 'value': {'double_value': 1.18848189256695}},
      {'name': 'V12', 'val

### Search For Matches Based On Embedding

In [33]:
fsv.search(
    embedding_value = [1] * 8,
    neighbor_count = 5,
    return_full_entity = False
).to_dict()

{'neighbors': [{'entity_id': 'e24b08b8-544e-4063-9d47-ae6b9fcb5b73',
   'distance': -2.640953779220581},
  {'entity_id': 'c1581ddc-0e0e-4aa7-ad1e-6e703fce006e',
   'distance': -2.640927314758301},
  {'entity_id': 'bc1016f2-0086-4659-8069-45aba2f515d2',
   'distance': -2.6408705711364746},
  {'entity_id': '35c0d534-ba6a-479c-96bf-4288333f62cb',
   'distance': -2.6404473781585693},
  {'entity_id': '36e4743b-8680-49f2-b4c7-674944cb7b7c',
   'distance': -2.640377998352051}]}
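With a dot-product distance, the all-ones query `[1] * 8` scores every entity by the negative sum of its embedding components, so the top matches are simply the entities with the largest component sums. The sketch below checks this for the first transaction, using its embedding as printed in the brute-force section of this notebook; it scores about -2.41, which is why it does not appear among the matches near -2.64.

```python
import numpy as np

# embedding of transaction 51698445-2df7-4b30-8c05-661d0f934411, copied from this notebook
embedding = np.array([0.5977821652479549, 0.0, 0.36559913211568446, 0.1804625550232114,
                      0.11810490530542082, 0.3456272434200685, 0.3010381280111156,
                      0.5023903951852832])
ones = np.ones(8)

# DOT_PRODUCT_DISTANCE = -(query . embedding); for an all-ones query that is -sum(embedding)
distance_to_ones = -float(np.dot(ones, embedding))
print(round(distance_to_ones, 4))
```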

### Search For Matches: With Filter

Use allow and deny tokens to subset the search.  In this case the data had a `splits` column with values 'TRAIN', 'TEST', 'VALIDATE'.  This search will look for matches in the 'TEST' and 'VALIDATE' entities and not the 'TRAIN' entities.
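The allow/deny logic can be illustrated in plain Python. This is a hypothetical sketch of the usual Vector Search semantics (keep an entity if it matches at least one allow token, when allow tokens are given, and none of the deny tokens), not the service code; `passes_string_filter` and the sample IDs are made up.

```python
def passes_string_filter(entity_tokens, allow_tokens=(), deny_tokens=()):
    """Illustrative allow/deny check for one filter column (for example `splits`)."""
    tokens = set(entity_tokens)
    if allow_tokens and not tokens & set(allow_tokens):
        return False  # no allow token matched
    return not tokens & set(deny_tokens)  # reject if any deny token matched


splits = {'t1': 'TRAIN', 't2': 'TEST', 't3': 'VALIDATE'}
kept = [eid for eid, split in splits.items()
        if passes_string_filter([split], allow_tokens=['TEST', 'VALIDATE'], deny_tokens=['TRAIN'])]
print(kept)
```

Because each row has exactly one `splits` value, the allow list alone already excludes `TRAIN`; the deny token is redundant here but harmless.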

In [34]:
fsv.search(
    entity_id = transaction_ids[0],
    neighbor_count = 5,
    return_full_entity = False,
    string_filters = [
        aiplatform.gapic.NearestNeighborQuery.StringFilter(
            name = 'splits',
            allow_tokens = ['TEST', 'VALIDATE'],
            deny_tokens = ['TRAIN']
        )
    ]
).to_dict()

{'neighbors': [{'entity_id': '4ffd1d9a-209e-42d8-9811-324fca5287a1',
   'distance': -0.9949505925178528},
  {'entity_id': '7a3590c5-3a07-466b-b0ef-43ef77dee7e8',
   'distance': -0.9934127926826477},
  {'entity_id': 'fbd44e94-86b7-4c8a-a0ae-72a6e4c403e0',
   'distance': -0.9924111366271973},
  {'entity_id': '5922c640-850e-4bd4-87d4-dd1e1cc00fcb',
   'distance': -0.9923803806304932},
  {'entity_id': '61410583-131a-4930-8b82-d005a821bc79',
   'distance': -0.9911936521530151},
  {'entity_id': 'a8b656ab-b96f-42a4-aace-594b72c1795c',
   'distance': -0.9911452531814575}]}

### Search For Matches: With Filter And Crowding

Extend the filtering to also limit matches by the labels in the column `Class`, which was configured as the crowding column. The `per_crowding_attribute_neighbor_count` parameter limits how many matches are returned for each value of `Class`, giving a variety of matches across this crowding column.
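Crowding can be sketched as a cap applied while walking candidates in distance order. This is an illustrative approximation, not the service implementation; `crowd_limited` and the sample data are hypothetical. It suggests why the search below may return fewer than `neighbor_count` results: fraud is rare, so if the closest candidates all share `Class` 0, the cap of 3 is reached before 5 matches are collected.

```python
def crowd_limited(candidates, crowding, neighbor_count, per_crowding_attribute_neighbor_count):
    """Keep the closest candidates while capping how many share each crowding value.

    candidates: list of (entity_id, distance) sorted by ascending distance.
    crowding: dict mapping entity_id -> crowding attribute value (for example `Class`).
    """
    counts = {}
    kept = []
    for entity_id, distance in candidates:
        value = crowding[entity_id]
        if counts.get(value, 0) >= per_crowding_attribute_neighbor_count:
            continue  # this crowding value already has its quota
        counts[value] = counts.get(value, 0) + 1
        kept.append((entity_id, distance))
        if len(kept) == neighbor_count:
            break
    return kept


cands = [('a', -0.99), ('b', -0.98), ('c', -0.97), ('d', -0.96), ('e', -0.95), ('f', -0.5)]
crowd = {'a': 0, 'b': 0, 'c': 0, 'd': 0, 'e': 0, 'f': 1}
print(crowd_limited(cands, crowd, neighbor_count=5, per_crowding_attribute_neighbor_count=3))
```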

In [35]:
fsv.search(
    entity_id = transaction_ids[0],
    neighbor_count = 5,
    return_full_entity = False,
    string_filters = [
        aiplatform.gapic.NearestNeighborQuery.StringFilter(
            name = 'splits',
            allow_tokens = ['TEST', 'VALIDATE'],
            deny_tokens = ['TRAIN']
        )
    ],
    per_crowding_attribute_neighbor_count = 3
).to_dict()

{'neighbors': [{'entity_id': '4ffd1d9a-209e-42d8-9811-324fca5287a1',
   'distance': -0.9949505925178528},
  {'entity_id': '7a3590c5-3a07-466b-b0ef-43ef77dee7e8',
   'distance': -0.9934127926826477},
  {'entity_id': 'fbd44e94-86b7-4c8a-a0ae-72a6e4c403e0',
   'distance': -0.9924111366271973}]}

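### Search For Matches: Increase The Search Space

The `leaf_nodes_search_fraction` parameter sets the fraction of the tree-AH index's leaf nodes that are searched for a query. Higher values increase both search accuracy and latency; `1.0` searches all leaf nodes.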

In [39]:
fsv.search(
    entity_id = transaction_ids[0],
    neighbor_count = 5,
    return_full_entity = False,
    leaf_nodes_search_fraction = 1.0
).to_dict()

{'neighbors': [{'entity_id': '6410b2ea-d053-4b31-b3a9-9ffdf15934c1',
   'distance': -0.9982097148895264},
  {'entity_id': '628e0548-55c8-4491-94db-e3a4fb168ab7',
   'distance': -0.9976215362548828},
  {'entity_id': '8ab474f0-ca7f-42f9-bfdd-e7140446ba4a',
   'distance': -0.9963568449020386},
  {'entity_id': '2ca52b14-1de9-4242-bf3a-959834bcc605',
   'distance': -0.9961671829223633},
  {'entity_id': 'f56fd7cd-714b-4b1b-af28-13cec7c60745',
   'distance': -0.9953159093856812}]}
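For a rough sense of scale, the view's `TreeAHConfig` set `leaf_node_embedding_count = 500`. If leaves filled to about that size (an assumption; the service decides the actual partitioning), the table's 284,807 rows would form roughly 570 leaves, and the fraction picks how many of them a query scans.

```python
import math

num_entities = 284_807           # rows in the source table
leaf_node_embedding_count = 500  # from the TreeAHConfig used when creating the view

# rough number of leaves if each leaf held about leaf_node_embedding_count embeddings
approx_leaves = math.ceil(num_entities / leaf_node_embedding_count)

for fraction in (0.05, 0.25, 1.0):
    print(f'fraction {fraction}: about {math.ceil(fraction * approx_leaves)} of {approx_leaves} leaves scanned')
```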

## Search For Matches: Brute Force

### Create Feature View: From BigQuery Source

**Similar to Above But With Brute Force Configuration**

Create a second feature view from the same BigQuery table, this time with a brute-force index configuration (an exact search over all embeddings) instead of tree-AH.

**Reference:**
- [Create a feature view from a BigQuery source](https://cloud.google.com/vertex-ai/docs/featurestore/latest/create-featureview#create_from_bq)
- API Link for [`aiplatform.gapic.FeatureView.IndexConfig()`](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform_v1.types.FeatureView.IndexConfig)
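Because brute force is exact, its results can serve as ground truth for measuring how much an approximate (tree-AH) search misses. A minimal recall@k sketch, with a hypothetical `recall_at_k` helper:

```python
def recall_at_k(approximate_ids, exact_ids, k):
    """Fraction of the exact top-k neighbors that the approximate search also returned."""
    exact_top = set(exact_ids[:k])
    if not exact_top:
        raise ValueError('exact_ids must contain at least one id and k must be >= 1')
    return len(exact_top & set(approximate_ids[:k])) / len(exact_top)


print(recall_at_k(['a', 'b', 'x'], ['a', 'b', 'c'], 3))
```

In this notebook, once the query entity itself is dropped from the brute-force results, its remaining four matches are the same as the top four tree-AH matches for `transaction_ids[0]`, so recall@4 would be 1.0 for that query.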

In [40]:
BQ_FEATURE_VIEW_NAME_2 = 'bq_embedding_bruteforce_view'

In [41]:
try:
    bq_view = online_admin_client.get_feature_view(name = f'{online_store.name}/featureViews/{BQ_FEATURE_VIEW_NAME_2}')
except Exception:
    create_bq_view = online_admin_client.create_feature_view(
        request = aiplatform.gapic.CreateFeatureViewRequest(
            parent = online_store.name,
            feature_view_id = BQ_FEATURE_VIEW_NAME_2,
            feature_view = aiplatform.gapic.FeatureView(
                big_query_source = aiplatform.gapic.FeatureView.BigQuerySource(
                    uri = f'bq://{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}',
                    entity_id_columns = ['transaction_id']
                ),
                sync_config = aiplatform.gapic.FeatureView.SyncConfig(cron = 'TZ=America/New_York 10 * * * *'),
                index_config = aiplatform.gapic.FeatureView.IndexConfig(
                    embedding_column = 'embedding',
                    embedding_dimension = 8,
                    filter_columns = ['splits'],
                    crowding_column = 'Class',
                    #tree_ah_config = aiplatform.gapic.FeatureView.IndexConfig.TreeAHConfig(
                    #    leaf_node_embedding_count = 500
                    #),
                    brute_force_config = aiplatform.gapic.FeatureView.IndexConfig.BruteForceConfig(),
                    distance_measure_type = aiplatform.gapic.FeatureView.IndexConfig.DistanceMeasureType.DOT_PRODUCT_DISTANCE  # enum value 3
                )
            ),
            run_sync_immediately = True
        )
    )
    bq_view = create_bq_view.result()
    
bq_view.name

'projects/1026793852137/locations/us-central1/featureOnlineStores/featurestore_optimized/featureViews/bq_embedding_bruteforce_view'

In [42]:
print(f'Review in the console:\n\nhttps://console.cloud.google.com/vertex-ai/locations/{REGION}/online-stores/{FEATURE_ONLINE_STORE_NAME}/feature-views/{BQ_FEATURE_VIEW_NAME_2}?project={PROJECT_ID}')

Review in the console:

https://console.cloud.google.com/vertex-ai/locations/us-central1/online-stores/featurestore_optimized/feature-views/bq_embedding_bruteforce_view?project=statmike-mlops-349915


In [43]:
bq_sync = list(online_admin_client.list_feature_view_syncs(parent = bq_view.name))[-1]

In [44]:
bq_sync.name

'projects/1026793852137/locations/us-central1/featureOnlineStores/featurestore_optimized/featureViews/bq_embedding_bruteforce_view/featureViewSyncs/6509956783237234688'

In [45]:
while True:
    feature_view_sync = online_admin_client.get_feature_view_sync(name = bq_sync.name)
    if feature_view_sync.run_time.end_time.seconds > 0:
        status = feature_view_sync.final_status.code
        break
    else:
        print('waiting for 20 seconds...')
    time.sleep(20)
    
if status == 0: print('Succeeded!')
else: print('Failed!')

waiting for 20 seconds...
waiting for 20 seconds...
waiting for 20 seconds...
waiting for 20 seconds...
waiting for 20 seconds...
waiting for 20 seconds...
waiting for 20 seconds...
waiting for 20 seconds...
Succeeded!


### Brute Force Search

In [50]:
# `vertexai` itself was never imported; reuse the FeatureView class imported earlier
fsv = FeatureView(name = bq_view.name)

In [51]:
transaction_ids[0]

'51698445-2df7-4b30-8c05-661d0f934411'

In [54]:
embedding = source_data[(source_data['transaction_id'] == transaction_ids[0])]['embedding'][0]
embedding

[0.5977821652479549,
 0.0,
 0.36559913211568446,
 0.1804625550232114,
 0.11810490530542082,
 0.3456272434200685,
 0.3010381280111156,
 0.5023903951852832]

In [55]:
fsv.search(
    #entity_id = transaction_ids[0],
    embedding_value = embedding,
    neighbor_count = 5,
    return_full_entity = False
).to_dict()

{'neighbors': [{'entity_id': '51698445-2df7-4b30-8c05-661d0f934411',
   'distance': -1.0},
  {'entity_id': '6410b2ea-d053-4b31-b3a9-9ffdf15934c1',
   'distance': -0.9982097148895264},
  {'entity_id': '628e0548-55c8-4491-94db-e3a4fb168ab7',
   'distance': -0.9976215362548828},
  {'entity_id': '8ab474f0-ca7f-42f9-bfdd-e7140446ba4a',
   'distance': -0.9963568449020386},
  {'entity_id': '2ca52b14-1de9-4242-bf3a-959834bcc605',
   'distance': -0.9961671829223633}]}
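Unlike the entity-ID searches, this embedding search returns the query transaction itself first, at distance -1.0. Under the negative dot product, a vector's distance to itself is minus its squared length, and the printed embedding appears to have unit length, which suggests the embeddings are L2-normalized. The remaining four matches and distances agree with the tree-AH results earlier. A quick check using the values printed above:

```python
import numpy as np

# embedding of transaction 51698445-2df7-4b30-8c05-661d0f934411, as printed above
embedding = np.array([0.5977821652479549, 0.0, 0.36559913211568446, 0.1804625550232114,
                      0.11810490530542082, 0.3456272434200685, 0.3010381280111156,
                      0.5023903951852832])

norm = float(np.linalg.norm(embedding))              # Euclidean length of the embedding
self_distance = -float(np.dot(embedding, embedding))  # DOT_PRODUCT_DISTANCE to itself
print(norm, self_distance)
```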

## Cleanup

This deletes the Feature Store objects created in this notebook: the two feature views, then the online store.

In [131]:
# delete feature store objects:

del_feature_views = False
del_online_store = False

if del_feature_views:
    # delete both views by name; `bq_view` now refers only to the brute-force view
    for view_name in [BQ_FEATURE_VIEW_NAME, BQ_FEATURE_VIEW_NAME_2]:
        online_admin_client.delete_feature_view(name = f'{online_store.name}/featureViews/{view_name}').result()
if del_online_store:
    online_admin_client.delete_feature_online_store(name = online_store.name, force = True)