#Requirements

Required Compute Type: Classic Compute

Required Runtime: 16.4 ML

GPU Type: A10 or T4

#Introduction

This notebook is a part of the DSA Databricks Blog Post here: <insert link>

We will deploy a multi-modal embedding model called ColNomic. A multi-modal embedding model lets us embed different modalities, such as text and images, with the same model. Please review the blog post's explanation of how embedding spaces work to understand why selecting a multi-modal embedding model matters.

After this notebook, you will be able to:
1. Load a Hugging Face multi-modal embedding model and register it to Unity Catalog
2. Deploy the model to Databricks Model Serving to process PDFs

#Install your Dependencies

In [0]:
%pip install --upgrade git+https://github.com/illuin-tech/colpali qwen-vl-utils accelerate numpy pillow scikit-learn torch==2.6.0 requests databricks-sdk mlflow 

Collecting git+https://github.com/illuin-tech/colpali
  Cloning https://github.com/illuin-tech/colpali to /tmp/pip-req-build-lw444trk
  Running command git clone --filter=blob:none --quiet https://github.com/illuin-tech/colpali /tmp/pip-req-build-lw444trk
  Resolved https://github.com/illuin-tech/colpali to commit fbf9dcc70ef591dcadd1aa73ab019e97a60f272a
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Collecting qwen-vl-utils
  Downloading qwen_vl_utils-0.0.11-py3-none-any.whl.metadata (6.3 kB)
Collecting accelerate
  Downloading accelerate-1.7.0-py3-none-any.whl.metadata (19 kB)
Collecting matplotlib
  Downloading matplotlib-3.10.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (11 kB)

In [0]:
dbutils.library.restartPython()

#Update the config file

Update the config file with your own catalog, schema, and table/volume/index names. Otherwise, the notebook uses the default names to create the necessary resources.
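
For orientation, a `config` module compatible with the import in the next cell might look like the sketch below. Every value is a hypothetical placeholder, and the relationships between names (for example, `volume_name` being the `/Volumes/...` path of the volume called `volume_label`) are assumptions inferred from how the names are used later in this notebook, not the actual file.

```python
# config.py (hypothetical sketch; every value below is a placeholder)
catalog = "my_catalog"
schema = "my_schema"

# Short volume identifier used in CREATE VOLUME, and (assumed) its /Volumes path
volume_label = "model_cache"
volume_name = f"/Volumes/{catalog}/{schema}/{volume_label}"

# Model and serving names
model_name = "colnomic"
registered_model_name = f"{catalog}.{schema}.{model_name}"  # three-level Unity Catalog name
model_endpoint_name = "colnomic-embedding-endpoint"

# Names used by the later notebooks
embedding_table_name = f"{catalog}.{schema}.pdf_embeddings"
embedding_table_name_index = f"{embedding_table_name}_index"
vector_search_endpoint_name = "my_vector_search_endpoint"
```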

In [0]:
from config import (
    volume_label,
    volume_name,
    catalog,
    schema,
    model_name,
    model_endpoint_name,
    embedding_table_name,
    embedding_table_name_index,
    registered_model_name,
    vector_search_endpoint_name,
)

In [0]:
spark.sql(f"CREATE CATALOG IF NOT EXISTS {catalog}")
spark.sql(f"CREATE SCHEMA IF NOT EXISTS {catalog}.{schema}")

DataFrame[]

In [0]:
import mlflow
from mlflow.models.signature import ModelSignature
from mlflow.pyfunc import PythonModel
from mlflow.types.schema import Schema, ColSpec, TensorSpec

from colpali_engine.models import ColQwen2_5, ColQwen2_5_Processor
from PIL import Image
import base64
from io import BytesIO
import os
import numpy as np
import torch
from transformers.utils.import_utils import is_flash_attn_2_available
import time
import pandas as pd 

mlflow.autolog()

2025-06-03 06:02:46.079998: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-06-03 06:02:46.285255: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-06-03 06:02:46.352064: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-06-03 06:02:46.751859: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


[2025-06-03 06:03:01,618] [INFO] [real_accelerator.py:239:get_accelerator] Setting ds_accelerator to cuda (auto detect)


df: /root/.triton/autotune: No such file or directory
/usr/bin/ld: cannot find -laio: No such file or directory
collect2: error: ld returned 1 exit status
/usr/bin/ld: cannot find -lcufile: No such file or directory
collect2: error: ld returned 1 exit status
2025/06/03 06:03:09 INFO mlflow.tracking.fluent: Autologging successfully enabled for keras.
2025/06/03 06:03:09 INFO mlflow.tracking.fluent: Autologging successfully enabled for sklearn.
2025/06/03 06:03:09 INFO mlflow.tracking.fluent: Autologging successfully enabled for tensorflow.
2025/06/03 06:03:09 INFO mlflow.tracking.fluent: Autologging successfully enabled for transformers.
2025/06/03 06:03:09 INFO mlflow.tracking.fluent: Autologging successfully enabled for xgboost.
2025/06/03 06:03:09 INFO mlflow.tracking.fluent: Autologging successfully enabled for pyspark.
2025/06/03 06:03:09 INFO mlflow.tracking.fluent: Autologging successfully enabled for pyspark.ml.


#Load the model

We will download the model into a Databricks Volume because it is too large to store in workspace files.

Passing the Volume path as the cache_dir parameter lets later loads reuse the downloaded files instead of redownloading the model each time.
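
As a rough illustration of what ends up in that cache directory: the Hugging Face hub cache stores each model repo in a folder named `models--<org>--<name>`. The helpers below are hypothetical (they are not part of this notebook or of any library) and only check whether such a folder already exists under a given path.

```python
import os

def hf_cache_folder(repo_id: str) -> str:
    """Folder name the Hugging Face hub cache uses for a model repo ID."""
    return "models--" + repo_id.replace("/", "--")

def is_model_cached(cache_dir: str, repo_id: str) -> bool:
    """True if cache_dir already contains a cache folder for repo_id."""
    return os.path.isdir(os.path.join(cache_dir, hf_cache_folder(repo_id)))

# The repo ID used in this notebook maps to this folder name
folder = hf_cache_folder("nomic-ai/colnomic-embed-multimodal-7b")
```

Calling `is_model_cached(volume_name, "nomic-ai/colnomic-embed-multimodal-7b")` before and after the download cell is one way to confirm the weights landed in the Volume.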

In [0]:
spark.sql(f"CREATE VOLUME IF NOT EXISTS {catalog}.{schema}.{volume_label}")

DataFrame[]

In [0]:
current_device = "cuda:0" if torch.cuda.is_available() else "cpu"
# Hugging Face repo ID, kept in its own variable so it does not overwrite `model_name` imported from config
hf_model_name = "nomic-ai/colnomic-embed-multimodal-7b"

model = ColQwen2_5.from_pretrained(
    hf_model_name,
    torch_dtype=torch.bfloat16,
    device_map=current_device,
    attn_implementation="flash_attention_2" if is_flash_attn_2_available() else None,
    cache_dir=volume_name,  # change this to YOUR volume path so the weights are cached there
).eval()
processor = ColQwen2_5_Processor.from_pretrained(hf_model_name, cache_dir=volume_name) #change this to YOUR volume path. It can be the same volume path

#Wrap logic into a PythonModel for MLflow pyfunc

We will wrap the model and some Python logic in an mlflow.pyfunc.PythonModel so that we can register both together to Unity Catalog.

Pyfunc lets us package the model with custom code through PythonModel, enabling model as code. Here, that code lets a single model accept text, base64-encoded images, or both.
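
Callers often hold images as data URIs such as `data:image/png;base64,...`, while the model below expects only the raw base64 string. The following is a minimal sketch of a client-side helper, using hypothetical function names that are not part of the notebook, that strips such a prefix before decoding.

```python
import base64

def strip_data_uri_prefix(value: str) -> str:
    """Drop a leading 'data:<mime>;base64,' prefix if one is present."""
    if value.startswith("data:") and ";base64," in value:
        return value.split(";base64,", 1)[1]
    return value

def to_image_bytes(value: str) -> bytes:
    """Decode a base64 string (with or without a data-URI prefix) to raw bytes."""
    return base64.b64decode(strip_data_uri_prefix(value))

# Round trip with stand-in bytes (not a real image)
raw = b"\x89PNG stand-in bytes"
encoded = base64.b64encode(raw).decode("utf-8")
decoded_plain = to_image_bytes(encoded)
decoded_uri = to_image_bytes("data:image/png;base64," + encoded)
```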

In [0]:

class ColQwenInferenceModel(PythonModel):
    def load_context(self, context):
        self.current_device = "cuda:0" if torch.cuda.is_available() else "cpu" 
        self.model_name = "nomic-ai/colnomic-embed-multimodal-7b"

        #download the model from your CACHE which should be saved to a volume in the cell before
        self.model = ColQwen2_5.from_pretrained(
            self.model_name,
            torch_dtype=torch.bfloat16,
            device_map=self.current_device,
            attn_implementation="flash_attention_2" if is_flash_attn_2_available() else None,
            cache_dir=context.artifacts['cache'] #this is how you refer to mlflow logged artifacts               
        ).eval()
        self.processor = ColQwen2_5_Processor.from_pretrained(self.model_name, cache_dir=context.artifacts['cache'])#this is how you refer to mlflow logged artifacts  

    def generate_image_embedding_from_base64_string(self, base64_string):
        """
        Generate an embedding for a base64-encoded image using the ColQwen2.5 model.

        The multi-vector output is mean-pooled into a single vector so it can be used for semantic search.

        Args:
            base64_string: Base64-encoded string of the image

        Returns:
            Dictionary with the pooled image embedding under "embedding", or None if processing fails.
        """
        
        try:

            image_data = base64.b64decode(base64_string)
            image = Image.open(BytesIO(image_data)).convert("RGB")  # Ensure RGB mode
            
            processed_image = self.processor.process_images([image]).to(self.model.device)

            with torch.no_grad():
                image_embedding = self.model(**processed_image)

            image_embedding_flat = image_embedding.mean(dim=1).tolist()[0]
            return {"embedding": image_embedding_flat}
        except Exception as e:
            print(f"Error processing image: {e}")
            return None
        
    def generate_text_embedding(self, text):
        """
        Generate an embedding for a text query using the ColQwen2.5 model.

        The multi-vector output is mean-pooled into a single vector so it can be used for semantic search.
        """
        try:
            inputs = self.processor.process_queries([text]).to(self.model.device)
            with torch.no_grad():
                text_embedding_multivec = self.model(**inputs)
            text_embedding_flat = text_embedding_multivec.mean(dim=1).tolist()[0]
            return {"embedding": text_embedding_flat}
           
        except Exception as e:
            print(f"Error processing text: {e}")
            return None
        

    def predict(self, context, model_input):
        """
        model_input: Could be a pandas DataFrame with either a 'text' column 
            or an 'image_base64' column (base64 string).
        Note: MLflow’s pyfunc model flavor enforces a DataFrame-based contract under the hood. 
        So, if you pass a dict, it will be converted to a DataFrame

        Example input:
            {
                "text": "Hello, world!"
            }
        or
            {   
                "image_base64": "just_base64_string_of_image" 
            }
        * Note: data:image/png;base64, not necessary for image_base64 value.
        """
        # Determine if input is text or image
        if isinstance(model_input, pd.DataFrame):
            if 'text' in model_input.columns and 'image_base64' in model_input.columns:

                text_embedding = self.generate_text_embedding(model_input['text'].to_list()[0])
                image_embedding = self.generate_image_embedding_from_base64_string(model_input['image_base64'].to_list()[0])
                return {"predictions": [text_embedding, image_embedding]}

            elif 'text' in model_input.columns:
                embedding = self.generate_text_embedding(model_input['text'].to_list()[0])
                return {"predictions": embedding}

            elif 'image_base64' in model_input.columns:

                embedding = self.generate_image_embedding_from_base64_string(model_input['image_base64'].to_list()[0])
                return {"predictions": embedding}
            
        elif isinstance(model_input, dict):

            if 'text' in model_input and model_input['text'] and 'image_base64' in model_input and model_input['image_base64']:

                text_embedding = self.generate_text_embedding(model_input['text'])
                image_embedding = self.generate_image_embedding_from_base64_string(model_input['image_base64'])
                return {"predictions": [text_embedding, image_embedding]}

            if 'text' in model_input and model_input['text']:

                embedding = self.generate_text_embedding(model_input['text'])
                return {"predictions": embedding}
            elif 'image_base64' in model_input and model_input['image_base64']:

                embedding = self.generate_image_embedding_from_base64_string(model_input['image_base64'])
                return {"predictions": embedding}

        raise ValueError(f"Invalid input format. Your input type was: {type(model_input)}. Expected a dictionary or pandas DataFrame with 'text' or 'image_base64' keys.")
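
Both helper methods above call `.mean(dim=1)` on the model output, which averages over the second axis. Assuming the usual batch × tokens × dimension layout, that collapses one vector per token or image patch into a single vector per input. The pure-Python sketch below shows the same arithmetic on a toy example; `mean_pool` is a hypothetical name used only for illustration.

```python
from typing import List

def mean_pool(token_vectors: List[List[float]]) -> List[float]:
    """Average a list of equal-length vectors element by element."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dim = len(token_vectors[0])
    count = len(token_vectors)
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]

# Two "token" vectors of dimension 3 collapse into one vector of dimension 3
multi_vector = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
pooled = mean_pool(multi_vector)  # [2.0, 3.0, 4.0]
```

Pooling trades the token-level detail of the multi-vector output for a single fixed-size vector per input.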




#Register to Unity Catalog using MLflow

We now log the model created above and register it to Unity Catalog. You can adjust the model signature to change which inputs the model accepts.

For the input_schema, we want a way to provide text OR images. That way, when we later embed text queries for similarity search against our vector search index, we only need to pass in text.
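
To make the "text OR images" idea concrete, here is a small sketch of a hypothetical helper (not part of the notebook) that builds a request payload in the `dataframe_split` format used later in this notebook, including only the columns that were actually provided.

```python
from typing import Optional

def build_payload(text: Optional[str] = None, image_base64: Optional[str] = None) -> dict:
    """Build a dataframe_split payload with only the provided inputs as columns."""
    columns, row = [], []
    if text:
        columns.append("text")
        row.append(text)
    if image_base64:
        columns.append("image_base64")
        row.append(image_base64)
    if not columns:
        raise ValueError("provide text, image_base64, or both")
    return {"dataframe_split": {"columns": columns, "data": [row]}}

# A text-only query, as used for similarity search later on
text_only = build_payload(text="Is attention really all you need?")
```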

In [0]:
input_schema = Schema([
    ColSpec("string", "text", required=False),         # Optional text input.
    ColSpec("string", "image_base64", required=False)  # Optional base64 image input
])

output_schema = Schema([
    TensorSpec(np.dtype("float32"), (128,), "embedding"), 
])


# Create the model signature
print("creating model signature. Creating examples of inputs")
signature = ModelSignature(inputs=input_schema, outputs=output_schema)

with mlflow.start_run():
    mlflow.pyfunc.log_model(
        artifact_path="colNomic_model",
        artifacts={'cache': volume_name},
        python_model=ColQwenInferenceModel(),
        signature=signature,
        registered_model_name=registered_model_name,
        extra_pip_requirements=["git+https://github.com/illuin-tech/colpali", "torchvision"]
    )

creating model signature. Creating examples of inputs


Downloading artifacts: 0it [00:00, ?it/s]

 - mlflow (current: 2.22.0, required: mlflow==2.15.1)
To fix the mismatches, call `mlflow.pyfunc.get_model_dependencies(model_uri)` to fetch the model's environment and install dependencies using the resulting environment file.


Uploading artifacts:   0%|          | 0/36 [00:00<?, ?it/s]

Uploading /local_disk0/repl_tmp_data/ReplId-19700-8ca05-6/tmpa3_2fiq9/model/artifacts/colnomic_model/models--n…

Registered model 'austin_choi_demo_catalog.agents.colNomic_DSA' already exists. Creating a new version of this model...


Created version '6' of model 'austin_choi_demo_catalog.agents.colnomic_dsa'.
2025/05/24 07:27:51 INFO mlflow.tracking._tracking_service.client: 🏃 View run bemused-doe-419 at: e2-demo-field-eng.cloud.databricks.com/ml/experiments/2588019255055285/runs/77f57cd0c01b43e6a06362abc095f816.
2025/05/24 07:27:51 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: e2-demo-field-eng.cloud.databricks.com/ml/experiments/2588019255055285.


#Test that the model registered correctly

mlflow.autolog does not always capture every dependency, and inputs may not be formatted the way the model expects at inference time. We should load the model we just registered and test it locally to confirm it was registered correctly.
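
Beyond eyeballing the printed output, a quick structural check can catch silent failures, since the helper methods in the PythonModel return None when an input cannot be processed. The sketch below uses hypothetical helper names and a made-up result; the expected dimension of 128 is taken from the output schema defined earlier.

```python
from typing import List

def embeddings_from_result(result: dict) -> List[List[float]]:
    """Collect every embedding from a predict() result shaped like this notebook's."""
    predictions = result["predictions"]
    # predict() returns one dict for a single input, or a list when text and image are both given
    items = predictions if isinstance(predictions, list) else [predictions]
    vectors = []
    for item in items:
        if item is None:
            raise ValueError("an embedding failed to generate (got None)")
        vectors.append(item["embedding"])
    return vectors

def has_expected_dim(vectors: List[List[float]], expected_dim: int = 128) -> bool:
    """True if every vector has the expected length."""
    return all(len(vec) == expected_dim for vec in vectors)

# Made-up result with the same shape as a text + image prediction
fake_result = {"predictions": [{"embedding": [0.0] * 128}, {"embedding": [0.1] * 128}]}
vectors = embeddings_from_result(fake_result)
```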

In [0]:
import requests
from PIL import Image
from io import BytesIO
import base64
image_url = "https://miro.medium.com/v2/resize:fit:447/1*G0CAXQqb250tgBMeeVvN6g.png"
response = requests.get(image_url)
img = Image.open(BytesIO(response.content))
buffer = BytesIO()
img.save(buffer, format=img.format)
img_bytes = buffer.getvalue()

img_base64 = base64.b64encode(img_bytes).decode('utf-8')

In [0]:
import mlflow.pyfunc
from PIL import Image

model_version_uri = f"models:/{registered_model_name}/1" #load the model based on the version you want to use
first_version = mlflow.pyfunc.load_model(model_version_uri)
result = first_version.predict({'text':"Is attention really all you need?", 'image_base64': img_base64})
print(result)

Downloading /local_disk0/repl_tmp_data/ReplId-19700-8ca05-6/tmpokfenh1t/artifacts/colnomic_model/models--nomic…

Downloading artifacts:   0%|          | 0/36 [00:00<?, ?it/s]

Fetching 7 files:   0%|          | 0/7 [00:00<?, ?it/s]

model-00002-of-00007.safetensors:   0%|          | 0.00/4.98G [00:00<?, ?B/s]

model-00005-of-00007.safetensors:   0%|          | 0.00/4.98G [00:00<?, ?B/s]

model-00007-of-00007.safetensors:   0%|          | 0.00/3.39G [00:00<?, ?B/s]

model-00001-of-00007.safetensors:   0%|          | 0.00/4.95G [00:00<?, ?B/s]

model-00004-of-00007.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]

model-00003-of-00007.safetensors:   0%|          | 0.00/4.93G [00:00<?, ?B/s]

model-00006-of-00007.safetensors:   0%|          | 0.00/4.93G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/7 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/214 [00:00<?, ?B/s]

adapter_config.json:   0%|          | 0.00/805 [00:00<?, ?B/s]

adapter_model.safetensors:   0%|          | 0.00/323M [00:00<?, ?B/s]

preprocessor_config.json:   0%|          | 0.00/574 [00:00<?, ?B/s]

Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.


tokenizer_config.json:   0%|          | 0.00/7.33k [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/2.78M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/1.67M [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/11.4M [00:00<?, ?B/s]

added_tokens.json:   0%|          | 0.00/605 [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/613 [00:00<?, ?B/s]

processor_config.json:   0%|          | 0.00/81.0 [00:00<?, ?B/s]

chat_template.json:   0%|          | 0.00/1.05k [00:00<?, ?B/s]

{'predictions': [{'embedding': [0.053466796875, 0.032470703125, 0.05126953125, -0.010009765625, -0.03564453125, -0.21875, -0.0189208984375, -0.044921875, -0.03759765625, -0.006988525390625, 0.08447265625, 0.01318359375, 0.03173828125, 0.005035400390625, -0.055908203125, -0.07470703125, -0.0213623046875, 0.10400390625, 0.0054931640625, -0.1005859375, 0.07275390625, 0.0029754638671875, 0.015869140625, -0.0869140625, -0.044677734375, -0.0257568359375, -0.026611328125, 0.02587890625, 0.11083984375, 0.09765625, 0.05029296875, -0.05712890625, 0.0284423828125, 0.024658203125, -0.029052734375, -0.03466796875, -0.00555419921875, -0.047607421875, 0.02099609375, 0.0120849609375, 0.026611328125, 0.018798828125, 0.05615234375, -0.0098876953125, 0.003265380859375, -0.06201171875, -0.07421875, 0.003204345703125, 0.10498046875, 0.0155029296875, -0.029052734375, 0.0947265625, 0.053955078125, 0.0947265625, 0.0693359375, -0.038330078125, 0.0103759765625, -0.0751953125, 0.00262451171875, -0.06396484375, -

Looks like we are generating embeddings! Let's now deploy the model to a model serving endpoint.

#Use the MLflow SDK to interact with your endpoint

Because of the size of this model, the endpoint needs extra GPU memory. If you don't have the resources to deploy the 7B model, you can try the 3B model instead.

Just replace the model reference in the PythonModel with this string: nomic-ai/colnomic-embed-multimodal-3b

Model Card: https://huggingface.co/nomic-ai/colnomic-embed-multimodal-3b

Note: Depending on GPU availability, your deployment may fail

Review the GPU workload types here: https://docs.databricks.com/aws/en/machine-learning/model-serving/create-manage-serving-endpoints#gpu-workload-types
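
Since a deployment can stall or fail, it may help to bound how long you wait for it. The sketch below is one possible approach, not the notebook's own method: a hypothetical `wait_until_ready` helper that polls a status callable until it stops reporting `IN_PROGRESS` or a timeout expires. The status string matches the one printed by the polling cell in this notebook; the timeout and interval defaults are arbitrary assumptions.

```python
import time
from typing import Callable

def wait_until_ready(
    get_status: Callable[[], str],
    timeout_s: float = 1800.0,
    poll_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status until it is no longer 'IN_PROGRESS'; raise on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status != "IN_PROGRESS":
            return status  # the caller decides whether this status means success
        if time.monotonic() >= deadline:
            raise TimeoutError(f"endpoint still {status} after {timeout_s} seconds")
        sleep(poll_s)
```

With the deployment client from the cells that follow, this could be called as `wait_until_ready(lambda: client.get_endpoint(model_endpoint_name)["state"]["config_update"])`.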

In [0]:
import mlflow.deployments

client = mlflow.deployments.get_deploy_client("databricks")

In [0]:
endpoint = client.create_endpoint(
    name=model_endpoint_name,
    config={
        "served_entities": [
            {
                "name": model_name,
                "entity_name": registered_model_name,
                "entity_version": "6",
                "workload_size": "Small",
                "workload_type": "MULTIGPU_MEDIUM",
                "scale_to_zero_enabled": True
            }
        ],
        "traffic_config": {
            "routes": [
                {
                    "served_model_name": model_name,
                    "traffic_percentage": 100
                }
            ]
        }
    }
)

import time
while True:
    deployment = client.get_endpoint(model_endpoint_name)
    
    if deployment['state']['config_update'] == "NOT_UPDATING":
        print("Endpoint is ready!")
        break
    elif deployment['state']['config_update'] in ["UPDATE_FAILED", "DEPLOYMENT_FAILED"]:
        print(f"Deployment failed: {deployment['state']}")
        break
    else:
        print(f"Deployment in progress... Status: {deployment['state']['config_update']}")
        time.sleep(30)

Deployment in progress... Status: IN_PROGRESS
Deployment in progress... Status: IN_PROGRESS
Deployment in progress... Status: IN_PROGRESS
Deployment in progress... Status: IN_PROGRESS
Deployment in progress... Status: IN_PROGRESS
Deployment in progress... Status: IN_PROGRESS
Deployment in progress... Status: IN_PROGRESS
Deployment in progress... Status: IN_PROGRESS
Endpoint is ready!


#Test the endpoint

We should test to see if the model serving endpoint deployed correctly. 

Ensure the endpoint is spun up before testing. It will take some time.

Once it is complete, run the cells below and you should see some embeddings! 
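
Once embeddings come back, one optional sanity check is to compare the text and image vectors with cosine similarity, a common similarity measure for embeddings. The sketch below is a plain-Python illustration with a hypothetical helper name; it does not claim to be the metric your vector search index uses.

```python
import math
from typing import Sequence

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)
```

With a response shaped like the one printed below, the two inputs would be `response['predictions']['predictions'][0]['embedding']` and `response['predictions']['predictions'][1]['embedding']`.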

In [0]:
endpoint_name = model_endpoint_name
databricks_instance = dbutils.entry_point.getDbutils().notebook().getContext().browserHostName().get()
endpoint_url = f"https://{databricks_instance}/ml/endpoints/{endpoint_name}"
print(f"Endpoint URL: {endpoint_url}")

Endpoint URL: https://e2-demo-field-eng.cloud.databricks.com/ml/endpoints/colNomic-embedding-generation


In [0]:
import requests
from PIL import Image
from io import BytesIO
import base64
image_url = "https://miro.medium.com/v2/resize:fit:447/1*G0CAXQqb250tgBMeeVvN6g.png"
response = requests.get(image_url)
img = Image.open(BytesIO(response.content))
buffer = BytesIO()
img.save(buffer, format=img.format)
img_bytes = buffer.getvalue()

img_base64 = base64.b64encode(img_bytes).decode('utf-8')

In [0]:
import pandas as pd
import time

start_time = time.time()
response = client.predict(
    endpoint=model_endpoint_name,
    inputs={
        "dataframe_split": {
            "columns": ["text", "image_base64"],
            "data": [["this is just a test", img_base64]],
        }
    },
)
end_time = time.time()
total_time = end_time - start_time
print(response)
print(f"Final Time: {total_time}")


{'predictions': {'predictions': [{'embedding': [0.0106201171875, -0.0498046875, 0.0203857421875, -0.0947265625, 0.0106201171875, -0.0751953125, 0.0184326171875, -0.002227783203125, -0.0034637451171875, -0.01953125, 0.09130859375, -0.058837890625, 0.0771484375, 0.0244140625, -0.06982421875, 0.058837890625, 0.0257568359375, 0.039306640625, 0.0167236328125, 0.01300048828125, 0.09765625, -0.035888671875, -0.054443359375, 0.07373046875, -0.0015411376953125, 0.0172119140625, 0.0035552978515625, 0.00194549560546875, 0.02099609375, -0.052734375, 0.00106048583984375, -0.044189453125, -0.04052734375, -0.0751953125, -0.003997802734375, 0.0301513671875, 0.06884765625, -0.09765625, -0.0111083984375, -0.0302734375, 0.0166015625, 0.00750732421875, -0.055419921875, 0.041748046875, 0.033203125, -0.000507354736328125, -0.0888671875, 0.10498046875, -0.0498046875, -0.040283203125, -0.07470703125, -0.02392578125, -0.0908203125, 0.078125, 0.10205078125, -0.1455078125, 0.09521484375, 0.01263427734375, -0.055

In [0]:
response['predictions']['predictions'][0]['embedding']

[0.0106201171875,
 -0.0498046875,
 0.0203857421875,
 -0.0947265625,
 0.0106201171875,
 -0.0751953125,
 0.0184326171875,
 -0.002227783203125,
 -0.0034637451171875,
 -0.01953125,
 0.09130859375,
 -0.058837890625,
 0.0771484375,
 0.0244140625,
 -0.06982421875,
 0.058837890625,
 0.0257568359375,
 0.039306640625,
 0.0167236328125,
 0.01300048828125,
 0.09765625,
 -0.035888671875,
 -0.054443359375,
 0.07373046875,
 -0.0015411376953125,
 0.0172119140625,
 0.0035552978515625,
 0.00194549560546875,
 0.02099609375,
 -0.052734375,
 0.00106048583984375,
 -0.044189453125,
 -0.04052734375,
 -0.0751953125,
 -0.003997802734375,
 0.0301513671875,
 0.06884765625,
 -0.09765625,
 -0.0111083984375,
 -0.0302734375,
 0.0166015625,
 0.00750732421875,
 -0.055419921875,
 0.041748046875,
 0.033203125,
 -0.000507354736328125,
 -0.0888671875,
 0.10498046875,
 -0.0498046875,
 -0.040283203125,
 -0.07470703125,
 -0.02392578125,
 -0.0908203125,
 0.078125,
 0.10205078125,
 -0.1455078125,
 0.09521484375,
 0.012634277343

You're ready to set up the sample data! Continue to 02_PDF ETL to set up the PDF sources, create embeddings for them, and load them into a vector search index.