<picture>
  <source media="(prefers-color-scheme: dark)" srcset="https://vespa.ai/assets/vespa-ai-logo-heather.svg">
  <source media="(prefers-color-scheme: light)" srcset="https://vespa.ai/assets/vespa-ai-logo-rock.svg">
  <img alt="#Vespa" width="200" src="https://vespa.ai/assets/vespa-ai-logo-rock.svg" style="margin-bottom: 25px;">
</picture>

# Multi-vector indexing with HNSW

These are the pyvespa steps of the multi-vector-indexing sample application, adapted here to index recipes in an application named `findmypasta`.
Go to the [source](https://github.com/vespa-engine/sample-apps/tree/master/multi-vector-indexing)
for a full description and prerequisites,
and read the [blog post](https://blog.vespa.ai/semantic-search-with-multi-vector-indexing/).
Highlighted features:

- Approximate Nearest Neighbor Search - using HNSW or exact
- Use a Component to configure the Huggingface embedder.
- Using synthetic fields with auto-generated
  [embeddings](https://docs.vespa.ai/en/embedding.html) in data and query flow.
- Application package file export, model files in the application package, deployment from files.
- [Multiphased ranking](https://docs.vespa.ai/en/phased-ranking.html).
- How to control text search result highlighting.

For simpler examples, see [text search](https://pyvespa.readthedocs.io/en/latest/getting-started-pyvespa.html)
and [pyvespa examples](https://pyvespa.readthedocs.io/en/latest/examples/pyvespa-examples.html).

Pyvespa is an add-on to Vespa, and this guide will export the application package containing `services.xml` and `findmypasta.sd`. The latter is the schema file for this application - knowing services.xml and schema files is useful when reading Vespa documentation.


<div class="alert alert-info">
    Refer to <a href="https://pyvespa.readthedocs.io/en/latest/troubleshooting.html">troubleshooting</a>
    for any problem when running this guide.
</div>


This notebook requires [pyvespa >= 0.37.1](https://pyvespa.readthedocs.io/en/latest/index.html#requirements),
ZSTD, and the [Vespa CLI](https://docs.vespa.ai/en/vespa-cli.html).


In [1]:
!pip3 install pyvespa









## Create the application

Configure the Vespa instance with a component loading the E5-small model.
Components are used to plug in code and models to a Vespa application -
[read more](https://docs.vespa.ai/en/jdisc/container-components.html):


In [28]:
from vespa.package import (
    ApplicationPackage,
    Component,
    Parameter,
    Field,
    HNSW,
    RankProfile,
    Function,
    FirstPhaseRanking,
    SecondPhaseRanking,
    FieldSet,
    DocumentSummary,
    Summary,
)
from pathlib import Path
import json

app_package = ApplicationPackage(
    name="findmypasta",
    components=[
        Component(
            id="e5-small-q",
            type="hugging-face-embedder",
            parameters=[
                Parameter("transformer-model", {"path": "model/e5-small-v2-int8.onnx"}),
                Parameter("tokenizer-model", {"path": "model/tokenizer.json"}),
            ],
        )
    ],
)

## Configure fields

Vespa has a variety of basic and complex
[field types](https://docs.vespa.ai/en/reference/schema-reference.html#field).
This application uses a combination of integer, text and tensor fields,
making it easy to implement hybrid ranking use cases:


In [29]:
# app_package.schema.add_fields(
#     Field(name="id", type="int", indexing=["attribute", "summary"]),
#     Field(
#         name="title", type="string", indexing=["index", "summary"], index="enable-bm25"
#     ),
#     Field(
#         name="url", type="string", indexing=["index", "summary"], index="enable-bm25"
#     ),
#     Field(
#         name="paragraphs",
#         type="array<string>",
#         indexing=["index", "summary"],
#         index="enable-bm25",
#         bolding=True,
#     ),
#     Field(
#         name="paragraph_embeddings",
#         type="tensor<float>(p{},x[384])",
#         indexing=["input paragraphs", "embed", "index", "attribute"],
#         ann=HNSW(distance_metric="angular"),
#         is_document_field=False,
#     ),
#     #
#     # Alternatively, for exact distance calculation not using HNSW:
#     #
#     # Field(name="paragraph_embeddings", type="tensor<float>(p{},x[384])",
#     #       indexing=["input paragraphs", "embed", "attribute"],
#     #       attribute=["distance-metric: angular"],
#     #       is_document_field=False)
# )

In [30]:
# Fields for the recipe documents: id, title, description, tags, steps, ingredients, nutrition and metadata
app_package.schema.add_fields(
    Field(name="id", type="int", indexing=["attribute", "summary"]),
    Field(
        name="title", type="string", indexing=["index", "summary"], index="enable-bm25"
    ),
    Field(
        name="description", type="string", indexing=["index", "summary"], index="enable-bm25"
    ),
    Field(
        name="minutes",
        type="string",
        indexing=["attribute", "summary"],
    ),
    Field(
        name="n_steps",
        type="string",
        indexing=["attribute", "summary"],
    ),
    Field(
        name="n_ingredients",
        type="string",
        indexing=["attribute", "summary"],
    ),
    Field(
        name="submitted",
        type="string",
        indexing=["attribute", "summary"],
    ),
    Field(
        name="contributor_id",
        type="string",
        indexing=["attribute", "summary"],
    ),
    Field(
        name="tags",
        type="array<string>",
        indexing=["index", "summary"],
        index="enable-bm25",
        bolding=True,
    ),
    Field(
        name="steps",
        type="array<string>",
        indexing=["index", "summary"],
        index="enable-bm25",
        bolding=True,
    ),
    Field(
        name="ingredients",
        type="array<string>",
        indexing=["index", "summary"],
        index="enable-bm25",
        bolding=True,
    ),
    Field(
        name="nutrition",
        type="array<string>",
        indexing=["index", "summary"],
        index="enable-bm25",
        bolding=True,
    ),
    Field(
        name="description_embeddings",
        type="tensor<float>(p{},x[384])",
        indexing=["input description", "embed", "index", "attribute"],
        ann=HNSW(distance_metric="angular"),
        is_document_field=False,
    ),
    Field(
        name="tag_embeddings",
        type="tensor<float>(p{},x[384])",
        indexing=["input tags", "embed", "index", "attribute"],
        ann=HNSW(distance_metric="angular"),
        is_document_field=False,
    ),
    Field(
        name="step_embeddings",
        type="tensor<float>(p{},x[384])",
        indexing=["input steps", "embed", "index", "attribute"],
        ann=HNSW(distance_metric="angular"),
        is_document_field=False,
    ),
    Field(
        name="ingredient_embeddings",
        type="tensor<float>(p{},x[384])",
        indexing=["input ingredients", "embed", "index", "attribute"],
        ann=HNSW(distance_metric="angular"),
        is_document_field=False,
    ),
    Field(
        name="nutrition_embeddings",
        type="tensor<float>(p{},x[384])",
        indexing=["input nutrition", "embed", "index", "attribute"],
        ann=HNSW(distance_metric="angular"),
        is_document_field=False,
    ),
    #
    # Alternatively, for exact distance calculation not using HNSW:
    #
    # Field(name="paragraph_embeddings", type="tensor<float>(p{},x[384])",
    #       indexing=["input paragraphs", "embed", "attribute"],
    #       attribute=["distance-metric: angular"],
    #       is_document_field=False)
)

The fields of particular interest are the embedding fields, such as `step_embeddings`.
Note that we are _not_ feeding embeddings to this instance.
Instead, the embeddings are generated by using the [embed](https://docs.vespa.ai/en/embedding.html)
feature, using the model configured at start.
Read more in [Text embedding made simple](https://blog.vespa.ai/text-embedding-made-simple/).

Looking closely at the code, `step_embeddings` uses `is_document_field=False`, meaning it will read another field as input (here `steps`), and run `embed` on it. The other embedding fields are configured the same way, with `description`, `tags`, `ingredients` and `nutrition` as input.

As only one model is configured, `embed` will use that one -
it is possible to configure more models and use `embed model-id` as well.

As the code comment illustrates, different distance metrics can be used,
as well as an _exact_ or _approximate_ nearest neighbor search.
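
To make the `angular` distance metric more concrete, here is a minimal sketch in plain Python (not Vespa code). It assumes that the angular distance between two vectors is the angle between them, so that taking `cos` of the distance, as the rank profiles in this guide do with `cos(distance(field,step_embeddings))`, gives the cosine similarity. The two vectors are hypothetical and only have three dimensions instead of 384.

```python
import math

def angular_distance(a, b):
    """Angle in radians between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp to [-1, 1] to guard against floating point drift before acos
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cosine)

query = [1.0, 0.0, 0.0]  # hypothetical query vector
step = [1.0, 1.0, 0.0]   # hypothetical vector for one recipe step

distance = angular_distance(query, step)  # the angle, here 45 degrees in radians
similarity = math.cos(distance)           # back to cosine similarity
print(distance, similarity)
```

A smaller angle means a closer match, so a similarity near 1 indicates that the query and the step point in nearly the same direction.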

## Configure rank profiles

A rank profile defines the computation for the ranking,
with a wide range of possible features as input.
Below you will find `first_phase` ranking using text ranking (`bm25`),
semantic ranking using vector distance (consider a tensor a vector here),
and combinations of the two:


In [31]:
app_package.schema.add_rank_profile(
    RankProfile(
        name="semantic",
        inputs=[("query(q)", "tensor<float>(x[384])")],
        inherits="default",
        first_phase="cos(distance(field,step_embeddings))",
        match_features=["closest(step_embeddings)"],
    )
)

# app_package.schema.add_rank_profile(
#     RankProfile(name="bm25", first_phase="2*bm25(title) + bm25(paragraphs)")
# )

app_package.schema.add_rank_profile(
    RankProfile(name="bm25", first_phase="2*bm25(title) + bm25(description) + bm25(steps)")
)

app_package.schema.add_rank_profile(
    RankProfile(
        name="hybrid",
        inherits="semantic",
        functions=[
            Function(
                name="avg_paragraph_similarity",
                expression="""reduce(
                              sum(l2_normalize(query(q),x) * l2_normalize(attribute(step_embeddings),x),x),
                              avg,
                              p
                          )""",
            ),
            Function(
                name="max_paragraph_similarity",
                expression="""reduce(
                              sum(l2_normalize(query(q),x) * l2_normalize(attribute(step_embeddings),x),x),
                              max,
                              p
                          )""",
            ),
            Function(
                name="all_paragraph_similarities",
                expression="sum(l2_normalize(query(q),x) * l2_normalize(attribute(step_embeddings),x),x)",
            ),
            
        ],
        first_phase=FirstPhaseRanking(
            expression="cos(distance(field,step_embeddings))"
        ),
        second_phase=SecondPhaseRanking(
            expression="firstPhase + avg_paragraph_similarity() + log( bm25(title) + bm25(description) + bm25(tags) + bm25(steps) + bm25(ingredients))"
        ),
        match_features=[
            "closest(step_embeddings)",
            "firstPhase",
            "bm25(title)",
            "bm25(description)",
            "bm25(steps)", 
            "avg_paragraph_similarity",
            "max_paragraph_similarity",
            "all_paragraph_similarities",
        ],
    )
)

## Configure fieldset

A [fieldset](https://docs.vespa.ai/en/reference/schema-reference.html#fieldset)
is a way to configure search in multiple fields:


In [32]:
app_package.schema.add_field_set(
    FieldSet(name="default", fields=["title", "tags", "steps", "description", "ingredients"])
)

## Configure document summary

A [document summary](https://docs.vespa.ai/en/document-summaries.html)
is the collection of fields to return in query results -
the default summary is used unless otherwise specified in the query.
Here we configure a `minimal` document summary without the larger text/embedding fields:


In [33]:
app_package.schema.add_document_summary(
    DocumentSummary(
        name="minimal",
        summary_fields=[Summary("id", "int"), Summary("title", "string")],
    )
)
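
As a sketch of how the `minimal` summary could be selected at query time, the helper below builds a query body that sets the `presentation.summary` parameter of the Query API. The helper name `minimal_summary_query` and the default of 5 hits are illustrative assumptions, not part of the sample application; the body can only be sent once the application is deployed later in this guide.

```python
def minimal_summary_query(user_query: str, hits: int = 5) -> dict:
    """Build a query body that asks for the `minimal` document summary."""
    return {
        "yql": "select * from findmypasta where userQuery()",
        "query": user_query,
        "ranking.profile": "bm25",
        # Only return the fields listed in the `minimal` document summary
        "presentation.summary": "minimal",
        "hits": hits,
    }

body = minimal_summary_query("chocolate cake")
print(body)
# After deployment: result = app.query(body=body)
```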

## Export the configuration

At this point, the application is well defined.
Remember that the Component configuration at start configures model files to be found in a `model` directory.
We must therefore export the configuration and add the models, before we can deploy to the Vespa instance.
Export the [application package](https://docs.vespa.ai/en/application-packages.html):


In [34]:
Path("pkg").mkdir(parents=True, exist_ok=True)
app_package.to_files("pkg")

It is a good idea to inspect the files exported into `pkg` - these are files referred to in the
[Vespa Documentation](https://docs.vespa.ai/).
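
One way to inspect the export is to list every file below `pkg`. This is a minimal sketch; the helper name `list_package_files` is hypothetical, and the exact set of exported files depends on the pyvespa version, so no file names are assumed here.

```python
from pathlib import Path

def list_package_files(root: str) -> list:
    """Return all files below root as sorted, relative POSIX-style paths."""
    base = Path(root)
    return sorted(
        p.relative_to(base).as_posix() for p in base.rglob("*") if p.is_file()
    )

# Only list the package if it has been exported to `pkg`
if Path("pkg").is_dir():
    for name in list_package_files("pkg"):
        print(name)
```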


## Download model files

At this point, we can save the model files into the application package:


In [36]:
! mkdir -p pkg/model
! curl -L -o pkg/model/tokenizer.json \
  https://raw.githubusercontent.com/vespa-engine/sample-apps/master/simple-semantic-search/model/tokenizer.json

! curl -L -o pkg/model/e5-small-v2-int8.onnx \
  https://github.com/vespa-engine/sample-apps/raw/master/simple-semantic-search/model/e5-small-v2-int8.onnx

The syntax of the command is incorrect.


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  694k  100  694k    0     0  1804k      0 --:--:-- --:--:-- --:--:-- 1823k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0

 37 32.3M   37 12.0M    0     0  10.7M      0  0:00:03  0:00:01  0:00:02 10.7M
100 32.3M  100 32.3M    0     0  16.6M      0  0:00:01  0:00:01 --:--:-- 24.5M


## Deploy the application

As all the files in the app package are ready, we can start a Vespa instance - here using Docker.
Deploy the app package:


In [37]:
pip install requests_toolbelt




In [38]:
from vespa.deployment import VespaDocker

vespa_docker = VespaDocker()
app = vespa_docker.deploy_from_disk(application_name="findmypasta", application_root="pkg")

Waiting for configuration server, 0/60 seconds...
Waiting for configuration server, 5/60 seconds...
Waiting for configuration server, 10/60 seconds...
Waiting for configuration server, 15/60 seconds...
Waiting for configuration server, 20/60 seconds...
Waiting for configuration server, 25/60 seconds...
Using plain http against endpoint http://localhost:8080/ApplicationStatus
Waiting for application status, 0/300 seconds...
Using plain http against endpoint http://localhost:8080/ApplicationStatus
Waiting for application status, 5/300 seconds...
Using plain http against endpoint http://localhost:8080/ApplicationStatus
Waiting for application status, 10/300 seconds...
Using plain http against endpoint http://localhost:8080/ApplicationStatus
Waiting for application status, 15/300 seconds...
Using plain http against endpoint http://localhost:8080/ApplicationStatus
Waiting for application status, 20/300 seconds...
Using plain http against endpoint http://localhost:8080/ApplicationStatus
Wait

## Feed documents

The original sample application downloads a set of Wikipedia articles; that step is kept below, commented out, for reference:


In [None]:
# ! curl -s -H "Accept:application/vnd.github.v3.raw" \
#   https://api.github.com/repos/vespa-engine/sample-apps/contents/multi-vector-indexing/ext/articles.jsonl.zst | \
#   zstdcat - > articles.jsonl

If you do not have ZSTD installed, get `articles.jsonl.zip` and unzip it instead.

This notebook instead feeds recipes loaded from `archive/RAW_recipes.csv`.
The documents are fed and indexed using the [Vespa CLI](https://docs.vespa.ai/en/vespa-cli.html).
As part of feeding, `embed` is called on each document,
and the output of this is stored in the embedding fields, such as `step_embeddings`.
First, convert the CSV file to the JSONL feed format:


In [25]:
import pandas as pd
import ast

# Define the column types
types = {
    "contributor_id": "string",
    "name": "string",
    "id": "string",
    "minutes": "int",
    "tags": "string",
    "nutrition": "string",
    "n_steps": "int",
    "n_ingredients": "int",
    "steps": "string",
    "description": "string",
    "ingredients": "string",
    "submitted": "string"
}

# Load the CSV and drop rows with missing values
df = pd.read_csv('archive/RAW_recipes.csv', dtype=types)
df = df.dropna()
df = df.reset_index(drop=True)

# Prefix the metadata columns with a label, so they are stored as descriptive strings
df['minutes'] = 'minutes to cook: ' + df['minutes'].astype(str)
df['submitted'] = 'submitted in: ' + df["submitted"]
df['contributor_id'] = "contributor id: " + df["contributor_id"]
df['n_steps'] = 'number of steps: ' + df['n_steps'].astype(str)
df['n_ingredients'] = 'number of ingredients: ' + df['n_ingredients'].astype(str)
# The schema calls the recipe name "title"
df['title'] = df['name']


namespace = "recipes"
document_type = "findmypasta"

# Convert a row to the document format expected by Vespa
def to_vespa_format(x):
    document_id = f"id:{namespace}:{document_type}::{x['id']}"
    return {
        "put": document_id,
        "fields": {
            "title": x["name"],
            "tags": ast.literal_eval(x["tags"]),
            "steps": ast.literal_eval(x["steps"]),
            "description": x["description"],
            "ingredients": ast.literal_eval(x["ingredients"]),
            "minutes": x["minutes"],
            "nutrition": ast.literal_eval(x["nutrition"]),
            "n_steps": x["n_steps"],
            "n_ingredients": x["n_ingredients"],
            "submitted": x["submitted"],
            "contributor_id": x["contributor_id"]
        }
    }

# Create the Vespa feed
vespa_feed = df.apply(to_vespa_format, axis=1).tolist()
vespa_feed_slice = vespa_feed[0:1000]
# Save the feed to a JSONL file
with open("vespa_feed.jsonl", "w") as f:
    for item in vespa_feed_slice:
        f.write(json.dumps(item) + "\n")
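
Before feeding, it can help to sanity-check the lines written to `vespa_feed.jsonl`. The sketch below checks that a line is a `put` operation whose document id follows the `id:<namespace>:<document-type>::<key>` pattern used above and that it carries a `fields` object. The helper name `check_feed_line` and the sample line are hypothetical.

```python
import json

def check_feed_line(line: str, document_type: str = "findmypasta") -> str:
    """Return the document id of one JSONL feed line, or raise ValueError."""
    operation = json.loads(line)
    doc_id = operation.get("put", "")
    # Expected shape: id:<namespace>:<document-type>:<key/value-pairs>:<key>
    parts = doc_id.split(":", 4)
    if len(parts) != 5 or parts[0] != "id" or parts[2] != document_type:
        raise ValueError(f"unexpected document id: {doc_id!r}")
    if not isinstance(operation.get("fields"), dict):
        raise ValueError(f"missing fields in {doc_id!r}")
    return doc_id

# Hypothetical feed line in the same shape as the ones written above
sample = json.dumps(
    {"put": "id:recipes:findmypasta::63986", "fields": {"title": "hypothetical recipe"}}
)
print(check_feed_line(sample))
```

Running this over every line of the feed file before calling `vespa feed` catches malformed ids early, instead of as feed errors.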


In [None]:
# ! vespa config set target local
# ! vespa feed articles.jsonl

In [39]:
! vespa config set target local
! vespa feed vespa_feed.jsonl

{
  "feeder.operation.count": 1000,
  "feeder.seconds": 184.901,
  "feeder.ok.count": 1000,
  "feeder.ok.rate": 5.408,
  "feeder.error.count": 0,
  "feeder.inflight.count": 0,
  "http.request.count": 1000,
  "http.request.bytes": 776170,
  "http.request.MBps": 0.004,
  "http.exception.count": 0,
  "http.response.count": 1000,
  "http.response.bytes": 95492,
  "http.response.MBps": 0.001,
  "http.response.error.count": 0,
  "http.response.latency.millis.min": 1528,
  "http.response.latency.millis.avg": 3226,
  "http.response.latency.millis.max": 8380,
  "http.response.code.counts": {
    "200": 1000
  }
}


Note that creating embeddings is computationally expensive: in the run above, feeding 1000 recipes took about 185 seconds, with `embed` generating vectors for five fields per document.
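
The feeder output above can be used for a rough capacity estimate. The arithmetic below reproduces `feeder.ok.rate` from the reported counters and extrapolates to a larger feed. The figure of 10,000 documents is a hypothetical example, and the estimate assumes the rate stays constant, which may not hold on other hardware or with other documents.

```python
# Counters copied from the feeder output above
feeder_stats = {"feeder.ok.count": 1000, "feeder.seconds": 184.901}

# Documents fed per second, matching "feeder.ok.rate"
docs_per_second = feeder_stats["feeder.ok.count"] / feeder_stats["feeder.seconds"]
print(round(docs_per_second, 3))

def estimate_feed_seconds(n_docs: int, rate: float) -> float:
    """Estimate the feed time for n_docs, assuming a constant feed rate."""
    return n_docs / rate

# Hypothetical larger feed of 10,000 documents
print(round(estimate_feed_seconds(10_000, docs_per_second)))
```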

The Vespa instance is now populated with the recipes, with generated embeddings, and ready for queries.
The next sections have examples of various kinds of queries to run on the dataset.


## Simple retrieve all documents with undefined ranking

Run a query selecting _all_ documents, returning two of them.
The rank profile is the built-in `unranked` which means no ranking calculations are done,
the results are returned in random order:


In [40]:
from vespa.io import VespaQueryResponse

result: VespaQueryResponse = app.query(
    body={
        "yql": "select * from findmypasta where true",
        "ranking.profile": "unranked",
        "hits": 2,
    }
)
if not result.is_successful():
    raise ValueError(result.get_json())
if len(result.hits) != 2:
    raise ValueError("Expected 2 hits, got {}".format(len(result.hits)))
print(json.dumps(result.hits, indent=4))

[
    {
        "id": "id:recipes:findmypasta::63986",
        "relevance": 0.0,
        "source": "findmypasta_content",
        "fields": {
            "sddocname": "findmypasta",
            "steps": [
                "dredge pork chops in mixture of flour , salt , dry mustard and garlic powder",
                "brown in oil in a large skillet",
                "place browned pork chops in a crock pot",
                "add the can of soup , undiluted",
                "cover and cook on low for 6-8 hours"
            ],
            "tags": [
                "weeknight",
                "time-to-make",
                "course",
                "main-ingredient",
                "preparation",
                "main-dish",
                "pork",
                "crock-pot-slow-cooker",
                "dietary",
                "meat",
                "pork-chops",
                "equipment"
            ],
            "nutrition": [
                "105.7",
                "8.0",
 

## Traditional keyword search with BM25 ranking on the document level

Run a text-search query and use the [bm25](https://docs.vespa.ai/en/reference/bm25.html)
ranking profile configured at the start of this guide: `2*bm25(title) + bm25(description) + bm25(steps)`.
Here, we use BM25 on the `title`, `description` and `steps` text fields, giving more weight to matches in title:


In [42]:
result = app.query(
    body={
        "yql": "select * from findmypasta where userQuery()",
        "query": 24,
        "ranking.profile": "bm25",
        "hits": 2,
    }
)
if len(result.hits) != 2:
    raise ValueError("Expected 2 hits, got {}".format(len(result.hits)))
print(json.dumps(result.hits, indent=4))

[
    {
        "id": "id:recipes:findmypasta::157487",
        "relevance": 10.239079743249208,
        "source": "findmypasta_content",
        "fields": {
            "sddocname": "findmypasta",
            "steps": [
                "mix all ingredients in a glass bowl",
                "cover and chill for <hi>24</hi> hours before serving",
                "delicious !"
            ],
            "tags": [
                "time-to-make",
                "course",
                "main-ingredient",
                "preparation",
                "occasion",
                "low-protein",
                "sauces",
                "condiments-etc",
                "eggs-dairy",
                "1-day-or-more",
                "easy",
                "refrigerator",
                "beginner-cook",
                "dinner-party",
                "holiday-event",
                "vegetarian",
                "dietary",
                "copycat",
                "low-in-something",
     

## Semantic vector search on the step level

This query creates an embedding of the query "chocolate?"
and specifies the `semantic` ranking profile: `cos(distance(field,step_embeddings))`.
This will hence compute the distance between the vector in the query
and the vectors computed when indexing: `"input steps", "embed", "index", "attribute"`:


In [43]:
result = app.query(
    body={
        "yql": "select * from findmypasta where {targetHits:2}nearestNeighbor(step_embeddings,q)",
        "input.query(q)": "embed(chocolate?)",
        "ranking.profile": "semantic",
        "presentation.format.tensors": "short-value",
        "hits": 2,
    }
)
result.hits
if len(result.hits) != 2:
    raise ValueError("Expected 2 hits, got {}".format(len(result.hits)))
print(json.dumps(result.hits, indent=4))

[
    {
        "id": "id:recipes:findmypasta::85434",
        "relevance": 0.854120871134202,
        "source": "findmypasta_content",
        "fields": {
            "matchfeatures": {
                "closest(step_embeddings)": {
                    "0": 1.0
                }
            },
            "sddocname": "findmypasta",
            "steps": [
                "melt chocolate",
                "add chopped jelly snakes",
                "mix quickly into chocolate",
                "add marshmallows",
                "spread into greased swiss roll tray and put into fridge to set",
                "when set , chop into squares and put into patty cases",
                "serve as sweet with coffee",
                "ps",
                "you can also add chopped nuts or chopped licorice to this recipe as well"
            ],
            "tags": [
                "15-minutes-or-less",
                "time-to-make",
                "course",
                "main-ingredient",


An interesting question then is, of the steps in the recipe, which one was the closest?
When analysing ranking,
using [match-features](https://docs.vespa.ai/en/reference/schema-reference.html#match-features)
lets you export the scores used in the ranking calculations, see
[closest](<https://docs.vespa.ai/en/reference/rank-features.html#closest(name)>) - from the result above:

```
 "matchfeatures": {
                "closest(step_embeddings)": {
                    "0": 1.0
                }
}
```

This means the vector at index 0 is the closest match. With this, it is straightforward to feed recipes with an array of steps and highlight the best matching step in the document!


In [46]:
def find_best_paragraph(hit: dict) -> str:
    paragraphs = hit["fields"]["steps"]
    match_features = hit["fields"]["matchfeatures"]
    index = int(list(match_features["closest(step_embeddings)"].keys())[0])
    return paragraphs[index]

In [47]:
find_best_paragraph(result.hits[0])

'melt chocolate'
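
To illustrate what the multi-vector matching above computes, here is a brute-force sketch in plain Python. Each document holds several vectors (one per step), a document is ranked by its single closest vector, and the index of that vector is reported, similar to what `closest(step_embeddings)` returns. This is an exact search for illustration only; HNSW approximates the same result without comparing against every vector. The document ids and two-dimensional vectors are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def closest_vector(query, vectors):
    """Return (index, similarity) of the vector closest to the query."""
    scores = [cosine(query, v) for v in vectors]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]

def exact_search(query, docs, target_hits):
    """Rank documents by their closest vector, best first."""
    ranked = []
    for doc_id, vectors in docs.items():
        index, score = closest_vector(query, vectors)
        ranked.append((score, doc_id, index))
    ranked.sort(reverse=True)
    return [(doc_id, index, score) for score, doc_id, index in ranked[:target_hits]]

# Hypothetical documents with one vector per step
docs = {
    "recipe-a": [[1.0, 0.0], [0.0, 1.0]],
    "recipe-b": [[-1.0, 0.0], [0.6, 0.8]],
}
results = exact_search([1.0, 0.0], docs, target_hits=2)
print(results)
```

Here `recipe-a` ranks first through its step at index 0, and `recipe-b` second through its step at index 1.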

## Hybrid search and ranking

Hybrid search combines keyword search on the document level with vector search in the step embedding index:


In [52]:
result = app.query(
    body={
        "yql": "select * from findmypasta where userQuery() or ({targetHits:6}nearestNeighbor(step_embeddings,q))",
        "input.query(q)": "embed(chocolate)",
        "query": "chocolate cake",
        "ranking.profile": "hybrid",
        "presentation.format.tensors": "short-value",
        "hits": 6,
    }
)
if len(result.hits) != 6:
    raise ValueError("Expected 6 hits, got {}".format(len(result.hits)))
print(json.dumps(result.hits, indent=4))

[
    {
        "id": "id:recipes:findmypasta::392181",
        "relevance": 4.94590496132451,
        "source": "findmypasta_content",
        "fields": {
            "matchfeatures": {
                "bm25(description)": 6.668711207976459,
                "bm25(steps)": 7.984511600035208,
                "bm25(title)": 2.0003665414910587,
                "closest(step_embeddings)": {
                    "14": 1.0
                },
                "firstPhase": 0.8350588786320291,
                "all_paragraph_similarities": {
                    "0": 0.7568165063858032,
                    "1": 0.775661826133728,
                    "2": 0.7706385850906372,
                    "3": 0.7548873424530029,
                    "4": 0.7492822408676147,
                    "5": 0.7702987194061279,
                    "6": 0.7517319917678833,
                    "7": 0.7745205163955688,
                    "8": 0.8311383724212646,
                    "9": 0.7718607783317566,
              

This case combines exact search with nearestNeighbor search. The `hybrid` rank-profile above
also calculates several additional features using
[tensor expressions](https://docs.vespa.ai/en/tensor-user-guide.html):

- `firstPhase` is the score of the first ranking phase, configured in the hybrid
  profile as `cos(distance(field, step_embeddings))`.
- `all_paragraph_similarities` returns the similarity scores for all steps of the recipe.
- `avg_paragraph_similarity` is the average similarity score across all the steps.
- `max_paragraph_similarity` is the same as `firstPhase`, but computed using a tensor expression.

These additional features are calculated during [second-phase ranking](https://docs.vespa.ai/en/phased-ranking.html)
to limit the number of vector computations.
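
The tensor expressions behind these features can be mirrored in plain Python. The sketch below is an illustration, not Vespa code: it normalizes the query and each step vector, takes the dot product over dimension `x`, and then reduces over the steps (dimension `p`) with average and max. The vectors are hypothetical and two-dimensional.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def all_similarities(query, step_vectors):
    """Cosine similarity per step, keyed by step index."""
    q = l2_normalize(query)
    return {
        p: sum(a * b for a, b in zip(q, l2_normalize(v)))
        for p, v in enumerate(step_vectors)
    }

query = [1.0, 0.0]                             # hypothetical query vector
steps = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]   # hypothetical step vectors

sims = all_similarities(query, steps)           # like all_paragraph_similarities
avg_similarity = sum(sims.values()) / len(sims) # like avg_paragraph_similarity
max_similarity = max(sims.values())             # like max_paragraph_similarity
print(sims, avg_similarity, max_similarity)
```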

The [Tensor Playground](https://docs.vespa.ai/playground/) is useful to play with tensor expressions.

The [Hybrid Search](https://blog.vespa.ai/improving-zero-shot-ranking-with-vespa/) blog post series
is a good read to learn more about hybrid ranking!


In [None]:
def find_paragraph_scores(hit: dict) -> list:
    # The embedded "paragraphs" in this application are the recipe steps
    paragraphs = hit["fields"]["steps"]
    match_features = hit["fields"]["matchfeatures"]
    similarities = match_features["all_paragraph_similarities"]
    # Keys are the step indexes (as strings), values are the similarity scores
    indexes = [int(v) for v in similarities]
    scores = list(similarities.values())
    return list(zip([paragraphs[i] for i in indexes], scores))

In [None]:
find_paragraph_scores(result.hits[0])


## Hybrid search and filter

YQL is a structured query language.
In the query examples, the user input is fed as-is using the `userQuery()` operator.

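Filters are normally separate from the user input.
The original sample application adds a filter, `url contains "9985"`, to the YQL string; this schema has no `url` field, so the query below runs without a filter and without `userQuery()`, using only the `nearestNeighbor` operator.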
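
As a sketch of how a filter can be kept separate from the user input, the helper below builds a YQL string where an optional filter is combined with `userQuery()` and `nearestNeighbor` using `and`. The helper name `build_yql` and the filter `title contains "cake"` are hypothetical examples. The filter clause is inserted as-is, so it should come from application code, not directly from end users.

```python
def build_yql(schema: str, embedding_field: str, target_hits: int, filter_clause: str = "") -> str:
    """Build a hybrid YQL query, optionally restricted by a filter clause."""
    retrieval = (
        f"userQuery() or ({{targetHits:{target_hits}}}"
        f"nearestNeighbor({embedding_field},q))"
    )
    if filter_clause:
        # The filter must hold for both the keyword and the vector matches
        return f"select * from {schema} where {filter_clause} and ({retrieval})"
    return f"select * from {schema} where {retrieval}"

yql = build_yql("findmypasta", "step_embeddings", 1, 'title contains "cake"')
print(yql)
```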

Finally, use the [Query API](https://docs.vespa.ai/en/query-api.html) for other options, like highlighting -
here disable [bolding](https://docs.vespa.ai/en/reference/schema-reference.html#bolding):


In [51]:
result = app.query(
    body={
        "yql": 'select * from findmypasta where ({targetHits:1}nearestNeighbor(step_embeddings,q))',
        "input.query(q)": "embed(what does 24 mean in the context of railways)",
        "query": "what does 24 mean in the context of railways",
        "ranking.profile": "hybrid",
        "bolding": False,
        "presentation.format.tensors": "short-value",
        "hits": 1,
    }
)
if len(result.hits) != 1:
    raise ValueError("Expected one hit, got {}".format(len(result.hits)))
print(json.dumps(result.hits, indent=4))

[
    {
        "id": "id:recipes:findmypasta::336323",
        "relevance": "-Infinity",
        "source": "findmypasta_content",
        "fields": {
            "matchfeatures": {
                "bm25(description)": 0.0,
                "bm25(steps)": 0.0,
                "bm25(title)": 0.0,
                "closest(step_embeddings)": {
                    "1": 1.0
                },
                "firstPhase": 0.7806437331403645,
                "all_paragraph_similarities": {
                    "0": 0.7388210296630859,
                    "1": 0.7806437015533447,
                    "2": 0.7215437889099121,
                    "3": 0.7284496426582336,
                    "4": 0.7706419229507446,
                    "5": 0.7345936298370361,
                    "6": 0.7291796207427979,
                    "7": 0.7272926568984985,
                    "8": 0.7424706220626831,
                    "9": 0.7324963808059692,
                    "10": 0.7383525371551514
                }

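In short, the above query demonstrates how easy it is to combine various ranking strategies,
and also combine with filters.
Note the `-Infinity` relevance in the result: the query has no `userQuery()` term, so all `bm25` features are 0 and the `log(...)` term of the second-phase expression evaluates to negative infinity.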

To learn more about pre-filtering vs post-filtering,
read [Filtering strategies and serving performance](https://blog.vespa.ai/constrained-approximate-nearest-neighbor-search/).
[Semantic search with multi-vector indexing](https://blog.vespa.ai/semantic-search-with-multi-vector-indexing/)
is a great read overall for this domain.


## Cleanup


In [50]:
vespa_docker.container.stop()
vespa_docker.container.remove()