<picture>
  <source media="(prefers-color-scheme: dark)" srcset="https://vespa.ai/assets/vespa-ai-logo-heather.svg">
  <source media="(prefers-color-scheme: light)" srcset="https://vespa.ai/assets/vespa-ai-logo-rock.svg">
  <img alt="#Vespa" width="200" src="https://vespa.ai/assets/vespa-ai-logo-rock.svg" style="margin-bottom: 25px;">
</picture>

# Multi-vector indexing with HNSW

This is the pyvespa steps of the multi-vector-indexing sample application.
Go to the [source](https://github.com/vespa-engine/sample-apps/tree/master/multi-vector-indexing)
for a full description and prerequisites,
and read the [blog post](https://blog.vespa.ai/semantic-search-with-multi-vector-indexing/).
Highlighted features:

- Approximate Nearest Neighbor Search - using HNSW or exact
- Use a Component to configure the Huggingface embedder.
- Using synthetic fields with auto-generated
  [embeddings](https://docs.vespa.ai/en/embedding.html) in data and query flow.
- Application package file export, model files in the application package, deployment from files.
- [Multiphased ranking](https://docs.vespa.ai/en/phased-ranking.html).
- How to control text search result highlighting.


<div class="alert alert-info">
    Refer to <a href="https://pyvespa.readthedocs.io/en/latest/troubleshooting.html">troubleshooting</a>
    for any problem when running this guide.
</div>


This notebook requires [pyvespa >= 0.37.1](https://pyvespa.readthedocs.io/en/latest/index.html#requirements),
ZSTD, and the [Vespa CLI](https://docs.vespa.ai/en/vespa-cli.html).


In [20]:
!pip3 install pyvespa




[notice] A new release of pip is available: 23.3.1 -> 24.0
[notice] To update, run: C:\Users\vanes\AppData\Local\Microsoft\WindowsApps\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\python.exe -m pip install --upgrade pip


## Create the application

Configure the Vespa instance with a component loading the E5-small model.
Components are used to plug in code and models to a Vespa application -
[read more](https://docs.vespa.ai/en/jdisc/container-components.html):


In [88]:
from vespa.package import (
    ApplicationPackage,
    Component,
    Parameter,
    Field,
    HNSW,
    RankProfile,
    Function,
    FirstPhaseRanking,
    SecondPhaseRanking,
    FieldSet,
    DocumentSummary,
    Summary,
)
from pathlib import Path
import json
import pandas as pd
import ast
import numpy as np

app_package = ApplicationPackage(
    name="findmypasta",
    components=[
        Component(
            id="e5-small-q",
            type="hugging-face-embedder",
            parameters=[
                Parameter("transformer-model", {"path": "model/e5-small-v2-int8.onnx"}),
                Parameter("tokenizer-model", {"path": "model/tokenizer.json"}),
            ],
        )
    ],
)

## Configure fields

Vespa has a variety of basic and complex
[field types](https://docs.vespa.ai/en/reference/schema-reference.html#field).
This application uses a combination of integer, text and tensor fields,
making it easy to implement hybrid ranking use cases:


In [90]:
app_package.schema.add_fields(
    Field(name="id", type="int", indexing=["attribute", "summary"]),
    Field(
        name="title", type="string", indexing=["index", "summary"], index="enable-bm25"
    ),
    Field(
        name="description", type="string", indexing=["index", "summary"], index="enable-bm25"
    ),
    Field(
        name="minutes",
        type="string",
        indexing=["summary"],
    ),
    Field(
        name="n_steps",
        type="string",
        indexing=["attribute", "summary"],
    ),
    Field(
        name="n_ingredients",
        type="string",
        indexing=["attribute", "summary"],
    ),
    Field(
        name="submitted",
        type="string",
        indexing=["attribute", "summary"],
    ),
    Field(
        name="body",
        type="string", 
        indexing=["index", "summary"],
        index="enable-bm25",
        bolding=True
    ),
    Field(
        name = "body_split",
        type = "array<string>",
        indexing = ["index", "summary"],
        index = "enable-bm25",
        bolding = True,
    ),
    Field(
        name="tags",
        type="array<string>",
        indexing=["index", "summary"],
        index="enable-bm25",
        bolding=True,
    ),
    Field(
        name="steps",
        type="array<string>",
        indexing=["index", "summary"],
        index="enable-bm25",
        bolding=True,
    ),
    Field(
        name="ingredients",
        type="array<string>",
        indexing=["index", "summary"],
        index="enable-bm25",
        bolding=True,
    ),
    Field(name="embedding", type="tensor<float>(x[384])",
        indexing=["input body", "embed", "index", "attribute"],
        ann=HNSW(distance_metric="angular"),
        is_document_field=False
    ),
    Field(
        name="description_embeddings",
        type="tensor<float>(x[384])",
        indexing=["input description", "embed", "index", "attribute"],
        ann=HNSW(distance_metric="angular"),
        is_document_field=False,
    ),
    Field(
        name="tag_embeddings",
        type="tensor<float>(p{},x[384])",
        indexing=["input tags", "embed", "index", "attribute"],
        ann=HNSW(distance_metric="angular"),
        is_document_field=False,
    ),
    Field(
        name="step_embeddings",
        type="tensor<float>(p{},x[384])",
        indexing=["input steps", "embed", "index", "attribute"],
        ann=HNSW(distance_metric="angular"),
        is_document_field=False,
    ),
    Field(
        name="ingredient_embeddings",
        type="tensor<float>(p{},x[384])",
        indexing=["input ingredients", "embed", "index", "attribute"],
        ann=HNSW(distance_metric="angular"),
        is_document_field=False,
    ),
    Field(
        name="body_split_embedding",
        type="tensor<float>(p{},x[384])",
        indexing=["input body_split", "embed", "index", "attribute"],
        ann=HNSW(distance_metric="angular"),
        is_document_field=False,
    ),
    #
    # Alteratively, for exact distance calculation not using HNSW:
    #
    # Field(name="paragraph_embeddings", type="tensor<float>(p{},x[384])",
    #       indexing=["input paragraphs", "embed", "attribute"],
    #       attribute=["distance-metric: angular"],
    #       is_document_field=False)
)

One field of particular interest is `embeddings`.
Note that we are _not_ feeding embeddings to this instance.
Instead, the embeddings are generated by using the [embed](https://docs.vespa.ai/en/embedding.html)
feature, using the model configured at start.
Read more in [Text embedding made simple](https://blog.vespa.ai/text-embedding-made-simple/).

Looking closely at the code, `paragraph_embeddings` uses `is_document_field=False`, meaning it will read another field as input (here `paragraph`), and run `embed` on it.

As only one model is configured, `embed` will use that one -
it is possible to configure mode models and use `embed model-id` as well.

As the code comment illustrates, there can be different distrance metrics used,
as well as using an _exact_ or _approximate_ nearest neighbor search.

## Configure rank profiles

A rank profile defines the computation for the ranking,
with a wide range of possible features as input.
Below you will find `first_phase` ranking using text ranking (`bm`),
semantic ranking using vector distance (consider a tensor a vector here),
and combinations of the two:


In [108]:
app_package.schema.add_rank_profile(
    RankProfile(
        name="bm25_2", 
        inputs=[("query(q)", "tensor<float>(x[384])")],
        functions=[Function(
            name="bm25sum", expression="bm25(title) + bm25(body)"
        )],
        first_phase="bm25sum"
    )
)

app_package.schema.add_rank_profile(
    RankProfile(
        name="semantic",
        inputs=[("query(q)", "tensor<float>(x[384])")],
        inherits="default",
        first_phase="cos(distance(field,body_split_embedding))",
        match_features=["closest(body_split_embedding)"],
    )
)

app_package.schema.add_rank_profile(
    RankProfile(
        name="semantic_2", 
        inputs=[("query(q)", "tensor<float>(x[384])")],
        first_phase="closeness(field, embedding)"
    )
)

app_package.schema.add_rank_profile(
    RankProfile(
        name="bm25_3", 
        inputs=[("query(q)", "tensor<float>(x[384])")],
        first_phase="3*bm25(title) + 2*bm25(description) + 2*bm25(tags) + bm25(steps)"
    )
)


app_package.schema.add_rank_profile(
    RankProfile(
        name="hybrid_2",
        inherits="semantic_2",
        functions=[
            Function(
                name="avg_fields_similarity",
                expression="""reduce(
                              sum(l2_normalize(query(q),x) * l2_normalize(attribute(body_split_embedding),x),x),
                              avg,
                              p
                          )""",
            ),
            Function(
                name="max_fields_similarity",
                expression="""reduce(
                              sum(l2_normalize(query(q),x) * l2_normalize(attribute(body_split_embedding),x),x),
                              max,
                              p
                          )""",
            ),
            Function(
                name="all_fields_similarities",
                expression="sum(l2_normalize(query(q),x) * l2_normalize(attribute(body_split_embedding),x),x)",
            ),
            
        ],
        first_phase=FirstPhaseRanking(
            expression="cos(distance(field,embedding))"
        ),
        second_phase=SecondPhaseRanking(
            expression="firstPhase + avg_fields_similarity() + log( bm25(title) + bm25(description))"
        ),
        match_features=[
            "closest(step_embeddings)",
            "firstPhase",
            "bm25(title)",
            "bm25(description)",
            "bm25(steps)", 
            "avg_fields_similarity",
            "max_fields_similarity",
            "all_fields_similarities",
        ],
    )
)

In [113]:
app_package.schema.add_rank_profile(
    RankProfile(
        name="hybrid_3",
        inherits="semantic_2",
        functions=[
            Function(
                name="avg_fields_similarity",
                expression="""reduce(
                              sum(l2_normalize(query(q),x) * l2_normalize(attribute(body_split_embedding),x),x),
                              avg,
                              p
                          )""",
            ),
            Function(
                name="max_fields_similarity",
                expression="""reduce(
                              sum(l2_normalize(query(q),x) * l2_normalize(attribute(body_split_embedding),x),x),
                              max,
                              p
                          )""",
            ),
            Function(
                name="all_fields_similarities",
                expression="sum(l2_normalize(query(q),x) * l2_normalize(attribute(body_split_embedding),x),x)",
            ),
            
        ],
        first_phase=FirstPhaseRanking(
            expression="cos(distance(field,embedding))"
        ),
        second_phase=SecondPhaseRanking(
            expression="firstPhase + avg_fields_similarity()"
        ),
        match_features=[
            "closest(step_embeddings)",
            "firstPhase",
            "bm25(title)",
            "bm25(description)",
            "bm25(steps)", 
            "avg_fields_similarity",
            "max_fields_similarity",
            "all_fields_similarities",
        ],
    )
)

In [117]:
app_package.schema.add_rank_profile(
    RankProfile(
        name="semantic_3", 
        inputs=[("query(q)", "tensor<float>(x[384])")],
        first_phase="closeness(field, body_split_embedding)"
    )
)

## Configure fieldset

A [fieldset](https://docs.vespa.ai/en/reference/schema-reference.html#fieldset)
is a way to configure search in multiple fields:


In [99]:
# app_package.schema.add_field_set(
#     FieldSet(name="default", fields=["title", "tags", "steps", "description", "ingredients"])
# )

## Configure document summary

A [document summary](https://docs.vespa.ai/en/document-summaries.html)
is the collection of fields to return in query results -
the default summary is used unless other specified in the query.

In [94]:
# app_package.schema.add_document_summary(
#     DocumentSummary(
#         name="minimal",
#         summary_fields=[Summary("id", "int"), Summary("title", "string")],
#     )
# )

## Export the configuration

At this point, the application is well defined.
Remember that the Component configuration at start configures model files to be found in a `model` directory.
We must therefore export the configuration and add the models, before we can deploy to the Vespa instance.
Export the [application package](https://docs.vespa.ai/en/application-packages.html):


In [118]:
Path("pkg").mkdir(parents=True, exist_ok=True)
app_package.to_files("pkg")

It is a good idea to inspect the files exported into `pkg` - these are files referred to in the
[Vespa Documentation](https://docs.vespa.ai/).


## Download model files

At this point, we can save the model files into the application package:


In [86]:
! mkdir -p pkg/model
! curl -L -o pkg/model/tokenizer.json \
  https://raw.githubusercontent.com/vespa-engine/sample-apps/master/simple-semantic-search/model/tokenizer.json

! curl -L -o pkg/model/e5-small-v2-int8.onnx \
  https://github.com/vespa-engine/sample-apps/raw/master/simple-semantic-search/model/e5-small-v2-int8.onnx

A sintaxe do comando est� incorreta.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  694k  100  694k    0     0  1038k      0 --:--:-- --:--:-- --:--:-- 1041k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:02 --:--:--     0
  8 32.3M   

## Deploy the application

As all the files in the app package are ready, we can start a Vespa instance - here using Docker.
Deploy the app package:


In [121]:
# pip install requests_toolbelt

In [119]:
from vespa.deployment import VespaDocker

vespa_docker = VespaDocker()
app = vespa_docker.deploy_from_disk(application_name="findmypasta", application_root="pkg")

# vespa_docker = VespaDocker()
# app = vespa_docker.deploy(application_package=app_package)

Waiting for configuration server, 0/60 seconds...
Waiting for configuration server, 5/60 seconds...
Waiting for configuration server, 10/60 seconds...
Using plain http against endpoint http://localhost:8080/ApplicationStatus
Waiting for application status, 0/300 seconds...
Using plain http against endpoint http://localhost:8080/ApplicationStatus
Waiting for application status, 5/300 seconds...
Using plain http against endpoint http://localhost:8080/ApplicationStatus
Waiting for application status, 10/300 seconds...
Using plain http against endpoint http://localhost:8080/ApplicationStatus
Waiting for application status, 15/300 seconds...
Using plain http against endpoint http://localhost:8080/ApplicationStatus
Waiting for application status, 20/300 seconds...
Using plain http against endpoint http://localhost:8080/ApplicationStatus
Waiting for application status, 25/300 seconds...
Using plain http against endpoint http://localhost:8080/ApplicationStatus
Waiting for application status, 3

## Feed documents


In [None]:
def recipe_file_body_lines(recipe, complementary_data = None):
    """
    Function responsible for creating the recipe body.
    """
    # Transformar as colunas de strings para listas
    recipe['tags'] = recipe['tags'].strip("[]").replace("'", "").split(', ')
    recipe['steps'] = recipe['steps'].strip("[]").replace("'", "").split(', ')
    recipe['ingredients'] = recipe['ingredients'].strip("[]").replace("'", "").split(', ')

    # reviews = complementary_data[complementary_data['recipe_id'] == recipe['id']]

    # # ordering by descending date
    # reviews = reviews.sort_values('date', ascending=False)

    # # getting the average rating
    # avg_rating = reviews['rating'].mean()

    # # if the average rating is NaN, we will set it to "No reviews"
    # if np.isnan(avg_rating):
    #     avg_rating = "No reviews"

    # creating the recipe body
    recipe_body = recipe['name'] + '\n' \
    + "Recipe posted on: " + str(recipe['submitted']) + '\n' \
    + "Tags: " + ', '.join(recipe['tags']) + '\n' \
    + "Description: " + recipe['description'] + '\n' \
    + "This recipe takes " + str(recipe['minutes']) + " minutes to be done." + '\n' \
    + "For this recipe you will need the ingredients: " + '\n' \
    + ', '.join(recipe['ingredients']) + '\n' \
    + "The " + str(recipe["n_steps"]) + " steps to make this recipe are: " + '\n' \
    + ', '.join(recipe['steps']) 
    return recipe_body

In [None]:
# Função para aplicar recipe_file_body_lines a cada linha do DataFrame de receitas
def apply_recipe_file_body_lines(recipe_row):
    return recipe_file_body_lines(recipe_row)

In [None]:
# # Carregando o CSV e removendo valores nulos
# df = pd.read_csv('archive/RAW_recipes.csv')
# df = df.dropna()
# df = df.reset_index(drop=True)

# df_reviews = pd.read_csv('archive/RAW_interactions.csv')
# df_reviews = df_reviews.dropna()
# df_reviews = df_reviews.reset_index(drop=True)
# df_reviews['review'] = df_reviews['review'].apply(treat_text)
# df_reviews['review+rating'] = '"' + df_reviews['review'] + '"' + ' - User rating: ' + df_reviews['rating'].astype(str)

In [179]:
# # Carregando o CSV e removendo valores nulos
# df = pd.read_csv('archive/RAW_recipes.csv')
# df = df.dropna()
# df = df.reset_index(drop=True)

# df['body'] = df.apply(apply_recipe_file_body_lines, axis=1)
# df['body_split'] = df['body'].str.split('\n')

# df['minutes'] = "This recipe takes " + df['minutes'].astype(str) + " minutes to be done."
# df['submitted'] = 'Recipe submitted on: ' + df["submitted"]
# df['tags'] = df["tags"]
# df['n_steps'] = 'Number of steps to make this recipe: ' + df['n_steps'].astype(str)
# df['n_ingredients'] = 'Number of ingredients: ' + df['n_ingredients'].astype(str)
# df['steps'] = df["steps"]
# df['description'] = df["description"]
# df['ingredients'] = df["ingredients"]
# df['title'] = df['name']

# namespace = "recipes"
# document_type = "findmypasta"

# # Função para converter o formato dos dados para o formato esperado pelo Vespa
# def to_vespa_format(x):
#     document_id = f"id:{namespace}:{document_type}::{x['id']}"
#     return {
#         "put": document_id,
#         "fields": {
#             "id": x["id"],
#             "title": x["name"],
#             "tags": ast.literal_eval(x["tags"]),
#             "steps": ast.literal_eval(x["steps"]),
#             "description": x["description"],
#             "ingredients": ast.literal_eval(x["ingredients"]),
#             "minutes": x["minutes"],
#             "n_steps": x["n_steps"],
#             "n_ingredients": x["n_ingredients"],
#             "submitted": x["submitted"],
#             "body": x["body"],
#             "body_split": x["body_split"]
#         }
#     }

# # Criando o feed do Vespa
# vespa_feed = df.apply(to_vespa_format, axis=1).tolist()
# vespa_feed_slice = vespa_feed[0:10000]
# # Salvando o feed em um arquivo JSONL
# with open("vespa_feed2.jsonl", "w") as f:
#     for item in vespa_feed_slice:
#         f.write(json.dumps(item) + "\n")

In [180]:
# ! vespa config set target local
# ! vespa feed vespa_feed2.jsonl

{
  "feeder.operation.count": 10000,
  "feeder.seconds": 16113.020,
  "feeder.ok.count": 3010,
  "feeder.ok.rate": 0.187,
  "feeder.error.count": 70570,
  "feeder.inflight.count": 0,
  "http.request.count": 73580,
  "http.request.bytes": 3073306,
  "http.request.MBps": 0.000,
  "http.exception.count": 70570,
  "http.response.count": 3010,
  "http.response.bytes": 287530,
  "http.response.MBps": 0.000,
  "http.response.error.count": 70570,
  "http.response.latency.millis.min": 1,
  "http.response.latency.millis.avg": 2131,
  "http.response.latency.millis.max": 3179913,
  "http.response.code.counts": {
    "200": 3010
  }
}


feed: got error "Post "http://127.0.0.1:8080/document/v1/recipes/findmypasta/docid/385912": unexpected EOF" (no body) for put id:recipes:findmypasta::385912: giving up after 10 attempts
feed: got error "Post "http://127.0.0.1:8080/document/v1/recipes/findmypasta/docid/314417": unexpected EOF" (no body) for put id:recipes:findmypasta::314417: giving up after 10 attempts
feed: got error "Post "http://127.0.0.1:8080/document/v1/recipes/findmypasta/docid/204480": unexpected EOF" (no body) for put id:recipes:findmypasta::204480: giving up after 10 attempts
feed: got error "Post "http://127.0.0.1:8080/document/v1/recipes/findmypasta/docid/407621": unexpected EOF" (no body) for put id:recipes:findmypasta::407621: giving up after 10 attempts
feed: got error "Post "http://127.0.0.1:8080/document/v1/recipes/findmypasta/docid/162376": unexpected EOF" (no body) for put id:recipes:findmypasta::162376: giving up after 10 attempts
feed: got error "Post "http://127.0.0.1:8080/document/v1/recipes/findm

# Add Results

In [36]:
# loading the Questions.xlsx and answering each question query
import pandas as pd
questions = pd.read_excel('Search_Engine/input/Questions.xlsx')
questions = pd.read_excel('Search_Engine/input/Recipe_Search_Questions.xlsx')

In [120]:
from vespa.io import VespaQueryResponse
import json

# Supondo que 'questions' é um DataFrame com colunas ['Query', 'Tipo', 'Descrição']
data = pd.DataFrame(columns=['id', 'title', 'Query', 'Tipo', 'Descrição'])

model_to_ranking_dict = {
    "bm25_2": "bm25_2",
    "bm25_3": "bm25_3", 
    "semantic_2": "semantic_2",
    "semantic_3": "semantic_3",
    "hybrid_2": "hybrid_2",
    "hybrid_3": "hybrid_3",
}

selected_model = "semantic_3"

assert selected_model in model_to_ranking_dict.keys()

output_name = 'output/Results_' + selected_model + '_extraQuestions' + '.xlsx'

if model_to_ranking_dict[selected_model] is not None:
    i = 0
    for input_query in questions['Query']:
        # save a checkpoint each 100 queries
        if i % 100 == 0:
            data.to_excel(output_name, index=False)

        with app.syncio(connections=1) as session:
            try:
                response: VespaQueryResponse = session.query(
                    yql="select * from sources * where ({targetHits:1000}nearestNeighbor(embedding,q)) limit 5",
                    query=input_query,
                    ranking=model_to_ranking_dict[selected_model],
                    body={
                        "input.query(q)": f"embed({input_query})",
                        "timeout": "30s"  # Aumentar o tempo limite para 10 segundos
                    }
                )
                assert response.is_successful()
            except Exception as e:
                print(f"Error with query '{input_query}': {e}")
                continue

            for hit in response.hits:
                record = {}
                for field in ['id', 'title']:
                    record[field] = hit['fields'].get(field, None)
                record["Query"] = input_query
                record["Tipo"] = questions[questions['Query'] == input_query]['Tipo'].values[0]
                record["Descrição"] = questions[questions['Query'] == input_query]['Descrição'].values[0]
                data = pd.concat([data, pd.DataFrame([record])], ignore_index=True)

        i += 1

    # Sorting
    data = data.sort_values(by=['Tipo', 'Query'])

    # reordering columns
    data = data[['Tipo', 'Descrição', 'Query', 'id', 'title']]

    # exporting to excel
    data.to_excel(output_name, index=False)


## Semantic vector search on the paragraph level

This query creates an embedding of the query "what does 24 mean in the context of railways"
and specifies the `semantic` ranking profile: `cos(distance(field,paragraph_embeddings))`.
This will hence compute the distance between the vector in the query
and the vectors computed when indexing: `"input paragraphs", "embed", "index", "attribute"`:


In [62]:
result = app.query(
    body={
        "yql": "select * from findmypasta where {targetHits:2}nearestNeighbor(step_embeddings,q)",
        "input.query(q)": "embed(chocolate?)",
        "ranking.profile": "semantic_2",
        "presentation.format.tensors": "short-value",
        "hits": 2,
    }
)
result.hits
if len(result.hits) != 2:
    raise ValueError("Expected 2 hits, got {}".format(len(result.hits)))
print(json.dumps(result.hits, indent=4))

[
    {
        "id": "id:recipes:findmypasta::306471",
        "relevance": 0.0,
        "source": "findmypasta_content",
        "fields": {
            "sddocname": "findmypasta",
            "body_split": [
                "7 layer cookies",
                "Recipe posted on: 2008-05-30",
                "Tags: 60-minutes-or-less, time-to-make, course, cuisine, preparation, occasion, north-american, desserts, american, easy, beginner-cook, holiday-event, cookies-and-brownies, comfort-food, taste-mood, sweet",
                "Description: a very nice lady from the hospital i gotta go to gave me this recipe the other day and said these are the best cookies ever :)\r",
                "i made them for the first time today and they were good.\r",
                "i would suggest to use a square pan ( i have a 8x8x2.5 inch glass baking pan and think that it is perfect) and might use more graham crackers next time.\r",
                "i am posting the recipe as i received it.\r",
     

An interesting question then is, of the paragraphs in the document, which one was the closest?
When analysing ranking,
using [match-features](https://docs.vespa.ai/en/reference/schema-reference.html#match-features)
lets you export the scores used in the ranking calculations, see
[closest](<https://docs.vespa.ai/en/reference/rank-features.html#closest(name)>) - from the result above:

```
 "matchfeatures": {
                "closest(paragraph_embeddings)": {
                    "4": 1.0
                }
}
```

This means, the tensor of index 4 has the closest match. With this, it is straight forward to feed articles with an array of paragraphs and highlight the best matching paragraph in the document!


In [None]:
def find_best_paragraph(hit: dict) -> str:
    paragraphs = hit["fields"]["steps"]
    match_features = hit["fields"]["matchfeatures"]
    index = int(list(match_features["closest(step_embeddings)"].keys())[0])
    return paragraphs[index]

In [None]:
find_best_paragraph(result.hits[0])

'melt chocolate'

## Hybrid search and ranking

Hybrid combining keyword search on the article level with vector search in the paragraph index:


In [None]:
result = app.query(
    body={
        "yql": "select * from findmypasta where userQuery() or ({targetHits:6}nearestNeighbor(step_embeddings,q))",
        "input.query(q)": "embed(chocolate)",
        "query": "chocolate cake",
        "ranking.profile": "hybrid",
        "presentation.format.tensors": "short-value",
        "hits": 6,
    }
)
if len(result.hits) != 6:
    raise ValueError("Expected 1 hits, got {}".format(len(result.hits)))
print(json.dumps(result.hits, indent=4))

[
    {
        "id": "id:recipes:findmypasta::392181",
        "relevance": 4.94590496132451,
        "source": "findmypasta_content",
        "fields": {
            "matchfeatures": {
                "bm25(description)": 6.668711207976459,
                "bm25(steps)": 7.984511600035208,
                "bm25(title)": 2.0003665414910587,
                "closest(step_embeddings)": {
                    "14": 1.0
                },
                "firstPhase": 0.8350588786320291,
                "all_paragraph_similarities": {
                    "0": 0.7568165063858032,
                    "1": 0.775661826133728,
                    "2": 0.7706385850906372,
                    "3": 0.7548873424530029,
                    "4": 0.7492822408676147,
                    "5": 0.7702987194061279,
                    "6": 0.7517319917678833,
                    "7": 0.7745205163955688,
                    "8": 0.8311383724212646,
                    "9": 0.7718607783317566,
              

This case combines exact search with nearestNeighbor search. The `hybrid` rank-profile above
also calculates several additional features using
[tensor expressions](https://docs.vespa.ai/en/tensor-user-guide.html):

- `firstPhase` is the score of the first ranking phase, configured in the hybrid
  profile as `cos(distance(field, paragraph_embeddings))`.
- `all_paragraph_similarities` returns all the similarity scores for all paragraphs.
- `avg_paragraph_similarity` is the average similarity score across all the paragraphs.
- `max_paragraph_similarity` is the same as `firstPhase`, but computed using a tensor expression.

These additional features are calculated during [second-phase ranking](https://docs.vespa.ai/en/phased-ranking.html)
to limit the number of vector computations.

The [Tensor Playground](https://docs.vespa.ai/playground/) is useful to play with tensor expressions.

The [Hybrid Search](https://blog.vespa.ai/improving-zero-shot-ranking-with-vespa/) blog post series
is a good read to learn more about hybrid ranking!


In [None]:
def find_paragraph_scores(hit: dict) -> str:
    paragraphs = hit["fields"]["paragraphs"]
    match_features = hit["fields"]["matchfeatures"]
    indexes = [int(v) for v in match_features["all_paragraph_similarities"]]
    scores = list(match_features["all_paragraph_similarities"].values())
    return list(zip([paragraphs[i] for i in indexes], scores))

In [None]:
find_paragraph_scores(result.hits[0])

[('<hi>The</hi> <hi>24</hi>-hour clock is a way <hi>of</hi> telling <hi>the</hi> time <hi>in</hi> which <hi>the</hi> day runs from midnight to midnight and is divided into <hi>24</hi> hours, numbered from 0 to 23. It <hi>does</hi> not use a.m. or p.m. This system is also referred to (only <hi>in</hi> <hi>the</hi> US and <hi>the</hi> English speaking parts <hi>of</hi> Canada) as military time or (only <hi>in</hi> <hi>the</hi> United Kingdom and now very rarely) as continental time. <hi>In</hi> some parts <hi>of</hi> <hi>the</hi> world, it is called <hi>railway</hi> time. Also, <hi>the</hi> international standard notation <hi>of</hi> time (ISO 8601) is based on this format.',
  0.8497915267944336),
 ('A time <hi>in</hi> <hi>the</hi> <hi>24</hi>-hour clock is written <hi>in</hi> <hi>the</hi> form hours:minutes (for example, 01:23), or hours:minutes:seconds (01:23:45). Numbers under 10 have a zero <hi>in</hi> front (called a leading zero); e.g. 09:07. Under <hi>the</hi> <hi>24</hi>-hour cl

## Hybrid search and filter

YQL is a structured query langauge.
In the query examples, the user input is fed as-is using the `userQuery()` operator.

Filters are normally separate from the user input,
below is an example of adding a filter `url contains "9985"` to the YQL string.

Finally, the use the [Query API](https://docs.vespa.ai/en/query-api.html) for other options, like highlighting -
here disable [bolding](https://docs.vespa.ai/en/reference/schema-reference.html#bolding):


In [None]:
result = app.query(
    body={
        "yql": 'select * from findmypasta where ({targetHits:1}nearestNeighbor(step_embeddings,q))',
        "input.query(q)": "embed(what does 24 mean in the context of railways)",
        "query": "what does 24 mean in the context of railways",
        "ranking.profile": "hybrid",
        "bolding": False,
        "presentation.format.tensors": "short-value",
        "hits": 1,
    }
)
if len(result.hits) != 1:
    raise ValueError("Expected one hit, got {}".format(len(result.hits)))
print(json.dumps(result.hits, indent=4))

[
    {
        "id": "id:recipes:findmypasta::336323",
        "relevance": "-Infinity",
        "source": "findmypasta_content",
        "fields": {
            "matchfeatures": {
                "bm25(description)": 0.0,
                "bm25(steps)": 0.0,
                "bm25(title)": 0.0,
                "closest(step_embeddings)": {
                    "1": 1.0
                },
                "firstPhase": 0.7806437331403645,
                "all_paragraph_similarities": {
                    "0": 0.7388210296630859,
                    "1": 0.7806437015533447,
                    "2": 0.7215437889099121,
                    "3": 0.7284496426582336,
                    "4": 0.7706419229507446,
                    "5": 0.7345936298370361,
                    "6": 0.7291796207427979,
                    "7": 0.7272926568984985,
                    "8": 0.7424706220626831,
                    "9": 0.7324963808059692,
                    "10": 0.7383525371551514
                }

In short, the above query demonstrates how easy it is to combine various ranking strategies,
and also combine with filters.

To learn more about pre-filtering vs post-filtering,
read [Filtering strategies and serving performance](https://blog.vespa.ai/constrained-approximate-nearest-neighbor-search/).
[Semantic search with multi-vector indexing](https://blog.vespa.ai/semantic-search-with-multi-vector-indexing/)
is a great read overall for this domain.


## Cleanup


In [None]:
vespa_docker.container.stop()
vespa_docker.container.remove()