<picture>
  <source media="(prefers-color-scheme: dark)" srcset="https://assets.vespa.ai/logos/Vespa-logo-green-RGB.svg">
  <source media="(prefers-color-scheme: light)" srcset="https://assets.vespa.ai/logos/Vespa-logo-dark-RGB.svg">
  <img alt="#Vespa" width="200" src="https://assets.vespa.ai/logos/Vespa-logo-dark-RGB.svg" style="margin-bottom: 25px;">
</picture>

# Evaluating a Vespa Application

This guide shows how to evaluate a Vespa application using pyvespa's `VespaEvaluator` class.

<div class="alert alert-info">
    Refer to <a href="https://pyvespa.readthedocs.io/en/latest/troubleshooting.html">troubleshooting</a>
    for any problem when running this guide.
</div>


**Pre-requisite**: Create a tenant at [cloud.vespa.ai](https://cloud.vespa.ai/) and save the tenant name.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vespa-engine/pyvespa/blob/master/docs/sphinx/source/getting-started-pyvespa-cloud.ipynb)


## Install

Install [pyvespa](https://pyvespa.readthedocs.io/) >= 0.45
and the [Vespa CLI](https://docs.vespa.ai/en/vespa-cli.html).
The Vespa CLI is used for data and control plane key management ([Vespa Cloud Security Guide](https://cloud.vespa.ai/en/security/guide)).


In [1]:
#!pip3 install pyvespa vespacli

## Configure application


In [2]:
# Replace with your tenant name from the Vespa Cloud Console
tenant_name = "scoober"
# Replace with your application name (does not need to exist yet)
application = "evaluation"

## Create an application package

The [application package](https://pyvespa.readthedocs.io/en/latest/reference-api.html#vespa.package.ApplicationPackage)
has all the Vespa configuration files -
create one from scratch:


In [3]:
from vespa.package import (
    ApplicationPackage,
    Field,
    Schema,
    Document,
    HNSW,
    RankProfile,
    Component,
    Parameter,
    FieldSet,
    GlobalPhaseRanking,
    Function,
)

package = ApplicationPackage(
    name=application,
    schema=[
        Schema(
            name="doc",
            document=Document(
                fields=[
                    Field(name="id", type="string", indexing=["summary"]),
                    Field(
                        name="text",
                        type="string",
                        indexing=["index", "summary"],
                        index="enable-bm25",
                        bolding=True,
                    ),
                    Field(
                        name="embedding",
                        type="tensor<float>(x[384])",
                        indexing=[
                            "input text",
                            "embed",  # uses default model
                            "index",
                            "attribute",
                        ],
                        ann=HNSW(distance_metric="angular"),
                        is_document_field=False,
                    ),
                ]
            ),
            fieldsets=[FieldSet(name="default", fields=["text"])],
            rank_profiles=[
                RankProfile(
                    name="bm25",
                    inputs=[("query(q)", "tensor<float>(x[384])")],
                    functions=[Function(name="bm25sum", expression="bm25(text)")],
                    first_phase="bm25sum",
                ),
                RankProfile(
                    name="semantic",
                    inputs=[("query(q)", "tensor<float>(x[384])")],
                    first_phase="closeness(field, embedding)",
                ),
                RankProfile(
                    name="fusion",
                    inherits="bm25",
                    inputs=[("query(q)", "tensor<float>(x[384])")],
                    first_phase="closeness(field, embedding)",
                    global_phase=GlobalPhaseRanking(
                        expression="reciprocal_rank_fusion(bm25sum, closeness(field, embedding))",
                        rerank_count=1000,
                    ),
                ),
            ],
        )
    ],
    components=[
        Component(
            id="e5",
            type="hugging-face-embedder",
            parameters=[
                Parameter(
                    "transformer-model",
                    {
                        "url": "https://huggingface.co/intfloat/e5-small-v2/resolve/main/model.onnx"
                    },
                ),
                Parameter(
                    "tokenizer-model",
                    {
                        "url": "https://huggingface.co/intfloat/e5-small-v2/resolve/main/tokenizer.json"
                    },
                ),
            ],
        )
    ],
)

Note that the application name cannot contain `-` or `_`.


## Deploy to Vespa Cloud

The app is now defined and ready to deploy to Vespa Cloud.

Deploy `package` to Vespa Cloud, by creating an instance of
[VespaCloud](https://pyvespa.readthedocs.io/en/latest/reference-api.html#vespa.deployment.VespaCloud):


In [4]:
from vespa.deployment import VespaCloud
import os

vespa_cloud = VespaCloud(
    tenant=tenant_name,
    application=application,
    key_content=os.getenv(
        "VESPA_TEAM_API_KEY", None
    ),  # Key is only used for CI/CD. Can be removed if logging in interactively
    application_package=package,
)

Setting application...
Running: vespa config set application scoober.evaluation
Setting target cloud...
Running: vespa config set target cloud

Api-key found for control plane access. Using api-key.


For more details on different authentication options and methods, see [authenticating-to-vespa-cloud](https://pyvespa.readthedocs.io/en/latest/authenticating-to-vespa-cloud.html).

The following will upload the application package to the Vespa Cloud Dev Zone (`aws-us-east-1c`); read more about [Vespa Zones](https://cloud.vespa.ai/en/reference/zones.html).
The Vespa Cloud Dev Zone is a sandbox environment where resources are down-scaled and idle deployments expire automatically.
For production deployments, see the [`deploy_to_prod`](https://pyvespa.readthedocs.io/en/latest/reference-api.html#vespa.deployment.VespaCloud.deploy_to_prod) method.

> Note: Deployments to dev and perf expire after 7 days of inactivity, i.e., 7 days after running deploy. This applies to all plans, not only the Free Trial. Use the Vespa Console to extend the expiry period, or redeploy the application to add 7 more days.


Now deploy the app to Vespa Cloud dev zone.

The first deployment typically takes about 2 minutes until the endpoint is up. (Applications that, for example, refer to large ONNX models may take a bit longer.)


In [None]:
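# Note: this recorded run deploys to a local Docker container.
# To deploy to the Vespa Cloud dev zone instead, use: app = vespa_cloud.deploy()
from vespa.deployment import VespaDocker

vespa_docker = VespaDocker()

app = vespa_docker.deploy(application_package=package)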

Waiting for configuration server, 0/60 seconds...
Waiting for configuration server, 5/60 seconds...
Waiting for application to come up, 0/300 seconds.
Waiting for application to come up, 5/300 seconds.
Waiting for application to come up, 10/300 seconds.
Waiting for application to come up, 15/300 seconds.
Waiting for application to come up, 20/300 seconds.
Waiting for application to come up, 25/300 seconds.


If the deployment fails, check that the API key has been added in the Vespa Cloud Console (the `vespa auth api-key` step).

If you can authenticate, you should see lines like the following:

```
 Deployment started in run 1 of dev-aws-us-east-1c for mytenant.hybridsearch.
```

The deployment takes a few minutes the first time while Vespa Cloud sets up the resources for your Vespa application.

`app` now holds a reference to a [Vespa](https://pyvespa.readthedocs.io/en/latest/reference-api.html#vespa.application.Vespa) instance. The control-plane (`vespa_cloud`) instance gives us
the name of the mTLS-protected endpoint. We can query and feed to this endpoint (data plane access) using the
mTLS certificate generated in previous steps.


In [None]:
# endpoint = vespa_cloud.get_mtls_endpoint()
# endpoint

HTTPError: HTTP 401 reason: Unauthorized error_text: {
  "code" : 401,
  "message" : "Access denied - not authenticated"
} for /application/v4/tenant/scoober/application/evaluation/instance/default/environment/dev/region/aws-us-east-1c

## Feeding documents to Vespa

In this example we use the [HF Datasets](https://huggingface.co/docs/datasets/index) library to stream the
[zeta-alpha-ai/NanoMSMARCO](https://huggingface.co/datasets/zeta-alpha-ai/NanoMSMARCO) dataset and index it in our newly deployed Vespa instance.
The code below loads three configurations of this dataset: `corpus` (the documents to feed), `queries`, and `qrels` (the relevance judgments).

The following uses the [stream](https://huggingface.co/docs/datasets/stream) option of datasets to stream the data without
downloading all the contents locally. The `map` functionality allows us to convert the
dataset fields into the feed format that `pyvespa` expects: a dict with the keys `id` and `fields`:

`{ "id": "vespa-document-id", "fields": {"vespa_field": "vespa-field-value"}}`
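
As a small illustration of this feed format, the sketch below converts plain Python dicts into feed operations without touching the network. The helper name `to_vespa_feed` and the sample records are hypothetical; only the `_id` and `text` keys mirror the dataset fields used in this guide.

```python
from typing import Any, Dict, Iterable, Iterator


def to_vespa_feed(records: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one feed operation (a dict with `id` and `fields`) per record.

    Assumes each record has `_id` and `text` keys, like the corpus rows
    used in this guide. The helper itself is illustrative only.
    """
    for record in records:
        yield {
            "id": record["_id"],
            "fields": {"id": record["_id"], "text": record["text"]},
        }


# Hypothetical sample data, standing in for streamed dataset rows.
sample_records = [
    {"_id": "doc-1", "text": "first passage"},
    {"_id": "doc-2", "text": "second passage"},
]

feed_ops = list(to_vespa_feed(sample_records))
print(feed_ops[0])
```

Because the helper is a generator, it stays lazy in the same way a streamed dataset does: nothing is materialized until the feed loop consumes it.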


In [7]:
from datasets import load_dataset

dataset_id = "zeta-alpha-ai/NanoMSMARCO"

dataset = load_dataset(dataset_id, "corpus", split="train", streaming=True)
vespa_feed = dataset.map(
    lambda x: {
        "id": x["_id"],
        "fields": {"text": x["text"], "id": x["_id"]},
    }
)

In [None]:
query_ds = load_dataset(dataset_id, "queries", split="train")
qrels = load_dataset(dataset_id, "qrels", split="train")

In [9]:
_ids, _texts = query_ds["_id"], query_ds["text"]
ids_to_query = dict(zip(_ids, _texts))

In [10]:
for idx, (qid, q) in enumerate(ids_to_query.items()):
    print(f"qid: {qid}, query: {q}")
    if idx == 5:
        break

qid: 994479, query: which health care system provides all citizens or residents with equal access to health care services
qid: 1009388, query: what's right in health care
qid: 1088332, query: weather in oran
qid: 265729, query: how long keep financial records
qid: 1099433, query: how do hoa fees work
qid: 200600, query: heels or heal


In [11]:
query_ids, doc_ids = qrels["query-id"], qrels["corpus-id"]
relevant_docs = dict(zip(query_ids, doc_ids))
relevant_docs

{'994479': '7275120',
 '1009388': '7248824',
 '1088332': '7094398',
 '265729': '7369987',
 '1099433': '7255675',
 '200600': '7929603',
 '924398': '7813557',
 '531490': '3550561',
 '408563': '2322244',
 '1048359': '7187663',
 '603050': '7707024',
 '1060040': '7168976',
 '96749': '7301814',
 '792789': '7970736',
 '1067764': '7160805',
 '1029003': '7220310',
 '1091973': '7870707',
 '865660': '7948042',
 '804996': '7958943',
 '527568': '7503570',
 '1059045': '7170234',
 '570070': '7623736',
 '429675': '7689707',
 '242107': '92150',
 '721409': '7424964',
 '601684': '7527096',
 '825954': '7741548',
 '866251': '7810605',
 '988540': '7288502',
 '1074807': '7154587',
 '1028652': '7833608',
 '497757': '7526867',
 '1086384': '7110606',
 '538333': '7672042',
 '1092161': '7853304',
 '971233': '4589415',
 '1067640': '7161101',
 '1093082': '7778956',
 '1027209': '7831679',
 '47864': '1867101',
 '643572': '7420248',
 '544319': '7562624',
 '1039495': '7856346',
 '1093172': '7770963',
 '558548': '621259
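
Note that `dict(zip(query_ids, doc_ids))` keeps only one document per query: if a query id appears in several qrels rows, later rows overwrite earlier ones. A minimal sketch of grouping all relevant documents per query is shown below, using toy ids. The helper name `group_qrels` is hypothetical, and whether your pyvespa version accepts a mapping from query id to a set of document ids should be checked against its documentation.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def group_qrels(query_ids: Iterable[str], doc_ids: Iterable[str]) -> Dict[str, Set[str]]:
    """Collect every relevant document id per query id."""
    grouped: Dict[str, Set[str]] = defaultdict(set)
    for qid, did in zip(query_ids, doc_ids):
        grouped[qid].add(did)
    return dict(grouped)


# Toy qrels (hypothetical): query "q1" has two relevant documents.
toy_query_ids = ["q1", "q1", "q2"]
toy_doc_ids = ["d1", "d2", "d3"]

flat = dict(zip(toy_query_ids, toy_doc_ids))       # later rows overwrite earlier ones
grouped = group_qrels(toy_query_ids, toy_doc_ids)  # keeps all relevant documents

print(flat)
print(grouped)
```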

Now we can feed to Vespa using `feed_iterable`, which accepts any `Iterable` and an optional callback function where we can
check the outcome of each operation. The application is configured to use [embedding](https://docs.vespa.ai/en/embedding.html)
functionality, which produces a vector embedding from the `text` field. This step is resource intensive.

Read more about embedding inference in Vespa in the [Accelerating Transformer-based Embedding Retrieval with Vespa](https://blog.vespa.ai/accelerating-transformer-based-embedding-retrieval-with-vespa/)
blog post.

Default nodes in the Vespa Cloud Dev Zone have 2 vCPUs.


In [12]:
from vespa.io import VespaResponse


def callback(response: VespaResponse, id: str):
    if not response.is_successful():
        print(f"Error when feeding document {id}: {response.get_json()}")


app.feed_iterable(vespa_feed, schema="doc", namespace="tutorial", callback=callback)
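
Printing errors is fine for a quick run, but it can be useful to collect failed operations for a later retry. The sketch below shows one way to do that with a closure. `FakeResponse` is a hypothetical stand-in that only mimics the two methods the callback above uses (`is_successful()` and `get_json()`), so the example runs without a Vespa instance; `make_collecting_callback` is likewise an illustrative name.

```python
from typing import Any, Callable, Dict, List, Tuple


class FakeResponse:
    """Hypothetical stand-in exposing the two methods the callback relies on."""

    def __init__(self, ok: bool, body: Dict[str, Any]):
        self._ok = ok
        self._body = body

    def is_successful(self) -> bool:
        return self._ok

    def get_json(self) -> Dict[str, Any]:
        return self._body


def make_collecting_callback(
    failures: List[Tuple[str, Dict[str, Any]]],
) -> Callable[[Any, str], None]:
    """Return a callback that appends (id, response body) for each failed operation."""

    def callback(response: Any, id: str) -> None:
        if not response.is_successful():
            failures.append((id, response.get_json()))

    return callback


failed: List[Tuple[str, Dict[str, Any]]] = []
collecting_callback = make_collecting_callback(failed)

# Simulate one successful and one failed feed operation.
collecting_callback(FakeResponse(True, {}), "doc-1")
collecting_callback(FakeResponse(False, {"message": "simulated error"}), "doc-2")
print(failed)
```

In a real run, the returned callback would be passed as the `callback` argument in place of the printing version.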

In [13]:
%load_ext autoreload
%autoreload 2

from vespa.evaluation import VespaEvaluator


def semantic_query_fn(query_text: str, top_k: int) -> dict:
    """
    Build a Vespa query body for semantic (nearest-neighbor) retrieval.
    The query text is embedded by Vespa and matched against the `embedding` field
    using the `semantic` rank profile. Adapt the YQL, filters, etc. as needed.
    """
    return {
        "yql": "select * from sources * where ({targetHits:1000}nearestNeighbor(embedding,q))",
        "query": query_text,
        "ranking": "semantic",
        "input.query(q)": f"embed({query_text})",
        "hits": top_k,
    }


def bm25_query_fn(query_text: str, top_k: int) -> dict:
    """
    Build a Vespa query body for lexical retrieval.
    `userQuery()` matches the query text against the default fieldset and the
    `bm25` rank profile scores the hits. Adapt the YQL, filters, etc. as needed.
    """
    return {
        "yql": "select * from sources * where userQuery();",
        "query": query_text,
        "ranking": "bm25",
        "hits": top_k,
    }


def fusion_query_fn(query_text: str, top_k: int) -> dict:
    """
    Build a Vespa query body for hybrid retrieval.
    Hits are retrieved by `userQuery()` OR nearest-neighbor search and ranked with
    the `fusion` rank profile. Adapt the YQL, filters, etc. as needed.
    """
    return {
        "yql": "select * from sources * where userQuery() OR ({targetHits:1000}nearestNeighbor(embedding,q));",
        "query": query_text,
        "ranking": "fusion",
        "input.query(q)": f"embed({query_text})",
        "hits": top_k,
    }


#
# Instantiate the evaluator with the chosen IR metrics and run it.
#

# app = Vespa(url="http://localhost", port=8080)  # or your Vespa endpoint
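
The `fusion` rank profile combines the lexical and semantic rankings with `reciprocal_rank_fusion` in the global phase. As a rough illustration of the idea (not Vespa's implementation), the sketch below scores each document by summing `1 / (k + rank)` over the input rankings. The function name `rrf_scores`, the constant `k = 60`, and the toy rankings are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, List


def rrf_scores(rankings: List[List[str]], k: int = 60) -> Dict[str, float]:
    """Sum 1 / (k + rank) for every document over all rankings (ranks start at 1)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return dict(scores)


# Toy rankings (hypothetical): the two retrievers disagree on the order.
bm25_ranking = ["d1", "d2", "d3"]
semantic_ranking = ["d2", "d3", "d1"]

fused = rrf_scores([bm25_ranking, semantic_ranking])
fused_order = sorted(fused, key=fused.get, reverse=True)
print(fused_order)
```

Because only ranks enter the formula, the two input scores do not need to be on comparable scales, which is the main appeal of this kind of fusion.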

In [14]:
all_results = {}
for evaluator_name, query_fn in [
    ("semantic", semantic_query_fn),
    ("bm25", bm25_query_fn),
    ("fusion", fusion_query_fn),
]:
    print(f"Evaluating {evaluator_name}...")
    evaluator = VespaEvaluator(
        queries=ids_to_query,
        relevant_docs=relevant_docs,
        vespa_query_fn=query_fn,
        app=app,
        name=evaluator_name,
        write_csv=True,  # optionally write metrics to CSV
    )

    results = evaluator()
    all_results[evaluator_name] = results

Evaluating semantic...
Evaluating bm25...
Evaluating fusion...


In [15]:
all_results

{'semantic': {'accuracy@1': 0.38,
  'accuracy@3': 0.64,
  'accuracy@5': 0.72,
  'accuracy@10': 0.82,
  'precision@1': 0.38,
  'precision@3': 0.21333333333333332,
  'precision@5': 0.14400000000000002,
  'precision@10': 0.08199999999999999,
  'recall@1': 0.38,
  'recall@3': 0.64,
  'recall@5': 0.72,
  'recall@10': 0.82,
  'mrr@10': 0.5308571428571428,
  'ndcg@10': 0.6007397354752749,
  'map@100': 0.5393493336728631},
 'bm25': {'accuracy@1': 0.3,
  'accuracy@3': 0.6,
  'accuracy@5': 0.66,
  'accuracy@10': 0.76,
  'precision@1': 0.3,
  'precision@3': 0.2,
  'precision@5': 0.132,
  'precision@10': 0.07600000000000001,
  'recall@1': 0.3,
  'recall@3': 0.6,
  'recall@5': 0.66,
  'recall@10': 0.76,
  'mrr@10': 0.4520793650793651,
  'ndcg@10': 0.526410407397105,
  'map@100': 0.45999902392693265},
 'fusion': {'accuracy@1': 0.44,
  'accuracy@3': 0.7,
  'accuracy@5': 0.72,
  'accuracy@10': 0.8,
  'precision@1': 0.44,
  'precision@3': 0.2333333333333333,
  'precision@5': 0.14400000000000002,
  'pre
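
To make the metric names above more concrete, here is a minimal sketch of `accuracy@k` and `mrr@k` for the case of one relevant document per query, as in `relevant_docs`. It is an illustration on toy data, not `VespaEvaluator`'s implementation; the function names and the toy rankings are assumptions. With a single relevant document per query, `recall@k` equals `accuracy@k` and `precision@k` is `accuracy@k / k`, which is consistent with the output above (for example, 0.64 / 3 ≈ 0.2133 for the semantic profile).

```python
from typing import Dict, List


def accuracy_at_k(ranked: Dict[str, List[str]], relevant: Dict[str, str], k: int) -> float:
    """Fraction of queries whose relevant document appears in the top k hits."""
    hits = sum(1 for qid, docs in ranked.items() if relevant[qid] in docs[:k])
    return hits / len(ranked)


def mrr_at_k(ranked: Dict[str, List[str]], relevant: Dict[str, str], k: int) -> float:
    """Mean of 1 / rank of the relevant document, counting 0 if it is not in the top k."""
    total = 0.0
    for qid, docs in ranked.items():
        for rank, doc_id in enumerate(docs[:k], start=1):
            if doc_id == relevant[qid]:
                total += 1.0 / rank
                break
    return total / len(ranked)


# Toy data (hypothetical): relevant document found at rank 1, 2, 3, and not at all.
toy_relevant = {"q1": "d1", "q2": "d2", "q3": "d3", "q4": "d4"}
toy_ranked = {
    "q1": ["d1", "x1", "x2"],
    "q2": ["x3", "d2", "x4"],
    "q3": ["x5", "x6", "d3"],
    "q4": ["x7", "x8", "x9"],
}

print(accuracy_at_k(toy_ranked, toy_relevant, 1))
print(accuracy_at_k(toy_ranked, toy_relevant, 3))
print(mrr_at_k(toy_ranked, toy_relevant, 3))
```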

In [17]:
# Plot the results as a grouped bar chart (requires matplotlib)

import pandas as pd

results = pd.DataFrame(all_results)
results.plot(kind="bar", figsize=(12, 6))

ImportError: matplotlib is required for plotting when the default backend "matplotlib" is selected.

In [71]:
results

{'accuracy@1': 0.38,
 'accuracy@3': 0.64,
 'accuracy@5': 0.72,
 'precision@1': 0.38,
 'precision@3': 0.21333333333333332,
 'precision@5': 0.14400000000000002,
 'recall@1': 0.38,
 'recall@3': 0.64,
 'recall@5': 0.72,
 'mrr@10': 0.5308571428571428,
 'ndcg@10': 0.6007397354752749,
 'map@100': 0.5393493336728631}

In [None]:
# Results from sentence transformers
{
    "accuracy@1": 0.38,
    "accuracy@3": 0.64,
    "accuracy@5": 0.72,
    "accuracy@10": 0.82,
    "precision@1": 0.38,
    "precision@3": 0.21333333333333332,
    "precision@5": 0.14400000000000002,
    "precision@10": 0.08199999999999999,
    "recall@1": 0.38,
    "recall@3": 0.64,
    "recall@5": 0.72,
    "recall@10": 0.82,
    "ndcg@10": 0.6007397354752749,
    "mrr@10": 0.5308571428571428,
    "map@100": 0.5393493336728631,
}

In [20]:
vespa_cloud.delete()

Deactivated vespa-team.hybridsearch in dev.aws-us-east-1c
Deleted instance vespa-team.hybridsearch.default
