# Adventures with the Approximate Nearest Neighbor Search in Vespa

TL;DR: when `targetHits` << `hits` in a `nearestNeighbor` query in Vespa, the hit count depends on the actual query embedding value and on the filters.


## Context

Recently, I've been working on introducing approximate nearest neighbor (ANN) search with Vespa into an eCommerce search application.
Overall, it was a lot of fun and the search experience has improved: latencies are low and relevance is "better".

However, one corner case was a bit unexpected: searching with a filter applied returned more hits than searching without one.
In other words, a more restrictive query returned more hits than a less restrictive one.
Crazy, right?


## Setup

When defining the AB test we tried to be as conservative as possible.
This was primarily to reduce the risk of matching too many "irrelevant" documents by ANN.
One requirement then became to limit the number of hits from ANN in the overall list of search results.
To achieve that we set `targetHits=1` while `hits` was set to a much larger value.

Intuitively, one would expect that in the final search results there would be at most `targetHits * N` (where `N` is the number of Vespa content nodes handling the search) hits that are `matched` with the `nearestNeighbor` operator.

The overall query looks like this: `select * from ann where _filters_ AND (_lexical_matches_ OR _ann_)`.
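To make the query shape concrete, here is a minimal sketch of how such a YQL string could be assembled in Python. The helper name `build_hybrid_yql`, the example filter, and the `userQuery()` lexical clause are assumptions made for illustration; only the overall `filters AND (lexical OR ann)` structure and the `targetHits` annotation come from the setup described above. The field and tensor names (`embedding`, `q`) match the demo later in this post.

```python
def build_hybrid_yql(filters: str, lexical: str, target_hits: int = 1) -> str:
    """Assemble `filters AND (lexical OR ann)` for the `ann` schema.

    Hypothetical helper: `filters` and `lexical` are ready-made YQL fragments.
    """
    # Doubled braces produce the literal braces of the YQL annotation.
    ann = f"({{targetHits:{target_hits}}}nearestNeighbor(embedding, q))"
    return f"select * from ann where {filters} AND ({lexical} OR {ann})"


yql = build_hybrid_yql(filters="filter < 5", lexical="userQuery()")
print(yql)
```

Keeping the filter outside the `OR` means it constrains both the lexical and the ANN branch, which is what makes the hit count behaviour described below surprising.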


## Problem

During the AB test we got a complaint from a user that applying a filter increased the number of hits instead of decreasing it!
The ticket even contained a video that showed exactly that.
And the problem is reproducible.
Exciting! Let's get to work.


## Investigation

The very first thing that came to my mind was that some consistent partial timeouts are happening for the ANN query without a filter and, therefore, fewer hits are being returned.
But the timeout hypothesis was quickly ruled out because both queries were returning results way faster than the set timeout.

Next, we checked how many hits came from ANN.
It turned out that all hits came from the ANN query.
In the filter-less case, however, there were not exactly `1 * N` hits but a bit fewer than `2 * N`.
The difference can be explained by the fact that `targetHits` is not a hard limit but a target.
In other words, Vespa promises to get at least `targetHits` hits with ANN, but it can expose more for ranking.

Anyway, applying a filter on ANN search should decrease the total number of hits, right?
Instead, the number of hits increased to 500+, which is way more than `2 * N`.

Next followed a session of `changing random stuff and seeing what happens`.
Soon, a parameter that changed the behaviour was identified: `ranking.matching.approximateThreshold`.
When the parameter value is set to less than 0.11, the number of hits is as expected: equal to that of the filter-less query.
Some guru meditation followed, after which it became clear that different search paths were executed depending on the parameter value.
For a full overview of the different search paths, see this great [blogpost](https://blog.vespa.ai/constrained-approximate-nearest-neighbor-search/).
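For reference, `ranking.matching.approximateThreshold` can be passed as a regular query request parameter, the same way the demo at the end of this post does it. Below is a minimal sketch; the helper name `with_approximate_threshold` is hypothetical, and the range check assumes the threshold is a ratio between 0 and 1.

```python
def with_approximate_threshold(body: dict, threshold: float) -> dict:
    """Return a copy of a query body with the threshold override added."""
    # Assumption: the threshold is a ratio, so it should lie within [0, 1].
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be within [0, 1]")
    return {**body, "ranking.matching.approximateThreshold": threshold}


base_body = {
    "yql": "select * from ann where ({targetHits:1}nearestNeighbor(embedding, q)) AND filter < 5",
    "hits": 10,
    "ranking": "ann",
    "input.query(q)": [50.0],
}
body = with_approximate_threshold(base_body, 0.1)
print(body["ranking.matching.approximateThreshold"])
```

Returning a copy leaves the base body untouched, which makes it easy to run the same query with several threshold values and compare the hit counts.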

When the hit count estimate is LESS than the value set by `ranking.matching.approximateThreshold`, the **exact search with pre-filters** is executed: all filtered documents are scored by the vector distance metric (HNSW is skipped).
Hence the many hits.

When the hit count estimate is MORE than the value set by `ranking.matching.approximateThreshold`, the **ANN search using HNSW with pre-filters** is executed: given the list of filtered documents, HNSW is searched for the `targetHits` nearest neighbors.
Hence only a few hits, because `targetHits` limits how many hits are needed.
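The two cases can be summarized as a simplified decision rule. The sketch below is a mental model and not Vespa's actual implementation: the function name is hypothetical, the estimate is treated as the fraction of documents passing the filter (an assumption), the example ratio of 0.10 is made up, and other strategies such as post-filtering are ignored.

```python
def choose_search_path(filter_hit_ratio: float, approximate_threshold: float) -> str:
    """Simplified model of the search path choice described above.

    filter_hit_ratio: assumed estimate of the fraction of documents
    that pass the filter.
    """
    if filter_hit_ratio < approximate_threshold:
        # Few documents pass the filter: score all of them exactly, skip HNSW.
        return "exact"
    # Many documents pass the filter: search HNSW for about targetHits neighbors.
    return "hnsw"


# Hypothetical filter that is estimated to match 10% of the documents.
print(choose_search_path(0.10, approximate_threshold=0.11))
print(choose_search_path(0.10, approximate_threshold=0.05))
```

Under this model, lowering the threshold below the filter's estimated hit ratio pushes the same filtered query from the exact path back to the HNSW path, which is consistent with the hit count returning to the expected value.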


## Mitigation

There is no mitigation implemented for this situation yet.
The main reason is that in an index of 100+M docs it is very rare that somebody applies a filter on top of a search query that already returns few hits (less than one webpage).
On mobile, this problem is even less visible.
Also, as time goes on we'll get more confident with ANN and `targetHits` should increase to a larger value, so the problem will become even less visible.


## Demo

I've prepared a small Vespa application that demonstrates what can happen when `targetHits` << `hits`.




```python
from vespa.package import ApplicationPackage, Document, Field, RankProfile, Schema
from vespa.deployment import VespaDocker
from vespa.io import VespaResponse

# Application with a single schema: an integer field to filter on
# and a one-dimensional embedding to search with nearestNeighbor.
vap = ApplicationPackage(
    name="anncornercase",
    schema=[
        Schema(
            name="ann",
            document=Document(
                fields=[
                    Field(
                        name="filter",
                        type="int",
                        indexing=["attribute", "summary"],
                    ),
                    Field(
                        name="embedding",
                        type="tensor<float>(d0[1])",
                        indexing=["attribute"],
                    ),
                ]
            ),
            rank_profiles=[
                RankProfile(
                    name="ann",
                    inputs=[
                        ("query(q)", "tensor<float>(d0[1])"),
                    ],
                    # Rank by closeness of the query tensor to the embedding field.
                    first_phase="closeness(field, embedding)",
                )
            ],
        )
    ],
)

# Deploy to a local Docker container running a pinned Vespa version.
vespa_docker = VespaDocker(container_image="vespaengine/vespa:8.411.13")
app = vespa_docker.deploy(application_package=vap)
```

```text
Waiting for configuration server, 0/60 seconds...
Waiting for configuration server, 5/60 seconds...
Waiting for application to come up, 0/300 seconds.
Waiting for application to come up, 5/300 seconds.
Waiting for application to come up, 10/300 seconds.
Application is up!
Finished deployment.
```


```python
# Create and feed 100 dummy docs: doc i has filter=i and embedding=[i].
docs = [
    {
        'id': f'{i}',
        'fields': {
            'filter': i,
            'embedding': [i]
        }
    } for i in range(100)]


def callback(response: VespaResponse, document_id: str):
    # Report only failed feed operations.
    if not response.is_successful():
        print(f"Error when feeding document {document_id}: {response.get_json()}")


app.feed_iterable(docs, schema="ann", namespace="ann", callback=callback)
```

```python
resp = app.query(body={
    'yql': 'select * from ann where ({targetHits:1}nearestNeighbor(embedding, q))',
    'hits': 10,
    'ranking': 'ann',
    "input.query(q)": [2.0]
})
resp.hits
```

```text
[{'id': 'id:ann:ann::2',
  'relevance': 1.0,
  'source': 'anncornercase_content',
  'fields': {'sddocname': 'ann', 'documentid': 'id:ann:ann::2', 'filter': 2}}]
```

With an embedding `[2.0]` we get 1 hit: exactly the `targetHits`.

```python
resp = app.query(body={
    'yql': 'select * from ann where ({targetHits:1}nearestNeighbor(embedding, q))',
    'hits': 10,
    'ranking': 'ann',
    "input.query(q)": [50.0]
})
len(resp.hits)
```

```text
10
```

However, with the embedding `[50.0]` we get 10 hits (unexpected), far more than the `targetHits` value.

```python
resp = app.query(body={
    'yql': 'select * from ann where ({targetHits:1}nearestNeighbor(embedding, q))',
    'hits': 10,
    'ranking': 'ann',
    "input.query(q)": [95.0]
})
len(resp.hits)
```

```text
10
```

With embedding `[95.0]` we also get 10 hits.
It seems that docs in HNSW that were inserted later are "more discoverable" than the ones inserted earlier. Similar effects are reported in a blogpost [here](https://www.marqo.ai/blog/understanding-recall-in-hnsw-search).


Next, let's add a filter and force the exact search path with `approximate:false`.

```python
resp = app.query(body={
    'yql': 'select * from ann where ({targetHits:1, approximate:false}nearestNeighbor(embedding, q)) AND filter < 5',
    'hits': 10,
    'ranking': 'ann',
    'input.query(q)': [50.0],
})
len(resp.hits)
```

```text
3
```

Searching with the filter `filter < 5` and `hits=10` returns 3 hits (unexpected).
Five documents (ids 0..4) clearly satisfy the filter, yet `nearestNeighbor` retrieved only 3 of them.

In the next example, let's go crazy and run the same query with different embeddings from 0 to 99.

In [95]:
hit_counts = []
for i in range(100):
    resp = app.query(body={
        'yql': 'select * from ann where ({targetHits:1, approximate:false}nearestNeighbor(embedding, q)) AND filter < 5',
        'hits': 10,
        'ranking': 'ann',
        "input.query(q)": [i],
        'ranking.matching.approximateThreshold': 0.05
    })
    hit_counts.append(len(resp.hits))
print(hit_counts)

[3, 2, 1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]


From the results above we see that most queries return 3 hits.
But the first five embeddings (0 to 4) give `[3, 2, 1, 2, 3]` hits.
Interesting.
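To see the distribution at a glance, the counts can be tallied with `collections.Counter`. The snippet below rebuilds the printed list by hand, assuming it has exactly 100 entries (one per query embedding), so that it runs without a Vespa instance.

```python
from collections import Counter

# Hit counts copied from the output above: embeddings 0..3, then 96 more queries.
hit_counts = [3, 2, 1, 2] + [3] * 96

# How often each hit count occurs.
distribution = Counter(hit_counts)

# Query embeddings whose hit count differs from the most common one.
most_common_count, _ = distribution.most_common(1)[0]
outliers = {i: n for i, n in enumerate(hit_counts) if n != most_common_count}

print(dict(distribution))
print(outliers)
```

Only the query embeddings 1, 2 and 3 deviate from the usual 3 hits, and all of them lie inside the filtered range `filter < 5`.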

Anyway, that is all for this time.
Let's clean our Vespa instance and call it a day.

```python
# Remove all demo documents, then stop and remove the Docker container.
app.delete_all_docs(content_cluster_name='anncornercase_content', schema='ann')
vespa_docker.container.stop()
vespa_docker.container.remove()
```

## Fin

Various nuances of ANN search should be advertised a little bit more.
That would allow for better planning for introducing ANN into search, prevent some confusion while looking at the results, and save some time overall.
Even though the demo setup doesn't reproduce the exact issue from production, I hope that the demo was informative.
I encourage you to play with the setup and let me know what unexpected results you're getting.
That's it for this time. Bye!