# Vector Distance Query With Elasticsearch

This tutorial explains how to run vector distance queries in Elasticsearch.

## Roadmap
1. Load the saved file from the previous notebook
2. Establish a connection to Elasticsearch
3. Create an index with a mapping
4. Insert multiple documents with a vector field
5. Query the index to find nearby vectors

## Environment setup

In [1]:
import os
from typing import Final

ELASTIC_PROTOCOL: Final[str] = 'http://'
ELASTIC_PORT: Final[str] = '30000'
ELASTIC_HOSTS: Final[list[str]] = ['10.0.0.13']
INDEX_NAME: Final[str] = 'tests_vectors_test_index'
VECTORS_FILE_PATH: Final[str] = os.path.join('tutorial_workspace', 'outputs', 'vectors.json')

## 1. Load the data

In [2]:
import json

with open(VECTORS_FILE_PATH) as f:
    vectors: dict[str, list[float]] = json.load(f)

print(vectors.keys(), len(vectors['anchor']))

dict_keys(['anchor', 'negative', 'positive']) 4096
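
Before indexing, it can help to confirm that every vector has the same length, since a `dense_vector` field expects a fixed number of dimensions. The following is a minimal sketch: the helper name `check_vector_dims` and the toy data are illustrative and not part of the notebook.

```python
def check_vector_dims(vectors: dict[str, list[float]]) -> int:
    """Return the shared vector length, or raise if the lengths differ."""
    lengths = {name: len(values) for name, values in vectors.items()}
    unique_lengths = set(lengths.values())
    if len(unique_lengths) != 1:
        raise ValueError(f"Vectors have inconsistent lengths: {lengths}")
    return unique_lengths.pop()


# Toy stand-in for the contents of vectors.json (the real vectors have 4096 values).
sample_vectors = {
    "anchor": [0.1, 0.2, 0.3],
    "positive": [0.1, 0.2, 0.4],
    "negative": [0.9, 0.8, 0.7],
}
dims = check_vector_dims(sample_vectors)
print(dims)
```

The returned length is the value a `dense_vector` mapping would need for its `dims` parameter.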


## 2. Set up the Elasticsearch connection

In [3]:
from elasticsearch import Elasticsearch

es = Elasticsearch([f'{ELASTIC_PROTOCOL}{host}:{ELASTIC_PORT}' for host in ELASTIC_HOSTS])
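
Creating the client does not by itself confirm that the cluster is reachable. The sketch below separates the URL construction into a small helper (`build_host_urls` is an illustrative name) and shows, in comments, how the client's `ping()` method could be used as a quick connectivity check.

```python
def build_host_urls(protocol: str, hosts: list[str], port: str) -> list[str]:
    """Combine protocol, host, and port into one URL per host."""
    return [f"{protocol}{host}:{port}" for host in hosts]


urls = build_host_urls("http://", ["10.0.0.13"], "30000")
print(urls)

# With a reachable cluster, the connection could then be checked like this:
# from elasticsearch import Elasticsearch
# es = Elasticsearch(urls)
# if not es.ping():
#     raise ConnectionError(f"Could not reach Elasticsearch at {urls}")
```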

First, let's check whether the target index exists.

In [4]:
is_exists: bool = es.indices.exists(index=INDEX_NAME)
is_exists

True

If you want to delete the index, you can do it like this:

In [5]:
es.indices.delete(index=INDEX_NAME)


{'acknowledged': True}
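
Deleting an index that does not exist raises an error, so the existence check and the delete can be combined. This is a minimal sketch: `delete_index_if_exists` is a hypothetical helper, and `client` is assumed to be an `Elasticsearch` instance such as `es`.

```python
def delete_index_if_exists(client, index_name: str) -> bool:
    """Delete the index only when it exists; return True if a delete was issued."""
    if client.indices.exists(index=index_name):
        client.indices.delete(index=index_name)
        return True
    return False


# Usage with the notebook's client:
# delete_index_if_exists(es, INDEX_NAME)
```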

Now that we have checked whether the index exists, let's create it with an explicit mapping, in case it is missing.

In [52]:
# mapping = {
#     'properties': {
#         'face_embeddings': {
#             'type': 'dense_vector'
#         },
#         'name': {
#             'type': 'keyword'
#         }
#     }
# }
# es.indices.create(index=INDEX_NAME, mappings=mapping, ignore=400)

{'error': {'root_cause': [{'type': 'mapper_parsing_exception',
    'reason': 'Missing required parameter [dims] for field [face_embeddings]'}],
  'type': 'mapper_parsing_exception',
  'reason': 'Failed to parse mapping [_doc]: Missing required parameter [dims] for field [face_embeddings]',
  'caused_by': {'type': 'mapper_parsing_exception',
   'reason': 'Missing required parameter [dims] for field [face_embeddings]'}},
 'status': 400}
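
The error above states that a `dense_vector` field requires the `dims` parameter. A minimal sketch of a mapping that includes it follows. `build_mapping` is a hypothetical helper, and `4096` is taken from the vector length printed in step 1. Elasticsearch caps the maximum `dims` value and the cap depends on the version, so a 4096-dimensional vector may still be rejected on some clusters.

```python
def build_mapping(dims: int) -> dict:
    """Build an index mapping with a dense_vector field of the given size."""
    return {
        "properties": {
            "face_embeddings": {"type": "dense_vector", "dims": dims},
            "name": {"type": "keyword"},
        }
    }


mapping = build_mapping(dims=4096)
print(mapping["properties"]["face_embeddings"])

# With a connected client `es`, the index could then be created with:
# es.indices.create(index=INDEX_NAME, mappings=mapping)
```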

In [6]:
es.index(index=INDEX_NAME, document={
    'face_embeddings': vectors['positive'],
    'name': 'Positive'
})

es.index(index=INDEX_NAME, document={
    'face_embeddings': vectors['negative'],
    'name': 'Negative'
})

{'_index': 'tests_vectors_test_index',
 '_type': '_doc',
 '_id': '3pAOaYYBIe4H3HY13aK-',
 '_version': 1,
 'result': 'created',
 '_shards': {'total': 2, 'successful': 1, 'failed': 0},
 '_seq_no': 1,
 '_primary_term': 1}
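
When there are more than a couple of documents, the separate `es.index` calls can be replaced by one bulk request. This is a minimal sketch, assuming the `elasticsearch.helpers.bulk` helper from the Python client. `to_bulk_actions` is an illustrative name and the vectors are toy values.

```python
from typing import Iterator


def to_bulk_actions(index_name: str, vectors: dict[str, list[float]]) -> Iterator[dict]:
    """Yield one bulk action per named vector, skipping the query vector."""
    for name, embedding in vectors.items():
        if name == "anchor":  # the anchor is used as the query, not stored
            continue
        yield {
            "_index": index_name,
            "_source": {"face_embeddings": embedding, "name": name.capitalize()},
        }


sample_vectors = {"anchor": [0.1, 0.2], "positive": [0.1, 0.3], "negative": [0.9, 0.8]}
actions = list(to_bulk_actions("tests_vectors_test_index", sample_vectors))
print([action["_source"]["name"] for action in actions])

# With a connected client `es`:
# from elasticsearch.helpers import bulk
# bulk(es, actions)
```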

In [54]:
es.indices.get_mapping(index=INDEX_NAME)


{'tests_vectors_test_index': {'mappings': {'properties': {'face_embeddings': {'type': 'float'},
    'name': {'type': 'text',
     'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}}}}}
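
The mapping above reports `face_embeddings` as `float`, which is what dynamic mapping produces for an array of floating-point numbers when no explicit `dense_vector` mapping is in place. Vector functions such as `l1norm` require a `dense_vector` field, so it is worth checking the type before querying. The sketch below does that on a copy of the response shown above; `field_type` is an illustrative helper name.

```python
def field_type(mapping_response: dict, index_name: str, field: str) -> str:
    """Extract the mapped type of a top-level field from a get_mapping response."""
    properties = mapping_response[index_name]["mappings"]["properties"]
    return properties[field]["type"]


# Response copied from the get_mapping output shown in this notebook.
response = {
    "tests_vectors_test_index": {
        "mappings": {
            "properties": {
                "face_embeddings": {"type": "float"},
                "name": {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
                },
            }
        }
    }
}
actual = field_type(response, "tests_vectors_test_index", "face_embeddings")
print(actual, actual == "dense_vector")
```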

In [42]:
# Make the newly indexed documents visible to search
es.indices.refresh(index=INDEX_NAME)

# Fetch every document in the index
res = es.search(index=INDEX_NAME, query={"match_all": {}})
print(f"Got {res['hits']['total']} Hits")
res['hits']['hits']

Got {'value': 2, 'relation': 'eq'} Hits


[{'_index': 'tests_vectors_test_index',
  '_type': '_doc',
  '_id': 'LpCGV4YBIe4H3HY1ozx1',
  '_score': 1.0,
  '_source': {'face_embeddings': [0.5233800411224365,
    0.24507896602153778,
    0.37103939056396484,
    0.6281567811965942,
    0.6209915280342102,
    0.38286954164505005,
    0.7842245697975159,
    0.2632266581058502,
    0.3624735176563263,
    0.3829912841320038,
    0.6232348084449768,
    0.5644453167915344,
    0.3738228380680084,
    0.3706113398075104,
    0.6637787818908691,
    0.35649314522743225,
    0.40709570050239563,
    0.3791903257369995,
    0.7797307968139648,
    0.3402252197265625,
    0.3618432581424713,
    0.7914558053016663,
    0.22229474782943726,
    0.3584567904472351,
    0.5570077300071716,
    0.5662405490875244,
    0.6380018591880798,
    0.5349650382995605,
    0.6257343888282776,
    0.2694028913974762,
    0.39909031987190247,
    0.4051913917064667,
    0.38770729303359985,
    0.62427818775177,
    0.6153143644332886,
    0.371887385

Now that the index is populated, it's time to query by `dense_vector`.

In [46]:
# Score every document by its L1 distance to the anchor vector (closer = higher score)
res = es.search(index=INDEX_NAME, query={
  "script_score": {
    "query": {
      "match_all": {}
    },
    "script": {
      "source": "1 / (l1norm(params.queryVector, 'face_embeddings') + 0.1)",
      "params": {
        "queryVector": vectors['anchor']
      }
    }
  }
})

res

RequestError: RequestError(400, 'search_phase_execution_exception', 'compile error')
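
The score in the script is `1 / (L1 distance + 0.1)`, so closer vectors score higher. The sketch below reproduces that formula locally on toy vectors and builds the same `script_score` body. `l1_distance`, `l1_score`, and `build_l1_query` are illustrative names. It assumes `face_embeddings` is mapped as `dense_vector`, which the vector functions require. The notebook does not show the cause of the `compile error`, so this is not presented as a confirmed fix.

```python
def l1_distance(a: list[float], b: list[float]) -> float:
    """Sum of absolute coordinate differences (Manhattan distance)."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def l1_score(query: list[float], stored: list[float]) -> float:
    """Mirror of the Painless expression 1 / (l1norm(...) + 0.1)."""
    return 1 / (l1_distance(query, stored) + 0.1)


def build_l1_query(query_vector: list[float], field: str = "face_embeddings") -> dict:
    """Build the script_score query body used in the search call."""
    return {
        "script_score": {
            "query": {"match_all": {}},
            "script": {
                "source": f"1 / (l1norm(params.queryVector, '{field}') + 0.1)",
                "params": {"queryVector": query_vector},
            },
        }
    }


anchor = [0.0, 0.5, 1.0]
positive = [0.0, 0.5, 0.5]  # L1 distance 0.5 from the anchor
negative = [1.0, 0.5, 0.0]  # L1 distance 2.0 from the anchor
print(l1_score(anchor, positive) > l1_score(anchor, negative))

query = build_l1_query(anchor)
# With a connected client `es`: res = es.search(index=INDEX_NAME, query=query)
```

The `+ 0.1` term in the denominator keeps the score finite when the query vector matches a stored vector exactly.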