
# Weaviate as a Search Engine

In [1]:
!pip install -U weaviate-client
import weaviate
import weaviate.classes.config as wc

Collecting weaviate-client
  Downloading weaviate_client-4.11.0-py3-none-any.whl.metadata (3.6 kB)
Collecting validators==0.34.0 (from weaviate-client)
  Downloading validators-0.34.0-py3-none-any.whl.metadata (3.8 kB)
Collecting authlib<1.3.2,>=1.2.1 (from weaviate-client)
  Downloading Authlib-1.3.1-py2.py3-none-any.whl.metadata (3.8 kB)
Collecting grpcio-tools<2.0.0,>=1.66.2 (from weaviate-client)
  Downloading grpcio_tools-1.70.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (5.3 kB)
Collecting grpcio-health-checking<2.0.0,>=1.66.2 (from weaviate-client)
  Downloading grpcio_health_checking-1.70.0-py3-none-any.whl.metadata (1.1 kB)
Collecting protobuf<6.0dev,>=5.26.1 (from grpcio-health-checking<2.0.0,>=1.66.2->weaviate-client)
  Downloading protobuf-5.29.3-cp38-abi3-manylinux2014_x86_64.whl.metadata (592 bytes)
Downloading weaviate_client-4.11.0-py3-none-any.whl (350 kB)

In [2]:
import weaviate
from weaviate.classes.query import MetadataQuery
from weaviate.classes.config import Configure, Property, DataType, Tokenization
from weaviate.classes.query import Filter

client = weaviate.connect_to_embedded()

INFO:weaviate-client:Binary /root/.cache/weaviate-embedded did not exist. Downloading binary from https://github.com/weaviate/weaviate/releases/download/v1.26.6/weaviate-v1.26.6-Linux-amd64.tar.gz
INFO:weaviate-client:Started /root/.cache/weaviate-embedded: process ID 404


Let's create a simple collection with a single text property.

In [None]:
client.collections.delete_all()  # start from a clean slate: delete every existing collection
client.collections.create(
    name="TestCollection",
    properties=[
        wc.Property(name="text", data_type=wc.DataType.TEXT),
    ]
)

<weaviate.collections.collection.sync.Collection at 0x79be07d587d0>

Here is a list of simple documents that are useful for testing some basic queries.

In [None]:
sample_docs = [
    {"text": "Trump u.s.a. NATO"},
    {"text": "trump usa N.A.T.O."},
    {"text": "trump u s a NATO"},
    {"text": "the cat sleeps"},
    {"text": "u are a star"}
]

Now we get a handle to the collection and insert the samples.

In [None]:
documents = client.collections.get("TestCollection")
for doc in sample_docs:
    documents.data.insert(doc)

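Here is how to iterate over all documents in the collection. Note that the output below contains duplicates and a `u are a s_tar` document: it was produced after the insertion cell had been run more than once, with a slightly different `sample_docs`. This is also why several results are repeated in the queries that follow.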

In [None]:
# retrieve the elements
for doc in documents.iterator():
  print(doc.uuid, " - ", doc.properties)

05fabf65-4b65-43e3-9c5b-6d360cee6680  -  {'text': 'Trump u.s.a. NATO'}
0c2b253f-e05b-4e30-813a-13eaa92ff154  -  {'text': 'the cat sleeps'}
18203cdc-bf70-4e35-a3e1-885323891c68  -  {'text': 'trump usa N.A.T.O.'}
2f1b9720-a35d-43eb-aa00-c5d16bf20e55  -  {'text': 'Trump u.s.a. NATO'}
38ce9058-bbe0-4ea0-bf8e-bfa0d6f2b55f  -  {'text': 'Trump u.s.a. NATO'}
80d4c9c5-71cc-4db3-b4f2-2318d5ee4faf  -  {'text': 'u are a star'}
8f252bb5-5e20-46d0-9e51-ec1c16626f8e  -  {'text': 'trump usa N.A.T.O.'}
b8410731-c4d2-4207-9ed1-15b004b1572f  -  {'text': 'trump u s a NATO'}
d5f38c9f-8de4-42aa-803f-b86af397967a  -  {'text': 'trump u s a NATO'}
dd6fa32f-0aec-4cab-b490-4d9311ebfe3c  -  {'text': 'u are a s_tar'}
df381273-0e10-45f2-bb2d-50f89bfdd6eb  -  {'text': 'trump usa N.A.T.O.'}
ecfbf532-d3c6-46e8-8c4a-ea0f8a91c8c2  -  {'text': 'trump u s a NATO'}
fb6edc15-0079-4aae-91ce-c1b5379a540b  -  {'text': 'the cat sleeps'}
fcdd120c-0a09-4362-b150-e05c86cdcfab  -  {'text': 'the cat sleeps'}


Let's try some simple queries. BM25 is the ranking function we saw in lecture 2: a refinement of TF-IDF that saturates term frequency and normalizes for document length. It is a lexical (keyword) method, so the following query is matched on tokens, not on meaning.

In [None]:
query = "sleep"
response = documents.query.bm25(query=query, return_metadata=MetadataQuery(score=True))
for o in response.objects:
    print("{} - {}".format(round(o.metadata.score*100)/100, o.properties["text"]))

Unfortunately, words are lowercased but not stemmed, so `sleep` does not match `sleeps` and the query above returns nothing. Stemming is on the roadmap of features that Weaviate plans to support in the future.
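To see why an unstemmed `sleep` scores zero against `sleeps`, here is a minimal, self-contained sketch of classic BM25 over pre-tokenized documents. It is an illustration, not Weaviate's implementation (Weaviate uses the BM25F variant, so its scores differ); `k1 = 1.2`, `b = 0.75` are the usual textbook defaults and the Lucene-style IDF is an assumption.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.2, b=0.75):
    """Classic BM25 score of each tokenized document for a tokenized query."""
    n_docs = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / n_docs
    # document frequency: number of documents containing each term
    df = Counter()
    for d in docs_tokens:
        df.update(set(d))
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        score = 0.0
        for term in query_tokens:
            if tf[term] == 0:
                continue  # exact token match only: no stemming
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # term-frequency saturation (k1) and document-length normalization (b)
            denom = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


docs = [
    ["the", "cat", "sleeps"],
    ["trump", "usa", "n", "a", "t", "o"],
    ["u", "are", "a", "star"],
]
print(bm25_scores(["sleep"], docs))   # every score is 0.0: "sleep" != "sleeps"
print(bm25_scores(["sleeps"], docs))  # only the first document scores above 0
```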

Let's also define a helper function that runs a BM25 query and prints the score and one chosen property of each result.

In [None]:
def print_query_results(query, prop_name, collection):
    """Run a BM25 query over all text properties of `collection`; print score and `prop_name`."""
    print("QUERY:: {}\n".format(query))
    response = collection.query.bm25(query=query, return_metadata=MetadataQuery(score=True))
    for o in response.objects:
        # scores are rounded to two decimals
        print("{} - {}".format(round(o.metadata.score*100)/100, o.properties[prop_name]))

In [None]:
print_query_results("TRUMP", "text", documents) #the words are lowercased

QUERY:: TRUMP

0.2 - trump u s a NATO
0.2 - trump u s a NATO
0.2 - trump u s a NATO
0.2 - Trump u.s.a. NATO
0.2 - Trump u.s.a. NATO
0.2 - Trump u.s.a. NATO
0.19 - trump usa N.A.T.O.
0.19 - trump usa N.A.T.O.
0.19 - trump usa N.A.T.O.


In [None]:
print_query_results("Trump", "text", documents) #the words are lowercased

QUERY:: Trump

0.2 - trump u s a NATO
0.2 - trump u s a NATO
0.2 - trump u s a NATO
0.2 - Trump u.s.a. NATO
0.2 - Trump u.s.a. NATO
0.2 - Trump u.s.a. NATO
0.19 - trump usa N.A.T.O.
0.19 - trump usa N.A.T.O.
0.19 - trump usa N.A.T.O.
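Notice that `trump usa N.A.T.O.` scores slightly lower (0.19 vs 0.2) although it contains `trump` once, like the others. With `word` tokenization it yields six tokens (`trump`, `usa`, `n`, `a`, `t`, `o`) instead of five, and BM25 penalizes longer documents. The sketch below isolates BM25's term-frequency component; `k1`, `b` and the average length are assumed values, so only the direction of the effect carries over, not the exact numbers.

```python
def bm25_tf_component(tf, doc_len, avgdl, k1=1.2, b=0.75):
    """Saturated, length-normalized term frequency used inside BM25."""
    length_norm = 1 - b + b * doc_len / avgdl
    return tf * (k1 + 1) / (tf + k1 * length_norm)


avgdl = 5.0  # assumed average document length, in tokens
five_tokens = bm25_tf_component(tf=1, doc_len=5, avgdl=avgdl)  # e.g. "trump u s a NATO"
six_tokens = bm25_tf_component(tf=1, doc_len=6, avgdl=avgdl)   # "trump usa N.A.T.O."
print(round(five_tokens, 3), round(six_tokens, 3))
```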


In [None]:
print_query_results("the", "text", documents) # "the" is a stopword in the default English preset, so nothing is returned

QUERY:: the



Now we define a function that runs a few very basic queries that are able to …

In [None]:
def example_queries(prop_name, collection):
    queries = ["She is sleeping", "I sleep", "the usa", "I live in the u.s.a.", "TRUMP"]
    for query in queries:
      print_query_results(query, prop_name, collection)
      print("===============================================================")
      print()

In [None]:
print(sample_docs)
print("\n")
example_queries("text", documents)

[{'text': 'Trump u.s.a. NATO'}, {'text': 'trump usa N.A.T.O.'}, {'text': 'trump u s a NATO'}, {'text': 'the cat sleeps'}, {'text': 'u are a star'}]


QUERY:: She is sleeping


QUERY:: I sleep


QUERY:: the usa

0.6 - trump usa N.A.T.O.
0.6 - trump usa N.A.T.O.
0.6 - trump usa N.A.T.O.

QUERY:: I live in the u.s.a.

0.56 - trump u s a NATO
0.56 - u are a s_tar
0.56 - trump u s a NATO
0.56 - trump u s a NATO
0.56 - Trump u.s.a. NATO
0.56 - Trump u.s.a. NATO
0.56 - Trump u.s.a. NATO
0.28 - u are a star

QUERY:: TRUMP

0.2 - trump u s a NATO
0.2 - trump u s a NATO
0.2 - trump u s a NATO
0.2 - Trump u.s.a. NATO
0.2 - Trump u.s.a. NATO
0.2 - Trump u.s.a. NATO
0.19 - trump usa N.A.T.O.
0.19 - trump usa N.A.T.O.
0.19 - trump usa N.A.T.O.



But how is the input really treated? How is it tokenized?

**TOKENIZATION OPTIONS**
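* word: splits on every non-alphanumeric character and lowercases the tokens (default tokenizer for Weaviate)
* lowercase: splits on whitespace and lowercases the tokens
* whitespace: splits on whitespace and keeps the case (case-sensitive tokens)
* field: the entire value of the property (trimmed of surrounding whitespace) is treated as a single token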
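The four options above can be approximated in a few lines of plain Python. This is a rough emulation for intuition, not Weaviate's tokenizer code; in particular, treating the underscore as a separator is an assumption, though it is consistent with `u are a s_tar` matching the query `I live in the u.s.a.` earlier.

```python
import re


def tokenize(text, method="word"):
    """Rough approximation of Weaviate's tokenization options."""
    if method == "word":
        # keep runs of letters/digits only (punctuation and "_" split), lowercased
        return re.findall(r"[^\W_]+", text.lower())
    if method == "lowercase":
        return text.lower().split()
    if method == "whitespace":
        return text.split()
    if method == "field":
        return [text.strip()]
    raise ValueError(f"unknown tokenization method: {method}")


for method in ("word", "lowercase", "whitespace", "field"):
    print(method, tokenize("Trump u.s.a. NATO", method))
```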

In [None]:
client.collections.create(
    name="TestWhitespace",
    properties=[
        wc.Property(name="text", data_type=wc.DataType.TEXT, tokenization=Tokenization.WHITESPACE),
    ],
)

<weaviate.collections.collection.sync.Collection at 0x79be19c065d0>

In [None]:
documents = client.collections.get("TestWhitespace")
for doc in sample_docs:
    documents.data.insert(doc)

In [None]:
print_query_results("the", "text", documents) # stopword is found

QUERY:: the

0.68 - the cat sleeps


In [None]:
print_query_results("Trump", "text", documents) # no lowercasing, so "trump" is not matched

QUERY:: Trump

0.68 - Trump u.s.a. NATO


In [None]:
print_query_results("TRUMP", "text", documents) # no lowercasing, so neither "trump" nor "Trump" is matched

QUERY:: TRUMP



In [None]:
print_query_results("u", "text", documents) # whitespace keeps "u.s.a." as a single token, so only a standalone "u" matches

QUERY:: u

0.38 - u are a star
0.34 - trump u s a NATO


In [None]:
print_query_results("u.s.a.", "text", documents)

QUERY:: u.s.a.

0.68 - Trump u.s.a. NATO


In [None]:
example_queries("text", documents)

QUERY:: She is sleeping


QUERY:: I sleep


QUERY:: the usa

0.68 - trump usa N.A.T.O.
0.68 - the cat sleeps

QUERY:: I live in the u.s.a.

0.68 - Trump u.s.a. NATO
0.68 - the cat sleeps

QUERY:: TRUMP




## Properties
So far we have only used the "text" property, which contains short text snippets. Let's now index news articles with several properties (title and main text), each with its own tokenization.

In [None]:
!wget https://raw.githubusercontent.com/LorenzoBellomo/InformationRetrieval/refs/heads/main/data/5articles.json
import json

with open("5articles.json", 'r') as f:
  articles = json.load(f)

--2025-02-21 15:47:19--  https://raw.githubusercontent.com/LorenzoBellomo/InformationRetrieval/refs/heads/main/data/5articles.json
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 12566 (12K) [text/plain]
Saving to: ‘5articles.json’


2025-02-21 15:47:20 (5.55 MB/s) - ‘5articles.json’ saved [12566/12566]



In [None]:
articles[0]

{'title': 'American Airlines orders 60 Overture supersonic jets',
 'maintext': "The revival of supersonic passenger travel, thought to be long dead with the demise of Concorde nearly two decades ago, could be about to take wing as American Airlines has put in an order for 60 aircraft capable of flying at 1.7 times the speed of sound. \nBoom is a start-up based in Denver, Colorado, whose development of Overture, an ultra-fast successor to Concorde that seats 65 to 88 passengers, is so advanced that it showed off designs at last month's Farnborough air show.",
 'date': '2022-08-18',
 'source': 'The New York Times'}

In [None]:
client.collections.create(
    name="TestProperties",
    properties=[
        wc.Property(name="maintext", data_type=wc.DataType.TEXT, tokenization=Tokenization.WORD),
        wc.Property(name="title", data_type=wc.DataType.TEXT, tokenization=Tokenization.LOWERCASE),
    ],
)

<weaviate.collections.collection.sync.Collection at 0x79be07adadd0>

In [None]:
documents = client.collections.get("TestProperties")
for doc in articles:
    documents.data.insert({"maintext": doc["maintext"], "title": doc["title"]})

In [None]:
for doc in documents.iterator():
  print(doc.uuid, " - ", doc.properties)

aa558acd-6e3c-45a4-8fde-15af3be8acff  -  {'maintext': 'Luke O\'Reilly with his mother Janet O\'Brien Luke O\'Reilly Jack Hall Ellis The Metro One Bar in Tallaght, where Hall Ellis had earlier accused Luke O\'Reilly of talking to his girlfriend\nThe mother of a young Dublin man who lost his life following a one-punch attack hopes the sentence her son\'s killer was handed down will act as a deterrent for others.\nJack Hall Ellis (21) was yesterday jailed for five years after pleading guilty to the manslaughter of Luke O\'Reilly in Tallaght almost two years ago.\nHall Ellis, who was on bail at the time over an alleged violent disorder incident, struck the 20-year-old with a single punch, which resulted in Mr O\'Reilly hitting his head on the ground and suffering fatal injuries.\nJudge Melanie Greally remarked that single-punch assaults leading to traumatic brain injuries are recurring on the courts\' case load.\nLast night, Mr O\'Reilly\'s mother, Janet O\'Brien, told the Herald she was s

In [None]:
print_query_results("mother", "title", documents) # searches all text properties, but prints only the score and the title

QUERY:: mother

0.52 - 'One-punch killer's sentence will make others think twice'
0.3 - Leclerc dedicates win to Hubert


In [None]:
print_query_results("cars", "title", documents) # no stemming: "cars" does not match "car", so the article returned by the next query is missing here

QUERY:: cars

0.48 - Leclerc dedicates win to Hubert


In [None]:
print_query_results("car", "title", documents) # BM25 scores are not normalized, so they can be larger than 1

QUERY:: car

1.87 - Gunman opens fire on car just metres from scene of Hamid Sanambar murder


Suppose you want the system to treat some additional words as stopwords, even though they are not stopwords by default.

In [None]:
print_query_results("victory", "title", documents) # query before "victory" is added to the stopword list below

documents.config.update(inverted_index_config=wc.Reconfigure.inverted_index(stopwords_additions=["victory"]))

print("\n")
print_query_results("victory", "title", documents)

QUERY:: victory



QUERY:: victory
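Conceptually, query-side stopword handling is a set difference: the preset list plus your additions, minus any removals, is dropped from the query tokens. The sketch below mirrors the idea of `stopwords_additions` (and the analogous removals list in Weaviate's stopword configuration); the tiny `DEFAULT_STOPWORDS` set is a hypothetical stand-in for Weaviate's real English preset.

```python
DEFAULT_STOPWORDS = {"a", "an", "and", "is", "the", "to"}  # hypothetical, tiny subset


def query_terms(query, additions=(), removals=()):
    """Lowercase and split a query, then drop stopwords (preset + additions - removals)."""
    stopwords = (DEFAULT_STOPWORDS | set(additions)) - set(removals)
    return [t for t in query.lower().split() if t not in stopwords]


print(query_terms("the victory"))                         # ['victory']
print(query_terms("the victory", additions=["victory"]))  # []
```

Once a query is reduced to no terms at all, a keyword search has nothing left to match.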



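But not all fields are "born equal": some matter more than others (e.g., a match in the title usually says more than a match in the main text). Field boosting expresses this: in `query_properties`, `title^2` weights the title twice as much as the other fields.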

In [None]:
response = documents.query.bm25(
    query="race",
    return_metadata=MetadataQuery(score=True)
)
print("BEFORE FIELD BOOSTING: (query = race)\n")
for o in response.objects:
    print("{} - {}".format(round(o.metadata.score*100)/100, o.properties["title"]))

BEFORE FIELD BOOSTING: (query = race)
1.27 - Conte: 'Chelsea are not in the race to sign Sanchez'
0.54 - Leclerc dedicates win to Hubert


In [None]:
response = documents.query.bm25(
    query="race",
    query_properties=["title^2", "maintext"],
    return_metadata=MetadataQuery(score=True)
)
print("AFTER FIELD BOOSTING: (query = race)\n")
for o in response.objects:
    print("{} - {}".format(round(o.metadata.score*100)/100, o.properties["title"]))

AFTER FIELD BOOSTING: (query = race)
1.43 - Conte: 'Chelsea are not in the race to sign Sanchez'
0.54 - Leclerc dedicates win to Hubert
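The boost raised the Conte article from 1.27 to 1.43 rather than doubling it, while the Leclerc article (whose title does not contain "race") kept 0.54. One plausible reading, assuming a BM25F-style combination (Weaviate documents its keyword search as BM25F), is that the boost multiplies the term frequency before saturation. The function, field lengths and `idf` below are hypothetical; this is a simplified sketch, not Weaviate's code.

```python
def bm25f_term_score(field_tfs, field_lens, avg_lens, boosts, idf, k1=1.2, b=0.75):
    """Score one query term across fields, BM25F-style (simplified sketch)."""
    pseudo_tf = 0.0
    for field, tf in field_tfs.items():
        # per-field length normalization, then weighting by the field boost
        norm = 1 - b + b * field_lens[field] / avg_lens[field]
        pseudo_tf += boosts.get(field, 1.0) * tf / norm
    # a single saturation step over the combined, boosted frequency
    return idf * pseudo_tf * (k1 + 1) / (pseudo_tf + k1)


# hypothetical article: the term occurs once in the title and once in the body,
# and both fields have exactly average length
tfs = {"title": 1, "maintext": 1}
lens = {"title": 10, "maintext": 200}
avgs = {"title": 10, "maintext": 200}
plain = bm25f_term_score(tfs, lens, avgs, boosts={}, idf=1.0)
boosted = bm25f_term_score(tfs, lens, avgs, boosts={"title": 2.0}, idf=1.0)
print(round(plain, 3), round(boosted, 3))
```

Because of the saturation step, the boosted score grows but stays well below twice the plain one, and a document matching only in `maintext` is unaffected by a title boost.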


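Let's add some basic filtering. A filter restricts which objects can be returned; note that the remaining result keeps its score.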

In [None]:
response = documents.query.bm25(
    query="mother",
    return_metadata=MetadataQuery(score=True)
)
print("BEFORE FILTERING: (query = mother)\n")
for o in response.objects:
    print("{} - {}".format(round(o.metadata.score*100)/100, o.properties["title"]))

BEFORE FILTERING: (query = mother)

0.52 - 'One-punch killer's sentence will make others think twice'
0.3 - Leclerc dedicates win to Hubert


In [None]:
response = documents.query.bm25(
    query="mother",
    filters=Filter.by_property("title").contains_any(["Leclerc", "formula"]),
    return_metadata=MetadataQuery(score=True)
)
print("AFTER FILTERING: (query = mother)\n")
for o in response.objects:
    print("{} - {}".format(round(o.metadata.score*100)/100, o.properties["title"]))

AFTER FILTERING: (query = mother)

0.3 - Leclerc dedicates win to Hubert
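The effect of this filter can be reproduced with a small post-filter over the scored results. This sketch assumes that filter values are tokenized like the `title` property (lowercase tokenization), which is consistent with `Leclerc` matching above; Weaviate applies filters inside the search itself, so this is not its execution strategy. Scores and titles are copied from the output above.

```python
def contains_any(value, candidates):
    """True if the lowercase-tokenized value shares at least one token with candidates."""
    tokens = set(value.lower().split())
    return any(c.lower() in tokens for c in candidates)


def filtered_ranking(results, prop, candidates):
    """Keep only results whose property passes the filter; scores are left untouched."""
    return [(score, obj) for score, obj in results if contains_any(obj[prop], candidates)]


results = [
    (0.52, {"title": "'One-punch killer's sentence will make others think twice'"}),
    (0.3, {"title": "Leclerc dedicates win to Hubert"}),
]
print(filtered_ranking(results, "title", ["Leclerc", "formula"]))
```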


Let's also add the publication date as a property, using the `DATE` data type, and see how it can be used for filtering.

In [None]:
client.collections.create(
    name="TestDate",
    properties=[
        wc.Property(name="maintext", data_type=wc.DataType.TEXT, tokenization=Tokenization.WORD),
        wc.Property(name="title", data_type=wc.DataType.TEXT, tokenization=Tokenization.LOWERCASE),
        wc.Property(name="date", data_type=wc.DataType.DATE)
    ]
)

<weaviate.collections.collection.sync.Collection at 0x79be198aa250>

See the Weaviate documentation for [all property types](https://weaviate.io/developers/weaviate/config-refs/datatypes).

In [None]:
from datetime import timezone, datetime
documents = client.collections.get("TestDate")
for doc in articles:
    # parse "YYYY-MM-DD" and attach an explicit UTC timezone to the date
    date = datetime.strptime(doc["date"], "%Y-%m-%d").replace(tzinfo=timezone.utc)
    documents.data.insert({"maintext": doc["maintext"], "title": doc["title"], "date": date})

In [None]:
for doc in documents.iterator():
  print(doc.uuid, " - ", doc.properties['date'], '  ', doc.properties['title'])
  # print(doc.uuid, " - ", doc.properties)

6cb02d0b-7968-422a-b7f5-5aeb17ec48d1  -  2018-01-23 00:00:00+00:00    Conte: 'Chelsea are not in the race to sign Sanchez'
c7dae885-b1bc-4036-9a3e-41fc8ebf8dc6  -  2019-06-07 00:00:00+00:00    Gunman opens fire on car just metres from scene of Hamid Sanambar murder
c8fe1507-ec38-4a20-96ad-5ba46bb0f609  -  2022-08-18 00:00:00+00:00    American Airlines orders 60 Overture supersonic jets
e7baca4c-264e-4be9-9bee-ac0b09150218  -  2019-06-29 00:00:00+00:00    'One-punch killer's sentence will make others think twice'
fca2ac22-1cdd-492d-8a14-2d75eadb640c  -  2019-09-01 00:00:00+00:00    Leclerc dedicates win to Hubert


In [None]:
response = documents.query.bm25(
    query="mother",
    return_metadata=MetadataQuery(score=True)
)
print("BEFORE FILTERING: (query = mother)\n")
for o in response.objects:
    print("{} - {}".format(round(o.metadata.score*100)/100, o.properties["title"]))

BEFORE FILTERING: (query = mother)

0.52 - 'One-punch killer's sentence will make others think twice'
0.3 - Leclerc dedicates win to Hubert


In [None]:
response = documents.query.bm25(
    query="mother",
    filters=Filter.by_property("date").greater_or_equal(datetime.strptime("2019-08-15", "%Y-%m-%d").replace(tzinfo=timezone.utc)),
    return_metadata=MetadataQuery(score=True)
)
print("AFTER FILTERING: (query = mother)\n")
for o in response.objects:
    print("{} - {}".format(round(o.metadata.score*100)/100, o.properties["title"]))

AFTER FILTERING: (query = mother)

0.3 - Leclerc dedicates win to Hubert
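The date filter keeps only articles published on or after 2019-08-15, which is why the 2019-06-29 article disappears. The sketch below redoes that comparison in plain Python using the dates listed above. It also shows why it helps to attach a timezone consistently: Python refuses to order a naive datetime against a timezone-aware one.

```python
from datetime import datetime, timezone

cutoff = datetime.strptime("2019-08-15", "%Y-%m-%d").replace(tzinfo=timezone.utc)
article_dates = {
    "'One-punch killer's sentence will make others think twice'": "2019-06-29",
    "Leclerc dedicates win to Hubert": "2019-09-01",
}

# parse each date as UTC midnight, as in the insertion cell above
recent = [
    title for title, d in article_dates.items()
    if datetime.strptime(d, "%Y-%m-%d").replace(tzinfo=timezone.utc) >= cutoff
]
print(recent)  # ['Leclerc dedicates win to Hubert']

# mixing naive and timezone-aware datetimes raises TypeError
try:
    _ = datetime(2019, 9, 1) >= cutoff
except TypeError as err:
    print("TypeError:", err)
```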


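Now some more advanced features. Keyword (BM25) search only matches the exact tokens of the query, so a query like `sport` finds nothing here, even though some of the articles are about sport: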

In [None]:
response = documents.query.bm25(query="sport", return_metadata=MetadataQuery(score=True))
for o in response.objects:
    print("{} - {}".format(round(o.metadata.score*100)/100, o.properties["title"]))

[https://aistudio.google.com/app/apikey](https://aistudio.google.com/app/apikey)

In [None]:
# Not every Weaviate module is available in this embedded instance; get_meta() lists the enabled ones
client.get_meta()

{'hostname': 'http://127.0.0.1:8079',
 'modules': {'generative-openai': {'documentationHref': 'https://platform.openai.com/docs/api-reference/completions',
   'name': 'Generative Search - OpenAI'},
  'qna-openai': {'documentationHref': 'https://platform.openai.com/docs/api-reference/completions',
   'name': 'OpenAI Question & Answering Module'},
  'ref2vec-centroid': {},
  'reranker-cohere': {'documentationHref': 'https://txt.cohere.com/rerank/',
   'name': 'Reranker - Cohere'},
  'text2vec-cohere': {'documentationHref': 'https://docs.cohere.ai/embedding-wiki/',
   'name': 'Cohere Module'},
  'text2vec-huggingface': {'documentationHref': 'https://huggingface.co/docs/api-inference/detailed_parameters#feature-extraction-task',
   'name': 'Hugging Face Module'},
  'text2vec-openai': {'documentationHref': 'https://platform.openai.com/docs/guides/embeddings/what-are-embeddings',
   'name': 'OpenAI Module'}},
 'version': '1.26.6'}

Let's use Cohere as the vectorizer. You first need a Cohere API key, which you can create at [https://dashboard.cohere.com/api-keys](https://dashboard.cohere.com/api-keys).

In [None]:
# You first need to create a Cohere API key and store it as the Colab secret COHERE_KEY
from google.colab import userdata

client.close()
cohere_key = userdata.get('COHERE_KEY') # MAKE SURE YOU CREATED A KEY
headers = {
    "X-Cohere-Api-Key": cohere_key,
}
client = weaviate.connect_to_embedded(headers=headers)

INFO:weaviate-client:Started /root/.cache/weaviate-embedded: process ID 10315


In [None]:
client.collections.delete_all()
client.collections.create(
    name="TestVectorizer",
    properties=[
        wc.Property(name="maintext", data_type=wc.DataType.TEXT, tokenization=Tokenization.WORD),
        wc.Property(name="title", data_type=wc.DataType.TEXT, tokenization=Tokenization.LOWERCASE),
    ],
    vectorizer_config=[
        Configure.NamedVectors.text2vec_cohere(
            name="maintext_vector",
            source_properties=["maintext"],
            #model="embed-multilingual-light-v3.0"
        )
    ],
    # generative queries will additionally need an OpenAI API key (header "X-OpenAI-Api-Key")
    generative_config=Configure.Generative.openai()
)

<weaviate.collections.collection.sync.Collection at 0x79be1a057cd0>

In [None]:
documents = client.collections.get("TestVectorizer")
for doc in articles:
    documents.data.insert({"maintext": doc["maintext"], "title": doc["title"]})

In [None]:
print("pure syntactical search: 'sport'\n")
response = documents.query.bm25(query="sport", return_metadata=MetadataQuery(score=True))
for o in response.objects:
    print("{} - {}".format(round(o.metadata.score*100)/100, o.properties["title"]))

pure syntactical search: 'sport'



In [None]:
print("pure vector search: 'sport'\n")
# near_text returns a distance (lower = more similar) rather than a BM25 score, so we request it
response = documents.query.near_text(query="sport", return_metadata=MetadataQuery(score=True, distance=True), limit=3)
for o in response.objects:
  print("{} - {}".format(round(o.metadata.distance*100)/100, o.properties["title"]))

pure vector search: 'sport'

0.61 - Leclerc dedicates win to Hubert
0.61 - Gunman opens fire on car just metres from scene of Hamid Sanambar murder
0.65 - Conte: 'Chelsea are not in the race to sign Sanchez'
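Unlike BM25 scores, these are distances, so smaller means more similar and the results are sorted by increasing distance. Assuming the default cosine metric, distance = 1 - cosine similarity (0 for identical directions, up to 2 for opposite ones). The 3-dimensional vectors below are toy values, not real Cohere embeddings.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1 - dot / (norm_u * norm_v)


query = [1.0, 0.0, 1.0]  # toy embedding of the query
docs = {"doc_a": [1.0, 0.1, 0.9], "doc_b": [0.0, 1.0, 0.0], "doc_c": [0.5, 0.5, 0.5]}
ranked = sorted(docs, key=lambda name: cosine_distance(query, docs[name]))
print(ranked)  # ['doc_a', 'doc_c', 'doc_b']
```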


In [None]:
print("pure syntactical search: 'race'\n")
response = documents.query.bm25(query="race", return_metadata=MetadataQuery(score=True))
for o in response.objects:
    print("{} - {}".format(round(o.metadata.score*100)/100, o.properties["title"]))

pure syntactical search: 'race'

1.27 - Conte: 'Chelsea are not in the race to sign Sanchez'
0.54 - Leclerc dedicates win to Hubert


In [None]:
print("pure vector search: 'race'\n")
# near_text returns a distance (lower = more similar) rather than a BM25 score, so we request it
response = documents.query.near_text(query="race", return_metadata=MetadataQuery(score=True, distance=True), limit=3)
for o in response.objects:
  print("{} - {}".format(round(o.metadata.distance*100)/100, o.properties["title"]))

pure vector search: 'race'

0.61 - Leclerc dedicates win to Hubert
0.61 - Gunman opens fire on car just metres from scene of Hamid Sanambar murder
0.69 - Conte: 'Chelsea are not in the race to sign Sanchez'


In [None]:
print("hybrid search: 'race'")
response = documents.query.hybrid(query="race", alpha=0.5, return_metadata=MetadataQuery(score=True, explain_score=True), limit=3)
for o in response.objects:
  print("{} - {} [{}]".format(round(o.metadata.score*100)/100, o.properties["title"],  o.metadata.explain_score.strip().replace("\n", '')))

hybrid search: 'race'
0.59 - Conte: 'Chelsea are not in the race to sign Sanchez' [Hybrid (Result Set keyword,bm25) Document aedfa18d-18d1-44a8-b357-7e7bec0bb53a: original score 1.2714014, normalized score: 0.5 - Hybrid (Result Set vector,hybridVector) Document aedfa18d-18d1-44a8-b357-7e7bec0bb53a: original score 0.31189978, normalized score: 0.09428425]
0.5 - Leclerc dedicates win to Hubert [Hybrid (Result Set keyword,bm25) Document 3af5a11a-2d8d-4b38-9734-6bad777f8db9: original score 0.5364737, normalized score: 0 - Hybrid (Result Set vector,hybridVector) Document 3af5a11a-2d8d-4b38-9734-6bad777f8db9: original score 0.39161265, normalized score: 0.5]
0.48 - Gunman opens fire on car just metres from scene of Hamid Sanambar murder [Hybrid (Result Set vector,hybridVector) Document c8c149c6-4e18-4952-acd4-01c15d60e216: original score 0.38676548, normalized score: 0.47532928]
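The `explain_score` output above is consistent with relative score fusion, which the Weaviate docs describe as the default hybrid fusion method in recent versions: each result set is min-max normalized, then weighted by `alpha` (vector) and `1 - alpha` (keyword). For the Conte article the keyword part contributes 0.5 and the vector part about 0.094, summing to the printed 0.59; the reported "normalized score" values therefore appear to already include the `alpha = 0.5` weight (and the vector "original score" of 0.31 matches 1 minus the 0.69 distance seen earlier). The sketch uses made-up scores; mapping a result set with a single distinct score to 1.0 is our own assumption.

```python
def min_max(scores):
    """Rescale a {doc_id: score} dict to [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 1.0 for d in scores}  # assumption for the degenerate case
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}


def relative_score_fusion(keyword, vector, alpha=0.5):
    """Fuse keyword and vector scores; alpha=1 is pure vector, alpha=0 pure keyword."""
    kw, vec = min_max(keyword), min_max(vector)
    fused = {d: (1 - alpha) * kw.get(d, 0.0) + alpha * vec.get(d, 0.0) for d in set(kw) | set(vec)}
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


keyword = {"doc_a": 1.27, "doc_b": 0.54}                             # toy BM25 scores
vector = {"doc_b": 0.9, "doc_c": 0.8, "doc_a": 0.6, "doc_d": 0.5}    # toy similarities
for doc_id, score in relative_score_fusion(keyword, vector):
    print(doc_id, round(score, 3))
```

A document found only by one of the two searches still gets a fused score, but only from that search's share of the weight.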


Now let's add generative AI prompts to these queries (for example, to add context about the entities mentioned in the news, or to translate the articles into Italian).