Cannot debug similarity search #3378

Closed
@ssdidis

I am trying to build a similarity search in Python, but I cannot debug the following code:

def perform_similarity_search(query_text, index, embeddings, top_k=5):
    """Perform similarity search in the FAISS index for a given query text."""
    # Use the embeddings object to embed the query_text into a vector.
    # Ensure the text is passed as a list and the result is accessed correctly.
    query_vector = embeddings.encode([query_text])
    # Reshape the query_vector for compatibility with FAISS search method if necessary.
    # FAISS expects the query vector to be a 2D array.
    if len(query_vector.shape) == 1:
        query_vector = query_vector.reshape(1, -1)

    # Search the index using the reshaped query_vector.
    distances, indices = index.search(query_vector, top_k)  # Search the index for the top_k closest vectors
    return distances, indices

def run_indexing_pipeline():
    documents = fetch_documents(documents_dir)
    text_chunks = divide_documents_into_text_chunks(documents)
    embeddings_model = prepare_embeddings()
    faiss_index = build_and_store_faiss_index(text_chunks, embeddings_model, faiss_db_path)

    # Example query for testing purposes
    query = "Enter some example text here"
    distances, indices = perform_similarity_search(query, faiss_index, embeddings_model)
    print("Distances:", distances)
    print("Indices:", indices)
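For context on what the pipeline above expects: in the FAISS Python API, `index.search(x, k)` takes a float32 array `x` of shape `(n_queries, d)` and returns two arrays of shape `(n_queries, k)`, the distances and the row positions of the nearest stored vectors (for `IndexFlatL2`, the distances are squared Euclidean). Those positions only become meaningful once they are mapped back to `text_chunks`, and the query has to be embedded by the same model that embedded the chunks. The sketch below walks through that flow in plain Python, under stated assumptions: `toy_embed` is a hypothetical stand-in for the real embedding model, and `flat_l2_search` is a hypothetical brute-force imitation of a flat L2 index, not FAISS itself.

```python
import math
from typing import List, Tuple


def toy_embed(text: str) -> List[float]:
    """Hypothetical embedding: L2-normalised letter counts for a-z (26 dims).

    Stands in for whatever model prepare_embeddings() returns.
    """
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]


def flat_l2_search(
    index: List[List[float]], queries: List[List[float]], top_k: int
) -> Tuple[List[List[float]], List[List[int]]]:
    """Brute-force search shaped like a flat FAISS L2 index (hypothetical helper).

    `queries` is 2-D (one row per query); both outputs have one row per query.
    Distances are squared Euclidean, in ascending order.
    """
    dim = len(index[0])
    all_distances, all_indices = [], []
    for q in queries:
        if len(q) != dim:
            raise ValueError(f"query has dimension {len(q)}, index expects {dim}")
        # Score every stored vector, sort by (distance, position), keep top_k.
        scored = sorted(
            (sum((a - b) ** 2 for a, b in zip(q, row)), i)
            for i, row in enumerate(index)
        )[:top_k]
        all_distances.append([dist for dist, _ in scored])
        all_indices.append([i for _, i in scored])
    return all_distances, all_indices


text_chunks = [
    "faiss builds vector indexes",
    "cats sleep all day",
    "vector search with faiss",
]
# Row i of the index corresponds to text_chunks[i].
index = [toy_embed(chunk) for chunk in text_chunks]

# Embed the query with the SAME function and wrap it as a single-row 2-D batch.
distances, indices = flat_l2_search(index, [toy_embed("faiss vector search")], top_k=2)

for dist, i in zip(distances[0], indices[0]):
    # The indices are positions, so map them back to the chunk text.
    print(f"{dist:.3f}  {text_chunks[i]}")
```

Wrapping the single query vector as `[vector]` plays the same role as the `reshape(1, -1)` step in `perform_similarity_search`.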

Error:

Traceback (most recent call last):
  File "/home/ubuntu/new_d.py", line 61, in <module>
    run_indexing_pipeline()
  File "/home/ubuntu/new_d.py", line 56, in run_indexing_pipeline
    distances, indices = perform_similarity_search(query, faiss_index, embeddings_model)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/new_d.py", line 36, in perform_similarity_search
    query_vector = embeddings.encode([query_text])

This is the error being shown. Please let me know how I can correct it.
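The last frame of the traceback above is the `embeddings.encode([query_text])` call, so the object passed in as `embeddings` is the first thing to check. `encode` is the method name used by sentence-transformers' `SentenceTransformer`, whereas LangChain embedding wrappers expose `embed_query(text)` and `embed_documents(texts)` instead; if `prepare_embeddings()` returns a LangChain wrapper, calling `.encode` on it raises an `AttributeError`. The sketch below is a minimal helper that accepts either interface, assuming those two method names; `embed_query_2d` and both fake classes are hypothetical, for illustration only.

```python
from typing import Any, List


def embed_query_2d(embeddings: Any, text: str) -> List[List[float]]:
    """Return the query embedding as a 2-D list: one row of floats.

    Assumes `embeddings` exposes either `encode(list_of_texts)` (the
    sentence-transformers style) or `embed_query(text)` (the LangChain style).
    """
    if hasattr(embeddings, "encode"):
        # encode() takes a list of texts and returns one row per text.
        rows = embeddings.encode([text])
        return [[float(x) for x in row] for row in rows]
    if hasattr(embeddings, "embed_query"):
        # embed_query() takes one string and returns a single flat vector.
        return [[float(x) for x in embeddings.embed_query(text)]]
    raise TypeError(
        f"{type(embeddings).__name__} has neither encode() nor embed_query()"
    )


class FakeSentenceTransformer:
    # Hypothetical stand-in that mimics the encode() calling convention.
    def encode(self, texts):
        return [[float(len(t)), 1.0] for t in texts]


class FakeLangChainEmbeddings:
    # Hypothetical stand-in that mimics the embed_query() calling convention.
    def embed_query(self, text):
        return [float(len(text)), 2.0]


print(embed_query_2d(FakeSentenceTransformer(), "hello"))
print(embed_query_2d(FakeLangChainEmbeddings(), "hello"))
```

Whichever branch applies, convert the result with `np.asarray(..., dtype="float32")` before passing it to a raw FAISS index, since FAISS operates on float32 arrays.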