# Introduction to the Grover Vector Experiment
This is a primitive experiment, as mentioned in the README.md, that uses Grover's Algorithm for Vector Similarity Search. This series of experiments attempts to speed up Gaia's Vector Search process. Though they were mostly for fun, prototype versions that used Grover's Algorithm with Top-K scoring did make it into the first versions of Gaia. As of v1.0.4, the custom Top-K scoring system was dropped for the sake of a decently operational demo.

# Replication 
To replicate, first install Qiskit. Tests were done on quantum hardware (IOQ). Bless the people at AWS who let Free Tier members have 1 hour of free circuit simulator time.

In [1]:
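# qiskit.algorithms and qiskit.Aer were removed in Qiskit 1.0, so pin the 0.45.1 release this notebook was run with
!pip3 install qiskit==0.45.1 qiskit-aer scikit-learn numpy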


Import all necessary libraries as follows. This imports Qiskit, NumPy and scikit-learn.

In [5]:
from qiskit import Aer, transpile, assemble
from qiskit.algorithms import AmplificationProblem, Grover
from qiskit.circuit.library import Diagonal
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer



The functions below convert text to TF-IDF vector embeddings, prepare the quantum oracle for Grover's Algorithm with Top-K scoring, and construct the quantum circuit for simulation. A helper also pads the vectors with zero rows, since Qiskit's Diagonal oracle raises a CircuitError when the number of vectors isn't a power of 2.
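For reference, the standard textbook Grover analysis suggests why a single iteration (power=1 in the code) is a sensible choice for this toy problem. Assume $N$ basis states (one per padded vector), of which the oracle marks $M = K$. The oracle is the diagonal phase flip that prepare_oracle_top_k builds, and $r$ Grover iterations on a uniform superposition give the marked-state probability below. The numbers assume the 10 example strings used later, padded to $N = 16$, with $K = 3$.

$$
O = \operatorname{diag}\big((-1)^{f(0)}, \ldots, (-1)^{f(N-1)}\big), \qquad
f(i) = \begin{cases} 1 & \text{if } i \text{ is in the Top-K} \\ 0 & \text{otherwise} \end{cases}
$$

$$
\sin\theta = \sqrt{M/N}, \qquad P_{\text{marked}}(r) = \sin^2\big((2r+1)\theta\big), \qquad r_{\text{opt}} \approx \frac{\pi}{4\theta} - \frac{1}{2}
$$

With $N = 16$ and $M = 3$: $\sin\theta = \sqrt{3}/4$, $\theta \approx 0.448$, so $r_{\text{opt}} \approx 1.25$, and

$$
P_{\text{marked}}(1) = \big(3\sin\theta - 4\sin^3\theta\big)^2 = \Big(\tfrac{9\sqrt{3}}{16}\Big)^2 = \tfrac{243}{256} \approx 0.949, \qquad
P_{\text{marked}}(2) = \Big(\tfrac{29\sqrt{3}}{64}\Big)^2 = \tfrac{2523}{4096} \approx 0.616.
$$

So one iteration lands close to the maximum here, while a second iteration overshoots.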

In [21]:
# Step 1: Convert Text to Vector Embeddings
def convert_texts_to_vectors(texts):
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(texts)
    return X.toarray()

# Function to adjust the number of vectors to a power of 2
def adjust_vectors_to_power_of_2(vectors):
    next_power_of_2 = 2 ** np.ceil(np.log2(len(vectors)))
    padding_length = int(next_power_of_2 - len(vectors))
    if padding_length > 0:
        padding = np.zeros((padding_length, vectors.shape[1]))
        vectors = np.vstack([vectors, padding])
    return vectors

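# Step 2: Prepare Oracle for Grover's Algorithm for Top-K Scoring
def prepare_oracle_top_k(query_vector, vectors, k):
    # TF-IDF rows are L2-normalised by default, so this dot product is a cosine similarity (padding rows score 0)
    similarities = np.dot(vectors, query_vector)
    # The Top-K set is computed classically here; the oracle only encodes it as phases
    top_k_indices = np.argsort(similarities)[-k:]
    # Phase oracle: -1 on the Top-K basis states, +1 everywhere else
    diag_elements = np.ones(len(vectors))
    diag_elements[top_k_indices] = -1
    oracle = Diagonal(diag_elements)
    return oracle, top_k_indices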

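# Step 3: Run Grover's Algorithm for Top-K Scoring
def run_grovers_algorithm_top_k(query_vector, vectors, k):
    oracle, top_k_indices = prepare_oracle_top_k(query_vector, vectors, k)
    # is_good_state receives a measured bitstring, so convert it to an index before the lookup
    good_indices = set(top_k_indices.tolist())
    problem = AmplificationProblem(oracle, is_good_state=lambda bitstr: int(bitstr, 2) in good_indices)
    grover = Grover()
    backend = Aer.get_backend('qasm_simulator')
    # One Grover iteration (oracle + diffusion) applied to a uniform superposition
    qc = grover.construct_circuit(problem, power=1)
    qc.measure_all()
    tqc = transpile(qc, backend)
    # Run the transpiled circuit directly; passing an assembled Qobj is deprecated
    result = backend.run(tqc).result()
    measurements = result.get_counts(tqc)
    most_frequent = max(measurements, key=measurements.get)
    # Count keys list the highest-index bit first, so int(key, 2) is the basis-state index
    found_index = int(most_frequent, 2)
    return found_index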
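To see what the circuit in run_grovers_algorithm_top_k does without Qiskit, the sketch below applies Grover iterations to a plain NumPy statevector: a uniform superposition, the same diagonal oracle (-1 on the marked indices), then the diffusion operator $2|s\rangle\langle s| - I$, which reflects every amplitude about the mean. It is an illustration under stated assumptions, not the Qiskit implementation: grover_probabilities is a hypothetical helper, the marked set [0, 5, 8] is chosen for illustration, and Qiskit's Grover operator may differ by a global phase, which does not change the probabilities.

```python
import numpy as np

def grover_probabilities(num_states, marked, iterations=1):
    """Hypothetical helper: measurement probabilities after `iterations` Grover steps."""
    state = np.full(num_states, 1 / np.sqrt(num_states))  # uniform superposition, like H on every qubit
    oracle = np.ones(num_states)
    oracle[list(marked)] = -1  # same diagonal that prepare_oracle_top_k builds
    for _ in range(iterations):
        state = oracle * state            # phase-flip the marked states
        state = 2 * state.mean() - state  # diffusion: reflect each amplitude about the mean
    return np.abs(state) ** 2

probs = grover_probabilities(16, marked=[0, 5, 8], iterations=1)
print(probs[[0, 5, 8]])        # each marked state: 81/256, about 0.316
print(probs[[0, 5, 8]].sum())  # all marked states together: 243/256, about 0.949
```

Because every marked state gets the same probability, the max() over the counts in run_grovers_algorithm_top_k picks among the Top-K states essentially at random on each run.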



I've gone ahead and set up some fake GreenQuest strings to simulate real-life queries. You can add as many as you realistically want. For comparison, the first string in the list is the test string that the other strings are compared against.

I'd also like to forewarn anyone running this code that you may get a CircuitError depending on the number of vectors: the number of vectors must be a power of 2. The adjust_vectors_to_power_of_2 function above pads the list with zero vectors to handle this automatically.
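One side effect of padding is worth checking: the zero rows added by adjust_vectors_to_power_of_2 score exactly 0 against any query, so if fewer than K real strings score above 0, np.argsort can rank padding rows into the Top-K. In the GreenQuest example every string contains "GreenQuest", so every real row scores above 0 and this does not bite, but it can with other data. The minimal sketch below ranks only the real rows. It assumes you keep the original (unpadded) row count around; pad_to_power_of_2, top_k_real_indices and the num_real parameter are hypothetical names, not part of the original code.

```python
import numpy as np

def pad_to_power_of_2(vectors):
    """Hypothetical variant of adjust_vectors_to_power_of_2: pads with zero rows up to
    the next power of 2, and to at least 2 rows (Diagonal needs at least 2 entries)."""
    target = 1 << max(1, int(np.ceil(np.log2(len(vectors)))))
    padding = np.zeros((target - len(vectors), vectors.shape[1]))
    return np.vstack([vectors, padding])

def top_k_real_indices(query_vector, vectors, k, num_real):
    """Hypothetical helper: Top-K indices by dot-product score over the first
    num_real rows only, so zero padding rows can never be selected."""
    scores = np.dot(vectors[:num_real], query_vector)
    # argsort is ascending: take the last k and reverse for highest-first order
    return np.argsort(scores)[-k:][::-1]

# Toy data: 10 made-up "documents" that share no terms with each other
real = np.eye(10, 12)
padded = pad_to_power_of_2(real)
print(padded.shape)  # (16, 12)

top = top_k_real_indices(padded[0], padded, k=3, num_real=10)
print(top)  # index 0 comes first; the other two are real rows tied at score 0
```

The padded matrix still goes to Diagonal at full size; only the choice of which indices get a -1 is restricted to real rows.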

In [3]:
greenquest_text_strings = [
    "How can I earn points for recycling at home with GreenQuest?",
    "GreenQuest tips for reducing energy consumption in the office",
    "Sustainable transportation options supported by GreenQuest",
    "GreenQuest community initiatives for local environmental impact",
    "Using GreenQuest to track and improve my carbon footprint",
    "GreenQuest partnerships with renewable energy providers",
    "Eco-friendly product recommendations from GreenQuest",
    "GreenQuest's role in promoting urban green spaces",
    "Water conservation strategies with GreenQuest's guidance",
    "Impact of GreenQuest on corporate sustainability practices"
]

Now for the fun part. The code below vectorizes the strings, pads them to a power of 2, picks the first string as the query, and runs Grover's algorithm with Top-K scoring. Essentially, it is primitive quantum vector search using Top-K scoring.
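The scoring step ranks with a plain np.dot. That behaves as a cosine similarity here because scikit-learn's TfidfVectorizer L2-normalises each row by default (norm='l2'), so the dot product of two rows equals their cosine. The small NumPy check below illustrates that identity on hypothetical toy weights rather than real TF-IDF output; the cosine helper is introduced only for this comparison.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity computed from scratch (hypothetical helper for comparison)."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical raw term weights for three short "documents"
raw = np.array([
    [1.0, 2.0, 0.0, 0.5],
    [0.0, 1.0, 3.0, 0.0],
    [2.0, 4.0, 0.0, 1.0],  # exactly 2x the first row, so its cosine with row 0 is 1
])
# Row-wise L2 normalisation, which is what norm='l2' applies to each TF-IDF row
normalised = raw / np.linalg.norm(raw, axis=1, keepdims=True)

query = normalised[0]
dot_scores = normalised @ query
cos_scores = np.array([cosine(raw[i], raw[0]) for i in range(len(raw))])
print(np.allclose(dot_scores, cos_scores))  # True
```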

In [22]:
# Convert texts to vector embeddings and adjust to power of 2
vectors = convert_texts_to_vectors(greenquest_text_strings)
vectors = adjust_vectors_to_power_of_2(vectors)

# Choose a query vector (for example, the first vector)
query_vector = vectors[0]

# Define the number of top vectors to find (K)
K = 3

# Run Grover's Algorithm for Top-K Scoring
top_k_results = []
for _ in range(K):
    found_index = run_grovers_algorithm_top_k(query_vector, vectors, K)
    top_k_results.append(found_index)

# Output the result
print("Indices of the top K found vectors:", top_k_results)

Indices of the top K found vectors: [8, 5, 0]
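The loop above calls run_grovers_algorithm_top_k K times and keeps only the single most frequent bitstring from each run. Since the oracle gives all K marked states the same amplitude, each run's winner is close to a uniform pick among them, so repeated runs can return the same index more than once. An alternative, sketched below, assumes you change the function to also return the counts dictionary from one run, then takes the K most frequent distinct outcomes from that single run. top_k_from_counts is a hypothetical helper, and the counts are made up for illustration, not real measurement data.

```python
from collections import Counter

def top_k_from_counts(counts, k):
    """Hypothetical helper: the k most frequently measured basis-state indices.

    counts maps Qiskit-style bitstrings (highest-index bit first) to shot counts.
    """
    ranked = Counter(counts).most_common(k)  # distinct keys, highest count first
    return [int(bitstring, 2) for bitstring, _ in ranked]

# Made-up counts for a 16-state (4-qubit) run, for illustration only
example_counts = {"1000": 330, "0101": 318, "0000": 324, "0011": 12, "1111": 9}
print(top_k_from_counts(example_counts, 3))  # [8, 0, 5]
```

This needs one circuit execution instead of K and cannot return duplicates, though with few shots a non-marked state could still occasionally outrank a marked one.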


