### First we begin importing the Python libraries.
#### <a href="https://pennylane.ai/">Pennylane</a> is an awesome resource for Quantum Python packages.
#### The text embedding model was used from <a href="https://platform.openai.com/">OpenAi </a>to vectorize data.

In [1]:
# Quantum computing python libraries
import pennylane as qml 
from pennylane import numpy as np
from pennylane.templates import AmplitudeEmbedding, BasisEmbedding, AngleEmbedding

# LLM/Openai python libaraies
from openai import OpenAI
import pandas as pd
client = OpenAI(api_key="")


### We used the **cosine similarity** formula to compare vector embeddings

$$\text{cosine\_similarity}(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a} \cdot \mathbf{b}}{\|\mathbf{a}\| \|\mathbf{b}\|}$$

In [2]:
# Functions for LLM portion of test

def generate_embedding(text,dimensions=25): # Converts the input text into a vector embedding
    response = client.embeddings.create(input=text,model="text-embedding-3-small",dimensions=dimensions)
    return np.array(response.data[0].embedding)
    
def cosine_similarity(u, v) -> float: # Performs cosine_similarity between the vectorized input text and the vectorized data
    dot_product = np.dot(u, v)
    norm_u = np.linalg.norm(u)
    norm_v = np.linalg.norm(v)
    return dot_product / (norm_u * norm_v)

def show_similarities(input_text,df) -> list: # Returns a list of th first 5 most common elements from the cosine similarity
    input_embedding = generate_embedding(input_text,1536)
    similarites = {key:cosine_similarity(input_embedding,value) for key,value in df.loc[:,["ProductId","embedding"]].values.reshape(-1,2)}
    return sorted(similarites,key=similarites.get,reverse=True)[:5]
    

In [3]:
# Read sample csv file to Data Frame
df = pd.read_csv("fine_food_reviews_with_embeddings_1k.csv") 

# Embeddings were read as strings, convert to float
df["embedding"] = df["embedding"].apply(lambda x: np.array([float(val) for val in x[1:-1].split(",")]))

In [4]:
# Result
df["embedding"]

0      [0.03599238395690918, -0.02116263099014759, -0...
1      [-0.07042013108730316, -0.03175969794392586, -...
2      [0.05692615360021591, -0.005402443464845419, 0...
3      [-0.011223138310015202, -0.049720242619514465,...
4      [0.05692615360021591, -0.005402443464845419, 0...
                             ...                        
995    [-0.04803164303302765, 0.04621649533510208, 0....
996    [0.02654704451560974, -0.027484629303216934, -...
997    [-0.011052397079765797, -0.029021456837654114,...
998    [-0.0058358414098620415, 0.021213747560977936,...
999    [0.019206926226615906, -0.019285108894109726, ...
Name: embedding, Length: 1000, dtype: object

In [18]:
generate_embedding("")

tensor([ 0.0937627 , -0.21216166, -0.05695323,  0.08791763,  0.04480417,
         0.08753118, -0.31843573,  0.29969284, -0.08279715,  0.09236182,
         0.0506251 ,  0.19998845, -0.25467128, -0.09400424,  0.12405081,
         0.38625789, -0.27727866,  0.16491802, -0.04731611,  0.11380985,
         0.45330715,  0.09023634, -0.10173325, -0.18713894,  0.24462356], requires_grad=True)

In [5]:
df.columns

Index(['Unnamed: 0', 'ProductId', 'UserId', 'Score', 'Summary', 'Text',
       'combined', 'n_tokens', 'embedding'],
      dtype='object')

In [19]:
# Lets test it out!!

for i,ID in enumerate(show_similarities("York College",df)):
    similar_text = df[df["ProductId"] == ID]["Text"].values
    print(f"{i+1}) {similar_text[0]}\n")
    

1) I absolutely love this hot cocoa! Our office purchased a Keurig brewer about a year ago, and while the brewer is fancy and very nice, I never had an opportunity to use it until now. You see, I'm not a coffee drinker at all. Coffee makes me jittery all day and I feel like my bones and veins are vibrating. Not what I need when I work with high schoolers all day! This hot chocolate k-cup is a perfect treat for me, especially now that the mornings in Southern California are a little cooler and crisper. The chocolate flavor is well balanced, not too rich and definitely not too "pasty". I hate it when the powder mixes have leave the residue in the bottom of the cup. I usually make my hot chocolate with milk when I use powder, but with this k-cup, water is just fine. I definitely plan on buying more all year!

2) Great coffee!  Love all Green Mountain coffee and all the wonderful flavors.  Would and do recommend this coffee to all my friends.

3) I purchased this to send to my son who's aw

# Now with <span style="color: #a742e8;">Quantum</span>

## In this test we will use the angle encoding to embed the vectorized data

#### We will use this to compare what is and is not a fruit

In [7]:
# Functions for Quantum Embedding portion of test

# Amplitude Encoding
wires = range(25) # Quantum states >25 will freeze my computer

amp_dev = qml.device("default.qubit",wires) # Creating the quantum system for Amplitude Embedding
angle_dev = qml.device("default.qubit",wires) # Creating the quantum system for Angle Embedding

# Pass in encoder function object 
@qml.qnode(amp_dev) 
def amp_encoder(data):
    qml.AmplitudeEmbedding(data,wires,pad_with=0,normalize=True)
    return qml.state()

## Angle Encoding
@qml.qnode(angle_dev)
def angle_encoder(data):
    qml.AngleEmbedding(features=data,wires=wires,rotation="X")
    return qml.state()

def QuantumEmbedder(term): # Converts a string or a collection of strings into quantum embeddings
    if "__iter__" in dir(term) and not isinstance(term,str):
        embedding = list(map(generate_embedding,term))
        embedding = list(map(amp_encoder,embedding))
    else:
        embedding = amp_encoder(generate_embedding(str(term)))
    return embedding

def quantum_similarity(compare,comparand): # Perform comparison 
    test = {key:cosine_similarity(comparand.get(compare),val) for key,val in comparand.items()}
    return sorted(test,key=test.get,reverse=True)[1:6]
    
    
    

In [8]:
terms = ["homework","tiktok","Essay","ChatGPT","All-Nighter","Energy Drinks","Finals Week","Stress Eating","Midterm","Study Break"]

In [12]:
#embeddings = list(map(generate_embedding,concepts))
#embeddings = list(map(amp_encoder,embeddings))

embeddings = QuantumEmbedder("terms")

In [13]:
results = dict(zip(terms,embeddings))
results

{'homework': tensor(-0.11038017+0.j, requires_grad=True),
 'tiktok': tensor(0.12338762+0.j, requires_grad=True),
 'Essay': tensor(0.18366161+0.j, requires_grad=True),
 'ChatGPT': tensor(-0.12302122+0.j, requires_grad=True),
 'All-Nighter': tensor(0.00472608+0.j, requires_grad=True),
 'Energy Drinks': tensor(0.04161012+0.j, requires_grad=True),
 'Finals Week': tensor(-0.09535747+0.j, requires_grad=True),
 'Stress Eating': tensor(-0.12494485+0.j, requires_grad=True),
 'Midterm': tensor(0.09434985+0.j, requires_grad=True),
 'Study Break': tensor(-0.01931653+0.j, requires_grad=True)}

In [21]:
#cosine_similarity(results["Finals Week"],results["All-Nighter"])
for i,res in enumerate(quantum_similarity("Finals Week",results)):
    print(f"{i+1}) {res}") 


1) Finals Week
2) Stress Eating
3) Study Break
4) homework
5) tiktok


### Not that great since I could only work with maximun 25 dimenstions, but hopefully will improve overtime when more progress is made in the Quantum space.