### Open Source Embeddings (Hugging Face)

In [1]:
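import os

from dotenv import load_dotenv

# Load environment variables from app/.env
# (the OpenAI section below reads OPENAI_API_KEY from the environment)
app_dir = os.path.join(os.getcwd(), "app")
load_dotenv(os.path.join(app_dir, ".env"))

# Read the file explicitly as UTF-8; the platform default encoding can
# garble non-ASCII characters such as curly quotes and dashes
with open("./data/restaurant.txt", encoding="utf-8") as f:
    raw_data = f.read()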

In [2]:
from langchain_text_splitters import CharacterTextSplitter

text_splitter = CharacterTextSplitter(
    separator="\n",
    chunk_size=200,
    chunk_overlap=20,
    length_function=len,
    is_separator_regex=False,
)
texts = text_splitter.split_text(raw_data)
texts

Created a chunk of size 329, which is longer than the specified 200
Created a chunk of size 331, which is longer than the specified 200
Created a chunk of size 291, which is longer than the specified 200
Created a chunk of size 376, which is longer than the specified 200
Created a chunk of size 291, which is longer than the specified 200


['In the charming streets of Palermo, tucked away in a quaint alley, stood Chef Amico, a restaurant that was more than a mere eatery—it was a slice of Sicilian heaven. Founded by Amico, a chef whose name was synonymous with passion and creativity, the restaurant was a mosaic of his life’s journey through the flavors of Italy.',
 'Chef Amico’s doors opened to a world where the aromas of garlic and olive oil were as welcoming as a warm embrace. The walls, adorned with photos of Amico’s travels and family recipes, spoke of a rich culinary heritage. The chatter and laughter of patrons filled the air, creating a symphony as delightful as the dishes served.',
 "One evening, as the sun cast a golden glow over the city, a renowned food critic, Elena Rossi, stepped into Chef Amico. Her mission was to uncover the secret behind the restaurant's growing fame. She was greeted by Amico himself, whose eyes sparkled with the joy of a man who loved his work.",
 'Elena was led to a table adorned
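
The "Created a chunk of size ..." warnings appear because `CharacterTextSplitter` only splits at the given separator: a paragraph with no `"\n"` inside it cannot be cut further, so it is kept whole even when it exceeds `chunk_size`. The sketch below is a simplified illustration of that behavior, not LangChain's actual implementation; it ignores `chunk_overlap`, and `split_on_separator` is a hypothetical helper written for this example.

```python
def split_on_separator(text: str, separator: str, chunk_size: int) -> list[str]:
    """Greedily merge separator-delimited pieces without ever cutting inside a piece."""
    chunks: list[str] = []
    current = ""
    for piece in text.split(separator):
        if not piece:
            continue  # skip empty pieces produced by consecutive separators
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate  # the piece still fits in the current chunk
        else:
            if current:
                chunks.append(current)
            current = piece  # starts a new chunk; may exceed chunk_size on its own
    if current:
        chunks.append(current)
    return chunks


# Toy input: one 250-character line sits between two short lines
sample = "short line\n" + "x" * 250 + "\nanother short line"
chunks = split_on_separator(sample, "\n", 200)
chunk_lengths = [len(c) for c in chunks]
print(chunk_lengths)
```

If chunks must stay under the limit, one option is `RecursiveCharacterTextSplitter` from the same package, which falls back to finer separators when a piece is too long.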

In [4]:
#  pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("paraphrase-MiniLM-L6-v2")

embeddings_huggingface = model.encode(texts)

In [8]:
len(embeddings_huggingface[0])

384

In [9]:
embeddings_huggingface[0]

array([ 1.26175299e-01,  4.17751372e-01, -1.12065554e-01,  4.60001715e-02,
       -3.20104927e-01, -2.77941644e-01,  1.84422106e-01, -1.41149282e-01,
        9.47148502e-02, -1.57854911e-02,  2.84208089e-01, -2.01127931e-01,
       -5.12925945e-02,  1.25655100e-01,  2.72517800e-01, -3.62387359e-01,
        2.35507593e-01, -8.82827267e-02,  2.03624487e-01,  4.81934659e-02,
       -3.39868218e-02, -1.03866860e-01, -9.32260379e-02,  2.22075149e-01,
        3.85922343e-01, -1.90588236e-01,  3.89328927e-01,  2.90763795e-01,
       -6.22040778e-02, -6.92467168e-02,  1.97223008e-01, -1.65435120e-01,
        1.78786322e-01, -2.32760683e-02, -1.31499574e-01,  2.63680458e-01,
       -7.40772709e-02, -2.39875525e-01,  1.49779111e-01,  2.47150436e-02,
        1.14711925e-01,  1.52374193e-01, -1.07586920e-01, -2.28516787e-01,
        1.58248827e-01, -1.97336361e-01,  1.25389859e-01,  1.16207071e-01,
       -3.07202023e-02, -1.14178076e-01, -5.13785779e-01,  5.75091243e-02,
        2.72502713e-02, -
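
Embeddings like these are usually compared with cosine similarity, for example to find the chunk closest to a query. The following is a minimal sketch with toy 3-dimensional vectors standing in for the 384-dimensional ones above; `cosine_similarity` and `most_similar` are hypothetical helpers written for this example, not part of `sentence_transformers`.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query_vec, doc_vecs) -> int:
    """Return the index of the document vector closest to the query."""
    scores = [cosine_similarity(query_vec, v) for v in doc_vecs]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy vectors for illustration only
toy_docs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
toy_query = [0.9, 0.1, 0.0]
best = most_similar(toy_query, toy_docs)
print(best)
```

In the notebook, the same helpers should work on the real vectors, e.g. `most_similar(model.encode("a question about the restaurant"), embeddings_huggingface)`, where the query string is a placeholder.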

### OpenAI Embeddings

In [22]:
from langchain_openai import OpenAIEmbeddings

# embeddings = OpenAIEmbeddings()
embeddings = OpenAIEmbeddings(model="text-embedding-3-small", dimensions=1536)

In [23]:
# embed_documents sends the texts in batches instead of one request per chunk
vectors = embeddings.embed_documents(texts)

In [24]:
vectors

[[-0.01118954736739397,
  -0.056152839213609695,
  -0.034024428576231,
  -0.00023679509467910975,
  0.054739903658628464,
  -0.06973526626825333,
  -0.000929376226849854,
  -0.0019228473538532853,
  -6.398363439075183e-06,
  -0.0760251134634018,
  -0.003671926213428378,
  -0.04986299201846123,
  0.0036206503864377737,
  -0.01532580517232418,
  -0.014767467975616455,
  0.003811510745435953,
  -0.0066202920861542225,
  0.04136258363723755,
  0.05551473796367645,
  -0.015097912400960922,
  0.057292304933071136,
  0.01673874258995056,
  -0.06412909924983978,
  0.009474651888012886,
  0.030674399808049202,
  0.006808303762227297,
  -0.025068232789635658,
  0.03808092325925827,
  0.026002593338489532,
  -0.006933645345270634,
  0.026936955749988556,
  -0.02923867478966713,
  0.036280568689107895,
  -0.03732887655496597,
  0.01684129424393177,
  0.011907409876585007,
  0.027483897283673286,
  -0.04225136712193489,
  -0.017228711396455765,
  -0.019143013283610344,
  -0.04168163239955902,
  0.0

In [25]:
len(vectors[0])

1536
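
The length is 1536 because that is the size requested through `dimensions`, which is also the default size for `text-embedding-3-small`. The parameter matters when a smaller vector is requested. As far as I know, a full-size vector can also be shortened after the fact by truncating it and re-normalizing to unit length. The sketch below shows that step on a toy vector; `shorten_embedding` is a hypothetical helper, and how much retrieval quality survives truncation is something to verify on your own data.

```python
import math


def shorten_embedding(vector, dims: int) -> list[float]:
    """Keep the first `dims` components and rescale the result to unit L2 norm."""
    truncated = list(vector[:dims])
    norm = math.sqrt(sum(x * x for x in truncated))
    if norm == 0:
        return truncated  # an all-zero vector cannot be normalized
    return [x / norm for x in truncated]


# Toy vector for illustration; a real call would pass vectors[0] and e.g. dims=512
short = shorten_embedding([3.0, 4.0, 12.0], 2)
print(short)
```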