✅ What This Code Does Well

* Document and Query Embeddings:
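    * Mocks three documents with mutually orthogonal one-hot embeddings.
    * The query embedding [0, 0, 0.85] points in exactly the same direction as the third document's embedding. Its cosine similarity is therefore 1.0 with that document and 0.0 with the other two, so the expected retrieval result is known in advance.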

* Cosine Similarity for Retrieval:
    * Uses cosine_similarity from sklearn.metrics.pairwise, a solid choice for comparing the direction of embedding vectors independently of their magnitude.

* Retrieval Logic:
    * Computes cosine similarity for each document.
    * Uses np.argmax to find the most relevant document — simple and effective.

* Response Generation:
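    * A mocked generation step that inserts the retrieved text into a templated answer, mimicking how a language model conditions on retrieved context. The answer itself is hard-coded and the query argument is not used.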
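The similarity claims above can be checked by hand. The sketch below computes cosine similarity directly from its definition, cos(u, v) = u·v / (‖u‖ ‖v‖), with plain NumPy instead of scikit-learn. The helper name `cosine` is hypothetical and not part of either library.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity from its definition: dot product over the product of L2 norms."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


# Same mock embeddings as the notebook cell below
doc_embeddings = [np.array([1, 0, 0]), np.array([0, 1, 0]), np.array([0, 0, 1])]
query_embedding = np.array([0, 0, 0.85])

scores = [cosine(query_embedding, d) for d in doc_embeddings]
print([round(s, 6) for s in scores])  # [0.0, 0.0, 1.0]

# Scaling the query changes its length but not its direction, so the score is unchanged
print(round(cosine(query_embedding * 10, doc_embeddings[2]), 6))  # 1.0
```

Because only direction matters, the 0.85 in the mock query has no effect on which document is retrieved.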

In [29]:
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Step 1: Mock "documents" and their "embeddings"
documents = [
    {
        "text": "Python is a popular programming language for AI.",
        "embedding": np.array([1, 0, 0]),
    },
    {
        "text": "Machine learning involves training models on data.",
        "embedding": np.array([0, 1, 0]),
    },
    {
        "text": "Transformers are deep learning models used in NLP.",
        "embedding": np.array([0, 0, 1]),
    },
]


# Step 2: Mock query and its "embedding"
query_embedding = np.array([0, 0, 0.85])


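# Step 3: Retrieve the most relevant document using cosine similarity
def retrieve_top_document(query_embedding, documents):
    # cosine_similarity expects 2-D inputs and returns an (n_queries, n_docs)
    # matrix, so each 1 x 1 result is unpacked with [0][0]
    similarities = [
        cosine_similarity([query_embedding], [document["embedding"]])[0][0]
        for document in documents
    ]
    # Index of the highest-scoring document (ties resolve to the first one)
    top_index = np.argmax(similarities)
    print(f"Top Index: {top_index}")
    return documents[top_index]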


top_doc = retrieve_top_document(query_embedding, documents)


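# Step 4: Mock generation (simulate a response that uses the retrieved document)
def generate_response(query, context):
    # Only the retrieved context is interpolated; the query is unused and the
    # answer text is hard-coded, standing in for a real language-model call
    return f"Based on the retrieved document: '{context}', here's an answer about your query: NLP uses models like transformers for language tasks."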


# Step 5: Output the result

query = "Tell me about NLP models"
print("🔍 Query:", query)
response = generate_response(query, top_doc["text"])
print("📄 Retrieved Document:", top_doc["text"])
print("🤖 Generated Response:", response)

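# Shape check: cosine_similarity(A, B) returns an (n_rows_A, n_rows_B) matrix
A = np.array([[1, 0], [0, 1]])  # 2 row vectors
B = np.array([[1, 1]])  # 1 row vector

similarity = cosine_similarity(A, B)
print(similarity.shape)  # (2, 1)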

Top Index: 2
🔍 Query: Tell me about NLP models
📄 Retrieved Document: Transformers are deep learning models used in NLP.
🤖 Generated Response: Based on the retrieved document: 'Transformers are deep learning models used in NLP.', here's an answer about your query: NLP uses models like transformers for language tasks.
(2, 1)
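The shape check at the end of the cell shows that cosine_similarity(A, B) returns one score per pair of rows. That property means the per-document loop in retrieve_top_document could be replaced by a single call over a stacked embedding matrix. The sketch below does this and, as an assumed extension beyond the original, returns the top k documents instead of only the best one; `retrieve_top_k` and `doc_matrix` are hypothetical names.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Same mock corpus as the cell above, split into texts and a stacked embedding matrix
texts = [
    "Python is a popular programming language for AI.",
    "Machine learning involves training models on data.",
    "Transformers are deep learning models used in NLP.",
]
doc_matrix = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])  # shape (n_docs, dim)
query_embedding = np.array([0, 0, 0.85])


def retrieve_top_k(query_vec, doc_matrix, k=2):
    """Hypothetical helper: return (index, score) pairs for the k most similar documents."""
    # A single call scores every document; the result has shape (1, n_docs), so take row 0
    scores = cosine_similarity(query_vec.reshape(1, -1), doc_matrix)[0]
    # Sort by descending score; the stable sort keeps lower indices first on ties
    top = np.argsort(-scores, kind="stable")[:k]
    return [(int(i), float(scores[i])) for i in top]


for index, score in retrieve_top_k(query_embedding, doc_matrix, k=2):
    print(f"{score:.2f}  {texts[index]}")
```

Sorting the negated scores with a stable sort keeps the tie-breaking consistent with np.argmax, which also prefers the lowest index.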


In [30]:
!(uv add sentence-transformers)

^C


[2mResolved [1m169 packages[0m [2min 18.46s[0m[0m
[36m[1mDownloading[0m[39m transformers [2m(11.1MiB)[0m
[36m[1mDownloading[0m[39m torch [2m(230.2MiB)[0m
[36m[1mDownloading[0m[39m sympy [2m(6.0MiB)[0m
[36m[1mDownloading[0m[39m networkx [2m(1.9MiB)[0m
[36m[1mDownloading[0m[39m tokenizers [2m(2.6MiB)[0m
 [32m[1mDownloading[0m[39m networkx
 [32m[1mDownloading[0m[39m tokenizers
 [32m[1mDownloading[0m[39m sympy
 [32m[1mDownloading[0m[39m transformers
  [31m×[0m Failed to download `torch==2.8.0`
[31m  ├─▶ [0mFailed to extract archive: torch-2.8.0-cp313-cp313-win_amd64.whl
[31m  ├─▶ [0mI/O operation failed during extraction
[31m  ╰─▶ [0mFailed to download distribution due to network timeout. Try increasing
[31m      [0mUV_HTTP_TIMEOUT (current value: 30s).
[36m  help: [0mIf you want to add the package regardless of the failed resolution,
        provide the `[32m--frozen[39m` flag to skip locking and syncing.


In [32]:
!uv add sentence-transformers

[2mResolved [1m136 packages[0m [2min 2ms[0m[0m
[2mUninstalled [1m3 packages[0m [2min 112ms[0m[0m
         If the cache and target directories are on different filesystems, hardlinking may not be supported.
[2mInstalled [1m3 packages[0m [2min 1.40s[0m[0m
 [31m-[39m [1mipykernel[0m[2m==6.29.5[0m
 [32m+[39m [1mipykernel[0m[2m==6.30.1[0m
 [31m-[39m [1mjupyter-client[0m[2m==7.4.9[0m
 [32m+[39m [1mjupyter-client[0m[2m==8.6.3[0m
 [31m-[39m [1mnotebook[0m[2m==6.5.7[0m
 [32m+[39m [1mnotebook[0m[2m==7.4.5[0m


In [1]:
from sentence_transformers import SentenceTransformer
import numpy as np

# Load a pre-trained embedding model
model = SentenceTransformer("all-MiniLM-L6-v2")

# Your query
query = "Tell me about NLP models."

# Generate the embedding (a 384-dimensional vector for all-MiniLM-L6-v2)
query_embedding = model.encode(query)

print("Query embedding shape:", query_embedding.shape)
print("First 5 values:", query_embedding[:5])

Query embedding shape: (384,)
First 5 values: [-0.00403581 -0.05510387  0.03066592  0.01598372  0.01516357]
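Real embeddings like the one above can be fed into the same cosine-similarity retrieval as the mock vectors. A useful related fact is that once vectors are scaled to unit length, cosine similarity reduces to a plain dot product; SentenceTransformer.encode accepts a `normalize_embeddings` flag for this. The sketch below checks the equivalence without downloading a model: the random 384-dimensional vectors are assumed stand-ins for `model.encode(...)` output, not real embeddings, and `l2_normalize` is a hypothetical helper.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)
# Random stand-ins for model.encode(...) output: one query and three documents,
# each with 384 dimensions like all-MiniLM-L6-v2 (values are NOT real embeddings)
query_vec = rng.normal(size=384)
doc_vecs = rng.normal(size=(3, 384))


def l2_normalize(x):
    """Scale a vector (or each row of a matrix) to unit L2 length."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


# Cosine similarity as computed by scikit-learn, shape (3,)
sk_scores = cosine_similarity(query_vec.reshape(1, -1), doc_vecs)[0]
# After normalization, a plain matrix-vector product gives the same scores
dot_scores = l2_normalize(doc_vecs) @ l2_normalize(query_vec)

print(np.allclose(sk_scores, dot_scores))  # True
print("Best match:", int(np.argmax(dot_scores)))
```

Normalizing once at indexing time lets larger document collections be scored with a single matrix product.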
