# Simple Encoder+FAISS
Embeddings are vector representations of words or sentences. LLMs use them to represent language in a numeric form that a model can learn from. Because these vectors capture contextual meaning, many interesting operations can be performed on them.

For instance, in this notebook I’ve encoded a set of 5 events and 5 queries about these events into embeddings using the lightweight all-MiniLM-L6-v2 encoder model.

In [None]:
!pip install faiss-cpu sentence-transformers

In [2]:
from sentence_transformers import SentenceTransformer
import faiss

In [3]:
events = [
    "The player entered the dark cave.",
    "The dragon sleeps on a pile of gold.",
    "The merchant offered a healing potion.",
    "The knight swore loyalty to the kings.",
    "The village was attacked by bandits."
]

In [4]:
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(events, convert_to_numpy=True, normalize_embeddings=True)
print("Embedding shape:", embeddings.shape)

Embedding shape: (5, 384)


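Now we need a way to store these vectors and query them. FAISS is a similarity-search library (rather than a full database) that indexes vectors so nearest neighbors can be found with one call, instead of looping over every vector by hand. Here I use IndexFlatIP, an exact index that scores by inner product.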

In [5]:
d = embeddings.shape[1]
index = faiss.IndexFlatIP(d)
index.add(embeddings)
print("Number of vectors indexed:", index.ntotal)

Number of vectors indexed: 5
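
To make concrete what the index saves us from writing, here is a minimal pure-Python sketch of the manual loop: score every stored vector against the query by inner product and keep the top k. The 2-D vectors and the helper names (`dot`, `brute_force_search`) are made up for illustration; the real embeddings have 384 dimensions.

```python
# Toy stand-ins for embeddings: three unit-length 2-D vectors (hypothetical values).
vectors = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.6, 0.8],
]

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def brute_force_search(query, vectors, k):
    """Score every stored vector against the query; return the top-k (score, index) pairs."""
    scored = [(dot(query, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

top = brute_force_search([0.8, 0.6], vectors, k=2)
for score, idx in top:
    print(f"index={idx} score={score:.2f}")
```

IndexFlatIP performs the same exhaustive comparison, only in optimized native code and over whole batches of queries at once.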


Since the embeddings are normalized vectors, I can use FAISS to compare them by inner product, which for unit-length vectors is the cosine similarity (and gives the same ranking as the shortest Euclidean distance). This tells us which event answers which query, because similar sentences end up close to each other in this vector space!
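
Because the event embeddings were normalized when encoded, the inner product that IndexFlatIP computes coincides with cosine similarity, and ranking by it is equivalent to ranking by Euclidean distance. The sketch below checks both facts on two made-up 2-D vectors; the helper names are hypothetical and nothing here touches the real model.

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale a vector to unit length (L2 norm of 1)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def cosine(a, b):
    """Cosine similarity of two vectors of any length."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a = [3.0, 4.0]
b = [4.0, 3.0]
a_unit, b_unit = normalize(a), normalize(b)

cos_raw = cosine(a, b)          # cosine on the raw vectors
ip_unit = dot(a_unit, b_unit)   # plain inner product on the unit vectors
sq_dist = sum((x - y) ** 2 for x, y in zip(a_unit, b_unit))

# For unit vectors: inner product == cosine, and ||a - b||^2 == 2 - 2 * (a . b)
print(math.isclose(cos_raw, ip_unit))
print(math.isclose(sq_dist, 2 - 2 * ip_unit))
```

Since the squared distance is just 2 minus twice the inner product, the highest inner product and the smallest Euclidean distance always pick the same neighbor for normalized vectors.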

In [6]:
queries = [
    "Who attacked the town?",
    "Who owns the treasure?",
    "Where did the player go?",
    "Who helps with healing?",
    "Who did the knight swore loyalty to?"
]

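for q in queries:
    # Normalize the query like the events so the inner product is a cosine similarity
    query_vec = model.encode([q], convert_to_numpy=True, normalize_embeddings=True)
    D, I = index.search(query_vec, k=2)  # D: scores, I: indices of the top-2 events
    print("Query:", q)
    for rank, idx in enumerate(I[0]):
        print(f"Match {rank+1}: {events[idx]} (cosine similarity={D[0][rank]:.2f})")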


Query: Who attacked the town?
Match 1: The village was attacked by bandits. (cosine similarity=0.68)
Match 2: The player entered the dark cave. (cosine similarity=0.23)
Query: Who owns the treasure?
Match 1: The player entered the dark cave. (cosine similarity=0.24)
Match 2: The knight swore loyalty to the kings. (cosine similarity=0.19)
Query: Where did the player go?
Match 1: The player entered the dark cave. (cosine similarity=0.43)
Match 2: The merchant offered a healing potion. (cosine similarity=0.20)
Query: Who helps with healing?
Match 1: The merchant offered a healing potion. (cosine similarity=0.53)
Match 2: The dragon sleeps on a pile of gold. (cosine similarity=0.11)
Query: Who did the knight swore loyalty to?
Match 1: The knight swore loyalty to the kings. (cosine similarity=0.87)
Match 2: The merchant offered a healing potion. (cosine similarity=0.24)
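
The scores themselves carry useful information: a weak best match can be flagged with a minimum-similarity cutoff. The sketch below applies one to the top-1 scores copied from the output above. The 0.30 threshold is an assumption chosen only for illustration, not a tuned value.

```python
# Top-1 similarity per query, copied from the printed results.
top_scores = {
    "Who attacked the town?": 0.68,
    "Who owns the treasure?": 0.24,
    "Where did the player go?": 0.43,
    "Who helps with healing?": 0.53,
    "Who did the knight swore loyalty to?": 0.87,
}

MIN_SCORE = 0.30  # hypothetical cutoff; would need tuning on real data

# Queries whose best match falls below the cutoff are treated as "no confident answer".
low_confidence = [q for q, score in top_scores.items() if score < MIN_SCORE]
print(low_confidence)
```

With this cutoff, only the treasure query is flagged, while the four other queries keep their top match.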


As we can see, the results are not bad! One thing I noticed: the model failed on the dragon-treasure query, where neither match is the dragon sleeping on its pile of gold. That suggests this simple setup mostly captures surface similarity rather than reasoning or inference (such as linking "treasure" to "gold" and to ownership). Still, it is impressive that all of the tokenization, embedding lookup, attention, and pooling happens inside a single call to model.encode.
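
To make the pooling step less of a black box, here is a toy sketch of mean pooling followed by L2 normalization, which, as I understand this model's pipeline, are the final stages applied to the transformer's per-token outputs. The token vectors are made-up 2-D numbers and the helper names are hypothetical; the real model produces 384-dimensional vectors.

```python
import math

def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask is 1, ignoring padding positions."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    dim = len(token_vectors[0])
    return [sum(vec[j] for vec in kept) / len(kept) for j in range(dim)]

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

# Hypothetical per-token vectors for a 3-token sentence padded to length 4.
token_vectors = [
    [1.0, 0.0],
    [3.0, 4.0],
    [2.0, 2.0],
    [9.0, 9.0],  # padding position, masked out below
]
attention_mask = [1, 1, 1, 0]

pooled = mean_pool(token_vectors, attention_mask)
sentence_vec = l2_normalize(pooled)
print(pooled)
print([round(x, 4) for x in sentence_vec])
```

The mask matters: without it, the padding vector would drag the average toward values that have nothing to do with the sentence.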

In conclusion, we turned sentences into vectors in a shared mathematical space and searched that space by similarity, which is pretty awesome!