From Conference Talk: Beyond the basics of Retrieval for Augmenting Generation (w/ Ben Clavié)
Mastering LLMs: A Conference For Developers & Data Scientists

[Check it out](https://maven.com/parlance-labs/fine-tuning?utm_campaign=4f3c51&utm_medium=partner&utm_source=instructor)

### Load libraries

In [1]:
from sentence_transformers import SentenceTransformer
from wikipediaapi import Wikipedia
import numpy as np
import tqdm
import os


In [2]:
model = SentenceTransformer("Alibaba-NLP/gte-base-en-v1.5", trust_remote_code=True)



Fetch some text

In [3]:
wiki = Wikipedia('RAGBOT/0.0', 'en')
doc = wiki.page("Albert Einstein").text
paragraphs = doc.split("\n\n")
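
Splitting on blank lines may leave empty strings or very short fragments (for example stray headings) in `paragraphs`, and those add little to retrieval. A minimal sketch of one way to filter them before embedding; the helper name `split_paragraphs`, the `min_chars` threshold, and the sample text are assumptions for illustration, not part of the original notebook.

```python
def split_paragraphs(text: str, min_chars: int = 40) -> list[str]:
    """Split text on blank lines and drop fragments shorter than min_chars."""
    chunks = (chunk.strip() for chunk in text.split("\n\n"))
    return [chunk for chunk in chunks if len(chunk) >= min_chars]


# Made-up sample: a short heading, a blank gap, and two real paragraphs.
sample = (
    "Early life\n\n"
    "Albert Einstein was born in Ulm, in the Kingdom of Württemberg.\n\n\n\n"
    "He later moved to Munich with his family, where he attended school."
)
cleaned = split_paragraphs(sample)
```

The threshold is a judgment call: set it too high and short but useful paragraphs disappear, so it is worth inspecting what gets dropped on real data.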

Embed the text

In [4]:
docs_embed = model.encode(paragraphs, normalize_embeddings=True)

Embed the query

In [5]:
query = "Where did Einstein study physics?"
query_embed = model.encode(query, normalize_embeddings=True)

Find the three closest paragraphs

In [6]:
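# Embeddings are normalized, so the dot product equals cosine similarity.
similarity = np.dot(docs_embed, query_embed)
# Indices of the three highest scores, best match first.
top_3_idx = np.argsort(similarity)[-3:][::-1]
most_similar_documents = [paragraphs[idx] for idx in top_3_idx]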
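
Because `normalize_embeddings=True` returns unit-length vectors, the dot product used here is the same quantity as cosine similarity. The sketch below checks that on toy 2-D vectors and wraps the top-k selection in a small function; the vectors are made up for illustration (they are not model output) and `top_k` is a hypothetical helper name.

```python
import numpy as np


def top_k(doc_vectors: np.ndarray, query_vector: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k rows most similar to query_vector, best first."""
    scores = doc_vectors @ query_vector        # one dot product per document
    return np.argsort(scores)[-k:][::-1]       # ascending sort, keep last k, reverse


# Toy 2-D "embeddings", normalized to unit length like the real ones.
raw_docs = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [-1.0, 0.0]])
toy_docs = raw_docs / np.linalg.norm(raw_docs, axis=1, keepdims=True)
toy_query = np.array([2.0, 1.0])
toy_query = toy_query / np.linalg.norm(toy_query)

toy_top = top_k(toy_docs, toy_query, k=2)

# Explicit cosine formula on the unnormalized documents, for comparison.
cosine = (raw_docs @ toy_query) / np.linalg.norm(raw_docs, axis=1)
```

A full `argsort` is fine for a few hundred paragraphs; for much larger collections a partial sort such as `np.argpartition`, or a vector index, would likely be a better fit.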

In [7]:
# Print the most similar documents
for doc in most_similar_documents:
    print(doc)

Albert Einstein ( EYEN-styne; German: [ˈalbɛɐt ˈʔaɪnʃtaɪn] ; 14 March 1879 – 18 April 1955) was a German-born theoretical physicist who is widely held to be one of the greatest and most influential scientists of all time. Best known for developing the theory of relativity, Einstein also made important contributions to quantum mechanics, and was thus a central figure in the revolutionary reshaping of the scientific understanding of nature that modern physics accomplished in the first decades of the twentieth century. His mass–energy equivalence formula E = mc2, which arises from relativity theory, has been called "the world's most famous equation". He received the 1921 Nobel Prize in Physics "for his services to theoretical physics, and especially for his discovery of the law of the photoelectric effect", a pivotal step in the development of quantum theory. His work is also known for its influence on the philosophy of science.
Born in the German Empire, Einstein moved to Switzerland in 

In [8]:
import lancedb
from lancedb.pydantic import LanceModel, Vector
from lancedb.embeddings import get_registry
from lancedb.rerankers import CohereReranker
import os

In [9]:
model_registry = get_registry()
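# Note: bge-small-en-v1.5 produces 384-dimensional vectors. The cells shown here
# do not use lance_model; the stored vectors come from `model` (768 dimensions).
lance_model = model_registry.get("sentence-transformers").create(name="BAAI/bge-small-en-v1.5")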

In [10]:
class Document(LanceModel):
    text: str
    # Must match the embedding size of gte-base-en-v1.5 (768 dimensions).
    vector: Vector(768)
    category: str

In [12]:
db = lancedb.connect(".my_db")

In [13]:
# Raises an error if "my_table" already exists; pass mode="overwrite" to replace it on re-runs.
tbl = db.create_table("my_table", schema=Document)

In [14]:
paragraph_embeddings = model.encode(paragraphs, normalize_embeddings=True)
documents = [Document(text=paragraph, vector=embedding.tolist(), category="biography") for paragraph, embedding in zip(paragraphs, paragraph_embeddings)]
tbl.add(documents)

In [16]:
tbl.create_fts_index("text")

In [17]:
# Requires a Cohere API key, for example via the COHERE_API_KEY environment variable.
reranker = CohereReranker()

In [21]:
query_text = "What year did Einstein win the Nobel Prize?"
query_embedding = model.encode(query_text, normalize_embeddings=True)
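
The cells above prepare both a vector column and a full-text index, which is the setup for hybrid search: a vector ranking and a keyword ranking are combined, and the result can then be reordered by a reranker such as the `CohereReranker` created earlier. The search call itself is not shown here. As a library-independent illustration of the combination step only, the sketch below uses reciprocal rank fusion, a simpler technique than the Cohere reranker and not the one the notebook uses. The function name, the paragraph IDs, and the constant `k=60` are illustrative assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked ID lists; each hit contributes 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical result lists for one query (IDs are made up).
vector_hits = ["p3", "p1", "p7"]    # from embedding similarity
keyword_hits = ["p1", "p9", "p3"]   # from full-text search
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Paragraphs that appear in both lists rise to the top, which is the behaviour hybrid search is after: a keyword match on "Nobel Prize" and a semantic match on the question reinforce each other.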