> RAG (Retrieval-Augmented Generation)

- RAG is a framework that helps LLMs give more accurate, up-to-date, and contextually relevant responses.
- Retrieval: retrieve relevant information from a knowledge base or other external source, for example by searching text embeddings stored in a vector store.
- Generation: insert the retrieved information into the prompt so that the LLM can generate a response grounded in it.

In [1]:
from mistralai import Mistral
import requests
import numpy as np
import faiss  # on newer Python versions (3.11, 3.12): pip install faiss-cpu
import os
from getpass import getpass
from dotenv import load_dotenv

In [2]:
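load_dotenv()  # read variables from a local .env file, if one exists

api_key = os.getenv("MISTRAL_KEY")
if not api_key:
    raise ValueError("MISTRAL_KEY is not set in the environment or .env file")
client = Mistral(api_key=api_key)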

In [3]:
response = requests.get('https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt')
response.raise_for_status()  # fail early if the download did not succeed
text = response.text

In [4]:
# Split the document into chunks.
# In a RAG system, splitting the document into smaller chunks makes it easier
# to identify and retrieve the most relevant passages during retrieval.

chunk_size = 2048  # characters per chunk
chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
len(chunks)


37
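
Fixed-size slicing can cut a sentence in half at a chunk boundary, so a passage that answers the question may end up split across two chunks. One common mitigation is to let consecutive chunks overlap. The sketch below is an illustration only and is not part of the original notebook; `chunk_with_overlap` and its `overlap` parameter are hypothetical names, and the default of 200 characters is an arbitrary choice.

```python
def chunk_with_overlap(text, chunk_size=2048, overlap=200):
    """Split text into character chunks; each chunk starts `overlap`
    characters before the previous chunk ends."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


sample = "abcdefghij"
print(chunk_with_overlap(sample, chunk_size=4, overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

With `overlap=0` the function reduces to the plain slicing used in the cell above.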

In [14]:
# Create an embedding for each text chunk.

def get_text_embedding(input_text):
    """Return the mistral-embed embedding vector for a single string."""
    embeddings_batch_response = client.embeddings.create(
        model="mistral-embed",
        inputs=input_text
    )
    return embeddings_batch_response.data[0].embedding


if not chunks:
    raise ValueError("Input 'chunks' is empty")

#collect embeddings into a list
embeddings_list = [get_text_embedding(chunk) for chunk in chunks]
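
The list comprehension above issues one API request per chunk. The `inputs` argument of the embeddings endpoint also accepts a list of strings, so chunks can be sent in groups instead. The sketch below shows only the batching logic, with the embedding call injected as a parameter so that it runs offline; `embed_in_batches`, `embed_batch`, and `fake_embed_batch` are hypothetical names, and the batch size of 16 is an arbitrary assumption.

```python
def embed_in_batches(texts, embed_batch, batch_size=16):
    """Embed `texts` in groups of `batch_size`, preserving input order.

    `embed_batch` is any callable that maps a list of strings to a list
    of vectors, one vector per string.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    vectors = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        result = embed_batch(batch)
        if len(result) != len(batch):
            raise ValueError("embed_batch returned the wrong number of vectors")
        vectors.extend(result)
    return vectors


# Stand-in embedder so the sketch runs without an API key:
# each "vector" is [length of the string, number of spaces].
def fake_embed_batch(batch):
    return [[float(len(t)), float(t.count(" "))] for t in batch]


demo = embed_in_batches(["a b", "cd", "e f g"], fake_embed_batch, batch_size=2)
print(demo)  # [[3.0, 1.0], [2.0, 0.0], [5.0, 2.0]]
```

With the notebook's client, `embed_batch` could be a small wrapper that calls `client.embeddings.create(model="mistral-embed", inputs=batch)` and returns `[item.embedding for item in response.data]`. Check the API's request-size limits before choosing a batch size.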



In [15]:
# Stack the per-chunk embeddings into a 2-D float32 array of shape (n_chunks, dim),
# which is the layout FAISS expects.
text_embeddings = np.array(embeddings_list, dtype=np.float32)

In [16]:
# Load the embeddings into a vector index.
# Once we have the text embeddings, a common practice is to store them in a vector database for efficient search and retrieval.
# Factors to consider when selecting a vector database include speed, scalability,
# cloud management, advanced filtering, and open-source vs. closed-source licensing.
# Here we use a FAISS IndexFlatL2 index, which performs exact (brute-force) L2 search in memory.
print(text_embeddings)
print(type(text_embeddings))
print(text_embeddings.shape)
d=text_embeddings.shape[1]
index=faiss.IndexFlatL2(d)
index.add(text_embeddings)

[[-0.03979492  0.07733154  0.00013709 ... -0.01274109 -0.02101135
  -0.00264168]
 [-0.03152466  0.07226562  0.02961731 ... -0.01079559 -0.01189423
  -0.00821686]
 [-0.05905151  0.06112671  0.01206207 ... -0.0226593   0.00488663
  -0.00665283]
 ...
 [-0.05477905  0.06890869  0.02703857 ... -0.02456665 -0.02526855
  -0.02687073]
 [-0.03884888  0.05587769  0.04718018 ... -0.01812744  0.00926208
  -0.00866699]
 [-0.03048706  0.05831909  0.01704407 ... -0.01620483 -0.01800537
  -0.04415894]]
<class 'numpy.ndarray'>
(37, 1024)
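
`IndexFlatL2` does no clustering or compression: a search compares the query against every stored vector and ranks the results by squared Euclidean distance. The pure-Python sketch below illustrates that ranking on toy data. It is not FAISS's implementation, and `flat_l2_search` is a hypothetical name.

```python
def flat_l2_search(vectors, query, k):
    """Return (distances, indices) for the k stored vectors closest to
    `query`, ranked by squared Euclidean distance, smallest first."""
    scored = []
    for idx, vec in enumerate(vectors):
        dist = sum((a - b) ** 2 for a, b in zip(vec, query))
        scored.append((dist, idx))
    scored.sort()  # sorts by distance, then by index on ties
    top = scored[:k]
    return [d for d, _ in top], [i for _, i in top]


stored = [[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]]
distances, indices = flat_l2_search(stored, query=[1.0, 1.0], k=2)
print(distances, indices)  # [1.0, 2.0] [1, 0]
```

Exhaustive search like this is exact and is fast enough for the 37 vectors in this notebook; approximate index types become relevant only for much larger collections.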


In [25]:
# Create an embedding for the question.

#question = "What were the two main things the author worked on before college?"
#question="What is the meaning of the name Nicole"
question = "Give me the general gist of the essay"
# FAISS expects a 2-D float32 array of shape (n_queries, dim), matching the indexed vectors.
question_embeddings = np.array([get_text_embedding(question)], dtype=np.float32)

In [26]:
# Search the index with index.search, which takes two arguments: the question
# embedding(s) and k, the number of nearest vectors to retrieve.

D, I = index.search(question_embeddings, k=2)  # D: distances, I: indices of the nearest chunks
retrieved_chunk = [chunks[i] for i in I.tolist()[0]]
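
Note that `retrieved_chunk` is a list of strings. If a list is interpolated directly into an f-string prompt, Python inserts its `repr`, including brackets, quotes, and escaped `\n` sequences. Joining the chunks into plain text first gives the model cleaner context. The sketch below is one possible way to do that; `build_prompt` is a hypothetical helper, and its wording copies the prompt template used in this notebook.

```python
def build_prompt(context_chunks, question):
    """Assemble a RAG prompt from retrieved chunks and a user question."""
    # Separate chunks with a blank line instead of relying on list repr.
    context = "\n\n".join(chunk.strip() for chunk in context_chunks)
    return (
        "Context information is below.\n"
        "---------------------\n"
        f"{context}\n"
        "---------------------\n"
        "Given the context information and not prior knowledge, "
        "answer the query.\n"
        f"Query: {question}\n"
        "Answer:"
    )


example = build_prompt(["First chunk.\n", "Second chunk."], "What is this?")
print(example)
```

In the notebook this would be called as `build_prompt(retrieved_chunk, question)`.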

In [27]:
prompt = f"""
Context information is below.
---------------------
{retrieved_chunk}
---------------------
Given the context information and not prior knowledge, answer the query.
Query: {question}
Answer:
"""

In [28]:
def run_mistral(user_message, model="mistral-large-latest"):
    messages = [
        {
            "role": "user", "content": user_message
        }
    ]
    chat_response = client.chat.complete(
        model=model,
        messages=messages
    )
    return chat_response.choices[0].message.content

run_mistral(prompt)

"The essay discusses the author's realization of the power of publishing content online, particularly essays, and how this democratized access to an audience. The author recounts his experience of sharing a talk on Lisp programming, which unexpectedly gained significant attention after being posted on Slashdot. This event made him realize that the web allowed anyone to publish and reach a wide audience, bypassing traditional gatekeepers like editors and publishers.\n\nThe author reflects on the limitations of print media, which restricted the publication of essays to a select few, and highlights how the web opened up new opportunities for a broader range of writers. He sees this as a turning point in his career, deciding to continue writing essays online alongside other work.\n\nAdditionally, the author touches on his experiences with painting and working at a software company, highlighting lessons learned about attention to detail and the dynamics of software development. Overall, the