# RAG

This notebook uses a vector database to build a Retrieval-Augmented Generation (RAG) pipeline, which improves LLM output by grounding it in an authorized knowledge base.

We are going to take the following approach:
1. Setup
   a. Problem definition
   b. Data
   c. Package stack
2. Connection
3. Data
4. Modeling
5. RAG Setup
6. Evaluation

## 1. Setup
### 1.1. Problem definition
In one statement:
> Given a book in plain-text format, can we answer simple questions about the book?

### 1.2. Data
The authorized knowledge base is "Harry Potter 1" by "J.K. Rowling" retrieved from [Kaggle - Harry Potter Books](https://www.kaggle.com/datasets/santiviquez/hp1txt) for learning purposes.

### 1.3. Package stack
- `pip install weaviate-client transformers accelerate sentence-transformers scikit-learn`

### Preparing the tools

In [186]:
from sklearn.metrics.pairwise import cosine_similarity
from sentence_transformers import SentenceTransformer
from transformers import pipeline
from weaviate.classes.init import Auth
from weaviate.classes.config import Property, DataType
import numpy as np
import weaviate
import os

In [218]:

class bcolors:
    HEADER = '\033[95m'
    OKBLUE = '\033[94m'
    OKCYAN = '\033[96m'
    OKGREEN = '\033[92m'
    WARNING = '\033[93m'
    FAIL = '\033[91m'
    ENDC = '\033[0m'
    BOLD = '\033[1m'
    UNDERLINE = '\033[4m'

## 2. Connection

In [138]:
# --- 2.1: Initialize Weaviate ---
client = weaviate.connect_to_weaviate_cloud(
    cluster_url=os.environ["WEAVIATE_URL"],
    auth_credentials=Auth.api_key(os.environ["WEAVIATE_API_KEY"]),
)

# connect_to_weaviate_cloud() already returns a connected client,
# so no extra client.connect() call is needed here
if client.is_ready():
  print('* successful connection')

if not client.collections.exists("TextChunk"):
  client.collections.create(
    name="TextChunk",
    description="A chunk of text from a book",
    properties=[
        Property(name='content', data_type=DataType.TEXT)
    ]
  )

  print("* collection created")
else:
  print("* collection already exists")

# client.close()

            Please make sure to close the connection using `client.close()`.


* successful connection
* collection created


## 3. Data
### --- 3.1: Load plain text book ---

In [139]:
with open("data/document.txt", "r", encoding="utf-8") as f:
    book_text = f.read()

### --- 3.2: Split the Text into Chunks ---

In [140]:
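def split_into_chunks(text, max_length=500):
    """Split text on whitespace into chunks of at most max_length characters.

    A single word longer than max_length is kept whole in its own chunk.
    """
    words = text.split()
    chunks = []
    current_chunk = []
    current_length = 0  # joined length of current_chunk plus one trailing space

    for word in words:
        # Always accept the first word of a chunk, so an oversized first word
        # does not produce an empty chunk
        if not current_chunk or current_length + len(word) <= max_length:
            current_chunk.append(word)
            current_length += len(word) + 1
        else:
            chunks.append(" ".join(current_chunk))
            current_chunk = [word]
            current_length = len(word) + 1

    # Flush the last, possibly shorter, chunk
    if current_chunk:
        chunks.append(" ".join(current_chunk))

    return chunks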

In [141]:
chunks = split_into_chunks(book_text)
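
Fixed-size chunks are cut wherever the character budget runs out, which is why several retrieved chunks in section 6 start mid-sentence (for example `the creeps. The Restricted Section...`). One common mitigation is to let consecutive chunks share a few words. The sketch below is a hypothetical variant, not the splitter used in this notebook; `split_with_overlap` and `overlap_words` are illustrative names, and the default of 20 words is an arbitrary assumption.

```python
def split_with_overlap(text, max_length=500, overlap_words=20):
    """Split text into chunks of at most max_length characters, repeating the
    last `overlap_words` words of each chunk at the start of the next one.

    Assumes the overlap tail is much shorter than max_length.
    """
    words = text.split()
    chunks = []
    current = []
    current_length = 0  # exact length of " ".join(current)

    for word in words:
        extra = len(word) + (1 if current else 0)
        if current and current_length + extra > max_length:
            chunks.append(" ".join(current))
            # Carry the tail of the finished chunk into the next one
            current = current[-overlap_words:] if overlap_words > 0 else []
            current_length = len(" ".join(current))
            extra = len(word) + (1 if current else 0)
        current.append(word)
        current_length += extra

    if current:
        chunks.append(" ".join(current))
    return chunks


sample = "Mr. and Mrs. Dursley, of number four, Privet Drive, were proud to say " * 5
for piece in split_with_overlap(sample, max_length=80, overlap_words=3):
    print(len(piece), piece)
```

Overlap increases the number of stored chunks, so it trades storage and upload time for context continuity.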

### --- 3.3: Embed and Upload Chunks to Weaviate ---

In [142]:
model = SentenceTransformer("all-MiniLM-L6-v2")  # Use a small and efficient model

collection_object = client.collections.get("TextChunk")

for chunk in chunks:
    vector = model.encode(chunk).tolist()
    props = {"content": chunk}
    collection_object.data.insert(properties=props, vector=vector)

print("Book chunks uploaded to Weaviate!")

Book chunks uploaded to Weaviate!
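
The loop above encodes and inserts one chunk per iteration. `SentenceTransformer.encode` also accepts a list of strings, so chunks can be encoded in groups instead. A minimal sketch of the grouping step follows; `batched` and `batch_size` are hypothetical names, and the embedding and insert calls appear only as comments because they need the model and a live Weaviate connection.

```python
def batched(items, batch_size=64):
    """Yield consecutive slices of `items` with at most `batch_size` elements each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Intended use with the objects defined in this notebook (not executed here):
# for batch in batched(chunks):
#     vectors = model.encode(batch)  # one call per batch, one row per chunk
#     for chunk, vector in zip(batch, vectors):
#         collection_object.data.insert(
#             properties={"content": chunk}, vector=vector.tolist()
#         )

print(list(batched(["a", "b", "c", "d", "e"], batch_size=2)))
```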


## 4. Modeling
### --- 4.1: Set Up Local QA Model ---

In [None]:
# Efficient QA model
# qa_pipeline = pipeline("text2text-generation", model="google/flan-t5-small")  # Try 1: ...
# qa_pipeline = pipeline("text2text-generation", model="google/flan-t5-large")  # Try 2: 3m 47s
qa_pipeline = pipeline("text2text-generation", model="google/flan-t5-xl")  # Try 3: 26m 59s

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]Error while downloading from https://cdn-lfs.hf.co/repos/16/41/... (output truncated)

## 5. RAG Setup
### --- 5.1: RAG System Functions ---


In [206]:
def retrieve_relevant_chunks(query, top_k=3):
    """Retrieve the most relevant text chunks from a collection in Weaviate."""
    query_vector = model.encode(query).tolist()
    collection = client.collections.get("TextChunk")
    chunks = []

    # Iterate through the collection and rank based on vector similarity
    for item in collection.iterator(include_vector=True):
        content = item.properties["content"]
        vector = item.vector["default"]
        
        # Calculate similarity (cosine similarity or equivalent)
        similarity = cosine_similarity([query_vector], [vector])[0][0]
        chunks.append((content, similarity))

    # Sort chunks by similarity and return the top_k
    chunks = sorted(chunks, key=lambda x: x[1], reverse=True)
    top_chunks = [chunk[0] for chunk in chunks[:top_k]]
    
    # Debug: Check retrieved chunks
    print("\nRetrieved Chunks:")
    for i, chunk in enumerate(top_chunks):
        print(f"Chunk {i + 1}: {chunk[:100]}...")  # Show first 100 characters of each chunk

    return top_chunks
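
The function above ranks chunks by cosine similarity, the dot product of two vectors divided by the product of their lengths, and it scans every stored object for each query. The sketch below shows the same ranking step in plain Python on toy two-dimensional vectors. The names `cosine`, `top_k_by_cosine`, and `toy_items` are hypothetical, and the vectors are made up for illustration; real embeddings from `all-MiniLM-L6-v2` have 384 dimensions.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_by_cosine(query_vector, items, top_k=3):
    """Rank (content, vector) pairs by similarity to query_vector, best first."""
    scored = [(content, cosine(query_vector, vector)) for content, vector in items]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy data: made-up vectors, not real embeddings
toy_items = [
    ("chunk about owls", [1.0, 0.0]),
    ("chunk about wands", [0.0, 1.0]),
    ("chunk about owls and wands", [1.0, 1.0]),
]
ranking = top_k_by_cosine([1.0, 0.1], toy_items, top_k=2)
for content, score in ranking:
    print(f"{score:.3f}  {content}")
```

Because this approach downloads every vector on each query, it scales linearly with the number of chunks. The v4 Weaviate client also exposes a server-side search (`collection.query.near_vector`) that may be worth trying as the collection grows.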

## 6. Evaluation

In [232]:
def answer_question(query):
    """Answer a question by retrieving relevant chunks and generating an answer."""
    top = 5
    relevant_chunks = retrieve_relevant_chunks(query, top_k=top)
    context = "\n".join(relevant_chunks[:top])  # Limit context to top chunks

    # Debug: Check context
    print("\nGenerated Context for QA:")
    print(context)

    # Combine context and question into a structured prompt
    prompt = f"""
        You are a helpful assistant. Use the following context from a book to answer the user's question.
        If you don't know the answer, just say that you need more information.

        Context:
        {context}

        Use 5 sentences minimum and keep the answer concise and accurate.

        Question: {query}
        Answer:"""
    
    # Use the local Hugging Face model to generate an answer
    response = qa_pipeline(prompt, max_length=200, truncation=True)
    
    # Handle short or unhelpful answers
    if len(response[0]['generated_text']) < 10:  # If the answer is too short
        fallback_chunk = relevant_chunks[0]  # Use the most relevant chunk
        return f"{response[0]['generated_text']} (Additional context: {fallback_chunk[:200]}...)"

    return response[0]['generated_text']
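
One possible weakness of the prompt above, stated as an assumption rather than a verified cause: five chunks of up to 500 characters each can exceed the input limit of T5-family tokenizers (typically 512 tokens). With `truncation=True` the input is cut from the right, and the question sits at the very end of the prompt, so it may be dropped before the model sees it. The sketch below places the question first and trims the context to a character budget. `build_prompt` and `max_context_chars` are hypothetical names, and 1200 characters is only a rough stand-in for a token budget.

```python
def build_prompt(question, chunks, max_context_chars=1200):
    """Build a prompt with the question first and the context trimmed to a
    character budget, so right-side truncation cannot remove the question."""
    context_parts = []
    used = 0
    for chunk in chunks:
        remaining = max_context_chars - used
        if remaining <= 0:
            break
        piece = chunk[:remaining]
        context_parts.append(piece)
        used += len(piece) + 1  # +1 for the newline that joins the pieces
    context = "\n".join(context_parts)
    return (
        "Answer the question using only the context below.\n"
        f"Question: {question}\n"
        f"Context:\n{context}\n"
        "Answer:"
    )


# Illustrative call with placeholder chunks
example_prompt = build_prompt(
    "Who is Harry Potter?",
    ["first retrieved chunk ...", "second retrieved chunk ..."],
)
print(example_prompt)
```

A stricter version would count tokens with the model's own tokenizer instead of characters.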

### --- 6.1: Test the RAG System ---

In [238]:
# Questions
questions = [
  # questions within the scope of the data
  "What is the main theme of the book?", # 0
  "What is the main theme of the book?", # 1
  "Who is Harry Potter?", # 2
  "What is the main threat?", # 3
  "What is the moral of this story?", # 4
  "Who are the main characters?", # 5
  "Who are Harry Potter's friends?", # 6

  # questions outside the scope of the data
  "Who is Bellatrix Lestrange?", # 7
]


In [239]:
# Answer a question in the scope
client.connect()
if __name__ == "__main__" and client.is_ready():
  questionNo = 0
  answer = answer_question(questions[questionNo])
  print("")
  print(f"{bcolors.UNDERLINE + bcolors.BOLD}Question: {questions[questionNo]}{bcolors.ENDC}")
  print(f"{bcolors.BOLD + bcolors.OKGREEN} Answer: {bcolors.ENDC}{answer}")

client.close()


 Retrieved Chunks:
Chunk 1: the creeps. The Restricted Section was right at the back of the library. Step ping carefully over th...
Chunk 2: young as you, I'm sure it seems incredible, but to Nicolas and Perenelle, it really is like going to...
Chunk 3: plates. But from that moment on, Hermione Granger became their friend. There are some things you can...
Chunk 4: Developments in Wizardry. And then, of course, there was the sheer size of the library; tens of thou...
Chunk 5: from one of the teachers to look in any of the restricted books, and he knew he'd never get one. The...

Generated Context for QA:
the creeps. The Restricted Section was right at the back of the library. Step ping carefully over the rope that separated these books from the rest of the library, he held up his lamp to read the titles. They didn't tell him much. Their peeling, faded gold letters spelled words in languages Harry couldn't understand. Some had no title at all. One book had a dark stain on it that looked

In [240]:
# Answer a question in the scope
client.connect()
if __name__ == "__main__" and client.is_ready():
  questionNo = 2
  answer = answer_question(questions[questionNo])
  print("")
  print(f"{bcolors.UNDERLINE + bcolors.BOLD}Question: {questions[questionNo]}{bcolors.ENDC}")
  print(f"{bcolors.BOLD + bcolors.OKGREEN} Answer: {bcolors.ENDC}{answer}")

client.close()


 Retrieved Chunks:
Chunk 1: pointing at two large ice creams to show he couldn't come in. "That's Hagrid," said Harry, pleased t...
Chunk 2: bought him (chocolate and raspberry with chopped nuts). "What's up?" said Hagrid. "Nothing," Harry l...
Chunk 3: looked like bodyguards. "Oh, this is Crabbe and this is Goyle," said the pale boy carelessly, notici...
Chunk 4: as he unwrapped the frog. "Thanks, Harry... I think I'll go to bed.... D'you want the card, you coll...
Chunk 5: These people will never understand him! He'll be famous -- a legend -- I wouldn't be surprised if to...

Generated Context for QA:
pointing at two large ice creams to show he couldn't come in. "That's Hagrid," said Harry, pleased to know something the boy didn't. "He works at Hogwarts." "Oh," said the boy, "I've heard of him. He's a sort of servant, isn't he?" "He's the gamekeeper," said Harry. He was liking the boy less and less every second. "Yes, exactly. I heard he's a sort of savage -- lives in a hut on the s

In [241]:
# Answer a question that is OUT of scope
client.connect()
if __name__ == "__main__" and client.is_ready():
  questionNo = 7
  answer = answer_question(questions[questionNo])
  print("")
  print(f"{bcolors.UNDERLINE + bcolors.BOLD}Question: {questions[questionNo]}{bcolors.ENDC}")
  print(f"{bcolors.BOLD + bcolors.OKGREEN} Answer: {bcolors.ENDC}{answer}")

client.close()


 Retrieved Chunks:
Chunk 1: his robes. Harry and Ron were delighted to hear Hagrid call Fitch "that old git." "An' as fer that c...
Chunk 2: of Norbert. We'll have to risk it. And we have got the invisibility cloak, Malfoy doesn't know about...
Chunk 3: "What utter rubbish! How dare you tell such lies! Come on -- I shall see Professor Snape about you, ...
Chunk 4: You-Know-Who disappeared. Said they'd been bewitched. My dad doesn't believe it. He says Malfoy's fa...
Chunk 5: last chamber. There was already someone there -- but it wasn't Snape. It wasn't even Voldemort. CHAP...

Generated Context for QA:
his robes. Harry and Ron were delighted to hear Hagrid call Fitch "that old git." "An' as fer that cat, Mrs. Norris, I'd like ter introduce her to Fang sometime. D'yeh know, every time I go up ter the school, she follows me everywhere? Can't get rid of her -- Fitch puts her up to it." Harry told Hagrid about Snape's lesson. Hagrid, like Ron, told Harry not to worry about it, that Snape

### Conclusion

- Weaviate, as a vector database, performed well at retrieving chunks related to the user's query.
- We tried three different models to answer the queries. The first model's answers were short and terse, which makes the results hard to measure. The second model behaved similarly.
- The third model produced noticeably different answers. However, it could not answer accurately in most cases, nor could it handle questions entirely out of scope. It is worth noting that it did not make up answers.
- Possible improvements include refining the prompt and/or the questions, tuning the model parameters, or trying different models.