# Your First RAG Application

## NOTE: REFER TO ORIGINAL NOTEBOOK FOR ANSWERS

- [Pythonic_RAG_Assignment](https://github.com/donbcolab/AIE3/blob/main/Week%202/Day%201/Pythonic_RAG_Assignment.ipynb) - refer to this notebook for answers to assigned questions

## Activity 1 Improvements

### Lessons Learned
- integration with Arxiv
- retrieval of Arxiv metadata
- retrieval of Arxiv PDF documents
- use of Arxiv metadata when generating metadata
- tweaking the prompt to improve evaluation score

### Lessons to be learned
- AVOID RABBIT HOLES
- STAY ON TRACK
- improved use of metadata, and verifying that it provides maximum value
- investigate use of HYDE to improve matching of questions against info in Vector Database
- evaluation and assessment of embedding models
- review patterns [LangChain RAG from Scratch](https://gist.github.com/donbr/5f952f52dcbdf18a8f2dac8aaffd2be4) series
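
As a starting point for the HyDE item above, here is a minimal sketch of the idea: ask an LLM to write a hypothetical answer, then search the vector database with that answer instead of the raw question. The `generate` and `search` callables, the stub functions, and the tiny corpus are all hypothetical stand-ins, not part of the notebook; in practice they might wrap `chat_openai.run` and `vector_db.search_by_text`.

```python
from typing import Callable, List


def hyde_search(
    question: str,
    generate: Callable[[str], str],
    search: Callable[[str, int], List[str]],
    k: int = 5,
) -> List[str]:
    """Search with a generated hypothetical answer rather than the question itself."""
    prompt = f"Write a short passage that answers the question.\nQuestion: {question}"
    hypothetical_answer = generate(prompt)
    return search(hypothetical_answer, k)


# --- Stubs for illustration only (assumptions, not real LLM or vector DB calls) ---
corpus = [
    "Misaligned models can pursue unintended goals.",
    "FAISS builds indexes for vector search.",
    "PDF text extraction can drop whitespace.",
]


def fake_generate(prompt: str) -> str:
    # A real implementation would call a chat model here
    return "Misaligned models may pursue goals that differ from human intent."


def fake_search(text: str, k: int) -> List[str]:
    # Rank documents by word overlap as a crude stand-in for embedding similarity
    words = set(text.lower().split())
    ranked = sorted(
        corpus,
        key=lambda doc: len(words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


hyde_results = hyde_search("What are alignment concerns?", fake_generate, fake_search, k=1)
print(hyde_results)
```

The hypothetical answer tends to share vocabulary with the stored chunks, which is why it can match better than a short question does.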

### Rabbit Holes
- intelligent(?) splitting using TokenTextSplitter # where is strikethrough when you need it!
- switch to FAISS # dude... stay on track  (5 changes do not equal 1)

## Table of Contents:

- Task 1: Imports and Utilities
- Task 2: Documents
- Task 3: Embeddings and Vectors
- Task 4: Prompts
- Task 5: Retrieval Augmented Generation
  - 🚧 Activity #1: Augment RAG
- Task 6: Visibility Tooling
- Task 7: RAG Evaluation Using GPT-4

## Task 1: Imports and Utilities

In [None]:
!pip install -qU numpy matplotlib plotly pandas scipy scikit-learn openai python-dotenv tiktoken PyPDF2 arxiv requests nest_asyncio

In [None]:
from aimakerspace.text_utils import TextFileLoader, CharacterTextSplitter
from aimakerspace.vectordatabase import VectorDatabase
import asyncio

In [None]:
import nest_asyncio
nest_asyncio.apply()

In [None]:
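import os
import openai
from getpass import getpass

# Use the key from the environment if it is set; otherwise prompt for it.
# (Assigning None to os.environ raises a TypeError.)
openai.api_key = os.getenv("OPENAI_API_KEY") or getpass("OpenAI API Key: ")
os.environ["OPENAI_API_KEY"] = openai.api_key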

## Task 2: Documents

### Integration with Arxiv

In [None]:
import arxiv
import asyncio
from aimakerspace.text_utils import TokenTextSplitter
from aimakerspace.vectordatabase import VectorDatabase
from aimakerspace.openai_utils.embedding import EmbeddingModel
from PyPDF2 import PdfReader
import io
import requests

# Fetch metadata
def fetch_arxiv_metadata(query: str, max_results: int = 10):
    client = arxiv.Client(
        page_size=max_results,
        delay_seconds=10.0,
        num_retries=5
    )
    results = []
    search = arxiv.Search(query=query, max_results=max_results)
    for result in client.results(search):
        pdf_link = next((link.href for link in result.links if link.title == "pdf"), None)
        metadata = {
            "title": result.title,
            "authors": [author.name for author in result.authors],
            "updated": result.updated,
            "source_document": pdf_link,
            "links": [link.href for link in result.links],
            # "summary": result.summary,
        }
        results.append(metadata)
    return results

### Retrieve Arxiv Metadata

In [None]:
# Fetch metadata for your documents
arxiv_metadata = fetch_arxiv_metadata("alignment concerns with large language models", max_results=10)

# Print the fetched metadata to verify
for metadata in arxiv_metadata:
    print(metadata)


### Retrieve Arxiv PDF Documents

In [None]:
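# Example function to download a PDF and extract its text
def fetch_pdf_text(pdf_link):
    response = requests.get(pdf_link, timeout=60)
    response.raise_for_status()  # fail early on HTTP errors instead of parsing an error page
    with io.BytesIO(response.content) as open_pdf_file:
        reader = PdfReader(open_pdf_file)
        # extract_text() may yield nothing for image-only pages, so fall back to ""
        page_texts = [page.extract_text() or "" for page in reader.pages]
    # Join pages with a newline so words at page boundaries do not run together
    return "\n".join(page_texts)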


### Splitting PDFs Into Chunks

In [None]:
# Initialize the splitter
# The max_tokens parameter is set to 2048
# text_utils.py has logic to split text conditionally into sentences or paragraphs
# based on the number of tokens in the text

token_splitter = TokenTextSplitter(max_tokens=2048, tokenizer_name="cl100k_base")

chunked_documents = []
metadata_list = []

for metadata in arxiv_metadata:
    source_doc = metadata["source_document"]
    if source_doc is None:
        continue  # no PDF link was found for this result

    document_text = fetch_pdf_text(source_doc)

    # Split document text into chunks
    chunks = token_splitter.split(document_text)
    chunked_documents.extend(chunks)

    # Add one metadata dict per chunk (separate dicts, so later edits to one chunk's metadata stay local)
    metadata_list.extend({"source_document": source_doc} for _ in chunks)


In [None]:
# print metadata_list
for metadata in metadata_list:
    print(metadata)

In [None]:
# Print chunked documents to verify uniqueness
for i, chunk in enumerate(chunked_documents):
    print(f"Chunk {i}: {chunk[:100]}...")  # Print the first 100 characters for brevity
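
To make the chunking step easier to reason about, here is a minimal sketch of fixed-size token windows with overlap, where each chunk gets its own metadata record. It is not the `TokenTextSplitter` from `text_utils.py`: whitespace-separated words stand in for `cl100k_base` tokens, and `split_by_tokens`, `chunk_with_metadata`, and the `chunk_index` / `chunk_count` fields are hypothetical names introduced for illustration.

```python
from typing import Dict, List, Tuple


def split_by_tokens(text: str, max_tokens: int, overlap: int = 0) -> List[str]:
    """Split text into windows of at most max_tokens whitespace-separated tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must satisfy 0 <= overlap < max_tokens")

    tokens = text.split()  # assumption: a real splitter would use a tokenizer such as tiktoken
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks


def chunk_with_metadata(
    text: str, source: str, max_tokens: int, overlap: int = 0
) -> Tuple[List[str], List[Dict]]:
    """Return chunks plus one metadata dict per chunk."""
    chunks = split_by_tokens(text, max_tokens, overlap)
    metadata = [
        {"source_document": source, "chunk_index": i, "chunk_count": len(chunks)}
        for i in range(len(chunks))
    ]
    return chunks, metadata


demo_chunks, demo_metadata = chunk_with_metadata(
    "a b c d e f g h i j", "doc.pdf", max_tokens=4, overlap=1
)
for chunk, meta in zip(demo_chunks, demo_metadata):
    print(meta["chunk_index"], chunk)
```

Recording the chunk position alongside the source makes it possible to check later whether retrieved chunks cluster in one part of a paper.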

## Task 3: Embeddings and Vectors

In [None]:
from aimakerspace.openai_utils.embedding import EmbeddingModel

# Initialize the embedding model
embedding_model = EmbeddingModel()

# Generate embeddings for chunked documents
async def generate_embeddings(text_list):
    embeddings = await embedding_model.async_get_embeddings(text_list)
    return embeddings

embeddings = asyncio.run(generate_embeddings(chunked_documents))

# Print embeddings to verify uniqueness
print("Verifying embeddings:")
for i, embedding in enumerate(embeddings):
    # Print the embedding length
    print(f"\n\nEmbedding {i} length: {len(embedding)}")
    
    # Print the first 10 elements of the embedding
    print(f"Embedding {i}: {embedding[:10]}...")

In [None]:
# Build the vector database
vector_db = VectorDatabase(embedding_model=embedding_model)
vector_db = asyncio.run(vector_db.abuild_from_list(chunked_documents, metadata_list))

# Verify FAISS index
print(f"FAISS index size: {vector_db.index.ntotal}")

In [None]:
# Example search query
query = "alignment concerns with large language models"
results = vector_db.search_by_text(query, k=5)

# print result count
print(f"Result count: {len(results)}")

# Print search results
for text, distance, metadata in results:
    print(f"\n\n**Distance:** {distance}\n**Metadata source_document:** {metadata['source_document']}\n**Text:** {text}")

So, to review what we've done so far in natural language:

1. We load source documents (here, PDFs fetched from Arxiv)
2. We split those source documents into smaller chunks (documents)
3. We send each of those chunks to the OpenAI embeddings API, using the `text-embedding-3-small` model
4. We store each vector in the vector database's FAISS index, together with the chunk's text and metadata
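
The steps above can be sketched end to end in a few lines. This is a toy under stated assumptions: `toy_embed` is a hypothetical hashed bag-of-words function standing in for the OpenAI embedding model, sentences stand in for token-based chunks, and a plain list stands in for the vector database.

```python
import hashlib
import math
from typing import Dict, List


def toy_embed(text: str, dims: int = 16) -> List[float]:
    """Hash each word into one of `dims` buckets, then L2-normalize.

    This is NOT a real embedding model; it only produces deterministic vectors.
    """
    vector = [0.0] * dims
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dims
        vector[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vector)) or 1.0
    return [v / norm for v in vector]


# 1. Load a source document (an in-memory string here)
source_text = (
    "RAG retrieves relevant chunks. "
    "The chunks are added to the prompt. "
    "The model answers from that context."
)

# 2. Split it into smaller chunks (one sentence per chunk in this sketch)
toy_chunks = [sentence.strip() for sentence in source_text.split(".") if sentence.strip()]

# 3. Embed each chunk, and 4. store the text next to its vector
toy_store: List[Dict] = [
    {"text": chunk, "vector": toy_embed(chunk)} for chunk in toy_chunks
]

for record in toy_store:
    print(record["text"], len(record["vector"]))
```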

### Semantic Similarity

In [None]:
vector_db.search_by_text("alignment concerns with large language models", k=5)
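
For intuition about what a similarity search does, here is a minimal sketch of cosine-similarity ranking in pure Python. Note the difference in direction: the search results earlier in this notebook report a distance (smaller is closer), while cosine similarity is larger for closer vectors. The two-dimensional vectors and the `top_k` helper are made up for illustration and are not how `VectorDatabase` is implemented.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vector: Sequence[float],
    items: List[Tuple[str, Sequence[float]]],
    k: int,
) -> List[Tuple[str, float]]:
    """Return the k (text, score) pairs with the highest cosine similarity."""
    scored = [(text, cosine_similarity(query_vector, vector)) for text, vector in items]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical 2-D "embeddings" for three pieces of text
toy_items = [
    ("cats", [1.0, 0.0]),
    ("dogs", [0.8, 0.6]),
    ("cars", [0.0, 1.0]),
]

ranked = top_k([1.0, 0.1], toy_items, k=2)
for text, score in ranked:
    print(f"{text}: {score:.3f}")
```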

## Task 4: Prompts

In [None]:
from aimakerspace.openai_utils.prompts import (
    UserRolePrompt,
    SystemRolePrompt,
    AssistantRolePrompt,
)

from aimakerspace.openai_utils.chatmodel import ChatOpenAI

chat_openai = ChatOpenAI(model_name="gpt-3.5-turbo-0125")
user_prompt_template = "{content}"
user_role_prompt = UserRolePrompt(user_prompt_template)
system_prompt_template = (
    "You are an expert in {expertise}, you always answer in a kind way."
)
system_role_prompt = SystemRolePrompt(system_prompt_template)

# By convention, the system message comes first, followed by the user message
messages = [
    system_role_prompt.create_message(expertise="Python"),
    user_role_prompt.create_message(
        content="What is the best way to write a loop?"
    ),
]

response = chat_openai.run(messages)

In [None]:
print(response)
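
To show what the role prompts boil down to, here is a minimal sketch that fills a template and wraps it in a chat message dictionary with `role` and `content` keys. The `RolePrompt` class is a hypothetical stand-in written for this example; the internals of the `aimakerspace` prompt classes are assumed, not copied.

```python
from typing import Dict, List


class RolePrompt:
    """Hypothetical stand-in: a template bound to a chat role."""

    def __init__(self, role: str, template: str) -> None:
        self.role = role
        self.template = template

    def create_message(self, **kwargs: str) -> Dict[str, str]:
        # Fill the {placeholders} and wrap the text in a chat message dict
        return {"role": self.role, "content": self.template.format(**kwargs)}


demo_system_prompt = RolePrompt(
    "system", "You are an expert in {expertise}, you always answer in a kind way."
)
demo_user_prompt = RolePrompt("user", "{content}")

# System message first, then the user message
demo_messages: List[Dict[str, str]] = [
    demo_system_prompt.create_message(expertise="Python"),
    demo_user_prompt.create_message(content="What is the best way to write a loop?"),
]

for message in demo_messages:
    print(message)
```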

## Task 5: Retrieval Augmented Generation

In [None]:
RAG_PROMPT_TEMPLATE = """ \
Use the provided context to answer the user's query accurately and comprehensively.

Ensure your response is based only on the information given in the context. If the context does not contain enough information to answer the query, respond with "I don't know" and provide a brief explanation.

Summarize or synthesize the information if necessary to provide a clear and complete answer.

Your response will be evaluated based on its accuracy, relevance, and completeness.  If you do not achieve a score of 9 out of 10 or higher on each of these criteria you will be fired!
"""

rag_prompt = SystemRolePrompt(RAG_PROMPT_TEMPLATE)

USER_PROMPT_TEMPLATE = """ \
Context:
{context}

User Query:
{user_query}
"""


user_prompt = UserRolePrompt(USER_PROMPT_TEMPLATE)

class RetrievalAugmentedQAPipeline:
    # Annotate with the class itself; `ChatOpenAI()` would create an instance when the class is defined
    def __init__(self, llm: ChatOpenAI, vector_db_retriever: VectorDatabase) -> None:
        self.llm = llm
        self.vector_db_retriever = vector_db_retriever

    def run_pipeline(self, user_query: str) -> dict:
        context_list = self.vector_db_retriever.search_by_text(user_query, k=5)

        context_prompt = ""
        for context in context_list:
            context_prompt += context[0] + "\n"

        formatted_system_prompt = rag_prompt.create_message()

        formatted_user_prompt = user_prompt.create_message(user_query=user_query, context=context_prompt)

        # Send the system message first, then the user message
        return {"response" : self.llm.run([formatted_system_prompt, formatted_user_prompt]), "context" : context_list}

In [None]:
retrieval_augmented_qa_pipeline = RetrievalAugmentedQAPipeline(
    vector_db_retriever=vector_db,
    llm=chat_openai
)

In [None]:
retrieval_augmented_qa_pipeline.run_pipeline("alignment concerns with large language models")
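
One way to get more value from the stored metadata is to label each retrieved chunk with its source before it goes into the prompt. The sketch below assumes search results shaped like the `(text, distance, metadata)` tuples printed in Task 3; `build_context` and the example file names are hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Assumed result shape: (chunk text, distance, metadata dict)
SearchResult = Tuple[str, float, Dict[str, Any]]


def build_context(results: List[SearchResult]) -> str:
    """Join retrieved chunks into one context string, labelling each with its source."""
    blocks = []
    for position, (text, _distance, metadata) in enumerate(results, start=1):
        source = metadata.get("source_document") or "unknown source"
        blocks.append(f"[{position}] Source: {source}\n{text.strip()}")
    return "\n\n".join(blocks)


example_results: List[SearchResult] = [
    ("First chunk of text.", 0.21, {"source_document": "paper_a.pdf"}),
    ("Second chunk of text.", 0.35, {}),
]

labelled_context = build_context(example_results)
print(labelled_context)
```

With numbered, sourced blocks in the context, the system prompt could also ask the model to cite the block numbers it relied on.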

## Task 6: Visibility Tooling

In [None]:
!pip install -qU wandb

In [None]:
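# Fall back to a prompt if the key is not set (assigning None to os.environ raises a TypeError)
wandb_key = os.getenv("WANDB_API_KEY") or getpass("Weights and Biases API Key: ")
os.environ["WANDB_API_KEY"] = wandb_key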

In [None]:
import wandb

wandb.init(project="Visibility Example - AIE3")

In [None]:
import datetime
from wandb.sdk.data_types.trace_tree import Trace

class RetrievalAugmentedGenerationPipeline:
    # Annotate with the class itself; `ChatOpenAI()` would create an instance when the class is defined
    def __init__(self, llm: ChatOpenAI, vector_db_retriever: VectorDatabase, wandb_project = None) -> None:
        self.llm = llm
        self.vector_db_retriever = vector_db_retriever
        self.wandb_project = wandb_project

    # Returns a dict on success, or an error message string on failure
    def run_pipeline(self, user_query: str):
        context_list = self.vector_db_retriever.search_by_text(user_query, k=5)

        context_prompt = ""
        for context in context_list:
            context_prompt += context[0] + "\n"

        formatted_system_prompt = rag_prompt.create_message()

        formatted_user_prompt = user_prompt.create_message(user_query=user_query, context=context_prompt)


        start_time = datetime.datetime.now().timestamp() * 1000

        try:
            openai_response = self.llm.run([formatted_system_prompt, formatted_user_prompt], text_only=False)
            end_time = datetime.datetime.now().timestamp() * 1000
            status = "success"
            status_message = None
            response_text = openai_response.choices[0].message.content
            token_usage = dict(openai_response.usage)
            model = openai_response.model

        except Exception as e:
            end_time = datetime.datetime.now().timestamp() * 1000
            status = "error"
            status_message = str(e)
            response_text = ""
            token_usage = {}
            model = ""

        if self.wandb_project:
            root_span = Trace(
                name="root_span",
                kind="llm",
                status_code=status,
                status_message=status_message,
                start_time_ms=start_time,
                end_time_ms=end_time,
                metadata={
                    "token_usage" : token_usage,
                    "model_name" : model
                },
                inputs= {"system_prompt" : formatted_system_prompt, "user_prompt" : formatted_user_prompt},
                outputs= {"response" : response_text}
            )

            root_span.log(name="openai_trace")

        if status == "error":
            return "We ran into an error. Please try again later. Full Error Message: " + status_message

        # Reuse the response generated above instead of calling the LLM a second time
        return {"response" : response_text, "context" : context_list}

In [None]:
retrieval_augmented_qa_pipeline = RetrievalAugmentedGenerationPipeline(
    vector_db_retriever=vector_db,
    llm=chat_openai,
    wandb_project="LLM Visibility Example"
)

In [None]:
retrieval_augmented_qa_pipeline.run_pipeline("alignment concerns with large language models")

Navigate to the Weights and Biases "run" link to see how your LLM is performing!

```
View run at YOUR LINK HERE
```
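
The timing and status bookkeeping inside `run_pipeline` can be separated from the LLM call itself. Here is a minimal sketch of that pattern using only the standard library; `timed_call` is a hypothetical helper, and the returned dictionary simply mirrors the fields passed to the `Trace` above rather than any Weights and Biases API.

```python
import time
from typing import Any, Callable, Dict


def timed_call(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Dict[str, Any]:
    """Run fn and record start/end timestamps (in ms), status, and any error message."""
    start_time_ms = time.time() * 1000
    try:
        result = fn(*args, **kwargs)
        status, status_message = "success", None
    except Exception as exc:  # record the failure instead of raising
        result, status, status_message = None, "error", str(exc)
    end_time_ms = time.time() * 1000
    return {
        "result": result,
        "status": status,
        "status_message": status_message,
        "start_time_ms": start_time_ms,
        "end_time_ms": end_time_ms,
    }


ok_record = timed_call(lambda x: x + 1, 1)
error_record = timed_call(lambda: 1 / 0)

print(ok_record["status"], ok_record["result"])
print(error_record["status"], error_record["status_message"])
```

Keeping this in one helper means the success and error paths always produce the same set of fields for logging.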

## Task 7: RAG Evaluation Using GPT-4



In [None]:
query = "alignment concerns with large language models"

response = retrieval_augmented_qa_pipeline.run_pipeline(query)

print(response["response"])

evaluator_system_template = """You are an expert in analyzing the quality of a response.

You should be hyper-critical.

Provide scores (out of 10) for the following attributes:

1. Clarity - how clear is the response
2. Faithfulness - how related to the original query is the response and the provided context
3. Correctness - was the response correct?

Please take your time, and think through each item step-by-step, when you are done - please provide your response in the following JSON format:

{"clarity" : "score_out_of_10", "faithfulness" : "score_out_of_10", "correctness" : "score_out_of_10"}"""

evaluation_template = """Query: {input}
Context: {context}
Response: {response}"""

try:
    chat_openai = ChatOpenAI(model_name="gpt-4o")
except Exception:  # avoid a bare except, which would also swallow KeyboardInterrupt
    chat_openai = ChatOpenAI()

evaluator_system_prompt = SystemRolePrompt(evaluator_system_template)
evaluation_prompt = UserRolePrompt(evaluation_template)

messages = [
    evaluator_system_prompt.create_message(format=False),
    evaluation_prompt.create_message(
        input=query,
        context="\n".join([context[0] for context in response["context"]]),
        response=response["response"]
    ),
]

chat_openai.run(messages, response_format={"type" : "json_object"})
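
Since the evaluator is asked to reply in JSON, the scores can be parsed and validated before comparing runs. This is a minimal sketch assuming the reply is a JSON string with the three keys from the template above; `parse_scores` is a hypothetical helper, and the sample reply is made up for illustration rather than taken from a real run.

```python
import json
from typing import Dict, Tuple


def parse_scores(
    raw: str,
    keys: Tuple[str, ...] = ("clarity", "faithfulness", "correctness"),
) -> Dict[str, int]:
    """Parse the evaluator's JSON reply into integer scores between 0 and 10."""
    data = json.loads(raw)
    scores = {}
    for key in keys:
        # Accept 8, "8", or "8/10" since the template asks for scores as strings
        value = int(str(data[key]).split("/")[0])
        if not 0 <= value <= 10:
            raise ValueError(f"{key} score out of range: {value}")
        scores[key] = value
    return scores


# Made-up reply for illustration only
sample_reply = '{"clarity": "8", "faithfulness": "7/10", "correctness": 9}'
parsed_scores = parse_scores(sample_reply)
print(parsed_scores)
```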

## Conclusion

In this notebook, we've gone through the steps required to create your own simple RAQA application!

Please feel free to extend this as much as you'd like.

In [None]:
wandb.finish()