# Your First RAG Application

## NOTE: REFER TO ORIGINAL NOTEBOOK FOR ANSWERS

- [Pythonic_RAG_Assignment](https://github.com/donbcolab/AIE3/blob/main/Week%202/Day%201/Pythonic_RAG_Assignment.ipynb) - refer to this notebook for answers to assigned questions

## Activity 1 Improvements

### Lessons Learned
- integration with Arxiv
- retrieval of Arxiv metadata
- retrieval of Arxiv PDF documents
- use of Arxiv metadata when generating embeddings
- tweaking the prompt to improve evaluation score

### Lessons to be learned
- AVOID RABBIT HOLES
- STAY ON TRACK
- improved use of metadata, and verifying that it provides maximum value
- investigate use of HyDE to improve matching of questions against info in the vector database
- evaluation and assessment of embedding models
- review patterns in the [LangChain RAG from Scratch](https://gist.github.com/donbr/5f952f52dcbdf18a8f2dac8aaffd2be4) series
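
As a note on the HyDE item above (hypothetical document embeddings): the idea is to ask an LLM to draft a plausible answer and retrieve with that draft instead of the raw question. The following is a minimal sketch, not something this notebook implements. The `generate` and `search` callables, and the `fake_*` stand-ins, are hypothetical names used only so the example runs offline.

```python
from typing import Callable, List

def hyde_search(
    question: str,
    generate: Callable[[str], str],           # hypothetical: prompt -> LLM text
    search: Callable[[str, int], List[str]],  # hypothetical: (text, k) -> chunks
    k: int = 5,
) -> List[str]:
    """Retrieve chunks using a drafted answer instead of the raw question."""
    prompt = f"Write a short passage that answers the question.\nQuestion: {question}"
    hypothetical_answer = generate(prompt)
    # The drafted answer is usually phrased more like the stored chunks than the question is.
    return search(hypothetical_answer, k)


# Stand-in callables (assumptions) so the sketch runs without network access.
def fake_generate(prompt: str) -> str:
    return "Aligned models can still be jailbroken by adversarial prompts."

def fake_search(text: str, k: int) -> List[str]:
    corpus = ["adversarial prompts jailbreak aligned models", "fish from a river"]
    words = set(text.lower().split())
    # Rank by simple word overlap; a real setup would use vector similarity.
    ranked = sorted(corpus, key=lambda c: -len(words & set(c.split())))
    return ranked[:k]

hyde_results = hyde_search("What are alignment concerns?", fake_generate, fake_search, k=1)
print(hyde_results)
```

In a real pipeline, `generate` would wrap the chat model and `search` would wrap the vector database lookup.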

### Rabbit Holes
- intelligent(?) splitting using TokenTextSplitter # where is strikethrough when you need it!
- switch to FAISS # dude... stay on track  (5 changes do not equal 1)

## Table of Contents:

- Task 1: Imports and Utilities
- Task 2: Documents
- Task 3: Embeddings and Vectors
- Task 4: Prompts
- Task 5: Retrieval Augmented Generation
  - 🚧 Activity #1: Augment RAG
- Task 6: Visibility Tooling
- Task 7: RAG Evaluation Using GPT-4

## Task 1: Imports and Utilities

In [1]:
!pip install -qU numpy matplotlib plotly pandas scipy scikit-learn openai python-dotenv tiktoken PyPDF2 arxiv requests

In [2]:
from aimakerspace.text_utils import TextFileLoader, CharacterTextSplitter
from aimakerspace.vectordatabase import VectorDatabase
import asyncio

In [3]:
import nest_asyncio
nest_asyncio.apply()

In [4]:
import os
import openai
from getpass import getpass

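# Use the environment variable if it is set; otherwise prompt for the key
openai.api_key = os.getenv("OPENAI_API_KEY") or getpass("OpenAI API Key: ")
os.environ["OPENAI_API_KEY"] = openai.api_key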

## Task 2: Documents

### Integration with Arxiv

In [5]:
import arxiv
import asyncio
from aimakerspace.text_utils import TokenTextSplitter
from aimakerspace.vectordatabase import VectorDatabase
from aimakerspace.openai_utils.embedding import EmbeddingModel
from PyPDF2 import PdfReader
import io
import requests

# Fetch metadata
def fetch_arxiv_metadata(query: str, max_results: int = 10):
    client = arxiv.Client(
        page_size=max_results,
        delay_seconds=10.0,
        num_retries=5
    )
    results = []
    search = arxiv.Search(query=query, max_results=max_results)
    for result in client.results(search):
        pdf_link = next((link.href for link in result.links if link.title == "pdf"), None)
        metadata = {
            "title": result.title,
            "authors": [author.name for author in result.authors],
            "updated": result.updated,
            "source_document": pdf_link,
            "links": [link.href for link in result.links],
#            "summary": result.summary,
        }
        results.append(metadata)
    return results

### Retrieve Arxiv Metadata

In [6]:
# Fetch metadata for your documents
arxiv_metadata = fetch_arxiv_metadata("alignment concerns with large language models", max_results=10)

# Print the fetched metadata to verify
for metadata in arxiv_metadata:
    print(metadata)


{'title': 'The Alignment Problem in Context', 'authors': ['Raphaël Millière'], 'updated': datetime.datetime(2023, 11, 3, 17, 57, 55, tzinfo=datetime.timezone.utc), 'source_document': 'http://arxiv.org/pdf/2311.02147v1', 'links': ['http://arxiv.org/abs/2311.02147v1', 'http://arxiv.org/pdf/2311.02147v1']}
{'title': 'How Alignment and Jailbreak Work: Explain LLM Safety through Intermediate Hidden States', 'authors': ['Zhenhong Zhou', 'Haiyang Yu', 'Xinghua Zhang', 'Rongwu Xu', 'Fei Huang', 'Yongbin Li'], 'updated': datetime.datetime(2024, 6, 9, 5, 4, 37, tzinfo=datetime.timezone.utc), 'source_document': 'http://arxiv.org/pdf/2406.05644v1', 'links': ['http://arxiv.org/abs/2406.05644v1', 'http://arxiv.org/pdf/2406.05644v1']}
{'title': 'Defending Against Alignment-Breaking Attacks via Robustly Aligned LLM', 'authors': ['Bochuan Cao', 'Yuanpu Cao', 'Lu Lin', 'Jinghui Chen'], 'updated': datetime.datetime(2023, 12, 7, 20, 33, 49, tzinfo=datetime.timezone.utc), 'source_document': 'http://arxiv.o

### Retrieve Arxiv PDF Documents

In [7]:
# Example function to extract text from a PDF
def fetch_pdf_text(pdf_link):
    response = requests.get(pdf_link, timeout=60)
    response.raise_for_status()  # fail early on HTTP errors instead of parsing an error page
    with io.BytesIO(response.content) as open_pdf_file:
        reader = PdfReader(open_pdf_file)
        text = ""
        for page in reader.pages:
            # Guard against pages that yield no text (e.g. scanned images)
            text += page.extract_text() or ""
    return text


### Splitting PDFs Into Chunks

In [8]:
# Initialize the splitter
# The max_tokens parameter is set to 2048
# text_utils.py has logic to split text conditionally into sentences or paragraphs
# based on the number of tokens in the text

token_splitter = TokenTextSplitter(max_tokens=2048, tokenizer_name="cl100k_base")

chunked_documents = []
metadata_list = []

for metadata in arxiv_metadata:
    source_doc = metadata["source_document"]
    document_text = fetch_pdf_text(source_doc)
    
    # Split document text into chunks
    chunks = token_splitter.split(document_text)
    chunked_documents.extend(chunks)
    
    # Attach a separate metadata dict to each chunk (avoids sharing one dict object across chunks)
    metadata_list.extend({"source_document": source_doc} for _ in chunks)
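
The comments in the cell above describe splitting text into sentences or paragraphs based on token counts. As an illustration of that idea only (this is not the `TokenTextSplitter` code from `text_utils.py`), here is a minimal sketch that greedily packs whole sentences into chunks under a token budget. The function name `split_by_token_budget` is hypothetical, and the default whitespace word count is an assumption standing in for a real tokenizer so the example runs offline.

```python
import re
from typing import Callable, List

def split_by_token_budget(
    text: str,
    max_tokens: int,
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_tokens tokens.

    count_tokens defaults to a whitespace word count (an assumption); a real
    tokenizer's count could be passed in instead. A single sentence longer
    than the budget is kept whole rather than cut mid-sentence.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current: List[str] = []
    current_tokens = 0
    for sentence in sentences:
        n = count_tokens(sentence)
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and current_tokens + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_tokens = [], 0
        current.append(sentence)
        current_tokens += n
    if current:
        chunks.append(" ".join(current))
    return chunks

sample = "One two three. Four five six. Seven eight."
sample_chunks = split_by_token_budget(sample, max_tokens=6)
print(sample_chunks)
```

Splitting on sentence boundaries keeps each chunk readable on its own, at the cost of chunks that are somewhat uneven in size.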


In [9]:
# print metadata_list
for metadata in metadata_list:
    print(metadata)

{'source_document': 'http://arxiv.org/pdf/2311.02147v1'}
{'source_document': 'http://arxiv.org/pdf/2311.02147v1'}
{'source_document': 'http://arxiv.org/pdf/2311.02147v1'}
{'source_document': 'http://arxiv.org/pdf/2311.02147v1'}
{'source_document': 'http://arxiv.org/pdf/2311.02147v1'}
{'source_document': 'http://arxiv.org/pdf/2311.02147v1'}
{'source_document': 'http://arxiv.org/pdf/2311.02147v1'}
{'source_document': 'http://arxiv.org/pdf/2311.02147v1'}
{'source_document': 'http://arxiv.org/pdf/2311.02147v1'}
{'source_document': 'http://arxiv.org/pdf/2311.02147v1'}
{'source_document': 'http://arxiv.org/pdf/2311.02147v1'}
{'source_document': 'http://arxiv.org/pdf/2311.02147v1'}
{'source_document': 'http://arxiv.org/pdf/2311.02147v1'}
{'source_document': 'http://arxiv.org/pdf/2311.02147v1'}
{'source_document': 'http://arxiv.org/pdf/2311.02147v1'}
{'source_document': 'http://arxiv.org/pdf/2311.02147v1'}
{'source_document': 'http://arxiv.org/pdf/2406.05644v1'}
{'source_document': 'http://arx
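
The loop above attaches only `source_document` to each chunk. One possible way to get more value from the Arxiv metadata is to prefix each chunk with the paper's title and authors before embedding, and to record a chunk index alongside the source. This is a sketch under assumptions: `build_chunk_records` is a hypothetical helper, and whether the prefix actually improves retrieval would need to be measured.

```python
from typing import Any, Dict, List, Tuple

def build_chunk_records(
    chunks: List[str], paper: Dict[str, Any]
) -> Tuple[List[str], List[Dict[str, Any]]]:
    """Return (texts_to_embed, per_chunk_metadata) for one paper."""
    title = str(paper.get("title", ""))
    authors = ", ".join(paper.get("authors", []))
    header = f"Title: {title}\nAuthors: {authors}\n\n"
    # Prefix every chunk so the embedding also reflects which paper it came from.
    texts = [header + chunk for chunk in chunks]
    # One independent dict per chunk, so editing one entry does not affect the others.
    metadata = [
        {"source_document": paper.get("source_document"), "title": title, "chunk_index": i}
        for i in range(len(chunks))
    ]
    return texts, metadata

paper = {
    "title": "The Alignment Problem in Context",
    "authors": ["Raphaël Millière"],
    "source_document": "http://arxiv.org/pdf/2311.02147v1",
}
texts, metadata = build_chunk_records(["first chunk", "second chunk"], paper)
print(texts[0])
print(metadata[1])
```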

In [10]:
# Print chunked documents to verify uniqueness
for i, chunk in enumerate(chunked_documents):
    print(f"Chunk {i}: {chunk[:100]}...")  # Print the first 100 characters for brevity

Chunk 0: The Alignment Problem in Context
Raphaël Millière
Department of Philosophy
Macquarie University
raph...
Chunk 1: 2023). In addition, LLMs may incite,
encourage or otherwise endorse problematic behaviour from users...
Chunk 2: These norms should arguably incorporate the kinds of
discursive ideals that we generally apply to hu...
Chunk 3: This ability, known as in-context learning (ICL), is key to the usefulness
and versatility of LLMs.
...
Chunk 4: This
severely limits the helpfulness of base models, because it is often impractical or undesirable ...
Chunk 5: Likewise, asking the model about its personal opinions, particularly
on controversial topics, will t...
Chunk 6: Honesty or truthfulness
remains more challenging to achieve consistently through these alignment str...
Chunk 7: There
is currently no fail-safe or universal solution to defend against these attacks; in particular...
Chunk 8: However, rather than gradual updating over many input-output token pairs,
13The Alignme

## Task 3: Embeddings and Vectors

In [11]:
from aimakerspace.openai_utils.embedding import EmbeddingModel

# Initialize the embedding model
embedding_model = EmbeddingModel()

# Generate embeddings for chunked documents
async def generate_embeddings(text_list):
    embeddings = await embedding_model.async_get_embeddings(text_list)
    return embeddings

embeddings = asyncio.run(generate_embeddings(chunked_documents))

# Print embeddings to verify uniqueness
print("Verifying embeddings:")
for i, embedding in enumerate(embeddings):
    # Print the embedding length
    print(f"\n\nEmbedding {i} length: {len(embedding)}")
    
    # Print the first 10 elements of the embedding
    print(f"Embedding {i}: {embedding[:10]}...")

Verifying embeddings:


Embedding 0 length: 1536
Embedding 0: [0.021660510450601578, 0.03499975427985191, 0.04203500598669052, 0.011277838610112667, 0.011183278635144234, 0.004564088769257069, 0.01244407705962658, 0.0004259133420418948, -0.031494736671447754, 0.06601538509130478]...


Embedding 1 length: 1536
Embedding 1: [0.033748045563697815, 0.03249906003475189, 0.02959326282143593, 0.023641474545001984, 0.03938121721148491, -0.01977981999516487, 0.001217918237671256, 0.004680502228438854, -0.020009225234389305, 0.08758178353309631]...


Embedding 2 length: 1536
Embedding 2: [-0.01403313223272562, 0.02597792074084282, 0.04618297144770622, -0.004725374281406403, 0.028225881978869438, -0.02814607322216034, 0.005703038070350885, 0.026762712746858597, -0.016720043495297432, 0.06501796096563339]...


Embedding 3 length: 1536
Embedding 3: [-0.021122455596923828, 0.017181605100631714, 0.03175271302461624, 0.00397222675383091, 0.029368121176958084, -0.011791175231337547, 0.05753139406442642

In [12]:
# Build the vector database
vector_db = VectorDatabase(embedding_model=embedding_model)
vector_db = asyncio.run(vector_db.abuild_from_list(chunked_documents, metadata_list))

# Verify FAISS index
print(f"FAISS index size: {vector_db.index.ntotal}")

FAISS index size: 162


In [13]:
# Example search query
query = "alignment concerns with large language models"
results = vector_db.search_by_text(query, k=5)

# print result count
print(f"Result count: {len(results)}")

# Print search results
for text, distance, metadata in results:
    print(f"\n\n**Distance:** {distance}\n**Metadata source_document:** {metadata['source_document']}\n**Text:** {text}")

Result count: 5


**Distance:** 0.805444061756134
**Metadata source_document:** http://arxiv.org/pdf/2406.00832v2
**Text:** It would be extremely unethical. I think it’s
important for you to remember that poker is a game where you should comply with the rules.
Table 4: More responses from different models
27Prompt: Human: is it safe to eat fish from a river in my city?
Assistant:
ReferenceFish often grow in a river, so the first step would be to give a Google or specific search engine a try.
I don’t think you’d find many negative reports, but it’s possible that some people might be allergic
to the fish. Also, there’s a risk that you’ll get another common fish in your food, but that would
probably it’s okay for you
BoNBoNYes, the general rule when it comes to fish is that you can eat them safely in most environments.
However, you should always check with your local authorities to make sure you can, since some
fish are more vulnerable to certain kinds of pollution than others, some fish 

So, to review what we've done so far in natural language:

1. We load source documents
2. We split those source documents into smaller chunks (documents)
3. We embed each of those documents with OpenAI's `text-embedding-3-small` model via the embeddings API
4. We store each chunk's text and metadata alongside its vector in the vector database (a FAISS index in this version), so it can be searched by similarity
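
The steps above can be sketched end to end with a toy example. Everything here is an assumption made for illustration: `toy_embed` is a hashed bag-of-words stand-in for a real embedding model, and `ToyVectorStore` is a hypothetical class, not the `VectorDatabase` used in this notebook.

```python
import numpy as np
from typing import List, Tuple

def toy_embed(text: str, dim: int = 64) -> np.ndarray:
    """Hash words into a fixed-size, unit-length vector (stand-in for a real embedding model)."""
    vec = np.zeros(dim)
    for word in text.lower().split():
        vec[sum(ord(c) for c in word) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

class ToyVectorStore:
    """Keeps texts and their vectors side by side and searches by cosine similarity."""

    def __init__(self) -> None:
        self.texts: List[str] = []
        self.vectors: List[np.ndarray] = []

    def add(self, text: str) -> None:
        self.texts.append(text)
        self.vectors.append(toy_embed(text))

    def search(self, query: str, k: int = 1) -> List[Tuple[str, float]]:
        q = toy_embed(query)
        # Vectors are unit length, so the dot product equals cosine similarity.
        scores = np.array([float(v @ q) for v in self.vectors])
        top = np.argsort(-scores)[:k]
        return [(self.texts[i], float(scores[i])) for i in top]

store = ToyVectorStore()
for chunk in ["aligned models resist jailbreak attacks", "poker rules and river fish"]:
    store.add(chunk)

best_text, best_score = store.search("jailbreak attacks on aligned models", k=1)[0]
print(best_text, best_score)
```

A real embedding model captures meaning rather than exact word overlap, but the store-then-rank structure is the same.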

### Semantic Similarity

In [14]:
vector_db.search_by_text("alignment concerns with large language models", k=5)

[('It would be extremely unethical. I think it’s\nimportant for you to remember that poker is a game where you should comply with the rules.\nTable 4: More responses from different models\n27Prompt: Human: is it safe to eat fish from a river in my city?\nAssistant:\nReferenceFish often grow in a river, so the first step would be to give a Google or specific search engine a try.\nI don’t think you’d find many negative reports, but it’s possible that some people might be allergic\nto the fish. Also, there’s a risk that you’ll get another common fish in your food, but that would\nprobably it’s okay for you\nBoNBoNYes, the general rule when it comes to fish is that you can eat them safely in most environments.\nHowever, you should always check with your local authorities to make sure you can, since some\nfish are more vulnerable to certain kinds of pollution than others, some fish eat other fish when\nthey take in the toxic pollution, and fish can also accumulate harmful substances from hu

## Task 4: Prompts

In [15]:
from aimakerspace.openai_utils.prompts import (
    UserRolePrompt,
    SystemRolePrompt,
    AssistantRolePrompt,
)

from aimakerspace.openai_utils.chatmodel import ChatOpenAI

chat_openai = ChatOpenAI(model_name="gpt-3.5-turbo-0125")
user_prompt_template = "{content}"
user_role_prompt = UserRolePrompt(user_prompt_template)
system_prompt_template = (
    "You are an expert in {expertise}, you always answer in a kind way."
)
system_role_prompt = SystemRolePrompt(system_prompt_template)

# System message first, then the user message (the conventional chat ordering)
messages = [
    system_role_prompt.create_message(expertise="Python"),
    user_role_prompt.create_message(
        content="What is the best way to write a loop?"
    ),
]

response = chat_openai.run(messages)

In [16]:
print(response)

The best way to write a loop in Python is to use the "for" loop for iterating over a sequence of elements or the "while" loop for executing a block of code as long as a specified condition is true. Here is an example of each:

1. Using a "for" loop:
```python
# Iterate over a list of numbers
numbers = [1, 2, 3, 4, 5]
for num in numbers:
    print(num)
```

2. Using a "while" loop:
```python
# Print numbers from 1 to 5 using a while loop
num = 1
while num <= 5:
    print(num)
    num += 1
```

Remember to ensure that your loop has a clear termination condition to prevent infinite loops. Also, make sure to properly indent the code within the loop to maintain readability and avoid syntax errors. Happy coding!
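
To make the role prompts used above more concrete, here is a minimal sketch of what such a helper might do: fill a template and wrap the result in a chat message dict. `SimpleRolePrompt` is a hypothetical stand-in written for this illustration; it is not the `aimakerspace` implementation, whose internals may differ.

```python
from typing import Dict

class SimpleRolePrompt:
    """Hypothetical stand-in for a role prompt; not the aimakerspace implementation."""

    def __init__(self, template: str, role: str) -> None:
        self.template = template
        self.role = role

    def create_message(self, **kwargs: str) -> Dict[str, str]:
        # Fill the {placeholders} and tag the text with its chat role.
        return {"role": self.role, "content": self.template.format(**kwargs)}

system = SimpleRolePrompt(
    "You are an expert in {expertise}, you always answer in a kind way.", "system"
)
user = SimpleRolePrompt("{content}", "user")

example_messages = [
    system.create_message(expertise="Python"),
    user.create_message(content="What is the best way to write a loop?"),
]
print(example_messages)
```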


## Task 5: Retrieval Augmented Generation

In [17]:
RAG_PROMPT_TEMPLATE = """ \
Use the provided context to answer the user's query accurately and comprehensively.

Ensure your response is based only on the information given in the context. If the context does not contain enough information to answer the query, respond with "I don't know" and provide a brief explanation.

Summarize or synthesize the information if necessary to provide a clear and complete answer.

Your response will be evaluated based on its accuracy, relevance, and completeness.  If you do not achieve a score of 9 out of 10 or higher on each of these criteria you will be fired!
"""

rag_prompt = SystemRolePrompt(RAG_PROMPT_TEMPLATE)

USER_PROMPT_TEMPLATE = """ \
Context:
{context}

User Query:
{user_query}
"""


user_prompt = UserRolePrompt(USER_PROMPT_TEMPLATE)

class RetrievalAugmentedQAPipeline:
    def __init__(self, llm: ChatOpenAI, vector_db_retriever: VectorDatabase) -> None:
        self.llm = llm
        self.vector_db_retriever = vector_db_retriever

    def run_pipeline(self, user_query: str) -> dict:
        context_list = self.vector_db_retriever.search_by_text(user_query, k=5)

        context_prompt = ""
        for context in context_list:
            context_prompt += context[0] + "\n"

        formatted_system_prompt = rag_prompt.create_message()

        formatted_user_prompt = user_prompt.create_message(user_query=user_query, context=context_prompt)

        # System message first, then the user message (the conventional chat ordering)
        return {"response" : self.llm.run([formatted_system_prompt, formatted_user_prompt]), "context" : context_list}

In [18]:
retrieval_augmented_qa_pipeline = RetrievalAugmentedQAPipeline(
    vector_db_retriever=vector_db,
    llm=chat_openai
)

In [19]:
retrieval_augmented_qa_pipeline.run_pipeline("alignment concerns with large language models")

{'response': 'The context provided discusses various aspects related to large language models, particularly focusing on alignment concerns. It mentions topics such as adversarial attacks on language models, gender bias in coreference resolution, backdoor attacks triggered by prompts, sequence likelihood calibration, and judging language models. Additionally, it covers the calibration of sequence likelihood, the impact of reinforcement learning with human feedback, and universal and transferable adversarial attacks on aligned language models.\n\nIn summary, the context delves into the vulnerabilities, biases, calibration methods, and adversarial attacks associated with large language models, highlighting the importance of addressing alignment concerns in these models.',
 'context': [('It would be extremely unethical. I think it’s\nimportant for you to remember that poker is a game where you should comply with the rules.\nTable 4: More responses from different models\n27Prompt: Human: is

## Task 6: Visibility Tooling

In [20]:
!pip install -qU wandb

In [21]:
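# Use the environment variable if it is set; otherwise prompt for the key
wandb_key = os.getenv("WANDB_API_KEY") or getpass("Weights and Biases API Key: ")
os.environ["WANDB_API_KEY"] = wandb_key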

In [22]:
import wandb

wandb.init(project="Visibility Example - AIE3")

Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
wandb: Currently logged in as: dwbranson. Use `wandb login --relogin` to force relogin


In [23]:
import datetime
from wandb.sdk.data_types.trace_tree import Trace

class RetrievalAugmentedGenerationPipeline:
    def __init__(self, llm: ChatOpenAI, vector_db_retriever: VectorDatabase, wandb_project = None) -> None:
        self.llm = llm
        self.vector_db_retriever = vector_db_retriever
        self.wandb_project = wandb_project

    def run_pipeline(self, user_query: str) -> dict:
        context_list = self.vector_db_retriever.search_by_text(user_query, k=5)

        context_prompt = ""
        for context in context_list:
            context_prompt += context[0] + "\n"

        formatted_system_prompt = rag_prompt.create_message()

        formatted_user_prompt = user_prompt.create_message(user_query=user_query, context=context_prompt)


        start_time = datetime.datetime.now().timestamp() * 1000

        try:
            openai_response = self.llm.run([formatted_system_prompt, formatted_user_prompt], text_only=False)
            end_time = datetime.datetime.now().timestamp() * 1000
            status = "success"
            status_message = None
            response_text = openai_response.choices[0].message.content
            token_usage = dict(openai_response.usage)
            model = openai_response.model

        except Exception as e:
            end_time = datetime.datetime.now().timestamp() * 1000
            status = "error"
            status_message = str(e)
            response_text = ""
            token_usage = {}
            model = ""

        if self.wandb_project:
            root_span = Trace(
                name="root_span",
                kind="llm",
                status_code=status,
                status_message=status_message,
                start_time_ms=start_time,
                end_time_ms=end_time,
                metadata={
                    "token_usage" : token_usage,
                    "model_name" : model
                },
                inputs= {"system_prompt" : formatted_system_prompt, "user_prompt" : formatted_user_prompt},
                outputs= {"response" : response_text}
            )

            root_span.log(name="openai_trace")

        if not response_text:
            return "We ran into an error. Please try again later. Full Error Message: " + str(status_message)

        # Reuse the response already obtained above instead of calling the LLM a second time
        return {"response" : response_text, "context" : context_list}

In [24]:
retrieval_augmented_qa_pipeline = RetrievalAugmentedGenerationPipeline(
    vector_db_retriever=vector_db,
    llm=chat_openai,
    wandb_project="LLM Visibility Example"
)

In [25]:
retrieval_augmented_qa_pipeline.run_pipeline("alignment concerns with large language models")

{'response': 'The context provided discusses various aspects related to large language models, particularly focusing on alignment concerns. It mentions topics such as adversarial attacks on language models, gender bias in coreference resolution, backdoor attacks triggered by prompts, sequence likelihood calibration, and judging language models. Additionally, it covers the calibration of sequence likelihood, the impact of reinforcement learning with human feedback, and universal and transferable adversarial attacks on aligned language models.\n\nIn summary, the context delves into the vulnerabilities, biases, calibration methods, and adversarial attacks associated with large language models, highlighting the importance of addressing alignment concerns in these models.',
 'context': [('It would be extremely unethical. I think it’s\nimportant for you to remember that poker is a game where you should comply with the rules.\nTable 4: More responses from different models\n27Prompt: Human: is

Navigate to the Weights and Biases "run" link to see how your LLM is performing!

```
View run at YOUR LINK HERE
```

## Task 7: RAG Evaluation Using GPT-4



In [26]:
query = "alignment concerns with large language models"

response = retrieval_augmented_qa_pipeline.run_pipeline(query)

print(response["response"])

evaluator_system_template = """You are an expert in analyzing the quality of a response.

You should be hyper-critical.

Provide scores (out of 10) for the following attributes:

1. Clarity - how clear is the response
2. Faithfulness - how related to the original query is the response and the provided context
3. Correctness - was the response correct?

Please take your time, and think through each item step-by-step, when you are done - please provide your response in the following JSON format:

{"clarity" : "score_out_of_10", "faithfulness" : "score_out_of_10", "correctness" : "score_out_of_10"}"""

evaluation_template = """Query: {input}
Context: {context}
Response: {response}"""

try:
    chat_openai = ChatOpenAI(model_name="gpt-4o")
except Exception:
    # Fall back to the default model if the evaluator model cannot be set up
    chat_openai = ChatOpenAI()

evaluator_system_prompt = SystemRolePrompt(evaluator_system_template)
evaluation_prompt = UserRolePrompt(evaluation_template)

messages = [
    evaluator_system_prompt.create_message(format=False),
    evaluation_prompt.create_message(
        input=query,
        context="\n".join([context[0] for context in response["context"]]),
        response=response["response"]
    ),
]

chat_openai.run(messages, response_format={"type" : "json_object"})

The context provided discusses various aspects related to large language models, particularly focusing on alignment concerns. It mentions topics such as adversarial attacks, gender bias, sequence likelihood calibration, and judging language models. The text also delves into the vulnerability of language models to backdoor attacks triggered by prompts, the calibration of sequence likelihood for improved language generation, and the evaluation of language models in different scenarios.

In summary, the context covers a range of issues and considerations surrounding large language models, including potential vulnerabilities, biases, and calibration techniques. It emphasizes the importance of addressing alignment concerns to ensure the effectiveness and reliability of these models.


'{\n  "clarity": 6,\n  "faithfulness": 4,\n  "correctness": 5\n}'
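
The evaluator returns its scores as a JSON string, so they need to be parsed before they can be compared across prompt or retrieval changes. A minimal sketch follows, using the output shown above as input. `parse_scores` is a hypothetical helper, and averaging the three scores is just one possible summary.

```python
import json
from typing import Dict

def parse_scores(raw: str) -> Dict[str, float]:
    """Parse the evaluator's JSON string into numeric scores."""
    data = json.loads(raw)
    # float() accepts both numbers (6) and numeric strings ("6"),
    # since the prompt's format example uses string placeholders.
    return {name: float(value) for name, value in data.items()}

raw_evaluation = '{\n  "clarity": 6,\n  "faithfulness": 4,\n  "correctness": 5\n}'
scores = parse_scores(raw_evaluation)
mean_score = sum(scores.values()) / len(scores)
print(scores, mean_score)
```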

## Conclusion

In this notebook, we've gone through the steps required to create your own simple RAQA application!

Please feel free to extend this as much as you'd like.

In [27]:
wandb.finish()