# University Presence - Retrieval Augmented Generation Model Workshop

# Introduction to Retrieval-Augmented Generation (RAG)

## Overview
Retrieval-Augmented Generation (RAG) is an advanced framework that combines retrieval-based and generation-based approaches to enhance the performance of natural language processing (NLP) tasks. It leverages the strengths of both methods to provide more accurate, relevant, and contextually appropriate responses. This hybrid approach is particularly powerful in scenarios where vast amounts of information need to be efficiently accessed and summarized, such as in question answering systems, customer support, and knowledge management.

## Components of RAG
1. **Document Retrieval:**
   - The first step involves retrieving relevant documents or passages from a large corpus of text. This is typically done using vector databases and similarity search techniques. The aim is to narrow down the vast information to a few relevant pieces that can be further processed.

2. **Embedding Models:**
   - Embedding models transform text data into numerical vectors that capture semantic meanings. Various embedding techniques can be used, such as character-level, word-level, sentence-level, and document-level embeddings. The choice of embedding type depends on the specific use case and the desired level of granularity.

3. **Vector Database (e.g., Pinecone):**
   - A vector database stores and indexes these embeddings, enabling efficient similarity searches. Pinecone, for instance, is a scalable vector database that supports high-dimensional vector search, making it ideal for real-time retrieval tasks in RAG systems.

4. **Language Model (LLM) Prompting:**
   - Once the relevant documents are retrieved, a language model (such as GPT-3.5 or similar) generates a response based on the retrieved context and the user's query. This step involves prompt engineering to guide the model in producing high-quality outputs.

## How RAG Works
1. ⚙️ **Query Processing:**
   - The user inputs a query. This query is embedded using an embedding model to create a vector representation.
   
2. 🚚 **Retrieval Step:** 
   - The query vector is used to search the vector database, retrieving the most similar documents or passages. This narrows down the information to the most relevant pieces.

3. 🏗️ **Generation Step:** 
   - The retrieved documents are fed into a language model along with the original query. The language model uses this context to generate a coherent and relevant response.

4. 🎤 **Response Delivery:** 
   - The generated response is presented to the user, providing a comprehensive answer or summary based on the combined knowledge of the retrieved documents and the language model's generative capabilities.
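
The four steps above can be sketched in plain Python. This is a toy illustration, not the workshop pipeline: the corpus is made up, `embed` is a bag-of-words stand-in for a real embedding model, and the generation step is reduced to building the prompt that a language model would receive.

```python
import math
from collections import Counter

# Hypothetical toy corpus; a real system would index chunks of real documents.
CORPUS = [
    "Chroma is a vector database that stores embeddings.",
    "Sea levels are rising in coastal cities.",
    "Prompt templates guide a language model.",
]

def embed(text: str) -> Counter:
    """Toy 'embedding': word counts (stand-in for a real embedding model)."""
    return Counter(text.lower().replace(".", "").replace("?", "").split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, k: int = 1) -> list[str]:
    """Steps 1 and 2: embed the query and return the k most similar documents."""
    query_vector = embed(query)
    ranked = sorted(CORPUS, key=lambda doc: cosine(query_vector, embed(doc)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Step 3 (input side): combine retrieved context with the original query."""
    context = "\n".join(docs)
    return f"Context:\n{context}\n\nQuestion: {query}"

toy_query = "What is a vector database?"
top_docs = retrieve(toy_query)
toy_prompt = build_prompt(toy_query, top_docs)
print(toy_prompt)
```

In the notebook below, the embedding model, the vector database, and the language model replace `embed`, `retrieve`, and the final prompt hand-off respectively.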


---
### ❗IMPORTANT INSTRUCTIONS ❗

Wherever you see *** in the code, you must replace it with your answer

The answers are not given to you directly. Like any developer, you should search the web for resources when you get stuck. A good place to start is here:

[Quickstart](https://python.langchain.com/v0.1/docs/use_cases/question_answering/quickstart/)

You can always ask us for help, but we recommend you try to find the answers yourself first! Learning is not a one-shot process!

Another way of reaching resources is looking at the documentation of the libraries and functions that we are calling.

---

### 📚 Import necessary Libraries 

In [1]:
# Imports for Generic Typing
from langchain_core.language_models.chat_models import BaseChatModel
from langchain_core.prompts.prompt import PromptTemplate
from langchain_core.vectorstores import VectorStoreRetriever

from langchain_chroma import Chroma

# Imports for document loading, tokenizing, parsing, etc.
from langchain_community.document_loaders.generic import GenericLoader
from langchain_community.document_loaders.parsers import PDFMinerParser

from langchain_text_splitters import Language
from langchain_text_splitters import RecursiveCharacterTextSplitter

from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# Imports for the Azure OpenAI chat and embedding models
from langchain_openai import AzureChatOpenAI
from langchain_openai import AzureOpenAIEmbeddings
from pydantic.v1 import SecretStr

import os

from dotenv import load_dotenv

load_dotenv()

azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
openai_key = os.getenv("AZURE_OPENAI_API")
openai_key_secret = SecretStr(openai_key) if openai_key else None 
azure_deployment = os.getenv("AZURE_DEPLOYMENT_NAME")
azure_emb_deployment = os.getenv("AZURE_EMBEDDING_DEPLOYMENT_NAME")
azure_api_version = os.getenv("AZURE_API_VERSION")
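
Because `os.getenv` returns `None` for an unset variable, a missing entry in the `.env` file only surfaces later as an authentication or deployment error. A small check like the following sketch can make that failure easier to spot. The helper `missing_env_vars` is hypothetical (it is not part of the workshop code); the variable names are copied from the cell above.

```python
import os

# Variable names taken from the configuration cell; the helper is illustrative.
REQUIRED_ENV_VARS = [
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API",
    "AZURE_DEPLOYMENT_NAME",
    "AZURE_EMBEDDING_DEPLOYMENT_NAME",
    "AZURE_API_VERSION",
]

def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]

missing = missing_env_vars(REQUIRED_ENV_VARS)
if missing:
    print("Missing settings in .env:", ", ".join(missing))
```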



### :battery: Load LLM Models 

In [None]:
chatGPT = AzureChatOpenAI(
    azure_deployment=azure_deployment,
    azure_endpoint=azure_endpoint,
    api_version=azure_api_version or '',
    api_key=openai_key_secret,
)

### :fortune_cookie: Fetch the configuration for the group topic

In [None]:
def fetch_topic_objects(topic:str) -> dict:

    '''    
    Parameters:
        topic - str: Provides the key topic for which we will load all the necessary variables for the workshop.
    
    Explanation: 
    
    This function returns a *dictionary* with the keys and value types:
        {
            'loader': GenericLoader
            'text_splitter': TextSplitter,
            'prompt_template': String,
            'question' : String,
        },

    This is done to facilitate the configuration of multiple topics for the workshop and to allow future extensions with little overhead
    and little technical skill required.
    '''

    topic_parameters = {
        'health' : 
            {
                'loader': GenericLoader.from_filesystem(
                        "./documents/health",
                        glob="*",
                        suffixes=[".pdf"],
                        parser=PDFMinerParser(),
                    ),
                'text_splitter': RecursiveCharacterTextSplitter(
                        chunk_size=1000, 
                        chunk_overlap=200, 
                        add_start_index=True
                    ),
                'prompt_template': """
                    You are a Dutch analyst working at a research institute studying the public health sector in the Netherlands.
                    {context}
                    Question: {question}
                    Answer the question provided by the human.
                    """,
                'question' : 'Can you give me a non-technical summary of the research findings?'
            },
        'beverage': 
            {
                'loader': GenericLoader.from_filesystem(
                        "./documents/beverage",
                        glob="*",
                        suffixes=[".pdf"],
                        parser=PDFMinerParser(),
                    ),
                'text_splitter': RecursiveCharacterTextSplitter(
                        chunk_size=1000, 
                        chunk_overlap=200, 
                        add_start_index=True
                    ),
                'prompt_template': """
                    You are a senior executive in the food industry analyzing the performance of a beverage company.
                    {context}
                    Question: {question}
                    Answer the question provided by the human.
                    """,
                'question' : 'Can you give me a non-technical summary of the research findings?'
            },
        'tech': 
            {
                'loader': GenericLoader.from_filesystem(
                        "./documents/tech",
                        glob="*",
                        suffixes=[".pdf"],
                        parser=PDFMinerParser(),
                    ),
                'text_splitter': RecursiveCharacterTextSplitter(
                        chunk_size=1000, 
                        chunk_overlap=200, 
                        add_start_index=True
                    ),
                'prompt_template': """
                    You are a senior consultant in a tech company analyzing the impact of AI worldwide and its applications.
                    {context}
                    Question: {question}
                    Answer the question provided by the human.
                    """,
                'question' : 'Can you give me a short summary of the findings?'
            },
        'finance' :
            {
                'loader': GenericLoader.from_filesystem(
                        './documents/finance',
                        glob="*",
                        suffixes=[".pdf"],
                        parser=PDFMinerParser(),
                    ),
                'text_splitter': RecursiveCharacterTextSplitter(
                        chunk_size=1000, 
                        chunk_overlap=200, 
                        add_start_index=True
                    ),
                'prompt_template': """
                    You are a senior financial analyst analyzing the below document and having a conversation with a human.
                    {context}
                    Question: {question}
                    Answer the question provided by the human.
                    """,
                'question' : 'Can you give me a summary of the financial document?'                 
            },
    }

    try: 
        return topic_parameters[topic]
    except KeyError:
        message = f'Topic: {topic} is not a valid topic, verify the value or contact a member of the team.'
        raise KeyError(message)


document_topic = ***
pipeline_parameters = fetch_topic_objects(document_topic)


### :clipboard: Create Document Loaders, Load Documents and Process them

In [None]:
# Create Document Loaders 
loader = pipeline_parameters['loader']
document_splitter = pipeline_parameters['text_splitter']

# Load Documents
documents = loader.load()

# Process the documents: split them into appropriately sized chunks
split_documments = document_splitter.split_documents(documents)
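
To build intuition for `chunk_size` and `chunk_overlap`, here is a simplified sliding-window chunker. It is a sketch only and not how `RecursiveCharacterTextSplitter` works internally: the real splitter prefers to break on separators such as paragraphs and sentences, so its chunk boundaries will differ. The function `chunk_text` is hypothetical. Returning the start offset alongside each chunk loosely mirrors what `add_start_index=True` records in each chunk's metadata.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[tuple[int, str]]:
    """Split text into fixed-size windows and return (start_index, chunk) pairs."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
for start, chunk in chunk_text(sample, chunk_size=10, chunk_overlap=4):
    print(start, chunk)
```

Each chunk repeats the last four characters of the previous one. That overlap reduces the chance that a sentence relevant to a query is cut in half at a chunk boundary.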

In [None]:
# ❗Hint: Look at the imports
embedding_method = ***(
    azure_deployment=azure_emb_deployment,
    azure_endpoint=azure_endpoint,
    api_version=azure_api_version,
    api_key=openai_key_secret,
)

In [None]:
vectorDB = Chroma.from_documents(documents=split_documments, embedding=embedding_method)

retriever = vectorDB.as_retriever(
    search_type= ***
)

prompt = PromptTemplate(
    input_variables= ***,
    optional_variables=["chat_history"],
    template= pipeline_parameters['prompt_template']
)

### 👷‍♂️ Create LangChain Pipeline 

In [None]:
def create_rag_langchain_pipeline (
        retriever:VectorStoreRetriever,
        prompt: PromptTemplate,
        llm: BaseChatModel,
    ):

    def format_docs(docs):
        return "\n\n".join(doc.page_content for doc in docs)

    rag_chain = (
        {"context": retriever | format_docs, "question": RunnablePassthrough()}
        | prompt
        | llm
        | StrOutputParser()
    )

    return rag_chain


pipeline = create_rag_langchain_pipeline(
        retriever= retriever, 
        prompt = prompt, 
        llm = chatGPT
    )
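
The `|` syntax used in `rag_chain` can look like magic at first. The sketch below shows the general idea with a hypothetical `Step` class: each step wraps a function, `a | b` means "run `a`, then pass its output to `b`", and a dictionary of steps runs every branch on the same input. This is not LangChain's implementation, and all names here are made up for illustration.

```python
class Step:
    """Hypothetical stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # `a | b` builds a new step that runs a, then feeds its output to b.
        return Step(lambda value: other.invoke(self.invoke(value)))

def fan_out(**branches):
    """Run every branch on the same input and collect the results in a dict."""
    return Step(lambda value: {key: step.invoke(value) for key, step in branches.items()})

# Fake components that mimic the shape of the real pipeline.
fake_retriever = Step(lambda query: ["doc about " + query])
join_docs = Step(lambda docs: "\n\n".join(docs))
passthrough = Step(lambda value: value)
fake_prompt = Step(lambda d: f"{d['context']}\nQuestion: {d['question']}")
fake_llm = Step(lambda text: text.upper())

toy_chain = (
    fan_out(context=fake_retriever | join_docs, question=passthrough)
    | fake_prompt
    | fake_llm
)
print(toy_chain.invoke("rag"))
```

Note how the single input string reaches both the retriever branch and the pass-through branch. That is the role `RunnablePassthrough()` plays in the real chain.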

### 🎤 Generate a response using an LLM

In [None]:
question = pipeline_parameters[***]

print(f'We just asked the following: \n {question}')

result = pipeline.invoke(question)

print(result)

Great! Now that we have a working RAG (Retrieval-Augmented Generation) Model, we will want our customers/users to be able to retrieve documents freely using natural language.

However, RAG models often perform better when the prompt provides additional context and guidance about the role the model is expected to fulfill.

# Understanding Prompt Engineering
Prompt engineering is the practice of designing and refining the input prompts given to a language model to achieve the desired output. Essentially, it involves crafting questions or statements in a way that leverages the strengths of the language model, guiding it to produce more accurate, relevant, and contextually appropriate responses. This is crucial for improving the performance of RAG systems, where the quality of retrieved documents and generated responses can directly impact user satisfaction and correctness.

We will now be looking at the benefits of prompt engineering and how we can modify the following query to enhance our RAG Model performance.

## ⚙️ Benefits of Prompt Engineering ⚙️

Consider the query: "Tell me about climate change."

It is an open-ended and ambiguous prompt, which leaves room for the RAG model to hallucinate and generate potentially misleading, useless, or incorrect responses. We can _mitigate_ this by employing certain techniques in our prompts.

### 💡 Contextual Understanding:

Prompts that provide specific context can help the language model better understand the user's intent. This reduces ambiguity and ensures that the model retrieves and generates content that is closely aligned with the user's needs.

__Modified Query__:

"Tell me about climate change and its impact on coastal cities."

### 📖 Enhanced Relevance:

By framing prompts to include relevant details, users can improve the relevance of the documents retrieved from the vector database. This ensures that the information presented is more pertinent to the query.

__Modified Query__:

"Tell me about the effects of climate change on agriculture in North America."

### 👩‍🏫 Role Specification:

Defining the role of the model within the prompt (e.g., "As an expert in history, summarize the events of World War II") can help the model generate responses that are more authoritative and tailored to the specified role.

__Modified Query__:

"As an environmental scientist, explain the causes and effects of climate change."

### 🏗️ Guidance and Structure: 

Structured prompts (e.g., "Given the following context, provide a summary: [context]") guide the model on how to approach the response, which can lead to more coherent and well-organized outputs.

__Modified Query__:

"Given the following context, provide a summary of the main points about climate change: [context]"

### :abacus: Bias Mitigation:

Thoughtfully crafted prompts can help mitigate model biases by steering the model towards neutral and objective language, particularly in sensitive or controversial topics.

__Modified Query__:

"Provide an objective overview of climate change, including its causes, effects, and potential solutions."

# 🤖 Adding Prompt Engineering to our RAG Model

It would be too much to ask from our users/customers to apply all these techniques themselves when they are querying the system for information. Therefore, many applications that employ RAG models do some additional preprocessing to user prompts to leverage the benefits of Prompt Engineering.
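
As a rough illustration of such preprocessing, the sketch below wraps a raw user query with a role, an objectivity instruction, and optional context. The function `engineer_prompt` and its wording are hypothetical; a real application would tune these instructions for its own domain.

```python
def engineer_prompt(query, role=None, context=None, objective=False):
    """Wrap a raw user query with optional role, bias, and context instructions."""
    parts = []
    if role:
        parts.append(f"As {role}, answer the question below.")
    if objective:
        parts.append("Use neutral, objective language.")
    if context:
        parts.append(f"Given the following context:\n{context}")
    parts.append(f"Question: {query}")
    return "\n".join(parts)

print(engineer_prompt(
    "Tell me about climate change.",
    role="an environmental scientist",
    objective=True,
))
```

The user still types a short question, and the application adds the role and guidance on their behalf.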

In this case, your client wants a RAG model that behaves as a Subject Matter Expert (SME), capable of answering questions from junior analysts. Their goal is to reduce the time (and therefore money) spent on training new talent so that new hires get up to speed quickly in their roles. By leveraging AI, they hope to expedite knowledge acquisition.

Given the topic you have been assigned or have selected, think about how to add the following properties to your query:
- 👩‍🏫 Role Specification
- 🏗️ Guidance and Structure 
- 🧮 Bias Mitigation

Think carefully about the language you use, the instructions and structure you provide, and how you specify the role in your prompt to achieve the correct behavior.

Keep in mind that the output of one model is the input to the next. How can you ensure that information is not lost, and that the next model responds appropriately?

Here is an example of what not to do:

Example:
<pre>
<strong> User Query </strong>:

    Can you tell me who the most famous person in the world is?

<strong> Chain Link 1 Prompt </strong>:

    You are a celebrity aficionado, up to date with the latest trends and famous people worldwide.
    Answer the following question:
    {query}

<strong> Output </strong>:

    The most famous person in the world is Dwayne "The Rock" Johnson!

<strong> Chain Link 2 Prompt - Poor Structure </strong>:

    Is this prompt structured and does it contain a lot of information?
    {output}

<strong> Final Response </strong>:

    No, this prompt does not contain a lot of structure. Additionally, it only states a single fact.

</pre>

In this example the _Chain Link 2 Prompt_ was poorly designed, and therefore the output was modified to answer 
whether the prompt itself was structured - this is not the behaviour we want.

Instead, a better prompt design would be:

<pre>
<strong> Chain Link 2 Prompt - Good Structure </strong>

    Given the following information: 
    {output}

    Rewrite the response such that:
    You indicate the subject, and if known, their profession.
    All factual information is in bullet points
    You include a short final analysis in which you indicate whether the response is informational or not.

</pre>

In this example _Chain Link 2 Prompt_ was designed correctly, asking the large language model to modify the contents while restructuring the response.
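
The data flow between chain links can be sketched without any model at all. In the example below, `stub_llm` is a hypothetical stand-in that records each prompt it receives and returns a canned reply, which makes it easy to see that the first link's output is embedded in the second link's prompt. The templates and helper names are assumptions for illustration only.

```python
# Hypothetical two-link chain; both templates use a single {text} placeholder.
LINK_TEMPLATES = [
    "Answer the following question:\n{text}",
    "Given the following information:\n{text}\n\nRewrite it as bullet points.",
]

prompts_seen = []

def stub_llm(prompt: str) -> str:
    """Stand-in for a model call: records the prompt and returns a canned reply."""
    prompts_seen.append(prompt)
    return f"<reply {len(prompts_seen)}>"

def run_links(query: str, templates: list[str], llm) -> str:
    """Pass the query through each link; every output becomes the next input."""
    text = query
    for template in templates:
        text = llm(template.format(text=text))
    return text

final = run_links("Who is the most famous person in the world?", LINK_TEMPLATES, stub_llm)
print(prompts_seen[1])  # the second prompt contains the first reply
print(final)
```

Printing the intermediate prompts like this is also a practical way to debug a real chain when a later link responds to the wrong thing.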

In [None]:
# Create a prompt which transforms the answer provided given a role.
role_prompt_template = """ 
' *** Add your role description here *** ':
{llm_response}
"""

role_prompt = PromptTemplate(
    input_variables=['llm_response'],
    template=role_prompt_template,
)

composed_pipeline_role =  ( 
    {"llm_response": pipeline} 
    | *** 
    | *** 
    | StrOutputParser()
    ) 

result = composed_pipeline_role.invoke(question)
print(result)

In [None]:
# Create a prompt which transforms the answer provided given a structure.
structure_prompt_template = """ 
'Add your structure description here':
{llm_response}
"""

structure_prompt = PromptTemplate(
    input_variables=***,
    template=***,
)

composed_pipeline_structure =  ( 
    {"llm_response": pipeline}
    | ***
    | *** 
    | StrOutputParser()
    ) 

result = composed_pipeline_structure.invoke(question)
print(result)

In [None]:
# Create a prompt which transforms the answer provided and verifies whether there are any biases.
bias_prompt_template = ***

bias_prompt = PromptTemplate(
    input_variables=***,
    template=***,
)

composed_pipeline_bias =  ( 
    {"llm_response": pipeline}
    | ***
    | *** 
    | ***
    ) 

result = composed_pipeline_bias.invoke(question)
print(result)

#  ⛓️ Chaining Prompts - Go Beyond a Single Link!

As we have seen in the previous exercise, we have been able to chain two pipelines together.

However, would we really be proud to call our two chain links a chain? 

In the following exercise, we will do just this! We will build a chain we can be proud of, where all the previous components are connected to each other - each modifying and performing their task and providing the output to the following chain link!

In [None]:
# Create a prompt which transforms the answer provided given a role.
role_prompt_template = ***

role_prompt = PromptTemplate(
    input_variables=***,
    template=***,
)

composed_pipeline_role =  ( 
    {"llm_response": pipeline} 
    | *** 
    | *** 
    | ***
    ) 

result = composed_pipeline_role.invoke(question)
print(result)

In [None]:
# Create a prompt which transforms the answer provided given a structure.
structure_prompt_template = ***

structure_prompt = PromptTemplate(
    input_variables=["llm_response_sme"],
    template= ***,
)

composed_pipeline_role_structure =  ( 
    {"llm_response_sme": composed_pipeline_role}
    | ***
    | *** 
    | ***
    ) 

result = composed_pipeline_role_structure.invoke(question)
print(result)

In [None]:
# Create a prompt which transforms the answer provided and verifies whether there are any biases.
bias_prompt_template = ***

bias_prompt = PromptTemplate(
    input_variables=["llm_response_sme_struct"],
    template=***,
)

composed_pipeline_role_structure_bias =  ( 
    { *** : ***}
    | ***
    | *** 
    | ***
    ) 

result = composed_pipeline_role_structure_bias.invoke(question)
print(result)