**Ensure you successfully run and test a few models in __llm_testing.ipynb__ before implementing a RAG pattern**

### RAG Pattern Overview 

RAG stands for Retrieval-Augmented Generation. A RAG pattern augments a model's response by grounding it in information retrieved from a database (typically chunks of a document).

A general flow for the RAG pattern is:
1. A user enters a question or a prompt
2. The user's query is used to search a vector database
3. The documents retrieved from the vector database are passed to the generative AI model alongside a prompt
    i. (for example, "Your job is to summarize the chunks of the documents")
4. The generative AI model returns a response grounded in the documents.
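
The four steps above can be sketched end to end with toy stand-ins. Everything in this sketch is hypothetical: `retrieve` ranks chunks by shared words rather than by vector similarity, and `fake_generate` is a placeholder for a real LLM call.

```python
# Toy walk-through of the four RAG steps. All names are illustrative stand-ins,
# not part of any library used later in this notebook.

def retrieve(query, chunks, k=1):
    """Step 2: rank chunks by how many words they share with the query."""
    query_words = set(query.lower().split())
    ranked = sorted(
        chunks,
        key=lambda chunk: len(query_words & set(chunk.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, retrieved):
    """Step 3: pass the retrieved chunks to the model alongside an instruction."""
    context = "\n".join(retrieved)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}\nAnswer:"


def fake_generate(prompt):
    """Step 4: placeholder for the LLM; it simply echoes the first context line."""
    return prompt.splitlines()[1]


chunks = [
    "The sky appears blue because of Rayleigh scattering.",
    "Leonardo da Vinci painted the Mona Lisa.",
]
query = "Why is the sky blue?"  # Step 1: the user's question
top_chunks = retrieve(query, chunks)
answer = fake_generate(build_prompt(query, top_chunks))
print(answer)
```

A real implementation replaces `retrieve` with a vector database search and `fake_generate` with a model call, but the order of operations stays the same.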

#### 1. Create the model

In [2]:
from ibm_watsonx_ai.foundation_models import Model
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams
from dotenv import load_dotenv
import os

## DO NOT SHARE .env ANYWHERE
load_dotenv()
project_id = os.getenv('PROJECT_ID')
api_key = os.getenv('GENAI_API_KEY')
url = os.getenv('GENAI_URL')

In [3]:
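# Modify your prompt below.
# The special tokens follow the Llama 3 chat template; other model families (e.g. Granite) use a different template.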
prompt = '''
<|begin_of_text|><|start_header_id|>user<|end_header_id|>
Why is the sky blue?
<|eot_id|><|start_header_id|>assistant<|end_header_id|>
'''

# Tokens can be thought of as pieces of words, roughly like syllables.
# A larger MAX_NEW_TOKENS allows the model to generate longer responses.
# Each model has its own context window (the token limit for prompt plus response); for Llama 3 it is 8,192 tokens.
generate_params = {
    GenParams.MAX_NEW_TOKENS: 300
}

## Model choices:
# - ibm/granite-34b-code-instruct
# - meta-llama/llama-3-70b-instruct
# - ibm/granite-8b-code-instruct
# - meta-llama/llama-3-8b-instruct
# - ibm/granite-13b-chat-v2

model = Model(
    model_id="ibm/granite-13b-chat-v2",
    params=generate_params,
    credentials={
        "apikey": f"{api_key}",
        "url": f"{url}"
    },
    project_id=project_id
    )


## For testing
# generated_response = model.generate(prompt=prompt)
# print(generated_response['results'][0]['generated_text'])

The sky appears blue due to a phenomenon called Rayleigh scattering. When sunlight reaches Earth's atmosphere, it is made up of different wavelengths, including red, orange, yellow, green, blue, and violet. Shorter-wavelength light, such as blue and violet, is scattered in all directions more than longer-wavelength light. As a result, the scattered blue and violet light reach our eyes from all directions, making the sky appear blue. This scattering also causes the sun to appear yellow during sunrise and sunset, as the shorter-wavelength light is scattered away from our line of sight, leaving the longer-wavelength light, like red and orange, to dominate.
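
To make the token comments in the cell above concrete, here is a rough sketch that checks whether a prompt plus the response budget fits in a context window. The helper names are hypothetical, and the ratio of 4 characters per token is only an assumed rule of thumb for English text; a real tokenizer will give different counts.

```python
import math


def estimate_tokens(text, chars_per_token=4):
    """Rough token estimate; the 4-characters-per-token ratio is an assumed rule of thumb."""
    return math.ceil(len(text) / chars_per_token)


def fits_context(prompt, max_new_tokens, context_window):
    """True if the estimated prompt tokens plus the response budget fit in the window."""
    return estimate_tokens(prompt) + max_new_tokens <= context_window


short_prompt = "Why is the sky blue?"
print(estimate_tokens(short_prompt))
print(fits_context(short_prompt, 300, 8192))   # short prompt, Llama 3 sized window
print(fits_context("x" * 40_000, 300, 8192))   # a very long prompt would not fit
```

This matters for RAG because the retrieved chunks are added to the prompt, so more or larger chunks leave less room for the answer.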


#### 2. Set up vector DB

- The vector DB stores all your documents after they are chunked and embedded
- The retriever searches it for the chunks most relevant to each query, and those chunks are passed to the LLM as context
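
As a rough illustration of what a vector DB does, here is a toy in-memory store. `ToyVectorStore` is a hypothetical class written for this sketch, not the ChromaDB API, and it ranks results by squared Euclidean distance, which is one common choice among several.

```python
class ToyVectorStore:
    """Hypothetical in-memory store: keeps (vector, text) pairs and finds the closest ones."""

    def __init__(self):
        self._items = []

    def add(self, vector, text):
        self._items.append((vector, text))

    @staticmethod
    def _squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def query(self, vector, n_results=1):
        # Smaller distance means the stored vector is closer to the query vector.
        ranked = sorted(
            self._items,
            key=lambda item: self._squared_distance(vector, item[0]),
        )
        return [text for _, text in ranked[:n_results]]


store = ToyVectorStore()
store.add([1.0, 0.0], "chunk about painting")
store.add([0.0, 1.0], "chunk about engineering")
nearest = store.query([0.9, 0.1])
print(nearest)
```

A production vector DB adds persistence, metadata, and approximate search so that lookups stay fast with millions of vectors.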

In [None]:
import chromadb

## We will be using ChromaDB, but feel free to try other vector DBs such as FAISS, Milvus, or Elasticsearch.

# Instantiate database client 
chroma_client = chromadb.Client()

# Create a collection to store the vectors (or get collection if it already exists)
# NOTE: If you want to test with a fresh collection every time, use delete_collection
collection = chroma_client.get_or_create_collection(name="my_collection")

#### 3. Create embeddings

- Embeddings are numeric vectors that represent the meaning of a piece of text
- An embedding function takes text chunks as input and outputs one vector per chunk
- The vectors are stored in the vector DB, where chunks with similar meaning end up close together
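
To show the idea of text becoming comparable vectors, here is a toy embedding based on word counts over a tiny, made-up vocabulary. This is only a sketch: `toy_embed` and `VOCAB` are hypothetical, and a real embedding model such as the one used below captures meaning far better than word counts.

```python
import math
from collections import Counter

VOCAB = ["art", "painting", "renaissance", "bridge", "steel"]  # hypothetical tiny vocabulary


def toy_embed(text):
    """Map text to a vector of word counts, one dimension per vocabulary word."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in VOCAB]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


query_vec = toy_embed("renaissance art")
art_vec = toy_embed("painting and art in the renaissance")
steel_vec = toy_embed("steel bridge design")

# The query is closer to the art chunk than to the steel chunk.
print(cosine_similarity(query_vec, art_vec) > cosine_similarity(query_vec, steel_vec))
```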

**Steps**

1. Read pdf / document
2. Chunk document 
3. Feed chunks to embedding function
4. Put embeddings into vector DB
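
The chunking step can be pictured with a simplified fixed-size splitter. This is a sketch under stated assumptions, not how `RecursiveCharacterTextSplitter` works internally: `chunk_text` is a hypothetical helper that cuts on character counts only, while a library splitter also tries to break on separators such as paragraphs and sentences.

```python
def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into fixed-size character chunks that overlap by `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks


sample_text = "abcdefghij" * 10  # 100 characters
sample_chunks = chunk_text(sample_text, chunk_size=40, overlap=10)
print(len(sample_chunks))
print(sample_chunks[0][-10:] == sample_chunks[1][:10])  # neighbouring chunks share 10 characters
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which tends to help retrieval.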

**Lines for you to code are marked with <-- TODO -->**

**All TODOs can be done with the existing imports. Read the documentation and resources to find what functions to use. If you would like to do your own implementation or use other libraries, feel free**

In [4]:
import os

from ibm_watsonx_ai.metanames import EmbedTextParamsMetaNames as EmbedParams
from langchain_ibm import WatsonxEmbeddings

from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

import chromadb
from langchain_chroma import Chroma

## Set up embedding function
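# embedding params
# TRUNCATE_INPUT_TOKENS caps how many tokens of each chunk are embedded.
# A very small value (such as 3) would embed only the first few tokens of every chunk.
# 512 is assumed here to be the maximum input length of the embedding model below.
embed_params = {
    EmbedParams.TRUNCATE_INPUT_TOKENS: 512,
    EmbedParams.RETURN_OPTIONS: {
        'input_text': True
    }
}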

# embedding function
watsonx_embedding = WatsonxEmbeddings(
    model_id="ibm/slate-125m-english-rtrvr",
    url=f"{url}",
    apikey=f"{api_key}",
    project_id=project_id,
    params=embed_params,
)

### Extract text from pdfs
## 1. Parse PDFs 

test_file = "da-vinci-test.pdf"  # File to test your rag pattern

all_files = os.listdir(path='./pdfs')  # List of all PDF files to use in your implementation (construction codes)


## Load the pdf file (start with test file)

# <-- TODO: Create loader -->

document = loader.load()  # loads file from loader


## 2. chunk text 

## Split the text
# <-- TODO: Create text splitter -->

# <-- TODO: Use text splitter to chunk 'document' -->



## 3. create embeddings and store in vector DB

# Create embeddings and store in collection
chroma_db = Chroma.from_documents(
    documents=#<-- TODO: Put appropriate var -->,
    embedding=#<-- TODO: Put appropriate var -->,
    collection_name="my_collection",
    client=chroma_client,
)


#### TODO: Modify steps 1 to 3 so that it works for all_files





['Australia_code_vol_1.pdf', 'Australia_code_vol_2.pdf', 'Australia_code_vol_3.pdf', 'Australia_housing_provisions.pdf', 'Building Control Act 1989 Singapore.pdf', 'Building Control Regulations 2003 Singapore.pdf', 'HK_demolition.pdf', 'HK_escalators.pdf', 'HK_external_maintenance.pdf', 'HK_firefighting.pdf', 'HK_fire_escape.pdf', 'HK_fire_resistance.pdf', 'HK_fire_safety.pdf', 'HK_foundations.pdf', 'HK_glass.pdf', 'HK_site_supervision.pdf', 'HK_steel.pdf', 'HK_thermal_transfer.pdf', 'HK_wind.pdf', 'Japan_building_standard_law.pdf', 'London_building_act.pdf', 'London_building_regulations.pdf', 'Los Angeles County, CA Code of Ordinances.pdf', 'Netherlands-2011-0212-000-EN.pdf', 'NYC_local_laws_2023.pdf', 'Paris_regulations.pdf', 'singapore_building_control_buildability_regulations.pdf', 'singapore_building_control_env_sus_regulations.pdf', 'singapore_circular2017.pdf', 'singapore_cop2017.pdf', 'Toronto_Building Code Act, 1992, S.O. 1992, c. 23.pdf', 'Toronto_O. Reg. 332_12_ BUILDING COD
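
When extending the steps to every file, note that `os.listdir` returns bare file names, so each one has to be joined with its folder before it can be opened. A minimal sketch, assuming a hypothetical helper `list_pdf_paths` and demonstrated on a temporary folder rather than `./pdfs`:

```python
import os
import tempfile


def list_pdf_paths(folder):
    """Return sorted full paths of the PDF files in `folder`, skipping other file types."""
    return sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if name.lower().endswith(".pdf")
    )


# Demonstration on a throwaway folder so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ["b.pdf", "a.PDF", "notes.txt"]:
        open(os.path.join(demo_dir, name), "w").close()
    demo_paths = list_pdf_paths(demo_dir)

print([os.path.basename(path) for path in demo_paths])
```

Each returned path could then be loaded and chunked in a loop, with the chunks from all files collected into one list before they are stored.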

#### 4. Combine elements into RAG pattern

RAG patterns consist of:
- A query
- A model to answer the query
- An engineered prompt for the model which asks for the query to be answered using the provided context 
- A retriever to retrieve the appropriate documents from the vector DB using the original query


Steps:

1. Query is asked
2. Retriever searches the vector DB for the chunks most relevant to the query
3. Vector DB returns those chunks as context
4. Query and context are inserted into the engineered prompt
5. Engineered prompt is sent to the model
6. Model generates an answer to the original query using the context from the vector DB
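
A minimal sketch of how these pieces connect, using stand-ins so it runs without credentials. `StubDoc`, `StubRetriever`, `StubModel`, and `sketch_rag_query` are hypothetical; the stub response only mimics the `['results'][0]['generated_text']` shape seen in the testing lines of step 1.

```python
from dataclasses import dataclass


@dataclass
class StubDoc:
    """Stand-in for a retrieved document; only the page_content field is used."""
    page_content: str


class StubRetriever:
    """Hypothetical retriever that always returns the same two chunks."""

    def invoke(self, query):
        return [StubDoc("Chunk one."), StubDoc("Chunk two.")]


class StubModel:
    """Hypothetical model that records the prompt instead of calling an LLM."""

    def __init__(self):
        self.last_prompt = None

    def generate(self, prompt):
        self.last_prompt = prompt
        text = f"(stub answer; prompt had {len(prompt)} characters)"
        return {"results": [{"generated_text": text}]}


def sketch_rag_query(query, retriever, model):
    docs = retriever.invoke(query)                            # search for context
    context = "\n\n".join(doc.page_content for doc in docs)   # merge chunks into one string
    prompt = f"Question: {query}\nContext: {context}\nAnswer:"
    result = model.generate(prompt=prompt)                    # send the engineered prompt
    return result["results"][0]["generated_text"]


stub_model = StubModel()
sketch_answer = sketch_rag_query("What is in the chunks?", StubRetriever(), stub_model)
print(sketch_answer)
```

Retrieval happens before the model is called, so the model only ever sees the query together with the retrieved context.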


In [None]:
# Helper function to combine list of docs from vector DB into a single doc
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


# Function to combine all elements into a complete RAG pattern
# Returns RAG response from LLM
def rag_query(query, retriever, model) -> str:
    
    # Template for engineered prompt asking for a context-based answer
    template = f"""You are an assistant for question-answering tasks. 
        Use the following pieces of retrieved context to answer the question. 
        If you don't know the answer, just say that you don't know. 
        Use three sentences maximum and keep the answer concise.
        Question: {query} 
        Context: {format_docs(retriever.invoke(query))} 
        Answer:
        """   
    
    
    ## Generate a context based response from the LLM
    ## <-- TODO: Generate a response by feeding the engineered prompt to the LLM-->
    ## <-- TODO: Return response string -->


### Run your implemented RAG pattern!

# Instantiates a retriever from the vector DB to find appropriate context
retriever = chroma_db.as_retriever()

# Sample question for test pdf
# Verify the answer by reading page 3 in the test pdf
question = "What happened to art during the Renaissance?"
response = rag_query(question, retriever, model)

print(response)



### CONGRATULATIONS! ###

You have created your first RAG pattern (or maybe you've done this before)! Now take this knowledge and see where you can take it to solve the brief. Good luck :D