# Homework

The goal of this homework is to build a complete RAG system (ingestion and inference pipelines) for a specific dataset and task.
The parts that need to be implemented are:
- Loader and chunker for the data
- Embedder for the data
- LLM for the RAG
- Vector Store (Database) for the data
- Retriever for the data
- Prompt Template for the RAG
- RetrievalQA chain for the RAG

Play around and try different things; the goal is to have fun and learn something new.


### Requirements

In [17]:
# Installing the required packages
# faiss-cpu backs the FAISS vector store and pymupdf backs PyMuPDFLoader
!pip install langchain langchain_community boto3 faiss-cpu pymupdf



In [2]:
# Importing the required packages
import os

from langchain.chains import RetrievalQA
from langchain_community.document_loaders import PyMuPDFLoader
from langchain.memory import ConversationBufferWindowMemory
from langchain.prompts import PromptTemplate
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.chat_models import BedrockChat
from langchain_community.embeddings import BedrockEmbeddings
from langchain_community.vectorstores import FAISS

### Loader and Chunker

In [3]:
# Defining the configuration
REGION_NAME = "us-east-1"
CREDENTIALS_PROFILE_NAME = "MLEngineers"
EMBEDDER_MODEL_ID = "amazon.titan-embed-text-v2:0"
EMBEDDER_MODEL_KWARGS = {
    "dimensions": 512,
    "normalize": True
}

CHUNK_SIZE = 2000
CHUNK_OVERLAP = 100

DATA_PATHS = [
    "bimby.pdf",
    "gordon.pdf"
]

INSTRUCTION_VECTOR_STORE_PATH = "./instruction_vector_database/"
RECIPE_VECTOR_STORE_PATH = "./recipe_vector_database/"

In [4]:
# Defining the chunker
splitter = RecursiveCharacterTextSplitter(
    chunk_size=CHUNK_SIZE,
    chunk_overlap=CHUNK_OVERLAP
)
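
To build intuition for how `CHUNK_SIZE` and `CHUNK_OVERLAP` interact, here is a minimal sketch of a fixed-window character chunker. It is a simplification and not the algorithm used by `RecursiveCharacterTextSplitter`, which also tries to split on separators such as paragraph and line breaks. The `chunk_text` helper and the sample string are hypothetical names introduced only for this illustration.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
demo_chunks = chunk_text(sample, chunk_size=10, chunk_overlap=3)
print(demo_chunks)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk repeats the last three characters of the previous one, so a sentence cut at a chunk boundary still appears whole in at least one chunk when the overlap is large enough.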

In [5]:
# Creating chunks from the documents
global_chunks = []
for data_path in DATA_PATHS:
    loader = PyMuPDFLoader(os.path.join(os.getcwd(), data_path))
    docs = loader.load()
    chunks = splitter.split_documents(docs)
    global_chunks.append(chunks)

### Embedder

In [6]:
# Creating the embedder
embedder = BedrockEmbeddings(
    model_id=EMBEDDER_MODEL_ID,
    model_kwargs=EMBEDDER_MODEL_KWARGS,
    region_name=REGION_NAME,
    #credentials_profile_name=CREDENTIALS_PROFILE_NAME
)


### Vector Store (Database)

In [8]:
# Creating the vector store
instruction_vector_store = FAISS.from_documents(documents=global_chunks[0], embedding=embedder)
recipe_vector_store = FAISS.from_documents(documents=global_chunks[1], embedding=embedder)

instruction_vector_store.save_local(INSTRUCTION_VECTOR_STORE_PATH)
recipe_vector_store.save_local(RECIPE_VECTOR_STORE_PATH)
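
As a rough illustration of what a similarity search over the stored embeddings does, the sketch below ranks a few toy 2-dimensional vectors by cosine similarity to a query. It is plain Python and does not use FAISS. To my understanding, the LangChain FAISS wrapper defaults to Euclidean distance, but for unit-length vectors (which `"normalize": True` requests from the embedder) both measures produce the same ranking, because ||a - b||² = 2 - 2·(a·b). The helper names and toy vectors are assumptions made for this example.

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def top_k(query, vectors, k):
    """Return the indices of the k vectors most similar to the query (cosine)."""
    q = normalize(query)
    scores = [sum(a * b for a, b in zip(q, normalize(v))) for v in vectors]
    ranked = sorted(range(len(vectors)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


toy_vectors = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
toy_hits = top_k([1.0, 0.1], toy_vectors, k=2)
print(toy_hits)  # [0, 2]
```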

### Prompt Template

In [9]:
LLM_MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0" # anthropic.claude-3-haiku-20240307-v1:0 or anthropic.claude-3-sonnet-20240229-v1:0 or anthropic.claude-v2:1
LLM_MODEL_KWARGS = {
    "max_tokens": 4096,
    "temperature": 0.1,
    "top_p": 1,
    "top_k": 250,
    "stop_sequences": ["\n\nHuman"]
}

SEARCH_TYPE = "similarity"
RETRIEVER_KWARGS = {
    "k": 5
}

INPUT_KEY = "question"
MEMORY_KEY = "history"
INPUT_VARIABLES = ["context", "history", "question"]

# Inside the prompt template, you can play with the system's persona, the context, the history, and the question.
PROMPT_TEMPLATE = """
System: You are a helpful, respectful and honest assistant for learning how to make dishes like Gordon Ramsay using a TM6 (cutting-edge kitchen appliance).

The assistant has the following characteristics:
- Knowledgeable about all the available Gordon Ramsay's recipes.
- Skilled in answering questions about TM6's safety instructions, how to use, settings and cleaning maintenance.

Always answer as helpfully as possible, while being safe.
Please ensure that your responses are socially unbiased and positive in nature.
When addressing the user, always base your responses on the context provided and the previous chat history if it is available.
If you are unsure about the answer, please let the user know.
If the user asks something that is not related to Cooking with a TM6, please let the user know.
Human:
----------
<context>
{context}
</context>
----------
<history>
{history}
</history>
----------
<question>
{question}
</question>
----------
Assistant:
"""

### LLM

In [10]:
# Creating the LLM (the embedder was created above)
#llm = BedrockChat(region_name=REGION_NAME, credentials_profile_name=CREDENTIALS_PROFILE_NAME,model_id=LLM_MODEL_ID, model_kwargs=LLM_MODEL_KWARGS)
llm = BedrockChat(region_name=REGION_NAME, model_id=LLM_MODEL_ID, model_kwargs=LLM_MODEL_KWARGS)


### MergerRetriever

In [11]:
from langchain.retrievers import MergerRetriever

# Loading the vector store and creating retriever
instruction_vector_store = FAISS.load_local(INSTRUCTION_VECTOR_STORE_PATH, embeddings=embedder, allow_dangerous_deserialization=True)
recipe_vector_store = FAISS.load_local(RECIPE_VECTOR_STORE_PATH, embeddings=embedder, allow_dangerous_deserialization=True)


# Retriever options such as k must be passed through search_kwargs
instruction_retriever = instruction_vector_store.as_retriever(search_type=SEARCH_TYPE, search_kwargs=RETRIEVER_KWARGS)
recipe_retriever = recipe_vector_store.as_retriever(search_type=SEARCH_TYPE, search_kwargs=RETRIEVER_KWARGS)

lort = MergerRetriever(retrievers=[instruction_retriever, recipe_retriever])
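
To my understanding, `MergerRetriever` queries every retriever and interleaves their results round-robin: the first hit of each retriever, then the second hit of each, and so on. The sketch below reproduces that merging idea in plain Python; `interleave` and the toy hit lists are hypothetical names used only here, not part of LangChain.

```python
from itertools import zip_longest


def interleave(*result_lists):
    """Merge ranked lists round-robin, skipping lists that have run out."""
    missing = object()  # sentinel marking exhausted lists
    merged = []
    for group in zip_longest(*result_lists, fillvalue=missing):
        merged.extend(item for item in group if item is not missing)
    return merged


manual_hits = ["manual-1", "manual-2", "manual-3"]
recipe_hits = ["recipe-1", "recipe-2"]
merged_hits = interleave(manual_hits, recipe_hits)
print(merged_hits)
# ['manual-1', 'recipe-1', 'manual-2', 'recipe-2', 'manual-3']
```

One consequence of this design is that both stores always contribute chunks, whether or not they are relevant to the question, which would explain recipe text appearing among the sources for a question about the appliance itself.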

In [12]:
# Creating the memory and the prompt template
memory = ConversationBufferWindowMemory(memory_key=MEMORY_KEY, input_key=INPUT_KEY, k=3, ai_prefix="Assistant")
prompt = PromptTemplate(template=PROMPT_TEMPLATE, input_variables=INPUT_VARIABLES)
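
The sketch below illustrates, under simplifying assumptions, how a windowed memory and a prompt template fit together: only the last `k` exchanges are kept, and they are rendered into the `{history}` placeholder when the prompt is built. `WindowMemory` and `DEMO_TEMPLATE` are hypothetical stand-ins written for this example; they are not the LangChain classes and only mimic the general behaviour.

```python
from collections import deque

# Hypothetical, shortened template used only for this illustration.
DEMO_TEMPLATE = (
    "<context>\n{context}\n</context>\n"
    "<history>\n{history}\n</history>\n"
    "<question>\n{question}\n</question>\n"
)


class WindowMemory:
    """Keeps only the last k (question, answer) exchanges."""

    def __init__(self, k: int):
        self.turns = deque(maxlen=k)  # older exchanges are dropped automatically

    def save(self, question: str, answer: str) -> None:
        self.turns.append((question, answer))

    def render(self) -> str:
        return "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in self.turns)


demo_memory = WindowMemory(k=3)
for i in range(1, 6):
    demo_memory.save(f"question {i}", f"answer {i}")

# Fill the placeholders the same way an f-string style template would.
demo_prompt = DEMO_TEMPLATE.format(
    context="(retrieved chunks would go here)",
    history=demo_memory.render(),
    question="question 6",
)
print(demo_prompt)
```

After five saved exchanges with `k=3`, only exchanges 3 to 5 reach the prompt, so the model cannot refer back to anything said earlier.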

### RetrievalQA chain

In [13]:
# Creating the Chain for usage
chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=lort,
    verbose=True,
    return_source_documents=True,
    chain_type_kwargs={
        "prompt": prompt,
        "memory": memory
    }
)
response = chain.invoke("What is a TM6?")



> Entering new RetrievalQA chain...

> Finished chain.


In [14]:
# Checking the response
response

{'query': 'What is a TM6?',
 'result': "The TM6 is Thermomix's latest model of their cutting-edge kitchen appliance. It is a powerful all-in-one device that can perform a wide range of cooking functions like chopping, blending, kneading, cooking, steaming, weighing ingredients, and more. \n\nSome key features of the Thermomix TM6:\n\n- High-speed blending up to 10,200 rpm\n- Precise heating from 37°C to 120°C\n- Integrated digital scales\n- Guided cooking functions with recipes built-in\n- WiFi connectivity for updates and recipe sharing\n- Large 3.3L stainless steel mixing bowl\n- Easy to use color touchscreen interface\n\nThe TM6 is designed to simplify cooking by combining multiple appliances into one compact unit. It allows home cooks to prepare restaurant-quality dishes efficiently using Thermomix's curated recipes and automated cooking programs. The instructions provided cover how to safely use and maintain the TM6 for optimal performance.",
 'source_documents': [Document(metadat

### Prompt Engineering

In [15]:
# Printing the answer
print(response["result"])

The TM6 is Thermomix's latest model of their cutting-edge kitchen appliance. It is a powerful all-in-one device that can perform a wide range of cooking functions like chopping, blending, kneading, cooking, steaming, weighing ingredients, and more. 

Some key features of the Thermomix TM6:

- High-speed blending up to 10,200 rpm
- Precise heating from 37°C to 120°C
- Integrated digital scales
- Guided cooking functions with recipes built-in
- WiFi connectivity for updates and recipe sharing
- Large 3.3L stainless steel mixing bowl
- Easy to use color touchscreen interface

The TM6 is designed to simplify cooking by combining multiple appliances into one compact unit. It allows home cooks to prepare restaurant-quality dishes efficiently using Thermomix's curated recipes and automated cooking programs. The instructions provided cover how to safely use and maintain the TM6 for optimal performance.


In [16]:
# Getting the documents used
for doc in response["source_documents"]:
    print("\n################################################# Document #################################################")
    print(doc.page_content)
    print(doc.metadata)


################################################# Document #################################################
使用说明书
User manual / Manuel d'utilisation / Manual de instrucciones / 使用说明书
TM6®
{'source': '/Users/sanchabbarroso/Desktop/Internship/RAG/bimby.pdf', 'file_path': '/Users/sanchabbarroso/Desktop/Internship/RAG/bimby.pdf', 'page': 0, 'total_pages': 36, 'format': 'PDF 1.4', 'title': 'web_25351_V6.0_TM6_USCAMX_20220817.pdf', 'author': '', 'subject': '', 'keywords': '', 'creator': 'Canva', 'producer': 'Canva', 'creationDate': "D:20241002145729+00'00'", 'modDate': "D:20241002145720+00'00'", 'trapped': ''}

################################################# Document #################################################
5. Press the olives and sunblush tomatoes into the dough so that they are slightly poking
above the surface. Season with salt and pepper and scatter over the rosemary leaves.
6. Drizzle the bread with a little extra olive oil and place in the preheated oven for 30
minutes unt
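
Printing every chunk in full gets long quickly. As a possible alternative, the sketch below condenses document metadata into a list of unique (file name, page) pairs. It works on plain dictionaries shaped like the `metadata` printed above; `summarize_sources` and the sample values (including the paths and page numbers) are hypothetical and only serve the illustration.

```python
import os


def summarize_sources(metadatas):
    """Return unique (file name, page) pairs in first-seen order."""
    seen = []
    for meta in metadatas:
        key = (os.path.basename(meta.get("source", "unknown")), meta.get("page"))
        if key not in seen:
            seen.append(key)
    return seen


# Hypothetical metadata dictionaries, for illustration only.
demo_metadatas = [
    {"source": "/some/dir/bimby.pdf", "page": 0},
    {"source": "/some/dir/gordon.pdf", "page": 12},
    {"source": "/some/dir/bimby.pdf", "page": 0},
]
demo_sources = summarize_sources(demo_metadatas)
print(demo_sources)
# [('bimby.pdf', 0), ('gordon.pdf', 12)]
```

With a real response, the input would be `[doc.metadata for doc in response["source_documents"]]`.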

In [18]:
# Asking a new question and printing the answer
response = chain.invoke("Can you make a soufflé using a TM6?")
print(response["result"])



> Entering new RetrievalQA chain...

> Finished chain.
Yes, you can definitely make soufflés using the Thermomix TM6. The TM6 has the capability to prepare the soufflé base mixture with its precise heating, blending and mixing functions.

Here are some tips for making soufflés in the TM6:

1. Use the blending function to whip the egg whites to stiff, glossy peaks. The TM6 can whip them perfectly in just 1-2 minutes.

2. Make the soufflé base by melting butter, cooking a roux, and incorporating milk/cream using the TM6's heating and mixing modes. The consistent heating prevents scorching or lumps.

3. Fold the whipped egg whites gently into the soufflé base using the TM6's kneading function on a low speed. This incorporates air without deflating the whites.

4. Transfer the soufflé mixture to a buttered soufflé dish. The TM6 can even help you melt and brush the butter coating.

5. Bake the soufflés in a preheated oven as you normally would. The TM6 itself does not have

In [19]:
# Getting the documents used
for doc in response["source_documents"]:
    print("\n################################################# Document #################################################")
    print(doc.page_content)
    print(doc.metadata)


################################################# Document #################################################
3.
4. 5.
6.
7.
4.
5.
6.
7.
Use the Thermomix® TM6® only for its intended purpose as described in these
 operating instructions to prevent risks to you and other people and to preven
t damage to the Thermomix® TM6® , or to other property and equipment.
The Thermomix® TM6® is a kitchen appliance intended for the preparation of
food in domestic households.
The Thermomix® TM6® is intended to be used only by adults. It may not be used
by children – not even when supervised by an adult.
–Keep children away from the Thermomix® TM6® at all time.
–Please ensure you warn children of the potential dangers of heat/steam
development, hot condensation and hot surfaces.
–Close supervision is necessary when the Thermomix® TM6® is used near
children.
–Never allow children to play with the Thermomix® TM6®.
Thermomix® TM6® is not intended to be used for deep-frying food.
Thermomix® TM6® may be us

In [20]:
# Asking a new question and printing the answer
response = chain.invoke("How can I make a Soufflé just like Gordon Ramsay?")
print(response["result"])



> Entering new RetrievalQA chain...

> Finished chain.
To make a soufflé like Gordon Ramsay using the Thermomix TM6, here are some tips:

Soufflé Base:
1. Use the TM6 to melt butter, make a roux with flour, and slowly whisk in hot milk to create a smooth, velvety bechamel sauce base.
2. Allow the bechamel to cool slightly, then use the TM6 to blend in egg yolks one at a time until fully incorporated.
3. Flavor the base as desired - Ramsay often uses cheese, herbs, citrus zests etc.

Whipping Egg Whites:
1. Make sure the TM6 bowl is completely clean and dry.
2. Use the whisk function at a high speed to whip the egg whites to stiff, glossy peaks. The TM6 can do this very quickly.
3. Optionally add a pinch of cream of tartar or lemon juice to stabilize the whites.

Folding:
1. Use a large metal spoon to gently fold 1/3 of the whipped whites into the soufflé base to lighten it.
2. Then carefully fold in the remaining whites in two additions, being careful not to overmix a

In [21]:
# Asking a new question and printing the answer
response = chain.invoke("What would be the easiest recipe to cook?")
print(response["result"])



> Entering new RetrievalQA chain...

> Finished chain.
For an easy recipe to cook with the Thermomix TM6 that's still delicious, I would recommend the Noodle Soup with Poached Egg. This is a great beginner recipe that showcases some of the TM6's versatile functions.

Here's a quick overview of how you can make it in the TM6:

1. Use the chopping blade to finely slice vegetables like carrots, celery, onions etc. for the soup broth.

2. Sauté the chopped veggies using the TM6's heating and stirring modes to build flavor.

3. Add chicken or vegetable stock and seasonings. Use the TM6 to bring it to a simmer.

4. While the broth simmers, use the egg poaching function to perfectly poach eggs to add to the soup later.

5. Cook noodles like ramen, udon or egg noodles directly in the TM6 using the cooking basket accessory.

6. Once everything is cooked, remove the noodles and poached eggs. Use the TM6 to blend the broth if you prefer it smooth.

7. Assemble by placing noodles

In [22]:
# Getting the documents used
for doc in response["source_documents"]:
    print("\n################################################# Document #################################################")
    print(doc.page_content)
    print(doc.metadata)


################################################# Document #################################################
English ..................................................................................................................................................
Français .................................................................................................................................................
Español ..................................................................................................................................................
Chinese (simplified)..............................................................................................................................
3
37
73
107
{'source': '/Users/sanchabbarroso/Desktop/Internship/RAG/bimby.pdf', 'file_path': '/Users/sanchabbarroso/Desktop/Internship/RAG/bimby.pdf', 'page': 1, 'total_pages': 36, 'format': 'PDF 1.4', 'title': 'web_25351_V6.0_TM6_USCAMX_20220817.pdf', 'author': '', 'subject':