# Homework

The goal of this homework is to build a complete RAG system (ingestion and inference pipelines) for a specific dataset and task.
The parts that need to be implemented are:
- Loader and chunker for the data
- Embedder for the data
- LLM for the RAG
- Vector Store (Database) for the data
- Retriever for the data
- Prompt Template for the RAG
- RetrievalQA chain for the RAG

Play around and try different things; the goal is to have fun and learn something new.


### Requirements

In [1]:
# Installing the required packages
# faiss-cpu backs the FAISS vector store; pymupdf backs PyMuPDFLoader
!pip install langchain langchain_community boto3 faiss-cpu pymupdf



In [2]:
# Importing the required packages
import os

from langchain.chains.router import MultiRetrievalQAChain
from langchain_community.document_loaders import PyMuPDFLoader
from langchain.memory import ConversationBufferWindowMemory
from langchain.prompts import PromptTemplate
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.chat_models import BedrockChat
from langchain_community.embeddings import BedrockEmbeddings
from langchain_community.vectorstores import FAISS

### Loader and Chunker

In [3]:
# Defining the configuration
REGION_NAME = "us-east-1"
CREDENTIALS_PROFILE_NAME = "MLEngineers"
EMBEDDER_MODEL_ID = "amazon.titan-embed-text-v2:0"
EMBEDDER_MODEL_KWARGS = {
    "dimensions": 512,
    "normalize": True
}

CHUNK_SIZE = 2000
CHUNK_OVERLAP = 100

DATA_PATHS = [
    "bimby.pdf",
    "gordon.pdf"
]

INSTRUCTION_VECTOR_STORE_PATH = "./instruction_vector_database/"
RECIPE_VECTOR_STORE_PATH = "./recipe_vector_database/"

In [4]:
# Defining the chunker
splitter = RecursiveCharacterTextSplitter(
    chunk_size=CHUNK_SIZE,
    chunk_overlap=CHUNK_OVERLAP
)

In [5]:
# Creating chunks from the documents
global_chunks = []
for data_path in DATA_PATHS:
    loader = PyMuPDFLoader(os.path.join(os.getcwd(), data_path))
    docs = loader.load()
    chunks = splitter.split_documents(docs)
    global_chunks.append(chunks)
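
To get a feel for what `CHUNK_SIZE` and `CHUNK_OVERLAP` do, here is a minimal sketch of a fixed-window character splitter. It is a simplification and not the algorithm `RecursiveCharacterTextSplitter` uses (that one also tries to break on separators such as paragraphs and sentences); the function name `split_with_overlap` and the sample text are made up for illustration.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list:
    """Split text into fixed-size character windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Hypothetical sample: 50 characters, split into windows of 20 with an overlap of 5
sample_text = "0123456789" * 5
sample_chunks = split_with_overlap(sample_text, chunk_size=20, chunk_overlap=5)
for chunk in sample_chunks:
    print(chunk)
```

The tail of each chunk is repeated at the head of the next one, so a sentence cut at a chunk boundary still appears whole in at least one chunk.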

### Embedder

In [6]:
# Creating the embedder
embedder = BedrockEmbeddings(
    model_id=EMBEDDER_MODEL_ID,
    model_kwargs=EMBEDDER_MODEL_KWARGS,
    region_name=REGION_NAME,
    #credentials_profile_name=CREDENTIALS_PROFILE_NAME
)



### Vector Store (Database)

In [7]:
# Creating the vector store
instruction_vector_store = FAISS.from_documents(documents=global_chunks[0], embedding=embedder)
recipe_vector_store = FAISS.from_documents(documents=global_chunks[1], embedding=embedder)

instruction_vector_store.save_local(INSTRUCTION_VECTOR_STORE_PATH)
recipe_vector_store.save_local(RECIPE_VECTOR_STORE_PATH)
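
Conceptually, a similarity search over the stored chunks ranks vectors by how close they are to the query vector and keeps the best `k`. The sketch below shows that idea in plain Python using the dot product of unit-length vectors (cosine similarity). It is only an illustration: FAISS uses its own index structures, and the LangChain wrapper may rank by L2 distance instead, which gives the same ordering when vectors are normalized, as requested here with `"normalize": True`. The three-dimensional vectors and their names are invented.

```python
import math


def normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def top_k(query_vector, indexed_vectors, k):
    """Return the k (name, score) pairs with the highest cosine similarity to the query."""
    q = normalize(query_vector)
    scored = [
        (name, sum(a * b for a, b in zip(q, normalize(vec))))
        for name, vec in indexed_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy index: real Titan embeddings here would have 512 dimensions
toy_index = {
    "cleaning": [1.0, 0.0, 0.0],
    "safety": [0.8, 0.6, 0.0],
    "souffle": [0.0, 0.0, 1.0],
}
toy_hits = top_k([1.0, 0.1, 0.0], toy_index, k=2)
print([name for name, _ in toy_hits])
```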

### Prompt Template

In [28]:
from langchain.prompts import PromptTemplate

# Alternatives: anthropic.claude-3-haiku-20240307-v1:0 or anthropic.claude-v2:1
LLM_MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0"
LLM_MODEL_KWARGS = {
    "max_tokens": 4096,
    "temperature": 0.1,
    "top_p": 1,
    "top_k": 250,
    "stop_sequences": ["\n\nHuman"]
}

SEARCH_TYPE = "similarity"
RETRIEVER_KWARGS = {
    "k": 5
}

# In the prompt template, you can play with the system's persona, the context, and the question.
prompt_template = """ You are an expert chef that provides helpful answers on how to make recipes, either by hand or using a TM6. 
Use the following pieces of context to provide a helpful answer on the question. {context}

Question: {question}
"""
PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)
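
To see what the LLM actually receives, it helps to fill the template by hand. The sketch below uses plain `str.format` on a copy of the template text; the two chunks are placeholders, and joining them with a blank line is an assumption about how the QA chain stuffs retrieved documents into `{context}`.

```python
# Copy of the template text, kept local so this cell runs on its own
toy_template = """You are an expert chef that provides helpful answers on how to make recipes, either by hand or using a TM6.
Use the following pieces of context to provide a helpful answer on the question. {context}

Question: {question}
"""

# Placeholder chunks standing in for whatever the retriever returns
retrieved_chunks = [
    "<text of retrieved chunk 1>",
    "<text of retrieved chunk 2>",
]

# Assumption: retrieved chunks are concatenated with a blank line between them
filled_prompt = toy_template.format(
    context="\n\n".join(retrieved_chunks),
    question="What is a TM6?",
)
print(filled_prompt)
```

Printing the filled prompt is a quick way to check that changes to the persona or to the position of `{context}` read the way you intended.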

### LLM

In [29]:
# Creating the LLM
#llm = BedrockChat(region_name=REGION_NAME, credentials_profile_name=CREDENTIALS_PROFILE_NAME,model_id=LLM_MODEL_ID, model_kwargs=LLM_MODEL_KWARGS)
llm = BedrockChat(region_name=REGION_NAME, model_id=LLM_MODEL_ID, model_kwargs=LLM_MODEL_KWARGS)

### MultiRetriever

In [30]:
# Loading the vector stores and creating the retrievers
instruction_vector_store = FAISS.load_local(INSTRUCTION_VECTOR_STORE_PATH, embeddings=embedder, allow_dangerous_deserialization=True)
recipe_vector_store = FAISS.load_local(RECIPE_VECTOR_STORE_PATH, embeddings=embedder, allow_dangerous_deserialization=True)

# Search parameters such as k must be passed through search_kwargs
instruction_retriever = instruction_vector_store.as_retriever(search_type=SEARCH_TYPE, search_kwargs=RETRIEVER_KWARGS)
recipe_retriever = recipe_vector_store.as_retriever(search_type=SEARCH_TYPE, search_kwargs=RETRIEVER_KWARGS)

retriever_infos = [
    {
        "name": "instructions",
        "description": "Good for answering questions about the TM6's safety instructions, usage, settings, cleaning and maintenance.",
        "retriever": instruction_retriever,
        "prompt": PROMPT
    },
    {
        "name": "recipes",
        "description": "Good for answering questions about Gordon Ramsay's recipes and cooking advice.",
        "retriever": recipe_retriever,
        "prompt": PROMPT
    },
]
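
The `name` and `description` fields are what the router LLM sees when it decides which retriever to use, so their wording matters. As a rough sketch (an approximation of what `MultiRetrievalQAChain` does internally, not its exact code), the entries are turned into one `name: description` line each and placed in the routing prompt. The descriptions below are shortened for illustration.

```python
# Shortened, illustrative copies of the retriever descriptions
toy_retriever_infos = [
    {"name": "instructions", "description": "TM6 safety instructions, usage, settings, cleaning."},
    {"name": "recipes", "description": "Gordon Ramsay's recipes and cooking advice."},
]

# One "name: description" line per candidate retriever
destinations = "\n".join(
    f"{info['name']}: {info['description']}" for info in toy_retriever_infos
)
print(destinations)
```

If questions keep landing on the wrong retriever, making the descriptions more distinct from each other is usually the first thing to try.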

### MultiRetrievalQA chain

In [32]:
# Creating the Chain for usage
multi_chain = MultiRetrievalQAChain.from_retrievers(
    llm=llm,
    retriever_infos=retriever_infos,
    default_retriever=instruction_retriever,
    verbose=True,
)

response = multi_chain.invoke("What is a TM6?")



> Entering new MultiRetrievalQAChain chain...
instructions: {'query': 'What is a TM6?'}
> Finished chain.


In [33]:
# Checking the response
response

{'input': 'What is a TM6?',
 'query': 'What is a TM6?',
 'result': "Based on the context provided from the TM6 user manual, the TM6 is a kitchen appliance made by Thermomix. Some key points about the TM6:\n\n- It is a kitchen appliance intended for preparing food in domestic households.\n\n- It is designed to be used only by adults, not children, even with adult supervision. Children should be kept away from it due to dangers like heat/steam, hot surfaces, etc.\n\n- It is not intended for deep frying foods.\n\n- It can be used by persons with reduced physical, sensory or mental capabilities if they are properly supervised and understand the hazards.\n\n- It has basic cooking functions like weighing ingredients, mixing, heating, etc. \n\n- It has Wi-Fi connectivity capabilities that allow setting up a network connection.\n\nSo in summary, the TM6 is Thermomix's latest model of a versatile countertop appliance that combines heating and blending functions to prepare various foods, with sm

In [34]:
# Printing the answer
print(response["result"])

Based on the context provided from the TM6 user manual, the TM6 is a kitchen appliance made by Thermomix. Some key points about the TM6:

- It is a kitchen appliance intended for preparing food in domestic households.

- It is designed to be used only by adults, not children, even with adult supervision. Children should be kept away from it due to dangers like heat/steam, hot surfaces, etc.

- It is not intended for deep frying foods.

- It can be used by persons with reduced physical, sensory or mental capabilities if they are properly supervised and understand the hazards.

- It has basic cooking functions like weighing ingredients, mixing, heating, etc. 

- It has Wi-Fi connectivity capabilities that allow setting up a network connection.

So in summary, the TM6 is Thermomix's latest model of a versatile countertop appliance that combines heating and blending functions to prepare various foods, with smart connectivity features. It is an all-in-one food processor designed for conveni

### Prompt Engineering

In [35]:
# Asking a new question and printing the answer
response = multi_chain.invoke("How can I make a Soufflé just like Gordon Ramsay?")
print(response["result"])



> Entering new MultiRetrievalQAChain chain...
recipes: {'query': 'How can I make a Soufflé just like Gordon Ramsay?'}
> Finished chain.
Here are some tips for making soufflés like Gordon Ramsay, based on the provided context:

Preparing the Ramekins:
1. Brush the soufflé ramekins or molds with softened butter, applying it in upward vertical strokes. Make sure to cover the rims as well.
2. Chill the ramekins for a few minutes, then repeat with a second layer of butter.
3. Sprinkle some grated chocolate (or other coating like flour) into each mold, shaking to cover the inside completely. Tap out any excess.
4. Chill the prepared ramekins until needed.

Making the Soufflé Base:
1. Follow the recipe for the soufflé base, making a thick, velvety mixture by cooking milk/cream, sugar, egg yolks, and flours together.
2. Allow the base to cool to room temperature.

Whipping the Egg Whites:
1. Whisk the egg whites until they hold stiff peaks.
2. Add a few drops of lemon juice t

In [36]:
# Asking a new question and printing the answer
response = multi_chain.invoke("How can I make a soufflé using a TM6?")
print(response["result"])



> Entering new MultiRetrievalQAChain chain...
recipes: {'query': 'How can I make a soufflé using a Thermomix?'}
> Finished chain.
Here are some tips for making a soufflé using a Thermomix:

Soufflé Base:
1. Add milk, cream, and half the sugar to the Thermomix bowl. Heat 5 mins/90°C/Speed 4.
2. Add egg yolks and flours. Cook 7 mins/90°C/Speed 4 until thickened into a smooth soufflé base. Transfer to a clean bowl and set aside to cool slightly.

Egg Whites:
3. Thoroughly clean and dry the Thermomix bowl. Add egg whites. Mix 6 mins/Speed 4. 
4. With the blades running on Speed 4, gradually add in the remaining sugar through the hole in the lid. Continue mixing for 2 more mins until thick and glossy.

Combining:
5. Fold in zests and juices into the soufflé base.
6. Fold in 1/3 of the whipped egg whites to loosen the base mixture.
7. Gently fold in remaining egg whites with a large metal spoon until just combined, being careful not to overmix.

Baking:
8. Prepare ramekins 

In [38]:
# Asking a new question and printing the answer
response = multi_chain.invoke("What would be the easiest recipe to cook by hand?")
print(response["result"])



> Entering new MultiRetrievalQAChain chain...
recipes: {'query': 'What would be the easiest recipe to cook by hand?'}
> Finished chain.
Based on the context provided, one of the easiest recipes to cook by hand would be the focaccia bread mentioned. The passage states that focaccia is "a very simple hands-on bread that produces really impressive results" and that "it's easy to shape - you simply massage it into the corners." 

A few key points that make focaccia an easy recipe to prepare by hand:

1. It uses a yeast dough, but doesn't require extensive kneading like some other yeasted breads. The passage says you just need to knead it until smooth and elastic.

2. Since it's baked in a tray, you don't have to shape it into a boule or loaf like with other breads. You can simply stretch and massage the dough into the tray.

3. It's very versatile for toppings - you can top it with simple ingredients like tomatoes, olives, rosemary or cured meats that you may have on hand

In [39]:
# Asking a new question and printing the answer
response = multi_chain.invoke("What would be the easiest recipe to cook using a TM6?")
print(response["result"])



> Entering new MultiRetrievalQAChain chain...
recipes: {'query': 'What would be the easiest recipe to cook using a Thermomix TM6?'}
> Finished chain.
As an expert chef, I understand the need for quick and convenient meals after a long day's work. The Thermomix TM6 is an excellent kitchen appliance that can make cooking hassle-free and efficient. Here's a simple recipe that you can whip up using the TM6:

Easy Tomato Basil Soup

Ingredients:
- 1 onion, quartered
- 2 garlic cloves
- 2 tablespoons olive oil
- 1 (28 oz) can diced tomatoes
- 1 cup vegetable or chicken broth
- 1/2 cup fresh basil leaves
- Salt and pepper to taste
- Grated Parmesan cheese for serving (optional)

Instructions:

1. Place the quartered onion and garlic cloves in the TM6 bowl. Chop for 5 seconds on Speed 5.
2. Add the olive oil and sauté for 3 minutes at 120°C on Speed 1.
3. Add the canned tomatoes (with juice), broth, and basil leaves. Cook for 15 minutes at 100°C on Speed 1.
4. Use the puree f