# **Retrieval-Augmented Cooking Assistant**
## **Phase 2: Building the Retrieval Pipeline**

In this phase, we'll set up a retrieval pipeline that efficiently fetches relevant documents based on a given query. Using tools like HuggingFace Transformers, LangChain, and Chroma, we'll build a system that can handle conversational queries while keeping track of context. The goal is to create a flexible and scalable retrieval setup that can be adapted for various applications.  

## Steps:

1. **Set Up the Local LLM and Pipeline**  
   We load the pre-trained Llama 3.2-3B-Instruct model using HuggingFace Transformers, configure its tokenizer, and wrap it in a HuggingFacePipeline. This makes the model compatible with LangChain, enabling it to generate text based on our custom instructions.

2. **Load the Recipes Vector Store**  
   Using HuggingFaceEmbeddings and Chroma, we load the recipe vector store created in the previous notebook. A retriever built on this store supplies relevant context for each cooking query, which helps the model generate precise answers.

3. **Define a Custom Prompt Template**  
   A custom prompt template instructs the model to act as an expert chef. The prompt ensures that responses include:
   - A brief, complete introduction to the recipe or topic.
   - A clear list of ingredients with quantities.
   - Detailed yet concise step-by-step cooking instructions.
   - Useful tips or variations for the dish.

4. **Create and Compare Two Chains**  
   Two chains are constructed:
   - **RAG Chain:** Integrates a retriever to gather context from the vector store and processes the documents before feeding them to the LLM.
   - **Simple Chain:** Generates responses without any additional context (for comparison purposes).  
   Both chains process the output to isolate the final answer from the model's response.

---

In [None]:
! pip install langchain_community==0.2.19 chromadb==0.5.23 langchain==0.2.17 transformers==4.46.3

In [4]:
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
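# Import the integrations from langchain_community; the langchain.llms and
# langchain.embeddings paths are deprecated aliases for the same classes.
from langchain_community.llms import HuggingFacePipeline
from langchain_community.embeddings import HuggingFaceEmbeddings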
from langchain_community.vectorstores import Chroma
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory
from langchain.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough, RunnableLambda
from langchain_core.output_parsers import StrOutputParser

### Set up the model to run locally

In [5]:
model_id = "meta-llama/Llama-3.2-3B-Instruct"

In [None]:
access_token = "< YOUR HUGGING FACE TOKEN >"

In [None]:
# set up the local LLM and wrap it in a HuggingFacePipeline, so that it can be used with LangChain

tokenizer = AutoTokenizer.from_pretrained(model_id, token=access_token)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", token=access_token)

pipe = pipeline(
    task="text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=2048,
    pad_token_id=tokenizer.eos_token_id,
    truncation=True
    )

llm = HuggingFacePipeline(pipeline=pipe)

### Load the vector store created earlier

In [None]:
# Load the vectorstore
persist_directory = "/data/recipes_vectorstore"  # Same directory used before
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

vector_store = Chroma(
    embedding_function = embeddings,
    persist_directory = persist_directory
    )

retriever = vector_store.as_retriever(search_kwargs={"k": 5})
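
To make the `k=5` setting concrete, here is a minimal sketch of what a top-k similarity search does conceptually. It is a brute-force illustration in plain Python, not Chroma's implementation: Chroma uses its own index and distance metric, and real `all-MiniLM-L6-v2` embeddings have 384 dimensions. The helper names (`cosine_similarity`, `top_k`) and the 3-dimensional toy vectors are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=5):
    """Return the indices of the k most similar documents, best first."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(doc_vecs)]
    # Sort by descending similarity; break ties by original position.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:k]]

# Hypothetical toy "embeddings" standing in for real recipe chunks.
toy_docs = {
    "roast chicken with caraway": [0.9, 0.1, 0.0],
    "chocolate cake": [0.0, 0.2, 0.9],
    "grilled chicken salad": [0.8, 0.3, 0.1],
}
toy_query = [1.0, 0.0, 0.0]  # stands in for an embedded chicken question

titles = list(toy_docs)
ranked = top_k(toy_query, list(toy_docs.values()), k=2)
print([titles[i] for i in ranked])
```

A larger `k` gives the model more context at the cost of a longer prompt, and overlapping chunks can return repeated passages.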

### Set up the RAG pipeline

In [9]:
# create a custom prompt
custom_prompt = PromptTemplate(
    input_variables=["context", "question"],
    template="""You are an expert chef and cooking assistant. You provide well-structured, informative, and concise responses to cooking-related questions. Your responses must always include:

    1. A brief but complete introduction to the recipe or cooking topic.
    2. A clear and structured list of ingredients, specifying quantities where relevant.
    3. Step-by-step cooking instructions that are detailed yet to the point.
    4. Useful tips or variations to enhance the dish.

    Now, answer the following question while maintaining clarity and precision.

    Use the following context to answer the user's question. If the context doesn't have the answer, say you don't know.

    Context: {context}

    Question: {question}

    Answer:
"""
)
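
As a rough illustration of what happens when the chain fills this template: with the default f-string format, `PromptTemplate` substitutes the `{context}` and `{question}` placeholders much like Python's `str.format`. The sketch below uses a shortened stand-in template and a hypothetical helper, `fill_prompt`, so that it runs without LangChain installed.

```python
# Shortened stand-in for the notebook's template; the two placeholders are the same.
template = (
    "You are an expert chef and cooking assistant.\n\n"
    "Context: {context}\n\n"
    "Question: {question}\n\n"
    "Answer:\n"
)

def fill_prompt(context, question):
    """Substitute both placeholders, as the prompt step does at invocation time."""
    return template.format(context=context, question=question)

filled = fill_prompt(
    context="Toast the caraway seeds, then crush them lightly.",  # hypothetical snippet
    question="How do I prepare the caraway seeds?",
)
print(filled)
```

The filled prompt ends with `Answer:`, which is the marker the post-processing step later uses to separate the prompt from the generated text.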

In [10]:
def process_retireved_docs(retrieved_docs):
  """
  Takes a list of retrieved documents and joins them without repetitions.

  Args:
    retrieved_docs: A list of retrieved documents.

  Returns:
    A string containing the joined documents without repetitions.
  """
  joint_doc = []

  for doc in retrieved_docs:
    if doc.page_content not in joint_doc:
      joint_doc.append(doc.page_content)

  # join the unique passages with a blank line between them
  return "\n\n".join(joint_doc)

def post_process_output(output):
  """
  Isolates the answer of the model.

  Args:
    output: The output string from the model.

  Returns:
    The isolated answer string.
  """
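  # split only at the first "Answer:" (the one that ends the prompt), so a
  # second "Answer:" inside the generated text does not cut the response short
  return output.split("Answer:", 1)[1]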
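
The two helpers can be tried out without loading the model. The sketch below is a hedged variant, not the notebook's code: `FakeDoc` is a hypothetical stand-in for a LangChain `Document`, `join_unique` deduplicates in retrieval order, and `extract_answer` assumes the model output still contains the prompt (as the split on `Answer:` implies) but falls back to the whole output if the marker is missing.

```python
from dataclasses import dataclass

@dataclass
class FakeDoc:
    """Hypothetical stand-in for a LangChain Document; only page_content is used."""
    page_content: str

def join_unique(docs):
    """Join page contents in retrieval order, keeping the first copy of each."""
    # dict.fromkeys preserves insertion order and drops later duplicates.
    unique = dict.fromkeys(doc.page_content for doc in docs)
    return "\n\n".join(unique)

def extract_answer(output, marker="Answer:"):
    """Return the text after the first marker, or the whole output if it is absent."""
    _, found, tail = output.partition(marker)
    return tail.strip() if found else output.strip()

docs = [FakeDoc("Roast at 190C."), FakeDoc("Rest 10 minutes."), FakeDoc("Roast at 190C.")]
joined = join_unique(docs)
answer = extract_answer("Question: how long?\n\nAnswer:\nAbout 70 minutes.")
print(joined)
print(answer)
```

Deduplication matters here because overlapping chunks of the same recipe can be retrieved together, and repeated passages waste prompt tokens.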


In [11]:
# create the RAG chain
rag_chain = (
    {'question': RunnablePassthrough(),
     'context': retriever | process_retireved_docs}
    | custom_prompt
    | llm
    | StrOutputParser()
    | post_process_output
)

# create a chain without retrieval, to compare the two responses
simple_chain = (
    {'question': RunnablePassthrough(),
     'context': RunnableLambda(lambda _: "")} # pass an empty context
    | custom_prompt
    | llm
    | StrOutputParser()
    | post_process_output
)
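
For readers new to the `|` syntax, the following plain-Python analogue sketches the data flow of the two chains: the leading dict fans the question out to both keys, and each later step consumes the previous step's output. Everything here is hypothetical scaffolding (`fake_retriever`, `fake_llm`, `build_chain`); the fake model simply echoes the prompt and appends a canned completion, on the assumption that the real pipeline returns prompt plus completion.

```python
def fake_retriever(question):
    """Hypothetical retriever: returns canned passages regardless of the question."""
    return ["Stuff the chicken.", "Roast until golden."]

def fake_llm(prompt):
    """Hypothetical LLM: echoes the prompt and appends a canned completion."""
    has_context = not prompt.startswith("Context: \n")
    completion = " Grounded in passages." if has_context else " No passages given."
    return prompt + completion

def build_chain(get_context):
    """Compose fan-out -> prompt -> LLM -> answer extraction."""
    def chain(question):
        # Step 1: the dict fans the same input out to both keys.
        inputs = {"question": question, "context": get_context(question)}
        # Step 2: fill a (shortened) prompt template.
        prompt = "Context: {context}\nQuestion: {question}\nAnswer:".format(**inputs)
        # Step 3: call the model, which returns prompt plus completion.
        output = fake_llm(prompt)
        # Step 4: keep only the text after the first "Answer:" marker.
        return output.split("Answer:", 1)[1].strip()
    return chain

rag_like = build_chain(lambda q: "\n\n".join(fake_retriever(q)))
simple_like = build_chain(lambda q: "")  # empty context, like the no-retrieval chain

rag_answer = rag_like("How do I roast it?")
simple_answer = simple_like("How do I roast it?")
print(rag_answer)
print(simple_answer)
```

The only difference between the two chains is how the `context` value is produced, which is what makes the side-by-side comparison meaningful.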

In [None]:
question = "Can you explain me how to prepare an Arnold’s roast chicken, following Yotam Ottolenghi's recipe?"

In [None]:
print('Response without RAG')
print(simple_chain.invoke(question))

Response without RAG

    Introduction:
    Yotam Ottolenghi's roast chicken recipe is a classic example of Middle Eastern-inspired flavors and simplicity. This recipe is a staple in many Israeli and Middle Eastern households, and its unique blend of spices and herbs will elevate your roast chicken to a new level.

    Ingredients:
    For the chicken:
    1 x 1.5-2 kg whole chicken
    2 tbsp olive oil
    2 cloves garlic, peeled and minced
    1 tsp ground cumin
    1 tsp smoked paprika
    1 tsp salt
    1 tsp black pepper
    1 tsp sumac
    2 tbsp freshly squeezed lemon juice
    2 tbsp chopped fresh parsley

    For the potatoes:
    4-6 medium-sized potatoes, peeled and halved
    2 tbsp olive oil
    1 tsp salt
    1 tsp black pepper
    1 tsp sumac

    For the carrots:
    4-6 medium-sized carrots, peeled and sliced
    2 tbsp olive oil
    1 tsp salt
    1 tsp black pepper

    Instructions:
    1. Preheat the oven to 220°C (425°F).
    2. In a small bowl, mix together the o

In [15]:
print('Response with RAG')
print(rag_chain.invoke(question))

Response with RAG

    Introduction:
    Arnold’s roast chicken with caraway and cranberry stuffing is a delicious and flavorful dish that combines the richness of a roasted chicken with the tanginess of a caraway and cranberry stuffing. This recipe is inspired by Arnold Rogow, a family friend of Ixta Belfrage, who is often credited with testing and refining Yotam Ottolenghi's recipes. To prepare this dish, you will need to make the stuffing and prep the chicken up to 1 day ahead, then refrigerate and roast at the last minute.

    Ingredients:
    For the chicken:
    - 1 whole chicken (about 3 lb/1.4kg)
    - 5 tbsp/70g unsalted butter
    - 5 tsp caraway seeds, toasted and lightly crushed
    - 7 garlic cloves, crushed
    - 1 tbsp dark brown sugar
    - Salt
    - 1 whole chicken (about 3 lb/1.4kg)
    - 5–6 large celery stalks, cut into ½-inch/1cm dice (3 cups/300g)
    - 1 onion, cut into ½-inch/1cm dice (scant 1 cup/140g)
    - 3½ oz/100g dried cranberries
    - 3½ oz/100g ready

---

## *Conclusions*

This experiment, based on a single test query, illustrates the impact of Retrieval-Augmented Generation (RAG) on the quality and accuracy of responses.

### Key Findings:

1. **Limitations of the Model Without Retrieval**  
   - The model without retrieval is able to generate a seemingly acceptable answer. However, the content is generic and does not reference the specific recipe from the book.  
   - This suggests that while large language models can produce plausible text, they may not always provide factually correct or source-based answers.

2. **Advantages of Introducing Retrieval**  
   - By incorporating a retrieval step, the model gains access to the specific cookbook stored in the vector database.  
   - As a result, the generated response directly references the actual recipe, including its title, ingredients, and quantities, rather than describing a generic roast chicken.  
   - This ensures that the model provides accurate instructions that align with the source material.

### Final Thoughts

This experiment highlights the importance of combining large language models with document retrieval for applications that require factual accuracy. While LLMs can generate fluent and structured text, retrieval-based augmentation ensures that responses are grounded in reliable information, making the system a more trustworthy and effective cooking assistant.



