# Retrieval Augmented Generation Demo

## Configuration: websites and questions

In [1]:
config = {
    "websites": [
        "https://www.101cookbooks.com/panade/",
        "https://www.101cookbooks.com/good-chana-masala-recipe/",
        "https://www.101cookbooks.com/tofu-scramble/",
    ],
    "questions": [
        "What are the top 5 ingredients in 'Chana Masala'?",
        "How to make a delicious 'Tofu Scramble'?",
        "Do you know how to make 'Paneer Tikka'?",
        "I have these items in my bucket: 'broccoli florets', 'cabbage', 'cauliflower florets', 'zucchini'. What's the dish that I can prepare?"
    ],
}

## Load the data

In [2]:
from langchain.document_loaders import WebBaseLoader

In [3]:
loader = WebBaseLoader(config["websites"])

In [4]:
docs = loader.load()

In [5]:
print(len(docs))

3


In [6]:
print(docs[0].metadata)

{'source': 'https://www.101cookbooks.com/panade/', 'title': 'A Rustic Scallion & Chive Panade - 101 Cookbooks', 'description': 'If you love hearty stuffings, bread soups, or savory bread puddings, this beautiful panade is for you.', 'language': 'en-US'}


## Prepare the data

In [7]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

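def split_docs(documents, chunk_size=1000, chunk_overlap=10):
    """Split documents into chunks of at most chunk_size characters.

    Neighboring chunks may share up to chunk_overlap characters.
    """
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
    return text_splitter.split_documents(documents)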

docs = split_docs(docs)
print(len(docs))

66
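
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of a sliding-window character splitter. It is a simplification and not how `RecursiveCharacterTextSplitter` is implemented: the real splitter also tries to break on separators such as paragraph and line boundaries. The helper name `naive_split` is hypothetical.

```python
def naive_split(text, chunk_size=1000, chunk_overlap=10):
    """Cut text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    Hypothetical helper for illustration only.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


print(naive_split("abcdefghij", chunk_size=4, chunk_overlap=1))
# prints ['abcd', 'defg', 'ghij']
```

Each chunk repeats the last character of the previous one, which is what the overlap setting is meant to achieve: text cut at a boundary still appears with some of its context in the next chunk.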


## Convert to embeddings

In [8]:
from langchain.embeddings import SentenceTransformerEmbeddings
embeddings = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")



## Index the data (embeddings)

In [9]:
from langchain.vectorstores import Chroma
db = Chroma.from_documents(docs, embeddings)
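
Conceptually, the embedding step maps each chunk to a vector, and the index returns the chunks whose vectors lie closest to the query vector. The sketch below illustrates that idea with hand-made three-dimensional vectors and cosine similarity. The metric, the helper names `cosine` and `top_k`, and the toy vectors are all assumptions for illustration; Chroma's actual distance function and index structure may differ.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=2):
    """Return the texts of the k entries most similar to query_vec.

    index is a list of (text, vector) pairs.
    """
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy "embeddings"; a real model such as all-MiniLM-L6-v2 produces far longer vectors.
toy_index = [
    ("chana masala spices", [0.9, 0.1, 0.0]),
    ("tofu scramble steps", [0.1, 0.9, 0.0]),
    ("panade with scallions", [0.0, 0.2, 0.9]),
]
query = [0.8, 0.3, 0.0]

print(top_k(query, toy_index, k=2))
# prints ['chana masala spices', 'tofu scramble steps']
```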

## LLM

In [11]:
from dotenv import load_dotenv

load_dotenv(override=True)

True
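
`load_dotenv` copies key-value pairs from a `.env` file into the process environment, and `ChatOpenAI` reads the `OPENAI_API_KEY` variable from there. A small guard like the one sketched below can fail early with a clear message when the key is missing. The helper `require_env` is hypothetical and is not part of `python-dotenv` or LangChain.

```python
import os


def require_env(name, env=None):
    """Return the value of an environment variable, or raise if it is unset or empty.

    env defaults to os.environ; passing a dict makes the helper easy to try out.
    """
    env = os.environ if env is None else env
    value = env.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value


# Demonstration with a stand-in mapping instead of the real environment.
print(require_env("OPENAI_API_KEY", {"OPENAI_API_KEY": "sk-placeholder"}))
```

In the notebook, this would be called as `require_env("OPENAI_API_KEY")` right after `load_dotenv(override=True)`.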

In [12]:
from langchain.chat_models import ChatOpenAI
model_name = "gpt-3.5-turbo"
llm = ChatOpenAI(model_name=model_name)

## Retrieval chain

In [13]:
from langchain.chains import RetrievalQA

retrieval_chain = RetrievalQA.from_chain_type(llm, chain_type="stuff", retriever=db.as_retriever())
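
The `"stuff"` chain type places all retrieved chunks into a single prompt and sends that prompt to the LLM in one call. A rough sketch of the idea follows. The helper `build_stuff_prompt` and the prompt wording are assumptions for illustration and do not reproduce LangChain's actual template.

```python
def build_stuff_prompt(question, retrieved_chunks):
    """Concatenate retrieved chunks into one context block and append the question."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_stuff_prompt(
    "What are the top 5 ingredients in 'Chana Masala'?",
    ["chunk one", "chunk two"],
)
print(prompt)
```

Because everything goes into one prompt, this approach only works while the retrieved chunks fit within the model's context window, which is one reason the documents are split into chunks beforehand.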

## Answer questions using chain

In [14]:
import time


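# Ask each configured question and print the chain's answer.
for q in config["questions"]:
    ans = retrieval_chain.run(q)
    print(ans)
    print("---")
    # Pause between calls (presumably to stay under API rate limits).
    time.sleep(5)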

The top 5 ingredients in Chana Masala are cayenne, serrano, chana masala powder, tomatoes, and mango powder.
---
To make a delicious tofu scramble, here's a basic recipe you can follow:

Ingredients:
- 1 block of firm tofu
- 1 tablespoon of olive oil
- 1 small onion, diced
- 2 cloves of garlic, minced
- 1 teaspoon of curry powder
- Salt and pepper to taste
- 2 cups of spinach (or any other seasonal vegetables you prefer)
- Optional toppings: nutritional yeast, hot sauce, avocado, salsa, etc.

Instructions:
1. Start by pressing the tofu to remove excess water. Wrap the tofu block in a clean kitchen towel or paper towels and place something heavy on top (like a plate or a book) for about 15-20 minutes.

2. In the meantime, heat the olive oil in a large skillet over medium heat. Add the diced onion and minced garlic, and sauté until they become translucent and fragrant.

3. Crumble the pressed tofu into the skillet with your hands or a fork. You can make the crumbles as large or small as 