<a href="https://colab.research.google.com/github/Haritha-Kotte/python-langchain-weaviate/blob/main/meal_planner_weaviate_vectorizer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Meal Planner

This notebook builds a retrieval-augmented generation (RAG) chain in the following steps:

1. Load a recipe dataset from Hugging Face.
2. Connect to Weaviate.
3. Create a collection in Weaviate with properties matching the dataset columns.
4. Ingest the data into the collection. Weaviate generates the embeddings through its Hugging Face vectorizer.
5. Create a retriever from the vector store.
6. Create a prompt from a template.
7. Build the RAG chain.
8. Invoke the RAG chain and get the output.
9. Close the connection to Weaviate.

## Prerequisites

Before running these steps, you need to:

- Create a Weaviate account and get the API key for your Weaviate cluster.
- Create a Hugging Face account and get an API key for embedding generation.
- Create an OpenAI account and get an API key for the LLM used in the generation step.




In [1]:
!pip install weaviate-client datasets


## Load Dataset

In [3]:
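from datasets import load_dataset

# Load the recipe dataset from the Hugging Face Hub (returns a DatasetDict)
dataset = load_dataset("Shengtao/recipe")
# Keep only the first 100 rows of the train split to limit embedding API calls
recipes = dataset['train'].select(range(100))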

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



## Connect to Weaviate

Before connecting to Weaviate, add the following secrets in the Secrets panel of Google Colab:

* WEAVIATE_CLUSTER
* WEAVIATE_KEY
* HUGGINGFACE_NEW_APIKEY
* OPENAI_API_KEY (used later by the LLM)
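The notebook reads these secrets with `google.colab.userdata.get`, which only works inside Colab. To run the same code elsewhere, a minimal sketch could fall back to environment variables. It assumes you export the same names as environment variables, and `get_secret` is a hypothetical helper, not part of Colab or Weaviate:

```python
import os


def get_secret(name: str) -> str:
    """Read a secret from Colab secrets inside Colab, or from the environment elsewhere.

    `get_secret` is a hypothetical helper for illustration only.
    """
    try:
        # Only importable inside Google Colab.
        from google.colab import userdata
        return userdata.get(name)
    except ImportError:
        pass  # Not running in Colab: fall back to environment variables.
    value = os.environ.get(name)
    if value is None:
        raise KeyError(f"Secret {name!r} is not set in Colab secrets or the environment")
    return value


# Usage (assumes the variable was exported beforehand):
# cluster_url = get_secret("WEAVIATE_CLUSTER")
```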

In [8]:
from google.colab import userdata
import weaviate
from weaviate.auth import Auth

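# Read the credentials from Colab secrets
cluster_url = userdata.get('WEAVIATE_CLUSTER')
auth_key = userdata.get('WEAVIATE_KEY')
huggingface_new_apikey = userdata.get('HUGGINGFACE_NEW_APIKEY')

# Connect to Weaviate Cloud. The Hugging Face key is passed as a header so that
# Weaviate's text2vec-huggingface module can call the Hugging Face API.
weaviate_client = weaviate.connect_to_weaviate_cloud(
    cluster_url=cluster_url,
    auth_credentials=Auth.api_key(auth_key),
    headers={"X-HuggingFace-Api-Key": huggingface_new_apikey},
    skip_init_checks=True,
)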

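## Create a collection in Weaviate

The following cell creates a Weaviate collection with one `TEXT` property per column of the recipe dataset. It deletes any existing `RecipeV4` collection first, so re-running the cell starts from an empty collection.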
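The comment in the next cell mentions `client.collections.create_from_dict()`. As a sketch of what an equivalent v3-style class definition might look like, each non-embedding column becomes a `text` property. This assumes the v3 schema keys `class`, `vectorizer`, `moduleConfig`, and `properties`. The helper `build_class_dict` and the column names other than `title` are hypothetical:

```python
from typing import Any


def build_class_dict(name: str, column_names: list[str], model: str) -> dict[str, Any]:
    """Build a v3-style class definition with one text property per dataset column."""
    return {
        "class": name,
        "description": "A collection to store recipes",
        # Let Weaviate's Hugging Face module vectorize objects at import time.
        "vectorizer": "text2vec-huggingface",
        "moduleConfig": {
            "text2vec-huggingface": {"model": model, "vectorizeClassName": True}
        },
        # Mirror the notebook: skip a precomputed "embedding" column if present.
        "properties": [
            {"name": col, "dataType": ["text"]}
            for col in column_names
            if col != "embedding"
        ],
    }


# Hypothetical column names, for illustration only.
schema = build_class_dict(
    "RecipeV4",
    ["title", "ingredients", "directions", "embedding"],
    "sentence-transformers/all-mpnet-base-v2",
)
print([p["name"] for p in schema["properties"]])  # ['title', 'ingredients', 'directions']
```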

In [11]:
import weaviate.classes.config as wvcc

properties = [
    wvcc.Property(name=col, data_type=wvcc.DataType.TEXT)
    for col in recipes.column_names
    if col != "embedding"  # skip a precomputed embedding column, if any
]

# Drop the collection left over from a previous run so the cell can be re-executed
if weaviate_client.collections.exists("RecipeV4"):
    weaviate_client.collections.delete("RecipeV4")
# Note that you can use `client.collections.create_from_dict()` to create a collection from a v3-client-style JSON object
collection = weaviate_client.collections.create(
    name="RecipeV4",
    description="A collection to store recipes",
    # Weaviate calls the Hugging Face API to vectorize each object on import
    vectorizer_config=wvcc.Configure.Vectorizer.text2vec_huggingface(
        model="sentence-transformers/all-mpnet-base-v2",
        vectorize_collection_name=True
    ),
    properties=properties
)

## Ingest data into Weaviate

Here, the data is ingested in rate-limited batches. Weaviate calls the Hugging Face API to generate an embedding for each object as it is uploaded, so limiting the request rate keeps us from hitting Hugging Face's rate limit.

After uploading the data, the cell closes the connection to Weaviate.
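The retry logic in the next cell handles two Hugging Face failure modes: 503 (model still loading) and 429 (rate limited). A dependency-free sketch of that pattern follows. Each retryable status code has its own wait, and any other error is re-raised. `call_with_retries` and `HTTPStatusError` are hypothetical stand-ins for the Weaviate call and exception, and the 20 s / 60 s waits mirror the notebook's sleeps:

```python
import time
from typing import Callable, Dict, Optional, TypeVar

T = TypeVar("T")


class HTTPStatusError(Exception):
    """Hypothetical stand-in for an exception that carries an HTTP status code."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


def call_with_retries(
    fn: Callable[[], T],
    max_retries: int = 5,
    waits: Optional[Dict[int, float]] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, sleeping and retrying on retryable status codes; re-raise anything else."""
    # Seconds to wait per retryable status: 503 = model still loading, 429 = rate limited.
    if waits is None:
        waits = {503: 20.0, 429: 60.0}
    for attempt in range(1, max_retries + 1):
        try:
            return fn()
        except HTTPStatusError as err:
            # Non-retryable codes, and failures on the last attempt, go to the caller.
            if err.status not in waits or attempt == max_retries:
                raise
            print(f"Attempt {attempt}: HTTP {err.status}, retrying in {waits[err.status]} s")
            sleep(waits[err.status])
    # Reached only when max_retries < 1, because the loop body never runs.
    raise ValueError("max_retries must be at least 1")


# Usage: a fake call that fails twice with 503 and then succeeds.
outcomes = [HTTPStatusError(503), HTTPStatusError(503), "ok"]


def flaky() -> str:
    result = outcomes.pop(0)
    if isinstance(result, Exception):
        raise result
    return result


print(call_with_retries(flaky, sleep=lambda seconds: None))  # ok
```

Passing `sleep` as a parameter keeps the helper testable without real waiting.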

In [12]:
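import time

try:
    # Send objects at a limited rate: each one triggers a Hugging Face embedding request
    with weaviate_client.batch.rate_limit(requests_per_minute=10) as batch:  # or <collection>.batch.rate_limit()

        for row in recipes:
            # Store every value as text to match the TEXT properties of the collection
            data_object = {col: str(row[col]) for col in recipes.column_names if col != "embedding"}
            max_retries = 5
            for attempt in range(max_retries):
                try:
                    # Queue the object; the client sends queued objects to Weaviate in batches
                    batch.add_object(properties=data_object, collection="RecipeV4")
                    print("Data uploaded successfully.")
                    break
                except weaviate.exceptions.UnexpectedStatusCodeException as e:
                    if '503' in str(e):
                        print(f"Attempt {attempt + 1}: Model is still loading, retrying...")
                        time.sleep(20)  # Wait and retry
                    elif '429' in str(e):
                        # Handle rate limit error with a fixed wait
                        retry_after = 60  # seconds
                        print(f"Rate limit exceeded. Retrying after {retry_after} seconds...")
                        time.sleep(retry_after)
                    else:
                        raise  # Raise if it's a different error

    # Server-side errors (e.g. failed vectorization) are typically collected per object rather than raised
    failed = weaviate_client.batch.failed_objects
    if failed:
        print(f"{len(failed)} objects failed to import. First error: {failed[0].message}")

finally:
    weaviate_client.close()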

Data uploaded successfully.

Next, install the packages needed for the RAG pipeline.

In [16]:
!pip install langchain_weaviate langchain_huggingface langchain_openai


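## Create the retriever from the vector store

Reconnect the Weaviate client (the ingestion cell closed it) and initialize the vector store with the embedding model used to embed queries. This must be the same model Weaviate used to vectorize the recipes (`sentence-transformers/all-mpnet-base-v2`); otherwise query vectors and stored vectors are not comparable.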
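At its core, the retriever embeds the query and looks for stored vectors that are close to it. The toy, stdlib-only sketch below shows that nearest-neighbour step with cosine similarity. The three-dimensional vectors and the cookie title are made up: `all-mpnet-base-v2` actually produces 768-dimensional vectors, and Weaviate searches an index instead of scanning every object. The default `k=4` mirrors LangChain's default number of results:

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vec: list[float], docs: dict[str, list[float]], k: int = 4) -> list[str]:
    """Return the k document keys whose vectors are most similar to query_vec."""
    ranked = sorted(docs, key=lambda key: cosine(query_vec, docs[key]), reverse=True)
    return ranked[:k]


# Made-up 3-d "embeddings"; a real model would compute these from the recipe text.
docs = {
    "Basic Mashed Potatoes": [0.9, 0.1, 0.0],
    "Roasted Potatoes and Onions": [0.8, 0.3, 0.1],
    "Chocolate Chip Cookies": [0.0, 0.2, 0.9],
}
print(top_k([1.0, 0.2, 0.0], docs, k=2))  # ['Basic Mashed Potatoes', 'Roasted Potatoes and Onions']
```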


In [15]:
from langchain_weaviate.vectorstores import WeaviateVectorStore
from langchain_huggingface import HuggingFaceEmbeddings

# Use the same model Weaviate used to vectorize the stored objects.
# Query embeddings are computed locally by sentence-transformers.
embedding_model_name = "sentence-transformers/all-mpnet-base-v2"
embeddings = HuggingFaceEmbeddings(
    model_name=embedding_model_name
)

# Reopen the connection that was closed after ingestion
weaviate_client.connect()
# Initialize the vector store; text_key="title" makes the recipe title the document text
vectorstore = WeaviateVectorStore(client=weaviate_client, index_name="RecipeV4", text_key="title", embedding=embeddings)
# Create the retriever to fetch relevant documents based on a query.
retriever = vectorstore.as_retriever()



## Create a prompt with the template

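The prompt template below instructs the model to answer the question using the retrieved context, to say so when it doesn't know the answer, and to format the answer as a detailed list for preparing the dishes and analyzing their nutrition.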
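`ChatPromptTemplate.from_template` treats `{question}` and `{context}` as placeholders. Its substitution can be approximated with plain `str.format`. This sketch uses a shortened copy of the template and a made-up context string. It is a simplification: the real template also wraps the result in a human chat message, as the printed repr shows:

```python
# Shortened copy of the notebook's template.
template = """Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know.
Question: {question}
Context: {context}
Answer:
"""

# Made-up retrieved context; in the notebook this comes from the retriever.
filled = template.format(
    question="Recipe with potatoes",
    context="Basic Mashed Potatoes: 2 pounds baking potatoes, 2 tablespoons butter, 1 cup milk",
)
print(filled)
```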

In [17]:
from langchain_core.prompts import ChatPromptTemplate

# Construct a template for the RAG prompt
template = """You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Show in a detailed information list format for the user to prepare the dishes and analyze the nutrition information of the dishes.
Question: {question}
Context: {context}
Answer:
"""
prompt = ChatPromptTemplate.from_template(template)
print(prompt)

input_variables=['context', 'question'] input_types={} partial_variables={} messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Show in a detailed information list format for the user to prepare the dishes and analyze the nutrition information of the dishes.\nQuestion: {question}\nContext: {context}\nAnswer:\n"), additional_kwargs={})]


## Build the RAG chain

Next, build a RAG chain that sends the query to the retriever, inserts the retrieved documents into the prompt as the context, and passes the filled-in prompt to the LLM to generate the answer.
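The LCEL pipeline in the next cell sends the same query down two branches (the retriever for `context`, `RunnablePassthrough` for `question`). It feeds the resulting dict to the prompt, the prompt to the LLM, and the LLM's reply to the string parser. The plain-Python sketch below traces that data flow. `Doc`, `fake_retriever`, `fake_llm`, and `format_docs` are hypothetical stand-ins for LangChain's `Document`, the Weaviate retriever, and the OpenAI model:

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Minimal stand-in for LangChain's Document (page_content + metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def fake_retriever(query: str) -> list[Doc]:
    """Hypothetical retriever: returns canned results regardless of the query."""
    return [Doc("Basic Mashed Potatoes", {"calories": "257.1"})]


def format_docs(docs: list[Doc]) -> str:
    """Render each document's text and metadata so fields outside page_content survive."""
    blocks = []
    for doc in docs:
        fields = "\n".join(f"{k}: {v}" for k, v in doc.metadata.items())
        blocks.append(f"{doc.page_content}\n{fields}")
    return "\n\n".join(blocks)


def fake_llm(prompt_text: str) -> str:
    """Hypothetical LLM: echoes the first context line instead of generating an answer."""
    context = prompt_text.split("Context: ", 1)[1]
    return "Suggested dish: " + context.splitlines()[0]


def rag_chain(query: str) -> str:
    # Step 1: build both inputs from the same query (LangChain may run these in parallel).
    inputs = {"context": format_docs(fake_retriever(query)), "question": query}
    # Step 2: fill the prompt template.
    prompt_text = "Question: {question}\nContext: {context}\nAnswer:\n".format(**inputs)
    # Step 3: call the model and return its text, which is StrOutputParser's role.
    return fake_llm(prompt_text)


print(rag_chain("Recipe with potatoes"))  # Suggested dish: Basic Mashed Potatoes
```

The notebook passes the retriever's list of documents straight into `{context}`, so the prompt receives their string representation. Rendering metadata matters in this setup because `text_key="title"` makes `page_content` only the recipe title; the other properties are expected to arrive as document metadata.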

In [22]:
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# Connect to OpenAI GPT Model
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0, api_key=userdata.get("OPENAI_API_KEY"))
# Build the RAG chain
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)



## Input the query

Enter your query here. The `# @param` annotation turns the variable into a form field in Colab.

In [25]:
query = "Recipe with potatoes" # @param {"type":"string","placeholder":"Enter your query"}


## Invoke RAG chain

Invoke the RAG chain with the query and print the result

In [26]:
output = rag_chain.invoke(query)
print(output)

### Recipe with Potatoes:

#### Basic Mashed Potatoes:
- **Ingredients:**
  - 2 pounds baking potatoes, peeled and quartered
  - 2 tablespoons butter
  - 1 cup milk
  - Salt and pepper to taste
- **Instructions:**
  1. Bring a large pot of salted water to a boil. Add potatoes and garlic, lower heat to medium, and simmer until potatoes are tender, 15 to 20 minutes.
  2. Heat milk and butter in a small saucepan over low heat until butter is melted.
  3. Drain potatoes and return to the pot. Slowly add warm milk mixture, blending it in with a potato masher or electric mixer until potatoes are smooth and creamy. Season with salt and pepper.
- **Nutrition Information:**
  - Calories: 257.1
  - Fat: 7.2g
  - Carbohydrates: 43.7g
  - Protein: 5.6g
  - Fiber: 3.7g
  - Cholesterol: 20.1mg
  - Sodium: 76.1mg
  - Potassium: 763.1mg
  - Vitamin C: 15.2mg
  - Calcium: 89.4mg
  - Iron: 0.7mg

#### Roasted Potatoes and Onions:
- **Ingredients:**
  - 2 pounds potatoes, sliced into 1/2-inch-thick piece

## Close the Weaviate connection

In [27]:
weaviate_client.close()