<a href="https://colab.research.google.com/github/Haritha-Kotte/python-langchain-weaviate/blob/main/meal_planner_weaviate_vectorizer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Meal Planner

In this notebook, we build a RAG chain in the following steps:

1. Load the dataset from Hugging Face
2. Connect to Weaviate
3. Create a collection in Weaviate with the dataset's properties
4. Ingest the data into the Weaviate collection, generating embeddings on upload
5. Create the retriever from the vector store
6. Create a prompt from a template
7. Build the RAG chain
8. Invoke the RAG chain and get the output
9. Close the connection to Weaviate

## Prerequisites

Before starting on the steps above, you need to:

- Create a Weaviate account and get the URL and API key for your Weaviate cluster
- Create a Hugging Face account and get an API key for embedding generation
- Create an OpenAI account and get an API key for the LLM used in the generation step




In [None]:
!pip install weaviate-client datasets

## Load Dataset

In [None]:
from datasets import load_dataset

df = load_dataset("Shengtao/recipe")
recipes = df['train']
recipes = recipes.select(range(100))

## Connect to Weaviate

Add the following credentials to Google Colab's secrets before connecting to Weaviate:

* WEAVIATE_CLUSTER
* WEAVIATE_KEY
* HUGGINGFACE_NEW_APIKEY

The RAG chain built later in this notebook also reads `OPENAI_API_KEY` from the same secrets.

In [8]:
from google.colab import userdata
import weaviate
from weaviate.auth import Auth

cluster_url = userdata.get('WEAVIATE_CLUSTER')
auth_key = userdata.get('WEAVIATE_KEY')
huggingface_new_apikey = userdata.get('HUGGINGFACE_NEW_APIKEY')

weaviate_client = weaviate.connect_to_weaviate_cloud(
    cluster_url=cluster_url,
    auth_credentials=Auth.api_key(auth_key),
    headers={"X-HuggingFace-Api-Key": huggingface_new_apikey},
    skip_init_checks=True,
)
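A missing or empty secret usually shows up only later, as a connection or authentication error. As a minimal sketch, the hypothetical helper `missing_secrets` below (not part of Colab or Weaviate) reports which names cannot be read. It treats a raised exception or an empty value as missing. A plain dict stands in for the secret store here; in the notebook you would pass `userdata.get` as the lookup instead.

```python
from typing import Callable, Iterable, List

# Secret names used by the connection cell above.
REQUIRED_SECRETS = ("WEAVIATE_CLUSTER", "WEAVIATE_KEY", "HUGGINGFACE_NEW_APIKEY")


def missing_secrets(lookup: Callable[[str], str],
                    names: Iterable[str] = REQUIRED_SECRETS) -> List[str]:
    """Return the names whose lookup fails or yields an empty value."""
    missing = []
    for name in names:
        try:
            value = lookup(name)
        except Exception:
            # The lookup raised, so treat the secret as unavailable.
            value = None
        if not value:
            missing.append(name)
    return missing


# Stand-in secret store for illustration only.
fake_store = {"WEAVIATE_CLUSTER": "https://example.invalid", "WEAVIATE_KEY": ""}
print(missing_secrets(fake_store.__getitem__))
```

Running a check like this before `connect_to_weaviate_cloud` lets you stop early with a clear message instead of debugging a failed connection.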

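## Create collection in Weaviate

The following code block creates a Weaviate collection with one text property per column of the recipe dataset. Any existing `RecipeV4` collection is deleted first, and the collection is configured to vectorize its objects with a Hugging Face model.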

In [11]:
import weaviate.classes.config as wvcc

properties = [wvcc.Property(
                name=col,
                data_type=wvcc.DataType.TEXT
            ) for col in recipes.column_names if col != "embedding"]

weaviate_client.collections.delete("RecipeV4")
# Note that you can use `client.collections.create_from_dict()` to create a collection from a v3-client-style JSON object
collection = weaviate_client.collections.create(
    name="RecipeV4",
    description="A collection to store recipes",
    vectorizer_config=wvcc.Configure.Vectorizer.text2vec_huggingface(
        model="sentence-transformers/all-mpnet-base-v2",
        vectorize_collection_name=True
    ),
    properties=properties
)

## Ingest embedding data to Weaviate

Here we ingest the data in rate-limited batches. Embeddings are generated through the Hugging Face API as the data is uploaded, so limiting the request rate helps us avoid hitting Hugging Face's rate limit.

After uploading the data, we close the connection to Weaviate.

In [None]:
import time

try:
    with weaviate_client.batch.rate_limit(requests_per_minute=10) as batch:  # or <collection>.batch.rate_limit()

        for index, row in enumerate(recipes):
            # Store every column as text, matching the TEXT properties of the collection
            data_object = {col: str(row[col]) for col in recipes.column_names}
            max_retries = 5
            for attempt in range(max_retries):
                try:
                    batch.add_object(properties=data_object, collection="RecipeV4")  # Add object to the collection
                    print(f"Row {index} added to the batch.")
                    break
                except weaviate.exceptions.UnexpectedStatusCodeException as e:
                    if '503' in str(e):
                        print(f"Attempt {attempt + 1}: Model is still loading, retrying...")
                        time.sleep(20)  # Wait and retry
                    elif '429' in str(e):
                        # Handle rate limit error
                        retry_after = 60  # Fixed wait in seconds; the Retry-After header is not read here
                        print(f"Rate limit exceeded. Retrying after {retry_after} seconds...")
                        time.sleep(retry_after)
                    else:
                        raise  # Raise if it's a different error

finally:
    weaviate_client.close()
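The retry logic in the cell above can be factored into a small reusable function. The following is a minimal sketch, not part of the Weaviate client: `call_with_retries` and `RETRY_WAITS` are hypothetical names, the wait times are the ones used above, and matching status codes as substrings of the error message is as crude as in the original cell. The `sleep` parameter is injectable so the example runs without actually waiting.

```python
import time
from typing import Callable, Dict, TypeVar

T = TypeVar("T")

# Wait time in seconds for each retryable status code (values from the cell above).
RETRY_WAITS: Dict[str, float] = {"503": 20, "429": 60}


def call_with_retries(action: Callable[[], T], max_retries: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `action`, retrying when the error message mentions a retryable code."""
    for attempt in range(1, max_retries + 1):
        try:
            return action()
        except Exception as exc:
            wait = next((w for code, w in RETRY_WAITS.items() if code in str(exc)), None)
            if wait is None or attempt == max_retries:
                raise  # Not retryable, or no attempts left
            sleep(wait)
    raise RuntimeError("max_retries must be at least 1")


# Illustration with a fake action that fails twice with a 503-style error.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("status code 503: model is loading")
    return "ok"

waits = []
result = call_with_retries(flaky, sleep=waits.append)  # record waits instead of sleeping
print(result, waits)
```

Unlike the inline loop, this version re-raises the last error once the attempts are used up, so a persistent failure is not silently skipped.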

Next, install the packages needed for the RAG pipeline.

In [None]:
!pip install langchain_weaviate langchain_huggingface langchain_openai

## Create the Retriever from the vectorstore

Reconnect the Weaviate client and initialize the vector store with the embedding model used for query embeddings. This is the same model the collection uses for the recipes, so query and document vectors are comparable.


In [None]:
from langchain_weaviate.vectorstores import WeaviateVectorStore
from langchain_huggingface import HuggingFaceEmbeddings

# Mention the embedding model name
embedding_model_name = "sentence-transformers/all-mpnet-base-v2"
embeddings = HuggingFaceEmbeddings(
    model_name=embedding_model_name
)

weaviate_client.connect()
# Initialise the vector store
vectorstore = WeaviateVectorStore(client=weaviate_client, index_name="RecipeV4", text_key="title", embedding=embeddings)
# Create the retriever to fetch relevant documents based on a query.
retriever = vectorstore.as_retriever()
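Conceptually, the retriever embeds the query and returns the stored objects whose vectors are closest to it. The sketch below illustrates that idea with an exhaustive cosine-similarity ranking over hand-made three-dimensional vectors. It is an illustration only: the vectors and the names `cosine_similarity`, `top_k`, and `toy_vectors` are made up, and a vector database such as Weaviate typically relies on a vector index rather than scanning every object.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float], doc_vecs: Dict[str, Sequence[float]],
          k: int = 2) -> List[str]:
    """Return the k document names most similar to the query vector."""
    ranked = sorted(doc_vecs,
                    key=lambda name: cosine_similarity(query_vec, doc_vecs[name]),
                    reverse=True)
    return ranked[:k]


# Made-up vectors; real embeddings have hundreds of dimensions.
toy_vectors = {
    "Mashed potatoes": [0.9, 0.1, 0.0],
    "Potato soup": [0.8, 0.3, 0.1],
    "Chocolate cake": [0.0, 0.2, 0.9],
}
toy_query = [1.0, 0.2, 0.0]
print(top_k(toy_query, toy_vectors))
```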

## Create a prompt with the template

Create a prompt template that instructs the model to answer questions using the retrieved context and to format the answer as the template describes.

In [None]:
from langchain_core.prompts import ChatPromptTemplate

# Construct a template for the RAG chain
template = """You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Show in a detailed information list format for the user to prepare the dishes and analyze the nutrition information of the dishes.
Question: {question}
Context: {context}
Answer:
"""
prompt = ChatPromptTemplate.from_template(template)
print(prompt)
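The `{question}` and `{context}` placeholders in the template are filled in when the chain runs. As a rough illustration using plain `str.format` rather than LangChain itself, the shortened `toy_template` below (a stand-in, not the template above) shows what the model receives once both values are substituted. The context string is a made-up example.

```python
# Shortened stand-in for the notebook's template, with the same two placeholders.
toy_template = "Question: {question}\nContext: {context}\nAnswer:"

filled = toy_template.format(
    question="Recipe with potatoes",
    context="[retrieved recipe text goes here]",  # made-up placeholder content
)
print(filled)
```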

## Build the RAG chain

We now build a RAG chain that retrieves documents for the query, passes them to the prompt as context, and sends the filled prompt to the LLM to generate the answer.

In [None]:
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# Connect to OpenAI GPT Model
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0, api_key=userdata.get("OPENAI_API_KEY"))
# Build the RAG chain
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
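To make the data flow of the chain easier to follow, here is a sketch of the same four steps written as plain Python functions. Everything in it is a stand-in: `fake_retriever` returns a canned document, `fake_llm` only echoes part of the prompt, and no LangChain, Weaviate, or OpenAI call is made.

```python
from typing import List


def fake_retriever(question: str) -> List[str]:
    # Stand-in for `retriever`: returns canned "documents" for any question.
    return ["Mashed potatoes: boil, mash, season."]


def fill_prompt(inputs: dict) -> str:
    # Stand-in for `prompt`: substitutes the question and the context.
    return f"Question: {inputs['question']}\nContext: {inputs['context']}\nAnswer:"


def fake_llm(prompt_text: str) -> str:
    # Stand-in for the chat model: echoes the context line of the prompt.
    context_line = prompt_text.splitlines()[1]
    return f"Based on {context_line}"


def toy_rag_chain(question: str) -> str:
    # Step 1: the dict step sends the same input to both branches;
    # the retriever produces the context, the passthrough keeps the question.
    inputs = {"context": fake_retriever(question), "question": question}
    # Steps 2 to 4: fill the prompt, call the model, return a plain string.
    return fake_llm(fill_prompt(inputs)).strip()


answer = toy_rag_chain("Recipe with potatoes")
print(answer)
```

The point to notice is that the query string is used twice: once to retrieve the context and once, unchanged, as the question in the prompt.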


## Input the query

Enter your query here.

In [25]:
query = "Recipe with potatoes" # @param {"type":"string","placeholder":"Enter your query"}


## Invoke RAG chain

Invoke the RAG chain with the query and print the result

In [None]:
output = rag_chain.invoke(query)
print(output)

## Close Weaviate connection

In [27]:
weaviate_client.close()