RAG implementation using a FAISS vector store, a Llama model, and Hugging Face embeddings

Ingestion -> Retrieval -> Generation

In [1]:
%pip install -U langchain faiss-cpu langchain-community langchain-ollama langchain-huggingface sentence-transformers

Note: you may need to restart the kernel to use updated packages.


In [2]:
# Loaders and vector stores live in the langchain-community package
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS

In [3]:
with open('food.txt', 'r') as file:
    text = file.read()
    print(text)


Food Item: Apple
Category: Fruit
Description: Apples are sweet, edible fruits produced by an apple tree. They are rich in fiber and vitamin C.
Nutritional Info (per 100g): Calories: 52, Carbs: 14g, Fiber: 2.4g, Sugar: 10g, Protein: 0.3g

Food Item: Broccoli
Category: Vegetable
Description: Broccoli is a cruciferous vegetable known for its high levels of vitamin C, K, and fiber. Often eaten steamed or raw.
Nutritional Info (per 100g): Calories: 34, Carbs: 7g, Fiber: 2.6g, Protein: 2.8g

Food Item: Chicken Breast
Category: Protein
Description: A lean cut of chicken meat, commonly used in healthy diets. It is a great source of protein and low in fat.
Nutritional Info (per 100g): Calories: 165, Protein: 31g, Fat: 3.6g, Carbs: 0g

Food Item: Almonds
Category: Nuts
Description: Almonds are nutrient-dense nuts packed with healthy fats, vitamin E, magnesium, and antioxidants.
Nutritional Info (per 100g): Calories: 579, Protein: 21g, Fat: 50g, Carbs: 22g, Fiber: 12g

Food Item: Brown Rice
Categ

In [None]:
# Load the data from text file
loader = TextLoader('food.txt', encoding='utf-8')
documents = loader.load()
print(documents)

[Document(metadata={'source': 'food.txt'}, page_content='Food Item: Apple\nCategory: Fruit\nDescription: Apples are sweet, edible fruits produced by an apple tree. They are rich in fiber and vitamin C.\nNutritional Info (per 100g): Calories: 52, Carbs: 14g, Fiber: 2.4g, Sugar: 10g, Protein: 0.3g\n\nFood Item: Broccoli\nCategory: Vegetable\nDescription: Broccoli is a cruciferous vegetable known for its high levels of vitamin C, K, and fiber. Often eaten steamed or raw.\nNutritional Info (per 100g): Calories: 34, Carbs: 7g, Fiber: 2.6g, Protein: 2.8g\n\nFood Item: Chicken Breast\nCategory: Protein\nDescription: A lean cut of chicken meat, commonly used in healthy diets. It is a great source of protein and low in fat.\nNutritional Info (per 100g): Calories: 165, Protein: 31g, Fat: 3.6g, Carbs: 0g\n\nFood Item: Almonds\nCategory: Nuts\nDescription: Almonds are nutrient-dense nuts packed with healthy fats, vitamin E, magnesium, and antioxidants.\nNutritional Info (per 100g): Calories: 579, 

In [None]:
# Split the data into chunks
from langchain_text_splitters import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=50)
text_chunks = text_splitter.split_documents(documents)
print(text_chunks[0].page_content)

Food Item: Apple
Category: Fruit
Description: Apples are sweet, edible fruits produced by an apple tree. They are rich in fiber and vitamin C.
Nutritional Info (per 100g): Calories: 52, Carbs: 14g, Fiber: 2.4g, Sugar: 10g, Protein: 0.3g
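
To make the effect of chunk_size and chunk_overlap concrete, here is a minimal sketch of a fixed-window splitter in plain Python. It is a simplification, not how RecursiveCharacterTextSplitter works internally: the real splitter prefers to break on separators such as blank lines and newlines, which is why the first chunk above ends cleanly at a record boundary. The function name split_with_overlap and the toy input are made up for illustration.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of at most chunk_size characters.

    Each window starts chunk_size - chunk_overlap characters after the
    previous one, so neighbouring windows share chunk_overlap characters.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# Toy input (hypothetical): 70 characters, split with a 30/5 setting
sample = "abcdefghij" * 7
chunks = split_with_overlap(sample, chunk_size=30, chunk_overlap=5)
for chunk in chunks:
    print(len(chunk), repr(chunk))
```

The overlap means the tail of one chunk is repeated at the head of the next, which helps when a fact straddles a chunk boundary.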


In [7]:
from langchain_core.prompts import ChatPromptTemplate

template="""You are an assistant for question-answering tasks.
Use the following pieces of retrieved context to answer the question.
If you don't know the answer, just say that you don't know.
Use ten sentences maximum and keep the answer concise.
Question: {question}
Context: {context}
Answer:
"""
prompt = ChatPromptTemplate.from_template(template)

In [None]:
# Embed the chunks and build the FAISS vector store
from langchain_huggingface import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

texts = [chunk.page_content for chunk in text_chunks]
vectorstore = FAISS.from_texts(texts, embeddings)
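
Conceptually, the vector store keeps one embedding per chunk and answers a query by returning the chunks whose vectors lie closest to the query vector. The sketch below shows that idea in plain Python, assuming Euclidean (L2) distance and an exhaustive scan. The 3-dimensional vectors are invented for illustration; all-MiniLM-L6-v2 produces 384-dimensional embeddings, and FAISS does this search far more efficiently.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query, vectors, k=2):
    """Return the labels of the k vectors closest to the query."""
    scored = sorted((l2_distance(query, vec), label) for label, vec in vectors.items())
    return [label for _, label in scored[:k]]


# Hypothetical toy embeddings, not real model output
toy_vectors = {
    "apple": [0.9, 0.1, 0.0],
    "broccoli": [0.7, 0.3, 0.1],
    "chicken breast": [0.0, 0.2, 0.9],
}
toy_query = [0.8, 0.2, 0.0]

print(top_k(toy_query, toy_vectors, k=2))
```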

In [9]:
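from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_ollama import OllamaLLM


# Retrieve the 2 most similar chunks for each question
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})

# Requires a local Ollama server with the llama3.1 model pulled
llm_model = OllamaLLM(
    model="llama3.1:latest",
    base_url="http://localhost:11434"
)


def format_docs(docs):
    # Join the retrieved chunks into plain text for the {context} slot
    return "\n\n".join(doc.page_content for doc in docs)


output_parser = StrOutputParser()
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm_model
    | output_parser
)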
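
The chain sends the question to the retriever, fills the prompt template with the question and the retrieved context, and passes the result to the model. A rough plain-Python sketch of that data flow, up to the point where the model would be called, might look like this. The names fake_retriever, format_chunks, and build_prompt are hypothetical stand-ins, the template is shortened, and the returned chunks are hard-coded rather than searched.

```python
TEMPLATE = (
    "Use the following pieces of retrieved context to answer the question.\n"
    "Question: {question}\n"
    "Context: {context}\n"
    "Answer:"
)


def fake_retriever(question):
    # Stand-in for the real retriever: always returns the same two chunks
    return [
        "Food Item: Almonds\nCategory: Nuts",
        "Food Item: Broccoli\nCategory: Vegetable",
    ]


def format_chunks(chunks):
    # Separate chunks with a blank line so the model can tell them apart
    return "\n\n".join(chunks)


def build_prompt(question):
    # Same shape as the chain input: a context string plus the untouched question
    context = format_chunks(fake_retriever(question))
    return TEMPLATE.format(question=question, context=context)


filled = build_prompt("suggest a fiber rich food?")
print(filled)
```

Printing the filled prompt like this is a cheap way to see exactly what text the model receives before debugging the answer itself.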



In [10]:
question = "suggest a fiber rich food?"
response = rag_chain.invoke(question)
print(f"Answer: {response}")

Answer: Almonds are a fiber-rich food option. According to the nutritional information provided, one ounce of almonds contains approximately 3.5 grams of fiber. This can contribute to daily dietary needs for healthy digestion and satiety.
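
One way to sanity-check an answer like this against the source data is to parse the fiber values directly and rank them. The sketch below is an assumption-laden helper, not part of the pipeline: it uses only the four records shown in full earlier in this notebook (with the Category and Description lines omitted), and the function name fiber_per_100g is made up.

```python
import re

# The four records printed in full above; other records in food.txt are not included
records = """Food Item: Apple
Nutritional Info (per 100g): Calories: 52, Carbs: 14g, Fiber: 2.4g, Sugar: 10g, Protein: 0.3g

Food Item: Broccoli
Nutritional Info (per 100g): Calories: 34, Carbs: 7g, Fiber: 2.6g, Protein: 2.8g

Food Item: Chicken Breast
Nutritional Info (per 100g): Calories: 165, Protein: 31g, Fat: 3.6g, Carbs: 0g

Food Item: Almonds
Nutritional Info (per 100g): Calories: 579, Protein: 21g, Fat: 50g, Carbs: 22g, Fiber: 12g
"""


def fiber_per_100g(text):
    """Map each food name to its fiber value, skipping records without one."""
    result = {}
    for block in text.strip().split("\n\n"):
        name = re.search(r"Food Item:\s*(.+)", block)
        fiber = re.search(r"Fiber:\s*([\d.]+)g", block)
        if name and fiber:
            result[name.group(1).strip()] = float(fiber.group(1))
    return result


ranked = sorted(fiber_per_100g(records).items(), key=lambda kv: kv[1], reverse=True)
print(ranked)
```

Among these four records, almonds have the highest fiber value per 100g, which agrees with the food the model picked.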


In [11]:
question = "Create a protein rich diet?"
response = rag_chain.invoke(question)
print(f"Answer: {response}")

Answer: To create a protein-rich diet, include foods high in protein such as chicken breast and yogurt. Chicken breast is a great source of protein with 31g per 100g. Yogurt is also rich in protein with 10g per 100g. Both are low in fat and calories, making them ideal for healthy diets. Eat chicken breast 2-3 times a week or as part of your main meals. Add yogurt to your breakfast or snack time for an extra protein boost. You can also include other protein-rich foods like fish, eggs, and legumes. Aim to consume at least 1g of protein per kilogram of body weight daily. This will help you meet your protein needs. Consult a healthcare professional or registered dietitian for personalized advice.
