# Data Ingestion

## Installing required dependencies

In [1]:
%pip install --quiet unstructured langchain
%pip install --quiet "unstructured[all-docs]"
%pip install --quiet langchain-community

Note: you may need to restart the kernel to use updated packages.


## Importing UnstructuredPDFLoader from LangChain

In [2]:
from langchain_community.document_loaders import UnstructuredPDFLoader

## Loading the data from a PDF file

In [3]:
local_path = "10k_recipes.pdf"

loader = UnstructuredPDFLoader(file_path=local_path)
data = loader.load()

# Text Chunking and Embedding

## Getting the nomic-embed-text model from Ollama for embedding

In [4]:
!ollama pull nomic-embed-text

pulling manifest
pulling 970aa74c0a90... 100% ▕████████████████▏ 274 MB
pulling c71d239df917... 100% ▕████████████████▏  11 KB
pulling ce4a164fc046... 100% ▕████████████████▏   17 B
pulling 31df23ea7daa... 100% ▕████████████████▏  420 B
verifying sha256 digest

## Checking that the model has been pulled

In [1]:
!ollama list

NAME                   	ID          	SIZE  	MODIFIED     
mistral:latest         	2ae6f6dd7a3d	4.1 GB	10 hours ago	
nomic-embed-text:latest	0a109f422b47	274 MB	10 hours ago	
llama2:latest          	78e26419b446	3.8 GB	3 days ago  	


## Installing required dependencies, chunking the text and creating the vector database with the embedded text

In [6]:
%pip install --q chromadb
%pip install --q langchain-text-splitters

Note: you may need to restart the kernel to use updated packages.


In [7]:
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma

In [8]:
# Split and chunk 
text_splitter = RecursiveCharacterTextSplitter(chunk_size=7500, chunk_overlap=100)
chunks = text_splitter.split_documents(data)
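
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal plain-Python sketch of fixed-size chunking with overlap. It is a simplification and an assumption for illustration only: `RecursiveCharacterTextSplitter` also tries to split on separators such as paragraph breaks, which this sketch does not do. The helper name `chunk_text` is hypothetical.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = chunk_text(sample, chunk_size=10, chunk_overlap=3)
print(pieces)
```

Each chunk repeats the last three characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk as long as it is shorter than the overlap.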

In [9]:
# Add to vector database
vector_db = Chroma.from_documents(
    documents=chunks,
    embedding=OllamaEmbeddings(model="nomic-embed-text", show_progress=True),
    collection_name="FoodGPT"
)

OllamaEmbeddings: 100%|█████████████████████| 1098/1098 [03:41<00:00,  4.95it/s]
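
Once the chunks are embedded, retrieval comes down to comparing a query vector with the stored vectors. The sketch below shows that idea with hand-written three-dimensional vectors and cosine similarity. Everything in it is assumed for illustration: the vectors are made up, `toy_store` and `top_k` are hypothetical names, and the metric and index that Chroma actually uses may differ.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up embeddings standing in for what an embedding model would return.
toy_store = {
    "pasta with tomato sauce": [0.9, 0.1, 0.0],
    "chocolate mousse": [0.1, 0.9, 0.1],
    "banana smoothie": [0.0, 0.3, 0.9],
}


def top_k(query_vector: list[float], store: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the k stored texts whose vectors are most similar to the query."""
    ranked = sorted(
        store,
        key=lambda text: cosine_similarity(query_vector, store[text]),
        reverse=True,
    )
    return ranked[:k]


query = [0.8, 0.2, 0.1]  # a made-up embedding of a pasta-related question
results = top_k(query, toy_store, k=2)
print(results)
```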


# Retrieval

## Importing dependencies for prompting and retrieving

In [10]:
from langchain.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_community.chat_models import ChatOllama
from langchain_core.runnables import RunnablePassthrough
from langchain.retrievers.multi_query import MultiQueryRetriever

## Getting the LLM from Ollama

In [11]:
# LLM from Ollama
local_model = "mistral"
llm = ChatOllama(model=local_model)

## Designing the prompt template to provide context to the LLM

In [12]:
QUERY_PROMPT = PromptTemplate(
    input_variables=["question"],
    template="""You are an AI language model assistant. Your task is to generate five
    different versions of the given user question to retrieve relevant documents from
    a vector database. By generating multiple perspectives on the user question, your
    goal is to help the user overcome some of the limitations of the distance-based
    similarity search. Provide these alternative questions separated by newlines.
    Original question: {question}""",
)
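
The prompt asks the model to return its alternative questions separated by newlines, so the reply has to be split back into individual queries. The sketch below shows roughly what that round trip looks like in plain Python. The shortened template, the helper names, and the model reply are all hypothetical; the real parsing is done inside `MultiQueryRetriever`.

```python
# Shortened stand-in for the query prompt; {question} is the only variable.
QUERY_TEMPLATE = (
    "Generate five different versions of the given user question.\n"
    "Provide these alternative questions separated by newlines.\n"
    "Original question: {question}"
)


def build_query_prompt(question: str) -> str:
    """Fill the template with the user's question."""
    return QUERY_TEMPLATE.format(question=question)


def parse_alternatives(llm_text: str) -> list[str]:
    """Split a newline-separated reply into non-empty, stripped queries."""
    return [line.strip() for line in llm_text.splitlines() if line.strip()]


filled = build_query_prompt("Can you suggest a vegetarian pasta recipe?")

# A made-up reply, used only to exercise the parser.
hypothetical_response = """What vegetarian pasta dishes can I cook?

Which pasta recipes contain no meat?
Are there meat-free pasta ideas?"""

alternatives = parse_alternatives(hypothetical_response)
print(alternatives)
```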

## Setting up the retriever

In [13]:
retriever = MultiQueryRetriever.from_llm(
    vector_db.as_retriever(), 
    llm,
    prompt=QUERY_PROMPT
)

# RAG prompt
template = """Answer the question based ONLY on the following context:
{context}
Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)
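
Because several rewritten queries are sent to the vector database, the same chunk can come back more than once. A multi-query retriever therefore merges the result lists and drops duplicates before the context reaches the RAG prompt. The sketch below illustrates that merge and the prompt filling with plain strings. The result lists and the `unique_union` helper are assumptions for illustration, not the library's code.

```python
def unique_union(result_lists: list[list[str]]) -> list[str]:
    """Merge result lists, keeping the first occurrence of each chunk."""
    seen = set()
    merged = []
    for results in result_lists:
        for chunk in results:
            if chunk not in seen:
                seen.add(chunk)
                merged.append(chunk)
    return merged


RAG_TEMPLATE = """Answer the question based ONLY on the following context:
{context}
Question: {question}
"""

# Made-up retrieval results for three rewritten queries.
per_query_results = [
    ["Pasta Primavera", "Veggie Lasagna"],
    ["Veggie Lasagna", "Spinach Ravioli"],
    ["Pasta Primavera"],
]

merged = unique_union(per_query_results)
rag_prompt = RAG_TEMPLATE.format(
    context="\n".join(merged),
    question="Can you suggest a vegetarian pasta recipe?",
)
print(rag_prompt)
```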

In [14]:
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
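
The `|` pipeline can be read as ordinary function composition: the question is fanned out to the retriever and to a passthrough, the resulting dictionary fills the prompt, the prompt goes to the model, and the parser extracts the text. The sketch below mimics that flow with stand-in functions. None of them are LangChain objects, and `fake_retriever` and `fake_llm` are hypothetical placeholders.

```python
def fake_retriever(question: str) -> list[str]:
    """Stand-in for the retriever: returns made-up context chunks."""
    return ["context for: " + question]


def fan_out(question: str) -> dict:
    """Mirror of the dict step: the same input feeds both keys."""
    return {"context": fake_retriever(question), "question": question}


def format_prompt(inputs: dict) -> str:
    """Stand-in for the prompt template."""
    context = "\n".join(inputs["context"])
    return f"Context: {context}\nQuestion: {inputs['question']}"


def fake_llm(prompt_text: str) -> dict:
    """Stand-in for the chat model: wraps its reply in a message-like dict."""
    return {"content": "echo: " + prompt_text}


def parse_output(message: dict) -> str:
    """Stand-in for the string output parser: keeps only the text."""
    return message["content"]


def run_chain(question: str) -> str:
    """Apply each step to the output of the previous one, left to right."""
    value = question
    for step in (fan_out, format_prompt, fake_llm, parse_output):
        value = step(value)
    return value


answer = run_chain("hi")
print(answer)
```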

## Pulling the large language model "mistral" from Ollama.ai

In [15]:
!ollama pull mistral

pulling manifest
pulling ff82381e2bea... 100% ▕████████████████▏ 4.1 GB
pulling 43070e2d4e53... 100% ▕████████████████▏  11 KB
pulling c43332387573... 100% ▕████████████████▏   67 B
pulling ed11eda7790d... 100% ▕████████████████▏   30 B
pulling 42347cd80dc8... 100% ▕████████████████▏  485 B
verifying sha256 digest

# Testing the RAG

## Prompt 1

In [16]:
final_output_1 = chain.invoke("Can you suggest a vegetarian pasta recipe?")
cleaned_output_1 = final_output_1.replace('\n', '')
cleaned_output_1 = cleaned_output_1.replace('-', ',')

# Print the cleaned output
print(cleaned_output_1)

OllamaEmbeddings: 100%|███████████████████████████| 1/1 [00:00<00:00,  2.71it/s]
OllamaEmbeddings: 100%|███████████████████████████| 1/1 [00:00<00:00, 66.48it/s]
OllamaEmbeddings: 100%|███████████████████████████| 1/1 [00:00<00:00, 71.76it/s]
OllamaEmbeddings: 100%|███████████████████████████| 1/1 [00:00<00:00, 80.78it/s]
OllamaEmbeddings: 100%|███████████████████████████| 1/1 [00:00<00:00, 79.76it/s]


 One of the recipes in your list is not a pasta dish, but I can suggest a simple and delicious Vegetarian Pasta Primavera Recipe for you. Here it is:Title: Vegetarian Pasta PrimaveraRecipe:, 12 oz (340 g) spaghetti or linguine pasta, 2 tbsp olive oil, 2 cups mixed chopped vegetables (such as bell peppers, zucchini, squash, broccoli, and carrots), 3 cloves garlic, minced, 1 cup cherry tomatoes, halved, Salt and pepper to taste, 1/2 cup vegetable broth, 1/4 cup grated Parmesan cheese, 2 tbsp chopped fresh basil or parsley (optional), Red pepper flakes for garnish (optional)Instructions:1. Cook the pasta according to package directions until al dente, then drain and set aside.2. In a large skillet, heat olive oil over medium heat. Add the chopped vegetables, minced garlic, salt, and pepper. Cook for about 5,7 minutes or until the vegetables are tender.3. Add cherry tomatoes and vegetable broth to the skillet. Reduce heat to low and simmer for another 5 minutes.4. Toss the cooked pasta wit
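
The output above runs words together and shows ranges such as "5,7 minutes" because the cleaning step deletes every newline and turns every hyphen into a comma. The sketch below compares that approach with a gentler alternative that joins non-empty lines with a space and leaves hyphens alone. The sample text and the `tidy_output` helper are hypothetical, and this is only one possible cleanup.

```python
def tidy_output(text: str) -> str:
    """Join non-empty lines with single spaces; hyphens are left untouched."""
    lines = [line.strip() for line in text.splitlines()]
    return " ".join(line for line in lines if line)


# Made-up model reply with a bulleted list and a hyphenated range.
sample = (
    "Title: Pasta Primavera\n\n"
    "- 12 oz spaghetti\n"
    "- 2 tbsp olive oil\n"
    "Cook for 5-7 minutes."
)

notebook_style = sample.replace("\n", "").replace("-", ",")
gentler = tidy_output(sample)

print(notebook_style)
print(gentler)
```

Printing the model reply unchanged is also an option when the line breaks of the recipe are worth keeping.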

## Prompt 2

In [17]:
final_output_2 = chain.invoke("How do I make a classic French dessert?")
cleaned_output_2 = final_output_2.replace('\n', '')
cleaned_output_2 = cleaned_output_2.replace('-', ',')

# Print the cleaned output
print(cleaned_output_2)

OllamaEmbeddings: 100%|███████████████████████████| 1/1 [00:00<00:00,  1.58it/s]
OllamaEmbeddings: 100%|███████████████████████████| 1/1 [00:00<00:00, 51.56it/s]
OllamaEmbeddings: 100%|███████████████████████████| 1/1 [00:00<00:00, 64.54it/s]
OllamaEmbeddings: 100%|███████████████████████████| 1/1 [00:00<00:00, 58.46it/s]
OllamaEmbeddings: 100%|███████████████████████████| 1/1 [00:00<00:00, 60.34it/s]


 To make a classic French dessert called Crème Brûlée, you will need the following ingredients and equipment:Ingredients:, 6 egg yolks, 1/2 cup granulated sugar, 3 cups heavy cream, 1 vanilla bean, split or 1 teaspoon of vanilla extract, Superfine or baker's sugar (for caramelizing the top)Equipment:, A large mixing bowl and a small one for straining, Whisk or electric mixer, 6 ramekins or oven,safe dishes, around 1 cup capacity each, Water bath or roasting pan with a rack, Kitchen torch or broilerInstructions:1. Preheat your oven to 325°F (160°C). Fill a large roasting pan or water bath halfway with hot water.2. In the large mixing bowl, whisk together egg yolks and granulated sugar until the mixture becomes pale yellow and smooth.3. In a saucepan over medium heat, combine heavy cream and vanilla bean (or extract). Heat the cream just until it begins to steam, but do not let it boil. Remove from heat.4. Gradually add the hot cream to the egg yolk mixture while continuously whisking. O

## Prompt 3

In [18]:
final_output_3 = chain.invoke("What's a quick and easy recipe for a weeknight dinner?")
cleaned_output_3 = final_output_3.replace('\n', '')
cleaned_output_3 = cleaned_output_3.replace('-', ',')

# Print the cleaned output
print(cleaned_output_3)

OllamaEmbeddings: 100%|███████████████████████████| 1/1 [00:00<00:00,  1.59it/s]
OllamaEmbeddings: 100%|███████████████████████████| 1/1 [00:00<00:00, 51.90it/s]
OllamaEmbeddings: 100%|███████████████████████████| 1/1 [00:00<00:00, 68.48it/s]
OllamaEmbeddings: 100%|███████████████████████████| 1/1 [00:00<00:00, 69.11it/s]
OllamaEmbeddings: 100%|███████████████████████████| 1/1 [00:00<00:00, 73.25it/s]


 One quick and easy recipe for a weeknight dinner is the "Delicious Casserole" which includes hamburger meat, onions or onion salt, raw potatoes, cream of mushroom soup, and vegetable beef soup. Brown the hamburger meat with onions or onion salt, slice raw potatoes into a casserole dish, top with browned meat, heat together the soups, pour over potatoes and meat, cover and bake for at least 1 hour and 15 minutes. This recipe usually does not need additional seasoning.


## Prompt 4

In [19]:
final_output_4 = chain.invoke("Do you have any healthy smoothie recipes for breakfast?")
cleaned_output_4 = final_output_4.replace('\n', '')
cleaned_output_4 = cleaned_output_4.replace('-', ',')

# Print the cleaned output
print(cleaned_output_4)

OllamaEmbeddings: 100%|███████████████████████████| 1/1 [00:00<00:00,  2.77it/s]
OllamaEmbeddings: 100%|███████████████████████████| 1/1 [00:00<00:00, 55.00it/s]
OllamaEmbeddings: 100%|███████████████████████████| 1/1 [00:00<00:00, 52.02it/s]
OllamaEmbeddings: 100%|███████████████████████████| 1/1 [00:00<00:00, 59.91it/s]
OllamaEmbeddings: 100%|███████████████████████████| 1/1 [00:00<00:00, 71.76it/s]


 Title: Green Smoothie BowlRecipe:2 cups Spinach, packed1 banana, frozen and sliced1 cup pineapple chunks, frozen1/2 avocado1 cup almond milk (or other preferred non,dairy milk)1 tablespoon chia seeds1 tablespoon flaxseed meal1 tablespoon honey or agave syrup (optional)1 cup ice cubesToppings: granola, fresh berries, sliced almonds, coconut flakes, and more!Instructions:1. Blend spinach, banana, pineapple, avocado, almond milk, chia seeds, flaxseed meal, honey or agave syrup (if using), and ice cubes in a blender until smooth.2. Pour the smoothie into a bowl.3. Top with granola, fresh berries, sliced almonds, coconut flakes, or any other desired toppings.4. Enjoy immediately!


## Prompt 5 (out of context)

In [20]:
final_output_5 = chain.invoke("What is the distance of earth from the moon")
cleaned_output_5 = final_output_5.replace('\n', '')
cleaned_output_5 = cleaned_output_5.replace('-', ',')

# Print the cleaned output
print(cleaned_output_5)

OllamaEmbeddings: 100%|███████████████████████████| 1/1 [00:00<00:00,  1.60it/s]
OllamaEmbeddings: 100%|███████████████████████████| 1/1 [00:00<00:00, 55.99it/s]
OllamaEmbeddings: 100%|███████████████████████████| 1/1 [00:00<00:00, 54.60it/s]
OllamaEmbeddings: 100%|███████████████████████████| 1/1 [00:00<00:00, 58.97it/s]
OllamaEmbeddings: 100%|███████████████████████████| 1/1 [00:00<00:00, 56.30it/s]


 The question you asked is not related to the provided recipes. However, the average distance between Earth and the Moon is approximately 238,900 miles (384,400 kilometers). This distance varies slightly due to the elliptical shape of the Moon's orbit around the Earth.
