<a href="https://colab.research.google.com/github/Prinella-cyber/FirstGit/blob/main/LangChain_Demo_TMLS_June_2023.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Chatting with our friend's plant based recipes

**Context**

- You've been inspired by a friend, and have decided to try eating plant based for a week.
- You really like experimenting in the kitchen, and want to try making everything from scratch this week. No packaged foods!
- You're not sure where to start. Normally you could ask your friend, but they're away right now so they can't help you.
- Your friend sends a text file of all their plant based recipes to you.

Download the text file here: https://drive.google.com/file/d/1xILOymt2HV-zxHd0Zft5olviH1btJEe2/view?usp=sharing

**The Challenge**

You want to use LangChain to build a chat application to:
1. Teach you how to cook plant based
2. Efficiently explore your friend's recipes.

**The Method**

You want to learn the following things:

1. Given an ingredient, what is the best plant based substitute?
- e.g., given milk, the best plant based substitute is almond milk.
2. Given that best plant based substitute, how can you actually make it at home? (Because you want to make everything from scratch!)
- e.g., given the plant based substitute almond milk, you can make it by blending together 1/4 cup of almonds with 1 cup of water.
3. Does your friend have any recipes that use this substitute?
- e.g., do any of the recipes in the file your friend sent you contain almond milk?

**Let's go!**

Let's see how we can use LangChain to solve this!

# 1. Given an ingredient, what is the best plant based substitute?

We don't want to know how to make it yet. We just want to know what the best substitute is. We don't want to overwhelm ourselves with information because we're new to this!

## Set up: Installation, imports, and API key

In [None]:
!pip install langchain

In [None]:
!pip install openai==0.26

In [None]:
import os

OPENAI_API_KEY = 'YOUR-OPEN-AI-API-KEY-HERE'
os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY

In [None]:
from langchain.llms import OpenAI
from langchain import PromptTemplate

## Getting familiar with the building blocks of LangChain

Remember what the building blocks are? PROMPTS and CHAINS.

### Prompts

What is a prompt? It's the text we send to a language model. A prompt template lets us query the model repeatedly for the same purpose without rewriting the prompt each time.


In [None]:
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI
from langchain import LLMChain

We know we will probably want to ask for the best plant based substitute for an ingredient again and again.

We could write "What's the best substitute for XYZ?" over and over. But with a prompt template, we can simply parameterize that 'XYZ'.

From the LangChain docs:

*A prompt template refers to a reproducible way to generate a prompt. It contains a text string (“the template”), that can take in a set of parameters from the end user and generate a prompt.*

*The prompt template may contain:*

*- instructions to the language model,*

*- a set of few shot examples to help the language model generate a better response,*

*- a question to the language model.*
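
Before bringing in LangChain, it can help to see the idea in plain Python. The sketch below is only an illustration of what a prompt template does: a string with a named placeholder that gets filled in per query. The names `SUBSTITUTE_TEMPLATE` and `build_prompt` are made up for this example and are not part of LangChain.

```python
# A plain-Python stand-in for a prompt template (illustrative only).
SUBSTITUTE_TEMPLATE = (
    "Convert the non-vegan ingredient to the single best vegan version.\n"
    "Ingredient: {ingredient}\n"
    "Answer:"
)


def build_prompt(ingredient: str) -> str:
    """Fill the {ingredient} placeholder to produce a finished prompt."""
    return SUBSTITUTE_TEMPLATE.format(ingredient=ingredient)


# The same template is reused for every ingredient we ask about.
for item in ["milk", "1 cup butter"]:
    print(build_prompt(item))
    print("---")
```

Only the ingredient changes between queries; the instructions stay fixed, which is the whole point of a template.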



In [None]:
# initialize our llm
llm = OpenAI(
    temperature=0.9,
)

# create prompt
# formatted for ease of readability, but not necessary
# add instructions to say "don't tell me how", and "use whole foods".
# Note that we don't f-string it in.
# Could also say "Don't give me multiple responses. Just one ingredient will do."
vegan_ingredient_template = """
  Convert the non-vegan ingredient to the single best vegan version.
  Don't tell me how to make it. Just tell me the ingredient.
  Use whole foods.

  Ingredient: {ingredient}

  Answer:
"""

# pass into PromptTemplate
vegan_ingredient_prompt = PromptTemplate(
    input_variables=["ingredient"],
    template=vegan_ingredient_template,
)
# and then pass the prompt into our first chain - we will talk more about chains later
vegan_ingredient = LLMChain(
    llm=llm,
    prompt=vegan_ingredient_prompt,
)

Great, so our first prompt has been created. Let's try asking the LLM some questions!

In [None]:
ingredient = "2 to 2.5 tablespoon milk"
print(vegan_ingredient.run(ingredient))

Unsweetened almond milk


Let's try a few more.

In [None]:
ingredient = "1 cup butter"
print(vegan_ingredient.run(ingredient))


1 cup coconut oil


In [None]:
ingredient = "1 pound ground beef"
print(vegan_ingredient.run(ingredient))


  Soy crumbles


Ok great, so at this point we understand a bit more about what the best plant based substitutes are for our ingredients.

And along the way, we have learned how to create and use a simple prompt template.

Next up: let's try actually learning how to make these substitutes at home!

# 2. Given that best plant based substitute, how can you actually make it at home? (Because you want to make everything from scratch!)

As before, let's create a prompt!

In [None]:
vegan_instructions_template = """How do I make this ingredient?

Ingredient: {ingredient}
Instructions: """

vegan_instructions_prompt = PromptTemplate(
    input_variables=["ingredient"],
    template=vegan_instructions_template,
)

vegan_instructions = LLMChain(
    llm=llm,
    prompt=vegan_instructions_prompt
)


Great, like before, we've created a simple prompt. Let's try running it!

In [None]:
ingredient = 'Unsweetened almond milk'
print(vegan_instructions.run(ingredient))



1. Combine 1 cup almonds with 4 cups of water in a high-speed blender.

2. Blend for 1-2 minutes until the almonds are fully broken down and the mixture is creamy.

3. Strain the mixture through a nut milk bag or cheesecloth, discarding the almond pulp.

4. Pour the strained almond milk into a mason jar or airtight container and store in the fridge for up to 4 days. Enjoy!


Hm, but actually, you're feeling a bit particular. You want the result to be just a single sentence.

So you give the LLM some instructions, plus a couple of worked examples, on how to construct the output. This is few shot learning!

With LangChain, it's very easy to do this through the use of prompts.
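
As a rough, LangChain-free sketch of the idea, a few shot prompt is just instructions, then a handful of worked examples, then the new input left open for the model to complete. The helper `build_few_shot_prompt` and the `EXAMPLES` list are hypothetical names for this illustration.

```python
# Hypothetical helper: assemble a few shot prompt from example pairs.
EXAMPLES = [
    ("tofu steak", "Marinate extra firm tofu in oil and spices, then grill."),
    ("tofu aioli", "Blend together tofu, lemon juice, salt, and pepper."),
]


def build_few_shot_prompt(ingredient: str) -> str:
    """Return instructions, worked examples, then the new ingredient."""
    lines = ["How do I make this ingredient? Answer in a single sentence.", ""]
    for number, (example_ingredient, instructions) in enumerate(EXAMPLES, start=1):
        lines.append(f"Example {number}:")
        lines.append(f"- Ingredient: '{example_ingredient}'")
        lines.append(f"- Instructions: '{instructions}'")
        lines.append("")
    # The prompt ends where the model should continue writing.
    lines.append(f"Ingredient: {ingredient}")
    lines.append("Instructions: ")
    return "\n".join(lines)


print(build_few_shot_prompt("unsweetened almond milk"))
```

The examples show the model the output shape we want (one sentence, no list) without our having to describe that shape exhaustively.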

In [None]:
vegan_method_template = """How do I make this ingredient?

Don't create a list. Just output the instructions in a single sentence.

Example 1:
- Ingredient: 'tofu steak'
- The question you understand this as is: 'how do I make tofu steak'?
- Instructions: 'Marinade extra firm tofu for 15 minutes in oil and spices and then grill for 10 minutes'.

Example 2:
- Ingredient: 'tofu aioli'
- The question you understand this as is: 'how do I make tofu aioli'?
- Instructions: 'Blend together tofu, lemon juice, salt, and pepper.'

Ingredient: {ingredient}
Instructions: """

vegan_method_prompt = PromptTemplate(
    input_variables=["ingredient"],
    template=vegan_method_template,
)

vegan_method = LLMChain(
    llm=llm,
    prompt=vegan_method_prompt,
)

ingredient = 'unsweetened almond milk'
print(vegan_method.run(ingredient))

 Blend together equal parts almonds and filtered water, then strain.


That looks better!

But wait, what I essentially did was make 2 calls to the language model:

1. get the vegan version of an ingredient, by running `vegan_ingredient.run(ingredient)` --> I passed in `milk`, and the LLM gave us `unsweetened almond milk`.
2. get the method for how to make that ingredient, by running `vegan_method.run(ingredient)`, where I passed in `unsweetened almond milk`.

I copied the output of step 1 and used it as an input to step 2.

So I made two calls to an LLM, where the output of the first is the input to the second.

There's a chain for that in LangChain! This is what I mean when I say LangChain helps make developing LLM applications easier.

## Chains

We're going to CHAIN the calls to the LLM together.

From the LangChain docs:

*Chains allow us to combine multiple components together to create a single, coherent application.*

*For example, we can create a chain that takes user input, formats it with a PromptTemplate, and then passes the formatted response to an LLM*.

This is exactly what we would like to do!


So we want to make two calls to a language model. We can do this using a SimpleSequentialChain.

In this series of chains, each individual chain has a single input and a single output, and the output of one step is used as input to the next.
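
To see what "the output of one step is used as input to the next" amounts to, here is a minimal offline sketch. It is not how LangChain implements its chains; `run_sequential`, `fake_substitute`, and `fake_method` are hypothetical names, and the two "steps" return canned strings instead of calling an LLM.

```python
from typing import Callable, List


# Stand-ins for LLM calls so the sketch runs offline; the returned strings
# are placeholders, not real model output.
def fake_substitute(ingredient: str) -> str:
    return f"vegan substitute for {ingredient}"


def fake_method(ingredient: str) -> str:
    return f"instructions for making {ingredient}"


def run_sequential(
    steps: List[Callable[[str], str]], text: str, verbose: bool = False
) -> str:
    """Feed `text` through each step, passing each output to the next step."""
    for step in steps:
        text = step(text)
        if verbose:
            # Show the intermediate result, which is handy for debugging.
            print(f"[{step.__name__}] {text}")
    return text


result = run_sequential([fake_substitute, fake_method], "milk", verbose=True)
print(result)
```

Each step takes exactly one string and returns exactly one string, which is what makes this kind of simple hand-off possible.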

In [None]:
from langchain.chains import SimpleSequentialChain

# instead of making 2 calls to our chains and manually passing in the output of `vegan_ingredient`
# as an input to `vegan_method`, we can just directly use a SimpleSequentialChain.
# SimpleSequentialChain will take care of passing the output of `vegan_ingredient` into `vegan_method` for us!
vegan_ingredient_and_method_chain = SimpleSequentialChain(
    chains=[
        vegan_ingredient,
        vegan_method,
    ],
    verbose=True
)


And that's it! It's that simple. We have already defined our chains, `vegan_ingredient` and `vegan_method`.

Now let's call our SimpleSequentialChain!

In [None]:
# Run the chain specifying only the input variable for the first chain.
vegan_ingredient_method = vegan_ingredient_and_method_chain.run(
    "milk"
    )
print(vegan_ingredient_method)



> Entering new SimpleSequentialChain chain...
  Plant-based milk (such as almond, oat, coconut, soy, etc.)
 Blend together water and plant-based milk of your choice in 1:1.5 ratio.

> Finished chain.
 Blend together water and plant-based milk of your choice in 1:1.5 ratio.

In [None]:
vegan_ingredient_method = vegan_ingredient_and_method_chain.run(
    "1 pound ground beef"
    )
print(vegan_ingredient_method)



> Entering new SimpleSequentialChain chain...
  1 pound of crumbled tempeh.
 Marinate the crumbled tempeh in a mixture of oil, soy sauce, garlic, and spices for at least 30 minutes before cooking.

> Finished chain.
 Marinate the crumbled tempeh in a mixture of oil, soy sauce, garlic, and spices for at least 30 minutes before cooking.


It's looking good! Let's revisit what we were aiming to achieve:

1. Given an ingredient, what is the best plant based substitute? **DONE!**
2. Given that best plant based substitute, how can you actually make it at home? (Because you want to make everything from scratch!) **DONE!**

But we still need to implement our third requirement:

3. Does your friend have any recipes that use this substitute?

But first, out of curiosity, let's see what it would look like if we were to ask the LLM to generate a recipe for us.

This consists of adding a third and final chain to our SimpleSequentialChain!

In [None]:
# Similar to before!
vegan_recipe_generator_prompt = """
Give me a vegan recipe that uses this ingredient.

Ingredient: {ingredient}
Recipe: """

vegan_recipe_generation_prompt = PromptTemplate(
    input_variables=["ingredient"],
    template=vegan_recipe_generator_prompt,
)

vegan_recipe_generator = LLMChain(
    llm=llm,
    prompt=vegan_recipe_generation_prompt,
)


In [None]:
vegan_recipe_generator_chain = SimpleSequentialChain(
    chains=[
        vegan_ingredient,
        vegan_method,
        vegan_recipe_generator,
    ],
    verbose=True
)


In [None]:
# Run the chain specifying only the input variable for the first chain.
vegan_recipe = vegan_recipe_generator_chain.run(
    "milk"
    )
print(vegan_recipe)



> Entering new SimpleSequentialChain chain...
  Plant-based milk (e.g. almond milk, coconut milk, oat milk, etc.)
 Blend together 1 part plant-based milk (such as almonds, coconut, oats, etc.) with 3 parts water in a blender until smooth.

Vegan Milky Oatmeal
Ingredients:

• 2 cups plant-based milk (almonds, coconut, oats, etc.) 
• 6 cups water
• 2 cups rolled oats
• 2 tablespoons maple syrup or agave nectar
• 1 teaspoon ground cinnamon
• ½ teaspoon sea salt
• 1 teaspoon vanilla extract
• 2 tablespoons melted coconut oil
• 1 cup mixed berries (fresh or frozen)
• 2 tablespoons chopped nuts (optional)

Instructions:

1. In a blender, blend together the plant-based milk and the water until smooth. 
2. In a large mixing bowl, combine the milk-water mixture, oats, maple syrup or agave, cinnamon, salt, and vanilla extract and stir until combined.
3. Heat the coconut oil in a non-stick skillet over medium-high heat.
4. Pour the oat 

Great! What's cool about this is that when you chain everything together with `verbose=True`, you can see the intermediate results, which is useful when you need to debug your application, for example.

Now, of course you could directly just make a similar call through a prompt to an LLM, like the below:

In [None]:
template = """
Convert the non-vegan ingredient to the single best vegan version.
Use whole foods.
Then give me the instructions on how to make it.
Then give me a vegan recipe that uses the vegan version of this ingredient.
Description: {ingredient}

Answer: """

vegan_recipe_prompt_one_shot = PromptTemplate(
    input_variables=["ingredient"],
    template=template,
)

description = "milk"

vegan_recipe_chain_one_shot = LLMChain(
    llm=llm,
    prompt=vegan_recipe_prompt_one_shot,
)
print(vegan_recipe_chain_one_shot.run(description))



Non-vegan ingredient: Milk
Best vegan version: Plant-based milk (e.g. almond, oat, soy, coconut, etc.)
Instructions on how to make it: 
1. Choose the type of plant-based milk you want to make.
2. Place one cup of raw nuts/seeds or rolled oats in a blender.
3. Add four cups of filtered water to the blender.
4. Blend the mixture on high for two minutes.
5. Strain the mixture through a cheesecloth or nut milk bag.
6. Store in the refrigerator for up to five days.

Vegan recipe using plant-based milk: Vegan Chocolate Chip Pancakes 
Ingredients: 
- 1 cup plant-based milk of choice 
- 2 tablespoons of apple cider vinegar 
- 2 cups of all-purpose flour 
- 2 tablespoons of baking powder 
- 1 teaspoon of salt 
- 2 teaspoons of sugar 
- 2 tablespoons of melted vegan butter 
- 1 cup of vegan chocolate chips 
Instructions:
1. In a medium-sized bowl, combine plant-based milk and apple cider vinegar. Let the mixture sit for 5 to 10 minutes


So you can see we can actually generate a recipe by using a single prompt. Why would we use a chain?

Well, remember, we want to actually browse through our friend's recipes in an efficient way. We don't want to use an LLM for ALL of it -- we want to stop part way through and search for recipes in our friend's messy text file to see if anything good exists.

So let's look at how we can connect to a dataset.

# Connect to a database
<!--
Our steps will be as follows:
1. get the vegan version of an ingredient
2. get a method for how to make that ingredient
3. get a recipe using that ingredient. -->


Let's remind ourselves of our goal. We want to learn about plant based eating, and we started at the ingredient level: *Given an ingredient, what is the best plant based substitute?*

We learned how to make that ingredient at home: *Given that best plant based substitute, how can you actually make it at home?*


And then we grabbed a random recipe generated from an LLM using that ingredient.

But we want to look through our friend's recipes! *Does your friend have any recipes that use this substitute?*

But, if your friend doesn't have a recipe that contains that ingredient, then let's generate something from the LLM.

So our flow will be as follows:

1. Get the vegan version of an ingredient (using vegan_ingredient chain)
2. Get a method for how to make that ingredient (using the `vegan_method` chain, or `vegan_ingredient_and_method_chain` to chain steps 1 and 2 together).
3. Get your friend's recipe using that vegan ingredient, but if your friend doesn't have a recipe with that ingredient, then generate a new recipe.

We will make queries to the LLM for steps 1 and 2, but for step 3, we'll query our own database using our friend's recipes, first. If no recipe shows up, then we'll make a call to the LLM.
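
The control flow for step 3 can be sketched without any LLM or database at all. Everything here is a stand-in: `FRIEND_RECIPES` is a made-up in-memory dictionary (the ingredient lists are illustrative), `fake_generate` replaces the LLM call, and a plain substring match replaces the real search.

```python
from typing import Callable, Dict, Tuple

# Hypothetical in-memory stand-in for the friend's recipe file.
FRIEND_RECIPES: Dict[str, str] = {
    "Oatmeal": "oats, flaxseed meal, almond milk, water, bananas",
    "Roasted Brussels Sprouts": "brussels sprouts, olive oil, salt",
}


def find_or_generate(
    ingredient: str,
    recipes: Dict[str, str],
    generate: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (source, recipe): the friend's recipe name if one mentions the
    ingredient, otherwise a freshly generated recipe."""
    needle = ingredient.strip().lower()
    for name, ingredients in recipes.items():
        if needle in ingredients.lower():
            return "friend", name
    # Nothing matched, so fall back to generation.
    return "generated", generate(ingredient)


def fake_generate(ingredient: str) -> str:
    # Placeholder for an LLM call.
    return f"New recipe featuring {ingredient.strip()}"


print(find_or_generate("  Almond milk", FRIEND_RECIPES, fake_generate))
print(find_or_generate("tempeh", FRIEND_RECIPES, fake_generate))
```

Returning the source alongside the recipe means the caller can tell the two cases apart without inspecting the recipe text itself.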

### Let's start by loading in our friend's recipes as a database.

In [None]:
! pip install chromadb
# ! pip install tiktoken


In [None]:
! pip install tiktoken



In [None]:
import pandas as pd
from google.colab import files

In [None]:
# Upload our friend's recipes
uploaded = files.upload()

Saving vegan_recipes.txt to vegan_recipes.txt


Great, now that we've loaded our friend's recipes into this notebook, we can get back to LangChain!

We'll load our friend's recipes into LangChain by using a TextLoader.

Once we load the recipes into LangChain, we will add them to a Vector Store.

Essentially, a vector store is a database optimized for storing documents together with their embeddings (numerical vector representations of the text). Vector stores work well when you want to input a query and get back the documents that are most similar to it.
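
To build some intuition for "most similar", here is a toy similarity search using only the standard library. It is a sketch under heavy assumptions: the `embed` function just counts words, whereas a real vector store uses learned embeddings, and the two documents (including their ingredient lists) are made up for illustration.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (real systems use learned vectors)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(query: str, documents: List[str]) -> str:
    """Return the stored document whose embedding is closest to the query's."""
    query_vec = embed(query)
    return max(documents, key=lambda doc: cosine(query_vec, embed(doc)))


docs = [
    "Oatmeal: oats, flaxseed meal, almond milk, water, bananas",
    "Braised tofu with vegetables: tofu, carrots, soy sauce",
]
print(most_similar("recipes with almond milk", docs))
```

Word counts only match exact words; learned embeddings are used in practice because they can also place related wording close together.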

In [None]:
from langchain.document_loaders import TextLoader
from langchain.indexes import VectorstoreIndexCreator

# load into LangChain and create a vector store
vegan_recipe_loader = TextLoader('vegan_recipes.txt')
# index the recipes
vegan_recipe_index = VectorstoreIndexCreator().from_loaders([
    vegan_recipe_loader
])


Now that our friend's recipes have been indexed (embedded) we can talk to the database!

So we will ask a question, which LangChain embeds into a vector. Then, under the hood, LangChain searches the vector store for the chunks whose embeddings are most similar to the query, and the LLM uses them to compose the answer.

In [None]:
question = input('Ask a question: ')
answer = vegan_recipe_index.query(question)
print(answer)

Ask a question: Which recipes do not use almond milk?
 The base and the cream layer recipes do not use almond milk.


Great, so now we know we can "talk" to the database, using natural language! Now we can get back to creating an application of sorts that does the following:

1. Get the vegan version of an ingredient
2. Get a method for how to make that ingredient
3. Get a friend's recipe for that ingredient, but if your friend doesn't have a recipe with that ingredient, then generate a new recipe.

Steps 1 and 2 we already did previously. Let's remind ourselves of what we have already done by looking at the chains themselves.

In [None]:
# Recall the first step: vegan_ingredient chain
vegan_ingredient

LLMChain(memory=None, callbacks=None, callback_manager=None, verbose=False, prompt=PromptTemplate(input_variables=['ingredient'], output_parser=None, partial_variables={}, template="\n  Convert the non-vegan ingredient to the single best vegan version.\n  Don't tell me how to make it. Just tell me the ingredient.\n  Use whole foods.\n  Don't give me multiple responses. Just one ingredient will do.\n\n  Ingredient: {ingredient}\n\n  Answer:\n", template_format='f-string', validate_template=True), llm=OpenAI(cache=None, verbose=False, callbacks=None, callback_manager=None, client=<class 'openai.api_resources.completion.Completion'>, model_name='text-davinci-003', temperature=0.9, max_tokens=256, top_p=1, frequency_penalty=0, presence_penalty=0, n=1, best_of=1, model_kwargs={}, openai_api_key='[REDACTED]', openai_api_base='', openai_organization='', openai_proxy='', batch_size=20, request_timeout=None, logit_bias={}, max_retries=6, streaming=False,

In [None]:
# Our second standalone chain
vegan_method

LLMChain(memory=None, callbacks=None, callback_manager=None, verbose=False, prompt=PromptTemplate(input_variables=['ingredient'], output_parser=None, partial_variables={}, template="How do I make this ingredient?\n\nDon't create a list. Just output the instructions in a single sentence.\n\nExample 1:\n- Ingredient: 'tofu steak'\n- The question you understand this as is: 'how do I make tofu steak'?\n- Instructions: 'Marinade extra firm tofu for 15 minutes in oil and spices and then grill for 10 minutes'.\n\nExample 2:\n- Ingredient: 'tofu aioli'\n- The question you understand this as is: 'how do I make tofu aioli'?\n- Instructions: 'Blend together tofu, lemon juice, salt, and pepper.'\n\nIngredient: {ingredient}\nInstructions: ", template_format='f-string', validate_template=True), llm=OpenAI(cache=None, verbose=False, callbacks=None, callback_manager=None, client=<class 'openai.api_resources.completion.Completion'>, model_name='text-davinci-003', temperature=0.9, max_tokens=256, top_p=

### Now let's see how we can combine `vegan_ingredient` and `vegan_method` with a call to the database.

In [None]:
INPUT_INGREDIENT = 'milk'

# 1. Get the vegan version of an ingredient using vegan_ingredient
vegan_ingredient_output = vegan_ingredient.run(INPUT_INGREDIENT)

# 2. Get a method for it using vegan_method
vegan_method_output = vegan_method.run(vegan_ingredient_output)

# 3. At the same time, we'll search our friend's database for a recipe using
# that vegan ingredient, now that we know how to make it from scratch!
database_search_prompt = f"""
Find a recipe in the database that contains {vegan_ingredient_output} as an ingredient.
Output the recipe.
If there is no such recipe, output 'Sorry! Your friend doesn't have a recipe using {vegan_ingredient_output}'.
"""

# We'll query the index, which is like a database
recipe = vegan_recipe_index.query(database_search_prompt)

# here's how we handle generating a new recipe, if your friend doesn't have
# a recipe that uses the vegan version of that query ingredient.
if 'Sorry!' not in recipe:
  print(f'Your friend has a recipe that uses the vegan version of {INPUT_INGREDIENT}! The vegan ingredient becomes {vegan_ingredient_output}, and the recipe is:')
  print(recipe)
  print(f'And here is how you make {vegan_ingredient_output} at home:')
  print(vegan_method_output)

else:
  print(f'Your friend does not have a recipe that uses the vegan version of {INPUT_INGREDIENT}! Let\'s generate one!')

  # llm = OpenAI(temperature=0.9)

  template = """Give me a vegan recipe that uses {vegan_ingredient_output} as an ingredient.
  Recipe: """

  prompt = PromptTemplate(
      input_variables=["vegan_ingredient_output"],
      template=template,
  )
  chain = LLMChain(
      llm=llm,
      prompt=prompt,
    )
  print(f'A recipe with the vegan version of {INPUT_INGREDIENT} does not exist! The vegan ingredient becomes {vegan_ingredient_output}. A new recipe is:')
  recipe = chain.run(vegan_ingredient_output)
  print('Here is our generated recipe:')
  print(recipe)
  print()
  print(f'And here is how you make {vegan_ingredient_output} at home:')
  print(vegan_method_output)

Your friend has a recipe that uses the vegan version of milk! The vegan ingredient becomes   Almond milk and the recipe is:

Oatmeal:

Ingredients:

1 1/2 cups of oats
1/4 cup of flaxseed meal
1 cup of almond milk
3/4 cup of water
2 bananas
1 tablespoon maple syrup
3 tablespoons peanut butter

Directions:

1. In a medium saucepan, combine the oats, flaxseed meal, almond milk, and water.

2. Bring to a boil over medium-high heat, stirring occasionally.

3. Reduce heat to low and simmer for 5 minutes, stirring occasionally.

4. Add the bananas, maple syrup, and peanut butter.

5. Simmer for an additional 5 minutes, stirring occasionally.

6. Serve warm and enjoy!
And here is how you make   Almond milk at home:
 Blend raw almonds and water together until a smooth consistency is achieved.


Anyway, this probably isn't the most efficient approach, but at least now you can see how and why you might want to use LangChain to search over a document!

# Building a Q&A Application

Great! Now we're able to talk to your friend's recipe database, just like your friend was here :')

We just developed a way for us to ask single questions to the database through querying it. But we aren't able to have a conversation. A conversation requires memory!

Note that the entry point in the flow we previously built was to input an ingredient, and we used Chains to obtain an output recipe.

But what if we want to just directly "chat" with your friend's database, without having to specify an input ingredient? What if we want to talk to it, using natural language?

## First, let's recall... we've actually done this already!

This is one such way to do it. LangChain has many possibilities! Another way would be using a `question_answering` chain.

In [None]:
question = input('Ask a question: ')
answer = vegan_recipe_index.query(question)
print(answer)

Ask a question: Which of our friend's recipes contain almond milk?
 The oatmeal recipe contains almond milk.


In [None]:
# ! pip install unstructured

In [None]:
# ! pip install pdf2image

Great, so now we asked a single question to the database. Now let's try to have a conversation! Will it work using our existing set up?

In [None]:
question = input('Ask a question: ')
answer = vegan_recipe_index.query(question)
print(answer)

Ask a question: What question did I just ask you? I forget.
 You asked what ingredients are needed for the Roasted Brussels Sprouts recipe.


As you can see, no, this won't work. We need to try something else.
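
Why does it fail? Each call to `query` only sees the current question, so there is nothing for the model to look back on. A rough sketch of the fix, assuming the simplest possible design, is to keep every turn and prepend the history to the next prompt. `BufferMemory` and `build_prompt_with_history` are hypothetical names for this illustration, not LangChain classes.

```python
from typing import List, Tuple


class BufferMemory:
    """Hypothetical minimal memory: keeps every (question, answer) turn."""

    def __init__(self) -> None:
        self.turns: List[Tuple[str, str]] = []

    def save(self, question: str, answer: str) -> None:
        self.turns.append((question, answer))

    def as_context(self) -> str:
        """Render the history as text that can be prepended to a new prompt."""
        return "\n".join(f"Human: {q}\nAI: {a}" for q, a in self.turns)


def build_prompt_with_history(memory: BufferMemory, question: str) -> str:
    """Combine prior turns with the new question so the model can see both."""
    history = memory.as_context()
    prefix = f"Chat history:\n{history}\n\n" if history else ""
    return f"{prefix}Question: {question}"


memory = BufferMemory()
memory.save("Which recipes contain almond milk?", "The oatmeal recipe.")
print(build_prompt_with_history(memory, "What question did I just ask you?"))
```

With the earlier turn included in the prompt text, the model has what it needs to answer "what did I just ask?". The trade-off is that the prompt grows with every turn.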

## Let's use a ConversationalRetrievalChain. This will let us actually incorporate memory, which is important in order to have a conversation!

In [None]:
# based off https://python.langchain.com/en/latest/modules/chains/index_examples/chat_vector_db.html
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.text_splitter import CharacterTextSplitter
from langchain.llms import OpenAI
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory

In [None]:
# 1. Load in our recipes.
recipes = vegan_recipe_loader.load()

Now that our recipes are loaded, we will split them into chunks. Remember, we loaded a single, long text document. If we embedded the entire document as one vector, we could only ever retrieve the whole thing, and we wouldn't be able to search within it!

So we split our recipe doc into chunks, and then create embeddings of these chunks. This will let us search over every part of the recipe doc.
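
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-width splitter. This is a sketch, not a reimplementation of LangChain's `CharacterTextSplitter` (which, as far as I understand, splits on a separator first and then merges pieces); `split_into_chunks` is a hypothetical helper.

```python
from typing import List


def split_into_chunks(text: str, chunk_size: int, chunk_overlap: int = 0) -> List[str]:
    """Cut `text` into windows of at most `chunk_size` characters.

    Consecutive windows share `chunk_overlap` characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


sample = "Oatmeal recipe. " * 5  # 80 characters of stand-in text
chunks = split_into_chunks(sample, chunk_size=32, chunk_overlap=0)
print(len(chunks), [len(chunk) for chunk in chunks])
```

With no overlap, a recipe that straddles a chunk boundary gets cut in two; a small overlap is one common way to soften that.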

In [None]:
# 2. Split into chunks of up to 1000 characters.
text_splitter = CharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=0,
    )
recipes = text_splitter.split_documents(recipes)




In [None]:
# 3. Create embeddings and add to an embedding store
# we use Chroma because it's popular. *shrug*

embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(
    recipes,
    embeddings,
)

In [None]:
# 4. Add in memory! We'll use ConversationBufferMemory which is the most basic kind of memory.
memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True,
)

In [None]:
# 5. Almost there... let's initialize our question answering LLM
vegan_qa_llm = ConversationalRetrievalChain.from_llm(
    OpenAI(temperature=0),
    vectorstore.as_retriever(),
    memory=memory,
)

And with that, we can chat with our friend's recipes!

In [None]:
query = "Which recipes in my friend's database contains tofu as an ingredient?"
result = vegan_qa_llm({"question": query})
print(result['answer'])

 The recipes for Tofu Pesto Aioli and Braised Tofu with Vegetables both contain tofu as an ingredient.


In [None]:
query = "Which recipes in my friend's database contains almond milk as an ingredient?"
result = vegan_qa_llm({"question": query})
print(result['answer'])

 I don't know.


In [None]:
query = "What was the first answer you gave me? Please remind me."
result = vegan_qa_llm({"question": query})
print(result['answer'])

 The two recipes that contain tofu as an ingredient are Tofu Pesto Aioli and Roasted Brussels Sprouts.


In [None]:
# total time: about 35 minutes