# NER using `DSPy`
In this notebook, we evaluate [DSPy](https://dspy-docs.vercel.app/) for *Named Entity Recognition* (NER) use cases with LLMs.

DSPy:
> DSPy is a framework for algorithmically **optimizing** LM **prompts** and **weights**, mainly when LMs are used one or more times within a pipeline

Some *prompt engineering* techniques that are useful for *NER* problems are:
* Function/tool calling
* Zero/few-shot examples with domain-specific instructions
* Chain-of-Thought (CoT) prompting
* Prompt chaining

We want to build a data extraction system using NER to extract food-related entities from online recipes. 

We are interested in the following entities:
* `FOOD`
* `QUANTITY`
* `UNIT`
* `PHYSICAL_QUALITY`
* `COLOR`

# 0. Set up the environment

In [115]:
import dspy
import nest_asyncio
from pydantic import BaseModel, Field
from dspy.functional import TypedPredictor
from IPython.display import Markdown, display
from typing import List, Optional, Union
from dotenv import load_dotenv
from devtools import pprint

Load environment variables (see `.env.example`)

In [2]:
assert load_dotenv(), "No environment variables were loaded from .env (see .env.example)"

Initialize the selected LLMs

In [3]:
gpt4 = dspy.OpenAI(model="gpt-4-turbo-preview", max_tokens=4096, model_type="chat")
gpt_turbo = dspy.OpenAI(model="gpt-3.5-turbo", max_tokens=4096, model_type="chat")

Configure `DSPy` to use GPT-4 (`gpt-4-turbo-preview`) as the default LM

In [551]:
dspy.settings.configure(lm=gpt4)

In [5]:
gpt_turbo("Why is 42 seen as the meaning of life?")

['The number 42 being seen as the meaning of life comes from Douglas Adams\' science fiction series "The Hitchhiker\'s Guide to the Galaxy." In the series, a supercomputer named Deep Thought is asked to find the answer to the ultimate question of life, the universe, and everything. After much calculation, Deep Thought reveals that the answer is simply the number 42, but the actual question is unknown.\n\nThe significance of 42 as the meaning of life has since become a popular cultural reference and meme, often used humorously to suggest that life\'s ultimate purpose is unknowable or absurd. It has also been interpreted as a commentary on the futility of seeking a single, definitive answer to the mysteries of existence.']

In [6]:
gpt4("Why is 42 seen as the meaning of life?")

['The notion that 42 is the "meaning of life" comes from Douglas Adams\' science fiction series, "The Hitchhiker\'s Guide to the Galaxy." In the story, a supercomputer named Deep Thought is asked to find the "Answer to the Ultimate Question of Life, the Universe, and Everything." After much computation—7.5 million years\' worth—Deep Thought reveals that the answer is, somewhat anticlimactically, 42.\n\nHowever, Deep Thought also points out that the problem is actually that nobody knows what the Ultimate Question is. Therefore, while 42 is the Answer, it doesn\'t make much sense without knowing the corresponding Question. This humorous and absurd take on the search for meaning in life has led to 42 becoming a cultural reference point and a symbol for an elusive or absurd answer to a complex or impossible question.\n\nDouglas Adams later explained that he chose the number 42 as the answer simply because it was a funny number, not intending any deeper meaning, which in itself became a par

# 1. Loading data

In [7]:
with open("./data/recipe.md", "r") as f:
    train_data = f.read()

In [8]:
display(Markdown(train_data))

### Chashu pork (for ramen and more)
Chashu pork is a classic way to prepare pork belly for Japanese dishes such as ramen. While it takes a little time, it's relatively hands off and easy, and the result is delicious.

### Prep Time
15 minutes

### Cook Time
2hours 5minutes

### Rest time (approx)
12 hours hrs

### Total Time
2 hours 20 minutes 

### Course
Main Course

###
Cuisine
Japanese 

### Servings
4 or more, depending how used 

### Calories: 1378kcal 

### Author: Caroline's Cooking 


### Ingredients
* 2 lb pork belly or a little more/less
* 2 green onions spring onions, or 3 if small
* 1 in fresh ginger (a chunk that will give around 4 - 6 slices)
* 2 cloves garlic
* ⅔ cup sake
* ⅔ cup soy sauce
* ¼ cup mirin
* ½ cup sugar
* 2 cups water or a little more as needed

US Customary - Metric

### Instructions
1. Have some string/kitchen twine ready and a pair of scissors before you prepare the pork. If needed, trim excess fat from the outside of the pork but you still want a later over the meat itself. Roll the piece of pork belly with the fatter end in the middle, and the flatter part making the roll around it. It may be a little tricky to hold together, but try your best as you tie it. Loop a piece of string around one end of the rolled pork and tie it in a double knot so that it holds the pork together. Then, taker the long end of the string and make additional loops around the pork, around ⅔in/2cm apart (or closer), looping through as you go and tightening to make a row of loops, linked together, along the roll. Then loop through over both ends so that it holds the ends in a little better as well and tie off and cut the string. (See pictures and video if you need a bit more guidance on how.)
   
2. Cut the piece of ginger into slices - there's no need to peel it. Peel and trim the ends from the garlic. Trim both ends off the green/spring onions and cut into long lengths.

3. Warm a skillet/frying pan large enough to hold the pork but not much bigger, ideally cast iron or at least heavy based. Then sear the pork on all sides to gently brown and seal the pork and fat, including the ends. Start with a fatty side so that some of the fat releases to help with the searing. Once browned all over, remove from skillet and set aside. You can discard the excess fat that has come out.

4. Meanwhile, place the sake, soy sauce, mirin, sugar, water and the prepared ginger, garlic and green onions in a pot. You want a pot that is just a little wider than the piece of rolled pork so that it will be mostly covered by the liquid once added. Place over a medium heat, covered, and bring to a simmer, stirring now and then to ensure sugar dissolves.

5. Once the sake-soy mixture is simmering, and you have seared the pork, add the pork into the liquid. The liquid should be at least ⅔ up the pork, if not a little more - you can add a little more water if needed. Cover with the lid and bring back to a simmer then reduce the heat so that it simmers gently. Cook for about two hours, turning roughly every 30 minutes so that each side is submerged half the time. If the liquid gets below halfway, then add a little more water.

6. After two hours, the pork will be cooked through and should be fairly tender. Turn off the heat and leave the pork to cool in the cooking liquid at least 10 minutes, but you can leave until room temperature. Once cool, remove the pork and set aside a minute. Strain the cooking liquid into a measuring jug to remove the ginger etc. Let it sit a few minutes so that the fat floats to the top then spoon off the fat and discard.

7. Place the pork inside a freezer bag, just a little bigger than the pork itself, and sit it in a dish in case of any spills. Then, pour some of the cooking liquid into the bag so that the pork is submerged. Carefully remove excess air from the bag, seal up and then place in the fridge overnight. Don't skip this step, as it helps the pork firm up, but also tenderize further and take in more flavor.

8. The next day, when ready to use, carefully take the pork out of the liquid and remove any fat that has solidified on it (that's not park of the pork belly itself). Cut off the string then cut the pork into relatively thin slices, so that you get coiled pieces of the pork belly.

9. To prepare it to top ramen, warm some of the cooking liquid in a skillet and then add the pork, a few slices at a time so that they form a single layer. Simmer a couple minutes, carefully turning as needed, so that the pork warms through and gently caramelizes in the liquid. For chashu fried rice, you will instead want to chop the pork further and then fry, along with the rice.

In [9]:
print(train_data[0:100])

### Chashu pork (for ramen and more)
Chashu pork is a classic way to prepare pork belly for Japanese


# 2. Define `data models`
We use `pydantic` to define the output schema of the data we want to extract with the `DSPy` program, so that we can use the [TypedPredictor](https://dspy-docs.vercel.app/docs/building-blocks/typed_predictors). We define two pairs of data models: `FoodMetaData`/`FoodMetaDatas` for the intermediate context (entity, value and reasoning), and `FoodEntity`/`FoodEntities` for the final entities.

In [135]:
class FoodMetaData(BaseModel):
    reasoning: str = Field(description="Reasoning for why the entity is correct")
    value: Union[str, int] = Field(description="Value of the entity")
    entity: str = Field(description="The actual entity i.e. pork, onions etc")

class FoodMetaDatas(BaseModel):
    context: List[FoodMetaData]

In [119]:
FoodMetaData.model_json_schema()

{'properties': {'reasoning': {'description': 'Reasoning for why the entity is correct',
   'title': 'Reasoning',
   'type': 'string'},
  'value': {'anyOf': [{'type': 'string'}, {'type': 'integer'}],
   'description': 'Value of the entity',
   'title': 'Value'},
  'entity': {'description': 'The actual entity i.e. pork, onions etc',
   'title': 'Entity',
   'type': 'string'}},
 'required': ['reasoning', 'value', 'entity'],
 'title': 'FoodMetaData',
 'type': 'object'}

In [120]:
FoodMetaDatas.model_json_schema()

{'$defs': {'FoodMetaData': {'properties': {'reasoning': {'description': 'Reasoning for why the entity is correct',
     'title': 'Reasoning',
     'type': 'string'},
    'value': {'anyOf': [{'type': 'string'}, {'type': 'integer'}],
     'description': 'Value of the entity',
     'title': 'Value'},
    'entity': {'description': 'The actual entity i.e. pork, onions etc',
     'title': 'Entity',
     'type': 'string'}},
   'required': ['reasoning', 'value', 'entity'],
   'title': 'FoodMetaData',
   'type': 'object'}},
 'properties': {'context': {'items': {'$ref': '#/$defs/FoodMetaData'},
   'title': 'Context',
   'type': 'array'}},
 'required': ['context'],
 'title': 'FoodMetaDatas',
 'type': 'object'}

In [565]:
class FoodEntity(BaseModel):
    food: str = Field(description="This can be both liquid and solid food such as meat, vegetables, alcohol, etc")
    quantity: int = Field(description="The exact quantity or amount of the food that should be used in the recipe")
    unit: str = Field(description="The unit being used e.g. grams, milliliters, pounds, etc")
    physical_quality: Optional[str] = Field(description="The characteristic of the ingredient")
    color: str = Field(description="The color of the food")

class FoodEntities(BaseModel):
    entities: List[FoodEntity]

The schemas above define our desired output format: a JSON object holding a list of one or more `FoodEntity` objects. The type annotations also act as simple validation checks (for example, `quantity` must be an integer and `physical_quality` may be `null`). This is mainly for demonstration purposes.
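Note that the recipe lists fractional amounts such as ⅔ cup sake and ½ cup sugar, which an `int` quantity cannot represent exactly. One option is a `float` quantity plus a small normalization step. Below is a minimal sketch of a hypothetical `parse_quantity` helper (not part of DSPy or pydantic), assuming quantities are written as whole or decimal numbers, ASCII fractions like `3/4`, or Unicode fraction characters like `⅔`:

```python
import unicodedata
from fractions import Fraction


def parse_quantity(text: str) -> float:
    """Hypothetical helper: turn '2', '⅔', '1 ½', '1½' or '3/4' into a float."""
    total = 0.0
    for token in text.split():
        last = token[-1]
        if not last.isascii() and unicodedata.numeric(last, None) is not None:
            # Unicode vulgar fraction, optionally preceded by a whole number ('1½')
            total += unicodedata.numeric(last)
            if token[:-1]:
                total += float(token[:-1])
        elif "/" in token:
            # ASCII fraction such as '3/4'
            total += float(Fraction(token))
        else:
            # Plain integer or decimal such as '2' or '1.5'
            total += float(token)
    return total


print(parse_quantity("⅔"), parse_quantity("1 ½"), parse_quantity("3/4"))
```

Ranges such as `"4 - 6"` are not handled and raise a `ValueError`, so a real pipeline would need a policy for them (e.g. keeping the lower bound).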

In [566]:
FoodEntity.model_json_schema()

{'properties': {'food': {'description': 'This can be both liquid and solid food such as meat, vegetables, alcohol, etc',
   'title': 'Food',
   'type': 'string'},
  'quantity': {'description': 'The exact quantity or amount of the food that should be used in the recipe',
   'title': 'Quantity',
   'type': 'integer'},
  'unit': {'description': 'The unit being used e.g. grams, milliliters, pounds, etc',
   'title': 'Unit',
   'type': 'string'},
  'physical_quality': {'anyOf': [{'type': 'string'}, {'type': 'null'}],
   'description': 'The characteristic of the ingredient',
   'title': 'Physical Quality'},
  'color': {'description': 'The color of the food',
   'title': 'Color',
   'type': 'string'}},
 'required': ['food', 'quantity', 'unit', 'physical_quality', 'color'],
 'title': 'FoodEntity',
 'type': 'object'}

In [567]:
FoodEntities.model_json_schema()

{'$defs': {'FoodEntity': {'properties': {'food': {'description': 'This can be both liquid and solid food such as meat, vegetables, alcohol, etc',
     'title': 'Food',
     'type': 'string'},
    'quantity': {'description': 'The exact quantity or amount of the food that should be used in the recipe',
     'title': 'Quantity',
     'type': 'integer'},
    'unit': {'description': 'The unit being used e.g. grams, milliliters, pounds, etc',
     'title': 'Unit',
     'type': 'string'},
    'physical_quality': {'anyOf': [{'type': 'string'}, {'type': 'null'}],
     'description': 'The characteristic of the ingredient',
     'title': 'Physical Quality'},
    'color': {'description': 'The color of the food',
     'title': 'Color',
     'type': 'string'}},
   'required': ['food', 'quantity', 'unit', 'physical_quality', 'color'],
   'title': 'FoodEntity',
   'type': 'object'}},
 'properties': {'entities': {'items': {'$ref': '#/$defs/FoodEntity'},
   'title': 'Entities',
   'type': 'array'}},
 'r

# 3. DSPy Program for NER
A `DSPy` program consists of `Signatures` and `Modules`. These can be optimized with `Teleprompters` and finally compiled by the `Compiler`, using data for your program.

A **signature** consists of three simple elements:

> 1. A minimal description of the sub-task the LM is supposed to solve.
> 2. A description of one or more input fields (e.g., input question) that we will give to the LM.
> 3. A description of one or more output fields (e.g., the question's answer) that we will expect from the LM.

In our case, we use a typed signature because we want the LLM output to follow our pydantic schema.

### 1. Create `Signature`
Looking into the `DSPy` source code, `InputField` and `OutputField` seem to be wrappers around the pydantic `Field` object. See more [here](https://github.com/stanfordnlp/dspy/blob/main/dspy/signatures/field.py#L29).

In [568]:
class RecipeToFoodContext(dspy.Signature):
    """You are a food AI assistant. Your task is to extract the entity, the value of the entity and the reasoning 
    for why the extracted value is the correct value. If you cannot extract the entity, add null"""
    recipe: str = dspy.InputField()
    context: FoodMetaDatas = dspy.OutputField()

In [569]:
class RecipeToFoodEntities(dspy.Signature):
    """You are a food AI assistant. Your task is to extract food-related metadata from recipes."""
    recipe: str = dspy.InputField()
    entities: FoodEntities = dspy.OutputField()

Let's try the `TypedPredictor` on a dummy example:

In [570]:
predictor_context = dspy.TypedPredictor(RecipeToFoodContext)

In [571]:
# Explicitly scope this call to GPT-4 (already the default LM)
with dspy.context(lm=gpt4):
    dummy_context = predictor_context(recipe="Ten grams of orange dutch cheese, 2 liters of water and 5 ml of ice")
    pprint(dummy_context.context)

FoodMetaDatas(
    context=[
        FoodMetaData(
            reasoning='The recipe specifies the quantity of orange dutch cheese as ten grams.',
            value='10 grams',
            entity='orange dutch cheese',
        ),
        FoodMetaData(
            reasoning='The recipe specifies the quantity of water as 2 liters.',
            value='2 liters',
            entity='water',
        ),
        FoodMetaData(
            reasoning='The recipe specifies the quantity of ice as 5 ml, indicating the volume of ice used.',
            value='5 ml',
            entity='ice',
        ),
    ],
)


In [572]:
gpt4.inspect_history(n=1)





You are a food AI assistant. Your task is to extract the entity, the value of the entity and the reasoning 
    for why the extracted value is the correct value. If you cannot extract the entity, add null

---

Follow the following format.

Recipe: ${recipe}
Context: ${context}. Respond with a single JSON object. JSON Schema: {"$defs": {"FoodMetaData": {"properties": {"reasoning": {"description": "Reasoning for why the entity is correct", "title": "Reasoning", "type": "string"}, "value": {"anyOf": [{"type": "string"}, {"type": "integer"}], "description": "Value of the entity", "title": "Value"}, "entity": {"description": "The actual entity i.e. pork, onions etc", "title": "Entity", "type": "string"}}, "required": ["reasoning", "value", "entity"], "title": "FoodMetaData", "type": "object"}}, "properties": {"context": {"items": {"$ref": "#/$defs/FoodMetaData"}, "title": "Context", "type": "array"}}, "required": ["context"], "title": "FoodMetaDatas", "type": "object"}

---

Recipe: Te

In [573]:
predictor = dspy.TypedPredictor(RecipeToFoodEntities)

In [574]:
# Explicitly scope this call to GPT-4 (already the default LM)
with dspy.context(lm=gpt4):
    dummy_recipe = predictor(recipe="Ten grams of orange dutch cheese, 2 liters of water and 5 ml of ice")
    pprint(dummy_recipe.entities)

FoodEntities(
    entities=[
        FoodEntity(
            food='dutch cheese',
            quantity=10,
            unit='grams',
            physical_quality=None,
            color='orange',
        ),
        FoodEntity(
            food='water',
            quantity=2,
            unit='liters',
            physical_quality=None,
            color='clear',
        ),
        FoodEntity(
            food='ice',
            quantity=5,
            unit='ml',
            physical_quality=None,
            color='clear',
        ),
    ],
)


Using `inspect_history`, we can take a look at the resulting prompt:

In [575]:
gpt4.inspect_history(n=1)





You are a food AI assistant. Your task is to extract food-related metadata from recipes.

---

Follow the following format.

Recipe: ${recipe}

Past Error (entities): An error to avoid in the future

Entities:
${entities}. Respond with a single JSON object. 
You MUST use this format: ```json
{
  "entities": [
    {
      "food": "chicken breast",
      "quantity": 2,
      "unit": "pieces",
      "physical_quality": "boneless",
      "color": "white"
    }
  ]
}
```
JSON Schema: {"$defs": {"FoodEntity": {"properties": {"food": {"description": "This can be both liquid and solid food such as meat, vegetables, alcohol, etc", "title": "Food", "type": "string"}, "quantity": {"description": "The exact quantity or amount of the food that should be used in the recipe", "title": "Quantity", "type": "integer"}, "unit": {"description": "The unit being used e.g. grams, milliliters, pounds, etc", "title": "Unit", "type": "string"}, "physical_quality": {"anyOf": [{"type": "string"}, {"type": "nu

### 2. Create the `Module`
A DSPy module is a building block for programs that use LMs.

> Each built-in module abstracts a prompting technique (like chain of thought or ReAct). Crucially, they are generalized to handle any DSPy Signature.

> Your `__init__` method declares the modules you will use.

> Your `forward` method expresses any computation you want to do with your modules.

#### Using `dspy.Module`

Let's define a helper function, `parse_context`, that formats the extracted context in a readable way for the LLM:

In [576]:
def parse_context(food_context: List[FoodMetaData]) -> str:
    """Format each extracted FoodMetaData as an 'entity:' label followed by indented JSON."""
    context_str = ""
    for context in food_context:
        context_str += f"{context.entity}:\n" + context.model_dump_json(indent=4) + "\n"
    return context_str
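To make the hand-off between the two stages concrete, here is a stdlib-only sketch of the same formatting idea. It assumes a plain dataclass (`FoodMetaDataSketch`, hypothetical) in place of the pydantic model, and `json.dumps(..., indent=4)` as a stand-in for `model_dump_json(indent=4)`, so whitespace details may differ slightly from the real helper:

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Union


@dataclass
class FoodMetaDataSketch:
    """Hypothetical stand-in for the pydantic FoodMetaData model."""
    reasoning: str
    value: Union[str, int]
    entity: str


def parse_context_sketch(food_context: List[FoodMetaDataSketch]) -> str:
    """Same idea as parse_context: an 'entity:' label followed by indented JSON."""
    return "".join(
        f"{item.entity}:\n" + json.dumps(asdict(item), indent=4) + "\n"
        for item in food_context
    )


example_context = [
    FoodMetaDataSketch(reasoning="Water is listed as 2 liters.", value="2 liters", entity="water"),
]
print(parse_context_sketch(example_context))
```

This string, not the original recipe, is what the second predictor receives as its `recipe` input.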

In [588]:
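class ExtractFoodEntities(dspy.Module):
    def __init__(self, temperature: float = 0, seed: int = 123):
        super().__init__()
        # Note: temperature and seed are stored but not passed to the LM in this module
        self.temperature = temperature
        self.seed = seed
        # Stage 1: extract (entity, value, reasoning) triples from the raw recipe
        self.extract_food_context = dspy.TypedPredictor(RecipeToFoodContext)
        # Chain-of-thought variant of stage 1 (defined but not used in forward)
        self.extract_food_context_cot = dspy.TypedChainOfThought(RecipeToFoodContext)
        # Stage 2: turn the extracted context into structured FoodEntities
        self.extract_food_entities = dspy.TypedPredictor(RecipeToFoodEntities)

    def forward(self, recipe: str) -> FoodEntities:
        # `.context` on the prediction is a FoodMetaDatas object
        food_context = self.extract_food_context(recipe=recipe).context
        # FoodMetaDatas.context is the list of FoodMetaData items
        parsed_context = parse_context(food_context.context)
        # Stage 2 only sees the formatted context, not the original recipe text
        food_entities = self.extract_food_entities(recipe=parsed_context)
        return food_entities.entities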

In [580]:
extract_food_entities = ExtractFoodEntities()

with dspy.context(lm=gpt4):
    entities = extract_food_entities(recipe="Ten grams of orange dutch cheese, 2 liters of water and 5 ml of ice")
    pprint(entities)

FoodEntities(
    entities=[
        FoodEntity(
            food='orange dutch cheese',
            quantity=10,
            unit='grams',
            physical_quality=None,
            color='orange',
        ),
        FoodEntity(
            food='water',
            quantity=2000,
            unit='milliliters',
            physical_quality=None,
            color='clear',
        ),
        FoodEntity(
            food='ice',
            quantity=5,
            unit='milliliters',
            physical_quality=None,
            color='clear',
        ),
    ],
)


Test on the recipe in `train_data`:

In [591]:
with dspy.context(lm=gpt4):
    entities = extract_food_entities(recipe=train_data)
    pprint(entities)

FoodEntities(
    entities=[
        FoodEntity(
            food='pork belly',
            quantity=2,
            unit='lb',
            physical_quality=None,
            color='',
        ),
        FoodEntity(
            food='green onions',
            quantity=2,
            unit='count',
            physical_quality='or 3 if small',
            color='green',
        ),
        FoodEntity(
            food='fresh ginger',
            quantity=1,
            unit='inch',
            physical_quality='chunk',
            color='',
        ),
        FoodEntity(
            food='garlic',
            quantity=2,
            unit='cloves',
            physical_quality=None,
            color='',
        ),
        FoodEntity(
            food='sake',
            quantity=158,
            unit='milliliters',
            physical_quality=None,
            color='',
        ),
        FoodEntity(
            food='soy sauce',
            quantity=158,
            unit='milliliters',


In [593]:
entities.entities

[FoodEntity(food='pork belly', quantity=2, unit='lb', physical_quality=None, color=''),
 FoodEntity(food='green onions', quantity=2, unit='count', physical_quality='or 3 if small', color='green'),
 FoodEntity(food='fresh ginger', quantity=1, unit='inch', physical_quality='chunk', color=''),
 FoodEntity(food='garlic', quantity=2, unit='cloves', physical_quality=None, color=''),
 FoodEntity(food='sake', quantity=158, unit='milliliters', physical_quality=None, color=''),
 FoodEntity(food='soy sauce', quantity=158, unit='milliliters', physical_quality=None, color='dark'),
 FoodEntity(food='mirin', quantity=59, unit='milliliters', physical_quality=None, color=''),
 FoodEntity(food='sugar', quantity=118, unit='milliliters', physical_quality=None, color='white'),
 FoodEntity(food='water', quantity=473, unit='milliliters', physical_quality='or a little more as needed', color='clear')]
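Interestingly, this run converted the recipe's cup measures into milliliters, even though the recipe itself lists cups. As a quick sanity check of that arithmetic, assuming the US customary cup of 236.5882365 ml (the cup size is our assumption; the recipe does not state it):

```python
US_CUP_ML = 236.5882365  # one US customary cup in milliliters


def cups_to_ml(cups: float) -> int:
    """Convert US cups to milliliters, rounded to the nearest whole milliliter."""
    return round(cups * US_CUP_ML)


# Cup quantities from the recipe's ingredient list
recipe_cups = {"sake": 2 / 3, "soy sauce": 2 / 3, "mirin": 1 / 4, "sugar": 1 / 2, "water": 2}
converted = {name: cups_to_ml(cups) for name, cups in recipe_cups.items()}
print(converted)
```

The rounded values match the quantities above (158, 59, 118 and 473 ml), so the conversion itself looks right. Whether the extraction step should convert units at all, rather than keep the recipe's own units, is a separate design choice.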

#### Using `FunctionalModule`

In [603]:
from dspy.functional import FunctionalModule, predictor, cot

class ExtractFoodEntitiesV2(FunctionalModule):
    def __init__(self, temperature: int = 0, seed: int = 123):
        super().__init__()
        self.temperature = temperature
        self.seed = seed

    @predictor
    def extract_food_context(self, recipe: str) -> FoodMetaDatas:
        """You are a food AI assistant. Your task is to extract the entity, the value of the entity and the reasoning 
        for why the extracted value is the correct value. If you cannot extract the entity, add null"""
        pass

    @cot
    def extract_food_context_cot(self, recipe: str) -> FoodMetaDatas:
        """You are a food AI assistant. Your task is to extract the entity, the value of the entity and the reasoning 
        for why the extracted value is the correct value. If you cannot extract the entity, add null"""
        pass
    
    @predictor
    def extract_food_entities(self, recipe: str) -> FoodEntities:
        """You are a food AI assistant. Your task is to extract food entities from a recipe."""
        pass
        
    def forward(self, recipe: str) -> FoodEntities:
        food_context = self.extract_food_context(recipe=recipe)
        parsed_context = parse_context(food_context.context)
        food_entities = self.extract_food_entities(recipe=parsed_context)
        return food_entities

In [604]:
extract_food_entities_v2 = ExtractFoodEntitiesV2()

with dspy.context(lm=gpt4):
    entities = extract_food_entities_v2(recipe="Ten grams of orange dutch cheese, 2 liters of water and 5 ml of ice")
    pprint(entities)

FoodEntities(
    entities=[
        FoodEntity(
            food='orange dutch cheese',
            quantity=10,
            unit='grams',
            physical_quality=None,
            color='orange',
        ),
        FoodEntity(
            food='water',
            quantity=2000,
            unit='milliliters',
            physical_quality=None,
            color='clear',
        ),
        FoodEntity(
            food='ice',
            quantity=5,
            unit='milliliters',
            physical_quality='solid',
            color='clear',
        ),
    ],
)


Test on the recipe in `train_data`:

In [605]:
with dspy.context(lm=gpt4):
    entities = extract_food_entities_v2(recipe=train_data)
    pprint(entities)

FoodEntities(
    entities=[
        FoodEntity(
            food='pork belly',
            quantity=2,
            unit='lb',
            physical_quality=None,
            color='',
        ),
        FoodEntity(
            food='green onions',
            quantity=2,
            unit='items',
            physical_quality='or 3 if small',
            color='',
        ),
        FoodEntity(
            food='fresh ginger',
            quantity=1,
            unit='inch',
            physical_quality='chunk',
            color='',
        ),
        FoodEntity(
            food='garlic',
            quantity=2,
            unit='cloves',
            physical_quality=None,
            color='',
        ),
        FoodEntity(
            food='sake',
            quantity=2,
            unit='⅔ cup',
            physical_quality=None,
            color='',
        ),
        FoodEntity(
            food='soy sauce',
            quantity=2,
            unit='⅔ cup',
            physical_

Below are some of the resulting `entities`:

In [607]:
pprint(entities)

FoodEntities(
    entities=[
        FoodEntity(
            food='pork belly',
            quantity=2,
            unit='lb',
            physical_quality=None,
            color='',
        ),
        FoodEntity(
            food='green onions',
            quantity=2,
            unit='items',
            physical_quality='or 3 if small',
            color='',
        ),
        FoodEntity(
            food='fresh ginger',
            quantity=1,
            unit='inch',
            physical_quality='chunk',
            color='',
        ),
        FoodEntity(
            food='garlic',
            quantity=2,
            unit='cloves',
            physical_quality=None,
            color='',
        ),
        FoodEntity(
            food='sake',
            quantity=2,
            unit='⅔ cup',
            physical_quality=None,
            color='',
        ),
        FoodEntity(
            food='soy sauce',
            quantity=2,
            unit='⅔ cup',
            physical_

Using `inspect_history`, we can look at the underlying prompt:

In [608]:
gpt4.inspect_history(n=1)





You are a food AI assistant. Your task is to extract food entities from a recipe.

---

Follow the following format.

Recipe: ${recipe}
Extract Food Entities: ${extract_food_entities}. Respond with a single JSON object. JSON Schema: {"$defs": {"FoodEntity": {"properties": {"food": {"description": "This can be both liquid and solid food such as meat, vegetables, alcohol, etc", "title": "Food", "type": "string"}, "quantity": {"description": "The exact quantity or amount of the food that should be used in the recipe", "title": "Quantity", "type": "integer"}, "unit": {"description": "The unit being used e.g. grams, milliliters, pounds, etc", "title": "Unit", "type": "string"}, "physical_quality": {"anyOf": [{"type": "string"}, {"type": "null"}], "description": "The characteristic of the ingredient", "title": "Physical Quality"}, "color": {"description": "The color of the food", "title": "Color", "type": "string"}}, "required": ["food", "quantity", "unit", "physical_quality", "color"], 

### 3. Optimize program using `teleprompter` and `compile` it

> For large LMs, this is primarily in the form of creating and validating good demonstrations for inclusion in your prompt(s).

**Teleprompters**:
> Teleprompters act as optimizers for DSPy programs. They take a metric and, together with the DSPy compiler, learn to bootstrap and select effective prompts for a DSPy program’s modules

**Compiler**:
> The DSPy compiler will internally trace your program and then optimize it using an optimizer (teleprompter) to maximize a given metric

In [609]:
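def validate_entities(example, pred, trace=None):
    """Exact-match metric: True only if the predicted entities equal the labelled ones."""
    # The module's forward() returns a FoodEntities object, while the labels are a
    # plain list of FoodEntity, so compare against the inner `entities` list
    predicted = pred.entities if isinstance(pred, FoodEntities) else pred
    return example.entities == predicted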

In [610]:
# Create some dummy data for training
trainset = [
    dspy.Example(
        recipe="French omelette with 2 eggs, 500grams of butter and 10 grams gruyere",
        entities=[
            FoodEntity(food="eggs", quantity=2, unit="", physical_quality="", color="white"),
            FoodEntity(food="butter", quantity=500, unit="grams", physical_quality="", color="yellow"),
            FoodEntity(food="cheese", quantity=10, unit="grams", physical_quality="gruyere", color="yellow"),
        ]
    ).with_inputs("recipe"),
    dspy.Example(
        recipe="200 grams of Ramen noodles in a bowl with one pickled egg, 500grams of pork, and 1 spring onion",
        entities=[
            FoodEntity(food="egg", quantity=1, unit="", physical_quality="pickled", color="ivory"),
            FoodEntity(food="ramen noodles", quantity=200, unit="grams", physical_quality="", color="yellow"),
            FoodEntity(food="pork", quantity=500, unit="grams", physical_quality="", color=""),
            FoodEntity(food="spring onion", quantity=1, unit="", physical_quality="", color="white"),
        ]
    ).with_inputs("recipe"),
    dspy.Example(
        recipe="10 grams of dutch orange cheese, 2 liters of water, and 5 ml of ice",
        entities=[
            FoodEntity(food="cheese", quantity=10, unit="grams", physical_quality="", color="orange"),
            FoodEntity(food="water", quantity=2, unit="liters", physical_quality="translucent", color=""),
            FoodEntity(food="ice", quantity=5, unit="milliliters", physical_quality="cold", color="white"),
        ]
    ).with_inputs("recipe"),
    dspy.Example(
        # Implicit string concatenation keeps the indentation out of the recipe text
        recipe=(
            "Pasta carbonara, 250 grams of pasta, 300 grams of pancetta, "
            "150 grams pecorino romano, 150grams parmesan cheese, 3 egg yolks"
        ),
        entities=[
            FoodEntity(food="pasta", quantity=250, unit="grams", physical_quality="dried", color="yellow"),
            FoodEntity(food="egg yolk", quantity=3, unit="", physical_quality="", color="orange"),
            FoodEntity(food="pancetta", quantity=300, unit="grams", physical_quality="pork", color=""),
            FoodEntity(food="pecorino", quantity=150, unit="grams", physical_quality="sheep cheese", color="yellow"),
            FoodEntity(food="parmesan", quantity=150, unit="grams", physical_quality="cheese", color="yellow"),
        ]
    ).with_inputs("recipe"),
    dspy.Example(
        recipe="American pancakes with 250g flour, 1 tsp baking powder, 1 gram salt, 10g sugar, 100ml fat milk",
        entities=[
            FoodEntity(food="flour", quantity=250, unit="grams", physical_quality="", color="white"),
            FoodEntity(food="baking powder", quantity=1, unit="tsp", physical_quality="", color="white"),
            FoodEntity(food="salt", quantity=1, unit="grams", physical_quality="salty", color="white"),
            FoodEntity(food="sugar", quantity=10, unit="grams", physical_quality="", color="white"),
            FoodEntity(food="milk", quantity=100, unit="milliliters", physical_quality="fat", color="white"),
        ]
    ).with_inputs("recipe"),
]

As the optimizer, we use `BootstrapFewShot`:
> BootstrapFewShot: Uses your program to self-generate complete demonstrations for every stage of your program. Will simply use the generated demonstrations (if they pass the metric) without any further optimization. Advanced: Supports using a teacher program (a different DSPy program that has compatible structure) and a teacher LM, for harder tasks.

And when to use this `optimizer`:
> If you have very little data, e.g. 10 examples of your task, use BootstrapFewShot.
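To build intuition for what "self-generate demonstrations and use them if they pass the metric" means, here is a toy, framework-free sketch of that loop. It is a simplified illustration, not DSPy's implementation: `bootstrap_demos`, `last_word` and the toy data are hypothetical.

```python
from typing import Callable, List, Tuple


def bootstrap_demos(
    program: Callable[[str], str],
    trainset: List[Tuple[str, str]],
    metric: Callable[[str, str], bool],
    max_demos: int = 4,
) -> List[Tuple[str, str]]:
    """Toy bootstrapping: keep (input, prediction) pairs that pass the metric."""
    demos = []
    for text, label in trainset:
        prediction = program(text)        # run the program on a training input
        if metric(label, prediction):     # keep the trace only if the metric accepts it
            demos.append((text, prediction))
        if len(demos) >= max_demos:       # stop once enough demonstrations are collected
            break
    return demos


def last_word(text: str) -> str:
    """Toy 'program': guess that the food is the last word of the input."""
    return text.split()[-1]


toy_trainset = [("2 eggs", "eggs"), ("500 grams butter", "butter"), ("1 spring onion", "spring onion")]
demos = bootstrap_demos(last_word, toy_trainset, metric=lambda label, pred: label == pred)
print(demos)  # the "spring onion" example fails the exact-match metric and is dropped
```

If the metric rejects every prediction, the list of demonstrations stays empty and the "optimized" program is no different from the original.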


In [616]:
from dspy.teleprompt import BootstrapFewShot

teleprompter = BootstrapFewShot(metric=validate_entities)

compiled_ner = teleprompter.compile(ExtractFoodEntitiesV2(), trainset=trainset)

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 155.87it/s]

Bootstrapped 0 full traces after 5 examples in round 0.
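Bootstrapping 0 full traces suggests that no training example passed the metric, so the compiled program gains no demonstrations, and its output below matches the earlier uncompiled run. Requiring every field of every entity to match exactly is a very strict bar for free-text fields such as `color` and `physical_quality`. A minimal sketch of a hypothetical softer metric, `food_f1`, which scores only the overlap of food names (F1); the 0.8 threshold and ignoring the other fields are assumptions. Following the pattern in the DSPy metric docs, it returns a float score when `trace is None` (evaluation) and a bool otherwise (compilation):

```python
from types import SimpleNamespace


def _food_names(entities) -> set:
    """Lower-cased, stripped food names from a list of FoodEntity-like objects."""
    return {e.food.strip().lower() for e in entities}


def food_f1(example, pred, trace=None):
    """Hypothetical partial-credit metric: F1 over extracted food names only."""
    gold = _food_names(example.entities)
    # `pred` may be a FoodEntities object (with `.entities`) or a plain list
    predicted = _food_names(getattr(pred, "entities", pred))
    if not gold and not predicted:
        score = 1.0
    elif not gold or not predicted:
        score = 0.0
    else:
        tp = len(gold & predicted)
        precision, recall = tp / len(predicted), tp / len(gold)
        score = 0.0 if tp == 0 else 2 * precision * recall / (precision + recall)
    # During compilation a trace is passed: return a bool so only good demos are kept
    return score >= 0.8 if trace is not None else score


# Usage with simple stand-in objects instead of dspy.Example / FoodEntities
ex = SimpleNamespace(entities=[SimpleNamespace(food=f) for f in ["eggs", "butter", "cheese"]])
pred = SimpleNamespace(entities=[SimpleNamespace(food=f) for f in ["Eggs ", "butter", "gruyere"]])
print(food_f1(ex, pred))
```

It could then be passed as `BootstrapFewShot(metric=food_f1)` in place of `validate_entities`.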





In [619]:
pprint(compiled_ner(recipe=train_data))

FoodEntities(
    entities=[
        FoodEntity(
            food='pork belly',
            quantity=2,
            unit='lb',
            physical_quality=None,
            color='',
        ),
        FoodEntity(
            food='green onions',
            quantity=2,
            unit='items',
            physical_quality='or 3 if small',
            color='',
        ),
        FoodEntity(
            food='fresh ginger',
            quantity=1,
            unit='inch',
            physical_quality='chunk',
            color='',
        ),
        FoodEntity(
            food='garlic',
            quantity=2,
            unit='cloves',
            physical_quality=None,
            color='',
        ),
        FoodEntity(
            food='sake',
            quantity=2,
            unit='⅔ cup',
            physical_quality=None,
            color='',
        ),
        FoodEntity(
            food='soy sauce',
            quantity=2,
            unit='⅔ cup',
            physical_

In [622]:
gpt4.inspect_history(n=1)





You are a food AI assistant. Your task is to extract food entities from a recipe.

---

Follow the following format.

Recipe: ${recipe}
Extract Food Entities: ${extract_food_entities}. Respond with a single JSON object. JSON Schema: {"$defs": {"FoodEntity": {"properties": {"food": {"description": "This can be both liquid and solid food such as meat, vegetables, alcohol, etc", "title": "Food", "type": "string"}, "quantity": {"description": "The exact quantity or amount of the food that should be used in the recipe", "title": "Quantity", "type": "integer"}, "unit": {"description": "The unit being used e.g. grams, milliliters, pounds, etc", "title": "Unit", "type": "string"}, "physical_quality": {"anyOf": [{"type": "string"}, {"type": "null"}], "description": "The characteristic of the ingredient", "title": "Physical Quality"}, "color": {"description": "The color of the food", "title": "Color", "type": "string"}}, "required": ["food", "quantity", "unit", "physical_quality", "color"], 