In [None]:
from langchain.chat_models import ChatOpenAI
from langchain.chains import create_extraction_chain, create_extraction_chain_pydantic
from langchain.pydantic_v1 import BaseModel

from typing import Optional

## Setting up the function-calling LLM

In [None]:
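# Read the API key from a local file; strip() removes the trailing newline
with open("openai_api.txt") as f:
    openai_api_key = f.read().strip()

functions_llm = ChatOpenAI(
    model="gpt-3.5-turbo-0613",
    temperature=0,
    openai_api_key=openai_api_key,
)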

## Use case

Getting structured output from raw LLM generations is hard.

For example, suppose you need the model output formatted with a specific schema for:
* `Extracting` a structured row to insert into a database
* `Extracting` API parameters
* `Extracting` different parts of a user query (e.g., for semantic vs keyword search)

There are two primary approaches for this:

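1. `Functions`: Some LLMs support function calling: given a JSON schema describing a function's arguments, the model returns arguments that fit it, which lets us extract arbitrary entities from text.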
2. `Parsing`: Output parsers are classes that structure LLM responses.


* Only some LLMs support functions (e.g., OpenAI's function-calling models), but where available they are more general than parsers.
* Parsers extract precisely what is enumerated in a provided schema (e.g., specific attributes of a person).
* Functions can infer things beyond a provided schema (e.g., attributes of a person that you did not ask for).
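
To make the contrast concrete, here is a minimal sketch of the parsing approach using only the standard library. It assumes the raw generation contains a single JSON object; `parse_with_schema` is a hypothetical helper, not a LangChain output parser. It keeps only the properties enumerated in the schema and rejects output that lacks a required key.

```python
import json
import re


def parse_with_schema(raw_text: str, schema: dict) -> dict:
    """Hypothetical parser: pull the {...} block out of raw LLM text,
    keep only the schema's properties, and enforce its required keys."""
    match = re.search(r"\{.*\}", raw_text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    data = json.loads(match.group(0))
    allowed = schema["properties"].keys()
    # Drop anything the schema did not enumerate
    parsed = {k: v for k, v in data.items() if k in allowed}
    missing = [k for k in schema.get("required", []) if k not in parsed]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    return parsed


demo_schema = {
    "properties": {"name": {"type": "string"}, "height": {"type": "integer"}},
    "required": ["name", "height"],
}
# Hand-written example text, not real model output
raw = 'Sure! Here is the data: {"name": "Alex", "height": 5, "mood": "happy"}'
print(parse_with_schema(raw, demo_schema))  # {'name': 'Alex', 'height': 5}
```

The unrequested `mood` key is silently dropped, which is the "precisely what is enumerated" behavior described above.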

## Quickstart

In [None]:
# Define Schema
schema = {
    "properties": {
        "name": {"type": "string"},
        "height": {"type": "integer"},
        "hair_color": {"type": "string"},
    },
    "required": ["name", "height"],
}

# Set input
inp = """Alex is 5 feet tall. Claudia is 1 foot taller than Alex and jumps higher than him. Claudia is a brunette and Alex is blonde."""

# Initialize chain
chain = create_extraction_chain(
    schema=schema,
    llm=functions_llm,
)

In [None]:
chain.run(inp)
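
The chain returns a list of dicts because the schema describes a single entity and the model is asked for an array of them. Below is a minimal sketch of that wrapping, assuming the chain builds an OpenAI-style function definition roughly like this; `build_extraction_function`, the function name, and its description are illustrative, not LangChain's exact internals.

```python
import json


def build_extraction_function(entity_schema: dict) -> dict:
    """Hypothetical helper: wrap an entity schema in an OpenAI-style
    function definition whose single argument is a list of entities."""
    # Give each property a title; treat a missing "required" as an empty list
    properties = {k: {"title": k, **v} for k, v in entity_schema["properties"].items()}
    item_schema = {
        "type": "object",
        "properties": properties,
        "required": entity_schema.get("required", []),
    }
    return {
        "name": "information_extraction",  # placeholder name
        "description": "Extracts the relevant information from the passage.",
        "parameters": {
            "type": "object",
            # One argument: an array, so several people can come back at once
            "properties": {"info": {"type": "array", "items": item_schema}},
            "required": ["info"],
        },
    }


quickstart_schema = {
    "properties": {
        "name": {"type": "string"},
        "height": {"type": "integer"},
        "hair_color": {"type": "string"},
    },
    "required": ["name", "height"],
}
print(json.dumps(build_extraction_function(quickstart_schema), indent=2))
```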

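We can extend the schema to extract several entity types at once, such as people and their dogs, by prefixing each property with its entity name.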

In [None]:
schema = {
    "properties": {
        "person_name": {"type": "string"},
        "person_height": {"type": "integer"},
        "person_hair_color": {"type": "string"},
        "dog_name": {"type": "string"},
        "dog_breed": {"type": "string"},
    },
    "required": ["person_name", "person_height"],
}

chain = create_extraction_chain(
    schema=schema,
    llm=functions_llm,
)

inp = """Alex is 5 feet tall. Claudia is 1 foot taller than Alex and jumps higher than him. Claudia is a brunette and Alex is blonde.
Alex's dog Frosty is a labrador and likes to play hide and seek."""

chain.run(inp)
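
The combined schema is one flat object whose keys carry an entity prefix. If you keep separate per-entity property maps, a sketch like the following builds the same flat schema and groups a flat record back by entity. Both helpers (`merge_entity_schemas`, `split_record`) are hypothetical, and the record is hand-written rather than real model output.

```python
def merge_entity_schemas(entities: dict, required: list) -> dict:
    """Hypothetical helper: flatten per-entity property maps into one
    extraction schema, prefixing each property with its entity name."""
    properties = {}
    for entity, props in entities.items():
        for field, spec in props.items():
            properties[f"{entity}_{field}"] = spec
    unknown = [key for key in required if key not in properties]
    if unknown:
        raise ValueError(f"required keys not in schema: {unknown}")
    return {"properties": properties, "required": required}


def split_record(record: dict, entities: list) -> dict:
    """Group a flat extracted record into one dict per entity prefix."""
    grouped = {entity: {} for entity in entities}
    for key, value in record.items():
        # Split on the first underscore only: "person_hair_color" -> "person", "hair_color"
        entity, _, field = key.partition("_")
        if entity in grouped:
            grouped[entity][field] = value
    return grouped


merged = merge_entity_schemas(
    {
        "person": {
            "name": {"type": "string"},
            "height": {"type": "integer"},
            "hair_color": {"type": "string"},
        },
        "dog": {"name": {"type": "string"}, "breed": {"type": "string"}},
    },
    required=["person_name", "person_height"],
)
print(sorted(merged["properties"]))

record = {"person_name": "Alex", "person_height": 5, "dog_name": "Frosty", "dog_breed": "labrador"}
print(split_record(record, ["person", "dog"]))
```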

## Unrelated entities

If we set `"required": []`, no property is mandatory, so the model can return records that contain only person attributes or only dog attributes.
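
In JSON Schema, `required` applies to each record on its own. The sketch below uses a hypothetical `missing_required` helper and a hand-written dog-only record (not real model output) to show that such a record violates the earlier person-centric `required` list but satisfies an empty one.

```python
def missing_required(record: dict, schema: dict) -> list:
    """Return the required keys that a single extracted record lacks."""
    return [key for key in schema.get("required", []) if key not in record]


# Hand-written example record describing only a dog
dog_only = {"dog_name": "Willow", "dog_breed": "German Shepherd"}

strict = {"required": ["person_name", "person_height"]}
relaxed = {"required": []}

print(missing_required(dog_only, strict))   # ['person_name', 'person_height']
print(missing_required(dog_only, relaxed))  # []
```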

In [None]:
schema = {
    "properties": {
        "person_name": {"type": "string"},
        "person_height": {"type": "integer"},
        "person_hair_color": {"type": "string"},
        "dog_name": {"type": "string"},
        "dog_breed": {"type": "string"},
    },
    "required": [],
}

chain = create_extraction_chain(
    schema=schema,
    llm=functions_llm,
)

inp = """Alex is 5 feet tall. Claudia is 1 foot taller than Alex and jumps higher than him. Claudia is a brunette and Alex is blonde.
Willow is a German Shepherd that likes to play with other dogs and can always be found playing with Milo, a border collie that lives close by."""

chain.run(inp)

## Extra information

The power of functions (relative to using parsers alone) lies in the ability to perform `semantic extraction`.

In particular, we can add a loosely defined field, such as `dog_extra_info` below, that captures details not explicitly enumerated in the schema.

In [2]:
schema = {
    "properties": {
        "person_name": {"type": "string"},
        "person_height": {"type": "integer"},
        "person_hair_color": {"type": "string"},
        "dog_name": {"type": "string"},
        "dog_breed": {"type": "string"},
        "dog_extra_info": {"type": "string"},
    },
}

chain = create_extraction_chain(
    schema=schema,
    llm=functions_llm,
)

# Reuses `inp` (the Willow and Milo text) from the previous cell
chain.run(inp)
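
A catch-all field works best when the model knows what belongs in it. JSON Schema allows a `description` on each property; the sketch below adds one via a hypothetical `with_extra_info` helper. Whether a given chain forwards property descriptions to the model is an assumption here, so check the function definition your version actually sends.

```python
import copy


def with_extra_info(schema: dict, entity: str, description: str) -> dict:
    """Hypothetical helper: return a copy of `schema` with a free-text
    `<entity>_extra_info` property acting as a catch-all field."""
    new_schema = copy.deepcopy(schema)  # leave the caller's schema untouched
    new_schema["properties"][f"{entity}_extra_info"] = {
        "type": "string",
        # Standard JSON Schema keyword; assumed to reach the model as a hint
        "description": description,
    }
    return new_schema


base = {
    "properties": {
        "dog_name": {"type": "string"},
        "dog_breed": {"type": "string"},
    },
}
extended = with_extra_info(base, "dog", "Any other details about the dog, e.g. habits or friends.")
print(sorted(extended["properties"]))
```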

## Pydantic Schema

Pydantic is a data validation and settings management library for Python.

It allows you to create data classes with attributes that are automatically validated when you instantiate an object.
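
A minimal illustration of that validation, assuming the standalone `pydantic` package is installed (the notebook itself imports `BaseModel` through `langchain.pydantic_v1`). `Person` is a hypothetical model; the explicit `= None` default keeps the optional field optional under both Pydantic v1 and v2.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class Person(BaseModel):
    person_name: str
    person_height: int
    # In Pydantic v2, Optional[...] without a default is still required
    person_hair_color: Optional[str] = None


alex = Person(person_name="Alex", person_height="5")  # "5" is coerced to int
print(alex.person_height + 1)  # 6

try:
    Person(person_name="Claudia", person_height="tall")
except ValidationError as err:
    print("rejected:", len(err.errors()), "error(s)")
```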

Let's define a class whose attributes are annotated with types.

In [None]:
# Pydantic data class
class Properties(BaseModel):
    person_name: str
    person_height: int
    person_hair_color: str
    # Under pydantic v1, Optional fields default to None, so dog data may be absent
    dog_breed: Optional[str]
    dog_name: Optional[str]

# Extraction
chain = create_extraction_chain_pydantic(
    pydantic_schema=Properties,
    llm=functions_llm,
)

# Run
inp = """Alex is 5 feet tall. Claudia is 1 foot taller than Alex and jumps higher than him. Claudia is a brunette and Alex is blonde."""

chain.run(inp)