## Kor style schema

This is a half-baked prototype that “helps” you extract structured data from text using LLMs 🧩.

Specify the schema of what should be extracted and provide some examples.

Kor will generate a prompt, send it to the specified LLM, and parse the output.

You might even get results back.

So yes – it’s just another wrapper on top of LLMs with its own flavor of abstractions. 😸
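The workflow described above (a schema plus examples goes in, a prompt is generated, the LLM answers, the answer is parsed) can be sketched with the standard library alone. This is only an illustration of the idea, not Kor's implementation: `build_prompt`, `extract`, and `fake_llm` are hypothetical names, and the canned reply stands in for a real model call.

```python
import json
from typing import Callable, Dict, List, Tuple


def build_prompt(
    schema: Dict[str, str], examples: List[Tuple[str, dict]], text: str
) -> str:
    """Render the field descriptions, the worked examples, and the new input."""
    lines = ["Extract the following fields and answer with JSON only:"]
    for name, description in schema.items():
        lines.append(f"- {name}: {description}")
    for example_text, example_output in examples:
        lines.append(f"Input: {example_text}")
        lines.append(f"Output: {json.dumps(example_output)}")
    lines.append(f"Input: {text}")
    lines.append("Output:")
    return "\n".join(lines)


def extract(
    llm: Callable[[str], str],
    schema: Dict[str, str],
    examples: List[Tuple[str, dict]],
    text: str,
) -> dict:
    """Generate the prompt, call the model, and parse its reply as JSON."""
    raw = llm(build_prompt(schema, examples, text))
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # The model is not guaranteed to return valid JSON.
        return {}


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model: always returns a canned answer."""
    return '{"artist": ["paul simon"]}'


schema = {"artist": "Music by the given artist"}
examples = [("Songs by paul simon", {"artist": ["paul simon"]})]
result = extract(fake_llm, schema, examples, "play songs by paul simon")
print(result)
```

Falling back to an empty dict on unparseable output is one possible design choice; a stricter caller might prefer to raise instead.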

In [2]:
from langchain.chat_models import ChatOpenAI
from kor import create_extraction_chain, Object, Text 

In [3]:
llm = ChatOpenAI(
    model_name="gpt-3.5-turbo",
    temperature=0,
    max_tokens=2000,
    frequency_penalty=0,
    presence_penalty=0,
    top_p=1.0,
)

schema = Object(
    id="player",
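    # English: "The user is controlling a music player to select songs, pause them,
    # start them, or play music by a particular artist."
    description=(
        "O usuário está controlando um reprodutor de música para selecionar músicas, pausá-las, iniciá-las ou reproduzir "
        "música de um determinado artista."
    ),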
    attributes=[
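        Text(
            id="musica",
            # English: "The user wants to play this song"
            description="O usuário quer tocar esta música",
            examples=[],
            many=True,
        ),
        Text(
            id="album",
            # English: "The user wants to play this album"
            description="O usuário deseja reproduzir este álbum",
            examples=[],
            many=True,
        ),
        Text(
            id="artist",
            # English: "Music by the given artist" / "Songs by paul simon"
            description="Música do artista fornecido",
            examples=[("Músicas de paul simon", "paul simon")],
            many=True,
        ),
        Text(
            id="action",
            # English: "Action to take, one of: `play`, `stop`, `next`, `previous`."
            description="Ação para tomar um dos: `play`, `stop`, `next`, `previous`.",
            examples=[
                ("Por favor, pare a música", "stop"),  # "Please stop the music"
                ("Toque qualquer coisa", "play"),  # "Play anything"
                ("Toque uma música", "play"),  # "Play a song"
                ("próxima música", "next"),  # "next song"
            ],
        ),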
    ],
    many=False,
)



frequency_penalty was transferred to model_kwargs.
Please confirm that frequency_penalty is what you intended.
presence_penalty was transferred to model_kwargs.
Please confirm that presence_penalty is what you intended.
top_p was transferred to model_kwargs.
Please confirm that top_p is what you intended.


In [4]:
chain = create_extraction_chain(llm, schema, encoder_or_encoder_class="json")
# Input in English: "play songs by Paul Simon and Led Zeppelin and The Doors"
chain.run("tocar músicas de Paul Simon e Led Zeppelin e The Doors")["data"]

{'player': {'artist': ['paul simon', 'led zeppelin', 'the doors']}}
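The result above is a plain nested dict keyed by the schema `id`, and in this run the attributes the model did not find (`musica`, `album`, `action`) are absent rather than empty. A small sketch of reading it defensively, using the dict literal copied from the output above; the fallback values are a choice made here, not something Kor prescribes.

```python
# Result copied from the cell output above.
data = {"player": {"artist": ["paul simon", "led zeppelin", "the doors"]}}

player = data.get("player", {})

# Missing attributes are simply not present, so supply defaults when reading.
artists = player.get("artist", [])
songs = player.get("musica", [])
action = player.get("action")  # None when no action was extracted

print(artists)
print(songs, action)
```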

## Pydantic style schema



In [9]:
import enum
from typing import List, Optional

from langchain.chat_models import ChatOpenAI
from pydantic import BaseModel, Field

from kor import create_extraction_chain, from_pydantic

In [13]:
llm = ChatOpenAI(
    model_name="gpt-3.5-turbo",
    temperature=0,
)

In [14]:
class Action(enum.Enum):
    play = "play"
    stop = "stop"
    previous = "previous"
    next_ = "next"


class MusicRequest(BaseModel):
    # default=None keeps every field optional; pydantic v2 would otherwise
    # treat an Optional field without a default as required.
    song: Optional[List[str]] = Field(
        default=None,
        description="The song(s) that the user would like to be played.",
    )
    album: Optional[List[str]] = Field(
        default=None,
        description="The album(s) that the user would like to be played.",
    )
    artist: Optional[List[str]] = Field(
        default=None,
        description="The artist(s) whose music the user would like to hear.",
        examples=[("Songs by paul simon", "paul simon")],
    )
    action: Optional[Action] = Field(
        default=None,
        description="The action that should be taken; one of `play`, `stop`, `next`, `previous`",
        examples=[
            ("Please stop the music", "stop"),
            ("play something", "play"),
            ("play a song", "play"),
            ("next song", "next"),
        ],
    )

In [15]:
schema, validator = from_pydantic(MusicRequest)

In [16]:
chain = create_extraction_chain(
    llm, schema, encoder_or_encoder_class="json", validator=validator
)

In [19]:
print(chain.prompt.format_prompt(text="[user input]").to_string())

Your goal is to extract structured information from the user's input that matches the form described below. When extracting information please make sure it matches the type information exactly. Do not add any attributes that do not appear in the schema shown below.

```TypeScript

musicrequest: { // 
 song: Array<string> // The song(s) that the user would like to be played.
 album: Array<string> // The album(s) that the user would like to be played.
 artist: Array<string> // The artist(s) whose music the user would like to hear.
 action: "play" | "stop" | "previous" | "next" // The action that should be taken; one of `play`, `stop`, `next`, `previous`
}
```


Please output the extracted information in JSON format. Do not output anything except for the extracted information. Do not add any clarifying information. Do not add any fields that are not in the schema. If the text contains attributes that do not appear in the schema, please ignore them. All output must be in JSON format and fo
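In the prompt above, the `action` field is described as a union of string literals taken from the `Action` enum's values. A minimal sketch of that mapping, assuming the values are rendered in definition order; `enum_to_union` is a hypothetical helper written for illustration, not a Kor function.

```python
import enum


class Action(enum.Enum):
    play = "play"
    stop = "stop"
    previous = "previous"
    next_ = "next"


def enum_to_union(enum_cls) -> str:
    """Join the enum's values as a TypeScript-style union of string literals."""
    return " | ".join(f'"{member.value}"' for member in enum_cls)


print(enum_to_union(Action))
```

Note that the member name `next_` never appears in the result; only the value `"next"` does.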

In [20]:
chain.run("stop the music now")["validated_data"]

In [21]:
chain.run("i want to hear yellow submarine by the beatles")[
    "validated_data"
]
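The two cells above show no output, so the contents of `validated_data` are not reproduced here. As a rough illustration of what a validation step adds over the raw extracted dict, the sketch below coerces a made-up raw result into a typed record using only the standard library. `MusicRequestRecord`, `validate`, and the sample dicts are all hypothetical; the chain itself uses the validator returned by `from_pydantic`.

```python
import enum
from dataclasses import dataclass
from typing import List, Optional


class Action(enum.Enum):
    play = "play"
    stop = "stop"
    previous = "previous"
    next_ = "next"


@dataclass
class MusicRequestRecord:
    """Hypothetical stdlib stand-in for the pydantic MusicRequest model."""

    song: Optional[List[str]] = None
    album: Optional[List[str]] = None
    artist: Optional[List[str]] = None
    action: Optional[Action] = None


def validate(raw: dict) -> MusicRequestRecord:
    """Build a typed record; an unknown action string raises ValueError."""
    action = raw.get("action")
    return MusicRequestRecord(
        song=raw.get("song"),
        album=raw.get("album"),
        artist=raw.get("artist"),
        action=Action(action) if action is not None else None,
    )


# Made-up raw extraction result, for illustration only.
record = validate({"action": "stop"})
print(record)
```

The benefit of this step is that downstream code receives an `Action` member instead of a free-form string, and a value outside the enum fails loudly.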