<a href="https://colab.research.google.com/github/chrispoole70/langchain-tutorials/blob/main/classification/classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# [Classify Text into Labels](https://python.langchain.com/docs/tutorials/classification/)

In [None]:
%pip install --upgrade --quiet langchain-core

In [None]:
%pip install -qU "langchain[openai]"

In [8]:
import os

from langchain.chat_models import init_chat_model
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field

In [4]:
from google.colab import userdata
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')
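`google.colab.userdata` is only available inside Colab. The sketch below is one way to run the notebook elsewhere. It assumes the key is either already set in the environment or typed in at a prompt. `load_openai_key` is a hypothetical helper, not part of LangChain:

```python
import os
from getpass import getpass


def load_openai_key() -> str:
    """Return the OpenAI API key, trying several sources in order (hypothetical helper)."""
    # 1. Already exported in the shell or set by an earlier cell
    key = os.environ.get("OPENAI_API_KEY")
    if key:
        return key
    # 2. Colab's secrets store; fails outside Colab or when the secret is missing
    try:
        from google.colab import userdata
        key = userdata.get("OPENAI_API_KEY")
    except Exception:
        key = None
    # 3. Fall back to an interactive prompt that does not echo the key
    if not key:
        key = getpass("OpenAI API key: ")
    os.environ["OPENAI_API_KEY"] = key
    return key
```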

Initialize the chat model

In [7]:
llm = init_chat_model("gpt-4o-mini", model_provider="openai")

In [36]:
type(llm).__name__

'ChatOpenAI'

Create a prompt template for the input message; `{input}` is a placeholder for the passage to classify

In [9]:
tagging_prompt = ChatPromptTemplate.from_template("""
Extract the desired information from the following passage.

Only extract the properties mentioned in the 'Classification' function.

Passage:
{input}
""")

Define the schema of the output message the chat model should return

In [10]:
class Classification(BaseModel):
  sentiment: str = Field(description='The sentiment of the text')
  aggressiveness: int = Field(description='How aggressive the text is on a scale from 1 to 10')
  language: str = Field(description='The language the text is written in')

Calling `with_structured_output` with the output schema produces a runnable with two steps:
1. Send the input message to the LLM, with the `Classification` schema as the requested response format
2. Parse the LLM's reply into a `Classification` object
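These two steps form a small pipeline. The first step produces raw model output, and the second turns that output into a typed object. The stdlib-only sketch below illustrates the idea. A hypothetical `fake_llm` stands in for the real model and returns canned JSON, and a dataclass stands in for the Pydantic model. LangChain's actual parser is more involved than this:

```python
import json
from dataclasses import dataclass


@dataclass
class Classification:
    sentiment: str
    aggressiveness: int
    language: str


def fake_llm(prompt: str) -> str:
    """Step 1 stand-in (hypothetical): a real model would read the prompt and answer in JSON."""
    return '{"sentiment": "positive", "aggressiveness": 1, "language": "Spanish"}'


def parse_classification(raw: str) -> Classification:
    """Step 2: decode the JSON text and build a Classification from its keys."""
    # Note: a dataclass does not check types; Pydantic would validate them
    return Classification(**json.loads(raw))


def structured_call(prompt: str) -> Classification:
    # Run the steps in order, feeding step 1's output into step 2
    return parse_classification(fake_llm(prompt))


result = structured_call("Estoy increiblemente contento de haberte conocido!")
print(result)
```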

In [11]:
structured_llm = llm.with_structured_output(Classification)

In [37]:
type(structured_llm).__name__

'RunnableSequence'

In [17]:
len(structured_llm.steps)

2

In [18]:
structured_llm.steps

[RunnableBinding(bound=ChatOpenAI(client=<openai.resources.chat.completions.completions.Completions object at 0x7d54f1711bd0>, async_client=<openai.resources.chat.completions.completions.AsyncCompletions object at 0x7d54f16cdbd0>, root_client=<openai.OpenAI object at 0x7d54f1ed4150>, root_async_client=<openai.AsyncOpenAI object at 0x7d54f1711e10>, model_name='gpt-4o-mini', model_kwargs={}, openai_api_key=SecretStr('**********')), kwargs={'response_format': <class '__main__.Classification'>, 'ls_structured_output_format': {'kwargs': {'method': 'json_schema', 'strict': None}, 'schema': {'type': 'function', 'function': {'name': 'Classification', 'description': '', 'parameters': {'properties': {'sentiment': {'description': 'The sentiment of the text', 'type': 'string'}, 'aggressiveness': {'description': 'How aggressive the text is on a scale from 1 to 10', 'type': 'integer'}, 'language': {'description': 'The language the text is written in', 'type': 'string'}}, 'required': ['sentiment', 'a

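Even though `Classification` is a Python class, it is passed to the model as a JSON schema (the `response_format` in the output above, with `'method': 'json_schema'`). The same schema also appears in a function-style form. In that form, the class name becomes the function name, and the properties `sentiment`, `aggressiveness`, and `language` become the function's parameters.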
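The function-style form can be approximated by hand from the model's JSON schema. This sketch assumes Pydantic v2. `to_function_schema` is a hypothetical helper that roughly mirrors the `'schema'` entry printed above; it is not LangChain's implementation. LangChain ships its own converters, such as `convert_to_openai_tool` in `langchain_core.utils.function_calling`.

```python
from pydantic import BaseModel, Field


class Classification(BaseModel):
    sentiment: str = Field(description="The sentiment of the text")
    aggressiveness: int = Field(description="How aggressive the text is on a scale from 1 to 10")
    language: str = Field(description="The language the text is written in")


def to_function_schema(model: type[BaseModel]) -> dict:
    """Wrap a model's JSON schema in an OpenAI-style function definition (hypothetical helper)."""
    schema = model.model_json_schema()
    # Drop the per-property "title" keys, which the function-style output omits
    properties = {
        name: {k: v for k, v in prop.items() if k != "title"}
        for name, prop in schema["properties"].items()
    }
    return {
        "type": "function",
        "function": {
            "name": schema["title"],  # the class name becomes the function name
            "description": schema.get("description", ""),  # empty when the class has no docstring
            "parameters": {
                "type": "object",
                "properties": properties,  # the fields become the parameters
                "required": schema.get("required", []),
            },
        },
    }


print(to_function_schema(Classification))
```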

In [20]:
Classification.model_json_schema()

{'properties': {'sentiment': {'description': 'The sentiment of the text',
   'title': 'Sentiment',
   'type': 'string'},
  'aggressiveness': {'description': 'How aggressive the text is on a scale from 1 to 10',
   'title': 'Aggressiveness',
   'type': 'integer'},
  'language': {'description': 'The language the text is written in',
   'title': 'Language',
   'type': 'string'}},
 'required': ['sentiment', 'aggressiveness', 'language'],
 'title': 'Classification',
 'type': 'object'}
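Pydantic enforces the `required` list when it builds the object, and the parsing step relies on this. The sketch below validates hand-written JSON strings with `model_validate_json`. The strings are stand-ins for model output, not real responses, and the example assumes Pydantic v2:

```python
from pydantic import BaseModel, Field, ValidationError


class Classification(BaseModel):
    sentiment: str = Field(description="The sentiment of the text")
    aggressiveness: int = Field(description="How aggressive the text is on a scale from 1 to 10")
    language: str = Field(description="The language the text is written in")


# All three required keys present: validation succeeds
ok = Classification.model_validate_json(
    '{"sentiment": "positive", "aggressiveness": 1, "language": "Spanish"}'
)
print(ok.aggressiveness)

# "language" missing: Pydantic raises instead of returning a partial object
try:
    Classification.model_validate_json('{"sentiment": "positive", "aggressiveness": 1}')
except ValidationError as err:
    missing = [e["loc"] for e in err.errors()]
    print(missing)
```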

Format the input message with a passage in Spanish ("I'm incredibly happy to have met you! I think we'll be very good friends!")

In [12]:
inp = "Estoy increiblemente contento de haberte conocido! Creo que seremos muy buenos amigos!"
prompt = tagging_prompt.invoke({'input': inp})

In [13]:
prompt.to_messages()

[HumanMessage(content="\nExtract the desired information from the following passage.\n\nOnly extract the properties mentioned in the 'Classification' function.\n\nPassage:\nEstoy increiblemente contento de haberte conocido! Creo que seremos muy buenos amigos!\n", additional_kwargs={}, response_metadata={})]

Send the input message to the LLM

In [21]:
response = structured_llm.invoke(prompt)

The response is a `Classification` object, not a chat message

In [22]:
response

Classification(sentiment='positive', aggressiveness=1, language='Spanish')

The response can also be converted to a dictionary with `model_dump()`

In [23]:
response.model_dump()

{'sentiment': 'positive', 'aggressiveness': 1, 'language': 'Spanish'}
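The dictionary form is convenient for tables and logs. Both the dictionary and a JSON string can be converted back into an equal object. The sketch below copies its values from the response above and omits the field descriptions for brevity. It assumes Pydantic v2:

```python
from pydantic import BaseModel


class Classification(BaseModel):
    sentiment: str
    aggressiveness: int
    language: str


response = Classification(sentiment="positive", aggressiveness=1, language="Spanish")

as_dict = response.model_dump()        # plain dict, e.g. for a DataFrame row
as_json = response.model_dump_json()   # JSON string, e.g. for logging

# Rebuild the object from each form
from_dict = Classification.model_validate(as_dict)
from_json = Classification.model_validate_json(as_json)
print(from_dict == response, from_json == response)
```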

## Finer Control

Beyond describing each property in the output schema, we can also control which properties are required and restrict the values each one can take, for example with an `enum` list of allowed values.
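Whether a property is required follows Pydantic's usual rule. A field with no default is required, and giving it a default (such as `None`) makes it optional. The sketch below uses a trimmed, hypothetical two-field schema. It assumes Pydantic v2, and a particular model provider may handle optional fields differently:

```python
from typing import Optional

from pydantic import BaseModel, Field


class Classification(BaseModel):
    # No default value: Pydantic marks the field as required
    sentiment: str = Field(description="The sentiment of the text")
    # A default of None makes the field optional, so it drops out of "required"
    language: Optional[str] = Field(default=None, description="The language the text is written in")


schema = Classification.model_json_schema()
print(schema["required"])
print(Classification(sentiment="neutral"))
```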

In [31]:
my_list = list(range(1, 11))

my_list

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In [32]:
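class Classification(BaseModel):
  # Pydantic v2 deprecates extra keyword arguments such as enum=... on Field; json_schema_extra adds the same "enum" key to the schema
  sentiment: str = Field(description='The sentiment of the text', json_schema_extra={'enum': ['happy', 'neutral', 'sad']})
  aggressiveness: int = Field(description='How aggressive the text is on a scale from 1 to 10', json_schema_extra={'enum': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})
  language: str = Field(description='The language the text is written in', json_schema_extra={'enum': ["spanish", "english", "french", "german", "italian"]})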
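An `enum` supplied through `Field` only changes the JSON schema sent to the model. Pydantic itself does not reject other values when it builds the object. The sketch below shows an alternative using `typing.Literal`, which puts the same `enum` list into the schema and also validates incoming values. It assumes Pydantic v2, and the invalid values in it are made up for illustration:

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class Classification(BaseModel):
    # Literal restricts the allowed values and is rendered as "enum" in the JSON schema
    sentiment: Literal["happy", "neutral", "sad"] = Field(description="The sentiment of the text")
    aggressiveness: Literal[1, 2, 3, 4, 5, 6, 7, 8, 9, 10] = Field(
        description="How aggressive the text is on a scale from 1 to 10"
    )
    language: Literal["spanish", "english", "french", "german", "italian"] = Field(
        description="The language the text is written in"
    )


sentiment_enum = Classification.model_json_schema()["properties"]["sentiment"]["enum"]
print(sentiment_enum)

# Two out-of-range values: Pydantic reports one error per invalid field
try:
    Classification(sentiment="angry", aggressiveness=11, language="spanish")
except ValidationError as err:
    n_errors = len(err.errors())
    print(n_errors)
```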

In [33]:
structured_llm = llm.with_structured_output(Classification)

In [34]:
response = structured_llm.invoke(prompt)

In [35]:
response

Classification(sentiment='happy', aggressiveness=1, language='spanish')