# Tagging

## Architecture

<img src="https://python.langchain.com/v0.2/assets/images/tagging-93990e95451d92b715c2b47066384224.png" alt="tagging" width="600"/>

Tagging has two components:

- `function`: Like `extraction`, tagging uses functions to specify how the model should tag a document.
- `schema`: Defines how we want to tag the document.
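
To make the relationship between the two components concrete, here is a minimal sketch of the kind of function definition a tagging schema can turn into. It follows the OpenAI function-calling layout (`name`, `description`, `parameters` as JSON Schema). The field names are borrowed from the `Classification` example in the Quickstart, the function description is made up for illustration, and the exact payload LangChain generates may differ.

```python
import json

# The schema: which properties to tag and what each one means.
# (Illustrative only; field names are borrowed from the Classification example.)
schema = {
    "type": "object",
    "properties": {
        "sentiment": {
            "type": "string",
            "description": "The sentiment of the text",
        },
        "aggressiveness": {
            "type": "integer",
            "description": "How aggressive the text is on a scale from 1 to 10",
        },
        "language": {
            "type": "string",
            "description": "The language the text is written in",
        },
    },
    "required": ["sentiment", "aggressiveness", "language"],
}

# The function: a named wrapper that carries the schema to the model.
# The description string here is an assumption, not LangChain's wording.
tagging_function = {
    "name": "Classification",
    "description": "Tag a passage with the properties defined in the schema.",
    "parameters": schema,
}

print(json.dumps(tagging_function, indent=2))
```

The model is asked to "call" this function, and the arguments it supplies are the tags.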

## Setup

In [1]:
import os

os.chdir("../../../")

In [2]:
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI

# Load environment variables (such as OPENAI_API_KEY) from a local .env file
load_dotenv()

True

## Quickstart

In [3]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_openai import ChatOpenAI

tagging_prompt = ChatPromptTemplate.from_template(
    """
Extract the desired information from the following passage.

Only extract the properties mentioned in the 'Classification' function.

Passage:
{input}
"""
)

class Classification(BaseModel):
    sentiment: str = Field(description="The sentiment of the text")
    aggressiveness: int = Field(
        description="How aggressive the text is on a scale from 1 to 10"
    )
    language: str = Field(description="The language the text is written in")


# with_structured_output makes the model return a Classification instance
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0).with_structured_output(
    Classification
)

tagging_chain = tagging_prompt | llm

In [4]:
inp = "Estoy increiblemente contento de haberte conocido! Creo que seremos muy buenos amigos!"
tagging_chain.invoke({"input": inp})

Classification(sentiment='positive', aggressiveness=1, language='Spanish')

In [5]:
inp = "Estoy muy enojado con vos! Te voy a dar tu merecido!"
res = tagging_chain.invoke({"input": inp})
res.dict()

{'sentiment': 'enojado', 'aggressiveness': 8, 'language': 'es'}

## Finer Control

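Careful schema definition gives us more control over the model's output. In the quickstart, the free-form `str` fields let the model pick its own labels, so the two Spanish inputs came back as `'positive'` vs. `'enojado'` for sentiment and `'Spanish'` vs. `'es'` for language. Below, each field is marked required (`...`) and restricted to a fixed set of values with `enum`, and `aggressiveness` gets a more explicit description.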

In [11]:
class Classification(BaseModel):
    sentiment: str = Field(..., enum=["happy", "neutral", "sad"])
    aggressiveness: int = Field(
        ...,
        description="describes how aggressive the statement is, the higher the number the more aggressive",
        enum=[1, 2, 3, 4, 5],
    )
    language: str = Field(
        ..., enum=["spanish", "english", "french", "german", "italian"]
    )
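
Depending on the pydantic version and the provider's structured-output mode, the `enum` values may only shape the schema sent to the model rather than being validated on return, so a client-side check can be a useful safety net. The sketch below is plain Python and not part of LangChain. `ALLOWED` and `find_violations` are hypothetical names, and the allowed values are copied from the `Classification` schema above.

```python
# Allowed values, copied from the Classification schema above.
ALLOWED = {
    "sentiment": {"happy", "neutral", "sad"},
    "aggressiveness": {1, 2, 3, 4, 5},
    "language": {"spanish", "english", "french", "german", "italian"},
}


def find_violations(result: dict) -> list[str]:
    """Return a message for every field that is missing or outside its allowed set."""
    problems = []
    for field, allowed in ALLOWED.items():
        if field not in result:
            problems.append(f"{field}: missing")
        elif result[field] not in allowed:
            problems.append(f"{field}: unexpected value {result[field]!r}")
    return problems


# An output in the constrained style passes the check...
print(find_violations({"sentiment": "happy", "aggressiveness": 1, "language": "spanish"}))
# ...while the free-form quickstart output is flagged on all three fields.
print(find_violations({"sentiment": "enojado", "aggressiveness": 8, "language": "es"}))
```

With a pydantic result such as `res` from the quickstart, `find_violations(res.dict())` would apply the same check.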

In [12]:
tagging_prompt = ChatPromptTemplate.from_template(
    """
Extract the desired information from the following passage.

Only extract the properties mentioned in the 'Classification' function.

Passage:
{input}
"""
)

# Rebuild the structured-output model so it uses the redefined Classification schema
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0).with_structured_output(
    Classification
)

chain = tagging_prompt | llm

In [13]:
inp = "Estoy increiblemente contento de haberte conocido! Creo que seremos muy buenos amigos!"
chain.invoke({"input": inp})

Classification(sentiment='happy', aggressiveness=1, language='spanish')

In [14]:
inp = "Estoy muy enojado con vos! Te voy a dar tu merecido!"
chain.invoke({"input": inp})

Classification(sentiment='sad', aggressiveness=5, language='spanish')

In [15]:
inp = "Weather is ok here, I can go outside without much more than a coat"
chain.invoke({"input": inp})

Classification(sentiment='happy', aggressiveness=1, language='english')
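
Once every passage is tagged from the same fixed label set, the results are straightforward to aggregate. As a small sketch in plain Python, the `results` list below restates the three outputs above as dictionaries; with a live chain they would instead be collected from `chain.invoke` calls.

```python
from collections import Counter

# The three constrained outputs shown above, restated as plain dictionaries.
results = [
    {"sentiment": "happy", "aggressiveness": 1, "language": "spanish"},
    {"sentiment": "sad", "aggressiveness": 5, "language": "spanish"},
    {"sentiment": "happy", "aggressiveness": 1, "language": "english"},
]

# Count how often each label occurs for the categorical fields.
sentiment_counts = Counter(r["sentiment"] for r in results)
language_counts = Counter(r["language"] for r in results)

print(sentiment_counts)
print(language_counts)
```

This kind of tally only works reliably because the `enum` constraints keep labels such as `'spanish'` from splitting into variants like `'Spanish'` and `'es'`.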