# Customize output format in prompt

## User prompt configuration

### [X] Create prompt template

The prompt template is defined in the file:

[/workspace/data/prompts/boe/tema/template.md](/workspace/data/prompts/boe/tema/template.md)

It is loaded as follows:

In [13]:
name_prompt = 'boe'  # TODO: folder under /workspace/data/prompts/ for this prompt
name_template = 'tema'  # TODO: template subfolder inside the prompt folder

In [14]:
from pathlib import Path

folder_template = f'{name_prompt}/{name_template}'
folder = Path(f'/workspace/data/prompts/{folder_template}')

path = folder / 'template.md'
# Read as UTF-8 so accented characters are decoded correctly on any platform
with open(path, 'r', encoding='utf-8') as file:
    template = file.read()

template

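'Necesito informacion sobre los documentos publicados en el BOE segun la tematica {TEMA} entre las fechas {FECHA_INICIAL} y {FECHA_FINAL}.\n\nBusca en la web.\n'

In English, the template reads: "I need information about the documents published in the BOE on the topic {TEMA} between the dates {FECHA_INICIAL} and {FECHA_FINAL}. Search the web." (BOE is the Boletín Oficial del Estado, Spain's official state gazette.)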
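The words in braces are the placeholders that become the prompt's input variables. A quick way to list them before building the prompt is Python's `string.Formatter`, which parses the same `{NAME}` brace syntax; this is a minimal sketch, and the helper name `template_placeholders` is hypothetical:

```python
from string import Formatter


def template_placeholders(template: str) -> set[str]:
    """Return the names of the {PLACEHOLDER} fields in an f-string-style template."""
    names = set()
    for _literal, field_name, _spec, _conversion in Formatter().parse(template):
        # field_name is None for plain text and for escaped braces like "{{"
        if field_name:
            names.add(field_name)
    return names


template = (
    'Necesito informacion sobre los documentos publicados en el BOE segun la tematica {TEMA} '
    'entre las fechas {FECHA_INICIAL} y {FECHA_FINAL}.\n\nBusca en la web.\n'
)
print(sorted(template_placeholders(template)))  # ['FECHA_FINAL', 'FECHA_INICIAL', 'TEMA']
```

The sorted result is the same list that appears as `input_variables` in the combined prompt further down.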

### [X] Define custom output format and import it

The output format is a Pydantic model defined in:

[/workspace/data/prompts/boe/output_parser.py](/workspace/data/prompts/boe/output_parser.py)
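The contents of that file are not shown in this notebook. Judging from the JSON schema printed in the combined prompt (a `$defs` entry `DocumentoBOE` with fields such as `titulo`, `fecha`, `numero` and `departamento`) and from the dumped result `{'documentos': [...]}`, it defines Pydantic models roughly like the sketch below. This is a reconstruction under assumptions: the real file may define more fields, and the type and description of `departamento` are cut off in the printed schema, so `str` is a guess.

```python
from datetime import date

from pydantic import BaseModel, Field


class DocumentoBOE(BaseModel):
    """One document published in the BOE (sketch; fields taken from the printed schema)."""

    titulo: str = Field(description="Título del documento publicado en el BOE")
    fecha: date = Field(description="Fecha de publicación en el BOE")
    numero: str = Field(description="Número de disposición o referencia oficial")
    # Assumption: type is not visible in the truncated schema output; description omitted
    departamento: str


class DocumentoBOEList(BaseModel):
    """Container model: the parser expects a JSON object holding a 'documentos' list."""

    documentos: list[DocumentoBOE]
```

Wrapping the list in a container model gives the LLM a single top-level JSON object to produce, which is what `DocumentoBOEList.model_json_schema()` describes.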

In [15]:
output_class_name = 'DocumentoBOEList' #TODO: define your class

In [16]:
from importlib import import_module
OutputParser = getattr(import_module(f'data.prompts.{name_prompt}.output_parser'), output_class_name)
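The one-liner above resolves `data.prompts.<name_prompt>.output_parser` as a dotted module path, which presumably works because the directory containing `data/` (likely `/workspace`) is on `sys.path`. A small helper with the hypothetical name `load_class` does the same lookup but reports a wrong `output_class_name` more clearly:

```python
from importlib import import_module


def load_class(module_path: str, class_name: str) -> type:
    """Import `module_path` and return its attribute `class_name`, with a clearer error."""
    module = import_module(module_path)  # raises ModuleNotFoundError if the path is wrong
    try:
        return getattr(module, class_name)
    except AttributeError:
        raise AttributeError(
            f"{module_path!r} has no attribute {class_name!r}; "
            "check output_class_name"
        ) from None
```

In the notebook it would be called as `OutputParser = load_class(f'data.prompts.{name_prompt}.output_parser', output_class_name)`.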

In [17]:
OutputParser

data.prompts.boe.output_parser.DocumentoBOEList

## Combine template and output format

In [18]:
from modules.prompt import CustomPrompt

custom_prompt = CustomPrompt(template, OutputParser)
prompt = custom_prompt.get_prompt()
prompt

PromptTemplate(input_variables=['FECHA_FINAL', 'FECHA_INICIAL', 'TEMA'], input_types={}, partial_variables={'FORMAT_INSTRUCTIONS': 'The output should be formatted as a JSON instance that conforms to the JSON schema below.\n\nAs an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}\nthe object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.\n\nHere is the output schema:\n```\n{"$defs": {"DocumentoBOE": {"properties": {"titulo": {"description": "Título del documento publicado en el BOE", "title": "Titulo", "type": "string"}, "fecha": {"description": "Fecha de publicación en el BOE", "format": "date", "title": "Fecha", "type": "string"}, "numero": {"description": "Número de disposición o referencia oficial", "title": "Numero", "type": "string"}, "departamento": {"description": "De

## Chain

### Define model

In [19]:
from langchain_openai import ChatOpenAI
model = ChatOpenAI(model="gpt-4o-search-preview")

model

ChatOpenAI(client=<openai.resources.chat.completions.completions.Completions object at 0x741ba2922c00>, async_client=<openai.resources.chat.completions.completions.AsyncCompletions object at 0x741ba2ff5250>, root_client=<openai.OpenAI object at 0x741ba2fdd250>, root_async_client=<openai.AsyncOpenAI object at 0x741ba2922f00>, model_name='gpt-4o-search-preview', model_kwargs={}, openai_api_key=SecretStr('**********'))
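When no key is passed explicitly, `ChatOpenAI` takes the API key from the `OPENAI_API_KEY` environment variable. A small standard-library check (the helper name `require_env` is hypothetical) can run before the model is created and fail with a message that names the missing variable:

```python
import os


def require_env(var: str = "OPENAI_API_KEY") -> str:
    """Return the value of `var`, or raise a clear error if it is unset or blank."""
    value = os.environ.get(var, "").strip()
    if not value:
        raise RuntimeError(f"Set the {var} environment variable before creating the model")
    return value
```

Calling `require_env()` at the top of the cell keeps the key itself out of the notebook.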

### Compose chain

In [20]:
chain = prompt | model | custom_prompt.parser
chain

PromptTemplate(input_variables=['FECHA_FINAL', 'FECHA_INICIAL', 'TEMA'], input_types={}, partial_variables={'FORMAT_INSTRUCTIONS': 'The output should be formatted as a JSON instance that conforms to the JSON schema below.\n\nAs an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}\nthe object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.\n\nHere is the output schema:\n```\n{"$defs": {"DocumentoBOE": {"properties": {"titulo": {"description": "Título del documento publicado en el BOE", "title": "Titulo", "type": "string"}, "fecha": {"description": "Fecha de publicación en el BOE", "format": "date", "title": "Fecha", "type": "string"}, "numero": {"description": "Número de disposición o referencia oficial", "title": "Numero", "type": "string"}, "departamento": {"description": "De
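The `|` operator composes the three steps into one sequence: the filled-in prompt goes to the model, and the model's reply goes to the parser. The toy class below is not LangChain's implementation; it only illustrates the same left-to-right composition with plain functions standing in for the prompt, model and parser:

```python
import json
from typing import Any, Callable


class Step:
    """Toy runnable: wraps a function and supports `a | b` composition."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def invoke(self, value: Any) -> Any:
        return self.fn(value)

    def __or__(self, other: "Step") -> "Step":
        # The composed step runs self first and passes its result to other
        return Step(lambda value: other.invoke(self.invoke(value)))


# Stand-ins for prompt, model and parser (no LLM involved)
fake_prompt = Step(lambda inputs: f"Documentos sobre {inputs['TEMA']}")
fake_model = Step(lambda text: '{"documentos": []}')  # ignores its input, returns canned JSON
fake_parser = Step(json.loads)

toy_chain = fake_prompt | fake_model | fake_parser
print(toy_chain.invoke({"TEMA": "agricultura"}))  # {'documentos': []}
```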

### [X] Invoke chain

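To get a response, invoke the chain with a dictionary that supplies a value for each of the prompt's input variables: `TEMA` (the topic), `FECHA_INICIAL` (start date) and `FECHA_FINAL` (end date).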

In [21]:
output = chain.invoke({
    'TEMA': 'agricultura',
    'FECHA_INICIAL': '2020-01-01',
    'FECHA_FINAL': '2025-04-01',
})
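Because the dates are passed as plain strings, nothing stops a reversed or malformed range from reaching the model. A minimal sketch (the helper name `build_inputs` is hypothetical) builds the same dictionary from `datetime.date` objects, checks the order and formats them as ISO `YYYY-MM-DD`, the format used above:

```python
from datetime import date


def build_inputs(tema: str, fecha_inicial: date, fecha_final: date) -> dict[str, str]:
    """Build the chain's input dict, validating the range and using ISO dates."""
    if fecha_inicial > fecha_final:
        raise ValueError("FECHA_INICIAL must not be later than FECHA_FINAL")
    return {
        "TEMA": tema,
        "FECHA_INICIAL": fecha_inicial.isoformat(),
        "FECHA_FINAL": fecha_final.isoformat(),
    }
```

With it, the call above becomes `chain.invoke(build_inputs('agricultura', date(2020, 1, 1), date(2025, 4, 1)))`.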

## Output

### JSON

In [22]:
data = output.model_dump()
data

{'documentos': []}
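`model_dump()` returns plain Python objects, so once documents are found, `fecha` will hold `datetime.date` values (the schema declares it with `"format": "date"`), which `json.dumps` rejects by default. Pydantic v2 can produce JSON directly with `output.model_dump_json()`; to work from the `data` dict instead, a `default` hook is enough. This sketch (hypothetical helper `to_json`) also sets `ensure_ascii=False` so Spanish characters such as á or ñ are written as-is:

```python
import json
from datetime import date
from typing import Any


def to_json(data: dict[str, Any]) -> str:
    """Serialize a model_dump() result, writing dates as ISO strings."""

    def encode(value: Any) -> str:
        # json.dumps calls this only for objects it cannot serialize natively
        if isinstance(value, date):
            return value.isoformat()
        raise TypeError(f"Object of type {type(value).__name__} is not JSON serializable")

    return json.dumps(data, default=encode, ensure_ascii=False, indent=2)
```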

### DataFrame

In [23]:
import pandas as pd

# The output model has a single list field (here 'documentos'); its items become the rows
data_values = list(data.values())[0]
df = pd.DataFrame(data_values)
df.style
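In this run the model returned no documents (`{'documentos': []}`), so `pd.DataFrame([])` has no columns and the exported files would have no header row. A minimal sketch (the helper name `records_to_frame` is hypothetical) passes the expected column names explicitly; the list used below comes from the part of the schema visible in the printed prompt and may be incomplete:

```python
import pandas as pd


def records_to_frame(records: list[dict], columns: list[str]) -> pd.DataFrame:
    """Build a DataFrame with a fixed column order, even when `records` is empty."""
    # With list-of-dict input, `columns` selects these keys; keys not listed are dropped
    return pd.DataFrame(records, columns=columns)


df_example = records_to_frame([], ["titulo", "fecha", "numero", "departamento"])
print(list(df_example.columns))  # ['titulo', 'fecha', 'numero', 'departamento']
```

Since pandas drops any key missing from `columns`, deriving the list from the item model (for a Pydantic v2 class, `list(Model.model_fields)`) keeps it in sync with the schema.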

### Export to Excel and CSV

In [24]:
from datetime import datetime

# Create a folder named after the current date and time (format YYYYmmdd_HHMMSS)
current_datetime = datetime.now().strftime('%Y%m%d_%H%M%S')
output_folder = folder / 'outputs' / current_datetime
output_folder.mkdir(parents=True, exist_ok=True)

# Save the files in the newly created folder (to_excel needs an engine such as openpyxl)
df.to_excel(output_folder / 'output.xlsx', index=False)
df.to_csv(output_folder / 'output.csv', index=False)
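Wrapped as a function (hypothetical name `export_outputs`), the export step can be reused across runs and returns the folder it wrote to. Two details are assumptions about what is wanted: the CSV is written as `utf-8-sig`, whose byte-order mark helps Excel recognise UTF-8 and display accented characters correctly, and the Excel file can be skipped when no writer engine such as `openpyxl` is installed.

```python
from datetime import datetime
from pathlib import Path

import pandas as pd


def export_outputs(df: pd.DataFrame, base_folder: Path, excel: bool = True) -> Path:
    """Write df to <base_folder>/outputs/<YYYYmmdd_HHMMSS>/ as CSV and, optionally, Excel."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    out_dir = Path(base_folder) / "outputs" / stamp
    out_dir.mkdir(parents=True, exist_ok=True)
    # utf-8-sig prepends a BOM so Excel detects the encoding when opening the CSV
    df.to_csv(out_dir / "output.csv", index=False, encoding="utf-8-sig")
    if excel:
        # Requires an Excel writer engine such as openpyxl
        df.to_excel(out_dir / "output.xlsx", index=False)
    return out_dir
```

Note that two exports within the same second share a folder name, and because of `exist_ok=True` the second one overwrites the first.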