# Examples of Structured Data Extraction 


We start with the basic structured-output syntax for LLMs, then move on to using it with higher-level modules such as a query engine and an agent.

A lot of the underlying behavior around structured outputs is powered by the Pydantic Program modules. 

Check out the [in-depth structured outputs guide](https://docs.llamaindex.ai/en/stable/module_guides/querying/structured_outputs/) for more details.

In [4]:
import nest_asyncio
import sys
import os 

# Sanity check
print(sys.executable)
nest_asyncio.apply()

from dotenv import load_dotenv
load_dotenv() 

/Users/amorvan/Documents/code_dw/llm_collection/.venv/bin/python


True

In [5]:
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import Settings

llm = OpenAI(model="gpt-4o-mini")
embed_model = OpenAIEmbedding(model="text-embedding-3-small")
Settings.llm = llm
Settings.embed_model = embed_model

## 1. Simple Structured Extraction

You can convert any LLM to a "structured LLM" by attaching an output class to it through `as_structured_llm`.

Here we pass a simple `Album` class which contains a list of songs. We can then use the normal LLM endpoints like chat/complete.


In [8]:
from typing import List
from pydantic import BaseModel, Field
from llama_index.core.llms import ChatMessage

class Song(BaseModel):
    """Data model for a song."""

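    # `description=` must be a keyword: the first positional argument of Field is the default value
    title: str = Field(description="The title of the song")
    length_seconds: int = Field(description="The duration of the song in seconds")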


class Album(BaseModel):
    """Data model for an album."""

    name: str
    artist: str
    songs: List[Song]
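As a rough illustration of what an output class contributes, the sketch below inspects the JSON schema generated from the models and validates a hand-written JSON string against them. It assumes Pydantic v2 (`model_json_schema`, `model_validate_json`); the sample JSON is made up for the example and is not real model output. Note that the first positional argument of `Field` is the default value, so the description is passed by keyword here.

```python
from typing import List

from pydantic import BaseModel, Field, ValidationError


class Song(BaseModel):
    """Data model for a song."""

    title: str = Field(description="The title of the song")
    length_seconds: int = Field(description="The duration of the song in seconds")


class Album(BaseModel):
    """Data model for an album."""

    name: str
    artist: str
    songs: List[Song]


# The field descriptions end up in the JSON schema generated from the class.
schema = Album.model_json_schema()
song_props = schema["$defs"]["Song"]["properties"]
print(song_props["title"]["description"])

# Hand-written sample JSON (an assumption for this sketch, not LLM output).
raw_json = (
    '{"name": "Sample Album", "artist": "Sample Artist", '
    '"songs": [{"title": "Track One", "length_seconds": 210}]}'
)
album = Album.model_validate_json(raw_json)
print(album.songs[0].title, album.songs[0].length_seconds)

# A payload with a missing required field is rejected.
rejected = False
try:
    Album.model_validate_json(
        '{"name": "x", "artist": "y", "songs": [{"title": "t"}]}'
    )
except ValidationError:
    rejected = True
print("invalid payload rejected:", rejected)
```

The typed `album` object here is the same kind of object that `output.raw` holds in the cells that follow.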

In [None]:


sllm = llm.as_structured_llm(output_cls=Album)
input_msg = ChatMessage.from_str("Generate an example album from The Shining")

#### Sync

In [None]:
output = sllm.chat([input_msg])
# get actual object
output_obj = output.raw

In [None]:
print(str(output))
print(output_obj)

#### Async

In [None]:
output = await sllm.achat([input_msg])
# get actual object
output_obj = output.raw
print(str(output))
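The top-level `await` above works because the code runs in a notebook. In a plain Python script, the call would need to live inside a coroutine that is driven by `asyncio.run`. The sketch below shows that pattern with a hypothetical `fake_achat` coroutine standing in for `sllm.achat`, so it runs without an API key.

```python
import asyncio


async def fake_achat(prompt: str) -> str:
    """Hypothetical stand-in for `sllm.achat`; returns a canned string."""
    await asyncio.sleep(0)  # yield control once, as a real network call would
    return f"response to: {prompt}"


async def main() -> str:
    # In a script, `await` is only valid inside a coroutine like this one.
    return await fake_achat("Generate an example album from The Shining")


result = asyncio.run(main())
print(result)
```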

#### Streaming

In [None]:
from IPython.display import clear_output
from pprint import pprint

stream_output = sllm.stream_chat([input_msg])
for partial_output in stream_output:
    clear_output(wait=True)
    pprint(partial_output.raw.dict())

# the last item yielded by the stream holds the complete object
output_obj = partial_output.raw
print(str(partial_output))

#### Async Streaming

In [None]:
from IPython.display import clear_output
from pprint import pprint

stream_output = await sllm.astream_chat([input_msg])
async for partial_output in stream_output:
    clear_output(wait=True)
    pprint(partial_output.raw.dict())

### 1.b Use the `structured_predict` Function

Instead of calling `llm.as_structured_llm(...)` explicitly, you can use the `structured_predict` function that every LLM class provides. It lets you call the LLM with a prompt template and template variables and get a structured output back in one line of code.

In [None]:
# build a chat prompt template with a {movie_name} variable
from llama_index.core.prompts import ChatPromptTemplate
from llama_index.core.llms import ChatMessage
from llama_index.llms.openai import OpenAI

chat_prompt_tmpl = ChatPromptTemplate(
    message_templates=[
        ChatMessage.from_str(
            "Generate an example album from {movie_name}", role="user"
        )
    ]
)


album = llm.structured_predict(
    Album, chat_prompt_tmpl, movie_name="Lord of the Rings"
)
album
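Conceptually, the call above combines three steps: fill in the template variables, call the model, and validate the reply against the output class. The sketch below mimics those steps with a hypothetical `fake_llm` function that returns canned JSON. It is not how LlamaIndex implements `structured_predict`, only an illustration of the idea, and it assumes Pydantic v2.

```python
import json
from typing import List, Type, TypeVar

from pydantic import BaseModel

T = TypeVar("T", bound=BaseModel)


class Song(BaseModel):
    title: str
    length_seconds: int


class Album(BaseModel):
    name: str
    artist: str
    songs: List[Song]


def fake_llm(prompt: str) -> str:
    """Hypothetical model call: returns canned JSON and echoes the prompt
    into the `name` field so the template substitution is visible."""
    return json.dumps(
        {
            "name": prompt,
            "artist": "Sample Artist",
            "songs": [{"title": "Track One", "length_seconds": 180}],
        }
    )


def structured_predict_sketch(
    output_cls: Type[T], template: str, **variables: str
) -> T:
    prompt = template.format(**variables)  # 1. fill in the template variables
    raw = fake_llm(prompt)  # 2. call the (fake) model
    return output_cls.model_validate_json(raw)  # 3. validate into the class


album = structured_predict_sketch(
    Album, "Generate an example album from {movie_name}", movie_name="Lord of the Rings"
)
print(album.name)
print(len(album.songs))
```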

## 2. Exercise

Tell a story step by step. 

For each part of the story, there should be 
- The story part
- A description of the image that will go with this part of the story


Do it with a Pydantic class

In [1]:
from typing import List
from pydantic import BaseModel, Field


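class StoryChunk(BaseModel):
    """Data model for a chunk of a story."""
    # `description=` must be a keyword: the first positional argument of Field is the default value
    text: str = Field(description="The text of the story chunk")
    image_description: str = Field(description="A description of the image that goes with this chunk")


class Story(BaseModel):
    """A story is made of several chunks, each with a text and an image description."""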
    chunks: List[StoryChunk]

In [9]:
sllm = llm.as_structured_llm(output_cls=Story)
input_msg = ChatMessage.from_str("Generate a story about a bird")

# input_msg is a ChatMessage, so use the chat endpoint (complete expects a plain string prompt)
story = sllm.chat([input_msg])

In [12]:
story.raw.chunks[0]

StoryChunk(text='Once upon a time in a small village, there lived a kind-hearted girl named Lily. She loved to help others and spent her days tending to the village garden, where flowers bloomed in vibrant colors.', image_description='A picturesque village garden filled with colorful flowers and a girl tending to them.')

In [13]:
story.raw.chunks[1]

StoryChunk(text='One day, while watering the plants, Lily discovered a tiny, injured bird lying on the ground. Without hesitation, she gently picked it up and took it home to care for it.', image_description='A close-up of a girl holding a small injured bird in her hands, looking concerned.')
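Rather than indexing chunks one at a time, the whole story can be walked in a loop. The sketch below does this on a hand-built `Story` with placeholder text, which stands in for `story.raw` so that it runs without calling the LLM.

```python
from typing import List

from pydantic import BaseModel, Field


class StoryChunk(BaseModel):
    """Data model for a chunk of a story."""

    text: str = Field(description="The text of the story chunk")
    image_description: str = Field(
        description="A description of the image that goes with this chunk"
    )


class Story(BaseModel):
    """A story is made of several chunks."""

    chunks: List[StoryChunk]


# Placeholder content (an assumption for this sketch, not real model output).
story = Story(
    chunks=[
        StoryChunk(
            text="A small bird leaves its nest.",
            image_description="A bird perched on the edge of a nest.",
        ),
        StoryChunk(
            text="It learns to fly over the village.",
            image_description="A bird gliding above rooftops.",
        ),
    ]
)

# Print each part together with its image description.
lines = []
for i, chunk in enumerate(story.chunks, start=1):
    lines.append(f"Part {i}: {chunk.text}")
    lines.append(f"  Image: {chunk.image_description}")
print("\n".join(lines))
```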