# Validating LLM Outputs with Pydantic

!!! note
    To download this example as a Jupyter notebook, click [here](https://github.com/ShreyaR/guardrails/blob/main/docs/integrations/pydantic_validation.ipynb).

In this example, we will use Guardrails with Pydantic.

## Objective

We want to generate synthetic data that is consistent with a `Person` Pydantic BaseModel.

In [1]:
import guardrails as gd

from rich import print

## Step 1: Create the RAIL Spec

Ordinarily, we would create a RAIL spec in a separate file. For the purposes of this example, we will create the spec in this notebook as a string following the RAIL syntax. For more information on RAIL, see the [RAIL documentation](../rail/output.md).

Here, we define a Pydantic model for a `Person` with the following fields:

- `name`: a string  
- `age`: an integer  
- `zip_code`: a string zip code

and write very simple validators for the fields as an example. To show how LLM reasking can be used to generate data that is consistent with the Pydantic model, we also define a validator that requires a zip code in California (and is perversely opposed to the "90210" zip code). If this validator fails, the error message is sent back to the LLM, which is then reasked.

This Pydantic model could also be any model that you already have in your codebase; it just needs to be decorated with `@register_pydantic`.


To use this model in the `<output>` specification, we use the special
`pydantic` tag. This tag takes the name of the Pydantic model, as well as the
`on-fail-pydantic` attribute, which specifies what to do when the output
does not validate against the Pydantic model.

In [2]:
rail_str = """
<rail version="0.1">

<script language="python">
from guardrails.utils.pydantic_utils import register_pydantic
from pydantic import BaseModel, validator

@register_pydantic
class Person(BaseModel):
    '''
    Information about a person.

    Args:
        name (str): The name of the person.
        age (int): The age of the person.
        zip_code (str): The zip code of the person.
    '''
    name: str
    age: int
    zip_code: str

    @validator("zip_code")
    def zip_code_must_be_numeric(cls, v):
        if not v.isnumeric():
            raise ValueError("Zip code must be numeric.")
        return v

    @validator("age")
    def age_must_be_between_0_and_150(cls, v):
        if not 0 &lt;= v &lt;= 150:
            raise ValueError("Age must be between 0 and 150.")
        return v

    @validator("zip_code")
    def zip_code_in_california(cls, v):
        if not v.startswith("9"):
            raise ValueError("Zip code must be in California, and start with 9.")
        if v == "90210":
            raise ValueError("Zip code must not be Beverly Hills.")
        return v

</script>

<output>
    <list name="people" description="A list of 3 people.">
        <pydantic description="Information about a person." model="Person" on-fail-pydantic="reask" />
    </list>
</output>


<prompt>
Generate data for possible users in accordance with the specification below.

@xml_prefix_prompt

{output_schema}

@complete_json_suffix_v2</prompt>

</rail>
"""

## Step 2: Create a `Guard` object with the RAIL Spec

We create a `gd.Guard` object that will check, validate and correct the output of the LLM. This object:

1. Enforces the quality criteria specified in the RAIL spec.
2. Takes corrective action when the quality criteria are not met.
3. Compiles the schema and type info from the RAIL spec and adds it to the prompt.

In [3]:
guard = gd.Guard.from_rail_string(rail_str)

We can inspect the prompt that will be sent to the LLM.

In [4]:
print(guard.base_prompt)

!!! note
    Notice that the prompt replaces the `pydantic` tag with the schema, validator, and type information from the Pydantic model. For example, this tells the LLM that the output must satisfy `zip-code-must-be-numeric` and `zip-code-in-california`. Guardrails will even automatically read the docstrings from the Pydantic model and add them to the prompt!

## Step 3: Wrap the LLM API call with `Guard`

In [5]:
import openai

raw_llm_response, validated_response = guard(
    openai.Completion.create,
    engine="text-davinci-003",
    max_tokens=512,
    temperature=0.5,
    num_reasks=2,
)

In [6]:
print(validated_response)

The `guard` wrapper returns the `raw_llm_response` (which is a simple string), and the validated and corrected output (which is a dictionary).

We can see that the output is a dictionary with the correct schema and contains a few `Person` objects!

We can even print out the logs of the most recent call. Notice that on the first attempt the LLM returns a Beverly Hills zip code, so it is sent the error message and reasked. On the second attempt, the LLM returns a valid zip code and the output is returned.

In [8]:
print(guard.state.most_recent_call)
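
Conceptually, the reask behaviour visible in these logs is a validate-and-retry loop. The sketch below illustrates that idea only; it is not Guardrails' implementation. `fake_llm`, `validate_zip`, and `generate_with_reasks` are hypothetical names, the reask prompt wording is invented, and the canned responses stand in for real LLM calls.

```python
def validate_zip(zip_code: str):
    """Return an error message, or None if the zip code is accepted."""
    if not zip_code.startswith("9"):
        return "Zip code must be in California, and start with 9."
    if zip_code == "90210":
        return "Zip code must not be Beverly Hills."
    return None


def fake_llm(prompt: str) -> str:
    """Stand-in for an LLM: answers 90210 until the prompt carries an error."""
    return "94105" if "Error:" in prompt else "90210"


def generate_with_reasks(llm, prompt: str, num_reasks: int = 2):
    """Call `llm`; on a validation failure, retry with the error in the prompt."""
    history = []
    current_prompt = prompt
    # One initial call plus up to `num_reasks` retries.
    for _ in range(num_reasks + 1):
        output = llm(current_prompt)
        error = validate_zip(output)
        history.append((output, error))
        if error is None:
            return output, history
        # Feed the rejected answer and its error message back to the model.
        current_prompt = (
            f"{prompt}\nPrevious answer: {output}\nError: {error}\n"
            "Please correct the answer."
        )
    # Every attempt failed validation.
    return None, history


result, history = generate_with_reasks(fake_llm, "Give me a California zip code.")
print(result)
print(history)
```

In this toy run, the first answer is rejected and the second is accepted, which matches the pattern described for the logged call: one failed attempt, one reask, then a valid result.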