# Extracting entities from a Terms of Service document

!!! note
    To download this example as a Jupyter notebook, click [here](https://github.com/ShreyaR/guardrails/blob/main/docs/examples/extracting_entities.ipynb).

In this example, we will use Guardrails to extract key information from a Terms-of-Service document.

## Objective

We want to extract structured information about all fees and interest rates associated with the Chase credit card.

## Step 0: Download the PDF and load it as a string

To get started, download the document from [here](https://github.com/ShreyaR/guardrails/blob/main/docs/examples/data/chase_card_agreement.pdf) and save it in `data/chase_card_agreement.pdf`.

Guardrails has some built-in functions to help with common tasks. Here, we will use the `read_pdf` function to load the PDF as a string.

In [None]:
import guardrails as gd

from rich import print

content = gd.docs_utils.read_pdf("data/chase_card_agreement.pdf")

print(f"Chase Credit Card Document:\n\n{content[:275]}\n...")

## Step 1: Create the RAIL Spec

Ordinarily, we would create a RAIL spec in a separate file. For the purposes of this example, we will create the spec in this notebook as a string following the RAIL syntax. For more information on RAIL, see the [RAIL documentation](../rail/output.md).

Here, we request:

1. A list of the fees associated with the card. For each fee we ask for several fields (`index`, `name`, `explanation`, and `value`), some with their own quality criteria and corrective actions.
2. An object (i.e. key-value pairs) for the interest rates.

In [None]:
rail_str = """
<rail version="0.1">

<output>

    <list name="fees" description="What fees and charges are associated with my account?">
        <object>
            <integer name="index" format="1-indexed" />
            <string name="name" format="lower-case; two-words" on-fail-lower-case="noop" on-fail-two-words="reask"/>
            <string name="explanation" format="one-line" on-fail-one-line="noop" />
            <float name="value" format="percentage"/>
        </object>
    </list>
    <object name="interest_rates" description="What are the interest rates offered by the bank on savings and checking accounts, loans, and credit products?" />
</output>


<prompt>
Given the following document, answer the following questions. If the answer doesn't exist in the document, enter 'None'.

${document}

${gr.xml_prefix_prompt}

${output_schema}

${gr.json_suffix_prompt_v2_wo_none}</prompt>

</rail>
"""

## Step 2: Create a `Guard` object with the RAIL Spec

We create a `gd.Guard` object that will check, validate and correct the output of the LLM. This object:

1. Enforces the quality criteria specified in the RAIL spec.
2. Takes corrective action when the quality criteria are not met.
3. Compiles the schema and type info from the RAIL spec and adds it to the prompt.
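
The check-then-correct behavior in the list above can be pictured as a small loop. This is a conceptual sketch only, not Guardrails' implementation: `validate`, `run_with_reask`, and `fake_llm` are hypothetical names, and the stand-in "LLM" simply returns canned strings.

```python
def validate(name):
    """Return the names of the checks that a fee name fails."""
    failures = []
    if len(name.split()) != 2:
        failures.append("two-words")
    return failures


def run_with_reask(llm, prompt, max_reasks=1):
    """Call the LLM, validate the output, and re-prompt while validation fails."""
    output = llm(prompt)
    for _ in range(max_reasks):
        failures = validate(output)
        if not failures:
            break
        # Tell the model which checks failed and ask again.
        prompt = f"{prompt}\nPrevious answer {output!r} failed: {', '.join(failures)}. Try again."
        output = llm(prompt)
    return output


def fake_llm(prompt):
    """Hypothetical stand-in for an LLM: shortens its answer when re-prompted."""
    return "annual fee" if "failed" in prompt else "annual membership fee"


print(run_with_reask(fake_llm, "Name one fee."))
```

The first answer has three words and fails the check, so the loop re-prompts once and the second answer is accepted.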

In [None]:
guard = gd.Guard.from_rail_string(rail_str)

When the guard is created, Guardrails warns that a few of the formatters in the spec are not supported. These formatters won't be enforced on the output, but they are still used when generating the prompt.

Below is the prompt that will be sent to the LLM. The `${document}` placeholder is replaced with the user-provided value at runtime.

In [None]:
print(guard.base_prompt)
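
The runtime substitution of `${document}` can be sketched with Python's standard `string.Template`, which uses the same `${name}` placeholder syntax. This is only an illustration under that assumption; Guardrails' own substitution logic may differ, and the shortened template and the document text below are hypothetical stand-ins.

```python
from string import Template

# Hypothetical, shortened stand-in for the prompt in the RAIL spec.
prompt_template = Template(
    "Given the following document, answer the following questions.\n\n"
    "${document}\n\n"
    "${output_schema}"
)

# safe_substitute fills the placeholders it is given and leaves the others untouched.
rendered = prompt_template.safe_substitute(document="<text of the card agreement>")
print(rendered)
```

Only `${document}` is filled in here; `${output_schema}` stays in place because no value was supplied for it.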

## Step 3: Wrap the LLM API call with `Guard`

In [None]:
import openai

raw_llm_response, validated_response = guard(
    openai.Completion.create,
    prompt_params={"document": content[:6000]},
    engine="text-davinci-003",
    max_tokens=2048,
    temperature=0,
)

The `guard` wrapper returns the raw LLM response (`raw_llm_response`, a plain string) and the validated and corrected output (`validated_response`, a dictionary).

We can see that the output is a dictionary with the correct schema and types.

In [None]:
print(validated_response)
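
If you want to double-check the shape of the result yourself, a small helper along these lines could be used. This is a sketch, not part of Guardrails: `check_fees` is a hypothetical function, and `sample` holds placeholder values rather than real output from the document.

```python
def check_fees(response: dict) -> list:
    """Collect human-readable problems found in the `fees` part of a response."""
    expected = {"index": int, "name": str, "explanation": str, "value": float}
    fees = response.get("fees")
    if not isinstance(fees, list):
        return ["'fees' should be a list"]
    problems = []
    for position, fee in enumerate(fees):
        if not isinstance(fee, dict):
            problems.append(f"fees[{position}] is not an object")
            continue
        for key, expected_type in expected.items():
            if key not in fee:
                problems.append(f"fees[{position}] is missing '{key}'")
            elif not isinstance(fee[key], expected_type):
                problems.append(f"fees[{position}]['{key}'] is not {expected_type.__name__}")
    return problems


# Hypothetical response with placeholder values, NOT real output from the document.
sample = {
    "fees": [
        {"index": 1, "name": "example fee", "explanation": "Placeholder text.", "value": 0.0},
        {"index": "2", "name": "other fee", "explanation": "Placeholder text."},
    ],
    "interest_rates": {},
}
print(check_fees(sample))
```

The second placeholder entry is deliberately malformed, so the helper reports a wrongly typed `index` and a missing `value`.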

To inspect the steps of the most recent call, display its call tree.

In [None]:
guard.state.most_recent_call.tree