> ℹ️ Adapted from the Guidance documentation https://guidance.readthedocs.io

We will be learning about constrained decoding using the `guidance` library.

In [1]:
!pip install guidance
!pip install --upgrade jsonschema

Defaulting to user installation because normal site-packages is not writeable
Defaulting to user installation because normal site-packages is not writeable
Requirement already up-to-date: jsonschema in /home/mattfinlays/.local/lib/python3.8/site-packages (4.23.0)


In [2]:
import guidance

`guidance` can load models from several backends, such as Hugging Face Transformers, OpenAI, and llama.cpp.

In [3]:
# Mistral download link:
# https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q8_0.gguf
# Put this model in the current directory.
# model = guidance.models.LlamaCpp("mistral-7b-instruct-v0.2.Q8_0.gguf", n_gpu_layers=-1, n_ctx=4096)
model = guidance.models.Transformers("meta-llama/Llama-3.2-1b") # Alternative
# model = guidance.models.Transformers("gpt2") # Use this one if you don't have resources to download/run a big model

In `guidance` we use the `+` operator to give the model a prompt.

In [4]:
llm = model + "Taylor Swift is"

# Temperature sampling and the diversity-coherence tradeoff

We can then generate from the model using the `gen` function.
You can pass decoding parameters like `temperature` here. 

> 📝 The library does not yet support truncation sampling parameters such as top-p or top-k, though top-p appears to be on the roadmap.

In [5]:
llm + guidance.gen(max_tokens=100, temperature=1.0)

Compare the above generation with the greedy generation (temperature 0) below:

In [6]:
llm + guidance.gen(max_tokens=100, temperature=0)

This illustrates the diversity-coherence trade-off. Sampling at temperature 1.0 produces more varied text but is likely to drift into incoherence, whereas greedy decoding stays coherent but tends to be repetitive and dull.
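
To make the role of temperature concrete, here is a minimal sketch of how temperature rescales next-token scores before sampling. The logits are made-up numbers and `softmax_with_temperature` is a hypothetical helper, not part of `guidance`; it only illustrates the standard formula.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, flattened or sharpened by temperature."""
    if temperature <= 0:
        # Treat temperature 0 as greedy decoding: all mass on the top-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.1]
for t in (1.0, 0.5, 0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

Lower temperatures concentrate probability on the top-scoring token, which is why greedy output is more predictable.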

# Forcing QA behavior in base LLMs
LLM developers often use post-training (instruction fine-tuning and reward modeling) in order to elicit chatbot-like behavior from LLMs.
Base LLMs, on the other hand, often struggle with tasks like question answering.
Nevertheless, we can use `guidance` to force base language models to adhere to a QA template.

In [7]:
query = "Who won the last Kentucky derby and by how much?"
lm = model + f'''\
Q: {query}
A: {guidance.gen(name="answer", stop=["Q:", "A:"], temperature=0.8, max_tokens=100)}'''

You can use the `name` keyword argument to capture the generation and then retrieve it by indexing the resulting model object with that name.

In [8]:
lm["answer"]

'The 40-1 shot called Justify took the 1 1/8-mile track race with a length victory in the 144th running of the Kentucky Derby on Saturday at Churchill Downs.\n'
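
The `stop` argument in the template keeps a base model from continuing with further question/answer pairs of its own. As a rough illustration of the idea (not `guidance`'s actual implementation, which is assumed here to stop during decoding rather than trimming afterwards), the hypothetical helper below cuts a raw continuation at the earliest stop string.

```python
def truncate_at_stop(text, stop_strings):
    """Return text up to (not including) the earliest occurrence of any stop string."""
    cut = len(text)
    for s in stop_strings:
        idx = text.find(s)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

# Made-up raw continuation in which the model starts a new Q/A pair.
raw = " Some answer.\nQ: Another question?\nA: Another answer."
print(repr(truncate_at_stop(raw, ["Q:", "A:"])))
```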

# Enforcing valid JSON outputs
When integrating LLMs into larger systems, it is often desirable to obtain *structured* outputs in the form of JSON objects. For instance, when generating characters for a game, one might want a JSON object containing the character information.

In [9]:
import json
import jsonschema

In [10]:
character_schema = """{
    "type": "object",
    "properties": {
        "description" : { "type" : "string" },
        "name" : { "type" : "string" },
        "age" : { "type" : "integer" },
        "armour" : { "type" : "string", "enum" : ["leather", "chainmail", "plate"] },
        "weapon" : { "type" : "string", "enum" : ["sword", "axe", "mace", "spear", "bow", "crossbow"] },
        "class" : { "type" : "string" },
        "mantra" : { "type" : "string" },
        "strength" : { "type" : "integer" },
        "quest_items" : { "type" : "array", "items" : { "type" : "string" } }
    }
}
"""
character_schema_obj = json.loads(character_schema)

Without constraints, language models can struggle to adhere to JSON schemas, even with prompting. Depending on the model you use, you may see extra text after the JSON, output that does not comply with the schema, or a number of other pathologies.

In [11]:
llm = model + (
    "Character descriptions follow this JSON schema:\n\n"
    f"{character_schema}\n\n"
    "Here is a JSON for a character with the description"
    ' "A quick and nimble fighter":'
) + guidance.gen(max_tokens=200, name="json output")
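try:
    # json.loads only checks that the text is syntactically valid JSON;
    # it does not check the output against the schema.
    json.loads(llm["json output"])
except json.JSONDecodeError:
    print("Failed: Invalid JSON")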

Failed: Invalid JSON
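
One common failure is a valid JSON object followed by extra text. Without constrained decoding, a typical workaround is to salvage the first parseable object after the fact. The sketch below does this with the standard library's `json.JSONDecoder.raw_decode`; `first_json_object` and the sample string are hypothetical, and this does nothing to guarantee schema compliance.

```python
import json

def first_json_object(text):
    """Return the first JSON object embedded in text, or None if there is none."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores whatever follows it.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            start = text.find("{", start + 1)
    return None

# Made-up model output: a valid object followed by extra text.
noisy = '\n{"name": "Kira", "age": 27}\n\nHere is another character:'
print(first_json_object(noisy))
```

Post-hoc repair like this is brittle, which motivates constraining the decoder directly.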


The `guidance` library supports template-based enforcement of JSON schemas for this purpose.

In [None]:
model + "Create a character with the description \"A quick and nimble fighter\"\n" + guidance.json(schema=character_schema_obj, name="next character", temperature=1.0)

The resulting output now fits the schema without fail!
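
To give some intuition for why constrained decoding cannot produce an invalid value, here is a toy sketch of the masking idea applied to a single `enum` field. It is not how `guidance` is implemented: the vocabulary, the scores, and `constrained_greedy` are all hypothetical, and the sketch assumes non-empty tokens and that no allowed string is a prefix of another.

```python
def constrained_greedy(scores_fn, vocab, allowed_strings):
    """Greedily build a string, only picking tokens that keep it a prefix of an allowed string."""
    out = ""
    while out not in allowed_strings:
        # Mask: keep only tokens that can still lead to an allowed string.
        valid = [t for t in vocab
                 if any(s.startswith(out + t) for s in allowed_strings)]
        if not valid:
            raise ValueError(f"no token extends {out!r}")
        scores = scores_fn(out)
        out += max(valid, key=lambda t: scores.get(t, float("-inf")))
    return out

# Toy vocabulary and fixed scores; the "model" prefers the invalid token "gun".
vocab = ["gun", "sw", "ord", "ax", "e", "bow"]
scores = {"gun": 5.0, "sw": 3.0, "ax": 2.0, "bow": 1.0, "ord": 0.5, "e": 0.1}
allowed = ["sword", "axe", "bow"]

unconstrained = max(vocab, key=scores.get)
result = constrained_greedy(lambda prefix: scores, vocab, allowed)
print(unconstrained, result)
```

The model's scores still decide among the valid options; the constraint only removes choices that could never complete to an allowed value.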