# Constraining Large Language Models with Guidance
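In this notebook, we'll learn how to use Guidance, a programming paradigm for controlling language model output. Guidance lets us constrain what the model generates (for example, to a fixed set of options or to a regular expression) and interleave generation with ordinary Python code, which offers more control, and often better efficiency, than conventional prompting and chaining.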

In [2]:
!pip install guidance


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.3.1[0m[39;49m -> [0m[32;49m23.3.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [1]:
import guidance
from guidance import models, gen, select, user, assistant, system, substring

In [2]:
model_path = "models/mistral-7b-instruct-v0.1.Q4_K_M.gguf"
llm = models.LlamaCpp(model_path, n_gpu_layers=1)

## Constrained Generation 🔗📏🚫
Let's start by learning how to constrain the output of an emotion detector.

In [3]:
def read_file(file_path):
  with open(file_path, "r") as file:
    return file.read()

In [4]:
text = read_file("data/linkedin.txt")
print(text)

Woohoo, 🎁's starting to come in early this year--just got the acceptance note for my session "Data Contracts In Practice With Debezium and Apache Flink" for #KafkaSummit London '24 🇬🇧. See you there in March!


In [5]:
question = "What is the emotion of the following text and how strong is the emotion on a scale from 1-100?"
lm = llm + f"{question}: {text}"
lm += gen(name='answer')

In [11]:
lm = llm + f"{question}: {text}\n"
lm += "Answer: " + select(['happy', 'sad', 'angry', 'surprised', 'disgusted', 'excited', 'fearful', 'neutral'], name='emotion') 

In [10]:
lm = llm + f"{question}: {text}\n"
lm += "Answer: " + select(['happy', 'sad', 'angry', 'surprised', 'disgusted', 'excited', 'fearful', 'neutral'], name='emotion') 
lm += ", Scale: " + gen(regex=r'\d+', name='strength')

In [75]:
lm['emotion'], lm['strength']

('happy', '90')
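
Note that both captured values are strings, and that `\d+` only guarantees digits: `0` or `1000` would satisfy it even though the prompt asks for a 1-100 scale. Below is a minimal sketch of a tighter pattern, checked with Python's `re` module rather than with Guidance itself. `STRENGTH_PATTERN` and `parse_strength` are hypothetical names, and whether the regex support in your Guidance version accepts this exact pattern is an assumption worth verifying.

```python
import re

# Hypothetical stricter pattern: 100, or 1-99 without leading zeros.
STRENGTH_PATTERN = r"100|[1-9][0-9]?"

def parse_strength(raw: str) -> int:
    """Convert a captured strength string to an int, rejecting values outside 1-100."""
    if re.fullmatch(STRENGTH_PATTERN, raw) is None:
        raise ValueError(f"strength outside 1-100: {raw!r}")
    return int(raw)

# A fixed string stands in for lm['strength'] here.
strength = parse_strength("90")
```

If the pattern works with `gen(regex=...)`, the same string could be passed there so that out-of-range values are never generated in the first place.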

## Reusable Components ♻️🧩🔁
We can put all of that code into a function, which makes it easier to detect the emotion of new pieces of text.

In [8]:
@guidance
def emotion_detector(lm, text):
    # Append to the model state passed in, rather than restarting from the global llm
    lm += f"{question}: {text}\n"
    lm += "Answer: " + select(['happy', 'sad', 'angry', 'surprised', 'disgusted', 'excited', 'fearful', 'neutral'], name='emotion')
    lm += ", Scale: " + gen(regex=r'\d+', name='strength')
    return lm

In [80]:
jack_text = read_file("data/jack.txt")
print(jack_text)

A last-16 US Open finish and a first ATP Tour final have made 2023 Jack Draper's most successful year of his young career so far. But for the 21-year-old Briton, success has been tinged by sadness with his grandmother Brenda - a former tennis player and coach - unable to recognise his achievements. Draper's grandmother has Alzheimer's disease, a condition that causes dementia and the gradual decline of cognitive functioning in the brain.


In [84]:
llm + emotion_detector(jack_text)

In [9]:
llm + emotion_detector(read_file("data/novak.txt"))

## Returning JSON 🔄📄🔡 
Guidance can also generate output in a JSON format, which is useful for connecting with other tools.

In [42]:
@guidance
def emotion_detector_json(lm, text):
    question = "What is the emotion of the following text and how strong is the emotion on a scale from 1-100?"
    emotions = [
        'happy', 'sad', 'angry', 'surprised', 
        'disgusted', 'excited', 'fearful', 'neutral'
    ]
    # Append to the model state passed in, rather than restarting from the global llm
    lm += f"{question}: {text}\n"
    only_numbers_pattern = r'\d+'
    lm += f"""{{
      "text": "{text}",
      "emotion": "{select(emotions, name='answer')}",
      "scale": {gen(regex=only_numbers_pattern, name='strength')}
    }}"""
    return lm

In [43]:
llm + emotion_detector_json("I'm so happy to be here")
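
Because `emotion_detector_json` interpolates `text` directly between double quotes, an input that itself contains a double quote or a newline (the LinkedIn post above does) would produce invalid JSON. One way to guard against this, sketched here with only the standard library, is to let `json.dumps` produce the escaped string literal and use it in place of `"{text}"` in the template. The helper name `json_string` is hypothetical, and fixed values stand in for the parts the model would generate.

```python
import json

def json_string(text: str) -> str:
    """Return text as a JSON string literal, including the surrounding quotes."""
    return json.dumps(text, ensure_ascii=False)

post = 'Got the acceptance note for my session "Data Contracts In Practice"'

# Same shape as the template above; "happy" and 90 are placeholders
# for the values that select() and gen() would fill in.
document = f"""{{
  "text": {json_string(post)},
  "emotion": "happy",
  "scale": 90
}}"""

parsed = json.loads(document)
print(parsed["text"] == post)
```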

In [159]:
import guidance

# define a re-usable "guidance function" that we can use below
@guidance
def quoted_list(lm, name, n):
    for i in range(n):
        if i > 0:
            lm += ", "
        lm += '"' + gen(name, list_append=True, stop='"') + '"'
    return lm

response = llm + f"""What are the most common commands used in the Linux operating system?

Here are the 5 most common commands in JSON format:
{{
    "commands": [{quoted_list('commands', 5)}],
    "my_favorite_command": "{gen('favorite_command', stop='"')}"
}}"""

In [174]:
import json
json.loads(
  str(response).split("Here are the 5 most common commands in JSON format:")[-1].strip()
)

[1m{[0m[32m'commands'[0m: [1m[[0m[32m'cd'[0m, [32m'ls'[0m, [32m'pwd'[0m, [32m'mkdir'[0m, [32m'rm'[0m[1m][0m, [32m'my_favorite_command'[0m: [32m'ls -la'[0m[1m}[0m

In [176]:
text = """
To remove advertisements from a transcript with timestamps, you can follow these general steps
"""

sentiment = llm + 'Question: What is the sentiment of the following text and what is your confidence in that sentiment?'
sentiment += f"{text}"
sentiment += "Answer: " + gen(name="answer")

In [191]:
text = """
I'm happy that you got the job, but sad that you're leaving.
"""

sentiment = llm + 'Question: What is the sentiment of the following text and what is your confidence in that sentiment?'
sentiment += f"{text}"
sentiment += "Answer: " + select(['positive', 'negative', 'neutral'], name='answer') + ", Confidence:" + gen('whole', regex=r"(0(\.\d+)?|1(\.0+)?)")

In [39]:
sentiment['answer']

[32m'neutral'[0m

In [77]:
texts = [
  """Woohoo, 🎁's starting to come in early this year--just got the acceptance note for my session "Data Contracts In Practice With Debezium and Apache Flink" for #KafkaSummit London '24 🇬🇧. See you there in March!""",
  "To remove advertisements from a transcript with timestamps, you can follow these general steps"
]

for text in texts:  
  sentiment = llm + 'Question: What is the sentiment of the following text and why?\n'
  sentiment += f"{text}\n"
  sentiment += "Answer: " + select(['positive', 'negative', 'neutral'], name='answer')
  print(text, sentiment['answer'])

In [76]:
sentiment['answer']

In [72]:
llm + 'Question: What is the sentiment of the following text and why?\n' + f"{text}\nAnswer:" + gen()

In [79]:
with user():
    lm = llm + "What is the capital of England?"

with assistant():
    lm += gen("capital")

with user():
    lm += "What is the most well known landmark?"

with assistant():
    lm += gen("fact")

with user():
    lm += "What is a sporting event held there?"

with assistant():
    lm += gen("event")

In [65]:
lm['capital'], lm['fact'], lm['event']


[1m([0m
    [32m'The capital of England is London.'[0m,
    [32m'One of the most well-known landmarks in London, England is the Buckingham Palace. It is the official residence and administrative headquarters of the monarch of the United Kingdom.'[0m,
    [32m"The London Marathon is an annual sporting event held in London, England. It is one of the world's most famous marathons and attracts thousands of runners from around the world. The marathon starts and finishes in Green Park, near Buckingham Palace, and takes runners through the city's iconic landmarks, including the Tower of London and Big Ben."[0m
[1m)[0m