In [79]:
import json
import numpy as np
from random import uniform, choices

# 1 Toy model of LLM generation

For pedagogical reasons, we are going to make the following simplifications throughout this tutorial:

1. an LLM has an `LLM_VOCABULARY` made up of `V = len(LLM_VOCABULARY)` distinct words
2. at each time step, `t`, the LLM outputs a list of float values of size `V` which we will always call `logits`
3. we use the LLM as a "next word predictor", which means we select at each time step `t` the word in `LLM_VOCABULARY` which has the largest associated logit "score" in the `logits` list at step `t`

### Exercise 1.1 - Determine the word predicted by the LLM

- Our first basic LLM has a `LLM_VOCABULARY` of only `V = 4` different words.
- You have a `logits` list of `V = 4` float values

**Question: which of the 4 words is being predicted by the LLM in this case?**

In [80]:
LLM_VOCABULARY = ["Hello", "Bretagne", "Orange", "Python"]
#    index:          0         1          2         3

logits = [0.7, 1.4, 572.3, 0.9]
# index:   0    1     2     3

**Answer:**

<details>
<summary>Exercise 1.1 - Click here for answer</summary>
Exercise 1.1

The word being predicted is "Orange", because the highest logit value in `logits` (which is `572.3`) occurs at index `idx = 2`, which is the index of the word "Orange" in `LLM_VOCABULARY`.

In other words, `LLM_VOCABULARY[index of highest logit value] == "Orange"`.
</details>


---

## Getting the next word prediction with code

We can code a function in Python to get the index of the highest value in `logits` as follows:

```python
def get_index_of_highest_value(logits_list):
    idx_highest, value_highest = max(enumerate(logits_list), key=lambda x: x[1])

    return idx_highest
```

There is also a helpful implementation in most array libraries, such as NumPy, which is universally called `argmax` and which we will use from now on:

```python
idx_highest = np.argmax(logits_list)
```
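
One detail worth knowing: when several logits share the highest value, both approaches return the index of the first occurrence. The short check below uses a made-up `tied_logits` list (an assumption for illustration, not data from this tutorial):

```python
import numpy as np

def get_index_of_highest_value(logits_list):
    idx_highest, _ = max(enumerate(logits_list), key=lambda x: x[1])
    return idx_highest

# Hypothetical logits where indices 1 and 3 tie for the highest value
tied_logits = [0.2, 5.0, 1.3, 5.0]

idx_with_function = get_index_of_highest_value(tied_logits)
idx_with_argmax = int(np.argmax(tied_logits))

# Both approaches pick the first of the tied positions
print(idx_with_function, idx_with_argmax)
```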




### Exercise 1.2 - Using argmax in code

- Use both approaches outlined above to obtain with code the same answer you found earlier for the sample list of words and logits

In [81]:
def get_index_of_highest_value(logits_list):
    idx_highest, value_highest = max(enumerate(logits_list), key=lambda x: x[1])

    return idx_highest

# use the get_index_of_highest_value() function on the previous 'logits' to print the same result found manually earlier
idx_highest_with_function = get_index_of_highest_value(logits) 

# also use the in-built NumPy argmax function
idx_highest_with_argmax = np.argmax(logits)

# check that both answers agree with the result you found earlier:
print("Following values should be equal to each other:", idx_highest_with_function, idx_highest_with_argmax)

# check that using this index on the LLM_VOCABULARY gives the word result you found earlier:
next_word_prediction = LLM_VOCABULARY[ idx_highest_with_argmax ]
print("LLM next word prediction is :", next_word_prediction)

Following values should be equal to each other: 2 2
LLM next word prediction is : Orange


# 2 First steps with structured generation

Structured generation involves **modifying or postprocessing the `logits` list to reflect a requirement or constraint**.

For the rest of this section, we are going to work with **GPT-000** which has an `LLM_VOCABULARY` of `V = 37` words.


In [82]:
GPT_000_LLM_VOCABULARY = [
    "aeroplane", "arsenic",
    "book",
    "car", "crepe",
    "digital",
    "elephant",
    "football",
    "giraffe",
    "helicopter", "hamburger",
    "it", "internet",
    "japan",
    "kanban",
    "laptop",
    "microscope", "mercury", "mobile",
    "nitrogen", "no",
    "open",
    "pterodactyl", "pizza",
    "queen",
    "road",
    "server", "salad",
    "telephone",
    "university", "uranium",
    "voice",
    "water",
    "xylem", "xray",
    "yes",
    "zebra",
]

In [83]:
print(len(GPT_000_LLM_VOCABULARY))

37


## First use case for structured generation and GPT-000

Pretend that we prompt **GPT-000** to generate a suggestion for what to eat for lunch. Let's see how GPT-000 behaves:

In [84]:
pretend_prompt = "Hello GPT-000, please give me a suggestion for my lunch today!"

# Pretend that you sent this prompt to GPT-000...
#
#    * time passes... *
#
#         * time passes... *
#
#                       * time passes... *
#
# At last, GPT-000 has calculated the following logits in response to this prompt:

gpt_000_logits = [3.9, 7.4, 8.6, 5.1, 7.3, 3.7, 5.5, 6.2, 13890.3, 5.8, 6.7, 6.3, 2.6, 4.5, 8.5, 1.8, 0.7, 3.6, 4.2, 6.3, 8.7, 3.9, 1.5, 4.4, 7.6, 3.3, 8.6, 1.7, 7.5, 5.1, 4.0, 4.4, 2.8, 8.6, 0.1, 7.7, 8.6]

### Exercise 2.1 - So, what does GPT-000 want us to eat for lunch?

- Using the `argmax` function from the previous section, find the index of the largest logit value
- Then, use this index to access the corresponding word in `GPT_000_LLM_VOCABULARY`

In [85]:
gpt_000_index_highest = "FIX_ME" # ???????????? use argmax here

gpt_000_next_word_prediction = "FIX_ME" # ???????????? use your index_highest result here

print("-- GPT-000 output --")
print("Sure! Here is what I suggest you eat for lunch:", gpt_000_next_word_prediction)

-- GPT-000 output --
Sure! Here is what I suggest you eat for lunch: FIX_ME


**Answer:**

<details>
<summary>Exercise 2.1 - Click here for answer</summary>
Exercise 2.1

```python
gpt_000_index_highest = np.argmax(gpt_000_logits)

gpt_000_next_word_prediction = GPT_000_LLM_VOCABULARY[gpt_000_index_highest]
```

**GPT-000 wants you to eat a giraffe!**
</details>


---

## Oh no! What a disaster!

It seems that GPT-000 was not trained very well, and is giving us bad predictions.

- One way that you could try to improve the results is with extensive prompt engineering: "You are an expert chef. Please give me a lunch suggestion. Please ensure that it is not a toxic chemical. Check your answer against endangered species list." etc.

**But structured generation offers a far more certain approach:**

1. if we **know that the answer should have a specific property**
2. and if we have access to the **list of all possible words in the LLM's vocabulary**
3. then we can **mask/filter all the possible words, and only keep the ones that satisfy the specific property that we require**

Let's walk through these steps

## Implementing single-word structured generation

In this specific case, we certainly want our answer to be **only an edible lunch**, so let's define this with a function:

```python
def word_represents_edible_lunch(word):
    # True only for the vocabulary words that are an edible lunch
    return word in {
        "crepe",
        "hamburger",
        "pizza",
        "salad",
    }
```

Now here comes the main idea:

1. **if we now apply this function to the `GPT_000_LLM_VOCABULARY` list, we will get a "MASK" of `True/False` values that is `True` only for the words that we want**
2. then, when we look at the LLM `logits` we can choose to **only select from the indices corresponding to the `True` positions in the MASK**

One way to implement step 2 is to set the logit values in `logits` that correspond to `False` positions to a **large negative value, far below any logit the model produces, which ensures they will never be selected by the `argmax` function later** (as long as at least one position in the MASK is `True`).

Let's do this in code now:



In [86]:
def word_represents_edible_lunch(word):
    # True only for the vocabulary words that are an edible lunch
    return word in {
        "crepe",
        "hamburger",
        "pizza",
        "salad",
    }
    
# -------------------------------------

# 1 - apply the selector function over all of the words in our vocabulary    
is_edible_lunch_mask = [word_represents_edible_lunch(word) for word in GPT_000_LLM_VOCABULARY]

# This is a list of True / False, of the same size V = len(GPT_000_LLM_VOCABULARY) as the LLM's vocabulary
# There should only be True in 4 places - the indices corresponding to crepe/hamburger/pizza/salad
#print(is_edible_lunch_mask)

# -------------------------------------

# 2 - use this mask to:
#     a) KEEP the logit value of a word, if that word is in a True is_edible_lunch position
#     b) MODIFY the logit value to -1000, if that word is in a False is_edible_lunch position
modified_gpt_000_logits = []

for idx, is_edible_lunch in enumerate(is_edible_lunch_mask):
    if is_edible_lunch:
        # case a)
        original_logit_value = gpt_000_logits[idx]
        # KEEP the logit value in this case
        modified_gpt_000_logits.append(original_logit_value)
    else:
        # case b)
        modified_logit_value = -1000
        # MODIFY the logit value to a large negative value, let's say: -1000
        modified_gpt_000_logits.append(modified_logit_value)
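
To see why a large negative logit can never win, it can help to turn logits into probabilities with the softmax function, which is how real LLMs usually convert logits into a distribution over words. This tutorial only uses `argmax`, so the softmax step below is an added illustration, and the `toy_logits` and `toy_mask` values are made up:

```python
import numpy as np

def softmax(logits_list):
    # Subtract the maximum before exponentiating for numerical stability
    shifted = np.asarray(logits_list, dtype=float) - np.max(logits_list)
    exps = np.exp(shifted)
    return exps / exps.sum()

# Hypothetical logits for a 4-word vocabulary
toy_logits = [2.0, 9.0, 4.0, 3.0]
# Hypothetical mask: only the words at index 0 and 2 are allowed
toy_mask = [True, False, True, False]

toy_masked_logits = np.where(toy_mask, toy_logits, -1000)

probs_before = softmax(toy_logits)
probs_after = softmax(toy_masked_logits)

print(np.round(probs_before, 3))
print(np.round(probs_after, 3))
```

The masked words end up with a probability that is effectively zero, so neither `argmax` nor random sampling would pick them. Using `-np.inf` instead of `-1000` is a common variation that gives the same guarantee whatever the scale of the logits.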

**Now, if we take the `argmax` prediction from `modified_gpt_000_logits`, we should be 100% sure of getting a tasty lunch suggestion:**

In [87]:
modified_index_highest = np.argmax(modified_gpt_000_logits)

modified_next_word_prediction = GPT_000_LLM_VOCABULARY[modified_index_highest]

print("-- GPT-000 output --")
print("== NOW WITH ADDED STRUCTURED GENERATION ==")
print("Sure! Here is what I suggest you eat for lunch:", modified_next_word_prediction)

-- GPT-000 output --
== NOW WITH ADDED STRUCTURED GENERATION ==
Sure! Here is what I suggest you eat for lunch: crepe
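
The two steps above (build a mask, then take the `argmax` of the masked logits) can be bundled into one reusable helper. This is a minimal sketch under assumptions: the function name `constrained_next_word`, the toy data, the use of `-np.inf` instead of `-1000`, and the explicit error for an empty mask are all choices made for this illustration.

```python
import numpy as np

def constrained_next_word(vocab, logits_list, is_allowed):
    """Return the highest-scoring word among those accepted by is_allowed."""
    mask = [is_allowed(word) for word in vocab]
    if not any(mask):
        # Without this check, argmax would silently return a forbidden word
        raise ValueError("no word in the vocabulary satisfies the constraint")
    # -inf instead of -1000: masked words lose whatever the scale of the logits
    masked_logits = np.where(mask, logits_list, -np.inf)
    return vocab[int(np.argmax(masked_logits))]

# Hypothetical toy data, not the full GPT-000 vocabulary from the cells above
toy_vocab = ["giraffe", "pizza", "uranium", "salad"]
toy_logits = [13890.3, 4.4, 8.7, 1.7]

suggestion = constrained_next_word(
    toy_vocab, toy_logits, lambda word: word in {"pizza", "salad"}
)
print(suggestion)
```

Passing the constraint as a function keeps the helper independent of any particular task: the same call works for "edible lunch", "starts with a given letter", or any other yes/no property of a word.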


# 3 Structured generation for more than one word

In section 2, for pedagogical reasons, we focused on an LLM that generates only one "next word" and then stops. We were able to structure/constrain the generation of this word - in our case to make sure that only an edible lunch was suggested to the LLM user.

**The exact same principle can be used to structure the generation of longer LLM outputs, as we see now**

## Second example use case for structured generation

Suppose that, after our tasty lunch, we want GPT-000 to generate **a sentence where the first letter of each word matches the corresponding letter of our name**.

Since our name has `N` letters, we are going to need to run GPT-000 for `N` generation steps.

**Here we build a pretend "generation" function that mimics the generation step of a real LLM. In our case, it is a completely random function - all that matters is that it will, when called, generate a list of `logits` of size `V = 37` containing float values.**

In [88]:
def get_logits_from_gpt_000(vocab):
    vocab_size = len(vocab)
    
    # Pretend that there is a sophisticated LLM here that somehow generates the final output: logits
    # ---
    #
    #  Insert sophisticated LLM architecture 
    #
    # ---
    
    logits = [round(uniform(0, 9), 1) for _ in range(vocab_size)]

    return logits

Now, you can type your name and see if GPT-000 **without structured generation** is able to output a sentence that matches the letters in your name:

In [89]:
# Change this to any name you want (no spaces, just a -> z letters)
USER_NAME = "marcel"

USER_NAME = USER_NAME.lower()

# -----------------------------

pretend_prompt_for_user_name = "Generate a sentence where each word starts with the letters of my name!"

# Pretend you sent this prompt to GPT-000...
for current_letter in USER_NAME:
    
    # GPT-000 here generates some logits:
    current_step_logits = get_logits_from_gpt_000(GPT_000_LLM_VOCABULARY)
    
    # we take the argmax of these logits to get the corresponding next word prediction:
    next_word_prediction = GPT_000_LLM_VOCABULARY[np.argmax(current_step_logits)]

    # did GPT-000 follow the prompt well ????
    print(f"Current letter in name is: {current_letter} and GPT-000 generated the word: {next_word_prediction}") 

Current letter in name is: m and GPT-000 generated the word: microscope
Current letter in name is: a and GPT-000 generated the word: zebra
Current letter in name is: r and GPT-000 generated the word: giraffe
Current letter in name is: c and GPT-000 generated the word: internet
Current letter in name is: e and GPT-000 generated the word: it
Current letter in name is: l and GPT-000 generated the word: xray


Unless you are really, really lucky, you probably did not get the output that you wanted: you got a list of words that **did not** match the initial letters of your name. Let's see how structured generation can help us now.

## Implementing structured generation for the sentence with initial letters

To solve our task, we reuse the same ideas as for the "single-word" lunch-idea generation task earlier.

Recall that we used a function to identify all valid options (the "edible lunch" words) in the LLM vocabulary.

Here the difference is that **the list of all valid options CHANGES AT EACH STEP of the generation process**

In other words, the "MASK" function that we apply to the LLM vocabulary **changes at each step, since the current letter that we want to generate varies as we move through the `USER_NAME`**

---

Let's implement this in code.

First, we introduce another NumPy function `np.where()` - we can use it to easily implement:

- the `True/False` masking behavior that we built earlier
- setting the logit values corresponding to the `False` mask values to `-1000` 

In [90]:
# Using np.where()
modified_gpt_000_logits_with_npwhere = np.where(is_edible_lunch_mask, # <--- "If this list contains True, then..."
                                                gpt_000_logits,       # <--- "...use the value in this list, else ..."
                                                -1000)                # <--- "...set to this value instead."

# check that the np.where() approach gives the same answer as our first approach
# (we need to convert the NumPy array to a Python list):
list(modified_gpt_000_logits_with_npwhere) == modified_gpt_000_logits

True

Now let's build our structured generation for our "sentence with initial letters" task.

1. for each letter in our `USER_NAME`, use the "MASK" approach and create a list of words in the vocabulary that start with this letter
2. generate the "baseline" GPT-000 `logits`
3. use `np.where()` function with this specific MASK from step 1 to apply the structured generation to the `logits`
4. get the next word prediction from step 3

In [91]:
for current_letter in USER_NAME:
    # 1. build the mask for the current letter:
    #    find all words in the LLM vocabulary that start with the current letter
    current_mask = [word.startswith(current_letter) for word in GPT_000_LLM_VOCABULARY]

    # 2. generate the baseline GPT-000 logits
    # (this step is identical/unchanged from earlier)
    current_step_logits = get_logits_from_gpt_000(GPT_000_LLM_VOCABULARY)

    # 3. use np.where() with the current mask to select the "allowed" logits
    modified_logits = np.where(current_mask, current_step_logits, -1000)

    # 4. a) this is the UNSTRUCTURED prediction
    next_word_prediction = GPT_000_LLM_VOCABULARY[np.argmax(current_step_logits)]
    # 4. b) this is the STRUCTURED prediction using the modified logits
    structured_next_word_prediction = GPT_000_LLM_VOCABULARY[np.argmax(modified_logits)]

    # check outputs:
    print(f"Current letter: {current_letter} | Structured prediction: {structured_next_word_prediction} | Unstructured prediction: {next_word_prediction}")


Current letter: m | Structured prediction: microscope | Unstructured prediction: open
Current letter: a | Structured prediction: arsenic | Unstructured prediction: book
Current letter: r | Structured prediction: road | Unstructured prediction: laptop
Current letter: c | Structured prediction: car | Unstructured prediction: yes
Current letter: e | Structured prediction: elephant | Unstructured prediction: mercury
Current letter: l | Structured prediction: laptop | Unstructured prediction: it


**In the above output cell, you should find that each structured prediction word does indeed begin with the correct letter.**
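
This works because the GPT-000 vocabulary happens to contain at least one word for every letter from a to z. With a smaller vocabulary, the mask for some letter can be entirely `False`; every logit then becomes `-1000` and `argmax` quietly returns index 0. The sketch below shows one possible guard, using a hypothetical `small_vocab` and a made-up helper name:

```python
import numpy as np

def mask_for_initial_letter(letter, vocab):
    """Build the True/False mask of words starting with `letter`."""
    mask = [word.startswith(letter) for word in vocab]
    if not any(mask):
        raise ValueError(f"no word in the vocabulary starts with {letter!r}")
    return mask

# Hypothetical small vocabulary with no word starting with "z"
small_vocab = ["apple", "bread", "cheese"]
small_logits = [1.2, 7.5, 3.3]

mask_b = mask_for_initial_letter("b", small_vocab)
word_b = small_vocab[int(np.argmax(np.where(mask_b, small_logits, -1000)))]
print(word_b)

# An all-False mask makes argmax fall back to index 0, which is not a valid answer
all_false = [word.startswith("z") for word in small_vocab]
fallback_word = small_vocab[int(np.argmax(np.where(all_false, small_logits, -1000)))]
print(fallback_word)
```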

# 4 Structured generation with complex examples

Up to now we have chosen simple pedagogical examples to illustrate the principles of structured generation, and how modifying the `logits` from our LLM can work.

But there are 2 observations we can make about the limitations so far:

- the examples are fun but not necessarily "complex" enough to match real world uses of LLMs
- the implementation of MASKs at each step, and the function we need to define for our condition, is not very optimised

The goal of this section is to approach the idea of a **grammar or programmatic approach** for implementing the ideas we have discussed so far, and to see how this maps on to several "real world" example use cases.

## Structured generation following a specified grammar

**TODO: short explanation of grammar / CFG notions**

We will show here how to solve one of the most requested features for real LLMs (and a common internet search in 2023 for GPT-3): **"How can I make GPT output a JSON object?"**

We first need to slightly extend our LLM's vocabulary so that it includes more "words" - the ones that appear in JSON file format:



In [92]:
GPT_000_LLM_VOCABULARY.extend([
    "{", "}",
    "\"",
    ":[", "]",
    ",",
])

## Grammar via state transitions

A "valid JSON" is determined by a grammar - here we use a somewhat simplified JSON that only contains strings or lists of strings:

- Each JSON must start with an open bracket symbol, `{`.
- Then it may either contain a new key string, `"first_key_in_my_json"`, or it may close the bracket `}`.
- If we create a new key at the previous step, then we must create the value associated with that key: `"first_key_in_my_json" : ["val1", "val2", ...]`
- After creating the list of values, we can create another key, or close the bracket `}`.

Below I have encoded all the allowed transitions between states in the dictionary `STATE_TRANSITIONS`.

The rules here are a bit artificial, just to make the next section easier: for example, `:[` is treated as a single word. In practice, your LLM will have separate words for `[`, `:`, `]`, `{`, `}`, `"`, etc.

**Importantly, we must have a special `START` and `END` state, so that the generation of the JSON does actually terminate.**

In [93]:
STATE_TRANSITIONS = {
    "START_OF_GENERATION": ["open_bracket"],
    "open_bracket": ["open_quotation_key", "close_bracket"],
    "open_quotation_key": ["new_key"], 
    "new_key": ["close_quotation_key"],
    "close_quotation_key": ["colon_open_square_bracket"],
    "colon_open_square_bracket": ["open_quotation_value", "close_square_bracket"],
    "open_quotation_value" : ["value"],
    "value": ["close_quotation_value"],
    "close_quotation_value": ["comma_within_value_list", "close_square_bracket"],
    "comma_within_value_list": ["open_quotation_value"],
    "close_square_bracket": ["comma_between_entries", "close_bracket"],
    "comma_between_entries": ["open_quotation_key"],
    "close_bracket": ["END_OF_GENERATION"],    
}

The weights below just mimic the behavior of a real LLM, so you don't need to worry about them. They ensure that we have a high probability of generating JSONs with lots of words in them rather than just `{}`.

In [94]:
STATE_TRANSITIONS_WEIGHTS = {
    "START_OF_GENERATION": [1.0],
    "open_bracket": [0.8, 0.2],
    "open_quotation_key": [1.0], 
    "new_key": [1.0],
    "close_quotation_key": [1.0],
    "colon_open_square_bracket": [0.8, 0.2],
    "open_quotation_value" : [1.0],
    "value": [1.0],
    "close_quotation_value": [0.8, 0.2],
    "comma_within_value_list": [1.0],
    "close_square_bracket": [0.8, 0.2],
    "comma_between_entries": [1.0],
    "close_bracket": [1.0],    
}
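
The two dictionaries must stay aligned: one weight per successor state, and every successor must itself be a known state. A quick sanity check can catch typos early. The sketch below uses hypothetical shortened tables and a made-up helper name, `check_transition_tables`; the same function could be called on `STATE_TRANSITIONS` and `STATE_TRANSITIONS_WEIGHTS`.

```python
def check_transition_tables(transitions, weights, end_state="END_OF_GENERATION"):
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    for state, successors in transitions.items():
        if state not in weights:
            problems.append(f"{state}: no weights defined")
        elif len(weights[state]) != len(successors):
            problems.append(
                f"{state}: {len(successors)} successors but {len(weights[state])} weights"
            )
        for successor in successors:
            # Every successor needs outgoing transitions, unless it is the end state
            if successor != end_state and successor not in transitions:
                problems.append(f"{state}: unknown successor {successor}")
    return problems

# Hypothetical shortened tables that only describe the empty JSON object "{}"
toy_transitions = {
    "START_OF_GENERATION": ["open_bracket"],
    "open_bracket": ["close_bracket"],
    "close_bracket": ["END_OF_GENERATION"],
}
toy_weights = {
    "START_OF_GENERATION": [1.0],
    "open_bracket": [1.0],
    "close_bracket": [1.0],
}

print(check_transition_tables(toy_transitions, toy_weights))

# A deliberately broken table: one weight is missing
broken_weights = dict(toy_weights, open_bracket=[])
print(check_transition_tables(toy_transitions, broken_weights))
```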

## The grammar of states defines the masks that we will use at each generation step

In the examples in Sections 2 and 3, we only needed **one** function each (the "does word correspond to edible lunch" function, and the "does word start with `current_letter`" function respectively) to define a mask for the vocabulary of allowed words.

Here we will need many different functions: **depending on which state we are in, we apply a corresponding function to the LLM vocabulary**

In [95]:
MASKS_FOR_STATES = {
    "open_bracket": [word == "{" for word in GPT_000_LLM_VOCABULARY],
    "open_quotation_key": [word == "\"" for word in GPT_000_LLM_VOCABULARY],
    "new_key": [word.isalpha() for word in GPT_000_LLM_VOCABULARY],
    "close_quotation_key": [word == "\"" for word in GPT_000_LLM_VOCABULARY],
    "colon_open_square_bracket": [word == ":[" for word in GPT_000_LLM_VOCABULARY],
    "open_quotation_value": [word == "\"" for word in GPT_000_LLM_VOCABULARY],
    "value": [word.isalpha() for word in GPT_000_LLM_VOCABULARY],
    "close_quotation_value": [word == "\"" for word in GPT_000_LLM_VOCABULARY],
    "comma_within_value_list": [word == "," for word in GPT_000_LLM_VOCABULARY],
    "close_square_bracket": [word == "]" for word in GPT_000_LLM_VOCABULARY],
    "comma_between_entries": [word == "," for word in GPT_000_LLM_VOCABULARY],
    "close_bracket": [word == "}" for word in GPT_000_LLM_VOCABULARY],
}
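
The same table can be written with less repetition by storing one predicate function per state and building all the masks in a single pass. This is a sketch under assumptions: the names `PREDICATES_FOR_STATES`, `build_masks` and `is_exactly`, and the shortened vocabulary, are made up for illustration.

```python
def build_masks(predicates_for_states, vocab):
    """Apply each state's predicate to every word in the vocabulary."""
    return {
        state: [predicate(word) for word in vocab]
        for state, predicate in predicates_for_states.items()
    }

def is_exactly(symbol):
    # Returns a predicate that accepts only the given symbol
    return lambda word: word == symbol

# Hypothetical shortened vocabulary and state table
toy_vocab = ["pizza", "salad", "{", "}", "\""]
PREDICATES_FOR_STATES = {
    "open_bracket": is_exactly("{"),
    "open_quotation_key": is_exactly("\""),
    "new_key": str.isalpha,
    "close_bracket": is_exactly("}"),
}

toy_masks = build_masks(PREDICATES_FOR_STATES, toy_vocab)
print(toy_masks["new_key"])
```

Keeping the predicates separate from the vocabulary also means the masks can be rebuilt with one call whenever the vocabulary changes.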

## Structured generation with the JSON grammar

The "LLM" is unchanged: GPT-000 still generates, at each time step, a `logits` list which we pretend is the result of calling `get_logits_from_gpt_000()`

Now we implement structured generation with the tools defined above:

- we start the generation process and we track the `current_state` that we are in
- from the `current_state` we get the list of all possible "successor" states (this encodes our understanding of what "correct JSON" structure looks like)
- we get the corresponding mask to use at this generation step

**We continue this generation process until we reach the special `END_OF_GENERATION` state, which occurs when the LLM has outputted a `}` symbol, since this defines the end of a valid JSON in our grammar**



In [96]:
current_state = "START_OF_GENERATION"

generated_string = ""
structured_generated_string = ""

while True:
    # 1 - get the list of all possible successor states from the current_state:
    successor_states = STATE_TRANSITIONS[current_state]
    transition_weights = STATE_TRANSITIONS_WEIGHTS[current_state]

    # 2 - here we model the LLM's structured generation:
    # 2/a) We want the LLM to only choose a grammatically allowed next word, and...
    # 2/b) ... we need to track the "type" of word, so that we know which states are allowed in subsequent steps.
    # => we implement this as follows:
    # 2/1) - randomly select one of the successor_states
    # (use weights to make it more likely to generate large JSON files, just for pedagogical reasons)
    current_state = choices(population=successor_states, weights=transition_weights, k=1)[0]
    
    # 2/2) - check if we have reached the END_OF_GENERATION state; if so we exit
    if current_state == "END_OF_GENERATION":
        break

    # == Steps below are exactly the same as from Section 3 and the "alphabet sentence" example ==
    # 2/3) - do all the steps from section 3 as before:
    
    # -- apply the mask corresponding to this state
    current_mask = MASKS_FOR_STATES[current_state]

    # -- generate the baseline GPT-000 logits
    current_step_logits = get_logits_from_gpt_000(GPT_000_LLM_VOCABULARY)

    # -- use np.where() with the current mask to select the "allowed" logits
    modified_logits = np.where(current_mask, current_step_logits, -1000)

    # -- get the structured prediction using the modified logits, and the UNstructured for comparison
    next_word_prediction = GPT_000_LLM_VOCABULARY[np.argmax(current_step_logits)]
    generated_string += next_word_prediction + " " # add space for readability
    structured_next_word_prediction = GPT_000_LLM_VOCABULARY[np.argmax(modified_logits)]
    structured_generated_string += structured_next_word_prediction
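
For reuse, the loop can be wrapped in a function. The sketch below is one possible packaging, not the notebook's own code: the function name, the `max_steps` guard, the seeded random number generator, the unweighted choice between successor states, and the shortened vocabulary are all assumptions, and the "LLM" is still just random logits.

```python
import json
import random
import numpy as np

TOY_VOCAB = ["pizza", "salad", "yes", "no", "{", "}", "\"", ":[", "]", ","]

TRANSITIONS = {
    "START_OF_GENERATION": ["open_bracket"],
    "open_bracket": ["open_quotation_key", "close_bracket"],
    "open_quotation_key": ["new_key"],
    "new_key": ["close_quotation_key"],
    "close_quotation_key": ["colon_open_square_bracket"],
    "colon_open_square_bracket": ["open_quotation_value", "close_square_bracket"],
    "open_quotation_value": ["value"],
    "value": ["close_quotation_value"],
    "close_quotation_value": ["comma_within_value_list", "close_square_bracket"],
    "comma_within_value_list": ["open_quotation_value"],
    "close_square_bracket": ["comma_between_entries", "close_bracket"],
    "comma_between_entries": ["open_quotation_key"],
    "close_bracket": ["END_OF_GENERATION"],
}

# Symbol required in each state; None means "any alphabetic word"
SYMBOL_FOR_STATE = {
    "open_bracket": "{", "close_bracket": "}",
    "open_quotation_key": "\"", "close_quotation_key": "\"",
    "open_quotation_value": "\"", "close_quotation_value": "\"",
    "colon_open_square_bracket": ":[", "close_square_bracket": "]",
    "comma_within_value_list": ",", "comma_between_entries": ",",
    "new_key": None, "value": None,
}

def generate_structured(vocab, rng, max_steps=1000):
    state, output = "START_OF_GENERATION", ""
    for _ in range(max_steps):
        # Uniform choice between successors (the notebook uses weighted choices)
        state = rng.choice(TRANSITIONS[state])
        if state == "END_OF_GENERATION":
            return output
        symbol = SYMBOL_FOR_STATE[state]
        mask = [w.isalpha() if symbol is None else w == symbol for w in vocab]
        logits = [rng.uniform(0, 9) for _ in vocab]  # stand-in for the LLM
        output += vocab[int(np.argmax(np.where(mask, logits, -1000)))]
    raise RuntimeError("max_steps reached before END_OF_GENERATION")

result = generate_structured(TOY_VOCAB, random.Random(0))
print(json.loads(result))
```

The `max_steps` guard matters with a real model: a grammar only constrains which words are allowed, so without a limit nothing forces the generation to ever choose the closing `}`.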

Let's compare the results - first, look at the output that would have been generated **without the structured grammar:**

In [97]:
generated_string

'server road } car internet internet yes open , football aeroplane arsenic ] salad salad ] road arsenic { water pizza zebra :[ kanban mobile mercury zebra crepe arsenic it } } yes ] digital " mercury it kanban server university queen server telephone microscope microscope water { open car nitrogen , car open nitrogen mercury microscope car football internet nitrogen arsenic helicopter xray elephant microscope { football uranium mobile " internet football " laptop arsenic giraffe salad open xray salad voice car zebra book open nitrogen ] open pterodactyl server water elephant pterodactyl microscope zebra mobile book { elephant arsenic giraffe salad crepe hamburger it pizza mercury pizza { crepe pterodactyl pterodactyl crepe laptop server xylem mobile no nitrogen internet football internet university crepe { open nitrogen telephone aeroplane xray salad digital zebra university aeroplane ] zebra , open telephone book zebra zebra arsenic giraffe internet japan :[ internet aeroplane japan h

Now let's look at the **JSON structured generation** - does it indeed represent a valid JSON object ?

In [98]:
structured_generated_string

'{"crepe":["yes","aeroplane","salad","queen","hamburger","zebra"],"elephant":["helicopter","server","telephone","internet","telephone","mercury","internet","xray","football","internet","arsenic"],"salad":["book","open"],"pterodactyl":["book","giraffe","it","zebra","crepe","mobile"],"internet":[],"telephone":["digital","book","telephone","arsenic","japan","helicopter","it","elephant","laptop","pizza","internet","no","laptop","internet","road","road","yes","hamburger","open","elephant"],"giraffe":["pizza","internet","digital","internet","mercury"],"laptop":[],"queen":["uranium"],"japan":["giraffe","arsenic","microscope","uranium","digital","xylem"],"giraffe":["salad"]}'

## Testing the JSON quality

It looks good! Let's test to see if the strings can indeed be parsed directly as JSON.

First, the unstructured output:

In [99]:
# test whether json.loads can parse the unstructured string
test_generated_string = json.loads(generated_string)

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

As expected, the LLM didn't generate a valid JSON without our structured generation intervention.

Now, let's check our result **with** the structured generation tools:

In [100]:
test_structured_generated_string = json.loads(structured_generated_string)

test_structured_generated_string

{'crepe': ['yes', 'aeroplane', 'salad', 'queen', 'hamburger', 'zebra'],
 'elephant': ['helicopter',
  'server',
  'telephone',
  'internet',
  'telephone',
  'mercury',
  'internet',
  'xray',
  'football',
  'internet',
  'arsenic'],
 'salad': ['book', 'open'],
 'pterodactyl': ['book', 'giraffe', 'it', 'zebra', 'crepe', 'mobile'],
 'internet': [],
 'telephone': ['digital',
  'book',
  'telephone',
  'arsenic',
  'japan',
  'helicopter',
  'it',
  'elephant',
  'laptop',
  'pizza',
  'internet',
  'no',
  'laptop',
  'internet',
  'road',
  'road',
  'yes',
  'hamburger',
  'open',
  'elephant'],
 'giraffe': ['salad'],
 'laptop': [],
 'queen': ['uranium'],
 'japan': ['giraffe', 'arsenic', 'microscope', 'uranium', 'digital', 'xylem']}
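
Notice that the generated string contains the key `"giraffe"` twice, but the parsed dictionary has only one `giraffe` entry: Python's `json.loads` accepts repeated keys and keeps the last value. Our grammar guarantees syntactically valid JSON, not unique keys. If uniqueness matters, one option is the `object_pairs_hook` argument of `json.loads`; the helper name and the sample string below are made up for illustration.

```python
import json

def reject_duplicate_keys(pairs):
    """object_pairs_hook that raises if a JSON object repeats a key."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate key: {key!r}")
        result[key] = value
    return result

# Hypothetical string with a repeated key, similar in shape to the generated output
sample = '{"giraffe":["pizza"],"queen":["uranium"],"giraffe":["salad"]}'

# Default behaviour: the last value for a repeated key wins
print(json.loads(sample))

# Stricter parsing: repeated keys are reported as an error
try:
    json.loads(sample, object_pairs_hook=reject_duplicate_keys)
except ValueError as error:
    print("Rejected:", error)
```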

## Conclusion

We did it - we wrote our own grammar and now we can be sure that, **with any LLM where we can access the `logits` programmatically**, we can ensure **100% of the time that we will get a valid JSON object in return!**

As a summary, now that we have worked through several examples, we can present a short conceptual overview:

1. define a grammar that captures the structure of your required LLM output
2. use this grammar to define a collection of masks that interact with the LLM's vocabulary
3. let the LLM generate `logits` and modify them at each step with the grammar and the masks you defined in steps 1 and 2

---

# Discussion questions

To test your understanding, using the techniques seen here and the 3-step summary process above, explain how you would use structured generation to:

- Ensure that your LLM app never uses any profanity?
- Generate a sentence in which words never appear more than once?
- Make your LLM generate valid SQL queries?

More sophisticated:

- Make your LLM generate a story where paragraphs progressively change sentiment from happy to sad for example?
- Make an LLM that generates exact verbatim quotes from a given source text?

---

# Improved implementations and going further

- libraries: Outlines, Marvin, LMQL
- grammar libraries: llama.cpp
- `LogitsProcessor` in Hugging Face Transformers

### Theory

- bibliography
- shifting/adjusting probability distribution

---

# Extra material

**TODO: timing, but could do a real example - load e.g. GPT-2 but need people to install libraries**

**TODO: example with a library ?**
