# Programmatic LLM Usage

## Convincing LLMs to produce structured valid JSON output

Looking at various techniques to produce structured LLM Output

There is huge potential in offloading application logic to LLMs, but this is still a developing use case with many pitfalls. If it can be done reliably, software applications could request structured output such as JSON from a model and use that output directly, instead of generating the JSON in application code.

* Security considerations: LLM output is untrusted input. Think about what could happen with `eval()` and similar functions if the output of the LLM contained dangerous code.
* LLMs approximate text, so they can approximate JSON. However, there is no well-known method to force the LLM to always generate the same format. The LLM may generate the exact format the program needs 99 out of 100 times and cause the program to crash on the 100th.
* This lack of guarantee compounds. If an application uses an LLM that produces improper JSON 1% of the time and calls it 10 times in its workflow, the chance that at least one call fails is roughly 10% (1 - 0.99^10 ≈ 9.6%, assuming independent failures).
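
The compounding described in the last bullet can be checked with a few lines of arithmetic. This is a minimal sketch that assumes every call fails independently with the same probability, which real workloads may not satisfy; the function name is illustrative.

```python
def workflow_failure_rate(per_call_error: float, calls: int) -> float:
    """Probability that at least one of `calls` independent LLM calls fails."""
    # The workflow only succeeds if every single call succeeds.
    return 1.0 - (1.0 - per_call_error) ** calls


if __name__ == "__main__":
    for n in (1, 10, 50):
        rate = workflow_failure_rate(0.01, n)
        print(f"{n:>3} calls: {rate:.1%} chance of at least one bad response")
```

With a 1% per-call error rate, ten calls give a failure chance of about 9.6%, and the figure keeps growing as the workflow adds more calls.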

We will explore some of the popular techniques for managing the output of LLMs. While there is huge potential for this technology, the safety and usefulness of this technique may vary by use case. For creative applications such as video games, perhaps using retries to reduce the error rate could suffice and add creativity to the game in a way traditional code may not be able to. For critical infrastructure or applications this could be a very dangerous shortcut to writing and testing code.

## OpenAI JSON Output

Some OpenAI Models including 4o support JSON mode. See documentation here: https://platform.openai.com/docs/guides/json-mode

JSON mode requires that the word "JSON" appear in the system prompt or user prompt, and that the following object be added to the API request: `"response_format": {"type": "json_object"}`

The OpenAI Python library also supports this feature.
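
A sketch of what the same request might look like through the Python library, assuming version 1 or later of the `openai` package. The helper names `build_json_mode_request` and `ask_for_json` and the system prompt text are made up for illustration; only `client.chat.completions.create` and the `response_format` argument come from the library.

```python
import json


def build_json_mode_request(user_prompt: str, model: str = "gpt-4o-mini") -> dict:
    """Build keyword arguments for a Chat Completions call with JSON mode enabled."""
    return {
        "model": model,
        "messages": [
            # JSON mode requires the word "JSON" to appear somewhere in the messages.
            {"role": "system", "content": "You only reply with valid JSON."},
            {"role": "user", "content": user_prompt},
        ],
        "response_format": {"type": "json_object"},
    }


def ask_for_json(client, user_prompt: str):
    """Send the request and decode the reply; `client` is assumed to be an openai.OpenAI instance."""
    response = client.chat.completions.create(**build_json_mode_request(user_prompt))
    # The reply text is a JSON string, so it still has to be decoded.
    return json.loads(response.choices[0].message.content)


# Usage, assuming `pip install openai` and OPENAI_API_KEY set in the environment:
#   from openai import OpenAI
#   print(ask_for_json(OpenAI(), "Greet me. Use the keys greeting_type and message."))
```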

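OpenAI states that in JSON mode the model will only produce valid JSON, although the output can still be cut off if generation stops early (for example, when `finish_reason` is `length`). OpenAI doesn't explain how they achieved this, although some blog posts and open source projects hint towards custom grammars or changes to how the transformer's output is decoded: https://github.com/1rgs/jsonformer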

Important Notes:
* Valid JSON is guaranteed
* While you can use prompting to suggest a JSON schema, the model makes **no guarantee that it will follow the schema that you specify**. This is a significant shortcoming of this approach, as the model may not consistently output the correct schema. If it works 99 out of 100 times, is that good enough? If not, consider augmenting JSON mode with other approaches.
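
Because JSON mode only promises well-formed JSON, the application still has to check the shape itself. Below is a minimal sketch using only the standard library. The key names match the greeting prompt used in this notebook's requests, while the set of allowed `greeting_type` values is an assumption about what that prompt intends.

```python
import json

REQUIRED_KEYS = {"greeting_type": str, "message": str}
ALLOWED_GREETING_TYPES = {"Day", "Evening"}  # assumed intent of the prompt


def parse_greeting(raw: str) -> dict:
    """Parse model output and raise ValueError if it does not match the expected shape."""
    try:
        data = json.loads(raw)  # decode as data; never eval() model output
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key, expected_type in REQUIRED_KEYS.items():
        if not isinstance(data.get(key), expected_type):
            raise ValueError(f"missing or mistyped key: {key!r}")
    if data["greeting_type"] not in ALLOWED_GREETING_TYPES:
        raise ValueError(f"unexpected greeting_type: {data['greeting_type']!r}")
    return data
```

A check like this turns a silent schema drift into an explicit error that the application can log, retry, or surface to the user.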


In [7]:
%%bash
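# Assumes OPENAI_API_KEY is exported in the environment
curl --location 'https://api.openai.com/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $OPENAI_API_KEY" \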
--data '{
     "model": "gpt-4o-mini",
     "messages": [
         {"role": "system", "content": "You are a computer program with the knowledge of a very intelligent yet sarcastic human comedian. Because you are a computer program, you only return valid JSON so that other computers can interpret your response. Additionally, you will only return JSON in the object structure specified by the user."},
         {"role": "user", "content": "Tell me that today is a good day as a greeting. You will craft your response in the following JSON format: {\"greeting_type\": \"Day or Evening\",\"message\": \"your greeting message\"}"}
         ],
     "temperature": 1.0,
     "response_format":  {"type": "json_object"}
   }'

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1312    0   602  100   710    368    434  0:00:01  0:00:01 --:--:--   802


{
  "id": "chatcmpl-9t2vxy2FHao02qt0NpCGJjQeqjCw2",
  "object": "chat.completion",
  "created": 1722906345,
  "model": "gpt-4o-mini-2024-07-18",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "{\n  \"greeting_type\": \"Day\",\n  \"message\": \"Today is a good day, unless you accidentally step on a LEGO. Then, not so much.\"\n}"
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 111,
    "completion_tokens": 36,
    "total_tokens": 147
  },
  "system_fingerprint": "fp_48196bc67a"
}
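
Note that the `content` field in the response above is itself a JSON string nested inside the JSON response body, so a client has to decode twice. A small sketch using a trimmed copy of that response:

```python
import json

# Trimmed copy of the API response body shown above.
response_body = r'''
{
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "{\n  \"greeting_type\": \"Day\",\n  \"message\": \"Today is a good day, unless you accidentally step on a LEGO. Then, not so much.\"\n}"
      },
      "finish_reason": "stop"
    }
  ]
}
'''

response = json.loads(response_body)  # first decode: the HTTP response body
content = response["choices"][0]["message"]["content"]  # still a plain string
greeting = json.loads(content)  # second decode: the model's JSON text

print(greeting["greeting_type"])
print(greeting["message"])
```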


## This worked well, although how well does it follow the supplied schema?

In [29]:
%%bash
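# Assumes OPENAI_API_KEY is exported in the environment
curl --location 'https://api.openai.com/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $OPENAI_API_KEY" \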
--data '{
     "model": "gpt-4o-mini",
     "messages": [
         {"role": "system", "content": "You are a computer program with the knowledge of a very intelligent yet sarcastic human comedian. Because you are a computer program, you only return valid JSON so that other computers can interpret your response. Additionally, you will only return JSON in the object structure specified by the user."},
         {"role": "user", "content": "Tell me that today is a good night as a greeting. You will craft your response in the following JSON format: {\"greeting_type\": \"Day or Night Greeting\",\"message\": \"your greeting message\"}"}
         ],
     "temperature": 1.0,
     "response_format":  {"type": "json_object"}
   }'

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1329    0   610  100   719    332    392  0:00:01  0:00:01 --:--:--   724


{
  "id": "chatcmpl-9t36JsMfbm8MtsckZhXlzVZWcveUe",
  "object": "chat.completion",
  "created": 1722906987,
  "model": "gpt-4o-mini-2024-07-18",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "{\n  \"greeting_type\": \"Day or Night Greeting\",\n  \"message\": \"Good night! May your dreams be as delightful as a cat video marathon!\"\n}"
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 107,
    "completion_tokens": 34,
    "total_tokens": 141
  },
  "system_fingerprint": "fp_48196bc67a"
}

Here the model returned valid JSON, but it copied the placeholder text "Day or Night Greeting" verbatim into `greeting_type` instead of choosing a value. The output is well-formed, yet it is not what the prompt intended.


## Will we get different results from a different model?

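Using full GPT-4o instead of 4o mini with the original "Day or Evening" prompt, the model again returns "Day", correctly guessing our intent for the JSON structure. Note that this run does not reuse the "Day or Night Greeting" placeholder from the previous request, so it does not show whether 4o handles that case differently.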

In [41]:
%%bash
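# Assumes OPENAI_API_KEY is exported in the environment
curl --location 'https://api.openai.com/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $OPENAI_API_KEY" \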
--data '{
     "model": "gpt-4o",
     "messages": [
         {"role": "system", "content": "You are a computer program with the knowledge of a very intelligent yet sarcastic human comedian. Because you are a computer program, you only return valid JSON so that other computers can interpret your response. Additionally, you will only return JSON in the object structure specified by the user."},
         {"role": "user", "content": "Tell me that today is a good day as a greeting. You will craft your response in the following JSON format: {\"greeting_type\": \"Day or Evening\",\"message\": \"your greeting message\"}"}
         ],
     "temperature": 1.0,
     "response_format":  {"type": "json_object"}
   }'

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1310    0   605  100   705    188    219  0:00:03  0:00:03 --:--:--   407


{
  "id": "chatcmpl-9t3CpC4fA4XbMd0rNks0sxlGj4Zgn",
  "object": "chat.completion",
  "created": 1722907391,
  "model": "gpt-4o-2024-05-13",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "{\n  \"greeting_type\": \"Day\",\n  \"message\": \"Today is a good day because you haven't broken the internet yet. Keep up the good work!\"\n}"
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 106,
    "completion_tokens": 35,
    "total_tokens": 141
  },
  "system_fingerprint": "fp_c9aa9c0491"
}


## OK, why not just use the function calling capabilities of OpenAI or an agent instead?

## Let's try a different approach - structured output from LangChain

To mix things up some more, let's use Ollama as well

LangChain adds some syntactic sugar to the prompt and query, which makes it a bit easier to re-use prompts across your application.

The output parser takes the results of the model and forms JSON.

In [None]:
%pip install langchain_core
%pip install langchain_ollama
%pip install langchain

## LangChain overview

Chains use the overloaded `|` operator to send the output of one object to the next. In the example below, the chain formats the prompt, then sends it to the model.
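
To make the operator overloading concrete, here is a toy illustration of how `|` can be made to compose steps in plain Python. It is not LangChain's implementation; the `Step` class and the fake prompt, model, and parser are invented purely for this sketch.

```python
class Step:
    """Toy stand-in for a chain component; not LangChain's actual implementation."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other: "Step") -> "Step":
        # `a | b` builds a new step that runs a, then feeds its result to b.
        return Step(lambda value: other.invoke(self.invoke(value)))


toy_prompt = Step(lambda inputs: f"Question: {inputs['question']}\nAnswer:")
toy_model = Step(lambda text: text + " 42")  # fake "model" that appends an answer
toy_parser = Step(lambda text: {"answer": text.rsplit(":", 1)[1].strip()})

toy_chain = toy_prompt | toy_model | toy_parser
print(toy_chain.invoke({"question": "What is 6 x 7?"}))
```

Each `|` returns a new object with its own `invoke`, which is why chains of any length can be built and called the same way.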

Example from: https://python.langchain.com/v0.2/docs/integrations/llms/ollama/

In [49]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama.llms import OllamaLLM

template = """Question: {question}

Answer: Let's think step by step."""

prompt = ChatPromptTemplate.from_template(template)

model = OllamaLLM(model="mixtral:8x7b")

chain = prompt | model

chain.invoke({"question": "What is most efficient way to solve Dijkstra's algorithm in win32 assembly?"})

" Solving Dijkstra's algorithm in Win32 Assembly is quite a challenging task due to the low-level nature of assembly language and the complexity of Dijkstra's algorithm. However, let's break it down:\n\n1. **Understanding Dijkstra's Algorithm**: Before writing any code, you need to understand how Dijkstra's algorithm works. It's a pathfinding algorithm that calculates the shortest path between nodes in a graph.\n\n2. **Setting Up Your Assembly Environment**: You'll need an assembler for Win32, such as MASM (Microsoft Macro Assembler) or NASM (Netwide Assembler). You'll also need a text editor to write your assembly code and a linker to convert your assembly code into machine code.\n\n3. **Data Structures**: You'll need to define data structures for your graph. This could be arrays of nodes, where each node contains a value for the distance and pointers to adjacent nodes.\n\n4. **Implementing Dijkstra's Algorithm**: The core of your program will be implementing Dijkstra's algorithm. Thi

## Using the JSON Output Parser

The JSON Output Parser takes the text returned by a model call and parses it as JSON, returning a Python `dict`.

In [50]:
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_ollama.llms import OllamaLLM

model = OllamaLLM(model="mixtral:8x7b")

# Define your desired data structure.
class Joke(BaseModel):
    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")

# And a query intended to prompt a language model to populate the data structure.
joke_query = "Tell me a joke."

# Set up a parser + inject instructions into the prompt template.
parser = JsonOutputParser(pydantic_object=Joke)

prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

# Run the prompt alone first to inspect the fully formatted text sent to the model.
chain = prompt  # | model | parser

chain.invoke({"query": joke_query})

StringPromptValue(text='Answer the user query.\nThe output should be formatted as a JSON instance that conforms to the JSON schema below.\n\nAs an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}\nthe object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.\n\nHere is the output schema:\n```\n{"properties": {"setup": {"title": "Setup", "description": "question to set up a joke", "type": "string"}, "punchline": {"title": "Punchline", "description": "answer to resolve the joke", "type": "string"}}, "required": ["setup", "punchline"]}\n```\nTell me a joke.\n')

In [62]:
# Add the model but not the parser: the result is still a raw string.
chain = prompt | model  # | parser

result = chain.invoke({"query": joke_query})

print(result)

type(result)

 {
  "setup": "Why don't scientists trust atoms?",
  "punchline": "Because they make up everything!"
}


str

In [61]:
chain = prompt | model | parser

result = chain.invoke({"query": joke_query})

print(result)

type(result)

{'setup': "Why don't scientists trust atoms?", 'punchline': 'Because they make up everything!'}


dict

## Retry Parser

It is possible that the model, despite the best efforts of prompt engineering, does not produce valid JSON. A call could also fail due to a timeout. The retry parser lets you specify a maximum number of additional calls to make to the model in order to get valid JSON.

The retry logic does more than simply repeat the call: it passes the original prompt, along with the failed output, back to the model so that it can attempt a better response.

https://python.langchain.com/v0.1/docs/modules/model_io/output_parsers/types/retry/
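
The idea can be sketched in plain Python without any framework. This is not LangChain's retry parser, whose API and prompt wording differ; `parse_with_retries`, the `call_model` callable, and the retry message are hypothetical names chosen for this sketch.

```python
import json
from typing import Any, Callable


def parse_with_retries(
    prompt: str,
    call_model: Callable[[str], str],
    max_retries: int = 2,
) -> Any:
    """Call the model and re-prompt with the failed output until it parses as JSON."""
    current_prompt = prompt
    last_error = None
    for _ in range(max_retries + 1):  # one initial attempt plus the retries
        output = call_model(current_prompt)
        try:
            return json.loads(output)
        except json.JSONDecodeError as exc:
            last_error = exc
            # Give the model its original task, its bad answer, and the parse error.
            current_prompt = (
                f"{prompt}\n\nYour previous reply was:\n{output}\n\n"
                f"It failed to parse as JSON ({exc}). Reply again with valid JSON only."
            )
    raise ValueError(f"no valid JSON after {max_retries + 1} attempts") from last_error


# Usage with a fake model that fails once, then succeeds.
replies = iter(["Sure! Here is a joke for you", '{"setup": "a", "punchline": "b"}'])
print(parse_with_retries("Tell me a joke as JSON.", lambda _prompt: next(replies)))
```

Capping the number of retries matters: every retry is another model call, so an uncapped loop could add unbounded latency and cost when the model keeps failing.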