<a href="https://colab.research.google.com/github/Mayo-Radiology-Informatics-Lab/MIDeL/blob/dev/chapters/17B.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Obtaining Consistent LLM Outputs: From Chaos to Clarity
## How Instructor and Pydantic Models Can Help

In this notebook, we describe how to use the Instructor and Pydantic libraries to make Large Language Model (LLM) output more consistent, which should make that output more useful and more accurate.

The main goal of this notebook is to provide you with a step-by-step guide on how to improve data consistency when working with LLMs. We will specifically focus on using the Pydantic and Instructor packages to validate JSON responses from an LLM.

### Introduction

`instructor` is a lightweight Python library that wraps the client of an OpenAI-compatible server and adds validation of the JSON responses returned by an LLM. Instructor builds on the Pydantic library, which lets users define models that act as JSON schemas and validate data against them, ensuring that LLM responses adhere to the defined schema.
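
To make the validation idea concrete, here is a minimal sketch of Pydantic checking a JSON string against a schema, with no LLM involved. The `Finding` model and both JSON strings are hypothetical, and Pydantic v2 is assumed.

```python
from typing import Literal

from pydantic import BaseModel, ValidationError


class Finding(BaseModel):
    """Hypothetical schema, used only for illustration."""
    organ: str
    complications: Literal["Yes", "No", "Not Mentioned"]


# A well-formed response parses into a typed object.
good = Finding.model_validate_json('{"organ": "Liver", "complications": "No"}')
print(good.organ, good.complications)

# A response outside the schema raises ValidationError instead of passing silently.
try:
    Finding.model_validate_json('{"organ": "Liver", "complications": "Maybe"}')
    rejected = False
except ValidationError as err:
    rejected = True
    print(f"Rejected with {err.error_count()} error(s)")
```

Instructor applies this kind of check to the LLM's reply on your behalf.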


#### Key Features
- **Easy integration**: Integrates with several LLM providers beyond OpenAI. See:
    - Working with different providers: https://jxnl.github.io/instructor/hub/
    - Examples: https://jxnl.github.io/instructor/examples/
    - If needed, you can also use [ChatGPT-instructor](https://chatgpt.com/g/g-EvZweRWrE-instructor-gpt/) to get a code snippet.
- **Data validation**: Ensures the JSON response from an LLM meets the specified schema. See:
    - https://docs.pydantic.dev/latest/
- **Retry Management**: Retries with error guidance if the LLM returns an invalid response. You can set the maximum number of retries.
- **Streaming Support**: Work with lists and partial responses effortlessly.
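
The retry-with-guidance idea from the list above can be sketched without any LLM. In this simplified illustration, `fake_llm` is a hypothetical stand-in for a model and the hand-written `validate` function plays the role of the schema; it is not Instructor's actual implementation.

```python
import json


def fake_llm(messages):
    """Hypothetical stand-in for an LLM: answers correctly only after feedback."""
    if any("must be one of" in m["content"] for m in messages):
        return '{"complications": "No"}'
    return '{"complications": "none"}'


def validate(raw):
    """Parse the reply and check it against a tiny hand-written schema."""
    data = json.loads(raw)
    allowed = {"Yes", "No", "Not Mentioned"}
    if data.get("complications") not in allowed:
        raise ValueError(f"complications must be one of {sorted(allowed)}")
    return data


def ask_with_retries(messages, max_retries=3):
    """Re-ask the model, appending each validation error to the conversation."""
    messages = list(messages)
    for attempt in range(1, max_retries + 1):
        raw = fake_llm(messages)
        try:
            return validate(raw), attempt
        except ValueError as err:  # json.JSONDecodeError is a ValueError subclass
            messages.append({"role": "assistant", "content": raw})
            messages.append({"role": "user", "content": f"Invalid response: {err}"})
    raise RuntimeError(f"No valid response after {max_retries} attempts")


result, attempts = ask_with_retries([{"role": "user", "content": "Any complications?"}])
print(result, attempts)
```

The key design point is that the validation error is sent back to the model, so the next attempt has a chance to correct the specific mistake.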



#### Concept
<img src="https://raw.githubusercontent.com/lennartpollvogt/ollama-instructor/main/Concept.png" alt="Concept Image" width="60%">



By using the Instructor package, you can have full control over agent flows without relying on complex agent frameworks. It serves as a starting point for building your own agents and ensures that the responses from LLMs are consistent and conform to the defined schema.

In the next sections, we will walk through the steps involved in enhancing data consistency using Pydantic models and the Instructor package. We will cover topics such as port forwarding, installation, creating the client, defining the response model, prompting, and more.

Let's dive in and explore the power of Pydantic models and the Instructor package in achieving data consistency in language model applications!

## Step 1: Create an LLM server with Ollama

To run this notebook, we need an OpenAI-compatible server. You can connect your own OpenAI account, use the Hugging Face CLI, or run a local server. In the next cell, we create an LLM server running on Colab so that you don't need any of those options.
> Note: If you are running this code on Google Colab, make sure a GPU is enabled (Runtime menu->`Change runtime type`->`gpu T4`).

In [1]:
# Download and install Ollama which will serve the LLM
!curl -fsSL https://ollama.com/install.sh | sh

>>> Downloading ollama...
############################################################################################# 100.0%
>>> Installing ollama to /usr/local/bin...
>>> Creating ollama user...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.


In [2]:
# Import the necessary libraries
import subprocess
import time

In [3]:
# Start ollama in the background and use llama3.1 model

# Start the process in the background
server = subprocess.Popen(['ollama', 'serve'])
time.sleep(60) # Give ollama time to start, in case you run all cells at once rather than one at a time

# To kill the server
# server.kill()

# To see all the models available: https://ollama.com/library
MODEL = 'llama3.1'
llama3 = subprocess.Popen(['ollama', 'run', MODEL])
time.sleep(90) # Give the model time to load, in case you run all cells at once rather than one at a time

# To kill the llama3
# llama3.kill()
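
The fixed `time.sleep` calls above are simple but either waste time or are too short on a slow machine. One alternative is to poll the server until it answers. The sketch below assumes the default Ollama address reported in the install log; `http_probe` and `wait_until_ready` are hypothetical helper names, not part of Ollama.

```python
import time
import urllib.error
import urllib.request


def http_probe(url):
    """Return True if the URL answers with a successful HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=2):
            return True
    except (urllib.error.URLError, OSError):
        return False


def wait_until_ready(url, timeout=120.0, interval=1.0, probe=http_probe):
    """Poll `url` until `probe` succeeds or `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe(url):
            return True
        time.sleep(interval)
    return False


# Example (assumes Ollama's default address from the install log above):
# wait_until_ready("http://127.0.0.1:11434")
```

The `probe` argument is injectable so the waiting logic can be exercised without a running server.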

In [25]:
# show which model(s) ollama is serving
!ollama list

NAME           	ID          	SIZE  	MODIFIED      
llama3.1:latest	91ab477bec9d	4.7 GB	7 minutes ago	


## Step 2: Installation and Creating the Client

#### Installation
To install `instructor` and the other packages used in this notebook, run the following cell:

In [5]:
! pip install instructor pydantic rich PyYAML tqdm

Collecting instructor
  Downloading instructor-1.3.7-py3-none-any.whl.metadata (14 kB)
Collecting jiter<0.5.0,>=0.4.1 (from instructor)
  Downloading jiter-0.4.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.6 kB)
Collecting openai<2.0.0,>=1.1.0 (from instructor)
  Downloading openai-1.40.2-py3-none-any.whl.metadata (22 kB)
Collecting tenacity<9.0.0,>=8.4.1 (from instructor)
  Downloading tenacity-8.5.0-py3-none-any.whl.metadata (1.2 kB)
Collecting httpx<1,>=0.23.0 (from openai<2.0.0,>=1.1.0->instructor)
  Downloading httpx-0.27.0-py3-none-any.whl.metadata (7.2 kB)
Collecting httpcore==1.* (from httpx<1,>=0.23.0->openai<2.0.0,>=1.1.0->instructor)
  Downloading httpcore-1.0.5-py3-none-any.whl.metadata (20 kB)
Collecting h11<0.15,>=0.13 (from httpcore==1.*->httpx<1,>=0.23.0->openai<2.0.0,>=1.1.0->instructor)
  Downloading h11-0.14.0-py3-none-any.whl.metadata (8.2 kB)
Downloading instructor-1.3.7-py3-none-any.whl (56 kB)

In [26]:
# Importing the libraries
import yaml
import json
import csv
import os
import time
import logging
from pydantic import BaseModel, Field, create_model, StringConstraints
from pydantic.config import ConfigDict
from typing import List, Literal, Optional, Any, Dict
from concurrent.futures import ThreadPoolExecutor, as_completed
from tqdm import tqdm

import instructor
from openai import OpenAI

import rich

#### Adding YAML Configuration
In the next code block, we're introducing a YAML file to manage our configuration settings. YAML (YAML Ain't Markup Language) is a human-readable data serialization format that's particularly useful for configuration files.

In the upcoming code, we'll load this YAML file and extract various configuration elements such as general settings, messages, variables, and examples. This approach will make our project more modular and easier to manage as it grows in complexity.

In [27]:
# Download the YAML file
yaml_file_id = '1uOSeDoqDPVYlkjiSV69JvVi87d0zEXjz'
!gdown https://drive.google.com/uc?id=$yaml_file_id -O ExtractionConfig.yaml

# if you want to see what is in the YAML file, run this:
!cat ExtractionConfig.yaml

Downloading...
From: https://drive.google.com/uc?id=1uOSeDoqDPVYlkjiSV69JvVi87d0zEXjz
To: /content/ExtractionConfig.yaml
  0% 0.00/6.07k [00:00<?, ?B/s]100% 6.07k/6.07k [00:00<00:00, 26.9MB/s]
metadata:
  title: "Ablation Report Tag Extraction Schema"
  version: "0.1.1"
  author: "Ali Ganjizadeh"
  created_date: "2024-07-29"
  last_update: "2024-07-31"
  organization: "Mayo Clinic AI Lab"

config: 
  host: "http://localhost:11434"
  api_key: "ollama"
  model: "llama3.1"
  seed: 42
  temperature: 0.0
  max_retries: 5
  concurrency: 10

messages:
  system_prompt: "You are an expert in extracting data from radiology reports with 20 years of experience."
  user_prompt: |
    Extract data elements from the MR guided ablation report in the <report> tag:
    
    Guidelines:
    - Focus on findings at the time of scan, not previous ones.
    - If information is not mentioned, use 'Not Mentioned'.
    - Ignore irrelevant information.
    - Use only the provided output format.
    - Expand ab

>Note: You can change the contents and variables of the YAML file.

In [28]:
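# Load the YAML file, which holds the configuration
# Note: this rebinds the name `yaml` from the module to the parsed dict (later cells rely on that),
# so this cell cannot be re-run without re-importing yaml first
with open("ExtractionConfig.yaml") as f:
    yaml = yaml.safe_load(f)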

CONFIG, MESSAGES, VARIABLES, EXAMPLES = yaml['config'], yaml['messages'], yaml['variables'], yaml['examples']
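
If a top-level section is missing from the file, the unpacking above fails with a bare `KeyError`. A minimal sketch of a more explicit check follows; `require_sections` and the sample dictionary are hypothetical and only mimic what `yaml.safe_load` would return.

```python
def require_sections(data, names):
    """Return the requested top-level sections, failing with a clear message."""
    missing = [name for name in names if name not in data]
    if missing:
        raise KeyError(f"Configuration is missing section(s): {', '.join(missing)}")
    return tuple(data[name] for name in names)


# Hypothetical, already-parsed configuration (what yaml.safe_load would return).
parsed = {
    "config": {"model": "llama3.1", "temperature": 0.0},
    "messages": {"system_prompt": "placeholder", "user_prompt": "placeholder"},
}

cfg, msgs = require_sections(parsed, ["config", "messages"])
print(cfg["model"])
```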

In [29]:
# Setting main variables based on the yaml file
HOST = CONFIG['host']
API_KEY = CONFIG['api_key']
MODEL = MODEL # Reuse the model started in Step 1 rather than CONFIG['model'], which may name a different model

SEED = CONFIG['seed'] # random seed
TEMP = CONFIG['temperature'] # temperature
MAX_RETRIES = CONFIG['max_retries'] # Maximum number of retries Instructor makes to obtain a valid response

Finally, it is time to create the client. The `client` is responsible for contacting our LLM server and returning its response. This notebook uses OpenAI-compatible servers and clients, so the process is the same if you are using `ollama`.

In [30]:
# Create the client
client = instructor.from_openai(
    OpenAI(
        base_url=f"{HOST}/v1",
        api_key=API_KEY,  # required, but unused
    ),
    # mode: for more information: https://jxnl.github.io/instructor/concepts/patching/
    mode=instructor.Mode.TOOLS, # TOOLS uses the tool-calling API; with response_model=str below, the answer is free-form text
)

## Step 3: Define the Response Model
In this notebook, we are focusing on a simple question answering (QA) task. For additional use cases, refer to [The cookbooks](https://jxnl.github.io/instructor/examples/), or try [Instructor GPT](https://chatgpt.com/g/g-EvZweRWrE-instructor-gpt). These resources provide various examples demonstrating how to use Instructor in different scenarios.

As a simple test example, let's prompt the LLM to extract specific pieces of information from text we supply. We will then compare its answer with what we know to be the answer.
In the next cell, we write a test `system prompt` (which sets the personality or backstory of our LLM instance) and a test `user prompt` (which is the main task or request we are making, including guidelines of how to create or format the answer). We will test these prompts on a sample report that we have in the yaml file.


In [31]:
# Test prompts
test1_system_prompt = "You are an expert in extracting data from radiology reports with 20 years of experience."
test1_user_prompt = """
Extract data elements from the MR guided ablation report in the <report> tag:

    Guidelines:
    - Focus on findings at the time of scan, not previous ones.
    - If information is not mentioned, use 'Not Mentioned'.
    - Ignore irrelevant information.
    - Use only the provided output format.
    - Expand abbreviations: sv (seminal vesicle), uvj (urethro vesicular junction), vuj (vesico urethral junction), VM (vascular malformation), US (ultrasound), LN (lymph node), CT (computed tomography), MRI (magnetic resonance imaging).

    Only return your answer in this json form and always include the <json></json> tag with your answer. Include these tags in your response:
    organ: Extract the organ where the ablation was performed. Indicate 'Not Mentioned' if not specified. Use the provided dictionary to expand abbreviations.
    location: the exact anatomical location of the tissue
    tissueType: Specify the tissue type ablated: 'Muscle', 'Nerve', 'Fat', 'Ligament', 'Tendon', 'Cartilage', 'Bone', or 'Not Mentioned'. You can choose multiple tissues.
    complications: Specify whether complications occurred: 'Yes', 'No', or 'Not Mentioned'
    """

# Loading a sample report from the yaml file
sample_report = yaml['sample_report']

In [32]:
# Let's ask the model:

# Creating the conversation for the model to pass report and instructions
messages = [
        {"role": "system", "content": test1_system_prompt},
        {"role": "user", "content": f"{test1_user_prompt} \n <report> {sample_report} </report>"}
    ]


# Asking the model to extract the requested information
resp = client.chat.completions.create(
        model=MODEL,
        response_model=str, # Accept a free-form answer
        messages=messages,
        temperature=TEMP,
        seed=SEED,
        max_retries=MAX_RETRIES,
        )


rich.print('Output:', resp)

We clearly asked the LLM to give us an answer in JSON format, but it didn't! Moreover, each time you run your query, there is no guarantee that you will get the same response structure. We therefore need another tool to constrain the LLM and make sure we always get the same response structure.
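
To see why free-form output is fragile, consider parsing it by hand. The sketch below pulls JSON out of `<json>` tags with a regular expression. The two replies are hypothetical, written for illustration, and are not real model output.

```python
import json
import re


def extract_tagged_json(text):
    """Return the JSON object inside <json>...</json>, or None if absent or invalid."""
    match = re.search(r"<json>(.*?)</json>", text, flags=re.DOTALL)
    if match is None:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return None


# Hypothetical replies: one follows the instructions, one does not.
compliant = 'Here you go: <json>{"organ": "Liver", "complications": "No"}</json>'
chatty = "The ablation was performed on the liver, with no complications."

print(extract_tagged_json(compliant))
print(extract_tagged_json(chatty))
```

Even when parsing succeeds, nothing here checks field names or allowed values, which is the gap a response model fills.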

#### Pydantic Models
Pydantic models are classes that inherit from pydantic.BaseModel. They offer several key benefits:

- **Data Validation**: Models automatically validate input data, ensuring that it conforms to the defined field types and constraints.
- **Type Hinting**: Models leverage Python's type annotations, providing clear type information for fields.
- **Serialization**: Models can easily convert to and from JSON, making them ideal for API development.
- **Schema Generation**: Pydantic can automatically generate JSON schemas from models, useful for documentation and API specifications.
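
Serialization and schema generation from the list above can be illustrated in a few lines. The `Report` model is hypothetical, and Pydantic v2 method names are assumed.

```python
import json

from pydantic import BaseModel, Field


class Report(BaseModel):
    """Hypothetical one-field model, used only for illustration."""
    organ: str = Field(..., description="Organ where the ablation was performed")


# Schema generation: field descriptions travel with the JSON schema.
schema = Report.model_json_schema()
print(json.dumps(schema, indent=2))

# Serialization round trip: object -> JSON string -> object.
as_json = Report(organ="Liver").model_dump_json()
restored = Report.model_validate_json(as_json)
print(restored)
```

The descriptions embedded in the schema are what allow a library such as Instructor to tell the LLM what each field means.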


To create a Pydantic model, simply define a class that inherits from `BaseModel`. In the next code block, fields are customized using the `Field` function, together with types from the `typing` package. Combining the two lets us force the LLM to respond only in the desired format:
- `str`: Free-form response. The model is not restricted, although `max_length` can be used to limit the field.
- `Literal`: Similar to a multiple-choice question. The LLM can choose only one of the listed options.
- `List`: The LLM returns multiple items in a list. We use `List` in tandem with `Literal` to force the LLM to answer with specific terminology, like ticking checkboxes.

**N.B.:** The term 'model' is used heavily in AI. When we refer to a Pydantic 'model', we do not mean an AI model or LLM. Instead, it means a model of how the data should be represented.
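
As a small illustration of the three field styles just described, the sketch below builds a hypothetical `Choices` model and shows that each field rejects values outside its constraint. Pydantic v2 is assumed.

```python
from typing import List, Literal

from pydantic import BaseModel, Field, ValidationError


class Choices(BaseModel):
    """Hypothetical model showing the three field styles described above."""
    note: str = Field(..., max_length=20)            # free text, length-limited
    side: Literal["Left", "Right"]                   # exactly one allowed value
    tissues: List[Literal["Muscle", "Bone", "Fat"]]  # any number of allowed values


ok = Choices(note="short note", side="Left", tissues=["Muscle", "Bone"])
print(ok.model_dump())

# Each invalid field produces its own error entry.
try:
    Choices(note="x" * 50, side="Middle", tissues=["Skin"])
    bad_fields = set()
except ValidationError as err:
    bad_fields = {e["loc"][0] for e in err.errors()}
    print(sorted(bad_fields))
```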

In [33]:
# Define a "Test" response model to understand the pydantic models

class TestModel(BaseModel):
    # Each attribute has a description that will be used by the model to generate the response
    organ: str = Field(...,
        description="Extract the organ where the ablation was performed. Indicate 'Not Mentioned' if not specified. Use the provided dictionary to expand abbreviations."
    )
    location: str = Field(...,
        description="Extract the specific anatomical location within the organ where the ablation was performed. Indicate 'Not Mentioned' if not specified. Use the provided dictionary to expand abbreviations."
    )
    tissueType: List[Literal['Muscle', 'Nerve', 'Fat', 'Ligament', 'Tendon', 'Cartilage', 'Bone', 'Not Mentioned']] = Field(...,
        description="Specify the tissue type ablated: 'Muscle', 'Nerve', 'Fat', 'Ligament', 'Tendon', 'Cartilage', 'Bone', or 'Not Mentioned'. You can choose multiple tissues"
    )
    complications: Literal['Yes', 'No', 'Not Mentioned'] = Field(...,
        description="Specify whether complications occurred: 'Yes', 'No', or 'Not Mentioned'."
    )

    # We can include an example in the Pydantic model, so the LLM behaves as in a few-shot task.
    model_config = ConfigDict(
        json_schema_extra={
        'examples':
            [
                {
                    "organ": "Liver",
                    "location": "Dome",
                    "tissueType": ["Bone"],
                    "complications": "No",
                }
            ]
        }
    )

Let's print the response model to take a closer look:

In [34]:
# Print the model fields and examples
rich.print('Test Response Model Fields:', TestModel.model_fields)
rich.print('Test Response Model Example:', TestModel.model_config)

Now that we have a test response model, let's ask the LLM again, this time passing the model as the expected response format, so we can also remove the response structure from the `user_prompt`. Note the `response_model` parameter in the `client.chat.completions.create` function call. It was `str` before, meaning the response could be any legal string value. With `TestModel` as the `response_model`, the LLM's response must conform to `TestModel`. This time it should work!

In [35]:
# Test prompts
test2_system_prompt = "You are an expert in extracting data from radiology reports with 20 years of experience."
test2_user_prompt = """
Extract data elements from the MR guided ablation report in the <report> tag:

    Guidelines:
    - Focus on findings at the time of scan, not previous ones.
    - If information is not mentioned, use 'Not Mentioned'.
    - Ignore irrelevant information.
    - Use only the provided output format.
    - Expand abbreviations: sv (seminal vesicle), uvj (urethro vesicular junction), vuj (vesico urethral junction), VM (vascular malformation), US (ultrasound), LN (lymph node), CT (computed tomography), MRI (magnetic resonance imaging).
    """

# Loading a sample report from the yaml file
sample_report = yaml['sample_report']

In [36]:
# Change the client mode to only accept json objects
client = instructor.from_openai(
    OpenAI(
        base_url=f"{HOST}/v1",
        api_key=API_KEY,  # required, but unused
    ),
    # mode: for more information: https://jxnl.github.io/instructor/concepts/patching/
    mode=instructor.Mode.JSON,
)

In [37]:
# Let's ask the model

# Creating the conversation for the model to pass report and instructions
messages = [
        {"role": "system", "content": test2_system_prompt},
        {"role": "user", "content": f"{test2_user_prompt} \n <report> {sample_report} </report>"}
    ]


# Asking the model to extract the requested information
resp = client.chat.completions.create(
        model=MODEL,
        response_model=TestModel,
        messages=messages,
        temperature=TEMP,
        seed=SEED,
        max_retries=MAX_RETRIES,
        ) #type: ignore


rich.print('Output:', resp.model_dump_json(indent=4))

Do you see the difference? By passing a structure to the `response_model` parameter, we can extract the requested information from the text and present it in a structured format. We also shorten the user prompt, because the formatting instructions no longer need to be written by hand.

#### Creating a Pydantic model based on the YAML file

Depending on the task, we can hardcode the response model or define a function that creates the response model from a YAML file. The advantages of a YAML file are that it is more human-readable and easier to share with colleagues.

In [38]:
# Creating a helper function to generate the response model from the yaml file
def create_pydantic_model_from_yaml(variables: list[dict], examples: str):
    """
    Create a Pydantic model from the variables and examples in the yaml file.
    `examples` is a JSON string, which is parsed below.
    """
    field_definitions = {}
    examples = json.loads(examples)

    for var in variables:
        name = var['name']
        var_type = var['type']
        options = var.get('options')
        description = var['hint']

        if options:
            if var_type == "list":
                field_type = List[Literal[tuple(options)]]
            else:
                field_type = Literal[tuple(options)]
        else:
            if var_type == "str":
                field_type = str
            else:
                # Handle other types as needed
                field_type = Any

        # Create the field definition
        field_definitions[name] = (field_type, Field(description=description))

    # Create a config with json_schema_extra
    model_config = ConfigDict(json_schema_extra={"examples": examples})

    # Create the model using create_model
    ResponseClass = create_model(
        'ResponseClass',
        **field_definitions,
        __config__=model_config
    )
    return ResponseClass
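
To illustrate the kind of input this helper reads, here is a self-contained sketch of the same `create_model` technique. The `variables` entries are hypothetical; the real ones live in the `variables` section of the YAML file, which is not shown in full above.

```python
from typing import List, Literal

from pydantic import Field, ValidationError, create_model

# Hypothetical entries mirroring the keys the helper reads: name, type, options, hint.
variables = [
    {"name": "organ", "type": "str", "hint": "Organ where the ablation was performed"},
    {"name": "complications", "type": "str", "options": ["Yes", "No", "Not Mentioned"],
     "hint": "Whether complications occurred"},
    {"name": "tissueType", "type": "list", "options": ["Muscle", "Bone"],
     "hint": "Tissue types ablated"},
]

fields = {}
for var in variables:
    options = var.get("options")
    if options:
        choice = Literal[tuple(options)]  # same as Literal["Yes", "No", ...]
        field_type = List[choice] if var["type"] == "list" else choice
    else:
        field_type = str
    fields[var["name"]] = (field_type, Field(description=var["hint"]))

Demo = create_model("Demo", **fields)

row = Demo(organ="Liver", complications="No", tissueType=["Bone"])
print(row.model_dump())

# A value outside the declared options is rejected.
try:
    Demo(organ="Liver", complications="Unknown", tissueType=["Bone"])
    rejected = False
except ValidationError:
    rejected = True
```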

Now, we can create the "Extraction" response model based on the yaml file and print it.

In [39]:
# Creating the response model
Extraction = create_pydantic_model_from_yaml(VARIABLES, EXAMPLES)

# Printing the model fields and examples
rich.print('Extraction Response Model Fields:', Extraction.model_fields)
rich.print('Extraction Response Model Example:', Extraction.model_config)

## Step 4: Prompting
We are almost ready to prompt the model and get the response in desired format. Let's load the `system prompt`  and `user prompt` from the YAML file.

In [41]:
# Loading variables from the yaml file
SYSTEM_PROMPT = MESSAGES['system_prompt']
USER_PROMPT = MESSAGES['user_prompt']

To check if we stored the correct variables:

In [42]:
rich.print('System prompt:', SYSTEM_PROMPT)
rich.print('User prompt:', USER_PROMPT)

It is time to prompt the LLM using the same real sample report and see what happens:

In [43]:
sample_report = yaml['sample_report']

In [44]:
# Creating the conversation for the model to pass report and instructions
messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"{USER_PROMPT} \n <report> {sample_report} </report>"}
    ]

In [45]:
# Asking the model to extract the requested information
resp = client.chat.completions.create(
        model=MODEL,
        response_model=Extraction,
        messages=messages,
        temperature=TEMP,
        seed=SEED,
        max_retries=MAX_RETRIES,
        )


rich.print(resp.model_dump_json(indent=4))


## Step 5: Batch Processing

We have successfully tested our model and established a response structure. Often, we need to extract information from multiple reports and save it in a CSV file for further analysis. In this step, we will develop an engine that processes these reports and stores the results in an `output.csv` file.

In [None]:
# Creating a helper function to process a single report
def process_report(client, response_model, report_data, model, temperature, seed, max_retries, system_prompt, user_prompt):
    """
    Process a single report and return the combined data.
    """
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"{user_prompt} \n <report> {report_data['report']} </report>"}
    ]
    try:
        resp = client.chat.completions.create(
            model=model,
            response_model=response_model,
            messages=messages,
            temperature=temperature,
            seed=seed,
            max_retries=max_retries,
        )
        resp_dict = json.loads(resp.model_dump_json())
        return {**resp_dict, **report_data}
    except Exception as e:
        logging.error(f"An error occurred while processing report: {e}")
        return None

In [None]:
# Creating the main engine function to process multiple reports
def engine(
        input: List[Dict],
        output: str,
        log_file: str,
        response_model: type[BaseModel] = Extraction,
        model: str = MODEL,
        temperature: float = TEMP,
        seed: int = SEED,
        max_retries: int = MAX_RETRIES,
        host: str = HOST,
        api_key: str = API_KEY,
        system_prompt: str = SYSTEM_PROMPT,
        user_prompt: str = USER_PROMPT,
        concurrency: int = CONFIG['concurrency']
) -> str:
    """
    Extracts information from the provided reports and stores the results in a CSV file.
    Processes up to `concurrency` reports simultaneously and shows progress.
    Logs all variables, total time, and average time per report.
    """
    # Configure logging
    logging.basicConfig(filename=log_file, level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
    logging.info("Starting the engine function")
    logging.info(f"Parameters: model={model}, temperature={temperature}, seed={seed}, max_retries={max_retries}, host={host}, api_key={api_key}, \n system_prompt={system_prompt}, \n user_prompt={user_prompt}")

    start_time = time.time()

    # Initialize the OpenAI client
    client = instructor.from_openai(
        OpenAI(
            base_url=f"{host}/v1",
            api_key=api_key,
        ),
        mode=instructor.Mode.JSON,
    )

    # Check if the output exists
    if os.path.exists(output):
        logging.warning(f"Output file {output} already exists. The file content will be REPLACED.")

    # Open the CSV file for writing
    with open(output, 'w', newline='', encoding='utf-8') as csvfile:
        csvwriter = None

        # Create a thread pool with a higher number of workers
        max_workers = max(1, min(concurrency, len(input)))  # Use up to `concurrency` workers or the number of reports, whichever is smaller (at least 1)
        with ThreadPoolExecutor(max_workers=max_workers) as executor:
            # Initialize a progress bar
            with tqdm(total=len(input), desc="Processing Reports", unit="report") as pbar:
                # Submit tasks for each report
                future_to_report = {executor.submit(process_report, client, response_model, report_data, model, temperature, seed, max_retries, system_prompt, user_prompt): report_data for report_data in input}

                for future in as_completed(future_to_report):
                    report_data = future_to_report[future]
                    try:
                        combined_data = future.result()
                        if combined_data:
                            # Initialize the CSV writer and write the header if it's the first row
                            if csvwriter is None:
                                csvwriter = csv.DictWriter(csvfile, fieldnames=combined_data.keys())
                                csvwriter.writeheader()

                            # Write the combined data to the CSV file
                            csvwriter.writerow(combined_data)

                        # Update the progress bar
                        pbar.update(1)
                    except Exception as e:
                        logging.error(f"An error occurred while processing report: {e}")

    end_time = time.time()
    total_time = end_time - start_time
    average_time_per_report = total_time / len(input) if input else 0

    # Log the total and average time
    logging.info(f"Total processing time: {total_time:.2f} seconds")
    logging.info(f"Average time per report: {average_time_per_report:.2f} seconds")

    return f"Data has been successfully written to {output}"
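
The concurrency pattern inside `engine` can be tried in isolation. In the sketch below, `fake_extract` is a hypothetical stand-in for the LLM call, and an in-memory buffer replaces the output file. Unlike `engine`, the sketch fixes the CSV header up front instead of deriving it from the first result.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_extract(report):
    """Hypothetical worker standing in for an LLM call; fails on empty text."""
    if not report["report"]:
        return None
    return {"id": report["id"], "length": len(report["report"])}


reports = [
    {"id": 1, "report": "Ablation of the liver dome."},
    {"id": 2, "report": ""},
    {"id": 3, "report": "Cryoablation, left kidney."},
]

buffer = io.StringIO()  # in-memory stand-in for the output file
writer = csv.DictWriter(buffer, fieldnames=["id", "length"])
writer.writeheader()

with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(fake_extract, r) for r in reports]
    for future in as_completed(futures):  # yields in completion order, not input order
        row = future.result()
        if row is not None:  # skip reports that could not be processed
            writer.writerow(row)

rows = sorted(csv.DictReader(io.StringIO(buffer.getvalue())), key=lambda r: r["id"])
print(rows)
```

Rows are written only from the main thread, so the CSV writer needs no locking. Because results arrive in completion order, keeping an identifier such as the accession number in every row matters.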

Now that we have the engine, we can load our data:

In [None]:
import pandas as pd
# Load the Excel file with the reports
df = pd.read_excel("reports.xlsx")

# Collect each report's text and accession number
reports = [{"report": row['Radiology Report'], "Accession Number": row['Accession Number']} for _, row in df.iterrows()]

# Paths for the CSV output and the log file used by the engine
output_file = "output.csv"
log_file = "log.log"

In [None]:
if __name__ == "__main__":
    engine(
        input=reports,
        output=output_file,
        log_file=log_file,
    )

Authors:
Ali Ganjizadeh, Bradley J. Erickson

This notebook is part of [MIDeL.org](http://midel.org/). `MIDeL` is a website to help healthcare professionals and medical imaging scientists learn to apply deep learning methods to medical images.