**AI & Machine Learning (KAN-CINTO4003U) - Copenhagen Business School | Spring 2025**

***


<p align="center">
<img src="media/instructor_header.png" alt="LLM" width="800"/> <br>
Image from <a href="https://medium.com/thoughts-on-machine-learning">Thoughts on Machine Learning</a>'s "<i><a href="https://medium.com/thoughts-on-machine-learning/drop-langchain-instructor-is-all-you-need-for-your-llm-based-applications-aed13e9b908b">Drop LangChain, Instructor Is All You Need For Your LLM-Based Applications"</a><br>by FS Ndzomga</i>. Copyright © 2025. All rights reserved.
</p>

***
Sources: <br>
- [Drop LangChain, Instructor Is All You Need For Your LLM-Based Applications (Medium)](https://medium.com/thoughts-on-machine-learning/drop-langchain-instructor-is-all-you-need-for-your-llm-based-applications-aed13e9b908b)


# Instructor

[Instructor](https://python.useinstructor.com/#getting-started) is a python package that makes it easy to get structured data like JSON from LLMs like GPT-3.5, GPT-4, GPT-4-Vision, and open-source models including Mistral/Mixtral, Ollama, and llama-cpp-python - and WatsonX.ai models as we are using in this course. It stands out for its simplicity, transparency, and user-centric design, built on top of Pydantic. Instructor helps you manage validation context, retries with Tenacity, and streaming Lists and Partial responses.

| Feature | Description |
|---------|-------------|
| Simple API with Full Prompt Control | Instructor provides a straightforward API that gives you complete ownership and control over your prompts. This allows for fine-tuned customization and optimization of your LLM interactions. |
| Multi-Language Support | Simplify structured data extraction from LLMs with type hints and validation. Supports Python, TypeScript, Ruby, Go, Elixir, and Rust. |
| Reasking and Validation | Automatically reask the model when validation fails, ensuring high-quality outputs. Leverage Pydantic's validation for robust error handling. |
| Streaming Support | Stream partial results and iterables with ease, allowing for real-time processing and improved responsiveness in your applications. |
| Powered by Type Hints | Leverage Pydantic for schema validation, prompting control, less code, and IDE integration. |
| Simplified LLM Interactions | Support for OpenAI, Anthropic, Google, Vertex AI, Mistral/Mixtral, Ollama, llama-cpp-python, Cohere, LiteLLM. |

Simply put, we can use Instructor to extract structured data from LLMs, instead of just plain test. In practically all cases, we want more than just a text dump from an LLM, and postprocessing LLM outputs can be a tedious, error-prone task. 

# LiteLLM

One of the disadvantages of working with different LLM vendors (Azure AI, OpenAI, Anthropic, WatsonX etc.) is that they all have different API schemas. This means that we often have to build platform-specificer adapters if we are working with models from multiple places. [LiteLLM](https://docs.litellm.ai/) is an open source package that enable us to call [100+ LLMs from 56 providers](https://docs.litellm.ai/docs/providers) using the standard OpenAI Input/Output format. Behind the scenes, LiteLLM translates inputs to any provider's completion, embedding, and image_generation endpoints, and they ensures that we get a response that follows the OpenAI API schema back. 

You can read exactly how the [API works for WatsonX.ai here](https://docs.litellm.ai/docs/providers/watsonx)

# Putting it together

With LiteLLM we can initialize the WatsonX.ai models and then use Instructor to extract structured data from the LLMs. This is a powerful combination that allows us to work with multiple LLM vendors without having to worry about the differences in their APIs.

In [1]:
# built-in libraries
from typing import TypeVar, Literal, Any

# litellm libraries
import litellm
from litellm import completion
from instructor import Mode, from_litellm

# misc libraries
from decouple import config
from pydantic import BaseModel, Field, create_model
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams

Let's start by loading our WatsonX.ai credentials again

In [2]:
import json

json_file_path = "/Users/henrikjacobsen/Desktop/CBS/Semester 2/Artifical Intelligence and Machine Learning/apikey.json"

with open(json_file_path, "r") as file:
    data = json.load(file)

WX_API_KEY = data.get("apikey")

if WX_API_KEY:
    print("API Key loaded successfully!")
else:
    print("Error: API Key not found in JSON file.")

WX_PROJECT_ID = "0a2386df-d12c-40ee-bda2-190a5c6cc1fd"
WX_API_URL = "https://us-south.ml.cloud.ibm.com"

API Key loaded successfully!


Let's call a model from WatsonX.ai with LiteLLM first.

In [3]:
## Call WATSONX `/text/chat` endpoint - supports function calling
response = completion(
  model="watsonx/meta-llama/llama-3-1-8b-instruct",
  messages=[{ "content": "what is your favorite colour?","role": "user"}],
  project_id=WX_PROJECT_ID,
  api_key=WX_API_KEY,
  base_url=WX_API_URL,
)

In [4]:
response



In [5]:
response.choices[0].message.content

"I don't have a personal preference or emotions, so I don't have a favorite color. However, I can help you explore colors and their meanings or properties if you're interested."

In [6]:

## Call WATSONX `/text/generation` endpoint - not all models support /chat route. 
response = completion(
  model="watsonx/ibm/granite-3-2-8b-instruct",
  messages=[{ "content": "Write a haiku about the singularity","role": "user"}],
  project_id=WX_PROJECT_ID,
  api_key=WX_API_KEY,
  base_url=WX_API_URL,
)


In [7]:
print(response.choices[0].message.content)

Silicon heart beats,
Binary dawn whispers waves,
Singularity blooms.


Great! Now, let's see how we can use `instructor` to pair with this neat interface. 

In [8]:
litellm.drop_params = True  # watsonx.ai doesn't support `json_mode`
client = from_litellm(completion, mode=Mode.JSON)  # create an instructor client from litellm

First we need to create a so-called `response_model`. This is a Pydantic model that defines the structure of the data we want to extract from the LLM. This is done using `pydantic` - another really great library for data validation and settings management. Pydantic is used by `instructor` to validate the data we get back from the LLM, and it also helps us to define the structure of the data we want to extract.

Consider the example `Response` below

In [9]:
# create a response model
class Response(BaseModel): # <--- BaseModel is a Pydantic class

    # ask the LLM to return a short reasoning - Remember how reasoning can help LLMs?
    reasoning : str = Field(description="The short reasoning behind the answer")
    # ask the LLM to return the answer as a separate field
    answer : float = Field(description="Your answer to the question as a float")

We see that we are asking the LLM for two separate outputs:

1. Reasoning of type `str` with a description added to give the LLM more context.
2. The answer of type `str`, also with a description added to give the LLM more context.

If we wanted to, for example, extract a reasoning an a float score, we could have done something like this:

```python
from pydantic import BaseModel, Field

class Response(BaseModel):
    reasoning: str = Field(description="The reasoning behind the answer")
    score: float = Field(description="The score of the answer")
```

We could even create nested models, like so:

```python
from pydantic import BaseModel, Field

class Reasoning(BaseModel):
    reasoning: str = Field(description="The reasoning behind the answer")
    score: float = Field(description="The score of the answer")

class Response(BaseModel):
    reasoning: Reasoning = Field(description="The reasoning behind the answer")
    answer: str = Field(description="The answer to the question")
```

Now, it should be noted that if we create more complex models, we might run into issues with smaller models - and even some bigger ones. Effectively, putting the answer into a response model can be considered an additional task we are asking the LLM to perform. Hence, we generally want to keep the response models as simple as possible.

Let's see how we then use the response model we have created. 

In [10]:

# define a prompt
prompt = """You are a cat expert. Answer the following question about cats:
Q: What is the average lifespan of a cat?
Provide your answer as an object of Response""" # <-- We ask the model to return the answer as an object of Response

# make a request to the LLM
response = client.chat.completions.create( # <- Use the client we just created
            model="watsonx/ibm/granite-3-2-8b-instruct", # <--- model name from watsonx.ai
            messages=[
                {
                    "role": "user",
                    "content": prompt,  # <- Our prompt
                }
            ],
            project_id=WX_PROJECT_ID, # <- Our credentials
            apikey=WX_API_KEY,
            api_base=WX_API_URL,
            response_model=Response, # <- Inform the LLM of the response model
)


[1;31mGive Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new[0m
LiteLLM.Info: If you need to debug this error, use `litellm._turn_on_debug()'.


[1;31mGive Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new[0m
LiteLLM.Info: If you need to debug this error, use `litellm._turn_on_debug()'.


[1;31mGive Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new[0m
LiteLLM.Info: If you need to debug this error, use `litellm._turn_on_debug()'.



InstructorRetryException: litellm.APIConnectionError: API key is required
Traceback (most recent call last):
  File "/Applications/anaconda3/envs/aiml25-ma3/lib/python3.11/site-packages/litellm/main.py", line 2714, in completion
    response = watsonx_chat_completion.completion(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Applications/anaconda3/envs/aiml25-ma3/lib/python3.11/site-packages/litellm/llms/watsonx/chat/handler.py", line 46, in completion
    headers = watsonx_chat_transformation.validate_environment(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Applications/anaconda3/envs/aiml25-ma3/lib/python3.11/site-packages/litellm/llms/watsonx/common_utils.py", line 187, in validate_environment
    token = _generate_watsonx_token(api_key=api_key, token=token)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Applications/anaconda3/envs/aiml25-ma3/lib/python3.11/site-packages/litellm/llms/watsonx/common_utils.py", line 75, in _generate_watsonx_token
    token = generate_iam_token(api_key)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Applications/anaconda3/envs/aiml25-ma3/lib/python3.11/site-packages/litellm/llms/watsonx/common_utils.py", line 43, in generate_iam_token
    raise ValueError("API key is required")
ValueError: API key is required


In [None]:
response

Response(reasoning='The average lifespan of a cat is estimated by various sources, including the American Veterinary Medical Association and the American Society for the Prevention of Cruelty to Animals. These sources suggest that the average indoor cat lives between 13 and 17 years, while an outdoor cat typically lives between 3 to 5 years due to various dangers.', answer=15.5)

In [None]:
response.reasoning # <- Access the reasoning field

'The average lifespan of a cat is estimated by various sources, including the American Veterinary Medical Association and the American Society for the Prevention of Cruelty to Animals. These sources suggest that the average indoor cat lives between 13 and 17 years, while an outdoor cat typically lives between 3 to 5 years due to various dangers.'

In [None]:
response.answer # <- Access the answer field

15.5

Going further, if we want the model to **only** be able to choose one of *n* answers, we can use the type `Literal`. This is a type hint that allows us to specify that the value of a field must be one of a set of literal values. For example, if we want the model to only be able to choose between "Yes" and "No", we can do this:

In [None]:
# create a response model
class Response(BaseModel): # <--- BaseModel is a Pydantic class

    # ask the LLM to return a short reasoning - Remember how reasoning can help LLMs?
    answer : Literal["Yes", "No"] = Field(description="Your answer to the question")

prompt = """You are a cat expert. Answer the following question about cats:

Q: Is it true that cats have nine lives?

Provide your answer as an object of Response""" # <-- We ask the model to return the answer as an object of Response

# make a request to the LLM
response = client.chat.completions.create( # <- Use the client we just created
            model="watsonx/ibm/granite-3-2-8b-instruct", # <--- model name from watsonx.ai
            messages=[
                {
                    "role": "user",
                    "content": prompt,  # <- Our prompt
                }
            ],
            project_id=WX_PROJECT_ID, # <- Our credentials
            apikey=WX_API_KEY,
            api_base=WX_API_URL,
            response_model=Response, # <- Inform the LLM of the response model
)

response

Response(answer='No')

Pretty neat, right?

***

But what if we don't want to define response models for every call we make to an LLM? `pydantic` (and therefore `instructor`) supports *dynamic* response models, via the `create_model` function. 

We can use that like shown below. Note that we have to define the type (i.e. str, int, float or bool) of each response field and add a `Field` object as well. The Field object can be used to define default values, add descriptions for the LLM etc. 

In [None]:
response_model = create_model(
    "MyResponseModel", 
    reasoning=(str, Field(description="The short reasoning behind the answer")),
    answer=(str, Field(description="Your answer to the question")),
    __base__=BaseModel
) 

In [None]:
response_model(reasoning="what the LLM would reason about", answer="what the LLM would answer")

MyResponseModel(reasoning='what the LLM would reason about', answer='what the LLM would answer')

To make our life even easier, here is a class - `LLMCaller` that will do everything we just did for us. 

In [None]:
class BaseResponse(BaseModel):
    """A default response model that defines a single 
    field `answer` to store the response from the LLM.
    We will use this when there is no need to create
    a custom response model."""
    answer: str


# Define a type variable for the response model
# this you can ignore for now - it is just for type hinting
ResponseType = TypeVar('ResponseType', bound=BaseModel)


class LLMCaller:
    """ A class to interact with an LLM  using the LiteLLM and Instructor
    libraries. This class is designed to simplify the process of sending
    prompts to an LLM and receiving structured responses. """

    def __init__(self, api_key: str, project_id: str, api_url: str, model_id: str, params: dict[str, Any]):
        """
        Initializes the LLMCaller instance with the necessary credentials and configuration.

        Args:
            api_key (str): The API key for authenticating with the LLM service.
            project_id (str): The project ID associated with the LLM service.
            api_url (str): The base URL for the LLM service API.
            model_id (str): The identifier of the specific LLM model to use.
            params (dict[str, Any]): Additional parameters to configure the LLM's behavior.
        """
        self.api_key = api_key
        self.project_id = project_id
        self.api_url = api_url
        self.model_id = model_id
        self.params = params

        # Boilerplate: Configure LiteLLM to drop unsupported parameters for Watsonx.ai
        litellm.drop_params = True
        # Boilerplate: Create an Instructor client for pydantic-based interactions with the LLM
        self.client = from_litellm(completion, mode=Mode.JSON)

    def create_response_model(self, title: str, fields: dict) -> ResponseType:
        """ Dynamically creates a Pydantic response model for the LLM's output.
        Args:
            title (str): The name of the response model.
            fields (dict): A dictionary defining the fields of the response model.
                           Keys are field names, and values are tuples of (type, Field).

        Returns:
            ResponseType: A dynamically created Pydantic model class.
        """
        return create_model(title, **fields, __base__=BaseResponse)

    def invoke(self, prompt: str, response_model: ResponseType = BaseResponse, **kwargs) -> ResponseType:
        """ Sends a prompt to the LLM and retrieves a structured response.

        Args:
            prompt (str): The input prompt to send to the LLM.
            response_model (ResponseType): The Pydantic model to structure the LLM's response.
                                           Defaults to BaseResponse.
            **kwargs: Additional arguments to pass to the LLM client.

        Returns:
            ResponseType: The structured response from the LLM, parsed into the specified response model.
        """
        response = self.client.chat.completions.create(
            model=self.model_id,
            messages=[
                {
                    "role": "user",
                    "content": prompt + "\n\n" + f"Provide your answer as an object of {type(response_model)}",
                    # notice how we hardcode instructions on the responde model type for the llm
                    # so we don't have to repeat it in the prompt
                }
            ],
            project_id=self.project_id,
            apikey=self.api_key,
            api_base=self.api_url,
            response_model=response_model,
            **kwargs
        )
        return response

In [None]:
model = LLMCaller(
    api_key=WX_API_KEY,  # <- Our credentials
    project_id=WX_PROJECT_ID,
    api_url=WX_API_URL,
    model_id="watsonx/meta-llama/llama-3-3-70b-instruct",  # <- model name from watsonx.ai
    params={GenParams.MAX_NEW_TOKENS: 100}  # <- additional parameters for the LLM
)

In [None]:
model.invoke("What is a good name for a bee?")  # call with no response model - meaning we will use the default one

BaseResponse(answer='A good name for a bee could be Buzz or Honey.')

And if we want to feed in our dynamic response model, we can do that as well.

In [None]:
response = model.invoke(
    prompt="What is a good name for a bee? Think carefully.", 
    response_model=model.create_response_model(  # create a response model dynamically
        "BeeName", 
        {
            "reasoning": (str, Field(...)),
            "bee_name": (str, Field
                (
                    ...,
                    description="The name of the bee."
                )
            )
        }
    )
)

print(response.answer)
print(response.reasoning)

A good name for a bee could be Buzzina
The name Buzzina is a play on the word 'buzz', which is the sound bees make when they fly. It's a cute and memorable name that suits a busy and energetic bee


***

Hopefully, you see how valuable the combination of `instructor` and `litellm` can be.