# Running LLMs Locally with Docker Model Runner and Python



One of the great features of Docker Model Runner is its compatibility with the OpenAI API SDKs. This makes it easy to adapt existing code that uses the OpenAI API to work with DMR and interact with locally running LLMs. In this tutorial, we'll focus on the Python SDK, though the same approach applies to other OpenAI SDKs like JavaScript, Java, Go, .NET, and more.

DMR runs as a standalone server so that you can connect to it from both containerized environments and regular local Python environments.

<figure>
 <img src="assets/dmr_diagram.png" width="100%" align="center"/>
</figure>


This notebook focuses on running LLMs locally with Python using:
- OpenAI API SDK
- LangChain (OpenAI client)

## The OpenAI API SDK

Let's start by importing the `openai` library:

In [36]:
import openai

Next, we’ll create the client with the `OpenAI` class. To connect to the DMR server, we need to set the `base_url` parameter. The value of this URL depends on whether you're running the code inside a container or from your local environment.

If you're running the code inside a container, use the following:

In [37]:
base_url = "http://model-runner.docker.internal/engines/v1"

If you're running it locally (outside of a container), use:

In [38]:
# base_url = "http://localhost:12434/engines/v1"

> Note that the localhost URL must include the TCP port on which DMR is exposed, in this case 12434.
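
One way to avoid editing the URL by hand is to choose it at runtime. The sketch below is an illustration, not part of DMR: the helper name `resolve_base_url` and the `DMR_BASE_URL` environment variable are hypothetical, and checking for `/.dockerenv` is only a common heuristic for detecting a Docker container, so it may not hold in every setup.

```python
import os
from pathlib import Path

# The two endpoints used in this notebook.
CONTAINER_URL = "http://model-runner.docker.internal/engines/v1"
HOST_URL = "http://localhost:12434/engines/v1"


def resolve_base_url(in_container=None):
    """Return the DMR base URL for the current environment (hypothetical helper)."""
    # An explicit override wins; DMR_BASE_URL is a made-up variable name.
    override = os.environ.get("DMR_BASE_URL")
    if override:
        return override
    # Heuristic (assumption): Docker usually creates /.dockerenv inside containers.
    if in_container is None:
        in_container = Path("/.dockerenv").exists()
    return CONTAINER_URL if in_container else HOST_URL
```

With this helper, `base_url = resolve_base_url()` could replace the manual assignment, and `resolve_base_url(in_container=False)` forces the localhost URL when the heuristic guesses wrong.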

Let's initialize the OpenAI client:

In [39]:
client = openai.OpenAI(
    base_url=base_url,
    api_key="docker",
)

Next, let's call the `chat.completions.create` method:

In [40]:
completion = client.chat.completions.create(
    model="ai/llama3.2:latest",
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "Tell me a Docker joke"},
    ],
)

The prompt is sent to the local LLM, which returns the response:

In [47]:
print(completion.choices[0].message.content)

Here's one:

Why did Docker go to therapy?

Because it was struggling to container-ize its emotions!

(Sorry, I know it's a bit of a "docker"-ious pun, but I hope it made you smile!)
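
To make the OpenAI compatibility more concrete, the sketch below builds roughly the HTTP request that the SDK call corresponds to, using only the standard library. It assumes the chat endpoint lives at `/chat/completions` under the base URL, as is conventional for OpenAI-compatible servers. The helper names `build_chat_request` and `extract_reply` are hypothetical, and the request is built but not sent, since sending needs a running DMR server.

```python
import json
import urllib.request


def build_chat_request(base_url, model, messages, api_key="docker"):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def extract_reply(response_json):
    """Pull the assistant text out of a chat completions response body."""
    return response_json["choices"][0]["message"]["content"]


request = build_chat_request(
    "http://localhost:12434/engines/v1",
    "ai/llama3.2:latest",
    [{"role": "user", "content": "Tell me a Docker joke"}],
)

# Sending requires a running DMR server, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     print(extract_reply(json.load(response)))
```

The `extract_reply` helper follows the same path through the response as `completion.choices[0].message.content` in the SDK version.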


## Using DMR with LangChain

Likewise, we can use LangChain's OpenAI SDK wrapper to call local LLMs with DMR. Let's start by importing the required LangChain classes:

In [42]:
from langchain_openai import ChatOpenAI

from langchain_core.messages import (
  SystemMessage,
  HumanMessage
)

Next, we will instantiate the `ChatOpenAI` class with the DMR parameters:

In [43]:
llm = ChatOpenAI(
    base_url="http://model-runner.docker.internal/engines/v1",
    api_key="docker",
    temperature=0,
    model="ai/llama3.2:latest",
)

We build the prompt with LangChain's built-in message classes:

In [44]:
prompt = [
  SystemMessage(
  """
  You are an AI assistant that helps users with their questions. Your responses should be helpful and informative.
  """
  ),
  HumanMessage("What is the capital of the United States of America?")
]
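
A triple-quoted string keeps its leading newline and indentation, so that whitespace is sent to the model as part of the system prompt. The sketch below shows one possible way to normalize the text first, using only the standard library. `clean_prompt` is a hypothetical helper name, not a LangChain function.

```python
import textwrap


def clean_prompt(text):
    """Remove common indentation and surrounding blank lines from a prompt."""
    return textwrap.dedent(text).strip()


raw = """
  You are an AI assistant that helps users with their questions.
  Your responses should be helpful and informative.
  """

print(clean_prompt(raw))
```

The cleaned string could then be passed to `SystemMessage(clean_prompt(raw))`. Whether the extra whitespace affects the response depends on the model.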

Then we send the prompt to the model:

In [45]:
result = llm.invoke(prompt)

Let's print the results:

In [46]:
print(result.content)

The capital of the United States of America is Washington, D.C. (short for District of Columbia).
