# Running Models Locally with Docker Model Runner

Docker Model Runner (DMR) is a Docker Desktop feature for running Large Language Models (LLMs) natively on your machine. It follows the familiar Docker workflow:

- Pull models from registries (e.g., Docker Hub)
- Run models locally with GPU acceleration
- Integrate models into the development workflows

Keep in mind that an LLM's performance depends on the model size and on the resources available on your machine.

>This feature is currently in Beta and requires Docker Engine (Linux), Docker Desktop 4.40 and above for macOS, or Docker Desktop 4.41 and above for Windows. For hardware requirements, please check the Docker Model Runner documentation.

DMR runs as a standalone server so that you can connect to it from both containerized environments and regular local Python environments:

<figure>
 <img src="assets/docker model runner.png" width="85%" align="center"/>
<figcaption> Docker Model Runner Server </figcaption>
</figure>

DMR is compatible with the OpenAI API SDKs. This makes it easy to adapt existing code that uses the OpenAI API to work with DMR and interact with locally running LLMs. In this tutorial, we'll focus on the Python SDK, though the same approach applies to other OpenAI SDKs like JavaScript, Java, Go, .NET, and more.

As we demonstrated before with the Google Gemini and Anthropic Claude APIs, we first need to set up the client by defining the base URL and API key.

By default, DMR uses `docker` as the API key, and the base URL depends on where the calling code runs:
- Local virtual environment
- Containerized environment

In the following example, we will download a Llama 3.2 model from Docker Hub, and use the OpenAI API SDK to interact with the model locally.

## Download LLM from Docker Hub

We will download the model from Docker Hub using the `docker model pull` CLI command.

First, make sure that you are logged in to Docker Hub:

```shell
docker login
```

Next, we will confirm that DMR is active by using the `docker model status` command:

```shell
docker model status
```

The following output indicates that DMR is active and ready to use:
``` shell
Docker Model Runner is running

Status:
llama.cpp: running llama.cpp latest-metal (sha256:41df5190b7121f6509a278b8af657732f42b3715155893b5993ed4b28c53b92d) version: 82bf586
```
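If you want to run the same check from a Python script, a minimal sketch could look like the following. It assumes the status text contains the line `Docker Model Runner is running`, as in the output above; the exact wording may differ between versions. The helper names `is_model_runner_running` and `read_model_runner_status` are hypothetical and not part of any Docker tooling.

```python
import subprocess

# Assumption: this is the line DMR prints when it is active (taken from the output above).
RUNNING_MARKER = "Docker Model Runner is running"


def is_model_runner_running(status_output: str) -> bool:
    """Return True when the status text contains the 'running' line."""
    return RUNNING_MARKER in status_output


def read_model_runner_status() -> str:
    """Run `docker model status` and return its standard output as text."""
    result = subprocess.run(
        ["docker", "model", "status"],
        capture_output=True,
        text=True,
        check=False,  # do not raise if the command exits with a non-zero code
    )
    return result.stdout


# Offline example that uses the first lines of the output shown above.
sample_status = "Docker Model Runner is running\n\nStatus:\nllama.cpp: running\n"
print(is_model_runner_running(sample_status))
```

On a machine with Docker installed, `is_model_runner_running(read_model_runner_status())` would perform the live check.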

Next, we will pull the selected model from Docker Hub:

```shell
docker model pull ai/llama3.2:3B-Q4_0
```

You should expect the following output:

``` shell
Downloaded: 0.00 MB
Model pulled successfully
```

You can use the `docker model list` command to see the model details:

``` shell
docker model list
```

This returns the following:
``` shell
MODEL NAME                                   PARAMETERS  QUANTIZATION    ARCHITECTURE  MODEL ID      CREATED        SIZE
ai/llama3.2:latest                           3.21 B      IQ2_XXS/Q4_K_M  llama         436bb282b419  5 months ago   1.87 GiB
```
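To work with this table inside a script, one option is to split each line into columns. The sketch below assumes that columns are separated by two or more spaces, as in the output above, and that cell values themselves contain at most single spaces. `parse_model_list` is a hypothetical helper, not something provided by Docker.

```python
import re


def parse_model_list(table_text: str) -> list[dict[str, str]]:
    """Turn the text table printed by `docker model list` into a list of dicts."""
    lines = [line for line in table_text.splitlines() if line.strip()]
    if not lines:
        return []
    # Assumption: two or more spaces separate columns; single spaces stay inside a cell.
    header = re.split(r"\s{2,}", lines[0].strip())
    rows = []
    for line in lines[1:]:
        cells = re.split(r"\s{2,}", line.strip())
        rows.append(dict(zip(header, cells)))
    return rows


# The table from the output above, pasted as a string.
sample_table = (
    "MODEL NAME                                   PARAMETERS  QUANTIZATION    ARCHITECTURE  MODEL ID      CREATED        SIZE\n"
    "ai/llama3.2:latest                           3.21 B      IQ2_XXS/Q4_K_M  llama         436bb282b419  5 months ago   1.87 GiB\n"
)
models = parse_model_list(sample_table)
print(models[0]["MODEL NAME"], models[0]["SIZE"])
```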

> Note that DMR follows the Open Container Initiative (OCI) standards for model registry and supports the [GGUF](https://huggingface.co/docs/hub/en/gguf) file format for packaging models as OCI Artifacts. Therefore, you can download models that follow this format from Hugging Face.

Once the model is downloaded, we can start using it with Python.

## Using the OpenAI API Python SDK

Let's switch to Python and follow the same workflow we used with the OpenAI, Google Gemini, and Anthropic Claude APIs, this time through the OpenAI API Python SDK.

Let's start by loading the `openai` library:

```python
from openai import OpenAI
```

Next, we will set the client object. The base URL setting depends on whether you run your code in a containerized or local environment.

If you are running your code locally in a virtual environment (i.e., not inside a container), you should enable TCP access and set a port by running the following from the CLI:

```shell
docker desktop enable model-runner --tcp=12434
```
And then use the following URL:
```
http://localhost:12434/engines/v1
```

Here, `12434` is the port number that you exposed.

Otherwise, if you are using a containerized environment, you should use the following URL:

```
http://model-runner.docker.internal/engines/v1
```

```python
# Case 1: the code runs inside a container
base_url = "http://model-runner.docker.internal/engines/v1"

# Case 2: the code runs outside a container; uncomment the line below
# base_url = "http://localhost:12434/engines/v1"
```

```python
client = OpenAI(base_url=base_url, api_key="docker")
```
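Instead of commenting and uncommenting the base URL by hand, the choice can be wrapped in a small helper. This is only a sketch: `resolve_base_url` and `running_in_container` are hypothetical names, and the check for `/.dockerenv` is a common heuristic for detecting a Docker container rather than a guaranteed signal.

```python
import os

# URL for code running inside a container (from the section above).
CONTAINER_BASE_URL = "http://model-runner.docker.internal/engines/v1"


def resolve_base_url(in_container: bool, port: int = 12434) -> str:
    """Return the DMR base URL for a containerized or a local environment."""
    if in_container:
        return CONTAINER_BASE_URL
    # Local case: the port must match the one enabled with --tcp.
    return f"http://localhost:{port}/engines/v1"


def running_in_container() -> bool:
    """Heuristic (assumption): Docker usually creates /.dockerenv inside containers."""
    return os.path.exists("/.dockerenv")


print(resolve_base_url(in_container=True))
print(resolve_base_url(in_container=False))
```

The result of `resolve_base_url(running_in_container())` can then be passed as `base_url` when creating the client.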

We will use the same prompt as before:

```python
content_prompt = """
I am working with a dataset that contains information about the Red30 company's product sales online.
I want to create a SQL query that calculates the total sales by state.
"""
```

Let's set the temperature and max tokens:

```python
temperature = 0
max_tokens = 5000
```


Next, we will call the chat completions method:

```python
response_llama = client.chat.completions.create(
    model="ai/llama3.2:latest",
    messages=[{"role": "user", "content": content_prompt}],
    temperature=temperature,
    max_tokens=max_tokens,
)
```
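To see what this call amounts to, the sketch below builds an equivalent HTTP request with the standard library only. It relies on the OpenAI convention of posting a JSON body to `/chat/completions` under the base URL, and it assumes the API key is sent as a bearer token. `build_chat_request` is a hypothetical helper, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_chat_request(base_url, model, prompt, temperature=0, max_tokens=5000, api_key="docker"):
    """Build (but do not send) an OpenAI-style chat completions request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the key is passed as a bearer token, as the OpenAI SDK does.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_chat_request(
    "http://localhost:12434/engines/v1", "ai/llama3.2:latest", "Hello"
)
print(request.get_method(), request.full_url)
```

With DMR running and the port exposed, passing `request` to `urllib.request.urlopen` would send it; in practice the SDK call above is the more convenient route.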


Let's print the model's response:

```python
print(response_llama.choices[0].message.content)
```


Here's an example SQL query that calculates the total sales by state for the Red30 company's product sales online.

```sql
-- Create a table for the data
CREATE TABLE sales_data (
    id INT PRIMARY KEY,
    state VARCHAR(255),
    product VARCHAR(255),
    sales DECIMAL(10, 2)
);

-- Insert sample data
INSERT INTO sales_data (id, state, product, sales)
VALUES
(1, 'California', 'Product A', 1000.00),
(2, 'California', 'Product B', 500.00),
(3, 'New York', 'Product A', 2000.00),
(4, 'New York', 'Product B', 1500.00),
(5, 'Florida', 'Product A', 3000.00),
(6, 'Florida', 'Product B', 2500.00);

-- SQL query to calculate total sales by state
SELECT 
    state,
    SUM(sales) AS total_sales
FROM 
    sales_data
GROUP BY 
    state;
```

This query will return a result set with the state and the total sales for each state.

Example output:

| state    | total_sales |
|----------|-------------|
| California| 2500.00     |
| New York  | 3500.00     |
| Florida   | 5500.00     |

This query use