---
title: Run AI models with Docker Model Runner

draft: true
cascade:
  draft: true

minutes_to_complete: 45

who_is_this_for: This is for software developers and AI enthusiasts who want to run pre-trained AI models locally using Docker Model Runner.

learning_objectives:
- Run AI models locally using Docker Model Runner.
- Build containerized applications that integrate Large Language Models (LLMs).

prerequisites:
- Docker Desktop (version 4.40 or later) installed on a system with at least 16GB of RAM (recommended).
- Basic understanding of Docker CLI and concepts.
- Familiarity with LLM concepts.

author: Jason Andrews

weight: 3
layout: "learningpathall"
---

Docker Compose makes it easy to run multi-container applications, and it can also include local AI inference services in your project.

In this section, you'll use Docker Compose to deploy a simple web-based AI chat application. The frontend is a Flask web app, and the backend uses Docker Model Runner to serve AI responses.

## Clone the example project

Clone the [docker-model-runner-chat](https://github.com/jasonrandrews/docker-model-runner-chat) repository from GitHub. This project provides a simple web interface to interact with local AI models such as Llama 3.2 or Gemma 3.

```console
git clone https://github.com/jasonrandrews/docker-model-runner-chat.git
cd docker-model-runner-chat
```

## Review the Docker Compose file

The `compose.yaml` file defines how Docker Compose sets up and connects the services.

It sets up two services: a web front-end service that serves the chat interface on port 5000, and an `ai-runner` service that provides the model through Docker Model Runner.

## Start the application

From the project directory, start the app with:

```console
docker compose up --build
```

Docker Compose builds the web app image and starts both services.

## Access the chat interface

Once the services are running, open your browser and go to the local URL below:

```console
http://localhost:5000
```

You'll see a simple chat UI. Enter a prompt and get real-time responses from the AI model.

![Compose #center](compose-app.png "Docker Model Chat")

## Configure the model

You can change the AI model or endpoint by editing the `vars.env` file before starting the containers. The file contains environment variables used by the web application:

```console
BASE_URL=http://model-runner.docker.internal/engines/v1/
MODEL=ai/gemma3
```

To use a different model or endpoint, edit the `MODEL` or `BASE_URL` value. For example, to switch models:

```console
MODEL=ai/llama3.2
```

Be sure to also update the model name in `compose.yaml` under the `ai-runner` service.

## Optional: customize generation parameters

You can edit `app.py` to adjust parameters such as:

* `temperature`: controls randomness (higher is more creative)
* `max_tokens`: controls the length of responses
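
The contents of `app.py` aren't shown here, but as a rough sketch, passing these parameters to the Model Runner endpoint with the `openai` Python package (an assumption about how the app is implemented) might look like this:

```python
import os
from openai import OpenAI

# BASE_URL and MODEL come from vars.env when the app runs under Docker Compose
client = OpenAI(
    base_url=os.getenv("BASE_URL", "http://model-runner.docker.internal/engines/v1/"),
    api_key="not-needed",  # placeholder; the local endpoint doesn't require a real key
)

response = client.chat.completions.create(
    model=os.getenv("MODEL", "ai/gemma3"),
    messages=[{"role": "user", "content": "Explain Docker Compose in one sentence."}],
    temperature=0.7,   # higher values give more varied, creative answers
    max_tokens=256,    # upper bound on the length of the reply
)
print(response.choices[0].message.content)
```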

## Stop the application

When you're finished, stop the application with:

```console
docker compose down
```

## Troubleshooting

Use the steps below if you have any issues running the application:

* Ensure Docker and Docker Compose are installed and running
* Make sure port 5000 is not in use by another application (a quick way to check is shown below)
* Check logs with:

```console
docker compose logs
```
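
If you're not sure whether port 5000 is already taken, here is a small, generic check (not part of the example project) you can run with Python:

```python
import socket

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return s.connect_ex((host, port)) == 0

print(port_in_use(5000))  # True means another application holds the port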

## What you've learned
In this section, you learned how to use Docker Compose to run a containerized AI chat application with a web interface and local model inference from Docker Model Runner.
content/learning-paths/laptops-and-desktops/docker-models/models.md

weight: 2
layout: "learningpathall"
---

## Simplified local LLM inference

Docker Model Runner is an official Docker extension that allows you to run Large Language Models (LLMs) directly on your local computer. It provides a convenient way to deploy and use AI models across different environments, including Arm-based systems, without complex framework setup or cloud dependencies.

Docker uses [llama.cpp](https://github.com/ggml-org/llama.cpp), an open source C/C++ project developed by Georgi Gerganov that enables efficient LLM inference on a variety of hardware, but you do not need to download, build, or install any LLM frameworks.

Docker Model Runner provides an easy-to-use CLI that is familiar to Docker users.

## Before you begin

Verify Docker is running with:

```console
docker version
```

You should see your Docker version in the output.

Confirm that Docker Desktop is version 4.40 or above, for example:

```output
Server: Docker Desktop 4.41.2 (191736)
```

Make sure Docker Model Runner is enabled:

```console
docker model --help
```

You should see this output:

```output
Usage: docker model COMMAND
Commands:
version Show the Docker Model Runner version
```

If Docker Model Runner is not enabled, enable it by following the [Docker Model Runner documentation](https://docs.docker.com/model-runner/).

You should also see the **Models** tab and icon appear in your Docker Desktop sidebar.

![Models #center](models-tab.png "Docker Models UI")

## Run your first AI model with Docker Model Runner

Docker Model Runner is an extension for Docker Desktop that simplifies running AI models locally.

Docker Model Runner automatically selects compatible model versions and optimizes performance for the Arm architecture.

You can try Model Runner by downloading and running a model from Docker Hub.

The example below uses the [SmolLM2 model](https://hub.docker.com/r/ai/smollm2), a compact LLM with ~360 million parameters, designed for efficient on-device inference while performing a wide range of language tasks. You can explore further models in [Docker Hub](https://hub.docker.com/u/ai).

1. Download the model

```console
docker model pull ai/smollm2
```
2. Run the model interactively

For a simple chat interface, run the model:

```console
docker model run ai/smollm2
```

The model responds to your prompts interactively. The end of one example response, in which the model generated a small C++ program, is shown below:

```output
int main() {
    return 0;
}
```
You can ask more questions and continue to chat.

To exit the chat, use the `/bye` command.
3. View downloaded models

You can print the list of models on your computer using:

```console
docker model list
```

The output is similar to:

```output
ai/llama3.2 3.21 B IQ2_XXS/Q4_K_M llama 436bb282b419 2 months ag
```

## Use the OpenAI endpoint to call the model

Docker Model Runner exposes a REST endpoint compatible with OpenAI's API spec.

From your host computer, you can access the model using the OpenAI endpoint and a TCP port.

First, enable the TCP port to connect with the model:

Run the shell script:

```console
bash ./curl-test.sh | jq
```

If you don't have `jq` installed, you can omit the pipe to `jq` and view the raw JSON output.

The output is a JSON chat-completion response that contains the model's reply along with token usage and performance information.
You now have a fully functioning OpenAI-compatible inference endpoint running locally.
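
As an alternative to the `curl` script, you can call the same endpoint from Python. The sketch below assumes host TCP access is enabled on port 12434 (the usual default; adjust it to the port you configured) and that the `openai` package is installed with `pip install openai`:

```python
from openai import OpenAI

# Point the client at the local Docker Model Runner endpoint.
# Port 12434 is an assumption; use whichever TCP port you enabled.
client = OpenAI(base_url="http://localhost:12434/engines/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="ai/smollm2",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Give me a fact about the Arm architecture."},
    ],
)

print(response.choices[0].message.content)
print(response.usage)  # token counts reported by the endpoint
```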

## What you've learned

In this section, you learned:

* How to verify and use Docker Model Runner on Docker Desktop
* How to run a model interactively from the CLI
* How to connect to a model using a local OpenAI-compatible API

In the next section, you'll use Docker Compose to deploy a web-based AI chat interface powered by Docker Model Runner.