Add RAG Ollama application guide to the documentation #20395

---
description: Containerize RAG application using Ollama and Docker
keywords: python, generative ai, genai, llm, ollama, rag, qdrant
title: Build a RAG application using Ollama and Docker
toc_min: 1
toc_max: 2
---

The Retrieval Augmented Generation (RAG) guide teaches you how to containerize an existing RAG application using Docker. The example application is a RAG-based assistant that acts like a sommelier, recommending the best pairings between wines and food. In this guide, you'll learn how to:

* Containerize and run a RAG application
* Set up a local environment to run the complete RAG stack locally for development

Start by containerizing an existing RAG application.

{{< button text="Containerize a RAG app" url="containerize.md" >}}


---
title: Containerize a RAG application
keywords: python, generative ai, genai, llm, ollama, containerize, initialize, qdrant
description: Learn how to containerize a RAG application.
---

## Overview

This section walks you through containerizing a RAG application using Docker.

> **Note**
>
> You can see more samples of containerized GenAI applications in the [GenAI Stack](https://github.com/docker/genai-stack) demo applications.

## Get the sample application

The sample application used in this guide is an example of a RAG application built from three main components, which are the building blocks of every RAG application: a Large Language Model, in this case hosted in a container and served via [Ollama](https://ollama.ai/); a vector database, [Qdrant](https://qdrant.tech/), which stores the embeddings of your local data; and a web application, built with [Streamlit](https://streamlit.io/), which provides the user interface.

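
To make the architecture concrete, the following is a minimal, illustrative Python sketch of how such a retrieval-augmented flow can be wired together. It is not the sample application's actual code: the `wines` collection name, the `llama2` model, and the use of Ollama's embeddings endpoint are assumptions made for this example; the real logic lives in `app/main.py`.

```python
# rag_sketch.py -- illustrative only; the sample app's real logic is in app/main.py.
# Assumes the requests, streamlit, and qdrant-client packages, a Qdrant collection
# named "wines" (hypothetical), and an Ollama model that has already been pulled.
import os

import requests
import streamlit as st
from qdrant_client import QdrantClient

QDRANT_URL = os.getenv("QDRANT_CLIENT", "http://localhost:6333")
OLLAMA_URL = os.getenv("OLLAMA", "http://localhost:11434")

qdrant = QdrantClient(url=QDRANT_URL)


def embed(text: str) -> list[float]:
    # Turn the question into a vector (here via Ollama's embeddings endpoint).
    r = requests.post(f"{OLLAMA_URL}/api/embeddings",
                      json={"model": "llama2", "prompt": text})
    return r.json()["embedding"]


def answer(question: str) -> str:
    # Retrieve the most similar entries from the vector database...
    hits = qdrant.search(collection_name="wines",
                         query_vector=embed(question), limit=3)
    context = "\n".join(str(hit.payload) for hit in hits)
    # ...then ask the LLM to answer using only the retrieved context.
    prompt = f"You are a sommelier.\nContext:\n{context}\n\nQuestion: {question}"
    r = requests.post(f"{OLLAMA_URL}/api/generate",
                      json={"model": "llama2", "prompt": prompt, "stream": False})
    return r.json()["response"]


st.title("Winy, an AI sommelier")
question = st.text_input("What are you eating tonight?")
if question:
    st.write(answer(question))
```
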
Clone the sample application. Open a terminal, change directory to a directory that you want to work in, and run the following command to clone the repository:

```console
$ git clone https://github.com/mfranzon/winy.git
```

You should now have the following files in your `winy` directory.

```text
├── winy/
│   ├── .gitignore
│   ├── app/
│   │   ├── main.py
│   │   ├── Dockerfile
│   │   └── requirements.txt
│   ├── tools/
│   │   ├── create_db.py
│   │   ├── create_embeddings.py
│   │   ├── requirements.txt
│   │   ├── test.py
│   │   └── download_model.sh
│   ├── docker-compose.yaml
│   ├── wine_database.db
│   ├── LICENSE
│   └── README.md
```

## Containerizing your application: Essentials

Containerizing an application involves packaging it along with its dependencies into a container, which ensures consistency across different environments. Here's what you need to containerize an app like Winy:

1. Dockerfile: A Dockerfile contains instructions on how to build a Docker image for your application. It specifies the base image, dependencies, configuration files, and the command to run your application.

2. Docker Compose file: Docker Compose is a tool for defining and running multi-container Docker applications. A Compose file lets you configure your application's services, networks, and volumes in a single file.

## Run the application

Inside the `winy` directory, run the following command in a terminal.

```console
$ docker compose up --build
```

Docker builds and runs your application. Depending on your network connection, it may take several minutes to download all the dependencies. You'll see a message like the following in the terminal when the application is running.

```console
server-1  | You can now view your Streamlit app in your browser.
server-1  |
server-1  | URL: http://0.0.0.0:8501
server-1  |
```

Open a browser and view the application at [http://localhost:8501](http://localhost:8501). You should see a simple Streamlit application.


The application requires a Qdrant database service and an LLM service to work properly. If you already have these services running outside of Docker, specify their connection information as environment variables of the `winy` service in `docker-compose.yaml`.

```yaml
  winy:
    build:
      context: ./app
      dockerfile: Dockerfile
    environment:
      - QDRANT_CLIENT=http://qdrant:6333 # Specifies the URL of the Qdrant database
      - OLLAMA=http://ollama:11434 # Specifies the URL of the Ollama service
    container_name: winy
    ports:
      - "8501:8501"
    depends_on:
      - qdrant
      - ollama
```

If you don't have the services running, continue with this guide to learn how you can run some or all of these services with Docker. Keep in mind that the `ollama` service starts empty; it doesn't contain any model, so you need to pull one before you start using the RAG application. The instructions are on the following page.

In the terminal, press `ctrl`+`c` to stop the application.

## Summary

In this section, you learned how you can containerize and run your RAG application using Docker.

## Next steps

In the next section, you'll learn how to configure the application with your preferred LLM model and run it completely locally using Docker.

{{< button text="Develop your application" url="develop.md" >}}


---
title: Use containers for RAG development
keywords: python, local, development, generative ai, genai, llm, rag, ollama
description: Learn how to develop your generative RAG application locally.
---

## Prerequisites

Complete [Containerize a RAG application](containerize.md).

## Overview

In this section, you'll learn how to set up a development environment to access all the services that your generative RAG application needs. This includes:

- Adding a local database
- Adding a local or remote LLM service

> **Note**
>
> You can see more samples of containerized GenAI applications in the [GenAI Stack](https://github.com/docker/genai-stack) demo applications.

## Add a local database

You can use containers to set up local services, like a database. In this section, you'll explore the database service in the `docker-compose.yaml` file.

To run the database service:

1. In the cloned repository's directory, open the `docker-compose.yaml` file in an IDE or text editor.

2. In the `docker-compose.yaml` file, you'll see the following:

   ```yaml
   services:
     qdrant:
       image: qdrant/qdrant
       container_name: qdrant
       ports:
         - "6333:6333"
       volumes:
         - qdrant_data:/qdrant/storage
   ```

   > **Note**
   >
   > To learn more about Qdrant, see the [Qdrant Official Docker Image](https://hub.docker.com/r/qdrant/qdrant).

3. Run the application. Inside the `winy` directory, run the following command in a terminal.

   ```console
   $ docker compose up --build
   ```

4. Access the application. Open a browser and view the application at [http://localhost:8501](http://localhost:8501). You should see a simple Streamlit application.

5. Stop the application. In the terminal, press `ctrl`+`c` to stop the application.

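
While the database container is running, you can also interact with it directly from Python, which is handy during development. The following is a minimal, illustrative sketch using the `qdrant-client` package; the `dev_test` collection name, the vector size, and the payload are made up for this example and don't reflect the sample application's actual schema.

```python
# qdrant_dev_check.py -- illustrative only; not part of the sample application.
# Assumes `pip install qdrant-client` and the Qdrant container from the steps
# above listening on localhost:6333. Collection name and data are made up.
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

client = QdrantClient(url="http://localhost:6333")

# Create a tiny collection (this fails if the collection already exists).
client.create_collection(
    collection_name="dev_test",
    vectors_config=VectorParams(size=4, distance=Distance.COSINE),
)

# Store one embedding with a payload...
client.upsert(
    collection_name="dev_test",
    points=[PointStruct(id=1, vector=[0.1, 0.2, 0.3, 0.4],
                        payload={"wine": "Chianti"})],
)

# ...and run a similarity search against it.
hits = client.search(collection_name="dev_test",
                     query_vector=[0.1, 0.2, 0.3, 0.4], limit=1)
print(hits[0].payload)  # {'wine': 'Chianti'}
```
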
## Add a local or remote LLM service

The sample application uses [Ollama](https://ollama.ai/) as its LLM service. This guide provides instructions for the following scenarios:
- Run Ollama in a container
- Run Ollama outside of a container

While all platforms can use either of the previous scenarios, the performance and GPU support may vary. You can use the following guidelines to help you choose the appropriate option:
- Run Ollama in a container if you're on Linux with a native installation of Docker Engine, or on Windows 10/11 with Docker Desktop, you have a CUDA-supported GPU, and your system has at least 8 GB of RAM.
- Run Ollama outside of a container if you're running Docker Desktop on a Linux machine.


Choose one of the following options for your LLM service.

{{< tabs >}}
{{< tab name="Run Ollama in a container" >}}

When running Ollama in a container, you should have a CUDA-supported GPU. While you can run Ollama in a container without a supported GPU, the performance may not be acceptable. Only Linux and Windows 11 support GPU access to containers.

To run Ollama in a container and provide GPU access:
1. Install the prerequisites.
   - For Docker Engine on Linux, install the [NVIDIA Container Toolkit](https://github.com/NVIDIA/nvidia-container-toolkit).
   - For Docker Desktop on Windows 10/11, install the latest [NVIDIA driver](https://www.nvidia.com/Download/index.aspx) and make sure you are using the [WSL 2 backend](../../../desktop/wsl/index.md#turn-on-docker-desktop-wsl-2).
2. The `docker-compose.yaml` file already contains the necessary instructions. In your own apps, you need to add the Ollama service to your `docker-compose.yaml`. The following is the updated `docker-compose.yaml`:

   ```yaml
     ollama:
       image: ollama/ollama
       container_name: ollama
       ports:
         - "8000:8000"
       deploy:
         resources:
           reservations:
             devices:
               - driver: nvidia
                 count: 1
                 capabilities: [gpu]
   ```

   > **Note**
   >
   > For more details about the Compose instructions, see [Turn on GPU access with Docker Compose](../../../compose/gpu-support.md).

3. Once the Ollama container is up and running, you can pull a model using the `download_model.sh` script in the `tools` folder:

   ```console
   . ./download_model.sh <model-name>
   ```

   Pulling an Ollama model can take several minutes.

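
If you'd rather pull the model from code, for example as part of a setup script, Ollama also exposes a REST endpoint for this. The following is a minimal, illustrative Python sketch; the `llama2` model name and the `localhost:11434` URL are assumptions for this example, so adjust them to match your setup and the ports you publish.

```python
# pull_model.py -- illustrative alternative to tools/download_model.sh.
# Assumes the `requests` package and that the Ollama API is reachable at
# OLLAMA_URL; the URL and model name below are assumptions for this example.
import json

import requests

OLLAMA_URL = "http://localhost:11434"  # adjust to how you expose the container
MODEL = "llama2"                       # any Ollama model tag works here

# /api/pull streams one JSON progress object per line until the pull succeeds.
with requests.post(f"{OLLAMA_URL}/api/pull",
                   json={"name": MODEL}, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if line:
            print(json.loads(line).get("status", ""))
```
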
{{< /tab >}}
{{< tab name="Run Ollama outside of a container" >}}

To run Ollama outside of a container:
1. [Install](https://github.com/jmorganca/ollama) and run Ollama on your host machine.
2. Pull the model to Ollama using the following command.

   ```console
   $ ollama pull llama2
   ```

3. Remove the `ollama` service from the `docker-compose.yaml` and update the `OLLAMA` connection variable in the `winy` service accordingly:

   ```diff
   - OLLAMA=http://ollama:11434
   + OLLAMA=<your-url>
   ```

{{< /tab >}}

{{< /tabs >}}

## Run your RAG application

At this point, you have the following services in your Compose file:
- Server service for your main RAG application
- Database service to store vectors in a Qdrant database
- (optional) Ollama service to run the LLM

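
Before you open the app, you can optionally verify that both backing services are reachable from your host. This is a small, illustrative check; the URLs assume the default published ports, so adjust them if your Compose file differs or if you run Ollama elsewhere.

```python
# check_stack.py -- optional, illustrative health check for the RAG stack.
# Assumes Qdrant is published on localhost:6333 and Ollama is reachable on
# localhost:11434; change the URLs to match your Compose file.
import requests

checks = {
    "Qdrant": "http://localhost:6333/collections",  # lists existing collections
    "Ollama": "http://localhost:11434/api/tags",    # lists locally pulled models
}

for name, url in checks.items():
    try:
        requests.get(url, timeout=5).raise_for_status()
        print(f"{name}: OK ({url})")
    except requests.RequestException as exc:
        print(f"{name}: not reachable ({url}): {exc}")
```
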
Once the application is running, open a browser and access the application at [http://localhost:8501](http://localhost:8501).

Depending on your system and the LLM service that you chose, it may take several minutes to answer.

## Summary

In this section, you learned how to set up a development environment to provide access to all the services that your GenAI application needs.

Related information:
- [Dockerfile reference](../../../reference/dockerfile.md)
- [Compose file reference](../../../compose/compose-file/_index.md)
- [Ollama Docker image](https://hub.docker.com/r/ollama/ollama)
- [GenAI Stack demo applications](https://github.com/docker/genai-stack)

## Next steps

See samples of more GenAI applications in the [GenAI Stack demo applications](https://github.com/docker/genai-stack).