Ollama Serverless Container

Tested with Ollama Official Image v0.1.27.

What is this?

Ollama is a super easy tool for running LLM/GenAI models locally, with elegant installation options, a huge community ecosystem, and official Docker images.

A regular installation puts an executable binary on the system, but we can also run Ollama as a Docker container. It's a two-step process (a concrete example follows the list):

  1. Execute docker run -itd --name ollama ollama/ollama to launch the Ollama backend. The default Entrypoint is /bin/ollama and the Command is serve, so the container runs ollama serve and is ready to accept incoming requests.
  2. Execute docker exec -it ollama ollama run <Model Name:Version> to enter the interactive interface and start sending prompts.
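
For example, with the "gemma:2b-instruct-q4_0" model used throughout this README:

# Step 1: launch the Ollama backend in a detached container
docker run -itd --name ollama ollama/ollama

# Step 2: start an interactive session with a model
docker exec -it ollama ollama run gemma:2b-instruct-q4_0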

Sounds good, but it's not enough. If we want to host Ollama on a cloud service as a serverless workload, the container should:

  1. Launch the backend immediately
  2. Accept a prompt argument, then output the answer
  3. Shut down

So we can run inference with something like the following pseudo command: docker run -it --rm --name ollama ollama/ollama run <Model Name:Version> "<Prompt>".

If you need this behavior, this repository gives you a better way to use Ollama as a Docker container.

Usage

To build the image with the "gemma:2b-instruct-q4_0" model, for example:

./build.sh

# Which actually does this:
docker build -t gemma .

This will take some time, since one of the build steps downloads the model artifact.
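
The Dockerfile that build.sh builds likely looks roughly like the following sketch; the exact contents may differ, but the pull line matches the one shown in the "Download another model" section below:

FROM ollama/ollama:0.1.27

# Start the backend briefly during the build so the model artifact gets baked into the image
RUN /bin/bash -c "/bin/ollama serve & sleep 1 && ollama pull gemma:2b-instruct-q4_0"

COPY ["serve.sh", "/serve.sh"]
RUN ["chmod", "+x", "/serve.sh"]

ENTRYPOINT ["/bin/bash", "-c"]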

Inference Example:

./run.sh

# Which actually does this:
docker run -it --rm --name gemma gemma '/serve.sh "<Prompt>"'
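
A concrete invocation (the prompt here is just an illustration) looks like:

docker run -it --rm --name gemma gemma '/serve.sh "Why is the sky blue?"'

The container starts, serve.sh prints the answer, and the container exits and removes itself thanks to --rm.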

The magic is in the serve.sh script (a minimal sketch follows this list), which:

  1. Launches the Ollama backend in the background: nohup ollama serve
  2. Sleeps for 1 second to make sure the backend is up: sleep 1
  3. Runs inference with the "gemma:2b-instruct-q4_0" model, taking the argument as the prompt: ollama run gemma:2b-instruct-q4_0 "<Prompt>"
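
A minimal serve.sh implementing these steps might look like this (a sketch; the script in the repository may differ slightly):

#!/bin/bash
# Launch the Ollama backend in the background
nohup ollama serve &

# Give the backend a moment to start accepting requests
sleep 1

# Run the baked-in model with the first argument as the prompt, then exit
ollama run gemma:2b-instruct-q4_0 "$1"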

Build Docker Image with pre-downloaded Model Artifact

Sometimes it's annoying to download the model artifact every time the Docker image is built. Another approach is to pre-download it and copy it into the Docker image during the build (a combined example follows the steps):

  1. Create a directory to store the artifacts: mkdir ollama
  2. Launch an Ollama container and mount the directory: docker run -it --rm -v $(pwd)/ollama/:/root/.ollama/ --name ollama ollama/ollama
  3. Pull the model artifact: docker exec -it ollama ollama pull <Model Name:Version>
  4. Edit the Dockerfile: remove the RUN /bin/bash -c "/bin/ollama serve & sleep 1 && ollama pull <Model Name:Version>" line and replace it with COPY ["ollama/", "/root/.ollama/"]
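
Steps 1-3 as a single shell session (the model name is only an example; -d is added here so the pull can run from the same shell):

mkdir ollama

# Run the stock Ollama image detached, with the local directory mounted as its model store
docker run -itd --rm -v $(pwd)/ollama/:/root/.ollama/ --name ollama ollama/ollama

# Pull the model artifact into the mounted directory
docker exec -it ollama ollama pull gemma:2b-instruct-q4_0

# Stop the helper container; the artifact stays in ./ollama/
docker stop ollama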

Example Dockerfile:

FROM ollama/ollama:0.1.27

COPY ["ollama/", "/root/.ollama/"]

COPY ["serve.sh", "/serve.sh"]
RUN ["chmod", "+x", "/serve.sh"]

ENTRYPOINT ["/bin/bash", "-c"]

Download another model

Edit the RUN line of the Dockerfile and change the name and version of the model. Take gemma:7b for example:

RUN /bin/bash -c "/bin/ollama serve & sleep 1 && ollama pull gemma:7b"

Reference: https://github.com/langchain4j/langchain4j/blob/200522f558509a67e940ae4c82284b85caaebef8/docker/ollama/llama2/Dockerfile
