Ollama-OpenAI Proxy

This is a Go-based proxy server that enables applications designed to work with the Ollama API to interact seamlessly with an OpenAI-compatible endpoint. It translates and forwards requests and responses between the two APIs while applying custom transformations to the model names and data formats.

Note: This is a pet project I use to forward requests to LiteLLM for use with Kerlig and Enchanted, which don't support custom OpenAI endpoints. As this is a personal project, there might be issues and rough edges. Contributions and feedback are welcome!

Features

Endpoint Proxying:

  • /v1/models & /v1/completions: These endpoints are forwarded directly to the downstream OpenAI-compatible server.
  • /api/tags: Queries the downstream /v1/models endpoint, transforms the response into the Ollama-style model list, and appends :proxy to model names if they don’t already contain a colon.
  • /api/chat: Rewrites the request to the downstream /v1/chat/completions endpoint. It intercepts and transforms streaming NDJSON responses from the OpenAI format into the expected Ollama format, including stripping any trailing :proxy from model names.
  • /api/pull and other unknown endpoints: Forwarded to a local Ollama instance running on 127.0.0.1:11505 (see the routing sketch after this list).
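
The routing can be pictured as a standard Go net/http mux in front of two reverse proxies. This is a minimal sketch under the defaults described in this README, not the project's actual code; the handler stubs are placeholders for the translation logic.

package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	// Assumed addresses, matching the defaults described in this README.
	openai, _ := url.Parse("http://127.0.0.1:4000")
	ollama, _ := url.Parse("http://127.0.0.1:11505")

	mux := http.NewServeMux()
	// /v1/models and /v1/completions pass straight through to the
	// OpenAI-compatible server.
	mux.Handle("/v1/", httputil.NewSingleHostReverseProxy(openai))
	// /api/tags and /api/chat need translation between the two formats;
	// the real handlers are omitted here.
	mux.HandleFunc("/api/tags", func(w http.ResponseWriter, r *http.Request) {
		// query downstream /v1/models and reshape into an Ollama tag list
	})
	mux.HandleFunc("/api/chat", func(w http.ResponseWriter, r *http.Request) {
		// rewrite to /v1/chat/completions and transform the stream
	})
	// Everything else (e.g. /api/pull) falls through to the local Ollama.
	mux.Handle("/", httputil.NewSingleHostReverseProxy(ollama))

	log.Fatal(http.ListenAndServe(":11434", mux))
}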

Debug Logging: When running in debug mode, the proxy logs:

  • Every incoming request.
  • The outgoing /api/chat payload.
  • Raw downstream streaming chunks and their transformed equivalents.

Model Name Handling:

  • For /api/tags, if a model ID does not contain a colon, the proxy appends :proxy to the name.
  • For other endpoints, any :proxy suffix in model names is stripped before forwarding (sketched below).
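
In Go this boils down to two small string helpers. A sketch with hypothetical names (tagName and realName are not the project's identifiers):

package main

import (
	"fmt"
	"strings"
)

// tagName appends ":proxy" when a downstream model ID carries no tag,
// so Ollama clients that expect "name:tag" render it properly.
func tagName(id string) string {
	if strings.Contains(id, ":") {
		return id
	}
	return id + ":proxy"
}

// realName strips a trailing ":proxy" before forwarding downstream.
func realName(name string) string {
	return strings.TrimSuffix(name, ":proxy")
}

func main() {
	fmt.Println(tagName("gpt-4o"))        // gpt-4o:proxy
	fmt.Println(realName("gpt-4o:proxy")) // gpt-4o
}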

Prerequisites

  • Go 1.18+ installed.
  • An OpenAI-compatible server endpoint (e.g., running on http://127.0.0.1:4000).
  • (Optional) A local Ollama instance running on 127.0.0.1:11505 for endpoints not handled by the downstream server.

Installation

Clone this repository:

git clone https://github.com/regismesquita/ollama_proxy.git
cd ollama_proxy

Build the project:

go build -o proxy-server ollama_proxy.go

Usage

Run the proxy server with the desired flags:

./proxy-server --listen=":11434" --target="http://127.0.0.1:4000/v1" --api-key="YOUR_API_KEY" --debug
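Once the proxy is up, it should look like an Ollama server to clients. As a quick sanity check (the model name here is illustrative; yours depend on the downstream server):

curl http://127.0.0.1:11434/api/tags
curl http://127.0.0.1:11434/api/chat -d '{"model": "gpt-4o:proxy", "messages": [{"role": "user", "content": "Hello"}]}'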

Command-Line Flags

  • --listen: The address and port the proxy server listens on (default :11434).
  • --target: The base URL of the OpenAI-compatible downstream server (e.g., http://127.0.0.1:4000).
  • --api-key: (Optional) The API key for the downstream server.
  • --debug: Enable detailed debug logging for every request and response (see the flag-declaration sketch below).
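
These map naturally onto Go's standard flag package. A sketch of how such flags are typically declared, not the project's actual code (the --target default here is an assumption; the README only gives it as an example):

package main

import "flag"

func main() {
	listen := flag.String("listen", ":11434", "address and port to listen on")
	target := flag.String("target", "http://127.0.0.1:4000", "OpenAI-compatible base URL")
	apiKey := flag.String("api-key", "", "optional API key for the downstream server")
	debug := flag.Bool("debug", false, "enable detailed debug logging")
	flag.Parse()
	_, _, _, _ = listen, target, apiKey, debug // consumed by server setup (omitted)
}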

How It Works

  1. Request Routing: The proxy intercepts requests and routes them based on the endpoint:

    • Requests to /v1/models and /v1/completions are forwarded directly.
    • Requests to /api/tags are handled locally by querying /v1/models on the downstream, transforming the JSON response, and appending :proxy where needed.
    • Requests to /api/chat are rewritten to /v1/chat/completions, with the payload and response processed to strip or add the :proxy suffix as appropriate.
    • All other endpoints are forwarded to the local Ollama instance.
  2. Response Transformation: Streaming responses from the downstream /v1/chat/completions endpoint (in NDJSON format) are read line by line. Each chunk is parsed, transformed into the Ollama format, and streamed back to the client (see the sketch after this list).
  3. Logging: With debug mode enabled, detailed logs of incoming requests, outgoing payloads, and both raw and transformed response chunks are printed.

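The chunk translation in step 2 can be sketched as follows. The Ollama-side field names follow the public Ollama chat API; this is a simplified illustration rather than the proxy's actual implementation, and it ignores fields and edge cases the real code handles.

package main

import (
	"bufio"
	"encoding/json"
	"io"
	"os"
	"strings"
	"time"
)

// openAIChunk is the subset of an OpenAI streaming chunk we care about.
type openAIChunk struct {
	Model   string `json:"model"`
	Choices []struct {
		Delta struct {
			Content string `json:"content"`
		} `json:"delta"`
		FinishReason *string `json:"finish_reason"`
	} `json:"choices"`
}

// ollamaChunk mirrors one NDJSON message of an Ollama /api/chat stream.
type ollamaChunk struct {
	Model     string        `json:"model"`
	CreatedAt string        `json:"created_at"`
	Message   ollamaMessage `json:"message"`
	Done      bool          `json:"done"`
}

type ollamaMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// transform reads NDJSON chunks from r and writes Ollama-format NDJSON to w.
func transform(r io.Reader, w io.Writer) error {
	sc := bufio.NewScanner(r)
	enc := json.NewEncoder(w)
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		var in openAIChunk
		if err := json.Unmarshal([]byte(line), &in); err != nil {
			return err
		}
		out := ollamaChunk{
			// Strip the ":proxy" suffix that was added for /api/tags.
			Model:     strings.TrimSuffix(in.Model, ":proxy"),
			CreatedAt: time.Now().UTC().Format(time.RFC3339),
			Message:   ollamaMessage{Role: "assistant"},
		}
		if len(in.Choices) > 0 {
			out.Message.Content = in.Choices[0].Delta.Content
			out.Done = in.Choices[0].FinishReason != nil
		}
		if err := enc.Encode(&out); err != nil { // Encode appends "\n": NDJSON out
			return err
		}
	}
	return sc.Err()
}

func main() {
	sample := `{"model":"gpt-4o","choices":[{"delta":{"content":"Hi"},"finish_reason":null}]}` + "\n" +
		`{"model":"gpt-4o","choices":[{"delta":{"content":""},"finish_reason":"stop"}]}` + "\n"
	_ = transform(strings.NewReader(sample), os.Stdout)
}
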
Contributing

Contributions are welcome! As this is a pet project, there may be rough edges and issues. Please feel free to open issues or submit pull requests for improvements and bug fixes.

License

This project is licensed under the MIT License. See the LICENSE file for details.
