This project provides an Ollama API-compatible server that uses the llama-cpp-python library to run local LLM inference. It allows you to use your own GGUF models with an API that's compatible with Ollama's endpoints, making it easy to integrate with existing tools and applications designed to work with Ollama.
## Features

- Ollama API Compatibility: Implements endpoints that match Ollama's API structure
- Local Inference: Uses `llama-cpp-python` to run inference on local GGUF model files
- Model Management: Implements a model caching system to avoid reloading models
- Configurable Parameters: Supports various inference parameters (temperature, max tokens, etc.)
- Web UI: Includes a simple web interface for chatting with the model and generating text
## Prerequisites

- Python 3.9+ with SSL support
- A GGUF model file (e.g., Llama 3.2)
## Setup (macOS)

1. Install Python with SSL support:

   Using Homebrew (recommended):

   ```bash
   brew install python
   ```

   Or download it from the official Python website.

2. Create a virtual environment:

   ```bash
   python3 -m venv venv
   source venv/bin/activate
   ```

3. Install required packages:

   ```bash
   pip install fastapi uvicorn llama-cpp-python pydantic
   ```

4. Update the model path:

   Edit the `ollama-api-compatible.py` file and update the `MODEL_PATH` variable to point to your GGUF model file:

   ```python
   MODEL_PATH = "/path/to/your/model.gguf"
   ```

5. Run the server:

   ```bash
   python ollama-api-compatible.py
   ```

   The server will run on `http://127.0.0.1:11435` by default.
## Setup (Windows)

1. Install Python with SSL support:

   Download and install Python from the official Python website. During installation, make sure to check the box that says "Add Python to PATH".

2. Create a virtual environment:

   ```bash
   python -m venv venv
   venv\Scripts\activate
   ```

3. Install required packages:

   ```bash
   pip install fastapi uvicorn llama-cpp-python pydantic
   ```

4. Update the model path:

   Edit the `ollama-api-compatible.py` file and update the `MODEL_PATH` variable to point to your GGUF model file:

   ```python
   MODEL_PATH = "C:\\path\\to\\your\\model.gguf"
   ```

5. Run the server:

   ```bash
   python ollama-api-compatible.py
   ```

   The server will run on `http://127.0.0.1:11435` by default.
## API Endpoints

The server implements the following Ollama-compatible endpoints:

- `GET /api/tags`: List available models
- `POST /api/generate`: Generate text completions
- `POST /api/chat`: Handle chat-based interactions
- `GET /api/version`: Get the server version
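When `stream` is true, Ollama's generate endpoint returns newline-delimited JSON chunks, each carrying a piece of the output in its `response` field, with `done: true` on the final chunk. A sketch of assembling such a stream, assuming this server mirrors Ollama's streaming format:

```python
import json

def assemble_stream(lines):
    # Concatenate the "response" pieces from newline-delimited JSON
    # chunks, stopping at the chunk marked done.
    text = []
    for line in lines:
        chunk = json.loads(line)
        text.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(text)
```

With the `requests` library, you would feed this function `response.iter_lines(decode_unicode=True)` from a request made with `"stream": true` and `stream=True`.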
## Web UI

The project includes a simple web interface for interacting with your models:

- Access: Simply navigate to `http://localhost:11435/` in your browser after starting the server
- Features:
  - Chat Tab: Have conversational interactions with the model
  - Generate Tab: Create text completions with adjustable parameters
  - Model Info: View information about the loaded model

The web UI automatically connects to the API endpoints and provides a user-friendly way to interact with your models without needing to use command-line tools or write code.
## Usage Examples

### curl

List available models:

```bash
curl -s http://localhost:11435/api/tags
```

Generate a text completion:

```bash
curl -s -X POST http://localhost:11435/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "prompt": "Hello, how are you today?",
    "stream": false
  }'
```

Chat with the model:

```bash
curl -s -X POST http://localhost:11435/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [
      {"role": "user", "content": "What are the three laws of robotics?"}
    ],
    "stream": false
  }'
```

### Python

```python
import requests

# For text generation
def generate_text(prompt, model="llama3.2"):
    response = requests.post(
        "http://localhost:11435/api/generate",
        json={
            "model": model,
            "prompt": prompt,
            "stream": False
        }
    )
    return response.json()["response"]

# For chat interaction
def chat(messages, model="llama3.2"):
    response = requests.post(
        "http://localhost:11435/api/chat",
        json={
            "model": model,
            "messages": messages,
            "stream": False
        }
    )
    return response.json()["message"]["content"]

# Example usage
if __name__ == "__main__":
    # Text generation
    result = generate_text("Explain quantum computing in simple terms.")
    print(f"Generated text: {result}")

    # Chat interaction
    messages = [
        {"role": "user", "content": "What is the capital of France?"}
    ]
    result = chat(messages)
    print(f"Chat response: {result}")
```

### JavaScript

```javascript
// Using fetch API
async function generateText(prompt, model = "llama3.2") {
  const response = await fetch("http://localhost:11435/api/generate", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: model,
      prompt: prompt,
      stream: false,
    }),
  });
  const data = await response.json();
  return data.response;
}

async function chat(messages, model = "llama3.2") {
  const response = await fetch("http://localhost:11435/api/chat", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: model,
      messages: messages,
      stream: false,
    }),
  });
  const data = await response.json();
  return data.message.content;
}

// Example usage
async function example() {
  // Text generation
  const generatedText = await generateText("Write a short poem about coding.");
  console.log("Generated text:", generatedText);

  // Chat interaction
  const chatResponse = await chat([
    { role: "user", content: "Explain how to make a sandwich." }
  ]);
  console.log("Chat response:", chatResponse);
}

example();
```

## Configuration

You can modify the following parameters in the code to customize the server:
- Port: Change the port number in the `uvicorn.run()` call
- Model Parameters: Adjust `n_ctx`, `n_gpu_layers`, and other parameters in the `get_or_load_model()` function
- Response Format: Customize the response format in the API endpoint handlers
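As an illustration, the relevant spots in the server code look roughly like this; the exact call sites in `ollama-api-compatible.py` may differ, and the parameter values shown are placeholders, not recommendations:

```python
# Inside get_or_load_model(): model-loading parameters (illustrative values).
model = Llama(
    model_path=MODEL_PATH,
    n_ctx=4096,        # context window size, in tokens
    n_gpu_layers=0,    # raise to offload that many layers to the GPU
)

# At the bottom of ollama-api-compatible.py: change the port here.
uvicorn.run(app, host="127.0.0.1", port=11435)
```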
## Troubleshooting

If you encounter SSL errors when installing packages with pip, try:

```bash
pip install --trusted-host pypi.org --trusted-host files.pythonhosted.org <package-name>
```

- Ensure the model path is correct and the file exists
- Check that you have sufficient RAM to load the model
- For large models, consider enabling GPU acceleration by setting `n_gpu_layers` to a higher value
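To confirm that your Python build actually includes SSL support (a separate issue from pip's trusted-host workaround), you can check directly; if the import fails, reinstall Python as described in the setup steps:

```shell
python3 -c "import ssl; print(ssl.OPENSSL_VERSION)"
```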
## License

This project is open source and available under the MIT License.