Dynamic Hardware Selection for Cost Effective ML Inference

This proxy dynamically selects hardware resources (CPU/GPU/accelerators) for incoming workloads to maximize cost efficiency.

This proxy uses llama-server as the backend inference engine, and orchestrates the servers with docker containers.

Prerequisites

Docker installed and running
UV package manager

Installation

Install UV if you haven't already:

# Install UV
curl -LsSf https://astral.sh/uv/install.sh | sh

Install project dependencies:

# Install dependencies and create virtual environment
uv sync

Model Setup

Create a models directory in the project root:

mkdir -p models

Place your llama.cpp compatible model files (.gguf format) in the models directory:

models/
├── model1.gguf
├── model2.gguf
└── ...

Running the Proxy Server

Start the Server

Run the main proxy server:

uv run main.py

An OpenAI-compatible proxy server will start on http://localhost:8000 by default.

Running Tests

The test suite validates proxy functionality, model discovery, and container management.

Prerequisites for Testing

Ensure the proxy server is running on http://localhost:8000
Have at least one model available (tests expect 01-DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M by default)

Run All Tests

uv run pytest

Run Specific Test Classes

uv run pytest test_proxy.py::TestProxy -v

uv run pytest test_proxy.py::TestModelDiscovery -v

uv run pytest test_proxy.py::TestContainerManagement -v

Name		Name	Last commit message	Last commit date
Latest commit History 87 Commits
.kiro		.kiro
.vscode		.vscode
benchmarks		benchmarks
docs		docs
models		models
scripts		scripts
tests		tests
.dockerignore		.dockerignore
.gitignore		.gitignore
AGENTS.md		AGENTS.md
README.md		README.md
andre-jesus-dissertation.pdf		andre-jesus-dissertation.pdf
hardware_configs.json		hardware_configs.json
main_cost_aware.py		main_cost_aware.py
pyproject.toml		pyproject.toml
throughput_benchmark.py		throughput_benchmark.py
throughput_benchmark_results.json		throughput_benchmark_results.json
throughput_benchmark_results_1.json		throughput_benchmark_results_1.json
throughput_benchmark_results_2.json		throughput_benchmark_results_2.json
throughput_benchmark_results_3.json		throughput_benchmark_results_3.json
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Dynamic Hardware Selection for Cost Effective ML Inference

Prerequisites

Installation

Model Setup

Running the Proxy Server

Start the Server

Running Tests

Prerequisites for Testing

Run All Tests

Run Specific Test Classes

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Dynamic Hardware Selection for Cost Effective ML Inference

Prerequisites

Installation

Model Setup

Running the Proxy Server

Start the Server

Running Tests

Prerequisites for Testing

Run All Tests

Run Specific Test Classes

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages