Auxen hosts private AI model endpoints on dedicated GPUs. Pay by the minute, no subscriptions, OpenAI-compatible API.
Each Auxen instance is a per-customer GPU running one open-source model (Llama, Qwen, Mistral, Gemma, etc.) on a stable HTTPS endpoint. This SDK is a thin wrapper over the official openai Python client with Auxen-specific defaults.
```bash
pip install auxen
```

Provision an instance at auxen.ai/dashboard and copy the endpoint URL and API key.
```python
from auxen import Auxen

# Point the client at your instance endpoint and authenticate with its key.
client = Auxen(
    base_url="https://api.auxen.ai/v1/inst_xxx",
    api_key="auxk_...",
)

response = client.chat.completions.create(
    model="llama-3.1-8b",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```

Or, with environment variables (`AUXEN_BASE_URL` and `AUXEN_API_KEY`):
```python
from auxen import Auxen

# With no arguments, the endpoint and key are read from the environment.
client = Auxen()
```

Streaming responses:

```python
stream = client.chat.completions.create(
    model="llama-3.1-8b",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
)
for chunk in stream:
    # delta.content can be None on some chunks, so fall back to "".
    print(chunk.choices[0].delta.content or "", end="")
```

Async client:

```python
import asyncio

from auxen import AsyncAuxen


async def main():
    client = AsyncAuxen()
    response = await client.chat.completions.create(
        model="llama-3.1-8b",
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(response.choices[0].message.content)


asyncio.run(main())
```

These all work through the standard OpenAI client surface; no Auxen-specific code is needed. The Auxen inference layer is fully OpenAI-compatible.
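Because the endpoint is OpenAI-compatible, the stock `openai` client should also work when pointed at an instance URL. The following is a minimal sketch under that assumption. `resolve_endpoint` and `make_openai_client` are hypothetical helpers, not part of the Auxen SDK, and the precedence shown (explicit arguments first, then `AUXEN_BASE_URL` / `AUXEN_API_KEY`) is an illustrative choice rather than documented SDK behavior.

```python
import os


def resolve_endpoint(base_url=None, api_key=None, env=None):
    """Hypothetical helper: explicit arguments win, then environment variables."""
    env = os.environ if env is None else env
    base_url = base_url or env.get("AUXEN_BASE_URL")
    api_key = api_key or env.get("AUXEN_API_KEY")
    if not base_url or not api_key:
        raise ValueError(
            "Set AUXEN_BASE_URL and AUXEN_API_KEY, or pass both values explicitly"
        )
    return base_url, api_key


def make_openai_client(base_url=None, api_key=None):
    """Build a stock OpenAI client pointed at an Auxen instance (assumed compatible)."""
    # Imported lazily so the helper above works without the package installed.
    from openai import OpenAI  # requires `pip install openai`

    url, key = resolve_endpoint(base_url, api_key)
    return OpenAI(base_url=url, api_key=key)
```

Calling `make_openai_client()` with both environment variables set returns a client whose `chat.completions.create(...)` is used exactly as in the examples above.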
Auxen bills per minute of GPU runtime, not per token:
| Tier | Models | Rate (1× capacity) |
|---|---|---|
| Small (≤7B) | gemma2-2b, mistral-7b… | $0.10/hr |
| Medium (8–14B) | llama3.1-8b, qwen2.5-14b… | $0.20/hr |
| Large (24–32B) | mistral-small-24b, qwen2.5-32b… | $0.65/hr |
| XL (70B+) | llama3.1-70b, qwen2.5-72b… | $1.50/hr |
See auxen.ai/docs for the full breakdown.
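To make the per-minute billing concrete, here is a rough cost estimate derived from the hourly rates in the table. It is a sketch only: `estimate_cost` is a hypothetical helper, the tier keys are made up for illustration, and rounding partial minutes up is an assumption (the actual rounding rule is not stated here). Rates are for 1× capacity.

```python
import math

# Hourly rates in USD at 1x capacity, copied from the pricing table.
HOURLY_RATE_USD = {
    "small": 0.10,
    "medium": 0.20,
    "large": 0.65,
    "xl": 1.50,
}


def estimate_cost(tier, runtime_seconds):
    """Estimate the cost in USD of running one instance for runtime_seconds."""
    # Assumption: a partially used minute is billed as a whole minute.
    minutes = math.ceil(runtime_seconds / 60)
    per_minute = HOURLY_RATE_USD[tier] / 60
    return round(per_minute * minutes, 4)


if __name__ == "__main__":
    # 90 minutes on a Medium instance.
    print(f"${estimate_cost('medium', 90 * 60):.2f}")
```

Under these assumptions, 90 minutes on a Medium instance comes to $0.20 × 1.5 = $0.30, regardless of how many tokens were generated in that time.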
- Auxen homepage: https://auxen.ai
- Documentation: https://auxen.ai/docs
- Dashboard: https://auxen.ai/dashboard
- Source code: https://github.com/auxen-ai/auxen-python
Licensed under Apache-2.0.