inference-labs

Official Python client for Inference Labs — a vendor-neutral router for the major cloud LLMs (OpenAI / Azure / Anthropic / Google / AWS Bedrock / RunwayML). One endpoint, one billing surface, automatic failover, semantic caching, and policy-driven model selection (cost-first, quality-first, latency-first, balanced, judge).

pip install inference-labs

Or install the current release wheel directly from GitHub, which does not depend on PyPI:

pip install https://github.com/bosslesss/inference-labs-python/releases/download/v0.1.0/inference_labs-0.1.0-py3-none-any.whl

Optional LangChain integration:

pip install "inference-labs[langchain]"

Quickstart

from inference_labs import InferenceLabs

client = InferenceLabs(api_key="il_live_...")  # or INFERENCE_LABS_API_KEY env var

out = client.generate(
    prompt="Summarize this ticket: the laser printer is offline...",
    strategy="cost-first",
    max_cost_usd=0.01,
)
print(out.text)
print(f"routed via {out.provider}/{out.model} -- ${out.cost_usd:.5f}")

Streaming:

for chunk in client.stream(prompt="Write a haiku about caching."):
    print(chunk, end="", flush=True)
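
If you also need the full text once the stream ends, a small helper can echo chunks and accumulate them. This is a minimal sketch that assumes each chunk is a plain string, as the print call above suggests; collect_stream is a hypothetical helper name, not part of the SDK.

```python
from typing import Iterable


def collect_stream(chunks: Iterable[str]) -> str:
    """Echo chunks as they arrive and return the concatenated text.

    Assumption: every chunk is a str (hypothetical helper, not an SDK function).
    """
    parts: list[str] = []
    for chunk in chunks:
        print(chunk, end="", flush=True)
        parts.append(chunk)
    print()  # terminate the line once the stream is exhausted
    return "".join(parts)


# Intended use with the client from the Quickstart:
#   text = collect_stream(client.stream(prompt="Write a haiku about caching."))
```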

Async (same surface, awaitable):

import asyncio
from inference_labs import AsyncInferenceLabs

async def main():
    async with AsyncInferenceLabs() as client:
        out = await client.generate(prompt="Hello.")
        print(out.text)

asyncio.run(main())
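
Because the async client is awaitable, several prompts can be sent concurrently. The sketch below bounds concurrency with a semaphore; generate_many and max_concurrency are hypothetical names, and the only thing it assumes about the client is an awaitable generate(prompt=..., **kwargs) method like the one shown above.

```python
import asyncio
from typing import Any, Sequence


async def generate_many(
    client: Any,
    prompts: Sequence[str],
    *,
    max_concurrency: int = 4,
    **routing: Any,
) -> list[Any]:
    """Call client.generate for each prompt, at most max_concurrency at a time.

    Hypothetical helper: `client` is any object with an async generate() method.
    """
    gate = asyncio.Semaphore(max_concurrency)

    async def one(prompt: str) -> Any:
        async with gate:  # wait here while max_concurrency calls are in flight
            return await client.generate(prompt=prompt, **routing)

    # gather returns results in the same order as the prompts
    return await asyncio.gather(*(one(p) for p in prompts))


# Intended use inside `async with AsyncInferenceLabs() as client:`
#   results = await generate_many(client, ["Hello.", "Hi."], strategy="cost-first")
```

Any routing keyword passed to generate_many is forwarded unchanged to every call.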

LangChain

from inference_labs.langchain import ChatInferenceLabs
from langchain_core.messages import HumanMessage, SystemMessage

llm = ChatInferenceLabs(
    api_key="il_live_...",
    strategy="balanced",
    max_cost_usd=0.01,
)

resp = llm.invoke([
    SystemMessage(content="You are a terse SRE."),
    HumanMessage(content="What does workers=1 threads=8 mean for SQLite?"),
])
print(resp.content)
print(resp.additional_kwargs)   # -> model, provider, cost_usd, latency_ms, cached, trace_id

Routing options

All parameters below are optional and can be set per call.

| Parameter | Type | What it does |
| --- | --- | --- |
| `strategy` | `"balanced"` / `"cost-first"` / `"quality-first"` / `"latency-first"` / `"judge"` | Picks the policy the router uses to choose between models in your allowlist. |
| `max_cost_usd` | `float` | Hard cap on per-request cost in USD. |
| `max_latency_ms` | `int` | Latency budget in milliseconds. |
| `allow_models` | `list[str]` | Restrict the call to a subset of model IDs. |
| `deny_models` | `list[str]` | Exclude specific model IDs from selection. |
| `workspace_id` | `str` | Override the API key's default workspace. |
| `collect_trace` | `bool` | Persist a redacted trace for evals (default `True`). |
| `redact_pii` | `bool` | Run the PII / secrets redactor before storage (default `True`). |
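
Since every option is per call, one way to keep call sites tidy is to define named presets in your own code and unpack them into generate. The sketch below is an application-level pattern, not an SDK feature: ROUTING_PROFILES, routing_kwargs, the profile names, and the numeric budgets are all hypothetical; only the parameter names come from the table above.

```python
from typing import Any

# Hypothetical presets defined by the application, not by the SDK.
# Keys are parameter names from the routing options table; values are examples.
ROUTING_PROFILES: dict[str, dict[str, Any]] = {
    "triage": {"strategy": "cost-first", "max_cost_usd": 0.01},
    "interactive": {"strategy": "latency-first", "max_latency_ms": 1500},
    "review": {"strategy": "quality-first", "redact_pii": True},
}


def routing_kwargs(profile: str, **overrides: Any) -> dict[str, Any]:
    """Return a copy of the named preset with per-call overrides applied."""
    if profile not in ROUTING_PROFILES:
        raise KeyError(f"unknown routing profile: {profile!r}")
    merged = dict(ROUTING_PROFILES[profile])  # copy so presets stay unchanged
    merged.update(overrides)
    return merged


# Intended use:
#   out = client.generate(prompt="...", **routing_kwargs("triage", max_cost_usd=0.02))
```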

The router returns a small typed object:

@dataclass
class GenerationResult:
    text: str
    model: str
    provider: str
    cost_usd: float
    latency_ms: int
    cached: bool
    trace_id: str | None
    raw: dict   # whole response payload if you need fields we don't surface
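
The cost, cached, provider, and model fields make it straightforward to track spend on the caller's side. The following is a rough sketch of such an aggregation; UsageSummary and summarize are hypothetical names, and the function only assumes that each result exposes the attributes listed in the dataclass above.

```python
from dataclasses import dataclass
from typing import Any, Iterable


@dataclass
class UsageSummary:
    """Hypothetical roll-up of several generation results."""
    requests: int
    total_cost_usd: float
    cache_hit_rate: float
    cost_by_model: dict[str, float]


def summarize(results: Iterable[Any]) -> UsageSummary:
    """Aggregate cost and cache statistics over result objects.

    Reads only .cost_usd, .cached, .provider and .model from each result.
    """
    requests = 0
    cached = 0
    total = 0.0
    by_model: dict[str, float] = {}
    for result in results:
        requests += 1
        cached += 1 if result.cached else 0
        total += result.cost_usd
        key = f"{result.provider}/{result.model}"
        by_model[key] = by_model.get(key, 0.0) + result.cost_usd
    rate = cached / requests if requests else 0.0  # avoid dividing by zero
    return UsageSummary(requests, total, rate, by_model)
```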

Errors

from inference_labs import (
    InferenceLabsError, AuthenticationError, RateLimitError,
    InsufficientCreditsError, APIError,
)

All exceptions inherit from InferenceLabsError, so a single except clause on that base class catches any of them.
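
A distinct RateLimitError makes it possible to retry only the failures worth retrying. This README does not say whether the SDK retries on its own, so the following is an application-level sketch with exponential backoff; call_with_backoff and its parameters are hypothetical, and the exception classes to retry are passed in rather than imported.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    *,
    retry_on: tuple[type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on the given exception types with doubling delays.

    Hypothetical helper: exceptions not listed in retry_on propagate at once,
    and the last retryable failure is re-raised when attempts run out.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    raise AssertionError("unreachable")


# Intended use:
#   out = call_with_backoff(
#       lambda: client.generate(prompt="..."),
#       retry_on=(RateLimitError,),
#   )
```

Errors such as AuthenticationError or InsufficientCreditsError are left out of retry_on in the usage above because repeating the same request is unlikely to fix them.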

Configuration

client = InferenceLabs(
    api_key="il_live_...",                  # or INFERENCE_LABS_API_KEY
    base_url="https://app.inference-labs.com",  # override for staging / self-hosted
    timeout=60.0,
)

For multi-tenant frameworks, pass your own httpx.Client or httpx.AsyncClient via the client= keyword argument so the SDK reuses your connection pool.
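
When the same code runs against production and a staging or self-hosted deployment, it can help to derive the constructor arguments from the environment. The sketch below assumes only the three keyword arguments shown above; INFERENCE_LABS_API_KEY is the variable named in this README, while IL_BASE_URL and IL_TIMEOUT_S are hypothetical, application-defined names that the SDK itself is not known to read.

```python
import os
from typing import Any, Mapping

# Base URL taken from the configuration example in this README.
DEFAULT_BASE_URL = "https://app.inference-labs.com"


def client_settings(env: Mapping[str, str] = os.environ) -> dict[str, Any]:
    """Build keyword arguments for the client constructor from environment values.

    IL_BASE_URL and IL_TIMEOUT_S are hypothetical names chosen for this sketch.
    """
    settings: dict[str, Any] = {
        "base_url": env.get("IL_BASE_URL", DEFAULT_BASE_URL),
        "timeout": float(env.get("IL_TIMEOUT_S", "60")),
    }
    api_key = env.get("INFERENCE_LABS_API_KEY")
    if api_key:
        # Only pass the key explicitly when it is present in `env`.
        settings["api_key"] = api_key
    return settings


# Intended use:
#   client = InferenceLabs(**client_settings())
```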

License

Apache-2.0. See LICENSE.

Links

Marketing: https://inference-labs.com
App / dashboard: https://app.inference-labs.com
Issues: https://github.com/bosslesss/InferenceLabs/issues
Blog: https://blog.inference-labs.com
