DeepInfra is a serverless inference platform for open-source models. Hosts 100+ LLMs (Llama, Qwen, DeepSeek, Mixtral) plus image (Flux, Stable Diffusion), video, audio (Whisper, TTS, Voxtral), embeddings/reranking, and vision/OCR models. Includes fine-tuning, dedicated GPU rentals, and private deployments. OpenAI- and Anthropic-compatible endpoints.
- x-type: company
- AI, LLM, Inference, Serverless, Open Source, OpenAI Compatible, Anthropic Compatible, Image Generation, Audio, Embeddings
- DeepInfra Platform API — Chat completions (OpenAI- and Anthropic-compatible), embeddings, reranking, audio (Whisper / TTS / Voxtral), image (Flux / SD), video, vision/OCR, fine-tuning, dedicated-model deployments, account, billing, webhooks. Base URL: https://api.deepinfra.com/v1/openai
- DeepSeek-V3: $0.32/M input · $0.89/M output
- Voxtral Mini audio: $0.001/minute
- Flux schnell image: $0.0005 × (w/1024) × (h/1024) × iterations
- Dedicated GPU rentals: A100 from $0.89/hour, B300 up to $4.20/hour
- Plans — PAYG per-token / per-minute / per-image, dedicated-GPU hourly. 5 usage tiers ($20-$10K).
- RateLimits — 200 concurrent requests default; rate/GPU limit increases on request.
- FinOps — FOCUS-aligned, Usage Record API + automatic invoicing thresholds.
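The per-token and per-image prices above can be turned into simple cost estimators. A minimal sketch, assuming only the rates quoted in this listing; the function names are illustrative, not part of any DeepInfra SDK:

```python
FLUX_SCHNELL_BASE = 0.0005  # $ per 1024x1024 image per iteration (from the listing)

def flux_schnell_cost(width: int, height: int, iterations: int = 1) -> float:
    """Per-image cost: $0.0005 x (w/1024) x (h/1024) x iterations."""
    return FLUX_SCHNELL_BASE * (width / 1024) * (height / 1024) * iterations

def deepseek_v3_cost(input_tokens: int, output_tokens: int) -> float:
    """Token cost at $0.32/M input and $0.89/M output."""
    return input_tokens / 1e6 * 0.32 + output_tokens / 1e6 * 0.89

print(f"{flux_schnell_cost(1024, 1024, 4):.4f}")   # 4-iteration 1024x1024 image -> 0.0020
print(f"{deepseek_v3_cost(50_000, 10_000):.4f}")   # 50K in / 10K out -> 0.0249
```

Dimensions scale the image price linearly in each axis, so a 512x512 image costs a quarter of the 1024x1024 base rate per iteration.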
- Created: 2026-05-08
- Modified: 2026-05-08
- A documented OpenAPI URL exists (https://docs.deepinfra.com/api-reference/openapi.json) but currently returns a placeholder "Plant Store" sample spec rather than the real DeepInfra schema. Spec not copied locally.
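Since the platform exposes an OpenAI-compatible endpoint at the base URL above, a request can be shaped exactly like an OpenAI chat completion. A minimal sketch that builds (but does not send) such a request; the model identifier and token are placeholders, not confirmed by this listing:

```python
import json

BASE_URL = "https://api.deepinfra.com/v1/openai"   # from the listing
API_TOKEN = "YOUR_DEEPINFRA_TOKEN"                  # placeholder

url = f"{BASE_URL}/chat/completions"
headers = {
    "Authorization": f"Bearer {API_TOKEN}",
    "Content-Type": "application/json",
}
payload = {
    "model": "deepseek-ai/DeepSeek-V3",  # assumed model identifier
    "messages": [{"role": "user", "content": "Hello"}],
}

# To actually send it (requires a valid token):
# import urllib.request
# req = urllib.request.Request(url, data=json.dumps(payload).encode(),
#                              headers=headers)
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))

print(url)
```

Because the wire format matches OpenAI's, existing OpenAI client libraries should also work by pointing their base URL at the endpoint above.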
FN: Kin Lane
Email: kin@apievangelist.com