Baseten is a production inference platform for deploying and serving custom and pre-trained ML models. It offers a Model APIs catalog with OpenAI- and Anthropic-compatible endpoints (DeepSeek, Qwen, GLM, Nemotron), dedicated deployments via Truss, autoscaling GPU compute, async/queue inference, training, chains (multi-model workflows), and management APIs.
URL: APIs.json
Run: Capabilities Using Naftiko
- x-type: company
- Tags: AI, ML, Inference, Deployment, MLOps, OpenAI Compatible, Anthropic Compatible, Truss
- Baseten LLM Inference API — OpenAI-compatible chat completions for the Model APIs catalog. Base URL: https://inference.baseten.co/v1. Docs · OpenAPI
- Baseten Anthropic-Compatible Messages API — Anthropic Messages-compatible inference. OpenAPI
- Baseten Management & Async API — deployment management, async inference, chains, and training. Base URL: https://api.baseten.co.
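As a quick orientation to the OpenAI-compatible endpoint above, here is a minimal sketch that builds (but does not send) a chat-completion request against the listed base URL. The model slug and API key are placeholders, not values confirmed by this listing — check the Model APIs catalog for real slugs.

```python
import json

# Base URL from the listing; /chat/completions is the standard
# OpenAI-compatible path for chat requests.
BASE_URL = "https://inference.baseten.co/v1"

def build_chat_request(model, messages, api_key="YOUR_API_KEY"):
    """Return (url, headers, body) for a POST to the chat completions path."""
    url = f"{BASE_URL}/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",  # bearer-token auth, per OpenAI convention
        "Content-Type": "application/json",
    }
    body = json.dumps({"model": model, "messages": messages})
    return url, headers, body

url, headers, body = build_chat_request(
    "deepseek-ai/DeepSeek-V3",  # hypothetical model slug for illustration
    [{"role": "user", "content": "Hello"}],
)
print(url)
```

Because the endpoint follows the OpenAI wire format, the official `openai` client should also work by pointing its `base_url` at the same address.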
- Plans
  - Basic ($0/mo, pay-as-you-go): dedicated deployments, Model APIs, training, fast cold starts, SOC 2 + HIPAA, email/in-app chat support.
  - Pro (volume discounts): everything in Basic plus priority GPU access, dedicated compute, higher rate limits, and hands-on support.
  - Enterprise (custom): self-hosted options, custom SLAs, data residency, advanced RBAC.
- FinOps
  - Model APIs (per-token): DeepSeek V4 $1.74/M input · $3.48/M output; NVIDIA Nemotron 3 Super $0.30/M input · $0.75/M output.
  - Compute (per-minute): T4 from $0.01052 up to B200 at $0.16633; CPU from $0.00058. No charge for idle time.
- RateLimits — async control plane limited to 20 req/s; inference limits vary by tier.
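The per-token rates above translate directly into a per-request cost estimate. A minimal sketch, using the listed rates; the token counts are made-up example values, not measurements:

```python
# (input $/M tokens, output $/M tokens) from the listing
RATES = {
    "deepseek-v4": (1.74, 3.48),
    "nemotron-3-super": (0.30, 0.75),
}

def cost_usd(model, input_tokens, output_tokens):
    """Estimate request cost: tokens times per-million-token rate."""
    in_rate, out_rate = RATES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Example: 10k input + 2k output tokens on DeepSeek V4
print(round(cost_usd("deepseek-v4", 10_000, 2_000), 5))  # → 0.02436
```

At these rates a typical request costs fractions of a cent, so the per-minute GPU pricing matters mainly for dedicated deployments rather than catalog Model APIs.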
- Created: 2026-05-08
- Modified: 2026-05-08
FN: Kin Lane
Email: kin@apievangelist.com