BentoML is an open platform that simplifies ML model deployment and enables you to serve your models at production scale in minutes.
🍱 BentoML: The Unified Model Serving Framework
🦄️ Yatai: Model Deployment at scale on Kubernetes
🚀 bentoctl: Fast model deployment on any cloud platform
👩‍🍳 What we are building

BentoML - The Unified Model Serving Framework
BentoML makes it easy to turn your ML models into prediction services that are easy to deploy. You can use it with any ML framework, incorporate business logic and pre/post-processing code with your model, serve predictions in real time via a REST API endpoint or offline via batch inference jobs, and automatically generate a Docker container image for production deployment.
Key Features:
- Support multiple ML frameworks, including PyTorch, TensorFlow, Scikit-Learn, XGBoost, and many more
- Support Adaptive Batching, which dynamically groups inference requests into small batches in real time for better performance
- Build inference graphs composed of multiple models or functions and execute them in parallel (see the sketch after this list)
- Automatically generate Docker images for production deployment
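
As a rough sketch of the inference-graph idea, the service below loads two runners and awaits them in parallel with asyncio.gather, using the same load_runner/async_run API shown further down. The model names ("model_a", "model_b"), the service name, and the averaging of the two outputs are illustrative assumptions, not part of the official examples:

# inference_graph_service.py (illustrative sketch)
import asyncio

import numpy as np
import bentoml
from bentoml.io import NumpyNdarray

# assumes two PyTorch models were previously saved under these (hypothetical) names
model_a_runner = bentoml.pytorch.load_runner("model_a")
model_b_runner = bentoml.pytorch.load_runner("model_b")

svc = bentoml.Service("inference_graph_demo", runners=[model_a_runner, model_b_runner])

@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
async def predict(arr: np.ndarray) -> np.ndarray:
    # run both models concurrently and combine their outputs
    out_a, out_b = await asyncio.gather(
        model_a_runner.async_run(arr),
        model_b_runner.async_run(arr),
    )
    # assumes both runners return torch tensors; averaging is just one way to combine them
    return (out_a.numpy() + out_b.numpy()) / 2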
How it works:
- Use BentoML to save your trained model:
import bentoml

# save the trained PyTorch model to the local BentoML model store under the name "mnist"
bentoml.pytorch.save("mnist", trained_model)
- Create an ML Service:
# mnist_service.py
import typing as t

import numpy as np
import bentoml
from bentoml.io import Image, NumpyNdarray
from PIL.Image import Image as PILImage

mnist_runner = bentoml.pytorch.load_runner("mnist")

svc = bentoml.Service("pytorch_mnist_demo", runners=[mnist_runner])

@svc.api(input=Image(), output=NumpyNdarray(dtype="int64"))
async def predict_image(f: PILImage) -> "np.ndarray[t.Any, np.dtype[t.Any]]":
    # normalize pixel values and add a batch dimension before inference
    arr = np.array(f) / 255.0
    arr = np.expand_dims(arr, 0).astype("float32")
    output_tensor = await mnist_runner.async_run(arr)
    return output_tensor.numpy()
- Run a model server locally to test out the API endpoint:
bentoml serve mnist_service.py:svc --reload
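
Once the server is running, you can send a test request to the endpoint, for example with curl. The default port (3000), the route name derived from the API function, and the sample file name sample.png are assumptions for illustration:

curl -H "Content-Type: image/png" --data-binary @sample.png http://127.0.0.1:3000/predict_image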
- Check out the Quickstart Guide to learn more!
Yatai - Model Deployment at scale on Kubernetes
Yatai helps ML teams deploy large-scale model serving workloads on Kubernetes. It standardizes BentoML deployment on Kubernetes, provides a UI for managing all your ML models and deployments in one place, and enables advanced GitOps and CI/CD workflows.
Key Features:
- Deployment Automation - deploy Bentos as auto-scaling API endpoints on Kubernetes and easily roll out new versions
- Bento Registry - manage all your team's Bentos and Models, backed by cloud blob storage (S3, MinIO)
- Observability - monitoring dashboards that help users identify model performance issues
- CI/CD - flexible APIs for integrating with your training and CI/CD pipelines

bentoctl - Fast model deployment on any cloud platform
bentoctl is a CLI tool for deploying your machine learning models to any cloud platform and serving predictions via REST APIs. It is built on top of BentoML and makes it easy to bring any BentoML-packaged model to production.
Supported platforms:
- AWS EC2
- AWS Lambda
- AWS SageMaker
- Azure Functions
- Azure Container Instances
- Google Cloud Run
- Google Compute Engine
- Heroku
- Knative (WIP)
- Looking for Kubernetes? Try out Yatai: Model deployment at scale on Kubernetes.
- Customize deployment targets by creating a bentoctl plugin from the deployment operator template.
How it works:
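Below is a rough sketch of a typical bentoctl workflow. The operator name (aws-lambda) is only an example, and exact command names and flags can differ between bentoctl versions, so treat this as an outline and check the bentoctl documentation for the authoritative steps:

# install the deployment operator for your target platform (aws-lambda is just an example)
bentoctl operator install aws-lambda

# interactively generate a deployment configuration for your Bento
bentoctl init

# build the deployable artifacts for the chosen target
# (recent versions typically take the Bento tag and the generated config file as arguments)
bentoctl build

# apply the deployment to the target platform
bentoctl apply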
Get in touch!
Come join our community, the veterans alongside the newbs, all trying to figure out what the hell this thing called MLOps is.