WebLLM is a high-performance in-browser language model inference engine that brings large language models (LLMs) to web browsers with hardware acceleration. With WebGPU support, it allows developers to build AI-powered applications directly within the browser environment, removing the need for server-side processing and ensuring privacy.
WebLLM provides a specialized runtime for the web backend of MLCEngine: it leverages WebGPU for local hardware acceleration, exposes an OpenAI-compatible API, and ships built-in support for web workers so heavy computation stays off the UI thread.
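As a quick illustration of the OpenAI-style API, the sketch below loads a model and runs a single chat completion. The package name ``@mlc-ai/web-llm`` matches the NPM release; the model ID is an assumption and should be replaced with an entry from the current prebuilt model list.

.. code-block:: typescript

   import { CreateMLCEngine } from "@mlc-ai/web-llm";

   // Download (or reuse cached) weights and prepare the WebGPU runtime.
   // The model ID below is an assumption; pick one from the prebuilt list.
   const engine = await CreateMLCEngine("Llama-3.1-8B-Instruct-q4f32_1-MLC", {
     initProgressCallback: (report) => console.log(report.text),
   });

   // The request shape mirrors OpenAI's chat completions endpoint.
   const reply = await engine.chat.completions.create({
     messages: [{ role: "user", content: "What is WebGPU?" }],
   });
   console.log(reply.choices[0].message.content);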
- 🌐 In-Browser Inference: Run LLMs directly in the browser
- 🚀 WebGPU Acceleration: Leverage hardware acceleration for optimal performance
- 🔄 OpenAI API Compatibility: Seamless integration with standard AI workflows
- 📦 Multiple Model Support: Works with Llama, Phi, Gemma, Mistral, and more
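To keep inference from blocking the UI, the engine can also run inside a web worker. The sketch below shows one possible wiring, assuming a bundler that supports ``new URL(..., import.meta.url)`` workers; the file name ``worker.ts`` is illustrative, and the model ID is again an assumption.

.. code-block:: typescript

   // worker.ts — runs inside the web worker and hosts the actual engine.
   import { WebWorkerMLCEngineHandler } from "@mlc-ai/web-llm";

   const handler = new WebWorkerMLCEngineHandler();
   self.onmessage = (msg: MessageEvent) => handler.onmessage(msg);

.. code-block:: typescript

   // main.ts — the UI thread talks to the worker through a proxy engine.
   import { CreateWebWorkerMLCEngine } from "@mlc-ai/web-llm";

   const engine = await CreateWebWorkerMLCEngine(
     new Worker(new URL("./worker.ts", import.meta.url), { type: "module" }),
     "Llama-3.1-8B-Instruct-q4f32_1-MLC", // assumed model ID
   );
   // From here, the API is identical to the in-page engine above.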
Start exploring WebLLM by chatting with WebLLM Chat, then build web apps with high-performance local LLM inference using the guides and tutorials below.
.. toctree::
   :maxdepth: 2
   :caption: User Guide

   user/get_started.rst
   user/basic_usage.rst
   user/advanced_usage.rst
   user/api_reference.rst
.. toctree::
   :maxdepth: 2
   :caption: Developer Guide

   developer/building_from_source.rst
   developer/add_models.rst