Merged
73 changes: 24 additions & 49 deletions README.md
@@ -35,105 +35,79 @@ OpenLLM supports a wide range of state-of-the-art open-source LLMs. You can also
<th>Start a Server</th>
</tr>
<tr>
-<td>deepseek-r1</td>
-<td>671B</td>
-<td>80Gx16</td>
-<td><code>openllm serve deepseek-r1:671b-fc3d</code></td>
-</tr>
-<tr>
-<td>deepseek-r1-distill</td>
-<td>14B</td>
-<td>80G</td>
-<td><code>openllm serve deepseek-r1-distill:qwen2.5-14b-98a9</code></td>
-</tr>
-<tr>
-<td>deepseek-v3</td>
-<td>671B</td>
-<td>80Gx16</td>
-<td><code>openllm serve deepseek-v3:671b-instruct-d7ec</code></td>
+<td>deepseek</td>
+<td>8B</td>
+<td>24GB</td>
+<td><code>openllm serve deepseek:r1-distill-llama3.1-8b-626a</code></td>
</tr>
<tr>
<td>gemma2</td>
<td>2B</td>
<td>12G</td>
-<td><code>openllm serve gemma2:2b-instruct-747d</code></td>
+<td><code>openllm serve gemma2:2b-instruct-868c</code></td>
</tr>
-<tr>
-<td>hermes-3</td>
-<td>8B</td>
-<td>80G</td>
-<td><code>openllm serve hermes-3:deep-llama3-8b-1242</code></td>
-</tr>
<tr>
<td>llama3.1</td>
<td>8B</td>
<td>24G</td>
-<td><code>openllm serve llama3.1:8b-instruct-3c0c</code></td>
+<td><code>openllm serve llama3.1:8b-instruct-a995</code></td>
</tr>
<tr>
<td>llama3.2</td>
<td>1B</td>
<td>24G</td>
-<td><code>openllm serve llama3.2:1b-instruct-f041</code></td>
+<td><code>openllm serve llama3.2:1b-instruct-6fa1</code></td>
</tr>
<tr>
<td>llama3.3</td>
<td>70B</td>
<td>80Gx2</td>
-<td><code>openllm serve llama3.3:70b-instruct-b850</code></td>
+<td><code>openllm serve llama3.3:70b-instruct-f791</code></td>
</tr>
<tr>
<td>mistral</td>
<td>8B</td>
<td>24G</td>
-<td><code>openllm serve mistral:8b-instruct-50e8</code></td>
+<td><code>openllm serve mistral:8b-instruct-f4ed</code></td>
</tr>
<tr>
<td>mistral-large</td>
<td>123B</td>
<td>80Gx4</td>
-<td><code>openllm serve mistral-large:123b-instruct-1022</code></td>
-</tr>
-<tr>
-<td>mistralai</td>
-<td>24B</td>
-<td>80G</td>
-<td><code>openllm serve mistralai:24b-small-instruct-2501-0e69</code></td>
-</tr>
-<tr>
-<td>mixtral</td>
-<td>7B</td>
-<td>80Gx2</td>
-<td><code>openllm serve mixtral:8x7b-instruct-v0.1-b752</code></td>
+<td><code>openllm serve mistral-large:123b-instruct-2407-e1ef</code></td>
</tr>
<tr>
<td>phi4</td>
<td>14B</td>
<td>80G</td>
-<td><code>openllm serve phi4:14b-c12d</code></td>
+<td><code>openllm serve phi4:14b-a515</code></td>
</tr>
<tr>
<td>pixtral</td>
<td>12B</td>
<td>80G</td>
-<td><code>openllm serve pixtral:12b-240910-c344</code></td>
+<td><code>openllm serve pixtral:12b-2409-a2e0</code></td>
</tr>
<tr>
<td>qwen2.5</td>
<td>7B</td>
<td>24G</td>
-<td><code>openllm serve qwen2.5:7b-instruct-3260</code></td>
+<td><code>openllm serve qwen2.5:7b-instruct-dbe1</code></td>
</tr>
<tr>
<td>qwen2.5-coder</td>
<td>7B</td>
<td>24G</td>
<td><code>openllm serve qwen2.5-coder:7b-instruct-e75d</code></td>
</tr>
<tr>
<td>qwen2.5vl</td>
<td>3B</td>
<td>24G</td>
-<td><code>openllm serve qwen2.5vl:3b-instruct-4686</code></td>
+<td><code>openllm serve qwen2.5-coder:3b-instruct-63b0</code></td>
</tr>
</table>

...

For the full model list, see the [OpenLLM models repository](https://github.com/bentoml/openllm-models).

## Start an LLM server
@@ -151,7 +125,7 @@ To start an LLM server locally, use the `openllm serve` command and specify the
> ```

```bash
openllm serve llama3.2:1b-instruct-6fa1
```

The server will be accessible at [http://localhost:3000](http://localhost:3000/), providing OpenAI-compatible APIs for interaction. You can call the endpoints with different frameworks and tools that support OpenAI-compatible APIs. Typically, you may need to specify the following:
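As a minimal sketch of what such a call looks like, the snippet below builds the JSON body for the server's OpenAI-compatible chat completions endpoint. The base URL and model tag are assumptions taken from the README examples (a local server on port 3000 serving `llama3.2:1b-instruct-6fa1`); substitute your own.

```python
import json

# Assumed endpoint of a locally running `openllm serve` instance.
BASE_URL = "http://localhost:3000/v1"


def chat_request(model: str, prompt: str) -> dict:
    """Build the JSON body for POST {BASE_URL}/chat/completions."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }


body = chat_request("llama3.2:1b-instruct-6fa1", "Hello!")
print(json.dumps(body))
```

With the server running, POST this body to `f"{BASE_URL}/chat/completions"`, or point any OpenAI-compatible client at `BASE_URL` (the API key can usually be a dummy value for a local server).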
@@ -235,7 +209,7 @@ openllm repo update
To review a model’s information, run:

```bash
openllm model get llama3.2:1b-instruct-6fa1
```

### Add a model to the default model repository
@@ -263,7 +237,7 @@ OpenLLM supports LLM cloud deployment via BentoML, the unified model serving fra
[Sign up for BentoCloud](https://www.bentoml.com/) for free and [log in](https://docs.bentoml.com/en/latest/bentocloud/how-tos/manage-access-token.html). Then, run `openllm deploy` to deploy a model to BentoCloud:

```bash
openllm deploy llama3.2:1b-instruct-6fa1
```

> [!NOTE]
@@ -296,3 +270,4 @@ This project uses the following open-source projects:
- [astral-sh/uv](https://github.com/astral-sh/uv) for blazing-fast installation of model requirements

We are grateful to the developers and contributors of these projects for their hard work and dedication.

1 change: 0 additions & 1 deletion README.md.tpl
@@ -44,7 +44,6 @@ OpenLLM supports a wide range of state-of-the-art open-source LLMs. You can also
{%- endfor %}
</table>

-...

For the full model list, see the [OpenLLM models repository](https://github.com/bentoml/openllm-models).

16 changes: 2 additions & 14 deletions gen_readme.py
@@ -5,18 +5,6 @@
# "uv",
# ]
# ///
+import subprocess, sys, pathlib, json, jinja2
-import subprocess, sys, pathlib, json
-
-from jinja2 import Environment, FileSystemLoader
-
-wd = pathlib.Path('.').parent
-model_dict = subprocess.run(
-    [sys.executable, '-m', 'uv', 'run', '--with-editable', '.', 'openllm', 'model', 'list', '--output', 'readme'],
-    capture_output=True,
-    text=True,
-    check=True,
-)
-E = Environment(loader=FileSystemLoader('.'))
-with (wd / 'README.md').open('w') as f:
-    f.write(E.get_template('README.md.tpl').render(model_dict=json.loads(model_dict.stdout.strip())))
+with (pathlib.Path('.').parent / 'README.md').open('w') as f: f.write(jinja2.Environment(loader=jinja2.FileSystemLoader('.')).get_template('README.md.tpl').render(model_dict=json.loads(subprocess.run([sys.executable, '-m', 'uv', 'run', '--with-editable', '.', 'openllm', 'model', 'list', '--output', 'readme'], text=True, check=True, capture_output=True).stdout.strip())))
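The flow of gen_readme.py — capture model metadata as JSON, then render it through a template — can be sketched in a self-contained form. Here stdlib `string.Template` stands in for Jinja, and the JSON shape is a made-up illustration, not the real output of `openllm model list --output readme`:

```python
import json
from string import Template

# Hypothetical stand-in for the JSON emitted by
# `openllm model list --output readme` (real shape may differ).
raw = '{"llama3.2": [{"tag": "llama3.2:1b-instruct", "repo": "default"}]}'
model_dict = json.loads(raw)

# Hypothetical stand-in for one row of the README.md.tpl table template.
row = Template("<td>$name</td><td><code>openllm serve $tag</code></td>")

# Render one table row per model variant, as the template loop does.
rows = [
    row.substitute(name=name, tag=variant["tag"])
    for name, variants in model_dict.items()
    for variant in variants
]
print(rows[0])  # → <td>llama3.2</td><td><code>openllm serve llama3.2:1b-instruct</code></td>
```

The real script delegates the same steps to `subprocess.run` (via uv) and `jinja2.Environment`; this sketch only shows the parse-then-render shape under the stated assumptions.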
8 changes: 4 additions & 4 deletions uv.lock

Some generated files are not rendered by default.