feat: Add rocm builds and documentation #1012

Merged (41 commits) on Dec 13, 2023
Changes from 35 commits

Commits (41)
5b0816c  Added rocm builds and documentation (cromefire, Dec 10, 2023)
1be9c1e  Pulled build improvements from #902 (cromefire, Dec 10, 2023)
8a21760  Fixed build container for rocm build (cromefire, Dec 10, 2023)
1e3fe32  Install git in rocm container (cromefire, Dec 10, 2023)
1a1d4da  Fixed github step (cromefire, Dec 10, 2023)
b701a63  Try to fix if statement (cromefire, Dec 10, 2023)
82bbf8d  Added more generic dependency installation (cromefire, Dec 10, 2023)
081a9f3  upgraded rustup action (cromefire, Dec 10, 2023)
5308d34  Update sccache (cromefire, Dec 10, 2023)
f50b5af  Try pytorch manylinux image (cromefire, Dec 10, 2023)
980f3ed  Switched location for toolchain parameter (cromefire, Dec 10, 2023)
42ad479  Downgraded to deprecated action again (cromefire, Dec 10, 2023)
a5e69ca  Readded set default step (cromefire, Dec 10, 2023)
d4d38d7  Merge branch 'main' into rocm-release (cromefire, Dec 12, 2023)
5be320a  Install minimal rocm on the fly (cromefire, Dec 12, 2023)
e89ca16  fixed typo in binary name (cromefire, Dec 12, 2023)
c96853a  Downgraded checkout action (cromefire, Dec 12, 2023)
97fef46  Use curl to download (cromefire, Dec 12, 2023)
022548c  Add -y flag to yum (cromefire, Dec 12, 2023)
f4e99e7  Also install rocblas (cromefire, Dec 12, 2023)
7ae9d1d  Update release.yml (wsxiaoys, Dec 13, 2023)
17cfd18  Update release.yml (wsxiaoys, Dec 13, 2023)
895a2a2  Update prepare_build_environment.sh (wsxiaoys, Dec 13, 2023)
5c1ea2f  Update prepare_build_environment.sh (wsxiaoys, Dec 13, 2023)
18df1ba  Update build.rs (wsxiaoys, Dec 13, 2023)
d17bee0  Update build.rs (wsxiaoys, Dec 13, 2023)
3113962  Update README.md (wsxiaoys, Dec 13, 2023)
81f138a  Update website/docs/faq.mdx (wsxiaoys, Dec 13, 2023)
ab80cda  Update index.md (wsxiaoys, Dec 13, 2023)
23f2054  Update and rename docker-cuda.yml to docker.yml (wsxiaoys, Dec 13, 2023)
5202dfe  Delete .github/workflows/docker-rocm.yml (wsxiaoys, Dec 13, 2023)
f3d793f  Delete rocm.Dockerfile (wsxiaoys, Dec 13, 2023)
15767a8  Rename cuda.Dockerfile to Dockerfile (wsxiaoys, Dec 13, 2023)
1ae3822  Update docker.yml (wsxiaoys, Dec 13, 2023)
e44f48d  Update website/docs/installation/docker.mdx (wsxiaoys, Dec 13, 2023)
6fc195e  Update website/docs/installation/docker-compose.mdx (wsxiaoys, Dec 13, 2023)
2c55ee4  Update docker-compose.mdx (wsxiaoys, Dec 13, 2023)
a1f9589  Update docker-compose.mdx (wsxiaoys, Dec 13, 2023)
1ed492b  Update docker.mdx (wsxiaoys, Dec 13, 2023)
f7fe002  Update docker.mdx (wsxiaoys, Dec 13, 2023)
39a640b  Update website/docs/faq.mdx (wsxiaoys, Dec 13, 2023)
6 changes: 6 additions & 0 deletions .dockerignore
@@ -1,2 +1,8 @@
.idea
ci
clients
.github
python
**/target
**/node_modules
website
11 changes: 8 additions & 3 deletions .github/workflows/release.yml
@@ -26,8 +26,7 @@ jobs:
container: ${{ matrix.container }}
strategy:
matrix:
binary: [aarch64-apple-darwin, x86_64-manylinux2014, x86_64-manylinux2014-cuda117,
x86_64-windows-msvc-cuda117, x86_64-windows-msvc-cuda122]
binary: [aarch64-apple-darwin, x86_64-manylinux2014, x86_64-manylinux2014-cuda117, x86_64-windows-msvc-cuda117, x86_64-windows-msvc-cuda122, x86_64-manylinux2014-rocm57]
include:
- os: macos-latest
target: aarch64-apple-darwin
@@ -53,6 +52,11 @@ jobs:
ext: .exe
build_args: --features cuda
windows_cuda: '12.2.0'
- os: ubuntu-latest
target: x86_64-unknown-linux-gnu
binary: x86_64-manylinux2014-rocm57
container: ghcr.io/cromefire/hipblas-manylinux/2014/5.7:latest
build_args: --features rocm

env:
SCCACHE_GHA_ENABLED: true
@@ -72,7 +76,8 @@ jobs:
target: ${{ matrix.target }}
components: clippy

- run: rustup default ${{ env.RUST_TOOLCHAIN }}
- name: Set default rust version
run: rustup default ${{ env.RUST_TOOLCHAIN }}

- name: Sccache cache
uses: mozilla-actions/sccache-action@v0.0.3
3 changes: 2 additions & 1 deletion Dockerfile
@@ -29,12 +29,13 @@ RUN curl https://sh.rustup.rs -sSf | bash -s -- --default-toolchain ${RUST_TOOLC
ENV PATH="/root/.cargo/bin:${PATH}"

WORKDIR /root/workspace
COPY . .

RUN mkdir -p /opt/tabby/bin
RUN mkdir -p /opt/tabby/lib
RUN mkdir -p target

COPY . .

RUN --mount=type=cache,target=/usr/local/cargo/registry \
--mount=type=cache,target=/root/workspace/target \
cargo build --features cuda --release --package tabby && \
6 changes: 3 additions & 3 deletions website/docs/extensions/troubleshooting.md
@@ -112,9 +112,9 @@ for the current code context.
If your completion requests are timing out, Tabby may display a warning message.
This could be due to network issues or poor server performance, especially when
running a large model on a CPU. To improve performance, consider running the model
on a GPU with CUDA support or on Apple M1/M2 with Metal support. When running
the server, make sure to specify the device in the arguments using `--device cuda`
or `--device metal`. You can also try using a smaller model from the available [models](https://tabby.tabbyml.com/docs/models/).
on a GPU with CUDA or ROCm support or on Apple M1/M2 with Metal support. When running
the server, make sure to specify the device in the arguments using `--device cuda`, `--device rocm` or
`--device metal`. You can also try using a smaller model from the available [models](https://tabby.tabbyml.com/docs/models/).

By default, the timeout for automatically triggered completion requests is set to 4 seconds.
You can adjust this timeout value in the `~/.tabby-client/agent/config.toml` configuration file.
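For reference, a minimal sketch of a native launch using the ROCm device flag documented above; the binary invocation, model name, and `--device rocm` value all appear elsewhere in this PR, and everything else (GPU model, install method) is assumed:

```bash
# Serve a small model on an AMD GPU via ROCm instead of CUDA or Metal.
tabby serve --model TabbyML/StarCoder-1B --device rocm
```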
19 changes: 15 additions & 4 deletions website/docs/faq.mdx
@@ -1,10 +1,11 @@
import CodeBlock from '@theme/CodeBlock';

# ⁉️ Frequently Asked Questions

<details>
<summary>How much VRAM a LLM model consumes?</summary>
<div>By default, Tabby operates in int8 mode with CUDA, requiring approximately 8GB of VRAM for CodeLlama-7B.</div>
<div>
<p>By default, Tabby operates in int8 mode with CUDA, requiring approximately 8GB of VRAM for CodeLlama-7B.</p>
<p>For ROCm the actual limits are currently largely untested, but the same CodeLlama-7B seems to use 8GB of VRAM as well on an AMD Radeon™ RX 7900 XTX according to the ROCm monitoring tools.</p>
</div>
</details>
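As a rough way to reproduce the VRAM observation above, a sketch using ROCm's monitoring tools; it assumes `rocm-smi` is installed with the ROCm stack and that its flags have not changed in your release:

```bash
# Report per-GPU VRAM usage; run while Tabby is serving a model.
rocm-smi --showmeminfo vram
```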

<details>
@@ -24,7 +25,17 @@ import CodeBlock from '@theme/CodeBlock';
<details>
<summary>How to utilize multiple NVIDIA GPUs?</summary>
<div>
<p>Tabby only supports the use of a single GPU. To utilize multiple GPUs, you can initiate multiple Tabby instances and set CUDA_VISIBLE_DEVICES accordingly.</p>
<p>Tabby only supports the use of a single GPU. To utilize multiple GPUs, you can initiate multiple Tabby instances and set CUDA_VISIBLE_DEVICES or HIP_VISIBLE_DEVICES accordingly.</p>
</div>
</details>
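A hedged sketch of the multi-instance workaround described above, for a two-GPU ROCm machine; the `--port` flag and the second port number are assumptions for illustration, not part of this PR:

```bash
# One Tabby instance per GPU, pinned with HIP_VISIBLE_DEVICES
# (use CUDA_VISIBLE_DEVICES instead for NVIDIA GPUs).
HIP_VISIBLE_DEVICES=0 tabby serve --model TabbyML/StarCoder-1B --device rocm --port 8080 &
HIP_VISIBLE_DEVICES=1 tabby serve --model TabbyML/StarCoder-1B --device rocm --port 8081 &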

<details>
<summary>My AMD GPU isn't supported by ROCm</summary>
<div>
<p>
If a similar GPU is supported by ROCm, you can set the HSA_OVERRIDE_GFX_VERSION environment variable to that GPU's version.
For example, set it to 10.3.0 for RDNA2 and to 11.0.0 for RDNA3.
</p>
</div>
</details>
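A minimal sketch of applying that override when launching Tabby natively; the 11.0.0 value is the RDNA3 example from the answer above, and whether your particular card needs (or tolerates) the override is an assumption to verify:

```bash
# Present the GPU to the ROCm runtime as a supported RDNA3 (gfx1100) part.
HSA_OVERRIDE_GFX_VERSION=11.0.0 tabby serve --model TabbyML/StarCoder-1B --device rocm
```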

2 changes: 1 addition & 1 deletion website/docs/installation/apple.md
@@ -14,4 +14,4 @@ brew install tabbyml/tabby/tabby
tabby serve --device metal --model TabbyML/StarCoder-1B
```

The compute power of M1/M2 is limited and is likely to be sufficient only for individual usage. If you require a shared instance for a team, we recommend considering Docker hosting with CUDA. You can find more information about Docker [here](./docker).
The compute power of M1/M2 is limited and is likely to be sufficient only for individual usage. If you require a shared instance for a team, we recommend considering Docker hosting with CUDA or ROCm. You can find more information about Docker [here](./docker).
27 changes: 24 additions & 3 deletions website/docs/installation/docker-compose.mdx
@@ -5,6 +5,8 @@ sidebar_position: 1
# Docker Compose
This guide explains how to launch Tabby using docker-compose.



import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

@@ -16,8 +18,8 @@ version: '3.5'

services:
tabby:
restart: always
image: tabbyml/tabby
restart: unless-stopped
image: tabbyml/tabby-cuda
command: serve --model TabbyML/StarCoder-1B --device cuda
volumes:
- "$HOME/.tabby:/data"
@@ -33,14 +35,33 @@ services:
```

</TabItem>
<TabItem value="rocm" label="ROCm">

```yaml title="docker-compose.yml"
version: '3.5'
services:
tabby:
restart: unless-stopped
image: tabbyml/tabby-rocm
command: serve --model TabbyML/StarCoder-1B --device rocm
volumes:
- "$HOME/.tabby:/data"
ports:
- 8080:8080
devices:
- /dev/dri
- /dev/kfd
```

</TabItem>
<TabItem value="cpu" label="CPU">

```yaml title="docker-compose.yml"
version: '3.5'

services:
tabby:
restart: always
restart: unless-stopped
image: tabbyml/tabby
command: serve --model TabbyML/StarCoder-1B
volumes:
9 changes: 8 additions & 1 deletion website/docs/installation/docker.mdx
@@ -13,7 +13,14 @@ import TabItem from '@theme/TabItem';
<TabItem value="cuda" label="CUDA (requires NVIDIA Container Toolkit)" default>

```bash title="run.sh"
docker run -it --gpus all -p 8080:8080 -v $HOME/.tabby:/data tabbyml/tabby serve --model TabbyML/StarCoder-1B --device cuda
docker run -it --gpus all -p 8080:8080 -v $HOME/.tabby:/data tabbyml/tabby-cuda serve --model TabbyML/StarCoder-1B --device cuda
```

</TabItem>
<TabItem value="rocm" label="ROCm" default>

```bash title="run.sh"
docker run -it --device /dev/dri --device /dev/kfd -p 8080:8080 -v $HOME/.tabby:/data tabbyml/tabby-rocm serve --model TabbyML/StarCoder-1B --device rocm
```

</TabItem>
Expand Down