diff --git a/contributing/DOCS.md b/contributing/DOCS.md
index a885c5f51e..4fcc04d6d1 100644
--- a/contributing/DOCS.md
+++ b/contributing/DOCS.md
@@ -52,6 +52,19 @@ uv run mkdocs build -s
The documentation uses a custom build system with MkDocs hooks to generate various files dynamically.
+### Disable flags
+
+Set any of these variables, for example in `.envrc`, to skip regenerating the corresponding artifact. This avoids expensive docs regeneration, especially during `mkdocs serve` auto-reload.
+
+```shell
+export DSTACK_DOCS_DISABLE_EXAMPLES=1
+export DSTACK_DOCS_DISABLE_LLM_TXT=1
+export DSTACK_DOCS_DISABLE_CLI_REFERENCE=1
+export DSTACK_DOCS_DISABLE_YAML_SCHEMAS=1
+export DSTACK_DOCS_DISABLE_OPENAPI_REFERENCE=1
+export DSTACK_DOCS_DISABLE_REST_PLUGIN_SPEC_REFERENCE=1
+```
+
### Build hooks
The build process is customized via hooks in `scripts/docs/hooks.py`:
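The disable flags above are presumably consulted by these hooks before regenerating each artifact. A minimal sketch of such a guard is shown below; the helper names are illustrative, not taken from `scripts/docs/hooks.py`:

```python
import os


def artifact_disabled(name: str) -> bool:
    """Return True if generation of the given artifact is disabled.

    Any non-empty value of DSTACK_DOCS_DISABLE_<NAME> counts as set,
    e.g. DSTACK_DOCS_DISABLE_EXAMPLES=1 disables "examples".
    """
    return bool(os.environ.get(f"DSTACK_DOCS_DISABLE_{name.upper()}"))


def maybe_generate_examples() -> None:
    # Skip the expensive regeneration, e.g. during `mkdocs serve` auto-reload
    if artifact_disabled("examples"):
        return
    ...  # generate the examples pages here
```

With this pattern, each hook stays cheap to call unconditionally and the flag check lives in one place.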
diff --git a/docs/assets/stylesheets/extra.css b/docs/assets/stylesheets/extra.css
index fcde5e2e73..cb2d68e55d 100644
--- a/docs/assets/stylesheets/extra.css
+++ b/docs/assets/stylesheets/extra.css
@@ -1615,7 +1615,8 @@ html .md-footer-meta.md-typeset a:is(:focus,:hover) {
.md-typeset.md-banner__inner a {
color: var(--md-default-bg-color);
/* border-bottom: 1.5px dotted; */
- font-weight: 500;
+ /* font-weight: 500; */
+ font-size: 0.75rem;
}
.md-typeset.md-banner__inner .md-banner__button svg {
diff --git a/docs/docs/concepts/services.md b/docs/docs/concepts/services.md
index 1eb63dd01e..685b793bc9 100644
--- a/docs/docs/concepts/services.md
+++ b/docs/docs/concepts/services.md
@@ -1093,6 +1093,5 @@ The rolling deployment stops when all replicas are updated or when a new deploym
1. Read about [dev environments](dev-environments.md) and [tasks](tasks.md)
2. Learn how to manage [fleets](fleets.md)
3. See how to set up [gateways](gateways.md)
- 4. Check the [TGI](../../examples/inference/tgi/index.md),
- [vLLM](../../examples/inference/vllm/index.md), and
+ 4. Check the [vLLM](../../examples/inference/vllm/index.md) and
[NIM](../../examples/inference/nim/index.md) examples
diff --git a/docs/docs/index.md b/docs/docs/index.md
index 8afc24fdb5..4edaaee798 100644
--- a/docs/docs/index.md
+++ b/docs/docs/index.md
@@ -16,11 +16,11 @@ It streamlines development, training, and inference, and is compatible with any
-#### 1. Set up the server
+### Set up the server
> Before using `dstack`, ensure you've [installed](installation.md) the server, or signed up for [dstack Sky](https://sky.dstack.ai).
-#### 2. Define configurations
+### Define configurations
`dstack` supports the following configurations:
@@ -32,7 +32,7 @@ It streamlines development, training, and inference, and is compatible with any
Configuration can be defined as YAML files within your repo.
-#### 3. Apply configurations
+### Apply configurations
Apply the configuration either via the `dstack apply` CLI command or through the programmatic API.
diff --git a/docs/docs/installation.md b/docs/docs/installation.md
index d555f0873d..0ff4f624a8 100644
--- a/docs/docs/installation.md
+++ b/docs/docs/installation.md
@@ -177,6 +177,8 @@ Once the server is up, you can access it via the `dstack` CLI.
### Configure the project
+When started, the server creates the `main` project and the `admin` user by default.
+
To point the CLI to the `dstack` server, configure it
with the server address, user token, and project name:
@@ -195,6 +197,12 @@ Configuration is updated at ~/.dstack/config.yml
This configuration is stored in `~/.dstack/config.yml`.
+Later, you can create additional projects and users.
+
+### Use CLI or API
+
+Once the project is configured, you can use the `dstack` CLI or API.
+
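For reference, the stored file has roughly the following shape. The field names below are a sketch rather than an exact schema; check your own `~/.dstack/config.yml` for the authoritative layout:

```yaml
# ~/.dstack/config.yml (illustrative sketch)
projects:
- name: main
  url: http://127.0.0.1:3000
  token: <your user token>
  default: true
```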
## Install agent skills
Install [`dstack` skills](https://skills.sh/dstackai/dstack/dstack) to help AI agents use the CLI and edit configuration files.
@@ -207,6 +215,8 @@ $ npx skills add dstackai/dstack
+### Use agents
+
AI agents like Claude, Codex, and Cursor can now create and manage fleets and submit workloads on your behalf.
@@ -233,10 +243,9 @@ $
-!!! info "Feedback"
- We're actively improving Skills and would love your feedback in [GitHub issues](https://github.com/dstackai/dstack/issues).
+We're actively improving Skills and would love your feedback in [GitHub issues](https://github.com/dstackai/dstack/issues).
!!! info "What's next?"
1. See [Backends](concepts/backends.md)
2. Follow [Quickstart](quickstart.md)
- 3. Check the [server deployment](guides/server-deployment.md) guide
+ 3. Check the [Server deployment](guides/server-deployment.md) guide
diff --git a/docs/docs/reference/dstack.yml/service.md b/docs/docs/reference/dstack.yml/service.md
index 59411a540d..8aba6f827e 100644
--- a/docs/docs/reference/dstack.yml/service.md
+++ b/docs/docs/reference/dstack.yml/service.md
@@ -20,51 +20,6 @@ The `service` configuration type allows running [services](../../concepts/servic
type:
required: true
-=== "TGI"
-
- > TGI provides an OpenAI-compatible API starting with version 1.4.0,
- so models served by TGI can be defined with `format: openai` too.
-
- #SCHEMA# dstack.api.TGIChatModel
- overrides:
- show_root_heading: false
- type:
- required: true
-
- ??? info "Chat template"
-
- By default, `dstack` loads the [chat template](https://huggingface.co/docs/transformers/main/en/chat_templating)
- from the model's repository. If it is not present there, manual configuration is required.
-
- ```yaml
- type: service
-
- image: ghcr.io/huggingface/text-generation-inference:latest
- env:
- - MODEL_ID=TheBloke/Llama-2-13B-chat-GPTQ
- commands:
- - text-generation-launcher --port 8000 --trust-remote-code --quantize gptq
- port: 8000
-
- resources:
- gpu: 80GB
-
- # Enable the OpenAI-compatible endpoint
- model:
- type: chat
- name: TheBloke/Llama-2-13B-chat-GPTQ
- format: tgi
- chat_template: "{% if messages[0]['role'] == 'system' %}{% set loop_messages = messages[1:] %}{% set system_message = messages[0]['content'] %}{% else %}{% set loop_messages = messages %}{% set system_message = false %}{% endif %}{% for message in loop_messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if loop.index0 == 0 and system_message != false %}{% set content = '<>\\n' + system_message + '\\n<>\\n\\n' + message['content'] %}{% else %}{% set content = message['content'] %}{% endif %}{% if message['role'] == 'user' %}{{ '[INST] ' + content.strip() + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ ' ' + content.strip() + ' ' }}{% endif %}{% endfor %}"
- eos_token: ""
- ```
-
- Please note that model mapping is an experimental feature with the following limitations:
-
- 1. Doesn't work if your `chat_template` uses `bos_token`. As a workaround, replace `bos_token` inside `chat_template` with the token content itself.
- 2. Doesn't work if `eos_token` is defined in the model repository as a dictionary. As a workaround, set `eos_token` manually, as shown in the example above (see Chat template).
-
- If you encounter any ofther issues, please make sure to file a
- [GitHub issue](https://github.com/dstackai/dstack/issues/new/choose).
### `scaling`
diff --git a/docs/examples.md b/docs/examples.md
index e57a41cf52..cbecf2435e 100644
--- a/docs/examples.md
+++ b/docs/examples.md
@@ -3,16 +3,16 @@ title: Examples
description: Collection of examples for training, inference, and clusters
#template: examples.html
hide:
- - navigation
-# - toc
- - footer
+# - navigation
+ - toc
+# - footer
---
-
+ -->
## Single-node training
@@ -165,15 +165,6 @@ hide:
Deploy Llama 3.1 with vLLM
-
-
-
- ```yaml
- type: service
- name: amd-service-tgi
-
- # Using the official TGI's ROCm Docker image
- image: ghcr.io/huggingface/text-generation-inference:sha-a379d55-rocm
-
- env:
- - HF_TOKEN
- - MODEL_ID=meta-llama/Meta-Llama-3.1-70B-Instruct
- - TRUST_REMOTE_CODE=true
- - ROCM_USE_FLASH_ATTN_V2_TRITON=true
- commands:
- - text-generation-launcher --port 8000
- port: 8000
- # Register the model
- model: meta-llama/Meta-Llama-3.1-70B-Instruct
-
- # Uncomment to leverage spot instances
- #spot_policy: auto
-
- resources:
- gpu: MI300X
- disk: 150GB
- ```
-
-
-
+vLLM supports AMD GPUs. Here's an example of a [service](https://dstack.ai/docs/services) that deploys
+Llama 3.1 70B in FP16 using [vLLM](https://docs.vllm.ai/en/latest/getting_started/amd-installation.html).
=== "vLLM"
@@ -97,6 +64,7 @@ Llama 3.1 70B in FP16 using [TGI](https://huggingface.co/docs/text-generation-in
gpu: MI300X
disk: 200GB
```
+
Note that the maximum size of vLLM's KV cache here is 126192, so we must set `MAX_MODEL_LEN` to 126192. Adding `/opt/conda/envs/py_3.10/bin` to `PATH` ensures we use the Python 3.10 environment required by the pre-built binaries compiled for this version.
@@ -244,15 +212,13 @@ $ dstack apply -f examples/inference/vllm/amd/.dstack.yml
## Source code
The source-code of this example can be found in
-[`examples/inference/tgi/amd`](https://github.com/dstackai/dstack/blob/master/examples/inference/tgi/amd),
[`examples/inference/vllm/amd`](https://github.com/dstackai/dstack/blob/master/examples/inference/vllm/amd),
[`examples/single-node-training/axolotl/amd`](https://github.com/dstackai/dstack/blob/master/examples/single-node-training/axolotl/amd) and
[`examples/single-node-training/trl/amd`](https://github.com/dstackai/dstack/blob/master/examples/single-node-training/trl/amd)
## What's next?
-1. Browse [TGI](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/deploy-your-model.html#serving-using-hugging-face-tgi),
- [vLLM](https://docs.vllm.ai/en/latest/getting_started/amd-installation.html#build-from-source-rocm),
+1. Browse [vLLM](https://docs.vllm.ai/en/latest/getting_started/amd-installation.html#build-from-source-rocm),
[Axolotl](https://github.com/ROCm/rocm-blogs/tree/release/blogs/artificial-intelligence/axolotl),
[TRL](https://rocm.docs.amd.com/en/latest/how-to/llm-fine-tuning-optimization/fine-tuning-and-inference.html) and
[ROCm Bitsandbytes](https://github.com/ROCm/bitsandbytes)
diff --git a/examples/accelerators/intel/README.md b/examples/accelerators/intel/README.md
deleted file mode 100644
index 0e2a629f2f..0000000000
--- a/examples/accelerators/intel/README.md
+++ /dev/null
@@ -1,193 +0,0 @@
----
-title: Intel Gaudi
-description: Deploying and fine-tuning models on Intel Gaudi accelerators using TGI, vLLM, and Optimum
----
-
-# Intel Gaudi
-
-`dstack` supports running dev environments, tasks, and services on Intel Gaudi GPUs via
-[SSH fleets](https://dstack.ai/docs/concepts/fleets#ssh-fleets).
-
-## Deployment
-
-Serving frameworks like vLLM and TGI have Intel Gaudi support. Here's an example of
-a service that deploys
-[`DeepSeek-R1-Distill-Llama-70B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B)
-using [TGI on Gaudi](https://github.com/huggingface/tgi-gaudi)
-and [vLLM](https://github.com/HabanaAI/vllm-fork).
-
-=== "TGI"
-
-
-
-## Fine-tuning
-
-Below is an example of LoRA fine-tuning of [`DeepSeek-R1-Distill-Qwen-7B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B)
-using [Optimum for Intel Gaudi](https://github.com/huggingface/optimum-habana)
-and [DeepSpeed](https://docs.habana.ai/en/latest/PyTorch/DeepSpeed/DeepSpeed_User_Guide/DeepSpeed_User_Guide.html#deepspeed-user-guide) with
-the [`lvwerra/stack-exchange-paired`](https://huggingface.co/datasets/lvwerra/stack-exchange-paired) dataset.
-
-
-
-To finetune `DeepSeek-R1-Distill-Llama-70B` with eight Gaudi 2,
-you can partially offload parameters to CPU memory using the Deepspeed configuration file.
-For more details, refer to [parameter offloading](https://deepspeed.readthedocs.io/en/latest/zero3.html#deepspeedzerooffloadparamconfig).
-
-## Applying a configuration
-
-Once the configuration is ready, run `dstack apply -f `.
-
-
-
-```shell
-$ dstack apply -f examples/inference/vllm/.dstack.yml
-
- # BACKEND REGION RESOURCES SPOT PRICE
- 1 ssh remote 152xCPU,1007GB,8xGaudi2:96GB yes $0 idle
-
-Submit a new run? [y/n]: y
-
-Provisioning...
----> 100%
-```
-
-
-
-## Source code
-
-The source-code of this example can be found in
-[`examples/llms/deepseek/tgi/intel`](https://github.com/dstackai/dstack/blob/master/examples/llms/deepseek/tgi/intel),
-[`examples/llms/deepseek/vllm/intel`](https://github.com/dstackai/dstack/blob/master/examples/llms/deepseek/vllm/intel) and
-[`examples/llms/deepseek/trl/intel`](https://github.com/dstackai/dstack/blob/master/examples/llms/deepseek/trl/intel).
-
-!!! info "What's next?"
- 1. Check [dev environments](https://dstack.ai/docs/dev-environments), [tasks](https://dstack.ai/docs/tasks), and [services](https://dstack.ai/docs/services).
- 2. See also [Intel Gaudi Documentation](https://docs.habana.ai/en/latest/index.html), [vLLM Inference with Gaudi](https://docs.habana.ai/en/latest/PyTorch/Inference_on_PyTorch/vLLM_Inference.html)
- and [Optimum for Gaudi examples](https://github.com/huggingface/optimum-habana/blob/main/examples/trl/README.md).
diff --git a/examples/inference/tgi/.dstack.yml b/examples/inference/tgi/.dstack.yml
deleted file mode 100644
index 67fe1179d4..0000000000
--- a/examples/inference/tgi/.dstack.yml
+++ /dev/null
@@ -1,32 +0,0 @@
-type: service
-name: llama4-scout
-
-image: ghcr.io/huggingface/text-generation-inference:latest
-
-env:
- - HF_TOKEN
- - MODEL_ID=meta-llama/Llama-4-Scout-17B-16E-Instruct
- - MAX_INPUT_LENGTH=8192
- - MAX_TOTAL_TOKENS=16384
- # max_batch_prefill_tokens must be >= max_input_tokens
- - MAX_BATCH_PREFILL_TOKENS=8192
-commands:
- # Activate the virtual environment at /usr/src/.venv/
- # as required by TGI's latest image.
- - . /usr/src/.venv/bin/activate
- - NUM_SHARD=$DSTACK_GPUS_NUM text-generation-launcher
-
-port: 80
-# Register the model
-model: meta-llama/Llama-4-Scout-17B-16E-Instruct
-
-# Uncomment to leverage spot instances
-#spot_policy: auto
-
-# Uncomment to cache downloaded models
-#volumes:
-# - /data:/data
-
-resources:
- gpu: H200:2
- disk: 500GB..
diff --git a/examples/inference/tgi/README.md b/examples/inference/tgi/README.md
deleted file mode 100644
index 08a1de74db..0000000000
--- a/examples/inference/tgi/README.md
+++ /dev/null
@@ -1,124 +0,0 @@
----
-title: HuggingFace TGI
-description: Deploying Llama 4 Scout using HuggingFace Text Generation Inference
----
-
-# HuggingFace TGI
-
-This example shows how to deploy Llama 4 Scout with `dstack` using [HuggingFace TGI](https://huggingface.co/docs/text-generation-inference/en/index).
-
-??? info "Prerequisites"
- Once `dstack` is [installed](https://dstack.ai/docs/installation), clone the repo with examples.
-
-
-
-## Deployment
-
-Here's an example of a service that deploys [`Llama-4-Scout-17B-16E-Instruct`](https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E-Instruct) using TGI.
-
-
-
-```yaml
-type: service
-name: llama4-scout
-
-image: ghcr.io/huggingface/text-generation-inference:latest
-
-env:
- - HF_TOKEN
- - MODEL_ID=meta-llama/Llama-4-Scout-17B-16E-Instruct
- - MAX_INPUT_LENGTH=8192
- - MAX_TOTAL_TOKENS=16384
- # max_batch_prefill_tokens must be >= max_input_tokens
- - MAX_BATCH_PREFILL_TOKENS=8192
-commands:
- # Activate the virtual environment at /usr/src/.venv/
- # as required by TGI's latest image.
- - . /usr/src/.venv/bin/activate
- - NUM_SHARD=$DSTACK_GPUS_NUM text-generation-launcher
-
-port: 80
-# Register the model
-model: meta-llama/Llama-4-Scout-17B-16E-Instruct
-
-# Uncomment to leverage spot instances
-#spot_policy: auto
-
-# Uncomment to cache downloaded models
-#volumes:
-# - /data:/data
-
-resources:
- gpu: H200:2
- disk: 500GB..
-```
-
-
-### Running a configuration
-
-To run a configuration, use the [`dstack apply`](https://dstack.ai/docs/reference/cli/dstack/apply.md) command.
-
-
-
-```shell
-$ HF_TOKEN=...
-$ dstack apply -f examples/inference/tgi/.dstack.yml
-
- # BACKEND REGION RESOURCES SPOT PRICE
- 1 vastai is-iceland 48xCPU, 128GB, 2xH200 (140GB) no $7.87
- 2 runpod EU-SE-1 40xCPU, 128GB, 2xH200 (140GB) no $7.98
-
-Submit the run llama4-scout? [y/n]: y
-
-Provisioning...
----> 100%
-```
-
-
-If no gateway is created, the service endpoint will be available at `/proxy/services///`.
-
-
-
-When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the service endpoint will be available at `https://llama4-scout./`.
-
-## Source code
-
-The source-code of this example can be found in
-[`examples/inference/tgi`](https://github.com/dstackai/dstack/blob/master/examples/inference/tgi).
-
-## What's next?
-
-1. Check [services](https://dstack.ai/docs/services)
-2. Browse the [Llama](https://dstack.ai/examples/llms/llama/), [vLLM](https://dstack.ai/examples/inference/vllm/), [SgLang](https://dstack.ai/examples/inference/sglang/) and [NIM](https://dstack.ai/examples/inference/nim/) examples
-3. See also [AMD](https://dstack.ai/examples/accelerators/amd/) and
- [TPU](https://dstack.ai/examples/accelerators/tpu/)
diff --git a/examples/inference/tgi/amd/.dstack.yml b/examples/inference/tgi/amd/.dstack.yml
deleted file mode 100644
index 46c2239688..0000000000
--- a/examples/inference/tgi/amd/.dstack.yml
+++ /dev/null
@@ -1,21 +0,0 @@
-type: service
-name: amd-service-tgi
-
-image: ghcr.io/huggingface/text-generation-inference:sha-a379d55-rocm
-env:
- - HF_TOKEN
- - ROCM_USE_FLASH_ATTN_V2_TRITON=true
- - TRUST_REMOTE_CODE=true
- - MODEL_ID=meta-llama/Meta-Llama-3.1-70B-Instruct
-commands:
- - text-generation-launcher --port 8000
-port: 8000
-# Register the model
-model: meta-llama/Meta-Llama-3.1-70B-Instruct
-
-# Uncomment to leverage spot instances
-#spot_policy: auto
-
-resources:
- gpu: MI300X
- disk: 150GB
diff --git a/examples/inference/tgi/tpu/.dstack.yml b/examples/inference/tgi/tpu/.dstack.yml
deleted file mode 100644
index 42ba5ab7fd..0000000000
--- a/examples/inference/tgi/tpu/.dstack.yml
+++ /dev/null
@@ -1,27 +0,0 @@
-type: service
-# The name is optional, if not specified, generated randomly
-name: llama31-service-optimum-tpu
-
-# Using a Docker image with a fix instead of the official one
-# More details at https://github.com/huggingface/optimum-tpu/pull/92
-image: dstackai/optimum-tpu:llama31
-# Required environment variables
-env:
- - HF_TOKEN
- - MODEL_ID=meta-llama/Meta-Llama-3.1-8B-Instruct
- - MAX_TOTAL_TOKENS=4096
- - MAX_BATCH_PREFILL_TOKENS=4095
-commands:
- - text-generation-launcher --port 8000
-port: 8000
-model:
- format: tgi
- type: chat
- name: meta-llama/Meta-Llama-3.1-8B-Instruct
-
-# Uncomment to leverage spot instances
-#spot_policy: auto
-
-resources:
- # Required resources
- gpu: v5litepod-4
diff --git a/examples/inference/vllm/README.md b/examples/inference/vllm/README.md
index 7af4e97989..ce77e31782 100644
--- a/examples/inference/vllm/README.md
+++ b/examples/inference/vllm/README.md
@@ -116,7 +116,7 @@ The source-code of this example can be found in
## What's next?
1. Check [services](https://dstack.ai/docs/services)
-2. Browse the [Llama 3.1](https://dstack.ai/examples/llms/llama31/), [TGI](https://dstack.ai/examples/inference/tgi/)
- and [NIM](https://dstack.ai/examples/inference/nim/) examples
+2. Browse the [Llama 3.1](https://dstack.ai/examples/llms/llama31/) and
+ [NIM](https://dstack.ai/examples/inference/nim/) examples
3. See also [AMD](https://dstack.ai/examples/accelerators/amd/) and
[TPU](https://dstack.ai/examples/accelerators/tpu/)
diff --git a/examples/llms/deepseek/README.md b/examples/llms/deepseek/README.md
index 41d73e9e99..ae467891fc 100644
--- a/examples/llms/deepseek/README.md
+++ b/examples/llms/deepseek/README.md
@@ -78,95 +78,6 @@ Here's an example of a service that deploys `Deepseek-R1-Distill-Llama-70B` usin
Note that when using `Deepseek-R1-Distill-Llama-70B` with vLLM on a 192GB GPU, we must limit the context size to 126432 tokens to fit in memory.
-### Intel Gaudi
-
-Here's an example of a service that deploys `Deepseek-R1-Distill-Llama-70B`
-using [TGI on Gaudi](https://github.com/huggingface/tgi-gaudi)
-and [vLLM](https://github.com/HabanaAI/vllm-fork) (Gaudi fork) with Intel Gaudi 2.
-
-> Both [TGI on Gaudi](https://github.com/huggingface/tgi-gaudi)
-> and [vLLM](https://github.com/HabanaAI/vllm-fork) do not support `Deepseek-V2-Lite`.
-> See [this](https://github.com/huggingface/tgi-gaudi/issues/271)
-> and [this](https://github.com/HabanaAI/vllm-fork/issues/809#issuecomment-2652454824) issues.
-
-=== "TGI"
-
-
-
### NVIDIA
Here's an example of a service that deploys `Deepseek-R1-Distill-Llama-8B`
@@ -241,7 +152,7 @@ Approximate memory requirements for loading the model (excluding context and CUD
| `DeepSeek-R1-Distill-Qwen` | **7B** | 16GB | 8GB | 4GB |
For example, the FP8 version of Deepseek-R1 671B fits on a single node of MI300X with eight 192GB GPUs, a single node of
-H200 with eight 141GB GPUs, or a single node of Intel Gaudi2 with eight 96GB GPUs.
+H200 with eight 141GB GPUs.
### Applying the configuration
@@ -400,65 +311,6 @@ Here are the examples of LoRA fine-tuning of `Deepseek-V2-Lite` and GRPO fine-tu
Note, the `GRPO` fine-tuning of `DeepSeek-R1-Distill-Qwen-1.5B` consumes up to 135GB of VRAM.
-### Intel Gaudi
-
-Here is an example of LoRA fine-tuning of `DeepSeek-R1-Distill-Qwen-7B` on Intel Gaudi 2 GPUs using
-HuggingFace's [Optimum for Intel Gaudi](https://github.com/huggingface/optimum-habana)
-and [DeepSpeed](https://github.com/deepspeedai/DeepSpeed). Both also support `LoRA`
-fine-tuning of `Deepseek-V2-Lite` with same configuration as below.
-
-=== "LoRA"
-
-