From bca1019a4fde3951cc37037f7b00f7f409675009 Mon Sep 17 00:00:00 2001 From: sibianl Date: Fri, 7 Nov 2025 10:40:05 +0800 Subject: [PATCH 1/6] feat(model): add kimi k2 thinking model --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 49e4ee2e..700f7173 100644 --- a/README.md +++ b/README.md @@ -338,7 +338,7 @@ We warmly welcome contributions of all kinds! For guidelines on how to get invol |DeepSeek | Deepseek | [DeepSeek-V3.1](https://huggingface.co/collections/deepseek-ai/deepseek-v31)
[DeepSeek-R1](https://huggingface.co/collections/deepseek-ai/deepseek-r1)
[DeepSeek-V3](https://huggingface.co/collections/deepseek-ai/deepseek-v3)
[DeepSeek-V2](https://huggingface.co/collections/deepseek-ai/deepseek-v2) | [DeepSeek V3.1: The New Frontier in Artificial Intelligence](https://deepseek.ai/blog/deepseek-v31) | "DeepSeek" is an advanced large language model series from Deepseek AI, offering multiple generations such as DeepSeek-V3.1, DeepSeek-R1, DeepSeek-V2, and DeepSeek-V3. These models are designed for powerful natural language understanding and generation, with various sizes and capabilities for research and production use. |
|MiniMax-M2 | MiniMax AI | [MiniMax-M2](https://huggingface.co/MiniMaxAI/MiniMax-M2) | [MiniMax M2 & Agent: Ingenious in Simplicity](https://www.minimax.io/news/minimax-m2) | MiniMax-M2 is a compact, fast, and cost-effective MoE model (230B parameters, 10B active) built for advanced coding and agentic workflows. It offers state-of-the-art intelligence and coding abilities, delivering efficient, reliable tool use and strong multi-step reasoning for developers and agents, with high throughput and low latency for easy deployment. |
|GLM-4.6 | Z AI | [GLM-4.6](https://huggingface.co/zai-org/GLM-4.6) | [GLM-4.6: Advanced Agentic, Reasoning and Coding Capabilities](https://z.ai/blog/glm-4.6) | GLM-4.6 improves upon GLM-4.5 with a longer 200K token context window, stronger coding and reasoning performance, enhanced tool-use and agent integration, and refined writing quality. Outperforms previous versions and is highly competitive with leading open-source models across coding, reasoning, and agent benchmarks. |
-|Kimi-K2 | Moonshot AI | [Kimi-K2](https://huggingface.co/collections/moonshotai/kimi-k2-6871243b990f2af5ba60617d) | [Kimi K2: Open Agentic Intelligence](https://moonshotai.github.io/Kimi-K2/) | "Kimi-K2" is Moonshot AI's Kimi-K2 model family, including Kimi-K2-Instruct and Kimi-K2-Instruct-0905. The models are designed for agentic intelligence and available in different versions and parameter sizes. |
+|Kimi-K2 | Moonshot AI | [Kimi-K2](https://huggingface.co/collections/moonshotai/kimi-k2-6871243b990f2af5ba60617d) | [Kimi K2: Open Agentic Intelligence](https://moonshotai.github.io/Kimi-K2/) | "Kimi-K2" is Moonshot AI's Kimi-K2 model family, including Kimi-K2-Base, Kimi-K2-Instruct, and Kimi-K2-Thinking. Kimi K2 Thinking is a state-of-the-art open-source agentic model designed for deep, step-by-step reasoning and dynamic tool use. It features native INT4 quantization and a 256K context window for fast, memory-efficient inference. Uniquely stable in long-horizon tasks, Kimi K2 enables reliable autonomous workflows with consistent performance across hundreds of tool calls. |
|Qwen | Qwen | [Qwen3-Next](https://huggingface.co/collections/Qwen/qwen3-next-68c25fd6838e585db8eeea9d)<br>
[Qwen3](https://huggingface.co/collections/Qwen/qwen3-67dd247413f0e2e4f653967f)
[Qwen2.5](https://huggingface.co/collections/Qwen/qwen25-66e81a666513e518adb90d9e)| [Qwen3-Next: Towards Ultimate Training & Inference Efficiency](https://qwen.ai/blog?id=4074cca80393150c248e508aa62983f9cb7d27cd&from=research.latest-advancements-list) | The Qwen series is a family of large language models developed by Alibaba's Qwen team. It includes multiple generations such as Qwen2.5, Qwen3, and Qwen3-Next, which improve upon model architecture, efficiency, and capabilities. The models are available in various sizes and instruction-tuned versions, with support for cutting-edge features like long context and quantization. Suitable for a wide range of language tasks and open-source use cases. | |gpt-oss | OpenAI | [gpt-oss](https://huggingface.co/collections/openai/gpt-oss-68911959590a1634ba11c7a4) | [Introducing gpt-oss](https://openai.com/index/introducing-gpt-oss/) | "gpt-oss" refers to OpenAI's open-source GPT models, including gpt-oss-20b and gpt-oss-120b. The number (e.g., 20b, 120b) indicates the parameter count (20 billion, 120 billion). | |Meta Llama 3 | Meta | [Meta Llama 3](https://huggingface.co/collections/meta-llama/meta-llama-3-66214712577ca38149ebb2b6)
[Llama 3.1](https://huggingface.co/collections/meta-llama/llama-31-669fc079a0c406a149a5738f)
[Llama 3.2](https://huggingface.co/collections/meta-llama/llama-32-66f448ffc8c32f949b04c8cf)
[Llama 3.3](https://huggingface.co/collections/meta-llama/llama-33-67531d5c405ec5d08a852000) | [Introducing Meta Llama 3: The most capable openly available LLM to date](https://ai.meta.com/blog/meta-llama-3/) | "Meta Llama 3" is Meta's third-generation Llama model, available in sizes such as 8B and 70B parameters. Includes instruction-tuned and quantized (e.g., FP8) variants. | From eb746f99c284dc95e2e09c29f3f7f44d91c3d956 Mon Sep 17 00:00:00 2001 From: sibianl Date: Mon, 10 Nov 2025 11:23:33 +0800 Subject: [PATCH 2/6] add gpt-oss-safeguard --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 700f7173..cdc011d8 100644 --- a/README.md +++ b/README.md @@ -340,5 +340,5 @@ We warmly welcome contributions of all kinds! For guidelines on how to get invol |GLM-4.6 | Z AI | [GLM-4.6](https://huggingface.co/zai-org/GLM-4.6) | [GLM-4.6: Advanced Agentic, Reasoning and Coding Capabilities](https://z.ai/blog/glm-4.6) | GLM-4.6 improves upon GLM-4.5 with a longer 200K token context window, stronger coding and reasoning performance, enhanced tool-use and agent integration, and refined writing quality. Outperforms previous versions and is highly competitive with leading open-source models across coding, reasoning, and agent benchmarks. | |Kimi-K2 | Moonshot AI | [Kimi-K2](https://huggingface.co/collections/moonshotai/kimi-k2-6871243b990f2af5ba60617d) | [Kimi K2: Open Agentic Intelligence](https://moonshotai.github.io/Kimi-K2/) | "Kimi-K2" is Moonshot AI's Kimi-K2 model family, including Kimi-K2-Base, Kimi-K2-Instruct and Kimi-K2-Thinking. Kimi K2 Thinking is a state-of-the-art open-source agentic model designed for deep, step-by-step reasoning and dynamic tool use. It features native INT4 quantization and a 256k context window for fast, memory-efficient inference. Uniquely stable in long-horizon tasks, Kimi K2 enables reliable autonomous workflows with consistent performance across hundreds of tool calls. |Qwen | Qwen | [Qwen3-Next](https://huggingface.co/collections/Qwen/qwen3-next-68c25fd6838e585db8eeea9d)
[Qwen3](https://huggingface.co/collections/Qwen/qwen3-67dd247413f0e2e4f653967f)
[Qwen2.5](https://huggingface.co/collections/Qwen/qwen25-66e81a666513e518adb90d9e)| [Qwen3-Next: Towards Ultimate Training & Inference Efficiency](https://qwen.ai/blog?id=4074cca80393150c248e508aa62983f9cb7d27cd&from=research.latest-advancements-list) | The Qwen series is a family of large language models developed by Alibaba's Qwen team. It includes multiple generations such as Qwen2.5, Qwen3, and Qwen3-Next, which improve upon model architecture, efficiency, and capabilities. The models are available in various sizes and instruction-tuned versions, with support for cutting-edge features like long context and quantization. Suitable for a wide range of language tasks and open-source use cases. | -|gpt-oss | OpenAI | [gpt-oss](https://huggingface.co/collections/openai/gpt-oss-68911959590a1634ba11c7a4) | [Introducing gpt-oss](https://openai.com/index/introducing-gpt-oss/) | "gpt-oss" refers to OpenAI's open-source GPT models, including gpt-oss-20b and gpt-oss-120b. The number (e.g., 20b, 120b) indicates the parameter count (20 billion, 120 billion). | +|gpt-oss | OpenAI | [gpt-oss](https://huggingface.co/collections/openai/gpt-oss-68911959590a1634ba11c7a4)
[gpt-oss-safeguard](https://huggingface.co/collections/openai/gpt-oss-safeguard) | [Introducing gpt-oss-safeguard](https://openai.com/index/introducing-gpt-oss-safeguard/) | "gpt-oss" refers to OpenAI's open-weight GPT models (20B and 120B). The gpt-oss-safeguard variants are reasoning-based safety classification models: developers provide their own policy at inference, and the model uses chain-of-thought to classify content and explain its reasoning. This allows flexible, policy-driven moderation in complex or evolving domains, with open weights under Apache 2.0. |
|Meta Llama 3 | Meta | [Meta Llama 3](https://huggingface.co/collections/meta-llama/meta-llama-3-66214712577ca38149ebb2b6)<br>
[Llama 3.1](https://huggingface.co/collections/meta-llama/llama-31-669fc079a0c406a149a5738f)
[Llama 3.2](https://huggingface.co/collections/meta-llama/llama-32-66f448ffc8c32f949b04c8cf)
[Llama 3.3](https://huggingface.co/collections/meta-llama/llama-33-67531d5c405ec5d08a852000) | [Introducing Meta Llama 3: The most capable openly available LLM to date](https://ai.meta.com/blog/meta-llama-3/) | "Meta Llama 3" is Meta's third-generation Llama model, available in sizes such as 8B and 70B parameters. Includes instruction-tuned and quantized (e.g., FP8) variants. | From b8380b85c88b5d3d508034dd86acb4685e5c6ebb Mon Sep 17 00:00:00 2001 From: sibianl Date: Mon, 10 Nov 2025 18:04:21 +0800 Subject: [PATCH 3/6] tmp --- README.md | 285 +-------------------------------- docs/user_guide/install.md | 98 ++++++++++++ docs/user_guide/quick_start.md | 185 +++++++++++++++++++++ 3 files changed, 286 insertions(+), 282 deletions(-) create mode 100644 docs/user_guide/install.md create mode 100644 docs/user_guide/quick_start.md diff --git a/README.md b/README.md index cdc011d8..80e6361c 100644 --- a/README.md +++ b/README.md @@ -43,289 +43,10 @@ The backend architecture: * GPU backend powered by [SGLang](https://github.com/sgl-project/sglang) * MAC backend powered by [MLX LM](https://github.com/ml-explore/mlx-lm) -## Installation +## User Guide -### Prerequisites -- Python>=3.11.0,<3.14.0 -- Ubuntu-24.04 for Blackwell GPUs - -Below are installation methods for different operating systems. - -| Operating System | Windows App | From Source | Docker | -|:-------------|:----------------------------:|:----------------------------:|:----------------------------:| -|Windows | ✅️ | Not recommended | Not recommended | -|Linux | ❌️ | ✅️ | ✅️ | -|macOS | ❌️ | ✅️ | ❌️ | - -### From Source -- For Linux/WSL (GPU): -```sh -git clone https://github.com/GradientHQ/parallax.git -cd parallax -pip install -e '.[gpu]' -``` - -- For macOS (Apple silicon): - -We recommend macOS users to create an isolated Python virtual environment before installation. - -```sh -git clone https://github.com/GradientHQ/parallax.git -cd parallax - -# Enter Python virtual environment -python3 -m venv ./venv -source ./venv/bin/activate - -pip install -e '.[mac]' -``` - -Next time to re-activate this virtual environment, run ```source ./venv/bin/activate```. - -- Extra step for development: -```sh -pip install -e '.[dev]' -``` - -**Note for macOS users regarding network permissions** - -On macOS, you need to allow your terminal or IDE (such as Terminal, iTerm2, VS Code, Cursor, etc.) access to the local network in order for Parallax to work correctly. If the application prompts you for network access the first time you run Parallax, click "Allow." If you have already denied access, follow these steps to enable it: - -1. Open System Settings from the Apple menu. -2. Click on Privacy & Security in the sidebar. -3. Click on Local Network. -4. For each app listed, turn the ability to access your local network on or off using the toggle switch. - -This will ensure Parallax has the proper network permissions for local communication. - - -### Windows Application -[Click here](https://github.com/GradientHQ/parallax_win_cli/releases/latest/download/Parallax_Win_Setup.exe) to get latest Windows installer. - -After installing .exe, right click Windows start button and click ```Windows Terminal(Admin)``` to start a Powershell console as administrator. - -❗ Make sure you open your terminal with administrator privileges. -#### Ways to run Windows Terminal as administrator - -- Start menu: Right‑click Start and choose “Windows Terminal (Admin)”, or search “Windows Terminal”, right‑click the result, and select “Run as administrator”. 
-- Run dialog: Press Win+R → type `wt` → press Ctrl+Shift+Enter. -- Task Manager: Press Ctrl+Shift+Esc → File → Run new task → enter `wt` → check “Create this task with administrator privileges”. -- File Explorer: Open the target folder → hold Ctrl+Shift → right‑click in the folder → select “Open in Terminal”. - -Start Windows dependencies installation by simply typing this command in console: -```sh -parallax install -``` - -Installation process may take around 30 minutes. - -To see a description of all Parallax Windows configurations you can do: -```sh -parallax --help -``` - -### Docker -For Linux+GPU devices, Parallax provides a docker environment for quick setup. Choose the docker image according to the device's GPU architechture. - -| GPU Architecture | GPU Series | Image Pull Command | -|:-------------|:----------------------------|:----------------------------| -|Blackwell | RTX50 series/B100/B200... |```docker pull gradientservice/parallax:latest-blackwell```| -|Ampere/Hopper | RTX30 series/RTX40 series/A100/H100... |```docker pull gradientservice/parallax:latest-hopper```| - -Run a docker container as below. Please note that generally the argument ```--gpus all``` is necessary for the docker to run on GPUs. -```sh -# For Blackwell -docker run -it --gpus all --network host gradientservice/parallax:latest-blackwell bash -# For Ampere/Hopper -docker run -it --gpus all --network host gradientservice/parallax:latest-hopper bash -``` -The container starts under parallax workspace and you should be able to run parallax directly. - -## Getting Started - -We will walk through you the easiest way to quickly set up your own AI cluster - -### With Frontend - -#### Step 1: Launch scheduler - -First launch our scheduler on the main node, we recommend you to use your most convenient computer for this. -- For Linux/macOS: -```sh -parallax run -``` - -- For Windows, start Powershell console as administrator and run: -```sh -parallax run -``` - -To allow the API to be accessible from other machines, add the argument `--host 0.0.0.0` when launching scheduler. -```sh -parallax run --host 0.0.0.0 -``` - -When running `parallax run` for the first time or after an update, the code version info might be sent to help improve the project. To disable this, use the `-u` flag: -```sh -parallax run -u -``` - -#### Step 2: Set cluster and model config - -Open http://localhost:3001 and you should see the setup interface. - -![Model select](docs/images/node_config.png) - -Select your desired node and model config and click continue. - -> **Note:** -When running in remote mode, Parallax will use a public relay server to help establish connections between the scheduler and nodes. The public relay server will receive the IP information of both the scheduler and the nodes in order to facilitate this connection. - -#### Step 3: Connect your nodes - -Copy the generated join command line to your node and run. For remote connection, you can find your scheduler-address in the scheduler logs. - -```sh -# local area network env -parallax join -# public network env -parallax join -s {scheduler-address} -# example -parallax join -s 12D3KooWLX7MWuzi1Txa5LyZS4eTQ2tPaJijheH8faHggB9SxnBu -``` - -![Node join](docs/images/node_join.png) - -You should see your nodes start to show up with their status. Wait until all nodes are successfully connected, and you will automatically be directed to the chat interface. 
- -When running `parallax join` for the first time or after an update, some basic info (like version and gpu name) might be sent to help improve the project. To disable this, use the `-u` flag: -```sh -parallax join -u -``` - -#### Step 4: Chat - -Done! You have your own AI cluster now. - -![Chat](docs/images/chat_interface.png) - -#### Accessing the chat interface from another non-scheduler computer - -You can access the chat interface from any non-scheduler computer, not just those running a node server. Simply start the chat server with: - -```sh -# local area network env -parallax chat -# public network env -parallax chat -s {scheduler-address} -# example -parallax chat -s 12D3KooWLX7MWuzi1Txa5LyZS4eTQ2tPaJijheH8faHggB9SxnBu -``` - -After launching, visit [http://localhost:3002](http://localhost:3002) in your browser to use the chat interface. - -To allow the API to be accessible from other machines, add the argument `--host 0.0.0.0` when launching chat interface. -```sh -parallax chat --host 0.0.0.0 -``` - -### Without frontend -#### Step 1: Launch scheduler -First launch our scheduler on the main node. -```sh -parallax run -m {model-name} -n {number-of-worker-nodes} -``` -For example: -```sh -parallax run -m Qwen/Qwen3-0.6B -n 2 -``` -Please notice and record the scheduler ip4 address generated in the terminal. - -#### Step 2: Connect your nodes -For each distributed nodes including the main node, open a terminal and join the server with the scheduler address. -```sh -# local area network env -parallax join -# public network env -parallax join -s {scheduler-address} -``` -For example: -```sh -# first node -parallax join -s 12D3KooWLX7MWuzi1Txa5LyZS4eTQ2tPaJijheH8faHggB9SxnBu -# second node -parallax join -s 12D3KooWLX7MWuzi1Txa5LyZS4eTQ2tPaJijheH8faHggB9SxnBu -``` - -#### Step 3: Call chat api with Scheduler -```sh -curl --location 'http://localhost:3001/v1/chat/completions' --header 'Content-Type: application/json' --data '{ - "max_tokens": 1024, - "messages": [ - { - "role": "user", - "content": "hello" - } - ], - "stream": true -}' -``` - -### Skipping Scheduler -Developers can start Parallax backend engine without a scheduler. Pipeline parallel start/end layers should be set manually. -An example of serving Qwen3-0.6B with 2-nodes: -- First node: -```sh -python3 ./parallax/src/parallax/launch.py \ ---model-path Qwen/Qwen3-0.6B \ ---port 3000 \ ---max-batch-size 8 \ ---start-layer 0 \ ---end-layer 14 -``` -- Second node: -```sh -python3 ./parallax/src/parallax/launch.py \ ---model-path Qwen/Qwen3-0.6B \ ---port 3000 \ ---max-batch-size 8 \ ---start-layer 14 \ ---end-layer 28 -``` - -Call chat API on one of the nodes: -```sh -curl --location 'http://localhost:3000/v1/chat/completions' --header 'Content-Type: application/json' --data '{ - "max_tokens": 1024, - "messages": [ - { - "role": "user", - "content": "hello" - } - ], - "stream": true -}' - -``` -### Uninstalling Parallax - -For macOS or Linux, if you've installed Parallax via pip and want to uninstall it, you can use the following command: - -```sh -pip uninstall parallax -``` - -For Docker installations, remove Parallax images and containers using standard Docker commands: - -```sh -docker ps -a # List running containers -docker stop # Stop running containers -docker rm # Remove stopped containers -docker images # List Docker images -docker rmi # Remove Parallax images -``` - -For Windows, simply go to Control Panel → Programs → Uninstall a program, find "Gradient" in the list, and uninstall it. 
+- [Installation](./docs/user_guide/install.md) +- [Getting Started](./docs/user_guide/quick_start.md) ## Contributing diff --git a/docs/user_guide/install.md b/docs/user_guide/install.md new file mode 100644 index 00000000..52882785 --- /dev/null +++ b/docs/user_guide/install.md @@ -0,0 +1,98 @@ + +## Installation + +### Prerequisites +- Python>=3.11.0,<3.14.0 +- Ubuntu-24.04 for Blackwell GPUs + +Below are installation methods for different operating systems. + +| Operating System | Windows App | From Source | Docker | +|:-------------|:----------------------------:|:----------------------------:|:----------------------------:| +|Windows | ✅️ | Not recommended | Not recommended | +|Linux | ❌️ | ✅️ | ✅️ | +|macOS | ❌️ | ✅️ | ❌️ | + +### From Source +- For Linux/WSL (GPU): +```sh +git clone https://github.com/GradientHQ/parallax.git +cd parallax +pip install -e '.[gpu]' +``` + +- For macOS (Apple silicon): + +We recommend macOS users to create an isolated Python virtual environment before installation. + +```sh +git clone https://github.com/GradientHQ/parallax.git +cd parallax + +# Enter Python virtual environment +python3 -m venv ./venv +source ./venv/bin/activate + +pip install -e '.[mac]' +``` + +Next time to re-activate this virtual environment, run ```source ./venv/bin/activate```. + +- Extra step for development: +```sh +pip install -e '.[dev]' +``` + +**Note for macOS users regarding network permissions** + +On macOS, you need to allow your terminal or IDE (such as Terminal, iTerm2, VS Code, Cursor, etc.) access to the local network in order for Parallax to work correctly. If the application prompts you for network access the first time you run Parallax, click "Allow." If you have already denied access, follow these steps to enable it: + +1. Open System Settings from the Apple menu. +2. Click on Privacy & Security in the sidebar. +3. Click on Local Network. +4. For each app listed, turn the ability to access your local network on or off using the toggle switch. + +This will ensure Parallax has the proper network permissions for local communication. + + +### Windows Application +[Click here](https://github.com/GradientHQ/parallax_win_cli/releases/latest/download/Parallax_Win_Setup.exe) to get latest Windows installer. + +After installing .exe, right click Windows start button and click ```Windows Terminal(Admin)``` to start a Powershell console as administrator. + +❗ Make sure you open your terminal with administrator privileges. +#### Ways to run Windows Terminal as administrator + +- Start menu: Right‑click Start and choose “Windows Terminal (Admin)”, or search “Windows Terminal”, right‑click the result, and select “Run as administrator”. +- Run dialog: Press Win+R → type `wt` → press Ctrl+Shift+Enter. +- Task Manager: Press Ctrl+Shift+Esc → File → Run new task → enter `wt` → check “Create this task with administrator privileges”. +- File Explorer: Open the target folder → hold Ctrl+Shift → right‑click in the folder → select “Open in Terminal”. + +Start Windows dependencies installation by simply typing this command in console: +```sh +parallax install +``` + +Installation process may take around 30 minutes. + +To see a description of all Parallax Windows configurations you can do: +```sh +parallax --help +``` + +### Docker +For Linux+GPU devices, Parallax provides a docker environment for quick setup. Choose the docker image according to the device's GPU architechture. 
+ +| GPU Architecture | GPU Series | Image Pull Command | +|:-------------|:----------------------------|:----------------------------| +|Blackwell | RTX50 series/B100/B200... |```docker pull gradientservice/parallax:latest-blackwell```| +|Ampere/Hopper | RTX30 series/RTX40 series/A100/H100... |```docker pull gradientservice/parallax:latest-hopper```| + +Run a docker container as below. Please note that generally the argument ```--gpus all``` is necessary for the docker to run on GPUs. +```sh +# For Blackwell +docker run -it --gpus all --network host gradientservice/parallax:latest-blackwell bash +# For Ampere/Hopper +docker run -it --gpus all --network host gradientservice/parallax:latest-hopper bash +``` +The container starts under parallax workspace and you should be able to run parallax directly. diff --git a/docs/user_guide/quick_start.md b/docs/user_guide/quick_start.md new file mode 100644 index 00000000..b02ce154 --- /dev/null +++ b/docs/user_guide/quick_start.md @@ -0,0 +1,185 @@ +## Getting Started + +We will walk through you the easiest way to quickly set up your own AI cluster + +### With Frontend + +#### Step 1: Launch scheduler + +First launch our scheduler on the main node, we recommend you to use your most convenient computer for this. +- For Linux/macOS: +```sh +parallax run +``` + +- For Windows, start Powershell console as administrator and run: +```sh +parallax run +``` + +To allow the API to be accessible from other machines, add the argument `--host 0.0.0.0` when launching scheduler. +```sh +parallax run --host 0.0.0.0 +``` + +When running `parallax run` for the first time or after an update, some basic info (like version and gpu name) might be sent to help improve the project. To disable this, use the `-u` flag: +```sh +parallax run -u +``` + +#### Step 2: Set cluster and model config + +Open http://localhost:3001 and you should see the setup interface. + +![Model select](docs/images/node_config.png) + +Select your desired node and model config and click continue. + +> **Note:** +When running in remote mode, Parallax will use a public relay server to help establish connections between the scheduler and nodes. The public relay server will receive the IP information of both the scheduler and the nodes in order to facilitate this connection. + +#### Step 3: Connect your nodes + +Copy the generated join command line to your node and run. For remote connection, you can find your scheduler-address in the scheduler logs. + +```sh +# local area network env +parallax join +# public network env +parallax join -s {scheduler-address} +# example +parallax join -s 12D3KooWLX7MWuzi1Txa5LyZS4eTQ2tPaJijheH8faHggB9SxnBu +``` + +![Node join](docs/images/node_join.png) + +You should see your nodes start to show up with their status. Wait until all nodes are successfully connected, and you will automatically be directed to the chat interface. + +When running `parallax join` for the first time or after an update, some basic info (like version and gpu name) might be sent to help improve the project. To disable this, use the `-u` flag: +```sh +parallax join -u +``` + +#### Step 4: Chat + +Done! You have your own AI cluster now. + +![Chat](docs/images/chat_interface.png) + +#### Accessing the chat interface from another non-scheduler computer + +You can access the chat interface from any non-scheduler computer, not just those running a node server. 
Simply start the chat server with: + +```sh +# local area network env +parallax chat +# public network env +parallax chat -s {scheduler-address} +# example +parallax chat -s 12D3KooWLX7MWuzi1Txa5LyZS4eTQ2tPaJijheH8faHggB9SxnBu +``` + +After launching, visit [http://localhost:3002](http://localhost:3002) in your browser to use the chat interface. + +To allow the API to be accessible from other machines, add the argument `--host 0.0.0.0` when launching chat interface. +```sh +parallax chat --host 0.0.0.0 +``` + +### Without frontend +#### Step 1: Launch scheduler +First launch our scheduler on the main node. +```sh +parallax run -m {model-name} -n {number-of-worker-nodes} +``` +For example: +```sh +parallax run -m Qwen/Qwen3-0.6B -n 2 +``` +Please notice and record the scheduler ip4 address generated in the terminal. + +#### Step 2: Connect your nodes +For each distributed nodes including the main node, open a terminal and join the server with the scheduler address. +```sh +# local area network env +parallax join +# public network env +parallax join -s {scheduler-address} +``` +For example: +```sh +# first node +parallax join -s 12D3KooWLX7MWuzi1Txa5LyZS4eTQ2tPaJijheH8faHggB9SxnBu +# second node +parallax join -s 12D3KooWLX7MWuzi1Txa5LyZS4eTQ2tPaJijheH8faHggB9SxnBu +``` + +#### Step 3: Call chat api with Scheduler +```sh +curl --location 'http://localhost:3001/v1/chat/completions' --header 'Content-Type: application/json' --data '{ + "max_tokens": 1024, + "messages": [ + { + "role": "user", + "content": "hello" + } + ], + "stream": true +}' +``` + +### Skipping Scheduler +Developers can start Parallax backend engine without a scheduler. Pipeline parallel start/end layers should be set manually. +An example of serving Qwen3-0.6B with 2-nodes: +- First node: +```sh +python3 ./parallax/src/parallax/launch.py \ +--model-path Qwen/Qwen3-0.6B \ +--port 3000 \ +--max-batch-size 8 \ +--start-layer 0 \ +--end-layer 14 +``` +- Second node: +```sh +python3 ./parallax/src/parallax/launch.py \ +--model-path Qwen/Qwen3-0.6B \ +--port 3000 \ +--max-batch-size 8 \ +--start-layer 14 \ +--end-layer 28 +``` + +Call chat API on one of the nodes: +```sh +curl --location 'http://localhost:3000/v1/chat/completions' --header 'Content-Type: application/json' --data '{ + "max_tokens": 1024, + "messages": [ + { + "role": "user", + "content": "hello" + } + ], + "stream": true +}' + +``` +### Uninstalling Parallax + +For macOS or Linux, if you've installed Parallax via pip and want to uninstall it, you can use the following command: + +```sh +pip uninstall parallax +``` + +For Docker installations, remove Parallax images and containers using standard Docker commands: + +```sh +docker ps -a # List running containers +docker stop # Stop running containers +docker rm # Remove stopped containers +docker images # List Docker images +docker rmi # Remove Parallax images +``` + +For Windows, simply go to Control Panel → Programs → Uninstall a program, find "Gradient" in the list, and uninstall it. From 222544a2ba1cff7b2d6e68d66245a6a6dedf0832 Mon Sep 17 00:00:00 2001 From: sibianl Date: Mon, 10 Nov 2025 19:25:50 +0800 Subject: [PATCH 4/6] fix --- docs/user_guide/quick_start.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/user_guide/quick_start.md b/docs/user_guide/quick_start.md index b02ce154..b4dcca54 100644 --- a/docs/user_guide/quick_start.md +++ b/docs/user_guide/quick_start.md @@ -31,7 +31,7 @@ parallax run -u Open http://localhost:3001 and you should see the setup interface. 
-![Model select](docs/images/node_config.png) +![Model select](../images/node_config.png) Select your desired node and model config and click continue. @@ -51,7 +51,7 @@ parallax join -s {scheduler-address} parallax join -s 12D3KooWLX7MWuzi1Txa5LyZS4eTQ2tPaJijheH8faHggB9SxnBu ``` -![Node join](docs/images/node_join.png) +![Node join](../images/node_join.png) You should see your nodes start to show up with their status. Wait until all nodes are successfully connected, and you will automatically be directed to the chat interface. @@ -64,7 +64,7 @@ parallax join -u Done! You have your own AI cluster now. -![Chat](docs/images/chat_interface.png) +![Chat](../images/chat_interface.png) #### Accessing the chat interface from another non-scheduler computer From 85ead2806cbcc2dc821fb68bb03121bac21559c8 Mon Sep 17 00:00:00 2001 From: sibianl Date: Mon, 10 Nov 2025 20:15:59 +0800 Subject: [PATCH 5/6] fix --- docs/user_guide/install.md | 48 +++++++++++++++++++++++++--------- docs/user_guide/quick_start.md | 27 ++++--------------- 2 files changed, 41 insertions(+), 34 deletions(-) diff --git a/docs/user_guide/install.md b/docs/user_guide/install.md index 52882785..a688ae39 100644 --- a/docs/user_guide/install.md +++ b/docs/user_guide/install.md @@ -14,14 +14,14 @@ Below are installation methods for different operating systems. |macOS | ❌️ | ✅️ | ❌️ | ### From Source -- For Linux/WSL (GPU): +#### For Linux/WSL (GPU): ```sh git clone https://github.com/GradientHQ/parallax.git cd parallax pip install -e '.[gpu]' ``` -- For macOS (Apple silicon): +#### For macOS (Apple silicon): We recommend macOS users to create an isolated Python virtual environment before installation. @@ -38,12 +38,8 @@ pip install -e '.[mac]' Next time to re-activate this virtual environment, run ```source ./venv/bin/activate```. -- Extra step for development: -```sh -pip install -e '.[dev]' -``` - -**Note for macOS users regarding network permissions** +
+ Note for macOS users regarding network permissions On macOS, you need to allow your terminal or IDE (such as Terminal, iTerm2, VS Code, Cursor, etc.) access to the local network in order for Parallax to work correctly. If the application prompts you for network access the first time you run Parallax, click "Allow." If you have already denied access, follow these steps to enable it: @@ -53,7 +49,12 @@ On macOS, you need to allow your terminal or IDE (such as Terminal, iTerm2, VS C 4. For each app listed, turn the ability to access your local network on or off using the toggle switch. This will ensure Parallax has the proper network permissions for local communication. +
+#### Extra step for development: +```sh +pip install -e '.[dev]' +``` ### Windows Application [Click here](https://github.com/GradientHQ/parallax_win_cli/releases/latest/download/Parallax_Win_Setup.exe) to get latest Windows installer. @@ -61,12 +62,15 @@ This will ensure Parallax has the proper network permissions for local communica After installing .exe, right click Windows start button and click ```Windows Terminal(Admin)``` to start a Powershell console as administrator. ❗ Make sure you open your terminal with administrator privileges. -#### Ways to run Windows Terminal as administrator +
+Ways to run Windows Terminal as administrator -- Start menu: Right‑click Start and choose “Windows Terminal (Admin)”, or search “Windows Terminal”, right‑click the result, and select “Run as administrator”. +- Start menu: Right‑click Start and choose "Windows Terminal (Admin)", or search "Windows Terminal", right‑click the result, and select "Run as administrator". - Run dialog: Press Win+R → type `wt` → press Ctrl+Shift+Enter. -- Task Manager: Press Ctrl+Shift+Esc → File → Run new task → enter `wt` → check “Create this task with administrator privileges”. -- File Explorer: Open the target folder → hold Ctrl+Shift → right‑click in the folder → select “Open in Terminal”. +- Task Manager: Press Ctrl+Shift+Esc → File → Run new task → enter `wt` → check "Create this task with administrator privileges". +- File Explorer: Open the target folder → hold Ctrl+Shift → right‑click in the folder → select "Open in Terminal". +
+
Start Windows dependencies installation by simply typing this command in console: ```sh @@ -96,3 +100,23 @@ docker run -it --gpus all --network host gradientservice/parallax:latest-blackwe docker run -it --gpus all --network host gradientservice/parallax:latest-hopper bash ``` The container starts under parallax workspace and you should be able to run parallax directly. + +### Uninstalling Parallax + +For macOS or Linux, if you've installed Parallax via pip and want to uninstall it, you can use the following command: + +```sh +pip uninstall parallax +``` + +For Docker installations, remove Parallax images and containers using standard Docker commands: + +```sh +docker ps -a # List running containers +docker stop # Stop running containers +docker rm # Remove stopped containers +docker images # List Docker images +docker rmi # Remove Parallax images +``` + +For Windows, simply go to Control Panel → Programs → Uninstall a program, find "Gradient" in the list, and uninstall it. diff --git a/docs/user_guide/quick_start.md b/docs/user_guide/quick_start.md index b4dcca54..96b4e0a5 100644 --- a/docs/user_guide/quick_start.md +++ b/docs/user_guide/quick_start.md @@ -1,6 +1,8 @@ ## Getting Started -We will walk through you the easiest way to quickly set up your own AI cluster +We will walk through you the easiest way to quickly set up your own AI cluster. + +If you have not installed Parallax yet, please refer to the [installation guide](./install.md) and follow the instructions. ### With Frontend @@ -22,7 +24,7 @@ To allow the API to be accessible from other machines, add the argument `--host parallax run --host 0.0.0.0 ``` -When running `parallax run` for the first time or after an update, some basic info (like version and gpu name) might be sent to help improve the project. To disable this, use the `-u` flag: +When running `parallax run` for the first time or after an update, the code version info might be sent to help improve the project. To disable this, use the `-u` flag: ```sh parallax run -u ``` @@ -55,7 +57,7 @@ parallax join -s 12D3KooWLX7MWuzi1Txa5LyZS4eTQ2tPaJijheH8faHggB9SxnBu You should see your nodes start to show up with their status. Wait until all nodes are successfully connected, and you will automatically be directed to the chat interface. -When running `parallax join` for the first time or after an update, some basic info (like version and gpu name) might be sent to help improve the project. To disable this, use the `-u` flag: +When running `parallax join` for the first time or after an update, the code version info might be sent to help improve the project. To disable this, use the `-u` flag: ```sh parallax join -u ``` @@ -164,22 +166,3 @@ curl --location 'http://localhost:3000/v1/chat/completions' --header 'Content-Ty }' ``` -### Uninstalling Parallax - -For macOS or Linux, if you've installed Parallax via pip and want to uninstall it, you can use the following command: - -```sh -pip uninstall parallax -``` - -For Docker installations, remove Parallax images and containers using standard Docker commands: - -```sh -docker ps -a # List running containers -docker stop # Stop running containers -docker rm # Remove stopped containers -docker images # List Docker images -docker rmi # Remove Parallax images -``` - -For Windows, simply go to Control Panel → Programs → Uninstall a program, find "Gradient" in the list, and uninstall it. 
From fff71599b383051457e8c751d8e8c5c8999d5066 Mon Sep 17 00:00:00 2001 From: sibianl Date: Mon, 10 Nov 2025 22:30:43 +0800 Subject: [PATCH 6/6] add disable think note --- docs/user_guide/quick_start.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/docs/user_guide/quick_start.md b/docs/user_guide/quick_start.md index 96b4e0a5..526ba27c 100644 --- a/docs/user_guide/quick_start.md +++ b/docs/user_guide/quick_start.md @@ -130,6 +130,9 @@ curl --location 'http://localhost:3001/v1/chat/completions' --header 'Content-Ty }' ``` +> **Note:** +For models such as Qwen3 and gpt-oss, the "reasoning" (or "thinking") feature is enabled by default. To disable it, add `"chat_template_kwargs": {"enable_thinking": false}` to your request payload. + ### Skipping Scheduler Developers can start Parallax backend engine without a scheduler. Pipeline parallel start/end layers should be set manually. An example of serving Qwen3-0.6B with 2-nodes:
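To illustrate the note added in this patch: below is a minimal request with thinking disabled, adapted from the quick start's earlier `curl` example. This is a sketch, not a definitive reference; it assumes the scheduler from the quick start is listening on port 3001 and that the payload field is forwarded unchanged to the model's chat template.

```sh
# Quick start chat request with reasoning turned off via chat_template_kwargs
# (host and port are assumptions carried over from the guide; adjust to your deployment).
curl --location 'http://localhost:3001/v1/chat/completions' --header 'Content-Type: application/json' --data '{
    "max_tokens": 1024,
    "messages": [
        {
            "role": "user",
            "content": "hello"
        }
    ],
    "stream": true,
    "chat_template_kwargs": {"enable_thinking": false}
}'
```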