[Docs] Update Mixtral 8x7b docs #265

Merged · 11 commits · Dec 11, 2023
3 changes: 3 additions & 0 deletions README.md
@@ -5,6 +5,7 @@
[![license](https://img.shields.io/github/license/InternLM/xtuner.svg)](https://github.com/InternLM/xtuner/blob/main/LICENSE)
[![PyPI](https://badge.fury.io/py/xtuner.svg)](https://pypi.org/project/xtuner/)
[![Generic badge](https://img.shields.io/badge/🤗%20Huggingface-xtuner-yellow.svg)](https://huggingface.co/xtuner)
[![Generic badge](https://img.shields.io/badge/🤖%20ModelScope-xtuner-yellow.svg)](https://www.modelscope.cn/organization/xtuner)

English | [简体中文](README_zh-CN.md)

@@ -14,6 +15,7 @@ English | [简体中文](README_zh-CN.md)

## 🎉 News

- **\[2023/12\]** 🔥 Support [Mixtral 8x7b](https://huggingface.co/DiscoResearch/mixtral-7b-8expert) model! To get started, please check out the [docs](xtuner/configs/mixtral/README.md)!
- **\[2023/11\]** Support [ChatGLM3-6B](https://huggingface.co/THUDM/chatglm3-6b) model!
- **\[2023/10\]** Support [MSAgent-Bench](https://modelscope.cn/datasets/damo/MSAgent-Bench) dataset, and the fine-tuned LLMs can be applied by [Lagent](https://github.com/InternLM/lagent)!
- **\[2023/10\]** Optimize the data processing to accommodate `system` context. More information can be found on [Docs](docs/en/user_guides/dataset_format.md)!
@@ -83,6 +85,7 @@ XTuner is a toolkit for efficiently fine-tuning LLM, developed by the [MMRazor](
<li><a href="https://huggingface.co/Qwen/Qwen-7B">Qwen</a></li>
<li><a href="https://huggingface.co/baichuan-inc/Baichuan-7B">Baichuan</a></li>
<li><a href="https://huggingface.co/baichuan-inc/Baichuan2-7B-Base">Baichuan2</a></li>
<li><a href="https://huggingface.co/DiscoResearch/mixtral-7b-8expert">Mixtral 8x7b</a></li>
<li>...</li>
</ul>
</td>
3 changes: 3 additions & 0 deletions README_zh-CN.md
@@ -5,6 +5,7 @@
[![license](https://img.shields.io/github/license/InternLM/xtuner.svg)](https://github.com/InternLM/xtuner/blob/main/LICENSE)
[![PyPI](https://badge.fury.io/py/xtuner.svg)](https://pypi.org/project/xtuner/)
[![Generic badge](https://img.shields.io/badge/🤗%20Huggingface-xtuner-yellow.svg)](https://huggingface.co/xtuner)
[![Generic badge](https://img.shields.io/badge/🤖%20ModelScope-xtuner-yellow.svg)](https://www.modelscope.cn/organization/xtuner)

[English](README.md) | 简体中文

@@ -14,6 +15,7 @@

## 🎉 更新

- **\[2023/12\]** 🔥 支持 [Mixtral 8x7b](https://huggingface.co/DiscoResearch/mixtral-7b-8expert) 模型!快速开始请查阅此[文档](xtuner/configs/mixtral/README.md)!
- **\[2023/11\]** 支持 [ChatGLM3-6B](https://huggingface.co/THUDM/chatglm3-6b) 模型!
- **\[2023/10\]** 支持 [MSAgent-Bench](https://modelscope.cn/datasets/damo/MSAgent-Bench) 数据集,并且微调所得大语言模型可应用至 [Lagent](https://github.com/InternLM/lagent) 框架!
- **\[2023/10\]** 优化数据处理逻辑以兼容 `system` 字段,相关细节请查阅[文档](docs/zh_cn/user_guides/dataset_format.md)!
@@ -83,6 +85,7 @@ XTuner 是一个轻量级微调大语言模型的工具库,由 [MMRazor](https
<li><a href="https://huggingface.co/Qwen/Qwen-7B">Qwen</a></li>
<li><a href="https://huggingface.co/baichuan-inc/Baichuan-7B">Baichuan</a></li>
<li><a href="https://huggingface.co/baichuan-inc/Baichuan2-7B-Base">Baichuan2</a></li>
<li><a href="https://huggingface.co/DiscoResearch/mixtral-7b-8expert">Mixtral 8x7b</a></li>
<li>...</li>
</ul>
</td>
50 changes: 50 additions & 0 deletions xtuner/configs/mixtral/README.md
@@ -0,0 +1,50 @@
# Mixtral 8x7b

## Install

```bash
# Mixtral requires the latest version of transformers.
pip install git+https://github.com/huggingface/transformers.git

# Mixtral requires flash-attn
pip install flash-attn

# Install XTuner with DeepSpeed support
pip install -U 'xtuner[deepspeed]'
```
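
As an optional sanity check (a suggestion, not part of the official setup), you can confirm that the installed `transformers` build ships Mixtral support and that `flash-attn` imports cleanly (Mixtral support landed in `transformers` v4.36, which was still a pre-release at the time of writing):

```bash
# Optional: verify Mixtral support and flash-attn are importable
python -c "from transformers.models import mixtral; print('Mixtral support: OK')"
python -c "import flash_attn; print('flash-attn:', flash_attn.__version__)"
```
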
## Chat Template

Since Mixtral has not released an official chat template, we use InternLM's chat template for SFT fine-tuning; a quick way to try it interactively is sketched below.
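
One option is the `xtuner chat` CLI. This is only a sketch: it assumes the `internlm_chat` template name registered in XTuner, and the adapter path is a placeholder for a fine-tuned adapter:

```bash
# Chat with the base model (optionally with a fine-tuned adapter) using the InternLM template
xtuner chat DiscoResearch/mixtral-7b-8expert \
    --adapter ${ADAPTER_NAME_OR_PATH} \
    --prompt-template internlm_chat
```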


## QLoRA Finetune

QLoRA fine-tuning requires only a single A100-80G GPU.

```bash
xtuner train mixtral_8x7b_qlora_oasst1_internlm_template_e3 --deepspeed deepspeed_zero2
```
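
After training, the saved checkpoint can typically be converted into a HuggingFace-format adapter with `xtuner convert pth_to_hf`; the checkpoint and output paths below are placeholders:

```bash
# Convert the training checkpoint into a HuggingFace-format LoRA adapter
xtuner convert pth_to_hf mixtral_8x7b_qlora_oasst1_internlm_template_e3 \
    ${PTH_CHECKPOINT} \
    ${SAVE_DIR}
```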


## Full Parameter Finetune

Full-parameter fine-tuning requires 32 A100-80G GPUs.

### Slurm
```bash
srun ${SRUN_ARGS} xtuner train mixtral_8x7b_full_oasst1_internlm_template_e3 --deepspeed deepspeed_zero3 --launcher slurm
```

### torchrun

```bash
# execute on node 0
NPROC_PER_NODE=8 NNODES=4 PORT=29600 ADDR=$NODE_0_ADDR NODE_RANK=0 xtuner train mixtral_8x7b_full_oasst1_internlm_template_e3 --deepspeed deepspeed_zero3

# execute on node 1
NPROC_PER_NODE=8 NNODES=4 PORT=29600 ADDR=$NODE_0_ADDR NODE_RANK=1 xtuner train mixtral_8x7b_full_oasst1_internlm_template_e3 --deepspeed deepspeed_zero3

# execute on node 2
NPROC_PER_NODE=8 NNODES=4 PORT=29600 ADDR=$NODE_0_ADDR NODE_RANK=2 xtuner train mixtral_8x7b_full_oasst1_internlm_template_e3 --deepspeed deepspeed_zero3

# execute on node 3
NPROC_PER_NODE=8 NNODES=4 PORT=29600 ADDR=$NODE_0_ADDR NODE_RANK=3 xtuner train mixtral_8x7b_full_oasst1_internlm_template_e3 --deepspeed deepspeed_zero3
```
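
`NPROC_PER_NODE`, `NNODES`, `PORT`, `ADDR` and `NODE_RANK` configure XTuner's distributed launcher. `$NODE_0_ADDR` must hold the reachable IP of node 0 and be set identically on every node before launching, for example (the address below is purely illustrative):

```bash
# Set on every node before launching; 10.0.0.1 is a hypothetical IP for node 0
export NODE_0_ADDR=10.0.0.1
```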