From 3f5fd0107eeaaa558166d5e42e9350589d40a3ca Mon Sep 17 00:00:00 2001 From: Jiang-Jia-Jun Date: Wed, 5 Nov 2025 16:45:45 +0800 Subject: [PATCH 1/6] [Doc] Update docs for v2.3.0rc0 --- README.md | 93 +------------------ README_CN.md | 7 +- README_EN.md | 89 ++++++++++++++++++ docs/get_started/installation/nvidia_gpu.md | 10 +- .../zh/get_started/installation/nvidia_gpu.md | 10 +- 5 files changed, 104 insertions(+), 105 deletions(-) mode change 100644 => 120000 README.md create mode 100644 README_EN.md diff --git a/README.md b/README.md deleted file mode 100644 index dae9dca6b3c..00000000000 --- a/README.md +++ /dev/null @@ -1,92 +0,0 @@ -English | [简体中文](README_CN.md) -

- -

-

- - - - - - - -

- -

- PaddlePaddle%2FFastDeploy | Trendshift
- Installation - | - Quick Start - | - Supported Models - -

- --------------------------------------------------------------------------------- -# FastDeploy : Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle - -## News -**[2025-09] 🔥 FastDeploy v2.2 is newly released!** It now offers compatibility with models in the HuggingFace ecosystem, has further optimized performance, and newly adds support for [baidu/ERNIE-21B-A3B-Thinking](https://huggingface.co/baidu/ERNIE-4.5-21B-A3B-Thinking)! - -**[2025-08] 🔥 Released FastDeploy v2.1:** A brand-new KV Cache scheduling strategy has been introduced, and expanded support for PD separation and CUDA Graph across more models. Enhanced hardware support has been added for platforms like Kunlun and Hygon, along with comprehensive optimizations to improve the performance of both the service and inference engine. - -**[2025-07] The FastDeploy 2.0 Inference Deployment Challenge is now live!** Complete the inference deployment task for the ERNIE 4.5 series open-source models to win official FastDeploy 2.0 merch and generous prizes! 🎁 You're welcome to try it out and share your feedback! 📌[Sign up here](https://www.wjx.top/vm/meSsp3L.aspx#) 📌[Event details](https://github.com/PaddlePaddle/FastDeploy/discussions/2728) - -**[2025-06] 🔥 Released FastDeploy v2.0:** Supports inference and deployment for ERNIE 4.5. Furthermore, we open-source an industrial-grade PD disaggregation with context caching, dynamic role switching for effective resource utilization to further enhance inference performance for MoE models. - -## About - -**FastDeploy** is an inference and deployment toolkit for large language models and visual language models based on PaddlePaddle. It delivers **production-ready, out-of-the-box deployment solutions** with core acceleration technologies: - -- 🚀 **Load-Balanced PD Disaggregation**: Industrial-grade solution featuring context caching and dynamic instance role switching. Optimizes resource utilization while balancing SLO compliance and throughput. 
-- 🔄 **Unified KV Cache Transmission**: Lightweight high-performance transport library with intelligent NVLink/RDMA selection. -- 🤝 **OpenAI API Server and vLLM Compatible**: One-command deployment with [vLLM](https://github.com/vllm-project/vllm/) interface compatibility. -- 🧮 **Comprehensive Quantization Format Support**: W8A16, W8A8, W4A16, W4A8, W2A16, FP8, and more. -- ⏩ **Advanced Acceleration Techniques**: Speculative decoding, Multi-Token Prediction (MTP) and Chunked Prefill. -- 🖥️ **Multi-Hardware Support**: NVIDIA GPU, Kunlunxin XPU, Hygon DCU, Ascend NPU, Iluvatar GPU, Enflame GCU, MetaX GPU, Intel Gaudi etc. - -## Requirements - -- OS: Linux -- Python: 3.10 ~ 3.12 - -## Installation - -FastDeploy supports inference deployment on **NVIDIA GPUs**, **Kunlunxin XPUs**, **Iluvatar GPUs**, **Enflame GCUs**, **Hygon DCUs** and other hardware. For detailed installation instructions: - -- [NVIDIA GPU](./docs/get_started/installation/nvidia_gpu.md) -- [Kunlunxin XPU](./docs/get_started/installation/kunlunxin_xpu.md) -- [Iluvatar GPU](./docs/get_started/installation/iluvatar_gpu.md) -- [Enflame GCU](./docs/get_started/installation/Enflame_gcu.md) -- [Hygon DCU](./docs/get_started/installation/hygon_dcu.md) -- [MetaX GPU](./docs/get_started/installation/metax_gpu.md) -- [Intel Gaudi](./docs/get_started/installation/intel_gaudi.md) - -**Note:** We are actively working on expanding hardware support. Additional hardware platforms including Ascend NPU are currently under development and testing. Stay tuned for updates! 
- -## Get Started - -Learn how to use FastDeploy through our documentation: -- [10-Minutes Quick Deployment](./docs/get_started/quick_start.md) -- [ERNIE-4.5 Large Language Model Deployment](./docs/get_started/ernie-4.5.md) -- [ERNIE-4.5-VL Multimodal Model Deployment](./docs/get_started/ernie-4.5-vl.md) -- [Offline Inference Development](./docs/offline_inference.md) -- [Online Service Deployment](./docs/online_serving/README.md) -- [Best Practices](./docs/best_practices/README.md) - -## Supported Models - -Learn how to download models, enable using the torch format, and more: -- [Full Supported Models List](./docs/supported_models.md) - -## Advanced Usage - -- [Quantization](./docs/quantization/README.md) -- [PD Disaggregation Deployment](./docs/features/disaggregated.md) -- [Speculative Decoding](./docs/features/speculative_decoding.md) -- [Prefix Caching](./docs/features/prefix_caching.md) -- [Chunked Prefill](./docs/features/chunked_prefill.md) - -## Acknowledgement - -FastDeploy is licensed under the [Apache-2.0 open-source license](./LICENSE). During development, portions of [vLLM](https://github.com/vllm-project/vllm) code were referenced and incorporated to maintain interface compatibility, for which we express our gratitude. diff --git a/README.md b/README.md new file mode 120000 index 00000000000..bacd3186b4b --- /dev/null +++ b/README.md @@ -0,0 +1 @@ +README_CN.md \ No newline at end of file diff --git a/README_CN.md b/README_CN.md index 0f6460fa57f..805178bbf94 100644 --- a/README_CN.md +++ b/README_CN.md @@ -26,11 +26,12 @@ # FastDeploy :基于飞桨的大语言模型与视觉语言模型推理部署工具包 ## 最新活动 -**[2025-09] 🔥 FastDeploy v2.2 全新发布**: HuggingFace生态模型兼容,性能进一步优化,更新增对[baidu/ERNIE-21B-A3B-Thinking](https://huggingface.co/baidu/ERNIE-4.5-21B-A3B-Thinking)支持! 
-**[2025-08] FastDeploy v2.1 发布**:全新的KV Cache调度策略,更多模型支持PD分离和CUDA Graph,昆仑、海光等更多硬件支持增强,全方面优化服务和推理引擎的性能。 +**[2025-11] 🔥FastDeploy v2.3-rc0 PaddleOCR-VL 0.9B推理性能专项优化发布,**相比vLLM吞吐提升35%**(详见[PaddleOCR Benchmark](benchmarks/paddleocr-vl))! + +**[2025-09] FastDeploy v2.2 全新发布**: HuggingFace生态模型兼容,性能进一步优化,更新增对[baidu/ERNIE-21B-A3B-Thinking](https://huggingface.co/baidu/ERNIE-4.5-21B-A3B-Thinking)支持! -**[2025-07] 《FastDeploy2.0推理部署实测》专题活动已上线!** 完成文心4.5系列开源模型的推理部署等任务,即可获得骨瓷马克杯等FastDeploy2.0官方周边及丰富奖金!🎁 欢迎大家体验反馈~ 📌[报名地址](https://www.wjx.top/vm/meSsp3L.aspx#) 📌[活动详情](https://github.com/PaddlePaddle/FastDeploy/discussions/2728) +**[2025-08] FastDeploy v2.1 发布**:全新的KV Cache调度策略,更多模型支持PD分离和CUDA Graph,昆仑、海光等更多硬件支持增强,全方面优化服务和推理引擎的性能。 ## 关于 diff --git a/README_EN.md b/README_EN.md new file mode 100644 index 00000000000..ddc00321556 --- /dev/null +++ b/README_EN.md @@ -0,0 +1,89 @@ +English | [简体中文](README_CN.md) +

+ +

+

+ + + + + + + +

+ +

+ PaddlePaddle%2FFastDeploy | Trendshift
+ Installation + | + Quick Start + | + Supported Models + +

+ +-------------------------------------------------------------------------------- +# FastDeploy : Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle + +## News + +**[2025-11] 🔥 FastDeploy v2.3-rc0: The specialized optimization release for PaddleOCR-VL 0.9B inference performance has been launched, achieving a 35% increase in throughput compared to vLLM (for details, see [PaddleOCR Benchmark](benchmarks/paddleocr-vl))! + +**[2025-09] FastDeploy v2.2 is newly released!** It now offers compatibility with models in the HuggingFace ecosystem, has further optimized performance, and newly adds support for [baidu/ERNIE-21B-A3B-Thinking](https://huggingface.co/baidu/ERNIE-4.5-21B-A3B-Thinking)! + +## About + +**FastDeploy** is an inference and deployment toolkit for large language models and visual language models based on PaddlePaddle. It delivers **production-ready, out-of-the-box deployment solutions** with core acceleration technologies: + +- 🚀 **Load-Balanced PD Disaggregation**: Industrial-grade solution featuring context caching and dynamic instance role switching. Optimizes resource utilization while balancing SLO compliance and throughput. +- 🔄 **Unified KV Cache Transmission**: Lightweight high-performance transport library with intelligent NVLink/RDMA selection. +- 🤝 **OpenAI API Server and vLLM Compatible**: One-command deployment with [vLLM](https://github.com/vllm-project/vllm/) interface compatibility. +- 🧮 **Comprehensive Quantization Format Support**: W8A16, W8A8, W4A16, W4A8, W2A16, FP8, and more. +- ⏩ **Advanced Acceleration Techniques**: Speculative decoding, Multi-Token Prediction (MTP) and Chunked Prefill. +- 🖥️ **Multi-Hardware Support**: NVIDIA GPU, Kunlunxin XPU, Hygon DCU, Ascend NPU, Iluvatar GPU, Enflame GCU, MetaX GPU, Intel Gaudi etc. 
+ +## Requirements + +- OS: Linux +- Python: 3.10 ~ 3.12 + +## Installation + +FastDeploy supports inference deployment on **NVIDIA GPUs**, **Kunlunxin XPUs**, **Iluvatar GPUs**, **Enflame GCUs**, **Hygon DCUs** and other hardware. For detailed installation instructions: + +- [NVIDIA GPU](./docs/get_started/installation/nvidia_gpu.md) +- [Kunlunxin XPU](./docs/get_started/installation/kunlunxin_xpu.md) +- [Iluvatar GPU](./docs/get_started/installation/iluvatar_gpu.md) +- [Enflame GCU](./docs/get_started/installation/Enflame_gcu.md) +- [Hygon DCU](./docs/get_started/installation/hygon_dcu.md) +- [MetaX GPU](./docs/get_started/installation/metax_gpu.md) +- [Intel Gaudi](./docs/get_started/installation/intel_gaudi.md) + +**Note:** We are actively working on expanding hardware support. Additional hardware platforms including Ascend NPU are currently under development and testing. Stay tuned for updates! + +## Get Started + +Learn how to use FastDeploy through our documentation: +- [10-Minutes Quick Deployment](./docs/get_started/quick_start.md) +- [ERNIE-4.5 Large Language Model Deployment](./docs/get_started/ernie-4.5.md) +- [ERNIE-4.5-VL Multimodal Model Deployment](./docs/get_started/ernie-4.5-vl.md) +- [Offline Inference Development](./docs/offline_inference.md) +- [Online Service Deployment](./docs/online_serving/README.md) +- [Best Practices](./docs/best_practices/README.md) + +## Supported Models + +Learn how to download models, load models in the torch format, and more: +- [Full Supported Models List](./docs/supported_models.md) + +## Advanced Usage + +- [Quantization](./docs/quantization/README.md) +- [PD Disaggregation Deployment](./docs/features/disaggregated.md) +- [Speculative Decoding](./docs/features/speculative_decoding.md) +- [Prefix Caching](./docs/features/prefix_caching.md) +- [Chunked Prefill](./docs/features/chunked_prefill.md) + +## Acknowledgement + +FastDeploy is licensed under the [Apache-2.0 open-source license](./LICENSE). 
During development, portions of [vLLM](https://github.com/vllm-project/vllm) code were referenced and incorporated to maintain interface compatibility, for which we express our gratitude. diff --git a/docs/get_started/installation/nvidia_gpu.md b/docs/get_started/installation/nvidia_gpu.md index 0a8e4f9eff7..bfafbb90143 100644 --- a/docs/get_started/installation/nvidia_gpu.md +++ b/docs/get_started/installation/nvidia_gpu.md @@ -15,7 +15,7 @@ The following installation methods are available when your environment meets the **Notice**: The pre-built image only supports SM80/90 GPUs (e.g. H800/A800); if you are deploying on an SM86/89 GPU (L40/4090/L20), please reinstall ```fastdeploy-gpu``` after you create the container. ```shell -docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-cuda-12.6:2.2.1 +docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-cuda-12.6:2.3.0rc0 ``` ## 2. Pre-built Pip Installation @@ -23,7 +23,7 @@ docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-cuda-12 First install paddlepaddle-gpu. For detailed instructions, refer to [PaddlePaddle Installation](https://www.paddlepaddle.org.cn/en/install/quick?docurl=/documentation/docs/en/develop/install/pip/linux-pip_en.html) ```shell # Install stable release -python -m pip install paddlepaddle-gpu==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/ +python -m pip install paddlepaddle-gpu==3.2.1 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/ # Install latest Nightly build python -m pip install --pre paddlepaddle-gpu -i https://www.paddlepaddle.org.cn/packages/nightly/cu126/ @@ -34,7 +34,7 @@ Then install fastdeploy. **Do not install from PyPI**. 
Use the following methods. For SM80/90 architecture GPUs (e.g. A30/A100/H100): ``` # Install stable release -python -m pip install fastdeploy-gpu -i https://www.paddlepaddle.org.cn/packages/stable/fastdeploy-gpu-80_90/ --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple +python -m pip install fastdeploy-gpu==2.3.0rc0 -i https://www.paddlepaddle.org.cn/packages/stable/fastdeploy-gpu-80_90/ --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple # Install latest Nightly build python -m pip install fastdeploy-gpu -i https://www.paddlepaddle.org.cn/packages/nightly/fastdeploy-gpu-80_90/ --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple @@ -43,7 +43,7 @@ python -m pip install fastdeploy-gpu -i https://www.paddlepaddle.org.cn/packages For SM86/89 architecture GPUs (e.g. A10/4090/L20/L40): ``` # Install stable release -python -m pip install fastdeploy-gpu -i https://www.paddlepaddle.org.cn/packages/stable/fastdeploy-gpu-86_89/ --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple +python -m pip install fastdeploy-gpu==2.3.0rc0 -i https://www.paddlepaddle.org.cn/packages/stable/fastdeploy-gpu-86_89/ --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple # Install latest Nightly build python -m pip install fastdeploy-gpu -i https://www.paddlepaddle.org.cn/packages/nightly/fastdeploy-gpu-86_89/ --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple @@ -64,7 +64,7 @@ docker build -f dockerfiles/Dockerfile.gpu -t fastdeploy:gpu . First install paddlepaddle-gpu. 
For detailed instructions, refer to [PaddlePaddle Installation](https://www.paddlepaddle.org.cn/en/install/quick?docurl=/documentation/docs/en/develop/install/pip/linux-pip_en.html) ```shell -python -m pip install paddlepaddle-gpu==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/ +python -m pip install paddlepaddle-gpu==3.2.1 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/ ``` Then clone the source code and build: diff --git a/docs/zh/get_started/installation/nvidia_gpu.md b/docs/zh/get_started/installation/nvidia_gpu.md index 30af18e4be8..57f988825b9 100644 --- a/docs/zh/get_started/installation/nvidia_gpu.md +++ b/docs/zh/get_started/installation/nvidia_gpu.md @@ -17,7 +17,7 @@ **注意**: 如下镜像仅支持SM 80/90架构GPU(A800/H800等),如果你是在L20/L40/4090等SM 86/89架构的GPU上部署,请在创建容器后,卸载```fastdeploy-gpu```再重新安装如下文档指定支持86/89架构的`fastdeploy-gpu`包。 ``` shell -docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-cuda-12.6:2.2.1 +docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-cuda-12.6:2.3.0rc0 ``` ## 2. 
预编译Pip安装 @@ -26,7 +26,7 @@ docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-cuda-12 ``` shell # Install stable release -python -m pip install paddlepaddle-gpu==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/ +python -m pip install paddlepaddle-gpu==3.2.1 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/ # Install latest Nightly build python -m pip install --pre paddlepaddle-gpu -i https://www.paddlepaddle.org.cn/packages/nightly/cu126/ @@ -38,7 +38,7 @@ python -m pip install --pre paddlepaddle-gpu -i https://www.paddlepaddle.org.cn/ ``` # 安装稳定版本fastdeploy -python -m pip install fastdeploy-gpu -i https://www.paddlepaddle.org.cn/packages/stable/fastdeploy-gpu-80_90/ --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple +python -m pip install fastdeploy-gpu==2.3.0rc0 -i https://www.paddlepaddle.org.cn/packages/stable/fastdeploy-gpu-80_90/ --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple # 安装Nightly Build的最新版本fastdeploy python -m pip install fastdeploy-gpu -i https://www.paddlepaddle.org.cn/packages/nightly/fastdeploy-gpu-80_90/ --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple @@ -48,7 +48,7 @@ python -m pip install fastdeploy-gpu -i https://www.paddlepaddle.org.cn/packages ``` # 安装稳定版本fastdeploy -python -m pip install fastdeploy-gpu -i https://www.paddlepaddle.org.cn/packages/stable/fastdeploy-gpu-86_89/ --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple +python -m pip install fastdeploy-gpu==2.3.0rc0 -i https://www.paddlepaddle.org.cn/packages/stable/fastdeploy-gpu-86_89/ --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple # 安装Nightly Build的最新版本fastdeploy python -m pip install fastdeploy-gpu -i https://www.paddlepaddle.org.cn/packages/nightly/fastdeploy-gpu-86_89/ --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple @@ -70,7 +70,7 @@ docker build -f dockerfiles/Dockerfile.gpu -t fastdeploy:gpu . 
首先安装 paddlepaddle-gpu,详细安装方式参考 [PaddlePaddle安装](https://www.paddlepaddle.org.cn/) ``` shell -python -m pip install paddlepaddle-gpu==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/ +python -m pip install paddlepaddle-gpu==3.2.1 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/ ``` 接着克隆源代码,编译安装 From 55342ff93a1d4da50e01c16902c70ea1c93ad741 Mon Sep 17 00:00:00 2001 From: Jiang-Jia-Jun Date: Wed, 5 Nov 2025 16:52:29 +0800 Subject: [PATCH 2/6] [Doc] Update docs for v2.3.0rc0 --- docs/get_started/installation/nvidia_gpu.md | 6 +++--- docs/zh/get_started/installation/nvidia_gpu.md | 6 +++--- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/docs/get_started/installation/nvidia_gpu.md b/docs/get_started/installation/nvidia_gpu.md index bfafbb90143..e41ca6944de 100644 --- a/docs/get_started/installation/nvidia_gpu.md +++ b/docs/get_started/installation/nvidia_gpu.md @@ -15,7 +15,7 @@ The following installation methods are available when your environment meets the **Notice**: The pre-built image only supports SM80/90 GPUs (e.g. H800/A800); if you are deploying on an SM86/89 GPU (L40/4090/L20), please reinstall ```fastdeploy-gpu``` after you create the container. ```shell -docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-cuda-12.6:2.3.0rc0 +docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-cuda-12.6:2.3.0-rc0 ``` ## 2. Pre-built Pip Installation @@ -34,7 +34,7 @@ Then install fastdeploy. **Do not install from PyPI**. 
Use the following methods. For SM80/90 architecture GPUs (e.g. A30/A100/H100): ``` # Install stable release -python -m pip install fastdeploy-gpu==2.3.0rc0 -i https://www.paddlepaddle.org.cn/packages/stable/fastdeploy-gpu-80_90/ --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple +python -m pip install fastdeploy-gpu==2.3.0-rc0 -i https://www.paddlepaddle.org.cn/packages/stable/fastdeploy-gpu-80_90/ --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple # Install latest Nightly build python -m pip install fastdeploy-gpu -i https://www.paddlepaddle.org.cn/packages/nightly/fastdeploy-gpu-80_90/ --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple @@ -43,7 +43,7 @@ python -m pip install fastdeploy-gpu -i https://www.paddlepaddle.org.cn/packages For SM86/89 architecture GPUs (e.g. A10/4090/L20/L40): ``` # Install stable release -python -m pip install fastdeploy-gpu==2.3.0rc0 -i https://www.paddlepaddle.org.cn/packages/stable/fastdeploy-gpu-86_89/ --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple +python -m pip install fastdeploy-gpu==2.3.0-rc0 -i https://www.paddlepaddle.org.cn/packages/stable/fastdeploy-gpu-86_89/ --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple # Install latest Nightly build python -m pip install fastdeploy-gpu -i https://www.paddlepaddle.org.cn/packages/nightly/fastdeploy-gpu-86_89/ --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple diff --git a/docs/zh/get_started/installation/nvidia_gpu.md b/docs/zh/get_started/installation/nvidia_gpu.md index 57f988825b9..9b00088f6cf 100644 --- a/docs/zh/get_started/installation/nvidia_gpu.md +++ b/docs/zh/get_started/installation/nvidia_gpu.md @@ -17,7 +17,7 @@ **注意**: 如下镜像仅支持SM 80/90架构GPU(A800/H800等),如果你是在L20/L40/4090等SM 86/89架构的GPU上部署,请在创建容器后,卸载```fastdeploy-gpu```再重新安装如下文档指定支持86/89架构的`fastdeploy-gpu`包。 ``` shell -docker pull 
ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-cuda-12.6:2.3.0rc0 +docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-cuda-12.6:2.3.0-rc0 ``` ## 2. 预编译Pip安装 @@ -38,7 +38,7 @@ python -m pip install --pre paddlepaddle-gpu -i https://www.paddlepaddle.org.cn/ ``` # 安装稳定版本fastdeploy -python -m pip install fastdeploy-gpu==2.3.0rc0 -i https://www.paddlepaddle.org.cn/packages/stable/fastdeploy-gpu-80_90/ --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple +python -m pip install fastdeploy-gpu==2.3.0-rc0 -i https://www.paddlepaddle.org.cn/packages/stable/fastdeploy-gpu-80_90/ --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple # 安装Nightly Build的最新版本fastdeploy python -m pip install fastdeploy-gpu -i https://www.paddlepaddle.org.cn/packages/nightly/fastdeploy-gpu-80_90/ --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple @@ -48,7 +48,7 @@ python -m pip install fastdeploy-gpu -i https://www.paddlepaddle.org.cn/packages ``` # 安装稳定版本fastdeploy -python -m pip install fastdeploy-gpu==2.3.0rc0 -i https://www.paddlepaddle.org.cn/packages/stable/fastdeploy-gpu-86_89/ --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple +python -m pip install fastdeploy-gpu==2.3.0-rc0 -i https://www.paddlepaddle.org.cn/packages/stable/fastdeploy-gpu-86_89/ --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple # 安装Nightly Build的最新版本fastdeploy python -m pip install fastdeploy-gpu -i https://www.paddlepaddle.org.cn/packages/nightly/fastdeploy-gpu-86_89/ --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple From 11b6141b31691c6900bc89074390f06356102a5e Mon Sep 17 00:00:00 2001 From: Jiang-Jia-Jun Date: Wed, 5 Nov 2025 18:55:45 +0800 Subject: [PATCH 3/6] [Doc] Update docs for v2.3.0rc0 --- docs/get_started/installation/kunlunxin_xpu.md | 6 +++--- docs/zh/get_started/installation/kunlunxin_xpu.md | 6 +++--- 2 files changed, 6 insertions(+), 6 deletions(-) diff 
--git a/docs/get_started/installation/kunlunxin_xpu.md b/docs/get_started/installation/kunlunxin_xpu.md index b631da0fe53..a69499c5d1d 100644 --- a/docs/get_started/installation/kunlunxin_xpu.md +++ b/docs/get_started/installation/kunlunxin_xpu.md @@ -27,9 +27,9 @@ Verified platform: ```bash mkdir Work cd Work -docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-xpu:2.3.0 +docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-xpu:2.3.0-rc0 docker run --name fastdeploy-xpu --net=host -itd --privileged -v $PWD:/Work -w /Work \ - ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-xpu:2.3.0 \ + ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-xpu:2.3.0-rc0 \ /bin/bash docker exec -it fastdeploy-xpu /bin/bash ``` @@ -51,7 +51,7 @@ python -m pip install --pre paddlepaddle-xpu -i https://www.paddlepaddle.org.cn/ ### Install FastDeploy (**Do NOT install via PyPI source**) ```bash -python -m pip install fastdeploy-xpu==2.3.0 -i https://www.paddlepaddle.org.cn/packages/stable/fastdeploy-xpu-p800/ --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple +python -m pip install fastdeploy-xpu==2.3.0-rc0 -i https://www.paddlepaddle.org.cn/packages/stable/fastdeploy-xpu-p800/ --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple ``` Alternatively, you can install the latest version of FastDeploy (Not recommended) diff --git a/docs/zh/get_started/installation/kunlunxin_xpu.md b/docs/zh/get_started/installation/kunlunxin_xpu.md index 590dd9ee1c1..16dd3e16cdf 100644 --- a/docs/zh/get_started/installation/kunlunxin_xpu.md +++ b/docs/zh/get_started/installation/kunlunxin_xpu.md @@ -27,9 +27,9 @@ ```bash mkdir Work cd Work -docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-xpu:2.3.0 +docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-xpu:2.3.0-rc0 docker run --name fastdeploy-xpu --net=host -itd --privileged -v $PWD:/Work -w /Work \ - 
ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-xpu:2.3.0 \ + ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-xpu:2.3.0-rc0 \ /bin/bash docker exec -it fastdeploy-xpu /bin/bash ``` @@ -51,7 +51,7 @@ python -m pip install --pre paddlepaddle-xpu -i https://www.paddlepaddle.org.cn/ ### 安装 FastDeploy(**注意不要通过 pypi 源安装**) ```bash -python -m pip install fastdeploy-xpu==2.3.0 -i https://www.paddlepaddle.org.cn/packages/stable/fastdeploy-xpu-p800/ --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple +python -m pip install fastdeploy-xpu==2.3.0-rc0 -i https://www.paddlepaddle.org.cn/packages/stable/fastdeploy-xpu-p800/ --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple ``` 或者你也可以安装最新版 FastDeploy(不推荐) From d27d0363fe977904ec89a665f6b77cd1c2c6acdc Mon Sep 17 00:00:00 2001 From: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> Date: Wed, 5 Nov 2025 19:29:02 +0800 Subject: [PATCH 4/6] Update README_CN.md --- README_CN.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README_CN.md b/README_CN.md index 805178bbf94..627ef945f56 100644 --- a/README_CN.md +++ b/README_CN.md @@ -27,7 +27,7 @@ ## 最新活动 -**[2025-11] 🔥FastDeploy v2.3-rc0 PaddleOCR-VL 0.9B推理性能专项优化发布,**相比vLLM吞吐提升35%**(详见[PaddleOCR Benchmark](benchmarks/paddleocr-vl))! +**[2025-11]** 🔥FastDeploy v2.3-rc0 PaddleOCR-VL 0.9B推理性能专项优化发布,**相比vLLM吞吐提升35%**(详见[PaddleOCR Benchmark](benchmarks/paddleocr-vl))! **[2025-09] FastDeploy v2.2 全新发布**: HuggingFace生态模型兼容,性能进一步优化,更新增对[baidu/ERNIE-21B-A3B-Thinking](https://huggingface.co/baidu/ERNIE-4.5-21B-A3B-Thinking)支持! From 209336d45d148d08dcc373208d14404a12bb9f1e Mon Sep 17 00:00:00 2001 From: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> Date: Wed, 5 Nov 2025 19:43:15 +0800 Subject: [PATCH 5/6] Add deployment guide link for FastDeploy v2.3-rc0 Updated release note for FastDeploy v2.3-rc0 to include deployment guide link. 
--- README_CN.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README_CN.md b/README_CN.md index 627ef945f56..8ba840d7d92 100644 --- a/README_CN.md +++ b/README_CN.md @@ -27,7 +27,7 @@ ## 最新活动 -**[2025-11]** 🔥FastDeploy v2.3-rc0 PaddleOCR-VL 0.9B推理性能专项优化发布,**相比vLLM吞吐提升35%**(详见[PaddleOCR Benchmark](benchmarks/paddleocr-vl))! +**[2025-11]** 🔥FastDeploy v2.3-rc0 PaddleOCR-VL 0.9B推理性能专项优化发布,**相比vLLM吞吐提升35%**!详见[部署指南](docs/best_practices/PaddleOCR-VL-0.9B.md) **[2025-09] FastDeploy v2.2 全新发布**: HuggingFace生态模型兼容,性能进一步优化,更新增对[baidu/ERNIE-21B-A3B-Thinking](https://huggingface.co/baidu/ERNIE-4.5-21B-A3B-Thinking)支持! From c34ab35cbd33164d9e7c5260195dc37929d397e8 Mon Sep 17 00:00:00 2001 From: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> Date: Wed, 5 Nov 2025 19:44:22 +0800 Subject: [PATCH 6/6] Add Deployment Guide link for FastDeploy v2.3-rc0 Updated the news section to include a link to the Deployment Guide for FastDeploy v2.3-rc0. --- README_EN.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README_EN.md b/README_EN.md index ddc00321556..b7a8bcecaca 100644 --- a/README_EN.md +++ b/README_EN.md @@ -27,7 +27,7 @@ English | [简体中文](README_CN.md) ## News -**[2025-11] 🔥 FastDeploy v2.3-rc0: The specialized optimization release for PaddleOCR-VL 0.9B inference performance has been launched, achieving a 35% increase in throughput compared to vLLM (for details, see [PaddleOCR Benchmark](benchmarks/paddleocr-vl))! +**[2025-11] 🔥 FastDeploy v2.3-rc0 released:** specialized optimizations for PaddleOCR-VL 0.9B inference deliver a 35% throughput increase over vLLM! 
[Deployment Guide](docs/best_practices/PaddleOCR-VL-0.9B.md) **[2025-09] FastDeploy v2.2 is newly released!** It now offers compatibility with models in the HuggingFace ecosystem, has further optimized performance, and newly adds support for [baidu/ERNIE-21B-A3B-Thinking](https://huggingface.co/baidu/ERNIE-4.5-21B-A3B-Thinking)!
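A note on the SM80/90 vs. SM86/89 split that the installation hunks in this series keep distinguishing: which wheel index to install from depends on the GPU's compute capability, which `nvidia-smi --query-gpu=compute_cap --format=csv,noheader` reports as e.g. `8.9`. A minimal helper sketch (the function name is ours, not part of FastDeploy; the index URLs are the ones from the docs above):

```python
def fastdeploy_wheel_index(compute_cap: str) -> str:
    """Map an SM compute capability string to the documented stable wheel index."""
    base = "https://www.paddlepaddle.org.cn/packages/stable/"
    cap = compute_cap.strip()
    if cap in ("8.0", "9.0"):   # SM80/90: A30/A100/H100, A800/H800
        return base + "fastdeploy-gpu-80_90/"
    if cap in ("8.6", "8.9"):   # SM86/89: A10/4090/L20/L40
        return base + "fastdeploy-gpu-86_89/"
    raise ValueError(f"no prebuilt fastdeploy-gpu wheel documented for SM {cap}")

print(fastdeploy_wheel_index("8.9"))
# → https://www.paddlepaddle.org.cn/packages/stable/fastdeploy-gpu-86_89/
```

This is also why the pre-built Docker image (SM80/90 only) needs `fastdeploy-gpu` reinstalled from the 86/89 index when run on L40/4090/L20-class hardware.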
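A closing note on the version-string churn in this series (`2.3.0rc0` vs. `2.3.0-rc0`): pip accepts both spellings because PEP 440 normalizes pre-release separators away, so `fastdeploy-gpu==2.3.0-rc0` and `==2.3.0rc0` resolve to the same release; Docker tags, by contrast, are matched literally, so the image-tag rename is the part users must copy exactly. A rough sketch of the PEP 440 pre-release normalization rule (simplified; real resolution should be left to pip or the `packaging` library):

```python
import re

def normalize_pre_release(version: str) -> str:
    # Simplified PEP 440 normalization: drop the separator around a
    # pre-release tag and canonicalize tag aliases (alpha -> a,
    # beta -> b, c/pre/preview -> rc). Unparseable input is returned unchanged.
    m = re.fullmatch(
        r"(\d+(?:\.\d+)*)[-._]?(a|b|c|rc|alpha|beta|pre|preview)[-._]?(\d+)",
        version,
        re.IGNORECASE,
    )
    if not m:
        return version
    release, tag, number = m.groups()
    aliases = {"alpha": "a", "beta": "b", "c": "rc", "pre": "rc", "preview": "rc"}
    tag = aliases.get(tag.lower(), tag.lower())
    return f"{release}{tag}{number}"

print(normalize_pre_release("2.3.0-rc0"))  # → 2.3.0rc0
print(normalize_pre_release("2.3.0rc0"))   # → 2.3.0rc0
```

Both spellings above normalize to the same canonical form, which is why the pip hunks in PATCH 2 are cosmetic while the `docker pull` hunks are behavioral.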