Complete Project 2 (CUDA), Project 3, and Project 4 by lihaokun-2026 · Pull Request #39 · InfiniTensor/llaisys

lihaokun-2026 · 2026-03-15T11:17:08Z

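1. Download the latest project source and build it

Install the prerequisites

Build tool: Xmake
C++ compiler: MSVC (Windows), Clang, or GCC
Python >= 3.9 (PyTorch, Transformers, etc.)
Clang-Format-16 (optional): used to format C++ code.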
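
The prerequisites above can be checked before building. The following is a minimal sketch, not part of this PR: the function name `check_prerequisites` and the executable names `gcc`, `clang`, and `cl` are assumptions about how these tools appear on `PATH`, and the optional Clang-Format-16 is not checked.

```python
import shutil
import sys

# Tools named in the prerequisites list; only one compiler needs to be present.
REQUIRED_TOOLS = ["xmake"]
COMPILER_CANDIDATES = ["gcc", "clang", "cl"]  # assumed executable names


def check_prerequisites(which=shutil.which, version=sys.version_info):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if tuple(version[:2]) < (3, 9):
        problems.append("Python >= 3.9 is required")
    for tool in REQUIRED_TOOLS:
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    if not any(which(c) for c in COMPILER_CANDIDATES):
        problems.append("no C++ compiler found (tried gcc, clang, cl)")
    return problems


if __name__ == "__main__":
    for line in check_prerequisites() or ["all prerequisites found"]:
        print(line)
```

The `which` and `version` parameters exist only so the check can be exercised without touching the real environment.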

Download the project source

git clone https://github.com/lihaokun-2026/llaisys.git

Install the xmake build tool

Run the xmake install script:

curl -fsSL https://xmake.io/shget.text | bash

Verify that xmake was installed successfully

xmake --version

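Install the Python third-party packages

# Change to the project directory
cd llaisys
# Install the Python third-party packages
pip install -r requirements.txt
# Alternatively, use the Tsinghua PyPI mirror for faster downloads
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple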

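Build the project source

# First configure xmake and confirm that it picks up the correct GCC and CUDA environment
xmake f --nv-gpu=y -cv
# Build the project source
xmake
# Install the build; this also runs the pip install, which is scripted in xmake.lua
xmake install
# Run the inference test on an NVIDIA GPU. Note: this project only supports DeepSeek-R1-Distill-Qwen-1.5B; other models are not yet adapted
python test/test_infer.py --model [dir_path/to/model] --test --device nvidia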

2. Testing and deployment

If the test succeeds, you will see output like this:

Contents:
<｜User｜>Who are you?<｜Assistant｜><think>
Greetings! I'm DeepSeek-R1, an artificial intelligence assistant created by DeepSeek. I'm at your service and would be delighted to assist you with any inquiries or tasks you may have.
</think>

Greetings! I'm DeepSeek-R1, an artificial intelligence assistant created by DeepSeek. I'm at your service and would be delighted to assist you with any inquiries or tasks you may have.


Time elapsed: 1.21s

Test passed!

Test project inference

# Start the server
# CUDA_VISIBLE_DEVICES selects the GPU device; KVCachePool and prefix matching are enabled by default
CUDA_VISIBLE_DEVICES=0 python chat_server.py --model ../DeepSeek-R1-Distill-Qwen-1.5B --device nvidia --port 8010
# Start the client
python chat_ui.py --server http://localhost:8010
# The client's --server argument must match the address the server is actually listening on
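
To illustrate what prefix matching in a KV cache pool means, here is a toy sketch. It is not the PR's `KVCachePool`: the class name `PrefixKVPool`, the block size of 4 tokens, and the integer block ids are all hypothetical, and a real pool would hold GPU tensors and evict old blocks.

```python
from typing import Dict, List, Tuple

BLOCK_SIZE = 4  # tokens per KV block; hypothetical value for illustration


class PrefixKVPool:
    """Toy pool mapping token-prefix blocks to cached KV block ids."""

    def __init__(self) -> None:
        # Key: the full token prefix up to and including this block.
        self._blocks: Dict[Tuple[int, ...], int] = {}
        self._next_id = 0

    def match_prefix(self, tokens: List[int]) -> List[int]:
        """Return ids of cached blocks covering the longest cached prefix."""
        hits = []
        for end in range(BLOCK_SIZE, len(tokens) + 1, BLOCK_SIZE):
            block_id = self._blocks.get(tuple(tokens[:end]))
            if block_id is None:
                break  # the first miss ends the shared prefix
            hits.append(block_id)
        return hits

    def insert(self, tokens: List[int]) -> List[int]:
        """Register every full block of `tokens`, reusing existing ones."""
        ids = []
        for end in range(BLOCK_SIZE, len(tokens) + 1, BLOCK_SIZE):
            key = tuple(tokens[:end])
            if key not in self._blocks:
                self._blocks[key] = self._next_id
                self._next_id += 1
            ids.append(self._blocks[key])
        return ids


pool = PrefixKVPool()
pool.insert([1, 2, 3, 4, 5, 6, 7, 8, 9])          # caches two full blocks
reused = pool.match_prefix([1, 2, 3, 4, 5, 6, 7, 8, 42, 43, 44, 45])
print(len(reused) * BLOCK_SIZE, "tokens can skip prefill")
```

Keying each block by its whole prefix, rather than by its own four tokens, ensures a block is reused only when everything before it also matches, which is what makes the cached keys and values valid.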

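Results after running the project

Client 1

Once the project is running, you can see your conversation history with the bot and continue a multi-turn conversation with it.

Client 2

To test multi-user chat, the system ships with two client environments. This second client is served at the same address as the server and can be regarded as the system's built-in client. Unlike the Gradio-based client above, it supports several multi-turn conversations running at the same time: you can create a new conversation and chat with the bot while others are still in progress.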
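
The multi-user behaviour described above can be sketched as a request queue drained by one worker thread, with a separate history per session. This is an illustrative sketch only: `ChatWorker`, `fake_generate`, and `history_of` are hypothetical names, and the model call is replaced by a placeholder that echoes the last message.

```python
import queue
import threading
from collections import defaultdict
from concurrent.futures import Future


def fake_generate(history):
    """Placeholder for the model call; echoes the last user message."""
    return "echo: " + history[-1]


class ChatWorker:
    """Serialises requests from many sessions onto one inference loop."""

    def __init__(self, generate=fake_generate):
        self._generate = generate
        self._requests = queue.Queue()
        self._histories = defaultdict(list)  # session_id -> list of messages
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def submit(self, session_id, message):
        """Enqueue a request and return a Future for the reply."""
        fut = Future()
        self._requests.put((session_id, message, fut))
        return fut

    def _loop(self):
        # Only this thread mutates histories and calls the model.
        while True:
            item = self._requests.get()
            if item is None:  # shutdown sentinel
                break
            session_id, message, fut = item
            history = self._histories[session_id]
            history.append(message)
            try:
                reply = self._generate(history)
                history.append(reply)
                fut.set_result(reply)
            except Exception as exc:
                fut.set_exception(exc)

    def history_of(self, session_id):
        """Return a copy of one session's history."""
        return list(self._histories.get(session_id, []))

    def shutdown(self):
        self._requests.put(None)
        self._thread.join()
```

A caller would use `worker.submit("alice", "hi").result()` to block for a reply; because each session id has its own history list, two users never see each other's context.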

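Project results and limitations

What this project implements

CUDA integration in LLAISYS;
An AI chatbot that supports both single-user and multi-user conversations;
A server that places incoming requests into a request pool/queue and processes them in a dedicated loop thread/process. To speed up inference, the project also implements a FlashAttention (FA2) compute backend, uses block-based KV Cache, and reuses results through a prefix-matching KV Cache pool.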
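
The core idea behind a FlashAttention-style backend is the online softmax: keys and values are consumed block by block while a running maximum and running sum keep the result exact. The sketch below shows that idea for a single query in plain Python. It is not the PR's CUDA kernel; the function names are hypothetical, and the 1/sqrt(d) scaling and causal masking are omitted for brevity.

```python
import math


def attention_reference(q, keys, values):
    """Plain softmax attention for one query vector (all scores at once)."""
    scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


def attention_online(q, keys, values, block=2):
    """Same result, but keys/values are processed one block at a time."""
    running_max = -math.inf
    running_sum = 0.0
    acc = [0.0] * len(values[0])
    for start in range(0, len(keys), block):
        k_blk = keys[start:start + block]
        v_blk = values[start:start + block]
        scores = [sum(a * b for a, b in zip(q, k)) for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale everything accumulated under the old maximum.
        scale = math.exp(running_max - new_max)
        running_sum *= scale
        acc = [x * scale for x in acc]
        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [x + w * vd for x, vd in zip(acc, v)]
        running_max = new_max
    return [x / running_sum for x in acc]
```

Because only one block of scores exists at any time, a GPU kernel built on this recurrence never has to materialise the full attention matrix in global memory.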

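Limitations

Because my knowledge of UI development and inference is limited, the project still has a number of shortcomings. For example, the Gradio page cannot switch conversations while a turn is in progress; you have to wait until the current turn finishes or is stopped. The project also does not yet implement distributed inference, does not support additional models, and has not been adapted to Chinese domestic accelerator cards. I will continue to improve it.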


lihaokun-2026 · 2026-03-16T00:51:38Z

I am acoier; I changed my username for certain reasons. See my profile page for details.

acoier and others added 20 commits January 29, 2026 07:10

Implement the argmax operator

f13e1af

Complete the embeddings and linear operator assignment

859a2b2

work over

4a6616b

finish work three

abb4190

Add GitHub Actions workflow for build and test

fce2a87

Set up CI workflow for building and testing across multiple OS.

Complete CUDA operator adaptation for inference

ffc4b85

Remove leftover debug prints

044fa74

Complete basic development of the AI chat server

c821684

Implement multi-user isolation by creating a session for each user

f6e4132

Fix multi-user inference bug

fd4352e

Implement FlashAttention to accelerate inference

7d863d3

Modify UI-1

2dc3d0a

UI-2

b942d89

UI-3

97410c6

Add KV cache pool

a96520b

Update frontend and backend; fix backend bugs and frontend UI

91a814d

Implement switching between past conversations in chat_ui

1ab46c2

Fix Gradio bug

8c6ce42

Finish building the whole system; start adding polish

586fd8b

Add requirements.txt file

f12f3f0

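lihaokun-2026 changed the title ~~LLAISYS Chat deployment documentation~~ Complete Assignment 2 (CUDA), Assignment 3, and Assignment 4 Mar 15, 2026

lihaokun-2026 changed the title ~~Complete Assignment 2 (CUDA), Assignment 3, and Assignment 4~~ Complete Project 2 (CUDA), Project 3, and Project 4 Mar 15, 2026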

lihaokun-2026 wants to merge 20 commits into InfiniTensor:main from lihaokun-2026:main
