Commit 267b9f2 (1 parent: 5f6db40)

The default README is written in English, with adjustments made to the content of the architecture section.

File tree

4 files changed: +180 −180 lines

README.md

Lines changed: 61 additions & 62 deletions
@@ -8,72 +8,72 @@ Codefuse-ModelCache
 <div align="center">
 <h4 align="center">
 <p>
-<b>中文</b> |
-<a href="https://github.com/codefuse-ai/CodeFuse-ModelCache/blob/main/README_EN.md">English</a>
+<a href="https://github.com/codefuse-ai/CodeFuse-ModelCache/blob/main/README_CN.md">中文</a> |
+<b>English</b>
 <p>
 </h4>
 </div>

 ## Contents
-- [新闻](#新闻)
-- [项目简介](#项目简介)
-- [快速部署](#快速部署)
-- [服务访问](#服务访问)
-- [文章](#文章)
-- [架构大图](#架构大图)
-- [核心功能](#核心功能)
-## 新闻
-[2023.10.31] codefuse-ModelCache...
-## 项目简介
-Codefuse-ModelCache 是一个开源的大模型语义缓存系统,通过缓存已生成的模型结果,降低类似请求的响应时间,提升用户体验。该项目从服务优化角度出发,引入缓存机制,在资源有限和对实时性要求较高的场景下,帮助企业和研究机构降低推理部署成本、提升模型性能和效率、提供规模化大模型服务。我们希望通过开源,分享交流大模型语义Cache的相关技术。
-## 快速部署
-### 环境依赖
+- [News](#news)
+- [Introduction](#introduction)
+- [Quick Deployment](#quick-deployment)
+- [Service Access](#service-access)
+- [Articles](#articles)
+- [Modules](#modules)
+- [Core Features](#core-features)
+- [Acknowledgements](#acknowledgements)
+## News
+[2023.08.26] codefuse-ModelCache...
+## Introduction
+Codefuse-ModelCache is a semantic cache for large language models (LLMs). By caching pre-generated model results, it reduces response time for similar requests and improves the user experience.<br />The project aims to optimize services by introducing a caching mechanism. It helps businesses and research institutions reduce the cost of inference deployment, improve model performance and efficiency, and provide scalable services for large models. Through open source, we hope to share and exchange technologies related to semantic caching for large models.
+## Quick Deployment
+### Dependencies

-- python版本: 3.8及以上
-- 依赖包安装:
+- Python version: 3.8 or above
+- Package installation:
 ```shell
 pip install -r requirements.txt
 ```
+### Environment Configuration
+Before starting the service, perform the following environment configuration:

-### 环境配置
-在启动服务前,应该进行如下环境配置:
-
-1. 安装关系数据库 mysql, 导入sql创建数据表,sql文件: reference_doc/create_table.sql
-2. 安装向量数据库milvus
-3. 在配置文件中添加数据库访问信息,配置文件为:
-   1. modelcache/config/milvus_config.ini
+1. Install the relational database MySQL and import the SQL file to create the data tables. The SQL file can be found at reference_doc/create_table.sql.
+2. Install the vector database Milvus.
+3. Add the database access information to the configuration files:
+   1. modelcache/config/milvus_config.ini
    2. modelcache/config/mysql_config.ini
-4. 离线模型bin文件下载, 参考地址:[https://huggingface.co/shibing624/text2vec-base-chinese/tree/main](https://huggingface.co/shibing624/text2vec-base-chinese/tree/main),并将下载的bin文件,放到 model/text2vec-base-chinese 文件夹中
-5. 通过flask4modelcache.py脚本启动后端服务。
-## 服务访问
-当前服务以restful API方式提供3个核心功能:数据写入,cache查询和cache数据清空。请求demo 如下:
-### cache写入
+4. Download the embedding model bin file from [https://huggingface.co/shibing624/text2vec-base-chinese/tree/main](https://huggingface.co/shibing624/text2vec-base-chinese/tree/main) and place it in the model/text2vec-base-chinese folder.
+5. Start the backend service with the flask4modelcache.py script.
+## Service Access
+The current service provides three core functionalities through a RESTful API: cache writing, cache querying, and cache clearing. Request demos:
+### Cache Writing
 ```python
 import json
 import requests
 url = 'http://127.0.0.1:5000/modelcache'
 type = 'insert'
 scope = {"model": "CODEGPT-1008"}
-chat_info = [{"query": [{"role": "system", "content": "你是一个AI代码助手, 你必须提供中立的、无害的答案帮助用户解决代码相关的问题"}, {"role": "user", "content": "你是谁?"}],
-             "answer": "你好,我是智能助手,请问有什么能帮您!"}]
+chat_info = [{"query": [{"role": "system", "content": "You are an AI code assistant and you must provide neutral and harmless answers to help users solve code-related problems."}, {"role": "user", "content": "Who are you?"}],
+             "answer": "Hello, I am an intelligent assistant. How can I assist you?"}]
 data = {'type': type, 'scope': scope, 'chat_info': chat_info}
 headers = {"Content-Type": "application/json"}
 res = requests.post(url, headers=headers, json=json.dumps(data))
 ```
-### cache查询
+### Cache Querying
 ```python
 import json
 import requests
 url = 'http://127.0.0.1:5000/modelcache'
 type = 'query'
 scope = {"model": "CODEGPT-1008"}
-query = [{"role": "system", "content": "你是一个AI代码助手, 你必须提供中立的、无害的答案帮助用户解决代码相关的问题"}, {"role": "user", "content": "你是谁?"}]
+query = [{"role": "system", "content": "You are an AI code assistant and you must provide neutral and harmless answers to help users solve code-related problems."}, {"role": "user", "content": "Who are you?"}]
 data = {'type': type, 'scope': scope, 'query': query}

 headers = {"Content-Type": "application/json"}
 res = requests.post(url, headers=headers, json=json.dumps(data))
 ```
-### cache清空
+### Cache Clearing
 ```python
 import json
 import requests
@@ -86,34 +86,33 @@ data = {'type': type, 'scope': scope, 'remove_type': remove_type}
 headers = {"Content-Type": "application/json"}
 res = requests.post(url, headers=headers, json=json.dumps(data))
 ```
-## 文章
-敬请期待
-## 架构大图
-![modelcache modules](docs/modelcache_modules.png)
-## 核心功能
-在ModelCache中,沿用了GPTCache的主要思想,包含了一系列核心模块:adapter、embedding、similarity和data_manager。adapter模块主要功能是处理各种任务的业务逻辑,并且能够将embedding、similarity、data_manager等模块串联起来;embedding模块主要负责将文本转换为语义向量表示,它将用户的查询转换为向量形式,并用于后续的召回或存储操作;rank模块用于对召回的向量进行相似度排序和评估;data_manager模块主要用于管理数据库。同时,为了更好的在工业界落地,我们做了架构和功能上的升级,如下:
-
-- [x] 架构调整(轻量化集成):以类redis的缓存模式嵌入到大模型产品中,提供语义缓存能力,不会干扰LLM调用和安全审核等功能,适配所有大模型服务。
-- [x] 多种模型加载方案:
-  - 支持加载本地embedding模型,解决huggingface网络连通问题
-  - 支持加载多种预训练模型embeding层
-- [x] 数据隔离能力
-  - 环境隔离:可依据环境,拉取不同的数据库配置,实现环境隔离(开发、预发、生产)
-  - 多租户数据隔离:根据模型动态创建collection,进行数据隔离,用于大模型产品中多个模型/服务数据隔离问题
-- [x] 支持系统指令:采用拼接的方式,解决propmt范式中sys指令问题。
-- [x] 长短文本区分:长文本会给相似评估带来更多挑战,增加了长短文本的区分,可单独配置判断阈值。
-- [x] milvus性能优化:milvus consistency_level调整为"Session"级别,可以得到更好的性能。
-- [x] 数据管理能力:
-  - 一键清空缓存的能力,用于模型升级后的数据管理。
-  - 召回hitquery,用于后续的数据分析和模型迭代参考。
-  - 异步日志回写能力,用于数据分析和统计
-  - 增加model字段和数据统计字段,用于功能拓展。
+## Articles
+Coming soon...
+## Modules
+![modelcache modules](docs/modelcache_modules_en.png)
+## Core Features
+ModelCache adopts the main ideas of GPTCache and includes a set of core modules: adapter, embedding, similarity, and data_manager. The adapter module handles the business logic of the various tasks and connects the embedding, similarity, and data_manager modules. The embedding module converts text into semantic vector representations, transforming user queries into vector form for subsequent recall or storage. The rank module sorts and evaluates the similarity of the recalled vectors. The data_manager module manages the database. To better facilitate industrial applications, we have made the following architectural and functional upgrades:

-未来会持续建设的功能:
+- [x] Architectural adjustment (lightweight integration): embedded into LLM products as a Redis-like cache, it provides semantic caching without interfering with LLM calls, security audits, or other functionality, and is compatible with all LLM services.
+- [x] Multiple model loading schemes:
+  - Support for loading local embedding models, to address Hugging Face network connectivity issues.
+  - Support for loading the embedding layers of various pretrained models.
+- [x] Data isolation capability:
+  - Environment isolation: pulls different database configurations depending on the environment (dev, prepub, prod).
+  - Multi-tenant data isolation: dynamically creates collections per model, addressing data isolation for multiple models/services in LLM products.
+- [x] Support for system commands: a concatenation approach addresses system instructions in the prompt paradigm.
+- [x] Differentiation of long and short texts: long texts pose more challenges for similarity evaluation, so long and short texts are distinguished and their similarity thresholds can be configured separately.
+- [x] Milvus performance optimization: the consistency_level of Milvus is set to "Session", which yields better performance.
+- [x] Data management capability:
+  - One-click cache clearing, for data management after model upgrades.
+  - Hit-query recall, as a reference for subsequent data analysis and model iteration.
+  - Asynchronous log write-back, for data analysis and statistics.
+  - Added model field and data statistics fields for feature expansion.

-- [ ] 基于超参数的数据隔离
-- [ ] system promt分区存储能力,以提高相似度匹配的准确度和效率
-- [ ] 更通用的embedding模型和相似度评估算法
-## 致谢
-本项目参考了以下开源项目,在此对相关项目和研究开发人员表示感谢。<br />[GPTCache](https://github.com/zilliztech/GPTCache)
+Features under development:
+
+- [ ] Data isolation based on hyperparameters.
+- [ ] System prompt partitioned storage, to improve the accuracy and efficiency of similarity matching.
+- [ ] More general embedding models and similarity evaluation algorithms.
+## Acknowledgements
+This project references the following open-source projects; we are grateful to the projects and their developers.<br />[GPTCache](https://github.com/zilliztech/GPTCache)
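The three request demos in the updated README share one payload shape and one transport quirk: the dict is JSON-encoded with `json.dumps` before being handed to `requests.post(..., json=...)`, so the body on the wire is a JSON-encoded string. A minimal stdlib-only sketch of that shape; the `build_payload` and `post` helpers are illustrative, not part of the project:

```python
import json
import urllib.request

MODELCACHE_URL = "http://127.0.0.1:5000/modelcache"  # default address in the demos


def build_payload(op_type, model, **fields):
    """Assemble the common body: a 'type', a 'scope' naming the model,
    plus operation-specific fields (chat_info / query / remove_type)."""
    payload = {"type": op_type, "scope": {"model": model}}
    payload.update(fields)
    return payload


def post(payload, url=MODELCACHE_URL):
    """The demos call requests.post(..., json=json.dumps(data)), which sends
    a JSON-encoded *string*; the equivalent raw body is double-encoded."""
    body = json.dumps(json.dumps(payload)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"})
    return urllib.request.urlopen(req)  # requires a running service


# Payloads mirroring the three demos (built only, not sent here).
insert_payload = build_payload(
    "insert", "CODEGPT-1008",
    chat_info=[{"query": [{"role": "user", "content": "Who are you?"}],
                "answer": "Hello, I am an intelligent assistant."}])
query_payload = build_payload(
    "query", "CODEGPT-1008",
    query=[{"role": "user", "content": "Who are you?"}])
remove_payload = build_payload(
    "remove", "CODEGPT-1008", remove_type="truncate_by_model")
```

With the backend started via flask4modelcache.py, `post(query_payload)` would issue the same request as the Cache Querying demo.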

README_CN.md

Lines changed: 119 additions & 0 deletions
@@ -0,0 +1,119 @@
+<div align="center">
+<h1>
+Codefuse-ModelCache
+</h1>
+</div>
+
+<p align="center">
+<div align="center">
+<h4 align="center">
+<p>
+<b>中文</b> |
+<a href="https://github.com/codefuse-ai/CodeFuse-ModelCache/blob/main/README.md">English</a>
+<p>
+</h4>
+</div>
+
+## Contents
+- [新闻](#新闻)
+- [项目简介](#项目简介)
+- [快速部署](#快速部署)
+- [服务访问](#服务访问)
+- [文章](#文章)
+- [架构大图](#架构大图)
+- [核心功能](#核心功能)
+## 新闻
+[2023.10.31] codefuse-ModelCache...
+## 项目简介
+Codefuse-ModelCache 是一个开源的大模型语义缓存系统,通过缓存已生成的模型结果,降低类似请求的响应时间,提升用户体验。该项目从服务优化角度出发,引入缓存机制,在资源有限和对实时性要求较高的场景下,帮助企业和研究机构降低推理部署成本、提升模型性能和效率、提供规模化大模型服务。我们希望通过开源,分享交流大模型语义Cache的相关技术。
+## 快速部署
+### 环境依赖
+
+- python版本: 3.8及以上
+- 依赖包安装:
+```shell
+pip install -r requirements.txt
+```
+
+### 环境配置
+在启动服务前,应该进行如下环境配置:
+
+1. 安装关系数据库 mysql, 导入sql创建数据表,sql文件: reference_doc/create_table.sql
+2. 安装向量数据库milvus
+3. 在配置文件中添加数据库访问信息,配置文件为:
+   1. modelcache/config/milvus_config.ini
+   2. modelcache/config/mysql_config.ini
+4. 离线模型bin文件下载, 参考地址:[https://huggingface.co/shibing624/text2vec-base-chinese/tree/main](https://huggingface.co/shibing624/text2vec-base-chinese/tree/main),并将下载的bin文件,放到 model/text2vec-base-chinese 文件夹中
+5. 通过flask4modelcache.py脚本启动后端服务。
+## 服务访问
+当前服务以restful API方式提供3个核心功能:数据写入,cache查询和cache数据清空。请求demo 如下:
+### cache写入
+```python
+import json
+import requests
+url = 'http://127.0.0.1:5000/modelcache'
+type = 'insert'
+scope = {"model": "CODEGPT-1008"}
+chat_info = [{"query": [{"role": "system", "content": "你是一个AI代码助手, 你必须提供中立的、无害的答案帮助用户解决代码相关的问题"}, {"role": "user", "content": "你是谁?"}],
+             "answer": "你好,我是智能助手,请问有什么能帮您!"}]
+data = {'type': type, 'scope': scope, 'chat_info': chat_info}
+headers = {"Content-Type": "application/json"}
+res = requests.post(url, headers=headers, json=json.dumps(data))
+```
+### cache查询
+```python
+import json
+import requests
+url = 'http://127.0.0.1:5000/modelcache'
+type = 'query'
+scope = {"model": "CODEGPT-1008"}
+query = [{"role": "system", "content": "你是一个AI代码助手, 你必须提供中立的、无害的答案帮助用户解决代码相关的问题"}, {"role": "user", "content": "你是谁?"}]
+data = {'type': type, 'scope': scope, 'query': query}
+
+headers = {"Content-Type": "application/json"}
+res = requests.post(url, headers=headers, json=json.dumps(data))
+```
+### cache清空
+```python
+import json
+import requests
+url = 'http://127.0.0.1:5000/modelcache'
+type = 'remove'
+scope = {"model": "CODEGPT-1008"}
+remove_type = 'truncate_by_model'
+data = {'type': type, 'scope': scope, 'remove_type': remove_type}
+
+headers = {"Content-Type": "application/json"}
+res = requests.post(url, headers=headers, json=json.dumps(data))
+```
+## 文章
+敬请期待
+## 架构大图
+![modelcache modules](docs/modelcache_modules.png)
+## 核心功能
+在ModelCache中,沿用了GPTCache的主要思想,包含了一系列核心模块:adapter、embedding、similarity和data_manager。adapter模块主要功能是处理各种任务的业务逻辑,并且能够将embedding、similarity、data_manager等模块串联起来;embedding模块主要负责将文本转换为语义向量表示,它将用户的查询转换为向量形式,并用于后续的召回或存储操作;rank模块用于对召回的向量进行相似度排序和评估;data_manager模块主要用于管理数据库。同时,为了更好的在工业界落地,我们做了架构和功能上的升级,如下:
+
+- [x] 架构调整(轻量化集成):以类redis的缓存模式嵌入到大模型产品中,提供语义缓存能力,不会干扰LLM调用和安全审核等功能,适配所有大模型服务。
+- [x] 多种模型加载方案:
+  - 支持加载本地embedding模型,解决huggingface网络连通问题
+  - 支持加载多种预训练模型embedding层
+- [x] 数据隔离能力
+  - 环境隔离:可依据环境,拉取不同的数据库配置,实现环境隔离(开发、预发、生产)
+  - 多租户数据隔离:根据模型动态创建collection,进行数据隔离,用于大模型产品中多个模型/服务数据隔离问题
+- [x] 支持系统指令:采用拼接的方式,解决prompt范式中sys指令问题。
+- [x] 长短文本区分:长文本会给相似评估带来更多挑战,增加了长短文本的区分,可单独配置判断阈值。
+- [x] milvus性能优化:milvus consistency_level调整为"Session"级别,可以得到更好的性能。
+- [x] 数据管理能力:
+  - 一键清空缓存的能力,用于模型升级后的数据管理。
+  - 召回hitquery,用于后续的数据分析和模型迭代参考。
+  - 异步日志回写能力,用于数据分析和统计
+  - 增加model字段和数据统计字段,用于功能拓展。
+
+未来会持续建设的功能:
+
+- [ ] 基于超参数的数据隔离
+- [ ] system prompt分区存储能力,以提高相似度匹配的准确度和效率
+- [ ] 更通用的embedding模型和相似度评估算法
+## 致谢
+本项目参考了以下开源项目,在此对相关项目和研究开发人员表示感谢。<br />[GPTCache](https://github.com/zilliztech/GPTCache)
+
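Step 3 of the environment configuration (in both READMEs) points at two INI files under modelcache/config/. A minimal sketch of loading one with Python's configparser; the `[mysql]` section and its keys here are illustrative assumptions, so check the shipped config files for the actual schema:

```python
import configparser
import os
import tempfile
from pathlib import Path

# Illustrative contents; the real modelcache/config/mysql_config.ini may
# use different section and key names.
SAMPLE_INI = """\
[mysql]
host = 127.0.0.1
port = 3306
user = modelcache
password = change-me
database = modelcache
"""


def load_db_config(path, section="mysql"):
    """Parse an INI config file and return one section as a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(Path(path).read_text(encoding="utf-8"))
    return dict(parser[section])


# Round-trip the sample through a temporary file, as the service would
# read its own config file at startup.
fd, tmp = tempfile.mkstemp(suffix=".ini")
os.close(fd)
Path(tmp).write_text(SAMPLE_INI, encoding="utf-8")
cfg = load_db_config(tmp)
os.remove(tmp)
```

Note that configparser returns every value as a string, so numeric fields such as the port must be converted by the caller.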
