4 changes: 2 additions & 2 deletions .script/integration_test_compose.yml
@@ -1,6 +1,6 @@
 services:
   elasticsearch:
-    image: docker.elastic.co/elasticsearch/elasticsearch:9.1.3
+    image: docker.elastic.co/elasticsearch/elasticsearch:9.1.5
     environment:
       # Single-node mode configuration
       - discovery.type=single-node
@@ -17,7 +17,7 @@ services:
 # Check whether the plugin is already installed
 if ! elasticsearch-plugin list | grep -q analysis-ik; then
   echo '🔧 Installing the IK plugin...'
-  elasticsearch-plugin install --batch https://release.infinilabs.com/analysis-ik/stable/elasticsearch-analysis-ik-9.1.3.zip
+  elasticsearch-plugin install --batch https://release.infinilabs.com/analysis-ik/stable/elasticsearch-analysis-ik-9.1.5.zip
   echo '✅ IK plugin installed'
 fi
 # Start ES
2 changes: 1 addition & 1 deletion Dockerfile
@@ -1,5 +1,5 @@
 # Use the official Python image
-FROM python:3.12-slim
+FROM python:3.13-slim

 # Set environment variables
 ENV PYTHONUNBUFFERED=1 \
74 changes: 71 additions & 3 deletions README.md
@@ -16,7 +16,7 @@

 ### Requirements

-- **Python**: 3.12
+- **Python**: 3.x
 - **uv**: a modern Python package manager (strongly recommended)
 - **Elasticsearch**: 9.x (used for document storage and search)

@@ -50,7 +50,7 @@ make setup
```

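 `make setup` automatically:
-- ✅ Creates a Python 3.12 virtual environment
+- ✅ Creates a Python virtual environment
 - ✅ Installs all dependencies (including development tools)
 - ✅ Configures the Git pre-commit hook

@@ -293,4 +293,72 @@ When packaging with Docker, many files are ignored; for details, see the project's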

 Therefore, when deploying with Docker, you must mount:
 - the .env file
-- the config.yaml file
+- the config.yaml file

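+## Upgrading project dependencies
+
+### Upgrading the Python version
+
+1. Manually update every Python-version-related field in the pyproject.toml configuration file:
+
+- Update the project metadata [project]:
+
+```diff
+- requires-python = ">=3.12"
++ requires-python = ">=3.13"
+```
+
+- Update the Ruff configuration [tool.ruff]:
+
+```diff
+- target-version = "py312"
++ target-version = "py313"
+```
+
+- Update the MyPy configuration [tool.mypy]:
+
+```diff
+- python_version = "3.12"
++ python_version = "3.13"
+```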
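+2. Rebuild the virtual environment with uv:
+
+```bash
+# 1. (Recommended) Delete the old virtual environment for a completely clean start
+rm -rf .venv
+
+# 2. Create a new virtual environment with uv. If Python 3.13 is not found, uv downloads it.
+uv venv --python 3.13
+
+# 3. Activate the newly created environment
+source .venv/bin/activate
+
+# 4. Sync all dependencies into the new environment
+# uv sync reads uv.lock and installs exactly the locked packages into the new environment
+uv sync
+```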

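+### Upgrading other dependencies
+
+> The uv add command handles the version bump, the pyproject.toml edit, the lock file update, and the installation into the environment in one step.
+
+1. List all outdated dependencies:
+
+```bash
+uv pip list --outdated
+```
+
+2. Upgrade the **direct** dependencies defined in `pyproject.toml` one at a time; do not upgrade **indirect dependencies**:
+
+```bash
+uv add pydantic@latest
+uv add langchain@latest
+```
+
+3. Always run the tests after upgrading:
+
+```bash
+pytest
+```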
11 changes: 8 additions & 3 deletions app/utils/embedders/sentence_transformer.py
@@ -12,6 +12,8 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

+from sentence_transformers import SentenceTransformer
+


 class SentenceTransformerEmbedder:
     def __init__(self, model_name: str, similarity: str) -> None:
@@ -20,8 +22,6 @@ def __init__(self, model_name: str, similarity: str) -> None:
         :param model_name: model name
         :param similarity: similarity algorithm name: cosine, dot_product
         """
-        from sentence_transformers import SentenceTransformer
-
         self.model = SentenceTransformer(model_name)
         self._similarity = similarity

@@ -30,7 +30,12 @@ def embed_documents(self, texts: list[str]) -> list[list[float]]:

     @property
     def dimensions(self) -> int:
-        return int(self.model.get_sentence_embedding_dimension())
+        d = self.model.get_sentence_embedding_dimension()
+        if d is None:
+            raise RuntimeError(
+                "SentenceTransformerEmbedder: dimension cannot be None"
+            )
+        return d

     @property
     def similarity_metric(self) -> str:
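Note on the `dimensions` change above: the new code checks for `None` explicitly instead of wrapping the call in `int(...)`, presumably because the upgraded library types the return value as optional. The sketch below illustrates the narrowing pattern in isolation. `FakeModel` is a hypothetical stand-in for the real model, not the sentence-transformers API.

```python
from typing import Optional


class FakeModel:
    """Hypothetical stand-in for a model whose dimension may be unknown."""

    def __init__(self, dimension: Optional[int]) -> None:
        self._dimension = dimension

    def get_sentence_embedding_dimension(self) -> Optional[int]:
        return self._dimension


def dimensions(model: FakeModel) -> int:
    d = model.get_sentence_embedding_dimension()
    if d is None:
        # int(None) would raise a TypeError with a less helpful message
        raise RuntimeError("dimension cannot be None")
    return d  # a type checker narrows Optional[int] to int here


if __name__ == "__main__":
    print(dimensions(FakeModel(768)))
```

The explicit check keeps the `-> int` return annotation valid under strict type checking and fails with a clear message when the dimension is missing.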
2 changes: 1 addition & 1 deletion config.yaml
@@ -12,7 +12,7 @@ elasticsearch:
   request_timeout: 60

 embedder:
-  model_name: "shibing624/text2vec-base-chinese" # "BAAI/bge-base-zh-v1.5"
+  model_name: "BAAI/bge-base-zh-v1.5 " # "shibing624/text2vec-base-chinese" # "BAAI/bge-base-zh-v1.5"
   dimensions: 768
   similarity_metric: "cosine"
   index_type: "int8_hnsw" # options: "int8_hnsw", "hnsw", "flat"
24 changes: 11 additions & 13 deletions pyproject.toml
@@ -1,25 +1,27 @@
 [project]
 name = "kbase"
 version = "0.1.0"
-requires-python = ">=3.12,<3.13"
+requires-python = ">=3.13"
 dependencies = [
     "cos-python-sdk-v5>=1.9.38",
     "elasticsearch>=9.1.0",
-    "fastapi>=0.116.1",
+    "fastapi>=0.118.2",
     "gunicorn>=23.0.0",
     "langchain>=0.3.27",
-    "langchain-community>=0.3.29",
+    "langchain-community>=0.3.31",
     "mistune>=3.1.4",
-    "numpy>=1.24,<2.0",
+    "numpy>=2.3.3",
     "pdfplumber>=0.11.7",
+    "pydantic>=2.12.0",
     "pydantic-settings>=2.10.1",
     "pypdf>=6.0.0",
     "python-multipart>=0.0.20",
     "pyyaml>=6.0.2",
+    "ruff>=0.14.0",
     "scipy>=1.16.1",
-    "sentence-transformers>=2.3.0,<2.7.0",
-    "torch>=2.1.0",
-    "transformers>=4.35.0,<4.50.0",
+    "sentence-transformers>=5.1.1",
+    "torch>=2.8.0",
+    "transformers>=4.57.0",
     "unstructured[docx,pdf,pptx,xlsx]>=0.18.14",
     "uvicorn>=0.35.0",
 ]
@@ -41,7 +43,7 @@ dev = [

 [tool.ruff]
 line-length = 80
-target-version = "py312"
+target-version = "py313"
 lint.extend-select = [
     "B", # flake8-bugbear
     "I", # isort
@@ -71,7 +73,7 @@ suppress-none-returning = false # check functions that return None
 strict = true # strict type-import checking

 [tool.mypy]
-python_version = "3.12"
+python_version = "3.13"
 packages = ["app", "tests"]
 strict = true
 warn_unreachable = true # warn about unreachable code
@@ -103,7 +105,3 @@ python_classes = "Test*"
 python_functions = "test_*"

 [tool.uv]
-required-environments = [
-    "sys_platform == 'darwin' and platform_machine == 'arm64'",
-    "sys_platform == 'darwin' and platform_machine == 'x86_64'"
-]
4 changes: 2 additions & 2 deletions tests/conftest.py
@@ -84,14 +84,14 @@ def _builder(file_names: str | list[str]) -> Path | list[Path]: # noqa: F811


 @pytest.fixture(scope="module")
-def client() -> Generator[TestClient, Any, None]:
+def client() -> Generator[TestClient, Any]:
     """V2 API test client"""
     with TestClient(app) as test_client:
         yield test_client


 @pytest.fixture(scope="session")
-def es_client() -> Generator[Elasticsearch, Any, None]:
+def es_client() -> Generator[Elasticsearch, Any]:
     """
     Elasticsearch client, used for testing ES-related operations
     """
2 changes: 1 addition & 1 deletion tests/web/document/save_endpoint_test.py
@@ -30,7 +30,7 @@ class TestSaveEndpoint:
     @pytest.fixture(scope="class", autouse=True)
     def setup_test_index(
         self, es_client: Elasticsearch
-    ) -> Generator[None, Any, None]:
+    ) -> Generator[None, Any]:
         """Set up the test index"""
         # Delete the index if it exists (clean up after previous tests)
         if es_client.indices.exists(index=self.TEST_INDEX):
2 changes: 1 addition & 1 deletion tests/web/document/structured_search_test.py
@@ -32,7 +32,7 @@ class TestStructuredSearch:
     @pytest.fixture(scope="class", autouse=True)
     def setup_environment(
         self, client: TestClient, es_client: Elasticsearch
-    ) -> Generator[None, Any, None]:
+    ) -> Generator[None, Any]:
         """Prepare the test environment (index + data)"""

         # 1. Clean up any existing index
2 changes: 1 addition & 1 deletion tests/web/document/vector_hybrid_search_test.py
@@ -42,7 +42,7 @@ def setup_environment(
         self,
         client: TestClient,
         es_client: Elasticsearch,
-    ) -> Generator[None, Any, None]:
+    ) -> Generator[None, Any]:
         """Prepare the test environment (index + data)"""

         # 1. Clean up any existing index
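Note on the fixture annotation changes above: from Python 3.13 the send and return type parameters of `Generator` default to `None`, so `Generator[X, Any]` describes the same type as `Generator[X, Any, None]`. The sketch below shows a fixture-shaped generator with the shorter annotation. `numbered_resource` is a hypothetical function, not code from this PR, and the example assumes nothing about pytest itself.

```python
from collections.abc import Generator
from typing import Any


def numbered_resource(events: list[str]) -> Generator[int, Any]:
    """Yield one value, with setup before the yield and teardown after it."""
    events.append("setup")
    yield 42
    events.append("teardown")


if __name__ == "__main__":
    log: list[str] = []
    gen = numbered_resource(log)
    value = next(gen)  # runs the setup part and returns the yielded value
    next(gen, None)  # resumes after the yield, running the teardown part
    print(value, log)
```

The function body never returns a value, which is what makes the omitted third parameter (`None`) accurate.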