[Bug] Alibaba Cloud Bailian text embedding models return vectors of inconsistent dimension, causing retrieval errors #4074

@liuchangfitcloud

Description

Contact Information

No response

MaxKB Version

v2.1.1 | v1.10.10-lts

Problem Description

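After connecting the text-embedding-v3 and text-embedding-v4 models from the Alibaba Cloud Bailian platform, vectorizing files returns vectors whose dimension is not fixed, and retrieval fails with an error.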

The Python SDK example in the official documentation:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),  # If the environment variable is not configured, replace this with your API Key
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1"  # base_url of the Bailian service
)

completion = client.embeddings.create(
    model="text-embedding-v4",
    input='衣服的质量杠杠的,很漂亮,不枉我等了这么久啊,喜欢,以后还来这里买',
    dimensions=1024, # Specify the vector dimension (only text-embedding-v3 and text-embedding-v4 support this parameter)
    encoding_format="float"
)

print(completion.model_dump_json())

So the dimension of the returned vector can be specified.
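
One way to keep document and query vectors the same size would be to route every embedding call through a single helper that pins `dimensions` and checks the result. The following is only a minimal sketch, assuming an OpenAI-compatible client like the one in the SDK example; `embed_fixed` and `EXPECTED_DIM` are hypothetical names and are not part of MaxKB or the Bailian SDK.

```python
from typing import Any, List

# Assumption: the dimension the knowledge base index was created with.
EXPECTED_DIM = 1024


def embed_fixed(client: Any, text: str, model: str = "text-embedding-v4",
                dim: int = EXPECTED_DIM) -> List[float]:
    """Request an embedding with an explicit dimension and verify its length."""
    response = client.embeddings.create(
        model=model,
        input=text,
        dimensions=dim,  # always sent, so the service default is never relied on
        encoding_format="float",
    )
    vector = list(response.data[0].embedding)
    if len(vector) != dim:
        # Fail early instead of storing a vector the index cannot compare.
        raise ValueError(f"expected {dim} dimensions, got {len(vector)}")
    return vector
```

Calling `embed_fixed(client, "some text")` for both indexing and querying would make a dimension mismatch surface at embedding time rather than at retrieval time.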

Steps to Reproduce

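  • Connect the Bailian text-embedding-v3 / text-embedding-v4 embedding model
  • Upload a document to the knowledge base and run vectorization
  • Retrieval fails with an error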
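
The retrieval failure is consistent with vectors of different lengths ending up in the same knowledge base, since measures such as cosine similarity are only defined for vectors of equal length. The sketch below is illustrative only and does not reproduce MaxKB's storage or retrieval code; `group_by_dimension` and `cosine_similarity` are hypothetical helpers for checking exported vectors.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def group_by_dimension(vectors: Sequence[Sequence[float]]) -> Dict[int, List[int]]:
    """Map each vector length to the indices of the vectors that have it."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for index, vector in enumerate(vectors):
        groups[len(vector)].append(index)
    return dict(groups)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity that rejects vectors of different lengths."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm
```

If `group_by_dimension` returns more than one key for the vectors of a single knowledge base, the stored embeddings were produced with different dimensions and would need to be regenerated with one fixed value.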

The expected correct result

No response

Related log output

Additional Information

No response
