For model deployment, see [learn-llm-deploy-easily](https://gitee.com/coderwillyan/learn-llm-deploy-easily).

This notebook focuses on how to call models that have already been deployed.

# embedding

## Using an embedding API

Using Zhipu AI as an example:

In [11]:
import os

from langchain_community.embeddings import ZhipuAIEmbeddings

# Read the API key from the ZHIPUAI_API_KEY environment variable
my_emb = ZhipuAIEmbeddings(
    model="embedding-2",
    api_key=os.environ["ZHIPUAI_API_KEY"]
)
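Every LangChain embedding class in this notebook (`ZhipuAIEmbeddings`, `HuggingFaceEmbeddings`, `OllamaEmbeddings`, `XinferenceEmbeddings`) exposes the same two methods: `embed_query(text)` for a single string and `embed_documents(texts)` for a list of strings. When no API key or local model is available, a stand-in with the same method names can be handy for testing downstream code. The sketch below is hypothetical: the class name `ToyHashEmbeddings` is made up, it does not subclass LangChain's `Embeddings` base class (it only mimics the two method names), and its hash-derived vectors carry no semantic meaning.

```python
import hashlib
import math


class ToyHashEmbeddings:
    """Hypothetical offline stand-in that mimics embed_query / embed_documents.

    Vectors are derived from a SHA-256 hash, so they are deterministic
    but NOT semantically meaningful.
    """

    def __init__(self, dim: int = 8) -> None:
        self.dim = dim

    def _embed(self, text: str) -> list[float]:
        # Turn the digest bytes into `dim` values in roughly [-0.5, 0.5]
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        values = [digest[i % len(digest)] / 255.0 - 0.5 for i in range(self.dim)]
        # L2-normalize so every vector has unit length
        norm = math.sqrt(sum(v * v for v in values)) or 1.0
        return [v / norm for v in values]

    def embed_query(self, text: str) -> list[float]:
        return self._embed(text)

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        return [self._embed(t) for t in texts]


emb = ToyHashEmbeddings(dim=8)
q = emb.embed_query("hello")
docs = emb.embed_documents(["hello", "world"])
print(len(q), len(docs))  # 8 2
```

Because the method names match, code written against `my_emb.embed_query(...)` can be exercised with `ToyHashEmbeddings` before switching to a real provider.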

## Using locally deployed embedding models

### Deploying with Sentence Transformers

In [1]:
# Requires transformers>=4.51.0
# Requires sentence-transformers>=2.7.0

from sentence_transformers import SentenceTransformer

# Load the model
model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")

# We recommend enabling flash_attention_2 for better acceleration and memory saving,
# together with setting `padding_side` to "left":
# model = SentenceTransformer(
#     "Qwen/Qwen3-Embedding-0.6B",
#     model_kwargs={"attn_implementation": "flash_attention_2", "device_map": "auto"},
#     tokenizer_kwargs={"padding_side": "left"},
# )

# The queries and documents to embed
queries = [
    "What is the capital of China?",
    "Explain gravity",
]
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
]

# Encode the queries and documents. Note that queries benefit from using a prompt
# Here we use the prompt called "query" stored under `model.prompts`, but you can
# also pass your own prompt via the `prompt` argument
query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)

# Compute the (cosine) similarity between the query and document embeddings
similarity = model.similarity(query_embeddings, document_embeddings)
print(similarity)
# tensor([[0.7646, 0.1414],
#         [0.1355, 0.6000]])

If this cell fails with `ModuleNotFoundError: No module named 'sentence_transformers'`, install the package first, e.g. `pip install -U sentence-transformers`.
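`model.similarity` returns one score per (query, document) pair: a matrix with one row per query and one column per document, as in the commented output of the cell above. For models that use cosine similarity (the Sentence Transformers default), each entry is the dot product of the two vectors divided by the product of their norms. A pure-Python sketch of that computation, using made-up 2-D vectors in place of real embeddings and assuming no zero vectors:

```python
import math


def cosine_similarity_matrix(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Return a len(a) x len(b) matrix of cosine similarities (assumes no zero vectors)."""
    def norm(v: list[float]) -> float:
        return math.sqrt(sum(x * x for x in v))

    result = []
    for u in a:
        row = []
        for v in b:
            dot = sum(x * y for x, y in zip(u, v))
            row.append(dot / (norm(u) * norm(v)))
        result.append(row)
    return result


queries = [[1.0, 0.0], [0.0, 2.0]]
documents = [[3.0, 0.0], [1.0, 1.0]]
for row in cosine_similarity_matrix(queries, documents):
    print([round(x, 4) for x in row])
# [1.0, 0.7071]
# [0.0, 0.7071]
```

Cosine similarity ignores vector length, which is why `[1, 0]` and `[3, 0]` score exactly 1.0.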

### Deploying with HuggingFaceEmbeddings

In [12]:
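# langchain_community's HuggingFaceEmbeddings is deprecated in favor of
# langchain_huggingface.HuggingFaceEmbeddings; the call pattern is the same
from langchain_community.embeddings.huggingface import HuggingFaceEmbeddings

# model_name is a local directory holding BAAI/bge-small-zh-v1.5; change it to your own path
my_emb = HuggingFaceEmbeddings(model_name='/opt/workspace/models/BAAI/bge-small-zh-v1.5')

# Sample query (Chinese: "How do I call HuggingFaceEmbeddings?")
query = "如何调用HuggingFaceEmbeddings？"

query_vector = my_emb.embed_query(query)

# Print only the first five dimensions
print("Query vector:", query_vector[:5], "...")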

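If the directory does not exist, loading fails with an error such as `FileNotFoundError: Path /opt/workspace/models/BAAI/bge-small-zh-v1.5 not found`; download the model first or point `model_name` at an existing path.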
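BGE models are commonly used with L2-normalized embeddings. With `HuggingFaceEmbeddings` this can be requested through `encode_kwargs={"normalize_embeddings": True}`, which is passed on to Sentence Transformers' `encode`. Once vectors have unit length, their dot product equals their cosine similarity, so a plain dot product is enough for ranking. A small stdlib check of that identity, with made-up 2-D vectors and assuming no zero vectors:

```python
import math


def l2_normalize(v: list[float]) -> list[float]:
    """Scale v to unit length (assumes v is not the zero vector)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def dot(u: list[float], v: list[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def cosine(u: list[float], v: list[float]) -> float:
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))


u = [3.0, 4.0]
v = [1.0, 2.0]
print(l2_normalize(u))  # [0.6, 0.8]
# Dot product of the normalized vectors vs. cosine of the raw vectors:
print(round(dot(l2_normalize(u), l2_normalize(v)), 6), round(cosine(u, v), 6))  # the two values are equal
```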

### Deploying with Ollama

In [9]:
# Older import path, deprecated in favor of the langchain-ollama package:
# from langchain_community.embeddings import OllamaEmbeddings
from langchain_ollama.embeddings import OllamaEmbeddings

my_emb = OllamaEmbeddings(base_url='http://localhost:11434', model="bge-m3:latest")
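`OllamaEmbeddings` is a client for the Ollama server's HTTP API at `base_url`. For reference, recent Ollama versions expose an `/api/embed` endpoint that accepts a JSON body with `model` and `input` (a string or a list of strings) and returns the vectors under an `embeddings` key. The stdlib sketch below only builds such a request and parses a response of that shape; the helper names `build_embed_request` and `parse_embed_response` are hypothetical, and the endpoint details are worth checking against the Ollama API docs for your server version.

```python
import json
import urllib.request


def build_embed_request(base_url: str, model: str, texts: list[str]) -> urllib.request.Request:
    """Build (but do not send) a POST request to Ollama's /api/embed endpoint."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/embed",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embed_response(raw: bytes) -> list[list[float]]:
    """Extract the list of vectors from an /api/embed JSON response."""
    return json.loads(raw)["embeddings"]


req = build_embed_request("http://localhost:11434", "bge-m3:latest", ["hello", "world"])
print(req.full_url, req.get_method())  # http://localhost:11434/api/embed POST

# With a running Ollama server, the request could be sent like this:
# with urllib.request.urlopen(req) as resp:
#     vectors = parse_embed_response(resp.read())
```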

In [1]:
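from langchain_ollama.embeddings import OllamaEmbeddings

# Initialize the embedding model served by the local Ollama instance
my_emb = OllamaEmbeddings(
    base_url='http://localhost:11434',
    model="bge-m3:latest"
)

# Sample texts (Chinese). Query: "How do I call OllamaEmbeddings?"
# Documents: "OllamaEmbeddings is used to generate text embedding vectors." /
#            "Its methods include embed_query and embed_documents."
query = "如何调用OllamaEmbeddings？"
documents = [
    "OllamaEmbeddings 用于生成文本嵌入向量。",
    "调用方法包括 embed_query 和 embed_documents。"
]

# embed_query embeds one string; embed_documents embeds a list of strings
query_vector = my_emb.embed_query(query)
doc_vectors = my_emb.embed_documents(documents)

# Print the first five dimensions of each vector
print("Query vector:", query_vector[:5], "...")
print("Document vectors:", [vec[:5] for vec in doc_vectors], "...")

Query vector: [-0.017827146, -0.009976904, -0.03116692, 0.029771997, -0.024388244] ...
Document vectors: [[-0.029825423, -0.0006194668, -0.031351015, 0.02808682, -0.018649695], [0.002058215, 0.013199092, -0.020036552, 0.024129553, -0.003552613]] ...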
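A typical next step with `query_vector` and `doc_vectors` is to rank the documents by how similar they are to the query. A minimal stdlib sketch of top-k retrieval by cosine similarity; `top_k` is a hypothetical helper, and the toy 3-D vectors stand in for real bge-m3 outputs:

```python
import heapq
import math


def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int = 2) -> list[tuple[int, float]]:
    """Return (document index, cosine score) pairs for the k most similar documents."""
    def cos(u: list[float], v: list[float]) -> float:
        dot = sum(x * y for x, y in zip(u, v))
        return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

    scored = ((i, cos(query_vec, d)) for i, d in enumerate(doc_vecs))
    # nlargest keeps only the k best pairs, ordered from highest to lowest score
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


query_vec = [1.0, 0.0, 0.0]
doc_vecs = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]
print([i for i, _ in top_k(query_vec, doc_vecs, k=2)])  # [1, 2]
```

With the Ollama cell above, the same helper would be called as `top_k(query_vector, doc_vectors, k=1)`.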


### Deploying with Xinference

In [None]:
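from langchain_community.embeddings import XinferenceEmbeddings

# Replace with your Xinference server URL and model UID
xinference = XinferenceEmbeddings(
    server_url="http://localhost:9997",  # URL of the running Xinference server
    model_uid="your_model_uid"  # UID of the embedding model launched in Xinference
)

# Input texts (Chinese: "Hello, world." / "LangChain is a powerful tool.")
texts = ["你好，世界。", "LangChain 是一个强大的工具。"]

# Generate one embedding vector per text
vectors = xinference.embed_documents(texts)

# Print each text with the first five dimensions of its vector and its length
for idx, (text, vector) in enumerate(zip(texts, vectors), start=1):
    print(f"Text {idx}: {text}")
    print(f"Embedding: {vector[:5]}... (dim: {len(vector)})")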
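Xinference also documents an OpenAI-compatible REST API under `/v1`, so embeddings can be requested without LangChain by POSTing `{"model": <model_uid>, "input": [...]}` to `/v1/embeddings`. Unlike Ollama's `embeddings` list, an OpenAI-format response nests each vector in a `data` array as `{"embedding": [...], "index": i}`. The stdlib sketch below assumes that OpenAI schema; `build_embeddings_request` and `parse_embeddings_response` are hypothetical helper names, the request is built but not sent, and the response parsed at the end is made up.

```python
import json
import urllib.request


def build_embeddings_request(server_url: str, model_uid: str, texts: list[str]) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style /v1/embeddings request."""
    payload = {"model": model_uid, "input": texts}
    return urllib.request.Request(
        url=server_url.rstrip("/") + "/v1/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embeddings_response(raw: bytes) -> list[list[float]]:
    """Return the vectors, sorted by their `index` field to match input order."""
    items = sorted(json.loads(raw)["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


req = build_embeddings_request("http://localhost:9997", "your_model_uid", ["你好，世界。"])
print(req.full_url)  # http://localhost:9997/v1/embeddings

# Parsing a made-up response whose items arrive out of order:
fake = b'{"data": [{"index": 1, "embedding": [0.3]}, {"index": 0, "embedding": [0.1]}]}'
print(parse_embeddings_response(fake))  # [[0.1], [0.3]]
```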