For model deployment, see [learn-llm-deploy-easily](https://gitee.com/coderwillyan/learn-llm-deploy-easily).

This section focuses on how to call models that have already been deployed.

# rerank

## Using a reranker API

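In [6]:
import os

import cohere
from langchain.retrievers.document_compressors import CohereRerank
from langchain_core.documents import Document

# Read the API key from the environment instead of hardcoding it in the notebook.
cohere_client = cohere.Client(api_key=os.environ["COHERE_API_KEY"])

compressor = CohereRerank(
    client=cohere_client,
    top_n=5,  # return the five highest-scoring documents, matching the output below
    model="rerank-multilingual-v3.0"  # newer version with multilingual support
)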

# Sample test data
documents = [
    Document(page_content="巴黎是法国的首都，也是著名的艺术文化中心。", metadata={"source": "wiki"}),
    Document(page_content="北京是中国的政治和文化中心，拥有紫禁城等历史建筑。", metadata={"source": "gov"}),
    Document(page_content="Capital punishment refers to the death penalty in legal systems.", metadata={"source": "law"}),
    Document(page_content="东京是日本最大的城市，也是全球重要的经济中心。", metadata={"source": "news"}),
    Document(page_content="Washington D.C. is the capital of the United States.", metadata={"source": "edu"}),
    Document(page_content="首尔是韩国的首都，以现代化与传统文化的融合闻名。", metadata={"source": "travel"}),
    Document(page_content="Capitalization in finance refers to the total value of a company's shares.", metadata={"source": "finance"})
]

query = "各国首都城市的介绍有哪些？"

# Run the rerank
compressed_docs = compressor.compress_documents(documents=documents, query=query)

# Print the ranked results
print("===== Top 5 documents after reranking =====")
for i, doc in enumerate(compressed_docs):
    print(f"Rank {i+1} (Score: {doc.metadata['relevance_score']:.4f}):")
    print(f"Content: {doc.page_content}")
    print(f"Metadata: {doc.metadata}\n{'-'*50}")

===== Top 5 documents after reranking =====
Rank 1 (Score: 0.3492):
Content: 首尔是韩国的首都，以现代化与传统文化的融合闻名。
Metadata: {'source': 'travel', 'relevance_score': 0.34919977}
--------------------------------------------------
Rank 2 (Score: 0.3157):
Content: Washington D.C. is the capital of the United States.
Metadata: {'source': 'edu', 'relevance_score': 0.31573597}
--------------------------------------------------
Rank 3 (Score: 0.2335):
Content: 巴黎是法国的首都，也是著名的艺术文化中心。
Metadata: {'source': 'wiki', 'relevance_score': 0.23353152}
--------------------------------------------------
Rank 4 (Score: 0.1166):
Content: 东京是日本最大的城市，也是全球重要的经济中心。
Metadata: {'source': 'news', 'relevance_score': 0.11656274}
--------------------------------------------------
Rank 5 (Score: 0.0390):
Content: 北京是中国的政治和文化中心，拥有紫禁城等历史建筑。
Metadata: {'source': 'gov', 'relevance_score': 0.03904829}
--------------------------------------------------


In [7]:
import os

from langchain_community.document_compressors import JinaRerank  # Jina's rerank component

# Jina Rerank configuration: read the API key from the environment instead of hardcoding it
JINA_API_KEY = os.environ["JINA_API_KEY"]

compressor = JinaRerank(
    jina_api_key=JINA_API_KEY,
    top_n=3,
    model="jina-reranker-v2-base-multilingual"  # Jina's multilingual rerank model
)

# Sample test data
documents = [
    Document(page_content="巴黎是法国的首都，也是著名的艺术文化中心。", metadata={"source": "wiki"}),
    Document(page_content="北京是中国的政治和文化中心，拥有紫禁城等历史建筑。", metadata={"source": "gov"}),
    Document(page_content="Capital punishment refers to the death penalty in legal systems.", metadata={"source": "law"}),
    Document(page_content="东京是日本最大的城市，也是全球重要的经济中心。", metadata={"source": "news"}),
    Document(page_content="Washington D.C. is the capital of the United States.", metadata={"source": "edu"}),
    Document(page_content="首尔是韩国的首都，以现代化与传统文化的融合闻名。", metadata={"source": "travel"}),
    Document(page_content="Capitalization in finance refers to the total value of a company's shares.", metadata={"source": "finance"})
]

query = "各国首都城市的介绍有哪些？"

# Run the rerank
compressed_docs = compressor.compress_documents(documents=documents, query=query)

# Print the ranked results
print("===== Top 3 documents after reranking =====")
for i, doc in enumerate(compressed_docs):
    print(f"Rank {i+1} (Score: {doc.metadata['relevance_score']:.4f}):")
    print(f"Content: {doc.page_content}")
    print(f"Metadata: {doc.metadata}\n{'-'*50}")

===== Top 3 documents after reranking =====
Rank 1 (Score: 0.2323):
Content: Washington D.C. is the capital of the United States.
Metadata: {'source': 'edu', 'relevance_score': 0.23231014609336853}
--------------------------------------------------
Rank 2 (Score: 0.1541):
Content: 巴黎是法国的首都，也是著名的艺术文化中心。
Metadata: {'source': 'wiki', 'relevance_score': 0.154057577252388}
--------------------------------------------------
Rank 3 (Score: 0.1432):
Content: 首尔是韩国的首都，以现代化与传统文化的融合闻名。
Metadata: {'source': 'travel', 'relevance_score': 0.14318770170211792}
--------------------------------------------------


## Using an open-source reranker model

!modelscope download --model BAAI/bge-reranker-base --cache_dir /opt/workspace/models

Model path: /opt/workspace/models/BAAI/bge-reranker-base

### Deploying with Sentence Transformers
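
The notebook leaves this section empty. As a minimal sketch, and not necessarily the author's intended approach, the locally downloaded model could be loaded with the `CrossEncoder` class from `sentence_transformers`, which scores each (query, document) pair directly. The helper names `top_n_by_score` and `rerank` are hypothetical, and the model path is the one listed above.

```python
from typing import List, Sequence, Tuple

# Local path produced by the modelscope download step above.
MODEL_PATH = "/opt/workspace/models/BAAI/bge-reranker-base"


def top_n_by_score(
    texts: Sequence[str], scores: Sequence[float], top_n: int = 3
) -> List[Tuple[str, float]]:
    """Pair each text with its score and return the top_n highest-scoring pairs."""
    ranked = sorted(zip(texts, scores), key=lambda pair: pair[1], reverse=True)
    return [(text, float(score)) for text, score in ranked[:top_n]]


def rerank(
    query: str, texts: Sequence[str], top_n: int = 3, model_path: str = MODEL_PATH
) -> List[Tuple[str, float]]:
    """Score every (query, text) pair with a cross-encoder and keep the best top_n."""
    # Imported lazily so the ranking helper works without the library installed.
    from sentence_transformers import CrossEncoder

    model = CrossEncoder(model_path)
    scores = model.predict([(query, text) for text in texts])
    return top_n_by_score(texts, scores, top_n)


# Example call (requires sentence-transformers and the downloaded model):
# rerank("各国首都城市的介绍有哪些？", [doc.page_content for doc in documents])
```

Keeping the sorting step separate from the model call makes it easy to check the ranking logic without loading the model.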

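### Deploying with HuggingFaceCrossEncoder

In [None]:
# Download the model to local disk first (see the modelscope command above)
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import CrossEncoderReranker
from langchain_community.cross_encoders import HuggingFaceCrossEncoder


def pretty_print_docs(docs):
    """Print each document's content, separated by a divider line."""
    divider = "\n" + "-" * 100 + "\n"
    print(divider.join(f"Document {i + 1}:\n\n{d.page_content}" for i, d in enumerate(docs)))


model = HuggingFaceCrossEncoder(model_name="/opt/workspace/models/BAAI/bge-reranker-base")
compressor = CrossEncoderReranker(model=model, top_n=3)
# `retriever` is not defined in this notebook; it is assumed to be an existing base retriever
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)

compressed_docs = compression_retriever.invoke("What is the plan for the economy?")
pretty_print_docs(compressed_docs)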

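In [None]:
# Same pipeline, but the model is loaded by its Hugging Face Hub ID rather than from a local path.
# `retriever` and `pretty_print_docs` are assumed to be defined earlier.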
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import CrossEncoderReranker
from langchain_community.cross_encoders import HuggingFaceCrossEncoder

model = HuggingFaceCrossEncoder(model_name="BAAI/bge-reranker-base")
compressor = CrossEncoderReranker(model=model, top_n=3)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)

compressed_docs = compression_retriever.invoke("What is the plan for the economy?")
pretty_print_docs(compressed_docs)

### Deploying with Ollama

### Deploying with Xinference
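
The notebook leaves this section empty. The following is only a sketch of how a rerank model served by Xinference might be called over HTTP. It assumes a server at `http://localhost:9997`, a rerank model launched with the UID `bge-reranker-base`, and a `/v1/rerank` endpoint that returns a `results` list with `index` and `relevance_score` fields. All of these are assumptions to verify against your own deployment, and the helper names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, List, Sequence, Tuple

XINFERENCE_URL = "http://localhost:9997"  # assumed local endpoint
MODEL_UID = "bge-reranker-base"           # assumed UID of the launched rerank model


def parse_rerank_response(
    payload: Dict[str, Any], texts: Sequence[str]
) -> List[Tuple[str, float]]:
    """Map each result back to its source text, highest relevance score first."""
    results = sorted(
        payload["results"], key=lambda r: r["relevance_score"], reverse=True
    )
    return [(texts[r["index"]], r["relevance_score"]) for r in results]


def rerank(query: str, texts: Sequence[str], top_n: int = 3) -> List[Tuple[str, float]]:
    """POST the query and documents to the assumed /v1/rerank endpoint."""
    body = json.dumps(
        {"model": MODEL_UID, "query": query, "documents": list(texts), "top_n": top_n}
    ).encode("utf-8")
    request = urllib.request.Request(
        f"{XINFERENCE_URL}/v1/rerank",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return parse_rerank_response(payload, texts)


# Example call (requires a running server with the model launched):
# rerank("各国首都城市的介绍有哪些？", [doc.page_content for doc in documents])
```

Using only the standard library keeps the sketch independent of any client package version. The response parsing lives in its own function so it can be adjusted if the server's payload differs from the assumed shape.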