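# Part 10: Logical and Semantic Routing
Routing decides where a question goes before RAG runs. Logical routing has an LLM classify the question and pick a data source; semantic routing picks a prompt by embedding similarity.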

## Logical routing
The LLM classifies the question and selects the appropriate database before the RAG step.

In [3]:
import os
from pprint import pprint

from typing import Literal

from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI

# Schema for the LLM's structured output
class RouteQuery(BaseModel):
    """Route a user query to the most relevant data source."""
    datasource: Literal["python_docs", "js_docs", "golang_docs"] = Field(
        ...,
        description="Given a user question choose which datasource would be most relevant for answering their question"
    )

# LLM with function call
llm = ChatOpenAI(
    model=os.getenv("ARK_MODEL"),
    api_key=os.getenv("ARK_API_KEY"),
    base_url=os.getenv("ARK_API_URL"),
    temperature=0.0,
)
structured_llm = llm.with_structured_output(RouteQuery)

# Prompt that asks the LLM to choose the data source
system_prompt = (
    "You are an expert at routing a user question to "
    "the appropriate data source. "
    "Based on the programming language the question is referring to, "
    "route it to the relevant data source."
)
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{question}")
    ]
)

# Build the routing chain
router = (
    prompt
    | structured_llm
)

In [5]:
# Invoke the router with a question
question = """Why doesn't the following code work:

from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(["human", "speak in {language}"])
prompt.invoke("french")
"""
result = router.invoke({"question": question})
pprint(result)

RouteQuery(datasource='python_docs')
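
Because `datasource` is declared as a `Literal`, Pydantic accepts only the three listed labels when the structured output is parsed. The sketch below reproduces that check with the standard library alone, so it runs without an LLM; `validate_datasource` is a hypothetical helper written for illustration, not part of LangChain or Pydantic.

```python
from typing import Literal, get_args

# Same three labels as the RouteQuery.datasource field above.
Datasource = Literal["python_docs", "js_docs", "golang_docs"]
ALLOWED = get_args(Datasource)  # tuple of the permitted string values


def validate_datasource(value: str) -> str:
    """Return value unchanged if it is an allowed label, else raise ValueError."""
    if value not in ALLOWED:
        raise ValueError(f"unknown datasource {value!r}; expected one of {ALLOWED}")
    return value


print(validate_datasource("python_docs"))
```

Constraining the label set this way means the downstream dispatch code only has to handle known values.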


In [7]:
# Pick the matching database from the returned data source (illustration only; no real retrieval logic)
def choose_route(result):
    if "python_docs" in result.datasource.lower():
        return "chain for python_docs"
    elif "js_docs" in result.datasource.lower():
        return "chain for js_docs"
    else:
        return "chain for golang_docs"

from langchain_core.runnables import RunnableLambda

full_chain = (
    router
    | RunnableLambda(choose_route)
)
final_result = full_chain.invoke({"question": question})
pprint(final_result)

'chain for python_docs'
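
The placeholder strings returned by `choose_route` could be replaced by a lookup table of real handlers. The following is a minimal sketch under stated assumptions: `FakeRoute` stands in for `RouteQuery` so that no LLM is needed, and the three handler functions are hypothetical stubs where retrieval chains would normally go.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class FakeRoute:
    """Stand-in for RouteQuery so the sketch runs without an LLM."""
    datasource: str


# Hypothetical per-source handlers; in practice each would be a retrieval chain.
def python_chain(question: str) -> str:
    return f"[python_docs] {question}"


def js_chain(question: str) -> str:
    return f"[js_docs] {question}"


def golang_chain(question: str) -> str:
    return f"[golang_docs] {question}"


ROUTES: Dict[str, Callable[[str], str]] = {
    "python_docs": python_chain,
    "js_docs": js_chain,
    "golang_docs": golang_chain,
}


def dispatch(route: FakeRoute, question: str) -> str:
    """Look up the handler for the routed data source and call it."""
    handler = ROUTES.get(route.datasource)
    if handler is None:
        raise ValueError(f"no handler registered for {route.datasource!r}")
    return handler(question)


print(dispatch(FakeRoute("js_docs"), "How do I use fetch?"))
```

A dictionary keeps the mapping in one place and fails loudly on an unknown label instead of silently falling through to a default branch.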


## Semantic routing
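
Semantic routing embeds the question and each candidate prompt, then picks the prompt whose embedding has the highest cosine similarity to the question. Here is a minimal sketch of that selection step using only the standard library. The three-dimensional vectors are invented for illustration and do not come from any embedding model, and `cosine` and `pick_most_similar` are hypothetical helper names.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pick_most_similar(query_vec: Sequence[float],
                      candidate_vecs: Sequence[Sequence[float]]) -> int:
    """Return the index of the candidate most similar to the query."""
    scores = [cosine(query_vec, c) for c in candidate_vecs]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy "embeddings", invented for illustration only.
names = ["physics", "math"]
prompt_vecs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.2]]
query_vec = [0.8, 0.2, 0.1]

print(names[pick_most_similar(query_vec, prompt_vecs)])
```

With real embeddings the vectors have hundreds or thousands of dimensions, but the argmax over similarity scores works the same way.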

In [None]:
from langchain.utils.math import cosine_similarity
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnableLambda, RunnablePassthrough

# Two prompts, one for physics questions and one for math questions
physics_template = """You are a very smart physics professor. \
You are great at answering questions about physics in a concise and easy to understand manner. \
When you don't know the answer to a question you admit that you don't know.

Here is a question:
{query}"""

math_template = """You are a very good mathematician. You are great at answering math questions. \
You are so good because you are able to break down hard problems into their component parts, \
answer the component parts, and then put them together to answer the broader question.

Here is a question:
{query}"""

from ark_embedding import ArkEmbeddings

embd = ArkEmbeddings(
    model=os.getenv("ALIYUN_EMBEDDING_MODEL"),
    api_key=os.getenv("ALIYUN_API_KEY"),
    api_url=os.getenv("ALIYUN_API_URL"),
    batch_size=10
)

# Embed the prompt templates
prompt_templates = [physics_template, math_template]
prompt_embeddings = embd.embed_documents(prompt_templates)

In [None]:
# Route the question to the best-matching prompt, then answer it
def prompt_router(input):
    # Embed the question
    query_embedding = embd.embed_query(input["query"])
    # Compute the cosine similarity against each prompt embedding
    similarity = cosine_similarity([query_embedding], prompt_embeddings)[0]
    # Pick the prompt with the highest similarity
    most_similar = prompt_templates[similarity.argmax()]
    print("Using MATH" if most_similar == math_template else "Using PHYSICS")
    # Fill in the template explicitly (optional; returning the bare PromptTemplate also works)
    prompt = PromptTemplate.from_template(most_similar).invoke(input)
    return prompt

from langchain_core.runnables import RunnableParallel

chain = (
    RunnableParallel({"query": RunnablePassthrough()})
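    | RunnableLambda(prompt_router)  # fill the selected template with the question; the result goes to the LLM
    # Returning the bare PromptTemplate from prompt_router would also work: when the function wrapped
    # by RunnableLambda returns a Runnable, LangChain invokes that Runnable with the same input.
    | llm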
    | StrOutputParser()
)

result = chain.invoke("What's a black hole?")
pprint(result)

Using PHYSICS
("Of course. That's an excellent and fundamental question.\n"
 '\n'
 'In the simplest terms, a **black hole is a region of space where gravity is '
 'so intense that nothing, not even light, can escape from it.**\n'
 '\n'
 "Let's break that down:\n"
 '\n'
 '1.  **The "Point of No Return":** The outer boundary of a black hole is '
 'called the **event horizon**. Think of it as a one-way door. Once anything—a '
 'spaceship, a planet, or a particle of light (a photon)—crosses this '
 'boundary, it can never come back out. We cannot see what happens inside, '
 'hence the name "black hole."\n'
 '\n'
 '2.  **Why the Gravity is So Strong:** The extreme gravity comes from an '
 'immense amount of mass being crushed into a vanishingly small point at the '
 'very center, called a **singularity**. Imagine crushing the entire mass of '
 'our Sun into a sphere less than 4 miles across, or the entire Earth into a '
 'sphere the size of a marble. This incredible density warps the fabric

# Part 11: Query Construction

## Query structuring for metadata filters

In [14]:
# Load the YouTube transcript (subtitle) data
from langchain_community.document_loaders import YoutubeLoader

docs = YoutubeLoader.from_youtube_url(
    "https://www.youtube.com/watch?v=pbAd8O1Lvm4",
    add_video_info=True,
).load()

docs[0].metadata

HTTPError: HTTP Error 400: Bad Request
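
The cell above failed before any metadata was returned, so the sketch below uses invented records instead of real loader output. It illustrates the idea this section is named for: a structured query whose fields are applied as metadata filters. `VideoFilter`, `matches`, and the field names `view_count` and `length` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class VideoFilter:
    """Hypothetical structured query: each non-None field becomes a metadata filter."""
    min_view_count: Optional[int] = None
    max_length_sec: Optional[int] = None


def matches(metadata: Dict[str, Any], f: VideoFilter) -> bool:
    """Return True if the metadata record satisfies every filter that is set."""
    if f.min_view_count is not None and metadata.get("view_count", 0) < f.min_view_count:
        return False
    if f.max_length_sec is not None and metadata.get("length", 0) > f.max_length_sec:
        return False
    return True


# Toy metadata records, invented for illustration (not real loader output).
videos = [
    {"title": "short intro", "view_count": 500, "length": 240},
    {"title": "deep dive", "view_count": 9000, "length": 3600},
]
query = VideoFilter(min_view_count=1000)
print([v["title"] for v in videos if matches(v, query)])
```

In a full pipeline an LLM with structured output would presumably fill in the filter fields from the user's question, much as `RouteQuery` is filled in for routing.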