<br>
<a href="https://www.nvidia.cn/training/">
    <div style="width: 55%; background-color: white; margin-top: 50px;">
    <img src="https://dli-lms.s3.amazonaws.com/assets/general/nvidia-logo.png"
         width="400"
         height="186"
         style="margin: 0px -25px -5px; width: 300px"/>
</a>
<h1 style="line-height: 1.4;"><font color="#76b900"><b>Building AI Agents with Large Language Models (LLMs)</b></font></h1>
<h2><b>Exercise 1:</b> Chatting with a Dataset</h2>
<br>

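**Welcome back! This is the first exercise of the course, so let's see what you can do!**

This notebook is a hands-on exercise that follows the "lecture" notebook. The exercises in this series share a common dataset and gradually build up to more complex LLM interactions.

In the previous notebook, we implemented a basic multi-agent system to generate synthetic multi-turn conversations. That approach is useful for producing artificial dialogue, but it is of limited practical use in end-user applications.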

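### **Learning Objectives:**

**In this notebook, we will:**

- Build a simple user-facing chatbot that interacts with a dataset.
- Deal with datasets that are too large to fit in the model's context window.
- Develop a summarization workflow to preprocess the data efficiently.

This dataset will be used throughout the course, so the first step is to make it available for simple interactive chat.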

<hr><br>

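### **Part 1:** Setting Up a Chatbot

From the previous exercises, you can specify a simple chatbot that interacts with the user through a simple loop. Building on that, the function below sets up a chat loop in which a user can interact with an AI agent. If no processing function (chain) is provided, it falls back to a not-yet-implemented generator that streams a placeholder message.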

In [None]:
from time import sleep

def not_implemented_gen(state):
    """A placeholder generator that informs users the chain is not yet implemented."""
    message = "Chain Not Implemented. Enter with no inputs or interrupt execution to exit."
    for letter in message:
        yield letter
        sleep(0.005)

def chat_with_chain(state=None, chain=not_implemented_gen):
    """
    Interactive chat function that processes user input through a specified chain.
    
    Parameters:
        state (dict): Maintains chat history and context.
        chain (callable): Function to generate responses based on the chat history.
    """
    if state is None:
        state = {}  # Fresh dict per call; avoids the shared mutable-default pitfall
    assert isinstance(state, dict)
    state["messages"] = state.get("messages", [])
    while True:
        try:
            human_msg = input("\n[Human]:")
            if not human_msg.strip(): break
            agent_msg = ""
            state["messages"] += [("user", human_msg)]
            print(flush=True)
            print("[Agent]: ", end="", flush=True)
            for token in getattr(chain, "stream", chain)(state):
                text = getattr(token, "content", token)  # Accept message chunks or plain strings
                agent_msg += text
                print(text, end="", flush=True)
            state["messages"] += [("ai", agent_msg)]
        except KeyboardInterrupt:
            print("KeyboardInterrupt")
            break

# Initialize chat with the placeholder generator
chat_with_chain()

<br>
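
The loop above resolves its response function with `getattr(chain, "stream", chain)`, so it accepts either a plain generator function or an object that exposes a `.stream` method. The sketch below illustrates that dispatch without requiring user input; `echo_gen`, `ToyChain`, and `collect` are hypothetical names used only for illustration.

```python
def echo_gen(state):
    """Plain generator: streams back the latest user message word by word."""
    _, last_text = state["messages"][-1]
    for word in last_text.split():
        yield word + " "

class ToyChain:
    """Minimal object with a .stream method, mimicking a streaming interface."""
    def stream(self, state):
        yield from ("stream", "ed ", "reply")

def collect(chain, state):
    """Resolve the callable the same way the chat loop does, then join the tokens."""
    stream_fn = getattr(chain, "stream", chain)
    return "".join(getattr(tok, "content", tok) for tok in stream_fn(state))

demo_state = {"messages": [("user", "hello there")]}
print(collect(echo_gen, demo_state))    # the generator function is called directly
print(collect(ToyChain(), demo_state))  # the .stream method is used when present
```

Either form can be passed as the `chain` argument, which is why the same loop works for the placeholder generator and for a full LLM pipeline.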

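Building on this, we can define a conversational workflow consisting of an LLM, a prompt template, and an initial state. The prompt, LLM, and output parser are provided below; combine them and plug them into your `chat_with_chain` function: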

In [None]:
from langchain_nvidia import ChatNVIDIA
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from functools import partial

# Define an NVIDIA-backed LLM
llm = ChatNVIDIA(model="meta/llama-3.1-8b-instruct", base_url="http://nim-llm:8000/v1")

# Define a structured prompt
prompt = ChatPromptTemplate.from_messages([
    ("system",
     "You are a helpful instructor assistant for NVIDIA Deep Learning Institute (DLI). "
     "Assist users with their course-related queries using the provided context. "
     "Do not reference the 'context' as 'context' explicitly."),
    ("user", "<context>\n{context}</context>"),
    ("ai", "Thank you. I will not restart the conversation and will abide by the context."),
    ("placeholder", "{messages}")
])

# Construct the processing pipeline
chat_chain = prompt | llm | StrOutputParser()

# Initialize chatbot state
state = {
    "messages": [("ai", "Hello! I'm the NVIDIA DLI Chatbot! How can I help you?")],
    "context": "",  # Empty for now; will be updated later
}

# Wrap function to integrate AI response generation
chat = partial(chat_with_chain, chain=chat_chain)

# Start the chatbot with the AI pipeline
chat(state)

<hr><br>

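### **Part 2:** Bringing In Some Context

In this course, we will work with a small dataset from the GTC 2025 conference. Each entry is an independent session, and the entries vary in content. This should remind you of an organically accumulated data pool... partly because that is exactly what it is. The data can be found in [`gtc-data-2025.csv`](./gtc-data-2025.csv), so let's load it straight into a list!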

In [None]:
import pandas as pd
import json

# Load dataset
filepath = "gtc-data-2025.csv"
df = pd.read_csv(filepath)

# Convert to JSON for structured processing
raw_entries = json.loads(df.to_json(orient="records"))

# Display the first few records
raw_entries[:4]

<br>

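We can quickly process these records into a more natural format and then concatenate them into a usable "context string" for the model. Building context automatically to augment an LLM is very common in real-world workflows, so there is no reason not to do it here.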

In [None]:
def stringify(entry, description_key='description'):
    """Formats workshop details into a human-readable string."""
    return (
        f"{entry.get('name')}\n"
        f"Presenters: {entry.get('instructors')}\n"
        f"Description: {entry.get(description_key)}"
    )

# Convert dataset entries to structured text
raw_blurbs = [
    f"[Session {i+1}]\n{stringify(entry)}" 
    for i, entry in enumerate(raw_entries)
]

# Construct full context string
raw_context = "The following workshops are slated to be presented at NVIDIA's GTC 2025 Conference:\n\n"
raw_context += "\n\n".join(raw_blurbs)

# Display context statistics
print(f"Full Context Length (characters): {len(raw_context)}")
print(raw_context[:2000])  # Preview the first portion

<br>

**[TODO] Using your earlier abstractions, pass the context into the prompt and see whether it works:**

In [None]:
## TODO: Initialize your state based on your opinionated chat chain
state = {}

try:
    ## TODO: Perform the conversation with your long context (it's ok if it fails)
    pass
except Exception as e:
    print(e)

<br>

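<details><summary><b>Hint</b></summary>

Recall that there is a `chat` function that wraps the reusable workflow, so you can call it with `chat(state)`. All that is left is to work out what the prompt expects. As a reminder: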

```python
prompt = ChatPromptTemplate.from_messages([
    ("system",
     "You are a helpful instructor assistant for NVIDIA Deep Learning Institute (DLI). "
     "Assist users with their course-related queries using the provided context. "
     "Do not reference the 'context' as 'context' explicitly."),
    ("user", "<context>\n{context}</context>"),
    ("ai", "Thank you. I will not restart the conversation and will abide by the context."),
    ("placeholder", "{messages}")
])

```

</details>
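<details><summary><b>Reference Solution</b></summary>

The prompt expects a dictionary containing `context` (interpreted as a string) and `messages` (interpreted as a list of messages, e.g. `[("user", "Hello World")]`), so we can start from a state with no message history and pass `raw_context` as the context.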

```python
state = {"messages": [], "context": raw_context,}
try:
    chat(state)
except Exception as e:
    print(e)

```

</details>
<br>

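This probably did not work, and the reason is clear. The model's maximum context here is `2^13 = 8192` tokens, and the current context is likely a bit too long. Let's verify this with the [tokenizer](https://huggingface.co/unsloth/Meta-Llama-3.1-8B-Instruct/blob/main/tokenizer.json) for the model in question.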

In [None]:
from transformers import PreTrainedTokenizerFast

llama_tokenizer = PreTrainedTokenizerFast(tokenizer_file="tokenizer.json", clean_up_tokenization_spaces=True)

def token_len(text):
    return len(llama_tokenizer.encode(text=text))

print(f"String Length of Context: {len(raw_context)}")
print(f"Token Length of Context: {token_len(raw_context)}")

<br>
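
To make the failure mode concrete, the length check can be expressed as a small budget test. This is a minimal sketch: `fits_in_context`, `reserve_for_reply`, and the whitespace-based `rough_len` are hypothetical stand-ins, and in this notebook you would pass `token_len` as `len_fn` to get real token counts. The 8192 limit is the figure quoted above.

```python
MAX_CONTEXT_TOKENS = 2 ** 13  # 8192, the limit quoted above

def rough_len(text):
    """Stand-in token counter (whitespace split); swap in token_len for real counts."""
    return len(text.split())

def fits_in_context(text, len_fn=rough_len, max_tokens=MAX_CONTEXT_TOKENS, reserve_for_reply=512):
    """Return True if the text still leaves `reserve_for_reply` tokens for the answer."""
    return len_fn(text) + reserve_for_reply <= max_tokens

short_text = "word " * 100
long_text = "word " * 10_000
print(fits_in_context(short_text))  # True
print(fits_in_context(long_text))   # False
```

Reserving part of the window for the reply is a design choice of this sketch: a context that exactly fills the window leaves the model no room to answer.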

**You might be thinking, "Don't most models have longer contexts than that?" You are right, but there are some key caveats:**

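- If you are using an API service, you still pay for those tokens, so a long static context may not be the best idea.
- Even when your context length is supported, the output quality of most models still degrades as the input gets longer. More input also means more opportunities for conflicting data and inconsistent text structure.
- With arbitrary documents, or even whole document pools, you may find yourself regularly hitting the maximum context. Even if a model copes with this dataset, will the same approach still work for a whole database?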

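In our case, we are dealing with a sample of session descriptions that vary in detail and quality, which together form an inconsistent pool of entries.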

In [None]:
import matplotlib.pyplot as plt
import numpy as np

sorted_raw_blurbs = sorted(raw_blurbs, key=token_len)

def plot_token_len(entries, color="green", alpha=1, len_fn=token_len):
    plt.bar(x=range(len(entries)), height=[len_fn(v) for v in entries], width=1.0, color=color, alpha=alpha)

plot_token_len(sorted_raw_blurbs, color="grey")
plt.xlabel("Entry Sample")
plt.ylabel("Token Length")
plt.show() 

print("SHORTEST ENTRIES:")
sample_blurbs = sorted_raw_blurbs[:5]
# sample_blurbs = sorted_raw_blurbs[:5] + sorted_raw_blurbs[-3:]
for entry in sample_blurbs:
    print(entry, "\n")

<hr><br>

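### **Part 3:** Summarizing Long Context

Perhaps we could convert these entries into a more uniform form? As a preprocessing step, every entry can be rewritten into a more consistent format. This not only helps the model reason about the overall context, but also lets us exploit the uniformity of the entries to make prompting more consistent.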

In [None]:
%%time
## TODO: Create a summary system message to instruct the LLM.
## Reuse the chat_chain as-is, remembering that it expects "messages" and "context"
summary_msg = (
    "Summarize the presentation description down to only a few important sentences."
    " Start with '(Summary) '"
    ## Feel free to customize
)

def summarize(context_str, summary_msg=summary_msg):
    return "(Summary) No summary"

print(summarize(stringify(raw_entries[1])))

<br>

<details><summary><b>Reference Solution</b></summary>

```python
return chat_chain.invoke({
    "messages": [("user", summary_msg)],
    "context": context_str
})
```

</details>
<br>
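
Because the model is only asked, not forced, to start with `(Summary) `, downstream code should tolerate small variations in that tag. The helper below is one possible way to normalize it; `strip_summary_prefix` is a hypothetical name, and the pattern deliberately matches only `(Summary)` or `Summary:` so that ordinary sentences beginning with the word "Summary" are left alone.

```python
import re

# Matches a leading "(Summary)" or "Summary:" tag, case-insensitively
_SUMMARY_TAG = re.compile(r"^\s*(?:\(\s*summary\s*\)|summary\s*:)\s*", flags=re.IGNORECASE)

def strip_summary_prefix(text):
    """Remove a leading summary tag if present; otherwise return the trimmed text."""
    return _SUMMARY_TAG.sub("", text or "", count=1).strip()

print(strip_summary_prefix("(Summary) This talk covers agents."))
print(strip_summary_prefix("summary: This talk covers agents."))
print(strip_summary_prefix("This talk covers agents."))
```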

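The output is natural language and is not guaranteed to be well formatted, but with appropriate prompt engineering we can treat this as a simple text-to-text transformation function. We can also use LangChain's batching primitives to greatly simplify concurrency management: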

In [None]:
%%time
from langchain_core.runnables import RunnableLambda
from tqdm.auto import tqdm
import threading

batch_inputs = [stringify(entry_dict) for entry_dict in raw_entries]

## Simple version of a batched process. No progress bar
# summaries = RunnableLambda(summarize).batch(batch_inputs, config={"max_concurrency": 20})

## Modified version which also has progress bars! Marginally-slower, same backbone
def batch_process(fn, inputs, max_concurrency=20):
    lock = threading.Lock()
    pbar = tqdm(total=len(inputs))
    def process_doc(value):
        output = None  # Returned as-is if fn raises, so one failure does not crash the batch
        try:
            output = fn(value)
        except Exception as e: 
            print(f"Exception in thread: {e}")
        with lock:
            pbar.update(1)
        return output
    try:
        lc_runnable = fn if hasattr(fn, "batch") else RunnableLambda(process_doc)
        return lc_runnable.batch(inputs, config={"max_concurrency": max_concurrency})
    finally:
        pbar.close()

summaries = batch_process(summarize, batch_inputs)

<br>
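
For intuition, bounded-concurrency batching can be approximated with the standard library alone. The sketch below is not how LangChain implements `.batch`; it is a plain `ThreadPoolExecutor` analogue, and `batch_map` and `fake_summarize` are hypothetical names. `executor.map` returns results in input order, which matters here because the summaries are later zipped back against `raw_entries`.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fake_summarize(text):
    """Stand-in for an LLM call: sleeps briefly, then returns a tagged string."""
    time.sleep(0.01)
    return f"(Summary) {text[:20]}"

def batch_map(fn, inputs, max_concurrency=20):
    """Apply fn to every input with at most max_concurrency threads, preserving order."""
    with ThreadPoolExecutor(max_workers=max_concurrency) as executor:
        return list(executor.map(fn, inputs))

docs = [f"Session {i} description" for i in range(50)]
results = batch_map(fake_summarize, docs, max_concurrency=10)
print(len(results), results[0])
```

Threads suit this workload because each call spends most of its time waiting on the network rather than computing.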

Now we can see what happens when we use these synthetic descriptions in place of the originals, and how much the context length shrinks.

In [None]:
# def stringify(entry, description_key='description'):
#     return (
#         f"{entry.get('name')}"
#         f"\nPresenters: {entry.get('instructors')}"
#         f"\nDescription: {entry.get(description_key)}"
#     )

for summary, pres_entry in zip(summaries, raw_entries):
    words = (summary or "").split()  # Tolerate a failed (None) or empty summary
    if words and "summary" in words[0].lower():
        words = words[1:]  # Drop the leading "(Summary)" tag
    pres_entry["summary"] = " ".join(words)

print(stringify(raw_entries[0], "summary"))
raw_entries[0]

In [None]:
import matplotlib.pyplot as plt
import numpy as np

contexts_with_summaries = [stringify(entry, "summary") for entry in raw_entries]
contexts_with_descripts = [stringify(entry) for entry in raw_entries]

def plot_token_len(entries, color="green", alpha=1, len_fn=token_len):
    plt.bar(x=range(len(entries)), height=[len_fn(v) for v in entries], width=1.0, color=color, alpha=alpha)    

sorted_summs = [v for _,v in sorted(zip((token_len(x) for x in contexts_with_summaries), contexts_with_summaries))]
sorted_origs = [v for _,v in sorted(zip((token_len(x) for x in contexts_with_descripts), contexts_with_descripts))]
aligned_summs = [v for _,v in sorted(zip((token_len(x) for x in contexts_with_descripts), contexts_with_summaries))]
plot_token_len(sorted_origs, alpha=0.6, color="grey")
plot_token_len(sorted_summs, alpha=0.6, color="lightgreen")
plot_token_len(aligned_summs, alpha=0.6, color="green")
plt.xlabel("Entry Sample")
plt.ylabel("Token Length")
plt.show() 

print("Samples:")
sorted_raw_entries = sorted(raw_entries, key=(lambda v: token_len(str(v.get("description")))))
for entry in sorted_raw_entries[:3] + sorted_raw_entries[-3:]:
    print(
        f"{entry.get('name')}"
        f"\nPresenters: {entry.get('instructors')}"
        f"\nDescription: {entry.get('description')}"
        f"\nSummary: {entry.get('summary')}\n\n"
    )

<br>
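
The plot can be complemented with a few numbers. The sketch below compares paired original and summarized texts; `compression_report` is a hypothetical helper, and the default whitespace counter is a stand-in. In this notebook you could pass `contexts_with_descripts`, `contexts_with_summaries`, and `len_fn=token_len` instead of the toy data.

```python
from statistics import mean

def compression_report(originals, summaries, len_fn=lambda s: len(s.split())):
    """Compare total and mean lengths of paired original/summary texts."""
    orig_lens = [len_fn(o) for o in originals]
    summ_lens = [len_fn(s) for s in summaries]
    return {
        "orig_total": sum(orig_lens),
        "summ_total": sum(summ_lens),
        "orig_mean": mean(orig_lens),
        "summ_mean": mean(summ_lens),
        "ratio": sum(summ_lens) / sum(orig_lens),  # below 1.0 means the context shrank
    }

# Toy data for illustration only
toy_originals = ["a b c d e f g h", "a b c d"]
toy_summaries = ["a b c", "a b c"]
report = compression_report(toy_originals, toy_summaries)
print(report)
```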

Looks good! Let's put it into practice and apply this change to all of the entries.

In [None]:
def stringify(entry, description_key='description'):
    return (
        f"{entry.get('name')}"
        f"\nPresenters: {entry.get('instructors')}"
        f"\nDescription: {entry.get(description_key)}"
    )

summary_blurbs = [stringify(entry, 'summary') for entry in raw_entries]

new_context = "The following workshops are slated to be presented at NVIDIA's GTC 2025 Conference:\n\n"
new_context += "\n\n".join(summary_blurbs)
print("New Context Length:", len(new_context))
print(f"New Context Tokens: {token_len(new_context)}")

print(new_context[:2000])

The input size has suddenly dropped! Time to swap in the new context and test it.

In [None]:
#############################################################################
## Defined Earlier. Feel free to play around with this

# Define an NVIDIA-backed LLM
llm = ChatNVIDIA(model="meta/llama-3.1-8b-instruct", base_url="http://nim-llm:8000/v1")

# Define a structured prompt
prompt = ChatPromptTemplate.from_messages([
    ("system",
     "You are a helpful instructor assistant for NVIDIA Deep Learning Institute (DLI). "
     "Assist users with their course-related queries using the provided context. "
     "Do not reference the 'context' as 'context' explicitly."),
    ("user", "<context>\n{context}</context>"),
    ("ai", "Thank you. I will not restart the conversation and will abide by the context."),
    ("placeholder", "{messages}")
])

# Construct the processing pipeline
chat_chain = prompt | llm | StrOutputParser()

# Initialize chatbot state
state = {
    "messages": [("ai", "Hello! I'm the NVIDIA DLI Chatbot! How can I help you?")],
    "context": "",  # Empty for now; will be updated later
}

# Wrap function to integrate AI response generation
chat = partial(chat_with_chain, chain=chat_chain)

## Defined Earlier
#############################################################################

state = {
    "messages": [],
    "context": new_context,
}

try:  ## HINT: Consider putting your call logic in the try/except
    chat(state)
except Exception as e:
    print(e)

In [None]:
## Consider saving the material as well, since it will be useful for later
## For those who may take this course over multiple sessions, a version is provided.
with open("simple_long_context.txt", "w", encoding="utf-8") as f:
    f.write(new_context)

<hr><br>

### **Part 4:** Reflecting on This Exercise

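This exercise was quite simple, yet in a sense it did lead us to an interesting system.
- On the surface, all we did was turn an overly long context into one that is not overly long.
- Put another way, we "normalized" elements of the global environment into a reasonable "canonical context" to support the main LLM.
- Pessimistically, by making the context *slightly shorter* than the maximum input length, we created only a short-term fix for multi-turn conversations with limited context space.
- Optimistically, we now have a reusable context that keeps the whole context within the model's input domain and is suitable for most single-turn questions (including questions that may supplement a multi-turn solution).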
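
The pessimistic reading can be made concrete by tracking how much of the window is left as a conversation grows. This is a rough sketch: `remaining_budget` is a hypothetical helper, the whitespace counter is a stand-in for `token_len`, and the calculation ignores the extra tokens that the chat template and system message add.

```python
def remaining_budget(context, messages, len_fn, max_tokens=8192):
    """Tokens left for new turns after the static context and the chat history."""
    used = len_fn(context) + sum(len_fn(text) for _, text in messages)
    return max_tokens - used

word_len = lambda s: len(s.split())  # stand-in; use token_len in the notebook

# Toy sizes: a 7000-"token" context plus two turns of history
ctx = "word " * 7000
history = [("user", "word " * 300), ("ai", "word " * 600)]
print(remaining_budget(ctx, history, word_len))
```

Every additional turn shrinks the result, so a context that only just fits leaves room for a handful of exchanges at most.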

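In the following notebooks, we will keep building on the results of this exercise, and hopefully the usefulness of this simple process will become clearer as your understanding deepens!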