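### What is a chain?
> A chain is a "wrapper" that connects LangChain components and manages the flow of data between them. It ensures that the whole LLM workflow forms a working closed loop: prompt -> language chain -> retriever -> output parser.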

In [1]:
# LLM chain demo
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
import os

# Build the LLM
llm = OpenAI(
    temperature=0.9,
    openai_api_key=os.environ['OPENAI_API_KEY']
)

In [2]:
# Create the prompt template
prompt = PromptTemplate(
    input_variables = ['product'],
    template = "What is a good name for a company that makes {product}?",
)

In [3]:
# Create the LLM chain
from langchain.chains import LLMChain
chain = LLMChain(
    llm=llm,
    prompt=prompt,
)

In [4]:
# Run the LLM chain
chain.run("colorful socks")

'\n\nRainbow Feet Co.'

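### The right way to use a chain
1. Prepare the inputs: the input must be a valid Python dictionary whose keys are the placeholder variables in the prompt (that is, its `input_variables`). Which keys are needed depends on the actual prompt object.
2. Instantiate the chain: import the chain class you need from `langchain.chains` and pass it a valid `PromptTemplate` together with the corresponding LLM instance.
3. Run the chain: use one of the following methods to run the LLM chain
- `run()`: sends the LLM request to the server synchronously
- `arun()`: sends the LLM request to the server asynchronously
- `apply()`: runs the chain over a list of inputs, using the LLM's `generate` method for speed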


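Calling the chain accepts some optional parameters (the names below are those of `Chain.__call__`):
- inputs: a dictionary of the variables used to fill in the prompt.
- return_only_outputs (Optional): a boolean that controls whether only the outputs are returned. When True, only the new keys generated by the chain are returned; when False, both the input keys and the new keys generated by the chain are returned. Defaults to False.
- callbacks (Optional): the callbacks (a list of handlers or a callback manager) to invoke while the chain runs.
- include_run_info (Optional): whether to include run information in the response. Defaults to False.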
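
As a rough illustration of step 1 (preparing the inputs), the sketch below derives the required input keys from a template string and checks a candidate dictionary against them. It uses only the standard library. `placeholder_names` and `check_inputs` are hypothetical helpers written for this example, not LangChain functions, and the `{topic}` placeholder is an assumption added to show more than one key.

```python
from string import Formatter


def placeholder_names(template: str) -> list[str]:
    """Return the named placeholders of a str.format-style template, in order."""
    names = []
    for _literal, field_name, _spec, _conversion in Formatter().parse(template):
        if field_name and field_name not in names:
            names.append(field_name)
    return names


def check_inputs(template: str, inputs: dict) -> None:
    """Raise ValueError if `inputs` lacks a key that the template needs."""
    missing = [name for name in placeholder_names(template) if name not in inputs]
    if missing:
        raise ValueError(f"Missing input keys: {missing}")


# Hypothetical template with two placeholders
template = "Tell me a {adjective} joke about {topic}"
inputs = {"adjective": "corny", "topic": "scarecrows"}

check_inputs(template, inputs)      # passes silently: both keys are present
print(placeholder_names(template))  # ['adjective', 'topic']
print(template.format(**inputs))
```

Checking the keys before calling the model is cheap and surfaces a missing variable before any request is sent.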

In [5]:
# Demo for the case of multiple inputs
from langchain.chat_models import ChatOpenAI
chat = ChatOpenAI(temperature=0,openai_api_key=os.environ['OPENAI_API_KEY'])

prompt_template = "Tell me a {adjective} joke"

llm_chain = LLMChain(
    llm = chat,
    prompt = PromptTemplate.from_template(prompt_template)
)

# Run the LLM chain
llm_chain(
    inputs = {"adjective":"corny"}
)

{'adjective': 'corny',
 'text': 'Why did the scarecrow win an award?\n\nBecause he was outstanding in his field!'}

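`run()` and `LLMChain.__call__()` are not the same. Both accept a valid dictionary as input, but `__call__()` returns a dictionary (as in the output above), whereas `run()` always returns a string: the chain's output after it has passed through the output parser.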
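
The difference can be mimicked with a small stand-in class. `ToyChain` below is a hypothetical illustration, not LangChain code. Its `fake_llm` argument is any callable that maps a prompt string to a reply string, and the output key name `text` is taken from the dictionary printed above.

```python
from typing import Callable, Dict


class ToyChain:
    """Hypothetical stand-in that mimics the two calling styles of a chain."""

    output_key = "text"

    def __init__(self, template: str, fake_llm: Callable[[str], str]):
        self.template = template
        self.fake_llm = fake_llm

    def __call__(self, inputs: Dict[str, str], return_only_outputs: bool = False) -> Dict[str, str]:
        # Fill the template, call the model, and return a dictionary.
        outputs = {self.output_key: self.fake_llm(self.template.format(**inputs))}
        return outputs if return_only_outputs else {**inputs, **outputs}

    def run(self, inputs: Dict[str, str]) -> str:
        # Same work as __call__, but only the single output string is returned.
        return self(inputs, return_only_outputs=True)[self.output_key]


# The "model" here just upper-cases the prompt, so the result is deterministic.
toy = ToyChain("Tell me a {adjective} joke", fake_llm=lambda prompt: prompt.upper())

print(toy({"adjective": "corny"}))      # dictionary with input and output keys
print(toy.run({"adjective": "corny"}))  # plain string
```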

In [6]:
# Use the verbose parameter to debug a chain
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

conversation = ConversationChain(
    llm = chat,
    memory = ConversationBufferMemory(), # a BaseMemory-based memory class that lets the model remember earlier turns
    verbose = True
)

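### Basic chain types
- LLM chain (`LLMChain`): consists of a prompt template and a model wrapper. It passes the formatted prompt to the LLM and returns its response.
- Router chain (`RouterChain`): dynamically selects the next chain to use for a given input. It has two parts: the router chain itself and the destination chains.
- Sequential chain (`SequentialChain`): makes a series of LLM calls one after another. It is especially suited to cases where the output of one call is used as the input of the next. It comes in two variants:
  - `SimpleSequentialChain`: for the case where every chain has a single input and a single output.
  - `SequentialChain`: for the case where chains have multiple inputs and outputs.
- Transformation chain (`TransformChain`): used for data transformation. The developer writes a custom `transform()` function that performs whatever transformation logic is needed. The function takes a dictionary (whose keys are given by `input_variables`) and returns another dictionary (whose keys are given by `output_variables`).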
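
A transformation function of this kind might look like the sketch below. It is plain Python with no LangChain import. The function name `shorten_text` and the key names `text` and `output_text` are assumptions chosen for the example.

```python
def shorten_text(inputs: dict) -> dict:
    """Keep the first three paragraphs of inputs["text"].

    Takes a dictionary keyed by the input variables and returns a new
    dictionary keyed by the output variables.
    """
    paragraphs = inputs["text"].split("\n\n")
    return {"output_text": "\n\n".join(paragraphs[:3])}


sample = {"text": "one\n\ntwo\n\nthree\n\nfour"}
print(shorten_text(sample))
```

With LangChain installed, a function like this is what gets passed as the `transform` argument of `TransformChain`, together with `input_variables=["text"]` and `output_variables=["output_text"]`.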

In [10]:
# Download the prepared data
import sqlite3
import requests

# Download the SQL script
text = requests.get("https://raw.githubusercontent.com/lerocha/chinook-database/master/ChinookDatabase/DataSources/Chinook_Sqlite.sql").text


# Create the database
db = sqlite3.connect("./Chinook.db")

# Execute the SQL script
db.executescript(text)

<sqlite3.Cursor at 0x7f87b498f7c0>

In [12]:
# Load the SQL database chain
from langchain.llms import OpenAI
from langchain.utilities import SQLDatabase
from langchain_experimental.sql import SQLDatabaseChain

db = SQLDatabase.from_uri("sqlite:///Chinook.db")

llm = OpenAI(temperature=0,verbose=True)

db_chain = SQLDatabaseChain.from_llm(llm,db,verbose=True)

db_chain.run("How many employees are there?")



> Entering new SQLDatabaseChain chain...
How many employees are there?
SQLQuery:SELECT COUNT(*) FROM Employee
SQLResult: [(8,)]
Answer:8
> Finished chain.


'8'

### Basic chains: the LLM chain

In [16]:
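# Load the source news article
import requests
import bs4

html = requests.get("https://techcrunch.com/2023/02/21/coinbase-shares-rise-q4-2022/").text

# Name the parser explicitly so BeautifulSoup does not have to guess one
soup = bs4.BeautifulSoup(html, "html.parser")

text = soup.get_text()

outputText = ""

# Collect the text of every paragraph in the article body
ptagList = soup.find("div", class_="article-content").find_all("p")

for ptag in ptagList:
    outputText += ptag.get_text()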

In [22]:
from langchain.llms import OpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

llm = OpenAI(temperature=0.9,openai_api_key=os.environ['OPENAI_API_KEY'])

# Create a prompt template for extracting facts
fact_extraction_prompt = PromptTemplate(
    input_variables = ["text_input"],
    template = (
        "Extract the key facts out of this text. Don't include opinions. "
        "Give each fact a number and keep them short sentences:\n\n"
        "{text_input}"
    )
)

In [23]:
# Create the LLM chain
fact_extraction_chain = LLMChain(llm=llm,prompt=fact_extraction_prompt)

facts = fact_extraction_chain.run(outputText)

print(facts)



1. Coinbase released its Q4 2022 earnings report.
2. Revenue in Q4 was $605 million, down from $2.49 billion in the same period last year.
3. Trading volume and revenues decreased due to a 64% decline in overall crypto market capitalization.
4. Coinbase's stock has risen 86% year-to-date.
5. Trading revenue in Q4 was lower than in Q3.
6. Subscription and services revenue increased by 34% in Q4.
7. The number of monthly active developers in crypto has more than doubled since 2020.
8. Major brands like Starbucks, Nike, and Adidas have entered the crypto space.
9. Trading volume for both consumer and institutional users decreased in Q4.
10. The crypto industry is hoping for greater adoption and trading volume.
11. It is unclear if trading interest will pick back up in 2023 or if Coinbase will rely on other sources of revenue.


### Sequential chains

In [24]:
# Define the template
investor_update_prompt = PromptTemplate(
    input_variables = ["facts"],
    template = "You are a Goldman Sachs analyst. Take the following list of facts and use them to write a short paragraph for investors. Don't leave out key info:\n\n {facts}"
)

In [25]:
# Run the chain the regular way to write the summary
investor_update_chain = LLMChain(llm = llm,prompt = investor_update_prompt)
investor_update = investor_update_chain.run(facts)

print(investor_update)
len(investor_update)



According to Coinbase's Q4 2022 earnings report, the company's revenue in the fourth quarter was $605 million, showing a decline from $2.49 billion in the same period last year. This decrease is largely attributed to the overall 64% decline in crypto market capitalization. Despite this, Coinbase's stock has seen an 86% increase year-to-date, indicating a strong performance compared to other companies in the crypto space. It is worth noting that trading revenue in Q4 was lower than in Q3, however, subscription and services revenue saw a healthy increase of 34%. Additionally, there has been a significant increase in the number of monthly active developers in the crypto industry, with major brands like Starbucks, Nike, and Adidas now involved in the space. While Q4 saw a decrease in trading volume for both consumer and institutional users, the industry is hopeful for greater adoption and trading volume in the future. As investors, it is important to note that it is unclear if trading in

1087

In [26]:
# Use a simple sequential chain to combine fact extraction and summarization
from langchain.chains import SimpleSequentialChain,SequentialChain

full_chain = SimpleSequentialChain(
    chains = [fact_extraction_chain,investor_update_chain],
    verbose = True
)

response = full_chain.run(outputText)



> Entering new SimpleSequentialChain chain...

1. Coinbase released its Q4 2022 earnings report on Tuesday.
2. The company's shares are down in early after-hours trading.
3. In Q4 2022, Coinbase generated $605 million in total revenue, down from $2.49 billion in the previous year.
4. The company reported a net loss of $557 million on a GAAP basis and an adjusted EBITDA deficit of $124 million in Q4.
5. Wall Street expected revenue of $581.2 million and adjusted EBITDA of -$201.8 million.
6. Coinbase's stock had risen 86% year-to-date.
7. The value of Coinbase, measured on a per-share basis, is down significantly from its 52-week high.
8. Consumer and institutional trading volumes declined in Q4 2022.
9. The overall crypto market capitalization fell by $1.5 trillion in 2022.
10. Coinbase's total trading volumes and revenues fell 50% and 66% year-over-year, respectively.
11. Trading revenue at Coinbase also declined in Q4 compared to the previous quarter.
12. Coin
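
Conceptually, a simple sequential chain feeds the single output of each step into the next step as its single input. The sketch below shows that idea with ordinary functions. `run_sequentially`, `extract_facts`, and `write_update` are hypothetical stand-ins written for this example and make no LLM calls.

```python
from functools import reduce
from typing import Callable, Sequence


def run_sequentially(steps: Sequence[Callable[[str], str]], text: str) -> str:
    """Pass `text` through each step in order, piping output to input."""
    return reduce(lambda value, step: step(value), steps, text)


def extract_facts(article: str) -> str:
    # Stand-in for the fact-extraction chain: number each sentence.
    sentences = [s.strip() for s in article.split(".") if s.strip()]
    return "\n".join(f"{i}. {s}." for i, s in enumerate(sentences, start=1))


def write_update(facts: str) -> str:
    # Stand-in for the investor-update chain: wrap the facts in a summary line.
    return f"Investor update based on {len(facts.splitlines())} facts:\n{facts}"


print(run_sequentially([extract_facts, write_update], "Revenue fell. The stock rose."))
```

Because every step takes one string and returns one string, the steps can be reordered or extended without changing `run_sequentially`.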

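#### The four document-combining chains
- Stuff chain: takes a list of documents, inserts them all into a single prompt, and passes that prompt to the LLM. It suits applications where the documents are small and most calls pass in only a few of them.
- Refine chain: builds its response by iterating over the input documents and iteratively updating its answer. For each document it passes all non-document inputs, the current document, and the latest intermediate answer to an LLM chain to obtain a new answer. Because the Refine chain passes only one document to the LLM at a time, it is well suited to tasks that analyze more documents than fit in the model's context. The obvious cost is that it makes more LLM chain calls than the Stuff chain. Some tasks are also hard to complete iteratively: for example, the Refine chain may perform poorly when documents frequently cross-reference one another or when the task needs detailed information from many documents.
- MapReduce chain: first applies an LLM chain to each document individually (Map) and treats the chain outputs as new documents. It then passes all the new documents to a separate combine-documents chain to obtain a single output (Reduce). If necessary, this compression step is applied recursively.
- Rerank chain (MapRerank): like the MapReduce chain, it runs an initial prompt on each document. That prompt not only tries to complete a specific task (such as answering a question or performing an action) but also gives a confidence score for its answer. The scores are then used to re-rank all the documents or entries, and the highest-scoring response is returned. This helps pick the most suitable, accurate, or relevant one out of several possible answers or solutions. By adding this re-ranking or re-scoring step, the rerank chain further improves system performance and accuracy.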
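
The four strategies differ mainly in how many model calls they make and what each call sees. The sketch below is a simplified illustration under stated assumptions: the "LLM" is any callable from prompt string to reply string, the prompts are made up, the recursive collapse step of MapReduce is omitted, and none of the function names are LangChain APIs.

```python
from typing import Callable, List, Tuple

LLM = Callable[[str], str]


def stuff(docs: List[str], llm: LLM) -> str:
    """Stuff: put every document into one prompt and call the model once."""
    return llm("Summarize:\n" + "\n".join(docs))


def refine(docs: List[str], llm: LLM) -> str:
    """Refine: one call per document, carrying the latest answer forward."""
    answer = ""
    for doc in docs:
        answer = llm(f"Current answer: {answer}\nNew document: {doc}\nRefine the answer.")
    return answer


def map_reduce(docs: List[str], llm: LLM) -> str:
    """MapReduce: one call per document (map), then one combining call (reduce)."""
    mapped = [llm(f"Summarize:\n{doc}") for doc in docs]
    return llm("Combine:\n" + "\n".join(mapped))


def map_rerank(docs: List[str], scored_llm: Callable[[str], Tuple[str, float]]) -> str:
    """MapRerank: one scored call per document, return the best-scoring answer."""
    scored = [scored_llm(doc) for doc in docs]
    return max(scored, key=lambda pair: pair[1])[0]


calls: List[str] = []


def counting_llm(prompt: str) -> str:
    # Stub model: records the prompt and returns a short placeholder reply.
    calls.append(prompt)
    return f"reply-{len(calls)}"


docs = ["doc A", "doc B", "doc C"]
for strategy in (stuff, refine, map_reduce):
    calls.clear()
    strategy(docs, counting_llm)
    print(strategy.__name__, "made", len(calls), "model calls")
```

Counting calls with a stub like `counting_llm` makes the cost trade-off visible before any real model is involved.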

### The end point of the LEDVR workflow: "chaining up"

In [52]:
# First, load the document from the web
import requests
import bs4
from langchain_core.documents.base import Document

url = "https://www.runoob.com/rust/rust-tutorial.html"

html = requests.get(url).text

# Name the parser explicitly so BeautifulSoup does not have to guess one
soup = bs4.BeautifulSoup(html, "html.parser")

text = soup.get_text()

ptagList = soup.find("div", class_="article-body").find_all("p")

cleaned = ""

for ptag in ptagList:
    cleaned += ptag.get_text()

cleaned = cleaned.replace("\n","").replace("\r","")

# Create the document; extra fields such as the language go in metadata

doc = Document(
    page_content=cleaned,
    metadata={"language": "No language found."}
)


In [56]:
# Load the OpenAI embeddings
from langchain.embeddings.openai import OpenAIEmbeddings
embedding = OpenAIEmbeddings(openai_api_key=os.environ['OPENAI_API_KEY'])

In [55]:
# Create the character-based text splitter
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500,chunk_overlap=0)

splits = text_splitter.split_documents([doc])

In [59]:
# Initialize the FAISS vector store
from langchain.vectorstores import FAISS
vectordb = FAISS.from_documents(documents=splits,embedding=embedding)


In [60]:
# Initialize the retriever
retriever = vectordb.as_retriever()

In [61]:
# Initialize the LLM
from langchain.llms import OpenAI
# Import the conversational retrieval chain
from langchain.chains import ConversationalRetrievalChain
llm = OpenAI(openai_api_key = os.environ['OPENAI_API_KEY'])

qa = ConversationalRetrievalChain.from_llm(llm=llm,retriever=retriever)

In [64]:
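query = "Rust是由谁编写的？"  # "Who wrote Rust?" (the indexed page is in Chinese)
result = qa({"question":query,"chat_history":[]})  # no earlier turns, so the history is an empty list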

result['answer']

' Rust是由Mozilla开发的。'