# LangChain Application Development Guide (Basics)  👨‍🍳👩‍🍳

*This tutorial is based on the [LangChain Conceptual Documentation](https://docs.langchain.com/docs/)*

**Goal:** Provide an introductory understanding of LangChain's components and use cases through [ELI5](https://www.dictionary.com/e/slang/eli5/#:~:text=ELI5%20is%20short%20for%20%E2%80%9CExplain,a%20complicated%20question%20or%20problem.) examples and code snippets. For use cases, see [Part 2](https://github.com/gkamradt/langchain-tutorials/blob/main/LangChain%20Cookbook%20Part%202%20-%20Use%20Cases.ipynb). Check out the [video tutorial](https://www.youtube.com/watch?v=2xxziIWmaSA) of this notebook.

**Links:**
* [LC Conceptual Documentation](https://docs.langchain.com/docs/)
* [LC Python Documentation](https://python.langchain.com/en/latest/)
* [LC Javascript/Typescript Documentation](https://js.langchain.com/docs/)
* [LC Discord](https://discord.gg/6adMQxSpJS)
* [www.langchain.com](https://langchain.com/)
* [LC Twitter](https://twitter.com/LangChainAI)

### **What is LangChain?**
> LangChain is a framework for developing applications powered by language models.

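**~~TL~~DR**: LangChain makes the complicated parts of working and building with AI models easier. It does this in two ways:

1. **Integration** - Bring external data (for example your files, other applications, and API data) to your LLMs
2. **Agency** - Allow your LLMs to interact with their environment through decision making, using LLMs to help decide which action to take next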

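### **Why LangChain?**
1. **Components** - LangChain makes it easy to swap out the abstractions and components needed to work with language models.

2. **Customized Chains** - LangChain provides out-of-the-box support for using and customizing "chains" (a series of actions strung together).

3. **Speed 🚢** - This team ships new releases very quickly, so you can stay current with the latest LLM features.

4. **Community 👥** - Great Discord and community support, meetups, hackathons, and more.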

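Although LLMs can be simple (text in, text out), you will quickly run into friction points that LangChain helps with once you build more complex applications.

*Note: This tutorial does not cover every aspect of LangChain. Its content has been curated to get you building and having an impact as quickly as possible. For more, see the [LangChain Conceptual Documentation](https://docs.langchain.com/docs/)*

*Update October 2023: This notebook has been expanded from its original form*

You need an OpenAI API key to follow this tutorial. You can set it as an environment variable, put it in a .env file in the same directory as this Jupyter notebook, or paste it in place of 'YourAPIKey' below. If you have questions about this, paste these instructions into [ChatGPT](https://chat.openai.com/).

[How do I get an OpenAI API key?](./获取OpenAI%20APIKey.md)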

In [1]:
from dotenv import load_dotenv
import os

load_dotenv()

openai_api_key=os.getenv('OPENAI_API_KEY', 'YourAPIKey')

# LangChain Components

## Schema - The nuts and bolts of working with Large Language Models (LLMs)

### **Text**
The natural-language way to interact with LLMs

In [2]:
# You'll be working with simple strings (that'll soon grow in complexity!)
my_text = "What day comes after Friday?"
my_text

'What day comes after Friday?'

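### **Chat Messages**
Like text, but with a message type attached (System, Human, AI)

* **System** - Helpful background context that tells the AI what to do
* **Human** - Messages that represent the user's intent
* **AI** - Messages that show how the AI responded

For more, see OpenAI's [documentation](https://platform.openai.com/docs/guides/chat/introduction)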
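The three message types map onto the roles used by OpenAI's chat API: `system`, `user` and `assistant`. The sketch below illustrates that mapping in plain Python, assuming a simple list-of-dicts payload. It is not LangChain's implementation, and the `Message` dataclass and `to_openai_format` helper are hypothetical names.

```python
from dataclasses import dataclass

# OpenAI's chat API names the three roles "system", "user" and "assistant"
ROLE_MAP = {"system": "system", "human": "user", "ai": "assistant"}


@dataclass
class Message:
    """Hypothetical stand-in for SystemMessage / HumanMessage / AIMessage."""
    role: str  # one of "system", "human", "ai"
    content: str


def to_openai_format(messages):
    """Convert Message objects into the list-of-dicts payload a chat API expects."""
    return [{"role": ROLE_MAP[m.role], "content": m.content} for m in messages]


conversation = [
    Message("system", "You are a nice AI bot that helps a user figure out where to travel in one short sentence"),
    Message("human", "I like the beaches where should I go?"),
    Message("ai", "You should go to Nice, France"),
    Message("human", "What else should I do when I'm there?"),
]

payload = to_openai_format(conversation)
print([m["role"] for m in payload])  # ['system', 'user', 'assistant', 'user']
```

Order matters here. The model reads the list top to bottom, so earlier AI turns act as context for the latest human message.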

In [3]:
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage, SystemMessage, AIMessage

# This is the language model we'll use. We'll talk about what we're doing in the next section
chat = ChatOpenAI(temperature=.7, openai_api_key=openai_api_key)

Now let's create some messages that simulate a chat experience with a bot.

In [4]:
chat(
    [
        SystemMessage(content="You are a nice AI bot that helps a user figure out what to eat in one short sentence"),
        HumanMessage(content="I like tomatoes, what should I eat?")
    ]
)

AIMessage(content='You could try a caprese salad with fresh tomatoes, mozzarella, and basil.')

You can also pass in more chat history, including the AI's earlier responses.

In [5]:
chat(
    [
        SystemMessage(content="You are a nice AI bot that helps a user figure out where to travel in one short sentence"),
        HumanMessage(content="I like the beaches where should I go?"),
        AIMessage(content="You should go to Nice, France"),
        HumanMessage(content="What else should I do when I'm there?")
    ]
)

AIMessage(content='You should also explore the charming streets of the Old Town and indulge in delicious French cuisine.')

You can also leave out the system message if you want.

In [6]:
chat(
    [
        HumanMessage(content="What day comes after Thursday?")
    ]
)

AIMessage(content='Friday')

### **Documents**
An object that holds a piece of text and metadata (more information about that text)

In [7]:
from langchain.schema import Document

In [8]:
Document(page_content="This is my document. It is full of text that I've gathered from other places",
         metadata={
             'my_document_id' : 234234,
             'my_document_source' : "The LangChain Papers",
             'my_document_create_time' : 1680013019
         })

Document(page_content="This is my document. It is full of text that I've gathered from other places", metadata={'my_document_id': 234234, 'my_document_source': 'The LangChain Papers', 'my_document_create_time': 1680013019})

But you don't have to include metadata if you don't want to.

In [9]:
Document(page_content="This is my document. It is full of text that I've gathered from other places")

Document(page_content="This is my document. It is full of text that I've gathered from other places")

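## Models - The interface to the AI brains

### **Language Model**
A model that does text in ➡️ text out!

*Notice how I changed the model from the default to text-ada-001 (a very cheap, low-performance model). See more models [here](https://platform.openai.com/docs/models)*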

In [10]:
from langchain.llms import OpenAI

llm = OpenAI(model_name="text-ada-001", openai_api_key=openai_api_key)

In [11]:
llm("What day comes after Friday?")

'\n\nSaturday'

### **Chat Model**
A model that takes a series of messages and returns a message output

In [12]:
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage, SystemMessage, AIMessage

chat = ChatOpenAI(temperature=1, openai_api_key=openai_api_key)

In [13]:
chat(
    [
        SystemMessage(content="You are an unhelpful AI bot that makes a joke at whatever the user says"),
        HumanMessage(content="I would like to go to New York, how should I do this?")
    ]
)

AIMessage(content='Why did the math book go to New York? Because it had too many problems and needed a change of scenery!')

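### Function Calling Models

[Function calling models](https://openai.com/blog/function-calling-and-other-api-updates) are similar to chat models but have one extra capability: they are fine-tuned to produce structured data as output.

This comes in handy when you need to call an external service's API or extract data.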

In [14]:
chat = ChatOpenAI(model='gpt-3.5-turbo-0613', temperature=1, openai_api_key=openai_api_key)

output = chat(messages=
     [
         SystemMessage(content="You are a helpful AI bot"),
         HumanMessage(content="What’s the weather like in Boston right now?")
     ],
     functions=[{
         "name": "get_current_weather",
         "description": "Get the current weather in a given location",
         "parameters": {
             "type": "object",
             "properties": {
                 "location": {
                     "type": "string",
                     "description": "The city and state, e.g. San Francisco, CA"
                 },
                 "unit": {
                     "type": "string",
                     "enum": ["celsius", "fahrenheit"]
                 }
             },
             "required": ["location"]
         }
     }
     ]
)
output

AIMessage(content='', additional_kwargs={'function_call': {'name': 'get_current_weather', 'arguments': '{\n  "location": "Boston, MA"\n}'}})

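See the extra `additional_kwargs` passed back to us? We can take the function name and arguments from it and pass them to an external API to get data. This saves us from parsing the model's free-text output.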
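Here is a minimal sketch of that hand-off. It assumes the `additional_kwargs` shape printed above: the arguments arrive as a JSON string and are decoded before a local function is called by name. `get_current_weather` is a hypothetical stub with no real weather API behind it. `AVAILABLE_FUNCTIONS` and `dispatch` are illustrative names, not LangChain APIs.

```python
import json

# The additional_kwargs printed above, copied as a plain dict
additional_kwargs = {
    "function_call": {
        "name": "get_current_weather",
        "arguments": '{\n  "location": "Boston, MA"\n}',
    }
}


def get_current_weather(location, unit="fahrenheit"):
    """Hypothetical stub; a real version would call a weather API here."""
    return {"location": location, "unit": unit, "forecast": "sunny"}


# Registry of the local functions the model is allowed to "call"
AVAILABLE_FUNCTIONS = {"get_current_weather": get_current_weather}


def dispatch(function_call):
    func = AVAILABLE_FUNCTIONS[function_call["name"]]
    # The model returns its arguments as a JSON string, so decode them first
    kwargs = json.loads(function_call["arguments"])
    return func(**kwargs)


result = dispatch(additional_kwargs["function_call"])
print(result)
```

The model never runs anything itself. It only names a function and supplies arguments, and your code decides whether and how to execute the call.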

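### **Text Embedding Model**
Converts your text into a vector (a series of numbers that captures the semantic "meaning" of your text). Mainly used to compare two pieces of text.

*By the way: "semantic" means "relating to meaning in language or logic."*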

In [15]:
from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)

In [16]:
text = "Hi! It's time for the beach"

In [17]:
text_embedding = embeddings.embed_query(text)
print (f"Here's a sample: {text_embedding[:5]}...")
print (f"Your embedding is length {len(text_embedding)}")

Here's a sample: [-0.00019600906371495047, -0.0031846734422911363, -0.0007734206914647714, -0.019472001962491232, -0.015092319017854244]...
Your embedding is length 1536
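Comparing two texts then becomes a matter of comparing their vectors, and cosine similarity is a common metric for this. The sketch below computes it in plain Python. It uses made-up 3-dimensional vectors as an assumption for readability; the embedding above has 1536 dimensions. It is not code taken from LangChain or OpenAI.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = unrelated (orthogonal)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional vectors for illustration only
beach = [0.9, 0.1, 0.0]
ocean = [0.8, 0.2, 0.1]
tax_form = [0.0, 0.1, 0.9]

print(f"beach vs ocean:    {cosine_similarity(beach, ocean):.3f}")
print(f"beach vs tax_form: {cosine_similarity(beach, tax_form):.3f}")
```

You could pass two real embeddings from `embeddings.embed_query(...)` to the same function, because it works for vectors of any length.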


## Prompts - Text generally used as instructions to your model

### **Prompt**
What you'll pass to the underlying model

In [18]:
from langchain.llms import OpenAI

llm = OpenAI(model_name="text-davinci-003", openai_api_key=openai_api_key)

# I like to use three double quotation marks for my prompts because it's easier to read
prompt = """
Today is Monday, tomorrow is Wednesday.

What is wrong with that statement?
"""

print(llm(prompt))


The statement is incorrect. Tomorrow is Tuesday, not Wednesday.


### **Prompt Template**
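An object that helps create prompts from a combination of user input, other non-static information, and a fixed template string.

Think of it as a Python [f-string](https://realpython.com/python-f-strings/), but for prompts.

*Advanced: Check out [LangSmithHub](https://smith.langchain.com/hub) for more community prompt templates*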

In [19]:
from langchain.llms import OpenAI
from langchain import PromptTemplate

llm = OpenAI(model_name="text-davinci-003", openai_api_key=openai_api_key)

# Notice "location" below, that is a placeholder for another value later
template = """
I really want to travel to {location}. What should I do there?

Respond in one short sentence
"""

prompt = PromptTemplate(
    input_variables=["location"],
    template=template,
)

final_prompt = prompt.format(location='Rome')

print (f"Final Prompt: {final_prompt}")
print ("-----------")
print (f"LLM Output: {llm(final_prompt)}")

Final Prompt: 
I really want to travel to Rome. What should I do there?

Respond in one short sentence

-----------
LLM Output: Visit the Colosseum, the Vatican, and the Trevi Fountain.
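To make the f-string analogy concrete, here is a plain-Python sketch of what a prompt template does. It finds the `{placeholders}` in a fixed string and fills them in, raising an error if a value is missing. `SimplePromptTemplate` is a hypothetical illustration, not LangChain's `PromptTemplate`. It relies only on the standard library's `string.Formatter.parse` and `str.format`.

```python
from string import Formatter


class SimplePromptTemplate:
    """Hypothetical stand-in for PromptTemplate: a fixed template string with named {placeholders}."""

    def __init__(self, template):
        self.template = template
        # Formatter().parse yields (literal_text, field_name, format_spec, conversion) tuples
        self.input_variables = [field for _, field, _, _ in Formatter().parse(template) if field]

    def format(self, **values):
        missing = set(self.input_variables) - set(values)
        if missing:
            raise KeyError(f"Missing values for: {sorted(missing)}")
        return self.template.format(**values)


travel_template = SimplePromptTemplate(
    "I really want to travel to {location}. What should I do there?\n\nRespond in one short sentence"
)
print(travel_template.input_variables)  # ['location']
print(travel_template.format(location="Rome"))
```

Unlike a bare f-string, the template exists before the values do, so you can define it once and fill it many times.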


### **Example Selectors**
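An easy way to select from a series of examples, letting you dynamically place in-context information into your prompt. Often used when your task is nuanced or you have a long list of examples.

Check out the different types of example selectors [here](https://python.langchain.com/docs/modules/model_io/prompts/example_selectors/)

If you want to understand why examples matter (prompt engineering), check out [this video](https://www.youtube.com/watch?v=dOxUroR57xs)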

In [20]:
from langchain.prompts.example_selector import SemanticSimilarityExampleSelector
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
from langchain.prompts import FewShotPromptTemplate, PromptTemplate
from langchain.llms import OpenAI

llm = OpenAI(model_name="text-davinci-003", openai_api_key=openai_api_key)

example_prompt = PromptTemplate(
    input_variables=["input", "output"],
    template="Example Input: {input}\nExample Output: {output}",
)

# Examples of places where things (nouns) are usually found
examples = [
    {"input": "pirate", "output": "ship"},
    {"input": "pilot", "output": "plane"},
    {"input": "driver", "output": "car"},
    {"input": "tree", "output": "ground"},
    {"input": "bird", "output": "nest"},
]



In [21]:
# SemanticSimilarityExampleSelector will select examples that are similar to your input by semantic meaning

example_selector = SemanticSimilarityExampleSelector.from_examples(
    # This is the list of examples available to select from.
    examples, 
    
    # This is the embedding class used to produce embeddings which are used to measure semantic similarity.
    OpenAIEmbeddings(openai_api_key=openai_api_key), 
    
    # This is the VectorStore class that is used to store the embeddings and do a similarity search over.
    Chroma, 
    
    # This is the number of examples to produce.
    k=2
)

In [22]:
similar_prompt = FewShotPromptTemplate(
    # The object that will help select examples
    example_selector=example_selector,
    
    # Your prompt
    example_prompt=example_prompt,
    
    # Customizations that will be added to the top and bottom of your prompt
    prefix="Give the location an item is usually found in",
    suffix="Input: {noun}\nOutput:",
    
    # What inputs your prompt will receive
    input_variables=["noun"],
)

In [23]:
# Select a noun!
my_noun = "plant"
# my_noun = "student"

print(similar_prompt.format(noun=my_noun))

Give the location an item is usually found in

Example Input: tree
Example Output: ground

Example Input: bird
Example Output: nest

Input: plant
Output:


In [24]:
llm(similar_prompt.format(noun=my_noun))

' pot'
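The selection step can be sketched without an embedding API. It ranks the examples by vector similarity to the input, keeps the top `k`, and places them between a prefix and a suffix. In this sketch `toy_embed` is a hypothetical stand-in that counts letters, so it captures spelling rather than meaning. The real `SemanticSimilarityExampleSelector` above uses `OpenAIEmbeddings` and Chroma, so its choices will differ.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical toy 'embedding': letter counts over a-z (captures spelling, not meaning)."""
    counts = Counter(text.lower())
    return [counts.get(chr(c), 0) for c in range(ord("a"), ord("z") + 1)]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_examples(query, examples, k=2):
    """Return the k examples whose 'input' is most similar to the query."""
    query_vec = toy_embed(query)
    ranked = sorted(
        examples,
        key=lambda ex: cosine_similarity(query_vec, toy_embed(ex["input"])),
        reverse=True,
    )
    return ranked[:k]


def build_few_shot_prompt(query, examples, k=2):
    """Mimic the prefix / selected examples / suffix layout printed above."""
    blocks = [f"Example Input: {ex['input']}\nExample Output: {ex['output']}"
              for ex in select_examples(query, examples, k)]
    return "\n\n".join(["Give the location an item is usually found in", *blocks, f"Input: {query}\nOutput:"])


examples = [
    {"input": "pirate", "output": "ship"},
    {"input": "pilot", "output": "plane"},
    {"input": "driver", "output": "car"},
    {"input": "tree", "output": "ground"},
    {"input": "bird", "output": "nest"},
]

print(build_few_shot_prompt("pilot", examples))
```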

### **Output Parsers Method 1: Prompt Instructions & String Parsing**
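A helpful way to format a model's output, usually into structured output. LangChain lists more output parsers in their [documentation](https://python.langchain.com/docs/modules/model_io/output_parsers).

Two big concepts:

**1. Format Instructions** - An auto-generated prompt that tells the LLM how to format its response based on the result you want

**2. Parser** - A method that extracts the model's text output into a desired structure (usually JSON)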

In [25]:
from langchain.output_parsers import StructuredOutputParser, ResponseSchema
from langchain.prompts import ChatPromptTemplate, HumanMessagePromptTemplate
from langchain.llms import OpenAI

In [26]:
llm = OpenAI(model_name="text-davinci-003", openai_api_key=openai_api_key)

In [27]:
# How you would like your response structured. This is basically a fancy prompt template
response_schemas = [
    ResponseSchema(name="bad_string", description="This a poorly formatted user input string"),
    ResponseSchema(name="good_string", description="This is your response, a reformatted response")
]

# How you would like to parse your output
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

In [28]:
# See the prompt template you created for formatting
format_instructions = output_parser.get_format_instructions()
print (format_instructions)

The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

```json
{
	"bad_string": string  // This a poorly formatted user input string
	"good_string": string  // This is your response, a reformatted response
}
```


In [29]:
template = """
You will be given a poorly formatted string from a user.
Reformat it and make sure all the words are spelled correctly

{format_instructions}

% USER INPUT:
{user_input}

YOUR RESPONSE:
"""

prompt = PromptTemplate(
    input_variables=["user_input"],
    partial_variables={"format_instructions": format_instructions},
    template=template
)

promptValue = prompt.format(user_input="welcom to califonya!")

print(promptValue)


You will be given a poorly formatted string from a user.
Reformat it and make sure all the words are spelled correctly

The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

```json
{
	"bad_string": string  // This a poorly formatted user input string
	"good_string": string  // This is your response, a reformatted response
}
```

% USER INPUT:
welcom to califonya!

YOUR RESPONSE:



In [30]:
llm_output = llm(promptValue)
llm_output

'```json\n{\n\t"bad_string": "welcom to califonya!", \n\t"good_string": "Welcome to California!"\n}\n```'

In [31]:
output_parser.parse(llm_output)

{'bad_string': 'welcom to califonya!', 'good_string': 'Welcome to California!'}
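The parser half of this pattern amounts to finding the fenced JSON block in the model's text and decoding it. The sketch below is a simplified, assumption-level version of that idea, not the source of `StructuredOutputParser`. The real parser also checks that the expected keys are present. `parse_markdown_json` is a hypothetical name, and the three backticks are built programmatically so the example does not break the surrounding Markdown.

```python
import json
import re

FENCE = "`" * 3  # three backticks


def parse_markdown_json(text):
    """Simplified sketch: pull the JSON out of a json-tagged markdown fence and decode it."""
    match = re.search(FENCE + r"json\s*(.*?)\s*" + FENCE, text, re.DOTALL)
    payload = match.group(1) if match else text  # fall back to the raw text when there is no fence
    return json.loads(payload)


# The LLM output shown above
sample_output = FENCE + 'json\n{\n\t"bad_string": "welcom to califonya!", \n\t"good_string": "Welcome to California!"\n}\n' + FENCE
print(parse_markdown_json(sample_output))
```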

### **Output Parsers Method 2: OpenAI Functions**
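When OpenAI released function calling, the game changed. This is the recommended method when starting out.

OpenAI trained these models specifically to output structured data, so it became very easy to specify a Pydantic schema and get structured output back.

There are many ways to define your schema. I prefer Pydantic models because of how organized they are. See OpenAI's [documentation](https://platform.openai.com/docs/guides/gpt/function-calling) for other methods.

To use this method you need a model that supports [function calling](https://openai.com/blog/function-calling-and-other-api-updates#:~:text=Developers%20can%20now%20describe%20functions%20to%20gpt%2D4%2D0613%20and%20gpt%2D3.5%2Dturbo%2D0613%2C). I'll use `gpt-4-0613`

**Example 1: Simple**

Let's start by defining a simple model to extract into.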

In [32]:
from langchain.pydantic_v1 import BaseModel, Field
from typing import Optional

class Person(BaseModel):
    """Identifying information about a person."""

    name: str = Field(..., description="The person's name")
    age: int = Field(..., description="The person's age")
    fav_food: Optional[str] = Field(None, description="The person's favorite food")

Then let's create a chain (more on chains later) that will do the extraction for us

In [33]:
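from langchain.chains.openai_functions import create_structured_output_chain
from langchain.prompts import ChatPromptTemplate

llm = ChatOpenAI(model='gpt-4-0613', openai_api_key=openai_api_key)

# Dedicated extraction prompt (instead of reusing the string-reformatting template above); {input} receives the text
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a world class algorithm for extracting information in structured formats."),
    ("human", "Use the given format to extract information from the following input: {input}"),
])

chain = create_structured_output_chain(Person, llm, prompt)
chain.run(
    "Sally is 13, Joey just turned 12 and loves spinach. Caroline is 10 years older than Sally."
)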

Person(name='Sally, Joey, Caroline', age=13, fav_food='spinach')

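Notice how we only got data for one person from that list? That's because we didn't specify that we wanted multiple people. Let's change our schema to say that we want a list of people, if possible.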

In [34]:
from typing import Sequence

class People(BaseModel):
    """Identifying information about all people in a text."""

    people: Sequence[Person] = Field(..., description="The people in the text")

Now we'll ask for `People` rather than `Person`

In [35]:
chain = create_structured_output_chain(People, llm, prompt)
chain.run(
    "Sally is 13, Joey just turned 12 and loves spinach. Caroline is 10 years older than Sally."
)

People(people=[Person(name='Sally', age=13, fav_food=None), Person(name='Joey', age=12, fav_food='spinach'), Person(name='Caroline', age=23, fav_food=None)])
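After the model answers, the chain decodes the function-call arguments and builds typed objects from them. The sketch below mirrors that last step with standard-library dataclasses. The `arguments` string is a hypothetical model response shaped like the `People` schema, not captured output. `PersonRecord`/`PeopleRecord` are renamed so they do not overwrite the Pydantic classes above. Pydantic would also validate field types, which this sketch skips.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PersonRecord:
    name: str
    age: int
    fav_food: Optional[str] = None


@dataclass
class PeopleRecord:
    people: List[PersonRecord]


# Hypothetical function-call arguments, shaped like what a model could return for the People schema
arguments = ('{"people": [{"name": "Sally", "age": 13}, '
             '{"name": "Joey", "age": 12, "fav_food": "spinach"}, '
             '{"name": "Caroline", "age": 23}]}')


def parse_people(raw):
    """Decode the JSON arguments and build one typed record per person."""
    data = json.loads(raw)
    return PeopleRecord(people=[PersonRecord(**p) for p in data["people"]])


result = parse_people(arguments)
print(result)
```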

Let's do some more parsing

**Example 2: Enum**

Now let's parse which products from a fixed list are mentioned in a text

In [36]:
import enum

llm = ChatOpenAI(model='gpt-4-0613', openai_api_key=openai_api_key)

class Product(str, enum.Enum):
    CRM = "CRM"
    VIDEO_EDITING = "VIDEO_EDITING"
    HARDWARE = "HARDWARE"

In [37]:
class Products(BaseModel):
    """Identifying products that were mentioned in a text"""

    products: Sequence[Product] = Field(..., description="The products mentioned in a text")

In [38]:
chain = create_structured_output_chain(Products, llm, prompt)
chain.run(
    "The CRM in this demo is great. Love the hardware. The microphone is also cool. Love the video editing"
)

Products(products=[<Product.CRM: 'CRM'>, <Product.HARDWARE: 'HARDWARE'>, <Product.VIDEO_EDITING: 'VIDEO_EDITING'>])
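The microphone is missing from the result because it is not one of the enum's members. The sketch below illustrates both sides of that constraint. First, an enum becomes a JSON Schema `"enum"` list of allowed strings, the same kind used in the weather function definition earlier. Second, raw values can be checked against it. `ProductCategory` repeats the `Product` members under a new name so it does not overwrite the class above. Dropping unknown values is this sketch's choice; Pydantic would raise a validation error instead.

```python
import enum


class ProductCategory(str, enum.Enum):
    """Same members as the Product enum above, renamed to avoid overwriting it."""
    CRM = "CRM"
    VIDEO_EDITING = "VIDEO_EDITING"
    HARDWARE = "HARDWARE"


# JSON Schema fragment that tells the model which strings are allowed
item_schema = {"type": "string", "enum": [p.value for p in ProductCategory]}


def keep_known_products(raw_values):
    """Keep only values that are valid enum members; anything else (e.g. 'MICROPHONE') is dropped."""
    valid = []
    for value in raw_values:
        try:
            valid.append(ProductCategory(value))
        except ValueError:
            pass  # not one of the allowed products
    return valid


print(item_schema)
# Hypothetical raw strings a model might return for the sentence above
print(keep_known_products(["CRM", "HARDWARE", "MICROPHONE", "VIDEO_EDITING"]))
```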

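## Indexes - Structuring documents so that LLMs can work with them

### **Document Loaders**
Easy ways to import data from other sources. They share functionality with [OpenAI Plugins](https://openai.com/blog/chatgpt-plugins), [specifically the retrieval plugin](https://github.com/openai/chatgpt-retrieval-plugin).

See the [big list](https://python.langchain.com/docs/integrations/document_loaders/) of document loaders here. There are even more on [Llama Index](https://llamahub.ai/).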

**HackerNews**

In [39]:
from langchain.document_loaders import HNLoader

In [40]:
loader = HNLoader("https://news.ycombinator.com/item?id=34422627")

In [41]:
data = loader.load()

In [42]:
print (f"Found {len(data)} comments")
print (f"Here's a sample:\n\n{''.join([x.page_content[:150] for x in data[:2]])}")

Found 76 comments
Here's a sample:

Ozzie_osman 8 months ago  
             | next [–] 

LangChain is awesome. For people not sure what it's doing, large language models (LLMs) are very Ozzie_osman 8 months ago  
             | parent | next [–] 

Also, another library to check out is GPT Index (https://github.com/jerryjliu/gpt_index)


**Books from Gutenberg Project**

A free online e-book library.

In [43]:
from langchain.document_loaders import GutenbergLoader

loader = GutenbergLoader("https://www.gutenberg.org/cache/epub/2148/pg2148.txt")

data = loader.load()

In [44]:
print(data[0].page_content[1855:1984])

      At Paris, just after dark one gusty evening in the autumn of 18-,


      I was enjoying the twofold luxury of meditation 


**URLs and Webpages**

Let's try it out with [Paul Graham's website](http://www.paulgraham.com/)

In [45]:
from langchain.document_loaders import UnstructuredURLLoader

urls = [
    "http://www.paulgraham.com/",
]

loader = UnstructuredURLLoader(urls=urls)

data = loader.load()

data[0].page_content

'New: \n\nHow to Do Great Work |\nRead |\nWill |\nTruth\n\n\n\n\n\nWant to start a startup? Get funded by Y Combinator.\n\n\n\n\n\n\n\n\n\n© mmxxiii pg'

### **Text Splitters**
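Often your document (a book, for example) is too long for your LLM, so you need to split it into chunks. Text splitters help with this.

There are many ways to split text into chunks. Experiment with [different ones](https://python.langchain.com/en/latest/modules/indexes/text_splitters.html) to see which works best for you.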

In [46]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [47]:
# This is a long document we can split up.
with open('data/PaulGrahamEssays/worked.txt') as f:
    pg_work = f.read()
    
print (f"You have {len([pg_work])} document")

You have 1 document


In [48]:
text_splitter = RecursiveCharacterTextSplitter(
    # Set a really small chunk size, just to show.
    chunk_size = 150,
    chunk_overlap  = 20,
)

texts = text_splitter.create_documents([pg_work])

In [49]:
print (f"You have {len(texts)} documents")

You have 610 documents


In [50]:
print ("Preview:")
print (texts[0].page_content, "\n")
print (texts[1].page_content)

Preview:
February 2021Before college the two main things I worked on, outside of school,
were writing and programming. I didn't write essays. I wrote what 

beginning writers were supposed to write then, and probably still
are: short stories. My stories were awful. They had hardly any plot,


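There are many different ways to split text, and the right one depends on your retrieval strategy and application design. Check out more splitters [here](https://python.langchain.com/docs/modules/data_connection/document_transformers/).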
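To see what `chunk_size` and `chunk_overlap` mean in isolation, here is a plain-Python sliding-window splitter. It is deliberately simpler than `RecursiveCharacterTextSplitter`, which first tries to break on separators such as paragraph breaks and newlines before falling back to smaller ones. This sketch ignores separators and cuts at fixed character positions. `split_text` is a hypothetical helper name.

```python
def split_text(text, chunk_size, chunk_overlap):
    """Fixed-size sliding window: each chunk repeats the last chunk_overlap characters of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


sample = "abcdefghij" * 10  # 100 characters of dummy text
pieces = split_text(sample, chunk_size=30, chunk_overlap=10)
print(len(pieces), [len(p) for p in pieces])
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one chunk, provided it is no longer than the overlap. The cost is some duplicated text.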

### **Retrievers**
An easy way to combine documents with language models.

There are many different types of retrievers; the most widely supported is the `VectorStoreRetriever`

In [51]:
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings

loader = TextLoader('data/PaulGrahamEssays/worked.txt')
documents = loader.load()

In [52]:
# Get your splitter ready
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=50)

# Split your docs into texts
texts = text_splitter.split_documents(documents)

# Get embedding engine ready
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)

# Embed your texts
db = FAISS.from_documents(texts, embeddings)

In [53]:
# Init your retriever. By default it returns the top 4 matches; use as_retriever(search_kwargs={"k": 1}) for just 1
retriever = db.as_retriever()

In [54]:
retriever

VectorStoreRetriever(tags=['FAISS'], vectorstore=<langchain.vectorstores.faiss.FAISS object at 0x7f8389169070>)

In [55]:
docs = retriever.get_relevant_documents("what types of things did the author want to build?")

In [56]:
print("\n\n".join([x.page_content[:200] for x in docs[:2]]))

standards; what was the point? No one else wanted one either, so
off they went. That was what happened to systems work.I wanted not just to build things, but to build things that would
last.In this di

much of it in grad school.Computer Science is an uneasy alliance between two halves, theory
and systems. The theory people prove things, and the systems people
build things. I wanted to build things. 


### **VectorStores**
Databases to store vectors. The most popular ones include Milvus, [Pinecone](https://www.pinecone.io/), and [Weaviate](https://weaviate.io/). There are more examples in OpenAI's [retrieval plugin documentation](https://github.com/openai/chatgpt-retrieval-plugin#choosing-a-vector-database). [Chroma](https://www.trychroma.com/) and [FAISS](https://engineering.fb.com/2017/03/29/data-infrastructure/faiss-a-library-for-efficient-similarity-search/) are easy to work with locally.

Conceptually, think of them as tables with one column for embeddings (vectors) and one column for metadata.

Example

| Embedding | Metadata |
| ----------- | ----------- |
| [-0.00015641732898075134, -0.003165106289088726, ...] | {'date' : '1/2/23'} |
| [-0.00035465431654651654, 1.4654131651654516546, ...] | {'date' : '1/3/23'} |
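The table above can be turned directly into a toy in-memory store. Each row holds an embedding, its text, and metadata. A search scores every row against the query vector by cosine similarity and returns the best `k`. `InMemoryVectorStore` is a hypothetical illustration with made-up 3-dimensional vectors. Libraries such as FAISS use indexes built for speed at scale rather than this brute-force scan.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class InMemoryVectorStore:
    """Hypothetical toy vector store: one row per text, holding (embedding, text, metadata)."""

    def __init__(self):
        self.rows = []

    def add(self, embedding, text, metadata=None):
        self.rows.append((embedding, text, metadata or {}))

    def similarity_search(self, query_embedding, k=1):
        # Brute-force scan: score every row, then return the k best texts with their metadata
        scored = sorted(self.rows, key=lambda row: cosine_similarity(query_embedding, row[0]), reverse=True)
        return [(text, metadata) for _, text, metadata in scored[:k]]


store = InMemoryVectorStore()
store.add([0.9, 0.1, 0.0], "Notes about the beach", {"date": "1/2/23"})
store.add([0.0, 0.2, 0.9], "Notes about taxes", {"date": "1/3/23"})

print(store.similarity_search([0.8, 0.2, 0.1], k=1))
```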

In [57]:
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings

loader = TextLoader('data/PaulGrahamEssays/worked.txt')
documents = loader.load()

# Get your splitter ready
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=50)

# Split your docs into texts
texts = text_splitter.split_documents(documents)

# Get embedding engine ready
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)

In [58]:
print (f"You have {len(texts)} documents")

You have 78 documents


In [59]:
embedding_list = embeddings.embed_documents([text.page_content for text in texts])

In [60]:
print (f"You have {len(embedding_list)} embeddings")
print (f"Here's a sample of one: {embedding_list[0][:3]}...")

You have 78 embeddings
Here's a sample of one: [-0.001058628615053026, -0.01118234211553424, -0.012874804746266883]...


Your vector store stores your embeddings (☝️) and makes them easy to search

## Memory
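Helping LLMs remember information.

Memory is a fairly loose term. It can be as simple as remembering what you've chatted about in the past, or as complex as more involved information retrieval.

We'll stick to the chat message use case here, which is what you'd use for chatbots.

There are many types of memory. Explore the [documentation](https://python.langchain.com/en/latest/modules/memory/how_to_guides.html) to see which one fits your use case.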

### Chat Message History

In [61]:
from langchain.memory import ChatMessageHistory
from langchain.chat_models import ChatOpenAI

chat = ChatOpenAI(temperature=0, openai_api_key=openai_api_key)

history = ChatMessageHistory()

history.add_ai_message("hi!")

history.add_user_message("what is the capital of france?")

In [62]:
history.messages

[AIMessage(content='hi!'),
 HumanMessage(content='what is the capital of france?')]

In [63]:
ai_response = chat(history.messages)
ai_response

AIMessage(content='The capital of France is Paris.')

In [64]:
history.add_ai_message(ai_response.content)
history.messages

[AIMessage(content='hi!'),
 HumanMessage(content='what is the capital of france?'),
 AIMessage(content='The capital of France is Paris.')]
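`ChatMessageHistory` keeps every message, so a long conversation will eventually exceed the model's context window. One common strategy is to keep only the most recent messages; LangChain's `ConversationBufferWindowMemory` is built around a similar idea. The sketch below is plain Python, not that class. `WindowedChatHistory` is a hypothetical name, and its messages are stored as `(role, content)` tuples.

```python
from collections import deque


class WindowedChatHistory:
    """Hypothetical sketch: keep only the most recent max_messages (role, content) pairs."""

    def __init__(self, max_messages=4):
        self.messages = deque(maxlen=max_messages)  # the oldest message is dropped automatically

    def add_user_message(self, content):
        self.messages.append(("human", content))

    def add_ai_message(self, content):
        self.messages.append(("ai", content))


window = WindowedChatHistory(max_messages=2)
window.add_ai_message("hi!")
window.add_user_message("what is the capital of france?")
window.add_ai_message("The capital of France is Paris.")
print(list(window.messages))
```

The trade-off is that anything outside the window, here the opening "hi!", is forgotten entirely.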

## Chains ⛓️⛓️⛓️
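Automatically combine different LLM calls and actions

Example: Summary #1, Summary #2, Summary #3 > Final Summary

Check out [this video](https://www.youtube.com/watch?v=f9_BWhCI4Zo&t=2s) explaining the different summarization chain types

There are many [applications of chains](https://python.langchain.com/docs/modules/chains/); browse them to see which best fit your use case.

We'll cover two of them: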

### 1. Simple Sequential Chains

Simple chains where the output of one LLM is used as the input to another. Good for breaking up tasks (and keeping your LLM focused)
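Stripped of the LLM calls, a simple sequential chain is function composition: each step receives the previous step's single output as its only input. The sketch below shows that data flow with two hypothetical stubs, `suggest_dish` and `write_recipe`, standing in for the location and meal chains. `run_sequential` is an illustrative name, not a LangChain API.

```python
def suggest_dish(location):
    """Hypothetical stand-in for location_chain (a real version would call an LLM)."""
    dishes = {"Rome": "Spaghetti alla Carbonara", "Tokyo": "Ramen"}
    return dishes.get(location, "a local specialty")


def write_recipe(meal):
    """Hypothetical stand-in for meal_chain."""
    return f"Recipe for {meal}: gather ingredients, cook, serve."


def run_sequential(steps, first_input):
    """Feed each step's single output into the next step."""
    value = first_input
    for step in steps:
        value = step(value)
    return value


print(run_sequential([suggest_dish, write_recipe], "Rome"))
```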

In [65]:
from langchain.llms import OpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.chains import SimpleSequentialChain

llm = OpenAI(temperature=1, openai_api_key=openai_api_key)

In [66]:
template = """Your job is to come up with a classic dish from the area that the user suggests.
% USER LOCATION
{user_location}

YOUR RESPONSE:
"""
prompt_template = PromptTemplate(input_variables=["user_location"], template=template)

# Holds my 'location' chain
location_chain = LLMChain(llm=llm, prompt=prompt_template)

In [67]:
template = """Given a meal, give a short and simple recipe on how to make that dish at home.
% MEAL
{user_meal}

YOUR RESPONSE:
"""
prompt_template = PromptTemplate(input_variables=["user_meal"], template=template)

# Holds my 'meal' chain
meal_chain = LLMChain(llm=llm, prompt=prompt_template)

In [68]:
overall_chain = SimpleSequentialChain(chains=[location_chain, meal_chain], verbose=True)

In [69]:
review = overall_chain.run("Rome")



> Entering new SimpleSequentialChain chain...

A classic dish from Rome is Spaghetti alla Carbonara, featuring egg, Parmesan cheese, black pepper, and pancetta or guanciale.

Ingredients:
- 8oz spaghetti 
- 4 tablespoons olive oil
- 4oz diced pancetta or guanciale
- 2 cloves garlic, minced
- 2 eggs, lightly beaten
- 2 tablespoons parsley, chopped 
- ½ cup grated Parmesan 
- Salt and black pepper to taste

Instructions:
1. Bring a pot of salted water to a boil and add the spaghetti. Cook according to package directions. 
2. Meanwhile, add the olive oil to a large skillet over medium-high heat. Add the diced pancetta and garlic, and cook until pancetta is browned and garlic is fragrant.
3. In a medium bowl, whisk together the eggs, parsley, Parmesan, and salt and pepper.
4. Drain the cooked spaghetti and add it to the skillet with the pancetta and garlic. Remove from heat and pour the egg mixture over the spaghetti, stirring to combine. 
5. Serve t

### 2. Summarization Chain

Easily work through many long documents and get a summary. Besides map-reduce, check out [this video](https://www.youtube.com/watch?v=f9_BWhCI4Zo) for other chain types
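The `map_reduce` strategy has two phases. In the map phase, each chunk is summarized independently. In the reduce phase, the partial summaries are combined and summarized again. The sketch below shows only that control flow. `summarize` is a hypothetical stub that keeps the first sentence, where the real chain would call the LLM with a summarization prompt.

```python
def summarize(text):
    """Hypothetical stand-in for an LLM summary call: keep only the first sentence."""
    first_sentence = text.split(". ")[0].rstrip(".")
    return first_sentence + "."


def map_reduce_summary(chunks):
    # Map: summarize each chunk on its own (the calls are independent, so they could run in parallel)
    partial_summaries = [summarize(chunk) for chunk in chunks]
    # Reduce: join the partial summaries and summarize the combined text once more
    final_summary = summarize(" ".join(partial_summaries))
    return partial_summaries, final_summary


chunks = [
    "Newton studied physics. He also spent years on alchemy.",
    "Biographies tend to omit mistakes. That hides how much risk scientists took.",
]
partials, final = map_reduce_summary(chunks)
print(partials)
print(final)
```

Because each map call sees only one chunk, no single call has to fit the whole document into the model's context window.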

In [70]:
from langchain.chains.summarize import load_summarize_chain
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

loader = TextLoader('data/PaulGrahamEssays/disc.txt')
documents = loader.load()

# Get your splitter ready
text_splitter = RecursiveCharacterTextSplitter(chunk_size=700, chunk_overlap=50)

# Split your docs into texts
texts = text_splitter.split_documents(documents)

# There is a lot of complexity hidden in this one line. I encourage you to check out the video above for more detail
chain = load_summarize_chain(llm, chain_type="map_reduce", verbose=True)
chain.run(texts)



> Entering new MapReduceDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Write a concise summary of the following:


"January 2017Because biographies of famous scientists tend to 
edit out their mistakes, we underestimate the 
degree of risk they were willing to take.
And because anything a famous scientist did that
wasn't a mistake has probably now become the
conventional wisdom, those choices don't
seem risky either.Biographies of Newton, for example, understandably focus
more on physics than alchemy or theology.
The impression we get is that his unerring judgment
led him straight to truths no one else had noticed.
How to explain all the time he spent on alchemy
and theology?  Well, smart people are often kind of
crazy.But maybe there is a simpler explanation. Maybe"


CONCISE SUMMARY:[0m
Prompt after formatting:
[32;1m[1;3mWrite a concise summary of the following:


"the smartness and the craziness were not as sepa

" Biographies tend to omit famous scientists' mistakes from their stories, but Newton was willing to take risks and explore multiple fields to make his discoveries. He placed three risky bets, one of which resulted in the creation of physics as we know it today."

## Agents 🤖🤖

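The official LangChain documentation describes agents perfectly:
> Some applications will require not just a predetermined chain of calls to LLMs/other tools, but potentially an **unknown chain** that depends on the user's input. In these types of chains, there is an "agent" which has access to a suite of tools. Depending on the user input, the agent can then **decide which, if any, of these tools to call**.

Basically, you use the LLM not just for text output but also for decision making. It's hard to overstate how cool and powerful this is.

Sam Altman emphasizes that LLMs are good '[reasoning engines](https://www.youtube.com/watch?v=L_Guz73e6fw&t=867s)'. Agents take advantage of this.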

### Tools

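The "capabilities" of an agent. A tool is an abstraction on top of a function that makes it easy for LLMs (and agents) to interact with it. Example: Google search.

This area shares commonalities with [OpenAI plugins](https://platform.openai.com/docs/plugins/introduction).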

### Toolkit

A group of tools that your agent can select from
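Before wiring up a real agent, it helps to see one step of the loop by hand. The LLM writes `Action:` and `Action Input:` lines in the format visible in the agent log further down. The program parses them, looks the tool up by name, and runs it to get an observation. Everything in this sketch is a hypothetical stand-in: `SimpleTool`, `fake_search` (no search API is called) and `tools_registry` are illustrative names, not LangChain's `Tool` or `load_tools`.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class SimpleTool:
    """Hypothetical stand-in for a Tool: a name, a description the LLM reads, and a function."""
    name: str
    description: str
    func: Callable[[str], str]


def fake_search(query):
    # Hypothetical stub; a real tool would call a search API such as SerpAPI
    return f"Search results for {query!r}"


# A "toolkit": the tools the agent may choose from, looked up by name
tools_registry = {t.name: t for t in [SimpleTool("Search", "Look up current facts on the web", fake_search)]}


def parse_action(llm_text):
    """Pull the tool name and input out of ReAct-style 'Action:' / 'Action Input:' lines."""
    action = re.search(r"Action:\s*(.+)", llm_text).group(1).strip()
    action_input = re.search(r"Action Input:\s*(.+)", llm_text).group(1).strip().strip('"')
    return action, action_input


llm_text = ('I should try to find out what band Natalie Bergman is a part of.\n'
            'Action: Search\n'
            'Action Input: "Natalie Bergman band"')
action, action_input = parse_action(llm_text)
observation = tools_registry[action].func(action_input)
print(observation)
```

A real agent repeats this cycle. The observation goes back to the LLM, and the loop continues until the LLM produces a final answer instead of another action.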

Let's bring them all together:

In [71]:
from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.llms import OpenAI
import json

llm = OpenAI(temperature=0, openai_api_key=openai_api_key)

In [72]:
serpapi_api_key=os.getenv("SERP_API_KEY", "YourAPIKey")

In [73]:
toolkit = load_tools(["serpapi"], llm=llm, serpapi_api_key=serpapi_api_key)

In [74]:
agent = initialize_agent(toolkit, llm, agent="zero-shot-react-description", verbose=True, return_intermediate_steps=True)

In [75]:
response = agent({"input":"what was the first album of the "
                    "band that Natalie Bergman is a part of?"})



> Entering new AgentExecutor chain...
I should try to find out what band Natalie Bergman is a part of.
Action: Search
Action Input: "Natalie Bergman band"
Observation: ['Natalie Bergman is an American singer-songwriter. She is one half of the duo Wild Belle, along with her brother Elliot Bergman. Her debut solo album, Mercy, was released on Third Man Records on May 7, 2021. She is based in Los Angeles.', 'Natalie Bergman type: American singer-songwriter.', 'Natalie Bergman main_tab_text: Overview.', 'Natalie Bergman kgmid: /m/0qgx4kh.', 'Natalie Bergman genre: Folk.', 'Natalie Bergman parents: Susan Bergman, Judson Bergman.', 'Natalie Bergman born: 1988 or 1989 (age 34–35).', 'Natalie Bergman is an American singer-songwriter. She is one half of the duo Wild Belle, along with her brother Elliot Bergman. Her debut solo album, Mercy, ...']
Thought: I should search for the first album of Wild Belle
Action: Search
Action Input: "Wild

![Wild Belle](../res/WildBelle1.png)

🎵Enjoy🎵
https://open.spotify.com/track/1eREJIBdqeCcqNCB1pbz7w?si=c014293b63c7478c