# LLMs
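Large Language Models (LLMs) are a core component of LangChain. LangChain does not serve its own LLMs; rather, it provides a standard interface for interacting with many different LLMs. Specifically, this interface takes a string as input and returns a string.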

There are lots of LLM providers (OpenAI, Cohere, Hugging Face, etc.); the `LLM` class is designed to provide a standard interface for all of them.
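
The following plain-Python sketch makes the idea of a standard interface concrete. The `TextLLM` protocol and both provider classes are hypothetical stand-ins, not LangChain classes. The point is that application code written against one string-in, string-out method works unchanged with any provider.

```python
from typing import Protocol


class TextLLM(Protocol):
    """Hypothetical minimal interface: a string prompt in, a string completion out."""

    def invoke(self, prompt: str) -> str:
        ...


class UpperCaseProvider:
    """Stand-in for one provider: it upper-cases the prompt."""

    def invoke(self, prompt: str) -> str:
        return prompt.upper()


class ReverseProvider:
    """Stand-in for another provider: it reverses the prompt."""

    def invoke(self, prompt: str) -> str:
        return prompt[::-1]


def summarize(llm: TextLLM, text: str) -> str:
    # Application code depends only on the shared `invoke` method,
    # so either provider can be swapped in without changing this function.
    return llm.invoke(f"Summarize: {text}")


for provider in (UpperCaseProvider(), ReverseProvider()):
    print(summarize(provider, "abc"))
```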

In [1]:
from dotenv import load_dotenv, find_dotenv
from langchain.globals import set_debug

load_dotenv(find_dotenv())
set_debug(False)

## Quick Start

In this walkthrough we'll use the OpenAI LLM wrapper, although the functionality highlighted is generic for all LLM types.


In [2]:
from langchain_openai import OpenAI

llm = OpenAI()

### LCEL
LLMs implement the `Runnable` interface, the basic building block of the LangChain Expression Language (LCEL). This means they support `invoke`, `ainvoke`, `stream`, `astream`, `batch`, `abatch`, and `astream_log` calls.
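
As a rough illustration of why every LLM gets all of these methods, the sketch below derives `batch`, `stream`, `ainvoke`, and `abatch` from a single `invoke`. `MiniRunnable` and `EchoLLM` are hypothetical names, and this is not LangChain's actual `Runnable` code, only the same general idea. It calls `asyncio.run`, so run it as a standalone script (Python 3.9+) rather than in a notebook, where an event loop is already running.

```python
import asyncio
from typing import Iterator, List


class MiniRunnable:
    """Hypothetical base class: subclasses implement `invoke`, the rest falls back to it."""

    def invoke(self, prompt: str) -> str:
        raise NotImplementedError

    def batch(self, prompts: List[str]) -> List[str]:
        # Default batch: one invoke per input, results in input order.
        return [self.invoke(p) for p in prompts]

    def stream(self, prompt: str) -> Iterator[str]:
        # Default stream: a single chunk holding the complete result.
        yield self.invoke(prompt)

    async def ainvoke(self, prompt: str) -> str:
        # Default async: run the blocking invoke in a worker thread.
        return await asyncio.to_thread(self.invoke, prompt)

    async def abatch(self, prompts: List[str]) -> List[str]:
        # Run the async calls concurrently; gather returns them in input order.
        return list(await asyncio.gather(*(self.ainvoke(p) for p in prompts)))


class EchoLLM(MiniRunnable):
    """Toy model that echoes the first five characters of the prompt."""

    def invoke(self, prompt: str) -> str:
        return prompt[:5]


echo = EchoLLM()
print(echo.batch(["woof woof", "meow meow"]))  # ['woof ', 'meow ']
print(list(echo.stream("hello world")))  # ['hello']
print(asyncio.run(echo.abatch(["abcdefg", "xyz"])))  # ['abcde', 'xyz']
```

Subclasses that can do better (for example, a provider with native token streaming) simply override the corresponding method.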

LLMs accept strings as inputs, as well as objects that can be coerced to string prompts, including `List[BaseMessage]` and `PromptValue`.
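
The sketch below shows what coercing chat-style input to a string prompt can look like. `to_string_prompt` and the `ROLE_PREFIX` table are hypothetical; they are modeled on the `Role: content` layout LangChain uses when it renders chat messages as text, which is why the custom LLM chain example further down returns `'System:'`.

```python
from typing import List, Sequence, Tuple, Union

# Hypothetical mapping from role names to the prefixes used in the rendered text.
ROLE_PREFIX = {"system": "System", "human": "Human", "ai": "AI"}

PromptInput = Union[str, Sequence[Tuple[str, str]]]


def to_string_prompt(value: PromptInput) -> str:
    """Coerce a plain string or a list of (role, content) pairs to one string prompt."""
    if isinstance(value, str):
        return value  # already a string prompt, pass it through unchanged
    lines: List[str] = []
    for role, content in value:
        lines.append(f"{ROLE_PREFIX[role]}: {content}")
    return "\n".join(lines)


print(to_string_prompt("hello"))
print(to_string_prompt([("system", "you are a bot"), ("human", "hello there!")]))
```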


In [3]:
llm.invoke(
    "What are some theories about the relationship between unemployment and inflation?"
)

"\n\n1. Phillips Curve Theory: This theory suggests that there is an inverse relationship between unemployment and inflation. As unemployment decreases, inflation increases and vice versa. This is because when there is low unemployment, there is higher demand for goods and services, leading to an increase in prices.\n\n2. Natural Rate of Unemployment Theory: This theory argues that there is a natural rate of unemployment in the economy, which is not affected by inflation. Any deviation from this rate will result in a trade-off between unemployment and inflation.\n\n3. Cost-Push Inflation Theory: According to this theory, inflation is caused by an increase in the cost of production, such as wages, which leads to higher prices for goods and services. This can result in higher unemployment as businesses may not be able to afford to hire as many workers.\n\n4. Demand-Pull Inflation Theory: This theory suggests that inflation is caused by excess demand for goods and services, leading to an 

In [4]:
for chunk in llm.stream(
    "What are some theories about the relationship between unemployment and inflation?"
):
    print(chunk, end="", flush=True)



1. The Phillips Curve: This theory suggests that there is an inverse relationship between unemployment and inflation. As unemployment decreases, wages and demand for goods and services increase, leading to higher inflation. Conversely, when unemployment rises, there is less demand for goods and services, which can lead to lower inflation.

2. Demand-Pull Inflation: This theory suggests that when unemployment is low, consumers have more money to spend, leading to an increase in demand for goods and services. This increase in demand can push prices up, leading to inflation.

3. Cost-Push Inflation: This theory suggests that when unemployment is high, businesses may have to pay higher wages to attract workers, which increases their production costs. As a result, businesses may increase prices to cover these costs, leading to inflation.

4. Rational Expectations Theory: This theory suggests that individuals and businesses base their economic decisions on their expectations of future infl

In [5]:
llm.batch(
    [
        "What are some theories about the relationship between unemployment and inflation?"
    ]
)

['\n\n1. The Phillips Curve: This theory states that there is an inverse relationship between unemployment and inflation. When unemployment is low, inflation tends to be high and vice versa. This is because when there is low unemployment, workers have more bargaining power and can demand higher wages, leading to an increase in prices.\n\n2. Expectations-augmented Phillips Curve: This theory builds on the original Phillips Curve by taking into account the role of inflation expectations. It suggests that if people expect inflation to increase, they will demand higher wages, leading to an increase in inflation.\n\n3. Natural Rate of Unemployment: According to this theory, there is a natural rate of unemployment in the economy, which is determined by structural factors such as demographics, technology, and labor market regulations. When the unemployment rate is below the natural rate, inflation tends to increase.\n\n4. Cost-push Inflation: This theory suggests that inflation is caused by a

In [6]:
await llm.ainvoke(
    "What are some theories about the relationship between unemployment and inflation?"
)

'\n\n1. Phillips Curve Theory: This theory states that there is an inverse relationship between unemployment and inflation. When unemployment is low, there is a high demand for workers, which leads to rising wages and increased consumer spending, ultimately causing inflation to rise. Conversely, when unemployment is high, there is less demand for workers, leading to stagnant or falling wages and decreased consumer spending, resulting in lower inflation.\n\n2. Natural Rate of Unemployment Theory: This theory suggests that there is a natural rate of unemployment in the economy that is not affected by changes in inflation. Any deviation from this rate will only have a temporary impact on inflation, as wages and prices will eventually adjust to the natural rate of unemployment.\n\n3. Expectations-Augmented Phillips Curve Theory: This theory builds on the Phillips Curve by incorporating the role of inflation expectations. It suggests that when people expect prices to rise, they will demand 

In [None]:
async for chunk in llm.astream(
    "What are some theories about the relationship between unemployment and inflation?"
):
    print(chunk, end="", flush=True)

In [None]:
await llm.abatch(
    [
        "What are some theories about the relationship between unemployment and inflation?"
    ]
)

In [None]:
async for chunk in llm.astream_log(
    "What are some theories about the relationship between unemployment and inflation?"
):
    print(chunk)

## Custom LLM
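If you want to use your own LLM, or a different wrapper than one that is supported in LangChain, this guide shows how to create a custom LLM wrapper.

Wrapping your LLM with the standard `LLM` interface allows you to use it in existing LangChain programs with minimal code modifications.

As a bonus, your LLM will automatically become a LangChain `Runnable` and will benefit from some optimizations out of the box, the `astream_events` API, etc.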

### Implementation
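There are only two required things that a custom LLM needs to implement:

| Method | Description |
| ------------ | ----------- |
| `_call` | Takes in a string and some optional stop words, and returns a string. Used by `invoke`. |
| `_llm_type` | A property that returns a string, used for logging purposes only. |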

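Optional implementations:

| Method | Description |
| ------------ | ----------- |
| `_identifying_params` | Used to help with identifying the model and printing the LLM; should return a dictionary. This is a **@property**. |
| `_acall` | Provides an async native implementation of `_call`, used by `ainvoke`. |
| `_stream` | Method to stream the output token by token. |
| `_astream` | Provides an async native implementation of `_stream`; in newer LangChain versions, defaults to `_stream`. |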


In [7]:
from typing import Any, Dict, Iterator, List, Optional

from langchain_core.callbacks.manager import CallbackManagerForLLMRun
from langchain_core.language_models.llms import LLM
from langchain_core.outputs import GenerationChunk


class CustomLLM(LLM):
    """A custom LLM that echoes the first `n` characters of the input.

    When contributing an implementation to LangChain, carefully document
    the model including the initialization parameters, include
    an example of how to initialize the model and include any relevant
    links to the underlying models documentation or API.

    Example:

        .. code-block:: python

            model = CustomLLM(n=2)
            result = model.invoke("hello")
            result = model.batch(["hello", "world"])
    """

    n: int
    """The number of characters from the prompt to be echoed."""

    def _call(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> str:
        """Run the LLM on the given input.

        Override this method to implement the LLM logic.

        Args:
            prompt: The prompt to generate from.
            stop: Stop words to use when generating. Model output is cut off at the
                first occurrence of any of the stop substrings.
                If stop tokens are not supported consider raising NotImplementedError.
            run_manager: Callback manager for the run.
            **kwargs: Arbitrary additional keyword arguments. These are usually passed
                to the model provider API call.

        Returns:
            The model output as a string. Actual completions SHOULD NOT include the prompt.
        """
        if stop is not None:
            raise ValueError("stop kwargs are not permitted.")
        return prompt[: self.n]

    def _stream(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> Iterator[GenerationChunk]:
        """Stream the LLM on the given prompt.

        This method should be overridden by subclasses that support streaming.

        If not implemented, the default behavior of calls to stream will be to
        fallback to the non-streaming version of the model and return
        the output as a single chunk.

        Args:
            prompt: The prompt to generate from.
            stop: Stop words to use when generating. Model output is cut off at the
                first occurrence of any of these substrings.
            run_manager: Callback manager for the run.
            **kwargs: Arbitrary additional keyword arguments. These are usually passed
                to the model provider API call.

        Returns:
            An iterator of GenerationChunks.
        """
        for char in prompt[: self.n]:
            chunk = GenerationChunk(text=char)
            if run_manager:
                run_manager.on_llm_new_token(chunk.text, chunk=chunk)

            yield chunk

    @property
    def _identifying_params(self) -> Dict[str, Any]:
        """Return a dictionary of identifying parameters."""
        return {
            # The model name allows users to specify custom token counting
            # rules in LLM monitoring applications (e.g., in LangSmith users
            # can provide per token pricing for their model and monitor
            # costs for the given LLM.)
            "model_name": "CustomChatModel",
        }

    @property
    def _llm_type(self) -> str:
        """Get the type of language model used by this LLM. Used for logging purposes only."""
        return "custom"

This LLM implements the standard `Runnable` interface that many LangChain abstractions support.

In [8]:
llm = CustomLLM(n=5)
print(llm)

CustomLLM
Params: {'model_name': 'CustomChatModel'}


In [9]:
llm.invoke("This is a foobar thing")

'This '

In [10]:
await llm.ainvoke("world")

'world'

In [11]:
llm.batch(["woof woof woof", "meow meow meow"])

['woof ', 'meow ']

In [12]:
async for token in llm.astream("hello"):
    print(token, end="|", flush=True)

h|e|l|l|o|

Let's confirm that it integrates nicely with other LangChain APIs.

In [14]:
from langchain_core.prompts import ChatPromptTemplate
prompt = ChatPromptTemplate.from_messages(
    [("system", "you are a bot"), ("human", "{input}")]
)
llm = CustomLLM(n=7)
chain = prompt | llm
chain.invoke({"input": "hello there!"})

'System:'
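
The output is `'System:'` because the chat prompt is rendered to text before it reaches the LLM, roughly `"System: you are a bot\nHuman: hello there!"`, and `CustomLLM(n=7)` echoes only its first 7 characters. The `|` operator wires the two steps together. Below is a minimal plain-Python sketch of that composition; `Step` and both stand-in steps are hypothetical and only mimic what `ChatPromptTemplate` and `CustomLLM` do here.

```python
from typing import Any, Callable


class Step:
    """Hypothetical minimal runnable that supports `a | b` composition."""

    def __init__(self, fn: Callable[[Any], Any]) -> None:
        self.fn = fn

    def invoke(self, value: Any) -> Any:
        return self.fn(value)

    def __or__(self, other: "Step") -> "Step":
        # The composed step feeds this step's output into the next step.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Stand-in for the chat prompt: fill in the input and render the messages as text.
fake_prompt = Step(lambda d: f"System: you are a bot\nHuman: {d['input']}")
# Stand-in for CustomLLM(n=7): echo the first 7 characters.
echo7 = Step(lambda text: text[:7])

fake_chain = fake_prompt | echo7
print(fake_chain.invoke({"input": "hello there!"}))  # System:
```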

In [None]:
idx = 0
async for event in chain.astream_events({"input": "hello there!"}, version="v1"):
    print(event)
    idx += 1
    if idx > 7:
        # Truncate
        break

## Caching
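LangChain provides an optional caching layer for LLMs. This is useful for two reasons. First, it can save you money by reducing the number of API calls you make to the LLM provider, if you often request the same completion multiple times. Second, it can speed up your application for the same reason: fewer calls go out to the LLM provider.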
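
A minimal plain-Python sketch of the caching idea follows. `InMemoryLLMCache`, `cached_call`, and `fake_provider` are hypothetical. The one detail borrowed from LangChain is the shape of the key: its `BaseCache` also looks entries up by the prompt together with an `llm_string` describing the model settings, so changing the model or its parameters misses the cache.

```python
from typing import Callable, Dict, List, Optional, Tuple


class InMemoryLLMCache:
    """Hypothetical cache mapping (prompt, llm_string) to a completion."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], str] = {}

    def lookup(self, prompt: str, llm_string: str) -> Optional[str]:
        return self._store.get((prompt, llm_string))

    def update(self, prompt: str, llm_string: str, text: str) -> None:
        self._store[(prompt, llm_string)] = text


def cached_call(
    cache: InMemoryLLMCache, llm_string: str, call: Callable[[str], str], prompt: str
) -> str:
    hit = cache.lookup(prompt, llm_string)
    if hit is not None:
        return hit  # cache hit: no request to the provider
    text = call(prompt)  # cache miss: pay for one provider request
    cache.update(prompt, llm_string, text)
    return text


provider_calls: List[str] = []


def fake_provider(prompt: str) -> str:
    provider_calls.append(prompt)  # record every request that reaches the "API"
    return f"completion for {prompt!r}"


cache = InMemoryLLMCache()
settings = "model=gpt-3.5-turbo-instruct,n=2,best_of=2"  # hypothetical llm_string
cached_call(cache, settings, fake_provider, "Tell me a joke")  # miss
cached_call(cache, settings, fake_provider, "Tell me a joke")  # hit
cached_call(cache, "model=other", fake_provider, "Tell me a joke")  # miss: new settings
print(len(provider_calls))  # 2
```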

In [15]:
from langchain.globals import set_llm_cache
from langchain_openai import OpenAI

# To make the caching really obvious, let's use a slower model.
llm = OpenAI(model_name="gpt-3.5-turbo-instruct", n=2, best_of=2)

In [16]:
%%time
from langchain.cache import InMemoryCache

set_llm_cache(InMemoryCache())

# The first time, it is not yet in cache, so it should take longer
llm.invoke("Tell me a joke")



CPU times: user 198 ms, sys: 89.3 ms, total: 287 ms
Wall time: 4.15 s


"\n\nWhy couldn't the bicycle stand up by itself? Because it was two-tired!"

In [17]:
%%time
# The second time it is, so it goes faster
llm.invoke("Tell me a joke")

CPU times: user 1.33 ms, sys: 2.46 ms, total: 3.79 ms
Wall time: 3.33 ms


"\n\nWhy couldn't the bicycle stand up by itself? Because it was two-tired!"

### SQLite Cache


In [18]:
# We can do the same thing with a SQLite cache
from langchain.cache import SQLiteCache

set_llm_cache(SQLiteCache(database_path=".langchain.db"))

In [19]:
%%time
# The first time, it is not yet in cache, so it should take longer
llm.invoke("Tell me a joke")

CPU times: user 35.5 ms, sys: 32.2 ms, total: 67.7 ms
Wall time: 3.88 s


"\n\nWhy don't scientists trust atoms?\n\nBecause they make up everything."

## Streaming
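All LLMs implement the `Runnable` interface, which comes with default implementations of all methods, i.e. `ainvoke`, `batch`, `abatch`, `stream`, and `astream`. This gives all LLMs basic support for streaming.

By default, streaming returns an `Iterator` (or an `AsyncIterator` in the case of async streaming) of a single value: the final result returned by the underlying LLM provider. This obviously doesn't give you token-by-token streaming, which requires native support from the LLM provider, but it ensures that code expecting an iterator of tokens works with any of our LLM integrations.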
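
The sketch below illustrates this fallback. Both classes are hypothetical: one only has the default single-chunk `stream`, the other streams word by word as a stand-in for native token streaming. The `collect` function, written against an iterator of chunks, works with both and produces the same text.

```python
from typing import Iterator, List

SONG = "Bubbles dancing in my glass"


class NonStreamingLLM:
    """Hypothetical provider without native streaming support."""

    def invoke(self, prompt: str) -> str:
        return SONG

    def stream(self, prompt: str) -> Iterator[str]:
        # Default fallback: a single chunk containing the final result.
        yield self.invoke(prompt)


class StreamingLLM(NonStreamingLLM):
    """Hypothetical provider that streams word by word (stand-in for tokens)."""

    def stream(self, prompt: str) -> Iterator[str]:
        for i, word in enumerate(self.invoke(prompt).split(" ")):
            # Re-attach the separator so the chunks join back to the full text.
            yield word if i == 0 else " " + word


def collect(llm: NonStreamingLLM, prompt: str) -> str:
    """Consumer code written against an iterator of chunks."""
    chunks: List[str] = []
    for chunk in llm.stream(prompt):
        chunks.append(chunk)
    return "".join(chunks)


for model in (NonStreamingLLM(), StreamingLLM()):
    print(len(list(model.stream("song"))), collect(model, "song"))
```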

See which [integrations support token-by-token streaming here](https://python.langchain.com/docs/integrations/llms/).

In [20]:
from langchain_openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo-instruct", temperature=0, max_tokens=512)
for chunk in llm.stream("Write me a song about sparkling water."):
    print(chunk, end="", flush=True)



Verse 1:
Bubbles dancing in my glass
Clear and crisp, it's such a blast
Refreshing taste, it's like a dream
Sparkling water, you make me beam

Chorus:
Oh sparkling water, you're my delight
With every sip, you make me feel so right
You're like a party in my mouth
I can't get enough, I'm hooked no doubt

Verse 2:
No sugar, no calories, just pure bliss
You're the healthier choice, I can't resist
With flavors like lime and lemon too
Sparkling water, you're always new

Chorus:
Oh sparkling water, you're my delight
With every sip, you make me feel so right
You're like a party in my mouth
I can't get enough, I'm hooked no doubt

Bridge:
Some may say you're just plain water
But to me, you're so much more
You bring a sparkle to my day
In every sip, I find my way

Chorus:
Oh sparkling water, you're my delight
With every sip, you make me feel so right
You're like a party in my mouth
I can't get enough, I'm hooked no doubt

Outro:
So here's to you, my sparkling friend
You'll always be my go-to b

JSONDecodeError: Expecting value: line 1 column 157 (char 156)

## Tracking token usage
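This notebook goes over how to track your token usage for specific calls. It is currently only implemented for the OpenAI API.

Let's first look at an extremely simple example of tracking the token usage of a single LLM call.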

In [21]:
from langchain_community.callbacks import get_openai_callback
from langchain_openai import OpenAI

In [23]:
llm = OpenAI(model_name="gpt-3.5-turbo-instruct", n=2, best_of=2)
with get_openai_callback() as cb:
    result = llm.invoke("Tell me a joke")
    print(cb)

Tokens Used: 0
	Prompt Tokens: 0
	Completion Tokens: 0
Successful Requests: 0
Total Cost (USD): $0.0


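The counts above are zero most likely because the SQLite cache set earlier with `set_llm_cache` already holds a completion for this exact prompt and model configuration, so no request reached the OpenAI API; call `set_llm_cache(None)` first to see real usage. Anything inside the context manager gets tracked. Here's an example of using it to track multiple calls in sequence.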

In [24]:
with get_openai_callback() as cb:
    result = llm.invoke("Tell me a joke")
    result2 = llm.invoke("Tell me a joke")
    print(cb.total_tokens)

0


If you use a chain or an agent with multiple steps, it will track all of those steps.

In [None]:
from langchain.agents import AgentType, initialize_agent, load_tools
from langchain_openai import OpenAI

llm = OpenAI(temperature=0)
tools = load_tools(["serpapi", "llm-math"], llm=llm)
agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)

In [None]:
with get_openai_callback() as cb:
    response = agent.run(
        "Who is Olivia Wilde's boyfriend? What is his current age raised to the 0.23 power?"
    )
    print(f"Total Tokens: {cb.total_tokens}")
    print(f"Prompt Tokens: {cb.prompt_tokens}")
    print(f"Completion Tokens: {cb.completion_tokens}")
    print(f"Total Cost (USD): ${cb.total_cost}")