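# How to create a custom LLM class

This notebook covers how to create a custom LLM wrapper, which is useful when you:

- want to use your own LLM
- need an alternative to the wrappers LangChain already supports

Wrapping your LLM with the standard `LLM` interface lets you use it in existing LangChain programs with minimal code changes.

As a bonus, your LLM automatically becomes a LangChain `Runnable` and benefits from some optimizations out of the box, such as async support and the `astream_events` API.

:::note
You are currently on a page documenting the use of [text completion models](/docs/concepts/text_llms). Many of the latest and most popular models are [chat completion models](/docs/concepts/chat_models).

Unless you are specifically using more advanced prompting techniques, you are probably looking for [this page instead](/docs/how_to/custom_chat_model/).
:::
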
## Implementation

A custom LLM is required to implement only two things:

| Method | Description |
|---|---|
| `_call` | Takes in a string and some optional stop words, and returns a string. Used by `invoke`. |
| `_llm_type` | A property that returns a string; used for logging purposes only. |


Optional implementations:

| Method | Description |
|---|---|
| `_identifying_params` | Used to help identify the model and print the LLM; should return a dictionary. This is a **@property**. |
| `_acall` | Provides an async native implementation of `_call`, used by `ainvoke`. |
| `_stream` | Method to stream the output token by token. |
| `_astream` | Provides an async native implementation of `_stream`; in newer LangChain versions, defaults to `_stream`. |


Let's implement a simple custom LLM that just returns the first n characters of the input.

In [1]:
from typing import Any, Dict, Iterator, List, Optional

from langchain_core.callbacks.manager import CallbackManagerForLLMRun
from langchain_core.language_models.llms import LLM
from langchain_core.outputs import GenerationChunk


class CustomLLM(LLM):
    """A custom LLM that echoes the first `n` characters of the input.

    When contributing an implementation to LangChain, carefully document
    the model including the initialization parameters, include
    an example of how to initialize the model and include any relevant
    links to the underlying models documentation or API.

    Example:

        .. code-block:: python

            model = CustomLLM(n=2)
            result = model.invoke("hello")
            result = model.batch(["hello", "world"])
    """

    n: int
    """The number of characters from the prompt to be echoed."""

    def _call(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> str:
        """Run the LLM on the given input.

        Override this method to implement the LLM logic.

        Args:
            prompt: The prompt to generate from.
            stop: Stop words to use when generating. Model output is cut off at the
                first occurrence of any of the stop substrings.
                If stop tokens are not supported consider raising NotImplementedError.
            run_manager: Callback manager for the run.
            **kwargs: Arbitrary additional keyword arguments. These are usually passed
                to the model provider API call.

        Returns:
            The model output as a string. Actual completions SHOULD NOT include the prompt.
        """
        if stop is not None:
            raise ValueError("stop kwargs are not permitted.")
        return prompt[: self.n]

    def _stream(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> Iterator[GenerationChunk]:
        """Stream the LLM on the given prompt.

        This method should be overridden by subclasses that support streaming.

        If not implemented, the default behavior of calls to stream will be to
        fallback to the non-streaming version of the model and return
        the output as a single chunk.

        Args:
            prompt: The prompt to generate from.
            stop: Stop words to use when generating. Model output is cut off at the
                first occurrence of any of these substrings.
            run_manager: Callback manager for the run.
            **kwargs: Arbitrary additional keyword arguments. These are usually passed
                to the model provider API call.

        Returns:
            An iterator of GenerationChunks.
        """
        for char in prompt[: self.n]:
            chunk = GenerationChunk(text=char)
            if run_manager:
                run_manager.on_llm_new_token(chunk.text, chunk=chunk)

            yield chunk

    @property
    def _identifying_params(self) -> Dict[str, Any]:
        """Return a dictionary of identifying parameters."""
        return {
            # The model name allows users to specify custom token counting
            # rules in LLM monitoring applications (e.g., in LangSmith users
            # can provide per token pricing for their model and monitor
            # costs for the given LLM.)
            "model_name": "CustomChatModel",
        }

    @property
    def _llm_type(self) -> str:
        """Get the type of language model used by this LLM. Used for logging purposes only."""
        return "custom"

### Let's test it 🧪

This LLM implements LangChain's standard `Runnable` interface, which many LangChain abstractions support!

In [2]:
llm = CustomLLM(n=5)
print(llm)

CustomLLM
Params: {'model_name': 'CustomChatModel'}


In [3]:
llm.invoke("This is a foobar thing")

'This '

In [4]:
await llm.ainvoke("world")

'world'

In [5]:
llm.batch(["woof woof woof", "meow meow meow"])

['woof ', 'meow ']

In [6]:
await llm.abatch(["woof woof woof", "meow meow meow"])

['woof ', 'meow ']

In [7]:
async for token in llm.astream("hello"):
    print(token, end="|", flush=True)

h|e|l|l|o|

Let's confirm that it integrates nicely with other `LangChain` APIs.

In [15]:
from langchain_core.prompts import ChatPromptTemplate

In [16]:
prompt = ChatPromptTemplate.from_messages(
    [("system", "you are a bot"), ("human", "{input}")]
)

In [17]:
llm = CustomLLM(n=7)
chain = prompt | llm

In [18]:
idx = 0
async for event in chain.astream_events({"input": "hello there!"}, version="v1"):
    print(event)
    idx += 1
    if idx > 7:
        # Truncate
        break

{'event': 'on_chain_start', 'run_id': '05f24b4f-7ea3-4fb6-8417-3aa21633462f', 'name': 'RunnableSequence', 'tags': [], 'metadata': {}, 'data': {'input': {'input': 'hello there!'}}}
{'event': 'on_prompt_start', 'name': 'ChatPromptTemplate', 'run_id': '7e996251-a926-4344-809e-c425a9846d21', 'tags': ['seq:step:1'], 'metadata': {}, 'data': {'input': {'input': 'hello there!'}}}
{'event': 'on_prompt_end', 'name': 'ChatPromptTemplate', 'run_id': '7e996251-a926-4344-809e-c425a9846d21', 'tags': ['seq:step:1'], 'metadata': {}, 'data': {'input': {'input': 'hello there!'}, 'output': ChatPromptValue(messages=[SystemMessage(content='you are a bot'), HumanMessage(content='hello there!')])}}
{'event': 'on_llm_start', 'name': 'CustomLLM', 'run_id': 'a8766beb-10f4-41de-8750-3ea7cf0ca7e2', 'tags': ['seq:step:2'], 'metadata': {}, 'data': {'input': {'prompts': ['System: you are a bot\nHuman: hello there!']}}}
{'event': 'on_llm_stream', 'name': 'CustomLLM', 'run_id': 'a8766beb-10f4-41de-8750-3ea7cf0ca7e2', '

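## Contributing

We appreciate all chat model integration contributions.

Here's a checklist to help make sure your contribution gets added to LangChain:

Documentation:

* The model contains doc-strings for all initialization arguments, as these will be surfaced in the [API reference](https://python.langchain.com/api_reference/langchain/index.html).
* The class doc-string for the model contains a link to the model API if the model is powered by a service.

Tests:

* [ ] Add unit or integration tests to the overridden methods. Verify that `invoke`, `ainvoke`, `batch`, `stream` work if you've overridden the corresponding code.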

Streaming (if you're implementing it):

* [ ] Make sure to invoke the `on_llm_new_token` callback
* [ ] `on_llm_new_token` is invoked BEFORE yielding the chunk

Stop token behavior:

* [ ] Stop token should be respected
* [ ] Stop token should be INCLUDED as part of the response
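
The two stop-token items can be sketched as a small post-processing helper. This is only an illustration: `apply_stop` and its `include_stop` flag are hypothetical names, not part of LangChain, and a real integration would usually forward `stop` to the provider API rather than trim the text afterwards.

```python
from typing import List, Optional


def apply_stop(
    text: str, stop: Optional[List[str]], include_stop: bool = True
) -> str:
    """Cut `text` at the earliest occurrence of any stop sequence.

    With `include_stop=True` the matched stop sequence is kept at the end of
    the result, which is what the checklist item above asks for.
    """
    if not stop:
        return text

    best_start: Optional[int] = None
    best_end = 0
    for token in stop:
        if not token:
            continue  # An empty stop sequence would match everywhere.
        start = text.find(token)
        if start != -1 and (best_start is None or start < best_start):
            best_start = start
            best_end = start + len(token)

    if best_start is None:
        return text  # No stop sequence found: return the text unchanged.
    return text[:best_end] if include_stop else text[:best_start]


print(apply_stop("one, two; three", [";", ","]))                      # one,
print(apply_stop("one, two; three", [";", ","], include_stop=False))  # one
```

Inside `_call`, such a helper could be applied to the raw completion before returning it, instead of raising an error when `stop` is provided.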
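
Secret API keys:

* [ ] If your model connects to an API, it will likely accept API keys as part of its initialization. Use Pydantic's `SecretStr` type for secrets, so they don't get accidentally printed out when someone prints the model.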
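
To illustrate the masking behavior, here is a minimal sketch using a plain Pydantic model as a stand-in for the LLM class. `ProviderSettings` and the key value are hypothetical; a real integration would declare the `SecretStr` field on the `LLM` subclass itself.

```python
from pydantic import BaseModel, SecretStr


class ProviderSettings(BaseModel):
    """Hypothetical stand-in for a model class that holds an API key."""

    api_key: SecretStr


settings = ProviderSettings(api_key="sk-not-a-real-key")

# Printing the field (or the whole model) shows a mask, not the secret.
print(settings.api_key)  # **********

# The raw value is only available through an explicit call, for example
# when building the request to the provider.
raw_key = settings.api_key.get_secret_value()
```

Because the secret must be unwrapped explicitly with `get_secret_value()`, accidental exposure through logging or `print(model)` becomes much less likely.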