# 🚀 Building a ReAct Agent with Tool Calling on Mixtral-8x7B-instruct

![](../assets/simple-react.gif)

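Before you start: if you are not yet familiar with using a grammar to make an LLM produce output in the format you require, see [mistral-stable-output-JSON-format](./mistral-stable-output-JSON-format.ipynb) for how to control the model's output format.

References: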
- https://github.com/Maximilian-Winter/llama-cpp-agent
- https://www.reddit.com/r/MistralAI/comments/191wg3i/react_prompting_mistral_or_mixtral/
- https://pastebin.com/a94xp56b
- https://pastebin.com/H3a9HiHw

![](https://miro.medium.com/v2/resize:fit:1400/format:webp/1*adwhd8hmwcCAYTZNMiSAxA.png)

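Chain-of-thought (CoT) prompting gives a model the ability to decompose problems and (arguably) to reason, but the model is still limited to its own knowledge: it does not know today's weather in Beijing or the latest news, and it cannot carry out complex arithmetic. In addition, many complex questions cannot be resolved in a single exchange; the model often needs several rounds of information gathering, computation, and tool calls before it can reach a sound answer. ReAct (Reasoning and Acting) is a good solution to this. The interaction with the model looks like this: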

```text
Question: the user's question
Assistant:
  Thought: reasoning
  Action: call a tool/function
  Observation: result returned by the call
  ... (repeat N times)
  Thought: Now, I know the final answer for this question.
  Final Answer: the final answer
```

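In practice, we interrupt the model when it emits `Observation: `, call the function or tool API ourselves, and append the result after `Observation: `. Otherwise the model will make up the result on its own.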
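The interrupt-and-splice step can be sketched without loading a model. The helper names `cut_at_observation` and `splice_observation` are hypothetical, and the generated text is a hand-written stand-in for a model that kept going and imagined a tool result. With llama-cpp-python the cut is usually done by passing a `stop` string instead, as in the agent loop later in this notebook.

```python
OBS_MARKER = "Observation:"

def cut_at_observation(generated: str) -> str:
    """Keep only the text before the first 'Observation:' (drop the model's guess)."""
    head, _, _ = generated.partition(OBS_MARKER)
    return head

def splice_observation(prompt: str, generated: str, tool_result: str) -> str:
    """Append the model's Thought/Action and the real tool result to the prompt."""
    kept = cut_at_observation(generated)
    return prompt + kept + f"{OBS_MARKER} {tool_result}\nThought: "

# Hand-written stand-in: the "model" hallucinated 10 degrees for Beijing.
fake_generation = (
    "Thought: I need the weather in Beijing.\n"
    'Action: {"function": "search_weather", "args": {"city": "北京"}}\n'
    "Observation: 北京气温10度\n"
    "Thought: ..."
)

next_prompt = splice_observation("Question: ...\n", fake_generation, "北京气温36度")
print(next_prompt)
```

The next completion request then continues from `next_prompt`, so the model reasons over the real observation rather than its own guess.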

To make sure the model knows every function or tool it can call, the input prompt (system or user) must describe them in detail, for example:

```text
You are an AI agent capable of using a variety of functions to answer a question.

{{func_desc}}

Respond using the following format:

Thought: think step by step about how to solve the question, and put your thought process here.
Action: {
    "function": $FUNCTION_NAME,
    "args": $FUNCTION_ARGS
}
Observation: function output
...(this Thought/Action/Observation can repeat N times)
Thought: Now I know the final answer.
Final Answer: the final answer

$FUNCTION_NAME is the name of function. $FUNCTION_ARGS is a dictionary input matching the requirement of function.

<START>
Question: {{question}}\n
```

In [1]:
from llama_cpp import Llama, LlamaGrammar
model = Llama("/data/hf/mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf", n_gpu_layers=-1, verbose=False)

ggml_cuda_init: GGML_CUDA_FORCE_MMQ:   no
ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
ggml_cuda_init: found 2 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
  Device 1: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes


In [4]:
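def search_weather(city: str) -> str:
    """A useful function to check the weather.
    
    Args:
        city: the city where you want to check the weather
    """
    # Mock lookup with canned results; a real tool would call a weather API here.
    # The strings read "<city> temperature: N degrees" (北京 = Beijing, 巴黎 = Paris).
    if city == "北京":
        return "北京气温36度"
    elif city == "巴黎":
        return "巴黎气温24度"
    else:
        return f"{city}气温20度"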

func_desc = """function: search_weather
  description: A useful function to check the weather.
  args:
    city (str): the city where you want to check the weather
"""

# from typing import List, Optional, Union, Dict
# from llama_cpp_agent.gbnf_grammar_generator.gbnf_grammar_from_pydantic_models import generate_text_documentation, create_dynamic_model_from_function, generate_gbnf_grammar
# func_desc = generate_text_documentation([create_dynamic_model_from_function(search_weather)], model_prefix="function", fields_prefix="args")
# func_grammar = generate_gbnf_grammar(create_dynamic_model_from_function(search_weather), set(), dict())
# print(func_desc)
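The hand-written `func_desc` can also be derived from the function itself. The following is a minimal standard-library sketch, not the `llama_cpp_agent` generator referenced in the commented-out lines: `describe_function` is a hypothetical helper, and it assumes a Google-style `Args:` section in the docstring and plain (non-string) type annotations.

```python
import inspect

def describe_function(func) -> str:
    """Render a function's name, summary line, and typed args as prompt text."""
    doc = inspect.getdoc(func) or ""
    doc_lines = doc.splitlines()
    summary = doc_lines[0] if doc_lines else ""

    # Collect "name: description" pairs from the "Args:" section of the docstring.
    arg_docs = {}
    in_args = False
    for line in doc_lines:
        stripped = line.strip()
        if stripped == "Args:":
            in_args = True
        elif in_args and not stripped:
            in_args = False  # a blank line ends the section
        elif in_args and ":" in stripped:
            name, _, desc = stripped.partition(":")
            arg_docs[name.strip()] = desc.strip()

    lines = [f"function: {func.__name__}", f"  description: {summary}", "  args:"]
    for name, param in inspect.signature(func).parameters.items():
        if param.annotation is inspect.Parameter.empty:
            type_name = "any"
        else:
            type_name = getattr(param.annotation, "__name__", str(param.annotation))
        lines.append(f"    {name} ({type_name}): {arg_docs.get(name, '')}")
    return "\n".join(lines) + "\n"

# Stand-in with the same signature and docstring as the notebook's mock tool.
def search_weather(city: str):
    """A useful function to check the weather.

    Args:
        city: the city where you want to check the weather
    """
    return f"{city}气温20度"

print(describe_function(search_weather))
```

For this signature and docstring the result has the same layout as the manual `func_desc` string, so it can be passed to the prompt template unchanged.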

In [5]:
REACT_PROMPT = '''[INST] You are an AI agent capable of using a variety of functions to answer a question.

### Accessible Functions

{{func_desc}}

### Response Format
Respond using the following format:

Thought: think step by step about how to solve the question, and put your thought process here.
Action:
```json
{
    "function": $FUNCTION_NAME,
    "args": $FUNCTION_ARGS
}
```
Observation: function output
...(this Thought/Action/Observation can repeat N times) 
Thought: I now know the final answer 
Final Answer: the final answer

$FUNCTION_NAME is the name of function. $FUNCTION_ARGS is a dictionary input matching the requirement of function.

<START>
Question: {{question}} [/INST]
'''

In [18]:
import json
import re

# Matches the fenced JSON block that follows "Action:" in the model's reply.
ACTION_PATTERN = re.compile(r"```json([\s\S]*?)```")

def react_agent(model, question, function, function_desc, max_rounds=3):
    # Build the input prompt from the template.
    print(f"Question: {question}")
    prompt = REACT_PROMPT.replace("{{question}}", question).replace("{{func_desc}}", function_desc)
    output = ""
    try:
        for _ in range(max_rounds):
            # Thought/Action step: stop before the model invents its own Observation.
            response = model.create_completion(prompt, stop=["Observation: "], max_tokens=400)["choices"][0]["text"]
            output += response
            print(response, end="")
            # A "Final Answer:" in the response ends the ReAct loop.
            if "Final Answer:" in response:
                break

            elif "Action:" in response:
                # json.loads tolerates whitespace, so the block is parsed as-is
                # (stripping spaces would corrupt arguments such as "New York").
                action = json.loads(ACTION_PATTERN.findall(response)[0])
                obs = function(**action["args"])
                obs = f"Observation: {obs}" + "\nThought: "
                output += obs
                # Feed back both the model's own turn and the real observation.
                prompt += response + obs
                print(obs, end="")
    except Exception as e:
        print(f"ERROR: {e}")
    return output
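The loop above hands every Action to a single function. A possible extension for several tools, sketched here under assumptions, is to validate the parsed Action and dispatch on its `"function"` field through a registry. `parse_action`, `call_tool`, and the `tools` dict are hypothetical names and not part of the notebook or of llama-cpp-python; the response text is hand-written.

```python
import json
import re

FENCE = "`" * 3  # three backticks, spelled this way so the example stays fence-safe
ACTION_RE = re.compile(FENCE + r"json\s*(.*?)" + FENCE, re.DOTALL)

def parse_action(response: str) -> dict:
    """Extract and validate the JSON Action block from a model response."""
    match = ACTION_RE.search(response)
    if match is None:
        raise ValueError("no fenced json Action block found")
    action = json.loads(match.group(1))
    if (not isinstance(action, dict) or "function" not in action
            or not isinstance(action.get("args"), dict)):
        raise ValueError("Action must contain 'function' and a dict of 'args'")
    return action

def call_tool(action: dict, tools: dict) -> str:
    """Dispatch to the named tool; report unknown names back as the observation."""
    name = action["function"]
    if name not in tools:
        return f"Unknown function: {name}"
    return str(tools[name](**action["args"]))

# Hand-written response in the format the prompt asks for.
response = (
    "Thought: I should look up New York.\nAction:\n"
    + FENCE + 'json\n{"function": "echo", "args": {"text": "New York"}}\n' + FENCE + "\n"
)
tools = {"echo": lambda text: text}  # hypothetical registry with one trivial tool
action = parse_action(response)
print(call_tool(action, tools))
```

Returning the "unknown function" message as the observation, rather than raising, gives the model a chance to correct itself in the next round.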

    

In [26]:
question = "北京和巴黎现在哪个地方更热？"  # "Which is hotter right now, Beijing or Paris?"
output = react_agent(model, question, search_weather, func_desc)

Question: 北京和巴黎现在哪个地方更热？
Thought: The question asks for a comparison of the current temperature in Beijing and Paris. I can use the search\_weather function to find out. I will need to call the function twice, once for each city.

Action:
```json
{
    "function": "search_weather",
    "args": {
        "city": "北京"
    }
}
```
Observation: 北京气温36度
Thought:  To answer this question, I need to check the current temperature in Beijing and Paris. I already know the current temperature in Beijing is 36 degrees. Now, I should get the temperature in Paris.
Action:
```json
{
    "function": "search_weather",
    "args": {
        "city": "Paris"
    }
}
```
Observation: Paris气温20度
Thought: 北京的温度比巴黎高, so I can answer the question now.
Final Answer: 现在北京更热。
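
In the run above, the second call passed `Paris` rather than `巴黎`, so the mock fell through to its default branch and reported 20 degrees instead of 24. The final answer still came out right, but not because the lookup succeeded. One possible guard, sketched here with a hypothetical alias table and hypothetical function names (neither is part of the original notebook), is to normalize arguments before the lookup:

```python
# Hypothetical alias table mapping names the model may emit to the keys the tool expects.
CITY_ALIASES = {"beijing": "北京", "paris": "巴黎"}

def normalize_city(city: str) -> str:
    """Map known aliases to the canonical key; leave unknown names unchanged."""
    cleaned = city.strip()
    return CITY_ALIASES.get(cleaned.lower(), cleaned)

def search_weather_normalized(city: str) -> str:
    """Same mock data as search_weather, with alias normalization in front."""
    city = normalize_city(city)
    if city == "北京":
        return "北京气温36度"
    elif city == "巴黎":
        return "巴黎气温24度"
    return f"{city}气温20度"

print(search_weather_normalized("Paris"))
```

Describing the expected argument form in the function description (for example, "city name in Chinese") is another option, though the model may not always follow it.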