# 🚀 Enhancing Mistral with a CoT Agent

![](https://www.promptingguide.ai/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fcot.1933d9fe.png&w=1920&q=75)

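An LLM is fundamentally an autoregressive model: each next-token prediction depends on the context that precedes it. Giving the model some token budget to break the problem down and reason step by step before stating a conclusion can significantly improve its accuracy on complex problems. That is the core idea of CoT (Chain of Thought). Below, we reuse the structured-output technique from [enforce-open-source-llms-response-with-JSON-format](./enforce-open-source-llms-response-with-JSON-format.ipynb) to build a Mistral-CoT-Agent that improves the accuracy of the model's answers.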
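The agent pattern described above can be sketched independently of any particular backend. The flow is: build a prompt that asks for reasoning first, let the model generate JSON, then parse out the `thought` and `answer` fields. In this minimal sketch, `generate` is a hypothetical callable standing in for a grammar-constrained `model.create_completion(...)` call. The names `COT_TEMPLATE`, `cot_agent` and `fake_generate` are illustrative and not part of the original notebook.

```python
import json
from typing import Callable, Tuple

# Prompt template mirroring the one used later in this notebook;
# "{{question}}" is substituted with str.replace so the JSON braces stay literal.
COT_TEMPLATE = '''[INST] Question: {{question}}

Think step by step to answer the question, responding in the requested format:

{
  "thought": "put your thought process here",
  "answer": "put the final answer here"
}
[/INST]
'''


def cot_agent(question: str, generate: Callable[[str], str]) -> Tuple[str, str]:
    """Ask the model to reason first, then return (thought, answer).

    `generate` is a hypothetical stand-in for a grammar-constrained call such as
    model.create_completion(prompt, grammar=grammar)["choices"][0]["text"].
    """
    prompt = COT_TEMPLATE.replace("{{question}}", question)
    data = json.loads(generate(prompt))  # raises ValueError if the output is not valid JSON
    return data["thought"], data["answer"]


# Usage with a stub generator in place of a real model:
def fake_generate(prompt: str) -> str:
    return json.dumps({"thought": "23 - 20 = 3, then 3 + 6 = 9.", "answer": "9"})


thought, answer = cot_agent("The cafeteria had 23 apples...", fake_generate)
print(answer)  # 9
```

Keeping `generate` as a plain callable means the same wrapper works with any backend that can return constrained JSON text.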

In [1]:
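```python
import json

from llama_cpp import Llama, LlamaGrammar
from pydantic import BaseModel

# Load the quantized Dolphin 2.1 (Mistral-7B) GGUF model;
# n_gpu_layers=-1 offloads all layers to the GPU.
model = Llama("/data/hf/dolphin-2.1-mistral-7b.Q4_K_M.gguf", n_gpu_layers=-1)
```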

```text
llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from /data/hf/dolphin-2.1-mistral-7b.Q4_K_M.gguf (version GGUF V2)
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = ehartford_dolphin-2.1-mistral-7b
llama_model_loader: - kv   2:                       llama.context_length u32              = 32768
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 llama
```

First, let's try to solve a problem without CoT.

In [2]:
```python
question = "The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?"
prompt = f'''[INST] Question: {question}

Answer this question.
[/INST]
'''

# Plain completion: no reasoning instructions and no output constraints
res = model.create_completion(prompt, max_tokens=1000)
print(res["choices"][0]["text"])
```


```text
llama_print_timings:        load time =     142.54 ms
llama_print_timings:      sample time =      53.02 ms /   161 runs   (    0.33 ms per token,  3036.53 tokens per second)
llama_print_timings: prompt eval time =     142.31 ms /    53 tokens (    2.69 ms per token,   372.42 tokens per second)
llama_print_timings:        eval time =    1384.71 ms /   160 runs   (    8.65 ms per token,   115.55 tokens per second)
llama_print_timings:       total time =    1842.07 ms /   213 tokens
```

```text
[SOLUTION] Solution: They still have 23 apples.

Explanation: The cafeteria had 23 apples to begin with. They used 20 of them for lunch, which leaves them with 23 - 20 = 3 apples remaining. Then they bought 6 more apples, so now they have 3 + 6 = 9 apples. But wait! The original problem stated that they had 23 apples to begin with. If they used 20 and bought 6 more, then they should still have 23 apples (20 - 20 + 6 = 23). Therefore, the cafeteria still has 23 apples.
```


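The answer is wrong (the correct count is 23 - 20 + 6 = 9). The model even reaches 9 midway through its explanation before talking itself out of it. Let's try CoT.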

In [3]:
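```python
# The template contains literal JSON braces, so the question is
# substituted with str.replace instead of an f-string.
prompt = '''[INST] Question: {{question}}

Think step by step to answer the question, responding in the requested format:

{
  "thought": "put your thought process here",
  "answer": "put the final answer here"
}
[/INST]
'''.replace("{{question}}", question)
```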
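The prompt above is filled with `str.replace` rather than an f-string because its JSON example contains literal braces. The sketch below shows that an f-string can produce the same prompt only if every literal brace is doubled. The shortened template is illustrative and is not the notebook's exact prompt.

```python
question = "The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?"

# Option 1 (as in the cell above): a plain string with a placeholder, filled via str.replace.
# The JSON braces need no escaping because str.replace does not interpret them.
template = '''Question: {{question}}

{
  "thought": "...",
  "answer": "..."
}
'''
prompt_replace = template.replace("{{question}}", question)

# Option 2: an f-string. Literal braces must be doubled ({{ and }});
# otherwise Python treats { ... } as a replacement field, not literal text.
prompt_fstring = f'''Question: {question}

{{
  "thought": "...",
  "answer": "..."
}}
'''

print(prompt_replace == prompt_fstring)  # True
```

Both produce identical prompts. `str.replace` keeps a JSON-heavy template readable because its braces can be written exactly as the model should see them.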

In [4]:
```python
# Schema for the structured CoT response: reasoning first, then the answer,
# so the model writes out its thinking before committing to a result.
class CoT(BaseModel):
    thought: str
    answer: str

# Compile the JSON schema into a GBNF grammar that constrains sampling
# (Pydantic v2: model_json_schema() replaces the deprecated schema_json()).
grammar = LlamaGrammar.from_json_schema(json.dumps(CoT.model_json_schema()))
res = model.create_completion(prompt, max_tokens=1000, grammar=grammar)
# print(json.loads(res["choices"][0]["text"]))
```

```text
from_string grammar:
space ::= space_1 
space_1 ::= [ ] | 
string ::= ["] string_5 ["] space 
string_3 ::= [^"\] | [\] string_4 
string_4 ::= ["\/bfnrt] | [u] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] 
string_5 ::= string_3 string_5 | 
root ::= [{] space ["] [t] [h] [o] [u] [g] [h] [t] ["] space [:] space string [,] space ["] [a] [n] [s] [w] [e] [r] ["] space [:] space string [}] space 

Llama.generate: prefix-match hit

llama_print_timings:        load time =     142.54 ms
llama_print_timings:      sample time =     472.41 ms /    79 runs   (    5.98 ms per token,   167.23 tokens per second)
llama_print_timings: prompt eval time =      47.81 m
```
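The `root` rule of the grammar above allows only an object with a `"thought"` string followed by an `"answer"` string. A grammar constrains which tokens can be sampled, but it cannot stop generation from being cut off by `max_tokens`. It also does not apply if the same prompt is run without a grammar. A post-hoc check like the one below is a rough stdlib-only substitute. It is a validator applied after generation, not grammar-constrained decoding, and the function name `parse_cot` is illustrative.

```python
import json
from typing import Tuple


def parse_cot(text: str) -> Tuple[str, str]:
    """Parse a CoT response and check the shape the grammar is meant to enforce:
    a JSON object whose keys are exactly "thought" then "answer", both strings."""
    data = json.loads(text)  # json.JSONDecodeError (a ValueError) on malformed or truncated JSON
    if not isinstance(data, dict) or list(data) != ["thought", "answer"]:
        raise ValueError(f"unexpected structure: {data!r}")
    if not all(isinstance(v, str) for v in data.values()):
        raise ValueError("both fields must be strings")
    return data["thought"], data["answer"]


print(parse_cot('{"thought": "23 - 20 = 3; 3 + 6 = 9", "answer": "9"}'))
# ('23 - 20 = 3; 3 + 6 = 9', '9')
```

The key-order check mirrors the grammar, which fixes `thought` before `answer` so the reasoning is generated before the answer.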

In [5]:
```python
# Parse the constrained JSON once and print both fields
data = json.loads(res["choices"][0]["text"])
thought = data["thought"]
answer = data["answer"]

print(f"Thought: {thought}\n\n")
print(f"Answer: {answer}")
```

```text
Thought: First, we need to find out how many apples they have left after using 20 for lunch. We subtract the used apples from the initial number of apples: 23 - 20 = 3. Then, we add the 6 new apples: 3 + 6.


Answer: 9
```


The answer is correct!
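The `answer` field can also be checked against the ground truth in code instead of by eye. This sketch assumes the result appears in the answer text as an integer and takes the last integer in the string. That heuristic and the helper name `extract_number` are illustrative and not part of the original notebook.

```python
import re


def extract_number(answer: str) -> int:
    """Return the last integer mentioned in a free-text answer (a simple heuristic)."""
    numbers = re.findall(r"-?\d+", answer)
    if not numbers:
        raise ValueError(f"no number found in {answer!r}")
    return int(numbers[-1])


# Ground truth for the cafeteria question: start with 23, use 20, buy 6.
expected = 23 - 20 + 6

print(extract_number("9") == expected)                           # True  (CoT answer)
print(extract_number("They still have 23 apples.") == expected)  # False (answer without CoT)
```

Run over a set of questions, a check like this turns the single comparison above into a rough accuracy measurement of prompting with and without CoT.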