# Chain of Thought (CoT)

Chain of thought is a prompting technique introduced in the paper ["Chain-of-Thought Prompting Elicits Reasoning in Large Language Models"](https://arxiv.org/abs/2201.11903). The prompt leads the model to generate a series of intermediate reasoning steps before its final answer, which the authors show improves the ability of LLMs to perform complex reasoning.
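
As a minimal sketch of what such a prompt can look like, the snippet below builds a one-shot prompt whose exemplar answer spells out its intermediate steps, reusing the tennis-ball exemplar from the paper. The helper name `build_cot_prompt` is an illustrative assumption; it comes from neither the paper nor outlines.

```python
# One worked exemplar: the answer walks through its reasoning before concluding.
EXEMPLAR_QUESTION = (
    "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?"
)
EXEMPLAR_ANSWER = (
    "Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. "
    "5 + 6 = 11. The answer is 11."
)


def build_cot_prompt(question: str) -> str:
    """Prepend the worked exemplar so the model imitates step-by-step answers."""
    return f"Q: {EXEMPLAR_QUESTION}\nA: {EXEMPLAR_ANSWER}\n\nQ: {question}\nA:"


print(build_cot_prompt(
    "The cafeteria had 23 apples. If they used 20 to make lunch "
    "and bought 6 more, how many apples do they have?"
))
```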

In this guide, we use [outlines](https://outlines-dev.github.io/outlines/) to apply chain of thought through structured output with the quantized `Hermes-2-Pro-Llama-3-8B`.

## Requirements

### Install llama-cpp-python and outlines

In [1]:
# RUN IT ONLY ONCE TO INSTALL THE REQUIREMENTS
# %pip install llama-cpp-python outlines

For detailed installation instructions, see [llama-cpp-python installation](https://llama-cpp-python.readthedocs.io/en/stable/) and [outlines installation](https://outlines-dev.github.io/outlines/installation/).

### Pull the model from HuggingFace

Download a GGUF model from HuggingFace [here](https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/tree/main), for example, the Q4_K_M one (it requires 4.92 GB):

In [2]:
# RUN IT ONLY ONCE TO DOWNLOAD THE GGUF MODEL, IN THIS CASE THE Q4_K_M
# !wget https://hf.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/resolve/main/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf

## Usage

### Chain of Thought

### Define Pydantic class

We first define a Pydantic class for a single reasoning step:

In [3]:
from pydantic import BaseModel, Field

class Reasoning_Step(BaseModel):
    reasoning_step: str = Field(..., description="Reasoning step")

We then define the Pydantic class for the full reasoning, which consists of a list of reasoning steps and a conclusion:

In [4]:
from typing import List

class Reasoning(BaseModel):
    reasoning: List[Reasoning_Step] = Field(..., description="List of reasoning steps")
    conclusion: str = Field(..., description="Conclusion")
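
To make the target structure concrete, here is a dependency-free sketch of the JSON shape these two classes describe. `has_reasoning_shape` is a hypothetical helper written for illustration only; it is not part of Pydantic or outlines, and it checks far less than Pydantic validation does.

```python
import json


def has_reasoning_shape(obj) -> bool:
    """Return True if obj looks like
    {"reasoning": [{"reasoning_step": str}, ...], "conclusion": str}."""
    if not isinstance(obj, dict):
        return False
    steps = obj.get("reasoning")
    if not isinstance(steps, list) or not isinstance(obj.get("conclusion"), str):
        return False
    return all(
        isinstance(step, dict) and isinstance(step.get("reasoning_step"), str)
        for step in steps
    )


payload = json.loads(
    '{"reasoning": [{"reasoning_step": "2 + 2 = 4."}], "conclusion": "The answer is 4."}'
)
print(has_reasoning_shape(payload))              # True
print(has_reasoning_shape({"conclusion": "4"}))  # False: "reasoning" is missing
```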

### Load the model

In [5]:
import llama_cpp
from llama_cpp import Llama
from outlines import generate, models

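llm = Llama(
    # Path to the downloaded GGUF file; adjust it to where you saved the model
    "/big_storage/llms/models/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf",
    tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained(
        "NousResearch/Hermes-2-Pro-Llama-3-8B"
    ),
    n_gpu_layers=-1,  # offload all layers to the GPU
    flash_attn=True,
    n_ctx=8192,  # context window size in tokens
    verbose=False
)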

model = models.LlamaCpp(llm)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [6]:
import warnings
warnings.filterwarnings("ignore", category=RuntimeWarning) # ignore runtime warnings

We build a regex from the JSON schema of the `Reasoning` Pydantic class. The model's output will be constrained to match this regex:

In [7]:
from outlines.integrations.utils import convert_json_schema_to_str
from outlines.fsm.json_schema import build_regex_from_schema

json_schema = Reasoning.model_json_schema()
schema_str = convert_json_schema_to_str(json_schema=json_schema)
regex_str = build_regex_from_schema(schema_str, whitespace_pattern=r" ")
regex_str

'\\{ "reasoning" : \\[ ((\\{ "reasoning_step" : "([^"\\\\\\x00-\\x1F\\x7F-\\x9F]|\\\\["\\\\])*" \\})(, (\\{ "reasoning_step" : "([^"\\\\\\x00-\\x1F\\x7F-\\x9F]|\\\\["\\\\])*" \\})){0,})? \\] , "conclusion" : "([^"\\\\\\x00-\\x1F\\x7F-\\x9F]|\\\\["\\\\])*" \\}'
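
The regex can be exercised with Python's `re` module to see which strings it admits. The sketch below rebuilds the pattern from pieces copied from the output above; the sample strings are hand-written for illustration and are not model outputs. Note how the fixed `whitespace_pattern=r" "` means that compact JSON, although valid, does not match.

```python
import re

# Pieces copied from the regex printed above.
STRING = r'"([^"\\\x00-\x1F\x7F-\x9F]|\\["\\])*"'
STEP = r'\{ "reasoning_step" : ' + STRING + r' \}'
REASONING_REGEX = (
    r'\{ "reasoning" : \[ ((' + STEP + r')(, (' + STEP + r')){0,})? \] , '
    r'"conclusion" : ' + STRING + r' \}'
)

# Hand-written samples (not model outputs).
valid = (
    '{ "reasoning" : [ { "reasoning_step" : "9.9 equals 9.90." }, '
    '{ "reasoning_step" : "9.90 is greater than 9.11." } ] , '
    '"conclusion" : "9.9 is bigger." }'
)
missing_conclusion = '{ "reasoning" : [ { "reasoning_step" : "9.9 equals 9.90." } ] }'
compact = '{"reasoning": [], "conclusion": "9.9 is bigger."}'

for sample in (valid, missing_conclusion, compact):
    print(re.fullmatch(REASONING_REGEX, sample) is not None)
```

Only the first sample matches: the second lacks the required `conclusion` field, and the third does not use the single-space separators the pattern demands.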

We then need to adapt our prompt to the [Hermes prompt format for JSON schema](https://github.com/NousResearch/Hermes-Function-Calling?tab=readme-ov-file#prompt-format-for-json-mode--structured-outputs):

In [8]:
def generate_hermes_prompt(question):
    return (
        "<|im_start|>system\n"
        "You are a world class AI model who answers questions in JSON with correct Pydantic schema. "
        "Here's the json schema you must adhere to:\n<schema>\n" + str(json_schema) + "\n</schema>"
        "\n<|im_start|>user\n" + question + "<|im_end|>"
        "\n<|im_start|>assistant\n"
    )

For a given `user_prompt`, we obtain the Hermes prompt:

In [9]:
user_prompt = "9.11 and 9.9 -- which is bigger?"
prompt = generate_hermes_prompt(user_prompt)
print(prompt)

<|im_start|>system
You are a world class AI model who answers questions in JSON with correct Pydantic schema. Here's the json schema you must adhere to:
<schema>
{'$defs': {'Reasoning_Step': {'properties': {'reasoning_step': {'description': 'Reasoning step', 'title': 'Reasoning Step', 'type': 'string'}}, 'required': ['reasoning_step'], 'title': 'Reasoning_Step', 'type': 'object'}}, 'properties': {'reasoning': {'description': 'List of reasoning steps', 'items': {'$ref': '#/$defs/Reasoning_Step'}, 'title': 'Reasoning', 'type': 'array'}, 'conclusion': {'description': 'Conclusion', 'title': 'Conclusion', 'type': 'string'}}, 'required': ['reasoning', 'conclusion'], 'title': 'Reasoning', 'type': 'object'}
</schema>
<|im_start|>user
9.11 and 9.9 -- which is bigger?<|im_end|>
<|im_start|>assistant



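The printed prompt leaves the system turn without a closing `<|im_end|>` and embeds the schema as a Python dict repr (single quotes) rather than as JSON. The sketch below shows one possible variant, assuming the model expects standard ChatML turn delimiters. The name `generate_chatml_json_prompt` and the small `demo_schema` are hypothetical, and switching to this variant changes the prompt, so the outputs shown in this guide may no longer be reproduced.

```python
import json


def generate_chatml_json_prompt(question: str, schema: dict) -> str:
    """Build a ChatML prompt with a closed system turn and a JSON-encoded schema."""
    return (
        "<|im_start|>system\n"
        "You are a world class AI model who answers questions in JSON with correct Pydantic schema. "
        "Here's the json schema you must adhere to:\n<schema>\n"
        + json.dumps(schema)
        + "\n</schema><|im_end|>\n"
        "<|im_start|>user\n" + question + "<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


# Hypothetical stand-in for Reasoning.model_json_schema()
demo_schema = {
    "type": "object",
    "properties": {"conclusion": {"type": "string"}},
    "required": ["conclusion"],
}
print(generate_chatml_json_prompt("9.11 and 9.9 -- which is bigger?", demo_schema))
```
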
We pass the regex built from the Pydantic class to `generate.regex`, then call the resulting generator with the Hermes prompt:

In [10]:
generator = generate.regex(model, regex_str)
response = generator(prompt, max_tokens=1024, temperature=0, seed=42)
response

'{ "reasoning" : [ { "reasoning_step" : "Both 9.11 and 9.9 are decimal numbers." }, { "reasoning_step" : "When comparing decimal numbers, we look at the numbers after the decimal point." }, { "reasoning_step" : "In this case, 9.11 has the number 1 after the decimal point, while 9.9 has the number 9." }, { "reasoning_step" : "Since 1 is greater than 9, 9.11 is greater than 9.9." } ], "conclusion" : "9.11 is bigger." }'

We obtain a series of intermediate reasoning steps as well as the conclusion:

In [11]:
import json

json_response = json.loads(response)
json_response["reasoning"]

[{'reasoning_step': 'Both 9.11 and 9.9 are decimal numbers.'},
 {'reasoning_step': 'When comparing decimal numbers, we look at the numbers after the decimal point.'},
 {'reasoning_step': 'In this case, 9.11 has the number 1 after the decimal point, while 9.9 has the number 9.'},
 {'reasoning_step': 'Since 1 is greater than 9, 9.11 is greater than 9.9.'}]

In [12]:
json_response["conclusion"]

'9.11 is bigger.'
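
The comparison itself can be checked independently of the model. A minimal sketch using Python's `decimal` module, which compares the exact decimal values:

```python
from decimal import Decimal

a, b = Decimal("9.11"), Decimal("9.9")

# Align the fractional parts: 9.9 is 9.90, and 90 hundredths > 11 hundredths.
print(a < b)      # True
print(max(a, b))  # 9.9
```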

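The 4th reasoning step, `Since 1 is greater than 9, 9.11 is greater than 9.9.`, is wrong, and so is the conclusion: 9.9 is bigger than 9.11. Structured output constrains the format of the answer, not its correctness, so we should probably give the model some examples for this particular task.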
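
One way to supply such examples is to insert worked question/answer turns before the real question. The sketch below assumes standard ChatML turn delimiters. `generate_few_shot_prompt` and the example pair are hypothetical, and whether this actually corrects the answer for this model has not been verified here.

```python
def generate_few_shot_prompt(
    system: str, examples: list[tuple[str, str]], question: str
) -> str:
    """Build a ChatML prompt with worked (question, answer) turns before the real question."""
    parts = [f"<|im_start|>system\n{system}<|im_end|>\n"]
    for example_question, example_answer in examples:
        parts.append(f"<|im_start|>user\n{example_question}<|im_end|>\n")
        parts.append(f"<|im_start|>assistant\n{example_answer}<|im_end|>\n")
    parts.append(f"<|im_start|>user\n{question}<|im_end|>\n<|im_start|>assistant\n")
    return "".join(parts)


# Hypothetical worked example, written in the same spaced JSON layout as the regex.
examples = [
    (
        "3.4 and 3.15 -- which is bigger?",
        '{ "reasoning" : [ { "reasoning_step" : "Write both numbers with two decimal '
        'places: 3.40 and 3.15." }, { "reasoning_step" : "Compare the fractional parts: '
        '40 is greater than 15." } ] , "conclusion" : "3.4 is bigger." }',
    )
]

few_shot_prompt = generate_few_shot_prompt(
    "You are a world class AI model who answers questions in JSON.",
    examples,
    "9.11 and 9.9 -- which is bigger?",
)
print(few_shot_prompt)
```

The resulting string can be passed to the generator in place of `prompt`.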