# What is Chain-of-Thought (CoT) prompting?

Chain-of-Thought (CoT) prompting is an NLP technique that guides a language model to produce intermediate reasoning steps before its final answer. It often improves both interpretability and accuracy on more complex queries.

**Difference between standard prompting and CoT prompting:**

- Standard Prompting: The model generates a direct, concise answer from the question (`input-output` pairs) without explanation or intermediate steps.

- CoT Prompting: The model walks through its thought process step by step (`input-CoT-output`) to reach an answer.

![image.png](https://miro.medium.com/v2/resize:fit:1100/format:webp/1*gwANOWpyuNX2xuqd0wgoHg.png)
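
The two formats can be sketched as prompt builders. This is a minimal illustration rather than code from any CoT implementation: the names `Exemplar`, `build_standard_prompt`, and `build_cot_prompt` are hypothetical, and the exemplar reuses the egg-price question from the inference prompt later in these notes.

```python
from dataclasses import dataclass


@dataclass
class Exemplar:
    question: str
    rationale: str  # intermediate reasoning steps (the "CoT" part)
    answer: str


def build_standard_prompt(exemplars, query):
    """input-output pairs only: the rationale is left out."""
    blocks = [f"Q: {e.question}\nA: {e.answer}" for e in exemplars]
    blocks.append(f"Q: {query}\nA:")
    return "\n\n".join(blocks)


def build_cot_prompt(exemplars, query):
    """input-CoT-output triples: the rationale precedes the answer."""
    blocks = [
        f"Q: {e.question}\nA: {e.rationale} The answer is {e.answer}."
        for e in exemplars
    ]
    blocks.append(f"Q: {query}\nA:")
    return "\n\n".join(blocks)


exemplars = [
    Exemplar(
        question="If a store sells a dozen eggs for $3, how much will 8 dozen eggs cost?",
        rationale="1 dozen costs $3, so 8 dozen cost 8 * 3 = $24.",
        answer="24",
    )
]
query = "A train travels 60 miles per hour. How far does it go in 3 hours?"

print(build_standard_prompt(exemplars, query))
print("-----")
print(build_cot_prompt(exemplars, query))
```

Both prompts end with an open `A:` so the model continues from there; only the CoT version shows the model what a worked rationale looks like.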

**The Innovation of CoT Prompting**

- **Transformer-based language models** like GPT-3 
    - (through `autoregressive modeling`) excel at:
        - predicting the next token in a sequence.
        - generating fluent text. 
    - are bad at: tasks requiring reasoning or multi-step problem-solving.

`Note:` `autoregressive modeling` generates text one token at a time, predicting each next word or token from the context of the previous ones.
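
As a rough sketch of that generation loop, the toy example below uses a hand-written probability table in place of a neural network. The table `NEXT_TOKEN_PROBS` and the function `greedy_generate` are made up for illustration; a real LLM conditions on the whole prefix rather than only the last token, and may sample instead of always taking the most probable token.

```python
# Hypothetical toy "model": next-token probabilities given the previous token only.
NEXT_TOKEN_PROBS = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "end": 0.2},
    "cat": {"sat": 0.7, "</s>": 0.3},
    "sat": {"</s>": 0.9, "the": 0.1},
}


def greedy_generate(start="<s>", max_tokens=10):
    """Append the most probable next token until </s> or the token budget is hit."""
    tokens = [start]
    for _ in range(max_tokens):
        probs = NEXT_TOKEN_PROBS.get(tokens[-1])
        if probs is None:  # no known continuation for this token
            break
        next_token = max(probs, key=probs.get)  # greedy choice
        tokens.append(next_token)
        if next_token == "</s>":
            break
    return tokens


print(greedy_generate())  # ['<s>', 'the', 'cat', 'sat', '</s>']
```

Each step commits to one token and never revisits it, which is why an early wrong choice can derail a multi-step answer.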

- **Standard prompting** further limits the model's reasoning ability.

- **Scaling model size** has improved performance:
    - But it does **NOT** inherently address the reasoning challenges under standard prompting.
    - For CoT prompting to work effectively, the model must have **reached a certain size** (roughly 100B parameters), at which point more complex reasoning abilities emerge.

- **CoT prompting** outperforms standard prompting across benchmarks:
    - arithmetic reasoning (GSM8K dataset)
    - commonsense reasoning (CSQA, StrategyQA)
    - symbolic reasoning (coin flip task)

# Basic knowledge on PaLM

The **Pathways Language Model (PaLM)** with 540B parameters:
- is based on the Transformer architecture, utilizing a decoder-only setup where each time step can attend to itself and previous steps.
- was one of the large models used to evaluate CoT prompting in the original CoT paper (Wei et al., 2022).
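
The decoder-only attention pattern described above, where each time step attends to itself and earlier steps, is usually expressed as a causal (lower-triangular) mask. The sketch below builds such a mask with plain Python lists. The function name `causal_mask` is hypothetical, and this shows only the masking pattern, not PaLM's actual implementation.

```python
def causal_mask(seq_len):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


for row in causal_mask(4):
    print(" ".join("1" if allowed else "0" for allowed in row))
# 1 0 0 0
# 1 1 0 0
# 1 1 1 0
# 1 1 1 1
```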

**Memorization**

Neural networks are prone to memorizing portions of their training data, a behavior linked to overfitting. 

Larger models tend to exhibit higher rates of memorization, particularly for rare examples encountered only once during training, which can reduce generalization to new data.

This presents a challenge in training large-scale models, despite efforts to mitigate it through techniques like regularization and data augmentation.


# What are the properties of CoT Prompting?

1. **Decomposition of Multi-Step Problems**: 
    - breaks down complex, multi-step problems into manageable intermediate steps. 
    - allocates additional computation to problems that require more intricate reasoning.

2. **Interpretability of Model Behavior**: 
    - provides a clearer, more interpretable view into how a model arrives at a particular answer. 
    - exposes the reasoning process and highlights potential points of error.
    - however, fully understanding the model’s computations supporting an answer remains a challenge.

3. **Versatility Across Tasks**: applies to math word problems, commonsense reasoning, and symbolic manipulation.

4. **Ease of Elicitation in Large Models**: elicited simply by including examples of such reasoning sequences in the prompt (few-shot prompting).

# Tree-of-Thought Prompting

<img src='https://miro.medium.com/v2/resize:fit:1100/format:webp/1*M3Oe3aEiVp4ikPXU0ugRAg.png' width='600'>

**Tree-of-Thought (ToT)**:
- is an improvement of CoT.
- is a tree search approach.
- explores multiple potential reasoning paths.
- uses state evaluation and backtracking to reach more accurate outcomes.

**Why is ToT needed?**
- LLMs (decoder models) generate token by token.
- They may follow an incorrect path when choosing subsequent words for the answers.
- They often lack the ability to recover from such missteps.

**ToT mechanism**
1. Generate multiple potential “first thoughts”.
2. Select the most promising option.
3. Generate alternative options for the next step.
4. Discard weak or incorrect answers.

**Cons of ToT**: it is slow, because the model is invoked multiple times.
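
The mechanism can be sketched as a small search loop. In real ToT both the proposal of thoughts and their evaluation are LLM calls. Here they are replaced by hypothetical stand-ins (`propose`, `evaluate`) on a toy task: pick numbers from a list so that they sum to a target. The sketch is a breadth-first variant that prunes weak branches and keeps the best few, rather than a faithful reproduction of any published ToT implementation.

```python
def propose(state, candidates):
    """Stand-in for the LLM proposing next thoughts: extend the partial solution."""
    return [state + [c] for c in candidates]


def evaluate(state, target):
    """Stand-in for the LLM state evaluator: higher is better, None is a dead end."""
    total = sum(state)
    if total > target:
        return None  # overshoot cannot recover because all candidates are positive
    return -(target - total)


def tree_of_thought(candidates, target, max_depth=4, beam_width=2):
    frontier = [[]]  # root of the tree: empty partial solution
    for _ in range(max_depth):
        scored = []
        for state in frontier:
            for child in propose(state, candidates):
                score = evaluate(child, target)
                if score is None:
                    continue  # discard an incorrect branch
                if score == 0:
                    return child  # reached the target exactly
                scored.append((score, child))
        if not scored:
            return None  # every branch was pruned
        scored.sort(key=lambda pair: pair[0], reverse=True)
        frontier = [child for _, child in scored[:beam_width]]  # keep the most promising
    return None


print(tree_of_thought(candidates=[2, 5, 7], target=16))  # [7, 7, 2]
```

Every call to `propose` and `evaluate` would be a model invocation in practice, which is where the extra latency and cost come from.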

# Arithmetic Reasoning in CoT

<img src='https://miro.medium.com/v2/resize:fit:1100/format:webp/1*931C0NW9MbjzGNAAbxkong.png' width='600'>

# Commonsense Reasoning in CoT

e.g., 
`Query:` “Sammy wants to go where there are many people. Where might he go? 
(a) desert, (b) racetrack, (c) populated areas, (d) apartment” 

CoT prompting would lead the model to reason step-by-step as follows:

1. **Initial understanding**: “Sammy wants to go to a place with many people.”
2. **Evaluation of options**: “The desert and racetrack are not places with many people. Apartments have some people but not many in comparison.”
3. **Conclusion**: “Populated areas have many people, so the answer is (c) populated areas.”

This process breaks down the reasoning into intermediate steps that make it easier for the model to handle tasks that require more than just recalling facts.

<img src='https://miro.medium.com/v2/resize:fit:1100/format:webp/1*8M6MG_9rQZE2HyqhT_emMA.png' width='600'>
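
For multiple-choice tasks like this one, the prompt has to list the options and the final choice has to be parsed back out of the reasoning text. The sketch below shows one way to do both. The helper names, the `Answer choices:` layout, and the `answer is (x)` pattern are assumptions for illustration, not a format prescribed by the benchmark.

```python
import re


def format_multiple_choice(question, options):
    """Render a question and lettered options, ending with a step-by-step cue."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    choices = ", ".join(f"({letters[i]}) {opt}" for i, opt in enumerate(options))
    return f"Q: {question}\nAnswer choices: {choices}\nA: Let's think step by step."


def extract_choice(model_output):
    """Return the letter from the last 'answer is (x)' pattern, or None if absent."""
    matches = re.findall(r"answer is \(([a-z])\)", model_output, flags=re.IGNORECASE)
    return matches[-1].lower() if matches else None


prompt = format_multiple_choice(
    "Sammy wants to go where there are many people. Where might he go?",
    ["desert", "racetrack", "populated areas", "apartment"],
)
# Reasoning text adapted from the walkthrough above, standing in for a real model response.
sample_output = "Populated areas have many people, so the answer is (c) populated areas."

print(prompt)
print(extract_choice(sample_output))  # c
```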

# Symbolic Reasoning in CoT

Symbolic reasoning involves tasks such as performing operations on variables, manipulating sequences of symbols, or following abstract rules.

e.g., `Query`: Concatenate the last letters of the words ‘Apple’ and ‘Banana’.

With CoT prompting, the model is guided to think through each step explicitly:

1. Step 1: “The last letter of ‘Apple’ is ‘e’.”
2. Step 2: “The last letter of ‘Banana’ is ‘a’.”
3. Step 3: “Concatenate ‘e’ and ‘a’ to form ‘ea’.”
4. Conclusion: “The answer is ‘ea’.”
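
Because the task is fully rule-based, the same steps can be produced programmatically, for example to write few-shot exemplars or to check a model's answer. The function name `last_letter_concat_steps` is hypothetical.

```python
def last_letter_concat_steps(words):
    """Return (reasoning_steps, answer) for the last-letter concatenation task."""
    steps, letters = [], []
    for word in words:
        letter = word[-1]
        letters.append(letter)
        steps.append(f"The last letter of '{word}' is '{letter}'.")
    answer = "".join(letters)
    quoted = " and ".join(f"'{letter}'" for letter in letters)
    steps.append(f"Concatenate {quoted} to form '{answer}'.")
    return steps, answer


steps, answer = last_letter_concat_steps(["Apple", "Banana"])
for i, step in enumerate(steps, start=1):
    print(f"Step {i}: {step}")
print(f"The answer is '{answer}'.")  # The answer is 'ea'.
```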

# Challenges and Limitations

- Generating step-by-step reasoning increases the number of tokens the model must process, which requires more computational resources and time.
- CoT is largely ineffective for smaller, less resource-intensive models (< 100B params).
- CoT is not infallible: a generated reasoning chain can still contain errors.
- CoT's effectiveness in abstract reasoning tasks remains an open question.

# Concise Chain-of-Thought

Concise Chain of Thought (CCoT) is a combination of **Chain of Thought (CoT) prompting** and **Concise prompting**.

**Concise Prompting**
- instructs models to produce answers that are as brief as possible without sacrificing the essential reasoning or content required for a correct solution.
- `Pros:` reduces the number of tokens produced, which directly correlates with lower computational and financial costs.
- `Cons:` can negatively impact performance, especially in tasks that benefit from more elaborate reasoning steps.

**Concise Chain of Thought (CCoT)**
- combines the **step-by-step** problem-solving approach of CoT with the efficiency-oriented approach of concise prompting.
- Few-shot examples, provided to the model during prompting in CCoT, are written **concisely**.

<img src='https://miro.medium.com/v2/resize:fit:1100/format:webp/1*8TvudEI1KmwVT9x-1z7m7Q.jpeg' alt='CCoT' width='300'>
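
A minimal sketch of the difference between a CoT and a CCoT prompt, assuming the conciseness is applied in two places: a short instruction and a tersely written exemplar. The instruction wording, the helper names, and the whitespace-based `approx_tokens` count are illustrative assumptions. A real tokenizer would report different numbers.

```python
VERBOSE_RATIONALE = (
    "First, the train travels for 3 hours from 3 PM to 6 PM. "
    "In each hour, it travels 60 miles. "
    "So, in 3 hours, it will have traveled 60 * 3 = 180 miles."
)
CONCISE_RATIONALE = "3 PM to 6 PM is 3 hours. 60 * 3 = 180 miles."


def build_prompt(question, rationale, answer, query, concise=False):
    """One-shot prompt; concise=True adds a brevity instruction (assumed wording)."""
    instruction = "Think step by step."
    if concise:
        instruction += " Be concise."
    return (
        f"{instruction}\n\n"
        f"Q: {question}\nA: {rationale} The answer is {answer}.\n\n"
        f"Q: {query}\nA:"
    )


def approx_tokens(text):
    """Crude whitespace proxy for token count."""
    return len(text.split())


question = "If a train leaves at 3 PM and travels 60 miles per hour, how far will it have traveled by 6 PM?"
query = "If a store sells a dozen eggs for $3, how much will 8 dozen eggs cost?"

cot = build_prompt(question, VERBOSE_RATIONALE, "180", query)
ccot = build_prompt(question, CONCISE_RATIONALE, "180", query, concise=True)
print(approx_tokens(cot), approx_tokens(ccot))
```

The saving on the prompt side is small. The larger effect is expected on the output side, since the model tends to imitate the length of the exemplar's rationale.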

**Benefits of CCoT**
- Reduced Token Costs
- Faster Responses
- Efficiency without Significant Performance Drop

**Limitations of CCoT**
- Performance Penalty in Specific Domains: CCoT is not suitable for problems that require detailed reasoning steps.
- Dependence on Model Sophistication.
- Generalization Across Models: CCoT has mainly been tested on GPT models and has not yet been validated on other model families.

# Basic inference code on CoT reasoning

In [1]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch


Load the model and tokenizer

In [3]:
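model_name = 'EleutherAI/gpt-neo-125M'  # small demo model; a larger open checkpoint such as "EleutherAI/gpt-neox-20b" can be swapped in (GPT-3/GPT-4 are API-only and cannot be loaded this way)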
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

Define the input prompt with CoT reasoning

In [4]:
prompt = """\
Q: There are 3 apples, and you take away 2. How many apples do you have now? 
Let's think step by step. 
First, there are 3 apples in total.
Then, you take away 2 apples.
So, you have taken 2 apples with you.
Therefore, the answer is 2.
A: 2

Q: If a train leaves the station at 3 PM and travels 60 miles per hour, how far will it have traveled by 6 PM? 
Let's break it down step by step.
First, the train travels for 3 hours from 3 PM to 6 PM.
In each hour, it travels 60 miles.
So, in 3 hours, it will have traveled 60 * 3 = 180 miles.
Therefore, the train will have traveled 180 miles by 6 PM.
A: 180

Q: If a store sells a dozen eggs for $3, how much will 8 dozen eggs cost?
Let's solve this step by step.
First, 1 dozen eggs costs $3.
Therefore, 8 dozen eggs will cost 8 * 3 = $24.
So, the total cost for 8 dozen eggs is $24.
A: 24

Q: John has twice as many brothers as sisters. His sister Ann has the same number of brothers as sisters. How many brothers and sisters are in the family? 
Let's reason step by step.
"""

Encode the input prompt and generate the model's response

In [5]:
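device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)  # the model must sit on the same device as the inputs
inputs = tokenizer(prompt, return_tensors="pt").to(device)
# max_new_tokens limits only the continuation; max_length would also count the long few-shot prompt
output = model.generate(**inputs, max_new_tokens=100, do_sample=False)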

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


Decode the generated response

In [6]:
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
generated_text

"Q: There are 3 apples, and you take away 2. How many apples do you have now? \nLet's think step by step. \nFirst, there are 3 apples in total.\nThen, you take away 2 apples.\nSo, you have taken 2 apples with you.\nTherefore, the answer is 2.\nA: 2\n\nQ: If a train leaves the station at 3 PM and travels 60 miles per hour, how far will it have traveled by 6 PM? \nLet's break it down step by step.\nFirst, the train travels for 3 hours from 3 PM to 6 PM.\nIn each hour, it travels 60 miles.\nSo, in 3 hours, it will have traveled 60 * 3 = 180 miles.\nTherefore, the train will have traveled 180 miles by 6 PM.\nA: 180\n\nQ: If a store sells a dozen eggs for $3, how much will 8 dozen eggs cost?\nLet's solve this step by step.\nFirst, 1 dozen eggs costs $3.\nTherefore, 8 dozen eggs will cost 8 * 3 = $24.\nSo, the total cost for 8 dozen eggs is $24.\nA: 24\n\nQ: John has twice as many brothers as sisters. His sister Ann has the same number of brothers as sisters. How many brothers and sisters ar

# Reference

[Chain-Of-Thought ( CoT ) in Large Language Models prompting and Concise CoT with Code implementation using Python and PyTorch](https://medium.com/@devmallyakarar/chain-of-thought-cot-in-large-language-models-prompting-and-concise-cot-with-code-82821f9a832d)