# LLM Reasoning Summary

LLMs generate responses by predicting likely next tokens; any apparent reasoning emerges from patterns learned from the training data. Common prompting and decoding techniques used to elicit reasoning include:

- **Zero-shot:** Answering tasks without examples.  
- **Few-shot:** Learning from a few examples in the prompt.  
- **Chain-of-Thought (CoT):** Breaking problems into intermediate reasoning steps.  
- **Analogical Reasoning:** Solving new problems using similarities to known cases.  
- **Decoding Strategies:** Techniques like greedy, beam search, or sampling to generate outputs.  
- **Self-Consistency:** Sampling multiple reasoning paths and selecting the most consistent answer.  
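
The few-shot, chain-of-thought, and self-consistency ideas listed above can be sketched without a model. The following is a minimal illustration, not a reference implementation: the helper names (`build_few_shot_cot_prompt`, `extract_final_answer`, `self_consistent_answer`), the "The answer is N" convention, and the sampled completions are all assumptions made for this example.

```python
import re
from collections import Counter


def build_few_shot_cot_prompt(examples, question):
    """Join worked (question, reasoning, answer) examples, then append the new question."""
    parts = []
    for q, reasoning, answer in examples:
        parts.append(f"Q: {q}\nA: {reasoning} The answer is {answer}.")
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


def extract_final_answer(completion):
    """Return the number following 'The answer is', or None if it is absent."""
    match = re.search(r"The answer is (-?\d+(?:\.\d+)?)", completion)
    return match.group(1) if match else None


def self_consistent_answer(completions):
    """Majority vote over the final answers parsed from several sampled completions."""
    answers = [a for a in map(extract_final_answer, completions) if a is not None]
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


# One worked example makes this a few-shot (here one-shot) chain-of-thought prompt.
examples = [("I have 4 pens and lose 1. How many pens do I have?", "4 - 1 = 3.", "3")]
prompt = build_few_shot_cot_prompt(
    examples, "I have 10 apples, eat 3, then buy 5. How many apples do I have?"
)
print(prompt)

# Hypothetical completions standing in for several sampled model outputs.
sampled = [
    "10 - 3 = 7. 7 + 5 = 12. The answer is 12.",
    "10 + 5 = 15. The answer is 15.",
    "10 - 3 = 7, then 7 + 5 = 12. The answer is 12.",
]
print(self_consistent_answer(sampled))
```

In practice the completions would come from sampling the same prompt several times; the vote only helps when the model reaches the right answer more often than any single wrong one.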

**Limitations:**  
- **Irrelevant Context:** May be distracted by unrelated information in the prompt, especially in long prompts.  
- **Premise Order Sensitivity:** Changing the order of facts or statements can lead to different or incorrect conclusions.  
- **Self-Correction Challenges:** May fail to identify or correct mistakes without expli
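
One way to probe premise order sensitivity is to present the same facts in every possible order and compare the answers. The sketch below only builds the prompt variants; the helper name `premise_order_variants` and the example premises are hypothetical, chosen so that the correct answer does not depend on the order.

```python
from itertools import permutations


def premise_order_variants(premises, question, prefix="Step by step: "):
    """Return one prompt per ordering of the premises, with the question always last."""
    return [
        prefix + " ".join(ordering) + " " + question
        for ordering in permutations(premises)
    ]


# Hypothetical premises whose logical content is the same in any order.
premises = [
    "Alice is taller than Bob.",
    "Bob is taller than Carol.",
    "Carol is taller than Dan.",
]
question = "Who is the shortest?"

variants = premise_order_variants(premises, question)
for variant in variants:
    print(variant)
```

Each variant could then be passed to the model; differing answers across variants would indicate sensitivity to premise order.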

# LLM Reasoning Experiments

In [7]:
from transformers import pipeline

# Initialize text-generation pipeline
gen = pipeline("text-generation", model="gpt2")

Device set to use mps:0


### Reasoning Experiment 1: Simple Math

In [10]:
prompt1 = "Step by step: If I have 10 apples and eat 3, then buy 5 more, how many apples do I have?"
output1 = gen(prompt1, 
              max_new_tokens=60,
              truncation=True,
              pad_token_id=50256
             )[0]['generated_text']

print("Prompt 1:", prompt1)
print("Output 1:", output1)

Prompt 1: Step by step: If I have 10 apples and eat 3, then buy 5 more, how many apples do I have?
Output 1: Step by step: If I have 10 apples and eat 3, then buy 5 more, how many apples do I have?

You can make this math work by going to the calculator on this site and just multiplying by 10 until you get to 10.

How many apples per apple should I buy?

You can buy more apples if you want. The price must be in the range of $1.
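
The output above starts with the prompt because the text-generation pipeline returns the prompt together with the continuation by default. A small helper can separate the two for inspection. This is a sketch; `continuation_only` is a hypothetical name, and the generated text below is shortened from Output 1.

```python
def continuation_only(prompt, generated_text):
    """Return only the newly generated text when the output begins with the prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    return generated_text


prompt = "Step by step: If I have 10 apples and eat 3, then buy 5 more, how many apples do I have?"
# Shortened copy of Output 1: the prompt followed by the model's continuation.
generated = prompt + "\n\nYou can make this math work by going to the calculator on this site"

print(continuation_only(prompt, generated))
```

Passing `return_full_text=False` to the pipeline call should have a similar effect, assuming the installed `transformers` version supports that argument for text generation.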


### Reasoning Experiment 2: Tricky Math

In [14]:
prompt2 = "Step by step: I have 2 bananas. I eat 2 and then buy 4. How many bananas do I have now?"
output2 = gen(prompt2, 
              max_new_tokens=60,
              truncation=True,
              pad_token_id=50256
             )[0]['generated_text']
print("\nPrompt 2:", prompt2)
print("Output 2:", output2)


Prompt 2: Step by step: I have 2 bananas. I eat 2 and then buy 4. How many bananas do I have now?
Output 2: Step by step: I have 2 bananas. I eat 2 and then buy 4. How many bananas do I have now? I'm not sure because I don't know, but it's probably 2.5. My friend gave me these and it was a really good banana. It's a little strange that I'm not able to eat them, especially when I'm like 5 or 6 days in the fridge. I was


### Reasoning Experiment 3: Text Explanation Prompt

In [15]:
prompt3 = "Step by step: Explain why the sky is blue in simple terms."
output3 = gen(prompt3, 
              max_new_tokens=100,
              truncation=True,
              pad_token_id=50256
             )[0]['generated_text']
print("\nPrompt 3:", prompt3)
print("Output 3:", output3)


Prompt 3: Step by step: Explain why the sky is blue in simple terms.
Output 3: Step by step: Explain why the sky is blue in simple terms.

You'll want to be able to see the sky with both your eyes. The sky is blue because the Earth is blue, and the sky is blue because it is deep blue. If you can see the sky with both eyes, then you can see the sky with both eyes.

The blue sky is because the Earth is blue. The Earth is blue because that is the Earth.

If you really want to see that blue sky with both eyes, see the Earth with both


## Observations
**Which prompts worked?**

Prompt 3 (Sky explanation) – Partially worked
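- The output stays on topic: it attempts to explain the sky and mentions “blue”, “Earth”, and “eyes”.
- The explanation is circular and repetitive (“the sky is blue because the Earth is blue”) and scientifically wrong; the actual cause is Rayleigh scattering of sunlight in the atmosphere.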

**Which failed?**

Prompt 1 & Prompt 2 (Math) – Failed
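- Neither output reaches the correct result: 10 − 3 + 5 = 12 for Prompt 1 and 2 − 2 + 4 = 4 for Prompt 2. Output 2 guesses “2.5”.
- The prompt appears at the start of each output because the pipeline returns the prompt together with the continuation by default; the continuation itself is irrelevant text.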
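
The arithmetic behind both math prompts, and a rough check of whether a continuation even mentions the right number, can be sketched as follows. The helper names are hypothetical, and the number-mention check is a weak heuristic rather than a real evaluation metric. The continuations are copied from Outputs 1 and 2 with the prompt removed, since the prompt itself contains digits.

```python
import re


def expected_count(start, changes):
    """Apply signed changes (negative = eaten, positive = bought) to the starting count."""
    return start + sum(changes)


def mentions_number(text, target):
    """Check whether the target value appears among the numbers written in the text."""
    numbers = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", text)]
    return float(target) in numbers


apples = expected_count(10, [-3, +5])   # Prompt 1: 10 - 3 + 5
bananas = expected_count(2, [-2, +4])   # Prompt 2: 2 - 2 + 4

continuation1 = (
    "You can make this math work by going to the calculator on this site and just "
    "multiplying by 10 until you get to 10. How many apples per apple should I buy? "
    "You can buy more apples if you want. The price must be in the range of $1."
)
continuation2 = (
    "I'm not sure because I don't know, but it's probably 2.5. My friend gave me these "
    "and it was a really good banana. It's a little strange that I'm not able to eat "
    "them, especially when I'm like 5 or 6 days in the fridge. I was"
)

print(apples, mentions_number(continuation1, apples))
print(bananas, mentions_number(continuation2, bananas))
```

Neither continuation contains its expected answer, which matches the "Failed" verdict above.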
  
**Insights on why reasoning fails in GPT-2 (or smaller models)**
- Small base models like GPT-2 are trained only to continue text, so they imitate the surface form of an answer without carrying out the calculation or logic.
- Prompt engineering (here, the “Step by step:” prefix) helps a little, but model limitations dominate.
- This motivates agents, tool use, and larger LLMs for reasoning tasks.
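
To illustrate the tool-use idea, the sketch below shows a minimal calculator that an agent could call instead of asking the model to do arithmetic itself. It assumes the model (or a parser around it) has already turned the word problem into an expression string; the `calculate` function is hypothetical and supports only basic arithmetic.

```python
import ast
import operator

# Only these binary operators are allowed in expressions.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculate(expression):
    """Evaluate a basic arithmetic expression without using eval()."""

    def _eval(node):
        # Plain int or float literals (bool is deliberately excluded).
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        raise ValueError("unsupported expression")

    return _eval(ast.parse(expression, mode="eval").body)


# Expressions an agent might extract from Prompt 1 and Prompt 2.
print(calculate("10 - 3 + 5"))
print(calculate("2 - 2 + 4"))
```

Walking the syntax tree rather than calling `eval` keeps the tool restricted to arithmetic, so text produced by a model cannot trigger arbitrary code.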