# Asking LLMs to solve a problem

This notebook asks several LLMs to solve a few problems in order to check their reasoning ability.

In [1]:
# Importing the Libraries

import requests
import json
import time
import re
from typing import List
from IPython.display import Markdown, display, update_display

In [2]:
OLLAMA_URL = "http://localhost:11434/api/chat"

In [3]:
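def model_test(model_name: str, prompts: List[dict]) -> None:
    """Send the prompts to a local Ollama model, then display its answer and the elapsed time."""
    payload = {
        "model": model_name,
        "messages": prompts,
        "think": True,    # Enable the model's thinking/reasoning mode
        "stream": False,  # Disable streaming so the full reply arrives as a single JSON object
    }
    start_time = time.time()
    response = requests.post(OLLAMA_URL, json=payload)
    response.raise_for_status()  # Fail early on HTTP errors (for example, a model that is not pulled)
    reply = response.json()["message"]["content"]
    # Remove any inline <think>...</think> block so only the final answer is displayed
    reply = re.sub(r"<think>.*?</think>", "", reply, flags=re.DOTALL).strip()
    print(f"Model Name: {model_name}\n")
    display(Markdown(reply))
    print("Total Time Taken: ", time.time()-start_time, "\n\n")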
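
The tag-stripping step in `model_test` can be tried on its own. The sketch below is illustrative only: `strip_think` and `sample` are hypothetical names, and whether a model actually places its reasoning inline in `<think>` tags may depend on the model and the Ollama version.

```python
import re

# Same pattern as in model_test; DOTALL lets "." match newlines inside the block.
THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL)


def strip_think(reply: str) -> str:
    """Remove <think>...</think> spans and trim surrounding whitespace."""
    return THINK_BLOCK.sub("", reply).strip()


# Hypothetical reply text, used only to illustrate the behaviour.
sample = "<think>\nFour outcomes: HH, HT, TH, TT.\n</think>\n\n2/3"
print(strip_think(sample))  # 2/3
```

The non-greedy `.*?` matters here: with a greedy `.*`, two separate `<think>` blocks in one reply would be merged and any answer text between them would be deleted as well.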

In [5]:
# Only some Ollama models support reasoning, so we pick those models for these tests

model_list = ["qwen3", "deepseek-r1:14b", "gpt-oss"] # names as shown by `ollama list`
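
As an alternative to copying names from `ollama list` by hand, the installed models can be read from the local server. This is a minimal sketch, assuming the Ollama server exposes `GET /api/tags` and returns a JSON object with a `models` list whose entries carry a `name` field; the helper names `extract_model_names` and `fetch_local_models` are hypothetical. It does not tell you which models support reasoning, so that selection still has to be made manually.

```python
import json
from urllib.request import urlopen


def extract_model_names(tags_response: dict) -> list:
    """Pull the model names out of a parsed /api/tags response."""
    return [entry["name"] for entry in tags_response.get("models", [])]


def fetch_local_models(base_url: str = "http://localhost:11434") -> list:
    """Ask the local Ollama server which models are installed (needs a running server)."""
    with urlopen(f"{base_url}/api/tags") as response:
        return extract_model_names(json.load(response))


# Offline illustration with a hand-written response of the assumed shape.
example = {"models": [{"name": "qwen3:latest"}, {"name": "deepseek-r1:14b"}]}
print(extract_model_names(example))  # ['qwen3:latest', 'deepseek-r1:14b']
```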

### Training-time vs inference-time scaling

In [6]:
easy_puzzle = [
    {"role": "user", "content": 
        "You toss 2 coins. One of them is heads. What's the probability the other is tails? Answer with the probability only."},
]

In [7]:
for model in model_list:
    model_test(model, easy_puzzle)

Model Name: qwen3



To determine the probability that the *other* coin is tails, given that **one of the two coins is heads**, we must carefully analyze the possible outcomes and the information provided.

---

### Step 1: Understand the Sample Space

When two fair coins are tossed, the possible outcomes are:

- **HH** (both heads)
- **HT** (first is heads, second is tails)
- **TH** (first is tails, second is heads)
- **TT** (both tails)

Each outcome is equally likely, with a probability of $ \frac{1}{4} $.

---

### Step 2: Interpret the Given Information

The problem states: *“One of them is heads.”* This is interpreted as **at least one coin is heads**, not that **exactly one** is heads or that **a specific coin** is heads. Therefore, the outcome **TT** is eliminated, and we're left with:

- **HH**
- **HT**
- **TH**

These are the only outcomes that satisfy the condition that **at least one coin is heads**.

---

### Step 3: Determine the Favorable Outcomes

We are to find the probability that the **other** coin is tails. That is, we want the probability that **exactly one** of the coins is heads (since if one is heads, the other must be tails to satisfy the condition).

From the reduced sample space:

- **HT** → one head, one tail
- **TH** → one head, one tail
- **HH** → two heads

So, **two out of the three** outcomes (HT and TH) result in the other coin being tails.

---

### Step 4: Compute the Probability

The probability that the other coin is tails, given that at least one is heads, is:

$$
\frac{\text{Number of favorable outcomes}}{\text{Total number of possible outcomes}} = \frac{2}{3}
$$

---

### Final Answer

$$
\boxed{\dfrac{2}{3}}
$$

Total Time Taken:  604.8166105747223 


Model Name: deepseek-r1:14b



To determine the probability that one coin is Heads and the other is Tails when at least one coin shows Heads, follow these steps:

1. **Identify All Possible Outcomes:**
   
   When tossing two coins, there are four possible outcomes:
   - Heads & Heads (HH)
   - Heads & Tails (HT)
   - Tails & Heads (TH)
   - Tails & Tails (TT)

2. **Exclude the Outcome Where Both Are Tails:**
   
   Since one of the coins is known to be Heads, we can eliminate the outcome where both coins are Tails (TT). This leaves us with three possible outcomes:
   - HH
   - HT
   - TH

3. **Determine Favorable Outcomes:**
   
   We're interested in cases where one coin is Heads and the other is Tails. These are:
   - HT
   - TH

4. **Calculate the Probability:**
   
   The number of favorable outcomes is 2, and the total possible outcomes after eliminating TT are 3.

   \[
   \text{Probability} = \frac{\text{Number of Favorable Outcomes}}{\text{Total Possible Outcomes}} = \frac{2}{3}
   \]

**Final Answer:**

\boxed{\dfrac{2}{3}}

Total Time Taken:  133.46091604232788 


Model Name: gpt-oss



2/3

Total Time Taken:  63.571455240249634 
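
The 2/3 answer can be checked by brute-force enumeration. This sketch assumes the same reading the models used, namely that "one of them is heads" means "at least one coin is heads"; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of two fair coin tosses: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))

# Condition: at least one coin shows heads (this removes TT).
at_least_one_head = [o for o in outcomes if "H" in o]

# Event: among those, the other coin is tails (HT or TH).
other_is_tails = [o for o in at_least_one_head if "T" in o]

probability = Fraction(len(other_is_tails), len(at_least_one_head))
print(probability)  # 2/3
```

Under the different reading "a specific coin is heads", the conditioning set would be only two outcomes and the result would be 1/2, which is why the interpretation step in the qwen3 answer matters.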




### Testing the models on a slightly harder problem

In [8]:
hard = """
On a bookshelf, two volumes of Pushkin stand side by side: the first and the second.
The pages of each volume together have a thickness of 2 cm, and each cover is 2 mm thick.
A worm gnawed (perpendicular to the pages) from the first page of the first volume to the last page of the second volume.
What distance did it gnaw through?
"""
hard_puzzle = [
    {"role": "user", "content": hard}
]

In [9]:
for model in model_list:
    model_test(model, hard_puzzle)

Model Name: qwen3



To determine how far the worm gnawed, we need to consider the structure of the two volumes and the path it took.

---

### **Understanding the Structure of Each Volume**

Each volume is composed of:

- **Pages**: 2 cm thick
- **Front Cover**: 2 mm thick
- **Back Cover**: 2 mm thick

Thus, the total thickness of each volume is:
$$
2\ \text{cm} + 2\ \text{mm} + 2\ \text{mm} = 24\ \text{mm}
$$

---

### **Path of the Worm**

The worm starts at the **first page of the first volume**, which is **after the front cover** of that volume.

It ends at the **last page of the second volume**, which is **before the back cover** of that volume.

So, the worm's path includes:

1. **All the pages of the first volume**: 2 cm (20 mm)
2. **The back cover of the first volume**: 2 mm
3. **The front cover of the second volume**: 2 mm
4. **All the pages of the second volume**: 2 cm (20 mm)

---

### **Calculating the Total Distance**

Adding up these distances:
$$
20\ \text{mm} + 2\ \text{mm} + 2\ \text{mm} + 20\ \text{mm} = 44\ \text{mm}
$$

Converting to centimeters:
$$
44\ \text{mm} = 4.4\ \text{cm}
$$

---

### **Conclusion**

The worm gnawed through a total of:

$$
\boxed{4.4}
$$

**centimeters.**

Total Time Taken:  774.0899741649628 


Model Name: deepseek-r1:14b



To determine the distance from the first page of the first book to the last page of the second book on a shelf where they are placed side by side, we need to consider the structure of each book. Each book has a front cover of 2 mm, a back cover of 2 mm, and pages that are 20 mm thick.

1. **Breakdown of each book:**
   - Front cover: 2 mm
   - Pages: 20 mm
   - Back cover: 2 mm

2. **Positioning on the shelf:**
   - The first book occupies positions 0 to 24 mm.
     - Front cover: 0-2 mm
     - Pages: 2-22 mm
     - Back cover: 22-24 mm
   - The second book starts at position 24 mm and occupies positions 24 to 48 mm.
     - Front cover: 24-26 mm
     - Pages: 26-46 mm
     - Back cover: 46-48 mm

3. **Distance calculation:**
   - The first page of the first book is at position 2 mm.
   - The last page of the second book is at position 46 mm.
   - The distance between these two points is \(46 - 2 = 44\) mm.

Thus, the total distance from the first page of the first book to the last page of the second book is \(\boxed{44}\) mm.

Total Time Taken:  1973.7783906459808 


Model Name: gpt-oss



The worm travels only through the “width” (thickness) of the books.

- Volume 1 pages  = 2 cm = 20 mm  
- Back cover of Volume 1 = 2 mm  
- Front cover of Volume 2 = 2 mm  
- Volume 2 pages  = 2 cm = 20 mm  

Adding those parts:

\(20\text{ mm} + 2\text{ mm} + 2\text{ mm} + 20\text{ mm} = 44\text{ mm}\).

So the worm gnawed **44 mm (4.4 cm)** in total.

Total Time Taken:  255.9420030117035 
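
All three models add up both page blocks and arrive at 44 mm. The answer usually given for this classic puzzle is 4 mm, because of how books sit on a shelf. The sketch below works that out under assumptions the prompt does not state explicitly: the volumes read left to right, stand upright in order (volume 1 to the left of volume 2) with spines facing out, so each volume's front cover is on its right side. The names `shelved_volume`, `COVER_MM` and `PAGES_MM` are illustrative.

```python
COVER_MM = 2   # thickness of one cover
PAGES_MM = 20  # thickness of one volume's page block (2 cm)


def shelved_volume(n: int) -> list:
    """Layers of an upright volume, left to right, under the assumed shelving:
    back cover, page block (last page on the left, first page on the right), front cover."""
    return [
        (f"v{n} back cover", COVER_MM),
        (f"v{n} pages", PAGES_MM),
        (f"v{n} front cover", COVER_MM),
    ]


shelf = shelved_volume(1) + shelved_volume(2)
names = [name for name, _ in shelf]

# The first page of volume 1 is at the right edge of its page block,
# and the last page of volume 2 is at the left edge of its page block,
# so the worm only crosses the layers strictly between the two page blocks.
start = names.index("v1 pages") + 1
end = names.index("v2 pages")
distance_mm = sum(thickness for _, thickness in shelf[start:end])
print(distance_mm)  # 4
```

Under this layout the worm passes only through the front cover of volume 1 and the back cover of volume 2. The models' 44 mm corresponds to the opposite layout, with the first page of volume 1 on the far left.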




### A spicy challenge to test the competitive spirit

In [10]:
dilemma_prompt = """
You and a partner are contestants on a game show. You're each taken to separate rooms and given a choice:
Cooperate: Choose "Share" — if both of you choose this, you each win $1,000.
Defect: Choose "Steal" — if one steals and the other shares, the stealer gets $2,000 and the sharer gets nothing.
If both steal, you both get nothing.
Do you choose to Steal or Share? Pick one.
"""

dilemma = [
    {"role": "user", "content": dilemma_prompt},
]

In [11]:
for model in model_list:
    model_test(model, dilemma)

Model Name: qwen3



In this scenario, the decision hinges on the classic **Prisoner's Dilemma** structure, where individual rationality leads to a suboptimal outcome for both players. Here's the breakdown:

1. **If you choose to Steal (Defect):**  
   - If your partner also Steals, you both get **$0**.  
   - If your partner Shares, you get **$2,000**.  

2. **If you choose to Share (Cooperate):**  
   - If your partner also Shares, you both get **$1,000**.  
   - If your partner Steals, you get **$0**.  

### Key Considerations:
- **Dominant Strategy:** In a one-time, anonymous interaction, the **rational choice** for an individual is to **Steal** (Defect). This is because:  
  - If you assume your partner will **Share**, you gain **$2,000** by stealing.  
  - If you assume your partner will **Steal**, you avoid losing **$1,000** by also stealing (both get $0).  
  - Stealing maximizes your potential payoff regardless of the partner’s choice.  

- **Risk of Trust:** Choosing to Share risks losing everything if your partner defects. Without communication or trust, the safer (self-interested) choice is to Steal.  

- **Nash Equilibrium:** The only stable outcome in this scenario is **both Stealing**, leading to **$0** for both. While this is suboptimal for the group, it is the outcome when both act rationally.  

### Final Answer:  
**Steal**.  
While mutual cooperation would yield $1,000 each, the rational choice in a one-time, anonymous interaction is to **Steal** to avoid the risk of losing out entirely. The dominant strategy ensures you maximize your own gain, even if it results in a worse collective outcome.  

**Answer:** Steal.

Total Time Taken:  159.45117616653442 


Model Name: deepseek-r1:14b



In this game show scenario, which is akin to the Prisoner's Dilemma, each contestant must decide independently whether to Cooperate (Share) or Defect (Steal). The possible outcomes are as follows:

1. **Both Share**: Each contestant receives $1,000.
2. **One Shares, the Other Steals**: The stealer gets $2,000 while the sharer gets nothing.
3. **Both Steal**: Both contestants receive nothing.

Analyzing from a game theory perspective, particularly considering the Nash equilibrium, each individual's rational choice is to Defect (Steal). This is because defecting maximizes one's own potential gain regardless of the other's decision. If both decide to Steal, they end up with nothing, which is the Nash equilibrium as neither can improve their outcome by changing their strategy unilaterally.

However, if there were trust or communication, choosing Share could lead to a better mutual outcome. Without such assurances, the rational choice aligns with defecting.

**Decision**: Choose **Steal**.

Total Time Taken:  308.85629439353943 


Model Name: gpt-oss



**Steal**  

In this one‑shot game the payoff matrix is:

|                 | Partner: Share | Partner: Steal |
|-----------------|----------------|----------------|
| **You: Share** | 1,000 / 1,000  | 0 / 2,000      |
| **You: Steal** | 2,000 / 0      | 0 / 0          |

No matter what your partner does, stealing gives you a higher or equal payoff:

- If the partner shares, you get $2,000 instead of $1,000.
- If the partner steals, you get $0 regardless of your choice, so stealing is no worse.

Because “Steal” dominates “Share,” the rational, self‑interested choice in a single, non‑repeated round is **Steal**.

Total Time Taken:  170.99034667015076 
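
The dominance and equilibrium claims in the answers above can be checked directly against the payoffs stated in the prompt. This is a small sketch with illustrative names (`payoff`, `my_payoff`, `is_nash`); it only considers pure strategies.

```python
SHARE, STEAL = "Share", "Steal"
MOVES = (SHARE, STEAL)

# (my move, partner's move) -> (my payoff, partner's payoff), taken from the prompt.
payoff = {
    (SHARE, SHARE): (1000, 1000),
    (SHARE, STEAL): (0, 2000),
    (STEAL, SHARE): (2000, 0),
    (STEAL, STEAL): (0, 0),
}


def my_payoff(mine: str, theirs: str) -> int:
    return payoff[(mine, theirs)][0]


# Weak dominance: Steal is never worse than Share. Strict: Steal is always better.
weakly_dominates = all(my_payoff(STEAL, t) >= my_payoff(SHARE, t) for t in MOVES)
strictly_dominates = all(my_payoff(STEAL, t) > my_payoff(SHARE, t) for t in MOVES)


def is_nash(a: str, b: str) -> bool:
    """True if neither player can gain strictly by deviating alone."""
    a_is_best = all(payoff[(a, b)][0] >= payoff[(alt, b)][0] for alt in MOVES)
    b_is_best = all(payoff[(a, b)][1] >= payoff[(a, alt)][1] for alt in MOVES)
    return a_is_best and b_is_best


equilibria = [(a, b) for a in MOVES for b in MOVES if is_nash(a, b)]
print(weakly_dominates, strictly_dominates)  # True False
print(equilibria)  # [('Share', 'Steal'), ('Steal', 'Share'), ('Steal', 'Steal')]
```

With these payoffs Steal is only weakly dominant, which matches the "higher or equal" wording in the gpt-oss answer. Because both players get $0 when both steal, the game has three pure-strategy Nash equilibria rather than one, so it differs from the strict Prisoner's Dilemma that the other two answers describe.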


