---
title: "Creating an interesting word logic problem: Can AI solve it properly?"
description: "Can AI help me create a brain teaser of sorts focused on word logic, and then solve it like a human would? (Trying to figure out how to use this blog)"
author: "Amelie Zhang"
date: "2/2/2025"
categories:
  - LLMs
  - word problem
  - logic
---

When we first started playing around with LLMs to see whether they could communicate properly with humans, one of my first instincts was to see if they could think like humans. Ever since I was a child, I have really enjoyed solving logic puzzles, like the ones you find on YouTube from TED or other channels. So I wanted to take this one step further and see whether an AI could create a problem itself and then solve it.

So I asked GPT to create a logic problem for me, inspired by puzzles where each piece of information may be either true or false and you have to deduce the final answer. Once the problem was created, I fed it into a new GPT chat window to see whether the model could solve it.

Naturally, I had fun solving it myself as well.

# A logic problem for LLMs 

* Read the following and see if you can answer the question:

>### The Stolen Key (created by GPT-4o)

Four friends—Alex, Ben, Cara, and Dana—were the only people in a locked house when the front door key went missing. Each of them made a statement:  

- **Alex**: "I didn’t take the key."  
- **Ben**: "Cara took the key."  
- **Cara**: "Dana is lying."  
- **Dana**: "Ben is lying."  

Only one of them is telling the truth. Who stole the key?

<img src="a.png" width="50%">

# How do LLMs respond to the prompt?

### GPT-4o

Let's analyze the statements logically:

- Only **one** person is telling the truth.
- If **Ben** were telling the truth ("Cara took the key"), then all other statements must be false.
  - Alex saying "I didn’t take the key" would be false → meaning Alex **did** take the key. This contradicts Ben's claim.
  - Therefore, Ben is **lying**.

Since Ben is lying, Cara **did not** take the key. That means we must find another truth-teller.

- If **Cara** were telling the truth ("Dana is lying"), then:
  - Dana saying "Ben is lying" would be false, meaning Ben is **telling the truth**.
  - But we already established that Ben was lying, so this contradicts our assumption.

Since Cara cannot be the truth-teller either, let’s check Dana.

- If **Dana** were telling the truth ("Ben is lying"), then:
  - Ben’s statement ("Cara took the key") is false → meaning **Cara did NOT take the key**.
  - That makes Alex’s statement ("I didn’t take the key") false → meaning **Alex DID take the key**.

Since this setup is consistent and only one truth-teller exists, the **key was stolen by Alex**.

### GPT-4o1

**Answer: Alex took the key.**

Here’s a quick way to see why:

1. **Assume Dana is telling the truth** (“Ben is lying”).
2. If Ben is lying, then “Cara took the key” is false, so Cara did **not** take the key.
3. Because only one statement can be true, Alex’s and Cara’s statements must be false:
   - Alex’s statement “I didn’t take the key” is false, so Alex **did** take the key.
   - Cara’s statement “Dana is lying” is false, meaning Dana is indeed telling the truth.

All conditions are satisfied with Dana telling the truth and Alex as the thief.

### Analysis of LLM Reasoning
This problem gives a small example of how LLMs handle logical constraints. Given the one-truth constraint, both models started from an assumption about who is telling the truth and followed its consequences until they reached a consistent solution.

Notably, both GPT-4o and GPT-4o1 reached the same answer through slightly different reasoning:

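- GPT-4o used a step-by-step contradiction approach, testing Ben, Cara, and Dana in turn as the sole truth-teller and discarding the inconsistent cases. It never explicitly tested Alex, although that case fails too: if Alex's statement were the only true one, Cara's would be false, which would make Dana's true as well.
- GPT-4o1 opted for a more streamlined deduction, assuming Dana is the truth-teller and checking that the consequences are consistent, without explicitly ruling out the other cases.

This exercise shows that LLMs can handle deductive logic, truth-value constraints, and contradictions, at least on a small structured puzzle like this one.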
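
The answer both models gave can also be checked mechanically. Here is a minimal Python sketch (my own, not something either model produced) that tries each friend as the thief, works out which of the four statements would then be true, and keeps only the suspects for which exactly one statement holds. The names `statements` and `solve` are made up for illustration.

```python
FRIENDS = ["Alex", "Ben", "Cara", "Dana"]


def statements(thief):
    """Return the truth value of each friend's statement for a given thief."""
    alex = thief != "Alex"   # Alex: "I didn't take the key."
    ben = thief == "Cara"    # Ben: "Cara took the key."
    dana = not ben           # Dana: "Ben is lying."
    cara = not dana          # Cara: "Dana is lying."
    return {"Alex": alex, "Ben": ben, "Cara": cara, "Dana": dana}


def solve(truth_tellers=1):
    """List every thief for which exactly `truth_tellers` statements are true."""
    return [
        thief
        for thief in FRIENDS
        if sum(statements(thief).values()) == truth_tellers
    ]


print(solve())  # ['Alex']
```

Because the loop covers all four suspects, it also confirms that Alex is the only consistent answer, which is a step neither model spelled out in full.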

Would another LLM produce the same answer? Would performance change if the statements were reworded? Testing across different models could reveal nuances in AI logical reasoning and bias towards specific deduction styles.
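
One way to set up such tests would be to build variants of the puzzle with known answers before handing them to a model. The sketch below is only an assumption about how that might look: it keeps the four statements fixed and changes the final constraint ("only one of them is telling the truth") to every other possible count, listing which thieves fit each version. The helper names `count_true` and `variants` are hypothetical.

```python
FRIENDS = ["Alex", "Ben", "Cara", "Dana"]


def count_true(thief):
    """Count how many of the four statements are true for a given thief."""
    alex = thief != "Alex"   # Alex: "I didn't take the key."
    ben = thief == "Cara"    # Ben: "Cara took the key."
    dana = not ben           # Dana: "Ben is lying."
    cara = not dana          # Cara: "Dana is lying."
    return sum([alex, ben, cara, dana])


def variants():
    """Map each possible number of truth-tellers to the thieves that fit it."""
    table = {k: [] for k in range(len(FRIENDS) + 1)}
    for thief in FRIENDS:
        table[count_true(thief)].append(thief)
    return table


for k, thieves in variants().items():
    print(k, thieves)
# 0 []
# 1 ['Alex']
# 2 ['Ben', 'Dana']
# 3 ['Cara']
# 4 []
```

Read this way, "exactly three are telling the truth" would give a different puzzle with a unique answer (Cara), while "exactly two" would be ambiguous between Ben and Dana, so only some variants make fair test cases.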

What do you think: should we test other AI models on this problem?

## My thoughts:

I made this logic problem out of boredom, but it turned out to be a super interesting way to capture how AI thinks through a reasoning problem that humans typically attack in a non-linear fashion. While both GPT models ended up with the right result, I was fascinated by the different methods they used, which were both logical but did not follow the same framework at all.

In the future, I really want to experiment more with using AI to attack these sorts of reasoning problems, including ones that may not have a clear, direct answer, and to think about the actual reasoning process of the LLM behind them.

If this problem were to get more complicated, I'd be curious about how the AI could form strings of thought that would mimic the sequential way humans tend to think about problems.