# Enhancing Zero-Shot Chain-of-Thought Reasoning in Large Language Models through Logic
Paper link: https://arxiv.org/pdf/2309.13339.pdf

## Abstract

Recent advancements in large language models have showcased their remarkable generalizability across various
domains. However, their reasoning abilities still have significant room for improvement, especially when confronted
with scenarios requiring multi-step reasoning. Although large language models possess extensive knowledge, their
behavior, particularly in terms of reasoning, often fails to effectively utilize this knowledge to establish a coherent
thinking paradigm. Generative language models sometimes show hallucinations as their reasoning procedures
are unconstrained by logical principles. Aiming to improve the zero-shot chain-of-thought reasoning ability of large
language models, we propose Logical Chain-of-Thought (LogiCoT), a neurosymbolic framework which leverages
principles from symbolic logic to verify and revise the reasoning processes accordingly. Experimental evaluations
conducted on language tasks in diverse domains, including arithmetic, commonsense, symbolic, causal inference,
and social problems, demonstrate the efficacy of the enhanced reasoning paradigm by logic.

## Setup


In [2]:
import json
import os
import sys

import boto3

# Create the connections to Bedrock (control plane and runtime)
bedrock = boto3.client(
    service_name='bedrock',
    region_name='us-east-1',
)

bedrock_runtime = boto3.client(
    service_name='bedrock-runtime',
    region_name='us-east-1',
)

def invoke_model(prompt_data):
    # Claude text-completion prompts are framed by "\n\nHuman:" and "\n\nAssistant:" turns
    body = {"prompt": "\n\nHuman: " + prompt_data + "\n\nAssistant:",
            "max_tokens_to_sample": 1000,
            "temperature": 1,
            "top_k": 250,
            "top_p": 0.999,
            "stop_sequences": ["\n\nHuman:"]}

    body = json.dumps(body)  # Encode body as a JSON string

    modelId = 'anthropic.claude-instant-v1'
    accept = 'application/json'
    contentType = 'application/json'

    # Invoke the model
    response = bedrock_runtime.invoke_model(body=body.encode('utf-8'),  # Encode to bytes
                                            modelId=modelId,
                                            accept=accept,
                                            contentType=contentType)

    response_body = json.loads(response.get('body').read())
    return response_body.get('completion')
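
The request body can be inspected without calling Bedrock. The sketch below is an illustration rather than part of the original notebook: `build_claude_body` is a hypothetical helper that mirrors the parameters used in `invoke_model` and assumes the Claude text-completion framing, where the prompt starts with a `\n\nHuman:` turn and ends with `\n\nAssistant:`.

```python
import json


def build_claude_body(prompt_data, max_tokens=1000, temperature=1.0):
    """Return the JSON request body for a Claude text-completion call.

    Hypothetical helper: same fields as invoke_model, but no network call.
    """
    body = {
        "prompt": "\n\nHuman: " + prompt_data + "\n\nAssistant:",
        "max_tokens_to_sample": max_tokens,
        "temperature": temperature,
        "top_k": 250,
        "top_p": 0.999,
        "stop_sequences": ["\n\nHuman:"],
    }
    return json.dumps(body)


# Round-trip the body to confirm the framing uses real newlines.
decoded = json.loads(build_claude_body("What is 2 + 2?"))
print(repr(decoded["prompt"]))
```

Separating body construction from the network call makes it easy to confirm that the newlines are real newline characters and not the two-character sequence backslash-n.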

## Example 1 (GSM8K Dataset)

### Example 1: Think

In [5]:
prompt_data = """
Question: Terry eats 2 yogurts a day. They are currently on sale at 4 yogurts for
$5.00. How much does he spend on yogurt over 30 days?

Let’s think step by step (Number the steps taken).

Answer:
"""
print(invoke_model(prompt_data))

 1) Terry eats 2 yogurts per day

2) The sale is 4 yogurts for $5

3) So the price per yogurt is $5/4 = $1.25

4) In 30 days Terry will eat 2 * 30 = 60 yogurts 

5) At $1.25 per yogurt, 60 yogurts will cost 60 * $1.25 = $75

Therefore, the amount Terry will spend on yogurt over 30 days is $75.
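
As an independent check on the model's answer (not part of the recorded output), the arithmetic can be reproduced directly. The variable names are illustrative.

```python
from fractions import Fraction

yogurts_per_day = 2
days = 30
price_per_yogurt = Fraction(5, 4)  # sale price: 4 yogurts for $5.00

total_yogurts = yogurts_per_day * days
total_cost = total_yogurts * price_per_yogurt

print(total_yogurts)
print(float(total_cost))
```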


### Example 1: Verification of step #3

In [6]:
prompt_data = """

Question: Terry eats 2 yogurts a day. They are currently on sale at 4 yogurts for
$5.00. How much does he spend on yogurt over 30 days?

Let’s think step by step.

Answer:
1) Terry eats 2 yogurts per day

2) The sale is 4 yogurts for $5

Verification of the next step:
3) So the price per yogurt is $5/4 = $1.25

Let’s check two different reviews (X and Y).
Support the more plausible one and criticise the other one.

"""
print(invoke_model(prompt_data))

 Okay, let's think through this step-by-step:

X) To calculate the cost for 30 days:
- Terry eats 2 yogurts per day
- In 30 days there are 30 days 
- So in 30 days Terry will eat 2 * 30 = 60 yogurts
- Since the sale is 4 yogurts for $5, to buy 60 yogurts it will take 60 / 4 = 15 batches of 4 yogurts each
- 15 batches at $5 per batch is 15 * $5 = $75

Therefore, the cost for Terry over 30 days is $75.

Y) To calculate the cost per yogurt: 
- The sale is 4 yogurts for $5
- So each yogurt costs $5/4 = $1.25
- Terry eats 2 yogurts per day
- In 30 days that's 2 * 30 = 60 yogurts
- At $1.25 per yogurt, 60 yogurts will cost 60 * $1.25 = $75

Therefore, the cost for Terry over 30 days is $75.

Review X calculates the total cost correctly but takes a less direct approach. Review Y is more straightforward and plausible - it directly calculates the cost per yogurt based on the sale information, then multiplies that by the total number of yogurts Terry will eat over 30 days. Review Y shows the log

In [7]:
prompt_data = """

Question: Terry eats 2 yogurts a day. They are currently on sale at 4 yogurts for
$5.00. How much does he spend on yogurt over 30 days?

Let’s think step by step.

Answer:
1) Terry eats 2 yogurts per day

2) The sale is 4 yogurts for $5

Verification of the next step:
3) So the price per yogurt is $5/4 = $1.25

Let’s check two different reviews (X and Y).
Support the more plausible one and criticise the other one.

Okay, let's think through this step-by-step:

X) To calculate the cost for 30 days:
- Terry eats 2 yogurts per day
- In 30 days there are 30 days 
- So in 30 days Terry will eat 2 * 30 = 60 yogurts
- Since the sale is 4 yogurts for $5, to buy 60 yogurts it will take 60 / 4 = 15 batches of 4 yogurts each
- 15 batches at $5 per batch is 15 * $5 = $75

Therefore, the cost for Terry over 30 days is $75.

Y) To calculate the cost per yogurt: 
- The sale is 4 yogurts for $5
- So each yogurt costs $5/4 = $1.25
- Terry eats 2 yogurts per day
- In 30 days that's 2 * 30 = 60 yogurts
- At $1.25 per yogurt, 60 yogurts will cost 60 * $1.25 = $75

Therefore, the cost for Terry over 30 days is $75.

Review X calculates the total cost correctly but takes a less direct approach. Review Y is more straightforward and plausible - it directly calculates the cost per yogurt based on the sale information, then multiplies that by the total number of yogurts Terry will eat over 30 days. Review Y shows the logic in a clearer step-by-step manner. Therefore, I would say Review Y provides the more plausible solution and explanation.

Let’s start by analyzing one by one:
I. What are the premises and previous steps to support the verification of step #3?
(Your answer should quote exact quote as support.)

II. Criticise the incorrect review.
(Note the examined step doesn’t have to tackle the whole problem at once.)

Finally, identify whether step #3 is true or false.
Analysis and conclusion:

"""
print(invoke_model(prompt_data))

 I. What are the premises and previous steps to support the verification of step #3?
(Your answer should quote exact quote as support.)

The premises and previous steps that support the verification of step #3 are:

"2) The sale is 4 yogurts for $5" 

This statement provides the information needed to calculate the price per yogurt.

II. Criticise the incorrect review. 
(Note the examined step doesn’t have to tackle the whole problem at once.)

Review X takes a less direct approach by first calculating the total number of yogurts Terry will eat over 30 days, and then the number of batches needed to buy that amount. This adds unnecessary steps compared to Review Y. Review Y more directly calculates the price per yogurt based on the given sale information, which is a clearer and more straightforward approach.

Finally, identify whether step #3 is true or false.
Analysis and conclusion:

Step #3 is true. Given that the sale is for 4 yogurts for $5, the price per yogurt can correctly be cal

### Example 1: Revision

In [8]:
prompt_data = """
Question: Terry eats 2 yogurts a day. They are currently on sale at 4 yogurts for
$5.00. How much does he spend on yogurt over 30 days?

Let’s think step by step.

Answer:
1) Terry eats 2 yogurts per day

2) The sale is 4 yogurts for $5

Verification of the next step:
3) So the price per yogurt is $5/4 = $1.25

Let’s check two different reviews (X and Y).
Support the more plausible one and criticise the other one.

Okay, let's think through this step-by-step:

X) To calculate the cost for 30 days:
- Terry eats 2 yogurts per day
- In 30 days there are 30 days 
- So in 30 days Terry will eat 2 * 30 = 60 yogurts
- Since the sale is 4 yogurts for $5, to buy 60 yogurts it will take 60 / 4 = 15 batches of 4 yogurts each
- 15 batches at $5 per batch is 15 * $5 = $75

Therefore, the cost for Terry over 30 days is $75.

Y) To calculate the cost per yogurt: 
- The sale is 4 yogurts for $5
- So each yogurt costs $5/4 = $1.25
- Terry eats 2 yogurts per day
- In 30 days that's 2 * 30 = 60 yogurts
- At $1.25 per yogurt, 60 yogurts will cost 60 * $1.25 = $75

Therefore, the cost for Terry over 30 days is $75.

Review X calculates the total cost correctly but takes a less direct approach. Review Y is more straightforward and plausible - it directly calculates the cost per yogurt based on the sale information, then multiplies that by the total number of yogurts Terry will eat over 30 days. Review Y shows the logic in a clearer step-by-step manner. Therefore, I would say Review Y provides the more plausible solution and explanation.

Let’s start by analyzing one by one:
I. What are the premises and previous steps to support the verification of step #3?
(Your answer should quote exact quote as support.)

II. Criticise the incorrect review.
(Note the examined step doesn’t have to tackle the whole problem at once.)

Finally, identify whether step #3 is true or false.
Analysis and conclusion:

 I. What are the premises and previous steps to support the verification of step #3?
(Your answer should quote exact quote as support.)

The premises and previous steps that support the verification of step #3 are:

"2) The sale is 4 yogurts for $5" 

This statement provides the information needed to calculate the price per yogurt.

II. Criticise the incorrect review. 
(Note the examined step doesn’t have to tackle the whole problem at once.)

Review X takes a less direct approach by first calculating the total number of yogurts Terry will eat over 30 days, and then the number of batches needed to buy that amount. This adds unnecessary steps compared to Review Y. Review Y more directly calculates the price per yogurt based on the given sale information, which is a clearer and more straightforward approach.

Finally, identify whether step #3 is true or false.
Analysis and conclusion:

Step #3 is true. Given that the sale is for 4 yogurts for $5, the price per yogurt can correctly be calculated as $5/4 = $1.25. This calculation is directly supported by the information provided in step 2 about the sale terms. Review Y provides a valid verification of this step by correctly applying the premises from previous steps to reach the conclusion in step 3.




Question: Terry eats 2 yogurts a day. They are currently on sale at 4 yogurts for
$5.00. How much does he spend on yogurt over 30 days?

Let’s think step by step.

Answer:
1) Terry eats 2 yogurts per day

2) The sale is 4 yogurts for $5
"""
print(invoke_model(prompt_data))

 Okay, here are the step-by-step workings:

1) Terry eats 2 yogurts per day

2) The sale is 4 yogurts for $5 

3) To calculate the price per yogurt based on the sale information:
   - The sale is for 4 yogurts for $5
   - To calculate the price per individual yogurt, we divide the total price by the number of yogurts sold as a bundle
   - So the price per yogurt is $5/4 = $1.25

4) Terry eats 2 yogurts per day 
   - In 30 days there are 30 days
   - So in 30 days Terry will eat 2 * 30 = 60 yogurts

5) At $1.25 per yogurt (calculated in step 3), 60 yogurts will cost:
   - 60 yogurts
   - At $1.25 per yogurt  
   - So 60 * $1.25 = $75

Therefore, the total amount Terry will spend on yogurt in 30 days is $75.

Does this make sense? Let me know if you need any part explained further.
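
The cells in this example paste each model reply into the next prompt by hand. A minimal sketch of how that chaining could be automated follows. It is an assumption about how one might script the notebook, not the paper's implementation: `build_review_prompt`, `verify_step`, and `fake_model` are hypothetical names, and the model is passed in as a callable so that `invoke_model` can be swapped for an offline stand-in. The sketch also assumes each entry in `verified_steps` is exactly one numbered step.

```python
# Prompt fragments adapted from the cells above.
REVIEW_INSTRUCTION = (
    "Let’s check two different reviews (X and Y).\n"
    "Support the more plausible one and criticise the other one.\n"
)
ANALYSIS_TEMPLATE = (
    "Let’s start by analyzing one by one:\n"
    "I. What are the premises and previous steps to support the verification of step #{n}?\n"
    "(Your answer should quote exact quote as support.)\n\n"
    "II. Criticise the incorrect review.\n"
    "(Note the examined step doesn’t have to tackle the whole problem at once.)\n\n"
    "Finally, identify whether step #{n} is true or false.\n"
    "Analysis and conclusion:\n"
)


def build_review_prompt(question, verified_steps, candidate_step):
    """Assemble the first-stage prompt that asks for two reviews of one step."""
    lines = ["Question: " + question, "", "Let’s think step by step.", "", "Answer:"]
    lines += list(verified_steps)
    lines += ["", "Verification of the next step:", candidate_step, "", REVIEW_INSTRUCTION]
    return "\n".join(lines)


def verify_step(model, question, verified_steps, candidate_step):
    """Run both verification stages and return the final analysis text."""
    step_no = len(verified_steps) + 1
    review_prompt = build_review_prompt(question, verified_steps, candidate_step)
    reviews = model(review_prompt)
    # The second stage appends the reviews and the analysis questions.
    analysis_prompt = review_prompt + reviews + "\n\n" + ANALYSIS_TEMPLATE.format(n=step_no)
    return model(analysis_prompt)


# Offline stand-in for invoke_model so the sketch runs without Bedrock.
calls = []


def fake_model(prompt):
    calls.append(prompt)
    return " (canned model reply)"


question = ("Terry eats 2 yogurts a day. They are currently on sale at 4 yogurts for "
            "$5.00. How much does he spend on yogurt over 30 days?")
steps = ["1) Terry eats 2 yogurts per day", "2) The sale is 4 yogurts for $5"]
result = verify_step(fake_model, question, steps,
                     "3) So the price per yogurt is $5/4 = $1.25")
print(len(calls))
```

Passing `invoke_model` in place of `fake_model` would send the same two prompts to Bedrock. The revision cell could be scripted the same way by appending the analysis text and then restating the question with the verified steps.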


## Example 2 (GSM8K Dataset)

### Example 2: Think

In [9]:
prompt_data = """
Question: Mike plays ping pong for 40 minutes. In the first 20 minutes, he scores 4
points. In the second 20 minutes, he scores 25% more points. How many total
points did he score?

Let’s think step by step (Number the steps taken).
Answer:
"""
print(invoke_model(prompt_data))

 1. Mike played ping pong for 40 minutes total, which was split into two sections:
- The first 20 minutes, he scored 4 points
- The second 20 minutes

2. We're given that in the second 20 minutes, he scored 25% more points than the first 20 minutes. 
25% of 4 points is 0.25 * 4 = 1 point.

3. So in the second 20 minutes he scored 4 + 1 = 5 points. 

4. To get the total points:
- First 20 minutes: 4 points  
- Second 20 minutes: 5 points
- Total points = 4 + 5 = 9 points

Therefore, the total number of points Mike scored is 9.
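
A short independent check of this arithmetic, with illustrative variable names, is:

```python
from fractions import Fraction

first_half_points = 4
# "25% more points" than the first 20 minutes
second_half_points = first_half_points * Fraction(125, 100)
total_points = first_half_points + second_half_points

print(int(second_half_points))
print(int(total_points))
```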


### Example 2: Verification of step #4

In [10]:
prompt_data = """

Question: Mike plays ping pong for 40 minutes. In the first 20 minutes, he scores 4
points. In the second 20 minutes, he scores 25% more points. How many total
points did he score?

Let’s think step by step (Number the steps taken).
Answer:
 1. Mike played ping pong for 40 minutes total, which was split into two sections:
- The first 20 minutes, he scored 4 points
- The second 20 minutes

2. We're given that in the second 20 minutes, he scored 25% more points than the first 20 minutes. 
25% of 4 points is 0.25 * 4 = 1 point.

3. So in the second 20 minutes he scored 4 + 1 = 5 points.

Verification of the next step:
4. To get the total points:
- First 20 minutes: 4 points  
- Second 20 minutes: 5 points
- Total points = 4 + 5 = 9 points

Let’s check two different reviews (X and Y).
Support the more plausible one and criticise the other one.

"""
print(invoke_model(prompt_data))

 Review X:
The working and logic seems correct. All steps are numbered and explained clearly. 

Review Y: 
Step 3 is incorrect. 25% of 4 is 1, not 0.25. It should say he scored 4 + 1 = 5 points in the second 20 minutes, not 4 + 0.25 = 4.25 points.  

My analysis: 
Review X is correct. The solution provided has the right working and logic. 
Review Y makes a valid criticism of Step 3, pointing out correctly that 25% of 4 is 1, not 0.25. This would result in a different point total being calculated in Step 4.

Review X appropriately validated that the working and logic presented was sound. Review Y constructively identified a specific error in Step 3, which could impact the accuracy of the total point calculation. Of the two reviews, Review Y provides a more useful evaluation by highlighting a technical mistake, rather than just a blanket validation. This allows the error to be addressed and the solution strengthened.


In [11]:
prompt_data = """

Question: Mike plays ping pong for 40 minutes. In the first 20 minutes, he scores 4
points. In the second 20 minutes, he scores 25% more points. How many total
points did he score?

Let’s think step by step (Number the steps taken).
Answer:
 1. Mike played ping pong for 40 minutes total, which was split into two sections:
- The first 20 minutes, he scored 4 points
- The second 20 minutes

2. We're given that in the second 20 minutes, he scored 25% more points than the first 20 minutes. 
25% of 4 points is 0.25 * 4 = 1 point.

3. So in the second 20 minutes he scored 4 + 1 = 5 points.

Verification of the next step:
4. To get the total points:
- First 20 minutes: 4 points  
- Second 20 minutes: 5 points
- Total points = 4 + 5 = 9 points

Let’s check two different reviews (X and Y).
Support the more plausible one and criticise the other one.

 Review X:
The working and logic seems correct. All steps are numbered and explained clearly. 

Review Y: 
Step 3 is incorrect. 25% of 4 is 1, not 0.25. It should say he scored 4 + 1 = 5 points in the second 20 minutes, not 4 + 0.25 = 4.25 points.  

My analysis: 
Review X is correct. The solution provided has the right working and logic. 
Review Y makes a valid criticism of Step 3, pointing out correctly that 25% of 4 is 1, not 0.25. This would result in a different point total being calculated in Step 4.

Review X appropriately validated that the working and logic presented was sound. Review Y constructively identified a specific error in Step 3, which could impact the accuracy of the total point calculation. Of the two reviews, Review Y provides a more useful evaluation by highlighting a technical mistake, rather than just a blanket validation. This allows the error to be addressed and the solution strengthened.


Let’s start by analyzing one by one:
I. What are the premises and previous steps to support the verification of step #4?
(Your answer should quote exact quote as support.)

II. Criticise the incorrect review.
(Note the examined step doesn’t have to tackle the whole problem at once.)
Finally, identify whether step #4 is true or false.

Analysis and conclusion:
"""
print(invoke_model(prompt_data))

 Here is my analysis:

I. The premises and previous steps that support the verification of step #4 are:

- "Mike played ping pong for 40 minutes total, which was split into two sections:" 

- "The first 20 minutes, he scored 4 points" 

- "In the second 20 minutes, he scored 25% more points than the first 20 minutes. 25% of 4 points is 0.25 * 4 = 1 point."

- "So in the second 20 minutes he scored 4 + 1 = 5 points."

II. Review Y correctly criticizes step #3. While step #3 calculates 25% of 4 points as 0.25 instead of 1, it only examines that step in isolation without considering the previous logic. 25% of 4 is indeed 1 point as Review Y pointed out, not 0.25. Review Y provides a specific technical criticism, but it does not fully validate or invalidate the overall solution.

Finally, based on the premises and steps provided, step #4 stating the total points as 4 + 5 = 9 points is true.
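
To act on a verification result programmatically, the true/false verdict has to be pulled out of free text. The following is a rough heuristic and an assumption of this write-up, not something the paper specifies: the hypothetical `extract_verdict` drops any echoed "true or false" instruction and then takes the last standalone "true" or "false" in the reply.

```python
import re


def extract_verdict(analysis_text):
    """Return True, False, or None based on the last verdict word in the text."""
    # Remove the echoed instruction so it is not mistaken for a verdict.
    text = analysis_text.lower().replace("true or false", "")
    matches = re.findall(r"\b(true|false)\b", text)
    if not matches:
        return None
    return matches[-1] == "true"


reply = ("Finally, identify whether step #4 is true or false.\n"
         "Finally, based on the premises and steps provided, step #4 stating "
         "the total points as 4 + 5 = 9 points is true.")
print(extract_verdict(reply))
```

A heuristic like this will misread replies that hedge or mention both words in the conclusion, so a stricter output format would be safer in practice.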


### Example 2: Revision

In [12]:
prompt_data = """
Question: Mike plays ping pong for 40 minutes. In the first 20 minutes, he scores 4
points. In the second 20 minutes, he scores 25% more points. How many total
points did he score?

Let’s think step by step (Number the steps taken).
Answer:
 1. Mike played ping pong for 40 minutes total, which was split into two sections:
- The first 20 minutes, he scored 4 points
- The second 20 minutes

2. We're given that in the second 20 minutes, he scored 25% more points than the first 20 minutes. 
25% of 4 points is 0.25 * 4 = 1 point.

3. So in the second 20 minutes he scored 4 + 1 = 5 points.

Verification of the next step:
4. To get the total points:
- First 20 minutes: 4 points  
- Second 20 minutes: 5 points
- Total points = 4 + 5 = 9 points

Let’s check two different reviews (X and Y).
Support the more plausible one and criticise the other one.

 Review X:
The working and logic seems correct. All steps are numbered and explained clearly. 

Review Y: 
Step 3 is incorrect. 25% of 4 is 1, not 0.25. It should say he scored 4 + 1 = 5 points in the second 20 minutes, not 4 + 0.25 = 4.25 points.  

My analysis: 
Review X is correct. The solution provided has the right working and logic. 
Review Y makes a valid criticism of Step 3, pointing out correctly that 25% of 4 is 1, not 0.25. This would result in a different point total being calculated in Step 4.

Review X appropriately validated that the working and logic presented was sound. Review Y constructively identified a specific error in Step 3, which could impact the accuracy of the total point calculation. Of the two reviews, Review Y provides a more useful evaluation by highlighting a technical mistake, rather than just a blanket validation. This allows the error to be addressed and the solution strengthened.


Let’s start by analyzing one by one:
I. What are the premises and previous steps to support the verification of step #4?
(Your answer should quote exact quote as support.)

II. Criticise the incorrect review.
(Note the examined step doesn’t have to tackle the whole problem at once.)
Finally, identify whether step #4 is true or false.

Analysis and conclusion:

Here is my analysis:

I. The premises and previous steps that support the verification of step #4 are:

- "Mike played ping pong for 40 minutes total, which was split into two sections:" 

- "The first 20 minutes, he scored 4 points" 

- "In the second 20 minutes, he scored 25% more points than the first 20 minutes. 25% of 4 points is 0.25 * 4 = 1 point."

- "So in the second 20 minutes he scored 4 + 1 = 5 points."

II. Review Y correctly criticizes step #3. While step #3 calculates 25% of 4 points as 0.25 instead of 1, it only examines that step in isolation without considering the previous logic. 25% of 4 is indeed 1 point as Review Y pointed out, not 0.25. Review Y provides a specific technical criticism, but it does not fully validate or invalidate the overall solution.

Finally, based on the premises and steps provided, step #4 stating the total points as 4 + 5 = 9 points is true.


Question: Mike plays ping pong for 40 minutes. In the first 20 minutes, he scores 4
points. In the second 20 minutes, he scores 25% more points. How many total
points did he score?

Let’s think step by step (Number the steps taken).
Answer:
 1. Mike played ping pong for 40 minutes total, which was split into two sections:
- The first 20 minutes, he scored 4 points
- The second 20 minutes

2. We're given that in the second 20 minutes, he scored 25% more points than the first 20 minutes. 
25% of 4 points is 0.25 * 4 = 1 point.

3. So in the second 20 minutes he scored 4 + 1 = 5 points.

"""
print(invoke_model(prompt_data))

 Here is how I would solve this problem step-by-step:

1. Mike played ping pong for 40 minutes total, split into two 20 minute sections
2. In the first 20 minutes he scored 4 points 
3. To calculate his score in the second 20 minutes:
   - He scored 25% more than the first section
   - 25% of 4 points is 0.25 * 4 = 1 point
   - In the first section he scored 4 points
   - So 25% more is 4 + 1 = 5 points
4. Total score:
   - First 20 minutes: 4 points
   - Second 20 minutes: 5 points  
   - Total points = 4 + 5 = 9 points

Let me know if you have any other questions!


## Example 3 (Date)

### Example 3: Think

In [13]:
prompt_data = """
Question: The deadline is Jun 1, 2021, which is 2 days away from now. What is the
date tomorrow in MM/DD/YYYY?

Let’s think step by step.
Answer:
"""
print(invoke_model(prompt_data))

 Okay, here are the steps:

1) The current date is May 30, 2021 
2) The deadline is June 1, 2021, which is 2 days away from now
3) To find the date tomorrow, we add 1 day to the current date
4) The current date is 05/30/2021
5) Adding 1 day gives 05/31/2021

Therefore, the date tomorrow in MM/DD/YYYY format is 05/31/2021.
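
The date arithmetic can be checked independently with the standard library. The variable names are illustrative.

```python
from datetime import date, timedelta

deadline = date(2021, 6, 1)
today = deadline - timedelta(days=2)   # the deadline is 2 days away
tomorrow = today + timedelta(days=1)

print(today.isoformat())
print(tomorrow.strftime("%m/%d/%Y"))
```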


### Example 3: Verification of step #1

In [14]:
prompt_data = """

Question: The deadline is Jun 1, 2021, which is 2 days away from now. What is the
date tomorrow in MM/DD/YYYY?

Let’s think step by step.
Answer:

Verification of the next step:
1) The current date is May 30, 2021

Let’s check two different reviews (X and Y).
Support the more plausible one and criticise the other one.

"""
print(invoke_model(prompt_data))

 Review X: 

The date tomorrow would be 05/31/2021. 

Review Y:

The date tomorrow would be 06/01/2021.

Analysis: 

Review X is more plausible. While the deadline mentioned is June 1st, which is 2 days away, the question explicitly asks for the date tomorrow, not the deadline date. Since today is May 30th, tomorrow's date would be May 31st, not jumping ahead to June 1st. 

Review Y makes the incorrect assumption that the question is asking for the deadline date of June 1st, rather than the date tomorrow. The deadline information provides context, but is not what is being asked for. 

Therefore, Review X correctly identifies that with the given information of the current date (May 30th), the date tomorrow would be May 31st, 2021. Review Y is not as accurate in its interpretation of what date is being asked for in the question.


In [15]:
prompt_data = """

Question: The deadline is Jun 1, 2021, which is 2 days away from now. What is the
date tomorrow in MM/DD/YYYY?

Let’s think step by step.
Answer:

Verification of the next step:
1) The current date is May 30, 2021

Let’s check two different reviews (X and Y).
Support the more plausible one and criticise the other one.

 Review X: 

The date tomorrow would be 05/31/2021. 

Review Y:

The date tomorrow would be 06/01/2021.

Analysis: 

Review X is more plausible. While the deadline mentioned is June 1st, which is 2 days away, the question explicitly asks for the date tomorrow, not the deadline date. Since today is May 30th, tomorrow's date would be May 31st, not jumping ahead to June 1st. 

Review Y makes the incorrect assumption that the question is asking for the deadline date of June 1st, rather than the date tomorrow. The deadline information provides context, but is not what is being asked for. 

Therefore, Review X correctly identifies that with the given information of the current date (May 30th), the date tomorrow would be May 31st, 2021. Review Y is not as accurate in its interpretation of what date is being asked for in the question.

Let’s start by analyzing one by one:

I. What are the premises and previous steps to support the verification of step #1?
(Your answer should quote exact quote as support.)

II. Criticise the incorrect review.
(Note the examined step doesn’t have to tackle the whole problem at once.)

Finally, identify whether step #1 is true or false.

Analysis and conclusion:
"""
print(invoke_model(prompt_data))

 Here is my analysis:

I. The premises and previous steps to support verification of step #1 are:

"The deadline is Jun 1, 2021, which is 2 days away from now." 

This directly states that the current date is May 30, 2021, since the deadline of June 1st is 2 days away.

II. Review Y is incorrect because it does not accurately interpret what date is being asked for in the question. The question asks "What is the date tomorrow in MM/DD/YYYY?", not "What is the deadline date?". Review Y makes the incorrect assumption that the question is asking for the deadline of June 1st, rather than the actual date tomorrow, given the current date is May 30th.

Conclusion: Step #1 stating the current date is May 30, 2021 is true based on the information given in the question that the deadline of June 1st is 2 days away from now.


### Example 3: Revision

In [16]:
prompt_data = """

Question: The deadline is Jun 1, 2021, which is 2 days away from now. What is the
date tomorrow in MM/DD/YYYY?

Let’s think step by step.
Answer:

Verification of the next step:
1) The current date is May 30, 2021

Let’s check two different reviews (X and Y).
Support the more plausible one and criticise the other one.

 Review X: 

The date tomorrow would be 05/31/2021. 

Review Y:

The date tomorrow would be 06/01/2021.

Analysis: 

Review X is more plausible. While the deadline mentioned is June 1st, which is 2 days away, the question explicitly asks for the date tomorrow, not the deadline date. Since today is May 30th, tomorrow's date would be May 31st, not jumping ahead to June 1st. 

Review Y makes the incorrect assumption that the question is asking for the deadline date of June 1st, rather than the date tomorrow. The deadline information provides context, but is not what is being asked for. 

Therefore, Review X correctly identifies that with the given information of the current date (May 30th), the date tomorrow would be May 31st, 2021. Review Y is not as accurate in its interpretation of what date is being asked for in the question.

Let’s start by analyzing one by one:

I. What are the premises and previous steps to support the verification of step #1?
(Your answer should quote exact quote as support.)

II. Criticise the incorrect review.
(Note the examined step doesn’t have to tackle the whole problem at once.)

Finally, identify whether step #1 is true or false.

Analysis and conclusion:

 Here is my analysis:

I. The premises and previous steps to support verification of step #1 are:

"The deadline is Jun 1, 2021, which is 2 days away from now." 

This directly states that the current date is May 30, 2021, since the deadline of June 1st is 2 days away.

II. Review Y is incorrect because it does not accurately interpret what date is being asked for in the question. The question asks "What is the date tomorrow in MM/DD/YYYY?", not "What is the deadline date?". Review Y makes the incorrect assumption that the question is asking for the deadline of June 1st, rather than the actual date tomorrow, given the current date is May 30th.

Conclusion: Step #1 stating the current date is May 30, 2021 is true based on the information given in the question that the deadline of June 1st is 2 days away from now.



Question: The deadline is Jun 1, 2021, which is 2 days away from now. What is the
date tomorrow in MM/DD/YYYY?

Let’s think step by step.
Answer:
"""
print(invoke_model(prompt_data))

 Here are the steps:

1) The current date is May 30, 2021

Analysis of Step 1:
- The question states that the deadline is Jun 1, 2021, which is 2 days away from now. 
- Since the deadline is 2 days away, we can infer that the current date must be May 30, 2021.
- Therefore, Step 1 correctly identifies the current date.

2) The date tomorrow would be 05/31/2021

Analysis of Step 2: 
- The question asks for the date tomorrow, not the deadline date.
- Given that today is May 30th, tomorrow's date would be May 31st.  
- Therefore, Step 2 correctly identifies that with the current date of May 30th, the date tomorrow would be May 31, 2021.

Conclusion: The two steps correctly answer the question by first identifying the current date based on the deadline information, and then determining that the date tomorrow would be May 31, 2021.


## Example 4 (AQuA)

### Example 4: Think

In [18]:
prompt_data = """
Analyze and answer the following single-choice problem.
Question: An aeroplane flies along the four sides of a square at the speeds of 200,
400, 600 and 800km/hr. Find the average speed of the plane around the field?

Options:
OptA)384
OptB)562
OptC)458
OptD)156
OptE)452

Let’s think step by step (Number the steps taken).
Answer:
"""
print(invoke_model(prompt_data))

 Here are the step-by-step workings:

1) The length of each side of the square field is L km (let's assume it's 1 km for simplicity).

2) Time taken to fly the first side (at 200 km/hr): T1 = L/200 = 1/200 hrs = 0.005 hrs

3) Time taken to fly the second side (at 400 km/hr): T2 = L/400 = 1/400 hrs = 0.0025 hrs 

4) Time taken to fly the third side (at 600 km/hr): T3 = L/600 = 1/600 hrs = 0.001666 hrs

5) Time taken to fly the fourth side (at 800 km/hr): T4 = L/800 = 1/800 hrs = 0.00125 hrs

6) Total time taken = T1 + T2 + T3 + T4 = 0.005 + 0.0025 + 0.001666 + 0.00125 = 0.011666 hrs

7) Total distance covered = 4L = 4 km 

8) Average speed = Total Distance / Total Time = 4 km / 0.011666 hrs = 4/(0.011666*3600) km/hr = 4 * (3600/0.011666) = 562 km/hr

The average speed is 562 km/hr. Hence, the answer is OptB.
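
An independent check (not part of the recorded model output) suggests the zero-shot answer above is wrong. With equal side lengths, the average speed is total distance over total time, which works out to exactly 384 km/hr (OptA). The four times in steps 2 to 5 sum to about 0.010417 hrs, not the 0.011666 hrs stated in step 6. Variable names are illustrative.

```python
from fractions import Fraction

speeds_kmh = [200, 400, 600, 800]
side_km = 1  # any side length gives the same average speed

total_distance = side_km * len(speeds_kmh)
total_time = sum(Fraction(side_km, v) for v in speeds_kmh)  # hours
average_speed = total_distance / total_time

print(float(total_time))
print(average_speed)
```

This is the kind of step-level error that the verification stage is meant to catch.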


### Example 4: Verification of step #2

In [19]:
prompt_data = """
Analyze and answer the following single-choice problem.
Question: An aeroplane flies along the four sides of a square at the speeds of 200,
400, 600 and 800km/hr. Find the average speed of the plane around the field?

Options:
OptA)384
OptB)562
OptC)458
OptD)156
OptE)452

Let’s think step by step (Number the steps taken).
Answer:

Here are the step-by-step workings:

1) The length of each side of the square field is L km (let's assume it's 1 km for simplicity).

Verification of the next step:
2) Time taken to fly the first side (at 200 km/hr): T1 = L/200 = 1/200 hrs = 0.005 hrs

Let’s check two different reviews (X and Y).
Support the more plausible one and criticise the other one.

"""
print(invoke_model(prompt_data))

 Here are two reviews with my analysis:

Review X:
3) Time taken to fly the second side (at 400 km/hr): T2 = L/400 = 1/400 hrs = 0.0025 hrs 
4) Time taken to fly the third side (at 600 km/hr): T3 = L/600 = 1/600 hrs = 0.00166666 hrs
5) Time taken to fly the fourth side (at 800 km/hr): T4 = L/800 = 1/800 hrs = 0.00125 hrs
6) Total time = T1 + T2 + T3 + T4 = 0.005 + 0.0025 + 0.00166666 + 0.00125 = 0.01041666 hrs
7) Total distance covered = 4L = 4 km 
8) Average speed = Total Distance / Total Time = 4 / 0.01041666 = 384 km/hr

This is a logical step-by-step working. The times and speeds are calculated correctly for each side. The total time and distance are then used to find the average speed. This review provides the correct answer and working.

Review Y: 
3) Average speed = (Speed1 + Speed2 + Speed3 + Speed4) / 4  
             = (200 + 400 + 600 + 800) / 4 = 1000 / 4 = 250 km/hr

This review skips important calculation steps. It directly takes the average of the individual speeds, with

In [20]:
prompt_data = """
Analyze and answer the following single-choice problem.
Question: An aeroplane flies along the four sides of a square at the speeds of 200,
400, 600 and 800km/hr. Find the average speed of the plane around the field?

Options:
OptA)384
OptB)562
OptC)458
OptD)156
OptE)452

Let’s think step by step (Number the steps taken).
Answer:

Here are the step-by-step workings:

1) The length of each side of the square field is L km (let's assume it's 1 km for simplicity).

Verification of the next step:
2) Time taken to fly the first side (at 200 km/hr): T1 = L/200 = 1/200 hrs = 0.005 hrs

Let’s check two different reviews (X and Y).
Support the more plausible one and criticise the other one.

Let’s start by analyzing one by one:

 Here are two reviews with my analysis:

Review X:
3) Time taken to fly the second side (at 400 km/hr): T2 = L/400 = 1/400 hrs = 0.0025 hrs 
4) Time taken to fly the third side (at 600 km/hr): T3 = L/600 = 1/600 hrs = 0.00166666 hrs
5) Time taken to fly the fourth side (at 800 km/hr): T4 = L/800 = 1/800 hrs = 0.00125 hrs
6) Total time = T1 + T2 + T3 + T4 = 0.005 + 0.0025 + 0.00166666 + 0.00125 = 0.01041666 hrs
7) Total distance covered = 4L = 4 km 
8) Average speed = Total Distance / Total Time = 4 / 0.01041666 = 384 km/hr

This is a logical step-by-step working. The times and speeds are calculated correctly for each side. The total time and distance are then used to find the average speed. This review provides the correct answer and working.

Review Y: 
3) Average speed = (Speed1 + Speed2 + Speed3 + Speed4) / 4  
             = (200 + 400 + 600 + 800) / 4 = 1000 / 4 = 250 km/hr

This review skips important calculation steps. It directly takes the average of the individual speeds, without considering the different times taken. Average speed is calculated as Total Distance / Total Time, not by averaging the individual speeds. 

Review X is more plausible as it shows the full correct working, while Review Y skips important steps and provides the wrong approach to solving this average speed question. Review X should be supported for providing the logically correct working and solution.


I. What are the premises and previous steps to support the verification of step #2?
(Your answer should quote exact quote as support.)

II. Criticise the incorrect review.
(Note the examined step doesn’t have to tackle the whole problem at once.)

Finally, identify whether step #2 is true or false.

Analysis and conclusion:

"""
print(invoke_model(prompt_data))

 I. The premises and previous steps to support the verification of step #2 are:

"Let's assume it's 1 km for simplicity. Time taken to fly the first side (at 200 km/hr): T1 = L/200 = 1/200 hrs = 0.005 hrs"

II. The step that is incorrectly handled in Review Y is: 

"3) Average speed = (Speed1 + Speed2 + Speed3 + Speed4) / 4
             = (200 + 400 + 600 + 800) / 4 = 1000 / 4 = 250 km/hr"

This step takes the direct average of the individual speeds, without considering the different times taken to travel each side. However, calculating the average speed correctly requires taking the total distance traveled divided by the total time taken, not averaging the individual speeds.  

Finally, step #2 stating "Time taken to fly the first side (at 200 km/hr): T1 = L/200 = 1/200 hrs = 0.005 hrs" is true based on the premise provided in step 1 that the length of each side L is 1 km.

Therefore, Review X shows the full correct working and logically calculates the average speed, while Review Y ta
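
The verification prompt ends by asking whether step #2 is true or false, so a driver script needs to read that verdict out of free text. The following is one possible heuristic, not something taken from the paper or the notebook: `parse_step_verdict` is a hypothetical helper that looks for the first "is true" or "is false" after a mention of the step number and returns `None` when it finds neither.

```python
import re

def parse_step_verdict(text, step_no):
    """Return True/False if the text says step #N is true/false, else None."""
    # Match "step #N" (the "#" is optional), then the nearest "is true" / "is false".
    pattern = rf"step\s*#?{step_no}\b.*?\bis\s+(true|false)\b"
    match = re.search(pattern, text, flags=re.IGNORECASE | re.DOTALL)
    if match is None:
        return None
    return match.group(1).lower() == "true"

reply = (
    'Finally, step #2 stating "T1 = L/200 = 1/200 hrs = 0.005 hrs" '
    "is true based on the premise provided in step 1."
)
print(parse_step_verdict(reply, 2))  # True
```

A reply that is cut off before the verdict yields `None`, which the caller should treat as "no decision" rather than as false.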

### Example 4: Revision

In [21]:
prompt_data = """
Analyze and answer the following single-choice problem.
Question: An aeroplane flies along the four sides of a square at the speeds of 200,
400, 600 and 800km/hr. Find the average speed of the plane around the field?

Options:
OptA)384
OptB)562
OptC)458
OptD)156
OptE)452

Let’s think step by step (Number the steps taken).
Answer:

Here are the step-by-step workings:

1) The length of each side of the square field is L km (let's assume it's 1 km for simplicity).

Verification of the next step:
2) Time taken to fly the first side (at 200 km/hr): T1 = L/200 = 1/200 hrs = 0.005 hrs

Let’s check two different reviews (X and Y).
Support the more plausible one and criticise the other one.

Let’s start by analyzing one by one:

 Here are two reviews with my analysis:

Review X:
3) Time taken to fly the second side (at 400 km/hr): T2 = L/400 = 1/400 hrs = 0.0025 hrs 
4) Time taken to fly the third side (at 600 km/hr): T3 = L/600 = 1/600 hrs = 0.00166666 hrs
5) Time taken to fly the fourth side (at 800 km/hr): T4 = L/800 = 1/800 hrs = 0.00125 hrs
6) Total time = T1 + T2 + T3 + T4 = 0.005 + 0.0025 + 0.00166666 + 0.00125 = 0.01041666 hrs
7) Total distance covered = 4L = 4 km 
8) Average speed = Total Distance / Total Time = 4 / 0.01041666 = 384 km/hr

This is a logical step-by-step working. The times and speeds are calculated correctly for each side. The total time and distance are then used to find the average speed. This review provides the correct answer and working.

Review Y: 
3) Average speed = (Speed1 + Speed2 + Speed3 + Speed4) / 4  
             = (200 + 400 + 600 + 800) / 4 = 1000 / 4 = 250 km/hr

This review skips important calculation steps. It directly takes the average of the individual speeds, without considering the different times taken. Average speed is calculated as Total Distance / Total Time, not by averaging the individual speeds. 

Review X is more plausible as it shows the full correct working, while Review Y skips important steps and provides the wrong approach to solving this average speed question. Review X should be supported for providing the logically correct working and solution.


I. What are the premises and previous steps to support the verification of step #2?
(Your answer should quote exact quote as support.)

II. Criticise the incorrect review.
(Note the examined step doesn’t have to tackle the whole problem at once.)

Finally, identify whether step #2 is true or false.

Analysis and conclusion:

 I. The premises and previous steps to support the verification of step #2 are:

"Let's assume it's 1 km for simplicity. Time taken to fly the first side (at 200 km/hr): T1 = L/200 = 1/200 hrs = 0.005 hrs"

II. The step that is incorrectly handled in Review Y is: 

"3) Average speed = (Speed1 + Speed2 + Speed3 + Speed4) / 4
             = (200 + 400 + 600 + 800) / 4 = 1000 / 4 = 250 km/hr"

This step takes the direct average of the individual speeds, without considering the different times taken to travel each side. However, calculating the average speed correctly requires taking the total distance traveled divided by the total time taken, not averaging the individual speeds.  

Finally, step #2 stating "Time taken to fly the first side (at 200 km/hr): T1 = L/200 = 1/200 hrs = 0.005 hrs" is true based on the premise provided in step 1 that the length of each side L is 1 km.

Therefore, Review X shows the full correct working and logically calculates the average speed, while Review Y takes an incorrect shortcut and arrives at the wrong solution. Review X is more plausible and should be supported.



Analyze and answer the following single-choice problem.
Question: An aeroplane flies along the four sides of a square at the speeds of 200,
400, 600 and 800km/hr. Find the average speed of the plane around the field?

Options:
OptA)384
OptB)562
OptC)458
OptD)156
OptE)452

Let’s think step by step (Number the steps taken).
Answer:

Here are the step-by-step workings:

1) The length of each side of the square field is L km (let's assume it's 1 km for simplicity).

"""
print(invoke_model(prompt_data))

 Here are the step-by-step workings:

1) The length of each side of the square field is L km (let's assume it's 1 km for simplicity).

2) Time taken to fly the first side (at 200 km/hr): 
   T1 = L/Speed = 1 km / 200 km/hr = 1/200 hr = 0.005 hr

3) Time taken to fly the second side (at 400 km/hr):  
   T2 = L/Speed = 1 km / 400 km/hr = 1/400 hr = 0.0025 hr
   
4) Time taken to fly the third side (at 600 km/hr):
   T3 = L/Speed = 1 km / 600 km/hr = 1/600 hr = 0.00166666 hr
   
5) Time taken to fly the fourth side (at 800 km/hr):  
   T4 = L/Speed = 1 km / 800 km/hr = 1/800 hr = 0.00125 hr
   
6) Total time = T1 + T2 + T3 + T4 = 0.005 + 0.0025 + 0.00166666 + 0.00125 = 0.01041666 hr

7) Total distance covered = 4 * L = 4 km

8) Average speed = Total Distance / Total Time 
                = 4 km / 0.01041666 hr
                = 384 km/hr

The answer is OptA) 384

My working shows all the calculation steps clearly. Please let me know if you need any part explained further.
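
The revision prompt in the cell above is built by concatenation: the verification prompt, the model's verification reply, and then the original question restated with the steps verified so far. A minimal sketch of that assembly, assuming the pieces are already available as strings (`build_revision_prompt` and its parameter names are hypothetical, not from the paper):

```python
def build_revision_prompt(verification_prompt, verification_reply,
                          question_block, verified_steps):
    """Join the verification transcript with a restated problem and the verified steps."""
    transcript = verification_prompt.rstrip() + "\n\n" + verification_reply.strip()
    restart = "\n\n".join([
        question_block.strip(),
        "Here are the step-by-step workings:",
        *verified_steps,
    ])
    # The trailing blank line leaves room for the model to continue
    # from the last verified step.
    return transcript + "\n\n\n" + restart + "\n\n"

prompt = build_revision_prompt(
    verification_prompt="<verification prompt>",
    verification_reply="<model reply>",
    question_block="<question and options>",
    verified_steps=["1) The length of each side of the square field is L km."],
)
print(prompt)
```

The resulting string would then be passed to `invoke_model`, as in the cell above.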
