### DISCLAIMER

This system is based on the methodology proposed by [AI Explained](https://www.youtube.com/watch?v=wVzuvf9D9BU).

In [214]:
import os
from dotenv import load_dotenv
load_dotenv()

from langchain import PromptTemplate
from langchain.llms import OpenAI
from langchain.chains import LLMChain
from langchain.chat_models import ChatOpenAI
from langchain.schema import (
    AIMessage,
    HumanMessage,
    SystemMessage
)

os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY")

### Creating Smart GPT Prompts

The system works in 3 stages:

1. Generate several candidate solutions using chain-of-thought (CoT) reasoning ([paper 1](https://arxiv.org/pdf/2305.02897.pdf)).
2. Use a reflection prompt to analyze each solution ([paper 2](https://arxiv.org/pdf/2303.11366.pdf)).
3. Use a resolver to analyze the reflections and extract the answer; this is done through dialogue-enabled resolving ([paper 3](https://arxiv.org/pdf/2303.17071.pdf)).

First, we create one prompt for each stage of this flow, plus an extraction prompt that reads the final answer letter out of the resolver's response.
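The three stages can be sketched independently of any particular LLM client. The snippet below is a minimal sketch, not the notebook's implementation: `llm` is a hypothetical callable that maps a prompt string to a completion string, `three_stage_flow` is an illustrative name, and the prompt wording is abbreviated.

```python
from typing import Callable, List


def three_stage_flow(question: str, llm: Callable[[str], str], num_options: int = 3) -> str:
    """Run the generate -> reflect -> resolve stages with any prompt-to-text callable."""
    # Stage 1: sample several chain-of-thought candidates for the same question.
    candidates: List[str] = [
        llm(f"Question: {question}\nAnswer: Let's work this out step by step.")
        for _ in range(num_options)
    ]
    labelled = "\n\n".join(f"Option {i + 1}:\n{c}" for i, c in enumerate(candidates))

    # Stage 2: ask for the flaws in each candidate.
    reflection = llm(f"List the flaws and faulty logic of each option:\n{labelled}")

    # Stage 3: pick the best candidate based on the critique and improve it.
    return llm(f"Given this analysis, choose and improve the best option:\n{reflection}")


# Usage with a stand-in "model" that only records its prompts (no API access needed).
calls: List[str] = []


def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    return f"response {len(calls)}"


final = three_stage_flow("What is 2 + 2?", fake_llm, num_options=2)
```

With `num_options=2` the stand-in model is called four times: twice for candidates, once for the reflection, and once for the resolver.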

In [212]:
chain_of_though_template = """
Question {question} . Answer: Let's work this out in a step by step way to be sure we have the right answer.""
"""

chain_of_though_prompt = PromptTemplate(
    input_variables=["question"],
    template=chain_of_though_template,
)
chain_of_though_prompt.format(question="What is the meaning of life?")

'\nQuestion What is the meaning of life? . Answer: Let\'s work this out in a step by step way to be sure we have the right answer.""\n'

In [83]:
reflextion_template = """
You are a researcher tasked with investigating the {options} response options provided. 
List the flaws and faulty logic of each answer options. 
Let's work this out in a step by step way to be sure we have all the errors:
"""

reflextion_prompt = PromptTemplate(
    input_variables=["options"],
    template=reflextion_template,
)
reflextion_prompt.format(options="\n".join(["Option 1", "Option 2"]))

"\nYou are a researcher tasked with investigating the Option 1\nOption 2 response options provided. \nList the flaws and faulty logic of each answer options. \nLet's work this out in a step by step way to be sure we have all the errors:\n"

In [104]:
resolver_template = """
You are a resolver task with 
1) finding which of the {researcher_analysis} answers options the researcher thought was the best
2) improving that answer, and
3) Printing the improved answer in full
Let's work this out in a step by step way to be sure we have the right answer:
"""

resolver_prompt = PromptTemplate(
    input_variables=["researcher_analysis"],
    template=resolver_template,
)
resolver_prompt.format(researcher_analysis="researcher_analysis")

"\nYou are a resolver task with \n1) finding which of the researcher_analysis answers options the researcher thought was the best\n2) improving that answer, and\n3) Printing the improved answer in full\nLet's work this out in a step by step way to be sure we have the right answer:\n"

In [205]:
extraction_template = """
You are an extraction tool designed to check which answer a user has given.

Given the question: {question}

The working out of the solution by the user is:\n {solution}

Extract which answer the user thought was the correct answer. 

Please respond with no addtional text, only: A, B, C or D.
"""

extraction_prompt = PromptTemplate(
    input_variables=["question", "solution"],
    template=extraction_template,
)
extraction_prompt.format(question="is it, a, b, c or d", solution="The answer is A")

'\nYou are an extraction tool designed to check which answer a user has given.\n\nGiven the question: is it, a, b, c or d\n\nThe working out of the solution by the user is:\n The answer is A\n\nExtract which answer the user thought was the correct answer. \n\nPlease respond with no addtional text, only: A, B, C or D.\n'
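Even with the instruction to reply with a single letter, a chat model may answer with a variation such as `(B)` or `Answer: B.`, which would fail an exact string comparison against the answer key. A minimal sketch of a tolerant parser follows; `parse_choice` is a hypothetical helper and not part of the original flow.

```python
import re
from typing import Optional

# A standalone capital letter A-D, optionally wrapped in parentheses.
_CHOICE = re.compile(r"\(?\b([ABCD])\b\)?")


def parse_choice(text: str) -> Optional[str]:
    """Return the first standalone A-D letter in the reply, or None if there is none.

    Caveat: a reply that begins with the English article "A" would be read as choice A.
    """
    match = _CHOICE.search(text.strip())
    return match.group(1) if match else None


examples = [parse_choice(s) for s in ["B", "(B)", "Answer: D.", "no idea"]]
```

Returning `None` rather than guessing makes unparseable replies easy to count separately from wrong answers.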

### Set Up Chat Agent

In [109]:
chat_llm = ChatOpenAI(temperature=0.9, model="gpt-3.5-turbo")
llm_result = chat_llm([HumanMessage(content="Translate this sentence from English to French. I love programming.")])
llm_result

AIMessage(content="J'aime programmer.", additional_kwargs={}, example=False)

### Experimentation

The following cells walk through the Smart GPT flow one prompt at a time.

In [215]:
question1 = """
What is the embryological origin of the hyoid bone?
(A) The first pharyngeal arch
(B) The first and second pharyngeal arches
(C) The second pharyngeal arch
(D) The second and third pharyngeal arches
"""

question2 = """
In a given population, 1 out of every 400 people has a cancer caused by a completely recessive
allele, b. Assuming the population is in Hardy-Weinberg equilibrium, which of the following is
the expected proportion of individuals who carry the b allele but are not expected to develop the
cancer?
(A) 1/400 (B) 19/400 (C) 20/400 (D) 38/400
"""

chain = LLMChain(llm=chat_llm, prompt=chain_of_though_prompt, verbose=True)
result = chain.run({"question": question2})
result

> Entering new LLMChain chain...
Prompt after formatting:
Question 
In a given population, 1 out of every 400 people has a cancer caused by a completely recessive
allele, b. Assuming the population is in Hardy-Weinberg equilibrium, which of the following is
the expected proportion of individuals who carry the b allele but are not expected to develop the
cancer?
(A) 1/400 (B) 19/400 (C) 20/400 (D) 38/400
 . Answer: Let's work this out in a step by step way to be sure we have the right answer.""

> Finished chain.


'We know that the frequency of the b allele is q = sqrt(1/400) = 1/20. \nUsing Hardy-Weinberg equilibrium, we can calculate the expected frequencies of each genotype: \nBB = (1/20)^2 = 1/400 \nBb = 2(1/20)(19/20) = 19/200 \nbb = (19/20)^2 = 361/400 \n\nThe proportion of individuals who carry the b allele but are not expected to develop the cancer is the frequency of the Bb genotype, since carriers will not exhibit the recessive disease. \nTherefore, the expected proportion is 19/200 or simplified to 19/400, which is answer choice (B).'
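The arithmetic in this sample response can be checked exactly. The sketch below recomputes the Hardy-Weinberg genotype frequencies with Python's `fractions` module, under the question's stated assumption that affected individuals are the homozygous recessive (`bb`) genotype with frequency q² = 1/400. It indicates that the carrier frequency 2pq is 19/200, which equals 38/400 (option D), so the response's final step of "simplifying" 19/200 to 19/400 does not hold. The response also swaps the `BB` and `bb` labels.

```python
from fractions import Fraction

q = Fraction(1, 20)               # recessive allele frequency, since q**2 == 1/400
p = 1 - q                         # dominant allele frequency, 19/20

affected = q ** 2                 # bb genotype: develops the cancer
carriers = 2 * p * q              # Bb genotype: carries b without the cancer
unaffected_non_carriers = p ** 2  # BB genotype

print(affected, carriers, unaffected_non_carriers)
```

The three genotype frequencies sum to 1, which is a quick consistency check on the calculation.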

In [95]:
question = question2
num_options = 2

options = []
for i in range(num_options):
  options.append(chain.run({"question": question}))
options

> Entering new LLMChain chain...
Prompt after formatting:
Question 
In a given population, 1 out of every 400 people has a cancer caused by a completely recessive
allele, b. Assuming the population is in Hardy-Weinberg equilibrium, which of the following is
the expected proportion of individuals who carry the b allele but are not expected to develop the
cancer?
(A) 1/400 (B) 19/400 (C) 20/400 (D) 38/400
 . Answer: Let's work this out in a step by step way to be sure we have the right answer.""

> Finished chain.

> Entering new LLMChain chain...
Prompt after formatting:
Question 
In a given population, 1 out of every 400 people has a cancer caused by a completely recessive
allele, b. Assuming the population is in Hardy-Weinberg equilibrium, which of the following is
the expected proportion of individuals who carry the b allele but are not expected to develop the
cancer?
(A) 1/400 (B) 19/400 (C) 20/400 (D) 38/400
 . Answer: Let

In [99]:
reflextion_chain = LLMChain(llm=chat_llm, prompt=reflextion_prompt, verbose=True)
reflextion_result = reflextion_chain.run({"options": options})
reflextion_result

> Entering new LLMChain chain...
Prompt after formatting:
You are a researcher tasked with investigating the ['First, we need to determine the frequency of the b allele in the population. Since the cancer caused by the b allele is completely recessive, we can use the Hardy-Weinberg equation:\n\np^2 + 2pq + q^2 = 1\n\nwhere p is the frequency of the dominant allele (B) and q is the frequency of the recessive allele (b).\n\nSince the cancer caused by the b allele is completely recessive, we know that q^2 represents the frequency of individuals who have the cancer. We are told that 1 out of every 400 people has the cancer, so:\n\nq^2 = 1/400\nq = sqrt(1/400) = 1/20\n\nNow we can use the equation to solve for p:\n\np^2 + 2pq + q^2 = 1\np^2 + 2p(1-p) + (1/20)^2 = 1\np^2 - 2p + 1 + 1/400 = 1\np^2 - 2p + 1/400 = 0\n\nSolving for p using the quadratic formula, we get:\n\np = (2 ± sqrt(4 - 4(1)(1/400))) / 2\np = 1 - q\np = 0.975\n\nSo the frequency of the B allele in the
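The formatted prompt above contains the Python list representation of the candidates (brackets, quotes, and escaped `\n`), because a list rather than a string is substituted for `{options}`. One possible refinement, shown here as a sketch with a hypothetical helper name `format_options`, is to label and join the candidates before passing them to the chain:

```python
from typing import List


def format_options(options: List[str]) -> str:
    """Label each candidate answer so the reflection prompt can refer to it by number."""
    return "\n\n".join(
        f"Option {i}:\n{text.strip()}" for i, text in enumerate(options, start=1)
    )


formatted = format_options(["The answer is A. ", "\nThe answer is D."])
print(formatted)
```

The resulting string could then be supplied as the `options` input instead of the raw list.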

In [105]:
resolver_chain = LLMChain(llm=chat_llm, prompt=resolver_prompt, verbose=True)
resolved_result = resolver_chain.run({"researcher_analysis": reflextion_result})
resolved_result

> Entering new LLMChain chain...
Prompt after formatting:
You are a resolver task with 
1) finding which of the Answer choice (A): 
The answer choice suggests the expected proportion to be 0.325. However, this is not correct because it does not take into account the complete recessiveness of the b allele and the fact that carriers of the allele are not expected to develop cancer. 

Answer choice (B): 
The answer choice suggests the expected proportion to be 19/400. This is the correct answer and there are no apparent flaws or faulty logic in this choice. 

Answer choice (C): 
The answer choice suggests the expected proportion to be 0.95. However, this is not correct because it does not take into account the complete recessiveness of the b allele and the fact that carriers of the allele are not expected to develop cancer. 

Answer choice (D): 
The answer choice suggests the expected proportion to be 38/400. This is close to the correct answer of 19/400, but it is

'1) After careful consideration, the correct answer choice is (B) which suggests the expected proportion to be 19/400. \n2) To improve on this answer, we can further explain why this answer is correct. The Hardy-Weinberg principle allows us to predict the frequency of alleles in a population, assuming that the population is in equilibrium and certain conditions are met. In this case, we are looking at the frequency of the b allele, which is recessive. The equation for calculating the frequency of a recessive allele in a population is q^2, where q is the frequency of the recessive allele and q^2 is the proportion of homozygous recessive individuals in the population. \n\nWe are given that the frequency of the B allele is 0.4, so the frequency of the b allele is 0.6 (since p + q = 1). Therefore, the proportion of homozygous recessive individuals in the population is q^2 = (0.6)^2 = 0.36. \n\nWe can now use the equation for calculating the proportion of carriers of a recessive allele in a

In [111]:
extraction_chain = LLMChain(llm=chat_llm, prompt=extraction_prompt, verbose=True)
final_answer = extraction_chain.run({"solution": resolved_result, "question": question})
final_answer

> Entering new LLMChain chain...
Prompt after formatting:
You are an extraction tool designed to check which answer a user has given.

The solution given is: 1) After careful consideration, the correct answer choice is (B) which suggests the expected proportion to be 19/400. 
2) To improve on this answer, we can further explain why this answer is correct. The Hardy-Weinberg principle allows us to predict the frequency of alleles in a population, assuming that the population is in equilibrium and certain conditions are met. In this case, we are looking at the frequency of the b allele, which is recessive. The equation for calculating the frequency of a recessive allele in a population is q^2, where q is the frequency of the recessive allele and q^2 is the proportion of homozygous recessive individuals in the population. 

We are given that the frequency of the B allele is 0.4, so the frequency of the b allele is 0.6 (since p + q = 1). Therefore, the proportion of

'D'

### Whole Flow in One Go

*Note: The evaluation is based on the [MMLU](https://github.com/hendrycks/test) benchmark. For this to work, you need to download the evaluation data and extract it at the root of this directory.*

In [198]:
def smart_gpt(question, num_options=3, GPT4=False, verbose=False):
  print("💭 {}\n\n".format(question))
  question_model = "gpt-4" if GPT4 else "gpt-3.5-turbo"
  print("Using model: {}".format(question_model))
  # High temperature so that the sampled candidate solutions differ from each other
  reasoning_llm = ChatOpenAI(temperature=0.9, model=question_model)
  # Deterministic model for reading the final letter out of the resolver output
  extraction_llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo")

  # Stage 1: generate several chain-of-thought candidates
  options = []
  option_chain = LLMChain(llm=reasoning_llm, prompt=chain_of_though_prompt, verbose=verbose)
  for i in range(num_options):
    option = option_chain.run({"question": question})
    print("🙋 Option {}:\n\n{}\n".format(i, option))
    options.append(option)

  # Stage 2: critique the candidates (joined into one string, as in the format() example)
  reflextion_chain = LLMChain(llm=reasoning_llm, prompt=reflextion_prompt, verbose=verbose)
  reflextion_result = reflextion_chain.run({"options": "\n\n".join(options)})
  print("===> 🤔 Reflextion Results: \n\n{}\n".format(reflextion_result))

  # Stage 3: pick the best candidate based on the critique and improve it
  resolver_chain = LLMChain(llm=reasoning_llm, prompt=resolver_prompt, verbose=verbose)
  resolved_result = resolver_chain.run({"researcher_analysis": reflextion_result})
  print("===>✅ Resolved Results: \n\n{}\n".format(resolved_result))

  # Extract the single answer letter; strip whitespace so it compares cleanly with the key
  extraction_chain = LLMChain(llm=extraction_llm, prompt=extraction_prompt, verbose=verbose)
  final_answer = extraction_chain.run({"question": question, "solution": resolved_result}).strip()

  return resolved_result, final_answer

In [199]:
# resolved_result, res = smart_gpt(question1)
# print(resolved_result, res)

In [206]:
import pandas as pd

def create_df():
  cols = ["question", "A", "B", "C", "D",  "solution"]
  df = pd.read_csv("./data/test/formal_logic_test.csv", header=None)
  df.columns = cols
  df['llm_question'] = df.apply(lambda row: "{}\n\nA: {}\nB: {}\nC: {}\nD: {}\n".format(row["question"], row["A"], row["B"], row["C"], row["D"]), axis=1)
  df['llm_answer'] = pd.Series(dtype='str')
  df['correct'] = pd.Series(dtype='bool')
  return df

df = create_df()
print("Total size of question bank is: {}".format(len(df)))
df.head()

Total size of question bank is: 126


| | question | A | B | C | D | solution | llm_question | llm_answer | correct |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Identify the conclusion of the following argum... | It is hard not to verify in our peers the same... | The arrogance of our consciousness, which in g... | Because of this, it is difficult to convince p... | It is difficult to convince people of the unco... | D | Identify the conclusion of the following argum... | | |
| 1 | Select the best translation into predicate log... | Tdc | Tcd | Tcc | dTc | A | Select the best translation into predicate log... | | |
| 2 | Select the best English interpretation of the... | Some large houses are bigger than some apartme... | Some houses are bigger than all large apartments. | Some large apartments are bigger than some hou... | Some houses are bigger thatn some large apartm... | C | Select the best English interpretation of the... | | |
| 3 | Construct a complete truth table for the follo... | Valid | Invalid. Counterexample when G and H are true | Invalid. Counterexample when G is true and H i... | Invalid. Counterexample when H is true and G i... | A | Construct a complete truth table for the follo... | | |
| 4 | Use the following key to translate the given ... | If it's not the case that both Izzy plays Mine... | If it's both not the case that Izzy plays Mine... | Izzy playing Minecraft and Ashleigh stopping b... | Russell's making dinner is a sufficient condit... | B | Use the following key to translate the given ... | | |


In [207]:
NUM_QUESTIONS = 25
for index, row in df[0:NUM_QUESTIONS].iterrows():
  print("Running question: {}".format(index))
  question = row["llm_question"]

  resolved_result, llm_answer = smart_gpt(question, 3)
  
  df.at[index, "llm_answer"] = llm_answer
  df.at[index, "correct"] = row["solution"] == llm_answer

Running question: 0
💭 Identify the conclusion of the following argument. It is hard not to verify in our peers the same weakened intelligence due to emotions that we observe in our everyday patients. The arrogance of our consciousness, which in general, belongs to the strongest defense mechanisms, blocks the unconscious complexes. Because of this, it is difficult to convince people of the unconscious, and in turn to teach them what their conscious knowledge contradicts. (Sigmund Freud, The Origin and Development of Psychoanalysis)

A: It is hard not to verify in our peers the same weakened intelligence due to emotions that we observe in our everyday patients.
B: The arrogance of our consciousness, which in general, belongs to the strongest defense mechanisms, blocks the unconscious complexes.
C: Because of this, it is difficult to convince people of the unconscious, and in turn to teach them what their conscious knowledge contradicts.
D: It is difficult to convince people of the uncons

Running question: 2
💭  Select the best English interpretation of the given proposition, using the following translation key: Ax: x is an apartment Hx: x is a house Lx: x is large Bxy: x is bigger than y (∃x)[(Ax • Lx) • (∃y)(Hy • Bxy)]

A: Some large houses are bigger than some apartments.
B: Some houses are bigger than all large apartments.
C: Some large apartments are bigger than some houses.
D: Some houses are bigger thatn some large apartments.



Using model: gpt-3.5-turbo
🙋 Option 0:

The proposition can be broken down as follows:

(∃x)[(Ax • Lx) • (∃y)(Hy • Bxy)]
- (∃x) means "there exists at least one x"
- (Ax • Lx) means "x is an apartment and x is large"
- (∃y)(Hy • Bxy) means "there exists at least one y such that y is a house and x is bigger than y"

Putting it all together, the complete interpretation is: Some apartments are large and there is at least one house that is smaller than at least one of those large apartments.

Therefore, the correct answer is C: Some large apa

Running question: 4
💭  Use the following key to translate the given formula of PL to natural, English sentences.
A: Marina reads a Percy Jackson book.
B: Izzy plays Minecraft.
C: Emily stops working.
D: Russell makes dinner.
E: Ashleigh stops by.
(~B • E) ⊃ D

A: If it's not the case that both Izzy plays Minecraft and Ashleigh stops by then Russell makes dinner.
B: If it's both not the case that Izzy plays Minecraft and Ashleigh stops by then Russell makes dinner.
C: Izzy playing Minecraft and Ashleigh stopping by entails that Russell makes dinner.
D: Russell's making dinner is a sufficient condition for Izzy not playing Minecraft and Ashleigh stopping by.



Using model: gpt-3.5-turbo
🙋 Option 0:

The formula can be translated as "If it is not the case that Izzy plays Minecraft and Ashleigh stops by, then Russell makes dinner." This means that if both events of Izzy playing Minecraft and Ashleigh stopping by do not occur, then Russell will make dinner.

🙋 Option 1:

The formula ~B • E

🙋 Option 1:

First, we need to fill out the truth table for the argument:

| K | L | ~K ≡ L | K · L | K · ~L | (K · L) ∨ (K · ~L) | ~K ≡ L / (K · L) ∨ (K · ~L) |
|---|---|---------|-------|--------|-----------------------|---------------------------|
| T | T |    F    |   T   |    F   |           T           |             T             |
| T | F |    T    |   F   |    T   |           T           |             F             |
| F | T |    T    |   F   |    F   |           F           |             T             |
| F | F |    F    |   F   |    F   |           F           |             T             |

Now, we need to determine whether the argument is valid or invalid based on the truth table. An argument is valid if the conclusion is always true when the premises are true. In this case, the conclusion is "(~K ≡ L) / (K · L) ∨ (K · ~L)", and we can see from the truth table that the conclusion is always true when the premises are true. Therefore, the argument is valid and the answer is A.

Running question: 8
💭  Which of the following propositions is an immediate (one-step) consequence in PL of the given premises?
~E ⊃ ~F
G ⊃ F
H ∨ ~E
H ⊃ I
~I

A: E ⊃ F
B: F ⊃ G
C: H ⊃ ~E
D: ~H



Using model: gpt-3.5-turbo
🙋 Option 0:

First, we can use modus tollens (MT) with the first two premises to derive G ⊃ ~E:

1. ~E ⊃ ~F (given)
2. G ⊃ F (given)
3. ∴ G ⊃ ~E (MT from 1 and 2)

Next, we can apply disjunctive syllogism (DS) with premises 3 and the derived G ⊃ ~E:

4. H ∨ ~E (given)
5. G ⊃ ~E (derived in step 3)
6. ∴ H ⊃ ~E (DS from 4 and 5)

Finally, we can use modus ponens (MP) with premises 6 and 4 (using the negation of the antecedent rule) to get ~H:

7. H ⊃ ~E (derived in step 6)
8. ~E ⊃ ~H (contrapositive of 7)
9. ∴ ~H (MP from 8 and 4)

Therefore, the immediate consequence of the given premises is option D: ~H.

🙋 Option 1:

First, we can use the second premise to conclude that if G is true, then F is true:

G ⊃ F

Next, we can use the first premise to conclude that if ~E is

===> 🤔 Reflextion Results: 

Option A: (∀x)[Kx ⊃ (∀y)(Py ⊃ Lxy)]
- Flaw: None.
- Faulty Logic: None.

Option B: (∀x)[Kx ⊃ (∀y)(Lxy ⊃ Py)]
- Flaw: Translates "luckier than" as "x is luckier than y" instead of "y is luckier than x".
- Faulty Logic: Implies that if y is a pauper, then x is luckier than y, which is not equivalent to the original statement.

Option C: (∀x)(∀y)[(Kx • Py) ⊃ Lyx]
- Flaw: Translates "all kings" and "all paupers" separately instead of combining them into one sentence.
- Faulty Logic: Implies that if x is a king and y is a pauper, then y is luckier than x, which is not equivalent to the original statement.

Option D: (∀x)[(Kx • (∀y)(Py)) ⊃ Lxy]
- Flaw: Translates "all paupers" as "all y are paupers" instead of "if y is a pauper".
- Faulty Logic: Implies that if x is a king and all y are paupers, then x is luckier than y, which is not equivalent to the original statement.

===>✅ Resolved Results: 

1) The option with no flaws or faulty logic is Option A: (∀x)[Kx ⊃

Running question: 12
💭 Which of the following propositions is an immediate (one-step) consequence in PL of the given premises?
E ⊃ ~F
~F ⊃ G
~G

A: E ⊃ F
B: F ⊃ G
C: G ⊃ E
D: ~~F



Using model: gpt-3.5-turbo
🙋 Option 0:

First, we can use Modus Tollens on the second premise to infer ~G → ~(~F), which simplifies to ~G → F. 

Then, we can use Modus Ponens with the first premise and our previous result to infer E → ~(~F), which simplifies to E → F. 

Therefore, the immediate consequence in PL of the given premises is option A: E ⊃ F.

🙋 Option 1:

To find an immediate consequence, we want to apply a single rule of inference to the premises. One useful rule of inference in this case is called modus tollens, which says that from A ⊃ B and ~B, we can infer ~A. Here, we see that we have the first two statements of that form:

- E ⊃ ~F
- ~F ⊃ G
- ~G

We can use modus tollens to conclude ~E from the first premise and ~G:

- ~(E ⊃ ~F) (by modus tollens from 1 and ~F)
- E ∧ F (by DeMorgan's from

Running question: 14
💭 Use indirect truth tables to determine whether the following argument is valid. If the argument is invalid, choose an option which presents a counterexample. (There may be other counterexamples as well.)
H ≡ (~I ∨ J)
H ∨ ~J / ~I

A: Valid
B: Invalid. Counterexample when H, I, and J are false
C: Invalid. Counterexample when H and I are true and J is false
D: Invalid. Counterexample when I is true and H and J are false



Using model: gpt-3.5-turbo
🙋 Option 0:

To use indirect truth tables, we negate the conclusion and use the premises to see if we obtain a contradiction. 

Premises:
1. H ≡ (~I ∨ J)
2. H ∨ ~J 

Conclusion:
~(~I)

Negating the conclusion, we get:

I

Now we create a truth table using the premises and the negated conclusion:

| H | I | J | ~J | ~I ∨ J | H ≡ (~I ∨ J) | H ∨ ~J | ~(~I) |
|---|---|---|----|--------|-----------------|----------|--------|
| T | T | T |  F |   T    |        T        |     T    |   F   |
| T | T | F |  T |   T    |        T 

🙋 Option 2:

The correct translation is A: (∃x){Fx • (∀y)[Dy ⊃ (∃z)(Vz • Rxyz)]}. 

Explanation: 
- (∃x){...}: There exists a firefighter x who satisfies the following conditions...
- Fx: x is a firefighter.
- (∀y)[Dy ⊃ (...)] : For any danger y, if y is present, then... 
- (∃z)(Vz • Rxyz): There exists a victim z who x rescues from y. 

In other words, there is at least one firefighter who is capable of rescuing victims from any potential danger.

===> 🤔 Reflextion Results: 

Option A: 

Flaw: This option includes the universal quantifier (∀y), which means the statement is claiming that all victims are rescued by firefighters, which is not what the original statement was conveying. 

Option B: 

Faulty Logic: This option translates the original statement as "There exists at least one firefighter who rescues at least one victim from at least one danger," which does not capture the fact that all victims in danger are rescued by firefighters. 

Overall, both options have issues in accura

🙋 Option 2:

First, we need to identify the predicates and variables in the given statement:

- Predicates: "is a planet" (Px), "is smaller than" (Sxy)
- Variable: Jupiter (j)

Next, we can translate the statement using quantifiers and connectives:

- "All planets": ∀x(Px)
- "are smaller than Jupiter": Sxj

Putting it all together, the best translation into predicate logic is:

D: (∀x)(Px ⊃ Sxj)

===> 🤔 Reflextion Results: 

Option A: (∃x)(Px • Sxj)
Flaws/faulty logic: 
- This statement says there exists a planet that is smaller than Jupiter, but the original statement says that all planets are smaller, so this option is incorrect.

Option B: (∀x)(Px • Sxj)
Flaws/faulty logic: 
- This statement says every planet is smaller than Jupiter, which matches the original statement, so this option is a possible answer.

Option C: (∀x)(Px ⊃ Sjx)
Flaws/faulty logic: 
- This statement says that if something is a planet, then it is smaller than Jupiter, which is not the same as saying all planets a

===> 🤔 Reflextion Results: 

Option A:
Flaws: 
- It doesn't capture the whole meaning of the original sentence since it leaves open the possibility that some artifacts might be people.
- The use of the negation symbol "~" is not enough to fully express the relationship between artifacts and people. 

Faulty Logic: 
- It assumes that negating a statement is enough to capture its full meaning. 

Option B:
Flaws: 
- None that have been identified. 

Faulty Logic: 
- None that have been identified. 

Option C:
Flaws: 
- It translates the opposite meaning of the original sentence. 
- It implies that there could be some other things that are artifacts (just not people), which is not what the original sentence says.

Faulty Logic: 
- It assumes that negating a statement is enough to capture its full meaning. 

Option D: 
Flaws: 
- It implies a stronger statement than the original sentence, namely that every single artifact in existence is not a person. 
- It goes beyond the scope of the origi

Running question: 21
💭 Select the best translation into predicate logic. Marco moves from Italy to Spain. (i: Italy; m: Marco; s: Spain; Mxyz: x moves to y from z)

A: Msim
B: Mmis
C: Mims
D: Mmsi



Using model: gpt-3.5-turbo
🙋 Option 0:

First, we need to identify the predicates and the arguments in the sentence. We have two predicates: "moves" and "from". The arguments are "Marco", "Italy", and "Spain". 

Next, we need to determine the correct order of the arguments for the "moves" predicate. According to the Mxyz notation, the first argument should be the subject (who is moving), the second argument should be the destination (where they are moving to), and the third argument should be the origin (where they are moving from). 

Using this information, we can construct the following translation: 

D: Mmsi (Marco moves to Spain from Italy) 

This translation correctly represents the movement of Marco from Italy to Spain. Therefore, the answer is D.

🙋 Option 1:

The best translation i

===>✅ Resolved Results: 

Step 1: Clarify the original proposition
The original proposition is that no apartment is bigger than any large house.

Step 2: Identify flaws in the given answer options
Option A is flawed because it does not have the same meaning as the original proposition, and its use of double negatives makes the interpretation unclear. Option B is the correct interpretation of the original proposition. Option C is flawed because it changes the meaning of the proposition by using "smaller" instead of "not bigger". Option D is completely opposite to the original proposition, and its use of "all" in both cases makes the interpretation too extreme and incorrect.

Step 3: Improve the answer
To improve the answer, we can restate option B in a clearer and more concise way:
No apartment is larger than any large house.

Step 4: Print the improved answer in full
The improved answer is: No apartment is larger than any large house.

Running question: 23
💭  Construct a complete truth

===>✅ Resolved Results: 

1) Finding which of the options accurately represents the relationship between Pablo going to the store and his brother driving him and paying for gas:

Option A: D ⊃ (G ≡ P) - Flawed logic due to incorrect use of the conditional operator and order of connectives.

Option B: G ⊃ (D ≡ P) - Flawed logic due to incorrect use of the conditional operator and order of connectives.

Option C: (D • G) ⊃ P - Flawed logic as it only accounts for both conditions being necessary, not either condition being sufficient.

Option D: P ≡ (D • G) - Accurately represents the biconditional relationship between Pablo going to the store and his brother driving him and paying for gas.

2) Improving the answer:
Option D is the best answer, as it accurately represents the relationship between the variables. However, it could be improved by using parentheses to clarify the order of operations and avoid any potential confusion:

P ≡ (D ∧ G)

This revised formula clearly shows that the l

In [208]:
print("Smart GPT got {}% correct.".format(len(df[df.correct == True]) / NUM_QUESTIONS * 100))

Smart GPT got 44.0% correct.
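With 25 questions, 44.0% corresponds to 11 correct answers. The expression above hard-codes the denominator as `NUM_QUESTIONS`; a small sketch of a scoring helper that derives it from the data instead is shown below. `percent_correct` is a hypothetical name, the sample data is made up for illustration, and treating unanswered rows (`None`) as incorrect is an assumption.

```python
from typing import List, Optional


def percent_correct(answers: List[Optional[str]], solutions: List[str]) -> float:
    """Percentage of questions whose extracted answer matches the answer key."""
    if len(answers) != len(solutions):
        raise ValueError("answers and solutions must have the same length")
    if not solutions:
        return 0.0
    hits = sum(
        1
        for got, want in zip(answers, solutions)
        if got is not None and got.strip().upper() == want.strip().upper()
    )
    return 100 * hits / len(solutions)


# Illustrative data only: 11 matches out of 25 questions.
key = ["A"] * 25
replies = ["A"] * 11 + ["B"] * 13 + [None]
score = percent_correct(replies, key)
```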


In [209]:
df.head()

| | question | A | B | C | D | solution | llm_question | llm_answer | correct |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Identify the conclusion of the following argum... | It is hard not to verify in our peers the same... | The arrogance of our consciousness, which in g... | Because of this, it is difficult to convince p... | It is difficult to convince people of the unco... | D | Identify the conclusion of the following argum... | D | True |
| 1 | Select the best translation into predicate log... | Tdc | Tcd | Tcc | dTc | A | Select the best translation into predicate log... | B | False |
| 2 | Select the best English interpretation of the... | Some large houses are bigger than some apartme... | Some houses are bigger than all large apartments. | Some large apartments are bigger than some hou... | Some houses are bigger thatn some large apartm... | C | Select the best English interpretation of the... | C | True |
| 3 | Construct a complete truth table for the follo... | Valid | Invalid. Counterexample when G and H are true | Invalid. Counterexample when G is true and H i... | Invalid. Counterexample when H is true and G i... | A | Construct a complete truth table for the follo... | A | True |
| 4 | Use the following key to translate the given ... | If it's not the case that both Izzy plays Mine... | If it's both not the case that Izzy plays Mine... | Izzy playing Minecraft and Ashleigh stopping b... | Russell's making dinner is a sufficient condit... | B | Use the following key to translate the given ... | A | False |
