In [1]:
from factuality_evaluator_rs_sf import UnilateralFactualityEvaluator, BilateralFactualityEvaluator
from IPython.display import display, Markdown

In [2]:
MODEL = "gpt-4o"

In [3]:
EXAMPLES = [
    {
        "problem": "Who was the first man to walk on the Moon?",
        "answer": "Neil Armstrong"
    },
    {
        "problem": "Who was the first man to walk on the Moon?",
        "answer": "Pete Conrad"
    },
    {
        "problem": "Are penguins birds?",
        "answer": "No"
    },
    {
        "problem": "Who was the frobnoz that restified the branks?",
        "answer": "Rutherford B. Hayes"
    },
    {
        "problem": "When was America discovered?",
        "answer": "1492"
    },
    {
        "problem": "Did Lee Harvey Oswald act alone in the assassination of John Kennedy?",
        "answer": "Yes"
    },
    {
        "problem": "Is the Holy Trinity one entity or three entities?",
        "answer": "Three"
    },
]

In [4]:
def unilateral_evaluation(example):
    """Run the unilateral evaluator on one example and render its output."""
    unilateral_evaluator = UnilateralFactualityEvaluator(MODEL)
    evaluation = unilateral_evaluator.invoke(example)
    display(Markdown("## Unilateral"))
    display(Markdown("### Reasoning"))
    # "reasoning" is indexed as a sequence; render its first transcript.
    display(Markdown(evaluation["reasoning"][0]))
    display(Markdown(f"### Evaluation: {evaluation['evaluation']}"))

In [5]:
def bilateral_evaluation(example):
    """Run the bilateral evaluator on one example and render its output."""
    bilateral_evaluator = BilateralFactualityEvaluator(MODEL)
    evaluation = bilateral_evaluator.invoke(example)
    display(Markdown("## Bilateral"))
    # Verification and refutation are separate transcripts; render the first of each.
    display(Markdown("### Verification"))
    display(Markdown(evaluation["verification"][0]))
    display(Markdown("### Refutation"))
    display(Markdown(evaluation["refutation"][0]))
    display(Markdown(f"### Evaluation: {evaluation['evaluation']}"))

In [6]:
for example in EXAMPLES:
    display(Markdown(f'# {example["problem"]} {example["answer"]}'))
    unilateral_evaluation(example)
    bilateral_evaluation(example)

# Who was the first man to walk on the Moon? Neil Armstrong

## Unilateral

### Reasoning

STEP 1: Extract the specific entity, term, or value that directly answers the question:
Question: Who was the first man to walk on the Moon?
Response: Neil Armstrong

Extracted Answer: Neil Armstrong

STEP 2: Evaluate if the extracted answer is definitively TRUE or FALSE:

1. Verify the extracted answer against known facts:
   - Neil Armstrong is widely recognized and historically documented as the first man to walk on the Moon during the Apollo 11 mission on July 20, 1969.

2. Check for any contradictory information:
   - There is no credible contradictory information regarding Neil Armstrong being the first man to walk on the Moon.

3. Consider question context and any implied terms:
   - The question is straightforward, asking for the identity of the first man to walk on the Moon, which aligns with the historical fact that it was Neil Armstrong.

4. Determine if the extracted answer correctly responds to what was asked:
   - The extracted answer "Neil Armstrong" directly and correctly answers the question.

Reasoning:
The extracted answer "Neil Armstrong" is consistent with historical records and widely accepted facts. There is no credible evidence to dispute this fact, and it directly answers the question asked.

Conclude: TRUE

### Evaluation: t

## Bilateral

### Verification

STEP 1: Extract the specific entity, term, or value that directly answers the question:
Question: Who was the first man to walk on the Moon?
LLM Response: Neil Armstrong

Extracted Answer: Neil Armstrong

STEP 2: Evaluate if the extracted answer is definitively TRUE:
1. Verify the extracted answer against known facts: Neil Armstrong is historically recognized as the first man to walk on the Moon during the Apollo 11 mission on July 20, 1969.
2. Consider temporal/contextual factors: The event occurred in 1969, and there have been no changes to this historical fact.
3. Determine if it correctly responds to what the question is asking: The question asks for the first man to walk on the Moon, and Neil Armstrong is the correct answer.

Reasoning: The extracted answer is consistent with historical records and accurately answers the question.

Conclude EXACTLY with either:
"VERIFIED"

### Refutation

STEP 1: Extract the specific entity, term, or value that directly answers the question:
Extracted Answer: Neil Armstrong

STEP 2: Evaluate if the extracted answer is definitively FALSE:
1. Check if the extracted answer contradicts established facts: Neil Armstrong is historically recognized as the first man to walk on the Moon.
2. Consider if temporal/contextual factors make this answer incorrect: There are no temporal or contextual factors that make this answer incorrect.
3. Determine if it fails to correctly answer what the question is asking: The answer correctly identifies the first man to walk on the Moon.

Reasoning: The extracted answer aligns with established historical facts and correctly answers the question.

Conclude EXACTLY with either:
"CANNOT REFUTE"

### Evaluation: t

# Who was the first man to walk on the Moon? Pete Conrad

## Unilateral

### Reasoning

Extracted Answer: Pete Conrad

Reasoning:
1. The question asks for the first man to walk on the Moon.
2. Known historical fact: Neil Armstrong was the first man to walk on the Moon on July 20, 1969, during the Apollo 11 mission.
3. Pete Conrad was the third person to walk on the Moon, during the Apollo 12 mission.
4. The extracted answer "Pete Conrad" contradicts the known fact that Neil Armstrong was the first.

Conclusion: FALSE

### Evaluation: f

## Bilateral

### Verification

Extracted Answer: Pete Conrad

Reasoning: The first man to walk on the Moon was Neil Armstrong, not Pete Conrad. Pete Conrad was the third person to walk on the Moon during the Apollo 12 mission.

Conclude: CANNOT VERIFY

### Refutation

STEP 1: Extract the specific entity, term, or value that directly answers the question:
Extracted Answer: Pete Conrad

STEP 2: Evaluate if the extracted answer is definitively FALSE:
1. The established fact is that Neil Armstrong was the first man to walk on the Moon, not Pete Conrad.
2. There are no temporal or contextual factors that would make Pete Conrad the first man to walk on the Moon.
3. The extracted answer fails to correctly answer the question, as it does not identify Neil Armstrong.

Reasoning: The extracted answer contradicts the established historical fact that Neil Armstrong was the first man to walk on the Moon.

Conclude EXACTLY with either:
"REFUTED"

### Evaluation: f

# Are penguins birds? No

## Unilateral

### Reasoning

STEP 1: Extract the specific entity, term, or value that directly answers the question:
Question: Are penguins birds?
Response: No

Extracted Answer: No

STEP 2: Evaluate if the extracted answer is definitively TRUE or FALSE:

1. Verify the extracted answer against known facts:
   - Penguins are indeed classified as birds. They belong to the family Spheniscidae and are characterized by their flightless nature, adapted for life in the water.

2. Check for any contradictory information:
   - There is no credible scientific information that contradicts the classification of penguins as birds.

3. Consider question context and any implied terms:
   - The question is straightforward, asking if penguins are birds, which they are.

4. Determine if the extracted answer correctly responds to what was asked:
   - The extracted answer "No" is incorrect because it contradicts the established fact that penguins are birds.

Reasoning: The extracted answer "No" is incorrect because penguins are scientifically classified as birds. The response does not align with the factual classification of penguins.

Conclusion: FALSE

### Evaluation: f

## Bilateral

### Verification

Extracted Answer: No

Reasoning: Penguins are indeed birds. They are flightless birds that are adapted to life in the water. Therefore, the extracted answer "No" is incorrect.

Conclude: CANNOT VERIFY

### Refutation

STEP 1: Extract the specific entity, term, or value that directly answers the question:
Question: Are penguins birds?
LLM Response: No

Extracted Answer: No

STEP 2: Evaluate if the extracted answer is definitively FALSE:
1. Check if the extracted answer contradicts established facts: Penguins are indeed birds, as they belong to the class Aves.
2. Consider if temporal/contextual factors make this answer incorrect: There are no temporal or contextual factors that would change the classification of penguins as birds.
3. Determine if it fails to correctly answer what the question is asking: The question asks if penguins are birds, and the correct answer is "Yes."

Reasoning: The extracted answer "No" contradicts the established fact that penguins are birds.

Conclude EXACTLY with either:
"REFUTED" - Only if the extracted answer is definitively incorrect

REFUTED

### Evaluation: f

# Who was the frobnoz that restified the branks? Rutherford B. Hayes

## Unilateral

### Reasoning

STEP 1: Extract the specific entity, term, or value that directly answers the question:
Extracted Answer: Rutherford B. Hayes

STEP 2: Evaluate if the extracted answer is definitively TRUE or FALSE:

1. Verify the extracted answer against known facts:
   - "Frobnoz" and "restified the branks" are not known historical terms or events. They appear to be fictional or nonsensical terms.
   - Rutherford B. Hayes was the 19th President of the United States, serving from 1877 to 1881.

2. Check for any contradictory information:
   - There is no historical record or evidence linking Rutherford B. Hayes to the terms "frobnoz" or "restified the branks."

3. Consider question context and any implied terms:
   - The question uses terms that do not correspond to any known historical or factual context, making it impossible to verify the answer.

4. Determine if the extracted answer correctly responds to what was asked:
   - Since the terms in the question are not verifiable or recognized, it is impossible to determine if Rutherford B. Hayes is the correct answer.

Reasoning:
The extracted answer, Rutherford B. Hayes, cannot be definitively verified as correct or incorrect because the terms "frobnoz" and "restified the branks" do not correspond to any known facts or historical events. Without a factual basis for these terms, the answer cannot be evaluated as true.

Conclusion: FALSE

### Evaluation: f

## Bilateral

### Verification

Extracted Answer: Rutherford B. Hayes

Reasoning: The terms "frobnoz" and "restified the branks" are not recognized or standard terms in historical or factual contexts. Therefore, it is not possible to verify if Rutherford B. Hayes is associated with these terms or actions.

Conclude: CANNOT VERIFY

### Refutation

STEP 1: Extract the specific entity, term, or value that directly answers the question:  
Extracted Answer: Rutherford B. Hayes

STEP 2: Evaluate if the extracted answer is definitively FALSE:  
1. The terms "frobnoz" and "restified the branks" do not correspond to any established historical facts or known events.  
2. There is no historical or contextual evidence linking Rutherford B. Hayes to these terms.  
3. The question appears to be nonsensical or fictional, and thus cannot be accurately answered with real historical figures.

Reasoning: The extracted answer does not correspond to any known facts or events, and the question itself is not based on factual or historical information.

Conclude: REFUTED

### Evaluation: f

# When was America discovered? 1492

## Unilateral

### Reasoning

STEP 1: Extract the specific entity, term, or value that directly answers the question:
Question: When was America discovered?
Response: 1492

Extracted Answer: 1492

STEP 2: Evaluate if the extracted answer is definitively TRUE or FALSE:

1. Verify the extracted answer against known facts:
   - The year 1492 is widely recognized as the year when Christopher Columbus, an Italian explorer sailing under the Spanish flag, made his first voyage across the Atlantic Ocean and reached the islands in the Caribbean, which are part of the Americas.

2. Check for any contradictory information:
   - There is no significant contradictory information regarding the year 1492 as the date of Columbus's first voyage to the Americas.

3. Consider question context and any implied terms:
   - The question "When was America discovered?" is often interpreted in the context of European exploration and the arrival of Columbus in the New World. While indigenous peoples had been living in the Americas for thousands of years prior, the question typically refers to European discovery.

4. Determine if the extracted answer correctly responds to what was asked:
   - Given the common historical context of the question, the year 1492 is the correct response to when America was "discovered" by Europeans.

Reasoning:
The extracted answer "1492" aligns with the historical event of Columbus's first voyage to the Americas, which is commonly referred to as the "discovery" of America by Europeans. There is no significant dispute about this date in the context of European exploration history.

Conclusion: TRUE

### Evaluation: t

## Bilateral

### Verification

STEP 1: Extract the specific entity, term, or value that directly answers the question:
Extracted Answer: 1492

STEP 2: Evaluate if the extracted answer is definitively TRUE:
1. Verify the extracted answer against known facts: Christopher Columbus is credited with discovering America in 1492.
2. Consider temporal/contextual factors: The question likely refers to the widely recognized event of Columbus's arrival in the Americas.
3. Determine if it correctly responds to what the question is asking: The question asks for the year America was discovered, which is commonly attributed to Columbus's 1492 voyage.

Reasoning: The extracted answer "1492" is consistent with the historical event of Columbus's arrival in the Americas, which is often referred to as the "discovery" of America in a Eurocentric context.

Conclude EXACTLY with either:
"VERIFIED"

### Refutation

STEP 1: Extract the specific entity, term, or value that directly answers the question:  
Extracted Answer: 1492  

STEP 2: Evaluate if the extracted answer is definitively FALSE:  
1. Check if the extracted answer contradicts established facts: The year 1492 is widely recognized as the year Christopher Columbus arrived in the Americas, marking a significant event in the history of European exploration.  
2. Consider if temporal/contextual factors make this answer incorrect: The question is about when "America" was discovered, which is often associated with Columbus's 1492 voyage. However, it is important to note that indigenous peoples had been living in the Americas for thousands of years prior to this.  
3. Determine if it fails to correctly answer what the question is asking: The question is somewhat ambiguous, as "discovered" can be interpreted in different ways. However, in the context of European exploration, 1492 is a commonly accepted answer.  

Reasoning: While the term "discovered" is problematic and Eurocentric, the extracted answer of 1492 is not definitively false in the context of European exploration history.  

Conclude: CANNOT REFUTE

### Evaluation: t

# Did Lee Harvey Oswald act alone in the assassination of John Kennedy? Yes

## Unilateral

### Reasoning

STEP 1: Extract the specific entity, term, or value that directly answers the question:
Question: Did Lee Harvey Oswald act alone in the assassination of John Kennedy?
Response: Yes

Extracted Answer: Yes

STEP 2: Evaluate if the extracted answer is definitively TRUE or FALSE:

1. Verify the extracted answer against known facts:
   - The official conclusion of the Warren Commission, which investigated the assassination of John F. Kennedy, was that Lee Harvey Oswald acted alone in the assassination.
   - However, there have been numerous conspiracy theories and alternative investigations suggesting the possibility of additional conspirators or involvement by other parties.

2. Check for any contradictory information:
   - The House Select Committee on Assassinations (HSCA) later concluded in 1979 that Kennedy was probably assassinated as a result of a conspiracy, although it did not identify any specific individuals or groups involved.
   - The lack of definitive evidence supporting these conspiracy theories means that the official stance remains that Oswald acted alone, but the existence of these theories introduces doubt.

3. Consider question context and any implied terms:
   - The question asks if Oswald acted alone, which implies a definitive answer. The response "Yes" aligns with the Warren Commission's conclusion but does not account for the ongoing debate and lack of absolute consensus.

4. Determine if the extracted answer correctly responds to what was asked:
   - While the official conclusion supports the answer "Yes," the presence of significant doubt and alternative theories means that the answer cannot be definitively stated as true without acknowledging the controversy.

Reasoning:
The extracted answer "Yes" aligns with the official conclusion of the Warren Commission, but the existence of alternative theories and the HSCA's findings introduce significant doubt. Therefore, the answer cannot be definitively stated as true.

Conclusion: FALSE

### Evaluation: f

## Bilateral

### Verification

STEP 1: Extract the specific entity, term, or value that directly answers the question:
Extracted Answer: Yes

STEP 2: Evaluate if the extracted answer is definitively TRUE:
1. Verify the extracted answer against known facts: The Warren Commission concluded that Lee Harvey Oswald acted alone in the assassination of John F. Kennedy. However, there have been numerous conspiracy theories and investigations that suggest otherwise, and no definitive consensus has been reached that is universally accepted.
2. Consider temporal/contextual factors: The question of whether Oswald acted alone has been debated for decades, and while the official government position is that he did, public opinion and alternative theories persist.
3. Determine if it correctly responds to what the question is asking: The extracted answer "Yes" aligns with the official conclusion of the Warren Commission, but it does not account for the ongoing debate and lack of universal agreement.

Reasoning: The extracted answer reflects the official conclusion but does not account for the complexity and controversy surrounding the topic.

Conclude EXACTLY with either:
"CANNOT VERIFY"

### Refutation

STEP 1: Extract the specific entity, term, or value that directly answers the question:  
Extracted Answer: Yes

STEP 2: Evaluate if the extracted answer is definitively FALSE:  
1. The question of whether Lee Harvey Oswald acted alone in the assassination of John F. Kennedy is a subject of debate and conspiracy theories. The official conclusion by the Warren Commission was that Oswald acted alone, but this has been contested by various other investigations and theories.  
2. There is no definitive evidence that universally disproves the conclusion that Oswald acted alone, although there are alternative theories.  
3. The extracted answer "Yes" aligns with the official conclusion of the Warren Commission, which is one of the established narratives.

Reasoning: The extracted answer "Yes" is consistent with the official conclusion of the Warren Commission, and while there are alternative theories, there is no definitive evidence that refutes this conclusion.

Conclude: CANNOT REFUTE

### Evaluation: n

# Is the Holy Trinity one entity or three entities? Three

## Unilateral

### Reasoning

Extracted Answer: Three

Reasoning:
1. The concept of the Holy Trinity in Christian theology refers to one God in three distinct persons: the Father, the Son (Jesus Christ), and the Holy Spirit. This is a fundamental doctrine in most branches of Christianity.
2. The question asks whether the Holy Trinity is one entity or three entities. The extracted answer "Three" suggests that the Holy Trinity consists of three entities.
3. While the Holy Trinity is described as three distinct persons, it is also considered one entity in essence and being. Therefore, the extracted answer "Three" does not fully capture the theological understanding of the Trinity as one God in three persons.

Conclusion:
FALSE - The extracted answer "Three" is not definitively correct because it does not account for the theological concept of the Trinity being one entity in essence, despite being three distinct persons.

### Evaluation: f

## Bilateral

### Verification

Extracted Answer: Three

Reasoning: In Christian theology, the Holy Trinity is understood as one God in three persons: the Father, the Son (Jesus Christ), and the Holy Spirit. These three persons are distinct yet coexist in unity, and are co-equal, co-eternal, and consubstantial. Therefore, the concept of the Holy Trinity is often described as three entities in one.

Conclude: VERIFIED

### Refutation

STEP 1: Extract the specific entity, term, or value that directly answers the question:  
Extracted Answer: Three

STEP 2: Evaluate if the extracted answer is definitively FALSE:  
1. The doctrine of the Holy Trinity in Christianity traditionally holds that God is one entity in three persons: the Father, the Son, and the Holy Spirit.  
2. There are no temporal or contextual factors that change this understanding within mainstream Christian theology.  
3. The question asks whether the Holy Trinity is one entity or three entities, and the extracted answer "Three" suggests three separate entities, which contradicts the traditional Christian doctrine of one entity in three persons.

Reasoning: The extracted answer "Three" suggests a misunderstanding of the traditional Christian doctrine of the Trinity, which is one entity in three persons, not three separate entities.

Conclude EXACTLY with either:  
"REFUTED"

### Evaluation: b
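
Across the seven runs above, the bilateral evaluation letter tracks the pair of verdicts that close the verification and refutation transcripts: VERIFIED with CANNOT REFUTE gives `t`, CANNOT VERIFY with REFUTED gives `f`, CANNOT VERIFY with CANNOT REFUTE gives `n`, and VERIFIED with REFUTED gives `b`. The sketch below reproduces that pattern. It is not the code of `factuality_evaluator_rs_sf`, whose internals are not shown in this notebook; `last_verdict` and `combine` are hypothetical helper names, and reading `n` as "neither" and `b` as "both" is an assumption.

```python
import re

# Verdict phrases exactly as they appear in the transcripts above.
VERDICT_PATTERN = re.compile(r"CANNOT VERIFY|CANNOT REFUTE|VERIFIED|REFUTED")


def last_verdict(transcript: str) -> str:
    """Return the last verdict phrase in a transcript (hypothetical helper).

    Taking the last match tolerates transcripts that echo the prompt
    ('Conclude EXACTLY with either: ...') before stating the verdict.
    It would misread a transcript that echoes both options after the verdict.
    """
    matches = VERDICT_PATTERN.findall(transcript)
    if not matches:
        raise ValueError("no verdict phrase found in transcript")
    return matches[-1]


def combine(verification: str, refutation: str) -> str:
    """Map a (verification, refutation) verdict pair to t, f, n, or b.

    Assumed reading: t = verified only, f = refuted only,
    n = neither verified nor refuted, b = both verified and refuted.
    """
    verified = verification == "VERIFIED"
    refuted = refutation == "REFUTED"
    if verified and refuted:
        return "b"
    if verified:
        return "t"
    if refuted:
        return "f"
    return "n"


# Usage: closing lines taken from the Holy Trinity transcripts above.
verification_tail = "Conclude: VERIFIED"
refutation_tail = 'Conclude EXACTLY with either:\n"REFUTED"'
print(combine(last_verdict(verification_tail), last_verdict(refutation_tail)))  # b
```

Under this reading, the two cases where the bilateral result differs from the unilateral `f` (the Oswald question, `n`, and the Trinity question, `b`) are exactly the ones where the two transcripts do not give mirror-image verdicts.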