# Politifact Evaluation Code

Assumes the dataset is at `datasets/politifact_factcheck_data.jsonl`, relative to this notebook.

NOTE: the file downloads with a `.json` extension rather than `.jsonl`, even though its contents are in JSON Lines format (one record per line). Rename it to match the path above.
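`pd.read_json` decides how to parse from its `lines` argument, not from the file extension. The minimal sketch below writes two made-up records to a throwaway `.json` file and shows that `lines=True` parses them as JSON Lines anyway:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Two illustrative records in JSON Lines format (one JSON object per line)
sample = (
    '{"verdict": "true", "statement": "Example claim A"}\n'
    '{"verdict": "pants-fire", "statement": "Example claim B"}\n'
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "politifact_factcheck_data.json"  # note the .json extension
    path.write_text(sample)
    # lines=True, not the extension, tells pandas to read one record per line
    df = pd.read_json(path, lines=True)

print(df.shape)  # (2, 2)
```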



The Politifact dataset has six `verdict` labels:
* true
* mostly-true
* half-true
* false
* mostly-false
* pants-fire

We map the Politifact labels as follows:
1. `true`, `mostly-true` to `true`
1. `false`, `mostly-false`, `pants-fire` to `false`
1. `half-true` to `mixed`
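The mapping above can be written as a dictionary and checked for completeness before any evaluation runs. The names mirror the ones in the notebook's code cells; the two assertions are an added sanity check, not part of the original notebook:

```python
# The six Politifact verdicts
politifact_labels = ["true", "mostly-true", "half-true", "mostly-false", "false", "pants-fire"]

# Collapse the six verdicts into three evaluation labels
politifact_label_mapping = {
    "true": "true",
    "mostly-true": "true",
    "half-true": "mixed",
    "mostly-false": "false",
    "false": "false",
    "pants-fire": "false",
}

# Every verdict needs a mapping, and every mapping must land in the 3-way label set
assert set(politifact_label_mapping) == set(politifact_labels)
assert set(politifact_label_mapping.values()) == {"true", "mixed", "false"}

print(sorted(set(politifact_label_mapping.values())))  # ['false', 'mixed', 'true']
```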

We map the NewsAgent label `unknown` to `mixed` and keep `true` and `false` as-is.
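The notebook's `translate_label` passes any label other than `unknown` through unchanged, so an unexpected agent label would simply be scored as a wrong prediction. A hypothetical stricter variant (an assumption, not what the notebook does) could fail loudly instead. The accepted set below follows the commented-out `newsagent_label_mapping` in the notebook:

```python
# Hypothetical stricter variant; the notebook's translate_label passes unknown strings through
NEWSAGENT_TO_EVAL = {"true": "true", "false": "false", "unknown": "mixed", "mixed": "mixed"}


def translate_label_strict(label: str) -> str:
    """Map a NewsAgent label to true / mixed / false, raising on anything else."""
    try:
        return NEWSAGENT_TO_EVAL[label]
    except KeyError:
        raise ValueError(f"Unexpected NewsAgent label: {label!r}") from None


print([translate_label_strict(x) for x in ["true", "false", "unknown"]])
# ['true', 'false', 'mixed']
```

Raising early is useful for long runs: a change in the agent's label vocabulary surfaces immediately instead of silently lowering the F1 score.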

### Checking dataset

1. `statement` must be a string.
1. `verdict` must be one of the six Politifact label strings.
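The two checks can also be expressed as vectorized pandas masks instead of a row-wise `apply`. This is a sketch on a small made-up frame, where the last two rows are deliberately invalid:

```python
import pandas as pd

politifact_labels = ["true", "mostly-true", "half-true", "mostly-false", "false", "pants-fire"]

# Toy frame standing in for the real dataset; rows 2 and 3 are deliberately invalid
df = pd.DataFrame({
    "statement": ["Claim one", "Claim two", None, "Claim four"],
    "verdict": ["true", "pants-fire", "false", "maybe"],
})

# Rule 1: statement must be a Python str (None, NaN, and numbers fail)
statement_ok = df["statement"].map(lambda s: isinstance(s, str))
# Rule 2: verdict must be one of the six Politifact labels
verdict_ok = df["verdict"].isin(politifact_labels)

valid = statement_ok & verdict_ok
print(f"Number of valid rows: {valid.sum()}")  # Number of valid rows: 2
print(df[~valid])  # the offending rows
```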

In [1]:
import pandas as pd
from sklearn.metrics import f1_score

dataset = pd.read_json('datasets/politifact_factcheck_data.jsonl', lines=True)
print(dataset.head(3))
print(f"Total rows: {len(dataset)}")
politifact_labels = ['true', 'mostly-true', 'half-true', 'mostly-false', 'false', 'pants-fire']


def check_claim(row):
    """Return True if the row has a string statement and a valid Politifact verdict."""
    ok = True
    if not isinstance(row['statement'], str):
        print(f"Invalid statement: {row['statement']!r}")
        ok = False
    # Membership in politifact_labels also rules out non-string verdicts
    if row['verdict'] not in politifact_labels:
        print(f"Invalid verdict: {row['verdict']!r}")
        ok = False
    return ok


valid_rows = dataset.apply(check_claim, axis=1)

# Count rows with a string statement and one of the six Politifact verdict labels
print(f"Number of valid rows: {valid_rows.sum()}")

       verdict statement_originator  \
0         true         Barack Obama   
1        false           Matt Gaetz   
2  mostly-true         Kelly Ayotte   

                                           statement statement_date  \
0  John McCain opposed bankruptcy protections for...      6/11/2008   
1  "Bennie Thompson actively cheer-led riots in t...       6/7/2022   
2  Says Maggie Hassan was "out of state on 30 day...      5/18/2016   

  statement_source        factchecker factcheck_date  \
0           speech  Adriel Bettelheim      6/16/2008   
1       television        Yacob Reyes      6/13/2022   
2             news     Clay Wirestone      5/27/2016   

                             factcheck_analysis_link  
0  https://www.politifact.com/factchecks/2008/jun...  
1  https://www.politifact.com/factchecks/2022/jun...  
2  https://www.politifact.com/factchecks/2016/may...  
Total rows: 21152
Number of valid rows: 21152


In [2]:
from tqdm import tqdm
from core.processing import process_query


def translate_label(label: str) -> str:
    """Translate a NewsAgent label to the 3-way evaluation label ('unknown' becomes 'mixed')."""
    return 'mixed' if label == 'unknown' else label
# newsagent_label_mapping = {
#     'true': 'true',
#     'unknown': 'mixed',
#     'mixed': 'mixed',
#     'false': 'false',
# }

politifact_label_mapping = {
    'true': 'true',
    'mostly-true': 'true',
    'half-true': 'mixed',
    'mostly-false': 'false',
    'false': 'false',
    'pants-fire': 'false'
}
dataset['label'] = dataset['verdict'].map(politifact_label_mapping)
dataset.head(3)

|   | verdict | statement_originator | statement | statement_date | statement_source | factchecker | factcheck_date | factcheck_analysis_link | label |
|---|---|---|---|---|---|---|---|---|---|
| 0 | true | Barack Obama | John McCain opposed bankruptcy protections for... | 6/11/2008 | speech | Adriel Bettelheim | 6/16/2008 | https://www.politifact.com/factchecks/2008/jun... | true |
| 1 | false | Matt Gaetz | "Bennie Thompson actively cheer-led riots in t... | 6/7/2022 | television | Yacob Reyes | 6/13/2022 | https://www.politifact.com/factchecks/2022/jun... | false |
| 2 | mostly-true | Kelly Ayotte | Says Maggie Hassan was "out of state on 30 day... | 5/18/2016 | news | Clay Wirestone | 5/27/2016 | https://www.politifact.com/factchecks/2016/may... | true |
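`Series.map` with a dict leaves any verdict that is not a key as `NaN`, and that `NaN` would propagate into the `label` column. The validation in the first cell already rules this out for the real data. A check like the following sketch, on a made-up series with one hypothetical misspelled verdict, catches unmapped values and also shows the class balance of the collapsed labels:

```python
import pandas as pd

politifact_label_mapping = {
    "true": "true", "mostly-true": "true", "half-true": "mixed",
    "mostly-false": "false", "false": "false", "pants-fire": "false",
}

# Made-up verdicts; "Mostly True" is a hypothetical misspelling with no mapping
verdicts = pd.Series(["true", "mostly-true", "half-true", "pants-fire", "false", "Mostly True"])
labels = verdicts.map(politifact_label_mapping)

# Series.map returns NaN for keys missing from the dict, so unmapped verdicts show up here
unmapped = verdicts[labels.isna()]
print(unmapped.tolist())  # ['Mostly True']

# Class balance of the collapsed labels (value_counts drops NaN by default)
print(labels.value_counts())
```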


In [3]:
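from typing import Optional

import nest_asyncio

# Patch asyncio so event loops can be nested inside Jupyter's already-running loop
nest_asyncio.apply()


async def evaluate_df(df: pd.DataFrame, builtin_tools: Optional[list[str]] = None, user_tool_kwargs: Optional[list] = None):
    """Run the NewsAgent on every statement in df; return responses, translated labels, and correctness flags."""
    # Use None defaults instead of mutable [] defaults shared across calls
    builtin_tools = builtin_tools if builtin_tools is not None else []
    user_tool_kwargs = user_tool_kwargs if user_tool_kwargs is not None else []

    agent_responses, predicted_labels, correct_prediction = [], [], []
    for i in tqdm(range(len(df))):
        row = df.iloc[i]
        claim = row['statement']
        label = row['label']

        agent_response = await process_query(text=claim, builtin_tools=builtin_tools, user_tool_kwargs=user_tool_kwargs)
        predicted_label = agent_response['final_label']
        predicted_politifact_label = translate_label(predicted_label)

        # Compile results
        agent_responses.append(agent_response)
        predicted_labels.append(predicted_politifact_label)
        correct_prediction.append(predicted_politifact_label == label)
    return agent_responses, predicted_labels, correct_prediction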
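`evaluate_df` awaits the agent one claim at a time, and in the run recorded for 100 rows each claim took about 38 seconds (just over an hour in total). If `process_query` is safe to call concurrently (an assumption: it depends on the tools, API rate limits, and any shared state), a bounded-concurrency variant could look like this sketch. `fake_process_query` is a hypothetical stub standing in for the real agent:

```python
import asyncio


# Hypothetical stand-in for core.processing.process_query: short sleep, fixed label
async def fake_process_query(text: str) -> dict:
    await asyncio.sleep(0.01)
    return {"final_label": "unknown" if "?" in text else "true"}


async def evaluate_concurrently(claims: list[str], max_concurrency: int = 4) -> list[dict]:
    """Run the agent on every claim, at most max_concurrency at a time, preserving input order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(claim: str) -> dict:
        async with semaphore:  # cap the number of in-flight agent calls
            return await fake_process_query(claim)

    # gather returns results in argument order, not completion order
    return await asyncio.gather(*(run_one(c) for c in claims))


responses = asyncio.run(evaluate_concurrently(["Claim A", "Claim B?", "Claim C"]))
print([r["final_label"] for r in responses])  # ['true', 'unknown', 'true']
```

Inside the notebook, `await evaluate_concurrently(...)` can be used directly instead of `asyncio.run`. Keeping `max_concurrency` small limits pressure on the search and Wikipedia backends.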



In [4]:
async def run_evaluation(dataset, builtin_tools, user_tool_kwargs):
    """
    Assumes the dataset has a 'label' column with the labels after the mapping.
    """
    # Evaluate the entire dataset
    df = dataset.copy()
    agent_responses, preds, is_correct_list = await evaluate_df(df, builtin_tools=builtin_tools, user_tool_kwargs=user_tool_kwargs)
    df['agent_response'] = agent_responses
    df['predicted_label'] = preds
    df['is_correct'] = is_correct_list

    # Macro-averaged F1: unweighted mean of the per-label F1 scores (true / mixed / false)
    f1 = f1_score(df['label'], df['predicted_label'], average='macro')
    print(f"F1 score: {f1:.4f}")
    return df, f1
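With `average='macro'`, scikit-learn averages the per-label F1 scores without weighting over every label that occurs in either the gold or the predicted column, so `true`, `mixed`, and `false` count equally regardless of how common each is. A dependency-free sketch of the same computation, on made-up labels:

```python
def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of the per-label F1 scores over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    per_label_f1 = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # F1 = 2PR / (P + R) = 2TP / (2TP + FP + FN); the denominator is > 0 because
        # every label in `labels` occurs in y_true or y_pred
        per_label_f1.append(2 * tp / (2 * tp + fp + fn))
    return sum(per_label_f1) / len(per_label_f1)


y_true = ["true", "true", "false", "mixed", "false"]
y_pred = ["true", "mixed", "false", "mixed", "true"]
print(f"{macro_f1(y_true, y_pred):.4f}")  # 0.6111
```

Here the per-label scores are 0.5 (`true`), 2/3 (`false`), and 2/3 (`mixed`), whose mean is 11/18. `f1_score(y_true, y_pred, average='macro')` gives the same value on these lists.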



Run on the first 100 rows with the calculator, web search, and Wikipedia tools.

In [9]:
mini_df = dataset.iloc[:100].copy()

In [10]:
mini_results, mini_f1 = await run_evaluation(dataset=mini_df, builtin_tools=['calculator', 'web_search', 'wikipedia'], user_tool_kwargs=[])

  0%|          | 0/100 [00:00<?, ?it/s]

Using first search result John McCain 2008 presidential campaign!


  1%|          | 1/100 [00:46<1:16:36, 46.43s/it]

Using first search result January 6 United States Capitol attack!


  3%|▎         | 3/100 [01:41<48:40, 30.11s/it]  

Using first search result Statistics of the COVID-19 pandemic in the United States!
Using first search result Centers for Disease Control and Prevention!


  4%|▍         | 4/100 [02:43<1:08:19, 42.71s/it]

Using first search result Republican Party (United States)!


  5%|▌         | 5/100 [03:13<1:00:27, 38.18s/it]

Using first search result Birthright citizenship in the United States!


  6%|▌         | 6/100 [03:42<55:10, 35.22s/it]  

Using first search result Food, Conservation, and Energy Act of 2008!


  7%|▋         | 7/100 [04:11<51:24, 33.16s/it]

Using first search result Crime statistics!


 10%|█         | 10/100 [06:34<1:01:37, 41.08s/it]

Using first search result Robert Rubin!
Using first search result United States federal budget!


 11%|█         | 11/100 [07:28<1:07:03, 45.21s/it]

Using first search result Joint Comprehensive Plan of Action!


 12%|█▏        | 12/100 [07:53<56:58, 38.85s/it]  

Using first search result Climate change denial!
Using first search result False or misleading statements by Donald Trump!


 16%|█▌        | 16/100 [10:19<51:46, 36.98s/it]  

Using first search result Tampa Bay!


 18%|█▊        | 18/100 [11:29<50:44, 37.13s/it]

Using first search result Internal Revenue Service!
Using first search result List of countries by real GDP growth rate!


 20%|██        | 20/100 [14:45<1:23:25, 62.56s/it]

Using first search result First presidency of Donald Trump!


 21%|██        | 21/100 [15:16<1:09:49, 53.04s/it]

Using first search result Stimulus (economics)!


 22%|██▏       | 22/100 [16:10<1:09:15, 53.28s/it]

Using first search result Postal voting!


 23%|██▎       | 23/100 [16:35<57:42, 44.97s/it]  

Using first search result Spring Education Group!


 26%|██▌       | 26/100 [17:45<38:10, 30.95s/it]

Using first search result Racism in Europe!


 28%|██▊       | 28/100 [18:36<33:28, 27.90s/it]

Using first search result Marco Rubio!


 29%|██▉       | 29/100 [19:02<32:17, 27.29s/it]

Using first search result Political positions of Donald Trump!


 30%|███       | 30/100 [19:23<29:31, 25.31s/it]

Using first search result The Exposé!
Using first search result COVID-19 vaccine!


 31%|███       | 31/100 [20:26<42:02, 36.55s/it]

Using first search result Colonia!
Using first search result United Farm Workers!


 32%|███▏      | 32/100 [21:05<42:30, 37.51s/it]

Using first search result Cancer Research UK!
Using first search result Cancer Research UK!


 35%|███▌      | 35/100 [22:52<34:49, 32.15s/it]

Using first search result Democratic Party (United States)!


 36%|███▌      | 36/100 [23:22<33:24, 31.32s/it]

Using first search result Joe Biden!
Using first search result Presidential Conflicts of Interest Act!


 37%|███▋      | 37/100 [24:14<39:21, 37.48s/it]

Using first search result JB Pritzker!


 38%|███▊      | 38/100 [24:45<36:50, 35.66s/it]

Using first search result Medicaid!


 39%|███▉      | 39/100 [25:16<34:55, 34.36s/it]

Using first search result Tennessee!
Using first search result Frivolous litigation!


 40%|████      | 40/100 [26:06<39:00, 39.01s/it]

Using first search result In God We Trust!


 43%|████▎     | 43/100 [27:17<28:19, 29.82s/it]

Using first search result African Americans!
Using first search result United States incarceration rate!


 45%|████▌     | 45/100 [29:03<38:12, 41.68s/it]

Using first search result United States federal budget!


 48%|████▊     | 48/100 [30:17<26:18, 30.36s/it]

Using first search result Qatar Airways!
Using first search result Delta Air Lines fleet!


 50%|█████     | 50/100 [31:16<24:24, 29.30s/it]

Using first search result September 11 attacks!


 52%|█████▏    | 52/100 [31:51<18:16, 22.85s/it]

Using first search result Deaths in 2025!
Using first search result Rape statistics!


 53%|█████▎    | 53/100 [33:10<31:05, 39.69s/it]

Using first search result Omarosa Manigault Newman!
Using first search result Omarosa Manigault Newman!


 54%|█████▍    | 54/100 [33:58<32:26, 42.31s/it]

Using first search result Africa!


 56%|█████▌    | 56/100 [35:30<33:37, 45.85s/it]

Using first search result Michigan!
Using first search result Water export!


 57%|█████▋    | 57/100 [36:21<33:54, 47.32s/it]

Using first search result Barack Obama!


 58%|█████▊    | 58/100 [36:54<30:07, 43.04s/it]

Using first search result Red states and blue states!


 59%|█████▉    | 59/100 [37:25<26:51, 39.31s/it]

Using first search result Affordable Care Act!


 60%|██████    | 60/100 [38:03<26:01, 39.04s/it]

Using first search result Green Light!


 62%|██████▏   | 62/100 [38:57<20:47, 32.84s/it]

Using first search result Nuclear program of Iran!
Using first search result Nuclear program of Iran!


 64%|██████▍   | 64/100 [40:13<20:14, 33.75s/it]

Using first search result Wisconsin!


 65%|██████▌   | 65/100 [40:47<19:42, 33.79s/it]

Using first search result Merrimack River!
Using first search result Merrimack River!


 66%|██████▌   | 66/100 [41:47<23:43, 41.86s/it]

Using first search result Henry Paulson!
Using first search result Ben Bernanke!


 67%|██████▋   | 67/100 [43:08<29:22, 53.42s/it]

Using first search result Beto O'Rourke!


 69%|██████▉   | 69/100 [43:53<19:38, 38.01s/it]

Using first search result Rape kit!


 71%|███████   | 71/100 [44:48<15:35, 32.27s/it]

Using first search result Ted Cruz!


 72%|███████▏  | 72/100 [45:19<14:54, 31.93s/it]

Using first search result Foreign policy of the Barack Obama administration!


 73%|███████▎  | 73/100 [46:17<17:50, 39.65s/it]

Using first search result Teenage suicide in the United States!
Using first search result Ernest Amory Codman!


 75%|███████▌  | 75/100 [47:48<17:35, 42.23s/it]

Using first search result Rick Santorum!


 77%|███████▋  | 77/100 [48:29<11:41, 30.50s/it]

Using first search result Presidency of Joe Biden!


 78%|███████▊  | 78/100 [49:21<13:28, 36.75s/it]

Using first search result Data plan!
Using first search result Net neutrality in the United States!


 79%|███████▉  | 79/100 [50:14<14:39, 41.88s/it]

Using first search result Scott Walker (politician)!
Using first search result Scott Walker (politician)!


 80%|████████  | 80/100 [51:16<15:55, 47.75s/it]

Using first search result Martin Luther King Jr.!


 81%|████████  | 81/100 [51:38<12:40, 40.02s/it]

Using first search result Majority minority in the United States!


 82%|████████▏ | 82/100 [52:07<11:02, 36.80s/it]

Using first search result Constitution of North Carolina!


 84%|████████▍ | 84/100 [52:58<08:02, 30.14s/it]

Using first search result Heidi Cruz!
Using first search result Heidi Cruz!
Using first search result Goldman Sachs!


 85%|████████▌ | 85/100 [54:11<10:42, 42.86s/it]

Using first search result Mpox!


 86%|████████▌ | 86/100 [54:44<09:17, 39.84s/it]

Using first search result Price of oil!


 87%|████████▋ | 87/100 [55:10<07:46, 35.85s/it]

Using first search result Affordable Care Act!


 89%|████████▉ | 89/100 [56:12<06:03, 33.02s/it]

Using first search result Nancy Pelosi!


 91%|█████████ | 91/100 [56:57<03:59, 26.66s/it]

Using first search result Tax Cuts and Jobs Act!
Using first search result United States debt ceiling!


 93%|█████████▎| 93/100 [58:17<03:44, 32.07s/it]

Using first search result Road traffic safety!
Using first search result Road traffic safety!


 94%|█████████▍| 94/100 [58:53<03:19, 33.31s/it]

Using first search result New antisemitism!


 95%|█████████▌| 95/100 [59:38<03:03, 36.75s/it]

Using first search result List of United States Congresses!


 96%|█████████▌| 96/100 [1:00:25<02:39, 39.89s/it]

Using first search result School meal programs in the United States!


 97%|█████████▋| 97/100 [1:01:01<01:56, 38.70s/it]

Using first search result George Frideric Handel!


 98%|█████████▊| 98/100 [1:01:27<01:10, 35.03s/it]

Using first search result University of Massachusetts Amherst!


 99%|█████████▉| 99/100 [1:02:02<00:35, 35.04s/it]

Using first search result List of unproven methods against COVID-19!
Using first search result Ivermectin!


100%|██████████| 100/100 [1:03:12<00:00, 37.92s/it]

F1 score: 0.2994





In [11]:
print(mini_f1)

0.2994306418219462
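A single macro F1 of 0.2994 does not show where the agent goes wrong, for example whether it over-predicts `mixed`. Using the `label` and `predicted_label` columns that `run_evaluation` adds, `pd.crosstab` builds a confusion matrix. This sketch runs on made-up rows standing in for `mini_results`:

```python
import pandas as pd

# Made-up stand-in for mini_results: gold labels and the agent's translated predictions
results = pd.DataFrame({
    "label":           ["true", "false", "false", "mixed", "true", "false"],
    "predicted_label": ["true", "mixed", "false", "mixed", "mixed", "false"],
})

# Rows are gold labels, columns are predictions; off-diagonal cells are the errors
confusion = pd.crosstab(results["label"], results["predicted_label"])
print(confusion)

accuracy = (results["label"] == results["predicted_label"]).mean()
print(f"Accuracy: {accuracy:.4f}")  # Accuracy: 0.6667
```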


Write out the F1 score and the per-row results.

In [12]:
with open('politifact_f1.txt', 'w') as f:
    f.write(f"F1: {mini_f1:.4f}\n")

mini_results.to_json('politifact_results.jsonl', lines=True, orient='records')
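The results file can be read back for later analysis with the same `lines=True` flag used when writing. This sketch round-trips a small made-up frame through a temporary file. The nested `agent_response` dicts come back as Python dicts, and the exact keys of real agent responses are not assumed here beyond `final_label`:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up stand-in for mini_results with the columns run_evaluation adds
results = pd.DataFrame({
    "statement": ["Claim A", "Claim B"],
    "label": ["true", "false"],
    "predicted_label": ["true", "mixed"],
    "is_correct": [True, False],
    "agent_response": [{"final_label": "true"}, {"final_label": "unknown"}],
})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "politifact_results.jsonl"
    results.to_json(path, lines=True, orient="records")
    # Reading back needs the same lines=True flag used when writing
    reloaded = pd.read_json(path, lines=True)

print(reloaded["is_correct"].mean())      # 0.5
print(reloaded.loc[1, "agent_response"])  # {'final_label': 'unknown'}
```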