In [65]:
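import json

import nltk
import numpy as np
from evaluate import load
from transformers import AutoTokenizer

# Punkt sentence tokenizer models, needed by nltk.sent_tokenize below
nltk.download('punkt')
# ROUGE metric from the Hugging Face `evaluate` library
metric = load("rouge")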

[nltk_data] Downloading package punkt to /Users/andrew/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


In [6]:
gpt35 = []
with open("../data/gpt-3.5-test-output.jsonl") as f:
    for line in f:
        gpt35.append(json.loads(line))

In [16]:
gpt4 = []
with open("../data/gpt-4-test-output.jsonl") as f:
    for line in f:
        gpt4.append(json.loads(line))
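
The two loading cells above differ only in the file path, so they could share one helper. A minimal sketch, assuming UTF-8 files with one JSON value per line; the name `load_jsonl` and the blank-line skipping are illustrative additions, not part of the original notebook:

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Read a JSON Lines file into a list, one parsed value per non-empty line."""
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing empty line at end of file
                records.append(json.loads(line))
    return records
```

With this helper, the cells above would reduce to `gpt35 = load_jsonl("../data/gpt-3.5-test-output.jsonl")` and the equivalent call for the GPT-4 file.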

In [17]:
len(gpt35), len(gpt4)

(602, 602)

In [33]:
gpt35[0][2]

{'summary': "Democratic presidential candidate Joe Biden has refused once more to answer questions on whether he would pack the Supreme Court if he wins in November. His evasion took place during campaign stops in Ohio. He did, however, hint that he is not in favour of court-packing and disagreed with Trump’s recent appointment of a Supreme Court nominee ahead of the election. Despite avoiding a definitive answer, Biden emphasised that the focus should be on Trump's current actions with the court, instead of hypothetical ones that he might take if elected.\n",
 'article': 'Biden says he\'s \'not a fan\' of court packing\nDemocratic presidential nominee Joe Biden said Monday that he is "not a fan" of the idea of adding seats to the Supreme Court after repeatedly dodging questions about the issue.\n"I\'m not a fan of court packing, but I don\'t want to get off on that whole issue. I want to keep focused," Biden told WKRC, a Cincinnati-area CBS/CW affiliate. "The president would like noth

In [34]:
preds35 = []
labels35 = []
preds35_filtered = []
labels35_filtered = []
for item in gpt35:
    pred_summary = item[1]["choices"][0]["message"]["content"]
    true_summary = item[2]['summary']
    preds35.append(pred_summary)
    labels35.append(true_summary)
    if item[2]['article_bias'] != item[2]['summary_bias']:
        preds35_filtered.append(pred_summary)
        labels35_filtered.append(true_summary)

In [37]:
preds4 = []
labels4 = []
preds4_filtered = []
labels4_filtered = []
for item in gpt4:
    pred_summary = item[1]["choices"][0]["message"]["content"]
    true_summary = item[2]['summary']
    preds4.append(pred_summary)
    labels4.append(true_summary)
    if item[2]['article_bias'] != item[2]['summary_bias']:
        preds4_filtered.append(pred_summary)
        labels4_filtered.append(true_summary)
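
The GPT-3.5 and GPT-4 cells above run the same loop on different inputs. A sketch of how that loop might be factored out, assuming every item has the layout used above (element 1 is the chat-completion response, element 2 is the dataset example); `collect_pairs` is a hypothetical name:

```python
def collect_pairs(items):
    """Return (preds, labels, preds_filtered, labels_filtered) for one model's outputs."""
    preds, labels, preds_filtered, labels_filtered = [], [], [], []
    for item in items:
        pred_summary = item[1]["choices"][0]["message"]["content"]
        example = item[2]
        preds.append(pred_summary)
        labels.append(example["summary"])
        # The filtered lists keep only pairs whose article and summary bias differ
        if example["article_bias"] != example["summary_bias"]:
            preds_filtered.append(pred_summary)
            labels_filtered.append(example["summary"])
    return preds, labels, preds_filtered, labels_filtered
```

Usage would look like `preds35, labels35, preds35_filtered, labels35_filtered = collect_pairs(gpt35)`, which keeps the two models' preprocessing identical by construction.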

In [66]:
tokenizer = AutoTokenizer.from_pretrained('facebook/bart-large-cnn')

In [72]:
def compute_metrics(predictions, labels):
    """Return aggregated ROUGE scores (as percentages) and mean prediction length."""
    # rougeLsum expects sentences separated by newlines
    decoded_preds = [
        "\n".join(nltk.sent_tokenize(pred.strip())) for pred in predictions
    ]
    decoded_labels = [
        "\n".join(nltk.sent_tokenize(label.strip())) for label in labels
    ]

    # use_aggregator=True returns one score per ROUGE variant for the whole
    # set; with use_aggregator=False the metric returns one score per example.
    result = metric.compute(
        predictions=decoded_preds,
        references=decoded_labels,
        use_stemmer=True,
        use_aggregator=True,
    )
    # Scale the scores from [0, 1] to percentages
    result = {key: value * 100 for key, value in result.items()}

    # Add the mean prediction length, counted in BART tokens
    # (tokenizer.encode includes the special tokens)
    prediction_lens = [
        len(tokenizer.encode(pred)) for pred in predictions
    ]
    result["gen_len"] = np.mean(prediction_lens)

    return {k: round(v, 4) for k, v in result.items()}

In [73]:
compute_metrics(preds35, labels35)

{'rouge1': 37.8725,
 'rouge2': 10.4547,
 'rougeL': 21.4027,
 'rougeLsum': 33.3545,
 'gen_len': 132.99}

In [74]:
compute_metrics(preds4, labels4)

{'rouge1': 26.8912,
 'rouge2': 7.3055,
 'rougeL': 14.8793,
 'rougeLsum': 23.8015,
 'gen_len': 410.8339}

In [75]:
compute_metrics(preds35_filtered, labels35_filtered)

{'rouge1': 34.9019,
 'rouge2': 7.8862,
 'rougeL': 19.1026,
 'rougeLsum': 30.6104,
 'gen_len': 131.9974}

In [76]:
compute_metrics(preds4_filtered, labels4_filtered)

{'rouge1': 25.3446,
 'rouge2': 6.1122,
 'rougeL': 13.8997,
 'rougeLsum': 22.4193,
 'gen_len': 409.9136}
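
The GPT-4 predictions are roughly three times longer than the GPT-3.5 ones (`gen_len` of about 410 versus 133) and score lower on every ROUGE variant. One possible factor, which this notebook does not isolate, is that the ROUGE F-measure penalizes long predictions through precision. The sketch below illustrates that effect with a simplified unigram overlap; it is not the `rouge_score` implementation (lowercased whitespace tokens, no stemming), and `rouge1_sketch` and the toy sentences are made up for illustration:

```python
from collections import Counter


def rouge1_sketch(prediction, reference):
    """Simplified ROUGE-1: clipped unigram overlap on lowercased whitespace tokens."""
    pred_counts = Counter(prediction.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared token
    overlap = sum((pred_counts & ref_counts).values())
    precision = overlap / max(sum(pred_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


reference = "biden is not a fan of court packing"
short_pred = "biden is not a fan of court packing"
long_pred = short_pred + " and here are many extra words about other topics entirely"

print(rouge1_sketch(short_pred, reference))
print(rouge1_sketch(long_pred, reference))
```

Both toy predictions cover the whole reference, so recall is the same, but the padded prediction has lower precision and therefore a lower F1. Checking whether this explains the gap above would require comparing the precision and recall components on the real data.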

In [43]:
preds35[0]

'Joe Biden has stated that he is "not a fan" of court packing, but believes that Republicans\' push to confirm Judge Amy Coney Barrett to the Supreme Court before the election is a form of court packing. He has also described questions about his views on court packing as a distraction and emphasized the importance of investing in Ohio in the upcoming election. From the perspective of the political right, this may be seen as a shift in position on court packing and an attempt to avoid addressing the issue directly.'

In [44]:
labels35[0]

"Democratic presidential candidate Joe Biden has refused once more to answer questions on whether he would pack the Supreme Court if he wins in November. His evasion took place during campaign stops in Ohio. He did, however, hint that he is not in favour of court-packing and disagreed with Trump’s recent appointment of a Supreme Court nominee ahead of the election. Despite avoiding a definitive answer, Biden emphasised that the focus should be on Trump's current actions with the court, instead of hypothetical ones that he might take if elected.\n"

In [28]:
preds4[0]

"From the perspective of the political left, Senator John McCain's comments on Russia's actions regarding Syria reflect skepticism about the intentions and commitment of the Russian government in resolving the Syrian conflict, particularly in relation to the use of chemical weapons. Progressives might agree with McCain's assessment that Russia's reluctance to enforce the agreement with the threat of force undermines the potential success of the disarmament deal. They may also share his concern about Russia's continued support for the Assad regime, which complicates the situation and hinders efforts toward a peaceful resolution and protection of human rights in Syria.\n\nHowever, the left might diverge from McCain on the means to address the conflict. While McCain advocated for arming moderate rebel groups, those on the political left may argue for non-military approaches and diplomatic solutions to avoid further escalation of violence. They may emphasize the need for cautious engagemen

In [29]:
labels4[0]

"Senator John McCain has expressed skepticism towards Russia's sincerity in its deal with the U.S. to help remove chemical weapons from Syria. He believes that the agreement will be hard to enforce without the threat of force, emphasizing that Russia refuses to agree to the use of force, regardless of Bashar Assad's actions. McCain is also critical of Russia for not attributing blame for the August 21st chemical weapons attack and for continuing to arm Assad’s regime. He voiced his belief that Russia is not maintaining seriousness on the issue, while it continues to facilitate the provision of weapons to Syria. McCain also stated that if he were president, he would have increased support for the Free Syrian Army and provided weapons to moderate rebel fighters.\n"

# Match GPT-3.5 outputs to GPT-4 outputs

In [45]:
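results = []
for item in gpt35:
    example = item[2]
    gpt35_sum = item[1]["choices"][0]["message"]["content"]
    # Reset for every example so that a missing match yields None
    # instead of silently reusing the previous example's GPT-4 summary
    gpt4_sum = None
    for item2 in gpt4:
        if item2[2] == example:
            gpt4_sum = item2[1]["choices"][0]["message"]["content"]
    # Note: this adds keys to the loaded example in place, so re-running
    # the cell without reloading the data will no longer find matches
    example["gpt4"] = gpt4_sum
    example["gpt3.5"] = gpt35_sum
    results.append(example)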
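
The nested loop above compares every GPT-3.5 example against every GPT-4 example. If each example's `id` field (the one printed by `display_result`) is unique, which is an assumption not verified here, the same pairing could be done with a dictionary lookup. A sketch under that assumption; `match_by_id` is a hypothetical name:

```python
def match_by_id(gpt35_items, gpt4_items):
    """Pair both models' summaries per example, assuming unique example ids."""
    gpt4_by_id = {
        item[2]["id"]: item[1]["choices"][0]["message"]["content"]
        for item in gpt4_items
    }
    matched = []
    for item in gpt35_items:
        example = dict(item[2])  # shallow copy so the loaded data is not mutated
        example["gpt3.5"] = item[1]["choices"][0]["message"]["content"]
        # None marks an example with no GPT-4 counterpart
        example["gpt4"] = gpt4_by_id.get(example["id"])
        matched.append(example)
    return matched
```

Because the examples are copied rather than modified in place, `results = match_by_id(gpt35, gpt4)` can be re-run without reloading the data.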

In [62]:
def display_result(item):
    print(f"Article Bias {item['article_bias']}, Summary Bias {item['summary_bias']}, id {item['id']}\n")
    print(f"Summary: {item['summary']}\n")
    print("GPT4:", item['gpt4'], '\n')
    print("GPT3.5:", item['gpt3.5'], '\n')

In [63]:
display_result(results[4])

Article Bias left, Summary Bias right, id 4305

Summary: Thirty-four US soldiers suffered traumatic brain injuries from an Iranian missile attack on Al Asad Air Base in Iraq back in early January, according to the Pentagon. Notably, the figure is higher than the 11 initially reported to have been treated for concussion symptoms following the incident. The missile attack came after escalated tensions between the US and Iran following a US airstrike that killed a high-ranking Iranian general. 17 of the affected troops are still under medical observation, per Pentagon spokesman Jonathan Hoffman.

GPT4: From a political right perspective, the focus might be on the fact that injuries sustained in combat zones can often take time to manifest and that the Pentagon and Trump administration are actively monitoring and addressing the health of the troops. The increase in numbers from the initial report is seen not as a coverup or downplaying but as an example of the evolving nature of the medica

In [83]:
display_result(results[11])

Article Bias left, Summary Bias right, id 5010

Summary: Seventeen Republican congressmen and Justin Amash voted against the House resolution condemning QAnon, a conspiracy theorist group. Rep. Buddy Carter of Georgia was one of those who voted against the resolution but later explained that he did so unintentionally. Some of the representatives believe that voting against the resolution was a means to avoid giving QAnon any legitimacy. In addition, some emphasize that Congress should focus on more pressing issues like addressing city violence and providing relief to small businesses and working families amid the pandemic. President Trump had previously been criticized for not directly condemning QAnon.

GPT4: From a political right perspective, the hesitation or refusal by some Republicans to condemn QAnon through a resolution can be attributed to a number of reasons that conservatives might argue:

1. **Free Speech Concerns**: Some on the political right may argue that condemning a p