# Zero-Shot Republican/Democratic Speech Classification

## Setup/Imports

In [2]:
!pip install sentencepiece
!pip install transformers
!pip install gdown



In [4]:
# Import packages

# For downloading data
import gdown
# For working with JSON files
import json
# For working with LMs
from transformers import T5Tokenizer, T5ForConditionalGeneration
import torch
import numpy as np
import pandas as pd
import random
# For status bars
from tqdm.notebook import tqdm
# To display markdown
from IPython.display import display, Markdown

In [5]:
# device = "cuda"
# model_id = "google/flan-t5-large"
# model_filename_string = 'flan-t5-large'
# model_string = 'FLAN-T5 Large'
# model = T5ForConditionalGeneration.from_pretrained(model_id).to(device)
# tokenizer = T5Tokenizer.from_pretrained(model_id)

## Model Initialization
### Note:
Running on CPU for now. Still need to talk to the professor about the best way to run this.

In [6]:
device = "cpu"  # Run on CPU (the commented-out cell above used "cuda")
model_id = "google/flan-t5-large"
model_filename_string = 'flan-t5-large'
model_string = 'FLAN-T5 Large'

# Load the model and tokenizer, ensuring they are set to use the CPU
model = T5ForConditionalGeneration.from_pretrained(model_id).to(device)
tokenizer = T5Tokenizer.from_pretrained(model_id)
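While the question of where to run the model is open, one option is to choose the device at run time instead of hard-coding it. A minimal sketch, assuming PyTorch is installed as in the imports above; `pick_device` is a hypothetical helper, not part of the original notebook:

```python
def pick_device():
    """Return "cuda" when a GPU is visible to PyTorch, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to place on a GPU
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Using device: {device}")
```

This would stand in for the hard-coded `device = "cpu"` line, with the model still moved via `.to(device)`.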

In [7]:
speeches = pd.read_csv('presidential_speeches.csv')

In [8]:
speeches_dem = speeches[speeches["Party"]=="Democratic"].reset_index(drop=True)
speeches_rep = speeches[speeches["Party"]=="Republican"].reset_index(drop=True)

In [9]:
speeches_dem.iloc[473].Transcript

"UDIENCE MEMBER: We love you, President Obama! Well, you know I love you back. It is a rare honor in this life to follow one of your heroes. And John Lewis is one of my heroes. Now, I have to imagine that when a younger John Lewis woke up that morning 50 years ago and made his way to Brown Chapel, heroics were not on his mind. A day like this was not on his mind. Young folks with bedrolls and backpacks were milling about. Veterans of the movement trained newcomers in the tactics of non violence; the right way to protect yourself when attacked. A doctor described what tear gas does to the body, while marchers scribbled down instructions for contacting their loved ones. The air was thick with doubt, anticipation and fear. And they comforted themselves with the final verse of the final hymn they sung: “No matter what may be the test, God will take care of you; Lean, weary one, upon His breast, God will take care of you.” And then, his knapsack stocked with an apple, a toothbrush, and a bo

In [10]:
speeches_dem.columns

Index(['Date', 'President', 'Party', 'Speech Title', 'Summary', 'Transcript',
       'URL'],
      dtype='object')

In [11]:
sampled_dem = speeches_dem.sample(n=50, random_state=42)  
sampled_rep = speeches_rep.sample(n=50, random_state=42)
combined_sample = pd.concat([sampled_dem, sampled_rep], ignore_index=True)

In [12]:
combined_sample

| | Date | President | Party | Speech Title | Summary | Transcript | URL |
|---|---|---|---|---|---|---|---|
| 0 | 2013-12-04 | Barack Obama | Democratic | Speech on Economic Mobility | President Obama delivers a speech on Economic ... | Thank you. Thank you, everybody. Thank you so ... | https://millercenter.org/the-presidency/presid... |
| 1 | 1839-12-02 | Martin Van Buren | Democratic | Third Annual Message to Congress | Van Buren reestablishes his proposal for total... | Fellow Citizens of the Senate and House of Rep... | https://millercenter.org/the-presidency/presid... |
| 2 | 1887-12-06 | Grover Cleveland | Democratic | Third Annual Message | | To the Congress of the United States: You are ... | https://millercenter.org/the-presidency/presid... |
| 3 | 1856-01-24 | Franklin Pierce | Democratic | Message Regarding Disturbances in Kansas | | Circumstances have occurred to disturb the cou... | https://millercenter.org/the-presidency/presid... |
| 4 | 1858-02-02 | James Buchanan | Democratic | Message to Congress Transmitting the Constitut... | | To the Senate and House of Representatives of ... | https://millercenter.org/the-presidency/presid... |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | 2001-09-21 | George W. Bush | Republican | Address on the U.S. Response to the Attacks of... | President Bush addresses Congress on the US re... | Mr. Speaker, Mr. President Pro Tempore, member... | https://millercenter.org/the-presidency/presid... |
| 96 | 1981-08-03 | Ronald Reagan | Republican | Remarks on the Air Traffic Controllers Strike | President Ronald Reagan speaks about the air t... | The President. This morning at 7 l933 the unio... | https://millercenter.org/the-presidency/presid... |
| 97 | 1988-05-31 | Ronald Reagan | Republican | Address at Moscow State University | President Reagan speaks of specific freedoms i... | President Reagan: Thank you, Rector Logunov, a... | https://millercenter.org/the-presidency/presid... |
| 98 | 1953-12-08 | Dwight D. Eisenhower | Republican | Atoms for Peace | Before the General Assembly of the United Nati... | Madame President, Members of the General Assem... | https://millercenter.org/the-presidency/presid... |


In [13]:
possible_choices = ['Democratic', 'Republican']

In [14]:
def apply_prompt_1(text, possible_choices):
    return f'Which political party does this speech most closely align with?\nSpeech: {text}\nChoices: {possible_choices[0]} or {possible_choices[1]}\nAnswer:'

In [15]:
text = speeches_dem.iloc[473].Transcript
label = "Democratic"

In [16]:
prompted_text = apply_prompt_1(text, possible_choices)

In [17]:
# Tokenize the input (truncated to the tokenizer's maximum length)
inputs = tokenizer(prompted_text, return_tensors='pt', truncation=True)
# Move the input tensors to the selected device
input_ids = inputs.input_ids.to(device)
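One thing to watch: `truncation=True` cuts the tokenized prompt at the tokenizer's maximum length, and because the speech comes before the `Choices:` and `Answer:` lines, a long transcript can push those lines past the cutoff. A rough sketch of one workaround is to shorten the speech before building the prompt. `shorten_speech` and the 300-word budget are assumptions for illustration, and a word count is only a proxy for the token count:

```python
def shorten_speech(text, max_words=300):
    """Keep only the first max_words whitespace-separated words of a speech."""
    words = text.split()
    if len(words) <= max_words:
        return text
    return " ".join(words[:max_words])


def apply_prompt_1(text, possible_choices):
    # Same template as the notebook's prompt
    return (
        "Which political party does this speech most closely align with?\n"
        f"Speech: {text}\n"
        f"Choices: {possible_choices[0]} or {possible_choices[1]}\n"
        "Answer:"
    )


long_speech = "Fellow citizens " * 2000  # stand-in for a long transcript
prompt = apply_prompt_1(shorten_speech(long_speech), ["Democratic", "Republican"])
print(prompt[-60:])  # the choices and answer lines are still at the end
```

Shortening the input this way changes what the model sees, so any accuracy measured with it would not be comparable to figures obtained without it.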

In [18]:
losses_and_targets = []
for target_pretokenized in possible_choices:
    # Tokenize the current label choice
    target = tokenizer(target_pretokenized, return_tensors='pt', truncation=True)
    # Move the target tensor to the selected device
    target_ids = target.input_ids.to(device)
    with torch.no_grad():
        # Run the prompted example through the model and get the loss of the
        # current possible choice
        outputs = model(input_ids, labels=target_ids)
    loss = outputs.loss.item()
    losses_and_targets.append((loss, target_pretokenized))

In [19]:
for loss, target in losses_and_targets:
    print(f'{target}: {loss:.4f}')

Democratic: 0.1710
Republican: 0.8297
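The two losses can be turned into a rough relative preference by applying a softmax to their negatives. This is only a sketch: the losses are per-token averages, so the result is a length-normalized score rather than a calibrated probability, and `losses_to_scores` is a hypothetical helper.

```python
import math


def losses_to_scores(losses_and_targets):
    """Softmax over negative losses: lower loss gives a higher score."""
    weights = {target: math.exp(-loss) for loss, target in losses_and_targets}
    total = sum(weights.values())
    return {target: weight / total for target, weight in weights.items()}


# Losses printed above for the example speech
scores = losses_to_scores([(0.1710, "Democratic"), (0.8297, "Republican")])
for target, score in scores.items():
    print(f"{target}: {score:.3f}")
```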


In [20]:
losses_and_targets.sort()
lowest_loss, best_choice = losses_and_targets[0]
correct_prediction = (best_choice == label)
print(f'The model made a correct prediction: {correct_prediction}')

The model made a correct prediction: True


In [21]:
def classify_example(text, label, possible_choices, verbose):
    """
    This function classifies one example, determining if the model places more
    probability on the right answer.
    Input: 
        an example text that has already been prompted,
        the corresponding label,
        a list of possible choices to evaluate
        a flag for whether to print additional info
    Output: 
        whether the prediction is correct (True if correct, False if incorrect)
    
    """
    # Print the example and label if we're in verbose mode
    if verbose:
        # Format the text with indents so it's easier to read when printed
        indented_text = text.replace("\n", "\n\t")
        print(f'Input text to the model:\n\t{indented_text}')
        print(f'Label: {label}')
    # Tokenize the input (truncated to the tokenizer's maximum length)
    inputs = tokenizer(text, return_tensors='pt', truncation=True)
    # Move the input tensors to the selected device
    input_ids = inputs.input_ids.to(device)
    # Compare the scores of possible targets
    losses_and_targets = []
    for target_pretokenized in possible_choices:
        target = tokenizer(target_pretokenized, return_tensors='pt', truncation=True)
        # Move the target tensor to the selected device
        target_ids = target.input_ids.to(device)
        with torch.no_grad():
            # Run the prompted example through the model
            outputs = model(input_ids, labels=target_ids)
        loss = outputs.loss.item()
        losses_and_targets.append((loss, target_pretokenized))
    # This example was classified correctly if the correct choice has the
    # lowest loss, i.e. the highest average log-likelihood per token
    # (the loss is averaged over tokens so that longer answers aren't penalized)
    losses_and_targets.sort()
    _, best_choice = losses_and_targets[0]
    if best_choice == label:
        correct_prediction = True
        is_correct_text = 'Correct'
    else:
        correct_prediction = False
        is_correct_text = 'Wrong'
    if verbose:
        print(f'{is_correct_text} prediction: {best_choice}\n')
    # Return True if the prediction is correct, False otherwise
    return correct_prediction

In [22]:
correct_prediction = classify_example(prompted_text, label, possible_choices, verbose=True)

Input text to the model:
	Which political party does this speech most closely align with?
	Speech: UDIENCE MEMBER: We love you, President Obama! Well, you know I love you back. It is a rare honor in this life to follow one of your heroes. And John Lewis is one of my heroes. Now, I have to imagine that when a younger John Lewis woke up that morning 50 years ago and made his way to Brown Chapel, heroics were not on his mind. A day like this was not on his mind. Young folks with bedrolls and backpacks were milling about. Veterans of the movement trained newcomers in the tactics of non violence; the right way to protect yourself when attacked. A doctor described what tear gas does to the body, while marchers scribbled down instructions for contacting their loved ones. The air was thick with doubt, anticipation and fear. And they comforted themselves with the final verse of the final hymn they sung: “No matter what may be the test, God will take care of you; Lean, weary one, upon His breast

Correct prediction: Democratic



In [23]:
def classify_dataset(prompted_examples, labels, possible_choices, verbose=False):
    """
    This function takes in a whole dataset of prompted examples with labels
    and returns the accuracy.
    """
    num_examples = len(prompted_examples)
    correct_predictions = [] # 0 = incorrect, 1 = correct
    for i in tqdm(range(num_examples)):
        prompted_example = prompted_examples[i]
        label = labels[i]
        # Print the first five examples: this will be true if we are at
        # the first five examples and the verbose argument was set to True
        verbose_example = (i < 5) and verbose
        correct_prediction = classify_example(prompted_example, label,
                                              possible_choices, verbose_example)
        # Convert true/false into an integer
        # (so we can easily get the percentage that are true)
        correct_predictions.append(int(correct_prediction))
    accuracy = sum(correct_predictions) / len(correct_predictions)
    return accuracy
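Overall accuracy can hide a model that mostly predicts one label. A small sketch of a per-label breakdown, assuming the per-example correctness flags are kept alongside the labels (the function above only returns their mean); `per_label_accuracy` is a hypothetical helper:

```python
from collections import defaultdict


def per_label_accuracy(labels, correct_predictions):
    """Return {label: fraction of examples with that label predicted correctly}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for label, correct in zip(labels, correct_predictions):
        totals[label] += 1
        hits[label] += int(correct)
    return {label: hits[label] / totals[label] for label in totals}


# Toy flags for illustration only, not results from the notebook
toy_labels = ["Democratic", "Democratic", "Republican", "Republican"]
toy_correct = [1, 0, 1, 1]
print(per_label_accuracy(toy_labels, toy_correct))
```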

In [24]:
test_texts = combined_sample["Transcript"]
# Take the labels from the Party column so they stay aligned with the texts
test_labels = combined_sample["Party"].tolist()

In [25]:
# First prompt the examples
test_texts_prompt_1 = [apply_prompt_1(t, possible_choices) for t in test_texts]
display(Markdown('**Prompt 1:**'))
# Then evaluate
accuracy = classify_dataset(test_texts_prompt_1, test_labels,
                                       possible_choices, verbose=True)
# Print the accuracy
display(Markdown(f'**Prompt 1 accuracy: {accuracy*100:.2f}%**'))

**Prompt 1:**


Input text to the model:
	Which political party does this speech most closely align with?
	Speech: Thank you. Thank you, everybody. Thank you so much. Please, please have a seat. Thank you so much. Well, thank you, Neera, for the wonderful introduction and sharing a story that resonated with me. There were a lot of parallels in my life and probably resonated with some of you. Over the past 10 years, the Center for American Progress has done incredible work to shape the debate over expanding opportunity for all Americans. And I could not be more grateful to CAP not only for giving me a lot of good policy ideas, but also giving me a lot of staff. ( Laughter. ) My friend, John Podesta, ran my transition; my Chief of Staff, Denis McDonough, did a stint at CAP. So you guys are obviously doing a good job training folks. I also want to thank all the members of Congress and my administration who are here today for the wonderful work that they do. I want to thank Mayor Gray and everyone here at

Correct prediction: Democratic

Input text to the model:
	Which political party does this speech most closely align with?
	Speech: Fellow Citizens of the Senate and House of Representatives: I regret that I can not on this occasion congratulate you that the past year has been one of unalloyed prosperity. The ravages of fire and disease have painfully afflicted otherwise flourishing portions of our country, and serious embarrassments yet derange the trade of many of our cities. But notwithstanding these adverse circumstances, that general prosperity which has been heretofore so bountifully bestowed upon us by the Author of All Good still continues to call for our warmest gratitude. Especially have we reason to rejoice in the exuberant harvests which have lavishly recompensed well directed industry and given to it that sure reward which is vainly sought in visionary speculations. I can not, indeed, view without peculiar satisfaction the evidences afforded by the past season of the benefi

Correct prediction: Democratic

Input text to the model:
	Which political party does this speech most closely align with?
	Speech: To the Congress of the United States: You are confronted at the threshold of your legislative duties with a condition of the national finances which imperatively demands immediate and careful consideration. The amount of money annually exacted, through the operation of present laws, from the industries and necessities of the people largely exceeds the sum necessary to meet the expenses of the Government. When we consider that the theory of our institutions guarantees to every citizen the full enjoyment of all the fruits of his industry and enterprise, with only such deduction as may be his share toward the careful and economical maintenance of the Government which protects him, it is plain that the exaction of more than this is indefensible extortion and a culpable betrayal of American fairness and justice. This wrong inflicted upon those who bear the burde

Wrong prediction: Republican

Input text to the model:
	Which political party does this speech most closely align with?
	Speech: Circumstances have occurred to disturb the course of governmental organization in the Territory of Kansas and produce there a condition of things which renders it incumbent on me to call your attention to the subject and urgently to recommend the adoption by you of such measures of legislation as the grave exigencies of the case appear to require. A brief exposition of the circumstances referred to and of their causes will be necessary to the full understanding of the recommendations which it is proposed to submit. The act to organize the Territories of Nebraska and Kansas was a manifestation of the legislative opinion of Congress on two great points of constitutional construction: One, that the designation of the boundaries of a new Territory and provision for its political organization and administration as a Territory are measures which of right fall withi

Wrong prediction: Republican

Input text to the model:
	Which political party does this speech most closely align with?
	Choices: Democratic or Republican
	Answer:
Label: Democratic


Wrong prediction: Republican




**Prompt 1 accuracy: 52.00%**
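With only 100 examples, it helps to put an uncertainty range on this figure. A back-of-the-envelope sketch using the normal-approximation (Wald) interval, which is an assumption of this example rather than something the notebook computes; `wald_interval` is a hypothetical helper:

```python
import math


def wald_interval(num_correct, num_examples, z=1.96):
    """Approximate 95% confidence interval for an accuracy estimate."""
    p = num_correct / num_examples
    half_width = z * math.sqrt(p * (1 - p) / num_examples)
    return max(0.0, p - half_width), min(1.0, p + half_width)


# 52.00% accuracy on 100 examples corresponds to 52 correct predictions
low, high = wald_interval(52, 100)
print(f"95% interval: {low*100:.1f}% to {high*100:.1f}%")
covers_chance = low <= 0.5 <= high
print(f"Interval includes 50% (chance for two balanced classes): {covers_chance}")
```

If the interval includes 50%, the result cannot be distinguished from guessing on this balanced sample.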