## Generate Political Speeches using GPT2

### Intro

Thin wrapper classes make it easy to train GPT-2 models on speeches by US presidents. The notebook runs smoothly on the Pro version of Google Colab with a CUDA GPU.

In this experiment, the following setup is chosen:

- Use the pretrained GPT-2 model provided by Hugging Face
- Train on all presidents first to learn **political vocabulary**
- Fine-tune to a specific president

In [None]:
import pandas as pd
import GptPresidential as President

DATA_PATH = "..."

### Get Data Loaders

Build data loaders for the following presidents:

- Trump
- Obama
- Clinton

In [None]:
# load data object
dt = President.GetData(data_path=DATA_PATH)

# get iterables of all speeches after 1960
all_loader = dt.get_dataloader_of(after_year=1960)

# get iterables of specific presidents
trump_loader = dt.get_dataloader_of("trump")
obama_loader = dt.get_dataloader_of("obama")
clinton_loader = dt.get_dataloader_of("clinton")
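
The loader calls above select speeches by president name or by year. How `GetData` does this internally is not shown in this notebook. The following is a minimal sketch of that kind of filtering, assuming each speech is a record with hypothetical `president`, `year`, and `text` fields. `Speech` and `select_speeches` are illustrative names and not part of `GptPresidential`, and treating `after_year` as a strict bound is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Speech:
    # hypothetical record layout, not taken from GptPresidential
    president: str
    year: int
    text: str


def select_speeches(speeches: List[Speech],
                    president: Optional[str] = None,
                    after_year: Optional[int] = None) -> List[Speech]:
    """Keep speeches by the given president that are dated strictly after after_year."""
    selected = []
    for speech in speeches:
        # case-insensitive match on the president name, if one was requested
        if president is not None and speech.president.lower() != president.lower():
            continue
        # assumption: "after 1960" excludes 1960 itself
        if after_year is not None and speech.year <= after_year:
            continue
        selected.append(speech)
    return selected


# toy corpus with placeholder texts, for illustration only
corpus = [
    Speech("Kennedy", 1961, "placeholder text A"),
    Speech("Eisenhower", 1957, "placeholder text B"),
    Speech("Obama", 2009, "placeholder text C"),
]
recent = select_speeches(corpus, after_year=1960)
obama_only = select_speeches(corpus, president="obama")
```

The selected texts would then be tokenized and batched before training; that step is left out here because it depends on the wrapper's internals.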

### Download GPT2 model

1. Train on all presidents
2. Copy the model to fine-tune it for each president

In [None]:
# download gpt2 model
mod = President.PresidentModel()

# if the error "No CUDA GPUs are available" appears on Google Colab:
# go to Runtime -> Change runtime type and select a GPU hardware accelerator

In [None]:
# train on all presidents
params = dict(
    epochs=1,
    learning_rate=2e-4,
    epsilon=1e-07
)
mod.fine_tune(all_loader, **params)

# possible parameters
# epochs
# learning_rate
# epsilon
# warmup_steps
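
Of the parameters listed above, `warmup_steps` is the least self-explanatory. In many GPT-2 fine-tuning setups it controls a linear learning-rate warmup followed by a linear decay. Whether `fine_tune` uses exactly this schedule is an assumption; the sketch below only illustrates the shape of such a schedule. The values for `warmup_steps` and `total_steps` are illustrative and do not come from the notebook.

```python
def linear_warmup_factor(step: int, warmup_steps: int, total_steps: int) -> float:
    """Multiplier applied to the base learning rate at a given optimizer step."""
    if step < warmup_steps:
        # ramp up linearly from 0 to 1 during warmup
        return step / max(1, warmup_steps)
    # afterwards decay linearly from 1 down to 0 at total_steps
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


base_lr = 2e-4       # learning_rate used for training on all presidents
warmup_steps = 100   # illustrative value, not from the notebook
total_steps = 1000   # illustrative value, not from the notebook

for step in (0, 50, 100, 550, 1000):
    lr = base_lr * linear_warmup_factor(step, warmup_steps, total_steps)
    print(f"step {step:4d}: lr = {lr:.2e}")
```

Starting with a small learning rate keeps the first updates from moving the pretrained weights too far before the optimizer statistics have settled.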

In [None]:
# copy the model trained on all presidents, one copy per president
mod_trump = mod.copy()
mod_obama = mod.copy()
mod_clinton = mod.copy()
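
Copying matters here because each president's model must start from the same base weights and then diverge independently. The sketch below uses a toy class to show why a deep copy is needed for that. It assumes that `PresidentModel.copy()` behaves like `copy.deepcopy`, which this notebook does not confirm; `ToyModel` is a hypothetical stand-in.

```python
import copy


class ToyModel:
    """Stand-in for a model: holds a mutable list of weights."""

    def __init__(self, weights):
        self.weights = list(weights)

    def copy(self):
        # deep copy so that nested mutable state is duplicated, not shared
        return copy.deepcopy(self)

    def fine_tune(self, shift):
        # pretend training: modify the weights in place
        for i in range(len(self.weights)):
            self.weights[i] += shift


base = ToyModel([0.0, 1.0])
variant_a = base.copy()
variant_b = base.copy()
variant_a.fine_tune(0.5)
variant_b.fine_tune(-0.5)

# the base model is untouched and the two variants have diverged
print(base.weights, variant_a.weights, variant_b.weights)
```

With a plain assignment such as `variant_a = base`, all three names would refer to the same object and every fine-tuning run would overwrite the previous one.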

In [None]:
# fine-tune each copy on the speeches of its president
params = dict(
    epochs=1,
    learning_rate=5e-4
)
mod_trump.fine_tune(trump_loader, **params)
mod_obama.fine_tune(obama_loader, **params)
mod_clinton.fine_tune(clinton_loader, **params)

### Generate Outputs

1. Use the same prompt for each president.
2. Compare real speeches to generated speeches using the same seed.

In [None]:
# generate three continuations of the same prompt with each model
prompt = "Our most critical issue is"
models = dict(all=mod, trump=mod_trump, obama=mod_obama, clinton=mod_clinton)
temp = []
for name, model in models.items():
    print(f"\n\n------------------ {name.upper()} ------------------")
    gen = model.generate_sentences(prompt, max_length=200, num_sentences=3)
    temp.append(dict(president=name, prompt=prompt, gen=gen))


In [None]:
# real texts: the first three are from Trump, the last three from Obama
seed_sentences = [
  """I know your pain. I know you’re hurt. We had an election that was stolen from us. It was a landslide election, and everyone knows it, especially the other side, but you have to go home now. We have to have peace. We have to have law and order. We have to respect our great people in law and order. We don’t want anybody hurt. It’s a very tough period of time. There’s never been a time like this where such a thing happened, where they could take it away from all of us, from me, from you, from our country. This was a fraudulent election, but we can’t play into the hands of these people. We have to have peace. So go home. We love you. You’re very special. You’ve seen what happens. You see the way others are treated that are so bad and so evil. I know how you feel. But go home and go home at peace.""",
  """I want to thank the first lady, my entire family, and Vice President Pence, Mrs. Pence for being with us all through this. And we were getting ready for a big celebration. We were winning everything and all of a sudden it was just called off. The results tonight have been phenomenal and we are getting ready… I mean, literally we were just all set to get outside and just celebrate something that was so beautiful, so good. Such a vote, such a success to citizens of this country have come out in record numbers. This is a record. There’s never been anything like it to support our incredible movement. We won states that we weren’t expected to win. Florida, we didn’t win it. We won it by a lot.""",
  """It’s essential that Congress fund another 10,000 ICE officers, and we’re asking for that, so that we can eliminate MS-13 and root out the criminal cartels from our country. Now, we’re getting them out anyway, but we’d like to get them out a lot faster. And when you see these towns, and when you see these thugs being thrown into the back of a paddy wagon, you just see them thrown in rough. I said, “Please don’t be too nice.” Like when you guys put somebody in the car and you’re protecting their head, you know, the way you put the hand over, like don’t hit their head. And they’ve just killed somebody. Don’t hit their head. I said, “You can take the hand away, okay?”""",
  """Few challenges facing America -- and the world -- are more urgent than combating climate change. The science is beyond dispute and the facts are clear. Sea levels are rising. Coastlines are shrinking. We’ve seen record drought, spreading famine, and storms that are growing stronger with each passing hurricane season.""",
  """We remember with reverence the lives we lost. We read their names. We press their photos to our hearts. And on this day that marks their death, we recall the beauty and meaning of their lives; men and women and children of every color and every creed, from across our nation and from more than 100 others. They were innocent. Harming no one, they went about their daily lives. Gone in a horrible instant, they now dwell in the House of the Lord forever.""",
  """Good afternoon, everybody.  Let me start out by saying that I was sorely tempted to wear a tan suit today for my last press conference.  But Michelle, whose fashion sense is a little better than mine, tells me that's not appropriate in January."""
]

# opening words of each real text, used as prompts
seed_texts = [
  "I know your pain. I",
  "I want to thank the",
  "It’s essential that Congress fund",
  "Few challenges facing America and",
  "We remember with reverence the",
  "Good afternoon, everybody.  Let me"
]

results = []

# seeds 0-2 go to the Trump model, seeds 3-5 to the Obama model
for president, model, indices in [("trump", mod_trump, range(3)),
                                  ("obama", mod_obama, range(3, 6))]:
  for i in indices:
    results.append(dict(president=president,
                        seed_text=seed_sentences[i],
                        seed_text_start=seed_texts[i],
                        generated_text=model.generate_sentences(seed_texts[i],
                                                                max_length=200,
                                                                num_sentences=1)))
results = pd.DataFrame(results)
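
The resulting table pairs each real passage with a generated continuation, but the notebook does not define how to compare them. One simple option, offered here only as a sketch, is the Jaccard overlap of the two vocabularies. `tokenize` and `vocabulary_overlap` are hypothetical helpers, and the `generated` string below is a made-up example rather than model output.

```python
import re


def tokenize(text: str) -> set:
    """Lowercase the text and return its set of word tokens."""
    # the character class includes straight and curly apostrophes
    return set(re.findall(r"[a-z'’]+", text.lower()))


def vocabulary_overlap(real_text: str, generated_text: str) -> float:
    """Jaccard similarity between the word sets of two texts (0.0 to 1.0)."""
    real, generated = tokenize(real_text), tokenize(generated_text)
    if not real and not generated:
        return 0.0
    return len(real & generated) / len(real | generated)


real = "We have to have peace. So go home."
generated = "We have to go home now."  # made-up string, not model output
score = vocabulary_overlap(real, generated)
print(f"vocabulary overlap: {score:.3f}")
```

To apply this to `results`, the function could be called on each pair of `seed_text` and `generated_text`, assuming `generate_sentences` returns a plain string; if it returns a list, the first element would have to be extracted first. Vocabulary overlap ignores word order and fluency, so it is only a rough signal of stylistic similarity.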