### Welcome! This notebook describes a method of generating inferential decompositions from text as introduced in our EMNLP 2023 paper: [Natural Language Decompositions of Implicit Content Enable Better Text Representations](https://arxiv.org/pdf/2305.14583.pdf)!

We will guide you through the process step-by-step, and provide explanations and code snippets along the way. The method can be broken down into the following steps:

1. **Sample a small number of items from your dataset**: Here, we use a dataset of tweets posted by legislators during the 115th, 116th, and 117th US Congresses.
2. **Craft Implicit and Explicit Propositions**: Refer to Appendix 2 of our paper for the instructions used to craft exemplar propositions. We will use the same instructions to craft implicit and explicit propositions for our dataset.
3. **Prompt an LLM with the crafted exemplars**: Here, we will use GPT3.5 Turbo for our experiments.
4. **Validate**: Confirm that a random sample of the generated decompositions is _plausible_.
5. **Downstream Usage**: Use the decompositions in the target task. 

#### Getting Started

To begin, run the first cell below to import the necessary packages and set up the environment. The helper functions and accompanying code are in `eval_mteb.py` and `generation_utils.py`. 

##### Note: 
We assume that your `OPENAI_API_KEY` is set as an environment variable. One way to set it is to run `conda env config vars set OPENAI_API_KEY=<your_key_here>` inside your conda environment. You can also set it manually in the config through `config["llm"]["openai_api_key"]`.
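The two options in the note can be sketched as a single lookup. This is a minimal sketch, assuming the config behaves like a nested dictionary. `resolve_api_key` is a hypothetical helper that is not part of the repository, and the precedence (config first, then environment) is an assumption.

```python
import os

def resolve_api_key(config, env=None):
    """Return the key from the config if present, otherwise from the environment.

    Hypothetical helper: the precedence order is an assumption, not the repository's behaviour.
    """
    env = os.environ if env is None else env
    key = config.get("llm", {}).get("openai_api_key") or env.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("Set OPENAI_API_KEY or config['llm']['openai_api_key']")
    return key

# toy usage with an explicit environment mapping instead of the real os.environ
key_from_env = resolve_api_key({"llm": {}}, env={"OPENAI_API_KEY": "sk-from-env"})
key_from_config = resolve_api_key({"llm": {"openai_api_key": "sk-from-config"}}, env={})
```

Failing early with a clear message avoids discovering a missing key only when the first request is sent.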

The code below was tested on a Linux server, but it should work on other hardware after adjusting the PyTorch version. \[More additions coming soon!]

In [1]:
import json
import os
import random
from pathlib import Path

import pandas as pd
from simple_colors import blue, green, red
from tqdm import tqdm
from transformers import GenerationConfig

OPENAI_API_KEY = os.environ['OPENAI_API_KEY']


#### Choosing the data 

For the purpose of this tutorial, we use a dataset of congressional tweets sampled from the 115th, 116th, and 117th Congresses. The data can be found in `data/sampled_tweets_senate_115-117.jsonl`.

### Step 1: Sample a small number of items from your dataset

In our case, we sample a small number of tweets from the dataset. 
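As a minimal sketch of reproducible sampling, the snippet below draws the same candidates on every run by using a dedicated, seeded generator. The toy records and the helper name `sample_exemplar_candidates` are hypothetical. The real records come from the JSONL file and are assumed to carry a `tweet` field, as in the cells that follow.

```python
import random

def sample_exemplar_candidates(items, k=10, seed=42):
    """Draw k items without replacement using a private, seeded generator."""
    rng = random.Random(seed)  # does not touch the global random state
    return rng.sample(items, k)

# toy stand-in for the tweet records (hypothetical data)
toy_tweets = [{"tweet": f"toy tweet number {i}"} for i in range(100)]

first_draw = sample_exemplar_candidates(toy_tweets)
second_draw = sample_exemplar_candidates(toy_tweets)
```

Using a private `random.Random` instance keeps the draw stable even if other cells call `random` in between.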

In [12]:
from misc_utils import read_jsonl, write_jsonl, create_textboxes, show_document

TWEETS_FILEPATH = Path('data/sampled_tweets_senate_115-117.jsonl')
tweets = read_jsonl(TWEETS_FILEPATH)

random.seed(42)
exemplar_candidates = random.sample(tweets, 10)
exemplar_tweets = [x['tweet'] for x in exemplar_candidates]

In [13]:
# See the sampled tweets

for index, tweet in enumerate(exemplar_tweets): 
    text = show_document(index, tweet) 
    display(text) 

HTML(value="<h3 style='font-family: sans-serif; color:blue;'>Document 1:</h3><p style='font-family: Verdana'>G…

HTML(value="<h3 style='font-family: sans-serif; color:blue;'>Document 2:</h3><p style='font-family: Verdana'>E…

HTML(value="<h3 style='font-family: sans-serif; color:blue;'>Document 3:</h3><p style='font-family: Verdana'>O…

HTML(value="<h3 style='font-family: sans-serif; color:blue;'>Document 4:</h3><p style='font-family: Verdana'>F…

HTML(value="<h3 style='font-family: sans-serif; color:blue;'>Document 5:</h3><p style='font-family: Verdana'>T…

HTML(value="<h3 style='font-family: sans-serif; color:blue;'>Document 6:</h3><p style='font-family: Verdana'>🚨…

HTML(value="<h3 style='font-family: sans-serif; color:blue;'>Document 7:</h3><p style='font-family: Verdana'>@…

HTML(value="<h3 style='font-family: sans-serif; color:blue;'>Document 8:</h3><p style='font-family: Verdana'>I…

HTML(value="<h3 style='font-family: sans-serif; color:blue;'>Document 9:</h3><p style='font-family: Verdana'>S…

HTML(value="<h3 style='font-family: sans-serif; color:blue;'>Document 10:</h3><p style='font-family: Verdana'>…

### Step 2: Craft Implicit and Explicit Propositions
In this step, we craft both explicit and implicit exemplars for the sampled tweets. Fill in the textboxes next to the tweets with propositions and press the "Submit" button when you're done.

##### TIP: 
If you want to start with the exemplars used in the paper, set the ```start_with_existing_exemplars``` flag to ```True``` in the next cell.

Our sampled tweets and exemplars can also be found in  `exemplars/leg_tweets_exemplars.jsonl`
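Judging from how the next cell indexes each record (`x[0]` for the tweet, `x[1]` for its decompositions), every line of the exemplar file appears to be a JSON array holding a tweet and a list of propositions. The sketch below checks that shape on a toy line. `parse_exemplar_line` is a hypothetical helper and the example content is invented.

```python
import json

def parse_exemplar_line(line):
    """Parse one JSONL line of the assumed form [tweet, [proposition, ...]]."""
    tweet, propositions = json.loads(line)
    if not isinstance(tweet, str):
        raise ValueError("expected the first element to be the tweet text")
    if not all(isinstance(p, str) for p in propositions):
        raise ValueError("expected the second element to be a list of strings")
    return tweet, list(propositions)

# invented example line, not taken from the real exemplar file
example_line = json.dumps(["A toy tweet about parks", ["Parks need funding", "Parks matter"]])
example_tweet, example_props = parse_exemplar_line(example_line)
```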

In [17]:
start_with_existing_exemplars = True

In [18]:
tweet_decomp_exemplars = []

if not start_with_existing_exemplars : 
    random.seed(42)
    exemplar_candidates = random.sample(tweets, 10)
    exemplar_tweets = [x['tweet'] for x in exemplar_candidates]

else: 
    # load exemplars used in the paper 
    paper_exemplars = read_jsonl("exemplars/leg_tweets_exemplars.jsonl")
    exemplar_tweets = [x[0] for x in paper_exemplars] 
    exemplar_decomps = [x[1] for x in paper_exemplars] 

for index, tweet in enumerate(exemplar_tweets[:5]): # remove slicing to include all tweets
    fancy_text = show_document(index, tweet)
    
    # display the document 
    display(fancy_text)
    
    if start_with_existing_exemplars: 
        decomps = create_textboxes("\n".join(exemplar_decomps[index]))
    else: 
        decomps = create_textboxes()
    
    tweet_decomp_exemplars.append([tweet, decomps])

HTML(value="<h3 style='font-family: sans-serif; color:blue;'>Document 1:</h3><p style='font-family: Verdana'>T…

Textarea(value='The Honest Ads Act will strengthen protections against foreign election interference\nRussia w…

Button(description='Submit', style=ButtonStyle())

HTML(value="<h3 style='font-family: sans-serif; color:blue;'>Document 2:</h3><p style='font-family: Verdana'>O…

Textarea(value="A police officer killed George Floyd\nGeorge Floyd's death was unjust\nBlack Americans deserve…

Button(description='Submit', style=ButtonStyle())

HTML(value="<h3 style='font-family: sans-serif; color:blue;'>Document 3:</h3><p style='font-family: Verdana'>H…

Textarea(value="Wyoming was the fist state to recognize womens' right to vote\nWyoming supports gender equalit…

Button(description='Submit', style=ButtonStyle())

HTML(value='<h3 style=\'font-family: sans-serif; color:blue;\'>Document 4:</h3><p style=\'font-family: Verdana…

Textarea(value="Mark Zuckerberg makes insincere apologies\nFacebook is trying to avoid accountability\nFaceboo…

Button(description='Submit', style=ButtonStyle())

HTML(value="<h3 style='font-family: sans-serif; color:blue;'>Document 5:</h3><p style='font-family: Verdana'>F…

Textarea(value='There is an urgent need to help DACA recipients\nSupport for Dreamers is bipartisan\nDACA rece…

Button(description='Submit', style=ButtonStyle())

#### Save the exemplars in the right format

In [19]:
with open("exemplars/user_collected_exemplars.jsonl", "w") as f: 
    for elem in tweet_decomp_exemplars: 
        s= json.dumps(elem)
        f.write(f"{s}\n")
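If your textboxes hold one proposition per line, as the pre-filled values above suggest, the text may need to be split into a list before it is saved. The sketch below shows one way to do that. Both helpers are hypothetical, and what `create_textboxes` actually returns is not shown here, so treat the string input as an assumption.

```python
import json

def textbox_to_propositions(value):
    """Split newline-separated textbox content into a clean list of propositions."""
    return [line.strip() for line in value.splitlines() if line.strip()]

def to_exemplar_record(tweet, textbox_value):
    """Serialise one exemplar as a JSON array: [tweet, [proposition, ...]]."""
    return json.dumps([tweet, textbox_to_propositions(textbox_value)])

# invented example content
record = to_exemplar_record("A toy tweet", "First proposition\n\n  Second proposition  \n")
```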

### Step 3: Prompt an LLM with the crafted exemplars

We use the `GenerationEmbedder` class from `eval_mteb.py` along with the hyperparameters specified in `configs/leg-tweet-gen-gpt3.5-propositions-all.yaml` to prompt GPT3.5 Turbo with the exemplars. The generated decompositions can be found in `data/gpt3.5_tweets_to_gen_all.jsonl`. 

The decompositions you generate will be stored in `outputs/test3.jsonl`. As in **Step 2**, if you want to use the exemplars described in our paper, set `use_existing_exemplars = True` in the next cell.
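To give a rough idea of what prompting with exemplars involves, here is a minimal sketch of assembling a few-shot prompt from instructions, exemplars, and a new document. It is not the `GenerationEmbedder` implementation. The function `build_prompt`, the template string, and the default separators are all assumptions made for illustration, and the real values live in the config file.

```python
def build_prompt(instructions, exemplars, document,
                 exemplar_format="Document: {input}\nPropositions: {output}",
                 exemplar_sep="\n\n", multi_output_sep="\n"):
    """Assemble instructions, worked exemplars, and the open-ended query into one prompt.

    Hypothetical sketch: the template and separators are illustrative defaults.
    """
    shots = [
        exemplar_format.format(input=tweet, output=multi_output_sep.join(props))
        for tweet, props in exemplars
    ]
    # the query repeats the template but leaves the output empty for the model to complete
    query = exemplar_format.format(input=document, output="").rstrip()
    return exemplar_sep.join([instructions, *shots, query])

# invented example content
prompt = build_prompt(
    "List the propositions.",
    [("Tweet A", ["P1", "P2"])],
    "Tweet B",
)
```

Keeping the template and separators as parameters mirrors the idea of driving them from a config rather than hard-coding them.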


In [20]:
use_existing_exemplars = True

In [21]:
from eval_mteb import  GenerationEmbedder, load_config

TWEETS_FILEPATH = Path('data/sampled_tweets_senate_115-117.jsonl')
tweets = read_jsonl(TWEETS_FILEPATH)

# load the config file and the exemplars 
config = load_config('configs/leg-tweet-gen-gpt3.5-propositions-all.yaml')

# use the existing exemplars, or the ones you collected in Step 2
if use_existing_exemplars: 
    exemplars = read_jsonl(config["data"]['exemplars_path'])
else: 
    exemplars = read_jsonl("exemplars/user_collected_exemplars.jsonl") 

# initialize the generation object with hyperparameters loaded from the config file
model = GenerationEmbedder(
    instructions=config["data"]["instructions"],
    openai_api_key=config["llm"]["openai_api_key"],
    exemplar_pool=exemplars,
    exemplar_format=config["exemplars"]["format"],
    exemplar_sep=config["exemplars"]["separator"],
    multi_output_sep=config["exemplars"]["multi_output_separator"],
    exemplars_per_prompt=config["exemplars"]["exemplars_per_prompt"],
    draws_per_pool=config["exemplars"]["draws_per_pool"],
    repeat_draws=config["exemplars"]["repeat_draws"],
    shuffles_per_draw=config["exemplars"]["shuffles_per_draw"],
    output_combination_strategy=config["embeddings"]["output_combination_strategy"],
    include_original_doc=config["embeddings"]["include_original_doc"],
    embedding_model_name=config["embeddings"]["embedding_model_name"],
    gen_model_name=config["llm"]["gen_model_name"],
    generations_per_prompt=config["llm"]["generations_per_prompt"],
    temperature=config["llm"]["temperature"],
    top_p=config["llm"]["top_p"],
    generation_kwargs=config["llm"]["generation_kwargs"],
    max_tokens=config["llm"]["max_tokens"],
    cache_db_path=config["main"]["cache_db_path"],
    dry_run=config["main"]["dry_run"],
    device=config["embeddings"]["device"],
    seed=config["main"]["seed"],
)

For the purpose of the tutorial, we generate decompositions for the first 10 tweets only. Decompositions for the whole dataset can be found in `data/gpt3.5_tweets_to_gen_all.jsonl`.

In [22]:
# generate propositions from tweets 
# simple batching code that deals with breaks in connections

# use a small sample of tweets to test the generations

OUTPUT_PATH = Path("outputs/test3.jsonl") 

if not OUTPUT_PATH.is_file():
    # If it doesn't exist, create the file
    OUTPUT_PATH.touch()

tweet_texts = [tweet['tweet'] for tweet in tweets][:10] # remove [:10] to run on all tweets
propositions = read_jsonl(OUTPUT_PATH)

batch_size = 100
for index in tqdm(range(len(propositions), len(tweet_texts), batch_size)):
    batch = tweet_texts[index:index+batch_size]
    propositions.extend(model.generate_from_inputs(batch))
    write_jsonl(propositions, OUTPUT_PATH)

0it [00:00, ?it/s]
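The batching loop resumes from the number of results already saved, so a run that finds a complete output file has nothing left to do, which is likely why the progress bar above reports `0it`. The sketch below isolates that pattern with a fake generator. `generate_resumable` and `fake_generate` are hypothetical names, and the sketch assumes results are stored in input order with one entry per input.

```python
def generate_resumable(inputs, done, generate_fn, save_fn, batch_size=100):
    """Process only the inputs that do not yet have a saved result."""
    for start in range(len(done), len(inputs), batch_size):
        batch = inputs[start:start + batch_size]
        done.extend(generate_fn(batch))
        save_fn(done)  # checkpoint after every batch so a crash loses little work
    return done

calls = []

def fake_generate(batch):
    """Stand-in for the LLM call: records the batch size and returns dummy output."""
    calls.append(len(batch))
    return [[f"proposition about {doc}"] for doc in batch]

docs = [f"tweet {i}" for i in range(10)]
saved = [["already generated"] for _ in range(4)]  # pretend 4 results survived a crash

result = generate_resumable(docs, saved, fake_generate, save_fn=lambda rows: None, batch_size=3)
```

Here only the six remaining tweets are processed, in two batches of three. Calling the function again with the completed `result` triggers no further generation.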


### Step 4: Validate

We sample some of the generated decompositions and confirm that they are _plausible_. In our paper, this was done using a human study. Please refer to Section 3 of our paper for more details. 
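If you want to run a small plausibility check of your own, one option is to export a sample as a sheet with one row per proposition for annotators to label. The sketch below is only an illustration of that idea and is not the protocol used in the paper. The helper names and column names are hypothetical.

```python
import csv
import io
import random

FIELDS = ["doc_id", "tweet", "proposition", "plausible"]

def build_annotation_rows(tweets, propositions, n_docs=5, seed=42):
    """Sample documents and emit one row per (tweet, proposition) pair with an empty label."""
    rng = random.Random(seed)
    chosen = sorted(rng.sample(range(len(tweets)), min(n_docs, len(tweets))))
    return [
        {"doc_id": i, "tweet": tweets[i], "proposition": prop, "plausible": ""}
        for i in chosen
        for prop in propositions[i]
    ]

def rows_to_csv(rows):
    """Render the rows as CSV text, for example to paste into a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# invented toy data: three tweets with two propositions each
toy_tweets = ["tweet a", "tweet b", "tweet c"]
toy_props = [["a1", "a2"], ["b1", "b2"], ["c1", "c2"]]
rows = build_annotation_rows(toy_tweets, toy_props, n_docs=2)
sheet = rows_to_csv(rows)
```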


In [23]:
# pair each tweet with its generations (without modifying `propositions` in place)
paired = list(zip(tweet_texts, propositions))

# sample from the pairs
random.seed(42)
sample = random.sample(paired, 5)

for tweet_text, props in sample: 
    print(f"TWEET: \n{blue(tweet_text)}\n")
    print("PROPOSITIONS:")
    for prop in props:
        print(green(prop))
    print("\n---------------------------\n")

TWEET: 
[34mCindy &amp; I are praying for all those in the path of #HurricaneIrma. We thank the brave volunteers &amp; urge all to listen to local officials.[0m

PROPOSITIONS:
[32mCindy and the speaker are praying for those affected by Hurricane Irma[0m
[32mVolunteers are acting bravely to assist[0m
[32mIt is important to follow the guidance of local officials during the hurricane[0m
[32mHurricane Irma poses a threat to people's safety[0m

---------------------------

TWEET: 
[34mWe must do more to address mental health issues our veterans face and ensure all have access to treatment. @WSAZnews #suicidepreventionmonth  [0m

PROPOSITIONS:
[32mVeterans face mental health challenges[0m
[32mAccess to treatment for veterans should be improved[0m
[32mSuicide prevention efforts are important[0m
[32mMental health support for veterans is necessary[0m
[32m@WSAZnews raises awareness about suicide prevention[0m

---------------------------

TWEET: 
[34mThis project will bring 

### Step 5: Use the propositions for your own downstream task!

## Finding tweets with implicit similarity

When we embed a document based on the surface form of its content, documents that share a communicative intent but express it with different wording are placed far apart in the embedding space.

To find such document pairs, we can use the inferential decompositions obtained above. These decompositions help us look past a communicator's lexical choices and focus instead on their communicative intent. Hence, documents that seem far apart in the embedding space are brought closer through their similar decompositions.

Here, we show some samples of such document pairs. 
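The cells that follow rely on `distance_func` from `misc_utils`, described there as the minimum pairwise distance among the inferences of two tweets. As a rough illustration of that idea, here is a minimal sketch based on cosine distance. It is a hypothetical reimplementation and not the repository's code. The return order (two indices, then the distance) is inferred from how the result is indexed later.

```python
import numpy as np

def min_pairwise_cosine_distance(emb_a, emb_b):
    """Return (i, j, distance) for the closest pair of rows across two embedding sets."""
    a = np.asarray(emb_a, dtype=float)
    b = np.asarray(emb_b, dtype=float)
    # normalise rows so that the dot product equals cosine similarity
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    dists = 1.0 - a @ b.T  # shape: (len(a), len(b))
    i, j = np.unravel_index(np.argmin(dists), dists.shape)
    return int(i), int(j), float(dists[i, j])

# toy 2-d "embeddings": the second row of A points the same way as the first row of B
toy_a = [[1.0, 0.0], [0.0, 1.0]]
toy_b = [[0.0, 2.0], [1.0, 1.0]]
best_i, best_j, best_dist = min_pairwise_cosine_distance(toy_a, toy_b)
```

Taking the minimum means that a single shared inference is enough to bring two documents close, even when their other inferences differ.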

In [24]:
from misc_utils import distance_func
# NOTE: distance_func returns the minimum pairwise distance over all pairs of inferences from two tweets.
# As used below, indices 0 and 1 of its result locate the closest pair of inferences and index 2 is their distance.

# tweets
tweet_texts = [x['tweet'] for x in tweets] 

# load all the decompositions 
decompositions = read_jsonl("data/gpt3.5_tweets_to_gen_all.jsonl") 

assert len(tweet_texts) == len(decompositions), "Number of documents doesn't match number of decompositions" 

In [25]:
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim
import torch
import numpy as np


st_model = SentenceTransformer("all-MiniLM-L6-v2") 
device = "cuda" if torch.cuda.is_available() else "cpu"

# compute document embeddings and pairwise cosine distances (1 - cosine similarity) 
doc_embeddings = st_model.encode(tweet_texts, device=device, show_progress_bar=True) 
doc_distances = 1 - cos_sim(doc_embeddings, doc_embeddings) 

# compute embeddings of decompositions
# depending on your dataset, this might take a little bit of time
decomp_embeddings = np.array([st_model.encode(x, device=device) for x in tqdm(decompositions)], dtype="object")

Batches:   0%|          | 0/1228 [00:00<?, ?it/s]

100%|██████████| 39284/39284 [05:45<00:00, 113.71it/s]


In [26]:
# create a matrix of decomposition distances 
# try with only a subset of documents for speed 
doc_sample_size = 1000

from itertools import combinations 
pairs = list(combinations(range(doc_sample_size), 2)) 

decomp_dists = np.zeros((doc_sample_size, doc_sample_size)) 

for pair in tqdm(pairs): 
    index1, index2 = pair
    decomp_dists[index1, index2] = distance_func(decomp_embeddings[index1], decomp_embeddings[index2])[2]
    decomp_dists[index2, index1] = decomp_dists[index1, index2]


100%|██████████| 499500/499500 [02:10<00:00, 3824.12it/s]


#### Finding tweet pairs that move closer in decomposition embedding space

In the next cell, we try to find pairs of documents (here, tweets) that move closer in embedding space once we look at their decompositions. The shift in distance is a hyperparameter, and best results can be obtained by setting a value that is meaningful for your dataset. 

Since we are using a distance function that looks at the minimum distance between all possible decompositions, the decomposition that brings the tweets closer is highlighted in <font color='red'>red</font>. The <font color='blue'>tweets</font> are colored in <font color='blue'>blue</font>, and <font color='green'>decompositions</font> in <font color='green'>green</font>.
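The selection rule can also be written as a vectorised filter over the two distance matrices. The sketch below is a minimal illustration on toy matrices. `find_shifted_pairs` is a hypothetical helper, and its default thresholds simply mirror the values used in the next cell.

```python
import numpy as np

def find_shifted_pairs(doc_dists, decomp_dists, min_shift=0.4, max_decomp_dist=0.3):
    """Return index pairs (i < j) that are far apart as documents but close as decompositions."""
    doc = np.asarray(doc_dists, dtype=float)
    dec = np.asarray(decomp_dists, dtype=float)
    upper = np.triu(np.ones(doc.shape, dtype=bool), k=1)  # visit each unordered pair once
    mask = upper & ((doc - dec) > min_shift) & (dec < max_decomp_dist)
    return [(int(i), int(j)) for i, j in zip(*np.nonzero(mask))]

# toy symmetric distance matrices for three documents (invented numbers)
toy_doc = [[0.0, 0.7, 0.5],
           [0.7, 0.0, 0.9],
           [0.5, 0.9, 0.0]]
toy_dec = [[0.0, 0.2, 0.4],
           [0.2, 0.0, 0.6],
           [0.4, 0.6, 0.0]]
shifted = find_shifted_pairs(toy_doc, toy_dec)
```

Only the pair (0, 1) qualifies here: its distance drops from 0.7 to 0.2, while the other pairs move by less than the required shift.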


In [27]:
# find pairs where distance is low(er) in decomposition space but high in document space 
from nltk.tokenize import TweetTokenizer 
tt = TweetTokenizer() 

# change this value to control the number of examples displayed 
num_display_examples = 25
counter = 0 

for pair in tqdm(pairs): 
    if counter >= num_display_examples: 
        break 
    
    index1, index2 = pair
    doc_distance = float(doc_distances[index1, index2])
    decomp_distance = decomp_dists[index1, index2] 
    #print(doc_distance, decomp_distance, doc_distance - decomp_distance) 
    
    # find docs of similar length
    if not abs(len(tt.tokenize(tweet_texts[index1])) - len(tt.tokenize(tweet_texts[index2]))) < 10: 
        continue 

    # keep pairs where the shift is more than 0.4 and the decomposition distance is below 0.3 
    # adjust these thresholds for your dataset 
    if doc_distance - decomp_distance > 0.4 and decomp_distance < 0.3: 
        print(index1, index2)
        print(f"Distance moved from {doc_distance} -> {decomp_distance}")
        a = distance_func(decomp_embeddings[index1], decomp_embeddings[index2])
        
        print(blue(tweet_texts[index1]))
        print(green(decompositions[index1]))
        print(red(decompositions[index1][a[0]]))
    
        print(blue(tweet_texts[index2]))
        print(green(decompositions[index2]))
        print(red(decompositions[index2][a[1]]))
        print("**************")
        print("\n\n")
        counter += 1
        



7 20
Distance moved from 0.680652379989624 -> 0.19718265533447266
[34mThe cultural &amp; natural history of the @NewRiverNPS expands over 70,000 acres in WV. But we don’t currently have the funds necessary to maintain this beautiful WV landmark. I introduced a bill this wk to address the maintenance backlog at our national parks    [0m
[32m['New River National Park is an important part of cultural and natural history', 'The park requires funds to properly maintain it', "The park is an important part of West Virginia's heritage", 'Legislation has been introduced to address park maintenance', 'National Parks are an important part of American heritage'][0m
[31mNational Parks are an important part of American heritage[0m
[34mOur #MonumentsForAll belong to all of us. They should not be sold off to the highest bidder. We will keep fighting to protect these special places which are a part of our history, our heritage, our local economies and our way of life.[0m
[32m['National monumen


10 819
Distance moved from 0.5271098613739014 -> 0.11885702610015869
[34mThousands in DC will #MarchforLife today, the 45th anniversary of Roe v. Wade. As they march, they speak with one voice: "Life is sacred. Life is precious. Life is worth protecting." I stand with them &amp; all #ProLife Americans in defense of the unborn. #WhyWeMarch @March_for_Life[0m
[32m['Marchers believe that human life is sacred and valuable', 'Abortion should be illegal', 'The pro-life position is important', 'The March for Life is significant', 'Americans have a right to protest peacefully'][0m
[31mThe March for Life is significant[0m
[34mI stand with students across #NM and America who are marching against gun violence, and look forward to joining the #MarchForOurLives in Santa Fe today. I pledge to do everything I can to enact common sense gun safety measures and end the gun lobby’s stranglehold over Washington.  [0m
[32m['Students are protesting gun violence across America', 'The March For Our L


13 934
Distance moved from 0.6534666419029236 -> 0.13976097106933594
[34mHere we go again. We need an EPA administrator who will protect our air, water and soil, not someone who’s made a career by making it easier for corporations to trash our environment. [0m
[32m['The new EPA Administrator should prioritize protecting the environment', 'Previous Administrators have been complicit in allowing corporations to pollute', 'The environment is in need of protection', 'Protecting the environment is a serious responsibility of the EPA.'][0m
[31mThe environment is in need of protection[0m
[34m.@kabbottBHN, president of @BostonHarborNow, highlighted the importance of tourism and recreation to New England's coastal economy. Our coastal communities cannot afford an oil spill.

#ProtectOurCoast[0m
[32m["Tourism is important to New England's coastal economy", 'Protecting the coast from oil spills is essential', 'Coastal economies are vulnerable to oil spills', 'Coastal tourism would suffer


14 691
Distance moved from 0.6972324252128601 -> 0.2638554573059082
[34mOur tax laws should encourage US companies to keep #jobs here, not send them abroad. The new tax law actually created an incentive to move jobs abroad to take advantage of tax havens. Sen Klobuchar introduced a bill to close that loophole &amp; protect US jobs [0m
[32m['Tax laws should incentivize companies to keep jobs in the US', 'The new tax law created an incentive for outsourcing', 'Tax havens should not be used to avoid US taxes', 'Senator Klobuchar introduced a bill to prevent outsourcing and tax avoidance', 'American jobs need to be protected'][0m
[31mAmerican jobs need to be protected[0m
[34mExactly right. The Finger Lakes region is not and will never be a good location for this incinerator, and we have to stand strong to protect local jobs, so many in the tourism industry, &amp; our precious natural resources. #FLX [0m
[32m['The Finger Lakes region is a poor choice for the incinerator', 'Protecti


23 55
Distance moved from 0.709179162979126 -> 0.2626243829727173
[34mDirector Pompeo’s recent trip to North Korea, I believe, highlights how effective and committed he is to pursuing diplomatic opportunities.[0m
[32m['Mike Pompeo is committed to diplomacy', 'Pompeo recently visited North Korea', 'Diplomacy is the preferred strategy for conflict resolution', 'Pompeo is an effective leader'][0m
[31mDiplomacy is the preferred strategy for conflict resolution[0m
[34mPleased to be joined by RI's Honorary Consul of France, Roger Begin, for French President @EmmanuelMacron’s address to a joint session of Congress. [0m
[32m["Rhode Island Honorary Consul Roger Begin joined in for President Macron's address to Congress", 'The French President addressed both houses of Congress', 'The United States has friendly diplomatic relations with France', 'Diplomacy is important for international relations'][0m
[31mDiplomacy is important for international relations[0m
**************



23 429
D


24 254
Distance moved from 0.6566082239151001 -> 0.24104857444763184
[34mWe are not going to ABANDON the wall.

We are going to BUILD the wall![0m
[32m['The wall will be built', 'The wall is a priority', 'There is an opposition to the wall', 'Construction of the wall is necessary', 'Abandoning the wall is not an option'][0m
[31mThe wall is a priority[0m
[34mJoin me Thursday at Heritage for keynote address on my bill to force sanctuary cities to #followthelaworfundthewall
[0m
[32m['Join me at Heritage on Thursday for my keynote address', 'My bill aims to enforce the law on sanctuary cities', 'My bill suggests taking funding from sanctuary cities to fund the wall', 'Immigration policy is a key issue for my constituents', 'Supporting the border wall is a priority'][0m
[31mSupporting the border wall is a priority[0m
**************






31 177
Distance moved from 0.733190655708313 -> 0.22485262155532837
[34mYesterday in Senate Finance, I spoke out about the outrageous policy of separating children from their parents at the border.  [0m
[32m['Children should not be separated from their parents at the border', 'Separating families is cruel and unusual', 'The US immigration system needs fixing', 'Human rights are important and should be respected', "The Trump administration's immigration policies are immoral"][0m
[31mThe US immigration system needs fixing[0m
[34mFixing our immigration system is important, but #RAISE Act would be harmful to our nation’s values &amp; economy. Full statement: [0m
[32m['Our immigration system needs to be reformed', 'RAISE Act is not the solution', 'RAISE Act would harm American economy', 'RAISE Act would be detrimental to American values', 'Immigration is a key component of American prosperity'][0m
[31mOur immigration system needs to be reformed[0m
**************



31 202
Distan


31 997
Distance moved from 0.8771203756332397 -> 0.28157639503479004
[34mYesterday in Senate Finance, I spoke out about the outrageous policy of separating children from their parents at the border.  [0m
[32m['Children should not be separated from their parents at the border', 'Separating families is cruel and unusual', 'The US immigration system needs fixing', 'Human rights are important and should be respected', "The Trump administration's immigration policies are immoral"][0m
[31mThe US immigration system needs fixing[0m
[34mThe RAISE Act would raise working Americans’ wages by giving priority to the best-skilled immigrants.  [0m
[32m['The RAISE Act would prioritize highly skilled immigrants', 'Skilled immigrants are more likely to raise wages for American workers', 'Prioritizing high-skill immigration could improve the US economy', 'The US government should reform its immigration policy'][0m
[31mThe US government should reform its immigration policy[0m
**************





32 815
Distance moved from 0.6179095506668091 -> 0.12021380662918091
[34mMore good news for American workers. More good news for our economy. More proof that #ThisGOPAgendaWorks  [0m
[32m['The GOP agenda works', 'The American economy is improving', 'American workers are benefiting from the GOP agenda', 'Americans should support the GOP agenda'][0m
[31mThe American economy is improving[0m
[34mFewer Americans are filing for unemployment today than at any time since 1969, a 48 year low.  [0m
[32m['The number of Americans filing for unemployment is at a 48 year low', 'Fewer Americans are unemployed than in previous years', 'The economy is improving', 'The job market is better than it has been in decades'][0m
[31mThe economy is improving[0m
**************






36 120
Distance moved from 0.6662980318069458 -> 0.21766602993011475
[34mLife begins at conception! I proudly stand with the Pro-Life movement and #StandForLife! [0m
[32m['Life begins at conception', 'The pro-life movement is important to support', 'Abortion should be illegal', 'Unborn children have a right to life', 'Human life is valuable at all stages of development'][0m
[31mAbortion should be illegal[0m
[34mOverturning #RoevWade would not end abortion, it would just end safe abortion.[0m
[32m['Reversing Roe v. Wade would not put an end to abortion', 'Women would still seek abortions', 'Abortions should be safe', 'Overturning Roe v. Wade would threaten safe abortions', "Roe v. Wade protects a woman's right to abortion"][0m
[31mAbortions should be safe[0m
**************



36 459
Distance moved from 0.6714106798171997 -> 0.2247447967529297
[34mLife begins at conception! I proudly stand with the Pro-Life movement and #StandForLife! [0m
[32m['Life begins at conception', 


39 164
Distance moved from 0.6268096566200256 -> 0.07287865877151489
[34mOn the anniversary of the horrific shooting at Columbine, students are again walking out of their classrooms to call for an end to senseless gun violence. These young people are right -- #enough. We need to enact common-sense gun safety legislation now.  [0m
[32m['Students are walking out of classrooms to protest gun violence', 'Gun violence is a serious problem in America', 'Gun control is a necessary step to end violence', 'America needs common-sense gun safety legislation', 'Columbine was a tragic event that should not be forgotten'][0m
[31mGun violence is a serious problem in America[0m
[34mAt 2:33 PM, it will be one week since five members of the @capgaznews family were killed in a horrific act of gun violence.

At 2:33 PM, I'll be participating in a moment of silence – in honor of their lives, their memories and their families. Join me.

[0m
[32m['The Capital Gazette shooting was a horrific act of g


45 445
Distance moved from 0.7276069521903992 -> 0.1644517183303833
[34mOklahomans are hurting from Obamacare. Senate Republicans are committed to repealing and replacing the disastrous healthcare law. [0m
[32m['Obamacare is causing problems for Oklahomans', 'The healthcare law needs to be repealed and replaced', 'Oklahoma residents are negatively impacted by the healthcare law', 'Republicans are commited to a better healthcare system', 'Obamacare is bad for America'][0m
[31mObamacare is bad for America[0m
[34m#Trumpcare would pull rug out from under critical access hospitals, like this one in Holyoke, CO, that help patients access lifesaving care.  [0m
[32m['Critical access hospitals will be at risk under Trumpcare', 'Trumpcare will have a negative impact on access to healthcare', 'People rely on critical access hospitals for lifesaving care', 'The Holyoke, CO hospital is vital for providing medical attention', 'Trumpcare is bad for American healthcare'][0m
[31mTrumpcare is


47 473
Distance moved from 0.6497713327407837 -> 0.21559476852416992
What is Trump afraid of? According to news reports President Trump himself has acknowledged that the release of the Nunes memo was designed to disrupt Robert Mueller’s investigation. No political stunt should interfere with the special counsel’s work.
["Trump fears Mueller's investigation", 'The Nunes memo was released with the intent of hindering the investigation', "The special counsel's investigation is important", 'Political stunts should not be used to interfere with investigations', 'The Nunes memo was a political stunt']
The special counsel's investigation is important
Hours after Americans voted for an independent check on his administration, @realDonaldTrump fires the Attorney General and installs a partisan ally to oversee the Mueller probe.

This isn’t a coincidence. We need accountability and to protect the Special Counsel’s investigation
['Donald Trump has fired t


48 378
Distance moved from 0.6953811645507812 -> 0.1676352620124817
Introduced the Keep Families Together &amp; Enforce the Law Act. This bill would #KeepFamiliesTogether during legal proceedings, protect children, authorize 225 more immigration judges &amp; ensure the integrity of our immigration laws.
['The Keep Families Together & Enforce the Law Act will protect immigrant families', 'Children deserve protection', 'More immigration judges are needed', 'Immigration laws must be upheld', 'The immigration system is in need of reform']
The immigration system is in need of reform
.@RepJayapal visited a federal prison and met with asylum seekers who had been transferred from the border. What they told her is horrifying. We need to fight back against Trump's cruel immigration policy.
['Asylum seekers are being mistreated', 'A congresswoman visited a federal prison', "The Trump administration's immigration policies are cruel", 'Immigration policies


49 658
Distance moved from 0.9463708400726318 -> 0.28416907787323
More than 20 different government entities administer more than 160 different federal housing programs. I sent a letter asking for officials to identify duplication and overlap in federal housing assistance programs.
['The federal government provides housing assistance through many different agencies', 'Housing programs may be unnecessarily duplicated', 'The government should reduce inefficiencies in federal housing programs', "Taxpayers' money should be spent wisely", 'A letter was sent to officials asking for a review of housing programs']
Taxpayers' money should be spent wisely
We spent over $170k to build trails in national parks. Seems like not too bad until you read the next line that the parks were in Russia
['Taxpayer money was used to build trails in Russian national parks', 'US funds should be used within the US', 'US money going to foreign countries can be problematic'


52 83
Distance moved from 0.6601840257644653 -> 0.1399441957473755
New Hampshire, there are four more days left to #GetCovered. The Affordable Care Act open enrollment period ends December 15. Go to  today to get started.
['The open enrollment period for the Affordable Care Act is closing soon', 'New Hampshire residents have four days left to sign up', 'Healthcare is important for all Americans', 'Healthcare costs should be affordable']
Healthcare is important for all Americans
🚨 Our fight to protect health care for millions of families is not over.

Spread far and wide if useful.
['A fight to protect healthcare has been undertaken', 'Millions of American families depend on the ACA for healthcare', 'Healthcare should be available for all Americans', 'We must keep fighting for healthcare']
Healthcare should be available for all Americans
**************



52 92
Distance moved from 0.5998044013977051 -> 0.1719595193862915
New


