## Cosine Feature Extraction and Random Forest Classifier

The following sources were used to construct this Jupyter Notebook:

* [Numpy: Dot Multiplication, Vstack, Hstack, Flatten](https://www.youtube.com/watch?v=nkO6bmp511M)
* [Scikit Learn TF-IDF Feature Extraction and Latent Semantic Analysis](https://www.youtube.com/watch?v=BJ0MnawUpaU)
* [Fake News Challenge TF-IDF Baseline](https://github.com/gmyrianthous/fakenewschallenge/blob/master/baseline.py)
* [Python TF-IDF Algorithm Built From Scratch](https://www.youtube.com/watch?v=hXNbFNCgPfY)
* [Theory Behind TF-IDF](https://www.youtube.com/watch?v=4vT4fzjkGCQ)
* [Plotting Classifier Boundaries](http://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html)

In [56]:
import sys
print(sys.version)

#Import all required modules

#For parsing and visualizing data
from pandas import DataFrame, read_csv
import pandas as pd

#For visualizing data
import matplotlib.pyplot as plt

#For processing data
import numpy as np
import pickle
from sklearn.model_selection import train_test_split

#Feature Engineering
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from scipy.sparse import hstack
import baseline_features

#Classifiers
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

#For scoring
from sklearn.metrics import accuracy_score
import score #Score used in competition

#Progress Bar
from tqdm import tqdm

#Reloading modules that have been updated
#import importlib
#importlib.reload(baseline_features)

3.6.4 (default, Jan  6 2018, 11:51:59) 
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)]


# Data Preparation

## Create Dataframes

In [63]:
#Import data from CSV file and create a dataframe
def create_dataframe(filename):
    #Read file into a pandas dataframe
    df = pd.read_csv(filename)
    #Replace spaces in column names with underscores (e.g. 'Body ID' -> 'Body_ID')
    df.columns = [c.replace(' ', '_') for c in df.columns]
    return df

In [64]:
#Create dataframes for both training and testing sets
train_df_tmp = create_dataframe('train_stances.csv')
train_bodies_df = create_dataframe('train_bodies.csv')

test_df_tmp = create_dataframe('competition_test_stances.csv')
test_bodies_df = create_dataframe('test_bodies.csv')

train_df_tmp.head(5)

| | Headline | Body_ID | Stance |
|---|---|---|---|
| 0 | Police find mass graves with at least '15 bodi... | 712 | unrelated |
| 1 | Hundreds of Palestinians flee floods in Gaza a... | 158 | agree |
| 2 | Christian Bale passes on role of Steve Jobs, a... | 137 | unrelated |
| 3 | HBO and Apple in Talks for $15/Month Apple TV ... | 1034 | unrelated |
| 4 | Spider burrowed through tourist's stomach and ... | 1923 | disagree |


## Join Dataframes on Body_ID

In [65]:
train_df = pd.merge(train_df_tmp,
                 train_bodies_df[['Body_ID', 'articleBody']],
                 on='Body_ID')

test_df = pd.merge(test_df_tmp,
                 test_bodies_df[['Body_ID', 'articleBody']],
                 on='Body_ID')

train_df = train_df.rename(columns={'articleBody': 'Body_Text'})
test_df = test_df.rename(columns={'articleBody': 'Body_Text'})

In [66]:
test_df.sort_values(by=['Body_ID']).head(5)

| | Headline | Body_ID | Stance | Body_Text |
|---|---|---|---|---|
| 7305 | Apple to keep gold Watch Editions in special i... | 1 | unrelated | Al-Sisi has denied Israeli reports stating tha... |
| 7303 | Apple installing safes in-store to protect gol... | 1 | unrelated | Al-Sisi has denied Israeli reports stating tha... |
| 7304 | El-Sisi denies claims he'll give Sinai land to... | 1 | agree | Al-Sisi has denied Israeli reports stating tha... |
| 7306 | Apple Stores to Keep Gold “Edition” Apple Watc... | 1 | unrelated | Al-Sisi has denied Israeli reports stating tha... |
| 7307 | South Korean woman's hair 'eaten' by robot vac... | 1 | unrelated | Al-Sisi has denied Israeli reports stating tha... |


In [67]:
train_df.sort_values(by=['Body_ID']).head(5)

| | Headline | Body_ID | Stance | Body_Text |
|---|---|---|---|---|
| 41651 | Soldier shot, Parliament locked down after gun... | 0 | unrelated | A small meteorite crashed into a wooded area i... |
| 41657 | Italian catches huge wels catfish; is it a rec... | 0 | unrelated | A small meteorite crashed into a wooded area i... |
| 41658 | Not coming to a store near you: The pumpkin sp... | 0 | unrelated | A small meteorite crashed into a wooded area i... |
| 41659 | One gunman killed in shooting on Parliament Hi... | 0 | unrelated | A small meteorite crashed into a wooded area i... |
| 41660 | Soldier shot at war memorial in Canada | 0 | unrelated | A small meteorite crashed into a wooded area i... |


In [68]:
#Split training data into training and validation set
#X_train, X_validate, y_train, y_validate = train_test_split(train_df[['Body_Text','Headline']], train_df['Stance'], test_size=.4, random_state=42)
#print(X_train)

# Feature Engineering

## TF-IDF Features

In [69]:
#Apply Scikit Learn TFIDF Feature Extraction Algorithm
body_text_vectorizer = TfidfVectorizer(ngram_range=(1, 2), lowercase=True, stop_words='english',max_features=1024)
headline_vectorizer = TfidfVectorizer(ngram_range=(1, 2), lowercase=True, stop_words='english',max_features=1024)

#Create vocabulary based on training data
train_body_tfidf = body_text_vectorizer.fit_transform(train_df['Body_Text'])
train_headline_tfidf = headline_vectorizer.fit_transform(train_df['Headline'])

#Use vocabulary for testing data
test_body_tfidf = body_text_vectorizer.transform(test_df['Body_Text'])
test_headline_tfidf = headline_vectorizer.transform(test_df['Headline']) 
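
The fit/transform split above can be illustrated on a toy corpus. This is a minimal sketch, not part of the original pipeline: the `toy_train` and `toy_test` sentences and the `toy_vectorizer` name are made up for illustration. It shows that the vocabulary learned by `fit_transform` fixes the column layout, and that terms unseen during fitting are simply dropped by `transform`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy data, used only to illustrate fit_transform vs. transform
toy_train = ["police deny reports", "apple watch gold edition", "meteorite crashed nearby"]
toy_test = ["apple stores keep gold watch", "completely unrelated sentence"]

toy_vectorizer = TfidfVectorizer(ngram_range=(1, 2), lowercase=True,
                                 stop_words='english', max_features=1024)

# The vocabulary (unigrams and bigrams) is learned from the training text only
toy_train_tfidf = toy_vectorizer.fit_transform(toy_train)

# The test text is projected onto that same vocabulary
toy_test_tfidf = toy_vectorizer.transform(toy_test)

# Both matrices have one column per vocabulary entry
print(toy_train_tfidf.shape, toy_test_tfidf.shape)

# The second test sentence shares no terms with the training text,
# so its row contains no non-zero entries
print(toy_test_tfidf[1].nnz)
```

Fitting on the training data only keeps information from the test set out of the vocabulary and the IDF weights.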

## Cosine Similarity Features

In [9]:
#Cosine Similarity
def get_cosine_similarity(body_tfidf,headline_tfidf):
    cosine_features = []
    #body_tfidf and headline_tfidf must have the same number of rows
    for i in tqdm(range(body_tfidf.shape[0])):
        #Compare row i of the body matrix with row i of the headline matrix
        cosine_features.append(cosine_similarity(body_tfidf[i],headline_tfidf[i])[0][0])
    return np.array(cosine_features).reshape(body_tfidf.shape[0],1)
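
The per-row loop can take a long time on tens of thousands of rows. One possible alternative, sketched here under assumptions, computes all row-wise similarities at once: L2-normalize each row, multiply the two matrices element-wise, and sum across columns. The function name `rowwise_cosine_similarity` is hypothetical and not part of the original notebook. It assumes both matrices have the same shape; the comparison is most meaningful when the columns also refer to the same terms, which two separately fitted vectorizers do not guarantee.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import normalize

def rowwise_cosine_similarity(a_tfidf, b_tfidf):
    """Cosine similarity between row i of a_tfidf and row i of b_tfidf.

    Hypothetical helper: returns an array of shape (n_rows, 1).
    """
    # L2-normalize each row; all-zero rows are left as zeros
    a_norm = normalize(sparse.csr_matrix(a_tfidf), norm='l2', axis=1)
    b_norm = normalize(sparse.csr_matrix(b_tfidf), norm='l2', axis=1)
    # The dot product of unit vectors equals their cosine similarity
    sims = a_norm.multiply(b_norm).sum(axis=1)
    return np.asarray(sims).reshape(-1, 1)

# Small usage example with made-up values
a = sparse.csr_matrix(np.array([[1.0, 0.0, 2.0], [0.0, 0.0, 0.0], [0.0, 3.0, 4.0]]))
b = sparse.csr_matrix(np.array([[2.0, 0.0, 4.0], [1.0, 1.0, 1.0], [3.0, 0.0, 0.0]]))
print(rowwise_cosine_similarity(a, b))
```

Because the result has shape `(n_rows, 1)`, it can be used wherever the output of `get_cosine_similarity` is expected.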

In [45]:
#Leave this cell commented out unless you need to re-calculate the cosine
#similarity features. The results are cached in the pickle files
#train_cosine_features.p and test_cosine_features.p

#train_cosine_features = get_cosine_similarity(train_body_tfidf,train_headline_tfidf)
#test_cosine_features = get_cosine_similarity(test_body_tfidf,test_headline_tfidf)

#pickle.dump(train_cosine_features,open('train_cosine_features.p','wb'))
#pickle.dump(test_cosine_features,open('test_cosine_features.p','wb'))

In [46]:
train_cosine_features = pickle.load(open('train_cosine_features.p','rb'))
test_cosine_features = pickle.load(open('test_cosine_features.p','rb'))

FileNotFoundError: [Errno 2] No such file or directory: 'train_cosine_features.p'
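
The error above occurs when the pickle files have not been generated yet. One way to avoid toggling commented-out cells is a small load-or-compute helper. This is only a sketch: `load_or_compute` is a hypothetical name, not something defined in the original notebook or in any library.

```python
import os
import pickle

def load_or_compute(path, compute_fn):
    """Load a pickled result from path, or compute and cache it.

    Hypothetical helper: compute_fn is a zero-argument callable that is
    only invoked when the cache file does not exist yet.
    """
    if os.path.exists(path):
        with open(path, 'rb') as f:
            return pickle.load(f)
    result = compute_fn()
    with open(path, 'wb') as f:
        pickle.dump(result, f)
    return result

# Possible usage with the names from this notebook (left commented out
# because it depends on the TF-IDF matrices computed earlier):
# train_cosine_features = load_or_compute(
#     'train_cosine_features.p',
#     lambda: get_cosine_similarity(train_body_tfidf, train_headline_tfidf))
```

To force a re-calculation with this approach, delete the corresponding `.p` file before running the cell.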

## Hand Selected Features (Baseline Features)

In [49]:
train_hand_features = baseline_features.hand_features(train_df['Headline'],train_df['Body_Text'])





In [50]:
test_hand_features = baseline_features.hand_features(test_df['Headline'],test_df['Body_Text'])




25413it [02:32, 166.22it/s]

In [51]:
train_hand_features = np.array(train_hand_features)
test_hand_features = np.array(test_hand_features)

## Word Overlap Features (Baseline Feature)

In [24]:
train_overlap_features = baseline_features.word_overlap_features(train_df['Headline'],train_df['Body_Text'])



In [25]:
test_overlap_features = baseline_features.word_overlap_features(test_df['Headline'],test_df['Body_Text'])



## Polarity Features (Baseline Feature)

In [37]:
train_polarity_features = baseline_features.polarity_features(train_df['Headline'],train_df['Body_Text'])




49972it [04:16, 194.57it/s]

In [38]:
test_polarity_features = baseline_features.polarity_features(test_df['Headline'],test_df['Body_Text'])
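
As a rough illustration of what a polarity feature can look like: assuming `baseline_features.polarity_features` follows the FNC-1 baseline, each text is reduced to the parity (odd or even) of the number of refuting words it contains, once for the headline and once for the body. The word list and function names below are hypothetical stand-ins, not the module's actual contents.

```python
import re

# Illustrative subset only (an assumption); the real list lives in baseline_features.
REFUTING_WORDS = {"fake", "fraud", "hoax", "false", "deny", "denies", "not", "debunk"}

def polarity(text):
    """Return 1 if the text contains an odd number of refuting words, else 0."""
    tokens = re.findall(r"\w+", text.lower())
    return sum(t in REFUTING_WORDS for t in tokens) % 2

def polarity_pair(headline, body):
    """Two-element feature: [headline polarity, body polarity]."""
    return [polarity(headline), polarity(body)]

# Headline has two refuting words ("not", "hoax"), body has one ("deny")
print(polarity_pair("Report is not a hoax", "Officials deny the report"))  # [0, 1]
```

The parity trick treats a double negation ("not a hoax") as non-refuting, which is crude but cheap to compute.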





## Refuting Features (Baseline Feature)

In [39]:
train_refuting_features = baseline_features.refuting_features(train_df['Headline'],train_df['Body_Text'])





In [40]:
test_refuting_features = baseline_features.refuting_features(test_df['Headline'],test_df['Body_Text'])
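
For intuition, a refuting feature can be sketched as a binary vector with one slot per refuting word, set to 1 when that word occurs in the headline. This assumes `baseline_features.refuting_features` behaves like the FNC-1 baseline, which inspects only the headline; the word list and `refuting_vector` are hypothetical.

```python
import re

# Illustrative word list (an assumption, not the module's actual list).
REFUTING_WORDS = ["fake", "fraud", "hoax", "false", "deny", "denies", "not"]

def refuting_vector(headline):
    """One binary indicator per refuting word, based on the headline tokens."""
    tokens = set(re.findall(r"\w+", headline.lower()))
    return [1 if word in tokens else 0 for word in REFUTING_WORDS]

print(refuting_vector("Police deny hoax claims"))  # [0, 0, 1, 0, 1, 0, 0]
```

Because the vector has a fixed length, it can be stacked column-wise with the other feature blocks.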





## Concatenate Feature Vectors

In [None]:
train_features1 = hstack([train_body_tfidf,train_headline_tfidf,train_hand_features,train_cosine_features])
test_features1 = hstack([test_body_tfidf,test_headline_tfidf,test_hand_features,test_cosine_features])
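
The stacking step can be checked on toy data. The sketch below uses made-up blocks (not the notebook's matrices) to show how `scipy.sparse.hstack` joins a sparse TF-IDF-like block with dense hand-crafted columns. Wrapping the dense blocks in `csr_matrix` and reshaping a 1-D similarity vector into a column are precautions assumed here, not steps taken from the notebook.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack

# Toy stand-ins for three samples (all values invented for illustration).
tfidf_block = csr_matrix(np.array([[0.0, 0.5, 0.0],
                                   [0.7, 0.0, 0.1],
                                   [0.0, 0.0, 0.9]]))
hand_block = np.array([[1, 0], [0, 1], [1, 1]])   # dense binary features
cosine_block = np.array([0.2, 0.8, 0.5])          # one similarity per sample

# Every block must have the same number of rows; a 1-D vector becomes a column.
features = hstack([
    tfidf_block,
    csr_matrix(hand_block),
    csr_matrix(cosine_block.reshape(-1, 1)),
]).tocsr()

print(features.shape)  # (3, 6)
```

Converting the result to CSR keeps row slicing cheap, which helps if the matrix is later split or indexed by sample.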

# Classification

## Extract Labels

In [54]:
train_labels = list(train_df['Stance'])
test_labels = list(test_df['Stance'])

## Run Classifiers and Score Output

In [None]:
names = ["Random Forest", "Multinomial Naive Bayes", "Gradient Boosting", "K Nearest Neighbors", "Linear SVM", "Decision Tree", "Logistic Regression"]

#One classifier per entry in names, in the same order
classifiers = [
    RandomForestClassifier(n_estimators=10),
    MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True),
    GradientBoostingClassifier(n_estimators=200, random_state=14128, verbose=True),
    KNeighborsClassifier(4),
    SVC(kernel="linear", C=0.025),
    DecisionTreeClassifier(max_depth=5),
    LogisticRegression(C=1e5),
]

for n, clf in zip(names, classifiers):
    print(n)
    #Fit on the concatenated training features, then predict stances for the test set
    y_pred = clf.fit(train_features1, train_labels).predict(test_features1)
    print(score.report_score(test_labels, y_pred))
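
The `score` module is not shown here. Assuming it implements the weighted FNC-1 competition metric, a prediction earns 0.25 for getting the related/unrelated decision right and a further 0.75 for matching the exact stance of a related pair. The sketch below reproduces that weighting under this assumption; `fnc_score` and `max_score` are hypothetical names.

```python
RELATED = {"agree", "disagree", "discuss"}

def fnc_score(gold, predicted):
    """Weighted score: 0.25 for related/unrelated, 0.75 more for the exact related stance."""
    total = 0.0
    for g, p in zip(gold, predicted):
        if g == p:
            total += 0.25
            if g != "unrelated":
                total += 0.50
        if g in RELATED and p in RELATED:
            total += 0.25
    return total

def max_score(gold):
    """Best achievable score: 1.0 per related pair, 0.25 per unrelated pair."""
    return sum(1.0 if g in RELATED else 0.25 for g in gold)

gold = ["unrelated", "agree", "discuss", "disagree"]
pred = ["unrelated", "agree", "agree", "unrelated"]
print(fnc_score(gold, pred), max_score(gold))  # 1.5 3.25
```

Dividing the score by its maximum gives a relative score, which is easier to compare across classifiers than raw accuracy because unrelated pairs dominate the dataset.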