## Baseline Features Implementation + TF-IDF

The following sources were used to construct this Jupyter Notebook:

* [Numpy: Dot Multiplication, Vstack, Hstack, Flatten](https://www.youtube.com/watch?v=nkO6bmp511M)
* [Scikit Learn TF-IDF Feature Extraction and Latent Semantic Analysis](https://www.youtube.com/watch?v=BJ0MnawUpaU)
* [Fake News Challenge TF-IDF Baseline](https://github.com/gmyrianthous/fakenewschallenge/blob/master/baseline.py)
* [Python TF-IDF Algorithm Built From Scratch](https://www.youtube.com/watch?v=hXNbFNCgPfY)
* [Theory Behind TF-IDF](https://www.youtube.com/watch?v=4vT4fzjkGCQ)

In [24]:
#Import all required modules
from pandas import DataFrame, read_csv
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import score
from scipy.spatial.distance import cosine
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.feature_extraction.text import TfidfVectorizer
from scipy.sparse import hstack
import baseline_features

In [5]:
#Import data from CSV file and create a dataframe
def create_dataframe(filename):
    #Read file into a pandas dataframe
    df = pd.read_csv(filename)
    #Replace spaces in column names with underscores
    df.columns = [c.replace(' ', '_') for c in df.columns]
    return df

In [6]:
#Create dataframes for both training and testing sets
train_df_tmp = create_dataframe('train_stances.csv')
test_df_tmp = create_dataframe('competition_test_stances.csv')
train_bodies_df = create_dataframe('train_bodies.csv')
test_bodies_df = create_dataframe('test_bodies.csv')

train_df_tmp.head(5)

| | Headline | Body_ID | Stance |
|---|---|---|---|
| 0 | Police find mass graves with at least '15 bodi... | 712 | unrelated |
| 1 | Hundreds of Palestinians flee floods in Gaza a... | 158 | agree |
| 2 | Christian Bale passes on role of Steve Jobs, a... | 137 | unrelated |
| 3 | HBO and Apple in Talks for $15/Month Apple TV ... | 1034 | unrelated |
| 4 | Spider burrowed through tourist's stomach and ... | 1923 | disagree |


In [7]:
train_df = pd.merge(train_df_tmp,
                 train_bodies_df[['Body_ID', 'articleBody']],
                 on='Body_ID')

test_df = pd.merge(test_df_tmp,
                 test_bodies_df[['Body_ID', 'articleBody']],
                 on='Body_ID')

train_df = train_df.rename(columns={'articleBody': 'Body_Text'})
test_df = test_df.rename(columns={'articleBody': 'Body_Text'})
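
The merge above attaches each article body to its headline rows through `Body_ID`. A minimal sketch of the same join on toy data follows, assuming each `Body_ID` appears exactly once in the bodies table. The toy frames and the `rows_preserved` check are illustrative and not part of the original notebook; `validate` and `how` are standard `pd.merge` parameters.

```python
import pandas as pd

# Toy stand-ins for the stances and bodies tables (hypothetical data).
stances = pd.DataFrame({
    'Headline': ['h1', 'h2', 'h3'],
    'Body_ID': [0, 0, 7],
    'Stance': ['unrelated', 'agree', 'discuss'],
})
bodies = pd.DataFrame({
    'Body_ID': [0, 7],
    'articleBody': ['body zero', 'body seven'],
})

# validate='many_to_one' raises a MergeError if a Body_ID is duplicated in bodies.
merged = pd.merge(stances, bodies[['Body_ID', 'articleBody']],
                  on='Body_ID', how='inner', validate='many_to_one')
merged = merged.rename(columns={'articleBody': 'Body_Text'})

# An inner join silently drops headlines whose Body_ID has no body,
# so comparing row counts is a cheap sanity check.
rows_preserved = len(merged) == len(stances)
```

If `rows_preserved` is false on the real data, some headlines reference a body that is missing from the bodies file.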

In [8]:
test_df.sort_values(by=['Body_ID']).head(5)

| | Headline | Body_ID | Stance | Body_Text |
|---|---|---|---|---|
| 7305 | Apple to keep gold Watch Editions in special i... | 1 | unrelated | Al-Sisi has denied Israeli reports stating tha... |
| 7303 | Apple installing safes in-store to protect gol... | 1 | unrelated | Al-Sisi has denied Israeli reports stating tha... |
| 7304 | El-Sisi denies claims he'll give Sinai land to... | 1 | agree | Al-Sisi has denied Israeli reports stating tha... |
| 7306 | Apple Stores to Keep Gold “Edition” Apple Watc... | 1 | unrelated | Al-Sisi has denied Israeli reports stating tha... |
| 7307 | South Korean woman's hair 'eaten' by robot vac... | 1 | unrelated | Al-Sisi has denied Israeli reports stating tha... |


In [9]:
train_df.sort_values(by=['Body_ID']).head(5)

| | Headline | Body_ID | Stance | Body_Text |
|---|---|---|---|---|
| 41651 | Soldier shot, Parliament locked down after gun... | 0 | unrelated | A small meteorite crashed into a wooded area i... |
| 41657 | Italian catches huge wels catfish; is it a rec... | 0 | unrelated | A small meteorite crashed into a wooded area i... |
| 41658 | Not coming to a store near you: The pumpkin sp... | 0 | unrelated | A small meteorite crashed into a wooded area i... |
| 41659 | One gunman killed in shooting on Parliament Hi... | 0 | unrelated | A small meteorite crashed into a wooded area i... |
| 41660 | Soldier shot at war memorial in Canada | 0 | unrelated | A small meteorite crashed into a wooded area i... |


In [31]:
#Apply Scikit Learn TFIDF Feature Extraction Algorithm
body_text_vectorizer = TfidfVectorizer(ngram_range=(1, 2), lowercase=True,
                                       stop_words='english', max_features=1024)
headline_vectorizer = TfidfVectorizer(ngram_range=(1, 2), lowercase=True,
                                      stop_words='english', max_features=1024)

#Create vocabulary based on training data
train_body_tfidf = body_text_vectorizer.fit_transform(train_df['Body_Text'])
train_headline_tfidf = headline_vectorizer.fit_transform(train_df['Headline'])

#Use vocabulary for testing data
test_body_tfidf = body_text_vectorizer.transform(test_df['Body_Text'])
test_headline_tfidf = headline_vectorizer.transform(test_df['Headline'])
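
The cell above fits each vectorizer on the training text only and then reuses that vocabulary for the test text. The sketch below shows the same pattern on a toy corpus; the example sentences are made up for illustration. A test document that shares no terms with the training vocabulary comes out as an all-zero row.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpora standing in for the training and test text columns.
train_texts = ['police find mass graves',
               'apple watch gold edition',
               'police investigate apple store']
test_texts = ['gold apple watch stolen',
              'meteorite crashed nicaragua']

vectorizer = TfidfVectorizer(ngram_range=(1, 2), lowercase=True,
                             stop_words='english', max_features=1024)

# fit_transform learns the vocabulary and IDF weights from the training data.
train_tfidf = vectorizer.fit_transform(train_texts)
# transform reuses them, so both matrices have the same columns.
test_tfidf = vectorizer.transform(test_texts)

n_features = len(vectorizer.vocabulary_)
# Number of non-zero entries in the second test row (all terms unseen in training).
unseen_row_nnz = test_tfidf[1].nnz
```

Calling `fit_transform` on the test set instead would build a different vocabulary, and the resulting columns would no longer line up with the features the classifier was trained on.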

In [27]:
train_hand_features = baseline_features.hand_features(train_df['Headline'],train_df['Body_Text'])



In [26]:
test_hand_features = baseline_features.hand_features(test_df['Headline'],test_df['Body_Text'])



In [28]:
train_hand_features = np.array(train_hand_features)
test_hand_features = np.array(test_hand_features)

In [32]:
train_features = hstack([train_body_tfidf,train_headline_tfidf,train_hand_features])
test_features = hstack([test_body_tfidf,test_headline_tfidf,test_hand_features])
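
The `hstack` call above joins the two sparse TF-IDF matrices and the dense hand-feature array column-wise, so every row holds all features for one headline/body pair. Below is a small sketch with made-up values. Wrapping the dense block in `csr_matrix` and requesting `format='csr'` are defensive choices made here, not something the original cell does.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack, issparse

# Hypothetical 2-row blocks: a sparse TF-IDF-like block and a dense hand-feature block.
tfidf_block = csr_matrix(np.array([[0.0, 0.5, 0.0],
                                   [0.7, 0.0, 0.0]]))
hand_block = np.array([[1, 0],
                       [0, 3]])

# All blocks must have the same number of rows; columns are concatenated in order.
combined = hstack([tfidf_block, csr_matrix(hand_block)], format='csr')

combined_shape = combined.shape        # rows unchanged, columns added together
still_sparse = issparse(combined)      # the result stays sparse
```

The CSR format supports efficient row slicing, which is convenient if the combined matrix is later split into batches or folds.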

In [33]:
#Extract training and test labels
train_labels = list(train_df['Stance'])
test_labels = list(test_df['Stance'])

In [36]:
#Initialize random forest classifier (Scikit Learn)
rf_classifier = RandomForestClassifier(n_estimators=10)

y_pred = rf_classifier.fit(train_features, train_labels).predict(test_features)

score.report_score(test_labels, y_pred)

-------------------------------------------------------------
|           |   agree   | disagree  |  discuss  | unrelated |
-------------------------------------------------------------
|   agree   |    801    |     4     |    702    |    396    |
-------------------------------------------------------------
| disagree  |    229    |     0     |    184    |    284    |
-------------------------------------------------------------
|  discuss  |    960    |     9     |   2655    |    840    |
-------------------------------------------------------------
| unrelated |    82     |     1     |    493    |   17773   |
-------------------------------------------------------------
Score: 8421.25 out of 11651.25	(72.2776526124%)


72.27765261238065
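
The score printed above is a weighted total, not plain accuracy. The sketch below assumes that `score.report_score` follows the Fake News Challenge weighting: 0.25 for getting related vs. unrelated right, plus 0.75 more for the exact stance on related pairs. The function `fnc_score` is a hypothetical re-implementation written for this illustration; it is not the `score` module itself. Rebuilding label lists from the confusion matrix above (rows read as true labels, columns as predictions) gives the same totals as the printed report.

```python
RELATED = {'agree', 'disagree', 'discuss'}

def fnc_score(actual, predicted):
    """Weighted score under the assumed Fake News Challenge scheme."""
    total = 0.0
    for a, p in zip(actual, predicted):
        # 0.25 for classifying the pair correctly as related or unrelated.
        if (a in RELATED) == (p in RELATED):
            total += 0.25
        # A further 0.75 for the exact stance of a related pair.
        if a in RELATED and a == p:
            total += 0.75
    return total

# Confusion matrix copied from the random forest report above.
labels = ['agree', 'disagree', 'discuss', 'unrelated']
confusion = [[801, 4, 702, 396],
             [229, 0, 184, 284],
             [960, 9, 2655, 840],
             [82, 1, 493, 17773]]

# Expand the counts back into parallel label lists.
actual, predicted = [], []
for i, row in enumerate(confusion):
    for j, count in enumerate(row):
        actual += [labels[i]] * count
        predicted += [labels[j]] * count

achieved = fnc_score(actual, predicted)   # 8421.25
best = fnc_score(actual, actual)          # 11651.25
```

Under this weighting, the large `unrelated` class contributes only a quarter point per correct prediction, so most of the achievable score depends on telling `agree`, `disagree`, and `discuss` apart.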

In [37]:
#Initialize multinomial naive Bayes classifier (Scikit Learn)
nb_classifier = MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True)

y_pred = nb_classifier.fit(train_features,train_labels).predict(test_features)

score.report_score(test_labels, y_pred)

-------------------------------------------------------------
|           |   agree   | disagree  |  discuss  | unrelated |
-------------------------------------------------------------
|   agree   |   1033    |     9     |    603    |    258    |
-------------------------------------------------------------
| disagree  |    295    |     3     |    184    |    215    |
-------------------------------------------------------------
|  discuss  |   1071    |    23     |   2724    |    646    |
-------------------------------------------------------------
| unrelated |    278    |     1     |    682    |   17388   |
-------------------------------------------------------------
Score: 8653.25 out of 11651.25	(74.268855273%)


74.26885527303938

In [None]:
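#Add random forest predictions to test dataframe
#(y_pred currently holds the naive Bayes output, so predict again with the fitted random forest)
test_df['RF_Predicted_Stance'] = list(rf_classifier.predict(test_features))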

In [None]:
test_df2 = test_df[['Headline','Body_Text','RF_Predicted_Stance','Stance']]

In [None]:
test_df2[test_df2['RF_Predicted_Stance'] == 'unrelated']
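
Filtering on a single predicted label, as above, shows one slice of the results. A short sketch of two complementary views on toy data: the rows where prediction and true stance disagree, and a cross-tabulation of true against predicted labels. The column name `Predicted_Stance` and the sample rows are illustrative assumptions.

```python
import pandas as pd

# Hypothetical results frame with true and predicted stances.
results = pd.DataFrame({
    'Stance': ['agree', 'discuss', 'unrelated', 'unrelated', 'disagree'],
    'Predicted_Stance': ['discuss', 'discuss', 'unrelated', 'agree', 'unrelated'],
})

# Rows the classifier got wrong.
errors = results[results['Stance'] != results['Predicted_Stance']]

# Counts of each (true, predicted) pair; rows are true labels, columns are predictions.
confusion = pd.crosstab(results['Stance'], results['Predicted_Stance'])
```

`pd.crosstab` only creates columns for labels that were actually predicted, so a class the model never outputs simply has no column.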

In [None]:
#Initialize random forest classifier (Scikit Learn)
rf_classifier = RandomForestClassifier(n_estimators=10)

y_pred = rf_classifier.fit(train_features, train_labels).predict(test_features)

score.report_score(test_labels, y_pred)
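
Because `RandomForestClassifier` is created without a `random_state`, rerunning the cell above can give a slightly different confusion matrix and score each time. The sketch below, on synthetic data, shows that fixing `random_state` makes two independent fits agree. The seed value 42 and the toy data are arbitrary choices made for this illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic data (hypothetical): 200 samples, 5 features, a simple binary target.
rng = np.random.RandomState(0)
X = rng.rand(200, 5)
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

# Two separately constructed forests with the same seed.
first = RandomForestClassifier(n_estimators=10, random_state=42).fit(X, y).predict(X)
second = RandomForestClassifier(n_estimators=10, random_state=42).fit(X, y).predict(X)

# With a fixed seed the bootstrap samples and feature splits repeat exactly.
identical = bool(np.array_equal(first, second))
```

Fixing the seed makes reported scores reproducible across notebook runs, which helps when comparing the random forest against the naive Bayes result.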