# Assessing subreddit origins via classification models.

Using CountVectorizer and Random Forest models, we will predict which subreddit a sample of text originated from.
The two subreddits we will be using are AskScience and AskEngineers.

### Start by importing libraries

The first step is to import the following libraries. These were identified during exploration and moved to the top by convention.

In [7]:
import requests
import pandas as pd
import numpy as np


from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

pd.set_option('display.max_columns', 999)

## Collecting data
Using the Pushshift API, we will collect live subreddit data to build a model on.

#### AskEngineers Data

In [8]:
url = "https://api.pushshift.io/reddit/search/submission/?subreddit=AskEngineers&size=500&selftext:not='[removed]'&is_video=False"

req = requests.get(url)

engineers_df  = pd.DataFrame(req.json()['data'])

#### AskScience Data

In [9]:
url1 = "https://api.pushshift.io/reddit/search/submission/?subreddit=AskScience&size=500&selftext:not='[removed]'&is_video=False"

req1 = requests.get(url1)

science_df  = pd.DataFrame(req1.json()['data'])
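The two requests above differ only in the subreddit name, so the URL could be built by a small helper. The sketch below is an illustration, not part of the original notebook: `build_submission_url` is a hypothetical name, and it assumes the API accepts percent-encoded query parameters, which is how `urlencode` writes characters such as `:` and `[`.

```python
from urllib.parse import urlencode

# Endpoint copied from the cells above.
BASE_URL = "https://api.pushshift.io/reddit/search/submission/"


def build_submission_url(subreddit, size=500):
    """Hypothetical helper: build the query URL for one subreddit."""
    query = {
        "subreddit": subreddit,
        "size": size,
        "selftext:not": "'[removed]'",
        "is_video": "False",
    }
    # urlencode percent-encodes special characters in keys and values
    # (assumption: the API decodes them the same way as the raw URL).
    return BASE_URL + "?" + urlencode(query)


science_url = build_submission_url("AskScience")
engineers_url = build_submission_url("AskEngineers")
print(science_url)
```

Either URL can then be passed to `requests.get` exactly as in the cells above.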

## Concat Data

Next we will combine the two DataFrames via concatenation and reset the index. (This will be necessary later for merging data.)

In [10]:
testing_df = pd.concat([science_df, engineers_df],sort=True)
testing_df = testing_df.reset_index().drop('index',axis=1)
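As a side note, `pd.concat` can renumber the rows itself through its `ignore_index` argument, which should give the same result as resetting and dropping the old index. A minimal sketch on toy data (the frames `first` and `second` are made up for illustration):

```python
import pandas as pd

# Two small frames standing in for science_df and engineers_df.
first = pd.DataFrame({"title": ["a", "b"], "score": [1, 2]})
second = pd.DataFrame({"title": ["c", "d"], "score": [3, 4]})

# ignore_index=True discards the original row labels and renumbers from 0.
combined = pd.concat([first, second], ignore_index=True)
print(combined.index.tolist())  # [0, 1, 2, 3]
```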

## I set the pandas option to display more columns so I could explore the data better

In [11]:
testing_df.head()

Unnamed: 0,all_awardings,allow_live_comments,author,author_cakeday,author_created_utc,author_flair_background_color,author_flair_css_class,author_flair_richtext,author_flair_template_id,author_flair_text,author_flair_text_color,author_flair_type,author_fullname,author_patreon_flair,author_premium,awarders,banned_by,can_mod_post,category,content_categories,contest_mode,created_utc,crosspost_parent,crosspost_parent_list,domain,edited,full_link,gilded,gildings,id,is_crosspostable,is_meta,is_original_content,is_reddit_media_domain,is_robot_indexable,is_self,is_video,link_flair_background_color,link_flair_css_class,link_flair_richtext,link_flair_template_id,link_flair_text,link_flair_text_color,link_flair_type,locked,media_embed,media_only,no_follow,num_comments,num_crossposts,og_description,og_title,over_18,parent_whitelist_status,permalink,pinned,post_hint,preview,pwls,removal_reason,removed_by,removed_by_category,retrieved_on,score,secure_media_embed,selftext,send_replies,spoiler,steward_reports,stickied,subreddit,subreddit_id,subreddit_subscribers,subreddit_type,suggested_sort,thumbnail,title,total_awards_received,updated_utc,url,whitelist_status,wls
0,[],False,[deleted],,,,,,,,dark,,,,,[],moderators,False,,,False,1580270334,,,self.askscience,,https://www.reddit.com/r/askscience/comments/e...,,{},evhfdb,False,False,False,False,False,True,False,#ccccff,astro,[],26929b46-8971-11e1-aa3a-12313d096aae,Astronomy,dark,text,False,,False,True,2,0,,,False,all_ads,/r/askscience/comments/evhfdb/i_dont_know_if_t...,False,,,6,,,deleted,1580271371,1,,,True,False,,False,askscience,t5_2qm4e,18550810,public,,default,I don’t know if this counts as one but...,0.0,,https://www.reddit.com/r/askscience/comments/e...,all_ads,6
1,[],False,[deleted],,,,,,,,dark,,,,,[],moderators,False,,,False,1580269517,,,self.askscience,,https://www.reddit.com/r/askscience/comments/e...,,{},evh97v,False,False,False,False,False,True,False,#cc99ff,physics,[],e8738d5c-8970-11e1-9266-12313d2c1af1,Physics,dark,text,False,,False,True,2,0,,,False,all_ads,/r/askscience/comments/evh97v/proton_decay/,False,,,6,,,deleted,1580270722,1,,,True,False,,False,askscience,t5_2qm4e,18550762,public,,default,Proton Decay,0.0,,https://www.reddit.com/r/askscience/comments/e...,all_ads,6
2,[],False,[deleted],,,,,,,,dark,,,,,[],moderators,False,,,False,1580258983,,,self.askscience,,https://www.reddit.com/r/askscience/comments/e...,,{},evezk5,False,False,False,False,False,True,False,#aaddaa,med,[],78037cd8-1e22-11e3-b776-12313d096169,Medicine,dark,text,False,,False,True,2,0,,,False,all_ads,/r/askscience/comments/evezk5/as_a_26_year_old...,False,,,6,,,deleted,1580262310,1,,,True,False,,False,askscience,t5_2qm4e,18550333,public,,default,"As a 26 year old, should I be worried about dy...",0.0,,https://www.reddit.com/r/askscience/comments/e...,all_ads,6
3,[],False,[deleted],,,,,,,,dark,,,,,[],moderators,False,,,False,1580257513,,,self.askscience,,https://www.reddit.com/r/askscience/comments/e...,,{},evempc,False,False,False,False,False,True,False,#ccff99,neuro,[],3f105c74-dfa7-11e3-99f2-12313b0b31f5,Neuroscience,dark,text,False,,False,True,0,0,,,False,all_ads,/r/askscience/comments/evempc/are_different_el...,False,,,6,,,deleted,1580261134,1,,,True,False,,False,askscience,t5_2qm4e,18550274,public,,default,Are different electroshock therapy treatments ...,0.0,,https://www.reddit.com/r/askscience/comments/e...,all_ads,6
4,[],False,[deleted],,,,,,,,dark,,,,,[],moderators,False,,,False,1580254317,,,self.askscience,,https://www.reddit.com/r/askscience/comments/e...,,{},evduz5,False,False,False,False,False,True,False,#b4d8ff,geo,[],39fec81c-8971-11e1-a59a-12313d2c1af1,Planetary Sci.,dark,text,False,,False,True,0,0,,,False,all_ads,/r/askscience/comments/evduz5/how_does_the_atm...,False,,,6,,,deleted,1580258516,1,,,True,False,,False,askscience,t5_2qm4e,18550201,public,,default,How does the atmosphere...work?,0.0,,https://www.reddit.com/r/askscience/comments/e...,all_ads,6


In [12]:
testing_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 82 columns):
all_awardings                    933 non-null object
allow_live_comments              827 non-null object
author                           1000 non-null object
author_cakeday                   7 non-null object
author_created_utc               1 non-null float64
author_flair_background_color    428 non-null object
author_flair_css_class           63 non-null object
author_flair_richtext            705 non-null object
author_flair_template_id         22 non-null object
author_flair_text                153 non-null object
author_flair_text_color          448 non-null object
author_flair_type                705 non-null object
author_fullname                  705 non-null object
author_patreon_flair             705 non-null object
author_premium                   557 non-null object
awarders                         732 non-null object
banned_by                        293 non-null obje

## I did a partial cleaning
I set a threshold on the number of non-null values per column (```thresh=707```) so that sparsely populated columns are dropped while the ```'selftext'``` column is kept.

In [13]:
part_cleaned_df = testing_df.dropna(axis=1, thresh=707)
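To make the meaning of `thresh` concrete: with `axis=1`, a column is kept only if it has at least `thresh` non-null values. The toy frame below is invented for illustration and uses a threshold of 3 instead of 707.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "mostly_full": ["a", "b", "c", np.nan],        # 3 non-null values
    "mostly_empty": [np.nan, np.nan, np.nan, "x"],  # 1 non-null value
})

# Keep only columns with at least 3 non-null entries.
kept = toy.dropna(axis=1, thresh=3)
print(kept.columns.tolist())  # ['mostly_full']
```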

In [14]:
part_cleaned_df.head()

Unnamed: 0,all_awardings,allow_live_comments,author,awarders,can_mod_post,contest_mode,created_utc,domain,full_link,gildings,id,is_crosspostable,is_meta,is_original_content,is_reddit_media_domain,is_robot_indexable,is_self,is_video,link_flair_background_color,link_flair_richtext,link_flair_text_color,link_flair_type,locked,media_only,no_follow,num_comments,num_crossposts,over_18,parent_whitelist_status,permalink,pinned,pwls,retrieved_on,score,selftext,send_replies,spoiler,stickied,subreddit,subreddit_id,subreddit_subscribers,subreddit_type,thumbnail,title,total_awards_received,url,whitelist_status,wls
0,[],False,[deleted],[],False,False,1580270334,self.askscience,https://www.reddit.com/r/askscience/comments/e...,{},evhfdb,False,False,False,False,False,True,False,#ccccff,[],dark,text,False,False,True,2,0,False,all_ads,/r/askscience/comments/evhfdb/i_dont_know_if_t...,False,6,1580271371,1,,True,False,False,askscience,t5_2qm4e,18550810,public,default,I don’t know if this counts as one but...,0.0,https://www.reddit.com/r/askscience/comments/e...,all_ads,6
1,[],False,[deleted],[],False,False,1580269517,self.askscience,https://www.reddit.com/r/askscience/comments/e...,{},evh97v,False,False,False,False,False,True,False,#cc99ff,[],dark,text,False,False,True,2,0,False,all_ads,/r/askscience/comments/evh97v/proton_decay/,False,6,1580270722,1,,True,False,False,askscience,t5_2qm4e,18550762,public,default,Proton Decay,0.0,https://www.reddit.com/r/askscience/comments/e...,all_ads,6
2,[],False,[deleted],[],False,False,1580258983,self.askscience,https://www.reddit.com/r/askscience/comments/e...,{},evezk5,False,False,False,False,False,True,False,#aaddaa,[],dark,text,False,False,True,2,0,False,all_ads,/r/askscience/comments/evezk5/as_a_26_year_old...,False,6,1580262310,1,,True,False,False,askscience,t5_2qm4e,18550333,public,default,"As a 26 year old, should I be worried about dy...",0.0,https://www.reddit.com/r/askscience/comments/e...,all_ads,6
3,[],False,[deleted],[],False,False,1580257513,self.askscience,https://www.reddit.com/r/askscience/comments/e...,{},evempc,False,False,False,False,False,True,False,#ccff99,[],dark,text,False,False,True,0,0,False,all_ads,/r/askscience/comments/evempc/are_different_el...,False,6,1580261134,1,,True,False,False,askscience,t5_2qm4e,18550274,public,default,Are different electroshock therapy treatments ...,0.0,https://www.reddit.com/r/askscience/comments/e...,all_ads,6
4,[],False,[deleted],[],False,False,1580254317,self.askscience,https://www.reddit.com/r/askscience/comments/e...,{},evduz5,False,False,False,False,False,True,False,#b4d8ff,[],dark,text,False,False,True,0,0,False,all_ads,/r/askscience/comments/evduz5/how_does_the_atm...,False,6,1580258516,1,,True,False,False,askscience,t5_2qm4e,18550201,public,default,How does the atmosphere...work?,0.0,https://www.reddit.com/r/askscience/comments/e...,all_ads,6


In [15]:
part_cleaned_df.tail()

Unnamed: 0,all_awardings,allow_live_comments,author,awarders,can_mod_post,contest_mode,created_utc,domain,full_link,gildings,id,is_crosspostable,is_meta,is_original_content,is_reddit_media_domain,is_robot_indexable,is_self,is_video,link_flair_background_color,link_flair_richtext,link_flair_text_color,link_flair_type,locked,media_only,no_follow,num_comments,num_crossposts,over_18,parent_whitelist_status,permalink,pinned,pwls,retrieved_on,score,selftext,send_replies,spoiler,stickied,subreddit,subreddit_id,subreddit_subscribers,subreddit_type,thumbnail,title,total_awards_received,url,whitelist_status,wls
995,[],False,BiaxialObject48,[],False,False,1579402698,self.AskEngineers,https://www.reddit.com/r/AskEngineers/comments...,{},eqqtac,True,False,False,False,True,True,False,,[],dark,text,False,False,True,2,0,False,all_ads,/r/AskEngineers/comments/eqqtac/rail_gun_with_...,False,6,1579402699,1,Is it possible to make a rail gun where the pr...,True,False,False,AskEngineers,t5_2sebk,156073,public,self,Rail gun with spark gap?,0.0,https://www.reddit.com/r/AskEngineers/comments...,all_ads,6
996,[],False,mixtape-maker,[],False,False,1579393189,self.AskEngineers,https://www.reddit.com/r/AskEngineers/comments...,{},eqoy7c,True,False,False,False,True,True,False,,[],dark,text,False,False,True,2,0,False,all_ads,/r/AskEngineers/comments/eqoy7c/using_a_microc...,False,6,1579393191,1,The post title may sound a little confusing so...,True,False,False,AskEngineers,t5_2sebk,156057,public,self,Using a microcontroller to determine if a path...,0.0,https://www.reddit.com/r/AskEngineers/comments...,all_ads,6
997,[],False,denyingerrors,[],False,False,1579391921,self.AskEngineers,https://www.reddit.com/r/AskEngineers/comments...,{},eqoogs,True,False,False,False,True,True,False,,[],dark,text,False,False,True,7,0,False,all_ads,/r/AskEngineers/comments/eqoogs/my_engineering...,False,6,1579391922,1,I am currently a sophomore in aerospace engine...,True,False,False,AskEngineers,t5_2sebk,156055,public,self,My Engineering internship/Co-op is in a differ...,0.0,https://www.reddit.com/r/AskEngineers/comments...,all_ads,6
998,[],False,Kyleh04,[],False,False,1579391914,self.AskEngineers,https://www.reddit.com/r/AskEngineers/comments...,{},eqooe6,True,False,False,False,True,True,False,,[],dark,text,False,False,True,4,0,False,all_ads,/r/AskEngineers/comments/eqooe6/is_a_raised_pl...,False,6,1579391915,1,"I run a electronics assembly company, and we r...",True,False,False,AskEngineers,t5_2sebk,156055,public,self,Is a raised plywood slab strong enough for an ...,0.0,https://www.reddit.com/r/AskEngineers/comments...,all_ads,6
999,[],False,The-Sober-Stoner,[],False,False,1579391353,self.AskEngineers,https://www.reddit.com/r/AskEngineers/comments...,{},eqojy3,True,False,False,False,True,True,False,,[],dark,text,False,False,True,4,0,False,all_ads,/r/AskEngineers/comments/eqojy3/design_enginee...,False,6,1579391354,1,Some design work is purely desk based with phy...,True,False,False,AskEngineers,t5_2sebk,156055,public,self,"Design Engineers, how important is access to m...",0.0,https://www.reddit.com/r/AskEngineers/comments...,all_ads,6


In [16]:
more_cleaned_df = part_cleaned_df.drop(['domain', 'author', 'wls', 'url', 'subreddit_type', 'subreddit_subscribers', 'subreddit_id', 'pwls', 'permalink', 'parent_whitelist_status', 'over_18', 'media_only', 'is_video', 'is_self', 'is_robot_indexable','gildings', 'full_link', 'contest_mode', 'can_mod_post', 'awarders', 'allow_live_comments', 'all_awardings', 'total_awards_received', 'retrieved_on'], axis=1)

## I removed more columns by hand
The columns were selected for being "incriminating": I wanted to remove any data that automatically identified which subreddit the information came from. The only such column kept was ```'subreddit'```, which will be separated out to use as the ```'y_value'``` later.

In [17]:
more_cleaned_df.head()

Unnamed: 0,created_utc,id,is_crosspostable,is_meta,is_original_content,is_reddit_media_domain,link_flair_background_color,link_flair_richtext,link_flair_text_color,link_flair_type,locked,no_follow,num_comments,num_crossposts,pinned,score,selftext,send_replies,spoiler,stickied,subreddit,thumbnail,title,whitelist_status
0,1580270334,evhfdb,False,False,False,False,#ccccff,[],dark,text,False,True,2,0,False,1,,True,False,False,askscience,default,I don’t know if this counts as one but...,all_ads
1,1580269517,evh97v,False,False,False,False,#cc99ff,[],dark,text,False,True,2,0,False,1,,True,False,False,askscience,default,Proton Decay,all_ads
2,1580258983,evezk5,False,False,False,False,#aaddaa,[],dark,text,False,True,2,0,False,1,,True,False,False,askscience,default,"As a 26 year old, should I be worried about dy...",all_ads
3,1580257513,evempc,False,False,False,False,#ccff99,[],dark,text,False,True,0,0,False,1,,True,False,False,askscience,default,Are different electroshock therapy treatments ...,all_ads
4,1580254317,evduz5,False,False,False,False,#b4d8ff,[],dark,text,False,True,0,0,False,1,,True,False,False,askscience,default,How does the atmosphere...work?,all_ads


## Create X and y for fitting

In [18]:
X = more_cleaned_df.drop('subreddit', axis=1)
y = more_cleaned_df['subreddit']

## Instantiate Count Vectorizer for text analysis

In [84]:
cvec= CountVectorizer(min_df=3,ngram_range=(1,2))

## Replace np.NaN with empty strings so as not to break CountVectorizer

In [20]:
X_clean = X.replace(np.nan, '', regex=True)

## Train Test Split

In order to validate model accuracy, we need to separate out a test set using train test split.

In [21]:
X_train, X_test, y_train, y_test = train_test_split(X_clean, y,
                                                   stratify=y,
                                                   random_state=42)
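The `stratify=y` argument asks the split to keep the class proportions the same in the training and test sets. A small sketch on made-up labels (the names `features` and `labels` are illustrative only) shows the effect with the default 25% test size:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# 8 posts from each class, standing in for the real subreddit labels.
labels = pd.Series(["askscience"] * 8 + ["AskEngineers"] * 8)
features = pd.DataFrame({"n": range(16)})

X_tr, X_te, y_tr, y_te = train_test_split(features, labels,
                                          stratify=labels,
                                          random_state=42)

# The 4 test rows contain 2 posts from each class.
print(y_te.value_counts().to_dict())
```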

## Vectorize
Next we will fit our Count Vectorizer to the training data to generate word tokens for analysis

In [85]:
cvec.fit(X_train['selftext'])

X_vect_train = cvec.transform(X_train['selftext'])
X_vect_test = cvec.transform(X_test['selftext'])
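Fitting on the training text only means the vocabulary is fixed before the test text is seen, so words that appear only in the test set are ignored at transform time. A toy illustration, with made-up documents and default `CountVectorizer` settings instead of the `min_df` and `ngram_range` used above:

```python
from sklearn.feature_extraction.text import CountVectorizer

train_docs = ["proton decay rate", "rail gun design", "proton mass"]
test_docs = ["proton torque"]  # "torque" never appears in the training text

toy_cvec = CountVectorizer()
toy_cvec.fit(train_docs)
test_counts = toy_cvec.transform(test_docs)

vocab = toy_cvec.vocabulary_
print(sorted(vocab))          # tokens learned from the training text
print(test_counts.toarray())  # only "proton" is counted for the test doc
```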

## Build Token DataFrame

The tokenized data is generated as a sparse matrix, so we have to convert it to a readable DataFrame. I'm assigning the token names as the column names so we can explore trends later on.

In [86]:
vect_df_train = pd.DataFrame(columns=cvec.get_feature_names(), 
             data=X_vect_train.todense())
vect_df_test = pd.DataFrame(columns=cvec.get_feature_names(), 
             data=X_vect_test.todense())
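A note for anyone re-running this on a newer scikit-learn: as far as I know, recent releases replaced `get_feature_names()` with `get_feature_names_out()`. The sketch below, on toy documents, picks whichever method is available; treat it as a compatibility hint, not as part of the original analysis.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

docs = ["proton decay", "proton mass"]
toy_cvec = CountVectorizer()
counts = toy_cvec.fit_transform(docs)

# Use the newer accessor when present, otherwise fall back to the old one.
if hasattr(toy_cvec, "get_feature_names_out"):
    names = list(toy_cvec.get_feature_names_out())
else:
    names = toy_cvec.get_feature_names()

# Convert the sparse counts to a dense, labelled DataFrame.
token_df = pd.DataFrame(counts.toarray(), columns=names)
print(token_df)
```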

## Merge Tokens

We have to merge the tokenized data back into our original DataFrame

In [87]:
Z_data_train = pd.merge(X_train.reset_index(drop=True), vect_df_train.reset_index(drop=True), left_index=True, right_index=True)
Z_data_test = pd.merge(X_test.reset_index(drop=True), vect_df_test.reset_index(drop=True), left_index=True, right_index=True)

## Filter for numerical Data

I then filter the data to return just numerical data

In [88]:
Z_train = Z_data_train.select_dtypes(['number'])
Z_test = Z_data_test.select_dtypes(['number'])
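It is worth checking what this filter keeps: `select_dtypes(['number'])` retains every numeric column, which includes any numeric metadata from the original frame alongside the token counts, while text and boolean columns are dropped. A toy example with invented column values:

```python
import pandas as pd

toy = pd.DataFrame({
    "title": ["Proton Decay"],  # text: dropped
    "locked": [False],          # boolean: dropped
    "score": [1],               # numeric metadata: kept
    "proton": [2],              # token count: kept
})

numeric_only = toy.select_dtypes(["number"])
print(numeric_only.columns.tolist())  # ['score', 'proton']
```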

# Model Building
### Random Forest:

The first Model I'm testing is Random Forest

## Instantiate Random Forest Classifier

In [89]:
rf = RandomForestClassifier(n_jobs=6)

## Fit Model

In [90]:
rf.fit(Z_train, y_train)

RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
                       criterion='gini', max_depth=None, max_features='auto',
                       max_leaf_nodes=None, max_samples=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=6,
                       oob_score=False, random_state=None, verbose=0,
                       warm_start=False)

## Score against Test Data

In [91]:
rf.score(Z_test, y_test)

0.948

## This feels like a really high scoring model so...

Instead of testing different models, I will grid search over different numbers of estimators for Random Forest.

## Instantiate Grid Search 
With a range of values passed for the ```n_estimators``` parameter

In [29]:
params = {
    'n_estimators' : list(range(2,25))
}

grid = GridSearchCV(RandomForestClassifier(n_jobs=-1), param_grid=params)

## Fit Gridsearch to Training Data

In [30]:
grid.fit(Z_train, y_train)

GridSearchCV(cv=None, error_score=nan,
             estimator=RandomForestClassifier(bootstrap=True, ccp_alpha=0.0,
                                              class_weight=None,
                                              criterion='gini', max_depth=None,
                                              max_features='auto',
                                              max_leaf_nodes=None,
                                              max_samples=None,
                                              min_impurity_decrease=0.0,
                                              min_impurity_split=None,
                                              min_samples_leaf=1,
                                              min_samples_split=2,
                                              min_weight_fraction_leaf=0.0,
                                              n_estimators=100, n_jobs=-1,
                                              oob_score=False,
                                              rand

## Score Best Estimator against Train and Test

In [31]:
print(grid.score(Z_train, y_train))
print(grid.score(Z_test, y_test))

0.9946666666666667
0.936


## Save Model for Later Use

In [34]:
best_model = grid.best_estimator_
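Besides `best_estimator_`, a fitted `GridSearchCV` also records which parameter value won and its mean cross-validated score. A minimal sketch on synthetic data (the dataset from `make_classification` and the three candidate values are assumptions for illustration, not the notebook's data):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for Z_train / y_train.
X_toy, y_toy = make_classification(n_samples=100, n_features=8,
                                   random_state=42)

toy_grid = GridSearchCV(RandomForestClassifier(random_state=42),
                        param_grid={"n_estimators": [2, 5, 10]},
                        cv=3)
toy_grid.fit(X_toy, y_toy)

print(toy_grid.best_params_)           # winning n_estimators value
print(round(toy_grid.best_score_, 3))  # its mean cross-validated accuracy
```

On the real data, `grid.best_params_` would show which value in `range(2, 25)` was selected.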

### SVM:

In [92]:
from sklearn.svm import SVC

In [118]:
grid_svc = GridSearchCV(SVC(break_ties=True, random_state=42),
                   param_grid={'C':np.logspace(-2,15,num=50)},
                   n_jobs=8)

# Fit on training data.
grid_svc.fit(Z_train, y_train)
# Evaluate model.
grid_svc.score(Z_test, y_test)

0.984

In [95]:
grid_svc.best_estimator_

SVC(C=17575.10624854793, break_ties=True, cache_size=200, class_weight=None,
    coef0=0.0, decision_function_shape='ovr', degree=3, gamma='scale',
    kernel='rbf', max_iter=-1, probability=False, random_state=42,
    shrinking=True, tol=0.001, verbose=False)

### Experimenting with XGBoost

In [119]:
import xgboost as xgb

In [120]:
xb = xgb.XGBClassifier()

In [122]:
xb.fit(Z_train, y_train)

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0,
              learning_rate=0.1, max_delta_step=0, max_depth=3,
              min_child_weight=1, missing=None, n_estimators=100, n_jobs=1,
              nthread=None, objective='binary:logistic', random_state=0,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
              silent=None, subsample=1, verbosity=1)

In [123]:
xb.score(Z_test,y_test)

0.996

In [126]:
xg_rf = xgb.XGBRFClassifier()
params = {
    'n_estimators' : list(range(2,25))
}

grid_xg_rf = GridSearchCV(xgb.XGBRFClassifier(n_jobs=11), param_grid=params)
grid_xg_rf.fit(Z_train, y_train)
grid_xg_rf.score(Z_test, y_test)

0.984

# Conclusion

Using subreddit data, we were able to build a model that can distinguish between AskScience and AskEngineers with 93.6% accuracy using Random Forest and 98.4% using a Support Vector Machine.

*These numbers may change when re-running cells due to new data, but they will likely be similar.*

# Exploring Trends in Data



In [96]:
train_total = pd.merge(left=Z_train.reset_index(drop=True), right=y_train.reset_index(drop=True),
                      left_index=True, right_index=True)

In [97]:
test_total = pd.merge(left=Z_test.reset_index(drop=True), right=y_test.reset_index(drop=True),
                      left_index=True, right_index=True)

In [98]:
total_df = pd.concat([train_total, test_total], sort=True)
total_df = total_df.reset_index().drop('index',axis=1)

In [99]:
cleaned_analysis = total_df.drop('level_0', axis=1)

In [100]:
cleaned_analysis.head()

Unnamed: 0,00,000,01,07,10,100,10am,11,11am,12,12pm,12pm et,13,14,15,15 ut,16,16 ut,17,17 19,17 ut,18,18 ut,19,19 ut,1pm,1pm et,1st,20,20 years,2010,2014,2016,2017,2018,2019,2020,2021,21,23,24,25,250,29,2d,2pm,2pm et,30,3d,3d model,3pm,3pm et,40,50,500,60,90,ability,able,able to,about,about an,about how,about it,about me,about my,about one,about the,about this,about what,about your,above,absolutely,academic,acceleration,acceptable,accepted,accepted an,access,access to,accessible,accomplish,accomplish this,according,according to,accurate,achieve,achieve the,acid,across,across the,action,active,activities,activity,actual,actually,actuator,add,added,adding,addition,additional,additionally,advance,advance for,advanced,advancement,advice,advice any,advice from,advice is,advice on,advise,aerospace,affect,affects,affects the,after,after graduating,after the,after years,again,against,against the,age,ago,ago and,ago my,ahead,ai,air,all,all about,all am,all aspects,all at,all day,all have,all in,all my,all of,all over,all that,all the,allowed,allows,almost,almost all,almost years,alone,along,along the,along with,already,already have,also,also have,also if,also like,also what,also would,although,aluminum,always,always been,am,am also,am an,am confused,am currently,am doing,am going,am here,am in,am interested,am just,am looking,am missing,am not,am now,am really,am still,am the,am thinking,am trying,am very,am wondering,am working,ama,amazing,amazon,amazon com,america,american,among,amount,amount of,amp,amp x200b,an,an ama,an ee,an electrical,an email,an engineer,an engineering,an example,an expert,an idea,an internship,an interview,an offer,an old,an online,an opportunity,an option,an undergraduate,analysis,analyst,and,and about,and after,and all,and also,and am,and an,and another,and are,and as,and ask,and basically,and be,and behavioral,and both,and building,and can,and could,and design,and do,and don,and electrical,and engineering,and environmental,and even,and feel,and 
first,and for,and get,and getting,and give,and go,and got,and had,and half,and has,and have,and he,and health,and how,and human,and if,and in,and is,and it,and its,and just,and know,and later,and ll,and looking,and make,and many,and more,and my,and need,and not,and now,and on,and one,and or,and other,and others,and our,and out,and phd,and really,and said,and science,and so,and some,and stuff,and that,and the,and their,and then,and there,and they,and things,and this,and time,and to,and took,and try,and trying,and use,and ve,and want,and wanted,and was,and water,and we,and what,and when,and where,and why,and will,and work,and working,and would,and you,angle,animals,annual,another,answer,answer questions,answer your,answering,answering questions,answering your,answers,any,any advice,any help,any ideas,any input,any of,any other,any suggestions,any thoughts,anybody,anymore,anyone,anyone else,anyone had,anyone has,anyone have,anyone here,anyone know,anyone who,anything,anything else,anything from,anything with,anyway,anywhere,apart,apparently,appear,appears,appears to,application,applications,applied,applied for,applied to,apply,applying,applying for,applying to,appreciate,appreciate any,appreciated,approach,approaches,approximately,april,archaeology,architecture,arduino,are,are all,are doing,are dr,are either,are in,are more,are my,are not,are so,are some,are the,are there,are they,are you,area,area of,areas,areas of,aren,arguments,arm,around,around the,art,article,articles,artificial,artificial intelligence,as,as all,as am,as an,as can,as far,as in,as it,as manufacturing,as mechanical,as much,as my,as of,as part,as possible,as someone,as technician,as the,as they,as this,as we,as well,ask,ask me,ask them,ask us,asked,asked me,askengineers,askengineers comments,asking,asking for,askscience,askscience comments,aspects,aspects of,assemble,assembly,assess,assessment,assigned,assistant,associated,associated with,association,assume,assume that,assuming,assurance,at,at 11,at 
11am,at 12pm,at 1pm,at all,at an,at best,at company,at it,at large,at least,at my,at noon,at some,at the,at this,at uc,at university,at which,at work,at your,atleast,atom,attached,attached to,attack,attend,attention,audiences,australia,author,author of,automated,automation,automotive,available,available for,available to,average,...,to spread,to start,to stay,to study,to support,to take,to talk,to tell,to test,to that,to the,to their,to these,to think,to this,to try,to understand,to university,to use,to what,to work,to your,today,today are,today is,together,told,told me,told that,tomorrow,tons,too,too much,took,tool,top,top of,topic,topics,torque,total,touch,tough,towards,track,trade,traditional,training,training to,transportation,travel,tried,tried to,trouble,troubleshooting,trucks,true,try,try and,try my,try to,trying,trying to,tube,turn,turned,turns,twitter,twitter com,twitter https,two,two different,two of,two weeks,two years,type,type of,types,types of,typical,uc,uk,ultimately,unable,unable to,under,under the,undergrad,undergraduate,understand,understand it,understand that,understand the,understanding,understanding is,understanding of,unfortunately,unique,united,united states,units,university,university of,unless,unnecessary,unsure,until,up,up and,up for,up in,up my,up on,up the,up to,up what,up with,update,upon,upper,us,us and,us anything,us to,us who,use,use it,use my,use of,use that,use the,use to,used,used as,used in,used to,used with,useful,useless,user,uses,using,using the,usually,ut,ut ama,ut ask,ut to,vague,valuable,value,variable,variety,variety of,various,ve,ve also,ve been,ve found,ve got,ve had,ve made,ve never,ve seen,vehicles,velocity,version,version of,vertical,very,very early,very good,via,video,videos,view,visit,voltage,volume,vs,wait,walked,wall,want,want to,wanted,wanted to,wanting,wanting to,wants,wants to,was,was able,was an,was bit,was hoping,was in,was just,was looking,was the,was thinking,was told,was wondering,was 
your,washington,wasn,wasting,wasting my,watch,watching,water,water to,wave,way,way can,way to,ways,ways to,we,we all,we also,we are,we can,we do,we don,we found,we had,we have,we know,we ll,we need,we re,we take,we ve,we were,we will,we would,web,website,week,weekend,weekends,weeks,weight,weird,welcome,welcome thank,well,well as,well in,well known,well so,went,went to,were,were to,west,what,what about,what am,what are,what can,what career,what do,what does,what have,what is,what it,what kind,what other,what should,what skills,what the,what they,what to,what want,what was,what would,what you,whatever,wheel,when,when it,when the,when you,where,where am,where could,where do,where my,where the,where to,where we,where you,whether,whether it,whether the,which,which are,which can,which is,which means,which was,which we,which would,while,white,who,who are,who can,who has,who have,who is,whole,whole life,why,why do,why does,why is,why not,why the,why we,wide,wiki,wikipedia,wildlife,will,will also,will be,will discuss,will have,will help,will it,will make,will not,will the,will work,willing,willing to,wind,windows,winter,wise,with,with all,with an,with each,with it,with me,with more,with my,with no,with other,with some,with the,with their,with these,with this,with us,with what,with your,within,within the,without,without having,women,won,won be,wonder,wonder if,wondering,wondering if,wondering what,wont,wood,wooden,word,words,work,work and,work as,work at,work experience,work for,work https,work in,work on,work or,work so,work that,work to,work ve,work what,work with,worked,worked as,workers,working,working as,working at,working for,working in,working on,working with,workload,works,world,worried,worried about,worried that,worst,worth,worth it,would,would be,would do,would have,would help,would it,would just,would like,would love,would make,would most,would seem,would the,would this,would work,would you,wouldn,wouldn be,wrapped,write,writing,written,wrong,wrote,www,www 
amazon,www facebook,www nature,www nytimes,www reddit,www youtube,x200b,x200b any,year,year and,year at,year in,year of,year the,years,years after,years ago,years and,years at,years now,years of,yes,yet,yet and,yield,york,york times,you,you all,you are,you can,you could,you do,you don,you feel,you for,you get,you guys,you have,you know,you ll,you may,you might,you re,you recommend,you think,you to,you use,you ve,you want,you would,your,your boss,your experience,your expertise,your questions,your thoughts,your time,youtube,youtube com,zero
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,2,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,
0


In [43]:
total_df.shape

(1000, 2028)

In [101]:
# Keep only the rows whose label marks them as AskScience posts.
science_df2 = total_df[total_df['subreddit_y']=='askscience']

In [102]:
engineer_df2 = total_df[total_df['subreddit_y']=='AskEngineers']

In [103]:
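# Renumber the rows from 0. The old row labels are kept as a 'level_0' column because an 'index' column already exists.
science_df2.reset_index(inplace=True)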

In [104]:
engineer_df2.reset_index(inplace=True)

In [105]:
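# Drop the label column and the 'level_0' column left behind by reset_index so the frame can be summed column-wise.
engineer_clean = engineer_df2.drop(['subreddit_y','level_0'], axis=1)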

In [106]:
science_clean = science_df2.drop(['subreddit_y','level_0'], axis=1)
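
The split-and-clean steps in the cells above can be collapsed into one helper. This is a minimal sketch on a toy frame, not the notebook's actual data: `toy_df` and `split_by_label` are hypothetical names, and only the `subreddit_y` label column and the two label values are taken from the cells above. Passing `drop=True` to `reset_index` avoids creating the extra `level_0` column in the first place.

```python
import pandas as pd

# Toy stand-in for total_df: an 'index' column, two term-count columns, and the label.
toy_df = pd.DataFrame({
    "index": [0, 1, 0, 1],
    "cells": [2, 1, 0, 0],
    "motor": [0, 0, 3, 1],
    "subreddit_y": ["askscience", "askscience", "AskEngineers", "AskEngineers"],
})

def split_by_label(df, label, label_col="subreddit_y"):
    """Return the rows for one subreddit, without the label column, renumbered from 0."""
    subset = df[df[label_col] == label]
    # drop=True discards the old row labels instead of storing them as a new column.
    return subset.drop(columns=[label_col]).reset_index(drop=True)

science_toy = split_by_label(toy_df, "askscience")
engineer_toy = split_by_label(toy_df, "AskEngineers")
print(science_toy.shape, engineer_toy.shape)
```

Because the helper returns a new frame each time, it also sidesteps calling an in-place method on a filtered slice.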

In [107]:
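# Total count per column, largest first. The metadata columns (created_utc, index, score, num_comments) are still present, so they top the list.
science_clean.sum().sort_values(ascending=False).head(40)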

created_utc     783832367939
index                 249702
score                 229958
num_comments           29112
the                     1020
and                      706
of                       629
to                       506
in                       423
is                       266
that                     236
for                      210
at                       205
on                       201
we                       190
https                    188
it                       163
are                      150
be                       132
with                     130
research                 121
about                    120
from                     119
of the                   117
www                      115
as                       113
this                     110
or                       109
in the                   105
com                      104
my                       101
you                      100
can                       96
https www                 94
our           

In [108]:
engineer_clean.sum().sort_values(ascending=False).head(40)

created_utc     789929505839
index                 249798
num_comments            5159
the                     1955
to                      1729
and                     1413
in                      1043
of                       972
is                       771
my                       756
that                     712
for                      685
it                       637
score                    506
have                     488
this                     446
be                       434
with                     429
but                      428
on                       392
would                    349
as                       323
or                       312
am                       312
at                       307
what                     303
an                       300
you                      299
if                       290
engineering              283
not                      278
can                      277
so                       269
like                     267
are           
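
Both rankings above are led by post metadata (`created_utc`, `index`, `num_comments`, `score`) rather than vocabulary. One way to rank only the terms is to drop those columns before summing. The sketch below assumes the metadata columns are exactly the four named in the outputs; `META_COLS`, `top_terms`, and `toy_counts` are hypothetical names used for illustration.

```python
import pandas as pd

# Assumed list of metadata columns, taken from the outputs above.
META_COLS = ["created_utc", "index", "score", "num_comments"]

def top_terms(counts, n=10, meta_cols=META_COLS):
    """Sum each term column and return the n largest totals, skipping metadata."""
    # errors="ignore" lets the helper run even if some metadata columns are absent.
    term_counts = counts.drop(columns=meta_cols, errors="ignore")
    return term_counts.sum().sort_values(ascending=False).head(n)

# Toy frame: two metadata columns and two term-count columns.
toy_counts = pd.DataFrame({
    "created_utc": [1_570_000_000, 1_570_000_100],
    "score": [10, 3],
    "the": [4, 2],
    "research": [1, 0],
})
print(top_terms(toy_counts))
```

Dropping by name would also remove a vocabulary term that happens to share a name with a metadata column, so the list is worth checking against the vectorizer's feature names.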

In [109]:
# Terms that occur in the AskEngineers sample but never in the AskScience sample.
engineer_only = engineer_clean.sum()[science_clean.sum()==0].sort_values(ascending=False)

In [110]:
# Terms that occur in the AskScience sample but never in the AskEngineers sample.
science_only = science_clean.sum()[engineer_clean.sum()==0].sort_values(ascending=False)

In [111]:
engineer_only.shape

(1283,)

In [112]:
science_only.shape

(301,)
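
The filtering used for `engineer_only` and `science_only` relies on boolean indexing between two aligned Series of column totals. A small sketch of the same idea follows, using three totals that appear in the outputs of this notebook (`the`, `cells`, `motor`). The variable names ending in `_toy` and `_totals` are hypothetical.

```python
import pandas as pd

# Column totals per subreddit for three terms (values as reported in this notebook).
science_totals = pd.Series({"cells": 35, "the": 1020, "motor": 0})
engineer_totals = pd.Series({"cells": 0, "the": 1955, "motor": 23})

# Keep a term for one subreddit only where the other subreddit's total is zero.
science_only_toy = science_totals[engineer_totals == 0].sort_values(ascending=False)
engineer_only_toy = engineer_totals[science_totals == 0].sort_values(ascending=False)

print(science_only_toy)
print(engineer_only_toy)
```

A shared term such as `the` is filtered out of both results. A zero total only means the term was absent from this 500-post sample, so the exclusive lists are likely to shrink as more posts are collected.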

In [113]:
total_df.shape

(1000, 4370)

In [115]:
science_only.head(40)

ut                53
et                52
twitter           37
edu               36
nih               36
cells             35
ama               29
ph                26
ut ask            26
nih gov           25
your questions    24
twitter com       24
https twitter     24
ask us            22
us anything       21
we ll             18
scientists        18
institute         18
scientific        18
me anything       17
biological        17
published         17
answer your       16
16 ut             16
neutrons          15
19                15
disease           14
black             14
00                14
species           13
mission           13
medicine          13
the brain         13
population        13
children          13
emotional         12
dna               12
stars             12
et 16             12
articles          11
dtype: int64

In [117]:
engineer_only.head(40)

mechanical                68
advice                    57
the company               43
internship                42
interview                 40
thank you                 39
thank                     39
guys                      36
my first                  32
to work                   32
mechanical engineering    30
any advice                30
offer                     29
graduated                 27
am currently              27
my career                 27
to start                  26
of engineering            25
boss                      25
positions                 25
an engineer               24
in engineering            24
path                      23
it was                    23
motor                     23
you guys                  23
ee                        22
sector                    22
asked                     21
my resume                 21
interviews                21
civil                     21
is my                     21
my job                    20
if anyone     

In [None]:
plot.