# Feature Engineering

The next step is to create features from the raw text so we can train the machine learning models. The steps followed are:

1. **Text Cleaning and Preparation**: cleaning of special characters, downcasing, punctuation signs. possessive pronouns and stop words removal and lemmatization. 
2. **Label coding**: creation of a dictionary to map each category to a code.
3. **Train-test split**: to test the models on unseen data.
4. **Text representation**: use of TF-IDF scores to represent text.

In [1]:
import pickle
import pandas as pd
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import chi2
import numpy as np

First of all we'll load the dataset:

In [2]:
#path_df = "/home/lnc/0. Latest News Classifier/02. Exploratory Data Analysis/News_dataset.pickle"

#with open(path_df, 'rb') as data:
#    df = pickle.load(data)

In [3]:
df = pd.read_csv('ml_model.csv')


  interactivity=interactivity, compiler=compiler, result=result)


In [30]:
#21932 - 42286 drop 

df[21900:21950]

Unnamed: 0,Name,Type,Party,Republican,year,mean_word_length,word_count,some_word_count,unique_word,some_unique_word,...,positive_words_ratio,negative_words_ratio,we_count,war_count,i_count,Sentences,vader_neg,vader_pos,vader_neu,vader_com
21900,Woodrow Wilson,Inaugural Address,Democrat,0.0,1913.0,6.279631,9260.0,6788.0,669.0,557.0,...,0.011344,0.009723,37.0,0.0,4.0,we hadnot forgotten our morals.,0.322,0.0,0.678,-0.2263
21901,Woodrow Wilson,Inaugural Address,Democrat,0.0,1913.0,6.279631,9260.0,6788.0,669.0,557.0,...,0.011344,0.009723,37.0,0.0,4.0,we remembered well enough that we had set upa ...,0.0,0.315,0.685,0.9136
21902,Woodrow Wilson,Inaugural Address,Democrat,0.0,1913.0,6.279631,9260.0,6788.0,669.0,557.0,...,0.011344,0.009723,37.0,0.0,4.0,but we were very heedless and in a hurry to be...,0.0,0.361,0.639,0.7684
21903,Woodrow Wilson,Inaugural Address,Democrat,0.0,1913.0,6.279631,9260.0,6788.0,669.0,557.0,...,0.011344,0.009723,37.0,0.0,4.0,we have come now to the sober second thought.,0.0,0.0,1.0,0.0
21904,Woodrow Wilson,Inaugural Address,Democrat,0.0,1913.0,6.279631,9260.0,6788.0,669.0,557.0,...,0.011344,0.009723,37.0,0.0,4.0,the scales of heedlessnesshave fallen from our...,0.263,0.0,0.737,-0.3612
21905,Woodrow Wilson,Inaugural Address,Democrat,0.0,1913.0,6.279631,9260.0,6788.0,669.0,557.0,...,0.011344,0.009723,37.0,0.0,4.0,we have made up our minds to square every proc...,0.0,0.133,0.867,0.6825
21906,Woodrow Wilson,Inaugural Address,Democrat,0.0,1913.0,6.279631,9260.0,6788.0,669.0,557.0,...,0.011344,0.009723,37.0,0.0,4.0,our work is a work ofrestoration.,0.0,0.0,1.0,0.0
21907,Woodrow Wilson,Inaugural Address,Democrat,0.0,1913.0,6.279631,9260.0,6788.0,669.0,557.0,...,0.011344,0.009723,37.0,0.0,4.0,we have itemized with some degree of particula...,0.084,0.102,0.815,0.5423
21908,Woodrow Wilson,Inaugural Address,Democrat,0.0,1913.0,6.279631,9260.0,6788.0,669.0,557.0,...,0.011344,0.009723,37.0,0.0,4.0,we have studied as perhapsno other nation has ...,0.0,0.064,0.936,0.2952
21909,Woodrow Wilson,Inaugural Address,Democrat,0.0,1913.0,6.279631,9260.0,6788.0,669.0,557.0,...,0.011344,0.009723,37.0,0.0,4.0,nor have we studied and perfected the means by...,0.046,0.157,0.797,0.6808


And visualize one sample news content:

In [38]:
df=df.drop(df.index[21932:42286])


In [39]:
df.tail()

Unnamed: 0,Name,Type,Republican,year,mean_word_length,word_count,some_word_count,unique_word,some_unique_word,unique_word_ratio,...,positive_words_ratio,negative_words_ratio,we_count,war_count,i_count,Sentences,vader_neg,vader_pos,vader_neu,vader_com
21927,Woodrow Wilson,Inaugural Address,0.0,1913.0,6.279631,9260.0,6788.0,669.0,557.0,0.072246,...,0.011344,0.009723,37.0,0.0,4.0,men's hearts waitupon us; men's lives hang in ...,0.0,0.128,0.872,0.4215
21928,Woodrow Wilson,Inaugural Address,0.0,1913.0,6.279631,9260.0,6788.0,669.0,557.0,0.072246,...,0.011344,0.009723,37.0,0.0,4.0,who shall live up to the great trust?,0.0,0.552,0.448,0.8126
21929,Woodrow Wilson,Inaugural Address,0.0,1913.0,6.279631,9260.0,6788.0,669.0,557.0,0.072246,...,0.011344,0.009723,37.0,0.0,4.0,who dares fail totry?,0.538,0.0,0.462,-0.5423
21930,Woodrow Wilson,Inaugural Address,0.0,1913.0,6.279631,9260.0,6788.0,669.0,557.0,0.072246,...,0.011344,0.009723,37.0,0.0,4.0,"i summon all honest men, all patriotic, all fo...",0.0,0.248,0.752,0.5106
21931,Woodrow Wilson,Inaugural Address,0.0,1913.0,6.279631,9260.0,6788.0,669.0,557.0,0.072246,...,0.011344,0.009723,37.0,0.0,4.0,"god helping me, i will not fail them, if they ...",0.0,0.328,0.672,0.5215


In [13]:
# Saving the lemmatizer into an object
wordnet_lemmatizer = WordNetLemmatizer()

In [46]:
df.head(1)

Unnamed: 0,Name,Type,Republican,year,mean_word_length,word_count,some_word_count,unique_word,some_unique_word,unique_word_ratio,...,negative_words_ratio,we_count,war_count,i_count,Sentences,vader_neg,vader_pos,vader_neu,vader_com,Category_Code
0,Donald Trump,State of the Union,1.0,2019.0,6.01407,29897.0,22255.0,1659.0,1387.0,0.055491,...,0.007953,114.0,6.0,41.0,"madam speaker, mr. vice president, membe...",0.0,0.097,0.903,0.4215,1.0


In [40]:
list_columns = ['Name', 'Type', 'Republican', 'year', 'mean_word_length',
       'word_count', 'some_word_count', 'unique_word', 'some_unique_word',
       'unique_word_ratio', 'some_unique_word_ratio', 'words', 'sentences',
       'mean_sentence_length', 'positive_words', 'negative_words',
       'positive_words_ratio', 'negative_words_ratio', 'we_count', 'war_count',
       'i_count', 'Sentences', 'vader_neg', 'vader_pos', 'vader_neu',
       'vader_com']
df = df[list_columns]

df = df.rename(columns={'Speech_Parsed_6': 'Speech_Parsed'})

In [36]:
df.columns

Index(['Name', 'Type', 'Party', 'Republican', 'year', 'mean_word_length',
       'word_count', 'some_word_count', 'unique_word', 'some_unique_word',
       'unique_word_ratio', 'some_unique_word_ratio', 'words', 'sentences',
       'mean_sentence_length', 'positive_words', 'negative_words',
       'positive_words_ratio', 'negative_words_ratio', 'we_count', 'war_count',
       'i_count', 'Sentences', 'vader_neg', 'vader_pos', 'vader_neu',
       'vader_com'],
      dtype='object')

**IMPORTANT:**

We need to remember that our model will gather the latest news articles from different newspapers every time we want. For that reason, we not only need to take into account the peculiarities of the training set articles, but also possible ones that are present in the gathered news articles.

For this reason, possible peculiarities have been studied in the *05. News Scraping* folder.

## 2. Label coding

We'll create a dictionary with the label codification:

In [43]:
#Republican = {
#    'democrat': 0,
 #   'republican': 1}

In [45]:
# Category mapping
#df['Category_Code'] = df['Republican']
#df = df.replace({'Category_Code':Republican})

TypeError: Cannot compare types 'ndarray(dtype=float64)' and 'str'

In [47]:
df.head()

Unnamed: 0,Name,Type,Republican,year,mean_word_length,word_count,some_word_count,unique_word,some_unique_word,unique_word_ratio,...,negative_words_ratio,we_count,war_count,i_count,Sentences,vader_neg,vader_pos,vader_neu,vader_com,Category_Code
0,Donald Trump,State of the Union,1.0,2019.0,6.01407,29897.0,22255.0,1659.0,1387.0,0.055491,...,0.007953,114.0,6.0,41.0,"madam speaker, mr. vice president, membe...",0.0,0.097,0.903,0.4215,1.0
1,Donald Trump,State of the Union,1.0,2019.0,6.01407,29897.0,22255.0,1659.0,1387.0,0.055491,...,0.007953,114.0,6.0,41.0,"as we begin a new congress, i stand here ready...",0.0,0.122,0.878,0.3612,1.0
2,Donald Trump,State of the Union,1.0,2019.0,6.01407,29897.0,22255.0,1659.0,1387.0,0.055491,...,0.007953,114.0,6.0,41.0,millions of our fellow citizens are watching u...,0.053,0.159,0.788,0.4954,1.0
3,Donald Trump,State of the Union,1.0,2019.0,6.01407,29897.0,22255.0,1659.0,1387.0,0.055491,...,0.007953,114.0,6.0,41.0,the agenda i will lay out this evening is not ...,0.0,0.0,1.0,0.0,1.0
4,Donald Trump,State of the Union,1.0,2019.0,6.01407,29897.0,22255.0,1659.0,1387.0,0.055491,...,0.007953,114.0,6.0,41.0,it is the agenda of the american people.,0.0,0.0,1.0,0.0,1.0


## 3. Train - test split

We'll set apart a test set to prove the quality of our models. We'll do Cross Validation in the train set in order to tune the hyperparameters and then test performance on the unseen data of the test set.

In [48]:
X_train, X_test, y_train, y_test = train_test_split(df['Sentences'], 
                                                    df['Republican'], 
                                                    test_size=0.3, 
                                                    random_state=8)

We have 21,000 observations. we choose a test set size of 30% of the full dataset.

## 4. Text representation

We have various options:

* Count Vectors as features
* TF-IDF Vectors as features
* Word Embeddings as features
* Text / NLP based features
* Topic Models as features

We'll use **TF-IDF Vectors** as features.

We have to define the different parameters:

* `ngram_range`: We want to consider both unigrams and bigrams.
* `max_df`: When building the vocabulary ignore terms that have a document
    frequency strictly higher than the given threshold
* `min_df`: When building the vocabulary ignore terms that have a document
    frequency strictly lower than the given threshold.
* `max_features`: If not None, build a vocabulary that only consider the top
    max_features ordered by term frequency across the corpus.

See `TfidfVectorizer?` for further detail.

It needs to be mentioned that we are implicitly scaling our data when representing it as TF-IDF features with the argument `norm`.

In [49]:
# Parameter election
ngram_range = (1,2)
min_df = 10
max_df = 1.
max_features = 100

We have chosen these values as a first approximation. Since the models that we develop later have a very good predictive power, we'll stick to these values. But it has to be mentioned that different combinations could be tried in order to improve even more the accuracy of the models.

In [50]:
tfidf = TfidfVectorizer(encoding='utf-8',
                        ngram_range=ngram_range,
                        stop_words=None,
                        lowercase=False,
                        max_df=max_df,
                        min_df=min_df,
                        max_features=max_features,
                        norm='l2',
                        sublinear_tf=True)
                        
features_train = tfidf.fit_transform(X_train).toarray()
labels_train = y_train
print(features_train.shape)

features_test = tfidf.transform(X_test).toarray()
labels_test = y_test
print(features_test.shape)

(15352, 100)
(6580, 100)


Please note that we have fitted and then transformed the training set, but we have **only transformed** the **test set**.

We can use the Chi squared test in order to see what unigrams and bigrams are most correlated with each category:

In [52]:
from sklearn.feature_selection import chi2
import numpy as np

for speech, party in sorted(df['Republican'].items()):
    features_chi2 = chi2(features_train, labels_train == party)
    indices = np.argsort(features_chi2[0])
    feature_names = np.array(tfidf.get_feature_names())[indices]
    unigrams = [v for v in feature_names if len(v.split(' ')) == 1]
    bigrams = [v for v in feature_names if len(v.split(' ')) == 2]
    print("# '{}' category:".format(speech))
    print("  . Most correlated unigrams:\n. {}".format('\n. '.join(unigrams[-5:])))
    print("  . Most correlated bigrams:\n. {}".format('\n. '.join(bigrams[-2:])))
    print("")


# '0' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '5' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '6' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '7' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of


# '65' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '66' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '67' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '68' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '69' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '70' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '71' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '72' category:
  . Most correlated unigrams:
. do
. which
. 000
. governme

  . Most correlated bigrams:
. to the
. of the

# '131' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '132' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '133' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '134' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '135' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '136' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '137' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '138' category:
  .

# '194' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '195' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '196' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '197' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '198' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '199' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '200' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '201' category:
  . Most correlated unigrams:
. do
. which
. 000
. 

# '258' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '259' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '260' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '261' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '262' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '263' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '264' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '265' category:
  . Most correlated unigrams:
. do
. which
. 000
. 

# '324' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '325' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '326' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '327' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '328' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '329' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '330' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '331' category:
  . Most correlated unigrams:
. do
. which
. 000
. 

# '391' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '392' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '393' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '394' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '395' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '396' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '397' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '398' category:
  . Most correlated unigrams:
. do
. which
. 000
. 

# '455' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '456' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '457' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '458' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '459' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '460' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '461' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '462' category:
  . Most correlated unigrams:
. do
. which
. 000
. 

# '518' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '519' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '520' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '521' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '522' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '523' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '524' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '525' category:
  . Most correlated unigrams:
. do
. which
. 000
. 

# '580' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '581' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '582' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '583' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '584' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '585' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '586' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '587' category:
  . Most correlated unigrams:
. do
. which
. 000
. 

# '644' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '645' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '646' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '647' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '648' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '649' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '650' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '651' category:
  . Most correlated unigrams:
. do
. which
. 000
. 

  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '710' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '711' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '712' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '713' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '714' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '715' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '716' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  

# '771' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '772' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '773' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '774' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '775' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '776' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '777' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '778' category:
  . Most correlated unigrams:
. do
. which
. 000
. 

# '834' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '835' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '836' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '837' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '838' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '839' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '840' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '841' category:
  . Most correlated unigrams:
. do
. which
. 000
. 

# '900' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '901' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '902' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '903' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '904' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '905' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '906' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '907' category:
  . Most correlated unigrams:
. do
. which
. 000
. 

# '962' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '963' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '964' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '965' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '966' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '967' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '968' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '969' category:
  . Most correlated unigrams:
. do
. which
. 000
. 

# '1030' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1031' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1032' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1033' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1034' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1035' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1036' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1037' category:
  . Most correlated unigrams:
. do
. which


# '1094' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1095' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1096' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1097' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1098' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1099' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1100' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1101' category:
  . Most correlated unigrams:
. do
. which


# '1158' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1159' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1160' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1161' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1162' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1163' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1164' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1165' category:
  . Most correlated unigrams:
. do
. which


# '1224' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1225' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1226' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1227' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1228' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1229' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1230' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1231' category:
  . Most correlated unigrams:
. do
. which


# '1288' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1289' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1290' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1291' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1292' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1293' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1294' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1295' category:
  . Most correlated unigrams:
. do
. which


# '1355' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1356' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1357' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1358' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1359' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1360' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1361' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1362' category:
  . Most correlated unigrams:
. do
. which


# '1417' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1418' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1419' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1420' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1421' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1422' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1423' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1424' category:
  . Most correlated unigrams:
. do
. which


# '1483' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1484' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1485' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1486' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1487' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1488' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1489' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1490' category:
  . Most correlated unigrams:
. do
. which


# '1550' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1551' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1552' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1553' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1554' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1555' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1556' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1557' category:
  . Most correlated unigrams:
. do
. which


# '1613' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1614' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1615' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1616' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1617' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1618' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1619' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1620' category:
  . Most correlated unigrams:
. do
. which


# '1676' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1677' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1678' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1679' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1680' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1681' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1682' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1683' category:
  . Most correlated unigrams:
. do
. which


# '1741' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1742' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1743' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1744' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1745' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1746' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1747' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1748' category:
  . Most correlated unigrams:
. do
. which


# '1807' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1808' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1809' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1810' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1811' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1812' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1813' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1814' category:
  . Most correlated unigrams:
. do
. which


# '1875' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1876' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1877' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1878' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1879' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1880' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1881' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1882' category:
  . Most correlated unigrams:
. do
. which


# '1942' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1943' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1944' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1945' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1946' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1947' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1948' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '1949' category:
  . Most correlated unigrams:
. do
. which


# '2011' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2012' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2013' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2014' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2015' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2016' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2017' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2018' category:
  . Most correlated unigrams:
. do
. which


# '2078' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2079' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2080' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2081' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2082' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2083' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2084' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2085' category:
  . Most correlated unigrams:
. do
. which


# '2144' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2145' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2146' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2147' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2148' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2149' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2150' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2151' category:
  . Most correlated unigrams:
. do
. which


# '2210' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2211' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2212' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2213' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2214' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2215' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2216' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2217' category:
  . Most correlated unigrams:
. do
. which


# '2278' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2279' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2280' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2281' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2282' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2283' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2284' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2285' category:
  . Most correlated unigrams:
. do
. which


# '2344' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2345' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2346' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2347' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2348' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2349' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2350' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2351' category:
  . Most correlated unigrams:
. do
. which



# '2406' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2407' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2408' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2409' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2410' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2411' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2412' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2413' category:
  . Most correlated unigrams:
. do
. which

# '2468' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2469' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2470' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2471' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2472' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2473' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2474' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2475' category:
  . Most correlated unigrams:
. do
. which


# '2536' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2537' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2538' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2539' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2540' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2541' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2542' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2543' category:
  . Most correlated unigrams:
. do
. which


# '2601' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2602' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2603' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2604' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2605' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2606' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2607' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2608' category:
  . Most correlated unigrams:
. do
. which


# '2669' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2670' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2671' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2672' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2673' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2674' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2675' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2676' category:
  . Most correlated unigrams:
. do
. which


# '2736' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2737' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2738' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2739' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2740' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2741' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2742' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2743' category:
  . Most correlated unigrams:
. do
. which


# '2807' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2808' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2809' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2810' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2811' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2812' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2813' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2814' category:
  . Most correlated unigrams:
. do
. which


# '2877' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2878' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2879' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2880' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2881' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2882' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2883' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2884' category:
  . Most correlated unigrams:
. do
. which


# '2943' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2944' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2945' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2946' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2947' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2948' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2949' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '2950' category:
  . Most correlated unigrams:
. do
. which


# '3011' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3012' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3013' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3014' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3015' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3016' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3017' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3018' category:
  . Most correlated unigrams:
. do
. which



# '3079' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3080' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3081' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3082' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3083' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3084' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3085' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3086' category:
  . Most correlated unigrams:
. do
. which

# '3145' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3146' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3147' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3148' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3149' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3150' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3151' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3152' category:
  . Most correlated unigrams:
. do
. which


# '3211' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3212' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3213' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3214' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3215' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3216' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3217' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3218' category:
  . Most correlated unigrams:
. do
. which


# '3278' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3279' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3280' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3281' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3282' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3283' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3284' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3285' category:
  . Most correlated unigrams:
. do
. which


# '3342' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3343' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3344' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3345' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3346' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3347' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3348' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3349' category:
  . Most correlated unigrams:
. do
. which


# '3408' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3409' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3410' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3411' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3412' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3413' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3414' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3415' category:
  . Most correlated unigrams:
. do
. which


# '3470' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3471' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3472' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3473' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3474' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3475' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3476' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3477' category:
  . Most correlated unigrams:
. do
. which


# '3534' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3535' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3536' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3537' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3538' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3539' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3540' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3541' category:
  . Most correlated unigrams:
. do
. which


# '3597' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3598' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3599' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3600' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3601' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3602' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3603' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3604' category:
  . Most correlated unigrams:
. do
. which


# '3660' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3661' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3662' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3663' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3664' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3665' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3666' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3667' category:
  . Most correlated unigrams:
. do
. which


# '3723' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3724' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3725' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3726' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3727' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3728' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3729' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3730' category:
  . Most correlated unigrams:
. do
. which


# '3788' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3789' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3790' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3791' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3792' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3793' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3794' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3795' category:
  . Most correlated unigrams:
. do
. which


# '3854' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3855' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3856' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3857' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3858' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3859' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3860' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3861' category:
  . Most correlated unigrams:
. do
. which


# '3918' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3919' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3920' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3921' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3922' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3923' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3924' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3925' category:
  . Most correlated unigrams:
. do
. which


# '3982' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3983' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3984' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3985' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3986' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3987' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3988' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '3989' category:
  . Most correlated unigrams:
. do
. which


# '4045' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4046' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4047' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4048' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4049' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4050' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4051' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4052' category:
  . Most correlated unigrams:
. do
. which


# '4107' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4108' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4109' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4110' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4111' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4112' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4113' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4114' category:
  . Most correlated unigrams:
. do
. which


# '4178' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4179' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4180' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4181' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4182' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4183' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4184' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4185' category:
  . Most correlated unigrams:
. do
. which


# '4241' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4242' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4243' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4244' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4245' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4246' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4247' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4248' category:
  . Most correlated unigrams:
. do
. which


# '4308' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4309' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4310' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4311' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4312' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4313' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4314' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4315' category:
  . Most correlated unigrams:
. do
. which


# '4372' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4373' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4374' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4375' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4376' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4377' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4378' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4379' category:
  . Most correlated unigrams:
. do
. which


# '4440' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4441' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4442' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4443' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4444' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4445' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4446' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4447' category:
  . Most correlated unigrams:
. do
. which


# '4509' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4510' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4511' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4512' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4513' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4514' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4515' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4516' category:
  . Most correlated unigrams:
. do
. which


# '4573' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4574' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4575' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4576' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4577' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4578' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4579' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4580' category:
  . Most correlated unigrams:
. do
. which


# '4636' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4637' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4638' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4639' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4640' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4641' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4642' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4643' category:
  . Most correlated unigrams:
. do
. which


# '4701' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4702' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4703' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4704' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4705' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4706' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4707' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4708' category:
  . Most correlated unigrams:
. do
. which


# '4764' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4765' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4766' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4767' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4768' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4769' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4770' category:
  . Most correlated unigrams:
. do
. which
. 000
. government
. of
  . Most correlated bigrams:
. to the
. of the

# '4771' category:
  . Most correlated unigrams:
. do
. which


KeyboardInterrupt: 

As we can see, the unigrams correspond well to their category. However, bigrams do not. If we get the bigrams in our features:

In [None]:
bigrams

We can see there are only six. This means the unigrams have more correlation with the category than the bigrams, and since we're restricting the number of features to the most representative 300, only a few bigrams are being considered.

Let's save the files we'll need in the next steps:

In [54]:
# X_train
with open('Pickles/X_train.pickle', 'wb') as output:
    pickle.dump(X_train, output)
    
# X_test    
with open('Pickles/X_test.pickle', 'wb') as output:
    pickle.dump(X_test, output)
    
# y_train
with open('Pickles/y_train.pickle', 'wb') as output:
    pickle.dump(y_train, output)
    
# y_test
with open('Pickles/y_test.pickle', 'wb') as output:
    pickle.dump(y_test, output)
    
# df
with open('Pickles/df.pickle', 'wb') as output:
    pickle.dump(df, output)
    
# features_train
with open('Pickles/features_train.pickle', 'wb') as output:
    pickle.dump(features_train, output)

# labels_train
with open('Pickles/labels_train.pickle', 'wb') as output:
    pickle.dump(labels_train, output)

# features_test
with open('Pickles/features_test.pickle', 'wb') as output:
    pickle.dump(features_test, output)

# labels_test
with open('Pickles/labels_test.pickle', 'wb') as output:
    pickle.dump(labels_test, output)
    
# TF-IDF object
with open('Pickles/tfidf.pickle', 'wb') as output:
    pickle.dump(tfidf, output)