### Bag of words model

In [1]:
# load all necessary libraries
import pandas as pd
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import CountVectorizer

pd.set_option('max_colwidth', 100)

#### Let's build a basic bag of words model on three sample documents

In [2]:
documents = ["Gangs of Wasseypur is a great movie.", "The success of a movie depends on the performance of the actors.", "There are no new movies releasing this week."]
print(documents)

['Gangs of Wasseypur is a great movie.', 'The success of a movie depends on the performance of the actors.', 'There are no new movies releasing this week.']


In [3]:
def preprocess(document):
    'changes document to lower case and removes stopwords'

    # change sentence to lower case
    document = document.lower()

    # tokenize into words
    words = word_tokenize(document)

    # remove stop words (build the set once instead of reloading the list for every token)
    stop_words = set(stopwords.words("english"))
    words = [word for word in words if word not in stop_words]

    # join words to make sentence
    document = " ".join(words)
    
    return document

documents = [preprocess(document) for document in documents]
print(documents)


['gangs wasseypur great movie .', 'success movie depends performance actors .', 'new movies releasing week .']


#### Creating bag of words model using count vectorizer function

In [5]:
vectorizer = CountVectorizer()
bow_model = vectorizer.fit_transform(documents)
print(bow_model)  # prints the (row, column) position of every non-zero cell together with its count

  (0, 4)	1
  (0, 3)	1
  (0, 10)	1
  (0, 2)	1
  (1, 0)	1
  (1, 7)	1
  (1, 1)	1
  (1, 9)	1
  (1, 4)	1
  (2, 11)	1
  (2, 8)	1
  (2, 5)	1
  (2, 6)	1


In [7]:
# print the full sparse matrix
print(bow_model.toarray())

[[0 0 1 1 1 0 0 0 0 0 1 0]
 [1 1 0 0 1 0 0 1 0 1 0 0]
 [0 0 0 0 0 1 1 0 1 0 0 1]]


In [8]:
print(bow_model.shape)
# on scikit-learn 1.2 and later, use vectorizer.get_feature_names_out() instead
print(vectorizer.get_feature_names())

(3, 12)
['actors', 'depends', 'gangs', 'great', 'movie', 'movies', 'new', 'performance', 'releasing', 'success', 'wasseypur', 'week']
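
To make the matrix above less of a black box, here is a minimal sketch that rebuilds the same counts with the standard library only. It assumes the three preprocessed documents from the earlier output with the trailing "." dropped by hand, since `CountVectorizer`'s default token pattern ignores punctuation and single-character tokens. The names `docs`, `vocabulary` and `matrix` are illustrative and not part of the original notebook.

```python
from collections import Counter

# preprocessed documents copied from the output above, without the trailing "."
docs = [
    "gangs wasseypur great movie",
    "success movie depends performance actors",
    "new movies releasing week",
]

# vocabulary: every distinct token, sorted alphabetically
vocabulary = sorted({token for doc in docs for token in doc.split()})

# one row per document, one column per vocabulary entry
matrix = []
for doc in docs:
    counts = Counter(doc.split())
    matrix.append([counts[term] for term in vocabulary])

print(vocabulary)
for row in matrix:
    print(row)
```

The rows should line up with `bow_model.toarray()`, because both approaches sort the vocabulary alphabetically and count raw occurrences.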


### Let's create a bag of words model on the spam dataset.

In [26]:
# load data
spam = pd.read_csv("SMSSpamCollection.txt", sep = "\t", names=["label", "message"])
print (spam.shape)
spam.head()

(5572, 2)


| | label | message |
|---|---|---|
| 0 | ham | Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine there g... |
| 1 | ham | Ok lar... Joking wif u oni... |
| 2 | spam | Free entry in 2 a wkly comp to win FA Cup final tkts 21st May 2005. Text FA to 87121 to receive ... |
| 3 | ham | U dun say so early hor... U c already then say... |
| 4 | ham | Nah I don't think he goes to usf, he lives around here though |


##### Let's take a subset of the data (first 100 rows only) and create a bag of words model on it.

In [27]:
spam = spam.iloc[0:100,:]
print(spam)

   label  \
0    ham   
1    ham   
2   spam   
3    ham   
4    ham   
..   ...   
95  spam   
96   ham   
97   ham   
98   ham   
99   ham   

                                                                                                message  
0   Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine there g...  
1                                                                         Ok lar... Joking wif u oni...  
2   Free entry in 2 a wkly comp to win FA Cup final tkts 21st May 2005. Text FA to 87121 to receive ...  
3                                                     U dun say so early hor... U c already then say...  
4                                         Nah I don't think he goes to usf, he lives around here though  
..                                                                                                  ...  
95  Your free ringtone is waiting to be collected. Simply text the password "MIX" to 85069 to verify...  
96     

In [28]:
# extract the messages from the dataframe
messages = spam.message[0:100]
print(messages)

0     Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine there g...
1                                                                           Ok lar... Joking wif u oni...
2     Free entry in 2 a wkly comp to win FA Cup final tkts 21st May 2005. Text FA to 87121 to receive ...
3                                                       U dun say so early hor... U c already then say...
4                                           Nah I don't think he goes to usf, he lives around here though
                                                     ...                                                 
95    Your free ringtone is waiting to be collected. Simply text the password "MIX" to 85069 to verify...
96                                                                      Watching telugu movie..wat abt u?
97                                                    i see. When we finish we have loads of loans to pay
98    Hi. Wk been ok - on hols now! Yes on for

In [29]:
# convert messages into list
messages = messages.tolist()
print(messages)

['Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine there got amore wat...', 'Ok lar... Joking wif u oni...', "Free entry in 2 a wkly comp to win FA Cup final tkts 21st May 2005. Text FA to 87121 to receive entry question(std txt rate)T&C's apply 08452810075over18's", 'U dun say so early hor... U c already then say...', "Nah I don't think he goes to usf, he lives around here though", "FreeMsg Hey there darling it's been 3 week's now and no word back! I'd like some fun you up for it still? Tb ok! XxX std chgs to send, Â£1.50 to rcv", 'Even my brother is not like to speak with me. They treat me like aids patent.', "As per your request 'Melle Melle (Oru Minnaminunginte Nurungu Vettam)' has been set as your callertune for all Callers. Press *9 to copy your friends Callertune", 'WINNER!! As a valued network customer you have been selected to receivea Â£900 prize reward! To claim call 09061701461. Claim code KL341. Valid 12 hours only.', 'Had your mobil

In [30]:
# preprocess messages using the preprocess function
messages = [preprocess(message) for message in messages]

print(messages)

['go jurong point , crazy.. available bugis n great world la e buffet ... cine got amore wat ...', 'ok lar ... joking wif u oni ...', "free entry 2 wkly comp win fa cup final tkts 21st may 2005. text fa 87121 receive entry question ( std txt rate ) & c 's apply 08452810075over18 's", 'u dun say early hor ... u c already say ...', "nah n't think goes usf , lives around though", "freemsg hey darling 's 3 week 's word back ! 'd like fun still ? tb ok ! xxx std chgs send , â£1.50 rcv", 'even brother like speak . treat like aids patent .', "per request 'melle melle ( oru minnaminunginte nurungu vettam ) ' set callertune callers . press *9 copy friends callertune", 'winner ! ! valued network customer selected receivea â£900 prize reward ! claim call 09061701461. claim code kl341 . valid 12 hours .', 'mobile 11 months ? u r entitled update latest colour mobiles camera free ! call mobile update co free 08002986030', "'m gon na home soon n't want talk stuff anymore tonight , k ? 've cried enoug

In [31]:
# bag of words model
vectorizer = CountVectorizer()
bow_model = vectorizer.fit_transform(messages)

In [32]:
# look at the dataframe
df=pd.DataFrame(bow_model.toarray(), columns = vectorizer.get_feature_names())

In [33]:
df.shape

(100, 644)

In [36]:
df.sum().sum()

940

In [23]:
print(vectorizer.get_feature_names())

['000', '07732584351', '08000930705', '08002986030', '08452810075over18', '09061701461', '100', '11', '12', '150p', '16', '20', '2005', '21st', '2nd', '4403ldnw1a7rw18', '4txt', '50', '6days', '81010', '87077', '87121', '87575', '8am', '900', 'abiola', 'actin', 'aft', 'ahead', 'ahhh', 'aids', 'already', 'alright', 'always', 'amore', 'amp', 'anymore', 'anything', 'apologetic', 'apply', 'arabian', 'ard', 'around', 'ask', 'available', 'back', 'badly', 'bit', 'blessing', 'breather', 'brother', 'buffet', 'bugis', 'burns', 'bus', 'ca', 'call', 'callers', 'callertune', 'calls', 'camcorder', 'camera', 'car', 'cash', 'catch', 'caught', 'chances', 'charged', 'cheers', 'chgs', 'child', 'cine', 'claim', 'clear', 'click', 'co', 'code', 'colour', 'com', 'comin', 'comp', 'confirm', 'convincing', 'copy', 'cost', 'could', 'crave', 'crazy', 'credit', 'cried', 'csh11', 'cup', 'cuppa', 'customer', 'da', 'darling', 'date', 'day', 'dbuk', 'decide', 'decided', 'delivery', 'dinner', 'done', 'dont', 'dun', 'ea

* The vocabulary contains many variants of the same word, such as 'win' and 'winner'; 'reply' and 'replying'; 'want' and 'wanted'. Each variant gets its own column, which inflates the feature space.
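
As a rough illustration of why such variants are worth merging, the sketch below groups the pairs mentioned above with a deliberately crude suffix stripper. `crude_stem` is a hypothetical helper written for this example only; it is not the Porter algorithm, and real stemmers apply more careful rules that may not merge every pair shown here.

```python
from collections import defaultdict

def crude_stem(token):
    """Strip one common suffix; a toy rule set, not a real stemmer."""
    for suffix in ("ing", "ed", "er", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            token = token[: -len(suffix)]
            break
    # collapse a doubled final consonant, e.g. "winn" -> "win"
    if len(token) >= 2 and token[-1] == token[-2] and token[-1] not in "aeiou":
        token = token[:-1]
    return token

tokens = ["win", "winner", "reply", "replying", "want", "wanted"]

# group tokens that share the same crude stem
groups = defaultdict(list)
for token in tokens:
    groups[crude_stem(token)].append(token)

print(dict(groups))
```

Six vocabulary entries collapse into three groups, which is the effect stemming and lemmatization aim for on the full vocabulary.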

## Stemming

In [48]:
from nltk.stem.porter import PorterStemmer

word = "Singing"

# instantiate porter stemmer
stemmer = PorterStemmer()

# stem word (the stemmer lowercases its input by default)
stemmed = stemmer.stem(word)

# print stemmed word
print(stemmed)

sing


In [50]:
from nltk.stem.snowball import SnowballStemmer

word = "singing"

# instantiate snowball stemmer for English
stemmer = SnowballStemmer("english")

# stem word
stemmed = stemmer.stem(word)

# print stemmed word
print(stemmed)

sing


## Lemmatization

In [None]:
import nltk.downloader as dn
dn.download("wordnet")

In [59]:
from nltk.stem import WordNetLemmatizer

word = "schooling"

# instantiate wordnet lemmatizer
lemmatizer = WordNetLemmatizer()

# lemmatize word; pos="v" tells the lemmatizer to treat the word as a verb
lemmatized = lemmatizer.lemmatize(word, pos="v")

# print lemmatized word
print(lemmatized)

school


In [61]:
Document1= "Vapour, Bangalore has a really great terrace seating and an awesome view of the Bangalore skyline"
Document2= "The beer at Vapour, Bangalore was amazing. My favourites are the wheat beer and the ale beer."
Document3= "Vapour, Bangalore has the best view in Bangalore."

In [64]:
doc1=preprocess(Document1).split(" ")
doc2=preprocess(Document2).split(" ")
doc3=preprocess(Document3).split(" ")


In [75]:
doc=list(set(doc1+doc2+doc3))

In [80]:
doc1

['vapour',
 ',',
 'bangalore',
 'really',
 'great',
 'terrace',
 'seating',
 'awesome',
 'view',
 'bangalore',
 'skyline']
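
Before moving to TF-IDF, it may help to see term frequency computed by hand on `doc1`. This is a small sketch that takes term frequency as raw count divided by the number of tokens, which is one common definition and an assumption here. The token list is copied from the output above, so the comma left behind by `preprocess` is counted as a token too.

```python
from collections import Counter

# tokens of doc1, copied from the output above
doc1 = ['vapour', ',', 'bangalore', 'really', 'great', 'terrace',
        'seating', 'awesome', 'view', 'bangalore', 'skyline']

counts = Counter(doc1)

# term frequency taken here as raw count divided by the number of tokens
tf = {term: count / len(doc1) for term, count in counts.items()}

print(counts['bangalore'], round(tf['bangalore'], 4))
```

Filtering punctuation before counting would change the denominator, so the choice matters when values are compared across documents.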

## Term Frequency & Inverse Document Frequency

In [82]:
import pandas as pd
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer

# consider the following set of documents
documents = ["The coach lumbered on again, with heavier wreaths of mist closing round it as it began the descent.",
             "The guard soon replaced his blunderbuss in his arm-chest, and, having looked to the rest of its contents, and having looked to the supplementary pistols that he wore in his belt, looked to a smaller chest beneath his seat, in which there were a few smith's tools, a couple of torches, and a tinder-box.",
            "For he was furnished with that completeness that if the coach-lamps had been blown and stormed out, which did occasionally happen, he had only to shut himself up inside, keep the flint and steel sparks well off the straw, and get a light with tolerable safety and ease (if he were lucky) in five minutes.",
            "Jerry, left alone in the mist and darkness, dismounted meanwhile, not only to ease his spent horse, but to wipe the mud from his face, and shake the wet out of his hat-brim, which might be capable of holding about half a gallon.",
            "After standing with the bridle over his heavily-splashed arm, until the wheels of the mail were no longer within hearing and the night was quite still again, he turned to walk down the hill."]


# preprocess document
def preprocess(document):
    'changes document to lower case, removes stopwords and stems words'

    # change sentence to lower case
    document = document.lower()

    # tokenize into words
    words = word_tokenize(document)

    # remove stop words (build the set once instead of reloading the list for every token)
    stop_words = set(stopwords.words("english"))
    words = [word for word in words if word not in stop_words]
    
    # stem
    stemmer = PorterStemmer()
    words = [stemmer.stem(word) for word in words]
    
    # join words to make sentence
    document = " ".join(words)
    
    return document

# preprocess documents using the preprocess function and store the documents again in a list
documents = [preprocess(document) for document in documents]


# create tf-idf matrix
vectorizer = TfidfVectorizer()
tfid_model = vectorizer.fit_transform(documents)
print(tfid_model)


# extract the score of 'belt' in document two (row index 1) via its column index in the vocabulary
score = tfid_model[1, vectorizer.vocabulary_['belt']]

# print the score
print(round(score, 4))

  (0, 13)	0.28001127926354535
  (0, 46)	0.34706676322953556
  (0, 32)	0.34706676322953556
  (0, 88)	0.34706676322953556
  (0, 51)	0.28001127926354535
  (0, 12)	0.34706676322953556
  (0, 59)	0.34706676322953556
  (0, 2)	0.34706676322953556
  (0, 18)	0.34706676322953556
  (1, 27)	0.17500574860015006
  (1, 66)	0.17500574860015006
  (1, 57)	0.17500574860015006
  (1, 6)	0.17500574860015006
  (1, 1)	0.1411935360448027
  (1, 11)	0.3500114972003001
  (1, 44)	0.5250172458004502
  (1, 58)	0.17500574860015006
  (1, 15)	0.17500574860015006
  (1, 75)	0.17500574860015006
  (1, 55)	0.17500574860015006
  (1, 87)	0.17500574860015006
  (1, 3)	0.17500574860015006
  (1, 64)	0.17500574860015006
  (1, 4)	0.17500574860015006
  (1, 61)	0.17500574860015006
  :	:
  (3, 62)	0.21666637672403882
  (3, 83)	0.21666637672403882
  (3, 30)	0.21666637672403882
  (3, 9)	0.21666637672403882
  (3, 49)	0.21666637672403882
  (3, 10)	0.21666637672403882
  (3, 35)	0.21666637672403882
  (3, 28)	0.21666637672403882
  (3, 25)	0.2

In [87]:
df= pd.DataFrame(tfid_model.toarray(), columns=vectorizer.get_feature_names())
print(df.shape)
df


(5, 89)


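| | alon | arm | began | belt | beneath | blown | blunderbuss | box | bridl | brim | ... | torch | turn | walk | well | wet | wheel | wipe | within | wore | wreath |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 0.347067 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.347067 |
| 1 | 0.0 | 0.141194 | 0.0 | 0.175006 | 0.175006 | 0.0 | 0.175006 | 0.175006 | 0.0 | 0.0 | ... | 0.175006 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.175006 | 0.0 |
| 2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.20716 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.20716 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 0.216666 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.216666 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.216666 | 0.0 | 0.216666 | 0.0 | 0.0 | 0.0 |
| 4 | 0.0 | 0.203935 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.252773 | 0.0 | ... | 0.0 | 0.252773 | 0.252773 | 0.0 | 0.0 | 0.252773 | 0.0 | 0.252773 | 0.0 | 0.0 |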
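
The weights in this table can be reproduced by hand. The sketch below assumes the settings documented as `TfidfVectorizer`'s defaults: raw term counts, smoothed idf `ln((1 + n) / (1 + df)) + 1`, and L2 normalisation of each row. `tfidf_rows` and `toy_docs` are hypothetical names and data introduced for illustration; the toy corpus is not the one used above.

```python
import math
from collections import Counter

def tfidf_rows(docs):
    """TF-IDF with smoothed idf and L2-normalised rows (assumed defaults)."""
    tokenised = [doc.split() for doc in docs]
    vocabulary = sorted({t for tokens in tokenised for t in tokens})
    n_docs = len(docs)
    # document frequency: number of documents that contain each term
    df = Counter(t for tokens in tokenised for t in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocabulary}
    rows = []
    for tokens in tokenised:
        counts = Counter(tokens)
        weights = [counts[t] * idf[t] for t in vocabulary]
        norm = math.sqrt(sum(w * w for w in weights))
        rows.append([w / norm if norm else 0.0 for w in weights])
    return vocabulary, rows

# hypothetical, already-preprocessed toy corpus
toy_docs = ["coach mist descent", "guard chest belt chest", "coach lamp straw"]
vocabulary, rows = tfidf_rows(toy_docs)

print(vocabulary)
for row in rows:
    print([round(w, 4) for w in row])
```

In the toy corpus, "coach" appears in two documents and therefore scores lower than "mist" within the first document, while "chest" scores twice as high as "belt" because it occurs twice. The same reasoning explains the two distinct values in row 0 of the table: terms found in only one of the five documents get idf `ln(6/2) + 1`, terms found in two get `ln(6/3) + 1`.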


In [135]:
word = "UpGrad"

# define a function that returns the Soundex code of the word that is passed to it
def get_soundex(token):
    'returns the four-character Soundex code of the token'

    codes = {"bfpv": "1", "cgjkqsxz": "2", "dt": "3", "l": "4", "mn": "5", "r": "6"}

    def code_of(letter):
        for letters, digit in codes.items():
            if letter in letters:
                return digit
        return ""  # vowels, h, w, y and non-letters carry no code

    token = token.lower()
    soundex = token[0].upper()
    previous = code_of(token[0])
    for letter in token[1:]:
        digit = code_of(letter)
        # append a digit only when it differs from the previously coded letter
        if digit and digit != previous:
            soundex += digit
        # h and w do not separate letters with the same code; vowels do
        if letter not in "hw":
            previous = digit

    # keep the first four characters and pad with zeros if the code is shorter
    return soundex[:4].ljust(4, "0")

# store soundex in a variable
soundex = get_soundex(word)

# print soundex
print(soundex)

U126


In [110]:
from nltk.metrics.distance import edit_distance

In [112]:
edit_distance("Damerau","Levenshtein",transpositions=True)

10
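
For readers who want to see what sits behind `edit_distance`, here is a minimal dynamic-programming sketch. `edit_distance_sketch` is a hypothetical name; with `transpositions=True` it implements the restricted "optimal string alignment" variant, which is an assumption and is not claimed to match NLTK's result for every input.

```python
def edit_distance_sketch(source, target, transpositions=False):
    rows, cols = len(source) + 1, len(target) + 1
    # dist[i][j] = edits needed to turn source[:i] into target[:j]
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i
    for j in range(cols):
        dist[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # substitution or match
            if (transpositions and i > 1 and j > 1
                    and source[i - 1] == target[j - 2]
                    and source[i - 2] == target[j - 1]):
                # swapping two adjacent characters counts as a single edit
                dist[i][j] = min(dist[i][j], dist[i - 2][j - 2] + 1)
    return dist[-1][-1]

print(edit_distance_sketch("kitten", "sitting"))
print(edit_distance_sketch("ab", "ba"), edit_distance_sketch("ab", "ba", transpositions=True))
```

The second line shows the effect of the flag: without transpositions, turning "ab" into "ba" needs two edits, while allowing an adjacent swap brings it down to one.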

In [132]:
import numpy as np
a=np.arange(0,271,30)
b=np.arange(0,28,3)
c=np.arange(0,298,33)
#print (c)

for a1 in a:
    for b1 in b:
        t=a1+b1
        if t in c:
            print (a1,b1,t)

0 0 0
30 3 33
60 6 66
90 9 99
120 12 132
150 15 165
180 18 198
210 21 231
240 24 264
270 27 297


In [137]:
import nltk

nltk.download('tagsets')

[nltk_data] Downloading package tagsets to C:\Users\admin/nltk_data...
[nltk_data]   Unzipping help\tagsets.zip.


True

In [139]:
nltk.help.upenn_tagset('RB')

RB: adverb
    occasionally unabatingly maddeningly adventurously professedly
    stirringly prominently technologically magisterially predominately
    swiftly fiscally pitilessly ...


In [138]:
import nltk

nltk.help.upenn_tagset()

$: dollar
    $ -$ --$ A$ C$ HK$ M$ NZ$ S$ U.S.$ US$
'': closing quotation mark
    ' ''
(: opening parenthesis
    ( [ {
): closing parenthesis
    ) ] }
,: comma
    ,
--: dash
    --
.: sentence terminator
    . ! ?
:: colon or ellipsis
    : ; ...
CC: conjunction, coordinating
    & 'n and both but either et for less minus neither nor or plus so
    therefore times v. versus vs. whether yet
CD: numeral, cardinal
    mid-1890 nine-thirty forty-two one-tenth ten million 0.5 one forty-
    seven 1987 twenty '79 zero two 78-degrees eighty-four IX '60s .025
    fifteen 271,124 dozen quintillion DM2,000 ...
DT: determiner
    all an another any both del each either every half la many much nary
    neither no some such that the them these this those
EX: existential there
    there
FW: foreign word
    gemeinschaft hund ich jeux habeas Haementeria Herr K'ang-si vous
    lutihaw alai je jour objets salutaris fille quibusdam pas trop Monte
    terram fiche oui corporis ...
IN: preposition or

In [141]:
nltk.download('averaged_perceptron_tagger')

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     C:\Users\admin/nltk_data...
[nltk_data]   Unzipping taggers\averaged_perceptron_tagger.zip.


True

In [142]:
text = word_tokenize("They refuse to permit us to obtain the refuse permit")
nltk.pos_tag(text)

[('They', 'PRP'),
 ('refuse', 'VBP'),
 ('to', 'TO'),
 ('permit', 'VB'),
 ('us', 'PRP'),
 ('to', 'TO'),
 ('obtain', 'VB'),
 ('the', 'DT'),
 ('refuse', 'NN'),
 ('permit', 'NN')]

In [143]:
text = word_tokenize("They refuse the permit us to obtain the refuse permit")
nltk.pos_tag(text)

[('They', 'PRP'),
 ('refuse', 'VBP'),
 ('the', 'DT'),
 ('permit', 'NN'),
 ('us', 'PRP'),
 ('to', 'TO'),
 ('obtain', 'VB'),
 ('the', 'DT'),
 ('refuse', 'NN'),
 ('permit', 'NN')]

In [144]:
text = word_tokenize("They refuse the permit for us but refused permit was not obstacle for us")
nltk.pos_tag(text)

[('They', 'PRP'),
 ('refuse', 'VBP'),
 ('the', 'DT'),
 ('permit', 'NN'),
 ('for', 'IN'),
 ('us', 'PRP'),
 ('but', 'CC'),
 ('refused', 'VBD'),
 ('permit', 'NN'),
 ('was', 'VBD'),
 ('not', 'RB'),
 ('obstacle', 'VBN'),
 ('for', 'IN'),
 ('us', 'PRP')]

In [145]:
text = nltk.Text(word.lower() for word in nltk.corpus.brown.words())
text.similar('woman')

man time day year car moment world house family child country boy
state job place way war girl work word


In [147]:
text.similar('bought')

made said done put had seen found given left heard was been brought
set got that took in told felt


In [146]:
text.similar('over')

in on to of and for with from at by that into as up out down through
is all about


In [148]:
tagged_token = nltk.tag.str2tuple('fly/NN')
tagged_token

('fly', 'NN')

In [149]:
sent = '''
... The/AT grand/JJ jury/NN commented/VBD on/IN a/AT number/NN of/IN
... other/AP topics/NNS ,/, AMONG/IN them/PPO the/AT Atlanta/NP and/CC
... Fulton/NP-tl County/NN-tl purchasing/VBG departments/NNS which/WDT it/PPS
... said/VBD ``/`` ARE/BER well/QL operated/VBN and/CC follow/VB generally/RB
... accepted/VBN practices/NNS which/WDT inure/VB to/IN the/AT best/JJT
... interest/NN of/IN both/ABX governments/NNS ''/'' ./.
... '''

[nltk.tag.str2tuple(t) for t in sent.split()]

[('The', 'AT'),
 ('grand', 'JJ'),
 ('jury', 'NN'),
 ('commented', 'VBD'),
 ('on', 'IN'),
 ('a', 'AT'),
 ('number', 'NN'),
 ('of', 'IN'),
 ('other', 'AP'),
 ('topics', 'NNS'),
 (',', ','),
 ('AMONG', 'IN'),
 ('them', 'PPO'),
 ('the', 'AT'),
 ('Atlanta', 'NP'),
 ('and', 'CC'),
 ('Fulton', 'NP-TL'),
 ('County', 'NN-TL'),
 ('purchasing', 'VBG'),
 ('departments', 'NNS'),
 ('which', 'WDT'),
 ('it', 'PPS'),
 ('said', 'VBD'),
 ('``', '``'),
 ('ARE', 'BER'),
 ('well', 'QL'),
 ('operated', 'VBN'),
 ('and', 'CC'),
 ('follow', 'VB'),
 ('generally', 'RB'),
 ('accepted', 'VBN'),
 ('practices', 'NNS'),
 ('which', 'WDT'),
 ('inure', 'VB'),
 ('to', 'IN'),
 ('the', 'AT'),
 ('best', 'JJT'),
 ('interest', 'NN'),
 ('of', 'IN'),
 ('both', 'ABX'),
 ('governments', 'NNS'),
 ("''", "''"),
 ('.', '.')]

In [150]:
nltk.corpus.brown.tagged_words()

[('The', 'AT'), ('Fulton', 'NP-TL'), ...]

In [151]:
nltk.corpus.brown.tagged_words(tagset='universal')

[('The', 'DET'), ('Fulton', 'NOUN'), ...]

In [152]:
nltk.corpus.conll2000.tagged_words()

[('Confidence', 'NN'), ('in', 'IN'), ('the', 'DT'), ...]

In [153]:
nltk.corpus.treebank.tagged_words()

[('Pierre', 'NNP'), ('Vinken', 'NNP'), (',', ','), ...]

In [155]:
nltk.download('indian')

[nltk_data] Downloading package indian to C:\Users\admin/nltk_data...
[nltk_data]   Unzipping corpora\indian.zip.


True

In [156]:
nltk.corpus.indian.tagged_words()

[('মহিষের', 'NN'), ('সন্তান', 'NN'), (':', 'SYM'), ...]

In [161]:
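# "Hindi" is not a tagset that NLTK can map to, so every tag comes back as 'UNK';
# the Hindi part of the corpus is likely selected with a file id instead, e.g. tagged_words('hindi.pos')
nltk.corpus.indian.tagged_words(tagset="Hindi")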

[('মহিষের', 'UNK'), ('সন্তান', 'UNK'), (':', 'UNK'), ...]

In [158]:
nltk.corpus.indian.tagged_sents()

[[('মহিষের', 'NN'), ('সন্তান', 'NN'), (':', 'SYM'), ('তোড়া', 'NNP'), ('উপজাতি', 'NN'), ('৷', 'SYM')], [('বাসস্থান-ঘরগৃহস্থালি', 'NN'), ('তোড়া', 'NNP'), ('ভাষায়', 'NN'), ('গ্রামকেও', 'NN'), ('বলে', 'VM'), ('`', 'SYM'), ('মোদ', 'NN'), ("'", 'SYM'), ('৷', 'SYM')], ...]

In [162]:
#nltk.corpus.mac_morpho.tagged_words()
#nltk.corpus.conll2002.tagged_words()


In [163]:
from nltk.corpus import brown
brown_news_tagged = brown.tagged_words(categories='news', tagset='universal')
tag_fd = nltk.FreqDist(tag for (word, tag) in brown_news_tagged)
tag_fd.most_common()

[('NOUN', 30654),
 ('VERB', 14399),
 ('ADP', 12355),
 ('.', 11928),
 ('DET', 11389),
 ('ADJ', 6706),
 ('ADV', 3349),
 ('CONJ', 2717),
 ('PRON', 2535),
 ('PRT', 2264),
 ('NUM', 2166),
 ('X', 92)]

In [183]:
word_tag_pairs = nltk.bigrams(brown_news_tagged)
noun_preceders = [a[1] for (a, b) in word_tag_pairs if b[1] == 'NOUN']
fdist = nltk.FreqDist(noun_preceders)
[tag for (tag, _) in fdist.most_common()]

['NOUN',
 'DET',
 'ADJ',
 'ADP',
 '.',
 'VERB',
 'CONJ',
 'NUM',
 'ADV',
 'PRT',
 'PRON',
 'X']

In [197]:
fdist

FreqDist({'NOUN': 7959, 'DET': 7373, 'ADJ': 4761, 'ADP': 3781, '.': 2796, 'VERB': 1842, 'CONJ': 938, 'NUM': 894, 'ADV': 186, 'PRT': 94, ...})
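
The "which tags precede a noun" count above can be mirrored without NLTK, which may make the bigram logic easier to follow. This is a sketch on a tiny hand-tagged sentence; the data and the names `tagged`, `pairs` and `preceders` are hypothetical.

```python
from collections import Counter

# a tiny hand-tagged sentence (hypothetical data, universal-style tags)
tagged = [("the", "DET"), ("grand", "ADJ"), ("jury", "NOUN"),
          ("praised", "VERB"), ("the", "DET"), ("city", "NOUN"),
          ("election", "NOUN"), (".", ".")]

# pair each token with its successor, as nltk.bigrams does
pairs = zip(tagged, tagged[1:])

# keep the tag of the first element whenever the second element is a noun
preceders = Counter(prev_tag for (_, prev_tag), (_, tag) in pairs if tag == "NOUN")

print(preceders.most_common())
```

Here "jury" follows an adjective, "city" follows a determiner and "election" follows another noun, so each of those three tags is counted once.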

In [182]:
wsj = nltk.corpus.treebank.tagged_words(tagset='universal')
word_tag_fd = nltk.FreqDist(wsj)
[wt[0] for (wt, _) in word_tag_fd.most_common() if wt[1] == 'VERB']

['is',
 'said',
 'was',
 'are',
 'be',
 'has',
 'have',
 'will',
 'says',
 'would',
 'were',
 'had',
 'been',
 'could',
 "'s",
 'can',
 'do',
 'say',
 'make',
 'may',
 'did',
 'rose',
 'made',
 'does',
 'expected',
 'buy',
 'take',
 'get',
 'might',
 'sell',
 'added',
 'sold',
 'help',
 'including',
 'should',
 'reported',
 'according',
 'pay',
 'compared',
 'being',
 'fell',
 'began',
 'based',
 'used',
 'closed',
 "'re",
 'want',
 'see',
 'took',
 'yield',
 'offered',
 'set',
 'priced',
 'approved',
 'come',
 'noted',
 'cut',
 'ended',
 'found',
 'increased',
 'become',
 'think',
 'named',
 'go',
 'trying',
 'proposed',
 'received',
 'growing',
 'declined',
 'held',
 'give',
 'came',
 'use',
 'put',
 'making',
 'continue',
 'raise',
 'estimated',
 'called',
 'paid',
 'designed',
 'going',
 'expects',
 'seeking',
 'must',
 'plans',
 'wo',
 'increasing',
 'saying',
 'got',
 'owns',
 'trading',
 'acquired',
 'gained',
 'fined',
 'reached',
 'holding',
 'announced',
 'filed',
 'became',


In [198]:
word_tag_fd

FreqDist({(',', '.'): 4885, ('the', 'DET'): 4038, ('.', '.'): 3828, ('of', 'ADP'): 2319, ('to', 'PRT'): 2161, ('a', 'DET'): 1874, ('in', 'ADP'): 1554, ('and', 'CONJ'): 1505, ('*-1', 'X'): 1123, ('0', 'X'): 1099, ...})

In [169]:
cfd1 = nltk.ConditionalFreqDist(wsj)
cfd1['yield'].most_common()

[('VERB', 28), ('NOUN', 20)]

In [201]:
brown_news_tagged = brown.tagged_words(categories='news', tagset='universal')
data = nltk.ConditionalFreqDist((word.lower(), tag)
                                 for (word, tag) in brown_news_tagged)

for word in sorted(data.conditions()):
    if len(data[word]) > 3:
        tags = [tag for (tag, _) in data[word].most_common()]
        print(word, ' '.join(tags))

best ADJ ADV VERB NOUN
close ADV ADJ VERB NOUN
open ADJ VERB NOUN ADV
present ADJ ADV NOUN VERB
that ADP DET PRON ADV


In [207]:
for word in data.conditions():
    print (word, data[word])

the <FreqDist with 1 samples and 6386 outcomes>
fulton <FreqDist with 1 samples and 14 outcomes>
county <FreqDist with 1 samples and 61 outcomes>
grand <FreqDist with 2 samples and 19 outcomes>
jury <FreqDist with 1 samples and 46 outcomes>
said <FreqDist with 1 samples and 406 outcomes>
friday <FreqDist with 1 samples and 41 outcomes>
an <FreqDist with 1 samples and 311 outcomes>
investigation <FreqDist with 1 samples and 11 outcomes>
of <FreqDist with 1 samples and 2861 outcomes>
atlanta's <FreqDist with 1 samples and 4 outcomes>
recent <FreqDist with 1 samples and 20 outcomes>
primary <FreqDist with 2 samples and 17 outcomes>
election <FreqDist with 1 samples and 41 outcomes>
produced <FreqDist with 1 samples and 6 outcomes>
`` <FreqDist with 1 samples and 732 outcomes>
no <FreqDist with 2 samples and 120 outcomes>
evidence <FreqDist with 1 samples and 17 outcomes>
'' <FreqDist with 1 samples and 702 outcomes>
that <FreqDist with 4 samples and 829 outcomes>
any <FreqDist with 1 samp

sponsor <FreqDist with 2 samples and 4 outcomes>
enact <FreqDist with 1 samples and 1 outcomes>
amount <FreqDist with 2 samples and 18 outcomes>
gift <FreqDist with 1 samples and 6 outcomes>
taxpayers' <FreqDist with 1 samples and 2 outcomes>
pockets <FreqDist with 1 samples and 2 outcomes>
contention <FreqDist with 1 samples and 2 outcomes>
denied <FreqDist with 1 samples and 5 outcomes>
several <FreqDist with 1 samples and 39 outcomes>
including <FreqDist with 1 samples and 25 outcomes>
scott <FreqDist with 1 samples and 3 outcomes>
hudson <FreqDist with 1 samples and 2 outcomes>
gaynor <FreqDist with 1 samples and 1 outcomes>
jones <FreqDist with 1 samples and 22 outcomes>
houston <FreqDist with 1 samples and 15 outcomes>
brady <FreqDist with 1 samples and 1 outcomes>
harlingen <FreqDist with 1 samples and 1 outcomes>
howard <FreqDist with 1 samples and 13 outcomes>
cox <FreqDist with 1 samples and 4 outcomes>
argued <FreqDist with 1 samples and 7 outcomes>
probably <FreqDist with 1

trial <FreqDist with 1 samples and 21 outcomes>
indicating <FreqDist with 1 samples and 2 outcomes>
guilt <FreqDist with 1 samples and 3 outcomes>
arrest <FreqDist with 1 samples and 6 outcomes>
parsons <FreqDist with 1 samples and 3 outcomes>
criminal <FreqDist with 2 samples and 6 outcomes>
disclosure <FreqDist with 1 samples and 2 outcomes>
bellows <FreqDist with 1 samples and 5 outcomes>
defense <FreqDist with 1 samples and 23 outcomes>
counsel <FreqDist with 1 samples and 2 outcomes>
startled <FreqDist with 1 samples and 1 outcomes>
observers <FreqDist with 1 samples and 2 outcomes>
viewed <FreqDist with 1 samples and 4 outcomes>
prelude <FreqDist with 1 samples and 1 outcomes>
quarrel <FreqDist with 1 samples and 2 outcomes>
six <FreqDist with 2 samples and 29 outcomes>
eight <FreqDist with 1 samples and 32 outcomes>
policemen <FreqDist with 1 samples and 5 outcomes>
grant <FreqDist with 2 samples and 15 outcomes>
client <FreqDist with 1 samples and 4 outcomes>
alan <FreqDist wit

detailed <FreqDist with 1 samples and 5 outcomes>
application <FreqDist with 1 samples and 2 outcomes>
individual <FreqDist with 2 samples and 21 outcomes>
spots <FreqDist with 1 samples and 2 outcomes>
speech <FreqDist with 1 samples and 7 outcomes>
gave <FreqDist with 1 samples and 22 outcomes>
tremendous <FreqDist with 1 samples and 7 outcomes>
events <FreqDist with 1 samples and 4 outcomes>
inside <FreqDist with 2 samples and 5 outcomes>
preoccupied <FreqDist with 1 samples and 2 outcomes>
months <FreqDist with 1 samples and 42 outcomes>
core <FreqDist with 1 samples and 4 outcomes>
reiterated <FreqDist with 1 samples and 1 outcomes>
states' <FreqDist with 1 samples and 3 outcomes>
profound <FreqDist with 1 samples and 1 outcomes>
attachment <FreqDist with 1 samples and 2 outcomes>
cornerstone <FreqDist with 1 samples and 1 outcomes>
nuclear <FreqDist with 1 samples and 14 outcomes>
submarines <FreqDist with 1 samples and 5 outcomes>
eventually <FreqDist with 1 samples and 4 outcom

authorized <FreqDist with 1 samples and 4 outcomes>
adopt <FreqDist with 1 samples and 3 outcomes>
nothing <FreqDist with 1 samples and 10 outcomes>
sixth <FreqDist with 1 samples and 11 outcomes>
disposition <FreqDist with 1 samples and 2 outcomes>
hesitated <FreqDist with 1 samples and 1 outcomes>
prosecute <FreqDist with 1 samples and 1 outcomes>
heavy <FreqDist with 1 samples and 10 outcomes>
simplest <FreqDist with 1 samples and 1 outcomes>
offense <FreqDist with 1 samples and 4 outcomes>
plainfield <FreqDist with 1 samples and 3 outcomes>
mitchell <FreqDist with 1 samples and 16 outcomes>
walter <FreqDist with 1 samples and 6 outcomes>
r-bergen <FreqDist with 1 samples and 1 outcomes>
value <FreqDist with 1 samples and 14 outcomes>
using <FreqDist with 1 samples and 14 outcomes>
remark <FreqDist with 1 samples and 3 outcomes>
campaigning <FreqDist with 1 samples and 2 outcomes>
carcass <FreqDist with 1 samples and 3 outcomes>
republicanism <FreqDist with 1 samples and 3 outcomes>

weather <FreqDist with 1 samples and 6 outcomes>
halfway <FreqDist with 1 samples and 1 outcomes>
decent <FreqDist with 1 samples and 2 outcomes>
hundreds <FreqDist with 1 samples and 5 outcomes>
mass <FreqDist with 2 samples and 4 outcomes>
thoroughfare <FreqDist with 1 samples and 1 outcomes>
dwight <FreqDist with 1 samples and 5 outcomes>
leave <FreqDist with 1 samples and 6 outcomes>
oath-taking <FreqDist with 1 samples and 1 outcomes>
ceremonies <FreqDist with 1 samples and 6 outcomes>
ride <FreqDist with 2 samples and 6 outcomes>
historic <FreqDist with 1 samples and 6 outcomes>
ceremonial <FreqDist with 1 samples and 1 outcomes>
impressive <FreqDist with 1 samples and 4 outcomes>
street <FreqDist with 1 samples and 30 outcomes>
columbia <FreqDist with 1 samples and 1 outcomes>
standpoint <FreqDist with 1 samples and 2 outcomes>
viewpoint <FreqDist with 1 samples and 1 outcomes>
approach <FreqDist with 2 samples and 10 outcomes>
buildings <FreqDist with 1 samples and 6 outcomes>


$25-a-plate <FreqDist with 1 samples and 1 outcomes>
dinner <FreqDist with 1 samples and 23 outcomes>
honoring <FreqDist with 1 samples and 3 outcomes>
organized <FreqDist with 1 samples and 6 outcomes>
p.m. <FreqDist with 1 samples and 41 outcomes>
roosevelt <FreqDist with 1 samples and 3 outcomes>
4:30 <FreqDist with 1 samples and 2 outcomes>
blaine <FreqDist with 1 samples and 1 outcomes>
whipple <FreqDist with 1 samples and 1 outcomes>
oregon <FreqDist with 1 samples and 4 outcomes>
speakers <FreqDist with 1 samples and 3 outcomes>
fund-raising <FreqDist with 1 samples and 2 outcomes>
edith <FreqDist with 1 samples and 1 outcomes>
al <FreqDist with 2 samples and 10 outcomes>
ullman <FreqDist with 1 samples and 1 outcomes>
norman <FreqDist with 1 samples and 9 outcomes>
nilsen <FreqDist with 1 samples and 1 outcomes>
terry <FreqDist with 1 samples and 5 outcomes>
schrunk <FreqDist with 1 samples and 1 outcomes>
oak <FreqDist with 1 samples and 5 outcomes>
grove <FreqDist with 1 samp

gannon <FreqDist with 1 samples and 4 outcomes>
magnificent <FreqDist with 1 samples and 2 outcomes>
interference <FreqDist with 1 samples and 3 outcomes>
stops <FreqDist with 1 samples and 2 outcomes>
fumble <FreqDist with 1 samples and 1 outcomes>
fullback <FreqDist with 1 samples and 2 outcomes>
nick <FreqDist with 1 samples and 2 outcomes>
arshinkoff <FreqDist with 1 samples and 1 outcomes>
loose <FreqDist with 2 samples and 4 outcomes>
contributed <FreqDist with 1 samples and 5 outcomes>
falcons' <FreqDist with 1 samples and 1 outcomes>
aerial <FreqDist with 1 samples and 3 outcomes>
thrusts <FreqDist with 1 samples and 1 outcomes>
fourth-down <FreqDist with 1 samples and 1 outcomes>
screen <FreqDist with 1 samples and 3 outcomes>
mustang <FreqDist with 1 samples and 1 outcomes>
incomplete <FreqDist with 1 samples and 1 outcomes>
gannon's <FreqDist with 1 samples and 1 outcomes>
spotted <FreqDist with 1 samples and 4 outcomes>
isaacson <FreqDist with 1 samples and 1 outcomes>
cruc

self-sacrifice <FreqDist with 1 samples and 1 outcomes>
nor <FreqDist with 1 samples and 15 outcomes>
yen <FreqDist with 1 samples and 1 outcomes>
downtrodden <FreqDist with 1 samples and 1 outcomes>
motivated <FreqDist with 1 samples and 1 outcomes>
victimized <FreqDist with 1 samples and 1 outcomes>
respects <FreqDist with 1 samples and 2 outcomes>
aggravates <FreqDist with 1 samples and 1 outcomes>
golfer <FreqDist with 1 samples and 2 outcomes>
shooting <FreqDist with 2 samples and 2 outcomes>
below <FreqDist with 1 samples and 7 outcomes>
par <FreqDist with 1 samples and 7 outcomes>
delivering <FreqDist with 1 samples and 2 outcomes>
crusher <FreqDist with 1 samples and 1 outcomes>
boomed <FreqDist with 1 samples and 1 outcomes>
280-yard <FreqDist with 1 samples and 1 outcomes>
pixies <FreqDist with 1 samples and 1 outcomes>
zombies <FreqDist with 1 samples and 1 outcomes>
banshees <FreqDist with 1 samples and 1 outcomes>
wailed <FreqDist with 1 samples and 1 outcomes>
margin <Fre

wacker <FreqDist with 1 samples and 1 outcomes>
frau <FreqDist with 1 samples and 1 outcomes>
jana <FreqDist with 1 samples and 1 outcomes>
mason <FreqDist with 1 samples and 5 outcomes>
ex-singer <FreqDist with 1 samples and 1 outcomes>
wackers' <FreqDist with 1 samples and 1 outcomes>
goodness <FreqDist with 1 samples and 1 outcomes>
sake <FreqDist with 1 samples and 1 outcomes>
emcee <FreqDist with 1 samples and 1 outcomes>
herbert <FreqDist with 1 samples and 2 outcomes>
nixon's <FreqDist with 1 samples and 1 outcomes>
slogan <FreqDist with 1 samples and 1 outcomes>
knight <FreqDist with 1 samples and 1 outcomes>
generously <FreqDist with 1 samples and 2 outcomes>
buy <FreqDist with 1 samples and 10 outcomes>
candy <FreqDist with 1 samples and 3 outcomes>
brain <FreqDist with 1 samples and 2 outcomes>
worthiest <FreqDist with 1 samples and 1 outcomes>
charities <FreqDist with 1 samples and 2 outcomes>
darlin' <FreqDist with 1 samples and 1 outcomes>
dazzler <FreqDist with 1 samples

panels <FreqDist with 1 samples and 5 outcomes>
tomato-red <FreqDist with 1 samples and 1 outcomes>
jordan <FreqDist with 1 samples and 1 outcomes>
taffeta <FreqDist with 1 samples and 2 outcomes>
frock <FreqDist with 1 samples and 1 outcomes>
fringed <FreqDist with 1 samples and 1 outcomes>
tiers <FreqDist with 1 samples and 1 outcomes>
crimson <FreqDist with 1 samples and 1 outcomes>
silk <FreqDist with 1 samples and 3 outcomes>
slippers <FreqDist with 1 samples and 1 outcomes>
maids <FreqDist with 1 samples and 1 outcomes>
greenish <FreqDist with 1 samples and 1 outcomes>
fenwick <FreqDist with 1 samples and 1 outcomes>
ashes <FreqDist with 1 samples and 1 outcomes>
roses <FreqDist with 1 samples and 1 outcomes>
slipper <FreqDist with 1 samples and 1 outcomes>
feringa <FreqDist with 1 samples and 1 outcomes>
achaeans' <FreqDist with 1 samples and 1 outcomes>
eggshell <FreqDist with 1 samples and 1 outcomes>
filmy <FreqDist with 1 samples and 1 outcomes>
dress <FreqDist with 1 sample

havana <FreqDist with 1 samples and 11 outcomes>
cubans <FreqDist with 1 samples and 4 outcomes>
executed <FreqDist with 1 samples and 2 outcomes>
firing <FreqDist with 1 samples and 1 outcomes>
squads <FreqDist with 1 samples and 1 outcomes>
tribunals <FreqDist with 1 samples and 1 outcomes>
decreeing <FreqDist with 1 samples and 1 outcomes>
captured <FreqDist with 1 samples and 1 outcomes>
suspected <FreqDist with 1 samples and 1 outcomes>
collaborators <FreqDist with 1 samples and 1 outcomes>
mcnair <FreqDist with 1 samples and 3 outcomes>
executions <FreqDist with 1 samples and 2 outcomes>
dawn <FreqDist with 1 samples and 1 outcomes>
revolutionary <FreqDist with 1 samples and 2 outcomes>
tribunal <FreqDist with 1 samples and 1 outcomes>
pinar <FreqDist with 1 samples and 2 outcomes>
del <FreqDist with 1 samples and 2 outcomes>
rio <FreqDist with 1 samples and 2 outcomes>
plot <FreqDist with 1 samples and 4 outcomes>
seattle <FreqDist with 1 samples and 3 outcomes>
ex-marine <FreqD

perasso <FreqDist with 1 samples and 1 outcomes>
petrini <FreqDist with 1 samples and 1 outcomes>
ratto <FreqDist with 1 samples and 1 outcomes>
reilly <FreqDist with 1 samples and 1 outcomes>
schweitzer <FreqDist with 1 samples and 2 outcomes>
world-famous <FreqDist with 1 samples and 2 outcomes>
theologian <FreqDist with 1 samples and 1 outcomes>
endorsed <FreqDist with 1 samples and 1 outcomes>
disarmament <FreqDist with 1 samples and 3 outcomes>
quaker <FreqDist with 1 samples and 1 outcomes>
newer <FreqDist with 1 samples and 1 outcomes>
weapons <FreqDist with 1 samples and 6 outcomes>
otherwise <FreqDist with 1 samples and 5 outcomes>
dread <FreqDist with 1 samples and 1 outcomes>
imply <FreqDist with 1 samples and 2 outcomes>
grisly <FreqDist with 1 samples and 1 outcomes>
abolish <FreqDist with 1 samples and 2 outcomes>
obligated <FreqDist with 1 samples and 1 outcomes>
aware <FreqDist with 1 samples and 2 outcomes>
ghastly <FreqDist with 1 samples and 1 outcomes>
stupidity <Fr

contemporary <FreqDist with 1 samples and 6 outcomes>
danish <FreqDist with 1 samples and 1 outcomes>
straight-line <FreqDist with 1 samples and 1 outcomes>
sculptured <FreqDist with 1 samples and 2 outcomes>
facets <FreqDist with 1 samples and 1 outcomes>
warmth <FreqDist with 1 samples and 3 outcomes>
dignity <FreqDist with 1 samples and 2 outcomes>
utter <FreqDist with 1 samples and 1 outcomes>
livability <FreqDist with 1 samples and 1 outcomes>
avant <FreqDist with 1 samples and 1 outcomes>
garde <FreqDist with 1 samples and 1 outcomes>
indication <FreqDist with 1 samples and 3 outcomes>
boxy <FreqDist with 1 samples and 1 outcomes>
'20's <FreqDist with 1 samples and 1 outcomes>
'40's <FreqDist with 1 samples and 1 outcomes>
correlated <FreqDist with 1 samples and 1 outcomes>
sanger-harris <FreqDist with 1 samples and 1 outcomes>
woods <FreqDist with 1 samples and 4 outcomes>
assembled <FreqDist with 1 samples and 1 outcomes>
perennian <FreqDist with 1 samples and 2 outcomes>
lasti

self-plagiarisms <FreqDist with 1 samples and 1 outcomes>
totally <FreqDist with 1 samples and 3 outcomes>
kirov's <FreqDist with 1 samples and 1 outcomes>
utterly <FreqDist with 1 samples and 1 outcomes>
captivating <FreqDist with 1 samples and 1 outcomes>
precise <FreqDist with 1 samples and 1 outcomes>
enchantment <FreqDist with 1 samples and 1 outcomes>
numerous <FreqDist with 1 samples and 2 outcomes>
ova <FreqDist with 1 samples and 1 outcomes>
eva <FreqDist with 1 samples and 1 outcomes>
aya <FreqDist with 1 samples and 1 outcomes>
exclusively <FreqDist with 1 samples and 1 outcomes>
winners <FreqDist with 1 samples and 1 outcomes>
contests <FreqDist with 1 samples and 1 outcomes>
omsk <FreqDist with 1 samples and 1 outcomes>
pinsk <FreqDist with 1 samples and 1 outcomes>
stalingr <FreqDist with 1 samples and 1 outcomes>
oops <FreqDist with 1 samples and 1 outcomes>
discover <FreqDist with 1 samples and 1 outcomes>
crowning <FreqDist with 1 samples and 1 outcomes>
virtue <FreqDi

conclusions <FreqDist with 1 samples and 1 outcomes>
destroying <FreqDist with 1 samples and 1 outcomes>
students' <FreqDist with 1 samples and 1 outcomes>
familiar <FreqDist with 1 samples and 1 outcomes>
useful <FreqDist with 1 samples and 1 outcomes>
artificial <FreqDist with 1 samples and 2 outcomes>
busy-work <FreqDist with 1 samples and 1 outcomes>
ersatz <FreqDist with 1 samples and 1 outcomes>
structured <FreqDist with 1 samples and 2 outcomes>
luncheon-table <FreqDist with 1 samples and 1 outcomes>
difficulty <FreqDist with 1 samples and 3 outcomes>
i.e. <FreqDist with 1 samples and 1 outcomes>
administers <FreqDist with 1 samples and 1 outcomes>
eating <FreqDist with 1 samples and 1 outcomes>
nebraska <FreqDist with 1 samples and 1 outcomes>
anti-monopoly <FreqDist with 1 samples and 4 outcomes>
fallacious <FreqDist with 1 samples and 1 outcomes>
argue <FreqDist with 1 samples and 2 outcomes>
restricted <FreqDist with 1 samples and 1 outcomes>
restriction <FreqDist with 1 sam

disappointing <FreqDist with 1 samples and 1 outcomes>
one-over-par <FreqDist with 1 samples and 1 outcomes>
73 <FreqDist with 1 samples and 2 outcomes>
intact <FreqDist with 1 samples and 2 outcomes>
three-round <FreqDist with 1 samples and 2 outcomes>
210 <FreqDist with 1 samples and 1 outcomes>
213 <FreqDist with 1 samples and 1 outcomes>
deliberate <FreqDist with 1 samples and 1 outcomes>
baltimorean <FreqDist with 1 samples and 1 outcomes>
holed <FreqDist with 1 samples and 1 outcomes>
206 <FreqDist with 1 samples and 1 outcomes>
erratic <FreqDist with 1 samples and 2 outcomes>
washed-out <FreqDist with 1 samples and 1 outcomes>
meteorological <FreqDist with 1 samples and 1 outcomes>
mano <FreqDist with 1 samples and 2 outcomes>
spontaneously <FreqDist with 1 samples and 1 outcomes>
sight <FreqDist with 1 samples and 1 outcomes>
prestige <FreqDist with 1 samples and 1 outcomes>
pensacola <FreqDist with 1 samples and 3 outcomes>
winnings <FreqDist with 1 samples and 2 outcomes>
pai

grads <FreqDist with 1 samples and 2 outcomes>
statesmen <FreqDist with 1 samples and 1 outcomes>
conant <FreqDist with 1 samples and 1 outcomes>
pondered <FreqDist with 1 samples and 1 outcomes>
successors <FreqDist with 1 samples and 1 outcomes>
harvard's <FreqDist with 1 samples and 1 outcomes>
mcgeorge <FreqDist with 1 samples and 1 outcomes>
bundy <FreqDist with 1 samples and 1 outcomes>
trustee <FreqDist with 1 samples and 1 outcomes>
lure <FreqDist with 1 samples and 1 outcomes>
scholars <FreqDist with 1 samples and 1 outcomes>
happily <FreqDist with 1 samples and 1 outcomes>
caltech's <FreqDist with 1 samples and 1 outcomes>
geneticist <FreqDist with 1 samples and 2 outcomes>
beadle <FreqDist with 1 samples and 3 outcomes>
57 <FreqDist with 1 samples and 2 outcomes>
physiology <FreqDist with 1 samples and 1 outcomes>
discovering <FreqDist with 1 samples and 1 outcomes>
genes <FreqDist with 1 samples and 1 outcomes>
heredity <FreqDist with 1 samples and 1 outcomes>
nine-year <Fr
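
#### Reading the `FreqDist` summaries

Each line of the output above pairs a word with a summary of its tag distribution: "samples" is the number of distinct tags seen for that word, and "outcomes" is the total number of times the word occurred. The sketch below reproduces that bookkeeping with the standard library only, so it runs without NLTK or any corpus download. The toy `tagged_words` list and the `describe` helper are illustrative assumptions, not part of the notebook.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (word, tag) pairs standing in for a tagged corpus.
tagged_words = [
    ("ride", "NOUN"), ("ride", "VERB"), ("ride", "VERB"),
    ("street", "NOUN"), ("street", "NOUN"),
]

# Stdlib stand-in for a conditional frequency distribution:
# one Counter of tag frequencies per (lower-cased) word.
tag_counts = defaultdict(Counter)
for word, tag in tagged_words:
    tag_counts[word.lower()][tag] += 1

def describe(word):
    """Return (samples, outcomes): distinct tags seen, total occurrences."""
    counts = tag_counts[word]
    return len(counts), sum(counts.values())

for word in list(tag_counts):
    samples, outcomes = describe(word)
    print(f"{word}: {samples} samples and {outcomes} outcomes")
```

A word with more than one sample, such as `ride` here, is ambiguous between tags; a word with one sample always received the same tag in the data.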

In [206]:
# for each word (condition), list its POS tags with counts, most frequent first
for word in data.conditions():
    print(word, data[word].most_common())

the [('DET', 6386)]
fulton [('NOUN', 14)]
county [('NOUN', 61)]
grand [('ADJ', 18), ('X', 1)]
jury [('NOUN', 46)]
said [('VERB', 406)]
friday [('NOUN', 41)]
an [('DET', 311)]
investigation [('NOUN', 11)]
of [('ADP', 2861)]
atlanta's [('NOUN', 4)]
recent [('ADJ', 20)]
primary [('NOUN', 13), ('ADJ', 4)]
election [('NOUN', 41)]
produced [('VERB', 6)]
`` [('.', 732)]
no [('DET', 112), ('ADV', 8)]
evidence [('NOUN', 17)]
'' [('.', 702)]
that [('ADP', 546), ('DET', 150), ('PRON', 128), ('ADV', 5)]
any [('DET', 94)]
irregularities [('NOUN', 3)]
took [('VERB', 47)]
place [('NOUN', 28), ('VERB', 5)]
. [('.', 4030)]
further [('ADJ', 12), ('ADV', 5), ('VERB', 1)]
in [('ADP', 1974), ('PRT', 46)]
term-end [('NOUN', 1)]
presentments [('NOUN', 1)]
city [('NOUN', 93)]
executive [('NOUN', 16), ('ADJ', 2)]
committee [('NOUN', 75)]
, [('.', 5188)]
which [('DET', 245)]
had [('VERB', 281)]
over-all [('ADJ', 2)]
charge [('NOUN', 17), ('VERB', 1)]
deserves [('VERB', 3)]
praise [('NOUN', 2)]
and [('CONJ', 218

during [('ADP', 60)]
reportedly [('ADV', 4)]
telephone [('NOUN', 8)]
too [('ADV', 37)]
subjected [('VERB', 2)]
soon [('ADV', 17)]
scheduled [('VERB', 14)]
local [('ADJ', 37), ('NOUN', 8)]
feared [('VERB', 1)]
carry [('VERB', 11)]
gun [('NOUN', 2)]
promised [('VERB', 4)]
sheriff [('NOUN', 5)]
tabb [('NOUN', 1)]
good [('ADJ', 47), ('NOUN', 3), ('ADV', 1)]
promise [('VERB', 2), ('NOUN', 1)]
everything [('NOUN', 5)]
went [('VERB', 37)]
real [('ADJ', 21), ('ADV', 3)]
smooth [('ADJ', 1)]
wasn't [('VERB', 3)]
austin [('NOUN', 15)]
approval [('NOUN', 8)]
price [('NOUN', 16)]
daniel's [('NOUN', 1)]
abandoned [('VERB', 3)]
seemed [('VERB', 12)]
certain [('ADJ', 19)]
thursday [('NOUN', 20)]
adamant [('ADJ', 2)]
protests [('NOUN', 1)]
bankers [('NOUN', 11)]
daniel [('NOUN', 5)]
personally [('ADV', 3)]
led [('VERB', 18)]
fight [('NOUN', 11), ('VERB', 3)]
measure [('NOUN', 12), ('VERB', 2)]
watered [('VERB', 1)]
down [('PRT', 41), ('ADP', 9), ('NOUN', 1)]
considerably [('ADV', 5)]
rejection [('NOUN'

degree [('NOUN', 9)]
physics [('NOUN', 1)]
chemistry [('NOUN', 2)]
math [('NOUN', 1)]
english [('NOUN', 5), ('ADJ', 1)]
permitted [('VERB', 5)]
teach [('VERB', 5)]
fifty-three [('NUM', 1)]
150 [('NUM', 3)]
immediately [('ADV', 11)]
joined [('VERB', 11)]
co-signers [('NOUN', 1)]
sp. [('NOUN', 2)]
regents [('NOUN', 1)]
named [('VERB', 10)]
dr. [('NOUN', 42)]
clarence [('NOUN', 3)]
clark [('NOUN', 6)]
hays [('NOUN', 2)]
kan. [('NOUN', 2)]
school's [('NOUN', 2)]
president [('NOUN', 142)]
succeed [('VERB', 3)]
mclemore [('NOUN', 1)]
retire [('VERB', 2)]
close [('ADV', 6), ('ADJ', 3), ('VERB', 2), ('NOUN', 1)]
holds [('VERB', 5)]
earned [('VERB', 7)]
university [('NOUN', 70)]
oklahoma [('NOUN', 8)]
master [('NOUN', 5), ('ADJ', 1)]
science [('NOUN', 10)]
& [('CONJ', 34)]
bachelor [('NOUN', 2)]
southwestern [('ADJ', 2)]
okla. [('NOUN', 3)]
addition [('NOUN', 12)]
rhode [('NOUN', 9)]
island [('NOUN', 19)]
massachusetts [('NOUN', 9)]
institute [('NOUN', 6), ('VERB', 1)]
technology [('NOUN', 7)]


setback [('NOUN', 2)]
affirmation [('NOUN', 1)]
once [('ADV', 20), ('ADP', 2)]
again [('ADV', 32)]
whole [('ADJ', 9), ('NOUN', 2)]
surveyed [('VERB', 2)]
secretary's [('NOUN', 1)]
greatest [('ADJ', 9)]
achievement [('NOUN', 15)]
perhaps [('ADV', 14)]
rekindling [('NOUN', 1)]
realization [('NOUN', 1)]
east-west [('ADJ', 3)]
friction [('NOUN', 2)]
wherever [('ADV', 1)]
around [('ADV', 16), ('ADP', 14)]
globe [('NOUN', 1)]
essence [('NOUN', 1)]
entirely [('ADV', 2)]
different [('ADJ', 8)]
societies [('NOUN', 2)]
treated [('VERB', 10)]
regard [('NOUN', 3), ('VERB', 1)]
geographical [('ADJ', 1)]
distance [('NOUN', 2)]
lack [('NOUN', 4), ('VERB', 1)]
apparent [('ADJ', 7)]
spring [('NOUN', 16)]
impetus [('NOUN', 1)]
main [('ADJ', 4), ('NOUN', 1)]
directions [('NOUN', 1)]
deeper [('ADJ', 3)]
timely [('ADJ', 2)]
consultation [('NOUN', 2)]
within [('ADP', 21)]
economic [('ADJ', 18)]
cooperation [('NOUN', 6)]
ratified [('VERB', 2)]
method [('NOUN', 3)]
coordinating [('VERB', 2)]
underdeveloped [(

intentions [('NOUN', 6)]
understand [('VERB', 5)]
seeking [('VERB', 7)]
position [('NOUN', 17)]
life [('NOUN', 20)]
demonstrate [('VERB', 3)]
judgment [('NOUN', 5)]
bad [('ADJ', 5)]
taste [('NOUN', 5)]
vicious [('ADJ', 1)]
origin [('NOUN', 1)]
desire [('NOUN', 4)]
try [('VERB', 12)]
condemning [('VERB', 1)]
stature [('NOUN', 2)]
rebound [('VERB', 1)]
discredit [('NOUN', 1)]
sees [('VERB', 2)]
ahead [('ADV', 10)]
sandman [('NOUN', 5)]
r-cape [('NOUN', 1)]
nomination [('NOUN', 6)]
addressing [('VERB', 1)]
newark [('NOUN', 1)]
essex [('NOUN', 2)]
leaders [('NOUN', 23)]
managers [('NOUN', 8)]
gathering [('NOUN', 2), ('VERB', 1)]
indicate [('VERB', 3)]
chosen [('VERB', 6)]
party's [('NOUN', 1)]
nominee [('NOUN', 2)]
majority [('NOUN', 5)]
announcement [('NOUN', 9)]
clifford [('NOUN', 2)]
spend [('VERB', 9)]
giveaway [('NOUN', 1)]
desperate [('ADJ', 2)]
prop [('VERB', 1)]
sagging [('VERB', 2)]
proven [('VERB', 1)]
answer [('NOUN', 5), ('VERB', 4)]
jersey's [('NOUN', 1)]
witnessed [('VERB', 1

commitments [('NOUN', 3)]
eight-year [('ADJ', 1)]
quest [('NOUN', 1)]
decide [('VERB', 3)]
throw [('VERB', 6), ('NOUN', 2)]
nobody [('NOUN', 4)]
lawmakers [('NOUN', 2)]
enjoy [('VERB', 5)]
re-enactment [('NOUN', 1)]
strange [('ADJ', 1)]
honeymoon [('NOUN', 2), ('VERB', 1)]
doesn't [('VERB', 12)]
decade [('NOUN', 5)]
odds [('NOUN', 2)]
restless [('ADJ', 2)]
companionship [('NOUN', 1)]
$22.50 [('NOUN', 1)]
diem [('NOUN', 1)]
agitating [('VERB', 1)]
withstand [('VERB', 1)]
sensitive [('ADJ', 3)]
cutting [('VERB', 4)]
seat [('NOUN', 3)]
eyes [('NOUN', 5)]
focused [('VERB', 1)]
delta [('NOUN', 5)]
congressman [('NOUN', 4)]
smith [('NOUN', 21)]
redistricting [('VERB', 1)]
longstanding [('ADJ', 1)]
mississippi's [('NOUN', 2)]
crossroads [('NOUN', 1)]
split [('VERB', 5)]
badly [('ADV', 4)]
equally [('ADV', 5)]
divided [('VERB', 4)]
camps [('NOUN', 2)]
loyalists [('NOUN', 1)]
independents [('NOUN', 1)]
currently [('ADV', 6)]
wreck [('NOUN', 1)]
clouded [('VERB', 2)]
titular [('ADJ', 1)]
reestab

well-established [('ADJ', 1)]
dist. [('NOUN', 1)]
powell [('NOUN', 6)]
motions [('NOUN', 4)]
fraud [('NOUN', 2)]
denials [('NOUN', 1)]
dismissal [('NOUN', 2)]
mistrial [('NOUN', 2)]
acquittal [('NOUN', 2)]
striking [('VERB', 3), ('NOUN', 1)]
verdict [('NOUN', 13)]
denying [('VERB', 1)]
trials [('NOUN', 1)]
upheld [('VERB', 3)]
conspirators [('NOUN', 2)]
schwab [('NOUN', 2)]
defendant [('NOUN', 3)]
philip [('NOUN', 2)]
weinstein [('NOUN', 2)]
linking [('VERB', 1)]
proof [('NOUN', 3)]
weinstein's [('NOUN', 1)]
mails [('NOUN', 3)]
defraud [('VERB', 2)]
burbank [('NOUN', 1)]
conpired [('VERB', 1)]
deferred [('VERB', 1)]
miami [('NOUN', 11)]
fla. [('NOUN', 8)]
orioles [('NOUN', 8)]
distinction [('NOUN', 3)]
winless [('ADJ', 2)]
major-league [('NOUN', 3)]
dropped [('VERB', 11)]
straight [('ADJ', 13), ('ADV', 4)]
exhibition [('NOUN', 8)]
athletics [('NOUN', 3)]
5 [('NUM', 14)]
indications [('NOUN', 1)]
late [('ADJ', 21), ('ADV', 8)]
birds [('NOUN', 10)]
draught [('NOUN', 1)]
coasted [('VERB',

passes [('NOUN', 4), ('VERB', 2)]
bang [('PRT', 2)]
touchdowns [('NOUN', 1)]
strikes [('NOUN', 2)]
tactic [('NOUN', 2)]
controlling [('VERB', 3)]
giving [('VERB', 6)]
abner [('NOUN', 1)]
haynes [('NOUN', 1)]
flashy [('ADJ', 1)]
ball-carriers [('NOUN', 1)]
delivered [('VERB', 8)]
145 [('NUM', 1)]
comforting [('VERB', 1)]
denver's [('NOUN', 1)]
carmichael [('NOUN', 1)]
jarred [('VERB', 1)]
grayson [('NOUN', 1)]
speedy [('ADJ', 1)]
claimed [('VERB', 3)]
resulted [('VERB', 6)]
quipping [('VERB', 1)]
book [('NOUN', 17)]
dent [('NOUN', 1)]
statistics [('NOUN', 2)]
545-yard [('ADJ', 1)]
spree [('NOUN', 2)]
3-game [('ADJ', 1)]
1,512 [('NUM', 1)]
1,065 [('NUM', 1)]
swc [('NOUN', 1)]
280 [('NUM', 2)]
64 [('NUM', 2)]
tosses [('NOUN', 1)]
tough [('ADJ', 4)]
tcu [('NOUN', 1)]
38-point [('ADJ', 1)]
bulge [('NOUN', 1)]
loop [('NOUN', 3)]
174 [('NUM', 1)]
361 [('NUM', 1)]
leads [('VERB', 1)]
per-game [('ADJ', 1)]
averages [('NOUN', 3)]
355 [('NUM', 1)]
149 [('NUM', 1)]
baylor's [('NOUN', 1)]
126 [('NU

minneapolis [('NOUN', 2)]
fourteen [('NUM', 5)]
warwick [('NOUN', 3)]
football's [('NOUN', 1)]
hall [('NOUN', 10)]
fame [('NOUN', 1)]
players' [('NOUN', 1)]
amendments [('NOUN', 1)]
fourteen-team [('ADJ', 1)]
home-and-home [('ADJ', 1)]
teams [('NOUN', 5)]
lengthening [('VERB', 1)]
rozelle [('NOUN', 1)]
therefore [('ADV', 8)]
early-season [('NOUN', 1)]
dates [('NOUN', 3)]
heed [('VERB', 1)]
mauch [('NOUN', 2)]
misled [('VERB', 1)]
pirates' [('NOUN', 1)]
slower [('ADJ', 1)]
outclass [('VERB', 1)]
vinegar [('NOUN', 1)]
bend [('NOUN', 2)]
mizell [('NOUN', 3)]
shantz [('NOUN', 1)]
breaking [('VERB', 4)]
baseball's [('NOUN', 3)]
9-6 [('NUM', 1)]
redbirds [('NOUN', 3)]
7-9 [('NUM', 1)]
solly [('NOUN', 2)]
hemus [('NOUN', 5)]
switch [('NOUN', 2)]
gibson [('NOUN', 3)]
ernie [('NOUN', 3)]
broglio [('NOUN', 2)]
broglio's [('NOUN', 1)]
4-0 [('NUM', 2)]
won-lost [('ADJ', 1)]
earned-run [('NOUN', 1)]
redbirds' [('NOUN', 1)]
disheartening [('VERB', 1)]
11-7 [('NUM', 1)]
collapse [('NOUN', 1)]
eager [

genial [('ADJ', 1)]
nightly [('ADV', 1)]
hackstaff [('NOUN', 1)]
luette [('NOUN', 1)]
bowman [('NOUN', 2)]
celebrates [('VERB', 1)]
birthday [('NOUN', 3)]
chase [('NOUN', 5)]
sheila [('NOUN', 1)]
mercy [('NOUN', 3)]
grandparents [('NOUN', 2)]
mullenax [('NOUN', 2)]
kittredge [('NOUN', 1)]
mcintosh [('NOUN', 1)]
buell [('NOUN', 1)]
santa [('NOUN', 6)]
calif. [('NOUN', 8)]
vroman [('NOUN', 1)]
manzanola [('NOUN', 1)]
plaza [('NOUN', 1)]
merrill [('NOUN', 1)]
shoup [('NOUN', 4)]
colorado [('NOUN', 4)]
palace [('NOUN', 2)]
brig. [('NOUN', 1)]
mcdermott [('NOUN', 1)]
black [('ADJ', 7), ('NOUN', 5)]
officers' [('NOUN', 1)]
piero [('NOUN', 1)]
luise [('NOUN', 1)]
emilio [('NOUN', 1)]
bassi [('NOUN', 1)]
bassis [('NOUN', 1)]
stag [('NOUN', 1)]
precede [('VERB', 1)]
cocktails [('NOUN', 2)]
dining [('VERB', 3), ('NOUN', 2)]
betsy [('NOUN', 1)]
parker [('NOUN', 5)]
eastern [('ADJ', 4)]
bldg. [('NOUN', 1)]
juniors [('NOUN', 2)]
staged [('VERB', 5)]
neusteters [('NOUN', 1)]
preceded [('VERB', 3)]
t

deficiencies [('NOUN', 1)]
snow [('NOUN', 8)]
corrected [('VERB', 2)]
councilman [('NOUN', 4)]
schaefer [('NOUN', 4)]
salting [('VERB', 1)]
crews [('NOUN', 1)]
dispatched [('VERB', 3)]
storms [('NOUN', 1)]
longer [('ADV', 3), ('ADJ', 1)]
werner [('NOUN', 5)]
conceding [('VERB', 1)]
improvements [('NOUN', 4)]
slowly [('ADV', 3)]
snowfall [('NOUN', 1)]
halting [('VERB', 1)]
operations [('NOUN', 10)]
manual [('ADJ', 1)]
laborers [('NOUN', 2)]
resumed [('VERB', 4)]
parking [('VERB', 2)]
banned [('VERB', 1)]
tires [('NOUN', 4)]
chains [('NOUN', 1)]
overlooked [('VERB', 1)]
merchants [('NOUN', 1)]
survive [('VERB', 1)]
recounting [('VERB', 1)]
observations [('NOUN', 1)]
clearance [('NOUN', 1)]
inefficient [('ADJ', 1)]
supplies [('NOUN', 6), ('VERB', 2)]
poorly [('ADV', 1)]
trained [('VERB', 3)]
plow [('NOUN', 3)]
blades [('NOUN', 1)]
layer [('NOUN', 1)]
freezes [('VERB', 1)]
15-year-old [('ADJ', 1)]
murdered [('VERB', 1)]
chesapeake [('NOUN', 3)]
bay-front [('NOUN', 1)]
detention [('NOUN', 1

963 [('NUM', 1)]
ponce [('NOUN', 1)]
leon [('NOUN', 3)]
coleman [('NOUN', 1)]
704 [('NUM', 1)]
se [('NOUN', 1)]
lacerations [('NOUN', 1)]
bruises [('NOUN', 1)]
renewed [('VERB', 2)]
picketing [('VERB', 2)]
stand-ins [('NOUN', 3)]
first-run [('NOUN', 1)]
theaters [('NOUN', 6)]
identically [('ADV', 1)]
worded [('VERB', 1)]
contacted [('VERB', 2)]
coahr [('NOUN', 5)]
gather [('VERB', 2)]
eve [('NOUN', 2)]
operators [('NOUN', 2)]
likelihood [('NOUN', 1)]
three-day [('ADJ', 1)]
sporadic [('ADJ', 1)]
negotiate [('VERB', 2)]
friday's [('NOUN', 2)]
inability [('NOUN', 1)]
indifference [('NOUN', 1)]
integrate [('VERB', 1)]
pledged [('VERB', 2)]
nonviolent [('ADJ', 1)]
extensive [('ADJ', 1)]
presence [('NOUN', 1)]
picket [('NOUN', 2)]
profits [('NOUN', 2)]
uptown [('NOUN', 1)]
buckhead [('NOUN', 1)]
killingsworth [('NOUN', 4)]
72 [('NUM', 3)]
357 [('NUM', 1)]
venable [('NOUN', 1)]
kililngsworth [('NOUN', 2)]
s [('NOUN', 1)]
w [('NOUN', 1)]
cafeteria [('NOUN', 2)]
pittsboro [('NOUN', 1)]
survivor

rogers [('NOUN', 1)]
da [('NOUN', 2)]
fonta [('NOUN', 1)]
sanctuary [('NOUN', 2)]
marin [('NOUN', 4)]
officially [('ADV', 3)]
livermore [('NOUN', 1)]
645-acre [('ADJ', 1)]
tidelands [('NOUN', 1)]
greenwood [('NOUN', 1)]
olney [('NOUN', 1)]
kentfield [('NOUN', 1)]
inviolate [('ADJ', 1)]
animals [('NOUN', 1)]
seventeen [('NUM', 2)]
willy [('NOUN', 1)]
fiedler [('NOUN', 9)]
climbed [('VERB', 2)]
cockpit [('NOUN', 3)]
installed [('VERB', 4)]
v-1 [('NOUN', 3)]
rocket-bomb [('NOUN', 1)]
attached [('VERB', 1)]
underbelly [('NOUN', 1)]
heinkel [('NOUN', 2)]
bomber [('NOUN', 4)]
rolled [('VERB', 2)]
runway [('NOUN', 2)]
earth [('NOUN', 2)]
alive [('ADJ', 3)]
pulse [('NOUN', 1)]
jet [('NOUN', 4)]
airstrip [('NOUN', 1)]
quiet-spoken [('ADJ', 1)]
middle-aged [('ADJ', 1)]
aeronautical [('ADJ', 1)]
engineer [('NOUN', 2)]
lockheed's [('NOUN', 1)]
missiles [('NOUN', 4)]
sunnyvale [('NOUN', 3)]
sat [('VERB', 2)]
pilots [('NOUN', 4)]
crashed [('VERB', 1)]
hitler's [('NOUN', 1)]
super-secret [('ADJ', 1)]

gasoline [('NOUN', 1)]
appliances [('NOUN', 2)]
chemicals [('NOUN', 1)]
haggling [('VERB', 2)]
aren't [('VERB', 4)]
conscious [('ADJ', 1)]
sioux [('NOUN', 1)]
iowa [('NOUN', 1)]
picker [('NOUN', 1)]
$2,700 [('NOUN', 1)]
dealers' [('NOUN', 1)]
25% [('NOUN', 1)]
affects [('NOUN', 1)]
shipments [('NOUN', 2)]
massey-ferguson [('NOUN', 1)]
ltd. [('VERB', 1)]
toronto [('NOUN', 1)]
2,418 [('NUM', 1)]
869 [('NUM', 1)]
staiger [('NOUN', 1)]
inventories [('NOUN', 1)]
stepped-up [('ADJ', 1)]
shortages [('NOUN', 1)]
merritt [('NOUN', 1)]
demanding [('VERB', 1)]
delivery [('NOUN', 1)]
trailed [('VERB', 2)]
year-earlier [('ADJ', 1)]
feed [('NOUN', 1), ('VERB', 1)]
grain [('NOUN', 1)]
cutback [('NOUN', 1)]
planted [('VERB', 1)]
chiefly [('ADV', 3)]
forecast [('VERB', 1)]
129% [('NOUN', 1)]
1947-49 [('NUM', 1)]
farmers' [('NOUN', 2)]
economists [('NOUN', 1)]
$1.4 [('NOUN', 1)]
subsidies [('NOUN', 1)]
incentive [('NOUN', 2)]
$639 [('NOUN', 1)]
receipts [('NOUN', 1)]
marketings [('NOUN', 1)]
$39.5 [('NO

fixture [('NOUN', 1)]
tying [('VERB', 1)]
desired [('VERB', 1)]
resting [('VERB', 1)]
accord [('NOUN', 2)]
whims [('NOUN', 1)]
shade [('NOUN', 1)]
adequate [('ADJ', 7)]
sufficiently [('ADV', 2)]
disharmony [('NOUN', 1)]
playtime [('NOUN', 1)]
reminiscent [('ADJ', 1)]
circus [('NOUN', 1)]
merry-go-round [('NOUN', 1)]
scalloped [('VERB', 1)]
edge [('NOUN', 4)]
appealing [('ADJ', 1), ('VERB', 1)]
america's [('NOUN', 4)]
home-owners [('NOUN', 1)]
decorators [('NOUN', 3)]
shrewd [('ADJ', 3)]
cabinetmakers [('NOUN', 1)]
era [('NOUN', 2)]
appreciated [('VERB', 2)]
antiques [('NOUN', 2)]
lurked [('VERB', 1)]
utilitarian [('ADJ', 1)]
junior's [('NOUN', 1)]
tricked [('VERB', 1)]
nondescript [('ADJ', 1)]
supposedly [('ADV', 1)]
knocks [('NOUN', 1)]
relegated [('VERB', 1)]
parlor [('NOUN', 2)]
homemakers [('NOUN', 1)]
decorator [('NOUN', 1)]
leland [('NOUN', 1)]
alden [('NOUN', 2)]
housewives [('NOUN', 1)]
craftsmen [('NOUN', 1)]
innate [('ADJ', 1)]
eighteenth [('ADJ', 1)]
escape's [('NOUN', 1)]
b

cathedral [('NOUN', 1)]
gogh [('NOUN', 1)]
impressionist [('NOUN', 1)]
benches [('NOUN', 1)]
downstairs [('NOUN', 1)]
lobby [('NOUN', 1)]
bales [('NOUN', 1)]
confederacy [('NOUN', 1)]
aide [('NOUN', 1)]
sculptures [('NOUN', 3)]
blue-uniformed [('ADJ', 1)]
renaissance [('NOUN', 1)]
preferred [('VERB', 1)]
boucher [('NOUN', 1)]
courbet [('NOUN', 1)]
fra [('NOUN', 1)]
angelico [('NOUN', 1)]
impressed [('VERB', 2)]
rotunda [('NOUN', 1)]
fountain [('NOUN', 1)]
seemingly [('ADV', 1)]
remote [('ADJ', 1)]
collonaded [('ADJ', 1)]
sphynxes [('NOUN', 1)]
perched [('VERB', 1)]
1733 [('NUM', 1)]
nw. [('NOUN', 1)]
bustling [('VERB', 1)]
masons [('NOUN', 2)]
pike [('NOUN', 3)]
1859 [('NUM', 1)]
1891 [('NUM', 1)]
high-ceilinged [('ADJ', 1)]
eulogized [('VERB', 1)]
historian [('NOUN', 2)]
poet [('NOUN', 1)]
journalist [('NOUN', 1)]
soldier [('NOUN', 1)]
musician [('NOUN', 1)]
laying [('VERB', 2)]
wreath [('NOUN', 2)]
crypt [('NOUN', 1)]
1500 [('NUM', 1)]
biennial [('ADJ', 1)]
jurisdiction [('NOUN', 1)]

vinson [('NOUN', 1)]
raged [('VERB', 1)]
cloakrooms [('NOUN', 1)]
caucuses [('NOUN', 1)]
lose [('VERB', 2)]
numbered [('VERB', 1)]
260-member [('ADJ', 1)]
caucus [('NOUN', 2)]
smelling [('VERB', 1)]
purged [('VERB', 1)]
mississippians [('NOUN', 1)]
maverick [('ADJ', 1)]
arenas [('NOUN', 1)]
applying [('VERB', 1)]
whiplash [('NOUN', 1)]
loomed [('VERB', 1)]
specter [('NOUN', 1)]
costlier [('ADJ', 1)]
southerners [('NOUN', 1)]
chairmanships [('NOUN', 1)]
truncated [('VERB', 1)]
unworkable [('ADJ', 1)]
arkansas' [('NOUN', 1)]
wilbur [('NOUN', 1)]
deliberately [('ADV', 1)]
coolheaded [('ADJ', 1)]
face-saving [('ADJ', 2)]
version [('NOUN', 1)]
liberal-conservative [('ADJ', 1)]
contrast [('NOUN', 1)]
guerrilla [('NOUN', 1)]
forma [('NOUN', 1)]
guide's [('NOUN', 1)]
restrict [('VERB', 1)]
legislation-delaying [('ADJ', 1)]
filibusters [('NOUN', 1)]
wide-ranging [('ADJ', 1)]
bipartisan [('ADJ', 1)]
minnesota's [('NOUN', 1)]
hubert [('NOUN', 1)]
humphrey [('NOUN', 3)]
massachusetts' [('NOUN', 1)

carpenters [('NOUN', 1)]
indefinite [('ADJ', 1)]
fearing [('VERB', 1)]
elite [('NOUN', 1)]
unrest [('NOUN', 1)]
manifestly [('ADV', 1)]
unprepared [('ADJ', 1)]
oversimplification [('NOUN', 1)]
gale [('NOUN', 1)]
prepare [('VERB', 1)]
colonies [('NOUN', 1)]
clamoring [('VERB', 1)]
unsure [('ADJ', 1)]
pas [('X', 1)]
une [('X', 1)]
goutte [('X', 1)]
sang [('X', 1)]
detested [('VERB', 2)]
pedagogue [('NOUN', 1)]
mess [('NOUN', 1)]
motivations [('NOUN', 1)]
guiltless [('ADJ', 1)]
kasavubu [('NOUN', 5)]
splitting [('VERB', 1)]
balkanizing [('VERB', 1)]
moise [('NOUN', 3)]
tshombe [('NOUN', 5)]
near-balkanization [('NOUN', 1)]
federalism [('NOUN', 1)]
notably [('ADV', 1)]
patrice [('NOUN', 4)]
lumumba [('NOUN', 10)]
hurry [('NOUN', 1)]
fragmentation [('NOUN', 1)]
provincial [('ADJ', 1)]
provinces [('NOUN', 3)]
leopoldville [('NOUN', 1)]
kasai [('NOUN', 2)]
kivu [('NOUN', 1)]
katanga [('NOUN', 5)]
equator [('NOUN', 1)]
western-style [('ADJ', 1)]
bicameral [('ADJ', 1)]
universal [('ADJ', 2)]
we
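
#### Using the counts as a most-frequent-tag lookup

Per-word tag counts like those printed above are enough to build a simple lookup tagger: assign every word the tag it carried most often. The sketch below uses only the standard library; the three entries are copied from the output above. The helper names and the `NOUN` fallback for unseen words are assumptions made for illustration, not something the notebook defines.

```python
from collections import Counter

# Counts copied from the printed output above: word -> tag frequencies.
tag_counts = {
    "the": Counter({"DET": 6386}),
    "that": Counter({"ADP": 546, "DET": 150, "PRON": 128, "ADV": 5}),
    "place": Counter({"NOUN": 28, "VERB": 5}),
}

def most_likely_tag(word, default="NOUN"):
    """Return the most frequent tag for `word`.

    Falls back to `default` for unseen words (an assumed choice).
    """
    counts = tag_counts.get(word.lower())
    if not counts:
        return default
    return counts.most_common(1)[0][0]

def tag_sentence(words):
    """Tag each word independently, ignoring its context."""
    return [(w, most_likely_tag(w)) for w in words]

print(tag_sentence(["That", "place", "vanished"]))
```

Because each word is tagged without looking at its neighbours, ambiguous words such as `that` always get their majority tag. Context-sensitive correction rules are one way to repair such mistakes.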

In [208]:
# run NLTK's built-in demo of Brill (transformation-based) tagging on the treebank sample
from nltk.tbl import demo as brill_demo
brill_demo.demo()

Loading tagged data from treebank... 
Read testing data (200 sents/5251 wds)
Read training data (800 sents/19933 wds)
Read baseline data (800 sents/19933 wds) [reused the training set]
Trained baseline tagger
    Accuracy on test set: 0.8366
Training tbl tagger...
TBL train (fast) (seqs: 800; tokens: 19933; tpls: 24; min score: 3; min acc: None)
Finding initial useful rules...
    Found 12799 useful rules.

           B      |
   S   F   r   O  |        Score = Fixed - Broken
   c   i   o   t  |  R     Fixed = num tags changed incorrect -> correct
   o   x   k   h  |  u     Broken = num tags changed correct -> incorrect
   r   e   e   e  |  l     Other = num tags changed incorrect -> incorrect
   e   d   n   r  |  e
------------------+-------------------------------------------------------
  23  23   0   0  | POS->VBZ if Pos:PRP@[-2,-1]
  18  19   1   0  | NN->VB if Pos:-NONE-@[-2] & Pos:TO@[-1]
  14  14   0   0  | VBP->VB if Pos:MD@[-2,-1]
  12  12   0   0  | VBP->VB if Pos:TO@[-1]
  