# Project: Part of Speech Tagging with Hidden Markov Models 
---
### Introduction

Part of speech tagging is the process of determining the syntactic category of a word from the words in its surrounding context. It is often used to help disambiguate natural language phrases because it can be done quickly with high accuracy. Tagging can be used for many NLP tasks like determining correct pronunciation during speech synthesis (for example, _dis_-count as a noun vs dis-_count_ as a verb), for information retrieval, and for word sense disambiguation.

In this notebook, you'll use the [Pomegranate](http://pomegranate.readthedocs.io/) library to build a hidden Markov model for part of speech tagging using a "universal" tagset. Hidden Markov models have been able to achieve [>96% tag accuracy with larger tagsets on realistic text corpora](http://www.coli.uni-saarland.de/~thorsten/publications/Brants-ANLP00.pdf). Hidden Markov models have also been used for speech recognition and speech generation, machine translation, gene recognition for bioinformatics, and human gesture recognition for computer vision, and more. 

![](_post-hmm.png)

The notebook already contains some code to get you started. You only need to add some new functionality in the areas indicated to complete the project; you will not need to modify the included code beyond what is requested. Sections that begin with **'IMPLEMENTATION'** in the header indicate that you must provide code in the block that follows. Instructions will be provided for each section, and the specifics of the implementation are marked in the code block with a 'TODO' statement. Please be sure to read the instructions carefully!

<div class="alert alert-block alert-info">
**Note:** Once you have completed all of the code implementations, you need to finalize your work by exporting the iPython Notebook as an HTML document. Before exporting the notebook to html, all of the code cells need to have been run so that reviewers can see the final implementation and output. You must then **export the notebook** by running the last cell in the notebook, or by using the menu above and navigating to **File -> Download as -> HTML (.html)** Your submissions should include both the `html` and `ipynb` files.
</div>

<div class="alert alert-block alert-info">
**Note:** Code and Markdown cells can be executed using the `Shift + Enter` keyboard shortcut. Markdown cells can be edited by double-clicking the cell to enter edit mode.
</div>

### The Road Ahead
You must complete Steps 1-3 below to pass the project. The section on Step 4 includes references & resources you can use to further explore HMM taggers.

- [Step 1](#Step-1:-Read-and-preprocess-the-dataset): Review the provided interface to load and access the text corpus
- [Step 2](#Step-2:-Build-a-Most-Frequent-Class-tagger): Build a Most Frequent Class tagger to use as a baseline
- [Step 3](#Step-3:-Build-an-HMM-tagger): Build an HMM Part of Speech tagger and compare to the MFC baseline
- [Step 4](#Step-4:-[Optional]-Improving-model-performance): (Optional) Improve the HMM tagger

<div class="alert alert-block alert-warning">
**Note:** Make sure you have selected a **Python 3** kernel in Workspaces or the hmm-tagger conda environment if you are running the Jupyter server on your own machine.
</div>

In [1]:
# Jupyter "magic methods" -- only need to be run once per kernel restart
%load_ext autoreload
%aimport helpers, tests
%autoreload 1

In [2]:
# import python modules -- this cell needs to be run again if you make changes to any of the files
import matplotlib.pyplot as plt
import numpy as np

from IPython.core.display import HTML
from itertools import chain,zip_longest
from collections import Counter, defaultdict
from helpers import show_model, Dataset
from pomegranate import State, HiddenMarkovModel, DiscreteDistribution

## Step 1: Read and preprocess the dataset
---
We'll start by reading in a text corpus and splitting it into a training and testing dataset. The data set is a copy of the [Brown corpus](https://en.wikipedia.org/wiki/Brown_Corpus) (originally from the [NLTK](https://www.nltk.org/) library) that has already been pre-processed to only include the [universal tagset](https://arxiv.org/pdf/1104.2086.pdf). You should expect to get slightly higher accuracy using this simplified tagset than the same model would achieve on a larger tagset like the full [Penn treebank tagset](https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html), but the process you'll follow would be the same.

The `Dataset` class provided in helpers.py will read and parse the corpus. You can generate your own datasets compatible with the reader by writing them to the following format. The dataset is stored in plaintext as a collection of words and corresponding tags. Each sentence starts with a unique identifier on the first line, followed by one tab-separated word/tag pair on each following line. Sentences are separated by a single blank line.

Example from the Brown corpus. 
```
b100-38532
Perhaps	ADV
it	PRON
was	VERB
right	ADJ
;	.
;	.

b100-35577
...
```

In [3]:
data = Dataset("tags-universal.txt", "brown-universal.txt", train_test_split=0.8)

print("There are {} sentences in the corpus.".format(len(data)))
print("There are {} sentences in the training set.".format(len(data.training_set)))
print("There are {} sentences in the testing set.".format(len(data.testing_set)))

assert len(data) == len(data.training_set) + len(data.testing_set), \
       "The number of sentences in the training set + testing set should sum to the number of sentences in the corpus"

There are 57340 sentences in the corpus.
There are 45872 sentences in the training set.
There are 11468 sentences in the testing set.


### The Dataset Interface

You can access (mostly) immutable references to the dataset through a simple interface provided through the `Dataset` class, which represents an iterable collection of sentences along with easy access to partitions of the data for training & testing. Review the reference below, then run and review the next few cells to make sure you understand the interface before moving on to the next step.

```
Dataset-only Attributes:
    training_set - reference to a Subset object containing the samples for training
    testing_set - reference to a Subset object containing the samples for testing

Dataset & Subset Attributes:
    sentences - a dictionary with an entry {sentence_key: Sentence()} for each sentence in the corpus
    keys - an immutable ordered (not sorted) collection of the sentence_keys for the corpus
    vocab - an immutable collection of the unique words in the corpus
    tagset - an immutable collection of the unique tags in the corpus
    X - returns an array of words grouped by sentences ((w11, w12, w13, ...), (w21, w22, w23, ...), ...)
    Y - returns an array of tags grouped by sentences ((t11, t12, t13, ...), (t21, t22, t23, ...), ...)
    N - returns the number of distinct samples (individual words or tags) in the dataset

Methods:
    stream() - returns an flat iterable over all (word, tag) pairs across all sentences in the corpus
    __iter__() - returns an iterable over the data as (sentence_key, Sentence()) pairs
    __len__() - returns the nubmer of sentences in the dataset
```

For example, consider a Subset, `subset`, of the sentences `{"s0": Sentence(("See", "Spot", "run"), ("VERB", "NOUN", "VERB")), "s1": Sentence(("Spot", "ran"), ("NOUN", "VERB"))}`. The subset will have these attributes:

```
subset.keys == {"s1", "s0"}  # unordered
subset.vocab == {"See", "run", "ran", "Spot"}  # unordered
subset.tagset == {"VERB", "NOUN"}  # unordered
subset.X == (("Spot", "ran"), ("See", "Spot", "run"))  # order matches .keys
subset.Y == (("NOUN", "VERB"), ("VERB", "NOUN", "VERB"))  # order matches .keys
subset.N == 7  # there are a total of seven observations over all sentences
len(subset) == 2  # because there are two sentences
```

<div class="alert alert-block alert-info">
**Note:** The `Dataset` class is _convenient_, but it is **not** efficient. It is not suitable for huge datasets because it stores multiple redundant copies of the same data.
</div>

#### Sentences

`Dataset.sentences` is a dictionary of all sentences in the training corpus, each keyed to a unique sentence identifier. Each `Sentence` is itself an object with two attributes: a tuple of the words in the sentence named `words` and a tuple of the tag corresponding to each word named `tags`.

In [4]:
key = 'b100-38532'
print("Sentence: {}".format(key))
print("words:\n\t{!s}".format(data.sentences[key].words))
print("tags:\n\t{!s}".format(data.sentences[key].tags))

Sentence: b100-38532
words:
	('Perhaps', 'it', 'was', 'right', ';', ';')
tags:
	('ADV', 'PRON', 'VERB', 'ADJ', '.', '.')


In [5]:
data.training_set.tagset

frozenset({'.',
           'ADJ',
           'ADP',
           'ADV',
           'CONJ',
           'DET',
           'NOUN',
           'NUM',
           'PRON',
           'PRT',
           'VERB',
           'X'})

<div class="alert alert-block alert-info">
**Note:** The underlying iterable sequence is **unordered** over the sentences in the corpus; it is not guaranteed to return the sentences in a consistent order between calls. Use `Dataset.stream()`, `Dataset.keys`, `Dataset.X`, or `Dataset.Y` attributes if you need ordered access to the data.
</div>

#### Counting Unique Elements

You can access the list of unique words (the dataset vocabulary) via `Dataset.vocab` and the unique list of tags via `Dataset.tagset`.

In [6]:
print("There are a total of {} samples of {} unique words in the corpus."
      .format(data.N, len(data.vocab)))
print("There are {} samples of {} unique words in the training set."
      .format(data.training_set.N, len(data.training_set.vocab)))
print("There are {} samples of {} unique words in the testing set."
      .format(data.testing_set.N, len(data.testing_set.vocab)))
print("There are {} words in the test set that are missing in the training set."
      .format(len(data.testing_set.vocab - data.training_set.vocab)))

assert data.N == data.training_set.N + data.testing_set.N, \
       "The number of training + test samples should sum to the total number of samples"

There are a total of 1161192 samples of 56057 unique words in the corpus.
There are 928458 samples of 50536 unique words in the training set.
There are 232734 samples of 25112 unique words in the testing set.
There are 5521 words in the test set that are missing in the training set.


#### Accessing word and tag Sequences
The `Dataset.X` and `Dataset.Y` attributes provide access to ordered collections of matching word and tag sequences for each sentence in the dataset.

In [7]:
# accessing words with Dataset.X and tags with Dataset.Y 
for i in range(2):    
    print("Sentence {}:".format(i + 1), data.X[i])
    print()
    print("Labels {}:".format(i + 1), data.Y[i])
    print()

Sentence 1: ('Mr.', 'Podger', 'had', 'thanked', 'him', 'gravely', ',', 'and', 'now', 'he', 'made', 'use', 'of', 'the', 'advice', '.')

Labels 1: ('NOUN', 'NOUN', 'VERB', 'VERB', 'PRON', 'ADV', '.', 'CONJ', 'ADV', 'PRON', 'VERB', 'NOUN', 'ADP', 'DET', 'NOUN', '.')

Sentence 2: ('But', 'there', 'seemed', 'to', 'be', 'some', 'difference', 'of', 'opinion', 'as', 'to', 'how', 'far', 'the', 'board', 'should', 'go', ',', 'and', 'whose', 'advice', 'it', 'should', 'follow', '.')

Labels 2: ('CONJ', 'PRT', 'VERB', 'PRT', 'VERB', 'DET', 'NOUN', 'ADP', 'NOUN', 'ADP', 'ADP', 'ADV', 'ADV', 'DET', 'NOUN', 'VERB', 'VERB', '.', 'CONJ', 'DET', 'NOUN', 'PRON', 'VERB', 'VERB', '.')



#### Accessing (word, tag) Samples
The `Dataset.stream()` method returns an iterator that chains together every pair of (word, tag) entries across all sentences in the entire corpus.

In [8]:
# use Dataset.stream() (word, tag) samples for the entire corpus
print("\nStream (word, tag) pairs:\n")
total = 0
for i, pair in enumerate(data.training_set.stream()):
    total += i
    #print(total)
    print("\t", pair)
    if i > 100: break


Stream (word, tag) pairs:

	 ('Whenever', 'ADV')
	 ('artists', 'NOUN')
	 (',', '.')
	 ('indeed', 'ADV')
	 (',', '.')
	 ('turned', 'VERB')
	 ('to', 'ADP')
	 ('actual', 'ADJ')
	 ('representations', 'NOUN')
	 ('or', 'CONJ')
	 ('molded', 'VERB')
	 ('three-dimensional', 'ADJ')
	 ('figures', 'NOUN')
	 (',', '.')
	 ('which', 'DET')
	 ('were', 'VERB')
	 ('rare', 'ADJ')
	 ('down', 'PRT')
	 ('to', 'ADP')
	 ('800', 'NUM')
	 ('B.C.', 'NOUN')
	 (',', '.')
	 ('they', 'PRON')
	 ('tended', 'VERB')
	 ('to', 'PRT')
	 ('reflect', 'VERB')
	 ('reality', 'NOUN')
	 ('(', '.')
	 ('see', 'VERB')
	 ('Plate', 'NOUN')
	 ('6a', 'NUM')
	 (',', '.')
	 ('9b', 'NUM')
	 (')', '.')
	 (';', '.')
	 (';', '.')
	 ('For', 'ADP')
	 ('almost', 'ADV')
	 ('two', 'NUM')
	 ('months', 'NOUN')
	 (',', '.')
	 ('the', 'DET')
	 ('defendant', 'NOUN')
	 ('and', 'CONJ')
	 ('the', 'DET')
	 ('world', 'NOUN')
	 ('heard', 'VERB')
	 ('from', 'ADP')
	 ('individuals', 'NOUN')
	 ('escaped', 'VERB')
	 ('from', 'ADP')
	 ('the', 'DET')
	 ('grave', 


For both our baseline tagger and the HMM model we'll build, we need to estimate the frequency of tags & words from the frequency counts of observations in the training corpus. In the next several cells you will complete functions to compute the counts of several sets of counts. 

## Step 2: Build a Most Frequent Class tagger
---

Perhaps the simplest tagger (and a good baseline for tagger performance) is to simply choose the tag most frequently assigned to each word. This "most frequent class" tagger inspects each observed word in the sequence and assigns it the label that was most often assigned to that word in the corpus.

### IMPLEMENTATION: Pair Counts

Complete the function below that computes the joint frequency counts for two input sequences.

In [9]:
def pair_counts(sequences_A, sequences_B):
    """Return a dictionary keyed to each unique value in the first sequence list
    that counts the number of occurrences of the corresponding value from the
    second sequences list.
    
    For example, if sequences_A is tags and sequences_B is the corresponding
    words, then if 1244 sequences contain the word "time" tagged as a NOUN, then
    you should return a dictionary such that pair_counts[NOUN][time] == 1244
    """
    # TODO: Finish this function!
    unique_count_dict = defaultdict(lambda:defaultdict(int)) #returns a default dictionary of count 0 if no sequences of the word are found
    for tag,word in zip(sequences_A,sequences_B):
        unique_count_dict[tag][word] += 1
    return unique_count_dict


# Calculate C(t_i, w_i)
tag_seq = [tag for (word,tag) in data.training_set.stream()]
word_seq = [word for (word,tag) in data.training_set.stream()]
emission_counts = pair_counts(tag_seq,word_seq)

assert len(emission_counts) == 12, \
       "Uh oh. There should be 12 tags in your dictionary."
assert max(emission_counts["NOUN"], key=emission_counts["NOUN"].get) == 'time', \
       "Hmmm...'time' is expected to be the most common NOUN."
HTML('<div class="alert alert-block alert-success">Your emission counts look good!</div>')

### IMPLEMENTATION: Most Frequent Class Tagger

Use the `pair_counts()` function and the training dataset to find the most frequent class label for each word in the training data, and populate the `mfc_table` below. The table keys should be words, and the values should be the appropriate tag string.

The `MFCTagger` class is provided to mock the interface of Pomegranite HMM models so that they can be used interchangeably.

In [10]:
word_counts1 = pair_counts(word_seq,tag_seq)
word_counts1 
for key,value in word_counts1.items():
    
    max_item = max(value.values())
    print(key,max_item)
    for pos,count in value.items():
        if count == max_item:
            print(pos)


Whenever 12
ADV
artists 34
NOUN
, 46499
.
indeed 92
ADV
turned 264
VERB
to 11784
PRT
actual 77
ADJ
representations 7
NOUN
or 3218
CONJ
molded 12
VERB
three-dimensional 8
ADJ
figures 69
NOUN
which 2844
DET
were 2596
VERB
rare 33
ADJ
down 543
PRT
800 10
NUM
B.C. 13
NOUN
they 2235
PRON
tended 18
VERB
reflect 21
VERB
reality 67
NOUN
( 1923
.
see 572
VERB
Plate 2
NOUN
6a 1
NUM
9b 1
NUM
) 1949
.
; 4418
.
For 522
ADP
almost 302
ADV
two 1051
NUM
months 142
NOUN
the 50287
DET
defendant 6
NOUN
and 22369
CONJ
world 548
NOUN
heard 189
VERB
from 3393
ADP
individuals 56
NOUN
escaped 15
VERB
grave 16
ADJ
about 972
ADP
fathers 9
NOUN
mothers 23
NOUN
graybeards 1
NOUN
adolescents 5
NOUN
babies 7
NOUN
starved 3
VERB
beaten 9
VERB
death 208
NOUN
strangled 5
VERB
machine-gunned 1
VERB
gassed 1
VERB
burned 31
VERB
. 39500
.
Clearer 1
ADJ
meaning 95
NOUN
Yes 69
ADV
gentlemen 18
NOUN
I 4140
PRON
am 176
VERB
getting 135
VERB
point 287
NOUN
my 951
DET
About 34
ADV
same 542
ADJ
time 1275
NOUN
Alleghenies 1
NOUN

Madeira 1
NOUN
Today's 6
NOUN
Voice 5
NOUN
incorrect 3
ADJ
Fromm's 14
NOUN
analysis 84
NOUN
alienation 20
NOUN
sphere 18
NOUN
production 116
NOUN
centers 37
NOUN
around 249
ADP
concepts 25
NOUN
bureaucratization 1
NOUN
corporation 52
NOUN
separation 13
NOUN
ownership 17
NOUN
broad 70
ADJ
view 129
NOUN
corporate 12
ADJ
ineffective 3
ADJ
dispersion 3
NOUN
entered 86
VERB
First 68
ADJ
World 88
NOUN
War 132
NOUN
Baker 25
NOUN
made 897
VERB
Draft 8
NOUN
1917 8
NUM
prohibited 5
VERB
sale 34
NOUN
liquor 30
NOUN
uniform 22
ADJ
provided 94
VERB
zones 2
NOUN
camps 17
NOUN
prostitution 7
NOUN
outlawed 3
VERB
reasons 79
NOUN
makes 132
VERB
superbly 8
ADV
useful 47
ADJ
psychiatric 3
ADJ
interview 26
NOUN
fact 372
NOUN
successfully 25
ADV
suggest 42
VERB
ways 97
NOUN
speed 54
NOUN
progress 87
NOUN
therapy 8
NOUN
bill 63
NOUN
passed 126
VERB
Assembly 22
NOUN
May 70
NOUN
pending 9
ADJ
careful 49
ADJ
scrutiny 12
NOUN
Are 50
VERB
partner 27
NOUN
connective 1
NOUN
ADJ
system 316
NOUN
network 27
NOUN
tail

hydroxyl-rich 2
ADJ
polyisocyanates 1
NOUN
tolylene 1
NOUN
diisocyanate 1
NOUN
popular 81
ADJ
remained 85
VERB
aside 45
ADV
Social 19
ADJ
widely 47
ADV
favored 16
VERB
gives 91
VERB
mind 232
NOUN
combinations 18
NOUN
old 452
ADJ
memories 13
NOUN
ideas 116
NOUN
experiences 40
NOUN
project 62
NOUN
environment 32
NOUN
ever-changing 4
ADJ
dance 57
NOUN
title 46
NOUN
nonstop 1
NOUN
format 3
NOUN
admirable 6
ADJ
continuity 21
NOUN
demands 36
NOUN
pauses 1
NOUN
identification 38
NOUN
dug 13
VERB
speech 57
NOUN
earlier 56
ADJ
Congressman 12
NOUN
decrying 1
VERB
statues 6
NOUN
monuments 6
NOUN
memorials 1
NOUN
dot 7
NOUN
landscape 16
NOUN
patriotic 9
ADJ
societies 28
NOUN
zealous 3
ADJ
constantly 31
ADV
hatching 2
VERB
NOUN
plans 80
NOUN
Bed 1
NOUN
slats 1
NOUN
washed 29
VERB
alum 1
NOUN
legs 53
NOUN
beds 8
NOUN
placed 98
VERB
cups 11
NOUN
kerosene 5
NOUN
woodwork 5
NOUN
liberally 4
ADV
corrosive 4
ADJ
sublimate 1
NOUN
feather 5
NOUN
charges 33
NOUN
electron 21
NOUN
proton 2
NOUN
believed 62
VE

NOUN
utility 23
NOUN
demanded 32
VERB
Lord 63
NOUN
confronted 25
VERB
loan 32
NOUN
Government 110
NOUN
India 43
NOUN
subsection 12
NOUN
G 11
NOUN
104 9
NUM
rupee 6
NOUN
equivalent 32
ADJ
$538 2
NOUN
financing 16
VERB
projects 47
NOUN
promote 26
VERB
balanced 19
VERB
mutually 9
ADV
agreed 65
VERB
return 74
NOUN
offered 67
VERB
U. 46
NOUN
S. 84
NOUN
Treasury 33
NOUN
bonds 33
NOUN
Warsaw 9
NOUN
gosh 1
PRT
want 255
VERB
promises 10
VERB
drinking 34
VERB
sickness 5
NOUN
Sat 1
VERB
sang 21
VERB
thinner 6
ADJ
5,500 1
NUM
private 148
ADJ
brand 10
NOUN
competition 44
NOUN
formulated 10
VERB
long-term 26
NOUN
researching 1
VERB
flux 21
NOUN
f 4
NOUN
nearest 14
ADJ
neighbors 31
NOUN
h 2
NOUN
e 2
NOUN
six 146
NUM
p 24
NOUN
remaining 34
VERB
intermediate 16
ADJ
numbers 91
NOUN
food 103
NOUN
Louchheim 1
NOUN
phase 53
NOUN
globetrotter 1
NOUN
knows 72
VERB
sections 54
NOUN
5 120
NUM
supplements 5
NOUN
hunch 4
NOUN
Willings 5
NOUN
ambush 5
NOUN
went 391
VERB
rules 57
NOUN
fairness 4
NOUN
Eyes 9
NOUN
h

Saturday 50
NOUN
orchestra 32
NOUN
sensibly 3
ADV
apparently 80
ADV
decent 14
ADJ
condition 71
NOUN
improved 42
VERB
November 61
NOUN
1887 2
NUM
connecting 5
VERB
several 269
ADJ
dwelling 6
NOUN
Dorset 3
NOUN
extended 46
VERB
kale 1
NOUN
hocking 1
VERB
respectable 15
ADJ
uptown 2
NOUN
returning 26
VERB
discretion 11
NOUN
moderation 3
NOUN
vague 17
ADJ
golden 22
ADJ
fading 4
VERB
rising 48
VERB
distortions 2
NOUN
window 103
NOUN
Third 11
ADJ
offer 54
VERB
adapted 11
VERB
country's 10
NOUN
spiritually 5
ADV
ran 109
VERB
Pete 19
NOUN
Vivian 9
NOUN
sympathy 26
NOUN
Gladden 3
NOUN
outspoken 6
ADJ
critic 21
NOUN
union's 2
NOUN
battle 62
NOUN
teamsters 6
NOUN
firemen 3
NOUN
1959 81
NUM
extracting 4
VERB
roleplaying 12
NOUN
previous 65
ADJ
twofold 2
ADJ
Middle 19
ADJ
Iraq 3
NOUN
Syria 2
NOUN
Egypt 13
NOUN
camp 47
NOUN
Lena 2
NOUN
Faber 4
NOUN
Hanover-Chalidale 1
NOUN
2:33 3
NUM
5.7 3
NUM
sets 33
NOUN
decline 17
NOUN
radios 5
NOUN
1961 98
NUM
nearly 103
ADV
15.0 1
NUM
- 51
ADP
15.5 1
NUM
paraps

reason 183
NOUN
failure 67
NOUN
treatments 8
NOUN
eliminate 24
VERB
nonspecific 14
ADJ
staining 19
NOUN
conjugates 10
NOUN
unnamed 3
ADJ
unnameable 1
ADJ
harbored 2
VERB
emphasized 17
VERB
dates 19
NOUN
adoption 8
NOUN
interim 5
ADJ
NOUN
budgets 5
NOUN
governmental 18
ADJ
pathetic 4
ADJ
Pausing 3
VERB
ask 90
VERB
further 74
ADJ
servant 14
NOUN
patient 56
NOUN
breathes 1
VERB
mouthpiece 7
NOUN
treadmill 1
NOUN
medically 2
ADV
tight 18
ADJ
naval 16
ADJ
blockade 6
NOUN
Cuban 11
ADJ
ports 4
NOUN
approaches 12
NOUN
waters 33
NOUN
task 47
NOUN
carrier 6
NOUN
100 60
NUM
destroyers 2
NOUN
Christ 74
NOUN
glory 13
NOUN
Incorporated 1
VERB
prepared 86
VERB
feed 50
VERB
likelihood 7
NOUN
events 87
NOUN
coincide 9
VERB
desired 39
VERB
clue 14
NOUN
cosmical 1
ADJ
plays 28
VERB
three-fold 2
ADJ
theater-going 1
NOUN
theatrical 12
ADJ
willing 57
ADJ
Although 97
ADP
startling 18
ADJ
developments 31
NOUN
shotgun 7
NOUN
invitations 12
NOUN
Molly's 2
NOUN
debut 11
NOUN
tea 23
NOUN
afternoon 77
NOUN
Arts 17

nursery 8
NOUN
Stearns 2
NOUN
grace 26
NOUN
omitted 10
VERB
recruiter 1
NOUN
aye 1
ADV
nay 1
ADV
intimate 15
ADJ
mechanics 17
NOUN
retorted 2
VERB
confirming 2
VERB
Alex's 9
NOUN
anticipations 2
NOUN
Anglican 8
ADJ
clergyman 7
NOUN
Oxford 11
NOUN
sadly 7
ADV
frankly 9
ADV
acknowledged 12
VERB
Along 6
ADP
self-satisfaction 3
NOUN
sensed 12
VERB
tension 47
NOUN
involutions 2
NOUN
projective 1
ADJ
dimensions 25
NOUN
Rachel 36
NOUN
clucked 1
VERB
tongue 29
NOUN
sunrise 6
NOUN
sunset 9
NOUN
someone 66
NOUN
build 69
VERB
workbench 7
NOUN
structure 73
NOUN
composition 17
NOUN
Table 42
NOUN
7-1 5
NUM
0.70 1
NUM
independent 54
ADJ
Jess 28
NOUN
researcher 2
NOUN
medium 29
NOUN
We're 21
PRT
tensely 3
ADV
Metabolite 1
NOUN
isolated 26
VERB
indicates 29
VERB
endogenous 1
ADJ
metabolite 5
NOUN
unlocked 11
VERB
Bridget 18
NOUN
upstairs 19
ADV
attic 12
NOUN
protocol 3
NOUN
agreeing 6
VERB
Gursel 3
NOUN
indicators 6
NOUN
explosion 13
NOUN
crashed 11
VERB
echoes 7
NOUN
rolling 16
VERB
hills 32
NOUN
Hier

VERB
brisk 7
ADJ
breeze 10
NOUN
Before 56
ADP
fight 45
NOUN
Harlem 13
NOUN
concussion 1
NOUN
Pleasure 2
NOUN
boating 14
VERB
scooting 1
VERB
crisp 3
ADJ
NOUN
breezes 2
NOUN
craft 15
NOUN
ocean 26
NOUN
lake 10
NOUN
reservoir 8
NOUN
worry 35
VERB
Only 72
ADV
advanced 38
VERB
Pakistan 7
NOUN
systematic 11
ADJ
Be 26
VERB
drops 8
NOUN
backward 13
ADV
bolt 5
NOUN
securely 3
ADV
rear 22
ADJ
couch 10
NOUN
See-through 1
ADJ
chairs 20
NOUN
combines 7
VERB
nostalgic 5
ADJ
ladder 17
NOUN
shoji 1
NOUN
flavor 11
NOUN
fraud 5
NOUN
fourteen 22
NUM
spend 41
VERB
mid-September 2
NOUN
Dean 23
NOUN
conspicuous 4
ADJ
spotted 13
VERB
Numbers 4
NOUN
multiplied 6
VERB
troubled 25
VERB
switching 6
VERB
arrangement 28
NOUN
reverses 1
VERB
cycle 19
NOUN
literally 16
ADV
extracted 7
VERB
indoors 5
ADV
Flanked 1
VERB
urns 2
NOUN
alabaster 3
NOUN
lamps 5
NOUN
tribal 3
ADJ
plunged 12
VERB
Command 8
NOUN
Initially 4
ADV
wastewater 2
NOUN
18-month 1
ADJ
locking 23
VERB
axis 28
NOUN
accelerometer 14
NOUN
perpendicular 

hesitation 5
NOUN
permission 21
NOUN
visa 3
NOUN
re-enter 5
VERB
whereupon 6
ADP
backpack 1
NOUN
guardhouse 1
NOUN
typing 7
VERB
tales 15
NOUN
lethargy 4
NOUN
invited 23
VERB
Prime 4
ADJ
Minister 14
NOUN
Gross 27
NOUN
clean-top 1
ADJ
manila 1
NOUN
folder 1
NOUN
Shell 6
NOUN
schooled 1
VERB
examine 27
VERB
prognosis 2
NOUN
Erich 4
NOUN
Auerbach's 1
NOUN
impressive 36
ADJ
Mimesis 2
NOUN
architects 7
NOUN
interior 26
ADJ
architect 15
NOUN
incapable 9
ADJ
architectural 8
ADJ
principles 52
NOUN
king 24
NOUN
possessions 9
NOUN
Barton 21
NOUN
falling 22
VERB
Fresh 6
ADJ
juicy 4
ADJ
lovin 1
VERB
sixteen 15
NUM
marry 12
VERB
majority 40
NOUN
Anglo-Saxon 15
ADJ
descent 8
NOUN
turns 16
VERB
tastes 4
VERB
dishwater 1
NOUN
concessionaire 2
NOUN
subspecies 3
NOUN
explain 47
VERB
half-hearted 1
ADJ
invasion 11
NOUN
symptomatic 5
ADJ
invoked 6
VERB
felonious 2
ADJ
wishful 7
ADJ
store 50
NOUN
deliberations 6
NOUN
committees 15
NOUN
commissions 8
NOUN
Straight 1
ADJ
vertical 13
ADJ
seam 7
NOUN
sock 3
NO

NOUN
Fairbrothers 2
NOUN
acknowledge 10
VERB
disgusting 4
ADJ
swollen 7
ADJ
admit 28
VERB
dearly 3
ADV
musicians 31
NOUN
voices 32
NOUN
controls 20
NOUN
novelist 10
NOUN
psychoanalysis 3
NOUN
Oedipus 9
NOUN
Fog 1
NOUN
Demagogues 1
NOUN
communist 4
NOUN
bogeys 2
NOUN
lurking 3
VERB
stereotyped 4
VERB
baseball's 3
NOUN
1927 17
NUM
incorrigible 1
ADJ
epicure 2
NOUN
incredible 19
ADJ
athlete 8
NOUN
Babe 6
NOUN
Ruth 20
NOUN
deposed 1
VERB
Circular 2
ADJ
NOUN
subjected 19
VERB
restorative 7
ADJ
knit 5
VERB
shrinkage 1
NOUN
Remarks 4
NOUN
Hon. 3
ADJ
Price 7
NOUN
bomber 6
NOUN
rocket 6
NOUN
faint 21
ADJ
speck 5
NOUN
hurtling 5
VERB
420 3
NUM
m.p.h. 4
NOUN
luminosity 1
NOUN
stimulation 11
NOUN
luminescent 1
ADJ
organism 5
NOUN
Noctiluca 1
NOUN
miliaris 1
NOUN
turbulence 3
NOUN
baggy 2
ADJ
swear 9
VERB
unions 20
NOUN
monopoly 8
NOUN
historian 22
NOUN
correctness 3
NOUN
fogged 1
VERB
disturbing 13
ADJ
listener 8
NOUN
overpowered 2
VERB
tactual 7
ADJ
fundamentally 7
ADV
undamaged 1
ADJ
directions 

NOUN
Maggie 20
NOUN
Brien 1
NOUN
Rev. 18
NOUN
Godfrey 1
NOUN
Burr 5
NOUN
vicar 3
NOUN
Rushall 1
NOUN
Staffordshire 1
NOUN
evidently 18
ADV
helped 53
VERB
deepen 1
VERB
seekers 2
NOUN
postulate 3
VERB
Near 7
ADP
seventeen 17
NUM
spurt 2
NOUN
matched 14
VERB
shock 23
NOUN
ol 2
ADJ
Slater 4
NOUN
buy 58
VERB
Retired 1
VERB
$94 1
NOUN
partly 40
ADV
Halting 1
VERB
deliberately 24
ADV
Granted 3
VERB
Tammany 5
NOUN
tiger 2
NOUN
badges 1
NOUN
sachems 2
NOUN
newspapers 31
NOUN
composed 27
VERB
radical 18
ADJ
novelty 4
NOUN
gathering 12
NOUN
Polish 9
ADJ
dirty 25
ADJ
handkerchief 7
NOUN
humor 36
NOUN
trembling 21
VERB
Village 17
NOUN
Towsley 1
NOUN
telegrapher 2
NOUN
Hard's 1
NOUN
drugstore 3
NOUN
Calcium 2
NOUN
phosphorus 1
NOUN
iron 27
NOUN
worthwhile 8
ADJ
trace 9
NOUN
Horace's 1
NOUN
imperishable 1
ADJ
shines 3
VERB
lucid 4
ADJ
Augustan 1
ADJ
bare-armed 1
ADJ
beard 17
NOUN
preceded 11
VERB
214,938 1
NUM
publicizing 2
VERB
distributed 26
VERB
bulletin 4
NOUN
gainer 1
NOUN
threats 10
NOUN
Those

cameras 7
NOUN
lenses 3
NOUN
clutched 7
VERB
straw 12
NOUN
suitcase 15
NOUN
shores 8
NOUN
steep 6
NOUN
ADJ
sympathized 1
VERB
pastor 14
NOUN
respondents 7
NOUN
guard's 4
NOUN
uncovered 6
VERB
Abraham 5
NOUN
Wharf 2
NOUN
similarly 14
ADV
inflexible 2
ADJ
rigidity 2
NOUN
Sixteen 2
NUM
1-1/2 4
NUM
25-ft. 1
ADJ
spokes 1
NOUN
tires 8
NOUN
wheels 16
NOUN
Seven 16
NUM
prisoners 14
NOUN
sentenced 8
VERB
heresy 1
NOUN
Macklin's 1
NOUN
coat 36
NOUN
holster 7
NOUN
Birkhead 3
NOUN
challenged 8
VERB
Timex 2
NOUN
All-Star 1
ADJ
Jazz 16
NOUN
styles 17
NOUN
Lionel 2
NOUN
Hampton's 1
NOUN
free-wheeling 1
ADJ
Dukes 2
NOUN
Dixieland 3
NOUN
awards 14
NOUN
rata 2
X
accrued 3
VERB
recognize 48
VERB
initiative 25
NOUN
three-hour 3
ADJ
coast-to-coast 1
NOUN
Father 20
NOUN
Coughlin 6
NOUN
pre-empting 1
ADJ
Eddie 24
NOUN
Cantor 2
NOUN
Merry-go-round 1
NOUN
Bowes 1
NOUN
despot 2
NOUN
yore 1
NOUN
DET
diary 3
NOUN
raiding 2
VERB
larder 1
NOUN
ransacking 1
VERB
lingerie 2
NOUN
thermos 1
NOUN
Mmm 1
PRT
Havana 13
NOU

swells 3
NOUN
recoil 2
NOUN
psyche 4
NOUN
theirs 15
PRON
paradox 9
NOUN
hates 3
VERB
Think 5
VERB
continentally 1
ADV
counseled 3
VERB
diagonalizable 13
ADJ
i 2
NOUN
Molvar 2
NOUN
reiterating 1
VERB
Company's 7
NOUN
Directors 1
NOUN
coupled 10
VERB
convert 6
VERB
heathen 1
ADJ
belligerent 5
ADJ
arm's 2
NOUN
lady's 5
NOUN
unmagnified 1
ADJ
bugged 2
VERB
DeMontez 1
NOUN
Plastics 3
NOUN
built-in 2
ADJ
optional 3
ADJ
transparency 2
NOUN
translucency 1
NOUN
corrosion 4
NOUN
fabrication 4
NOUN
performed 25
VERB
Mahayanist 1
NOUN
parade 21
NOUN
Greene 12
NOUN
Homestead 1
NOUN
Linus 1
NOUN
Pauling 6
NOUN
Nobel 4
NOUN
Prize 7
NOUN
winner 5
NOUN
choose 34
VERB
Early 14
ADJ
furor 2
NOUN
sombre 1
ADJ
illuminations 1
NOUN
piercing 3
VERB
radiance 3
NOUN
Caravaggio 2
NOUN
Dirt 1
NOUN
particulate 3
ADJ
finely 4
ADV
colloidal 2
ADJ
hardware 7
NOUN
mill 6
NOUN
Riverboat 1
NOUN
Dalzell-Cousin 1
NOUN
2:38 2
NUM
nomenclature 6
NOUN
Tetanus 1
NOUN
pouring 9
VERB
turpentine 3
NOUN
accomplish 18
VERB
Interio

NOUN
rivers 7
NOUN
Bully 1
ADJ
Gibby 6
NOUN
Preliminary 2
ADJ
tentatively 5
ADV
molecule 5
NOUN
cleaved 1
VERB
amide 1
NOUN
Spencer 22
NOUN
Fairmont 2
NOUN
Turnpike 6
NOUN
unfounded 2
ADJ
Tsar's 2
NOUN
Boris 9
NOUN
halfhearted 1
ADJ
acclamation 1
NOUN
fury 13
NOUN
Victorian 6
ADJ
Haven't 3
VERB
evokes 4
VERB
resonances 3
NOUN
Mann's 7
NOUN
pp. 4
NOUN
78-79 1
NUM
despair 16
NOUN
Shann 5
NOUN
Delegates' 1
NOUN
Tribune 9
NOUN
understandable 11
ADJ
diocesan 4
ADJ
acted 13
VERB
divisive 4
ADJ
Italo 1
NOUN
Svevo 1
NOUN
Douglas 17
NOUN
bevy 3
NOUN
sashayed 1
VERB
staircases 1
NOUN
Amsterdam 1
NOUN
N.Y. 13
NOUN
Blackwell's 1
PRT
somewheres 1
ADV
Tshombe 7
NOUN
Gigenza 1
NOUN
unwelcome 3
ADJ
Castro 28
NOUN
mortification 1
NOUN
Colombian 1
NOUN
ADJ
Ten 11
NUM
O'Clock 2
ADV
to-day 4
NOUN
Child 2
NOUN
apologized 4
VERB
editor 66
NOUN
Palfrey's 8
NOUN
advertise 3
VERB
Dominican 6
NOUN
professedly 2
ADV
benevolent 2
ADJ
dictatorship 10
NOUN
encroaching 2
VERB
photographer 3
NOUN
throne 3
NOUN
unbidd

Luis 8
NOUN
Hernandez 5
NOUN
efficaciously 2
ADV
unavoidably 3
ADV
rosy 6
ADJ
visions 6
NOUN
cheered 2
VERB
nephews 4
NOUN
linen-covered 1
ADJ
indexes 2
NOUN
covering 27
VERB
they'd 22
PRT
Unlike 8
ADP
decorative 8
ADJ
static 11
ADJ
necessitate 4
VERB
predictive 3
ADJ
forecasting 5
VERB
improving 10
VERB
know-how 4
NOUN
Raoul 2
NOUN
lent 5
VERB
Watch 4
VERB
Rhine 6
NOUN
aerosol 6
NOUN
causative 1
ADJ
epidemic 8
NOUN
typhus 3
NOUN
Rickettsia 1
NOUN
prowazwki 1
NOUN
therefrom 3
ADV
supporting 24
VERB
Capet 2
NOUN
analogy 11
NOUN
seventh 10
ADJ
withdrew 6
VERB
Connecticut 10
NOUN
occipital 2
ADJ
lobe 1
NOUN
cerebellum 1
NOUN
acute 12
ADJ
embarrassment 7
NOUN
Ike 3
NOUN
usher 1
NOUN
VERB
fur 8
NOUN
Violet 4
NOUN
whinnied 1
VERB
Computers 1
NOUN
blend 5
NOUN
cultures 11
NOUN
artfulness 1
NOUN
adopting 11
VERB
lending 4
VERB
$40,000,000 1
NOUN
$26,000,000 1
NOUN
Appeal 5
NOUN
renders 1
VERB
overturned 2
VERB
Bultmann's 7
NOUN
adhered 4
VERB
pronouncements 2
NOUN
exaggerate 7
VERB
mythologica

NOUN
termed 13
VERB
biologist 1
NOUN
subtracted 4
VERB
Income 9
NOUN
Adjusted 7
VERB
tenuous 6
ADJ
purport 2
VERB
dervishes 3
NOUN
Walking 3
VERB
Harbor's 2
NOUN
wearying 1
VERB
deficit 8
NOUN
Kimmell 2
NOUN
checks 14
NOUN
payable 2
ADJ
Hun 2
NOUN
decorator 4
NOUN
two-hour 4
ADJ
cordial 4
ADJ
Arizona 9
NOUN
Convulsively 1
ADV
Gaston 2
NOUN
Berche 1
NOUN
crimsoning 1
VERB
frilly 1
ADJ
waistcoat 1
NOUN
subtype 3
NOUN
truism 3
NOUN
Stuttgart 1
NOUN
contributor 2
NOUN
subdivision 7
NOUN
satisfies 3
VERB
refresher 1
NOUN
unexpected 22
ADJ
Delinquency 3
NOUN
all-time 3
ADJ
tragedies 3
NOUN
Chartres 2
NOUN
slave's 1
NOUN
coroner's 4
NOUN
Dandy's 1
NOUN
accusation 3
NOUN
adjudged 1
VERB
aberration 3
NOUN
disregarded 3
VERB
petty 6
ADJ
jealousies 1
NOUN
grievances 3
NOUN
trade-mark 1
NOUN
sesame 3
NOUN
oozed 2
VERB
watery 3
ADJ
fluid 16
NOUN
healing 3
NOUN
bathing 12
VERB
epsom 1
NOUN
salts 5
NOUN
advocating 6
VERB
prosodic 3
ADJ
Igbo 3
NOUN
phonology 4
NOUN
logic 17
NOUN
cafes 4
NOUN
relax 15


bridal 1
NOUN
Stevenses' 1
NOUN
eleventh 3
ADJ
globes 1
NOUN
grip 18
NOUN
skating 1
VERB
hiking 1
NOUN
VERB
Sewing 1
VERB
numbness 1
NOUN
ultimately 13
ADV
frees 1
VERB
old-fashioned 7
ADJ
Slackened 1
VERB
region's 3
NOUN
programmer 2
NOUN
six-man 1
ADJ
B-52 5
NOUN
go-to-war 1
ADJ
connotation 4
NOUN
Bake 7
NOUN
Skolman 3
NOUN
rigged 3
VERB
infrequent 4
ADJ
topics 10
NOUN
cites 9
VERB
survey-type 2
ADJ
Unique 1
ADJ
Fritzie 6
NOUN
Dalton 4
NOUN
controversies 1
NOUN
axes 4
NOUN
grind 2
VERB
Newbury 3
NOUN
Newburyport 3
NOUN
Salisbury 5
NOUN
creators 2
NOUN
Denials 1
NOUN
continuance 5
NOUN
mistrial 2
NOUN
acquittal 2
NOUN
Teller 3
NOUN
$800 6
NOUN
Sears 1
NOUN
organizational 5
ADJ
avail 2
VERB
pertaining 4
VERB
Ryan 11
NOUN
loud-voiced 1
ADJ
maintains 11
VERB
pepper 12
NOUN
Dulles 8
NOUN
appeasement 3
NOUN
calamity 2
NOUN
pretending 10
VERB
Fortune 2
NOUN
Dean's 2
NOUN
1802 1
NUM
Means 12
NOUN
goddamned 2
VERB
mothers' 2
NOUN
sons 16
NOUN
afforded 8
VERB
pose 6
VERB
Nathaniel 1
NOUN
leave

SanAntonio 1
NOUN
Solicitor 2
NOUN
Abatuno 1
NOUN
Northwest 8
NOUN
Prairie 11
NOUN
Chien 6
NOUN
ticking 1
VERB
swears 1
VERB
whoever 4
PRON
fold 4
NOUN
Grosse 8
NOUN
Dropping 1
VERB
felled 2
VERB
struggled 7
VERB
comrade 3
NOUN
Capable 2
ADJ
enduring 10
VERB
stout 1
ADJ
controversialists 1
NOUN
vitriol 1
NOUN
pens 2
NOUN
Available 5
ADJ
imaginative 9
ADJ
mythologies 2
NOUN
detectives 15
NOUN
strait 3
NOUN
Axis 3
NOUN
contemplating 4
VERB
poisonous 4
ADJ
noxious 1
ADJ
inhumane 4
ADJ
Mazowsze 1
NOUN
three-week 1
ADJ
starched 2
VERB
Equal 2
ADJ
upper-lower 1
ADJ
Serological 1
ADJ
crucified 2
VERB
nailed 7
VERB
25.3 1
NUM
Span 3
NOUN
osseous 1
ADJ
ballplayers 3
NOUN
veined 1
VERB
moth's 1
NOUN
rested 13
VERB
Scotty's 10
NOUN
Oslo 4
NOUN
candor 2
NOUN
Grandma's 2
NOUN
Mom 1
NOUN
Marr 5
NOUN
80 16
NUM
link 11
NOUN
gulf 4
NOUN
alcoholics 3
NOUN
parolees 2
NOUN
resembling 2
VERB
habitual 4
ADJ
bumptious 1
ADJ
open-handed 1
ADJ
greedy 4
ADJ
basking 2
VERB
sunshine 4
NOUN
mountains 28
NOUN
riche

VERB
aspire 3
VERB
rostrum 2
NOUN
congress 2
NOUN
Zionists 1
NOUN
convening 2
NOUN
presiding 6
VERB
Inauguration 4
NOUN
splenetic 1
ADJ
Carmack 1
NOUN
commute 9
VERB
necessities 12
NOUN
legitimately 2
ADV
overcomes 5
VERB
contour 6
NOUN
commentary 5
NOUN
repressive 2
ADJ
upper-middle-class 5
NOUN
submissive 3
ADJ
enlisted 9
VERB
condemn 3
VERB
coquette 2
NOUN
Bantus 3
NOUN
entails 6
VERB
casts 3
NOUN
Lotte 1
NOUN
Lehmann 1
NOUN
Bumbry 1
NOUN
Tannhaeuser 2
NOUN
bored 11
VERB
skipping 3
VERB
Fletcher 5
NOUN
Chi 1
NOUN
Northeast 6
NOUN
Schlesinger 2
NOUN
Department's 15
NOUN
Guthman 1
NOUN
Pulitzer 3
NOUN
CD 7
NOUN
centralized 6
VERB
Sheets 3
NOUN
Cereal 3
NOUN
grains 13
NOUN
three-fourths 2
NOUN
Beesemyers 1
NOUN
Arden 3
NOUN
Blvd. 4
NOUN
duffel 3
NOUN
Sahjunt 1
NOUN
Yoorick 1
NOUN
Roebuck 7
NOUN
Ships 2
NOUN
hull-first 1
ADV
Harvie 3
NOUN
mended 1
VERB
Buddhist 3
ADJ
joked 1
VERB
soldiery 1
NOUN
Macbeth 4
NOUN
policeman-murderer 1
NOUN
Burke's 1
NOUN
Hands 3
NOUN
Ottermole 1
NOUN
on-aga

generates 5
VERB
misinterpreted 2
VERB
equations 7
NOUN
Bloch 1
NOUN
high-pitched 6
ADJ
mais 1
X
mon 1
X
Dieu 1
X
flavoring 1
NOUN
halvah 1
NOUN
safest 3
ADJ
sanest 1
ADJ
Churchyard 2
NOUN
1901 2
NUM
anecdote 7
NOUN
seizure 6
NOUN
fumbled 5
VERB
playbacks 1
NOUN
Redstone 1
NOUN
implicitly 3
ADV
maturational 1
ADJ
counteract 4
VERB
gouging 3
VERB
endangering 3
VERB
hankered 1
VERB
Sum 1
NOUN
Substance 2
NOUN
system's 2
NOUN
jittery 1
ADJ
value-system 1
NOUN
correlated 3
VERB
transmitter 2
NOUN
Councilman 5
NOUN
Olson 1
NOUN
eminently 3
ADV
Domina 2
X
Nancy 5
NOUN
thermodynamic 2
ADJ
Gibbs 5
NOUN
Hurts 1
NOUN
disappears 2
VERB
what're 1
PRT
glutinous 1
ADJ
eyeball 2
NOUN
cinder 2
NOUN
allocable 1
ADJ
deposited 9
VERB
trustee 5
NOUN
spate 2
NOUN
vocalists 2
NOUN
Luther 5
NOUN
toured 2
VERB
Italy's 2
NOUN
Fiat 7
NOUN
typewriters 1
NOUN
lushes 1
NOUN
Continuous 3
ADJ
Claude 8
NOUN
now-famous 1
ADJ
Weider 4
NOUN
sector 10
NOUN
non-Christians 2
NOUN
widened 3
VERB
Hague 9
NOUN
Ione 1
NOUN
rig

plucking 1
VERB
attest 2
VERB
Speeches 1
NOUN
vogue 3
NOUN
Rotarians 1
NOUN
Miriani's 1
NOUN
Mayor-elect 1
NOUN
Cavanagh 1
NOUN
unfitting 3
ADJ
mid-June 2
NOUN
Kansas-Nebraska 1
NOUN
canvas 14
NOUN
Rossoff 5
NOUN
damnit 1
PRT
dusting 2
VERB
NOUN
rag 7
NOUN
snack 4
NOUN
dip 3
NOUN
architect's 2
NOUN
myocardial 2
ADJ
hypertrophied 1
VERB
irregular 6
ADJ
basophilic 2
ADJ
Willow 1
NOUN
Electric's 1
NOUN
Louisville 5
NOUN
Syracuse 1
NOUN
Pentagon 10
NOUN
Boeing 3
NOUN
Lockheed 2
NOUN
disorder 7
NOUN
Conservative 1
ADJ
Democratic-endorsed 2
ADJ
improperly 2
ADV
coordinator 5
NOUN
Francis 18
NOUN
Nolan 1
NOUN
3rd 4
ADJ
10-hour 1
ADJ
70 17
NUM
donate 3
VERB
elongated 4
VERB
elegant 12
ADJ
fountains 3
NOUN
russet-colored 1
ADJ
wandering 3
VERB
springtime 3
NOUN
entertaining 8
VERB
naughty 1
ADJ
creche 1
NOUN
Cat 1
NOUN
Flying 4
VERB
Time-Olivette 1
NOUN
padding 1
VERB
NOUN
potentially 6
ADV
harmful 3
ADJ
couches 1
NOUN
Denver-area 1
ADJ
Mays' 2
NOUN
Howsam 1
NOUN
blacked 2
VERB
Bears 9
NOUN
ind

ADJ
cone-sphere 2
NOUN
2-liter 2
ADJ
Erlenmeyer 1
NOUN
flask 4
NOUN
round-bottom 1
NOUN
deformation 5
NOUN
infinitesimal 3
ADJ
Charlotte's 2
NOUN
Ryusenji 7
NOUN
Swinburne 1
NOUN
affluent 2
ADJ
hyped-up 1
ADJ
autos 3
NOUN
lethal 5
ADJ
Chicken 4
NOUN
Superstition 1
NOUN
blended 3
VERB
cabinet 9
NOUN
Bradford 4
NOUN
Merrill 1
NOUN
S.S. 1
NOUN
Carvalho 2
NOUN
Gauer 2
NOUN
denominational 5
ADJ
polity 1
NOUN
Glenn 3
NOUN
10:05 1
NUM
P.M. 6
ADV
Slocum's 1
NOUN
Stanton 4
NOUN
negotiated 6
VERB
boosted 2
VERB
Beer 2
NOUN
brewed 1
VERB
Babylonians 2
NOUN
Egyptians 3
NOUN
certitudes 1
NOUN
imparted 3
VERB
anthropology 6
NOUN
overprotective 1
ADJ
Guatemalan 1
ADJ
Janus-faced 1
ADJ
Pragmatism 1
NOUN
Nashville 7
NOUN
1950s 2
NOUN
Devoted 1
VERB
cross-top 1
NOUN
pillow 8
NOUN
Katya 3
NOUN
Adaptation 1
NOUN
non-social 1
ADJ
occupational 5
ADJ
segregation 9
NOUN
Mauldin 1
NOUN
gouged 3
VERB
shin 3
NOUN
middle-sized 1
ADJ
pictorial 4
ADJ
utterances 1
NOUN
evasions 1
NOUN
unimpeachable 1
ADJ
non-propaga

$650 1
NOUN
lifeboats 1
NOUN
simple-minded 2
ADJ
Dogberry 1
NOUN
Verges 1
NOUN
jokes 6
NOUN
writers' 3
NOUN
condemns 3
VERB
Mattei 3
NOUN
Rome's 2
NOUN
Italo-American 1
ADJ
grownups 2
NOUN
loudest 2
ADJ
stroked 1
VERB
hens 3
NOUN
turkeys 1
NOUN
quacked 1
VERB
ducks 3
NOUN
cackled 3
VERB
geese 2
NOUN
Analysis 1
NOUN
baths 4
NOUN
lashings 1
NOUN
testicle 2
NOUN
crushers 1
NOUN
Charles' 4
NOUN
invert 1
VERB
irritability 1
NOUN
coeds 1
NOUN
ogled 1
VERB
gravy 3
NOUN
soloists 4
NOUN
Practically 4
ADV
permissibility 1
NOUN
introducing 7
VERB
all-white 4
ADJ
teetering 2
VERB
half-mincing 1
ADJ
25,000,000 1
NUM
50,000,000 1
NUM
crested 1
ADJ
three-sectioned 1
ADJ
registers 4
NOUN
marital 8
ADJ
rob 2
VERB
tenants 6
NOUN
hot-slough 1
ADJ
prowlers 1
NOUN
shoved 6
VERB
Barco 16
NOUN
confessing 3
VERB
Hypothalamic 1
ADJ
Angst 2
X
downgraded 3
VERB
retrogressive 1
ADJ
Soak 2
VERB
bucks 3
NOUN
buck 4
NOUN
petals 2
NOUN
Rabaul 1
NOUN
Gastronomes 1
NOUN
joins 2
VERB
Sloan's 5
NOUN
slower 6
ADJ
Sharpe's

onions 4
NOUN
mashed 2
VERB
disruptive 2
ADJ
appraising 1
VERB
Death's-Head 2
ADJ
Auxiliaries 2
NOUN
Einsatzkommandos 1
X
waded 2
VERB
knee-deep 1
ADV
dope-ridden 1
ADJ
maniacs 1
NOUN
Dancing 2
VERB
pre-1960 1
ADJ
admiration 7
NOUN
Cimabue's 1
NOUN
Poussin's 1
NOUN
assurances 3
NOUN
bitterness 14
NOUN
involution 6
NOUN
supposes 1
VERB
universally 6
ADV
reckons 1
VERB
imperfectability 1
NOUN
renovated 2
VERB
settler 1
NOUN
builtin 1
ADJ
job-seekers 1
NOUN
eight-year 1
ADJ
distinguishing 4
VERB
superlunary 1
ADJ
celestial 3
ADJ
sublunary 1
ADJ
terrestrial 6
ADJ
reinforcing 2
VERB
four-element 2
ADJ
Empedocles 2
NOUN
viz. 1
ADV
bossed 1
VERB
Governor's 4
NOUN
premise 5
NOUN
militarily 2
ADV
Floating 1
VERB
pulp 4
NOUN
OWI 1
NOUN
Paramount 1
ADJ
Pictures 4
NOUN
newsreel 1
NOUN
hopples 4
NOUN
searched 7
VERB
crumpled 4
VERB
tasted 8
VERB
wine- 1
NOUN
beer-cooling 1
NOUN
refreshments 2
NOUN
heir 6
NOUN
impersonal 12
ADJ
interdependence 6
NOUN
polarized 1
VERB
Elizabeth 12
NOUN
Delphine's 2
N

NOUN
Roof 1
NOUN
bracing 2
NOUN
Af-values 1
NOUN
crosses 3
VERB
Thaddeus 1
NOUN
Seymour 1
NOUN
ski 1
NOUN
crowning 2
VERB
beauteous 1
ADJ
damsel 1
NOUN
misleading 7
VERB
imputed 5
VERB
instrumentalities 2
NOUN
1731 1
NUM
1'' 3
NOUN
mandrel 1
NOUN
Persons 4
NOUN
faintly 5
ADV
silhouetted 3
VERB
Concetta's 2
NOUN
shrill 3
ADJ
rotundity 1
NOUN
Incumbent 2
ADJ
Brod 1
NOUN
re-election 2
NOUN
Njust 1
NOUN
Bubenik 1
NOUN
fifty-fifty 1
ADJ
unmarried 5
ADJ
Sue 4
NOUN
Lawless 1
NOUN
spider 2
NOUN
scratchy 1
ADJ
Hildy 1
NOUN
Weissman 1
NOUN
unquiet 1
ADJ
prevalence 3
NOUN
slang 2
NOUN
loquacity 1
NOUN
impossibility 1
NOUN
Leningrad 3
NOUN
furlough 2
NOUN
IBM 4
NOUN
worthless 3
ADJ
Predispositions 1
NOUN
togs 1
NOUN
Gregory's 1
NOUN
6-degrees-C 1
NOUN
elution 3
NOUN
Sober 3
NOUN
Peterson 8
NOUN
stock-market 1
NOUN
envisages 1
VERB
adapt 4
VERB
ruthless 5
ADJ
inefficiency 1
NOUN
overburdened 1
VERB
self-control 3
NOUN
culmination 4
NOUN
baklava 1
X
clogging 1
VERB
terminals 4
NOUN
Kipling 1
NOUN
ov

ADJ
mariner 1
NOUN
necromantic 1
ADJ
Sparta 1
NOUN
Gothic 3
ADJ
castle 4
NOUN
$27.50 1
NOUN
sparse 3
ADJ
luncheons 2
NOUN
excused 3
VERB
metaphysic 3
NOUN
mystical 4
ADJ
self-seeking 1
ADJ
sewers 4
NOUN
L-P 1
NOUN
DUF 3
NOUN
subroutines 1
NOUN
Karamazov 3
NOUN
Psychoanalytic 1
ADJ
Quarterly 1
NOUN
harassed 4
VERB
upheld 6
VERB
Walsh 2
NOUN
Tougas 1
NOUN
audio-visual 4
ADJ
superintendent's 2
NOUN
Bad 2
ADJ
seersucker 1
NOUN
half-witted 1
ADJ
mackintosh 1
NOUN
fucks 1
NOUN
Voter 1
NOUN
gallstones 1
NOUN
Debonnie 3
NOUN
:35.3 1
NUM
Characteristics 1
NOUN
loosens 1
VERB
parasol 2
NOUN
confounding 1
VERB
Aristotelian 2
ADJ
geocentric 1
ADJ
$12,500 2
NOUN
Leavitt's 1
NOUN
Spencerian 1
ADJ
feud 1
NOUN
brazen 1
ADJ
dishonor 2
NOUN
thrilling 4
ADJ
American-trained 1
ADJ
milks 2
VERB
translating 1
VERB
hay 15
NOUN
impotence 2
NOUN
Thant 2
NOUN
waylaid 1
VERB
Hawksley 9
NOUN
powerfully 2
ADV
O'Dwyers 2
NOUN
salve 2
NOUN
redder 2
ADJ
381(a) 3
NUM
subsidiary 5
NOUN
subcommittee 4
NOUN
Daddy 2
NOUN


skiddy 1
ADJ
greens 4
NOUN
flag-stick 1
NOUN
putt 2
NOUN
aspirin 3
NOUN
surly 1
ADJ
scowling 2
VERB
suffuse 1
VERB
multi-year 2
ADJ
anchorite 1
ADJ
strove 4
VERB
brim 3
NOUN
appetite 9
NOUN
Bleaching 1
VERB
grazed 1
VERB
transfusions 1
NOUN
prednisone 2
NOUN
tapering 2
VERB
Frisco 3
NOUN
Gorky 1
NOUN
Sonatas 3
NOUN
1-ton 1
NOUN
Bless 1
VERB
ginning 2
VERB
Dallas-based 1
ADJ
dispell 1
VERB
hazy 5
ADJ
fixture 3
NOUN
Ironically 2
ADV
assassin 6
NOUN
silky 1
ADJ
Billikens 2
NOUN
defeats 1
NOUN
defiance 5
NOUN
conversions 6
NOUN
lesion 2
NOUN
buggers 1
NOUN
Monte 3
NOUN
67 1
NUM
Brakes 2
NOUN
howled 1
VERB
blared 1
VERB
classmate 2
NOUN
Repnin 2
NOUN
longish 1
ADJ
arching 1
VERB
quizzical 1
ADJ
Mephistopheles 1
NOUN
Teeth 1
NOUN
flashing 5
VERB
uncombable 1
ADJ
nicotine-choked 1
ADJ
gaunt 4
ADJ
Tate 1
NOUN
Arigato 1
X
gosaimasu 1
X
Southeast 10
ADJ
state's-responsibility 1
NOUN
levy 4
NOUN
backdrop 1
NOUN
Cunningham's 2
NOUN
Summerspace 1
NOUN
shimmering 2
VERB
Stoic-patristic 1
ADJ
embark 

NOUN
theatergoer 1
NOUN
gilded 2
VERB
straightening 3
VERB
excessively 3
ADV
frictional 2
ADJ
loadings 3
NOUN
Wheelan's 1
NOUN
Study 4
VERB
Helps 1
NOUN
VERB
Technique 1
NOUN
talkin' 1
VERB
Continue 2
VERB
Knights 3
NOUN
Malta 3
NOUN
fulfills 2
VERB
Kingdom 6
NOUN
Benet's 1
NOUN
Steinbeck's 1
NOUN
Grapes 1
NOUN
Wrath 1
NOUN
prototype 3
NOUN
paraphrase 2
NOUN
walrus 1
NOUN
manic 1
ADJ
sealing 3
VERB
dooms 1
NOUN
commoners 1
NOUN
skippers 2
NOUN
leaps 2
NOUN
plumpness 4
NOUN
Fiedler's 1
NOUN
half-life 1
NOUN
demonstrating 3
VERB
wide-awake 1
ADJ
Caleb 1
NOUN
drafty 2
ADJ
hangar 1
NOUN
Proceeding 3
VERB
Parry's 1
NOUN
schemata 1
NOUN
legacy 4
NOUN
bards 1
NOUN
Brandywine 4
NOUN
statistically 2
ADV
questioningly 1
ADV
shackled 2
VERB
uselessly 1
ADV
atomisation 1
NOUN
dietary 4
ADJ
hyperplasia 1
NOUN
Aggie 1
NOUN
Theoretically 1
ADV
Ghoreyeb 1
NOUN
Karsner 1
NOUN
'13 1
NUM
Comroe 1
NOUN
'58 2
NUM
atop 2
ADV
ADP
intrinsically 3
ADV
unachievable 1
ADJ
operationally 1
ADV
bettering 1
VERB
Wha

valiantly 1
ADV
eight-thirty 1
NUM
Successful 1
ADJ
behaves 2
VERB
Circle 1
NOUN
Portuguese 1
ADJ
bandit 2
NOUN
brothers' 2
NOUN
remake 2
VERB
phenomenal 2
ADJ
horsepower 5
NOUN
uncommonly 1
ADV
Intercollegiate 1
ADJ
Ski 1
NOUN
Explains 1
VERB
bloodstream 4
NOUN
intima 3
NOUN
potboiler 2
NOUN
shoestring 1
NOUN
platinum 4
NOUN
sun-suit 1
NOUN
post-census 1
ADJ
1,083,000 2
NUM
1,525,000 1
NUM
nos. 1
NOUN
concertos 4
NOUN
Calvinist 1
ADJ
tragedians 1
NOUN
contrarieties 1
NOUN
rest-room 1
NOUN
hulking 2
VERB
artifice 1
NOUN
hackwork 1
NOUN
Assessors 1
NOUN
birdied 2
VERB
birdies 2
NOUN
pars 2
NOUN
2500 1
NUM
interpeople 1
ADJ
evoking 1
VERB
Coosa 1
NOUN
fruition 2
NOUN
Selective 1
ADJ
kinesthetically 1
ADV
spatial 7
ADJ
obligingly 1
ADV
vine-shaded 1
ADJ
rippling 2
VERB
Multiple 2
ADJ
routings 1
NOUN
comportment 1
NOUN
inattentive 1
ADJ
five-volume 1
ADJ
1906 3
NUM
photocathodes 2
NOUN
S-11 1
ADJ
NOUN
S-20 2
NOUN
sideboard 1
NOUN
Nevada 5
NOUN
Luther's 2
NOUN
Hymn 2
NOUN
Fortress 1
NOUN
Mo

grease 7
NOUN
Nae 1
ADV
ye 7
PRON
countin' 3
VERB
weariness 2
NOUN
disrespect 2
NOUN
Drew 4
NOUN
kidnapped 1
VERB
teen-age 3
ADJ
tramped 1
VERB
fog-enshrouded 1
ADJ
mustn't 4
VERB
Schramm 1
NOUN
Rottger 1
NOUN
fluorescein 1
NOUN
isocyanate-labeled 1
ADJ
manhours 3
NOUN
handiest 1
ADJ
689-page 1
ADJ
refreshed 3
VERB
spellbound 1
VERB
ultrasonically 1
ADV
distinctively 2
ADV
Mercury 2
NOUN
easement 1
NOUN
hurl 3
VERB
journey's 2
NOUN
Facts 1
NOUN
K's 1
NOUN
halfways 1
ADV
blackberry 1
NOUN
purple 6
ADJ
intercollegiate 2
ADJ
rotate 1
VERB
intensifying 2
VERB
eradication 3
NOUN
fast-spreading 1
ADJ
Constantin 1
NOUN
Philippe 1
NOUN
squalid 1
ADJ
semi-circle 1
NOUN
elapsed 5
VERB
snobbishly 1
ADV
time-temperature 1
NOUN
equivalence 2
NOUN
Peak 1
NOUN
Cypress 1
NOUN
waterskiing 1
VERB
incompetents 2
NOUN
meditating 2
VERB
variant 3
NOUN
cleric 1
NOUN
woulda 1
VERB
relaxing 3
VERB
non-Jew 1
NOUN
self-awareness 1
NOUN
sculptured 4
VERB
await 7
VERB
unpleasantly 1
ADV
embargo 2
NOUN
peanut 4
NO

NOUN
trapping 1
VERB
accelerate 5
VERB
positivism 2
NOUN
democratize 3
VERB
parliament 3
NOUN
judiciary 1
NOUN
Michaelson 1
NOUN
remanding 1
VERB
$9.2 1
NOUN
$10.1 1
NOUN
truer 2
ADJ
displace 2
VERB
Latter 1
ADJ
Chef 1
NOUN
Yokel 1
NOUN
specialize 2
VERB
shrimp 1
NOUN
comprises 2
VERB
19,000,000 1
NUM
$20,000,000,000 1
NOUN
ninety-nine 1
NUM
DiSimone 1
NOUN
Tolek 1
NOUN
Alterman 1
NOUN
swamps 2
NOUN
irrigating 1
VERB
ripple 3
NOUN
jaws 9
NOUN
Ion 1
NOUN
interplanetary 1
ADJ
Comes 2
VERB
Waterways 1
NOUN
Colfax 1
NOUN
122 2
NUM
docketed 1
VERB
adjourned 1
VERB
role-experimentation 1
NOUN
infantryman 1
NOUN
Electoral 1
ADJ
pith 1
NOUN
joiner 2
NOUN
tyrant 2
NOUN
madman 2
NOUN
aspires 1
VERB
cross-striations 2
NOUN
myofibrillae 2
NOUN
Reno 5
NOUN
Ablard 1
NOUN
Corne 1
NOUN
pouch 2
NOUN
purified 7
VERB
Stoll 1
NOUN
corrode 1
VERB
Increases 1
NOUN
VERB
diarrhea 5
NOUN
synthesized 2
VERB
Wilkes-Barre 1
NOUN
Perfect 3
ADJ
Huntley 5
NOUN
robberies 3
NOUN
holdup 2
NOUN
intrigued 2
VERB
counter-

Taoism 2
NOUN
tamp 1
VERB
Comfortably 1
ADV
pseudo-capitalism 1
NOUN
estimating 2
VERB
rollicking 1
VERB
zoologist 1
NOUN
parachute 1
NOUN
Starkey 1
NOUN
underwear 3
NOUN
non-intellectual 1
ADJ
phenonenon 1
NOUN
key-punched 1
VERB
punch 3
NOUN
beasties 1
NOUN
ably 2
ADV
Tito 1
NOUN
prepositional 3
ADJ
complements 2
NOUN
C'est 1
X
Interruptions 1
NOUN
rebuked 1
VERB
Compared 2
VERB
spewings 1
NOUN
glacier 1
NOUN
monsters 3
NOUN
granite 2
NOUN
frowzy 1
ADJ
jabberings 1
NOUN
outsider 2
NOUN
hiring 3
VERB
acclimatized 1
VERB
Dick's 2
NOUN
Mortar 2
NOUN
Sharing 1
VERB
French-Canadian 2
ADJ
Verreau 1
NOUN
sharpest 1
ADJ
Core 1
NOUN
decelerate 1
VERB
breadth 4
NOUN
bijouterie 1
X
Boggs 1
NOUN
Longwood 1
NOUN
Kennett 1
NOUN
Del. 1
NOUN
Constitutional 1
ADJ
Connally 7
NOUN
Amendments 3
NOUN
compactly 1
ADV
expounding 1
VERB
tangibly 1
ADV
monogamous 1
ADJ
rectilinear 1
ADJ
Manny 2
NOUN
ultrasonic 7
ADJ
dimensioning 2
VERB
dyeing 1
VERB
waxing 1
VERB
gassing 1
VERB
goodness' 1
NOUN
Paulus 1
NOUN

Vere 1
NOUN
Pembroke 2
NOUN
Lusignan 1
NOUN
Carnarvon's 1
NOUN
Piers 1
NOUN
Gaveston 1
NOUN
Pontissara 1
NOUN
Hotham 2
NOUN
jocund 1
ADJ
archbishop 2
NOUN
Nurse 1
NOUN
Hidden 1
VERB
reinstitution 1
NOUN
Writes 1
VERB
regrets 1
VERB
NOUN
169 2
NUM
semisecret 1
ADJ
Hemingway 1
NOUN
Looky 1
VERB
re-thinking 1
VERB
ipso 1
X
Dramatic 1
ADJ
Mutants 1
NOUN
unsteadily 1
ADV
oddly 6
ADV
Vroman 1
NOUN
Manzanola 1
NOUN
Plaza 1
NOUN
Dip 1
VERB
proceeds 5
VERB
NOUN
chirped 1
VERB
chattered 3
VERB
dewy-eyed 1
ADJ
caressed 2
VERB
Fraction 3
NOUN
$325 1
NOUN
$65 2
NOUN
seers 1
NOUN
lactating 1
VERB
ruddy 2
ADJ
unlined 2
ADJ
Opening 1
VERB
schnapps 1
NOUN
thimble-sized 1
ADJ
disturbingly 1
ADV
Hollywood's 4
NOUN
foodstuffs 1
NOUN
yeasts 1
NOUN
side-conclusions 1
NOUN
prosecute 2
VERB
vaguely-imagined 1
ADJ
Soviets' 1
NOUN
2:33.2 1
NUM
incubi 1
NOUN
self-indulgence 3
NOUN
cooked-over 1
ADJ
oatmeal 1
NOUN
isolating 4
VERB
one-sixth 1
NOUN
steely 1
ADJ
Lewellyn 1
NOUN
Lundeen 1
NOUN
booster 1
NOUN
64-13 1

irreducible 1
ADJ
integers 1
NOUN
inflected 2
VERB
acidulous 1
ADJ
Improved 1
VERB
requisition 1
NOUN
invalids 1
NOUN
lamming 1
VERB
Situs 1
NOUN
spooky 2
ADJ
Recounting 1
VERB
freezes 1
VERB
would-be 4
ADJ
pillage 2
VERB
embraced 3
VERB
iodoprotein 1
NOUN
cell-free 3
ADJ
Cennino 1
NOUN
Cennini 1
NOUN
byword 1
NOUN
durin' 1
ADP
magnifies 1
VERB
disagreeable 1
ADJ
ring-around-the-rosie 1
NOUN
ninety-five 1
NUM
remarried 1
VERB
Cavallinis 1
NOUN
Cecilia 3
NOUN
Retracing 1
VERB
Sultan 2
NOUN
Ahmet 2
NOUN
minarets 2
NOUN
apologize 1
VERB
forcibly 2
ADV
light-colored 2
ADJ
waived 1
VERB
shootin' 1
VERB
maturities 1
NOUN
resolve 7
VERB
stepchild 1
NOUN
Meet 2
VERB
blazing 5
VERB
varicolored 1
ADJ
leopards 1
NOUN
tipping 1
VERB
Borromini's 2
NOUN
four-year 2
ADJ
expire 1
VERB
Bobbie's 2
NOUN
Funk 2
NOUN
Furnaces 1
NOUN
hormones 2
NOUN
amino 1
NOUN
acids 5
NOUN
gliders 1
NOUN
important-looking 1
ADJ
loafed 1
VERB
blistered 1
VERB
curbs 3
NOUN
therapies 1
NOUN
Afraid 2
ADJ
Theory 1
NOUN
Percept

defying 2
VERB
calmness 2
NOUN
afar 2
ADV
second-class 1
NOUN
horsemanship 3
NOUN
tapes 3
NOUN
10:30 2
NUM
summation 3
NOUN
antics 2
NOUN
microphone 3
NOUN
dauntless 1
ADJ
refocusing 1
NOUN
civil-rights 1
NOUN
Selections 1
NOUN
duets 1
NOUN
Griffin 4
NOUN
Trenchard 1
NOUN
gritty-eyed 1
ADJ
deprive 3
VERB
role-experiment 1
VERB
Aggregate 1
ADJ
Regulative 1
ADJ
blackmailer 2
NOUN
enormously 9
ADV
Tossing 1
VERB
Liz 1
NOUN
Peabody 1
NOUN
grieving 2
VERB
cheerfulness 1
NOUN
nun 2
NOUN
jostle 1
VERB
512 2
NUM
Dietetic 1
ADJ
facsimile 1
NOUN
fiercest 1
ADJ
pupated 1
VERB
self-restraint 1
NOUN
Pets 1
NOUN
skylight 1
NOUN
Diets 1
NOUN
interplay 4
NOUN
gibe 1
NOUN
Presbyterianism 1
NOUN
Harley's 1
NOUN
Bolingbroke's 1
NOUN
impiety 1
NOUN
Draper 3
NOUN
Palo 1
NOUN
Alto 1
NOUN
Gaither 1
NOUN
Reprints 1
NOUN
Large-package 1
NOUN
Impartiality 1
NOUN
eclipsing 1
VERB
Forget 2
VERB
Smith's 3
NOUN
entrepreneur 6
NOUN
Flip 1
NOUN
Fireside 1
NOUN
Steak 2
NOUN
Ranch 1
NOUN
soul's 2
NOUN
Desprez 1
NOUN
Co

jerky 2
ADJ
primeval 4
ADJ
clams 1
NOUN
seaweed 2
NOUN
wells 4
NOUN
implacable 1
ADJ
hurting 3
VERB
glamor 3
NOUN
gals 1
NOUN
percussion 2
NOUN
34.7 1
NUM
rents 3
NOUN
royalties 2
NOUN
subtract 2
VERB
clinches 1
NOUN
beep 4
NOUN
Pirate 1
NOUN
Murtaugh 3
NOUN
Vern 1
NOUN
Wednesday's 1
NOUN
pompous 3
ADJ
barrage 2
NOUN
Companies 1
NOUN
peeked 2
VERB
broken-nosed 1
ADJ
wrestler's 1
NOUN
caretaker 1
NOUN
Beaujolais 1
NOUN
Alsop 3
NOUN
whiff 1
NOUN
Lilac 3
NOUN
Fairy 1
NOUN
Shade 1
NOUN
uneconomical 2
ADJ
suburbanized 1
VERB
studios 2
NOUN
April-June 1
NOUN
lumen 1
NOUN
Wildcat 1
NOUN
Unsinkable 4
ADJ
originals 1
NOUN
jaunty 1
ADJ
charmingly 1
ADV
Irma 1
NOUN
Douce 1
X
second-look 1
NOUN
Estimate 3
NOUN
lowers 1
NOUN
striding 1
VERB
crackling 1
NOUN
information-seeking 1
ADJ
Fergeson 1
NOUN
Tenn. 1
NOUN
Humor 1
NOUN
Resident 1
ADJ
Diseases 2
NOUN
Histochemistry 2
NOUN
Orthopedic 1
ADJ
Forensic 3
ADJ
Elements 3
NOUN
permeate 1
VERB
unrecoverable 1
ADJ
warmer 2
ADJ
off-color 1
ADJ
Bennington 

Southland 1
NOUN
preside 2
VERB
Weird 1
ADJ
Sisters 1
NOUN
stitch 3
NOUN
shuld 3
VERB
effecte 1
VERB
wynne 1
VERB
sworde 1
NOUN
Achilles 1
NOUN
Siegfried 1
NOUN
Nibelungenlied 1
NOUN
swiftest 1
ADJ
swift-footed 1
ADJ
whining 4
VERB
god-like 1
ADJ
duffers 1
NOUN
reactivated 1
VERB
Extruded 1
VERB
price-wise 1
ADV
kaleidoscope 1
NOUN
tycoon 1
NOUN
Anton 2
NOUN
Giuseppe 1
NOUN
Sammartini 1
NOUN
Comenico 1
NOUN
Dragonetti 1
NOUN
Janitsch 1
NOUN
instrumentalists 3
NOUN
Anabel 1
NOUN
Brieff 1
NOUN
flutist 1
NOUN
Josef 2
NOUN
oboist 1
NOUN
harpsichordist 1
NOUN
errand 4
NOUN
Howell 1
NOUN
Carol 1
NOUN
Lorlyn 1
NOUN
Zurcher 2
NOUN
arraigned 2
VERB
goddammit 1
PRT
courtliness 1
NOUN
take-up 3
ADJ
thermoplastic 1
NOUN
ruinous 1
ADJ
Matamoras 1
NOUN
Technical 5
ADJ
secondhand 1
NOUN
Fifty-three 1
NUM
co-signers 1
NOUN
smoothing 1
VERB
thankfulness 2
NOUN
moralistic 1
ADJ
drowning 4
VERB
sorrows 2
NOUN
untimely 1
ADJ
Toys 1
NOUN
676 1
NUM
parent's 1
NOUN
hardwoods 1
NOUN
birches 1
NOUN
whisky-on-t

ADJ
Naomi 2
NOUN
Boaz 2
NOUN
rhyming 1
VERB
sin-ned 1
VERB
Infrequently 1
ADV
whodunnit 1
NOUN
insinuation 2
NOUN
teakettle 1
NOUN
nostril 1
NOUN
Culbertson 1
NOUN
Steeves 1
NOUN
Piersee 1
NOUN
W.M. 1
NOUN
Sexton 1
NOUN
Heitschmidt 1
NOUN
big-business 1
NOUN
Babbitt 2
NOUN
bootleggers 2
NOUN
translator 1
NOUN
Choctaw 1
ADJ
Telefunken 1
NOUN
bargain-priced 1
ADJ
$2.98 1
NOUN
Dvorak 1
NOUN
Canteloube 1
NOUN
Copland 2
NOUN
Britten 2
NOUN
South-East 1
ADJ
possemen 1
NOUN
Day's 1
NOUN
Telling 1
VERB
Practice 1
NOUN
deadweight 1
NOUN
constraint 2
NOUN
constrictions 1
NOUN
thatched-roof 1
NOUN
fortunes 5
NOUN
Dragons 1
NOUN
sidelight 1
NOUN
deferments 1
NOUN
quipping 1
VERB
amulets 1
NOUN
ripening 2
VERB
bellicosity 1
NOUN
throes 1
NOUN
land-locked 1
ADJ
Reporters 2
NOUN
linoleum 1
NOUN
24% 2
NOUN
standardizing 1
VERB
build-up 4
NOUN
tattle-tale 1
NOUN
laminated 1
VERB
mustached 2
ADJ
rudely 2
ADV
life-contracts 1
NOUN
Hutton 2
NOUN
Gorshin 1
NOUN
Explosion 1
NOUN
Directionality 1
NOUN
Recogn

floppy 1
ADJ
shredded 1
VERB
joyfully 1
ADV
Mauve-colored 1
ADJ
subservient 1
ADJ
Reflecting 1
VERB
high-end 1
NOUN
$1.7 1
NOUN
Eighty-Four 2
NUM
Orwell 2
NOUN
dystopia 2
NOUN
Anniversary 3
NOUN
fourteen-team 1
ADJ
home-and-home 1
ADJ
grille-route 1
NOUN
walk-to 1
ADJ
lipstick 2
NOUN
Oneupmanship 1
NOUN
transcendant 1
ADJ
nightingale 2
NOUN
discontinuous 3
ADJ
sterilized 1
VERB
detachable 1
ADJ
Sparling 1
NOUN
DeGroot 1
NOUN
Ringel 1
NOUN
Locked 2
VERB
Bromley 1
NOUN
Fortescue 2
NOUN
townsmen 1
NOUN
$135 3
NOUN
Coltsman 1
NOUN
imperceptible 1
ADJ
amici 2
X
curiae 2
X
non-forthcoming 1
ADJ
bright-looking 1
ADJ
black-eyed 1
ADJ
Boxford 1
NOUN
indexing 1
NOUN
VERB
dissemination 1
NOUN
bibliographies 1
NOUN
Burlington's 1
NOUN
easements 2
NOUN
Balaguer's 1
NOUN
dumbbells 1
NOUN
543 1
NUM
horrifying 3
ADJ
thinning 1
NOUN
VERB
Newburger 1
NOUN
dogtrot 1
NOUN
pepping 1
VERB
dissenter 1
NOUN
decays 1
VERB
rotogravures 1
NOUN
Couple 1
NOUN
24-degrees 1
NOUN
synchrony 2
NOUN
EEG 2
NOUN
asynchron

ADJ
densities 2
NOUN
metamorphosis 1
NOUN
adipic 1
ADJ
contaminate 1
VERB
Britons 1
NOUN
chestnuts 1
NOUN
candid 2
ADJ
fatigued 1
VERB
Pittsboro 1
NOUN
Shucks 1
PRT
Cause 1
NOUN
pug-nosed 1
ADJ
Jean-Pierre 1
NOUN
Bogartian 1
ADJ
sadistic 1
ADJ
amoral 2
ADJ
Tasti-Freeze 1
NOUN
New-Waver 1
NOUN
Fredrico 1
NOUN
Rossilini's 1
NOUN
Sour 1
ADJ
Sponge 1
NOUN
Git 1
VERB
th' 1
DET
Cohn 3
NOUN
kinda 4
ADV
leased 1
VERB
decrement 1
NOUN
Whelan 1
NOUN
injected 1
VERB
endow 2
VERB
settles 2
VERB
launch-control 1
NOUN
sidearms 1
NOUN
manning 1
VERB
pistol-packing 1
ADJ
disunity 3
NOUN
Kalmuk 1
NOUN
juggling 1
VERB
Tsvetkov 1
NOUN
Platter 1
NOUN
betrothal 1
NOUN
Summer 2
NOUN
Shepherds 1
NOUN
Azerbaijan 1
NOUN
Yamata 1
NOUN
sushi 1
X
Nostalgic 1
ADJ
Erskine 1
NOUN
Caldwell 2
NOUN
Georgians 1
NOUN
four-lane 1
ADJ
Dow-Jones 1
NOUN
Honeysuckle 1
NOUN
Bricktop 1
NOUN
Djangology 2
NOUN
rankles 1
VERB
stubbornness 1
NOUN
steel-edged 1
ADJ
carved-out-of-solid 1
ADJ
pokerfaced 1
ADJ
pseudo-thinking 2
NOUN
ps

NOUN
Former 2
ADJ
Attlee 1
NOUN
unachieved 1
ADJ
leaflet 1
NOUN
cooped 2
VERB
crashes 1
NOUN
Trimmer 1
NOUN
paternally 1
ADV
Mementoes 1
NOUN
Earp 1
NOUN
Librarians 1
NOUN
bog 1
VERB
3-48 1
NUM
non-linear 1
ADJ
narcotic 1
ADJ
NOUN
rivalries 3
NOUN
diameters 2
NOUN
unheated 2
ADJ
Tallchief 1
NOUN
Bruhn 1
NOUN
Petipa-Minkus 1
NOUN
showpiece 1
NOUN
handsomely 1
ADV
exhaustive 2
ADJ
bull-like 1
ADJ
sniper 1
NOUN
psychopathic 2
ADJ
Europeanized 1
VERB
taxpaying 1
ADJ
$2.09 1
NOUN
second-half 2
NOUN
56-yard 1
ADJ
unsophisticated 1
ADJ
$45 3
NOUN
haltingly 2
ADV
erupts 1
VERB
middles 1
NOUN
laughingly 1
ADV
Preston 2
NOUN
Kepler 1
NOUN
Bryn 1
NOUN
Mawr 1
NOUN
Serving 1
VERB
Mmes 2
NOUN
Sweazey 1
NOUN
Begley 1
NOUN
high-resolution 1
NOUN
municipally 1
ADV
boatyards 1
NOUN
industriously 1
ADV
logarithms 1
NOUN
1665 1
NUM
trapezoid 1
NOUN
Icelandic 1
NOUN
belched 2
VERB
Daytime 1
NOUN
Skywave 1
NOUN
1938-1939 1
NUM
allocations 2
NOUN
drunker 1
ADJ
framer 1
NOUN
Relentlessly 1
ADV
inhaling 1
VERB

Throw 1
NOUN
VERB
rebellions 1
NOUN
Targo 3
NOUN
wing-shooting 1
NOUN
Accident 1
NOUN
plain-clothesmen 2
NOUN
Sprague 2
NOUN
pesticides 1
NOUN
printable 1
ADJ
rootless 1
ADJ
girders 1
NOUN
pillows 1
NOUN
chambre's 1
X
Delaney 2
NOUN
O'Neill 2
NOUN
centimeter 2
NOUN
1-degree 1
NOUN
Assassination 1
NOUN
Mahler 1
NOUN
polishes 1
NOUN
critter 1
NOUN
Gouldings 1
NOUN
subscribing 1
VERB
Nan 1
NOUN
money-minded 1
ADJ
Floradora 1
NOUN
hansom 1
NOUN
two-run 1
ADJ
Athletics 2
NOUN
Writer 1
NOUN
flute 1
NOUN
cavort 1
VERB
K.C. 1
NOUN
cloth-of-gold 1
NOUN
155-yarder 1
NOUN
fenugreek 1
NOUN
cardamom 1
NOUN
deficits 1
NOUN
Lieutenant-Governor 1
NOUN
aspen 1
NOUN
panoramas 1
NOUN
Po 1
NOUN
hand-screened 1
ADJ
bookcases 1
NOUN
overdue 1
ADJ
alimony 1
NOUN
Collector 1
NOUN
exemptions 1
NOUN
stiffens 1
VERB
neighbours 1
NOUN
honeymooning 1
VERB
what's-his-name 1
NOUN
lounging 3
VERB
unalloyed 1
ADJ
bliss 2
NOUN
yokels 1
NOUN
dweller 2
NOUN
Wingman 1
NOUN
tablet 2
NOUN
toto 1
NOUN
Times-Picayune 1
NOUN
P

Stephen's 1
NOUN
Coconut 1
NOUN
conflagration 1
NOUN
forty-niners 1
NOUN
Handlers' 1
NOUN
Ass'ns' 1
NOUN
founder-originator 1
NOUN
latches 1
VERB
Rakestraw 1
NOUN
inclement 1
ADJ
Spa 1
NOUN
grizzled 1
ADJ
diplomat's 1
NOUN
Patriot 2
NOUN
T'ai-Shan 1
NOUN
Shantung 1
NOUN
mid-Victorian 1
ADJ
Flats 1
NOUN
Hockaday 1
NOUN
left-of-center 1
ADJ
mother-naked 1
ADJ
climaxed 1
VERB
unco-operative 1
ADJ
boyars 1
NOUN
irrigate 1
VERB
Baptiste 1
NOUN
Reinhardt 1
NOUN
Collins' 1
NOUN
Goodbody 2
NOUN
$15.5 1
NOUN
Consultation 1
NOUN
emasculated 1
VERB
nebula 1
NOUN
Divide 2
VERB
Longstreet 1
NOUN
subtler 1
ADJ
motion-picture 1
NOUN
ballyhooey 1
NOUN
presto 1
PRT
Valois 1
NOUN
Sicilian 3
ADJ
executor 1
NOUN
decedent 1
NOUN
leftfield 1
NOUN
Paschal 2
NOUN
Shartzer's 1
NOUN
combat-tested 1
ADJ
monsoon-shrouded 1
ADJ
road-shy 1
ADJ
guerrilla-th'-wisp 1
ADJ
savored 2
VERB
gangster 1
NOUN
Erwin 1
NOUN
Fife 1
NOUN
Marylanders 2
NOUN
enriching 1
VERB
gagged 1
VERB
pastor's 2
NOUN
caricature 1
VERB
NOUN
flow

Omega 1
NOUN
Sigma 1
NOUN
Grahamstown 1
NOUN
cost-billing 1
NOUN
24-sheet 1
ADJ
turnery 1
NOUN
hodgepodge 1
NOUN
cobwebs 1
NOUN
Taruffi 1
NOUN
two-lane 1
ADJ
165 2
NUM
algebraic 1
ADJ
riddles 2
NOUN
koan 1
X
Elks 1
NOUN
142 1
NUM
511 1
NUM
11-month-old 1
ADJ
10.6 1
NUM
Middle-Eastern 1
ADJ
tannin 1
NOUN
implausibly 1
ADV
hilltops 1
NOUN
caper 1
NOUN
maliciously 1
ADV
Lorena 1
NOUN
Gallon 1
NOUN
Gallon-Loren 1
NOUN
distresses 1
NOUN
shuns 1
VERB
unsuspecting 1
ADJ
needle-sharp 1
ADJ
fledglings 1
NOUN
astute 1
ADJ
authorizes 1
VERB
Hazards 1
NOUN
actuarially 1
ADV
mayhem 1
NOUN
Kromy 1
NOUN
sequel 1
NOUN
alias 1
NOUN
astronaut 1
NOUN
intrepid 1
ADJ
Varner 1
NOUN
Motion 1
NOUN
five-and-a-half 1
NUM
1907 1
NUM
1,107 1
NUM
conserving 1
VERB
Egalitarianism 1
NOUN
Popularism 1
NOUN
guests' 1
NOUN
dissenters 1
NOUN
fungus 1
NOUN
growths 1
NOUN
Derails 1
NOUN
efficacious 2
ADJ
Fought 1
VERB
Emcee 1
NOUN
Nixon's 1
NOUN
Anselmo 1
NOUN
Colzani 1
NOUN
hecatomb 1
NOUN
100-million-lb. 1
ADJ
shippers 

MacArthur 1
NOUN
32,589 1
NUM
pin-curl 1
NOUN
self-locking 1
ADJ
leathery 1
ADJ
topnotch 1
NOUN
cashed 1
VERB
convict's 1
NOUN
shrinks 1
VERB
Kelseyville 1
NOUN
Staunton 1
NOUN
$58,918 1
NOUN
$66,000 1
NOUN
$7,082 1
NOUN
judicious 1
ADJ
Registry 2
NOUN
impurity-doped 1
ADJ
Jenni 1
NOUN
DePaul 1
NOUN
Segovia's 1
NOUN
underwriting 1
VERB
swath 1
NOUN
asteroid 1
NOUN
remnants 1
NOUN
wounding 1
VERB
disbelieving 1
VERB
Burgundies 1
NOUN
Dissect 1
VERB
alternation 2
NOUN
Dronk's 1
NOUN
Baum 2
NOUN
untracked 1
ADJ
lords 1
NOUN
importunities 1
NOUN
55,987 1
NUM
Looked 1
VERB
Tractarians 1
NOUN
800,000 1
NUM
revels 1
NOUN
swanky 1
ADJ
writer-turned-painter 1
NOUN
Calm 1
ADJ
Henrik 1
NOUN
Kauffmann 1
NOUN
chlorothiazide 1
NOUN
edema 1
NOUN
endeavored 1
VERB
trade-preparatory 1
ADJ
Governors 2
NOUN
fallacy 1
NOUN
sponging 1
NOUN
corduroys 1
NOUN
Morphophonemic 1
ADJ
lowlands 1
NOUN
314 1
NUM
buckboard 1
NOUN
catapulting 1
VERB
recklessly 1
ADV
extremis 1
X
enforcers 1
NOUN
puberty 1
NOUN
0.25 1


NOUN
necklace 1
NOUN
6-foot-10 1
ADJ
brainwashing 1
NOUN
taut-nerved 1
ADJ
Constable's 1
NOUN
uninvited 1
ADJ
bonnet 1
NOUN
kerchief 1
NOUN
doc 1
NOUN
tip-toe 1
NOUN
Naturam 1
X
Pati 1
X
Senium 1
X
Idea 1
X
Platonica 1
X
sparkled 1
VERB
Quotations 1
NOUN
free-world 1
NOUN
marginally 1
ADV
epigrammatic 1
ADJ
utopianism 1
NOUN
protectively 1
ADV
elaboration 1
NOUN
creativity-oriented 1
ADJ
aspirant 2
NOUN
resigns 1
VERB
Westwood 1
NOUN
medicinal 1
ADJ
alkaloids 1
NOUN
Aroused 1
VERB
1846 1
NUM
Goodwin 1
NOUN
jeweler 1
NOUN
Equinox 1
NOUN
intraepithelial 1
ADJ
situ 1
X
ebb 1
VERB
offensively 1
ADV
Prompted 1
VERB
tenspot 1
NOUN
motioned 1
VERB
vacationland 1
NOUN
writing-like 1
ADJ
hasher 1
NOUN
hand-written 1
ADJ
Cate's 1
NOUN
squadron 1
NOUN
irrespective 1
ADJ
ivy 1
NOUN
green-scaled 1
ADJ
basileis 1
X
adulterated 1
VERB
peach 2
NOUN
plum 1
NOUN
life-size 1
NOUN
draper 1
NOUN
contiguous 1
ADJ
sambur 1
NOUN
elk 1
NOUN
astuteness 1
NOUN
nobles 1
NOUN
62-63 1
NUM
Redundant 1
ADJ
Archbishop

In [11]:
# Create a lookup table mfc_table where mfc_table[word] contains the tag label most frequently assigned to that word
from collections import namedtuple

FakeState = namedtuple("FakeState", "name")

class MFCTagger:
    # NOTE: You should not need to modify this class or any of its methods
    missing = FakeState(name="<MISSING>")
    
    def __init__(self, table):
        self.table = defaultdict(lambda: MFCTagger.missing)
        self.table.update({word: FakeState(name=tag) for word, tag in table.items()})
        
    def viterbi(self, seq):
        """This method simplifies predictions by matching the Pomegranate viterbi() interface"""
        return 0., list(enumerate(["<start>"] + [self.table[w] for w in seq] + ["<end>"]))


# TODO: calculate the frequency of each tag being assigned to each word (hint: similar, but not
# the same as the emission probabilities) and use it to fill the mfc_table

word_counts = pair_counts(word_seq,tag_seq)

mfc_table = {}
for key,value in word_counts.items():
    max_item = max(value.values())
    for pos,count in value.items():
        if count == max_item:
            mfc_table[key] = pos


# DO NOT MODIFY BELOW THIS LINE
mfc_model = MFCTagger(mfc_table) # Create a Most Frequent Class tagger instance

assert len(mfc_table) == len(data.training_set.vocab), ""
assert all(k in data.training_set.vocab for k in mfc_table.keys()), ""
assert sum(int(k not in mfc_table) for k in data.testing_set.vocab) == 5521, ""
HTML('<div class="alert alert-block alert-success">Your MFC tagger has all the correct words!</div>')

In [12]:
word_counts

defaultdict(<function __main__.pair_counts.<locals>.<lambda>()>,
            {'Whenever': defaultdict(int, {'ADV': 12}),
             'artists': defaultdict(int, {'NOUN': 34}),
             ',': defaultdict(int, {'.': 46499, 'X': 1}),
             'indeed': defaultdict(int, {'ADV': 92}),
             'turned': defaultdict(int, {'VERB': 264}),
             'to': defaultdict(int,
                         {'ADP': 8809,
                          'PRT': 11784,
                          'ADV': 1,
                          'NOUN': 1,
                          'X': 2}),
             'actual': defaultdict(int, {'ADJ': 77}),
             'representations': defaultdict(int, {'NOUN': 7}),
             'or': defaultdict(int, {'CONJ': 3218, 'X': 1}),
             'molded': defaultdict(int, {'VERB': 12}),
             'three-dimensional': defaultdict(int, {'ADJ': 8}),
             'figures': defaultdict(int, {'NOUN': 69, 'VERB': 2}),
             'which': defaultdict(int, {'DET': 2844, 'X': 1}),
    

### Making Predictions with a Model
The helper functions provided below interface with Pomegranate network models & the mocked MFCTagger to take advantage of the [missing value](http://pomegranate.readthedocs.io/en/latest/nan.html) functionality in Pomegranate through a simple sequence decoding function. Run these functions, then run the next cell to see some of the predictions made by the MFC tagger.

In [13]:
def replace_unknown(sequence):
    """Return a copy of the input sequence where each unknown word is replaced
    by the literal string value 'nan'. Pomegranate will ignore these values
    during computation.
    """
    return [w if w in data.training_set.vocab else 'nan' for w in sequence]

def simplify_decoding(X, model):
    """X should be a 1-D sequence of observations for the model to predict"""
    _, state_path = model.viterbi(replace_unknown(X))
    return [state[1].name for state in state_path[1:-1]]  # do not show the start/end state predictions

### Example Decoding Sequences with MFC Tagger

In [14]:
for key in data.testing_set.keys[:3]:
    print("Sentence Key: {}\n".format(key))
    print("Predicted labels:\n-----------------")
    print(simplify_decoding(data.sentences[key].words, mfc_model))
    print()
    print("Actual labels:\n--------------")
    print(data.sentences[key].tags)
    print("\n")

Sentence Key: b100-28144

Predicted labels:
-----------------
['CONJ', 'NOUN', 'NUM', '.', 'NOUN', 'NUM', '.', 'NOUN', 'NUM', '.', 'CONJ', 'NOUN', 'NUM', '.', '.', 'NOUN', '.', '.']

Actual labels:
--------------
('CONJ', 'NOUN', 'NUM', '.', 'NOUN', 'NUM', '.', 'NOUN', 'NUM', '.', 'CONJ', 'NOUN', 'NUM', '.', '.', 'NOUN', '.', '.')


Sentence Key: b100-23146

Predicted labels:
-----------------
['PRON', 'VERB', 'DET', 'NOUN', 'ADP', 'ADJ', 'ADJ', 'NOUN', 'VERB', 'VERB', '.', 'ADP', 'VERB', 'DET', 'NOUN', 'ADP', 'NOUN', 'ADP', 'DET', 'NOUN', '.']

Actual labels:
--------------
('PRON', 'VERB', 'DET', 'NOUN', 'ADP', 'ADJ', 'ADJ', 'NOUN', 'VERB', 'VERB', '.', 'ADP', 'VERB', 'DET', 'NOUN', 'ADP', 'NOUN', 'ADP', 'DET', 'NOUN', '.')


Sentence Key: b100-35462

Predicted labels:
-----------------
['DET', 'ADJ', 'NOUN', 'VERB', 'VERB', 'VERB', 'ADP', 'DET', 'ADJ', 'ADJ', 'NOUN', 'ADP', 'DET', 'ADJ', 'NOUN', '.', 'ADP', 'ADJ', 'NOUN', '.', 'CONJ', 'ADP', 'DET', '<MISSING>', 'ADP', 'ADJ', 'ADJ', 

In [15]:
data.testing_set.keys[:4]

['b100-28144', 'b100-23146', 'b100-35462', 'b100-37008']

### Evaluating Model Accuracy

The function below will evaluate the accuracy of the MFC tagger on the collection of all sentences from a text corpus. 

In [16]:
def accuracy(X, Y, model):
    """Calculate the prediction accuracy by using the model to decode each sequence
    in the input X and comparing the prediction with the true labels in Y.
    
    The X should be an array whose first dimension is the number of sentences to test,
    and each element of the array should be an iterable of the words in the sequence.
    The arrays X and Y should have the exact same shape.
    
    X = [("See", "Spot", "run"), ("Run", "Spot", "run", "fast"), ...]
    Y = [(), (), ...]
    """
    correct = total_predictions = 0
    for observations, actual_tags in zip(X, Y):
        
        # The model.viterbi call in simplify_decoding will return None if the HMM
        # raises an error (for example, if a test sentence contains a word that
        # is out of vocabulary for the training set). Any exception counts the
        # full sentence as an error (which makes this a conservative estimate).
        try:
            most_likely_tags = simplify_decoding(observations, model)
            correct += sum(p == t for p, t in zip(most_likely_tags, actual_tags))
        except:
            pass
        total_predictions += len(observations)
    return correct / total_predictions

#### Evaluate the accuracy of the MFC tagger
Run the next cell to evaluate the accuracy of the tagger on the training and test corpus.

In [17]:
mfc_training_acc = accuracy(data.training_set.X, data.training_set.Y, mfc_model)
print("training accuracy mfc_model: {:.2f}%".format(100 * mfc_training_acc))

mfc_testing_acc = accuracy(data.testing_set.X, data.testing_set.Y, mfc_model)
print("testing accuracy mfc_model: {:.2f}%".format(100 * mfc_testing_acc))

assert mfc_training_acc >= 0.955, "Uh oh. Your MFC accuracy on the training set doesn't look right."
assert mfc_testing_acc >= 0.925, "Uh oh. Your MFC accuracy on the testing set doesn't look right."
HTML('<div class="alert alert-block alert-success">Your MFC tagger accuracy looks correct!</div>')

training accuracy mfc_model: 95.72%
testing accuracy mfc_model: 93.01%


## Step 3: Build an HMM tagger
---
The HMM tagger has one hidden state for each possible tag, and parameterized by two distributions: the emission probabilties giving the conditional probability of observing a given **word** from each hidden state, and the transition probabilities giving the conditional probability of moving between **tags** during the sequence.

We will also estimate the starting probability distribution (the probability of each **tag** being the first tag in a sequence), and the terminal probability distribution (the probability of each **tag** being the last tag in a sequence).

The maximum likelihood estimate of these distributions can be calculated from the frequency counts as described in the following sections where you'll implement functions to count the frequencies, and finally build the model. The HMM model will make predictions according to the formula:

$$t_i^n = \underset{t_i^n}{\mathrm{argmax}} \prod_{i=1}^n P(w_i|t_i) P(t_i|t_{i-1})$$

Refer to Speech & Language Processing [Chapter 10](https://web.stanford.edu/~jurafsky/slp3/10.pdf) for more information.

### IMPLEMENTATION: Unigram Counts

Complete the function below to estimate the co-occurrence frequency of each symbol over all of the input sequences. The unigram probabilities in our HMM model are estimated from the formula below, where N is the total number of samples in the input. (You only need to compute the counts for now.)

$$P(tag_1) = \frac{C(tag_1)}{N}$$

In [18]:
Counter(tag_seq)

Counter({'ADV': 44877,
         'NOUN': 220632,
         '.': 117757,
         'VERB': 146161,
         'ADP': 115808,
         'ADJ': 66754,
         'CONJ': 30537,
         'DET': 109671,
         'PRT': 23906,
         'NUM': 11878,
         'PRON': 39383,
         'X': 1094})

In [19]:
def unigram_counts(sequences):
    """Return a dictionary keyed to each unique value in the input sequence list that
    counts the number of occurrences of the value in the sequences list. The sequences
    collection should be a 2-dimensional array.
    
    For example, if the tag NOUN appears 275558 times over all the input sequences,
    then you should return a dictionary such that your_unigram_counts[NOUN] == 275558.
    """
    # TODO: Finish this function!
    return Counter(sequences)

# TODO: call unigram_counts with a list of tag sequences from the training set
tags = [tag for i, (word, tag) in enumerate(data.training_set.stream())]
tag_unigrams = unigram_counts(tags)

assert set(tag_unigrams.keys()) == data.training_set.tagset, \
       "Uh oh. It looks like your tag counts doesn't include all the tags!"
assert min(tag_unigrams, key=tag_unigrams.get) == 'X', \
       "Hmmm...'X' is expected to be the least common class"
assert max(tag_unigrams, key=tag_unigrams.get) == 'NOUN', \
       "Hmmm...'NOUN' is expected to be the most common class"
HTML('<div class="alert alert-block alert-success">Your tag unigrams look good!</div>')

### IMPLEMENTATION: Bigram Counts

Complete the function below to estimate the co-occurrence frequency of each pair of symbols in each of the input sequences. These counts are used in the HMM model to estimate the bigram probability of two tags from the frequency counts according to the formula: $$P(tag_2|tag_1) = \frac{C(tag_2|tag_1)}{C(tag_2)}$$


In [20]:
data.training_set.Y[:3]
flat_list=[]
for tags in data.training_set.Y:
    ngrams = zip_longest(*[tags[i:] for i in range(2)])
    flat_list.append(ngrams)
bigram = []
for obj in flat_list:
    for tup in [*obj]:
        bigram.append(tup)
        
Counter(bigram)       

Counter({('ADV', 'NOUN'): 1478,
         ('NOUN', '.'): 62639,
         ('.', 'ADV'): 5124,
         ('ADV', '.'): 7577,
         ('.', 'VERB'): 9041,
         ('VERB', 'ADP'): 24927,
         ('ADP', 'ADJ'): 9533,
         ('ADJ', 'NOUN'): 43664,
         ('NOUN', 'CONJ'): 13185,
         ('CONJ', 'VERB'): 6012,
         ('VERB', 'ADJ'): 8423,
         ('.', 'DET'): 8008,
         ('DET', 'VERB'): 7062,
         ('ADJ', 'PRT'): 1301,
         ('PRT', 'ADP'): 2189,
         ('ADP', 'NUM'): 3467,
         ('NUM', 'NOUN'): 4524,
         ('.', 'PRON'): 5448,
         ('PRON', 'VERB'): 27860,
         ('VERB', 'PRT'): 9556,
         ('PRT', 'VERB'): 14886,
         ('VERB', 'NOUN'): 14230,
         ('NOUN', 'NUM'): 1783,
         ('NUM', '.'): 3210,
         ('.', 'NUM'): 1412,
         ('.', '.'): 12588,
         ('.', None): 44936,
         ('ADP', 'ADV'): 1805,
         ('ADV', 'NUM'): 597,
         ('DET', 'NOUN'): 68785,
         ('CONJ', 'DET'): 4636,
         ('NOUN', 'VERB'): 3497

In [21]:
tags = [tag for i, (word, tag) in enumerate(data.training_set.stream())]
bigrams = [(tags[i], tags[i+1]) for i in range(len(tags) - 1)]
#Counter(bigrams)

In [22]:
flat_list1=[]
for word,tag in data.training_set.stream():
    ngrams = zip([tag[i:] for i in range(2)])
    flat_list1.append(ngrams)
flat_list1
big_list = [tup for obj in flat_list1 for tup in [*obj]]
#big_list

In [23]:
def bigram_counts(sequences):
    """Return a dictionary keyed to each unique PAIR of values in the input sequences
    list that counts the number of occurrences of pair in the sequences list. The input
    should be a 2-dimensional array.
    
    For example, if the pair of tags (NOUN, VERB) appear 61582 times, then you should
    return a dictionary such that your_bigram_counts[(NOUN, VERB)] == 61582
    """

    # TODO: Finish this function!
    bigrams = [(sequences[i], sequences[i+1]) for i in range(len(sequences) - 1)]
    return Counter(bigrams)

# TODO: call bigram_counts with a list of tag sequences from the training set
tags = [tag for i, (word, tag) in enumerate(data.training_set.stream())]
bigrams = [(tags[i], tags[i+1]) for i in range(len(tags) - 1)]

    
tag_bigrams = bigram_counts(tags)
assert len(tags) == len(tag_seq)

assert len(tag_bigrams) == 144, \
       "Uh oh. There should be 144 pairs of bigrams (12 tags x 12 tags)"
assert min(tag_bigrams, key=tag_bigrams.get) in [('X', 'NUM'), ('PRON', 'X')], \
       "Hmmm...The least common bigram should be one of ('X', 'NUM') or ('PRON', 'X')."
assert max(tag_bigrams, key=tag_bigrams.get) in [('DET', 'NOUN')], \
       "Hmmm...('DET', 'NOUN') is expected to be the most common bigram."
HTML('<div class="alert alert-block alert-success">Your tag bigrams look good!</div>')

### IMPLEMENTATION: Sequence Starting Counts
Complete the code below to estimate the bigram probabilities of a sequence starting with each tag.

In [24]:
data.training_set.Y[:3]

(('ADV',
  'NOUN',
  '.',
  'ADV',
  '.',
  'VERB',
  'ADP',
  'ADJ',
  'NOUN',
  'CONJ',
  'VERB',
  'ADJ',
  'NOUN',
  '.',
  'DET',
  'VERB',
  'ADJ',
  'PRT',
  'ADP',
  'NUM',
  'NOUN',
  '.',
  'PRON',
  'VERB',
  'PRT',
  'VERB',
  'NOUN',
  '.',
  'VERB',
  'NOUN',
  'NUM',
  '.',
  'NUM',
  '.',
  '.',
  '.'),
 ('ADP',
  'ADV',
  'NUM',
  'NOUN',
  '.',
  'DET',
  'NOUN',
  'CONJ',
  'DET',
  'NOUN',
  'VERB',
  'ADP',
  'NOUN',
  'VERB',
  'ADP',
  'DET',
  'NOUN',
  'ADP',
  'NOUN',
  'CONJ',
  'NOUN',
  '.',
  'NOUN',
  '.',
  'NOUN',
  '.',
  'NOUN',
  '.',
  'VERB',
  '.',
  'VERB',
  'ADP',
  'NOUN',
  '.',
  'VERB',
  '.',
  'VERB',
  '.',
  'VERB',
  '.',
  'VERB',
  '.'),
 ('ADJ', 'NOUN'))

In [25]:
sequence = [seq[0] for seq in data.training_set.Y[:3]]
Counter(sequence)

Counter({'ADV': 1, 'ADP': 1, 'ADJ': 1})

In [26]:
def starting_counts(sequences):
    """Return a dictionary keyed to each unique value in the input sequences list
    that counts the number of occurrences where that value is at the beginning of
    a sequence.
    
    For example, if 8093 sequences start with NOUN, then you should return a
    dictionary such that your_starting_counts[NOUN] == 8093
    """
    # TODO: Finish this function!
    sequence = [seq[0] for seq in sequences]
    return Counter(sequence)

# TODO: Calculate the count of each tag starting a sequence
sequence = [seq[0] for seq in data.training_set.Y]
tag_starts = starting_counts(data.training_set.Y)

assert len(tag_starts) == 12, "Uh oh. There should be 12 tags in your dictionary."
assert min(tag_starts, key=tag_starts.get) == 'X', "Hmmm...'X' is expected to be the least common starting bigram."
assert max(tag_starts, key=tag_starts.get) == 'DET', "Hmmm...'DET' is expected to be the most common starting bigram."
HTML('<div class="alert alert-block alert-success">Your starting tag counts look good!</div>')

### IMPLEMENTATION: Sequence Ending Counts
Complete the function below to estimate the bigram probabilities of a sequence ending with each tag.

In [27]:
sequence_end = [seq[-1] for seq in data.training_set.Y[:3]]
Counter(sequence_end)

Counter({'.': 2, 'NOUN': 1})

In [28]:
def ending_counts(sequences):
    """Return a dictionary keyed to each unique value in the input sequences list
    that counts the number of occurrences where that value is at the end of
    a sequence.
    
    For example, if 18 sequences end with DET, then you should return a
    dictionary such that your_starting_counts[DET] == 18
    """
    # TODO: Finish this function!
    return Counter(sequences)

# TODO: Calculate the count of each tag ending a sequence
sequence_end = [seq[-1] for seq in data.training_set.Y]
tag_ends = ending_counts(sequence_end)

assert len(tag_ends) == 12, "Uh oh. There should be 12 tags in your dictionary."
assert min(tag_ends, key=tag_ends.get) in ['X', 'CONJ'], "Hmmm...'X' or 'CONJ' should be the least common ending bigram."
assert max(tag_ends, key=tag_ends.get) == '.', "Hmmm...'.' is expected to be the most common ending bigram."
HTML('<div class="alert alert-block alert-success">Your ending tag counts look good!</div>')

### IMPLEMENTATION: Basic HMM Tagger
Use the tag unigrams and bigrams calculated above to construct a hidden Markov tagger.

- Add one state per tag
    - The emission distribution at each state should be estimated with the formula: $P(w|t) = \frac{C(t, w)}{C(t)}$
- Add an edge from the starting state `basic_model.start` to each tag
    - The transition probability should be estimated with the formula: $P(t|start) = \frac{C(start, t)}{C(start)}$
- Add an edge from each tag to the end state `basic_model.end`
    - The transition probability should be estimated with the formula: $P(end|t) = \frac{C(t, end)}{C(t)}$
- Add an edge between _every_ pair of tags
    - The transition probability should be estimated with the formula: $P(t_2|t_1) = \frac{C(t_1, t_2)}{C(t_1)}$

In [29]:

#tag_unigrams
#tag_bigrams
#tag_starts
#tag_ends

#tag_state_list = []
#for tag,word_count in emission_counts.items():
 #   total = float(sum(word_count.values()))
 #   distribution = {word: count/total for word, count in word_count.items()}
  #  tag_emissions = DiscreteDistribution(distribution)
  #  tag_state = State(tag_emissions, name=tag)
  #  basic_model.add_states(tag_state)
  #  tag_state_list.append(tag_state)
    
    


In [30]:
#for tag,word_dict in emission_counts.items():
#    distribution = {word: count for word, count in word_dict.items()}
    #print(distribution)

In [31]:
#start_prob = {}
#for tag,count in tag_starts.items():
#    start_prob[tag] = count/tag_unigrams[tag]
#start_prob
    

In [32]:
basic_model = HiddenMarkovModel(name="base-hmm-tagger")

# TODO: create states with emission probability distributions P(word | tag) and add to the model
# (Hint: you may need to loop & create/add new states)
tag_state_list = []
for tag,word_count in emission_counts.items():
    total = float(sum(word_count.values()))
    distribution = {word: count/total for word, count in word_count.items()}
    tag_emissions = DiscreteDistribution(distribution)
    tag_state = State(tag_emissions, name=tag)
    basic_model.add_states(tag_state)
    tag_state_list.append(tag_state)
    
#Adding start and end tags
#first we calculate the various probabilities for tags to be start tags then we add a custom start tag to create a markov sequence
start_prob = {}
for tag,count in tag_starts.items():
    start_prob[tag] = count/tag_unigrams[tag]

for tag_state in tag_state_list:
    basic_model.add_transition(basic_model.start, tag_state, start_prob[tag_state.name])

end_prob = {}

for tag,count in tag_ends.items():
    end_prob[tag] = count/tag_unigrams[tag]

for tag_state in tag_state_list:
    basic_model.add_transition(tag_state, basic_model.end, end_prob[tag_state.name])
    
# TODO: add edges between states for the observed transition frequencies P(tag_i | tag_i-1)

# (Hint: you may need to loop & add transitions
transition_prob = {}
for tag1 in data.training_set.tagset:
    for tag2 in data.training_set.tagset:
        transition_prob[(tag1,tag2)] = tag_bigrams[(tag1, tag2)]/tag_unigrams[tag1]
        
for tag1_state in tag_state_list:
    for tag2_state in tag_state_list:
        basic_model.add_transition(tag1_state, tag2_state, transition_prob[(tag1_state.name, tag2_state.name)])



# NOTE: YOU SHOULD NOT NEED TO MODIFY ANYTHING BELOW THIS LINE
# finalize the model
basic_model.bake()

assert all(tag in set(s.name for s in basic_model.states) for tag in data.training_set.tagset), \
       "Every state in your network should use the name of the associated tag, which must be one of the training set tags."
assert basic_model.edge_count() == 168, \
       ("Your network should have an edge from the start node to each state, one edge between every " +
        "pair of tags (states), and an edge from each state to the end node.")
HTML('<div class="alert alert-block alert-success">Your HMM network topology looks good!</div>')

In [33]:
hmm_training_acc = accuracy(data.training_set.X, data.training_set.Y, basic_model)
print("training accuracy basic hmm model: {:.2f}%".format(100 * hmm_training_acc))

hmm_testing_acc = accuracy(data.testing_set.X, data.testing_set.Y, basic_model)
print("testing accuracy basic hmm model: {:.2f}%".format(100 * hmm_testing_acc))

assert hmm_training_acc > 0.97, "Uh oh. Your HMM accuracy on the training set doesn't look right."
assert hmm_testing_acc > 0.955, "Uh oh. Your HMM accuracy on the testing set doesn't look right."
HTML('<div class="alert alert-block alert-success">Your HMM tagger accuracy looks correct! Congratulations, you\'ve finished the project.</div>')

training accuracy basic hmm model: 97.48%
testing accuracy basic hmm model: 95.86%


### Example Decoding Sequences with the HMM Tagger

In [34]:
for key in data.testing_set.keys[:3]:
    print("Sentence Key: {}\n".format(key))
    print("Predicted labels:\n-----------------")
    print(simplify_decoding(data.sentences[key].words, basic_model))
    print()
    print("Actual labels:\n--------------")
    print(data.sentences[key].tags)
    print("\n")

Sentence Key: b100-28144

Predicted labels:
-----------------
['CONJ', 'NOUN', 'NUM', '.', 'NOUN', 'NUM', '.', 'NOUN', 'NUM', '.', 'CONJ', 'NOUN', 'NUM', '.', '.', 'NOUN', '.', '.']

Actual labels:
--------------
('CONJ', 'NOUN', 'NUM', '.', 'NOUN', 'NUM', '.', 'NOUN', 'NUM', '.', 'CONJ', 'NOUN', 'NUM', '.', '.', 'NOUN', '.', '.')


Sentence Key: b100-23146

Predicted labels:
-----------------
['PRON', 'VERB', 'DET', 'NOUN', 'ADP', 'ADJ', 'ADJ', 'NOUN', 'VERB', 'VERB', '.', 'ADP', 'VERB', 'DET', 'NOUN', 'ADP', 'NOUN', 'ADP', 'DET', 'NOUN', '.']

Actual labels:
--------------
('PRON', 'VERB', 'DET', 'NOUN', 'ADP', 'ADJ', 'ADJ', 'NOUN', 'VERB', 'VERB', '.', 'ADP', 'VERB', 'DET', 'NOUN', 'ADP', 'NOUN', 'ADP', 'DET', 'NOUN', '.')


Sentence Key: b100-35462

Predicted labels:
-----------------
['DET', 'ADJ', 'NOUN', 'VERB', 'VERB', 'VERB', 'ADP', 'DET', 'ADJ', 'ADJ', 'NOUN', 'ADP', 'DET', 'ADJ', 'NOUN', '.', 'ADP', 'ADJ', 'NOUN', '.', 'CONJ', 'ADP', 'DET', 'NOUN', 'ADP', 'ADJ', 'ADJ', '.', 


## Finishing the project
---

<div class="alert alert-block alert-info">
**Note:** **SAVE YOUR NOTEBOOK**, then run the next cell to generate an HTML copy. You will zip & submit both this file and the HTML copy for review.
</div>

In [35]:
!!jupyter nbconvert *.ipynb

['[NbConvertApp] Converting notebook HMM Tagger.ipynb to html',
 '[NbConvertApp] Writing 1290189 bytes to HMM Tagger.html',
 '[NbConvertApp] Converting notebook HMM Tagger-zh.ipynb to html',
 '[NbConvertApp] Writing 355982 bytes to HMM Tagger-zh.html',
 '[NbConvertApp] Converting notebook HMM warmup (optional).ipynb to html',
 '[NbConvertApp] Writing 334660 bytes to HMM warmup (optional).html',
 '[NbConvertApp] Converting notebook HMM warmup (optional)-zh.ipynb to html',
 '[NbConvertApp] Writing 321728 bytes to HMM warmup (optional)-zh.html']

## Step 4: [Optional] Improving model performance
---
There are additional enhancements that can be incorporated into your tagger that improve performance on larger tagsets where the data sparsity problem is more significant. The data sparsity problem arises because the same amount of data split over more tags means there will be fewer samples in each tag, and there will be more missing data  tags that have zero occurrences in the data. The techniques in this section are optional.

- [Laplace Smoothing](https://en.wikipedia.org/wiki/Additive_smoothing) (pseudocounts)
    Laplace smoothing is a technique where you add a small, non-zero value to all observed counts to offset for unobserved values.

- Backoff Smoothing
    Another smoothing technique is to interpolate between n-grams for missing data. This method is more effective than Laplace smoothing at combatting the data sparsity problem. Refer to chapters 4, 9, and 10 of the [Speech & Language Processing](https://web.stanford.edu/~jurafsky/slp3/) book for more information.

- Extending to Trigrams
    HMM taggers have achieved better than 96% accuracy on this dataset with the full Penn treebank tagset using an architecture described in [this](http://www.coli.uni-saarland.de/~thorsten/publications/Brants-ANLP00.pdf) paper. Altering your HMM to achieve the same performance would require implementing deleted interpolation (described in the paper), incorporating trigram probabilities in your frequency tables, and re-implementing the Viterbi algorithm to consider three consecutive states instead of two.

### Obtain the Brown Corpus with a Larger Tagset
Run the code below to download a copy of the brown corpus with the full NLTK tagset. You will need to research the available tagset information in the NLTK docs and determine the best way to extract the subset of NLTK tags you want to explore. If you write the following the format specified in Step 1, then you can reload the data using all of the code above for comparison.

Refer to [Chapter 5](http://www.nltk.org/book/ch05.html) of the NLTK book for more information on the available tagsets.

In [36]:
import nltk
from nltk import pos_tag, word_tokenize
from nltk.corpus import brown

nltk.download('brown')
training_corpus = nltk.corpus.brown
training_corpus.tagged_sents()[0]

[nltk_data] Downloading package brown to /root/nltk_data...
[nltk_data]   Unzipping corpora/brown.zip.


[('The', 'AT'),
 ('Fulton', 'NP-TL'),
 ('County', 'NN-TL'),
 ('Grand', 'JJ-TL'),
 ('Jury', 'NN-TL'),
 ('said', 'VBD'),
 ('Friday', 'NR'),
 ('an', 'AT'),
 ('investigation', 'NN'),
 ('of', 'IN'),
 ("Atlanta's", 'NP$'),
 ('recent', 'JJ'),
 ('primary', 'NN'),
 ('election', 'NN'),
 ('produced', 'VBD'),
 ('``', '``'),
 ('no', 'AT'),
 ('evidence', 'NN'),
 ("''", "''"),
 ('that', 'CS'),
 ('any', 'DTI'),
 ('irregularities', 'NNS'),
 ('took', 'VBD'),
 ('place', 'NN'),
 ('.', '.')]