# Skip-gram Word2Vec

In this notebook, I'll lead you through using PyTorch to implement the [Word2Vec algorithm](https://en.wikipedia.org/wiki/Word2vec) using the skip-gram architecture. By implementing this, you'll learn about embedding words for use in natural language processing. This will come in handy when dealing with things like machine translation.

## Readings

Here are the resources I used to build this notebook. I suggest reading these either beforehand or while you're working on this material.

* A really good [conceptual overview](http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/) of Word2Vec from Chris McCormick 
* [First Word2Vec paper](https://arxiv.org/pdf/1301.3781.pdf) from Mikolov et al.
* [Neural Information Processing Systems paper](http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf) with improvements for Word2Vec, also from Mikolov et al.

---
## Word embeddings

When you're dealing with words in text, you end up with tens of thousands of word classes to analyze, one for each word in the vocabulary. One-hot encoding these words is massively inefficient because all but one of the values in a one-hot vector are zero. As a result, almost all of the multiplications between a one-hot input vector and the weights of the first hidden layer are multiplications by zero that contribute nothing to the hidden outputs.

<img src='assets/one_hot_encoding.png' width=50%>

To solve this problem and greatly increase the efficiency of our networks, we use what are called **embeddings**. An embedding is just a fully connected layer like the ones you've seen before. We call this layer the embedding layer and its weights the embedding weights. We skip the multiplication into the embedding layer by instead directly grabbing the hidden layer values from the weight matrix. We can do this because multiplying a one-hot encoded vector with a matrix returns the row of the matrix corresponding to the index of the "on" input unit.

<img src='assets/lookup_matrix.png' width=50%>

Instead of doing the matrix multiplication, we use the weight matrix as a lookup table. We encode the words as integers, for example "heart" is encoded as 958, "mind" as 18094. Then to get hidden layer values for "heart", you just take the 958th row of the embedding matrix. This process is called an **embedding lookup** and the number of hidden units is the **embedding dimension**.

<img src='assets/tokenize_lookup.png' width=50%>
 
There is nothing magical going on here. The embedding lookup table is just a weight matrix. The embedding layer is just a hidden layer. The lookup is just a shortcut for the matrix multiplication. The lookup table is trained just like any weight matrix.
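To check the claim that the lookup is a shortcut for the matrix multiplication, here is a minimal NumPy sketch. The toy sizes and the names `weights`, `word_idx`, `via_matmul`, and `via_lookup` are assumptions made for illustration and are not part of the notebook's code.

```python
import numpy as np

# Toy sizes (assumptions): a 5-word vocabulary and 3 hidden units
vocab_size, embedding_dim = 5, 3
rng = np.random.default_rng(0)
weights = rng.normal(size=(vocab_size, embedding_dim))  # embedding weight matrix

word_idx = 2  # integer code of some word

# Route 1: multiply a one-hot vector by the weight matrix
one_hot = np.zeros(vocab_size)
one_hot[word_idx] = 1.0
via_matmul = one_hot @ weights

# Route 2: treat the weight matrix as a lookup table and take one row
via_lookup = weights[word_idx]

print(np.allclose(via_matmul, via_lookup))  # True
```

Both routes give the same hidden values, but the lookup touches a single row instead of multiplying through the whole matrix.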

Embeddings aren't only used for words of course. You can use them for any model where you have a massive number of classes. A particular type of model called **Word2Vec** uses the embedding layer to find vector representations of words that contain semantic meaning.

---
## Word2Vec

The Word2Vec algorithm finds representations that are much more efficient than one-hot encodings by learning a dense vector for each word. These vectors also contain semantic information about the words.

<img src="assets/context_drink.png" width=40%>

Words that show up in similar **contexts**, such as "coffee", "tea", and "water" will have vectors near each other. Different words will be further away from one another, and relationships can be represented by distance in vector space.

<img src="assets/vector_distance.png" width=40%>


There are two architectures for implementing Word2Vec:
>* CBOW (Continuous Bag-Of-Words) and 
* Skip-gram

<img src="assets/word2vec_architectures.png" width=60%>

In this implementation, we'll be using the **skip-gram architecture** because it tends to perform better than CBOW. Here, we pass in a word and try to predict the words surrounding it in the text. In this way, we can train the network to learn representations for words that show up in similar contexts.
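To make "pass in a word and predict the words surrounding it" concrete, here is a small sketch that lists the (input word, context word) training pairs for a short sentence. The helper `skipgram_pairs` and the fixed window radius are assumptions for illustration; the notebook's own batching code samples the window size randomly.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for a fixed window radius."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)                 # clamp at the start of the text
        hi = min(len(tokens), i + window + 1)   # clamp at the end of the text
        for j in range(lo, hi):
            if j != i:                          # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs

sentence = "i drink hot coffee".split()
pairs = skipgram_pairs(sentence, window=1)
print(pairs)
```

Each pair is one training example: the first element is the network input and the second is the target it should predict.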

---
## Loading Data

Next, load the data and place it in the `data` directory:

1. Download the [text8 dataset](https://s3.amazonaws.com/video.udacity-data.com/topher/2018/October/5bbe6499_text8/text8.zip), a file of cleaned up *Wikipedia article text* from Matt Mahoney.
2. Place that data in the `data` folder in the home directory.
3. Extract it, then delete the zip archive to save storage space.

After following these steps, you should have one file in your data directory: `data/text8`.

In [1]:
# read in the extracted text file      
with open('data/text8') as f:
    text = f.read()

# print out the first 100 characters
print(text[:100])

 anarchism originated as a term of abuse first used against early working class radicals including t


## Pre-processing

Here I'm fixing up the text to make training easier. This comes from the `utils.py` file. The `preprocess` function does a few things:
>* It converts any punctuation into tokens, so a period is changed to ` <PERIOD> `. In this data set, there aren't any periods, but it will help in other NLP problems. 
* It removes all words that show up five or *fewer* times in the dataset. This will greatly reduce issues due to noise in the data and improve the quality of the vector representations. 
* It returns a list of words in the text.

This may take a few seconds to run, since our text file is quite large. If you want to write your own functions for this stuff, go for it!
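If you do want to write your own version, the steps listed above could look roughly like the sketch below. This is a hypothetical stand-in, not the code in `utils.py`: the name `preprocess_sketch`, the `min_count` parameter, and the choice to handle only periods are assumptions.

```python
from collections import Counter

def preprocess_sketch(text, min_count=5):
    """Hypothetical stand-in for utils.preprocess; the real function may differ."""
    text = text.lower()
    # turn punctuation into tokens (only the period is handled in this sketch)
    text = text.replace('.', ' <PERIOD> ')
    words = text.split()
    # drop words that show up min_count or fewer times
    counts = Counter(words)
    return [word for word in words if counts[word] > min_count]

sample = "the cat sat. " * 6 + "a rare word."
cleaned = preprocess_sketch(sample)
print(cleaned[:4])  # ['the', 'cat', 'sat', '<PERIOD>']
```

In the toy sample, "a", "rare", and "word" each appear once, so they are filtered out.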

In [2]:
import utils

# get list of words
words = utils.preprocess(text)
print(words[:30])

['anarchism', 'originated', 'as', 'a', 'term', 'of', 'abuse', 'first', 'used', 'against', 'early', 'working', 'class', 'radicals', 'including', 'the', 'diggers', 'of', 'the', 'english', 'revolution', 'and', 'the', 'sans', 'of', 'the', 'french', 'revolution', 'whilst', 'the']


In [3]:
# print some stats about this word data
print("Total words in text: {}".format(len(words)))
print("Unique words: {}".format(len(set(words)))) # `set` removes any duplicate words

Total words in text: 16616688
Unique words: 53721


### Dictionaries

Next, I'm creating two dictionaries to convert words to integers and back again (integers to words). This is again done with a function in the `utils.py` file. `create_lookup_tables` takes in a list of words in a text and returns two dictionaries.
>* The integers are assigned in descending frequency order, so the most frequent word ("the") is given the integer 0 and the next most frequent is 1, and so on. 

Once we have our dictionaries, the words are converted to integers and stored in the list `int_words`.
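One way such a pair of dictionaries could be built is sketched below. This is an assumption about how `create_lookup_tables` might work, not a copy of `utils.py`; the name `create_lookup_tables_sketch` and the toy sentence are made up for illustration.

```python
from collections import Counter

def create_lookup_tables_sketch(words):
    """Hypothetical stand-in for utils.create_lookup_tables."""
    counts = Counter(words)
    # most frequent word first; ties keep their first-seen order
    sorted_vocab = sorted(counts, key=counts.get, reverse=True)
    int_to_vocab = dict(enumerate(sorted_vocab))
    vocab_to_int = {word: idx for idx, word in int_to_vocab.items()}
    return vocab_to_int, int_to_vocab

toy_words = "the cat and the dog and the bird".split()
v2i, i2v = create_lookup_tables_sketch(toy_words)
print(v2i['the'], v2i['and'])  # 0 1
```

Because the two dictionaries are inverses of each other, `i2v[v2i[word]]` returns the original word.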

In [4]:
vocab_to_int, int_to_vocab = utils.create_lookup_tables(words)
int_words = [vocab_to_int[word] for word in words]

print(int_words[:30])

[5233, 3080, 11, 5, 194, 1, 3133, 45, 58, 155, 127, 741, 476, 10571, 133, 0, 27349, 1, 0, 102, 854, 2, 0, 15067, 1, 0, 150, 854, 3580, 0]


## Subsampling

Words that show up often such as "the", "of", and "for" don't provide much context to the nearby words. If we discard some of them, we can remove some of the noise from our data and in return get faster training and better representations. Mikolov et al. call this process subsampling. For each word $w_i$ in the training set, we'll discard it with probability given by

$$ P(w_i) = 1 - \sqrt{\frac{t}{f(w_i)}} $$

where $t$ is a threshold parameter and $f(w_i)$ is the frequency of word $w_i$ in the total dataset.

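For example, with $t = 1 \times 10^{-5}$, the word encoded as 0 (taking a count of about $1 \times 10^6$ occurrences out of about $16 \times 10^6$ words) would be discarded with probability

$$ P(0) = 1 - \sqrt{\frac{1 \times 10^{-5}}{(1 \times 10^6)/(16 \times 10^6)}} = 0.98735 $$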
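The discard probability is easy to evaluate numerically. The sketch below is illustrative only: the helper name `discard_probability` is an assumption, and clipping negative values to zero is an added convenience (a negative value just means the word is never discarded).

```python
import math

def discard_probability(freq, threshold=1e-5):
    """P(w) = 1 - sqrt(t / f(w)); clipped at 0 so rare words are never dropped."""
    return max(0.0, 1.0 - math.sqrt(threshold / freq))

p_frequent = discard_probability(1e6 / 16e6)   # the worked example above
p_rare = discard_probability(1e-6)             # frequency below the threshold

print(round(p_frequent, 5))  # 0.98735
print(p_rare)                # 0.0
```

Very frequent words are dropped almost every time, while words rarer than the threshold are always kept.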

I'm going to leave this up to you as an exercise. Check out my solution to see how I did it.

> **Exercise:** Implement subsampling for the words in `int_words`. That is, go through `int_words` and discard each word with the probability $P(w_i)$ shown above. Note that $P(w_i)$ is the probability that a word is discarded. Assign the subsampled data to `train_words`.

In [5]:
from collections import Counter
import random
import numpy as np

threshold = 1e-5
word_counts = Counter(int_words)
print(list(word_counts.items())[0])  # dictionary of int_words, how many times they appear

total_number_of_words = len(int_words)
frequencies = {word: count / total_number_of_words for word, count in word_counts.items()}


# discard some frequent words, according to the subsampling equation
# create a new list of words for training
def should_be_discarded(word):
    ''' word is an integer'''
    assert word in word_counts
    discard_probability = 1 - np.sqrt(threshold / frequencies[word])
    return random.random() < discard_probability

train_words = [word for word in int_words if not should_be_discarded(word)]

print(train_words[:30])

(5233, 303)
[155, 10571, 133, 27349, 15067, 854, 58, 10712, 1423, 2757, 686, 7088, 1052, 248, 44611, 5233, 2621, 8983, 4147, 6437, 4186, 5233, 1818, 6753, 7573, 1774, 566, 93, 11064, 7088]


In [6]:
print([int_to_vocab[word] for word in train_words[:30]])

['against', 'radicals', 'including', 'diggers', 'sans', 'revolution', 'used', 'pejorative', 'positive', 'label', 'defined', 'anarchists', 'derived', 'without', 'archons', 'anarchism', 'rulers', 'unnecessary', 'abolished', 'differing', 'interpretations', 'anarchism', 'movements', 'elimination', 'authoritarian', 'institutions', 'particularly', 'state', 'anarchy', 'anarchists']


## Making batches

Now that our data is in good shape, we need to get it into the proper form to pass it into our network. With the skip-gram architecture, for each word in the text, we want to define a surrounding _context_ and grab all the words in a window around that word, with size $C$. 

From [Mikolov et al.](https://arxiv.org/pdf/1301.3781.pdf): 

"Since the more distant words are usually less related to the current word than those close to it, we give less weight to the distant words by sampling less from those words in our training examples... If we choose $C = 5$, for each training word we will select randomly a number $R$ in range $[ 1: C ]$, and then use $R$ words from history and $R$ words from the future of the current word as correct labels."

> **Exercise:** Implement a function `get_target` that receives a list of words, an index, and a window size, then returns a list of words in the window around the index. Make sure to use the algorithm described above, where you choose a random number of words from the window.

Say, we have an input and we're interested in the idx=2 token, `741`: 
```
[5233, 58, 741, 10571, 27349, 0, 15067, 58112, 3580, 58, 10712]
```

For `R=2`, `get_target` should return a list of four values:
```
[5233, 58, 10571, 27349]
```
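The `R=2` case above can be reproduced with a deterministic helper that takes the radius as an argument instead of sampling it. The name `window_around` is an assumption for illustration; the exercise itself still requires drawing `R` at random.

```python
def window_around(words, index, r):
    """Context words within radius r of index (the random draw of R is skipped)."""
    start = max(0, index - r)  # clamp so we never slice with a negative start
    return words[start:index] + words[index + 1:index + r + 1]

tokens = [5233, 58, 741, 10571, 27349, 0, 15067, 58112, 3580, 58, 10712]
print(window_around(tokens, 2, 2))  # [5233, 58, 10571, 27349]
print(window_around(tokens, 0, 2))  # [58, 741]
```

The second call shows the edge case at the start of the list, where there is no history to draw from.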

In [7]:
def get_target(words, index, window_size=5):
    ''' Get a list of words in a window around an index. '''
    # sample the window radius R uniformly from [1, window_size]
    r = np.random.randint(1, window_size+1)
    # clamp the left edge at 0; slicing past the end of a list is safe in Python
    start = max(0, index - r)
    return words[start:index] + words[index + 1: index + r + 1]


### Generating Batches 

Here's a generator function that returns batches of input and target data for our model, using the `get_target` function from above. The idea is that it grabs `batch_size` words from a words list. Then, for each word in that batch, it gets the target words in a window around it.

In [8]:
def get_batches(words, batch_size, window_size=5):
    ''' Create a generator of word batches as a tuple (inputs, targets) '''
    
    n_batches = len(words)//batch_size
    
    # only full batches
    words = words[:n_batches*batch_size]
    
    for idx in range(0, len(words), batch_size):
        x, y = [], []
        batch = words[idx:idx+batch_size]
        for ii in range(len(batch)):
            batch_x = batch[ii]
            batch_y = get_target(batch, ii, window_size)
            y.extend(batch_y)
            x.extend([batch_x]*len(batch_y))
        yield x, y
    

In [9]:
int_text = [i for i in range(20)]
x,y = next(get_batches(int_text, batch_size=4, window_size=5))

print('x\n', x)
print('y\n', y)

x
 [0, 0, 0, 1, 1, 1, 2, 2, 3, 3]
y
 [1, 2, 3, 0, 2, 3, 1, 3, 1, 2]


## Building the graph

Below is an approximate diagram of the general structure of our network.
<img src="assets/skip_gram_net_arch.png" width=60%>

>* The input words are passed in as batches of input word tokens. 
* This will go into a hidden layer of linear units (our embedding layer). 
* Then, finally into a softmax output layer. 

We'll use the softmax layer to make a prediction about the context words by sampling, as usual.

The idea here is to train the embedding layer weight matrix to find efficient representations for our words. We can discard the softmax layer because we don't really care about making predictions with this network. We just want the embedding matrix so we can use it in _other_ networks we build using this dataset.
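The flow through the network can be traced with plain NumPy to see the shapes involved. This is only a sketch under assumed toy sizes, with made-up names (`embed_weights`, `out_weights`, `out_bias`); it is not the PyTorch model used for training.

```python
import numpy as np

rng = np.random.default_rng(0)
n_vocab, n_embed = 10, 4   # toy sizes (assumptions)

embed_weights = rng.normal(size=(n_vocab, n_embed))  # embedding layer weights
out_weights = rng.normal(size=(n_embed, n_vocab))    # output layer weights
out_bias = np.zeros(n_vocab)

inputs = np.array([1, 5, 7])                 # a batch of input word tokens
hidden = embed_weights[inputs]               # embedding lookup: (batch, n_embed)
scores = hidden @ out_weights + out_bias     # one score per vocabulary word

# softmax over the vocabulary, shifted by the row max for numerical stability
shifted = scores - scores.max(axis=1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)

print(hidden.shape, probs.shape)  # (3, 4) (3, 10)
```

After training, only `embed_weights` would be kept; the output layer exists to provide a training signal.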

---
## Validation

Here, I'm creating a function that will help us observe our model as it learns. We're going to choose a few common words and a few uncommon words. Then, we'll print out the closest words to them using the cosine similarity:

$$
\mathrm{similarity} = \cos(\theta) = \frac{\vec{a} \cdot \vec{b}}{|\vec{a}||\vec{b}|}
$$


We can encode the validation words as vectors $\vec{a}$ using the embedding table, then calculate the similarity with each word vector $\vec{b}$ in the embedding table. With the similarities, we can print out the validation words and words in our embedding table semantically similar to those words. It's a nice way to check that our embedding table is grouping together words with similar semantic meanings.
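As a quick sanity check of the formula, here is cosine similarity on three hand-picked 2-D vectors. The helper name `cosine` and the vectors are assumptions chosen for illustration.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity: (a . b) / (|a| |b|)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 0.0])
same_direction = cosine(a, np.array([2.0, 0.0]))   # parallel, different length
orthogonal = cosine(a, np.array([0.0, 3.0]))       # at a right angle
opposite = cosine(a, np.array([-1.0, 0.0]))        # pointing the other way

print(same_direction, orthogonal, opposite)
```

The similarity depends only on the angle between vectors, not on their lengths, which is why it suits comparing embeddings of frequent and rare words.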

In [10]:
def cosine_similarity(embedding, valid_size=16, valid_window=100, device='cpu'):
    """ Returns the cosine similarity of validation words with words in the embedding matrix.
        Here, embedding should be a PyTorch embedding module.
    """
    
    # Here we're calculating the cosine similarity between some random words and 
    # our embedding vectors. With the similarities, we can look at what words are
    # close to our random words.
    
    # sim = (a . b) / |a||b|
    
    embed_vectors = embedding.weight
    
    # magnitude of embedding vectors, |b|
    magnitudes = embed_vectors.pow(2).sum(dim=1).sqrt().unsqueeze(0)
    
    # pick N words from our ranges (0,window) and (1000,1000+window). lower id implies more frequent 
    valid_examples = np.array(random.sample(range(valid_window), valid_size//2))
    valid_examples = np.append(valid_examples,
                               random.sample(range(1000,1000+valid_window), valid_size//2))
    valid_examples = torch.LongTensor(valid_examples).to(device)
    
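    valid_vectors = embedding(valid_examples)
    # note: we divide by |b| only; leaving out |a| rescales each row by a constant,
    # so the top-k ranking for each validation word is unchanged
    similarities = torch.mm(valid_vectors, embed_vectors.t())/magnitudes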
        
    return valid_examples, similarities

## SkipGram model

Define and train the SkipGram model. 
> You'll need to define an [embedding layer](https://pytorch.org/docs/stable/nn.html#embedding) and a final, softmax output layer.

An Embedding layer takes in a number of inputs, importantly:
* **num_embeddings** – the size of the dictionary of embeddings, or how many rows you'll want in the embedding weight matrix
* **embedding_dim** – the size of each embedding vector; the embedding dimension

In [11]:
import torch
from torch import nn
import torch.optim as optim

In [14]:
class SkipGram(nn.Module):
    def __init__(self, n_vocab, n_embed):
        super().__init__()
        
        # complete this SkipGram model
        self.embed = nn.Embedding(num_embeddings=n_vocab, embedding_dim=n_embed)
        self.output = nn.Linear(n_embed, n_vocab)
        self.log_softmax = nn.LogSoftmax(dim=1)
    
    def forward(self, x):
        
        # define the forward behavior
        x = self.embed(x)
        x = self.output(x)
        x = self.log_softmax(x)
        return x

### Training

Below is our training loop, and I recommend that you train on GPU, if available.

**Note that, because we applied a log-softmax function to our model output, we are using NLLLoss** as opposed to cross entropy. This is because `LogSoftmax` in combination with `NLLLoss` is equivalent to `CrossEntropyLoss`.
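The relationship between the two losses can be verified numerically. The sketch below re-implements both in NumPy under simplifying assumptions (mean reduction, no class weights); the function names are made up and are not the PyTorch APIs themselves.

```python
import numpy as np

def log_softmax(scores):
    # subtract the row max before exponentiating for numerical stability
    shifted = scores - scores.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def nll_loss(log_probs, targets):
    # mean negative log-probability assigned to the correct class
    return -log_probs[np.arange(len(targets)), targets].mean()

def cross_entropy(scores, targets):
    # softmax followed by the usual cross-entropy against one-hot targets
    probs = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    one_hot = np.eye(scores.shape[1])[targets]
    return -(one_hot * np.log(probs)).sum(axis=1).mean()

scores = np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]])
targets = np.array([0, 2])

loss_a = nll_loss(log_softmax(scores), targets)
loss_b = cross_entropy(scores, targets)
print(np.isclose(loss_a, loss_b))  # True
```

So a model that ends in a log-softmax should be paired with a negative log-likelihood loss, while a model that outputs raw scores should be paired with cross entropy.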

In [None]:
# check if GPU is available
device = 'cuda' if torch.cuda.is_available() else 'cpu'

embedding_dim=300 # you can change, if you want

model = SkipGram(len(vocab_to_int), embedding_dim).to(device)
criterion = nn.NLLLoss()
optimizer = optim.Adam(model.parameters(), lr=0.003)

print_every = 100
steps = 0
epochs = 5

# train for some number of epochs
for e in range(epochs):
    
    # get input and target batches
    for inputs, targets in get_batches(train_words, 512):
        steps += 1
        inputs, targets = torch.LongTensor(inputs), torch.LongTensor(targets)
        inputs, targets = inputs.to(device), targets.to(device)
        
        log_ps = model(inputs)
        loss = criterion(log_ps, targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if steps % print_every == 0:                  
            # getting examples and similarities      
            valid_examples, valid_similarities = cosine_similarity(model.embed, device=device)
            _, closest_idxs = valid_similarities.topk(6) # topk highest similarities
            
            valid_examples, closest_idxs = valid_examples.to('cpu'), closest_idxs.to('cpu')
            for ii, valid_idx in enumerate(valid_examples):
                closest_words = [int_to_vocab[idx.item()] for idx in closest_idxs[ii]][1:]
                print(int_to_vocab[valid_idx.item()] + " | " + ', '.join(closest_words))
            print("...")

which | fortified, ehr, molarity, moisture, semon
more | vanadium, dollars, helier, describe, happiness
if | kidnappers, hestia, ht, encephalopathy, ancestries
however | racing, beatty, huldrych, dept, prairies
or | euphoria, linkin, oversaw, grok, greek
only | spiritualism, microcomputers, congenial, tubas, filtering
so | migrant, plummeted, knobs, sadozai, smirnov
that | bosco, sident, mandeville, cronquist, ocular
placed | rapids, embryo, medieval, volhynia, limelight
something | reportedly, rivi, euthanasia, starboard, thermotropic
troops | combines, precambrian, eesti, catchers, distillery
file | sings, editorial, khmelnytsky, carbonation, timorese
experience | menagerie, bleak, dtmf, nut, won
engine | affixes, ilya, senex, ministers, shower
bible | tactically, fighting, infertile, chinon, centum
shows | distract, liang, narcolepsy, zagreb, codified
...
see | rigged, superficially, vowel, soldiery, display
with | accent, corroborating, fibre, penderecki, soc
over | untamed, spawni

to | milah, clark, versa, circumnavigation, breaking
over | untamed, spawning, leakey, aftermath, holden
may | transformation, lawyer, assumed, ges, subversion
is | wilfrid, amiga, opponents, tyler, hunchback
often | drive, billiard, imprisonment, wail, digitized
where | hoods, machinations, significant, salvador, scuttled
six | michelson, pepe, acquired, citadels, enforces
its | mezzo, ovum, adjective, reefs, leftmost
liberal | druze, reluctantly, governors, quality, available
arts | xinhua, pennyroyal, consults, silvery, friendly
shows | liang, zagreb, distract, narcolepsy, codified
existence | schuman, interwiki, willamette, soundhole, euskadi
writers | migrating, irritability, jessica, cady, perishable
derived | elimination, outrun, kits, monism, bei
proposed | articulatory, prophesy, toolbox, agnostic, psychosomatic
mathematics | sybil, bax, magenta, mellow, icon
...
war | herndon, daoism, social, alison, complements
history | wells, modernize, outlines, thurston, damian
or | euph

where | machinations, hoods, salvador, significant, gahan
so | migrant, thracian, maariv, dog, conspired
with | soc, hospitals, accent, qdos, corroborating
by | regan, mattingly, permafrost, floods, speaker
nine | archbishops, haworth, saeed, carrero, whigs
of | cartman, curiously, housing, timbers, plummet
or | linkin, euphoria, transept, oni, trnc
system | relation, jai, cdb, brindle, dmu
consists | embellished, higgs, withstand, freefall, revived
cost | attacking, cisc, finale, havoc, uno
paris | chd, unger, evan, unfinished, antinomy
resources | hominoid, allude, freeman, smear, sys
older | madeleine, integrative, calibration, icosidodecahedron, recipe
ice | aberystwyth, temmu, lsch, septum, sketched
scale | swears, unwitting, sar, untamed, actaeon
magazine | eliminates, leakey, consumer, ollie, wronged
...
was | refining, eduke, southey, inhumane, gainesville
that | mandeville, examinations, quranic, ocular, bosco
can | impression, neoplatonist, vowels, very, pea
by | regan, matti

no | abrahams, death, majority, punctured, abdullah
seven | canseco, itasca, singer, loretta, sandpaper
they | altering, parametric, especially, absorptive, mukhabarat
known | aortic, ratifies, sidi, yokosuka, ecce
american | craig, deposed, replicant, tallest, cartier
most | pasadena, nabataean, shalom, hearts, quantify
these | conveying, decapitated, louth, bona, scraped
see | confucian, braun, doubling, boardgamegeek, superficially
additional | onions, sales, refreshed, gotha, rard
operating | outputting, lavishly, ala, piet, axles
road | wharton, affonso, hyenas, pei, outside
pre | markt, morgue, sampo, pneumonic, arguing
pressure | jis, stile, systemic, dwelt, lux
resources | hominoid, freeman, allude, smear, exerts
prince | codebase, nazism, lynching, treads, defensible
animals | marriott, mahathir, render, dominant, literal
...
five | basra, forecasts, liddell, edison, zero
see | confucian, braun, doubling, boardgamegeek, superficially
war | withdrew, herndon, daoism, social, de

its | adjective, mezzo, reefs, leftmost, ovum
if | ht, hestia, kidnappers, stroke, feces
he | tabari, yogi, moqed, attend, merits
d | astro, dp, video, bond, blume
often | drive, imprisonment, pursuits, digitized, otherworldly
a | wisest, guinness, aleutians, essentially, slugger
from | clears, perpetuating, noh, diameter, resolutely
for | footnote, promises, convergent, implementation, irl
lived | corse, winch, hotly, clinics, victors
behind | try, collier, buddhism, rivi, immediately
dr | nonaligned, sailing, thrusting, theories, interviews
derived | outrun, kits, elimination, theory, histological
articles | adamant, gramophone, edgerton, nuremberg, interdependence
alternative | alienation, ernest, widow, encampments, herring
applications | pisa, gladiators, functional, controller, vladimir
notes | doubly, ezek, abbe, nationalisation, ontogeny
...
often | drive, imprisonment, pursuits, digitized, otherworldly
such | mcfadden, consultations, maguey, cellular, hattie
four | three, rolf

will | raves, frankenstein, szasz, unfairly, habeas
i | discerning, tunable, celebrates, exhibiting, connie
after | illa, weishaupt, beige, judges, paralyzed
been | disdain, shakespeare, cpes, stand, westerners
for | footnote, promises, implementation, adel, convergent
known | aortic, czarist, undead, ratifies, yu
where | hoods, significant, salvador, machinations, ax
th | ecclesiastic, peacock, congregationalists, cadiz, abbreviation
institute | mummies, hess, madison, idealists, suspiciously
woman | follow, ric, geocachers, blinded, mistral
creation | concretes, advisory, aspect, jew, typed
quite | pronounce, convalescence, championed, supergirl, approx
versions | top, thinning, damme, feuchtwanger, cvs
resources | hominoid, sys, proportioned, cumann, freeman
pressure | jis, haer, stile, systemic, restructure
gold | hydrazine, afrobeat, coolers, volcanic, endow
...
often | drive, imprisonment, digitized, atrocious, pursuits
the | marcellus, revolved, russ, chew, arboreal
three | four

an | geats, euskadi, lydia, spades, irian
american | craig, schleicher, replicant, invariable, overview
it | bosons, kitts, involving, breaking, peptide
nine | seven, haworth, carrero, saeed, trenton
history | damian, wells, marsyas, outlines, inspired
two | zero, one, three, howls, four
where | hoods, salvador, significant, ax, machinations
on | homestead, johansen, ignacy, advertisement, vikrant
resources | hominoid, sys, proportioned, freeman, cumann
police | confederations, prisma, mps, indications, president
grand | hashish, licit, citation, durability, aground
channel | hindsight, federation, katydids, nara, guardian
award | nominated, emma, hepburn, footballer, griese
taking | fokker, next, dyes, sectioned, nearer
placed | rapids, limelight, rorty, volhynia, medieval
mathematics | concepts, magenta, lorica, adapa, bax
...
zero | five, two, one, eight, est
people | therapeutic, habitat, individuals, said, discharged
there | yielded, eretz, anise, bun, hotham
have | shiny, cursor,

history | wells, damian, marsyas, inspired, outlines
world | questioner, plummer, olympics, freeview, rescue
were | changeover, indifferent, martyrs, wakko, chuvash
into | elitist, criminalized, stripper, between, repeatedly
would | miriam, terribly, suicides, realizes, napoli
but | go, everyone, ageing, misperception, easiest
all | salination, baldassare, pneumoniae, symbolizing, encylopedia
many | eavesdropping, brenly, transistors, incalculable, arabs
engine | ilya, configuration, existing, electricity, throat
numerous | intoxicated, loyola, seemingly, bechuanaland, phosphorylation
pre | markt, disparities, arguing, sampo, morgue
rise | merthyr, florin, artificially, conscious, davidians
versions | top, thinning, leaning, version, unser
issue | peebles, apical, hoarding, roussimoff, reimbursement
channel | federation, katydids, hindsight, nara, ieung
know | crb, monad, ifex, urges, emc
...
these | bona, clinics, louth, individuals, religions
this | fiske, much, peacetime, hath, tos


than | hommes, armouries, improvement, mono, baccio
not | unused, differentials, invalid, brightest, kickapoo
these | louth, bona, individuals, supplant, conveying
he | moqed, tabari, attend, hatches, xvii
after | arrested, illa, weishaupt, pardoned, deansgate
d | b, blume, dp, one, astro
over | untamed, spawning, unemployment, mass, falun
or | trnc, to, unicode, cosines, advantage
gold | afrobeat, coolers, hydrazine, worcester, werth
older | noam, whedon, icosidodecahedron, spread, integrative
troops | armed, uzbek, everton, eesti, hine
numerous | intoxicated, loyola, bechuanaland, cumings, phosphorylation
freedom | rawls, shattuck, nucleation, holistic, cracow
hold | elementary, tranquillity, efforts, can, lethbridge
paris | evan, eugene, coals, doon, unger
account | traynor, gig, welles, zseries, agonists
...
he | moqed, tabari, xvii, attend, hatches
most | some, pasadena, palestine, daugava, croydon
are | and, were, mmc, rangle, kingu
be | metabolic, stagnant, oversees, uncharacter

for | some, promises, caps, idealised, refineries
be | to, unbelief, simply, stagnant, uncharacteristically
they | absorptive, hotter, parametric, altering, struggled
will | legitimized, higson, aisha, linearity, kundalini
by | permafrost, regan, declarations, infinitum, witty
as | bouts, pronouncement, whatsoever, distinct, term
would | titans, accounts, realizes, terribly, carry
about | axillary, rifling, sindarin, land, precambrian
existence | schuman, euskadi, soundhole, mystical, musik
numerous | intoxicated, loyola, bechuanaland, seemingly, sourced
marriage | ank, rehearsing, denigrating, perfume, morpork
rise | merthyr, instability, withstand, lsl, florin
san | brecht, santa, islamabad, hr, filth
file | codec, homonyms, predict, hardware, users
notes | doubly, images, ontogeny, filth, noma
magazine | arp, eliminates, ollie, doug, carradine
...
when | crazy, uther, necessary, protons, conspicuously
more | discredit, lay, gheg, urban, profitable
it | bosons, rupture, hardly, perma

other | have, shiny, armenians, franciszek, adjective
people | therapeutic, agung, individuals, worse, habitat
may | subversion, ges, severing, remembering, semen
into | between, the, pass, breakdance, ji
most | some, shalom, pasadena, showdown, daugava
on | homestead, august, johansen, connect, holyrood
it | rupture, deepest, bosons, cusp, cylinders
th | century, ecclesiastic, europe, nd, joseon
prince | henry, piero, regnant, agrippa, novgorod
mean | skew, lens, wt, disappearing, inexhaustible
centre | building, shipyards, mishnayot, peloponnese, bait
dr | nonaligned, berg, brendan, theories, interviews
cost | fixed, macroeconomic, reliability, teflon, lempel
existence | euskadi, schuman, mystical, soundhole, anthropogenic
stage | terran, lla, comedians, parl, sopranos
older | median, whedon, cost, spread, lemmings
...
that | monism, have, against, unleash, improper
some | most, interpreted, cosmopolitan, wondering, altruism
of | the, in, and, south, geography
the | of, in, and, is, 

new | brill, nightmare, advisers, csonka, mariana
but | gains, as, dialects, because, fuels
other | are, have, they, non, colors
than | intermediate, not, metalworking, squeak, nicopolis
so | predictability, migrant, keenly, helmets, betrayed
be | metabolic, simply, absence, balancing, to
two | four, three, one, zero, five
years | two, zero, four, three, migration
existence | euskadi, schuman, musik, soundhole, mystical
hit | consecutive, donald, batsmen, ozzfest, protea
resources | hominoid, psychoacoustics, proportioned, sys, trace
running | divinities, prompting, stymied, shameful, khmelnytsky
universe | cosmological, brownian, identically, rossby, bang
paris | eugene, evan, doon, coals, adaption
woman | spouse, mother, fertility, children, jealous
police | nicaragua, indications, armed, embassy, fedayeen
...
many | eavesdropping, brenly, literal, statuary, heteronormativity
people | individuals, agung, mexican, populations, nationality
however | perineum, dept, difference, prairies

many | as, some, most, eavesdropping, common
their | acrylics, getting, teddy, fluffy, acquiring
by | speaker, the, cutler, mayoral, electrostatic
are | these, is, groups, other, have
new | york, brill, whitworth, texas, st
d | b, blume, g, astro, e
have | some, are, sexes, been, quizzes
known | marketshare, lusitanian, mesozoic, marquis, thorny
bbc | news, tv, announcer, oldman, listing
brother | wife, son, raped, sisters, sister
ice | aberystwyth, glaciation, septum, sarti, dynamo
joseph | ascalon, leone, protester, paul, barrow
older | median, spread, cost, family, whedon
report | publishes, hesychasts, subsequent, unaids, reports
san | santa, brecht, affiliate, juan, diego
rise | instability, artificially, withstand, greenpeace, accelerated
...
been | stand, cpes, maois, ratification, westerners
use | consults, depressant, sdk, industrialization, tools
states | united, footballing, gallup, illegal, mcclelland
not | valid, can, that, this, but
about | mishna, literacy, rifling, abus

when | caprice, gave, moscow, piloted, interim
this | a, not, that, the, is
a | this, the, is, in, if
on | handbook, traditionalists, january, homestead, of
into | through, grouped, between, ecb, korca
only | all, metrology, simply, braces, quadruple
also | u, predicates, see, arima, gleichschaltung
where | that, pi, so, hoods, to
freedom | freedoms, nucleation, shattuck, behavior, proactive
marriage | ank, marriages, she, her, married
discovered | galilei, titan, spokeswoman, discovery, discoveries
units | lipoproteins, unit, achievable, aaa, displacement
frac | x, delta, majesties, left, cos
versions | version, dynamically, salamanders, footage, bassoon
issue | peebles, foreign, liberalizing, reimbursement, roussimoff
heavy | freezing, shaman, pelagic, inlet, paulette
...
by | the, of, s, in, speaker
may | isaias, courts, assumed, scanned, remembering
up | hands, insert, shouted, tool, tapered
united | states, wtro, bonfire, act, south
first | was, early, the, tenures, reputedly
was 

history | external, outlines, kingdom, haig, fasa
its | differences, shape, reefs, the, weaker
can | be, or, a, are, definition
have | are, some, it, been, that
between | the, of, into, with, neutralization
use | certain, depressant, optional, circles, sdk
nine | one, seven, two, eight, zero
people | nationality, therapeutic, americans, agung, declaration
additional | shangri, napa, wizard, arnulf, chien
applications | integrated, application, functional, pisa, components
proposed | planckian, tightened, ludo, unworkable, godel
bill | enslave, caledonia, attorney, cooper, bryce
units | lipoproteins, unit, aaa, achievable, markus
assembly | executive, party, government, unicameral, appointed
nobel | prize, satirists, watergate, laureate, autodidacts
bible | word, christian, chalcedon, covenant, biblical
...
states | united, of, illegal, rico, embassies
no | mammuthus, slithy, randomly, clearly, majority
are | other, these, is, have, as
and | the, of, in, is, five
d | b, blume, poet, foc

are | these, and, is, some, be
while | pilgrims, secular, excrement, not, vigilance
all | are, a, only, have, of
no | be, mammuthus, clearly, majority, slithy
over | four, three, total, zero, number
known | called, are, considered, the, for
so | they, be, them, where, because
people | nationality, americans, children, populations, thousands
proposed | tightened, waned, gosford, combinatorial, uninteresting
woman | mother, children, pregnant, spouse, her
writers | poets, fiction, authors, novelists, illustrators
scale | grew, quite, expansion, processing, tt
event | events, extinction, louvre, tursiops, blackmailed
assembly | executive, party, government, members, legislative
alternative | alienation, patience, palladium, ernest, debates
primarily | prospectors, competitiveness, sizable, commercially, variety
...
see | list, article, also, timeline, the
there | is, eretz, or, every, are
not | they, valid, this, any, that
are | these, some, is, as, be
had | was, he, to, exile, were
from 

in | the, of, a, and, was
at | from, in, years, after, zero
its | part, has, northern, nyasaland, reefs
more | many, are, very, some, less
th | century, nd, seven, two, five
seven | one, four, eight, six, five
than | less, high, more, are, or
into | through, grouped, the, remove, ejecting
test | nuclear, illnesses, pharmacist, grounding, profiling
quite | supergirl, scale, elaborate, particularly, marginalized
hit | batsmen, consecutive, smash, josh, cabin
derived | word, numeral, sanskrit, koine, name
nobel | prize, laureate, physicist, physics, american
square | kilometre, headquarters, london, illinois, museum
prince | monarch, ii, queen, pat, eldest
animals | genus, domestication, human, animal, sharks
...
only | metrology, all, out, they, are
no | clearly, be, slithy, mammuthus, authorities
which | the, that, a, is, using
some | many, have, for, such, are
be | that, if, can, so, should
on | the, a, with, two, of
at | from, years, after, stood, in
was | later, his, after, during, h

an | the, a, is, result, by
that | not, any, must, to, believe
it | not, what, any, to, if
been | many, archaeologists, for, had, has
new | york, university, press, on, princeton
when | gave, had, he, override, to
th | century, nd, seven, twentieth, st
often | many, common, especially, although, some
shows | show, motif, prank, recorded, campy
bill | sitting, clinton, caledonia, enslave, attorney
governor | colony, prime, lieutenant, appointed, ministers
ice | glaciation, winter, aberystwyth, septum, lacrosse
mean | versa, integer, inequality, kajang, inexhaustible
proposed | tightened, combinatorial, uninteresting, evolutionary, gosford
operations | operation, cond, air, peacekeeping, training
resources | resource, hominoid, shrinking, mapping, lumber
...
use | tools, protection, circles, provide, such
over | zero, total, one, four, manpower
will | you, we, your, if, shine
his | he, him, was, her, himself
united | states, british, american, u, countries
american | singer, actor, autho

american | actress, actor, singer, canadian, footballer
has | its, in, is, the, and
three | four, two, five, one, seven
often | most, more, many, although, common
who | him, whom, mother, chose, beings
its | the, of, in, and, has
for | and, a, support, an, with
also | of, and, in, list, see
square | kilometre, area, one, interior, kilometers
something | you, me, crazy, things, wrong
quite | scabbard, though, supergirl, similar, marginalized
bible | biblical, testament, hebrew, text, books
prince | monarch, queen, eldest, novgorod, dukes
account | accounts, according, kanem, thom, buoyed
test | nuclear, testing, tests, program, contamination
notes | bass, piano, vol, noma, perish
...
often | most, more, many, generally, typical
will | can, if, must, you, make
while | between, especially, worst, both, arguably
b | d, one, nine, actor, actress
or | usually, is, such, typically, not
may | semen, vigil, hashanah, traitors, smelling
new | york, encyclopedia, press, on, st
years | female, zer

people | americans, population, deaths, individuals, thousands
has | its, is, for, have, major
from | the, of, which, to, and
and | of, the, in, is, as
its | the, has, is, of, it
it | is, to, which, some, a
not | be, some, that, does, should
most | are, more, often, many, in
running | grappling, androgen, vote, stymied, althea
smith | joseph, mormon, jr, bible, thomas
writers | fiction, authors, novelists, poets, dramatists
mathematics | mathematical, algebra, mathematicians, shannon, arithmetic
taking | navigating, remedial, echelons, session, chimes
grand | rebuilt, aground, concr, jeanneret, wineries
magazine | interview, magazines, book, newspaper, weekly
institute | research, institutes, harvard, university, technology
...
its | is, has, the, it, of
may | such, term, be, can, apostrophe
world | usa, major, in, united, industry
had | was, he, were, him, never
there | is, are, these, have, every
united | states, kingdom, british, countries, america
b | d, f, p, one, c
this | be, not

when | gave, had, he, she, returned
while | to, generally, insufficiency, where, more
on | the, s, of, and, anniversary
world | the, ever, and, allegory, olympic
that | have, this, to, which, it
people | deaths, births, living, population, thousands
d | b, eight, nine, seven, one
or | such, usually, are, may, typically
universe | matter, galaxies, cosmological, bang, cosmology
pre | disparities, columbian, billiards, using, tei
shows | show, appearing, scenes, television, wave
nobel | prize, laureate, physicist, biochemist, american
stage | subterfuge, film, staged, delaunay, comedians
account | almanac, accounts, thom, gravitation, contact
writers | novelists, fiction, poets, authors, dramatists
consists | branches, composed, branch, bicameral, through
...
has | is, have, been, its, major
such | other, or, as, some, these
nine | one, eight, seven, three, four
new | york, press, in, encyclopedia, nine
states | united, state, law, civil, nations
between | length, along, adjusting, the, 

would | to, could, change, might, enough
these | are, such, certain, other, tend
which | is, the, an, it, this
more | than, most, less, are, much
th | century, nd, three, st, rd
many | some, most, include, widely, numerous
so | it, be, because, they, make
american | actor, actress, nine, singer, musician
joseph | smith, james, paul, tambo, american
road | roads, rail, highway, grid, traffic
channel | channels, tv, broadcast, pbs, signal
assembly | legislative, executive, elected, party, elections
issue | policy, controversial, comic, westerner, constitutional
recorded | live, recording, relatively, iole, records
magazine | magazines, interview, published, book, weekly
instance | staccato, thus, probable, use, solution
...
had | was, he, could, to, him
these | are, other, such, their, certain
which | the, is, this, an, it
can | be, a, possible, not, very
three | two, four, one, six, seven
new | york, press, on, now, pratt
use | or, standard, optional, can, other
two | three, five, zero,

but | be, all, very, this, to
six | one, three, five, seven, eight
into | through, the, which, of, out
zero | five, three, two, four, one
was | later, the, became, until, remained
some | such, as, are, many, both
war | troops, army, fighting, soldiers, battle
the | of, in, and, from, was
numerous | as, century, profoundly, traditions, most
discovered | discovery, discoveries, discoverer, concluded, found
placed | shaped, neck, roof, controlled, the
something | me, you, nothing, thing, things
magazine | magazines, interview, weekly, published, newspaper
units | unit, mi, lipoproteins, si, metre
square | kilometre, roo, defined, includes, triangular
nobel | prize, laureate, physicist, winners, biochemist
...
he | his, him, never, himself, her
his | he, him, life, father, himself
s | the, his, of, nine, and
d | b, seven, j, e, c
is | a, are, or, of, the
time | over, spacecraft, hohenstaufen, workstations, until
this | because, thus, which, that, be
not | should, be, any, does, they
stage 

one | seven, six, five, nine, three
b | d, one, seven, writer, eight
this | thus, because, be, not, that
on | s, the, nine, of, links
from | the, in, of, and, four
by | s, the, a, and, of
five | two, four, six, three, zero
were | many, thousands, these, had, after
shows | show, guest, appearing, scenes, movie
joseph | tambo, james, faulkner, sympathetic, smith
institute | university, mit, institutes, professor, research
professional | compete, sports, professions, education, chicago
operating | dos, os, ported, unix, microsoft
animals | animal, humans, prey, species, mammals
applied | dealing, normative, confusions, sciences, derived
ocean | atlantic, pacific, islands, earth, stations
...
not | should, do, any, be, they
two | five, three, four, zero, one
so | be, if, completely, it, they
their | them, who, these, encouraged, they
zero | two, five, four, three, one
he | his, worked, gave, him, went
than | less, it, but, more, larger
are | and, other, often, various, is
scale | estimates

other | such, include, have, or, many
many | most, include, other, well, some
their | them, they, many, areas, those
years | year, at, birth, mortality, total
a | in, is, as, an, and
known | called, as, for, which, in
his | he, career, him, went, her
seven | one, eight, six, five, nine
woman | children, noun, she, nationality, fertility
smith | adam, thomas, owen, paul, keith
hit | hits, hitting, smash, batted, runs
pressure | pressures, temperature, liquid, temperatures, atmospheric
placed | shaped, shape, tube, plastic, attached
consists | branch, composed, which, consist, each
scale | measured, scales, measuring, large, masses
assembly | legislative, elected, seats, parliament, elections
...
five | two, six, four, one, three
not | did, but, does, because, should
called | are, is, an, known, the
seven | one, six, eight, nine, five
was | he, later, the, eventually, after
some | although, different, as, or, generally
other | such, include, as, have, many
is | which, a, called, equivale

four | three, five, one, two, seven
state | states, government, legislature, federal, arkansas
often | common, many, are, most, term
nine | one, seven, three, four, five
some | many, have, such, are, most
an | a, named, called, of, as
with | a, it, along, their, was
for | and, as, addition, also, similar
dr | anonymous, anthony, cumbric, geneticist, singh
grand | ettore, eight, founded, duchy, ronin
powers | sovereign, exercise, monarch, thereby, exercised
orthodox | catholic, church, catholicism, orthodoxy, christians
units | unit, metre, lipoproteins, mi, battalion
bbc | listing, links, march, day, april
account | accounts, sources, according, almanac, gaulish
proposed | proposal, proposes, human, signatory, formulating
...
between | split, any, eastern, sides, differences
this | be, however, non, because, of
b | d, one, seven, composer, writer
nine | one, seven, three, four, eight
states | united, state, u, civil, confederacy
three | four, six, two, one, five
an | a, as, called, nam

there | are, have, is, every, someone
s | was, the, eight, nine, in
american | singer, actress, actor, musician, americans
in | the, and, first, of, as
while | but, riot, took, in, was
had | was, were, after, never, to
d | b, seven, eight, e, one
six | three, two, one, seven, eight
running | run, ran, discus, deflected, down
placed | tube, shaped, line, beneath, attached
defense | military, defensive, defence, enemy, responsibility
issue | proposal, debtor, policy, legislation, censorship
existence | supernatural, belief, matter, theories, nature
pope | papal, papacy, gregory, church, rome
road | roads, highway, town, rail, trains
derived | word, meaning, name, greek, derives
...
state | federal, states, legislature, government, rhode
there | are, every, have, still, is
is | are, a, it, in, every
years | year, months, over, males, migration
also | list, see, are, include, such
been | has, discovered, remains, found, have
where | a, left, square, right, regular
often | more, are, most, 

three | two, four, one, five, zero
however | even, have, it, restrict, not
when | it, he, to, after, necessary
four | three, five, two, one, zero
from | the, by, to, through, and
where | left, to, it, this, be
had | were, was, never, after, to
between | point, divide, verdicts, within, spatial
defense | defence, military, guard, agency, enemy
powers | sovereign, monarch, parliamentary, exercise, entities
shows | show, appearing, scenes, shown, exactly
road | roads, rail, highway, railroad, railway
accepted | valid, regard, ecumenical, universally, councils
scale | scales, measured, large, measuring, masses
magazine | book, newspaper, interview, magazines, comics
smith | young, sidney, james, joseph, adam
...
between | within, presence, spatial, forms, divide
were | have, had, early, made, though
time | clock, would, computers, teammate, slow
from | the, to, by, which, of
was | later, had, became, decade, returned
so | if, it, to, make, they
this | that, be, not, it, can
one | three, fi

however | have, separate, among, not, bureaucracies
has | been, that, than, it, popular
if | must, any, we, x, given
would | could, it, forced, intentions, but
four | three, one, two, five, zero
also | see, list, other, with, the
more | much, than, often, even, very
have | are, these, among, such, were
shown | show, genetically, be, is, nontrivial
question | questions, answer, answers, how, whether
know | you, we, knows, how, nothing
mainly | various, other, including, most, ethnic
older | family, median, age, household, females
mean | meaning, derives, equivalents, means, is
ice | glaciation, glaciers, soft, cream, rocky
rise | fall, early, grew, increasing, peaked
...
that | it, this, not, has, must
people | americans, natives, ethnic, politicians, births
not | that, be, should, do, any
this | a, that, of, changes, it
system | systems, operating, applications, peripheral, limitations
states | united, kingdom, nation, citizen, countries
it | that, if, because, is, this
be | should, ca

united | states, kingdom, countries, british, u
to | in, their, the, them, thus
four | one, seven, two, three, five
were | had, was, century, lost, the
system | systems, telephone, cellular, satellite, done
only | instead, are, when, not, than
see | list, article, external, of, also
more | than, much, but, less, often
file | files, user, format, data, formats
san | diego, francisco, antonio, california, juan
additional | requirements, coupes, nims, fourteen, example
resources | resource, arable, management, hydropower, natural
defense | defence, agency, u, armed, military
orthodox | church, catholic, orthodoxy, communion, apostolic
smith | thomas, keith, adam, sidney, jones
engineering | biology, engineers, discipline, genetics, design
...
this | that, the, is, thus, but
not | they, that, should, do, exist
be | can, it, are, not, that
four | one, three, two, five, seven
these | are, have, such, various, other
or | usually, a, is, are, such
most | are, particularly, produced, in, such
s

united | states, u, kingdom, british, navy
they | do, not, their, could, them
if | we, can, x, function, any
be | can, if, example, could, cannot
have | several, that, many, been, these
had | to, was, were, who, later
an | a, for, the, first, with
six | one, five, seven, three, four
report | reports, commission, discussion, review, documents
experience | experiences, sensations, spiritual, emphasis, mental
channel | channels, cable, fi, fm, broadcast
engine | engines, piston, combustion, fuel, thrust
liberal | party, conservative, socialist, wing, opposition
ice | glaciation, glacial, winter, glaciers, soft
operating | unix, dos, os, functionality, microsoft
account | accounts, billion, money, transactions, sources
...
where | he, in, left, moved, near
all | other, were, are, of, the
state | states, constitution, federal, legislature, u
i | me, you, t, my, myself
he | his, him, later, was, went
and | nine, of, one, two, eight
was | later, in, a, to, the
or | usually, such, used, are, u

than | less, larger, measured, total, not
be | could, can, cannot, not, if
most | many, although, known, as, distinct
five | two, zero, six, three, four
many | other, most, include, are, such
he | his, him, gave, was, later
which | the, typically, system, is, a
and | of, the, also, a, other
freedom | freedoms, liberty, rights, liberties, speech
proposed | proposal, proposes, approved, agreement, proposals
experience | experiences, visual, psychiatric, spiritual, mental
arts | art, sciences, college, academy, school
shown | is, shows, p, common, discovered
prince | dukes, son, eldest, queen, princes
applied | apply, theory, sciences, nonlinear, linguistics
report | reports, commission, evidence, documents, terrorism
...
all | are, these, only, were, and
on | s, february, the, july, a
who | born, singer, their, former, immigrant
some | such, although, many, most, still
which | the, a, consists, part, is
have | are, other, several, many, but
a | s, as, and, in, the
so | it, be, could, not

two | four, five, three, zero, one
world | nations, war, during, largest, olympics
other | are, associated, these, types, such
would | that, could, had, decided, concluded
from | the, south, north, comes, eastern
his | he, was, himself, her, later
no | not, myself, be, existence, actual
up | tight, back, side, kick, bowl
numerous | and, mostly, compiled, frequent, among
event | events, extinction, wins, occurred, final
report | reports, commission, evidence, detainees, documents
versions | version, windows, microsoft, xp, mac
marriage | marriages, marry, spouse, divorce, her
existence | belief, deities, exist, supernatural, ontological
hold | not, stance, person, that, deceive
award | awards, best, academy, oscar, emmy
...
four | five, three, two, one, nine
states | united, state, nations, u, countries
all | these, only, are, other, for
some | many, such, these, most, have
in | the, of, and, first, a
most | are, some, other, many, but
between | west, differences, eastern, the, separate

nine | one, two, eight, seven, five
of | the, three, and, two, by
first | was, second, in, had, third
on | off, night, november, the, and
the | of, in, other, by, is
were | they, had, the, built, guns
there | every, are, is, or, exceeding
had | was, were, first, him, took
shown | proven, mucilage, fibrous, riff, shows
joseph | latter, mormon, paul, smith, baptise
rise | decline, rising, towards, increased, grew
engine | engines, powered, mechanical, piston, motors
numerous | and, various, frequent, other, among
police | officers, paramilitary, guard, security, arrested
nobel | prize, laureate, physicist, chemist, physiology
derived | name, derives, etymology, meaning, word
...
b | d, one, politician, writer, composer
use | used, such, these, standard, inexpensive
states | united, countries, nations, state, kingdom
to | they, the, usually, when, a
i | you, me, t, we, am
after | before, year, during, weeks, was
other | many, of, and, the, usually
it | in, that, is, still, was
universe | 

up | out, back, get, they, were
use | used, such, common, standard, or
that | all, be, thus, say, which
it | is, so, be, exist, if
over | zero, four, years, from, six
which | be, form, is, a, called
war | army, troops, allied, soldiers, forces
eight | nine, one, four, six, three
behind | back, ahead, lateral, yankees, front
existence | universe, exist, ontological, belief, things
rise | decline, increased, rising, peaked, grew
road | highway, west, rail, roads, highways
file | files, user, data, format, formats
smith | sidney, roberts, young, miller, jr
know | we, you, knows, thing, nobody
woman | children, she, women, husband, birth
...
are | is, have, usually, or, these
four | two, three, five, seven, zero
who | him, they, was, his, their
zero | two, four, five, three, six
most | are, many, some, other, these
be | can, this, which, should, to
which | be, is, a, form, that
united | states, kingdom, countries, america, panama
marriage | marriages, marry, spouse, wife, her
road | highwa

their | they, them, themselves, tend, these
was | had, after, early, s, during
people | deaths, births, politicians, activists, americans
nine | one, eight, three, zero, two
seven | one, eight, five, four, six
war | troops, soviet, forces, soldiers, army
many | some, well, more, several, most
may | or, are, occur, usually, sometimes
smith | adam, jefferson, keith, joseph, gordon
animals | animal, prey, mammals, humans, predators
quite | very, than, like, rounded, always
engineering | engineers, technology, engineer, education, biomedical
award | awards, academy, oscars, nominations, best
report | reports, commission, review, january, notified
numerous | become, early, many, and, various
universe | cosmology, bang, galaxies, universes, cosmic
...
states | united, treaties, treaty, establishment, u
s | his, was, of, and, nine
d | b, one, seven, statesman, writer
three | two, four, one, nine, six
people | deaths, births, politicians, americans, activists
th | century, rd, nd, st, centurie

it | if, this, not, so, they
see | article, links, also, external, history
from | of, the, to, in, at
known | important, named, referred, called, originated
by | the, and, in, of, s
in | the, also, by, of, a
or | is, usually, are, may, other
are | all, other, is, these, have
centre | centres, largest, situated, city, gallery
shows | show, showing, theatrical, michaels, guest
paris | la, de, le, france, mie
derived | word, meaning, name, suffix, derives
professional | juris, engineering, bachelor, professionals, amateur
heavy | fog, storms, humidity, metal, low
prince | emperor, queen, reigned, empress, pretender
creation | created, conceived, themes, inspired, god
...
had | were, remained, meantime, was, afterwards
it | if, this, so, they, is
first | s, also, was, a, final
one | four, three, seven, two, nine
three | four, one, two, seven, zero
most | many, in, some, such, modern
he | his, him, himself, wrote, she
for | a, of, tools, allows, which
running | ran, run, weicker, grub, fini

but | to, however, had, they, the
one | eight, three, two, zero, seven
by | of, the, in, as, to
zero | three, five, four, two, one
can | be, requires, or, use, usually
this | the, not, as, thus, that
only | the, except, any, be, but
six | seven, five, four, eight, one
bbc | programmes, news, television, links, profile
rise | decline, increased, peaked, europe, fall
egypt | egyptian, israel, arab, pharaoh, cairo
shows | show, theatrical, guest, television, wow
square | approximately, sq, coastline, cubic, km
proposed | proposal, approved, agreement, theories, signatory
brother | son, father, daughter, sisters, mother
recorded | record, kush, khz, recordings, records
...
s | of, and, in, the, secretary
his | he, him, himself, father, friend
th | century, rd, nd, st, lost
often | usually, such, some, or, are
than | less, fewer, large, are, higher
would | could, be, to, when, even
between | differences, both, to, neutral, assert
was | until, first, in, later, became
account | estimates, ac

four | five, three, two, six, zero
was | a, the, in, had, after
about | zero, approximately, five, there, three
where | left, moved, apartments, separated, near
people | living, americans, demographics, citizens, activists
at | near, years, home, zero, in
all | are, these, not, other, there
they | themselves, their, not, do, these
alternative | such, based, used, type, abbreviations
troops | army, war, forces, soldiers, battle
paris | france, la, de, mie, acad
pope | papal, antipope, papacy, church, gregory
joseph | benjamin, biography, james, sympathetic, louis
frac | cdot, equation, theta, mathbf, cos
existence | disembodied, belief, demons, universe, philosophers
writers | novelists, authors, fiction, poets, births
...
d | b, composer, seven, eight, writer
during | after, the, was, period, continued
see | list, external, links, disambiguation, article
for | a, and, as, including, in
world | war, international, of, nine, ii
three | two, four, five, six, one
is | are, or, refers, also
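
Each line of the output above pairs a word with five words that sit close to it in the embedding space. As a rough illustration of how such a list can be produced, here is a minimal sketch that assumes closeness is measured by cosine similarity. The `nearest_words` helper and the toy vectors are hypothetical and only meant to show the idea; they are not the trained embeddings.

```python
import numpy as np

def nearest_words(embeddings, int_to_vocab, word_idx, top_k=5):
    """Return the top_k words closest to word_idx by cosine similarity."""
    # Normalize each row so that a dot product equals cosine similarity.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / norms
    sims = unit @ unit[word_idx]
    # Sort from most to least similar and drop the query word itself.
    order = np.argsort(-sims)
    order = order[order != word_idx][:top_k]
    return [int_to_vocab[int(i)] for i in order]

# Toy, hand-made 2D vectors (assumption: illustrative only, not trained).
toy_vocab = {0: "king", 1: "queen", 2: "apple", 3: "pear"}
toy_vectors = np.array([[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]])

print(nearest_words(toy_vectors, toy_vocab, word_idx=0, top_k=2))
# ['queen', 'pear']
```

With real embeddings you would pass the full weight matrix and the notebook's `int_to_vocab` lookup instead of the toy values.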

## Visualizing the word vectors

Below we'll use t-SNE to visualize how our high-dimensional word vectors cluster together. t-SNE projects these vectors into two dimensions while preserving local structure. Check out [this post from Christopher Olah](http://colah.github.io/posts/2014-10-Visualizing-MNIST/) to learn more about t-SNE and other ways to visualize high-dimensional data.
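
To get a feel for what "preserving local structure" means before running it on real embeddings, here is a small sketch on synthetic data. The three clusters, their sizes, and the `perplexity` and `random_state` values are all assumptions chosen for illustration; they are not settings used for the trained model.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Hypothetical stand-in for word vectors: three well-separated clusters
# of 50 points each in 300 dimensions.
centers = rng.normal(scale=10.0, size=(3, 300))
toy_embeddings = np.vstack([c + rng.normal(size=(50, 300)) for c in centers])
toy_labels = np.repeat(np.arange(3), 50)

# Project to two dimensions. Perplexity must be smaller than the number of samples.
toy_tsne = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(toy_embeddings)

print(toy_tsne.shape)  # (150, 2)
```

Plotting `toy_tsne` colored by `toy_labels` should show points from the same cluster landing near each other, which is the behavior we rely on when reading the word plot.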

In [None]:
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

In [None]:
# Get the learned embeddings from the model's embedding layer (named `embed`) as a NumPy array
embeddings = model.embed.weight.detach().cpu().numpy()

In [None]:
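# Number of words to plot (the first `viz_words` rows of the embedding matrix)
viz_words = 600
# Project those embeddings down to two dimensions
tsne = TSNE()
embed_tsne = tsne.fit_transform(embeddings[:viz_words, :])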

In [None]:
fig, ax = plt.subplots(figsize=(16, 16))
for idx in range(viz_words):
    # Plot each word's 2D position and label the point with the word itself
    ax.scatter(*embed_tsne[idx, :], color='steelblue')
    ax.annotate(int_to_vocab[idx], (embed_tsne[idx, 0], embed_tsne[idx, 1]), alpha=0.7)