# Naive word2vec

This task can be formulated very simply. Follow this [paper](https://arxiv.org/pdf/1411.2738.pdf) and implement word2vec as a two-layer neural network with matrices $W$ and $W'$. The first matrix projects words into a low-dimensional 'hidden' space, and the second projects them back into the high-dimensional vocabulary space.

![word2vec](https://i.stack.imgur.com/6eVXZ.jpg)
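
The two-matrix architecture described above could be sketched in PyTorch roughly as follows. This is only a minimal illustration: the class name `NaiveSkipGram` and its attribute names are hypothetical and are not the `SkipGram` class imported later in this notebook. It assumes that the output layer has no bias and that the model returns log-probabilities.

```python
import torch
import torch.nn as nn


class NaiveSkipGram(nn.Module):
    """Hypothetical two-layer word2vec model: W projects in, W' projects out."""

    def __init__(self, vocab_size: int, embedding_dim: int):
        super().__init__()
        # W: vocabulary -> hidden space (row i is the vector of word i)
        self.W = nn.Embedding(vocab_size, embedding_dim)
        # W': hidden space -> one score per vocabulary word (assumed bias-free)
        self.W_prime = nn.Linear(embedding_dim, vocab_size, bias=False)

    def forward(self, word_ids: torch.Tensor) -> torch.Tensor:
        hidden = self.W(word_ids)      # shape: (batch, embedding_dim)
        scores = self.W_prime(hidden)  # shape: (batch, vocab_size)
        # Log-probabilities over the vocabulary, usable with torch.nn.NLLLoss
        return torch.log_softmax(scores, dim=1)


# Tiny usage example with made-up sizes
model_sketch = NaiveSkipGram(vocab_size=10, embedding_dim=4)
log_probs_sketch = model_sketch(torch.tensor([1, 3, 5]))
```

Using an embedding lookup for $W$ is equivalent to multiplying a one-hot vector by $W$, but avoids building the one-hot vectors explicitly.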

You can use TensorFlow or PyTorch, and you can reuse code from your previous task. NumPy is also fine if you enjoy deriving gradients yourself and want some extra points, but don't forget to check your gradients numerically. Again: you don't have to implement negative sampling, and you may reduce your vocabulary size for faster computation.
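
For the NumPy route, a numerical gradient check might look like the sketch below. It assumes a full-softmax cross-entropy loss on a single (center, target) pair, with $W$ of shape (vocab, dim) and $W'$ of shape (dim, vocab). The function names `loss_and_grads` and `numerical_grad` are made up for this illustration.

```python
import numpy as np


def loss_and_grads(W, W_prime, center, target):
    """Softmax cross-entropy for one (center, target) pair, with analytic gradients."""
    h = W[center]                        # hidden vector, shape (dim,)
    scores = h @ W_prime                 # shape (vocab,)
    scores = scores - scores.max()       # shift for numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    loss = -np.log(probs[target])

    d_scores = probs.copy()
    d_scores[target] -= 1.0              # dL/dscores = probs - one_hot(target)
    grad_W_prime = np.outer(h, d_scores)
    grad_W = np.zeros_like(W)
    grad_W[center] = W_prime @ d_scores  # only the center word's row gets gradient
    return loss, grad_W, grad_W_prime


def numerical_grad(f, x, eps=1e-5):
    """Central-difference gradient of scalar f() w.r.t. x (x is perturbed in place, then restored)."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        old = x[idx]
        x[idx] = old + eps
        f_plus = f()
        x[idx] = old - eps
        f_minus = f()
        x[idx] = old
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad


# Small random problem: 6 words, 3 hidden dimensions
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(6, 3))
W_prime = rng.normal(scale=0.1, size=(3, 6))
center, target = 2, 4

_, grad_W, grad_W_prime = loss_and_grads(W, W_prime, center, target)
f = lambda: loss_and_grads(W, W_prime, center, target)[0]
max_err_W = np.abs(grad_W - numerical_grad(f, W)).max()
max_err_W_prime = np.abs(grad_W_prime - numerical_grad(f, W_prime)).max()
```

If the analytic gradients are right, both maximum errors should be tiny compared with the gradient values themselves.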

**Results of this task**:
 * trained word vectors (mention somewhere, how long it took to train)
 * plotted loss (so we can see that it has converged)
 * function to map token to corresponding word vector
 * beautiful visualizations (PCA, t-SNE), you can use TensorBoard and play with your vectors in 3D (don't forget to add screenshots to the task)
 * qualitative evaluations of word vectors: nearest neighbors, word analogies
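
The token lookup and the qualitative evaluations listed above could be sketched like this. The vocabulary and the 3-dimensional vectors are hand-made toy data chosen so that the analogy works. In the real task, `embeddings` would be the trained $W$ matrix and `token_to_id` would come from your own vocabulary. All names here are hypothetical.

```python
import numpy as np

# Toy stand-ins for a real vocabulary and trained embedding matrix
vocab = ["king", "queen", "man", "woman", "apple"]
token_to_id = {token: i for i, token in enumerate(vocab)}
embeddings = np.array([
    [1.0, 1.0, 0.0],    # king
    [1.0, 0.0, 1.0],    # queen
    [0.0, 1.0, 0.0],    # man
    [0.0, 0.0, 1.0],    # woman
    [-1.0, 0.2, 0.2],   # apple
])


def vector(token):
    """Map a token to its word vector."""
    return embeddings[token_to_id[token]]


def nearest(query_vec, k=3, exclude=()):
    """Return the k tokens with the highest cosine similarity to query_vec."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ (query_vec / np.linalg.norm(query_vec))
    ranked = [vocab[i] for i in np.argsort(-sims) if vocab[i] not in exclude]
    return ranked[:k]


def analogy(a, b, c, k=1):
    """a is to b as c is to ?  Uses vector(b) - vector(a) + vector(c)."""
    query = vector(b) - vector(a) + vector(c)
    return nearest(query, k=k, exclude={a, b, c})


neighbors_of_king = nearest(vector("king"), k=2, exclude={"king"})
man_king_woman = analogy("man", "king", "woman")
```

Excluding the query words from the candidates is a common convention for analogy tests, since otherwise one of the input words is often the closest match.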

**Extra:**
 * quantitative evaluation:
   * for intrinsic evaluation you can find datasets [here](https://aclweb.org/aclwiki/Analogy_(State_of_the_art))
   * for extrinsic evaluation you can use [these](https://medium.com/@dataturks/rare-text-classification-open-datasets-9d340c8c508e)

You can also use any other datasets for quantitative evaluation. If you choose to do this, please use the same datasets across tasks 3, 4, 5 and 6.

Again: it is **highly recommended** to read this [paper](https://arxiv.org/pdf/1411.2738.pdf).

Example of visualization in TensorBoard:
https://projector.tensorflow.org

Example of 2D visualisation:

![2dword2vec](https://www.tensorflow.org/images/tsne.png)
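
One way to get 2D coordinates for such a plot is PCA, which can be computed with a plain SVD. The sketch below assumes the word vectors are stored as rows of a NumPy array. Random data stands in for trained embeddings, and `pca_2d` is a hypothetical helper name.

```python
import numpy as np


def pca_2d(vectors):
    """Project row vectors onto their first two principal components."""
    centered = vectors - vectors.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


rng = np.random.default_rng(0)
fake_vectors = rng.normal(size=(50, 100))  # stand-in for 50 trained 100-d vectors
points_2d = pca_2d(fake_vectors)           # shape (50, 2), ready for a scatter plot
```

Each row of `points_2d` can then be drawn as a labeled point with whatever plotting library you prefer.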

If you struggle with something, ask your neighbor. If it is not obvious to you, someone else is probably looking for the answer too. And conversely, if you see that you can help someone, do it! Good luck!

In [5]:
from skipgram import SkipGram, SkipGramBatcher
import torch
import gc

#### Constants

In [6]:
VOCAB_SIZE = 5000
BATCH_SIZE = 50
EMBEDDINGS_DIM = 100
EPOCH_NUM = 2
LOGS_PERIOD = 100

#### Load corpus into batcher

In [10]:
text = []
with open('./data/text8', 'r') as text8:
    text = text8.read().split()

# text = ['first', 'used', 'against', 'early', 'working', 'class', 'radicals', 'including', 'class', 'other']
batcher = SkipGramBatcher(corpus=text, vocab_size=VOCAB_SIZE, batch_size=BATCH_SIZE)

# free memory
text = []
gc.collect()

0

In [11]:
loss_history = []
model = SkipGram(VOCAB_SIZE, EMBEDDINGS_DIM)
loss_fun = torch.nn.NLLLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)

In [12]:
for epoch in range(EPOCH_NUM):
    for i, (context, target) in enumerate(batcher):
        # Word indices must be int64 tensors for embedding lookup and NLLLoss
        tensor_context = torch.from_numpy(context).long()
        tensor_target = torch.from_numpy(target).long()

        # Clear gradients accumulated on the previous step
        model.zero_grad()

        log_probs = model(tensor_context)
        loss = loss_fun(log_probs, tensor_target)
        loss.backward()
        optimizer.step()

        if i % LOGS_PERIOD == 0:
            # .item() stores a plain Python float instead of a tensor
            print(f'Loss on step {i}: {loss.item()}')
            loss_history.append(loss.item())

Loss on step 0: 4.043684005737305
Loss on step 100: 4.048892498016357
Loss on step 200: 4.153383731842041
Loss on step 300: 4.055236339569092
Loss on step 400: 3.9818148612976074
Loss on step 500: 4.088200569152832
Loss on step 600: 3.8901748657226562
Loss on step 700: 3.968705415725708
Loss on step 800: 4.036896705627441
Loss on step 900: 3.9744069576263428
Loss on step 1000: 4.024870872497559
Loss on step 1100: 4.062151908874512
Loss on step 1200: 4.134037017822266
Loss on step 1300: 4.009776592254639


KeyboardInterrupt: 
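
The logged losses above fluctuate between roughly 3.9 and 4.15, so a trend is hard to read from the raw values. A moving average is one simple way to smooth the curve before plotting it. The sketch below uses the logged values rounded to four decimals. The window size of 5 is an arbitrary choice and `moving_average` is a hypothetical helper.

```python
import numpy as np

# Losses printed above (steps 0 to 1300), rounded to four decimals
logged = np.array([4.0437, 4.0489, 4.1534, 4.0552, 3.9818, 4.0882, 3.8902,
                   3.9687, 4.0369, 3.9744, 4.0249, 4.0622, 4.1340, 4.0098])


def moving_average(values, window=5):
    """Average each run of `window` consecutive values."""
    kernel = np.ones(window) / window
    # mode="valid" keeps only positions where the full window fits
    return np.convolve(values, kernel, mode="valid")


smoothed = moving_average(logged, window=5)
```

The smoothed series is shorter than the input by `window - 1` points, which matters when aligning it with step numbers on the x-axis.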