# Papers

In this notebook we review four main papers on **Dynamic Vocabulary**, **Transfer Learning** and **Neural Machine Translation**, selected from different sources.

Each paper is covered under five topics: **Goals**, **Approach**, **Experiments**, **Results** and **Thoughts**.

<img src="https://www.nature.com/scitable/content/ne0000/ne0000/ne0000/ne0000/14239512/ECS_scientific-papers_ksm.jpg" />

Below is the list of papers reviewed here. A longer list, which keeps growing, is available in this repository in **RESOURCES-ENG.md**.


|Year|Title|Author|Link|
|----------------|-------------------------------|----------------|-------------------------------|
|2019|Hierarchical Transfer Learning Architecture for Low-Resource Neural Machine Translation|Gongxu Luo, et al.|[`PDF`](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8805098)|
|2018|Transfer Learning in Multilingual Neural Machine Translation with Dynamic Vocabulary|Surafel M. Lakew, et al.|[`PDF`](https://arxiv.org/pdf/1811.01137.pdf)|
|2018|Twitter Sentiment Analysis using Dynamic Vocabulary|Hrithik Katiyar, et al.|[`PDF`](https://ieeexplore.ieee.org/document/8722407)|
|2017|Neural Response Generation with Dynamic Vocabularies|Yu Wu, et al.|[`PDF`](https://arxiv.org/pdf/1711.11191.pdf)|

## 0. Dependencies

In [68]:
import torch
import torch.nn as nn
import numpy as np

## 1. How it works

## 2. Transfer Learning in Multilingual Neural Machine Translation with Dynamic Vocabulary

### Abstract

We propose a method to **transfer knowledge** across neural machine translation (NMT) models by means of a shared **dynamic vocabulary**. Our approach allows to extend an initial model for a given language pair to cover new languages by **adapting its vocabulary as long as new data become available** (i.e., introducing new vocabulary items if they are not included in the initial model). The parameter transfer mechanism is evaluated in two scenarios: i) to adapt a trained single language NMT system to work with a new language pair and ii) to continuously add new language pairs to grow to a multilingual NMT system. In both the scenarios our goal is to
improve the translation performance, while minimizing the training convergence time. Preliminary experiments spanning five languages with different training data sizes (i.e., 5k and 50k parallel sentences) show a significant performance **gain ranging from +3.85 up to +13.63 BLEU** in different language directions. Moreover, when compared with training an NMT model from scratch, **our transfer-learning approach** allows us to reach higher performance after training up to 4% of the total training steps.

### 2.1. Paper Goals

Explore a transfer-learning technique for **Multilingual Neural Machine Translation** using dynamic vocabularies (e.g., German to English, Italian to English).

![Image](media/MNTL_Diagram.png)

The idea is to work like Google Translate, but of course with a smaller vocabulary.

### 2.2. Approach


![Image](media/Approach.png)

The paper defines two transfer strategies:

• *progAdapt*, in which progressive updates are made on the assumption that new target NMT task data become available for one language direction at a time (i.e., new language directions are covered sequentially). In this condition, the goal is to maximize performance on the new target tasks by taking advantage of parameters learned in their parent task;

• *progGrow*, in which progressive updates are made on the same assumption of receiving new target task data as in *progAdapt*, but with the additional goal of preserving the performance of the previous language directions.
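The difference between the two strategies can be pictured as a training schedule. The sketch below is only an illustration: the function name `training_stages` is hypothetical, and it assumes that *progGrow* preserves earlier directions by keeping their data in every later stage, which is one plausible reading and is not spelled out in the text above. The language-pair codes follow the pairs used in the experiments section.

```python
def training_stages(initial_pair, new_pairs, strategy):
    """Return, for each training stage, the language pairs whose data is used."""
    if strategy not in ("progAdapt", "progGrow"):
        raise ValueError(f"unknown strategy: {strategy}")
    stages = [[initial_pair]]  # stage 0: the initial (parent) model
    for pair in new_pairs:
        if strategy == "progAdapt":
            # Only the newly arrived direction matters in this stage.
            stages.append([pair])
        else:
            # Assumption: earlier directions stay in the training data
            # so their performance is preserved.
            stages.append(stages[-1] + [pair])
    return stages


adapt = training_stages("de-en", ["it-en", "ro-en", "nl-en"], "progAdapt")
grow = training_stages("de-en", ["it-en", "ro-en", "nl-en"], "progGrow")
print(adapt)
print(grow)
```

In both cases each stage would start from the parameters of the previous stage's model; only the data seen in the stage differs.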

For the **Dynamic Vocabulary**, the approach keeps the entries shared by the old and the new vocabulary (the intersection) and introduces entries for items that appear only in the new vocabulary. At training time, the new entries are randomly initialized, while the intersecting items keep the embeddings of the former model.

For example, imagine that our vocabulary has only **2** words (*hello* and *world*). The code below shows what our word embedding looks like at first.

In [69]:
word2index = {"hello": 0, "world": 1}
embeds = nn.Embedding(2, 5) # 2 words in vocab, 5 dimensional embeddings
hello_embed = embeds(torch.tensor([word2index["hello"]], dtype=torch.long))
world_embed = embeds(torch.tensor([word2index["world"]], dtype=torch.long))
print(hello_embed, world_embed)

tensor([[-0.4538, -1.6900, -0.6329,  0.7748, -0.0754]],
       grad_fn=<EmbeddingBackward>) tensor([[ 1.5907,  0.0649,  1.3250, -0.3813,  1.4161]],
       grad_fn=<EmbeddingBackward>)
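It may help to see that `nn.Embedding` is simply a lookup table: calling it with word indices returns the matching rows of its weight matrix. The short check below is a sketch that repeats the toy vocabulary above; the seed is arbitrary and only makes the run repeatable.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for repeatability
word2index = {"hello": 0, "world": 1}
embeds = nn.Embedding(len(word2index), 5)

# Look up both words in one call; the result has one row per index.
indices = torch.tensor([word2index["hello"], word2index["world"]], dtype=torch.long)
vectors = embeds(indices)

# The lookup returns exactly the corresponding rows of the weight matrix.
same_rows = torch.equal(vectors, embeds.weight[indices])
print(vectors.shape, same_rows)
```

This is why transferring a vocabulary entry between models amounts to copying one row of the weight matrix.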


Now suppose you want to add one more word (*keyboard*). In the proposed approach, the new *Embedding* keeps the weights already trained and concatenates a new, randomly initialized weight vector for *keyboard*.

In [70]:
word2index = {"hello": 0, "world": 1, "keyboard": 2} # updated vocabulary
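# Stack the two trained rows with one new row for "keyboard"
concat_embeds = torch.cat([
    hello_embed.detach(), # old embed, shape (1, 5)
    world_embed.detach(), # old embed, shape (1, 5)
    torch.rand(1, 5) # new embed initialized randomly
])
# from_pretrained freezes the weights by default; pass freeze=False to keep training them
embeds = nn.Embedding.from_pretrained(concat_embeds) # 3 words in vocab, 5 dimensional embeddings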
hello_embed = embeds(torch.tensor([word2index["hello"]], dtype=torch.long))
world_embed = embeds(torch.tensor([word2index["world"]], dtype=torch.long))
keyboard_embed = embeds(torch.tensor([word2index["keyboard"]], dtype=torch.long))
print(hello_embed, world_embed, keyboard_embed)

tensor([[-0.4538, -1.6900, -0.6329,  0.7748, -0.0754]]) tensor([[ 1.5907,  0.0649,  1.3250, -0.3813,  1.4161]]) tensor([[0.3461, 0.9191, 0.0903, 0.3031, 0.0520]])
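The same idea can be written for arbitrary vocabularies. The helper below is a minimal sketch, not the authors' implementation: the name `expand_embedding` and its `freeze` argument are hypothetical, and it assumes that words missing from the new vocabulary are simply dropped. Shared words keep their trained rows even if their index changes, and new words keep the random initialization of a fresh `nn.Embedding`.

```python
import torch
import torch.nn as nn


def expand_embedding(old_embeds, old_word2index, new_word2index, freeze=False):
    """Build an embedding for new_word2index, reusing rows of shared words."""
    dim = old_embeds.embedding_dim
    # A fresh layer gives every row a random initialization.
    new_embeds = nn.Embedding(len(new_word2index), dim)
    with torch.no_grad():
        for word, new_idx in new_word2index.items():
            old_idx = old_word2index.get(word)
            if old_idx is not None:
                # Shared entry: copy the trained vector from the old model.
                new_embeds.weight[new_idx] = old_embeds.weight[old_idx]
    # Keep the weights trainable unless the caller asks to freeze them.
    new_embeds.weight.requires_grad_(not freeze)
    return new_embeds


torch.manual_seed(0)  # arbitrary seed, only for repeatability
old_word2index = {"hello": 0, "world": 1}
old_embeds = nn.Embedding(len(old_word2index), 5)

# "world" moves to index 0, "keyboard" is new, "hello" is no longer present.
new_word2index = {"world": 0, "keyboard": 1}
new_embeds = expand_embedding(old_embeds, old_word2index, new_word2index)
print(new_embeds.weight)
```

Copying into a fresh layer, rather than calling `nn.Embedding.from_pretrained`, leaves the weights trainable by default, which is what continued training on the new language pair needs.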


### 2.3. Experiments

The experimental setting includes the initial model's language pair (**German-English**) and three additional language pairs (**Italian-English**, **Romanian-English**, and **Dutch-English**) for testing the proposed approaches.

The baseline models, referred to as Bi-NMT, are trained separately from scratch in a bi-directional setting (i.e., source ↔ target). In addition, the authors report scores from a multilingual (M-NMT) model trained on the concatenation of all available data at each training stage.
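To make the two baselines concrete, the sketch below builds training examples for both settings from toy data. Everything in it is illustrative: the helper `bidirectional_examples` is hypothetical, the sentences are made up, and the `<2xx>` tag that tells the model which language to produce is an assumed convention common in multilingual NMT, not something stated in the text above.

```python
def bidirectional_examples(sentence_pairs, src_lang, tgt_lang):
    """Turn (source, target) sentence pairs into examples for both directions."""
    examples = []
    for src_sentence, tgt_sentence in sentence_pairs:
        # Assumed convention: a tag on the input names the language to produce.
        examples.append((f"<2{tgt_lang}> {src_sentence}", tgt_sentence))
        examples.append((f"<2{src_lang}> {tgt_sentence}", src_sentence))
    return examples


# Made-up toy data, not taken from the paper's corpora.
it_en = [("ciao mondo", "hello world")]
de_en = [("hallo welt", "hello world")]

# Bi-NMT: one model per language pair, trained on both directions of that pair.
bi_nmt_it_en = bidirectional_examples(it_en, "it", "en")

# M-NMT: one model trained on the concatenation of all data available so far.
m_nmt_data = bidirectional_examples(de_en, "de", "en") + bi_nmt_it_en

print(bi_nmt_it_en)
print(len(m_nmt_data))
```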

### 2.4. Results

![Image](media/ResultGrowAdapted.png)