<a href="https://colab.research.google.com/github/arnaujc91/experiments/blob/main/EmbeddingDropout_new.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install fastai==2.0.16

In [None]:
from fastai.text.all import *

As the code below shows, the class `EmbeddingDropout` uses `emb` (`self.encoder` in the `AWD_LSTM` class) only to fetch its attributes: *weight, scale_grad_by_freq, norm_type*, etc. It is much simpler to subclass `nn.Embedding` instead: the attributes we need are then *already* part of the class, and we no longer have to create an instance of `nn.Embedding` and pass it to the constructor of `EmbeddingDropout`, as currently happens.

In [None]:
# CURRENT CODE
class EmbeddingDropout(Module):
    "Apply dropout with probability `embed_p` to an embedding layer `emb`."

    def __init__(self, emb, embed_p):
        # self.emb is an instance of the class 'nn.Embedding'
        self.emb,self.embed_p = emb,embed_p

    def forward(self, words, scale=None):
        if self.training and self.embed_p != 0:
            size = (self.emb.weight.size(0),1)
            mask = dropout_mask(self.emb.weight.data, size, self.embed_p)
            masked_embed = self.emb.weight * mask
        else: masked_embed = self.emb.weight
        if scale: masked_embed.mul_(scale)
        return F.embedding(words, masked_embed, ifnone(self.emb.padding_idx, -1), self.emb.max_norm,
                           self.emb.norm_type, self.emb.scale_grad_by_freq, self.emb.sparse)
        
# MY PROPOSAL
class EmbeddingDropout(nn.Embedding):
    "Apply dropout with probability `embed_p` to an embedding layer `emb`."
    def __init__(self, *args, embed_p, **kwargs):
        # Instead of receiving a previously created 'nn.Embedding' instance,
        # we inherit directly from 'nn.Embedding', so what used to be 'self.emb' is now simply 'self'.
        # This avoids the redundancy of first creating an 'nn.Embedding' instance
        # and then passing it to the constructor.
        super().__init__(*args, **kwargs)
        self.embed_p = embed_p

    def forward(self, words, scale=None):
        if self.training and self.embed_p != 0:
            size = (self.weight.size(0),1)
            mask = dropout_mask(self.weight.data, size, self.embed_p)
            masked_embed = self.weight * mask
        else: masked_embed = self.weight
        if scale: masked_embed.mul_(scale)
        return F.embedding(words, masked_embed, ifnone(self.padding_idx, -1), self.max_norm,
                           self.norm_type, self.scale_grad_by_freq, self.sparse)


**IMPORTANT**: With this change there is only one layer, `self.encoder`, instead of the two layers `self.encoder` and `self.encoder_dp` as before. The tests therefore fail, because they always expect two layers. The change will also affect other parts of the code that expect two layers instead of one.

## What is the problem with the current code?

### 1. First issue

First of all, the function `flatten_model`, which is used to create the hooks for a given model, does not work as expected.

In [None]:
awd_lstm =  AWD_LSTM(vocab_sz=3,
                  emb_sz=5,
                  n_hid=6,
                  n_layers=2)

In [None]:
awd_lstm

AWD_LSTM(
  (encoder): Embedding(3, 5, padding_idx=1)
  (encoder_dp): EmbeddingDropout(
    (emb): Embedding(3, 5, padding_idx=1)
  )
  (rnns): ModuleList(
    (0): WeightDropout(
      (module): LSTM(5, 6, batch_first=True)
    )
    (1): WeightDropout(
      (module): LSTM(6, 5, batch_first=True)
    )
  )
  (input_dp): RNNDropout()
  (hidden_dps): ModuleList(
    (0): RNNDropout()
    (1): RNNDropout()
  )
)

The following output shows how the `Embedding` layer is **duplicated**.

In [None]:
modules = flatten_model(awd_lstm); modules

[Embedding(3, 5, padding_idx=1),
 Embedding(3, 5, padding_idx=1),
 LSTM(5, 6, batch_first=True),
 ParameterModule(),
 LSTM(6, 5, batch_first=True),
 ParameterModule(),
 RNNDropout(),
 RNNDropout(),
 RNNDropout()]

This is because `flatten_model` goes through all the layers and checks whether they have children. The first layer, `encoder`, has no children, but the second layer, `encoder_dp`, does, and its only child is precisely `encoder`:

In [None]:
print('encoder has children: ', awd_lstm.encoder.has_children )
print('encoder_dp has children: ' ,awd_lstm.encoder_dp.has_children )

encoder has children:  False
encoder_dp has children:  True


And because the only child of `encoder_dp` is `encoder`, this layer appears twice in the output of `flatten_model`.

In [None]:
next(awd_lstm.encoder_dp.children()) == awd_lstm.encoder

True
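
The duplication can be reproduced without fastai. The following is a minimal sketch in plain PyTorch, assuming a simplified recursion (`naive_flatten`, a hypothetical helper that is not fastai's `flatten_model`) and a hypothetical `Wrapper` module that, like the current `EmbeddingDropout`, stores an existing layer as an attribute:

```python
import torch.nn as nn

class Wrapper(nn.Module):
    # Hypothetical stand-in for a module that keeps another module as an attribute.
    def __init__(self, emb):
        super().__init__()
        self.emb = emb  # assigning a module registers it as a child

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Embedding(3, 5, padding_idx=1)
        self.encoder_dp = Wrapper(self.encoder)  # same object, reachable twice

def naive_flatten(m):
    # Simplified recursion (assumption): return leaf modules, descending into
    # children, without checking whether a module was already visited.
    children = list(m.children())
    if not children:
        return [m]
    return [leaf for child in children for leaf in naive_flatten(child)]

model = Model()
flat = naive_flatten(model)
print(flat)                        # the embedding is listed twice
print(flat[0] is flat[1])          # True: both entries are the same object
print(len(list(model.modules())))  # 3: Model, encoder, encoder_dp
```

PyTorch's own `modules()` iterator keeps track of the modules it has already visited, so it yields the shared layer only once; the naive recursion has no such check.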

### 2. Second issue

The output of `flatten_model` does not contain the `EmbeddingDropout` layer. This is a problem because the forward method of `AWD_LSTM` calls the forward method of `EmbeddingDropout`, not the one of `nn.Embedding`. As a consequence, hooks registered on the embedding layers returned by `flatten_model` are never fired!

In [None]:
def hook_fn(m, i, o):
  print(f"Working for layer: -- {m._get_name()} --\n")

In [None]:
awd_lstm.encoder.register_forward_hook(hook_fn)
awd_lstm(torch.randint(3, (1,4)))

tensor([[[-4.6479e-02,  3.3278e-05, -3.1192e-02,  5.2587e-02, -5.8732e-02],
         [-9.3273e-02, -3.7378e-03, -2.3787e-02,  5.5440e-02, -8.4186e-02],
         [-1.3828e-01, -6.5186e-03, -6.1914e-03,  5.5414e-02, -9.4652e-02],
         [-1.7925e-01, -7.7681e-03,  1.1929e-02,  5.5593e-02, -1.0033e-01]]],
       grad_fn=<TransposeBackward0>)

Even though I explicitly hooked the `encoder` layer, its hook does not fire, because its forward method is never called in the forward method of `AWD_LSTM`.
Instead, `AWD_LSTM` calls the forward method of `encoder_dp`:

In [None]:
awd_lstm.encoder_dp.register_forward_hook(hook_fn)
awd_lstm(torch.randint(3, (1,4)))

Working for layer: -- EmbeddingDropout --



tensor([[[-0.1902, -0.0110,  0.0290,  0.0780, -0.1060],
         [-0.2059, -0.0089,  0.0358,  0.0850, -0.1116],
         [-0.2215, -0.0065,  0.0431,  0.0940, -0.1162],
         [-0.2369, -0.0037,  0.0466,  0.0938, -0.1204]]],
       grad_fn=<TransposeBackward0>)
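
The same behaviour can be isolated in plain PyTorch. In this sketch, `WeightOnlyWrapper` is a hypothetical stand-in for the current `EmbeddingDropout`: it reads the weight of the wrapped layer and calls `F.embedding` itself, so the wrapped layer is never called as a module, and forward hooks only run as part of a module call:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightOnlyWrapper(nn.Module):
    # Hypothetical wrapper: uses the wrapped layer's weight, never calls the layer.
    def __init__(self, emb):
        super().__init__()
        self.emb = emb

    def forward(self, words):
        return F.embedding(words, self.emb.weight)

fired = []

def record(module, inputs, output):
    # Forward hook that records which module was actually called.
    fired.append(type(module).__name__)

emb = nn.Embedding(3, 5)
wrapper = WeightOnlyWrapper(emb)
emb.register_forward_hook(record)
wrapper.register_forward_hook(record)

wrapper(torch.randint(3, (1, 4)))
print(fired)  # only the wrapper's hook has fired

emb(torch.randint(3, (1, 4)))
print(fired)  # calling the embedding directly fires its hook as well
```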

## Solutions

I see only two possible solutions:
1. Modify `flatten_model`
2. Modify `EmbeddingDropout`

So far I have decided to modify `EmbeddingDropout`, as proposed at the top of this notebook, because I also find the resulting code more compact. On the other hand, the modifications I propose here will have consequences in other parts of the code that assume both `encoder_dp` and `encoder` are present.
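
To check that subclassing `nn.Embedding` addresses both issues, here is a self-contained sketch in plain PyTorch. It makes a few assumptions: `dropout_mask` is a local stand-in for the fastai helper of the same name, the `scale` argument is left out for brevity, and `padding_idx` is passed to `F.embedding` directly instead of going through `ifnone`:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def dropout_mask(x, size, p):
    # Local stand-in (assumption) for fastai's helper: a Bernoulli keep-mask
    # of shape `size`, rescaled by 1/(1-p), with the dtype and device of `x`.
    return x.new_empty(*size).bernoulli_(1 - p).div_(1 - p)

class EmbeddingDropout(nn.Embedding):
    "Embedding layer that drops whole rows of its weight with probability `embed_p`."
    def __init__(self, *args, embed_p, **kwargs):
        super().__init__(*args, **kwargs)
        self.embed_p = embed_p

    def forward(self, words):
        weight = self.weight
        if self.training and self.embed_p != 0:
            # One mask value per vocabulary row, broadcast over the embedding dimension.
            mask = dropout_mask(self.weight.data, (self.weight.size(0), 1), self.embed_p)
            weight = self.weight * mask
        return F.embedding(words, weight, self.padding_idx, self.max_norm,
                           self.norm_type, self.scale_grad_by_freq, self.sparse)

# The layer is built like an nn.Embedding, plus the keyword-only `embed_p`.
encoder = EmbeddingDropout(3, 5, padding_idx=1, embed_p=0.5)

fired = []
encoder.register_forward_hook(lambda m, i, o: fired.append(type(m).__name__))

out = encoder(torch.randint(3, (1, 4)))
print(list(encoder.children()))  # no wrapped child, so nothing to duplicate
print(fired)                     # the hook on the single layer has fired
print(out.shape)
```

Because the layer has no children, a recursive flatten lists it exactly once, and a hook registered on it fires whenever the model calls it.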