
In [None]:
!pip install torch==1.7.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html
!pip install fastai==2.1.4

In [2]:
from fastai.text.all import *

Load the weights of the pretrained model used for AWD_LSTM:

In [11]:
url = URLs.WT103_FWD
model_path = untar_data(url, c_key='model')
fnames = [list(model_path.glob(f'*.{ext}'))[0] for ext in ['pth', 'pkl']]
wgts = torch.load(fnames[0], map_location = lambda storage,loc: storage)
list(wgts.keys())

['0.encoder.weight',
 '0.encoder_dp.emb.weight',
 '0.rnns.0.weight_hh_l0_raw',
 '0.rnns.0.module.weight_ih_l0',
 '0.rnns.0.module.weight_hh_l0',
 '0.rnns.0.module.bias_ih_l0',
 '0.rnns.0.module.bias_hh_l0',
 '0.rnns.1.weight_hh_l0_raw',
 '0.rnns.1.module.weight_ih_l0',
 '0.rnns.1.module.weight_hh_l0',
 '0.rnns.1.module.bias_ih_l0',
 '0.rnns.1.module.bias_hh_l0',
 '0.rnns.2.weight_hh_l0_raw',
 '0.rnns.2.module.weight_ih_l0',
 '0.rnns.2.module.weight_hh_l0',
 '0.rnns.2.module.bias_ih_l0',
 '0.rnns.2.module.bias_hh_l0',
 '1.decoder.weight',
 '1.decoder.bias']

The two entries `0.encoder.weight` and `0.encoder_dp.emb.weight` hold the **same** weights:

In [6]:
torch.all(wgts['0.encoder.weight'] == wgts['0.encoder_dp.emb.weight'])

tensor(True)

These weights come straight from the pretrained model, so the redundancy is already present there. This is what I meant when I talked about duplicated layers: the embedding and the embedding dropout should live in the same class, and one of the two layers should be removed.
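As an illustration of one way such a duplicate can arise, here is a minimal sketch in plain PyTorch. `ToyEmbeddingDropout` and `ToyEncoder` are hypothetical stand-ins written for this example, not the fastai classes. The sketch assumes the dropout wrapper simply stores a reference to the embedding it wraps.

```python
import torch
import torch.nn as nn

class ToyEmbeddingDropout(nn.Module):
    "Hypothetical wrapper that keeps a reference to the embedding it wraps."
    def __init__(self, emb):
        super().__init__()
        self.emb = emb

    def forward(self, x):
        return self.emb(x)

class ToyEncoder(nn.Module):
    "Hypothetical model that registers the same embedding under two names."
    def __init__(self, vocab_sz=10, emb_sz=4):
        super().__init__()
        self.encoder = nn.Embedding(vocab_sz, emb_sz)
        self.encoder_dp = ToyEmbeddingDropout(self.encoder)

toy = ToyEncoder()
sd = toy.state_dict()

# The state dict lists the shared tensor once per registered name.
keys = list(sd.keys())
same_values = torch.equal(sd['encoder.weight'], sd['encoder_dp.emb.weight'])
same_storage = sd['encoder.weight'].data_ptr() == sd['encoder_dp.emb.weight'].data_ptr()

# parameters() de-duplicates, so there is only one trainable tensor.
n_params = len(list(toy.parameters()))
print(keys, same_values, same_storage, n_params)
```

In this toy setup the second key carries no extra information: it is the same tensor saved under another name.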

One of these two layers can be removed without any problem, and the AWD_LSTM model will keep working fine. The splitting functions then have to be modified accordingly:

In [None]:
def awd_lstm_lm_split(model):
    "Split a RNN `model` in groups for differential learning rates."
    groups = [nn.Sequential(rnn, dp) for rnn, dp in zip(model[0].rnns, model[0].hidden_dps)]
    # Right now:
    # groups = L(groups + [nn.Sequential(model[0].encoder, model[0].encoder_dp, model[1])])
    # With the redundant layer removed:
    groups = L(groups + [nn.Sequential(model[0].encoder, model[1])])
    return groups.map(params)

# Cell
def awd_lstm_clas_split(model):
    "Split a RNN `model` in groups for differential learning rates."
    # Right now:
    # groups = [nn.Sequential(model[0].module.encoder, model[0].module.encoder_dp)]
    # With the redundant layer removed:
    groups = [nn.Sequential(model[0].module.encoder)]
    groups += [nn.Sequential(rnn, dp) for rnn, dp in zip(model[0].module.rnns, model[0].module.hidden_dps)]
    groups = L(groups + [model[1]])
    return groups.map(params)
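The following sketch, again in plain PyTorch, suggests why dropping the extra layer from a group should not change which parameters that group contains. `Wrap` is a hypothetical wrapper, and `params` here is a local stand-in that just lists `m.parameters()`; it is assumed, not verified, to behave like the fastai helper of the same name.

```python
import torch.nn as nn

class Wrap(nn.Module):
    "Hypothetical wrapper holding a reference to an existing embedding."
    def __init__(self, emb):
        super().__init__()
        self.emb = emb

    def forward(self, x):
        return self.emb(x)

def params(m):
    "Local stand-in: return all parameters of `m` as a list."
    return list(m.parameters())

emb = nn.Embedding(10, 4)
emb_dp = Wrap(emb)          # shares emb.weight, adds no new parameter
decoder = nn.Linear(4, 10)

with_dp = params(nn.Sequential(emb, emb_dp, decoder))
without_dp = params(nn.Sequential(emb, decoder))

# parameters() skips tensors it has already yielded, so both lists match.
same_group = (len(with_dp) == len(without_dp)
              and all(a is b for a, b in zip(with_dp, without_dp)))
print(len(with_dp), len(without_dp), same_group)
```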


### Why did I keep both layers in the code, then?

This is because [language_model_learner](https://docs.fast.ai/text.learner#language_model_learner) automatically downloads the pretrained weights from a server, and these weights require the model to have **both** layers in order to load. If you could modify the weights file (I assume I do not have access to it), I could fully clean up the code by removing the redundant layer. All I need is a new set of weights:

In [7]:
del wgts['0.encoder_dp.emb.weight']
new_weights = wgts

In [10]:
list(new_weights.keys())

['0.encoder.weight',
 '0.rnns.0.weight_hh_l0_raw',
 '0.rnns.0.module.weight_ih_l0',
 '0.rnns.0.module.weight_hh_l0',
 '0.rnns.0.module.bias_ih_l0',
 '0.rnns.0.module.bias_hh_l0',
 '0.rnns.1.weight_hh_l0_raw',
 '0.rnns.1.module.weight_ih_l0',
 '0.rnns.1.module.weight_hh_l0',
 '0.rnns.1.module.bias_ih_l0',
 '0.rnns.1.module.bias_hh_l0',
 '0.rnns.2.weight_hh_l0_raw',
 '0.rnns.2.module.weight_ih_l0',
 '0.rnns.2.module.weight_hh_l0',
 '0.rnns.2.module.bias_ih_l0',
 '0.rnns.2.module.bias_hh_l0',
 '1.decoder.weight',
 '1.decoder.bias']
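To make the dependency between the weights file and the model definition concrete, here is a sketch using PyTorch's own `load_state_dict`. The classes `Wrap`, `Both` and `Single` are hypothetical toys, and strict key matching is only PyTorch's default behaviour; the way fastai actually loads the pretrained file may differ.

```python
import torch.nn as nn

class Wrap(nn.Module):
    "Hypothetical wrapper holding a reference to an existing embedding."
    def __init__(self, emb):
        super().__init__()
        self.emb = emb

class Both(nn.Module):
    "Toy model with the duplicated layer."
    def __init__(self):
        super().__init__()
        self.encoder = nn.Embedding(10, 4)
        self.encoder_dp = Wrap(self.encoder)

class Single(nn.Module):
    "Toy model with the duplicate removed."
    def __init__(self):
        super().__init__()
        self.encoder = nn.Embedding(10, 4)

# Build a state dict without the redundant key, without mutating the source.
stripped = {k: v for k, v in Both().state_dict().items()
            if k != 'encoder_dp.emb.weight'}

# The cleaned-up model loads the stripped weights with default strict matching.
Single().load_state_dict(stripped)

# The two-layer model rejects them under strict matching.
try:
    Both().load_state_dict(stripped)
    strict_load_failed = False
except RuntimeError:
    strict_load_failed = True

# With strict=False the missing key is only reported.
missing = list(Both().load_state_dict(stripped, strict=False).missing_keys)
print(strict_load_failed, missing)
```

In this toy, the model definition and the saved keys have to change together, which is why a new weights file is needed before the layer can be dropped.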

Otherwise, the `AWD_LSTM` class will have to keep these two lines of code:

````
self.encoder = EmbeddingDropout(vocab_sz, emb_sz, embed_p=embed_p, padding_idx=pad_token)
self.encoder_dp = self.encoder
````

As you can see, I currently work around the issue by setting `self.encoder_dp` equal to `self.encoder`, but `self.encoder_dp` is never used anywhere else in the code. It is a layer that exists but is never used: it does nothing, and it can confuse people who read the code and try to figure out what it is doing there.

I would therefore suggest that, if you can update the weights of the pretrained model, I clean up the code further so that this PR makes full sense.

Thanks a lot for all this great work Jeremy! :)