## IMDb

At fast.ai we have introduced a new module called fastai.text, which replaces the torchtext library used in our 2018 dl1 course. The fastai.text module also supersedes the fastai.nlp library but retains many of its key functions.

In [1]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

In [2]:
from fastai.text import *
import html

The fastai.text module introduces several custom tokens, such as the xbos and xfld tags defined below.

We need to download the IMDB Large Movie Review Dataset from http://ai.stanford.edu/~amaas/data/sentiment/ ([direct link](http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz)) and untar it into the PATH location. We use pathlib, which makes directory traversal a breeze.
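A minimal sketch of the download-and-extract step using only the standard library. The helper names `fetch_archive` and `extract_archive` and the `data` destination directory are assumptions chosen for illustration, not part of fastai. The archive is expected to unpack into an `aclImdb/` directory, which would match the `PATH` defined in the next cell.

```python
import tarfile
import urllib.request
from pathlib import Path

URL = 'http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz'

def fetch_archive(url, dest_dir):
    """Download url into dest_dir unless the file is already there; return its path."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = dest_dir / url.rsplit('/', 1)[-1]
    if not archive.exists():
        urllib.request.urlretrieve(url, str(archive))
    return archive

def extract_archive(archive, dest_dir):
    """Unpack a .tar.gz archive into dest_dir and return dest_dir as a Path."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(str(archive), 'r:gz') as tar:
        tar.extractall(str(dest_dir))
    return dest_dir

# Usage (downloads the full dataset, so it is left commented out):
# extract_archive(fetch_archive(URL, 'data'), 'data')
```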

In [3]:
BOS = 'xbos'  # beginning-of-sentence tag
FLD = 'xfld'  # data field tag

PATH=Path('data/aclImdb/')
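To make the role of these tags concrete, here is a hedged sketch of how a raw review could be wrapped before tokenization. The exact layout (a newline, `xbos`, then `xfld` followed by a field index), the helper name `mark_fields`, and the use of `html.unescape` as a cleanup step are assumptions for illustration; the preprocessing that produced the saved files is not shown in this section.

```python
import html

BOS = 'xbos'  # beginning-of-sentence tag
FLD = 'xfld'  # data field tag

def mark_fields(fields):
    """Join the text fields of one document, tagging the start and each field."""
    parts = [f'{FLD} {i} {html.unescape(text).strip()}'
             for i, text in enumerate(fields, start=1)]
    return f'\n{BOS} ' + ' '.join(parts)

print(repr(mark_fields(['A &amp; B was great!'])))
# prints '\nxbos xfld 1 A & B was great!'
```

Numbering the fields lets a model tell, for example, a title from a body when a document has more than one text column.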

## Standardize format

In [4]:
CLAS_PATH=Path('data/imdb_clas/')
CLAS_PATH.mkdir(parents=True, exist_ok=True)  # also create data/ if it is missing

LM_PATH=Path('data/imdb_lm/')
LM_PATH.mkdir(parents=True, exist_ok=True)

In [5]:
trn_lm = np.load(LM_PATH/'tmp'/'trn_ids.npy')
val_lm = np.load(LM_PATH/'tmp'/'val_ids.npy')
itos = pickle.load(open(LM_PATH/'tmp'/'itos.pkl', 'rb'))

In [6]:
vs=len(itos)
vs,len(trn_lm)

(60002, 90000)
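The two numbers above are the vocabulary size and the number of training reviews. As a rough illustration of what `itos` and the id arrays contain, the sketch below builds a tiny vocabulary from tokenized texts. The helper name `build_vocab`, the `max_vocab` and `min_freq` parameters, and the `_unk_` and `_pad_` placeholder strings are assumptions for this example; placing padding at id 1 is chosen only to agree with the `padding_idx=1` shown in the model printout later on.

```python
import collections

def build_vocab(tokenized_texts, max_vocab=60000, min_freq=2):
    """Return (itos, stoi): id-to-token list and token-to-id lookup."""
    freq = collections.Counter(tok for text in tokenized_texts for tok in text)
    itos = [tok for tok, count in freq.most_common(max_vocab) if count >= min_freq]
    # Reserve id 0 for unknown tokens and id 1 for padding (assumed layout).
    itos = ['_unk_', '_pad_'] + itos
    # Any token missing from the vocabulary maps to id 0.
    stoi = collections.defaultdict(int, {tok: i for i, tok in enumerate(itos)})
    return itos, stoi

texts = [['the', 'movie', 'was', 'good'], ['the', 'movie', 'was', 'bad']]
itos, stoi = build_vocab(texts)
ids = [[stoi[tok] for tok in text] for text in texts]
print(itos)  # ['_unk_', '_pad_', 'the', 'movie', 'was']
print(ids)   # [[2, 3, 4, 0], [2, 3, 4, 0]]
```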

## wikitext103 conversion

In [7]:
em_sz,nh,nl = 400,1150,3

In [8]:
PRE_PATH = PATH/'models'/'wt103'
PRE_LM_PATH = PRE_PATH/'fwd_wt103.h5'

In [9]:
wgts = torch.load(PRE_LM_PATH, map_location=lambda storage, loc: storage)

We calculate the mean of the layer-0 encoder (embedding) weights across all tokens. This mean row is later assigned to IMDB tokens that have no counterpart in the wikitext103 vocabulary.

In [10]:
enc_wgts = to_np(wgts['0.encoder.weight'])
row_m = enc_wgts.mean(0)

In [11]:
itos2 = pickle.load((PRE_PATH/'itos_wt103.pkl').open('rb'))
stoi2 = collections.defaultdict(lambda:-1, {v:k for k,v in enumerate(itos2)})

Before we transfer the knowledge from wikitext103 to the IMDB LM, we match up the vocabulary words and their indexes.
The defaultdict above returns -1 for any token that does not exist in wikitext103, and each such IMDB token is assigned the mean embedding row instead.

In [12]:
new_w = np.zeros((vs, em_sz), dtype=np.float32)
for i,w in enumerate(itos):
    r = stoi2[w]
    new_w[i] = enc_wgts[r] if r>=0 else row_m
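The same remapping on a toy vocabulary makes the logic easy to check by hand. This sketch uses made-up 2-dimensional embeddings; `src_itos`, `tgt_itos`, and `src_emb` are hypothetical stand-ins for `itos2`, `itos`, and `enc_wgts`.

```python
import collections
import numpy as np

src_itos = ['the', 'film', 'good']          # pretrained vocabulary
src_emb = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [2.0, 2.0]], dtype=np.float32)
tgt_itos = ['film', 'xbos', 'the']          # target vocabulary; 'xbos' is new

# Unknown tokens look up to -1 so they can be detected below.
src_stoi = collections.defaultdict(lambda: -1, {tok: i for i, tok in enumerate(src_itos)})
row_mean = src_emb.mean(0)                  # fallback row for unseen tokens

new_emb = np.zeros((len(tgt_itos), src_emb.shape[1]), dtype=np.float32)
for i, tok in enumerate(tgt_itos):
    r = src_stoi[tok]
    new_emb[i] = src_emb[r] if r >= 0 else row_mean

print(new_emb)
```

Rows 0 and 2 are copied from the pretrained matrix, while row 1 (the unseen `xbos`) receives the column-wise mean `[1., 1.]`.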

We now write the new weights back into the wgts OrderedDict.
The decoder module, which we will explore in detail, is loaded with the same weights because of an idea called weight tying.

In [13]:
wgts['0.encoder.weight'] = T(new_w)
wgts['0.encoder_with_dropout.embed.weight'] = T(np.copy(new_w))
wgts['1.decoder.weight'] = T(np.copy(new_w))
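Weight tying means the input embedding and the output projection share one matrix, which is why the same `new_w` array is written to both the encoder and decoder keys. A minimal NumPy sketch of the idea, with made-up sizes and random values (an illustration only, not the fastai implementation):

```python
import numpy as np

rng = np.random.RandomState(0)
vocab_size, emb_size = 5, 3
W = rng.randn(vocab_size, emb_size).astype(np.float32)  # the single shared matrix

def embed(token_ids):
    """Input side: look up one row of W per token id."""
    return W[token_ids]

def decode(hidden):
    """Output side: score every vocabulary entry with the same W (no bias)."""
    return hidden @ W.T

hidden = embed(np.array([2, 4]))   # shape (2, emb_size)
logits = decode(hidden)            # shape (2, vocab_size)
print(hidden.shape, logits.shape)  # (2, 3) (2, 5)
```

Because both directions use a matrix of shape (vocabulary size, embedding size), the tied decoder adds no extra parameters beyond the embedding itself.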

Now that we have the weights prepared, we are ready to create and start training our new IMDB PyTorch language model!

## Language model

In [15]:
wd=1e-7
bptt=70
bs=52
opt_fn = partial(optim.Adam, betas=(0.8, 0.99))

In [16]:
len(trn_lm), len(val_lm)

(90000, 10000)

In [17]:
trn_dl = LanguageModelLoader(np.concatenate(trn_lm[:5000]), bs, bptt)
val_dl = LanguageModelLoader(np.concatenate(val_lm[:1000]), bs, bptt)
md = LanguageModelData(PATH, 1, vs, trn_dl, val_dl, bs=bs, bptt=bptt)
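The loader treats the concatenated reviews as one long token stream. The simplified sketch below assumes the stream is split into `bs` parallel columns and then read in windows of `bptt` rows, with targets shifted by one token. The function names are hypothetical, and the real `LanguageModelLoader` may differ in details such as varying the window length.

```python
import numpy as np

def batchify(stream, bs):
    """Trim the stream to a multiple of bs and reshape to (steps, bs)."""
    steps = len(stream) // bs
    return stream[:steps * bs].reshape(bs, steps).T

def iter_batches(data, bptt):
    """Yield (inputs, targets) windows; targets are inputs shifted by one step."""
    for start in range(0, data.shape[0] - 1, bptt):
        seq_len = min(bptt, data.shape[0] - 1 - start)
        yield data[start:start + seq_len], data[start + 1:start + 1 + seq_len]

stream = np.arange(20)
data = batchify(stream, bs=2)   # column 0 holds 0..9, column 1 holds 10..19
for x, y in iter_batches(data, bptt=4):
    print(x[:, 0], y[:, 0])     # first column of each input and target window
```

Each column is a contiguous slice of the stream, so the hidden state carried from one window to the next always continues the same text.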

In [18]:
drops = np.array([0.25, 0.1, 0.2, 0.02, 0.15])*0.7

In [19]:
learner= md.get_model(opt_fn, em_sz, nh, nl, 
    dropouti=drops[0], dropout=drops[1], wdrop=drops[2], dropoute=drops[3], dropouth=drops[4])

In [20]:
learner.metrics = [accuracy]
learner.freeze_to(-1)

In [21]:
learner.model.load_state_dict(wgts)

In [22]:
lr=1e-3
lrs = lr

In [42]:
learner.float()
learner.model

SequentialRNN(
  (0): RNN_Encoder(
    (encoder): Embedding(60002, 400, padding_idx=1)
    (encoder_with_dropout): EmbeddingDropout(
      (embed): Embedding(60002, 400, padding_idx=1)
    )
    (rnns): ModuleList(
      (0): WeightDrop(
        (module): LSTM(400, 1150)
      )
      (1): WeightDrop(
        (module): LSTM(1150, 1150)
      )
      (2): WeightDrop(
        (module): LSTM(1150, 400)
      )
    )
    (dropouti): LockedDropout(
    )
    (dropouths): ModuleList(
      (0): LockedDropout(
      )
      (1): LockedDropout(
      )
      (2): LockedDropout(
      )
    )
  )
  (1): LinearDecoder(
    (decoder): Linear(in_features=400, out_features=60002, bias=False)
    (dropout): LockedDropout(
    )
  )
)

In [43]:
# Full precision running time: [00:52<00:00, 52.77s/it]
learner.fit(lrs/2, 1, wds=wd, use_clr=(32,2), cycle_len=1)

epoch      trn_loss   val_loss   accuracy                   
    0      5.154934   4.926881   0.233583  


[array([4.92688]), 0.23358301153859576]
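The `use_clr=(32,2)` argument requests a slanted triangular learning-rate schedule. The sketch below assumes the pair means `(div, cut_div)`: the rate starts at `peak/div`, rises linearly to the peak over the first `1/cut_div` of the iterations, then falls back linearly. The function `clr_schedule` is a hypothetical reimplementation for illustration, so check the library source before relying on the exact shape.

```python
def clr_schedule(peak_lr, n_iter, div=32, cut_div=2):
    """Return one learning rate per iteration for a single triangular cycle."""
    cut = n_iter // cut_div
    lrs = []
    for it in range(n_iter):
        if it > cut:
            pct = 1 - (it - cut) / (n_iter - cut)   # descending leg
        else:
            pct = it / cut                          # ascending leg
        lrs.append(peak_lr * (1 + pct * (div - 1)) / div)
    return lrs

# peak_lr matches lrs/2 with lr=1e-3; n_iter=100 is an arbitrary illustration.
schedule = clr_schedule(peak_lr=5e-4, n_iter=100)
print(schedule[0], max(schedule), schedule[-1])
```

Under these assumptions the rate begins at 1/32 of the peak, reaches the peak halfway through the epoch, and ends close to its starting value.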

In [54]:
learner.half()
learner.model

FP16(
  (module): SequentialRNN(
    (0): RNN_Encoder(
      (encoder): Embedding(60002, 400, padding_idx=1)
      (encoder_with_dropout): EmbeddingDropout(
        (embed): Embedding(60002, 400, padding_idx=1)
      )
      (rnns): ModuleList(
        (0): WeightDrop(
          (module): LSTM(400, 1150)
        )
        (1): WeightDrop(
          (module): LSTM(1150, 1150)
        )
        (2): WeightDrop(
          (module): LSTM(1150, 400)
        )
      )
      (dropouti): LockedDropout(
      )
      (dropouths): ModuleList(
        (0): LockedDropout(
        )
        (1): LockedDropout(
        )
        (2): LockedDropout(
        )
      )
    )
    (1): LinearDecoder(
      (decoder): Linear(in_features=400, out_features=60002, bias=False)
      (dropout): LockedDropout(
      )
    )
  )
)
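Half precision stores each value in 16 bits instead of 32, which halves the memory of a tensor but leaves only about three decimal digits of precision. The NumPy sketch below illustrates both effects for an array shaped like the embedding matrix above; it says nothing about how the `FP16` wrapper handles these issues internally.

```python
import numpy as np

vs, em_sz = 60002, 400
bytes_full = vs * em_sz * np.dtype(np.float32).itemsize
bytes_half = vs * em_sz * np.dtype(np.float16).itemsize
print(bytes_full, bytes_half)   # the float16 copy needs half the bytes

# A small update can vanish entirely when added to a larger float16 value.
weight = np.float16(1.0)
update = np.float16(1e-4)
lost_in_half = (weight + update) == weight
print(lost_in_half)             # True: the sum rounds back to 1.0

# The same update survives in float32.
lost_in_full = (np.float32(1.0) + np.float32(1e-4)) == np.float32(1.0)
print(lost_in_full)             # False
```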

In [55]:
# Half precision running time: [00:48<00:00, 48.35s/it]
learner.fit(lrs/2, 1, wds=wd, use_clr=(32,2), cycle_len=1)

epoch      trn_loss   val_loss   accuracy                   
    0      5.252502   5.026446   0.233449  


[array([5.02645]), 0.2334489490132074]

### Larger batch sizes

In [14]:
wd=1e-7
bptt=70
bs=100
opt_fn = partial(optim.Adam, betas=(0.8, 0.99))

In [15]:
len(trn_lm), len(val_lm)

(90000, 10000)

In [16]:
trn_dl = LanguageModelLoader(np.concatenate(trn_lm[:5000]), bs, bptt)
val_dl = LanguageModelLoader(np.concatenate(val_lm[:1000]), bs, bptt)
md = LanguageModelData(PATH, 1, vs, trn_dl, val_dl, bs=bs, bptt=bptt)

In [17]:
drops = np.array([0.25, 0.1, 0.2, 0.02, 0.15])*0.7
lr=1e-3
lrs = lr

In [18]:
learner= md.get_model(opt_fn, em_sz, nh, nl, 
    dropouti=drops[0], dropout=drops[1], wdrop=drops[2], dropoute=drops[3], dropouth=drops[4])
learner.metrics = [accuracy]
learner.freeze_to(-1)
learner.model.load_state_dict(wgts)

In [23]:
# Full precision running time: [00:46<00:00, 46.51s/it]
learner.float()
learner.fit(lrs/2, 1, wds=wd, use_clr=(32,2), cycle_len=1)

epoch      trn_loss   val_loss   accuracy                   
    0      5.253021   5.020184   0.230187  



[array([5.02018]), 0.23018650514514824]

In [24]:
learner= md.get_model(opt_fn, em_sz, nh, nl, 
    dropouti=drops[0], dropout=drops[1], wdrop=drops[2], dropoute=drops[3], dropouth=drops[4])
learner.metrics = [accuracy]
learner.freeze_to(-1)
learner.model.load_state_dict(wgts)

In [25]:
# Half precision running time: [00:39<00:00, 39.92s/it]
learner.half()
learner.fit(lrs/2, 1, wds=wd, use_clr=(32,2), cycle_len=1)

epoch      trn_loss   val_loss   accuracy                   
    0      5.336145   5.095292   0.230515  



[array([5.09529]), 0.23051503026171735]

### BS=200

In [29]:
# Note: trn_dl, val_dl and md must be rebuilt with this bs for the change to take effect.
bs=200

In [30]:
learner= md.get_model(opt_fn, em_sz, nh, nl, 
    dropouti=drops[0], dropout=drops[1], wdrop=drops[2], dropoute=drops[3], dropouth=drops[4])
learner.metrics = [accuracy]
learner.freeze_to(-1)
learner.model.load_state_dict(wgts)

In [32]:
# Full precision running time: [00:46<00:00, 46.83s/it]
learner.float()
learner.fit(lrs/2, 1, wds=wd, use_clr=(32,2), cycle_len=1)

epoch      trn_loss   val_loss   accuracy                   
    0      5.115724   4.912355   0.235955  



[array([4.91236]), 0.23595530931886874]

In [31]:
# Half precision running time: [00:40<00:00, 40.61s/it]
learner.half()
learner.fit(lrs/2, 1, wds=wd, use_clr=(32,2), cycle_len=1)

epoch      trn_loss   val_loss   accuracy                   
    0      5.326368   5.098684   0.230012  



[array([5.09868]), 0.23001193529681155]
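
The timings recorded in the cell comments above can be summarized as speedup ratios. The short calculation below uses only those recorded numbers, keyed by the batch size set in each section; it is single-run arithmetic rather than a benchmark, and the dictionary layout is just a convenience.

```python
# Seconds per epoch, copied from the comments in the training cells above.
timings = {
    52:  {'full': 52.77, 'half': 48.35},
    100: {'full': 46.51, 'half': 39.92},
    200: {'full': 46.83, 'half': 40.61},
}

speedups = {bs: t['full'] / t['half'] for bs, t in timings.items()}
for bs, s in speedups.items():
    print(f'bs={bs}: half precision is {s:.2f}x faster')
```

On these single runs half precision is roughly 9% to 17% faster per epoch, while the recorded validation loss is slightly higher in each half-precision run.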