<a href="https://colab.research.google.com/github/bcollister01/course-nlp/blob/master/Ben_nn_imdb_more.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Language Modeling & Sentiment Analysis of IMDB movie reviews

In [None]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

from fastai import *
from fastai.text import *

In [None]:
# bs=48
bs=128

In [None]:
path = untar_data(URLs.IMDB)

## Language model

We start with the language model pretrained on Wikipedia. IMDB has a directory called `unsup` (for unsupervised) containing movie reviews that don't have a rating attached. At first this doesn't sound useful for sentiment classification, but it is useful for fine-tuning the language model.
This is semi-supervised learning: trying to do something useful with the unlabelled data. Most companies aren't aware of this option. For NLP classification, we can add the unlabelled text to the data our language model is trained on.

In [None]:


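# Build the language-model data from every review, including the unlabelled `unsup` folder.
data_lm = (TextList.from_folder(path)
            .filter_by_folder(include=['train', 'test', 'unsup'])
            .split_by_rand_pct(0.1, seed=42)  # hold out a random 10% for validation
            .label_for_lm()                   # no sentiment labels; the targets are the next tokens
            .databunch(bs=bs, num_workers=1))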

len(data_lm.vocab.itos),len(data_lm.train_ds)

In [None]:
data_lm.save('lm_databunch')

In [None]:
data_lm = load_data(path, 'lm_databunch', bs=bs)

`.to_fp16()` tells fastai to use mixed precision training: rather than using 32-bit floats everywhere, it uses 16-bit floats where it can. Until recently, people thought 32 bits was the minimum precision needed to be useful. But deep learning models are meant to be approximate, so less precise values can be good enough. Most CPUs don't support half-precision floating point, but some recent GPUs have started supporting it, and on some of those GPUs half-precision arithmetic runs 8-10x faster.

Half precision is fine for some parts of the calculation, but single precision is still needed for others, such as computing the loss or multiplying the gradient by small numbers. So we have to use mixed precision: some calculations are done in half precision and some in single precision.
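
As a rough illustration of why some steps need single precision, the sketch below applies one small weight update in both precisions using NumPy. The weight, learning rate and gradient values are made up for the example, and this is not a description of how fastai implements mixed precision internally.

```python
import numpy as np

# Made-up values for illustration: a weight of 1.0 and a tiny update.
lr, grad = 1e-2, 1e-2
update = lr * grad  # 1e-4, a gradient multiplied by a small number

# Half precision: float16 values just below 1.0 are spaced about 5e-4 apart,
# so subtracting 1e-4 rounds straight back to 1.0 and the update is lost.
new16 = np.float16(1.0) - np.float16(update)

# Single precision: float32 has enough resolution to keep the update.
new32 = np.float32(1.0) - np.float32(update)

print("float16 weight after update:", new16)
print("float32 weight after update:", new32)
```

This is the kind of step that mixed precision keeps in 32-bit floats, while the bulk of the arithmetic can run in 16-bit.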

`drop_mult`: Dropout is where we delete some of the activations at random during training. This helps the model generalise better, because it can't learn to rely on a single activation to detect a single thing.
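
A minimal NumPy sketch of the idea, assuming the common "inverted" form of dropout in which the surviving activations are rescaled by `1 / (1 - p)` so their expected value is unchanged. The function name `dropout` and the toy input are illustrative only; the real model uses its own dropout layers.

```python
import numpy as np

def dropout(activations, p, rng):
    """Zero each activation with probability p and rescale the survivors."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    # True where the activation is kept, False where it is deleted.
    mask = rng.random(activations.shape) >= p
    return activations * mask / (1.0 - p)

rng = np.random.default_rng(0)
acts = np.ones((4, 5))                   # toy activations
dropped = dropout(acts, p=0.5, rng=rng)  # each entry is now either 0.0 or 2.0
print(dropped)
```

Because a different random subset is deleted on every batch, no single activation can be relied on.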

`AWD_LSTM` is a regular RNN (an LSTM) that allows for dropout at lots of different points in the model. There are actually 5 different types of dropout in this model. Luckily, the documentation recommends values for these hyperparameters, and fastai sets things up so that `drop_mult` controls a multiplicative factor applied to those default values. If you are overfitting, increase dropout; if you are underfitting, decrease it.
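
The effect of `drop_mult` can be sketched in plain Python. The dictionary keys and base probabilities below are assumed for illustration and are not read from the fastai source, and `scale_dropouts` is a hypothetical helper, not a fastai function.

```python
# Assumed base dropout probabilities, one per dropout type (illustrative values).
base_dropouts = {
    "output_p": 0.1,
    "hidden_p": 0.15,
    "input_p": 0.25,
    "embed_p": 0.02,
    "weight_p": 0.2,
}

def scale_dropouts(config, drop_mult):
    """Multiply every dropout probability by the same factor."""
    return {name: p * drop_mult for name, p in config.items()}

# drop_mult=1. keeps the defaults; drop_mult=0.5 halves every dropout.
full = scale_dropouts(base_dropouts, 1.0)
half = scale_dropouts(base_dropouts, 0.5)
print(half)
```

A single knob is convenient here: tuning five dropout probabilities separately would be a much larger search.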

In [None]:
learn_lm = language_model_learner(data_lm, AWD_LSTM, drop_mult=1.).to_fp16()

In [None]:
lr = 1e-2
lr *= bs/48  # scale the learning rate in proportion to the batch size (1e-2 corresponds to bs=48)

In [None]:
learn_lm.fit_one_cycle(1, lr, moms=(0.8,0.7))

epoch,train_loss,valid_loss,accuracy,time
0,4.604046,4.189002,0.278265,18:36


Because we have more data, we can train the language model for much longer without overfitting.

In [None]:
learn_lm.unfreeze()
learn_lm.fit_one_cycle(10, lr/10, moms=(0.8,0.7))

The encoder is the part of the language model learner that we want to keep. For sentiment classification we aren't interested in the final part of the model, which predicts the next word.

In [None]:
learn_lm.save('fine_tuned_10')
learn_lm.save_encoder('fine_tuned_enc_10')

## Classifier

In [None]:
data_clas = (TextList.from_folder(path, vocab=data_lm.vocab)
             .split_by_folder(valid='test')
             .label_from_folder(classes=['neg', 'pos'])
             .databunch(bs=bs, num_workers=1))

In [None]:
data_clas.save('imdb_textlist_class')

In [None]:
data_clas = load_data(path, 'imdb_textlist_class', bs=bs, num_workers=1)

In [None]:
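learn_c = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5).to_fp16()
learn_c.load_encoder('fine_tuned_enc_10')  # reuse the encoder saved from the fine-tuned language model
learn_c.freeze()                           # train only the last layer group first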

In [None]:
lr = 2e-2
lr *= bs/48

In [None]:
learn_c.fit_one_cycle(1, lr, moms=(0.8,0.7))

epoch,train_loss,valid_loss,accuracy,time
0,0.241523,0.190128,0.9266,01:16


In [None]:
learn_c.save('1')

In [None]:
# Unfreeze the last two layer groups; the slice gives lower learning rates to earlier groups.
learn_c.freeze_to(-2)
learn_c.fit_one_cycle(1, slice(lr/(2.6**4),lr), moms=(0.8,0.7))

epoch,train_loss,valid_loss,accuracy,time
0,0.204818,0.161675,0.93864,02:00
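
To see what `slice(lr/(2.6**4), lr)` amounts to, here is a small sketch that spreads learning rates geometrically between the two ends of the slice. It assumes the classifier is split into five layer groups and that the rates are evenly spaced multiplicatively; both are assumptions about fastai's behaviour rather than something shown in this notebook, and `geometric_lrs` is a hypothetical helper written for this example.

```python
def geometric_lrs(start, stop, n):
    """Return n learning rates from start to stop, each a constant multiple of the previous one."""
    if n == 1:
        return [stop]
    step = (stop / start) ** (1 / (n - 1))
    return [start * step ** i for i in range(n)]

bs = 128
lr = 2e-2 * bs / 48  # same scaling as the classifier cells above

# Assumed: five layer groups, earliest group first.
layer_group_lrs = geometric_lrs(lr / (2.6 ** 4), lr, 5)
for i, group_lr in enumerate(layer_group_lrs):
    print(f"layer group {i}: lr = {group_lr:.6f}")
```

Under these assumptions each layer group trains with a learning rate 2.6 times larger than the group before it, so the earliest layers change the least.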


In [None]:
learn_c.save('2nd')

In [None]:
learn_c.freeze_to(-3)
learn_c.fit_one_cycle(1, slice(lr/2/(2.6**4),lr/2), moms=(0.8,0.7))

epoch,train_loss,valid_loss,accuracy,time
0,0.179451,0.144047,0.94584,02:56


In [None]:
learn_c.save('3rd')

In [None]:
learn_c.unfreeze()
learn_c.fit_one_cycle(2, slice(lr/10/(2.6**4),lr/10), moms=(0.8,0.7))

epoch,train_loss,valid_loss,accuracy,time
0,0.120063,0.145701,0.947,03:24
1,0.087303,0.152943,0.94808,03:09


In [None]:
learn_c.save('clas')