<a href="https://colab.research.google.com/github/amandakonet/ulmfit/blob/main/ULMFiT_example.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ULMFiT Example

Based on tutorials from:
* https://docs.fast.ai/tutorial.text.html
* https://github.com/floleuerer/fastai_ulmfit
* https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson3-imdb.ipynb

This is a basic tutorial on how to use ULMFiT on the IMDB dataset. For further exploration of the capabilities of ULMFiT, I highly recommend the third link above.



## Environment Setup

Before running the notebook code, be sure to switch the runtime to GPU. (Edit > Notebook Settings > Hardware Accelerator = GPU)

In [None]:
# if you are using google colab, you will need to run this line & restart runtime
# before importing fastai below
!pip install fastai --upgrade

In [1]:
from fastai.text.all import *

Read in the IMDB dataset and store it in a `DataLoaders` object.

In [None]:
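path = untar_data(URLs.IMDB)
#path.ls() # see information stored in this folder
# is_lm=True builds language-model batches (targets are the inputs shifted by one token)
dls = TextDataLoaders.from_folder(path, is_lm=True, valid='test', bs=8)
dls.show_batch()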

Above, we see that our data has been processed and tokenized, as evidenced by the special tokens added:
* xxbos: beginning of text
* xxmaj: next word is capitalized
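
To make the two tokens listed above concrete, here is a minimal pure-Python sketch of the idea. It is not fastai's tokenizer, which applies many more rules; `mark_tokens` is a hypothetical helper that only mimics the `xxbos` and `xxmaj` conventions on whitespace-split text.

```python
def mark_tokens(text):
    """Toy illustration of fastai-style special tokens (not the real tokenizer)."""
    tokens = ["xxbos"]  # mark the beginning of the text
    for word in text.split():
        # A capitalized word becomes 'xxmaj' followed by its lowercase form
        if word[0].isupper() and word[1:].islower():
            tokens += ["xxmaj", word.lower()]
        else:
            # Everything else is simply lowercased in this sketch
            tokens.append(word.lower())
    return tokens

print(mark_tokens("This movie was great"))
# ['xxbos', 'xxmaj', 'this', 'movie', 'was', 'great']
```

Lowercasing while recording the capitalization in a separate token keeps the vocabulary smaller without throwing the information away.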

## Fine-tune language model

Recall the steps of ULMFiT: 
1. pretrain large language model
2. fine-tune on domain data
3. add classification task head

The first step is already done for us: fastai provides an AWD_LSTM model pretrained on WikiText-103. We therefore start with step 2, fine-tuning on domain data.

Below, we initialize a `Learner` with our data, the AWD_LSTM model, the metrics (accuracy and perplexity), a weight decay (`wd`) of 0.1, and a `drop_mult` of 0.1. The call to `to_fp16()` enables mixed-precision training.

In [4]:
learn_lm = language_model_learner(dls, AWD_LSTM,
                                  metrics=[accuracy, Perplexity()],
                                  path=path,
                                  wd=0.1, drop_mult=0.1).to_fp16()
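
The `Perplexity()` metric used above is the exponential of the average cross-entropy loss, so lower is better and 1.0 would be a perfect model. The following is a small sketch of that relationship in plain Python; the `perplexity` function and the probabilities are made up for illustration and are not part of fastai.

```python
import math

def perplexity(target_probs):
    """Perplexity = exp(mean negative log-likelihood of the correct tokens)."""
    nll = [-math.log(p) for p in target_probs]
    return math.exp(sum(nll) / len(nll))

# Hypothetical probabilities a model assigned to the correct next token
uniform = perplexity([0.25, 0.25, 0.25, 0.25])  # guessing among 4 options: about 4
confident = perplexity([0.9, 0.8, 0.95])        # confident model: close to 1

print(f"uniform guess: {uniform:.3f}")
print(f"confident:     {confident:.3f}")
```

One way to read the number: a perplexity of about 4 means the model is, on average, as uncertain as if it were choosing uniformly among 4 tokens.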

To begin fine-tuning, all we need to do is set the number of epochs and the learning rate (the peak of the one-cycle schedule). Note that a pretrained `Learner` starts in a frozen state, in which only the head of the model is trained. The cell below calls `unfreeze()` first, so all layers are updated during fine-tuning.

For the sake of time, I will train for only 4 epochs. More will be needed for better performance.

In [6]:
learn_lm.unfreeze()
learn_lm.fit_one_cycle(4, 1e-3)
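
For intuition about what `fit_one_cycle` does with the `1e-3` passed above, here is a rough sketch of a one-cycle learning rate schedule: a cosine warm-up to the peak followed by a cosine decay to a very small value. The function names are hypothetical, and the defaults (`div=25`, `div_final=1e5`, `pct_start=0.25`) reflect my understanding of fastai's defaults, so treat them as assumptions. The real method also schedules momentum, which is omitted here.

```python
import math

def cos_interp(start, end, pos):
    """Cosine interpolation from start (pos=0) to end (pos=1)."""
    return start + (1 + math.cos(math.pi * (1 - pos))) * (end - start) / 2

def one_cycle_lr(pct, lr_max, div=25.0, div_final=1e5, pct_start=0.25):
    """Learning rate at training progress pct in [0, 1] (assumed defaults)."""
    if pct < pct_start:
        # Warm-up: from lr_max/div up to lr_max
        return cos_interp(lr_max / div, lr_max, pct / pct_start)
    # Annealing: from lr_max down to lr_max/div_final
    return cos_interp(lr_max, lr_max / div_final, (pct - pct_start) / (1 - pct_start))

for pct in (0.0, 0.125, 0.25, 0.5, 0.75, 1.0):
    print(f"progress {pct:5.3f}: lr = {one_cycle_lr(pct, 1e-3):.2e}")
```

Under these assumptions, the learning rate starts at `1e-3 / 25`, reaches `1e-3` a quarter of the way through training, and then decays for the remaining three quarters.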

Once this is done, we save the model. For the classifier, we only need the encoder: the whole model except the final layer, which converts activations to probabilities of picking each token in our vocabulary.

We save the encoder with `save_encoder`.

In [None]:
learn_lm.save('finetuned_lm')
learn_lm.save_encoder('finetuned_lm_enc')

## Fine-tune text classifier

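As before, we build a `DataLoaders`, this time for classification. We pass the vocabulary of the language model via `text_vocab` so that the classifier uses exactly the tokens (and token ids) that the fine-tuned model was trained on.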

In [None]:
dls_class = TextDataLoaders.from_folder(path, valid='test', text_vocab=dls.vocab)
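
To see why the vocabulary has to be shared, recall that the model never sees words, only integer ids. The sketch below uses a hypothetical `numericalize` helper and two made-up vocabularies to show that the same tokens map to different ids under a different vocabulary, which would make the fine-tuned embeddings meaningless.

```python
def numericalize(tokens, vocab):
    """Map tokens to integer ids; out-of-vocabulary tokens fall back to 'xxunk'."""
    token_to_id = {tok: i for i, tok in enumerate(vocab)}
    unk_id = token_to_id["xxunk"]
    return [token_to_id.get(tok, unk_id) for tok in tokens]

# Hypothetical tiny vocabularies: same tokens, different order
lm_vocab = ["xxunk", "xxbos", "xxmaj", "the", "movie", "was", "great"]
other_vocab = ["xxunk", "xxbos", "great", "movie", "the", "was", "xxmaj"]

tokens = ["xxbos", "xxmaj", "the", "movie", "was", "great"]
print(numericalize(tokens, lm_vocab))     # [1, 2, 3, 4, 5, 6]
print(numericalize(tokens, other_vocab))  # [1, 6, 4, 3, 5, 2]
print(numericalize(["superb"], lm_vocab)) # [0], unknown token
```

Reusing the language model's vocabulary keeps each id pointing at the same embedding row that was adjusted during fine-tuning.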

Once again, we create a learner...

In [None]:
learn_class = text_classifier_learner(dls_class, AWD_LSTM, drop_mult=0.5, metrics=accuracy)

...and we load the previous encoder that we just saved!

In [None]:
learn_class = learn_class.load_encoder('finetuned_lm_enc')

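Next, we fine-tune with all but the last few layer groups frozen, unfreezing more at each stage until the whole model is trainable. This is the gradual unfreezing step discussed in the paper. The learner starts frozen, so this first cycle trains only the classifier head, using a single learning rate.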

In [None]:
learn_class.fit_one_cycle(1, 2e-2)

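Freeze all but the last two layer groups. Note that we now pass a `slice` of learning rates instead of a single value, so earlier layer groups get smaller learning rates than later ones. This is discriminative fine-tuning. The maximum learning rate is also lower than in the previous stage (1e-2 instead of 2e-2).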

In [None]:
learn_class.freeze_to(-2)
learn_class.fit_one_cycle(1, slice(1e-2/(2.6**4),1e-2))
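
The `slice(1e-2/(2.6**4), 1e-2)` above can be unpacked with a short sketch. The ULMFiT paper suggests dividing the learning rate by 2.6 for each step toward the earlier layers. Assuming the learning rates are spread geometrically across the layer groups (my understanding of fastai's behavior) and assuming 5 layer groups, a factor of `2.6**4` between the two ends gives exactly one factor of 2.6 per group. `spread_lrs` is a hypothetical helper, not a fastai function.

```python
def spread_lrs(lo, hi, n_groups):
    """Geometrically spaced learning rates from lo (earliest group) to hi (last group)."""
    step = (hi / lo) ** (1 / (n_groups - 1))
    return [lo * step**i for i in range(n_groups)]

hi = 1e-2
lrs = spread_lrs(hi / 2.6**4, hi, n_groups=5)  # 5 groups is an assumption

for i, lr in enumerate(lrs):
    print(f"group {i}: lr = {lr:.2e}")

# Ratio between neighbouring groups (about 2.6 each under these assumptions)
print([round(b / a, 3) for a, b in zip(lrs, lrs[1:])])
```

The earliest layers hold the most general features, so they are updated most gently, while the newly added head gets the full learning rate.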

Freeze all but the last three layer groups.

In [None]:
learn_class.freeze_to(-3)
learn_class.fit_one_cycle(1, slice(5e-3/(2.6**4),5e-3))

Finally, unfreeze and train the entire model.

In [None]:
learn_class.unfreeze()
learn_class.fit_one_cycle(2, slice(1e-3/(2.6**4),1e-3))
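
The four training stages above can be summarized with a small sketch of which layer groups are trainable at each point. The group names are hypothetical and only illustrate the indexing: a `freeze_to(n)`-style call leaves the groups from index `n` onward trainable, the initial frozen state corresponds to `-1`, and `unfreeze()` corresponds to `0`.

```python
def trainable_groups(groups, n):
    """Groups left trainable after a freeze_to(n)-style call: everything from index n on."""
    return groups[n:]

# Hypothetical names for the classifier's layer groups
groups = ["embedding", "lstm_1", "lstm_2", "lstm_3", "classifier_head"]

stages = [("frozen (default)", -1), ("freeze_to(-2)", -2),
          ("freeze_to(-3)", -3), ("unfreeze()", 0)]

for name, n in stages:
    print(f"{name:18s} -> {trainable_groups(groups, n)}")
```

Training the head first lets the randomly initialized classifier settle before the pretrained layers underneath are disturbed.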

And save!

In [None]:
learn_class.save('learn_class_unfreezed_final')