# Basic Model Setup
I am skipping the exploration of tokenization methods, etc., because I tried to complete this in Org mode but crashed my session trying to run the model on my laptop. For ease of use, I'm replicating this in a notebook and running it in Colab.

In [1]:
%%capture
pip install --upgrade fastai

In [19]:
from fastai.text.all import *
path=untar_data(URLs.IMDB)

In [3]:
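# Use every review, including the unlabeled 'unsup' folder, to train the language model
get_imdb = partial(get_text_files, folders = ['train', 'test', 'unsup'])

dls_lm = DataBlock(
    blocks=TextBlock.from_folder(path, is_lm=True),  # is_lm=True: the labels are the next tokens
    get_items=get_imdb, splitter=RandomSplitter(0.1)  # hold out 10% of texts for validation
).dataloaders(path, path=path, bs=128, seq_len=80)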

In [11]:
learn = language_model_learner(dls_lm,
                               AWD_LSTM,
                               drop_mult=0.3,
                               metrics=[accuracy,Perplexity()]).to_fp16()

In [12]:
learn.fit_one_cycle(1, 2e-2)

| epoch | train_loss | valid_loss | accuracy | perplexity | time |
|---|---|---|---|---|---|
| 0 | 4.121666 | 3.921251 | 0.299427 | 50.463512 | 21:43 |
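Perplexity is the exponential of the cross-entropy loss. The `valid_loss` and `perplexity` columns therefore carry the same information on different scales. You can confirm this with the epoch-0 validation loss from the table:

```python
import math

# Perplexity = exp(cross-entropy loss); this is the epoch-0 validation loss reported above
valid_loss = 3.921251
perplexity = math.exp(valid_loss)
print(f"{perplexity:.2f}")
```

The result matches the reported perplexity of about 50.46, up to rounding of the printed loss.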


Each epoch takes over 20 minutes, so it is good to save the model state periodically. I'm not yet sure how to download the saved file from Colab; I'll have to see.

In [21]:
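import shutil

learn.save('1epoch')  # writes 1epoch.pth into learn.path/learn.model_dir
# Copy the checkpoint out of the dataset folder so it is easier to find
mpath = learn.path/learn.model_dir/'1epoch.pth'
shutil.copyfile(mpath, Path('/root/1epoch.pth'))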

Path('/root/1epoch.pth')

In [None]:
learn = learn.load('1epoch')  # restore the checkpoint saved above
learn.unfreeze()              # train the whole model, not just the last layer group
learn.fit_one_cycle(10, 2e-3)

| epoch | train_loss | valid_loss | accuracy | perplexity | time |
|---|---|---|---|---|---|
| 0 | 3.890222 | 3.781897 | 0.316748 | 43.899223 | 22:40 |


Once this finishes, we can save the whole model *except the final layer that converts activations to probabilities of picking each token*. The model without this final layer is called the *encoder*, and fastai saves it with `learn.save_encoder('finetuned')`. We can use the encoder as the foundation for other models. In this example, it becomes a sentiment classifier.

# Using our Model to Generate Text

Our model is trained to predict the next word in a text, so we can use it to write new reviews. We just need to give it a prompt to start from.
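"Predicting the next word" means the targets are the input token stream shifted one position to the right. The sketch below shows this pairing in plain Python on a made-up token list with a tiny `seq_len`. It is not fastai's `LMDataLoader`, which also concatenates all texts and splits the stream into `bs` parallel pieces.

```python
# Hypothetical, already-tokenized text (xxbos marks the beginning of a stream in fastai)
tokens = ["xxbos", "i", "hated", "this", "movie", "because", "the", "plot"]
seq_len = 3  # illustrative only; the notebook uses seq_len=80

n = len(tokens) - 1  # the last token has no "next" token to predict
pairs = []
for start in range(0, n, seq_len):
    end = min(start + seq_len, n)
    x = tokens[start:end]          # model input
    y = tokens[start + 1:end + 1]  # target: the same chunk shifted by one token
    pairs.append((x, y))

for x, y in pairs:
    print(x, "->", y)
```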

In [None]:
TEXT = "I hated this movie because"
N_WORDS = 40
N_SENTENCES = 2
# Generate N_SENTENCES independent continuations of the prompt
preds = [learn.predict(TEXT, N_WORDS, temperature=0.75)
         for _ in range(N_SENTENCES)]

print("\n".join(preds))
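The `temperature` argument controls how adventurous the sampling is. Scaling the model's scores by 1/temperature before the softmax sharpens the distribution toward the most likely words when the temperature is below 1, and flattens it when the temperature is above 1. The sketch below uses made-up scores for three hypothetical candidate words. It illustrates the idea and is not fastai's exact implementation.

```python
import math
import random

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities after dividing them by the temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

words = ["boring", "long", "loud"]  # hypothetical candidate next words
logits = [2.0, 1.0, 0.1]            # made-up model scores

for t in (0.5, 0.75, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])

# Sample one next word at the temperature used in the notebook
random.seed(0)
next_word = random.choices(words, weights=softmax_with_temperature(logits, 0.75))[0]
print(next_word)
```

Lower temperatures give safer, more repetitive text. Higher ones give more varied but less coherent text.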

# Creating the Classifier DataLoaders
We're going to fine-tune our model for sentiment analysis of IMDb reviews. To do so, we need a new `DataLoaders` with a `CategoryBlock` that labels each review as positive or negative.

In [None]:
dls_clas = DataBlock(
    blocks = (TextBlock.from_folder(path, vocab=dls_lm.vocab), CategoryBlock),
    get_y = parent_label,
    get_items = partial(get_text_files, folders=['train','test']),
    splitter=GrandparentSplitter(valid_name='test')
).dataloaders(path,path=path,bs=128,seq_len=72)

In [None]:
dls_clas.show_batch(max_n=3)

Two important observations:
- We've removed `is_lm=True`. This tells fastai that we have "regular" labeled data and are *not* using the next token(s) as labels.
- We've passed the `vocab` created for the language model to the `DataBlock`. This ensures that tokens map to exactly the same indices as before. If we didn't, the already-trained language model's weights would not line up with the new model's inputs, and the fine-tuning step would be useless.
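Reusing the vocab matters because numericalization maps each token to its position in the vocab list. The same text therefore produces different indices under a different vocab, and the pretrained embedding rows end up matched to the wrong words. The toy illustration below uses two hypothetical vocabularies in plain Python.

```python
# Two hypothetical vocabularies with the same words in a different order
lm_vocab = ["xxunk", "xxpad", "the", "movie", "great", "awful"]
other_vocab = ["xxunk", "xxpad", "movie", "the", "awful", "great"]

def numericalize(tokens, vocab):
    """Map each token to its index in vocab, falling back to xxunk for unknown words."""
    index = {tok: i for i, tok in enumerate(vocab)}
    return [index.get(tok, index["xxunk"]) for tok in tokens]

tokens = ["the", "movie", "was", "great"]
print(numericalize(tokens, lm_vocab))     # [2, 3, 0, 4]
print(numericalize(tokens, other_vocab))  # [3, 2, 0, 5]
```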

## Batching Texts for the Classifier
PyTorch DataLoaders need to collate all the items in a batch into a single tensor, so every item must have the same shape. We saw this with the image models as well. There, we could resize, crop, zoom, etc., without harming the model. With text, we can rightly assume that distorting the input like that would lose essential information.

One technique used for images *is* still applicable here: padding. We expand the shorter texts to a common length by appending a special padding token. The target length is that of the longest document in each batch.

This is all done automatically when using a `TextBlock` with `is_lm` set to `False`.
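Here is a minimal sketch of padding a batch of numericalized texts to the length of its longest member. It assumes `1` is the padding index (the position of `xxpad` in the toy vocab above) and pads at the end of each text for simplicity. fastai's own collate function differs in the details, so treat this as an illustration of the idea only.

```python
PAD_IDX = 1  # assumption: index of the padding token in the vocab

def pad_batch(batch, pad_idx=PAD_IDX):
    """Right-pad every sequence in the batch to the length of the longest one."""
    max_len = max(len(seq) for seq in batch)
    return [seq + [pad_idx] * (max_len - len(seq)) for seq in batch]

# Three numericalized reviews of different lengths (made-up indices)
batch = [[2, 15, 8, 42], [2, 7], [2, 9, 30]]
padded = pad_batch(batch)
for row in padded:
    print(row)
```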

## Defining and Running the Model

In [None]:
learn = text_classifier_learner(dls_clas, AWD_LSTM, drop_mult=0.5,
                                metrics=accuracy).to_fp16()

# Load the encoder we fine-tuned and saved earlier
learn = learn.load_encoder('finetuned')

And now we can train the model!

In [None]:
learn.fit_one_cycle(1,2e-2)

In [None]:
learn.freeze_to(-2) # freeze all but the last two param groups
learn.fit_one_cycle(1, slice(1e-2/(2.6**4), 1e-2))
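Passing `slice(lo, hi)` as the learning rate gives each parameter group its own rate, spaced geometrically from `lo` for the earliest group to `hi` for the last. The division by `2.6**4` follows the ULMFiT recipe: each layer group's rate is 2.6 times smaller than the one above it, with four steps between five groups. The sketch below computes that spacing in plain Python. It assumes five parameter groups; `geometric_lrs` is a hypothetical helper, not a fastai function.

```python
import math

def geometric_lrs(start, stop, n):
    """Return n learning rates spaced geometrically from start to stop (inclusive)."""
    if n == 1:
        return [stop]
    step = (stop / start) ** (1 / (n - 1))
    return [start * step**i for i in range(n)]

n_groups = 5  # assumption: number of parameter groups in the classifier
top_lr = 1e-2
lrs = geometric_lrs(top_lr / 2.6**4, top_lr, n_groups)
for i, lr in enumerate(lrs):
    print(f"group {i}: {lr:.2e}")

# Each group's rate is 2.6x the rate of the group before it
ratios = [b / a for a, b in zip(lrs, lrs[1:])]
```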

In [None]:
learn.freeze_to(-3) # unfreeze a little more
learn.fit_one_cycle(1, slice(5e-3/(2.6**4), 5e-3))

In [None]:
learn.unfreeze() # unfreeze whole model
learn.fit_one_cycle(2, slice(1e-3/(2.6**4), 1e-3))
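The `freeze_to` and `unfreeze` calls above implement gradual unfreezing. We first train only the last parameter group(s), then progressively open up earlier ones. The plain-Python sketch below shows which groups are trainable at each stage. It assumes five parameter groups with made-up names, and it mimics how `freeze_to(n)` counts groups (a negative `n` counts from the end, and `unfreeze()` behaves like `freeze_to(0)`).

```python
# Hypothetical names for five parameter groups, earliest first
GROUPS = ["embedding", "lstm_1", "lstm_2", "lstm_3", "classifier_head"]

def trainable_after_freeze_to(n, groups=GROUPS):
    """Mimic freeze_to(n): groups before index n stay frozen; the rest are trainable."""
    frozen_idx = n if n >= 0 else len(groups) + n
    return groups[frozen_idx:]

stages = {"freeze()": -1, "freeze_to(-2)": -2, "freeze_to(-3)": -3, "unfreeze()": 0}
for name, n in stages.items():
    print(f"{name:15} trains {trainable_after_freeze_to(n)}")
```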

This approach, known as ULMFiT, was state of the art for IMDb sentiment classification only a few years ago.