# Basic Model Setup
I am skipping the exploration of tokenization methods, etc. I tried to complete this in Org mode but ultimately crashed my session trying to run the model on my laptop. For ease of use, I'm replicating this in a notebook and running it in Colab.

In [1]:
%%capture
pip install --upgrade fastai

In [19]:
from fastai.text.all import *
path=untar_data(URLs.IMDB)

In [3]:
get_imdb = partial(get_text_files, folders = ['train', 'test', 'unsup'])

dls_lm = DataBlock(
    blocks=TextBlock.from_folder(path, is_lm=True),
    get_items=get_imdb, splitter=RandomSplitter(0.1)
).dataloaders(path,path=path,bs=128,seq_len=80)

In [11]:
learn = language_model_learner(dls_lm,
                               AWD_LSTM,
                               drop_mult=0.3,
                               metrics=[accuracy,Perplexity()]).to_fp16()

In [12]:
learn.fit_one_cycle(1, 2e-2)

epoch,train_loss,valid_loss,accuracy,perplexity,time
0,4.121666,3.921251,0.299427,50.463512,21:43

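The `perplexity` column is just the exponential of `valid_loss` (assuming the loss is the mean cross-entropy in nats, which is my understanding of how fastai's `Perplexity` metric works). A quick sanity check against the row above:

```python
import math

valid_loss = 3.921251              # valid_loss from the training table above
perplexity = math.exp(valid_loss)  # perplexity = e ** (mean cross-entropy)
print(round(perplexity, 2))        # 50.46, matching the perplexity column
```

Roughly speaking, a perplexity of about 50 means the model is as uncertain as if it were choosing uniformly among 50 tokens at each step.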

This model takes a very long time to train (over 20 minutes per epoch here), so it is good to save the model state periodically. I still need to work out how to download the saved file from Colab.

In [21]:
learn.save('1epoch')
mpath = Path('/root/.fastai/data/imdb/models/1epoch.pth')
shutil.copyfile(mpath, Path('/root/1epoch.pth'))

Path('/root/1epoch.pth')

In [22]:
learn=learn.load('1epoch')
learn.unfreeze()
learn.fit_one_cycle(10,2e-3)

epoch,train_loss,valid_loss,accuracy,perplexity,time
0,3.890222,3.781897,0.316748,43.899223,22:40
1,3.814314,3.722776,0.324182,41.379108,23:02
2,3.741488,3.671556,0.329733,39.313019,23:08
3,3.674833,3.635363,0.334039,37.915607,23:19
4,3.609073,3.612023,0.337057,37.040897,23:08
5,3.548353,3.594466,0.339019,36.396259,22:53
6,3.495124,3.582324,0.340893,35.956993,23:00
7,3.456281,3.57667,0.341848,35.754269,23:05
8,3.415241,3.576691,0.342263,35.755047,23:12
9,3.382459,3.578967,0.342127,35.836506,22:53


In [23]:
learn.save('all_epoch')
mpath = Path('/root/.fastai/data/imdb/models/all_epoch.pth')
shutil.copyfile(mpath, Path('/root/all_epoch.pth'))

Path('/root/all_epoch.pth')

Once training is complete, we can save the whole model *except for the final layer that converts activations to probabilities of picking each token*. The model without the final layer is called the *encoder*. We can use it as the foundation for, in this example, a sentiment analyzer.

# Using our Model to Generate Text

Our model is trained to guess the next word of the sentence, so we can use it to write reviews. We just need to give it something to start with.

In [25]:
TEXT = "I hated this movie because"
N_WORDS = 40
N_SENTENCE = 2
preds = [learn.predict(TEXT, N_WORDS, temperature=0.75)
         for _ in range(N_SENTENCE)]

print("\n".join(preds))

i hated this movie because of the reason that it was a TV movie , perhaps because there was a fairly good review in here . In fact i was generally very disappointed with the movie because the acting was , i think
i hated this movie because it did n't make me laugh . It was totally deranged . When i first saw it , i could n't believe how stupid it was . i really mean that , and i do n't know how

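The `temperature` argument controls how adventurous the sampling is. Here is a minimal sketch of the usual idea, assuming temperature rescales the scores before the softmax. The function `softmax_with_temperature` and the toy scores are made up for illustration, and this is not fastai's actual implementation. Values below 1 sharpen the distribution toward the most likely token, while values above 1 flatten it.

```python
import math
import random

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities, rescaled by a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                       # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next tokens
tokens = ["it", "of", "the"]
logits = [2.0, 1.0, 0.5]

p_default = softmax_with_temperature(logits, 1.0)
p_cool = softmax_with_temperature(logits, 0.75)

# Sample one next token from the cooler (sharper) distribution
random.seed(0)
next_token = random.choices(tokens, weights=p_cool, k=1)[0]
print(p_default, p_cool, next_token)
```

With `temperature=0.75`, the top-scoring token gets a larger share of the probability than it does at 1.0, which is why the generated reviews stay fairly coherent.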

In [29]:
# Save the Encoder
learn.save_encoder('finetuned')

# Creating the Classifier DataLoaders
We're going to fine-tune our model for the task of sentiment analysis of IMDb reviews. To do so, we need to create a new `DataLoaders` with a `CategoryBlock` indicating the positivity/negativity of the reviews.

In [26]:
dls_clas = DataBlock(
    blocks = (TextBlock.from_folder(path, vocab=dls_lm.vocab), CategoryBlock),
    get_y = parent_label,
    get_items = partial(get_text_files, folders=['train','test']),
    splitter=GrandparentSplitter(valid_name='test')
).dataloaders(path,path=path,bs=128,seq_len=72)

In [27]:
dls_clas.show_batch(max_n=3)

Unnamed: 0,text,category
0,"xxbos xxmaj match 1 : xxmaj tag xxmaj team xxmaj table xxmaj match xxmaj bubba xxmaj ray and xxmaj spike xxmaj dudley vs xxmaj eddie xxmaj guerrero and xxmaj chris xxmaj benoit xxmaj bubba xxmaj ray and xxmaj spike xxmaj dudley started things off with a xxmaj tag xxmaj team xxmaj table xxmaj match against xxmaj eddie xxmaj guerrero and xxmaj chris xxmaj benoit . xxmaj according to the rules of the match , both opponents have to go through tables in order to get the win . xxmaj benoit and xxmaj guerrero heated up early on by taking turns hammering first xxmaj spike and then xxmaj bubba xxmaj ray . a xxmaj german xxunk by xxmaj benoit to xxmaj bubba took the wind out of the xxmaj dudley brother . xxmaj spike tried to help his brother , but the referee restrained him while xxmaj benoit and xxmaj guerrero",pos
1,xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad,pos
2,xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad,neg


Two important observations:
- We've removed `is_lm = True`. This tells the `TextBlock` that we have "regular" labeled data and that we're *not* using the next token(s) as labels.
- We've passed the `vocab` created for the language model to the datablock. This is to make sure we're using an identical correspondence of tokens to indices. If we failed to do so, the already-trained language model would not make sense to this new model and the fine-tuning step would be useless.
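
To see why the shared vocab matters, here is a toy sketch in plain Python. The helper `build_vocab` and the two tiny corpora are hypothetical, not fastai code. Two vocabs built independently assign different indices to the same token, so embeddings learned under one would be looked up at the wrong rows under the other.

```python
def build_vocab(tokens):
    """Map each distinct token to an index, in order of first appearance."""
    vocab = {}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab

lm_corpus = "this movie was great".split()
clas_corpus = "great movie this was".split()

lm_vocab = build_vocab(lm_corpus)      # vocab the language model was trained with
new_vocab = build_vocab(clas_corpus)   # vocab rebuilt from scratch for the classifier

sentence = "this movie was great".split()
ids_shared = [lm_vocab[t] for t in sentence]
ids_rebuilt = [new_vocab[t] for t in sentence]
print(ids_shared)   # [0, 1, 2, 3]
print(ids_rebuilt)  # [2, 1, 3, 0]
```

The same sentence produces different index sequences, so the pretrained encoder would effectively see scrambled input.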

## Batch Sizes for Language Models
PyTorch DataLoaders need to collate the items in a batch into a single tensor. We saw this with the image models as well. In that case, we could resize, crop, zoom, etc., without harming our model. In this case, we might rightly assume that essential information would be lost by so distorting our language input.

One technique used for images *is* still applicable here: padding. We want to expand the shorter texts so that every text in a batch has the same size. This is accomplished by adding a special padding token (`xxpad`, visible in the `show_batch` output above). The size of the largest document in each batch is the target size.

This is all done automatically when using a `TextBlock` with `is_lm` set to `False`.
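
A minimal sketch of the padding idea in plain Python. The helper `pad_batch` is hypothetical; fastai handles this during collation. It pads at the front by default, matching the leading `xxpad` tokens in the `show_batch` output above.

```python
def pad_batch(batch, pad_tok="xxpad", pad_first=True):
    """Pad every token list in the batch to the length of the longest one."""
    target = max(len(doc) for doc in batch)
    padded = []
    for doc in batch:
        filler = [pad_tok] * (target - len(doc))
        padded.append(filler + doc if pad_first else doc + filler)
    return padded

# Two toy "reviews" of different lengths
batch = [
    ["xxbos", "i", "loved", "it"],
    ["xxbos", "awful"],
]
padded = pad_batch(batch)
for doc in padded:
    print(doc)
```

After padding, every item has the same length, so the batch can be stacked into a single tensor.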

## Defining and Running the Model

In [30]:
learn = text_classifier_learner(dls_clas, AWD_LSTM, drop_mult=0.5,
                                metrics=accuracy).to_fp16()

# Also need to load the encoder we trained previously
learn = learn.load_encoder('finetuned')

And now we can train the model!

In [31]:
learn.fit_one_cycle(1,2e-2)

epoch,train_loss,valid_loss,accuracy,time
0,0.590335,0.371967,0.8362,01:08


In [32]:
learn.freeze_to(-2) # all except last two param groups
learn.fit_one_cycle(1, slice(1e-2/(2.6**4), 1e-2))

epoch,train_loss,valid_loss,accuracy,time
0,0.400393,0.289962,0.87852,01:14

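Passing a `slice` as the learning rate gives the earliest layers the smallest rate and the last layers the largest, with the groups in between spaced out. A rough sketch of that arithmetic, assuming geometric spacing and five parameter groups (both are assumptions on my part, and `spread_lrs` is a hypothetical helper, not a fastai function). Under those assumptions, dividing by `2.6**4` means each group trains 2.6 times faster than the one before it.

```python
def spread_lrs(lo, hi, n_groups):
    """Return n_groups learning rates spaced geometrically from lo to hi."""
    if n_groups == 1:
        return [hi]
    ratio = (hi / lo) ** (1 / (n_groups - 1))
    return [lo * ratio**i for i in range(n_groups)]

hi = 1e-2                                        # top learning rate from the cell above
lrs = spread_lrs(hi / (2.6**4), hi, n_groups=5)  # five groups is an assumption
for i, lr in enumerate(lrs):
    print(f"group {i}: {lr:.6f}")
```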

In [33]:
learn.freeze_to(-3) # unfreeze a little more
learn.fit_one_cycle(1, slice(5e-3/(2.6**4), 5e-3))

epoch,train_loss,valid_loss,accuracy,time
0,0.283482,0.221306,0.91184,01:32


In [34]:
learn.unfreeze() # unfreeze whole model
learn.fit_one_cycle(2, slice(1e-3/(2.6**4), 1e-3))

epoch,train_loss,valid_loss,accuracy,time
0,0.246715,0.211035,0.91688,01:53
1,0.225883,0.208368,0.91792,01:54


This model was state-of-the-art only a few years ago.