# ULMFiT - Universal Language Model Fine-tuning

In this ML exercise, we try to reach SOTA-level performance on sentiment analysis (text classification) in two stages. First we train a language model (i.e. a **text generation** model), including its word embeddings, on the IMDB reviews. Then we reuse that model's encoder as the encoder section of our classifier.

In [None]:
from fastai.imports import *
import fastai
from fastai.text.all import *

In [None]:
path = untar_data(URLs.IMDB)

In [None]:
path.ls()

(#8) [Path('/root/.fastai/data/imdb/README'),Path('/root/.fastai/data/imdb/train'),Path('/root/.fastai/data/imdb/tmp_lm'),Path('/root/.fastai/data/imdb/imdb.vocab'),Path('/root/.fastai/data/imdb/tmp_clas'),Path('/root/.fastai/data/imdb/unsup'),Path('/root/.fastai/data/imdb/test'),Path('/root/.fastai/data/imdb/models')]

In [None]:
# Use every review (labelled and unlabelled) for the language model
get_imdb = partial(get_text_files, folders=['train', 'test', 'unsup'])

In [None]:
dls_lm = DataBlock(blocks=[TextBlock.from_folder(path, is_lm=True)],
                   get_items= get_imdb, splitter=RandomSplitter(0.1),
                   ).dataloaders(path, path=path, bs=128, seq_len=80)

In [None]:
lm_learner = language_model_learner(dls_lm, AWD_LSTM, metrics=[accuracy,Perplexity()], path=path, wd=0.1).to_fp16()

In [None]:
lm_learner.fit_one_cycle(1,1e-3)


epoch,train_loss,valid_loss,accuracy,perplexity,time
0,4.440277,4.141591,0.283518,62.902824,32:44
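
The `perplexity` column is closely tied to `valid_loss`: perplexity is the exponential of the mean cross-entropy loss. A minimal sketch of that relationship, using only the numbers from the table above (the variable names are illustrative, not part of fastai):

```python
import math

# Validation loss reported for the single epoch above.
valid_loss = 4.141591

# Perplexity is exp(mean cross-entropy), with the loss measured in nats.
perplexity = math.exp(valid_loss)
print(f"perplexity = {perplexity:.4f}")
```

The printed value should land very close to the 62.90 in the table. Any small difference comes from the loss being rounded to six decimals in the output.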


FileNotFoundError: ignored

In [None]:
lm_learner.save('/content/drive/MyDrive/ULMFIT/1epoch')

Path('/content/drive/MyDrive/ULMFIT/1epoch.pth')

In [None]:
lm_learner.load('/content/drive/MyDrive/ULMFIT/1epoch')

<fastai.text.learner.LMLearner at 0x7fdb70ddba10>

In [None]:
# Unfreeze all layers and keep the checkpoint with the lowest validation perplexity
lm_learner.unfreeze()
saver = SaveModelCallback(monitor='perplexity', comp=np.less, fname='/content/drive/MyDrive/ULMFIT/lmmodel', with_opt=True)
lm_learner.fit_one_cycle(20, 1e-3, cbs=saver)


Better model found at epoch 0 with perplexity value: 53.528812408447266.


epoch,train_loss,valid_loss,accuracy,perplexity,time
0,4.26165,3.98022,0.297787,53.528812,34:39
1,4.138636,3.885958,0.306548,48.713585,34:30
2,4.035411,3.818329,0.312622,45.528061,34:28
3,3.96086,3.783111,0.315613,43.952568,34:38
4,3.941249,3.763521,0.31736,43.099934,34:39
5,3.905838,3.74822,0.319246,42.445461,34:43
6,3.89224,3.734706,0.320855,41.875721,34:39
7,3.870196,3.72213,0.321934,41.352364,34:46
8,3.853431,3.714268,0.323184,41.028561,34:42
9,3.822227,3.703742,0.324271,40.598923,34:46


Better model found at epoch 1 with perplexity value: 48.713584899902344.
Better model found at epoch 2 with perplexity value: 45.52806091308594.
Better model found at epoch 3 with perplexity value: 43.95256805419922.
Better model found at epoch 4 with perplexity value: 43.09993362426758.
Better model found at epoch 5 with perplexity value: 42.44546127319336.
Better model found at epoch 6 with perplexity value: 41.8757209777832.
Better model found at epoch 7 with perplexity value: 41.35236358642578.
Better model found at epoch 8 with perplexity value: 41.028560638427734.
Better model found at epoch 9 with perplexity value: 40.59892272949219.
Better model found at epoch 10 with perplexity value: 40.32080841064453.
Better model found at epoch 11 with perplexity value: 39.88899612426758.
Better model found at epoch 12 with perplexity value: 39.386478424072266.
Better model found at epoch 13 with perplexity value: 39.02882385253906.
Better model found at epoch 14 with perplexity value: 38.7

In [None]:
lm_learner.save_encoder('/content/drive/MyDrive/ULMFIT/encoder')

In [None]:
PRIMER = "I liked this movie"
N_WORDS=42
N_SENTENCES = 2
preds = [lm_learner.predict(PRIMER, N_WORDS, temperature=0.75) for _ in range(N_SENTENCES)]
preds

["i liked this movie and i enjoyed it . a lot of the acting was pretty good and the story is interesting . The one thing that i liked most was Jackie Chan 's performance . i must say that his acting is",
 "i liked this movie . i do n't think this movie was better than the original . But i guess it . If you liked Alien , Predator , Predator and Alien , you 'll get a bigger picture of"]
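
The `temperature=0.75` argument controls how adventurous the sampling is. The sketch below is a standalone illustration, not fastai's actual code. It assumes the temperature is applied by raising each next-token probability to the power `1/temperature` and renormalising, which is equivalent to dividing the logits by the temperature. The four-word distribution is made up for the example.

```python
def apply_temperature(probs, temperature):
    """Raise each probability to 1/temperature, then renormalise to sum to 1."""
    scaled = [p ** (1.0 / temperature) for p in probs]
    total = sum(scaled)
    return [s / total for s in scaled]

# Hypothetical next-token distribution over a 4-word vocabulary.
probs = [0.5, 0.3, 0.15, 0.05]

# temperature < 1 sharpens the distribution towards the most likely token.
cooled = apply_temperature(probs, 0.75)
for before, after in zip(probs, cooled):
    print(f"{before:.2f} -> {after:.3f}")
```

With a temperature below 1 the most likely token gains probability mass and the unlikely ones lose it, so the generated text is more conservative. A temperature above 1 does the opposite.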

In [None]:
dls_clas = DataBlock((TextBlock.from_folder(path, vocab=dls_lm.vocab), CategoryBlock),
                     get_items = partial(get_text_files, folders=['train','test']),
                     get_y = parent_label,
                     splitter = GrandparentSplitter(valid_name='test')).dataloaders(path, path=path, bs=128, seq_len=75)

In [None]:
dls_clas.show_batch(max_n=5)

,text,category
0,"xxbos xxmaj match 1 : xxmaj tag xxmaj team xxmaj table xxmaj match xxmaj bubba xxmaj ray and xxmaj spike xxmaj dudley vs xxmaj eddie xxmaj guerrero and xxmaj chris xxmaj benoit xxmaj bubba xxmaj ray and xxmaj spike xxmaj dudley started things off with a xxmaj tag xxmaj team xxmaj table xxmaj match against xxmaj eddie xxmaj guerrero and xxmaj chris xxmaj benoit . xxmaj according to the rules of the match , both opponents have to go through tables in order to get the win . xxmaj benoit and xxmaj guerrero heated up early on by taking turns hammering first xxmaj spike and then xxmaj bubba xxmaj ray . a xxmaj german xxunk by xxmaj benoit to xxmaj bubba took the wind out of the xxmaj dudley brother . xxmaj spike tried to help his brother , but the referee restrained him while xxmaj benoit and xxmaj guerrero",pos
1,"xxbos xxmaj this movie was recently released on xxup dvd in the xxup us and i finally got the chance to see this hard - to - find gem . xxmaj it even came with original theatrical previews of other xxmaj italian horror classics like "" xxunk "" and "" beyond xxup the xxup darkness "" . xxmaj unfortunately , the previews were the best thing about this movie . \n\n "" zombi 3 "" in a bizarre way is actually linked to the infamous xxmaj lucio xxmaj fulci "" zombie "" franchise which began in 1979 . xxmaj similarly compared to "" zombie "" , "" zombi 3 "" consists of a threadbare plot and a handful of extremely bad actors that keeps this ' horror ' trash barely afloat . xxmaj the gore is nearly non - existent ( unless one is frightened of people running around with",neg
2,"xxbos i thought that xxup rotj was clearly the best out of the three xxmaj star xxmaj wars movies . i find it surprising that xxup rotj is considered the weakest installment in the xxmaj trilogy by many who have voted . xxmaj to me it seemed like xxup rotj was the best because it had the most profound plot , the most suspense , surprises , most xxunk the ending ) and definitely the most episodic movie . i personally like the xxmaj empire xxmaj strikes xxmaj back a lot also but i think it is slightly less good than than xxup rotj since it was slower - moving , was not as episodic , and i just did not feel as much suspense or emotion as i did with the third movie . \n\n xxmaj it also seems like to me that after reading these surprising reviews that",pos
3,"xxbos xxmaj polish film maker xxmaj walerian xxmaj borowczyk 's xxmaj la xxmaj bête ( french , 1975 , aka xxmaj the xxmaj beast ) is among the most controversial and brave films ever made and a very excellent one too . xxmaj this film tells everything that 's generally been hidden and denied about our nature and our sexual nature in particular with the symbolism and silence of its images . xxmaj the images may look wild , perverse , "" sick "" or exciting , but they are all in relation with the lastly mentioned . xxmaj sex , desire and death are very strong and primary things and dominate all the flesh that has a human soul inside it . xxmaj they interest and xxunk us so powerfully ( and by our nature ) that they are considered scary , unacceptable and something too wild to be",pos
4,"xxbos xxmaj heavy - handed moralism . xxmaj writers using characters as mouthpieces to speak for themselves . xxmaj predictable , plodding plot points ( say that five times fast ) . a child 's imitation of xxmaj britney xxmaj spears . xxmaj this film has all the earmarks of a xxmaj lifetime xxmaj special reject . \n\n i honestly believe that xxmaj jesus xxmaj xxunk and xxmaj julia xxmaj xxunk set out to create a thought - provoking , emotional film on a tough subject , exploring the idea that things are not always black and white , that one who is a criminal by definition is not necessarily a bad human being , and that there can be extenuating circumstances , especially when one puts the well - being of a child first . xxmaj however , their earnestness ends up being channeled into preachy dialogue and trite",neg


In [None]:
classifier = text_classifier_learner(dls_clas, AWD_LSTM, seq_len=75, drop_mult=0.5, metrics=accuracy).to_fp16()

In [None]:
classifier.load_encoder('/content/drive/MyDrive/ULMFIT/encoder')

<fastai.text.learner.TextLearner at 0x7fd7d0844990>

In [None]:
classifier.fit_one_cycle(1, 2e-2)

epoch,train_loss,valid_loss,accuracy,time
0,0.217902,0.168112,0.93632,01:59


In [None]:
# Gradual unfreezing: train the last two layer groups with discriminative learning rates
classifier.freeze_to(-2)
classifier.fit_one_cycle(1, slice(1e-2/(2.6**4), 1e-2))

epoch,train_loss,valid_loss,accuracy,time
0,0.205713,0.159581,0.94152,02:09
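
The `slice(1e-2/(2.6**4), 1e-2)` argument sets discriminative learning rates: the lowest rate goes to the earliest layer group and the highest to the classifier head. The sketch below shows one way such a range can be spread across groups. It rests on two assumptions that are not confirmed by this notebook: that the rates are spaced geometrically, and that the AWD_LSTM classifier is split into 5 layer groups. `layer_group_lrs` is a hypothetical helper, not a fastai function.

```python
def layer_group_lrs(lo, hi, n_groups):
    """Geometrically spaced learning rates from lo (earliest group) to hi (head)."""
    step = (hi / lo) ** (1.0 / (n_groups - 1))
    return [lo * step ** i for i in range(n_groups)]

hi = 1e-2
lo = hi / (2.6 ** 4)

# n_groups=5 is an assumption about how the model is split into layer groups.
lrs = layer_group_lrs(lo, hi, n_groups=5)
for i, lr in enumerate(lrs):
    print(f"group {i}: lr = {lr:.6f}")
```

Under these assumptions each group trains with a rate 2.6 times larger than the group before it, which is why the lower bound is written as the upper bound divided by `2.6**4`.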


In [None]:
classifier.freeze_to(-3)
classifier.fit_one_cycle(1, slice(5e-3/(2.6**4),5e-3))

epoch,train_loss,valid_loss,accuracy,time
0,0.178594,0.150851,0.94496,02:55


In [None]:
classifier.unfreeze()
stahp = EarlyStoppingCallback(patience=2)
classifier.fit_one_cycle(10, slice(1e-3/(2.6**4),1e-3), cbs=stahp)

epoch,train_loss,valid_loss,accuracy,time
0,0.093066,0.150736,0.94596,03:32
1,0.087449,0.176858,0.94168,03:32
2,0.080246,0.157911,0.94668,03:32


No improvement since epoch 0: early stopping
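
The early stop after three epochs follows from `patience=2`. The sketch below replays the logic in plain Python on the validation losses from the table above. It assumes the callback monitors `valid_loss` and treats any strictly lower value as an improvement. `early_stop_epoch` is a hypothetical helper written for illustration, not fastai's implementation.

```python
def early_stop_epoch(losses, patience, min_delta=0.0):
    """Return (stop_epoch, best_epoch), or (None, None) if training never stops early."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(losses):
        if loss < best - min_delta:
            # New best value: remember it and reset the patience counter.
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, epoch - wait
    return None, None

# Validation losses of epochs 0-2 from the run above.
valid_losses = [0.150736, 0.176858, 0.157911]
stop_epoch, best_epoch = early_stop_epoch(valid_losses, patience=2)
print(f"stopped at epoch {stop_epoch}, no improvement since epoch {best_epoch}")
```

Note that accuracy at epoch 2 (0.94668) is the highest of the run even though its validation loss is not, so the choice of monitored metric affects which epoch counts as best.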


In [None]:
classifier.save('/content/drive/MyDrive/ULMFIT/classifier')

Path('/content/drive/MyDrive/ULMFIT/classifier.pth')