# Quick Start: Training an IMDb Sentiment model with ULMFiT
In this notebook, we will train a sentiment classifier on a sample of the popular IMDb dataset. We will cover:

1. Reading and viewing the IMDb data
2. Getting your data ready for modeling
3. Fine-tuning a language model
4. Building a classifier

In [2]:
from fastai.text import *

In [3]:
import torch

Unlike images in computer vision, raw text can't be fed directly into a model as numbers. The first thing we need to do is preprocess our data: we split the raw texts into lists of words, or tokens (a step called tokenization), then map these tokens to numbers (a step called numericalization).
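
To make the two steps concrete, here is a minimal sketch in plain Python. It is a toy, not fastai's tokenizer: the helper names (```tokenize```, ```build_vocab```, ```numericalize```) are hypothetical, splitting on whitespace is a simplifying assumption, and reserving id 0 for unknown tokens is our own choice for illustration.

```python
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()

def build_vocab(token_lists, min_freq=1):
    """Build index-to-token and token-to-index maps; id 0 is reserved for unknown tokens."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    itos = ["xxunk"] + sorted(t for t, c in counts.items() if c >= min_freq)
    stoi = {tok: i for i, tok in enumerate(itos)}
    return itos, stoi

def numericalize(tokens, stoi):
    """Replace each token by its integer id, falling back to 0 for unseen tokens."""
    return [stoi.get(tok, 0) for tok in tokens]

texts = ["This movie was great", "This movie was terrible"]
tokens = [tokenize(t) for t in texts]      # tokenization
itos, stoi = build_vocab(tokens)
ids = [numericalize(t, stoi) for t in tokens]  # numericalization

print(itos)  # ['xxunk', 'great', 'movie', 'terrible', 'this', 'was']
print(ids)   # [[4, 2, 5, 1], [4, 2, 5, 3]]
```

A word that never appeared when the vocabulary was built maps to id 0, so ```numericalize(["unseen"], stoi)``` returns ```[0]```.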

These numbers are then passed to embedding layers that will convert them into arrays of floats before passing them through a model. 
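
As a rough illustration of what an embedding layer does, the sketch below treats it as a lookup table with one row of floats per token id. The sizes, the random initial values, and the example id sequence are all assumptions made for the example; in a real model the rows are learned during training.

```python
import random

random.seed(0)
vocab_size, emb_dim = 6, 4  # hypothetical sizes, chosen only for illustration

# One row of floats per token id; training would adjust these values.
embedding = [[random.uniform(-0.1, 0.1) for _ in range(emb_dim)]
             for _ in range(vocab_size)]

def embed(ids):
    """Look up the vector for each token id."""
    return [embedding[i] for i in ids]

vectors = embed([4, 2, 5, 1])  # a hypothetical sequence of token ids
print(len(vectors), len(vectors[0]))
```

Each of the four ids becomes a vector of ```emb_dim``` floats, and it is this array of floats that the rest of the model consumes.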

Steps: 
1. Get your data preprocessed and ready to use
2. Create a language model with pretrained weights that you can fine-tune to your dataset
3. Create other models such as classifiers on top of the encoder of the language model

We will show an example using a sample of IMDb data: classifying a review as negative or positive given its text.

In [8]:
# Getting our sample from fastai
path = untar_data(URLs.IMDB_SAMPLE)

In [9]:
# Inspecting the data as a pandas DataFrame
df = pd.read_csv(path/'texts.csv')
df.head()

Unnamed: 0,label,text,is_valid
0,negative,Un-bleeping-believable! Meg Ryan doesn't even ...,False
1,positive,This is a extremely well-made film. The acting...,False
2,negative,Every once in a long while a movie will come a...,False
3,positive,Name just says it all. I watched this movie wi...,False
4,negative,This movie succeeds at being one of the most u...,False


In [10]:
# Using fastai to process the data

# Language model
data_lm = TextLMDataBunch.from_csv(path, 'texts.csv')

# Classification dataset
data_clas = TextClasDataBunch.from_csv(path, 'texts.csv', vocab=data_lm.train_ds.vocab, bs=32)

In [13]:
# This dataset is used to classify the sentiment of a given text
data_clas.show_batch(5)

text,target
"xxbos xxmaj xxunk xxmaj victor xxmaj xxunk : a xxmaj review \n \n xxmaj you know , xxmaj xxunk xxmaj victor xxmaj xxunk is like sticking your hands into a big , xxunk xxunk of xxunk . xxmaj it 's warm and xxunk , but you 're not sure if it feels right . xxmaj try as i might , no matter how warm and xxunk xxmaj xxunk xxmaj",negative
"xxbos xxmaj now that xxmaj che(2008 ) has finished its relatively short xxmaj australian cinema run ( extremely limited xxunk screen in xxmaj sydney , after xxunk ) , i can xxunk join both xxunk of "" xxmaj at xxmaj the xxmaj movies "" in taking xxmaj steven xxmaj soderbergh to task . \n \n xxmaj it 's usually satisfying to watch a film director change his style /",negative
"xxbos xxmaj this film sat on my xxmaj tivo for weeks before i watched it . i dreaded a self - indulgent xxunk flick about relationships gone bad . i was wrong ; this was an xxunk xxunk into the screwed - up xxunk of xxmaj new xxmaj yorkers . \n \n xxmaj the format is the same as xxmaj max xxmaj xxunk ' "" xxmaj la xxmaj ronde",positive
"xxbos xxmaj many neglect that this is n't just a classic due to the fact that it 's the first xxup 3d game , or even the first xxunk - up . xxmaj it 's also one of the first xxunk games , one of the xxunk definitely the first ) truly claustrophobic games , and just a pretty well - xxunk gaming experience in general . xxmaj with graphics",positive
"xxbos xxmaj to review this movie , i without any doubt would have to quote that memorable scene in xxmaj tarantino 's "" xxmaj pulp xxmaj fiction "" ( xxunk ) when xxmaj jules and xxmaj vincent are talking about xxmaj mia xxmaj wallace and what she does for a living . xxmaj jules tells xxmaj vincent that the "" xxmaj only thing she did worthwhile was pilot "" .",negative


In [14]:
# Reminder: the LM is only used to predict the next word, so it is an unsupervised generative model
data_lm.show_batch(5)

idx,text
0,"understand and i fell in love with xxmaj the xxmaj bourne xxmaj ultimatum before it had reached the xxunk ! i do n't think i have ever watched such an xxunk made , and gripping film , especially an action film . xxmaj since i usually shy away from action and thriller type movies , this was such great news to me . xxmaj ultimatum is one of the most"
1,"where he 's been and he says "" over there , "" pointing to the part of the xxunk never shown by the camera , before saying "" xxmaj hey , xxmaj mr. xxmaj turner , wait up ! "" and running off screen ( xxmaj mr. turner being another character who left ) xxmaj oh well - maybe there will be an e true xxmaj hollywood story on this"
2,"uncovers a big nest of ants . xxmaj later on we learn that , probably due to different sorts of xxunk used in the past , their bite became xxunk . xxmaj some people get bitten and xxunk to the hospital and it takes ages for the xxunk of the hospital to figure out what 's going on . xxmaj robert xxmaj xxunk figures it out first and then you"
3,"to xxmaj kill was a good solid psychological murder mystery . xxmaj the script is xxunk & slow at times but it likes to focus on the character 's so you really know them , the entire first twenty minutes is just developing xxmaj kate as a character before she is suddenly killed off , then the film switches it 's attentions to xxmaj liz & no one else gets"
4,""" xxmaj xxunk feel , "" the story taking place in a high - society xxmaj newport environment , in the days leading up to a wedding that is xxunk with trouble . \n \n xxmaj missed connections , wrong choices , and xxunk xxunk with social and family xxunk present quite a soap opera , but the quality of the writing , xxmaj koltai 's direction , and"
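
The reason a language model needs no labels is that its targets come from the text itself: the target sequence is the input sequence shifted by one token. A minimal sketch, assuming a stream of token ids and a window length that we call ```bptt``` here (the function ```lm_pairs``` is hypothetical, not part of fastai):

```python
def lm_pairs(ids, bptt):
    """Split a stream of token ids into (input, target) windows.

    Each target window is the input window shifted one position to the right,
    so the model learns to predict the next token at every position.
    """
    pairs = []
    for start in range(0, len(ids) - 1, bptt):
        y = ids[start + 1:start + 1 + bptt]
        x = ids[start:start + len(y)]  # trim the last window so x and y align
        pairs.append((x, y))
    return pairs

stream = [10, 11, 12, 13, 14, 15, 16]  # hypothetical token ids
for x, y in lm_pairs(stream, bptt=3):
    print(x, "->", y)
# [10, 11, 12] -> [11, 12, 13]
# [13, 14, 15] -> [14, 15, 16]
```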


In [15]:
# Saving the language model ds and classification model ds
data_lm.save('data_lm_export.pkl')
data_clas.save('data_clas_export.pkl')

## Fine-tuning a language model
We can use the ```data_lm``` object we created earlier to fine-tune a pretrained language model. We will use the ```AWD_LSTM``` architecture. The learner object below creates the model, downloads the pretrained weights, and is ready for fine-tuning.

In [20]:
# Instantiating our model
learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.5).to_fp16()

In [21]:
# training for one epoch
learn.fit_one_cycle(1, 1e-2)

epoch,train_loss,valid_loss,accuracy,time
0,4.317602,3.909073,0.285119,00:04
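
The call to ```fit_one_cycle``` follows the one-cycle policy: the learning rate ramps up to the maximum we pass in (here ```1e-2```), then anneals back down. The sketch below shows the general shape only. It assumes cosine interpolation and the values ```pct_start=0.3```, ```div_factor=25``` and ```final_div=25e4```, which are illustrative assumptions rather than settings read from this notebook, and ```one_cycle_lrs``` is our own hypothetical helper, not a fastai function.

```python
import math

def cos_anneal(start, end, pct):
    """Cosine interpolation from start (pct=0) to end (pct=1)."""
    return end + (start - end) / 2 * (math.cos(math.pi * pct) + 1)

def one_cycle_lrs(lr_max, n_iter, pct_start=0.3, div_factor=25.0, final_div=25.0e4):
    """Return one learning rate per iteration: warm up to lr_max, then anneal down."""
    warm = int(n_iter * pct_start)  # number of warm-up iterations
    lrs = []
    for i in range(n_iter):
        if i < warm:
            # Warm-up phase: lr_max / div_factor -> lr_max
            lrs.append(cos_anneal(lr_max / div_factor, lr_max, i / warm))
        else:
            # Annealing phase: lr_max -> lr_max / final_div
            pct = (i - warm) / (n_iter - warm)
            lrs.append(cos_anneal(lr_max, lr_max / final_div, pct))
    return lrs

lrs = one_cycle_lrs(1e-2, n_iter=10)
print(["%.5f" % lr for lr in lrs])
```

With ten iterations the rate starts well below ```1e-2```, peaks at the fourth iteration, and decays for the rest of the cycle.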


In [22]:
# Let's save the model, and save the encoder of the model to use for classifying
learn.save('ft')
learn.save_encoder('ft_enc')

## Building a Classifier
We will be using the ```data_clas``` object, which is the dataset for classifying sentiment. Since we are instantiating a new model with ```AWD_LSTM```, we will load the ```encoder``` that we saved from the language model we fine-tuned above.

In [23]:
learn = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5).to_fp16()
learn.load_encoder('ft_enc') # grabbing our encoder
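
Conceptually, loading the encoder copies the language model's encoder weights into the classifier and leaves the classifier's own head untouched. The toy sketch below uses plain dictionaries to show the idea and why both models must share the same architecture. It is not fastai's implementation: the key names and the helper ```load_encoder_weights``` are hypothetical.

```python
def load_encoder_weights(classifier_state, lm_state, prefix="encoder."):
    """Copy every entry whose key starts with prefix from lm_state into classifier_state."""
    for key, value in lm_state.items():
        if key.startswith(prefix):
            if key not in classifier_state:
                # A different architecture would have different encoder parameters.
                raise KeyError(f"architecture mismatch: {key!r} missing in classifier")
            classifier_state[key] = value
    return classifier_state

# Hypothetical weights: the LM has an encoder plus a next-word decoder,
# the classifier has the same encoder plus a classification head.
lm_state = {"encoder.embedding": [0.1, 0.2], "encoder.rnn": [0.3], "decoder.out": [0.9]}
clf_state = {"encoder.embedding": [0.0, 0.0], "encoder.rnn": [0.0], "head.linear": [0.5]}

load_encoder_weights(clf_state, lm_state)
print(clf_state)
```

Only the encoder entries change; the decoder of the language model is discarded, and the classification head still has to be trained.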

In [24]:
learn.fit_one_cycle(1, 1e-2)

epoch,train_loss,valid_loss,accuracy,time
0,0.64908,0.65982,0.621891,01:15


In [25]:
# Unfreezing to fine-tune more
learn.unfreeze()

In [26]:
learn.fit_one_cycle(3, slice(1e-4, 1e-2))

epoch,train_loss,valid_loss,accuracy,time
0,0.502766,0.412625,0.850746,01:20
1,0.427475,0.410037,0.825871,01:23
2,0.339051,0.351541,0.850746,01:24
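
Passing ```slice(1e-4, 1e-2)``` asks for discriminative learning rates: the earliest layer groups train with the smallest rate and the final group with the largest. A minimal sketch of how such a range can be spread across groups, assuming log-even spacing and five layer groups (both are assumptions for illustration, and ```log_spaced_lrs``` is a hypothetical helper):

```python
def log_spaced_lrs(start, stop, n_groups):
    """Return n_groups learning rates from start to stop, evenly spaced on a log scale."""
    if n_groups == 1:
        return [stop]
    step = (stop / start) ** (1 / (n_groups - 1))  # constant multiplier between groups
    return [start * step ** i for i in range(n_groups)]

group_lrs = log_spaced_lrs(1e-4, 1e-2, n_groups=5)
for i, lr in enumerate(group_lrs):
    print(f"layer group {i}: lr = {lr:.2e}")
```

The idea is that the pretrained early layers hold general language knowledge and should change slowly, while the freshly initialized head can move faster.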


## Conclusion
As you can see, the typical workflow is the following:
1. Preprocess your datasets
    * One for the language model
    * One for classification
2. Get a pre-trained language model (using any architecture of your choosing)
3. Fine-tune this pre-trained language model on the language model dataset 
    * Find an optimal learning rate
    * Run ```fit_one_cycle``` until the validation loss stops improving
    * Unfreeze to train all layers
        * Use ```slice``` for the learning rates, with the same ```fit_one_cycle```
4. Save the model and the encoder of that language model
5. Create a new model (it has to use the same pre-trained model/architecture)
6. Load the encoder
7. Repeat step 3, but now for the text classifier