# AWD-LSTM (no pretraining)

In this notebook we train an AWD-LSTM model for the proxy task without any language model pretraining. The notebook is adapted from the fast.ai [ULMFiT tutorial](https://github.com/fastai/course-nlp/blob/master/nn-vietnamese.ipynb).

In [None]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

from fastai import *
from fastai.text import *
import glob
import eval_models

In [None]:
bs = 48  # batch size

In [None]:
torch.cuda.set_device(0)

In [None]:
data_path = Config.data_path()

In [None]:
name = 'bscore_lm'
path = data_path/name
path.mkdir(exist_ok=True, parents=True)

### Train Classifier

In [None]:
train_df = pd.read_csv(path/'train64.csv')
valid_df = pd.read_csv(path/'valid64.csv')
test_df = pd.read_csv(path/'test64.csv')

In [None]:
# Disable fastai's default pre- and post-processing rules so tokens pass through unchanged
basicTokenizer = Tokenizer(pre_rules=[], post_rules=[])
data_clas = TextDataBunch.from_df(path, train_df, valid_df, tokenizer=basicTokenizer, bs=bs, num_workers=1)

In [None]:
len(data_clas.vocab.itos)

In [None]:
learn_c = text_classifier_learner(data_clas, AWD_LSTM, pretrained=False, drop_mult=0.5, 
                                  metrics=[accuracy, FBeta(average='macro', beta=1)])

In [None]:
learn_c.lr_find()

In [None]:
learn_c.recorder.plot()

In [None]:
lr = 3e-4

In [None]:
learn_c.fit_one_cycle(1, lr, moms=(0.8,0.7))

In [None]:
learn_c.fit_one_cycle(2, lr, moms=(0.8,0.7))

In [None]:
learn_c.fit_one_cycle(2, lr, moms=(0.8,0.7))

In [None]:
mdl_path = path/'models'
mdl_path.mkdir(exist_ok=True, parents=True)

In [None]:
learn_c.save(mdl_path/'awdlstm_clas-verify')

In [None]:
# Note: this loads 'awdlstm_clas', not the 'awdlstm_clas-verify' checkpoint saved above
learn_c = learn_c.load(mdl_path/'awdlstm_clas')

### Evaluate Classifier

Evaluate on the proxy task: classifying fixed-length chunks of bootleg score features.

In [None]:
data_clas_test = TextDataBunch.from_df(path, train_df, test_df, tokenizer=basicTokenizer, bs=bs, num_workers=1)

In [None]:
learn_c.validate(data_clas_test.valid_dl)

Evaluate on the original task: classifying pages of sheet music. We can evaluate our models in two ways:
- applying the model to a single variable-length sequence
- applying the model to multiple fixed-length windows and averaging the predictions

First, we evaluate the model on variable-length inputs and report results with and without applying priors.
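
This notebook does not show how `eval_models` applies priors. One common approach, sketched here purely as an assumption, reweights each predicted class distribution by class frequencies estimated from the training labels and then renormalizes. The names `apply_prior`, `probs`, and `train_labels` are hypothetical and are not taken from `eval_models`.

```python
import numpy as np

def apply_prior(probs, train_labels, n_classes):
    """Reweight class probabilities by an empirical class prior (illustrative only)."""
    # Estimate the prior from how often each class appears in the training labels.
    counts = np.bincount(train_labels, minlength=n_classes).astype(float)
    prior = counts / counts.sum()
    # Multiply each prediction by the prior, then renormalize each row to sum to 1.
    weighted = probs * prior
    return weighted / weighted.sum(axis=-1, keepdims=True)

# Toy data: two predictions over two classes; class 0 is three times as frequent.
probs = np.array([[0.5, 0.5], [0.9, 0.1]])
train_labels = np.array([0, 0, 0, 1])
adjusted = apply_prior(probs, train_labels, n_classes=2)
```

In this toy case the first prediction is a tie, so the prior decides it in favor of the more frequent class.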

In [None]:
train_fullpage_df = pd.read_csv(path/'train.fullpage.csv')
valid_fullpage_df = pd.read_csv(path/'valid.fullpage.csv')
test_fullpage_df = pd.read_csv(path/'test.fullpage.csv')

In [None]:
(acc, acc_with_prior), (f1, f1_with_prior) = eval_models.calcAccuracy_fullpage(learn_c, path, train_fullpage_df, valid_fullpage_df, test_fullpage_df)
(acc, acc_with_prior), (f1, f1_with_prior)

Now we evaluate the model by considering multiple fixed-length windows and averaging the predictions.
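
The windowed evaluation can be pictured with the minimal sketch below. It assumes that a page is cut into overlapping fixed-length windows and that the per-window class probabilities are averaged with equal weight. The helper names, the window length, and the hop size are hypothetical, and `eval_models` may do this differently.

```python
import numpy as np

def split_into_windows(tokens, window_len, hop):
    """Cut a token sequence into fixed-length windows (illustrative only)."""
    # A sequence no longer than one window is returned as a single window.
    if len(tokens) <= window_len:
        return [tokens]
    starts = range(0, len(tokens) - window_len + 1, hop)
    return [tokens[s:s + window_len] for s in starts]

def ensemble_predict(window_probs):
    """Average per-window class probabilities and pick the most likely class."""
    mean_probs = np.mean(window_probs, axis=0)
    return mean_probs, int(np.argmax(mean_probs))

# Toy data: a 10-token "page" and made-up probabilities for three windows.
windows = split_into_windows(list(range(10)), window_len=4, hop=2)
window_probs = np.array([[0.6, 0.4], [0.2, 0.8], [0.3, 0.7]])
mean_probs, predicted = ensemble_predict(window_probs)
```

Averaging probabilities rather than taking a majority vote lets confident windows count for more than uncertain ones.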

In [None]:
test_ensemble_df = pd.read_csv(path/'test.ensemble64.csv')

In [None]:
(acc, acc_with_prior), (f1, f1_with_prior) = eval_models.calcAccuracy_fullpage(learn_c, path, train_fullpage_df, valid_fullpage_df, test_ensemble_df, ensembled=True)
(acc, acc_with_prior), (f1, f1_with_prior)

### Error Analysis

In [None]:
# By default, the interpretation is computed on the learner's validation set
interp = ClassificationInterpretation.from_learner(learn_c)

In [None]:
interp.plot_confusion_matrix(figsize=(12,12))