Slow unsupervised training #107
Thank you for your library; the supervised finetuning works very well. However, when I try to train on unlabelled data (model.fit(unlabeledX)), training is much slower (9 s/it) than supervised training (1.7 s/it). This is on one K80 GPU. I am not sure why unsupervised training is slower -- doesn't supervised training tune the language model as well?
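For reference, a minimal sketch of the two calls being compared (trainX, trainY, and unlabeledX are placeholders for the actual data; the timings in the comments are the ones reported above):

```python
from finetune import Classifier

model = Classifier()

# Supervised fine-tuning: features + targets (~1.7 s/it reported above)
model.fit(trainX, trainY)

# Unsupervised training: features only (~9 s/it reported above)
model.fit(unlabeledX)
```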
You are correct that this is what the paper states. The default for this repo, however, is to not tune the language model: we found it to have a negative effect at the dataset sizes we are interested in (hundreds or low thousands of samples). That said, I am extremely surprised that you are seeing this amount of slowdown. An interesting test would be to change lm_loss_coef in the config to a non-zero value and see if you still see the slowdown. Can I ask which variant of the model you are using? This could be possible with large batch sizes or sequence lengths, as the language model computes a full projection for each token in each sequence. Are you running on both processors in the K80 or just one? I will attempt to reproduce if you can give me some more info.
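A sketch of the suggested test, assuming (as the rest of this thread implies) that finetune accepts config overrides such as lm_loss_coef as keyword arguments to the model constructor:

```python
from finetune import Classifier

# LM objective disabled (the repo default described above)
model_no_lm = Classifier(lm_loss_coef=0.0)
model_no_lm.fit(unlabeledX)

# LM objective enabled -- compare per-iteration timings against the above
model_with_lm = Classifier(lm_loss_coef=0.5)
model_with_lm.fit(unlabeledX)
```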
Thank you for the reply. I'm confused: isn't the default lm_loss_coef 0.5 for both unsupervised and supervised? I assumed that meant the language model was being trained as well. I'm currently using the Classifier. I can't share the actual data, but I have been able to reproduce the slowdown in this demo Colab notebook: https://colab.research.google.com/drive/1M0XAbGicO8-Vtn01Tw9UhCd4NCz43IiQ I have tried varying lm_loss_coef, to no avail, unfortunately. I used a batch size of 10, but the default of 2 did not affect the timings.
Hi @chiayewken -- you may be encountering a performance bug that I think was part of our last PyPI release. Could you try installing from source? I think you will likely see a significant speedup. In the meantime I'll make sure to update our Python package -- if you'd prefer, release 0.3.1 will be up on PyPI in ~30 minutes. The default was originally 0.5 but was changed to 0.0 in recent commits as a result of quite a bit of empirical testing. The crossover point will vary from dataset to dataset and task to task, but expect to need a few thousand examples before it's useful to turn on lm_loss_coef. Thanks for the bug report!
Ah, I see! Thank you, I will try installing from source and check out the results.
Just checked -- the performance is way faster now! Thank you! (I did notice I had to reduce the batch size to avoid OOM issues, though.) On another topic: with the default lm_loss_coef of 0.0, I'm wondering whether the LM is being trained at all at this point, because there is this:
Edit: sorry, I stand corrected: the checkpoints were saved. I'm still unclear on the state of unsupervised training in this repository in general, though. I'll find out how the training goes when it's done!
Awesome, glad to hear it! The point of setting lm_loss_coef to 0.0 by default is, in fact, to prevent fitting the language model. The softmax over tokens is expensive, so this helps speed things up. If you're working with a dataset of 5k examples or more, you can manually set it to something other than 0.0 (probably 0.5), but we've found that at low data volumes, or on particularly difficult classification tasks, classifier performance degrades because the model can reduce the loss more easily by simply modeling language better. Unsupervised training of the LM only is fully supported by finetune, with the exception that we need to fix #83 before the auto-checkpointing functionality will work. Note that this is not something that was tested in the original OpenAI paper; it was an addition after the fact. We have good reason to believe it will help improve classifier performance with large amounts of unlabeled data, but we haven't run many experiments to prove this out. Curious to hear how the training goes!
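A sketch of that unsupervised-then-supervised flow, assuming finetune's save/load round-trips the tuned weights; the checkpoint path is illustrative:

```python
from finetune import Classifier

# Train the language model alone on unlabelled text
lm = Classifier()
lm.fit(unlabeledX)            # X only: language-model objective
lm.save('lm_checkpoint')      # illustrative path

# Load the tuned LM into a fresh Classifier and fit on labelled data
clf = Classifier.load('lm_checkpoint')
clf.fit(trainX, trainY)
```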
After leaving the language model to train unsupervised for a while, I was surprised to find that loading that model into a Classifier and then fitting on my labelled data didn't improve my validation scores. On the other hand, I found that for purely supervised training, lm_loss_coef=0.5 produces better results than 0.0, though this is likely influenced by my dataset size, as you said. My current best setup is a semi-supervised pseudo-labelling loop with lm_loss_coef=0.5, which gives a slight improvement over training only on labelled examples. I need to run more experiments... Many thanks!
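For concreteness, a rough sketch of the pseudo-labelling loop described above; the confidence threshold, the number of rounds, and the assumption that predict_proba returns one class-to-probability mapping per example are all illustrative:

```python
from finetune import Classifier

model = Classifier(lm_loss_coef=0.5)   # dual objective, as discussed
model.fit(labeledX, labeledY)

for _ in range(3):                      # a few self-training rounds
    pseudoX, pseudoY = [], []
    for x, probs in zip(unlabeledX, model.predict_proba(unlabeledX)):
        label, confidence = max(probs.items(), key=lambda kv: kv[1])
        if confidence > 0.9:            # keep only confident predictions
            pseudoX.append(x)
            pseudoY.append(label)
    # Refit from scratch on labelled + pseudo-labelled examples
    model = Classifier(lm_loss_coef=0.5)
    model.fit(labeledX + pseudoX, labeledY + pseudoY)
```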
Curious that the dual objectives helped while the unsupervised training did not. Will make sure to ping you if we find anything interesting while testing out unsupervised fit on our own internal datasets.
Great, thanks! This repository is awesome :)