Skip to content
This repository has been archived by the owner on Feb 25, 2022. It is now read-only.

Efficient Sampling #32

Closed
sdtblck opened this issue Sep 8, 2020 · 6 comments · Fixed by #43
Closed

Efficient Sampling #32

sdtblck opened this issue Sep 8, 2020 · 6 comments · Fixed by #43
Assignees
Labels
feature request A feature that isn't implemented yet. research Research questions for discussion.
Projects
Milestone

Comments

@sdtblck
Copy link
Collaborator

sdtblck commented Sep 8, 2020

Currently our sampling is incredibly inefficient, doesn't store past values for k / v, and instead recomputes them for every token.

We should look again at how sampling is done in mesh / T5 (maybe ask Colin?) and see if we can store k/v values to increase the efficiency of our sampling code.

The basic infrastructure for this is already in place, but commented out (the k/v values will be stored in this Context object: https://github.com/EleutherAI/GPTNeo/blob/master/sample.py#L148, https://github.com/EleutherAI/GPTNeo/blob/master/models/gpt2/gpt2.py#L138). The problem we ran into last time was that in the mesh code they seem to feed in the inputs a token at a time, but the token is missing a batch dimension, and we're not exactly sure where batch gets added on, and therefore how to replicate this. (see: https://github.com/tensorflow/mesh/blob/a3c05f705641dfe144f70b7b5230db4933ce8ca9/mesh_tensorflow/transformer/transformer.py#L1137)

This is (the last?) issue I'd like to get solved before the code is released publicly, so I will try to look into this soon, but any help would be appreciated.

@StellaAthena StellaAthena added feature request A feature that isn't implemented yet. research Research questions for discussion. labels Sep 8, 2020
@StellaAthena StellaAthena added this to To do in 1T or Bust via automation Sep 8, 2020
@Mistobaan Mistobaan added this to the v1 milestone Sep 10, 2020
@Mistobaan
Copy link
Collaborator

@shawwn is your pr #28 related to this?

@StellaAthena
Copy link
Member

StellaAthena commented Sep 11, 2020

Per Stack Overflow, apparently you cannot have multiple assignees to a private repo. No idea why.

@Mistobaan @lucidrains should assure you both get notifications.

@lucidrains lucidrains linked a pull request Sep 12, 2020 that will close this issue
4 tasks
1T or Bust automation moved this from To do to Done Sep 12, 2020
@sdtblck sdtblck reopened this Sep 16, 2020
1T or Bust automation moved this from Done to To do Sep 16, 2020
@sdtblck
Copy link
Collaborator Author

sdtblck commented Sep 16, 2020

reopening this because there are still some problems

  1. at least with the models I've tested, 'fast sampling' seems to be of significantly worse quality than slow sampling. (see sample screenshots below)

Slow sampling:
slow

Fast sampling:
fast
It looks ok on the surface but is notably more nonsensical and more often than not devolves into repetition:
fast_repetition

  1. fast sampling does not work with larger TPU pods - I will post the stacktrace for this shortly (whenever we have a spare pod 😆 )

I'm happy to look into this if you're busy @lucidrains . I suspect 1) is just an off by one error somewhere with the attention mask, maybe. 2) might be a little trickier but I'd be happy to look into this one, since TPUs might be the one area where I can claim to have more experience, haha. I think it may be a shape error relating to the mtf splitting.

@sdtblck
Copy link
Collaborator Author

sdtblck commented Sep 17, 2020

After a bit more digging, it seems like the sample quality is only worse with MOE models. I'm going to keep investigating as to why.

@sdtblck
Copy link
Collaborator Author

sdtblck commented Sep 17, 2020

Ok, for now we have set fast sampling as the default, but revert to slow sampling for moe models.

Error no. 2 wasn't in fact an error with larger pods, or models, but was an error with fast sampling when recompute_grad was set to True. This is fixed here: 8415e56

so what's left now is to 1) figure out what's wrong with moe and 2) try out step 4 here: #43

@sdtblck sdtblck closed this as completed Nov 1, 2020
1T or Bust automation moved this from To do to Done Nov 1, 2020
@sdtblck sdtblck reopened this Nov 1, 2020
1T or Bust automation moved this from Done to To do Nov 1, 2020
@sdtblck
Copy link
Collaborator Author

sdtblck commented Dec 10, 2020

closing this issue to open a more relevant one w/r/t moe

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
feature request A feature that isn't implemented yet. research Research questions for discussion.
Projects
1T or Bust
  
Done
Development

Successfully merging a pull request may close this issue.

4 participants