Distill megatron - test Draft WIP #352
Closed
younesbelkada wants to merge 175 commits into bigscience-workshop:main
Conversation
An attempt to perform knowledge distillation using Megatron-DeepSpeed
Disclaimer: this is a super ugly version of the code; the PR is here to compare the original code with this modified version. For now I don't plan to merge this PR.
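For context, a minimal sketch of the kind of distillation objective this attempt targets: a temperature-scaled KL divergence between teacher and student logits. The function name, the temperature value, and the tensor shapes are illustrative assumptions, not code from this branch.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Temperature-scaled KL divergence between teacher and student distributions.

    Both logits tensors are assumed to have shape [batch, seq_len, vocab_size].
    """
    vocab_size = student_logits.size(-1)
    # Soften both distributions with the temperature, flatten to [tokens, vocab].
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1).reshape(-1, vocab_size)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1).reshape(-1, vocab_size)
    # Scale by T^2 so the gradient magnitude stays comparable to the hard-label loss.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature ** 2
```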
Updates on 28.09.2022
This version is very ugly: I had to add a `student` argument on all Megatron modules, since the arguments are directly retrieved from the global variable. The other solution could be to rewrite each class with a `Student` suffix, e.g. `GPTModelStudentPipe`. I preferred the first solution to get a quick working implementation. The forward and backward pass seems to run for the student model; for now I am not computing the teacher's logits.
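A minimal sketch of what threading that flag through could look like. The `_Args` stand-in mimics Megatron's global `get_args()` namespace, and the `student_hidden_size` field, the `student` keyword, and the `ParallelMLP` stand-in are all assumptions for illustration, not the actual code in this branch.

```python
import torch

# Stand-in for Megatron's global argument namespace (the real code uses
# `from megatron import get_args`). `student_hidden_size` is a hypothetical
# extra field the distillation setup would add.
class _Args:
    hidden_size = 1024          # teacher width
    student_hidden_size = 512   # hypothetical student width

def get_args():
    return _Args

class ParallelMLP(torch.nn.Module):
    """Simplified stand-in for a Megatron module that reads its sizes from the global args."""

    def __init__(self, student=False):
        super().__init__()
        args = get_args()
        # The extra `student` flag decides whether this instance is built with the
        # student or the teacher hyper-parameters, because both models end up
        # sharing the same global args object.
        hidden = args.student_hidden_size if student else args.hidden_size
        self.dense_h_to_4h = torch.nn.Linear(hidden, 4 * hidden)
        self.dense_4h_to_h = torch.nn.Linear(4 * hidden, hidden)

    def forward(self, hidden_states):
        return self.dense_4h_to_h(torch.nn.functional.gelu(self.dense_h_to_4h(hidden_states)))
```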
Two solutions for that:

1. In `distill_train_step`, add a step where we retrieve the teacher's logits. In this case, would we need to change the `deepspeed.PipelineEngine` internals?
2. Store the embedding layer of the teacher model inside the student model and gather the last hidden states of the teacher model. Once these are gathered, apply the forward pass with the embedding layer of the teacher model to get the logits (a rough sketch follows this list). (cc @thomasw21 as discussed offline)
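A rough sketch of what option 2 could look like, ignoring pipeline and tensor parallelism (how the teacher hidden states are gathered, vocab-parallel logits, etc.). The class and argument names are hypothetical; the only fixed idea is that the student keeps a frozen copy of the teacher's tied word-embedding weight and reuses it as the teacher's LM head.

```python
import torch
import torch.nn.functional as F

class StudentWithTeacherHead(torch.nn.Module):
    """Illustration of option 2: the student carries a frozen copy of the teacher's
    word-embedding weight so it can turn gathered teacher hidden states into
    teacher logits locally (tied LM head)."""

    def __init__(self, student_model, teacher_word_embedding_weight):
        super().__init__()
        self.student = student_model
        # Frozen copy of the teacher's tied embedding, shape [vocab_size, teacher_hidden].
        self.register_buffer(
            "teacher_embed_weight", teacher_word_embedding_weight.detach().clone()
        )

    def teacher_logits(self, teacher_hidden_states):
        # logits = hidden_states @ E^T, as with a tied output embedding.
        return F.linear(teacher_hidden_states, self.teacher_embed_weight)

    def forward(self, tokens, teacher_hidden_states):
        student_logits = self.student(tokens)
        with torch.no_grad():
            teacher_logits = self.teacher_logits(teacher_hidden_states)
        return student_logits, teacher_logits
```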
main TODOs