Fix for #2116 When all modules of a layer excluded there is an error in forward replay #2117
base: main
Conversation
The error is a state bug in logbar, so I need to fix that; it is unrelated to gpt-qmodel. I will check and merge this PR soon!
@avtc There is an issue here. For example, if we have a normal moe layer:
Layer subsets (1) and optionally subset (2) are forced to execute serially when they could still execute under the faster data-parallelized path. Can you debug and print out the subsets and also the modules list when you now do the is_moe check?
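A minimal sketch of the kind of debug print being requested, assuming hypothetical names (`subsets` as a list of per-subset module-name lists and an `is_moe` flag); none of these identifiers are taken from the GPTQModel source:

```python
# Hypothetical debug helper: print the per-layer subsets and the flat module
# list at the point where the is_moe check is made. Names are illustrative.
def debug_print_subsets(layer_index: int, subsets: list[list[str]], is_moe: bool) -> None:
    flat_modules = [name for subset in subsets for name in subset]
    print(f"[layer {layer_index}] is_moe={is_moe}, subsets={len(subsets)}")
    for i, subset in enumerate(subsets):
        print(f"  subset {i}: {subset}")
    print(f"  modules: {flat_modules}")

# Example: an attention subset plus one expert subset of a typical MoE decoder layer.
debug_print_subsets(
    0,
    [
        ["self_attn.q_proj", "self_attn.k_proj", "self_attn.v_proj", "self_attn.o_proj"],
        ["mlp.experts.0.gate_proj", "mlp.experts.0.up_proj", "mlp.experts.0.down_proj"],
    ],
    is_moe=True,
)
```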
The current
I will partially revert and test. I do not see how to convert this to a draft, so please assume it is not finalized.
I have removed the extra logic from before the loop and only fixed the issue with the undefined variable. P.S. Sometimes I have encountered the error even with 4 GPUs: maybe it is related to the current fix...
This PR can be considered final.
@avtc Sorry for getting back to you so late. I wanted to refactor the life cycle. Please merge with master and re-apply your fix via the new lifecycle.
OK, I will prepare the changes and test them before pushing.
@Qubitium what is the proper fix for the case when a layer has no modules to quantize: run the forward pass and produce proper layer_inputs for the next layer, or skip both the forward pass and the forward replay and pass this layer's inputs through to the next layer?
@avtc If a full layer has no module to quantize, a simple forward() is enough, and its output is captured to be used as the next layer's input. So one forward pass over the entire layer, without any need for subset loops or micro forward loops; just the full layer, usually XXXDecodeLayer.forward(). So output = current_layer.forward() is enough, or sometimes just calling the layer callable. Assume layer 2 has no modules to quantize. At the beginning of the loop for layer 2, we have layer_output from the completed forward_replay() of layer 1. Pass this to layer 2 (as a whole) as layer_input and store the output, then immediately loop to layer 3 without any of the further subset work that is only necessary when we need to quantize part of a layer.
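A rough sketch of that skip path, assuming a plain PyTorch layer stack and hypothetical helper names (get_quantizable_modules, quantize_and_replay, the excluded set); the real GPTQModel looper is more involved, and this only illustrates the control flow described above:

```python
import torch
import torch.nn as nn

# Hypothetical helper: list the Linear submodules of a layer that are not excluded.
def get_quantizable_modules(layer: nn.Module, excluded: set[str]) -> list[str]:
    return [name for name, m in layer.named_modules()
            if isinstance(m, nn.Linear) and name not in excluded]

# Placeholder for the real subset-capture -> quantize -> forward_replay() path.
def quantize_and_replay(layer: nn.Module, layer_input: torch.Tensor) -> torch.Tensor:
    with torch.no_grad():
        return layer(layer_input)

def process_layers(layers, layer_input: torch.Tensor, excluded: set[str]) -> torch.Tensor:
    for layer in layers:
        if not get_quantizable_modules(layer, excluded):
            # Entire layer excluded: one simple forward pass over the whole
            # layer is enough; its output becomes the next layer's input.
            # No subset loops, no micro forward loops, no forward_replay().
            with torch.no_grad():
                layer_input = layer(layer_input)
            continue
        # Otherwise run the usual quantization path, which also produces the
        # next layer's input.
        layer_input = quantize_and_replay(layer, layer_input)
    return layer_input
```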
@Qubitium please review the fix for #2116