[Post 0.7][multimodal] speedup fusion model training with deepspeed strategy #2932
Issue #, if available:
DeepSpeed saves model and optimizer states as sharded checkpoints in a separate folder (even when using a single GPU). Previously, the conversion to a regular state dict took place in the `_update_best_and_save` function. However, converting the ZeRO checkpoint to a state dict there is a bad idea, since multiple checkpointing processes (when using DDP with multiple GPUs) would call this function almost simultaneously.

Description of changes:
Move the conversion into the `_load_state_dict` function, so that the conversion is done only once, even when using DDP with multiple GPUs.
Results:
Tested on a g4dn.12xl instance with 4x Tesla T4 GPUs.
Code to reproduce the results:
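The reproduction snippet is not included in this excerpt. The following is a hypothetical sketch of the kind of run described above; the `env.strategy` value (`"deepspeed_ddp"`) and the dataset are assumptions, not the authors' exact setup.

```python
# Hypothetical reproduction sketch: fit an AutoGluon multimodal fusion model
# on 4 GPUs with a DeepSpeed strategy (strategy name is an assumption).
import pandas as pd
from autogluon.multimodal import MultiModalPredictor

train_df = pd.read_csv("train.csv")  # placeholder multimodal dataset

predictor = MultiModalPredictor(label="label")
predictor.fit(
    train_data=train_df,
    hyperparameters={
        "env.num_gpus": 4,                # e.g. the 4x T4 setup noted above
        "env.strategy": "deepspeed_ddp",  # assumed name of the DeepSpeed strategy
    },
)
```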
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.