improve model/optimizer loading and set directly state in sharded mode #134

borisdayma · 2022-02-09T20:46:11Z

In the training script, we do:

- load params on CPU
- load opt_state on CPU (if we restore them)
- use pjit to shard state on correct devices
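For illustration, the steps above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the training script's actual code: `load_params_on_cpu` and `shard_params` are hypothetical helpers, the parameter names, shapes, and the `"mp"` mesh axis are invented, and `jax.device_put` with a `NamedSharding` stands in for the `pjit` call because `pjit`'s argument names have changed across JAX releases. A restored `opt_state` would presumably go through the same host-to-device step.

```python
import numpy as np
import jax
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P


def load_params_on_cpu():
    """Hypothetical stand-in for reading a checkpoint into host (CPU) memory."""
    rng = np.random.default_rng(0)
    n = jax.device_count()
    return {
        # First axis is a multiple of the device count so it splits evenly.
        "kernel": rng.normal(size=(8 * n, 4)).astype(np.float32),
        "bias": np.zeros((4,), dtype=np.float32),
    }


def shard_params(params, mesh):
    """Move host arrays onto the mesh devices with an assumed partitioning."""
    # Assumption: split the kernel's first axis over "mp", replicate the bias.
    specs = {"kernel": P("mp", None), "bias": P()}
    return {
        name: jax.device_put(value, NamedSharding(mesh, specs[name]))
        for name, value in params.items()
    }


# One-dimensional mesh over every available device (assumed layout).
mesh = Mesh(np.array(jax.devices()), axis_names=("mp",))

cpu_params = load_params_on_cpu()                # step 1: params live on the host
sharded_params = shard_params(cpu_params, mesh)  # step 3: shard onto devices
```

Here the full parameter tree has to fit in host memory before any device sees it, which is the detour this issue is about.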

Ideally we wouldn't need to go through CPU first.

We also implemented custom `abstract_init` and `load_on_cpu` args, which should be pushed upstream to HuggingFace transformers for better support.


borisdayma · 2022-03-23T20:42:56Z

This has been fixed.

borisdayma changed the title ~~load directly state in sharded mode~~ cleanup loading method and set directly state in sharded mode Feb 10, 2022

borisdayma changed the title ~~cleanup loading method and set directly state in sharded mode~~ improve model/optimizer loading and set directly state in sharded mode Feb 10, 2022

borisdayma closed this as completed Mar 23, 2022
