
Adds a script to convert NeoX 2.0 checkpoints to DeepSpeed's universal checkpoint format #836

Open
Wants to merge 115 commits into base branch: main
Changes from all commits
Commits
115 commits
b666034
Check BigScience scripts
dashstander Jan 31, 2023
4c1850f
Import NeoxCheckpoint
dashstander Jan 31, 2023
174a182
Different file structure
dashstander Feb 1, 2023
e065d44
Reformat
dashstander Feb 1, 2023
83a4aa6
Add BigScience universal script
dashstander Feb 1, 2023
51fa5df
Use tokenizer directly
dashstander Feb 3, 2023
ea11b87
Use tokenizer directly
dashstander Feb 3, 2023
48c395f
Use tokenizer directly
dashstander Feb 3, 2023
1c9aec2
Use tokenizer directly
dashstander Feb 3, 2023
97bfd11
build_tokenizer actually returns None
dashstander Feb 6, 2023
2ab9ca2
If we have the tokenizer we already have the padded vocab size
dashstander Feb 6, 2023
d2f028e
If we have the tokenizer we already have the padded vocab size
dashstander Feb 6, 2023
7765730
Changing universal
dashstander Feb 7, 2023
8dee716
Changing universal
dashstander Feb 7, 2023
4f06a6f
Need to calculate properly
dashstander Feb 13, 2023
8097e17
Need to calculate properly
dashstander Feb 13, 2023
027fb91
Make our own function
dashstander Feb 13, 2023
3af69a5
Make our own function
dashstander Feb 13, 2023
ad23aca
Make our own function
dashstander Feb 13, 2023
04129bd
Need to reshape with tp in mind
dashstander Feb 13, 2023
6b15d57
Need to reshape with tp in mind
dashstander Feb 13, 2023
4cd7cfc
Merge branch 'main' into ckpt_reshape
dashstander Mar 13, 2023
cd09987
Get tokenizer size from tokenizer
dashstander Mar 14, 2023
04a7829
Get tokenizer size from tokenizer
dashstander Mar 14, 2023
40c4ae9
Add argument for universal checkpoint
dashstander Mar 14, 2023
858b2be
Update NeoXArgs docs automatically
invalid-email-address Mar 14, 2023
557751f
Merge branch 'main' into ckpt_reshape
Quentin-Anthony Mar 26, 2023
875945d
Update NeoXArgs docs automatically
invalid-email-address Mar 26, 2023
66e856d
Add documentation for checkpoint conversion scripts
dashstander Mar 28, 2023
fa16eea
Finish tools/README.md for checkpoint reshaping
dashstander Apr 10, 2023
e2742ab
Merge branch 'main' into ckpt_reshape
dashstander Apr 10, 2023
89dcab1
Update NeoXArgs docs automatically
invalid-email-address Apr 10, 2023
de8501f
Reorganize tools directory
Apr 10, 2023
069d93d
Update NeoXArgs docs automatically
invalid-email-address Apr 10, 2023
2f8e8df
Left off the checkpoints directory
Apr 10, 2023
b77b70c
Merge branch 'ckpt_reshape' of https://github.com/EleutherAI/gpt-neox…
Apr 10, 2023
8e40d9c
Update NeoXArgs docs automatically
invalid-email-address Apr 10, 2023
30534d1
fixed conflicts
StellaAthena Apr 17, 2023
da46b68
Update NeoXArgs docs automatically
invalid-email-address Apr 17, 2023
0966f31
Merge branch 'main' into ckpt_reshape
Quentin-Anthony May 18, 2023
71d0d68
Update NeoXArgs docs automatically
invalid-email-address May 18, 2023
c912932
pre-commit
dashstander May 30, 2023
5c33b5d
Update NeoXArgs docs automatically
invalid-email-address May 30, 2023
5325386
gitignore
dashstander May 30, 2023
1a3c00a
Merge branch 'ckpt_reshape' of https://github.com/EleutherAI/gpt-neox…
dashstander May 30, 2023
f1fde46
Update NeoXArgs docs automatically
invalid-email-address May 30, 2023
4c468b9
typo
dashstander Jun 2, 2023
6c00899
Merge branch 'ckpt_reshape' of https://github.com/EleutherAI/gpt-neox…
dashstander Jun 2, 2023
88cb93b
Update NeoXArgs docs automatically
invalid-email-address Jun 2, 2023
89623f6
Rename args
dashstander Jun 2, 2023
a6a7a84
Merge branch 'ckpt_reshape' of https://github.com/EleutherAI/gpt-neox…
dashstander Jun 2, 2023
f9d3062
Update NeoXArgs docs automatically
invalid-email-address Jun 2, 2023
7bd3500
pre commit
dashstander Jun 9, 2023
e74c8b1
Merge branch 'ckpt_reshape' of https://github.com/EleutherAI/gpt-neox…
dashstander Jun 9, 2023
c647a57
need to handle folders better
dashstander Jun 9, 2023
42a23a0
need to handle folders better
dashstander Jun 9, 2023
5e32016
need to handle folders better
dashstander Jun 9, 2023
8c1e8fb
Merge branch 'main' into ckpt_reshape
Quentin-Anthony Jun 9, 2023
218cea6
Update NeoXArgs docs automatically
invalid-email-address Jun 9, 2023
a6fd1fd
need to handle folders better
dashstander Jun 9, 2023
fa0faa2
Merge branch 'ckpt_reshape' of https://github.com/EleutherAI/gpt-neox…
dashstander Jun 9, 2023
1d0b406
Update NeoXArgs docs automatically
invalid-email-address Jun 9, 2023
a92ee40
need to handle folders better
dashstander Jun 9, 2023
45c7d8f
Merge branch 'ckpt_reshape' of https://github.com/EleutherAI/gpt-neox…
dashstander Jun 9, 2023
2a2721d
Update NeoXArgs docs automatically
invalid-email-address Jun 9, 2023
32a9578
need to handle folders better
dashstander Jun 9, 2023
985d9c8
Merge branch 'ckpt_reshape' of https://github.com/EleutherAI/gpt-neox…
dashstander Jun 9, 2023
b40cfeb
Update NeoXArgs docs automatically
invalid-email-address Jun 9, 2023
517f836
pre-commit
dashstander Jun 14, 2023
b0c9d85
Update NeoXArgs docs automatically
invalid-email-address Jun 14, 2023
b6c1845
Pre-commit
dashstander Jul 26, 2023
1c63a80
Merge branch 'ckpt_reshape' of https://github.com/EleutherAI/gpt-neox…
dashstander Jul 26, 2023
6a86723
whoops
dashstander Jul 26, 2023
f77c123
I guess I did need that
dashstander Jul 26, 2023
0459479
I guess I did need that
dashstander Jul 26, 2023
7c582d1
I guess I did need that
dashstander Jul 26, 2023
40ea6c6
I guess I did need that
dashstander Jul 26, 2023
33be30f
universal
dashstander Jul 29, 2023
2732361
more printing
dashstander Jul 29, 2023
df131b9
more printing
dashstander Jul 31, 2023
191772b
hmmmm
dashstander Jul 31, 2023
bc9bc01
hmmmm
dashstander Jul 31, 2023
eb9e278
hmmmm
dashstander Jul 31, 2023
9a63d65
blegh
dashstander Jul 31, 2023
5c8af0d
blegh
dashstander Jul 31, 2023
96c1cf1
merged
dashstander Aug 7, 2023
547f165
Update NeoXArgs docs automatically
invalid-email-address Aug 7, 2023
8d9f324
Make more robust
dashstander Aug 7, 2023
a3f79a3
merge
dashstander Aug 7, 2023
0f43323
Update NeoXArgs docs automatically
invalid-email-address Aug 7, 2023
8fbaa15
ok
dashstander Aug 8, 2023
54e9a5a
Merge branch 'ckpt_reshape' of https://github.com/EleutherAI/gpt-neox…
dashstander Aug 8, 2023
9d39106
Update NeoXArgs docs automatically
invalid-email-address Aug 8, 2023
5947a66
ok
dashstander Aug 8, 2023
00ace51
Update NeoXArgs docs automatically
invalid-email-address Aug 8, 2023
58a3c92
different printing
dashstander Aug 9, 2023
f157814
Update NeoXArgs docs automatically
invalid-email-address Aug 9, 2023
9903e8e
ok
dashstander Aug 9, 2023
8b74404
Update NeoXArgs docs automatically
invalid-email-address Aug 9, 2023
860161b
ok
dashstander Aug 9, 2023
1fec493
Update NeoXArgs docs automatically
invalid-email-address Aug 9, 2023
c0682eb
blegh
dashstander Aug 9, 2023
188c379
Merge branch 'ckpt_reshape' of https://github.com/EleutherAI/gpt-neox…
dashstander Aug 9, 2023
38b108a
Update NeoXArgs docs automatically
invalid-email-address Aug 9, 2023
70284f1
printing
dashstander Aug 9, 2023
ed59200
Update NeoXArgs docs automatically
invalid-email-address Aug 9, 2023
3dea236
change printing
dashstander Aug 21, 2023
206ffea
Merge branch 'ckpt_reshape' of https://github.com/EleutherAI/gpt-neox…
dashstander Aug 21, 2023
a697c81
Get param group info
dashstander Aug 30, 2023
f51ceee
Get param group info
dashstander Aug 30, 2023
9645980
Get param group info
dashstander Aug 30, 2023
e9d3000
Get param group info
dashstander Aug 30, 2023
c32bc10
Get param group info
dashstander Aug 30, 2023
11531ad
Log bit16 groups
dashstander Sep 5, 2023
9ab2cee
ok
dashstander Sep 7, 2023
2 changes: 1 addition & 1 deletion .gitignore
@@ -139,7 +139,7 @@ data/**/*.txt
data/**/*.gz
data/**/*.np*
data/**/*.npy
checkpoints/
./checkpoints/
.vscode/
*.pt
*.ckpt
6 changes: 3 additions & 3 deletions README.md
@@ -165,7 +165,7 @@ Or use the 20B tokenizer (for which only a single Vocab file is needed):

(alternatively, you can provide any tokenizer file that can be loaded by Hugging Face's tokenizers library with the `Tokenizer.from_pretrained()` command)

You can now pretokenize your data using `tools/preprocess_data.py`, the arguments for which are detailed below:
You can now pretokenize your data using `tools/datasets/preprocess_data.py`, the arguments for which are detailed below:

```
usage: preprocess_data.py [-h] --input INPUT [--jsonl-keys JSONL_KEYS [JSONL_KEYS ...]] [--num-docs NUM_DOCS] --tokenizer-type {HFGPT2Tokenizer,HFTokenizer,GPT2BPETokenizer,CharLevelTokenizer} [--vocab-file VOCAB_FILE] [--merge-file MERGE_FILE] [--append-eod] [--ftfy] --output-prefix OUTPUT_PREFIX
@@ -206,7 +206,7 @@ runtime:
For example:

```bash
python tools/preprocess_data.py \
python tools/datasets/preprocess_data.py \
--input ./data/mydataset.jsonl.zst \
--output-prefix ./data/mydataset \
--vocab ./data/gpt2-vocab.json \
@@ -322,7 +322,7 @@ python ./tools/convert_sequential_to_hf.py --input_dir /path/to/model/global_st
Then to upload a model to [the Hugging Face Hub](https://huggingface.co/), run:
```bash
huggingface-cli login
python ./tools/upload.py
python ./tools/checkpoints/upload.py
```
and input the requested information, including HF hub user token.

10 changes: 9 additions & 1 deletion configs/neox_arguments.md
@@ -111,7 +111,7 @@ Logging Arguments

- **git_hash**: str

Default = d3e481c
Default = 70284f1

current git hash of repository

@@ -1906,6 +1906,14 @@ Args for deepspeed config



- **load_universal**: bool

Default = False

Flag for whether the checkpoint to be loaded is a universal checkpoint.



## NeoXArgsDeepspeedRunner

Args for deepspeed runner (deepspeed.launcher.runner).
26 changes: 23 additions & 3 deletions megatron/model/utils.py
@@ -40,10 +40,28 @@ def get_params_for_weight_decay_optimization(module, neox_args):
        ) or (
            neox_args.weight_decay == 0.0
        ):  # also include all parameters here if no weight decay is being done
            no_weight_decay_params["params"].extend(
                [p for p in list(module_._parameters.values()) if p is not None]
            )
            # no_weight_decay_params["params"].extend(
            #     [p for p in list(module_._parameters.values()) if p is not None]
            # )
            params = []
            for n, p in module_._parameters.items():
                if p is not None:
                    p.module_name = f"{module_._get_name()}.{n}"
                    params.append(p)
            no_weight_decay_params["params"].extend(params)
        else:
            wd_params = []
            nwd_params = []
            for n, p in module_._parameters.items():
                if p is not None:
                    p.module_name = f"{module_._get_name()}.{n}"
                    if n != "bias":
                        wd_params.append(p)
                    else:
                        nwd_params.append(p)
            weight_decay_params["params"].extend(wd_params)
            no_weight_decay_params["params"].extend(nwd_params)
            """
            weight_decay_params["params"].extend(
                [
                    p
@@ -58,6 +76,8 @@ def get_params_for_weight_decay_optimization(module, neox_args):
                    if p is not None and n == "bias"
                ]
            )
            """

    if neox_args.weight_decay == 0.0:
        # only return a single param group
        # with onebitadam, we want to minimize the calls to compressed_allreduce. Every param group calls it once.
3 changes: 3 additions & 0 deletions megatron/neox_arguments/deepspeed_args.py
@@ -277,6 +277,9 @@ class NeoXArgsDeepspeedConfig(NeoXArgsTemplate):
autotuning: dict = None
"""Dictionary as described in DeepSpeed autotuning documentation: https://github.com/microsoft/DeepSpeed/tree/master/deepspeed/autotuning"""

load_universal: bool = False
"""Flag for whether the checkpoint to be loaded is a universal checkpoint."""


@dataclass
class NeoXArgsDeepspeedRunner(NeoXArgsTemplate):
52 changes: 52 additions & 0 deletions megatron/training.py
@@ -610,6 +610,53 @@ def get_learning_rate_scheduler(optimizer, neox_args):
    return lr_scheduler


from collections import OrderedDict
import json


def log_bit16_groups(optimizer, param_names, zero_stage):

    """Returns a list of name-to-shape mappings (one per parameter group), only for the
    flattened fp32 weights saved by the optimizer. The names are exactly as in state_dict.
    The order is absolutely important, since the saved data is just flattened data with no
    identifiers and requires reconstruction in the same order it was saved.
    We can't rely on module.named_parameters() to get the saved tensors, as some params
    will be missing and others unsaved, and then it'd be impossible to reconstruct state_dict
    from the flattened weights.
    optimizer.bit16_groups seems to be the easiest to use as it's in all zeroX versions.
    """
    param_group_shapes = []
    cnt = 0
    numel = 0

    # zero2 started using a round_robin_bit16_groups which is a shuffled version of bit16_groups -
    # if we don't use it, we get parameters ordered incorrectly
    if hasattr(optimizer, "round_robin_bit16_groups"):
        bit16_groups = optimizer.round_robin_bit16_groups
    else:
        bit16_groups = (
            optimizer.bit16_groups if zero_stage == 2 else optimizer.fp16_groups
        )

    for bit16_group in bit16_groups:
        param_shapes = OrderedDict()
        for param in bit16_group:
            cnt += 1
            numel += param.ds_numel if hasattr(param, "ds_numel") else param.numel()
            shape = param.ds_shape if hasattr(param, "ds_shape") else param.shape
            if param not in param_names:
                raise ValueError(f"failed to find optimizer param in named params")
            name = param_names[param]
            param_shapes[name] = shape

            # uncomment to debug zero_to_fp32.py problems
            # if self.global_rank == 0: print(f"saving param {name} {shape} (numel={shape.numel()})")
        param_group_shapes.append(param_shapes)
        # if self.global_rank == 0: print(f"Total saved {numel} numels in {cnt} params")

    return param_group_shapes


def setup_model_and_optimizer(neox_args, use_cache=False, iteration=None):
    """Setup model and optimizer."""
    model = get_model(neox_args=neox_args, use_cache=use_cache)
@@ -637,6 +684,11 @@ def setup_model_and_optimizer(neox_args, use_cache=False, iteration=None):
            # config_params=neox_args.deepspeed_config,
            mpu=mpu if not neox_args.is_pipe_parallel else None,
        )
        zero_stage = neox_args.zero_optimization["stage"]
        # bit16_groups = log_bit16_groups(optimizer, model.param_names, zero_stage)
        bit16_groups = model._get_zero_param_shapes()
        with open(f"zero{zero_stage}.json", mode="w") as jfile:
            json.dump(bit16_groups, jfile)
        model.total_params = get_total_params(model.module)
        print_rank_0(f' > total params: {"{:,}".format(model.total_params)}')

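
A minimal sketch of how the `zero{stage}.json` file written above could be inspected, assuming it holds a list of per-parameter-group name-to-shape mappings (as produced by `_get_zero_param_shapes()`); the stage value here is a placeholder:

```python
import json
from math import prod

# Placeholder; use whatever ZeRO stage the run was configured with.
zero_stage = 1

with open(f"zero{zero_stage}.json") as jfile:
    # Assumed structure: a list of {parameter_name: shape} dicts, one per param group.
    param_groups = json.load(jfile)

for i, group in enumerate(param_groups):
    total = sum(prod(shape) for shape in group.values())
    print(f"param group {i}: {len(group)} params, {total:,} elements")
    for name, shape in group.items():
        print(f"  {name}: {tuple(shape)}")
```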
15 changes: 15 additions & 0 deletions tools/README.md
@@ -0,0 +1,15 @@
# GPT-NeoX Auxiliary Tools

This directory contains a number of auxiliary tools that are useful for working with GPT-NeoX but are not part of the main training code.

## Bash

This directory contains some simple, frequently used bash commands to make working on multiple machines easier.

## Checkpoints

This directory contains tools for manipulating and converting checkpoints, including changing the parallelism settings of a pretrained model, converting between GPT-NeoX and the Hugging Face transformers library, and updating checkpoints trained with Version 1.x of this library to be compatible with Version 2.x.

## Datasets

This directory contains tools for downloading and preprocessing datasets to the format expected by the GPT-NeoX library.
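
As a quick illustration of the reorganized layout, scripts are invoked from their new subdirectories; only the two paths that appear in the updated top-level README are assumed here:

```bash
# Dataset preprocessing now lives under tools/datasets/
python tools/datasets/preprocess_data.py --help

# Checkpoint upload (and other checkpoint tools) now live under tools/checkpoints/
python tools/checkpoints/upload.py
```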
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
50 changes: 50 additions & 0 deletions tools/checkpoints/README.md
@@ -0,0 +1,50 @@
# GPT-NeoX Checkpoint Manipulation Tools

## Checkpoint Conversion

The default format in which DeepSpeed checkpoints are saved depends on the model (tensor) and pipeline parallelism settings of the training run, which makes it difficult to run the model on a cluster with a different number or type of GPUs. We have adapted a set of scripts developed by [BigScience](https://github.com/bigscience-workshop/Megatron-DeepSpeed/tree/main/tools/convert_checkpoint) to make this easier.

### DeeperSpeed to universal

To convert your checkpoint to the universal checkpoint format, run the `ds_to_universal.py` script with a command along these lines:

```bash
CURR_CKPT="/path/to/your/old/checkpoint"
NEW_CKPT="/path/where/you/want/the/new/checkpoint"
CFG="/path/to/model/config/file"

python3 tools/ds_to_universal.py \
--input_folder $CURR_CKPT \
--output_folder $NEW_CKPT \
--config $CFG
```

To then run the model from your new checkpoint, add these lines to a new config and run your model like you normally would.

```json
{
"load": "/path/where/you/want/the/new/checkpoint",
"load_universal": true
}
```

### DeeperSpeed to DeeperSpeed Reshaping

To reshape a DeeperSpeed checkpoint to _reduce_ its parallelism settings, use the `deepspeed_to_deepspeed.py` script. It cannot re-shard a model to _increase_ the amount of tensor or pipeline parallelism, but to decrease the amount of parallelism you can run the script with a command like the one below:

```bash
CURR_CKPT="/path/to/your/old/checkpoint"
NEW_CKPT="/path/where/you/want/the/new/checkpoint"
CFG="/path/to/model/config/file"
TP=1 # Tensor (model) parallelism setting for the new checkpoint, must be less than or equal to the model's original tensor parallelism
DP=1 # Data parallelism setting for the new checkpoint
PP=1 # Pipeline parallelism setting for the new checkpoint, must be less than or equal to the model's original pipeline parallelism

python3 tools/deepspeed_to_deepspeed.py \
--input_folder $CURR_CKPT \
--output_folder $NEW_CKPT \
--config $CFG \
--target_tp $TP \
--target_dp $DP \
--target_pp $PP
```
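
As a sanity check after either conversion, you can open a few of the saved shards and list their top-level keys. This is a minimal sketch; the exact file names inside the output folder depend on the model and the target parallelism settings:

```python
import glob
import torch

new_ckpt = "/path/where/you/want/the/new/checkpoint"  # same value as NEW_CKPT above

# Checkpoint shards are ordinary torch pickles; inspect a handful of them.
for path in sorted(glob.glob(f"{new_ckpt}/**/*.pt", recursive=True))[:5]:
    state = torch.load(path, map_location="cpu")
    keys = list(state.keys()) if isinstance(state, dict) else type(state).__name__
    print(path, "->", keys)
```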