Closed
Description
Currently, because ZeRO-3 uses huge numbers in its params, the config dump looks like this:
[2021-04-05 20:09:01,945] [INFO] [config.py:741:print] zero_config .................. {
    "allgather_bucket_size": 500000000,
    "allgather_partitions": true,
    "zero_optimization":{
        "contiguous_gradients":true,
        "cpu_offload":true,
        "cpu_offload_params":true,
        "cpu_offload_use_pin_memory":true,
        "overlap_comm":true,
        "reduce_bucket_size":262144,
        "stage":3,
        "stage3_gather_fp16_weights_on_model_save":true,
        "stage3_max_live_parameters":1000000000.0,
        "stage3_max_reuse_distance":1000000000.0,
        "stage3_param_persistence_threshold":5120,
        "stage3_prefetch_bucket_size":235929.6,
        "sub_group_size":100000000000000.0
    }
}
100000000000000 isn't quite readable, is it? If the intention of this dump is to help debug problems, the output should be easy for a human to read, and right now it isn't.
I adapted the formatting function:
import json
from collections.abc import Mapping, Sequence

# adapted from https://stackoverflow.com/a/50701137/9201239
class ScientificNotationEncoder(json.JSONEncoder):
    """Pretty-print JSON with floats in scientific notation (1e14 -> 1.000000e+14)."""

    def iterencode(self, o, _one_shot=False, level=0):
        indent = self.indent if self.indent is not None else 4
        prefix_close = " " * level * indent
        level += 1
        prefix = " " * level * indent
        if isinstance(o, float):
            return f"{o:e}"
        elif isinstance(o, Mapping):
            x = [f'\n{prefix}"{k}": {self.iterencode(v, level=level)}' for k, v in o.items()]
            return "{" + ", ".join(x) + f"\n{prefix_close}}}"
        elif isinstance(o, Sequence) and not isinstance(o, str):
            # pass the current level down so dicts nested in lists stay indented
            return "[" + ", ".join(self.iterencode(v, level=level) for v in o) + "]"
        # str, int, bool and None go through the stock encoder unchanged
        return "".join(super().iterencode(o, _one_shot))

# x is the config dict to dump
print(json.dumps(x, indent=4, cls=ScientificNotationEncoder))
Now we get back the more readable scientific notation format:
    "zero_optimization": {
        "stage": 3,
        "cpu_offload": true,
        "cpu_offload_params": true,
        "cpu_offload_use_pin_memory": true,
        "overlap_comm": true,
        "contiguous_gradients": true,
        "sub_group_size": 1.000000e+14,
        "reduce_bucket_size": 1.000000e+06,
        "stage3_prefetch_bucket_size": 9.487879e+05,
        "stage3_param_persistence_threshold": 1.000000e+04,
        "stage3_max_live_parameters": 1.000000e+09,
        "stage3_max_reuse_distance": 1.000000e+09,
        "stage3_gather_fp16_weights_on_model_save": true
    },
1.000000e+14 is much more human readable than 100000000000000 in the current output ;)
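One thing worth checking is that the scientific-notation dump is still valid JSON, so nothing that parses the log breaks. Below is a minimal sketch of that check. The `dump_scientific` helper and the `sample` dict are hypothetical, written only for this illustration; they are not part of DeepSpeed and are a simplified stand-in for the encoder class above.

```python
import json

def dump_scientific(obj, indent=4, level=0):
    """Hypothetical helper: pretty-print obj as JSON, floats in scientific notation."""
    pad_close = " " * (indent * level)
    pad = " " * (indent * (level + 1))
    # bool is a subclass of int, not float, so True/False are left alone
    if isinstance(obj, float):
        return f"{obj:e}"
    if isinstance(obj, dict):
        items = [
            f"\n{pad}{json.dumps(str(k))}: {dump_scientific(v, indent, level + 1)}"
            for k, v in obj.items()
        ]
        return "{" + ",".join(items) + f"\n{pad_close}}}"
    if isinstance(obj, (list, tuple)):
        return "[" + ", ".join(dump_scientific(v, indent, level + 1) for v in obj) + "]"
    # str, int, bool and None use the standard JSON representation
    return json.dumps(obj)

# made-up sample values, loosely modelled on the config shown earlier
sample = {
    "stage": 3,
    "overlap_comm": True,
    "sub_group_size": 1e14,
    "stage3_prefetch_bucket_size": 235929.6,
}

text = dump_scientific(sample)
print(text)

# the dump is still parseable JSON
restored = json.loads(text)
```

Note that `:e` keeps only 7 significant digits, so this formatting is meant for display. Values with more digits would not round-trip exactly.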
Not sure if you want it or not, but since I spent time hacking this together, I thought I'd leave it here for posterity.