turn exponential notation back on for config dump

Currently, with zero3's huge numbers in its params, the config dump looks like:

```
[2021-04-05 20:09:01,945] [INFO] [config.py:741:print]   zero_config .................. {
    "allgather_bucket_size": 500000000,
    "allgather_partitions": true,

    "zero_optimization":{
        "contiguous_gradients":true,
        "cpu_offload":true,
        "cpu_offload_params":true,
        "cpu_offload_use_pin_memory":true,
        "overlap_comm":true,
        "reduce_bucket_size":262144,
        "stage":3,
        "stage3_gather_fp16_weights_on_model_save":true,
        "stage3_max_live_parameters":1000000000.0,
        "stage3_max_reuse_distance":1000000000.0,
        "stage3_param_persistence_threshold":5120,
        "stage3_prefetch_bucket_size":235929.6,
        "sub_group_size":100000000000000.0
    }
}
```

`100000000000000` isn't quite readable, is it? If the intention of this dump is to help debug problems, this output isn't very human-readable.
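
For context, here is a minimal sketch of where the long digit strings come from. It assumes the dump goes through Python's standard `json` module, which writes floats using their `repr`; the variable names are only illustrative.

```python
import json

value = 1e14  # same magnitude as sub_group_size in the dump above

default_text = json.dumps(value)  # stock encoder: uses the float's repr
sci_text = f"{value:e}"           # explicit scientific notation
larger_text = json.dumps(1e16)    # repr itself switches to exponent form at 1e16

print(default_text)  # 100000000000000.0
print(sci_text)      # 1.000000e+14
print(larger_text)   # 1e+16
```

So values below `1e16` are written out digit by digit unless the float formatting is overridden.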

I adapted the formatting function:
```
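import json
from collections.abc import Mapping, Sequence


# adapted from https://stackoverflow.com/a/50701137/9201239
class ScientificNotationEncoder(json.JSONEncoder):
    """Serialize floats in scientific notation, e.g. 1e14 -> 1.000000e+14."""

    def iterencode(self, o, _one_shot=False, level=0):
        indent = self.indent if self.indent is not None else 4
        prefix_close = " " * level * indent  # indentation of the closing brace
        level += 1
        prefix = " " * level * indent  # indentation of the nested keys
        if isinstance(o, float):
            return f"{o:e}"
        elif isinstance(o, Mapping):
            x = [f'\n{prefix}"{k}": {self.iterencode(v, level=level)}' for k, v in o.items()]
            return "{" + ", ".join(x) + f"\n{prefix_close}}}"
        elif isinstance(o, Sequence) and not isinstance(o, str):
            # pass the level down so dicts nested inside lists stay indented
            return f"[{', '.join(self.iterencode(v, level=level) for v in o)}]"
        # everything else (str, int, bool, None) goes through the stock encoder
        return "".join(super().iterencode(o, _one_shot))


# sample input standing in for the real config dict
config = {"zero_optimization": {"stage": 3, "sub_group_size": 1e14}}
print(json.dumps(config, indent=4, cls=ScientificNotationEncoder))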
```

Now we get back the more readable scientific notation format:

```
    "zero_optimization": {
        "stage": 3, 
        "cpu_offload": true, 
        "cpu_offload_params": true, 
        "cpu_offload_use_pin_memory": true, 
        "overlap_comm": true, 
        "contiguous_gradients": true, 
        "sub_group_size": 1.000000e+14, 
        "reduce_bucket_size": 1.000000e+06, 
        "stage3_prefetch_bucket_size": 9.487879e+05, 
        "stage3_param_persistence_threshold": 1.000000e+04, 
        "stage3_max_live_parameters": 1.000000e+09, 
        "stage3_max_reuse_distance": 1.000000e+09, 
        "stage3_gather_fp16_weights_on_model_save": true
    }, 
```

`1.000000e+14` is much more human-readable than `100000000000000` in the current output ;)
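
One thing worth checking before adopting this: whether the exponent form still round-trips. A small sketch, using a hand-written fragment in the style of the output above (the `dumped` string and variable names are only for illustration):

```python
import json

# A fragment in the style of the output above (values copied from it).
dumped = '{"sub_group_size": 1.000000e+14, "stage3_max_live_parameters": 1.000000e+09}'
loaded = json.loads(dumped)

# Exponent-form numbers are valid JSON and come back as floats.
sub_group_size = loaded["sub_group_size"]
print(sub_group_size == 1e14)  # True

# Caveat: the "e" format keeps 6 digits after the decimal point by default,
# so values with more significant digits are rounded in the dump.
rounded = float(f"{123456789.0:e}")
print(rounded)  # 123456800.0
```

For a dump meant for human debugging the rounding is probably acceptable, but the output shouldn't be treated as an exact copy of the config.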

Not sure if you want it or not, but since I spent time hacking this together, I thought I'd leave it here for posterity.

turn exponential notation back on for config dump #929

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

turn exponential notation back on for config dump #929

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions