Merge branch 'main' into bias_map
Andrei-Aksionov committed Jan 16, 2024
2 parents: ea10f9a + 1e5afd6 · commit 882b779
Showing 2 changed files with 13 additions and 12 deletions.
pretrain/tinyllama.py: 1 addition & 1 deletion
@@ -70,7 +70,7 @@ def setup(resume: Union[bool, Path] = False):
    logger = choose_logger(logger_name, name=name, resume=resume)

    strategy = FSDPStrategy(auto_wrap_policy={Block}, state_dict_type="full", sharding_strategy="HYBRID_SHARD")
-   fabric = L.Fabric(devices=devices, strategy=strategy, precision="bf16-true", loggers=[logger])
+   fabric = L.Fabric(devices=devices, strategy=strategy, precision="bf16-mixed", loggers=[logger])
    fabric.launch()

    fabric.print(hparams)
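
For readers comparing the two settings, here is a minimal sketch (not part of the commit) of what the flag controls in Lightning Fabric: "bf16-true" casts the model parameters to bfloat16, while "bf16-mixed" keeps float32 parameters and runs the forward/backward passes under bfloat16 autocast. In the training script above, the same keyword is simply passed alongside the FSDP strategy.

```python
# Minimal sketch, not from this repository: only the precision flag differs.
import lightning as L

# "bf16-true": parameters are cast to bfloat16 (lower memory, no fp32 master copy)
fabric_true = L.Fabric(precision="bf16-true")

# "bf16-mixed": parameters stay in float32; compute runs under bfloat16 autocast
fabric_mixed = L.Fabric(precision="bf16-mixed")
```
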
tutorials/pretrain_tinyllama.md: 12 additions & 11 deletions
@@ -11,17 +11,18 @@ This tutorial will walk you through pretraining [TinyLlama](https://github.com/jzhang38/TinyLlama/)

Here is a quick fact sheet:

-| Name | Description |
-|------|-------------|
-| Parameters | 1.1B |
-| Model Size | Layers: 22, Heads: 32, Query Groups: 4, Embedding Size: 2048, Intermediate Size: 5632 |
-| Sequence Length | 2048 |
-| Batch Size | 2 million tokens (2048 * 1024) |
-| Learning Rate | 4e-4 |
-| Learning Rate Schedule | Cosine with 2000 warmup steps |
-| Training Data | [SlimPajama](https://huggingface.co/datasets/cerebras/slimpajama-627b) (893 GB), [Starcoder](https://huggingface.co/datasets/bigcode/starcoderdata) (290 GB) |
-| Combined Dataset Size | Around 950B tokens |
-| Total Tokens During Training | 3 trillion |
+| Name | Description |
+|------|-------------|
+| Parameters | 1.1B |
+| Model Size | Layers: 22, Heads: 32, Query Groups: 4, Embedding Size: 2048, Intermediate Size: 5632 |
+| Sequence Length | 2048 |
+| Learning Rate | 4e-4 |
+| Learning Rate Schedule | Cosine with 2000 warmup steps |
+| Training Data | [SlimPajama](https://huggingface.co/datasets/cerebras/slimpajama-627b) (893 GB), [Starcoder](https://huggingface.co/datasets/bigcode/starcoderdata) (290 GB) |
+| Combined Dataset Size | Around 950B tokens |
+| Total Tokens During Training | 3 trillion (3 epochs) |
+| Time to complete training | ~ 4 weeks with 64 A100 GPUs |
+| Model FLOPs Utilization (MFU) | 52% |

(this table was sourced from the author's [README](https://github.com/jzhang38/TinyLlama/))
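
As a rough illustration of the learning-rate rows above (a sketch under assumed values, not the training script's actual code), linear warmup over 2000 steps followed by cosine decay from the 4e-4 peak can be written as:

```python
import math

# Sketch of "4e-4, cosine with 2000 warmup steps". The minimum learning rate and
# the total step count are illustrative assumptions, not values from this commit.
def learning_rate(step: int, max_lr: float = 4e-4, min_lr: float = 4e-5,
                  warmup_steps: int = 2000, total_steps: int = 1_430_000) -> float:
    if step < warmup_steps:                       # linear warmup from 0 to max_lr
        return max_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)                 # hold at min_lr once the schedule ends
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))
```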

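The last two rows can also be sanity-checked with a back-of-the-envelope estimate; the ~6 FLOPs per parameter per token rule of thumb and the 312 TFLOPS dense bf16 peak of an A100 are assumptions, not figures from the table:

```python
# Rough consistency check of "~4 weeks with 64 A100 GPUs" at 52% MFU.
params = 1.1e9                                  # 1.1B parameters
tokens = 3e12                                   # 3 trillion training tokens
train_flops = 6 * params * tokens               # ~2.0e22 FLOPs (ignores attention FLOPs)

peak_per_gpu = 312e12                           # assumed A100 bf16 tensor-core peak, FLOPs/s
sustained = 64 * peak_per_gpu * 0.52            # 64 GPUs at 52% MFU, ~1.0e16 FLOPs/s

weeks = train_flops / sustained / (86400 * 7)
print(f"{weeks:.1f} weeks")                     # ~3.2; attention FLOPs push this toward ~4
```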