v0.1.0

github-actions released this 27 Nov 00:49
· 2019 commits to main since this release

What's new

Added 🎉

  • GPT-based model.
  • Tokenizer and data pre-processing pipeline.
  • Training script.
  • Triton-based FlashAttention.

Commits

f1ba78e moving readme to notes
6c94994 Bump version to v0.1.0 for release
f09a500 Add a "constant" LR scheduler (#376)
dcdadc5 Merge pull request #377 from allenai/Muennighoff/split-model-comps
80b081b Merge pull request #374 from allenai/epwalsh/threaded-data-loading
9c8e67e Merge pull request #373 from allenai/chore/paths
1f51fec Merge pull request #375 from allenai/Muennighoff/move-torch-utils
9d5aa11 Merge pull request #370 from allenai/CheckpointLoading
38be6a7 Merge pull request #372 from allenai/epwalsh/optim-state-fix
c205912 Fix how we update grad_norm_exp_avg (#371)
9320f9b Fix unsharding local checkpoints w/ torch 2.1 (#369)
b8a174f Merge pull request #367 from allenai/FacePalm
13548fd Merge pull request #365 from allenai/wrap_and_shard
6c0e419 Add gradient clipping warmup (#363)
0afafd6 Fix stale links in README, scripts cleanup (#359)
42dba3c remove data team's stuff (#357)
4bb6966 consolidate Python configs into pyproject.toml, other clean up (#353)
a952f44 minor fixes to kempner docs (#354)
026793e Merge pull request #347 from allenai/epwalsh/block-groups-load-fix
62fc2fe Add two more FSDP wrapping strategies (#355)
4ccf2bd Merge pull request #346 from allenai/shanea/llama-block
da91f34 Merge pull request #317 from allenai/Llama
fd2425f Adds a YAML validator to automatically find the last checkpoint (#348)
1099942 Upload profiler data to remote save folder (#338)
db0756f Merge pull request #335 from allenai/Kempner
5c64338 Merge pull request #343 from allenai/ActivationCheckpointing
558102e Merge pull request #342 from allenai/S3Client
cd73387 Add option to FSDP wrap by groups of blocks (#340)
c1a4519 Fix dtype casting on CPU (#339)
104d1ce Move remaining checkpointing logic to Checkpointer class (#331)
a465caa Merge pull request #337 from allenai/UnshardSkipKeys
4980bad set mcli time limit to null
f974a1d update mitch ish configs
07404f8 Merge pull request #308 from allenai/fine-grained-metrics
4644ff5 Lazily init s3 client (#333)
809fe9d Load state dicts to CPU (#328)
1bff308 ensure bias is created in fp32 (#327)
d4744d0 Bring back global gradient clipping and improve speed of collecting metrics (#326)
54572d3 Add stop_at config option
e63b389 Fix SDP NaN bug (#323)
fddded5 Features to match OpenLM (#302)
d2e84fe Refactor checkpointing, bring back legacy sharded checkpointing as the default (#316)
fed4cf3 Merge pull request #311 from allenai/pass-thru-model-kwargs
a5cd0e6 Merge pull request #304 from allenai/ppl-suite-v3
536d029 Merge pull request #306 from allenai/keep-instance-info
0b5f68d Merge pull request #314 from allenai/ResetOptimizerState
e8bd122 Merge pull request #315 from allenai/MemoryEnvVar
da1f0b8 Merge pull request #313 from allenai/NanCheck
94133da Fixes pyspy script
602968a New-style checkpointing (again) (#307)
973090f implement bytes range for GS
18e061d Merge pull request #303 from allenai/shanea/fix-leftover-data-partitioning
0a1455b Merge pull request #301 from allenai/shanea/fix-s3-keyerror-failures
e7b92a6 comment
6ebd5d3 Add configs for v1.5 mix
8e2b8be Merge pull request #297 from allenai/PerfTests
62dde55 Make resource_path() more robust
900544e Prepare 7B config for MCLI (#295)
309bf84 Merge pull request #294 from allenai/petew/linear-schedule
91f499b Ignore warnings from urllib3, don't print config when it's huge
012e97f Merge pull request #290 from allenai/torch2.1init
aec449c update mcli config
27dd512 MCLI configs (#286)
a2b369a Merge pull request #279 from allenai/petew/train-metrics
5ad0d8c Merge pull request #282 from allenai/rsqrt
cc787ed Merge pull request #277 from allenai/shanea/add-truncated-normal-init
fabda71 Merge pull request #274 from allenai/petew/layer-norm
70a3f4c Merge pull request #280 from allenai/petew/reduce-dtype
2a7f694 Merge pull request #278 from allenai/update-hf-olmo-config
ef85d5c Merge pull request #265 from allenai/LayerNormAffine-ManualLayerNorm-Profiling
2df922b Merge pull request #276 from allenai/petew/sys-metrics
921c254 Merge pull request #275 from allenai/simplify-eff-benchmark
400a1d2 Minor cleanup of grad clipping (#273)
18f3459 fix updating grad_norm_exp_avg (#272)
54dbd48 Merge pull request #238 from allenai/inference-efficiency-pentathlon
95555f4 Refactor how we clip gradients and collect optimizer metrics (#261)
6cc09fe Merge pull request #271 from allenai/PythonProfiling2-UnwindingChanges
41b0663 Merge pull request #269 from allenai/PythonProfiling
2eedf07 Fix speed issue on LUMI with 7B model (#270)
d2abecd Merge pull request #267 from allenai/v2-pii-tagging
5b4c68e fix isort config
c8a2700 Merge pull request #253 from allenai/SavedTokenizer
26e17c3 Merge pull request #264 from allenai/LayerNormAffine-ManualLayerNorm-TurnedOffForSafety
a49f4ec Make Dropout a no-op when p=0.0 (#259)
a33dbb0 make flake8 happy
6b977d0 handle race conditions when saving to NFS on cirrascale (#255)
b4a1491 Merge pull request #250 from allenai/LayerNormAffine
4205a84 Merge pull request #257 from allenai/FasterGlobalIndices
e46b988 fix saving unsharded checkpoints
5fff93a Merge pull request #251 from allenai/soldni/fix-s2-fos
af0a584 Merge pull request #248 from allenai/TokenizerFromFile
7fbdb1c finish W&B runs quietly
9071816 Training improvements (#239)
642d0fa Add support for remote checkpoints and train data files (#237)
e350fd3 Add option to restart with new base LR (#236)
3ef79e1 Merge pull request #230 from allenai/eval-streamline
51a8a00 load state dict on gpu
3e8163e improve config resolution
7bd0ed2 medium script update
27d3538 add V1 mix small+medium configs (#211)
907e38b wait on all ranks until final ckpt dir exists (#235)
2118db5 Merge pull request #232 from allenai/ablations/soldni-gantry
698f859 Added the shuffling story
5508c04 Use numpy for shuffling instead of torch (#231)
952819b Don't reshuffle eval data each "epoch" (#229)
87f6a79 Merge pull request #223 from allenai/soldni/olmo-mixing
e64cf42 Merge pull request #227 from allenai/hf-olmo-tok
970a77c add more tests for memmap dataset
ba84b0b default to saving data indices
d02d4f1 Merge pull request #221 from allenai/faster-convert
acf372e Merge pull request #220 from allenai/hf-integration
43c29d9 Merge pull request #219 from allenai/iterable-dataset-memory-efficient
d3d00f1 Merge pull request #217 from allenai/soldni/lucy-fix
7c866c9 Merge pull request #216 from allenai/petew-cache-attn
05c6d53 clean up
fd1cfe8 Merge pull request #213 from allenai/llm-inference
ccb3869 Merge pull request #212 from allenai/gopher-fix
a80cdc1 Merge pull request #210 from allenai/soldni/filters_improvements
66c4936 fix c4-medium config
fde42f9 Merge pull request #194 from allenai/default-2x-batch-size
ab0b967 Merge pull request #209 from allenai/olmo-mix-1
b376486 Merge pull request #200 from allenai/c4-gopher-dedupe
b1584f9 Merge pull request #207 from allenai/petew-no-par-block
96f8817 Merge pull request #208 from allenai/soldni/tok_sample_code
a244f3a Merge pull request #203 from allenai/error-handling
86060d4 Merge pull request #199 from allenai/packed-evals
992838b Merge pull request #205 from allenai/hatespeech-nsfw-mixers
186fe1b Merge pull request #204 from allenai/nishant_pi_count_ablation
0b55217 Merge pull request #188 from allenai/ft-tagger-dataset
6a36cdf Merge pull request #161 from allenai/AkshitaB-stack-ablations
4074e42 Merge pull request #201 from allenai/soldni/tok_sample_code
58ad163 Merge pull request #197 from allenai/soldni/local_cache
c642d4f Merge pull request #198 from allenai/nishant_add_pi_counts_filter
eccf18c Merge pull request #162 from allenai/c4-gopher
49f9a0e Merge pull request #191 from allenai/save-indices
e89c61f Merge pull request #195 from allenai/code-eval
d33ea74 Merge pull request #193 from allenai/soldni/neox
9bfcde3 Fix secrets name in LUMI.md (#190)
b6fa4d9 add PPL evaluators to medium config
484b089 Merge pull request #189 from allenai/docs
83b39b5 remove unnused thread lock
ebc07f4 Merge pull request #187 from allenai/soldni/tokfile
ed7c0e8 ensure drop_last=True with train data
4d986ed fix speed monitor
2437cdf Merge pull request #181 from allenai/v0-small
3478cb0 Merge pull request #186 from allenai/soldni/tok_improve
348ed33 Merge pull request #185 from allenai/soldni/falcon
b79c3b7 Speed up preprocessing script (#177)
cb2c9cd Merge pull request #184 from allenai/format
72d4ff2 Merge pull request #183 from allenai/attr-merge
47c4ab9 More checkpointing improvements (#182)
a97d1f6 Merge branch 'main' of https://github.com/allenai/LLM into main
2567261 handle empty logzio token
0507c2d Restore dataset correctly when world size changes (#176)
f8eeb22 Merge pull request #178 from allenai/soldni/preview
9b21211 Merge pull request #179 from allenai/fix-tests
0d487c2 Merge pull request #174 from allenai/v0-small
4737c53 Merge pull request #175 from allenai/ClearGPUsFirst
1bdeae6 Merge pull request #173 from allenai/span-fix
87476f7 Prepare 1B baseline run (#170)
434cf67 Merge pull request #172 from allenai/fix-tests
d2442d6 add more tests
6d29ee4 fix dataloader max steps
7ffe204 Merge pull request #171 from allenai/soldni/decontamination-v2
0a485d2 Merge pull request #168 from allenai/DockerImage
27a3f3a Don't be so noisy during startup
1fba808 Merge pull request #165 from allenai/c4-medium-2x-bz
9da0e4b Merge pull request #167 from allenai/v1-small-config
391091c Merge pull request #166 from allenai/soldni/ablations_v2
9020c91 Merge pull request #164 from allenai/add-no-grad
c25f54b Merge pull request #163 from allenai/soldni/ablations
41a9969 syncronize time limits
1cd0b4b Merge pull request #134 from allenai/dependabot/pip/mypy-gte-1.0-and-lt-1.4
2a4031e Merge pull request #160 from allenai/soldni/filter-speedup
5e29078 Merge pull request #159 from allenai/mixer-docs
3e01aa8 Merge pull request #156 from allenai/mixer-deduper
e48dda6 Merge pull request #148 from allenai/CPUUnshard
e252886 Merge pull request #153 from allenai/soldni-patch-1
634ec97 Merge pull request #152 from allenai/soldni/filters_cli
8a20fbb Update LUMI.md
a95fcf2 Update LUMI.md
977c230 Merge pull request #151 from allenai/FewerRcclErrors
7ffef43 Merge pull request #130 from allenai/soldni/books_v2
02b4d15 Merge pull request #150 from allenai/soldni/msgpac
86a0dd3 Merge pull request #149 from allenai/c4-gopher
6dd29d8 Merge pull request #147 from allenai/fasttext-filter-integration
b6aa1c7 Merge pull request #146 from allenai/soldni/filters
bcdceba Merge pull request #142 from allenai/rocm55
91063a0 Merge pull request #144 from allenai/fasttext-filter-integration
ce72e81 Merge pull request #145 from allenai/DocUpdate
a4a6ca4 Merge pull request #141 from allenai/soldni/s2-v4
76a4656 write checkpoint and stop before 48 hours
ad188d8 Merge pull request #143 from allenai/ablation-datasets
7793671 Merge branch 'main' of https://github.com/allenai/LLM into main
dd750ed More tools in Docker
80b6dda Merge pull request #114 from allenai/train-in-house
337a046 Merge pull request #139 from allenai/c4
78cc6f1 Merge pull request #138 from allenai/v2
99c7875 Merge pull request #131 from allenai/soldni/parallel
18f07fb Merge pull request #137 from allenai/rodneykinney-patch-1
d507bcb Merge pull request #135 from allenai/mixer-deduping
9535b56 Merge pull request #92 from allenai/adding-the-stack
3de971d Merge pull request #129 from allenai/filter-rules
358dd74 Merge pull request #106 from allenai/shannons/c4-cleaning
54a2088 Merge pull request #125 from allenai/kylel/cleanup
cc5549e Merge pull request #128 from allenai/format-docs
b36d297 Merge pull request #115 from allenai/cc-news
1bb9fdf Merge pull request #127 from allenai/mixer-config
7352787 Merge pull request #126 from allenai/mixer
67a60e3 Merge pull request #121 from allenai/hate_speech
bcdc961 Merge pull request #124 from allenai/pii
9cc3370 Merge pull request #113 from allenai/soldni/tokenizer
cc4f0cf Merge pull request #110 from allenai/evals
b107613 Merge pull request #111 from allenai/deduper
5a40cba Merge pull request #100 from allenai/url-deduper
bb78562 Merge pull request #107 from allenai/multi-query-attn
6798b67 Merge pull request #109 from allenai/config-load
0b2010e Merge pull request #108 from allenai/kylel/gutenberg-docs
709ecd6 Merge pull request #104 from allenai/70B_take2
df22e64 clean up sequential block
15477c7 Add script to check model size
3a37d69 Make fullgraph compile mode work again (#102)
9f58310 always default fullgraph to False
cc9096e Update LOG
1efefff always disable memory efficient attention
3751ce4 Merge pull request #101 from allenai/Interconnect2
e2940d5 Merge pull request #99 from allenai/petew-LUMI-doc-update
f0fecdd Merge pull request #98 from allenai/TorchCompile
986993a Ensure tokenizer is thread safe (#96)
cf92192 Merge pull request #97 from allenai/Cpus
218270c Merge pull request #94 from allenai/config
3dec0a0 fix readme desciption
2ed1234 Merge pull request #90 from allenai/Torch2-AMD
348d431 Merge pull request #91 from allenai/AkshitaB-patch-1
983e996 Merge pull request #89 from allenai/freeze
29b7ef7 standard-g
c801d4f Save config at start of run (#87)
5dde977 Merge pull request #84 from allenai/merger
5d167cf rename DOLMA -> OLMo (#86)
5607566 Add a beam search implementation and a .generate() method to the model (#83)
9911b78 Merge pull request #85 from allenai/UploadArtifact
ef3e157 Merge pull request #61 from allenai/Torch2
caf1f32 Merge pull request #74 from allenai/merger
47db5a0 Merge pull request #82 from allenai/soldni-patch-1
0ea38ba Merge pull request #81 from allenai/soldni/books
63798ab Merge pull request #76 from allenai/soldni/data-format
793eb37 Merge pull request #72 from allenai/kylel/2023-03/template
c448672 Merge pull request #71 from allenai/soldni/wikipedia
08a48cc Merge pull request #68 from allenai/cc-notes
eb0218c Merge pull request #36 from allenai/adding-the-stack
bf08fb9 Merge pull request #67 from allenai/soldni/data_v3
7e43884 clean up some imports
e3fa695 fix config test
e64a295 Merge pull request #65 from allenai/cc-notes
c38000e add question issue template
ead01b2 clean up issue templates
229bf96 Merge pull request #62 from allenai/soldni/data_v2
d8d6ea1 Merge pull request #35 from allenai/cc-notes
e1a25fb update beaker interactive script
2516195 Add (decoupled) Lion optimizer (#56)
732df46 Merge pull request #40 from allenai/kylel/2023-03/template
26f146f Merge pull request #58 from allenai/michaelw/ib-cluster-tests
d091c33 Write about logging
4aee7b3 Merge pull request #57 from allenai/Logging
ca93d90 Use composer's built-in speed monitor (#55)
8f61cb2 Rename log file
512f514 Create Log.md
495fa26 Add decoupled AdamW, use by default (#46)
03b6c00 Merge pull request #47 from allenai/Lumi3
d411bc6 adjust wd
d1cfceb add config section for speed monitor (#44)
8d0718d add a 70b config (#43)
7e32a2e Merge pull request #42 from allenai/Lumi2
0489748 merge lumi and cirrascale configs (#41)
a867e67 Merge pull request #37 from allenai/lumi
64bd1df Populate W&B config (#39)
06fd2f6 Merge pull request #34 from allenai/soldni/data
7310146 Merge pull request #25 from allenai/kylel/2023-03/scaffold-data
1336284 Minor fixes, add a small 300m param training config (#31)
bf37554 add script for bootstrapping Beaker sessions
c809df4 fix docstrings
18143b0 update labels
b78b2b9 Improve logging (#28)
e6c5707 omit buffers from state dict
386cf60 Add option to omit bias terms (#27)
5222c35 Add triton implementation of FlashAttention (#24)
9430863 Update mypy requirement from <1.1,>=1.0 to >=1.0,<1.2 (#22)
18b1269 improve logging
edf21d7 add some train configs (#21)
35d3325 add speed monitor callback and wandb logger (#16)
141f9ba add comment about composer to train script
b04bfdf get training working (#15)
f0b5e0a comment about bug
206d10b exclude mosaicml 0.13.0
1159dc3 add missing requirement
df12e16 start training CLI
7b3a1a7 add Docker images and run GPU tests on Beaker (#14)
ba20a85 add param initialization
2e45c83 use smaller model for testing
0487d54 composer, improve data collator
9dba1a0 Add optional layer norm to keys and queries (#13)
6171915 clean up
e0b699a Add GPT model implementation (#12)
3afca2f move tokenizer out of data/
e25a9b7 Implement a Tokenizer and MemMapDataset (#10)
6d9e407 Bump actions/checkout from 1 to 3 (#11)
693365d added location for scripts
3c6be35 Merge pull request #3 from allenai/damia-dolma
4eb7334 add torch dependency
7136e02 add boilerplate