Olmo tiny scripts #628

Merged: 46 commits merged from olmo-tiny into main on Jun 28, 2024

Conversation

@ananyahjha93 (Contributor):

No description provided.

@ananyahjha93 requested a review from @dirkgr on June 18, 2024 at 22:37.
@AkshitaB (Contributor) left a comment:

Discussed my queries offline with @ananyahjha93

  • How were model shapes decided? Based on Pythia, and then on parameter count (see the rough estimate sketched after this list).
  • How about the LR? Also a ballpark from Pythia.

Other things to note:

  • Global batch size may also require some ablation
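
For context on the "shapes from Pythia, then parameter count" point above, here is a rough, hedged way to estimate the parameter count implied by the config values in this PR (d_model, n_layers, mlp_ratio). The vocabulary size and the MLP layout below are assumptions for illustration, not the exact OLMo formulas.

```python
# Hedged sketch: rough decoder-only parameter count from config-style values.
# Vocab size and the up/down MLP layout are assumptions, not OLMo's exact math.

def approx_param_count(d_model, n_layers, mlp_ratio=8, vocab_size=50_304, weight_tying=False):
    attn = 4 * d_model * d_model                # q, k, v, and output projections
    mlp = 2 * d_model * (mlp_ratio * d_model)   # assumed up + down projections
    per_layer = attn + mlp                      # norms and biases ignored
    embeddings = vocab_size * d_model
    if not weight_tying:
        embeddings *= 2                         # separate input and output embeddings
    return n_layers * per_layer + embeddings

# Values appearing in the config diff below: d_model=1024, n_layers=24, mlp_ratio=8
print(f"~{approx_param_count(1024, 24) / 1e6:.0f}M parameters")
```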

@@ -248,7 +248,7 @@ def dummy_init_fn(module: torch.nn.Module) -> None:
)
cfg.save_interval_unsharded = cfg.save_interval

if cfg.save_num_unsharded_checkpoints_to_keep < 1:
if cfg.save_num_unsharded_checkpoints_to_keep == 0:
log.warning(
Contributor:

What if save_num_checkpoints_to_keep is also 0?

@ananyahjha93 (Author):

It then assumes that you did not want to keep any checkpoints at all!

@ananyahjha93 (Author):

-1 means you want to keep all checkpoints, so I made the check == 0 instead of < 1.
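
A minimal sketch of the retention semantics being discussed, just to make the -1 / 0 / N cases concrete; this is illustrative and not the actual trainer code:

```python
# Illustrative only: -1 keeps everything, 0 keeps nothing (hence the warning),
# and a positive N keeps the N most recent checkpoints.

def checkpoints_to_delete(saved, num_to_keep):
    if num_to_keep < 0:              # -1: keep every checkpoint
        return []
    if num_to_keep == 0:             # 0: user explicitly keeps none
        return list(saved)
    return list(saved)[:-num_to_keep]

print(checkpoints_to_delete(["step1000", "step2000", "step3000"], -1))  # []
print(checkpoints_to_delete(["step1000", "step2000", "step3000"], 2))   # ['step1000']
```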

@dirkgr (Member) left a comment:

I assume the configs between the sizes are all the same, so I didn't look at all of them.

@@ -9,17 +9,15 @@ wandb:
model:
d_model: 1024
n_heads: 16
n_layers: 16
n_layers: 24
mlp_ratio: 8
weight_tying: false
alibi: false
rope: true
flash_attention: true # not available on AMD
Member:

Is now available on AMD

@ananyahjha93 (Author):

removed the comment

- label: commonsense_qa
type: downstream

- label: social_iqa
type: downstream

- label: basic_arithmetic
type: downstream

Member:

What's wrong with these?

@ananyahjha93 (Author):

Ah, basic_arithmetic should be in; the others don't provide any signal, based on my experience.

@ananyahjha93 (Author):

ah this was commented out saying

# Doesn't work from cache.

Contributor:

Should work with cache v4

stop_at: 100_000
global_train_batch_size: 2048
device_train_microbatch_size: 8
max_duration: 2ep
Member:

This means you'll run into this bug: #584
It might not matter. The problem is only that the second epoch will be shuffled the same way as the first.
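
For readers who have not seen #584, a minimal, hypothetical sketch of the failure mode (this is not OLMo's data loader): if the epoch is not folded into the shuffle seed, the second epoch replays the first epoch's order.

```python
# Hypothetical illustration of the epoch-shuffling bug referenced above.
import random

def epoch_order(n, seed, epoch, fold_epoch_into_seed):
    rng = random.Random(seed + (epoch if fold_epoch_into_seed else 0))
    order = list(range(n))
    rng.shuffle(order)
    return order

print(epoch_order(8, 1234, 0, False) == epoch_order(8, 1234, 1, False))  # True: epochs repeat
print(epoch_order(8, 1234, 0, True) == epoch_order(8, 1234, 1, True))    # almost surely False
```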

@ananyahjha93 (Author):

I'll add a stop_at of 400k steps!

@@ -9,17 +9,15 @@ wandb:
model:
Member:

No DDP section in this file?

@ananyahjha93 (Author):

there is!

Comment on lines 63 to 64
grad_clip_warmup_steps: null
grad_clip_warmup_factor: 5
Member:

Don't have these settings.

@ananyahjha93 (Author):

Took these from @AkshitaB's llamaish1-normal-weka.yaml.

@ananyahjha93 (Author):

removed them for now!

paths:
######### NON WEB DATA #########
# ~> GUTENBERG BOOKS (5.256 GT)
- s3://ai2-llm/preprocessed/olmo-mix/v1_6-decontaminated/books/gpt-neox-olmo-dolma-v1_5/part-0-00000.npy
Member:

Can you read from weka instead?

@ananyahjha93 (Author):

I was planning to run on pluto, but now I can see free nodes on jupiter, so I'm making the change!

# Unsharded checkpoints (for ddp)
save_interval_unsharded: 5000
save_num_unsharded_checkpoints_to_keep: 3
save_num_unsharded_checkpoints_to_keep: -1
Member:

What does -1 do?

@ananyahjha93 (Author):

-1 is for keeping all checkpoints, but I'll double check

@dirkgr (Member) left a comment:

Approved with a small comment about the long warmup.

units: tokens
t_warmup: 4194304000
t_max: 3e12
t_warmup: 5000
Member:

For normal init, this is a lot of warmup? Not a big deal, but unusual?

@ananyahjha93 (Author):

Smaller models, higher LR; I didn't want to take a chance. It's never bad to do a longer warmup!
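
For anyone skimming, a minimal sketch of what the t_warmup value controls: the learning rate ramps linearly from near zero to the base LR over the warmup window and only then follows the main schedule. The base LR below is a placeholder and post-warmup decay is omitted; whether the unit is steps or tokens depends on the scheduler's units field.

```python
# Hedged sketch of linear LR warmup; not OLMo's scheduler implementation.

def lr_at(t, base_lr=3e-4, t_warmup=5000):
    return base_lr * min(t + 1, t_warmup) / t_warmup

print(lr_at(0), lr_at(2_500), lr_at(5_000))  # ramps from ~0 up to base_lr
```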

Comment on lines +77 to +78
max_duration: 1ep
stop_at: 406_934
Member:

Do you need both max_duration and stop_at?

@ananyahjha93 (Author):

Yes; from what I have observed, and as Dave mentioned, training goes past max_duration if stop_at is not set.
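
As a back-of-the-envelope check on how stop_at relates to the token budget: with the global batch size from these configs and an assumed sequence length (not read from this PR), the cap works out to roughly 1.7T tokens. stop_at then acts as the hard step limit that max_duration alone does not enforce.

```python
# Hedged arithmetic only; sequence_length is an assumption for illustration.
global_train_batch_size = 2048
sequence_length = 2048                                        # assumed
tokens_per_step = global_train_batch_size * sequence_length   # ~4.19M tokens/step

stop_at = 406_934
print(f"stop_at={stop_at} steps ≈ {stop_at * tokens_per_step / 1e12:.2f}T tokens")
```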

Comment on lines +152 to +155
# Doesn't work from cache.
# - label: basic_arithmetic
# type: downstream

Member:

Even with cache v4?

--run_name=$TASK_NAME \
--wandb.name=$TASK_NAME \
--wandb.group=$TASK_NAME \
--wandb.project=tiny_olmo \
--wandb.project=olmo-tiny \
--max_grad_norm=2.0 \
Member:

Do you want to use this clipping value for all small models?

@ananyahjha93 (Author):

Ah, let me fix this; the model with clipping value 2.0 does not show any downstream improvement!
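
For reference, the override under discussion corresponds to standard global-norm gradient clipping applied before the optimizer step. A minimal PyTorch sketch (this is the stock API, not the exact OLMo/FSDP code path):

```python
import torch

model = torch.nn.Linear(8, 8)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

loss = model(torch.randn(4, 8)).pow(2).mean()
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=2.0)  # the value being debated
optimizer.step()
optimizer.zero_grad()
```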

olmo/train.py (one resolved thread, collapsed)
olmo/train.py (outdated)
Comment on lines 1118 to 1119
num_fwd_flops=self.model.num_fwd_flops, # this is per sequence
num_bck_flops=self.model.num_bck_flops, # this is per sequence
Member:

"this is per sequence" ... it's per-token now, right?

@ananyahjha93 (Author):

changed
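
For context on why per-token is the natural unit here, a hedged sketch of the usual FLOP bookkeeping (the model's own num_fwd_flops / num_bck_flops accounting may be more precise; the parameter count below is hypothetical):

```python
# Rough rule of thumb: ~2 FLOPs per parameter per token forward, ~2x that backward.

def approx_flops_per_token(n_params):
    fwd = 2 * n_params
    bck = 2 * fwd
    return fwd, bck

fwd, bck = approx_flops_per_token(600e6)   # hypothetical ~600M-parameter model
print(f"fwd ≈ {fwd:.2e}, fwd+bck ≈ {fwd + bck:.2e} FLOPs per token")
```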

@ananyahjha93 merged commit a1f118a into main on Jun 28, 2024. 12 checks passed.
@ananyahjha93 deleted the olmo-tiny branch on June 28, 2024 at 16:20.