Allow specifying max_duration in terms of tokens #385
Conversation
@@ -96,7 +96,6 @@ def build_train_dataloader(train_config: TrainConfig) -> DataLoader:
         seed=train_config.seed,
         shuffle=True,
         drop_last=train_config.data.drop_last,
-        max_examples=train_config.global_train_batch_size * train_config.max_duration,
This wasn't needed anyway and shouldn't change anything since we've always set max_duration
to the equivalent of 2T tokens, and our dataset is not quite that big.
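As a back-of-the-envelope check (the numbers below are made up, not from the actual run), the removed cap could never bind when `max_duration` corresponds to roughly 2T tokens and the dataset is smaller than that:

```python
# Hypothetical values, only to illustrate why removing max_examples is a no-op.
max_duration = 477_000                # steps, roughly a 2T-token budget (made-up)
global_train_batch_size = 2048        # sequences per step (made-up)

max_examples = global_train_batch_size * max_duration   # ≈ 977M sequences
dataset_size = 700_000_000            # hypothetical number of examples in the dataset

# The dataset runs out long before the cap is reached, so dropping it changes nothing.
assert dataset_size < max_examples
```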
olmo/config.py
Outdated
""" | ||
Maximum number of batches to train for. | ||
How long to train for. If specified as an integer (the default), the units are assumped to be steps. |
nit: typo
How long to train for. If specified as an integer (the default), the units are assumped to be steps.
How long to train for. If specified as an integer (the default), the units are assumed to be steps.
Also, the distinction seems to be less about the typing and more about the suffix (no suffix vs "T"). It would be good to clarify that the suffix determines whether the unit is steps or tokens. You could even add a separate suffix for steps if you want.
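For illustration, suffix-based parsing could look roughly like this (a hypothetical helper, not the code in this PR), with a bare integer keeping its old meaning of steps:

```python
# A rough sketch of suffix-based duration parsing (hypothetical, not the PR's code).
from typing import Tuple, Union


def parse_duration(value: Union[int, str]) -> Tuple[int, str]:
    """Return (amount, unit) where unit is 'steps' or 'tokens'."""
    if isinstance(value, int):
        return value, "steps"                      # backwards compatible: plain ints are steps
    text = value.strip().upper()
    if text.endswith("T"):
        return int(float(text[:-1])), "tokens"     # e.g. "2e12T" -> 2 trillion tokens
    if text.endswith("S"):
        return int(float(text[:-1])), "steps"      # optional explicit suffix for steps
    return int(float(text)), "steps"               # bare numeric string -> steps
```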
It needs to be backwards compatible though.
You think this is solid enough that I could queue up the 2x batch size for the 65B without trying it in salloc first?
Yea I feel good about this.
Though actually I'll need to fix loading unsharded checkpoints first. So I can't just schedule a job.
For example, `max_duration="2e12T"` means train until 2T tokens. This is much more clear than specifying in terms of steps, which requires doing the math to convert tokens to steps based on batch size and sequence length. It will also make it much easier to resume runs after changing the batch size, as you won't need to change `max_duration` at all.
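As a rough illustration (with made-up batch-size and sequence-length values), this is the conversion the trainer can now do internally instead of the user working it out by hand:

```python
# Made-up values; only the arithmetic matters here.
target_tokens = 2e12                     # parsed from max_duration="2e12T"
global_train_batch_size = 2048           # sequences per step (hypothetical)
sequence_length = 2048                   # tokens per sequence (hypothetical)

tokens_per_step = global_train_batch_size * sequence_length   # ~4.2M tokens/step
equivalent_steps = int(target_tokens // tokens_per_step)      # ~476,837 steps

# Doubling the batch size halves the equivalent step count automatically;
# max_duration="2e12T" itself never needs to change when resuming.
```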