Warmstart automation #302

le1nux · 2025-02-13T22:44:34Z

What does this PR do?

This PR simplifies warmstarts from a checkpoint by adding a warmstart entrypoint.
We pass the warmstart config and the checkpoint info file to the entrypoint. The checkpoint info file contains the paths to the model and optimizer checkpoints. These checkpoint paths are injected into the config via a pydantic env resolver.
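
As a rough illustration of the injection step described above, here is a minimal stdlib-only sketch. It is not the code from this PR: the JSON format of the checkpoint info file, the keys `model_checkpoint_path` and `optimizer_checkpoint_path`, the `${warmstart_env:...}` placeholder syntax, and the helper names are all assumptions made for the example.

```python
import json
import re
import tempfile
from pathlib import Path

# Hypothetical placeholder syntax; the real resolver name and keys may differ.
PLACEHOLDER = re.compile(r"\$\{warmstart_env:(\w+)\}")

# Hypothetical keys that the checkpoint info file is assumed to contain.
REQUIRED_KEYS = ("model_checkpoint_path", "optimizer_checkpoint_path")


def load_checkpoint_info(info_path: Path) -> dict:
    """Read the checkpoint info file and check that the expected keys exist."""
    info = json.loads(info_path.read_text())
    for key in REQUIRED_KEYS:
        if key not in info:
            raise ValueError(f"checkpoint info file is missing '{key}'")
    return info


def resolve(node, info: dict):
    """Recursively replace placeholders in a nested config with values from info."""
    if isinstance(node, dict):
        return {k: resolve(v, info) for k, v in node.items()}
    if isinstance(node, list):
        return [resolve(v, info) for v in node]
    if isinstance(node, str):
        return PLACEHOLDER.sub(lambda m: str(info[m.group(1)]), node)
    return node  # numbers, booleans and None are passed through unchanged


def demo() -> dict:
    """Write a toy checkpoint info file and resolve a toy warmstart config."""
    with tempfile.TemporaryDirectory() as tmp:
        info_path = Path(tmp) / "checkpoint_info.json"
        info_path.write_text(json.dumps({
            "model_checkpoint_path": "/ckpt/model.bin",
            "optimizer_checkpoint_path": "/ckpt/optimizer.bin",
        }))
        config = {
            "model": {"checkpoint_path": "${warmstart_env:model_checkpoint_path}"},
            "optimizer": {"checkpoint_path": "${warmstart_env:optimizer_checkpoint_path}"},
            "settings": {"num_steps": 100},
        }
        return resolve(config, load_checkpoint_info(info_path))


if __name__ == "__main__":
    print(demo())
```

The point of this design is that the warmstart config never hard-codes a checkpoint path. It only names the value it needs, and the path is filled in when the config is loaded.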

Checklist before submitting final PR

My PR is minimal and addresses one issue in isolation
I have merged the latest version of the target branch into this feature branch
I have reviewed my own code w.r.t. correct implementation, missing type hints, proper documentation, etc.
I have run a sample config for model training
I have checked that all tests run through (python tests/tests.py)
I have updated the internal changelog (CHANGELOG_DEV.md)

fromm-m

great work!

flxst

Looks good, but I think the end2end test might not be sufficient, see my comment.

.pre-commit-config.yaml

src/modalities/__main__.py

src/modalities/config/config.py

tests/end2end_tests/test_fsdp_warmstart.py

flxst

LGTM! Added a few comments that should perhaps be addressed, nothing serious though.

tutorials/warmstart/configs/pre_training_config.yaml

tutorials/warmstart/configs/warmstart_config.yaml

tutorials/warmstart/scripts/pre_train_and_warmstart.sh

flxst · 2025-02-20T10:20:47Z

tutorials/warmstart/scripts/pre_train_and_warmstart.sh

+# cd to the directory of the script (absolute path)
+cd "$(dirname "$0")"
+
+rm -rf ../data/


I wonder if we should move this line to the end of the shell script, right above the final "Finished warmstart example" message. This would clean up the temporary data right after the test, instead of cleaning up the data of a previous test at the beginning. (This is what is done in the case of the getting started example test as well).

Co-authored-by: Felix Stollenwerk <felix.stollenwerk@ai.se>

le1nux added 6 commits February 13, 2025 18:31

feat: checkpoint info now logged along with the actual checkpoint saving

f860e94

refactor: added custom resolver functions for load_app_config_dict

efc26da

feat: added warmstart entrypoint

b56f7ca

chore: added missing danish test data

cb1070a

chore: fixed failing unit test

6437d81

feat: added check for checkpoint_info_file correctness

e67e795

le1nux requested review from flxst, fromm-m and mali-git February 13, 2025 22:44

le1nux self-assigned this Feb 14, 2025

le1nux added the enhancement New feature or request label Feb 14, 2025

chore: added missing base_freq to remaining test configs

bc37be2

fromm-m approved these changes Feb 14, 2025

flxst requested changes Feb 14, 2025

le1nux added 6 commits February 19, 2025 16:35

chore: changed pre-commit black python version back to 3.11

1cfc6b2

chore: added review changes

d3b3613

feat: improved logging for checkpoint loading and saving

de000ce

feat: added warmstart example

1725aae

feat: added warmstart example to end2end test

c3f43b8

chore: added *.pbin to .gitignore

adb9986

le1nux requested review from flxst and fromm-m February 19, 2025 15:42

chore: Merge branch 'main' into warmstart_automation_simple

83ce08c

flxst approved these changes Feb 20, 2025

le1nux and others added 5 commits February 20, 2025 12:47

Update tutorials/warmstart/configs/pre_training_config.yaml

0f73733

Co-authored-by: Felix Stollenwerk <felix.stollenwerk@ai.se>

Update tutorials/warmstart/configs/warmstart_config.yaml

eb44ac1

Co-authored-by: Felix Stollenwerk <felix.stollenwerk@ai.se>

Update tutorials/warmstart/scripts/pre_train_and_warmstart.sh

24bba06

Co-authored-by: Felix Stollenwerk <felix.stollenwerk@ai.se>

chore: Merge branch 'main' into warmstart_automation_simple

ae8e6dd

chore: removing now temporary files in warmstart example

08c8391

le1nux merged commit 0201531 into main Feb 20, 2025
3 checks passed

le1nux deleted the warmstart_automation_simple branch February 20, 2025 12:28

4 participants
