Skip to content

Conversation

@XkunW
Copy link
Contributor

@XkunW XkunW commented Mar 12, 2025

PR Type

[Feature | Fix]

Short Description

  • Retire launch_server script, the launch command now takes care of setting environment variables and runs the sbatch command directly
  • Updated default behaviour for launching a model that's absent from config, only send a warning msg if the model weights exist
  • Replaced all os.path methods with pathlib
  • Moved singularity containers to model-weights and added a cache config
  • Added a json file to keep track of launch params and server address for each job
  • Updated logging file structure
  • Updated CLI integration tests to accommodate the changes, split the launch_command_model_not_found test into two
  • Minor updates to various variable names
  • Fixed Qwen2.5-Math model config issues

Tests Added

test_launch_command_model_not_in_config_with_weights
test_launch_command_model_not_found

XkunW and others added 29 commits February 28, 2025 16:59
…nstead of raising an exception if the model weights exist but missing relevant config
… addr writing, update binding to the specific model weights dir
@jwilles jwilles self-assigned this Mar 12, 2025
@jwilles jwilles requested review from amrit110, fcogidi and jwilles March 12, 2025 21:04
@jwilles jwilles removed their assignment Mar 12, 2025
Copy link
Member

@amrit110 amrit110 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. I assume it has been tested on the cluster too, I'm going to do that as well. Great refactor!

@XkunW XkunW merged commit 4a9b41c into develop Mar 13, 2025
4 checks passed
@XkunW XkunW deleted the retire_launch_script branch March 13, 2025 15:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants