Kempner Configs #335

Merged
merged 17 commits into from
Oct 27, 2023

Member

@dirkgr dirkgr commented Oct 22, 2023

No description provided.

Member

@epwalsh epwalsh left a comment

I haven't gone through all of these steps yet and probably won't get to that today, but if I run into issues later I'll make a PR to update the instructions. I have a couple comments though in light of the fact that we've decided to open-source this repo:

  1. I don't like how we're piling up docs in the root of the repo. Can we move Kempner.md and LUMI.md to a docs/ folder?
  2. Having a scripts/XXX-on-Y.sh for every training config and every platform we run on is not necessary. Can we at least keep it to a single sbatch script per platform and resource requests? See what I did in https://github.com/allenai/LLM/pull/324/files with scripts/sbatch-128.sh.

Member Author

dirkgr commented Oct 24, 2023

> Can we at least keep it to a single sbatch script per platform and resource requests?

I don't think this is going to work. Look at how this plays out for the Llama-DefaultLN config: https://github.com/allenai/LLM/blob/5817f8167ce268066b8ef382306bef024d5e090c/scripts/v1_5-mix-medium-llama-on-kempner.sh#L30

I'd like to have one YAML file per model (maybe two because of the lists of files in S3), but then several configs with overrides in the scripts folder. Or maybe in some other folder, but I think we need those extra settings in the sbatch files.
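
To make the "one YAML per model, overrides in the launch script" idea concrete, here is a minimal sketch of layering dotted `key=value` overrides on top of a base config. The function name `apply_overrides`, the config keys, and the override strings are all hypothetical and chosen for illustration; this is not the repo's actual config loader.

```python
import copy
from typing import Any


def apply_overrides(base: dict[str, Any], overrides: list[str]) -> dict[str, Any]:
    """Return a copy of `base` with dotted `key=value` overrides applied."""
    config = copy.deepcopy(base)  # never mutate the shared base config
    for item in overrides:
        dotted_key, sep, raw_value = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        # Drop leading dashes so "--a.b=c" and "a.b=c" behave the same.
        *parents, leaf = dotted_key.lstrip("-").split(".")
        node = config
        for part in parents:
            node = node.setdefault(part, {})  # create missing sections
        node[leaf] = raw_value  # values stay strings in this sketch
    return config


# Hypothetical base config and overrides, for illustration only.
base_config = {"model": {"n_layers": "16"}, "data": {"num_workers": "4"}}
merged = apply_overrides(base_config, ["--model.n_layers=32", "--run_name=demo"])
```

With this shape, the per-platform sbatch scripts would carry only the override list, while the model YAML stays shared. Type coercion of the override values is deliberately left out.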

Member

epwalsh commented Oct 25, 2023

Ok, fair enough. Can we at least organize the sbatch scripts a little better? For example:

scripts/lumi/
scripts/kempner/

We could merge the mcli vs non-mcli yaml configs if we implemented a wildcard/glob expansion for S3, but we can worry about that later.
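
As a rough illustration of what wildcard/glob expansion for S3 paths could look like, the sketch below matches patterns against a list of object keys that has already been fetched. The function name `expand_s3_globs` and the example keys are hypothetical, and the actual bucket listing is left out on purpose, so nothing here reflects a real S3 client API.

```python
from fnmatch import fnmatchcase


def expand_s3_globs(patterns: list[str], available_keys: list[str]) -> list[str]:
    """Expand glob patterns against an already-fetched list of object keys."""
    expanded: list[str] = []
    for pattern in patterns:
        # Sort so the expanded file order is deterministic across runs.
        matches = sorted(key for key in available_keys if fnmatchcase(key, pattern))
        if not matches:
            raise FileNotFoundError(f"no keys match {pattern!r}")
        expanded.extend(matches)
    return expanded


# Hypothetical keys standing in for a bucket listing.
keys = ["data/part-000.npy", "data/part-001.npy", "eval/part-000.npy"]
paths = expand_s3_globs(["data/part-*.npy"], keys)
```

Note that `fnmatchcase` lets `*` match across `/`, so a pattern can span key prefixes. A real implementation might want stricter, path-aware matching.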

Contributor

@2015aroras 2015aroras left a comment

I agree with the general organization argument that Pete is making (but don't think it's a deal breaker) and I don't have anything extra to add, so I'm approving.

Member Author

dirkgr commented Oct 26, 2023

I changed the organization. Another review?

Contributor

@2015aroras 2015aroras left a comment

Do you need to add DATA_PATH and the other variables to the lumi scripts (like c4-*-on-lumi.sh)? Otherwise this looks good.
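
One way to catch this kind of omission mechanically is a small check that reports which scripts never mention a required variable. This is only a sketch: the function name `scripts_missing_variables` and the script contents are hypothetical, and `DATA_PATH` is the only variable name taken from the comment above.

```python
def scripts_missing_variables(
    scripts: dict[str, str], required: list[str]
) -> dict[str, list[str]]:
    """Map each script name to the required variable names it never mentions."""
    missing: dict[str, list[str]] = {}
    for name, text in scripts.items():
        # Plain substring check: crude, but enough to flag obvious gaps.
        absent = [var for var in required if var not in text]
        if absent:
            missing[name] = absent
    return missing


# Hypothetical script contents, for illustration only.
scripts = {
    "with-var.sh": "export DATA_PATH=/some/path\n",
    "without-var.sh": "echo launching\n",
}
report = scripts_missing_variables(scripts, ["DATA_PATH"])
```

Because it only looks for the name as a substring, the check would also accept a variable that appears in a comment. It is a first-pass lint and does not guarantee the variable is actually set.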

Member Author

dirkgr commented Oct 26, 2023 via email

@2015aroras
Contributor

You didn't add to lumi-interactive.sh or olmo-small-ablation-on-lumi.sh either, but that's probably fine if you don't care about the c4 scripts too.

Contributor

@2015aroras 2015aroras left a comment

LGTM

Member

@epwalsh epwalsh left a comment

You'll need to update the mcli configs (configs/mcli/*.yaml) to account for renaming the train configs. Other than that LGTM

Member Author

dirkgr commented Oct 27, 2023

Done, but we have some mcli yamls in configs/ and some in scripts/?

Member

epwalsh commented Oct 27, 2023

Oh, Ananya's stuff? I didn't see that until now. We should just move those to configs/mcli/.

Member Author

dirkgr commented Oct 27, 2023

Is this good to go?

Member

@epwalsh epwalsh left a comment

LGTM

@dirkgr dirkgr merged commit db0756f into main Oct 27, 2023
10 checks passed
@dirkgr dirkgr deleted the Kempner branch October 27, 2023 22:39
3 participants