Add starter kit for low-budget RunPod workflow #1
Pull request overview
Adds a “starter_kit” workflow intended to help participants go from fork → RunPod bootstrap → smoke/full run → generating a PR-ready records/ submission folder (primarily for non-record iteration).
Changes:
- Added RunPod bootstrap + smoke/full run shell scripts for low-cost iteration.
- Added a Python script and templates to generate a submission folder under records/.
- Added starter documentation and templates (submission README, JSON template, experiment log template).
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| starter_kit/templates/submission.json.template | Adds a starter submission.json template for users to fill out. |
| starter_kit/templates/README_submission_template.md | Adds a README template to accompany submissions and include repro instructions. |
| starter_kit/START_HERE.md | Adds step-by-step instructions for using the starter kit and preparing a submission folder. |
| starter_kit/scripts/prepare_submission.py | Adds a generator that creates a new records/<track>/... folder with README/JSON/log placeholders and copies train_gpt.py. |
| starter_kit/scripts/03_full_run.sh | Adds a “full” ~10 minute run script for RunPod. |
| starter_kit/scripts/02_smoke_run.sh | Adds a shorter smoke run script for RunPod. |
| starter_kit/scripts/01_runpod_bootstrap.sh | Adds a bootstrap script to clone a fork and download a small dataset slice. |
| starter_kit/notes/EXPERIMENT_LOG_TEMPLATE.md | Adds an experiment log template for tracking runs/decisions. |

    "author_name": "Your Name",
    "github_id": "your_github",
    "run_name": "your_run_name",
    "track": "non-record",
    "val_bpb": 1.2000,
    "date": "YYYY-MM-DD",
    "notes": "Fill with concise methodology and constraints."

The field names and track value here don’t match the established records/**/submission.json examples (which commonly use keys like author, name, blurb, date, plus track values like 10min_16mb / non-record-16mb / non_record_16mb). Since this is meant to be “PR-ready”, please align this template’s schema/values to the repo’s existing submission format to avoid users generating invalid metadata.

Suggested change:

    - "author_name": "Your Name",
    - "github_id": "your_github",
    - "run_name": "your_run_name",
    - "track": "non-record",
    - "val_bpb": 1.2000,
    - "date": "YYYY-MM-DD",
    - "notes": "Fill with concise methodology and constraints."
    + "author": "Your Name",
    + "name": "your_run_name",
    + "blurb": "Fill with concise methodology and constraints.",
    + "track": "non_record_16mb",
    + "date": "YYYY-MM-DD"


    ```bash
    python starter_kit/scripts/prepare_submission.py \
    --track non-record \

The example uses --track non-record, but current repo submissions typically encode track names in metadata as 10min_16mb or non-record-16mb/non_record_16mb. Once the generator’s --track values are aligned, please update this example accordingly to avoid users copying a nonstandard track value into their submission metadata.

Suggested change:

    - --track non-record \
    + --track non_record_16mb \


    parser = argparse.ArgumentParser(description="Create a PR-ready records folder.")
    parser.add_argument("--track", choices=["record", "non-record"], required=True)
    parser.add_argument("--run-name", required=True)

--track currently accepts record / non-record, but the repo’s submission.json metadata commonly uses track values like 10min_16mb and non-record-16mb/non_record_16mb. To keep the starter kit “PR-ready”, consider updating the CLI choices (and the generated JSON) to match the repository’s established track naming.
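
A minimal sketch of what aligned CLI choices could look like. The track names below are only the ones quoted in this comment and remain an assumption until checked against existing records/**/submission.json files; `TRACKS` and `build_parser` are hypothetical names, not part of the PR.

```python
import argparse

# Assumption: these values mirror the track names quoted in this review.
# Confirm them against existing records/**/submission.json before relying on them.
TRACKS = ("10min_16mb", "non_record_16mb")


def build_parser() -> argparse.ArgumentParser:
    """Build the CLI parser with track choices that match the metadata values."""
    parser = argparse.ArgumentParser(description="Create a PR-ready records folder.")
    parser.add_argument("--track", choices=TRACKS, required=True)
    parser.add_argument("--run-name", required=True)
    return parser


# Example invocation with an explicit argument list instead of sys.argv.
args = build_parser().parse_args(["--track", "non_record_16mb", "--run-name", "demo"])
print(args.track, args.run_name)
```

Keeping the allowed values in one tuple means the CLI and the generated JSON cannot drift apart, since `args.track` can be written into the metadata unchanged.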

    "author_name": args.author_name,
    "github_id": args.github_id,
    "run_name": args.run_name,
    "track": args.track,
    "val_bpb": round(args.val_bpb, 4),
    "date": date,
    "notes": "Fill out details and attach train logs."

The generated submission.json schema (author_name, run_name, notes, etc.) differs from the established format in records/**/submission.json (commonly author, name, blurb, date, plus optional size / seed fields). Please update the generated keys so the output folder matches existing submission conventions.

Suggested change:

    - "author_name": args.author_name,
    - "github_id": args.github_id,
    - "run_name": args.run_name,
    - "track": args.track,
    - "val_bpb": round(args.val_bpb, 4),
    - "date": date,
    - "notes": "Fill out details and attach train logs."
    + "author": args.author_name,
    + "name": args.run_name,
    + "blurb": "Fill out details and attach train logs.",
    + "date": date,
    + "size": "16mb",
    + "github_id": args.github_id,
    + "track": args.track,
    + "val_bpb": round(args.val_bpb, 4),

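
If metadata has already been generated with the old keys, the rename described in this comment can be applied mechanically. The sketch below assumes the three renames quoted above (`author_name` to `author`, `run_name` to `name`, `notes` to `blurb`); `RENAMES` and `to_repo_schema` are hypothetical names, and the real schema should be confirmed against existing records.

```python
# Assumption: this rename map follows the key names quoted in the review comment.
RENAMES = {"author_name": "author", "run_name": "name", "notes": "blurb"}


def to_repo_schema(meta: dict) -> dict:
    """Return a copy of meta with old starter-kit keys renamed; other keys pass through."""
    return {RENAMES.get(key, key): value for key, value in meta.items()}


old_meta = {
    "author_name": "Your Name",
    "run_name": "your_run_name",
    "notes": "Fill out details and attach train logs.",
    "track": "non_record_16mb",
}
print(to_repo_schema(old_meta))
```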

    repo_root = Path(__file__).resolve().parents[2]
    date = dt.datetime.now().strftime("%Y-%m-%d")
    slug = f"{date}_{args.run_name}"

slug = f"{date}_{args.run_name}" is used directly as a path component. If --run-name contains path separators (e.g. /, ..), this can create unexpected directories or escape the intended records/<track> directory. Sanitize run_name to a safe filename and/or verify out_dir.resolve() stays under track_dir.resolve() before creating it.

Suggested change:

    - repo_root = Path(__file__).resolve().parents[2]
    - date = dt.datetime.now().strftime("%Y-%m-%d")
    - slug = f"{date}_{args.run_name}"
    + # Sanitize run_name to ensure it is safe to use as a single path component
    + safe_run_name = args.run_name.replace("/", "_").replace("\\", "_")
    + repo_root = Path(__file__).resolve().parents[2]
    + date = dt.datetime.now().strftime("%Y-%m-%d")
    + slug = f"{date}_{safe_run_name}"

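
The other option mentioned in this comment, verifying that the resolved output directory stays under the track directory, could be sketched as follows. The character allowlist and the helper names `safe_component` and `resolve_out_dir` are assumptions for illustration, not code from the PR.

```python
import re
from pathlib import Path


def safe_component(run_name: str) -> str:
    """Reduce run_name to a single conservative path component (hypothetical helper)."""
    # Replace every run of characters outside the allowlist with "_",
    # then drop leading/trailing dots and underscores.
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "_", run_name).strip("._")
    if not cleaned:
        raise ValueError(f"run name {run_name!r} has no usable characters")
    return cleaned


def resolve_out_dir(track_dir: Path, date: str, run_name: str) -> Path:
    """Build track_dir/<date>_<run_name> and refuse any path that escapes track_dir."""
    out_dir = (track_dir / f"{date}_{safe_component(run_name)}").resolve()
    # Defense in depth: the resolved path must still sit inside track_dir.
    if track_dir.resolve() not in out_dir.parents:
        raise ValueError(f"{out_dir} is outside {track_dir}")
    return out_dir


print(resolve_out_dir(Path("records/non_record_16mb"), "2025-01-01", "a/b").name)
```

An allowlist is stricter than replacing the two separator characters, and the containment check keeps working even if the sanitization rule is later relaxed.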
Adds a practical starter kit to go from fork -> smoke run -> non-record submission prep.

Includes:
- RunPod bootstrap and run scripts
- Submission folder generator
- README and submission templates
- Experiment log template

This is intended for low-budget iterative experimentation before scaling.