
Add starter kit for low-budget RunPod workflow #1

Merged
SID-6921 merged 1 commit into main from starter/full-scale-repo
Apr 2, 2026

Conversation

SID-6921 (Owner) commented Apr 2, 2026

Adds a practical starter kit to go from fork -> smoke run -> non-record submission prep.

Includes:
- RunPod bootstrap and run scripts
- Submission folder generator
- README and submission templates
- Experiment log template

This is intended for low-budget iterative experimentation before scaling.

Copilot AI review requested due to automatic review settings April 2, 2026 18:24
@SID-6921 SID-6921 merged commit f215f74 into main Apr 2, 2026
2 checks passed
@SID-6921 SID-6921 deleted the starter/full-scale-repo branch April 2, 2026 18:24

Copilot AI left a comment


Pull request overview

Adds a “starter_kit” workflow intended to help participants go from fork → RunPod bootstrap → smoke/full run → generating a PR-ready records/ submission folder (primarily for non-record iteration).

Changes:

  • Added RunPod bootstrap + smoke/full run shell scripts for low-cost iteration.
  • Added a Python script and templates to generate a submission folder under records/.
  • Added starter documentation and templates (submission README, JSON template, experiment log template).

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| starter_kit/templates/submission.json.template | Adds a starter submission.json template for users to fill out. |
| starter_kit/templates/README_submission_template.md | Adds a README template to accompany submissions and include repro instructions. |
| starter_kit/START_HERE.md | Adds step-by-step instructions for using the starter kit and preparing a submission folder. |
| starter_kit/scripts/prepare_submission.py | Adds a generator that creates a new records/<track>/... folder with README/JSON/log placeholders and copies train_gpt.py. |
| starter_kit/scripts/03_full_run.sh | Adds a “full” ~10 minute run script for RunPod. |
| starter_kit/scripts/02_smoke_run.sh | Adds a shorter smoke run script for RunPod. |
| starter_kit/scripts/01_runpod_bootstrap.sh | Adds a bootstrap script to clone a fork and download a small dataset slice. |
| starter_kit/notes/EXPERIMENT_LOG_TEMPLATE.md | Adds an experiment log template for tracking runs/decisions. |


Comment on lines +2 to +8
"author_name": "Your Name",
"github_id": "your_github",
"run_name": "your_run_name",
"track": "non-record",
"val_bpb": 1.2000,
"date": "YYYY-MM-DD",
"notes": "Fill with concise methodology and constraints."

Copilot AI Apr 2, 2026

The field names and track value here don’t match the established records/**/submission.json examples (which commonly use keys like author, name, blurb, date, plus track values like 10min_16mb / non-record-16mb / non_record_16mb). Since this is meant to be “PR-ready”, please align this template’s schema/values to the repo’s existing submission format to avoid users generating invalid metadata.

Suggested change

Replace:
"author_name": "Your Name",
"github_id": "your_github",
"run_name": "your_run_name",
"track": "non-record",
"val_bpb": 1.2000,
"date": "YYYY-MM-DD",
"notes": "Fill with concise methodology and constraints."

With:
"author": "Your Name",
"name": "your_run_name",
"blurb": "Fill with concise methodology and constraints.",
"track": "non_record_16mb",
"date": "YYYY-MM-DD"

Copilot uses AI. Check for mistakes.
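
A minimal sketch of how a filled-in template could be checked before opening a PR. The required key set and the track spellings are assumptions taken from the names this review comment lists as common; they are not an official schema for records/**/submission.json, and `check_submission` is a hypothetical helper, not part of the starter kit.

```python
# Assumed key set, taken from the keys the review comment lists as common.
# This is NOT an official schema for records/**/submission.json.
REQUIRED_KEYS = {"author", "name", "blurb", "date", "track"}
# Track spellings mentioned in the review; the repo may accept others.
KNOWN_TRACKS = {"10min_16mb", "non-record-16mb", "non_record_16mb"}


def check_submission(meta: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    missing = sorted(REQUIRED_KEYS - meta.keys())
    if missing:
        problems.append("missing keys: " + ", ".join(missing))
    track = meta.get("track")
    if track is not None and track not in KNOWN_TRACKS:
        problems.append(f"unrecognized track: {track!r}")
    return problems


# Metadata shaped like the original template (subset of its fields).
old_style = {
    "author_name": "Your Name",
    "run_name": "your_run_name",
    "track": "non-record",
}
# Metadata shaped like the suggested change.
new_style = {
    "author": "Your Name",
    "name": "your_run_name",
    "blurb": "Fill with concise methodology and constraints.",
    "track": "non_record_16mb",
    "date": "YYYY-MM-DD",
}

print(check_submission(old_style))
print(check_submission(new_style))
```

Returning a list of problems rather than raising on the first one lets a user fix every mismatch in a single pass.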

```bash
python starter_kit/scripts/prepare_submission.py \
--track non-record \
```

Copilot AI Apr 2, 2026

The example uses --track non-record, but current repo submissions typically encode track names in metadata as 10min_16mb or non-record-16mb/non_record_16mb. Once the generator’s --track values are aligned, please update this example accordingly to avoid users copying a nonstandard track value into their submission metadata.

Suggested change

Replace:
--track non-record \

With:
--track non_record_16mb \

Comment on lines +10 to +12
parser = argparse.ArgumentParser(description="Create a PR-ready records folder.")
parser.add_argument("--track", choices=["record", "non-record"], required=True)
parser.add_argument("--run-name", required=True)

Copilot AI Apr 2, 2026

--track currently accepts record / non-record, but the repo’s submission.json metadata commonly uses track values like 10min_16mb and non-record-16mb/non_record_16mb. To keep the starter kit “PR-ready”, consider updating the CLI choices (and the generated JSON) to match the repository’s established track naming.
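
One way the CLI could be aligned is sketched below. The `TRACK_CHOICES` list is an assumption built from the spellings named in this comment; the review does not say which spelling the repository treats as canonical, so the list would need to be checked against existing records before use.

```python
import argparse

# Track names as spelled in the review comment. Which spelling is canonical
# is not stated there, so this list is an assumption.
TRACK_CHOICES = ["10min_16mb", "non_record_16mb"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose --track values match the metadata track names."""
    parser = argparse.ArgumentParser(description="Create a PR-ready records folder.")
    parser.add_argument("--track", choices=TRACK_CHOICES, required=True)
    parser.add_argument("--run-name", required=True)
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["--track", "non_record_16mb", "--run-name", "demo"])
print(args.track, args.run_name)
```

With `choices` set this way, argparse rejects the old `non-record` value at parse time, so a nonstandard track never reaches the generated JSON.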

Comment on lines +45 to +51
"author_name": args.author_name,
"github_id": args.github_id,
"run_name": args.run_name,
"track": args.track,
"val_bpb": round(args.val_bpb, 4),
"date": date,
"notes": "Fill out details and attach train logs."

Copilot AI Apr 2, 2026

The generated submission.json schema (author_name, run_name, notes, etc.) differs from the established format in records/**/submission.json (commonly author, name, blurb, date, plus optional size / seed fields). Please update the generated keys so the output folder matches existing submission conventions.

Suggested change

Replace:
"author_name": args.author_name,
"github_id": args.github_id,
"run_name": args.run_name,
"track": args.track,
"val_bpb": round(args.val_bpb, 4),
"date": date,
"notes": "Fill out details and attach train logs."

With:
"author": args.author_name,
"name": args.run_name,
"blurb": "Fill out details and attach train logs.",
"date": date,
"size": "16mb",
"github_id": args.github_id,
"track": args.track,
"val_bpb": round(args.val_bpb, 4),

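
The renamed keys could be assembled in a small helper, sketched here. `build_metadata` is a hypothetical function, the `SimpleNamespace` stands in for parsed CLI arguments, and the optional `size` field from the suggestion is left out because the review describes it as optional.

```python
import json
from types import SimpleNamespace


def build_metadata(args, date: str) -> dict:
    """Assemble submission metadata using the key names from the suggestion."""
    return {
        "author": args.author_name,
        "name": args.run_name,
        "blurb": "Fill out details and attach train logs.",
        "date": date,
        "github_id": args.github_id,
        "track": args.track,
        "val_bpb": round(args.val_bpb, 4),
    }


# Stand-in for parsed CLI arguments; all values are placeholders.
args = SimpleNamespace(
    author_name="Your Name",
    github_id="your_github",
    run_name="demo",
    track="non_record_16mb",
    val_bpb=1.23456,
)
meta = build_metadata(args, "2026-04-02")
print(json.dumps(meta, indent=2))
```

Keeping the CLI attribute names (`author_name`, `run_name`) while renaming only the output keys means existing invocations of the script would not need to change.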
Comment on lines +19 to +21
repo_root = Path(__file__).resolve().parents[2]
date = dt.datetime.now().strftime("%Y-%m-%d")
slug = f"{date}_{args.run_name}"

Copilot AI Apr 2, 2026

slug = f"{date}_{args.run_name}" is used directly as a path component. If --run-name contains path separators (e.g. /, ..), this can create unexpected directories or escape the intended records/<track> directory. Sanitize run_name to a safe filename and/or verify out_dir.resolve() stays under track_dir.resolve() before creating it.

Suggested change

Replace:
repo_root = Path(__file__).resolve().parents[2]
date = dt.datetime.now().strftime("%Y-%m-%d")
slug = f"{date}_{args.run_name}"

With:
# Sanitize run_name to ensure it is safe to use as a single path component
safe_run_name = args.run_name.replace("/", "_").replace("\\", "_")
repo_root = Path(__file__).resolve().parents[2]
date = dt.datetime.now().strftime("%Y-%m-%d")
slug = f"{date}_{safe_run_name}"
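
The comment above also mentions verifying that the resolved output directory stays under the track directory, which the suggested change does not do. A sketch combining both checks follows. `safe_component` and `resolve_out_dir` are hypothetical helpers, the allowed character set is an assumption, and `records/non_record_16mb` is only an illustrative path.

```python
import re
from pathlib import Path


def safe_component(run_name: str) -> str:
    """Collapse anything outside [A-Za-z0-9._-] to '_' and trim leading/trailing '.' and '_'."""
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "_", run_name).strip("._")
    if not cleaned:
        raise ValueError(f"run name {run_name!r} has no usable characters")
    return cleaned


def resolve_out_dir(track_dir: Path, date: str, run_name: str) -> Path:
    """Build the output directory and confirm it stays under track_dir."""
    base = track_dir.resolve()
    out_dir = (track_dir / f"{date}_{safe_component(run_name)}").resolve()
    try:
        out_dir.relative_to(base)  # raises ValueError if out_dir is outside base
    except ValueError:
        raise ValueError(f"{out_dir} escapes {base}") from None
    return out_dir


# Illustrative track directory; the real layout under records/ may differ.
track_dir = Path("records") / "non_record_16mb"
print(resolve_out_dir(track_dir, "2026-04-02", "../../etc/passwd").name)
```

The containment check is redundant once the name is sanitized, but it guards against future changes to how the slug is built.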
