HPP 10K multimodal modeling (CGM, DEXA, retina, metabolites): loaders in `data/`, LeJEPA-style pretraining in `training/`.
```
uv venv && source .venv/bin/activate
uv pip install -r requirements.txt
```

- This machine: `python -m training --max-steps 10` (uses CPU if no CUDA).
- GPU (mcluster11): `python scripts/train_elysium_mcluster11.py --max-steps 10`; needs passwordless `ssh mcluster11` and the same NFS project path on genie and the cluster.
Slurm metadata and logs land in `elysium_runs/` (gitignored). Optional: `scripts/train_elysium_local.py` runs Elysium without SSH (no cluster GPU).
Training writes `embeddings/<run_id>/checkpoint.pt` and `embeddings.h5`; add `--run-eval` to run Ridge probes against the tabular baseline table (prints `EVAL_SCORE=...`). Or run the eval standalone: `python -m eval.run_eval --embeddings embeddings/<run_id>/embeddings.h5`.
Autonomous ML research runs through Claude Code (`.claude/agents/`) and OpenCode (`opencode.json` + `.opencode/agents/`).
Supervisor (talk to this one): your single conversation partner for fleet status and priorities.
```
claude --agent supervisor
# or:
opencode run --agent supervisor
```

It reads `training/results.tsv` and the shared diary, then answers questions like "how are agents doing?" and writes your ideas as directives that workers pick up on their next cycle.
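A sketch of what "reads `training/results.tsv`" could amount to (the column names `agent` and `score` are assumptions; check the actual file header):

```python
import csv

def summarize_results(tsv_path: str) -> dict[str, float]:
    """Best score per agent from a results TSV (assumed columns: agent, score)."""
    best: dict[str, float] = {}
    with open(tsv_path, newline="") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            agent, score = row["agent"], float(row["score"])
            if agent not in best or score > best[agent]:
                best[agent] = score
    return best
```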
Research workers run one per worktree on `autoresearch/agent-<suffix>` branches. Start a loop:
```
# Create worktree (once):
bash scripts/agent_worktree.sh a
# Start loop (or ask the supervisor to do it):
bash scripts/supervisor_spawn_agent.sh a
# Gemini CLI instead of OpenCode: AGENT_CLI=gemini bash scripts/supervisor_spawn_agent.sh a
# logs → agent_logs/agent-a.log
```

Training from a shell: use bash (genie's default shell is often tcsh, which rejects `$(…)` and prints `Illegal variable name.`). `training/run_experiment.sh` always runs on the mcluster11 GPU via Elysium.
- Slurm `Requested node configuration is not available`: override the GPU spec with `export ELYSIUM_TRAINING_GPU=1` or e.g. `L40S:1`.
- Remote job `uv: command not found`: GPU nodes don't have uv; `run_training_elysium.py` uses `.venv/bin/python` on the cluster. Ensure `uv sync` has created `.venv` at the NFS project path.
- Quiet Elysium: `run_training_elysium.py` is quiet by default (`RUN_TRAINING_ELYSIUM_QUIET=1`). Use `--human` for live output.
- Why is `--max-steps 1` still minutes? Queue time and CUDA init dominate; one step itself is cheap.
- Agent logs: `agent_logs/agent-<suffix>.log`; remote training logs: `elysium_runs/collection_*/0000/stdout.log`.
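The `uv: command not found` fix assumes `.venv` already exists at the shared project path; a tiny preflight check (the function and messages are illustrative, and the path you pass is a placeholder for your NFS project path) can catch a missing venv before a job is submitted:

```shell
# Fails fast if the shared venv's python is missing at the given project path.
check_venv() {
  if [ -x "$1/.venv/bin/python" ]; then
    echo "venv ok: $1/.venv/bin/python"
  else
    echo "missing .venv at $1; run 'uv sync' there first" >&2
    return 1
  fi
}
```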
Run tests:

```
python -m pytest tests/ -q
```

Building H5s from the cohort requires `MULTIMODAL_BENCHMARK_CSV` (see `env.example`); processed H5 paths are defined in `data/constants.py`. Large files stay out of git (`*.h5` is gitignored).
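A loader might resolve the cohort CSV along these lines (a minimal sketch; the helper name and error messages are illustrative, not the actual `data/` API — only the env-var name comes from `env.example`):

```python
import os

def benchmark_csv_path() -> str:
    """Resolve the cohort CSV from MULTIMODAL_BENCHMARK_CSV (see env.example)."""
    path = os.environ.get("MULTIMODAL_BENCHMARK_CSV")
    if not path:
        raise RuntimeError(
            "MULTIMODAL_BENCHMARK_CSV is not set; copy env.example and fill it in."
        )
    if not os.path.exists(path):
        raise FileNotFoundError(f"MULTIMODAL_BENCHMARK_CSV points at a missing file: {path}")
    return path
```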