
Commit cee3ab0

Updated AGENTS.md, by GPT5-high
Prompt: Gather information about the OCANNL project: read CLAUDE.md, docs/slides-basics_backprop_training_codegen.md, lib/syntax_extensions.md, lib/shape_inference.md, arrayjit/lib/anatomy_of_a_backend.md. Condense it into insights, guidelines, and research tips for a programmer tasked with developing new features and debugging for OCANNL. Update the file AGENTS.md with all that knowledge. When in doubt, keep it concise.

Summary by GPT5: Updated AGENTS.md with a concise, developer-focused guide synthesizing the requested docs.

What I added:
- Developer Cheatsheet: Overview of packages, execution model, key types, and backends.
- Syntax Extensions: Practical %op and %cd usage patterns, accumulation ops, projections, array node references, inline params, and the numeric pow operator.
- Shape & Projection Inference: End-to-end flow, monomorphic inference, broadcasting rows, projections via union-find, and convolution padding notes.
- Backend Anatomy: Frontend modules, backend interfaces/impls, schedulers, lifting to user-facing backends, and lifecycle.
- Scheduling/Streams/Transfers: Stream semantics, merge buffers, streaming vs copy modes, and automated host transfers.
- Debugging & Tracing: Tips for logs, PPX labeling, shape inspection, streams/merges checks, and backend selection strategy.
- Adding Features: Steps for new ops, tensor APIs, shape rules, backend codegen anchors, and documentation/testing notes.
- Testing & Validation: Running subsets, golden file diffs, and sane backend coverage.
- Research Tips: High-value files to read, conceptual comparisons, and knobs to experiment with.

Key file changed: AGENTS.md — extended with new sections; corrected op path to arrayjit/lib/ops.ml.
1 parent 58a0af5 commit cee3ab0

AGENTS.md

Lines changed: 67 additions & 3 deletions
@@ -6,7 +6,7 @@
 - bin/: Executable examples and demos (e.g., `hello_world.ml`, `moons_benchmark.ml`).
 - test/: Expect and inline tests grouped by topic (`einsum/`, `operations/`, `training/`, `ppx/`).
 - docs/: Slides and reference docs; datasets/: helper data; build_files/ and log_files/: generated artifacts.
-- Key config: copy `ocannl_config.example` to `ocannl_config` and adjust backend.
+- Global configuration explained in `ocannl_config.example`.
 
 ## Build, Test, and Development Commands
 - opam deps: `opam install . --deps-only` (OCaml ≥ 5.3 per `dune-project`).
@@ -30,6 +30,70 @@
 - PRs: clear description, linked issues, reproduction or `dune runtest` output, and mention backend(s) exercised. Include any new example commands.
 
 ## Configuration & Backends
-- Backend selection and runtime options are read from `ocannl_config` and `OCANNL_BACKEND`. See `ocannl_config.example` for available keys (debug logging, device, precision).
-- For CUDA/Metal specifics and debugging, consult README “Using the tracing debugger” and adjust config accordingly.
+- Backend selection and runtime options are read from the file `ocannl_config` in the current directory (or from `test/config/ocannl_config` for tests), from environment variables such as `OCANNL_BACKEND=sync_cc` (only `OCANNL_BACKEND` has dedicated support for tests; other env vars are unreliable there), or from command-line arguments such as `--ocannl_backend=sync_cc` (these do not work with `dune test`, which runs multiple tests). See `ocannl_config.example` for available keys (debug logging, device, precision).
 
+**Developer Cheatsheet**
+- **Packages:** `arrayjit` (compiler/backends) and `neural_nets_lib` (DL framework). Build high-level tensors in `lib/`, lower/compile in `arrayjit/`.
+- **Execution Model:** Express computations as tensors → derive forward/backprop → infer shapes/projections → lower `Assignments` → compile/link per-backend → run on streams (CPU cores/CUDA streams).
+- **Key Types:** `Tensor.t` (value/grad nodes), `Tnode.t` (node-level arrays), `Assignments.comp` (accumulating statements), `Indexing.projections` (loop derivation), `Ndarray.t` (host/device buffers).
+- **Backends:** `sync_cc`/`multicore_cc` (C via schedulers), `gccjit`, `cuda`, `metal` (if built). Use `Backends.fresh_backend ()` in examples/tests.
+
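
For orientation, a minimal sketch of obtaining a backend the way the cheatsheet suggests; only `Backends.fresh_backend ()` is quoted from the bullets above, while the `Ocannl` module path and `Backend.name` are assumptions about the API:

```ocaml
(* A minimal sketch, assuming these module paths; only Backends.fresh_backend ()
   is quoted from the cheatsheet above. *)
open Ocannl

let () =
  (* fresh_backend reads ocannl_config / OCANNL_BACKEND to pick e.g. sync_cc or cuda. *)
  let module Backend = (val Backends.fresh_backend ()) in
  Printf.printf "Selected backend: %s\n" Backend.name
```
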
+**Syntax Extensions**
+- **`%op` (operations):** Builds differentiable tensors using `Operation.TDSL`.
+- **Inline params:** `{ w; o = [ dims ] }` creates parameters; with initialization requires `Operation.PDSL` in scope.
+- **Convenience:** Regular OCaml works for many tensor expressions; `%op` mainly improves labels and inline decls.
+- **`%cd` (code):** Builds `Assignments.comp` for forward/backward code via `Operation.NTDSL` (non‑diff tensors inside).
+- **Accum ops:** Infix assignment operators pick accumulation: `=+`, `=-`, `=*`, `=/`, `=**`, and variants `=:+` etc.
+- **Projections:** Provide `~projections` or rely on mnemonics (`v`, `v1`, `v2`, `g`, `g1`, `g2`, `lhs`, `rhs1`, `rhs2`) to select slots.
+- **Array refs:** `.v` for value node, `.grad` for gradient node, `.merge` for stream merge buffers.
+- **Embedded tensors:** `%cd` auto‑inserts forward code for created tensors and tracks `embedded_nodes` to avoid recompute.
+- **Pow operator:** Use `**.` for pointwise power with numeric exponent; gradients are specialized (fast path for p=1,2).
+- **Generalized einsum:** Use `~logic:"...=>..."` for concise projections; shapes use `batch|input->output` notation.
+
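
A hedged usage sketch of the `%op` conventions above; the `Ocannl`/`Operation.TDSL` scoping, `relu`, and the concrete dims are assumptions rather than code taken from the repository:

```ocaml
(* Hedged sketch of the %op conventions above; module paths, `relu`, and the
   concrete dims are assumptions, not code from the repository. *)
open Ocannl
module TDSL = Operation.TDSL

(* { w } and { b; o = [ ... ] } declare inline parameters, as in the bullets above. *)
let%op dense x = relu (({ w } * x) + { b; o = [ 16 ] })

(* Pointwise power with a numeric exponent (specialized gradients for p = 1, 2). *)
let%op square t = t **. 2.
```
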
+**Shape & Projection Inference**
+- **Pipeline:** `propagate_shapes` during build; `finish_inference` before jitting closes shapes (LUB or 1/broadcastable); then `derive_projections` freshens projection ids to avoid cross‑op contamination.
+- **Monomorphic now:** Existential `row`/`dim` vars; future polymorphism could reuse `%op ~config` functions with abstract namespaces.
+- **Rows:** Three rows per tensor: batch | input -> output; broadcasting can happen “in the middle” with fixed head/tail axes.
+- **Indexing:** Projections unify per‑assignment instances (union‑find), yield iterators for product dims; dim=1 maps to `Fixed_idx 0`.
+- **Convolutions:** Low‑level buffers include padding in `dims`; high‑level shapes exclude it—padding becomes observable after forcing dims.
+
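
As a conceptual illustration of "LUB or 1/broadcastable" (not OCANNL's actual inference code), per-axis unification might look like:

```ocaml
(* Conceptual illustration only: how a per-axis least upper bound works when
   dimension 1 is broadcastable, and unresolved axes are closed to 1. *)
type dim = Unknown | Dim of int

let lub_dim a b =
  match (a, b) with
  | Unknown, d | d, Unknown -> d
  | Dim 1, d | d, Dim 1 -> d           (* 1 broadcasts to anything *)
  | Dim m, Dim n when m = n -> Dim m   (* equal dims unify *)
  | Dim m, Dim n -> failwith (Printf.sprintf "shape mismatch: %d vs %d" m n)

let () =
  (* After inference, a still-unknown axis would be closed to 1 (broadcastable). *)
  let close = function Unknown -> Dim 1 | d -> d in
  match close (lub_dim (Dim 1) (Dim 5)) with
  | Dim n -> Printf.printf "unified axis dim: %d\n" n
  | Unknown -> ()
```
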
+**Backend Anatomy**
+- **Frontend modules:** `Task`, `Ops`, `Ndarray`, `Tnode` (per‑device arrays, can be virtual), `Indexing`, `Assignments`, `Low_level`.
+- **Interfaces:** `Backend_intf` (records parametric in `'buffer_ptr`, `'dev`, `'runner`, `'event`); `Backend_impl` for implementations; `C_syntax` helpers.
+- **Implementations:** `Cc_backend`, `Gcc_backend`, `Cuda_backend`, `Metal_backend` plus `Schedulers` for CPU parallelism.
+- **Lifting:** `Backends.Add_device` + `Schedulers` → CPU backends; `Raise_backend` maps `Low_level` to `Assignments` and adds buffer retrieval + syncing.
+- **Lifecycle:** Compile routines in batches; link per‑stream context; free arrays with `Backends.finalize`.
+
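
The "records parametric in `'buffer_ptr`, `'dev`, `'runner`, `'event`" idea can be pictured roughly as follows; this is a conceptual sketch, not the actual `Backend_intf` definitions:

```ocaml
(* Conceptual sketch only (not Backend_intf): device-level state parameterized
   the way the "Interfaces" bullet above describes. *)
type ('buffer_ptr, 'dev, 'runner, 'event) stream = {
  device : 'dev;                        (* physical device the stream belongs to *)
  runner : 'runner;                     (* e.g. a CPU domain or a CUDA stream handle *)
  mutable last_writer : 'event option;  (* event used to preserve write-before-read order *)
}

type ('buffer_ptr, 'dev, 'runner, 'event) context = {
  stream : ('buffer_ptr, 'dev, 'runner, 'event) stream;
  arrays : (string * 'buffer_ptr) list; (* tensor-node name -> device buffer *)
}
```
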
+**Scheduling, Streams, Transfers**
+- **Streams:** Loose notion (CPU core/CUDA stream). Linking binds compiled code to a stream; scheduling preserves W→R order via events.
+- **Transfers:** `from_host`, `to_host`, `device_to_device` are scheduled like compute; destination waits non‑blocking on source.
+- **Merge buffers:** One per stream; use `.merge` in `%cd` (e.g., `[%cd p.grad =+ p.grad.merge]`). Modes: `Streaming_for` (source ptr, may fall back to `Copy` across devices) and `Copy` (physical buffer grown as needed).
+- **Auto host transfers:** If `automatic_host_transfers`:
+  - `Tnode.do_read/do_write` perform sync and schedule `to_host`/sync; fields: `prepare_read`, `prepare_write`, `devices_not_lagging_host`.
+  - `Raise_backend.sync_routine` pre‑schedules `from_host` for untagged inputs; `update_writer_event` tags writers and sets `to_host`.
+  - `Raise_backend.alloc_if_needed` schedules `from_host` for constants and tags device.
+
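
A hedged sketch built around the `[%cd p.grad =+ p.grad.merge]` form quoted above; the module paths and packaging it as a standalone helper are assumptions:

```ocaml
(* Hedged sketch around the [%cd p.grad =+ p.grad.merge] form quoted above;
   the module paths and using it as a standalone helper are assumptions. *)
open Ocannl
module NTDSL = Operation.NTDSL

(* Accumulate the gradient staged in p's merge buffer (filled by a
   device_to_device transfer from another stream) into this stream's p.grad. *)
let merge_grad p = [%cd p.grad =+ p.grad.merge]
```
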
+**Debugging & Tracing**
+- **Logs:** Enable tracing in config; `%cd` supports block comments to annotate generated files; debug prints/plots appear in logs.
+- **PPX tips:** Keep `%op` parameters non‑nested when labels matter; avoid capturing inner function params for labels.
+- **Shape issues:** Inspect `Tensor.shape` after `finish_inference`; watch for padding effects when dims are forced.
+- **Streams/merges:** Mismatch of expected vs. scheduled merge node is detected at scheduling; check `.merge` usage and stream contexts.
+- **Backend checks:** Start with `sync_cc` for clarity; move to `multicore_cc`/`cuda` once semantics are validated.
+
+**Adding Features (Guidelines)**
+- **New op:** Define in `arrayjit/lib/ops.ml` + `Ir.Ops`; add infix if needed; implement forward/backprop with `%cd` (use `~projections`).
+- **Tensor API:** Prefer small composable helpers in `lib/operation.ml`; mirror `%op` conveniences when useful.
+- **Shape rules:** Add constraints in `lib/shape.ml` and rows in `lib/row.ml`; ensure `propagate_shapes` derives intended LUBs; update `derive_projections` if new projection forms are added.
+- **Backend codegen:** Prefer `Low_level` lowering hooks; reuse `C_syntax`; keep kernel/routine boundaries stable for batching.
+- **Docs/tests:** Add `%expect` examples under `test/` showing shapes, projections, and generated code snippets.
+
+**Testing & Validation**
+- **Unit slices:** Run subsets like `dune runtest test/einsum` or `test/operations` to iterate quickly.
+- **Golden files:** Many tests diff emitted `.ll/.c/.cu/.metal`; update expected outputs only when the semantic change is intended.
+- **Backends in CI:** Use `OCANNL_BACKEND=sync_cc` locally first; selectively exercise `cuda`/`metal` if available.
+
+**Research Tips**
+- **Read paths:** `lib/operation.ml` (ops), `lib/tensor.ml` (graph), `lib/shape.ml`/`lib/row.ml` (inference), `arrayjit/lib/*backend*.ml` (runtimes), `arrayjit/lib/indexing.ml` (projections), `arrayjit/lib/low_level.ml` (loops).
+- **Compare designs:** Multi‑stream + merge buffers vs. typical single‑stream AD frameworks; generalized einsum for projections vs. manual loops.
+- **Trace small models:** Use `bin/micrograd_demo*.ml` and `bin/moons_demo*.ml` with `%cd` comments and higher log level to understand the pipeline.
+- **Experiment knobs:** Toggle `automatic_host_transfers`, switch backends, vary precision, inspect shapes before/after jitting.
