@@ -141,10 +141,11 @@ opam install cudajit # for CUDA backend
 - `c_syntax.ml` provides a functor with default C code generation patterns
 - `cc_backend.ml` uses defaults from `c_syntax.ml` with minimal overrides
 - `cuda_backend.ml` overrides more functions for CUDA-specific syntax (e.g., `__float2half`)
-- Both backends must provide `convert_precision` for type conversions
+- `metal_backend.ml` overrides using MSL-specific syntax
+- Backends must provide `convert_precision` for type conversions
 - Builtin functions (e.g., type conversions) must be implemented in:
   - `builtins.c` for C backends
-  - `builtins_cuda_small.ml` for CUDA backend
+  - `builtins_cuda.ml` for CUDA backend, `builtins_metal.ml` for Metal backend
 - When adding new precision types, ensure conversion functions exist in all backend builtins
 
 ### Syntax Extensions
@@ -166,7 +167,7 @@ opam install cudajit # for CUDA backend
 
 ## Common Development Tasks
 
-### Adding New Operations
+### Adding New Primitive Operations
 
 1. Add primitive operation to `arrayjit/lib/ops.ml`
 2. Implement interpretation in the same file
@@ -176,7 +177,7 @@ opam install cudajit # for CUDA backend
 ### Debugging Backend Discrepancies
 
 When outputs differ between backends:
-1. Compare runtime logs in `<backend>-<stream>-<stream>.log` files (might require minimizing test tensors)
+1. Compare runtime logs in `<backend>-<device>-<stream>.log` files (might require minimizing test tensors)
 2. Check generated code in `build_files/*.c` vs `*.cu` / `*.metal` for differences
 3. Common issues:
    - Incorrect type conversion in `convert_precision` overrides
@@ -208,5 +209,4 @@ When outputs differ between backends:
 
 - Virtual nodes are inlined automatically (controlled by `virtualize_max_visits`)
 - Scalar constants can be inlined via `inline_scalar_constexprs=true`
-- Memory sharing optimizations through cross-stream tensor nodes
-- Backend-specific optimization levels configurable per backend
+- Memory sharing optimizations through cross-stream tensor nodes