Commit 22d78d8

README cleanups
1 parent 10ea9c6

2 files changed (+13, -8 lines)

CLAUDE.md (5 additions & 4 deletions)

@@ -46,8 +46,8 @@ opam install cudajit # for CUDA backend
 - `nn_blocks.ml`: Basic neural network building blocks (transformers, attention, etc.)
 - `ocannl.ml`: Re-exports of all public modules for backward compatibility

-- `tensor/`: Framework internals (separate package `ocannl_tensor`)
-- `tensor.ml/mli`: Main tensor type and operations
+- `tensor/`: Framework internals (separate library `ocannl_tensor`)
+- `tensor.ml/mli`: Main tensor type and operation construction
 - `shape.ml/mli`: Shape inference system (see detailed docs there for einsum notation)
 - `operation.ml`: Tensor operations and DSL modules
 - `row.ml`: Row variables for shape inference
@@ -59,9 +59,10 @@ opam install cudajit # for CUDA backend
 - `backend_intf.ml`: Backend interface definitions
 - `assignments.ml`: High-level assignment-based IR
 - `low_level.ml`: Low-level for-loop based IR
-- `tnode.ml`: Tensor node representation
-- `indexing.ml`: Array indexing and projections
+- `tnode.ml`: Tensor node representation (partially user-facing)
+- `indexing.ml`: Array indexing and projections (partially user-facing)
 - `*_backend.ml`: Device-specific backend implementations
+- `context.ml`: runtime consistency for routines, user-facing interface

 - `test/`: Integration tests and tutorials
 - `bin/`: Command-line utilities
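
The `context.ml` entry and the "partially user-facing" annotations added here describe a layering where routines and tensor data are reached through an explicit context. The following toy model only illustrates that idea; every type and name in it is a hypothetical stand-in, not OCANNL's actual API:

```ocaml
(* Toy model of context-mediated access. All names here are hypothetical
   stand-ins, not OCANNL's API: a context pins a device and owns the
   buffers, so tensor data is only readable through it. *)
type device = Host | Cuda of int

type context = {
  device : device;
  buffers : (string, float array) Hashtbl.t;  (* tensor-node name -> data *)
}

(* Reading a tensor node's contents requires a context. *)
let get_exn ctx name = Hashtbl.find ctx.buffers name

let () =
  let ctx = { device = Host; buffers = Hashtbl.create 8 } in
  Hashtbl.add ctx.buffers "w" [| 0.1; 0.2 |];
  assert (get_exn ctx "w" = [| 0.1; 0.2 |])
```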

README.md (8 additions & 4 deletions)
@@ -69,6 +69,10 @@ This is very tentative.
 * Add convnet examples starting with MNIST.
 * Add a GPT-2 or Llama style example. Tokenization using llama.cpp extracted tokenizer.
 * **0.7: CPU-style performance and memory efficiency.**
+* Cleanup of deprecated streams functionality.
+* Migrating from the "hosted tensor" idea to always requiring a context when accessing tensors and dealing with devices directly.
+* Optimizations: loop invariant lifting and common subexpression elimination.
+* Universal Pool Allocator.
 * Milestone phrasing: Enhancements for: inlining-related and simplification-related optimizations, memory management, session management.
 * **0.7.1: HIP backend (AMD hardware) and WebGPU backend.**
 * **0.8: GPU-style performance -- low hanging fruit.**
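
One of the 0.7 items above, "loop invariant lifting and common subexpression elimination", names standard compiler passes over a for-loop IR. As a sketch of the first pass only, here is loop-invariant code motion on a toy IR; the types are stand-ins, not OCANNL's `low_level.ml` representation:

```ocaml
(* Loop-invariant code motion on a toy for-loop IR (illustrative only;
   these types are stand-ins, not OCANNL's low_level.ml). *)
type expr =
  | Const of float
  | Var of string                  (* loop index or scalar temporary *)
  | Binop of string * expr * expr  (* e.g. Binop ("*", a, b) *)

type stmt =
  | Assign of string * expr
  | For of string * int * stmt list  (* for index = 0 to bound - 1 *)

(* An expression is invariant w.r.t. loop index [i] if it never reads [i]. *)
let rec invariant i = function
  | Const _ -> true
  | Var v -> v <> i
  | Binop (_, a, b) -> invariant i a && invariant i b

(* Lift invariant assignments out of the loop body, assuming (for the
   sketch) each temporary is assigned once and only read afterwards. *)
let hoist = function
  | For (i, n, body) ->
      let inv, dep =
        List.partition
          (function Assign (_, e) -> invariant i e | For _ -> false)
          body
      in
      inv @ [ For (i, n, dep) ]
  | s -> [ s ]

(* "t" never reads "i", so it moves out of the loop; "x_i" stays inside. *)
let _ =
  hoist
    (For ("i", 16,
          [ Assign ("t", Binop ("*", Const 2., Var "c"));
            Assign ("x_i", Binop ("+", Var "t", Var "i")) ]))
```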
@@ -82,7 +86,7 @@ This is very tentative.
 * Add concatenation to the einsum syntax (an axis that is a concatenation of two axes each from another tensor); it's a generalization of stacking tensors.
 * **0.9: Optimize performance: program search.**
 * Instead of dynamic scheduling as in tinygrad, we can schedule statically by program search.
-* We should also reproduce the search that tinygrad is doing.
+* We should also reproduce the search that tinygrad is doing. Inspiration: Halide.
 * Check which optimizations are missing against the implementation of [llm.c](https://github.com/karpathy/llm.c).
 * Milestone phrasing: Program search with execution-based per-backend or aggregate-of-backends cost functions. Starting with augmenting the tiling and layout mechanisms from v0.8 with cost functions, progressing to a broader range of code graph rewriting rules.
 * **1.0: Few documentation gaps, some degree of feature completeness, ergonomics, safety.**
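
The first line of this hunk claims that concatenation generalizes stacking. A plain-OCaml illustration of the shape arithmetic (no OCANNL einsum syntax, since that extension is only planned):

```ocaml
(* "Concatenation generalizes stacking", shown with plain OCaml arrays;
   this is not OCANNL syntax, as the einsum extension is only planned. *)
let a = [| [| 1.; 2. |]; [| 3.; 4. |] |]  (* shape (2, 2) *)
let b = [| [| 5.; 6. |] |]                (* shape (1, 2) *)

(* Concatenating along axis 0 adds the axis sizes: 2 + 1 = 3. *)
let c = Array.append a b                  (* shape (3, 2) *)

(* Stacking vectors is the special case where each input contributes
   size 1 to the concatenated axis. *)
let stack v w = Array.append [| v |] [| w |]
let s = stack a.(0) b.(0)                 (* shape (2, 2) *)
let () = assert (Array.length c = 3 && Array.length s = 2)
```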
@@ -153,18 +157,18 @@ OCANNL follows different design choices than [OWL](https://ocaml.xyz/). For example:

 Although the project is called `ocannl`, the main package is called `neural_nets_lib`, to avoid the (opam linter's) complaint that the name can be confused with other packages. This also clarifies that `ocannl` is composed of `arrayjit` and `neural_nets_lib`.

-The dependency on `cudajit` and `metal` is optional, so you have to install them first to enable the CUDA or Apple Metal backends.
+The dependency on `cudajit` is optional, so you have to install it first to enable the CUDA backend. The dependency on `metal` is macOS-specific but automatic.

 ### Code Organization

 The codebase is organized to separate user-facing recipes from framework internals:

 - **`lib/`**: User-facing recipes and utilities
 - `train.ml` - Training utilities and optimizers
-- `nn_blocks.ml` - Neural network building blocks (transformers, attention, etc.)
+- `nn_blocks.ml` - Neural network building blocks (transformers, attention, convolution, etc.)
 - `ocannl.ml` - Re-exports for backward compatibility

-- **`tensor/`**: Framework internals (separate package `ocannl_tensor`)
+- **`tensor/`**: Framework internals (separate library `ocannl_tensor`)
 - `tensor.ml/mli` - Core tensor type and operations
 - `shape.ml/mli` - Shape inference system
 - `operation.ml` - Tensor operations and DSL modules
