Commit 22d78d8

README cleanups
1 parent 10ea9c6

2 files changed (+13, -8 lines)

CLAUDE.md (5 additions & 4 deletions)

@@ -46,8 +46,8 @@ opam install cudajit # for CUDA backend
 - `nn_blocks.ml`: Basic neural network building blocks (transformers, attention, etc.)
 - `ocannl.ml`: Re-exports of all public modules for backward compatibility

-- `tensor/`: Framework internals (separate package `ocannl_tensor`)
-- `tensor.ml/mli`: Main tensor type and operations
+- `tensor/`: Framework internals (separate library `ocannl_tensor`)
+- `tensor.ml/mli`: Main tensor type and operation construction
 - `shape.ml/mli`: Shape inference system (see detailed docs there for einsum notation)
 - `operation.ml`: Tensor operations and DSL modules
 - `row.ml`: Row variables for shape inference
@@ -59,9 +59,10 @@ opam install cudajit # for CUDA backend
 - `backend_intf.ml`: Backend interface definitions
 - `assignments.ml`: High-level assignment-based IR
 - `low_level.ml`: Low-level for-loop based IR
-- `tnode.ml`: Tensor node representation
-- `indexing.ml`: Array indexing and projections
+- `tnode.ml`: Tensor node representation (partially user-facing)
+- `indexing.ml`: Array indexing and projections (partially user-facing)
 - `*_backend.ml`: Device-specific backend implementations
+- `context.ml`: runtime consistency for routines, user-facing interface

 - `test/`: Integration tests and tutorials
 - `bin/`: Command-line utilities
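
The `context.ml` entry and the "partially user-facing" annotations added here describe a layering where routines and tensor data are reached through an explicit context. The following toy model only illustrates that idea; every type and name in it is a hypothetical stand-in, not OCANNL's actual API:

```ocaml
(* Toy model of context-mediated access. All names here are hypothetical
   stand-ins, not OCANNL's API: a context pins a device and owns the
   buffers, so tensor data is only readable through it. *)
type device = Host | Cuda of int

type context = {
  device : device;
  buffers : (string, float array) Hashtbl.t;  (* tensor-node name -> data *)
}

(* Reading a tensor node's contents requires a context. *)
let get_exn ctx name = Hashtbl.find ctx.buffers name

let () =
  let ctx = { device = Host; buffers = Hashtbl.create 8 } in
  Hashtbl.add ctx.buffers "w" [| 0.1; 0.2 |];
  assert (get_exn ctx "w" = [| 0.1; 0.2 |])
```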

README.md (8 additions & 4 deletions)
@@ -69,6 +69,10 @@ This is very tentative.
 * Add convnet examples starting with MNIST.
 * Add a GPT-2 or Llama style example. Tokenization using llama.cpp extracted tokenizer.
 * **0.7: CPU-style performance and memory efficiency.**
+* Cleanup of deprecated streams functionality.
+* Migrating from the "hosted tensor" idea to always requiring a context when accessing tensors and dealing with devices directly.
+* Optimizations: loop invariant lifting and common subexpression elimination.
+* Universal Pool Allocator.
 * Milestone phrasing: Enhancements for: inlining-related and simplification-related optimizations, memory management, session management.
 * **0.7.1: HIP backend (AMD hardware) and WebGPU backend.**
 * **0.8: GPU-style performance -- low hanging fruit.**
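
One of the 0.7 items above, "loop invariant lifting and common subexpression elimination", names standard compiler passes over a for-loop IR. As a sketch of the first pass only, here is loop-invariant code motion on a toy IR; the types are stand-ins, not OCANNL's `low_level.ml` representation:

```ocaml
(* Loop-invariant code motion on a toy for-loop IR (illustrative only;
   these types are stand-ins, not OCANNL's low_level.ml). *)
type expr =
  | Const of float
  | Var of string                  (* loop index or scalar temporary *)
  | Binop of string * expr * expr  (* e.g. Binop ("*", a, b) *)

type stmt =
  | Assign of string * expr
  | For of string * int * stmt list  (* for index = 0 to bound - 1 *)

(* An expression is invariant w.r.t. loop index [i] if it never reads [i]. *)
let rec invariant i = function
  | Const _ -> true
  | Var v -> v <> i
  | Binop (_, a, b) -> invariant i a && invariant i b

(* Lift invariant assignments out of the loop body, assuming (for the
   sketch) each temporary is assigned once and only read afterwards. *)
let hoist = function
  | For (i, n, body) ->
      let inv, dep =
        List.partition
          (function Assign (_, e) -> invariant i e | For _ -> false)
          body
      in
      inv @ [ For (i, n, dep) ]
  | s -> [ s ]

(* "t" never reads "i", so it moves out of the loop; "x_i" stays inside. *)
let _ =
  hoist
    (For ("i", 16,
          [ Assign ("t", Binop ("*", Const 2., Var "c"));
            Assign ("x_i", Binop ("+", Var "t", Var "i")) ]))
```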
@@ -82,7 +86,7 @@ This is very tentative.
 * Add concatenation to the einsum syntax (an axis that is a concatenation of two axes each from another tensor); it's a generalization of stacking tensors.
 * **0.9: Optimize performance: program search.**
 * Instead of dynamic scheduling as in tinygrad, we can schedule statically by program search.
-* We should also reproduce the search that tinygrad is doing.
+* We should also reproduce the search that tinygrad is doing. Inspiration: Halide.
 * Check which optimizations are missing against the implementation of [llm.c](https://github.com/karpathy/llm.c).
 * Milestone phrasing: Program search with execution-based per-backend or aggregate-of-backends cost functions. Starting with augmenting the tiling and layout mechanisms from v0.8 with cost functions, progressing to a broader range of code graph rewriting rules.
 * **1.0: Few documentation gaps, some degree of feature completeness, ergonomics, safety.**
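
The first line of this hunk claims that concatenation generalizes stacking. A plain-OCaml illustration of the shape arithmetic (no OCANNL einsum syntax, since that extension is only planned):

```ocaml
(* "Concatenation generalizes stacking", shown with plain OCaml arrays;
   this is not OCANNL syntax, as the einsum extension is only planned. *)
let a = [| [| 1.; 2. |]; [| 3.; 4. |] |]  (* shape (2, 2) *)
let b = [| [| 5.; 6. |] |]                (* shape (1, 2) *)

(* Concatenating along axis 0 adds the axis sizes: 2 + 1 = 3. *)
let c = Array.append a b                  (* shape (3, 2) *)

(* Stacking vectors is the special case where each input contributes
   size 1 to the concatenated axis. *)
let stack v w = Array.append [| v |] [| w |]
let s = stack a.(0) b.(0)                 (* shape (2, 2) *)
let () = assert (Array.length c = 3 && Array.length s = 2)
```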
@@ -153,18 +157,18 @@ OCANNL follows different design choices than [OWL](https://ocaml.xyz/). For example:

 Although the project is called `ocannl`, the main package is called `neural_nets_lib`, to avoid the (opam linter's) complaint that the name can be confused with other packages. This also clarifies that `ocannl` is composed of `arrayjit` and `neural_nets_lib`.

-The dependency on `cudajit` and `metal` is optional, so you have to install them first to enable the CUDA or Apple Metal backends.
+The dependency on `cudajit` is optional, so you have to install it first to enable the CUDA backend. The dependency on `metal` is macOS-specific but automatic.

 ### Code Organization

 The codebase is organized to separate user-facing recipes from framework internals:

 - **`lib/`**: User-facing recipes and utilities
 - `train.ml` - Training utilities and optimizers
-- `nn_blocks.ml` - Neural network building blocks (transformers, attention, etc.)
+- `nn_blocks.ml` - Neural network building blocks (transformers, attention, convolution, etc.)
 - `ocannl.ml` - Re-exports for backward compatibility

-- **`tensor/`**: Framework internals (separate package `ocannl_tensor`)
+- **`tensor/`**: Framework internals (separate library `ocannl_tensor`)
 - `tensor.ml/mli` - Core tensor type and operations
 - `shape.ml/mli` - Shape inference system
 - `operation.ml` - Tensor operations and DSL modules
