README.md: 17 additions & 15 deletions
@@ -1,7 +1,5 @@
 # ocannl
 
-NOTE TO POTENTIAL CONTRIBUTORS: reach out so I can adjust my work style -- start using branches for refactoring. Otherwise you face frustration as the code might be broken. Tagged versions of the code are guaranteed to work as well as the given stage of the project permitted.
-
 OCANNL is sponsored by [Ahrefs](https://ocaml.org/success-stories/peta-byte-scale-web-crawler)! [Visit the Ahrefs website.](https://ahrefs.com/)
 
 ## OCANNL -- OCaml Compiles Algorithms for Neural Networks Learning
@@ -66,11 +64,13 @@ NOTE: debug logging from CUDA in complex settings is a bit tricky, it involves a
 
 This is very tentative.
 
-* 0.6: more precisions, convolution, block tensors, improvements to dimension labels.
-  * DONE at head: BF16, FP8.
-  * Requires extending expressivity of projections and the generalized einsum notation.
-  * Then, we can add convnet building blocks and corresponding examples starting with MNIST.
-  * Verify or rethink usefulness of dimension labels, and whether to introduce axis labels.
+* 0.6: more precisions, initialization, counter-based randomness, convolution, block tensors, improvements to dimension labels.
+  * DONE: BF16, FP8.
+  * DONE: Extended expressivity of projections and the generalized einsum notation to cover strided iteration and convolution.
+  * DONE: Parameter initialization on devices.
+  * TODO: counter-based randomness via threefry.
+  * TODO: Add convnet building blocks and corresponding examples starting with MNIST.
+  * TODO: Verify or rethink usefulness of dimension labels, and whether to introduce axis labels.
 * 0.7: Replicate the scaffolding from [llm.c](https://github.com/karpathy/llm.c) for training GPT-2.
   * Useful building blocks for models in [lib/nn_blocks.ml](lib/nn_blocks.ml).
   * A language model example.
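A note on the "strided iteration and convolution" item above: in a generalized einsum, convolution becomes expressible once an input axis may be indexed by an affine combination of an output axis and a kernel axis. A generic 1-D strided convolution illustrates the idea; the symbols below are generic, not OCANNL's concrete einsum syntax:

$$
y_{n,\,o,\,c_{\text{out}}} \;=\; \sum_{k}\ \sum_{c_{\text{in}}} x_{n,\ s\cdot o + k,\ c_{\text{in}}} \cdot w_{k,\ c_{\text{in}},\ c_{\text{out}}}
$$

Here the stride $s$ and the kernel offset $k$ determine the input position $s\cdot o + k$, which is the kind of affine projection such an extension needs to express.

The "counter-based randomness" TODO refers to RNGs where every random number is a pure function of a key and a counter, so no mutable generator state has to live on, or be synchronized across, devices. Below is a minimal OCaml sketch of the idea using a SplitMix64-style bit mixer; it illustrates the stateless, counter-based principle only, and is neither the Threefry algorithm nor OCANNL code:

```ocaml
(* Counter-based RNG sketch: the output depends only on (key, counter).
   The mixer is the SplitMix64 finalizer, used purely as an illustration;
   Threefry uses different (Threefish-derived) rounds. *)
let mix64 (x : int64) : int64 =
  let open Int64 in
  let x = mul (logxor x (shift_right_logical x 30)) 0xBF58476D1CE4E5B9L in
  let x = mul (logxor x (shift_right_logical x 27)) 0x94D049BB133111EBL in
  logxor x (shift_right_logical x 31)

(* Uniform float in [0, 1): the top 53 bits of the mixed word, scaled by 2^-53. *)
let uniform ~key ~counter =
  let bits = mix64 Int64.(add key (mul 0x9E3779B97F4A7C15L counter)) in
  Int64.to_float (Int64.shift_right_logical bits 11) /. 9007199254740992.0

let () =
  (* Reproducible: the same (key, counter) yields the same value on any
     device and in any order of evaluation. *)
  Printf.printf "%.6f %.6f\n"
    (uniform ~key:42L ~counter:0L)
    (uniform ~key:42L ~counter:1L)
```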
@@ -81,16 +81,14 @@ This is very tentative.
   * Then harvested from [How to Optimize a CUDA Matmul Kernel for cuBLAS-like Performance: a Worklog](https://siboehm.com/articles/22/CUDA-MMM).
   * Finally from [llm.c](https://github.com/karpathy/llm.c).
   * These will require splitting a routine into multiple CUDA kernels.
-* 0.9: A new abstraction layer automating compilation/linking, execution, and some data transfers.
-  * E.g. host-device transfers: copy from host if host update is later than the previous device update.
-  * Concise syntax for transfers into the merge buffer since we know which tensor node is transferred and where to.
-  * At the end of 0.8.x, OCANNL has a REPL.
-* 0.10: Optimize performance: program search.
+* 0.9: Optimize performance: program search.
   * Instead of dynamic scheduling as in tinygrad, we can schedule statically by program search.
   * We should also reproduce the search that tinygrad is doing.
   * Check which optimizations are missing against the implementation of [llm.c](https://github.com/karpathy/llm.c).
-* 1.0: Few documentation gaps, some degree of feature completeness.
+* 1.0: Few documentation gaps, some degree of feature completeness, ergonomics, safety.
   * Feature completeness demonstrated by resolving / implementing a few of the $\color{green}{\text{explore}}$ issues.
+  * Concise syntax for transfers into the merge buffer since we know which tensor node is transferred and where to.
+  * Similarly to how contexts track initialization dependencies for compilation, we should also track them for execution.
 
 ### Releases
 
@@ -130,7 +128,7 @@ For more details, see [CHANGES](CHANGES.md).
 
 OCANNL follows different design choices than [OWL](https://ocaml.xyz/). For example:
 
-* OCANNL is not functorized.
+* OCANNL is not functorized, except that it uses first-class modules for backends.
 * OCANNL has fewer abstraction layers.
 * OCANNL has a more powerful shape inference.
 * OCANNL only supports backpropagation, while OWL supports full forward and backward auto-diff.
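The "first-class modules for backends" bullet above means that a backend is selected and passed around as an ordinary value rather than baked in through a functor application. A minimal, self-contained sketch of the pattern follows; the module names and functions are illustrative, not OCANNL's actual backend API:

```ocaml
(* First-class modules: the backend is a value of type (module Backend),
   chosen at runtime instead of via functorization of the whole library. *)
module type Backend = sig
  val name : string
  val run : string -> unit
end

module Cpu : Backend = struct
  let name = "cpu"
  let run src = Printf.printf "[%s] running %s\n" name src
end

module Cuda : Backend = struct
  let name = "cuda"
  let run src = Printf.printf "[%s] running %s\n" name src
end

(* Pick a backend from a configuration string. *)
let select_backend (which : string) : (module Backend) =
  match which with "cuda" -> (module Cuda) | _ -> (module Cpu)

let () =
  (* Unpack the chosen backend locally and use it like any module. *)
  let module B = (val select_backend "cuda" : Backend) in
  B.run "forward_pass"
```

Compared to functorizing the library over a backend, this keeps user code monomorphic and lets the backend be chosen from configuration at program startup.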
@@ -148,4 +146,8 @@ OCANNL follows different design choices than [OWL](https://ocaml.xyz/). For exam
 
 Although the project is called `ocannl`, the main package is called `neural_nets_lib`, to avoid the (opam linter's) complaint that the name can be confused with other packages. This also clarifies that `ocannl` is composed of `arrayjit` and `neural_nets_lib`.
 
-The dependency on `ocaml-cudajit` is optional, so you have to install it first to enable the Cuda backend.
+The dependencies on `cudajit` and `metal` are optional, so you have to install them first to enable the CUDA and Apple Metal backends, respectively.
+
+## Development
+
+NOTE TO POTENTIAL CONTRIBUTORS: while I am starting to work with PRs in separate branches rather than just a stream of commits on the main branch, design migrations will be broken into small PRs to avoid staleness of the main (master) branch. We allow failing tests on the main branch, although going forward this should happen less often for unit tests. Tagged, i.e. released, versions of the code are guaranteed to work as well as the given stage of the project permitted; the policy is that all tests must pass for releases.