This is very tentative.

* DONE: BF16, FP8.
* DONE: Extended expressivity of projections and the generalized einsum notation to cover strided iteration and convolution (see the index-arithmetic sketch after this list).
* DONE: Parameter initialization on devices.
* TODO: New syntax for inline parameter definitions; record-based syntax instead of string-based.
* TODO: Counter-based randomness via Threefry (a reference sketch follows this list).
* 0.6.1:
  * Add convnet building blocks and corresponding examples starting with MNIST.
  * Add transformer building blocks.
  * Integrate with huggingface-tokenizers.
  * Add a GPT-2 style example, ideally benchmarkable against [llm.c](https://github.com/karpathy/llm.c).
* 0.6.2:
  * Verify or rethink usefulness of dimension labels, and whether to introduce axis labels.
  * Add concatenation to the einsum syntax (an axis that is a concatenation of two axes, each from another tensor); it's a generalization of stacking tensors (see the indexing sketch after this list).
* 0.7: Optimize performance -- low-hanging fruit.
  * First harvested from [Fast Multidimensional Matrix Multiplication on CPU from Scratch](https://siboehm.com/articles/22/Fast-MMM-on-CPU).
  * Then harvested from [How to Optimize a CUDA Matmul Kernel for cuBLAS-like Performance: a Worklog](https://siboehm.com/articles/22/CUDA-MMM).
  * Finally from [llm.c](https://github.com/karpathy/llm.c).
  * These will either require splitting a routine into multiple kernels, or implementing the megakernel approach.
* 0.8: Optimize performance: program search.
  * Instead of dynamic scheduling as in tinygrad, we can schedule statically by program search.
  * We should also reproduce the search that tinygrad is doing.
  * Check which optimizations are missing against the implementation of [llm.c](https://github.com/karpathy/llm.c).
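For intuition on the generalized einsum item above: strided iteration and convolution differ from a plain contraction only in that an input axis is addressed by an affine index expression rather than a single iterator. Below is a minimal plain-OCaml sketch of that index arithmetic; it is illustrative only, not OCANNL's einsum notation or API, and `conv1d` is a hypothetical helper name.

```ocaml
(* 1-D "valid" convolution as an einsum-like contraction: the input axis is
   indexed by the affine expression [stride * i + k] instead of one iterator. *)
let conv1d ~stride input kernel =
  let n = Array.length input and m = Array.length kernel in
  (* Largest valid [i] satisfies [stride * i + m - 1 <= n - 1]. *)
  let out_len = ((n - m) / stride) + 1 in
  Array.init out_len (fun i ->
      let acc = ref 0.0 in
      for k = 0 to m - 1 do
        acc := !acc +. (kernel.(k) *. input.((stride * i) + k))
      done;
      !acc)

let () =
  (* Prints [3 7]: windows (1., 2.) and (3., 4.) taken with stride 2. *)
  conv1d ~stride:2 [| 1.; 2.; 3.; 4.; 5. |] [| 1.; 1. |]
  |> Array.iter (Printf.printf "%g ")
```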
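The Threefry TODO is about counter-based RNGs in the style of the Random123 paper (Salmon et al., "Parallel Random Numbers: As Easy as 1, 2, 3", SC 2011): a pure function from a key and a counter to random bits, so each tensor cell can derive its randomness from its own index with no sequential state to thread through a parallel computation. Below is a minimal plain-OCaml reference sketch of Threefry-2x32 with 20 rounds, transcribed from the published algorithm; it is illustrative only and not OCANNL's implementation.

```ocaml
(* Rotate a 32-bit word left by [n] bits (1 <= n <= 31). *)
let rotl32 x n =
  Int32.logor (Int32.shift_left x n) (Int32.shift_right_logical x (32 - n))

(* Rotation constants for Threefry-2x32 from the Random123 paper. *)
let rotations = [| 13; 15; 26; 6; 17; 29; 16; 24 |]

(* Pure function from (key, counter) to two pseudo-random 32-bit words. *)
let threefry2x32 (k0, k1) (c0, c1) =
  (* Extended key: the third word is the Skein parity constant XORed with the key. *)
  let ks = [| k0; k1; Int32.logxor (Int32.logxor 0x1BD11BDAl k0) k1 |] in
  let x0 = ref (Int32.add c0 k0) and x1 = ref (Int32.add c1 k1) in
  for r = 0 to 19 do
    x0 := Int32.add !x0 !x1;
    x1 := Int32.logxor (rotl32 !x1 rotations.(r mod 8)) !x0;
    if r mod 4 = 3 then begin
      (* Key injection after every fourth round. *)
      let j = (r / 4) + 1 in
      x0 := Int32.add !x0 ks.(j mod 3);
      x1 := Int32.add !x1 (Int32.add ks.((j + 1) mod 3) (Int32.of_int j))
    end
  done;
  (!x0, !x1)

let () =
  (* Consecutive counters yield independent random words; no state is threaded. *)
  let r0, r1 = threefry2x32 (42l, 7l) (0l, 0l) in
  let s0, s1 = threefry2x32 (42l, 7l) (1l, 0l) in
  Printf.printf "%08lx %08lx  %08lx %08lx\n" r0 r1 s0 s1
```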
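And for the einsum-concatenation item: the intended semantics can be pictured as an output axis of size n1 + n2 whose indices below n1 read from one tensor and whose remaining indices read from the other; stacking is the special case where each input contributes a size-1 axis. Here is a plain-OCaml sketch of that indexing rule on 1-D arrays; it is illustrative only, not the planned syntax, and `concat_axis` is a hypothetical name.

```ocaml
(* An output index [i] on the concatenated axis dispatches to [a] when
   [i < n1], and to [b] at offset [i - n1] otherwise. *)
let concat_axis a b =
  let n1 = Array.length a in
  Array.init (n1 + Array.length b) (fun i ->
      if i < n1 then a.(i) else b.(i - n1))

let () =
  (* Prints [1 2 3 4 5]. *)
  concat_axis [| 1.; 2. |] [| 3.; 4.; 5. |]
  |> Array.iter (Printf.printf "%g ")
```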
The dependency on `cudajit` and `metal` is optional, so you have to install them.

## Development
NOTE TO POTENTIAL CONTRIBUTORS: while I am slowly starting to work with PRs in separate branches rather than just a stream of commits on the main branch, design migrations will be broken into small PRs to avoid main (master) branch staleness, and many changes will still land as commits on the main branch. We allow failing tests on the main branch, although going forward this should happen less often for unit tests. Tagged, i.e. released, versions of the code are guaranteed to work as well as the given stage of the project permitted; the policy is that all tests must pass for releases.