NOTE: debug logging from CUDA in complex settings is a bit tricky, it involves a
This is very tentative.
* **0.6.2: "you forgot to specify a hidden dimension".**
* Detection of user errors where there is missing information about a hidden dimension: disables guessing "no axes" or "dimension 1" for shapes of parameters.
* RoPE embeddings.
* Transformer for the Names dataset.
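To make the RoPE item concrete: rotary position embeddings rotate consecutive feature pairs by position-dependent angles, so that attention scores end up depending only on relative positions. A minimal pure-Python sketch, not OCANNL's API (all names here are illustrative):

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotate each consecutive feature pair of `vec` by an angle that
    depends on the token position `pos` and the pair's frequency."""
    d = len(vec)
    assert d % 2 == 0, "RoPE needs an even feature dimension"
    out = [0.0] * d
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out[i] = x * c - y * s
        out[i + 1] = x * s + y * c
    return out
```

The defining property is that the inner product of two rotated vectors depends only on the difference of their positions, not on the absolute positions themselves.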
* **0.6.3: Padding inference for convolutions; HIP backend.**
* Padding inference during shape inference.
* A standalone bindings package for HIP (AMD hardware), and a backend using the bindings.
* Sokoban RL policy gradient example with a CNN.
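For the padding-inference item above, one convention such inference could default to is "same" padding, derived by inverting the convolution output-size formula `out = (in + pad_total - kernel) // stride + 1`. A sketch under that assumption (`same_padding` is a hypothetical helper, not OCANNL's actual inference rule):

```python
def same_padding(in_size, kernel, stride=1):
    """Smallest total padding such that out_size == ceil(in_size / stride),
    obtained by inverting out = (in + pad_total - kernel) // stride + 1."""
    out_size = -(-in_size // stride)  # ceil division
    pad_total = max(0, (out_size - 1) * stride + kernel - in_size)
    # Split asymmetrically when pad_total is odd (extra cell on the right).
    return pad_total // 2, pad_total - pad_total // 2
```

For example, a 3x3 kernel with stride 1 over a 28-wide input needs `(1, 1)` padding to keep the width at 28.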
* **0.6.4: Real world examples.**
* Add convnet examples: MNIST and CIFAR.
* Bindings to a tokenizer (e.g. _llama.cpp_).
* Transformer inference for a small open-weights model (one of GPT2, LLaMA, Gemma).
* **0.7: CPU-style performance and memory efficiency.**
* Cleanup of deprecated streams functionality.
* Migrating from the "hosted tensor" idea to always requiring a context when accessing tensors and dealing with devices directly.
* Optimizations: loop invariant lifting and common subexpression elimination.
  * First harvested from [Fast Multidimensional Matrix Multiplication on CPU from Scratch](https://siboehm.com/articles/22/Fast-MMM-on-CPU).
  * Then harvested from [How to Optimize a CUDA Matmul Kernel for cuBLAS-like Performance: a Worklog](https://siboehm.com/articles/22/CUDA-MMM).
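To illustrate the two planned optimizations: in the naive version below, `s[i] * s[i]` is both a common subexpression and invariant with respect to the inner loop, so it can be computed once per row. This is an illustrative before/after in plain Python, unrelated to OCANNL's internal representation:

```python
def scale_rows_naive(m, s):
    # s[i] * s[i] is recomputed for every j, even though it never
    # changes inside the j loop.
    return [[m[i][j] * (s[i] * s[i]) for j in range(len(m[i]))]
            for i in range(len(m))]

def scale_rows_optimized(m, s):
    out = []
    for i in range(len(m)):
        si2 = s[i] * s[i]  # common subexpression, hoisted out of the inner loop
        out.append([x * si2 for x in m[i]])
    return out
```

Both versions compute the same result; the optimized one does one multiply per row instead of one per cell for the invariant factor.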
* **0.8.1: shape understanding and manipulation enhancements.**
* Verify or rethink the usefulness of dimension labels (a.k.a. dimension units), and whether to introduce axis labels.
* Add concatenation to the einsum syntax (an axis that is a concatenation of two axes each from another tensor); it's a generalization of stacking tensors.
* An academic-style paper e.g. for the OCaml Workshop.
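The concatenation-axis idea above can be illustrated without any einsum syntax: stacking is the special case of concatenation where each input first gains a fresh unit axis. A toy sketch over nested Python lists (not the proposed syntax, just the axis arithmetic):

```python
def concat(xs, ys):
    """Concatenate two tensors along the leading axis: the result's
    leading axis has size len(xs) + len(ys)."""
    return list(xs) + list(ys)

def stack(xs, ys):
    """Stacking = concatenation after each input gains a unit leading axis."""
    return concat([xs], [ys])
```

So a concatenation axis of sizes `m + n` degenerates to a stacking axis of size `1 + 1` when both contributed axes are unit-sized.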
* **0.9: Optimize performance: program search.**
* Instead of dynamic scheduling as in tinygrad, we can schedule statically by program search.
* We should also reproduce the search that tinygrad is doing. Inspiration: Halide.
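In its simplest form, static scheduling by program search just enumerates candidate schedules offline and scores them with a cost model, keeping the best. A toy sketch with a made-up cost model (nothing here reflects tinygrad's, Halide's, or OCANNL's actual search):

```python
def search_tile(n, cache_limit):
    """Pick a tile size for an n-element loop by exhaustive search:
    penalize leftover partial tiles and tiles exceeding the cache,
    preferring larger tiles otherwise (a deliberately toy cost model)."""
    def cost(tile):
        remainder = (-n) % tile            # iterations in the final partial tile
        overflow = max(0, tile - cache_limit)
        return remainder + 10 * overflow - tile
    return min(range(1, n + 1), key=cost)
```

Real searches replace the toy model with measured runtimes or a learned predictor, and search over a much richer space (tilings, loop orders, vectorization), but the static enumerate-and-score structure is the same.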