@@ -64,7 +64,6 @@ IMPORTANT: due to potential bugs, debug logging from CUDA in complex settings cu
 
 This is very tentative.
 
- * 0.5.1: Automatic synchronization for transfers between host and devices where unambiguous.
 * 0.5.2: Apple Metal backend.
 * 0.6: Replicate the scaffolding from [llm.c](https://github.com/karpathy/llm.c) for training GPT-2.
   * More of primitive numeric operations.
@@ -96,11 +95,13 @@ This is very tentative.
 
 For more details, see [CHANGES](CHANGES.md).
 
- * **0.5: Stream-to-stream synchronization at the buffer level.**
-   * Support for CUDA events, and `Condition`-based events for CPU backends.
-   * Overhaul of the backend interfaces, both user-facing but especially internal: full code sharing.
-   * Automatic stream-to-stream synchronization on a per-tensor-node basis.
- * **0.4.1 Half precision, mixed precision, CUDA virtual devices** (virtual devices renamed to streams in 0.4.2)
+ * **0.5: Synchronization and automation at the buffer level.**
+   * **0.5.1: Automatic synchronization for transfers between host and devices.**
+   * **0.5.0: Stream-to-stream synchronization at the buffer level.**
+     * Support for CUDA events, and `Condition`-based events for CPU backends.
+     * Overhaul of the backend interfaces, both user-facing but especially internal: full code sharing.
+     * Automatic stream-to-stream synchronization on a per-tensor-node basis.
+ * **0.4.1 Half precision, mixed precision, CUDA virtual devices** (virtual devices renamed to streams in 0.5.0)
   * Half precision. Maybe improvements for mixed-precision computations.
   * Resolve remaining issues with the new scheduler.
   * Initial version of [lib/nn_blocks.ml](lib/nn_blocks.ml).