1 | | -## [0.4.1] -- next |
2 | | - |
3 | | -### Added |
4 | | - |
5 | | -- TODO: API improvements for mixed precision computations. |
6 | | - |
7 | | -### Fixed |
8 | | - |
9 | | -- TODO: Proper implementation of half precision. Requires OCaml 5.2. |
10 | | - |
11 | | -## [0.4.0] -- 2024-07-?? |
| 1 | +## [0.4.0] -- 2024-09-04 |
12 | 2 |
|
13 | 3 | ### Added |
14 | 4 |
|
|
17 | 7 | - backends just need to support device-to-device transfers, |
18 | 8 | - merging gets implemented in "user space". |
19 | 9 | - CUDA streaming multiprocessor parallelism via streams <-> virtual devices. |
20 | | -- TODO(#262): "term punning" for `%cd`. |
| 10 | +- Support for `cuda-gdb` and `compute-sanitizer` (pass the right arguments to cudajit). |
| 11 | +- Inline declarations for (non-differentiable) tensors in the `%cd` syntax. |
| 12 | +- A minimal wrapper `Sync_backend` creating CPU backends with a single device only, where all calls are synchronous. (It's a baseline and helps debugging.) |
| 13 | +- In progress: proper (condition variables based) scheduler. The legacy scheduler (pipes based) kept for now as baseline and to help debugging. |
| 14 | +- Documentation for the syntax extensions. |
| 15 | +- `%op` syntax: when under a `~config` parameter, refine the inline declared params' labels with `config.label`. |
| 16 | +- `%op` syntax: incorporate the input tensor's (if any) label in the resulting tensor's label. |
| 17 | +- Comments in config files using the line prefix `~~`. |
21 | 18 |
|
22 | 19 | ### Changed |
23 | 20 |
|
|
31 | 28 | - split the `device` type into virtual `device` and `physical_device`, |
32 | 29 | - removed the direct support for `merge`, instead relying on merge buffers. |
33 | 30 | - Updated to cudajit 0.4. |
34 | | -- TODO: a template for C-syntax backends, refactoring CC and CUDA backends. |
| 31 | +- A template for C-syntax backends, refactoring CC and CUDA backends. |
| 32 | +- Improvements to handling of tensor node labels, and to the `Tnode.debug_name` function. |
| 33 | +- Output files generated by backends, and files generated by logging, in separate subdirectories. |
| 34 | +- C-syntax logging: also output the pre-assignment value when logging an assignment. |
| 35 | +- Migrated to ppx_minidebug 2.0 with the benefits it brings: no runtime passing, `Utils.settings.log_level` unified with ppx_minidebug's log levels. |
| 36 | + |
| 37 | +### Fixed |
| 38 | + |
| 39 | +- Allow verifying that non-embedded tensor nodes of the tensor(s) associated with a linked code are already in the context passed to `link` (resp. `link_batch`), since they won't get introduced into the context. It is the responsibility of helper functions (such as those in `Train`) to ensure the check. |
| 40 | +- Fixed both known and newly discovered shortcomings of the syntax extensions. |
| 41 | +- In particular, `%op` syntax: lift `~config` applications out of (tensor) functions. |
| 42 | +- Multiple other tiny fixes. |
35 | 43 |
|
36 | 44 | ## [0.3.3] -- 2024-04-24 |
37 | 45 |
|
|