feat(cuda): add attention forward and backward kernel declarations #64

Merged: Eamon2009 merged 5 commits into master from codeaddict-master on May 31, 2026

Conversation

@Eamon2009
Owner

Summary

Adds a CUDA header (guarded by #pragma once) that declares the forward and backward passes of the core attention mechanism in the quadtrix::cuda namespace. This defines the interface for the upcoming GPU kernel implementations.

Key Additions

  • attention_forward: Computes the attention mechanism given a combined QKV tensor (input_qkv), storing intermediate states in preatt and att, and writing the final result to output.

  • attention_backward: Computes the gradients grad_input_qkv, grad_preatt, and grad_att from the incoming grad_output.

  • Configuration Flexibility: Both functions accept an explicit number of attention heads (num_heads) and an optional cudaStream_t for non-blocking asynchronous execution.

  • Return Types: Both functions return the internal Status type for unified error handling.
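
For reviewers who want a concrete picture, the interface described above might look roughly like the sketch below. It is not the header from this PR: the parameter order, the tensor layouts in the comments, the fields of `Status`, and the `check_head_partition` helper are all assumptions made for illustration. `cudaStream_t` is replaced by a stand-in so the snippet compiles without the CUDA toolkit.

```cpp
// Hypothetical sketch of the declared interface; not the PR's actual header.
// (A real header would begin with `#pragma once`.)
#include <string>

// Stand-in so the sketch compiles without the CUDA toolkit; real code would
// get cudaStream_t from <cuda_runtime.h>.
struct CUstream_st;
using cudaStream_t = CUstream_st*;

namespace quadtrix {
namespace cuda {

// Assumed shape of the internal Status type: a success flag plus a message.
struct Status {
    bool ok;
    std::string message;
};

// Assumed layouts (not confirmed by the PR):
//   input_qkv:   [batch, seq_len, 3 * channels]
//   preatt, att: [batch, num_heads, seq_len, seq_len]
//   output:      [batch, seq_len, channels]
Status attention_forward(float* output, float* preatt, float* att,
                         const float* input_qkv,
                         int batch, int seq_len, int channels, int num_heads,
                         cudaStream_t stream = nullptr);

Status attention_backward(float* grad_input_qkv, float* grad_preatt,
                          float* grad_att, const float* grad_output,
                          const float* input_qkv, const float* att,
                          int batch, int seq_len, int channels, int num_heads,
                          cudaStream_t stream = nullptr);

// Hypothetical helper: head partitioning only works when the channel
// dimension splits evenly across heads.
inline Status check_head_partition(int channels, int num_heads) {
    if (channels <= 0 || num_heads <= 0 || channels % num_heads != 0) {
        return Status{false, "channels must be a positive multiple of num_heads"};
    }
    return Status{true, ""};
}

}  // namespace cuda
}  // namespace quadtrix
```

Defaulting `stream` to `nullptr` would select the default stream, so callers that do not care about asynchronous execution could omit the argument.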

codeaddict-119 and others added 2 commits May 31, 2026 19:36
Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms.
Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000. Achieved best validation loss of 4.1319 at step 3900.

Co-authored-by: Max <eamon5174@gmail.com>
Eamon2009 self-assigned this on May 31, 2026
Eamon2009 requested a review from codeaddict-119 on May 31, 2026 18:42
Introduces the header declarations for `attention_forward` and
`attention_backward` operations inside the `quadtrix::cuda` namespace.
Configured with support for custom CUDA streams and head partitioning.
Repository owner deleted three comments from the github-actions bot on May 31, 2026
@Eamon2009
Owner Author

/run-checks

@github-actions

✅ All checks passed!

Eamon2009 merged commit 40b8bd9 into master on May 31, 2026
6 checks passed
Eamon2009 added the cuda label on May 31, 2026
Eamon2009 added a commit that referenced this pull request Jun 1, 2026
* feat(cuda): add attention forward backward kernel declarations (#64)

* docs: report [run_20260530_165216] (~791 tok/s)

 Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms.
Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900

* docs:report [run_20260530_165216](~791 tok/s)  (#61)

Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms.
Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900

Co-authored-by: Max <eamon5174@gmail.com>

* feat(cuda): add attention forward and backward kernel declarations

Introduces the header declarations for `attention_forward` and
`attention_backward` operations inside the `quadtrix::cuda` namespace.
Configured with support for custom CUDA streams and head partitioning.

---------

Co-authored-by: Max <eamon5174@gmail.com>

* feat(cuda): add checkpoint metadata struct and stub functions

* feat(cuda): introduce core type definitions and error handling utilities

- Defines `DType` and `DeviceKind` enums supporting standard types (F32, F16, BF16, I32, U8).
- Implements `dtype_name` and `dtype_size` metadata helper functions.
- Adds an explicit `Status` struct for non-throwing error propagation alongside `checked_mul` for safe allocation size computation.
- Introduces `check_cuda` and `abort_on_cuda` error macros and handling mechanisms, exposed via the `QUADTRIX_CUDA_CHECK` macro.
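
As a rough illustration of the helpers listed above, a host-only sketch could look like the following. The enum members come from the commit message, but the name strings, the `Status` fields, and the `checked_mul` signature are assumptions. `DeviceKind` is left out because the message does not list its members. The CUDA-specific pieces (`check_cuda`, `abort_on_cuda`, `QUADTRIX_CUDA_CHECK`) are left out so the snippet builds without the toolkit.

```cpp
// Hypothetical sketch; the real definitions in the commit may differ.
#include <cstddef>
#include <limits>
#include <string>

namespace quadtrix {
namespace cuda {

enum class DType { F32, F16, BF16, I32, U8 };

// Human-readable name for logging (the exact strings are an assumption).
inline const char* dtype_name(DType t) {
    switch (t) {
        case DType::F32:  return "f32";
        case DType::F16:  return "f16";
        case DType::BF16: return "bf16";
        case DType::I32:  return "i32";
        case DType::U8:   return "u8";
    }
    return "unknown";
}

// Size of one element in bytes.
inline std::size_t dtype_size(DType t) {
    switch (t) {
        case DType::F32:  return 4;
        case DType::F16:  return 2;
        case DType::BF16: return 2;
        case DType::I32:  return 4;
        case DType::U8:   return 1;
    }
    return 0;
}

// Assumed shape of the non-throwing Status type.
struct Status {
    bool ok;
    std::string message;
};

// Writes a * b to *out, reporting overflow instead of silently wrapping.
inline Status checked_mul(std::size_t a, std::size_t b, std::size_t* out) {
    if (a != 0 && b > std::numeric_limits<std::size_t>::max() / a) {
        return Status{false, "allocation size overflows size_t"};
    }
    *out = a * b;
    return Status{true, ""};
}

}  // namespace cuda
}  // namespace quadtrix
```

A caller would typically chain `checked_mul` over the tensor dimensions and `dtype_size` before allocating, and bail out on the first failing `Status`.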

* feat(cuda): add TokenBatchView struct and DataLoader stub class

* feat(cuda): add GeLU activation forward and backward declarations

- Introduces the `GeluMode` enum to toggle between `Exact` and `Approximate` mathematical variants.
- Declares the `gelu_forward` and `gelu_backward` kernel entrypoints.
- Configures both signatures with optional stream execution and a default mode of `GeluMode::Approximate`.
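
To make the two modes concrete, here is a CPU reference sketch of the standard GELU formulas: the exact form uses `erf`, the approximate form uses the common `tanh` approximation. The function names `gelu_ref` and `gelu_grad_ref` are hypothetical and operate on a single element. The kernel entrypoints declared in the commit presumably work on device buffers and are not reproduced here.

```cpp
// Hypothetical CPU reference; not the kernels declared in the commit.
#include <cmath>

namespace quadtrix {
namespace cuda {

enum class GeluMode { Exact, Approximate };

// Forward: Exact is 0.5 * x * (1 + erf(x / sqrt(2)));
// Approximate is 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3))).
inline float gelu_ref(float x, GeluMode mode = GeluMode::Approximate) {
    const float kSqrt2OverPi = 0.7978845608f;  // sqrt(2 / pi)
    const float kInvSqrt2 = 0.7071067812f;     // 1 / sqrt(2)
    if (mode == GeluMode::Exact) {
        return 0.5f * x * (1.0f + std::erf(x * kInvSqrt2));
    }
    const float u = kSqrt2OverPi * (x + 0.044715f * x * x * x);
    return 0.5f * x * (1.0f + std::tanh(u));
}

// Derivative of gelu_ref with respect to x, for the backward pass.
inline float gelu_grad_ref(float x, GeluMode mode = GeluMode::Approximate) {
    const float kSqrt2OverPi = 0.7978845608f;   // sqrt(2 / pi)
    const float kInvSqrt2 = 0.7071067812f;      // 1 / sqrt(2)
    const float kInvSqrt2Pi = 0.3989422804f;    // 1 / sqrt(2 * pi)
    if (mode == GeluMode::Exact) {
        const float cdf = 0.5f * (1.0f + std::erf(x * kInvSqrt2));
        const float pdf = kInvSqrt2Pi * std::exp(-0.5f * x * x);
        return cdf + x * pdf;
    }
    const float u = kSqrt2OverPi * (x + 0.044715f * x * x * x);
    const float t = std::tanh(u);
    const float du_dx = kSqrt2OverPi * (1.0f + 3.0f * 0.044715f * x * x);
    return 0.5f * (1.0f + t) + 0.5f * x * (1.0f - t * t) * du_dx;
}

}  // namespace cuda
}  // namespace quadtrix
```

A backward kernel would multiply `gelu_grad_ref(x)` by the incoming gradient for each element. The two modes agree closely near zero and differ only slightly elsewhere, which is why the approximate form is a common default.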

* feat(cuda): add gradient norm calculation and clipping interfaces

---------

Co-authored-by: Max <eamon5174@gmail.com>
2 participants