
feat :tensor management with benchmarks (#52) #52

Merged: Eamon2009 merged 1 commit into master from exp on May 25, 2026

@codeaddict-119
Collaborator

No description provided.

@Eamon2009 changed the title from "Enhance tensor management and CUDA utilities with benchmarks (#51)" to "feat :tensor management with benchmarks (#51)" on May 25, 2026
@Eamon2009 added the github_actions label (Pull requests that update GitHub Actions code) on May 25, 2026
@Eamon2009 merged commit c7a1e01 into master on May 25, 2026
6 checks passed
@Eamon2009 changed the title from "feat :tensor management with benchmarks (#51)" to "feat :tensor management with benchmarks (#52)" on May 25, 2026
codeaddict-119 added a commit that referenced this pull request on May 25, 2026
## Summary
## Causal Multi-Head Attention Forward Pass (CUDA)
This PR implements the CUDA forward pass for causal multi-head attention
(attention_forward). It includes the core GPU kernel, custom block-level
reduction primitives, and tensor validation helpers.

## Core Attention Kernel: attention_forward_kernel
- Computes scaled dot-product attention on an interleaved QKV input
tensor structured as [Batch, Time, 3 * Channels].
- Causal Masking: Enforces the autoregressive constraint: a query at time
step $t$ never attends to a key at a future time step $t_2 > t$.
- Implements parallelized block_max and block_sum device functions.
- Uses cooperative warp shuffles (warp_max, warp_sum) and shared
memory to compute a numerically stable online softmax normalization.
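
For checking kernel output, the behaviour described in the bullets above can be written as a CPU reference in plain C++. This is a minimal sketch, not the code from this PR. It rests on several assumptions: the name attention_forward_ref and its signature are hypothetical, each token's 3 * Channels floats are assumed to be laid out as Q, then K, then V, each head is assumed to own a contiguous slice of C / NH channels, and the output is assumed to have shape [Batch, Time, Channels]. The serial max and sum loops stand in for what block_max and block_sum would compute in parallel on the GPU, and the softmax uses the ordinary two-pass max-subtraction form rather than an online update.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical CPU reference for causal multi-head attention.
// inp: [B, T, 3*C], assumed per-token order Q (C floats), K (C floats), V (C floats).
// returns: [B, T, C] (assumed output layout).
std::vector<float> attention_forward_ref(const std::vector<float>& inp,
                                         int B, int T, int C, int NH) {
    assert(NH > 0 && C % NH == 0);
    assert(inp.size() == static_cast<std::size_t>(B) * T * 3 * C);
    const int hs = C / NH;  // head size (assumed contiguous per head)
    const float scale = 1.0f / std::sqrt(static_cast<float>(hs));
    std::vector<float> out(static_cast<std::size_t>(B) * T * C, 0.0f);
    std::vector<float> att(static_cast<std::size_t>(T));

    // Offset of token (b, t) in the interleaved [B, T, 3*C] input.
    auto token = [&](int b, int t) {
        return (static_cast<std::size_t>(b) * T + t) * 3 * C;
    };

    for (int b = 0; b < B; ++b) {
        for (int t = 0; t < T; ++t) {
            for (int h = 0; h < NH; ++h) {
                const float* q = inp.data() + token(b, t) + h * hs;

                // Pass 1: scaled scores and their maximum. The loop bound
                // t2 <= t is the causal mask: future keys are never read.
                float maxv = -std::numeric_limits<float>::infinity();
                for (int t2 = 0; t2 <= t; ++t2) {
                    const float* k = inp.data() + token(b, t2) + C + h * hs;
                    float s = 0.0f;
                    for (int i = 0; i < hs; ++i) s += q[i] * k[i];
                    att[t2] = s * scale;
                    maxv = std::max(maxv, att[t2]);
                }

                // Pass 2: exponentiate relative to the max (keeps exp from
                // overflowing) and accumulate the normalizer.
                float sum = 0.0f;
                for (int t2 = 0; t2 <= t; ++t2) {
                    att[t2] = std::exp(att[t2] - maxv);
                    sum += att[t2];
                }

                // Pass 3: softmax-weighted sum of the value vectors.
                float* o = out.data() + (static_cast<std::size_t>(b) * T + t) * C + h * hs;
                for (int t2 = 0; t2 <= t; ++t2) {
                    const float* v = inp.data() + token(b, t2) + 2 * C + h * hs;
                    const float w = att[t2] / sum;
                    for (int i = 0; i < hs; ++i) o[i] += w * v[i];
                }
            }
        }
    }
    return out;
}
```

With B = 1, T = 2, C = 2, NH = 1 and the input {1,0, 1,0, 1,2, 0,0, 0,1, 3,4}, the first token can only see itself, so its output equals its own value vector (1, 2). The second token has a zero query, so it weights both positions equally and produces (2, 3).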

#52 
#11 
#12 
#14 
#29


2 participants