[Example] add example of vector addition #111

Merged
yaoyaoding merged 2 commits into NVIDIA:main from WilliamZhang20:main on Apr 7, 2026

@WilliamZhang20
Contributor

Summary

Add a minimal vector-add example that shows how to implement elementwise c[i] = a[i] + b[i] in Tilus. Each thread block loads a contiguous tile of a and b from global memory, adds the tiles in registers, and stores the result to c, all in a single kernel launch (a simple global load/add/store pattern).

The kernel uses a fixed tile size (1024 elements per block) and 4 warps per block (no autotuning).
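
The tiling scheme described above can be sketched in plain Python. This is only a model of the decomposition, not Tilus code and not the kernel from this PR: the names `vector_add_tiled` and `cdiv` are hypothetical, the loop over blocks stands in for thread blocks that would run in parallel on the GPU, and the tail handling for sizes that are not a multiple of 1024 is an assumption, since the PR does not say how the kernel treats that case.

```python
# Plain-Python model of the tiling scheme; this is not Tilus code.
TILE_SIZE = 1024  # elements per thread block, as stated in the PR
NUM_WARPS = 4     # warps per thread block, as stated in the PR
WARP_SIZE = 32    # threads per warp on NVIDIA GPUs


def cdiv(a, b):
    """Ceiling division: the number of tiles needed to cover `a` elements."""
    return (a + b - 1) // b


def vector_add_tiled(a, b):
    """Compute c[i] = a[i] + b[i] one tile at a time (hypothetical helper)."""
    assert len(a) == len(b)
    n = len(a)
    c = [0.0] * n
    # On the GPU each iteration would be one thread block; here they run in sequence.
    for block_id in range(cdiv(n, TILE_SIZE)):
        start = block_id * TILE_SIZE
        end = min(start + TILE_SIZE, n)  # assumed bounds guard for a partial last tile
        tile_a = a[start:end]            # "load" a tile of a
        tile_b = b[start:end]            # "load" a tile of b
        tile_c = [x + y for x, y in zip(tile_a, tile_b)]  # add
        c[start:end] = tile_c            # "store" the tile of c
    return c


# With 4 warps of 32 threads, each thread covers 8 of the 1024 tile elements.
elements_per_thread = TILE_SIZE // (NUM_WARPS * WARP_SIZE)
```

Under this model the two benchmarked sizes, 1048576 and 16777216, divide evenly into 1024 and 16384 tiles respectively, so the partial-tile path is never exercised by the benchmark.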

Benchmark results (NVIDIA H200 NVL 143 GB)

| n        | torch (ms) | tilus (ms) | speedup |
|----------|------------|------------|---------|
| 1048576  | 0.015      | 0.016      | 0.94x   |
| 16777216 | 0.091      | 0.090      | 1.02x   |
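
For readers who want to relate the columns to each other, here is a small sketch of how such figures are typically derived. The helper names are hypothetical, and the 4-bytes-per-element default is an assumption (float32); the PR does not state the data type. The speedup is taken as the torch time divided by the tilus time. Recomputing it from the rounded times above gives 0.94 and 1.01, so the reported 1.02x was presumably computed from unrounded timings.

```python
def speedup(torch_ms, tilus_ms):
    """Speedup of tilus over torch: values above 1.0 mean tilus is faster."""
    return torch_ms / tilus_ms


def effective_bandwidth_gbs(n, time_ms, bytes_per_element=4):
    """Effective memory bandwidth in GB/s for c = a + b over n elements.

    Vector add reads a and b and writes c, so it moves 3 * n elements.
    bytes_per_element=4 assumes float32, which the PR does not confirm.
    """
    bytes_moved = 3 * n * bytes_per_element
    seconds = time_ms * 1e-3
    return bytes_moved / seconds / 1e9


# Rows from the benchmark table: (n, torch ms, tilus ms)
rows = [(1048576, 0.015, 0.016), (16777216, 0.091, 0.090)]
for n, torch_ms, tilus_ms in rows:
    print(n, round(speedup(torch_ms, tilus_ms), 2),
          round(effective_bandwidth_gbs(n, tilus_ms), 1))
```

Because the kernel does one addition per 12 bytes moved (under the float32 assumption), it is limited by memory traffic, which is consistent with both implementations landing within a few percent of each other.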

Test Plan

  • pytest tests/examples/test_examples.py -v -k "vector_add" passes
  • test_all_examples_are_listed passes (vector_add registered in EXAMPLES)
  • test_no_missing_examples passes

Signed-off-by: William Zhang <wzhang20@yahoo.com>

Signed-off-by: William Zhang <wzhang20@yahoo.com>
@copy-pr-bot

copy-pr-bot bot commented Apr 4, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@yaoyaoding
Member

/ok to test 4281491

@yaoyaoding
Member

Hi @WilliamZhang20, could you run pre-commit run --all to lint the changes? Thank you!

Signed-off-by: William Zhang <wzhang20@yahoo.com>
@yaoyaoding
Member

/ok to test c1d50d0

@yaoyaoding yaoyaoding merged commit 28d629d into NVIDIA:main Apr 7, 2026
8 checks passed