
Conversation

@Zijie-Tian (Owner)

Summary

  • extend test-flash-attn-state.cpp with optional PyTorch verification (see the sketch after this list)
  • compare outputs of segmented, standard and torch implementations
  • print element-wise comparison table for first 128 elements
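
As a rough illustration of the optional verification path, here is a minimal sketch assuming the libtorch code is guarded by LLAMA_TORCH_AVAILABLE and that Q/K/V are contiguous, row-major F32 buffers; the function name, tensor shapes, and the absence of masking are assumptions for illustration, not the exact code in the test:

```cpp
// Illustrative sketch only: wraps raw F32 buffers in torch tensors and computes
// a reference attention output to compare against the ggml results.
#ifdef LLAMA_TORCH_AVAILABLE
#include <torch/torch.h>
#include <cmath>

// q, k, v: assumed layout [n_head, seq_len, head_dim], row-major, contiguous
static torch::Tensor torch_reference_attention(
        float * q, float * k, float * v,
        int64_t n_head, int64_t seq_len, int64_t head_dim) {
    auto opts = torch::TensorOptions().dtype(torch::kFloat32);
    auto Q = torch::from_blob(q, {n_head, seq_len, head_dim}, opts);
    auto K = torch::from_blob(k, {n_head, seq_len, head_dim}, opts);
    auto V = torch::from_blob(v, {n_head, seq_len, head_dim}, opts);

    const double scale = 1.0 / std::sqrt((double) head_dim);
    auto scores = torch::matmul(Q, K.transpose(-2, -1)) * scale; // [n_head, seq, seq]
    auto probs  = torch::softmax(scores, /*dim=*/-1);
    return torch::matmul(probs, V);                              // [n_head, seq, head_dim]
}
#endif // LLAMA_TORCH_AVAILABLE
```

The resulting reference tensor can then be compared element-wise against both the segmented and the standard flash-attention outputs.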

Testing

  • cmake -G Ninja -D GGML_GRAPH_PROFILER=ON -D GGML_CUDA=OFF -D GGML_TMAC=OFF -D LLAMA_TORCH=ON -B build-x86_64
  • cmake --build build-x86_64 --config Release -j12
  • ./build-x86_64/bin/test-flash-attn-state

https://chatgpt.com/codex/tasks/task_e_6859cc8cc3b08332ac84da4077269746

@Zijie-Tian (Owner, Author) left a comment (marked as outdated)

Reviewed

@Zijie-Tian requested a review from Copilot on June 23, 2025 at 22:12

Copilot AI left a comment

Pull Request Overview

This PR extends test-flash-attn-state.cpp to include an optional PyTorch-based verification alongside the existing segmented and standard flash-attention implementations, and enriches the result comparison with an element-wise table.

  • Added PyTorch headers, tensor conversion, and verification logic under LLAMA_TORCH_AVAILABLE
  • Replaced the single diff-based comparison with manual loops computing max differences across standard, segmented, and PyTorch outputs
  • Introduced a detailed element-wise comparison table for the first 128 elements and unified print formatting (see the sketch below)
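
A sketch of what the comparison loops and the element-wise table might look like, assuming all three outputs are contiguous F32 buffers of equal length; print_comparison and the column layout are illustrative, not the test's actual code:

```cpp
// Illustrative sketch: compute max absolute differences across the standard,
// segmented, and torch outputs, then print an element-wise table for the
// first 128 elements.
#include <algorithm>
#include <cmath>
#include <cstdio>

static void print_comparison(const float * std_out, const float * seg_out,
                             const float * torch_out, size_t n_elems) {
    float max_std_seg = 0.0f, max_std_torch = 0.0f;
    for (size_t i = 0; i < n_elems; ++i) {
        max_std_seg   = std::max(max_std_seg,   std::fabs(std_out[i] - seg_out[i]));
        max_std_torch = std::max(max_std_torch, std::fabs(std_out[i] - torch_out[i]));
    }
    printf("max |standard - segmented| = %.6e\n", max_std_seg);
    printf("max |standard - torch|     = %.6e\n", max_std_torch);

    printf("%8s %12s %12s %12s\n", "idx", "standard", "segmented", "torch");
    for (size_t i = 0; i < std::min<size_t>(n_elems, 128); ++i) {
        printf("%8zu %12.6f %12.6f %12.6f\n", i, std_out[i], seg_out[i], torch_out[i]);
    }
}
```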
Comments suppressed due to low confidence (1)

tests/test-flash-attn-state.cpp:464

  • [nitpick] The comment label still reads 'Test 3' for the comparison section, which now follows two other 'Test 3' sections; consider renumbering it to 'Test 4' for clarity.
    // Test 3: Compare Results

@Zijie-Tian merged commit 055e46f into tzj/qlutattn on June 23, 2025
@Zijie-Tian deleted the codex/modify-tests-flash-attn-state-with-torch-comparison branch on June 23, 2025 at 22:22
Zijie-Tian pushed a commit that referenced this pull request Aug 18, 2025
* oai moe

* compat with new checkpoint

* add attn sink impl

* add rope scaling yarn

* logits match with latest transformers code

* wip chat template

* rm trailing space

* use ggml_scale_bias

* rm redundant is_swa_all

* convert interleaved gate_up

* graph : fix activation function to match reference (#7)

* vocab : handle o200k_harmony special tokens

* ggml : add attention sinks support (#1)

* llama : add attn sinks

* ggml : add attn sinks

* cuda : add attn sinks

* vulkan : add support for sinks in softmax

remove unnecessary return

* ggml : add fused swiglu_oai op (#11)

* ggml : add fused swiglu_oai op

* Update ggml/src/ggml-cpu/ops.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* update CUDA impl

* cont : metal impl

* add vulkan impl

* test-backend-ops : more test cases, clean up

* llama : remove unfused impl

* remove extra lines

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: slaren <slarengh@gmail.com>

* repack mxfp4 upon conversion

* clean up a bit

* enable thinking

* add quick hack to render only some special tokens

* fix bf16 conversion

* remove vocab hack

* webui ok

* support chat parsing for gpt-oss

* fix webui

* direct mapping mxfp4, FINALLY

* force using mxfp4

* properly use lazy tensor

* ggml : add mxfp4

ggml : use e8m0 conversion instead of powf

Co-authored-by: Diego Devesa <slarengh@gmail.com>

change kvalues_mxfp4 table to match e2m1 (#6)

metal : remove quantization for now (not used)

cuda : fix disabled CUDA graphs due to ffn moe bias

vulkan : add support for mxfp4

cont : add cm2 dequant

* ggml : add ggml_add_id (#13)

* ggml : add ggml_add_id

* add cuda impl

* llama : add weight support check for add_id

* perf opt

* add vulkan impl

* rename cuda files

* add metal impl

* allow in-place ggml_add_id

* llama : keep biases on CPU with --cpu-moe

* llama : fix compile error

ggml-ci

* cuda : add fallback for __nv_cvt_e8m0_to_bf16raw

ggml-ci

* cleanup

ggml-ci

* sycl : fix supports_op for MXFP4

ggml-ci

* fix Unknown reasoning format

* ggml-cpu : fix AVX build

ggml-ci

* fix hip build

ggml-ci

* cuda : add mxfp4 dequantization support for cuBLAS

ggml-ci

* ggml-cpu : fix mxfp4 fallback definitions for some architectures

ggml-ci

* cuda : fix version required for __nv_cvt_e8m0_to_bf16raw

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: slaren <slarengh@gmail.com>