
[wip] refactor: add more features to vec_t #1142


Draft: wants to merge 1 commit into main

Conversation

yzh119 (Collaborator) commented Jun 12, 2025

📌 Description

Recent communication-operator work (#1096 #1108 #1131 #1134) indicates that we need to add more features to the vec_t class to make it more accessible:

  1. access elements at a granularity of 2 (because CUDA provides vector instructions for bf16 pairs, etc.)
  2. binary operations on vec_t (+/-/*/max/etc.)
  3. determine whether the vector contains an element satisfying some condition: vec_t<T, N>::any(cond)
  4. fp4 support
  5. rich vector type casting, such as int to float or float to int

This PR implements these functionalities.
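
For illustration, here is a rough sketch of how the extended API could be used once these features land. The method names (lane, any, cast_from, operator+) follow the wish list above but are assumptions rather than the final interface, and the header path is assumed as well:

```cuda
// Illustrative sketch only: lane/any/cast_from/operator+ are assumed names
// from the wish list above, not the final vec_t interface.
#include <cuda_bf16.h>

#include <flashinfer/vec_dtypes.cuh>  // header path assumed

using flashinfer::vec_t;

__global__ void vec_feature_sketch(const nv_bfloat16* in, float* out) {
  vec_t<nv_bfloat16, 8> a, b;
  a.load(in);      // existing 128-bit vectorized load
  b.load(in + 8);

  // (1) granularity-2 access: view two bf16 elements as one packed sub-vector.
  vec_t<nv_bfloat16, 2> pair = a.lane<2>(0);
  (void)pair;  // see the bf16x2 sketch later in this thread

  // (2) element-wise binary operation on whole vectors (proposed).
  vec_t<nv_bfloat16, 8> s = a + b;

  // (3) predicate over elements (proposed): true if any element is negative.
  bool has_neg = s.any([](nv_bfloat16 x) { return __bfloat162float(x) < 0.f; });

  // (5) rich type casting (proposed), e.g. bf16 -> fp32.
  vec_t<float, 8> f;
  f.cast_from(s);
  if (!has_neg) f.store(out);
}
```

Item (4), fp4 support, is not shown in the sketch.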

🔍 Related Issues

#1096 #1108 #1131 #1134

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes

@gemini-code-assist bot left a comment

Summary of Changes

Hello @yzh119, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request starts the process of enhancing the vec_t class to provide more flexible and accessible features, addressing points raised in recent issues. The current changes introduce the ability to access sub-vectors with specified granularity via a new lane method and include some necessary type corrections for internal vector data representation and casting.

Highlights

  • Vector Granularity Access: I've added a new lane<size_t G>(size_t lane_idx) method to the vec_t class template and its specializations. This allows accessing a sub-vector of size G (where G must divide the total vector size) at a specific index lane_idx. This is intended to support operations on smaller vector units, like pairs of BF16/FP16 elements (a usage sketch follows this list).
  • Type Corrections: I've corrected the internal data type for some vec_t specializations (e.g., vec_t<__nv_fp8_e4m3, 8>, vec_t<half, 4>) from uint2 to int2. Corresponding pointer casts in load, store, memcpy, and the fast_dequant_f8f16x4 helper function have also been updated from uint2* to int2*.
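
As a concrete illustration of the granularity access, here is a hedged sketch of how lane<G>() could feed packed bf16x2 math; the header path and the reinterpret to nv_bfloat162 are assumptions, not code from this PR:

```cuda
// Sketch only: assumes lane<G>() returns a vec_t<T, G> view as described above.
#include <cuda_bf16.h>

#include <flashinfer/vec_dtypes.cuh>  // header path assumed

using flashinfer::vec_t;

__device__ nv_bfloat162 sum_as_pairs(const vec_t<nv_bfloat16, 8>& v) {
  nv_bfloat162 acc = __float2bfloat162_rn(0.f);
  // 8 elements viewed as 4 lanes of 2 elements each (G = 2 divides 8).
  for (size_t i = 0; i < 4; ++i) {
    vec_t<nv_bfloat16, 2> pair = v.lane<2>(i);
    // Reinterpret the 2-element sub-vector as a packed nv_bfloat162 so the
    // bf16x2 add instruction can be used on it.
    acc = __hadd2(acc, *reinterpret_cast<const nv_bfloat162*>(&pair));
  }
  return acc;
}
```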

@gemini-code-assist bot left a comment

Code Review

This pull request introduces a new lane method to vec_t for accessing sub-vectors and changes some internal data types from uint2 to int2. The primary concern is a recurring pointer arithmetic error in the implementation of the lane method across nearly all its specializations. The uint2 to int2 changes appear to be for consistency and are generally acceptable. It's recommended to correct the lane method's pointer logic and add specific tests for this new functionality.
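
On the testing recommendation, a minimal round-trip check along the following lines would catch the indexing error described above; the kernel, harness, and header path are hypothetical, not part of the PR:

```cuda
// Hypothetical test: split an 8-float vector into 2-float lanes and write them
// back in lane order. The output equals the input only if lane<G>() indexes in
// elements rather than in packed float4 units.
#include <cstdio>

#include <cuda_runtime.h>

#include <flashinfer/vec_dtypes.cuh>  // header path assumed

using flashinfer::vec_t;

__global__ void LaneRoundTrip(const float* in, float* out) {
  vec_t<float, 8> v;
  v.load(in);
  for (size_t i = 0; i < 4; ++i) {
    vec_t<float, 2> sub = v.lane<2>(i);
    sub.store(out + 2 * i);
  }
}

int main() {
  float h_in[8], h_out[8];
  for (int i = 0; i < 8; ++i) h_in[i] = static_cast<float>(i);
  float *d_in, *d_out;
  cudaMalloc(&d_in, sizeof(h_in));
  cudaMalloc(&d_out, sizeof(h_out));
  cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);
  LaneRoundTrip<<<1, 1>>>(d_in, d_out);
  cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
  int mismatches = 0;
  for (int i = 0; i < 8; ++i) mismatches += (h_out[i] != h_in[i]);
  printf("lane<2> round trip: %s\n", mismatches ? "FAILED" : "OK");
  cudaFree(d_in);
  cudaFree(d_out);
  return mismatches ? 1 : 0;
}
```

With the pointer arithmetic flagged in the comments below, lane indices beyond 0 would step in whole float4 units and read past the vector's storage, so the comparison would fail.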

Comment on lines +981 to +985
template <size_t G>
FLASHINFER_INLINE __device__ vec_t<__nv_fp8_e5m2, G> lane(size_t lane_idx) const {
static_assert(vec_size % G == 0, "G must divide vec_size");
return *reinterpret_cast<const vec_t<__nv_fp8_e5m2, G>*>(&data[lane_idx / vec_size] +
G * (lane_idx % vec_size));
Severity: high

The pointer arithmetic for the lane method in this generic vec_t<__nv_fp8_e5m2, vec_size> is incorrect: the arithmetic is performed on int4* instead of const __nv_fp8_e5m2*.

Corrected form:
return *reinterpret_cast<const vec_t<__nv_fp8_e5m2, G>*>(reinterpret_cast<const __nv_fp8_e5m2*>(data) + G * lane_idx);


Comment on lines +1325 to +1327
FLASHINFER_INLINE __device__ vec_t<nv_bfloat16, G> lane(size_t lane_idx) const {
static_assert(2 % G == 0, "G must divide vec_size");
return *reinterpret_cast<const vec_t<nv_bfloat16, G>*>(&data + G * lane_idx);
Severity: high

The pointer arithmetic in the lane method has the same issue. &data is nv_bfloat162*. Arithmetic should be based on const nv_bfloat16*.

Corrected form:
return *reinterpret_cast<const vec_t<nv_bfloat16, G>*>(reinterpret_cast<const nv_bfloat16*>(&data) + G * lane_idx);


Comment on lines +522 to +524
FLASHINFER_INLINE __device__ vec_t<__nv_fp8_e4m3, G> lane(size_t lane_idx) const {
static_assert(2 % G == 0, "G must divide vec_size");
return *reinterpret_cast<const vec_t<__nv_fp8_e4m3, G>*>(&data + G * lane_idx);
Severity: high

The pointer arithmetic in the lane method appears incorrect. &data is of type __nv_fp8x2_e4m3*. The calculation should be based on the size of __nv_fp8_e4m3.

A corrected approach:
return *reinterpret_cast<const vec_t<__nv_fp8_e4m3, G>*>(reinterpret_cast<const __nv_fp8_e4m3*>(&data) + G * lane_idx);


Comment on lines +573 to +575
FLASHINFER_INLINE __device__ vec_t<__nv_fp8_e4m3, G> lane(size_t lane_idx) const {
static_assert(4 % G == 0, "G must divide vec_size");
return *reinterpret_cast<const vec_t<__nv_fp8_e4m3, G>*>(&data + G * lane_idx);
Severity: high

The pointer arithmetic in the lane method has the same issue as noted for vec_t<__nv_fp8_e4m3, 2>. The base pointer for arithmetic should be cast to const __nv_fp8_e4m3*.

Corrected form:
return *reinterpret_cast<const vec_t<__nv_fp8_e4m3, G>*>(reinterpret_cast<const __nv_fp8_e4m3*>(&data) + G * lane_idx);


Comment on lines +1589 to +1592
FLASHINFER_INLINE __device__ vec_t<float, G> lane(size_t lane_idx) const {
static_assert(vec_size % G == 0, "G must divide vec_size");
return *reinterpret_cast<const vec_t<float, G>*>(&data[lane_idx / vec_size] +
G * (lane_idx % vec_size));
Severity: high

The pointer arithmetic for the lane method in this generic vec_t<float, vec_size> is incorrect: the arithmetic is performed on float4* instead of const float*.

Corrected form:
return *reinterpret_cast<const vec_t<float, G>*>(reinterpret_cast<const float*>(data) + G * lane_idx);


@@ -872,12 +915,17 @@ FLASHINFER_INLINE void vec_t<__nv_fp8_e5m2, 4>::memcpy(__nv_fp8_e5m2* dst,

template <>
struct vec_t<__nv_fp8_e5m2, 8> {
-  uint2 data;
+  int2 data;
Severity: medium

The data member changes from uint2 to int2, consistent with the other similar specializations.

Comment on lines +1070 to +1073
template <size_t G>
FLASHINFER_INLINE __device__ vec_t<half, G> lane(size_t lane_idx) const {
static_assert(1 % G == 0, "G must divide vec_size");
return *reinterpret_cast<const vec_t<half, G>*>(&data + G * lane_idx);
Severity: medium

For vec_t<half, 1>, this lane implementation is correct.

Comment on lines +1507 to +1509
FLASHINFER_INLINE __device__ vec_t<float, G> lane(size_t lane_idx) const {
static_assert(1 % G == 0, "G must divide vec_size");
return *reinterpret_cast<const vec_t<float, G>*>(&data + G * lane_idx);
Severity: medium

For vec_t<float, 1>, this lane implementation is correct.

Comment on lines +775 to +777
FLASHINFER_INLINE __device__ vec_t<__nv_fp8_e5m2, G> lane(size_t lane_idx) const {
static_assert(1 % G == 0, "G must divide vec_size");
return *reinterpret_cast<const vec_t<__nv_fp8_e5m2, G>*>(&data + G * lane_idx);
Severity: medium

For vec_t<__nv_fp8_e5m2, 1>, this lane implementation is correct.

@@ -599,12 +616,17 @@ FLASHINFER_INLINE void vec_t<__nv_fp8_e4m3, 4>::memcpy(__nv_fp8_e4m3* dst,

template <>
struct vec_t<__nv_fp8_e4m3, 8> {
-  uint2 data;
+  int2 data;
Severity: medium

The data member changes from uint2 to int2. This should be functionally equivalent as long as the bit patterns are handled and reinterpreted correctly.
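
To illustrate the bit-pattern point, here is a small host-side sketch (not from this PR) showing that eight packed fp8 bytes produce the same 64-bit pattern whether they are carried in a uint2 or an int2:

```cuda
// Illustration only: the container type (uint2 vs. int2) does not change the
// bytes that load/store/memcpy move around for 8 packed fp8 values.
#include <cstdio>
#include <cstring>

#include <cuda_fp8.h>
#include <cuda_runtime.h>

int main() {
  __nv_fp8_e4m3 elems[8];
  for (int i = 0; i < 8; ++i) elems[i] = __nv_fp8_e4m3(static_cast<float>(i));

  uint2 as_uint2;
  int2 as_int2;
  std::memcpy(&as_uint2, elems, sizeof(as_uint2));
  std::memcpy(&as_int2, elems, sizeof(as_int2));

  // Same 64-bit pattern either way; only the declared type differs.
  printf("bit patterns equal: %d\n",
         std::memcmp(&as_uint2, &as_int2, sizeof(as_uint2)) == 0);
  return 0;
}
```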

yzh119 pushed a commit that referenced this pull request Jul 2, 2025

## 📌 Description

We enabled and updated the fused quantization in all-reduce/moe
all-reduce.

## 🔍 Related Issues

Depends on #1142 

## 🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull
request, please make sure the following items are complete.

### ✅ Pre-commit Checks

- [x] I have installed `pre-commit` by running `pip install pre-commit`
(or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files`
and fixed any reported issues.

> If you are unsure about how to set up `pre-commit`, see [the
pre-commit documentation](https://pre-commit.com/).

## 🧪 Tests

- [x] Tests have been added or updated as needed.
- [x] All tests are passing (`unittest`, etc.).

## Reviewer Notes
