Refine prefill CUDA graph capture sizes #1331
Code Review
This pull request changes how graph_handle_token_nums is generated in prefill_cuda_graph.py: instead of exponentially spaced values, the list is built from a combination of explicit ranges. One change in the deduplication step was flagged: set(...) was replaced with set[int](...). On Python 3.9 and later the subscripted form is callable and returns an ordinary set, so the [int] has no effect at runtime; on Python 3.8 and earlier the subscription itself raises TypeError: 'type' object is not subscriptable. The reviewer recommended reverting this to set(...).
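For context, the two strategies contrasted in the summary can be sketched as below. This is a minimal illustration, not the code from prefill_cuda_graph.py: the function names, range boundaries, and step sizes are hypothetical, and the final sort is added here only to make the output deterministic. Only the append of the maximum and the set-based deduplication mirror the diff.

```python
from typing import List


def exponential_sizes(max_handle_token_num: int) -> List[int]:
    """Hypothetical 'before' strategy: powers of two below the maximum."""
    sizes = []
    n = 1
    while n < max_handle_token_num:
        sizes.append(n)
        n *= 2
    # Always capture a graph for the maximum size, as in the diff.
    sizes.append(max_handle_token_num)
    return sorted(set(sizes))


def ranged_sizes(max_handle_token_num: int) -> List[int]:
    """Hypothetical 'after' strategy: fine steps for small sizes, coarse for large."""
    sizes = list(range(16, 257, 16))      # assumed range: 16, 32, ..., 256
    sizes += list(range(512, 4097, 256))  # assumed range: 512, 768, ..., 4096
    # Drop anything at or above the maximum, then add the maximum itself.
    sizes = [s for s in sizes if s < max_handle_token_num]
    sizes.append(max_handle_token_num)
    # set(...) removes duplicates that arise when ranges overlap.
    return sorted(set(sizes))
```

Explicit ranges give denser coverage where the assumed workload concentrates, at the cost of capturing more graphs than a purely exponential schedule.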
  graph_handle_token_nums.append(self.max_handle_token_num)

- graph_handle_token_nums = list(set(graph_handle_token_nums))
+ graph_handle_token_nums = list(set[int](graph_handle_token_nums))
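Using set[int](...) at runtime is unnecessary. On Python 3.9 and later, set[int] is a types.GenericAlias; calling it constructs a plain set and ignores the type parameter, so the result is identical to set(...). On Python 3.8 and earlier, the subscription itself raises TypeError: 'type' object is not subscriptable. Subscripted built-ins such as set[...] are meant for type annotations. Use set(...) instead.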
- graph_handle_token_nums = list(set[int](graph_handle_token_nums))
+ graph_handle_token_nums = list(set(graph_handle_token_nums))
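The runtime behavior of the two spellings can be checked with a short script. This is a sketch with made-up sample values; the helper name dedupe_with_alias is hypothetical, and sorting is added only because set iteration order is not guaranteed.

```python
import sys

# Sample capture sizes with duplicates (illustrative values only).
sizes = [256, 16, 64, 16, 256]

# Plain set(...) deduplicates on every Python 3 version.
deduped = sorted(set(sizes))


def dedupe_with_alias(values):
    """Try the subscripted spelling and report what happens."""
    try:
        # Python 3.9+: set[int] is a types.GenericAlias; calling it builds
        # an ordinary set and does not check element types.
        # Python 3.8 and earlier: evaluating set[int] raises TypeError.
        return sorted(set[int](values))
    except TypeError as exc:
        return exc


result = dedupe_with_alias(sizes)
print(sys.version_info[:2], deduped, result)
```

Because set(...) behaves the same on every version and set[int](...) adds nothing where it does work, the plain call is the safer spelling.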
No description provided.