[BUG] multi_tensor_apply: int32 overflow in TensorListMetadata::sizes causes illegal memory access for tensors with numel > INT_MAX #2918

@yezhengmao1

Description

multi_tensor_apply silently truncates per-tensor sizes from 64-bit to 32-bit, causing illegal memory access when any input tensor has numel() > INT_MAX (2,147,483,647).

In transformer_engine/common/multi_tensor/multi_tensor_apply.cuh, TensorListMetadataBase::sizes is declared as int sizes[...] (int32), but it is populated from Tensor::numel() (size_t / int64):

  // multi_tensor_apply.cuh:24
  int sizes[depth_to_max_tensors[n - 1]];
  ...
  // multi_tensor_apply.cuh:68
  tl.sizes[loc_tensor_info] = tensor_lists[0][t]->numel();   // int64 -> int32, silent truncation

For a tensor with numel = 2,476,250,368 (e.g. an embedding of shape [19345706, 128]), the stored value wraps to 2,476,250,368 - 2^32 = -1,818,716,928. This negative size is then consumed by downstream kernels (e.g. multi_tensor_l2norm_kernel), which compute element offsets from it, producing out-of-bounds global-memory accesses and the following error at the next CUDA sync:

RuntimeError: .../multi_tensor_apply.cuh:92 in function multi_tensor_apply:
CUDA Error: an illegal memory access was encountered

This is hit by any real-world use that feeds a tensor with numel > 2^31 to TE's multi_tensor utilities. In particular, megatron.training.utils.calc_params_l2_norm → multi_tensor_applier(multi_tensor_l2norm, ...) crashes for any model containing a single parameter with more than 2.14B elements (common for large-vocab embeddings, tied output layers, over-encoding tables, etc.).

Labels: bug (Something isn't working)