TransformerLayer input_mask format #828

@sshleifer

Description

I am trying to use the DeepSpeedTransformerLayer and am wondering what format the attention mask should be in for left-to-right language-model training.
From https://github.com/microsoft/DeepSpeed/blob/44bd538b110ce0e8fc69626854631c3aee0dc094/tests/unit/test_cuda_forward.py#L181 , it seems like (bs, 1, seq_len, seq_len) could be correct,

but input_size: torch.Size([1, 501, 512]) and input_mask.shape=[1, 501, 501] raises

            input_mask = torch.cat((input_mask, torch.ones((inp_size[0], input_mask.shape[1], input_mask.shape[2],
                                          (16 - (inp_size[1] % 16))), device=input_mask.device, dtype=input_mask.dtype) * -10000), 3)
E           IndexError: Dimension out of range (expected to be in range of [-3, 2], but got 3)

There is no docstring so I figured I'd ask. Thanks!
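For reference, here is a minimal sketch of what a mask in that 4-D layout could look like. This is an assumption based on the test linked above and the padding code in the traceback: the concatenation there indexes dim 3, so a 3-D mask like `[1, 501, 501]` cannot work, while a 4-D additive mask of shape `(bs, 1, seq_len, seq_len)` with -10000 at masked positions (matching the constant in the padding snippet) would. The helper name `causal_input_mask` is hypothetical, not a DeepSpeed API.

```python
import torch

def causal_input_mask(batch_size, seq_len, dtype=torch.float32):
    # Hypothetical helper: build an additive causal mask in the
    # (bs, 1, seq_len, seq_len) layout the padding code appears to expect.
    # Lower-triangular True = position i may attend to positions <= i.
    allowed = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    # -10000 at disallowed positions (same constant as the padding snippet),
    # 0 at visible positions, so the mask can be added to attention scores.
    mask = torch.full((seq_len, seq_len), -10000.0, dtype=dtype)
    mask.masked_fill_(allowed, 0.0)
    # Add batch and head-broadcast dims -> (bs, 1, seq_len, seq_len).
    return mask.unsqueeze(0).unsqueeze(0).expand(batch_size, 1, seq_len, seq_len)

mask = causal_input_mask(1, 501)
print(mask.shape)  # torch.Size([1, 1, 501, 501])
```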
