ValueError: hidden_size should be divisible by num_heads #359
Description
While running anomaly_detection_with_transformers.ipynb from GenerativeModels/tutorials/generative/anomaly_detection, cell 34 (under "Define network, inferer, optimizer, and loss function") raises `ValueError: hidden_size should be divisible by num_heads`.
```
ValueError                                Traceback (most recent call last)
File :3
      1 device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
----> 3 transformer_model = DecoderOnlyTransformer(
      4     num_tokens=16 + 1,
      5     max_seq_len=spatial_shape[0] * spatial_shape[1],
      6     attn_layers_dim=128,
      7     attn_layers_depth=16,
      8     attn_layers_heads=12,
      9 )
     10 transformer_model.to(device)
     12 inferer = VQVAETransformerInferer()

File /dbfs/mnt/POC-data-science/MONAI_tutorials/GenerativeModels/generative/networks/nets/transformer.py:80, in DecoderOnlyTransformer.__init__(self, num_tokens, max_seq_len, attn_layers_dim, attn_layers_depth, attn_layers_heads, with_cross_attention, embedding_dropout_rate, use_flash_attention)
     76 self.position_embeddings = AbsolutePositionalEmbedding(max_seq_len=max_seq_len, embedding_dim=attn_layers_dim)
     77 self.embedding_dropout = nn.Dropout(embedding_dropout_rate)
     79 self.blocks = nn.ModuleList(
---> 80     [
     81         TransformerBlock(
     82             hidden_size=attn_layers_dim,
     83             mlp_dim=attn_layers_dim * 4,
     84             num_heads=attn_layers_heads,
     85             dropout_rate=0.0,
     86             qkv_bias=False,
     87             causal=True,
     88             sequence_length=max_seq_len,
     89             with_cross_attention=with_cross_attention,
     90             use_flash_attention=use_flash_attention,
     91         )
     92         for _ in range(attn_layers_depth)
     93     ]
     94 )
     96 self.to_logits = nn.Linear(attn_layers_dim, num_tokens)
...
```
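Looking at the arguments, `attn_layers_dim=128` is not divisible by `attn_layers_heads=12` (each attention head would get a fractional share of the hidden size), which appears to be what trips the check in `TransformerBlock`. Below is a minimal sketch that avoids the error by choosing a head count that divides 128, e.g. 8; the `spatial_shape` value here is only a placeholder (the notebook computes it in an earlier cell), and the import path is assumed from the traceback above:

```python
import torch
from generative.networks.nets import DecoderOnlyTransformer  # path assumed from the traceback

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Placeholder for illustration; the notebook derives this from the VQ-VAE latent shape.
spatial_shape = (16, 16)

# hidden_size (attn_layers_dim) must be divisible by num_heads (attn_layers_heads):
# 128 % 8 == 0, whereas 128 % 12 != 0.
transformer_model = DecoderOnlyTransformer(
    num_tokens=16 + 1,
    max_seq_len=spatial_shape[0] * spatial_shape[1],
    attn_layers_dim=128,
    attn_layers_depth=16,
    attn_layers_heads=8,  # changed from 12; any divisor of 128 works
)
transformer_model.to(device)
```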
How should this issue be fixed? Thank you very much.