ValueError: hidden_size should be divisible by num_heads #359
Description
While running anomaly_detection_with_transformers.ipynb from GenerativeModels/tutorials/generative/anomaly_detection, cell 34 (under "Define network, inferer, optimizer, and loss function") raises `ValueError: hidden_size should be divisible by num_heads`.
```
ValueError                                Traceback (most recent call last)
File :3
      1 device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
----> 3 transformer_model = DecoderOnlyTransformer(
      4     num_tokens=16 + 1,
      5     max_seq_len=spatial_shape[0] * spatial_shape[1],
      6     attn_layers_dim=128,
      7     attn_layers_depth=16,
      8     attn_layers_heads=12,
      9 )
     10 transformer_model.to(device)
     12 inferer = VQVAETransformerInferer()

File /dbfs/mnt/POC-data-science/MONAI_tutorials/GenerativeModels/generative/networks/nets/transformer.py:80, in DecoderOnlyTransformer.__init__(self, num_tokens, max_seq_len, attn_layers_dim, attn_layers_depth, attn_layers_heads, with_cross_attention, embedding_dropout_rate, use_flash_attention)
     76 self.position_embeddings = AbsolutePositionalEmbedding(max_seq_len=max_seq_len, embedding_dim=attn_layers_dim)
     77 self.embedding_dropout = nn.Dropout(embedding_dropout_rate)
     79 self.blocks = nn.ModuleList(
---> 80     [
     81         TransformerBlock(
     82             hidden_size=attn_layers_dim,
     83             mlp_dim=attn_layers_dim * 4,
     84             num_heads=attn_layers_heads,
     85             dropout_rate=0.0,
     86             qkv_bias=False,
     87             causal=True,
     88             sequence_length=max_seq_len,
     89             with_cross_attention=with_cross_attention,
     90             use_flash_attention=use_flash_attention,
     91         )
     92         for _ in range(attn_layers_depth)
     93     ]
     94 )
     96 self.to_logits = nn.Linear(attn_layers_dim, num_tokens)
...
```
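Looking at the arguments, `attn_layers_dim=128` is not divisible by `attn_layers_heads=12` (each attention head would get a fractional share of the hidden size), which appears to be what trips the check in `TransformerBlock`. Below is a minimal sketch that avoids the error by choosing a head count that divides 128, e.g. 8; the `spatial_shape` value here is only a placeholder (the notebook computes it in an earlier cell), and the import path is assumed from the traceback above:

```python
import torch
from generative.networks.nets import DecoderOnlyTransformer  # path assumed from the traceback

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Placeholder for illustration; the notebook derives this from the VQ-VAE latent shape.
spatial_shape = (16, 16)

# hidden_size (attn_layers_dim) must be divisible by num_heads (attn_layers_heads):
# 128 % 8 == 0, whereas 128 % 12 != 0.
transformer_model = DecoderOnlyTransformer(
    num_tokens=16 + 1,
    max_seq_len=spatial_shape[0] * spatial_shape[1],
    attn_layers_dim=128,
    attn_layers_depth=16,
    attn_layers_heads=8,  # changed from 12; any divisor of 128 works
)
transformer_model.to(device)
```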
How should this issue be fixed? Thank you very much.