GPU test failure: TestGemma3WithExpertFSDPWrap.test_fsdp_wrap_forward_backward — nn.Embedding(num_embeddings=None) #281

@WilliamYue37

Failing job

GPU pytest run 25465660575 / job 74718218197

FAILED tests/policies/test_pi07_low_level.py::TestGemma3WithExpertFSDPWrap::test_fsdp_wrap_forward_backward
TypeError: empty() received an invalid combination of arguments - got (tuple, dtype=NoneType, device=NoneType), ...

Root cause

tests/policies/test_pi07_low_level.py:866-876 constructs Gemma3WithExpertConfig without setting discrete_action_vocab_size, which defaults to None (src/opentau/policies/pi07/gemma3_with_expert.py:116).

Gemma3WithExpertModel.__init__ at src/opentau/policies/pi07/gemma3_with_expert.py:572 then calls:

self.discrete_action_embedding = nn.Embedding(
    num_embeddings=config.discrete_action_vocab_size,   # None
    embedding_dim=text_hidden,
    padding_idx=0,
)

which fails inside torch.nn.modules.sparse.Embedding.__init__: the weight is allocated with torch.empty((num_embeddings, embedding_dim), ...), and a shape containing None is not a valid argument, hence the TypeError above.
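The failure can be reproduced outside the test suite with a few lines. This is a minimal sketch, not code from the repository: try_build_embedding is a hypothetical helper, and the hidden size of 8 is an arbitrary stand-in for text_hidden.

```python
import torch.nn as nn


def try_build_embedding(vocab_size, hidden=8):
    """Build the embedding, or return the TypeError raised while doing so.

    `hidden` is an arbitrary stand-in for text_hidden (assumption).
    """
    try:
        return nn.Embedding(
            num_embeddings=vocab_size,
            embedding_dim=hidden,
            padding_idx=0,
        )
    except TypeError as exc:
        # torch.empty((None, hidden), ...) rejects the None dimension.
        return exc


ok = try_build_embedding(16)     # a valid vocab size constructs normally
bad = try_build_embedding(None)  # mirrors the unset config default

print(type(ok).__name__, type(bad).__name__)
```

With padding_idx=0 there is no earlier range check on num_embeddings, so the first place the None surfaces is the weight allocation.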

The da_head = nn.Linear(out_features=config.discrete_action_vocab_size, ...) call on line 579 has the same dependency. Anyone who constructs a bare Gemma3WithExpertConfig without going through the high-level / low-level wrappers hits this; those wrappers inject the FAST tokenizer vocab via discrete_action_vocab_size=getattr(self.discrete_action_processor, "vocab_size", None).

Fix options

  • Pass an explicit discrete_action_vocab_size=... in test_fsdp_wrap_forward_backward (matches what the production wrappers do).
  • Or guard Gemma3WithExpertModel.__init__ so the discrete-action head is only built when discrete_action_vocab_size is not None, which would also harden the constructor against future direct callers.
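The second option could look roughly like the sketch below. It is illustrative only: DiscreteActionHead is a hypothetical stand-in for the relevant part of Gemma3WithExpertModel.__init__, and using the hidden size as in_features of the linear layer is an assumption, since the issue elides that argument.

```python
from typing import Optional

import torch.nn as nn


class DiscreteActionHead(nn.Module):
    """Hypothetical stand-in for the discrete-action part of the constructor."""

    def __init__(self, hidden: int, vocab_size: Optional[int]):
        super().__init__()
        if vocab_size is not None:
            self.discrete_action_embedding = nn.Embedding(
                num_embeddings=vocab_size,
                embedding_dim=hidden,
                padding_idx=0,
            )
            # in_features=hidden is an assumption; the issue shows only out_features.
            self.da_head = nn.Linear(in_features=hidden, out_features=vocab_size)
        else:
            # No vocab configured: skip the head instead of failing in torch.empty.
            self.discrete_action_embedding = None
            self.da_head = None

    @property
    def has_discrete_actions(self) -> bool:
        return self.da_head is not None


guarded = DiscreteActionHead(hidden=8, vocab_size=None)
built = DiscreteActionHead(hidden=8, vocab_size=16)
print(guarded.has_discrete_actions, built.has_discrete_actions)
```

A guard like this also means any forward path that uses the head has to check has_discrete_actions (or an equivalent flag) first, so the first option remains the smaller change if only the test is affected.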

Labels

bug: Something isn't working
