Failing job
GPU pytest run 25465660575 / job 74718218197
FAILED tests/policies/test_pi07_low_level.py::TestGemma3WithExpertFSDPWrap::test_fsdp_wrap_forward_backward
TypeError: empty() received an invalid combination of arguments - got (tuple, dtype=NoneType, device=NoneType), ...
Root cause
tests/policies/test_pi07_low_level.py:866-876 constructs Gemma3WithExpertConfig without setting discrete_action_vocab_size, which defaults to None (src/opentau/policies/pi07/gemma3_with_expert.py:116).
Gemma3WithExpertModel.__init__ at src/opentau/policies/pi07/gemma3_with_expert.py:572 then calls:
    self.discrete_action_embedding = nn.Embedding(
        num_embeddings=config.discrete_action_vocab_size,  # None
        embedding_dim=text_hidden,
        padding_idx=0,
    )
which raises the TypeError above inside torch.nn.modules.sparse.Embedding.__init__: the weight is allocated with torch.empty((None, embedding_dim), ...), and None is not a valid dimension.
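The failure can be reproduced without the model at all. This is a minimal sketch that assumes only that torch is installed; text_hidden = 16 is a hypothetical size, not the real Gemma 3 hidden dimension:

    import torch.nn as nn

    text_hidden = 16  # hypothetical; the real value comes from the Gemma 3 text config
    vocab_size = None  # Gemma3WithExpertConfig's default for discrete_action_vocab_size

    try:
        # Same arguments as the constructor call above, with the vocab size left at None.
        nn.Embedding(num_embeddings=vocab_size, embedding_dim=text_hidden, padding_idx=0)
    except TypeError as exc:
        error_message = str(exc)
    else:
        error_message = ""

With padding_idx=0 the index checks in Embedding.__init__ are skipped, so the first place None is actually used is the torch.empty weight allocation, which is where the job's traceback ends.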
The da_head = nn.Linear(out_features=config.discrete_action_vocab_size, ...) on line 579 has the same dependency. Anyone who constructs a bare Gemma3WithExpertConfig without going through the high-level / low-level wrappers (which inject the FAST tokenizer vocab via discrete_action_vocab_size=getattr(self.discrete_action_processor, "vocab_size", None)) hits this.
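The difference between the test path and the wrapper path can be sketched with hypothetical stand-ins. ToyExpertConfig, ToyFastTokenizer, config_from_processor and the vocab size 2048 are all illustrative names and values, not the opentau API; only the getattr expression is taken from the wrappers:

    from dataclasses import dataclass
    from typing import Optional


    @dataclass
    class ToyExpertConfig:
        # Hypothetical stand-in for Gemma3WithExpertConfig; only the relevant field.
        discrete_action_vocab_size: Optional[int] = None


    class ToyFastTokenizer:
        # Hypothetical stand-in for the FAST discrete-action processor.
        def __init__(self, vocab_size: int) -> None:
            self.vocab_size = vocab_size


    def config_from_processor(processor: object) -> ToyExpertConfig:
        # Mirrors the wrappers' injection: fall back to None if vocab_size is missing.
        return ToyExpertConfig(
            discrete_action_vocab_size=getattr(processor, "vocab_size", None)
        )


    bare = ToyExpertConfig()  # what the FSDP test effectively does
    wrapped = config_from_processor(ToyFastTokenizer(2048))  # what the wrappers do
    no_vocab = config_from_processor(object())  # a processor without vocab_size

As far as the quoted expression goes, the getattr default means a processor lacking vocab_size would also produce None and reach the same crash, so the wrappers are protected only as long as the processor exposes that attribute.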
Fix options
- Pass an explicit discrete_action_vocab_size=... in test_fsdp_wrap_forward_backward (this matches what the production wrappers do).
- Or guard Gemma3WithExpertModel.__init__ so the discrete-action embedding and head (discrete_action_embedding, da_head) are only built when discrete_action_vocab_size is not None, which would also harden the constructor against future direct callers.
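The second option could look roughly like the sketch below. DiscreteActionHeadsSketch is a hypothetical stand-in for the discrete-action part of Gemma3WithExpertModel.__init__, not the real class; the da_head in_features is assumed to be text_hidden because the original call elides it with "...":

    from typing import Optional

    import torch
    import torch.nn as nn


    class DiscreteActionHeadsSketch(nn.Module):
        """Hypothetical stand-in: build the discrete-action layers only when a vocab size is known."""

        def __init__(self, text_hidden: int, discrete_action_vocab_size: Optional[int]) -> None:
            super().__init__()
            if discrete_action_vocab_size is not None:
                self.discrete_action_embedding = nn.Embedding(
                    num_embeddings=discrete_action_vocab_size,
                    embedding_dim=text_hidden,
                    padding_idx=0,
                )
                # Assumption: the head projects from text_hidden to the vocab.
                self.da_head = nn.Linear(text_hidden, discrete_action_vocab_size)
            else:
                # Plain None attributes keep the names stable for callers that check them.
                self.discrete_action_embedding = None
                self.da_head = None

        @property
        def has_discrete_actions(self) -> bool:
            return self.discrete_action_embedding is not None

        def embed_discrete_actions(self, tokens: torch.Tensor) -> torch.Tensor:
            # Fail loudly at use time instead of at construction time.
            if self.discrete_action_embedding is None:
                raise RuntimeError("discrete_action_vocab_size is None; discrete-action layers were not built")
            return self.discrete_action_embedding(tokens)


    with_vocab = DiscreteActionHeadsSketch(text_hidden=16, discrete_action_vocab_size=32)
    without_vocab = DiscreteActionHeadsSketch(text_hidden=16, discrete_action_vocab_size=None)
    embedded = with_vocab.embed_discrete_actions(torch.tensor([[1, 2, 0]]))

With this shape, a bare config (as in the FSDP test) yields a module with no discrete-action parameters, while every forward path that uses those layers has to check for None, which is the cost of moving the failure from construction time to use time.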
Provenance
e7a1da7, feat(pi07,fsdp): enable FSDP-FULL_SHARD ...).
discrete_action_embedding / da_head were introduced in feat(pi07): π0.7 policy with Gemma 3 backbone and SpaceTime SigLIP video encoder #244 (1ab755a, π0.7 initial landing).