GPU test failure: TestGemma3WithExpertFSDPWrap.test_fsdp_wrap_forward_backward — nn.Embedding(num_embeddings=None)

## Failing job

[GPU pytest run 25465660575 / job 74718218197](https://github.com/TensorAuto/OpenTau/actions/runs/25465660575/job/74718218197)

```
FAILED tests/policies/test_pi07_low_level.py::TestGemma3WithExpertFSDPWrap::test_fsdp_wrap_forward_backward
TypeError: empty() received an invalid combination of arguments - got (tuple, dtype=NoneType, device=NoneType), ...
```

## Root cause

`tests/policies/test_pi07_low_level.py:866-876` constructs `Gemma3WithExpertConfig` without setting `discrete_action_vocab_size`, which defaults to `None` (`src/opentau/policies/pi07/gemma3_with_expert.py:116`).

`Gemma3WithExpertModel.__init__` at `src/opentau/policies/pi07/gemma3_with_expert.py:572` then calls:

```python
self.discrete_action_embedding = nn.Embedding(
    num_embeddings=config.discrete_action_vocab_size,   # None
    embedding_dim=text_hidden,
    padding_idx=0,
)
```

which blows up inside `torch.nn.modules.sparse.Embedding.__init__` because `torch.empty((None, embedding_dim), ...)` is not a valid call.

`da_head = nn.Linear(out_features=config.discrete_action_vocab_size, ...)` on line 579 has the same dependency — anyone constructing the bare `Gemma3WithExpertConfig` without going through the high-level / low-level wrappers (which inject the FAST tokenizer vocab via `discrete_action_vocab_size=getattr(self.discrete_action_processor, "vocab_size", None)`) hits this.

## Fix options

- Pass an explicit `discrete_action_vocab_size=...` in `test_fsdp_wrap_forward_backward` (matches what the production wrappers do).
- Or guard `Gemma3WithExpertModel.__init__` so the discrete-action head is only built when `discrete_action_vocab_size is not None`, which would also harden the constructor against future direct callers.

## Provenance

- Test added in #273 (`e7a1da7`, `feat(pi07,fsdp): enable FSDP-FULL_SHARD ...`).
- `discrete_action_embedding` / `da_head` introduced in #244 (`1ab755a`, π0.7 initial landing).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GPU test failure: TestGemma3WithExpertFSDPWrap.test_fsdp_wrap_forward_backward — nn.Embedding(num_embeddings=None) #281

Failing job

Root cause

Fix options

Provenance

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

GPU test failure: TestGemma3WithExpertFSDPWrap.test_fsdp_wrap_forward_backward — nn.Embedding(num_embeddings=None) #281

Description

Failing job

Root cause

Fix options

Provenance

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions