Revert "Update geotransolver for 2d and 3d use cases"#1619
Conversation
This reverts commit b193e03.
/blossom-ci
Greptile Summary

This PR reverts #1502 ("Update geotransolver for 2d and 3d use cases"), removing structured 2D/3D mesh support, the Muon optimizer, multi-model Hydra dispatch, JSONL metrics logging, and the shared free-function refactor (…).
Reviews (1): Last reviewed commit: "Revert "Update geotransolver for 2d and ..."
```python
if use_te:
    raise ValueError(
        "GALE_FA does not support Transformer Engine backend. "
        "Use use_te=False; TE disables FlashAttention for differing q/k sizes in FLARE attention."
    )
super().__init__()
if state_mixing_mode not in ("weighted", "concat_project"):
    raise ValueError(
        f"Invalid state_mixing_mode: {state_mixing_mode!r}. "
        f"Expected 'weighted' or 'concat_project'."
    )
self.state_mixing_mode = state_mixing_mode
self.use_te = use_te
self.heads = heads
self.dim_head = dim_head
self.scale = 1.0
# It is recommended by the FLARE authors to use self.scale = 1 if self.dim_head <= 8 else (self.dim_head ** -0.5)
# but we use self.scale = 1.0 because the recommended scaling is not tested yet.
inner_dim = dim_head * heads

linear_layer = te.Linear if self.use_te else nn.Linear

# Global queries for FLARE self-attention
self.q_global = nn.Parameter(torch.randn(1, heads, n_global_queries, dim_head))

# Linear projections for self-attention
self.in_project_x = linear_layer(dim, inner_dim)
self.self_k = linear_layer(dim_head, dim_head)
self.self_v = linear_layer(dim_head, dim_head)

if context_dim > 0:
    # Linear projections for cross-attention
    self.cross_q = linear_layer(dim_head, dim_head)
    self.cross_k = linear_layer(context_dim, dim_head)
    self.cross_v = linear_layer(context_dim, dim_head)

# Mixing layers for blending self-attention and cross-attention
if state_mixing_mode == "weighted":
    # Learnable mixing weight between self and cross attention
    self.state_mixing = nn.Parameter(torch.tensor(0.0))
else:
    # Concatenate self and cross attention and project back to dim_head
    self.concat_project = nn.Sequential(
        linear_layer(2 * dim_head, dim_head),
        nn.GELU(),
    )

# te attention
if self.use_te:
    self.attn_fn = te.DotProductAttention(
        num_attention_heads=self.heads,
        kv_channels=self.dim_head,
        attention_dropout=dropout,
        qkv_format="bshd",
        softmax_scale=self.scale
    )
```
Unreachable TE code paths throughout constructor and `forward`

`use_te=True` raises `ValueError` at line 129 before `super().__init__()` is ever called, so `self.use_te` is always `False`. Every subsequent `if self.use_te:` branch, including `self.attn_fn = te.DotProductAttention(...)` in `__init__` (lines 177–184) and the entire TE blocks inside `forward()` (lines 237–249, 261–268), is permanently dead code. This also means the `te.Linear` option in `linear_layer` (line 149) can never be selected. Consider removing the dead branches or adding a TODO comment explaining that TE support is planned for the future.
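If the reverted code is reintroduced later, one way to act on this comment would be to keep the guard and drop the dead branches entirely. A minimal sketch under that assumption (the class name `FlareAttentionSketch`, the trimmed argument list, and the defaults are hypothetical, not the GALE_FA implementation):

```python
import torch
import torch.nn as nn


class FlareAttentionSketch(nn.Module):
    """Sketch of the constructor with the unreachable TE branches removed.

    Hypothetical simplification: argument names mirror the snippet above,
    but this is not the GALE_FA implementation from the reverted PR.
    """

    def __init__(self, dim, heads=8, dim_head=64, n_global_queries=16, use_te=False):
        if use_te:
            # TODO: Transformer Engine support is planned; until it lands, it is
            # rejected up front, so no `if self.use_te:` branch could ever run and
            # the te.Linear / te.DotProductAttention paths are omitted entirely.
            raise ValueError("Transformer Engine backend is not supported yet.")
        super().__init__()
        self.heads = heads
        self.dim_head = dim_head
        self.scale = 1.0
        inner_dim = dim_head * heads

        # Plain nn.Linear only: te.Linear could never be selected while the guard exists.
        self.q_global = nn.Parameter(torch.randn(1, heads, n_global_queries, dim_head))
        self.in_project_x = nn.Linear(dim, inner_dim)
        self.self_k = nn.Linear(dim_head, dim_head)
        self.self_v = nn.Linear(dim_head, dim_head)
```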
```
    Whether to use learned concrete dropout instead of standard dropout.
    Default is ``False``.
state_mixing_mode : str, optional
    How to blend self-attention and cross-attention outputs.  ``"weighted"`` uses
```
Same extra-space docstring issue as in `gale.py` for `state_mixing_mode`: extraneous spaces between the period and the first backtick-quoted option.
Suggested change:

```diff
-    How to blend self-attention and cross-attention outputs.  ``"weighted"`` uses
+    How to blend self-attention and cross-attention outputs. ``"weighted"`` uses
```
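For context on what the two documented options do at the blend point, here is a minimal sketch (the helper name `mix_states`, the sigmoid gate, and the tensor shapes are assumptions, not taken from the reverted code):

```python
import torch
import torch.nn as nn


def mix_states(self_out, cross_out, mode="weighted", state_mixing=None, concat_project=None):
    """Illustrative blend of self-attention and cross-attention outputs.

    Assumes both inputs are (batch, heads, tokens, dim_head); whether the
    learnable weight is squashed through a sigmoid is an assumption here.
    """
    if mode == "weighted":
        alpha = torch.sigmoid(state_mixing)  # scalar gate in (0, 1)
        return alpha * self_out + (1.0 - alpha) * cross_out
    # "concat_project": concatenate on the feature dim, project back to dim_head
    return concat_project(torch.cat([self_out, cross_out], dim=-1))


dim_head = 64
self_out = torch.randn(2, 8, 128, dim_head)
cross_out = torch.randn(2, 8, 128, dim_head)
gate = nn.Parameter(torch.tensor(0.0))  # mirrors self.state_mixing in the constructor above
proj = nn.Sequential(nn.Linear(2 * dim_head, dim_head), nn.GELU())

mixed_weighted = mix_states(self_out, cross_out, "weighted", state_mixing=gate)
mixed_concat = mix_states(self_out, cross_out, "concat_project", concat_project=proj)
```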
None of the greptile comments will be resolved because we are reverting here.
This PR fixes the issue with the CFD checkpoints.
Reverts #1502