mv adapt implementation from gitlab; upgrade to 3D #13

Merged
pzhanggit merged 2 commits into ORNL:main from pzhanggit:cleanup-adapttoken
Jan 12, 2026

Conversation

@pzhanggit
Collaborator

This is the long-delayed move of the adaptive tokenization implementation from the GitLab repo to here:

  • move adaptive tokens from the data pipeline to the model forward pass
  • support 3D and multiple levels (not tested yet)
  • clean up the code

To test the code, run sbatch submit_batch_tests.sh and sbatch submit_batch_adapt.sh on Frontier.

forward_time = forward_end - model_start
if torch.isnan(loss) or not torch.isfinite(loss):
    print(f"NaN detected in loss at batch {batch_idx}. Skipping batch...")
    continue
Collaborator


I'm wondering how this skip would affect the loss when gradient accumulation is turned on. I think it might lead to an incorrect loss calculation for the samples near the skipped ones, but I'm not sure exactly how it would behave.

Collaborator Author


Good point, it will break gradient accumulation. Let me disable the skip when gradient accumulation is enabled.

for idim, dim in enumerate(space_dims):
    ntokendim.append(dim//ps[idim])
num_tokens = reduce(mul, ntokendim)
#T,B,C,D,H,W-->T,B,C,ntz,ntx,nty,psz,psx,psy->B,C,ntz,ntx,nty->B,ntz,nty,nty
Collaborator


Last dimensions in the comment should be B,ntz,ntx,nty?

variance = xdata.unfold(2,ps[0],ps[0]).unfold(3,ps[1],ps[1]).unfold(4,ps[2],ps[2]).var(dim=(0,-3,-2,-1)).mean(dim=0)
assert ntokendim==list(variance.shape)
variance = rearrange(variance, 'ntz ntx nty -> (ntz ntx nty)')
#T,B,C,D,H,W-->T,B,C,ntz,ntx,nty,psz,psx,psy->B,C,ntz,ntx,nty->B,ntz,nty,nty
Collaborator


last dimension: B,ntz,ntx,nty?

#mask2d_padding = repeat(mask_padding, 'b c1 len -> b (len1 c1) len', len1=len).unsqueeze(1)
valid = mask_padding.squeeze(1).to(torch.bool) #(B, L),True=meaningful
mask2d_padding = valid[:, None, None, :] #(B,1,1,L)
#without this backend, ran into RuntimeError: Function 'ScaledDotProductEfficientAttentionBackward0' returned nan values in its 0th output.
Collaborator


Seems like we could use this SDPA backend selector to select a flash attention backend instead of using flash_attn_func, if we want. It's not clear whether they are exactly the same, though.
Also, F.scaled_dot_product_attention is called on line 268 as well, in the forward of Attention2DBlock, so maybe add the backend there too for consistency?

Collaborator Author


We could, but this is a temporary fix. It might be related to the PyTorch version (pytorch/pytorch#119320). We will come back and test it later.

Re line 268: we do not want to change it. We still want the more efficient implementations as long as they do not break the runs.

@pzhanggit pzhanggit requested a review from TsChala January 12, 2026 20:42