AutoencoderKLWan - support gradient_checkpointing #11071

Open · agwmon opened this issue Mar 16, 2025 · 3 comments · May be fixed by #11105

agwmon commented Mar 16, 2025

Do you have a plan for supporting gradient checkpointing for AutoencoderKLWan?

Thank you for always working hard for open source 🙏🙏

a-r-r-o-w (Member) commented

Hey @agwmon! We'd love it if you could contribute the changes. Any of the other modeling implementations are good examples of how to apply it. I think you will just have to wrap the up/down/mid and resnet blocks with `self._gradient_checkpointing_func`.
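For reference, here is a minimal runnable sketch of that pattern on a toy module. It assumes `self._gradient_checkpointing_func` behaves like `torch.utils.checkpoint.checkpoint` (which I believe it wraps); the module and names are illustrative, not the actual Wan VAE code:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class ToyDecoder(nn.Module):
    # Toy stand-in for a decoder with resnet-style blocks.
    def __init__(self):
        super().__init__()
        self.resnets = nn.ModuleList([nn.Conv2d(4, 4, 3, padding=1) for _ in range(3)])
        self.gradient_checkpointing = True

    def forward(self, x):
        for resnet in self.resnets:
            if torch.is_grad_enabled() and self.gradient_checkpointing:
                # Recompute this block during backward instead of storing
                # its activations, trading compute for memory.
                x = checkpoint(resnet, x, use_reentrant=False)
            else:
                x = resnet(x)
        return x

model = ToyDecoder()
out = model(torch.randn(1, 4, 8, 8, requires_grad=True))
out.sum().backward()
```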

victolee0 (Contributor) commented Mar 18, 2025

@a-r-r-o-w
I'm encountering an error when running the test code in my PR with gradient checkpointing enabled.

When backward() is called, the following function is executed:

`x = self._gradient_checkpointing_func(resnet, x, feat_cache, feat_idx)`

This call runs the forward method of `ResidualBlock`. The problem is that the forward method includes `feat_idx[0] += 1`: because checkpointing re-runs the forward during backward, the shared counter is incremented again, which eventually pushes `feat_idx[0]` past the end of `feat_cache` and raises an `IndexError`. (A standalone repro of this double-increment follows the snippet below.)

```python
def forward(self, x, feat_cache=None, feat_idx=[0]):
    # Apply shortcut connection
    h = self.conv_shortcut(x)
    # First normalization and activation
    x = self.norm1(x)
    x = self.nonlinearity(x)
    if feat_cache is not None:
        idx = feat_idx[0]
        cache_x = x[:, :, -CACHE_T:, :, :].clone()
        if cache_x.shape[2] < 2 and feat_cache[idx] is not None:
            cache_x = torch.cat([feat_cache[idx][:, :, -1, :, :].unsqueeze(2).to(cache_x.device), cache_x], dim=2)
        x = self.conv1(x, feat_cache[idx])
        feat_cache[idx] = cache_x
        feat_idx[0] += 1
    else:
        x = self.conv1(x)
    # Second normalization and activation
    x = self.norm2(x)
    x = self.nonlinearity(x)
    # Dropout
    x = self.dropout(x)
    if feat_cache is not None:
        idx = feat_idx[0]
        cache_x = x[:, :, -CACHE_T:, :, :].clone()
        if cache_x.shape[2] < 2 and feat_cache[idx] is not None:
            cache_x = torch.cat([feat_cache[idx][:, :, -1, :, :].unsqueeze(2).to(cache_x.device), cache_x], dim=2)
        x = self.conv2(x, feat_cache[idx])
        feat_cache[idx] = cache_x
        feat_idx[0] += 1
    else:
        x = self.conv2(x)
    # Add residual connection
    return x + h
```
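The double-increment can be reproduced standalone, independent of the VAE. A minimal sketch (toy function and names, assuming the non-reentrant `torch.utils.checkpoint` backend):

```python
import torch
from torch.utils.checkpoint import checkpoint

def fn(x, counter=[0]):
    counter[0] += 1  # in-place mutation of shared state, like feat_idx[0] += 1
    return x * 2

x = torch.randn(2, requires_grad=True)
y = checkpoint(fn, x, use_reentrant=False)
y.sum().backward()

# fn ran twice: once in the forward pass and once more when checkpointing
# recomputed it during backward, so the shared counter is now [2], not [1].
# With a bounded feat_cache, these extra increments overrun the list.
print(fn.__defaults__[0])
```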

I'm not sure how to resolve this issue and would appreciate some guidance. Is there a recommended way to handle the feat_idx incrementing when using gradient checkpointing?

Thank you for your help!

a-r-r-o-w (Member) commented

Hmm, there's not really an easy way around this from a quick look. I believe what we're doing in this code is frame-wise forwards. We don't strictly need the cache here, but it saves some computation and speeds up decoding a bit.

A VAE refactor might be needed here, or at least handling `feat_idx` without in-place mutation. @yiyixuxu Do you have plans to refactor this (as we discussed that we'd merge the Wan PR quickly and refactor later), or should I take a stab at it?
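One possible shape for the "not in-place" handling, sketched on a toy function (an assumption about the refactor, not the actual fix in #11105): freeze the index for each checkpointed call and advance the shared counter outside it, so the backward-time recompute re-reads the same cache slot.

```python
import torch
from torch.utils.checkpoint import checkpoint

def block(x, cache, idx):
    cache[idx] = x.detach().clone()  # deterministic write; redoing it on recompute is harmless
    return x * 2

x = torch.randn(2, requires_grad=True)
cache, feat_idx = [None, None], [0]

# Pass the index by value and advance the shared counter outside the
# checkpointed region, so the recompute during backward sees the same idx.
y = checkpoint(block, x, cache, feat_idx[0], use_reentrant=False)
feat_idx[0] += 1

y.sum().backward()  # block reruns with the same frozen idx; no IndexError
```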
