The images generated by diffusers CNXS for **SD2.1** with Canny input are clearly not right.

Interestingly, the intermediate results are identical for the first denoising step (see `Compare intermediate results -- 15.ipynb`), yet the final image is still too brown.

Let's look at all denoiser step outputs now.

In [53]:
import os
os.environ['CUBLAS_WORKSPACE_CONFIG'] = ':4096:8'  # needed to make torch deterministic
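As far as I know, the environment variable on its own does not make runs deterministic: PyTorch only requires it once `torch.use_deterministic_algorithms(True)` is enabled on CUDA, and it should be set before the first CUDA call (as done above, before `import torch`). A minimal sketch of the rest of the setup, assuming a fixed seed is wanted; `make_deterministic` is a hypothetical helper, not part of this notebook's code.

```python
import os
import random

import torch


def make_deterministic(seed=0):
    """Seed the Python and torch RNGs and ask PyTorch for deterministic kernels.

    Assumption: this runs before any CUDA/cuBLAS work, because deterministic
    cuBLAS calls need CUBLAS_WORKSPACE_CONFIG to be set.
    """
    os.environ.setdefault('CUBLAS_WORKSPACE_CONFIG', ':4096:8')
    random.seed(seed)
    torch.manual_seed(seed)  # seeds the CPU generator and all CUDA devices
    torch.use_deterministic_algorithms(True)  # error out on known non-deterministic ops
    torch.backends.cudnn.benchmark = False  # autotuning could pick different kernels per run
```

Calling it twice with the same seed should give identical `torch.randn` draws on the same device.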

In [54]:
import torch
from torch.testing import assert_close
from torch import allclose, nn, tensor
torch.set_printoptions(linewidth=200, precision=3, sci_mode=False)

In [55]:
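# fall back to CPU if neither CUDA nor Apple's MPS backend is available
device = 'cuda' if torch.cuda.is_available() else ('mps' if torch.backends.mps.is_available() else 'cpu')
device_dtype = torch.float16 if device == 'cuda' else torch.float32  # fp16 only on CUDA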

## Load logs

In [56]:
from diffusers.umer_debug_logger import UmerDebugLogger

In [68]:
cloud_cuda = UmerDebugLogger.load_log_objects_from_dir('logs/cloud')
local_cuda = UmerDebugLogger.load_log_objects_from_dir('logs/local_cuda')

print(len(cloud_cuda), len(local_cuda))

for i, (c,l) in enumerate(zip(cloud_cuda, local_cuda)):
    if c.msg!=l.msg: print(f'{i:<3}{c.msg:>20}{l.msg:>20}')

550 550


**Q:** I'm running a 50-step denoising loop. Why does diffusers produce 51 outputs? (Edit: now 51 × 4 = 204.)
**A:** After changing `prediction_type` to `"epsilon"`, diffusers produces only 4 × 50 = 200 outputs.
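To see which logged tensors account for an extra step, it can help to count how often each message name occurs. A small sketch, assuming only that each log object has a `.msg` attribute (as used above); `count_messages` and `off_by_step_counts` are hypothetical helpers. Names that are logged once per run (rather than once per step) will also show up in the result, which is expected.

```python
from collections import Counter


def count_messages(logs):
    """Count how often each message name occurs in a list of log objects with a `.msg` attribute."""
    return Counter(entry.msg for entry in logs)


def off_by_step_counts(logs, n_steps):
    """Return {msg: count} for message names that do not occur exactly n_steps times."""
    return {msg: count for msg, count in count_messages(logs).items() if count != n_steps}
```

For example, `off_by_step_counts(local_cuda, n_steps=50)` would list every message name that was logged 51 times instead of 50.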

In [69]:
from itertools import zip_longest

# print only the positions where one log has an entry and the other does not
for i, (c,l) in enumerate(zip_longest(cloud_cuda, local_cuda)):
    if c is None:
        print(f'{i:<3}{"-":>25}{l.msg:>25}')
    elif l is None:
        print(f'{i:<3}{c.msg:>25}{"-":>25}')
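`zip_longest` pairs entries by position, so a single extra entry early in one log shifts every later row. One alternative, swapped in from the standard library rather than taken from the logger, is to align the two message sequences with `difflib.SequenceMatcher` and report only the runs that differ. `diff_message_sequences` is a hypothetical helper; `autojunk=False` matters here because the logs repeat the same few names many times, and the default heuristic would otherwise treat frequent names as junk.

```python
from difflib import SequenceMatcher


def diff_message_sequences(cloud_msgs, local_msgs):
    """Align two lists of message names and return only the non-matching runs.

    Each item is (tag, cloud_run, local_run, cloud_range, local_range), where tag is
    'replace', 'delete' (present only in cloud) or 'insert' (present only in local).
    """
    matcher = SequenceMatcher(a=cloud_msgs, b=local_msgs, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != 'equal':
            diffs.append((tag, cloud_msgs[i1:i2], local_msgs[j1:j2], (i1, i2), (j1, j2)))
    return diffs
```

Usage: `diff_message_sequences([c.msg for c in cloud_cuda], [l.msg for l in local_cuda])`.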

## Compare intermediate results

In [70]:
def mae(t1,t2):
    assert t1.shape==t2.shape
    return (t1-t2).abs().mean()

In [71]:
from functools import partial
from util_inspect import fmt_bool

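def compare_intermediate_results(n=None,n_start=0,prec=5, compare_prec=3, ignore_base=False):
    # note: ignore_base is currently unused; it is kept so existing calls keep working
    # both logs are indexed in lockstep below, so only compare their common length
    if n is None: n=min(len(cloud_cuda), len(local_cuda))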

    print(f'{"":<3} | {"block":<30} | {"shape":<20} | {"same names?":<12} | {"same shapes?":<12} | {"same values?":<12} | {"Δ cuda local -> cloud":<20}')
    print(f'{"":<3} | {"":<30} | {"":<20} | {"":<12} | {"":<12} | {"prec="+str(compare_prec):^12} | {"prec="+str(prec):^20}')

    def calc_total_len(lens): return sum(lens)+3*len(lens)-1
    total_len = calc_total_len((3,30,20,12,12,12,20))

    line = partial(
        lambda txt, width: print(txt * (width//len(txt))),
        width=total_len
    )
    
    line('#')
    for i in range(n_start,n):
        cc,lc = cloud_cuda[i], local_cuda[i]
                
        eq_name = cc.msg==lc.msg
        eq_shape = cc.shape==lc.shape
        
        if eq_shape:
            eq_vals = torch.allclose(cc.t,lc.t,atol=10**-compare_prec)
            mae_2 = mae(lc.t,cc.t) 
            mae_2 = ("{:>20."+str(prec)+"f}").format(mae_2)
        else:
            eq_vals,mae_2=False,f'{"inf":>20}'  # right-aligned like the numeric column
        
        print(f'{i+1:<3} | {cc.msg:<30} | {cc.shape:>20} | {fmt_bool(eq_name, "^12")} | {fmt_bool(eq_shape, "^12")} | {fmt_bool(eq_vals, "^12")} | {mae_2}')
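With hundreds of rows, it can be quicker to jump straight to the first entry where the two runs disagree. A minimal sketch, assuming the log objects expose `.msg` and a tensor `.t` as in the function above; `first_divergence` is a hypothetical helper, not part of `UmerDebugLogger`. Note that it returns a 0-based index, while the table above prints `i+1`.

```python
import torch


def first_divergence(cloud_logs, local_logs, atol=1e-3, rtol=1e-5):
    """Return (index, msg, max_abs_diff) for the first pair of logged tensors that
    is not allclose, or None if all compared pairs match.

    Only the common prefix of the two lists is compared. A name or shape mismatch
    is reported with max_abs_diff = inf.
    """
    for i, (c, l) in enumerate(zip(cloud_logs, local_logs)):
        if c.msg != l.msg or c.t.shape != l.t.shape:
            return i, c.msg, float('inf')
        # move to CPU and float32 so fp16 and fp32 logs are compared the same way
        ct, lt = c.t.detach().float().cpu(), l.t.detach().float().cpu()
        if not torch.allclose(ct, lt, atol=atol, rtol=rtol):
            return i, c.msg, (ct - lt).abs().max().item()
    return None
```

Usage: `first_divergence(cloud_cuda, local_cuda, atol=1e-3)`.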

In [72]:
compare_intermediate_results(compare_prec=3, prec=3, ignore_base=False)

    | block                          | shape                | same names?  | same shapes? | same values? | Δ cuda local -> cloud
    |                                |                      |              |              |    prec=3    |        prec=3       
#################################################################################################################################
1   | noise_pred_uncond              |       [1, 4, 64, 64] |      y       |      y       |      y       |                0.000
2   | noise_pred_text                |       [1, 4, 64, 64] |      y       |      y       |      y       |                0.000
3   | noise_pred__after_cfg          |       [1, 4, 64, 64] |      y       |      y       |      n       |                0.000
4   | pred_x0                        |       [1, 4, 64, 64] |      y       |      y       |      n       |
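Row 3 reports "same values? n" together with a mean absolute difference that rounds to 0.000. That is not a contradiction: `torch.allclose` is element-wise (it requires |a − b| ≤ atol + rtol·|b| for every element), so a few outliers can fail it while the mean stays below 0.0005. A small sketch that reports both views at once; `diff_stats` is a hypothetical helper, and the tolerance formula mirrors the one `torch.allclose` documents.

```python
import torch


def diff_stats(a, b, atol=1e-3, rtol=1e-5):
    """Summarize how two tensors differ: mean and max absolute difference, and the
    fraction of elements violating the allclose tolerance |a - b| <= atol + rtol * |b|."""
    a, b = a.float(), b.float()
    abs_diff = (a - b).abs()
    violates = abs_diff > atol + rtol * b.abs()
    return {
        'mean_abs': abs_diff.mean().item(),
        'max_abs': abs_diff.max().item(),
        'frac_out_of_tol': violates.float().mean().item(),
    }
```

For row 3 (index 2), that would be `diff_stats(cloud_cuda[2].t, local_cuda[2].t)`.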

Let's save the noise used by Heidelberg so that we can reuse it in diffusers.
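For context on where this noise enters, assuming the Heidelberg sampler performs standard DDIM steps (the output file name `ddim_noises.pth` suggests so): writing $\bar\alpha_t$ for the cumulative product of the alphas (`alphas_cumprod` in diffusers) and $t-1$ for the previous timestep in the strided schedule, a DDIM step (Song et al., 2020) computes

$$
\hat{x}_0 = \frac{x_t - \sqrt{1-\bar\alpha_t}\,\epsilon_\theta(x_t, t)}{\sqrt{\bar\alpha_t}}, \qquad
x_{t-1} = \sqrt{\bar\alpha_{t-1}}\,\hat{x}_0 + \sqrt{1-\bar\alpha_{t-1}-\sigma_t^2}\;\epsilon_\theta(x_t, t) + \sigma_t\, z_t,
$$

$$
\sigma_t = \eta\,\sqrt{\frac{1-\bar\alpha_{t-1}}{1-\bar\alpha_t}}\,\sqrt{1-\frac{\bar\alpha_t}{\bar\alpha_{t-1}}}, \qquad z_t \sim \mathcal{N}(0, I).
$$

Here $\hat{x}_0$ presumably corresponds to the logged `pred_x0`, and $z_t$ is the per-step noise being saved. With $\eta = 0$ we get $\sigma_t = 0$ and the noise term vanishes, so replaying it should only change the result when $\eta > 0$.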

In [65]:
c_noises = []
noise_shape = None

for i,c in enumerate(cloud_cuda):
    if c.msg == 'noise':
        if noise_shape is None: noise_shape = c.t.shape
        else: assert c.t.shape == noise_shape
        c_noises.append(c.t)
        
print(f'Found {len(c_noises)} noises, all of shape {noise_shape}')

Found 50 noises, all of shape torch.Size([1, 4, 64, 64])


In [66]:
stacked_noises = torch.stack(c_noises)
stacked_noises.shape

torch.Size([50, 1, 4, 64, 64])

In [67]:
torch.save(stacked_noises, 'ddim_noises.pth')
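On the diffusers side, each step then needs its own slice of the stacked tensor. A minimal bookkeeping sketch, assuming step `i` of the Heidelberg run corresponds to step `i` of the diffusers loop (both starting at the noisiest timestep); `NoiseReplay` and `take` are hypothetical names, not diffusers API.

```python
import torch


class NoiseReplay:
    """Hand out pre-recorded per-step noise tensors in order.

    Assumes `path` holds a tensor of shape [n_steps, *latent_shape], e.g. the
    [50, 1, 4, 64, 64] tensor saved above.
    """

    def __init__(self, path, device='cpu', dtype=torch.float32):
        self.noises = torch.load(path, map_location='cpu').to(device=device, dtype=dtype)
        self.step = 0

    def __len__(self):
        return self.noises.shape[0]

    def take(self, expected_shape):
        """Return the noise for the current step and advance; check shape and count."""
        if self.step >= len(self):
            raise IndexError(f'only {len(self)} recorded noises available')
        noise = self.noises[self.step]
        if noise.shape != torch.Size(expected_shape):
            raise ValueError(f'recorded noise has shape {tuple(noise.shape)}, expected {tuple(expected_shape)}')
        self.step += 1
        return noise
```

If I read the diffusers API correctly, `DDIMScheduler.step` accepts a `variance_noise` argument that is used instead of sampling fresh noise, so a loop could pass something like `variance_noise=replay.take(latents.shape)` together with `eta`; this is worth checking against the installed diffusers version. Constructing it with `NoiseReplay('ddim_noises.pth', device=device, dtype=device_dtype)` keeps it consistent with the latents.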