I'm just a bit confused about how the loss is computed. From my understanding, for a given training loop we have a ground-truth image, denoted GT. GT is passed through a series of 1000 timesteps t, and at each timestep a small amount of random Gaussian noise is added.
Let's say the network takes as input the noisy GT image from timestep 50. The network should predict the small amount of noise that was added at timestep 50, right? So when we compute the loss, it should be the noise the network predicted for timestep 50 vs. the actual noise that was added at timestep 50? Or am I understanding it wrong?
In that case, why, when the loss is calculated, is the value for the actual noise computed as torch.randn_like(y_0) and not the noise at t=50?
# Draw a fresh Gaussian sample to serve as the noise target.
noise = default(noise, lambda: torch.randn_like(y_0))
# q_sample applies the closed-form forward process:
#   y_noisy = sqrt(gamma) * y_0 + sqrt(1 - gamma) * noise
y_noisy = self.q_sample(
    y_0=y_0, sample_gammas=sample_gammas.view(-1, 1, 1, 1), noise=noise)
if mask is not None:
    # Predict the total noise in the masked region, conditioned on y_cond.
    noise_hat = self.denoise_fn(torch.cat([y_cond, y_noisy*mask+(1.-mask)*y_0], dim=1), sample_gammas)
    loss = self.loss_fn(mask*noise, mask*noise_hat)
Thanks for your attention.
Here, only the total noise added to y_0 is predicted. The forward process has a closed form, y_t = sqrt(gamma_t) * y_0 + sqrt(1 - gamma_t) * noise with noise ~ N(0, I), so the chain of per-step noise additions collapses into a single Gaussian draw. That is why torch.randn_like(y_0) is exactly the "actual noise" for the sampled timestep: it is the cumulative noise that corrupts y_0 up to step t, and it is the regression target.
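For reference, here is a minimal, self-contained sketch of this training objective. The names (training_loss, gammas, denoise_fn) are hypothetical stand-ins, not the repository's actual API; gammas plays the role of the cumulative noise schedule that sample_gammas holds in the code above.

import torch

# Hypothetical sketch of the epsilon-prediction objective.
# gammas[t] is the cumulative product of (1 - beta) up to step t;
# denoise_fn is any network that predicts the noise.
def training_loss(denoise_fn, y_0, gammas,
                  loss_fn=torch.nn.functional.mse_loss):
    b = y_0.shape[0]
    # Sample a random timestep per image and look up its cumulative gamma.
    t = torch.randint(0, len(gammas), (b,), device=y_0.device)
    gamma_t = gammas[t].view(-1, 1, 1, 1)
    # A single Gaussian draw IS the total noise: the t-step forward chain
    # collapses to y_t = sqrt(gamma_t) * y_0 + sqrt(1 - gamma_t) * noise.
    noise = torch.randn_like(y_0)
    y_noisy = gamma_t.sqrt() * y_0 + (1 - gamma_t).sqrt() * noise
    # The network recovers that total noise, not a per-step increment.
    noise_hat = denoise_fn(y_noisy, gamma_t.view(b))
    return loss_fn(noise_hat, noise)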
At inference, y_0 is estimated from the predicted noise, and then y_{t-1} is sampled from the posterior distribution q(y_{t-1} | y_t, y_0).
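A sketch of one reverse step, under the same assumptions (p_sample, alphas, and gammas are hypothetical names; the posterior mean and variance follow the standard DDPM formulas, which may differ in detail from the repository's implementation):

import torch

# Hypothetical sketch of one reverse (denoising) step.
# alphas[t] = 1 - beta_t; gammas[t] = cumulative product of alphas up to t.
@torch.no_grad()
def p_sample(denoise_fn, y_t, t, alphas, gammas):
    gamma_t = gammas[t]
    gamma_prev = gammas[t - 1] if t > 0 else gammas.new_ones(())
    # 1) Predict the total noise, then invert the closed form to estimate y_0.
    noise_hat = denoise_fn(y_t, gamma_t.expand(y_t.shape[0]))
    y_0_hat = (y_t - (1 - gamma_t).sqrt() * noise_hat) / gamma_t.sqrt()
    y_0_hat = y_0_hat.clamp(-1.0, 1.0)
    # 2) Sample from the posterior q(y_{t-1} | y_t, y_0_hat):
    #    standard DDPM posterior mean and variance.
    coef_0 = gamma_prev.sqrt() * (1 - alphas[t]) / (1 - gamma_t)
    coef_t = alphas[t].sqrt() * (1 - gamma_prev) / (1 - gamma_t)
    mean = coef_0 * y_0_hat + coef_t * y_t
    var = (1 - gamma_prev) * (1 - alphas[t]) / (1 - gamma_t)
    noise = torch.randn_like(y_t) if t > 0 else torch.zeros_like(y_t)
    return mean + var.sqrt() * noise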