
Loss Function Problem #87

@lazyeden

Description

Thank you for your work. I have the following questions to discuss with you:

  1. Why does the loss in Equation 2 of the paper sum the losses over t = 2...T, while the code implementation only samples a single timestep t per batch: t, weights = self.schedule_sampler.sample(micro.shape[0], dist_util.dev())? (See my first sketch after this list.)
  2. Why is the MSE loss in Equation 2 of the paper written as ||EMB(w) - f(z_1, 1)||^2, while the code implements ||EMB(w) - f(z_0, 0)||^2 (see the second sketch after this list):
    t0_mask = (t == 0)                                             # examples whose sampled timestep is 0
    t0_loss = mean_flat((x_start_mean - model_out_x_start) ** 2)   # ||EMB(w) - f(z_0, 0)||^2 per example
    terms["mse"] = th.where(t0_mask, t0_loss, terms["mse"])        # override the usual MSE only where t == 0
  3. What does predict_xstart mean, and how does the result model_out_x_start differ when it is True versus False? (A sketch of my current reading follows this list.)
  4. Is the rounding process reversible? In other words, after generating text, can I work backwards to recover the embedding before rounding, or even go forward again to recover the noise at each step of the denoising process? (See the last sketch after this list.)
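
For question 1, here is a minimal sketch of how I currently read the single-timestep sampling (not the repo's actual code; per_timestep_loss, T, and the tensor shapes are placeholders I made up):

    import torch as th

    T = 1000                          # number of diffusion steps (assumed)
    x_start = th.randn(16, 32, 128)   # dummy (batch, seq_len, emb_dim) latents

    def per_timestep_loss(x0, t):
        # placeholder for the L_t term of Eq. 2 at timestep t, one value per example
        return (x0 ** 2).mean(dim=(1, 2)) * (t.float() + 1.0) / T

    # stand-in for: t, weights = self.schedule_sampler.sample(micro.shape[0], dist_util.dev())
    t = th.randint(0, T, (x_start.shape[0],))
    weights = th.ones(x_start.shape[0])

    loss = (per_timestep_loss(x_start, t) * weights).mean()
    # If t is uniform, E_t[T * L_t] equals the sum over t of L_t, so each batch is a
    # Monte Carlo estimate of the full sum in Eq. 2 rather than an exact evaluation.

My guess is that this is the usual DDPM trick of training on one random timestep per example so that the objective matches Eq. 2 in expectation, but I wanted to confirm.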
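
For question 2, a self-contained sketch of how I read the t == 0 override (the dummy tensors and my stand-in for mean_flat are mine; the last three lines are the repo excerpt quoted above):

    import torch as th

    def mean_flat(x):
        # my stand-in: mean over all non-batch dimensions
        return x.mean(dim=list(range(1, x.dim())))

    t = th.tensor([0, 3, 0, 7])              # sampled timesteps; two of them are 0
    x_start_mean = th.randn(4, 8, 16)        # EMB(w)
    model_out_x_start = th.randn(4, 8, 16)   # predicted x_start, i.e. f(z_t, t)
    terms = {"mse": th.rand(4)}              # the usual per-example MSE term

    t0_mask = (t == 0)
    t0_loss = mean_flat((x_start_mean - model_out_x_start) ** 2)
    terms["mse"] = th.where(t0_mask, t0_loss, terms["mse"])
    # only examples whose sampled t is 0 get the ||EMB(w) - f(z_0, 0)||^2 term;
    # all other examples keep their ordinary MSE value.

Is my guess right that, because the code indexes timesteps from 0, the code's t = 0 plays the role of the paper's t = 1, so this is the same ||EMB(w) - f(z_1, 1)||^2 term written with 0-based indexing?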
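
For question 3, a sketch of how I currently understand predict_xstart (simplified; predicted_x_start and the scalar alphas_cumprod handling are my own, not the repo's code):

    import torch as th

    def predicted_x_start(model_output, x_t, t, alphas_cumprod, predict_xstart):
        if predict_xstart:
            # the network output is read directly as an estimate of x_0
            return model_output
        # otherwise the output is read as the noise eps, and x_0 is recovered from
        # x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
        a = alphas_cumprod[t].view(-1, *([1] * (x_t.dim() - 1)))
        return (x_t - th.sqrt(1.0 - a) * model_output) / th.sqrt(a)

    # e.g. alphas_cumprod = th.cumprod(1.0 - th.linspace(1e-4, 0.02, 1000), dim=0)

If that reading is right, model_out_x_start is always an estimate of x_0 in both cases, just obtained directly or derived from the predicted noise; is that correct?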
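
For question 4, this is the rounding picture I have in mind (a toy nearest-neighbour sketch with made-up shapes, not the repo's clamping code):

    import torch as th

    emb_table = th.randn(1000, 16)      # vocab_size x emb_dim word embeddings
    z = th.randn(8, 16)                 # denoised latents for 8 token positions

    # rounding: snap each latent to its closest word embedding
    word_ids = th.cdist(z, emb_table).argmin(dim=-1)

    # working "backwards" from the generated text only gives EMB(w), not the
    # exact pre-rounding latent z (rounding is many-to-one):
    recovered = emb_table[word_ids]
    print(th.allclose(recovered, z))    # False in general

If that picture is right, I would expect that from generated text one can only re-embed the words and re-noise them with the forward process q(z_t | z_0), not recover the actual noise trajectory that produced the sample; is that correct?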
