diffloss does not converge well on a speech generation task

Thank you for open-sourcing this work. I have a question about it. I tried to use diffloss for a speech generation task with a next-token-prediction approach, which corresponds to the order=raster, direction=causal, #preds=1 setting in your paper. However, training did not converge well. Could you help me analyze what might be causing this? Thanks a lot!
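
For concreteness, here is a minimal sketch of the setup I mean. It is only an illustration under my own assumptions: the helper names (`causal_pairs`, `noised_sample`, `noise_mse`) are hypothetical and not part of this repository, and the noise-prediction objective is the standard one as I understand it from the paper. Each position i of a causal model (which has seen frames 0..i) conditions the prediction of frame i+1, and the loss is the mean squared error between the sampled noise and the predicted noise.

```python
import math
import random


def causal_pairs(frames):
    """Raster order, causal direction, one prediction per step.

    Index i stands for the causal model's output after seeing frames[0..i];
    it is paired with the next frame, frames[i + 1], as the target.
    """
    return [(i, frames[i + 1]) for i in range(len(frames) - 1)]


def noised_sample(x0, alpha_bar, rng):
    """Forward diffusion: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [
        math.sqrt(alpha_bar) * a + math.sqrt(1.0 - alpha_bar) * e
        for a, e in zip(x0, eps)
    ]
    return x_t, eps


def noise_mse(eps_pred, eps):
    """Mean squared error between predicted and true noise."""
    return sum((p - e) ** 2 for p, e in zip(eps_pred, eps)) / len(eps)


# Toy usage: three 2-dimensional "speech latent" frames (made-up values).
frames = [[0.1, -0.2], [0.3, 0.0], [-0.5, 0.4]]
rng = random.Random(0)
for cond_index, target in causal_pairs(frames):
    x_t, eps = noised_sample(target, alpha_bar=0.5, rng=rng)
    # A real model would predict eps from (x_t, t, z[cond_index]);
    # a zero prediction is used here only as a placeholder.
    loss = noise_mse([0.0] * len(eps), eps)
```

In my actual experiment the placeholder prediction is replaced by the denoising network conditioned on the causal model's output at `cond_index`.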