Thank you for your excellent work. However, we have identified some limitations in certain high-frequency scenarios, so we attempted further fine-tuning (LoRA on scenario-specific datasets) on top of your model. We found that the top-k mask of Block-Sparse-Attention used in the inference code has no gradients. Could you explain how you implemented this in Step 2 of training?
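For context on the question: a common way to train through a non-differentiable top-k mask is a straight-through estimator, where the forward pass applies the hard 0/1 mask but the backward pass lets gradients flow to the underlying scores as if the mask were the identity. The sketch below is only an illustration of that general technique under assumed tensor shapes, not the repository's actual implementation; the function name `topk_mask_ste` and the block-score layout are hypothetical.

```python
import torch

def topk_mask_ste(scores: torch.Tensor, k: int) -> torch.Tensor:
    """Hard top-k mask forward, straight-through gradient backward.

    `scores` is assumed to hold block-level attention scores with the
    candidate-block axis last; this is an illustrative sketch, not the
    repository's code.
    """
    # Build the hard 0/1 mask by scattering ones at the top-k indices.
    topk_idx = scores.topk(k, dim=-1).indices
    hard = torch.zeros_like(scores).scatter_(-1, topk_idx, 1.0)
    # Forward value equals `hard`; backward treats the mask as `scores`,
    # so gradients reach the score network despite the discrete top-k.
    return hard + scores - scores.detach()

# Usage: gradients propagate to `scores` even though the mask is binary.
scores = torch.randn(2, 8, requires_grad=True)
mask = topk_mask_ste(scores, k=3)
mask.sum().backward()
assert scores.grad is not None
```

Whether the original training used a straight-through estimator, a soft relaxation (e.g. a temperature-scaled softmax in place of hard top-k during training), or something else is exactly what this issue asks the authors to clarify.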
Alexis-Fab