Thank you for your excellent work. However, we have identified some limitations in certain high-frequency scenarios, so we attempted further fine-tuning (LoRA on scenario-specific datasets) on top of your model. We found that the top-k mask of Block-Sparse-Attention used in the inference code has no gradients. Could you explain how you implemented this in Step 2 of training?
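For context on the question: a common way to train through a non-differentiable top-k mask is a straight-through estimator, where the forward pass applies the hard 0/1 mask but the backward pass lets gradients flow to the underlying scores as if the mask were the identity. The sketch below is only an illustration of that general technique under assumed tensor shapes, not the repository's actual implementation; the function name `topk_mask_ste` and the block-score layout are hypothetical.

```python
import torch

def topk_mask_ste(scores: torch.Tensor, k: int) -> torch.Tensor:
    """Hard top-k mask forward, straight-through gradient backward.

    `scores` is assumed to hold block-level attention scores with the
    candidate-block axis last; this is an illustrative sketch, not the
    repository's code.
    """
    # Build the hard 0/1 mask by scattering ones at the top-k indices.
    topk_idx = scores.topk(k, dim=-1).indices
    hard = torch.zeros_like(scores).scatter_(-1, topk_idx, 1.0)
    # Forward value equals `hard`; backward treats the mask as `scores`,
    # so gradients reach the score network despite the discrete top-k.
    return hard + scores - scores.detach()

# Usage: gradients propagate to `scores` even though the mask is binary.
scores = torch.randn(2, 8, requires_grad=True)
mask = topk_mask_ste(scores, k=3)
mask.sum().backward()
assert scores.grad is not None
```

Whether the original training used a straight-through estimator, a soft relaxation (e.g. a temperature-scaled softmax in place of hard top-k during training), or something else is exactly what this issue asks the authors to clarify.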
Alexis-Fab