Hi,

Thanks a lot for this clear and fat-free code base!

I'm training Falcon-7B with adapter v2 and an Alpaca-formatted dataset of mine. As usual, I'm trying to max out VRAM usage for the best training time, but in this case there is no significant gain, since the step time is almost proportional to the micro-batch size.

Step times:
micro_batch_size 1: 159 ms
micro_batch_size 2: 293 ms
micro_batch_size 4: 560 ms

Is this expected, or can it be optimized?

Note: as advised, I'll also open a new issue about my attempt at batch inference, which exhibits the same lack of gain from batching; see Lightning-AI/lit-llama#188 (comment).
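For reference, converting the reported step times above into per-sample cost makes the point concrete (this is just arithmetic on the numbers reported in this issue, not output from the repo):

```python
# Per-sample cost derived from the step times reported above.
# If batching scaled perfectly, per-sample time would halve with each doubling.
step_times_ms = {1: 159, 2: 293, 4: 560}  # micro_batch_size -> step time (ms)

for bs, ms in step_times_ms.items():
    per_sample = ms / bs
    print(f"micro_batch_size={bs}: {per_sample:.1f} ms/sample "
          f"({1000 / per_sample:.1f} samples/s)")
# micro_batch_size=1: 159.0 ms/sample (6.3 samples/s)
# micro_batch_size=2: 146.5 ms/sample (6.8 samples/s)
# micro_batch_size=4: 140.0 ms/sample (7.1 samples/s)
```

So going from micro-batch size 1 to 4 only improves throughput by roughly 12%.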
This would be expected in a memory-bound regime, where you are limited by data transfer rather than compute. However, you would need to profile the code on your hardware to verify this.
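A minimal sketch of how one could check this with `torch.profiler`; the toy linear model and random batch below are placeholders, and you would substitute the actual forward/backward from the finetuning script:

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Toy stand-in for one optimization step; replace with the real model and batch
# from the finetuning script when profiling for real.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(4096, 4096).to(device)
batch = torch.randn(4, 4096, device=device)

with profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    record_shapes=True,
) as prof:
    loss = model(batch).sum()
    loss.backward()

# If total time is dominated by memory-bound kernels (copies, elementwise ops)
# rather than matmuls, larger micro-batches will not speed up steps much.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=20))
```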