Async sampling #198
Conversation
Code Metrics Report
───────────────────────────────────────────────────────────────────────────────
Language                 Files     Lines   Blanks  Comments     Code Complexity
───────────────────────────────────────────────────────────────────────────────
Rust                        70     23237     1543       508    21186       1278
───────────────────────────────────────────────────────────────────────────────
Total                       70     23237     1543       508    21186       1278
───────────────────────────────────────────────────────────────────────────────
Estimated Cost to Develop 66,725
Estimated Schedule Effort 11.79 months
Estimated People Required 5.02
───────────────────────────────────────────────────────────────────────────────
Processed 764768 bytes, 0.765 megabytes (SI)
───────────────────────────────────────────────────────────────────────────────
@lucasavila00, thanks for your work here. I think we definitely need to address the throughput decrease as the batch size increases. I like the idea of incorporating async; did this not work out?
@EricLBuehler It had mixed results. At bs=8 sampling improved from 5ms to 2ms, but at bs=1 it got worse, from 0.8ms to 1.2ms. Clippy flagged a non-async mutex being held across an await point. The best fix would be either an async-aware mutex, or dropping the lock before each await and re-locking when needed. I'm still learning async Rust and don't feel confident enough to finish this PR yet, as it requires defining the overall async structure for the engine. Profiling CPU code also became harder (e.g. with the samply profiler); only the NVIDIA profiler showed interpretable results.
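The lock-dropping option mentioned above is the pattern Clippy's `await_holding_lock` lint suggests: scope the `MutexGuard` so it is released before the await point. A minimal std-only sketch (the `step`/`sample_one` names and the tiny single-poll executor are illustrative stand-ins, not the actual engine code):

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Mutex;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Minimal executor, sufficient here because our futures never return Pending.
fn block_on<F: Future>(fut: F) -> F::Output {
    fn raw() -> RawWaker {
        fn clone(_: *const ()) -> RawWaker { raw() }
        fn noop(_: *const ()) {}
        static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    let waker = unsafe { Waker::from_raw(raw()) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(v) = fut.as_mut().poll(&mut cx) {
            return v;
        }
    }
}

// Stand-in for the real async sampling work: pick the argmax token.
async fn sample_one(logits: Vec<f32>) -> usize {
    logits
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.total_cmp(b.1))
        .map(|(i, _)| i)
        .unwrap()
}

async fn step(state: &Mutex<Vec<f32>>) -> usize {
    // Copy what we need out of the guard inside a block, so the
    // guard is dropped *before* the await point below.
    let logits = {
        let guard = state.lock().unwrap();
        guard.clone()
    }; // guard dropped here
    sample_one(logits).await
}
```

The alternative is an async-aware mutex (e.g. tokio's), whose guard is safe to hold across awaits, at the cost of pulling the lock into the async runtime.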
Great! Perhaps we could profile it, see where the performance gains go away, and use async only beyond that point.
We could use this type. I think this sort of structure is very interesting; I'll take a look in the next few days. Thanks for working on it.
I'll leave it open, so it picks up my commits. I just fixed the Clippy issue by not holding the lock across await points.
@EricLBuehler `cargo test` fails because it can't download the Mistral tokenizer from HF. Any chance the CI account does not have access to the model? It has recently been gated (like Llama, where one needs to request access).
I implemented it in the last commit. The async pool is only used when there is more than one sequence in the batch, so the performance loss at bs=1 is gone: the async code is free when not used.
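The "free if not used" dispatch described above can be sketched as a batch-size check that keeps the single-sequence fast path untouched. This is a hypothetical illustration (the `sample_seq`/`sample_batch` names are invented, and scoped threads stand in for the PR's async pool):

```rust
use std::thread;

// Hypothetical per-sequence sampler: argmax over one sequence's logits.
fn sample_seq(logits: &[f32]) -> usize {
    logits
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.total_cmp(b.1))
        .map(|(i, _)| i)
        .unwrap()
}

// Only fan out when there is more than one sequence in the batch,
// so bs=1 keeps the original single-threaded path and pays no overhead.
fn sample_batch(batch: &[Vec<f32>]) -> Vec<usize> {
    if batch.len() <= 1 {
        batch.iter().map(|l| sample_seq(l)).collect()
    } else {
        thread::scope(|s| {
            // Spawn all workers first, then join, so they run concurrently.
            let handles: Vec<_> = batch
                .iter()
                .map(|l| s.spawn(move || sample_seq(l)))
                .collect();
            handles.into_iter().map(|h| h.join().unwrap()).collect()
        })
    }
}
```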
I added two NVIDIA profiler screenshots to the main PR body, comparing master to this PR, showing the improvements.
I set the
@lucasavila00, it looks like there are some merge conflicts.
I'm fixing it |
  ) -> Result<()> {
      let seqs_len = seqs.len();
- let logits_seq = logits.chunk(seqs_len, 0).unwrap();
+ let logits_seq = logits.to_device(&Device::Cpu)?.chunk(seqs_len, 0)?;
Now that we do a synchronization before we start sampling, we can get statistics about sampling speed.
Basically reverting #151
Master
This
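The reason the synchronization makes the statistics meaningful can be sketched as follows: the device-to-host copy (`to_device(&Device::Cpu)` in the diff above) blocks until all pending GPU work finishes, so a timer started after it measures only CPU-side sampling. The function names below are hypothetical stand-ins, not the PR's actual code:

```rust
use std::time::{Duration, Instant};

// Stand-in for `logits.to_device(&Device::Cpu)`: in the real code this
// copy is a synchronization point that waits for the GPU to finish.
fn copy_logits_to_cpu() -> Vec<Vec<f32>> {
    vec![vec![0.1, 0.7, 0.2]; 8]
}

fn argmax(l: &[f32]) -> usize {
    l.iter()
        .enumerate()
        .max_by(|a, b| a.1.total_cmp(b.1))
        .unwrap()
        .0
}

fn timed_sampling() -> (Vec<usize>, Duration) {
    let logits = copy_logits_to_cpu(); // sync happens here, outside the timed region
    let start = Instant::now();        // from here on we time only sampling
    let tokens: Vec<usize> = logits.iter().map(|l| argmax(l)).collect();
    (tokens, start.elapsed())
}
```

Without the sync, the timer would silently include whatever GPU work was still in flight, which is presumably why per-step sampling statistics were dropped in #151.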
@lucasavila00, this looks good. However, I think there is one more merge conflict.
This looks good. I think I made some mistakes when doing the conflict resolution, so I've marked them.
Looks good, thank you for adding this!
@lucasavila00, I think there are unfortunately still some merge conflicts.
@EricLBuehler please consider squashing this PR, or let me know if I should squash it.
@lucasavila00 thank you for adding this!
Master
This PR