
Async sampling #198
Merged · 21 commits · Apr 28, 2024

Conversation

@lucasavila00 (Contributor) commented Apr 23, 2024:

./target/profiling/mistralrs-bench -p 0 -g 64 -r 1 -c 8  gguf -t mistralai/Mistral-7B-Instruct-v0.1 -m TheBloke/Mistral-7B-Instruct-v0.1-GGUF -f mistral-7b-instruct-v0.1.Q4_K_M.gguf

Master: [NVIDIA profiler screenshot]

This PR: [NVIDIA profiler screenshot]

github-actions bot commented Apr 23, 2024

Code Metrics Report
  ───────────────────────────────────────────────────────────────────────────────
Language                 Files     Lines   Blanks  Comments     Code Complexity
───────────────────────────────────────────────────────────────────────────────
Rust                        70     23237     1543       508    21186       1278
───────────────────────────────────────────────────────────────────────────────
Total                       70     23237     1543       508    21186       1278
───────────────────────────────────────────────────────────────────────────────
Estimated Cost to Develop: 66,725
Estimated Schedule Effort: 11.79 months
Estimated People Required: 5.02
───────────────────────────────────────────────────────────────────────────────
Processed 764768 bytes, 0.765 megabytes (SI)
───────────────────────────────────────────────────────────────────────────────
  

@EricLBuehler (Owner) commented:

@lucasavila00, thanks for your work here. I think we definitely need to address the throughput decrease as the batch size increases. I like the idea of incorporating async; did this not work out?

@lucasavila00 (Contributor, Author) commented:

@EricLBuehler It had mixed results.

On bs=8 it improved from 5ms to 2ms.

But on bs=1 it got worse, from 0.8ms to 1.2ms.

Clippy raised an issue about a non-async mutex being held across an await point.

The best fix would be to use an async-aware mutex? Or to drop the lock and re-lock whenever it is needed again?

I'm still learning async Rust and don't feel confident enough to work on this PR yet, as it requires defining the overall async structure for the engine.

Also, profiling CPU code became harder (e.g. with the samply profiler). Only the NVIDIA profiler showed interpretable results.
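For context, a minimal sketch of the pattern Clippy flags here (the clippy::await_holding_lock lint) and of the async-aware-mutex option; the State type and function names are hypothetical, not the actual engine types:

use std::sync::Mutex;
use tokio::sync::Mutex as AsyncMutex;

struct State { pending: usize }

async fn some_async_work() {}

// Clippy complains: a std::sync::MutexGuard is held across the .await,
// which can stall or deadlock the executor if another task on the same
// thread tries to take the lock while this future is suspended.
async fn flagged(state: &Mutex<State>) {
    let mut guard = state.lock().unwrap();
    guard.pending += 1;
    some_async_work().await; // guard is still alive here
}

// Option 1: an async-aware mutex, whose guard is fine to hold across .await
// because lock() yields to the executor instead of blocking the thread.
async fn with_async_mutex(state: &AsyncMutex<State>) {
    let mut guard = state.lock().await;
    guard.pending += 1;
    some_async_work().await;
}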

@EricLBuehler (Owner) commented Apr 26, 2024:

> On bs=8 it improved from 5ms to 2ms.

Great! Perhaps we could profile it to see at what point the performance gains go away, and only use it beyond that point.

> Clippy raised an issue about a non-async mutex being held across an await point.

We could use this type.

I think this sort of structure is very interesting; I'll take a look in the next few days. Thanks for working on it.

@lucasavila00 reopened this on Apr 26, 2024
@lucasavila00 (Contributor, Author) commented:

I'll leave it open so it picks up my commits.

I just fixed the clippy issue by not holding the lock across await points.
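That fix looks roughly like this (a sketch with hypothetical names, not the actual engine code): the guard is confined to a block so it is dropped before the await point, and the mutex is re-locked afterwards only if needed.

use std::sync::Mutex;

struct State { pending: usize }

async fn some_async_work() {}

async fn fixed(state: &Mutex<State>) {
    {
        let mut guard = state.lock().unwrap();
        guard.pending += 1;
    } // guard dropped here, before the await point
    some_async_work().await;
    let pending_now = state.lock().unwrap().pending; // re-lock when needed
    let _ = pending_now;
}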

@lucasavila00 (Contributor, Author) commented:

@EricLBuehler, cargo test fails because it can't download the Mistral tokenizer from HF.

Any chance the CI account does not have access to the model? It was recently gated (like Llama, where one needs to request access).

@lucasavila00 (Contributor, Author) commented:

> Great! Perhaps we could profile it to see at what point the performance gains go away, and only use it beyond that point.

I implemented it in the last commit. It only uses the async pool when there is more than one sequence in the batch; with that, the performance loss is gone.

This means the async code is free when it is not used.
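Roughly the shape of that dispatch, as a sketch with hypothetical names rather than the actual mistralrs-core code: sample in-line when there is a single sequence, and only fan work out onto the async pool when there are several.

use tokio::task::JoinSet;

// Stand-in for the real per-sequence sampler (plain argmax over the logits).
async fn sample_one(logits: Vec<f32>) -> u32 {
    logits
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.total_cmp(b.1))
        .map(|(i, _)| i as u32)
        .unwrap_or(0)
}

async fn sample_batch(per_seq_logits: Vec<Vec<f32>>) -> Vec<u32> {
    if per_seq_logits.len() == 1 {
        // bs = 1: stay on the current task, so the async pool adds no overhead.
        vec![sample_one(per_seq_logits.into_iter().next().unwrap()).await]
    } else {
        // bs > 1: spawn one task per sequence and collect the results in order.
        let mut set = JoinSet::new();
        for (idx, logits) in per_seq_logits.into_iter().enumerate() {
            set.spawn(async move { (idx, sample_one(logits).await) });
        }
        let mut results = Vec::new();
        while let Some(joined) = set.join_next().await {
            results.push(joined.unwrap());
        }
        results.sort_by_key(|(idx, _)| *idx);
        results.into_iter().map(|(_, tok)| tok).collect()
    }
}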

@lucasavila00 (Contributor, Author) commented Apr 26, 2024:

I added two NVIDIA profiler screenshots to the main PR body, comparing master to this PR and showing the improvements.

@lucasavila00 marked this pull request as ready for review on Apr 26, 2024 06:12
@lucasavila00 changed the title from "Add async sampling POC" to "Async sampling" on Apr 26, 2024
@lucasavila00 mentioned this pull request on Apr 26, 2024
@EricLBuehler (Owner) commented:

@lucasavila00

> cargo test fails because it can't download the Mistral tokenizer from HF.

I set the HF_TOKEN secret during CI:

TESTS_HF_TOKEN: ${{ secrets.HF_TOKEN }}
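For completeness, a hedged sketch of how a test could pick that variable up; whether the actual tests read it exactly this way is an assumption:

// Hypothetical helper: read the token the workflow exposes as TESTS_HF_TOKEN,
// falling back to anonymous access when it is absent (e.g. in local runs).
fn hf_token() -> Option<String> {
    std::env::var("TESTS_HF_TOKEN")
        .ok()
        .filter(|token| !token.is_empty())
}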

@EricLBuehler (Owner) commented:

@lucasavila00, it looks like there are some merge conflicts.

@lucasavila00 (Contributor, Author) commented:

> @lucasavila00, it looks like there are some merge conflicts.

I'm fixing it.

) -> Result<()> {
    let seqs_len = seqs.len();
-   let logits_seq = logits.chunk(seqs_len, 0).unwrap();
+   let logits_seq = logits.to_device(&Device::Cpu)?.chunk(seqs_len, 0)?;
@lucasavila00 (Contributor, Author) commented on this diff:

Now that we do a synchronization before we start sampling, we can get statistics about sampling speed.

This basically reverts #151.
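In other words, the copy to the CPU makes the device finish the forward pass before the sampler runs, so a wall-clock timer around sampling measures only sampling. A rough sketch of the idea (the candle calls mirror the diff above; the function name and timing code are illustrative):

use candle_core::{Device, Result, Tensor};
use std::time::Instant;

fn chunk_and_time(logits: &Tensor, seqs_len: usize) -> Result<Vec<Tensor>> {
    // Moving the logits to the CPU synchronizes the device, so the timer
    // below no longer includes GPU execution time from the forward pass.
    let logits_seq = logits.to_device(&Device::Cpu)?.chunk(seqs_len, 0)?;

    let start = Instant::now();
    // ... run the sampler over each chunk in logits_seq ...
    let sampling_time = start.elapsed();
    let _ = sampling_time; // e.g. accumulate into per-request statistics

    Ok(logits_seq)
}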

@lucasavila00 (Contributor, Author) commented:

Master

+------------------------------------+---------+--------+--------------+--------------+-------------+--------------+
| model                              | backend | test   | t/s          | ms/t         | concurrency | throughput/s |
+------------------------------------+---------+--------+--------------+--------------+-------------+--------------+
| mistralai/Mistral-7B-Instruct-v0.1 | CUDA    | tg 128 | 58.712±0.644 | 17.034±0.190 |           1 |    58.712006 |
| mistralai/Mistral-7B-Instruct-v0.1 | CUDA    | tg 128 | 45.956±0.805 | 21.766±0.380 |           2 |      91.9128 |
| mistralai/Mistral-7B-Instruct-v0.1 | CUDA    | tg 128 | 29.011±0.321 | 34.473±0.388 |           4 |     116.0458 |
| mistralai/Mistral-7B-Instruct-v0.1 | CUDA    | tg 128 | 15.469±0.388 | 64.685±1.639 |           8 |    123.75505 |
+------------------------------------+---------+--------+--------------+--------------+-------------+--------------+

This PR

+------------------------------------+---------+--------+--------------+--------------+-------------+--------------+
| model                              | backend | test   | t/s          | ms/t         | concurrency | throughput/s |
+------------------------------------+---------+--------+--------------+--------------+-------------+--------------+
| mistralai/Mistral-7B-Instruct-v0.1 | CUDA    | tg 128 | 58.752±0.501 | 17.022±0.147 |           1 |    58.752266 |
| mistralai/Mistral-7B-Instruct-v0.1 | CUDA    | tg 128 | 48.121±0.124 | 20.781±0.053 |           2 |     96.24125 |
| mistralai/Mistral-7B-Instruct-v0.1 | CUDA    | tg 128 | 30.030±0.020 | 33.300±0.022 |           4 |    120.12018 |
| mistralai/Mistral-7B-Instruct-v0.1 | CUDA    | tg 128 | 16.839±0.008 | 59.384±0.027 |           8 |    134.71562 |
+------------------------------------+---------+--------+--------------+--------------+-------------+--------------+

@EricLBuehler added this to the Version 0.1.0 milestone on Apr 26, 2024
@EricLBuehler (Owner) commented:

@lucasavila00, this looks good. However, I think there is one more merge conflict.

@EricLBuehler (Owner) left a review:

This looks good. I think I made some mistakes when doing the conflict resolution, so I've marked them.

Two review comments on mistralrs-core/src/pipeline/mod.rs (outdated, resolved)
@EricLBuehler modified the milestones: 0.1.0, 0.2.0 on Apr 27, 2024
@EricLBuehler (Owner) left a review:

Looks good, thank you for adding this!

@EricLBuehler (Owner) commented:

@lucasavila00, I think there are unfortunately still some merge conflicts.

@EricLBuehler mentioned this pull request on Apr 28, 2024
@lucasavila00 (Contributor, Author) commented:

@EricLBuehler, please consider squashing this PR, or let me know if I should squash it.

@EricLBuehler merged commit 73e4acf into EricLBuehler:master on Apr 28, 2024
8 of 11 checks passed
@EricLBuehler (Owner) commented:

@lucasavila00, thank you for adding this!
