Update link to E2E notebook in LLaMA-2 blog (#20724)
### Description
This PR updates a reference link in the LLaMA-2 blog post and fixes a
word formatting issue.

### Motivation and Context
With these changes, the link to the example E2E notebook works again.
kunal-vaishnavi committed May 20, 2024
1 parent ca6b0f8 commit 5fd617a
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions src/routes/blogs/accelerating-llama-2/+page.svelte
@@ -102,7 +102,7 @@
 >batch size * (prompt length + token generation length) / wall-clock latency</i
 > where wall-clock latency = the latency from running end-to-end and token generation length =
 256 generated tokens. The E2E throughput is 2.4X more (13B) and 1.8X more (7B) when compared to
-PyTorch compile. For higher batch size, sequence length like 16, 2048 pytorch eager times out,
+PyTorch compile. For higher batch size, sequence length pairs such as (16, 2048), PyTorch eager times out,
 while ORT shows better performance than compile mode.
 </p>
 <div class="grid grid-cols-1 lg:grid-cols-2 gap-4">
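The hunk above records the blog's E2E throughput definition: batch size * (prompt length + token generation length) / wall-clock latency. As a minimal sketch, this is just a one-line computation; the numbers below are hypothetical and chosen only to illustrate the units (tokens/sec), not taken from the blog's benchmarks:

```python
def e2e_throughput(batch_size: int, prompt_len: int, gen_len: int, wall_clock_s: float) -> float:
    """E2E throughput in tokens/sec, per the blog's definition:
    batch size * (prompt length + token generation length) / wall-clock latency,
    where wall-clock latency is the end-to-end run time in seconds."""
    return batch_size * (prompt_len + gen_len) / wall_clock_s

# Hypothetical example: batch 16, prompt length 2048, 256 generated tokens,
# 10 s end-to-end latency.
print(e2e_throughput(16, 2048, 256, 10.0))  # 3686.4 tokens/sec
```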
@@ -151,7 +151,7 @@

 <p class="mb-4">
   More details on these metrics can be found <a
-    href="https://github.com/microsoft/onnxruntime-inference-examples/blob/main/python/models/llama2/README.md"
+    href="https://github.com/microsoft/onnxruntime-inference-examples/blob/main/python/models/llama/README.md"
     class="text-blue-500">here</a
   >.
 </p>
