From 178551ba52ef15a9037cf6e99b0ec790ca678d55 Mon Sep 17 00:00:00 2001
From: qihqi
Date: Thu, 13 Jun 2024 10:44:00 -0700
Subject: [PATCH] Update summary.md

---
 benchmarks/summary.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/benchmarks/summary.md b/benchmarks/summary.md
index e9c31ee3..41b011de 100644
--- a/benchmarks/summary.md
+++ b/benchmarks/summary.md
@@ -22,6 +22,8 @@ Date | Device | dtype | batch size | cache length |max input length |max output
 ----| ------- | ------ |---------- | -------------|-----------------|------------------|----------------------
 2024-05-14 | TPU v5e-8 | bfloat16 | 512 | 2048 | 1024 | 1024 | 8700
 2024-05-14 | TPU v5e-8 | int8 | 1024 | 2048 | 1024 | 1024 | 8746
+2024-06-13 | TPU v5e-1 | bfloat16 | 1024 | 2048 | 1024 | 1024 | 4249
+
 ** NOTE: ** Gemma 2B uses `--shard_on_batch` flag so it's data parallel instead of model parallel.