diff --git a/benchmarks/summary.md b/benchmarks/summary.md
index e9c31ee3..41b011de 100644
--- a/benchmarks/summary.md
+++ b/benchmarks/summary.md
@@ -22,6 +22,8 @@ Date | Device | dtype | batch size | cache length |max input length |max output
 ----| ------- | ------ |---------- | -------------|-----------------|------------------|----------------------
 2024-05-14 | TPU v5e-8 | bfloat16 | 512 | 2048 | 1024 | 1024 | 8700
 2024-05-14 | TPU v5e-8 | int8 | 1024 | 2048 | 1024 | 1024 | 8746
+2024-06-13 | TPU v5e-1 | bfloat16 | 1024 | 2048 | 1024 | 1024 | 4249
+
 ** NOTE: ** Gemma 2B uses `--shard_on_batch` flag so it's data parallel instead of model parallel.