Question about calculating the token generation rate #150

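Hello! Thank you for your excellent work! While reproducing it, I tried to compute the generation rate of each model (specifically Falcon-40b from the Evaluation you provide). The command I ran is: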
`./build/bin/main -m /data/models/falcon-40b-relu-powerinfer/falcon-40b-relu.q4.powerinfer.gguf -n 512 -t 8 -p "In the depths of twilight, where shadows dance with whispers, ancient secrets stir beneath the surface, beckoning the curious to unravel mysteries that linger within the fabric of time and space, awaiting discovery and enlightenment. The moon casts its gentle glow, illuminating pathways obscured by darkness, guiding intrepid souls on a journey towards the unknown, where truths and wonders intertwine in the eternal dance of existence." --vram-budget 22`

I computed the rate for Falcon-40B from the following output:
llama_print_timings:        load time =   13806.71 ms
llama_print_timings:      sample time =     240.38 ms /   254 runs   (    **0.95** ms per token,  1056.67 tokens per second)
llama_print_timings: prompt eval time =    2031.23 ms /    79 tokens (   **25.71** ms per token,    38.89 tokens per second)
llama_print_timings:        eval time =   10300.94 ms /   253 runs   (   **40.72** ms per token,    24.56 tokens per second)
llama_print_timings:       total time =   12701.22 ms
Log end

1000 / (0.95 + 25.71 + 40.72) = 1000 / 67.38 = 14.84 tokens/s
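To make the comparison concrete, here is a minimal Python sketch that parses the timing lines above and prints three candidate rates: the decode-only rate from the `eval` line, the summed per-token form used in this post, and an end-to-end rate (generated tokens divided by total time). The helper names `parse_timings` and `tokens_per_second` are hypothetical, and which definition the paper actually reports is not stated in this thread, so treat this only as a way to see how far the candidates differ.

```python
import re

# Timing lines copied from the log in this post (bold markers removed).
LOG = """\
llama_print_timings:        load time =   13806.71 ms
llama_print_timings:      sample time =     240.38 ms /   254 runs   (    0.95 ms per token,  1056.67 tokens per second)
llama_print_timings: prompt eval time =    2031.23 ms /    79 tokens (   25.71 ms per token,    38.89 tokens per second)
llama_print_timings:        eval time =   10300.94 ms /   253 runs   (   40.72 ms per token,    24.56 tokens per second)
llama_print_timings:       total time =   12701.22 ms
"""

# Captures the phase name, elapsed milliseconds, and (if present) the run/token count.
PATTERN = re.compile(
    r"llama_print_timings:\s+(.+?) time =\s+([\d.]+) ms"
    r"(?:\s*/\s*(\d+) (?:runs|tokens))?"
)


def parse_timings(log):
    """Return {phase: (elapsed_ms, count or None)} for each timing line."""
    timings = {}
    for match in PATTERN.finditer(log):
        name, ms, count = match.groups()
        timings[name] = (float(ms), int(count) if count else None)
    return timings


def tokens_per_second(elapsed_ms, count):
    """Convert a token count and an elapsed time in ms to tokens/s."""
    return count / elapsed_ms * 1000.0


timings = parse_timings(LOG)

# Decode-only rate: generated tokens over time spent in the eval phase.
decode_rate = tokens_per_second(*timings["eval"])

# Prompt-processing rate: prompt tokens over prompt eval time.
prefill_rate = tokens_per_second(*timings["prompt eval"])

# The form used in this post: sum the three per-token latencies, then invert.
# Note that the prompt eval latency is per prompt token, not per generated token.
summed_ms = sum(timings[k][0] / timings[k][1] for k in ("sample", "prompt eval", "eval"))
summed_rate = 1000.0 / summed_ms

# End-to-end rate: generated tokens over the reported total time.
end_to_end_rate = tokens_per_second(timings["total"][0], timings["eval"][1])

print(f"decode only : {decode_rate:.2f} tokens/s")
print(f"prefill     : {prefill_rate:.2f} tokens/s")
print(f"summed form : {summed_rate:.2f} tokens/s")
print(f"end to end  : {end_to_end_rate:.2f} tokens/s")
```

On this log the decode-only rate matches the 24.56 tokens/s printed on the `eval` line, while the summed form gives roughly 14.84 tokens/s. The gap comes mostly from adding the prompt eval latency, which is measured per prompt token, to latencies measured per generated token.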

Is this calculation correct? If not, how was the rate computed in your paper? I look forward to your reply!
