Question about calculating the token generation rate #150

@bulaikexiansheng

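Hello! Thank you for your excellent work! While reproducing it, I tried to compute the generation rate of each model (here, Falcon-40B from the Evaluation section you provide). The command I ran is: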
./build/bin/main -m /data/models/falcon-40b-relu-powerinfer/falcon-40b-relu.q4.powerinfer.gguf -n 512 -t 8 -p "In the depths of twilight, where shadows dance with whispers, ancient secrets stir beneath the surface, beckoning the curious to unravel mysteries that linger within the fabric of time and space, awaiting discovery and enlightenment. The moon casts its gentle glow, illuminating pathways obscured by darkness, guiding intrepid souls on a journey towards the unknown, where truths and wonders intertwine in the eternal dance of existence." --vram-budget 22

I used the following output and calculation to get the rate for Falcon-40B:
llama_print_timings: load time = 13806.71 ms
llama_print_timings: sample time = 240.38 ms / 254 runs ( 0.95 ms per token, 1056.67 tokens per second)
llama_print_timings: prompt eval time = 2031.23 ms / 79 tokens ( 25.71 ms per token, 38.89 tokens per second)
llama_print_timings: eval time = 10300.94 ms / 253 runs ( 40.72 ms per token, 24.56 tokens per second)
llama_print_timings: total time = 12701.22 ms
Log end

1000 / (0.95 + 25.71 + 40.72) = 1000 / 67.38 = 14.84 tokens/s
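
The arithmetic above can be reproduced from the log with a short script. This is only a minimal sketch, assuming the `llama_print_timings` lines keep the format shown above; the names `LOG`, `parse_timings`, `combined_rate` and `eval_rate` are hypothetical helpers introduced for illustration, not part of PowerInfer or llama.cpp. It prints the combined figure from my formula next to the rate implied by the `eval time` line alone, so the two candidates can be compared. It does not assume which one the paper reports.

```python
import re

# Timing lines copied from the run above (lines without a token count are omitted).
LOG = """\
llama_print_timings: sample time = 240.38 ms / 254 runs ( 0.95 ms per token, 1056.67 tokens per second)
llama_print_timings: prompt eval time = 2031.23 ms / 79 tokens ( 25.71 ms per token, 38.89 tokens per second)
llama_print_timings: eval time = 10300.94 ms / 253 runs ( 40.72 ms per token, 24.56 tokens per second)
"""

# Captures the stage name, elapsed milliseconds, and the run/token count.
PATTERN = re.compile(
    r"llama_print_timings:\s+(?P<stage>[a-z ]+?) time\s*=\s*"
    r"(?P<ms>[\d.]+) ms\s*/\s*(?P<count>\d+) (?:runs|tokens)"
)


def parse_timings(log):
    """Return {stage: (elapsed_ms, count)} for every stage with a count."""
    return {
        m["stage"]: (float(m["ms"]), int(m["count"]))
        for m in PATTERN.finditer(log)
    }


timings = parse_timings(LOG)

# Per-token latency of each stage, in milliseconds.
per_token_ms = {stage: ms / n for stage, (ms, n) in timings.items()}

# Candidate 1: the formula used in this issue (sum of all per-token latencies).
combined_rate = 1000.0 / sum(per_token_ms.values())

# Candidate 2: decoding only, i.e. the rate implied by the "eval time" line.
eval_rate = 1000.0 / per_token_ms["eval"]

print(f"combined (sample + prompt eval + eval): {combined_rate:.2f} tokens/s")
print(f"eval only: {eval_rate:.2f} tokens/s")
```

With the log above, the first figure matches the 14.84 tokens/s I computed by hand, and the second matches the 24.56 tokens/s that the log itself prints for `eval time`.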

Is this the correct way to compute it? If not, how is the rate computed in your paper? I look forward to your reply!
