Hi,
I tried benchmark_serving.py to check the throughput of lightllm, but the benchmark process seems to get stuck after the server prints "freed all gpu mem". After that, the server prints no further HTTP POST log lines; the one below is the last.
Any ideas?
current batch size: 1 token used ratio: 0.31983333333333336
freed all gpu mem size 6000
INFO: 127.0.0.1:34050 - "POST /generate HTTP/1.1" 200 OK