Replies: 3 comments 4 replies
-
I think you specified "-t 2" on the other machine, so it is running the multi-threaded benchmark.
-
What's your question?
-
You are using a single-threaded test; on a server you should usually use multi-threading for benchmarking. In the single-threaded case, inference can use as much CPU as is available, and your Linux machine only has 4 CPUs while your Windows machine has 10. You need to find a balance between throughput and latency. That's why we created the …
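To compare like with like, the Linux run could be repeated in multi-threaded mode by adding the `-t` flag mentioned above. A sketch, reusing the command from the question; the thread count of 4 is only an illustration matching the Linux machine's CPU count:

```shell
# Multi-threaded benchmark sketch: the same command as in the question,
# plus -t (thread count), the flag referred to above as "-t 2".
djl-bench -e PyTorch \
  -u https://alpha-djl-demos.s3.amazonaws.com/model/djl-blockrunner/pytorch_resnet18.zip \
  -n traced_resnet18 \
  -c 1000 -s 1,3,224,224 \
  -t 4
```

With `-t` set, the log should report "Running MultithreadedBenchmark" as in the Windows output below, making the throughput numbers directly comparable.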
-
In a Docker container on the company's Linux server, I ran:
djl-bench -e PyTorch -u https://alpha-djl-demos.s3.amazonaws.com/model/djl-blockrunner/pytorch_resnet18.zip -n traced_resnet18 -c 1000 -s 1,3,224,224
Result:
[INFO ] - Number of inter-op threads is 4
[INFO ] - Number of intra-op threads is 8
[INFO ] - Load PyTorch (2.1.1) in 0.086 ms.
[INFO ] - Running Benchmark on: cpu().
Loading: 100% |████████████████████████████████████████|
[INFO ] - Model traced_resnet18 loaded in: 1605.037 ms.
[INFO ] - Warmup with 2 iteration ...
[INFO ] - Warmup latency, min: 115.385 ms, max: 265.870 ms
Iteration: 100% |████████████████████████████████████████|
[INFO ] - Inference result: [-0.06938132, 0.6169942, -1.9312556 ...]
[INFO ] - Throughput: 12.79, completed 1000 iteration in 78192 ms.
[INFO ] - Model loading time: 1605.037 ms.
[INFO ] - total P50: 39.692 ms, P90: 165.100 ms, P99: 573.895 ms
[INFO ] - inference P50: 39.146 ms, P90: 161.749 ms, P99: 572.999 ms
[INFO ] - preprocess P50: 0.179 ms, P90: 2.093 ms, P99: 10.505 ms
[INFO ] - postprocess P50: 0.107 ms, P90: 0.152 ms, P99: 0.446 ms
However, on another Windows machine, the benchmark was run directly (outside Docker):
[INFO ] - Number of inter-op threads is 1
[INFO ] - Number of intra-op threads is 1
[INFO ] - Load PyTorch (2.1.1) in 0.019 ms.
[INFO ] - Running MultithreadedBenchmark on: [cpu()].
[INFO ] - Multithreading inference with 2 threads.
Loading: 100% |========================================|
[INFO ] - Model traced_resnet18 loaded in: 1583.521 ms.
[INFO ] - Warmup with 2 iteration ...
[INFO ] - Warmup latency, min: 37.997 ms, max: 119.212 ms
[INFO ] - Completed 100 requests
[INFO ] - Inference result: [-0.06938224, 0.616994, -1.9312545 ...]
[INFO ] - Throughput: 46.66, completed 100 iteration in 2143 ms.
[INFO ] - Model loading time: 1583.521 ms.
[INFO ] - total P50: 42.663 ms, P90: 44.797 ms, P99: 55.650 ms
[INFO ] - inference P50: 42.545 ms, P90: 44.657 ms, P99: 55.518 ms
[INFO ] - preprocess P50: 0.074 ms, P90: 0.092 ms, P99: 0.259 ms
[INFO ] - postprocess P50: 0.041 ms, P90: 0.054 ms, P99: 0.346 ms
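For reference, the reported throughput in both logs is simply completed iterations divided by elapsed time, which can be checked directly:

```shell
# Sanity-check the reported throughput: completed iterations / elapsed seconds.
awk 'BEGIN { printf "Linux:   %.2f req/s\n", 1000 / 78.192 }'   # matches the reported 12.79
awk 'BEGIN { printf "Windows: %.2f req/s\n", 100  / 2.143  }'   # matches the reported 46.66
```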
Why is the throughput so different between the two machines?