Replies: 3 comments 4 replies
-
I think you specified "-t 2" on the other machine, so it is running the multi-threaded benchmark.
-
What's your question?
-
You are using a single-threaded test; on a server you should usually use multi-threading for benchmarking. In the single-threaded case, inference can use as much CPU as is available, and your Linux machine only has 4 CPUs while your Windows machine has 10. You need to find a balance between throughput and latency. That's why we created the …
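To compare like with like, the Linux run could be repeated in multi-threaded mode by adding the `-t` flag mentioned above. A sketch, reusing the command from the question; the thread count of 4 is only an illustration matching the Linux machine's CPU count:

```shell
# Multi-threaded benchmark sketch: the same command as in the question,
# plus -t (thread count), the flag referred to above as "-t 2".
djl-bench -e PyTorch \
  -u https://alpha-djl-demos.s3.amazonaws.com/model/djl-blockrunner/pytorch_resnet18.zip \
  -n traced_resnet18 \
  -c 1000 -s 1,3,224,224 \
  -t 4
```

With `-t` set, the log should report "Running MultithreadedBenchmark" as in the Windows output below, making the throughput numbers directly comparable.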
-
In a Docker container on the company's Linux server, I ran:
djl-bench -e PyTorch -u https://alpha-djl-demos.s3.amazonaws.com/model/djl-blockrunner/pytorch_resnet18.zip -n traced_resnet18 -c 1000 -s 1,3,224,224
Result:
[INFO ] - Number of inter-op threads is 4
[INFO ] - Number of intra-op threads is 8
[INFO ] - Load PyTorch (2.1.1) in 0.086 ms.
[INFO ] - Running Benchmark on: cpu().
Loading: 100% |████████████████████████████████████████|
[INFO ] - Model traced_resnet18 loaded in: 1605.037 ms.
[INFO ] - Warmup with 2 iteration ...
[INFO ] - Warmup latency, min: 115.385 ms, max: 265.870 ms
Iteration: 100% |████████████████████████████████████████|
[INFO ] - Inference result: [-0.06938132, 0.6169942, -1.9312556 ...]
[INFO ] - Throughput: 12.79, completed 1000 iteration in 78192 ms.
[INFO ] - Model loading time: 1605.037 ms.
[INFO ] - total P50: 39.692 ms, P90: 165.100 ms, P99: 573.895 ms
[INFO ] - inference P50: 39.146 ms, P90: 161.749 ms, P99: 572.999 ms
[INFO ] - preprocess P50: 0.179 ms, P90: 2.093 ms, P99: 10.505 ms
[INFO ] - postprocess P50: 0.107 ms, P90: 0.152 ms, P99: 0.446 ms
However, on another Windows machine, the benchmark was run directly (outside Docker):
[INFO ] - Number of inter-op threads is 1
[INFO ] - Number of intra-op threads is 1
[INFO ] - Load PyTorch (2.1.1) in 0.019 ms.
[INFO ] - Running MultithreadedBenchmark on: [cpu()].
[INFO ] - Multithreading inference with 2 threads.
Loading: 100% |========================================|
[INFO ] - Model traced_resnet18 loaded in: 1583.521 ms.
[INFO ] - Warmup with 2 iteration ...
[INFO ] - Warmup latency, min: 37.997 ms, max: 119.212 ms
[INFO ] - Completed 100 requests
[INFO ] - Inference result: [-0.06938224, 0.616994, -1.9312545 ...]
[INFO ] - Throughput: 46.66, completed 100 iteration in 2143 ms.
[INFO ] - Model loading time: 1583.521 ms.
[INFO ] - total P50: 42.663 ms, P90: 44.797 ms, P99: 55.650 ms
[INFO ] - inference P50: 42.545 ms, P90: 44.657 ms, P99: 55.518 ms
[INFO ] - preprocess P50: 0.074 ms, P90: 0.092 ms, P99: 0.259 ms
[INFO ] - postprocess P50: 0.041 ms, P90: 0.054 ms, P99: 0.346 ms
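For reference, the reported throughput in both logs is simply completed iterations divided by elapsed time, which can be checked directly:

```shell
# Sanity-check the reported throughput: completed iterations / elapsed seconds.
awk 'BEGIN { printf "Linux:   %.2f req/s\n", 1000 / 78.192 }'   # matches the reported 12.79
awk 'BEGIN { printf "Windows: %.2f req/s\n", 100  / 2.143  }'   # matches the reported 46.66
```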
Why is the throughput so different between the two machines?