Running ONNX Resnet18 model gets stuck with command ‘-O 99’ #198

Open
Alwinnnn opened this issue Nov 28, 2023 · 3 comments

@Alwinnnn

Hi,
I have implemented Rocket64b1gem16 on my FPGA with default configs and 8GiB DDR3.
With '-O 99', the ONNX Resnet18 model sometimes runs to completion and gives the right result, but sometimes it gets stuck.
With '-O 1', the model runs every time, but inference takes longer.
The chipyard spike simulator, by contrast, always runs this model correctly with both '-O 1' and '-O 99'.
Here are the results for comparison.

Below is the rocket64b1gem16 result with '-O 99', from one of the runs that completed correctly.

debian@debian:~/imagenet_runner_0.7.1$ ./ort_test_gem16 -1 detection_quanV2.onnx -i images/2.jpg -x 2 -O 99
Loaded runner program
Using systolic in mode 2
Using Onnxruntime C++ API
2023-02-28 11:29:18.129004800 [W:onnxruntime:, graph.cc:1074 Graph] Initializer 301 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2023-02-28 11:29:18.188448000 [W:onnxruntime:, graph.cc:1074 Graph] Initializer 302 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2023-02-28 11:29:18.217388800 [W:onnxruntime:, graph.cc:1074 Graph] Initializer 303 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
Number of inputs = 1
Input 0 : name=input, type=1, num_dims=4: [1, 3, 320, 320, ]
Number of outputs = 12
Output 0 : name=299, type=1, num_dims=4: [1, 12, 20, 20, ]
Output 1 : name=301, type=1, num_dims=4: [1, 12, 10, 10, ]
Output 2 : name=303, type=1, num_dims=4: [1, 12, 5, 5, ]
Output 3 : name=305, type=1, num_dims=4: [1, 12, 3, 3, ]
Output 4 : name=307, type=1, num_dims=4: [1, 12, 2, 2, ]
Output 5 : name=309, type=1, num_dims=4: [1, 12, 1, 1, ]
Output 6 : name=300, type=1, num_dims=4: [1, 24, 20, 20, ]
Output 7 : name=302, type=1, num_dims=4: [1, 24, 10, 10, ]
Output 8 : name=304, type=1, num_dims=4: [1, 24, 5, 5, ]
Output 9 : name=306, type=1, num_dims=4: [1, 24, 3, 3, ]
Output 10 : name=308, type=1, num_dims=4: [1, 24, 2, 2, ]
Output 11 : name=310, type=1, num_dims=4: [1, 24, 1, 1, ]
Number of inputs = 1
Input 0 : name=input.1, type=1, num_dims=4: [1, 3, 256, 256, ]
Number of outputs = 1
Output 0 : name=231, type=1, num_dims=4: [1, 21, 64, 64, ]
yolox init
pose init
Loading image
Image dimensions: 256 256 3
Called into systolic conv
Using systolic pooling
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
0
11
2
0
0
0
84 140 207 264 0
resize took 0 cycles 5.487418 s
normalize_transpose took 0 cycles 3.011997 s
Done! Pre Process 1 took 0 cycles 8.499517 s
Done! Inference 1 took 0 cycles 5.220877 s
Done! Pre Process 1 took 0 cycles 1.010774 s

Below is the rocket64b1gem16 result with '-O 99' from a run that got stuck. When the model gets stuck, it is at the same place.

debian@debian:~/imagenet_runner_0.7.1$ ./ort_test_gem16 -1 detection_quanV2.onnx -2 pose_quanV2.onnx -i images/2.jpg -x 2 -O 99
Loaded runner program
Using systolic in mode 2
Using Onnxruntime C++ API
Number of inputs = 1
Input 0 : name=input, type=1, num_dims=4: [1, 3, 320, 320, ]
Number of outputs = 12
Output 0 : name=299, type=1, num_dims=4: [1, 12, 20, 20, ]
Output 1 : name=301, type=1, num_dims=4: [1, 12, 10, 10, ]
Output 2 : name=303, type=1, num_dims=4: [1, 12, 5, 5, ]
Output 3 : name=305, type=1, num_dims=4: [1, 12, 3, 3, ]
Output 4 : name=307, type=1, num_dims=4: [1, 12, 2, 2, ]
Output 5 : name=309, type=1, num_dims=4: [1, 12, 1, 1, ]
Output 6 : name=300, type=1, num_dims=4: [1, 24, 20, 20, ]
Output 7 : name=302, type=1, num_dims=4: [1, 24, 10, 10, ]
Output 8 : name=304, type=1, num_dims=4: [1, 24, 5, 5, ]
Output 9 : name=306, type=1, num_dims=4: [1, 24, 3, 3, ]
Output 10 : name=308, type=1, num_dims=4: [1, 24, 2, 2, ]
Output 11 : name=310, type=1, num_dims=4: [1, 24, 1, 1, ]
yolox init
Loading image
Image dimensions: 256 256 3
Called into systolic conv
Using systolic pooling
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic
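
One way to pin down the hang point is to compare the op sequence of a completed run against the stuck run. The sketch below is only an illustration: the helper names `systolic_ops` and `first_divergence` are made up here, and the two embedded strings are the leading op lines copied from the two '-O 99' gem16 logs above.

```python
# Sketch: find the first systolic op that the stuck run never finished printing.
# GOOD_LOG / STUCK_LOG are excerpts copied from the two '-O 99' gem16 logs.

GOOD_LOG = """\
Using systolic in mode 2
Called into systolic conv
Using systolic pooling
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic add
"""

STUCK_LOG = """\
Using systolic in mode 2
Called into systolic conv
Using systolic pooling
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic
"""


def systolic_ops(log_text):
    """Return the ordered per-op lines, skipping the 'mode 2' header line."""
    ops = []
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith("Called into systolic") or line == "Using systolic pooling":
            ops.append(line)
    return ops


def first_divergence(good_ops, stuck_ops):
    """Index of the first op that is missing or different in the stuck run."""
    for i, op in enumerate(good_ops):
        if i >= len(stuck_ops) or stuck_ops[i] != op:
            return i
    return None  # the stuck log contains every op of the good log


good_ops = systolic_ops(GOOD_LOG)
stuck_ops = systolic_ops(STUCK_LOG)
idx = first_divergence(good_ops, stuck_ops)
print(idx, good_ops[idx])
```

On these excerpts it prints `17 Called into systolic conv`, which is the 13th conv call of the completed run. The last line of the stuck log is cut off mid-line, so the real hang point may differ slightly from where the output stops.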

Below is the rocket64b1gem16 result with '-O 1'. The model always runs correctly with '-O 1'.

debian@debian:~/imagenet_runner_0.7.1$ ./ort_test_gem16 -1 detection_quanV2.onnx -i images/2.jpg -x 2 -O 1
Loaded runner program
Using systolic in mode 2
Using Onnxruntime C++ API
Number of inputs = 1
Input 0 : name=input, type=1, num_dims=4: [1, 3, 320, 320, ]
Number of outputs = 12
Output 0 : name=299, type=1, num_dims=4: [1, 12, 20, 20, ]
Output 1 : name=301, type=1, num_dims=4: [1, 12, 10, 10, ]
Output 2 : name=303, type=1, num_dims=4: [1, 12, 5, 5, ]
Output 3 : name=305, type=1, num_dims=4: [1, 12, 3, 3, ]
Output 4 : name=307, type=1, num_dims=4: [1, 12, 2, 2, ]
Output 5 : name=309, type=1, num_dims=4: [1, 12, 1, 1, ]
Output 6 : name=300, type=1, num_dims=4: [1, 24, 20, 20, ]
Output 7 : name=302, type=1, num_dims=4: [1, 24, 10, 10, ]
Output 8 : name=304, type=1, num_dims=4: [1, 24, 5, 5, ]
Output 9 : name=306, type=1, num_dims=4: [1, 24, 3, 3, ]
Output 10 : name=308, type=1, num_dims=4: [1, 24, 2, 2, ]
Output 11 : name=310, type=1, num_dims=4: [1, 24, 1, 1, ]
yolox init
Loading image
Image dimensions: 256 256 3
Called into systolic matmul!
Using accelerated matmul with dimensions (16, 25600, 147)
Called into systolic matmul!
Using accelerated matmul with dimensions (16, 6400, 144)
Called into systolic matmul!
Using accelerated matmul with dimensions (16, 6400, 144)
Called into systolic add
Called into systolic matmul!
Using accelerated matmul with dimensions (16, 6400, 144)
Called into systolic matmul!
Using accelerated matmul with dimensions (16, 6400, 144)
Called into systolic add
Called into systolic matmul!
Using accelerated matmul with dimensions (32, 1600, 16)
Called into systolic matmul!
Using accelerated matmul with dimensions (32, 1600, 144)
Called into systolic matmul!
Using accelerated matmul with dimensions (32, 1600, 288)
Called into systolic add
Called into systolic matmul!
Using accelerated matmul with dimensions (32, 1600, 288)
Called into systolic matmul!
Using accelerated matmul with dimensions (32, 1600, 288)
Called into systolic add
Called into systolic matmul!
Using accelerated matmul with dimensions (64, 400, 32)
Called into systolic matmul!
Using accelerated matmul with dimensions (64, 400, 288)
Called into systolic matmul!
Using accelerated matmul with dimensions (64, 400, 576)
Called into systolic add
Called into systolic matmul!
Using accelerated matmul with dimensions (64, 400, 576)
Called into systolic matmul!
Using accelerated matmul with dimensions (64, 400, 576)
Called into systolic add
Called into systolic matmul!
Using accelerated matmul with dimensions (128, 100, 64)
Called into systolic matmul!
Using accelerated matmul with dimensions (128, 100, 576)
Called into systolic matmul!
Using accelerated matmul with dimensions (128, 100, 1152)
Called into systolic add
Called into systolic matmul!
Using accelerated matmul with dimensions (128, 100, 1152)
Called into systolic matmul!
Using accelerated matmul with dimensions (128, 100, 1152)
Called into systolic add
1x1 case!
Called into systolic matmul!
Using accelerated matmul with dimensions (256, 100, 128)
Called into systolic matmul!
Using accelerated matmul with dimensions (512, 25, 2304)
1x1 case!
Called into systolic matmul!
Using accelerated matmul with dimensions (128, 25, 512)
Called into systolic matmul!
Using accelerated matmul with dimensions (256, 9, 1152)
1x1 case!
Called into systolic matmul!
Using accelerated matmul with dimensions (128, 9, 256)
Called into systolic matmul!
Using accelerated matmul with dimensions (256, 4, 1152)
1x1 case!
Called into systolic matmul!
Using accelerated matmul with dimensions (64, 4, 256)
Called into systolic matmul!
Using accelerated matmul with dimensions (128, 1, 576)
Called into systolic matmul!
Using accelerated matmul with dimensions (24, 1, 1152)
Called into systolic matmul!
Using accelerated matmul with dimensions (24, 4, 2304)
Called into systolic matmul!
Using accelerated matmul with dimensions (24, 9, 2304)
Called into systolic matmul!
Using accelerated matmul with dimensions (24, 25, 4608)
Called into systolic matmul!
Using accelerated matmul with dimensions (24, 100, 1152)
Called into systolic matmul!
Using accelerated matmul with dimensions (24, 400, 576)
Called into systolic matmul!
Using accelerated matmul with dimensions (12, 1, 1152)
Called into systolic matmul!
Using accelerated matmul with dimensions (12, 4, 2304)
Called into systolic matmul!
Using accelerated matmul with dimensions (12, 9, 2304)
Called into systolic matmul!
Using accelerated matmul with dimensions (12, 25, 4608)
Called into systolic matmul!
Using accelerated matmul with dimensions (12, 100, 1152)
Called into systolic matmul!
Using accelerated matmul with dimensions (12, 400, 576)
0
11
2
0
0
0
84 140 207 264 0
resize took 0 cycles 5.440022 s
normalize_transpose took 0 cycles 2.139706 s
Done! Pre Process 1 took 0 cycles 7.579837 s
Done! Inference 1 took 0 cycles 17.962803 s
Done! Pre Process 1 took 0 cycles 1.224211 s
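
With '-O 1' the log prints the dimensions of each accelerated matmul, which allows a rough workload estimate: the product of the three dimensions is the multiply-accumulate count of that call, whatever order the runner prints them in. A minimal sketch, where `matmul_macs` is a hypothetical helper and the sample is the first two matmul calls from the log above:

```python
import re

# Matches the dimension lines printed by the runner in the '-O 1' log.
DIMS_RE = re.compile(r"Using accelerated matmul with dimensions \((\d+), (\d+), (\d+)\)")


def matmul_macs(log_text):
    """Return the multiply-accumulate count of every logged matmul call."""
    return [int(a) * int(b) * int(c) for a, b, c in DIMS_RE.findall(log_text)]


SAMPLE = """\
Called into systolic matmul!
Using accelerated matmul with dimensions (16, 25600, 147)
Called into systolic matmul!
Using accelerated matmul with dimensions (16, 6400, 144)
"""

macs = matmul_macs(SAMPLE)
print(macs, sum(macs))
```

For the two sample calls this prints `[60211200, 14745600] 74956800`. Feeding it the full log gives a per-layer list that can be lined up against the op sequence of the '-O 99' runs.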

I also tried running this model on Rocket64b1gem8. There it always runs correctly with '-O 99', and its inference time is much shorter than on gem16 (1.93 s vs 5.22 s), which is weird.
Below is the rocket64b1gem8 result with '-O 99'.

debian@debian:~/imagenet_runner_0.7.1$ ./ort_test_gem8 -1 detection_quanV2.onnx -i images/2.jpg -x 2 -O 99
Loaded runner program
Using systolic in mode 2
Using Onnxruntime C++ API
Number of inputs = 1
Input 0 : name=input, type=1, num_dims=4: [1, 3, 320, 320, ]
Number of outputs = 12
Output 0 : name=299, type=1, num_dims=4: [1, 12, 20, 20, ]
Output 1 : name=301, type=1, num_dims=4: [1, 12, 10, 10, ]
Output 2 : name=303, type=1, num_dims=4: [1, 12, 5, 5, ]
Output 3 : name=305, type=1, num_dims=4: [1, 12, 3, 3, ]
Output 4 : name=307, type=1, num_dims=4: [1, 12, 2, 2, ]
Output 5 : name=309, type=1, num_dims=4: [1, 12, 1, 1, ]
Output 6 : name=300, type=1, num_dims=4: [1, 24, 20, 20, ]
Output 7 : name=302, type=1, num_dims=4: [1, 24, 10, 10, ]
Output 8 : name=304, type=1, num_dims=4: [1, 24, 5, 5, ]
Output 9 : name=306, type=1, num_dims=4: [1, 24, 3, 3, ]
Output 10 : name=308, type=1, num_dims=4: [1, 24, 2, 2, ]
Output 11 : name=310, type=1, num_dims=4: [1, 24, 1, 1, ]
yolox init
Loading image
Image dimensions: 256 256 3
Called into systolic conv
Using systolic pooling
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
0
11
2
0
0
0
84 140 207 264 0
resize took 0 cycles 1.830045 s
normalize_transpose took 0 cycles 1.073210 s
Done! Pre Process 1 took 0 cycles 2.903357 s
Done! Inference 1 took 0 cycles 1.933709 s
Done! Pre Process 1 took 0 cycles 0.445910 s
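
The timings from the three completed runs can be put side by side with a small script. This is a sketch, not part of the runner: `seconds` is a hypothetical helper, and the embedded lines are copied from the logs above. Every line reports 0 cycles, so only the wall-clock seconds are used.

```python
import re


def seconds(log_text, label):
    """Return the first '<label> took N cycles X s' duration, in seconds."""
    m = re.search(rf"{re.escape(label)} took \d+ cycles ([\d.]+) s", log_text)
    if m is None:
        raise ValueError(f"no timing line for {label!r}")
    return float(m.group(1))


# Timing lines copied from the three completed runs.
RUNS = {
    "gem16 -O 99": "resize took 0 cycles 5.487418 s\n"
                   "Done! Inference 1 took 0 cycles 5.220877 s\n",
    "gem16 -O 1": "resize took 0 cycles 5.440022 s\n"
                  "Done! Inference 1 took 0 cycles 17.962803 s\n",
    "gem8 -O 99": "resize took 0 cycles 1.830045 s\n"
                  "Done! Inference 1 took 0 cycles 1.933709 s\n",
}

base_inf = seconds(RUNS["gem8 -O 99"], "Inference 1")
base_resize = seconds(RUNS["gem8 -O 99"], "resize")

# Ratio of each run to the gem8 run: (inference ratio, resize ratio).
ratios = {}
for name, log in RUNS.items():
    inf = seconds(log, "Inference 1")
    rs = seconds(log, "resize")
    ratios[name] = (inf / base_inf, rs / base_resize)
    print(f"{name}: inference {inf:.2f} s ({ratios[name][0]:.2f}x gem8), "
          f"resize {rs:.2f} s ({ratios[name][1]:.2f}x gem8)")
```

On these numbers gem16 at '-O 99' is about 2.7x slower than gem8 for inference and about 3.0x slower for resize. If resize runs purely on the CPU (an assumption), the gap may not come from the accelerator alone, so the clock frequency and memory configuration of the two bitstreams may be worth comparing.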

I also switched to 2GiB DDR3 and got the same result: the model gets stuck at the same place.
What might be the problem?
Thanks!
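
Since the hang is intermittent, a repeat-run harness with a timeout would give a hang rate that can be compared across configurations (gem16 vs gem8, '-O 1' vs '-O 99', 8GiB vs 2GiB). The following is a minimal sketch under stated assumptions: `count_hangs` is a hypothetical helper, the timeout value is arbitrary, and it assumes a stuck process can still be killed, which may not hold if the process is blocked on the accelerator.

```python
import subprocess
import sys


def count_hangs(cmd, runs=20, timeout_s=120.0):
    """Run cmd repeatedly and count how many runs exceed timeout_s."""
    hangs = 0
    for _ in range(runs):
        try:
            # subprocess.run kills the child when the timeout expires.
            subprocess.run(cmd, stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            hangs += 1
    return hangs


# Harmless self-check with a command that exits immediately.
print(count_hangs([sys.executable, "-c", "pass"], runs=1, timeout_s=30))

# On the board, the call would look like this (command taken from the logs above):
# count_hangs(["./ort_test_gem16", "-1", "detection_quanV2.onnx",
#              "-i", "images/2.jpg", "-x", "2", "-O", "99"])
```

A timeout a few times longer than the slowest completed run (about 18 s of inference at '-O 1', plus pre-processing) keeps slow runs from being counted as hangs.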

@Leo-Z-Li

@Alwinnnn Hi, were you able to solve this issue? I'm facing the same thing.

@Alwinnnn

Alwinnnn commented Jan 12, 2024

@Leo-Z-Li
Sorry, I haven't fixed this issue yet. However, when I replaced rocket64 with the BOOM medium core, things got worse: the program gets stuck earlier and cannot even execute ROCCTEST_RESTNET50-linux.

@Leo-Z-Li

Leo-Z-Li commented May 2, 2024

@Alwinnnn Which FPGA are you using? Is it the nexys-video?
