Performance deficit on Sparse Matrix Multiply #201

JamesKingdon · 2017-10-02T19:56:29Z

Performance of the Sparse Matrix Multiply kernel of the SciMark benchmark is significantly lower on OpenJ9 than on HotSpot.

Results on a 32-core Xeon(R) CPU E7-8867:

OpenJ9 615 Mflops vs HotSpot 1755 Mflops

Part of the issue is that the test is short-running, so we spend most of the run in a profiling compile. Wrapping the test in a harness that runs multiple iterations raises the OpenJ9 throughput to 1104 Mflops.
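For anyone who wants to reproduce the multi-iteration measurement, here is a minimal sketch of such a harness. It is not the harness used for the numbers above: the class name `SparseHarness`, the matrix layout, and the sizes are assumptions chosen for illustration, and the kernel is only modeled on a compressed-row sparse multiply.

```java
import java.util.Arrays;

// Hypothetical harness: repeats a compressed-row sparse multiply so that
// later iterations run after the JIT has finished its profiling compile.
final class SparseHarness {
    // y = A * x, where val holds the nonzeros of A, col their column
    // indices, and row[r]..row[r + 1] the range of nonzeros in row r.
    static void matmult(double[] y, double[] val, int[] row, int[] col, double[] x) {
        for (int r = 0; r < y.length; r++) {
            double sum = 0.0;
            for (int i = row[r]; i < row[r + 1]; i++) {
                sum += x[col[i]] * val[i];
            }
            y[r] = sum;
        }
    }

    // Returns the Mflops measured in each iteration. The matrix structure
    // (nzPerRow consecutive columns per row) is an assumption, not SciMark's.
    static double[] runIterations(int iterations, int n, int nzPerRow, int reps) {
        double[] x = new double[n];
        double[] y = new double[n];
        double[] val = new double[n * nzPerRow];
        int[] col = new int[n * nzPerRow];
        int[] row = new int[n + 1];
        Arrays.fill(x, 1.0);
        Arrays.fill(val, 0.5);
        for (int r = 0; r < n; r++) {
            row[r] = r * nzPerRow;
            for (int i = 0; i < nzPerRow; i++) {
                col[r * nzPerRow + i] = (r + i) % n;
            }
        }
        row[n] = n * nzPerRow;

        double[] mflops = new double[iterations];
        for (int it = 0; it < iterations; it++) {
            long start = System.nanoTime();
            for (int k = 0; k < reps; k++) {
                matmult(y, val, row, col, x);
            }
            double seconds = (System.nanoTime() - start) / 1e9;
            // One multiply and one add per nonzero per repetition.
            mflops[it] = 2.0 * val.length * reps / seconds / 1e6;
        }
        return mflops;
    }

    public static void main(String[] args) {
        double[] results = runIterations(10, 1000, 5, 10000);
        for (int i = 0; i < results.length; i++) {
            System.out.printf("iteration %d: %.1f Mflops%n", i, results[i]);
        }
    }
}
```

Comparing the first iteration against the later ones gives a rough idea of how much of a single short run is spent before the fully optimized code is in place.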

Studying the compilation log suggests an opportunity to exploit x86 fused multiply-add (FMA) instructions, and this is being investigated.
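To make the fused multiply-add idea concrete, the sketch below writes the multiply-accumulate pattern of the kernel's inner loop two ways: as a separate multiply and add, and with `Math.fma` (available since Java 9). The class name `FmaDot` and the dense dot-product shape are assumptions for illustration only; this is not the code from the compilation log, and it says nothing about what the JIT currently generates.

```java
// Hypothetical illustration of the multiply-accumulate pattern.
final class FmaDot {
    // Separate multiply and add: each operation is rounded on its own.
    static double dotPlain(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Fused form: a[i] * b[i] + sum is computed with a single rounding.
    static double dotFma(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum = Math.fma(a[i], b[i], sum);
        }
        return sum;
    }
}
```

Because `Math.fma` rounds once instead of twice, the two methods can differ in the last bits of the result for general inputs, which is worth keeping in mind when comparing benchmark checksums.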

andrewcraik · 2017-10-04T21:12:11Z

See issue #199 for a discussion of profiling and of the work underway to improve the performance of profiling code, which should help here at least to some extent. The opportunity for fused multiply-add is interesting and worth further study.

pshipton added the comp:jit label Dec 28, 2017

mstoodle added the userRaised label Jan 31, 2018

DanHeidinga added this to User Raised issues in Issue tracking Feb 14, 2018

DanHeidinga added the perf label Feb 14, 2018
