Note: The routine conj_grad supplies three implementations of the sparse matrix-vector multiply. The default is not loop unrolled; the two alternates are unrolled to a depth of 2 and to a depth of 8. Please experiment with these to find the fastest for your particular architecture. When reporting timing results, any of the three may be used without penalty.

Performance examples: On the sp2-66MHz-WN with 16 nodes, the non-unrolled multiply is slightly (perhaps 5%) faster than the unrolled-by-2 version. On the Cray T3D the reverse is true: the unrolled-by-2 version is some 10% faster. The unrolled-by-8 version is significantly faster on the Cray T3D, making the code 1.5 times faster overall.
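
For readers unfamiliar with the technique, the following is a minimal sketch in C of what "not unrolled" and "unrolled to a depth of 2" mean for a sparse matrix-vector multiply. It is not the code in conj_grad: the function names spmv_plain and spmv_unroll2, the compressed-row array names (rowstr, colidx, a), and the way the odd leftover element is handled are all assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical compressed-row layout (not taken from conj_grad):
   the nonzeros of row i are a[rowstr[i]] .. a[rowstr[i+1]-1],
   and colidx[k] is the column of nonzero a[k]. */

/* y = A*x, inner loop not unrolled. */
void spmv_plain(size_t n, const size_t *rowstr, const size_t *colidx,
                const double *a, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++) {
        double sum = 0.0;
        for (size_t k = rowstr[i]; k < rowstr[i + 1]; k++)
            sum += a[k] * x[colidx[k]];
        y[i] = sum;
    }
}

/* y = A*x, inner loop unrolled to a depth of 2. */
void spmv_unroll2(size_t n, const size_t *rowstr, const size_t *colidx,
                  const double *a, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++) {
        size_t k = rowstr[i];
        size_t end = rowstr[i + 1];
        double sum0 = 0.0, sum1 = 0.0;

        /* If the row has an odd number of nonzeros, peel off one
           element so the remaining count is even. */
        if ((end - k) & 1) {
            sum0 = a[k] * x[colidx[k]];
            k++;
        }
        /* Two independent partial sums per iteration. */
        for (; k < end; k += 2) {
            sum0 += a[k] * x[colidx[k]];
            sum1 += a[k + 1] * x[colidx[k + 1]];
        }
        y[i] = sum0 + sum1;
    }
}
```

The unrolled version adds the terms in a different order, so in floating point its result may differ from the plain version in the last few bits. Which variant runs faster depends on the compiler and the machine, which is why timing each one on the target architecture is worthwhile.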