Improve spectralnorm #9
Comments
Could we use the simd crate? It's not tested on stable AFAIK, but it might work on beta, at least.
The simd crate is used on the rust repo. You can contribute here, but it'll be merged here once stable supports SIMD (or a couple of days before). This issue is about writing the code so that LLVM vectorizes it with SIMD instructions without using the simd crate.
Ah, I see. Do you mean something like the following?
I don't see any advantage to that in a minimal test setup (look at the assembly output). Maybe within a loop?
Also if I try to keep the operations within x2 representations, the assembly stays the same: http://is.gd/65q7NN
It seems different in release mode.
And yes, I was thinking of
fn ax2(i: u64x2, j: u64x2) -> f64x2 {
    ((i + j) * (i + j + u64x2(1, 1)) / u64x2(2, 2) + i + u64x2(1, 1)).into()
}
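For comparison, here is a minimal sketch of the same two-lane idea on stable Rust, without the simd crate. It assumes plain `[u64; 2]` and `[f64; 2]` arrays as stand-ins for `u64x2` and `f64x2`; the helper name `denom` is hypothetical and not from the benchmark source. Whether LLVM actually turns this into SIMD instructions is not guaranteed and has to be checked in the release-mode assembly.

```rust
/// Scalar reference: the denominator of A(i, j),
/// i.e. (i + j) * (i + j + 1) / 2 + i + 1.
/// `denom` is a hypothetical helper name used only for this sketch.
fn denom(i: u64, j: u64) -> u64 {
    (i + j) * (i + j + 1) / 2 + i + 1
}

/// Two-lane version using plain arrays in place of u64x2 / f64x2.
/// The fixed-length loop gives LLVM a chance to vectorize it,
/// but that is an optimization, not a guarantee.
fn ax2(i: [u64; 2], j: [u64; 2]) -> [f64; 2] {
    let mut out = [0.0; 2];
    for k in 0..2 {
        out[k] = denom(i[k], j[k]) as f64;
    }
    out
}

fn main() {
    // Lanes hold (i, j) = (0, 0) and (0, 1).
    let d = ax2([0, 0], [0, 1]);
    println!("{:?}", d); // prints [1.0, 2.0]
}
```

Like the `ax2` above, this returns the denominators, so the caller still divides by each lane.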
Yeah, I just diffed the assembly versions. They are different, although I'm not sure if the vectorized version is actually faster. Guess there's only one way to find out... 😄
The fn A() is not autovectorized, so SIMD is not used anywhere (as can be seen in the generated ASM). I think writing something like fn Ax2(i: u64x2, j: u64x2) -> f64x2 and using it where we use A() should do the trick.
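As a rough illustration of "using it where we use A()", the sketch below computes one entry of the product A·v two columns at a time. It is written for stable Rust with plain arrays instead of `u64x2`/`f64x2`, and the names `a`, `ax2` and `row_dot` are assumptions for this example, not the benchmark's actual code. Because the two lanes are accumulated separately, the result can differ from a strictly sequential sum in the last bits.

```rust
/// Scalar A(i, j) = 1 / ((i + j) * (i + j + 1) / 2 + i + 1).
fn a(i: usize, j: usize) -> f64 {
    1.0 / ((i + j) * (i + j + 1) / 2 + i + 1) as f64
}

/// Two-lane denominator of A, with arrays standing in for SIMD types.
fn ax2(i: [usize; 2], j: [usize; 2]) -> [f64; 2] {
    let mut out = [0.0; 2];
    for k in 0..2 {
        out[k] = ((i[k] + j[k]) * (i[k] + j[k] + 1) / 2 + i[k] + 1) as f64;
    }
    out
}

/// Computes (A * v)[i], handling two columns per iteration.
/// `row_dot` is a hypothetical name for this sketch.
fn row_dot(i: usize, v: &[f64]) -> f64 {
    let mut acc = [0.0f64; 2];
    let mut chunks = v.chunks_exact(2);
    let mut j = 0;
    for pair in &mut chunks {
        let d = ax2([i, i], [j, j + 1]);
        acc[0] += pair[0] / d[0];
        acc[1] += pair[1] / d[1];
        j += 2;
    }
    // Fall back to the scalar A for a leftover element when v has odd length.
    let mut sum = acc[0] + acc[1];
    for (k, x) in chunks.remainder().iter().enumerate() {
        sum += x * a(i, j + k);
    }
    sum
}

fn main() {
    let v = [1.0, 1.0, 1.0];
    println!("{}", row_dot(0, &v));
}
```

Keeping two independent accumulators is what lets the lanes proceed without depending on each other. Whether this is faster than the scalar loop still has to be measured.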