
Improve spectralnorm #9

Closed
TeXitoi opened this issue Aug 12, 2015 · 7 comments · Fixed by #22

TeXitoi (Owner) commented Aug 12, 2015

The fn A() is not autovectorized, so SIMD is not used anywhere (as can be seen in the generated ASM). I think writing something like fn Ax2(i: u64x2, j: u64x2) -> f64x2 and using it where we currently use A() should do the trick.
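For context, a minimal sketch of the scalar function under discussion, assuming the usual spectralnorm benchmark definition (the exact signature in this repo may differ): A(i, j) computes the denominator of the infinite-matrix entry 1/((i+j)(i+j+1)/2 + i + 1).

```rust
// Sketch of the scalar A() being discussed (assumed from the standard
// spectralnorm benchmark; not copied from this repository).
#[allow(non_snake_case)]
fn A(i: u64, j: u64) -> f64 {
    // Denominator of the matrix entry 1 / ((i+j)(i+j+1)/2 + i + 1).
    ((i + j) * (i + j + 1) / 2 + i + 1) as f64
}

fn main() {
    // First few denominators along row 0 and column 0.
    assert_eq!(A(0, 0), 1.0);
    assert_eq!(A(0, 1), 2.0);
    assert_eq!(A(1, 0), 3.0);
    println!("ok");
}
```

Because each call is independent, evaluating two (i, j) pairs at once is what the proposed Ax2 would expose to the vectorizer.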

llogiq (Contributor) commented Sep 22, 2015

Could we use the simd crate? It doesn't work on stable AFAIK, but it might work on beta, at least.

TeXitoi (Owner, Author) commented Sep 22, 2015

The simd crate is used in the rust repo. You can contribute such a version here, but it will only be merged once stable supports SIMD (or a couple of days before).

This issue is about writing the code such that llvm optimizes with simd without using the simd crate.

llogiq (Contributor) commented Sep 22, 2015

Ah, I see. Do you mean something like the following?

#[allow(non_camel_case_types)]
#[derive(Debug)]
struct u64x2(u64, u64);

// f64x2 is needed as Ax2's return type; a matching two-lane tuple struct.
#[allow(non_camel_case_types)]
#[derive(Debug)]
struct f64x2(f64, f64);

impl std::ops::Add for u64x2 {
    type Output = Self;
    fn add(self, rhs: Self) -> Self {
        u64x2(self.0 + rhs.0, self.1 + rhs.1)
    }
}
impl std::ops::Div for u64x2 {
    type Output = Self;
    fn div(self, rhs: Self) -> Self {
        u64x2(self.0 / rhs.0, self.1 / rhs.1)
    }
}

#[allow(non_snake_case)]
fn Ax2(i: u64x2, j: u64x2) -> f64x2 {
    f64x2(((i.0 + j.0) * (i.0 + j.0 + 1) / 2 + i.0 + 1) as f64,
          ((i.1 + j.1) * (i.1 + j.1 + 1) / 2 + i.1 + 1) as f64)
}

I don't see any advantage to that in a minimal test setup (looking at the assembly output). Maybe within a loop?
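The "within a loop" idea could be sketched like this: sum a row two columns at a time, so LLVM sees two independent lanes per iteration. This is a hypothetical illustration (row_sum and the tuple structs are assumptions, not code from the benchmark), and whether LLVM actually emits SIMD for it still has to be checked in the assembly.

```rust
// Hedged sketch: two-lane tuple structs standing in for real SIMD types.
#[allow(non_camel_case_types)]
#[derive(Clone, Copy)]
struct u64x2(u64, u64);

#[allow(non_camel_case_types)]
#[derive(Clone, Copy, Debug)]
struct f64x2(f64, f64);

#[allow(non_snake_case)]
fn Ax2(i: u64x2, j: u64x2) -> f64x2 {
    f64x2(((i.0 + j.0) * (i.0 + j.0 + 1) / 2 + i.0 + 1) as f64,
          ((i.1 + j.1) * (i.1 + j.1 + 1) / 2 + i.1 + 1) as f64)
}

// Sum 1/A(i, j) over row i, processing two columns per iteration.
fn row_sum(i: u64, n: u64) -> f64 {
    let mut sum = 0.0;
    let mut j = 0;
    while j + 1 < n {
        let a = Ax2(u64x2(i, i), u64x2(j, j + 1));
        sum += 1.0 / a.0 + 1.0 / a.1;
        j += 2;
    }
    if j < n {
        // Scalar tail when n is odd.
        sum += 1.0 / (((i + j) * (i + j + 1) / 2 + i + 1) as f64);
    }
    sum
}

fn main() {
    // Row 0 over 3 columns: 1/1 + 1/2 + 1/4 = 1.75
    let s = row_sum(0, 3);
    assert!((s - 1.75).abs() < 1e-12);
    println!("{}", s);
}
```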

llogiq (Contributor) commented Sep 22, 2015

Also, if I try to keep the operations within x2 representations, the assembly stays the same: http://is.gd/65q7NN

TeXitoi (Owner, Author) commented Sep 22, 2015

It seems different in release mode.

TeXitoi (Owner, Author) commented Sep 22, 2015

And yes, I was thinking of

fn ax2(i: u64x2, j: u64x2) -> f64x2 {
    ((i + j) * (i + j + u64x2(1, 1)) / u64x2(2, 2) + i + u64x2(1, 1)).into()
}
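For that .into() call to compile, a From conversion between the two lane types would be needed, which neither snippet in this thread defines. A minimal sketch (the struct definitions are repeated here so the example stands alone; names are assumed from the thread):

```rust
#[allow(non_camel_case_types)]
#[derive(Clone, Copy)]
struct u64x2(u64, u64);

#[allow(non_camel_case_types)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct f64x2(f64, f64);

impl From<u64x2> for f64x2 {
    // Convert both lanes at once, mirroring the scalar `as f64` casts.
    fn from(v: u64x2) -> f64x2 {
        f64x2(v.0 as f64, v.1 as f64)
    }
}

fn main() {
    let f: f64x2 = u64x2(2, 3).into();
    assert_eq!(f, f64x2(2.0, 3.0));
    println!("{:?}", f);
}
```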

llogiq (Contributor) commented Sep 22, 2015

Yeah, I just diffed the assembly versions. They are different, although I'm not sure if the vectorized version is actually faster. Guess there's only one way to find out... 😄
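One crude way to "find out" is a wall-clock comparison with std::time::Instant. This harness is a hypothetical sketch (not part of the benchmark, and Instant postdates the Rust available at the time of this thread); a real measurement would compile both variants with --release and compare.

```rust
// Hypothetical timing harness: measure summing 1/A(i, j) over an n-by-n grid.
use std::time::Instant;

#[allow(non_snake_case)]
fn A(i: u64, j: u64) -> f64 {
    ((i + j) * (i + j + 1) / 2 + i + 1) as f64
}

fn main() {
    let n = 2000u64;
    let start = Instant::now();
    let mut sum = 0.0;
    for i in 0..n {
        for j in 0..n {
            sum += 1.0 / A(i, j);
        }
    }
    // Print the sum so the loop can't be optimized away entirely.
    println!("sum = {}, took {:?}", sum, start.elapsed());
    assert!(sum > 0.0);
}
```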
