Improve spectralnorm #9

Closed
TeXitoi opened this Issue Aug 12, 2015 · 7 comments

@TeXitoi
Owner

TeXitoi commented Aug 12, 2015

The fn A() is not autovectorized, so SIMD is not used anywhere (as can be seen in the generated ASM). I think writing something like fn Ax2(i: u64x2, j: u64x2) -> f64x2 and using it wherever we use A() should do the trick.
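
(For reference, a hedged sketch of the scalar function under discussion, assuming the classic spectralnorm formulation where the matrix entry is 1 / ((i + j)(i + j + 1)/2 + i + 1) and A() returns the denominator; the actual name and signature in this repo may differ.)

fn a(i: u64, j: u64) -> f64 {
    // Denominator of the spectralnorm matrix entry at (i, j).
    ((i + j) * (i + j + 1) / 2 + i + 1) as f64
}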

@llogiq
Contributor

llogiq commented Sep 22, 2015

Could we use the simd crate? It's not tested on stable AFAIK, but it might work on beta, at least.

@TeXitoi
Owner

TeXitoi commented Sep 22, 2015

The simd crate is used in the rust repo. You can contribute here, but it will only be merged here once stable supports SIMD (or a couple of days before).

This issue is about writing the code so that LLVM optimizes it with SIMD without using the simd crate.

@llogiq
Contributor

llogiq commented Sep 22, 2015

Ah, I see. Do you mean something like the following?

// Two-lane integer and float vectors as plain tuple structs.
#[allow(non_camel_case_types)]
#[derive(Debug, Clone, Copy)]
struct u64x2(u64, u64);
#[allow(non_camel_case_types)]
#[derive(Debug, Clone, Copy)]
struct f64x2(f64, f64);

impl std::ops::Add for u64x2 {
    type Output = Self;
    fn add(self, rhs: Self) -> Self {
        u64x2(self.0 + rhs.0, self.1 + rhs.1)
    }
}
impl std::ops::Div for u64x2 {
    type Output = Self;
    fn div(self, rhs: Self) -> Self {
        u64x2(self.0 / rhs.0, self.1 / rhs.1)
    }
}

// Computes two denominators of the spectralnorm matrix entry at once.
#[allow(non_snake_case)]
fn Ax2(i: u64x2, j: u64x2) -> f64x2 {
    f64x2(((i.0 + j.0) * (i.0 + j.0 + 1) / 2 + i.0 + 1) as f64,
          ((i.1 + j.1) * (i.1 + j.1 + 1) / 2 + i.1 + 1) as f64)
}

I don't see any advantage to that in a minimal test setup (look at the assembly output). Maybe within a loop?
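
(As an illustration of "within a loop": a hedged sketch of a spectralnorm-style matrix-vector product using Ax2; mult_av2 and the slice-based signature are assumptions, not the repo's actual code.)

// Computes out = A * v, evaluating two entries of row `i` per iteration
// (assumes v.len() is even for brevity).
fn mult_av2(v: &[f64], out: &mut [f64]) {
    for (i, out_i) in out.iter_mut().enumerate() {
        let mut sum = 0.0;
        for jj in 0..v.len() / 2 {
            let j = 2 * jj;
            let d = Ax2(u64x2(i as u64, i as u64),
                        u64x2(j as u64, j as u64 + 1));
            sum += v[j] / d.0 + v[j + 1] / d.1;
        }
        *out_i = sum;
    }
}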


@llogiq
Contributor

llogiq commented Sep 22, 2015

Also, if I try to keep the operations within the x2 representations, the assembly stays the same: http://is.gd/65q7NN

@TeXitoi
Owner

TeXitoi commented Sep 22, 2015

It seems different in release mode.

@TeXitoi
Owner

TeXitoi commented Sep 22, 2015

And yes, I was thinking of

fn ax2(i: u64x2, j: u64x2) -> f64x2 {
    ((i + j) * (i + j + u64x2(1, 1)) / u64x2(2, 2) + i + u64x2(1, 1)).into()
}
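
(For this version to compile with the tuple-struct u64x2/f64x2 sketched earlier, it also needs lane-wise multiplication, a u64x2-to-f64x2 conversion for the .into(), and Copy on u64x2 so i and j can be reused within the expression; a hedged sketch:)

// Lane-wise multiplication for the (i + j) * (...) term.
impl std::ops::Mul for u64x2 {
    type Output = Self;
    fn mul(self, rhs: Self) -> Self {
        u64x2(self.0 * rhs.0, self.1 * rhs.1)
    }
}

// Conversion driving `.into()`: cast each integer lane to f64.
impl From<u64x2> for f64x2 {
    fn from(v: u64x2) -> Self {
        f64x2(v.0 as f64, v.1 as f64)
    }
}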

@llogiq
Contributor

llogiq commented Sep 22, 2015

Yeah, I just diffed the assembly versions. They are different, although I'm not sure if the vectorized version is actually faster. Guess there's only one way to find out... 😄
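
(One quick way to find out; a rough, self-contained timing sketch built on the definitions above, not a rigorous benchmark:)

use std::time::Instant;

// Sums 1/A over a block of indices with the two-lane Ax2 and prints the
// elapsed time; swapping in the scalar A() gives a baseline to compare.
fn main() {
    let n = 4_000u64;
    let start = Instant::now();
    let mut acc = 0.0;
    for i in 0..n {
        for jj in 0..n / 2 {
            let j = 2 * jj;
            let d = Ax2(u64x2(i, i), u64x2(j, j + 1));
            acc += 1.0 / d.0 + 1.0 / d.1;
        }
    }
    println!("sum = {}, elapsed = {:?}", acc, start.elapsed());
}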
