Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement normalize and normalize! #13681

Merged
merged 4 commits into from
Oct 23, 2015
Merged

Implement normalize and normalize! #13681

merged 4 commits into from
Oct 23, 2015

Conversation

jiahao
Copy link
Member

@jiahao jiahao commented Oct 20, 2015

Simple helper functions for normalizing vectors

Closes #12047

@andreasnoack
Copy link
Member

Maybe it should also return the norm. E.g., when using a normalize function in a Lanczos algorithm you want both the norm and the unit vector, but you don't want to compute the norm twice.

@stevengj
Copy link
Member

Considering that writing this function yourself takes all of one line, I would think that any specialized usages that require e.g. both the norm and the vector can just write their own implementation.

`normalize`

"""
function normalize!(v::AbstractVector, p::Real=2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just a one-liner normalize!(v::AbstractVector, p::Real=2) = scale!(v, inv(norm(v, p))?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately the one-liner (and the equivalent two line version here) is vulnerable to overflow, since inv(1e-310) == Inf. In-place division is safe though, so I've used that in the next iteration of this PR.

@jiahao
Copy link
Member Author

jiahao commented Oct 20, 2015

Maybe it should also return the norm. E.g., when using a normalize function in a Lanczos algorithm you want both the norm and the unit vector, but you don't want to compute the norm twice.

We could define QR on an AbstractVector to do exactly this.

@andreasnoack
Copy link
Member

We could define QR on an AbstractVector to do exactly this.

Good idea! I think we should consider that.

"""
function normalize!(v::AbstractVector, p::Real=2)
nrm = norm(v, p)
scale!(v, inv(nrm))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try x = [1e-310, 1e-310].

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to... :/

@Jutho
Copy link
Contributor

Jutho commented Oct 20, 2015

Would it make sense to call scale! instead of writing the for loop for the in-place normalization. Then it can dispatch to BLAS for BLAS vectors. I'd say these are really one-liners.

normalize(v,p=2) = v/norm(v,p)
normalize!(v,p=2) = scale!(v,inv(norm(v,p)))

@nalimilan
Copy link
Member

IMHO, the fact that you need to discuss the best implementation shows that these are trickier that it seems to write correctly, and deserve being in Base.

@stevengj
Copy link
Member

@nalimilan, finding the fastest implementation of anything is tricky — computers are complicated beasts. All of these algorithms are O(n), and we are just fiddling with small constant factors. I don't think that is a great argument for putting a one-liner into base, because it would apply to almost anything.

The argument, as I understand it, is that people need this function so often that saving the few extra characters is worth it. Especially since v appears twice in the one-liner, having a function saves you from having to declare a temporary variable in the case where v is computed on the fly.

(Note that neither Matlab nor NumPy seem to bother to include this function, however.)

@jiahao
Copy link
Member Author

jiahao commented Oct 20, 2015

@Jutho no, I tried that the first time. scale!(v, inv(nrm)) is prone to overflow. For nrm less than machine epsilon, inv(nrm) is Inf. Furthermore it's only BLAS1, there's effectively no speedup calling out to BLAS.

@KristofferC
Copy link
Sponsor Member

Could scaling with inv(norm(v,p) be done only in those cases the norm is not too small (for some definition of "too small").

Even though BLAS doesn't matter the fact that you avoid the division speeds things up by ~ 60%

@Jutho
Copy link
Contributor

Jutho commented Oct 20, 2015

@jiahao Really? Not that I would trust normalising a vector with such a small norm anyway, but surely the following (overly) simple test succeeds (though I trust your tests are saying otherwise):

a=randn(100);
b=a*1e-200;
scale!(b,inv(norm(b)));
scale!(a,inv(norm(a)));
isapprox(a,b) # -> true

@jiahao
Copy link
Member Author

jiahao commented Oct 20, 2015

@nalimilan It's also moderately annoying to write the library function because it has to work for the general case. Most people are probably not going to normalize a tiny vector with norm less than epsilon, but as the library writer I have to worry about the edge cases. If necessary, I will have to sacrifice speed for correctness and inevitably some users will whine about the library code being slow.

@jiahao
Copy link
Member Author

jiahao commented Oct 20, 2015

@Jutho see the second test case I put in.

@KristofferC it makes no sense to do run time algorithm selection on the value. As I said, BLAS1 is just the naive for loop underneath; you are not gaining much calling out to BLAS and furthermore have the pay the overhead of an external library call.

@jiahao
Copy link
Member Author

jiahao commented Oct 20, 2015

Also, the naive for loop with division is a little more accurate in my testing because of fiddly details about roundoff. The first test block I put in fails using inv(norm) because the first compile is nextfloat(0.6), not 0.6.

@KristofferC
Copy link
Sponsor Member

@jihao, Hmm, I clearly acknowledged that BLAS didn't matter but avoiding the division did:

Even though BLAS doesn't matter the fact that you avoid the division speeds things up by ~ 60%

Anyway, adding @inbounds makes LLVM vectorize the code to use SIMD instructions which gives quite significant performance increase (~60%). This also makes the division and multiplication method close to each other in performance.

@Jutho
Copy link
Contributor

Jutho commented Oct 20, 2015

I also agree that BLAS will not give any significant speedup; my main concern was minimal code.

The second test (the overflow) runs fine with normalize!(v)=scale!(v,inv(norm(v))) on my machine, but you are correct about the round off error in the value 0.6 the first test. Then again, it's not like Julia Base is free of round off error cause otherwise it should also not use BLAS matrix multiplication. Anyway, I don't have strong feelings about this, just wanted to contribute to keeping the code simple.

@jiahao
Copy link
Member Author

jiahao commented Oct 20, 2015

@KristofferC sorry, I missed that in the first reading. If you have benchmarked both the /nrm and *inv(nrm) versions and the latter produces a loop that is 60% faster, then it might be worth paying the penalty for a branch

@KristofferC
Copy link
Sponsor Member

These are my benchmarks comparing the two versions and @inbounds or not.

# /nrm
  0.513842 seconds (4 allocations: 160 bytes)

# /nrm and @inbounds --> SIMD
  0.308682 seconds (4 allocations: 160 bytes)

# *inv(nrm) + scale
  0.199295 seconds (4 allocations: 160 bytes)

Edit: I removed the one with @inbounds and *inv(nrm) because I should have just called scale! like Jutho said, and I didn't before (I used a Julia loop).

@andreasnoack
Copy link
Member

I think that the usual approach in BLAS/LAPACK is to scale the vector with some power of two if the norm is very small and then normalize by scaling with the inverse of the scaled norm. In that way, you can save a lot of divisions when the norm is really small.

Some of the BLAS functions are threaded so that might give a speedup (until we have threading).

@jiahao
Copy link
Member Author

jiahao commented Oct 20, 2015

Thanks @KristofferC, will investigate the overflow case, especially given that @Jutho cannot reproduce my second test case which I observed to overflow.

@andreasnoack
Copy link
Member

@jiahao The values have to be smaller than

julia> inv(prevfloat(Inf))
5.562684646268003e-309

@andreasnoack
Copy link
Member

@KristofferC There are some "thread" related statements in https://github.com/xianyi/OpenBLAS/blob/develop/interface/scal.c so I think it is threaded.

@Jutho
Copy link
Contributor

Jutho commented Oct 20, 2015

I can confirm a factor 2 or 3 difference between scale!(v,inv(norm(v))) and

function normalize!(v)
  inrm=inv(norm(v));
  @simd for i in eachindex(v)
    @inbounds v[i]*=inrm
  end
end

for a vector of size 1048576 * 10 (in favor of the first version).

@KristofferC
Copy link
Sponsor Member

@andreasnoack yep, I was wrong:

julia> b = copy(a); @time scale!(b, 2.0);
  0.085937 seconds (4 allocations: 160 bytes)

julia> blas_set_num_threads(1)

julia> b = copy(a); @time scale!(b, 2.0);
  0.128238 seconds (4 allocations: 160 bytes)

Using a julia loop with @inbounds has the same performance as single threaded BLAS,

@KristofferC
Copy link
Sponsor Member

@Jutho make sure you don't run the benchmark on already normalized arrays. OpenBLAS has:

if (alpha == ONE) return;

@Jutho
Copy link
Contributor

Jutho commented Oct 20, 2015

You're correct; I should take more time before posting something. My apologies.

@jiahao
Copy link
Member Author

jiahao commented Oct 20, 2015

@andreasnoack @Jutho Ah I see, looks like I changed the test from 1e-310 to 1e-300 last night in updating? Will fix.

@KristofferC I was able to reproduce the speedup using *inv(nrm), @inbounds and @simd. I will make this the default loop, falling back to division when nrm <= inv(prevfloat(typemax(float(nrm)))). (It looks like @simd doesn't do anything for divisions, but the corner case isn't worth optimizing anyway.)

@KristofferC
Copy link
Sponsor Member

Maybe use scale! instead of the explicit for loop to get the BLAS boost for pertinent types? The fallback will be a generic scale which already does the inbounds macro.

@jiahao
Copy link
Member Author

jiahao commented Oct 20, 2015

Sure. It looks like generic_scale! does not apply @simd though, so I'll add that in so that types that are simple immutable wrappers around machine floats will benefit.

@KristofferC
Copy link
Sponsor Member

For me the code got vectorized even without the macro. It doesn't for you? Did u check with @code_llvm? I think that is why it seems like the macro doesn't do anything.

@jiahao
Copy link
Member Author

jiahao commented Oct 20, 2015

No I got a significant time difference in the *inv(nrm) case even though vectorized instructions are emitted with and without @simd. I don't think simply checking the code for emitted vectorized intrinsics is sufficient.

Compute polar decomposition of vector

`qr[!]` is equivalent to `v->(normalize[!](v), norm(v))` but is
convenient for reusing the norm calculation, e.g. for Krylov subspace
algorithms.

Also refactors the in place normalization code to a separate routine,
`__normalize!()`, so that it can be reused by `qr!`
jiahao added a commit that referenced this pull request Oct 23, 2015
Implement normalize and normalize!
@jiahao jiahao merged commit e0a8985 into master Oct 23, 2015
@jiahao jiahao deleted the cjh/normalize branch October 23, 2015 13:36
bjarthur pushed a commit to bjarthur/julia that referenced this pull request Oct 27, 2015
stevengj added a commit to stevengj/PETSc.jl that referenced this pull request Nov 17, 2015
stevengj added a commit to stevengj/PETSc.jl that referenced this pull request Nov 17, 2015
stevengj added a commit to stevengj/PETSc.jl that referenced this pull request Nov 17, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants