
Deprecate manually vectorized methods in favor of dot syntax, monolithic edition #19791

Merged · 17 commits · Dec 31, 2016

Conversation

@Sacha0 (Member) commented Dec 31, 2016

This pull request deprecates almost all remaining manually vectorized methods, incorporating #19713, #19710, #19709, #18609, #18608, #18607, #18593, #18590, #18586, #18576, #18575, #18571, #18564, and #18512 into a single pull request. The last commit reinstates a few &, |, and xor methods over BitArrays as broadcast specializations (in accord with discussion in the corresponding pull requests).

(I dropped #19712 from this set: it turned out to need a fair bit more work and touches a lot of code. It would be best done as a separate pull request if consensus favors deprecating those methods (two-arg vectorized complex); I might be able to put that together tomorrow if so.)

Best!
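For readers skimming the diff, here is a minimal sketch of the rewrite pattern this pull request applies, using a hypothetical toy function rather than any actual Base method (the real deprecations are in the diff below):

# Minimal sketch of the deprecation pattern (toy function, not Base code):
# the hand-vectorized method is removed and a deprecation forwards callers
# to the fused dot-broadcast spelling.
double(x::Number) = 2x                          # the scalar definition stays
# double(A::AbstractArray) = map(double, A)     # old manually vectorized method, removed
@deprecate double(A::AbstractArray) double.(A)  # deprecated method forwards to broadcast

double.([1, 2, 3])   # preferred spelling: fused broadcast, returns [2, 4, 6]
double([1, 2, 3])    # still works during the deprecation period, with a depwarn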

@@ -1,4 +1,4 @@
# This file is a part of Julia. License is MIT: http://julialang.org/license
<# This file is a part of Julia. License is MIT: http://julialang.org/license
Contributor

this shouldn't be here

Member Author

Top-of-file gremlin strikes again. Fixed. Thanks!

Contributor

#19721 also means a rebase is needed for the end of deprecated.jl.

Member Author

Rebased. Thanks!

@@ -1405,7 +1405,6 @@ conj!(A::SparseMatrixCSC) = (broadcast!(conj, A.nzval, A.nzval); A)

# TODO: The following definitions should be deprecated.
ceil{To}(::Type{To}, A::SparseMatrixCSC) = ceil.(To, A)
floor{To}(::Type{To}, A::SparseMatrixCSC) = floor.(To, A)
Contributor

had this been missing a To<:Integer constraint?

Member Author

Depending on how you interpret the docstring? floor(T, x) converts the result to type T, throwing an InexactError if the value is not representable.

Contributor

The existing vectorized methods on a lot of these rounding functions were a bit inconsistent on whether or not they constrained the type argument to Integer subtypes. Is that just a matter of when you want your MethodError, or does it make a real difference?

Member Author

Good question. I thought perhaps the docstring implies that e.g. floor(Float32, 1.7) == 1.0f0 should yield true (though floor(Float32, 1.7) presently fails), and hence the absence of <:Integer constraints. Or is the docstring itself too loose? Thanks!

Contributor

I dunno, but I guess the deprecations may as well leave off the Integer constraint and hand the problem off to the scalar method; that seems safer than an incomplete deprecation.

Member Author

Sounds good :).
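For concreteness, once the deprecations omit the constraint, the scalar floor(T, x) method decides which target types are supported. A small illustration, reflecting the behavior described in the comments above (circa this Julia version):

# Scalar floor(T, x) behavior that the unconstrained deprecations hand off to,
# per the discussion above.
floor(Int, 1.7)        # 1: rounds down and converts the result to Int
floor(Int, 1e100)      # throws InexactError: the result is not representable as an Int
floor(Float32, 1.7)    # failed (MethodError) at the time, despite the docstring's broad reading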

@tkelman (Contributor) commented Dec 31, 2016

This will likely show big regressions for any benchmarks that use now-deprecated syntax. I'm more curious whether it shows any medium regressions from refactorings of the internals here that might now use different code paths than before: @nanosoldier runbenchmarks(ALL, vs = ":master")

(also question for @jrevels, does nanosoldier use a different branch of basebenchmarks for testing against different branches of Julia?)

@tkelman added the "needs news" (A NEWS entry is required for this change) and "domain:broadcast" (Applying a function over a collection) labels on Dec 31, 2016
@tkelman (Contributor) commented Dec 31, 2016

There are ambiguities and a use of one of the deprecated functions in a test. It would be good to squash the fixes for these into the specific commits that introduced them if you can, since the sequence of 15 commits here is logically separated and makes more sense not-squashed, as long as the build passes at each intermediate step, to avoid unintentionally breaking future bisects. (Edit: this was done.)

@Sacha0 (Member Author) commented Dec 31, 2016

Fixed the ambiguities introduced by rebasing post-#19690-merge (and in the appropriate commits). With the benefit of #19690, I added a separate commit that removes a few vectorized methods over SparseVectors and performs minor associated cleanup. I also added a commit replacing the removed error-path tests for deprecated vectorized &, |, xor, min, and max methods over SparseMatrixCSCs with similar tests of sparse map, as discussed in #19713 (comment). Thanks again!

@tkelman self-requested a review on December 31, 2016 09:06
@nanosoldier (Collaborator) commented:
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

@tkelman (Contributor) commented Dec 31, 2016

That's less change than I expected. The string join benchmark is often noisy, but ["array","comprehension",("collect","Array{Float64,1}")] might be real? Let's go again on the updated version: @nanosoldier runbenchmarks(ALL, vs = ":master")

@tkelman (Contributor) left a review comment

Addressed my own review comments in Sacha0#1; any perf regressions can hopefully be fixed after the feature freeze.

@nanosoldier (Collaborator) commented:
Your benchmark job has completed - no performance regressions were detected. A full report can be found here. cc @jrevels

@tkelman (Contributor) commented Dec 31, 2016

This is such a conflict-magnet that I'm going to merge now and open my trivial little PR against master instead. Almost all of the pieces of this have been open for weeks/months.

@tkelman merged commit 45ee8fc into JuliaLang:master on Dec 31, 2016
@jrevels (Member) commented Dec 31, 2016

(also question for @jrevels, does nanosoldier use a different branch of basebenchmarks for testing against different branches of Julia?)

No, all versions of Julia benchmark against the nanosoldier branch.

@Sacha0 (Member Author) commented Dec 31, 2016

Tony, you're amazing. Do you ever sleep, or instead just recover in blocks of hypnagogic PR review? Thanks for your tremendous efforts!

@Sacha0 deleted the monodevec branch on December 31, 2016 18:45
@Sacha0 (Member Author) commented Dec 31, 2016

I'm curious how the new general sparse vector broadcast code compares in a benchmark against this older method for abs.(v). (Don't need to spend time checking right now, can wait.)

Likewise. I doubt generic sparse broadcast matches the performance of specializations like that. Making a note to benchmark this case when time allows. Thanks!

@giordano (Contributor) commented Jan 3, 2017

Going from the old manually vectorized method of the 2-arg variant of trunc to the dot syntax, I experience a large performance regression.

Before this PR (manually vectorized method):

julia> versioninfo()
Julia Version 0.6.0-dev.1843
Commit c38a5a304 (2017-01-01 14:09 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7-4700MQ CPU @ 2.40GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, haswell)

julia> using BenchmarkTools

julia> const arr = collect(0.0:0.1:10.0);

julia> @benchmark trunc(Int, arr)
BenchmarkTools.Trial: 
  memory estimate:  912.00 bytes
  allocs estimate:  2
  --------------
  minimum time:     289.054 ns (0.00% GC)
  median time:      301.209 ns (0.00% GC)
  mean time:        354.579 ns (13.37% GC)
  maximum time:     5.139 μs (93.19% GC)
  --------------
  samples:          10000
  evals/sample:     278
  time tolerance:   5.00%
  memory tolerance: 1.00%

after the PR (dot syntax):

julia> versioninfo()
Julia Version 0.6.0-dev.1807
Commit 26c8d856a (2016-12-31 04:12 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7-4700MQ CPU @ 2.40GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, haswell)

julia> using BenchmarkTools

julia> const arr = collect(0.0:0.1:10.0);

julia> @benchmark trunc.(Int, arr)
BenchmarkTools.Trial: 
  memory estimate:  2.56 kb
  allocs estimate:  108
  --------------
  minimum time:     28.058 μs (0.00% GC)
  median time:      28.583 μs (0.00% GC)
  mean time:        29.254 μs (0.00% GC)
  maximum time:     70.405 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

Can someone please confirm?

@KristofferC (Sponsor Member) commented Jan 3, 2017

Make sure not to benchmark in global scope, or to use const if you do; but even so, there is a significant regression.
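One way to follow that advice with BenchmarkTools is to interpolate the array into the benchmarked expression so it is not looked up as a non-constant global; a small sketch using the array from the report above:

# Interpolating the value with $ avoids global-scope lookups in the timed
# expression (BenchmarkTools syntax), similar in effect to declaring it const.
using BenchmarkTools

arr = collect(0.0:0.1:10.0)
@benchmark trunc.(Int, $arr)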

@giordano (Contributor) commented Jan 3, 2017

Thanks, I updated my benchmarks to use const; trunc is still 100× slower after this PR. I'm going to open a new issue.

@tkelman (Contributor) commented Jan 5, 2017

We somehow didn't see this in any tests, but ref https://discourse.julialang.org/t/is-0-a-1-going-away/1313/2: chained comparisons like A .< B .< C expand to (A .< B) & (B .< C), which now throws a depwarn if any of the operands are arrays. Would it always be equivalent to the former behavior if we made the expansion use .& instead, or would it have to depend on the types? cc @stevengj

@stevengj (Member) commented Jan 5, 2017

It should be equivalent to the former behavior if we made it use .&.
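A small sketch of the equivalence being asserted, with arbitrary example arrays (an illustration of the two expansions, not the actual parser change):

# Old and proposed expansions of the chained comparison A .< B .< C.
A, B, C = [1, 2, 3], [2, 2, 2], [3, 1, 3]

old = (A .< B) & (B .< C)     # former expansion; & on arrays emits a depwarn after this PR
new = (A .< B) .& (B .< C)    # proposed expansion using .&

old == new                    # true: the elementwise results agree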

@Sacha0 (Member Author) commented Jan 7, 2017

I'm curious how the new general sparse vector broadcast code compares in a benchmark against this older method for abs.(v). (Don't need to spend time checking right now, can wait.)

@tkelman, generic sparse broadcast performs surprisingly well in this case:

julia> using BenchmarkTools

julia> begin
           vecabs(v::SparseVector) = SparseVector(v.n, copy(v.nzind), abs.(v.nzval))

           xs = sprand(10^2, 0.2);
           ds = @benchmark abs.($xs);
           vs = @benchmark vecabs($xs);
           println(judge(median(ds), median(vs)))

           xm = sprand(10^4, 0.1);
           dm = @benchmark abs.($xm);
           vm = @benchmark vecabs($xm);
           println(judge(median(dm), median(vm)))

           xl = sprand(10^6, 0.1);
           dl = @benchmark abs.($xl);
           vl = @benchmark vecabs($xl);
           println(judge(median(dl), median(vl)))
       end

BenchmarkTools.TrialJudgement:
  time:   +44.44% => regression (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
BenchmarkTools.TrialJudgement:
  time:   +43.54% => regression (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
BenchmarkTools.TrialJudgement:
  time:   +27.62% => regression (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)

Eking out a bit more performance might be possible (e.g. with a few @propagate_inbounds decorations). Best!
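For concreteness, a sketch of the kind of decoration meant above, shown on a hypothetical helper rather than the actual sparse broadcast internals:

# @propagate_inbounds lets an @inbounds annotation at the call site elide the
# bounds checks inside a small inlined accessor (hypothetical helper functions).
using Base: @propagate_inbounds

@propagate_inbounds getnz(v::SparseVector, i) = v.nzval[i]

function sumnz(v::SparseVector)
    s = zero(eltype(v))
    @inbounds for i in 1:length(v.nzval)
        s += getnz(v, i)    # bounds check skipped here via @propagate_inbounds
    end
    return s
end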

@tkelman (Contributor) commented Jan 8, 2017

Cool. I'd add the one-line specialization (despite the fact that it'll be circumvented in fusing cases) if cleverer tweaks to the general case can't get within 10-20ish percent, but it's not that bad.

@Sacha0 (Member Author) commented Jan 10, 2017

Note that the above comparison isn't apples-to-apples: sparse map[!]/broadcast[!] avoids storing numerical zeros, whereas the specialization above does not. (The above performance difference is in line with that observed between the former nz2z and nz2nz sparse broadcast classes for similar cases, as benchmarked/discussed elsewhere.) Best!
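To illustrate the distinction, a small sketch using vecabs from the benchmark above and a vector with an explicitly stored zero (outputs as implied by the comment above):

# Generic sparse broadcast checks each result against zero and drops numerical
# zeros (paying for the check); the hand-written specialization copies the
# stored index pattern verbatim.
v = SparseVector(3, [1, 2], [0.0, 2.0])   # one explicitly stored zero

nnz(abs.(v))      # 1: the stored zero is dropped by generic broadcast
nnz(vecabs(v))    # 2: the specialization preserves the original stored pattern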

tkelman referenced this pull request in SciML/DiffEqProblemLibrary.jl May 4, 2017
@Sacha0 removed the "needs news" (A NEWS entry is required for this change) label on May 12, 2017