make "dot" operations (.+ etc) fusing broadcasts #17623

Merged

stevengj merged 4 commits into JuliaLang:master from stevengj:fuse-dotops on Dec 20, 2016

Conversation

stevengj (Member) commented Jul 26, 2016

This is a (now finished, originally WIP) PR making dot operations into "fusing" broadcasts. That is, x .⨳ y (for any binary operator ⨳) is transformed by the parser into (⨳).(x, y), which in turn is fused with other nested "dot" calls into a single broadcast.
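
For example (an illustrative sketch; the anonymous function shown is only a stand-in for what lowering actually generates):

x = rand(5); y = rand(5)

# With this PR, the parser and lowering turn the whole expression into a single
# fused broadcast call, allocating no intermediate arrays:
z = x .+ 2 .* sin.(y)

# roughly equivalent to
z2 = broadcast((a, b) -> a + 2 * sin(b), x, y)

z ≈ z2    # true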

To do:

  • Parse x .⨳ y as a fusing "dot" function call.
  • Deprecate manual .⨳ method definitions in favor of broadcast(::typeof(⨳), ...) definitions. (Currently, such methods are silently ignored.)
  • Unbreak the REPL. (Somehow this patch breaks it: typing backspace gives MethodError: no method matching splice!(::Array{UInt8,1}, ::Array{Int64,1}, ::Array{UInt8,1}).)
  • Fix the stack overflow from [true] .* [true].
  • Fix lots of tests broken by #16966.
  • Restore the 0.5 behavior where Float64 * Array{Float32} produces an Array{Float32}, etc. (for non-dot ops).
  • Fix the failing method-ambiguity test.
  • More tests
  • Documentation
  • Eliminate as many specialized broadcast(::typeof(op), ...) methods as possible.

ViralBShah added this to the 0.5.x milestone Jul 26, 2016

tkelman (Contributor) commented Jul 26, 2016

This changes semantics and isn't a backport candidate.

JeffBezanson modified the milestones: 0.5.x, 0.6.0 Jul 26, 2016

stevengj referenced this pull request Jul 29, 2016

Closed

Vectorization Roadmap #16285

5 of 5 tasks complete
stevengj (Member) commented Aug 2, 2016

@JeffBezanson, I'm noticing an odd behavior with the compiler in the REPL. Basically, the first fused function I compile is slow, but the second and subsequent ones are fast.

In particular, the following f(x) = x .+ 3.*x.^3 .+ 4.*x.^2 is extremely slow:

x = rand(10^7);
f(x) = x .+ 3.*x.^3 .+ 4.*x.^2
@time f(x);
@time f(x);

reporting 40M allocations even on the second run. However, the same function is fast if I compile a different fused function first!

x = rand(10^7);
g(x) = x .+ 3.*x.^3 .- 4.*x.^2
@time g(x);
f(x) = x .+ 3.*x.^3 .+ 4.*x.^2
@time f(x);
@time f(x);

This reports only 8 allocations for f(x), and allocates exactly the expected amount of memory for the output array.

Any idea what could cause this? (I'll try to reproduce it in the master branch, to see if it affects the 0.5 loop fusion, and file a separate issue if that is the case.)

tkelman (Contributor) commented Aug 2, 2016

#17759 ?

stevengj (Member) commented Aug 2, 2016

Ah, thanks @tkelman, that seems like the culprit. Using f(x) = x .+ 3.*x.*x.*x .+ 4.*x.*x, i.e. avoiding ^, eliminates the problem for me.

stevengj (Member) commented Aug 2, 2016

As long as I avoid ^, whose problems seem orthogonal to this PR, the performance is exactly what I was hoping for. For example, y .= x .+ 3.*x.*x.*x .+ 4.*x.*x occurs entirely in-place, with performance identical to writing out the loops manually (with @inbounds annotation).

@jrevels, it would be good to get some "vectorized operation" performance benchmarks on @nanosoldier.
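
For reference, a sketch of the hand-written loop that the fused in-place assignment above should be performing the equivalent of (the function name and helper are mine, for illustration):

function poly_inplace!(y, x)
    @inbounds for i in eachindex(x)
        xi = x[i]
        y[i] = xi + 3*xi*xi*xi + 4*xi*xi
    end
    return y
end

x = rand(10^7); y = similar(x)
poly_inplace!(y, x)    # same result as y .= x .+ 3.*x.*x.*x .+ 4.*x.*x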

src/julia-parser.scm
@@ -865,7 +865,7 @@
(begin
#;(if (and (number? ex) (= ex 0))
(error "juxtaposition with literal \"0\""))
`(call * ,ex ,(parse-unary s))))
`(call .* ,ex ,(parse-unary s))))

TotalVerb (Contributor), Sep 19, 2016

I'm not convinced this revised behaviour is a good thing. This could break things like 2x where x::Diagonal, which broadcast will try to promote to Array.

stevengj (Member), Sep 19, 2016

2 .* x for x::Diagonal is also broken by this PR; why is breaking 2x worse?

(Such cases could be fixed by adding specialized broadcast methods, of course.)
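
For example, a hypothetical specialized method of the kind alluded to here (a sketch only, not part of this PR) that would keep scalar .* Diagonal from densifying:

# Hypothetical specializations: elementwise multiplication by a scalar preserves
# the diagonal structure (the structural zeros stay zero), so return another
# Diagonal instead of letting the generic broadcast produce a dense Array.
Base.broadcast(::typeof(*), x::Number, D::Diagonal) = Diagonal(x .* D.diag)
Base.broadcast(::typeof(*), D::Diagonal, x::Number) = Diagonal(D.diag .* x)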

TotalVerb (Contributor), Sep 19, 2016

* has different semantics from .*. I think it is a mistake to treat the latter as a superset of the former, as is being done with this implicit multiplication lowering to .*. Intuitively, 2x means 2 * x, not 2 .* x.

stevengj (Member), Sep 20, 2016

It doesn't have different semantics for multiplication by scalars...

TotalVerb (Contributor), Sep 20, 2016

But there is no guarantee that 2 .* x is the same as 2 * x for all types.

Also, this looks to be a performance disaster for the common scalar case.

julia> @code_llvm broadcast(*, 2, 2)

define %jl_value_t* @julia_broadcast_65524(%jl_value_t*, %jl_value_t**, i32) #0 {
top:
  %3 = alloca %jl_value_t**, align 8
  store volatile %jl_value_t** %1, %jl_value_t*** %3, align 8
  %4 = add i32 %2, -1
  %5 = icmp eq i32 %4, 0
  br i1 %5, label %fail, label %pass

fail:                                             ; preds = %top
  %6 = getelementptr %jl_value_t*, %jl_value_t** %1, i64 1
  call void @jl_bounds_error_tuple_int(%jl_value_t** %6, i64 0, i64 1)
  unreachable

pass:                                             ; preds = %top
  %7 = icmp ugt i32 %4, 1
  br i1 %7, label %pass.2, label %fail1

fail1:                                            ; preds = %pass
  %8 = sext i32 %4 to i64
  %9 = getelementptr %jl_value_t*, %jl_value_t** %1, i64 1
  call void @jl_bounds_error_tuple_int(%jl_value_t** %9, i64 %8, i64 2)
  unreachable

pass.2:                                           ; preds = %pass
  %10 = getelementptr %jl_value_t*, %jl_value_t** %1, i64 1
  %11 = bitcast %jl_value_t** %10 to i64**
  %12 = load i64*, i64** %11, align 8
  %13 = load i64, i64* %12, align 16
  %14 = getelementptr %jl_value_t*, %jl_value_t** %1, i64 2
  %15 = bitcast %jl_value_t** %14 to i64**
  %16 = load i64*, i64** %15, align 8
  %17 = load i64, i64* %16, align 16
  %18 = mul i64 %17, %13
  %19 = call %jl_value_t* @jl_box_int64(i64 signext %18)
  ret %jl_value_t* %19
}

I guess I am not understanding what is gained by this change. Loop fusion can be forced explicitly with .* anyway. Why should the scalar case be disrupted for the convenience of the vector case?

stevengj (Member), Sep 20, 2016

Sure, it's not a big deal.

But why is it a performance disaster for the scalar case? Shouldn't it be getting inlined to be equivalent to 2 * 2?

stevengj (Member), Sep 20, 2016

Performance looks good to me:

julia> f(x, y) = broadcast(*, x, y)
f (generic function with 1 method)

julia> @code_llvm f(2,2)

define i64 @julia_f_70407(i64, i64) #0 {
top:
  %2 = mul i64 %1, %0
  ret i64 %2
}

stevengj (Member), Sep 20, 2016

Anyway, I'll revert this part of the PR, since it is controversial.

TotalVerb (Contributor), Sep 20, 2016

I wonder why @code_llvm on the broadcast itself is so scary.

yuyichao (Member), Sep 20, 2016

jlcall

stevengj (Member) commented Sep 21, 2016

One of the difficulties I'm having with this PR is that it makes it effectively impossible to define specialized methods for .* etcetera.

For example, in Julia ≤ 0.5 we have specialized methods for array .< scalar that produce a BitArray, and we have specialized methods for sparsevector .* sparsevector that preserve the sparsity. With this PR, I can in principle overload broadcast(::typeof(<), array, scalar), but in fact it is almost useless.

The problem is that as soon as you fuse the operation with another dot call, it produces a fused anonymous function and the specialized broadcast method is never called. (The compress-fuse optimizations that inline numeric literals and combine duplicate broadcast arguments are another problem.)

Do we have to give up on .* for sparse arrays, and do we have to give up on elementwise boolean operations producing a BitArray? Or should broadcast always generate a BitArray if it detects that the output type is Bool?
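
To make the failure mode concrete, a small sketch (the anonymous function shown is only illustrative of what lowering produces):

A = rand(5); B = rand(5)

# An isolated dot call still dispatches to any specialized method:
A .< B                # lowers to broadcast(<, A, B)

# But once it fuses with a neighboring dot call, the whole expression becomes a
# single broadcast over a generated anonymous function, so a method specialized
# on broadcast(::typeof(<), ...) is never invoked:
A .< B .+ 1           # lowers to (roughly) broadcast((a, b) -> a < b + 1, A, B)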

TotalVerb (Contributor) commented Sep 21, 2016

See also ongoing discussion in #18590 (comment).

stevengj referenced this pull request Dec 8, 2016

Restructure of the promotion mechanism for broadcast (#18642)
* Restructure the promotion mechanism for broadcast

* More broadcast tests

* Use broadcast for element wise operators where appropriate

Sacha0 referenced this pull request Dec 8, 2016

Closed

Make 3x broadcasting #19537

base/deprecated.jl
# deprecations for uses of old dot operators (.* etc) as objects, rather than
# just calling them infix.
for op in (:(!=), :≠, :+, :-, :*, :/, :÷, :%, :<, :(<=), :≤, :(==), :>, :>=, :≥, :\, :^)
dotop = Symbol(string('.', op))

pabloferz (Contributor), Dec 9, 2016

Not that it's critical or anything, but couldn't this just be Symbol('.', op)?

stevengj (Member), Dec 9, 2016

yup, will change.

stevengj (Member) commented Dec 13, 2016

Loop fusion makes most of the specialized .== etc methods on BitArray useless, so I eliminated them. One "new" behavior is that broadcast now always produces a BitArray if the output is a Bool array. This preserves the BitArray-producing behavior of expressions like A .== B .> 3.

I'm somewhat inclined to suggest that we ditch BitArray completely (moving it to a non-default package). Given loop fusion for logical expressions, the advantage of BitArray is greatly reduced, and it may even slow things down (since Vector{Bool} is actually faster for most non-logical operations).
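
A small illustration of that behavior (the example arrays are mine):

A = rand(5); B = rand(5)

C = (A .> 0.5) .& (B .> 0.5)    # fuses into one elementwise loop
typeof(C)                        # BitArray{1}, since the fused result has element type Bool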

stevengj (Member) commented Dec 13, 2016

@Sacha0, note that the work you've done for broadcast on SparseMatrix should really be applied to SparseVector as well (ideally without code duplication), so that we can ditch the problematic broadcast(::typeof(foo), ...) methods for SparseVector.

stevengj (Member) commented Dec 13, 2016

@nanosoldier runbenchmarks(ALL, vs=":master")

Sacha0 (Member) commented Dec 13, 2016

> @Sacha0, note that the work you've done for broadcast on SparseMatrix should really be applied to SparseVector as well (ideally without code duplication), so that we can ditch the problematic broadcast(::typeof(foo), ...) methods for SparseVector.

Yes, ref. https://discourse.julialang.org/t/status-of-fusion/833/7. Best!

stevengj (Member) commented Dec 13, 2016

Yay, Travis and AppVeyor green at last!

1.0 1.0 1.0
```
"""
Base.:(./)

tkelman (Contributor), Dec 13, 2016

should also be removed from the manual index

MichaelHatherly (Member), Dec 13, 2016

A couple of other ones as well:

diff --git a/doc/src/stdlib/math.md b/doc/src/stdlib/math.md
index 3ccce45..fa7c5e4 100644
--- a/doc/src/stdlib/math.md
+++ b/doc/src/stdlib/math.md
@@ -10,12 +10,6 @@ Base.:*(::Any, ::Any...)
 Base.:(/)
 Base.:\(::Any, ::Any)
 Base.:^(::Any, ::Any)
-Base.:(.+)
-Base.:(.-)
-Base.:(.*)
-Base.:(./)
-Base.:(.\)
-Base.:(.^)
 Base.fma
 Base.muladd
 Base.div
@@ -47,12 +41,6 @@ Base.:(<)
 Base.:(<=)
 Base.:(>)
 Base.:(>=)
-Base.:(.==)
-Base.:(.!=)
-Base.:(.<)
-Base.:(.<=)
-Base.:(.>)
-Base.:(.>=)
 Base.cmp
 Base.:(~)
 Base.:(&)

stevengj (Member), Dec 13, 2016

Will do.

base/docs/helpdb/Base.jl
"""
Base.:(.==)
NullException

tkelman (Contributor), Dec 13, 2016

duplicated, bad rebase resolution

<|, |>, \, ^, |, ~, !==, ===, >:, colon, hcat, vcat, hvcat, getindex, setindex!,
transpose, ctranspose,
≥, ≤, ≠, .≥, .≤, .≠, ⋅, ×, ∈, ∉, ∋, ∌, ⊆, ⊈, ⊊, ∩, ∪, √, ∛, ⊻

tkelman (Contributor), Dec 13, 2016

don't delete xor

stevengj (Member), Dec 14, 2016

fixed.

stevengj (Member) commented Dec 19, 2016

Rebased to fix a merge conflict with #19527.

stevengj (Member) commented Dec 19, 2016

I will merge tomorrow if there are no further objections.

stevengj merged commit b0a4153 into JuliaLang:master Dec 20, 2016

2 checks passed

continuous-integration/appveyor/pr: AppVeyor build succeeded
continuous-integration/travis-ci/pr: The Travis CI build passed

stevengj deleted the stevengj:fuse-dotops branch Dec 20, 2016

vtjnash (Member) commented Dec 21, 2016

Looks like this failed when run on the buildbot? https://build.julialang.org/builders/coverage_ubuntu14.04-x64/builds/803

pabloferz (Contributor) commented Dec 21, 2016

@vtjnash The log shows a method ambiguity between *(y::Number, x::Bool) and *(q::TestMain_linalg_generic.Quaternion, r::Real). The first wasn't even touched by this PR, nor was the Quaternion test in test/linalg/generic/. Maybe Test.detect_ambiguities or something on the buildbot was modified recently?

stevengj (Member) commented Dec 21, 2016

The tests were green when I merged this, so maybe something changed in the ambiguity testing?

The source of the ambiguity is that I expanded the quaternion tests (to catch a commutativity bug that I accidentally introduced at one point), and I added the (*)(q::Quaternion, r::Real) method, which conflicts with the pre-existing *(y::Number, x::Bool).

carlobaldassi (Member) commented Dec 22, 2016

This has removed a few optimizations for BitArrays. One in particular is the case A .* B when A and B have the same shape, which previously was specialized and called A & B. The difference is quite significant, e.g. for 1000x1000 BitArrays it's almost 40-fold.

I wonder if there is a way to catch this case again and get the same performance as map. Also, now that we have a generalized dot-operator syntax, it would be particularly useful to exploit the cases when the function is pure and operate chunk-wise (also in map, which currently only recognizes a few operations). Is there a way to determine if a function is pure?

stevengj (Member) commented Dec 22, 2016

Yup, the general problem here is broadcast(::typeof(*),...) and similar methods aren't that useful to define, because they won't be called in lots of cases due to fusion. (In the particular case of A .* B, of course, one could simply call A & B.) As far as I know, we don't currently have any way to determine whether a function is pure.

Now that boolean operations are fused, however, it's not clear to me how often one does non-fused operations on bitarrays. We used to need it for things like (A .> 6) .& (B .> 0), but now this is fused into a single loop. How common are operations on large boolean arrays?

It also seems to me that there is quite a bit of unrolling that could be done to make chunk-by-chunk processing of BitArrays more efficient, e.g. for broadcast! with a BitArray result or getindex(A, ::BitArray). We should probably just write out (via metaprogramming) the unrolled loop for all 64 bits in a chunk. But that is separate from the question of chunk-wise pure bit operations.
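
For concreteness, a rough sketch of the chunk-wise idea for a pure bitwise operation like & (illustrative only: it reaches into the internal chunks field of BitArray, and it is not what Base does after this PR):

function chunkwise_and(A::BitArray, B::BitArray)
    size(A) == size(B) || throw(DimensionMismatch("arrays must have the same shape"))
    C = similar(A)
    Ac, Bc, Cc = A.chunks, B.chunks, C.chunks   # internal Vector{UInt64} storage
    @inbounds for i in eachindex(Ac)
        Cc[i] = Ac[i] & Bc[i]                   # processes 64 elements per iteration
    end
    return C
end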

.>=,
.≥,
.\,
.^,
/,
//,
.//,

JeffBezanson (Member), Dec 22, 2016

Is there a reason this is still here?

tkelman (Contributor), Dec 22, 2016

probably just missed - likewise with .>> and .<< below

stevengj (Member), Dec 22, 2016

Yup, just missed them, sorry.

tkelman (Contributor) commented Dec 27, 2016

This appears to have broken the ability for packages to define and use .. as an operator.

stevengj (Member) commented Dec 27, 2016

Whoops, that wasn't intended.

stevengj (Member) commented Dec 27, 2016

Will have a PR to fix .. shortly.

stevengj added a commit to stevengj/julia that referenced this pull request Dec 27, 2016

Sacha0 referenced this pull request Jan 7, 2017

Closed

things we should deprecate, 0.6 edition #19598

16 of 22 tasks complete

ararslan referenced this pull request Jan 9, 2017

Merged

RFC: Export iszero #19950

tkelman referenced this pull request in JuliaMatrices/BandedMatrices.jl Feb 4, 2017

nalimilan referenced this pull request Feb 4, 2017

Merged

Add .& and .| #306

ajkeller34 referenced this pull request in ajkeller34/Unitful.jl Feb 21, 2017

tkelman referenced this pull request in JuliaDiffEq/DiffEqProblemLibrary.jl May 4, 2017

Sacha0 added the deprecation label May 14, 2017
