
RowVector, pinv, and quasi-division #23067

Merged
merged 5 commits into from
Aug 22, 2017

Conversation

martinholters
Member

No tests yet, I first would like feedback whether we actually want (all of) this.

Demo:

julia> c = [1; 2]
2-element Array{Int64,1}:
 1
 2

julia> r = [3; 4]'
1×2 RowVector{Int64,Array{Int64,1}}:
 3  4

julia> s = r*c
11

julia> pinv(c)      # returns Matrix on master
1×2 RowVector{Float64,Array{Float64,1}}:
 0.2  0.4

julia> pinv(c)*c    # returns Vector on master
1.0

julia> pinv(r)      # throws MethodError on master
2-element Array{Float64,1}:
 0.12
 0.16

julia> s/c          # throws MethodError on master
1×2 RowVector{Float64,Array{Float64,1}}:
 2.2  4.4

julia> r\s          # throws MethodError on master
2-element Array{Float64,1}:
 1.32
 1.76

Closes #23028.

@martinholters martinholters added domain:linear algebra Linear algebra needs tests Unit tests are required for this change labels Aug 1, 2017
@martinholters
Member Author

For consistency, we'd then probably also want r/r and c\c (which currently return a RowVector and a Vector, respectively) to return a scalar, and r\r and c/c (which currently throw a DimensionMismatch) to return a Matrix.

The rationale for the proposed change is:
a) if it works for an n×1 or 1×n matrix, it should work for the corresponding (row) vector
b) x/y should have the same type as x*pinv(y), and likewise x\y and pinv(x)*y

Leaving pinv(::AbstractVector) alone (returning a matrix) and adding a pinv(::RowVector) that also returns a matrix would make (r*c)/c == (r*c)*pinv(c) and r*(c/c) == r*(c*pinv(c)) have different types, which also strikes me as undesirable. Opinions?
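For a vector, the pseudoinverse has a simple closed form (the conjugated transpose divided by the squared norm), which makes rationale b) easy to check by hand. A minimal sketch, using a hypothetical helper `vecpinv` rather than the methods this PR adds, reproducing the demo values above:

```julia
# Hypothetical helper mirroring what pinv(::AbstractVector) computes:
# for nonzero v, pinv(v) == v' / sum(abs2, v).
vecpinv(v::AbstractVector) = v' / sum(abs2, v)

c = [1.0, 2.0]
r = [3.0 4.0]            # 1×2 row
s = (r * c)[]            # scalar inner product: 11.0

@assert vecpinv(c) ≈ [0.2 0.4]       # matches pinv(c) in the demo above
@assert s * vecpinv(c) ≈ [2.2 4.4]   # rationale b): s/c should equal s*pinv(c)
```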

CC @andyferris as the father of RowVector 😄

@andyferris
Member

I do like consistency.

One thing I'm wondering about is what / and \ should really be doing when there are multiple answers or no answer. I do get that the current L2-minimization behaviour for matrices is convenient, but it's already quite annoying (to me anyway) that square matrix division is treated as exact while rectangular division is not: I can get run-time errors when I expect the QR behaviour but hit a singular square matrix at run time (typically when expecting rectangular matrices whose sizes depend on run-time data).

So for instance, should vec1/vec2 behave strictly or non-strictly? Outside of rectangular matrix division, are there examples of non-strict / methods in Julia or in packages?

Cc @andreasnoack

@martinholters
Member Author

If / and \ are not strict for rectangular matrices, they probably shouldn't be for vectors.

Relatedly, just noticed this:

# \(A::StridedMatrix,x::Number) = inv(A)*x Should be added at some point when the old elementwise version has been deprecated long enough
# /(x::Number,A::StridedMatrix) = x*inv(A)

which dates back to February 2014 - maybe now is the time.

@andyferris
Member

If / and \ are not strict for rectangular matrices, they probably shouldn't be for vectors.

Right, I don't disagree, and the method would be next-to-useless if it were strict because of numerical rounding errors, etc.

What I was asking was the (heretical?) question of whether non-strict / and \ for rectangular matrices has been a good thing or not. (Since in practice I find I can't use these operators in my code at all unless I can be 100% assured that the matrix isn't square, as shown below.)

julia> [0.0 1.0 0.0; 0.0 1.0 0.0] \ [0.0, 1.0] # rectangular - works
3-element Array{Float64,1}:
 0.0
 0.5
 0.0

julia> [0.0 1.0; 0.0 1.0] \ [0.0, 1.0] # square - doesn't work
ERROR: Base.LinAlg.LAPACKException(1)
Stacktrace:
 [1] chklapackerror(::Int64) at .\linalg\lapack.jl:34
 [2] trtrs!(::Char, ::Char, ::Char, ::Array{Float64,2}, ::Array{Float64,1}) at .\linalg\lapack.jl:3291
 [3] \(::Array{Float64,2}, ::Array{Float64,1}) at .\linalg\generic.jl:815

julia> reshape([1.0, 1.0], (2,1)) \ [0.0, 1.0] # rectangular - works
1-element Array{Float64,1}:
 0.5

So the reason I'm discussing this is that I just want to check that we aren't propagating badness in the name of consistency.

@andyferris
Member

An example of a non-strict / in another setting, Number, would be if 5/3 === 2, since 2 is the integer that minimizes the L2 error. We just don't see this behaviour for numbers; we tend to throw an InexactError (or similar) whenever the division isn't exact.

Just pondering out loud here. Maybe we need a ~/ operator or something... Or maybe I'm worrying overly much...

@StefanKarpinski
Sponsor Member

Regarding the last issue, integer division produces floats, so division of integer matrices should produce float matrices, rendering the InexactError concern moot. Division of integer matrices should produce a float matrix whose values are as close as possible to the true division result.
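This already holds for square systems today (a quick check; integer inputs promote to floating point before solving):

```julia
# Integer matrix division promotes to Float64, so no InexactError can arise:
# the result is the float solution closest to the true one.
A = [2 0; 0 2]
b = [1, 1]
x = A \ b

@assert x isa Vector{Float64}
@assert x ≈ [0.5, 0.5]
```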

@andyferris
Member

andyferris commented Aug 1, 2017

Regarding the last issue, integer division produces floats, so division of integer matrices should produce float matrices, rendering the InexactError concern moot. Division of integer matrices should produce a float matrix whose values are as close as possible to the true division result.

Right, this example was a bit of a distraction, sorry. The precise way of saying what I mean is this:

  • I expect (a / b) * b to be approximately a or else an error
  • I expect b * (b \ a) to be approximately a or else an error

And apart from overconstrained problems involving rectangular matrices I haven't seen this violated elsewhere (but someone please correct me if I'm wrong). Underconstrained rectangular matrix problems aren't an issue - they satisfy the above. My speculation is that perhaps overconstrained matrix (residual L2 minimization) problems could instead be solved with ~/ and ~\ operators or something along those lines (which may also work for singular square matrices).
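A concrete overconstrained example where the second expectation fails silently today (a sketch; the 3×1 system has no exact solution, so `\` returns the least-squares minimizer):

```julia
# Overconstrained: three equations, one unknown, no exact solution.
b = reshape([1.0, 1.0, 1.0], 3, 1)   # 3×1 matrix
a = [0.0, 1.0, 0.0]

x = b \ a                 # least-squares solution, x == [1/3]
@assert x ≈ [1/3]
@assert !(b * x ≈ a)      # b * (b \ a) is [1/3, 1/3, 1/3], far from a
```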

@andreasnoack
Member

I think it is fair to ask if the current polyalgorithm for \ is the right solution but I also think it should be handled separately from this PR. If we decide to change the polyalgorithm then the code in this PR won't make it any harder to implement the change.

@andyferris The potential runtime errors from square \ are annoying but they are also easy to avoid for an expert like you. You just do qrfact(A, Val{true})\b instead of A\b.

We use the LU in the square case because it is quite a bit faster than the QR. You could argue, though, that users who want speed could just do lufact(A)\b. If the main goal was to avoid the runtime error then we could also create Infs in the result if the LU failed but then there'd still be behavioral differences between the square and non-square cases.

Finally, if we decide that \ is generally minimizing the norm of the solution when A is singular, we'd also need to adjust solvers for matrices with special structure. Primarily Diagonal and Triangular. Not impossible but we should be fairly sure that it is really the right thing to do. I tend to think that the current default is reasonable. In particular, because we allow expert users to customize the behavior via the Factorization.
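For reference, a sketch of that workaround on a singular square matrix (spelled `qrfact(A, Val{true}) \ b` at the time of this PR; shown here with the later `qr(A, ColumnNorm())` spelling):

```julia
using LinearAlgebra

A = [0.0 1.0; 0.0 1.0]   # singular square matrix: plain A \ b errors in the LU path
b = [0.0, 1.0]

# Pivoted QR detects the rank deficiency and returns a least-squares solution.
x = qr(A, ColumnNorm()) \ b
@assert A * x ≈ [0.5, 0.5]   # the projection of b onto the column space
```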

@martinholters
Member Author

So there are two questions here:

  1. Should we keep the current behavior of / and \ for rectangular matrices? If not, should we give that operation a new name or operator? Probably yes, but which one?
  2. Assuming we keep this "non-strict" behavior in one way or another, do the changes proposed in this PR make sense?

I'd prefer to discuss only the second question here. Care to open a dedicated issue for the first question, @andyferris?

@martinholters
Member Author

This is getting more and more involved... If we allow '/' for rectangular matrices, I'd expect (a/a)*a to work dimension-wise, but

julia> zeros(3,4) / zeros(3,4) * zeros(3,4) # OK
3×4 Array{Float64,2}:
 0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0

julia> zeros(3,2) / zeros(3,2) * zeros(3,2) # not OK
ERROR: DimensionMismatch("A has dimensions (3,2) but B has dimensions (3,2)")

Before I start changing that... is there a particular reason for the current behavior I'm not aware of?

@andreasnoack
Member

This is on 0.6, right? There was a recent bug fix for zero matrices, #22831.

@martinholters
Member Author

This is on 0.6, right?

It was on an apparently-not-recent-enough master, so yeah, at least that seems to be resolved, phew.

@andreasnoack
Member

Just to be clear: I think we should go ahead with this PR and continue the general discussion of \ elsewhere.

@martinholters
Member Author

Fixed some loose ends so that the following passes:

let c = [1.0, 2.0], r = [3.0, 4.0]', cm = reshape(c, :, 1), rm = reshape(r', 1, :), m = [1.0 2.0; 3.0 4.0]
    for a in (r, rm), b in (c, cm) # inner products
        @test (a*b)/b ≈ a*(b/b) ≈ (a*b)*pinv(b) ≈ a*(b*pinv(b))
        @test typeof((a*b)/b) == typeof(a*(b/b)) == typeof((a*b)*pinv(b)) == typeof(a*(b*pinv(b)))
        @test a\(a*b) ≈ (a\a)*b ≈ (pinv(a)*a)*b ≈ pinv(a)*(a*b)
        @test typeof(a\(a*b)) == typeof((a\a)*b) == typeof((pinv(a)*a)*b) == typeof(pinv(a)*(a*b))
    end
    for (a, b) in ((c, r), (cm, rm)) # outer products
        @test (a*b)/b ≈ a*(b/b) ≈ (a*b)*pinv(b) ≈ a*(b*pinv(b))
        @test typeof((a*b)/b) == typeof(a*(b/b)) == typeof((a*b)*pinv(b)) == typeof(a*(b*pinv(b)))
        @test a\(a*b) ≈ (a\a)*b ≈ (pinv(a)*a)*b ≈ pinv(a)*(a*b)
        @test typeof(a\(a*b)) == typeof((a\a)*b) == typeof((pinv(a)*a)*b) == typeof(pinv(a)*(a*b))
    end
    @test (m*c)/c ≈ m*(c/c) ≈ (m*c)*pinv(c) ≈ m*(c*pinv(c))
    @test typeof((m*c)/c) == typeof(m*(c/c)) == typeof((m*c)*pinv(c)) == typeof(m*(c*pinv(c)))
    @test r\(r*m) ≈ (r\r)*m ≈ (pinv(r)*r)*m ≈ pinv(r)*(r*m)
    @test typeof(r\(r*m)) == typeof((r\r)*m) == typeof((pinv(r)*r)*m) == typeof(pinv(r)*(r*m))
end

This verifies the consistency I was aiming for. I'll add these tests, plus some that directly verify the results (so that they are not just consistently wrong...), in the next few days.

@@ -795,6 +795,19 @@ function inv(A::AbstractMatrix{T}) where T
A_ldiv_B!(factorize(convert(AbstractMatrix{S}, A)), eye(S0, checksquare(A)))
end

function pinv(v::AbstractVector{T}, tol::Real=real(zero(T))) where T
Member Author

Should probably be tol::Real=eps(real(float(one(T))))*length(v) for consistency with pinv(::StridedMatrix{T}) (and likewise below)

Member Author

Hm, I was under the impression that pinv(::StridedMatrix{T}, tol) ignores all singular values below tol, but the threshold is actually tol * maximum(singular_values), so for the vector case it only matters whether tol < 1. So with the default tol computed consistently with the matrix case, an all-zero vector would be returned if the input is all zeros, which makes sense, or if length(v) > 1/eps(real(float(one(T)))), which strikes me as pretty bizarre.

I think I'd rather keep the tol=0 default than aim for maximum consistency here. Opinions?

Member Author

pinv(::Diagonal) also uses tol=0 by default, so there is precedent for inconsistency here. Users striving for consistency across types should probably give tol explicitly; otherwise, exploiting the information the type conveys to choose a sensible default seems pretty reasonable.

Member

it only matters whether tol < 1

I don't follow the reasoning here

Member Author

A vector has one singular value s, which obviously is the maximum one. That singular value will be ignored (resulting in a zero vector) if s <= s*tol, i.e. tol >= 1 or s == 0.
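That identity is easy to confirm numerically (a sketch treating the vector as an n×1 matrix):

```julia
using LinearAlgebra

# An n-vector viewed as an n×1 matrix has exactly one singular value,
# equal to its Euclidean norm (hence trivially the maximum one).
v = [1.0, 2.0]
@assert svdvals(reshape(v, :, 1)) ≈ [norm(v)]
@assert norm(v) ≈ sqrt(5)
```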

Member

A vector has one singular value s, which obviously is the maximum one.

Of course.

I can see that the length(v) > 1/eps(real(float(one(T)))) condition is a bit weird although it might not be a big practical concern since it would be 32 PB for Float64. However, I did some simulations and I'm wondering if using the maximum length is the right thing to do. It looks like it is the minimum that determines the error and using the minimum would fix the bizarre case for vectors.

@@ -842,10 +855,11 @@ function (\)(A::AbstractMatrix, B::AbstractVecOrMat)
return qrfact(A,Val(true)) \ B
end

(\)(a::AbstractVector, b::AbstractArray) = reshape(a, length(a), 1) \ b
(\)(a::AbstractVector, b::AbstractArray) = pinv(a) * b
Member Author

But should this use tol=0 or the default? (Likewise below)

function pinv(v::AbstractVector{T}, tol::Real=real(zero(T))) where T
res = similar(v, typeof(zero(T) / (abs2(one(T)) + abs2(one(T)))))'
den = sum(abs2, v)
if den <= tol^2
Member Author

For consistency with pinv(::StridedMatrix, tol), this should be iszero(den) || tol >= one(tol). Not sure how much sense that makes, though. Opinions?

Contributor

Did you decide against consistency with the matrix case then? And to stick with default tolerance above?

Also broaden from StridedVector to AbstractVector while at it and don't
employ full matrix SVD.
Also start testing consistency between division and multiplication with
pseudo-inverse involving vectors.
Let \(::AbstractVector, ::AbstractMatrix) return a RowVector and
\(::AbstractVector, ::AbstractVector) return a scalar.
@martinholters martinholters removed the needs tests Unit tests are required for this change label Aug 4, 2017
@martinholters martinholters changed the title RFC: RowVector, pinv, and quasi-division RowVector, pinv, and quasi-division Aug 4, 2017
@martinholters
Member Author

Good from my side. Travis failure is good old arnoldi.

@andreasnoack andreasnoack merged commit f1d3556 into master Aug 22, 2017
@andreasnoack andreasnoack deleted the mh/row_pinv branch August 22, 2017 03:22
(/)(A::AbstractVecOrMat, B::AbstractVecOrMat) = (B' \ A')'
# \(A::StridedMatrix,x::Number) = inv(A)*x Should be added at some point when the old elementwise version has been deprecated long enough
# /(x::Number,A::StridedMatrix) = x*inv(A)
/(x::Number, v::AbstractVector) = x*pinv(v)
Contributor

This changes the result between two versions without any deprecation and has just caught me completely by surprise.

One more piece of evidence that giving a completely different meaning to operations that used to have elementwise meanings makes no sense.

Member

Which versions? It is an error on 0.6.

Member Author

Ah, there used to be /(x::Number, r::Range):

julia> VERSION
v"0.6.0"

julia> 1 / (1:4)
4-element Array{Float64,1}:
 1.0     
 0.5     
 0.333333
 0.25

julia> VERSION
v"0.7.0-DEV.2158"

julia> 1 / (1:4)
1×4 RowVector{Float64,Array{Float64,1}}:
 0.0333333  0.0666667  0.1  0.133333

The old method was removed in #22932 without adding an appropriate deprecation. So I guess we could add a deprecated /(x::Number, r::Range) to still do the element-wise operation, which would be more specific than the method defined here. @yuyichao is that the case that was bothering you?

Contributor

If I may add: when dividing a scalar by a vector, taking the pseudoinverse of the "vector" and then multiplying it by a constant seems very far from intuitive behavior to me. If one is doing this operation and is aware it is taking the pseudoinverse, then explicitly calling pinv seems more appropriate than relying on this (in)convenience.
