
Taking vector transposes seriously #4774

Closed
jiahao opened this Issue Nov 10, 2013 · 417 comments

@jiahao
Member

jiahao commented Nov 10, 2013

from @alanedelman:

We really should think carefully about how the transpose of a vector should dispatch the various A_*op*_B* methods. It must be possible to avoid new types and ugly mathematics. For example, vector'vector yielding a vector (#2472, #2936), vector' yielding a matrix, and vector'' yielding a matrix (#2686) are all bad mathematics.

What works for me mathematically (which avoids introducing a new type) is that for a 1-dimensional Vector v:

  • v' is a no-op (i.e. just returns v),
  • v'v or v'*v is a scalar,
  • v*v' is a matrix, and
  • v'A or v'*A (where A is an AbstractMatrix) is a vector

A general N-dimensional transpose reverses the order of indices. A vector, having one index, should be invariant under transposition.

In practice v' is rarely used in isolation, and is usually encountered in matrix-vector products and matrix-matrix products. A common example would be to construct bilinear forms v'A*w and quadratic forms v'A*v which are used in conjugate gradients, Rayleigh quotients, etc.

The only reason to introduce a new Transpose{Vector} type would be to represent the difference between contravariant and covariant vectors, and I don't find this compelling enough.
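A rough sketch of these rules as Julia methods, using hypothetical helper names (tnoop, tdot, touter, tmatvec) rather than redefining ' and * in Base, and ignoring conjugation by assuming real elements:

using LinearAlgebra   # for dot; in 0.2-era Julia, dot was in Base

tnoop(v::Vector) = v                                     # v'   -> v (no-op)
tdot(v::Vector, w::Vector) = dot(v, w)                   # v'v  -> scalar
touter(v::Vector, w::Vector) = [x*y for x in v, y in w]  # v*v' -> matrix
tmatvec(v::Vector, A::AbstractMatrix) = A'*v             # v'A  -> vector, since (v'A)' == A'v for real A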

@StefanKarpinski


Member

StefanKarpinski commented Nov 10, 2013

For example, vector'vector yielding a vector (#2472, #2936), vector' yielding a matrix, and vector'' yielding a matrix (#2686) are all bad mathematics.

The double-dual of a finite-dimensional vector space is isomorphic to it, not identical. So I'm not clear on how this is bad mathematics. It's more that we tend to gloss over the distinction between things that are isomorphic in mathematics, because human brains are good at handling that sort of slippery ambiguity and just doing the right thing. That said, I agree that this should be improved, not because it's mathematically incorrect but because it's annoying.

@JeffBezanson


Member

JeffBezanson commented Nov 11, 2013

How can v' == v, but v'*v != v*v? Does it make more sense than we thought for x' * y to be its own operator?

@jiahao


Member

jiahao commented Nov 11, 2013

The double-dual of a finite dimensional vector space is isomorphic to it, not identical.

(Speaking as myself now) It is not just isomorphic, it is naturally isomorphic, i.e. the isomorphism is independent of the choice of basis. I can’t think of a practical application for which it would be worth distinguishing between this kind of isomorphism and an identity. IMO the annoyance factor comes from making this kind of distinction.

Does it make more sense than we thought for x' * y to be its own operator?

That was the impression I got out of this afternoon’s discussion with @alanedelman.
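(For reference, the natural isomorphism in question is the standard evaluation map, which never mentions a basis:

\iota : V \to V^{**}, \qquad \iota(v)(f) = f(v) \quad \text{for all } f \in V^{*},

and \iota is an isomorphism precisely when V is finite-dimensional.)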

@alanedelman


Contributor

alanedelman commented Nov 11, 2013

I think what Jeff is asking is right on the money ... it is starting to look like x'*y and x*y' are making more sense than ever.

@alanedelman


Contributor

alanedelman commented Nov 11, 2013

I'm in violent agreement with @Stefan. Bad mathematics was not meant to mean wrong mathematics; it was meant to mean annoying mathematics. There are lots of things that are technically right, but not very nice ....

@alanedelman


Contributor

alanedelman commented Nov 11, 2013

If we go with this logic, here are two choices we have:

x*x remains an error ..... perhaps with a suggestion "maybe you want to use dot"
or x*x is the dot product (I don't love that choice)

@StefanKarpinski


Member

StefanKarpinski commented Nov 11, 2013

If x and x' are the same thing, then if you want (x')*y to mean dot(x,y) that implies that x*y is also dot(x,y). There's no way out of that. We could make x'y and x'*y a different operation, but I'm not sure that's a great idea. People want to be able to parenthesize this in the obvious way and have it still work.

I would point out that if we allow x*x to mean the dot product, there's basically no going back. That is going to get put into people's code everywhere and eradicating it will be a nightmare. So, natural isomorphism or not, this ain't pure mathematics and we've got to deal with the fact that different things in a computer are different.

@JeffBezanson


Member

JeffBezanson commented Nov 11, 2013

Here is a practical discussion of distinguishing "up tuples" and "down tuples" that I like:

http://mitpress.mit.edu/sites/default/files/titles/content/sicm/book-Z-H-79.html#%_idx_3310

It carefully avoids words like "vector" and "dual", perhaps to avoid annoying people. I find the application to partial derivatives compelling though:

http://mitpress.mit.edu/sites/default/files/titles/content/sicm/book-Z-H-79.html#%_sec_Temp_453

@StefanKarpinski


Member

StefanKarpinski commented Nov 11, 2013

Another reason to distinguish M[1,:] and M[:,1] is that currently our broadcasting behavior allows this very convenient behavior: M./sum(M,1) is column-stochastic and M./sum(M,2) is row-stochastic. The same could be done for normalization if we "fixed" the norm function to allow application over rows and columns easily. Of course, we could still have sum(M,1) and sum(M,2) return matrices instead of up and down vectors, but that seems a bit off.
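Concretely, the idiom being defended (0.2-era reduction syntax; later Julia spells it sum(M, dims=1)):

M = rand(3, 4)
M ./ sum(M, 1)   # sum(M,1) is 1x4, broadcast down the rows: every column sums to 1
M ./ sum(M, 2)   # sum(M,2) is 3x1, broadcast across the columns: every row sums to 1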

@StefanKarpinski


Member

StefanKarpinski commented Nov 11, 2013

I like the idea of up and down vectors. The trouble is generalizing it to higher dimensions in a way that's not completely insane. Or you can just make vectors a special case. But that feels wrong too.

@JeffBezanson


Member

JeffBezanson commented Nov 11, 2013

It's true that up/down may be a separate theory. The approach to generalizing them seems to be nested structure, which takes things in a different direction. Very likely there is a reason they don't call them vectors.

@toivoh


Member

toivoh commented Nov 11, 2013

Also, x*y = dot(x,y) would make * non-associative, as in x*(y*z) vs. (x*y)*z. I really hope that we can avoid that.
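Spelling that out under the hypothetical rule that * on two vectors means dot:

x = [1.0, 0.0]; y = [0.0, 1.0]; z = [1.0, 1.0]
# x*(y*z) would be dot(y,z)*x == [1.0, 0.0]
# (x*y)*z would be dot(x,y)*z == [0.0, 0.0]   -- a different vector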

@StefanKarpinski


Member

StefanKarpinski commented Nov 11, 2013

Yes. To me, that's completely unacceptable. I mean technically, floating-point * is non-associative, but it's nearly associative, whereas this would just be flagrantly non-associative.

@alanedelman


Contributor

alanedelman commented Nov 11, 2013

We all agree that x*x should not be the dot product.

The question remains whether we can think of v'w and v'*w as the dot product --
I really really like this working that way.

@JeffBezanson and I were chatting

A proposal is the following:

v' is an error for vectors (this is what Mathematica does)
v'w and v'*w is a dot product (result = scalar)
v*w is an outer product matrix (result = matrix)

There is no distinction between row and column vectors. I liked this anyway,
and was happy to see Mathematica's precedent.
From Mathematica (http://reference.wolfram.com/mathematica/tutorial/VectorsAndMatrices.html):
"Because of the way Mathematica uses lists to represent vectors and matrices, you never have to distinguish between 'row' and 'column' vectors."

Users have to be aware that there are no row vectors .... period.

Thus if M is a matrix

M[1,:]*v is an error ..... (assuming we go with M[1,:] is a scalar)
The warning could suggest trying dot or '* or M[i:i,:]

M[[1],:]*v or M[1:1,:]*v is a vector of length 1 (this is Julia's current behavior anyway)

Regarding the closely related issue in https://groups.google.com/forum/#!topic/julia-users/L3vPeZ7kews

Mathematica compresses scalar-like array sections:

m = Array[a, {2, 2, 2}] 


Out[49]= {{{a[1, 1, 1], a[1, 1, 2]}, {a[1, 2, 1], 
   a[1, 2, 2]}}, {{a[2, 1, 1], a[2, 1, 2]}, {a[2, 2, 1], a[2, 2, 2]}}}

In[123]:= Dimensions[m]
Dimensions[m[[All, 1, All]]]
Dimensions[m[[2, 1, All]]]
Dimensions[m[[2, 1 ;; 1, All]]]

Out[123]= {2, 2, 2}

Out[124]= {2, 2}

Out[125]= {2}

Out[126]= {1, 2}

[Edit: code formatting – @StefanKarpinski]
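For comparison, here is roughly the same experiment in Julia under the scalar-index-drops-a-dimension rule being proposed (hypothetical at the time of this thread, though later Julia versions adopted exactly this behavior):

m = rand(2, 2, 2)
size(m)              # (2, 2, 2)
size(m[:, 1, :])     # (2, 2) -- the scalar index drops its dimension
size(m[2, 1, :])     # (2,)   -- two scalar indices drop two dimensions
size(m[2, 1:1, :])   # (1, 2) -- a range keeps its dimension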

@blakejohnson


Contributor

blakejohnson commented Nov 12, 2013

@alanedelman

assuming we go with M[1,:] is a scalar

do you mean M[1,:] is just a vector?

@alanedelman


Contributor

alanedelman commented Nov 12, 2013

Yes, sorry. I meant that M[1,:] is indexing with the scalar 1 :-)

Mathematica uses the period . rather than the asterisk *, and then goes the whole 9 yards and makes (vector . vector) into a scalar, exactly what we are avoiding with the asterisk.

There are no doubt many problems with the period, one of which is that it just doesn't look like the "dot" in a dot product, and another of which is that it clashes with the "pointwise op" reading of dot.

Unicode very nicely provides a character named "the dot operator" (char(8901)), so we could have (v ⋅ w) become synonymous with (v'*w).

In summary, a current proposal, subject to debate, is:

  1. Scalar indexing kills the dimension thus
    A[i,:] is a vector as is A[:,i,j]

  2. Vector indexing is thick
    A[ i:i , : ] or A[ [i], : ] returns a matrix with one row

  3. v'w or v'*w is the dot product for vectors (similarly v*w' for outer product)

  4. v' is undefined for vectors (point the user to permutedims(v,1)????)

  5. v*A returns a vector if A is a matrix

  6. v⋅w also returns the dot product (but does not go as far as Mathematica's . by working on matrices)

  7. v*w is undefined for vectors, but a warning might tip the user off with good suggestions

Consequences are that:

a. if you stick to all vectors being column vectors, everything works
b. if you make everything a matrix, everything certainly works, and it's easy to make everything a matrix
c. if your mind can't distinguish a row vector from a one-row matrix, chances are you'll be educated politely and gracefully
d. this dot notation is kind of pleasant to the eye
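Item 6 stands on its own, and is in fact where Julia eventually landed: the Unicode dot operator became an alias for dot (shown here with modern imports):

using LinearAlgebra            # defines dot and the ⋅ alias; in 0.2, dot was in Base
v = [1.0, 2.0]; w = [3.0, 4.0]
v ⋅ w == dot(v, w) == 11.0     # true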

@blakejohnson


Contributor

blakejohnson commented Nov 12, 2013

Suggestion 5) looks very odd to me. I prefer v'*A so that it is explicit that you are using the dual vector. This is particularly important in complex vector spaces where the dual isn't just a "shape" transformation.

I want to echo @StefanKarpinski that it would be rather unfortunate to lose our concise broadcasting behavior in all this. After this change, what is the concise syntax for taking a vector v and normalizing the columns of matrix A by those values? Currently, one can use A ./ v'. This is extremely nice for data manipulation.
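For reference, the idiom in question (under the then-current behavior, where v' of a real vector is a 1xn matrix that broadcasts over rows):

A = rand(3, 4)
v = rand(4)    # one normalizer per column of A
A ./ v'        # v' is 1x4, so column j of A is divided by v[j]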

@alanedelman


Contributor

alanedelman commented Nov 12, 2013

Good questions.

My scheme does not preclude v'*A taking the complex conjugate of v and multiplying by A, and all the various other cases that I didn't yet explicitly mention, but readily could.

We could eliminate 5; perhaps that's desirable, since it doesn't conform to my column vector rule.

This approach to broadcasting is cute and kludgy. One solution now is A ./ v[:,[1]]

It has the benefit of documenting which dimension is being broadcast on, and generalizes to higher dimensional arrays.

Oh, and the v[:,[1]] solution has the virtue of NOT taking the complex conjugate, which is probably what the user intends .....

I LIKE THESE TWO EXAMPLES because the first is a LINEAR ALGEBRA example where a complex conjugate is desired very frequently, but the second example is a MULTIDIMENSIONAL DATA EXAMPLE where we want things to work in all dimensions, not just for matrices, and we very likely don't want a complex conjugate.
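For the record, what v[:,[1]] does (vector indexing keeps the indexed dimension, both then and now):

v = rand(3)
size(v[:, [1]])                  # (3, 1): indexing with the vector [1] keeps a dimension
v[:, [1]] == reshape(v, 3, 1)    # true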

@jiahao


Member

jiahao commented Nov 12, 2013

requires #552. This is the third time it's shown up in the past two weeks.

@BobPortmann


Contributor

BobPortmann commented Nov 12, 2013

Another reason to distinguish M[1,:] and M[:,1] is that currently our broadcasting behavior allows this very convenient behavior: M./sum(M,1) is column-stochastic and M./sum(M,2) is row-stochastic. The same could be done for normalization if we "fixed" the norm function to allow application over rows and columns easily. Of course, we could still have sum(M,1) and sum(M,2) return matrices instead of up and down vectors, but that seems a bit off.

It seems to me that while having the broadcasting behavior is nice some of the time, you end up having to squeeze those extra unit dims out just as often. Thus, having to do the opposite some of the time is OK if the rest of the system is nicer (and I do think having scalar dimensions dropped will make the system nicer). So you will need a function like

julia> widen(A::AbstractArray,dim::Int) = reshape(A,insert!([size(A)...],dim,1)...)
# methods for generic function widen
widen(A::AbstractArray{T,N},dim::Int64) at none:1

which will allow code like M ./ widen(sum(M,2),2) or A ./ widen(v,1) (see @blakejohnson example above)
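Spelled out with the widen defined above (note the name shadows Base's unrelated numeric widen):

v = [1, 2, 3]
size(widen(v, 1))   # (1, 3): the unit dimension is inserted at position 1
size(widen(v, 2))   # (3, 1)
A = rand(2, 3)
A ./ widen(v, 1)    # divides column j of A by v[j], as in the example above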

@alanedelman


Contributor

alanedelman commented Nov 13, 2013

M[:,0,:] and v[:,0] ?????

@timholy


Member

timholy commented Nov 13, 2013

I'm more with @blakejohnson on the reduction issue; I personally think it's clearer to squeeze dimensions than to widen them. I suspect I'd be perpetually looking at the docs to figure out whether widen inserts the dimension at or after the indicated index, and the numbering gets a little more complex if you want to widen over multiple dimensions at once. (What does widen(v, (1, 2)) for a vector v do?) None of these are issues for squeeze.

@toivoh


Member

toivoh commented Nov 13, 2013

Regardless of whether we would widen or squeeze per default, I think that Julia should follow the lead from numpy when it comes to widening and allow something like v[:, newaxis]. But I do believe that I prefer to keep dimensions instead of discarding them, it's harder to catch a bug where you accidentally widened the wrong way than when you squeezed the wrong way (which will usually give an error).
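Absent a newaxis, the explicit widening today is a reshape (shown here only for concreteness):

v = rand(3); w = rand(4)
M = rand(3, 4)
M ./ reshape(v, 3, 1)   # divides row i by v[i]    (numpy: M / v[:, newaxis])
M ./ reshape(w, 1, 4)   # divides column j by w[j] (numpy: M / w[newaxis, :])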

@wenxgwen


wenxgwen commented Nov 13, 2013

In the list of @alanedelman
I feel that

v*A returns a vector if A is a matrix

is not good.

v*A should be an error if A is not 1x1 (mismatch of the range of the index);
v'*A should be the proper way to do it.

One way to handle this issue is to automatically convert the vector v to an nx1 matrix (when needed)
and always treat v' as a 1xn matrix (never converting it to a vector or an nx1 matrix).
Also, we could allow a 1x1 matrix to be automatically converted to a scalar number (when needed).

I feel that this represents a consistent and uniform way to think about linear algebra. (good mathematics)

A uniform way to handle all those issues is to allow automatic (type?) conversion (when needed)
between arrays of size (n), (n,1), (n,1,1), (n,1,1,1), etc. (but not between arrays of size (n,1) and (1,n)),
just like we automatically convert real numbers to complex numbers when needed.

In this case an array of size (1,1) can be converted to a number (when needed) (See #4797 )

Xiao-Gang (a physicist)


@alanedelman


Contributor

alanedelman commented Nov 13, 2013

This leaves v'*A, however .... I really want v'*A*w to work.

@toivoh


Member

toivoh commented Nov 13, 2013

My impression of linear algebra in Julia is that it is very much organized like matrix algebra, although scalars and true vectors exist (which I think is good!)

Let us consider how to handle a product like x*y*z*w, where each factor may be a scalar, vector, or matrix, possibly with a transpose on it. Matrix algebra defines the product of matrices, where a matrix has size n x m. One approach would be to extend this definition so that n or m might be replaced by absent, which would act like a value of one as far as computing the product is concerned, but is used to distinguish scalar and vectors from matrices:

  • a scalar would be absent x absent
  • a (column) vector would be n x absent
  • a row vector would be absent x n

Ideally, we would like to arrange things so that we never need to represent row vectors, but it would be enough to implement operations like x'*y and x*y'. I get the feeling that this is the flavor of scheme that many of us are searching for.

But I'm beginning to suspect that banning row vectors in this kind of scheme will come at a high cost. Example: Consider how we would need to parenthesize a product to avoid forming a row vector at any intermediate step: (a is a scalar, u and v are vectors)

a*u'*v = a*(u'*v) // a*u' is forbidden
v*u'*a = (v*u')*a // u'*a is forbidden

To evaluate a product x*y'*z while avoiding producing row vectors, we would need to know the types of the factors before picking the multiplication order! If the user has to do it himself, it seems like an obstacle to generic programming. And I'm not sure how Julia could do it automatically in a sane way.

Another reason not to fix the multiplication order in advance: I seem to remember that there was an idea to use dynamic programming to choose the optimal evaluation order of *(x,y,z,w) to minimize the number of operations needed. Anything that we do to avoid forming row vectors will likely interfere with this.

So right now, introducing a transposed vector type seems like the most sane alternative to me. That, or doing everything as now but dropping trailing singleton dimensions when keeping them would result in an error.
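To make the bookkeeping concrete, here is a toy shape algebra for that scheme (purely hypothetical types, written in the 0.2-era syntax used elsewhere in this thread):

abstract Shape
immutable Mat  <: Shape; n::Int; m::Int end  # n x m
immutable Col  <: Shape; n::Int end          # n x absent (column vector)
immutable Row  <: Shape; n::Int end          # absent x n (row vector)
immutable Scal <: Shape end                  # absent x absent (scalar)

# Shape of a product: inner sizes must match, and absent acts like 1.
shape(a::Mat, b::Mat) = (@assert a.m == b.n; Mat(a.n, b.m))
shape(a::Mat, b::Col) = (@assert a.m == b.n; Col(a.n))
shape(a::Row, b::Mat) = (@assert a.n == b.n; Row(b.m))
shape(a::Row, b::Col) = (@assert a.n == b.n; Scal())  # inner (dot) product
shape(a::Col, b::Row) = Mat(a.n, b.n)                 # outer product

Under the no-row-vectors restriction, the Row cases above are exactly the ones that could only ever appear fused inside x'*y-style operations, which is the parenthesization problem described.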

@lsorber


lsorber commented Nov 14, 2013

Transposition is just a particular way to permute modes. If you allow v.' where v is a vector, then permutedims(v,[2 1]) should return the exact same thing. Either both return a special row vector type, or they introduce a new dimension.

Having a special type for row vectors doesn't look like a good solution to me, because what will you do about other types of mode-n vectors, e.g., permutedims([1:4],[3 2 1])? I urge you to also take multilinear algebra into account before making a decision.

@wenxgwen


wenxgwen commented Nov 14, 2013

@toivoh mentioned that

"One approach would be to extend this definition so that n or m might be replaced by absent, which would act like a value of one as far as computing the product is concerned, but is used to distinguish scalar and vectors from matrices:

  1. a scalar would be absent x absent
  2. a (column) vector would be n x absent
  3. a row vector would be absent x n "

In multilinear algebra (or for high-rank tensors) the above proposal corresponds to using absent to represent
any number of indices of range 1, i.e. size (m,n,absent) may correspond to (m,n), (m,n,1), (m,n,1,1), etc.

If we use this interpretation of absent, then 1 and 2 are OK and nice to have, but 3 may not be OK.
We do not want to mix arrays of size (1,n) and (1,1,n).


@thomasmcoffee


thomasmcoffee commented Jan 17, 2014

I'm not a specialist in tensor theory, but I have used all the systems mentioned above (without any add-on packages) for substantial projects involving linear algebra.

[TL;DR: skip to SUMMARY]

Here are the most common scenarios in which I have found a need for greater generality in array handling than commonplace matrix-vector operations:

(1) Functional analysis: For instance, as soon as you're using the Hessian of a vector-valued function, you need higher-order tensors to work. If you're writing a lot of math, it would be a huge pain to have to use special syntax for these cases.

(2) Evaluation control: For instance, given any product that can be computed, one should be able to compute any sub-entity of that product separately, because one might wish to combine it with multiple different sub-entities to form different products. Thus Toivo's concern about, e.g., a*u' being forbidden is not just a compilation concern, but a programming concern; an even more common variant is pre-computing x'Q to compute quadratic forms x'Q*y1, x'Q*y2, ... (where these must be done sequentially).
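A concrete rendering of scenario (2), under semantics where x' * Q yields a reusable covector (as Julia's eventual adjoint design provides; the names are illustrative):

x = rand(3); Q = rand(3, 3)
y1 = rand(3); y2 = rand(3)
xQ = x' * Q   # precompute the covector x'Q once
xQ * y1       # the scalar x'Q*y1
xQ * y2       # the scalar x'Q*y2, without redoing x'Q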

(3) Simplifying code: Several times when dealing with arithmetic operations mapped over multi-dimensional data sets, I have found that 6-7 lines of inscrutable looping or function-mapping code can be replaced with one or two brief array operations, in systems that provide appropriate generality. Much more readable, and much faster.

Here are my general experiences with the above systems:

MATLAB: Core language is limited beyond commonplace matrix-vector ops, so usually end up writing loops with indexing.

NumPy: More general capability than MATLAB, but messy and complicated. For almost every nontrivial problem instance, I had to refer to documentation, and even then sometimes found that I had to implement myself some array operation that I felt intuitively should have been defined. It seems like there are so many separate ideas in the system that any given user and developer will have trouble automatically guessing how the other will think about something. It is usually possible to find a short, efficient way to do it, but that way is not always obvious to the writer or reader. In particular, I feel that the need for widening and singleton dimensions just reflects a lack of generality in the implementation for applying operators (though maybe some find it more intuitive).

Mathematica: Clean and very general---in particular, all relevant operators are designed with higher-order tensor behavior in mind. Besides Dot, see for example the docs on Transpose, Flatten / Partition, and Inner / Outer. By combining just these operations, you can already cover most array-juggling use cases, and in version 9 they even have additional tensor algebra operations added to the core language. The downside is that even though the Mathematica way of doing something is clean and makes sense (if you know the language), it may not obviously correspond to the usual mathematical notation for doing it. And of course, the generality makes it difficult to know how the code will perform.

scmutils: For functional analysis, it is clean, general, and provides the most mathematically intuitive operations (both write and read) of any of the above. The up/down tuple idea is really just a more consistent and more general extension of what people often do in written mathematics using transpose signs, differentiation conventions, and other semi-standardized notions; but everything just works. (To write my Ph.D. thesis, I ended up developing a consistent and unambiguous notation resembling traditional math notation but isomorphic to Sussman & Wisdom's SICM syntax.) They've also used it for a differential geometry implementation [1], which has inspired a port to SymPy [2]. I have not used it for data analysis, but I would expect that in a generic array context where you only wanted one kind of tuple (like Mathematica's List), you could just pick one ("up") by convention. Again, generality obscures performance considerations to the programmer, but I would hope this is an area where Julia can excel.

SUMMARY

I think the proposed transposed-vector type should be characterized as the more general "down" tuple in scmutils, while regular vectors would be the "up" tuples. Calling them something like "vector" and "transposed-vector" would probably make more sense to people than calling them "up" and "down" (at the cost of brevity). This would support three categories of use:

(1) for data analysis, if people just want nested arrays, they only need "vector";
(2) for basic matrix-vector linear algebra, people can use "vector" and "transposed-vector" in direct correspondence with mathematical convention ("matrix" would be equivalent to a "transposed-vector" of "vector"s);
(3) for higher-order tensor operations (where there's less standardization and people usually have to think anyway), the implementation should support the full generality of the two-kind tuple arithmetic system.

I believe this approach reflects the emerging consensus above for the results of various operations, with the exception that those cases that earlier posts considered errors (v' and v*A) would actually give meaningful (and often useful) results.

[1] http://dspace.mit.edu/handle/1721.1/30520
[2] http://krastanov.wordpress.com/diff-geometry-in-python/


@jiahao


Member

jiahao commented Jan 17, 2014

@thomasmcoffee it sounds like you are advocating for an explicit distinction between co- and contravariant vectors then.

@thomasmcoffee


thomasmcoffee commented Jan 18, 2014

I would think of that as a common application, but overly specific for what I'm advocating: to me, that has a geometric meaning implying a restriction to rectangular tensors of numbers (for coordinate representations). Since I imagine (without expertise in this area) that a suitable library of tensor algebra functions with standard arrays would usually suffice for this purpose, I'm sympathetic to Alan's point that this alone is not compelling enough to introduce a two-kind system in the core language.

I'm mainly thinking of other applications depending on more general nested structure, for instance, calculus on functions of multiple arguments of mixed dimensionality, which would be more difficult to develop as an "add-on" later if the core language did not support this distinction. Maybe we mean the same thing.


@StefanKarpinski


Member

StefanKarpinski commented Jan 18, 2014

The problem with up and down vectors is that you need to extend the idea to general arrays. Otherwise vectors become something special and separate from arrays, instead of simply the 1-dimensional case of an array, which would lead to a whole mess of awful problems. I've thought a lot about how to do that, but haven't come up with anything acceptable. If you've got any good ideas of how to sanely generalize up and down vectors to arrays, I'd love to hear them.

@toivoh


Member

toivoh commented Jan 18, 2014

Just trying to extrapolate this idea. As I understand it, in order to use an array to calculate with up and down vectors, you need to indicate for each dimension whether it is up or down. In general, this could be achieved by wrapping an array in something like

immutable UpDownTensor{T, N, UPMASK} <: AbstractArray{T, N}
    A::AbstractArray{T, N}
end

where UPMASK would be a bit mask to indicate which dimensions are up. Then operations on non-wrapped arrays could be implemented by providing a default UPMASK as a function of N: vectors would default to a single up, matrices to the first up and the second down; then I'm not sure how it would be reasonably continued.

Some random thoughts:

  • Would quadratic/bilinear forms be better represented by two down dimensions?
  • If transpose would correspond to just flipping the up-/downness of each dimension, I guess that we would also get a transposed-matrix type with the first dimension down and the second up.
  • Up-patterns that corresponded to the default could be represented directly by the underlying array instead of wrapping it.

Well, this is certainly one generalization of the Transposed type, and it certainly has some merit. Not sure whether it is feasible.
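Under those defaults, familiar objects would classify like this (hypothetical, reading UPMASK with bit i set to mean dimension i is up):

A = rand(2, 2)
UpDownTensor{Float64, 2, 0b01}(A)   # a "matrix": dimension 1 up, dimension 2 down
# a vector v      ~ UpDownTensor{T, 1, 0b1}   -- one up dimension
# its transpose   ~ UpDownTensor{T, 1, 0b0}   -- one down dimension (a row)
# a matrix A      ~ UpDownTensor{T, 2, 0b01}
# its transpose   ~ UpDownTensor{T, 2, 0b10}  -- first down, second up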

@thomasmcoffee


thomasmcoffee commented Jan 19, 2014

I think Toivo's suggestion is a reasonable realization of what I was advocating above. To me, the most sensible default is to continue alternating directions at higher orders: for instance, if someone provided the components of a power series as non-wrapped arrays, this would do the right thing.

But on further reflection, I think it could be very powerful to combine both ideas: (1) distinctions between up and down vectors and (2) distinctions between arrays and vectors. Then you could have objects where some dimensions are up, some are down, and some are "neutral." The reason to distinguish arrays and vectors is that, semantically, arrays are for organization (collecting multiple things of the same kinds), whereas vectors are for coordinatization (representing multidimensional spaces). The power of combining both distinctions in one object is that it can then serve both these purposes simultaneously. The neutral dimensions would be treated according to broadcasting rules, while the up/down dimensions would be treated according to tensor arithmetic rules.

Returning to an earlier example of mine, suppose you're computing a number of quadratic forms x'Q*y1, x'Q*y2, ... for different vectors y1, y2, .... Following SICM, denote up tuples (column vectors) by (...) and down tuples (row vectors) by [...]. If you wanted to do this all at once, and you're stuck with up/down only, the conventional way would be to combine the yi into a matrix Y = [y1, y2, ...] using a down tuple (of up tuples), and compute r = x'Q*Y, which gives you the results in a down tuple r. But what if you want to multiply each of these results by a (column) vector v? You can't just do r*v, because you'll get a contraction (dot product). You can convert r to an up tuple, and then multiply, which gives you your results in an up tuple (of up tuples). But suppose for the next step you need a down tuple? Semantically, you have a dimension going through your calculation that just represents a collection of things, which you always want broadcasted; but to achieve this in the strictly up/down world, you have to keep doing arbitrary context-dependent conversions to elicit the right behavior.

In contrast, suppose you also have neutral tuples (arrays), denoted {...}. Then you naturally write ys = {y1, y2, ...} as an array (of up tuples), so that r = x'Q*ys is an array, and r*v is also an array (of up tuples). Everything makes sense, and no arbitrary conversions are required.
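In current-Julia terms the neutral dimension behaves like an ordinary container that you map over (a loose analogue, not the two-kind system itself):

x = rand(3); Q = rand(3, 3); v = rand(2)
ys = [rand(3), rand(3)]        # a plain array (neutral) of vectors (up)
r = [x' * Q * y for y in ys]   # an array of scalars
[ri * v for ri in r]           # an array of vectors; no conversions needed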

Stefan suggests that distinguishing 1-D arrays from up/down vectors is disastrous, but I think this problem gets solved by the fact that most functions make sense operating on vectors or on arrays, but NOT either. (Or, on matrices or on arrays of vectors or on vectors of arrays or on arrays of arrays, but NOT either. And so on.) So with appropriate conversion rules, I haven't thought of a common case that wouldn't do the right thing automatically. Maybe someone can?

Upon looking deeper [1], I discovered that scmutils actually distinguishes what they call "vectors" from up and down tuples under the hood; but currently the conversion rules are set up so that these "vectors" are mapped to up tuples (as I had proposed earlier) whenever they enter the up/down world, with the caveat that "We reserve the right to change this implementation to distinguish Scheme vectors from up tuples." (Perhaps someone on campus can ask GJS if he had any specific ideas in mind.) The Sage system [2] largely separates handling of arrays from vectors and matrices (currently no core support for tensors), and the only problems I've experienced with this have to do with its lack of built-in conversion between them in cases that would obviously make sense.

[1] http://groups.csail.mit.edu/mac/users/gjs/6946/refman.txt --- starting at "Structured Objects"
[2] http://www.sagemath.org/

@drhagen

drhagen commented Jan 25, 2014

I was talking to @jiahao at the lunch table and he mentioned that the Julia team was trying to figure out how to generalize linear algebra operations to higher dimensional arrays. Two years ago I spent several months thinking about this because I needed it for KroneckerBio. I wanted to share my approach.

Let's consider just the product between two arrays for the moment. Other operations have a similar generalization. The three most common types of products when dealing with arrays are the outer product, the inner product, and the elementwise product. We usually think of doing operations like this between two objects, such as inner(A,B) or A*B. When doing these operations on higher-dimensional arrays, however, they are not done between the arrays as a whole, but between particular dimensions of the arrays. Multiple outer/inner/elementwise suboperations happen in a single operation between two arrays and each dimension of each array must be assigned to exactly one suboperation (either explicitly or to a default). For the inner and elementwise products, one dimension on the left must be paired with an equally sized dimension on the right. The outer product dimensions do not have to be paired. Most of the time the user is doing either an inner product or an elementwise product between a pair of dimensions and an outer product for all others. The outer product makes a good default because it is the most common and does not have to be paired.

I usually think about the dimensions as being named rather than being ordered, much like the x, y, and z axes of a plot. But if you want users to actually be able to access the arrays by ordered indexing (like A[1,2,5] rather than A[a1=1, a3=5, a2=2]) then you have to have a consistent procedure to order the results of an operation. I propose ordering the result by listing all the dimensions of the first array followed by listing all the dimensions of the second array. Any dimensions that participated in an inner product are squeezed out, and for dimensions that participated in an elementwise product, only the dimension from the second array gets squeezed out.

I am going to make up some notation for this. Feel free to Juliafy it. Let A be an array that is a1 by a2 by a3 and let B be an array that is b1 by b2. Let's say that array_product(A, B, inner=[2, 1], elementwise=[3, 2]) would take the inner product between dimensions a2 and b1, the elementwise product between a3 and b2, and the outer product of a1. The result would be an array that is a1 by a3.
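
As a hedged sketch of these semantics (array_product is the hypothetical function above; this version hardcodes the example's dimension assignments rather than accepting general inner/elementwise lists):

```julia
# Inner product over (a2, b1), elementwise over (a3, b2), outer over a1;
# the result is a1-by-a3, matching the description above.
function array_product_example(A::AbstractArray{<:Number,3}, B::AbstractArray{<:Number,2})
    a1, a2, a3 = size(A)
    b1, b2 = size(B)
    @assert a2 == b1 && a3 == b2
    C = zeros(promote_type(eltype(A), eltype(B)), a1, a3)
    for i in 1:a1, k in 1:a3
        C[i, k] = sum(A[i, j, k] * B[j, k] for j in 1:a2)  # contract the inner pair
    end
    return C
end

array_product_example(randn(2, 3, 4), randn(3, 4))  # 2×4 result
```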

It should be clear that no binary or unary operator is going to have much meaning in the context of higher-dimensional arrays. You need more than two arguments in order to specify what to do with each dimension. However, you can recapture the ease of linear algebra by making the Matlab operators shorthand for array operations on just the first two dimensions:

Matlab's A*B is array_product(A, B, inner=[2,1]).

Matlab's A.' is permute(A, [2,1]), where permute keeps unchanged all dimensions beyond the length of the permutation argument.

You can choose whether or not to throw errors when the dimensionality of the arrays is greater than 2, or even not equal to 2, like Mathematica does with vector transposes. If you are using just the general array calculations, you do not have to decide whether or not to take @wenxgwen's suggestion to interpret all (n, m) arrays as (n, m, 1) and (n, m, 1, 1). Only when using the linear algebra operators, or other operators that expect arrays of a particular dimensionality, do you have to make this decision. I like @wenxgwen's suggestion, because there is little downside in a dynamically typed language.

I wrote a more detailed description of my system, which includes addition, exponentiation, and differentiation, along with the chain rule and product rule for array calculus.

@toivoh

toivoh commented Jan 26, 2014

Thanks for the perspective! I found this quite enlightening to understand what kind of beast that a general array * array product really is.

@JeffreySarnoff

JeffreySarnoff commented Dec 16, 2016

A thought for the conversation about mathy regions of the type hierarchy and reasonable roles for parameterized abstractions: when attainable, it is good policy for Julia to narrow any separation between how code must be written and what the expert writing it intends to express.

One convention we might adopt is to routinely gather related concrete types under a shared abstract type other than Any, carving out a coherent region of the type hierarchy where those types naturally sit together. That would bring, at very little cost, greater cross-context sensibleness -- understanding one part of the type hierarchy helps one approach other parts.

As I see it, a well-chosen abstraction illuminates the shared structure of the concrete types beneath it. With parameterizations carrying that intuition clearly, Julia lets many people accomplish more with less wasted time.

@andyferris

andyferris commented Dec 20, 2016

OK, any more comments on TransposedVectors would be appreciated. I feel it's getting quite solid and ready to be turned into a PR, but there are some relevant issues that I've listed here.

(for instance: Is RowVector a better name than TransposedVector? Is [1 2 3] a RowVector or a Matrix? What is indices(row, 1)?)

@felixrehren

felixrehren commented Dec 20, 2016

I would really like to keep the row-/column- convention out of it, and I think it sounds weird to have Vector and RowVector (or even ColVector and RowVector). +1 for TransposedVector or DualVector

@andyferris

andyferris commented Dec 21, 2016

@felixrehren I feel that DualVector should have different semantics from what I've implemented, primarily being 1-dimensional (mathematically, the dual of a vector is a vector) and having other duality properties (e.g. complex conjugation). Which is fine, but I found it a bit harder to implement and a bit less backward compatible.

@benninkrs

benninkrs commented Dec 21, 2016

The name TransposedVector is ok. But it is a little long. Also, it kind of suggests that an object of that type can only be obtained by transposing a Vector. But I presume you could make a TransposedVector by other means as well, say, by extracting one row of a matrix?

I think RowVector would be a good name -- it is concise, accurate, and intuitive. @felixrehren, I think the row/column picture is helpful. It's probably even unavoidable, since whenever you do concatenation or other common array operations (other than multiplication) you need to think about which way the vector is oriented.

DualVector isn't bad either, but CoVector would sound less formal.

@andyferris

andyferris commented Dec 21, 2016

The PR I threatened earlier is now submitted at #19670. I went with RowVector for now.

But I presume you could make a TransposedVector by other means as well, say, by extracting one row of a matrix?

This is actually one sticking point - I haven't thought of great syntax for that yet. Any ideas?

@benninkrs

benninkrs commented Dec 22, 2016

But I presume you could make a TransposedVector by other means as well, say, by extracting one row of a matrix?

This is actually one sticking point - I haven't thought of great syntax for that yet. Any ideas?

While at first it did seem appealing to have matrix[scalar,range] (and other similar constructs) yield a row vector, that would be a significant departure from the current indexing semantics for multidimensional arrays, and special cases make me leery.

Instead I'd like to see RowVector (and Vector for that matter) convert any iterable type into the appropriate kind of vector. Then you could do something like RowVector(matrix[scalar,range]) which would be quite clear and not disturb the current behavior of array indexing.
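
A hedged illustration of that suggestion, assuming a RowVector constructor that accepts a vector or other iterable (0.6-era naming, illustrative only):

```julia
A = reshape(collect(1:6), 2, 3)
r = RowVector(A[1, :])   # row 1 of A, explicitly row-shaped; no special-case indexing
w = RowVector(1:3)       # under the proposal, other iterables would convert too
```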

@andyferris

andyferris commented Dec 22, 2016

Right, the ith row can be extracted in row shape by A[i,:].', currently in v0.5, and this would continue to work in the future (producing a RowVector or Transpose{V<:AbstractVector} or whatever we settle on eventually; see here for ongoing discussion). Maybe that's the answer.

@Jutho

Jutho commented Dec 22, 2016

What about just adding two new functions to Base?
row(A,i) and col(A,i)
The latter is not needed, but just for symmetry. It's concise, clear, and as many characters as A[i,:].'
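
A minimal sketch of what those helpers might look like (hypothetical names from this comment, not an existing Base API):

```julia
row(A::AbstractMatrix, i::Integer) = transpose(A[i, :])  # row i, in row shape
col(A::AbstractMatrix, j::Integer) = A[:, j]             # column j, as a plain Vector
```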

@felixrehren

felixrehren commented Dec 22, 2016

@benninkrs that makes sense. In contrast, my intuitive interpretation is the linear-algebra one, where a vector has no orientation at all. All the points of view on this are reasonable; I just don't like Vector and RowVector together, because the naming looks like it mixes an abstract and a concrete paradigm.

@StefanKarpinski

StefanKarpinski commented Dec 24, 2016

At this point, I think we need to either go with something based on @andyferris's implementation or decide not to take vector transposes seriously. As an example of an alternative, here's an approach which treats the worst symptoms: leave v' as it currently is – i.e. producing a row matrix – but parse a'b as an infix operator with precedence just below multiplication. In other words we would have the following behaviors:

  1. v' is a row matrix (ctranspose)
  2. v'v is a scalar (dot product)
  3. v'*v is a 1-element vector (row matrix * vector)
  4. v*v' is an outer product matrix (vector * row matrix)
  5. v'A is a row matrix ("vecmat")
  6. v'*A is a row matrix (matmat)
  7. v'A*v is a scalar (matvec A*v then dot product)
  8. (v'A)*v is a 1-element vector (vecmat then matvec)
  9. v'*A*v is a 1-element vector (matmat then matvec)
  10. v'' is a column matrix (vector ctranspose, then matrix ctranspose)

In the current setup 2 and 3 are equivalent and 7, 8 and 9 are equivalent, which this change breaks. But crucially, the bold items (2, 4, 5 and 7) are the ones people will commonly reach for, since they're the shortest and most convenient of the similar forms – and they all do what we'd like them to. No new types, just one new infix operator. The main drawback is 10 – v'' is still a column matrix. Arguably, this could be seen as a benefit, since postfix '' is the operator for turning a vector into a column matrix.

Taking a step back, I think what we've learned is that without additional up-down or dimension labeling functionality or treating ≤ 2 dimensions as fungible like Matlab does, multidimensional arrays can't really be made to dovetail smoothly with linear algebra. So what we're left with is a question of convenience – can we let people express common linear algebra operations on vectors and matrices conveniently without overly complicating arrays? I'm not dead set on this approach, but I think we should seriously consider this and other approaches that address syntactic convenience without mucking up our array type hierarchy too much.

@StefanKarpinski

StefanKarpinski commented Dec 24, 2016

Another approach would be to parse infix a'b specially (just below *) and have v' return a conjugated vector. More generally, postfix A' could conjugate an array and lazily reverse its dimensions while A.' would just lazily reverse the dimensions of an array thus acting as the identity on vectors. In this scheme the list of properties could be the following:

  1. v' is a vector (conjugated)
  2. v'v is a scalar (dot product)
  3. v'*v is an error (recommend v'v for inner product and outer(v,v) for outer product)†
  4. v*v' is an error (recommend v'v for inner product and outer(v,v) for outer product)†
  5. v'A is a vector (vecmat)
  6. v'*A is an error (recommend v'A for vecmat)
  7. v'A*v is a scalar (matvec A*v then dot product)
  8. (v'A)*v is an error (recommend v'v for inner product and outer(v,v) for outer product)†
  9. v'A'v is a scalar (v'(A'v) – conjugate matvec then inner product)
  10. v'' is a vector (v'' === v and v.' === v)

Now that I write out all of these properties, this may be the preferable option: all the error cases actually serve to help people discover and use preferred syntaxes, and of course it has the desirable v'' === v property (and dovetails nicely with .' being a generic dimension-reversal operator). Having very similar syntaxes that are only subtly different is probably more confusing.

† We could potentially catch these at parse time for more precise errors at the risk of giving errors for cases where user code has overloaded ' and * so that these operations work. I believe having a lazy conjugate wrapper may be necessary to make these recommendations correct, but we need that anyway for #5332.
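
For illustration, a lazy conjugate wrapper of the kind alluded to here might be sketched as follows (Conj is a hypothetical name, written in modern struct syntax; elements are conjugated on access rather than copied):

```julia
struct Conj{T,N,A<:AbstractArray{T,N}} <: AbstractArray{T,N}
    parent::A
end
Base.size(c::Conj) = size(c.parent)
Base.getindex(c::Conj, i::Int...) = conj(c.parent[i...])  # conjugate lazily, no copy
```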

@Sacha0

Sacha0 commented Dec 24, 2016

Taking a step back, I think what we've learned is that without additional up-down or dimension labeling functionality or treating ≤ 2 dimensions as fungible like Matlab does, multidimensional arrays can't really be made to dovetail smoothly with linear algebra. So what we're left with is a question of convenience – can we let people express common linear algebra operations on vectors and matrices conveniently without overly complicating arrays? I'm not dead set on this approach, but I think we should seriously consider this and other approaches that address syntactic convenience without mucking up our array type hierarchy too much.

💯

Explicitly making postfix ' and .' generic array (rather than linear algebra) operations nicely sidesteps doubling down on conflation of storage and linear algebra types, and leaves the door open for frameworks involving fewer compromises. In the interim, those operations' ability to simulate Householder notation in most common cases should provide most of the desired convenience. Also less code and complexity. Best!

@StefanKarpinski

StefanKarpinski commented Dec 25, 2016

One issue with v.' being a no-op is that A .+ v.' would change meaning from adding the values of v to each column of A to adding the values to the rows of A. This would generally make it harder to do row-like operations with vectors and it would definitely need a full deprecation cycle to avoid silently making code do the wrong thing (in cases like this where A happens to be square).
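
To make the hazard concrete (using postfix ' for the row shape; v.' behaves the same way in v0.5):

```julia
A = ones(2, 2)
v = [10, 20]
A .+ v'   # v' is 1×2: adds v across each row       -> [11 21; 11 21]
A .+ v    # v is 2-element: adds v down each column -> [11 11; 21 21]
# If v.' were a no-op, A .+ v.' would silently switch from the first
# behavior to the second whenever A happened to be square.
```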

@andyferris

andyferris commented Dec 25, 2016

At this point, I think we need to either go with something based on @andyferris's implementation or decide not to take vector transposes seriously.

I realize the deadline for v0.6 is almost upon us, but I would caution against throwing out the baby with the bathwater. At this stage, it appears that the mentioned RowVector plus views for matrix transpose and array conjugation will let us get:

  • IMO, somewhat more reasonable linear algebra types (we're no longer denying the existence of dual vectors, though a row vector might be seen as somewhat of a special case of a dual vector)
  • v'' === v
  • Some of the things on Stefan's list like v1'v2 is a dot product
  • Almost backward compatible array semantics - e.g. the result of size(v') is unchanged, but we have v'' as 1-dimensional
  • Lazy conj and transpose wrappers might increase performance in certain circumstances and be useful anyway
  • Removal of all the Ac_mul_Bc functions in favour of * and A_mul_B! only (and similarly for \, /).

Getting all of this working for Array honestly would not take too much effort (for me the problem is finding time at this time of year, and chasing down all the other types we have in our linear algebra suite). And the last point would be a sigh of relief.

On the flip side - IMHO those rules seem to complicate parsing slightly, and it might be a little confusing and/or surprising how they compose ' with * (3, 4, 6 and 8 would work with RowVector).

And yes, we'd have to deprecate v.' or something in order to highlight potential bugs, at which point it almost seems better to make v.' a missing method error (we simply won't support row/dual vectors, but won't stop a package doing so if they wish).
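
For reference, a sketch of the semantics this implementation gives (Julia 0.6-era behavior with RowVector):

```julia
v = [1, 2, 3]
v'        # 1×3 row (a RowVector wrapper)
v''       # a 3-element Vector again: v'' == v
v' * v    # 14 -- a true scalar dot product
v * v'    # 3×3 outer-product matrix
```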

@andyferris

andyferris commented Jan 4, 2017

#19670 is looking either ready or close to it, if there is an appetite to sneak something into v0.6.

andyferris added a commit to andyferris/julia that referenced this issue Jan 5, 2017

Introduce `RowVector` as vector transpose
`RowVector` is now defined as the `transpose` of any `AbstractVector`. If
`v` is an `AbstractVector`, then it obeys the identity `(v.').' === v`,
and the matrix multiplication rules follow `(A * v).' == (v.' * A.')`.
`RowVector` is a "view" and maintains the recursive nature of `transpose`.
It is a subtype of `AbstractMatrix` and maintains the current shape and
broadcast behavior for `v.'`.

Consequences include:

 * v'' is a vector, not a matrix
 * v'*v is a scalar, not a vector
 * v*v' is the outer product (returns a matrix)
 * v*A (for A::AbstractMatrix) is removed, since its purpose was to
   provide a way of doing the outer product above, and it is no longer
   necessary.

Closes #4774

andyferris added further commits with the same message to andyferris/julia referencing this issue on Jan 5, Jan 9, Jan 12, and Jan 13, 2017.

@stevengj stevengj closed this in 3d6fa35 Jan 13, 2017

@StefanKarpinski

StefanKarpinski commented Jan 13, 2017

BAM

@andyferris

andyferris commented Jan 13, 2017

Woot.

Was this our longest issue thread?

@Keno

Keno commented Jan 13, 2017

No, #11004 has more

@andyferris

andyferris commented Jan 14, 2017

Sorry. You're right, I should have specified open issue thread.
