
what should assignment with colons do? #23

Closed
JeffBezanson opened this issue Jul 26, 2016 · 26 comments

@JeffBezanson
Contributor

This is not really intuitive to me:

julia> a = NDSparse([1,2,2,2], [1,2,3,4], zeros(4))
NDSparse{Float64,Tuple{Int64,Int64}}:
 (1,1) => 0.0
 (2,2) => 0.0
 (2,3) => 0.0
 (2,4) => 0.0

julia> a[2,:] = 1;

julia> a
NDSparse{Float64,Tuple{Int64,Int64}}:
 (1,1) => 0.0
 (2,1) => 1.0
 (2,2) => 1.0
 (2,3) => 1.0
 (2,4) => 1.0

What we're doing now is converting : to a vector of all the unique indices in its dimension. This is trying to be consistent with dense arrays, where assignment operates on the product of the given indices. For example:

julia> a[1:2,1:2] = 1;

julia> a
NDSparse{Float64,Tuple{Int64,Int64}}:
 (1,1) => 1.0
 (1,2) => 1.0
 (2,1) => 1.0
 (2,2) => 1.0
 (2,3) => 0.0
 (2,4) => 0.0

This doesn't seem like the right thing for sparse arrays, but it's understandable. The colon behavior is much more surprising.
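The conversion described above is easy to reproduce with a plain Dict standing in for NDSparse. This is a hypothetical sketch, not the actual implementation: `assign_colon!` and the Dict model are illustrative only.

```julia
# Hypothetical sketch: a Dict stands in for NDSparse, and `assign_colon!`
# mimics the current behavior -- `:` becomes the unique indices observed in
# that dimension, and assignment covers the product with them.
a = Dict((1,1) => 0.0, (2,2) => 0.0, (2,3) => 0.0, (2,4) => 0.0)

function assign_colon!(a::Dict, i1, v)
    cols = unique(last.(collect(keys(a))))   # `:` -> all observed column indices
    for j in cols
        a[(i1, j)] = v                       # product of i1 with every column
    end
    return a
end

assign_colon!(a, 2, 1.0)
# (2,1) through (2,4) are now 1.0 -- including (2,1), which was never stored;
# that is the surprising part.
```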

@ViralBShah
Contributor

Did you want to set (1,1) and (2,2) to 1 in this case? I feel like it should have a different index type, but then things become a bit too verbose.

@JeffBezanson
Contributor Author

JeffBezanson commented Jul 27, 2016

I'm not sure what I wanted :) But it's clear that implicitly taking the product space of the indices is designed for dense arrays. To assign sparse arrays, you want a sparse index space. So one API that makes sense is merge!, where you assign one sparse array into another. Or you could explicitly use a sparse index space, like you said, on the left-hand side:

a[Indexes(1:2, 1:2)] = 1

We already have the Indexes type, but it may need a better name --- something like SparseIndexSpace.
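A minimal sketch of what zipped assignment through Indexes could look like, again with a Dict as a stand-in for NDSparse. The `Indexes` name is from this thread; `setindex_zip!` and the rest are illustrative.

```julia
# Hypothetical sketch: `Indexes(1:2, 1:2)` pairs its columns elementwise, so
# assignment touches only (1,1) and (2,2), not the 2x2 product space.
struct Indexes{T<:Tuple}
    cols::T
end
Indexes(cols...) = Indexes(cols)   # so `Indexes(1:2, 1:2)` works

function setindex_zip!(a::Dict, v, ix::Indexes)
    for key in zip(ix.cols...)     # elementwise pairing, not a product
        a[key] = v
    end
    return a
end

a = Dict((1,1) => 0.0, (2,2) => 0.0)
setindex_zip!(a, 1.0, Indexes(1:2, 1:2))   # assigns (1,1) and (2,2) only
```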

@JeffBezanson
Contributor Author

Of course that still leaves the question of what to do with colons. If we keep the usual product space interpretation of plain indices, I'm inclined not to allow them.

@timholy

timholy commented Jul 27, 2016

Noticed this from AxisArrays. I'm not sure I follow, but I think you are wishing it would just assign to the already-stored values? Over at https://github.com/timholy/ArrayIteration.jl, that would be called something like A[stored(index(A, :, 2)), 2] = 1. (ArrayIteration is currently striving for accuracy & expressiveness, not brevity 😄.)

@JeffBezanson
Contributor Author

Yes, that's probably what I want. I'm beginning to think the key abstraction is IndexSpace, subsuming indices, CartesianRange, etc. For example

indexspace(rand(2,3)) => ProductSpace(OneTo(2), OneTo(3))
indexspace(AxisArray(...)) => ProductSpace(0:0.1:1, -1:0.2:1)
indexspace(NDSparse(...)) => ZipSpace(col1, col2, ...)

A[ProductSpace(1:2,2:3)] = foo   # same as `A[1:2,2:3] = foo` currently
A[ZipSpace(x,y)] = foo  # same as `for (i,j) in zip(x,y); A[i,j] = foo; end`

Then I could get the "write only to existing locations" behavior using

A[intersect(ProductSpace(1:2,1:2), indexspace(A))] = 1
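A rough sketch of the intersection idea, with a Dict standing in for NDSparse. `ProductSpace`/`ZipSpace` follow the names proposed above; `pairs_of` and `assign_intersect!` are illustrative helpers, not a real API.

```julia
# Hypothetical sketch of the two index-space flavors: a ProductSpace iterates a
# Cartesian product, a ZipSpace pairs its columns elementwise.
struct ProductSpace{T<:Tuple}; axes::T; end
struct ZipSpace{T<:Tuple};     cols::T; end

pairs_of(s::ProductSpace) = vec(collect(Iterators.product(s.axes...)))
pairs_of(s::ZipSpace)     = collect(zip(s.cols...))

# "Write only to existing locations": intersect the space with the stored keys.
function assign_intersect!(a::Dict, v, s)
    for key in intersect(pairs_of(s), collect(keys(a)))
        a[key] = v
    end
    return a
end

a = Dict((1,1) => 0.0, (2,2) => 0.0, (2,3) => 0.0, (2,4) => 0.0)
assign_intersect!(a, 1.0, ProductSpace((1:2, 1:2)))
# Only the stored keys inside the 2x2 product -- (1,1) and (2,2) -- change,
# and no new keys appear.
```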

@timholy

timholy commented Jul 27, 2016

Very similar thinking to my own. index(A, region...) is basically like your ProductSpace, and then stored is basically computing the intersection with your indexspace.

I'm not sure which formulation I like better---yours is certainly attractive. Just to explain my own reasoning, the motivation behind the design in ArrayIteration was to support not just getindex and setindex!, but also the writing of iterative algorithms (which avoids temporary allocations, etc.). There are several things one might iterate over:

  • each(A, region...) returns the values of A over the product-space (Cartesian, i.e., like a dense array);
  • stored(A, region...) returns the values of A in just the stored entries;
  • index(A, region...) returns the product-space (Cartesian, dense) indices;
  • stored(index(A, region...)) returns just the stored indices.

I think of indices and values similar to the keys/values of an associative array.
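The four flavors can be approximated against a plain SparseMatrixCSC with nothing but SparseArrays. `each`, `stored`, and `index` are ArrayIteration.jl names; these one-liners only approximate their semantics.

```julia
# Illustrative stand-ins for the four iteration flavors, written against a
# standard SparseMatrixCSC.
using SparseArrays

A = sparse([1, 2, 2], [1, 2, 3], [1.0, 2.0, 3.0], 2, 3)

each_vals    = vec(collect(A))                    # each: dense product-space values
dense_index  = vec(collect(CartesianIndices(A)))  # index: dense product-space indices
I, J, V      = findnz(A)
stored_vals  = V                                  # stored: values at stored entries
stored_index = collect(zip(I, J))                 # stored(index(...)): stored indices
```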

@JeffBezanson
Contributor Author

Surprising nobody, I added something very similar to NDSparse; there is a function where(A, indexes...) that returns an iterator over the values of A within the given index space.

This is quite exciting; it seems all we need to do is pick a few common names. There are a few other fairly trivial API issues, e.g. sparse accepts arguments in the order I, J, V but AxisArrays puts the data first.

stored seems fine to me --- my conception of it is that it's redundant for all arrays except strange beasts like SparseMatrixCSC, which has a dense index space but also a sparse index space of just the nonzeros. My current thinking is that that's different from NDSparse, which inherently only has a sparse index space.

@timholy

timholy commented Jul 27, 2016

This is quite exciting; it seems all we need to do is pick a few common names.

👍 Love it. I'm happy to change the names/strategy I've started with, as common ground is precious and multiple heads hammering out a design even more so.

My current thinking is that that's different from NDSparse, which inherently only has a sparse index space.

I always worry about what happens if someone does sum(A.*B) and one of them is dense and the other an NDSparse. Isn't it a little dangerous to say "this is an abstract array but it doesn't follow any of the rules we expect for a dense array"? Similar issues come up for, e.g., dot(a, b) with (Sparse)Vectors.

@JeffBezanson
Contributor Author

As it happens, NDSparse is actually not a subtype of AbstractArray. It's likely that the vast majority of uses of AbstractArray assume a dense index space. For example, NDSparse can't really support size.

For things like .*, you basically need to specify what kind of join to do on the indexes; right now we usually do an inner join. So I don't necessarily expect .* etc. to work out of the box. For now I'd be happy just to get indexing and iteration really right.

@ViralBShah
Contributor

This is where the word Data in NDSparseData may help signal that it does not follow array rules.

@alanedelman

Just worth saying that the original implementation of sparse in the famous paper made a big deal that a matrix is a matrix independent of whether it is dense or sparse, as I recall.

@StefanKarpinski
Member

That's fine for matrices but doesn't make sense when your domain is "strings" or "version numbers" or "dollar values".

@JeffBezanson
Contributor Author

Yes, that's why I say a SparseMatrixCSC is not really sparse --- it's a dense matrix with clever storage.

@alanedelman

Or that there is a linear algebra point of view and a container point of view, and they are not the same at all, as we see so very often.

@JeffBezanson
Contributor Author

I'm simply claiming that some objects have a Cartesian product of indices, and some don't. I'm fine with considering that interface totally separate from linear algebra --- for example you might have a linear operator that doesn't support indexing, and that's fine. We just want a common interface for things that do have these properties.

@ViralBShah
Contributor

The famous sparse paper was about sparse matrices. We are talking about sparse storage here - so yes containers vs. matrices.

@JeffBezanson
Contributor Author

I don't really see what that concretely implies for what we're discussing here.

I maintain that sparse matrices can be discussed in this framework. The fact is they implement the same interface as dense matrices: they have a number of rows and columns, and they have a value for each of the M×N index positions. The same idea applies to containers. However, sparse matrices also contain an object nonzeros(A), which has a sparse index space just like NDSparse. So that idea also applies to containers. If we have common first-class notions of dense and sparse index spaces, it becomes much easier to talk about algorithms on whole matrices vs. nonzeros, for example.
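This split is visible in the standard SparseMatrixCSC already: size and getindex answer for the dense index space, while nonzeros exposes the stored entries.

```julia
# A SparseMatrixCSC carries both index spaces at once.
using SparseArrays

A = sparse([1, 2], [1, 3], [4.0, 5.0], 2, 3)

size(A)        # (2, 3): the dense index space
A[2, 2]        # 0.0: every product-space position has a value
nonzeros(A)    # [4.0, 5.0]: the stored (sparse) index space
```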

@ViralBShah
Contributor

Ok - I do agree with dense and sparse index spaces, and that is an idea that we can bring back into matrices as well.

@ViralBShah
Contributor

There are richer sets of sparse storage than SparseMatrixCSC, and having a more general way to think about sparsity than just the presence of zeros is actually a good thing. For that purpose, perhaps stored is not the right word.

@Sacha0

Sacha0 commented Jul 28, 2016

I have nothing to add here apart from irrepressible enthusiasm.

@timholy

timholy commented Jul 28, 2016

I'm fine with the idea of this not acting like a regular array, but of course if the goal is to design an API that is reasonably generic then that API needs to be designed with arrays also in mind. In other words, let's try to make things work for everyone.

As soon as you think about arrays, many computations end up involving more than one array, and that makes life more interesting. For example, dot(a, b) only needs to visit the intersection of indices that are nonzero in a and b---if one is dense and the other is sparse, you can just let the sparse one dictate the iteration. But a.+b needs to visit the union of indices that are nonzero. This is where it seems like what I've called sync---a variant of zip that couples the two iterators together, making sure they advance in unison---becomes a key part of the API.
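For sorted stored indices, the union-visiting merge behind a hypothetical sync can be sketched in a few lines. `sync_union` is an illustrative name, not the ArrayIteration API; it just advances two cursors in unison.

```julia
# Hypothetical sketch of `sync` over two sorted stored-index lists: visit the
# union of indices, advancing both cursors together when they agree, so that
# something like `a .+ b` touches each position exactly once.
function sync_union(ia::Vector{Int}, ib::Vector{Int})
    out = Int[]
    i = j = 1
    while i <= length(ia) || j <= length(ib)
        if j > length(ib) || (i <= length(ia) && ia[i] < ib[j])
            push!(out, ia[i]); i += 1
        elseif i > length(ia) || ib[j] < ia[i]
            push!(out, ib[j]); j += 1
        else                      # index present in both: advance in unison
            push!(out, ia[i]); i += 1; j += 1
        end
    end
    return out
end

sync_union([1, 3, 5], [2, 3, 6])  # [1, 2, 3, 5, 6]
```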

@pranavtbhat
Contributor

Would it make sense to think of NDSparse as a multi-level dictionary? It maps a tuple of keys onto values, so maybe have it implement something like Associative{NTuple{N,K},V}?
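That reading is easy to try with an ordinary Dict keyed by tuples, which is roughly what an Associative{NTuple{N,K},V} (AbstractDict in Julia 1.x) would promise.

```julia
# A plain Dict keyed by tuples behaves like the multi-level-dictionary reading:
# explicit keys only, no implicit product space, no default zero.
a = Dict{NTuple{2,Int},Float64}()
a[(1, 1)] = 0.0
a[(2, 2)] = 0.0

haskey(a, (2, 2))   # true
haskey(a, (2, 1))   # false -- unlike an array, (2,1) simply doesn't exist
```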

@JeffBezanson
Contributor Author

I wrote up some thoughts on what it means to be an array: https://gist.github.com/JeffBezanson/24b9e2820262cdeb74f96b81534a4d1f

The goal is to get the basics exactly right. It's mostly unsurprising, but there are a couple interesting implications. I think what's there would be enough to get AxisArrays, NDSparse, SparseMatrixCSC, and OffsetArrays on the same page.

also cc @mbauman

@timholy

timholy commented Aug 13, 2016

This is awesome. I posted some comments in the gist.

@JeffBezanson
Contributor Author

Thanks. Reply posted --- I'm not sure if gist comments generate notifications.

@timholy

timholy commented Aug 15, 2016

They don't seem to, so thanks for the ping.
