
Row Major Arrays #5932

Closed
tknopp opened this issue Feb 24, 2014 · 23 comments
Labels
linear algebra · sparse · speculative · won't change

Comments

@tknopp
Contributor

tknopp commented Feb 24, 2014

I have already asked this on julia-users (https://groups.google.com/forum/?fromgroups=#!searchin/julia-users/Row$20Major/julia-users/FUU7C6Pvx3Q/Q6iUkUQM8xAJ), but maybe GitHub is the better place for this.

There are situations where row-major arrays are more suitable. In particular, I am thinking of the Kaczmarz algorithm, which operates on matrix rows; this is simply more efficient with row-major storage.

Solutions:

  • Keep the reversed indices in mind when writing the algorithm. IMHO a hacky solution.
  • Write an array view type that reverses indexing (i.e. a lazy transpose type).
  • Let the standard array have different storage orders. This is how Numpy does it.

I know that this is core Julia territory and I understand any objections to changing things here. Still, Julia does not seem to be as flexible in this regard as Numpy, and it would be great if we could get a clean solution for this.
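To make the row-access point concrete, here is a minimal sketch in current Julia syntax (the function names are illustrative, not anything proposed here): the first version follows the algorithm as usually stated and touches rows of a column-major Array, the second is the transposed workaround discussed above.

using LinearAlgebra

# One Kaczmarz sweep written row by row; with Julia's column-major Array the
# row slice view(A, i, :) strides through memory.
function kaczmarz_sweep!(x, A::AbstractMatrix, b)
    m = size(A, 1)
    for i in 1:m
        ai = view(A, i, :)                       # row access: strided in column-major storage
        x .+= ((b[i] - dot(ai, x)) / dot(ai, ai)) .* ai
    end
    return x
end

# Column-major-friendly workaround: store the system transposed up front, so a
# "row" of the original problem is a contiguous column here.
function kaczmarz_sweep_transposed!(x, At::AbstractMatrix, b)
    m = size(At, 2)
    for i in 1:m
        ai = view(At, :, i)                      # contiguous column access
        x .+= ((b[i] - dot(ai, x)) / dot(ai, ai)) .* ai
    end
    return x
end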

@timholy
Sponsor Member

timholy commented Feb 24, 2014

Several existing algorithms follow LAPACK and allow you to pass 'N', 'T', or 'C' to indicate whether you want an operation to work with normal, transposed, or conjugate-transposed storage order. The algorithms take different code paths for all three cases, and for efficiency I suspect one needs them all (cache misses are so important that I think one needs to think about them explicitly).

For sparse matrices, this is an interesting approach, and perhaps the only viable one for making both A*v and A'*w fast.
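For dense matrices these flags are already exposed through Julia's BLAS wrappers. A small example, assuming the module layout of current Julia, where the wrappers live in LinearAlgebra.BLAS:

using LinearAlgebra

A = rand(1000, 1000)
v = rand(1000)

# Both calls read the same column-major buffer; the character flag selects the
# code path, in the LAPACK/BLAS style described above.
y1 = BLAS.gemv('N', A, v)   # y1 = A * v
y2 = BLAS.gemv('T', A, v)   # y2 = transpose(A) * v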

@timholy
Sponsor Member

timholy commented Feb 24, 2014

CC @madeleineudell.

@tknopp
Contributor Author

tknopp commented Feb 24, 2014

I know that a flexible data layout would require a lot of work to make all operations efficient. But maybe one could use Julia's generic fallbacks in a first iteration for row-major data.

@jiahao
Member

jiahao commented Feb 24, 2014

The Transpose{Array} type proposed in #4774 could be a way of specifying a row-major matrix, although I'm not sure what people think of the corresponding CTranspose{Array}, which would be needed to make use of the LAPACK 'C' storage flag.
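At the interface level the lazy wrapper needs very little machinery. A sketch in current Julia syntax (the name MyTranspose is hypothetical and this is not the #4774 design; a real implementation would also need dispatch to BLAS, conversions, and so on):

# Indexing swaps the two subscripts, so a column-major parent behaves like a
# row-major matrix when accessed through the wrapper.
struct MyTranspose{T} <: AbstractMatrix{T}
    parent::AbstractMatrix{T}
end

Base.size(t::MyTranspose) = reverse(size(t.parent))
Base.getindex(t::MyTranspose, i::Int, j::Int) = t.parent[j, i]
Base.setindex!(t::MyTranspose, v, i::Int, j::Int) = (t.parent[j, i] = v)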

@jiahao
Member

jiahao commented Feb 24, 2014

For sparse matrices, however, I see no conceptual reason not to support a SparseMatrixCSR type. It would be interesting to sketch out the possibilities for code reuse between the CSR and CSC types.

@tknopp
Contributor Author

tknopp commented Feb 24, 2014

Yes, a transposed array type would solve this, although it is still somewhat inconvenient, as one has to fill the array in a transposed way (my application has row-major matrices stored on disk). But having CSR in addition to CSC is the same as having column-major plus row-major arrays.

@lindahua
Contributor

Having both CSR and CSC for sparse matrices is desirable.

However, for dense matrices, having two kinds of layouts would complicate things a lot:
(1) Element-wise operations that involve arrays of different layouts are very tricky to write.
(2) A lot of code has been developed based on column-major order. Much of that code won't work nicely or efficiently with row-major arrays.
(3) Shall we require/encourage people to consider both the column-major and row-major cases when writing new functions that accept array arguments?

If we want to add support for row-major arrays, it is much preferable to add a RowMajorArray type than to have Array support both column-major and row-major orders.
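For illustration, a rough sketch in current Julia syntax of what that separate type could look like (the name RowMajorMatrix is hypothetical, not a concrete proposal):

# Data lives in a flat buffer laid out row by row; getindex does the row-major
# address arithmetic, and everything else initially comes from the generic
# AbstractArray fallbacks, which is exactly where the efficiency concerns
# listed above come in.
struct RowMajorMatrix{T} <: AbstractMatrix{T}
    data::Vector{T}
    m::Int
    n::Int
end

Base.size(A::RowMajorMatrix) = (A.m, A.n)
Base.getindex(A::RowMajorMatrix, i::Int, j::Int) = A.data[(i - 1) * A.n + j]
Base.setindex!(A::RowMajorMatrix, v, i::Int, j::Int) = (A.data[(i - 1) * A.n + j] = v)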

@tknopp
Contributor Author

tknopp commented Feb 24, 2014

A RowMajorArray seems to be the easiest approach to get something working, and all your concerns are valid. I still wanted to put the flexible array layout solution on the table, as it is more general.

I am wondering if I am the first to run into issues with row-major data and Julia arrays. Numpy defaults to row-major ordering.

@andreasnoack
Member

Isn't it mostly a matter of habit? If you are more used to row major, you'd prefer to write your loops efficiently for row major. Fortran is column major and is considered fine for numerical programming.

@lindahua
Contributor

I think it was unfortunate that Fortran and C default to column-major and row-major order, respectively. Since then, languages have been divided into two camps. According to Wikipedia:

Row-major order is used in C/C++, Mathematica, PL/I, Pascal, Python, Speakeasy, SAS and others. Column-major order is used in Fortran, OpenGL and OpenGL ES, MATLAB, GNU Octave, R, Julia, Rasdaman, and Scilab.

@tknopp
Contributor Author

tknopp commented Feb 24, 2014

@andreasnoackjensen: Technically this is not a problem. But my implementation of the Kaczmarz algorithm now operates on columns instead of rows, which makes the code less readable. It's not the biggest issue ever, but whenever I see a corner where Julia is not completely awesome, I feel responsible for reporting an issue :-)

@madeleineudell

When this discussion is expanded to sparse arrays as well as dense ones, it becomes clear that there are algorithms for which one storage format is much faster than another. It's important to be able to support the needs of advanced users who care about these details while making sure that defaults are strong enough that casual users don't get confused.

For example, scipy supports 7 different sparse matrix formats, with very clear documentation on the strengths and weaknesses of each. So as a programmer, I can decide to, e.g., build a matrix using the coo format and convert it to csr for efficient A*b, or csc for efficient b'*A. But all these matrix types are clearly named as such, and actually raise warnings if you, say, try to set indices in a csr matrix.

@timholy
Sponsor Member

timholy commented Feb 24, 2014

But having CSR in addition to CSC is the same as having column major + row major arrays.

Not really. For dense arrays, memory locations are arranged in a predictable manner: calculating the address is a matter of arithmetic. For sparse arrays, it's completely different: there is no way to predict where something is stored without looking up the index arrays. You're likely to get 2 cache misses just to find out where something is stored, and a third to fetch it.
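A tiny sketch of the difference (the helper names are illustrative; the field names colptr, rowval and nzval match Julia's SparseMatrixCSC):

# Dense, column-major: element (i, j) sits at linear offset (j - 1) * m + i.
dense_offset(i, j, m) = (j - 1) * m + i

# CSC: element (i, j) is found by scanning the row indices stored for column j.
function csc_lookup(colptr, rowval, nzval, i, j)
    for k in colptr[j]:(colptr[j+1] - 1)   # range of nonzeros belonging to column j
        rowval[k] == i && return nzval[k]
    end
    return zero(eltype(nzval))
end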

@lindahua
Contributor

I wholeheartedly agree with supporting multiple sparse matrix formats. I personally have long been waiting for CSR in my own applications.

Having said that, the need for multiple layouts for dense arrays is not as obvious.

@andreasnoack
Member

I guess having more options is not always a good thing, even when a more efficient solution exists. Left turns are inefficient in right-hand traffic, but that is a minor inefficiency compared to allowing a mix of left- and right-hand traffic.

The good news is that you have an efficient implementation for the transposed problem. Now you only have to write the version for the non-transposed problem :-)

The efficiency/simplicity tradeoff seems different for sparse matrices, as pointed out by others. The only problem is that not even CSC is fully covered yet, so there is a lot of work to be done.

@lindahua
Contributor

I like the metaphor of the mix of left- and right-hand traffic.

@tknopp
Contributor Author

tknopp commented Feb 24, 2014

@timholy Well, you can emulate CSR with CSC in the same way as you can emulate a row-major array with a column-major one: keep the transposition in mind when working with these matrices.

Maybe I have overestimated my problem and it is too specific. I had thought that the column/row-major issue would be more common.

@timholy
Sponsor Member

timholy commented Feb 24, 2014

That's true; instead of calling them colptr and rowval, the fields could be slowindex and fastindex, and then encode whether it's CSC or CSR as a template parameter. For example, this might be an improved sparse matrix format:

type SparseMatrix{Tv,Ti<:Integer,Av<:DenseArray,Ai<:DenseArray,isCSC} <: AbstractSparseMatrix{Tv,Ti}
    m::Int            # Number of rows
    n::Int            # Number of columns
    slowindex::Ai     # In CSC, column i is found at slowindex[i]:(slowindex[i+1]-1)
    fastindex::Ai     # In CSC, this stores row indexes of nonzeros
    nzval::Av         # Nonzero values
end

isCSC is a boolean template parameter, so we can dispatch on it. I also generalized the storage vectors, primarily so they could be Vectors or SharedArrays.
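As an illustration, assuming the type sketched above, dispatch on that parameter could look roughly like this (written with current where-clause syntax; the _lookup helper is hypothetical):

# CSC and CSR share one lookup routine; only the mapping of (i, j) onto
# (slow, fast) differs between the two layouts.
Base.getindex(A::SparseMatrix{Tv,Ti,Av,Ai,true}, i::Integer, j::Integer) where {Tv,Ti,Av,Ai} =
    _lookup(A, j, i)    # CSC: the slow index is the column, the fast index the row
Base.getindex(A::SparseMatrix{Tv,Ti,Av,Ai,false}, i::Integer, j::Integer) where {Tv,Ti,Av,Ai} =
    _lookup(A, i, j)    # CSR: the slow index is the row, the fast index the column

function _lookup(A::SparseMatrix, slow, fast)
    for k in A.slowindex[slow]:(A.slowindex[slow+1] - 1)
        A.fastindex[k] == fast && return A.nzval[k]
    end
    return zero(eltype(A.nzval))
end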

@jiahao
Member

jiahao commented Feb 24, 2014

I like the idea of the isCSC parameter very much.

@tknopp
Contributor Author

tknopp commented Feb 24, 2014

@timholy: exactly. And quite similarly, one can build a strided array type with special cases for row-major and column-major data.

I have written a C++ template library myself that does exactly that, and it worked out quite well, with an interface very similar to the numpy ndarray.

So there is no fundamental problem with that approach. But it's clear that it increases the complexity of all the array code a lot. Maybe the fact that I seem to be the first to run into the row-major ordering problem is a sign that the current design is the right one.

@ViralBShah
Member

At least for sparse matrices, I personally feel that this increases the codebase complexity a bit too much. I thought about this when I started writing the sparse implementation and decided that it is better to have one complete and well-performing implementation of a sparse matrix data structure before experimenting with new data structures.

Making all the CSC code generic over CSR, and making operations on combinations of CSC and CSR efficient, seems like a lot of work. We certainly want the COO format, but we might as well do that as part of the sparse N-d array implementation.

@ViralBShah
Member

I am closing this one, since we will generally do additional sparse storage formats as packages.

@ViralBShah added the won't change label on Jul 3, 2015
@tkelman
Contributor

tkelman commented Jul 3, 2015

This isn't strictly about sparse, but I think it could be subsumed under potential solutions to #4774.
