Do we want fixed-size arrays? #5857

Closed
timholy opened this Issue Feb 19, 2014 · 65 comments

@timholy
Member

timholy commented Feb 19, 2014

One of the fun things you can do with Cartesian is easily generate unrolled matrix multiplication algorithms. This might be particularly useful for a fixed-size array type (see also ImmutableArrays; the major difference is that ImmutableArrays generates types called Matrix2x3 etc, whereas here the size is encoded as a parameter).

Here's a short proof-of-concept demo (which takes many shortcuts in the interest of brevity):

module FixedArrays

using Base.Cartesian
import Base: A_mul_B!

type FixedArray{T,N,SZ,L} <: DenseArray{T,N}
    data::NTuple{L,T}
end

function fixedvector{T}(::Type{T}, len::Integer, data)
    length(data) == len || error("The vector length $len is inconsistent with $(length(data)) elements")
    FixedArray{T,1,(int(len),),int(len)}(ntuple(length(data), i->convert(T, data[i])))  # needs to be faster than this
end
fixedvector(data) = fixedvector(typeof(data[1]), length(data), data)

function fixedmatrix{T}(::Type{T}, rows::Integer, cols::Integer, data)
    rows*cols == length(data) || error("The number $(length(data)) of elements supplied is not equal to $(rows*cols)")
    FixedArray{T,2,(int(rows),int(cols)),int(rows)*int(cols)}(ntuple(length(data), i->convert(T, data[i])))
end
fixedmatrix(data::AbstractMatrix) = fixedmatrix(eltype(data), size(data,1), size(data,2), data)

# For now, output is an array to bypass the slow FixedArray constructor
for N=1:4, K=1:4, M=1:4
    @eval begin
        function A_mul_B_FA!{TA,TB}(dest::AbstractMatrix, A::FixedArray{TA,2,($M,$K)}, B::FixedArray{TB,2,($K,$N)})
            TP = typeof(one(TA)*one(TB) + one(TA)*one(TB))
            @nexprs $K d2->(@nexprs $M d1->A_d1_d2 = A.data[(d2-1)*$M+d1])
            @nexprs $N d2->(@nexprs $K d1->B_d1_d2 = B.data[(d2-1)*$K+d1])
            @nexprs $N n->(@nexprs $M m->begin
                tmp = zero(TP)
                @nexprs $K k->(tmp += A_m_k * B_k_n)
                dest[m,n] = tmp
            end)
            dest
        end
    end
end

end

and the results in a session:

julia> AA = rand(2,2)
2x2 Array{Float64,2}:
 0.73037   0.525699
 0.581208  0.973336

julia> BB = rand(2,3)
2x3 Array{Float64,2}:
 0.527632  0.0111126   0.701888
 0.840294  0.00627326  0.450177

julia> include("FixedArrays.jl")

julia> A = FixedArrays.fixedmatrix(AA);

julia> B = FixedArrays.fixedmatrix(BB);

julia> C = Array(Float64, 2, 3);

julia> FixedArrays.A_mul_B_FA!(C, A, B)
2x3 Array{Float64,2}:
 0.827108  0.0114142  0.749295
 1.12455   0.0125647  0.846116

julia> AA*BB
2x3 Array{Float64,2}:
 0.827108  0.0114142  0.749295
 1.12455   0.0125647  0.846116

julia> @time (for i = 1:10^5; FixedArrays.A_mul_B_FA!(C, A, B); end)
elapsed time: 0.004008817 seconds (0 bytes allocated)

julia> @time (for i = 1:10^5; FixedArrays.A_mul_B_FA!(C, A, B); end)
elapsed time: 0.004217504 seconds (0 bytes allocated)

julia> @time (for i = 1:10^5; A_mul_B!(C, AA, BB); end)
elapsed time: 0.099574707 seconds (3856 bytes allocated)

julia> @time (for i = 1:10^5; A_mul_B!(C, AA, BB); end)
elapsed time: 0.098794949 seconds (0 bytes allocated)

Because the benchmark loop uses globals, this 20-fold improvement may even underestimate the advantage this would confer.

Do we want this? If so, I recommend it as a GSoC project. It would actually be quite a lot of work, especially to get these working in conjunction with more traditional array types---the number of required methods is not small and indeed somewhat scary.

For the record, here's what the generated code looks like for the case of 2x2 multiplication (look Ma, no loops!):

julia> using Base.Cartesian

julia> M = K = N = 2
2

julia> macroexpand(:(function A_mul_B_FA!{TA,TB}(dest::AbstractMatrix, A::FixedArray{TA,2,($M,$K)}, B::FixedArray{TB,2,($K,$N)})
                   TP = typeof(one(TA)*one(TB) + one(TA)*one(TB))
                   @nexprs $K d2->(@nexprs $M d1->A_d1_d2 = A.data[(d2-1)*$M+d1])
                   @nexprs $N d2->(@nexprs $K d1->B_d1_d2 = B.data[(d2-1)*$K+d1])
                   @nexprs $N n->(@nexprs $M m->begin
                       tmp = zero(TP)
                       @nexprs $K k->(tmp += A_m_k * B_k_n)
                       dest[m,n] = tmp
                   end)
                   dest
               end))
:(function A_mul_B_FA!{TA,TB}(dest::AbstractMatrix,A::FixedArray{TA,2,(2,2)},B::FixedArray{TB,2,(2,2)}) # none, line 2:
        TP = typeof(one(TA) * one(TB) + one(TA) * one(TB)) # line 3:
        begin 
            begin 
                A_1_1 = A.data[1]
                A_2_1 = A.data[2]
            end
            begin 
                A_1_2 = A.data[3]
                A_2_2 = A.data[4]
            end
        end # line 4:
        begin 
            begin 
                B_1_1 = B.data[1]
                B_2_1 = B.data[2]
            end
            begin 
                B_1_2 = B.data[3]
                B_2_2 = B.data[4]
            end
        end # line 5:
        begin 
            begin 
                begin  # none, line 6:
                    tmp = zero(TP) # line 7:
                    begin 
                        tmp += A_1_1 * B_1_1
                        tmp += A_1_2 * B_2_1
                    end # line 8:
                    dest[1,1] = tmp
                end
                begin  # none, line 6:
                    tmp = zero(TP) # line 7:
                    begin 
                        tmp += A_2_1 * B_1_1
                        tmp += A_2_2 * B_2_1
                    end # line 8:
                    dest[2,1] = tmp
                end
            end
            begin 
                begin  # none, line 6:
                    tmp = zero(TP) # line 7:
                    begin 
                        tmp += A_1_1 * B_1_2
                        tmp += A_1_2 * B_2_2
                    end # line 8:
                    dest[1,2] = tmp
                end
                begin  # none, line 6:
                    tmp = zero(TP) # line 7:
                    begin 
                        tmp += A_2_1 * B_1_2
                        tmp += A_2_2 * B_2_2
                    end # line 8:
                    dest[2,2] = tmp
                end
            end
        end # line 10:
        dest
    end)
@timholy
Member

timholy commented Feb 19, 2014

I should also point out that one of the historical impediments to the popularity of ImmutableArrays is the long package-loading time:

julia> tic(); using ImmutableArrays; toc()
elapsed time: 5.436777621 seconds

Now that precompilation has arrived, this is no longer an impediment, so we may not need any kind of FixedArray in base, i.e., we may already have the perfect solution in ImmutableArrays.

CC @twadleigh

@toivoh
Member

toivoh commented Feb 19, 2014

+1 for better functionality for fixed-size arrays. I'm not quite sure what kind of implementation you are proposing? I, for one, would welcome language support for fixed-size, mutable arrays (or at least vectors, from which it should be reasonable to build fixed-size array support). I seem to recall that this was up for discussion at some point, but I can't remember when.

@timholy
Member

timholy commented Feb 19, 2014

A candidate implementation was in that post. (Missing lots of functions, of course.) I agree that mutability is very nice, so rather than using a tuple I suppose one could use an Array.

@cdsousa
Contributor

cdsousa commented Feb 19, 2014

👍 I would say that the performance improvement is worth it, especially for applications with lots of 2D and 3D geometry transformations, where mutability seems desirable too.

@timholy
Member

timholy commented Feb 19, 2014

Actually, the idea of having it simply contain an array has a certain appeal: it would make it much easier to interoperate with existing arrays, and give us easy interop with C:

type FixedArray{T,N,SZ,L} <: DenseArray{T,N}
    data::Array{T,N}
end

sdata(A::FixedArray) = A.data     # borrowing a function from SharedArrays...
convert{T}(::Type{Ptr{T}}, A::FixedArray{T}) = pointer(A.data)

Functions for which we want to be able to mix regular arrays and FixedArrays could simply call sdata on their inputs (until someone decides it's worth it to create specialized implementations).
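A minimal sketch of that sdata pattern, with the wrapper type omitted (`totalsum` is an illustrative helper, not a proposed API):

```julia
# Fallback: plain arrays pass straight through.
sdata(A::Array) = A
# A wrapper type would add a method like: sdata(A::FixedArray) = A.data

# Generic functions then normalize their inputs through sdata:
totalsum(A) = sum(sdata(A))
```

Until specialized methods exist, a single sdata call at the top of each generic function keeps both array flavors working.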

@jiahao
Member

jiahao commented Feb 19, 2014

👍

One of the secret features of the generic linear algebra functions that @andreasnoackjensen and I have been silently injecting into LinAlg is that they already support matrices of arbitrary eltype, including matrices (or ought to with some tweaking). So you can already start doing linear algebra over Matrix{Matrix}. Matrix{T<:ImmutableMatrix} would be ideal to start writing up tiled matrix operations in.

And yes, that sounds like a fantastic GSoC project.

@ihnorton
Member

ihnorton commented Feb 19, 2014

@timholy if FixedArray contains an Array or NTuple like that, the data is not contiguous and FixedArray could not be packed in a struct. So C interop would still be a bit hampered.

@timholy
Member

timholy commented Feb 19, 2014

@ihnorton, true if you're thinking about them as a field of a struct, but I was thinking more about passing them to LAPACK when needed (e.g., for factorizations).

@lindahua
Member

lindahua commented Feb 19, 2014

Fixed-size arrays are definitely useful.

These arrays are usually used in applications with high performance requirements (geometric computing, etc.), so computation over them should be implemented to be as fast as possible.

However, IMO, such computation should be implemented based on SIMD instead of cartesian indexing.

Here is a snippet of C++ code for 2x2 matrix multiplication:

// suppose SSE vector a holds a 2x2 float matrix A, and b holds B
__m128 t1 = _mm_movelh_ps(a, a);    // t1 = [a11, a21, a11, a21]
__m128 t2 = _mm_moveldup_ps(b);     // t2 = [b11, b11, b12, b12]
__m128 t3 = _mm_movehl_ps(a, a);    // t3 = [a12, a22, a12, a22]
__m128 t4 = _mm_movehdup_ps(b);     // t4 = [b21, b21, b22, b22]
t1 = _mm_mul_ps(t1, t2);   // t1 = [a11 * b11, a21 * b11, a11 * b12, a21 * b12]
t3 = _mm_mul_ps(t3, t4);   // t3 = [a12 * b21, a22 * b21, a12 * b22, a22 * b22]
__m128 r = _mm_add_ps(t1, t3);  // the result

Small matrix algebra can usually be computed within a small number of CPU cycles, so wasting even a couple of cycles leads to a considerable performance penalty.

Also note that BLAS & LAPACK are aimed at matrices of sufficiently large size (e.g. 32 x 32 or above); due to their overhead, using BLAS & LAPACK on small fixed-size matrices can be much slower than a naive for-loop.

I have long been considering writing a package for small matrices; it's just that the necessary SIMD facilities have not been available.
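For reference, the same 2x2 dataflow written with scalar tuples in Julia, using column-major layout a = (a11, a21, a12, a22) (`mul2x2` is an illustrative name; whether LLVM turns this into the shuffles above is a separate question):

```julia
# 2x2 multiply on column-major 4-tuples: a = (a11, a21, a12, a22), b likewise.
function mul2x2(a, b)
    (a[1]*b[1] + a[3]*b[2],   # c11
     a[2]*b[1] + a[4]*b[2],   # c21
     a[1]*b[3] + a[3]*b[4],   # c12
     a[2]*b[3] + a[4]*b[4])   # c22
end
```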

@lindahua
Member

lindahua commented Feb 19, 2014

This issue is related to #2489.

Also note that I have a C++ library developed several years ago, which contains highly optimized SIMD codes for small matrix algebra (e.g. hand-crafted kernels for matrix multiplication of (m, k) x (k, n) for each combination of m, n, k = 1, 2, 3, 4).

I'd be happy to translate all these to Julia whenever SIMD lands.

@timholy
Member

timholy commented Feb 19, 2014

Sounds reasonable to me. I presume it's not the case that LLVM can generate such instructions automatically?

@cdsousa
Contributor

cdsousa commented Feb 19, 2014

I think it wouldn't be too hard to dispatch fixed-size matrix operations to different underlying implementations depending on the size, just as the Eigen library (AFAIK) does.

@lindahua
Member

lindahua commented Feb 19, 2014

I don't think LLVM is smart enough to do that.

It would be good to have a matrix type declared as SmallMatrix{M, N}. When M and N are small enough, we use specialized routines. When M and N are bigger, we decompose the computation into computations over smaller parts, each using a specialized kernel.
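A hedged sketch of that decomposition in plain Julia (`blockmul!` and the block size `bs` are illustrative; a real implementation would dispatch each block product to a specialized kernel rather than an inner scalar loop):

```julia
# Multiply in bs-by-bs blocks; the innermost loop is where a
# hand-tuned small-matrix kernel would be plugged in.
function blockmul!(C, A, B; bs = 2)
    fill!(C, 0)
    M, K = size(A)
    N = size(B, 2)
    for j = 1:bs:N, k = 1:bs:K, i = 1:bs:M
        for jj = j:min(j+bs-1, N), ii = i:min(i+bs-1, M)
            s = C[ii, jj]
            for kk = k:min(k+bs-1, K)
                s += A[ii, kk] * B[kk, jj]
            end
            C[ii, jj] = s
        end
    end
    C
end
```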

@lindahua
Member

lindahua commented Feb 19, 2014

I think your FixedArray definition is more general than SmallMatrix{M, N}.

@timholy
Member

timholy commented Feb 19, 2014

@cdsousa, this implementation does dispatch on different sizes, because of the SZ type-parameter.

If we use an array instead of a tuple in the underlying representation, we don't need that L parameter. And presumably N can be computed as length(SZ). So I got a little carried away with the parameters 😄.

@lindahua, your "tiled" idea seems like a good one.

@cdsousa
Contributor

cdsousa commented Feb 19, 2014

@timholy: oh, it's true! (I'd failed to read the macro foo part.) That is great 😃

@timholy
Member

timholy commented Feb 19, 2014

@lindahua, I just checked code_native, and while I suck at reading assembly, I don't see anything that jumps out as SIMD. However, are we enabling the SLP vectorizer? I see that FPM->add(createBBVectorizePass()) is currently commented out. A key point is that the product is written in terms of scalars---all the indexing is extracted at the beginning. Of course, that by itself still may not be enough for it to figure out how to use SIMD.

@JeffBezanson
Member

JeffBezanson commented Feb 19, 2014

What jumps out at me here is #2489 --- our standard matrix multiply performs horribly on small matrices. That code needs a bath.

I think eventually tuples will use native data layouts, at which point they will become immutable fixed-size vectors. Immutability is the way to go here, especially since this feature is aimed at smallish arrays, and we could use an immutable array type anyway.
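For context on the #2489 comparison, the naive triple loop that small-matrix code gets measured against is only a few lines (`naive_mul!` is an illustrative name, not the routine in Base):

```julia
# Textbook triple loop: no BLAS dispatch overhead, no allocation.
function naive_mul!(C, A, B)
    M, K = size(A)
    N = size(B, 2)
    for n = 1:N, m = 1:M
        s = zero(eltype(C))
        for k = 1:K
            s += A[m, k] * B[k, n]
        end
        C[m, n] = s
    end
    C
end
```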

@lindahua
Member

lindahua commented Feb 19, 2014

I agree with @JeffBezanson that for small matrices, it is desirable to have immutable arrays. To me, the (potential) benefit of immutability outweighs the need to allow modifying individual elements of a small array. (In practice, small matrices/vectors are usually used as a whole.)

I think @timholy's definition is a good proposal (except that I would use immutable FixedArray instead of type FixedArray). The tuple field is immutable anyway.

@timholy
Member

timholy commented Feb 19, 2014

Re #2489, in my hands the performance gap was even larger (20x instead of 4x). Not sure if it's because of heap vs stack, or bounds-checking vs not, or 2x3 vs 2x2, or changes in Julia since then, or me messing something up...

I'm fine with either immutability or mutability. I guess my bottom line is this: ever since we allowed a tuple of integers to be a template parameter, we've been in a position to create a general FixedArray type that plays nicely with multiple dispatch. Life will get even better with SIMD and native layout of tuples, of course, but we've now got what we need to take an "intermediate" step, if folks think we should go down this road.

(And to further clarify, here I'm trying to confine myself to "professor" mode and simply sketch a potential project, possibly for a GSoC student; I don't have time or intention to actually pursue this myself in the foreseeable future.)

@timholy
Member

timholy commented Feb 19, 2014

Forgot to respond to this:

have you tried this with @ArchRobison SIMD branch?

Nope. Don't know enough about SIMD to do that in a respectable fashion.

@ArchRobison
Contributor

ArchRobison commented Feb 19, 2014

My simd branch is targeting loops that can be broken into SIMD chunks, so it might not buy anything for fixed-size arrays. I tried it on an 8-iteration loop (with AVX enabled) and the unroller unrolled the loop first. Maybe SLP_Vectorizer can deal with the unrolled loop, or maybe I need to run the loop vectorizer before the unroller.

@lindahua
Member

lindahua commented Feb 19, 2014

@ArchRobison: I am wondering whether the vectorizer can automatically convert the following statement into an _mm256_add_pd instruction (suppose AVX is enabled):

# let x and y be a four-tuple of Float64
z = (x[1] + y[1], x[2] + y[2], x[3] + y[3], x[4] + y[4])

And whether it may convert the following statement into an _mm256_loadu_pd instruction:

# let a be an ordinary array
t = (a[i], a[i+1], a[i+2], a[i+3])

We don't have to explicitly export low-level SIMD machinery if the vectorizer can deal with tuple operations in a smart way.
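Those two candidate patterns as complete, callable functions (`tadd` and `tload` are illustrative names; whether LLVM actually emits the AVX instructions is exactly the open question here):

```julia
# Elementwise add of two 4-tuples: a candidate for a single SIMD add.
tadd(x, y) = (x[1] + y[1], x[2] + y[2], x[3] + y[3], x[4] + y[4])

# Gather four consecutive array elements into a tuple: a candidate
# for a single unaligned SIMD load.
tload(a, i) = (a[i], a[i+1], a[i+2], a[i+3])
```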

@ArchRobison
Contributor

ArchRobison commented Feb 19, 2014

What wasn't clear to me until now is that BBVectorizePass and SLPVectorizer are distinct LLVM passes. Julia has BBVectorizePass commented out, which is not surprising since it is documented as compile-time intensive and turned off in Clang by default. But SLPVectorizer is enabled by default in Clang, so I'll try adding it to the Julia compilation chain and give it a try on your tuple example.

@ArchRobison
Contributor

ArchRobison commented Feb 19, 2014

I gave it a whirl, and the first thing I noticed is that a tuple access generates a bounds check that seems gratuitous. For example, run:

function foo( a, b )
    (a[1]+b[1],a[2]+b[2],a[3]+b[3],a[4]+b[4])
end

x = 1.0f0
y = (x,x,x,x)
t = typeof(y)
println(foo(y,y))
println(code_typed(foo,(t,t)))
code_llvm(foo,(t,t))

Using Julia built from today's master source, the LLVM code looks like it has a bounds check for most of the subscripts, even though I thought lowering to LLVM took pains to avoid them. Am I missing something, or should I file an issue about the gratuitous branches?

@jakebolewski
Member

jakebolewski commented Feb 19, 2014

Could more intensive passes be selectively enabled during the precompilation step?

@simonster
Member

simonster commented Feb 20, 2014

@ArchRobison: Functions with no type declarations don't get specialized on tuple arguments (see #4090). If I use function foo( a::NTuple{4,Float32}, b::NTuple{4,Float32} ), then the generated code looks a lot nicer and doesn't have bounds checks. On the other hand, compiling code for every possible size of array may not always be a great idea.
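Concretely, the annotated variant simonster describes looks like this (`foo_typed` is an illustrative name for the specialized version of foo above):

```julia
# Concrete tuple types in the signature let the compiler specialize the
# method, so the four tuple accesses need no bounds checks.
foo_typed(a::NTuple{4,Float32}, b::NTuple{4,Float32}) =
    (a[1] + b[1], a[2] + b[2], a[3] + b[3], a[4] + b[4])
```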

@ArchRobison
Contributor

ArchRobison commented Feb 20, 2014

Thanks for the explanation and link. I tried SLPVectorizer on the NTuple{4,Float32} variation of foo. Alas, SLPVectorizer missed what looks like an obvious opportunity to generate a SIMD fadd. I'll poke around and try to figure out what thwarted it.

Contributor

ArchRobison commented Feb 21, 2014

It turns out that SLPVectorizer sometimes vectorizes tuple stuff nicely, but sometimes misses. Either lowering of tuple constructors to LLVM needs to be changed or SLPVectorizer needs to be improved.

For example, look at the transcript at https://gist.github.com/ArchRobison/9145036, which is using SLPVectorizer hacked into my SIMD branch, with LLVM 3.4 enabled. I don't know if using 3.3 makes a difference or not. The transcript shows how verbose source generates beautifully short LLVM, but short source generates verbose LLVM. 😦

The problem in SLPVectorizer is a sensitivity to instruction order in a basic block. It insists that uses of the scalar fadd instructions must come after those fadd instructions. For reference, here's the key logic in deps/llvm-3.4/lib/Transforms/Vectorize/SLPVectorizer.cpp:

      if (UserIndex < MyLastIndex) {
        DEBUG(dbgs() << "SLP: Can't schedule extractelement for "
              << *User << ". \n");
        newTreeEntry(VL, false);
        return;
      }

I'm inclined to look into improving SLPVectorizer to remove the sensitivity. The alternative of changing the Julia lowering would undesirably increase live ranges of values.

Member

lindahua commented Feb 21, 2014

I think it would be useful to start adding support for low-level SIMD instructions (probably after v0.3 is released). This support could be encapsulated in an inner module (say Base.SIMD) without being exported, so that advanced users can start exploring the possibility of using this machinery to write high-performance functions.

Member

timholy commented Feb 21, 2014

Pretty amazing detective work, @ArchRobison!

Member

lindahua commented Feb 21, 2014

Great! Looking forward to the SIMD stuff being merged.

Contributor

andrewcooke commented Mar 2, 2014

Just tying things together: a comment very early in this thread mentioned an earlier discussion, which I think was @bsilbaugh's at https://groups.google.com/d/msg/julia-users/gZ5hAVK2PtA/rqnkGzKp6PEJ

Member

timholy commented Mar 2, 2014

I'd forgotten that thread. I'll add FixedArrays to the GSoC page.

Contributor

ArchRobison commented Mar 19, 2014

I have a patch for the LLVM 3.5 SLPVectorizer that fixes the sensitivity I mentioned. My plan is to:

  1. Work on getting the patch into LLVM trunk. This is going to take a while, since I have to learn the LLVM test system, create some unit tests, and go through the LLVM code review process.
  2. Backport the patch, if possible, to LLVM 3.3, and submit a pull request for it, so that it can be used in the current Julia. Are we open to having a deps/llvm-3.3.patch for this purpose? The patch file for LLVM 3.5 is 178 lines.

Member

JeffBezanson commented Mar 19, 2014

I think we'd be more than open to including that patch.

Contributor

mschauer commented Mar 27, 2014

The NTuple type used in FixedArrays above is not a "pointerfree" type. ImmutableArrays has the advantage that an Array(Vector4{Int}, 10) can be reinterpreted as a 4x10 matrix with reinterpret and reshape without memory movement (e.g. when calling a C function on the data).

Member

StefanKarpinski commented Mar 27, 2014

I think the idea is to make that have the same layout, which is in principle possible.

Contributor

tknopp commented Apr 18, 2014

@mschauer is there a reason why NTuple is not "pointer free"? I thought this was the point of @Keno's tuple work.

I am currently coding something involving magnetic fields where a fixed size "Vec3{Float64}" would be really handy. I tried to use NTuple{3,Float64} but the inability to reinterpret it is a major issue.
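
A sketch of the use case, using a plain-field immutable (the `Vec3` name is illustrative); with a tight NTuple layout the same round trip would work for a tuple-backed type:

```julia
# Illustrative Vec3; isbits, so an Array of them is stored contiguously.
immutable Vec3{T}
    x::T
    y::T
    z::T
end

a = [Vec3(1.0, 2.0, 3.0), Vec3(4.0, 5.0, 6.0)]
# Reinterpret the packed storage as raw Float64s and reshape to 3x2,
# without copying the data:
m = reshape(reinterpret(Float64, a), (3, 2))
```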

Contributor

twadleigh commented Apr 18, 2014

When NTuple can be made to behave like a bits type in terms of its memory footprint and also provide runtime performance comparable to the analogous immutable types, I will probably deprecate the ImmutableArrays library, or at least stop using it for my own projects.

At some point I really ought to invest some time in building a relatively comprehensive performance benchmark library for ImmutableArrays, along with some alternative implementations (like one based on NTuple), so that we can assess when the transition occurs.

Member

Keno commented Apr 18, 2014

There is a technical reason that hasn't been resolved yet; there's still a bit of work left to do. I've been wanting this for a long time, but I haven't had the time. It's on my short list of things to do, though. For now I've been using ImmutableArrays as well; even though it's a great library (thanks @twadleigh), it pains me that tuples can't just do this.

Contributor

tknopp commented Apr 18, 2014

Ok, thanks, I will use ImmutableArrays as well for the moment. Should NTuple in the future also allow arithmetic like ImmutableArrays? One missing piece when using NTuple for small vectors is the question of what type to use for matrices.

Member

Keno commented Apr 18, 2014

My thoughts on the matter are to use NTuple for pure storage and have an

immutable NVector{N,T}
    data::NTuple{N,T}
end

on which to define the arithmetic operations, so there might still be use for something like the immutable arrays package (or we might just put it in base). The reason for this distinction is that I'm not sure people would expect tuples to behave like vectors, but I haven't yet seen a case where it's an actual problem.
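
A rough sketch of how arithmetic might be layered on such a wrapper (the method definitions here are illustrative, not a settled design):

```julia
import Base: +, getindex, length

immutable NVector{N,T}
    data::NTuple{N,T}
end

# Elementwise arithmetic on the wrapper, leaving bare tuples untouched:
+{N,T}(a::NVector{N,T}, b::NVector{N,T}) = NVector{N,T}(map(+, a.data, b.data))
getindex(v::NVector, i::Integer) = v.data[i]
length{N,T}(::NVector{N,T}) = N
```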

Member

timholy commented Apr 18, 2014

Note that up at the top of this issue is a candidate implementation of these ideas, including an algorithm for matrix-multiply that exploits a tuple representation. I haven't benchmarked it, but I very much doubt it's currently as fast as ImmutableArrays. (But it's a lot faster than standard Arrays, for small matrices.)

Member

Keno commented Apr 18, 2014

Ah yes, I'd forgotten about that. That's certainly along the lines of what I had in mind.

Contributor

tknopp commented Apr 18, 2014

I am not entirely sure what I would prefer on this point. There have been situations where I wanted to do arithmetic on tuples (e.g. on the size tuple of an array).

Member

StefanKarpinski commented Apr 18, 2014

I'd lean towards fewer types – i.e. just making the usual vector arithmetic work on NTuples. An interesting corner case is what to do about say (1, 2.5, 2//3) + (2, 1.5, 3//4). I think it would be pretty cool if that just produced (3, 4.0, 17//12), i.e. mixed-type vectorized operations.
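
A sketch of the idea via map over tuples (the name `tadd` is illustrative, not a proposal for the actual `+` definition):

```julia
# Elementwise, type-heterogeneous tuple addition; each slot promotes
# independently, as in the example above:
tadd(a::Tuple, b::Tuple) = map(+, a, b)

tadd((1, 2.5, 2//3), (2, 1.5, 3//4))  # (3, 4.0, 17//12)
```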

Member

simonster commented Apr 18, 2014

It seems like the specialization rules for tuples also need to change if we want to make NTuples behave like fixed-size vectors.

Contributor

ArchRobison commented Apr 18, 2014

By coincidence, I was trying to create a vectorized 4x4 matrix multiplication using PR #6271, and almost got it to vectorize. (I think with some more work on LLVM I can make it vectorize.) Here's what I was trying:

function (+){T}( a::(T,T,T,T), b::(T,T,T,T) )
    (a[1]+b[1],a[2]+b[2],a[3]+b[3],a[4]+b[4])
end

function (*){T}(a::(T,T,T,T), b::T)
    (a[1]*b,a[2]*b,a[3]*b,a[4]*b)
end

function MatMul4x4{T}( c::Array{T,2}, a::Array{T,2}, b::Array{T,2} )
    for j=1:4
        for k=1:4
            @inbounds begin
                (c[1,j],c[2,j],c[3,j],c[4,j]) = (c[1,j],c[2,j],c[3,j],c[4,j]) + (a[1,k],a[2,k],a[3,k],a[4,k])*b[k,j];
            end
        end
    end
end

t = Array{Float32,2}
code_llvm(MatMul4x4,(t,t,t))

A small matrix class (e.g. MNMatrix{M,N,T}) might make this sort of thing more practical. Note the destructive update of c; an immutable matrix wouldn't work.

Member

Keno commented Apr 18, 2014

I'm fine with defining + and - on tuples, but what does * do? Is it matrix multiplication or elementwise multiplication? I think people might expect elementwise multiplication on tuples.

Contributor

tknopp commented Apr 18, 2014

I also think that fewer types are better, especially as one otherwise has to decide when to use NTuple and when to use the fixed-size vector. My only concern is what to do with matrices. Is the syntax

(1,0,0; 0,1,0; 0,0,1)

still open?

Contributor

tknopp commented Apr 18, 2014

I would expect .* to do element-wise multiplication

Member

timholy commented Apr 18, 2014

If you want to represent a matrix using a tuple, you're going to have to wrap it (even if just with template parameters as above) to encode the m,n sizes.
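
A minimal sketch of such a wrapper, along the lines of the FixedArray demo at the top of this issue (the extra L parameter carries M*N explicitly, since a type parameter cannot be computed from the others):

```julia
immutable MNMatrix{M,N,T,L}
    data::NTuple{L,T}
end

Base.size{M,N,T,L}(::MNMatrix{M,N,T,L}) = (M, N)
# Column-major indexing into the flat tuple:
Base.getindex{M,N,T,L}(A::MNMatrix{M,N,T,L}, i::Integer, j::Integer) = A.data[(j-1)*M + i]
```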

Member

mbauman commented Apr 18, 2014

Hah, also by coincidence, just last night I was playing with implementing element-wise comparisons for tuples. I, too, was frustrated with the [size(A)...] song and dance.
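
For illustration, the song and dance versus a tuple-native spelling (map over tuples already works, so this is one possible shape for such comparisons):

```julia
A = rand(3, 4); B = rand(3, 5)

[size(A)...] .== [size(B)...]  # splat into temporary vectors, then compare
map(==, size(A), size(B))      # compare the tuples directly: (true, false)
```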

Contributor

tknopp commented Apr 18, 2014

It's kind of funny that C++ has fixed-size (stack-allocated) multi-dimensional arrays but no dynamic equivalent, while Julia has it the other way around.

Contributor

tknopp commented May 8, 2014

To answer the original question of this issue: yes, I think we should have fixed-size arrays in Base. Some thoughts:

  1. I think one hard requirement is that the memory layout is tight, so that ordinary arrays can have FixedArray members and one can reinterpret back and forth.
  2. One might correct me, but this last point seems to imply that the FixedArray has to be immutable.
  3. One honest question is whether we need ND FixedArrays or if vectors and matrices suffice.
  4. If vectors and matrices suffice, one could think about making NTuple the vector type and just introducing a matrix type. In this case it would be wonderful if the parser could be made to generate a matrix from (1,0,0; 0,1,0; 0,0,1).

It seems that one central issue is making the NTuple memory layout tight. Is there any other way to implement a FixedArray, or does this have to wait until that is done? @loladiro
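
A sketch of what requirement 1 would mean in practice (assumed behavior; this is exactly what does not work with NTuple fields today):

```julia
# A tuple-backed fixed vector; requirement 1 says its storage should be
# three inline Float64s, with no pointer indirection.
immutable FixedVec3
    data::NTuple{3,Float64}
end

a = [FixedVec3((1.0, 2.0, 3.0)), FixedVec3((4.0, 5.0, 6.0))]
# With a tight layout this would be a no-copy 3x2 view of the same memory:
m = reshape(reinterpret(Float64, a), (3, 2))
```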

Contributor

mschauer commented May 8, 2014

Yes, it is good to keep in mind that in some sense an m×n×l array is already an efficient data structure for l m×n arrays of the same size, especially with #5556 giving fast contiguous array views. Only the notational overhead for the nth subarray A[:,:,n] is much higher than C's a[n], with ellipsis slicing #5405 coming to mind.

Contributor

cdsousa commented Jul 8, 2014

I've stumbled upon this issue once again, and @tknopp's comment on C++ made me think: what if FixedArray were a new intrinsic/built-in, just like Array or tuples (and mutable)? Once again I thought about the awesome meta-programming-to-the-limits of the C++ Eigen library. The static/dynamicness of Eigen matrix sizes (and arrays) is defined by a template parameter: an integer indicating the size (or several for higher dimensions), which can take a special value indicating that the size is dynamic. I think that's something easily implementable in Julia, although probably not necessary.

Contributor

tknopp commented Jul 8, 2014

@cdsousa Actually the ImmutableArrays package uses an approach similar to Eigen's, making the dimension a type parameter. If @Keno gets the tight memory layout for tuples implemented, they would be kind of a built-in type. I don't think any C implementation can help here; this needs to be implemented at the LLVM level so that the right code is emitted.

Contributor

cdsousa commented Jul 9, 2014

I don't see the ImmutableArrays package doing it (here?). Maybe you meant the FixedArray proposed above? Anyway, I was talking about the size type parameter, which can take a special value to mean that the size is dynamic (actually a special integer value, since C++ template parameters must have constant type). In Julia that would be even more easily implementable, since type parameters don't have to be of constant type and the dynamic value could be a singleton type. Something like

type Dynamic end
type EigenLikeArray{T, S} #= ... =# end
typealias EigenLikeVector{T} EigenLikeArray{T, (Dynamic,)}
typealias EigenLikeMatrix{T} EigenLikeArray{T, (Dynamic, Dynamic)}
typealias EigenLikeMatrix3{T} EigenLikeArray{T, (3, 3)}
typealias EigenLikeMatrixXd EigenLikeArray{Float64, (Dynamic, Dynamic)}
typealias EigenLikeMatrix3d EigenLikeArray{Float64, (3, 3)}

However, that is probably not necessary in Julia.

What I think would be nice to have in Julia is built-in fixed-size, mutable arrays, for small but also for large sizes (just like the common Array). It seems to me that the drawback of using tuples is that they are immutable and, since their elements can have different types, the internal code is (though I'm not sure of this) more complex.
With built-in static (tight layout, stack-allocated) arrays, more efficient code could be emitted (?). Then

function f(i, x)
    A = FixedVector(1, 2, 3, 4)
    A[1] = x # no bounds check
    A[i] = x # bounds check
    A[5] = x # run-time or jit-compile-time error
end

Just ignore me if what I'm saying is just plain wrong 😄

Member

timholy commented Jul 9, 2014

Isn't this exactly what the current Array type does? EDIT: by "this" I meant your Dynamic keyword.

Contributor

tknopp commented Jul 9, 2014

Ah yes, I confused it a bit. ImmutableArrays just generates the appropriate immutable types with a macro.

But I am actually with you. If it is easier to implement a fixed-size array at the C level (exploiting that all entry types are the same), it would be great to have it. At the Julia level, using an immutable is the only way to achieve a tight memory layout. And I am not so sure how important the mutability of fixed-size arrays is in practice.

Contributor

tknopp commented Jul 9, 2014

@timholy: Yes, the dynamic Eigen arrays are like Julia arrays, though Eigen is restricted to 1D and 2D. I am not sure why they used a template parameter to switch between fixed-size and dynamic arrays; maybe in this way some methods can be shared between both. In Julia this is not really necessary. In C++ templates one cannot declare subtype relations, so one either has to derive from the same base class or make the entire array a template parameter. The latter does not always work, however.

Member

timholy commented Jul 11, 2014

There seems to be a fair amount of interest in these, so let's move the discussion over to #7568.

@timholy timholy closed this Jul 11, 2014

@JaredCrean2 JaredCrean2 referenced this issue in OptimalDesignLab/PDESolver.jl Nov 22, 2015

Closed

Investigate Fixed Size Arrays #32

Contributor

eschnett commented May 6, 2016

There is also the apparently-confusingly-named package <https://github.com/eschnett/FlexibleArrays.jl>, which provides arrays of fixed size (among other features):

using FlexibleArrays

# A (10x10) fixed-size array
typealias Arr2d_10x10 FlexArray(1:10, 1:10)

a2 = Arr2d_10x10{Float64}(:,:)

-erik

On Wed, May 4, 2016 at 4:32 AM, Lyndon White wrote:

So the reason this is not closed by @SimonDanisch et al's https://github.com/SimonDanisch/FixedSizeArrays.jl is that merging that package into Base is still being considered? But that package does not currently inherit from AbstractArray (let alone DenseArray), so there are all kinds of issues preventing merging it? Or is it that alternate ideas are under consideration?
