Do we want fixed-size arrays? #5857

Closed
timholy opened this Issue · 64 comments
@timholy
Owner

One of the fun things you can do with Cartesian is easily generate unrolled matrix multiplication algorithms. This might be particularly useful for a fixed-size array type (see also ImmutableArrays; the major difference is that ImmutableArrays generates types called Matrix2x3 etc, whereas here the size is encoded as a parameter).

Here's a short proof-of-concept demo (which takes many shortcuts in the interest of brevity):

module FixedArrays

using Base.Cartesian
import Base: A_mul_B!

type FixedArray{T,N,SZ,L} <: DenseArray{T,N}
    data::NTuple{L,T}
end

function fixedvector{T}(::Type{T}, len::Integer, data)
    length(data) == len || error("The vector length $len is inconsistent with $(length(data)) elements")
    FixedArray{T,1,(int(len),),int(len)}(ntuple(int(len), i->convert(T, data[i])))  # needs to be faster than this
end
fixedvector(data) = fixedvector(typeof(data[1]), length(data), data)

function fixedmatrix{T}(::Type{T}, rows::Integer, cols::Integer, data)
    rows*cols == length(data) || error("The number $(length(data)) of elements supplied is not equal to $(rows*cols)")
    FixedArray{T,2,(int(rows),int(cols)),int(rows)*int(cols)}(ntuple(length(data), i->convert(T, data[i])))
end
fixedmatrix(data::AbstractMatrix) = fixedmatrix(eltype(data), size(data,1), size(data,2), data)

# For now, output is an array to bypass the slow FixedArray constructor
for N=1:4, K=1:4, M=1:4
    @eval begin
        function A_mul_B_FA!{TA,TB}(dest::AbstractMatrix, A::FixedArray{TA,2,($M,$K)}, B::FixedArray{TB,2,($K,$N)})
            TP = typeof(one(TA)*one(TB) + one(TA)*one(TB))
            @nexprs $K d2->(@nexprs $M d1->A_d1_d2 = A.data[(d2-1)*$M+d1])
            @nexprs $N d2->(@nexprs $K d1->B_d1_d2 = B.data[(d2-1)*$K+d1])
            @nexprs $N n->(@nexprs $M m->begin
                tmp = zero(TP)
                @nexprs $K k->(tmp += A_m_k * B_k_n)
                dest[m,n] = tmp
            end)
            dest
        end
    end
end

end

and the results in a session:

julia> AA = rand(2,2)
2x2 Array{Float64,2}:
 0.73037   0.525699
 0.581208  0.973336

julia> BB = rand(2,3)
2x3 Array{Float64,2}:
 0.527632  0.0111126   0.701888
 0.840294  0.00627326  0.450177

julia> include("FixedArrays.jl")

julia> A = FixedArrays.fixedmatrix(AA);

julia> B = FixedArrays.fixedmatrix(BB);

julia> C = Array(Float64, 2, 3);

julia> FixedArrays.A_mul_B_FA!(C, A, B)
2x3 Array{Float64,2}:
 0.827108  0.0114142  0.749295
 1.12455   0.0125647  0.846116

julia> AA*BB
2x3 Array{Float64,2}:
 0.827108  0.0114142  0.749295
 1.12455   0.0125647  0.846116

julia> @time (for i = 1:10^5; FixedArrays.A_mul_B_FA!(C, A, B); end)
elapsed time: 0.004008817 seconds (0 bytes allocated)

julia> @time (for i = 1:10^5; FixedArrays.A_mul_B_FA!(C, A, B); end)
elapsed time: 0.004217504 seconds (0 bytes allocated)

julia> @time (for i = 1:10^5; A_mul_B!(C, AA, BB); end)
elapsed time: 0.099574707 seconds (3856 bytes allocated)

julia> @time (for i = 1:10^5; A_mul_B!(C, AA, BB); end)
elapsed time: 0.098794949 seconds (0 bytes allocated)

Because the benchmark loop uses globals, this 20-fold improvement may even underestimate the advantage this approach would confer.

Do we want this? If so, I recommend it as a GSoC project. It would actually be quite a lot of work, especially to get these working in conjunction with more traditional array types---the number of required methods is not small and indeed somewhat scary.

For the record, here's what the generated code looks like for the case of 2x2 multiplication (look Ma, no loops!):

julia> using Base.Cartesian

julia> M = K = N = 2
2

julia> macroexpand(:(function A_mul_B_FA!{TA,TB}(dest::AbstractMatrix, A::FixedArray{TA,2,($M,$K)}, B::FixedArray{TB,2,($K,$N)})
                   TP = typeof(one(TA)*one(TB) + one(TA)*one(TB))
                   @nexprs $K d2->(@nexprs $M d1->A_d1_d2 = A.data[(d2-1)*$M+d1])
                   @nexprs $N d2->(@nexprs $K d1->B_d1_d2 = B.data[(d2-1)*$K+d1])
                   @nexprs $N n->(@nexprs $M m->begin
                       tmp = zero(TP)
                       @nexprs $K k->(tmp += A_m_k * B_k_n)
                       dest[m,n] = tmp
                   end)
                   dest
               end))
:(function A_mul_B_FA!{TA,TB}(dest::AbstractMatrix,A::FixedArray{TA,2,(2,2)},B::FixedArray{TB,2,(2,2)}) # none, line 2:
        TP = typeof(one(TA) * one(TB) + one(TA) * one(TB)) # line 3:
        begin 
            begin 
                A_1_1 = A.data[1]
                A_2_1 = A.data[2]
            end
            begin 
                A_1_2 = A.data[3]
                A_2_2 = A.data[4]
            end
        end # line 4:
        begin 
            begin 
                B_1_1 = B.data[1]
                B_2_1 = B.data[2]
            end
            begin 
                B_1_2 = B.data[3]
                B_2_2 = B.data[4]
            end
        end # line 5:
        begin 
            begin 
                begin  # none, line 6:
                    tmp = zero(TP) # line 7:
                    begin 
                        tmp += A_1_1 * B_1_1
                        tmp += A_1_2 * B_2_1
                    end # line 8:
                    dest[1,1] = tmp
                end
                begin  # none, line 6:
                    tmp = zero(TP) # line 7:
                    begin 
                        tmp += A_2_1 * B_1_1
                        tmp += A_2_2 * B_2_1
                    end # line 8:
                    dest[2,1] = tmp
                end
            end
            begin 
                begin  # none, line 6:
                    tmp = zero(TP) # line 7:
                    begin 
                        tmp += A_1_1 * B_1_2
                        tmp += A_1_2 * B_2_2
                    end # line 8:
                    dest[1,2] = tmp
                end
                begin  # none, line 6:
                    tmp = zero(TP) # line 7:
                    begin 
                        tmp += A_2_1 * B_1_2
                        tmp += A_2_2 * B_2_2
                    end # line 8:
                    dest[2,2] = tmp
                end
            end
        end # line 10:
        dest
    end)
@timholy
Owner

I should also point out that one of the historical impediments to the popularity of ImmutableArrays is the long package-loading time:

julia> tic(); using ImmutableArrays; toc()
elapsed time: 5.436777621 seconds

Now that precompilation has arrived, this is no longer an impediment, so we may not need any kind of FixedArray in base, i.e., we may already have the perfect solution in ImmutableArrays.

CC @twadleigh

@toivoh
Collaborator

+1 for better functionality for fixed-size arrays. I'm not quite sure what kind of implementation you are proposing. I, for one, would welcome language support for fixed-size, mutable arrays (or at least vectors, from which it should be reasonable to build fixed-size array support). I seem to recall that this was up for discussion at some point, but I can't remember when.

@timholy
Owner

A candidate implementation was in that post. (Missing lots of functions, of course.) I agree that mutability is very nice, so rather than using a tuple I suppose one could use an Array.

@cdsousa

:+1: I would say that the performance improvement is worth it, especially for applications with lots of 2D and 3D geometry transformations, where mutability seems desirable too.

@timholy
Owner

Actually, the idea of having it simply contain an array has a certain appeal: it would make it much easier to interop with existing arrays, and give us easy interop with C:

type FixedArray{T,N,SZ,L} <: DenseArray{T,N}
    data::Array{T,N}
end

sdata(A::FixedArray) = A.data     # borrowing a function from SharedArrays...
convert{T}(Ptr{T}, A::FixedArray{T}) = pointer(A.data)

Functions for which we want to be able to mix regular arrays and FixedArrays could simply call sdata on their inputs (until someone decides it's worth it to create specialized implementations).
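For instance, a mixed-input method might look like this (a minimal sketch in the 0.3-era syntax of this thread; `addto!` is a hypothetical function, not part of the proposal):

```julia
type FixedArray{T,N,SZ,L} <: DenseArray{T,N}
    data::Array{T,N}
end

sdata(A::AbstractArray) = A        # plain arrays pass through unchanged
sdata(A::FixedArray)    = A.data   # FixedArrays expose their backing Array

# One generic method body now accepts any mix of the two types.
function addto!(dest, A, B)
    d, a, b = sdata(dest), sdata(A), sdata(B)
    for i = 1:length(d)
        d[i] = a[i] + b[i]
    end
    dest
end
```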

@jiahao
Collaborator

:+1:

One of the secret features of the generic linear algebra functions that @andreasnoackjensen and I have been silently injecting into LinAlg is that they already support matrices of arbitrary eltype, including eltypes that are themselves matrices (or ought to, with some tweaking). So you can already start doing linear algebra over Matrix{Matrix}. Matrix{T<:ImmutableMatrix} would be an ideal setting for writing up tiled matrix operations.

And yes, that sounds like a fantastic GSoC project.

@ihnorton
Collaborator

@timholy if FixedArray contains an Array or NTuple like that, the data is not contiguous and FixedArray could not be packed in a struct. So C-interop would still be a bit hampered.

@timholy
Owner

@ihnorton, true if you're thinking about them as a field of a struct, but I was thinking more about passing them to LAPACK when needed (e.g., for factorizations).

@lindahua
Owner

Fixed-size arrays are definitely useful.

Since these arrays are usually used in applications with high performance requirements (geometric computing, etc.), computation over them should be implemented to be as fast as possible.

However, IMO, such computation should be implemented with SIMD instead of Cartesian indexing:

Here is a snippet of C++ code for 2x2 matrix multiplication:

// suppose SSE vector a holds a 2x2 float matrix A, and b holds B
__m128 t1 = _mm_movelh_ps(a, a);    // t1 = [a11, a21, a11, a21]
__m128 t2 = _mm_moveldup_ps(b);     // t2 = [b11, b11, b12, b12]
__m128 t3 = _mm_movehl_ps(a, a);    // t3 = [a12, a22, a12, a22]
__m128 t4 = _mm_movehdup_ps(b);     // t4 = [b21, b21, b22, b22]
t1 = _mm_mul_ps(t1, t2);   // t1 = [a11 * b11, a21 * b11, a11 * b12, a21 * b12]
t3 = _mm_mul_ps(t3, t4);   // t3 = [a12 * b21, a22 * b21, a12 * b22, a22 * b22]
__m128 r = _mm_add_ps(t1, t3);      // the result

Small matrix algebra can usually be computed within a small number of CPU cycles, so wasting even a couple of cycles leads to a considerable performance penalty.

Also note that BLAS & LAPACK are aimed at matrices of sufficiently large size (e.g. 32 x 32 or above); because of their overhead, using BLAS & LAPACK on small fixed-size matrices can be much slower than a naive for-loop.

I have long been considering writing a package for small matrices; it's just that the necessary SIMD facilities have not yet been available.

@lindahua
Owner

This issue is related to #2489.

Also note that I have a C++ library developed several years ago, which contains highly optimized SIMD code for small matrix algebra (e.g. hand-crafted kernels for matrix multiplication of (m, k) x (k, n) for each combination of m, n, k = 1, 2, 3, 4).

I'd be happy to translate all these to Julia whenever SIMD lands.

@timholy
Owner

Sounds reasonable to me. I presume it's not the case that LLVM can generate such instructions automatically?

@cdsousa

I think it wouldn't be too hard to dispatch fixed-size matrix operations to different underlying implementations depending on the size, just as the Eigen lib (AFAIK) does.

@lindahua
Owner

I don't think LLVM is smart enough to do that.

It would be good to have a matrix type declared as SmallMatrix{M, N}. When M and N are small enough, we use specialized routines; when M and N are bigger, we decompose the computation into computations over smaller parts, each using a specialized kernel.
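Dispatch on the size parameters could select between the two regimes. A hypothetical sketch in the thread's 0.3-era syntax (SmallMatrix, mul_small!, and mul_tiled! are illustrative names only, not an existing API):

```julia
# Hypothetical: size-parameterized dispatch between a hand-tuned kernel
# and a tiled fallback. M, N, K are type parameters, so the branch below
# is resolved at specialization time for each concrete size.
function mul!{T,M,N,K}(C::SmallMatrix{T,M,N}, A::SmallMatrix{T,M,K}, B::SmallMatrix{T,K,N})
    if M <= 4 && N <= 4 && K <= 4
        mul_small!(C, A, B)   # specialized (possibly SIMD) kernel
    else
        mul_tiled!(C, A, B)   # decompose into small tiles that reuse the kernels
    end
end
```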

@lindahua
Owner

I think your FixedArray definition is more general than SmallMatrix{M, N}.

@timholy
Owner

@cdsousa, this implementation does dispatch on different sizes, because of the SZ type-parameter.

If we use an array instead of a tuple in the underlying representation, we don't need that L parameter. And presumably N can be computed as length(SZ). So I got a little carried away with the parameters :smile:.

@lindahua, your "tiled" idea seems like a good one.

@cdsousa

@timholy: oh, it's true! (I'd failed to read the macro foo part) That is great :smiley:

@timholy
Owner

@lindahua, I just checked code_native, and while I suck at reading assembly, I don't see anything that jumps out as SIMD. However, are we enabling the SLP vectorizer? I see that FPM->add(createBBVectorizePass()) is currently commented-out. A key point is that the product is written in terms of scalars---all the indexing is extracted at the beginning. Of course, that by itself still may not be enough for it to figure out how to use SIMD.

@JeffBezanson

What jumps out at me here is #2489 --- our standard matrix multiply performs horribly on small matrices. That code needs a bath.

I think eventually tuples will use native data layouts, at which point they will become immutable fixed-size vectors. Immutability is the way to go here, especially since this feature is aimed at smallish arrays, and we could use an immutable array type anyway.

@lindahua
Owner

I agree with @JeffBezanson that for small matrices, it is desirable to have immutable arrays. To me, the (potential) benefit of immutability outweighs the need to allow modifying individual elements of a small array. (In practice, small matrices/vectors are usually used as a whole).

I think @timholy's definition is a good proposal (except that I would use immutable FixedArray instead of type FixedArray). The tuple field is immutable anyway.
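Concretely, the only change to the definition at the top of the thread would be the declaration keyword (0.3-era syntax):

```julia
# immutable wrapper: the NTuple field was already immutable; now the
# container is too, which permits a tight, pointer-free layout.
immutable FixedArray{T,N,SZ,L} <: DenseArray{T,N}
    data::NTuple{L,T}
end
```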

@timholy
Owner

Re #2489, in my hands the performance gap was even larger (20x instead of 4x). Not sure if it's because of heap vs stack, or bounds-checking vs not, or 2x3 vs 2x2, or changes in Julia since then, or me messing something up...

I'm fine with either immutability or mutability. I guess my bottom line is this: ever since we allowed a tuple of integers to be a template parameter, we've been in a position to create a general FixedArray type that plays nicely with multiple dispatch. Life will get even better with SIMD and native layout of tuples, of course, but we've now got what we need to take an "intermediate" step, if folks think we should go down this road.

(And to further clarify, here I'm trying to confine myself to "professor" mode and simply sketch a potential project, possibly for a GSoC student; I don't have time or intention to actually pursue this myself in the foreseeable future.)

@timholy
Owner

Forgot to respond to this:

have you tried this with @ArchRobison SIMD branch?

Nope. Don't know enough about SIMD to do that in a respectable fashion.

@ArchRobison
Collaborator

My simd branch is targeting loops that can be broken into SIMD chunks, so it might not buy anything for fixed-size arrays. I tried it on an 8-iteration loop (with AVX enabled) and the unroller unrolled the loop first. Maybe SLPVectorizer can deal with the unrolled loop, or maybe I need to run the loop vectorizer before the unroller.

@lindahua
Owner

@ArchRobison: I am wondering whether the vectorizer can automatically convert the following statement into an _mm256_add_pd instruction (suppose AVX is enabled):

# let x and y be a 4-tuple of Float64
z = (x[1] + y[1], x[2] + y[2], x[3] + y[3], x[4] + y[4])

And whether it may convert the following statement into an _mm256_loadu_pd instruction:

# let a be an ordinary array
t = (a[i], a[i+1], a[i+2], a[i+3])

We don't have to explicitly export low-level SIMD machinery if the vectorizer can deal with tuple operations in a smart way.

@ArchRobison
Collaborator

What wasn't clear to me until now is that BBVectorizePass and SLPVectorizer are distinct LLVM passes. Julia has BBVectorizePass commented out, which is not surprising since it is documented as compile-time intensive and is turned off in Clang by default. But SLPVectorizer is enabled by default in Clang, so I'll try adding it to the Julia compilation chain and give it a try on your tuple example.

@ArchRobison
Collaborator

I gave it a whirl and the first thing I noticed is that a tuple access generates a bounds check that seems gratuitous. For example, run:

function foo( a, b )
    (a[1]+b[1],a[2]+b[2],a[3]+b[3],a[4]+b[4])
end

x = 1.0f0
y = (x,x,x,x)
t = typeof(y)
println(foo(y,y))
println(code_typed(foo,(t,t)))
code_llvm(foo,(t,t))

Using Julia built from today's master source, the LLVM code looks like it has a bounds check for most of the subscripts even though I thought lowering to LLVM took pains to avoid them. Am I missing something, or should I file an issue about the gratuitous branches?

@jakebolewski
Collaborator

Could more intensive passes be selectively enabled during the precompilation step?

@simonster
Collaborator

@ArchRobison: Functions with no type declarations don't get specialized on tuple arguments (see #4090). If I use function foo( a::NTuple{4,Float32}, b::NTuple{4,Float32} ), then the generated code looks a lot nicer and doesn't have bounds checks. On the other hand, compiling code for every possible size of array may not always be a great idea.

@ArchRobison
Collaborator

Thanks for the explanation and link. I tried SLPVectorizer on the NTuple{4,Float32} variation of foo. Alas SLPVectorizer missed what looks like an obvious opportunity to generate a SIMD fadd. I'll poke around and try to figure what thwarted it.

@ArchRobison
Collaborator

It turns out that SLPVectorizer sometimes vectorizes tuple stuff nicely, but sometimes misses. Either lowering of tuple constructors to LLVM needs to be changed or SLPVectorizer needs to be improved.

For example, look at the transcript at https://gist.github.com/ArchRobison/9145036, which is using SLPVectorizer hacked into my SIMD branch, with LLVM 3.4 enabled. I don't know if using 3.3 makes a difference or not. The transcript shows how verbose source generates beautifully short LLVM, but short source generates verbose LLVM. :frowning:

The problem in SLPVectorizer is a sensitivity to instruction order in a basic block: it insists that uses of the scalar fadd instructions must come after those fadd instructions. For reference, here's the key logic in deps/llvm-3.4/lib/Transforms/Vectorize/SLPVectorizer.cpp:

      if (UserIndex < MyLastIndex) {

        DEBUG(dbgs() << "SLP: Can't schedule extractelement for "
              << *User << ". \n");
        newTreeEntry(VL, false);
        return;
      }

I'm inclined to look into improving SLPVectorizer to remove the sensitivity. The alternative of changing the Julia lowering would undesirably increase live ranges of values.

@lindahua
Owner

I think it would be useful to start adding support for low-level SIMD instructions (probably after v0.3 is released). This support could be encapsulated within an inner module (say Base.SIMD) without being exported, so that advanced users can start exploring the possibility of using this machinery to write high-performance functions.

@timholy
Owner

Pretty amazing detective work, @ArchRobison!

@lindahua
Owner

Great! Looking forward to the SIMD stuff being merged.

@andrewcooke

Just tying things together: a comment very early in this thread mentioned earlier discussion, which I think would have been @bsilbaugh at https://groups.google.com/d/msg/julia-users/gZ5hAVK2PtA/rqnkGzKp6PEJ

@timholy
Owner

I'd forgotten that thread. I'll add FixedArrays to the GSoC page.

@ArchRobison
Collaborator

I have a patch for the LLVM 3.5 SLPVectorizer that fixes the sensitivity I mentioned. My plan is to:
1. Work on getting the patch into LLVM trunk. This is going to take a while, since I have to learn the LLVM test system, create some unit tests, and go through the LLVM code review process.
2. Backport the patch, if possible, to LLVM 3.3, and submit a pull request for it, so that it can be used in the current Julia.
Are we open to having a deps/llvm-3.3.patch for this purpose? The patch file for LLVM 3.5 is 178 lines.

@JeffBezanson

I think we'd be more than open to including that patch.

@mschauer

The NTuple type used in FixedArrays above is not a "pointer-free" type. ImmutableArrays has the advantage that an Array(Vector4{Int}, 10) can be reinterpreted as a 4x10 matrix with reinterpret and reshape without memory movement (e.g. when calling a C function on the data).
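Spelled out, the reinterpretation trick looks like this (an untested sketch assuming the ImmutableArrays API of the time):

```julia
using ImmutableArrays

v = Array(Vector4{Int}, 10)              # 10 packed, pointer-free 4-vectors
m = reshape(reinterpret(Int, v), 4, 10)  # the same memory viewed as a 4x10 matrix
```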

@StefanKarpinski

I think the idea is to make that have the same layout, which is in principle possible.

@tknopp
Collaborator

@mschauer is there a reason why NTuple is not "pointer-free"? I thought this was the point of Keno's tuple work.

I am currently coding something involving magnetic fields where a fixed size "Vec3{Float64}" would be really handy. I tried to use NTuple{3,Float64} but the inability to reinterpret it is a major issue.

@twadleigh

When NTuple can be made to behave like a bits type in terms of its memory footprint and also provide runtime performance comparable to the analogous immutable types, I will probably deprecate the ImmutableArrays library, or at least stop using it for my own projects.

At some point I really ought to invest some time in building a relatively comprehensive performance benchmark library for ImmutableArrays, along with some alternative implementations (like one based on NTuple), so that we can assess when the transition occurs.

@Keno
Owner

There is a technical reason that hasn't been resolved yet; there's still a bit of work left to do. I've been wanting this for a long time, but I haven't had the time. It's on my short list of things to do, though. For now I've been using ImmutableArrays as well; even though it's a great library (thanks @twadleigh), it pains me that tuples can't just do this.

@tknopp
Collaborator

Ok, thanks, I will use ImmutableArrays as well for the moment. Should NTuple in the future also allow arithmetic like ImmutableArrays? One missing piece when using NTuple for small vectors is the question of what type to use for matrices.

@Keno
Owner

My thoughts on the matter are to use NTuple for pure storage and have an

immutable NVector{N,T}
    data::NTuple{N,T}
end

on which to define the arithmetic operations, so there might still be use for something like the ImmutableArrays package (or we might just put it in base). The reason for this distinction is that I'm not sure people would expect tuples to behave like vectors, but I haven't yet seen a case where it's an actual problem.
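A minimal sketch of how arithmetic on such a wrapper could look, in the thread's 0.3-era syntax (the ntuple-based constructor is written for brevity, not speed):

```julia
import Base: +

immutable NVector{N,T}
    data::NTuple{N,T}
end

# Elementwise addition: the wrapper, not the raw tuple, carries the
# vector semantics, so tuples elsewhere keep their usual behavior.
+{N,T}(a::NVector{N,T}, b::NVector{N,T}) =
    NVector{N,T}(ntuple(N, i -> a.data[i] + b.data[i]))
```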

@timholy
Owner

Note that up at the top of this issue is a candidate implementation of these ideas, including an algorithm for matrix-multiply that exploits a tuple representation. I haven't benchmarked it, but I very much doubt it's currently as fast as ImmutableArrays. (But it's a lot faster than standard Arrays, for small matrices.)

@Keno
Owner

Ah yes, I'd forgotten about that. That's certainly along the lines of what I had in mind.

@tknopp
Collaborator

I am not entirely sure what I would prefer on this point. There have been situations where I wanted to do arithmetic on tuples (e.g. on the size tuple of an array).

@StefanKarpinski

I'd lean towards fewer types – i.e. just making the usual vector arithmetic work on NTuples. An interesting corner case is what to do about say (1, 2.5, 2//3) + (2, 1.5, 3//4). I think it would be pretty cool if that just produced (3, 4.0, 17//12), i.e. mixed-type vectorized operations.
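That behavior falls out naturally from map over tuples; a sketch (tupadd is a hypothetical name, chosen here to avoid committing + on tuples):

```julia
# Each element pair promotes independently, so a heterogeneous tuple
# comes back out: Int + Int, Float64 + Float64, Rational + Rational.
tupadd(a::Tuple, b::Tuple) = map(+, a, b)

tupadd((1, 2.5, 2//3), (2, 1.5, 3//4))  # -> (3, 4.0, 17//12)
```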

@simonster
Collaborator

It seems like the specialization rules for tuples also need to change if we want to make NTuples behave like fixed-size vectors.

@ArchRobison
Collaborator

By coincidence, I was trying to create a vectorized 4x4 matrix multiplication using PR #6271, and almost got it to vectorize. (I think with some more work on LLVM I can make it vectorize.) Here's what I was trying:

function (+){T}( a::(T,T,T,T), b::(T,T,T,T) )
    (a[1]+b[1],a[2]+b[2],a[3]+b[3],a[4]+b[4])
end

function (*){T}(a::(T,T,T,T), b::T)
    (a[1]*b,a[2]*b,a[3]*b,a[4]*b)
end

function MatMul4x4{T}( c::Array{T,2}, a::Array{T,2}, b::Array{T,2} )
    for j=1:4
        for k=1:4
            @inbounds begin
                (c[1,j],c[2,j],c[3,j],c[4,j]) = (c[1,j],c[2,j],c[3,j],c[4,j]) + (a[1,k],a[2,k],a[3,k],a[4,k])*b[k,j];
            end
        end
    end
end

t = Array{Float32,2}
code_llvm(MatMul4x4,(t,t,t))

A small matrix class (e.g. MNMatrix{M,N,T}) might make this sort of thing more practical. Note the destructive update of c; an immutable matrix wouldn't work here.

@Keno
Owner

I'm fine with defining + and - on tuples, but what does * do? Is it matrix multiplication or elementwise multiplication? I think people might be expecting elementwise multiplication on tuples.

@tknopp
Collaborator

I also think that fewer types are better, especially as one otherwise has to decide when to use NTuple and when to use the fixed-size vector. My only concern is what to do with matrices. Is the syntax

(1,0,0; 0,1,0; 0,0,1)

still open?

@tknopp
Collaborator

I would expect .* to do element-wise multiplication

@timholy
Owner

If you want to represent a matrix using a tuple, you're going to have to wrap it (even if just with template parameters as above) to encode the m,n sizes.
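Such a wrapper might look like the following (a hypothetical sketch in 0.3-era syntax; NMatrix is an illustrative name, and the constraint L == M*N cannot be expressed in the type, so enforcing it would be the constructor's job):

```julia
immutable NMatrix{M,N,T,L}     # invariant: L == M*N (unchecked by the type system)
    data::NTuple{L,T}
end

# Column-major indexing into the flat tuple, matching Array's layout.
Base.getindex{M,N,T,L}(A::NMatrix{M,N,T,L}, i::Integer, j::Integer) =
    A.data[(j-1)*M + i]
```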

@mbauman
Collaborator

Hah, also by coincidence, just last night I was playing with implementing element-wise comparisons for tuples. I, too, was frustrated with the [size(A)...] song and dance.

@tknopp
Collaborator

It's kind of funny that C++ has fixed-size (stack-allocated) multi-dimensional arrays but no dynamic equivalent, while Julia has it the other way around.

@tknopp referenced this issue in rennis250/OpenGL.jl (open): "Made OpenGL > 1.0 Plattform independant" #12

@tknopp
Collaborator

To answer the original question of this issue: yes, I think we should have fixed-size arrays in base. Some thoughts:
1. I think one hard requirement is that the memory layout is tight, so that ordinary arrays can have FixedArray members and one can reinterpret back and forth.
2. One might correct me, but this last point seems to imply that the FixedArray has to be immutable.
3. One honest question is whether we need N-dimensional FixedArrays or whether vectors and matrices suffice.
4. If vectors and matrices suffice, one could think about making NTuple the vector type and just introducing a matrix type. In this case it would be wonderful if the parser could be made to generate a matrix from (1,0,0; 0,1,0; 0,0,1).

It seems that one central issue is to make the NTuple memory layout tight. Is there any way to implement a FixedArray by other means, or does this have to wait until that is done? @loladiro

@mschauer

Yes, it is good to keep in mind that, in some sense, an mxnxl array is already an efficient data structure for many mxn arrays of the same size, especially with #5556 giving fast contiguous array views. Only the notational overhead for the nth subarray, A[:,:,n], is much higher than C's a[n], with ellipsis slicing #5405 coming to mind.

@cdsousa

I've stumbled upon this issue once again, and @tknopp's comment on C++ made me think: what if FixedArray were a new intrinsic/built-in just like Array or tuples (and were mutable)? Once again I thought about the awesome metaprogramming-to-the-limits of the C++ Eigen library. The static/dynamicness of Eigen matrix sizes (and arrays) is defined by a template parameter: an integer indicating the size (or several for higher dimensions), which can take a special value indicating that the size is dynamic. I think that's something easily implementable in Julia, although probably not necessary.

@tknopp
Collaborator

@cdsousa Actually the ImmutableArrays package uses an approach similar to Eigen's by making the dimension a type parameter. If @Keno gets the tight memory layout for tuples implemented, they would be kind of a built-in type. I don't think any C implementation can help here; this needs to be implemented at the LLVM level so that the right code is emitted.

@cdsousa

I don't see the ImmutableArrays package doing it (here?). Maybe you meant the FixedArray proposed above? Anyway, I was talking about the size type parameter, which can take a special value to mean that the size is dynamic (in C++ it is a special integer value, since template parameters have to be compile-time constants). In Julia that would be even more easily implementable, since type parameters don't have to be of constant type and the dynamic value could be a singleton type. Something like

type Dynamic end
type EigenLikeArray{T, S} #= ... =# end
typealias EigenLikeVector{T} EigenLikeArray{T, (Dynamic,)}
typealias EigenLikeMatrix{T} EigenLikeArray{T, (Dynamic, Dynamic)}
typealias EigenLikeMatrix3{T} EigenLikeArray{T, (3, 3)}
typealias EigenLikeMatrixXd EigenLikeArray{Float64, (Dynamic, Dynamic)}
typealias EigenLikeMatrix3d EigenLikeArray{Float64, (3, 3)}

However, that is probably not necessary in Julia.

What I think would be nice to have in Julia is built-in fixed-size and mutable arrays, for small but also for large sizes (just like the common Array). It seems to me that the drawback of using tuples is that they are immutable and, since they can hold elements of different types, the internal code is (I'm not sure about this) more complex.
With built-in static (tight-layout, stack-allocated) arrays, more efficient code could be emitted (?). Then

function f(i, x)
    A = FixedVector(1, 2, 3, 4)
    A[1] = x # no bounds check
    A[i] = x # bounds check
    A[5] = x # run-time or jit-compile-time error
end

Just ignore me if what I'm saying is just plain wrong :smile:

@timholy
Owner

Isn't this exactly what the current Array type does? EDIT: by "this" I meant your Dynamic keyword.

@tknopp
Collaborator

Ah yes, I confused it a bit. ImmutableArrays just generates the appropriate immutable types with a macro.

But I am actually with you: if it is easier to implement a fixed-size array at the C level (exploiting the fact that all entries have the same type), it would be great to have that. At the Julia level, using an immutable is the only way to achieve a tight memory layout. And I am not so sure how important the mutability of fixed-size arrays is in practice.

@tknopp
Collaborator

@timholy: Yes, the dynamic Eigen arrays are like Julia arrays, but this is restricted to 1D and 2D in Eigen. I am not sure why they used a template parameter to switch between fixed-size and dynamic arrays; maybe in this way some methods can be used for both. In Julia this is not really necessary. In C++ templates one cannot declare subtype relations, so one either has to derive from the same base class or make the entire array a template parameter. The latter does not always work, however.

@timholy
Owner

There seems to be a fair amount of interest in these, so let's move the discussion over to #7568.

@timholy timholy closed this