# Arrays and views

Julia has excellent functionality for manipulating arrays and for linear algebra. We will have a quick look at this subject, which is much more complicated than you might suspect; see e.g. the talk on "Taking vector transposes seriously".

Let's define a 3×3 array (matrix):

In [1]:
M = [1 2 3; 4 5 6; 7 8 9]  # a 3x3 matrix

3×3 Array{Int64,2}:
 1  2  3
 4  5  6
 7  8  9

In [2]:
typeof(M)

Array{Int64,2}

We can extract part of the matrix using indexing notation:

In [3]:
part = M[2:3, 1:2]

2×2 Array{Int64,2}:
 4  5
 7  8

What happens if we modify `part`?

In [4]:
part[1, 1]

4

In [5]:
part[1, 1] = 10

10

In [6]:
part

2×2 Array{Int64,2}:
 10  5
  7  8

In [7]:
M

3×3 Array{Int64,2}:
 1  2  3
 4  5  6
 7  8  9

We see that `M` has *not* been modified: `part` was a **copy** of that part of `M`.
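As a minimal sketch of the distinction, the snippet below contrasts range indexing on the right-hand side of an assignment (which copies) with range indexing on the left-hand side (which writes into the original array). It reuses the `M` and `part` names from above; the zero block is only an illustrative value.

```julia
M = [1 2 3; 4 5 6; 7 8 9]

# Range indexing on the right-hand side allocates a new array.
part = M[2:3, 1:2]
part[1, 1] = 10
M[2, 1]                      # still 4: M is untouched

# Range indexing on the left-hand side assigns into M itself.
M[2:3, 1:2] = [0 0; 0 0]
M                            # now [1 2 3; 0 0 6; 0 0 9]
```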

## Views

We often do *not* want a copy, but rather a reference to the same data. This is called a **view**:

In [8]:
V = view(M, 2:3, 1:2)

2×2 SubArray{Int64,2,Array{Int64,2},Tuple{UnitRange{Int64},UnitRange{Int64}},false}:
 4  5
 7  8

In [9]:
typeof(V)

SubArray{Int64,2,Array{Int64,2},Tuple{UnitRange{Int64},UnitRange{Int64}},false}

Although this type looks complicated, it just contains the information that the object needs in order to manipulate the underlying data correctly.
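To make that information concrete, here is a small sketch that inspects a view. It assumes Julia 1.x, where `parent` and `parentindices` are the Base functions for this purpose.

```julia
M = [1 2 3; 4 5 6; 7 8 9]
V = view(M, 2:3, 1:2)

parent(V) === M      # true: the view wraps M itself, not a copy
parentindices(V)     # (2:3, 1:2): the index ranges of M that V covers
size(V)              # (2, 2)
```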

If we modify `V`, then `M` also gets modified, since it is the same data:

In [10]:
V[1, 1]

4

In [11]:
V[1, 1] = 100

100

In [12]:
V

2×2 SubArray{Int64,2,Array{Int64,2},Tuple{UnitRange{Int64},UnitRange{Int64}},false}:
 100  5
   7  8

In [13]:
M

3×3 Array{Int64,2}:
   1  2  3
 100  5  6
   7  8  9

We can also write

In [14]:
@view M[2:3, 1:2]

2×2 SubArray{Int64,2,Array{Int64,2},Tuple{UnitRange{Int64},UnitRange{Int64}},false}:
 100  5
   7  8

for ease of use.
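When several slices appear in one expression or block, the related `@views` macro (plural) can be applied to the whole thing so that every range-indexing operation inside it produces a view. A brief sketch, with `row` as an illustrative name:

```julia
M = [1 2 3; 4 5 6; 7 8 9]

@views begin
    row = M[2, :]    # a view of the second row, not a copy
    row[1] = -4      # writes through to M[2, 1]
end

M[2, 1]              # -4
```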

## In-place and vectorized operations: "`.`" ("pointwise")

Suppose we have two matrices and wish to add one to the other:

In [15]:
A = rand(1000, 1000)
B = rand(1000, 1000);

In [17]:
rand(3,3)

3×3 Array{Float64,2}:
 0.0425233  0.0342273  0.203924
 0.489621   0.894856   0.885921
 0.524045   0.848284   0.516949

Coming from other languages, we might expect to write `A += B`, and indeed this works:

In [16]:
A += B

1000×1000 Array{Float64,2}:
 0.894917  1.82161   0.776603  1.14853   …  1.49906   1.53684   0.755709
 0.491057  1.10481   0.949359  1.29527      1.49199   0.440834  0.471163
 1.43416   1.21442   1.54994   1.55451      1.24499   0.383604  0.869122
 0.983042  0.935295  1.06658   0.871522     1.32329   0.507601  1.26623 
 0.642209  0.790821  1.32358   0.894536     0.745363  0.939386  1.30229 
 1.18372   0.745812  0.72355   1.3018    …  1.35943   1.81162   0.411776
 0.755475  1.41659   1.78185   1.35287      0.972928  1.07237   0.974105
 1.3407    1.01308   1.20215   1.3648       0.950955  1.16851   1.25888 
 1.08027   1.02354   0.717006  1.04484      0.750393  0.389389  0.382915
 0.626611  1.04453   1.67384   0.529075     1.0948    0.59053   0.604993
 1.14356   1.47736   0.705514  0.180355  …  0.757037  1.04743   1.24072 
 1.01817   0.806034  0.775202  1.73442      1.59145   1.11337   1.15631 
 0.989007  1.45311   1.31775   0.568221     1.35395   1.93406   1.41605 
 ⋮                     

This is just "syntactic sugar" (i.e. a convenient shorthand) for `A = A + B`.

However, this does not perform the "in-place addition" you might expect, in which each element of `A` is updated in place. Rather, it allocates a new array for the result of `A + B` and then rebinds the name `A` to that new array. We can see this:

In [18]:
using BenchmarkTools

@btime $A += $B;

  4.711 ms (2 allocations: 7.63 MiB)


Note the large amount of allocation here: 1,000,000 elements $\times$ 8 bytes = 8,000,000 bytes, or about 7.63 MiB, which is the size of one full matrix.
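The following sketch checks that arithmetic and shows the rebinding directly. The names `mib` and `A_before` are introduced here purely for illustration.

```julia
A = rand(1000, 1000)
B = rand(1000, 1000)

# Size of one 1000×1000 Float64 matrix in MiB: 8_000_000 bytes / 2^20
mib = length(A) * sizeof(Float64) / 2^20    # ≈ 7.63

A_before = A         # a second name for the original array
A += B               # same as A = A + B
A === A_before       # false: A now refers to a newly allocated array
```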

The in-place behaviour can be obtained using **pointwise operators**:

In [19]:
A .= A .+ B

1000×1000 Array{Float64,2}:
 1.48806   2.71811   1.29136   1.53546   …  2.45377   2.41903   1.14581 
 0.553257  1.58231   1.26858   1.88117      2.24575   0.864168  0.643038
 2.42571   1.66966   2.29287   2.20951      2.14804   0.704168  1.68605 
 1.91233   1.56519   1.46322   1.53465      2.19278   0.898515  1.55234 
 1.26931   1.36936   1.98765   1.61892      1.4698    1.08885   1.85518 
 1.7296    1.1736    1.12588   2.22342   …  2.27577   2.76021   0.809113
 1.00822   2.16754   2.76748   1.91038      1.68677   1.46932   1.33694 
 2.3255    1.34683   1.58049   2.20968      1.22465   1.48771   1.58319 
 1.33098   1.87357   0.939084  1.92244      1.04163   0.610938  0.699249
 1.19329   1.607     2.40026   0.910108     1.33173   0.794208  1.03888 
 1.88614   2.22666   1.31488   0.194805  …  1.40373   1.1713    1.90584 
 1.52979   1.34197   1.47793   2.7146       2.33516   1.28797   1.39007 
 1.50289   1.99745   2.036     0.869877     2.21248   2.91095   2.22387 
 ⋮                     

In [20]:
@btime A .= A .+ B

  2.579 ms (4 allocations: 128 bytes)


1000×1000 Array{Float64,2}:
 2130.29   3220.26   1848.76   1390.23    …  3428.91    3168.59    1401.22 
  223.789  1715.31   1146.94   2104.65       2707.5     1520.21     617.503
 3561.12   1635.52   2668.7    2353.03       3243.23    1151.21    2933.65 
 3337.12   2262.27   1425.01   2381.52       3122.78    1403.89    1028.43 
 2251.92   2077.74   2385.32   2601.45       2601.47     537.531   1986.18 
 1960.86   1536.49   1445.09   3309.92    …  3291.04    3407.24    1426.85 
  908.125  2697.3    3540.21   2002.8        2563.68    1426.13    1303.56 
 3536.79   1199.18   1359.46   3034.46        983.535   1147.08    1165.52 
  901.124  3052.64    797.977  3151.62       1046.28     795.75    1136.02 
 2035.01   2020.31   2609.53   1368.43        851.656    731.794   1558.25 
 2667.03   2691.45   2188.34     52.0548  …  2322.4      445.751   2389.05 
 1837.73   1924.8    2523.56   3520.6        2671.51     627.908    840.353
 1845.85   1955.62   2579.85   1083.51       3083.47    3508

Furthermore, we can chain such operations together without creating temporaries, provided that *every* operation in the expression is dotted:

In [46]:
C = rand(1000, 1000)

@btime A .+= B + C  # allocates

  2.626 ms (6 allocations: 7.63 MiB)


1000×1000 Array{Float64,2}:
  6871.03    7508.43   6151.07  10138.4    …   5878.45    4342.9   12620.3 
  2263.17    5360.1   11577.1    2205.83       6958.22    9295.74  11218.5 
   483.878   1856.49   3755.18  14813.2         115.145   2609.76  11516.7 
 12125.8    14842.5    3490.5   15315.2        1769.31    5973.3    2211.09
  9887.45   10989.5    9158.5   15452.5       12181.2    14525.2    6659.89
  4005.47   10119.7    7320.63   2867.35   …   4612.68   10629.1    2402.8 
 15877.0     7978.97  11259.8    5609.87       7807.92   12885.3   10213.0 
  3712.99    5512.19   9497.4   11834.9        5358.9     8356.29   2570.53
  4808.12   13984.4    2802.51  10297.9        7302.3    12307.5   11400.4 
  8550.36    9887.78  11272.6    7755.13       2589.72    7357.38  13315.8 
 11759.0     5040.06  13658.2    9208.91   …   1415.75    9477.93   6824.76
  6539.19    5129.52   4796.1    2844.78      15438.9     5459.98  10899.8 
 11661.3     8025.83  12628.0    2743.05       7688.21    35

In [47]:
@btime A .+= B .+ C  # does not allocate

  1.128 ms (4 allocations: 160 bytes)


1000×1000 Array{Float64,2}:
 14146.5   13674.5   15500.6   21106.7   …  11808.8    11325.8   20361.1 
  8400.02   8328.81  20612.0    4522.03     10871.8    14623.4   20382.8 
  1719.99   4333.43   8028.48  26482.2        238.767   4556.97  19861.2 
 21694.6   26409.9    9644.93  27830.7       6651.68   14901.1    5488.46
 17831.1   21956.4   17486.4   27811.0      22759.5    24967.1   15084.0 
  9838.66  19514.7   11734.7   10462.1   …   9127.62   19148.9    4762.53
 29448.8   14519.0   20407.8    9272.27     15940.1    24874.6   16150.0 
  5748.72  10318.7   19741.1   22521.2      12381.5    14114.1    5887.41
 13392.6   27275.9    9777.21  20438.6      11704.6    23703.1   20036.2 
 14882.3   19368.3   21752.3   16320.5       8339.06   12431.3   21157.8 
 19892.8   10699.7   26911.5   14695.3   …   2492.12   16109.2   12120.2 
 10952.5   10349.7    8455.05   6170.75     27977.8    12389.6   19107.6 
 22405.8   13025.1   24634.4    4520.95     14354.6     8298.81  24256.0 
     ⋮    

See [this blog post by Steven Johnson](https://julialang.org/blog/2017/01/moredots) for more details.
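When an expression contains many operations, the `@.` macro adds the dots to every call and operator, including the assignment. The sketch below uses small constant matrices (an illustrative choice) so that the result is easy to verify, and confirms that `A` still refers to the same array afterwards.

```julia
A = ones(3, 3)
B = fill(2.0, 3, 3)
C = fill(3.0, 3, 3)

A_before = A
@. A = A + B * C     # equivalent to A .= A .+ B .* C
A === A_before       # true: A was updated in place
A[1, 1]              # 7.0
```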

## Efficient small matrices and vectors

For small matrices and vectors, the generic array code is comparatively slow: the `Array` type does not record the number of elements, so the compiler cannot specialize on the size. It must use generic loops and allocate the result on the heap.

The `StaticArrays.jl` package addresses this problem by encoding the size in the type, so that operations on small arrays can be unrolled.
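The idea can be sketched with plain tuples, whose length is part of their type. This is only an illustration of the principle, not the package's implementation; `add_tuples` is a hypothetical helper, and the `Val(N)` form of `ntuple` assumes Julia 1.x.

```julia
# The length of an Array is not part of its type...
typeof([1, 2]) == typeof([1, 2, 3])        # true

# ...whereas the length of a tuple is.
typeof((1, 2)) == typeof((1, 2, 3))        # false

# Hypothetical helper: N is known at compile time, so the compiler
# can generate straight-line code with no loop.
add_tuples(a::NTuple{N,T}, b::NTuple{N,T}) where {N,T} =
    ntuple(i -> a[i] + b[i], Val(N))

add_tuples((1, 2), (3, 4))                 # (4, 6)
```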

In [48]:
# Pkg.add("StaticArrays")

using StaticArrays, BenchmarkTools

In [25]:
function bench()
    x = SVector(1, 2)
    y = [1, 2]
    
    @btime $x + $x
    @btime $y + $y
end

bench (generic function with 1 method)

In [26]:
bench()

  1.380 ns (0 allocations: 0 bytes)
  38.239 ns (1 allocation: 96 bytes)


2-element Array{Int64,1}:
 2
 4

In [28]:
x = SVector(1, 2)
@code_lowered x + x

CodeInfo(:(begin 
        nothing
        nothing
        return (StaticArrays.map)(StaticArrays.+, a, b)
    end))

In [29]:
@code_typed x + x

CodeInfo(:(begin 
        SSAValue(4) = a
        SSAValue(5) = b
        $(Expr(:inbounds, false))
        # meta: location /Users/dpsanders/.julia/v0.6/StaticArrays/src/mapreduce.jl map 10
        SSAValue(2) = SSAValue(4)
        SSAValue(3) = SSAValue(5)
        # meta: location /Users/dpsanders/.julia/v0.6/StaticArrays/src/mapreduce.jl _map 14
        # meta: location /Users/dpsanders/.julia/v0.6/StaticArrays/src/mapreduce.jl # line 23:
        $(Expr(:inbounds, true))
        #temp# = $(Expr(:new, SVector{2,Int64}, :((StaticArrays.tuple)((Base.add_int)((Base.getfield)((Core.getfield)(SSAValue(2), :data)::Tuple{Int64,Int64}, 1)::Int64, (Base.getfield)((Core.getfield)(SSAValue(3), :data)::Tuple{Int64,Int64}, 1)::Int64)::Int64, (Base.add_int)((Base.getfield)((Core.getfield)(SSAValue(2), :data)::Tuple{Int64,Int64}, 2)::Int64, (Base.getfield)((Core.getfield)(SSAValue(3), :data)::Tuple{Int64,Int64}, 2)::Int64)::Int64)::Tuple{Int64,Int64})))
        goto 15
        # meta: pop location


In [30]:
@code_llvm x + x


define void @"julia_+_61704"(%SArray* noalias nocapture sret, %SArray* nocapture readonly dereferenceable(16), %SArray* nocapture readonly dereferenceable(16)) #0 !dbg !5 {
top:
  %3 = getelementptr inbounds %SArray, %SArray* %1, i64 0, i32 0, i64 0
  %4 = getelementptr inbounds %SArray, %SArray* %2, i64 0, i32 0, i64 0
  %5 = load i64, i64* %3, align 8
  %6 = load i64, i64* %4, align 8
  %7 = add i64 %6, %5
  %8 = getelementptr inbounds %SArray, %SArray* %1, i64 0, i32 0, i64 1
  %9 = getelementptr inbounds %SArray, %SArray* %2, i64 0, i32 0, i64 1
  %10 = load i64, i64* %8, align 8
  %11 = load i64, i64* %9, align 8
  %12 = add i64 %11, %10
  %"#temp#.sroa.0.sroa.0.0.#temp#.sroa.0.0..sroa_cast1.sroa_idx" = getelementptr inbounds %SArray, %SArray* %0, i64 0, i32 0, i64 0
  store i64 %7, i64* %"#temp#.sroa.0.sroa.0.0.#temp#.sroa.0.0..sroa_cast1.sroa_idx", align 8
  %"#temp#.sroa.0.sroa.2.0.#temp#.sroa.0.0..sroa_cast1.sroa_idx7" = getelementptr inbounds %SArray, %SArray* %0, i64 0, i32

In [31]:
@code_native x + x

	.section	__TEXT,__text,regular,pure_instructions
Filename: linalg.jl
	pushq	%rbp
	movq	%rsp, %rbp
Source line: 23
	movq	(%rdx), %rax
	movq	8(%rdx), %rcx
	addq	(%rsi), %rax
	addq	8(%rsi), %rcx
Source line: 10
	movq	%rax, (%rdi)
	movq	%rcx, 8(%rdi)
	movq	%rdi, %rax
	popq	%rbp
	retq
	nop


In [32]:
y = [1, 2]
@code_native y + y

	.section	__TEXT,__text,regular,pure_instructions
Filename: arraymath.jl
	pushq	%rbp
	movq	%rsp, %rbp
	pushq	%r15
	pushq	%r14
	pushq	%r12
	pushq	%rbx
	subq	$64, %rsp
	movq	%rsi, %r12
	movq	%rdi, %r15
	movabsq	$jl_get_ptls_states_fast, %rax
	callq	*%rax
	movq	%rax, %r14
	movq	$0, -40(%rbp)
	movq	$0, -48(%rbp)
	movq	$4, -64(%rbp)
	movq	(%r14), %rax
	movq	%rax, -56(%rbp)
	leaq	-64(%rbp), %rax
	movq	%rax, (%r14)
Source line: 64
	movq	24(%r15), %rax
Source line: 64
	movq	24(%r12), %rcx
	xorl	%ebx, %ebx
Source line: 37
	testq	%rax, %rax
	cmovsq	%rbx, %rax
	movq	%rax, -72(%rbp)
	testq	%rcx, %rcx
	cmovsq	%rbx, %rcx
	movq	%rcx, -80(%rbp)
	movabsq	$promote_shape, %rax
	leaq	-72(%rbp), %rdi
	leaq	-80(%rbp), %rsi
	callq	*%rax
Source line: 64
	movq	24(%r15), %rax
Source line: 64
	movq	24(%r12), %rcx
Source line: 63
	testq	%rax, %rax
	cmovsq	%rbx, %rax
	movq	%rax, -88(%rbp)
	testq	%rcx, %rcx
	cmovsq	%rbx, %rcx
	movq	%rcx, -96(%rbp)
	movabsq	$_bcs1, %rax
	leaq	-88(%rbp), %rdi
	leaq	-96(%rbp), %rsi
	c