# Julia Gotchas and how to handle them

Largely inspired by http://www.stochasticlifestyle.com/7-julia-gotchas-handle/ by Chris Rackauckas.

# Gotcha 1: Global scope

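Code that reads or writes non-constant global variables is slow. The reason is that a variable in the REPL/global scope can change its type at any time, so the compiler cannot generate type-specific code for it.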

In [None]:
a=2.0; b=3.0
function linearcombo()
  return 2a+b
end
answer = linearcombo()

# could also be
# a = 2; b = 3
# answer = linearcombo()

@show answer;

In [None]:
@code_llvm linearcombo()

### How to avoid this issue?

One way to identify the issue is [Traceur.jl](https://github.com/MikeInnes/Traceur.jl). It is essentially an automated version of the [performance tips](https://docs.julialang.org/en/v0.6.4/manual/performance-tips/#man-performance-tips-1) in the Julia documentation.

In [None]:
using Traceur
@trace linearcombo()

#### 1) Wrap code in functions.

In [None]:
function outer()
    a=2.0; b=3.0
    function linearcombo()
      return 2a+b
    end
    return linearcombo() 
end

answer = outer()

@show answer;

In [None]:
@code_llvm outer()

In [None]:
@trace outer()

#### 2) Declare globals as (compile-time) constants.

In [None]:
const A=2.0; const B=3.0
function Linearcombo()
  return 2A+B
end
answer = Linearcombo()

@show answer;

In [None]:
@code_llvm Linearcombo()

In [None]:
@trace Linearcombo()

#### Take home message: Don't write performance critical scripts in global scope, always wrap them in a function.
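
A third option, not shown above, is to pass the values in as function arguments. The following is a minimal sketch with made-up names (`combo_global`, `combo_args`, `p`, `q`); it uses `Base.return_types` to compare what the compiler can infer in each case.

```julia
# Non-constant globals: their type may change at any time,
# so the compiler cannot pin down the return type.
p = 2.0; q = 3.0
combo_global() = 2p + q

# Same computation, but the values arrive as arguments.
# Julia compiles a method specialized for (Float64, Float64).
combo_args(p, q) = 2p + q

inferred_global = Base.return_types(combo_global, ())[1]
inferred_args   = Base.return_types(combo_args, (Float64, Float64))[1]

@show inferred_global    # not a concrete type
@show inferred_args      # Float64
@show combo_args(p, q)   # same value as combo_global()
```

Passing arguments keeps the function reusable and avoids committing to `const`, which is often the more convenient choice in scripts.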

# Gotcha 2: Views and copies

In [None]:
a = [3;4;5]
b = a
b[1] = 1
a

In [None]:
a = rand(2,2)
b = vec(a) # Reshapes a into a 1-dimensional array that shares memory with a (no copy)

In [None]:
c = a[1:2,1] # Creates a copy (slice on rhs of assignment)

In [None]:
# Create a view into array a.
d = @view a[1:2,1]
e = view(a,1:2,1)
@views p = a[1:2,1]

In [None]:
a[1:2,1] = [1;2] # Modifies a in-place (slice on lhs of assignment)

In [None]:
a = Vector{Vector{Float64}}(undef, 2)
a[1] = [1;2;3]
a[2] = [4;5;6]

b = copy(a)
b[1][1] = 10 # will alter a!

b = deepcopy(a) # "recursive copy"
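
To check which of these operations share memory, one can mutate the original and look at the others. A small sketch (the variable names are just for illustration):

```julia
a = [1.0 2.0; 3.0 4.0]

alias = a                  # same array, just another name
slice = a[1:2, 1]          # copy of the first column
v     = @view a[1:2, 1]    # view into the first column

a[1, 1] = 100.0

@show alias[1, 1]   # 100.0: alias and a are the same object
@show slice[1]      # 1.0: the copy is unaffected
@show v[1]          # 100.0: the view sees the change

# copy is shallow: the inner vectors are shared with the original
nested  = [[1.0, 2.0], [3.0, 4.0]]
shallow = copy(nested)
deep    = deepcopy(nested)
nested[1][1] = -1.0
@show shallow[1][1]  # -1.0
@show deep[1][1]     # 1.0
```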

# Gotcha 3: Type-instabilities

What's bad for performance in the following function?

In [36]:
function g()
  x=1
  for i = 1:10
    x = x/2
  end
  return x
end

g (generic function with 1 method)

In [37]:
@code_llvm g()


; Function g
; Location: In[36]:2
; Function Attrs: uwtable
define { %jl_value_t addrspace(10)*, i8 } @julia_g_36658([8 x i8]* noalias nocapture align 8 dereferenceable(8)) #0 {
top:
; Location: In[36]:3
  br label %L10

L10:                                              ; preds = %top, %L37
  %1 = phi double [ 4.940660e-324, %top ], [ %value_phi3, %L37 ]
  %.sroa.013.0 = phi i64 [ 1, %top ], [ %5, %L37 ]
  %tindex_phi = phi i2 [ -2, %top ], [ 1, %L37 ]
  %value_phi2 = phi i64 [ 1, %top ], [ %4, %L37 ]
; Location: In[36]:4
  switch i2 %tindex_phi, label %L25 [
    i2 -2, label %L14
    i2 1, label %L27
  ]

L14:                                              ; preds = %L10
; Function /; {
; Location: int.jl:59
; Function float; {
; Location: float.jl:269
; Function Type; {
; Location: float.jl:254
; Function Type; {
; Location: float.jl:60
  %2 = sitofp i64 %.sroa.013.0 to double
;}}}}
  br label %L27

L25:                                              ; preds = %L10
  call void @jl_throw

### How to Find and Deal with Type-Instabilities

#### 1) Avoid type changes

Initialize `x` as `Float64` and it's fast.

In [38]:
function h()
  x=1.0
  for i = 1:10
    x = x/2
  end
  return x
end

h (generic function with 1 method)

In fact, it's not just fast, but as fast as it can be! Julia has figured out the result of the calculation at compile-time and returns **just the result (a literal)!**

(`h() = 9.765625e-04` at run-time.)

In [39]:
@code_llvm h()


; Function h
; Location: In[38]:2
; Function Attrs: uwtable
define double @julia_h_36663() #0 {
top:
; Location: In[38]:6
  ret double 0x3F50000000000000
}


#### 2) Detect issues with `@code_warntype` (or `@trace`)

In [40]:
@code_warntype g()

Body::Union{Float64, Int64}
3 1 ──       (Base.ifelse)(true, 10, 0)
  │    %2  = (Base.slt_int)(10, 1)::Bool
  └───       goto #3 if not %2
  2 ──       goto #4
  3 ──       goto #4
  4 ┄─ %6  = φ (#2 => true, #3 => false)::Bool
  │    %7  = φ (#3 => 1)::Int64
  │    %8  = (Base.not_int)(%6)::Bool
  └───       goto #15 if not %8
  5 ┄─ %10 = φ (#4 => 1, #14 => %27)::Union{Float64, Int64}
  │    %11 = φ (#4 => %7, #14 => %33)::Int64
4 │    %12 = (isa)(%10, Int64)::Bool

(On a side note: Much better handled in Julia 1.0 by "Union splitting"! See blog post by Tim Holy: https://julialang.org/blog/2018/08/union-splitting)

In [41]:
@code_warntype h()

Body::Float64
3 1 ──       (Base.ifelse)(true, 10, 0)
  │    %2  = (Base.slt_int)(10, 1)::Bool
  └───       goto #3 if not %2
  2 ──       goto #4
  3 ──       goto #4
  4 ┄─ %6  = φ (#2 => true, #3 => false)::Bool
  │    %7  = φ (#3 => 1)::Int64
  │    %8  = (Base.not_int)(%6)::Bool
  └───       goto #10 if not %8
  5 ┄─ %10 = φ (#4 => 1.0, #9 => %12)::Float64
  │    %11 = φ (#4 => %7, #9 => %18)::Int64
4 │    %12 = (Base.div_float)(%10, 2.0)::Float64

A more drastic example:

In [42]:
f() = rand([1.0, 2, "3"])
@code_warntype f()

Body::Any
1 1 ── %1  = (Core.tuple)(1.0, 2, "3")::Tuple{Float64,Int64,String}
  │    %2  = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Array{Any,1}, svec(Any, Int64), :(:ccall), 2, Array{Any,1}, 3, 3))::Array{Any,1}
  │    %3  = invoke Base.copyto!(%2::Array{Any,1}, %1::Tuple{Float64,Int64,String})::Array{Any,1}
  │    %4  = Random.GLOBAL_RNG::Random.MersenneTwister
  │    %5  = (Base.arraysize)(%3, 1)::Int64
  │    %6  = (Base.slt_int)(%5, 0)::Bool
  │          (Base.ifelse)(%6, 0, %5)
  │    %8  = (Base.arraysize)(%3, 1)::Int64

In [43]:
@trace g()

└ @ In[36]:2
└ @ In[36]:4
└ @ In[36]:2


0.0009765625
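
Type stability can also be checked programmatically, which is handy in a test suite. The sketch below is an assumption-laden variant of `g` and `h`: the functions `unstable` and `stable` are made up here and take the loop length as an argument so that the compiler cannot know how often the loop runs. It uses `@inferred` from the `Test` standard library, which throws if the actual return type differs from the inferred one.

```julia
using Test

function unstable(n)
    x = 1            # starts as an Int ...
    for i in 1:n
        x = x / 2    # ... and becomes a Float64 inside the loop
    end
    return x
end

function stable(n)
    x = 1.0          # Float64 from the start
    for i in 1:n
        x = x / 2
    end
    return x
end

@show Base.return_types(unstable, (Int,))[1]   # a Union of Float64 and Int
@show Base.return_types(stable, (Int,))[1]     # Float64

# Passes silently for the stable version
@inferred stable(10)

# Throws for the unstable version; catch it to report the outcome
passes_inferred = try
    @inferred unstable(10)
    true
catch
    false
end
@show passes_inferred   # false
```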

#### 3) The C/Fortran way: specify types (to get errors or heal the problem by conversion)

In [44]:
function g2()
  x::Float64 = 1
  for i = 1:10
    x = x/2
  end
  return x
end

g2 (generic function with 1 method)

In [45]:
@code_llvm g2()


; Function g2
; Location: In[44]:2
; Function Attrs: uwtable
define double @julia_g2_36673() #0 {
top:
; Location: In[44]:6
  ret double 0x3F50000000000000
}


Julia can resolve the conflict in `x::Float64 = 1` without further help by converting `1` to `1.0`. If no exact conversion exists, you'll get an error indicating the type conflict.

In [46]:
function g3()
  x::Float64 = 3+im*2
  for i = 1:10
    x = x/2
  end
  return x
end

g3 (generic function with 1 method)

In [47]:
g3()

InexactError: InexactError: Float64(Float64, 3 + 2im)
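
A typed local variable behaves as if every assignment to it went through `convert`. The following sketch calls `convert` directly to show which values pass and which don't (the variable names are illustrative only):

```julia
# Exact conversions succeed
x = convert(Float64, 1)          # 1.0
y = convert(Float64, 3 + 0im)    # 3.0, fine because the imaginary part is zero

# 3 + 2im has no exact Float64 representation, so convert throws
err = try
    convert(Float64, 3 + 2im)
catch e
    e
end

@show x y
@show err isa InexactError   # true
```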

#### 4) Function barriers

In [54]:
arr = Vector{Union{Int64,Float64}}(undef, 4)
arr[1]=4
arr[2]=2.0
arr[3]=3.2
arr[4]=1
arr

4-element Array{Union{Float64, Int64},1}:
 4  
 2.0
 3.2
 1  

In [55]:
function foo(array)
  for i in eachindex(array)
    val = array[i]
    # do algorithm X on val
    val^2
  end
end

foo (generic function with 1 method)

In [56]:
@code_warntype foo(arr)

Body::Nothing
2 1 ── %1  = (Base.arraysize)(array, 1)::Int64
  │    %2  = (Base.slt_int)(%1, 0)::Bool
  │    %3  = (Base.ifelse)(%2, 0, %1)::Int64
  │    %4  = (Base.slt_int)(%3, 1)::Bool
  └───       goto #3 if not %4
  2 ──       goto #4
  3 ──       goto #4
  4 ┄─ %8  = φ (#2 => true, #3 => false)::Bool
  │    %9  = φ (#3 => 1)::Int64
  │    %10 = φ (#3 => 1)::Int64
  │    %11 = (Base.not_int)(%8)::Bool
  └───       goto #15 i

In [60]:
function inner_foo(val)
  # Do algorithm X on val
  val^2
end
 
function foo2(array)
  for i in eachindex(array)
    inner_foo(array[i])
  end
end

foo2 (generic function with 1 method)

In [61]:
@code_warntype inner_foo(arr[1])

Body::Int64
3 1 ─ %1 = (Base.mul_int)(val, val)::Int64
  └──      return %1

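
The reason a function barrier helps is that the inner function is compiled separately for each concrete argument type, so everything inside it is type-stable. A minimal sketch, assuming a made-up kernel and an accumulating outer loop (`kernel`, `sum_of_squares` and `mixed` are not part of the example above):

```julia
# Kernel behind the function barrier: compiled once per concrete argument type
kernel(val) = val^2

# The outer loop sees Union{Int64, Float64} elements and dispatches to the kernel
function sum_of_squares(array)
    total = 0.0
    for val in array
        total += kernel(val)
    end
    return total
end

mixed = Union{Int64,Float64}[4, 2.0, 3.2, 1]

@show Base.return_types(kernel, (Int64,))[1]     # Int64
@show Base.return_types(kernel, (Float64,))[1]   # Float64
@show sum_of_squares(mixed)
```

The more work happens inside the kernel, the less the single dynamic dispatch at the barrier matters.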

#### Comments:

Why allow type instabilities in the first place? It is a trade-off between convenience and performance.

Note that type instabilities occur naturally (reading files, user input, etc.), so not every red marker is a problem.

Also note that Julia is smart, and a changing type isn't *per se* an issue:

In [62]:
function g2()
  x=1
  x=1.0
  for i = 1:10
    x = x/2
  end
  return x
end

g2 (generic function with 1 method)

In [63]:
@code_warntype g2()

Body::Float64
4 1 ──       (Base.ifelse)(true, 10, 0)
  │    %2  = (Base.slt_int)(10, 1)::Bool
  └───       goto #3 if not %2
  2 ──       goto #4
  3 ──       goto #4
  4 ┄─ %6  = φ (#2 => true, #3 => false)::Bool
  │    %7  = φ (#3 => 1)::Int64
  │    %8  = (Base.not_int)(%6)::Bool
  └───       goto #10 if not %8
  5 ┄─ %10 = φ (#4 => 1.0, #9 => %13)::Float64
  │    %11 = φ (#4 => %7, #9 => %19)::Int64
5 │    %12 = π (%10, Float64)
  │    %13 = (Base.div

In [64]:
@code_llvm g2()


; Function g2
; Location: In[62]:2
; Function Attrs: uwtable
define double @julia_g2_36752() #0 {
top:
; Location: In[62]:7
  ret double 0x3F50000000000000
}


#### Take home message: watch out for type-instabilities in performance critical parts of your code.

# Gotcha 4: Temporary allocations and vectorized code

In [1]:
using BenchmarkTools

In [2]:
function f()
  x = [1;5;6]
  for i in 1:100_000
    x = x + 2*x
  end
  return x
end

f (generic function with 1 method)

In [3]:
@btime f();

  9.403 ms (200001 allocations: 21.36 MiB)


### How to handle it? More dots or more explicitness

https://julialang.org/blog/2017/01/moredots

https://github.com/JuliaLang/www.julialang.org/blob/master/blog/_posts/moredots/More-Dots.ipynb

In [4]:
function f()
    x = [1;5;6]
    for i in 1:100_000    
        for k in 1:3
            x[k] = x[k] + 2 * x[k]
        end
    end
    return x
end
@btime f();

  337.920 μs (1 allocation: 112 bytes)


In [5]:
function f()
    x = [1;5;6]
    for i in 1:100_000
        x = x .+ 2 .* x
    end
    return x
end
@btime f();

  3.740 ms (100001 allocations: 10.68 MiB)


In [6]:
function f()
    x = [1;5;6]
    for i in 1:100_000
        x .= x .+ 2 .* x
    end
    return x
end
@btime f();

  372.054 μs (1 allocation: 112 bytes)


In [7]:
function f()
    x = [1;5;6]
    for i in 1:100_000
        @. x = x + 2*x
        # equivalent to x .= x .+ 2 .* x
    end
    return x
end
@btime f();

  382.720 μs (1 allocation: 112 bytes)


In [8]:
function f()
    x = [1;5;6]
    @inbounds for i in 1:100_000    
        for k in 1:3 # @simd
            x[k] = x[k] + 2*x[k]
        end
    end
    return x
end
@btime f();

  107.093 μs (1 allocation: 112 bytes)
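
Without BenchmarkTools, the `@allocated` macro from Base gives a quick check of whether a call allocates. A rough sketch with hypothetical helper functions `triple` and `triple!` (Float64 values are used here to stay clear of integer overflow):

```julia
# Out-of-place: every call allocates a new array for the result
triple(x) = x .+ 2 .* x

# In-place: the fused broadcast writes straight into x
function triple!(x)
    x .= x .+ 2 .* x
    return x
end

x = [1.0, 5.0, 6.0]

# Call both once so that compilation is not part of the measurement
triple(x); triple!(copy(x))

bytes_out = @allocated triple(x)
bytes_in  = @allocated triple!(x)

@show bytes_out bytes_in
@show x   # [3.0, 15.0, 18.0] after the single in-place call
```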


# Gotcha 5: Julia + MKL incompatible with PyPlot (numpy)

https://github.com/JuliaPy/PyCall.jl/issues/443

# Gotcha 6: Writing to global scope

In [8]:
# Try this in the Julia REPL
a = 0
for i in 1:10
    a += i
end

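In the Julia 1.0 REPL this fails with `UndefVarError: a not defined`: the `for` loop introduces a new local scope, and `a += i` inside it does not refer to the global `a` unless it is declared `global`.

See "official" discussion here: https://github.com/JuliaLang/julia/issues/28789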
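
Two ways around this that work regardless of where the code is run, sketched with illustrative names (`total`, `sum_to`):

```julia
# Option 1: mark the variable as global inside the loop
total = 0
for i in 1:10
    global total += i
end

# Option 2: wrap the loop in a function, where total is an ordinary local
function sum_to(n)
    total = 0
    for i in 1:n
        total += i
    end
    return total
end

@show total       # 55
@show sum_to(10)  # 55
```

Option 2 also sidesteps the performance problem from Gotcha 1.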

# Gotcha 7: Abstract fields

In [1]:
using BenchmarkTools

In [2]:
struct A
    x::AbstractFloat
    y::AbstractString
end

f(a::A) = a.x * a.x

f (generic function with 1 method)

In [3]:
a = A(3.0, "test")

@btime f($a);

  16.386 ns (1 allocation: 16 bytes)


In [4]:
struct B
    x::Float64
    y::String
end

f(b::B) = b.x * b.x

f (generic function with 2 methods)

In [5]:
b = B(3.0, "test")

@btime f($b);

  1.236 ns (0 allocations: 0 bytes)


Note that the latter implementation is **about 13x faster**!

### How to handle it?

But what if I want to accept any kind of `AbstractFloat` and `AbstractString` in my type?

Use type parameters!

In [6]:
struct C{F<:AbstractFloat, S<:AbstractString}
    x::F
    y::S
end

f(c::C) = c.x * c.x

f (generic function with 3 methods)

In [7]:
c = C(3.0, "test")

C{Float64,String}(3.0, "test")

From the type alone the compiler knows what the structure contains and can produce optimal code:

In [8]:
@btime f($c);

  1.236 ns (0 allocations: 0 bytes)


In [15]:
c = C(Float32(3.0), SubString("test"))

C{Float32,SubString{String}}(3.0f0, "test")

In [16]:
@btime f($c);

  1.236 ns (0 allocations: 0 bytes)
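
Whether a struct has abstract fields can be checked without benchmarking, by asking for the declared field types. A small sketch with made-up struct names:

```julia
struct AbstractFields
    x::AbstractFloat
end

struct ParametricFields{F<:AbstractFloat}
    x::F
end

# Field type as recorded in the struct definition
@show isconcretetype(fieldtype(AbstractFields, :x))              # false
@show isconcretetype(fieldtype(ParametricFields{Float64}, :x))   # true

# The parameter is filled in automatically from the constructor argument
@show typeof(ParametricFields(3.0f0))   # ParametricFields{Float32}
```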


# Gotcha 8: Column major order

In [3]:
M = rand(1000,1000);

function fcol(M)
    for col in 1:size(M, 2)
        for row in 1:size(M, 1)
            M[row, col] = 42
        end
    end
    nothing
end

function frow(M)
    for row in 1:size(M, 1)
        for col in 1:size(M, 2)
            M[row, col] = 42
        end
    end
    nothing
end

frow (generic function with 1 method)

In [4]:
@btime fcol($M)

  396.700 μs (0 allocations: 0 bytes)


In [5]:
@btime frow($M)

  1.742 ms (0 allocations: 0 bytes)


### How to handle it?

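Take care and remember: the fastest-varying index goes first! Julia stores arrays in column-major order, so the first index (the row) should be iterated in the innermost loop, as in `fcol`.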
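
The memory layout can be made visible with `vec`, and `eachindex` is a convenient way to loop without having to think about the order. A minimal sketch (the function name `fill_in_memory_order!` is made up):

```julia
M = [1 2 3;
     4 5 6]

# Memory order: the first index (row) varies fastest
@show vec(M)   # [1, 4, 2, 5, 3, 6]

# eachindex iterates in an efficient order for the given array type
function fill_in_memory_order!(M, value)
    for i in eachindex(M)
        M[i] = value
    end
    return M
end

@show fill_in_memory_order!(zeros(2, 3), 42.0)
```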

# Gotcha 9: Lazy operations

In [19]:
using LinearAlgebra

Let's say we want to calculate `B = A + (A' + 2*I)`.

In [90]:
A = [1 2; 3 4]
A + (A' + 2*I)

2×2 Array{Int64,2}:
 4   5
 5  10

Now let's assume that, for some reason, we want to implement it more explicitly, something along the lines of

In [91]:
function calc(A)
    B = A'
    B[1,1] += 2
    B[2,2] += 2
    A + B
end

calc (generic function with 1 method)

Let's check for correctness.

In [92]:
calc([1 2; 3 4]) == A + (A' + 2*I)

false

Somehow it's not correct! **Why?**

### How to solve this?

The "issue" is that `A'` makes a lazy adjoint of `A`. It is just another way of looking at the same piece of memory! Hence, when we do `B[1,1] += 2` we are actually changing `A`, leading to a wrong result. We can heal this by enforcing a `copy`:

In [93]:
function calc_corrected(A)
    B = copy(A')
    B[diagind(B)] .+= 2
    A + B
end

calc_corrected (generic function with 1 method)

In [94]:
calc_corrected([1 2; 3 4]) == A + (A' + 2*I)

true
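
The shared memory can be observed directly. In this sketch, `parent` returns the array that the lazy wrapper refers to (the names `B` and `C` are only for illustration):

```julia
using LinearAlgebra

A = [1 2; 3 4]
B = A'            # lazy wrapper, no data is copied
C = copy(A')      # independent matrix holding the transposed data

@show B isa Adjoint     # true
@show parent(B) === A   # true: B wraps A itself

B[1, 2] = 99            # writes to A[2, 1]
@show A[2, 1]           # 99
@show C[1, 2]           # 3, the copy is unaffected
```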

This isn't really an issue. In fact, this laziness (plus the allocation-free identity matrix `I`) is precisely the reason why the straightforward solution is fast!

In [96]:
using BenchmarkTools

function calc_straightforward(A)
    A + (A' + 2*I)
end

@btime calc($[1 2; 3 4]);
@btime calc_corrected($[1 2; 3 4]);
@btime calc_straightforward($[1 2; 3 4]);

  50.644 ns (2 allocations: 128 bytes)
  166.198 ns (5 allocations: 400 bytes)
  87.744 ns (3 allocations: 240 bytes)


Compare this to Julia v0.6:

```julia
using BenchmarkTools
A = [1 2; 3 4]
function calc_straightforward(A)
    A + (A' + 2*I)
end
```

which gives (on the same machine):

```julia
julia> @btime calc_straightforward($[1 2; 3 4]);
  115.817 ns (3 allocations: 336 bytes)
```

# Tip: Comprehensions and Generators

Comprehensions: https://docs.julialang.org/en/stable/manual/arrays/#Comprehensions-1

Generators: https://docs.julialang.org/en/stable/manual/arrays/#Generator-Expressions-1

In [6]:
sum([k for k in 1:10])

55

In [7]:
sum(k for k in 1:10)

55

In [19]:
typeof(k for k in 1:10)

Base.Generator{UnitRange{Int64},getfield(Main, Symbol("##27#28"))}

Does it matter?

In [9]:
using BenchmarkTools

@btime sum([k for k in 1:10]);
@btime sum(k for k in 1:10);

  35.556 ns (1 allocation: 160 bytes)
  2.782 ns (0 allocations: 0 bytes)
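
The difference comes from the generator being lazy: it never builds the intermediate array. A short sketch with a filtered generator (the name `gen` is arbitrary):

```julia
# A generator is a lazy recipe; nothing is computed when it is created
gen = (k^2 for k in 1:10 if iseven(k))
@show gen isa Base.Generator   # true

# Reductions consume it element by element, without an intermediate array
@show sum(gen)        # 4 + 16 + 36 + 64 + 100 = 220

# collect materializes the values when an array is actually needed
@show collect(gen)    # [4, 16, 36, 64, 100]
```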


In [51]:
collect(k*i for k in 1:10 for i in 1:10 if i+k < 10) # collecting values of a generator (iterable)

36-element Array{Int64,1}:
  1
  2
  3
  4
  5
  6
  7
  8
  2
  4
  6
  8
 10
  ⋮
 16
 20
  5
 10
 15
 20
  6
 12
 18
  7
 14
  8

In [62]:
# convoluted example
using Random
map(tuple, (k*i for k in 1:10 for i in 1:10 if i+k < 10), (randstring() for i in 1:36))

36-element Array{Tuple{Int64,String},1}:
 (1, "XXT2WMSO") 
 (2, "Pku4fULM") 
 (3, "dY64oJWH") 
 (4, "JLDAkNaz") 
 (5, "X44wColb") 
 (6, "A1f4A0Dk") 
 (7, "lBMpw86v") 
 (8, "wqKGkL5s") 
 (2, "7MBmEB7H") 
 (4, "X8cfm8wr") 
 (6, "NxA9bh5q") 
 (8, "0dEmtida") 
 (10, "yhw33Hzg")
 ⋮               
 (16, "cn3EIBo9")
 (20, "KGgxeoyC")
 (5, "bjycWO9J") 
 (10, "CQ7pw2Zh")
 (15, "ByYEJY0C")
 (20, "7kIXamKO")
 (6, "DLW4AOhd") 
 (12, "Db4WL3vh")
 (18, "M4sLfZbC")
 (7, "BpUbOGkB") 
 (14, "6PoD345Z")
 (8, "OVnU9Ov9") 

# Tip: Allocation-free sum of absolute squared values

In [72]:
x = rand(100_000);

In [85]:
sum(abs2.(x))

33403.3388731547

In [83]:
using BenchmarkTools

@btime sum(abs2.($x))
@btime sum(abs2, $x) # avoids temporary allocations

  64.309 μs (2 allocations: 781.33 KiB)
  12.676 μs (0 allocations: 0 bytes)


33403.33887315471

In [92]:
# But we are working in Julia, so explicit implementations are fast!
function sumabs2(x)
    r = zero(eltype(x))
    @simd for i in 1:length(x)
        @inbounds r += x[i]^2
    end
    r
end

@btime sumabs2($x)

  11.748 μs (0 allocations: 0 bytes)


33403.33887315469
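
One caveat: `sumabs2` above uses `x[i]^2`, which agrees with `abs2` only for real input. A small sketch with a complex vector (chosen here purely for illustration) shows the difference:

```julia
z = [3 + 4im, 1 - 1im]

# abs2 gives |z|^2 = real^2 + imag^2
@show sum(abs2, z)            # 25 + 2 = 27

# Squaring a complex number is a different operation
@show sum(v -> v^2, z)        # (-7 + 24im) + (0 - 2im) = -7 + 22im

# Equivalent reduction spelled out with mapreduce
@show mapreduce(abs2, +, z)   # 27
```

For complex data, replacing `x[i]^2` with `abs2(x[i])` in the explicit loop would be the matching change.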