# Specialization and code inspection

 <img src="../presentation/images/from_source_to_native.png" alt="drawing" width="800"/>

Internally, the compiler generates specialized code for particular input types.

When a function is called for the first time, Julia compiles a specific version of the function for the given input types.
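As a rough illustration of this first-call cost, the following sketch times repeated calls with `@elapsed` from Base. The function `poly` is a hypothetical example, not part of the notebook. The measured times depend on the machine and the Julia version, so no numbers are shown here.

```julia
# Hypothetical function used only for this illustration.
poly(x) = 3x^2 + 2x + 1

# First call with a Float64 argument: the time includes compiling
# the Float64 specialization of `poly`.
t_first = @elapsed poly(1.0)

# Second call with the same argument type: the compiled code is reused.
t_second = @elapsed poly(2.0)

# First call with an Int argument: a new specialization has to be compiled.
t_int = @elapsed poly(1)

println("Float64, first call:  ", t_first, " s")
println("Float64, second call: ", t_second, " s")
println("Int, first call:      ", t_int, " s")
```

On most setups one would expect the second `Float64` call to be much cheaper than the first, and the first `Int` call to be slow again.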

There are multiple compilation stages, and we can inspect each of them with a dedicated macro:

* The AST after parsing <- Macros (`@macroexpand`)
* The AST after lowering (`@code_lowered`)
* The AST after type inference and optimization <- Generated Functions (`@code_typed`, `@code_warntype`)
* The LLVM IR <- Functions (`@code_llvm`)
* The assembly code (`@code_native`)

AST = Abstract Syntax Tree
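The stages can also be queried programmatically through the function counterparts of these macros. The sketch below assumes a hypothetical function `scale_add` and uses `Meta.parse`, `code_lowered`, `code_typed`, `code_llvm` and `code_native`. Outside the REPL or a notebook, the last two require `using InteractiveUtils`.

```julia
using InteractiveUtils  # provides code_llvm / code_native outside the REPL or a notebook

# Hypothetical function used only for this illustration.
scale_add(x, y) = 2x + y

# 1. Parsing: source text becomes an expression tree (AST).
ex = Meta.parse("scale_add(3, 4)")
println(ex.head, " ", ex.args)

# 2. Lowering: one CodeInfo object per matching method.
lowered = code_lowered(scale_add, (Int, Int))
println(lowered[1])

# 3. Type inference and optimization: pairs of CodeInfo => inferred return type.
typed = code_typed(scale_add, (Int, Int))
println("inferred return type: ", last(typed[1]))

# 4. and 5. LLVM IR and native assembly are printed to the given stream.
code_llvm(stdout, scale_add, (Int, Int))
code_native(stdout, scale_add, (Int, Int))
```

The exact printed form of each stage differs between Julia versions and platforms.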

In [1]:
myadd(x,y) = x + y

myadd (generic function with 1 method)

In [4]:
dump(:(myadd(3,4)))

Expr
  head: Symbol call
  args: Array{Any}((3,))
    1: Symbol myadd
    2: Int64 3
    3: Int64 4


In [67]:
@time myadd(1,1)

  0.002392 seconds (756 allocations: 44.626 KiB)


2

In [68]:
@time myadd(1,1)

  0.000002 seconds (4 allocations: 160 bytes)


2

In [69]:
@time myadd(1,1)

  0.000002 seconds (4 allocations: 160 bytes)


2

In [71]:
@code_typed myadd(1,1)

CodeInfo(
[90m[74G│╻ +[1G[39m[90m1 [39m1 ─ %1 = (Base.add_int)(x, y)[36m::Int64[39m
[90m[74G│ [1G[39m[90m  [39m└──      return %1
) => Int64

In [72]:
@code_lowered myadd(1,1)

CodeInfo(
[90m[77G│[1G[39m[90m1 [39m1 ─ %1 = x + y
[90m[77G│[1G[39m[90m  [39m└──      return %1
)

In [73]:
@code_llvm myadd(1,1)


; Function myadd
; Location: In[66]:1
; Function Attrs: uwtable
define i64 @julia_myadd_35687(i64, i64) #0 {
top:
; Function +; {
; Location: int.jl:53
  %2 = add i64 %1, %0
;}
  ret i64 %2
}


In [18]:
@code_native myadd(1,1)

	.text
; Function myadd {
; Location: In[13]:1
	pushq	%rbp
	movq	%rsp, %rbp
; Function +; {
; Location: int.jl:53
	leaq	(%rcx,%rdx), %rax
;}
	popq	%rbp
	retq
	nopw	(%rax,%rax)
;}


Let's compare with `Float64` input.

In [17]:
@code_native myadd(1.0, 2.0)

	.text
; Function myadd {
; Location: In[13]:1
	pushq	%rbp
	movq	%rsp, %rbp
; Function +; {
; Location: float.jl:395
	vaddsd	%xmm1, %xmm0, %xmm0
;}
	popq	%rbp
	retq
	nopw	(%rax,%rax)
;}
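
The difference between the two listings comes from one generic method being specialized per argument-type combination. A minimal sketch of how this could be checked without reading assembly, assuming a stand-in function `myadd_demo` (hypothetical, defined here so the block is self-contained) and the unexported helper `Base.return_types`:

```julia
# Hypothetical stand-in for `myadd`, so that the block is self-contained.
myadd_demo(x, y) = x + y

# There is only one generic method...
println("number of methods: ", length(methods(myadd_demo)))

# ...but type inference gives a separate result per argument-type combination.
for argtypes in ((Int, Int), (Float64, Float64), (Int, Float64))
    rt = Base.return_types(myadd_demo, argtypes)
    println(argtypes, " => ", rt[1])
end
```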


## Specialization is important!

Let's try to estimate the performance of our `myadd` function if Julia didn't specialize. We mimic this situation by wrapping our floating point numbers in a custom type that stores them internally as `Any`.

In [12]:
struct Anything
    value::Any
end

add(x::Number,y::Number) = x + y
add(x::Anything,y::Anything) = x.value + y.value

add (generic function with 2 methods)

In [15]:
@time add(1, 2);
@time add(1.0, 2.0);

x = Anything(1.0)
y = Anything(2.0)
@time add(x,y);

  0.000002 seconds (4 allocations: 160 bytes)
  0.000002 seconds (5 allocations: 176 bytes)
  0.000002 seconds (5 allocations: 176 bytes)


Oh, seems to be equally fast. Screw specialization.

**Benchmarking isn't trivial!** There are tools in Julia that help you avoid the most common mistakes.

### Interlude: BenchmarkTools.jl

In [34]:
using BenchmarkTools

In [20]:
x = rand(2,2)
@time zero(x)
@time zero(x)

  0.010022 seconds (12.60 k allocations: 671.829 KiB)
  0.000002 seconds (5 allocations: 272 bytes)


2×2 Array{Float64,2}:
 0.0  0.0
 0.0  0.0

In [21]:
@time zero(1)
@time zero(1)

  0.000009 seconds (5 allocations: 240 bytes)
  0.000002 seconds (4 allocations: 160 bytes)


0

This must be faster...

In [39]:
@benchmark zero(x)

BenchmarkTools.Trial: 
  memory estimate:  112 bytes
  allocs estimate:  1
  --------------
  minimum time:     37.325 ns (0.00% GC)
  median time:      38.880 ns (0.00% GC)
  mean time:        49.817 ns (13.54% GC)
  maximum time:     32.641 μs (99.79% GC)
  --------------
  samples:          10000
  evals/sample:     994

In [40]:
@benchmark zero(1)

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     0.001 ns (0.00% GC)
  median time:      0.001 ns (0.00% GC)
  mean time:        0.025 ns (0.00% GC)
  maximum time:     14.223 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

That makes more sense!

Typically we don't need all this information. Just use `@btime` instead of `@time`!

In [46]:
@btime zero(x);
@btime zero(1);

  37.870 ns (1 allocation: 112 bytes)
  0.001 ns (0 allocations: 0 bytes)


Some more features of BenchmarkTools.jl:

In [51]:
@btime zero($x); # interpolate the value of x into the expression to avoid overhead of globals

  25.970 ns (1 allocation: 112 bytes)
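
The overhead of globals mentioned in the comment arises because the compiler cannot assume a fixed type for a non-constant global variable. The sketch below illustrates this with type inference alone, without BenchmarkTools. The names `global_matrix`, `zero_of_global` and `zero_of_arg` are hypothetical, and `Base.return_types` is an unexported helper.

```julia
# Hypothetical names, chosen so they do not clash with the notebook's variables.
global_matrix = rand(2, 2)   # non-constant global: its type could change at any time

zero_of_global() = zero(global_matrix)   # reads the global inside the function
zero_of_arg(a) = zero(a)                 # receives the value as an argument

# The compiler cannot assume a type for the non-constant global...
println(Base.return_types(zero_of_global, Tuple{}))

# ...but it can specialize on the concrete type of an argument.
println(Base.return_types(zero_of_arg, (Matrix{Float64},)))
```

Interpolating with `$` has a similar effect to passing the value as an argument: the benchmarked expression sees a value of known type instead of a global lookup.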


In [50]:
@btime zero(x) setup=(x=rand(2,2));

  25.662 ns (1 allocation: 112 bytes)


See [BenchmarkTools.jl](https://github.com/JuliaCI/BenchmarkTools.jl/blob/master/doc/manual.md) for more information.

### Back to benchmarking Specialization

In [52]:
@btime add(1, 2);
@btime add(1.0, 2.0);

x = Anything(1.0)
y = Anything(2.0)
@btime add($x,$y);

  0.001 ns (0 allocations: 0 bytes)
  0.001 ns (0 allocations: 0 bytes)
  21.024 ns (1 allocation: 16 bytes)


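**That's orders of magnitude slower!** Taken literally, the ratio is about 20000, but a reading of `0.001 ns` is shorter than a single CPU cycle. It indicates that the compiler evaluated the call on constant arguments at compile time, so the exact factor should not be taken at face value.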
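
The slowdown can be traced back to type inference: with a field declared as `Any`, the compiler does not know what `x.value + y.value` will return. A minimal sketch, assuming a hypothetical wrapper `AnyBox` that mirrors `Anything` and a hypothetical function `add_demo` that mirrors `add`:

```julia
# Hypothetical wrapper mirroring `Anything` from the notebook.
struct AnyBox
    value::Any
end

add_demo(x::Number, y::Number) = x + y
add_demo(x::AnyBox, y::AnyBox) = x.value + y.value

# The field type is abstract, so nothing is known about the stored value.
println(fieldtype(AnyBox, :value), " is concrete: ", isconcretetype(fieldtype(AnyBox, :value)))

# Inference result for plain floats versus wrapped values.
println(Base.return_types(add_demo, (Float64, Float64)))
println(Base.return_types(add_demo, (AnyBox, AnyBox)))
```

When the return type cannot be inferred, the `+` call has to be dispatched at run time and the result is boxed, which is consistent with the single allocation reported above.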

## Explicit typing

Note that Julia's type inference is powerful. Specifying types isn't necessary for best performance!

In [92]:
function my_function(x)
    y = rand()
    z = rand()
    x+y+z
end

function my_function_typed(x::Int)::Float64
    y::Float64 = rand()
    z::Float64 = rand()
    x+y+z
end

my_function_typed (generic function with 1 method)

In [93]:
@btime my_function(10);
@btime my_function_typed(10);

  6.492 ns (0 allocations: 0 bytes)
  6.492 ns (0 allocations: 0 bytes)


However, explicit type annotations can serve one of the following purposes:

* **Define a user interface** (will error if incompatible type is given)
* Enforce conversions
* Help the compiler infer types in tricky situations
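
The first two purposes in the list above can be sketched as follows. All function names are hypothetical and serve only as an illustration.

```julia
# Return-type annotation: the result is converted to Float64 before returning.
as_float(x)::Float64 = x

# Local-variable annotation: every assignment to `total` is converted to Float64.
function sum_as_float(a::Int, b::Int)
    total::Float64 = a
    total = total + b
    return total
end

# Argument annotation: restricts which calls are accepted at all.
only_ints(x::Int) = x + 1

println(as_float(3))          # 3.0
println(sum_as_float(1, 2))   # 3.0

# Calling with an incompatible type raises a MethodError.
try
    only_ints(3.0)
catch err
    println(typeof(err))      # MethodError
end
```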

In [64]:
add_first_two(x) = x[1] + x[2]

add_first_two (generic function with 1 method)

In [70]:
add_first_two(1:10)

3

In [66]:
add_first_two(3)

BoundsError: BoundsError

In [67]:
add_first_two_better(x::AbstractArray) = x[1] + x[2]

add_first_two_better (generic function with 1 method)

In [69]:
add_first_two_better(3) # better error message

MethodError: MethodError: no method matching add_first_two_better(::Int64)
Closest candidates are:
  add_first_two_better(!Matched::AbstractArray) at In[67]:1

In [73]:
add_first_two_better(split("Das ist ein Test!"))

MethodError: MethodError: no method matching +(::SubString{String}, ::SubString{String})
Closest candidates are:
  +(::Any, ::Any, !Matched::Any, !Matched::Any...) at operators.jl:502

In [74]:
typeof(split("Das ist ein Test!"))

Array{SubString{String},1}

To define an even more precise interface, we first have to learn about parametric types.

### Coming back to our add_first_two example

**Quick exercise**: define `add_first_two_even_better` as a refined version of `add_first_two_better`. It should accept a reasonable subset of all `AbstractArray`s.