# Understandable performance
*Going fast, nowhere*

## A note on benchmarking
*Premature optimization is the root of all evil* & *If you don't measure you won't improve*

### Tools
1. BenchmarkTools.jl https://github.com/JuliaCI/BenchmarkTools.jl
2. Profiler https://docs.julialang.org/en/latest/manual/profile/
3. ProfileView.jl https://github.com/timholy/ProfileView.jl
4. VTune/Perf/OProfile https://docs.julialang.org/en/latest/manual/profile/#External-Profiling-1

## BenchmarkTools.jl
A solid package that tries to eliminate common pitfalls in performance measurement.
- The `@benchmark` macro repeatedly evaluates your code to gather enough samples
- Caveat: you probably want to interpolate your input data with `$`, so that it is not treated as a global variable
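
Why the `$` matters: without it, the input is looked up as a non-constant global inside the benchmark expression, and the compiler cannot know its type. The Base-only sketch below illustrates the same effect outside BenchmarkTools. The names `sum_global` and `sum_arg` are hypothetical, and the snippet assumes it runs at top level (as in a notebook cell), so that `data` really is a global.

```julia
# A non-constant global array.
data = rand(2^10)

# Reads the global directly: its type may change at any time,
# so the compiler cannot infer a concrete return type.
sum_global() = sum(data)

# Receives the value as an argument: the type is known at the call.
sum_arg(x) = sum(x)

inferred_global = first(Base.return_types(sum_global, Tuple{}))
inferred_arg = first(Base.return_types(sum_arg, (Vector{Float64},)))

println("global access infers:   ", inferred_global)
println("argument access infers: ", inferred_arg)
```

Interpolating with `$` puts the benchmark in the second situation: the value is passed in, not looked up.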

In [2]:
data = rand(2^10);

In [5]:
using BenchmarkTools
@benchmark sum($data)

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     76.940 ns (0.00% GC)
  median time:      76.966 ns (0.00% GC)
  mean time:        79.347 ns (0.00% GC)
  maximum time:     492.186 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     970

![Compiler](compiler.png)

![Compiler Stages](compiler-stages.png)

## Figuring out what is happening
The stages of the compiler
- `@code_lowered`
- `@code_typed` & `@code_warntype`
- `@code_llvm`
- `@code_native`

Where is a function defined?
`@which` & `@edit`
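
Several of these tools are also available as plain functions, which is handy in scripts where the `InteractiveUtils` standard library is not loaded automatically. A minimal sketch, using a hypothetical function `double`:

```julia
using InteractiveUtils  # provides @which, @edit, @code_llvm, @code_native, ...

double(x) = 2x  # hypothetical example function

# Which method will be called, and where is it defined?
m = @which double(3)
println(m)

# Lowered code: one CodeInfo per matching method.
lowered = code_lowered(double, (Int,))

# Typed code: pairs of CodeInfo => inferred return type.
typed = code_typed(double, (Int,))
println("inferred return type: ", last(first(typed)))
```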

In [1]:
##########################
# Low-level benchmarking #
##########################
using LLVM
using LLVM.Interop

"""
    clobber()

Force the compiler to flush pending writes to global memory.
Acts as an effective read/write barrier.
"""
@inline clobber() = @asmcall("", "~{memory}", true)

"""
    escape(val)

The `escape` function can be used to prevent a value or
expression from being optimized away by the compiler. This function is
intended to add little to no overhead.
See: https://youtu.be/nXaxk27zwlk?t=2441
"""
@inline escape(val::T) where T = @asmcall("", "X,~{memory}", true, Nothing, Tuple{T}, val)


escape (generic function with 1 method)

# A simple example first: counting

In [3]:
function f(N)
    acc = 0
    for i in 1:N
        acc += 1
    end
    return acc
end

f (generic function with 1 method)

In [68]:
N = 100_000_000
result = @benchmark f($N)

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.269 ns (0.00% GC)
  median time:      1.454 ns (0.00% GC)
  mean time:        1.476 ns (0.00% GC)
  maximum time:     18.439 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

In [69]:
t = time(minimum(result)) # in ns
N / (t * 1e-9) # in Hz

7.880220646178093e16

So we are doing 100 million additions in about 1.3 ns.
So our processor is operating at roughly 79 PHz...

We wish...

What is going on?

In [67]:
@benchmark f($(10*N))

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.268 ns (0.00% GC)
  median time:      1.540 ns (0.00% GC)
  mean time:        1.470 ns (0.00% GC)
  maximum time:     18.724 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

In [30]:
@code_lowered f(N)

CodeInfo(
2 1 ─       acc = 0
3 │   %2  = 1:N
  │         #temp# = (Base.iterate)(%2)
  │   %4  = #temp# === nothing
  │   %5  = (Base.not_int)(%4)
  └──       goto #4 if not %5
  2 ┄ %7  = #temp#
  │         i = (Core.getfield)(%7, 1)
  │   %9  = (Core.getfield)(%7, 2)
4 │         acc = acc + 1
  │         #temp# = (Base.iterate)(%2, %9)
  │   %12 = #temp# === nothing
  │   %13 = (Base.not_int)(%12)
  └──       goto #4 if not %13
  3 ─       goto #2
6 4 ─       return acc
)

In [31]:
@code_typed optimize=false f(N)

CodeInfo(
2 1 ─       (acc = 0)::Const(0, false)
3 │   %2  = (1:N)::UnitRange{Int64}
  │         (#temp# = (Base.iterate)(%2))::Union{Nothing, Tuple{Int64,Int64}}
  │   %4  = (#temp# === nothing)::Bool
  │   %5  = (Base.not_int)(%4)::Bool
  └──       goto #4 if not %5
  2 ┄ %7  = #temp#::Tuple{Int64,Int64}::Tuple{Int64,Int64}
  │         (i = (Core.getfield)(%7, 1))::Int64
  │   %9  = (Core.getfield)(%7, 2)::Int64
4 │         (acc = acc + 1)::Int64
  │         (#temp# = (Base.iterate)(%2, %9))::Union{Nothing, Tuple{Int64,Int64}}
  │

In [32]:
@code_typed optimize=true f(N)

CodeInfo(
3 1 ── %1  = (Base.sle_int)(1, N)::Bool               │╻╷╷╷╷ Colon
  │          (Base.sub_int)(N, 1)::Int64              ││╻     Type
  │    %3  = (Base.ifelse)(%1, N, 0)::Int64           │││┃     unitrange_last
  │    %4  = (Base.slt_int)(%3, 1)::Bool              ││╻╷╷   isempty
  └───       goto #3 if not %4                        ││
  2 ──       goto #4                                  ││
  3 ──       goto #4                                  ││
  4 ┄─ %8  = φ (#2 => true, #3 => false)::Bool        │
  │    %9  = φ (#3 => 1)::Int64                       │
  │    %10 = (Base.not_int)(%8)::Bool                 │
  └───       goto #10 if not %10                      │
  5 ┄─ %12 = φ (#4 => 0, #9 => %14)::Int64            │

In [33]:
@code_llvm optimize=false f(10)


; Function f
; Location: In[3]:2
define i64 @julia_f_37380(i64) {
top:
  %1 = call %jl_value_t*** @julia.ptls_states()
  %2 = bitcast %jl_value_t*** %1 to %jl_value_t addrspace(10)**
  %3 = getelementptr inbounds %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)** %2, i64 3
  %4 = bitcast %jl_value_t addrspace(10)** %3 to i64**
  %5 = load i64*, i64** %4
; Location: In[3]:3
; Function Colon; {
; Location: range.jl:5
; Function Type; {
; Location: range.jl:255
; Function unitrange_last; {
; Location: range.jl:260
; Function >=; {
; Location: operators.jl:333
; Function <=; {
; Location: int.jl:428
  %6 = icmp sle i64 1, %0
  %7 = zext i1 %6 to i8
;}}
; Function -; {
; Location: int.jl:52
  %8 = sub i64 %0, 1
;}
  %9 = trunc i8 %7 to i1
  %10 = xor i1 %9, true
  %11 = select i1 %10, i64 0, i64 %0
;}}}
; Function iterate; {
; Location: range.jl:571
; Function isempty; {
; Location: range.jl:455
; Function >; {
; Location: operators.jl:286
; Function <; {
; Location: int.jl:49
  %12 =

In [34]:
@code_llvm optimize=true f(10)


; Function f
; Location: In[3]:2
define i64 @julia_f_37380(i64) {
top:
; Location: In[3]:3
; Function Colon; {
; Location: range.jl:5
; Function Type; {
; Location: range.jl:255
; Function unitrange_last; {
; Location: range.jl:260
; Function >=; {
; Location: operators.jl:333
; Function <=; {
; Location: int.jl:428
  %1 = icmp sgt i64 %0, 0
;}}}}}
  %spec.select = select i1 %1, i64 %0, i64 0
; Location: In[3]:6
  ret i64 %spec.select
}


In [36]:
@code_native f(10)

	.text
; Function f {
; Location: In[3]:2
	movq	%rdi, %rax
	sarq	$63, %rax
	andnq	%rdi, %rax, %rax
; Location: In[3]:6
	retq
	nopl	(%rax)
;}


# Conclusion

LLVM realised that our loop

```julia
for i in 1:N
  acc += 1
end
```

just computes $acc = N$ (or $0$ when $N \le 0$), and replaced it with that closed form.
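
A quick sanity check of that closed form, restating `f` from above so the snippet runs on its own. The comparison against `max(N, 0)` mirrors the `select` in the optimised LLVM IR:

```julia
function f(N)
    acc = 0
    for i in 1:N
        acc += 1
    end
    return acc
end

# The loop result matches the closed form for every N we try,
# including empty ranges (N <= 0).
for N in (-5, 0, 1, 10, 1_000)
    @assert f(N) == max(N, 0)
end
println(f(10))  # 10
```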

# Exercise

What happens with:

```julia
function h(N)
    acc = 0.0
    for i in 1:N
        acc += 1.0
    end
    acc
end
```

and

```julia
function g(N)
    acc = 0
    for i in 1:N
        acc += 1.0
    end
    acc
end
```
    

In [53]:
function h(N)
    acc = 0.0
    for i in 1:N
        acc += 1.0
    end
    acc
end

h (generic function with 1 method)

In [55]:
@code_native h(10)

	.text
; Function h {
; Location: In[53]:2
	vxorpd	%xmm0, %xmm0, %xmm0
; Location: In[53]:3
; Function Colon; {
; Location: range.jl:5
; Function Type; {
; Location: range.jl:255
; Function unitrange_last; {
; Location: range.jl:260
; Function >=; {
; Location: operators.jl:333
; Function <=; {
; Location: int.jl:428
	testq	%rdi, %rdi
;}}}}}
	jle	L42
	movabsq	$139722268546304, %rax  # imm = 0x7F13A02F1500
	vmovsd	(%rax), %xmm1           # xmm1 = mem[0],zero
	nopw	(%rax,%rax)
; Location: In[53]:4
; Function +; {
; Location: float.jl:395
L32:
	vaddsd	%xmm1, %xmm0, %xmm0
;}
; Function iterate; {
; Location: range.jl:575
; Function ==; {
; Location: promotion.jl:425
	addq	$-1, %rdi
;}}
	jne	L32
; Location: In[53]:6
L42:
	retq
	nopl	(%rax,%rax)
;}


In [56]:
function g(N)
    acc = 0
    for i in 1:N
        acc += 1.0
    end
    acc
end

g (generic function with 1 method)

In [57]:
@code_warntype g(10)

Body::Union{Float64, Int64}
3 1 ── %1  = (Base.sle_int)(1, N)::Bool               │╻╷╷╷╷ Colon
  │          (Base.sub_int)(N, 1)                     ││╻     Type
  │    %3  = (Base.ifelse)(%1, N, 0)::Int64           │││┃     unitrange_last
  │    %4  = (Base.slt_int)(%3, 1)::Bool              ││╻╷╷   isempty
  └───       goto #3 if not %4                        ││
  2 ──       goto #4                                  ││
  3 ──       goto #4                                  ││
  4 ┄─ %8  = φ (#2 => true, #3 => false)::Bool        │
  │    %9  = φ (#3 => 1)::Int64                       │
  │    %10 = (Base.not_int)(%8)::Bool                 │
  └───       goto #15 if not %10                      │
  5 ┄─ %12 = φ (#4 => 0, #14 => %27)::Union{

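The `Union{Float64, Int64}` in the `@code_warntype` output for `g` is the problem: `acc` starts as an `Int64` and becomes a `Float64` after the first iteration, so the return type depends on the value of `N`. A small sketch that makes this visible at run time (both functions are restated so the snippet is self-contained; the expected types assume a 64-bit machine):

```julia
function g(N)
    acc = 0          # Int64
    for i in 1:N
        acc += 1.0   # Int64 + Float64 -> Float64
    end
    acc
end

function h(N)
    acc = 0.0        # Float64 from the start
    for i in 1:N
        acc += 1.0
    end
    acc
end

println(typeof(g(0)), " vs ", typeof(g(1)))   # Int64 vs Float64
println(first(Base.return_types(g, (Int,))))  # Union{Float64, Int64}
println(first(Base.return_types(h, (Int,))))  # Float64
```
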
In [46]:
function k(::Type{T}, N) where T
    acc = zero(T)
    for i in 1:N
        acc += one(T)
        clobber()
    end
    return acc
end

k (generic function with 1 method)

In [49]:
@code_native k(Float64, 10)

	.text
; Function k {
; Location: In[46]:2
	vxorpd	%xmm0, %xmm0, %xmm0
; Location: In[46]:3
; Function Colon; {
; Location: range.jl:5
; Function Type; {
; Location: range.jl:255
; Function unitrange_last; {
; Location: range.jl:260
; Function >=; {
; Location: operators.jl:333
; Function <=; {
; Location: int.jl:428
	testq	%rsi, %rsi
;}}}}}
	jle	L42
	movabsq	$139722268545512, %rax  # imm = 0x7F13A02F11E8
	vmovsd	(%rax), %xmm1           # xmm1 = mem[0],zero
	nopw	(%rax,%rax)
; Location: In[46]:4
; Function +; {
; Location: float.jl:395
L32:
	vaddsd	%xmm1, %xmm0, %xmm0
;}
; Location: In[46]:5
; Function iterate; {
; Location: range.jl:575
; Function ==; {
; Location: base.jl:42
	addq	$-1, %rsi
;}}
	jne	L32
; Location: In[46]:7
L42:
	retq
	nopl	(%rax,%rax)
;}


In [52]:
@code_native k(Int64, 10)

	.text
; Function k {
; Location: In[46]:3
; Function Colon; {
; Location: range.jl:5
; Function Type; {
; Location: range.jl:255
; Function unitrange_last; {
; Location: range.jl:260
; Function >=; {
; Location: operators.jl:333
; Function <=; {
; Location: In[46]:2
	testq	%rsi, %rsi
;}}}}}
	jle	L26
	movq	%rsi, %rax
	nopl	(%rax,%rax)
; Location: In[46]:5
; Function iterate; {
; Location: range.jl:575
; Function ==; {
; Location: base.jl:42
L16:
	addq	$-1, %rax
;}}
	jne	L16
; Location: In[46]:7
	movq	%rsi, %rax
	retq
L26:
	xorl	%esi, %esi
; Location: In[46]:7
	movq	%rsi, %rax
	retq
;}


In [58]:
function m(::Type{T}, N) where T
    acc = zero(T)
    for i in 1:N
        acc += one(T)
        escape(acc)
    end
    return acc
end

m (generic function with 1 method)
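
If pulling in LLVM.jl only for `escape` is too heavy, recent Julia versions ship a similar barrier. The sketch below assumes Julia 1.8 or newer, where `Base.donotdelete` is available. It is not exported and not part of the original notebook, and `m2` is a hypothetical name. Whether it yields the same machine code as `escape` is not verified here.

```julia
function m2(::Type{T}, N) where T
    acc = zero(T)
    for i in 1:N
        acc += one(T)
        # Ask the compiler to treat `acc` as used in every iteration,
        # so the loop should not be folded into a closed form.
        Base.donotdelete(acc)
    end
    return acc
end

println(m2(Int64, 30))    # 30
println(m2(Float64, 30))  # 30.0
```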

In [61]:
@code_native m(Int64, 30)

	.text
; Function m {
; Location: In[58]:3
; Function Colon; {
; Location: range.jl:5
; Function Type; {
; Location: range.jl:255
; Function unitrange_last; {
; Location: range.jl:260
; Function >=; {
; Location: operators.jl:333
; Function <=; {
; Location: In[58]:2
	testq	%rsi, %rsi
;}}}}}
	jle	L38
	movq	%rsi, %rax
	negq	%rax
	movl	$1, %ecx
; Location: In[58]:5
; Function iterate; {
; Location: range.jl:575
; Function ==; {
; Location: base.jl:42
L16:
	leaq	(%rax,%rcx), %rdx
	addq	$1, %rdx
;}
; Location: range.jl:576
; Function +; {
; Location: int.jl:53
	addq	$1, %rcx
;}
; Location: range.jl:575
; Function ==; {
; Location: promotion.jl:425
	cmpq	$1, %rdx
;}}
	jne	L16
; Location: In[58]:7
	movq	%rsi, %rax
	retq
L38:
	xorl	%esi, %esi
; Location: In[58]:7
	movq	%rsi, %rax
	retq
	nopl	(%rax)
;}


In [72]:
result2 = @benchmark m($Int64, $N)

BenchmarkTools.Trial: 
  memory estimate:  32 bytes
  allocs estimate:  2
  --------------
  minimum time:     26.050 ms (0.00% GC)
  median time:      29.182 ms (0.00% GC)
  mean time:        37.582 ms (0.00% GC)
  maximum time:     86.320 ms (0.00% GC)
  --------------
  samples:          133
  evals/sample:     1

In [73]:
@benchmark m($Int64, $(N*10))

BenchmarkTools.Trial: 
  memory estimate:  32 bytes
  allocs estimate:  2
  --------------
  minimum time:     251.978 ms (0.00% GC)
  median time:      267.794 ms (0.00% GC)
  mean time:        277.873 ms (0.00% GC)
  maximum time:     368.741 ms (0.00% GC)
  --------------
  samples:          18
  evals/sample:     1

In [75]:
t = time(minimum(result2)) # in ns
N / (t * 1e-9) # in Hz

3.838813444261734e9

Sanity restored: 3.8 GHz is much closer to the frequency of my actual processor.
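
Both throughput estimates use the same arithmetic: iterations divided by elapsed seconds. A small sketch with a hypothetical helper `rate_hz`, plugging in the minimum times reported above (1.269 ns for `f`, 26.050 ms for `m`):

```julia
# Iterations per second, given a time in nanoseconds.
rate_hz(n, t_ns) = n / (t_ns * 1e-9)

N = 100_000_000

rate_folded = rate_hz(N, 1.269)        # loop optimised away by LLVM
rate_executed = rate_hz(N, 26.050e6)   # loop actually executed

println(rate_folded / 1e15, " PHz")    # roughly 79 PHz
println(rate_executed / 1e9, " GHz")   # roughly 3.8 GHz
```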

Note: Benchmarking is hard; think carefully about *what* you are trying to benchmark.

- If we were just interested in how fast `f(N)` was, we would have been fine with our first measurement
- But we were interested in the speed of addition as a proxy for performance
- Integer math on a computer is associative, floating-point math is not.
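
The last point is why LLVM could fold the integer loop but not the floating-point one: regrouping integer additions never changes the result (even on overflow, which wraps around), while regrouping floating-point additions can. A minimal illustration:

```julia
# Floating point: the grouping changes the rounding.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
println(a == b)  # false
println(a, " vs ", b)

# Integers: wrap-around arithmetic is still associative.
x = typemax(Int)
println((x + 1) - 1 == x + (1 - 1))  # true
```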

In [79]:
@benchmark h($N)

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     100.564 ms (0.00% GC)
  median time:      104.880 ms (0.00% GC)
  mean time:        107.600 ms (0.00% GC)
  maximum time:     153.669 ms (0.00% GC)
  --------------
  samples:          47
  evals/sample:     1

In [6]:
function l(N)
    acc = 0.0
    @simd for i in 1:N
        acc += 1.0
    end
    acc
end

l (generic function with 1 method)

In [85]:
@benchmark l($N)

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     6.266 ms (0.00% GC)
  median time:      6.351 ms (0.00% GC)
  mean time:        6.416 ms (0.00% GC)
  maximum time:     13.143 ms (0.00% GC)
  --------------
  samples:          779
  evals/sample:     1

# Performance annotations in Julia

- https://docs.julialang.org/en/v1/manual/performance-tips/
- Julia does bounds checking by default: `ones(10)[11]` is an error
- `@inbounds` turns off bounds checking locally
- `@fastmath` turns off strict IEEE 754 semantics locally -- be very careful, this might not do what you want
- `@simd` and `@simd ivdep` give stronger guarantees to encourage LLVM to use SIMD operations
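
A sketch of how these annotations combine in practice, using hypothetical functions `sum_checked` and `sum_fast`. Because `@simd` allows the additions to be reordered, the two results may differ in the last bits, so compare with `≈` rather than `==`:

```julia
function sum_checked(xs::AbstractVector{T}) where T
    acc = zero(T)
    for i in eachindex(xs)
        acc += xs[i]          # bounds-checked, strict evaluation order
    end
    return acc
end

function sum_fast(xs::AbstractVector{T}) where T
    acc = zero(T)
    # eachindex only yields valid indices, so skipping the checks is safe.
    @inbounds @simd for i in eachindex(xs)
        acc += xs[i]
    end
    return acc
end

xs = rand(10_000)
println(sum_checked(xs) ≈ sum_fast(xs))  # true
```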

In [86]:
?@simd

```
@simd
```

Annotate a `for` loop to allow the compiler to take extra liberties to allow loop re-ordering

!!! warning
    This feature is experimental and could change or disappear in future versions of Julia. Incorrect use of the `@simd` macro may cause unexpected results.


The object iterated over in a `@simd for` loop should be a one-dimensional range. By using `@simd`, you are asserting several properties of the loop:

```
* It is safe to execute iterations in arbitrary or overlapping order, with special consideration for reduction variables.
* Floating-point operations on reduction variables can be reordered, possibly causing different results than without `@simd`.
```

In many cases, Julia is able to automatically vectorize inner for loops without the use of `@simd`. Using `@simd` gives the compiler a little extra leeway to make it possible in more situations. In either case, your inner loop should have the following properties to allow vectorization:

```
* The loop must be an innermost loop
* The loop body must be straight-line code. Therefore, [`@inbounds`](@ref) is
  currently needed for all array accesses. The compiler can sometimes turn
  short `&&`, `||`, and `?:` expressions into straight-line code if it is safe
  to evaluate all operands unconditionally. Consider using the [`ifelse`](@ref)
  function instead of `?:` in the loop if it is safe to do so.
* Accesses must have a stride pattern and cannot be "gathers" (random-index
  reads) or "scatters" (random-index writes).
* The stride should be unit stride.
```

!!! note
    The `@simd` does not assert by default that the loop is completely free of loop-carried memory dependencies, which is an assumption that can easily be violated in generic code. If you are writing non-generic code, you can use `@simd ivdep for ... end` to also assert that:

    ```
    * There exists no loop-carried memory dependencies
    * No iteration ever waits on a previous iteration to make forward progress.
    ```



In [82]:
@code_llvm l(10)


; Function l
; Location: In[78]:2
define double @julia_l_38030(i64) {
top:
; Location: In[78]:3
; Function macro expansion; {
; Location: simdloop.jl:65
; Function Colon; {
; Location: range.jl:5
; Function Type; {
; Location: range.jl:255
; Function unitrange_last; {
; Location: range.jl:260
; Function >=; {
; Location: operators.jl:333
; Function <=; {
; Location: int.jl:428
  %1 = icmp sgt i64 %0, 0
;}}
  %2 = select i1 %1, i64 %0, i64 0
;}}}
; Location: simdloop.jl:67
; Function simd_inner_length; {
; Location: simdloop.jl:47
; Function length; {
; Location: range.jl:521
; Function checked_sub; {
; Location: checked.jl:226
; Function sub_with_overflow; {
; Location: checked.jl:198
  %3 = add nsw i64 %2, -1
;}}
; Function checked_add; {
; Location: checked.jl:169
; Function add_with_overflow; {
; Location: checked.jl:136
  %4 = call { i64, i1 } @llvm.sadd.with.overflow.i64(i64 %3, i64 1)
  %5 = extractvalue { i64, i1 } %4, 1
;}
; Location: checked.jl:170
  br i1 %5, label %L27, lab

# Let's revisit our example from earlier!

Slightly more complicated function!

- What is wrong with `mysum3(ones(10_000))`?

In [10]:
function mysum3(data::Vector{T}) where T<:Number
  acc = zero(T)
  for x in data
      acc += x
  end
  return acc
end

mysum3 (generic function with 1 method)

In [9]:
@code_warntype mysum3(zeros(3))

Body::Union{Float64, Int64}
3 1 ── %1  = (Base.arraylen)(data)::Int64               │╻╷╷   iterate
  │    %2  = (Base.sle_int)(0, %1)::Bool                ││╻╷    iterate
  │    %3  = (Base.bitcast)(UInt64, %1)::UInt64         │││╻     <
  │    %4  = (Base.ult_int)(0x0000000000000000, %3)::Bool ││││╻     <
  │    %5  = (Base.and_int)(%2, %4)::Bool               ││││╻     &
  └───       goto #3 if not %5                          │││
  2 ── %7  = (Base.arrayref)(false, data, 1)::Float64   │││╻     getindex
  └───       goto #4                                    │││
  3 ──       goto #4                                    │││
  4 ┄─ %10 = φ (#2 => false, #3 => true)::Bool          ││
  │    %11 = φ (#2 => %7)::Float64                      ││

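The `Union{Float64, Int64}` return type reported by `@code_warntype` for `mysum3` is what an accumulator initialised with the integer literal `0` would produce. A definition that starts from `zero(T)` should infer cleanly. The sketch below checks both with `Test.@inferred`. Note that `mysum_literal` is a hypothetical variant assumed for illustration, not code taken from the notebook:

```julia
using Test

# Hypothetical variant: the accumulator starts as an Int.
function mysum_literal(data::Vector{T}) where T<:Number
    acc = 0
    for x in data
        acc += x
    end
    return acc
end

# Accumulator matches the element type.
function mysum3(data::Vector{T}) where T<:Number
    acc = zero(T)
    for x in data
        acc += x
    end
    return acc
end

@inferred mysum3(zeros(3))  # passes: inferred and actual type agree
@test_throws ErrorException @inferred mysum_literal(zeros(3))
println(typeof(mysum_literal(Float64[])))  # Int64, not Float64
```
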
# Task

- Write a fast and generic `sum` implementation.

# From performance to generic code
- Up until now I have been heavily focused on performance
- Mostly because I am a low-level person and this excites me!
- Performance was the reason why I came to Julia, but I stayed because of the features
- Let's talk about composable and generic code.