# Performance e Operações Paralelas

In [56]:
using Pkg

using BenchmarkTools
using LoopVectorization
using Plots
using StaticArrays
using ThreadsX

using CUDA # há vá?!?!

using Random: seed!, shuffle
using Statistics: mean

# Random Seed
seed!(123)

# Printar cores no terminal
using ANSIColoredPrinters

# Checklist basico para _performance_

1. Arrumar instabilidade de tipo
2. Usar variáveis locais ao invés de gloabais
3. Deixar tudo imutável se possível
4. Desativar checagem de índice em operações com `Array`
5. Ativar suporte SIMD em todos os loops `for` (Single Instruction Multiple Data)

# 1. Arrumar instabilidade de tipo

> Tipo de saída de uma função é __imprevisível__ a partir dos tipos de entradas. Em particular, isso significa que o tipo de saída __pode variar__ dependendo dos valores das entradas.

In [2]:
function positivo(x)
    if x > 0
        return x
    else
        return 0
    end
end

positivo (generic function with 1 method)

> função com untabilidade de tipo >>> `x` é o que?

`@code_warntype` avalia a função como um argumento e prenta um _Abstract Syntax Tree_ (AST)

In [3]:
@code_warntype positivo(-3.4)

Variables
  #self#[36m::Core.Const(positivo)[39m
  x[36m::Float64[39m

Body[91m[1m::Union{Float64, Int64}[22m[39m
[90m1 ─[39m %1 = (x > 0)[36m::Bool[39m
[90m└──[39m      goto #3 if not %1
[90m2 ─[39m      return x
[90m3 ─[39m      return 0


> onde ficou vermelho, tem algum problema de tipo

### Arrumando a instabilidade de tipo

Anotar os tipos!

In [4]:
function positivo_stabel(x::AbstractFloat)
    if x > 0
        return x 
    else
        return 0.0
    end
end

positivo_stabel (generic function with 1 method)

In [5]:
function positivo_stabel(x::Integer)
    if x > 0
        return x 
    else
        return 0
    end
end

positivo_stabel (generic function with 2 methods)

In [6]:
@code_warntype positivo_stabel(-3.4)

Variables
  #self#[36m::Core.Const(positivo_stabel)[39m
  x[36m::Float64[39m

Body[36m::Float64[39m
[90m1 ─[39m %1 = (x > 0)[36m::Bool[39m
[90m└──[39m      goto #3 if not %1
[90m2 ─[39m      return x
[90m3 ─[39m      return 0.0


In [7]:
@code_warntype positivo_stabel(-3)

Variables
  #self#[36m::Core.Const(positivo_stabel)[39m
  x[36m::Int64[39m

Body[36m::Int64[39m
[90m1 ─[39m %1 = (x > 0)[36m::Bool[39m
[90m└──[39m      goto #3 if not %1
[90m2 ─[39m      return x
[90m3 ─[39m      return 0


### Pq é importante anotar os tipos?

In [8]:
x = rand(1_000);

In [9]:
@benchmark positivo.($x)

BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range [90m([39m[36m[1mmin[22m[39m … [35mmax[39m[90m):  [39m[36m[1m1.960 μs[22m[39m … [35m 1.343 ms[39m  [90m┊[39m GC [90m([39mmin … max[90m): [39m 0.00% … 99.38%
 Time  [90m([39m[34m[1mmedian[22m[39m[90m):     [39m[34m[1m3.830 μs              [22m[39m[90m┊[39m GC [90m([39mmedian[90m):    [39m 0.00%
 Time  [90m([39m[32m[1mmean[22m[39m ± [32mσ[39m[90m):   [39m[32m[1m7.102 μs[22m[39m ± [32m36.201 μs[39m  [90m┊[39m GC [90m([39mmean ± σ[90m):  [39m18.09% ±  3.58%

  [39m▅[39m▂[39m [39m [39m [39m [39m█[39m█[34m▅[39m[39m▃[39m▃[39m▂[39m▂[39m▁[39m▂[39m▃[39m▃[39m▂[39m▂[39m▁[39m [39m [32m [39m[39m [39m [39m [39m [39m [39m▄[39m▆[39m▅[39m▄[39m▃[39m▂[39m▂[39m▁[39m▁[39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m▂
  [39m█[39m█[39m▇[39m▇[39m▇[39m▇[39m

In [10]:
@benchmark positivo_stabel.($x)

BenchmarkTools.Trial: 10000 samples with 174 evaluations.
 Range [90m([39m[36m[1mmin[22m[39m … [35mmax[39m[90m):  [39m[36m[1m801.149 ns[22m[39m … [35m84.583 μs[39m  [90m┊[39m GC [90m([39mmin … max[90m): [39m 0.00% … 96.27%
 Time  [90m([39m[34m[1mmedian[22m[39m[90m):     [39m[34m[1m  1.178 μs              [22m[39m[90m┊[39m GC [90m([39mmedian[90m):    [39m 0.00%
 Time  [90m([39m[32m[1mmean[22m[39m ± [32mσ[39m[90m):   [39m[32m[1m  2.050 μs[22m[39m ± [32m 3.333 μs[39m  [90m┊[39m GC [90m([39mmean ± σ[90m):  [39m18.37% ± 12.55%

  [39m▄[34m█[39m[39m▅[39m▃[39m▂[32m [39m[39m [39m [39m [39m [39m▃[39m▃[39m▂[39m▂[39m▁[39m▁[39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m▁
  [39m█[34m█[39m[39m

### Dicas!

Dar preferência para usar __tipos abstrtatos__ ao invés de tipos concretos.

`AbstractFlosa` >>> `Float64` or `Float32`  
`Integer` >>> `Int128` or `Int64` or `Int342`

### Tipos paramétricos

São introduzidos com as chaves `{}` e usando palavras-chave `where`

In [11]:
# por exemplo qualquer subtipo de `Real`
subtypes(Real)

8-element Vector{Any}:
 AbstractFloat
 AbstractIrrational
 FixedPointNumbers.FixedPoint
 Integer
 Rational
 StatsBase.PValue
 StatsBase.TestStat
 VectorizationBase.AbstractSIMD

In [12]:
function positivo_stable2(x::T) where T <: Real
    if x > 0
        return x
    else
        return 0::T
    end
end

positivo_stable2 (generic function with 1 method)

In [13]:
@code_warntype positivo_stable2(-3.4)

Variables
  #self#[36m::Core.Const(positivo_stable2)[39m
  x[36m::Float64[39m

Body[36m::Float64[39m
[90m1 ─[39m %1 = (x > 0)[36m::Bool[39m
[90m└──[39m      goto #3 if not %1
[90m2 ─[39m      return x
[90m3 ─[39m      Core.typeassert(0, $(Expr(:static_parameter, 1)))
[90m└──[39m      Core.Const(:(return %4))


In [14]:
@code_warntype positivo_stable2(-3)

Variables
  #self#[36m::Core.Const(positivo_stable2)[39m
  x[36m::Int64[39m

Body[36m::Int64[39m
[90m1 ─[39m %1 = (x > 0)[36m::Bool[39m
[90m└──[39m      goto #3 if not %1
[90m2 ─[39m      return x
[90m3 ─[39m %4 = Core.typeassert(0, $(Expr(:static_parameter, 1)))[36m::Core.Const(0)[39m
[90m└──[39m      return %4


### Funciona tambem com `Array`s

* `AbstractArray{T, N}` 
* `AbstractMatrix{T}` atalho para `AbstractArray{T, 2}` 
* `AbstractVector{T}` atalho para `AbstractArray{T, 1}` 

In [15]:
# específico para vector
function meus_zeros(X::AbstractVector{T}) where T <: Real
    return zeros(eltype(x), size(x))    
end

meus_zeros (generic function with 1 method)

In [16]:
# Generalizado para qualquer dimenmção de Array
function meus_zeros(X::AbstractArray{T, N}) where T <: Real where N <: Integer
    return zeros(eltype(x), size(x))    
end

meus_zeros (generic function with 2 methods)

In [17]:
@code_warntype meus_zeros([1, 0, 3])

Variables
  #self#[36m::Core.Const(meus_zeros)[39m
  X[36m::Vector{Int64}[39m

Body[91m[1m::Any[22m[39m
[90m1 ─[39m %1 = Main.eltype(Main.x)[91m[1m::Any[22m[39m
[90m│  [39m %2 = Main.size(Main.x)[91m[1m::Any[22m[39m
[90m│  [39m %3 = Main.zeros(%1, %2)[91m[1m::Any[22m[39m
[90m└──[39m      return %3


# 2. Usar variáveis locais ao invés de gloabais

Com variáveis globais o compilador LLVM tem dificuldades em otimizar o código Assembly

In [18]:
# var global 
x = rand(1_000);

In [19]:
function sum_global()
    s = 0.0
    for i ∈ x 
        s += i
    end
    return s
end

sum_global (generic function with 1 method)

In [20]:
@benchmark sum_global()

BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range [90m([39m[36m[1mmin[22m[39m … [35mmax[39m[90m):  [39m[36m[1m 78.700 μs[22m[39m … [35m  8.218 ms[39m  [90m┊[39m GC [90m([39mmin … max[90m): [39m0.00% … 98.30%
 Time  [90m([39m[34m[1mmedian[22m[39m[90m):     [39m[34m[1m 80.200 μs               [22m[39m[90m┊[39m GC [90m([39mmedian[90m):    [39m0.00%
 Time  [90m([39m[32m[1mmean[22m[39m ± [32mσ[39m[90m):   [39m[32m[1m107.664 μs[22m[39m ± [32m204.285 μs[39m  [90m┊[39m GC [90m([39mmean ± σ[90m):  [39m5.60% ±  3.24%

  [34m█[39m[39m▄[39m▂[39m▁[39m▂[39m▂[39m [32m [39m[39m▂[39m [39m [39m [39m [39m [39m [39m [39m [39m▃[39m▂[39m▁[39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m▁
  [34m█[39m[39m█[

In [21]:
function sum_arg(x)
    s = 0.0
    for i ∈ x 
        s += i
    end
    return s
end

sum_arg (generic function with 1 method)

In [22]:
@benchmark sum_arg($x)

BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range [90m([39m[36m[1mmin[22m[39m … [35mmax[39m[90m):  [39m[36m[1m1.100 μs[22m[39m … [35m 4.370 μs[39m  [90m┊[39m GC [90m([39mmin … max[90m): [39m0.00% … 0.00%
 Time  [90m([39m[34m[1mmedian[22m[39m[90m):     [39m[34m[1m1.110 μs              [22m[39m[90m┊[39m GC [90m([39mmedian[90m):    [39m0.00%
 Time  [90m([39m[32m[1mmean[22m[39m ± [32mσ[39m[90m):   [39m[32m[1m1.116 μs[22m[39m ± [32m74.873 ns[39m  [90m┊[39m GC [90m([39mmean ± σ[90m):  [39m0.00% ± 0.00%

  [39m▄[39m [39m [34m█[39m[39m [32m [39m[39m▄[39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m▁
  [39m█[39m▁[39m▁[34m█[39m[39m▁[32m▁[39m

In [23]:
@code_warntype sum_global()

Variables
  #self#[36m::Core.Const(sum_global)[39m
  @_2[91m[1m::Any[22m[39m
  s[91m[1m::Any[22m[39m
  i[91m[1m::Any[22m[39m

Body[91m[1m::Any[22m[39m
[90m1 ─[39m       (s = 0.0)
[90m│  [39m %2  = Main.x[91m[1m::Any[22m[39m
[90m│  [39m       (@_2 = Base.iterate(%2))
[90m│  [39m %4  = (@_2 === nothing)[36m::Bool[39m
[90m│  [39m %5  = Base.not_int(%4)[36m::Bool[39m
[90m└──[39m       goto #4 if not %5
[90m2 ┄[39m %7  = @_2[91m[1m::Any[22m[39m
[90m│  [39m       (i = Core.getfield(%7, 1))
[90m│  [39m %9  = Core.getfield(%7, 2)[91m[1m::Any[22m[39m
[90m│  [39m       (s = s + i)
[90m│  [39m       (@_2 = Base.iterate(%2, %9))
[90m│  [39m %12 = (@_2 === nothing)[36m::Bool[39m
[90m│  [39m %13 = Base.not_int(%12)[36m::Bool[39m
[90m└──[39m       goto #4 if not %13
[90m3 ─[39m       goto #2
[90m4 ┄[39m       return s


In [24]:
@code_warntype sum_arg(x)

Variables
  #self#[36m::Core.Const(sum_arg)[39m
  x[36m::Vector{Float64}[39m
  @_3[33m[1m::Union{Nothing, Tuple{Float64, Int64}}[22m[39m
  s[36m::Float64[39m
  i[36m::Float64[39m

Body[36m::Float64[39m
[90m1 ─[39m       (s = 0.0)
[90m│  [39m %2  = x[36m::Vector{Float64}[39m
[90m│  [39m       (@_3 = Base.iterate(%2))
[90m│  [39m %4  = (@_3 === nothing)[36m::Bool[39m
[90m│  [39m %5  = Base.not_int(%4)[36m::Bool[39m
[90m└──[39m       goto #4 if not %5
[90m2 ┄[39m %7  = @_3::Tuple{Float64, Int64}[36m::Tuple{Float64, Int64}[39m
[90m│  [39m       (i = Core.getfield(%7, 1))
[90m│  [39m %9  = Core.getfield(%7, 2)[36m::Int64[39m
[90m│  [39m       (s = s + i)
[90m│  [39m       (@_3 = Base.iterate(%2, %9))
[90m│  [39m %12 = (@_3 === nothing)[36m::Bool[39m
[90m│  [39m %13 = Base.not_int(%12)[36m::Bool[39m
[90m└──[39m       goto #4 if not %13
[90m3 ─[39m       goto #2
[90m4 ┄[39m       return s


## Se tiver de usar variáveis globeis, use `const`

In [25]:
# var global constante
const const_x = rand(1_000);

In [26]:
function sum_const_global()
    s = 0.0
    for i ∈ const_x 
        s += i
    end
    return s
end

sum_const_global (generic function with 1 method)

In [27]:
@benchmark sum_const_global()

BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range [90m([39m[36m[1mmin[22m[39m … [35mmax[39m[90m):  [39m[36m[1m1.100 μs[22m[39m … [35m 61.090 μs[39m  [90m┊[39m GC [90m([39mmin … max[90m): [39m0.00% … 0.00%
 Time  [90m([39m[34m[1mmedian[22m[39m[90m):     [39m[34m[1m1.110 μs               [22m[39m[90m┊[39m GC [90m([39mmedian[90m):    [39m0.00%
 Time  [90m([39m[32m[1mmean[22m[39m ± [32mσ[39m[90m):   [39m[32m[1m1.154 μs[22m[39m ± [32m715.223 ns[39m  [90m┊[39m GC [90m([39mmean ± σ[90m):  [39m0.00% ± 0.00%

  [39m [34m█[39m[39m [39m [39m [39m [39m [39m [32m [39m[39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m 
  [39m▃[34m█[39m[39m▅[39m▁[39m▂[

In [28]:
@code_warntype sum_const_global()

Variables
  #self#[36m::Core.Const(sum_const_global)[39m
  @_2[33m[1m::Union{Nothing, Tuple{Float64, Int64}}[22m[39m
  s[36m::Float64[39m
  i[36m::Float64[39m

Body[36m::Float64[39m
[90m1 ─[39m       (s = 0.0)
[90m│  [39m %2  = Main.const_x[36m::Core.Const([0.4125969203275026, 0.5551098615656092, 0.240819002168303, 0.3965751905591446, 0.8785777477484837, 0.8559115150020329, 0.5722891892469402, 0.955608370221668, 0.5821652710126011, 0.30942078587307686, 0.3247860797558668, 0.24900774646163404, 0.265146177131381, 0.048977939909959245, 0.4467992969582193, 0.08283564682071676, 0.04944164473060653, 0.07956302223576484, 0.20789828761025375, 0.1294812292053089, 0.716321412986181, 0.8443808914960056, 0.9514839964844957, 0.8423388789716104, 0.49510437280583464, 0.5253371662810817, 0.27701498572864747, 0.37308104673931464, 0.1727720764019487, 0.2939256224532212, 0.004889915718355287, 0.7410506462885127, 0.12499304478405016, 0.495218792191364, 0.5677843328878067, 0.63181384136714

# 3. Deixar tudo imutável se possível

Tudo que é mutável faz com que o compilador LLVM não saiv=ba o que vem pela frente e não consiga otimizar.

## Tuplas vs Arrays

Tuplas são __imutáveis__ e Arrays podem ser modificados após a instanciação.

In [29]:
rand_tuple_point() = (rand(), rand())

rand_tuple_point (generic function with 1 method)

In [33]:
rand_vector_point() = [rand(), rand()]

rand_vector_point (generic function with 1 method)

In [34]:
tuple_points = [rand_tuple_point() for _ ∈ 1:500];

In [35]:
vector_points = [rand_vector_point() for _ ∈ 1:500];

In [36]:
function difference_matrix(points)
    return [p1 .- p2 for p1 in points, p2 in points]
end

difference_matrix (generic function with 1 method)

In [37]:
@benchmark difference_matrix($tuple_points)

BenchmarkTools.Trial: 2404 samples with 1 evaluation.
 Range [90m([39m[36m[1mmin[22m[39m … [35mmax[39m[90m):  [39m[36m[1m809.000 μs[22m[39m … [35m25.752 ms[39m  [90m┊[39m GC [90m([39mmin … max[90m): [39m 0.00% … 91.76%
 Time  [90m([39m[34m[1mmedian[22m[39m[90m):     [39m[34m[1m  1.294 ms              [22m[39m[90m┊[39m GC [90m([39mmedian[90m):    [39m 0.00%
 Time  [90m([39m[32m[1mmean[22m[39m ± [32mσ[39m[90m):   [39m[32m[1m  2.050 ms[22m[39m ± [32m 2.362 ms[39m  [90m┊[39m GC [90m([39mmean ± σ[90m):  [39m24.96% ± 19.73%

  [39m▆[39m█[34m▇[39m[39m▆[39m▅[39m▃[32m▃[39m[39m▂[39m▂[39m▂[39m [39m▁[39m▁[39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m▁
  [39m█[39m█[34m█[39m[3

In [39]:
@benchmark difference_matrix($vector_points)

BenchmarkTools.Trial: 189 samples with 1 evaluation.
 Range [90m([39m[36m[1mmin[22m[39m … [35mmax[39m[90m):  [39m[36m[1m10.654 ms[22m[39m … [35m63.310 ms[39m  [90m┊[39m GC [90m([39mmin … max[90m): [39m 0.00% … 24.82%
 Time  [90m([39m[34m[1mmedian[22m[39m[90m):     [39m[34m[1m25.094 ms              [22m[39m[90m┊[39m GC [90m([39mmedian[90m):    [39m 0.00%
 Time  [90m([39m[32m[1mmean[22m[39m ± [32mσ[39m[90m):   [39m[32m[1m26.514 ms[22m[39m ± [32m11.566 ms[39m  [90m┊[39m GC [90m([39mmean ± σ[90m):  [39m16.85% ± 18.69%

  [39m [39m [39m [39m█[39m [39m▂[39m [39m [39m [39m▂[39m [39m [39m▂[39m [39m [39m [39m [39m▃[34m [39m[32m [39m[39m [39m [39m [39m [39m [39m [39m [39m▂[39m [39m▂[39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m 
  [39m█[39m█[39m▆[39m█[39m▇[39m█

## `struct` vs `mutable struct`

Jóia        Imutável    
Jóia^-1     Mutável

In [46]:
abstract type Point end

In [47]:
struct ImmutablePoint <: Point 
    x::Float64
    y::Float64
end

In [48]:
mutable struct MutablePoint <: Point 
    x::Float64
    y::Float64
end

In [49]:
function mean_point(p::Point)
    return mean([p.x, p.y])
end

mean_point (generic function with 1 method)

In [50]:
@benchmark mean_point.([ImmutablePoint(rand(), rand()) for _ ∈ 1:500_000])

BenchmarkTools.Trial: 107 samples with 1 evaluation.
 Range [90m([39m[36m[1mmin[22m[39m … [35mmax[39m[90m):  [39m[36m[1m33.640 ms[22m[39m … [35m67.467 ms[39m  [90m┊[39m GC [90m([39mmin … max[90m): [39m 0.00% … 13.49%
 Time  [90m([39m[34m[1mmedian[22m[39m[90m):     [39m[34m[1m45.714 ms              [22m[39m[90m┊[39m GC [90m([39mmedian[90m):    [39m16.02%
 Time  [90m([39m[32m[1mmean[22m[39m ± [32mσ[39m[90m):   [39m[32m[1m47.017 ms[22m[39m ± [32m 7.826 ms[39m  [90m┊[39m GC [90m([39mmean ± σ[90m):  [39m15.27% ±  7.69%

  [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m▁[39m [39m█[39m [39m▂[39m [34m [39m[39m▅[39m [32m [39m[39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m 
  [39m▃[39m█[39m▃[39m▅[39m▅[39m▃

In [51]:
@benchmark mean_point.([MutablePoint(rand(), rand()) for _ ∈ 1:500_000])

BenchmarkTools.Trial: 84 samples with 1 evaluation.
 Range [90m([39m[36m[1mmin[22m[39m … [35mmax[39m[90m):  [39m[36m[1m35.122 ms[22m[39m … [35m122.437 ms[39m  [90m┊[39m GC [90m([39mmin … max[90m): [39m 0.00% … 17.11%
 Time  [90m([39m[34m[1mmedian[22m[39m[90m):     [39m[34m[1m56.933 ms               [22m[39m[90m┊[39m GC [90m([39mmedian[90m):    [39m30.98%
 Time  [90m([39m[32m[1mmean[22m[39m ± [32mσ[39m[90m):   [39m[32m[1m59.697 ms[22m[39m ± [32m 12.231 ms[39m  [90m┊[39m GC [90m([39mmean ± σ[90m):  [39m28.69% ±  7.48%

  [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m▅[39m [39m [39m [39m█[34m [39m[39m [39m [32m▅[39m[39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m 
  [39m▅[39m▁[39m▁[39m▃[39

## `StaticArrays.jl`

__Arrays estáticas em Julia__

* `StaticArray{Size, T, Dims} <: AbstractArray{T, Dims}`    
* `sVector{N, T}`: alias para `StaticArray{N, T, 1}` 
* `sMatrix{M, N, T}`: alias para `StaticArray{(M, N), T, 2}`    

<br>
<br>

__Beenchmarks__ para matrizes `Float64` 3x3:


|__Operação__|*__Speedup__*|
|------------|:-----------:|
|multiplicação|5.9x|
|adição|33.1x|
|determinante|112.9x|
|inversa|67.8x|
|decomposição de autovetores|25.0x|
|decomposição Cholesky|8.8x|
|decomposição LU|6.1x|
|decomposição QR|65.0x|

<br>
<br>
<br>
        
> __Quando usar?__ Como regra-geral, se vc tiver uma `Array` de __até 100 elementos__ é interessante usar uma `StaticArray`.

### Instanciamento

In [52]:
abstract type MeuTipo end

In [53]:
struct MyImmutable <: MeuTipo
    x::Vector{Int}
end

In [54]:
mutable struct MyMutable <: MeuTipo
    x::Vector{Int}
end

In [58]:
struct MySArray <: MeuTipo
	x::SVector{2, Int}
end

In [59]:
function f_immutable()
    for i ∈ 1:1_000
        x = MyImmutable([rand(Int), rand(Int)])
    end
    return nothing
end

f_immutable (generic function with 1 method)

In [60]:
@benchmark f_immutable()

BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range [90m([39m[36m[1mmin[22m[39m … [35mmax[39m[90m):  [39m[36m[1m34.000 μs[22m[39m … [35m 14.304 ms[39m  [90m┊[39m GC [90m([39mmin … max[90m): [39m 0.00% … 99.42%
 Time  [90m([39m[34m[1mmedian[22m[39m[90m):     [39m[34m[1m42.600 μs               [22m[39m[90m┊[39m GC [90m([39mmedian[90m):    [39m 0.00%
 Time  [90m([39m[32m[1mmean[22m[39m ± [32mσ[39m[90m):   [39m[32m[1m72.199 μs[22m[39m ± [32m344.177 μs[39m  [90m┊[39m GC [90m([39mmean ± σ[90m):  [39m16.33% ±  3.70%

  [39m█[39m▇[34m▅[39m[39m▅[39m▅[39m▄[39m▄[32m▄[39m[39m▃[39m▂[39m▁[39m▂[39m▁[39m▁[39m▁[39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m▂
  [39m█[39m█[34m█[39m[

In [61]:
function f_mutable()
    for i ∈ 1:1_000
        x = MyImmutable([rand(Int), rand(Int)])
    end
    return nothing
end

f_mutable (generic function with 1 method)

In [62]:
@benchmark f_mutable()

BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range [90m([39m[36m[1mmin[22m[39m … [35mmax[39m[90m):  [39m[36m[1m33.700 μs[22m[39m … [35m 12.811 ms[39m  [90m┊[39m GC [90m([39mmin … max[90m): [39m 0.00% … 99.55%
 Time  [90m([39m[34m[1mmedian[22m[39m[90m):     [39m[34m[1m39.100 μs               [22m[39m[90m┊[39m GC [90m([39mmedian[90m):    [39m 0.00%
 Time  [90m([39m[32m[1mmean[22m[39m ± [32mσ[39m[90m):   [39m[32m[1m62.440 μs[22m[39m ± [32m320.505 μs[39m  [90m┊[39m GC [90m([39mmean ± σ[90m):  [39m18.39% ±  3.71%

  [39m█[39m█[34m▆[39m[39m▆[39m▄[39m▄[39m▃[39m▄[39m▄[39m▃[39m▃[32m▃[39m[39m▂[39m▁[39m▂[39m▂[39m▃[39m▁[39m▁[39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m▂
  [39m█[39m█[34m█[39m[

In [63]:
function f_sarray()
    for i ∈ 1:1_000
        x = MySArray(SVector(rand(Int), rand(Int)))
    end
    return nothing
end

f_sarray (generic function with 1 method)

In [64]:
@benchmark f_sarray()

BenchmarkTools.Trial: 10000 samples with 4 evaluations.
 Range [90m([39m[36m[1mmin[22m[39m … [35mmax[39m[90m):  [39m[36m[1m7.125 μs[22m[39m … [35m241.425 μs[39m  [90m┊[39m GC [90m([39mmin … max[90m): [39m0.00% … 0.00%
 Time  [90m([39m[34m[1mmedian[22m[39m[90m):     [39m[34m[1m7.575 μs               [22m[39m[90m┊[39m GC [90m([39mmedian[90m):    [39m0.00%
 Time  [90m([39m[32m[1mmean[22m[39m ± [32mσ[39m[90m):   [39m[32m[1m8.458 μs[22m[39m ± [32m  4.772 μs[39m  [90m┊[39m GC [90m([39mmean ± σ[90m):  [39m0.00% ± 0.00%

  [39m▁[39m█[34m▇[39m[39m▁[39m [39m▃[32m▂[39m[39m [39m [39m [39m [39m [39m [39m [39m [39m [39m▁[39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m▁
  [39m█[39m█[34m█[39m[39m█[39m█[3

### Operações

In [65]:
function mean_meu_tipo(m::MeuTipo)
    return mean(m.x)
end

mean_meu_tipo (generic function with 1 method)

In [66]:
@benchmark mean_meu_tipo(MyImmutable([rand(Int), rand(Int)]))

BenchmarkTools.Trial: 10000 samples with 991 evaluations.
 Range [90m([39m[36m[1mmin[22m[39m … [35mmax[39m[90m):  [39m[36m[1m45.510 ns[22m[39m … [35m 11.848 μs[39m  [90m┊[39m GC [90m([39mmin … max[90m): [39m 0.00% … 99.29%
 Time  [90m([39m[34m[1mmedian[22m[39m[90m):     [39m[34m[1m49.647 ns               [22m[39m[90m┊[39m GC [90m([39mmedian[90m):    [39m 0.00%
 Time  [90m([39m[32m[1mmean[22m[39m ± [32mσ[39m[90m):   [39m[32m[1m77.758 ns[22m[39m ± [32m310.580 ns[39m  [90m┊[39m GC [90m([39mmean ± σ[90m):  [39m14.33% ±  3.69%

  [39m█[34m▆[39m[39m▄[39m▃[39m▁[39m▃[39m▄[39m▃[32m▂[39m[39m▂[39m▁[39m▂[39m▂[39m▁[39m▁[39m [39m▁[39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m▁
  [39m█[34m█[39m[39m

In [67]:
@benchmark mean_meu_tipo(MyMutable([rand(Int), rand(Int)]))

BenchmarkTools.Trial: 10000 samples with 991 evaluations.
 Range [90m([39m[36m[1mmin[22m[39m … [35mmax[39m[90m):  [39m[36m[1m45.005 ns[22m[39m … [35m  8.180 μs[39m  [90m┊[39m GC [90m([39mmin … max[90m): [39m 0.00% … 98.79%
 Time  [90m([39m[34m[1mmedian[22m[39m[90m):     [39m[34m[1m49.041 ns               [22m[39m[90m┊[39m GC [90m([39mmedian[90m):    [39m 0.00%
 Time  [90m([39m[32m[1mmean[22m[39m ± [32mσ[39m[90m):   [39m[32m[1m70.854 ns[22m[39m ± [32m259.556 ns[39m  [90m┊[39m GC [90m([39mmean ± σ[90m):  [39m13.24% ±  3.69%

  [39m▆[34m█[39m[39m▅[39m▄[39m▃[39m▂[39m▁[39m▁[39m▄[32m▄[39m[39m▂[39m▂[39m▁[39m▁[39m [39m▁[39m▃[39m▂[39m▁[39m▁[39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m▂
  [39m█[34m█[39m[39m

In [68]:
@benchmark mean_meu_tipo(MySArray(SVector(rand(Int), rand(Int))))

BenchmarkTools.Trial: 10000 samples with 999 evaluations.
 Range [90m([39m[36m[1mmin[22m[39m … [35mmax[39m[90m):  [39m[36m[1m13.413 ns[22m[39m … [35m 1.730 μs[39m  [90m┊[39m GC [90m([39mmin … max[90m): [39m0.00% … 0.00%
 Time  [90m([39m[34m[1mmedian[22m[39m[90m):     [39m[34m[1m14.815 ns              [22m[39m[90m┊[39m GC [90m([39mmedian[90m):    [39m0.00%
 Time  [90m([39m[32m[1mmean[22m[39m ± [32mσ[39m[90m):   [39m[32m[1m21.996 ns[22m[39m ± [32m25.554 ns[39m  [90m┊[39m GC [90m([39mmean ± σ[90m):  [39m0.00% ± 0.00%

  [39m [39m█[34m▆[39m[39m [39m▂[39m [39m [39m [39m [39m [39m [39m [39m▄[32m▄[39m[39m [39m▁[39m▂[39m▂[39m [39m▁[39m▁[39m [39m [39m [39m [39m [39m [39m [39m [39m▂[39m▃[39m▅[39m▃[39m▁[39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m▁
  [39m▇[39m█[34m█[39m[39m▆[39m█

# 4. Desativar checagem de índice em operações com `Array` - `@inbounds`

A maioria das ligaugens modernas, poe questões de segurança, garantem que não haverá acesso fora dos elementos de uma array.    
<br>
Porém, em algumas situações que _performance_ é crítica, você pode remover essa checagem de acesso. 
<br>
Para fazer isso em Juia é só usar o macro `@inbounds`
<br>
>___Cuidado!!!___ Remover o _bound check_ de Julia é __perigoso__. Se certifique que não está fazendo um loop inseguro antes de usar `@inbounds`

In [73]:
array_x = rand(10_000);

In [74]:
array_y = rand(10_000);

In [72]:
function inner(x, y)
    s = zero(eltype(x))
    for i ∈ eachindex(x)
        s += x[i]*y[i]
    end
    return s
end

inner (generic function with 1 method)

In [77]:
@benchmark inner($array_x, $array_y)

BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range [90m([39m[36m[1mmin[22m[39m … [35mmax[39m[90m):  [39m[36m[1m11.400 μs[22m[39m … [35m450.800 μs[39m  [90m┊[39m GC [90m([39mmin … max[90m): [39m0.00% … 0.00%
 Time  [90m([39m[34m[1mmedian[22m[39m[90m):     [39m[34m[1m11.500 μs               [22m[39m[90m┊[39m GC [90m([39mmedian[90m):    [39m0.00%
 Time  [90m([39m[32m[1mmean[22m[39m ± [32mσ[39m[90m):   [39m[32m[1m12.363 μs[22m[39m ± [32m 10.201 μs[39m  [90m┊[39m GC [90m([39mmean ± σ[90m):  [39m0.00% ± 0.00%

  [34m█[39m[39m▄[39m▃[32m▁[39m[39m▂[39m▃[39m▂[39m▂[39m▁[39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m▁
  [34m█[39m[39m█[39m█[32m█

In [78]:
function inner_inbound(x, y)
    s = zero(eltype(x))
    for i ∈ eachindex(x)
        @inbounds s += x[i]*y[i]
    end
    return s
end

inner_inbound (generic function with 1 method)

In [79]:
@benchmark inner_inbound($array_x, $array_y)

BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range [90m([39m[36m[1mmin[22m[39m … [35mmax[39m[90m):  [39m[36m[1m11.400 μs[22m[39m … [35m114.500 μs[39m  [90m┊[39m GC [90m([39mmin … max[90m): [39m0.00% … 0.00%
 Time  [90m([39m[34m[1mmedian[22m[39m[90m):     [39m[34m[1m11.500 μs               [22m[39m[90m┊[39m GC [90m([39mmedian[90m):    [39m0.00%
 Time  [90m([39m[32m[1mmean[22m[39m ± [32mσ[39m[90m):   [39m[32m[1m12.113 μs[22m[39m ± [32m  3.944 μs[39m  [90m┊[39m GC [90m([39mmean ± σ[90m):  [39m0.00% ± 0.00%

  [34m█[39m[39m▄[39m▃[32m▁[39m[39m▁[39m▃[39m▂[39m▂[39m▁[39m [39m▁[39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m▁
  [34m█[39m[39m█[39m█[32m█

# 5. Ativar suporte SIMD em todos os loops `for` (Single Instruction Multiple Data)