# Performance and Parallel Operations

In [3]:
using Pkg

using BenchmarkTools
using LoopVectorization
using Plots
using StaticArrays
using ThreadsX

using CUDA # no kidding?!

using Random: seed!, shuffle
using Statistics: mean

# Random Seed
seed!(123)

# Print colors in the terminal
using ANSIColoredPrinters

# Basic _performance_ checklist

1. Fix type instability
2. Use local variables instead of global ones
3. Make everything immutable where possible
4. Disable bounds checking in operations on `Array`s
5. Enable SIMD (Single Instruction, Multiple Data) support in every `for` loop

# 1. Fix type instability

> The output type of a function is __unpredictable__ from the types of its inputs. In particular, this means the output type __can vary__ depending on the values of the inputs.

In [4]:
function positivo(x)
    if x > 0
        return x
    else
        return 0
    end
end

positivo (generic function with 1 method)

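> A type-unstable function: it returns `x` in one branch and the `Int` literal `0` in the other, so what is the return type?

`@code_warntype` takes a function call as its argument and prints the function's lowered code, a simplified form of the _Abstract Syntax Tree_ (AST), annotated with the types the compiler inferred.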

In [5]:
@code_warntype positivo(-3.4)

Variables
  #self#::Core.Const(positivo)
  x::Float64

Body::Union{Float64, Int64}
1 ─ %1 = (x > 0)::Bool
└──      goto #3 if not %1
2 ─      return x
3 ─      return 0


> Anything `@code_warntype` prints in red (here `Body::Union{Float64, Int64}`) signals a type problem.
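
The same check can be done programmatically instead of by eye. The following is a minimal sketch, not part of the original notebook: it repeats the definition of `positivo` so the block runs on its own, and it uses `Base.return_types`, an introspection helper, to ask which return type the compiler infers.

```julia
# Same definition as in the cell above, repeated so this block is self-contained.
function positivo(x)
    if x > 0
        return x
    else
        return 0
    end
end

# Ask the compiler which return type it infers for each argument type.
tipo_float = first(Base.return_types(positivo, (Float64,)))
tipo_int   = first(Base.return_types(positivo, (Int,)))

# A non-concrete inferred type (such as a Union) is what @code_warntype highlights.
println(tipo_float, " is concrete? ", isconcretetype(tipo_float))
println(tipo_int, " is concrete? ", isconcretetype(tipo_int))
```

With an `Int` argument both branches return an `Int`, so the instability only shows up for other argument types.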

### Fixing the type instability

Annotate the types, and make every branch return a value of the annotated type!

In [6]:
function positivo_stabel(x::AbstractFloat)
    if x > 0
        return x 
    else
        return 0.0
    end
end

positivo_stabel (generic function with 1 method)

In [7]:
function positivo_stabel(x::Integer)
    if x > 0
        return x 
    else
        return 0
    end
end

positivo_stabel (generic function with 2 methods)

In [8]:
@code_warntype positivo_stabel(-3.4)

Variables
  #self#::Core.Const(positivo_stabel)
  x::Float64

Body::Float64
1 ─ %1 = (x > 0)::Bool
└──      goto #3 if not %1
2 ─      return x
3 ─      return 0.0


In [9]:
@code_warntype positivo_stabel(-3)

Variables
  #self#::Core.Const(positivo_stabel)
  x::Int64

Body::Int64
1 ─ %1 = (x > 0)::Bool
└──      goto #3 if not %1
2 ─      return x
3 ─      return 0


### Why does annotating the types matter?

In [11]:
x = rand(1_000);

In [13]:
@benchmark positivo.($x)

BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range (min … max):  1.930 μs …  1.357 ms  ┊ GC (min … max):  0.00% … 99.49%
 Time  (median):     3.980 μs              ┊ GC (median):     0.00%
 Time  (mean ± σ):   7.814 μs ± 37.978 μs  ┊ GC (mean ± σ):  17.16% ±  3.58%

 [histogram truncated in the exported output]

In [14]:
@benchmark positivo_stabel.($x)

BenchmarkTools.Trial: 10000 samples with 186 evaluations.
 Range (min … max):  803.763 ns … 348.528 μs  ┊ GC (min … max):  0.00% … 96.99%
 Time  (median):       1.267 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):     2.478 μs ±   6.365 μs  ┊ GC (mean ± σ):  20.15% ± 13.30%

 [histogram truncated in the exported output]

### Tips!

Prefer __abstract types__ over concrete types when annotating function arguments.

`AbstractFloat` >>> `Float64` or `Float32`  
`Integer` >>> `Int128` or `Int64` or `Int32`
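
An abstract annotation only restricts which arguments are accepted; Julia still compiles a specialized version for each concrete argument type. One caveat worth sketching: a literal such as `0.0` is always a `Float64`, so a method annotated with `AbstractFloat` that returns `0.0` is no longer type-stable for a `Float32` argument. The names `positivo_generico` and `positivo_literal` below are hypothetical and not part of the notebook; `zero(x)` is the standard way to get a zero of the same type as `x`.

```julia
# Hypothetical variant: zero(x) yields a zero of the same type as x.
function positivo_generico(x::Real)
    if x > 0
        return x
    else
        return zero(x)
    end
end

# Literal-based variant, mirroring the AbstractFloat method shown earlier.
function positivo_literal(x::AbstractFloat)
    if x > 0
        return x
    else
        return 0.0   # always a Float64, whatever the type of x
    end
end

r32 = positivo_generico(-1.5f0)   # Float32 in, Float32 out
r64 = positivo_generico(-1.5)     # Float64 in, Float64 out
ri  = positivo_generico(-3)       # Int in, Int out

# Inferred return types for a Float32 argument.
tipo_literal  = first(Base.return_types(positivo_literal, (Float32,)))
tipo_generico = first(Base.return_types(positivo_generico, (Float32,)))
println("literal:  ", tipo_literal)
println("generic:  ", tipo_generico)
```

A single generic method written this way can stand in for the separate `AbstractFloat` and `Integer` methods.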

### Parametric types

They are introduced with curly braces `{}` and the `where` keyword.

In [16]:
# for example, any subtype of `Real`
subtypes(Real)

8-element Vector{Any}:
 AbstractFloat
 AbstractIrrational
 FixedPointNumbers.FixedPoint
 Integer
 Rational
 StatsBase.PValue
 StatsBase.TestStat
 VectorizationBase.AbstractSIMD

In [17]:
function positivo_stable2(x::T) where T <: Real
    if x > 0
        return x
    else
        return 0::T  # type assertion, not a conversion: it only succeeds when T is Int
    end
end

positivo_stable2 (generic function with 1 method)

In [18]:
@code_warntype positivo_stable2(-3.4)

Variables
  #self#::Core.Const(positivo_stable2)
  x::Float64

Body::Float64
1 ─ %1 = (x > 0)::Bool
└──      goto #3 if not %1
2 ─      return x
3 ─      Core.typeassert(0, $(Expr(:static_parameter, 1)))
└──      Core.Const(:(return %4))


In [19]:
@code_warntype positivo_stable2(-3)

Variables
  #self#::Core.Const(positivo_stable2)
  x::Int64

Body::Int64
1 ─ %1 = (x > 0)::Bool
└──      goto #3 if not %1
2 ─      return x
3 ─ %4 = Core.typeassert(0, $(Expr(:static_parameter, 1)))::Core.Const(0)
└──      return %4


### This also works with `Array`s

* `AbstractArray{T, N}` 
* `AbstractMatrix{T}`, shorthand for `AbstractArray{T, 2}` 
* `AbstractVector{T}`, shorthand for `AbstractArray{T, 1}` 

In [21]:
# specific to vectors
# note: the body uses lowercase `x` (the global), not the argument `X`
function meus_zeros(X::AbstractVector{T}) where T <: Real
    return zeros(eltype(x), size(x))    
end

meus_zeros (generic function with 1 method)

In [22]:
# Generalized to an Array of any dimension
# note: N is an integer value, not a type, so `N <: Integer` never matches an actual array
function meus_zeros(X::AbstractArray{T, N}) where T <: Real where N <: Integer
    return zeros(eltype(x), size(x))    
end

meus_zeros (generic function with 2 methods)

In [25]:
@code_warntype meus_zeros([1, 0, 3])

Variables
  #self#::Core.Const(meus_zeros)
  X::Vector{Int64}

Body::Any
1 ─ %1 = Main.eltype(Main.x)::Any
│   %2 = Main.size(Main.x)::Any
│   %3 = Main.zeros(%1, %2)::Any
└──      return %3


# 2. Use local variables instead of global ones

A non-constant global variable can be rebound to a value of any type at any time, so the compiler cannot infer its type and struggles to generate optimized machine code.

In [30]:
# global variable
x = rand(1_000);

In [31]:
function sum_global()
    s = 0.0
    for i ∈ x 
        s += i
    end
    return s
end

sum_global (generic function with 1 method)

In [32]:
@benchmark sum_global()

BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):   78.300 μs …   5.809 ms  ┊ GC (min … max): 0.00% … 97.68%
 Time  (median):      80.200 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   109.528 μs ± 164.077 μs  ┊ GC (mean ± σ):  4.20% ±  3.04%

 [histogram truncated in the exported output]

In [33]:
function sum_arg(x)
    s = 0.0
    for i ∈ x 
        s += i
    end
    return s
end

sum_arg (generic function with 1 method)

In [34]:
@benchmark sum_arg($x)

BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range (min … max):  1.100 μs …  27.100 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.110 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.202 μs ± 513.365 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

 [histogram truncated in the exported output]

In [35]:
@code_warntype sum_global()

Variables
  #self#::Core.Const(sum_global)
  @_2::Any
  s::Any
  i::Any

Body::Any
1 ─       (s = 0.0)
│   %2  = Main.x::Any
│         (@_2 = Base.iterate(%2))
│   %4  = (@_2 === nothing)::Bool
│   %5  = Base.not_int(%4)::Bool
└──       goto #4 if not %5
2 ┄ %7  = @_2::Any
│         (i = Core.getfield(%7, 1))
│   %9  = Core.getfield(%7, 2)::Any
│         (s = s + i)
│         (@_2 = Base.iterate(%2, %9))
│   %12 = (@_2 === nothing)::Bool
│   %13 = Base.not_int(%12)::Bool
└──       goto #4 if not %13
3 ─       goto #2
4 ┄       return s


In [36]:
@code_warntype sum_arg(x)

Variables
  #self#::Core.Const(sum_arg)
  x::Vector{Float64}
  @_3::Union{Nothing, Tuple{Float64, Int64}}
  s::Float64
  i::Float64

Body::Float64
1 ─       (s = 0.0)
│   %2  = x::Vector{Float64}
│         (@_3 = Base.iterate(%2))
│   %4  = (@_3 === nothing)::Bool
│   %5  = Base.not_int(%4)::Bool
└──       goto #4 if not %5
2 ┄ %7  = @_3::Tuple{Float64, Int64}::Tuple{Float64, Int64}
│         (i = Core.getfield(%7, 1))
│   %9  = Core.getfield(%7, 2)::Int64
│         (s = s + i)
│         (@_3 = Base.iterate(%2, %9))
│   %12 = (@_3 === nothing)::Bool
│   %13 = Base.not_int(%12)::Bool
└──       goto #4 if not %13
3 ─       goto #2
4 ┄       return s


## If you must use global variables, use `const`

In [38]:
# constant global variable
const const_x = rand(1_000);



In [39]:
function sum_const_global()
    s = 0.0
    for i ∈ const_x 
        s += i
    end
    return s
end

sum_const_global (generic function with 1 method)

In [40]:
@benchmark sum_const_global()

BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range (min … max):  1.100 μs …  15.850 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.110 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.177 μs ± 422.373 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

 [histogram truncated in the exported output]

In [41]:
@code_warntype sum_const_global()

Variables
  #self#::Core.Const(sum_const_global)
  @_2::Union{Nothing, Tuple{Float64, Int64}}
  s::Float64
  i::Float64

Body::Float64
1 ─       (s = 0.0)
│   %2  = Main.const_x::Core.Const([0.16177249940502847, 0.06470205727455447, 0.22871236376036652, …
 [output truncated in the exported notebook]

# 3. Make everything immutable where possible

3. Make everything immutable where possible
4. Disable bounds checking in operations on `Array`s
5. Enable SIMD (Single Instruction, Multiple Data) support in every `for` loop