# Fast (serial) programming with Julia

Yes, this is a parallel computing course, but to write efficient parallel programs we first need to learn how to write fast serial Julia code. This is a rapid introduction to high-performance (serial) programming.

_I highly recommend_ reviewing the [Performance Tips](https://docs.julialang.org/en/v1.1/manual/performance-tips/) in the manual. This section only briefly introduces some of the main concepts.

## Measure, measure, measure.
It is very easy to experiment in Julia; you can quickly try many options and see which one is fastest.

Use the [BenchmarkTools](https://github.com/JuliaCI/BenchmarkTools.jl) package:

In [2]:
using BenchmarkTools

"""
    findclosest(data, point)

A simple example that returns the element in `data` that is closest to the given `point`.
"""
function findclosest(data, point)
    _, index =  findmin(abs.(data .- point))
    return data[index]
end
data = rand(5000)
findclosest(data, 0.5)

0.4999597704984099

In [3]:
#data = rand(5000)
findmin(abs.(data .- 0.5))

(4.022950159010552e-5, 1328)

In [4]:
@btime findclosest($data, $0.5)

  26.400 μs (2 allocations: 39.11 KiB)


0.4999597704984099

In [5]:
@benchmark findclosest($data, $0.5)

BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  24.000 μs …   8.763 ms  ┊ GC (min … max): 0.00% … 99.05%
 Time  (median):     73.200 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   79.453 μs ± 202.231 μs  ┊ GC (mean ± σ):  7.10% ±  2.79%

### Profile!

In [6]:
using Profile

Profile.clear()
@profile for _ in 1:1000; findclosest(data, 0.5); end

Profile.print(maxdepth=11)

Overhead ╎ [+additional indent] Count File:Line; Function
  ╎52 @Base\task.jl:429; (::IJulia.var"#15#18")()
  ╎ 52 @IJulia\src\eventloop.jl:8; eventloop(socket::ZMQ.Socket)
  ╎  52 @Base\essentials.jl:714; invokelatest
  ╎   52 @Base\essentials.jl:716; #invokelatest#2
  ╎    52 ...c\execute_request.jl:67; execute_request(socket::ZMQ.Soc...
  ╎     52 ...\SoftGlobalScope.jl:65; softscope_include_string(m::Mod...
  ╎    ╎ 52 @Base\loading.jl:1196; include_string(mapexpr::typeo...
  ╎    ╎  52 @Base\boot.jl:373; eval
  ╎    ╎   52 ...ile\src\Profile.jl:28; top-level scope
  ╎    ╎    52 In[6]:4; macro expansion
  ╎    ╎     52 In[2]:9; findclosest(data::Vector{Flo...
  ╎    ╎    ╎ 40 @Base\broadcast.jl:860; materialize
  ╎    ╎    ╎ 12 @Base\reducedim.jl:1005; findmin
Total snapshots: 52
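
The profile attributes most of the samples (40 of 52) to `materialize`, which is where the broadcast `abs.(data .- point)` builds its temporary array. One way to cross-check that reading is `@allocated` from Base. The sketch below is not part of the original notebook; `closest_index` is a hypothetical helper that mirrors the first line of `findclosest`:

```julia
# Hypothetical helper (an assumption for illustration): same work as the first
# line of `findclosest`, returning only the index of the closest element.
function closest_index(data, point)
    dists = abs.(data .- point)   # the broadcast materializes a temporary Vector
    return findmin(dists)[2]      # findmin returns (value, index)
end

data = rand(5000)
closest_index(data, 0.5)                       # warm-up call so compilation is not measured
bytes = @allocated closest_index(data, 0.5)    # bytes allocated by a single call

# The temporary alone needs one Float64 (8 bytes) per element of `data`.
println(bytes, " bytes allocated; `data` itself occupies ", sizeof(data), " bytes")
```

If the allocation is roughly the size of `data`, the temporary array is a natural first thing to try to remove.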


### Iterate!

Before, we had:
```julia
function findclosest(data, point)
    _, index =  findmin(abs.(data .- point))
    return data[index]
end
```

Let's propose a new definition that combines the two operations in a single pass:

In [7]:
function findclosest2(data, point)
    bestval = first(data)
    bestdist = abs(bestval - point)
    for elt in data
        dist = abs(elt - point)
        if dist < bestdist
            bestval = elt
            bestdist = dist
        end
    end
    return bestval
end

# Spot-check that the optimized version gives the same answer as the original:
findclosest2(data, 0.5) == findclosest(data, 0.5)

true

In [8]:
@benchmark findclosest2($data, $0.5)

BenchmarkTools.Trial: 10000 samples with 6 evaluations.
 Range (min … max):   5.650 μs … 196.050 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     16.583 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   15.716 μs ±   5.100 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

In [9]:
@btime findclosest2($data, $0.5)

  5.700 μs (0 allocations: 0 bytes)


0.4999597704984099
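
If you would rather not write the loop by hand, Julia 1.7 and later accept a function as the first argument of `argmin`, which returns the element that minimizes it without building a temporary array. The name `findclosest3` is introduced here only for illustration, and how its speed compares with `findclosest2` is something to measure rather than assume:

```julia
# Reference implementation: the explicit loop from above.
function findclosest2(data, point)
    bestval = first(data)
    bestdist = abs(bestval - point)
    for elt in data
        dist = abs(elt - point)
        if dist < bestdist
            bestval = elt
            bestdist = dist
        end
    end
    return bestval
end

# Illustrative alternative (assumes Julia 1.7+): argmin(f, itr) returns the
# element x of itr for which f(x) is smallest (the first one on ties).
findclosest3(data, point) = argmin(x -> abs(x - point), data)

data = rand(5000)
agree = findclosest3(data, 0.5) == findclosest2(data, 0.5)
println("both versions agree: ", agree)
```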

## A quick word on macros

Macros are those funny things that start with `@`. They can reinterpret what
you write and do something different, essentially introducing a new keyword.

For example, the `@assert` macro simply takes an expression and throws an
exception if it returns `false`.

In [10]:
@assert 2+2 == 4

In [11]:
@assert 2+2 == 8

LoadError: AssertionError: 2 + 2 == 8

It does this by literally rewriting what you wrote. You can see it in action
with `@macroexpand`:

In [12]:
@macroexpand @assert 2+2 == 4

:(if 2 + 2 == 4
      nothing
  else
      Base.throw(Base.AssertionError("2 + 2 == 4"))
  end)

In [13]:
@macroexpand @time 2+2 == 4

quote
    #= timing.jl:216 =#
    while false
        #= timing.jl:216 =#
    end
    #= timing.jl:217 =#
    local var"#31#stats" = Base.gc_num()
    #= timing.jl:218 =#
    local var"#33#elapsedtime" = Base.time_ns()
    #= timing.jl:219 =#
    local var"#34#compile_elapsedtime" = Base.cumulative_compile_time_ns_before()
    #= timing.jl:220 =#
    local var"#32#val" = $(Expr(:tryfinally, :(2 + 2 == 4), quote
    var"#33#elapsedtime" = Base.time_ns() - var"#33#elapsedtime"
    #= timing.jl:222 =#
    var"#34#compile_elapsedtime" = Base.cumulative_compile_time_ns_after() - var"#34#compile_elapsedtime"
end))
    #= timing.jl:224 =#
    local var"#35#diff" = Base.GC_Diff(Base.gc_num(), var"#31#stats")
    #= timing.jl:225 =#
    Base.time_print(var"#33#elapsedtime", (var"#35#diff").allocd, (var"#35#diff").total_time, Base.gc_alloc_count(var"#35#diff"), var"#34#compile_elapsedtime", true)
    #

In [14]:
@which @time 2+2 == 4

Each macro can define its own special syntax, and this is used extensively for code introspection, serial performance improvements, and, perhaps most importantly, parallelization primitives!
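
To make the "rewriting" idea concrete, here is a minimal sketch of a user-defined macro. The macro name `@labeled` is hypothetical and exists only for this illustration; it receives the expression unevaluated, records its source text at expansion time, and returns code that pairs that text with the value:

```julia
# Hypothetical macro for illustration: return the source text of an expression
# together with its value.
macro labeled(ex)
    label = string(ex)               # computed once, at macro-expansion time
    return :(($label, $(esc(ex))))   # esc: evaluate `ex` in the caller's scope
end

result = @labeled 2 + 2
println(result)   # prints ("2 + 2", 4)
```

A plain function could not do this, because it would only ever see the value `4` and never the expression `2 + 2`.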

## How is Julia fast?

By understanding the basics of how Julia _can_ be fast, you can get a better
sense of how to write fast Julia code.

Perhaps most importantly, Julia can reason about types. Recall: this is the definition of `findclosest2`:

```julia
function findclosest2(data, point)
    bestval = first(data)
    bestdist = abs(bestval - point)
    for elt in data
        dist = abs(elt - point)
        if dist < bestdist
            bestval = elt
            bestdist = dist
        end
    end
    return bestval
end
```

In [15]:
@code_typed optimize=false findclosest2(data, 0.5)

CodeInfo(
1 ─       (bestval = Main.first(data))::Float64
│   %2  = (bestval - point)::Float64
│         (bestdist = Main.abs(%2))::Float64
│   %4  = data::Vector{Float64}
│         (@_4 = Base.iterate(%4))::Union{Nothing, Tuple{Float64, Int64}}
│   %6  = (@_4 === nothing)::Bool
│   %7  = Base.not_int(%6)::Bool
└──       goto #6 if not %7
2 ┄ %9  = @_4::Tuple{Float64, Int64}
│         (elt = Core.getfield(%9, 1))::Float64
│   %11 = Core.getfield(%9, 2)::Int64
│   %12 = (elt - point)::Float64
│         (dist = Main.abs(%12))::Float64
│   %14 = (dist < bestdist)::Bool
└──       goto #4 if not %14
3 ─       (bestval = elt)::Float64
└──       (bestdist = dist)::Float64
4

In [16]:
typeof(data)

Vector{Float64} (alias for Array{Float64, 1})

In [17]:
newdata = Real[data...]
typeof(newdata)

Vector{Real} (alias for Array{Real, 1})

In [19]:
@code_typed optimize=false findclosest2(newdata, 0.5)

CodeInfo(
1 ─       (bestval = Main.first(data))::Real
│   %2  = (bestval - point)::Any
│         (bestdist = Main.abs(%2))::Any
│   %4  = data::Vector{Real}
│         (@_4 = Base.iterate(%4))::Union{Nothing, Tuple{Real, Int64}}
│   %6  = (@_4 === nothing)::Bool
│   %7  = Base.not_int(%6)::Bool
└──       goto #6 if not %7
2 ┄ %9  = @_4::Tuple{Real, Int64}
│         (elt = Core.getfield(%9, 1))::Real
│   %11 = Core.getfield(%9, 2)::Int64
│   %12 = (elt - point)::Any
│         (dist = Main.abs(%12))::Any
│   %14 = (dist < bestdist)::Any
└──       goto #4 if not %14
3 ─       (bestval = elt)::Real
└──       (bestdist = dist)::Any
4 ┄       (@_4 = Base.iterate(%4, %

In [20]:
@benchmark findclosest2($newdata, $0.5)

BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  142.000 μs …   3.003 ms  ┊ GC (min … max): 0.00% … 85.01%
 Time  (median):     403.900 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   433.027 μs ± 163.739 μs  ┊ GC (mean ± σ):  1.44% ±  4.75%

In [21]:
@code_warntype findclosest2(newdata, 0.5)

MethodInstance for findclosest2(::Vector{Real}, ::Float64)
  from findclosest2(data, point) in Main at In[7]:1
Arguments
  #self#::Core.Const(findclosest2)
  data::Vector{Real}
  point::Float64
Locals
  @_4::Union{Nothing, Tuple{Real, Int64}}
  bestdist::Any
  bestval::Real
  elt::Real
  dist::Any
Body::Real
1 ─       (bestval = Main.first(data))
│   %2  = (bestval - point)::Any
│         (bestdist = Main.abs(%2))
│   %4  = data::Vector{Real}
│         (@_4 = Base.iterate(%4))
│   %6  = (@_4 === nothing)::Bool
│   %7  = Base.not_int(%6)::Bool
└──       goto #6 if not %7
2 ┄ %9  = @_4::Tuple{Real, Int64}
│         (elt = Core.getfield(%9, 1))
│   %11 = Core.getfield(%9,

In [22]:
@code_warntype findclosest2(data, 0.5)

MethodInstance for findclosest2(::Vector{Float64}, ::Float64)
  from findclosest2(data, point) in Main at In[7]:1
Arguments
  #self#::Core.Const(findclosest2)
  data::Vector{Float64}
  point::Float64
Locals
  @_4::Union{Nothing, Tuple{Float64, Int64}}
  bestdist::Float64
  bestval::Float64
  elt::Float64
  dist::Float64
Body::Float64
1 ─       (bestval = Main.first(data))
│   %2  = (bestval - point)::Float64
│         (bestdist = Main.abs(%2))
│   %4  = data::Vector{Float64}
│         (@_4 = Base.iterate(%4))
│   %6  = (@_4 === nothing)::Bool
│   %7  = Base.not_int(%6)::Bool
└──       goto #6 if not %7
2 ┄ %9  = @_4::Tuple{Float64, Int64}
│         (elt = Core.getfield(%9, 1))
│   %11 = Core.getfield(%9, 2)::Int64
│

### Type stability

A function is called type-stable if Julia can infer what the output type will be based purely on the types of its inputs.

Things that thwart type stability:

* Running things in global scope: create functions instead!
* Containers with a non-concrete element type
* Structs with abstractly-typed fields
* Non-constant globals (they might change!)
* Functions that change what they return based on the _values_ of their inputs
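
Two of the items in this list can be sketched in a few lines. All names below (`clamp_unstable`, `clamp_stable`, `LoosePoint`, `TightPoint`) are hypothetical and chosen only for illustration; `Base.return_types` asks the compiler which return type it infers for a given argument type:

```julia
# Value-dependent return type: for a Float64 input this returns either a
# Float64 (x itself) or an Int (the literal 0).
clamp_unstable(x) = x > 0 ? x : 0

# Type-stable variant: zero(x) has the same type as x.
clamp_stable(x) = x > 0 ? x : zero(x)

unstable_type = first(Base.return_types(clamp_unstable, (Float64,)))
stable_type   = first(Base.return_types(clamp_stable, (Float64,)))
println("unstable: ", unstable_type, "   stable: ", stable_type)

# A field declared with an abstract type versus a parametric, concrete one.
struct LoosePoint
    x::Real
end
struct TightPoint{T<:Real}
    x::T
end

loose_concrete = isconcretetype(fieldtype(LoosePoint, :x))           # false
tight_concrete = isconcretetype(fieldtype(TightPoint{Float64}, :x))  # true
```

The fix in both cases follows the same pattern: give the compiler enough information to pin down one concrete type.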

#### More on macros
Each and every macro can define its own syntax. The `@benchmark` macro uses `$` in a special way. The goal behind `@benchmark` is to measure the performance of a code snippet as though it were written inside a function. Use `$` to mark what would be an argument or a local variable in that function. Forgetting to use `$` can produce timings that are faster or slower than real-world performance.

In [23]:
x = 0.5 # non-constant global
@btime sin(x)
@btime sin($x)

  18.908 ns (1 allocation: 16 bytes)
  3.103 ns (0 allocations: 0 bytes)


0.479425538604203

In [24]:
@btime sin(0.5) # constant literal!
@btime sin($0.5)

  0.900 ns (0 allocations: 0 bytes)
  3.700 ns (0 allocations: 0 bytes)


0.479425538604203

In [25]:
x=1
f()=sin(x)

f (generic function with 1 method)

In [26]:
@btime f()

  21.988 ns (1 allocation: 16 bytes)


0.8414709848078965

In [27]:
g(x)=sin(x)

g (generic function with 1 method)

In [28]:
@btime g(10.5)

  0.001 ns (0 allocations: 0 bytes)


-0.87969575997167

In [29]:
t=10.5
@btime g($t)

  6.300 ns (0 allocations: 0 bytes)


-0.87969575997167
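
The extra time and allocation reported for `f()` are consistent with `x` being a non-constant global: its type could change at any moment, so the compiler cannot specialize `f` on it. A small sketch of the difference, using hypothetical names and `Base.return_types` to ask what Julia infers in each case:

```julia
gain = 1.0           # non-constant global: its type could change later
const GAIN = 1.0     # constant global: the type is fixed

sin_global() = sin(gain)
sin_const()  = sin(GAIN)
sin_arg(s)   = sin(s)    # passing the value as an argument also works

inferred_global = first(Base.return_types(sin_global, ()))
inferred_const  = first(Base.return_types(sin_const, ()))
inferred_arg    = first(Base.return_types(sin_arg, (Float64,)))

println("global: ", inferred_global, "  const: ", inferred_const, "  argument: ", inferred_arg)
```

This is the same reason `$x` matters in `@btime sin($x)`: interpolation makes the benchmark behave as if `x` were a local variable with a known type.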

## Specializations

Julia's reasoning about types is particularly important because it generates specialized machine code specifically for the given argument types.

In [30]:
@code_llvm 1 + 2

;  @ int.jl:87 within `+`
; Function Attrs: uwtable
define i64 @"julia_+_3361"(i64 signext %0, i64 signext %1) #0 {
top:
  %2 = add i64 %1, %0
  ret i64 %2
}


This applies in just the same way to any function we write, even the more complicated ones:

In [31]:
@code_llvm findclosest2(Float32[2.2,3.4,4.5],Float32(3.2))

;  @ In[7]:1 within `findclosest2`
; Function Attrs: uwtable
define float @julia_findclosest2_3382({}* nonnull align 16 dereferenceable(40) %0, float %1) #0 {
top:
;  @ In[7]:2 within `findclosest2`
; ┌ @ abstractarray.jl:398 within `first`
; │┌ @ array.jl:861 within `getindex`
    %2 = bitcast {}* %0 to { i8*, i64, i16, i16, i32 }*
    %3 = getelementptr inbounds { i8*, i64, i16, i16, i32 }, { i8*, i64, i16,

In [32]:
remove_comments(s) = join(filter(x->!startswith(x, ";"), split(s, "\n")), "\n")
sprint(code_llvm, findclosest2, Tuple{Vector{Float32}, Int}) |> remove_comments |> print

define float @julia_findclosest2_3390({}* nonnull align 16 dereferenceable(40) %0, i64 signext %1) #0 {
top:
    %2 = bitcast {}* %0 to { i8*, i64, i16, i16, i32 }*
    %3 = getelementptr inbounds { i8*, i64, i16, i16, i32 }, { i8*, i64, i16, i16, i32 }* %2, i64 0, i32 1
    %4 = load i64, i64* %3, align 8
    %.not = icmp eq i64 %4, 0
    br i1 %.not, label %oob, label %L23

L23:                                              ; preds = %top
    %5 = bitcast {}* %0 to float**
    %6 = load float*, float** %5, align 8
    %7 = load float, float* %6, align 4
       %8 = sitofp i64 %1 to float
    %.not1823.not = icmp eq i64 %4, 1
   br i1 %.not1823.not, label %L56, label %L50.preheader

L50.preheader:                                    ; preds = %L23
   %9 = fsub float %7, %8
   %10 = call float @llvm.fabs.f32(float %9)
   br label %L50

L50:                                              ; preds = %L50, %L50.preheader
   %11 = phi i64 [ %value_phi424, %L50 ], [ 1, %L50.preheader ]
   %value
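
A smaller function makes the specialization easier to see. In the sketch below, `addone` is a hypothetical one-line generic function; the same source is compiled separately for `Int64` and `Float64`, and the generated LLVM IR is searched for an integer add and a floating-point add respectively. The exact IR text varies between Julia and LLVM versions, so treat the string matching as an illustration rather than a stable interface:

```julia
using InteractiveUtils   # provides code_llvm outside the REPL or a notebook

addone(x) = x + one(x)   # hypothetical generic function, written once

# Ask for the IR of two different specializations of the same method.
ir_int   = sprint(code_llvm, addone, Tuple{Int64})
ir_float = sprint(code_llvm, addone, Tuple{Float64})

int_uses_integer_add = occursin("add i64", ir_int)
float_uses_float_add = occursin("fadd double", ir_float)

println("Int64 version uses an integer add:  ", int_uses_integer_add)
println("Float64 version uses a floating add: ", float_uses_float_add)
```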

## Modern hardware effects

There are lots of little performance quirks in modern computers; I'll just cover two interesting ones here:

In [33]:
@benchmark findclosest2($data, $0.5)

BenchmarkTools.Trial: 10000 samples with 4 evaluations.
 Range (min … max):   6.925 μs … 351.775 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     16.550 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   15.959 μs ±   5.341 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

In [34]:
sorteddata = sort(data)
@benchmark findclosest2($sorteddata, $0.5)

BenchmarkTools.Trial: 10000 samples with 4 evaluations.
 Range (min … max):   6.850 μs … 213.850 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     16.550 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   15.640 μs ±   4.504 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

Unfortunately, this cannot be demonstrated on a hardened cloud platform... because
it's a huge security risk!

* https://meltdownattack.com
* https://discourse.julialang.org/t/psa-microbenchmarks-remember-branch-history/17436

In [35]:
idxs = sortperm(data)
sortedview = @view data[idxs]
@benchmark findclosest2($sortedview, $0.5)

BenchmarkTools.Trial: 10000 samples with 3 evaluations.
 Range (min … max):   7.100 μs … 136.567 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     16.133 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   15.367 μs ±   4.104 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

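The point of the `sortedview` benchmark is that the view yields the same values, in the same sorted order, as a sorted copy, but every access goes through `idxs` and so lands at a scattered position in the original array. The sketch below only checks that relationship; whether the scattered access is measurably slower depends on the array size and the cache hierarchy, so it makes no timing claim:

```julia
data = rand(5000)
idxs = sortperm(data)

sortedcopy = sort(data)          # new array: sorted values stored contiguously
sortedview = @view data[idxs]    # no copy: reads data[idxs[i]] on each access

same_values = sortedview == sortedcopy      # identical sequence of values
shares_data = parent(sortedview) === data   # the view still points at `data`

println("same values: ", same_values, ", shares memory with data: ", shares_data)
```
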
### Memory latencies

| System Event                   | Actual Latency | Scaled Latency |
| ------------------------------ | -------------- | -------------- |
| One CPU cycle                  |     0.4 ns     |     1 s        |
| Level 1 cache access           |     0.9 ns     |     2 s        |
| Level 2 cache access           |     2.8 ns     |     7 s        |
| Level 3 cache access           |      28 ns     |     1 min      |
| Main memory access (DDR DIMM)  |    ~100 ns     |     4 min      |
| Intel Optane memory access     |     <10 μs     |     7 hrs      |
| NVMe SSD I/O                   |     ~25 μs     |    17 hrs      |
| SSD I/O                        |  50–150 μs     | 1.5–4 days     |
| Rotational disk I/O            |    1–10 ms     |   1–9 months   |
| Internet call: SF to NYC       |      65 ms     |     5 years    |
| Internet call: SF to Hong Kong |     141 ms     |    11 years    |

(Taken from: https://www.prowesscorp.com/computer-latency-at-a-human-scale/)
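
The "Scaled Latency" column follows from a single conversion: stretch time so that one 0.4 ns CPU cycle lasts one second. A quick sketch of that arithmetic (the helper name `scaled_seconds` is made up for this example) reproduces two of the rows:

```julia
# Scale chosen so that one CPU cycle (0.4 ns) corresponds to 1 second.
scaled_seconds(actual_seconds) = actual_seconds / 0.4e-9

main_memory_minutes = scaled_seconds(100e-9) / 60                # roughly 4 minutes
sf_to_nyc_years     = scaled_seconds(65e-3) / (365.25 * 86_400)  # roughly 5 years

println("main memory access: ", round(main_memory_minutes, digits=1), " scaled minutes")
println("SF to NYC:          ", round(sf_to_nyc_years, digits=1), " scaled years")
```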

### Key takeaways
* Measure, measure, measure!
* Get familiar with the [Performance Tips](https://docs.julialang.org/en/v1/manual/performance-tips/)
* Don't be afraid of `@code_typed`/`@code_warntype` and `@code_llvm`