# JIT compilation

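Julia is JIT-compiled at the language level: each function is compiled to native code the first time it is called with a given combination of argument types. We will use `BenchmarkTools` for timing: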

In [1]:
using BenchmarkTools


`:` creates ranges, both integer and floating-point. Note that Julia uses `1`-based indexing and that ranges include their right endpoint:

In [2]:
N = 100000
# : is used to create ranges
a = collect(1.:N)
typeof(a)


In [3]:
a[1:5]

5-element Array{Float64,1}:
 1.0
 2.0
 3.0
 4.0
 5.0

In [4]:
a[end-4:end]

5-element Array{Float64,1}:
  99996.0
  99997.0
  99998.0
  99999.0
 100000.0
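The range and indexing rules used above can be checked on a smaller example. This is a minimal sketch; the variable names `r`, `f`, and `v` are only illustrative.

```julia
r = 1:5              # integer range, both endpoints included
f = 1.0:0.5:3.0      # floating-point range with an explicit step of 0.5
v = collect(f)       # materialize the range into a Vector{Float64}

@assert length(r) == 5 && first(r) == 1 && last(r) == 5
@assert v == [1.0, 1.5, 2.0, 2.5, 3.0]
@assert v[1] == 1.0                  # indexing starts at 1
@assert v[end-1:end] == [2.5, 3.0]   # `end` is the last valid index
```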

In [5]:
@btime sum(a)

  12.001 μs (1 allocation: 16 bytes)


5.00005e9

In [6]:
@btime reduce(+, a)

  12.154 μs (1 allocation: 16 bytes)


5.00005e9

In [7]:
function sum_array(a)
    result = 0.

    for el in a
        result += el
    end
    return result
end

sum_array (generic function with 1 method)

In [8]:
@assert sum_array(a) == sum(a)

In [9]:
@btime sum_array(a)

  89.134 μs (1 allocation: 16 bytes)


5.00005e9
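In the measurements above the hand-written loop is slower than `sum`. One plausible reason is that a plain loop has to add floating-point numbers strictly in order, which blocks SIMD vectorization. As a sketch (the name `sum_array_simd` is hypothetical, and no timing is claimed here), the loop can be annotated with `@inbounds` and `@simd` so the compiler may reorder the additions:

```julia
function sum_array_simd(a)
    result = zero(eltype(a))          # accumulator with the same type as the elements
    @inbounds @simd for i in eachindex(a)
        result += a[i]                # @simd allows these additions to be reassociated
    end
    return result
end

a_small = collect(1.0:100_000)
@assert sum_array_simd(a_small) ≈ sum(a_small)
```

Whether this closes the gap to `sum` depends on the hardware and the Julia version, so it is worth checking with `@btime` on your own machine.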

You can always inspect the code at different levels of compilation (for example with `@code_lowered`, `@code_typed`, `@code_llvm`, and `@code_native`):

In [10]:
@code_typed sum_array(a)

CodeInfo(
1 ── %1  = Base.bitcast(UInt64, 1)::UInt64
│    %2  = Base.bitcast(UInt64, 1)::UInt64
│    %3  = Base.sub_int(%1, %2)::UInt64
│    %4  = Base.arraylen(a)::Int64
│    %5  = Base.sle_int(0, %4)::Bool
│    %6  = Base.bitcast(UInt64, %4)::UInt64
│    %7  = Base.ult_int(%3, %6)::Bool
│    %8  = Base.and_int(%5, %7)::Bool
└───       goto #3 if not %8
2 ── %10 = Base.arrayref(false, a, 1)::Float64
│    %11 = Base.add_int(1, 1)::Int64
└───       goto #4
3 ──       goto #4
4 ┄─ %14 = φ (#2 => false, #3 => true)::Bool
│    %15 = φ (#2 => %10)::Float64
│    %16 = φ (#2 => %11)::Int64
└───       goto #5
5 ──       %18 = Base.not_int(%14)::Bool
└───       goto #11 if not
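The typed listing is one of several views of the same function. The sketch below uses a hypothetical toy function `add_one` to show that Julia compiles a separate specialization per argument type. `Base.return_types` is an internal reflection helper, used here only for illustration.

```julia
using InteractiveUtils  # provides the @code_* macros outside the REPL or a notebook

add_one(x) = x + 1      # hypothetical toy function

# The inferred return type follows the argument type of each specialization
@assert Base.return_types(add_one, (Int,)) == [Int]
@assert Base.return_types(add_one, (Float64,)) == [Float64]

@code_llvm add_one(1)        # prints the LLVM IR for the Int specialization
@code_native add_one(1.0)    # prints the native assembly for the Float64 specialization
```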

# Multiple dispatch

Custom types are first-class citizens, just like the built-in ones:

In [11]:
abstract type Geometry end

mutable struct Point{N<:Number} <: Geometry
    x::N
    y::N
end

Note that `Point` is a *parametric* type:

In [12]:
Point(1,2)

Point{Int64}(1, 2)

In [13]:
Point(1., 2.)

Point{Float64}(1.0, 2.0)
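Because both fields share the single parameter `N`, the default constructor rejects arguments of different types. The sketch below uses a separate, hypothetical type `Point2` so that it does not clash with `Point`. It adds an illustrative convenience constructor that promotes mixed arguments to a common type.

```julia
struct Point2{N<:Number}
    x::N
    y::N
end

# With only the default constructor, mixed argument types raise a MethodError
threw = try
    Point2(1, 2.0)
    false
catch err
    err isa MethodError
end
@assert threw

# Hypothetical outer constructor: promote both arguments to a common type first
Point2(x::Number, y::Number) = Point2(promote(x, y)...)

@assert Point2(1, 2.0) isa Point2{Float64}
```

Arguments that already share a type still go to the more specific default constructor, so the new method does not recurse.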

We can make it behave almost like a number:

In [14]:
import Base: +, -, /, *

In [15]:
+(a::Point, b::Point) = Point(a.x + b.x, a.y + b.y)
-(a::Point, b::Point) = Point(a.x - b.x, a.y - b.y)
/(a::Point, f::Number) = Point(a.x / f, a.y / f)

/ (generic function with 119 methods)

In [16]:
Point(1., 2.) + Point(3, 4)

Point{Float64}(4.0, 6.0)

In [17]:
Point(1., 2.) - Point(3, 4)

Point{Float64}(-2.0, -2.0)

In [18]:
(Point(1., 2.) + Point(3, 4)) / 2

Point{Float64}(2.0, 3.0)
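The import list above also names `*`, which is not given a method here. As a sketch of how scalar multiplication could be added in the same style, the example below uses a separate, hypothetical immutable type `Vec2` and defines `*` in both argument orders:

```julia
struct Vec2{T<:Number}
    x::T
    y::T
end

# Scale both coordinates; the second method makes `2 * v` work as well as `v * 2`
Base.:*(v::Vec2, f::Number) = Vec2(v.x * f, v.y * f)
Base.:*(f::Number, v::Vec2) = v * f

@assert Vec2(1.0, 2.0) * 2 == Vec2(2.0, 4.0)
@assert 2 * Vec2(1.0, 2.0) == Vec2(2.0, 4.0)
```

Because `Vec2` is an immutable struct, the default `==` compares field values. For a `mutable struct` such as `Point`, `==` falls back to object identity unless a method for it is defined.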

Or create an array of them:

In [19]:
[Point(a, a+1) for a in 1:10]

10-element Array{Point{Int64},1}:
 Point{Int64}(1, 2)
 Point{Int64}(2, 3)
 Point{Int64}(3, 4)
 Point{Int64}(4, 5)
 Point{Int64}(5, 6)
 Point{Int64}(6, 7)
 Point{Int64}(7, 8)
 Point{Int64}(8, 9)
 Point{Int64}(9, 10)
 Point{Int64}(10, 11)

Generic programming works nicely:

In [20]:
using Statistics
mean([Point(a, a+1) for a in 1:10])

Point{Float64}(5.5, 6.5)
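`mean` works here because it only needs `+` between elements and `/` by a number, both of which were defined above. The sketch below repeats this with a separate, hypothetical immutable type `P2`, which lets the results be compared with `==`:

```julia
using Statistics

struct P2{T<:Number}
    x::T
    y::T
end

# The only two operations that sum and mean need from the element type
Base.:+(a::P2, b::P2) = P2(a.x + b.x, a.y + b.y)
Base.:/(a::P2, f::Number) = P2(a.x / f, a.y / f)

pts = [P2(i, i + 1) for i in 1:10]

@assert sum(pts) == P2(55, 65)
@assert sum(pts) / length(pts) == P2(5.5, 6.5)
@assert mean(pts) == P2(5.5, 6.5)
```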

Functions in Julia have *methods* (i.e. definitions for particular combinations of argument types):

In [21]:
methods(+)
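To make the idea of methods concrete, here is a minimal sketch with a hypothetical function `describe` that has three methods. Julia picks the most specific method that matches the argument types.

```julia
describe(x::Integer) = "integer"
describe(x::AbstractFloat) = "float"
describe(x) = "something else"      # fallback for any other type

@assert describe(1) == "integer"
@assert describe(1.0) == "float"
@assert describe("hi") == "something else"
@assert length(methods(describe)) == 3
```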

# Performance

Julia is generally faster than, or at least on par with, NumPy (even with Numba's JIT). Linear algebra and high-performance arrays are part of the language and its standard library:

In [22]:
N = 1000
a = randn(Float64, (N, N))
b = randn(Float64, (N, N));

Matrix product (`a'b` multiplies the adjoint of `a` by `b`):

In [23]:
@btime a'b

  11.793 ms (3 allocations: 7.63 MiB)


1000×1000 Array{Float64,2}:
   36.3836     2.68183  -18.8163    …   46.0052   -17.0937     -1.85315
  -14.9754    31.3802    20.6415        21.6241    33.3622    -28.7524
  -23.6128   -70.6184   -31.2678        27.1373    -5.25556     9.67235
  -26.7845   -41.4347    31.7274        18.0297   -32.3872    -17.052
 -123.482    -39.9744    46.5933       -20.5944   -61.9957     39.4594
  -15.7643    -7.89918    0.87401   …   19.5444   -22.5482     -9.8964
   27.7172   -29.3661    -1.12873       -1.76254   -5.50504   -18.9882
  -14.0425   -37.7022    -0.609573      45.9838   -36.3091     55.8736
   45.6109    12.7233     8.83768      -31.4803    -2.19259    -1.47937
  -40.6316     6.48881    1.23287       12.9021   -22.5957      0.955281
   35.0708    -5.0211    -2.86876   …   -9.09642   11.8081     23.8518
    8.107     36.0158   -12.9676        26.3615    44.8587     19.0931
  -50.2948   -59.2748   -28.9821        44.0984    47.9837    -22.355
    ⋮                               ⋱         
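The `a'b` notation can be checked by hand on small inputs. In this sketch the names `A2`, `B2`, `u`, and `w` are illustrative. For matrices it gives the transposed matrix product, and for vectors it reduces to the scalar dot product.

```julia
using LinearAlgebra

A2 = [1.0 2.0; 3.0 4.0]
B2 = [5.0 6.0; 7.0 8.0]

# For real matrices the adjoint is the transpose
@assert A2'B2 == transpose(A2) * B2
@assert A2'B2 == [26.0 30.0; 38.0 44.0]

u = [1.0, 2.0, 3.0]
w = [4.0, 5.0, 6.0]

# For vectors, u'w is the scalar dot product
@assert u'w == dot(u, w) == 32.0
```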

# Vectorization and fusion

Let's define a scalar function:

In [24]:
relu(a::Number) = max(0, a)

relu (generic function with 1 method)

And vectorize (i.e. *broadcast*) it:

In [25]:
@btime relu.(a)

  602.067 μs (5 allocations: 7.63 MiB)


1000×1000 Array{Float64,2}:
 0.0        0.0       0.760295   0.417175  …  1.29851     0.576061  0.0
 0.0        0.0       0.815669   0.0          0.0         0.956743  1.89822
 0.0        0.0       0.0        0.445039     2.07947     0.0       0.0
 2.62331    0.781411  0.0        0.695569     0.0         0.895726  0.521313
 0.220799   0.0       0.171349   0.169255     0.0         0.721496  0.0
 0.0        0.420295  0.0        0.0       …  0.551026    0.0       0.0
 0.0        0.0       0.529636   0.459494     0.0         0.469946  0.0
 0.934899   2.34038   0.0        0.0          0.0         1.47002   0.0821255
 0.0        0.584367  0.290647   0.640149     1.61115     0.0       0.0
 2.04438    0.219168  0.0181634  0.0          0.256746    0.0       0.123959
 0.0        0.0       0.686056   0.0       …  1.09091     0.0       0.64629
 0.323789   0.097196  1.24745    0.0          0.0         0.348634  0.0
 0.0        1.40279   3.26789    1.75513      0.0         0.237101  0.479302
 ⋮     

Of course, we do not need a separate function for this, because broadcasting works directly on built-in functions:

In [26]:
max.(0, a)

1000×1000 Array{Float64,2}:
 0.0        0.0       0.760295   0.417175  …  1.29851     0.576061  0.0
 0.0        0.0       0.815669   0.0          0.0         0.956743  1.89822
 0.0        0.0       0.0        0.445039     2.07947     0.0       0.0
 2.62331    0.781411  0.0        0.695569     0.0         0.895726  0.521313
 0.220799   0.0       0.171349   0.169255     0.0         0.721496  0.0
 0.0        0.420295  0.0        0.0       …  0.551026    0.0       0.0
 0.0        0.0       0.529636   0.459494     0.0         0.469946  0.0
 0.934899   2.34038   0.0        0.0          0.0         1.47002   0.0821255
 0.0        0.584367  0.290647   0.640149     1.61115     0.0       0.0
 2.04438    0.219168  0.0181634  0.0          0.256746    0.0       0.123959
 0.0        0.0       0.686056   0.0       …  1.09091     0.0       0.64629
 0.323789   0.097196  1.24745    0.0          0.0         0.348634  0.0
 0.0        1.40279   3.26789    1.75513      0.0         0.237101  0.479302
 ⋮     

This may not sound impressive until you broadcast and *fuse* nested calls into a single loop (which is generally not possible in Python with NumPy):

In [27]:
@btime relu.(relu.(a))

  639.813 μs (7 allocations: 7.63 MiB)


1000×1000 Array{Float64,2}:
 0.0        0.0       0.760295   0.417175  …  1.29851     0.576061  0.0
 0.0        0.0       0.815669   0.0          0.0         0.956743  1.89822
 0.0        0.0       0.0        0.445039     2.07947     0.0       0.0
 2.62331    0.781411  0.0        0.695569     0.0         0.895726  0.521313
 0.220799   0.0       0.171349   0.169255     0.0         0.721496  0.0
 0.0        0.420295  0.0        0.0       …  0.551026    0.0       0.0
 0.0        0.0       0.529636   0.459494     0.0         0.469946  0.0
 0.934899   2.34038   0.0        0.0          0.0         1.47002   0.0821255
 0.0        0.584367  0.290647   0.640149     1.61115     0.0       0.0
 2.04438    0.219168  0.0181634  0.0          0.256746    0.0       0.123959
 0.0        0.0       0.686056   0.0       …  1.09091     0.0       0.64629
 0.323789   0.097196  1.24745    0.0          0.0         0.348634  0.0
 0.0        1.40279   3.26789    1.75513      0.0         0.237101  0.479302
 ⋮     
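Fusion applies to any chain of dotted calls and operators, not just nested `relu`. The sketch below uses small illustrative inputs: the whole right-hand side is evaluated in one loop, and `.=` writes the result into a preallocated array.

```julia
relu(a::Number) = max(0, a)

x = [-1.0, 0.5, -2.0, 3.0]
y = similar(x)

# One fused loop: no temporary arrays for 2 .* x or 2 .* x .+ 1
y .= relu.(2 .* x .+ 1)
@assert y == [0.0, 2.0, 0.0, 7.0]

# The @. macro adds the dots to every call and operator in the expression
@assert (@. relu(2 * x + 1)) == y
```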

# Custom types

Let's create a type for a one-hot encoded matrix.

In [28]:
import Base: size, getindex

In [29]:
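mutable struct OneHot <: AbstractArray{AbstractFloat, 2}
    nclasses::Int64
    classes::Array{Int64, 1}

    # Inner constructor: reject labels larger than the declared number of classes
    function OneHot(ncls, cls)
        if ncls < maximum(cls)
            throw(ErrorException("actual number of classes $(maximum(cls)) > $ncls"))
        end
        return new(ncls, cls)
    end
end

# One row per sample, one column per class
size(o::OneHot) = (length(o.classes), o.nclasses)

# Single index: return the class label of sample `i`
function Base.getindex(o::OneHot, i)
    o.classes[i]
end

# Two indices: 1.0 if sample `i` belongs to class `j`, otherwise 0.0
function Base.getindex(o::OneHot, i, j)
    Float64(o.classes[i] == j)
end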


In [30]:
OneHot(2, [1, 3, 1])

LoadError: actual number of classes 3 > 2

In [31]:
o = OneHot(2, [1, 2, 1])

3×2 OneHot:
 1.0  0.0
 0.0  1.0
 1.0  0.0

In [32]:
o[2]

2

In [33]:
A = [[1, 2, 3] [4, 5, 6] [7, 8, 9]]
v = [1, 2, 3]

3-element Array{Int64,1}:
 1
 2
 3

In [34]:
A

3×3 Array{Int64,2}:
 1  4  7
 2  5  8
 3  6  9

In [35]:
A * o

3×2 Array{Any,2}:
  8.0  4.0
 10.0  5.0
 12.0  6.0

In [36]:
@btime A * o

  97.277 ns (7 allocations: 224 bytes)


3×2 Array{Any,2}:
  8.0  4.0
 10.0  5.0
 12.0  6.0

In [37]:
o_ar = [[1., 0, 1] [0, 1, 0]]

3×2 Array{Float64,2}:
 1.0  0.0
 0.0  1.0
 1.0  0.0

In [38]:
A * o_ar

3×2 Array{Float64,2}:
  8.0  4.0
 10.0  5.0
 12.0  6.0

In [39]:
@btime A * o_ar

  171.811 ns (5 allocations: 352 bytes)


3×2 Array{Float64,2}:
  8.0  4.0
 10.0  5.0
 12.0  6.0
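The `Array{Any,2}` result of `A * o` most likely stems from the abstract element type `AbstractFloat` declared for `OneHot`. As a sketch, and not a change to the type above, here is a hypothetical variant `OneHotF64` with a concrete element type. It defines only the two-index `getindex`, so linear indexing follows the usual array convention.

```julia
struct OneHotF64 <: AbstractArray{Float64, 2}
    nclasses::Int
    classes::Vector{Int}

    # Same validation as OneHot, with a more specific exception type
    function OneHotF64(ncls, cls)
        if ncls < maximum(cls)
            throw(ArgumentError("actual number of classes $(maximum(cls)) > $ncls"))
        end
        return new(ncls, cls)
    end
end

Base.size(o::OneHotF64) = (length(o.classes), o.nclasses)

# Only the Cartesian method is defined; linear indices fall back to it
Base.getindex(o::OneHotF64, i::Int, j::Int) = Float64(o.classes[i] == j)

o64 = OneHotF64(2, [1, 2, 1])
A = [1 4 7; 2 5 8; 3 6 9]

@assert eltype(A * o64) == Float64
@assert A * o64 == [8.0 4.0; 10.0 5.0; 12.0 6.0]
@assert o64[2] == 0.0   # linear index 2 is element (2, 1)
```

With a concrete element type the product is a `Float64` matrix. Any timing difference from the original `OneHot` would have to be measured with `@btime`.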