# Speed
One of Julia's biggest advantages is its fast runtime, which does not come at the cost of a steep learning curve (in contrast to C). We can call other languages from within Julia and use the `BenchmarkTools` package to obtain accurate runtime estimates for each of them.

In [2]:
using BenchmarkTools
using Libdl

Below, we will compare 8 implementations of a simple function to sum the values in a vector:
* C hand-written
* C hand-written (with SIMD)
* Python built-in
* Python numpy
* Python hand-written
* Julia built-in
* Julia hand-written
* Julia hand-written (with SIMD)

SIMD = Single Instruction, Multiple Data (a form of parallelism in which a single CPU instruction carries out the same operation on several data points at once)

The test data will be a vector of 10 million points, sampled uniformly at random between 0 and 1. Since each point has a mean of 0.5, the expected sum is about 5e6.

In [3]:
a = rand(10^7)

10000000-element Vector{Float64}:
 0.5935398849264476
 0.23994742230378574
 0.08119081157407271
 0.6136316802900761
 0.7297958336412042
 0.636462438511688
 0.9667670690076329
 0.11866606250201062
 0.9350907377571541
 0.29406742624604565
 0.550551935542502
 0.4182250028457005
 0.772469868246459
 ⋮
 0.23565738236694933
 0.030347460542345828
 0.07267701622156353
 0.22729196760881665
 0.058697647900362804
 0.11304761384773554
 0.10758199517908162
 0.9984043059213005
 0.3237884470692286
 0.6975879478227242
 0.9625706804902854
 0.6315852011793057
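
As a quick sanity check on data like this, the sum of `n` uniform samples on [0, 1) should be close to `n / 2`. The sketch below uses a smaller, seeded vector; the name `x`, the seed, and the 1% tolerance are illustrative choices, not part of the original notebook.

```julia
using Random

Random.seed!(1234)           # fixed seed so the check is repeatable (arbitrary choice)
n = 10^6                     # smaller than the 10^7 points used in the benchmark
x = rand(n)                  # uniform samples in [0, 1)

expected = n / 2             # each sample has mean 0.5
observed = sum(x)

# The relative deviation shrinks like 1/sqrt(n), so 1% is a generous tolerance.
println("expected ≈ ", expected, ", observed = ", observed)
println("within 1%: ", isapprox(observed, expected; rtol=0.01))
```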

For each method, we will write the shortest runtime obtained (in milliseconds) to a dict

In [5]:
performance = Dict()

Dict{Any, Any}()
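
Keeping the minimum over repeated timings is a common way to reduce noise, because interference from other processes can only make a run slower. The idea can be sketched with Base Julia's `@elapsed` alone. The helper `min_time_ms` and the dict `timings` are hypothetical, and the result is much cruder than what `BenchmarkTools` reports.

```julia
# Hypothetical helper: time f(x) several times and keep the fastest run, in ms.
function min_time_ms(f, x; samples=5)
    f(x)                                  # warm-up call so compilation is not timed
    best = Inf
    for _ in 1:samples
        t = @elapsed f(x)                 # elapsed wall-clock time in seconds
        best = min(best, t)
    end
    return best * 1e3                     # seconds -> milliseconds
end

timings = Dict{String, Float64}()
timings["Julia built-in (rough)"] = min_time_ms(sum, rand(10^6))
println(timings)
```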

## C
### C hand-written
This C code loops over the `n` elements of the array `X`, accumulates them in `s`, and returns the total. It is taken from the JuliaAcademy Introduction to Julia course (see [this](https://github.com/JuliaAcademy/Introduction-to-Julia/blob/main/9%20-%20Julia%20is%20fast.ipynb) notebook)

In [4]:
using Libdl
C_code = """
#include <stddef.h>
double c_sum(size_t n, double *X) {
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) {
        s += X[i];
    }
    return s;
}
"""

const Clib = tempname()   # make a temporary file


# compile to a shared library by piping C_code to gcc
# (works only if you have gcc installed):

open(`gcc -fPIC -O3 -msse3 -xc -shared -o $(Clib * "." * Libdl.dlext) -`, "w") do f
    print(f, C_code) 
end

# define a Julia function that calls the C function:
c_sum(X::Array{Float64}) = ccall(("c_sum", Clib), Float64, (Csize_t, Ptr{Float64}), length(X), X)

c_sum (generic function with 1 method)
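
The `ccall` form used above is: function (optionally paired with a library), return type, a tuple of argument types, then the arguments themselves. A smaller sketch of the same pattern calls `strlen` from the C standard library, which is assumed to be available to the Julia process and needs no compilation step.

```julia
# Form: ccall(function, return type, (argument types...,), arguments...)
# strlen takes a C string and returns its length as a size_t.
n = ccall(:strlen, Csize_t, (Cstring,), "hello")

println(Int(n))   # 5
```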

In [9]:
c_bench = @benchmark c_sum($a)
performance["C hand-written"] = minimum(c_bench.times) / 1e6 # in milliseconds

9.304666

### C hand-written (with SIMD)
We use the same code as written above, but add the `-ffast-math` flag, which allows the compiler to reorder floating-point operations and therefore to vectorise the loop with SIMD instructions

In [10]:
const Clib_fastmath = tempname()   # make a temporary file

# The same as above but with a -ffast-math flag added
open(`gcc -fPIC -O3 -msse3 -xc -shared -ffast-math -o $(Clib_fastmath * "." * Libdl.dlext) -`, "w") do f
    print(f, C_code) 
end

# define a Julia function that calls the C function:
c_sum_fastmath(X::Array{Float64}) = ccall(("c_sum", Clib_fastmath), Float64, (Csize_t, Ptr{Float64}), length(X), X)

c_sum_fastmath (generic function with 1 method)

In [11]:
c_simd_bench = @benchmark c_sum_fastmath($a)
performance["C hand-written SIMD"] = minimum(c_simd_bench.times) / 1e6

1.778125
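
The compiler needs explicit permission to vectorise this loop because floating-point addition is not associative: summing in a different order can change the last few bits of the result. The Julia sketch below illustrates this. The helper `sum_four_lanes` is hypothetical and only imitates, in scalar code, how a 4-wide SIMD reduction keeps several partial sums.

```julia
# Reordering floating-point additions can change the result.
left  = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
println(left == right)   # false

# Hypothetical helper: sum with four independent accumulators ("lanes").
function sum_four_lanes(x::Vector{Float64})
    s1 = s2 = s3 = s4 = 0.0
    n = length(x)
    i = 1
    @inbounds while i + 3 <= n            # process four elements per iteration
        s1 += x[i]
        s2 += x[i + 1]
        s3 += x[i + 2]
        s4 += x[i + 3]
        i += 4
    end
    @inbounds while i <= n                # handle any leftover elements
        s1 += x[i]
        i += 1
    end
    return (s1 + s2) + (s3 + s4)          # combine the partial sums at the end
end

x = rand(10^6)
println(isapprox(sum_four_lanes(x), sum(x)))
```

The four partial sums do not depend on each other, which is what lets hardware update them in a single instruction.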

## Python
We can use the `PyCall` package to call Python commands within Julia

In [14]:
using PyCall

### Python built-in

In [16]:
pysum = pybuiltin("sum")
python_built_in_bench = @benchmark $pysum($a)
performance["Python built-in"] = minimum(python_built_in_bench.times) / 1e6

984.474125

### Python numpy
`numpy` should use SIMD if it is available on your hardware, so we are giving Python the best shot at performing well

In [17]:
using Conda
Conda.add("numpy")

[32m[1m   Resolving[22m[39m package versions...
[32m[1m    Updating[22m[39m `~/.julia/environments/v1.7/Project.toml`
 [90m [8f4d0f93] [39m[92m+ Conda v1.7.0[39m
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.7/Manifest.toml`
┌ Info: Running `conda install -y numpy` in root environment
└ @ Conda /Users/kmlm215/.julia/packages/Conda/x2UxR/src/Conda.jl:127


Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.



In [19]:
numpy_sum = pyimport("numpy").sum   # property access; string indexing on a PyObject is deprecated in recent PyCall
python_numpy_bench = @benchmark $numpy_sum($a)
performance["Python numpy"] = minimum(python_numpy_bench.times) / 1e6

1.887292

### Python hand-written

In [20]:
py"""
def py_sum(A):
    s = 0.0
    for a in A:
        s += a
    return s
"""

sum_py = py"py_sum"
python_hand_written_bench = @benchmark $sum_py($a)
performance["Python hand-written"] = minimum(python_hand_written_bench.times) / 1e6

1126.752292

## Julia

### Julia built-in

In [23]:
julia_built_in_bench = @benchmark sum($a)
performance["Julia built-in"] = minimum(julia_built_in_bench.times) / 1e6

1.829333

### Julia hand-written

In [24]:
function juliasum(A)   
    s = 0.0
    for a in A
        s += a
    end
    s
end
julia_hand_written_bench = @benchmark $juliasum($a)
performance["Julia hand-written"] = minimum(julia_hand_written_bench.times) / 1e6

9.312375

### Julia hand-written (with SIMD)

In [25]:
function juliasum_simd(A)   
    s = 0.0
    @simd for a in A
        s += a
    end
    s
end
julia_hand_written_simd_bench = @benchmark $juliasum_simd($a)
performance["Julia hand-written SIMD"] = minimum(julia_hand_written_simd_bench.times) / 1e6

2.352334
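
The `@simd` macro gives the compiler permission to reorder the additions, so the result can differ from the sequential loop in the last few bits. A minimal check, assuming a smaller random vector than the benchmark uses (the names `sum_sequential`, `sum_simd` and `x` are illustrative):

```julia
function sum_sequential(A)
    s = 0.0
    for a in A
        s += a
    end
    return s
end

function sum_simd(A)
    s = 0.0
    @simd for a in A          # allows the additions to be reordered and vectorised
        s += a
    end
    return s
end

x = rand(10^6)
seq  = sum_sequential(x)
simd = sum_simd(x)

println("absolute difference: ", abs(seq - simd))
println("approximately equal: ", isapprox(seq, simd; rtol=1e-8))
```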

## Results
If we sort the runtimes (in milliseconds) in ascending order, we see that:

1. The Julia built-in function is as quick as the C hand-written function with SIMD
2. The Julia hand-written function is as quick as the C hand-written function
3. The Julia hand-written function with SIMD is nearly as fast as numpy
4. The Python built-in and hand-written functions are 2-3 orders of magnitude slower than Julia

In [26]:
for (key, value) in sort(collect(performance), by=last)
    println(rpad(key, 25, "."), lpad(round(value; digits=1), 6, "."))
end

C hand-written SIMD.........1.8
Julia built-in..............1.8
Python numpy................1.9
Julia hand-written SIMD.....2.4
C hand-written..............9.3
Julia hand-written..........9.3
Python built-in...........984.5
Python hand-written......1126.8
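
To make the relative differences explicit, each runtime can be divided by the fastest one. The sketch below copies the minimum times (in milliseconds) reported by the cells above into a fresh dict; the name `times_ms` is illustrative.

```julia
# Minimum runtimes in milliseconds, copied from the benchmark outputs above.
times_ms = Dict(
    "C hand-written SIMD"     => 1.778125,
    "Julia built-in"          => 1.829333,
    "Python numpy"            => 1.887292,
    "Julia hand-written SIMD" => 2.352334,
    "C hand-written"          => 9.304666,
    "Julia hand-written"      => 9.312375,
    "Python built-in"         => 984.474125,
    "Python hand-written"     => 1126.752292,
)

fastest = minimum(values(times_ms))

# Print each method's slowdown relative to the fastest method.
for (name, t) in sort(collect(times_ms), by=last)
    ratio = t / fastest
    println(rpad(name, 25, "."), lpad(round(ratio; digits=1), 7, "."), "x")
end
```

On these numbers the Python built-in sum is several hundred times slower than the Julia built-in sum.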


## Conclusions
Julia allows us to write code that reads like Python and runs like C. Python has additional packages that are optimised for speed, e.g. numpy, but these are much harder to tweak for specific purposes because much of their code is not Python (35% of the numpy repo is C). In Julia, we can tweak existing code for our purpose, or write our own code from scratch, and be confident that it will still run quickly.