# Programming language interoperability (Interop)

## Python

In [1]:
using PythonCall

    CondaPkg Found dependencies: /scratch/hpc-lco-usrtr/bauerc/JuliaUCL24/CondaPkg.toml
    CondaPkg Found dependencies: /scratch/hpc-lco-usrtr/.julia_ucl/packages/PythonCall/wXfah/CondaPkg.toml
    CondaPkg Dependencies already up to date


In [2]:
@pyeval "3+3"

Python: 6
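The `Python:` prefix shows that `@pyeval` returns a Python object wrapper (`Py`) rather than a Julia integer. A minimal sketch of turning it into a native Julia value with `pyconvert`, assuming the same PythonCall setup as above (the names `six_py` and `six` are only illustrative):

```julia
using PythonCall

six_py = @pyeval "3+3"          # a `Py` wrapper around the Python int 6
six = pyconvert(Int, six_py)    # convert to a native Julia Int

println(typeof(six))
println(six + 1)                # ordinary Julia arithmetic on the converted value
```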

In [3]:
np = pyimport("numpy")

Python: <module 'numpy' from '/scratch/hpc-lco-usrtr/bauerc/JuliaUCL24/.CondaPkg/env/lib/python3.12/site-packages/numpy/__init__.py'>

In [4]:
np.linalg.eigvals(np.random.rand(5,5))

Python:
array([ 2.35826646+0.j        ,  0.26870637+0.40290013j,
        0.26870637-0.40290013j, -0.18368499+0.23339492j,
       -0.18368499-0.23339492j])

In [5]:
M = rand(5,5)
np.linalg.eigvals(M)

Python:
array([ 2.80234277+0.j        ,  0.38177933+0.j        ,
        0.16719254+0.j        , -0.35657771+0.18166077j,
       -0.35657771-0.18166077j])

In [6]:
@pyexec """
global sinpi, np
import numpy as np

def sinpi(x):
    return np.sin(np.pi * x)
"""

In [7]:
py_sinpi(x) = pyconvert(Float64, @pyeval("sinpi")(x))

py_sinpi (generic function with 1 method)

In [8]:
py_sinpi(10)

-1.2246467991473533e-15
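The result is tiny but not exactly zero because the Python definition first multiplies by a floating-point approximation of π and then takes the sine. The same effect can be reproduced in plain Julia; `naive_sinpi` below is a hypothetical helper written only for this comparison:

```julia
# Mimics the Python definition: multiply by the Float64 approximation of π first.
naive_sinpi(x) = sin(pi * x)

println(naive_sinpi(10))   # tiny but nonzero, due to rounding in pi * x
println(sinpi(10))         # Julia's built-in sinpi is exactly zero at integers
```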

In [9]:
using BenchmarkTools
@btime py_sinpi(10);
@btime sinpi(10); # built-in Julia function

  2.181 μs (4 allocations: 80 bytes)
  1.492 ns (0 allocations: 0 bytes)


## C

In [10]:
c_code = """
#include <stddef.h>
double c_sum(size_t n, double *X) {
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) {
        s += X[i];
    }
    return s;
}
""";

Compile to a shared library by piping `c_code` to gcc:

In [11]:
using Libdl
const Clib = tempname() * "." * Libdl.dlext

open(`gcc -fPIC -O3 -msse3 -xc -shared -o $Clib -`, "w") do f
    print(f, c_code)
end

In [12]:
Clib

"/tmp/jl_1yL8ZQlaPO.so"

Binding the function from the shared library:

In [13]:
c_sum(X::Array{Float64}) = @ccall Clib.c_sum(length(X)::Csize_t, X::Ptr{Float64})::Float64

c_sum (generic function with 1 method)
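The `@ccall` form reads as `library.function(arg::CType, ...)::ReturnType`. As a smaller sketch that needs no compiler, the same syntax can call `strlen` from the C standard library, assuming a platform where libc is already loaded into the Julia process (the variable names are illustrative):

```julia
# No library prefix: the symbol is looked up in the libraries already loaded.
s = "interop"
n = @ccall strlen(s::Cstring)::Csize_t

println(n)
```

Here `Cstring` asks Julia to pass a NUL-terminated `char *`, just as `Ptr{Float64}` passes the array's data pointer in `c_sum`.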

In [14]:
c_sum(rand(10))

6.183885800393947

In [15]:
x = rand(10)
@btime c_sum($x);

  5.620 ns (0 allocations: 0 bytes)


## Mixing Julia, Python, and C

Julia (`abs`), Python/numpy (`py_sinpi`), C (`c_sum`)

In [16]:
x = rand(10);

In [17]:
abs(py_sinpi(c_sum(x)))

0.9551166719720826

In [18]:
@btime abs(py_sinpi(c_sum($x)));

  2.176 μs (4 allocations: 80 bytes)


See [JuliaInterop](https://github.com/JuliaInterop) for more, such as [RCall.jl](https://github.com/JuliaInterop/RCall.jl), [JavaCall.jl](https://github.com/JuliaInterop/JavaCall.jl), and [MATLAB.jl](https://github.com/JuliaInterop/MATLAB.jl).

# Julia Microbenchmark: Summation

Let's benchmark the `sum` function across several implementations:

$$\mathrm{sum}(x) = \sum_{i=1}^n x_i$$

In [19]:
x = rand(10^7);

In [20]:
sum(x)

5.000422208174056e6

In [21]:
d = Dict() # to store the measurement results

Dict{Any, Any}()

## Python

In [22]:
using BenchmarkTools
using PythonCall

### numpy

In [23]:
np = pyimport("numpy")

Python: <module 'numpy' from '/scratch/hpc-lco-usrtr/bauerc/JuliaUCL24/.CondaPkg/env/lib/python3.12/site-packages/numpy/__init__.py'>

In [24]:
numpy_sum = np.sum

Python: <function sum at 0x1552157da970>

In [25]:
b = @benchmark $numpy_sum($x)

BenchmarkTools.Trial: 1160 samples with 1 evaluation.
 Range (min … max):  4.218 ms … 9.193 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     4.251 ms             ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.291 ms ± 195.489 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

In [26]:
d["Python (numpy)"] = minimum(b.times) / 1e6

4.218236
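BenchmarkTools records each sample in nanoseconds in `b.times`, so dividing the minimum by `1e6` stores milliseconds in `d`. A small sketch of that conversion, using a hypothetical vector of sample times rather than a real benchmark run:

```julia
# Hypothetical sample times in nanoseconds, mimicking the layout of `b.times`.
times_ns = [4.251e6, 4.218236e6, 9.193e6]

min_ms = minimum(times_ns) / 1e6   # nanoseconds -> milliseconds
println(min_ms)
```

The minimum is used because it is the sample least affected by noise from other processes.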

### hand-written

In [27]:
@pyexec """
global mysum

def mysum(a):
    s = 0.0
    for x in a:
        s = s + x
    return s
"""

In [28]:
mysum_py = @pyeval("mysum")

Python: <function mysum at 0x15522026b920>

In [29]:
x_py = pylist(x);

In [30]:
b = @benchmark $mysum_py($x_py)

BenchmarkTools.Trial: 15 samples with 1 evaluation.
 Range (min … max):  337.983 ms … 354.046 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     338.459 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   340.376 ms ±   4.458 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

In [31]:
d["Python (hand-written)"] = minimum(b.times) / 1e6

337.982605

### built-in

In [32]:
# get the Python built-in "sum" function:
pysum = pybuiltins.sum

Python: <built-in function sum>

In [33]:
b = @benchmark $pysum($x_py)

BenchmarkTools.Trial: 76 samples with 1 evaluation.
 Range (min … max):  65.021 ms … 72.007 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     65.697 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   65.838 ms ± 891.848 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

In [34]:
d["Python (built-in)"] = minimum(b.times) / 1e6

65.021184

## C

### hand-written

In [35]:
c_code = """
#include <stddef.h>
double c_sum(size_t n, double *X) {
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) {
        s += X[i];
    }
    return s;
}
""";

In [36]:
# path of the shared library; c_code is piped to gcc in the next cell
# (only works if you have gcc installed)
using Libdl
const Clib = tempname() * "." * Libdl.dlext



"/tmp/jl_9WPYJmRwxE.so"

In [37]:
open(`gcc -fPIC -O3 -msse3 -xc -shared -o $Clib -`, "w") do f
    print(f, c_code)
end

In [38]:
c_sum(X::Array{Float64}) = @ccall Clib.c_sum(length(X)::Csize_t, X::Ptr{Float64})::Float64

c_sum (generic function with 1 method)

In [39]:
c_sum(x) ≈ sum(x)

true
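The comparison uses `≈` rather than `==` because floating-point addition is not associative: the C loop adds strictly left to right, while Julia's `sum` uses pairwise summation, so the two results can differ in the last bits. A minimal illustration with three numbers (`left` and `right` are illustrative names):

```julia
left  = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

println(left == right)   # false: the two groupings round differently
println(left ≈ right)    # true: equal up to floating-point tolerance
```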

In [40]:
b = @benchmark c_sum($x)

BenchmarkTools.Trial: 534 samples with 1 evaluation.
 Range (min … max):  9.130 ms … 12.211 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     9.318 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   9.351 ms ± 233.711 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

In [41]:
d["C"] = minimum(b.times) / 1e6

9.130141

### hand-written (with `-ffast-math`)

In [42]:
const Clib_fastmath = tempname() * "." * Libdl.dlext

# The same as above but with a -ffast-math flag added
open(`gcc -fPIC -O3 -msse3 -xc -shared -ffast-math -o $Clib_fastmath -`, "w") do f
    print(f, c_code) 
end

# define a Julia function that calls the C function:
c_sum_fastmath(X::Array{Float64}) = @ccall Clib_fastmath.c_sum(length(X)::Csize_t, X::Ptr{Float64})::Float64

c_sum_fastmath (generic function with 1 method)

In [43]:
b = @benchmark c_sum_fastmath($x)

BenchmarkTools.Trial: 904 samples with 1 evaluation.
 Range (min … max):  5.301 ms … 7.829 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     5.498 ms             ┊ GC (median):    0.00%
 Time  (mean ± σ):   5.517 ms ± 121.881 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

In [44]:
d["C (fastmath)"] = minimum(b.times) / 1e6

5.301114

## Julia

### built-in

In [45]:
b = @benchmark sum($x)

BenchmarkTools.Trial: 1388 samples with 1 evaluation.
 Range (min … max):  3.545 ms … 4.855 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     3.568 ms             ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.585 ms ± 72.681 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

In [46]:
d["Julia (built-in)"] = minimum(b.times) / 1e6

3.545317

### built-in (with `Vector{Any}`)

In [47]:
x_any = Vector{Any}(x)
b = @benchmark sum($x_any)

BenchmarkTools.Trial: 25 samples with 1 evaluation.
 Range (min … max):  190.826 ms … 213.335 ms  ┊ GC (min … max): 0.00% … 6.37%
 Time  (median):     204.993 ms               ┊ GC (median):    6.68%
 Time  (mean ± σ):   204.161 ms ±   5.532 ms  ┊ GC (mean ± σ):  5.40% ± 2.73%

In [48]:
d["Julia (built-in, Any)"] = minimum(b.times) / 1e6

190.826031
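The slowdown comes from the element type: a `Vector{Any}` stores references to boxed values, so every addition needs a runtime type check and dynamic dispatch, and the run shows garbage-collection time that the `Vector{Float64}` case does not. A short sketch of how to inspect this (`v` and `v_any` are illustrative names, and the array is kept small on purpose):

```julia
v = rand(1000)
v_any = Vector{Any}(v)

println(isconcretetype(eltype(v)))      # true: Float64 values are stored inline
println(isconcretetype(eltype(v_any)))  # false: each element is a boxed value
println(sum(v) ≈ sum(v_any))            # same result, very different machine code
```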

### hand-written

In [49]:
function mysum(A)
    s = zero(eltype(A)) # the correct type of zero for A
    for a in A
        s += a
    end
    return s
end

mysum (generic function with 1 method)

In [50]:
b = @benchmark mysum($x)

BenchmarkTools.Trial: 533 samples with 1 evaluation.
 Range (min … max):  9.142 ms … 10.784 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     9.318 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   9.377 ms ± 145.659 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

In [51]:
d["Julia (hand-written)"] = minimum(b.times) / 1e6

9.142474

### hand-written (with `@fastmath`)

In [52]:
function mysum_fastmath(A)
    s = zero(eltype(A)) # the correct type of zero for A
    @fastmath for a in A
        s += a
    end
    return s
end

mysum_fastmath (generic function with 1 method)

In [53]:
b = @benchmark mysum_fastmath($x)

BenchmarkTools.Trial: 1468 samples with 1 evaluation.
 Range (min … max):  3.335 ms … 5.146 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     3.357 ms             ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.384 ms ± 128.987 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

In [54]:
d["Julia (hand-written, fastmath)"] = minimum(b.times) / 1e6

3.334718
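`@fastmath` lets the compiler reassociate the additions, which is what allows the loop to be vectorized. A narrower way to grant that permission is the `@simd` macro, which only allows reordering of the reduction. The variant below, `mysum_simd`, is a hypothetical sketch and was not part of the benchmark above, so no timing is claimed for it:

```julia
# `mysum_simd` is a hypothetical variant, not part of the original benchmark.
function mysum_simd(A)
    s = zero(eltype(A))
    # @simd permits reordering the additions; @inbounds skips bounds checks.
    @inbounds @simd for i in eachindex(A)
        s += A[i]
    end
    return s
end

v = rand(10^6)
println(mysum_simd(v) ≈ sum(v))
```

As with `@fastmath`, the result may differ from the strictly sequential sum in the last bits, which is why the check uses `≈`.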

## Summary

Minimum runtimes in milliseconds, sorted from fastest to slowest:

In [55]:
for (key, value) in sort(collect(d), by=x->x[2])
    println(rpad(key, 30, "."), lpad(round(value, digits=2), 10, "."))
end

Julia (hand-written, fastmath)......3.33
Julia (built-in)....................3.55
Python (numpy)......................4.22
C (fastmath).........................5.3
C...................................9.13
Julia (hand-written)................9.14
Python (built-in)..................65.02
Julia (built-in, Any).............190.83
Python (hand-written).............337.98
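To read the table as relative slowdowns, each time can be divided by the fastest one. The sketch below copies the measured minimum runtimes (in milliseconds) from the cells above into a fresh dictionary; `results` and `fastest` are illustrative names:

```julia
# Minimum runtimes in milliseconds, copied from the measurements above.
results = Dict(
    "Julia (hand-written, fastmath)" => 3.334718,
    "Julia (built-in)"               => 3.545317,
    "Python (numpy)"                 => 4.218236,
    "C (fastmath)"                   => 5.301114,
    "C"                              => 9.130141,
    "Julia (hand-written)"           => 9.142474,
    "Python (built-in)"              => 65.021184,
    "Julia (built-in, Any)"          => 190.826031,
    "Python (hand-written)"          => 337.982605,
)

fastest = minimum(values(results))
for (name, t) in sort(collect(results), by = last)
    # Slowdown factor relative to the fastest implementation.
    println(rpad(name, 32), round(t / fastest, digits = 1), "x")
end
```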


And of course, our hand-written Julia implementation is type-generic!

In [56]:
mysum_fastmath(rand(ComplexF64, 10))

5.417515850299518 + 5.714801331679269im
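The same holds for other element types, because `zero(eltype(A))` picks an accumulator that matches the input. A short sketch using the plain `mysum` definition from above, repeated here so the block runs on its own:

```julia
function mysum(A)
    s = zero(eltype(A))  # zero of the element type, so the accumulator type never changes
    for a in A
        s += a
    end
    return s
end

println(mysum(Rational{Int}[1//2, 1//3, 1//6]))  # exact rational arithmetic
println(typeof(mysum(Float32[1, 2, 3])))         # accumulator stays Float32
println(mysum(big"1.5" .* [1, 2, 3]))            # arbitrary-precision BigFloat
```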