# Julia vs. Python

**Julia Workshop: Compute the future** (23/11/2023)<br>
Presenter: Robbe Ceulemans

## Estimating $\pi$ \[1\]
You have just started learning the basics of Julia, because some random person is convinced that it is as easy to learn as **Python** but with better performance. Is it really? Let's put it to the test. This notebook compares the two languages on a simple example: estimating the value of $\pi$ with Monte Carlo sampling [\[1\]](https://blakeaw.github.io/2019-09-20-numba-vs-julia/). We start with a naive Python implementation:

In [None]:
import numpy as np

In [None]:
def estimate_pi(nMC):
    radius = 1.
    diameter = 2.*radius
    n_circle = 0
    for i in range(nMC):
        x = (np.random.random()-0.5)*diameter
        y = (np.random.random()-0.5)*diameter
        r = np.sqrt(x**2 + y**2)
        if r <= radius:
            n_circle += 1
    return 4.*n_circle/nMC

nMC = 100000000
for i in range(3):
    %time pi_est = estimate_pi(nMC)
    print(pi_est)

Not the best runtime, but we shouldn't discredit Python too quickly, because this code can be optimized much further. So far `numpy` is only used to generate the random numbers, and it can do a lot more than that.<br>
<br>
In the next example the `for` loop, which is notoriously slow in Python, is traded for a vectorized solution.

In [None]:
def estimate_pi(nMC):
    radius = 1.
    diameter = 2.*radius
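    # Draw all nMC points at once: one row per (x, y) sample in [-radius, radius)
    xy = (np.random.random((nMC,2))-0.5) * diameter
    # Distance of every sample to the origin
    r = np.sqrt((xy**2).sum(axis=1))
    # Boolean mask of the samples that fall inside the circle
    circle_mask = r <= radius
    n_circle = r[circle_mask].shape[0]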
    print(n_circle)
    return 4.*n_circle/nMC

nMC = 100000000
for i in range(3):
    %time pi_est = estimate_pi(nMC)
    print(pi_est)

Clearly a great improvement over the naive implementation: we gain almost a factor of 60 in speed. It does require rewriting (vectorizing) the code, which is possible in this case, but certainly not always.<br>
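
One practical cost of the vectorized version is memory: for `nMC = 100000000` the `(nMC, 2)` array of `float64` samples alone occupies 1.6 GB. A minimal sketch of a chunked variant, which keeps the vectorized arithmetic but caps the array size, could look as follows. The function name `estimate_pi_chunked`, the `chunk_size` and `seed` parameters, and the use of `np.random.default_rng` are illustrative choices that are not part of the original benchmark, and its runtime was not measured here.

```python
import numpy as np

def estimate_pi_chunked(nMC, chunk_size=1_000_000, seed=None):
    """Monte Carlo estimate of pi, processing the samples in fixed-size chunks."""
    rng = np.random.default_rng(seed)
    radius = 1.0
    n_circle = 0
    remaining = nMC
    while remaining > 0:
        n = min(chunk_size, remaining)
        # n points drawn uniformly from the square [-radius, radius) x [-radius, radius)
        xy = rng.uniform(-radius, radius, size=(n, 2))
        # Compare squared distances, which avoids the square root
        n_circle += np.count_nonzero((xy**2).sum(axis=1) <= radius**2)
        remaining -= n
    return 4.0 * n_circle / nMC

pi_est = estimate_pi_chunked(2_000_000, seed=0)
print(pi_est)
```

Smaller chunks lower the peak memory, while larger chunks spread the overhead of the remaining Python-level loop over more samples.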
<br>
An argument that is often brought up for why Julia is redundant is that Python has extensions that also make use of JIT compilation. Simply adding a one-line decorator to the naive Python implementation gives a remarkable speedup.

In [None]:
import numba

@numba.njit
def estimate_pi(nMC):
    radius = 1.
    diameter = 2.*radius
    n_circle = 0
    for i in range(nMC):
        x = (np.random.random()-0.5)*diameter
        y = (np.random.random()-0.5)*diameter
        r = np.sqrt(x**2 + y**2)
        if r <= radius:
            n_circle += 1
    return 4.*n_circle/nMC

nMC = 100000000
for i in range(3):
    %time pi_est = estimate_pi(nMC)
    print(pi_est)

It's a staggering 180 times faster. So why the need for Julia when packages like Cython and Numba exist? Well, these packages are not guaranteed to speed up all Python functions. Their application is limited, because Python code was never designed to be compiled.<br>
<br>
What about Julia's performance for this problem?

In [None]:
using BenchmarkTools
function estimate_pi(nMC)
    radius = 1.
    diameter = 2. * radius
    n_circle = 0
    for i in 1:nMC
        x = (rand() - 0.5) * diameter
        y = (rand() - 0.5) * diameter
        r = sqrt(x^2 + y^2)
        if r <= radius
            n_circle += 1
        end
    end
    return (n_circle/nMC) * 4.
end

nMC = 100000000
# Interpolate the global with `$` so that only the function call itself is benchmarked
@benchmark estimate_pi($nMC)

Even with the most basic implementation of the problem we still gain a factor of almost 3 in performance. Although the gap is not huge, it is clear that this top speed comes standard with Julia and is not restricted to a subset of its functionality. Julia's solution covers the full language: it is designed with speed and performance in mind, while keeping the implementation as simple as possible.<br>
<br>
For those who are interested in exploring this further, [*the benchmarks game*](https://benchmarksgame-team.pages.debian.net/benchmarksgame/index.html) compares many different toy programs for a whole list of scripting and compiled languages, including Julia.
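
Speed aside, it helps to see why all four implementations return `4 * n_circle / nMC` and how accurate that number can be. The short derivation below is an addition to the original comparison and uses only the setup of the code above: points drawn uniformly from a square of side $2r$, with $N$ = `nMC` samples, of which $n_\text{circle}$ fall inside the circle of radius $r$.

$$
\frac{n_\text{circle}}{N} \approx \frac{\text{area of circle}}{\text{area of square}} = \frac{\pi r^2}{(2r)^2} = \frac{\pi}{4}
\quad\Longrightarrow\quad
\hat{\pi} = \frac{4\, n_\text{circle}}{N}
$$

Each sample is an independent hit or miss with probability $p = \pi/4$, so $n_\text{circle}$ is binomially distributed and the standard error of the estimate is

$$
\sigma_{\hat{\pi}} = 4\sqrt{\frac{p(1-p)}{N}} = \sqrt{\frac{\pi(4-\pi)}{N}} \approx \frac{1.64}{\sqrt{N}}
$$

For $N = 10^8$ this is about $1.6 \times 10^{-4}$, so the printed estimates should typically agree with $\pi$ to roughly three decimal places, regardless of which implementation produced them.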

\[1\] Blake A. Wilson, *Python+Numba vs. Julia; Monte Carlo estimation of Pi*, https://blakeaw.github.io/2019-09-20-numba-vs-julia/, \[September 2019\].

## Lorenz attractor \[2\]

Of course you will find benchmarks of small functions (or methods, in the case of Julia) where Python reaches a speed similar to Julia's, or even cases where Julia is outperformed. People often use a single microbenchmark comparison to make claims about the whole package, but such a comparison says little about the real applications that we want to use our programming language for. Let's look back at the problem of the Lorenz system. The `scipy` package in Python provides fundamental algorithms, including differential equation solvers. In this example the limitations of the Numba package become evident.<br>

In [None]:
import numpy as np
from scipy.integrate import odeint
import timeit
import numba
 
def lorenz(u, t, sigma, rho, beta):
    x, y, z = u
    return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]
 
u0 = [1.0,0.0,0.0]
tspan = (0., 100.)
t = np.linspace(0, 100, 1001)
sol = odeint(lorenz, u0, t, args=(10.0,28.0,8/3))

def time_func():
    odeint(lorenz, u0, t, args=(10.0,28.0,8/3),rtol = 1e-8, atol=1e-8)
 
time_func()
timeit.Timer(time_func).timeit(number=100)/100 

In [None]:
numba_f = numba.jit(lorenz,nopython=True)
def time_func():
    odeint(numba_f, u0, t, args=(10.0,28.0,8/3),rtol = 1e-8, atol=1e-8)
 
time_func()
timeit.Timer(time_func).timeit(number=100)/100 

While you can optimize the function `lorenz()` itself, which is called (multiple times) at each time step, it is not possible to further optimize the code that runs between two calls of your optimized function. This is exactly what can hurt your performance.
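
To make this concrete, one can count how often the solver calls back into Python. The sketch below wraps the same Lorenz right-hand side in a counter and runs the same `odeint` call as above. The name `counted_lorenz` and the `calls` dictionary are illustrative and not part of the original benchmark. Each of these calls passes from the compiled solver into a Python-level function call and back, and compiling the body of `lorenz()` leaves that round trip in place.

```python
import numpy as np
from scipy.integrate import odeint

calls = {"n": 0}  # mutable counter, updated on every right-hand-side evaluation

def counted_lorenz(u, t, sigma, rho, beta):
    calls["n"] += 1
    x, y, z = u
    return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]

u0 = [1.0, 0.0, 0.0]
t = np.linspace(0, 100, 1001)
sol = odeint(counted_lorenz, u0, t, args=(10.0, 28.0, 8/3), rtol=1e-8, atol=1e-8)

print(f"{calls['n']} Python-level calls for {len(t)} saved time points")
```

The exact count depends on the solver's adaptive step size, which is why no expected number is given here.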

Now you can let the function be Julia code by using the very recent `diffeqpy` library, which brings the *DifferentialEquations.jl* package to Python.

In [None]:
from julia import Main
from diffeqpy import de
import numpy as np
import timeit
jul_f = Main.eval("""
function f(du,u,p,t)
  x, y, z = u[1], u[2], u[3]
  sigma, rho, beta = p[1], p[2], p[3]
  du[1] = sigma * (y - x)
  du[2] = x * (rho - z) - y
  du[3] = x * y - beta * z
end""")
u0 = [1.0,0.0,0.0]
tspan = (0., 100.)
t = np.linspace(0, 100, 1001)
p = [10.0,28.0,8/3]
prob = de.ODEProblem(jul_f, u0, tspan, p)
sol = de.solve(prob,saveat=t,abstol=1e-8,reltol=1e-8)
 
def time_func():
    sol = de.solve(prob,saveat=t,abstol=1e-8,reltol=1e-8)
 
time_func()
timeit.Timer(time_func).timeit(number=100)/100 # 0.0033111710000000016 seconds

This way you can avoid the Python context switches that were the remaining bottleneck.<br>
<br>
Again we can compare with the most basic implementation in Julia and see that there is already a gain in performance.

In [2]:
using DifferentialEquations, BenchmarkTools
function lorenz!(du,u,p,t)
    du[1] = p[1]*(u[2]-u[1])
    du[2] = u[1]*(p[2]-u[3]) - u[2]
    du[3] = u[1]*u[2] - p[3]*u[3]
end
u0 = [1.0;0.0;0.0]
p = [10.0,28.0,8/3]
tspan = (0.0,100.0)
prob = ODEProblem(lorenz!,u0,tspan,p)
@btime solve(prob,saveat=0.1,reltol=1e-8,abstol=1e-8); # 2.467 ms (13436 allocations: 1.00 MiB)
@btime solve(prob,Tsit5(),saveat=0.1,reltol=1e-8,abstol=1e-8); # 2.904 ms (1081 allocations: 155.70 KiB)

  2.211 ms (8127 allocations: 701.91 KiB)
  2.795 ms (1053 allocations: 127.42 KiB)


With the most optimized implementation that we wrote, the difference becomes even bigger.

In [3]:
using StaticArrays
function lorenz_static(u,p,t)
    @inbounds begin
        dx = p[1]*(u[2]-u[1])
        dy = u[1]*(p[2]-u[3]) - u[2]
        dz = u[1]*u[2] - p[3]*u[3]
    end
    @SVector [dx,dy,dz]
end
u0 = @SVector [1.0,0.0,0.0]
p  = @SVector [10.0,28.0,8/3]
tspan = (0.0,100.0)
prob = ODEProblem(lorenz_static,u0,tspan,p)
@btime solve(prob,saveat=0.1,reltol=1e-8,abstol=1e-8);
@btime solve(prob,Tsit5(),saveat=0.1,reltol=1e-8,abstol=1e-8);

  1.078 ms (75 allocations: 89.66 KiB)
  1.661 ms (27 allocations: 63.19 KiB)


\[2\] Chris Rackauckas, *Why Numba and Cython are not substitutes for Julia*, Stochastic Life, http://www.stochasticlifestyle.com/why-numba-and-cython-are-not-substitutes-for-julia/, \[August 2018\].