# Profiling of GAlibrate's core genetic algorithm implementations

In this notebook, we will profile the performance of some of GAlibrate's implementations, namely the Python-only and Numba versions, using cProfile (via the `prun` magic command). The Cythonized and Julia-enhanced versions of the GAO port several core functions to non-Python code that cProfile does not capture, so we won't try profiling them here.

To run this notebook you need galibrate installed along with NumPy, Numba, and pandas. You also need the microbench package.


------

## Imports and setup

First we'll do all our imports:

In [2]:
import numpy as np
import numba
# pandas is used later to read the microbench results
import pandas as pd

In [3]:
# galibrate imports
import galibrate
from galibrate import gao
from galibrate.sampled_parameter import SampledParameter



In [4]:
#from allversions import *

In [5]:
# Manually import each of the run_gao modules -- normally galibrate does this internally
# and automatically assigns the version based on whether Numba and/or Cython are installed.
# Python-only
from galibrate import run_gao_py

In [6]:
# Numba accelerated
from galibrate import run_gao_numba

And we'll set up a custom bench from microbench to capture some key information about the Python version and some of the library versions:

In [30]:
# Set up a microbench for metadata collection
from microbench import MicroBench, MBCondaPackages, MBHostInfo, MBPythonVersion

class GaoBench(MicroBench, MBHostInfo, MBPythonVersion):
    capture_versions = (np, numba)

   

In [31]:
gaobench = GaoBench()

@gaobench
def meta():
    pass
    
meta() 

In [32]:
bench_results = pd.read_json(gaobench.outfile.getvalue(), lines=True)

In [33]:
# Python version
bench_results['python_version'][0]

'3.10.11'

In [34]:
# Versions of the captured libraries (NumPy and Numba)
for pack in bench_results['package_versions']:
    print(pack)

{'numpy': '1.23.5', 'numba': '0.56.4'}


#### OS and hardware

  * Windows 11 - 64-bit operating system, x64-based processor
  * Processor: 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz   2.30 GHz
  * RAM: 16.0 GB (15.6 GB usable)

## Model problem and fitness function

For the purposes of this testing, we'll use the N-dimensional sphere function defined in `galibrate.benchmarks` with a parameter search in \[-100:100\].

Here is the fitness function:

In [14]:
from galibrate.benchmarks import sphere

def fitness(chromosome):
    return -sphere(chromosome)
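
For readers without galibrate at hand, here is a minimal stand-alone sketch of the same idea. It assumes the sphere function is the usual sum of squares (the profile output does show `sphere` calling NumPy's `sum`, but the exact implementation in `galibrate.benchmarks` is not reproduced here). The names `sphere_sketch` and `fitness_sketch` are hypothetical.

```python
import numpy as np

def sphere_sketch(chromosome):
    # Assumed form of the N-dimensional sphere function: sum of squared parameters.
    x = np.asarray(chromosome, dtype=float)
    return np.sum(x ** 2)

def fitness_sketch(chromosome):
    # The optimizer maximizes fitness, so negate the function being minimized.
    return -sphere_sketch(chromosome)

# The best possible fitness is at the origin; any other point scores lower.
at_origin = fitness_sketch(np.zeros(100))
away_from_origin = fitness_sketch([3.0, 4.0])
print(at_origin, away_from_origin)
```

Negating the sphere function turns its minimum at the origin into a maximum, which is what a fitness-maximizing search needs.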

In [17]:
# Define the parameters for the GAO.
# 100 parameters
ndim = 100
# We'll fix the population size at 1000:
popsize = 1000
# And the number of generations at 500:
generations = 500
# One sampled parameter per dimension, each searched over [-100, 100]
sampled_parameters = [SampledParameter(name=i, loc=-100.0, width=200.0) for i in range(ndim)]

------

## Python-only

We'll start by examining the profiling of the base Python-only version:

In [18]:
gao.run_gao = run_gao_py

In [19]:
%%prun -s cumulative -q -l 20 -T cprof.out

go = gao.GAO(sampled_parameters, fitness, popsize, generations=generations)
#print(ndim, popsize, gen)
go.run()


 
*** Profile printout saved to text file 'cprof.out'.
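
Outside of IPython, a similar cumulative-time report can be produced with the standard library's `cProfile` and `pstats` modules. The sketch below is only an approximation of what the `prun` cell does, and `work` is a hypothetical stand-in for `go.run()`.

```python
import cProfile
import io
import pstats

def work(n=20000):
    # Hypothetical stand-in for the call being profiled (e.g. go.run()).
    return sum(i * i for i in range(n))

# Collect profiling data only around the call of interest.
profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

# Sort by cumulative time and keep the top 20 entries,
# mirroring the "-s cumulative -l 20" options used above.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(20)
report = buffer.getvalue()
print(report)
```

Writing the report to a `StringIO` buffer makes it easy to save it to a file or search it afterwards.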


In [20]:
print(open('cprof.out', 'r').read())

         33474994 function calls in 20.046 seconds

   Ordered by: cumulative time
   List reduced from 55 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   20.046   20.046 {built-in method builtins.exec}
        1    0.000    0.000   20.046   20.046 <string>:1(<module>)
        1    0.000    0.000   20.046   20.046 gao.py:136(run)
        1    0.849    0.849   20.041   20.041 run_gao_py.py:15(run_gao)
      499    6.417    0.013   15.101    0.030 run_gao_py.py:182(mutation)
 27693469    8.781    0.000    8.781    0.000 {method 'random' of 'numpy.random.mtrand.RandomState' objects}
      500    0.004    0.000    3.165    0.006 run_gao_py.py:27(evaluate_fitnesses)
      500    0.189    0.000    3.138    0.006 run_gao_py.py:28(<listcomp>)
   501000    0.159    0.000    2.954    0.000 3982212097.py:3(fitness)
   501000    0.509    0.000    2.795    0.000 benchmarks.py:15(sphere)
   501000    0.216    0.000 

### Key results

Among the `run_gao_py` functions (besides `run_gao` itself), the most expensive are the following:

| name | cumtime | percall (tottime) | tottime | ncalls |
| ------ | ------ | ------ | ------ | ------ |
| mutation | 15.101 | 0.013 | 6.417 | 499 | 
| evaluate_fitnesses | 3.165 | 0.000 | 0.004 | 500 |
| crossover | 0.524 | 0.000 | 0.369 | 125000 |
| choose_mating_pairs | 0.274 | 0.000 | 0.204 | 500 |
| random_population | 0.084 | 0.083 | 0.083 | 1 |

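Applying mutations (`mutation`, about 75% of the 20.046 s total) and evaluating the fitnesses (`evaluate_fitnesses`, about 16%) are the major bottlenecks, followed by the crossover operation (`crossover`) and selection of mating pairs (`choose_mating_pairs`). The roughly 27.7 million calls to NumPy's `random` method account for 8.781 s on their own.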
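
The roughly 27.7 million calls to `random` in the profile suggest that random numbers are being drawn one at a time. The sketch below illustrates why that pattern is costly in pure Python by contrasting a per-gene loop with a vectorized draw. It is not GAlibrate's code: the function names, the uniform resampling rule, and the `rate` parameter are all assumptions made for illustration.

```python
import numpy as np

def mutate_scalar(population, rate, low, high, rng):
    # One RNG call per gene: simple, but every call pays Python overhead.
    out = population.copy()
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            if rng.random_sample() < rate:
                out[i, j] = low + (high - low) * rng.random_sample()
    return out

def mutate_vectorized(population, rate, low, high, rng):
    # Two RNG calls in total: one for the mutation mask, one for the new values.
    mask = rng.random_sample(population.shape) < rate
    new_values = low + (high - low) * rng.random_sample(population.shape)
    return np.where(mask, new_values, population)

rng = np.random.RandomState(0)
population = np.zeros((20, 5))
looped = mutate_scalar(population, 0.1, -100.0, 100.0, rng)
vectorized = mutate_vectorized(population, 0.1, -100.0, 100.0, rng)
print(looped.shape, vectorized.shape)
```

The two functions consume random numbers in a different order, so they will not produce identical populations from the same seed, but they apply the same mutation rule.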

------

## Numba version

The Numba-enhanced version of the GAO compiles several core functions, including the `mutation` and `crossover` functions. It also encapsulates some of the other parts of the `run_gao` function into compilable functions to try to further improve overall performance.
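
As a rough illustration of what compiling a mutation-style loop with Numba can look like, here is a minimal sketch. It is not the code in `run_gao_numba`: the function name `mutate_sketch`, its parameters, and the uniform resampling rule are assumptions. The fallback decorator only exists so the example still runs, uncompiled, where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch still runs (uncompiled) without Numba.
    def njit(func):
        return func

@njit
def mutate_sketch(population, rate, low, high):
    # Explicit loops like these are slow in pure Python but compile well under Numba.
    n_chromosomes, n_genes = population.shape
    for i in range(n_chromosomes):
        for j in range(n_genes):
            if np.random.random() < rate:
                population[i, j] = low + (high - low) * np.random.random()
    return population

# The first call triggers compilation; later calls reuse the compiled code.
mutated = mutate_sketch(np.zeros((50, 10)), 1.0, -100.0, 100.0)
untouched = mutate_sketch(np.zeros((50, 10)), 0.0, -100.0, 100.0)
print(mutated.shape)
```

Because compilation happens on the first call, a single profiled run can include that one-time cost as well as the execution time.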


In [21]:
gao.run_gao = run_gao_numba

In [22]:
%%prun -s cumulative -q -l 20 -T cprof.out

go = gao.GAO(sampled_parameters, fitness, popsize, generations=generations)
#print(ndim, popsize, gen)
go.run()


 
*** Profile printout saved to text file 'cprof.out'.


In [23]:
print(open('cprof.out', 'r').read())

         3078193 function calls (3073219 primitive calls) in 2.328 seconds

   Ordered by: cumulative time
   List reduced from 1389 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     69/1    0.000    0.000    2.328    2.328 {built-in method builtins.exec}
        1    0.000    0.000    2.328    2.328 <string>:1(<module>)
        1    0.000    0.000    2.328    2.328 gao.py:136(run)
        1    0.017    0.017    2.320    2.320 run_gao_numba.py:17(run_gao)
      500    0.000    0.000    1.631    0.003 run_gao_numba.py:43(evaluate_fitnesses)
      500    0.129    0.000    1.631    0.003 run_gao_numba.py:179(_compute_fitnesses)
   251500    0.077    0.000    1.509    0.000 3982212097.py:3(fitness)
   251500    0.261    0.000    1.432    0.000 benchmarks.py:15(sphere)
   251500    0.110    0.000    1.171    0.000 <__array_function__ internals>:177(sum)
   252501    0.096    0.000    1.058    0.000 {built-in method numpy.core._multia

### Key results

This version, `run_gao_numba`, is roughly 8.6 times faster overall (2.328 s versus 20.046 s):

| name | cumtime | percall (tottime) | tottime | ncalls |
| ------ | ------ | ------ | ------ | ------ |
| evaluate_fitnesses | 1.631 | 0.000 | 0.000 | 500 |
| mutation | 0.197 | 0.000 | 0.197 | 499 |

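Although the `mutation` function still shows up in the list, its cumulative time has dropped by more than 50x (from 15.101 s to 0.197 s). The remaining major bottleneck is evaluating the fitnesses (`evaluate_fitnesses`). Interestingly, the cumulative time of `evaluate_fitnesses` is also lower in this case, by roughly a factor of 2 (1.631 s versus 3.165 s). The profile shows `fitness` being called 251,500 times here versus 501,000 times in the Python-only run, at about the same cost per call, so the difference appears to come from the number of fitness evaluations rather than from faster evaluation.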
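
The ratios quoted in this section can be checked directly from the reported timings. The short calculation below uses only numbers taken from the two profile printouts and the summary tables; the variable names are just for illustration.

```python
# Timings in seconds, copied from the Python-only and Numba results above.
python_total, numba_total = 20.046, 2.328
python_mutation, numba_mutation = 15.101, 0.197
python_eval, numba_eval = 3.165, 1.631

# Calls to `fitness` and their cumulative time in each run.
python_fit_calls, numba_fit_calls = 501000, 251500
python_fit_cum, numba_fit_cum = 2.954, 1.509

speedup_total = python_total / numba_total
speedup_mutation = python_mutation / numba_mutation
speedup_eval = python_eval / numba_eval

# Cost of a single fitness call in microseconds for each run.
python_us_per_call = 1e6 * python_fit_cum / python_fit_calls
numba_us_per_call = 1e6 * numba_fit_cum / numba_fit_calls

print(f"overall speedup:  {speedup_total:.1f}x")
print(f"mutation speedup: {speedup_mutation:.1f}x")
print(f"evaluate_fitnesses speedup: {speedup_eval:.2f}x")
print(f"fitness cost per call: {python_us_per_call:.1f} us vs {numba_us_per_call:.1f} us")
```

The per-call cost of `fitness` comes out nearly identical in both runs, which is consistent with the fitness function itself being plain Python in both cases.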

------