# High Performance, Part I

In [64]:
N=10000

## List comprehension vs generator comprehension

In [65]:
%%timeit
foo=[x for x in range(N)]

143 µs ± 1.18 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


In [66]:
%%timeit
foo=list((x for x in range(N)))

256 µs ± 442 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


## Loop vs recursion

In [67]:
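def func_loop(v: list[int]) -> int:
    # Iterative sum: a single pass over the list.
    s = 0
    for i in v:
        s += i
    return s


def func_recursion(v: list[int]) -> int:
    # Recursive sum using structural pattern matching (Python 3.10+).
    # Note: `x, *xs` copies the tail of the list at every call.
    match v:
        case []:
            return 0
        case x, *xs:
            return x + func_recursion(xs)


def func_recursion1(v: list[int]) -> int:
    # Same recursion, written with if-else instead of match.
    if v:
        x, *xs = v
        return x + func_recursion1(xs)
    else:
        return 0


N = 1000
v = [x for x in range(N)]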

In [68]:
func_loop(v)

499500

In [69]:
%%timeit
func_loop(v)

18.4 µs ± 467 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)


In [70]:
func_recursion(v)

499500

In [71]:
%%timeit
func_recursion(v)

2.11 ms ± 301 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [72]:
%%timeit
func_recursion1(v)

1.75 ms ± 80.7 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


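Conclusions:
- N=10000 is too big for strict recursion because it exceeds Python's recursion limit, so N=1000 is used here
- strict recursion is about 100 times slower than the for loop (about 2 ms vs 18.4 µs)
- the pattern matching statement is about as fast as the if-else statement (2.11 ms vs 1.75 ms)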
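
The first point can be checked directly. The following is a minimal sketch, assuming the recursion limit is left at a typical value (1000 by default in CPython, 3000 in this session). The names `recursive_sum` and `hits_recursion_limit` are illustrative and not part of the cells above.

```python
def recursive_sum(v: list[int]) -> int:
    # Same if-else recursion as func_recursion1 above.
    if v:
        x, *xs = v
        return x + recursive_sum(xs)
    return 0


def hits_recursion_limit(n: int) -> bool:
    # True if summing n elements recursively raises RecursionError.
    try:
        recursive_sum(list(range(n)))
    except RecursionError:
        return True
    return False


print(hits_recursion_limit(100))    # small input fits within the limit
print(hits_recursion_limit(10000))  # deeper than a limit of 1000 or 3000
```

Raising the limit with `sys.setrecursionlimit` would let larger inputs through, but it does not remove the per-call overhead measured above.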

In [73]:
from typing import Generator

def func_recursion_g(x0, v: list[int]) -> Generator[int, None, None]:
    # Tail-recursive sum: the accumulator x0 carries the partial result.
    match v:
        case []:
            yield x0
        case x, *xs:
            yield from func_recursion_g(x + x0, xs)

In [74]:
N=2968
v=[x for x in range(N)]

In [75]:
%%timeit
func_loop(v)

54.5 µs ± 569 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


In [76]:
%%timeit
foo=func_recursion_g(0,v)
next(foo)

23 ms ± 325 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [77]:
import sys
sys.getrecursionlimit()

3000

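Conclusions:
- There is no tail recursion optimization in Python: even the tail-recursive generator is bounded by the recursion limit (3000 here).
- Use iteration instead of tail recursion.
- Lazy (generator-based) recursion does not help for folding operations: like strict recursion, it is orders of magnitude slower than the loop (23 ms vs 54.5 µs at N=2968).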
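
Because the accumulator form is tail-recursive, it can be rewritten mechanically as a loop. A minimal sketch, where `func_iter_acc` and `func_reduce` are hypothetical names introduced for illustration: the arguments of the recursive call in `func_recursion_g` become loop variables.

```python
from functools import reduce
import operator


def func_iter_acc(x0: int, v: list[int]) -> int:
    # Each recursive call func_recursion_g(x + x0, xs) becomes
    # one update of the accumulator.
    acc = x0
    for x in v:
        acc += x
    return acc


def func_reduce(x0: int, v: list[int]) -> int:
    # The same fold expressed with functools.reduce.
    return reduce(operator.add, v, x0)


# Works for sizes far beyond the recursion limit.
big = list(range(100_000))
print(func_iter_acc(0, big))  # 4999950000
print(func_reduce(0, big))    # 4999950000
```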

## Numpy functions vs native python functions

In [78]:
import numpy as np
N=10000
v=[x for x in range(N)]
v=list(map(float,v))

In [79]:
%%timeit
np.sum(v)

241 µs ± 1.57 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


In [80]:
%%timeit
sum(v)

35 µs ± 4.82 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


In [81]:
%%timeit
np.sum(np.arange(N))

8.49 µs ± 587 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)


In [82]:
%%timeit
sum(np.arange(N))

430 µs ± 4.51 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


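Conclusions for function sum:
- numpy.sum on a numpy array is the fastest, about 4 times faster than Python sum on a Python list (8.49 µs vs 35 µs), even though the timing includes creating the array.
- Python sum on a Python list is about 7 times faster than numpy.sum on a Python list (35 µs vs 241 µs).
- Python sum on a numpy array is the slowest combination (430 µs).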
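
A likely reason for the slow list case is conversion: `np.sum` has to turn the Python list into an array before summing. The sketch below assumes the data can be converted once and reused; `arr`, `total_np` and `total_py` are illustrative names.

```python
import numpy as np

N = 10000
v = [float(x) for x in range(N)]

# Convert once, outside any timed or repeated code.
arr = np.asarray(v, dtype=np.float64)

total_np = np.sum(arr)  # operates on the array directly, no conversion
total_py = sum(v)       # plain Python sum on the original list

# Both give the same result for this data.
print(total_np == total_py)
```

Timing `np.sum(arr)` against `np.sum(v)` with `%%timeit` would separate the conversion cost from the summation itself.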

In [83]:
%%timeit
np.sin(v)

356 µs ± 46 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


In [84]:
import math

In [85]:
%%timeit
for i in v : math.sin(i)

319 µs ± 3.18 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


In [94]:
%%timeit
[math.sin(i) for i in v]

381 µs ± 6.76 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


In [86]:
%%timeit
np.sin(np.arange(N))

88.9 µs ± 91.9 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


In [92]:
%%timeit
np.sin(0.5)

482 ns ± 4.67 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)


In [93]:
%%timeit
math.sin(0.5)

26.3 ns ± 1.52 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)


In [89]:
vsin=np.vectorize(math.sin)

In [90]:
%%timeit
vsin(v)

1.01 ms ± 3.31 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


In [91]:
%%timeit
list(map(math.sin,v))

316 µs ± 772 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


Conclusions for sin:
- non-vectorized computation is slow: applying math.sin element by element takes 316 to 381 µs, versus 88.9 µs for np.sin on a numpy array.
- for a single point, math.sin is about 20 times faster than np.sin (26.3 ns vs 482 ns), so the real advantage of numpy is vector computation.
- np.vectorize(math.sin) on a Python list is about 3 times slower than map (1.01 ms vs 316 µs).
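
These points suggest a simple rule of thumb, sketched below under the assumption that the data can live in a numpy array from the start: use `np.sin` for whole arrays and `math.sin` for single values. The variable names are illustrative.

```python
import math
import numpy as np

N = 10000
arr = np.arange(N, dtype=np.float64)  # build the array once

# Vector case: one call over the whole array.
ys_np = np.sin(arr)

# Scalar case: math.sin avoids numpy's per-call overhead.
y = math.sin(0.5)

# The element-wise results agree with the pure-Python version.
ys_py = [math.sin(x) for x in arr.tolist()]
print(np.allclose(ys_np, ys_py))
```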