# **Profiling and Pure Python Optimization**

## 2nd Qishi Advanced Python Programming Study Group

## 8/7/2022

______________________________________________________________________________

# **Overview of the lectures**

1. Profiling and pure python optimization (Today)

2. Cython, numba, and other compilers 

3. Automatic differentiation 

4. Accelerated linear algebra 

5. Concurrency: asyncio and parallel processing 

6. Threading and concurrent web requests 

7. Processes 

8. Deadlocks, starvation, race conditions, and GIL

# **Profiling**

A profile is a set of statistics that describes how often and for how long various parts of the program executed.

`cProfile` is a C extension with reasonable overhead that makes it suitable for profiling long-running programs. 

Advantages of `cProfile`:

*   It gives you the total run time of the entire code.
*   It also shows the time taken by each individual function, which lets you compare them and find the parts that need optimization.
*   It reports how many times each function is called.
*   The collected data can be exported easily using the `pstats` module.
*   The data can be visualized nicely using the `snakeviz` module. 
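
As a minimal sketch of the `pstats` point above, the snippet below profiles a toy function and prints only the most expensive entries, sorted by cumulative time. The function `slow_sum` and the file name `profile.out` are hypothetical and used only for illustration.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Toy workload (hypothetical): sum of squares with a plain Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(100_000)
profiler.disable()

# Write the report to a string buffer instead of stdout.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)  # keep the five costliest entries
report = buffer.getvalue()
print(report)

# stats.dump_stats("profile.out") would save the raw data to disk,
# which a viewer such as snakeviz can then open.
```

Sorting by cumulative time usually surfaces the call chain that dominates the run, whereas the default ordering is alphabetical by name.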

### **Example of `cProfile`**

In [4]:
import numpy as np

In [5]:
def is_prime(n):
    """check if a given integer is prime or not"""
    
    if n == 2:
        return True
    elif n < 2 or n % 2 == 0:
        return False
    for i in range(3,int(np.sqrt(n))+1,2):
        if n % i == 0:
            return False
    return True

In [6]:
def primes_between(a,b):
    """ get all primes in [a,b] """
    return np.array([n for n in range(a,b+1) if is_prime(n)])

In [7]:
def ratio(x):
    """ ratio of the number of primes not exceeding x and x/ln(x) """
    return len(primes_between(1,x)) / (x / np.log(x))

In [8]:
ratio(1000000)

1.0844899477790795

In [9]:
import cProfile

In [10]:
pr = cProfile.Profile()
pr.enable()
ratio(1000000)
pr.disable()
pr.print_stats()

         1000058 function calls in 3.083 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1000000    2.834    0.000    2.834    0.000 1854258423.py:1(is_prime)
        1    0.000    0.000    3.083    3.083 2744068205.py:1(primes_between)
        1    0.243    0.243    3.078    3.078 2744068205.py:3(<listcomp>)
        1    0.000    0.000    3.083    3.083 306192731.py:1(<cell line: 3>)
        1    0.000    0.000    0.000    0.000 306192731.py:1(<cell line: 4>)
        1    0.000    0.000    3.083    3.083 4139244131.py:1(ratio)
        2    0.000    0.000    0.000    0.000 codeop.py:149(__call__)
        4    0.000    0.000    0.000    0.000 compilerop.py:174(extra_flags)
        2    0.000    0.000    0.000    0.000 contextlib.py:102(__init__)
        2    0.000    0.000    0.000    0.000 contextlib.py:130(__enter__)
        2    0.000    0.000    0.000    0.000 contextlib.py:139(__exit__)
        2    0.000    0.000    

Improvement using the sieve of Eratosthenes:



In [11]:
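def countPrimes(x):
    """ count the primes not exceeding x with the sieve of Eratosthenes """
    x = int(x+1)                      # array length, so that index x itself is included
    if x < 2:
        return 0
    primes = np.ones(x)               # primes[i] == 1 means "i is still a candidate"
    primes[0] = primes[1] = 0         # 0 and 1 are not prime
    for i in range(2, int(x ** 0.5) + 1):
        if primes[i]:
            # cross out the multiples of i, starting at i*i
            primes[i * i: x: i] = np.zeros( len(primes[i * i: x: i]) )
    # built-in sum iterates in Python; primes.sum() would be faster
    return sum(primes)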
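
For comparison, here is a hedged variant of the same sieve that uses a boolean array and `np.count_nonzero`. The name `count_primes_bool` is hypothetical; this is a sketch of one possible refinement, not a replacement for the function used in the profile.

```python
import numpy as np


def count_primes_bool(x):
    """Count the primes not exceeding x with a boolean sieve (illustrative variant)."""
    limit = int(x) + 1                     # array length, so index x is included
    if limit < 2:
        return 0
    is_prime = np.ones(limit, dtype=bool)  # one byte per candidate instead of eight
    is_prime[:2] = False                   # 0 and 1 are not prime
    for i in range(2, int(limit ** 0.5) + 1):
        if is_prime[i]:
            is_prime[i * i::i] = False     # scalar assignment, no temporary array
    return int(np.count_nonzero(is_prime)) # counting is done inside NumPy


print(count_primes_bool(100))
print(count_primes_bool(1_000_000))
```

The boolean dtype reduces memory use, and the final count runs in compiled code rather than in a Python-level loop.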

In [12]:
def ratio_2(x):
    """ ratio of the number of primes not exceeding x and x/ln(x) """
    return countPrimes(x) / (x / np.log(x))

In [13]:
pr = cProfile.Profile()
pr.enable()
ratio_2(1000000)
pr.disable()
pr.print_stats()

         397 function calls in 0.077 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.013    0.013    0.076    0.076 1033194060.py:1(countPrimes)
        1    0.000    0.000    0.076    0.076 1827341102.py:1(ratio_2)
        1    0.000    0.000    0.076    0.076 2690932688.py:1(<cell line: 3>)
        1    0.000    0.000    0.000    0.000 2690932688.py:1(<cell line: 4>)
        1    0.000    0.000    0.001    0.001 <__array_function__ internals>:2(copyto)
        2    0.000    0.000    0.000    0.000 codeop.py:149(__call__)
        4    0.000    0.000    0.000    0.000 compilerop.py:174(extra_flags)
        2    0.000    0.000    0.000    0.000 contextlib.py:102(__init__)
        2    0.000    0.000    0.000    0.000 contextlib.py:130(__enter__)
        2    0.000    0.000    0.000    0.000 contextlib.py:139(__exit__)
        2    0.000    0.000    0.000    0.000 contextlib.py:279(helper)
        4    0.000   

# **Numpy**

1.   Indexing
2.   Broadcasting
3.   Combining `ndarray`s
4.   Splitting `ndarray`s
5.   Vectorization
6.   `numexpr`



### **Indexing**

In [14]:
x = np.arange(24).reshape((2,3,4))
x

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])

### Extract elements in numpy array

In [15]:
x[:,1:,1:3]

array([[[ 5,  6],
        [ 9, 10]],

       [[17, 18],
        [21, 22]]])

In [16]:
x[:,1:,::2]

array([[[ 4,  6],
        [ 8, 10]],

       [[16, 18],
        [20, 22]]])

Fancy indexing

In [17]:
x[np.ix_([0,1], [0,2], [0,1,3])]

array([[[ 0,  1,  3],
        [ 8,  9, 11]],

       [[12, 13, 15],
        [20, 21, 23]]])
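
A short sketch of why `np.ix_` is used above: plain integer lists are paired element by element, while `np.ix_` builds the outer product of the index lists. The variable names `paired` and `outer` are illustrative.

```python
import numpy as np

x = np.arange(24).reshape((2, 3, 4))

# Index lists are paired: this picks x[0, 0, :] and x[1, 2, :].
paired = x[[0, 1], [0, 2]]
print(paired.shape)  # (2, 4)

# np.ix_ forms every combination of the two lists: a 2 x 2 block of rows.
outer = x[np.ix_([0, 1], [0, 2])]
print(outer.shape)   # (2, 2, 4)
```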

Negative indices

In [18]:
x[:,:,-1:-3:-1]

array([[[ 3,  2],
        [ 7,  6],
        [11, 10]],

       [[15, 14],
        [19, 18],
        [23, 22]]])

Warning: a slice is a view, not a copy

In [19]:
y = x[1:,:,:]
y[:,0,:] = 100
x

array([[[  0,   1,   2,   3],
        [  4,   5,   6,   7],
        [  8,   9,  10,  11]],

       [[100, 100, 100, 100],
        [ 16,  17,  18,  19],
        [ 20,  21,  22,  23]]])

In [20]:
x = np.arange(24).reshape((2,3,4))
z = x[1:,:,:].copy()
z[:,0,:] = 100
x

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])
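
One way to check whether two arrays share data is `np.shares_memory`. The sketch below repeats the slice and the copy from the cells above and adds a fancy-indexing case for contrast; the names `view`, `copy`, and `fancy` are illustrative.

```python
import numpy as np

x = np.arange(24).reshape((2, 3, 4))

view = x[1:, :, :]          # basic slicing: a view on the same buffer
copy = x[1:, :, :].copy()   # explicit copy: an independent buffer
fancy = x[[1], :, :]        # integer-array (fancy) indexing: also a copy

print(np.shares_memory(x, view))   # True
print(np.shares_memory(x, copy))   # False
print(np.shares_memory(x, fancy))  # False
```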

Boolean indexing

In [21]:
x[x % 7 == 0]

array([ 0,  7, 14, 21])

In [22]:
np.where(x % 7 == 0)

(array([0, 0, 1, 1], dtype=int64),
 array([0, 1, 0, 2], dtype=int64),
 array([0, 3, 2, 1], dtype=int64))

In [23]:
np.logical_and(x % 7 == 0, x % 2 == 0)

array([[[ True, False, False, False],
        [False, False, False, False],
        [False, False, False, False]],

       [[False, False,  True, False],
        [False, False, False, False],
        [False, False, False, False]]])

In [24]:
x[np.logical_and(x % 7 == 0, x % 2 == 0)]

array([ 0, 14])
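
The same mask can also be written with the element-wise `&` operator. This is a small sketch; note that the parentheses are required because `&` binds more tightly than `==`.

```python
import numpy as np

x = np.arange(24).reshape((2, 3, 4))

# Element-wise AND of two boolean arrays; equivalent to np.logical_and here.
mask = (x % 7 == 0) & (x % 2 == 0)
selected = x[mask]
print(selected)  # [ 0 14]
```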

### **Broadcasting**

Broadcasting is what happens when `numpy` performs a binary operation on two arrays with different shapes. The shapes are compared axis by axis, starting from the last (trailing) axis and working backwards, and are *promoted* to make the arrays compatible using the following rule:

- For each axis, from last to first
    - If both dimensions are the same, do nothing
    - If one of the dimensions is 1 or missing (None) and the other is k, promote to k
    - Otherwise raise a `ValueError`
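
The rule can be tried out without allocating any data by using `np.broadcast_shapes`. This sketch assumes NumPy 1.20 or later, where that function is available.

```python
import numpy as np

# Trailing axes are compared first: 4 vs 1, then 3 vs 3, then 2 vs 2.
print(np.broadcast_shapes((2, 3, 4), (2, 3, 1)))  # (2, 3, 4)

# A missing leading axis is treated like a dimension of size 1.
print(np.broadcast_shapes((2, 3, 4), (4,)))       # (2, 3, 4)

# Both operands can be promoted at once.
print(np.broadcast_shapes((4, 1), (1, 4)))        # (4, 4)

# Trailing axes 4 and 3 differ and neither is 1, so this fails.
try:
    np.broadcast_shapes((2, 3, 4), (2, 3))
    compatible = True
except ValueError:
    compatible = False
print(compatible)  # False
```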

In [25]:
y = np.arange(6).reshape((2,3))
y

array([[0, 1, 2],
       [3, 4, 5]])

In [26]:
x

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])

In [27]:
# x + y  # raises ValueError: shapes (2,3,4) and (2,3) cannot be broadcast together

In [28]:
y[:,:,np.newaxis]

array([[[0],
        [1],
        [2]],

       [[3],
        [4],
        [5]]])

In [29]:
x + y[:,:,np.newaxis]

array([[[ 0,  1,  2,  3],
        [ 5,  6,  7,  8],
        [10, 11, 12, 13]],

       [[15, 16, 17, 18],
        [20, 21, 22, 23],
        [25, 26, 27, 28]]])

In [30]:
x + y[:,:,None]

array([[[ 0,  1,  2,  3],
        [ 5,  6,  7,  8],
        [10, 11, 12, 13]],

       [[15, 16, 17, 18],
        [20, 21, 22, 23],
        [25, 26, 27, 28]]])

In [31]:
a = np.arange(4)
b = np.random.randint(0,10,size=4)
print(a,b)
a[:,np.newaxis] * b[np.newaxis,:]

[0 1 2 3] [0 6 2 2]


array([[ 0,  0,  0,  0],
       [ 0,  6,  2,  2],
       [ 0, 12,  4,  4],
       [ 0, 18,  6,  6]])

### **Combining `ndarray`s**

### Binding rows when number of columns is the same

In [32]:
x = np.arange(6).reshape((2,3))
y = np.arange(15).reshape((5,3))
np.r_[x,y]

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

### Binding columns when number of rows is the same

In [33]:
x = np.arange(6).reshape((2,3))
z = np.arange(10).reshape((2,5))
w = np.arange(4).reshape((2,2))
np.c_[x,z,w]


array([[0, 1, 2, 0, 1, 2, 3, 4, 0, 1],
       [3, 4, 5, 5, 6, 7, 8, 9, 2, 3]])
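
For 2-D arrays like these, `np.r_` and `np.c_` give the same result as `np.concatenate` along axis 0 and axis 1 respectively. A minimal check, with illustrative variable names:

```python
import numpy as np

x = np.arange(6).reshape((2, 3))
y = np.arange(15).reshape((5, 3))
z = np.arange(10).reshape((2, 5))

rows = np.concatenate([x, y], axis=0)  # stack rows, like np.r_[x, y] or np.vstack
cols = np.concatenate([x, z], axis=1)  # stack columns, like np.c_[x, z] or np.hstack

print(np.array_equal(rows, np.r_[x, y]))  # True
print(np.array_equal(cols, np.c_[x, z]))  # True
```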

### **Splitting `ndarray`s**

In [34]:
x = np.arange(6).reshape((2,3))
np.split(x,2)

[array([[0, 1, 2]]), array([[3, 4, 5]])]

In [35]:
np.split(x,3,axis=1)

[array([[0],
        [3]]),
 array([[1],
        [4]]),
 array([[2],
        [5]])]
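
`np.split` with an integer requires an equal division. The sketch below shows two alternatives for uneven pieces: `np.array_split`, and `np.split` with explicit cut positions.

```python
import numpy as np

x = np.arange(6).reshape((2, 3))

# Three columns cannot be divided into two equal parts.
try:
    np.split(x, 2, axis=1)
    equal_division = True
except ValueError:
    equal_division = False
print(equal_division)  # False

# array_split allows uneven pieces: here 2 columns, then 1 column.
uneven = np.array_split(x, 2, axis=1)
print([part.shape for part in uneven])  # [(2, 2), (2, 1)]

# A list of indices gives the cut positions: columns [:1] and [1:].
by_index = np.split(x, [1], axis=1)
print([part.shape for part in by_index])  # [(2, 1), (2, 2)]
```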

### **Vectorization**

### A common expression

\begin{align}
y_i &= \alpha + \sum_{j=1}^p \beta_j x_{ij}
\end{align}

In [36]:
import timeit

In [37]:
n = 100
p = 10
alpha = np.random.randn()
beta = 1 + np.random.randn(p)
x = np.random.randn(n*p).reshape((n,p))



In [38]:
%timeit -r3 -n2 y = np.array([alpha + np.dot(beta, x[i,:]) for i in range(n)])

171 µs ± 75.6 µs per loop (mean ± std. dev. of 3 runs, 2 loops each)


In [39]:
%timeit -r3 -n2 y = alpha + x @ beta


The slowest run took 97.14 times longer than the fastest. This could mean that an intermediate result is being cached.
69.9 µs ± 94.8 µs per loop (mean ± std. dev. of 3 runs, 2 loops each)
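
With only 3 runs of 2 loops, the timings above are noisy. As a hedged sketch, the two formulations can be checked for agreement and timed over more repetitions with the standard `timeit` module. The helper names `loop_version` and `vectorized_version` are illustrative, and no particular speedup is claimed.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 10
alpha = rng.standard_normal()
beta = 1 + rng.standard_normal(p)
x = rng.standard_normal((n, p))


def loop_version():
    # One dot product per row, driven by a Python-level loop.
    return np.array([alpha + np.dot(beta, x[i, :]) for i in range(n)])


def vectorized_version():
    # A single matrix-vector product handles all rows at once.
    return alpha + x @ beta


# Both formulations compute the same y, up to floating-point rounding.
same = np.allclose(loop_version(), vectorized_version())
print(same)

t_loop = timeit.timeit(loop_version, number=1000)
t_vec = timeit.timeit(vectorized_version, number=1000)
print(f"loop: {t_loop:.4f} s, vectorized: {t_vec:.4f} s for 1000 calls")
```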


### Numpy `vectorize`

In [40]:
def foo(a, b):
    """
    If a >= b return a + b,
    else return a - b.
    """
    if a >= b:
       return a + b
    else:
       return a - b

In [41]:
# Create a vectorized version of foo
foo_vectorized = np.vectorize(foo)
foo_vectorized(np.arange(5),3)


array([-3, -2, -1,  6,  7])

In [42]:
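# The list brackets matter: np.array(<generator>) would build a 0-d object array
# without ever calling foo (the timing below was recorded without the brackets).
%timeit -r1 -n2 np.array([foo(i,3) for i in np.arange(5)])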

12.3 µs ± 0 ns per loop (mean ± std. dev. of 1 run, 2 loops each)


In [43]:
%timeit -r1 -n2 foo_vectorized(np.arange(5),3)

29.2 µs ± 0 ns per loop (mean ± std. dev. of 1 run, 2 loops each)


In [44]:
foo_vectorized(np.arange(5)[:,None],np.arange(10).reshape(5,2))

array([[ 0, -1],
       [-1, -2],
       [-2, -3],
       [-3, -4],
       [-4, -5]])
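
`np.vectorize` is a convenience wrapper and still calls the Python function once per element. For a simple branch like the one in `foo`, `np.where` is one way to stay inside NumPy. The name `foo_where` is hypothetical; this is a sketch of an alternative, not the approach used above.

```python
import numpy as np


def foo_where(a, b):
    """Element-wise: a + b where a >= b, otherwise a - b."""
    a = np.asarray(a)
    b = np.asarray(b)
    # Both branches are evaluated for every element, then selected by the mask.
    return np.where(a >= b, a + b, a - b)


print(foo_where(np.arange(5), 3))
# [-3 -2 -1  6  7]

# Broadcasting works as usual: a (5, 1) column against a (5, 2) array.
print(foo_where(np.arange(5)[:, None], np.arange(10).reshape(5, 2)))
```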

### **`numexpr`**

In [45]:
import numexpr as ne


In [None]:
def calcNorm(x):
    return np.sqrt(np.sum(x ** 2))

In [None]:
def calcNormNumExpr(x):
    sum_of_squares = ne.evaluate('sum(x ** 2)')
    return ne.evaluate('sqrt(sum_of_squares)')

In [None]:
%timeit -r3 -n3 calcNorm(np.random.randn(1000000))

3 loops, best of 3: 69.8 ms per loop


In [None]:
%timeit -r3 -n3 calcNormNumExpr(np.random.randn(1000000))

3 loops, best of 3: 71.4 ms per loop


### Other Numpy topics: strides, masked array, etc.

# **Pandas**



In [None]:
import pandas as pd

In [None]:
numbers_to_check = np.array([1e2,1e4,1e6])

In [None]:
df = pd.DataFrame({"upper bound":numbers_to_check, 
                   "number of primes and its estimate": list(map(lambda x: [countPrimes(x),round(x/np.log(x),2)], numbers_to_check)), 
                   "ratio": list(map(lambda x: ratio_2(x), numbers_to_check))})

In [None]:
df

|   | upper bound | number of primes and its estimate | ratio |
|---|---|---|---|
| 0 | 100.0 | [25.0, 21.71] | 1.151293 |
| 1 | 10000.0 | [1229.0, 1085.74] | 1.131951 |
| 2 | 1000000.0 | [78498.0, 72382.41] | 1.08449 |


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column                             Non-Null Count  Dtype  
---  ------                             --------------  -----  
 0   upper bound                        3 non-null      float64
 1   number of primes and its estimate  3 non-null      object 
 2   ratio                              3 non-null      float64
dtypes: float64(2), object(1)
memory usage: 200.0+ bytes


In [None]:
df_splitted = df["number of primes and its estimate"].apply(pd.Series).rename({0 : "number of primes", 1 : "estimate"}, axis = 1)
df = pd.concat([df, df_splitted], axis = 'columns')
df.drop(columns = "number of primes and its estimate", inplace = True)
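
An alternative to `.apply(pd.Series)` is to build the new columns directly from the lists with `tolist()`. The sketch below uses a small stand-in frame; the column name `pair` and the variable names are hypothetical.

```python
import pandas as pd

# Stand-in data with one column that holds two-element lists.
df_demo = pd.DataFrame({
    "upper bound": [100.0, 10000.0],
    "pair": [[25.0, 21.71], [1229.0, 1085.74]],
})

# Turn the lists into a two-column frame that shares the original index.
expanded = pd.DataFrame(
    df_demo["pair"].tolist(),
    columns=["number of primes", "estimate"],
    index=df_demo.index,
)

# Drop the list column and attach the expanded columns, without inplace=True.
result = df_demo.drop(columns="pair").join(expanded)
print(result)
```

Building the frame from a list of lists avoids constructing one `Series` per row, which can matter for large frames.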


In [None]:
df

|   | upper bound | ratio | number of primes | estimate |
|---|---|---|---|---|
| 0 | 100.0 | 1.151293 | 25.0 | 21.71 |
| 1 | 10000.0 | 1.131951 | 1229.0 | 1085.74 |
| 2 | 1000000.0 | 1.08449 | 78498.0 | 72382.41 |


In [None]:
df[ "number of primes" ] = df[ "number of primes" ].astype(int)

In [None]:
df.loc[:, df.columns.str.contains('r')]

|   | upper bound | ratio | number of primes |
|---|---|---|---|
| 0 | 100.0 | 1.151293 | 25 |
| 1 | 10000.0 | 1.131951 | 1229 |
| 2 | 1000000.0 | 1.08449 | 78498 |


In [None]:
pd.cut(df.ratio, bins = [0,1,1.1,10])

0    (1.1, 10.0]
1    (1.1, 10.0]
2     (1.0, 1.1]
Name: ratio, dtype: category
Categories (3, interval[float64, right]): [(0.0, 1.0] < (1.0, 1.1] < (1.1, 10.0]]
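
`pd.cut` also accepts a `labels` argument, which replaces the interval notation with readable names. In this sketch the label strings are made up for illustration, and the ratios are copied from the table above.

```python
import pandas as pd

ratio = pd.Series([1.151293, 1.131951, 1.08449], name="ratio")

# Same bin edges as above; one label per bin (right-closed intervals by default).
labelled = pd.cut(
    ratio,
    bins=[0, 1, 1.1, 10],
    labels=["below 1", "within 10%", "above 10%"],
)
print(labelled)
```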

In [None]:
df

|   | upper bound | ratio | number of primes | estimate |
|---|---|---|---|---|
| 0 | 100.0 | 1.151293 | 25 | 21.71 |
| 1 | 10000.0 | 1.131951 | 1229 | 1085.74 |
| 2 | 1000000.0 | 1.08449 | 78498 | 72382.41 |


### Other Pandas topics: Mapping, grouping, aggregations, transforms, joining, etc.

# **Xarray**

In [None]:
import xarray as xr

In [None]:
np.random.seed(123)
size = 4
temperature = 15 + 10 * np.random.randn(size)
lat = np.random.uniform(low=-90, high=90, size=size)
lon = np.random.uniform(low=-180, high=180, size=size)

# round to two digits after decimal point
temperature, lat , lon = np.around([temperature, lat, lon], decimals=2)

In [None]:
df = pd.DataFrame({"temperature":temperature, "lat":lat, "lon":lon})
df

|   | temperature | lat | lon |
|---|---|---|---|
| 0 | 4.14 | 39.5 | -6.86 |
| 1 | 24.97 | -13.84 | -38.84 |
| 2 | 17.83 | 86.54 | -56.46 |
| 3 | -0.06 | 33.27 | 82.46 |


### DataArray

In [None]:
idx = pd.MultiIndex.from_arrays(arrays=[lat,lon], names=["lat","lon"])
s = pd.Series(data=temperature, index=idx)
s
# use from_series method
da = xr.DataArray.from_series(s)
da
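
`DataArray.from_series` expands a `MultiIndex` into one dimension per level and fills the missing combinations with NaN. A rough pandas-only analogy is `Series.unstack`, sketched below with the rounded values from the table above. This illustrates the resulting shape only and says nothing about xarray's internals.

```python
import numpy as np
import pandas as pd

lat = np.array([39.5, -13.84, 86.54, 33.27])
lon = np.array([-6.86, -38.84, -56.46, 82.46])
temperature = np.array([4.14, 24.97, 17.83, -0.06])

idx = pd.MultiIndex.from_arrays([lat, lon], names=["lat", "lon"])
s = pd.Series(temperature, index=idx)

# Pivot the "lon" level into columns: a 4 x 4 lat-by-lon grid.
grid = s.unstack("lon")
print(grid.shape)                     # (4, 4)

# Only the four observed (lat, lon) pairs hold a value; the rest are NaN.
print(int(grid.notna().sum().sum()))  # 4
```

This is why reductions such as `da.mean(dim=["lat"])` operate on a mostly empty grid when every observation has its own latitude and longitude.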

In [None]:
da.mean(dim = ["lat"])

In [None]:
da.mean(dim = ["lat", "lon"])

In [None]:
da.sel(lat = slice(-180,0))

In [None]:
for i,la in enumerate(df.lat.unique()):
    for j,lo in enumerate(df.lon.unique()):
        if i != j:
            df.loc[len(df.index)] = [round(15 + 10 * np.random.randn(),2), la, lo] 
df

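|   | temperature | lat | lon |
|---|---|---|---|
| 0 | 4.14 | 39.5 | -6.86 |
| 1 | 24.97 | -13.84 | -38.84 |
| 2 | 17.83 | 86.54 | -56.46 |
| 3 | -0.06 | 33.27 | 82.46 |
| 4 | 8.21 | 39.5 | -38.84 |
| 5 | 14.05 | 39.5 | -56.46 |
| 6 | 29.91 | 39.5 | 82.46 |
| 7 | 8.61 | -13.84 | -6.86 |
| 8 | 10.56 | -13.84 | -56.46 |
| 9 | 10.66 | -13.84 | 82.46 |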


In [None]:
lat = np.array(df.lat)
lon = np.array(df.lon)
temperature = np.array(df.temperature)

In [None]:
idx = pd.MultiIndex.from_arrays(arrays=[lat,lon], names=["lat","lon"])
s = pd.Series(data=temperature, index=idx)
s
# use from_series method
da = xr.DataArray.from_series(s)
da

In [None]:
da.groupby('lat').std(dim = 'lon')

### Dataset

In [None]:
da2 = xr.Dataset.from_dataframe(df)

In [None]:
da2.temperature.mean()