Since a large number of operations are being dealt with every second, it is mandatory that your code not only provides the correct solution but also computes it as fast as possible.

In [2]:
import numpy as np
import cv2

## Measuring performance with OpenCV

**cv2.getTickCount** returns the number of clock cycles (ticks) elapsed from a reference event (for example, the moment the machine was switched on) up to the moment the function is called.

**cv2.getTickFrequency** returns the frequency of the tick counter, i.e. the number of ticks per second. So, to get the execution time in seconds, take the difference between two tick counts and divide it by the frequency.

In [3]:
e1 = cv2.getTickCount()
# ... code to be timed goes here ...
e2 = cv2.getTickCount()
elapsed = (e2 - e1) / cv2.getTickFrequency()  # seconds; avoids shadowing the `time` module

In [4]:
elapsed

5.8542e-05

## Default optimization in OpenCV

Many OpenCV functions are optimized using SIMD instruction sets such as SSE2 and AVX, but OpenCV also contains unoptimized fallback code. So if our system supports these features, we should exploit them. Optimization is enabled by default at compile time, so OpenCV runs the optimized code when it is enabled and the unoptimized code otherwise.

You can use cv2.useOptimized() to check whether optimization is enabled, and cv2.setUseOptimized() to enable or disable it.

In [8]:
cv2.useOptimized()

True

In [9]:
img = cv2.imread('./sample_imgs/rasenshuriken.jpeg')

In [11]:
%timeit res = cv2.medianBlur(img,49)

2.24 ms ± 5.67 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [12]:
cv2.setUseOptimized(False)


In [13]:
cv2.useOptimized()

False

In [14]:
%timeit res = cv2.medianBlur(img,49)

2.26 ms ± 24.2 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)


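Median filtering is SIMD-optimized, so we can call cv2.setUseOptimized(True) at the top of our code to make sure the optimized paths are used (they are enabled by default). In this run the difference is small (2.24 ms vs. 2.26 ms per call). Note also that the cv2.setUseOptimized(False) call above stays in effect until optimization is re-enabled.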

## Measuring performance in Python

IPython provides the magic command *%timeit* for this. It runs the code many times and reports the mean and standard deviation over several runs, which gives more reliable results than a single measurement. It is best suited to timing single-line statements, so modular code helps: wrap the work in a function and time a single call to it.

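Outside IPython (for example in a plain script), the standard-library `timeit` module can produce a similar mean ± standard deviation. This is a sketch: `time_call` and `square_all` are hypothetical names, and the printed line only imitates the `%timeit` format.

```python
import statistics
import timeit

import numpy as np


def time_call(func, *args, repeat=7, number=10_000):
    """Mean and standard deviation (seconds) of one call, %timeit-style."""
    # Each of the `repeat` runs executes func(*args) `number` times in a row.
    runs = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    per_call = [total / number for total in runs]
    return statistics.mean(per_call), statistics.stdev(per_call)


def square_all(values):
    """Hypothetical function under test: element-wise square with NumPy."""
    return np.square(values)


data = np.arange(1_000, dtype=np.uint64)
mean, std = time_call(square_all, data)
print(f"{mean * 1e6:.2f} µs ± {std * 1e9:.1f} ns per loop (7 runs, 10,000 loops each)")
```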
In [15]:
x = 5

In [16]:
%timeit y = x**2

15.7 ns ± 0.0271 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)


In [18]:
%timeit y = x*x

8.86 ns ± 0.0147 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)


In [19]:
z = np.uint([5])


In [20]:
%timeit y = z*z

229 ns ± 1.63 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)


In [22]:
%timeit y = np.square(z)

208 ns ± 2.36 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)


In [26]:
# cv2.countNonZero works only on single-channel (e.g. grayscale) images, so it fails on the 3-channel img
#%timeit z = cv2.countNonZero(img)


In [24]:
%timeit z = np.count_nonzero(img)

4.47 μs ± 22.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)


## Performance optimization techniques

- Avoid explicit Python loops as far as possible (especially nested loops over pixels); they are inherently slow.

- Vectorize the algorithm/code to the maximum possible extent, because NumPy and OpenCV are optimized for vector operations.

- Exploit cache locality: access array elements in the order they are laid out in memory (row by row for NumPy's default C-ordered arrays).

- Never make copies of an array unless it is needed; use views instead, since copying an array is a costly operation.
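The loop, vectorization, and copy guidelines can be illustrated in a few lines. This sketch, on a synthetic grayscale array (assumption), compares a pixel-by-pixel Python loop with the equivalent vectorized NumPy expression, and shows that basic slicing returns a view sharing memory with the original while `.copy()` allocates new memory. The function names are hypothetical.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
gray_img = rng.integers(0, 256, size=(120, 160), dtype=np.uint8)  # synthetic grayscale image


def invert_loop(image):
    """Hypothetical example: invert pixel by pixel with Python loops (slow)."""
    out = np.empty_like(image)
    rows, cols = image.shape
    for r in range(rows):
        for c in range(cols):
            out[r, c] = 255 - image[r, c]
    return out


def invert_vectorized(image):
    """The same inversion as a single NumPy operation over the whole array."""
    return 255 - image  # stays uint8; no overflow since every pixel is <= 255


print("loop:      ", timeit.timeit(lambda: invert_loop(gray_img), number=3) / 3, "s")
print("vectorized:", timeit.timeit(lambda: invert_vectorized(gray_img), number=3) / 3, "s")

# Basic slicing creates a view: no pixel data is copied, and writes go to gray_img.
roi = gray_img[20:60, 30:90]
# .copy() allocates new memory; use it only when an independent array is needed.
roi_copy = gray_img[20:60, 30:90].copy()
print(np.shares_memory(roi, gray_img), np.shares_memory(roi_copy, gray_img))  # True False
```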



Further reading:

- https://wiki.python.org/moin/PythonSpeed/PerformanceTips
- https://scipy-lectures.org/advanced/advanced_numpy/index.html#advanced-numpy
- https://pynash.org/lander