Apart from OpenCV, Python also provides the __time__ module, which is helpful for measuring execution time. Another module, __profile__, gives a detailed report on the code, such as how much time each function took and how many times it was called. If you are using IPython, all these features are integrated in a user-friendly manner.
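
As a minimal sketch of the kind of report a profiler gives, the snippet below uses the standard-library cProfile module (the C-accelerated counterpart of profile) together with pstats. The functions `slow_sum` and `work` are hypothetical stand-ins for your own code, not part of OpenCV.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Hypothetical workload: sum the squares below n with a plain loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def work():
    """Hypothetical entry point that calls slow_sum several times."""
    return [slow_sum(10000) for _ in range(5)]


# Collect statistics only while the code of interest is running.
profiler = cProfile.Profile()
profiler.enable()
results = work()
profiler.disable()

# Format the statistics into a string, sorted by cumulative time.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

The printed table lists, for each function, the number of calls and the time spent in it, which is usually enough to spot where a slow script spends its time.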

### Measuring Performance with OpenCV

The __cv2.getTickCount__ function returns the number of clock cycles from a reference event (such as the moment the machine was switched on) to the moment the function is called. So if you call it before and after a piece of code runs, the difference is the number of clock cycles that code took.

The __cv2.getTickFrequency__ function returns the frequency of clock cycles, that is, the number of clock cycles per second. So to find the execution time in seconds, you can do the following:

In [5]:
import cv2

e1 = cv2.getTickCount()
# your code execution
e2 = cv2.getTickCount()
t = (e2 - e1) / cv2.getTickFrequency()  # elapsed time in seconds

In [27]:
img1 = cv2.imread('messi5.jpg',0)

e1 = cv2.getTickCount()
# apply median blur repeatedly with odd kernel sizes 5, 7, ..., 47
for i in range(5,49,2):
    img1 = cv2.medianBlur(img1,i)
e2 = cv2.getTickCount()
t = (e2 - e1)/cv2.getTickFrequency()
print(t)

# You can do the same with the time module. Instead of cv2.getTickCount,
# use the time.time() function. Then take the difference of the two times.

0.835023033512
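
The comment in the cell above mentions the time module. A minimal sketch of that variant follows, using only the standard library. It uses `time.perf_counter()` rather than `time.time()`, which is a substitution on my part: both return seconds, but `perf_counter()` is intended for measuring intervals. The `workload` function is a hypothetical stand-in for the code being timed.

```python
import time


def workload():
    """Hypothetical stand-in for the code being timed."""
    return sum(i * i for i in range(100000))


# perf_counter() is a monotonic, high-resolution clock meant for intervals;
# time.time() also works but can jump if the system clock is adjusted.
start = time.perf_counter()
result = workload()
end = time.perf_counter()

# The difference is already in seconds, so no frequency division is needed.
elapsed = end - start
print(f"elapsed: {elapsed:.6f} s")
```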


### Default Optimization in OpenCV

Many OpenCV functions are optimized using SSE2, AVX, etc. The library also contains unoptimized code. So if our system supports these features, we should exploit them (almost all modern processors support them). Optimization is enabled by default at compile time. OpenCV runs the optimized code if it is enabled; otherwise it runs the unoptimized code. You can use __cv2.useOptimized()__ to check whether it is enabled and __cv2.setUseOptimized()__ to enable or disable it. Let’s see a simple example.

In [8]:
# check if optimization is enabled
cv2.useOptimized()

True

In [10]:
%timeit res = cv2.medianBlur(img1,49)

10 loops, best of 3: 45.8 ms per loop


In [11]:
# Disable it
cv2.setUseOptimized(False)

In [12]:
cv2.useOptimized()

False

In [14]:
%timeit res = cv2.medianBlur(img1,49)

10 loops, best of 3: 102 ms per loop


As you can see, optimized median filtering is ~2x faster than the unoptimized version. If you check its source, you can see that median filtering is SIMD optimized. You can call __cv2.setUseOptimized(True)__ at the top of your code to make sure optimization is on (remember that it is enabled by default).



### Measuring Performance in IPython

Sometimes you may need to compare the performance of two similar operations. IPython gives you the magic command __%timeit__ for this. It runs the code several times to get more accurate results, and it is best suited to measuring single lines of code.

For example, do you know which of the following squaring operations is fastest: x = 5; y = x\*\*2, x = 5; y = x\*x, x = np.uint8([5]); y = x\*x, or y = np.square(x)? We will find out with %timeit in the IPython shell.

In [16]:
x = 5
%timeit y = x**2

10000000 loops, best of 3: 40.3 ns per loop


In [18]:
%timeit y = x*x

10000000 loops, best of 3: 38.6 ns per loop


In [21]:
import numpy as np

z = np.uint8([5])
%timeit y=z*z

The slowest run took 24.59 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 351 ns per loop


In [22]:
%timeit y=np.square(z)

The slowest run took 27.20 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 353 ns per loop


You can see that x = 5; y = x\*x is fastest, and in the timings above it is around 9x faster than the NumPy versions. If you also count the array creation, the gap may reach up to 100x. Cool, right? (NumPy devs are working on this issue.)

Python scalar operations are faster than NumPy scalar operations. So for operations involving one or two elements, Python scalars are better than NumPy arrays. NumPy gains the advantage when the array is a little bigger.
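
Outside IPython, a similar comparison can be sketched with the standard-library timeit module, which is what the %timeit magic builds on. The loop count and repeat count below are arbitrary choices for illustration, and the absolute numbers will differ from machine to machine.

```python
import timeit

NUMBER = 100000  # executions of the statement per timing run
REPEAT = 5       # independent timing runs

# Each statement runs NUMBER times per run; the setup runs once per run.
power_times = timeit.repeat("y = x ** 2", setup="x = 5", number=NUMBER, repeat=REPEAT)
mult_times = timeit.repeat("y = x * x", setup="x = 5", number=NUMBER, repeat=REPEAT)

# The minimum is the usual summary: slower runs mostly reflect
# interference from other processes rather than the code itself.
best_power = min(power_times) / NUMBER
best_mult = min(mult_times) / NUMBER

print(f"x ** 2: {best_power * 1e9:.1f} ns per loop")
print(f"x * x : {best_mult * 1e9:.1f} ns per loop")
```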

We will try one more example. This time, we will compare the performance of cv2.countNonZero() and np.count_nonzero() on the same image.



In [28]:
%timeit z = cv2.countNonZero(img1)

The slowest run took 79.49 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 209 µs per loop


In [29]:
%timeit z = np.count_nonzero(img1)

1000 loops, best of 3: 685 µs per loop


Normally, OpenCV functions are faster than NumPy functions, so for the same operation OpenCV functions are preferred. But there can be exceptions, especially when NumPy works with views instead of copies.


### Performance Optimization Techniques
There are several techniques and coding methods to get the maximum performance out of Python and NumPy. Only the relevant ones are noted here, and links are given to important sources. The main point is to first implement the algorithm in a simple manner. Once it is working, profile it, find the bottlenecks, and optimize them.

1. Avoid using loops in Python as far as possible, especially double or triple loops. They are inherently slow.
2. Vectorize the algorithm/code as much as possible, because NumPy and OpenCV are optimized for vector operations.
3. Exploit cache locality: access array elements in the order they are stored in memory.
4. Never make copies of an array unless it is needed. Try to use views instead. Array copying is a costly operation.
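
Two of the points in the list, vectorizing instead of looping and using views instead of copies, can be illustrated with a small NumPy sketch. The random array stands in for a grayscale image, and the function names and threshold are hypothetical choices for this example only.

```python
import numpy as np

# Hypothetical test data: a small 8-bit grayscale "image" of random values.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(120, 160), dtype=np.uint8)


def count_bright_loop(image, threshold):
    """Count pixels above threshold with an explicit double loop (slow)."""
    count = 0
    for row in range(image.shape[0]):
        for col in range(image.shape[1]):
            if image[row, col] > threshold:
                count += 1
    return count


def count_bright_vectorized(image, threshold):
    """Same count, expressed as one whole-array comparison."""
    return int(np.count_nonzero(image > threshold))


loop_count = count_bright_loop(img, 128)
vector_count = count_bright_vectorized(img, 128)

# Basic slicing returns a view that shares memory with the original array;
# .copy() allocates and fills a new buffer.
roi_view = img[10:50, 20:80]
roi_copy = img[10:50, 20:80].copy()

print(loop_count == vector_count)
print(np.shares_memory(img, roi_view), np.shares_memory(img, roi_copy))
```

Both functions return the same count, and only the plain slice shares memory with `img`. Timing the two functions with %timeit on a realistically sized image is a quick way to see how much the double loop costs.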

Even after doing all this, if your code is still slow, or the use of large loops is inevitable, use additional libraries like Cython to make it faster.