
# Vectorization

Vectorization is a technique for speeding up Python code. It is essential when working with large data sets.

Vectorized code contains no loops written in Python. Instead, we perform operations on compound data structures like *numpy* arrays or *pandas* data series. These modules (*numpy*, *pandas*) are implemented largely in C/C++, so the loops run in compiled code, which is much faster than having the Python interpreter process the data element by element.
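As a minimal illustration (the arrays below are made-up sample values), arithmetic operators applied to *numpy* arrays act on every element at once, so no Python-level loop is needed:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])

# Each operator is applied element by element inside numpy's compiled code
print(a * 2.0)   # [2. 4. 6.]
print(a + b)     # [11. 22. 33.]
```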

The vectorized solution is not only faster, but the code is also shorter (easier to maintain and debug).

Let's see some examples comparing non-vectorized and vectorized solutions.
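The timings in this notebook come from single `time.time()` measurements, which can fluctuate from run to run. A somewhat more robust sketch (an alternative approach, not the one used in the cells below) repeats each measurement with the standard-library `timeit` module and keeps the fastest run. The vector size is reduced to 1 000 000 here, an assumption made only to keep the run short:

```python
import random
import timeit

import numpy as np

n = 1_000_000                      # smaller than the notebook's n to keep the run short
scalar = 2.564
vlist = [random.random() for _ in range(n)]
vect = np.array(vlist)

# timeit.repeat runs the callable `number` times per repeat and returns one
# total time per repeat; the minimum is the least disturbed by other processes
loop_time = min(timeit.repeat(lambda: [v * scalar for v in vlist], number=1, repeat=5))
numpy_time = min(timeit.repeat(lambda: vect * scalar, number=1, repeat=5))
print(f'list comprehension: {loop_time:.4f} s, numpy: {numpy_time:.4f} s')
```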



## Multiplying a vector by a scalar

We have a vector of 10 000 000 floating-point numbers and we would like to scale (multiply) each element by a scalar.

```python
import numpy as np
import random
import time

n = 10_000_000              # size of vector
scalar = 2.564              # scale factor for the vector
vlist = [random.random() for _ in range(n)]  # generating random list (non-vectorized)
vect = np.array(vlist)      # converting the list to a numpy array (vectorized)
```
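Note that `vect` above is built by converting an existing Python list. The random values could also be generated directly as a *numpy* array, skipping the Python-level list entirely. A minimal sketch using numpy's `default_rng` generator (the seed is an arbitrary choice for reproducibility):

```python
import numpy as np

n = 10_000_000
rng = np.random.default_rng(seed=42)   # arbitrary seed so results are reproducible
vect = rng.random(n)                   # n floats drawn uniformly from [0, 1)
print(vect.shape, vect.dtype)          # (10000000,) float64
```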

```python
start_time = time.time()    # get current time
slist = []
for i in range(n):
    slist.append(vlist[i] * scalar)
print(f'Non-vectorized solution for {n} items in {(time.time() - start_time):.2f} seconds')
```

Non-vectorized solution for 10000000 items in 1.90 seconds


```python
start_time = time.time()    # get current time
s1list = [v * scalar for v in vlist]
print(f'List comprehension solution for {n} items in {(time.time() - start_time):.2f} seconds')
```

List comprehension solution for 10000000 items in 0.86 seconds


```python
start_time = time.time()    # get current time
svect = vect * scalar       # vectorized: numpy multiplies every element at once
print(f'Vectorized solution for {n} items in {(time.time() - start_time):.2f} seconds')
```

Vectorized solution for 10000000 items in 0.05 seconds
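Speed only matters if the results agree. The self-contained sketch below (using a smaller, arbitrarily chosen vector size) checks that the loop, the list comprehension and the *numpy* version produce the same values; `np.allclose` is used instead of `==` as the usual safeguard when comparing floating-point results:

```python
import random

import numpy as np

n = 1_000                  # small size, just for checking correctness
scalar = 2.564
vlist = [random.random() for _ in range(n)]
vect = np.array(vlist)

slist = []
for v in vlist:            # explicit Python loop
    slist.append(v * scalar)
s1list = [v * scalar for v in vlist]   # list comprehension
svect = vect * scalar                  # vectorized numpy operation

print(np.allclose(slist, svect) and np.allclose(s1list, svect))   # True
```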


## Find the largest value in a vector

```python
start_time = time.time()    # get current time
vmax = vlist[0]
for v in vlist[1:]:
    if v > vmax: vmax = v
print(f'Max item non-vectorized {vmax} in {(time.time() - start_time):.2f} seconds')
```

Max item non-vectorized 0.9999999642196017 in 1.05 seconds


```python
start_time = time.time()    # get current time
vmax = max(vlist)           # built-in max(), implemented in C but still walks a Python list
print(f'Max item with built-in max() {vmax} in {(time.time() - start_time):.2f} seconds')
```

Max item with built-in max() 0.9999999642196017 in 0.14 seconds


```python
start_time = time.time()    # get current time
vmax = np.max(vect)
print(f'Max item vectorized {vmax} in {(time.time() - start_time):.2f} seconds')
```

Max item vectorized 0.9999999642196017 in 0.01 seconds
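If the position of the largest item is needed as well, *numpy* offers `np.argmax`, which returns the index of the first occurrence of the maximum. A small example with made-up values:

```python
import numpy as np

values = np.array([0.2, 0.7, 0.5])
idx = np.argmax(values)            # index of the largest element
print(idx, values[idx])            # 1 0.7
```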


## Find the largest absolute difference between the neighboring vector items

```python
start_time = time.time()    # get current time
max_dif = abs(vlist[0] - vlist[1])
for i in range(1, n):
    dif = abs(vlist[i-1] - vlist[i])
    if dif > max_dif: max_dif = dif
print(f'Max abs difference non-vectorized {max_dif} in {(time.time() - start_time):.2f} seconds')
```

Max abs difference non-vectorized 0.9996819978873251 in 2.87 seconds


```python
start_time = time.time()    # get current time
max_dif = np.max(np.abs(vect[:-1] - vect[1:]))
print(f'Max abs difference vectorized {max_dif} in {(time.time() - start_time):.2f} seconds')
```

Max abs difference vectorized 0.9996819978873251 in 0.12 seconds
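The slices `vect[:-1]` (all items but the last) and `vect[1:]` (all items but the first) are shifted by one position, so subtracting them pairs every item with its neighbour in a single operation. A small worked example with made-up values; `np.diff`, which computes `a[1:] - a[:-1]`, gives the same differences with the opposite sign, so the largest absolute difference is identical:

```python
import numpy as np

a = np.array([1.0, 4.0, 2.0, 2.5])

pairs = a[:-1] - a[1:]             # [1 - 4, 4 - 2, 2 - 2.5]
print(np.abs(pairs))
print(np.max(np.abs(pairs)))       # 3.0
print(np.max(np.abs(np.diff(a))))  # 3.0
```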
