## Expressing Conditional Logic as Array Operations

The numpy.where function is a vectorized version of the ternary expression x if condition else y. Suppose we had a boolean array and two arrays of values:

In [1]:
import numpy as np

In [15]:
xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])

yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])

condtion = np.array([True, False, True, True, False])

Suppose we wanted to take a value from xarr whenever the corresponding value in condtion is True, and otherwise take the value from yarr. A list comprehension doing this might look like:


In [16]:
result = [(x if c else y) for x, y, c in zip(xarr, yarr, condtion)]

result

[1.1, 2.2, 1.3, 1.4, 2.5]

This has multiple problems. First, it will not be very fast for large arrays (because all the work is being done in interpreted Python code). Second, it will not work with multidimensional arrays. With np.where you can write this very concisely:

In [21]:
# np.where(condition, x, y): take from xarr where condtion is True, otherwise from yarr
result = np.where(condtion, xarr, yarr)

result

array([1.1, 2.2, 1.3, 1.4, 2.5])
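To illustrate the point about multidimensional arrays, here is a minimal sketch using small 2-D inputs. The names x2d, y2d, cond2d, and picked are hypothetical and introduced only for this example; the call itself is the same np.where(condition, x, y) used above.

```python
import numpy as np

# Hypothetical 2-D inputs, chosen only for illustration.
x2d = np.array([[1.1, 1.2], [1.3, 1.4]])
y2d = np.array([[2.1, 2.2], [2.3, 2.4]])
cond2d = np.array([[True, False], [False, True]])

# np.where selects element by element, whatever the number of dimensions.
picked = np.where(cond2d, x2d, y2d)
print(picked)
```

The result keeps the (2, 2) shape of the inputs, which the list comprehension over zip would not do without extra nesting.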

The second and third arguments to np.where don’t need to be arrays; one or both of them can be scalars. A typical use of where in data analysis is to produce a new array of values based on another array. Suppose you had a matrix of randomly generated data and you wanted to replace all positive values with 2 and all negative values with -2. This is very easy to do with np.where:

In [24]:
arr = np.random.randn(4, 4)

arr

array([[ 1.13860979,  1.08433815, -0.67610902,  0.80159305],
       [ 0.98610134, -0.15544366,  1.73097004,  0.46716979],
       [-1.2297467 ,  0.14613143, -0.13589708,  0.73965765],
       [-1.4655826 , -0.87721587,  0.09262178, -0.40033742]])

In [26]:
# Where a value of arr is greater than 0, put 2; otherwise put -2

np.where(arr > 0, 2, -2)

array([[ 2,  2, -2,  2],
       [ 2, -2,  2,  2],
       [-2,  2, -2,  2],
       [-2, -2,  2, -2]])
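Because the second and third arguments can mix scalars and arrays, a related pattern is to replace only some of the values and leave the rest untouched. The sketch below assumes a small fixed array (sample, a hypothetical name with made-up values) rather than random data, so that the result is reproducible.

```python
import numpy as np

# Small fixed array (hypothetical values) so the result is reproducible.
sample = np.array([[0.5, -1.2], [-0.3, 2.0]])

# Positive entries become 2; everything else keeps its original value.
mixed = np.where(sample > 0, 2, sample)
print(mixed)
```

Here the scalar 2 is used where the condition holds and the original array supplies the value everywhere else.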

> The arrays passed to np.where can be more than just equal-sized arrays or scalars.
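One case this covers is arrays of different but compatible shapes, which NumPy combines using its usual broadcasting rules. The following is a minimal sketch under that assumption; row_mask, values, and out are hypothetical names chosen for illustration.

```python
import numpy as np

# Hypothetical shapes: a (3, 1) column of conditions and a (4,) row of values.
row_mask = np.array([[True], [False], [True]])
values = np.array([10, 20, 30, 40])

# The condition, the array, and the scalar are broadcast to a common (3, 4) shape.
out = np.where(row_mask, values, -1)
print(out.shape)
print(out)
```

Rows where row_mask is True receive a copy of values, and the remaining row is filled with -1.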

With some cleverness you can use where to express more complicated logic; consider this example where I have two boolean arrays, cond1 and cond2, and wish to assign a different value for each of the 4 possible pairs of boolean values:

In [52]:
result = []
n = 6

cond1 = np.array([True, True, True, False, True, False])
cond2 = np.array([False, True, False, False, True, True])

for i in range(n):
    # Test the i-th element of each array, not the whole array
    if cond1[i] and cond2[i]:
        result.append(0)
    elif cond1[i]:
        result.append(1)
    elif cond2[i]:
        result.append(2)
    else:
        result.append(3)

In [53]:
result

[1, 0, 1, 3, 0, 2]

> While perhaps not immediately obvious, this for loop can be converted into a nested where expression:

In [54]:
# Where cond1 and cond2 are both True put 0; else where cond1 is True put 1;
# else where cond2 is True put 2; otherwise put 3
np.where(cond1 & cond2, 0, np.where(cond1, 1, np.where(cond2, 2, 3)))

array([1, 0, 1, 3, 0, 2])
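When the nesting becomes hard to read, np.select offers a flatter way to write the same kind of first-match logic. This is an alternative technique, not the approach used in the text above; the sketch assumes the same cond1 and cond2 arrays and introduces the hypothetical name labels.

```python
import numpy as np

cond1 = np.array([True, True, True, False, True, False])
cond2 = np.array([False, True, False, False, True, True])

# Conditions are checked in order; the first one that holds picks the value.
# Elements matching no condition receive the default.
labels = np.select([cond1 & cond2, cond1, cond2], [0, 1, 2], default=3)
print(labels)
```

The order of the condition list matters in the same way the order of the if/elif branches does.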

In this particular example, we can also take advantage of the fact that boolean values are treated as 0 or 1 in calculations, so this could alternatively be expressed (though a bit more cryptically) as an arithmetic operation:

In [57]:
# Each mask must match exactly one case, so the terms never overlap
result = 1 * (cond1 & ~cond2) + 2 * (cond2 & ~cond1) + 3 * ~(cond1 | cond2)

In [58]:
result

array([1, 0, 1, 3, 0, 2])

In [59]:
cond1, cond2

(array([ True,  True,  True, False,  True, False]),
 array([False,  True, False, False,  True,  True]))