# 4.2 Expressing Conditional Logic as Array Operations

In [1]:
import numpy as np

The `numpy.where` function is a vectorized version of the ternary expression `x if condition else y`.

## 4.2.1 `np.where`

Suppose we had a boolean array and two arrays of values.

In [2]:
xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
cond = np.array([True, False, True, True, False])

Let's say we wanted to take a value from `xarr` whenever the corresponding value in `cond` is `True`, and otherwise take the value from `yarr`. A list comprehension for this would look like:

In [3]:
result = [(x if c else y) for x, y, c in zip(xarr, yarr, cond)]
print(result)

[np.float64(1.1), np.float64(2.2), np.float64(1.3), np.float64(1.4), np.float64(2.5)]


This approach has two problems. First, it will not be very fast for large arrays, because all the work is done in interpreted Python code. Second, it will not work with multidimensional arrays. With `np.where` you can write this concisely:

In [4]:
result = np.where(cond, xarr, yarr)
print(result)

[1.1 2.2 1.3 1.4 2.5]
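
Both drawbacks of the list comprehension can be checked with a short experiment. The sketch below is illustrative only: the array shape, the seed, and the names `x2d`, `y2d`, `cond2d`, and `pure_python` are assumptions made for this example, not part of the text above. It applies `np.where` directly to two-dimensional inputs, reproduces the same result with a list comprehension (which has to flatten the arrays first), and times both. Absolute timings depend on the machine, so none are shown here.

```python
import timeit

import numpy as np

# Hypothetical 2-D inputs; shape and seed are arbitrary choices for this sketch
rng = np.random.default_rng(seed=0)
x2d = rng.standard_normal((200, 500))
y2d = rng.standard_normal((200, 500))
cond2d = rng.random((200, 500)) > 0.5

# np.where accepts the 2-D arrays as they are and preserves their shape
vectorized = np.where(cond2d, x2d, y2d)


def pure_python():
    # Iterating over 2-D arrays yields whole rows, so the comprehension
    # needs flattened inputs; the shape is restored afterwards
    flat = [
        x if c else y
        for x, y, c in zip(x2d.ravel(), y2d.ravel(), cond2d.ravel())
    ]
    return np.array(flat).reshape(x2d.shape)


looped = pure_python()
print(vectorized.shape)
print(np.array_equal(vectorized, looped))

# Time three runs of each approach
t_loop = timeit.timeit(pure_python, number=3)
t_vec = timeit.timeit(lambda: np.where(cond2d, x2d, y2d), number=3)
print(f"list comprehension: {t_loop:.4f} s, np.where: {t_vec:.4f} s")
```

The two results are identical element for element; only the amount of work done in the Python interpreter differs.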


## 4.2.2 Applications of `np.where`

A typical use of `where` in data analysis is to produce a new array of values based on another array. Suppose you had a matrix of randomly generated data and you wanted to replace all positive values with 2 and all negative values with -2. This is very easy to do with `np.where`.

In [5]:
rng = np.random.default_rng(seed=12345)
arr = rng.standard_normal((4, 4))
print(f"Original array:\n{arr}")

Original array:
[[-1.42382504  1.26372846 -0.87066174 -0.25917323]
 [-0.07534331 -0.74088465 -1.3677927   0.6488928 ]
 [ 0.36105811 -1.95286306  2.34740965  0.96849691]
 [-0.75938718  0.90219827 -0.46695317 -0.06068952]]


In [6]:
# Replace positive values with 2 and all other values (negatives and any exact zeros) with -2
result = np.where(arr > 0, 2, -2)
print(f"Result:\n{result}")

Result:
[[-2  2 -2 -2]
 [-2 -2 -2  2]
 [ 2 -2  2  2]
 [-2  2 -2 -2]]


You can combine scalars and arrays when using `np.where`. For example, you can replace all positive values in `arr` with the constant 2 and leave the other values unchanged:

In [7]:
result = np.where(arr > 0, 2, arr) # set only positive values to 2
print(f"Result:\n{result}")


Result:
[[-1.42382504  2.         -0.87066174 -0.25917323]
 [-0.07534331 -0.74088465 -1.3677927   2.        ]
 [ 2.         -1.95286306  2.          2.        ]
 [-0.75938718  2.         -0.46695317 -0.06068952]]
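
The printed results hint at two details worth checking: with two integer scalars the output is an integer array, whereas mixing a scalar with `arr` gives a floating-point array. The sketch below confirms this and also compares the mixed form with Boolean-mask assignment on a copy, which should produce the same values. The names `both_scalars`, `mixed`, and `masked` are introduced here purely for illustration.

```python
import numpy as np

# Same seed and shape as the example above
rng = np.random.default_rng(seed=12345)
arr = rng.standard_normal((4, 4))

both_scalars = np.where(arr > 0, 2, -2)  # two integer scalars
mixed = np.where(arr > 0, 2, arr)        # scalar combined with a float array

print(both_scalars.dtype)  # an integer dtype; the exact width can vary by platform
print(mixed.dtype)         # floating point, because arr is floating point

# Equivalent Boolean-mask assignment, done on a copy so arr is left untouched
masked = arr.copy()
masked[arr > 0] = 2
print(np.array_equal(mixed, masked))
```

Unlike mask assignment, `np.where` always returns a new array, so it is convenient when the original data must be kept.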
