This chapter covers the use of Boolean masks to examine and manipulate values
within NumPy arrays. Masking comes up when you want to extract, modify, count, or
otherwise manipulate values in an array based on some criterion: for example, you
might wish to count all values greater than a certain value, or remove all outliers that
are above some threshold. In NumPy, Boolean masking is often the most efficient
way to accomplish these types of tasks.

In [None]:
# Example: Counting Rainy Days
import numpy as np
from seattle2014 import data
# Use DataFrame operations to extract rainfall as a NumPy array
rainfall_mm = np.array(
    data.seattle_weather().set_index('date')['precipitation']['2015'])
len(rainfall_mm)

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
# Style name for Matplotlib 3.6 and later; older releases call it 'seaborn-whitegrid'
plt.style.use('seaborn-v0_8-whitegrid')

1. Comparison Operators as Ufuncs

In [1]:
import numpy as np
x = np.array([1, 2, 3, 4, 5])

In [2]:
x < 3

array([ True,  True, False, False, False])

In [3]:
x <= 3

array([ True,  True,  True, False, False])

In [4]:
x >= 3

array([False, False,  True,  True,  True])

In [5]:
x != 3

array([ True,  True, False,  True,  True])

In [6]:
x == 3

array([False, False,  True, False, False])

In [7]:
(2 * x) == (x ** 2)

array([False,  True, False, False, False])

In [8]:
# Create a reproducible random number generator
rng = np.random.default_rng(seed=1701)
# Draw random integers from 0 to 9 (10 is excluded) into a 3×4 array
x = rng.integers(10, size=(3, 4))
x

array([[9, 4, 0, 3],
       [8, 6, 3, 1],
       [3, 7, 4, 0]])

In [9]:
x < 6

array([[False,  True,  True,  True],
       [False, False,  True,  True],
       [ True, False,  True,  True]])
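Each comparison operator is shorthand for a NumPy ufunc, which is why the comparisons above work element-wise on arrays of any shape. The following sketch checks the operator form against the function form; the names `values` and `pairs` are illustrative only.

```python
import numpy as np

# Illustrative array with the same contents as the 1-D x used earlier
values = np.array([1, 2, 3, 4, 5])

# Each operator paired with its ufunc counterpart
pairs = {
    "==": (values == 3, np.equal(values, 3)),
    "!=": (values != 3, np.not_equal(values, 3)),
    "<": (values < 3, np.less(values, 3)),
    "<=": (values <= 3, np.less_equal(values, 3)),
    ">": (values > 3, np.greater(values, 3)),
    ">=": (values >= 3, np.greater_equal(values, 3)),
}

for op, (via_operator, via_ufunc) in pairs.items():
    # Both spellings produce the same Boolean array
    assert np.array_equal(via_operator, via_ufunc)
    print(op, via_ufunc)
```

The operator form is usually easier to read; the function form is convenient when a comparison has to be passed around as a callable.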

2. Working with Boolean Arrays

In [10]:
print(x)

[[9 4 0 3]
 [8 6 3 1]
 [3 7 4 0]]


In [11]:
# To count the number of True entries in a Boolean array, np.count_nonzero is useful
np.count_nonzero(x < 6)

8

In [12]:
# Another way to get this information is np.sum:
# False is interpreted as 0, and True is interpreted as 1
np.sum(x < 6)

np.int64(8)

In [13]:
# How many values are less than 6 in each row?
np.sum(x < 6, axis=1)

array([3, 2, 3])

In [14]:
# Are there any values greater than 8?
np.any(x > 8)

np.True_

In [15]:
# Are there any values less than zero?
np.any(x < 0)

np.False_

In [16]:
# Are all values less than 10?
np.all(x < 10)

np.True_

In [17]:
# Are all values equal to 6?
np.all(x == 6)

np.False_

In [18]:
# Are all values in each row less than 8?
np.all(x < 8, axis=1)

array([False, False,  True])

2.1 Boolean Operators
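NumPy overloads Python's bitwise operators `&` (and), `|` (or), `^` (xor), and `~` (not) as element-wise ufuncs, so several comparisons can be combined into a single Boolean array. A minimal sketch, assuming made-up rainfall values rather than the Seattle data loaded earlier (the name `rain_mm` and its contents are hypothetical):

```python
import numpy as np

# Hypothetical daily rainfall in millimetres (made-up values, not the Seattle data)
rain_mm = np.array([0.0, 2.5, 12.0, 30.0, 0.0, 18.5, 7.0, 26.0])

# & (and): days with more than 10 mm and less than 20 mm of rain
between = (rain_mm > 10) & (rain_mm < 20)
print(np.sum(between))

# ~ (not) and | (or): the same condition written as a negated "or"
between_alt = ~((rain_mm <= 10) | (rain_mm >= 20))
print(np.array_equal(between, between_alt))

# | (or): days that were either completely dry or had more than 25 mm
dry_or_heavy = (rain_mm == 0) | (rain_mm > 25)
print(np.sum(dry_or_heavy))

# ^ (xor): days above 5 mm or above 20 mm, but not both
moderate = (rain_mm > 5) ^ (rain_mm > 20)
print(np.sum(moderate))
```

The parentheses matter: `&` and `|` bind more tightly than the comparison operators, so `rain_mm > 10 & rain_mm < 20` would be evaluated as `rain_mm > (10 & rain_mm) < 20` and raise an error.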

3. Boolean Arrays as Masks

In [19]:
x

array([[9, 4, 0, 3],
       [8, 6, 3, 1],
       [3, 7, 4, 0]])

In [20]:
x < 5

array([[False,  True,  True,  True],
       [False, False,  True,  True],
       [ True, False,  True,  True]])

In [21]:
# To select these values, index the array with the Boolean array;
# this is known as a masking operation
x[x < 5]

array([4, 0, 3, 3, 1, 3, 4, 0])
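A mask can also appear on the left-hand side of an assignment, which covers the "modify" and "remove outliers" use cases mentioned in the introduction. The sketch below assumes an illustrative threshold of 7 and uses the hypothetical names `grid` and `capped`; it is one possible way to cap large values, not the only one.

```python
import numpy as np

# Same values as the 3×4 array x shown above
grid = np.array([[9, 4, 0, 3],
                 [8, 6, 3, 1],
                 [3, 7, 4, 0]])

# Boolean indexing returns a new 1-D array (a copy, not a view)
mask = grid < 5
selected = grid[mask]
selected[:] = -1          # changing the copy leaves grid untouched
print(grid.min())

# Assigning through a mask modifies the array in place,
# so work on a copy to keep the original data
capped = grid.copy()
capped[capped > 7] = 7    # illustrative threshold: cap every value above 7
print(capped)

# Summary statistics on a masked subset
print(grid[grid >= 5].mean())
```

Because `grid[mask]` is a copy, in-place changes have to go through `grid[mask] = ...` rather than through a variable that holds the selection.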

4. Using the Keywords and/or Versus the Operators &/|

In [22]:
bool(42), bool(0)

(True, False)

In [23]:
bool(42 and 0)

False

In [24]:
bool(42 or 0)

True

In [25]:
bin(42)

'0b101010'

In [26]:
bin(59)

'0b111011'

In [27]:
bin(42 & 59)

'0b101010'

In [28]:
bin(42 | 59)

'0b111011'

In [30]:
A = np.array([1, 0, 1, 0, 1, 0], dtype=bool)
B = np.array([1, 1, 1, 0, 1, 1], dtype=bool)
A | B

array([ True,  True,  True, False,  True,  True])

In [31]:
A or B

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

In [34]:
# x is an array of the integers 0 through 9
x = np.arange(10)
# Element-wise test: True where the value is greater than 4 and less than 8
(x > 4) & (x < 8)

array([False, False, False, False, False,  True,  True,  True, False,
       False])

In [35]:
# Trying to evaluate the truth or falsehood of the entire array will give the same ValueError
(x > 4) and (x < 8)

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
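The error message itself points to the fix: `and` and `or` need a single truth value for the whole object, so an array must first be reduced with `any()` or `all()`, while element-wise logic uses `&` and `|` (or the equivalent `np.logical_and` and `np.logical_or`). A short sketch of both cases, with illustrative variable names:

```python
import numpy as np

x = np.arange(10)

# Element-wise logic: use & (or np.logical_and), never the keyword "and"
in_range = (x > 4) & (x < 8)
same_result = np.logical_and(x > 4, x < 8)
print(np.array_equal(in_range, same_result))

# Whole-array truth value: reduce explicitly before using if/and/or
if in_range.any():
    print("at least one value lies strictly between 4 and 8")
if not in_range.all():
    print("not every value lies strictly between 4 and 8")

# The keyword form still fails, as shown above
try:
    (x > 4) and (x < 8)
except ValueError as err:
    message = str(err)
    print("ValueError:", message)
```

As a rule of thumb, `and` and `or` ask about the object as a whole, while `&` and `|` operate on each element.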