## Comparison, Masking and Boolean Logic

This notebook explains how to use NumPy to examine and manipulate array values based on a criterion. It covers the following:

1. How to use comparison operators as element-wise universal functions.
2. How to use boolean arrays with composite conditions.
3. How to use boolean arrays as masks to extract the required values.

In [1]:
import numpy as np

In [2]:
##defining an array
arr = np.array([10,20,30,40,50])

arr >= 20

array([False,  True,  True,  True,  True])

In [3]:
arr == 30

array([False, False,  True, False, False])

In [4]:
##equivalent ufunc for the > operator: np.greater(arr, 30) gives the same result as arr > 30
np.greater(arr, 30)

array([False, False, False,  True,  True])
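Each of the six comparison operators has a matching ufunc, not only `>`. The sketch below pairs each operator with its ufunc on the same `arr` used above and checks that both forms agree; the names `pairs` and `all_match` are only illustrative.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# Each comparison operator next to the ufunc that performs the same element-wise test
pairs = {
    "==": (arr == 30, np.equal(arr, 30)),
    "!=": (arr != 30, np.not_equal(arr, 30)),
    "<":  (arr < 30,  np.less(arr, 30)),
    "<=": (arr <= 30, np.less_equal(arr, 30)),
    ">":  (arr > 30,  np.greater(arr, 30)),
    ">=": (arr >= 30, np.greater_equal(arr, 30)),
}

# True only if every operator result equals its ufunc result
all_match = all(np.array_equal(op, uf) for op, uf in pairs.values())
print(all_match)
```

In every case the result is a Boolean array with the same shape as `arr`.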

### Boolean Arrays

In [6]:
##creating a 2D array of random integers with values < 10
arr = np.random.randint(10, size=(3,4))
arr

array([[8, 5, 9, 2],
       [1, 1, 3, 9],
       [4, 7, 4, 4]], dtype=int32)

In [7]:
##counting number of values that are greater than 5 using np.sum()
np.sum(arr > 5)

np.int64(4)

In [8]:
##counting number of values that are greater than 5 in each row
np.sum(arr > 5, axis=1)

array([2, 1, 1])

In [9]:
##checking if any value is greater than 5
np.any(arr > 5)

np.True_

In [10]:
##check if all values are > 5
np.all(arr > 5)

np.False_
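The counting and any/all checks above also accept an `axis` argument, and `np.count_nonzero` is an alternative to `np.sum` for counting `True` values. The following is a small sketch that hard-codes the 3x4 array printed earlier so the results are reproducible; the variable names are illustrative.

```python
import numpy as np

# Same values as the random 3x4 array shown above, fixed for reproducibility
arr = np.array([[8, 5, 9, 2],
                [1, 1, 3, 9],
                [4, 7, 4, 4]])

# count_nonzero treats True as nonzero, so it counts matching elements
total_above = np.count_nonzero(arr > 5)           # 4
per_column = np.count_nonzero(arr > 5, axis=0)    # one count per column

# any/all can also be reduced along an axis
any_per_row = np.any(arr > 5, axis=1)   # does each row contain a value > 5?
all_per_row = np.all(arr > 1, axis=1)   # is every value in each row > 1?

print(total_above, per_column, any_per_row, all_per_row)
```

Here `axis=0` collapses the rows and yields one result per column, while `axis=1` collapses the columns and yields one result per row.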

## Boolean Operators to handle multiple conditions

In [11]:
##math test scores of students in a class
test_scores = np.random.randint(100, size=(30))
test_scores

array([54, 50, 98, 81, 72,  5, 67, 48, 43, 72,  3, 86, 72, 15, 92, 38, 70,
       12, 75,  4, 69, 82, 91, 97, 92, 20, 87, 45, 55,  2], dtype=int32)

In [15]:
##number of students who scored more than 50 but less than 75

np.sum((test_scores > 50) & (test_scores < 75))

np.int64(8)

In [16]:
##extracting all the indices which meet the condition
np.where((test_scores > 50) & (test_scores < 75))

(array([ 0,  4,  6,  9, 12, 16, 20, 28]),)
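Besides `&`, conditions can be combined with `|` (or) and inverted with `~` (not). The sketch below reuses the scores printed above as a fixed array and also shows that Python's `and` keyword does not work on arrays, which is why the bitwise operators and the parentheses are needed. The variable names are illustrative.

```python
import numpy as np

# Same values as the test_scores array shown above, fixed for reproducibility
test_scores = np.array([54, 50, 98, 81, 72,  5, 67, 48, 43, 72,  3, 86, 72, 15, 92,
                        38, 70, 12, 75,  4, 69, 82, 91, 97, 92, 20, 87, 45, 55,  2])

in_range = (test_scores > 50) & (test_scores < 75)    # element-wise AND
outside = (test_scores <= 50) | (test_scores >= 75)   # element-wise OR

# ~ negates a Boolean array, so "outside" is the complement of "in_range"
complement_matches = np.array_equal(outside, ~in_range)
n_outside = np.sum(outside)

# The `and` keyword asks for the truth value of a whole array, which is ambiguous
try:
    (test_scores > 50) and (test_scores < 75)
except ValueError as exc:
    error_message = str(exc)

print(complement_matches, n_outside)
print(error_message)
```

The parentheses matter because `&` and `|` bind more tightly than the comparison operators; without them, `test_scores > 50 & test_scores < 75` would be parsed as a chained comparison around `50 & test_scores`.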

## Masking

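A Boolean array can be used as a mask: indexing an array with it returns a one-dimensional array containing only the elements where the mask is `True`.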

In [17]:
##extracting all the values where the mask is True (scores above 75)
mask = test_scores > 75

test_scores[mask]

array([98, 81, 86, 92, 82, 91, 97, 92, 87], dtype=int32)
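A mask can also combine several conditions, and it can appear on the left-hand side of an assignment to change only the selected values. The following is a minimal sketch using the same scores as a fixed array; capping scores at 90 is a made-up example, not something the notebook does.

```python
import numpy as np

# Same values as the test_scores array shown above, fixed for reproducibility
test_scores = np.array([54, 50, 98, 81, 72,  5, 67, 48, 43, 72,  3, 86, 72, 15, 92,
                        38, 70, 12, 75,  4, 69, 82, 91, 97, 92, 20, 87, 45, 55,  2])

# Extract values with a compound mask: scores strictly between 50 and 75
mid_band = test_scores[(test_scores > 50) & (test_scores < 75)]

# Assign through a mask: work on a copy so the original scores stay intact
capped = test_scores.copy()
capped[capped > 90] = 90   # hypothetical rule: cap every score above 90 at 90

print(mid_band)
print(capped.max(), test_scores.max())
```

Masked extraction returns a new array, whereas masked assignment modifies the indexed array in place, which is why the example copies `test_scores` first.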