### Boolean Masking

`Boolean Masking` means selecting values from an array based on a condition that returns True or False for each element.

In [2]:
import numpy as np

#### Boolean Masking with 1D Arrays

In [3]:
# create an array
arr = np.array([10, 20, 30, 40, 50])

#Apply a condition to the array
mask = arr > 25 # This will create a boolean mask where each element is True if the condition is met, otherwise False.

# Use the mask to filter
result = arr[mask]
result

array([30, 40, 50])

In [7]:
arr = np.arange(1, 11)
print(arr)

mask = arr > 5
result = arr[mask]
print(result)
print(mask)

[ 1  2  3  4  5  6  7  8  9 10]
[ 6  7  8  9 10]
[False False False False False  True  True  True  True  True]


#### Boolean Masking with 2D Arrays

In [6]:
arr2D = np.arange(1, 10).reshape(3, 3)
arr2D

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [9]:
# Get all elements greater than 5
mask2D = arr2D > 5
# Use the mask to filter
result2D = arr2D[mask2D]
print(result2D)
print(mask2D)

[6 7 8 9]
[[False False False]
 [False False  True]
 [ True  True  True]]


#### Common Use Cases

- Filtering arrays
- Replacing values
- Working with missing data in pandas
- Applying conditions to images and datasets
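As a minimal sketch of the missing-data use case, the example below uses plain NumPy rather than pandas. The `readings` array and its values are made up for illustration; `np.isnan` builds the mask and `~` inverts it.

```python
import numpy as np

# Hypothetical sensor readings with two missing values (NaN)
readings = np.array([1.5, np.nan, 3.0, np.nan, 4.5])

# True where a value is present, False where it is missing
valid = ~np.isnan(readings)

# Keep only the valid readings and average them
clean = readings[valid]
print(clean)         # [1.5 3.  4.5]
print(clean.mean())  # 3.0
```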

In [10]:
arr

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [11]:
#replacing values
arr[arr < 7] = 0
arr

array([ 0,  0,  0,  0,  0,  0,  7,  8,  9, 10])
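Assigning through a mask changes the array in place. If the original values are still needed, one option is `np.where`, which returns a new array instead. The sketch below rebuilds the same `arr` so it can run on its own; the name `replaced` is chosen for illustration.

```python
import numpy as np

arr = np.arange(1, 11)

# np.where(condition, a, b) takes values from a where the condition is True
# and from b elsewhere; arr itself is left untouched
replaced = np.where(arr < 7, 0, arr)

print(replaced)  # [ 0  0  0  0  0  0  7  8  9 10]
print(arr)       # [ 1  2  3  4  5  6  7  8  9 10]
```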

### Comparison Operators as ufuncs

In `Computation on NumPy Arrays: Universal Functions` we introduced ufuncs and focused in particular on arithmetic operators.

We saw that using +, -, *, /, and others on arrays leads to element-wise operations.

NumPy also implements `comparison operators` such as < (less than) and > (greater than) as element-wise ufuncs.

The result of these comparison operators is always an array with a Boolean data type.

In [12]:
x = np.arange(1, 6)
x

array([1, 2, 3, 4, 5])

In [13]:
x > 3 # This will return a boolean array indicating which elements are greater than 3

array([False, False, False,  True,  True])

In [14]:
x <= 3  # This will return a boolean array indicating which elements are less than or equal to 3

array([ True,  True,  True, False, False])

In [None]:
x >= 3 # This will return a boolean array indicating which elements are greater than or equal to 3

array([False, False,  True,  True,  True])

In [15]:
x != 3 # This will return a boolean array indicating which elements are not equal to 3

array([ True,  True, False,  True,  True])

In [16]:
x == 3 # This will return a boolean array indicating which elements are equal to 3

array([False, False,  True, False, False])

It is also possible to do an element-wise comparison of two arrays, and to include compound expressions:

In [15]:
x

array([1, 2, 3, 4, 5])

In [17]:
2 * x # This will return an array with each element multiplied by 2

array([ 2,  4,  6,  8, 10])

In [18]:
x ** 2  # This will return an array with each element squared

array([ 1,  4,  9, 16, 25])

In [21]:
(2 * x)  == (x ** 2)

array([False,  True, False, False, False])

As in the case of arithmetic operators, the comparison operators are implemented as ufuncs in NumPy; for example, when you write x < 3, internally NumPy uses np.less(x, 3).
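A quick way to check this equivalence is to compare the two spellings directly. This is only a small sketch; the variable names are chosen for illustration.

```python
import numpy as np

x = np.arange(1, 6)

via_operator = x < 3        # operator form
via_ufunc = np.less(x, 3)   # explicit ufunc call

print(via_operator)                             # [ True  True False False False]
print(np.array_equal(via_operator, via_ufunc))  # True
```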

#### Summary of the comparison operators and their equivalent ufuncs

| Operator | Equivalent ufunc     |       | Operator | Equivalent ufunc       |
|----------|----------------------|-------|----------|------------------------|
| `==`     | `np.equal`           |       | `!=`     | `np.not_equal`         |
| `<`      | `np.less`            |       | `<=`     | `np.less_equal`        |
| `>`      | `np.greater`         |       | `>=`     | `np.greater_equal`     |

Just as in the case of arithmetic ufuncs, these will work on arrays of any size and shape. Here is a two-dimensional example:

In [24]:
import random

In [38]:
np.random.seed(42)
# Generate 12 random integers from 1 to 12 (the upper bound 13 is exclusive)
x = np.random.randint(1, 13, size=12).reshape(3, 4)
print(x)

[[ 7  4 11  8]
 [ 5  7 10  3]
 [ 7 11 11  8]]


In [39]:
x > 6

array([[ True, False,  True,  True],
       [False,  True,  True, False],
       [ True,  True,  True,  True]])

#### Working with Boolean Arrays

In [40]:
print(x)

[[ 7  4 11  8]
 [ 5  7 10  3]
 [ 7 11 11  8]]


#### Counting entries

To count the number of True entries in a Boolean array, we can use `np.count_nonzero`:

In [42]:
# how many values less than 7?
np.count_nonzero(x < 7)

np.int64(3)

In [43]:
x

array([[ 7,  4, 11,  8],
       [ 5,  7, 10,  3],
       [ 7, 11, 11,  8]], dtype=int32)

Another way to get this information is to use `np.sum`; in this case, False is interpreted as 0 and True is interpreted as 1:

In [44]:
np.sum(x < 7)

np.int64(3)

This summation can be done along rows or columns as well:

In [45]:
x

array([[ 7,  4, 11,  8],
       [ 5,  7, 10,  3],
       [ 7, 11, 11,  8]], dtype=int32)

In [46]:
# how many values less than 6 in each row?
np.sum(x < 6, axis=1)

array([1, 2, 0])

In [47]:
# how many values less than 6 in each column?
np.sum(x < 6, axis=0)

array([1, 1, 0, 1])
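The sketch below puts the counting approaches side by side on a hard-coded copy of the array shown above (so it does not depend on the random seed). It also uses `np.mean` on the mask, which gives the fraction of True entries because True counts as 1.

```python
import numpy as np

x = np.array([[7, 4, 11, 8],
              [5, 7, 10, 3],
              [7, 11, 11, 8]])

mask = x < 6

# Two equivalent ways to count True entries per row
per_row_sum = np.sum(mask, axis=1)
per_row_count = np.count_nonzero(mask, axis=1)
print(per_row_sum)    # [1 2 0]
print(per_row_count)  # [1 2 0]

# Fraction of all entries that satisfy the condition (3 out of 12)
fraction = np.mean(mask)
print(fraction)       # 0.25
```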

If we’re interested in quickly checking whether any or all of the values are True, we can use `np.any` or `np.all`:

In [48]:
x

array([[ 7,  4, 11,  8],
       [ 5,  7, 10,  3],
       [ 7, 11, 11,  8]], dtype=int32)

In [49]:
# are there any values greater than 8?
np.any(x > 8)

np.True_

In [50]:
# are there any values less than zero?
np.any(x < 0)

np.False_

In [51]:
x

array([[ 7,  4, 11,  8],
       [ 5,  7, 10,  3],
       [ 7, 11, 11,  8]], dtype=int32)

In [52]:
# are all values less than 10?
np.all(x < 10)

np.False_

In [48]:
# are all values equal to 6?
np.all(x == 6)

np.False_

`np.all` and `np.any` can be used along particular axes as well.

In [53]:
x

array([[ 7,  4, 11,  8],
       [ 5,  7, 10,  3],
       [ 7, 11, 11,  8]], dtype=int32)

In [54]:
# are all values in each row less than 8?
np.all(x < 8, axis=1)

array([False, False, False])
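For completeness, here is a small sketch of `np.any` along rows and `np.all` along columns, again on a hard-coded copy of the array above. The thresholds are arbitrary and chosen only to give a mixed result.

```python
import numpy as np

x = np.array([[7, 4, 11, 8],
              [5, 7, 10, 3],
              [7, 11, 11, 8]])

# does each row contain at least one value greater than 10?
any_per_row = np.any(x > 10, axis=1)
print(any_per_row)  # [ True False  True]

# is every value in each column less than 8?
all_per_col = np.all(x < 8, axis=0)
print(all_per_col)  # [ True False False False]
```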

#### Boolean Operators

`Boolean operators` are used to perform logical operations on values. 

They work with Boolean values (True and False) and are often used in conditional statements and masking operations.

In [55]:
y = np.array([10, 15, 20, 25, 30])
y

array([10, 15, 20, 25, 30])

In [56]:
# apply boolean conditions
y > 15

array([False, False,  True,  True,  True])

In [59]:
# condition 2
y < 30

array([ True,  True,  True,  True, False])

In [57]:
y

array([10, 15, 20, 25, 30])

#### 1. AND (&)

- Returns True if both conditions are True.
- For NumPy arrays, use & (with parentheses around conditions).

In [58]:
# Use Boolean Operators
# Select values greater than 15 AND less than 30
(y > 15) & (y < 30)
y[(y > 15) & (y < 30)]

array([20, 25])
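The parentheses and the `&` symbol both matter. The sketch below shows what happens with Python's `and` keyword instead: it asks for a single truth value of a whole array, which NumPy refuses to guess. The flag `and_failed` is a name introduced only for this illustration.

```python
import numpy as np

y = np.array([10, 15, 20, 25, 30])

# Correct: each comparison is parenthesized, then combined element-wise
mask = (y > 15) & (y < 30)
print(y[mask])  # [20 25]

# `and` needs one True/False for the whole array, so NumPy raises ValueError
and_failed = False
try:
    (y > 15) and (y < 30)
except ValueError:
    and_failed = True
print(and_failed)  # True

# Without parentheses, & binds tighter than the comparisons, so
# y > 15 & y < 30 is parsed as y > (15 & y) < 30, which also fails
```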

#### 2. OR (|) — Select values less than 15 OR greater than 25

In [62]:
(y < 15) | (y > 25)

# y[(y < 15) | (y > 25)]

array([ True, False, False, False,  True])

####  3. NOT (~) — Invert the condition y > 20

In [60]:
y

array([10, 15, 20, 25, 30])

In [59]:
~(y > 20)

y[~(y > 20)]

array([10, 15, 20])

#### Bitwise Boolean operators and their equivalent ufuncs

| Operator | Equivalent ufunc     |         | Operator | Equivalent ufunc      |
|----------|----------------------|---------|----------|------------------------|
| `&`      | `np.bitwise_and`     |         | `\|`     | `np.bitwise_or`       |
| `^`      | `np.bitwise_xor`     |         | `~`      | `np.bitwise_not`      |
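The following sketch checks each operator in the table against its ufunc on two small Boolean arrays. The arrays `a` and `b` are made up so that every True/False combination appears once.

```python
import numpy as np

a = np.array([True, True, False, False])
b = np.array([True, False, True, False])

print(a & b)  # [ True False False False]
print(a | b)  # [ True  True  True False]
print(a ^ b)  # [False  True  True False]
print(~a)     # [False False  True  True]

# Each operator gives the same result as its ufunc
same = (
    np.array_equal(a & b, np.bitwise_and(a, b))
    and np.array_equal(a | b, np.bitwise_or(a, b))
    and np.array_equal(a ^ b, np.bitwise_xor(a, b))
    and np.array_equal(~a, np.bitwise_not(a))
)
print(same)  # True
```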


### Boolean Arrays as Masks

We can use Boolean arrays as masks, to select particular subsets of the data themselves.

In [64]:
x

array([[ 7,  4, 11,  8],
       [ 5,  7, 10,  3],
       [ 7, 11, 11,  8]], dtype=int32)

In [65]:
# using a Boolean array for this condition
x > 5

array([[ True, False,  True,  True],
       [False,  True,  True, False],
       [ True,  True,  True,  True]])

In [67]:
x

array([[ 7,  4, 11,  8],
       [ 5,  7, 10,  3],
       [ 7, 11, 11,  8]], dtype=int32)

To select these values from the array, we can index with the Boolean array itself; this is known as a masking operation:

In [66]:
x[x > 5]  # This will return an array with elements greater than 5

array([ 7, 11,  8,  7, 10,  7, 11, 11,  8], dtype=int32)
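Note that the result above is one-dimensional even though `x` is 2D. The sketch below makes the shapes explicit and also shows that assigning through the same mask keeps the original shape. It uses a hard-coded copy of the array; `capped` is a name introduced for illustration.

```python
import numpy as np

x = np.array([[7, 4, 11, 8],
              [5, 7, 10, 3],
              [7, 11, 11, 8]])

mask = x > 5
selected = x[mask]

print(mask.shape)      # (3, 4)
print(selected.shape)  # (9,)

# Assignment through the mask modifies only the selected positions
capped = x.copy()
capped[mask] = 5
print(capped)
# [[5 4 5 5]
#  [5 5 5 3]
#  [5 5 5 5]]
```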

### Fancy Indexing

Fancy indexing is a way to access elements from a NumPy array using integer arrays or `lists of indices`, instead of just using slices (:) or single numbers.

It’s `fancier` than regular indexing because it lets you select multiple arbitrary elements at once.

#### Indexing with a List of Indices

In [68]:
F = np.random.randint(1, 50, size=(5,))
F

array([36, 40, 24,  3, 22], dtype=int32)

In [71]:
# select specific elements at index 0 and 2
# (F[[0, 2, 4]] would also include index 4)
F[[0, 2]]

array([36, 24], dtype=int32)
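Two properties of fancy indexing are worth seeing in a small sketch: the result takes the shape of the index array rather than the shape of the data, and it is a copy rather than a view. The `data` and `idx` arrays here are made up for illustration.

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50])

# A 2D index array produces a 2D result
idx = np.array([[0, 4],
                [1, 3]])
picked = data[idx]
print(picked)
# [[10 50]
#  [20 40]]

# Changing the result does not change the original array
picked[0, 0] = -1
print(data[0])  # 10
```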

#### Fancy Indexing with 2D Arrays

In [72]:
F2 = np.arange(1, 13).reshape(3, 4)
F2

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [73]:
# Select elements at positions (0,1), (1,2), (2,3)
# (F2[[0, 1, 2], [0, 1, 2]] would select the diagonal (0,0), (1,1), (2,2))
F2[[0, 1, 2], [1, 2, 3]]

array([ 2,  7, 12])

In [78]:
F2[:, [1, 2]]

array([[ 2,  3],
       [ 6,  7],
       [10, 11]])

#### Fancy Indexing for Rectangular Blocks (slices are usually cleaner)

In [79]:
F2

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [86]:
F2[1:, [1, 2]]

array([[ 6,  7],
       [10, 11]])

In [88]:
# same values as the slice F2[1:3, 0:2]
row = np.array([1,2])
col = np.array([0,1])
F2[np.ix_(row, col)] 

array([[ 5,  6],
       [ 9, 10]])

`np.ix_()` builds an open mesh of index arrays, so that every selected row is paired with every selected column.
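To see what `np.ix_()` returns, the sketch below unpacks its output and inspects the shapes. The names `row_idx` and `col_idx` are introduced for illustration.

```python
import numpy as np

F2 = np.arange(1, 13).reshape(3, 4)
rows = np.array([1, 2])
cols = np.array([0, 1])

# np.ix_ returns one index array per axis, shaped so they broadcast together
row_idx, col_idx = np.ix_(rows, cols)
print(row_idx.shape)  # (2, 1)
print(col_idx.shape)  # (1, 2)

block = F2[row_idx, col_idx]
print(block)
# [[ 5  6]
#  [ 9 10]]

# The same block written with an explicit column vector of row indices
same_block = F2[rows[:, np.newaxis], cols]
print(np.array_equal(block, same_block))  # True
```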

In [89]:
x

array([[ 7,  4, 11,  8],
       [ 5,  7, 10,  3],
       [ 7, 11, 11,  8]], dtype=int32)

In [104]:
x[[0, 1], [0, 3]]

array([7, 3], dtype=int32)

In [97]:
row = np.array([0, 1, 2])
col = np.array([0, 1, 2])
F2[row, col]

array([ 1,  6, 11])

The pairing of indices in fancy indexing follows all the broadcasting rules that were mentioned in `Computation on Arrays: Broadcasting`.

So, for example, if we combine a column vector and a row vector within the indices, we get a two-dimensional result:

In [98]:
F2[row[:, np.newaxis], col]  # This will return a 2D array with the specified indices

array([[ 1,  2,  3],
       [ 5,  6,  7],
       [ 9, 10, 11]])

Here, each row value is matched with each column value, exactly as in the broadcasting of arithmetic operations.
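One way to make the pairing concrete is to broadcast the index arrays explicitly with `np.broadcast_arrays` and then verify the rule `result[i, j] == F2[row[i], col[j]]` element by element. This is a sketch for checking the idea, not something needed in everyday code.

```python
import numpy as np

F2 = np.arange(1, 13).reshape(3, 4)
row = np.array([0, 1, 2])
col = np.array([0, 1, 2])

# A (3, 1) column of row indices and a (3,) row of column indices
# broadcast to a common (3, 3) shape
r, c = np.broadcast_arrays(row[:, np.newaxis], col)
print(r.shape, c.shape)  # (3, 3) (3, 3)

result = F2[row[:, np.newaxis], col]

# Every output element pairs one row index with one column index
pairing_holds = all(
    result[i, j] == F2[row[i], col[j]]
    for i in range(3)
    for j in range(3)
)
print(pairing_holds)  # True
```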

### Combined Indexing

Fancy indexing can be combined with the other indexing schemes.

In [105]:
print(F2)

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]


#### We can combine fancy and simple indices:

In [106]:
F2[2, [3, 1, 0]]

array([12, 10,  9])

In [107]:
F2

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [108]:
F2[1:, [0, 1]]

array([[ 5,  6],
       [ 9, 10]])

In [120]:
F2

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

#### Modifying Values with Fancy Indexing

Just as fancy indexing can be used to access parts of an array, it can also be used to modify parts of an array.

For example, imagine we have an array of indices and we’d like to set the corresponding items in an array to some value:

In [114]:
x = np.arange(10)
i = np.array([2, 1, 8, 4])
x[i] = 99
print(x)

[ 0 99 99  3 99  5  6  7 99  9]
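One caveat when modifying values this way is that repeated indices do not accumulate. The sketch below uses a made-up index array with duplicates and contrasts `x[i] += 1` with `np.add.at`, which applies the operation once per occurrence.

```python
import numpy as np

i = np.array([0, 0, 2, 2, 2])

# With repeated indices, += is applied only once per position
x = np.zeros(5, dtype=int)
x[i] += 1
print(x)       # [1 0 1 0 0]

# np.add.at performs an unbuffered add, once for every occurrence
counts = np.zeros(5, dtype=int)
np.add.at(counts, i, 1)
print(counts)  # [2 0 3 0 0]
```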


### Sorting Arrays

NumPy provides several ways to sort arrays, whether 1D, 2D, or more. Sorting can be done in-place (modifies the original) or out-of-place (returns a sorted copy).

#### 1. Sorting a 1D Array

In [115]:
arr = np.random.randint(1, 10, size = (9,))
arr

array([5, 2, 8, 6, 2, 5, 1, 6, 9], dtype=int32)

In [116]:
# sorting a 1D array
sorted_arr = np.sort(arr)
sorted_arr

array([1, 2, 2, 5, 5, 6, 6, 8, 9], dtype=int32)

- `np.sort()` does not modify the original array.
- Use `arr.sort()` if you want to sort in-place.
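The difference between the two can be sketched on a small made-up array. Note that the in-place method returns `None`, so its result should not be assigned back to the array.

```python
import numpy as np

arr = np.array([5, 2, 8, 6, 1])

# np.sort returns a sorted copy; arr keeps its original order
sorted_copy = np.sort(arr)
print(sorted_copy)  # [1 2 5 6 8]
print(arr)          # [5 2 8 6 1]

# arr.sort() sorts in place and returns None
returned = arr.sort()
print(returned)     # None
print(arr)          # [1 2 5 6 8]
```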

#### 2. Sorting a 2D Array
By default, `np.sort()` sorts along the last axis (axis=-1), which for a 2D array means each row is sorted:

In [117]:
x = np.array([[8, 2, 5],
              [1, 9, 3]])

np.sort(x)

array([[2, 5, 8],
       [1, 3, 9]])

#### 3. Sorting by Columns or Rows
You can control the axis with axis=:
- axis=1: sort each row
- axis=0: sort each column

In [119]:
x

array([[8, 2, 5],
       [1, 9, 3]])

In [118]:
# Sort each column (axis=0)
np.sort(x, axis=0)

array([[1, 2, 3],
       [8, 9, 5]])

#### 4. Get the Sorted Indices – np.argsort()
Sometimes you want the indices that would sort the array:

In [120]:
arr = np.array([40, 10, 20])
indices = np.argsort(arr)
print(indices)

[1 2 0]


You can use these indices to reorder the array itself, or any other array of the same length:

In [121]:
arr[indices]

array([10, 20, 40])
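A common use of `np.argsort()` is keeping two parallel arrays in step. In this sketch the `names` and `scores` arrays are hypothetical data invented for the example.

```python
import numpy as np

# Hypothetical parallel arrays: names[k] belongs to scores[k]
names = np.array(["Ann", "Ben", "Cy"])
scores = np.array([40, 10, 20])

# Indices that would sort scores in ascending order
order = np.argsort(scores)
print(order)          # [1 2 0]

# Apply the same order to both arrays so the pairs stay aligned
print(scores[order])  # [10 20 40]
print(names[order])   # ['Ben' 'Cy' 'Ann']
```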

#### 5. Sort in Descending Order
`np.sort()` has no option for descending order, but you can reverse the sorted result:

In [122]:
arr = np.array([4, 1, 5])
sorted_desc = np.sort(arr)[::-1]
print(sorted_desc)

[5 4 1]
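The same reversing idea can be extended to 2D arrays and to `np.argsort()`. The sketch below shows one possible way to do each; the variable names are chosen for illustration.

```python
import numpy as np

# Descending sort of each row: sort ascending, then reverse the columns
x = np.array([[8, 2, 5],
              [1, 9, 3]])
desc_rows = np.sort(x, axis=1)[:, ::-1]
print(desc_rows)
# [[8 5 2]
#  [9 3 1]]

# Indices for descending order: reverse the ascending argsort
arr = np.array([4, 1, 5])
desc_order = np.argsort(arr)[::-1]
print(desc_order)       # [2 0 1]
print(arr[desc_order])  # [5 4 1]
```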
