# NumPy Array Basics - Boolean Selection

In [1]:
import sys
print(sys.version)
import numpy as np
print(np.__version__)

2.7.11 |Anaconda 2.2.0 (x86_64)| (default, Dec  6 2015, 18:57:58) 
[GCC 4.2.1 (Apple Inc. build 5577)]
1.9.2


In [2]:
npa = np.arange(20)

In [3]:
npa

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19])

In [4]:
[x for x in npa if x % 2 == 0]

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

In [5]:
list(filter(lambda x: x % 2 ==0, npa))

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

In [6]:
npa % 2 == 0

array([ True, False,  True, False,  True, False,  True, False,  True,
       False,  True, False,  True, False,  True, False,  True, False,
        True, False], dtype=bool)

The notation is interesting, but the result is not so different from what we had before: for each value in the array, we get a boolean saying whether it satisfies the condition.

So how might we complete the filter? Easy: we use that boolean array as an index into the original array, a bit like querying a dictionary, and get back only the values where it is True.

In [8]:
filter(lambda x: x % 2 ==0, npa)

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

In [9]:
npa[npa % 2 == 0]

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])
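To make the two steps explicit, here is a minimal sketch that stores the boolean array in a variable before using it as an index. The names `mask` and `evens` are chosen here for illustration and do not appear in the notebook above.

```python
import numpy as np

npa = np.arange(20)

# Step 1: the comparison produces a boolean array with the same shape as npa.
mask = npa % 2 == 0

# Step 2: indexing with the mask keeps only the positions where it is True.
evens = npa[mask]

print(mask.dtype)
print(evens)
```

Writing `npa[npa % 2 == 0]` does the same thing in one line; the intermediate variable is only there to show what is being passed as the index.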

Now you might ask why things are done this way, and that brings us to the efficiency of the operation. For datasets of reasonable size, boolean selection is typically much faster than filtering in pure Python, often by an order of magnitude or more. Let me show you very quickly before we move on to other boolean selections.

In [10]:
np2 = np.arange(20000)

In [11]:
%timeit [x for x in np2 if x % 2 == 0]

100 loops, best of 3: 4.5 ms per loop


In [12]:

%timeit np2[np2 % 2 == 0]

1000 loops, best of 3: 283 µs per loop


We can see that it is roughly 16 times faster than our original list comprehension (4.5 ms versus 283 µs), about one order of magnitude.
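The `%timeit` magic only works inside IPython. As a rough sketch, the same comparison can be run as a plain script with the standard-library `timeit` module. The helper names `with_comprehension` and `with_mask` and the loop count are assumptions made for this example, and the timings will differ from machine to machine.

```python
import timeit

import numpy as np

np2 = np.arange(20000)


def with_comprehension():
    # Pure-Python filtering: one modulo and comparison per element.
    return [x for x in np2 if x % 2 == 0]


def with_mask():
    # Vectorized filtering: build a boolean mask, then index with it.
    return np2[np2 % 2 == 0]


# Both approaches should select the same values.
assert with_comprehension() == with_mask().tolist()

loops = 20
t_list = timeit.timeit(with_comprehension, number=loops) / loops
t_mask = timeit.timeit(with_mask, number=loops) / loops

print("list comprehension: %.1f us per loop" % (t_list * 1e6))
print("boolean selection:  %.1f us per loop" % (t_mask * 1e6))
```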

In [13]:
npa

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19])

Now here's an exercise: try to do the same thing, but get all the numbers from that array that are greater than 10. Go ahead and pause and try it out.

In [14]:
npa[npa > 10]

array([11, 12, 13, 14, 15, 16, 17, 18, 19])

On a final note, for boolean selection to occur, you just have to pass in an array of boolean values with the same length as the original array.

In [16]:

np3 = np.array([True for x in range(20)])

In [17]:
npa[np3]

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19])
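An all-True mask returns every element, so it does not show much selection. The sketch below uses a hand-built mask with only two True entries, then tries a mask of the wrong length. The names `mask`, `picked` and `short_mask` are hypothetical, and the `IndexError` is the behavior of recent NumPy releases; older versions such as the 1.9.2 used above may behave differently.

```python
import numpy as np

npa = np.arange(20)

# Hand-built mask: True at the first and last positions only.
mask = np.zeros(20, dtype=bool)
mask[0] = True
mask[-1] = True

picked = npa[mask]
print(picked)  # [ 0 19]

# A mask whose length does not match the array is rejected by recent NumPy.
short_mask = np.ones(10, dtype=bool)
try:
    npa[short_mask]
except IndexError as exc:
    print("mask length mismatch: %s" % exc)
```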