# Masking invalid entries in numpy

## Boolean masks

To select certain parts of an array, or to get rid of NaN values in your arrays, we can use boolean masks. A boolean mask is just an array of True and False values, with the same shape as the array it is applied to, which we can use to select certain values from that array.

In [1]:
import numpy 

We'll make ourselves two test arrays, which contain NaN (Not a Number) values:

In [2]:
testarr = numpy.array([1,2,3,4,numpy.nan,6,7])

In [3]:
testarr #Print out the array

array([ 1.,  2.,  3.,  4., nan,  6.,  7.])

In [4]:
testarr2 = numpy.array([2,4,6,numpy.nan,10,12,14])

In [5]:
testarr2 #Print out the array

array([ 2.,  4.,  6., nan, 10., 12., 14.])

### Creating boolean masks
We can create boolean masks of True and False values, for example selecting out values in testarr which are greater than 3, in the following way. 

NB: depending on your numpy version, you might get a warning message saying 'invalid value encountered in greater'. Don't worry about this; it's just because of the NaN value in there. As NaN is not a number, there's no sense in asking Python whether it's greater or less than 3. It's like asking whether the letter 'a' is greater or less than 3: it doesn't make sense. The mask is still created, as you can see by running the next cell.

In [6]:
mask = (testarr > 3)


We can print out this 'mask' array. It's an array of True and False values, with True for the elements of testarr which meet our condition of being > 3, and False for those that don't. Any ordering comparison with NaN (>, <, >=, <=) returns False, so NaN elements come out as False whether your condition is greater than or less than a value.

In [7]:
print(mask)

[False False False  True False  True  True]
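As a small sketch of how NaN behaves in comparisons, the cell below uses a made-up array called `values` (not part of the notebook above). The `numpy.errstate` context manager is only there to silence the 'invalid value' warning in case your numpy version emits it.

```python
import numpy

# Made-up example array containing one NaN
values = numpy.array([1.0, numpy.nan, 5.0])

# Silence the "invalid value" warning that some numpy versions emit
with numpy.errstate(invalid="ignore"):
    greater = values > 3
    less = values < 3
    not_equal = values != 3

print(greater)    # [False False  True]
print(less)       # [ True False False]
print(not_equal)  # [ True  True  True]
```

Note that the NaN element is False for both > and <, but True for !=, so a 'not equal' condition will not filter NaN values out for you.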


### Applying boolean masks

So we now have a boolean mask - what next?

Boolean masks are really useful for selecting out certain parts of an array. We do this by putting the mask in square brackets after the array, like in the cell below, i.e. array[mask].

Run the next cell, and you'll see that our mask has worked: only the values within testarr which are > 3 have been returned.

In [8]:
testarr[mask]

array([4., 6., 7.])

We can apply the same mask to different arrays, which will come in handy when you have flux and time arrays from your data. If you have NaN values in your flux array, you want to remove those elements along with the corresponding elements in the time array; otherwise you'll end up with time values that have no assigned flux value.

We do this in exactly the same fashion as before. The mask selects the elements of testarr2 at the positions where testarr is > 3, i.e. the 4th, 6th and 7th elements of the array. Note that the NaN in testarr2 is returned here, because the mask was built from testarr only and knows nothing about the values in testarr2.

In [9]:
testarr2[mask]

array([nan, 12., 14.])
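To see which positions a mask picks out, one option is `numpy.flatnonzero`, which returns the indices of the True entries. This is just an illustrative sketch; the name `positions` is not used elsewhere in the notebook, and remember that Python counts from zero, so the 4th, 6th and 7th elements have indices 3, 5 and 6.

```python
import numpy

testarr = numpy.array([1, 2, 3, 4, numpy.nan, 6, 7])
testarr2 = numpy.array([2, 4, 6, numpy.nan, 10, 12, 14])

# Silence the NaN comparison warning if this numpy version emits it
with numpy.errstate(invalid="ignore"):
    mask = testarr > 3

# Zero-based indices of the True entries in the mask
positions = numpy.flatnonzero(mask)
print(positions)  # [3 5 6]

# Indexing with these positions selects the same elements as testarr2[mask]
print(testarr2[positions])
```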

## Getting rid of NaN values

To get rid of NaN values in your array, numpy provides some handy functions:

$\textbf{numpy.isnan()}$ - If you give this function either a value, or an array, it will tell you which elements are NaN. 

For example:

In [10]:
print(numpy.isnan(3)) #returns False
print(numpy.isnan(numpy.nan)) #returns True
print(numpy.isnan([1,2,numpy.nan, 3, numpy.nan, 4])) #returns an array of True and False

False
True
[False False  True False  True False]


$\textbf{numpy.isfinite()}$ - This function basically does the opposite, but it'll also catch any infinite values. So for any integer or float values in your array, it'll return True, and for any NaN or infinite values, it'll return False.

For example:

In [11]:
print(numpy.isfinite(4)) #returns True
print(numpy.isfinite(numpy.inf))#returns False
print(numpy.isfinite(numpy.nan)) #returns False
print(numpy.isfinite([1,2,numpy.nan, 3, numpy.inf, 5])) #returns an array of True and False

True
False
False
[ True  True False  True False  True]
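The difference between the two functions only shows up when an array contains infinite values. The sketch below uses a made-up array `values` to compare an inverted numpy.isnan mask (the ~ operator flips True and False) with a numpy.isfinite mask.

```python
import numpy

# Made-up example array with a NaN and both signs of infinity
values = numpy.array([1.0, numpy.nan, numpy.inf, -numpy.inf])

not_nan = ~numpy.isnan(values)    # True wherever the value is not NaN
finite = numpy.isfinite(values)   # True only for ordinary finite numbers

print(not_nan)         # [ True False  True  True]
print(finite)          # [ True False False False]
print(values[finite])  # [1.]
```

So if your data might contain infinities as well as NaNs, the numpy.isfinite mask removes both, while the inverted numpy.isnan mask lets the infinities through.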


## Combining different masks

Masks can be combined in a few different ways, using the AND operator (&), the OR operator (|) and the bitwise inverse operator (~). They work as follows:

The AND operator checks if a value is True in both arrays:
 - If an element is True in one array, and the corresponding element in the other array is also True, then the AND operator returns True.
 - If an element is True in one array, but the corresponding element in the other array is False, then the operator returns False. 
 - If the elements in both arrays are False, then the operator returns False. 
 
 e.g. [True, False, False] & [True, True, False] = [True, False, False]
 
------------------------------------------------------------------

The OR operator will return True if the element in either array is True:

e.g. [True, False, False] | [True, True, False] = [True, True, False]

--------------------------------------------------------------------

The bitwise inverse operator goes through each element and swaps it to the inverse - i.e. any True values become False, and any False values become True:

e.g. ~[True, False, False] = [False, True, True]. 

This is handy for removing NaNs if you wanted to use numpy.isnan.
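The examples above are written as plain lists for readability, but in code the operators need numpy arrays (plain Python lists raise a TypeError with & , | and ~). Here is a minimal sketch using made-up arrays `a`, `b` and `x`:

```python
import numpy

a = numpy.array([True, False, False])
b = numpy.array([True, True, False])

print(a & b)  # [ True False False]
print(a | b)  # [ True  True False]
print(~a)     # [False  True  True]

# When combining conditions directly, wrap each one in parentheses,
# because & and | bind more tightly than > and <
x = numpy.array([0, 2, 5])
in_range = (x > 1) & (x < 4)
print(in_range)  # [False  True False]
```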

------------------------------------------------------------------

So, if we wanted to remove NaNs from both testarr and testarr2, we could do this in two ways:
    
First, we'll use numpy.isfinite:

In [12]:
# Create a mask for each array
mask_testarr = numpy.isfinite(testarr)
mask_testarr2 = numpy.isfinite(testarr2)

#Combine the masks - we want to use AND as we only want the values which are finite in both testarr and testarr2
doublemask = mask_testarr & mask_testarr2

#Print it out to check it looks right, and then test it out on the test arrays:
print(doublemask)
print(testarr[doublemask])
print(testarr2[doublemask])

[ True  True  True False False  True  True]
[1. 2. 3. 6. 7.]
[ 2.  4.  6. 12. 14.]


It worked! Now we only have the values for which the element in testarr $\textbf{and}$ the element in testarr2 are finite (i.e. not NaN).

We could also use numpy.isnan to select out the values which are NaN, and then the bitwise inverse to flip the mask and select out the not-NaN values:

In [13]:
# Create masks which select out the values which are NaN
mask_testarr= numpy.isnan(testarr)
mask_testarr2 = numpy.isnan(testarr2)

# Combine them - this time we want to use the OR operator, so we have True values 
# for if the element is NaN in either array:
doublemask = mask_testarr | mask_testarr2
print(doublemask) #This has True values for the positions which are NaN in either array

#Now we want to invert it:
doublemask = ~doublemask
print(doublemask) ##This should look the same as the doublemask we had before

# And apply it to the test arrays to check it works:
print(testarr[doublemask])
print(testarr2[doublemask])

[False False False  True  True False False]
[ True  True  True False False  True  True]
[1. 2. 3. 6. 7.]
[ 2.  4.  6. 12. 14.]


That works too! As a final note, we can create the masks in a bit of a neater way, using fewer lines of code, like this: 

In [14]:
# Using numpy.isfinite:
nicemask = numpy.isfinite(testarr) & numpy.isfinite(testarr2)
print(nicemask)
print(testarr[nicemask])
print(testarr2[nicemask])

[ True  True  True False False  True  True]
[1. 2. 3. 6. 7.]
[ 2.  4.  6. 12. 14.]


In [15]:
## Using numpy.isnan:
nicemask = ~numpy.isnan(testarr) & ~numpy.isnan(testarr2)
#Note that here we invert before we combine the masks, so we use the AND operator. 
print(nicemask)
print(testarr[nicemask])
print(testarr2[nicemask])

[ True  True  True False False  True  True]
[1. 2. 3. 6. 7.]
[ 2.  4.  6. 12. 14.]


You can use either numpy.isnan or numpy.isfinite for your work. Personally I'd recommend numpy.isfinite, because it's less confusing: you don't have to worry about inverting the mask or whether you want AND or OR. It just selects the values you do want, and you always combine the masks with AND.
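If you have more than two arrays to clean at once, the same idea can be wrapped up in a small helper. This is only a sketch: the function name `finite_in_all` and the `time`, `flux` and `flux_err` arrays are made up for illustration, and it assumes all the arrays have the same shape.

```python
import numpy

def finite_in_all(*arrays):
    """Return a mask that is True where every input array is finite."""
    masks = [numpy.isfinite(arr) for arr in arrays]
    # Combine all the masks with AND, element by element
    return numpy.logical_and.reduce(masks)

# Hypothetical data with one NaN and one infinite value
time = numpy.array([0.0, 1.0, 2.0, 3.0])
flux = numpy.array([1.0, numpy.nan, 3.0, 4.0])
flux_err = numpy.array([0.1, 0.1, numpy.inf, 0.1])

mask = finite_in_all(time, flux, flux_err)
print(mask)        # [ True False False  True]
print(time[mask])  # [0. 3.]
print(flux[mask])  # [1. 4.]
```

Applying the one combined mask to every array keeps them all the same length, so each remaining time value still lines up with its flux value.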