[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/alikn/coding_for_analytics/blob/main/numpy/2_numpy_part_2.ipynb)

The NumPy notebooks are heavily based on the following resources:
- [*Python Data Science Handbook*](https://jakevdp.github.io/PythonDataScienceHandbook/), Jake VanderPlas
- [*NumPy Illustrated: The Visual Guide to NumPy*](https://betterprogramming.pub/numpy-illustrated-the-visual-guide-to-numpy-3b1d4976de1d), Lev Maximov


In [2]:
import numpy as np

# Broadcasting
NumPy makes operations on arrays of the same size easy. These binary operations are performed on an element-by-element basis:

In [3]:
a = np.array([0, 1, 2])
b = np.array([5, 5, 5])
a + b

array([5, 6, 7])

However, many operations of interest are not between arrays of the same size, for example adding a number to every element of an array. NumPy makes that easy too.

In [4]:
a + 4

array([4, 5, 6])

One way to think about this: when the shapes do not match (in the example above, the number 4 is zero-dimensional), the smaller operand is stretched to match the larger one (4 is treated as `[4, 4, 4]`), and then the operation is performed element by element.
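
The stretching described above can be made explicit. The following is a minimal sketch, assuming `np.broadcast_to` as a way to visualize the stretched operand; the name `stretched` is only for illustration, and NumPy does not need this step to compute `a + 4`.

```python
import numpy as np

a = np.array([0, 1, 2])

# Explicitly "stretch" the scalar 4 to the shape of a.
# np.broadcast_to returns a read-only view; the value is not physically copied.
stretched = np.broadcast_to(4, a.shape)
print(stretched)      # [4 4 4]

# Adding the stretched array gives the same result as adding the scalar.
print(a + stretched)  # [4 5 6]
print(np.array_equal(a + 4, a + stretched))  # True
```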

Here is another example, between a one-dimensional array and a two-dimensional array. The one-dimensional array `a` is stretched across the rows of `M`.

In [5]:
M = np.ones((3, 3))
M

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

In [6]:
M + a

array([[1., 2., 3.],
       [1., 2., 3.],
       [1., 2., 3.]])

While broadcasting can involve more complicated cases, we are mostly interested in the simpler ones.

![](https://jakevdp.github.io/PythonDataScienceHandbook/figures/02.05-broadcasting.png)
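
As a sketch of one of the more complicated cases, both operands can be stretched at once: a column of shape `(3, 1)` and a row of shape `(3,)` combine into a `(3, 3)` result. The names `col` and `result` are hypothetical and used only for this illustration.

```python
import numpy as np

a = np.arange(3)         # shape (3,): [0 1 2]
col = a[:, np.newaxis]   # shape (3, 1): the same values as a column

# col is stretched across columns and a is stretched across rows.
result = col + a
print(result.shape)      # (3, 3)
print(result)
# [[0 1 2]
#  [1 2 3]
#  [2 3 4]]
```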

### Exercise
Create a NumPy array of your grades (out of 100) in all your courses this semester and name it *my_grades*. Calculate the average of your grades (you can use `np.mean`), then center the grades around the mean, i.e. subtract the average from each grade.

In [None]:
# Add your code here

# Comparisons, masks, and boolean logic

Up to this point, we have seen how NumPy helps with two of the three main data manipulation patterns: transformation (through fast universal functions) and aggregation (through aggregation functions). Masking helps with the third: filtering.

## Comparison operators as ufuncs
We saw earlier that arithmetic operators such as `+`, `-`, `*`, and `/` are applied element-wise to arrays through ufuncs. NumPy also implements comparison operators such as `<` (less than) and `>` (greater than) as element-wise ufuncs. The result of a comparison is always an array with a Boolean data type. All six standard comparison operations are available:

In [7]:
x = np.array([1, 2, 3, 4, 5])

In [8]:
x < 3  # less than

array([ True,  True, False, False, False])

In [9]:
x > 3  # greater than

array([False, False, False,  True,  True])

In [10]:
x <= 3  # less than or equal

array([ True,  True,  True, False, False])

In [11]:
x >= 3  # greater than or equal

array([False, False,  True,  True,  True])

In [12]:
x != 3  # not equal

array([ True,  True, False,  True,  True])

In [13]:
x == 3  # equal

array([False, False,  True, False, False])
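
Each comparison operator corresponds to a named NumPy ufunc. The sketch below checks that correspondence for the six operators shown above; `pairs` and `all_match` are hypothetical names used only for this check.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

# Each operator expression paired with its ufunc equivalent.
pairs = [
    (x < 3,  np.less(x, 3)),
    (x > 3,  np.greater(x, 3)),
    (x <= 3, np.less_equal(x, 3)),
    (x >= 3, np.greater_equal(x, 3)),
    (x != 3, np.not_equal(x, 3)),
    (x == 3, np.equal(x, 3)),
]

# True only if every operator gives the same array as its function form.
all_match = all(np.array_equal(op, fn) for op, fn in pairs)
print(all_match)        # True

# The result of a comparison has a Boolean dtype.
print((x < 3).dtype)    # bool
```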

## Counting elements with a certain criterion

In Python, `False` is interpreted as `0` and `True` as `1`. Combined with the Boolean arrays produced by NumPy comparison operations, this lets us count the elements of an array that meet a certain criterion.

In [14]:
# How many values in x are greater than 3?
np.sum(x > 3)

2
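
Because `True` counts as `1`, other aggregations also work on Boolean arrays. The sketch below shows two variations, assuming the same `x` as above: `np.count_nonzero` as an alternative way to count, and `np.mean` to get the fraction of elements that meet the condition.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

# Two ways to count the values greater than 3.
count_sum = np.sum(x > 3)
count_nz = np.count_nonzero(x > 3)
print(count_sum, count_nz)  # 2 2

# The mean of a Boolean array is the fraction of True values.
fraction = np.mean(x > 3)
print(fraction)             # 0.4
```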

### Exercise
Find out how many of your grades in *my_grades* are above 80.

In [None]:
# Add your code here

If, rather than counting, we want to know whether any or all values are true, we can use the NumPy functions `np.any` and `np.all`.

In [15]:
# are there any values greater than 8?
np.any(x > 8)

False

In [16]:
# are there any values less than two?
np.any(x < 2)

True
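
The two cells above use `np.any`. For completeness, here is a short sketch of `np.all` on the same `x`; the variable names are chosen for illustration only.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

# Are all values greater than 0?
all_positive = np.all(x > 0)
print(all_positive)     # True

# Are all values less than 5? No: the last element is exactly 5.
all_below_five = np.all(x < 5)
print(all_below_five)   # False
```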

## Boolean array operators
What if we want to find elements in an array that meet more than one condition, e.g. all grades in your *my_grades* array that fall in a certain range? This is accomplished with the bitwise logic operators `&`, `|`, `^`, and `~`. As with the standard arithmetic operators, NumPy overloads these as ufuncs that work element-wise on (usually Boolean) arrays.

|Operator| Description|
|---|---|
|&| Bitwise and|
|\|| Bitwise or|
|^| Bitwise xor|
|~| Bitwise not|

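Note that these operators work element-wise on Boolean NumPy arrays. The Boolean operators we saw earlier (`and`, `or`, and `not`) work on single Boolean values, not arrays, and cannot be used here. Because `&` and `|` have higher precedence than the comparison operators, wrap each comparison in parentheses when combining them, e.g. `(x > 1) & (x < 5)`.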
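
The difference between `&` and `and` can be seen in a small sketch, assuming the array `x` from the comparison section. The names `in_range` and `raised` are hypothetical.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

# Element-wise "and" of two Boolean arrays.
in_range = (x > 1) & (x < 5)
print(in_range)  # [False  True  True  True False]

# Python's `and` needs a single True/False for each operand, which is
# ambiguous for an array with more than one element, so NumPy raises.
try:
    (x > 1) and (x < 5)
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)
```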

In [21]:
rain_in_inch = np.array([0.5, 0.6, 1.5, 0, 0, 0, 1, 0, 0, 0, 2, 1, 3, 0, 0])

no_rain = rain_in_inch == 0
heavy_rain = rain_in_inch >= 2

light_rain = ~(no_rain | heavy_rain)
light_rain

array([ True,  True,  True, False, False, False,  True, False, False,
       False, False,  True, False, False, False])
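
The same mask can be built directly with `&` instead of negating an `|`. This sketch assumes the rain values are never negative, so that "not zero" means "greater than zero"; `light_rain_direct` is a hypothetical name.

```python
import numpy as np

rain_in_inch = np.array([0.5, 0.6, 1.5, 0, 0, 0, 1, 0, 0, 0, 2, 1, 3, 0, 0])

# Mask from the cell above: neither "no rain" nor "heavy rain".
no_rain = rain_in_inch == 0
heavy_rain = rain_in_inch >= 2
light_rain = ~(no_rain | heavy_rain)

# Equivalent formulation (assumes no negative values):
# some rain, but less than 2 inches.
light_rain_direct = (rain_in_inch > 0) & (rain_in_inch < 2)

print(np.array_equal(light_rain, light_rain_direct))  # True
```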

### Exercise
Find how many of your grades in *my_grades* are between 70 and 90.

In [None]:
# Add your code here

## Boolean arrays as masks
We can use Boolean arrays as masks to select particular subsets of the data. This is how NumPy helps us filter elements based on a condition: we first create a Boolean array as in the section above, then use it to index the original array and select the elements that meet the criteria.

In [22]:
# Filter down the rain data to get only days with light rain
rain_in_inch[light_rain]

array([0.5, 0.6, 1.5, 1. , 1. ])

In [23]:
# Get the data for days with more than 1 inch of rain
rain_in_inch[rain_in_inch > 1]

array([1.5, 2. , 3. ])

What is returned is a one-dimensional array filled with all the values that meet this condition; in other words, all the values in positions at which the mask array is True.
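
The result is one-dimensional even when the original array is not. A minimal sketch, using a hypothetical two-dimensional array `grid`:

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

# The mask has the same shape as the array it was built from.
mask = grid > 2
print(mask.shape)      # (2, 3)

# Indexing with the mask returns the selected values as a flat 1D array.
selected = grid[mask]
print(selected)        # [3 4 5 6]
print(selected.shape)  # (4,)
```

Indexing with a mask returns a copy, so changing `selected` does not modify `grid`.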

We are then free to operate on these values as we wish. 

In [24]:
# Calculate the average amount of rain on rainy days
mask = rain_in_inch > 0
print(np.mean(rain_in_inch[mask]))

1.3714285714285714


### Exercise
Calculate the average of all your grades in *my_grades* that are under 90.

In [None]:
# Add your code here