# [4 NumPy Tricks Every Python Beginner should Learn](https://towardsdatascience.com/4-numpy-tricks-every-python-beginner-should-learn-bdb41febc2f2)

## 1. Arg functions — positions

For an array `arr`, 

- `np.argmax(arr)` : return the index of the maximum value (the first one if there are ties)
- `np.argmin(arr)` : return the index of the minimum value (the first one if there are ties)
- `np.argwhere(condition(arr))` : return the indices of values that satisfy a user-defined condition

- `np.argsort(arr)` : return the indices that would sort an array

We can use `np.argsort` to **sort the values of one array according to another array.**
The sorted name array can also be **transformed back to its original order** by indexing it with `np.argsort(np.argsort(score))`, which gives the rank of each score, that is, its position in the sorted order.
In the timings below, this is faster than the built-in Python approach `sorted(zip())`, and it is arguably more readable.

```
import numpy as np

score = np.array([70, 60, 50, 10, 90, 40, 80])
name = np.array(['Ada', 'Ben', 'Charlie', 'Danny', 'Eden', 'Fanny', 'George'])
sorted_name = name[np.argsort(score)] # an array of names in ascending order of their scores
print(sorted_name)   # ['Danny' 'Fanny' 'Charlie' 'Ben' 'Ada' 'George' 'Eden']

# argsort of the argsort gives each score's rank, which undoes the sort
original_name = sorted_name[np.argsort(np.argsort(score))]
print(original_name) # ['Ada' 'Ben' 'Charlie' 'Danny' 'Eden' 'Fanny' 'George']

# %timeit is an IPython/Jupyter magic, not plain Python
%timeit name[np.argsort(score)]
# 1.83 µs ± 182 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit sorted(zip(score, name))
# 3.2 µs ± 76.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
```

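The other three functions listed at the start of this section can be illustrated with a minimal sketch. It reuses the `score` array from the example above; the threshold of 60 and the names `best`, `worst` and `passed` are assumptions chosen only for illustration.

```python
import numpy as np

# Same scores as in the argsort example above
score = np.array([70, 60, 50, 10, 90, 40, 80])

best = np.argmax(score)    # position of the highest score
worst = np.argmin(score)   # position of the lowest score
print(best, worst)         # 4 3

# np.argwhere returns one row of indices per matching element,
# so for a 1-D array the result has shape (n_matches, 1)
passed = np.argwhere(score >= 60)
print(passed.ravel())      # [0 1 4 6]
```

Because these are positions rather than values, they can be used to index a parallel array such as `name`.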


## 2. Broadcasting — shapes
Broadcasting **vectorizes array operations without making needless copies of data**. This leads to efficient algorithm implementations and higher code readability.

### But how do we know if two arrays are compatible with broadcasting?
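Shapes are compared dimension by dimension, starting from the trailing (rightmost) one. Two dimensions are compatible when they are **equal**, or when **one of them is 1**. The arrays do not need to have the same number of dimensions; missing leading dimensions are treated as 1.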

```
Argument 1  (4D array): 7 × 5 × 3 × 1
Argument 2  (3D array):     1 × 3 × 9
Output      (4D array): 7 × 5 × 3 × 9
```
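The shape table above can be checked directly. This is a small sketch using arrays of ones; the array names and the incompatible shape `(4, 9)` are assumptions made up for the demonstration.

```python
import numpy as np

a = np.ones((7, 5, 3, 1))   # Argument 1
b = np.ones((1, 3, 9))      # Argument 2

# No explicit reshape or tiling is needed; b is treated as 1 x 1 x 3 x 9
result = a + b
print(result.shape)          # (7, 5, 3, 9)

# A pair of dimensions that differ, with neither equal to 1, cannot be broadcast:
# here 3 (from a) meets 4 (from c)
c = np.ones((4, 9))
broadcast_failed = False
try:
    a + c
except ValueError as err:
    broadcast_failed = True
    print("not broadcastable:", err)
```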

## 3. Ellipsis and NewAxis — dimensions
- When dealing with higher-dimensional arrays, we use `:` to select all indices along a single axis.
- We can also use `...` (Ellipsis) to select **all indices across multiple axes**. The exact number of axes expanded is **inferred**.

Indexing with `np.newaxis` **inserts a new axis of length 1 at a user-defined position**, so the array gains one dimension. While this can also be done with `np.expand_dims()`, using `np.newaxis` is arguably more readable and more elegant.
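To make the comparison with `np.expand_dims()` concrete, here is a short sketch on a small 2-D array. The array and variable names are assumptions for illustration only.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)

# Both calls insert a length-1 axis at position 1
via_newaxis = arr[:, np.newaxis, :]
via_expand = np.expand_dims(arr, axis=1)
print(via_newaxis.shape)   # (3, 1, 4)
print(via_expand.shape)    # (3, 1, 4)

# np.newaxis is simply an alias for None
print(np.newaxis is None)  # True

# Ellipsis keeps the existing axes, so this appends an axis at the end
print(arr[..., np.newaxis].shape)  # (3, 4, 1)
```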

```
import numpy as np

arr = np.arange(1000).reshape(2, 5, 2, 10, -1)

# `...` expands to as many `:` as needed to cover the leading axes
print(arr[:, :, :, 3, 2] == arr[..., 3, 2])
# prints a (2, 5, 2) boolean array in which every entry is True

print(arr.shape)                           # (2, 5, 2, 10, 5)
# np.newaxis inserts a length-1 axis at index 2
print(arr[..., np.newaxis, :, :, :].shape) # (2, 5, 1, 2, 10, 5)
```



## 4. Masked Array — selection
Datasets are imperfect. They often contain arrays with missing or invalid entries, and we usually want to ignore those entries. For example, measurements from a weather station might contain missing values because of sensor failure.

NumPy has a submodule `numpy.ma` that **supports data arrays with masks**. A masked array contains an ordinary NumPy array and **a mask that indicates the position of invalid entries**.

```
np.ma.MaskedArray(data=arr, mask=invalid_mask)
```

Invalid entries in an array are sometimes marked with a sentinel value, such as a negative number or a string. If we know the sentinel value, say `-999`, we can also create a masked array using `np.ma.masked_values(arr, value=-999)`. Operations on a masked array, such as `sum()`, automatically ignore the masked entries, as shown below.

```
import math

import numpy as np

def is_prime(n):
    assert n > 1, 'Input must be larger than 1'
    if n % 2 == 0 and n > 2:
        return False
    # only odd divisors up to sqrt(n) need to be checked
    return all(n % i for i in range(3, int(math.sqrt(n)) + 1, 2))

arr = np.array(range(2,100))
non_prime_mask = [not is_prime(n) for n in arr]  # True marks an entry to hide
prime_arr = np.ma.MaskedArray(data=arr, mask=non_prime_mask)
print(prime_arr)
# [2 3 -- 5 -- 7 -- -- -- 11 -- 13 -- -- -- 17 -- 19 -- -- -- 23 -- -- -- --
#  -- 29 -- 31 -- -- -- -- -- 37 -- -- -- 41 -- 43 -- -- -- 47 -- -- -- --
#  -- 53 -- -- -- -- -- 59 -- 61 -- -- -- -- -- 67 -- -- -- 71 -- 73 -- --
#  -- -- -- 79 -- -- -- 83 -- -- -- -- -- 89 -- -- -- -- -- -- -- 97 -- --]

arr = np.array(range(11))
print(arr.sum())        # 55

arr[-1] = -999 # indicates missing value
masked_arr = np.ma.masked_values(arr, -999)
print(masked_arr.sum()) # 45
```

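The weather-station scenario mentioned earlier can be sketched in the same way. The temperature readings below are hypothetical values invented for this example, and `-999.0` is assumed to be the sentinel for a failed sensor reading.

```python
import numpy as np

# Hypothetical temperature readings; -999.0 marks a failed sensor reading
temps = np.array([21.5, 22.0, -999.0, 23.5, -999.0, 21.0])
masked = np.ma.masked_values(temps, -999.0)

print(masked.count())        # 4 valid readings
print(masked.mean())         # 22.0, the mean of the valid readings only
print(temps.mean())          # the plain mean is distorted by the sentinel

# compressed() returns only the valid entries as an ordinary 1-D array
valid = masked.compressed()
print(valid)
```

Calling `count()` first is a cheap way to see how many readings actually contribute to a statistic before trusting it.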
