# Five Numpy functions you didn't know you needed
#### By Kyaw Htet Paing Win


### Interesting Numpy functions 

Numpy is a popular Python library optimized for calculations on large amounts of data stored in arrays. Let's look at some useful functions that extend your ability to process large datasets! This notebook covers the following 5 functions:

- argsort
- apply_along_axis
- tile
- meshgrid
- extract

In [1]:
!pip install jovian --upgrade -q

In [2]:
import jovian

In [3]:
jovian.commit(project='numpy-array-operations')

<IPython.core.display.Javascript object>

[jovian] Updating notebook "kyawhtetwin/numpy-array-operations" on https://jovian.ai
[jovian] Committed successfully! https://jovian.ai/kyawhtetwin/numpy-array-operations


'https://jovian.ai/kyawhtetwin/numpy-array-operations'

Let's begin by importing Numpy and listing out the functions covered in this notebook.

In [4]:
import numpy as np

In [5]:
# List of functions explained 
function1 = np.argsort
function2 = np.apply_along_axis
function3 = np.tile
function4 = np.meshgrid
function5 = np.extract

## Function 1 - argsort 

First, sorting! Let's say you want the indices that would put a Numpy array in sorted order. You can use argsort to get them along any axis.

In [6]:
# Example 1: Simple 1 dimensional array
arr1 = np.array([4, -9, 0, 6, 1])
idx1 = np.argsort(arr1, kind='stable')
print(idx1)

[1 2 4 0 3]


We see that argsort returns the indices that would sort the original array: the smallest value, -9, sits at index 1, the next smallest, 0, at index 2, and so on.
* kind: Specifies the sorting algorithm. I have opted for 'stable', which keeps equal elements in their original relative order. Note: the default algorithm is 'quicksort'.
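As a quick check of what these indices mean, here is a minimal sketch (the names `values` and `order` are illustrative and not part of the notebook) that indexes the array with the result of argsort. Reversing the index array is one simple way to get a descending order.

```python
import numpy as np

values = np.array([4, -9, 0, 6, 1])
order = np.argsort(values, kind='stable')

# Indexing with the argsort result yields the array in ascending order.
ascending = values[order]

# Reversing the index array yields a descending order instead.
descending = values[order[::-1]]

print(ascending.tolist())   # [-9, 0, 1, 4, 6]
print(descending.tolist())  # [6, 4, 1, 0, -9]
```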

In [7]:
# Example 2: In a 2-dimensional array, sorting along a chosen axis
arr2 = np.array([
    [9, -1, -1, 5],
    [7, 3, -4, 9]
])

In [8]:
idx2 = np.argsort(arr2, axis=0, kind='stable')
print(idx2)

[[1 0 1 0]
 [0 1 0 1]]


Notice that arr2 has shape (2, 4): length 2 along axis 0 and length 4 along axis 1. So, what if we are interested in the sorting indices along a particular axis? In the example above, I specified axis 0 (visualize it as going down each column). Since arr2 has length 2 along axis 0, the indices argsort returns range from 0 to 1. Look at the first column of arr2: 7 at index 1 is smaller than 9 at index 0. Thus, argsort returns [1 0] as its first column.
* axis: Specifies the axis to sort along. Defaults to -1 (the last axis).

As you would expect, the shape of indices that argsort returns matches the shape of original array since they are indexes of the original array.

In [9]:
# Along the second axis
idx3 = np.argsort(arr2, axis=1, kind='stable')
print(idx3)

[[1 2 3 0]
 [2 1 0 3]]


See if you can explain the result along the second axis yourself. Notice that in the first row the two equal elements (-1 at indices 1 and 2) keep their original order; this is what kind='stable' guarantees.
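To make the tie behaviour concrete, the sketch below looks only at the first row of arr2 (the names `row`, `stable_idx` and `tied` are illustrative). It picks out the positions that argsort assigned to the tied value -1.

```python
import numpy as np

# First row of arr2; the value -1 appears at indices 1 and 2.
row = np.array([9, -1, -1, 5])

stable_idx = np.argsort(row, kind='stable')
print(stable_idx.tolist())  # [1, 2, 3, 0]

# A stable sort keeps equal elements in their original relative order,
# so among the tied -1 values the lower index comes first.
tied = stable_idx[row[stable_idx] == -1]
print(tied.tolist())  # [1, 2]
```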

In [10]:
idx4 = np.argsort(arr2, kind='stable')
print(idx4)

[[1 2 3 0]
 [2 1 0 3]]


Notice that axis defaults to -1 (i.e. the last axis), so we see the same result as when setting axis to 1.

Side Note: What if you also want to retrieve the original values in sorted order? In that case, you can combine argsort with take_along_axis(), as seen below. In effect, we have mimicked np.sort(). Check out np.sort in the documentation.

In [11]:
sorted_arr2 = np.take_along_axis(arr2, idx4, axis=1)
print(sorted_arr2)

[[-1 -1  5  9]
 [-4  3  7  9]]


In [12]:
# Example 3 (Wrong Use Case) - Be mindful of which axis the sorting occurs 
incorrect_sorted_arr2 = np.take_along_axis(arr2, idx4, axis=0)
print(incorrect_sorted_arr2)

IndexError: index 2 is out of bounds for axis 0 with size 2

In Example 3, idx4 holds indices along axis 1 (values 0 to 3), but axis 0 of arr2 has only 2 entries, so its valid indices cannot exceed 1. The call breaks because the axis passed to take_along_axis does not match the axis we sorted along.
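For comparison, a sketch of the consistent version: if the goal really is to sort along axis 0, both argsort and take_along_axis should receive axis=0. The names `idx_axis0` and `sorted_axis0` are illustrative.

```python
import numpy as np

arr2 = np.array([
    [9, -1, -1, 5],
    [7, 3, -4, 9],
])

# The indices and the gather step must refer to the same axis.
idx_axis0 = np.argsort(arr2, axis=0, kind='stable')
sorted_axis0 = np.take_along_axis(arr2, idx_axis0, axis=0)

# Each column is now sorted from top to bottom.
print(sorted_axis0.tolist())  # [[7, -1, -4, 5], [9, 3, -1, 9]]
```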

Remember argsort() when you need the indices of the sorted array. You also saw how to create the sorted array using argsort() in conjunction with take_along_axis().

## Function 2 - apply_along_axis

Suppose we have some rows and columns of a dataset. We are interested in certain statistics across rows or columns. How can we solve that? Let's take a look.

In [14]:
"""
Example 1:
I am interested in finding the difference between the max & min value 
of each column in the data below (i.e. first column=[9, 6, 1] etc..)
"""
data1 = np.array([
                 [9., 14., 3, 4],
                 [6, 12, 4, 9],
                 [1 , 0, 13, 14]
                ])

In [15]:
# Helper function 
def max_min_diff(arr):
    """
    Returns the difference between the max value & min value 
    in the given 1D array
    """
    return np.amax(arr) - np.amin(arr)

I can use apply_along_axis as follows:
* func1d: Expects a function that describes what operation to perform on 1D array
* axis: The axis along which arr is sliced
* arr: Numpy ndarray

In [16]:
np.apply_along_axis(func1d=max_min_diff, axis=0, arr=data1)

array([ 8., 14., 10., 10.])

This function first slices the array. Since I set axis to 0 (i.e. I want everything along axis 0), it extracts the 1D slice data1[:, 0], which is [9., 6., 1.], and passes it to max_min_diff, which returns the difference 8. It then does the same for data1[:, 1], and so on. Thus, you see the difference between the max & min of each column in the data.
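The slicing described above can be sketched as an explicit loop over columns. This is only a rough picture of the behaviour, not how Numpy implements it internally. For this particular statistic, np.ptp ("peak to peak") gives the same answer in a single vectorized call; `by_loop` and `by_ptp` are illustrative names.

```python
import numpy as np

data1 = np.array([
    [9., 14., 3, 4],
    [6, 12, 4, 9],
    [1, 0, 13, 14],
])

def max_min_diff(arr):
    """Difference between the max and min of a 1D array."""
    return np.amax(arr) - np.amin(arr)

# One call per column, which is what axis=0 amounts to for a 2D array.
by_loop = np.array([max_min_diff(data1[:, j]) for j in range(data1.shape[1])])

# Vectorized built-in for the same max-minus-min statistic.
by_ptp = np.ptp(data1, axis=0)

print(by_loop.tolist())  # [8.0, 14.0, 10.0, 10.0]
print(by_ptp.tolist())   # [8.0, 14.0, 10.0, 10.0]
```

When a vectorized built-in exists, it is usually the faster choice; apply_along_axis is most useful for custom functions that have no such equivalent.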

In [17]:
"""
Example 2: 
Quite often when you have missing values in your data, you want to 
replace them with certain computed values (such as mean, median, ...)
Let's see how we can use the apply_along_axis function to do that. 
"""
data = np.array([
    [10, 200, 50], 
    [np.nan, 400, 60],
    [20, np.nan, 50],
    [40, 600, np.nan],
])

In [18]:
def fill_nan(arr, stats_func):
    """
        Applies a statistical function to a 1D array &
        returns a new array with nan values replaced by that statistic
    """
    stat = stats_func(arr)
    # np.where builds a new array, so the input slice is never modified in place
    return np.where(np.isnan(arr), stat, arr)

In [19]:
from functools import partial
np.apply_along_axis(partial(fill_nan, stats_func=np.nanmean), axis=0, arr=data)

array([[ 10.        , 200.        ,  50.        ],
       [ 23.33333333, 400.        ,  60.        ],
       [ 20.        , 400.        ,  50.        ],
       [ 40.        , 600.        ,  53.33333333]])

As you can see, the mean of each column is computed (ignoring nan values) and replaces the nan values in that column.
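For this specific task, the same fill can also be sketched without apply_along_axis, by computing all column means at once and letting broadcasting line them up with the columns. The names `col_means` and `filled` are illustrative.

```python
import numpy as np

data = np.array([
    [10, 200, 50],
    [np.nan, 400, 60],
    [20, np.nan, 50],
    [40, 600, np.nan],
])

# Mean of each column, ignoring nan values; shape (3,).
col_means = np.nanmean(data, axis=0)

# Where data is nan take the column mean, otherwise keep the value.
# col_means broadcasts across the rows; np.where returns a new array.
filled = np.where(np.isnan(data), col_means, data)

print(filled)
```

Because np.where returns a new array, `data` itself still contains its nan values afterwards.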

In [20]:
"""
Example 3 (Wrong Use Case). 
Suppose you did the following to the Example 1 problem. 
The result isn't what we are looking for.
"""
np.apply_along_axis(func1d=max_min_diff, axis=1, arr=data1)

array([11.,  8., 14.])

In [21]:
data1

array([[ 9., 14.,  3.,  4.],
       [ 6., 12.,  4.,  9.],
       [ 1.,  0., 13., 14.]])

The above is a logical error rather than a runtime error. When you set axis to 1, it slices the array as data1[0, :] first, calls max_min_diff on that row, and so on for each row. Thus, the first value in the output is 11 (14-3), and there are 3 values (one per row) instead of 4 (one per column). Be mindful of how you want to slice the array so that the function is applied the way you expect.

Consider apply_along_axis when you want to apply the same operation, i.e. function, along a certain axis.

## Function 3 - tile

Let's say you want to repeat your array A a number of times. In that case, you can use tile(), which takes the following parameters:
* A: The original array
* reps: The number of repetitions of A along each axis. Can be an integer or an iterable of integers

In [23]:
"""
Example 1: Using tile on 1D array
"""
arr3 = np.array([1, 2, 3, 4])
rep_arr3 = np.tile(arr3, reps=(3, 2))
print(rep_arr3)

[[1 2 3 4 1 2 3 4]
 [1 2 3 4 1 2 3 4]
 [1 2 3 4 1 2 3 4]]


In [24]:
arr3.shape

(4,)

In [25]:
print(rep_arr3.shape)

(3, 8)


I start with an array with 4 elements and specify reps as the tuple (3, 2). Because reps has two entries, tile treats the 1D array as having shape (1, 4). Replicating it 3 times along axis 0 gives an array that looks like this:
    
    [[1 2 3 4]
     [1 2 3 4]
     [1 2 3 4]]
Then this intermediate array is replicated 2 times along axis 1, which gives the (3, 8) array in the run above.
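The two steps described above can be sketched as two separate tile calls and compared with the single call that uses reps=(3, 2). The names `stacked`, `widened` and `direct` are illustrative.

```python
import numpy as np

arr3 = np.array([1, 2, 3, 4])

# Step 1: repeat 3 times along axis 0, once along axis 1.
stacked = np.tile(arr3, reps=(3, 1))
print(stacked.shape)  # (3, 4)

# Step 2: repeat the intermediate result twice along axis 1.
widened = np.tile(stacked, reps=(1, 2))
print(widened.shape)  # (3, 8)

# Same result as doing both at once.
direct = np.tile(arr3, reps=(3, 2))
print(np.array_equal(widened, direct))  # True
```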

In [27]:
"""
Example 2: 
Suppose you have two arrays: 
arr1 = [1, 2] 
arr2 = [2, 4, 6]
You want to create the resulting array as follows:
    Multiply each element in arr1 with each element in arr2
    (i.e 1 multiplies to 2, 4, 6, then 2 multiplies to 2, 4, 6)
"""
array1 = np.array([1, 2])
array2 = np.array([2, 4, 6])
array3 = np.tile(array2, reps=(array1.size, 1))
print(f"Array 1\n{array1[:, np.newaxis]}")
print(f"Replicated Array 2\n{array3}")
print("Resulting array:\n",array1[:, np.newaxis]*array3)

Array 1
[[1]
 [2]]
Replicated Array 2
[[2 4 6]
 [2 4 6]]
Resulting array:
 [[ 2  4  6]
 [ 4  8 12]]


Instead of using two nested loops in Python, we can use tile() to replicate the elements of an array a certain number of times and then perform the operation we want on whole arrays.

In the example above, since I want to multiply every element of array 2 by each element of array 1, I replicated array 2 once per element of array 1. Then I can simply multiply them; Numpy broadcasts the (2, 1) column of array 1 across the columns of the replicated array.
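Since broadcasting is already at work here, it is worth noting that for this particular product the explicit tile step is optional. The sketch below compares the tile version with plain broadcasting and with np.outer; the variable names are illustrative.

```python
import numpy as np

array1 = np.array([1, 2])
array2 = np.array([2, 4, 6])

# Version from the example: replicate array2, then multiply.
with_tile = array1[:, np.newaxis] * np.tile(array2, reps=(array1.size, 1))

# Broadcasting alone: shape (2, 1) times shape (3,) gives shape (2, 3).
with_broadcast = array1[:, np.newaxis] * array2

# np.outer computes the same table of pairwise products.
with_outer = np.outer(array1, array2)

print(with_broadcast.tolist())  # [[2, 4, 6], [4, 8, 12]]
```

tile remains useful when you actually need the replicated array itself, for example to store it or pass it to code that does not broadcast.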

In [28]:
# Example 3 - breaking (to illustrate when it breaks)
np.tile(array2, reps=(0))

array([], dtype=int64)

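Repeating an array 0 times is not meaningful. Note that tile does not raise an error here; it quietly returns an empty array, so check reps if it is computed at runtime.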

Consider tile to repeat elements of a Numpy array so that you don't have to rely on for loops!

## Function 4 - meshgrid

Can we do something similar to Example 2 under function tile() using another function? What about meshgrid? Let's see what it does!

In [30]:
"""
    Example 1:
    Same problem 
    arr1 = [1, 2]
    arr2 = [2, 4, 6]
"""
a1 = np.array([1, 2])
a2 = np.array([2, 4, 6])
a1_g, a2_g = np.meshgrid(a1, a2)
print(f"Array 1:\n{a1_g}")
print(f"Array 2:\n{a2_g}")

Array 1:
[[1 2]
 [1 2]
 [1 2]]
Array 2:
[[2 2]
 [4 4]
 [6 6]]


In [31]:
print(f"Resulting array:\n{a1_g*a2_g}")

Resulting array:
[[ 2  4]
 [ 4  8]
 [ 6 12]]


Just as earlier, we have multiplied each element of array 1 by each element of array 2. This time, we did so using meshgrid to create a "grid" of values that pairs every value in array 1 with every value in array 2 (kind of like creating xy coordinate pairs). Note that the result has shape (3, 2), the transpose of the (2, 3) result from the tile example.
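The orientation of the grids is controlled by meshgrid's `indexing` parameter, which defaults to 'xy'. As a small sketch (variable names are illustrative), passing indexing='ij' makes the first input vary along axis 0, which reproduces the (2, 3) layout of the tile example.

```python
import numpy as np

a1 = np.array([1, 2])
a2 = np.array([2, 4, 6])

# Default 'xy' (Cartesian) indexing: grids have shape (len(a2), len(a1)).
xy_1, xy_2 = np.meshgrid(a1, a2)
print((xy_1 * xy_2).shape)  # (3, 2)

# 'ij' (matrix) indexing: grids have shape (len(a1), len(a2)).
ij_1, ij_2 = np.meshgrid(a1, a2, indexing='ij')
print((ij_1 * ij_2).tolist())  # [[2, 4, 6], [4, 8, 12]]
```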

In [32]:
"""
    Example 2:
    Using meshgrid to index into multi-dimensional arrays.
    In the following array (counting rows and columns from 1):
        Change odd-numbered rows (indices 0, 2, 4) into 4.
        Change even-numbered columns (indices 1, 3, 5) into -5 
"""
arr4 = np.zeros(shape=(6, 6))
rows, columns = np.meshgrid(np.arange(0, 5, 2), np.arange(1, 6, 2))
print(rows)
print(columns)

[[0 2 4]
 [0 2 4]
 [0 2 4]]
[[1 1 1]
 [3 3 3]
 [5 5 5]]


In [33]:
arr4[rows] = 4.
arr4[:, columns] = -5.

In [34]:
arr4

array([[ 4., -5.,  4., -5.,  4., -5.],
       [ 0., -5.,  0., -5.,  0., -5.],
       [ 4., -5.,  4., -5.,  4., -5.],
       [ 0., -5.,  0., -5.,  0., -5.],
       [ 4., -5.,  4., -5.,  4., -5.],
       [ 0., -5.,  0., -5.,  0., -5.]])

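This example demonstrates how we might use meshgrid to index into multi-dimensional arrays. Because each index array is used on a single axis here, only the distinct values in rows and columns matter: rows 0, 2, 4 are set to 4 first, then columns 1, 3, 5 are overwritten with -5.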
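As a cross-check of what the indexing in this example does, the sketch below repeats it and compares the result with plain strided slices, which select the same rows and columns. The names `with_grids` and `with_slices` are illustrative.

```python
import numpy as np

# Version from the example: index arrays produced by meshgrid.
with_grids = np.zeros(shape=(6, 6))
rows, columns = np.meshgrid(np.arange(0, 5, 2), np.arange(1, 6, 2))
with_grids[rows] = 4.
with_grids[:, columns] = -5.

# Same assignment with slices: every second row from 0, every second column from 1.
with_slices = np.zeros(shape=(6, 6))
with_slices[0::2, :] = 4.
with_slices[:, 1::2] = -5.

print(np.array_equal(with_grids, with_slices))  # True
```

For regularly spaced rows and columns like these, slices are the simpler tool; index arrays earn their keep when the positions are irregular.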

In [35]:
"""
    Example 3:
    Incorrect Use
"""
x, y = np.meshgrid([1, 2, 3], [4, 5, 6], [7, 8, 9])

ValueError: too many values to unpack (expected 2)

We can extend the usage in Example 2 and pass an arbitrary number of iterables as positional arguments. However, as seen in Example 3, meshgrid returns one grid per input, so if we pass 3 iterables we must unpack 3 values from the return value. To fix the error above, we can do <br>
```x, y, z = np.meshgrid([1, 2, 3], [4, 5, 6], [7, 8, 9])```
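When the number of inputs is not fixed, one option is to keep the return value as a single sequence and unpack it later. The sketch below uses inputs of different lengths (2, 3 and 4, chosen only for illustration) so the shape of each grid is easy to read off.

```python
import numpy as np

# One grid is returned per input, so three inputs give three grids.
grids = np.meshgrid([1, 2], [4, 5, 6], [7, 8, 9, 10])
print(len(grids))  # 3

# With the default 'xy' indexing the first two dimensions are swapped:
# inputs of length 2, 3, 4 give grids of shape (3, 2, 4).
print(grids[0].shape)  # (3, 2, 4)

# Unpacking works once the count matches.
x, y, z = grids
```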

Remember meshgrid too when you don't want to loop!

## Function 5 - extract

Finally, let's search our Numpy array for elements based on some condition using extract

In [37]:
"""
    Example 1: Find all the negative elements in array
"""
arr5 = np.array([
    [0, 2., -1, 4],
    [-7, 8, 1, -1],
    [2, 3, 9, 1],
])
print(np.extract(condition=arr5<0, arr=arr5))

[-1. -7. -1.]


As expected, we found elements in arr5 that are negative

In [38]:
"""
    Example 2: Find all the even numbers in array
"""
print(np.extract(arr5%2==0, arr5))

[0. 2. 4. 8. 2.]


As expected, we found all even numbers in arr5. Notice that the matching elements are returned as a flat 1D array, regardless of the shape of arr5.
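With a boolean condition of the same shape as the array, extract gives the same result as boolean-mask indexing. A minimal sketch (variable names are illustrative):

```python
import numpy as np

arr5 = np.array([
    [0, 2., -1, 4],
    [-7, 8, 1, -1],
    [2, 3, 9, 1],
])

# Both forms return the matching elements as a flat 1D array, in row-major order.
negatives_extract = np.extract(arr5 < 0, arr5)
negatives_mask = arr5[arr5 < 0]

print(negatives_extract.tolist())  # [-1.0, -7.0, -1.0]
print(np.array_equal(negatives_extract, negatives_mask))  # True
```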

In [39]:
"""
    Example 3: Incorrect Use
    What if we want to see if our arr5 contains values [2, 4]?
"""
print(np.extract(arr5==[2, 4], arr5))

[]


  print(np.extract(arr5==[2, 4], arr5))


As the warning message indicates, Numpy does not know how to compare two arrays whose shapes cannot be broadcast together: arr5 has shape (3, 4) whereas [2, 4] has shape (2,). The elementwise comparison fails, so nothing is extracted.
The correct way to check would be as follows (try this):<br>
```np.extract(np.isin(arr5, [2, 4]), arr5)```
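Running the suggested fix on arr5 looks like the sketch below. np.isin builds a boolean mask with the same shape as arr5, which extract can then use. The last line, with the arguments swapped, is one way to ask whether each of the values 2 and 4 occurs at all; the variable names are illustrative.

```python
import numpy as np

arr5 = np.array([
    [0, 2., -1, 4],
    [-7, 8, 1, -1],
    [2, 3, 9, 1],
])

# True wherever an element of arr5 equals 2 or 4.
mask = np.isin(arr5, [2, 4])
matches = np.extract(mask, arr5)
print(matches.tolist())  # [2.0, 4.0, 2.0]

# Swapped arguments: is each wanted value present somewhere in arr5?
present = np.isin([2, 4], arr5)
print(present.tolist())  # [True, True]
```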

Consider extract when you need the elements of an array that satisfy a condition, collected into a 1D array.

## Conclusion

In this notebook, we covered 5 useful functions for everyday use. We saw how to sort & search for elements in Numpy arrays. We also saw how to apply a function along an axis of a Numpy array. In addition, we saw two approaches to avoid explicit loops, using tile & meshgrid.

There are other functions that do similar things. Make sure to check them out in the documentation or other sources.

## Reference Links
* Numpy official tutorial : https://numpy.org/doc/stable/user/quickstart.html
