# Get confident in sorting with NumPy

NumPy offers sorting in two flavors:
1. `arr.sort()` - in place (modifies `arr` itself and returns `None`)
2. `np.sort(arr)` - returns a sorted copy
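
A minimal sketch of the practical difference between the two flavors, using a made-up three-element array:

```python
import numpy as np

a = np.array([3, 1, 2])
b = np.sort(a)        # copy: returns a NEW sorted array
print(a, b)           # [3 1 2] [1 2 3]  (a is untouched)

ret = a.sort()        # in place: sorts `a` itself
print(ret, a)         # None [1 2 3]
```

A common slip is writing `a = a.sort()`, which replaces `a` with `None`.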

# 

Because `arr.sort()` works in place, it modifies the original array even when you call it only on a view (e.g. a slice) of that array.

In [1]:
import numpy as np

In [28]:
arr = np.random.randint(0, 20, 10)
arr

array([ 6, 16,  2,  7,  0, 17, 10, 10,  4,  9])

In [29]:
arr[::2]

array([ 6,  2,  0, 10,  4])

In [30]:
arr[::2].sort()

In [31]:
arr

array([ 0, 16,  2,  7,  4, 17,  6, 10, 10,  9])

In [32]:
arr[::2]

array([ 0,  2,  4,  6, 10])

# 

The same happens in 2D:

In [33]:
arr = arr.reshape((2,5))

In [34]:
arr

array([[ 0, 16,  2,  7,  4],
       [17,  6, 10, 10,  9]])

In [35]:
# Taking only FIRST row
arr[0, :]

array([ 0, 16,  2,  7,  4])

In [36]:
arr[0, :].sort()

In [37]:
arr

array([[ 0,  2,  4,  7, 16],
       [17,  6, 10, 10,  9]])

# 

For multidimensional arrays, both flavors also accept an `axis` argument (the default, `axis=-1`, sorts along the last axis).
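
A small sketch of how `axis` changes what gets sorted, on a toy 2x3 array (`axis=None` is shown for `np.sort` only):

```python
import numpy as np

m = np.array([[3, 1, 2],
              [0, 5, 4]])

by_row = np.sort(m)             # default axis=-1: each row sorted independently
by_col = np.sort(m, axis=0)     # each column sorted independently
flat = np.sort(m, axis=None)    # flattened first, then sorted

print(by_row)   # [[1 2 3]
                #  [0 4 5]]
print(by_col)   # [[0 1 2]
                #  [3 5 4]]
print(flat)     # [0 1 2 3 4 5]
```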

**Author's note on WHY descending order** is not available:<br>
Because array slicing produces views, reversing an array neither produces a copy nor requires any computational work.
___
In other words: once the array is sorted, reversing it with `[::-1]` is essentially free (it returns a view with a negative stride instead of copying data), so a separate descending option would add little.

He also mentions the common "trick": after sorting, use `arr[::-1]` for 1D and `arr[:, ::-1]` for 2D (reversing each row) to get descending order.
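
A quick sketch of that trick on toy data, plus a check (via `np.shares_memory`) that the reversed array is a view of the sorted one rather than a copy:

```python
import numpy as np

a = np.array([5, 2, 9, 1])
s = np.sort(a)          # ascending
desc = s[::-1]          # descending, via slicing
print(desc)             # [9 5 2 1]
print(np.shares_memory(s, desc))  # True: no data was copied

# 2D: sort each row, then reverse the column order
m = np.array([[3, 1, 2],
              [0, 5, 4]])
m_desc = np.sort(m, axis=1)[:, ::-1]
print(m_desc)           # [[3 2 1]
                        #  [5 4 0]]
```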

# 

# Indirect sorts: `argsort()` and `lexsort()`
Bro, you need to use these guys. They are incredible. (They also work behind the scenes in pandas' `df.sort_values()`.)

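First, **what is an indirect sort?** <br>
A sort is called *indirect* when, instead of rearranging the data itself, it returns the integer indices that would put the data in sorted order. You can then use those indices to reorder the data (or any other array aligned with it).

> The point of this topic is: sometimes you need the INDICES of the values in sorted order, not the sorted values themselves.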

# 

# `argsort()` 

In [58]:
arr = np.random.randint(0, 20, 10)
arr

array([19,  3, 18, 12,  3,  0,  2,  7, 17, 14])

In [59]:
arr.argsort()

array([5, 6, 1, 4, 7, 3, 9, 8, 2, 0], dtype=int64)

So we can use them as an indexer:

In [60]:
arr[arr.argsort()]

array([ 0,  2,  3,  3,  7, 12, 14, 17, 18, 19])
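
The real payoff of getting indices is reordering *other* arrays that are aligned with the sorted one. A sketch with hypothetical parallel arrays (`names`, `scores` are made up for illustration):

```python
import numpy as np

# Hypothetical parallel arrays: scores[i] belongs to names[i]
names = np.array(['Ann', 'Bob', 'Cid', 'Dee'])
scores = np.array([72, 95, 60, 88])

order = scores.argsort()   # indices that would sort `scores`
print(order)               # [2 0 3 1]
print(scores[order])       # [60 72 88 95]
print(names[order])        # ['Cid' 'Ann' 'Dee' 'Bob']  (names follow their scores)
```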

# 

But in 2D we can't use it the way we did in 1D ↑

In [61]:
arr = np.random.randint(0, 20, 10).reshape(2, 5)
arr

array([[13, 12, 15,  6, 16],
       [14, 12, 11,  9,  1]])

In [62]:
arr.argsort()

array([[3, 1, 0, 2, 4],
       [4, 3, 2, 1, 0]], dtype=int64)

In [63]:
arr[:, arr.argsort()]

array([[[ 6, 12, 13, 15, 16],
        [16,  6, 15, 12, 13]],

       [[ 9, 12, 14, 11,  1],
        [ 1,  9, 11, 12, 14]]])

No... passing the whole 2D index array applies BOTH rows of indices to EACH row of `arr`, so the result has shape `(2, 2, 5)`. This is not what we want here.
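
To apply each row's indices only to its own row, `np.take_along_axis` (available since NumPy 1.15) does the pairing for you. A sketch reusing the values from the cells above, with the equivalent manual broadcasting version for comparison:

```python
import numpy as np

arr = np.array([[13, 12, 15,  6, 16],
                [14, 12, 11,  9,  1]])

idx = arr.argsort(axis=1)                          # per-row sorting indices
row_sorted = np.take_along_axis(arr, idx, axis=1)  # row i uses only idx[i]
print(row_sorted)
# [[ 6 12 13 15 16]
#  [ 1  9 11 12 14]]

# Same thing by hand: pair each row number with its own index row
rows = np.arange(arr.shape[0])[:, None]            # shape (2, 1), broadcasts against idx
manual = arr[rows, idx]
```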

# 

####  BUT, in 2D we can reorder all columns by the values of one row!

In [65]:
arr

array([[13, 12, 15,  6, 16],
       [14, 12, 11,  9,  1]])

In [64]:
arr[:, arr[0].argsort()]

array([[ 6, 12, 13, 15, 16],
       [ 9, 12, 14, 11,  1]])

See? The second row moves along with the first one.
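
The same idea works the other way round: to reorder whole rows (records) by one column, index the rows with that column's `argsort()`. A sketch on a hypothetical 3x2 table:

```python
import numpy as np

# Hypothetical table: each row is a record, column 1 is the sort key
table = np.array([[1, 30],
                  [2, 10],
                  [3, 20]])

by_col1 = table[table[:, 1].argsort()]   # whole rows move together
print(by_col1)
# [[ 2 10]
#  [ 3 20]
#  [ 1 30]]
```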


# 

# `lexsort()` 

In [68]:
arr = np.array(['A', 'B', "D", 'V', 'C'])
arr

array(['A', 'B', 'D', 'V', 'C'], dtype='<U1')

In [69]:
arr.argsort()

array([0, 1, 4, 2, 3], dtype=int64)

`argsort` works fine for a single key, but when you need to sort by multiple keys, `lexsort` is the tool.

# 

In [70]:
fname = np.array(['Aayush', 'Sameer', 'Shah', 'Samir', 'Shah'])
lname = np.array(['Shah', 'Shah', 'Fofadiya', 'Shah', 'Fofdiya'])

In [73]:
np.lexsort((fname, lname))

array([2, 4, 0, 1, 3], dtype=int64)

Here, `lname` is the primary key EVEN THOUGH it is passed 2nd: `np.lexsort` sorts by the LAST key first and uses the earlier keys only to break ties. Keep that in mind.
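
A sketch with hypothetical numeric keys (`dept`, `age` are made up), checked against Python's own tuple-based sort to confirm which key wins:

```python
import numpy as np

# Hypothetical records: sort by department first, then by age
dept = np.array([2, 1, 2, 1, 1])
age  = np.array([30, 45, 25, 22, 45])

idx = np.lexsort((age, dept))   # LAST key (dept) is primary, age breaks ties
print(idx)                      # [3 1 4 2 0]
print(dept[idx], age[idx])      # [1 1 1 2 2] [22 45 45 25 30]

# Equivalent pure-Python ordering: sort indices by the tuple (dept, age)
py_idx = sorted(range(len(dept)), key=lambda i: (dept[i], age[i]))
```

Indices 1 and 4 tie on both keys and keep their original order, because `lexsort` is stable.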

# 

NOTE:
    
    There are 3 main sorting algorithms in NumPy. Usually you won't need to
    care about them, but keep in mind that they are there.
    
    - QuickSort (default, `kind='quicksort'`)
    - MergeSort (stable, `kind='mergesort'` or `kind='stable'`)
    - HeapSort (`kind='heapsort'`)
    
    Choose one with the `kind` parameter of `.sort()`, `np.sort()` or `.argsort()`.
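
The main time `kind` matters is when you need a *stable* sort, i.e. equal keys must keep their original relative order (for example, when the data is already ordered by a secondary key). A minimal sketch on toy keys:

```python
import numpy as np

keys = np.array([3, 1, 3, 1])
order = keys.argsort(kind='stable')   # equal keys keep their original order
print(order)                          # [1 3 0 2]
```

With the default quicksort, the relative order of the two `1`s (and the two `3`s) is not guaranteed.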

# 

### There are `np.partition` and `np.argpartition`
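They partition an array around the k-th smallest element: after `np.partition(arr, k)`, the element at index `k` is the one a full sort would put there, everything before it is smaller or equal, and everything after it is greater or equal (in no particular order). `np.argpartition` returns the indices instead of the values.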
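
A sketch on a toy array: partitioning is a cheap way to grab the k smallest values without paying for a full sort.

```python
import numpy as np

a = np.array([7, 2, 9, 4, 1, 8])
k = 2
p = np.partition(a, k)
print(p[k])   # 4: the value a full sort would place at index 2
# Left side <= p[k], right side >= p[k]; order inside each side is unspecified
print(np.all(p[:k] <= p[k]), np.all(p[k + 1:] >= p[k]))   # True True

idx = np.argpartition(a, k)       # same idea, but returns indices
smallest_two = a[idx[:k]]         # the 2 smallest values, in no particular order
```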

# 

### `np.searchsorted`
*This is an amazing function, which also helped us in the HANDBOOK*

> This function does NOT sort the data. It assumes the array is already sorted (ascending) and uses binary search to find the index at which the passed value would have to be inserted to keep the array sorted. If the value is smaller than every element it returns 0; if it is larger than every element it returns `n` (the array length).

In [75]:
arr = np.array([3,4,52,66,4,1])
arr

array([ 3,  4, 52, 66,  4,  1])

In [82]:
np.sort(arr)

array([ 1,  3,  4,  4, 52, 66])

In [83]:
arr.searchsorted(2)

0

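`searchsorted` does not sort anything: it runs a binary search and simply assumes the array is already sorted. `arr` here is NOT sorted, so the `0` above is not meaningful. On the sorted array `[1, 3, 4, 4, 52, 66]`, searching for `2` gives `1`, the index of the first element that is greater than or equal to `2`.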
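
A sketch of the intended usage: sort first, then search. `searchsorted` also accepts an array of values and returns one insertion index per value.

```python
import numpy as np

a = np.array([3, 4, 52, 66, 4, 1])
s = np.sort(a)                        # searchsorted expects sorted input
print(s)                              # [ 1  3  4  4 52 66]
print(s.searchsorted(2))              # 1: first index whose value is >= 2
pos = s.searchsorted([0, 4, 100])
print(pos)                            # [0 2 6]

# Inserting at the returned index keeps the array sorted
s2 = np.insert(s, s.searchsorted(2), 2)
```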

#### This function is handy for binning data, e.g. in histograms.

# 

Consider this example

In [84]:
arr = np.array([0, 0, 0, 1, 1, 1])
arr

array([0, 0, 0, 1, 1, 1])

In [86]:
arr.searchsorted(0)

0

In [87]:
arr.searchsorted(1)

3

In [88]:
arr.searchsorted(0.5)

3

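By default (`side='left'`) it returns the position of the FIRST occurrence of the element (the leftmost insertion point). With `side='right'` it returns the position just AFTER the last occurrence instead.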

In [90]:
arr.searchsorted(0, side= 'right')

3

In [91]:
arr.searchsorted(1, side= 'right')

6

In [94]:
arr.searchsorted(0.5, side= 'right')

3

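Nothing crazy: it does not search from the right, it just returns the rightmost valid insertion point, i.e. the first index whose value is strictly greater than the searched one.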
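
One handy consequence: on a sorted array, the gap between the `right` and `left` results is the number of occurrences of a value, and the two results bracket exactly those elements. A sketch using the same array as above:

```python
import numpy as np

a = np.array([0, 0, 0, 1, 1, 1])
left = a.searchsorted(1, side='left')    # 3: first index with a[i] >= 1
right = a.searchsorted(1, side='right')  # 6: first index with a[i] > 1
count = right - left
print(count)            # 3 occurrences of the value 1
print(a[left:right])    # [1 1 1]
```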

# 

## Amazing help of searchsorted with GroupBy
*bins*

In [103]:
arr = np.random.uniform(0, 100, 25)
arr

array([37.3912743 ,  6.73463692, 84.37081981, 90.94586855, 18.74310803,
       77.1066377 , 47.94920431, 42.32270908, 44.54148218, 80.89196116,
        5.37046425,  0.70150713, 80.25212608, 45.66414497, 16.14841578,
       82.03945326,  2.22496517, 86.5010344 , 49.58995551, 56.18438978,
       44.61657654, 22.00464988, 35.10017368, 63.55675456, 65.67221744])

In [104]:
bins = np.array([0, 10, 50, 100])

In [106]:
labels = bins.searchsorted(arr)
labels

array([2, 1, 3, 3, 2, 3, 2, 2, 2, 3, 1, 1, 3, 2, 2, 3, 1, 3, 2, 3, 2, 2,
       2, 3, 3], dtype=int64)

In [107]:
import pandas as pd

In [114]:
# So if we do the group by bins...
pd.Series(arr).groupby(labels).mean()

1     3.757893
2    36.733790
3    76.752126
dtype: float64
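
If you don't want to pull in pandas, the same per-bin mean can be sketched with plain NumPy using `np.bincount` (sum of weights per label divided by count per label). The data below is made up so the result is easy to check:

```python
import numpy as np

values = np.array([5.0, 15.0, 25.0, 60.0, 80.0])
bins = np.array([0, 10, 50, 100])
labels = bins.searchsorted(values)           # [1 2 2 3 3]

sums = np.bincount(labels, weights=values)   # per-label sum (index = label)
counts = np.bincount(labels)                 # per-label count
present = counts > 0                         # label 0 is empty in this data
means = sums[present] / counts[present]
print(np.flatnonzero(present), means)        # [1 2 3] [ 5. 20. 70.]
```

Note that with the default `side='left'` a value sitting exactly on an edge (say `10`) gets label `1`, so the bins behave like `(0, 10]`, `(10, 50]`, `(50, 100]`, and an exact `0` would land in label `0`.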

# 

# Next up
The next part introduces Numba, an amazing project that compiles Python (and NumPy) functions to machine code just in time, so they can run much faster than plain Python.