# <center><div style="width: 370px;"> ![Panel Data](pictures/Panel_Data.jpg)

# <center> Sorting

In [1]:
import pandas as pd
import numpy as np

pandas supports three kinds of sorting: sorting by index labels, sorting by column values, and sorting by a combination of both.

### By index

The `Series.sort_index()` and `DataFrame.sort_index()` methods are
used to sort a pandas object by its index levels.

In [2]:
df = pd.DataFrame(
    {
        "one": pd.Series(np.random.randn(3), index=["a", "b", "c"]),
        "two": pd.Series(np.random.randn(4), index=["a", "b", "c", "d"]),
        "three": pd.Series(np.random.randn(3), index=["b", "c", "d"]),
    }
)

df

Unnamed: 0,one,two,three
a,-0.84253,-0.441442,
b,0.014247,-0.316162,0.201699
c,-0.574427,-0.274793,-0.721859
d,,1.285545,1.163473


In [3]:
unsorted_df = df.reindex(
    index=["a", "d", "c", "b"], columns=["three", "two", "one"]
)
unsorted_df

Unnamed: 0,three,two,one
a,,-0.441442,-0.84253
d,1.163473,1.285545,
c,-0.721859,-0.274793,-0.574427
b,0.201699,-0.316162,0.014247


In [4]:
unsorted_df.sort_index()

Unnamed: 0,three,two,one
a,,-0.441442,-0.84253
b,0.201699,-0.316162,0.014247
c,-0.721859,-0.274793,-0.574427
d,1.163473,1.285545,


In [5]:
unsorted_df.sort_index(ascending=False)

Unnamed: 0,three,two,one
d,1.163473,1.285545,
c,-0.721859,-0.274793,-0.574427
b,0.201699,-0.316162,0.014247
a,,-0.441442,-0.84253


In [6]:
unsorted_df.sort_index(axis=1)

Unnamed: 0,one,three,two
a,-0.84253,,-0.441442
d,,1.163473,1.285545
c,-0.574427,-0.721859,-0.274793
b,0.014247,0.201699,-0.316162


In [7]:
unsorted_df["three"].sort_index()

a         NaN
b    0.201699
c   -0.721859
d    1.163473
Name: three, dtype: float64

> **New in version 1.1.0**
> 
> Sorting by index also supports a `key` parameter that takes a callable
function to apply to the index being sorted. For `MultiIndex` objects,
the key is applied per-level to the levels specified by `level`.

In [8]:
s1 = pd.DataFrame({"a": ["B", "a", "C"], "b": [1, 2, 3], "c": [2, 3, 4]}).set_index(
    list("ab")
)

s1

Unnamed: 0_level_0,Unnamed: 1_level_0,c
a,b,Unnamed: 2_level_1
B,1,2
a,2,3
C,3,4


In [9]:
s1.sort_index(level="a")

Unnamed: 0_level_0,Unnamed: 1_level_0,c
a,b,Unnamed: 2_level_1
B,1,2
C,3,4
a,2,3


In [10]:
s1.sort_index(level="a", key=lambda idx: idx.str.lower())

Unnamed: 0_level_0,Unnamed: 1_level_0,c
a,b,Unnamed: 2_level_1
a,2,3
B,1,2
C,3,4


For information on key sorting by value, see [value sorting](https://pandas.pydata.org/docs/user_guide/basics.html#basics-sort-value-key).

### By values

The `Series.sort_values()` method is used to sort a `Series` by its values. The
`DataFrame.sort_values()` method is used to sort a `DataFrame` by its column or row values.
The optional `by` parameter to `DataFrame.sort_values()` may be used to specify one or more columns
that determine the sorted order.

In [11]:
df1 = pd.DataFrame(
    {"one": [2, 1, 1, 1], "two": [1, 3, 2, 4], "three": [5, 4, 3, 2]}
)

df1

Unnamed: 0,one,two,three
0,2,1,5
1,1,3,4
2,1,2,3
3,1,4,2


In [12]:
df1.sort_values(by="two")

Unnamed: 0,one,two,three
0,2,1,5
2,1,2,3
1,1,3,4
3,1,4,2


The `by` parameter can take a list of column names, e.g.:

In [13]:
df1[["one", "two", "three"]].sort_values(by=["one", "two"])

Unnamed: 0,one,two,three
2,1,2,3
1,1,3,4
3,1,4,2
0,2,1,5


These methods have special treatment of NA values via the `na_position`
argument:

In [14]:
s = pd.Series(
    ["A", "B", "C", "Aaba", "Baca", np.nan, "CABA", "dog", "cat"], dtype="string"
)
s[2] = np.nan

s

0       A
1       B
2    <NA>
3    Aaba
4    Baca
5    <NA>
6    CABA
7     dog
8     cat
dtype: string

In [15]:
s.sort_values()

0       A
3    Aaba
1       B
4    Baca
6    CABA
8     cat
7     dog
2    <NA>
5    <NA>
dtype: string

In [16]:
s.sort_values(na_position="first")

2    <NA>
5    <NA>
0       A
3    Aaba
1       B
4    Baca
6    CABA
8     cat
7     dog
dtype: string
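
The same `na_position` argument is accepted by `DataFrame.sort_values()`. The following is a minimal sketch; the frame `df_na` and its column names are made up for illustration and do not appear elsewhere in this section.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in the sort column.
df_na = pd.DataFrame({"x": [2.0, np.nan, 1.0], "y": ["b", "c", "a"]})

# Default behaviour: rows with a missing sort value are placed last.
na_last = df_na.sort_values(by="x")

# na_position="first" moves those rows to the top instead.
na_first = df_na.sort_values(by="x", na_position="first")

print(na_last.index.tolist())   # [2, 0, 1]
print(na_first.index.tolist())  # [1, 2, 0]
```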

> **New in version 1.1.0**
> 
> Sorting also supports a `key` parameter that takes a callable function
to apply to the values being sorted.

In [18]:
s1 = pd.Series(["B", "a", "C"])

In [19]:
s1.sort_values()

0    B
2    C
1    a
dtype: object

In [20]:
s1.sort_values(key=lambda x: x.str.lower())

1    a
0    B
2    C
dtype: object

`key` will be given the `Series` of values and should return a `Series`
or array of the same shape with the transformed values. For `DataFrame` objects,
the key is applied per column, so the key should still expect a Series and return
a Series, e.g.

In [21]:
df = pd.DataFrame({"a": ["B", "a", "C"], "b": [1, 2, 3]})

df

Unnamed: 0,a,b
0,B,1
1,a,2
2,C,3


In [22]:
df.sort_values(by="a")

Unnamed: 0,a,b
0,B,1
2,C,3
1,a,2


In [23]:
df.sort_values(by="a", key=lambda col: col.str.lower())

Unnamed: 0,a,b
1,a,2
0,B,1
2,C,3


The name or type of each column can be used to apply different functions to
different columns.
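
As a rough sketch of dispatching on the column name, the key function below lowercases column `"a"` and takes the absolute value of column `"b"`. The frame `df_key` and the helper `per_column_key` are hypothetical names introduced only for this example.

```python
import pandas as pd

# Hypothetical data: mixed-case strings and signed integers.
df_key = pd.DataFrame({"a": ["B", "a", "b", "C"], "b": [-5, 2, 3, 1]})


def per_column_key(col):
    # The key is called once per `by` column and receives it as a Series,
    # so `col.name` identifies which column is being transformed.
    if col.name == "a":
        return col.str.lower()  # case-insensitive ordering for "a"
    if col.name == "b":
        return col.abs()  # order "b" by magnitude
    return col


result = df_key.sort_values(by=["a", "b"], key=per_column_key)
print(result.index.tolist())  # [1, 2, 0, 3]
```

Rows 0 and 2 tie on the lowercased `"a"` value, so the absolute value of `"b"` decides their order (3 before 5).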

### By indexes and values

Strings passed as the `by` parameter to `DataFrame.sort_values()` may
refer to either columns or index level names.

In [24]:
idx = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("a", 2), ("b", 2), ("b", 1), ("b", 1)]
)

In [25]:
idx.names = ["first", "second"]
df_multi = pd.DataFrame({"A": np.arange(6, 0, -1)}, index=idx)
df_multi

Unnamed: 0_level_0,Unnamed: 1_level_0,A
first,second,Unnamed: 2_level_1
a,1,6
a,2,5
a,2,4
b,2,3
b,1,2
b,1,1


Sort by `second` (index level) and `A` (column):

In [26]:
df_multi.sort_values(by=["second", "A"])

Unnamed: 0_level_0,Unnamed: 1_level_0,A
first,second,Unnamed: 2_level_1
b,1,1
b,1,2
a,1,6
b,2,3
a,2,4
a,2,5


> **Note:**
> 
> If a string matches both a column name and an index level name, the reference is
ambiguous. Older pandas versions issued a warning and let the column take precedence;
recent versions raise a `ValueError` instead.

### searchsorted

`Series` has the [`searchsorted()`](https://pandas.pydata.org/docs/reference/api/pandas.Series.searchsorted.html#pandas.Series.searchsorted) method, which works similarly to
[`numpy.ndarray.searchsorted()`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.searchsorted.html#numpy.ndarray.searchsorted): it returns the positions at which the given values would have to be inserted to keep a sorted `Series` in order.

In [28]:
ser = pd.Series([1, 2, 3])

In [29]:
ser.searchsorted([0, 3])

array([0, 2])

In [30]:
ser.searchsorted([0, 4])

array([0, 3])

In [31]:
ser.searchsorted([1, 3], side="right")

array([1, 3])

In [32]:
ser.searchsorted([1, 3], side="left")

array([0, 2])

In [33]:
ser = pd.Series([3, 1, 2])

In [34]:
ser.searchsorted([0, 3], sorter=np.argsort(ser))

array([0, 2])
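
To see what `sorter` does, the sketch below compares the result against sorting the values first and then searching. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

ser_unsorted = pd.Series([3, 1, 2])

# argsort gives the positions that would sort the values: [1, 2, 0].
order = np.argsort(ser_unsorted.to_numpy())

# Search the unsorted Series, telling searchsorted how to order it.
positions = ser_unsorted.searchsorted([0, 3], sorter=order)

# Equivalent: sort the values explicitly, then search without a sorter.
sorted_values = ser_unsorted.to_numpy()[order]  # [1, 2, 3]
same = pd.Series(sorted_values).searchsorted([0, 3])

print(positions.tolist())  # [0, 2]
print(same.tolist())       # [0, 2]
```

The returned positions refer to the sorted order, not to the original positions in `ser_unsorted`.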

### smallest / largest values

`Series` has the `nsmallest()` and `nlargest()` methods, which return the
smallest or largest `n` values. For a large `Series` this can be much
faster than sorting the entire `Series` and calling `head(n)` on the result.

In [35]:
s = pd.Series(np.random.permutation(10))
s

0    7
1    3
2    0
3    9
4    6
5    8
6    5
7    2
8    1
9    4
dtype: int64

In [36]:
s.sort_values()

2    0
8    1
7    2
1    3
9    4
6    5
4    6
0    7
5    8
3    9
dtype: int64

In [37]:
s.nsmallest(3)

2    0
8    1
7    2
dtype: int64

In [38]:
s.nlargest(3)

3    9
5    8
0    7
dtype: int64
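
As a quick sanity check of the equivalence described above, this sketch compares `nsmallest(3)` with a full sort followed by `head(3)`. It uses a seeded permutation so that all values are unique; the name `s_big` and the size of 1000 are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

# Seeded permutation of 0..999: every value is unique, so there are no ties.
rng = np.random.default_rng(0)
s_big = pd.Series(rng.permutation(1000))

top_fast = s_big.nsmallest(3)               # partial selection
top_sorted = s_big.sort_values().head(3)    # full sort, then take 3

print(top_fast.tolist())            # [0, 1, 2]
print(top_fast.equals(top_sorted))  # True
```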

`DataFrame` also has the `nlargest` and `nsmallest` methods.

In [39]:
df = pd.DataFrame(
    {
        "a": [-2, -1, 1, 10, 8, 11, -1],
        "b": list("abdceff"),
        "c": [1.0, 2.0, 4.0, 3.2, np.nan, 3.0, 4.0],
    }
)

In [40]:
df.nlargest(3, "a")

Unnamed: 0,a,b,c
5,11,f,3.0
3,10,c,3.2
4,8,e,


In [41]:
df.nlargest(5, ["a", "c"])

Unnamed: 0,a,b,c
5,11,f,3.0
3,10,c,3.2
4,8,e,
2,1,d,4.0
6,-1,f,4.0


In [42]:
df.nsmallest(3, "a")

Unnamed: 0,a,b,c
0,-2,a,1.0
1,-1,b,2.0
6,-1,f,4.0


In [43]:
df.nsmallest(5, ["a", "c"])

Unnamed: 0,a,b,c
0,-2,a,1.0
1,-1,b,2.0
6,-1,f,4.0
2,1,d,4.0
4,8,e,


### Sorting by a MultiIndex column

When the columns are a `MultiIndex`, you must be explicit about what to sort by: pass `by` a tuple
that specifies every level of the column.

In [44]:
df1.columns = pd.MultiIndex.from_tuples(
    [("a", "one"), ("a", "two"), ("b", "three")]
)

In [45]:
df1.sort_values(by=("a", "two"))

Unnamed: 0_level_0,a,a,b
Unnamed: 0_level_1,one,two,three
0,2,1,5
2,1,2,3
1,1,3,4
3,1,4,2
