# Sorting and ranking

Sorting a dataset by a criterion is another important built-in operation. Sorting lexicographically by row or column index is already described in the section [Rearranging and sorting levels](indexing.ipynb#Rearranging-and-Sorting-Levels). In the following we look at sorting by values with [DataFrame.sort_values](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sort_values.html) and [Series.sort_values](https://pandas.pydata.org/docs/reference/api/pandas.Series.sort_values.html). As a reminder, `sort_index` with `ascending=False` sorts a Series by its index labels in descending order:

In [1]:
import numpy as np
import pandas as pd


rng = np.random.default_rng()
s = pd.Series(rng.normal(size=7))

s.sort_index(ascending=False)

6   -0.287551
5   -0.073895
4    0.077808
3    0.647918
2    1.370572
1   -0.071934
0    0.823556
dtype: float64

By default, `sort_values` places all missing values at the end of the Series:

In [2]:
s = pd.Series(rng.normal(size=7))
s[s < 0] = np.nan

s.sort_values()

5    0.502380
3    1.347849
4    1.488811
0         NaN
1         NaN
2         NaN
6         NaN
dtype: float64
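
The placement of missing values can be changed with the `na_position` parameter of `sort_values`. The following is a minimal sketch on a small hand-made Series; the variable names `last` and `first` are only illustrative:

```python
import numpy as np
import pandas as pd

s = pd.Series([0.5, np.nan, 1.3, np.nan, 0.1])

# Default (na_position="last"): NaN entries follow all valid values
last = s.sort_values()

# na_position="first" moves the NaN entries to the front instead
first = s.sort_values(na_position="first")

print(last)
print(first)
```

In both cases the valid values are sorted in ascending order; only the position of the `NaN` block changes.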

With a DataFrame you can sort along either axis. With `by` you specify the column label or labels (or, with `axis=1`, the row labels) whose values determine the order:

In [3]:
df = pd.DataFrame(rng.normal(size=(7, 3)))

df.sort_values(by=2, ascending=False)

|   | 0 | 1 | 2 |
| :--- | ---: | ---: | ---: |
| 1 | -0.12228 | -0.013553 | 1.622476 |
| 2 | -0.316663 | 0.823117 | 0.678331 |
| 5 | 0.545206 | -1.685777 | 0.533224 |
| 4 | 0.661617 | 0.054888 | -0.228683 |
| 6 | -0.36861 | 1.41995 | -0.467401 |
| 0 | 0.701885 | 0.046049 | -1.685828 |
| 3 | 0.537244 | 1.251408 | -2.482741 |
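
`by` also accepts a list of labels, and `ascending` can then be a list of the same length to set the direction per key. A small sketch with made-up data (the column names `group` and `score` are assumptions for illustration only):

```python
import pandas as pd

frame = pd.DataFrame(
    {"group": ["b", "a", "b", "a"], "score": [1.0, 3.0, 2.0, 4.0]}
)

# Sort by group in ascending order, then by score in descending order
# within each group
ordered = frame.sort_values(by=["group", "score"], ascending=[True, False])

print(ordered)
print(ordered.index.tolist())  # [3, 1, 2, 0]
```

The first key decides the overall order; later keys only break ties among rows that agree on the earlier ones.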


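With `axis=1`, `by` refers to row labels, and the columns are reordered according to the values in those rows. Here the column order does not change because the values in row `0` are already in descending order: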

In [4]:
df.sort_values(axis=1, by=[0, 1], ascending=False)

|   | 0 | 1 | 2 |
| :--- | ---: | ---: | ---: |
| 0 | 0.701885 | 0.046049 | -1.685828 |
| 1 | -0.12228 | -0.013553 | 1.622476 |
| 2 | -0.316663 | 0.823117 | 0.678331 |
| 3 | 0.537244 | 1.251408 | -2.482741 |
| 4 | 0.661617 | 0.054888 | -0.228683 |
| 5 | 0.545206 | -1.685777 | 0.533224 |
| 6 | -0.36861 | 1.41995 | -0.467401 |
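
Because the random data above happens to leave the column order unchanged, the effect of `axis=1` is easier to see on a tiny deterministic frame. This is an illustrative sketch; the column names `a`, `b` and `c` are assumptions:

```python
import pandas as pd

frame = pd.DataFrame([[3, 1, 2], [10, 30, 20]], columns=["a", "b", "c"])

# by=0 refers to the row labelled 0; the columns are reordered so that
# the values in that row are ascending
reordered = frame.sort_values(by=0, axis=1)

print(reordered)
print(reordered.columns.tolist())  # ['b', 'c', 'a']
```

The rows keep their positions; only the columns move.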


## Ranking

[DataFrame.rank](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rank.html) and [Series.rank](https://pandas.pydata.org/docs/reference/api/pandas.Series.rank.html) assign ranks from one to the number of valid data points. For a DataFrame, each column is ranked separately by default:

In [5]:
df.rank()

|   | 0 | 1 | 2 |
| :--- | ---: | ---: | ---: |
| 0 | 7.0 | 3.0 | 2.0 |
| 1 | 3.0 | 2.0 | 7.0 |
| 2 | 2.0 | 5.0 | 6.0 |
| 3 | 4.0 | 6.0 | 1.0 |
| 4 | 6.0 | 4.0 | 4.0 |
| 5 | 5.0 | 1.0 | 5.0 |
| 6 | 1.0 | 7.0 | 3.0 |
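
"Valid data points" means that missing values are not counted. A minimal sketch on a hand-made Series, assuming the default `na_option="keep"`:

```python
import numpy as np
import pandas as pd

s = pd.Series([0.3, np.nan, -1.2, 2.5])

# NaN keeps a NaN rank; the three valid values receive ranks 1 to 3
ranks = s.rank()
print(ranks.tolist())  # [2.0, nan, 1.0, 3.0]

# ascending=False gives rank 1 to the largest value instead
descending = s.rank(ascending=False)
print(descending.tolist())  # [2.0, nan, 3.0, 1.0]
```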


If ties occur, each member of a tied group is by default assigned the average of the ranks that the group spans (`method="average"`):

In [6]:
df2 = pd.concat([df, df[5:]])

df2.rank()

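|   | 0 | 1 | 2 |
| :--- | ---: | ---: | ---: |
| 0 | 9.0 | 4.0 | 2.0 |
| 1 | 4.0 | 3.0 | 9.0 |
| 2 | 3.0 | 6.0 | 8.0 |
| 3 | 5.0 | 7.0 | 1.0 |
| 4 | 8.0 | 5.0 | 5.0 |
| 5 | 6.5 | 1.5 | 6.5 |
| 6 | 1.5 | 8.5 | 3.5 |
| 5 | 6.5 | 1.5 | 6.5 |
| 6 | 1.5 | 8.5 | 3.5 |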


`method="min"`, on the other hand, assigns the smallest rank in the group to every member of the group:

In [7]:
df2.rank(method="min")

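|   | 0 | 1 | 2 |
| :--- | ---: | ---: | ---: |
| 0 | 9.0 | 4.0 | 2.0 |
| 1 | 4.0 | 3.0 | 9.0 |
| 2 | 3.0 | 6.0 | 8.0 |
| 3 | 5.0 | 7.0 | 1.0 |
| 4 | 8.0 | 5.0 | 5.0 |
| 5 | 6.0 | 1.0 | 6.0 |
| 6 | 1.0 | 8.0 | 3.0 |
| 5 | 6.0 | 1.0 | 6.0 |
| 6 | 1.0 | 8.0 | 3.0 |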


## Tie-breaking methods for `rank`

Method | Description
:----- | :----------
`average` | default: assign the average rank to each entry in the same group
`min` | uses the minimum rank for the whole group
`max` | uses the maximum rank for the whole group
`first` | assigns ranks to tied values in the order in which they appear in the data
`dense` | like `min`, but the rank increases by exactly 1 from one group to the next, regardless of how many items each group contains
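
The differences between the methods in the table are easiest to see side by side. The following sketch ranks one small Series with a three-way tie using each method; the data and the name `comparison` are made up for illustration:

```python
import pandas as pd

# The value 20 occurs three times and occupies the ranks 2, 3 and 4
s = pd.Series([10, 20, 20, 30, 20])

methods = ["average", "min", "max", "first", "dense"]
comparison = pd.DataFrame({m: s.rank(method=m) for m in methods})

print(comparison)
# average: 1, 3, 3, 5, 3
# min:     1, 2, 2, 5, 2
# max:     1, 4, 4, 5, 4
# first:   1, 2, 3, 5, 4
# dense:   1, 2, 2, 3, 2
```

Note that only `dense` gives the value 30 the rank 3; all other methods leave a gap after the tied group.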