# Sorting and Ranking

Sorting a data set by some criterion is another important built-in operation. To sort lexicographically by row or column index, use the *sort_index* method, which returns a new, sorted object and leaves the original unchanged:

In [2]:
import pandas as pd
import numpy as np
from pandas import Series, DataFrame

In [3]:
obj = Series(np.arange(4), index=list('acdb'))

obj

a    0
c    1
d    2
b    3
dtype: int32

In [4]:
obj.sort_index()

a    0
b    3
c    1
d    2
dtype: int32
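
As a small sketch of the "returns a new object" behaviour described above, the snippet below keeps both the original and the sorted Series and compares their index order. The variable name `sorted_obj` is only illustrative.

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(4), index=list("acdb"))

# sort_index returns a sorted copy; obj itself is not modified.
sorted_obj = obj.sort_index()

print(list(obj.index))         # ['a', 'c', 'd', 'b']
print(list(sorted_obj.index))  # ['a', 'b', 'c', 'd']
```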

With a DataFrame, you can sort by index on either axis:

In [5]:
frame = DataFrame(np.arange(8).reshape(2,4), index=list('ba'), columns=list('dabc'))

frame

   d  a  b  c
b  0  1  2  3
a  4  5  6  7


In [6]:
frame.sort_index(axis=0)

   d  a  b  c
a  4  5  6  7
b  0  1  2  3


The data is sorted in ascending order by default, but can be sorted in descending order, too:

In [7]:
frame.sort_index(axis=1, ascending=False)

   d  c  b  a
b  0  3  2  1
a  4  7  6  5


To sort a Series by its values, use its *sort_values* method:

In [8]:
obj = Series([4,7,-3,2])

obj.sort_values()

2   -3
3    2
0    4
1    7
dtype: int64

Any missing values are sorted to the end of the Series by default:

In [9]:
obj = Series([4, np.nan, 7, np.nan, -3, 2])

obj

0    4.0
1    NaN
2    7.0
3    NaN
4   -3.0
5    2.0
dtype: float64

In [10]:
obj.sort_values()

4   -3.0
5    2.0
0    4.0
2    7.0
1    NaN
3    NaN
dtype: float64
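
The placement of missing values can be changed. The sketch below assumes the `na_position` parameter of `sort_values`, which accepts `'first'` or `'last'`, and also shows that a descending sort still leaves NaN at the end by default. The names `nan_first` and `descending` are illustrative.

```python
import numpy as np
import pandas as pd

obj = pd.Series([4, np.nan, 7, np.nan, -3, 2])

# na_position="first" moves the missing values to the front instead of the end.
nan_first = obj.sort_values(na_position="first")
print(nan_first)

# With ascending=False the valid values are reversed, but NaN stays last.
descending = obj.sort_values(ascending=False)
print(descending)
```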

In [11]:
frame = DataFrame({'a': [1,2,3,4], 'b': [1,4,3,4]})

frame

   a  b
0  1  1
1  2  4
2  3  3
3  4  4


In [12]:
frame.sort_index()

   a  b
0  1  1
1  2  4
2  3  3
3  4  4
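
The call above sorts by index, and since this frame's index is already in order the result looks the same as the input. To sort a DataFrame by the values in one or more columns instead, one option is `sort_values` with its `by` argument. A minimal sketch on the same frame, with illustrative result names:

```python
import pandas as pd

frame = pd.DataFrame({"a": [1, 2, 3, 4], "b": [1, 4, 3, 4]})

# Sort rows by the values in column "b" (ascending).
by_b = frame.sort_values(by="b")
print(by_b)

# Sort by "b" descending, breaking ties with "a" ascending.
by_b_then_a = frame.sort_values(by=["b", "a"], ascending=[False, True])
print(by_b_then_a)
```

Passing a list to `by` sorts on the first column and uses the later ones only to order rows that tie on the earlier ones.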


*Ranking* is closely related to sorting, assigning ranks from one through the number of valid data points in an array. It is similar to the indirect sort indices produced by numpy.argsort, except that ties are broken according to a rule. Both Series and DataFrame have a *rank* method; by default, *rank* breaks ties by assigning each member of a tied group the mean of the ranks that group occupies:

In [13]:
obj = Series([7, -5, 7, 4, 2, 0, 4])

obj

0    7
1   -5
2    7
3    4
4    2
5    0
6    4
dtype: int64

In [14]:
obj.rank()

0    6.5
1    1.0
2    6.5
3    4.5
4    3.0
5    2.0
6    4.5
dtype: float64
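
To make the comparison with numpy.argsort concrete, here is a rough sketch that derives ranks by applying argsort twice. It uses a stable sort so that ties are ordered by position, which is an assumption of this example and not something `rank` does by default.

```python
import numpy as np
import pandas as pd

obj = pd.Series([7, -5, 7, 4, 2, 0, 4])

# argsort gives the positions that would sort the data...
order = np.argsort(obj.to_numpy(), kind="stable")

# ...and applying argsort again turns those positions into 0-based ranks.
# Adding 1 makes them comparable with rank(); tied values get distinct ranks.
argsort_ranks = np.argsort(order, kind="stable") + 1

print(argsort_ranks.tolist())  # [6, 1, 7, 4, 3, 2, 5]
print(obj.rank().tolist())     # [6.5, 1.0, 6.5, 4.5, 3.0, 2.0, 4.5]
```

The two results differ only at the tied values 7 and 4, where `rank` averages the ranks that the double argsort assigns separately.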

Ranks can also be assigned according to the order they’re observed in the data:

In [17]:
obj.rank(method='first')

0    6.0
1    1.0
2    7.0
3    4.0
4    3.0
5    2.0
6    5.0
dtype: float64

Naturally, you can rank in descending order, too:

In [18]:
obj.rank(ascending=False, method='max')

0    2.0
1    7.0
2    2.0
3    4.0
4    5.0
5    6.0
6    4.0
dtype: float64
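
The tie-breaking rules are easier to compare side by side. The sketch below builds one column per `method` value on the same Series; `comparison` is an illustrative name.

```python
import pandas as pd

obj = pd.Series([7, -5, 7, 4, 2, 0, 4])

# One column per tie-breaking method, so the differences are easy to compare.
methods = ["average", "min", "max", "first", "dense"]
comparison = pd.DataFrame({m: obj.rank(method=m) for m in methods})
print(comparison)
```

Only the rows holding the tied values (7 and 4) differ between columns; with `'dense'`, ranks increase by one between groups, so the largest rank equals the number of distinct values.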

A DataFrame can compute ranks over the rows or the columns:

In [20]:
frame = DataFrame({'b': [4.3, 7, -3, 2], 'a': [0, 1, 0, 1],
                    'c': [-2, 5, 8, -2.5]})


frame

     b  a    c
0  4.3  0 -2.0
1  7.0  1  5.0
2 -3.0  0  8.0
3  2.0  1 -2.5


In [24]:
frame.rank(axis=1)

     b    a    c
0  3.0  2.0  1.0
1  3.0  1.0  2.0
2  1.0  2.0  3.0
3  3.0  2.0  1.0
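
For contrast with the row-wise ranking above, this sketch ranks the same frame down each column, which is what `rank` does when `axis` is left at its default of 0. The name `col_ranks` is illustrative.

```python
import pandas as pd

frame = pd.DataFrame({"b": [4.3, 7, -3, 2], "a": [0, 1, 0, 1],
                      "c": [-2, 5, 8, -2.5]})

# Default axis=0: each column is ranked independently, top to bottom.
col_ranks = frame.rank()
print(col_ranks)
```

Column `a` contains ties, so its ranks are averaged (1.5 and 3.5), while `b` and `c` receive whole-number ranks.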


![Tie-breaking methods with rank](../../Pictures/Tie-breaking%20methods%20with%20rank.png)