In [5]:
import pandas as pd

In [6]:
df = pd.DataFrame({
  'playerID': ['pedrodu01', 'pedrodu01','troutmi01', 'cruzne02', 'cruzne02','troutmi01'],
  'yearID': [2016, 2017, 2017, 2016, 2017, 2016],
  'teamID': ['BOS', 'BOS', 'LAA', 'SEA', 'SEA', 'LAA'],
  'HR': [15, 7, 33, 43, 39, 29]})

# A. Sorting by feature
When a dataset has many features, it is often useful to sort it by one or more of them. Sorting makes the data easier to scan and helps reveal trends in the values.

In pandas, the DataFrame method sort_values sorts a DataFrame by one or more of its columns. Its first argument (the by parameter) is either a single column label or a list of column labels to sort by.

The ascending keyword argument allows us to specify whether to sort in ascending or descending order (default is ascending order, i.e. ascending=True).
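As a quick sketch using the same df, the first argument can also be passed by keyword as by=, and ascending flips the direction. 'HR' is used here because it has no tied values in this small example, so the resulting order is unambiguous:

```python
import pandas as pd

df = pd.DataFrame({
  'playerID': ['pedrodu01', 'pedrodu01','troutmi01', 'cruzne02', 'cruzne02','troutmi01'],
  'yearID': [2016, 2017, 2017, 2016, 2017, 2016],
  'teamID': ['BOS', 'BOS', 'LAA', 'SEA', 'SEA', 'LAA'],
  'HR': [15, 7, 33, 43, 39, 29]})

# Positional and keyword forms of the first argument ('by') are equivalent
asc = df.sort_values(by='HR')                 # ascending=True is the default
desc = df.sort_values('HR', ascending=False)  # largest HR values first

print(asc['HR'].tolist())   # [7, 15, 29, 33, 39, 43]
print(desc['HR'].tolist())  # [43, 39, 33, 29, 15, 7]
```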

The code below demonstrates how to use sort_values with a single column label. The first example sorts by 'yearID' in ascending order, while the second sorts by 'playerID' in descending lexicographic (alphabetical) order.

In [7]:
# df is predefined
print('{}\n'.format(df))

sort1 = df.sort_values('yearID')  # sort by 'yearID'; ascending=True by default
print('{}\n'.format(sort1))

sort2 = df.sort_values('playerID', ascending=False)  # sort by 'playerID' in descending order
print('{}\n'.format(sort2))

    playerID  yearID teamID  HR
0  pedrodu01    2016    BOS  15
1  pedrodu01    2017    BOS   7
2  troutmi01    2017    LAA  33
3   cruzne02    2016    SEA  43
4   cruzne02    2017    SEA  39
5  troutmi01    2016    LAA  29

    playerID  yearID teamID  HR
0  pedrodu01    2016    BOS  15
3   cruzne02    2016    SEA  43
5  troutmi01    2016    LAA  29
1  pedrodu01    2017    BOS   7
2  troutmi01    2017    LAA  33
4   cruzne02    2017    SEA  39

    playerID  yearID teamID  HR
2  troutmi01    2017    LAA  33
5  troutmi01    2016    LAA  29
0  pedrodu01    2016    BOS  15
1  pedrodu01    2017    BOS   7
3   cruzne02    2016    SEA  43
4   cruzne02    2017    SEA  39
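In the outputs above, the row labels (0 to 5) travel with their rows. sort_values returns a new, reordered DataFrame and, by default, leaves df itself unchanged. A minimal sketch checking both points, and renumbering the result with reset_index(drop=True) in case a fresh 0 to n-1 index is wanted (sorting by 'HR' again to avoid ties):

```python
import pandas as pd

df = pd.DataFrame({
  'playerID': ['pedrodu01', 'pedrodu01','troutmi01', 'cruzne02', 'cruzne02','troutmi01'],
  'yearID': [2016, 2017, 2017, 2016, 2017, 2016],
  'teamID': ['BOS', 'BOS', 'LAA', 'SEA', 'SEA', 'LAA'],
  'HR': [15, 7, 33, 43, 39, 29]})

sorted_df = df.sort_values('HR')

# The original index labels are kept, just reordered
print(sorted_df.index.tolist())  # [1, 0, 5, 2, 4, 3]
# df itself is untouched because sort_values returned a new DataFrame
print(df.index.tolist())         # [0, 1, 2, 3, 4, 5]

# Discard the old labels and number the sorted rows from 0
renumbered = sorted_df.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3, 4, 5]
```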



When sorting with a list of column labels, each additional label is used to break ties. Specifically, label i in the list is used only to order rows that are tied on all of the labels before it.
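One way to see this tiebreaking rule is to compare sort_values with Python's built-in sorted using a tuple key: tuples compare element by element, and a later element matters only when all earlier ones are equal. This is an illustrative check on the same df, not a description of how pandas sorts internally:

```python
import pandas as pd

df = pd.DataFrame({
  'playerID': ['pedrodu01', 'pedrodu01','troutmi01', 'cruzne02', 'cruzne02','troutmi01'],
  'yearID': [2016, 2017, 2017, 2016, 2017, 2016],
  'teamID': ['BOS', 'BOS', 'LAA', 'SEA', 'SEA', 'LAA'],
  'HR': [15, 7, 33, 43, 39, 29]})

by_pandas = df.sort_values(['yearID', 'playerID']).index.tolist()

# Tuple keys compare lexicographically: playerID is consulted only when yearID ties
by_python = sorted(df.index, key=lambda i: (df.at[i, 'yearID'], df.at[i, 'playerID']))

print(by_pandas)               # [3, 0, 5, 4, 1, 2]
print(by_python == by_pandas)  # True
```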

The code below demonstrates how to sort with a list of column labels.

In [8]:
# df is predefined
print('{}\n'.format(df))

sort1 = df.sort_values(['yearID', 'playerID'])  # sort by 'yearID' first; rows with the same year are then sorted by 'playerID'
print('{}\n'.format(sort1))

sort2 = df.sort_values(['yearID', 'HR'],
                       ascending=[True, False])  # 'yearID' in ascending order, ties broken by 'HR' in descending order
print('{}\n'.format(sort2))

    playerID  yearID teamID  HR
0  pedrodu01    2016    BOS  15
1  pedrodu01    2017    BOS   7
2  troutmi01    2017    LAA  33
3   cruzne02    2016    SEA  43
4   cruzne02    2017    SEA  39
5  troutmi01    2016    LAA  29

    playerID  yearID teamID  HR
3   cruzne02    2016    SEA  43
0  pedrodu01    2016    BOS  15
5  troutmi01    2016    LAA  29
4   cruzne02    2017    SEA  39
1  pedrodu01    2017    BOS   7
2  troutmi01    2017    LAA  33

    playerID  yearID teamID  HR
3   cruzne02    2016    SEA  43
5  troutmi01    2016    LAA  29
0  pedrodu01    2016    BOS  15
4   cruzne02    2017    SEA  39
2  troutmi01    2017    LAA  33
1  pedrodu01    2017    BOS   7



When sorting by two column labels, the first label in the list is the main sorting criterion, and the second is used to break ties. In the example that sorts by 'yearID' and 'playerID', the DataFrame is first sorted by year (in ascending order); rows with the same year are then ordered by player ID (also in ascending order).
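Because the first label is the primary criterion, the order of labels in the list matters. A short sketch with the same df, swapping the two labels so that each player's seasons end up next to each other:

```python
import pandas as pd

df = pd.DataFrame({
  'playerID': ['pedrodu01', 'pedrodu01','troutmi01', 'cruzne02', 'cruzne02','troutmi01'],
  'yearID': [2016, 2017, 2017, 2016, 2017, 2016],
  'teamID': ['BOS', 'BOS', 'LAA', 'SEA', 'SEA', 'LAA'],
  'HR': [15, 7, 33, 43, 39, 29]})

# Year is primary: all 2016 rows come before all 2017 rows
sort_year_first = df.sort_values(['yearID', 'playerID'])
# Player is primary: each player's rows are grouped, ordered by year within the player
sort_player_first = df.sort_values(['playerID', 'yearID'])

print(sort_year_first.index.tolist())    # [3, 0, 5, 4, 1, 2]
print(sort_player_first.index.tolist())  # [3, 4, 0, 1, 5, 2]
```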

When sort_values receives multiple labels, ascending can be a list with one boolean per label, so each column can have its own sort order. In our second example, 'yearID' is sorted in ascending order, while 'HR' is sorted in descending order within each year.
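To contrast the two forms of ascending, here is a small sketch on the same df: a list gives each label its own direction (one entry per label), while a single boolean applies to every label in the list:

```python
import pandas as pd

df = pd.DataFrame({
  'playerID': ['pedrodu01', 'pedrodu01','troutmi01', 'cruzne02', 'cruzne02','troutmi01'],
  'yearID': [2016, 2017, 2017, 2016, 2017, 2016],
  'teamID': ['BOS', 'BOS', 'LAA', 'SEA', 'SEA', 'LAA'],
  'HR': [15, 7, 33, 43, 39, 29]})

# One boolean per label: newest year first, then fewest home runs first
mixed = df.sort_values(['yearID', 'HR'], ascending=[False, True])
# A single boolean applies to both labels: newest year first, then most home runs first
all_desc = df.sort_values(['yearID', 'HR'], ascending=False)

print(mixed.index.tolist())     # [1, 2, 4, 0, 5, 3]
print(all_desc.index.tolist())  # [4, 2, 1, 3, 5, 0]
```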