# Sorting

In [None]:
import polars as pl

csv_file = 'Titanic.csv'

df = pl.read_csv(csv_file)
df.head(3)

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
i64,i64,i64,str,str,f64,i64,i64,str,f64,str,str
1,0,3,"""Braund, Mr. Ow…","""male""",22.0,1,0,"""A/5 21171""",7.25,,"""S"""
2,1,1,"""Cumings, Mrs. …","""female""",38.0,1,0,"""PC 17599""",71.2833,"""C85""","""C"""
3,1,3,"""Heikkinen, Mis…","""female""",26.0,0,0,"""STON/O2. 31012…",7.925,,"""S"""


### Sorting a DataFrame

We can sort a DataFrame object on a column with the sort method.

In [None]:
# An auxiliary step to display only the first 2 and last 2 rows when printing a DataFrame

pl.Config.set_tbl_rows(4)

df.sort('Age')

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
i64,i64,i64,str,str,f64,i64,i64,str,f64,str,str
6,0,3,"""Moran, Mr. Jam…","""male""",,0,0,"""330877""",8.4583,,"""Q"""
18,1,2,"""Williams, Mr. …","""male""",,0,0,"""244373""",13.0,,"""S"""
…,…,…,…,…,…,…,…,…,…,…,…
852,0,3,"""Svensson, Mr. …","""male""",74.0,0,0,"""347060""",7.775,,"""S"""
631,1,1,"""Barkworth, Mr.…","""male""",80.0,0,0,"""27042""",30.0,"""A23""","""S"""


By default, null values are placed at the start of the sorted output. We can move them to the end by passing nulls_last=True to sort.

We can also sort based on multiple columns with a list.

In [None]:
df.sort(['Pclass', 'Age'])

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
i64,i64,i64,str,str,f64,i64,i64,str,f64,str,str
32,1,1,"""Spencer, Mrs. …","""female""",,1,0,"""PC 17569""",146.5208,"""B78""","""C"""
56,1,1,"""Woolner, Mr. H…","""male""",,0,0,"""19947""",35.5,"""C52""","""S"""
…,…,…,…,…,…,…,…,…,…,…,…
117,0,3,"""Connors, Mr. P…","""male""",70.5,0,0,"""370369""",7.75,,"""Q"""
852,0,3,"""Svensson, Mr. …","""male""",74.0,0,0,"""347060""",7.775,,"""S"""


### Sorting a column with an expression

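We can transform a column into sorted order with an expression. In the example below we sort all columns individually. Because each column is sorted independently of the others, the values in a row of the output no longer belong to the same passenger.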

In [None]:
df.select(pl.all().sort())

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
i64,i64,i64,str,str,f64,i64,i64,str,f64,str,str
1,0,1,"""Abbing, Mr. An…","""female""",,0,0,"""110152""",0.0,,
2,0,1,"""Abbott, Mr. Ro…","""female""",,0,0,"""110152""",0.0,,
…,…,…,…,…,…,…,…,…,…,…,…
890,1,3,"""van Billiard, …","""male""",74.0,8,5,"""WE/P 5735""",512.3292,"""G6""","""S"""
891,1,3,"""van Melkebeke,…","""male""",80.0,8,6,"""WE/P 5735""",512.3292,"""T""","""S"""


### Taking advantage of sorted data (Fast Track algorithm)

For some operations Polars can use a fast track algorithm if it knows that the data in a column is sorted. For example, if we want the max value of a column sorted in ascending order, Polars can just take the last non-null value. You can check whether Polars thinks a column is sorted with the flags attribute.

In [None]:
df['PassengerId'].flags

{'SORTED_ASC': False, 'SORTED_DESC': False}

Since Polars doesn't think the PassengerId column is sorted in this case, the fast track algorithm won't be applied.

### Setting the sorted status

If we know that a column is sorted, we can let Polars know with the set_sorted method. Alternatively, if we sort the column with the sort method, the flag is set automatically (ascending or descending, depending on the sort order).

In [None]:
# Setting the flags manually
df = df.with_columns(pl.col('PassengerId').set_sorted())

df['PassengerId'].flags

{'SORTED_ASC': True, 'SORTED_DESC': False}

In [None]:
# Flag set automatically by an ascending sort

df = pl.read_csv(csv_file).sort('PassengerId')
df['PassengerId'].flags

{'SORTED_ASC': True, 'SORTED_DESC': False}

In [None]:
# Flag set automatically by a descending sort

df = pl.read_csv(csv_file).sort('PassengerId', descending=True)
df['PassengerId'].flags

{'SORTED_ASC': False, 'SORTED_DESC': True}