For this week's assignment we will leverage the sorting methods in pandas to sort several columns at once. 

In [None]:
import pandas as pd
Location = "gradedata.csv"
df = pd.read_csv(Location)
df.head()

Applying the method is easy enough, but it is worth peeking under the hood to see which parameters we can pass to fit our potential use cases.

In [37]:
help(df.sort_values)

Help on method sort_values in module pandas.core.frame:

sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False) method of pandas.core.frame.DataFrame instance
    Sort by the values along either axis.
    
    Parameters
    ----------
            by : str or list of str
                Name or list of names to sort by.
    
                - if `axis` is 0 or `'index'` then `by` may contain index
                  levels and/or column labels.
                - if `axis` is 1 or `'columns'` then `by` may contain column
                  levels and/or index labels.
    
                .. versionchanged:: 0.23.0
    
                   Allow specifying index or column level names.
    axis : {0 or 'index', 1 or 'columns'}, default 0
         Axis to be sorted.
    ascending : bool or list of bool, default True
         Sort ascending vs. descending. Specify list for multiple sort
         orders.  If this is a list of bools, must

Notice how we can pass a parameter called 'kind' which allows us to select a sorting algorithm. By default, the method uses quicksort, but we should consider cases where it might be helpful to try other sorting algorithms based on time complexity. 
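As a small illustration of passing `kind`, here is a minimal sketch on a toy DataFrame. The rows and the names `toy` and `by_name` are invented for this example and are not read from gradedata.csv. It sorts on a single column with mergesort, which is a stable algorithm, so rows with equal keys keep their original relative order.

```python
import pandas as pd

# Toy data invented for illustration; not taken from gradedata.csv.
toy = pd.DataFrame({
    "fname": ["Zoe", "Aaron", "Abbot", "Aaron"],
    "grade": [83.7, 88.7, 82.3, 86.7],
})

# Single-column sort with an explicitly chosen algorithm.
# mergesort is stable: the two "Aaron" rows stay in their original order.
by_name = toy.sort_values(by="fname", kind="mergesort")
print(by_name)
```

The two "Aaron" rows come out with index 1 before index 3, matching their order in the input.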

In [13]:
import timeit  # for timing algorithms
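The timeit module imported above can also repeat a measurement several times, which tends to give steadier numbers than a single reading. The following is a minimal sketch, assuming a synthetic DataFrame (`demo`, invented here) in place of gradedata.csv and sorting on a single numeric column. The repeat counts are arbitrary choices.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic stand-in for the grade data (assumption: values are random, not real grades).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "age": rng.integers(14, 20, size=2000),
    "grade": rng.uniform(50, 100, size=2000).round(1),
})

timings = {}
for kind in ("quicksort", "mergesort", "heapsort"):
    # 5 repeats of 20 calls each; the minimum is usually the least noisy estimate.
    runs = timeit.repeat(
        lambda k=kind: demo.sort_values(by="grade", kind=k),
        repeat=5,
        number=20,
    )
    timings[kind] = min(runs) / 20  # seconds per call

for kind, seconds in timings.items():
    print(f"{kind}: {seconds * 1e3:.3f} ms per sort")
```

The absolute numbers depend on the machine and the pandas version, so they are best read as a relative comparison.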

In [45]:
%time df.sort_values(by=['fname', 'lname', 'age', 'grade'], kind='mergesort')

Wall time: 26.8 ms


Unnamed: 0,fname,lname,gender,age,exercise,hours,grade,address
435,Aaron,Knowles,male,16,4,11,88.7,"596 Warren Drive, Brunswick, GA 31525"
721,Aaron,Salas,male,19,4,10,86.7,"642 Kirkland Rd., Eden Prairie, MN 55347"
817,Abbot,Daugherty,male,17,1,9,82.3,"348 Glenridge Ave., Plainfield, NJ 07060"
340,Abbot,Hall,male,16,4,3,58.9,"84 Rock Creek Lane, Durham, NC 27703"
1901,Abbot,Kinney,male,16,4,18,87.5,"643 Wakehurst St., Norman, OK 73072"
...,...,...,...,...,...,...,...,...
652,Zia,Tyson,female,14,3,15,88.2,"561 N. High St., Buckeye, AZ 85326"
1915,Zoe,Collins,female,16,2,15,83.7,"235 Mechanic St., Amsterdam, NY 12010"
1247,Zorita,Ashley,female,19,1,13,80.4,"362 Jennings Street, Natchez, MS 39120"
1447,Zorita,Benson,female,19,3,12,85.0,"64 Lower River Ave., Shepherdsville, KY 40165"


In [44]:
%time df.sort_values(by=['fname', 'lname', 'age', 'grade'], kind='heapsort')

Wall time: 21.9 ms


Unnamed: 0,fname,lname,gender,age,exercise,hours,grade,address
435,Aaron,Knowles,male,16,4,11,88.7,"596 Warren Drive, Brunswick, GA 31525"
721,Aaron,Salas,male,19,4,10,86.7,"642 Kirkland Rd., Eden Prairie, MN 55347"
817,Abbot,Daugherty,male,17,1,9,82.3,"348 Glenridge Ave., Plainfield, NJ 07060"
340,Abbot,Hall,male,16,4,3,58.9,"84 Rock Creek Lane, Durham, NC 27703"
1901,Abbot,Kinney,male,16,4,18,87.5,"643 Wakehurst St., Norman, OK 73072"
...,...,...,...,...,...,...,...,...
652,Zia,Tyson,female,14,3,15,88.2,"561 N. High St., Buckeye, AZ 85326"
1915,Zoe,Collins,female,16,2,15,83.7,"235 Mechanic St., Amsterdam, NY 12010"
1247,Zorita,Ashley,female,19,1,13,80.4,"362 Jennings Street, Natchez, MS 39120"
1447,Zorita,Benson,female,19,3,12,85.0,"64 Lower River Ave., Shepherdsville, KY 40165"


In [46]:
%time df.sort_values(by=['fname', 'lname', 'age', 'grade'], kind='quicksort')

Wall time: 23.1 ms


Unnamed: 0,fname,lname,gender,age,exercise,hours,grade,address
435,Aaron,Knowles,male,16,4,11,88.7,"596 Warren Drive, Brunswick, GA 31525"
721,Aaron,Salas,male,19,4,10,86.7,"642 Kirkland Rd., Eden Prairie, MN 55347"
817,Abbot,Daugherty,male,17,1,9,82.3,"348 Glenridge Ave., Plainfield, NJ 07060"
340,Abbot,Hall,male,16,4,3,58.9,"84 Rock Creek Lane, Durham, NC 27703"
1901,Abbot,Kinney,male,16,4,18,87.5,"643 Wakehurst St., Norman, OK 73072"
...,...,...,...,...,...,...,...,...
652,Zia,Tyson,female,14,3,15,88.2,"561 N. High St., Buckeye, AZ 85326"
1915,Zoe,Collins,female,16,2,15,83.7,"235 Mechanic St., Amsterdam, NY 12010"
1247,Zorita,Ashley,female,19,1,13,80.4,"362 Jennings Street, Natchez, MS 39120"
1447,Zorita,Benson,female,19,3,12,85.0,"64 Lower River Ave., Shepherdsville, KY 40165"


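A surprise indeed! For this data, quicksort does not have the fastest execution time: heapsort ran in 21.9 ms, quicksort in 23.1 ms, and mergesort in 26.8 ms. These gaps are small enough that a single %time run cannot separate them from ordinary run-to-run noise, and the pandas documentation notes that for DataFrames `kind` is only applied when sorting on a single column or label, so it may have no effect on the four-column sort used here. Sorting algorithms are often optimized for the datatypes inside the array, so we should ask ourselves whether the datatypes in the columns are modeled in a way that optimizes querying, especially as the size of our dataset grows. In our sort statement, we combine four columns holding two broad kinds of data: strings (fname, lname) and numbers (age, grade).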
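One way to start on the datatype question is to inspect the dtypes of the sort columns and try an alternative representation. The sketch below uses toy rows invented for illustration (`toy`, `as_cat`, and `sort_cols` are hypothetical names) and converts the name columns to pandas categoricals as one possible remodeling. Whether this actually speeds up sorting on the real data is an assumption that would need to be measured.

```python
import pandas as pd

# Toy rows invented for illustration, using the same four sort columns.
toy = pd.DataFrame({
    "fname": ["Zoe", "Aaron", "Abbot"],
    "lname": ["Collins", "Knowles", "Hall"],
    "age": [16, 16, 16],
    "grade": [83.7, 88.7, 58.9],
})
sort_cols = ["fname", "lname", "age", "grade"]

# Inspect how each sort column is stored.
print(toy[sort_cols].dtypes)

# One possible remodeling: store the name columns as categoricals.
as_cat = toy.astype({"fname": "category", "lname": "category"})
print(as_cat[sort_cols].dtypes)

# Both representations should yield the same row order.
plain_order = toy.sort_values(by=sort_cols).index.tolist()
cat_order = as_cat.sort_values(by=sort_cols).index.tolist()
print(plain_order == cat_order)
```

Categoricals store each distinct name once plus an integer code per row, which is why they are a common candidate when a string column has many repeated values.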