##### Algorithms and Data Structures (Winter - Spring 2022)

* [Table of Contents](ADS_TOC.ipynb)
* [Colab view](https://colab.research.google.com/github/4dsolutions/elite_school/blob/master/ADS_sandbox_7.ipynb)
* [nbviewer view](https://nbviewer.org/github/4dsolutions/elite_school/blob/master/ADS_sandbox_7.ipynb)

*Let's talk more about...*
# Sorting

One of the more tedious tasks, bordering on the undoable by humans once the dataset gets large, is sorting the rows of a dataset by a column.

Which field should be in ascending order?  

Is it used as a lookup field, as in a human-language dictionary?

Alphabetical, numeric, and alphanumeric sorting rules help with lookup, i.e. record retrieval.
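A small sketch of how these sorting rules differ, using only the built-in `sorted`. The sample labels and the `natural_key` helper are made up for illustration and are not part of the notebook.

```python
import re

# Hypothetical sample data.
labels = ["item10", "item9", "Item2", "item1"]

# Alphabetical (code point) order: uppercase sorts before lowercase,
# and "item10" sorts before "item9" because "1" < "9".
alphabetical = sorted(labels)

# Case-insensitive alphabetical order.
case_insensitive = sorted(labels, key=str.lower)


def natural_key(text):
    """Hypothetical helper: split text into letter and digit chunks so
    digit runs compare as integers (an alphanumeric, or 'natural', sort)."""
    return [int(chunk) if chunk.isdigit() else chunk.lower()
            for chunk in re.split(r"(\d+)", text)]


natural = sorted(labels, key=natural_key)

# Numeric order needs numbers, not digit strings.
numeric = sorted(["10", "9", "2"], key=int)

print(alphabetical)      # ['Item2', 'item1', 'item10', 'item9']
print(case_insensitive)  # ['item1', 'item10', 'Item2', 'item9']
print(natural)           # ['item1', 'Item2', 'item9', 'item10']
print(numeric)           # ['2', '9', '10']
```

The `key` argument decides which rule applies; the data itself is unchanged.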

Our ability to work with data at scale rests on our ability to sort it.
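One way to see why sorting matters for lookup: once rows are ordered by a field, a binary search can find a record without scanning every row. The sketch below uses the standard library's `bisect` module; the rows and the `lookup` helper are hypothetical.

```python
from bisect import bisect_left
from operator import itemgetter

# Hypothetical rows; each dict is one record of a small dataset.
rows = [
    {"word": "pivot", "length": 5},
    {"word": "array", "length": 5},
    {"word": "sort", "length": 4},
    {"word": "merge", "length": 5},
]

# Sort the rows by the lookup field, then keep the keys in a parallel list.
rows_by_word = sorted(rows, key=itemgetter("word"))
keys = [row["word"] for row in rows_by_word]


def lookup(word):
    """Binary search on the sorted keys; return the matching row or None."""
    i = bisect_left(keys, word)
    if i < len(keys) and keys[i] == word:
        return rows_by_word[i]
    return None


print(lookup("merge"))  # {'word': 'merge', 'length': 5}
print(lookup("heap"))   # None
```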

In [86]:
from IPython.display import YouTubeVideo
YouTubeVideo("FNAUuYmkMPE")

In [87]:
YouTubeVideo("ujb2CIWE8zY")

In [90]:
YouTubeVideo("DSMCZZGbZo4")

In [89]:
YouTubeVideo("8MsTNqK3o_w")

In [46]:
# https://youtu.be/RqQBh_Wbcu4  to embed

A graph can be represented using 3 data structures: 

* adjacency matrix
* adjacency list
* adjacency set
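A minimal sketch of the three representations in plain Python, built from the same small undirected graph. The edge list and the variable names are assumptions chosen for illustration.

```python
# Hypothetical undirected graph on vertices 0..3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4

# Adjacency matrix: matrix[i][j] is 1 when i and j share an edge.
matrix = [[0] * n for _ in range(n)]
# Adjacency list: each vertex maps to a list of its neighbors.
adj_list = {v: [] for v in range(n)}
# Adjacency set: same idea, but membership tests do not scan a list.
adj_set = {v: set() for v in range(n)}

for a, b in edges:
    matrix[a][b] = matrix[b][a] = 1  # undirected: record both directions
    adj_list[a].append(b)
    adj_list[b].append(a)
    adj_set[a].add(b)
    adj_set[b].add(a)

print(matrix)    # [[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]]
print(adj_list)  # {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(3 in adj_set[2])  # True
```

The matrix costs space proportional to the square of the vertex count, while the list and set forms grow with the number of edges.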

In [47]:
import numpy as np
import pandas as pd

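How symbols rank relative to one another, top to bottom, depends on the codec, meaning how symbols are encoded and decoded as binary objects. Python compares strings character by character using Unicode code points, so the character with the lower code point sorts first.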

In [91]:
"." < "?"

True

In [92]:
"&" > "%"

True

In [95]:
sorted(["p", "&", "#", "P", "!", "]", "{"])

['!', '#', '&', 'P', ']', 'p', '{']
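The ordering above can be checked against the code points themselves. This short sketch uses the built-ins `ord` and `chr` on the same symbols.

```python
symbols = ["p", "&", "#", "P", "!", "]", "{"]

# ord() gives the Unicode code point of a one-character string.
for s in sorted(symbols):
    print(s, ord(s))
# ! 33
# # 35
# & 38
# P 80
# ] 93
# p 112
# { 123

# Sorting explicitly by code point gives the same order as the default sort.
assert sorted(symbols, key=ord) == sorted(symbols)

# chr() is the inverse of ord().
assert chr(ord("P")) == "P"
```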

In [104]:
chrs_df = pd.DataFrame(
    {"Char": [chr(48 + i) for i in range(20)]},
    index=range(48, 48 + 20),
)

In [105]:
chrs_df.index.name = "unicode"
chrs_df

unicode,Char
48,0
49,1
50,2
51,3
52,4
53,5
54,6
55,7
56,8
57,9


In [110]:
chrs_df_random = chrs_df.sample(frac=1)
chrs_df_random 

unicode,Char
65,A
59,;
62,>
53,5
58,:
63,?
51,3
49,1
57,9
56,8


In [112]:
chrs_df_random.sort_values(by="Char")

unicode,Char
48,0
49,1
50,2
51,3
52,4
53,5
54,6
55,7
56,8
57,9


# QuickSort

Using Python, we would like to sort a list in ascending order, from least to greatest.

For best Python practices, see PEP 8.

In [4]:
def quick_sort(seq):
    """
    Pick a pivot, any element of seq (here, the last one).
    Put all items < pivot in left and all items > pivot in right,
    then call quick_sort on each side.

    Note: items equal to the pivot are kept only once, so
    duplicate values are dropped from the result.
    """
    if len(seq) <= 1:
        return seq
    pivot = seq[-1]
    left = [elem for elem in seq if elem < pivot]   # strictly smaller
    right = [elem for elem in seq if elem > pivot]  # strictly larger
    return quick_sort(left) + [pivot] + quick_sort(right)

In [2]:
test_object = [37, 61, 32, 12, 14, 2, 3, 23]

In [5]:
quick_sort(test_object)

[2, 3, 12, 14, 23, 32, 37, 61]
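Because `quick_sort` keeps only items strictly less than or strictly greater than the pivot, repeated values appear just once in its output. The sketch below is a hypothetical variant (the name `quick_sort_keep_dupes` is not from the notebook) that collects items equal to the pivot in a middle list so duplicates survive.

```python
def quick_sort_keep_dupes(seq):
    """Hypothetical variant of quick_sort: elements equal to the pivot
    go into their own list, so duplicates are preserved."""
    if len(seq) <= 1:
        return list(seq)
    pivot = seq[-1]
    left = [elem for elem in seq if elem < pivot]
    middle = [elem for elem in seq if elem == pivot]  # pivot and its duplicates
    right = [elem for elem in seq if elem > pivot]
    return quick_sort_keep_dupes(left) + middle + quick_sort_keep_dupes(right)


sample = [3, 1, 3, 2, 1]
print(quick_sort_keep_dupes(sample))  # [1, 1, 2, 3, 3]
```

Like the original, this version builds new lists rather than sorting in place, which keeps the code short at the cost of extra memory.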

In [12]:
from random import randint

In [13]:
def test_factory(n, s_min=0, s_max=99):
    return [randint(s_min, s_max) for _ in range(n)]
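If repeatable test data is wanted, one option is to seed the generator. The sketch below is a hypothetical variant of `test_factory` (the `seed` parameter and the function name are assumptions) built on `random.Random`, which keeps its own state separate from the module-level generator.

```python
from random import Random


def seeded_test_factory(n, s_min=0, s_max=99, seed=None):
    """Hypothetical variant of test_factory: pass a seed to get the
    same list on every run; seed=None behaves like an unseeded generator."""
    rng = Random(seed)
    return [rng.randint(s_min, s_max) for _ in range(n)]


first = seeded_test_factory(5, seed=42)
second = seeded_test_factory(5, seed=42)
print(first == second)  # True
```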

In [19]:
new_test = test_factory(200)

In [20]:
print(new_test)

[69, 57, 39, 55, 29, 17, 75, 68, 60, 33, 72, 58, 54, 73, 18, 13, 9, 91, 1, 6, 6, 76, 74, 39, 48, 21, 25, 45, 21, 69, 75, 85, 76, 79, 37, 79, 30, 27, 71, 29, 9, 8, 74, 66, 63, 29, 89, 89, 68, 99, 10, 18, 54, 11, 24, 24, 45, 33, 65, 41, 54, 75, 44, 81, 90, 21, 36, 96, 17, 12, 46, 54, 98, 61, 49, 78, 24, 43, 39, 77, 81, 45, 49, 96, 69, 11, 54, 53, 64, 24, 84, 89, 3, 41, 53, 38, 95, 86, 24, 57, 10, 97, 90, 67, 7, 62, 63, 74, 47, 18, 64, 99, 1, 84, 78, 57, 55, 1, 89, 62, 65, 68, 7, 9, 9, 88, 17, 99, 83, 93, 49, 70, 48, 71, 58, 59, 78, 50, 43, 37, 72, 70, 72, 73, 92, 64, 24, 59, 76, 22, 45, 29, 31, 31, 49, 22, 66, 84, 1, 99, 4, 24, 51, 75, 12, 63, 20, 43, 37, 53, 42, 65, 90, 38, 49, 61, 59, 51, 97, 66, 10, 55, 92, 77, 24, 17, 37, 42, 43, 96, 73, 27, 43, 62, 21, 44, 33, 12, 16, 71]


In [26]:
%timeit sorted_result = quick_sort(new_test)

531 µs ± 52 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
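Outside a notebook, the standard library's `timeit` module can take a similar measurement. The sketch below repeats the `quick_sort` definition so it runs on its own and times it against the built-in `sorted`; the repetition count of 200 is an arbitrary choice, and the numbers will vary by machine.

```python
import timeit
from random import randint


def quick_sort(seq):
    """Same logic as the notebook's quick_sort (duplicates are dropped)."""
    if len(seq) <= 1:
        return seq
    pivot = seq[-1]
    left = [elem for elem in seq if elem < pivot]
    right = [elem for elem in seq if elem > pivot]
    return quick_sort(left) + [pivot] + quick_sort(right)


data = [randint(0, 99) for _ in range(200)]
runs = 200  # arbitrary number of calls per measurement

t_quick = timeit.timeit(lambda: quick_sort(data), number=runs)
t_builtin = timeit.timeit(lambda: sorted(data), number=runs)

print(f"quick_sort: {t_quick / runs * 1e6:.1f} µs per call")
print(f"sorted:     {t_builtin / runs * 1e6:.1f} µs per call")

# quick_sort drops duplicates, so compare against the sorted unique values.
assert quick_sort(data) == sorted(set(data))
```

The built-in `sorted` is implemented in C, so it is typically much faster than a pure-Python recursive sort.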


In [28]:
sorted_result = quick_sort(new_test)  # %timeit does not keep names assigned in the timed statement
print(sorted_result)

[1, 3, 4, 6, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 20, 21, 22, 24, 25, 27, 29, 30, 31, 33, 36, 37, 38, 39, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 53, 54, 55, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 81, 83, 84, 85, 86, 88, 89, 90, 91, 92, 93, 95, 96, 97, 98, 99]


In [29]:
hard_test = test_factory(200, -100, 100)

In [31]:
print(hard_test)

[-24, 22, -48, 11, -19, 32, 46, -82, -71, -4, -22, -7, -25, 6, -39, -67, -31, 71, 44, 18, 6, 51, -62, -31, -4, 15, 3, -16, 56, 91, -10, 73, -78, 82, -98, 56, -53, 17, -62, 1, -72, -28, -18, 64, 16, 99, -84, 8, -89, -18, -21, 36, 22, -73, 1, 91, 75, -35, -91, -13, -74, -93, 97, 49, 35, 75, -44, 39, 15, 49, -98, 14, 36, -77, -26, -50, 75, 49, 49, 38, 54, -31, -19, 90, 57, 33, -97, 86, -91, -25, -80, 23, 75, 56, 56, 29, 83, -13, -82, 99, -44, 31, 14, 62, 96, -87, 32, 43, 19, 8, -67, 97, -83, 80, 57, 66, 5, 27, 8, -90, 62, -39, 28, 24, -84, -16, -94, 84, -2, 84, -74, -28, 38, 39, -34, -19, -28, 82, 99, 38, 22, 40, -47, -18, -36, 53, -62, -92, 7, 94, -31, 33, -36, -88, 33, 21, 82, 87, -56, -64, -86, 84, -100, -90, 13, -53, -31, 33, -91, -60, -73, 94, 54, 97, 51, 74, 44, -51, -99, 67, -83, 29, -30, 91, -41, 60, 92, 58, -52, -47, 25, 77, 48, -77, -15, -64, 9, -26, 49, -62]


In [33]:
print(quick_sort(hard_test))

[-100, -99, -98, -97, -94, -93, -92, -91, -90, -89, -88, -87, -86, -84, -83, -82, -80, -78, -77, -74, -73, -72, -71, -67, -64, -62, -60, -56, -53, -52, -51, -50, -48, -47, -44, -41, -39, -36, -35, -34, -31, -30, -28, -26, -25, -24, -22, -21, -19, -18, -16, -15, -13, -10, -7, -4, -2, 1, 3, 5, 6, 7, 8, 9, 11, 13, 14, 15, 16, 17, 18, 19, 21, 22, 23, 24, 25, 27, 28, 29, 31, 32, 33, 35, 36, 38, 39, 40, 43, 44, 46, 48, 49, 51, 53, 54, 56, 57, 58, 60, 62, 64, 66, 67, 71, 73, 74, 75, 77, 80, 82, 83, 84, 86, 87, 90, 91, 92, 94, 96, 97, 99]
