# Sorting

### Emoji keys:

🗒 background notes <br/>
➡️ next progression exploring this topic <br/>
💡 key point, or insight <br/>
🎓 points for further investigation
 

🗒 Many companies (like Google, Facebook and Airbnb) use algorithm-based interview questions. These questions are meant to show the candidate's ability to break a problem down into smaller problems and work towards a solution.

In real life, many business systems don't use these algorithms. I know, because I've been working in that domain for the past 30 years. Perhaps if you work in investment banking you'll see more numerical optimisation algorithms. 

But otherwise it's often about applying business rules to data. 

Let's assume your system handles customers, and those customers have addresses. 

You store all of their addresses. (Actually, you probably shouldn't do that under the GDPR principle of storing data for no longer than is necessary and reasonable. But let's assume you can justify that point.) 

Perhaps the customer has different types of address, and only one of those types can be active at a time. 

So you want to send the customer a letter; you need to find their current active correspondence address. As there probably aren't too many addresses for that customer you could load them all and scan the results for addressType='correspondence' and active=true.
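
For a handful of addresses, that scan can be as simple as the sketch below. The record shape (`addressType`, `active`, `line1`) and the sample data are hypothetical, chosen only to match the description above.

```javascript
// Hypothetical address records for one customer (illustrative data only)
var customerAddresses = [
    { addressType: 'billing', active: true, line1: '1 High Street' },
    { addressType: 'correspondence', active: false, line1: '2 Old Road' },
    { addressType: 'correspondence', active: true, line1: '3 New Road' },
];

// Linear scan: return the first address that is both the right type and active
function findActiveCorrespondenceAddress(addresses) {
    return addresses.find(
        (address) => address.addressType === 'correspondence' && address.active === true
    );
}

var currentAddress = findActiveCorrespondenceAddress(customerAddresses);
console.log(currentAddress.line1); // 3 New Road
```

`find` returns `undefined` when nothing matches, so real code would need to handle a customer with no active correspondence address.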

If, however, you have millions of customers and you want to retrieve the ones who live in Birmingham and have placed an order in the past year, that could be a much larger data set. A larger data set can take longer to process, so we may need a different approach to ensure we return results quickly enough for our users.

This is why it's useful to know about sorting and searching algorithms. There are many resources where you can read about them. The original classic work was Donald Knuth's [Art of Computer Programming](https://en.wikipedia.org/wiki/The_Art_of_Computer_Programming), volume 3.

➡️ Let's generate a collection of random numbers. 

🗒 JavaScript generates pseudo-random numbers. If you want truly random numbers, you may need to use [hardware](https://en.wikipedia.org/wiki/Hardware_random_number_generator) that incorporates a physical process.

➡️ Here is a reusable function to generate random numbers. This function takes an argument specifying the upper bound for the random numbers (the values generated are whole numbers from 0 up to, but not including, that bound), and returns another function. The inner function takes a parameter specifying how many numbers we require.

In [18]:
function randomNumbers(maxValue) {
    // Return a generator function that remembers maxValue
    return function(quantity) {
        var numbers = [];
        for(let i=0; i < quantity; i++) {
            // Random whole number in the range 0 to maxValue-1
            numbers.push(Math.floor(Math.random() * maxValue));
        }
        return numbers;
    }
}

In [23]:
var randomMax100 = randomNumbers(100)

💡 Top-level variables are defined with _var_ so that we can re-declare them in the notebook if we need to. _let_ and _const_ do not allow a variable to be declared more than once in the same scope.
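
A small sketch of that difference. Re-declaring with _let_ or _const_ is a syntax error, reported when the code is parsed, so the snippet wraps each case in `new Function` in order to catch it. The `runs` helper is made up for this illustration.

```javascript
// Returns true if the given source text parses and runs, false if it throws
function runs(source) {
    try {
        new Function(source)();
        return true;
    } catch (error) {
        return false;
    }
}

var varOk = runs('var a = 1; var a = 2;');       // var may be re-declared
var letOk = runs('let b = 1; let b = 2;');       // SyntaxError: already declared
var constOk = runs('const c = 1; const c = 2;'); // SyntaxError: already declared

console.log({ varOk, letOk, constOk }); // { varOk: true, letOk: false, constOk: false }
```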

In [26]:
var numbers = randomMax100(20)

In [27]:
numbers

[
  14, 33, 15, 51, 36, 64, 75,
  30, 55, 74, 16, 39, 42, 55,
  52, 66, 62,  1, 49,  6
]

➡️ To illustrate how much work the sorting function has to do, let's create a function that shows how far out of position the unsorted numbers are relative to their sorted position.

In [33]:
function displacement(numbers) {
    /*
     * Sort the numbers to determine their eventual position. 
     * Then scan the unsorted numbers and save how many positions each number would need to move.
     */
    let sorted = [...numbers];
    const sortAscending = function(a,b) { return a-b; }
    sorted.sort(sortAscending);
    let sortedIndexes = {};
    for(let i=0; i < sorted.length; i++) {
        sortedIndexes[sorted[i]] = i;
    }
    // Now compare against the unsorted numbers
    let result = {};
    for(let i=0; i < numbers.length; i++) {
        // What is the index of this number in the sorted array?
        const sortedIndex = sortedIndexes[numbers[i]];
        
        // Use abs because we only want to know the number of positions, not "up" or "down"
        const unsortedIndex = i;
        const offset = Math.abs(sortedIndex - unsortedIndex);
        
        if (result[offset]) {
            result[offset]++;
        } else {
            result[offset] = 1;
        }
    }
    return result;
}

In [34]:
displacement(numbers)

{
  '1': 3,
  '2': 4,
  '3': 3,
  '5': 1,
  '6': 2,
  '8': 2,
  '9': 1,
  '11': 1,
  '13': 1,
  '17': 1,
  '18': 1
}

💡 Each key is how far a number is out of position, and each value is how many of the original numbers have to move that far.

💡 Note that JavaScript objects convert their keys to strings, which is why the keys are quoted in the output. (A `Map` would keep them as numbers.)
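
The `result` returned by `displacement` is a plain object, and plain objects store property keys as strings, whereas a `Map` keeps each key as it was given. A short sketch of the difference, using made-up variable names:

```javascript
var counts = {};
counts[5] = 1;                        // number used as a property key
var objectKeys = Object.keys(counts); // [ '5' ]
console.log(typeof objectKeys[0]);    // string

var countsMap = new Map();
countsMap.set(5, 1);                  // Map keeps the key as a number
var mapKeys = [...countsMap.keys()];  // [ 5 ]
console.log(typeof mapKeys[0]);       // number
```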

💡 These counts aren't exact when a number is repeated: duplicates could go in any order relative to each other, and the lookup table keeps only the last sorted position for each value. Don't read too much into them; they are simply there to illustrate that the sorting algorithm has some work to do.
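
If the handling of repeated numbers matters, one possible variation (a sketch, not the function that produced the output above) sorts the original indexes rather than the values, so that each occurrence of a repeated number gets its own sorted position. It relies on `Array.prototype.sort` being stable, which the language specification has required since ES2019. The name `displacementByIndex` is made up for this example.

```javascript
function displacementByIndex(numbers) {
    // order[sortedIndex] = index of that element in the unsorted array.
    // Ties keep their original relative order because the sort is stable.
    const order = numbers.map((_, i) => i);
    order.sort((a, b) => numbers[a] - numbers[b]);

    // Count how many elements have to move each distance
    const result = {};
    order.forEach((unsortedIndex, sortedIndex) => {
        const offset = Math.abs(sortedIndex - unsortedIndex);
        result[offset] = (result[offset] || 0) + 1;
    });
    return result;
}

console.log(displacementByIndex([5, 5, 1])); // { '1': 2, '2': 1 }
```

The counts always add up to the length of the input, because every element is counted exactly once.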

One of the simplest, and least efficient, sorting algorithms is [Bubble Sort](https://en.wikipedia.org/wiki/Bubble_sort).
                                                                    
Here is the pseudocode:

```
procedure bubbleSort(A : list of sortable items)
    n := length(A)
    repeat
        swapped := false
        for i := 1 to n-1 inclusive do
            /* if this pair is out of order */
            if A[i-1] > A[i] then
                /* swap them and remember something changed */
                swap(A[i-1], A[i])
                swapped := true
            end if
        end for
    until not swapped
end procedure
```

In [1]:
function bubbleSort(numbers) {
    const result = [...numbers]; // Copy the input so the caller's array is not modified
    let swapped;
    do {
        swapped = false;
        for(let i=1; i < result.length; i++) {
            if (result[i-1] > result[i]) {
                let temp = result[i];
                result[i] = result[i-1];
                result[i-1] = temp;
                swapped = true;
            }
        }
        console.log(result);
        
    } while (swapped)
    return result;    
}

💡 Bubble Sort works by repeatedly swapping adjacent pairs of numbers that are out of order, pass after pass, until all the numbers are in order. Each pass moves the next-largest number to its correct position at the end of the array.
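
Because each pass leaves the next-largest number in its final position, the tail of the array never needs to be compared again. A common refinement, sketched here under the made-up name `bubbleSortShrinking`, stops each pass one position earlier than the previous one. It is still a quadratic algorithm, but in the worst case it does about half the comparisons.

```javascript
function bubbleSortShrinking(numbers) {
    const result = [...numbers]; // Copy so the input is not modified
    let end = result.length;     // Everything from index `end` onwards is already in place
    let swapped;
    do {
        swapped = false;
        for (let i = 1; i < end; i++) {
            if (result[i - 1] > result[i]) {
                // Swap the out-of-order pair
                [result[i - 1], result[i]] = [result[i], result[i - 1]];
                swapped = true;
            }
        }
        end--; // The largest remaining value has now reached its final position
    } while (swapped);
    return result;
}

console.log(bubbleSortShrinking([5, 1, 4, 2, 8])); // [ 1, 2, 4, 5, 8 ]
```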

In [36]:
bubbleSort(numbers)

[
  14, 15, 33, 36, 51, 64, 30,
  55, 74, 16, 39, 42, 55, 52,
  66, 62,  1, 49,  6, 75
]
[
  14, 15, 33, 36, 51, 30, 55,
  64, 16, 39, 42, 55, 52, 66,
  62,  1, 49,  6, 74, 75
]
[
  14, 15, 33, 36, 30, 51, 55,
  16, 39, 42, 55, 52, 64, 62,
   1, 49,  6, 66, 74, 75
]
[
  14, 15, 33, 30, 36, 51, 16,
  39, 42, 55, 52, 55, 62,  1,
  49,  6, 64, 66, 74, 75
]
[
  14, 15, 30, 33, 36, 16, 39,
  42, 51, 52, 55, 55,  1, 49,
   6, 62, 64, 66, 74, 75
]
[
  14, 15, 30, 33, 16, 36, 39,
  42, 51, 52, 55,  1, 49,  6,
  55, 62, 64, 66, 74, 75
]
[
  14, 15, 30, 16, 33, 36, 39,
  42, 51, 52,  1, 49,  6, 55,
  55, 62, 64, 66, 74, 75
]
[
  14, 15, 16, 30, 33, 36, 39,
  42, 51,  1, 49,  6, 52, 55,
  55, 62, 64, 66, 74, 75
]
[
  14, 15, 16, 30, 33, 36, 39,
  42,  1, 49,  6, 51, 52, 55,
  55, 62, 64, 66, 74, 75
]
[
  14, 15, 16, 30, 33, 36, 39,
   1, 42,  6, 49, 51, 52, 55,
  55, 62, 64, 66, 74, 75
]
[
  14, 15, 16, 30, 33, 36,  1,
  39,  6, 42, 49, 51, 52, 55,
  55, 62, 64, 66, 74, 75
]
[
  14, 15, 16, 30, 3

[
   1,  6, 14, 15, 16, 30, 33,
  36, 39, 42, 49, 51, 52, 55,
  55, 62, 64, 66, 74, 75
]

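➡️ Let's add in a timing function. If `bubbleSort` still logs every pass, remove the `console.log` call first so that the logging doesn't swamp the output or skew the measurement.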

In [10]:
function time(f) {
    let start = (new Date()).getTime();
    let result = f();
    let end = (new Date()).getTime();
    return {result, milliSeconds: end-start};
}
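
`Date` has millisecond resolution and follows the system clock, which can be adjusted while a measurement is running. A possible variation, assuming a Node.js kernel where the built-in `perf_hooks` module is available, uses the monotonic `performance.now()` instead. The names `timePrecise` and `perfHooks` are made up for this sketch.

```javascript
var perfHooks = require('perf_hooks');

function timePrecise(f) {
    // performance.now() returns fractional milliseconds from a monotonic clock
    const start = perfHooks.performance.now();
    const result = f();
    const end = perfHooks.performance.now();
    return { result, milliSeconds: end - start };
}

var timed = timePrecise(() => [3, 1, 2].sort((a, b) => a - b));
console.log(timed.result); // [ 1, 2, 3 ]
```

For the multi-second runs in this notebook the difference is negligible; it matters more when timing something that takes only a few milliseconds.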

In [15]:
var quantity = 50000;
var maxValue = 10000000;
var random50000 = randomNumbers(maxValue)(quantity);
time(() => bubbleSort(random50000));

{
  result: [
       52,   728,   752,   979,  1201,  1669,  1707,  1801,  1990,
     2119,  2202,  2221,  2300,  3003,  3063,  3407,  3486,  3496,
     3582,  3681,  3901,  3944,  3970,  4413,  4479,  4520,  4787,
     5405,  5497,  5670,  5685,  5735,  5913,  5994,  6066,  6176,
     6547,  6928,  7251,  7560,  7635,  7847,  7918,  8728,  8890,
     9356,  9491,  9755, 10032, 10082, 10284, 10462, 10631, 10954,
    10959, 10966, 11210, 11867, 12328, 12415, 12832, 13003, 13179,
    13237, 13249, 13694, 14105, 14351, 14825, 14938, 14946, 15078,
    15262, 16269, 16966, 17109, 17299, 17681, 17845, 18154, 18362,
    18908, 18915, 18930, 19344, 19866, 20116, 20169, 20170, 20379,
    20543, 20942, 20945, 21182, 21292, 21294, 21361, 21616, 21633,
    21871,
    ... 49900 more items
  ],
  milliSeconds: 6474
}

➡️ Compare with the built-in JavaScript function

In [16]:
var unsorted = [...random50000]
time(() => unsorted.sort((a,b) => a-b))

{
  result: [
       52,   728,   752,   979,  1201,  1669,  1707,  1801,  1990,
     2119,  2202,  2221,  2300,  3003,  3063,  3407,  3486,  3496,
     3582,  3681,  3901,  3944,  3970,  4413,  4479,  4520,  4787,
     5405,  5497,  5670,  5685,  5735,  5913,  5994,  6066,  6176,
     6547,  6928,  7251,  7560,  7635,  7847,  7918,  8728,  8890,
     9356,  9491,  9755, 10032, 10082, 10284, 10462, 10631, 10954,
    10959, 10966, 11210, 11867, 12328, 12415, 12832, 13003, 13179,
    13237, 13249, 13694, 14105, 14351, 14825, 14938, 14946, 15078,
    15262, 16269, 16966, 17109, 17299, 17681, 17845, 18154, 18362,
    18908, 18915, 18930, 19344, 19866, 20116, 20169, 20170, 20379,
    20543, 20942, 20945, 21182, 21292, 21294, 21361, 21616, 21633,
    21871,
    ... 49900 more items
  ],
  milliSeconds: 24
}
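
The comparator `(a,b) => a-b` passed to `sort` above is not optional for numbers. Without one, `sort` compares the elements as strings, as this small example shows:

```javascript
var values = [10, 9, 1];

// Default sort converts elements to strings, so "10" sorts before "9"
var defaultOrder = [...values].sort();                // [ 1, 10, 9 ]

// A numeric comparator gives the expected ascending order
var numericOrder = [...values].sort((a, b) => a - b); // [ 1, 9, 10 ]

console.log(defaultOrder, numericOrder);
```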

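So you can see that Bubble Sort is by no means the most efficient sorting algorithm: in this run it took 6474 ms to sort 50,000 numbers, against 24 ms for the built-in sort, which makes it roughly 270 times slower. If there is a library function available in your language that does what you need, then you should seriously consider using it, unless there are good reasons not to.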
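
To see where the time goes, a sketch like the following counts comparisons instead of reading the clock. `countBubbleSortComparisons` and `reversed` are helper names invented for this example; the loop is the same as in `bubbleSort` above, minus the logging. A reversed array is the worst case for this loop, and doubling its length roughly quadruples the work, which is the quadratic growth that makes Bubble Sort slow on large inputs.

```javascript
function countBubbleSortComparisons(numbers) {
    const result = [...numbers];
    let comparisons = 0;
    let swapped;
    do {
        swapped = false;
        for (let i = 1; i < result.length; i++) {
            comparisons++; // One comparison per adjacent pair, per pass
            if (result[i - 1] > result[i]) {
                [result[i - 1], result[i]] = [result[i], result[i - 1]];
                swapped = true;
            }
        }
    } while (swapped);
    return comparisons;
}

// Worst-case input: n, n-1, ..., 1
function reversed(n) {
    return Array.from({ length: n }, (_, i) => n - i);
}

var small = countBubbleSortComparisons(reversed(1000)); // 999000
var large = countBubbleSortComparisons(reversed(2000)); // 3998000
console.log(large / small); // just over 4
```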

🎓 Which other sorting algorithms have similar time complexity to Bubble Sort? [hint](https://www.bigocheatsheet.com/). Which ones are better? Which algorithm does Node.js use?