# Chapter 5 - Probabilistic Analysis and Randomized Algorithms

## The hiring problem

As a twist on the sorting problem, we will now look at a different kind of cost analysis that the authors call the 'hiring problem'. Imagine an open position and a list of candidates. You interview each candidate in turn and, whenever a candidate is better than the person you have currently hired, you replace that person with the new candidate. Interviewing a candidate has a small cost, and hiring them has a large cost.

To put this in more computerish terms, we want to iterate through an array, at each step storing the best candidate seen so far. There is a small cost for comparing a value in the array, and a high cost for storing it. We might write this in pseudocode as:

```
initialize current candidate
for candidate in list
  interview candidate
  if candidate is better than current candidate
    replace current candidate
```

In general, the cost of the whole procedure will be the interviewing cost times the number of candidates, plus the hiring cost times the number of hires. In the best case, in which you interview the best candidate first, you hire only once, and the total cost will be 

cost<sub>hiring</sub> `+` number<sub>candidates</sub> `x` cost<sub>interview</sub> 

In [13]:
best_case = [10, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In the worst case scenario, every candidate is better than the last, and the total cost will be 

(cost<sub>hiring</sub> `+` cost<sub>interview</sub>) `x` number<sub>candidates</sub>.

In [14]:
worst_case = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

If we decide that interviewing costs `1`, and hiring costs `5`, this might look like:

In [15]:
def hiring_process(array, cost_interview, cost_hire):
    current_candidate = -1
    cost = 0
    for candidate in array:
        cost += cost_interview
        if candidate > current_candidate:
            current_candidate = candidate
            cost += cost_hire
    return cost

hiring_process(worst_case, 1, 5), hiring_process(best_case, 1, 5)

(60, 15)
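
The general cost formula above separates cleanly into "number of interviews" and "number of hires". As a minimal sketch of that split, the block below counts hires on their own and then prices them. The names `count_hires` and `total_cost` are introduced here for illustration only, and the `-1` sentinel assumes non-negative scores, as in `hiring_process`.

```python
def count_hires(candidates):
    """Count how often the running best is replaced (assumes non-negative scores)."""
    best_so_far = -1  # sentinel below every score, mirroring hiring_process
    hires = 0
    for score in candidates:
        if score > best_so_far:
            best_so_far = score
            hires += 1
    return hires


def total_cost(candidates, cost_interview, cost_hire):
    # n interviews plus m hires
    return len(candidates) * cost_interview + count_hires(candidates) * cost_hire


ascending = list(range(1, 11))          # every candidate beats the previous one
best_first = [10] + list(range(1, 10))  # the best candidate arrives first

print(count_hires(ascending), total_cost(ascending, 1, 5))    # 10 hires, cost 60
print(count_hires(best_first), total_cost(best_first, 1, 5))  # 1 hire, cost 15
```

Since the interview term is the same for every ordering of the same list, only the hire count changes the total.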

## Indicator random variables

If we are using a hiring agency to facilitate this process, they might present us with a list of candidates approximating our worst-case scenario in order to maximize their own profits. In that case, we can reduce the amount we'll end up paying by randomizing the order of applicants.

To see why this would help, recall that the difference in cost depends only on the number of hiring events. In the worst case, the number of hires equals the total number of candidates. In randomized order, the expected number of hires drops. The probability that we'll hire the first candidate is always 1. The probability that the second candidate is better than the first (and thus hirable) is 1/2, and the probability that the third candidate is better than the first two (and thus hirable) is 1/3. In general, the i-th candidate is hired only if they are the best of the first i candidates, which happens with probability 1/i.

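We can generalize this by saying that the expected number of hires in a list of length n is the sum from i=1 to n of 1/i. This sum is not exactly the natural log, but it is approximately ln(n) (more precisely, ln(n) plus a bounded constant term). Using that approximation in Python: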

In [32]:
from math import log, e

def expected_hires(array):
    return log(len(array), e)

The approximate difference in the expected number of hires between the worst case and a random order, which is what drives the difference in cost, is:

In [8]:
len(worst_case) - expected_hires(worst_case)

7.697414907005954
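
To see how close the logarithm is to the actual sum of 1/i, here is a small sketch that computes the sum directly and compares it with a simulation over random orderings. It uses the standard library `random` module rather than `numpy`, and the names `harmonic` and `average_hires`, the trial count, and the seed are all choices made for this illustration.

```python
import random
from math import log


def harmonic(n):
    """Sum 1 + 1/2 + ... + 1/n, the expected hires for n candidates in random order."""
    return sum(1 / i for i in range(1, n + 1))


def average_hires(n, trials=10_000, seed=0):
    """Estimate the expected number of hires by shuffling n distinct ranks many times."""
    rng = random.Random(seed)  # seeded only to make the sketch repeatable
    ranks = list(range(n))
    total = 0
    for _ in range(trials):
        rng.shuffle(ranks)
        best_so_far = -1
        for rank in ranks:
            if rank > best_so_far:
                best_so_far = rank
                total += 1
    return total / trials


n = 10
print(harmonic(n))       # the sum itself, about 2.93
print(log(n))            # logarithmic approximation, about 2.30
print(average_hires(n))  # simulated average; should land near the sum
```

For large n the gap between the sum and ln(n) settles near the Euler-Mascheroni constant (about 0.577), so the logarithm slightly underestimates the expected number of hires without changing its growth rate.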

## Randomized algorithms

It may help us, then, to randomize the order of our candidates before hiring. Assuming that we already have a random number generator like `numpy.random`, we can randomize the order (or permute) by generating random values, and sorting on those random values.
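
As a rough sketch of the sort-on-random-values approach, the block below attaches an independent random key to each element and sorts on the keys. It assumes the standard library `random` module in place of `numpy.random`, and `permute_by_sorting` is a name made up for this example.

```python
import random


def permute_by_sorting(items, seed=None):
    """Return a new list with items ordered by independently drawn random keys."""
    rng = random.Random(seed)
    # One random priority per element; ties are possible in principle but
    # vanishingly unlikely with floating-point keys.
    keyed = [(rng.random(), item) for item in items]
    keyed.sort(key=lambda pair: pair[0])  # O(n log n) because of the sort
    return [item for _, item in keyed]


shuffled = permute_by_sorting([1, 2, 3, 4, 5], seed=42)
print(shuffled)                             # some ordering of the same five values
print(sorted(shuffled) == [1, 2, 3, 4, 5])  # True: nothing is lost or duplicated
```

The sort dominates the running time here, which is why a linear-time alternative is attractive.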

A faster strategy is to randomize the array in place, by swapping each element of the array with a randomly selected element from the part of the array at or after its own position.

In [9]:
from numpy import random

def randomize_in_place(array):
    n = len(array)
    iterating_ix = 0
    while iterating_ix < n:
        iterating_value = array[iterating_ix]
        random_ix = random.randint(iterating_ix, n)
        random_value = array[random_ix]
        array[iterating_ix] = random_value
        array[random_ix] = iterating_value
        iterating_ix += 1
    return array

As a sanity check, we can see how this handles an array of ordered integers:

In [10]:
randomize_in_place([1,2,3,4,5])

[4, 1, 3, 5, 2]
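
A single shuffled output does not say much about whether every ordering is equally likely. One way to probe that, sketched below under the assumption that a few thousand trials on a three-element list is enough to spot gross bias, is to tally how often each permutation appears. The function `fisher_yates` is a standard-library restatement of the same swap-with-a-later-position idea, not the `numpy`-based function above.

```python
import random
from collections import Counter


def fisher_yates(items, rng):
    """Shuffle items in place: swap position i with a random position in i..n-1."""
    n = len(items)
    for i in range(n):
        j = rng.randrange(i, n)  # upper bound is exclusive, so j is in i..n-1
        items[i], items[j] = items[j], items[i]
    return items


rng = random.Random(0)  # seeded only for repeatability
counts = Counter(tuple(fisher_yates([1, 2, 3], rng)) for _ in range(6000))

for permutation, count in sorted(counts.items()):
    print(permutation, count)  # each of the 6 orderings should appear roughly 1000 times
```

Picking the swap partner from the whole array instead of from positions `i..n-1` is a common mistake that skews these counts.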

This can be used to get us the best applicant in roughly ln(n) hires on average (any single run may need more or fewer):

In [27]:
hiring_process(randomize_in_place(worst_case), 0, 1)

3

However, hiring can be very expensive, so it might be better to hire only one candidate, even if they are not the best possible candidate. We can hire once and get the best candidate out of `n` candidates, without necessarily interviewing all of them, with probability of roughly `1/e`, or about 37%, by employing the following strategy: 

We select the first `n/e` candidates, and interview them without hiring any of them. Then, we hire the first candidate afterwards with a higher rank than the ones we have just rejected.

In Python, this might look like:

In [43]:
def online_hiring_process(array):
    current_candidate = -1
    ix = 0
    switch_ix = round(len(array)/e)
    # Interview, but never hire, the first switch_ix candidates
    while ix < switch_ix:
        if array[ix] > current_candidate:
            current_candidate = array[ix]
        ix += 1
    while ix < len(array):
        if array[ix] > current_candidate:
            percent_interviewed = ix / len(array) * 100
            return """
            You interviewed {} percent of candidates, and hired {}
            """.format(percent_interviewed, array[ix])
        else:
            ix += 1

As a sanity check, we can try this on a few randomized arrays where we know the max value (here, 99):

In [49]:
for i in range(0,10):
    print(online_hiring_process(randomize_in_place(list(range(0,100)))))


            You interviewed 53.0 percent of candidates, and hired 99
            
None
None

            You interviewed 41.0 percent of candidates, and hired 94
            

            You interviewed 90.0 percent of candidates, and hired 99
            

            You interviewed 72.0 percent of candidates, and hired 99
            
None
None
None

            You interviewed 63.0 percent of candidates, and hired 99
            


The advantages here are that this algorithm is fast, and in particular has reduced your hiring cost to the lowest possible level (while still hiring a candidate).

The obvious disadvantage is that whenever your best candidate is among the first `n/e` (which happens about 37% of the time), you hire no one.
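
Ten runs are too few to judge the `1/e` claim, so here is a hedged sketch that repeats the experiment many times and reports how often the single hire is the best candidate. It uses the standard library `random` module, and `hire_once`, `success_rate`, the trial count, and the seed are all assumptions made for this illustration.

```python
import random
from math import e


def hire_once(ranks, skip):
    """Reject the first `skip` candidates, then hire the first one who beats them all.

    Returns the hired rank, or None if nobody after the cutoff qualifies.
    """
    threshold = max(ranks[:skip], default=-1)
    for rank in ranks[skip:]:
        if rank > threshold:
            return rank
    return None


def success_rate(n, trials=20_000, seed=0):
    """Fraction of random orderings in which the single hire is the best candidate."""
    rng = random.Random(seed)  # seeded only for repeatability
    skip = round(n / e)
    ranks = list(range(n))
    wins = 0
    for _ in range(trials):
        rng.shuffle(ranks)
        if hire_once(ranks, skip) == n - 1:
            wins += 1
    return wins / trials


print(success_rate(100))  # should come out close to 1/e, roughly 0.37
```

The remaining runs split between hiring nobody and hiring someone who is good but not the best, such as the 94 in the output above.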