# Probabilistic Analysis and Randomized Algorithms

Src: Section 5.1 of Cormen et al., *Introduction to Algorithms* (CLRS), 4th edition, MIT Press, which discusses probabilistic analysis and randomized algorithms.

Probabilistic analysis is the use of probability theory to analyze a problem or an algorithm. Instead of studying the behavior of a single run on a single input, we assume a distribution over the possible inputs and compute the expected behavior of the algorithm, such as its average cost or running time, over that distribution.

Randomized algorithms, on the other hand, are algorithms whose behavior depends not only on the input but also on random choices made during execution. Randomization can remove the dependence on any particular input order, and for some problems a randomized algorithm is simpler or faster in expectation than the known deterministic alternatives.

Probabilistic analysis is particularly useful for analyzing randomized algorithms, as it allows us to reason about the expected behavior of an algorithm over its own random choices. For example, if a randomized algorithm gives the correct answer with probability 0.95 on each run, we expect about 95 correct answers in 100 independent runs. Conversely, observing 95 correct answers in 100 runs gives an empirical estimate of that probability, not an exact value.



## Hire Assistant problem

Suppose you need to hire a new office assistant but your previous attempts have been unsuccessful. To solve this problem, you decide to use an employment agency that will send you one candidate each day for an interview. You have to pay a small fee to the agency for each interview, and hiring an applicant is even more costly as it requires firing your current office assistant and paying a substantial hiring fee to the agency. Since you are committed to having the best possible person for the job, you have decided that if a candidate is more qualified than your current assistant, you will hire the new candidate and fire the current assistant. Although you are willing to pay the resulting cost, you want to estimate the price of this strategy.

![Hire Assistant Problem](https://miro.medium.com/v2/resize:fit:1400/format:webp/1*lXgGoeOwoKEFjXmXRjXZZQ.png)

Src: https://www.cantorsparadise.com/math-based-decision-making-the-secretary-problem-a30e301d8489

### Hire assistant problem pseudocode

The following pseudocode describes the algorithm for hiring an office assistant using the employment agency. The algorithm takes as input a list of candidates from the employment agency and returns the cost of hiring and firing office assistants.

* Set the current best candidate to None and the best score so far to a value lower than any candidate's score.
* While there are still candidates from the employment agency:
    * Interview the next candidate.
    * If the candidate's score is higher than the current best candidate's score:
        * Fire the current office assistant.
        * Hire the new candidate.
        * Set the current best candidate (and the best score) to the new candidate.
    * Otherwise, do not hire the candidate.
* Return the cost of hiring and firing office assistants.

Note: The exact scoring system used to evaluate candidates is not specified in the problem statement, so it would need to be defined or assumed in the implementation of the procedure. Additionally, the cost of hiring and firing office assistants is not specified, so that would need to be estimated based on the specific circumstances of the problem.

In [None]:
def score(candidate):
    return candidate  # for now we assume the candidate is the score

# interview_cost is an added optional parameter (default 0) for the
# per-interview agency fee mentioned in the problem statement.
def hire_assistant(candidates, hire_cost, fire_cost, interview_cost=0):
    best_candidate = None
    best_score = None  # no candidate has been hired yet
    total_cost = 0

    for candidate in candidates:
        # Every interview is paid for, whether or not it leads to a hire
        total_cost += interview_cost

        # Score the candidate (replace this with your own scoring function)
        candidate_score = score(candidate)

        if best_score is None or candidate_score > best_score:
            # Fire the current office assistant and hire the new candidate
            total_cost += fire_cost + hire_cost
            best_candidate = candidate
            best_score = candidate_score
        # Otherwise the candidate is not hired and nothing more is charged

    return total_cost

### Hire assistant problem implementation explanation

The code above takes a list of candidates, the cost of hiring a new office assistant, the cost of firing the current office assistant, and an optional per-interview fee (`interview_cost`, which defaults to 0). It iterates through the candidates, charges the interview fee, and evaluates each candidate using the `score()` function (which you would need to define or replace with your own scoring function). If the candidate has a higher score than the current best candidate, the code adds `fire_cost + hire_cost` to `total_cost` and updates the `best_candidate` and `best_score` variables. If the candidate has a lower or equal score, the code does not hire them, so only the interview fee is charged. Note that the first candidate is always hired and is charged like any later hire. Finally, the code returns the total cost of hiring and firing office assistants.
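To see how strongly the cost depends on the order in which candidates arrive, the sketch below counts hires for two orderings of the same candidates. `count_hires` is a hypothetical helper that mirrors the hiring loop, and it assumes each candidate is its own score and that all scores are distinct.

```python
def count_hires(candidates):
    """Count how many times the hire-if-better strategy hires someone."""
    hires = 0
    best_score = float("-inf")  # nobody has been hired yet
    for candidate_score in candidates:
        if candidate_score > best_score:
            best_score = candidate_score
            hires += 1
    return hires

candidates = list(range(1, 11))  # scores 1..10, all distinct

# Increasing order: every candidate beats the previous best, so all 10 are hired.
print(count_hires(sorted(candidates)))
# Decreasing order: the first candidate is the best, so only 1 hire happens.
print(count_hires(sorted(candidates, reverse=True)))
```

The gap between these two extremes is what motivates asking about the expected cost over a random arrival order.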

## Online Decision Problem

An online decision problem is a problem where the input is revealed over time and decisions must be made without complete knowledge of the future input. In other words, the algorithm must make decisions without seeing the entire input in advance.

In contrast, an offline decision problem is one where the entire input is known in advance, so the algorithm can examine all of it before committing to any decision.

Online decision problems are common in many areas of computer science, including optimization, game theory, machine learning, and networking. In these problems, the algorithm must make decisions based on incomplete information, and the goal is usually to minimize some measure of cost or maximize some measure of performance.

The Hire Assistant problem is an example of an online decision problem because the candidates are revealed over time, and the algorithm must make a decision after each candidate is evaluated, without knowledge of future candidates. Other examples of online decision problems include routing packets in a computer network, scheduling tasks on a processor, and bidding in an auction.
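The difference between the two settings can be made concrete with a small sketch. The function names `offline_best` and `online_decisions` are illustrative, and candidates are again assumed to be plain numeric scores.

```python
def offline_best(scores):
    """Offline: the whole input is available, so we can simply pick the best."""
    return max(scores)

def online_decisions(score_stream):
    """Online: decide hire/skip for each score as it arrives."""
    best_so_far = None
    for score in score_stream:
        hire = best_so_far is None or score > best_so_far
        if hire:
            best_so_far = score
        # The decision is emitted before the next score is seen.
        yield score, hire

scores = [4, 7, 2, 9, 5]
print(offline_best(scores))                  # 9
print(list(online_decisions(iter(scores))))
# [(4, True), (7, True), (2, False), (9, True), (5, False)]
```

The offline version hires once, while the online version hires three times on the same input because it cannot know that a 9 is still to come.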

## Monte Carlo Simulation

The Monte Carlo method is a family of computational techniques that use random sampling to estimate the solutions to problems in fields such as physics, engineering, finance, and computer science. It is named after the famous Monte Carlo Casino in Monaco, a reference to the role that chance plays in the games there.

The Monte Carlo method typically involves simulating a large number of random samples or scenarios to generate estimates of complex systems or problems that are difficult to solve analytically. These random samples are used to estimate probabilities or expected values of the system or problem under investigation.

For example, in physics, the Monte Carlo method is used to simulate the behavior of particles in a system by generating random positions and velocities for each particle and then computing the resulting behavior of the system. In finance, Monte Carlo simulations are used to estimate the value of financial instruments such as options or bonds, by simulating a large number of possible future scenarios and calculating the expected value of the instrument under each scenario.

The Monte Carlo method can be particularly useful in situations where the problem is too complex to be solved analytically, and there are many sources of randomness or uncertainty involved. However, the accuracy of a Monte Carlo estimate depends on the number of samples: the typical error shrinks only in proportion to 1/sqrt(N) for N samples, so each extra digit of accuracy needs roughly 100 times more samples, and the method can be computationally expensive.

We can apply the Monte Carlo method to the Hire Assistant problem to estimate its expected cost when candidates arrive in random order:

1. Generate a large number of candidate pools, each containing a random permutation of the same set of candidates.
2. For each candidate pool, run the Hire Assistant algorithm on the candidates and record the total cost of hiring and firing assistants.
3. Compute the average cost over all the candidate pools to obtain an estimate of the expected cost.

In [None]:

import random

def hire_assistant_simulate(candidates, hire_cost, fire_cost, num_simulations):
    total_cost = 0
    n = len(candidates)
    
    for i in range(num_simulations):
        # Generate a random permutation of the candidates
        candidate_pool = random.sample(candidates, n)
        
        # Run the Hire Assistant algorithm on the candidate pool and record the cost
        cost = hire_assistant(candidate_pool, hire_cost, fire_cost)
        total_cost += cost
        
    # Compute the average cost over all the simulations
    average_cost = total_cost / num_simulations
    return average_cost
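As a sanity check on this kind of simulation, a known result for the hiring problem (derived in CLRS Section 5.2) is that when candidates with distinct scores arrive in uniformly random order, the expected number of hires is the harmonic number H_n = 1 + 1/2 + ... + 1/n, which is roughly ln n. The sketch below compares that value with a simulated average. The names `count_hires`, `expected_hires`, and `simulate_hires` are illustrative and not part of the original notebook.

```python
import random

def count_hires(candidates):
    """Count hires made by the hire-if-better strategy."""
    hires = 0
    best_score = float("-inf")
    for candidate_score in candidates:
        if candidate_score > best_score:
            best_score = candidate_score
            hires += 1
    return hires

def expected_hires(n):
    """Harmonic number H_n: candidate i is the best of the first i with probability 1/i."""
    return sum(1 / i for i in range(1, n + 1))

def simulate_hires(n, num_simulations, rng):
    """Average number of hires over random arrival orders of n distinct scores."""
    candidates = list(range(n))
    total = 0
    for _ in range(num_simulations):
        rng.shuffle(candidates)
        total += count_hires(candidates)
    return total / num_simulations

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20
print(f"analytical: {expected_hires(n):.3f}")
print(f"simulated:  {simulate_hires(n, 20_000, rng):.3f}")
```

If each hire costs the same amount, multiplying the expected number of hires by that amount gives the expected hiring cost that the simulation is estimating.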

## Side Story - Calculating Pi via Monte Carlo method

To estimate the value of pi using the Monte Carlo method, we scatter a large number of random points uniformly over a square and measure the proportion of those points that land inside the circle inscribed in the square. Because that proportion approximates the ratio of the circle's area to the square's area, it gives an estimate of pi.

Here are the steps to calculate the value of pi using the Monte Carlo method:

* Generate a large number of random points within a square with sides of length 2 centered at the origin.
* Count the number of points that lie inside the circle of radius 1 centered at the origin.
* The proportion of points inside the circle estimates the ratio of the circle's area (pi) to the square's area (4), that is, pi/4.
* Estimate the value of pi as four times that proportion.

The same ratio appears if everything is restricted to one quadrant, with a quarter-circle inside a unit square, which is the version shown in the animation below.

![Circle](https://upload.wikimedia.org/wikipedia/commons/thumb/8/84/Pi_30K.gif/440px-Pi_30K.gif)

In [1]:
import random

def estimate_pi(num_points):
    num_points_in_circle = 0
    for _ in range(num_points):
        x = random.uniform(-1, 1)
        y = random.uniform(-1, 1)
        if x**2 + y**2 <= 1:
            num_points_in_circle += 1
    pi_estimate = 4 * num_points_in_circle / num_points
    return pi_estimate

# This code takes in the number of points to generate,
# generates random points within a square of side length 2 centered at the origin,
# counts the number of points that lie inside the circle of radius 1 centered at the origin,
# uses the proportion of points inside the circle as an estimate of
# the ratio of the circle's area to the square's area (pi/4),
# and estimates the value of pi as four times that proportion.
# The more points generated, the more accurate the estimate of pi tends to be.

In [2]:
estimate_pi(100_000) # not quite pi

3.14212
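To get a feel for how the estimate improves with more points, the sketch below repeats the estimate for increasing sample sizes and prints the absolute error. `estimate_pi_seeded` is a hypothetical variant of the function above that takes its own random generator, so that the run is repeatable. The exact numbers depend on the seed and are not shown here.

```python
import math
import random

def estimate_pi_seeded(num_points, rng):
    """Estimate pi from the fraction of random points that fall inside the unit circle."""
    inside = 0
    for _ in range(num_points):
        x = rng.uniform(-1, 1)
        y = rng.uniform(-1, 1)
        if x * x + y * y <= 1:
            inside += 1
    return 4 * inside / num_points

rng = random.Random(42)  # fixed seed (an arbitrary choice)
for num_points in (100, 1_000, 10_000, 100_000):
    estimate = estimate_pi_seeded(num_points, rng)
    print(f"{num_points:>7} points: {estimate:.5f} (error {abs(estimate - math.pi):.5f})")
```

The error usually shrinks as the number of points grows, but slowly and not monotonically: individual runs can get lucky or unlucky.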

## Side story: The Monty Hall Problem

![Goat](https://upload.wikimedia.org/wikipedia/commons/thumb/3/3f/Monty_open_door.svg/440px-Monty_open_door.svg.png)

*TODO: time allowing, or next time.*

The Monty Hall problem is a famous probability puzzle that is named after the host of the game show "Let's Make a Deal," Monty Hall. The problem is based on a hypothetical game show where a contestant is presented with three doors. Behind one of the doors is a valuable prize, while the other two doors hide goats.

The contestant chooses one of the three doors, but before the chosen door is opened, the host (Monty Hall) opens one of the other two doors to reveal a goat. The contestant is then given the option to stick with their original choice or switch to the other unopened door.

The question is whether the contestant should stick with their original choice or switch to the other door in order to increase their chances of winning the prize. The answer may seem counterintuitive, but switching actually increases the contestant's chances of winning the prize from 1/3 to 2/3. When the contestant first makes their choice, their door has a 1/3 chance of hiding the prize, so the other two doors together have a 2/3 chance. The host knows where the prize is and always opens a door with a goat, so opening it does not change the 1/3 chance of the original door. The whole 2/3 now rests on the single remaining unopened door.

### Correct strategy for Monty Hall problem

The correct strategy for the Monty Hall problem is to always switch to the other unopened door. Switching wins exactly when the contestant's initial choice was wrong, because the host then has to open the only other goat door, and the remaining door must hide the prize. The initial choice is wrong with probability 2/3, so switching wins with probability 2/3, while staying wins with probability 1/3.
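The 2/3 figure can also be checked exactly, without any randomness, by enumerating every equally likely combination of prize location and first pick. This is a small sketch; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

doors = range(3)
stay_wins = 0
switch_wins = 0
cases = 0

# Enumerate every (car position, first pick) pair; each pair is equally likely.
for car, pick in product(doors, repeat=2):
    cases += 1
    if pick == car:
        stay_wins += 1    # staying wins only when the first pick was right
    else:
        switch_wins += 1  # the host must open the other goat door, so switching finds the car

p_stay = Fraction(stay_wins, cases)
p_switch = Fraction(switch_wins, cases)
print(p_stay, p_switch)  # 1/3 2/3
```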

The problem became widely known in 1990, when Marilyn vos Savant discussed it in her "Ask Marilyn" column in Parade magazine and gave the correct answer, which many readers initially disputed.

Wiki: https://en.wikipedia.org/wiki/Monty_Hall_problem

In [3]:

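def monty_hall_simulation(switch):
    """Play one round; return 1 if the contestant wins the car, else 0."""
    doors = ["goat", "goat", "car"]
    random.shuffle(doors)
    chosen_door = random.choice(doors)
    # The host always reveals a goat among the other two doors, so switching
    # wins exactly when the first choice was a goat, and staying wins exactly
    # when the first choice was the car.
    if chosen_door == "car":
        return 0 if switch else 1
    else:
        return 1 if switch else 0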

num_simulations = 100000
switch = True
wins = 0

for i in range(num_simulations):
    wins += monty_hall_simulation(switch)

print(f"Probability of winning with switch: {wins / num_simulations:.2f}")
print(f"Probability of winning without switch: {(num_simulations - wins) / num_simulations:.2f}")

Probability of winning with switch: 0.67
Probability of winning without switch: 0.33
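The simulation above never actually opens a door; it relies on the observation that switching wins whenever the first pick was a goat. A more literal sketch, which models the host's move explicitly, is shown below. It assumes the host picks uniformly at random when two goat doors are available, and the names `play_round` and `win_rate` are illustrative.

```python
import random

def play_round(switch, rng):
    """Play one round with an explicit host; return True if the contestant wins."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the car.
    host_opens = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the one door that is neither the original pick nor the opened door.
        pick = next(d for d in doors if d != pick and d != host_opens)
    return pick == car

def win_rate(switch, num_rounds, rng):
    """Fraction of rounds won with the given strategy."""
    return sum(play_round(switch, rng) for _ in range(num_rounds)) / num_rounds

rng = random.Random(0)  # fixed seed so the run is repeatable
print(f"switch: {win_rate(True, 100_000, rng):.3f}")
print(f"stay:   {win_rate(False, 100_000, rng):.3f}")
```

Both versions should give win rates close to 2/3 for switching and 1/3 for staying.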


## Optimal Stopping Problem

The Hire Assistant problem is a classic example of an online decision problem in which we need to make a sequence of decisions without having full information about future events. A well-known variant, usually called the secretary problem, tightens the rules: we may hire only once, each candidate must be accepted or rejected immediately after the interview, and a rejected candidate cannot be recalled. The goal is to maximize the probability of hiring the single best candidate.

The optimal approach for this variant is an "optimal stopping rule." This rule states that we should interview and reject the first k candidates, where k is a fixed number, and then hire the first later candidate that is better than all the candidates seen so far. If no such candidate appears, we end up with the last candidate.

The value of k can be derived as follows:

* Let n be the total number of candidates provided by the employment agency, arriving in uniformly random order.
* Suppose the best candidate is at position i > k. The rule hires that candidate exactly when the best of the first i - 1 candidates is among the first k, which happens with probability k/(i - 1).
* Since the best candidate is at each position with probability 1/n, the rule succeeds with probability (k/n) * (1/k + 1/(k+1) + ... + 1/(n-1)), which is approximately (k/n) * ln(n/k) for large n.
* This expression is largest at k = n/e, where e is the mathematical constant equal to approximately 2.71828. Therefore, the optimal approach is to interview and reject the first k = n/e candidates (rounded to a whole number) and then hire the first candidate that is better than all the previous candidates.

With this choice of k, we hire the best candidate with a probability of approximately 1/e, or about 37%, and for large n no other strategy achieves a higher probability.

Note: The optimal stopping rule is based on mathematical analysis of the problem, and is derived using techniques such as probability theory and calculus. The rule is not guaranteed to produce the best result on any single run (it misses the best candidate roughly 63% of the time), but over many runs no other rule picks the best candidate more often.
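The rule of skipping the first k = n/e candidates and then taking the first candidate who beats everyone seen so far can be checked by simulation. The sketch below assumes that we hire exactly once, that scores are distinct, and that we are left with the last candidate if nobody beats the first k. The function names are illustrative.

```python
import math
import random

def stopping_rule_pick(scores, k):
    """Reject the first k scores, then take the first one that beats them all."""
    threshold = max(scores[:k], default=float("-inf"))
    for score in scores[k:]:
        if score > threshold:
            return score
    return scores[-1]  # nobody beat the threshold: left with the last candidate

def success_rate(n, k, num_trials, rng):
    """Fraction of random orderings in which the rule picks the best candidate."""
    scores = list(range(n))  # distinct scores; n - 1 is the best
    successes = 0
    for _ in range(num_trials):
        rng.shuffle(scores)
        if stopping_rule_pick(scores, k) == n - 1:
            successes += 1
    return successes / num_trials

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 50
k = round(n / math.e)
rate = success_rate(n, k, 20_000, rng)
print(f"k = {k}, success rate = {rate:.3f}, 1/e = {1 / math.e:.3f}")
```

Trying other values of k in the same sketch is a quick way to see the success rate fall off on either side of n/e.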