## Project 3 Report - Scheduling and Decision Analysis with Uncertainty

For the final project, we're going to combine concepts from Lesson 7 (Constraint Programming), Lesson 8 (Simulation), and Lesson 9 (Decision Analysis). We'll do this by revisiting the scheduling problem from Lesson 7. But, we're going to make it a little more true-to-life by acknowledging some of the uncertainty in our estimates, and using simulation to help us come up with better estimates. We'll use our estimated profits to construct a payoff table and make a decision about how to proceed with the building project.

When we originally created the problem, we used the following estimates for time that each task would take:

<img src='images/reliable_table.png' width="450"/>

But based on past experience, we know that these are just the most likely estimates of the time needed for each task. Here's our estimated ranges of values (in days instead of weeks) for each task:

<img src='images/reliable-estimate-ranges.png' width="450"/>

Further, we're going to consider the following factors:

* The base amount that Reliable will earn is \$5.4 million.
* If Reliable completes the project in 280 days or less, they will get a bonus of \$150,000.
* If Reliable misses the deadline of 329 days, there will be a \$25,000 penalty for each day over 329.

### **P3.1** - Simulation

Create a simulation that uses a triangular distribution to estimate the duration for each of the activities. Use the Optimistic Estimate, Most Likely Estimate, and Pessimistic Estimate for the 3 parameters of your triangular distribution.   Use CP-SAT to find the minimal schedule length in each iteration.  Track the total days each simulation takes and the profit for the company.

Put your simulation code in the cell below.  Use at least 1000 iterations.  Check your simulation results to make sure the tasks are being executed in the correct order!

<font color = "blue"> *** 8 points -  answer in cell below *** (don't delete this cell) </font>

In [1]:
import numpy as np
import pandas as pd
from ortools.sat.python import cp_model
from IPython.display import Markdown as md
from bokeh.io import show, output_notebook
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

In [2]:
# input
estimates = {
    'excavate': [7,14,21],
    'foundation': [14,21,56],
    'rough wall': [42,63,126],
    'roof': [28,35,70],
    'exterior plumbing': [7,28,35],
    'interior plumbing': [28,35,70],
    'exterior siding': [35,42,77],
    'exterior painting': [35,56,119],
    'electrical': [21,49,63],
    'wallboard': [21,63,63],
    'flooring': [21,28,28],
    'interior painting': [7,35,49],
    'exterior fixtures': [7,14,21],
    'interior fixtures': [35,35,63],
}

precedence_dict = {
    'excavate': ['foundation'],
    'foundation': ['rough wall'],
    'rough wall': ['roof', 'exterior plumbing', 'electrical'],
    'roof': ['exterior siding'],
    'exterior plumbing': ['interior plumbing', 'exterior painting'],
    'interior plumbing': ['wallboard'],
    'exterior siding': ['exterior painting'],
    'exterior painting': ['exterior fixtures'],
    'electrical': ['wallboard'],
    'wallboard': ['flooring', 'interior painting'],
    'flooring': ['interior fixtures'],
    'interior painting': ['interior fixtures']
}
task_names = list(estimates.keys())
num_tasks = len(task_names)
task_name_to_number_dict = dict(zip(task_names, np.arange(0, num_tasks)))
base_profit = 5.4
bonus = .15
penalty = .025
extra_cost_flag = False


# returns a random integer from a triangle distribution
def triangle(estimate):
    return int(round(np.random.triangular(left=estimate[0], mode=estimate[1], right=estimate[2]),0))

# calculates profit
def calcProfit(duration):
    # calculate profit
    if duration <= 280:
        profit = base_profit + bonus
    elif duration > 329:
        days_past = duration - 329
        profit = base_profit - days_past * penalty
    else:
        profit = base_profit
    if extra_cost_flag:
        extra_cost = float(np.random.exponential(scale=.1, size=1))
    else:
        extra_cost = 0
    return profit+extra_cost, extra_cost
    
# calculates minimum viable path for tasks num_sims times
def simEstimates(num_sims):
    # track project durations, profit, days past deadline
    project_durations, profit, extra_costs = [], [], []
    days_past, project_profit, proj_extra_cost = 0, 0, 0
    
    for i in range(num_sims):
        durations = [triangle(estimates[estimate]) for estimate in estimates]
        horizon = sum(durations)

        model = cp_model.CpModel()

        start_vars = [model.NewIntVar(0, horizon, name=f'start_{t}') for t in task_names]
        end_vars = [model.NewIntVar(0, horizon, name=f'end_{t}') for t in task_names]

        # start + duration = end
        intervals = [
            model.NewIntervalVar(start_vars[i],
                                 durations[i],
                                 end_vars[i],
                                 name=f'interval_{task_names[i]}')
            for i in range(num_tasks)
        ]

        # precedence constraints
        for before in list(precedence_dict.keys()):
            for after in precedence_dict[before]:
                before_index = task_name_to_number_dict[before]
                after_index = task_name_to_number_dict[after]
                model.Add(end_vars[before_index] <= start_vars[after_index])

        obj_var = model.NewIntVar(0, horizon, 'largest_end_time')
        model.AddMaxEquality(obj_var, end_vars)
        model.Minimize(obj_var)

        # solve
        solver = cp_model.CpSolver()
        status = solver.Solve(model)
        
        # capture duration
        project_duration = solver.ObjectiveValue()
        project_durations.append(project_duration)
        
        # capture profit and extra costs
        project_profit, project_extra_cost = calcProfit(project_duration)
        profit.append(project_profit)
        extra_costs.append(project_extra_cost)
    
        # Check your simulation results to make sure the tasks are being executed in the correct order!
#         print(f'\nOptimal Schedule Length: {solver.ObjectiveValue()}')
#         for i in range(num_tasks):
#             print(f'{task_names[i]} start at {solver.Value(start_vars[i])} and end at {solver.Value(end_vars[i])}')
#         print(f"Profit: {project_profit}")
    
    return project_durations, profit, extra_costs

proj_durations, profit, extra_costs = simEstimates(1000)



What is the probability that Reliable Company will finish the bid in 280 days or fewer, more than 280 and 329 days or fewer, or more than 329 days? What is their average profit?

Include code to answer these questions with output below:

<font color = "blue"> *** 2 points -  answer in cell below *** (don't delete this cell) </font>

In [3]:
avg_profit = sum(profit)/len(profit)
avg_extra_costs = sum(extra_costs)/len(extra_costs)
prob_bonus = sum(1 for i in proj_durations if i<=280)/len(proj_durations)
prob_on_time = sum(1 for i in proj_durations if 281<=i<=329)/len(proj_durations)
prob_late = sum(1 for i in proj_durations if i>329)/len(proj_durations)
print(f"Probability project complete in 280 days or less: {prob_bonus}")
print(f"Probability project complete more than 280 and 329 days or fewer: {prob_on_time}")
print(f"Probability project complete greater than 329 days: {prob_late}")
print(f"Average profit: ${avg_profit:0.3f} million.")
print(f"Average extra costs: ${avg_extra_costs:0.3f} million.")

Probability project complete in 280 days or less: 0.07
Probability project complete more than 280 and 329 days or fewer: 0.578
Probability project complete greater than 329 days: 0.352
Average profit: $5.241 million.
Average extra costs: $0.000 million.


### **P3.2** - Add Random Cost
From past experience, we know that special artifacts are sometimes found in the area where Reliable Construction is planning this building project.  When special artifacts are found, the excavation phase takes considerably longer and the entire project costs more - sometimes much more. They're never quite sure how much longer it will take, but it peaks around an extra 15 days, and takes at least an extra 7 days. They've seen some sites where relocating the special artifacts took as much as 365 extra days (yes - a whole year)! 

In addition, there are usually unanticipated costs that include fines and other things.  The accounting departments suggest that we model those costs with an exponential distribution with mean (scale) \$100,000.


Run a second simulation with these new parameters and using at least 1000 iterations.  Note, we are assuming that artifacts were found for this simulation.

Put your simulation code in the cell below.

<font color = "blue"> *** 8 points -  answer in cell below *** (don't delete this cell) </font>

In [4]:
# add extra days from discovery of artifacts
extra_days = [7, 15, 365]
for i in range(len(estimates['excavate'])):
    estimates['excavate'][i] += extra_days[i]

# add random extra costs
extra_cost_flag = True

art_proj_durations, art_profit, art_extra_costs = simEstimates(1000)

When artifacts are found, what is the probability that Reliable Company will finish the bid in 280 days or fewer, more than 280 and 329 days or fewer, or more than 329 days? What is their average profit?

Include code to answer these questions with output below:

<font color = "blue"> *** 2 points -  answer in cell below *** (don't delete this cell) </font>

In [5]:
art_avg_profit = sum(art_profit)/len(art_profit)
art_avg_extra_costs = sum(art_extra_costs)/len(art_extra_costs)
art_prob_bonus = sum(1 for i in art_proj_durations if i<=280)/len(art_proj_durations)
art_prob_on_time = sum(1 for i in art_proj_durations if 281<=i<=329)/len(art_proj_durations)
art_prob_late = sum(1 for i in art_proj_durations if i>329)/len(art_proj_durations)
print(f"Probability project complete in 280 days or less: {art_prob_bonus}")
print(f"Probability project complete more than 280 and 329 days or fewer: {art_prob_on_time}")
print(f"Probability project complete greater than 329 days: {art_prob_late}")
print(f"Average profit: ${art_avg_profit:0.3f} million.")
print(f"Average extra costs: ${art_avg_extra_costs:0.3f} million.")

Probability project complete in 280 days or less: 0.003
Probability project complete more than 280 and 329 days or fewer: 0.053
Probability project complete greater than 329 days: 0.944
Average profit: $2.520 million.
Average extra costs: $0.107 million.


### **P3.3** - Make Decision about Insurance

Clearly dealing with artifacts can be very costly for Reliable Construction.  It is known from past experience that about 30% of building sites in this area contain special artifacts.  Fortunately, they can purchase an insurance policy - a quite expensive insurance policy. The insurance policy costs \$500000, but it covers all fines and penalities for delays in the event that special artifacts are found that require remediation. Effectively, this means that Reliable could expect the same profit they would get if no artifacts were found (minus the cost of the policy).

Given the estimated profit without artifacts, the estimated profit with artifacts, the cost of insurance, the 30% likelihood of finding artifacts, create a payoff table and use Baye's Decision Rule to determine what decision Reliable should make.  You should round the simulated profits to the nearest \$100,000 and use units of millions of dollars so that, for example, \$8,675,309 is 8.7 million dollars.

Provide appropriate evidence for the best decision such as a payoff table or picture of a suitable (small) decision tree.

<font color = "blue"> *** 6 points -  answer in cell below *** (don't delete this cell) </font>

<font color = "green">
<table style="border: solid; color: green;">
    <tr>
        <th style="border-right: solid;"> </th>
        <th colspan=2 style="border-right: solid;">State of Nature</th>
        <th> </th>
    </tr>
    <tr style="border-bottom: solid;">
        <th style="border-right: solid;">Alternative</th>
        <th>$S_1$ (Artifacts Found)</th>
        <th style="border-right: solid;">$S_2$ (Artifacts Not Found)</th>
        <th>Expected Payoff</th>
    </tr>
    <tr style="text-align: center;">
        <th style="border-right: solid;">$A_1$ (Purchase Insurance)</th>
        <td>4.7</td>
        <td style="border-right: solid">4.7</td>
        <td>4.7(0.3) + 4.7(0.7) = 4.7</td>
    </tr>
    <tr style="text-align: center;">
        <th style="border-right: solid;">$A_2$ (Don't Purchase)</th>
        <td>2.5</td>
        <td style="border-right: solid">5.2</td>
        <td>2.5(0.3) + 5.2(0.7) = 4.4</td>
    </tr>
    <tr style="text-align: center; border-top: solid">
        <th style="border-right: solid">Prior probability</th>
        <td>.3</td>
        <td style="border-right: solid">.7</td>
        <td> </td>
    </tr>
</table>

Describe, in words, the best decision and the reason for that decision:

<font color = "blue"> *** 2 points -  answer in cell below *** (don't delete this cell) </font>

<font color = "green">
According to Bayes' Decision Rule, Reliable should purchase the insurance policy.  The expected payoff of \$4.7 million is greater than the expected payoff of \$4.4 million that results without the policy.
</font>

### **P3.4** - Posterior Probabilities
Reliable has been contacted by an archeological consulting firm. They assess sites and predict whether special artifacts are present. They have a pretty solid track record of being right when there are artifacts present - they get it right about 86% of the time. Their track record is less great when there are no artifacts - they're right about 72% of the time.

First find the posterior probabilities and provide evidence for how you got them (Silver Decisions screenshot or ?).

<font color = "blue"> *** 6 points -  answer in cell below *** (don't delete this cell) </font>

<font color = "green">
<table style="border: solid; color: green;">
    <tr>
        <th style="border-right: solid;"> </th>
        <th colspan=2>State of Nature</th>
    </tr>
    <tr style="border-bottom: solid;">
        <th style="border-right: solid;"> </th>
        <th>$S1$ (Artifact Found)</th>
        <th>$S2$ (Artifact Not Found)</th>
    </tr>
    <tr style="text-align: center;">
        <th style="border-right: solid">$P(S_i)$</th>
        <td>$.3$</td>
        <td>$.7$</td>
    </tr>
    <tr style="text-align: center;">
        <th style="border-right: solid">$P(F_1 | S_i)$</th>
        <td>0.86</td>
        <td>0.28</td>
    </tr>
    <tr style="text-align: center;">
        <th style="border-right: solid;">$P(F_2 | S_i)$</th>
        <td>0.14</td>
        <td>0.72</td>
    </tr>
    <tr style="text-align: center;">
        <th style="border-right: solid;">$P(S_i | F_1)$</th>
        <td>0.568</td>
        <td>0.432</td>
    </tr>
    <tr style="text-align: center; border-bottom: solid">
        <th style="border-right: solid;">$P(S_i | F_2)$</th>
        <td>0.077</td>
        <td>0.923</td>
    </tr>
</table>
$
P(S_1 | F_1) = \frac{P(F_1 | S_1) * P(S_1)}{P(F_1 | S_1) * P(S_1) + P(F_1 | S_2) * P(S_2)} = \frac{.86 * .3}{.86 * .3 + .28 * .7} = 0.568 \\
P(S_1 | F_2) = \frac{P(F_2 | S_1) * P(S_1)}{P(F_2 | S_1) * P(S_1) + P(F_2 | S_2) * P(S_2)} = \frac{.14 * .3}{.14 * .3 + .72 * .7} = 0.077\\
P(S_2 | F_1) = \frac{P(F_1 | S_2) * P(S_2)}{P(F_1 | S_1) * P(S_1) + P(F_1 | S_2) * P(S_2)} = \frac{.28 * .7}{.86 * .3 + .28 * .7} = 0.432 \\
P(S_2 | F_2) = \frac{P(F_2 | S_2) * P(S_2)}{P(F_2 | S_1) * P(S_1) + P(F_2 | S_2) * P(S_2)} = \frac{.72 * .7}{.14 * .3 + .72 * .7} = 0.923 \\
$
</font>

The consulting fee for the site in question is \$50,000. 

Construct a decision tree to help Reliable decide if they should hire the consulting firm or not and if they should buy insurance or not.  Again, you should round the simulated profits to the nearest $100,000 and use units of millions of dollars (e.g. 3.8 million dollars) in your decision tree.

Include a picture of the tree exported from Silver Decisions.

<font color = "blue"> *** 10 points -  answer in cell below *** (don't delete this cell) </font>

<img src="images/9.4.png" width="650">

Summarize the optimal policy in words here:

<font color = "blue"> *** 2 points -  answer in cell below *** (don't delete this cell) </font>

<font color = "green">
The optimal policy is to hire the consulting firm.  If they predict artifacts will be found, Reliable should purchase insurance.  If they predict no artifacts will be found, they should skip the insurance. 
</font>

### **P3.5** - Final Steps

How confident do you feel about the results of your decision analysis? If you were being paid to complete this analysis, what further steps might you take to increase your confidence in your results?

<font color = "blue"> *** 4 points -  answer in cell below *** (don't delete this cell) </font>

<font color = "green">
I am as confident as a student with two weeks of experience in this particular domain should be... this translates to, I think I've done everything right. There will always be real-world factors that don't make it into the model. This method inserted some uncertainty to attempt to capture that variation, but it is possible that unpredictable events transpire.  Sensitivity analysis could be conducted to allow for more granularity in decision-making.
</font>