*Note: this blog post sits firmly at the intersection of management and data analysis, and it is most applicable to readers who are trained in both subjects.*

One of the habits of a good manager is to vigilantly keep track of the risk that project deadlines will not be met. Managers worth their salt will make a list of all potential project risks, crossing risks off when they no longer apply, and adjusting the relative likelihood of each risk as new information is received. Even if this list is not openly shared with superiors, a deep understanding of potential risks is essential to completing a project on time and on budget.

Risk is defined as "*probability* times *consequence*". The *probability* of each risk refers to its likelihood of occurrence, which must be updated over the lifetime of the project. At the beginning of the project, when uncertainty reigns supreme, the typical risk might be high. As the final deadline nears, many risks may decrease or be eliminated. The *consequence* of the risk is evaluated in terms of its expected impact on the project deadline, usually in units of days or hours. These probability and consequence values are expected to be somewhat subjective, but they should be realistic assessments based on past experience.

As an example, imagine that a project goal is for a team of data scientists to develop a machine learning algorithm using data provided by a client. The machine learning algorithm will be implemented by a separate software development team. One of the risks might be "data files are received from the client a week later than expected", with an associated probability of *15%* and an average consequence of *12 hours*. Team members may be able to work on other tasks while waiting for the data (which is why the consequence is *12 hours* rather than *40 hours*), but late receipt of data will still push the project deadline into the future.
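As a quick arithmetic check of the "probability times consequence" definition, the expected delay contributed by this single risk can be computed directly. This is only a sketch; the variable names are illustrative and do not come from the spreadsheet.

```python
# Expected delay contributed by one risk: probability times consequence.
probability = 0.15        # chance that the client data arrives a week late
consequence_hours = 12    # average delay (in hours) if that happens

expected_delay_hours = probability * consequence_hours
print(f"Expected delay from this risk: {expected_delay_hours:.1f} hours")
```

In other words, this one risk adds a little under two hours to the deadline on average, even though the delay is 12 hours whenever the risk actually occurs.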

This risk assessment process can be used to predict an expected range of project delivery dates. First, determine the *optimistic* project deadline. Most managers are good at this type of estimation. Think about how quickly the project could be accomplished if *everything* goes right, team members are allocated full-time to the project with no other responsibilities, and no one takes sick days or vacation. (It's the mark of a bad manager to treat this *optimistic* deadline as the *actual* deadline.)

Next, make a list of potential project risks. I've developed a spreadsheet (located HERE) for this purpose. The intuition behind this table is that it's easier to estimate a low-to-high range of possible values than it is to assign a single, static value. For example, instead of selecting a *15%* probability, you define a low end (e.g. *10%*) and a high end (e.g. *20%*). It is recommended to err on the conservative side when assessing the probability and consequence of each risk.

An example of project risks might look like:

[PICTURE 1]

The calculation of the deadline risk will involve two major steps: 

1. For each risk event, find the low and high risk predictions by multiplying the probability by the consequence
2. To find the overall risk of the deadline, sum the risk distributions of all risk events

During the first step, the product of the probability and consequence distributions is found for each risk event. The low range of the risk is found by multiplying the low range of the probability by the low range of the consequence, and the high range of the risk is found by multiplying the high range of the probability by the high range of the consequence (see assumption #2 below). The mean and variance can then be calculated under the assumption that the low and high ends of the risk distribution each lie one standard deviation from the mean (see assumption #3 below).

In the second step, the deadline risk is calculated as the sum of all risk events which may contribute to an increased deadline. Summing the means and the variances of the individual risk events gives the mean and variance of the overall deadline risk (a property of sums of independent normal distributions, so the risks are implicitly treated as independent of one another).
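The two steps can be sketched in a few lines of Python. The risk table below is hypothetical (the names and numbers are made up for illustration and are not the values in the spreadsheet), and the sketch assumes the risks are independent so that their variances can simply be added.

```python
from math import sqrt

# Hypothetical risk table: (name, prob_low, prob_high, hours_low, hours_high).
# These values are invented for illustration only.
risks = [
    ("Client data arrives late", 0.10, 0.20, 8, 16),
    ("Key team member unavailable", 0.05, 0.15, 16, 40),
    ("Model misses accuracy target", 0.20, 0.40, 20, 60),
]

def risk_mean_and_variance(p_low, p_high, c_low, c_high):
    """Step 1: low/high risk, then mean and variance.

    The low and high risk values are treated as one standard
    deviation below and above the mean, respectively.
    """
    risk_low = p_low * c_low
    risk_high = p_high * c_high
    mean = (risk_high + risk_low) / 2
    variance = (mean - risk_low) ** 2
    return mean, variance

# Step 2: sum the means and variances across all risks
# (valid for variances only if the risks are independent).
total_mean = 0.0
total_variance = 0.0
for name, p_low, p_high, c_low, c_high in risks:
    mean, variance = risk_mean_and_variance(p_low, p_high, c_low, c_high)
    total_mean += mean
    total_variance += variance

total_sd = sqrt(total_variance)
print(f"Expected delay: {total_mean:.1f} h, standard deviation: {total_sd:.1f} h")
```

Note how the overall spread is dominated by the risk with the widest low-to-high range, which is usually the one worth the most management attention.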

Following along in the Excel sheet (tab "Risks"), step 1 is calculated in cells G2:J5, while step 2 is accomplished in cells I9:J9. 

The resulting deadline risk distribution for this example will look like the following figure. The x axis can be interpreted as the number of hours beyond the *optimistic* deadline, and the y axis is the relative risk.

[PICTURE 2]

Following along in the Excel sheet, this figure is shown in tab "DeadlinePrediction".
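Once the overall mean and standard deviation are known, the same distribution can be queried for a range of likely delays. The sketch below uses Python's standard-library `statistics.NormalDist`; the totals are hypothetical placeholder numbers, not the results from the spreadsheet.

```python
from statistics import NormalDist

# Hypothetical totals, in hours beyond the optimistic deadline.
total_mean_hours = 20.0
total_sd_hours = 10.0

delay = NormalDist(mu=total_mean_hours, sigma=total_sd_hours)

# Range covering roughly the middle 80% of outcomes.
low_hours = delay.inv_cdf(0.10)
high_hours = delay.inv_cdf(0.90)

# Chance that the project slips by more than one 40-hour work week.
p_over_week = 1 - delay.cdf(40)

print(f"80% range: {low_hours:.1f} to {high_hours:.1f} hours late")
print(f"P(delay > 40 h) = {p_over_week:.3f}")
```

Reporting a range like this, rather than a single date, makes it harder to mistake the *optimistic* deadline for the *actual* one.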

List of major assumptions:

1. **Probability can be modeled by a Gaussian distribution**. This assumption is not true, because probability is only defined between 0 and 1, and selecting a Gaussian function allows the probability to dip below 0 or increase above 1. I considered other distributions, but I found in practice that the effect of violating the assumption was fairly negligible, since the majority of the probability distribution was in the 0 to 1 range. Making this assumption significantly simplified finding the product of the two distributions, so I believe it was a good tradeoff, but feel free to choose more realistic options, which will result in more complicated statistics.
2. **The high and low ranges of the distributions are symmetric**. This assumption is likely not true for most non-trivial cases. In my experience, managers are generally good at estimating the low end of consequences, but predicting the "worst case scenario" is more difficult, especially since it relies partially on ignorance (if you could anticipate it, you might be able to prevent it). The Weibull distribution is an excellent choice for modeling this situation, and I recommend it. Again, the tradeoff was between simplicity and accuracy, and I chose simplicity.
3. **The low and high ranges of each distribution are both one standard deviation away.** Accordingly, the mean and variance were calculated using the familiar statistical formulas, assuming again that the low and high ranges are symmetric:

u = (h+l)/2

v = (u-l)^2

where h is the high limit, l is the low limit, u is the mean, and v is the variance.
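As a small sketch of these formulas, the snippet below applies them to the *10%* to *20%* probability range used earlier, and then checks how much of the resulting Gaussian falls outside the valid 0 to 1 range (the concern raised in assumption #1). The function name is illustrative only.

```python
from statistics import NormalDist

def mean_and_variance(low, high):
    """Treat low and high as one standard deviation either side of the mean."""
    mean = (high + low) / 2
    variance = (mean - low) ** 2
    return mean, variance

# The 10%-20% probability range used earlier in the post.
mean, variance = mean_and_variance(0.10, 0.20)

# Assumption #1 check: how much of the Gaussian lies outside [0, 1]?
dist = NormalDist(mu=mean, sigma=variance ** 0.5)
mass_outside = dist.cdf(0.0) + (1 - dist.cdf(1.0))

print(f"mean={mean:.2f}, variance={variance:.4f}, "
      f"mass outside [0, 1]={mass_outside:.4f}")
```

For this range the mean sits three standard deviations above zero, so only a small fraction of a percent of the distribution is "impossible" probability. Ranges that hug 0 or 1 more closely would violate the assumption more noticeably.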