In [47]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

This Monte Carlo simulation estimates project cost from estimates weighted by confidence level.

The idea of this approach is that you have a list of projects, three estimated timeframes for each project (lower, middle and upper bounds), and a confidence level for each estimated timeframe.
Example: 
   * The team has a project they try to predict the cost for. 
        - 30% chance, this project can be done in 2 engineer months
        - 40% chance, it will be done in 4 engineer months, as this sounds more realistic
        - 30% chance, the team will need 5 engineer months to finish it, in case some dependencies arise
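
As a quick sanity check on the example above, the probability-weighted average of the three estimates can be computed directly. This is a minimal sketch; the names `example_timeframes` and `example_confidence` are illustrative and do not appear elsewhere in the notebook.

```python
import numpy as np

# Illustrative names; the values come from the example above.
example_timeframes = np.array([2, 4, 5])        # engineer months
example_confidence = np.array([0.3, 0.4, 0.3])  # confidence per estimate

# np.random.choice rejects probabilities that do not sum to 1,
# so it is worth checking this before sampling.
assert np.isclose(example_confidence.sum(), 1.0)

# Probability-weighted average of the three estimates
expected_months = float(np.dot(example_timeframes, example_confidence))
print(round(expected_months, 2))
```

The weighted average is only a baseline; the simulation adds a random complexity multiplier on top of it.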
   
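In addition to this, we can adjust the complexity level `complexity = round(np.random.normal(1, 0.4), 2)`, 
    which draws a multiplier from a normal distribution with mean 1. Here 0.4 is the standard deviation, chosen on a scale from 0.1 (little uncertainty) to 0.9 (high uncertainty) to reflect how complex the project is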
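
To get a feel for what a standard deviation of 0.4 implies, the sketch below uses the standard library's `statistics.NormalDist` to look at the spread of the multiplier. The name `multiplier` is illustrative; the parameters mirror `np.random.normal(1, 0.4)`.

```python
from statistics import NormalDist

# Same parameters as np.random.normal(1, 0.4): mean 1, standard deviation 0.4
multiplier = NormalDist(mu=1.0, sigma=0.4)

# Central 95% range of the multiplier
low = multiplier.inv_cdf(0.025)
high = multiplier.inv_cdf(0.975)

# Probability that a draw is negative, which would yield a negative duration
p_negative = multiplier.cdf(0.0)

print(f"95% of multipliers fall between {low:.2f} and {high:.2f}")
print(f"P(multiplier < 0) = {p_negative:.4f}")
```

A small share of draws is negative, which is why a simulation built this way can report a negative minimum. Whether to clip the multiplier or switch to a distribution with positive support is a modeling decision left open here.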


In [93]:
n = 10**5  # number of rolls
project_record = []  # list where we save all roll results
estimated_timeframe = [2, 4, 5]  # projected range (months, days, points)
estimation_probability = [0.1, 0.7, 0.2]  # confidence levels (0.7 represents 70% confidence); must sum to 1

for i in range(n): 
    # complexity can be any variable that models uncertainty:
    # a spread of 0.1 means that almost no complexity or uncertainty is expected,
    # a spread of 0.9 means high uncertainty or potential complexity.
    # We randomize complexity and estimated_cost separately.
    complexity = round(np.random.normal(1, 0.4),2)
    estimated_cost = np.random.choice(estimated_timeframe, 1, p=estimation_probability)[0] 

    project_time = complexity * estimated_cost
    project_record.append(project_time) 

proj_df = pd.DataFrame({"Months": project_record})

print(proj_df.head(10))


   Months
0    5.40
1    3.56
2    7.20
3    4.08
4    1.44
5    3.92
6    4.20
7    6.52
8    6.00
9    6.60


Running `describe` on the data frame column shows:
* `count` - the number of rolls
* `mean` - the mean estimated project cost across all rolls
* `std` - the standard deviation of the simulated costs
* `min` - the smallest value produced by any roll
* `25%, 50%, 75%` - the 25th, 50th and 75th percentiles, i.e. the value that 25%, 50% or 75% of the rolls came in at or below
* `max` - the largest value produced by any roll
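
The rows can be checked on a small series where the answers are easy to verify by hand. This is a toy sketch; `toy` and `summary` are illustrative names, not part of the simulation.

```python
import pandas as pd

# Five evenly spaced values, so the summary is easy to verify by hand
toy = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], name="Months")
summary = toy.describe()

# The percentile rows are the same values Series.quantile returns
assert summary["25%"] == toy.quantile(0.25)
assert summary["50%"] == toy.median()

print(summary)
```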

In [96]:
print(proj_df["Months"].describe())

count    100000.000000
mean          3.996583
std           1.800119
min          -3.880000
25%           2.680000
50%           3.960000
75%           5.200000
max          13.200000
Name: Months, dtype: float64


The next line prints the 20 most frequent roll outcomes, where the first column is the value and the second column is the number of times that value came up.

In [99]:
print(proj_df["Months"].groupby(proj_df["Months"]).count().nlargest(20))

Months
4.40    924
4.20    884
3.80    841
4.00    829
5.00    823
4.80    786
3.00    750
3.20    748
5.20    744
4.08    739
3.44    733
4.24    732
4.12    730
3.72    724
3.40    721
2.60    713
4.04    706
4.36    705
3.92    703
3.96    702
Name: Months, dtype: int64
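
Counting exact values treats 4.36 and 4.40 as unrelated outcomes. One alternative is to group the results into fixed-width bins with `pd.cut`. The sketch below assumes a half-month bin width and an arbitrary seed, and regenerates a smaller sample with the same recipe so it runs on its own; `months`, `edges` and `top_bins` are illustrative names.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed chosen arbitrarily, for repeatability

# Re-create a smaller sample with the same recipe as the simulation
complexity = np.round(rng.normal(1, 0.4, size=10_000), 2)
estimate = rng.choice([2, 4, 5], size=10_000, p=[0.1, 0.7, 0.2])
months = pd.Series(complexity * estimate, name="Months")

# Half-month-wide bins spanning the observed range
edges = np.arange(np.floor(months.min()), np.ceil(months.max()) + 0.5, 0.5)
binned = pd.cut(months, bins=edges, include_lowest=True)

# The five most populated bins and their counts
top_bins = binned.value_counts().nlargest(5)
print(top_bins)
```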


To answer the question: `At the 95th percentile, what will the cost of this project be?`

In [90]:
print(round(np.percentile(project_record, 95), 2))

7.05
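
The same call accepts a list of percentiles, which makes it easy to compare several confidence levels at once. The sketch below is a vectorized version of the recipe above with an arbitrary seed, so its numbers will differ slightly from the run shown in this notebook; `durations` and `levels` are illustrative names.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed so the sketch is repeatable

# Same recipe as the simulation loop, vectorized
complexity = np.round(rng.normal(1, 0.4, size=100_000), 2)
estimate = rng.choice([2, 4, 5], size=100_000, p=[0.1, 0.7, 0.2])
durations = complexity * estimate

# Cost at several confidence levels in one call
levels = [50, 80, 90, 95]
values = np.percentile(durations, levels)
for level, value in zip(levels, values):
    print(f"P{level}: {value:.2f}")
```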
