# Master Thesis - LJMU

Thabor Walbeek, October 2019

# **C++ Data conversion for reading Patterson files with Python**

This notebook ports the C++ code that reads Patterson files to Python.

--------------------------------------------
Source:

    PattersonInstance.h
    IncentiveScheduling

     Created by Louis-Philippe Kerkhove on 18/08/15.
    Copyright (c) 2015 Louis-Philippe Kerkhove. All rights reserved.
    louisphilippe.kerkhove@gmail.com

    For more information on the available datasets visit:
    http://www.projectmanagement.ugent.be/?q=research/data/RanGen

    For more information on the Patterson file format visit:
    http://www.p2engine.com/p2reader/patterson_format
    
----------------------------------------------

In [1]:
# Import required packages
# The __future__ import must be the first statement in the cell
from __future__ import division
import pandas as pd
import numpy as np
from itertools import groupby
from collections import Counter
from time import sleep
import random
from decimal import Decimal

## Read the .rcp File

Each project network was generated in a controlled environment. Four different sets were created. For each set, the four topological indicators (SP, AD, LA and TF) were either fixed to given values or drawn at random from $[0,1]$.

The first set contains 900 .rcp files, generated with the following parameter settings:

$SP \in \{0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9\}$

$AD \in [0,1]$, $LA \in [0,1]$ and $TF \in [0,1]$

Below we enter the name of the file and prepare the file for further processing.

In [2]:
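Filename = 'EV1.rcp'

In [3]:
# Read all lines of the file; the file handle is closed when the block ends
with open(Filename) as OpenFileName:
    lines = OpenFileName.readlines()

# One row per line, split on whitespace into one column per token
MyFileName = pd.DataFrame(lines)
MyFileName.dropna(inplace = True)
MyFileName = MyFileName[0].str.split(expand = True)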
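
To make the layout used in the rest of this notebook concrete, the following minimal sketch parses a small Patterson-style text held in a string rather than a real .rcp file. The sample text and the helper name `parse_patterson` are hypothetical. The layout (activity count on the first non-empty line, one resource availability line, then one line per activity holding the duration, four resource requirements, the successor count and the 1-based successor IDs) is an assumption inferred from the row and column positions used in the cells below. Unlike the DataFrame above, the sketch drops blank lines, so its row positions differ from the row numbers quoted in this notebook.

```python
# Hypothetical sample in a Patterson-style layout (not one of the RanGen files)
SAMPLE = """
4 4

10 10 10 10

0 0 0 0 0 2 2 3
3 1 0 0 0 1 4
2 0 1 0 0 1 4
0 0 0 0 0 0
"""

def parse_patterson(text, nb_resources=4):
    """Return (durations, successors) with 0-based activity IDs."""
    # Keep only non-empty lines, split on whitespace, convert tokens to integers
    rows = [[int(tok) for tok in line.split()]
            for line in text.splitlines() if line.strip()]
    nb_act = rows[0][0]
    act_rows = rows[2:2 + nb_act]  # skip the header and the availability line
    durations, successors = [], []
    for row in act_rows:
        durations.append(row[0])
        nb_suc = row[1 + nb_resources]
        first = 2 + nb_resources
        # Successor IDs are 1-based in the file; shift them to 0-based
        successors.append([s - 1 for s in row[first:first + nb_suc]])
    return durations, successors

durations, successors = parse_patterson(SAMPLE)
print(durations)   # [0, 3, 2, 0]
print(successors)  # [[1, 2], [3], [3], []]
```

Keeping the successors as plain lists avoids the padding columns that a rectangular table needs when activities have different numbers of successors.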

## Number of activities | **nbAct**

Define nbAct, the number of activities in the file. Every project contains 2 dummy activities (start and end), which are included in the count. nbAct is used heavily as the bound of the for loops in later cells, so we convert it to an integer.
<br />
In the file we opened, this number is at **row [1], column [0]**

In [4]:
##### Define the number of activities (nbAct) ####
nbAct = int(MyFileName[0][1])
nbAct

32

## Activity Duration | **actDur**

Define **actDur** for all activities in the file. We loop over every activity **i** from 0 to nbAct - 1 (32 activities here) and read its duration, which is found in **rows 4:35, column 0**

In [5]:
## Read in the information for each individual activity ####
# Create a list with all the activities in the file
actList = list(range(0, nbAct))

# Activity rows start at row 4 of MyFileName; the duration is in column 0
actDurList = []
for i in range(0, nbAct, 1):
    actDurList.append(int(MyFileName.loc[i+4, 0]))

# Combine the activity IDs and durations into one table
actDur = pd.DataFrame({'Activity ID': actList, 'Activity Duration': actDurList},
                      columns=['Activity ID', 'Activity Duration'])
actDur.style.hide_index()

Activity ID,Activity Duration
0,0
1,6
2,9
3,6
4,5
5,8
6,10
7,6
8,3
9,9


## Number of Successors | **nbSuc**

Define **nbSuc** (Number of Successors) for all activities in the file. We read the number of successors of each of the nbAct [32] activities, which is found in **rows 4:35, column 5**

In [6]:
#### Number of successors ####
# Activity rows are rows 4 to nbAct+3; the successor count is in column 5
nbSucInput = np.array(MyFileName.loc[4:nbAct+3, 5]).astype(int)

# Combine the activity IDs and successor counts into one table
nbSuc = pd.DataFrame({'Activity ID': actList, 'Number of Successors': nbSucInput},
                     columns=['Activity ID', 'Number of Successors'])
nbSuc.style.hide_index()

Activity ID,Number of Successors
0,3
1,4
2,3
3,2
4,1
5,1
6,1
7,2
8,2
9,2


## Successors per Activity | **actSuc**

Define **actSuc** (Actual Successors) for all activities in the file. We read the successor IDs of each of the nbAct [32] activities, which are found in **rows 4:35, columns 6:maxSuc**. To determine the last column, we take the maximum number of successors (from nbSuc) and use it as the number of successor columns to read. Successor IDs in the file are 1-based, so we subtract 1 to match the 0-based activity IDs; unused slots end up as -1.

In [61]:
# Get the maximum number of columns that have successors
maxSuc = nbSuc.loc[nbSuc['Number of Successors'].idxmax()]
maxSuc = int(maxSuc[1])

# Define the end column in the MyFileName set, which is column 6 (start column) + maxSuc columns
maxSuc = maxSuc + 6

# Create a DataFrame of all the successors for each activity
actSuc = np.array(MyFileName.loc[4:35,6:maxSuc])
actSuc = pd.DataFrame(actSuc)
newColName = maxSuc-6
newColName = str(newColName)
actSuc[newColName] = 0
# Missing successor slots are read as None; replace them with '0'
for c in actSuc:
    if str(actSuc[c].dtype) in ('object', 'string_', 'unicode_'):
        actSuc[c] = actSuc[c].fillna(value='0')
# Shift the 1-based IDs to 0-based; empty slots become -1
actSuc = actSuc.astype(int)-1
# Put the activity ID in the first column
actSuc.insert(0, 'Activity ID', actList)
actSuc

Unnamed: 0,Activity ID,0,1,2,3,4
0,0,1,2,3,-1,-1
1,1,9,8,6,4,-1
2,2,8,6,4,-1,-1
3,3,8,4,-1,-1,-1
4,4,5,-1,-1,-1,-1
5,5,7,-1,-1,-1,-1
6,6,7,-1,-1,-1,-1
7,7,14,10,-1,-1,-1
8,8,14,10,-1,-1,-1
9,9,14,10,-1,-1,-1


## Predecessors per Activity | **actPre**

Define **actPre** (Actual Predecessors) for all activities in the file. The successors of each activity are known from the code above. For every successor **s** listed for activity **i**, we record the pair (s, i): activity i is a predecessor of s. Sorting these pairs by activity ID then gives the predecessors per activity.

In [62]:
# Collect (activity, predecessor) pairs. The first row is a placeholder for the
# dummy start activity 0, which has no real predecessor
actPreList = [(0,0,0)]
for i in range(0,nbAct,1):
    nbSucIter = nbSuc['Number of Successors'][i]
    for j in range(0,nbSucIter,1):
        z = np.array(MyFileName.loc[i+4,j+6])
        z = int(z)-1
        z = (z,i)
        actPreList.append(z)
actPreList = pd.DataFrame(actPreList)
actPre = actPreList
actPre.columns = ['Activity ID', 'Predecessor', 'indexNumber']
actPre = actPre.sort_values([actPre.columns[0],actPre.columns[1]], ascending = True)
actPre['index'] = range(1, len(actPre) + 1)
actPre = actPre.reset_index(drop=True)
actPre.style.hide_index()
actPrePivot = actPre.pivot(index='index', columns='Activity ID', values='Predecessor')
actPrePivot = actPrePivot.T
actPrePivot

Activity ID,1,2,3,4,5,6,7,8,9,10,...,47,48,49,50,51,52,53,54,55,56
0,0.0,,,,,,,,,,...,,,,,,,,,,
1,,0.0,,,,,,,,,...,,,,,,,,,,
2,,,0.0,,,,,,,,...,,,,,,,,,,
3,,,,0.0,,,,,,,...,,,,,,,,,,
4,,,,,1.0,2.0,3.0,,,,...,,,,,,,,,,
5,,,,,,,,4.0,,,...,,,,,,,,,,
6,,,,,,,,,1.0,2.0,...,,,,,,,,,,
7,,,,,,,,,,,...,,,,,,,,,,
8,,,,,,,,,,,...,,,,,,,,,,
9,,,,,,,,,,,...,,,,,,,,,,


## Number of Predecessors per Activity | **nbPre**

Define **nbPre** (Number of Predecessors) for all activities in the file. We have to group **actPre** and count the number of items per activity ID.

In [63]:
nbPre = actPre.groupby(["Activity ID"]).count()
nbPre

Activity ID,Predecessor,indexNumber,index
0,1,1,1
1,1,0,1
2,1,0,1
3,1,0,1
4,3,0,3
5,1,0,1
6,2,0,2
7,2,0,2
8,3,0,3
9,1,0,1
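
The inversion from successors to predecessors can also be sketched with plain dictionaries. The toy network below is hypothetical (IDs are 0-based, 0 and 6 are the dummy start and end) and is not taken from any .rcp file. Unlike the table above, the start dummy gets no placeholder predecessor in this sketch, so its count is 0.

```python
# Hypothetical toy network; 0 and 6 are the dummy start and end activities
successors = {0: [1, 2], 1: [3], 2: [3, 4], 3: [5], 4: [5], 5: [6], 6: []}

# Invert the successor lists: if s is a successor of i, then i is a predecessor of s
predecessors = {act: [] for act in successors}
for act in sorted(successors):
    for suc in successors[act]:
        predecessors[suc].append(act)

# The number of predecessors is the length of each predecessor list
nb_pre = {act: len(preds) for act, preds in predecessors.items()}

print(predecessors[3])  # [1, 2]
print(nb_pre[5])        # 2
```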


# Function for creating paths

## Recursive function to retrieve the paths

In [64]:
longest_path = 0

In [115]:
def calculateCPDuration(act_index, current_length):
        global longest_path
        actualduration = actDur.loc[act_index]['Activity Duration']
        pathlength = current_length + actualduration
        actualSuc = nbSuc.loc[act_index]['Number of Successors']

        if  actualSuc == 0:
            return pathlength
        
        else:
            addNewCol = actualSuc + 1
            for s in range(0,addNewCol,1):

                actualSucpath = actSuc.loc[act_index][s]
                
                if actualSucpath != -1:
                
                    path = calculateCPDuration(actualSucpath,pathlength)
          
                    longest_path = max(path, longest_path)
    
            return longest_path

In [116]:
calculateCPDuration(0,0)

91
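
The recursive function above follows every path from the start dummy, so an activity is revisited once for each path that reaches it. As a variation (not the method used in this notebook), the sketch below caches the longest remaining path per activity so that each activity is evaluated only once. The toy network, the durations and the name `longest_path_from` are hypothetical.

```python
# Hypothetical toy network; 0 and 6 are the dummy start and end activities
durations = {0: 0, 1: 3, 2: 2, 3: 4, 4: 1, 5: 2, 6: 0}
successors = {0: [1, 2], 1: [3], 2: [3, 4], 3: [5], 4: [5], 5: [6], 6: []}

def longest_path_from(act, memo=None):
    """Length of the longest path starting at `act`, including its own duration."""
    if memo is None:
        memo = {}
    if act not in memo:
        # Longest continuation over all successors; 0 when there are none
        tail = max([longest_path_from(s, memo) for s in successors[act]] or [0])
        memo[act] = durations[act] + tail
    return memo[act]

print(longest_path_from(0))  # 9
```

On this toy network the critical path is 0, 1, 3, 5, 6 with durations 3 + 4 + 2 = 9.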

## Calculate number of Arcs & Nodes | nbArcs & nbNodes

Calculating the number of Arcs & Nodes, exclusive of the dummy activities.

In [117]:
nbArcs = 0

for i in range(1,nbAct,1):
    nbArcs = nbArcs + nbSuc.loc[i][1]

nbNodes = nbAct - 2

i = nbPre.loc[nbAct-1]['Predecessor']
nbArcs = nbArcs - i

## Coefficient of network complexity | CNC

The coefficient of network complexity (CNC) is the number of arcs divided by the number of nodes, both counted without the dummy activities.

In [118]:
CNC = nbArcs / nbNodes
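
As a small worked example of the arc and node counts, the sketch below computes CNC for a hypothetical toy network (0-based IDs, 0 and 6 are the dummies). Arcs that leave the start dummy or enter the end dummy are not counted.

```python
# Hypothetical toy network; 0 and 6 are the dummy start and end activities
successors = {0: [1, 2], 1: [3], 2: [3, 4], 3: [5], 4: [5], 5: [6], 6: []}
nb_act = len(successors)
end_dummy = nb_act - 1

# Count arcs that leave a non-dummy activity and do not enter the end dummy
nb_arcs = sum(1 for act in range(1, end_dummy)
              for suc in successors[act] if suc != end_dummy)
nb_nodes = nb_act - 2

cnc = float(nb_arcs) / nb_nodes
print(nb_arcs, nb_nodes, cnc)  # 5 5 1.0
```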

## Order Strength | OS

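The order strength (OS) is the number of precedence relations, including transitive ones, divided by the maximum possible number of relations, $n(n-1)/2$ for $n$ non-dummy activities.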

In [119]:
# Note that the OS also includes transitive precedence relationships between activities (See p55 in Measuring Time)
max_rel = (nbNodes * (nbNodes - 1)) / 2
is_a_successor = pd.DataFrame(0, index=np.arange(0, nbAct), columns=np.arange(nbAct))
is_a_successor = is_a_successor.astype(int)
has_been_calculated = pd.DataFrame(0, index=np.arange(0, nbAct), columns=np.arange(1))
has_been_calculated = has_been_calculated.astype(int)

def nb_direct_and_indirect_successors(act, is_a_successor, has_been_calculated, predecessors_lst):
    global a
    
    predecessors_lst = nbPre.loc[act][0]    
    predecessors_lst = int(predecessors_lst)
    x = has_been_calculated.loc[act][0]

    if x == 0:
        
        z = nbSuc.loc[act][1]

        for s in range(1,z,1):
            y = actSuc.loc[act][s]
            if y != nbAct -1:
                pInd = 0
                for q in range(pInd,predecessors_lst,1):
                    a = s
                    b = y
                    is_a_successor[a][b] = 1
                nb_direct_and_indirect_successors(b, is_a_successor, has_been_calculated, predecessors_lst)
        has_been_calculated.loc[act] = 1

def calc_total_arcs_incl_trans():
    global a, total_arcs
    
    for a in range(0,nbAct-1,1):
        predecessors_lst = []
        nb_direct_and_indirect_successors(a, is_a_successor, has_been_calculated, predecessors_lst)
    total_arcs = is_a_successor.values.sum() 
    nbArcsAct0 = nbSuc.loc[0][1]
    total_arcs = total_arcs + nbArcsAct0
    return total_arcs 

total_arcs_incl_trans = calc_total_arcs_incl_trans()
OS = total_arcs_incl_trans / max_rel
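
One way to count the direct and indirect relations is to build, for each activity, the set of all activities reachable from it. The sketch below does this on a hypothetical toy network and is an illustration of the definition, not a port of the functions above. It assumes that only relations between non-dummy activities are counted; the helper name `all_successors` is made up for the example.

```python
# Hypothetical toy network; 0 and 6 are the dummy start and end activities
successors = {0: [1, 2], 1: [3], 2: [3, 4], 3: [5], 4: [5], 5: [6], 6: []}
nb_act = len(successors)
end_dummy = nb_act - 1

def all_successors(act, memo):
    """Set of direct and indirect non-dummy successors of `act`."""
    if act not in memo:
        found = set()
        for suc in successors[act]:
            if suc != end_dummy:
                found.add(suc)
                found |= all_successors(suc, memo)
        memo[act] = found
    return memo[act]

memo = {}
# Sum the reachable sets over the non-dummy activities only
total_arcs = sum(len(all_successors(act, memo)) for act in range(1, end_dummy))
nb_nodes = nb_act - 2
max_rel = nb_nodes * (nb_nodes - 1) / 2.0
order_strength = total_arcs / max_rel
print(total_arcs, order_strength)  # 7 0.7
```

Here activity 2 reaches 3, 4 and 5, activity 1 reaches 3 and 5, and activities 3 and 4 each reach 5, giving 7 relations out of a possible 10.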

## Serial Parallel Indicator | SP

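The serial/parallel indicator is computed from the progressive level of each activity: 1 for an activity without non-dummy predecessors, otherwise 1 plus the highest level among its predecessors. With $m$ the maximum progressive level and $n$ the number of non-dummy activities, $SP = (m-1)/(n-1)$.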

In [120]:
def SPind():
    global max_progressive_output, progressive_level, SP
    # Create an empty dataframe for all activities, so we can add the maximum progressive level for each activity
    progressive_level = pd.DataFrame(0, index=np.arange(0, nbAct), columns=np.arange(1))
    
    # Set the maximum progressive level to 0
    max_progressive_level = 0
    
    # set Index value to 1, as starting row from where we want to collect the number of predecessors (nbPre). This value should increase
    # during the for loops, as for each activity we want to know the starting row and then collect the value of nbPre
    setIndex = 1 # checked and correct
    
    # The first for loop, runs through all activities starting from 0 to the last dummy-activity 31
    for i in range(0, nbAct, 1):
        # When the activity is 0 or 31, we will give the progressive_level at that point the value -1
        if (i == 0 or i == nbAct - 1):
            progressive_level.loc[i][0] = -1
        # Otherwise make the progressive_level for the activities 1
        else:
            progressive_level.loc[i][0] = 1
            
            # Set the iteration values for the for-loop, which is the number of Predecessors for activity in loop "i"
            nbPreIter = nbPre.loc[i][0]
            
            # We create a for-loop to loop through the number of Predecessors per activity "i" 
            for j in range(0,nbPreIter,1):
                
                # Set the index
                setIndex = setIndex
                
                # Get the value of the activity that is the actual predecessor at this point [i,j]
                actPredecessor = actPre.loc[setIndex][1]
                if actPredecessor != 0: 
                
                    checkProgLevel = progressive_level.loc[actPredecessor][0]
                    checkProgLevel = checkProgLevel + 1
                
                    progressive_level.loc[i][0] = max(progressive_level.loc[i][0], checkProgLevel)
                
                setIndex = setIndex + 1
            
            max_progressive_level = max(max_progressive_level, progressive_level.loc[i][0])
            max_progressive_output = progressive_level[0].max()
            
    # A network with a single non-dummy activity is treated as fully serial
    if nbNodes == 1:
        SP = 1
    else:
        SP = round((max_progressive_level - 1) / (nbNodes - 1),1)

    return SP

SPind()

0.5
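
The progressive levels can be illustrated on a small network. The sketch below uses a hypothetical toy network and assumes, like the loop above, that activity IDs are topologically ordered (every predecessor has a lower ID). Treating a network with a single non-dummy activity as SP = 1 is an assumption of this sketch.

```python
# Hypothetical toy network; 0 and 6 are the dummy start and end activities
successors = {0: [1, 2], 1: [3], 2: [3, 4], 3: [5], 4: [5], 5: [6], 6: []}
nb_act = len(successors)

# Build the predecessor lists by inverting the successor lists
predecessors = {act: [] for act in successors}
for act in sorted(successors):
    for suc in successors[act]:
        predecessors[suc].append(act)

# Progressive level: 1 + highest level among the non-dummy predecessors
progressive = {}
for act in range(1, nb_act - 1):
    real_preds = [p for p in predecessors[act] if p != 0]
    progressive[act] = 1 + max([progressive[p] for p in real_preds] or [0])

m = max(progressive.values())  # maximum progressive level
n = nb_act - 2                 # number of non-dummy activities
sp = 1.0 if n == 1 else (m - 1) / float(n - 1)
print(progressive)  # {1: 1, 2: 1, 3: 2, 4: 2, 5: 3}
print(sp)           # 0.5
```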

## Activity Distribution Indicator | AD

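The activity distribution indicator measures how evenly the activities are spread over the progressive levels. With $w_a$ the width (number of activities) of level $a$ and $\bar{w} = n/m$ the average width, $AD = \sum_{a=1}^{m} |w_a - \bar{w}| \, / \, (2(m-1)(\bar{w}-1))$.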

In [121]:
# Calculate the average as levels of total activities
total_act = nbAct - 2
max_level = max_progressive_output

averageWidth = int(total_act) / int(max_level)

# calculate the denominator
denominator = 2 * (max_progressive_output - 1) * (averageWidth -1)

# calculate the width at each level

progressive_level.columns = ['x']
numerator = 0
width_lst = []

for i in range(0,max_progressive_output,1):
    # Number of activities whose progressive level equals i+1
    widthCount = int((progressive_level.x == i+1).sum())

    width_lst.append(widthCount)

    # Accumulate the absolute deviation from the average width.
    # The sum is kept as a float: truncating it would distort AD
    absWidth = abs(widthCount - averageWidth)
    numerator = numerator + absWidth

width_lst = pd.DataFrame(width_lst)

# Set the value for AD:

AD = round(numerator / denominator, 3)

In [None]:
AD
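
As a worked example of the width calculation, the sketch below derives the level widths and AD from a hypothetical set of progressive levels. Returning 0 when $m = 1$ or $m = n$ (where the denominator would be zero) is an assumption of this sketch; the cell above does not handle that case.

```python
from collections import Counter

# Hypothetical progressive levels of five non-dummy activities
progressive = {1: 1, 2: 1, 3: 2, 4: 2, 5: 3}
n = len(progressive)
m = max(progressive.values())

# Width of each level = number of activities on that level
level_counts = Counter(progressive.values())
widths = [level_counts[level] for level in range(1, m + 1)]
avg_width = n / float(m)

if m == 1 or m == n:
    ad = 0.0  # assumption: all widths equal the average, so no deviation
else:
    deviation = sum(abs(w - avg_width) for w in widths)
    ad = deviation / (2 * (m - 1) * (avg_width - 1))

print(widths, round(ad, 3))  # [2, 2, 1] 0.5
```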

## Length of Arcs | LA

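The length of arcs indicator is based on the arcs of length 1, i.e. arcs between activities on consecutive progressive levels. With $A_1$ the number of such arcs, $D = \sum_{a=1}^{m-1} w_a w_{a+1}$ the maximum possible number of them and $w_1$ the width of the first level, $LA = (A_1 - n + w_1) / (D - n + w_1)$.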

In [122]:
# Calculate the second value of nominator and denominator
n = (nbAct-2) + width_lst.loc[0][0]

def GetLA():
    global LA
    max_arcs_l1 = 0
    
    # This calculates the max number of arcs in this file (e.g. 224 arcs in EV2.rcp file)
    for i in range(0, max_progressive_output-1, 1):
        max_arcs = width_lst.loc[i][0] *  width_lst.loc[i+1][0]
        max_arcs_l1 = max_arcs_l1 + max_arcs
  
    if max_arcs_l1 == nbNodes - width_lst[0][0]:
        LA = 1
    else:
        arcs_with_l1 = 0
        
        # Count the arcs between non-dummy activities that span exactly one progressive level
        for i in range(1, nbNodes+1, 1):
            nbSucIter = int(nbSuc.loc[i][1])
            for j in range(0,nbSucIter,1):
                actSucCheck = actSuc.loc[i][j]
                if actSucCheck != nbAct - 1:
                    if progressive_level.iloc[actSucCheck, 0] - progressive_level.iloc[i, 0] == 1:
                        arcs_with_l1 = arcs_with_l1 + 1
        LA = round((arcs_with_l1 - nbNodes + width_lst[0][0]) / (max_arcs_l1  - nbNodes + width_lst[0][0]),3)
    return LA

GetLA()
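
The arc-length count can be checked by hand on a small network. The sketch below uses a hypothetical toy network with its progressive levels and widths given directly. It assumes that the length of an arc is the difference between the progressive levels of its two end points and that arcs into the end dummy are ignored.

```python
# Hypothetical toy network; 0 and 6 are the dummy start and end activities
successors = {0: [1, 2], 1: [3], 2: [3, 4], 3: [5], 4: [5], 5: [6], 6: []}
progressive = {1: 1, 2: 1, 3: 2, 4: 2, 5: 3}  # levels of the non-dummy activities
widths = [2, 2, 1]                            # activities per level
n = len(progressive)
end_dummy = 6

# Maximum possible number of arcs between consecutive levels
max_arcs_l1 = sum(widths[a] * widths[a + 1] for a in range(len(widths) - 1))

# Arcs between non-dummy activities that span exactly one level
arcs_l1 = sum(1 for act in progressive for suc in successors[act]
              if suc != end_dummy and progressive[suc] - progressive[act] == 1)

if max_arcs_l1 == n - widths[0]:
    la = 1.0
else:
    la = (arcs_l1 - n + widths[0]) / float(max_arcs_l1 - n + widths[0])

print(arcs_l1, max_arcs_l1, round(la, 3))  # 5 6 0.667
```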

## Topological Float | TF

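The topological float indicator compares the regressive level $RL_i$ of each activity (the highest level it can be moved to without increasing the maximum level $m$) with its progressive level $PL_i$: $TF = \sum_i (RL_i - PL_i) \, / \, ((m-1)(n-m))$, and $TF = 0$ when $m = 1$ or $m = n$.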

In [123]:
# Create a dataframe with all activities (incl. dummy activities) and assign -1 to the dummy activities

regressive_level = pd.DataFrame(0, index=np.arange(0, nbAct), columns=np.arange(1))
regressive_level.loc[0][0] = -1
regressive_level.loc[nbAct-1][0] = -1

def GetTF():
    global TF
    for i in range(nbNodes,0,-1):
        # Default value when there are no successors = m
        regressive_level.loc[i][0] = max_progressive_output
        
        nbSucIter = nbSuc.loc[i][1]

        for j in range(0, nbSucIter, 1):
            actSucCheck = actSuc.loc[i][j]
            
            if actSucCheck != nbAct -1:
                check1 = int(regressive_level.loc[actSucCheck])
                check2 = check1 - 1
                regressive_level.loc[i][0] = min(regressive_level.loc[i][0],check2)
    
    if max_progressive_output == 1 or max_progressive_output == nbNodes:
        TF = 0
    else:
        numerator = 0
        for i in range(0,nbAct,1):
            check1 = int(regressive_level.loc[i]) - int(progressive_level.loc[i])
            numerator = numerator + check1
        
        denominator = (max_progressive_output - 1) * (nbNodes - max_progressive_output)
        
        TF = round(numerator / denominator, 3)
    return TF

GetTF()

0.086
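
The regressive levels can be illustrated on a network where one activity has slack. The sketch below uses a hypothetical toy network (a chain 1, 2, 3, 4 plus activity 5, which follows activity 1 and precedes only the end dummy) and assumes topologically ordered IDs, as the loop above does.

```python
# Hypothetical toy network; 0 and 6 are the dummy start and end activities
successors = {0: [1], 1: [2, 5], 2: [3], 3: [4], 4: [6], 5: [6], 6: []}
progressive = {1: 1, 2: 2, 3: 3, 4: 4, 5: 2}  # progressive levels, given directly
n = len(progressive)
m = max(progressive.values())
end_dummy = 6

# Regressive level: m when there is no non-dummy successor,
# otherwise the lowest successor level minus 1 (computed back to front)
regressive = {}
for act in range(n, 0, -1):
    real_sucs = [s for s in successors[act] if s != end_dummy]
    regressive[act] = min([regressive[s] - 1 for s in real_sucs] or [m])

if m == 1 or m == n:
    tf = 0.0
else:
    total_float = sum(regressive[a] - progressive[a] for a in progressive)
    tf = total_float / float((m - 1) * (n - m))

print(regressive)    # {5: 4, 4: 4, 3: 3, 2: 2, 1: 1}
print(round(tf, 3))  # 0.667
```

Only activity 5 can shift: it sits on level 2 but could be placed on level 4, which gives a total float of 2 against a maximum of 3.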

## Create Gantt Chart | Gantt

Create a Gantt chart of the project network, based on the parameters calculated above.

In [124]:
height = nbAct
width = longest_path+1
ganttchart = pd.DataFrame(0, index=range(height), columns=range(width))

actDur['Activity Start'] = 1
actDur['Activity End'] = (1-1)+actDur['Activity Duration']

collength = len(actPrePivot.columns)

for i in range(0,nbAct,1):
    for j in range(1,collength,1):
        
        predvalue = actPrePivot.loc[i][j] # predecessor value at this point
        
        if predvalue == 0:
            continue
        
        elif i == nbAct - 1:
            continue
        
        elif predvalue > 0:
            # find the Activity End of the predecessor
            EndValuePredecessor = actDur.loc[predvalue]['Activity End']
            NewStartValue = EndValuePredecessor + 1
            CurrentStartValue = actDur.loc[i]['Activity Start']
            NewAdjStartValue = max(NewStartValue, CurrentStartValue)
            NewEndValue = NewAdjStartValue + actDur.loc[i]['Activity Duration']
            actDur.loc[i,'Activity Start'] = NewAdjStartValue
            actDur.loc[i,'Activity End'] = NewEndValue - 1
        else:
            continue
    
for i in range(0,nbAct,1):
    for j in range(0,width,1):
        startValue = actDur.loc[i]['Activity Start']
        endValue = actDur.loc[i]['Activity End']
        if j >= startValue and j <= endValue:
            ganttchart.loc[i][j] = 1

ganttchart = ganttchart.astype(float)
ganttchart

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,82,83,84,85,86,87,88,89,90,91
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
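
The scheduling rule used for the chart (an activity starts one period after its latest-finishing predecessor ends) can be sketched without pandas. The toy network and durations below are hypothetical, and the sketch assumes topologically ordered IDs. Periods are numbered from 1, so a dummy with duration 0 starts in period 1 and ends in period 0.

```python
# Hypothetical toy network; 0 and 6 are the dummy start and end activities
durations = {0: 0, 1: 3, 2: 2, 3: 4, 4: 1, 5: 2, 6: 0}
successors = {0: [1, 2], 1: [3], 2: [3, 4], 3: [5], 4: [5], 5: [6], 6: []}

# Build the predecessor lists by inverting the successor lists
predecessors = {act: [] for act in successors}
for act in sorted(successors):
    for suc in successors[act]:
        predecessors[suc].append(act)

# Earliest start = period after the latest end among the predecessors
start, end = {}, {}
for act in sorted(successors):
    start[act] = max([end[p] for p in predecessors[act]] or [0]) + 1
    end[act] = start[act] + durations[act] - 1

# One text row per non-dummy activity: '#' while it is active, '.' otherwise
horizon = max(end.values())
rows = {}
for act in range(1, len(successors) - 1):
    rows[act] = ''.join('#' if start[act] <= t <= end[act] else '.'
                        for t in range(1, horizon + 1))
    print(act, rows[act])
```

The last activity ends in period 9, which matches the longest path through this toy network (3 + 4 + 2).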


**END OF STANDARD PROJECT NETWORK**