# 2020-21 S2 Portfolio-level Utilizations (June - Oct; filtered)
Utilization is calculated at the portfolio level as the total billable hours divided by a weighted FTE, i.e., the FTE hours scaled by the proportion of the company's total hours spent on that portfolio.

To calculate utilization at the portfolio level, information required includes:
* **org_bill_hrs**: Total billable hours for the portfolio
* **org_total_hrs**: Total hours (billable and non-billable) for the portfolio (e.g., Planning & Ops, HR, General)
* **fte_hrs**: FTE hours for the company (i.e., total workable days * number of employees each day * 8 hours per day)
* **total_hrs**: Total hours for the company (i.e., total of all hours worked)
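
The four inputs above combine as sketched below. This is a minimal illustration rather than the workbook's own code: the function name `portfolio_utilization` is hypothetical, and the example values are the Habitat totals computed in the cells further down.

```python
def portfolio_utilization(org_bill_hrs, org_total_hrs, fte_hrs, total_hrs):
    """Billable hours divided by FTE hours weighted by the portfolio's share of all hours."""
    prop_to_org = org_total_hrs / total_hrs   # share of logged hours spent on the portfolio
    weighted_fte = prop_to_org * fte_hrs      # FTE hours attributable to the portfolio
    return org_bill_hrs / weighted_fte


# Habitat totals as reported later in this workbook
example = portfolio_utilization(2455.95, 3631.66, 9896, 10243.78)
print(round(example, 4))
```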

This workbook evaluates portfolio-level utilizations by first filtering by team member (i.e., excluding employees who did not log any hours to the portfolio). See *2020-21 S2 Portfolio-level Utilizations* for comparison without filtering.

In [1]:
import pandas as pd
import numpy as np
portfolio = 'Habitat'

In [2]:
# read in data
util_df = pd.read_csv(r'C:\Users\Erik\Downloads\Utilization Tabular (12).csv', sep='\t',
                       encoding='utf_16_le')
util_df['Hours Date'] = pd.to_datetime(util_df['Hours Date'])
# note: both bounds are exclusive, so entries dated 2020-06-01 or 2020-10-31 are dropped
filt = (util_df['Hours Date'] < pd.to_datetime('2020-10-31')) & (util_df['Hours Date'] > pd.to_datetime('2020-06-01'))
util_df = util_df[filt]
org_df = pd.read_csv(r'C:\Users\Erik\Downloads\Organizations.csv', sep='\t',
                       encoding='utf_16_le')
emp_df = pd.read_csv(r'C:\Users\Erik\Downloads\Employees (14).csv', sep='\t',
                       encoding='utf_16_le')
emp_df['Hire date'] = pd.to_datetime(emp_df['Hire date'])
emp_df['Termination date'] = pd.to_datetime(emp_df['Termination date'])
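
The date filter in the cell above uses strict comparisons, so the first and last day of the window are excluded. The sketch below contrasts that with an inclusive filter on a hypothetical toy table (the `toy` frame and its values are made up for illustration); which behavior is wanted depends on how the semester boundaries are defined.

```python
import pandas as pd

# hypothetical hours entries, not the real export
toy = pd.DataFrame({
    'Hours Date': pd.to_datetime(['2020-06-01', '2020-06-02', '2020-10-30',
                                  '2020-10-31', '2020-11-02']),
    'Entered Hours': [8.0, 7.5, 8.0, 2.0, 8.0],
})

start, end = pd.Timestamp('2020-06-01'), pd.Timestamp('2020-10-31')

# strict comparisons, as in the cell above: both boundary dates are excluded
exclusive = toy[(toy['Hours Date'] > start) & (toy['Hours Date'] < end)]

# Series.between includes both endpoints by default
inclusive = toy[toy['Hours Date'].between(start, end)]

print(len(exclusive), len(inclusive))
```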

In [3]:
util_df.head()

Unnamed: 0,Employee ID,Last Name,First Name,Work Schedule Description,Org Name,Project ID,Project Name,User Defined Code 3,Hours Date,Entered Hours,Approved Hours,Comments
3,100001.0,Alexandrovich,Andrew,Standard,Company Health,INDR.001.001.01,Planning and Ops Gen Intl,OVH,2020-06-02,1.5,0,
4,100001.0,Alexandrovich,Andrew,Standard,Company Health,INDR.001.009.01,Planning and Ops CH,G&A,2020-06-02,7.5,0,
5,100001.0,Alexandrovich,Andrew,Standard,Company Health,INDR.001.009.01,Planning and Ops CH,G&A,2020-06-03,5.86,0,
6,100001.0,Alexandrovich,Andrew,Standard,Company Health,INDR.002.009.01,HR & Recruiting CH,G&A,2020-06-03,5.0,0,
7,100001.0,Alexandrovich,Andrew,Standard,Company Health,INDR.005.001.01,R&D General Intl,IRD,2020-06-03,1.67,0,


In [4]:
org_df.head()

Unnamed: 0,Project ID,Project Name,Organization ID,Organization Name,Level Number
0,1001,USAID Measuring Impact II,1.01.01.01,General Intl,4
1,1001.AFR,BI-AFR,1.01.01.04,Africa,4
2,1001.AFR.001,BI-AFR,1.01.01.04,Africa,4
3,1001.AFR.001.01,16.0.AFR_BuyIn_Mgmt,1.01.01.04,Africa,4
4,1001.AFR.001.02,16.0.AFR_Zambia FS,1.01.01.04,Africa,4


In [5]:
# confirm Project ID is unique
len(org_df) == len(org_df['Project ID'].unique())

True

In [6]:
# merge hours entries and organizations on their shared columns
df = pd.merge(util_df, org_df, how='left', on=['Project ID', 'Project Name'])
df.head()

Unnamed: 0,Employee ID,Last Name,First Name,Work Schedule Description,Org Name,Project ID,Project Name,User Defined Code 3,Hours Date,Entered Hours,Approved Hours,Comments,Organization ID,Organization Name,Level Number
0,100001.0,Alexandrovich,Andrew,Standard,Company Health,INDR.001.001.01,Planning and Ops Gen Intl,OVH,2020-06-02,1.5,0,,1.01.01.01,General Intl,4.0
1,100001.0,Alexandrovich,Andrew,Standard,Company Health,INDR.001.009.01,Planning and Ops CH,G&A,2020-06-02,7.5,0,,1.01.90.01,HR and Operations,4.0
2,100001.0,Alexandrovich,Andrew,Standard,Company Health,INDR.001.009.01,Planning and Ops CH,G&A,2020-06-03,5.86,0,,1.01.90.01,HR and Operations,4.0
3,100001.0,Alexandrovich,Andrew,Standard,Company Health,INDR.002.009.01,HR & Recruiting CH,G&A,2020-06-03,5.0,0,,1.01.90.01,HR and Operations,4.0
4,100001.0,Alexandrovich,Andrew,Standard,Company Health,INDR.005.001.01,R&D General Intl,IRD,2020-06-03,1.67,0,,1.01.01.01,General Intl,4.0


In [7]:
# confirm merge did not add new time entries
len(util_df) - len(df)

0
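
The uniqueness and row-count checks above can also be folded into the merge itself. The sketch below uses hypothetical miniature tables (`hours` and `orgs` stand in for `util_df` and `org_df`); `validate` and `indicator` are standard `pd.merge` parameters.

```python
import pandas as pd

# hypothetical miniature versions of util_df and org_df
hours = pd.DataFrame({
    'Project ID': ['A.01', 'A.01', 'B.02', 'C.03'],
    'Entered Hours': [1.5, 7.5, 5.0, 2.0],
})
orgs = pd.DataFrame({
    'Project ID': ['A.01', 'B.02'],
    'Organization Name': ['Habitat', 'Africa'],
})

# validate='many_to_one' raises MergeError if a Project ID repeats in orgs;
# indicator=True adds a _merge column showing which rows found a match
merged = pd.merge(hours, orgs, how='left', on='Project ID',
                  validate='many_to_one', indicator=True)

# rows whose project has no organization would otherwise drop out silently
unmatched = merged[merged['_merge'] == 'left_only']
print(len(merged) == len(hours), unmatched['Project ID'].tolist())
```

Listing the unmatched rows on the full merged table is also a more informative null-organization check than one run on a table that has already been filtered to a single organization.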

In [8]:
df.columns

Index(['Employee ID', 'Last Name', 'First Name', 'Work Schedule Description',
       'Org Name', 'Project ID', 'Project Name', 'User Defined Code 3',
       'Hours Date', 'Entered Hours', 'Approved Hours', 'Comments',
       'Organization ID', 'Organization Name', 'Level Number'],
      dtype='object')

## Filter for Habitat

In [9]:
filt = df['Organization Name'] == portfolio
hab_df = df.loc[filt]
hab_df.head()

Unnamed: 0,Employee ID,Last Name,First Name,Work Schedule Description,Org Name,Project ID,Project Name,User Defined Code 3,Hours Date,Entered Hours,Approved Hours,Comments,Organization ID,Organization Name,Level Number
6093,100022.0,Anderson,Erik,Standard,Domestic,INDR.001.002.02,Planning and Ops Habitat,OVH,2020-06-02,1.23,0,,1.01.02.01,Habitat,4.0
6095,100022.0,Anderson,Erik,Standard,Domestic,INDR.003.002.02,General OH Habitat,OVH,2020-06-02,1.0,0,,1.01.02.01,Habitat,4.0
6096,100022.0,Anderson,Erik,Standard,Domestic,INDR.005.002.02,R&D Habitat,IRD,2020-06-02,2.33,0,,1.01.02.01,Habitat,4.0
6099,100022.0,Anderson,Erik,Standard,Domestic,2005.UNB,NV CCS Unbillable,UNA,2020-06-03,1.0,0,present credit Summary tool,1.01.02.01,Habitat,4.0
6100,100022.0,Anderson,Erik,Standard,Domestic,INDR.001.002.02,Planning and Ops Habitat,OVH,2020-06-03,0.82,0,,1.01.02.01,Habitat,4.0


In [10]:
hab_df['Organization Name'].unique()

array(['Habitat'], dtype=object)

In [11]:
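# check for null organizations
# (hab_df is already filtered to one organization, so this is empty by construction;
# run the same check against df to find entries that did not match an organization)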
filt = hab_df['Organization Name'].isnull()
no_org_df = hab_df.loc[filt]
no_org_df['Project Name'].unique()

array([], dtype=object)

In [12]:
no_org_df['Entered Hours'].sum()

0.0

In [13]:
# get list of employees to filter emp_df by
emps = hab_df['Employee ID'].unique()
emps

array([100022., 100023., 100025., 100026., 100029., 100038., 100041.,
       100047., 100067., 100074., 100076., 100092.])

In [14]:
emp_df.head()

Unnamed: 0,Employee ID,Last Name,First Name,Middle Initial,E-mail Address,Active Flag,Hire date,Termination date,Work Schedule,Work Schedule Description,Default Org
0,100041,Abragan,Maria Celes,L,mabragan@enviroincentives.com,Y,2019-04-08,NaT,STD,Standard,1.01.01
1,100003,Ajroud,Brittany,N,bajroud@enviroincentives.com,Y,2016-10-18,NaT,STD,Standard,1.01.01
2,100001,Alexandrovich,Andrew,,andrew@enviroincentives.com,Y,2010-04-05,NaT,STD,Standard,1.01.90
3,100022,Anderson,Erik,T,eanderson@enviroincentives.com,Y,2014-03-17,NaT,STD,Standard,1.01.02
4,100027,Armanino,Molly,,marmanino@enviroincentives.com,N,2017-06-18,2019-12-14,,,


In [15]:
filt = emp_df['Employee ID'].isin(emps)
emp_df = emp_df.loc[filt]
emp_df.head()

Unnamed: 0,Employee ID,Last Name,First Name,Middle Initial,E-mail Address,Active Flag,Hire date,Termination date,Work Schedule,Work Schedule Description,Default Org
0,100041,Abragan,Maria Celes,L,mabragan@enviroincentives.com,Y,2019-04-08,NaT,STD,Standard,1.01.01
3,100022,Anderson,Erik,T,eanderson@enviroincentives.com,Y,2014-03-17,NaT,STD,Standard,1.01.02
9,100023,Boysen,Kristen,N,kboysen@enviroincentives.com,Y,2016-06-17,NaT,STD,Standard,1.01.02
10,100038,Brock,Cameryn,J,cbrock@enviroincentives.com,Y,2020-06-15,NaT,STD,Standard,1.01.02.01
12,100067,Castillo Ferri,Mauricio,R,mcastilloferri@enviroincentives.com,Y,2019-11-18,NaT,STD,Standard,1.01.90


In [16]:
# which employees billed to the portfolio?
emp_df['Last Name'].unique()

array(['Abragan', 'Anderson', 'Boysen', 'Brock', 'Castillo Ferri',
       'Connolly', 'Daniels', 'Mirghani', 'Motlow', 'Riley', 'Shay',
       'Sokulsky'], dtype=object)

In [17]:
# check to make sure that Abragan did bill to Habitat
filt = hab_df['Last Name'] == 'Abragan'
hab_df.loc[filt]

Unnamed: 0,Employee ID,Last Name,First Name,Work Schedule Description,Org Name,Project ID,Project Name,User Defined Code 3,Hours Date,Entered Hours,Approved Hours,Comments,Organization ID,Organization Name,Level Number
11648,100041.0,Abragan,Maria Celes,Standard,International,INDR.002.002.02,HR & Recruiting Habitat,OVH,2020-06-15,1.3,0,,1.01.02.01,Habitat,4.0


## fte_hrs

In [18]:
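# FTE hours: business days between each employee's start and end dates within the semester, times 8
# (part-time and very part-time status, e.g., CB, is ignored; busday_count excludes the end date
# and does not remove holidays)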
sem_start = pd.to_datetime('2020-06-01')
sem_end = pd.to_datetime('2020-10-31')

fte_df = emp_df[['Employee ID', 'Hire date', 'Termination date']].copy()

def update_start(hire_date):
    if hire_date < sem_start:
        return sem_start
    elif hire_date > sem_end:
        return sem_end
    else:
        return hire_date
    
def update_end(termination_date):
    if pd.isnull(termination_date):
        return sem_end
    if termination_date > sem_end:
        return sem_end
    elif termination_date < sem_start:
        return sem_start
    else:
        return termination_date

fte_df['sem_start'] = fte_df['Hire date'].apply(update_start)
fte_df['sem_end'] = fte_df['Termination date'].apply(update_end)
fte_df['bushrs'] = np.busday_count(fte_df['sem_start'].dt.date, fte_df['sem_end'].dt.date) * 8

fte_df.head()

Unnamed: 0,Employee ID,Hire date,Termination date,sem_start,sem_end,bushrs
0,100041,2019-04-08,NaT,2020-06-01,2020-10-31,880
3,100022,2014-03-17,NaT,2020-06-01,2020-10-31,880
9,100023,2016-06-17,NaT,2020-06-01,2020-10-31,880
10,100038,2020-06-15,NaT,2020-06-15,2020-10-31,800
12,100067,2019-11-18,NaT,2020-06-01,2020-10-31,880


In [19]:
fte_hrs = fte_df['bushrs'].sum()
fte_hrs

9896
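
The clamping done by `update_start` and `update_end` can also be written without `apply`. The following is a sketch on three hypothetical employees (the `toy` frame is made up), not a replacement for the cell above; it converts the dates to `datetime64[D]` explicitly before calling `np.busday_count`.

```python
import numpy as np
import pandas as pd

sem_start = pd.Timestamp('2020-06-01')
sem_end = pd.Timestamp('2020-10-31')

# hypothetical employees: full semester, mid-semester hire, mid-semester termination
toy = pd.DataFrame({
    'Hire date': pd.to_datetime(['2014-03-17', '2020-06-15', '2019-01-07']),
    'Termination date': pd.to_datetime([None, None, '2020-10-15']),
})

# clamp both dates into the semester; a missing termination date means still employed
start = toy['Hire date'].clip(lower=sem_start, upper=sem_end)
end = toy['Termination date'].fillna(sem_end).clip(lower=sem_start, upper=sem_end)

# busday_count excludes the end date and knows nothing about holidays
days = np.busday_count(start.values.astype('datetime64[D]'),
                       end.values.astype('datetime64[D]'))
toy['bushrs'] = days * 8
print(toy['bushrs'].tolist())
```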

## total_hrs

In [20]:
# first filter by emp
filt = df['Employee ID'].isin(emps)
df_select_emp = df.loc[filt]

# total hours is sum of all hours for the included employees 
total_hrs = df_select_emp['Entered Hours'].sum()
total_hrs

10243.78

In [21]:
# what is this group's FTE?
total_hrs / fte_hrs

1.0351434923201295

In [22]:
# review fte by employee based on expected fte hours as calculated
hrs_by_emp = df.groupby('Employee ID')['Entered Hours'].sum()
hrs_by_emp = pd.merge(hrs_by_emp, fte_df, left_index=True, right_on='Employee ID', how='left')
hrs_by_emp['fte'] = hrs_by_emp['Entered Hours'] / hrs_by_emp['bushrs']
hrs_by_emp = pd.merge(hrs_by_emp, emp_df, on='Employee ID')
hrs_by_emp.loc[:, ['Last Name', 'First Name', 'Entered Hours', 'sem_start', 'sem_end', 'bushrs', 'fte']].sort_values('fte', ascending=False)

Unnamed: 0,Last Name,First Name,Entered Hours,sem_start,sem_end,bushrs,fte
7,Shay,Arica,1036.07,2020-06-01,2020-10-31,880.0,1.177352
1,Boysen,Kristen,949.6,2020-06-01,2020-10-31,880.0,1.079091
3,Riley,Kathryn,941.96,2020-06-01,2020-10-31,880.0,1.070409
8,Castillo Ferri,Mauricio,940.88,2020-06-01,2020-10-31,880.0,1.069182
10,Daniels,Molly,939.49,2020-06-01,2020-10-31,880.0,1.067602
2,Motlow,Mary-Sophia,937.75,2020-06-01,2020-10-31,880.0,1.065625
0,Anderson,Erik,919.22,2020-06-01,2020-10-31,880.0,1.044568
9,Mirghani,Izzie,792.44,2020-06-01,2020-10-15,784.0,1.010765
6,Abragan,Maria Celes,886.15,2020-06-01,2020-10-31,880.0,1.006989
4,Sokulsky,Jeremy,882.24,2020-06-01,2020-10-31,880.0,1.002545


## org_total_hrs

In [23]:
# calculate total hours by organization
org_total_hrs = hab_df['Entered Hours'].sum()
org_total_hrs

3631.66

In [24]:
# confirm total hours with full hours entries table
df.groupby('Organization Name')['Entered Hours'].sum()[portfolio]

3631.660000000001
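
The two totals differ only in the last decimal places because floating-point sums depend on the order of addition. A small sketch of how such a cross-check can be made explicit, using the two values printed above:

```python
import math

direct_total = 3631.66              # sum over the filtered portfolio table
grouped_total = 3631.660000000001   # same total via groupby on the full table

exact_match = direct_total == grouped_total               # exact comparison
close_match = math.isclose(direct_total, grouped_total)   # within a relative tolerance
print(exact_match, close_match)
```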

## org_bill_hrs

In [25]:
# calculate billable hours by organization
filt = hab_df['User Defined Code 3'] == 'SRV'
org_bill_hrs = hab_df.loc[filt, 'Entered Hours'].sum()
org_bill_hrs

2455.95

In [26]:
# confirm billable hours with full hours entries table
filt = df['User Defined Code 3'] == 'SRV'
org_bill_hrs = df.loc[filt].groupby('Organization Name')['Entered Hours'].sum()[portfolio]
org_bill_hrs

2455.9500000000044

## Calculations
### Proportion to organization
prop_to_org = org_total_hrs / total_hrs

In [27]:
# divide org total hrs by total hours to get proportion
prop_to_org = org_total_hrs / total_hrs
prop_to_org

0.3545234278752569

This is where the math starts to diverge between the whole-company calculation and the subgroup calculation. This is not unexpected, since this group would be expected to spend a higher proportion of its time on the organization than the company as a whole.

### Weighted FTE
weighted_fte = prop_to_org * fte_hrs

In [28]:
# weight fte by prop to org
weighted_fte = prop_to_org * fte_hrs
weighted_fte

3508.3638422535423

Here, the two calculations become directly comparable again, since both express the weighted FTE for the same portfolio in hours. The weighted FTE for Habitat calculated across the company as a whole is 3,861, rather than the 3,508 calculated here. The driver of the difference in utilization is likely to be found here.

Both prop_to_org and fte_hrs are potential contributors.

To achieve the same result, total_hrs and fte_hrs would need to be scaled by the same factor when subsetting by portfolio, so that the product of prop_to_org and fte_hrs is unchanged.

fte_hrs is simply the person-hours available over the period, for either the group that billed to Habitat or the company as a whole.
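
A short numeric sketch of this comparison follows. The subgroup totals are the ones computed in this workbook; the figures 47402.56 and 50400 are assumed to be the company-wide total_hrs and fte_hrs from the unfiltered workbook, an assumption that is consistent with the 3,861 quoted above. The variable names are hypothetical.

```python
org_total_hrs = 3631.66   # Habitat hours, identical in both calculations

# subgroup: only employees who logged hours to Habitat
sub_total_hrs, sub_fte_hrs = 10243.78, 9896
# company-wide figures (assumed to come from the unfiltered workbook)
co_total_hrs, co_fte_hrs = 47402.56, 50400

sub_weighted_fte = org_total_hrs / sub_total_hrs * sub_fte_hrs
co_weighted_fte = org_total_hrs / co_total_hrs * co_fte_hrs

# the two agree only when total_hrs and fte_hrs shrink by the same factor
total_factor = sub_total_hrs / co_total_hrs
fte_factor = sub_fte_hrs / co_fte_hrs

print(round(sub_weighted_fte), round(co_weighted_fte))
print(round(total_factor, 3), round(fte_factor, 3))
```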

In [29]:
# 50400 appears to be the company-wide fte_hrs from the unfiltered workbook
fte_hrs / 50400

0.19634920634920636

prop_to_org is the proportion of its total hours that this group spent on the Habitat team. Adding or removing employees from this group will have a large effect on both prop_to_org and fte_hrs. For example, Maria logged only 1.3 hours to Habitat, yet she contributes a full 880 hours to the FTE expectation.

In [30]:
# 47402.56 appears to be the company-wide total_hrs from the unfiltered workbook
total_hrs / 47402.56

0.21610183078719802

Applying the ratio of these two ratios to the company-wide utilization should give the utilization as calculated for the Habitat team only.

In [31]:
# company-wide utilization (64%) scaled by the ratio of the two ratios above
64 * .216/.196

70.53061224489795

In [32]:
.216/.196

1.1020408163265305
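
The same scaling can be checked without rounding the intermediate ratios. This sketch again assumes that 47402.56 and 50400 are the company-wide total_hrs and fte_hrs; because the billable and total Habitat hours are the same in both calculations, the scaled company-wide utilization should match the subgroup utilization up to floating-point error.

```python
org_bill_hrs = 2455.95
org_total_hrs = 3631.66
co_total_hrs, co_fte_hrs = 47402.56, 50400     # assumed company-wide totals
sub_total_hrs, sub_fte_hrs = 10243.78, 9896    # subgroup totals from this workbook

# utilization computed against each population
co_util = org_bill_hrs / (org_total_hrs / co_total_hrs * co_fte_hrs)
sub_util = org_bill_hrs / (org_total_hrs / sub_total_hrs * sub_fte_hrs)

# ratio of the total_hrs ratio to the fte_hrs ratio
scale = (sub_total_hrs / co_total_hrs) / (sub_fte_hrs / co_fte_hrs)

print(round(co_util, 4), round(scale, 4), round(co_util * scale, 4), round(sub_util, 4))
```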

### Utilization
utilization = org_bill_hrs / weighted_fte

In [33]:
# utilization is billable hours divided by weighted fte
utilization = org_bill_hrs / weighted_fte
utilization

0.7000271666300333

In [34]:
# what proportion of hours went towards billable projects in this portfolio?
org_bill_hrs / total_hrs

0.23975036558770338

The utilization for Habitat as calculated across the entire company is 64%, somewhat lower than the 70% calculated here.

## Notes

In [35]:
# Note that Molly does not appear to be updated in the system
# (emp_df was already filtered to Habitat contributors in cell 15, so this lookup returns no rows)
filt = emp_df['Last Name'] == 'Armanino'
emp_df.loc[filt]

Unnamed: 0,Employee ID,Last Name,First Name,Middle Initial,E-mail Address,Active Flag,Hire date,Termination date,Work Schedule,Work Schedule Description,Default Org


In [36]:
# However she also does not appear to be billing
filt = util_df['Last Name'] == 'Armanino'
util_df.loc[filt]

Unnamed: 0,Employee ID,Last Name,First Name,Work Schedule Description,Org Name,Project ID,Project Name,User Defined Code 3,Hours Date,Entered Hours,Approved Hours,Comments


In [37]:
util_df['Last Name'].unique()

array(['Alexandrovich', 'Ajroud', 'Chandrasekaran', 'Cook', 'Dubois',
       'Flower', 'Gambrill', 'Gibert', 'Hardeman', 'Hicks', 'Hoye',
       'Lauck', 'Nease', 'Peabody', 'Present', 'Schueler', 'Anderson',
       'Boysen', 'Motlow', 'Riley', 'Sokulsky', 'Praul', 'Hansen',
       'Guetschow', 'Brock', 'Chesterman', 'Abragan', 'Uhl', 'Exline',
       'King', 'Chery', 'Shay', 'Baca', 'Martinez-Sanchez', 'Grange',
       'Castillo Ferri', 'Nico', 'Wolf', 'Mirghani', 'Motolinia',
       'Daniels', 'Durand', 'Haik', 'Bevins', 'Wong', 'Sarkisian',
       'Spencer', 'Ballard', 'Boutemy', 'Schmidt', 'Giannoni', 'Tripp',
       'Reilly', 'Witz', 'Fong', 'Connolly', 'Masood', 'Peimbert',
       'Owusu', 'Walter', 'Byenkya', 'Richards', 'Myers', 'Kapikinyu'],
      dtype=object)