# 2020-21 S1 Portfolio-level Utilizations (June - Oct)
Utilization is calculated at the portfolio level as the portfolio's total billable hours divided by a weighted FTE, where the company's FTE hours are weighted by the proportion of the company's logged time spent on that portfolio. Because Deltek was not available in April and May, hours from those months are excluded from the analysis.

To calculate utilization at the portfolio level, information required includes:
* **org_bill_hrs**: Total billable hours for the portfolio
* **org_total_hrs**: Total hours for the portfolio, billable plus non-billable (e.g., Planning & Ops, HR, General)
* **fte_hrs**: FTE hours for the company (i.e., total workable days * number of employees each day * 8 hours per day)
* **total_hrs**: Total hours for the company (i.e., total of all hours worked)
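The four quantities combine into the portfolio-level formula, sketched below as a small function. This is a minimal illustration, not the notebook's own code. The function name `portfolio_utilization` is hypothetical. The example inputs are the Africa totals and company totals computed later in this notebook.

```python
def portfolio_utilization(org_bill_hrs: float, org_total_hrs: float,
                          fte_hrs: float, total_hrs: float) -> float:
    """Hypothetical helper: billable hours over the portfolio's weighted FTE hours."""
    # share of all logged company hours that went to this portfolio
    prop_to_org = org_total_hrs / total_hrs
    # company FTE hours attributed to the portfolio in proportion to that share
    weighted_fte = prop_to_org * fte_hrs
    return org_bill_hrs / weighted_fte


# Africa: 1553.5 billable of 2137.52 total; company: 50400 FTE hours, 47402.56 logged
africa = portfolio_utilization(1553.5, 2137.52, 50400, 47402.56)
print(round(africa, 4))  # → 0.6836
```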

This workbook evaluates portfolio-level utilizations without first filtering by team member, that is, without excluding employees who did not log any hours to the portfolio. See *2020-21 S2 Portfolio-level Utilizations (filtered)* for comparison.

In [1]:
```python
import pandas as pd
import numpy as np
```

In [2]:
```python
# read in data (tab-separated, UTF-16 LE exports)
util_df = pd.read_csv(r'C:\Users\Erik\Downloads\Utilization Tabular (12).csv', sep='\t',
                      encoding='utf_16_le')
util_df['Hours Date'] = pd.to_datetime(util_df['Hours Date'])
# keep entries dated strictly between 2020-06-01 and 2020-10-31
# (the strict '>' drops entries dated 2020-06-01, a Monday, which the FTE calculation below counts)
filt = (util_df['Hours Date'] < pd.to_datetime('2020-10-31')) & (util_df['Hours Date'] > pd.to_datetime('2020-06-01'))
util_df = util_df[filt]
org_df = pd.read_csv(r'C:\Users\Erik\Downloads\Organizations.csv', sep='\t',
                     encoding='utf_16_le')
emp_df = pd.read_csv(r'C:\Users\Erik\Downloads\Employees (14).csv', sep='\t',
                     encoding='utf_16_le')
emp_df['Hire date'] = pd.to_datetime(emp_df['Hire date'])
emp_df['Termination date'] = pd.to_datetime(emp_df['Termination date'])
```
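The date filter in the cell above uses strict inequalities, so entries dated exactly 2020-06-01 are dropped. If the intent is to keep both endpoints of the semester, `Series.between` is one option, since it is inclusive on both ends by default. Below is a minimal sketch under that assumption. The helper name `filter_semester` and the toy rows are hypothetical.

```python
import pandas as pd

# semester window; Series.between keeps both endpoints by default
SEM_START = pd.Timestamp('2020-06-01')
SEM_END = pd.Timestamp('2020-10-31')


def filter_semester(util_df: pd.DataFrame, date_col: str = 'Hours Date') -> pd.DataFrame:
    """Hypothetical helper: keep rows whose date falls in [SEM_START, SEM_END]."""
    dates = pd.to_datetime(util_df[date_col])
    return util_df[dates.between(SEM_START, SEM_END)]


# toy time entries (hypothetical values)
toy = pd.DataFrame({
    'Hours Date': ['2020-05-29', '2020-06-01', '2020-10-30', '2020-10-31', '2020-11-02'],
    'Entered Hours': [8, 8, 8, 2, 8],
})
kept = filter_semester(toy)
print(kept['Hours Date'].tolist())  # → ['2020-06-01', '2020-10-30', '2020-10-31']
```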

In [3]:
```python
util_df.head()
```

|  | Employee ID | Last Name | First Name | Work Schedule Description | Org Name | Project ID | Project Name | User Defined Code 3 | Hours Date | Entered Hours | Approved Hours | Comments |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 100001.0 | Alexandrovich | Andrew | Standard | Company Health | INDR.001.001.01 | Planning and Ops Gen Intl | OVH | 2020-06-02 | 1.5 | 0 |  |
| 4 | 100001.0 | Alexandrovich | Andrew | Standard | Company Health | INDR.001.009.01 | Planning and Ops CH | G&A | 2020-06-02 | 7.5 | 0 |  |
| 5 | 100001.0 | Alexandrovich | Andrew | Standard | Company Health | INDR.001.009.01 | Planning and Ops CH | G&A | 2020-06-03 | 5.86 | 0 |  |
| 6 | 100001.0 | Alexandrovich | Andrew | Standard | Company Health | INDR.002.009.01 | HR & Recruiting CH | G&A | 2020-06-03 | 5.0 | 0 |  |
| 7 | 100001.0 | Alexandrovich | Andrew | Standard | Company Health | INDR.005.001.01 | R&D General Intl | IRD | 2020-06-03 | 1.67 | 0 |  |


In [4]:
```python
org_df.head()
```

|  | Project ID | Project Name | Organization ID | Organization Name | Level Number |
|---|---|---|---|---|---|
| 0 | 1001 | USAID Measuring Impact II | 1.01.01.01 | General Intl | 4 |
| 1 | 1001.AFR | BI-AFR | 1.01.01.04 | Africa | 4 |
| 2 | 1001.AFR.001 | BI-AFR | 1.01.01.04 | Africa | 4 |
| 3 | 1001.AFR.001.01 | 16.0.AFR_BuyIn_Mgmt | 1.01.01.04 | Africa | 4 |
| 4 | 1001.AFR.001.02 | 16.0.AFR_Zambia FS | 1.01.01.04 | Africa | 4 |


In [5]:
```python
# confirm Project ID is unique
len(org_df) == len(org_df['Project ID'].unique())
```

True

In [6]:
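```python
# merge hours entries and organizations (a left join keeps every time entry);
# with no `on=`, pandas joins on all shared columns: 'Project ID' and 'Project Name'
df = pd.merge(util_df, org_df, how='left')
df.head()
```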

|  | Employee ID | Last Name | First Name | Work Schedule Description | Org Name | Project ID | Project Name | User Defined Code 3 | Hours Date | Entered Hours | Approved Hours | Comments | Organization ID | Organization Name | Level Number |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 100001.0 | Alexandrovich | Andrew | Standard | Company Health | INDR.001.001.01 | Planning and Ops Gen Intl | OVH | 2020-06-02 | 1.5 | 0 |  | 1.01.01.01 | General Intl | 4.0 |
| 1 | 100001.0 | Alexandrovich | Andrew | Standard | Company Health | INDR.001.009.01 | Planning and Ops CH | G&A | 2020-06-02 | 7.5 | 0 |  | 1.01.90.01 | HR and Operations | 4.0 |
| 2 | 100001.0 | Alexandrovich | Andrew | Standard | Company Health | INDR.001.009.01 | Planning and Ops CH | G&A | 2020-06-03 | 5.86 | 0 |  | 1.01.90.01 | HR and Operations | 4.0 |
| 3 | 100001.0 | Alexandrovich | Andrew | Standard | Company Health | INDR.002.009.01 | HR & Recruiting CH | G&A | 2020-06-03 | 5.0 | 0 |  | 1.01.90.01 | HR and Operations | 4.0 |
| 4 | 100001.0 | Alexandrovich | Andrew | Standard | Company Health | INDR.005.001.01 | R&D General Intl | IRD | 2020-06-03 | 1.67 | 0 |  | 1.01.01.01 | General Intl | 4.0 |


In [9]:
```python
# confirm the merge did not add new time entries
len(util_df) - len(df) == 0
```

True
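The uniqueness check and the row-count check above can also be folded into the merge itself. Below is a minimal sketch on toy data with hypothetical values. `validate='many_to_one'` makes pandas raise a `MergeError` if a key repeats on the right side. `indicator=True` adds a `_merge` column that flags entries without a match.

The sketch joins on `'Project ID'` only. The original call passes no `on=`, so it joins on every shared column, and results can differ if project names disagree between the two exports.

```python
import pandas as pd

# toy stand-ins for util_df and org_df (hypothetical values)
util = pd.DataFrame({'Project ID': ['A.01', 'A.01', 'B.02', 'C.03'],
                     'Entered Hours': [4.0, 2.0, 8.0, 1.5]})
orgs = pd.DataFrame({'Project ID': ['A.01', 'B.02'],
                     'Organization Name': ['Africa', 'Water']})

# same check as len(orgs) == len(orgs['Project ID'].unique())
assert orgs['Project ID'].is_unique

# explicit key; validate raises if a Project ID repeats in orgs
merged = pd.merge(util, orgs, on='Project ID', how='left',
                  validate='many_to_one', indicator=True)

# time entries whose Project ID has no match in orgs
unmatched = merged[merged['_merge'] == 'left_only']
print(len(merged) == len(util), unmatched['Entered Hours'].sum())  # → True 1.5
```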

In [10]:
```python
df.columns
```

Index(['Employee ID', 'Last Name', 'First Name', 'Work Schedule Description',
       'Org Name', 'Project ID', 'Project Name', 'User Defined Code 3',
       'Hours Date', 'Entered Hours', 'Approved Hours', 'Comments',
       'Organization ID', 'Organization Name', 'Level Number'],
      dtype='object')

In [11]:
```python
df['Organization Name'].unique()
```

array(['General Intl', 'HR and Operations', 'Global Adaptive Managemen',
       'Latin America & the Carib', 'General Domestic',
       'Environmental Incentives', 'Comms & KM', 'HR & Ops General',
       'Contract&Fin Spec Initiat', 'HR & Ops Special Initiat', 'Water',
       'Africa', nan, 'Habitat', 'Marketing General'], dtype=object)

In [12]:
```python
# check for null organizations
filt = df['Organization Name'].isnull()
no_org_df = df.loc[filt]
no_org_df['Project Name'].unique()
```

array(['FAB Mozambique TA', 'FAB Zambia TA', 'FAB Uganda TA',
       'Cross-Miss Coord & Mgmt.', 'EI Transformation',
       'LAC Coordination', 'FAB Field Support TA', 'FAB Malawi TA',
       'FAB Senegal TA', 'FAB Tanzania TA', 'FAB Ghana TA',
       'RFS/C Resilience', 'Somalia Program'], dtype=object)

In [13]:
```python
no_org_df['Entered Hours'].sum()
```

1904.94
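A per-project breakdown shows which projects account for the unmapped hours. These rows are dropped from any `groupby('Organization Name')` total, because `groupby` excludes missing keys by default.

Below is a minimal sketch on toy rows with hypothetical values. It sums hours by project among entries with no organization and sorts the result.

```python
import pandas as pd

# toy stand-in for the merged df (hypothetical rows)
df = pd.DataFrame({
    'Project Name': ['FAB Ghana TA', 'FAB Ghana TA', 'Somalia Program', 'Water Project'],
    'Organization Name': [None, None, None, 'Water'],
    'Entered Hours': [3.0, 5.0, 16.0, 8.0],
})

# hours per project among entries that did not map to an organization
no_org_hrs = (df.loc[df['Organization Name'].isnull()]
                .groupby('Project Name')['Entered Hours']
                .sum()
                .sort_values(ascending=False))
print(no_org_hrs)
```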

In [14]:
```python
emp_df.head()
```

|  | Employee ID | Last Name | First Name | Middle Initial | E-mail Address | Active Flag | Hire date | Termination date | Work Schedule | Work Schedule Description | Default Org |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 100041 | Abragan | Maria Celes | L | mabragan@enviroincentives.com | Y | 2019-04-08 | NaT | STD | Standard | 1.01.01 |
| 1 | 100003 | Ajroud | Brittany | N | bajroud@enviroincentives.com | Y | 2016-10-18 | NaT | STD | Standard | 1.01.01 |
| 2 | 100001 | Alexandrovich | Andrew |  | andrew@enviroincentives.com | Y | 2010-04-05 | NaT | STD | Standard | 1.01.90 |
| 3 | 100022 | Anderson | Erik | T | eanderson@enviroincentives.com | Y | 2014-03-17 | NaT | STD | Standard | 1.01.02 |
| 4 | 100027 | Armanino | Molly |  | marmanino@enviroincentives.com | N | 2017-06-18 | 2019-12-14 |  |  |  |


## fte_hrs

In [15]:
```python
# fte hours: 8 hours per business day between each employee's start and end dates, clipped to
# the semester (part-time and very-part-time status, e.g., CB, is ignored: everyone counts as full time)
sem_start = pd.to_datetime('2020-06-01')
sem_end = pd.to_datetime('2020-10-31')

fte_df = emp_df[['Employee ID', 'Hire date', 'Termination date']].copy()

def update_start(hire_date):
    # clip the hire date to the semester window
    if hire_date < sem_start:
        return sem_start
    elif hire_date > sem_end:
        return sem_end
    else:
        return hire_date

def update_end(termination_date):
    # employees without a termination date are treated as active through sem_end
    if pd.isnull(termination_date):
        return sem_end
    if termination_date > sem_end:
        return sem_end
    elif termination_date < sem_start:
        return sem_start
    else:
        return termination_date

fte_df['sem_start'] = fte_df['Hire date'].apply(update_start)
fte_df['sem_end'] = fte_df['Termination date'].apply(update_end)
# np.busday_count counts weekdays in [start, end): the end date itself is not counted
fte_df['bushrs'] = np.busday_count(fte_df['sem_start'].dt.date, fte_df['sem_end'].dt.date) * 8

fte_df.head()
```

|  | Employee ID | Hire date | Termination date | sem_start | sem_end | bushrs |
|---|---|---|---|---|---|---|
| 0 | 100041 | 2019-04-08 | NaT | 2020-06-01 | 2020-10-31 | 880 |
| 1 | 100003 | 2016-10-18 | NaT | 2020-06-01 | 2020-10-31 | 880 |
| 2 | 100001 | 2010-04-05 | NaT | 2020-06-01 | 2020-10-31 | 880 |
| 3 | 100022 | 2014-03-17 | NaT | 2020-06-01 | 2020-10-31 | 880 |
| 4 | 100027 | 2017-06-18 | 2019-12-14 | 2020-06-01 | 2020-06-01 | 0 |
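The two `apply` helpers above clip dates to the semester one row at a time. The same clipping can be written with vectorized `Series.clip`. This is a minimal sketch, assuming every hire date is present; a missing hire date would make `np.busday_count` fail. The function name `business_hours` and the toy dates are hypothetical, chosen to mirror three rows seen in this notebook.

```python
import numpy as np
import pandas as pd

SEM_START = pd.Timestamp('2020-06-01')
SEM_END = pd.Timestamp('2020-10-31')


def business_hours(hire: pd.Series, term: pd.Series, hrs_per_day: int = 8) -> np.ndarray:
    """Hypothetical helper: hrs_per_day for each weekday between the clipped dates.

    np.busday_count counts weekdays in [start, end), so the end date is excluded.
    """
    start = hire.clip(lower=SEM_START, upper=SEM_END)
    # no termination date -> active through SEM_END
    end = term.fillna(SEM_END).clip(lower=SEM_START, upper=SEM_END)
    return np.busday_count(start.values.astype('datetime64[D]'),
                           end.values.astype('datetime64[D]')) * hrs_per_day


hire = pd.to_datetime(pd.Series(['2019-04-08', '2017-06-18', '2020-10-02']))
term = pd.to_datetime(pd.Series([None, '2019-12-14', None]))
print(business_hours(hire, term).tolist())  # → [880, 0, 168]
```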


In [16]:
```python
fte_hrs = fte_df['bushrs'].sum()
fte_hrs
```

50400

## total_hrs

In [17]:
```python
# total hours is the straight sum of all entered hours
total_hrs = df['Entered Hours'].sum()
total_hrs
```

47402.560000000005

In [18]:
```python
# the proportion of total hours worked is slightly less than full time, consistent with some employees being part time
total_hrs/fte_hrs
```

0.9405269841269842

In [19]:
```python
# review fte by employee: logged hours relative to the expected fte hours calculated above
# (select the column before summing; summing every column fails on text columns in pandas >= 2.0)
hrs_by_emp = df.groupby('Employee ID')['Entered Hours'].sum()
hrs_by_emp = pd.merge(hrs_by_emp, fte_df, left_index=True, right_on='Employee ID', how='left')
hrs_by_emp['fte'] = hrs_by_emp['Entered Hours'] / hrs_by_emp['bushrs']
hrs_by_emp = pd.merge(hrs_by_emp, emp_df, on='Employee ID')
hrs_by_emp.loc[:, ['Last Name', 'First Name', 'Entered Hours', 'sem_start', 'sem_end', 'bushrs', 'fte']].sort_values('fte', ascending=False)
```

|  | Last Name | First Name | Entered Hours | sem_start | sem_end | bushrs | fte |
|---|---|---|---|---|---|---|---|
| 64 | Byenkya | Tina | 296.00 | 2020-10-02 | 2020-10-31 | 168 | 1.761905 |
| 62 | Owusu | Philip | 364.00 | 2020-09-15 | 2020-10-31 | 272 | 1.338235 |
| 63 | Walter | Karen | 286.50 | 2020-09-22 | 2020-10-31 | 232 | 1.234914 |
| 13 | Hoye | Susan | 1062.80 | 2020-06-01 | 2020-10-31 | 880 | 1.207727 |
| 34 | Shay | Arica | 1036.07 | 2020-06-01 | 2020-10-31 | 880 | 1.177352 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 31 | Exline | Kelly | 779.10 | 2020-06-01 | 2020-10-31 | 880 | 0.885341 |
| 5 | Flower | Kathleen | 713.12 | 2020-06-01 | 2020-10-31 | 880 | 0.810364 |
| 27 | Brock | Cameryn | 625.98 | 2020-06-15 | 2020-10-31 | 800 | 0.782475 |
| 4 | Dubois | Natalie | 651.00 | 2020-06-01 | 2020-10-31 | 880 | 0.739773 |


In [20]:
```python
# what is happening with the high FTEs for some new employees?
filt = df['Last Name'].isin(['Byenkya'])
df[filt]
```

|  | Employee ID | Last Name | First Name | Work Schedule Description | Org Name | Project ID | Project Name | User Defined Code 3 | Hours Date | Entered Hours | Approved Hours | Comments | Organization ID | Organization Name | Level Number |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 22022 | 100097.0 | Byenkya | Tina | Standard Kenya | Africa | 1009.003.202.01 | Somalia Program | SRV | 2020-10-02 | 16.0 | 0 | Adding time after getting access to timesheet |  |  |  |
| 22023 | 100097.0 | Byenkya | Tina | Standard Kenya | Africa | 1009.003.202.01 | Somalia Program | SRV | 2020-10-05 | 16.0 | 0 | Adding time after getting access to timesheet |  |  |  |
| 22024 | 100097.0 | Byenkya | Tina | Standard Kenya | Africa | 1009.003.202.01 | Somalia Program | SRV | 2020-10-06 | 16.0 | 0 | Adding time after getting access to timesheet |  |  |  |
| 22025 | 100097.0 | Byenkya | Tina | Standard Kenya | Africa | 1009.003.202.01 | Somalia Program | SRV | 2020-10-07 | 16.0 | 0 | Adding time after getting access to timesheet |  |  |  |
| 22026 | 100097.0 | Byenkya | Tina | Standard Kenya | Africa | 1009.003.202.01 | Somalia Program | SRV | 2020-10-08 | 16.0 | 0 | Adding time after getting access to timesheet |  |  |  |
| 22027 | 100097.0 | Byenkya | Tina | Standard Kenya | Africa | 1009.003.202.01 | Somalia Program | SRV | 2020-10-09 | 16.0 | 0 | Adding time after getting access to timesheet |  |  |  |
| 22028 | 100097.0 | Byenkya | Tina | Standard Kenya | Africa | 1009.003.202.01 | Somalia Program | SRV | 2020-10-12 | 8.0 | 0 | Adding time after getting access to timesheet |  |  |  |
| 22029 | 100097.0 | Byenkya | Tina | Standard Kenya | Africa | 1009.003.202.01 | Somalia Program | SRV | 2020-10-12 | 8.0 | 0 | I worked on this day. Based on guidance from A... |  |  |  |
| 22030 | 100097.0 | Byenkya | Tina | Standard Kenya | Africa | 1009.003.202.01 | Somalia Program | SRV | 2020-10-13 | 16.0 | 0 | Adding time after getting access to timesheet |  |  |  |
| 22031 | 100097.0 | Byenkya | Tina | Standard Kenya | Africa | 1009.003.202.01 | Somalia Program | SRV | 2020-10-14 | 16.0 | 0 | Adding time after getting access to timesheet |  |  |  |
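Instead of inspecting employees one at a time, implausibly long days can be flagged across everyone at once. Summing per employee and per date also catches split entries, such as the two 8-hour rows dated 2020-10-12 in the output above.

Below is a minimal sketch on toy rows. The 12-hour threshold `MAX_DAILY_HRS` and the data are hypothetical.

```python
import pandas as pd

# toy stand-in for df (hypothetical entries)
df = pd.DataFrame({
    'Employee ID': [1, 1, 1, 2, 2],
    'Hours Date': pd.to_datetime(['2020-10-05', '2020-10-05', '2020-10-06',
                                  '2020-10-05', '2020-10-06']),
    'Entered Hours': [8.0, 8.0, 7.5, 16.0, 8.0],
})

MAX_DAILY_HRS = 12  # hypothetical cutoff for a suspicious day

# total hours per employee per day, then keep only the days above the cutoff
daily = df.groupby(['Employee ID', 'Hours Date'])['Entered Hours'].sum()
suspicious = daily[daily > MAX_DAILY_HRS]
print(suspicious)
```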


In [21]:
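```python
# drop Byenkya due to irregular time entries (too many 16-hour days)
# note: total_hrs and fte_hrs were computed above, before this drop, so they still include her
df = df[~filt]
df
```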

|  | Employee ID | Last Name | First Name | Work Schedule Description | Org Name | Project ID | Project Name | User Defined Code 3 | Hours Date | Entered Hours | Approved Hours | Comments | Organization ID | Organization Name | Level Number |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 100001.0 | Alexandrovich | Andrew | Standard | Company Health | INDR.001.001.01 | Planning and Ops Gen Intl | OVH | 2020-06-02 | 1.50 | 0 |  | 1.01.01.01 | General Intl | 4.0 |
| 1 | 100001.0 | Alexandrovich | Andrew | Standard | Company Health | INDR.001.009.01 | Planning and Ops CH | G&A | 2020-06-02 | 7.50 | 0 |  | 1.01.90.01 | HR and Operations | 4.0 |
| 2 | 100001.0 | Alexandrovich | Andrew | Standard | Company Health | INDR.001.009.01 | Planning and Ops CH | G&A | 2020-06-03 | 5.86 | 0 |  | 1.01.90.01 | HR and Operations | 4.0 |
| 3 | 100001.0 | Alexandrovich | Andrew | Standard | Company Health | INDR.002.009.01 | HR & Recruiting CH | G&A | 2020-06-03 | 5.00 | 0 |  | 1.01.90.01 | HR and Operations | 4.0 |
| 4 | 100001.0 | Alexandrovich | Andrew | Standard | Company Health | INDR.005.001.01 | R&D General Intl | IRD | 2020-06-03 | 1.67 | 0 |  | 1.01.01.01 | General Intl | 4.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 22100 | 100098.0 | Richards | Anjali | Standard | Global Adaptive Managemen | 1009.002.108.02 | RFS/C Resilience | SRV | 2020-10-30 | 5.00 | 0 | Buy-in support. |  |  |  |
| 22101 | 100099.0 | Myers | Tiffany | Standard | Global Adaptive Managemen | 1009.002.101.01 | AFR-SD EGEA | SRV | 2020-10-30 | 2.00 | 0 | First day onboarading | 1.01.01.02 | Global Adaptive Managemen | 4.0 |
| 22102 | 100099.0 | Myers | Tiffany | Standard | Global Adaptive Managemen | INDR.002.001.02 | HR & Recruiting Global AM | OVH | 2020-10-30 | 6.00 | 0 | HR onboarding | 1.01.01.02 | Global Adaptive Managemen | 4.0 |
| 22103 | 100100.0 | Kapikinyu | Takah | Standard | Global Adaptive Managemen | 1009.002.111.02 | BHA/DMEL Division | SRV | 2020-10-30 | 6.00 | 0 | Onboarding | 1.01.01.02 | Global Adaptive Managemen | 4.0 |


## org_total_hrs

In [22]:
```python
# calculate total hours by organization (rows with no organization are excluded by groupby)
org_total_hrs = df.groupby('Organization Name')['Entered Hours'].sum()
org_total_hrs
```

Organization Name
Africa                        2137.52
Comms & KM                    4623.13
Contract&Fin Spec Initiat       17.25
Environmental Incentives      4340.32
General Domestic               145.47
General Intl                  3655.95
Global Adaptive Managemen    11805.71
HR & Ops General               556.03
HR & Ops Special Initiat       538.68
HR and Operations             4034.58
Habitat                       3631.66
Latin America & the Carib     6345.47
Marketing General              340.72
Water                         3285.13
Name: Entered Hours, dtype: float64

## org_bill_hrs

In [23]:
```python
# calculate billable hours (User Defined Code 3 == 'SRV') by organization
filt = df['User Defined Code 3'] == 'SRV'
org_bill_hrs = df.loc[filt].groupby('Organization Name')['Entered Hours'].sum()
org_bill_hrs
```

Organization Name
Africa                        1553.50
Comms & KM                    3156.49
General Intl                   285.14
Global Adaptive Managemen    10368.61
Habitat                       2455.95
Latin America & the Carib     5659.26
Water                         1853.05
Name: Entered Hours, dtype: float64

In [27]:
```python
# what is billable in General Intl?
filt = (df['Organization Name'] == 'General Intl') & (df['User Defined Code 3'] == 'SRV')
df.loc[filt, 'Project Name'].unique()
```

array(['Walton-SLED Phase I (IC)', 'Walton-SLED Phase II (IC)'],
      dtype=object)

In [22]:
```python
# join to single df (inner join on the organization index)
hrs_df = pd.merge(org_bill_hrs, org_total_hrs, left_index=True, right_index=True, suffixes=('_bill', '_total'))
hrs_df
```

| Organization Name | Entered Hours_bill | Entered Hours_total |
|---|---|---|
| Africa | 1553.5 | 2137.52 |
| Comms & KM | 3156.49 | 4623.13 |
| General Intl | 285.14 | 3655.95 |
| Global Adaptive Managemen | 10368.61 | 11805.71 |
| Habitat | 2455.95 | 3631.66 |
| Latin America & the Carib | 5659.26 | 6345.47 |
| Water | 1853.05 | 3285.13 |
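Both totals can also come out of a single `groupby`, which avoids the merge. One difference matters: the inner merge above keeps only organizations with at least one SRV hour, whereas this version keeps every organization and reports 0 billable hours where there are none.

Below is a minimal sketch on toy rows. The values and the column names `bill_hrs` and `total_hrs` are hypothetical.

```python
import pandas as pd

# toy stand-in for df (hypothetical rows)
df = pd.DataFrame({
    'Organization Name': ['Africa', 'Africa', 'Water', 'HR and Operations'],
    'User Defined Code 3': ['SRV', 'OVH', 'SRV', 'G&A'],
    'Entered Hours': [6.0, 2.0, 5.0, 4.0],
})

# billable hours are the SRV entries; other codes contribute 0
totals = (df.assign(bill_hrs=df['Entered Hours'].where(df['User Defined Code 3'] == 'SRV', 0.0))
            .groupby('Organization Name')[['bill_hrs', 'Entered Hours']]
            .sum()
            .rename(columns={'Entered Hours': 'total_hrs'}))
print(totals)
```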


## Calculations
### Proportion to organization
`prop_to_org = org_total_hrs / total_hrs`
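As a worked example, this uses the Africa total (2137.52 hours) and the company total (47,402.56 hours) computed above. The result agrees with the 0.045093 reported for Africa in the next output.

$$
\text{prop\_to\_org}_{\text{Africa}} = \frac{2137.52}{47402.56} \approx 0.0451
$$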

In [23]:
```python
# divide org total hrs by total hours to get proportion
hrs_df['prop_to_org'] = hrs_df['Entered Hours_total'] / total_hrs
hrs_df
```

| Organization Name | Entered Hours_bill | Entered Hours_total | prop_to_org |
|---|---|---|---|
| Africa | 1553.5 | 2137.52 | 0.045093 |
| Comms & KM | 3156.49 | 4623.13 | 0.097529 |
| General Intl | 285.14 | 3655.95 | 0.077126 |
| Global Adaptive Managemen | 10368.61 | 11805.71 | 0.249052 |
| Habitat | 2455.95 | 3631.66 | 0.076613 |
| Latin America & the Carib | 5659.26 | 6345.47 | 0.133863 |
| Water | 1853.05 | 3285.13 | 0.069303 |


### Weighted FTE
`weighted_fte = prop_to_org * fte_hrs`
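For Africa, take `prop_to_org` ≈ 0.045093 from the table above and `fte_hrs` = 50,400. The portfolio's weighted FTE is then about 2,272.7 hours:

$$
\text{weighted\_fte}_{\text{Africa}} = 0.045093 \times 50400 \approx 2272.7 \text{ hours}
$$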

In [24]:
```python
# weight fte by prop to org
hrs_df['weighted_fte'] = hrs_df['prop_to_org'] * fte_hrs
hrs_df
```

| Organization Name | Entered Hours_bill | Entered Hours_total | prop_to_org | weighted_fte |
|---|---|---|---|---|
| Africa | 1553.5 | 2137.52 | 0.045093 | 2272.683332 |
| Comms & KM | 3156.49 | 4623.13 | 0.097529 | 4915.467688 |
| General Intl | 285.14 | 3655.95 | 0.077126 | 3887.129303 |
| Global Adaptive Managemen | 10368.61 | 11805.71 | 0.249052 | 12552.228909 |
| Habitat | 2455.95 | 3631.66 | 0.076613 | 3861.303356 |
| Latin America & the Carib | 5659.26 | 6345.47 | 0.133863 | 6746.717646 |
| Water | 1853.05 | 3285.13 | 0.069303 | 3492.860976 |


### Utilization
`utilization = org_bill_hrs / weighted_fte`
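Substituting the two definitions shows what the metric measures. It is the portfolio's billable share of its own logged hours, scaled by a single company-wide factor, `total_hrs / fte_hrs` ≈ 0.9405 (computed earlier). Every portfolio is scaled by that same factor, so portfolios rank the same way by utilization as by billable share.

$$
\text{utilization} = \frac{\text{org\_bill\_hrs}}{\dfrac{\text{org\_total\_hrs}}{\text{total\_hrs}} \cdot \text{fte\_hrs}} = \frac{\text{org\_bill\_hrs}}{\text{org\_total\_hrs}} \cdot \frac{\text{total\_hrs}}{\text{fte\_hrs}}
$$

For Africa: $(1553.5 / 2137.52)(47402.56 / 50400) \approx 0.7268 \times 0.9405 \approx 0.6836$.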

In [25]:
```python
# utilization is billable hours divided by weighted fte
hrs_df['utilization'] = hrs_df['Entered Hours_bill'] / hrs_df['weighted_fte']
hrs_df.to_csv(r'C:\Users\Erik\Downloads\portfolio_utilization.csv')
hrs_df
```

| Organization Name | Entered Hours_bill | Entered Hours_total | prop_to_org | weighted_fte | utilization |
|---|---|---|---|---|---|
| Africa | 1553.5 | 2137.52 | 0.045093 | 2272.683332 | 0.683553 |
| Comms & KM | 3156.49 | 4623.13 | 0.097529 | 4915.467688 | 0.642155 |
| General Intl | 285.14 | 3655.95 | 0.077126 | 3887.129303 | 0.073355 |
| Global Adaptive Managemen | 10368.61 | 11805.71 | 0.249052 | 12552.228909 | 0.826037 |
| Habitat | 2455.95 | 3631.66 | 0.076613 | 3861.303356 | 0.636042 |
| Latin America & the Carib | 5659.26 | 6345.47 | 0.133863 | 6746.717646 | 0.838817 |
| Water | 1853.05 | 3285.13 | 0.069303 | 3492.860976 | 0.530525 |


In [26]:
```python
# what proportion of hours went towards billable projects?
hrs_df['Entered Hours_bill'].sum() / total_hrs
```

0.53440151755517

In [27]:
```python
# what was the company-wide utilization?
hrs_df['Entered Hours_bill'].sum() / fte_hrs
```

0.5026190476190476
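The company-wide utilization is the billable proportion computed just before it, scaled by the same factor, `total_hrs / fte_hrs` ≈ 0.9405. The billable hours across the portfolios in `hrs_df` sum to 25,332.00.

$$
\frac{25332.00}{50400} = \frac{25332.00}{47402.56} \cdot \frac{47402.56}{50400} \approx 0.5344 \times 0.9405 \approx 0.5026
$$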

## Notes

In [28]:
```python
# Note that Molly Armanino's record does not appear to be up to date in the system
filt = emp_df['Last Name'] == 'Armanino'
emp_df.loc[filt]
```

|  | Employee ID | Last Name | First Name | Middle Initial | E-mail Address | Active Flag | Hire date | Termination date | Work Schedule | Work Schedule Description | Default Org |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 100027 | Armanino | Molly |  | marmanino@enviroincentives.com | N | 2017-06-18 | 2019-12-14 |  |  |  |


In [29]:
```python
# However, she also has no time entries in this period
filt = util_df['Last Name'] == 'Armanino'
util_df.loc[filt]
```

|  | Employee ID | Last Name | First Name | Work Schedule Description | Org Name | Project ID | Project Name | User Defined Code 3 | Hours Date | Entered Hours | Approved Hours | Comments |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
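An anti-join with `isin` lists every employee on the roster who has no time entries in the period, not just Armanino. The sketch casts the IDs before comparing, because the time entries store `Employee ID` as floats (e.g., 100001.0) while the employee list uses integers.

Below is a minimal sketch; the toy rows are hypothetical.

```python
import pandas as pd

# toy stand-ins for emp_df and util_df (hypothetical rows)
emp = pd.DataFrame({'Employee ID': [100001, 100027, 100041],
                    'Last Name': ['Alexandrovich', 'Armanino', 'Abragan']})
util = pd.DataFrame({'Employee ID': [100001.0, 100001.0, 100041.0],
                     'Entered Hours': [8.0, 4.0, 8.0]})

# IDs that logged any time in the period, cast from float to int to match emp
active_ids = util['Employee ID'].dropna().astype('int64')
no_entries = emp.loc[~emp['Employee ID'].isin(active_ids), 'Last Name']
print(no_entries.tolist())  # → ['Armanino']
```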


In [30]:
```python
util_df['Last Name'].unique()
```

array(['Alexandrovich', 'Ajroud', 'Chandrasekaran', 'Cook', 'Dubois',
       'Flower', 'Gambrill', 'Gibert', 'Hardeman', 'Hicks', 'Hoye',
       'Lauck', 'Nease', 'Peabody', 'Present', 'Schueler', 'Anderson',
       'Boysen', 'Motlow', 'Riley', 'Sokulsky', 'Praul', 'Hansen',
       'Guetschow', 'Brock', 'Chesterman', 'Abragan', 'Uhl', 'Exline',
       'King', 'Chery', 'Shay', 'Baca', 'Martinez-Sanchez', 'Grange',
       'Castillo Ferri', 'Nico', 'Wolf', 'Mirghani', 'Motolinia',
       'Daniels', 'Durand', 'Haik', 'Bevins', 'Wong', 'Sarkisian',
       'Spencer', 'Ballard', 'Boutemy', 'Schmidt', 'Giannoni', 'Tripp',
       'Reilly', 'Witz', 'Fong', 'Connolly', 'Masood', 'Peimbert',
       'Owusu', 'Walter', 'Byenkya', 'Richards', 'Myers', 'Kapikinyu'],
      dtype=object)