# 2020-21 S1 Portfolio-level Utilizations (June - Oct)
Utilization is calculated at the portfolio level as total billable hours divided by a weighted FTE, where the weight is the proportion of the company's total hours spent on that portfolio. Because Deltek was not available in April and May, hours from those months are excluded from the analysis.

To calculate utilization at the portfolio level, information required includes:
* **org_bill_hrs**: Total billable hours for the portfolio
* **org_total_hrs**: Total hours for the portfolio, both billable and non-billable (e.g., Planning & Ops, HR, General)
* **fte_hrs**: FTE hours for the company (i.e., total workable days * number of employees each day * 8 hours per day)
* **total_hrs**: Total hours for the company (i.e., total of all hours worked)
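
The four inputs above combine as in the following minimal sketch. The function name `portfolio_utilization` and the sample figures are illustrative assumptions, not values taken from the data.

```python
def portfolio_utilization(org_bill_hrs, org_total_hrs, fte_hrs, total_hrs):
    """Billable hours divided by the FTE hours weighted to the portfolio."""
    # share of all company hours that were logged to this portfolio
    prop_to_org = org_total_hrs / total_hrs
    # FTE hours attributed to the portfolio
    weighted_fte = prop_to_org * fte_hrs
    return org_bill_hrs / weighted_fte


# hypothetical figures, for illustration only
example = portfolio_utilization(org_bill_hrs=600.0, org_total_hrs=1000.0,
                                fte_hrs=12000.0, total_hrs=10000.0)
print(round(example, 3))
```

With these made-up inputs the portfolio accounts for a tenth of company hours, so it is assigned a tenth of the FTE hours, and its billable hours are measured against that share.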

This workbook evaluates portfolio-level utilizations without first filtering by team member; that is, employees who did not log any hours to a portfolio are not excluded. See *2020-21 S2 Portfolio-level Utilizations (filtered)* for comparison.

In [90]:
import pandas as pd
import numpy as np

In [91]:
# read in data
util_df = pd.read_csv(r'C:\Users\Erik\Downloads\Utilization Tabular (12).csv', sep='\t',
                       encoding='utf_16_le')
util_df['Hours Date'] = pd.to_datetime(util_df['Hours Date'])
# note: the strict comparisons exclude entries dated exactly 2020-06-01 or 2020-10-31
filt = (util_df['Hours Date'] < pd.to_datetime('2020-10-31')) & (util_df['Hours Date'] > pd.to_datetime('2020-06-01'))
util_df = util_df[filt]
org_df = pd.read_csv(r'C:\Users\Erik\Downloads\Organizations.csv', sep='\t',
                       encoding='utf_16_le')
emp_df = pd.read_csv(r'C:\Users\Erik\Downloads\Employees (14).csv', sep='\t',
                       encoding='utf_16_le')
emp_df['Hire date'] = pd.to_datetime(emp_df['Hire date'])
emp_df['Termination date'] = pd.to_datetime(emp_df['Termination date'])
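
The date filter in the cell above uses strict comparisons, so entries dated exactly on either boundary are left out. If the intent were to include both endpoints, one option is `Series.between`, which is inclusive on both sides by default. The sketch below compares the two on a small hypothetical frame; the sample rows are assumptions, not data from the export.

```python
import pandas as pd

# hypothetical entries straddling both ends of the period
hours = pd.DataFrame({
    'Hours Date': pd.to_datetime(['2020-06-01', '2020-06-02', '2020-10-31', '2020-11-01']),
    'Entered Hours': [8.0, 7.5, 2.0, 8.0],
})

start, end = pd.Timestamp('2020-06-01'), pd.Timestamp('2020-10-31')

# strict comparisons drop entries dated exactly on start or end
strict = hours[(hours['Hours Date'] > start) & (hours['Hours Date'] < end)]

# Series.between includes both endpoints by default
inclusive = hours[hours['Hours Date'].between(start, end)]

print(len(strict), len(inclusive))
```

Switching the workbook to the inclusive form would change the totals reported below, so it is shown here only as an alternative.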

In [92]:
util_df.head()

Unnamed: 0,Employee ID,Last Name,First Name,Work Schedule Description,Org Name,Project ID,Project Name,User Defined Code 3,Hours Date,Entered Hours,Approved Hours,Comments
3,100001.0,Alexandrovich,Andrew,Standard,Company Health,INDR.001.001.01,Planning and Ops Gen Intl,OVH,2020-06-02,1.5,0,
4,100001.0,Alexandrovich,Andrew,Standard,Company Health,INDR.001.009.01,Planning and Ops CH,G&A,2020-06-02,7.5,0,
5,100001.0,Alexandrovich,Andrew,Standard,Company Health,INDR.001.009.01,Planning and Ops CH,G&A,2020-06-03,5.86,0,
6,100001.0,Alexandrovich,Andrew,Standard,Company Health,INDR.002.009.01,HR & Recruiting CH,G&A,2020-06-03,5.0,0,
7,100001.0,Alexandrovich,Andrew,Standard,Company Health,INDR.005.001.01,R&D General Intl,IRD,2020-06-03,1.67,0,


In [93]:
org_df.head()

Unnamed: 0,Project ID,Project Name,Organization ID,Organization Name,Level Number
0,1001,USAID Measuring Impact II,1.01.01.01,General Intl,4
1,1001.AFR,BI-AFR,1.01.01.04,Africa,4
2,1001.AFR.001,BI-AFR,1.01.01.04,Africa,4
3,1001.AFR.001.01,16.0.AFR_BuyIn_Mgmt,1.01.01.04,Africa,4
4,1001.AFR.001.02,16.0.AFR_Zambia FS,1.01.01.04,Africa,4


In [94]:
# confirm Project ID is unique
len(org_df) == len(org_df['Project ID'].unique())

True

In [100]:
# merge hours entries and organizations
df = pd.merge(util_df, org_df, how='left', on='Project ID')
df['Project Name'] = df['Project Name_x']
df = df.drop(columns=['Project Name_x', 'Project Name_y'])
df.head()

Unnamed: 0,Employee ID,Last Name,First Name,Work Schedule Description,Org Name,Project ID,User Defined Code 3,Hours Date,Entered Hours,Approved Hours,Comments,Organization ID,Organization Name,Level Number,Project Name
0,100001.0,Alexandrovich,Andrew,Standard,Company Health,INDR.001.001.01,OVH,2020-06-02,1.5,0,,1.01.01.01,General Intl,4,Planning and Ops Gen Intl
1,100001.0,Alexandrovich,Andrew,Standard,Company Health,INDR.001.009.01,G&A,2020-06-02,7.5,0,,1.01.90.01,HR and Operations,4,Planning and Ops CH
2,100001.0,Alexandrovich,Andrew,Standard,Company Health,INDR.001.009.01,G&A,2020-06-03,5.86,0,,1.01.90.01,HR and Operations,4,Planning and Ops CH
3,100001.0,Alexandrovich,Andrew,Standard,Company Health,INDR.002.009.01,G&A,2020-06-03,5.0,0,,1.01.90.01,HR and Operations,4,HR & Recruiting CH
4,100001.0,Alexandrovich,Andrew,Standard,Company Health,INDR.005.001.01,IRD,2020-06-03,1.67,0,,1.01.01.01,General Intl,4,R&D General Intl


In [101]:
# confirm merge did not add new time entries
len(util_df) - len(df) == 0

True
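
The uniqueness and row-count checks above can also be folded into the merge itself. The sketch below, on hypothetical frames, uses the `validate` and `indicator` parameters of `pd.merge`: `validate='many_to_one'` raises a `MergeError` if the right-hand keys are not unique, and `indicator=True` adds a `_merge` column that marks rows with no match. The sample project IDs are assumptions.

```python
import pandas as pd
from pandas.errors import MergeError

# hypothetical hours entries and organization lookup
entries = pd.DataFrame({'Project ID': ['A.01', 'A.01', 'B.02', 'C.03'],
                        'Entered Hours': [1.5, 7.5, 5.0, 2.0]})
orgs = pd.DataFrame({'Project ID': ['A.01', 'B.02'],
                     'Organization Name': ['Africa', 'Water']})

# many_to_one: many entries per project, at most one organization row per project
merged = pd.merge(entries, orgs, how='left', on='Project ID',
                  validate='many_to_one', indicator=True)

# projects with hours but no organization row
unmatched = merged.loc[merged['_merge'] == 'left_only', 'Project ID'].tolist()

# a duplicated key on the right side is rejected instead of silently adding rows
dup_orgs = pd.concat([orgs, orgs.iloc[[0]]], ignore_index=True)
try:
    pd.merge(entries, dup_orgs, how='left', on='Project ID', validate='many_to_one')
    duplicate_detected = False
except MergeError:
    duplicate_detected = True
```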

In [102]:
df.columns

Index(['Employee ID', 'Last Name', 'First Name', 'Work Schedule Description',
       'Org Name', 'Project ID', 'User Defined Code 3', 'Hours Date',
       'Entered Hours', 'Approved Hours', 'Comments', 'Organization ID',
       'Organization Name', 'Level Number', 'Project Name'],
      dtype='object')

In [103]:
df['Organization Name'].unique()

array(['General Intl', 'HR and Operations', 'Global Adaptive Managemen',
       'Latin America & the Carib', 'General Domestic',
       'Environmental Incentives', 'Comms & KM', 'HR & Ops General',
       'Contract&Fin Spec Initiat', 'HR & Ops Special Initiat', 'Water',
       'Africa', 'Habitat', 'Marketing General'], dtype=object)

In [105]:
# check for null organizations
filt = df['Organization Name'].isnull()
no_org_df = df.loc[filt]
no_org_df['Project Name'].unique()

array([], dtype=object)

In [106]:
no_org_df['Entered Hours'].sum()

0.0

In [109]:
emp_df.head()

Unnamed: 0,Employee ID,Last Name,First Name,Middle Initial,E-mail Address,Active Flag,Hire date,Termination date,Work Schedule,Work Schedule Description,Default Org
0,100041,Abragan,Maria Celes,L,mabragan@enviroincentives.com,Y,2019-04-08,NaT,STD,Standard,1.01.01
1,100003,Ajroud,Brittany,N,bajroud@enviroincentives.com,Y,2016-10-18,NaT,STD,Standard,1.01.01
2,100001,Alexandrovich,Andrew,,andrew@enviroincentives.com,Y,2010-04-05,NaT,STD,Standard,1.01.90
3,100022,Anderson,Erik,T,eanderson@enviroincentives.com,Y,2014-03-17,NaT,STD,Standard,1.01.02
4,100027,Armanino,Molly,,marmanino@enviroincentives.com,N,2017-06-18,2019-12-14,,,


## fte_hrs

In [121]:
def get_bus_hrs(emp_df, start, end):
    """start and end as 'YYYY-MM-DD' strings"""
    start = pd.to_datetime(start)
    end = pd.to_datetime(end)
    
    fte_df = emp_df[['Employee ID', 'Hire date', 'Termination date']].copy()

    # clamp each employee's hire and termination dates to the [start, end] window
    def update_start(hire_date):
        if hire_date < start:
            return start
        elif hire_date > end:
            return end
        else:
            return hire_date

    def update_end(termination_date):
        if pd.isnull(termination_date):
            return end
        if termination_date > end:
            return end
        elif termination_date < start:
            return start
        else:
            return termination_date

    fte_df['sem_start'] = fte_df['Hire date'].apply(update_start)
    fte_df['sem_end'] = fte_df['Termination date'].apply(update_end)
    # np.busday_count excludes the end date itself
    fte_df['bushrs'] = np.busday_count(fte_df['sem_start'].dt.date, fte_df['sem_end'].dt.date) * 8
    
    fte_hrs = fte_df['bushrs'].sum()
    
    return fte_hrs
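
The same clamping can be written without row-wise `apply`. The sketch below is one possible vectorized variant, not the workbook's own method: it uses `Series.clip` to bound the dates and keeps the end-exclusive behaviour of `np.busday_count`. The function name `business_hours` and the sample employees are assumptions; the dates were chosen to resemble the kinds of rows shown in the outputs (a long-tenured employee, one terminated before the period, and a mid-period hire).

```python
import numpy as np
import pandas as pd


def business_hours(emp, start, end, hours_per_day=8):
    """Per-employee business hours within [start, end), clamped to employment dates."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    # hire dates before the window move up to start; later ones are capped at end
    first = emp['Hire date'].clip(lower=start, upper=end)
    # a missing termination date means the employee is still active
    last = emp['Termination date'].fillna(end).clip(lower=start, upper=end)
    days = np.busday_count(first.to_numpy().astype('datetime64[D]'),
                           last.to_numpy().astype('datetime64[D]'))
    return days * hours_per_day


# hypothetical employees
emp = pd.DataFrame({
    'Hire date': pd.to_datetime(['2019-04-08', '2017-06-18', '2020-06-15']),
    'Termination date': pd.to_datetime([None, '2019-12-14', None]),
})
per_employee = business_hours(emp, '2020-06-01', '2020-10-31')
print(per_employee.tolist())
```

An employee who left before the window has both dates clamped to `start`, which yields zero business days.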

In [122]:
# fte hours is total number of fte days by start and end dates (ignore part time and very part time (e.g., CB))
sem_start = pd.to_datetime('2020-06-01')
sem_end = pd.to_datetime('2020-10-31')

fte_df = emp_df[['Employee ID', 'Hire date', 'Termination date']].copy()

def update_start(hire_date):
    if hire_date < sem_start:
        return sem_start
    elif hire_date > sem_end:
        return sem_end
    else:
        return hire_date
    
def update_end(termination_date):
    if pd.isnull(termination_date):
        return sem_end
    if termination_date > sem_end:
        return sem_end
    elif termination_date < sem_start:
        return sem_start
    else:
        return termination_date

fte_df['sem_start'] = fte_df['Hire date'].apply(update_start)
fte_df['sem_end'] = fte_df['Termination date'].apply(update_end)
fte_df['bushrs'] = np.busday_count(fte_df['sem_start'].dt.date, fte_df['sem_end'].dt.date) * 8

fte_df.head()

Unnamed: 0,Employee ID,Hire date,Termination date,sem_start,sem_end,bushrs
0,100041,2019-04-08,NaT,2020-06-01,2020-10-31,880
1,100003,2016-10-18,NaT,2020-06-01,2020-10-31,880
2,100001,2010-04-05,NaT,2020-06-01,2020-10-31,880
3,100022,2014-03-17,NaT,2020-06-01,2020-10-31,880
4,100027,2017-06-18,2019-12-14,2020-06-01,2020-06-01,0


In [123]:
fte_hrs = get_bus_hrs(emp_df, '2020-06-01', '2020-10-31')
fte_hrs

50400

In [124]:
fte_hrs = fte_df['bushrs'].sum()
fte_hrs

50400

## total_hrs

In [125]:
def get_total_hrs(df, start, end):
    """start and end as 'YYYY-MM-DD' strings"""
    start = pd.to_datetime(start)
    end = pd.to_datetime(end)
    
    filt = (df['Hours Date'] >= start) & (df['Hours Date'] <= end)
    df = df.loc[filt]
    
    total_hrs = df['Entered Hours'].sum()
    
    return total_hrs

In [126]:
total_hrs = get_total_hrs(df, '2020-06-01', '2020-10-31')
total_hrs

47402.560000000005

In [127]:
# total hours is straight sum 
total_hrs = df['Entered Hours'].sum()
total_hrs

47402.560000000005

In [128]:
# the proportion of total hours worked is slightly less than full time, consistent with some employees being part time
total_hrs/fte_hrs

0.9405269841269842

In [129]:
# review fte by employee based on expected fte hours as calculated
hrs_by_emp = df.groupby('Employee ID')['Entered Hours'].sum()
hrs_by_emp = pd.merge(hrs_by_emp, fte_df, left_index=True, right_on='Employee ID', how='left')
hrs_by_emp['fte'] = hrs_by_emp['Entered Hours'] / hrs_by_emp['bushrs']
hrs_by_emp = pd.merge(hrs_by_emp, emp_df, on='Employee ID')
hrs_by_emp.loc[:, ['Last Name', 'First Name', 'Entered Hours', 'sem_start', 'sem_end', 'bushrs', 'fte']].sort_values('fte', ascending=False)

Unnamed: 0,Last Name,First Name,Entered Hours,sem_start,sem_end,bushrs,fte
64,Byenkya,Tina,296.00,2020-10-02,2020-10-31,168,1.761905
62,Owusu,Philip,364.00,2020-09-15,2020-10-31,272,1.338235
63,Walter,Karen,286.50,2020-09-22,2020-10-31,232,1.234914
13,Hoye,Susan,1062.80,2020-06-01,2020-10-31,880,1.207727
34,Shay,Arica,1036.07,2020-06-01,2020-10-31,880,1.177352
...,...,...,...,...,...,...,...
31,Exline,Kelly,779.10,2020-06-01,2020-10-31,880,0.885341
5,Flower,Kathleen,713.12,2020-06-01,2020-10-31,880,0.810364
27,Brock,Cameryn,625.98,2020-06-15,2020-10-31,800,0.782475
4,Dubois,Natalie,651.00,2020-06-01,2020-10-31,880,0.739773


In [130]:
# what is happening with high ftes for some new employees?
filt = df['Last Name'].isin(['Byenkya'])
df[filt]

Unnamed: 0,Employee ID,Last Name,First Name,Work Schedule Description,Org Name,Project ID,User Defined Code 3,Hours Date,Entered Hours,Approved Hours,Comments,Organization ID,Organization Name,Level Number,Project Name
22022,100097.0,Byenkya,Tina,Standard Kenya,Africa,1009.003.202.01,SRV,2020-10-02,16.0,0,Adding time after getting access to timesheet,1.01.01.02,Global Adaptive Managemen,4,Somalia Program
22023,100097.0,Byenkya,Tina,Standard Kenya,Africa,1009.003.202.01,SRV,2020-10-05,16.0,0,Adding time after getting access to timesheet,1.01.01.02,Global Adaptive Managemen,4,Somalia Program
22024,100097.0,Byenkya,Tina,Standard Kenya,Africa,1009.003.202.01,SRV,2020-10-06,16.0,0,Adding time after getting access to timesheet,1.01.01.02,Global Adaptive Managemen,4,Somalia Program
22025,100097.0,Byenkya,Tina,Standard Kenya,Africa,1009.003.202.01,SRV,2020-10-07,16.0,0,Adding time after getting access to timesheet,1.01.01.02,Global Adaptive Managemen,4,Somalia Program
22026,100097.0,Byenkya,Tina,Standard Kenya,Africa,1009.003.202.01,SRV,2020-10-08,16.0,0,Adding time after getting access to timesheet,1.01.01.02,Global Adaptive Managemen,4,Somalia Program
22027,100097.0,Byenkya,Tina,Standard Kenya,Africa,1009.003.202.01,SRV,2020-10-09,16.0,0,Adding time after getting access to timesheet,1.01.01.02,Global Adaptive Managemen,4,Somalia Program
22028,100097.0,Byenkya,Tina,Standard Kenya,Africa,1009.003.202.01,SRV,2020-10-12,8.0,0,Adding time after getting access to timesheet,1.01.01.02,Global Adaptive Managemen,4,Somalia Program
22029,100097.0,Byenkya,Tina,Standard Kenya,Africa,1009.003.202.01,SRV,2020-10-12,8.0,0,I worked on this day. Based on guidance from A...,1.01.01.02,Global Adaptive Managemen,4,Somalia Program
22030,100097.0,Byenkya,Tina,Standard Kenya,Africa,1009.003.202.01,SRV,2020-10-13,16.0,0,Adding time after getting access to timesheet,1.01.01.02,Global Adaptive Managemen,4,Somalia Program
22031,100097.0,Byenkya,Tina,Standard Kenya,Africa,1009.003.202.01,SRV,2020-10-14,16.0,0,Adding time after getting access to timesheet,1.01.01.02,Global Adaptive Managemen,4,Somalia Program


In [131]:
# drop Byenkya due to irregularities in data entries (too many 16 hour days)
df = df[~filt]
df

Unnamed: 0,Employee ID,Last Name,First Name,Work Schedule Description,Org Name,Project ID,User Defined Code 3,Hours Date,Entered Hours,Approved Hours,Comments,Organization ID,Organization Name,Level Number,Project Name
0,100001.0,Alexandrovich,Andrew,Standard,Company Health,INDR.001.001.01,OVH,2020-06-02,1.50,0,,1.01.01.01,General Intl,4,Planning and Ops Gen Intl
1,100001.0,Alexandrovich,Andrew,Standard,Company Health,INDR.001.009.01,G&A,2020-06-02,7.50,0,,1.01.90.01,HR and Operations,4,Planning and Ops CH
2,100001.0,Alexandrovich,Andrew,Standard,Company Health,INDR.001.009.01,G&A,2020-06-03,5.86,0,,1.01.90.01,HR and Operations,4,Planning and Ops CH
3,100001.0,Alexandrovich,Andrew,Standard,Company Health,INDR.002.009.01,G&A,2020-06-03,5.00,0,,1.01.90.01,HR and Operations,4,HR & Recruiting CH
4,100001.0,Alexandrovich,Andrew,Standard,Company Health,INDR.005.001.01,IRD,2020-06-03,1.67,0,,1.01.01.01,General Intl,4,R&D General Intl
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
22100,100098.0,Richards,Anjali,Standard,Global Adaptive Managemen,1009.002.108.02,SRV,2020-10-30,5.00,0,Buy-in support.,1.01.01.02,Global Adaptive Managemen,4,RFS/C Resilience
22101,100099.0,Myers,Tiffany,Standard,Global Adaptive Managemen,1009.002.101.01,SRV,2020-10-30,2.00,0,First day onboarading,1.01.01.02,Global Adaptive Managemen,4,AFR-SD EGEA
22102,100099.0,Myers,Tiffany,Standard,Global Adaptive Managemen,INDR.002.001.02,OVH,2020-10-30,6.00,0,HR onboarding,1.01.01.02,Global Adaptive Managemen,4,HR & Recruiting Global AM
22103,100100.0,Kapikinyu,Takah,Standard,Global Adaptive Managemen,1009.002.111.02,SRV,2020-10-30,6.00,0,Onboarding,1.01.01.02,Global Adaptive Managemen,4,BHA/DMEL Division
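
Irregular entries like these can be surfaced systematically rather than by inspecting one employee at a time. The following sketch sums hours per employee per day and flags days above a cutoff. The function name `flag_long_days`, the 12-hour cutoff, and the sample rows are all assumptions made for illustration.

```python
import pandas as pd


def flag_long_days(df, max_daily_hrs=12.0):
    """Return employee-days whose summed hours exceed max_daily_hrs."""
    daily = (df.groupby(['Employee ID', 'Hours Date'], as_index=False)['Entered Hours']
               .sum())
    return daily[daily['Entered Hours'] > max_daily_hrs]


# hypothetical entries: employee 1 logs two 8-hour entries on the same day
sample = pd.DataFrame({
    'Employee ID': [1, 1, 1, 2],
    'Hours Date': pd.to_datetime(['2020-10-12', '2020-10-12', '2020-10-13', '2020-10-12']),
    'Entered Hours': [8.0, 8.0, 7.5, 8.0],
})
flagged = flag_long_days(sample)
print(flagged)
```

Summing by day before comparing matters because a long day may be split across several entries, each of which looks ordinary on its own.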


## org_total_hrs

In [132]:
def get_org_total_hrs(df, start, end):
    """start and end as 'YYYY-MM-DD' strings
    returns a Series of total hours indexed by Organization Name"""
    start = pd.to_datetime(start)
    end = pd.to_datetime(end)
    
    filt = (df['Hours Date'] >= start) & (df['Hours Date'] <= end)
    df = df.loc[filt]
    
    org_total_hrs = df.groupby('Organization Name')['Entered Hours'].sum()
    
    return org_total_hrs

In [133]:
# calculate total hours by organization
org_total_hrs = df.groupby('Organization Name')['Entered Hours'].sum()
org_total_hrs

Organization Name
Africa                        2303.25
Comms & KM                    4623.13
Contract&Fin Spec Initiat       17.25
Environmental Incentives      4340.32
General Domestic               145.47
General Intl                  3655.95
Global Adaptive Managemen    12819.33
HR & Ops General               556.03
HR & Ops Special Initiat       744.58
HR and Operations             4034.58
Habitat                       3631.66
Latin America & the Carib     6609.16
Marketing General              340.72
Water                         3285.13
Name: Entered Hours, dtype: float64

In [134]:
org_total_hrs = get_org_total_hrs(df, '2020-06-01', '2020-10-31')
org_total_hrs

Organization Name
Africa                        2303.25
Comms & KM                    4623.13
Contract&Fin Spec Initiat       17.25
Environmental Incentives      4340.32
General Domestic               145.47
General Intl                  3655.95
Global Adaptive Managemen    12819.33
HR & Ops General               556.03
HR & Ops Special Initiat       744.58
HR and Operations             4034.58
Habitat                       3631.66
Latin America & the Carib     6609.16
Marketing General              340.72
Water                         3285.13
Name: Entered Hours, dtype: float64

In [135]:
# 296 hours below total_hrs (47402.56), which was computed before Byenkya's entries were dropped
org_total_hrs.sum()

47106.56000000002

## org_bill_hrs

In [136]:
def get_org_bill_hrs(df, start, end):
    """start and end as 'YYYY-MM-DD' strings
    returns a Series of billable (SRV) hours indexed by Organization Name"""
    start = pd.to_datetime(start)
    end = pd.to_datetime(end)
    
    filt = (df['Hours Date'] >= start) & (df['Hours Date'] <= end)
    df = df.loc[filt]
    
    filt = df['User Defined Code 3'] == 'SRV'
    org_bill_hrs = df.loc[filt].groupby('Organization Name')['Entered Hours'].sum()
    
    return org_bill_hrs

In [137]:
# calculate billable hours by organization
filt = df['User Defined Code 3'] == 'SRV'
org_bill_hrs = df.loc[filt].groupby('Organization Name')['Entered Hours'].sum()
org_bill_hrs

Organization Name
Africa                        1719.23
Comms & KM                    3156.49
General Intl                   285.14
Global Adaptive Managemen    11382.23
Habitat                       2455.95
Latin America & the Carib     5922.95
Water                         1853.05
Name: Entered Hours, dtype: float64

In [138]:
org_bill_hrs = get_org_bill_hrs(df, '2020-06-01', '2020-10-31')
org_bill_hrs

Organization Name
Africa                        1719.23
Comms & KM                    3156.49
General Intl                   285.14
Global Adaptive Managemen    11382.23
Habitat                       2455.95
Latin America & the Carib     5922.95
Water                         1853.05
Name: Entered Hours, dtype: float64

In [139]:
org_bill_hrs.sum()

26775.04
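
Total and billable hours can also be computed in a single pass. The sketch below is an alternative under stated assumptions, not the approach this workbook takes: it zeroes out non-`SRV` hours in a helper column and sums both columns per organization. Unlike the default inner join used later in this workbook to combine the two Series, this keeps organizations that have no billable hours, with a billable total of zero. The function name `org_hours_summary`, the output column names, and the sample rows are hypothetical.

```python
import pandas as pd


def org_hours_summary(df):
    """Billable and total hours per organization in one groupby."""
    billable = df['User Defined Code 3'] == 'SRV'
    return (df.assign(bill_hrs=df['Entered Hours'].where(billable, 0.0))
              .groupby('Organization Name')[['bill_hrs', 'Entered Hours']]
              .sum()
              .rename(columns={'Entered Hours': 'total_hrs'}))


# hypothetical entries: one organization has no billable (SRV) hours
sample = pd.DataFrame({
    'Organization Name': ['Africa', 'Africa', 'HR and Operations'],
    'User Defined Code 3': ['SRV', 'OVH', 'G&A'],
    'Entered Hours': [6.0, 2.0, 8.0],
})
summary = org_hours_summary(sample)
print(summary)
```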

In [140]:
# what is billable in General Intl?
filt = (df['Organization Name'] == 'General Intl') & (df['User Defined Code 3'] == 'SRV')
df.loc[filt, 'Project Name'].unique()

array(['Walton-SLED Phase I (IC)', 'Walton-SLED Phase II (IC)'],
      dtype=object)

In [141]:
def create_hrs_df(org_bill_hrs, org_total_hrs, total_hrs, fte_hrs):
    hrs_df = pd.merge(org_bill_hrs, org_total_hrs, left_index=True, right_index=True, suffixes=('_bill', '_total'))
    # divide org total hrs by total hours to get proportion
    hrs_df['prop_to_org'] = hrs_df['Entered Hours_total'] / total_hrs
    # weight fte by prop to org
    hrs_df['weighted_fte'] = hrs_df['prop_to_org'] * fte_hrs
    # utilization is billable hours divided by weighted fte
    hrs_df['utilization'] = hrs_df['Entered Hours_bill'] / hrs_df['weighted_fte']
    
    return hrs_df

In [142]:
hrs_df = create_hrs_df(org_bill_hrs, org_total_hrs, total_hrs, fte_hrs)
hrs_df

Organization Name,Entered Hours_bill,Entered Hours_total,prop_to_org,weighted_fte,utilization
Africa,1719.23,2303.25,0.048589,2448.893056,0.702044
Comms & KM,3156.49,4623.13,0.097529,4915.467688,0.642155
General Intl,285.14,3655.95,0.077126,3887.129303,0.073355
Global Adaptive Managemen,11382.23,12819.33,0.270435,13629.943868,0.83509
Habitat,2455.95,3631.66,0.076613,3861.303356,0.636042
Latin America & the Carib,5922.95,6609.16,0.139426,7027.081744,0.842875
Water,1853.05,3285.13,0.069303,3492.860976,0.530525


In [143]:
# join to single df
hrs_df = pd.merge(org_bill_hrs, org_total_hrs, left_index=True, right_index=True, suffixes=('_bill', '_total'))
hrs_df

Organization Name,Entered Hours_bill,Entered Hours_total
Africa,1719.23,2303.25
Comms & KM,3156.49,4623.13
General Intl,285.14,3655.95
Global Adaptive Managemen,11382.23,12819.33
Habitat,2455.95,3631.66
Latin America & the Carib,5922.95,6609.16
Water,1853.05,3285.13


## Calculations
### Proportion to organization
prop_to_org = org_total_hrs / total_hrs

In [144]:
# divide org total hrs by total hours to get proportion
hrs_df['prop_to_org'] = hrs_df['Entered Hours_total'] / total_hrs
hrs_df

Organization Name,Entered Hours_bill,Entered Hours_total,prop_to_org
Africa,1719.23,2303.25,0.048589
Comms & KM,3156.49,4623.13,0.097529
General Intl,285.14,3655.95,0.077126
Global Adaptive Managemen,11382.23,12819.33,0.270435
Habitat,2455.95,3631.66,0.076613
Latin America & the Carib,5922.95,6609.16,0.139426
Water,1853.05,3285.13,0.069303


### Weighted FTE
weighted_fte = prop_to_org * fte_hrs

In [145]:
# weight fte by prop to org
hrs_df['weighted_fte'] = hrs_df['prop_to_org'] * fte_hrs
hrs_df

Organization Name,Entered Hours_bill,Entered Hours_total,prop_to_org,weighted_fte
Africa,1719.23,2303.25,0.048589,2448.893056
Comms & KM,3156.49,4623.13,0.097529,4915.467688
General Intl,285.14,3655.95,0.077126,3887.129303
Global Adaptive Managemen,11382.23,12819.33,0.270435,13629.943868
Habitat,2455.95,3631.66,0.076613,3861.303356
Latin America & the Carib,5922.95,6609.16,0.139426,7027.081744
Water,1853.05,3285.13,0.069303,3492.860976


### Utilization
utilization = org_bill_hrs / weighted_fte

In [146]:
# utilization is billable hours divided by weighted fte
hrs_df['utilization'] = hrs_df['Entered Hours_bill'] / hrs_df['weighted_fte']
hrs_df.to_csv(r'C:\Users\Erik\Downloads\portfolio_utilization.csv')
hrs_df

Organization Name,Entered Hours_bill,Entered Hours_total,prop_to_org,weighted_fte,utilization
Africa,1719.23,2303.25,0.048589,2448.893056,0.702044
Comms & KM,3156.49,4623.13,0.097529,4915.467688,0.642155
General Intl,285.14,3655.95,0.077126,3887.129303,0.073355
Global Adaptive Managemen,11382.23,12819.33,0.270435,13629.943868,0.83509
Habitat,2455.95,3631.66,0.076613,3861.303356,0.636042
Latin America & the Carib,5922.95,6609.16,0.139426,7027.081744,0.842875
Water,1853.05,3285.13,0.069303,3492.860976,0.530525
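
Rearranging the three formulas shows that portfolio utilization equals the portfolio's billable share of its own hours, scaled by the company-wide ratio `total_hrs / fte_hrs`. The sketch below checks this with the Africa row and the `total_hrs` and `fte_hrs` values reported in this workbook; the variable names `billable_share`, `company_ratio`, and `equivalent` are introduced here for illustration.

```python
import math

# Africa row and company totals as reported in this workbook
org_bill_hrs = 1719.23
org_total_hrs = 2303.25
total_hrs = 47402.56
fte_hrs = 50400

# the workbook's route: weight FTE hours by the portfolio's share of all hours
weighted_fte = org_total_hrs / total_hrs * fte_hrs
utilization = org_bill_hrs / weighted_fte

# the equivalent route: billable share of the portfolio's own hours,
# scaled by how many of the available FTE hours the company logged
billable_share = org_bill_hrs / org_total_hrs
company_ratio = total_hrs / fte_hrs
equivalent = billable_share * company_ratio

print(round(utilization, 6), math.isclose(utilization, equivalent))
```

One consequence is that every portfolio's utilization is scaled by the same company-wide factor, so differences between portfolios come entirely from their billable shares.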


In [147]:
# what proportion of hours went towards billable projects?
hrs_df['Entered Hours_bill'].sum() / total_hrs

0.5648437552739767

In [148]:
# what was the company-wide utilization?
hrs_df['Entered Hours_bill'].sum() / fte_hrs

0.5312507936507936

## Notes

In [28]:
# Note that Molly does not appear to be updated in the system
filt = emp_df['Last Name'] == 'Armanino'
emp_df.loc[filt]

Unnamed: 0,Employee ID,Last Name,First Name,Middle Initial,E-mail Address,Active Flag,Hire date,Termination date,Work Schedule,Work Schedule Description,Default Org
4,100027,Armanino,Molly,,marmanino@enviroincentives.com,N,2017-06-18,2019-12-14,,,


In [29]:
# However she also does not appear to be billing
filt = util_df['Last Name'] == 'Armanino'
util_df.loc[filt]

Unnamed: 0,Employee ID,Last Name,First Name,Work Schedule Description,Org Name,Project ID,Project Name,User Defined Code 3,Hours Date,Entered Hours,Approved Hours,Comments


In [30]:
util_df['Last Name'].unique()

array(['Alexandrovich', 'Ajroud', 'Chandrasekaran', 'Cook', 'Dubois',
       'Flower', 'Gambrill', 'Gibert', 'Hardeman', 'Hicks', 'Hoye',
       'Lauck', 'Nease', 'Peabody', 'Present', 'Schueler', 'Anderson',
       'Boysen', 'Motlow', 'Riley', 'Sokulsky', 'Praul', 'Hansen',
       'Guetschow', 'Brock', 'Chesterman', 'Abragan', 'Uhl', 'Exline',
       'King', 'Chery', 'Shay', 'Baca', 'Martinez-Sanchez', 'Grange',
       'Castillo Ferri', 'Nico', 'Wolf', 'Mirghani', 'Motolinia',
       'Daniels', 'Durand', 'Haik', 'Bevins', 'Wong', 'Sarkisian',
       'Spencer', 'Ballard', 'Boutemy', 'Schmidt', 'Giannoni', 'Tripp',
       'Reilly', 'Witz', 'Fong', 'Connolly', 'Masood', 'Peimbert',
       'Owusu', 'Walter', 'Byenkya', 'Richards', 'Myers', 'Kapikinyu'],
      dtype=object)

In [165]:
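def calc_period_utilization(df, emp_df, start, end):
    """Return portfolio-level utilization for the period from start to end.
    start and end as 'YYYY-MM-DD' strings; prints bus_hrs and total_hrs."""
    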
    def get_bus_hrs(emp_df, start, end):
        """start and end as 'YYYY-MM-DD' strings"""
        start = pd.to_datetime(start)
        end = pd.to_datetime(end)

        fte_df = emp_df[['Employee ID', 'Hire date', 'Termination date']].copy()

        def update_start(hire_date):
            if hire_date < start:
                return start
            elif hire_date > end:
                return end
            else:
                return hire_date

        def update_end(termination_date):
            if pd.isnull(termination_date):
                return end
            if termination_date > end:
                return end
            elif termination_date < start:
                return start
            else:
                return termination_date

        fte_df['sem_start'] = fte_df['Hire date'].apply(update_start)
        fte_df['sem_end'] = fte_df['Termination date'].apply(update_end)
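        # np.busday_count excludes the end date, so a period ending on a weekday (e.g., 2020-06-30) omits that day
        fte_df['bushrs'] = np.busday_count(fte_df['sem_start'].dt.date, fte_df['sem_end'].dt.date) * 8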

        fte_hrs = fte_df['bushrs'].sum()

        return fte_hrs

    
    def get_total_hrs(df, start, end):
        """start and end as 'YYYY-MM-DD' strings"""
        start = pd.to_datetime(start)
        end = pd.to_datetime(end)

        filt = (df['Hours Date'] >= start) & (df['Hours Date'] <= end)
        df = df.loc[filt]

        total_hrs = df['Entered Hours'].sum()

        return total_hrs

    
    def get_org_total_hrs(df, start, end):
        """start and end as 'YYYY-MM-DD' strings
        returns a Series of total hours indexed by Organization Name"""
        start = pd.to_datetime(start)
        end = pd.to_datetime(end)

        filt = (df['Hours Date'] >= start) & (df['Hours Date'] <= end)
        df = df.loc[filt]

        org_total_hrs = df.groupby('Organization Name')['Entered Hours'].sum()

        return org_total_hrs

    
    def get_org_bill_hrs(df, start, end):
        """start and end as 'YYYY-MM-DD' strings
        returns a Series of billable (SRV) hours indexed by Organization Name"""
        start = pd.to_datetime(start)
        end = pd.to_datetime(end)

        filt = (df['Hours Date'] >= start) & (df['Hours Date'] <= end)
        df = df.loc[filt]

        filt = df['User Defined Code 3'] == 'SRV'
        org_bill_hrs = df.loc[filt].groupby('Organization Name')['Entered Hours'].sum()

        return org_bill_hrs


    def create_hrs_df(org_bill_hrs, org_total_hrs, total_hrs, fte_hrs):
        hrs_df = pd.merge(org_bill_hrs, org_total_hrs, left_index=True, right_index=True, suffixes=('_bill', '_total'))
        # divide org total hrs by total hours to get proportion
        hrs_df['prop_to_org'] = hrs_df['Entered Hours_total'] / total_hrs
        # weight fte by prop to org
        hrs_df['weighted_fte'] = hrs_df['prop_to_org'] * fte_hrs
        # utilization is billable hours divided by weighted fte
        hrs_df['utilization'] = hrs_df['Entered Hours_bill'] / hrs_df['weighted_fte']

        return hrs_df
    
    
    bus_hrs = get_bus_hrs(emp_df, start, end)
    total_hrs = get_total_hrs(df, start, end)
    org_total_hrs = get_org_total_hrs(df, start, end)
    org_bill_hrs = get_org_bill_hrs(df, start, end)
    hrs_df = create_hrs_df(org_bill_hrs, org_total_hrs, total_hrs, bus_hrs)
    print(f'bus_hrs: {bus_hrs}\ntotal_hrs: {total_hrs}\n')
    
    return hrs_df

In [172]:
june_df = calc_period_utilization(df, emp_df, '2020-06-01', '2020-06-30')
june_df

bus_hrs: 7616
total_hrs: 7214.98



Organization Name,Entered Hours_bill,Entered Hours_total,prop_to_org,weighted_fte,utilization
Africa,183.78,325.06,0.045053,343.127349,0.535603
Comms & KM,701.44,1002.3,0.138919,1058.009419,0.662981
Global Adaptive Managemen,1361.45,1730.11,0.239794,1826.27225,0.74548
Habitat,331.59,624.45,0.086549,659.157919,0.503051
Latin America & the Carib,970.83,1052.49,0.145876,1110.989059,0.873843
Water,380.2,579.7,0.080347,611.920643,0.621322


In [173]:
july_df = calc_period_utilization(df, emp_df, '2020-07-01', '2020-07-31')
july_df

bus_hrs: 9568
total_hrs: 9460.93



Organization Name,Entered Hours_bill,Entered Hours_total,prop_to_org,weighted_fte,utilization
Africa,313.92,392.89,0.041528,397.336363,0.790061
Comms & KM,702.33,1004.35,0.106158,1015.716299,0.691463
General Intl,48.02,772.69,0.081672,781.434586,0.061451
Global Adaptive Managemen,2523.23,2893.14,0.305799,2925.881866,0.862383
Habitat,522.27,754.59,0.079759,763.129747,0.684379
Latin America & the Carib,1082.92,1256.38,0.132797,1270.598539,0.852291
Water,506.12,694.16,0.073371,702.015857,0.720952


In [174]:
aug_df = calc_period_utilization(df, emp_df, '2020-08-01', '2020-08-31')
aug_df

bus_hrs: 9328
total_hrs: 9308.0



Organization Name,Entered Hours_bill,Entered Hours_total,prop_to_org,weighted_fte,utilization
Africa,388.8,502.87,0.054026,503.950511,0.771504
Comms & KM,587.67,817.53,0.087831,819.286618,0.717295
General Intl,75.86,695.44,0.074714,696.934284,0.108848
Global Adaptive Managemen,2599.13,2785.44,0.299252,2791.425045,0.931112
Habitat,583.24,783.67,0.084193,785.353863,0.742646
Latin America & the Carib,948.16,1097.56,0.117916,1099.918315,0.862028
Water,392.63,812.19,0.087257,813.935144,0.482385


In [175]:
sept_df = calc_period_utilization(df, emp_df, '2020-09-01', '2020-09-30')
sept_df

bus_hrs: 10720
total_hrs: 10814.35



Organization Name,Entered Hours_bill,Entered Hours_total,prop_to_org,weighted_fte,utilization
Africa,411.16,533.49,0.049332,528.835556,0.777482
Comms & KM,562.94,985.23,0.091104,976.634342,0.576408
General Intl,94.39,650.97,0.060195,645.2906,0.146275
Global Adaptive Managemen,2555.38,2816.07,0.260401,2791.501144,0.915414
Habitat,495.04,685.67,0.063404,679.687859,0.728334
Latin America & the Carib,1398.23,1524.05,0.140928,1510.753397,0.925518
Water,299.6,621.48,0.057468,616.057886,0.486318


In [177]:
oct_df = calc_period_utilization(df, emp_df, '2020-10-01', '2020-10-31')
oct_df

bus_hrs: 11312
total_hrs: 10308.3



Organization Name,Entered Hours_bill,Entered Hours_total,prop_to_org,weighted_fte,utilization
Africa,421.57,548.94,0.053252,602.389267,0.69983
Comms & KM,602.11,813.72,0.078938,892.950403,0.674293
General Intl,66.87,703.9,0.068285,772.437434,0.08657
Global Adaptive Managemen,2343.04,2594.57,0.251697,2847.198456,0.822928
Habitat,523.81,783.28,0.075985,859.546517,0.609403
Latin America & the Carib,1522.81,1678.68,0.162847,1842.12995,0.826657
Water,274.5,577.6,0.056033,633.839838,0.433075
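
The five monthly calls above repeat the same pattern, so they could be driven by a loop. The sketch below is a possible approach, not part of the original workbook: it builds month boundaries with `pd.period_range` and stacks the per-month results with `pd.concat`. The helper names `month_windows` and `utilization_by_month` are hypothetical, and `fake_calc` is a stand-in so the example runs on its own.

```python
import pandas as pd


def month_windows(first_month, last_month):
    """Return (label, start, end) tuples with 'YYYY-MM-DD' strings for each month."""
    periods = pd.period_range(first_month, last_month, freq='M')
    return [(str(p), p.start_time.strftime('%Y-%m-%d'), p.end_time.strftime('%Y-%m-%d'))
            for p in periods]


def utilization_by_month(calc, first_month, last_month):
    """Call calc(start, end) for each month and stack the results, keyed by month."""
    frames = {label: calc(start, end)
              for label, start, end in month_windows(first_month, last_month)}
    return pd.concat(frames)


# stand-in for the real calculation, returning a fixed hypothetical value
def fake_calc(start, end):
    return pd.DataFrame({'utilization': [0.5]},
                        index=pd.Index(['Africa'], name='Organization Name'))


windows = month_windows('2020-06', '2020-10')
combined = utilization_by_month(fake_calc, '2020-06', '2020-10')
print(windows[0], len(combined))
```

With the workbook's own function, `calc` could be `lambda s, e: calc_period_utilization(df, emp_df, s, e)`, which would yield one table indexed by month and organization.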
