# Operating Room Utilization Rate Calculation for 2019

In this notebook, we calculate the daily operating room utilization rate for each theatre during 2019 using the cleaned Assuta Ramat HaHayal dataset. Our aim is to quantify how effectively scheduled and unscheduled procedure hours occupy the total available block time, providing a key performance indicator for capacity planning, efficiency analysis, and year-over-year comparisons.

**1. Imports and Data Loading**  
- We import essential libraries (`pandas`, `numpy`, `matplotlib.pyplot`, `datetime`, etc.).  
- We load the cleaned 2019 Excel file (e.g. `2019_data_Assuta_nomissings_for_stat_final.xlsx`) into a pandas DataFrame and inspect the first rows to confirm correct parsing.

**2. Initial Inspection and Copy**  
- We display DataFrame info (columns, dtypes, head) to verify data integrity.  
- We create a working copy (`df_copy`) to preserve the original raw data.

**3. Data Cleaning and “No Show” Removal**  
- We identify and remove any “No Show” entries that should not contribute to utilization metrics.  
- We check for and drop duplicate rows to ensure accurate aggregations.

**4. Date and Time Parsing**  
- We convert key date fields (e.g. planned block date, actual entry/exit dates) into `datetime` objects.  
- We parse time-only columns (planned start/end, incision, closure, actual entry/exit times), handling any conversion errors or missing values.

**5. Timestamp Construction**  
- We merge each planned date with its corresponding planned start and end times to form full `Planned Start DateTime` and `Planned End DateTime` columns, enabling precise duration calculations.

**6. Block Duration and Service Hours**  
- We compute each block's total available hours (`SH_r_hours`) as the difference between planned end and start datetimes (in hours).  
- We take the maximum block duration per surgeon, room, and day, then sum these maxima per room per day to obtain that day's total service hours.

**7. Scheduled and Unscheduled Procedure Hours**  
- **Scheduled Procedure Hours (SP):** We limit actual surgery times to within the planned block window and sum per room per day (`total_SP_per_day_room`).  
- **Unscheduled Procedure Hours (SU):** We calculate the room time outside the incision-to-closure interval (entry to incision, and closure to exit) that still falls within the block window, and aggregate it per room per day (`SU_p_limited_hours`).

**8. Utilization Rate Calculation**  
- We define the utilization rate as:  
  ```python
  daily_utilization_rate = (total_SP_per_day_room + SU_p_limited_hours) / SH_r_hours * 100
  ```
- We handle edge cases by assigning `NaN` where service hours are zero or missing, so that no infinite or undefined rates enter the results.

**9. Merging, Formatting, and Summary**  
- We merge the calculated `daily_utilization_rate` back onto the main DataFrame by block date and room number.  
- We format time-related columns consistently (e.g., `HH:MM:SS`).  
- We generate summary statistics and distribution plots for `SH_r_hours` and `daily_utilization_rate` to inspect efficiency patterns across the year.

By completing these steps, we produce a robust utilization metric for each operating room in 2019, enabling deeper analysis of capacity trends, identification of under- or over-utilized periods, and informed decision-making for resource allocation.

Calculating the target variable - utilization percentage:

In [1]:
#import of libraries:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import warnings
import itertools
from pandas.api.types import is_numeric_dtype
from scipy.stats import shapiro

In [2]:
file_path = r"/content/2019_data_Assuta_nomissings_for_stat_final_1405.xlsx"
df = pd.read_excel(file_path)
print(df.head())

   patient_id  Site Code Main Surgeon Code  Activity Code  Activity Type Code  \
0          44         20              2982          26.33                  19   
1          45         20              2982          26.33                  19   
2          46         20              2982          26.33                  19   
3          48         20              2982          26.33                  19   
4          50         20              2982          26.33                  19   

   Planned SU Time (Large/Medium/Small) Pre-Surgery Admission Date  Height  \
0                                   6.0                 2019-03-18    1.85   
1                                  19.0                 2018-12-31    1.54   
2                                  20.0                 2019-01-28    1.73   
3                                  20.0                 2018-12-24    1.72   
4                                  18.0                 2019-01-21    1.54   

   Weight  Patient Age (on Surgery Day)  ...

In [3]:
print(df.columns)

Index(['patient_id', 'Site Code', 'Main Surgeon Code', 'Activity Code',
       'Activity Type Code', 'Planned SU Time (Large/Medium/Small)',
       'Pre-Surgery Admission Date', 'Height', 'Weight',
       'Patient Age (on Surgery Day)', 'Background Diseases/Diagnoses',
       'Planned Surgery Date', 'Planned Surgery Time',
       'Surgery Admission Date', 'Administrative Admission Time',
       'Planned Operating Room Number',
       'Pre-Surgery Hospitalization Admission Date',
       'Pre-Surgery Hospitalization Admission Time',
       'Pre-Surgical Admission Time Before Surgery', 'Surgical Team Codes',
       'Anesthesiologist Code', 'Anesthesia Code', 'Type of Anesthesia',
       'Cancellation Reason on Surgery Day', 'Actual Operating Room Number',
       'Actual Surgery Room Entry Date', 'Actual Surgery Room Entry Time',
       'Incision Time', 'Closure Time', 'End of Surgery Date (Exit from OR)',
       'End of Surgery Time (Exit from OR)', 'Planned Surgery Duration',
       'Rec

In [4]:
print(df.dtypes)

patient_id                                       int64
Site Code                                        int64
Main Surgeon Code                               object
Activity Code                                  float64
Activity Type Code                               int64
                                                ...   
Recovery Room Entry Time_Minutes                 int64
Recovery Room Exit Time_Minutes                  int64
Post-Surgery Discharge Time_Minutes              int64
Planned Start Time for Doctor Block_Minutes      int64
Planned End Time for Doctor Block_Minutes        int64
Length: 62, dtype: object


Preparing and checking the time and date columns - converting to an appropriate data type

In [5]:
df_copy=df.copy()

In [6]:
print(f"Number of rows: {df_copy.shape[0]}")
print(f"Number of columns: {df_copy.shape[1]}")

Number of rows: 13570
Number of columns: 62


In [7]:
print(df_copy.head(10))

   patient_id  Site Code Main Surgeon Code  Activity Code  Activity Type Code  \
0          44         20              2982          26.33                  19   
1          45         20              2982          26.33                  19   
2          46         20              2982          26.33                  19   
3          48         20              2982          26.33                  19   
4          50         20              2982          26.33                  19   
5          52         20              2982          26.33                  19   
6          53         20              2982          26.33                  19   
7          54         20              2982          26.33                  19   
8          60         20              2982          26.33                  19   
9          61         20              2982          26.33                  19   

   Planned SU Time (Large/Medium/Small) Pre-Surgery Admission Date  Height  \
0                             

Handling duplicates and missing (null) values:

In [8]:
# Overall duplicate count (all rows):
num_duplicates = df_copy.duplicated().sum()
print(f"The number of completely repeating lines: {num_duplicates}")

The number of completely repeating lines: 0


We tried to merge the duplicate rows more effectively based on the following parameters: age, weight, height, surgery entry date, and surgeon code. We attempted to prevent data loss by combining the different values from the duplicate rows. However, the code did not work as expected and distorted the time-related data, resulting in incorrect utilization calculations.
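
Before attempting any merge, it can help to count how many rows share these key fields. The following is a minimal sketch on toy data, not the code we ran: the column names are taken from the printed column list, and all values are invented for illustration.

```python
import pandas as pd

# Toy data with invented values; column names follow the dataset's printed columns.
toy = pd.DataFrame({
    "Patient Age (on Surgery Day)": [54, 54, 31],
    "Weight": [80.0, 80.0, 62.5],
    "Height": [1.85, 1.85, 1.60],
    "Actual Surgery Room Entry Date": pd.to_datetime(
        ["2019-03-22", "2019-03-22", "2019-01-04"]
    ),
    "Main Surgeon Code": ["2982", "2982", "1170"],
    "Closure Time": ["14:51:00", "15:10:00", "11:51:00"],
})

key_cols = [
    "Patient Age (on Surgery Day)",
    "Weight",
    "Height",
    "Actual Surgery Room Entry Date",
    "Main Surgeon Code",
]

# keep=False flags every member of a duplicated group, not only the repeats.
dup_mask = toy.duplicated(subset=key_cols, keep=False)
n_flagged = int(dup_mask.sum())
print(f"Rows sharing all key fields: {n_flagged}")
```

Flagging the rows first, without changing them, leaves the time columns untouched while the size of the problem is assessed.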

Planned End Time for Doctor Block:

In [9]:
# Print the unique values in a column:
print(df_copy['Planned End Time for Doctor Block'].unique())

['13:30:00' '13:38:00' '13:12:00' '10:06:00' '12:16:00' '13:24:00'
 '12:45:00' '10:16:00' '10:56:00' '14:36:00' '12:31:00' '10:36:00'
 '14:06:00' '14:26:00' '13:54:00' '09:40:00' '11:50:00' '09:25:00'
 '12:56:00' '13:26:00' '10:15:00' '10:01:00' '08:40:00' '08:23:00'
 '11:23:00' '14:20:00' '12:02:00' '10:10:00' '09:52:00' '12:52:00'
 '12:55:00' '08:35:00' '12:07:00' '15:55:00' '09:55:00' '14:00:00'
 '10:52:00' '09:00:00' '08:00:00' '14:51:00' '13:09:00' '14:08:00'
 '08:45:00' '14:12:00' '15:00:00' '14:57:00' '09:50:00' '12:35:00'
 '13:00:00' '12:50:00' '16:20:00' '11:30:00' '10:30:00' '09:30:00'
 '12:15:00' '13:02:00' '12:34:00' '14:30:00' '14:27:00' '08:33:00'
 '09:17:00' '11:45:00' '12:00:00' '14:25:00' '13:15:00' '13:55:00'
 '14:35:00' '14:15:00' '15:22:00' '11:40:00' '09:20:00' '14:55:00'
 '12:40:00' '10:00:00' '15:31:00' '10:24:00' '11:00:00' '14:50:00'
 '10:35:00' '11:59:00' '11:17:00' '13:08:00' '11:22:00' '12:05:00'
 '14:09:00' '11:13:00' '16:30:00' '13:35:00' '10:40:00' '10:38

In [10]:
print(df_copy['Planned End Time for Doctor Block'].dtype)
print(df_copy['Planned End Time for Doctor Block'].head(10))

object
0    13:30:00
1    13:38:00
2    13:12:00
3    13:38:00
4    13:12:00
5    10:06:00
6    10:06:00
7    12:16:00
8    13:24:00
9    13:12:00
Name: Planned End Time for Doctor Block, dtype: object


In [11]:
# Convert to datetime format and then save only the time:
df_copy['Planned End Time for Doctor Block'] = pd.to_datetime(
    df_copy['Planned End Time for Doctor Block'],
    format='%H:%M:%S',
    errors='coerce'  # Converts invalid values to NaT
).dt.time

In [12]:
# Checking the column type and values
print(df_copy['Planned End Time for Doctor Block'].dtype)  # Prints object: the values are now datetime.time instances
print(df_copy['Planned End Time for Doctor Block'].head(10))
print("Missing values:", df_copy['Planned End Time for Doctor Block'].isna().sum())

object
0    13:30:00
1    13:38:00
2    13:12:00
3    13:38:00
4    13:12:00
5    10:06:00
6    10:06:00
7    12:16:00
8    13:24:00
9    13:12:00
Name: Planned End Time for Doctor Block, dtype: object
Missing values: 0


In [13]:
df_copy['Planned End Time for Doctor Block'] = pd.to_timedelta(
    df_copy['Planned End Time for Doctor Block'].astype(str),
    errors='coerce')

In [14]:
print(df_copy['Planned End Time for Doctor Block'].dtype)

timedelta64[ns]
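
The same parse-then-convert sequence is repeated below for each time column. As a possible shortcut, the two steps could be wrapped in one helper. This is only a sketch: `time_strings_to_timedelta` is a hypothetical name, and the sample values are invented.

```python
import pandas as pd

def time_strings_to_timedelta(series: pd.Series) -> pd.Series:
    """Parse 'HH:MM:SS' strings into timedeltas; invalid entries become NaT."""
    parsed = pd.to_datetime(series, format="%H:%M:%S", errors="coerce")
    # Subtracting midnight of the parsed (dummy) date leaves only the time of day.
    return parsed - parsed.dt.normalize()

sample = pd.Series(["13:30:00", "07:00:00", "not a time"])
converted = time_strings_to_timedelta(sample)
print(converted)
```

The strict `format` argument is kept so that malformed strings are coerced to `NaT` rather than being guessed at.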


Closure Time:

In [15]:
print(df_copy['Closure Time'].unique())

['14:51:00' '11:51:00' '13:23:00' ... '00:54:00' '01:02:00' '00:24:00']


In [16]:
print(df_copy['Closure Time'].dtype)
print(df_copy['Closure Time'].head(10))

object
0    14:51:00
1    11:51:00
2    13:23:00
3    13:54:00
4    11:59:00
5    10:35:00
6    08:12:00
7    12:26:00
8    08:08:00
9    07:49:00
Name: Closure Time, dtype: object


In [17]:
df_copy['Closure Time'] = pd.to_datetime(
    df_copy['Closure Time'],
    format='%H:%M:%S',
    errors='coerce'
).dt.time

In [18]:
print(df_copy['Closure Time'].dtype)
print(df_copy['Closure Time'].head(10))
print("Missing values:", df_copy['Closure Time'].isna().sum())

object
0    14:51:00
1    11:51:00
2    13:23:00
3    13:54:00
4    11:59:00
5    10:35:00
6    08:12:00
7    12:26:00
8    08:08:00
9    07:49:00
Name: Closure Time, dtype: object
Missing values: 0


In [19]:
df_copy['Closure Time'] = pd.to_timedelta(
    df_copy['Closure Time'].astype(str),
    errors='coerce')

In [20]:
print(df_copy['Closure Time'].dtype)
print(df_copy['Closure Time'].head(10))
print("Missing values:", df_copy['Closure Time'].isna().sum())

timedelta64[ns]
0   0 days 14:51:00
1   0 days 11:51:00
2   0 days 13:23:00
3   0 days 13:54:00
4   0 days 11:59:00
5   0 days 10:35:00
6   0 days 08:12:00
7   0 days 12:26:00
8   0 days 08:08:00
9   0 days 07:49:00
Name: Closure Time, dtype: timedelta64[ns]
Missing values: 0


Incision Time:

In [21]:
df_copy['Incision Time'] = pd.to_datetime(
    df_copy['Incision Time'],
    format='%H:%M:%S',
    errors='coerce'
).dt.time

In [22]:
print(df_copy['Incision Time'].dtype)
print(df_copy['Incision Time'].head(10))
print("Missing values:", df_copy['Incision Time'].isna().sum())

object
0    13:26:00
1    10:44:00
2    12:45:00
3    12:40:00
4    11:00:00
5    09:19:00
6    07:39:00
7    11:24:00
8    07:38:00
9    07:29:00
Name: Incision Time, dtype: object
Missing values: 0


In [23]:
df_copy['Incision Time'] = pd.to_timedelta(
    df_copy['Incision Time'].astype(str),
    errors='coerce')

In [24]:
print(df_copy['Incision Time'].dtype)
print(df_copy['Incision Time'].head(10))
print("Missing values:", df_copy['Incision Time'].isna().sum())

timedelta64[ns]
0   0 days 13:26:00
1   0 days 10:44:00
2   0 days 12:45:00
3   0 days 12:40:00
4   0 days 11:00:00
5   0 days 09:19:00
6   0 days 07:39:00
7   0 days 11:24:00
8   0 days 07:38:00
9   0 days 07:29:00
Name: Incision Time, dtype: timedelta64[ns]
Missing values: 0


Planned Start Time for Doctor Block:

In [25]:
df_copy['Planned Start Time for Doctor Block'] = pd.to_datetime(
    df_copy['Planned Start Time for Doctor Block'],
    format='%H:%M:%S',
    errors='coerce'
).dt.time

In [26]:
print(df_copy['Planned Start Time for Doctor Block'].dtype)
print(df_copy['Planned Start Time for Doctor Block'].head(10))
print("Missing values:", df_copy['Planned Start Time for Doctor Block'].isna().sum())

object
0    10:10:00
1    07:00:00
2    07:00:00
3    07:00:00
4    07:00:00
5    07:00:00
6    07:00:00
7    07:00:00
8    07:00:00
9    07:00:00
Name: Planned Start Time for Doctor Block, dtype: object
Missing values: 0


In [27]:
df_copy['Planned Start Time for Doctor Block'] = pd.to_timedelta(
    df_copy['Planned Start Time for Doctor Block'].astype(str),
    errors='coerce')

In [28]:
print(df_copy['Planned Start Time for Doctor Block'].dtype)
print(df_copy['Planned Start Time for Doctor Block'].head(10))
print("Missing values:", df_copy['Planned Start Time for Doctor Block'].isna().sum())

timedelta64[ns]
0   0 days 10:10:00
1   0 days 07:00:00
2   0 days 07:00:00
3   0 days 07:00:00
4   0 days 07:00:00
5   0 days 07:00:00
6   0 days 07:00:00
7   0 days 07:00:00
8   0 days 07:00:00
9   0 days 07:00:00
Name: Planned Start Time for Doctor Block, dtype: timedelta64[ns]
Missing values: 0


Actual Surgery Room Entry Time:

In [29]:
df_copy['Actual Surgery Room Entry Time'] = pd.to_datetime(
    df_copy['Actual Surgery Room Entry Time'],
    format='%H:%M:%S',
    errors='coerce'
).dt.time

In [30]:
print(df_copy['Actual Surgery Room Entry Time'].dtype)
print(df_copy['Actual Surgery Room Entry Time'].head(10))
print("Missing values:", df_copy['Actual Surgery Room Entry Time'].isna().sum())

object
0    13:05:00
1    10:49:00
2    12:26:00
3    12:17:00
4    10:34:00
5    08:56:00
6    07:13:00
7    11:08:00
8    07:20:00
9    07:04:00
Name: Actual Surgery Room Entry Time, dtype: object
Missing values: 0


In [31]:
df_copy['Actual Surgery Room Entry Time'] = pd.to_timedelta(
    df_copy['Actual Surgery Room Entry Time'].astype(str),
    errors='coerce')

In [32]:
print(df_copy['Actual Surgery Room Entry Time'].dtype)
print(df_copy['Actual Surgery Room Entry Time'].head(10))
print("Missing values:", df_copy['Actual Surgery Room Entry Time'].isna().sum())

timedelta64[ns]
0   0 days 13:05:00
1   0 days 10:49:00
2   0 days 12:26:00
3   0 days 12:17:00
4   0 days 10:34:00
5   0 days 08:56:00
6   0 days 07:13:00
7   0 days 11:08:00
8   0 days 07:20:00
9   0 days 07:04:00
Name: Actual Surgery Room Entry Time, dtype: timedelta64[ns]
Missing values: 0


In [33]:
# List of columns that contain time-of-day values only (no date part)
time_columns = [
    'Actual Surgery Room Entry Time',
    'Planned Start Time for Doctor Block',
    'Incision Time',
    'Closure Time',
    'Planned End Time for Doctor Block']

In [34]:
import pandas as pd

# List of columns being checked
time_cols = ["Entry DateTime", "Incision DateTime", "Closure DateTime", "Exit DateTime"]

# Make sure the columns are in datetime format
for col in time_cols:
    df_copy[col] = pd.to_datetime(df_copy[col], errors='coerce')

# Error count and list of deleted cells
internal_corrections = 0
external_corrections = 0
deleted_cells = []

# Cleaning up internal inconsistencies within a row
def clean_internal_conflicts(row):
    global internal_corrections
    for i in range(len(time_cols) - 1):
        col1 = time_cols[i]
        col2 = time_cols[i + 1]
        t1 = row[col1]
        t2 = row[col2]

        if pd.notnull(t1) and pd.notnull(t2) and t1 > t2:
            prev_time = row[time_cols[i - 1]] if i > 0 else None
            next_time = row[time_cols[i + 2]] if i + 2 < len(time_cols) else None

            if prev_time is not None and pd.notnull(prev_time) and prev_time > t1:
                bad_col = col1
            elif next_time is not None and pd.notnull(next_time) and t2 > next_time:
                bad_col = col2
            else:
                bad_col = col1  # default

            deleted_cells.append((row.name, bad_col, row[bad_col]))
            row[bad_col] = pd.NaT
            internal_corrections += 1
    return row

df_copy = df_copy.apply(clean_internal_conflicts, axis=1)

#Conflict between current Exit and next Entry
df_copy["next_Entry DateTime"] = df_copy["Entry DateTime"].shift(-1)

def fix_external_conflict(row):
    global external_corrections
    if pd.notnull(row["Exit DateTime"]) and pd.notnull(row["next_Entry DateTime"]):
        if row["Exit DateTime"] > row["next_Entry DateTime"]:
            deleted_cells.append((row.name, "Exit DateTime", row["Exit DateTime"]))
            row["Exit DateTime"] = pd.NaT
            external_corrections += 1
    return row

df_copy = df_copy.apply(fix_external_conflict, axis=1)
df_copy.drop(columns=["next_Entry DateTime"], inplace=True)

# Conflict between current Entry and previous Exit → Always delete the Exit of the previous line
df_copy["prev_Exit DateTime"] = df_copy["Exit DateTime"].shift(1)

def fix_inter_row_conflict_strict(row):
    global external_corrections
    entry = row["Entry DateTime"]
    prev_exit = row["prev_Exit DateTime"]

    if pd.notnull(entry) and pd.notnull(prev_exit) and entry < prev_exit:
        idx_prev = row.name - 1
        deleted_cells.append((idx_prev, "Exit DateTime", df_copy.at[idx_prev, "Exit DateTime"]))
        df_copy.at[idx_prev, "Exit DateTime"] = pd.NaT
        external_corrections += 1

    return row

df_copy = df_copy.apply(fix_inter_row_conflict_strict, axis=1)
df_copy.drop(columns=["prev_Exit DateTime"], inplace=True)

# Summary
print(f"Cleaning finished:")
print(f"{internal_corrections} Internal corrections were made within lines.")
print(f"{external_corrections} Corrections of conflicts between lines have been made.")

# Show an example of deleted entries
if deleted_cells:
    print("\n Example of deleted values (up to the first 5):")
    for row_idx, col, val in deleted_cells[:5]:
        print(f"Row {row_idx}, column '{col}', deleted value: {val}")
else:
    print("No entries were deleted.")

Cleaning finished:
0 Internal corrections were made within lines.
1105 Corrections of conflicts between lines have been made.

 Example of deleted values (up to the first 5):
Row 2, column 'Exit DateTime', deleted value: 2019-02-01 13:35:00
Row 4, column 'Exit DateTime', deleted value: 2019-02-01 12:08:00
Row 20, column 'Exit DateTime', deleted value: 2019-12-20 14:16:00
Row 41, column 'Exit DateTime', deleted value: 2019-04-21 14:31:00
Row 47, column 'Exit DateTime', deleted value: 2019-12-23 11:05:00
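
The inter-row checks above compare each row with its neighbour in file order. A variant that may be worth considering compares only consecutive cases in the same operating room. The sketch below uses invented toy data and assumes that sorting by room and entry time reflects the real sequence of cases; it is not the procedure that produced the counts above.

```python
import pandas as pd

# Toy data with invented timestamps for two rooms on the same day.
toy = pd.DataFrame({
    "Actual Operating Room Number": [20006, 20006, 20015, 20015],
    "Entry DateTime": pd.to_datetime([
        "2019-02-01 07:10", "2019-02-01 09:00",
        "2019-02-01 07:05", "2019-02-01 10:30",
    ]),
    "Exit DateTime": pd.to_datetime([
        "2019-02-01 09:20", "2019-02-01 11:00",
        "2019-02-01 09:45", "2019-02-01 12:00",
    ]),
})

toy = toy.sort_values(
    ["Actual Operating Room Number", "Entry DateTime"]
).reset_index(drop=True)

# Next entry time within the same room only (NaT for the last case in each room).
next_entry = toy.groupby("Actual Operating Room Number")["Entry DateTime"].shift(-1)

# An exit later than the next entry in the same room indicates a conflict.
overlap = toy["Exit DateTime"] > next_entry
n_conflicts = int(overlap.sum())
print(toy[overlap])
```

Comparisons against `NaT` evaluate to `False`, so the last case in each room is never flagged.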


SP: scheduled procedure hours, counted only within the block (shift) boundaries:

In [35]:
# Checking how many missing values there are in the shift columns
missing_values = df_copy[['Planned Start Time for Doctor Block', 'Planned End Time for Doctor Block']].isna().sum()
print("Missing values at shift boundaries:\n", missing_values)

Missing values at shift boundaries:
 Planned Start Time for Doctor Block    0
Planned End Time for Doctor Block      0
dtype: int64


In [36]:
# Convert date columns to datetime format
date_columns = ["Actual Surgery Room Entry Date", "Planned Start Date for Doctor Block"]
for col in date_columns:
    df_copy[col] = pd.to_datetime(df_copy[col], errors='coerce')  # Invalid dates become NaT

In [37]:
print(df_copy['Planned Start Date for Doctor Block'].dtype)

datetime64[ns]


In [38]:
import pandas as pd
import numpy as np

# Convert columns to dates:
datetime_cols = [
    "Actual Surgery Room Entry Date",
    "End of Surgery Date (Exit from OR)",
    "Planned Start Date for Doctor Block",
    "Planned End Date for Doctor Block"
]
for col in datetime_cols:
    df_copy[col] = pd.to_datetime(df_copy[col], errors='coerce')

# Creating new consolidated columns:
df_copy["Planned Start DateTime"] = df_copy["Planned Start Date for Doctor Block"] + pd.to_timedelta(df_copy["Planned Start Time for Doctor Block"].astype(str), errors='coerce')
df_copy["Planned End DateTime"] = df_copy["Planned End Date for Doctor Block"] + pd.to_timedelta(df_copy["Planned End Time for Doctor Block"].astype(str), errors='coerce')

# Actual full surgery duration without shift restrictions:
df_copy["S_p_raw"] = df_copy["Closure DateTime"] - df_copy["Incision DateTime"]

# Convert duration to decimal hours:
df_copy["S_p_raw_hours"] = df_copy["S_p_raw"].dt.total_seconds() / 3600
df_copy["S_p_raw_hours"] = df_copy["S_p_raw_hours"].round(5)

# Actual duration of surgery within the block (shift) boundaries:
df_copy["S_p_limited"] = (
    df_copy[["Closure DateTime", "Planned End DateTime"]].min(axis=1) -
    df_copy[["Incision DateTime", "Planned Start DateTime"]].max(axis=1)
).clip(lower=pd.Timedelta(0))

df_copy.loc[
    df_copy[['Incision DateTime', 'Closure DateTime', 'Planned Start DateTime', 'Planned End DateTime']].isnull().any(axis=1),
    "S_p_limited"
] = pd.NaT

df_copy["S_p_hours_limited"] = df_copy["S_p_limited"].dt.total_seconds() / 3600
df_copy.loc[df_copy["S_p_limited"].isna(), "S_p_hours_limited"] = np.nan
df_copy["S_p_hours_limited"] = df_copy["S_p_hours_limited"].round(5)

df_sp_sum = df_copy.groupby(
    ['Actual Surgery Room Entry Date', 'Actual Operating Room Number'],
    as_index=False
)['S_p_hours_limited'].sum()

df_sp_sum.rename(columns={'S_p_hours_limited': 'total_SP_per_day_room'}, inplace=True)

assert not df_sp_sum.duplicated(subset=["Actual Surgery Room Entry Date", "Actual Operating Room Number"]).any()

df_copy = df_copy.merge(
    df_sp_sum,
    on=['Actual Surgery Room Entry Date', 'Actual Operating Room Number'],
    how='left'
)
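
To make the block-limited duration concrete, here is a small worked example on invented timestamps. It uses the same row-wise `max`/`min` approach as the cell above, with one block from 07:00 to 13:30 and three cases: one starting before the block, one ending after it, and one entirely outside it.

```python
import pandas as pd

# Toy cases with invented times; the block runs 07:00-13:30 for every row.
cases = pd.DataFrame({
    "Incision DateTime": pd.to_datetime(
        ["2019-03-22 06:30", "2019-03-22 12:45", "2019-03-22 14:00"]
    ),
    "Closure DateTime": pd.to_datetime(
        ["2019-03-22 08:00", "2019-03-22 14:15", "2019-03-22 15:00"]
    ),
    "Planned Start DateTime": pd.to_datetime(["2019-03-22 07:00"] * 3),
    "Planned End DateTime": pd.to_datetime(["2019-03-22 13:30"] * 3),
})

# The overlap starts at the later of incision and block start ...
overlap_start = cases[["Incision DateTime", "Planned Start DateTime"]].max(axis=1)
# ... and ends at the earlier of closure and block end.
overlap_end = cases[["Closure DateTime", "Planned End DateTime"]].min(axis=1)

# Negative differences mean no overlap at all, so they are clipped to zero.
cases["S_p_hours_limited"] = (
    (overlap_end - overlap_start).clip(lower=pd.Timedelta(0)).dt.total_seconds() / 3600
)
print(cases["S_p_hours_limited"])
```

The three cases contribute 1.0, 0.75, and 0.0 hours respectively, which shows that time outside the block is not counted.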

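SU_p: unscheduled procedure hours (room time before incision and after closure), limited to the block window: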

In [39]:
dt_cols = [
    "Planned Start DateTime", "Planned End DateTime",
    "Entry DateTime", "Exit DateTime",
    "Incision DateTime", "Closure DateTime"
]
for col in dt_cols:
    df_copy[col] = pd.to_datetime(df_copy[col], errors='coerce')

df_copy["SU_p_before_limited"] = (
    (df_copy["Incision DateTime"] - df_copy["Entry DateTime"]).clip(lower=pd.Timedelta(0)) +
    (df_copy["Exit DateTime"] - df_copy["Closure DateTime"]).clip(lower=pd.Timedelta(0))
)
df_copy["SU_p_before_hours"] = df_copy["SU_p_before_limited"].dt.total_seconds() / 3600

prep_start = df_copy[["Entry DateTime", "Planned Start DateTime"]].max(axis=1)
prep_end = df_copy[["Incision DateTime", "Planned End DateTime"]].min(axis=1)
prep_duration = (prep_end - prep_start).clip(lower=pd.Timedelta(0))

post_start = df_copy[["Closure DateTime", "Planned Start DateTime"]].max(axis=1)
post_end = df_copy[["Exit DateTime", "Planned End DateTime"]].min(axis=1)
post_duration = (post_end - post_start).clip(lower=pd.Timedelta(0))

df_copy["SU_p_after_limited"] = prep_duration + post_duration
df_copy["SU_p_after_hours"] = df_copy["SU_p_after_limited"].dt.total_seconds() / 3600

df_copy.loc[df_copy[[
    'Entry DateTime', 'Incision DateTime', 'Closure DateTime', 'Exit DateTime',
    'Planned Start DateTime', 'Planned End DateTime'
]].isnull().any(axis=1), "SU_p_after_limited"] = pd.NaT

df_copy["SU_p_after_hours"] = df_copy["SU_p_after_limited"].dt.total_seconds() / 3600
df_copy.loc[df_copy["SU_p_after_limited"].isna(), "SU_p_after_hours"] = np.nan

su_grouped = df_copy.groupby(
    ["Actual Surgery Room Entry Date", "Actual Operating Room Number"],
    as_index=False
)["SU_p_after_hours"].sum().rename(columns={"SU_p_after_hours": "SU_p_limited_hours"})

df_copy = df_copy.merge(
    su_grouped,
    on=["Actual Surgery Room Entry Date", "Actual Operating Room Number"],
    how="left"
)

print(df_copy[[
    "Actual Surgery Room Entry Date", "Actual Operating Room Number",
    "SU_p_before_hours", "SU_p_after_hours", "SU_p_limited_hours"
]].head())

  Actual Surgery Room Entry Date  Actual Operating Room Number  \
0                     2019-03-22                         20006   
1                     2019-01-04                         20015   
2                     2019-02-01                         20006   
3                     2019-01-04                         20015   
4                     2019-02-01                         20006   

   SU_p_before_hours  SU_p_after_hours  SU_p_limited_hours  
0           0.400000          0.350000            0.350000  
1                NaN               NaN            0.383333  
2                NaN               NaN            1.133333  
3           0.383333          0.383333            0.383333  
4                NaN               NaN            1.133333  
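
The preparation and post-closure components can be illustrated with a small helper. This is a sketch on invented data: `overlap_with_block` is a hypothetical function name, and the block is assumed to run from 07:00 to 13:30.

```python
import pandas as pd

def overlap_with_block(start, end, block_start, block_end):
    """Duration of [start, end] that falls inside [block_start, block_end]."""
    clipped_start = pd.concat([start, block_start], axis=1).max(axis=1)
    clipped_end = pd.concat([end, block_end], axis=1).min(axis=1)
    return (clipped_end - clipped_start).clip(lower=pd.Timedelta(0))

# Toy cases with invented times.
toy = pd.DataFrame({
    "Entry DateTime": pd.to_datetime(["2019-03-22 06:40", "2019-03-22 09:00"]),
    "Incision DateTime": pd.to_datetime(["2019-03-22 07:20", "2019-03-22 09:25"]),
    "Closure DateTime": pd.to_datetime(["2019-03-22 13:10", "2019-03-22 11:00"]),
    "Exit DateTime": pd.to_datetime(["2019-03-22 13:50", "2019-03-22 11:20"]),
    "Planned Start DateTime": pd.to_datetime(["2019-03-22 07:00"] * 2),
    "Planned End DateTime": pd.to_datetime(["2019-03-22 13:30"] * 2),
})

# Entry-to-incision time inside the block.
prep = overlap_with_block(
    toy["Entry DateTime"], toy["Incision DateTime"],
    toy["Planned Start DateTime"], toy["Planned End DateTime"],
)
# Closure-to-exit time inside the block.
post = overlap_with_block(
    toy["Closure DateTime"], toy["Exit DateTime"],
    toy["Planned Start DateTime"], toy["Planned End DateTime"],
)
su_minutes = (prep + post).dt.total_seconds() / 60
print(su_minutes)
```

In the first case, 20 minutes before the block starts and 20 minutes after it ends are excluded, leaving 40 of the 80 raw minutes. Row-wise `max` and `min` skip missing values by default, which is presumably why the cell above explicitly resets rows with any missing timestamp to `NaT`.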


SH_r: service hours per room per day:

In [40]:
# Calculate block duration in hours
df_copy["Block Duration Hours"] = (
    df_copy["Planned End DateTime"] - df_copy["Planned Start DateTime"]
).dt.total_seconds() / 3600

# Get max block duration per surgeon per day and room
df_max = df_copy.groupby(
    ['Main Surgeon Code', 'Actual Surgery Room Entry Date', 'Actual Operating Room Number'],
    as_index=False
)['Block Duration Hours'].max()

df_max.rename(columns={'Block Duration Hours': 'max_hours'}, inplace=True)

# Sum the max values per day and room
df_sum = df_max.groupby(
    ['Actual Surgery Room Entry Date', 'Actual Operating Room Number'],
    as_index=False
)['max_hours'].sum()

# Rename the result column to SH_r_hours
df_sum.rename(columns={'max_hours': 'SH_r_hours'}, inplace=True)

# Merge SH_r_hours back to df_copy
df_copy = df_copy.merge(df_sum, on=['Actual Surgery Room Entry Date', 'Actual Operating Room Number'], how='left')
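
A toy example may clarify why the maximum is taken per surgeon before summing. Every case row repeats its surgeon's block duration, so a plain sum over rows would count the same block once per case. The data below are invented for illustration.

```python
import pandas as pd

# Surgeon A has two cases in the same 6.5-hour block; surgeon B has a 3-hour block.
toy = pd.DataFrame({
    "Main Surgeon Code": ["A", "A", "B", "C"],
    "Actual Surgery Room Entry Date": pd.to_datetime(
        ["2019-01-04", "2019-01-04", "2019-01-04", "2019-01-05"]
    ),
    "Actual Operating Room Number": [20015, 20015, 20015, 20015],
    "Block Duration Hours": [6.5, 6.5, 3.0, 8.0],
})

day_room = ["Actual Surgery Room Entry Date", "Actual Operating Room Number"]

# One block duration per surgeon, day, and room.
per_surgeon = toy.groupby(
    ["Main Surgeon Code"] + day_room, as_index=False
)["Block Duration Hours"].max()

# Total service hours per day and room.
service_hours = (
    per_surgeon.groupby(day_room, as_index=False)["Block Duration Hours"]
    .sum()
    .rename(columns={"Block Duration Hours": "SH_r_hours"})
)
print(service_hours)
```

For 2019-01-04 the result is 9.5 hours (6.5 + 3.0) rather than the 16.0 hours a row-level sum would give.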

Utilization rate:

In [41]:
print(df_copy.columns.tolist())

['patient_id', 'Site Code', 'Main Surgeon Code', 'Activity Code', 'Activity Type Code', 'Planned SU Time (Large/Medium/Small)', 'Pre-Surgery Admission Date', 'Height', 'Weight', 'Patient Age (on Surgery Day)', 'Background Diseases/Diagnoses', 'Planned Surgery Date', 'Planned Surgery Time', 'Surgery Admission Date', 'Administrative Admission Time', 'Planned Operating Room Number', 'Pre-Surgery Hospitalization Admission Date', 'Pre-Surgery Hospitalization Admission Time', 'Pre-Surgical Admission Time Before Surgery', 'Surgical Team Codes', 'Anesthesiologist Code', 'Anesthesia Code', 'Type of Anesthesia', 'Cancellation Reason on Surgery Day', 'Actual Operating Room Number', 'Actual Surgery Room Entry Date', 'Actual Surgery Room Entry Time', 'Incision Time', 'Closure Time', 'End of Surgery Date (Exit from OR)', 'End of Surgery Time (Exit from OR)', 'Planned Surgery Duration', 'Recovery Room Entry Date', 'Recovery Room Entry Time', 'Recovery Room Exit Date', 'Recovery Room Exit Time', 'Post

In [42]:
# Clean the data before utilization calculation

# Remove rows where there was a No-Show
num_noshow = df_copy["No Show"].sum()
print(f"Removed {num_noshow} rows due to No-Show status.")
df_copy = df_copy[df_copy["No Show"] != True]

# Remove rows with missing time components needed for utilization calculation
required_cols = [
    "Incision DateTime",
    "Closure DateTime",
    "Planned Start DateTime",
    "Planned End DateTime",
    "Actual Surgery Room Entry Date",
    "Actual Surgery Room Entry Time",
    "End of Surgery Time (Exit from OR)"
]
df_copy = df_copy.dropna(subset=required_cols)

print(f"Remaining rows for utilization calculation: {len(df_copy)}")

if all(col in df_copy.columns for col in ["total_SP_per_day_room", "SU_p_limited_hours", "SH_r_hours"]):
    df_copy["daily_utilization_rate"] = (
        (df_copy["total_SP_per_day_room"] + df_copy["SU_p_limited_hours"]) /
        df_copy["SH_r_hours"]
    ) * 100
    print(f"Calculated utilization for {df_copy['daily_utilization_rate'].notna().sum()} rows.")

    df_copy.loc[df_copy["SH_r_hours"].isna(), "daily_utilization_rate"] = np.nan
    df_copy.loc[df_copy["SH_r_hours"] == 0, "daily_utilization_rate"] = np.nan
else:
    print("Missing columns required to calculate utilization")

Removed 0 rows due to No-Show status.
Remaining rows for utilization calculation: 12939
Calculated utilization for 12939 rows.
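
The zero and missing service-hour cases can also be handled before the division. The sketch below uses invented values and is an alternative formulation, not the cell we ran: in pandas, dividing by zero yields `inf` rather than raising an error, so replacing zeros with `NaN` first keeps infinite rates out of the result.

```python
import numpy as np
import pandas as pd

# Toy day-room totals with invented values.
daily = pd.DataFrame({
    "total_SP_per_day_room": [4.0, 2.0, 1.0],
    "SU_p_limited_hours": [1.0, 0.5, 0.25],
    "SH_r_hours": [6.5, 0.0, np.nan],
})

# Zero service hours become NaN so the rate is undefined instead of infinite.
denominator = daily["SH_r_hours"].replace(0, np.nan)

daily["daily_utilization_rate"] = (
    (daily["total_SP_per_day_room"] + daily["SU_p_limited_hours"]) / denominator * 100
)
print(daily)
```

Only the first row receives a rate (about 76.9%); the other two stay `NaN`.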


In [43]:
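# Keep one rate per block date and room (the first one found), then merge it back.
# Merging df_copy with itself on non-unique keys would multiply the rows.
merge_keys = ["Planned Start Date for Doctor Block", "Actual Operating Room Number"]
daily_rates = df_copy[merge_keys + ["daily_utilization_rate"]].drop_duplicates(subset=merge_keys)
df_copy = df_copy.drop(columns="daily_utilization_rate").merge(
    daily_rates, on=merge_keys, how="left", validate="many_to_one")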
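
Merging a rate back onto case-level rows preserves the row count only when the right-hand table has one row per key. The sketch below, on invented data, contrasts a merge against a table with repeated keys with a merge guarded by the `validate` argument of `DataFrame.merge`.

```python
import pandas as pd

keys = ["Planned Start Date for Doctor Block", "Actual Operating Room Number"]

# Case-level rows: two cases share the same date and room.
rows = pd.DataFrame({
    "Planned Start Date for Doctor Block": pd.to_datetime(
        ["2019-01-04", "2019-01-04", "2019-02-01"]
    ),
    "Actual Operating Room Number": [20015, 20015, 20006],
    "patient_id": [45, 48, 46],
})

# Right-hand table with repeated keys: each shared key matches twice.
naive = rows.merge(rows[keys], on=keys, how="left")

# One invented rate per date and room.
rates = pd.DataFrame({
    "Planned Start Date for Doctor Block": pd.to_datetime(["2019-01-04", "2019-02-01"]),
    "Actual Operating Room Number": [20015, 20006],
    "daily_utilization_rate": [81.5, 64.0],
})

# validate raises a MergeError if the right-hand keys are not unique.
merged = rows.merge(rates, on=keys, how="left", validate="many_to_one")

print(len(rows), len(naive), len(merged))
```

The unguarded merge grows from 3 to 5 rows, while the validated merge keeps the original 3.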

In [44]:
from datetime import timedelta

for col in time_columns:
    if col in df_copy.columns:
        df_copy[col] = df_copy[col].apply(
            lambda x: f"{int(x.total_seconds() // 3600):02}:{int((x.total_seconds() % 3600) // 60):02}:{int(x.total_seconds() % 60):02}"
            if isinstance(x, timedelta) else (x if isinstance(x, str) else "")
        )
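
The formatting step can be read more easily as a named function. The version below is a sketch with a hypothetical name, `format_hhmmss`, and invented sample values; like the lambda above, it returns an empty string for missing values.

```python
import pandas as pd

def format_hhmmss(value) -> str:
    """Render a timedelta as 'HH:MM:SS'; missing values become an empty string."""
    if pd.isna(value):
        return ""
    total = int(value.total_seconds())
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02}:{minutes:02}:{seconds:02}"

sample = pd.Series(pd.to_timedelta(["13:30:00", "07:05:09", None]))
formatted = sample.apply(format_hhmmss)
print(formatted.tolist())
```

Unlike the lambda, this sketch does not pass existing strings through unchanged, so it is meant for columns that are still of timedelta type.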

In [45]:
df_copy.to_excel("2019_data_with_UTR.xlsx", index=False)