# Operating Room Utilization Rate Calculation for 2018

In this notebook, we calculate the daily operating room utilization rate for each theatre during 2018 using the cleaned Assuta Ramat HaHayal dataset. Our aim is to quantify how effectively scheduled and unscheduled procedure hours occupy the total available block time, producing a key performance indicator for capacity planning and efficiency analysis.

**1. Imports and Data Loading**  
- We import essential libraries (`pandas`, `numpy`, `matplotlib.pyplot`, `datetime`, etc.).  
- We load the cleaned Excel file (`2018_data_Assuta_nomissings_for_stat_final_1205.xlsx`) into a pandas DataFrame and inspect the first rows to confirm correct parsing.

**2. Initial Inspection and Copy**  
- We display basic DataFrame info (columns, dtypes, head) to verify data integrity.  
- We create a working copy (`df_copy`) to preserve the original raw data.

**3. Data Cleaning and Duplicate Check**  
- We count and report any fully duplicated rows to gauge data redundancy.  
- We examine “No Show” status and remove such entries, since they do not contribute to room utilization; in the code this removal runs just before the utilization calculation.

**4. Date and Time Parsing**  
- We convert key date columns (`Actual Surgery Room Entry Date`, `End of Surgery Date (Exit from OR)`, `Planned Start/End Date for Doctor Block`) to `datetime`.  
- We parse the time-only fields (`Planned Start/End Time for Doctor Block`, `Incision Time`, `Closure Time`, `Actual Surgery Room Entry/Exit Time`, `End of Surgery Time (Exit from OR)`) first to `datetime.time` and then to `timedelta`, coercing invalid values to missing and counting missing entries after each conversion.

**5. Timestamp Construction**  
- We merge each planned date with its corresponding planned start/end time to form full `Planned Start DateTime` and `Planned End DateTime` columns, enabling accurate duration calculations.

**6. Block Duration and Service Hours**  
- We compute each block’s total available hours (`Block Duration Hours`) as the difference between planned end and start datetimes (converted to hours).  
- We group by surgeon, date, and operating room to extract the maximum block duration per surgeon per room per day, then sum these maxima over the surgeons in each room to define that room's total service hours for the day (`SH_r_hours`).

**7. Scheduled and Unscheduled Procedure Hours**  
- We calculate **Scheduled Procedure (SP) hours** by limiting actual surgery times to the planned block window and summing per room per day.  
- We calculate **Unscheduled (SU) hours** as the time between room entry and incision plus the time between closure and room exit, each limited to the block window.  
- We aggregate these values into `total_SP_per_day_room` and `SU_p_limited_hours`.

**8. Utilization Rate Calculation**  
- We define the daily utilization rate as:  
  ```python
  daily_utilization_rate = (total_SP_per_day_room + SU_p_limited_hours) / SH_r_hours * 100
  ```
- We handle edge cases by assigning `NaN` where service hours are zero or missing, and report the number of rows with a valid utilization rate.

**9. Final Merge, Formatting, and Export**  
- We merge `daily_utilization_rate` back onto the main DataFrame by date and room number.  
- We format all time-related columns consistently as `HH:MM:SS` strings for reporting.  
- We save the enriched dataset to `2018_data_with_UTR.xlsx` for further analysis and visualization.

By following these steps, we produce a robust utilization metric for each operating room in 2018, laying the groundwork for in-depth efficiency studies, trend identification, and strategic resource allocation.
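
In symbols, the definition from step 8 can be written as below. The notation is an assumption chosen to mirror the column names (`S_p_hours_limited`, `SU_p_after_hours`, `SH_r_hours`), with $P_{r,d}$ denoting the procedures in room $r$ on day $d$. The numbers in the worked line are hypothetical and are not taken from the dataset.

$$
U_{r,d} = \frac{\sum_{p \in P_{r,d}} S_p^{\text{lim}} + \sum_{p \in P_{r,d}} SU_p^{\text{lim}}}{SH_{r,d}} \times 100
$$

$$
U_{r,d} = \frac{4.50 + 1.35}{7.00} \times 100 = \frac{5.85}{7.00} \times 100 \approx 83.57\%
$$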

Calculating the target variable - utilization percentage:

In [1]:
#Import the libraries:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import warnings
import itertools
from pandas.api.types import is_numeric_dtype
from scipy.stats import shapiro

In [2]:
file_path = r"/content/2018_data_Assuta_nomissings_for_stat_final_1205.xlsx"
df = pd.read_excel(file_path)
print(df.head())

   patient_id  Site Code Main Surgeon Code  Activity Code  Activity Type Code  \
0          43         20              2982          26.33                  19   
1          44         20              2982          26.33                  19   
2          45         20              2982          26.33                  19   
3          46         20              2982          26.33                  19   
4          47         20              2982          26.33                  19   

   Planned SU Time (Large/Medium/Small) Pre-Surgery Admission Date  Height  \
0                                   0.0                 2018-01-01    1.78   
1                                  13.0                 2018-03-12    1.73   
2                                  18.0                 2018-07-09    1.86   
3                                  19.0                 2018-04-09    1.58   
4                                  19.0                 2018-04-30    1.65   

   Weight  Patient Age (on Surgery Day)  ...

In [3]:
print(df.columns)

Index(['patient_id', 'Site Code', 'Main Surgeon Code', 'Activity Code',
       'Activity Type Code', 'Planned SU Time (Large/Medium/Small)',
       'Pre-Surgery Admission Date', 'Height', 'Weight',
       'Patient Age (on Surgery Day)', 'Background Diseases/Diagnoses',
       'Planned Surgery Date', 'Planned Surgery Time',
       'Surgery Admission Date', 'Administrative Admission Time',
       'Planned Operating Room Number',
       'Pre-Surgery Hospitalization Admission Date',
       'Pre-Surgery Hospitalization Admission Time',
       'Pre-Surgical Admission Time Before Surgery', 'Surgical Team Codes',
       'Anesthesiologist Code', 'Anesthesia Code', 'Type of Anesthesia',
       'Cancellation Reason on Surgery Day', 'Actual Operating Room Number',
       'Actual Surgery Room Entry Date', 'Actual Surgery Room Entry Time',
       'Incision Time', 'Closure Time', 'End of Surgery Date (Exit from OR)',
       'End of Surgery Time (Exit from OR)', 'Planned Surgery Duration',
       'Rec

In [4]:
print(df.dtypes)

patient_id                                       int64
Site Code                                        int64
Main Surgeon Code                               object
Activity Code                                  float64
Activity Type Code                               int64
                                                ...   
Recovery Room Entry Time_Minutes                 int64
Recovery Room Exit Time_Minutes                  int64
Post-Surgery Discharge Time_Minutes              int64
Planned Start Time for Doctor Block_Minutes      int64
Planned End Time for Doctor Block_Minutes        int64
Length: 62, dtype: object


Preparing and checking the time and date columns - converting to an appropriate data type

In [5]:
df_copy=df.copy()

In [6]:
print(f"Number of rows: {df_copy.shape[0]}")
print(f"Number of columns: {df_copy.shape[1]}")

Number of rows: 12921
Number of columns: 62


In [7]:
print(df_copy.head(10))

   patient_id  Site Code Main Surgeon Code  Activity Code  Activity Type Code  \
0          43         20              2982          26.33                  19   
1          44         20              2982          26.33                  19   
2          45         20              2982          26.33                  19   
3          46         20              2982          26.33                  19   
4          47         20              2982          26.33                  19   
5          48         20              2982          26.33                  19   
6          50         20              2982          26.33                  19   
7          53         20              2982          26.33                  19   
8          55         20              2982          26.33                  19   
9          56         20              2982          26.33                  19   

   Planned SU Time (Large/Medium/Small) Pre-Surgery Admission Date  Height  \
0                             

Handling duplicates and missing (null) values:

In [8]:
# Overall duplicate count (all rows):
num_duplicates = df_copy.duplicated().sum()
print(f"The number of completely repeating lines: {num_duplicates}")

The number of completely repeating lines: 0


We tried to merge near-duplicate rows, matched on age, weight, height, surgery entry date, and surgeon code, combining the differing values from each group of rows to prevent data loss. However, the code did not work as expected: it distorted the time-related data and led to incorrect utilization calculations.
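
A non-destructive way to look at such rows, sketched here as an assumption rather than something this notebook runs, is to flag every row that shares the matching key with another row and inspect the group before changing anything. The toy values below are made up; only the column names follow the dataset.

```python
import pandas as pd

# Toy frame: column names follow the dataset, values are hypothetical.
toy = pd.DataFrame({
    "Patient Age (on Surgery Day)": [54, 54, 61],
    "Weight": [80.0, 80.0, 72.5],
    "Height": [1.78, 1.78, 1.65],
    "Actual Surgery Room Entry Date": pd.to_datetime(
        ["2018-01-05", "2018-01-05", "2018-03-23"]
    ),
    "Main Surgeon Code": ["2982", "2982", "2982"],
    "Closure Time": ["09:16:00", "09:20:00", "13:49:00"],
})

key_cols = [
    "Patient Age (on Surgery Day)",
    "Weight",
    "Height",
    "Actual Surgery Room Entry Date",
    "Main Surgeon Code",
]

# keep=False marks every member of a duplicate group, not only the repeats.
dup_mask = toy.duplicated(subset=key_cols, keep=False)
suspects = toy[dup_mask]
print(suspects)
```

Flagging instead of merging leaves the time columns untouched, so the utilization calculation is unaffected while the suspect rows are reviewed.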

Planned End Time for Doctor Block:

In [9]:
# Print the unique values in a column:
print(df_copy['Planned End Time for Doctor Block'].unique())

['08:33:00' '13:12:00' '12:36:00' '12:57:00' '13:50:00' '12:48:00'
 '13:43:00' '10:06:00' '11:43:00' '14:41:00' '11:58:00' '08:36:00'
 '11:50:00' '11:35:00' '11:05:00' '11:55:00' '12:11:00' '13:55:00'
 '09:24:00' '12:30:00' '10:34:00' '09:30:00' '09:59:00' '11:49:00'
 '12:44:00' '13:48:00' '10:44:00' '12:05:00' '10:00:00' '09:44:00'
 '13:10:00' '13:54:00' '08:45:00' '12:49:00' '12:39:00' '08:54:00'
 '11:33:00' '11:46:00' '11:22:00' '09:00:00' '14:02:00' '14:36:00'
 '13:57:00' '14:10:00' '14:30:00' '11:40:00' '11:00:00' '14:40:00'
 '12:50:00' '15:00:00' '14:05:00' '12:25:00' '12:34:00' '14:55:00'
 '14:50:00' '14:54:00' '09:14:00' '10:10:00' '13:42:00' '12:27:00'
 '10:30:00' '14:57:00' '12:55:00' '12:07:00' '14:37:00' '10:12:00'
 '12:00:00' '14:27:00' '14:00:00' '14:42:00' '13:37:00' '09:55:00'
 '12:54:00' '14:14:00' '12:14:00' '12:58:00' '09:46:00' '10:54:00'
 '13:00:00' '13:25:00' '08:50:00' '08:30:00' '11:39:00' '11:56:00'
 '12:56:00' '08:55:00' '09:25:00' '08:41:00' '13:03:00' '12:38

In [10]:
print(df_copy['Planned End Time for Doctor Block'].dtype)
print(df_copy['Planned End Time for Doctor Block'].head(10))

object
0    08:33:00
1    13:12:00
2    12:36:00
3    12:57:00
4    12:57:00
5    13:50:00
6    13:12:00
7    13:50:00
8    12:48:00
9    12:57:00
Name: Planned End Time for Doctor Block, dtype: object


In [11]:
# Convert to datetime format and then save only the time:
df_copy['Planned End Time for Doctor Block'] = pd.to_datetime(
    df_copy['Planned End Time for Doctor Block'],
    format='%H:%M:%S',
    errors='coerce'  # Converts invalid values to NaT
).dt.time

In [12]:
# Check the column type and values
print(df_copy['Planned End Time for Doctor Block'].dtype)  # dtype is object; the elements are datetime.time
print(df_copy['Planned End Time for Doctor Block'].head(10))
print("Missing values:", df_copy['Planned End Time for Doctor Block'].isna().sum())

object
0    08:33:00
1    13:12:00
2    12:36:00
3    12:57:00
4    12:57:00
5    13:50:00
6    13:12:00
7    13:50:00
8    12:48:00
9    12:57:00
Name: Planned End Time for Doctor Block, dtype: object
Missing values: 0


In [13]:
df_copy['Planned End Time for Doctor Block'] = pd.to_timedelta(
    df_copy['Planned End Time for Doctor Block'].astype(str),
    errors='coerce')

In [14]:
print(df_copy['Planned End Time for Doctor Block'].dtype)

timedelta64[ns]
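
The same two-step conversion is repeated for each time column below. As a sketch only, it could be wrapped in one helper; `to_time_of_day` is a hypothetical name, not part of the notebook, and it goes straight from the parsed timestamp to a `timedelta` instead of passing through `datetime.time`.

```python
import pandas as pd

def to_time_of_day(series: pd.Series) -> pd.Series:
    """Parse 'HH:MM:SS' strings into timedeltas since midnight; invalid values become NaT."""
    parsed = pd.to_datetime(series, format="%H:%M:%S", errors="coerce")
    # Subtracting the midnight timestamp leaves only the time-of-day offset.
    return parsed - parsed.dt.normalize()

sample = pd.Series(["08:33:00", "13:12:00", "not a time", None])
converted = to_time_of_day(sample)
n_invalid = int(converted.isna().sum())
print(converted)
print("Missing values:", n_invalid)
```

Counting `NaT` values after the call keeps the same check the cells above perform after each conversion.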


Closure Time:

In [15]:
print(df_copy['Closure Time'].unique())

['09:16:00' '13:49:00' '08:31:00' ... '00:35:00' '01:50:00' '00:43:00']


In [16]:
print(df_copy['Closure Time'].dtype)
print(df_copy['Closure Time'].head(10))

object
0    09:16:00
1    13:49:00
2    08:31:00
3    10:27:00
4    12:07:00
5    14:19:00
6    08:34:00
7    12:20:00
8    09:11:00
9    14:20:00
Name: Closure Time, dtype: object


In [17]:
df_copy['Closure Time'] = pd.to_datetime(
    df_copy['Closure Time'],
    format='%H:%M:%S',
    errors='coerce'
).dt.time

In [18]:
print(df_copy['Closure Time'].dtype)
print(df_copy['Closure Time'].head(10))
print("Missing values:", df_copy['Closure Time'].isna().sum())

object
0    09:16:00
1    13:49:00
2    08:31:00
3    10:27:00
4    12:07:00
5    14:19:00
6    08:34:00
7    12:20:00
8    09:11:00
9    14:20:00
Name: Closure Time, dtype: object
Missing values: 0


In [19]:
df_copy['Closure Time'] = pd.to_timedelta(
    df_copy['Closure Time'].astype(str),
    errors='coerce')

In [20]:
print(df_copy['Closure Time'].dtype)
print(df_copy['Closure Time'].head(10))
print("Missing values:", df_copy['Closure Time'].isna().sum())

timedelta64[ns]
0   0 days 09:16:00
1   0 days 13:49:00
2   0 days 08:31:00
3   0 days 10:27:00
4   0 days 12:07:00
5   0 days 14:19:00
6   0 days 08:34:00
7   0 days 12:20:00
8   0 days 09:11:00
9   0 days 14:20:00
Name: Closure Time, dtype: timedelta64[ns]
Missing values: 0


Incision Time:

In [21]:
df_copy['Incision Time'] = pd.to_datetime(
    df_copy['Incision Time'],
    format='%H:%M:%S',
    errors='coerce'
).dt.time

In [22]:
print(df_copy['Incision Time'].dtype)
print(df_copy['Incision Time'].head(10))
print("Missing values:", df_copy['Incision Time'].isna().sum())

object
0    07:45:00
1    12:54:00
2    07:33:00
3    09:52:00
4    11:08:00
5    13:13:00
6    07:36:00
7    11:37:00
8    07:46:00
9    12:48:00
Name: Incision Time, dtype: object
Missing values: 0


In [23]:
df_copy['Incision Time'] = pd.to_timedelta(
    df_copy['Incision Time'].astype(str),
    errors='coerce')

In [24]:
print(df_copy['Incision Time'].dtype)
print(df_copy['Incision Time'].head(10))
print("Missing values:", df_copy['Incision Time'].isna().sum())

timedelta64[ns]
0   0 days 07:45:00
1   0 days 12:54:00
2   0 days 07:33:00
3   0 days 09:52:00
4   0 days 11:08:00
5   0 days 13:13:00
6   0 days 07:36:00
7   0 days 11:37:00
8   0 days 07:46:00
9   0 days 12:48:00
Name: Incision Time, dtype: timedelta64[ns]
Missing values: 0


Planned Start Time for Doctor Block:

In [25]:
df_copy['Planned Start Time for Doctor Block'] = pd.to_datetime(
    df_copy['Planned Start Time for Doctor Block'],
    format='%H:%M:%S',
    errors='coerce'
).dt.time

In [26]:
print(df_copy['Planned Start Time for Doctor Block'].dtype)
print(df_copy['Planned Start Time for Doctor Block'].head(10))
print("Missing values:", df_copy['Planned Start Time for Doctor Block'].isna().sum())

object
0    07:00:00
1    07:00:00
2    07:00:00
3    07:00:00
4    07:00:00
5    07:00:00
6    07:00:00
7    07:00:00
8    07:05:00
9    07:00:00
Name: Planned Start Time for Doctor Block, dtype: object
Missing values: 0


In [27]:
df_copy['Planned Start Time for Doctor Block'] = pd.to_timedelta(
    df_copy['Planned Start Time for Doctor Block'].astype(str),
    errors='coerce')

In [28]:
print(df_copy['Planned Start Time for Doctor Block'].dtype)
print(df_copy['Planned Start Time for Doctor Block'].head(10))
print("Missing values:", df_copy['Planned Start Time for Doctor Block'].isna().sum())

timedelta64[ns]
0   0 days 07:00:00
1   0 days 07:00:00
2   0 days 07:00:00
3   0 days 07:00:00
4   0 days 07:00:00
5   0 days 07:00:00
6   0 days 07:00:00
7   0 days 07:00:00
8   0 days 07:05:00
9   0 days 07:00:00
Name: Planned Start Time for Doctor Block, dtype: timedelta64[ns]
Missing values: 0


Actual Surgery Room Entry Time:

In [29]:
df_copy['Actual Surgery Room Entry Time'] = pd.to_datetime(
    df_copy['Actual Surgery Room Entry Time'],
    format='%H:%M:%S',
    errors='coerce'
).dt.time

In [30]:
print(df_copy['Actual Surgery Room Entry Time'].dtype)
print(df_copy['Actual Surgery Room Entry Time'].head(10))
print("Missing values:", df_copy['Actual Surgery Room Entry Time'].isna().sum())

object
0    07:05:00
1    12:40:00
2    07:22:00
3    09:36:00
4    10:52:00
5    12:45:00
6    07:14:00
7    11:15:00
8    07:20:00
9    12:28:00
Name: Actual Surgery Room Entry Time, dtype: object
Missing values: 0


In [31]:
df_copy['Actual Surgery Room Entry Time'] = pd.to_timedelta(
    df_copy['Actual Surgery Room Entry Time'].astype(str),
    errors='coerce')

In [32]:
print(df_copy['Actual Surgery Room Entry Time'].dtype)
print(df_copy['Actual Surgery Room Entry Time'].head(10))
print("Missing values:", df_copy['Actual Surgery Room Entry Time'].isna().sum())

timedelta64[ns]
0   0 days 07:05:00
1   0 days 12:40:00
2   0 days 07:22:00
3   0 days 09:36:00
4   0 days 10:52:00
5   0 days 12:45:00
6   0 days 07:14:00
7   0 days 11:15:00
8   0 days 07:20:00
9   0 days 12:28:00
Name: Actual Surgery Room Entry Time, dtype: timedelta64[ns]
Missing values: 0


In [33]:
# List of columns that hold time-of-day values only (no date part)
time_columns = [
    'Actual Surgery Room Entry Time',
    'Planned Start Time for Doctor Block',
    'Incision Time',
    'Closure Time',
    'Planned End Time for Doctor Block']

In [34]:
import pandas as pd

# List of columns being checked
time_cols = ["Entry DateTime", "Incision DateTime", "Closure DateTime", "Exit DateTime"]

# Columns in datetime format
for col in time_cols:
    df_copy[col] = pd.to_datetime(df_copy[col], errors='coerce')

# Error count and list of deleted cells
internal_corrections = 0
external_corrections = 0
deleted_cells = []

# Cleaning up internal inconsistencies within a row
def clean_internal_conflicts(row):
    global internal_corrections
    for i in range(len(time_cols) - 1):
        col1 = time_cols[i]
        col2 = time_cols[i + 1]
        t1 = row[col1]
        t2 = row[col2]

        if pd.notnull(t1) and pd.notnull(t2) and t1 > t2:
            prev_time = row[time_cols[i - 1]] if i > 0 else None
            next_time = row[time_cols[i + 2]] if i + 2 < len(time_cols) else None

            if prev_time is not None and pd.notnull(prev_time) and prev_time > t1:
                bad_col = col1
            elif next_time is not None and pd.notnull(next_time) and t2 > next_time:
                bad_col = col2
            else:
                bad_col = col1  # default
            deleted_cells.append((row.name, bad_col, row[bad_col]))
            row[bad_col] = pd.NaT
            internal_corrections += 1
    return row

df_copy = df_copy.apply(clean_internal_conflicts, axis=1)

# Conflict between current Exit and next Entry
df_copy["next_Entry DateTime"] = df_copy["Entry DateTime"].shift(-1)

def fix_external_conflict(row):
    global external_corrections
    if pd.notnull(row["Exit DateTime"]) and pd.notnull(row["next_Entry DateTime"]):
        if row["Exit DateTime"] > row["next_Entry DateTime"]:
            deleted_cells.append((row.name, "Exit DateTime", row["Exit DateTime"]))
            row["Exit DateTime"] = pd.NaT
            external_corrections += 1
    return row

df_copy = df_copy.apply(fix_external_conflict, axis=1)
df_copy.drop(columns=["next_Entry DateTime"], inplace=True)

# Conflict between current Entry and previous Exit → Always delete the Exit of the previous line
df_copy["prev_Exit DateTime"] = df_copy["Exit DateTime"].shift(1)

def fix_inter_row_conflict_strict(row):
    global external_corrections
    entry = row["Entry DateTime"]
    prev_exit = row["prev_Exit DateTime"]

    if pd.notnull(entry) and pd.notnull(prev_exit) and entry < prev_exit:
        idx_prev = row.name - 1
        deleted_cells.append((idx_prev, "Exit DateTime", df_copy.at[idx_prev, "Exit DateTime"]))
        df_copy.at[idx_prev, "Exit DateTime"] = pd.NaT
        external_corrections += 1

    return row

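# The function edits df_copy in place through .at, so the result of apply is not
# assigned back: the returned rows are copies taken before those edits.
df_copy.apply(fix_inter_row_conflict_strict, axis=1)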
df_copy.drop(columns=["prev_Exit DateTime"], inplace=True)

# summary
print(f"Cleaning finished:")
print(f"{internal_corrections} Internal corrections were made within lines.")
print(f"{external_corrections} Corrections of conflicts between lines have been made.")

# Show an example of deleted entries
if deleted_cells:
    print("\n Example of deleted values (up to the first 5):")
    for row_idx, col, val in deleted_cells[:5]:
        print(f" row {row_idx}, column '{col}', Deleted entry: {val}")
else:
    print("No entries were deleted.")

Cleaning finished:
0 Internal corrections were made within lines.
1067 Corrections of conflicts between lines have been made.

 Example of deleted values (up to the first 5):
 row 6, column 'Exit DateTime', Deleted entry: 2018-03-23 08:47:00
 row 11, column 'Exit DateTime', Deleted entry: 2018-07-13 12:45:00
 row 17, column 'Exit DateTime', Deleted entry: 2018-09-07 12:26:00
 row 21, column 'Exit DateTime', Deleted entry: 2018-06-08 10:53:00
 row 42, column 'Exit DateTime', Deleted entry: 2018-06-12 10:43:00
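
The row-to-row checks above compare each row with its physical neighbour in the file. If the rows are not already ordered by room and entry time, neighbouring rows may belong to different rooms or days. A minimal sketch of the same exit-versus-next-entry check restricted to consecutive cases in the same room is shown below. It runs on made-up data, and the sort-and-group step is an assumption about what the check is meant to capture, not the notebook's method.

```python
import pandas as pd

# Toy data: four cases, three of them in room 20006 on the same day.
toy = pd.DataFrame({
    "Actual Operating Room Number": [20006, 20006, 20012, 20006],
    "Entry DateTime": pd.to_datetime([
        "2018-03-23 07:14", "2018-03-23 08:40",
        "2018-05-11 09:36", "2018-03-23 11:00",
    ]),
    "Exit DateTime": pd.to_datetime([
        "2018-03-23 08:47", "2018-03-23 10:30",
        "2018-05-11 10:40", "2018-03-23 12:10",
    ]),
})

# Order cases chronologically within each room, then look at the next entry
# in the same room only.
ordered = toy.sort_values(["Actual Operating Room Number", "Entry DateTime"])
next_entry = ordered.groupby("Actual Operating Room Number")["Entry DateTime"].shift(-1)

# Comparisons against NaT (last case in a room) evaluate to False.
overlap = ordered["Exit DateTime"] > next_entry
flagged = ordered.index[overlap].tolist()
print("Rows whose exit overlaps the next entry in the same room:", flagged)
```

Because the comparison is vectorized, no `global` counters or row-wise `apply` are needed; the number of conflicts is `overlap.sum()`.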


SP: count only surgery time that falls within the block (shift) boundaries:

In [35]:
# Checking how many missing values there are in the shift columns
missing_values = df_copy[['Planned Start Time for Doctor Block', 'Planned End Time for Doctor Block']].isna().sum()
print("Missing values at shift boundaries:\n", missing_values)

Missing values at shift boundaries:
 Planned Start Time for Doctor Block    0
Planned End Time for Doctor Block      0
dtype: int64


In [36]:
# Convert date columns to datetime format
date_columns = ["Actual Surgery Room Entry Date", "Planned Start Date for Doctor Block"]
for col in date_columns:
    df_copy[col] = pd.to_datetime(df_copy[col], errors='coerce')  # Unparseable dates become NaT

In [37]:
print(df_copy['Planned Start Date for Doctor Block'].dtype)

datetime64[ns]


In [38]:
import pandas as pd
import numpy as np

# Convert columns to dates:
datetime_cols = [
    "Actual Surgery Room Entry Date",
    "End of Surgery Date (Exit from OR)",
    "Planned Start Date for Doctor Block",
    "Planned End Date for Doctor Block"
]
for col in datetime_cols:
    df_copy[col] = pd.to_datetime(df_copy[col], errors='coerce')

# Creating new consolidated columns:
df_copy["Planned Start DateTime"] = df_copy["Planned Start Date for Doctor Block"] + pd.to_timedelta(df_copy["Planned Start Time for Doctor Block"].astype(str), errors='coerce')
df_copy["Planned End DateTime"] = df_copy["Planned End Date for Doctor Block"] + pd.to_timedelta(df_copy["Planned End Time for Doctor Block"].astype(str), errors='coerce')

# Actual full surgery duration without shift restrictions:
df_copy["S_p_raw"] = df_copy["Closure DateTime"] - df_copy["Incision DateTime"]

# Convert duration to decimal hours:
df_copy["S_p_raw_hours"] = df_copy["S_p_raw"].dt.total_seconds() / 3600
df_copy["S_p_raw_hours"] = df_copy["S_p_raw_hours"].round(5)

# Actual duration of surgery within the block (shift) boundaries:
df_copy["S_p_limited"] = (
    df_copy[["Closure DateTime", "Planned End DateTime"]].min(axis=1) -
    df_copy[["Incision DateTime", "Planned Start DateTime"]].max(axis=1)
).clip(lower=pd.Timedelta(0))

df_copy.loc[
    df_copy[['Incision DateTime', 'Closure DateTime', 'Planned Start DateTime', 'Planned End DateTime']].isnull().any(axis=1),
    "S_p_limited"
] = pd.NaT

df_copy["S_p_hours_limited"] = df_copy["S_p_limited"].dt.total_seconds() / 3600
df_copy.loc[df_copy["S_p_limited"].isna(), "S_p_hours_limited"] = np.nan
df_copy["S_p_hours_limited"] = df_copy["S_p_hours_limited"].round(5)

df_sp_sum = df_copy.groupby(
    ['Actual Surgery Room Entry Date', 'Actual Operating Room Number'],
    as_index=False
)['S_p_hours_limited'].sum()

df_sp_sum.rename(columns={'S_p_hours_limited': 'total_SP_per_day_room'}, inplace=True)

assert not df_sp_sum.duplicated(subset=["Actual Surgery Room Entry Date", "Actual Operating Room Number"]).any()

df_copy = df_copy.merge(
    df_sp_sum,
    on=['Actual Surgery Room Entry Date', 'Actual Operating Room Number'],
    how='left'
)
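
The `min`/`max`/`clip` pattern used for `S_p_limited` is an interval-overlap calculation. The scalar sketch below shows the same logic on three hypothetical cases; `overlap_hours` is an illustrative helper, not a function from the notebook.

```python
import pandas as pd

def overlap_hours(start, end, window_start, window_end):
    """Hours of [start, end] that fall inside [window_start, window_end]; never negative."""
    clipped = min(end, window_end) - max(start, window_start)
    return max(clipped, pd.Timedelta(0)).total_seconds() / 3600

# Hypothetical block from 07:00 to 13:00.
block_start = pd.Timestamp("2018-01-05 07:00")
block_end = pd.Timestamp("2018-01-05 13:00")

# Surgery fully inside the block: the whole duration counts.
inside = overlap_hours(pd.Timestamp("2018-01-05 07:45"),
                       pd.Timestamp("2018-01-05 09:15"), block_start, block_end)

# Surgery running past the block end: only the part before 13:00 counts.
overrun = overlap_hours(pd.Timestamp("2018-01-05 12:00"),
                        pd.Timestamp("2018-01-05 14:30"), block_start, block_end)

# Surgery entirely after the block: the negative difference is clipped to zero.
outside = overlap_hours(pd.Timestamp("2018-01-05 13:30"),
                        pd.Timestamp("2018-01-05 15:00"), block_start, block_end)

print(inside, overrun, outside)
```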

SU_p: room time before incision and after closure, limited to the block boundaries:

In [39]:
dt_cols = [
    "Planned Start DateTime", "Planned End DateTime",
    "Entry DateTime", "Exit DateTime",
    "Incision DateTime", "Closure DateTime"
]
for col in dt_cols:
    df_copy[col] = pd.to_datetime(df_copy[col], errors='coerce')

df_copy["SU_p_before_limited"] = (
    (df_copy["Incision DateTime"] - df_copy["Entry DateTime"]).clip(lower=pd.Timedelta(0)) +
    (df_copy["Exit DateTime"] - df_copy["Closure DateTime"]).clip(lower=pd.Timedelta(0))
)
df_copy["SU_p_before_hours"] = df_copy["SU_p_before_limited"].dt.total_seconds() / 3600

prep_start = df_copy[["Entry DateTime", "Planned Start DateTime"]].max(axis=1)
prep_end = df_copy[["Incision DateTime", "Planned End DateTime"]].min(axis=1)
prep_duration = (prep_end - prep_start).clip(lower=pd.Timedelta(0))

post_start = df_copy[["Closure DateTime", "Planned Start DateTime"]].max(axis=1)
post_end = df_copy[["Exit DateTime", "Planned End DateTime"]].min(axis=1)
post_duration = (post_end - post_start).clip(lower=pd.Timedelta(0))

df_copy["SU_p_after_limited"] = prep_duration + post_duration
df_copy["SU_p_after_hours"] = df_copy["SU_p_after_limited"].dt.total_seconds() / 3600

df_copy.loc[df_copy[[
    'Entry DateTime', 'Incision DateTime', 'Closure DateTime', 'Exit DateTime',
    'Planned Start DateTime', 'Planned End DateTime'
]].isnull().any(axis=1), "SU_p_after_limited"] = pd.NaT

df_copy["SU_p_after_hours"] = df_copy["SU_p_after_limited"].dt.total_seconds() / 3600
df_copy.loc[df_copy["SU_p_after_limited"].isna(), "SU_p_after_hours"] = np.nan

su_grouped = df_copy.groupby(
    ["Actual Surgery Room Entry Date", "Actual Operating Room Number"],
    as_index=False
)["SU_p_after_hours"].sum().rename(columns={"SU_p_after_hours": "SU_p_limited_hours"})

df_copy = df_copy.merge(
    su_grouped,
    on=["Actual Surgery Room Entry Date", "Actual Operating Room Number"],
    how="left"
)

print(df_copy[[
    "Actual Surgery Room Entry Date", "Actual Operating Room Number",
    "SU_p_before_hours", "SU_p_after_hours", "SU_p_limited_hours"
]].head())

  Actual Surgery Room Entry Date  Actual Operating Room Number  \
0                     2018-01-05                         20013   
1                     2018-03-23                         20006   
2                     2018-07-13                         20002   
3                     2018-05-11                         20012   
4                     2018-05-11                         20012   

   SU_p_before_hours  SU_p_after_hours  SU_p_limited_hours  
0                NaN               NaN            0.000000  
1           0.383333          0.233333            0.666667  
2                NaN               NaN            0.366667  
3           0.366667          0.366667            1.350000  
4                NaN               NaN            1.350000  
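
Row 0 in the output above has `NaN` in both per-case columns but `0.000000` in `SU_p_limited_hours`. This follows from how `groupby(...).sum()` treats missing values: they are skipped, and a group with no valid values sums to 0. The sketch below reproduces the effect on made-up data and shows the `min_count` argument as one possible way to keep such groups missing; whether that is preferable here is a modelling choice, not something the notebook prescribes.

```python
import numpy as np
import pandas as pd

# Toy data: room 20013 has no valid value, room 20006 has one.
toy = pd.DataFrame({
    "room": [20013, 20006, 20006],
    "SU_p_after_hours": [np.nan, 0.25, np.nan],
})

# Default behaviour: NaN is skipped, so an all-NaN group sums to 0.0.
default_sum = toy.groupby("room")["SU_p_after_hours"].sum()

# With min_count=1 a group needs at least one valid value, otherwise the result is NaN.
strict_sum = toy.groupby("room")["SU_p_after_hours"].sum(min_count=1)

print(default_sum)
print(strict_sum)
```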


SH_r: service (block) hours per room per day:

In [40]:
# Calculate block duration in hours
df_copy["Block Duration Hours"] = (
    df_copy["Planned End DateTime"] - df_copy["Planned Start DateTime"]
).dt.total_seconds() / 3600

# Get max block duration per surgeon per day and room
df_max = df_copy.groupby(
    ['Main Surgeon Code', 'Actual Surgery Room Entry Date', 'Actual Operating Room Number'],
    as_index=False
)['Block Duration Hours'].max()

df_max.rename(columns={'Block Duration Hours': 'max_hours'}, inplace=True)

# Sum the max values per day and room
df_sum = df_max.groupby(
    ['Actual Surgery Room Entry Date', 'Actual Operating Room Number'],
    as_index=False
)['max_hours'].sum()

# Rename the result column to SH_r_hours
df_sum.rename(columns={'max_hours': 'SH_r_hours'}, inplace=True)

#Merge SH_r_hours back to df_copy
df_copy = df_copy.merge(df_sum, on=['Actual Surgery Room Entry Date', 'Actual Operating Room Number'], how='left')
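
The two-stage aggregation can be checked on a toy example. Every case row repeats its surgeon's block duration, so taking the maximum per surgeon first avoids counting the same block once per case; summing over surgeons then gives the room's service hours. Column names and values below are hypothetical.

```python
import pandas as pd

# Surgeon A has two cases in the same 6-hour block; surgeon B has a 3.5-hour block
# in the same room; surgeon C works in a different room.
blocks = pd.DataFrame({
    "surgeon": ["A", "A", "B", "C"],
    "date": pd.to_datetime(["2018-05-11"] * 4),
    "room": [20012, 20012, 20012, 20002],
    "block_hours": [6.0, 6.0, 3.5, 8.0],
})

# Stage 1: one block duration per surgeon, day, and room.
per_surgeon = blocks.groupby(
    ["surgeon", "date", "room"], as_index=False
)["block_hours"].max()

# Stage 2: total service hours per day and room.
service_hours = per_surgeon.groupby(["date", "room"])["block_hours"].sum()
print(service_hours)
```

A plain sum over the case rows would give 15.5 hours for room 20012; the two-stage version gives 9.5.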

Utilization rate:

In [41]:
print(df_copy.columns.tolist())

['patient_id', 'Site Code', 'Main Surgeon Code', 'Activity Code', 'Activity Type Code', 'Planned SU Time (Large/Medium/Small)', 'Pre-Surgery Admission Date', 'Height', 'Weight', 'Patient Age (on Surgery Day)', 'Background Diseases/Diagnoses', 'Planned Surgery Date', 'Planned Surgery Time', 'Surgery Admission Date', 'Administrative Admission Time', 'Planned Operating Room Number', 'Pre-Surgery Hospitalization Admission Date', 'Pre-Surgery Hospitalization Admission Time', 'Pre-Surgical Admission Time Before Surgery', 'Surgical Team Codes', 'Anesthesiologist Code', 'Anesthesia Code', 'Type of Anesthesia', 'Cancellation Reason on Surgery Day', 'Actual Operating Room Number', 'Actual Surgery Room Entry Date', 'Actual Surgery Room Entry Time', 'Incision Time', 'Closure Time', 'End of Surgery Date (Exit from OR)', 'End of Surgery Time (Exit from OR)', 'Planned Surgery Duration', 'Recovery Room Entry Date', 'Recovery Room Entry Time', 'Recovery Room Exit Date', 'Recovery Room Exit Time', 'Post

In [42]:
# Clean the data before utilization calculation

# Remove rows where there was a No-Show
num_noshow = df_copy["No Show"].sum()
print(f"Removed {num_noshow} rows due to No-Show status.")
df_copy = df_copy[df_copy["No Show"] != True]

# Remove rows with missing time components needed for utilization calculation
required_cols = [
    "Incision DateTime",
    "Closure DateTime",
    "Planned Start DateTime",
    "Planned End DateTime",
    "Actual Surgery Room Entry Date",
    "Actual Surgery Room Entry Time",
    "End of Surgery Time (Exit from OR)"
]
df_copy = df_copy.dropna(subset=required_cols)

print(f"Remaining rows for utilization calculation: {len(df_copy)}")

if all(col in df_copy.columns for col in ["total_SP_per_day_room", "SU_p_limited_hours", "SH_r_hours"]):
    df_copy["daily_utilization_rate"] = (
        (df_copy["total_SP_per_day_room"] + df_copy["SU_p_limited_hours"]) /
        df_copy["SH_r_hours"]
    ) * 100

    # Invalidate rows with zero or missing service hours before counting:
    # division by zero yields inf, which notna() would count as valid.
    df_copy.loc[df_copy["SH_r_hours"].isna(), "daily_utilization_rate"] = np.nan
    df_copy.loc[df_copy["SH_r_hours"] == 0, "daily_utilization_rate"] = np.nan
    print(f"Calculated utilization for {df_copy['daily_utilization_rate'].notna().sum()} rows.")
else:
    print("Missing columns required to calculate utilization")

Removed 0 rows due to No-Show status.
Remaining rows for utilization calculation: 12351
Calculated utilization for 12351 rows.
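
The edge-case handling can also be folded into the division itself by masking the denominator first. The sketch below uses made-up values; the column names match the notebook, but the approach is an alternative, not the code that produced the output above.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "total_SP_per_day_room": [4.5, 2.0, 1.0],
    "SU_p_limited_hours": [1.35, 0.5, 0.2],
    "SH_r_hours": [7.0, 0.0, np.nan],
})

# Keep only positive service hours; zero or missing values become NaN.
valid_hours = toy["SH_r_hours"].where(toy["SH_r_hours"] > 0)

toy["daily_utilization_rate"] = (
    (toy["total_SP_per_day_room"] + toy["SU_p_limited_hours"]) / valid_hours * 100
)

n_valid = int(toy["daily_utilization_rate"].notna().sum())
print(toy)
print("Rows with a valid utilization rate:", n_valid)
```

Masking before dividing means no `inf` values are ever produced, so the count of valid rows does not depend on the order of the clean-up steps.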


In [43]:
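# Merging df_copy with itself on non-unique keys would be many-to-many and multiply rows,
# so build a lookup with one row per (date, room) first; the first rate per key is kept.
merge_keys = ["Planned Start Date for Doctor Block", "Actual Operating Room Number"]
daily_utr = df_copy[merge_keys + ["daily_utilization_rate"]].drop_duplicates(subset=merge_keys)
df_copy = df_copy.drop(columns="daily_utilization_rate").merge(
    daily_utr, on=merge_keys, how="left", validate="many_to_one")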
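
When per-day, per-room values are merged onto row-level data, the right-hand side needs exactly one row per key; otherwise the merge is many-to-many and multiplies rows. The toy sketch below (hypothetical column names and values) shows the effect and one way to guard against it with the `validate` argument of `DataFrame.merge`.

```python
import pandas as pd

toy = pd.DataFrame({
    "date": pd.to_datetime(["2018-05-11", "2018-05-11", "2018-07-13"]),
    "room": [20012, 20012, 20002],
    "daily_utilization_rate": [83.6, 83.6, 71.0],
})
keys = ["date", "room"]

# Self-merge on repeating keys: the two rows for room 20012 pair up 2 x 2 times,
# and the rate column is split into _x and _y copies.
self_merged = toy.merge(toy[keys + ["daily_utilization_rate"]], on=keys, how="left")

# One row per key on the right-hand side keeps the row count unchanged;
# validate raises a MergeError if the right-hand keys are not unique.
lookup = toy[keys + ["daily_utilization_rate"]].drop_duplicates(subset=keys)
safe = toy.drop(columns="daily_utilization_rate").merge(
    lookup, on=keys, how="left", validate="many_to_one"
)

print(len(toy), len(self_merged), len(safe))
```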

In [44]:
from datetime import timedelta

for col in time_columns:
    if col in df_copy.columns:
        df_copy[col] = df_copy[col].apply(
            lambda x: f"{int(x.total_seconds() // 3600):02}:{int((x.total_seconds() % 3600) // 60):02}:{int(x.total_seconds() % 60):02}"
            if isinstance(x, timedelta) else (x if isinstance(x, str) else "")
        )
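
The formatting lambda above can be hard to read on one line. As an illustration only, the same `HH:MM:SS` formatting is written below as a small named function using `divmod`; `format_hms` is a hypothetical helper name, and the sample durations are made up.

```python
import pandas as pd

def format_hms(value) -> str:
    """Format a timedelta as 'HH:MM:SS'; return an empty string for missing values."""
    if pd.isna(value):
        return ""
    total = int(value.total_seconds())
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02}:{minutes:02}:{seconds:02}"

durations = pd.Series(pd.to_timedelta(["07:05:00", "12:40:30", None]))
formatted = durations.map(format_hms).tolist()
print(formatted)
```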

In [45]:
df_copy.to_excel("2018_data_with_UTR.xlsx", index=False)