# Date Adjustment Document for 2024 EDA

In this notebook, we implement the date modifications as specified in the EDA_2024 code file. Our aim is to update and standardize all date fields across the dataset to match the revised schedule, ensuring chronological consistency and preparing the data for downstream analysis.

**What We Did and Why**  
- **Identified Date Columns**: We reviewed every column containing date information (e.g., planned surgery date, actual OR entry/exit date, recovery and discharge dates) to locate fields requiring adjustment.  
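- **Applied Uniform Year Shift**: We applied one rule to every relevant column: replace the year component with 2024 while keeping month, day, and time unchanged (e.g., 2023-07-14 becomes 2024-07-14). Each result is written to a `<column> NEW` column placed directly after the original.  
- **Parsed Timestamps**: Before shifting, we converted each column, including the combined date-time fields, into proper `datetime` objects with `pd.to_datetime(..., errors='coerce')` to maintain compatibility with prior analyses. Values that cannot be parsed become `NaT` and stay `NaT` in the new column.  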
- **Validated Changes**: We compared summary statistics (min, max, counts by year and month) before and after transformation to confirm that every date was correctly adjusted.  
- **Ensured Downstream Compatibility**: Aligning the dates with the assumptions used in our earlier EDA (see EDA_2024 notebook) lets the feature engineering, conflict detection, and profiling steps operate on the intended temporal range.

By following this process, we ensure that the 2024 dataset reflects the accurate time frame for all surgical events, enabling reliable chronological analyses and preserving the integrity of subsequent modeling efforts.  
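
The steps above can be sketched as a single pass over a list of date columns. This is a minimal sketch rather than the notebook's actual implementation: the helper name `shift_year`, its parameters, and the small in-memory frame are assumptions made for illustration, and the sample values are invented.

```python
import pandas as pd

def shift_year(df, columns, year=2024, suffix=" NEW"):
    """Hypothetical helper: add '<col> NEW' after each column, with the year replaced."""
    out = df.copy()
    for col in columns:
        # Parse first; unparseable values become NaT.
        parsed = pd.to_datetime(out[col], errors="coerce")
        # Replace only the year. Timestamp.replace raises ValueError for Feb 29
        # when the target year is not a leap year; 2024 is a leap year.
        shifted = parsed.apply(
            lambda d: pd.NaT if pd.isnull(d) else d.replace(year=year)
        )
        out[col] = parsed
        # Place the new column immediately after the original one.
        out.insert(out.columns.get_loc(col) + 1, col + suffix, shifted)
    return out

# Invented sample rows, including a missing value and an unparseable string.
demo = pd.DataFrame({
    "Planned Surgery Date": ["2023-07-14", "2023-11-10", None],
    "Entry DateTime": ["2023-07-14 10:35:00", "not a date", "2023-12-08 07:14:00"],
})
result = shift_year(demo, ["Planned Surgery Date", "Entry DateTime"])
print(result)
```

Working on one in-memory frame like this would avoid writing and re-reading an intermediate Excel file for every column, at the cost of not keeping the per-step files the cells below produce.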

In [None]:
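import pandas as pd

# Load the source workbook.
df = pd.read_excel("2024_data_with_UTR_F.xlsx")

date_col = 'Pre-Surgery Admission Date'
new_col = 'Pre-Surgery Admission Date NEW'

# Parse the column as datetimes; values that cannot be parsed become NaT.
df[date_col] = pd.to_datetime(df[date_col], errors='coerce')

def replace_year(date):
    """Return the date with its year set to 2024; missing values stay NaT."""
    if pd.isnull(date):
        return pd.NaT
    return date.replace(year=2024)

df[new_col] = df[date_col].apply(replace_year)

# Move the new column so it sits immediately after the original column.
cols = list(df.columns)
insert_at = cols.index(date_col) + 1
cols.remove(new_col)
cols.insert(insert_at, new_col)
df = df[cols]

# Save the intermediate result; the next cell reads this file.
df.to_excel("updated_dates_2024.xlsx", index=False)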

In [None]:
print(df[['Pre-Surgery Admission Date', 'Pre-Surgery Admission Date NEW']].head(10))

  Pre-Surgery Admission Date Pre-Surgery Admission Date NEW
0                 2023-07-06                     2024-07-06
1                 2023-07-09                     2024-07-09
2                 2023-10-29                     2024-10-29
3                 2023-09-20                     2024-09-20
4                 2023-11-02                     2024-11-02
5                 2023-07-10                     2024-07-10
6                 2023-09-18                     2024-09-18
7                 2023-10-30                     2024-10-30
8                 2023-11-02                     2024-11-02
9                 2023-11-26                     2024-11-26


In [None]:
df = pd.read_excel("updated_dates_2024.xlsx")

date_col = 'Planned Surgery Date'
new_col = 'Planned Surgery Date NEW'

df[date_col] = pd.to_datetime(df[date_col], errors='coerce')

def replace_year(date):
    if pd.isnull(date):
        return pd.NaT
    return date.replace(year=2024)

df[new_col] = df[date_col].apply(replace_year)

cols = list(df.columns)
insert_at = cols.index(date_col) + 1
cols.remove(new_col)
cols.insert(insert_at, new_col)
df = df[cols]

df.to_excel("updated_dates_2024_1.xlsx", index=False)

In [None]:
print(df[['Planned Surgery Date', 'Planned Surgery Date NEW']].head(10))

  Planned Surgery Date Planned Surgery Date NEW
0           2023-07-14               2024-07-14
1           2023-07-14               2024-07-14
2           2023-11-10               2024-11-10
3           2023-09-22               2024-09-22
4           2023-11-03               2024-11-03
5           2023-07-14               2024-07-14
6           2023-09-22               2024-09-22
7           2023-11-10               2024-11-10
8           2023-11-10               2024-11-10
9           2023-12-08               2024-12-08


In [None]:
df = pd.read_excel("updated_dates_2024_1.xlsx")

date_col = 'Surgery Admission Date'
new_col = 'Surgery Admission Date NEW'

df[date_col] = pd.to_datetime(df[date_col], errors='coerce')

def replace_year(date):
    if pd.isnull(date):
        return pd.NaT
    return date.replace(year=2024)

df[new_col] = df[date_col].apply(replace_year)

cols = list(df.columns)
insert_at = cols.index(date_col) + 1
cols.remove(new_col)
cols.insert(insert_at, new_col)
df = df[cols]

df.to_excel("updated_dates_2024_2.xlsx", index=False)

In [None]:
print(df[['Surgery Admission Date', 'Surgery Admission Date NEW']].head(10))

  Surgery Admission Date Surgery Admission Date NEW
0             2023-07-14                 2024-07-14
1             2023-07-14                 2024-07-14
2             2023-11-10                 2024-11-10
3             2023-09-22                 2024-09-22
4             2023-11-03                 2024-11-03
5             2023-07-14                 2024-07-14
6             2023-09-22                 2024-09-22
7             2023-11-10                 2024-11-10
8             2023-11-10                 2024-11-10
9             2023-12-08                 2024-12-08


In [None]:
df = pd.read_excel("updated_dates_2024_2.xlsx")

date_col = 'Actual Surgery Room Entry Date'
new_col = 'Actual Surgery Room Entry Date NEW'

df[date_col] = pd.to_datetime(df[date_col], errors='coerce')

def replace_year(date):
    if pd.isnull(date):
        return pd.NaT
    return date.replace(year=2024)

df[new_col] = df[date_col].apply(replace_year)

cols = list(df.columns)
insert_at = cols.index(date_col) + 1
cols.remove(new_col)
cols.insert(insert_at, new_col)
df = df[cols]

df.to_excel("updated_dates_2024_3.xlsx", index=False)

In [None]:
print(df[['Actual Surgery Room Entry Date', 'Actual Surgery Room Entry Date NEW']].head(10))

  Actual Surgery Room Entry Date Actual Surgery Room Entry Date NEW
0                     2023-07-14                         2024-07-14
1                     2023-07-14                         2024-07-14
2                     2023-11-10                         2024-11-10
3                     2023-09-22                         2024-09-22
4                     2023-11-03                         2024-11-03
5                     2023-07-14                         2024-07-14
6                     2023-09-22                         2024-09-22
7                     2023-11-10                         2024-11-10
8                     2023-11-10                         2024-11-10
9                     2023-12-08                         2024-12-08


In [None]:
df = pd.read_excel("updated_dates_2024_3.xlsx")

date_col = 'End of Surgery Date (Exit from OR)'
new_col = 'End of Surgery Date (Exit from OR) NEW'

df[date_col] = pd.to_datetime(df[date_col], errors='coerce')

def replace_year(date):
    if pd.isnull(date):
        return pd.NaT
    return date.replace(year=2024)

df[new_col] = df[date_col].apply(replace_year)

cols = list(df.columns)
insert_at = cols.index(date_col) + 1
cols.remove(new_col)
cols.insert(insert_at, new_col)
df = df[cols]

df.to_excel("updated_dates_2024_4.xlsx", index=False)

In [None]:
print(df[['End of Surgery Date (Exit from OR)', 'End of Surgery Date (Exit from OR) NEW']].head(10))

  End of Surgery Date (Exit from OR) End of Surgery Date (Exit from OR) NEW
0                         2023-07-14                             2024-07-14
1                         2023-07-14                             2024-07-14
2                         2023-11-10                             2024-11-10
3                         2023-09-22                             2024-09-22
4                         2023-11-03                             2024-11-03
5                         2023-07-14                             2024-07-14
6                         2023-09-22                             2024-09-22
7                         2023-11-10                             2024-11-10
8                         2023-11-10                             2024-11-10
9                         2023-12-08                             2024-12-08


In [None]:
df = pd.read_excel("updated_dates_2024_4.xlsx")

date_col = 'Recovery Room Entry Date'
new_col = 'Recovery Room Entry Date NEW'

df[date_col] = pd.to_datetime(df[date_col], errors='coerce')

def replace_year(date):
    if pd.isnull(date):
        return pd.NaT
    return date.replace(year=2024)

df[new_col] = df[date_col].apply(replace_year)

cols = list(df.columns)
insert_at = cols.index(date_col) + 1
cols.remove(new_col)
cols.insert(insert_at, new_col)
df = df[cols]

df.to_excel("updated_dates_2024_5.xlsx", index=False)

In [None]:
print(df[['Recovery Room Entry Date', 'Recovery Room Entry Date NEW']].head(10))

  Recovery Room Entry Date Recovery Room Entry Date NEW
0               2023-07-14                   2024-07-14
1               2023-07-14                   2024-07-14
2               2023-11-10                   2024-11-10
3               2023-09-22                   2024-09-22
4               2023-11-03                   2024-11-03
5               2023-07-14                   2024-07-14
6               2023-09-22                   2024-09-22
7               2023-11-10                   2024-11-10
8               2023-11-10                   2024-11-10
9               2023-12-08                   2024-12-08


In [None]:
df = pd.read_excel("updated_dates_2024_5.xlsx")

date_col = 'Recovery Room Exit Date'
new_col = 'Recovery Room Exit Date NEW'

df[date_col] = pd.to_datetime(df[date_col], errors='coerce')

def replace_year(date):
    if pd.isnull(date):
        return pd.NaT
    return date.replace(year=2024)

df[new_col] = df[date_col].apply(replace_year)

cols = list(df.columns)
insert_at = cols.index(date_col) + 1
cols.remove(new_col)
cols.insert(insert_at, new_col)
df = df[cols]

df.to_excel("updated_dates_2024_6.xlsx", index=False)

In [None]:
print(df[['Recovery Room Exit Date', 'Recovery Room Exit Date NEW']].head(10))

  Recovery Room Exit Date Recovery Room Exit Date NEW
0              2023-07-14                  2024-07-14
1              2023-07-14                  2024-07-14
2              2023-11-10                  2024-11-10
3              2023-09-22                  2024-09-22
4              2023-11-03                  2024-11-03
5              2023-07-14                  2024-07-14
6              2023-09-22                  2024-09-22
7              2023-11-10                  2024-11-10
8              2023-11-10                  2024-11-10
9              2023-12-08                  2024-12-08


In [None]:
df = pd.read_excel("updated_dates_2024_6.xlsx")

date_col = 'Planned Start Date for Doctor Block'
new_col = 'Planned Start Date for Doctor Block NEW'

df[date_col] = pd.to_datetime(df[date_col], errors='coerce')

def replace_year(date):
    if pd.isnull(date):
        return pd.NaT
    return date.replace(year=2024)

df[new_col] = df[date_col].apply(replace_year)

cols = list(df.columns)
insert_at = cols.index(date_col) + 1
cols.remove(new_col)
cols.insert(insert_at, new_col)
df = df[cols]

df.to_excel("updated_dates_2024_7.xlsx", index=False)

In [None]:
print(df[['Planned Start Date for Doctor Block', 'Planned Start Date for Doctor Block NEW']].head(10))

  Planned Start Date for Doctor Block Planned Start Date for Doctor Block NEW
0                          2023-07-14                              2024-07-14
1                          2023-07-14                              2024-07-14
2                          2023-11-10                              2024-11-10
3                          2023-09-22                              2024-09-22
4                          2023-11-03                              2024-11-03
5                          2023-07-14                              2024-07-14
6                          2023-09-22                              2024-09-22
7                          2023-11-10                              2024-11-10
8                          2023-11-10                              2024-11-10
9                          2023-12-08                              2024-12-08


In [None]:
df = pd.read_excel("updated_dates_2024_7.xlsx")

date_col = 'Entry DateTime'
new_col = 'Entry DateTime NEW'

df[date_col] = pd.to_datetime(df[date_col], errors='coerce')

def change_to_2024(dt):
    if pd.isnull(dt):
        return pd.NaT
    return dt.replace(year=2024)

df[new_col] = df[date_col].apply(change_to_2024)

cols = list(df.columns)
insert_at = cols.index(date_col) + 1
cols.remove(new_col)
cols.insert(insert_at, new_col)
df = df[cols]

df.to_excel("updated_dates_2024_8.xlsx", index=False)

In [None]:
print(df[['Entry DateTime', 'Entry DateTime NEW']].head(10))

       Entry DateTime  Entry DateTime NEW
0 2023-07-14 10:35:00 2024-07-14 10:35:00
1 2023-07-14 12:41:00 2024-07-14 12:41:00
2 2023-11-10 07:20:00 2024-11-10 07:20:00
3 2023-09-22 07:14:00 2024-09-22 07:14:00
4 2023-11-03 09:06:00 2024-11-03 09:06:00
5 2023-07-14 08:44:00 2024-07-14 08:44:00
6 2023-09-22 08:53:00 2024-09-22 08:53:00
7 2023-11-10 09:27:00 2024-11-10 09:27:00
8 2023-11-10 12:15:00 2024-11-10 12:15:00
9 2023-12-08 07:14:00 2024-12-08 07:14:00


In [None]:
df = pd.read_excel("updated_dates_2024_8.xlsx")

date_col = 'Incision DateTime'
new_col = 'Incision DateTime NEW'

df[date_col] = pd.to_datetime(df[date_col], errors='coerce')

def change_to_2024(dt):
    if pd.isnull(dt):
        return pd.NaT
    return dt.replace(year=2024)

df[new_col] = df[date_col].apply(change_to_2024)

cols = list(df.columns)
insert_at = cols.index(date_col) + 1
cols.remove(new_col)
cols.insert(insert_at, new_col)
df = df[cols]

df.to_excel("updated_dates_2024_9.xlsx", index=False)

In [None]:
print(df[['Incision DateTime', 'Incision DateTime NEW']].head(10))

    Incision DateTime Incision DateTime NEW
0 2023-07-14 11:05:00   2024-07-14 11:05:00
1 2023-07-14 13:10:00   2024-07-14 13:10:00
2 2023-11-10 07:59:00   2024-11-10 07:59:00
3 2023-09-22 07:41:00   2024-09-22 07:41:00
4 2023-11-03 09:25:00   2024-11-03 09:25:00
5 2023-07-14 09:10:00   2024-07-14 09:10:00
6 2023-09-22 09:21:00   2024-09-22 09:21:00
7 2023-11-10 09:53:00   2024-11-10 09:53:00
8 2023-11-10 12:28:00   2024-11-10 12:28:00
9 2023-12-08 07:30:00   2024-12-08 07:30:00


In [None]:
df = pd.read_excel("updated_dates_2024_9.xlsx")

date_col = 'Closure DateTime'
new_col = 'Closure DateTime NEW'

df[date_col] = pd.to_datetime(df[date_col], errors='coerce')

def change_to_2024(dt):
    if pd.isnull(dt):
        return pd.NaT
    return dt.replace(year=2024)

df[new_col] = df[date_col].apply(change_to_2024)

cols = list(df.columns)
insert_at = cols.index(date_col) + 1
cols.remove(new_col)
cols.insert(insert_at, new_col)
df = df[cols]

df.to_excel("updated_dates_2024_10.xlsx", index=False)

In [None]:
print(df[['Closure DateTime', 'Closure DateTime NEW']].head(10))

     Closure DateTime Closure DateTime NEW
0 2023-07-14 12:08:00  2024-07-14 12:08:00
1 2023-07-14 13:57:00  2024-07-14 13:57:00
2 2023-11-10 08:59:00  2024-11-10 08:59:00
3 2023-09-22 08:22:00  2024-09-22 08:22:00
4 2023-11-03 10:46:00  2024-11-03 10:46:00
5 2023-07-14 09:42:00  2024-07-14 09:42:00
6 2023-09-22 09:37:00  2024-09-22 09:37:00
7 2023-11-10 11:33:00  2024-11-10 11:33:00
8 2023-11-10 12:50:00  2024-11-10 12:50:00
9 2023-12-08 08:22:00  2024-12-08 08:22:00


In [None]:
df = pd.read_excel("updated_dates_2024_10.xlsx")

date_col = 'Exit DateTime'
new_col = 'Exit DateTime NEW'

df[date_col] = pd.to_datetime(df[date_col], errors='coerce')

def change_to_2024(dt):
    if pd.isnull(dt):
        return pd.NaT
    return dt.replace(year=2024)

df[new_col] = df[date_col].apply(change_to_2024)

cols = list(df.columns)
insert_at = cols.index(date_col) + 1
cols.remove(new_col)
cols.insert(insert_at, new_col)
df = df[cols]

df.to_excel("updated_dates_2024_11.xlsx", index=False)

In [None]:
print(df[['Exit DateTime', 'Exit DateTime NEW']].head(10))

        Exit DateTime   Exit DateTime NEW
0 2023-07-14 12:20:00 2024-07-14 12:20:00
1 2023-07-14 14:15:00 2024-07-14 14:15:00
2                 NaT                 NaT
3 2023-09-22 08:35:00 2024-09-22 08:35:00
4                 NaT                 NaT
5 2023-07-14 09:49:00 2024-07-14 09:49:00
6 2023-09-22 09:52:00 2024-09-22 09:52:00
7 2023-11-10 11:47:00 2024-11-10 11:47:00
8 2023-11-10 12:57:00 2024-11-10 12:57:00
9 2023-12-08 08:31:00 2024-12-08 08:31:00
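
The `NaT` rows in this output may be exit times that were never recorded, or values that `errors='coerce'` could not parse. A rough way to tell the two apart is to compare missingness before and after parsing. The sketch below assumes the raw, unparsed column is still available and uses invented sample values, not rows from the dataset.

```python
import pandas as pd

# Invented raw values: one genuinely missing, one present but unparseable.
raw = pd.Series(
    ["2023-07-14 12:20:00", None, "not recorded", "2023-09-22 08:35:00"],
    name="Exit DateTime",
)
parsed = pd.to_datetime(raw, errors="coerce")

missing_in_source = raw.isna()                        # empty before parsing
lost_in_parsing = parsed.isna() & ~missing_in_source  # present but not parseable

print("missing in source:", int(missing_in_source.sum()))
print("lost in parsing:  ", int(lost_in_parsing.sum()))
```

A nonzero "lost in parsing" count would suggest inspecting those raw strings before relying on the shifted column.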


In [None]:
df = pd.read_excel("updated_dates_2024_11.xlsx")

date_col = 'Planned Start DateTime'
new_col = 'Planned Start DateTime NEW'

df[date_col] = pd.to_datetime(df[date_col], errors='coerce')

def change_to_2024(dt):
    if pd.isnull(dt):
        return pd.NaT
    return dt.replace(year=2024)

df[new_col] = df[date_col].apply(change_to_2024)

cols = list(df.columns)
insert_at = cols.index(date_col) + 1
cols.remove(new_col)
cols.insert(insert_at, new_col)
df = df[cols]

df.to_excel("updated_dates_2024_12.xlsx", index=False)

In [None]:
print(df[['Planned Start DateTime', 'Planned Start DateTime NEW']].head(10))

  Planned Start DateTime Planned Start DateTime NEW
0    2023-07-14 07:00:00        2024-07-14 07:00:00
1    2023-07-14 07:00:00        2024-07-14 07:00:00
2    2023-11-10 07:00:00        2024-11-10 07:00:00
3    2023-09-22 07:00:00        2024-09-22 07:00:00
4    2023-11-03 07:00:00        2024-11-03 07:00:00
5    2023-07-14 07:00:00        2024-07-14 07:00:00
6    2023-09-22 07:00:00        2024-09-22 07:00:00
7    2023-11-10 07:00:00        2024-11-10 07:00:00
8    2023-11-10 07:00:00        2024-11-10 07:00:00
9    2023-12-08 07:00:00        2024-12-08 07:00:00


In [None]:
df = pd.read_excel("updated_dates_2024_12.xlsx")

date_col = 'Planned End DateTime'
new_col = 'Planned End DateTime NEW'

df[date_col] = pd.to_datetime(df[date_col], errors='coerce')

def change_to_2024(dt):
    if pd.isnull(dt):
        return pd.NaT
    return dt.replace(year=2024)

df[new_col] = df[date_col].apply(change_to_2024)

cols = list(df.columns)
insert_at = cols.index(date_col) + 1
cols.remove(new_col)
cols.insert(insert_at, new_col)
df = df[cols]

df.to_excel("updated_dates_2024_13.xlsx", index=False)

In [None]:
print(df[['Planned End DateTime', 'Planned End DateTime NEW']].head(10))

  Planned End DateTime Planned End DateTime NEW
0  2023-07-14 12:56:00      2024-07-14 12:56:00
1  2023-07-14 12:56:00      2024-07-14 12:56:00
2  2023-11-10 11:06:00      2024-11-10 11:06:00
3  2023-09-22 09:36:00      2024-09-22 09:36:00
4  2023-11-03 09:41:00      2024-11-03 09:41:00
5  2023-07-14 12:56:00      2024-07-14 12:56:00
6  2023-09-22 09:36:00      2024-09-22 09:36:00
7  2023-11-10 11:06:00      2024-11-10 11:06:00
8  2023-11-10 11:06:00      2024-11-10 11:06:00
9  2023-12-08 12:10:00      2024-12-08 12:10:00


In [None]:
df = pd.read_excel("updated_dates_2024_13.xlsx")

date_col = 'Planned End Date for Doctor Block'
new_col = 'Planned End Date for Doctor Block NEW'

df[date_col] = pd.to_datetime(df[date_col], errors='coerce')

def replace_year(date):
    if pd.isnull(date):
        return pd.NaT
    return date.replace(year=2024)

df[new_col] = df[date_col].apply(replace_year)

cols = list(df.columns)
insert_at = cols.index(date_col) + 1
cols.remove(new_col)
cols.insert(insert_at, new_col)
df = df[cols]

df.to_excel("updated_dates_2024_14.xlsx", index=False)

In [None]:
print(df[['Planned End Date for Doctor Block', 'Planned End Date for Doctor Block NEW']].head(10))

  Planned End Date for Doctor Block Planned End Date for Doctor Block NEW
0                        2023-07-14                            2024-07-14
1                        2023-07-14                            2024-07-14
2                        2023-11-10                            2024-11-10
3                        2023-09-22                            2024-09-22
4                        2023-11-03                            2024-11-03
5                        2023-07-14                            2024-07-14
6                        2023-09-22                            2024-09-22
7                        2023-11-10                            2024-11-10
8                        2023-11-10                            2024-11-10
9                        2023-12-08                            2024-12-08
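
Printing the first ten rows only spot-checks each column. A whole-column check along the lines of the validation step described at the top could look like the sketch below. The function name `check_year_shift` and the sample series are assumptions for illustration. It verifies three things: the missing-value pattern is unchanged, every shifted value falls in the target year, and month, day, and time are untouched.

```python
import pandas as pd

def check_year_shift(original, shifted, year=2024):
    """Hypothetical check comparing an original column with its shifted copy."""
    original = pd.to_datetime(original, errors="coerce")
    shifted = pd.to_datetime(shifted, errors="coerce")
    valid = original.notna()
    fmt = "%m-%d %H:%M:%S"  # everything except the year
    return {
        "same_missing": bool((original.isna() == shifted.isna()).all()),
        "all_target_year": bool((shifted[valid].dt.year == year).all()),
        "same_month_day_time": bool(
            (original[valid].dt.strftime(fmt) == shifted[valid].dt.strftime(fmt)).all()
        ),
    }

# Invented sample values, including one missing entry.
before = pd.Series(pd.to_datetime(["2023-07-14 12:20:00", None, "2023-12-08 08:31:00"]))
after = before.apply(lambda d: pd.NaT if pd.isnull(d) else d.replace(year=2024))

report = check_year_shift(before, after)
print(report)
```

In the notebook's setting, this would be called once per pair, for example with `df['Exit DateTime']` and `df['Exit DateTime NEW']`.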
