**1. Import modules**

In [12]:
# importing modules
import pandas as pd
import os
from datetime import date
from datetime import datetime
import time
import openpyxl
from openpyxl.worksheet.datavalidation import DataValidation
import gc
from openpyxl import load_workbook
from bs4 import BeautifulSoup
import re

**2. Filtering snapshot data to information that we need**

This code snippet is for handling data in an Excel file using Python with the pandas library. Here's a breakdown:

1. **Read Excel File**: It reads the Excel file 'Colorado_Status_Snapshot_Report_04_02_2024_00_00.xlsx' into a pandas DataFrame (a table-like data structure in Python), skipping the first row. **REPLACE THE FILE PATH EACH TIME. To download the report, go to EMR --> Report --> Status Snapshot, then set the date to the current date and the time to 0:00.**

2. **Specify Values**: It creates a list of the specific values to keep in the "Status Type" column.

3. **Filter Rows**: It filters rows from the DataFrame based on whether the value in the "Status Type" column is in the list of specific values defined earlier.

4. **Rename a Column**: It renames the "Last Update" column to "Status Type Update" in the filtered DataFrame.

5. **Remove Unwanted Columns**: It selects only three columns - 'Status Type', 'Resource', and 'Status Type Update' - from the filtered DataFrame.

6. **Write to New Excel File**: It saves the filtered DataFrame into a new Excel file named 'filtered_snapshot_df.xlsx' without including the DataFrame index.

Overall, this code reads data from one Excel file, filters it based on specific criteria, modifies the DataFrame structure, and then saves the filtered data into a new Excel file.
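Because the filter silently drops anything that is not in the list, a status type that is misspelled or absent from the day's report simply disappears. The sketch below runs the same filter, rename, and column selection on a small made-up table and adds a check for wanted values that never occur. The toy data and the helper `missing_status_types` are hypothetical and are not part of the original notebook.

```python
import pandas as pd

# Toy stand-in for the snapshot report (the real report has more rows and columns).
snapshot_df = pd.DataFrame({
    "Resource": ["Hospital A", "Hospital A", "Hospital B"],
    "Status Type": ["Total # of NICU Beds", "Some Other Status", "Total # of NICU Beds"],
    "Last Update": ["2024-04-01 08:00", "2024-04-01 08:05", "2024-04-01 09:30"],
    "Comment": ["", "", ""],
})

wanted = ["Total # of NICU Beds", "Total # of PICU Beds"]


def missing_status_types(df, wanted_values):
    """Hypothetical helper: wanted status types that never occur in df."""
    present = set(df["Status Type"].unique())
    return [value for value in wanted_values if value not in present]


# Same three steps as the notebook cell: filter, rename, keep three columns.
filtered = snapshot_df[snapshot_df["Status Type"].isin(wanted)]
filtered = filtered.rename(columns={"Last Update": "Status Type Update"})
filtered = filtered[["Status Type", "Resource", "Status Type Update"]]

print(missing_status_types(snapshot_df, wanted))
print(filtered)
```

An empty list from the helper means every wanted status type appears at least once in the report.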

In [13]:
# Read the first Excel file containing the data
snapshot_df = pd.read_excel('/content/Colorado_Status_Snapshot_Report_04_02_2024_00_00.xlsx', skiprows=1)

# Define the specific values you want to keep in the "Status Type" column
specific_values = ['Confirmed COVID Hospitalized - Age 0-4', 'Confirmed COVID Hospitalized - Age 5-17', 'Confirmed COVID-19 - Total', 'NICU - Available (current)',
                   'Total # of NICU Beds', 'PICU - Available (current)', 'Total # of PICU Beds', 'Ped Med/Surgical - Available (current)', 'Total # of ICU Capable Beds',
                   'Total # of Staffed ICU Beds', 'ICU Bed Availability (current)', 'Baseline Staffed-bed Capacity', 'Total # of Acute Care Beds',
                   'Acute Care Inpatient Beds In-use', 'Med/Surgical Bed Availability (current)']

# Filter rows based on the specific values in the "Status Type" column
filtered_snapshot_df = snapshot_df[snapshot_df['Status Type'].isin(specific_values)]

# Rename a column. Assigning the result (instead of using inplace=True on a
# filtered slice) avoids pandas' SettingWithCopyWarning.
filtered_snapshot_df = filtered_snapshot_df.rename(columns={'Last Update': 'Status Type Update'})

# Keep only the columns needed for the next steps
filtered_snapshot_df = filtered_snapshot_df[['Status Type', 'Resource', 'Status Type Update']]

# Write the filtered data to a new Excel file
filtered_snapshot_df.to_excel('filtered_snapshot_df.xlsx', index=False)

  warn("Workbook contains no default style, apply openpyxl's default")


**3. Transposing the other columns except "Resource"**

This code does several things with an Excel file:

1. It reads data from an Excel file called "filtered_snapshot_df.xlsx" into a DataFrame using Pandas.
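2. It reshapes the DataFrame from long to wide with `pivot`: each "Resource" becomes one row, and each "Status Type" value becomes its own column holding the matching "Status Type Update".
3. It flattens the two-level column names produced by the pivot into single strings (for example, 'Status Type UpdateTotal # of NICU Beds').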
4. It writes the transposed DataFrame into a new Excel file called "transposed_df.xlsx" without including the index column.

So, basically, it's taking data from one Excel file, reorganizing it, and saving the reorganized data into a new Excel file.
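To make the reshaping concrete, here is a minimal sketch on made-up data (the hospital names and timestamps are invented for illustration). It shows where the fused column names such as 'Status Type UpdateTotal # of NICU Beds' come from: `pivot` produces two-level column names, and joining the two levels with an empty string glues them together.

```python
import pandas as pd

# Toy long-format table: one row per (Resource, Status Type) pair.
long_df = pd.DataFrame({
    "Status Type": ["Total # of NICU Beds", "Total # of PICU Beds", "Total # of NICU Beds"],
    "Resource": ["Hospital A", "Hospital A", "Hospital B"],
    "Status Type Update": ["2024-04-01 08:00", "2024-04-01 08:10", "2024-04-01 09:30"],
})

# One row per Resource, one column per Status Type.
wide_df = long_df.pivot(index="Resource", columns="Status Type").reset_index()

# The columns are now pairs such as ('Status Type Update', 'Total # of NICU Beds');
# joining each pair with '' yields a single flat name.
wide_df.columns = wide_df.columns.map("".join)

print(list(wide_df.columns))
print(wide_df)
```

Hospital B has no PICU row in the toy data, so its PICU cell is empty (NaN) after the pivot. Note also that `pivot` raises a `ValueError` if the same Resource reports the same Status Type more than once.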

In [14]:
# Read the Excel file
filtered_snapshot_df = pd.read_excel('/content/filtered_snapshot_df.xlsx')

# Pivot so that each Resource is one row and each Status Type becomes a column
transposed_df = filtered_snapshot_df.pivot(index='Resource', columns='Status Type').reset_index()

# Flatten the two-level column names created by the pivot
transposed_df.columns = transposed_df.columns.map(''.join)

# Write the pivoted data to a new Excel file
transposed_df.to_excel('transposed_df.xlsx', index=False)

**4. Renaming some columns after transpose step and rearranging**

This code does several things with an Excel file:

1. **Read Excel File**: It reads data from an Excel file named 'transposed_df.xlsx' and stores it in a DataFrame called `transposed_df`.

2. **Column Renaming**: It defines a dictionary (`column_rename_mapping`) to map old column names to new ones and renames the columns in the DataFrame accordingly.

3. **Column Reordering**: It specifies the desired order of columns in the DataFrame and rearranges the columns accordingly.

4. **DataFrame Reindexing**: It reindexes the DataFrame with the desired column order.

5. **Write to Excel**: Finally, it writes the manipulated DataFrame to a new Excel file named 'transposed_df_rename.xlsx' without including the index column.

So basically, it's reading an Excel file, renaming and reordering its columns, and then saving the modified data back to a new Excel file.
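One behavior worth knowing here: `rename` ignores mapping keys that do not match any column, and `reindex` creates an empty (all-NaN) column for any requested name that does not exist. Neither raises an error, so a missing status type can pass unnoticed. The sketch below, on made-up data with a shortened mapping, shows one way to list the absent columns before reindexing. It is an illustration only, not part of the original notebook.

```python
import pandas as pd

# Toy pivoted table that only contains the NICU column.
wide_df = pd.DataFrame({
    "Resource": ["Hospital A"],
    "Status Type UpdateTotal # of NICU Beds": ["2024-04-01 08:00"],
})

rename_map = {
    "Status Type UpdateTotal # of NICU Beds": "Total_Num_NICU_Beds_lastupdate",
    "Status Type UpdateTotal # of PICU Beds": "Total_Num_PICU_Beds_lastupdate",
}
desired_order = [
    "Resource",
    "Total_Num_NICU_Beds_lastupdate",
    "Total_Num_PICU_Beds_lastupdate",
]

# The PICU key matches nothing, so rename leaves the table without a PICU column.
renamed = wide_df.rename(columns=rename_map)

# Columns that reindex would have to invent as empty columns.
absent = [name for name in desired_order if name not in renamed.columns]

reordered = renamed.reindex(columns=desired_order)

print(absent)
print(reordered)
```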

In [15]:
# Read the Excel file into a DataFrame
transposed_df = pd.read_excel('/content/transposed_df.xlsx')

# Define a dictionary to map old column names to new ones
column_rename_mapping = {
    'Status Type UpdateAcute Care Inpatient Beds In-use': 'Acute_Care_Inpatient_Beds_In_Use_lastupdate',
    'Status Type UpdateBaseline Staffed-bed Capacity': 'Baseline_Staffed_Bed_Capacity_lastupdate',
    'Status Type UpdateConfirmed COVID Hospitalized - Age 0-4': 'Confirmed_COVID19_Hospitalized_Age0to4_lastupdate',
    'Status Type UpdateConfirmed COVID Hospitalized - Age 5-17': 'Confirmed_COVID19_Hospitalized_Age5to17_lastupdate',
    'Status Type UpdateConfirmed COVID-19 - Total': 'Num_Confirmed_COVID19_lastupdate',
    'Status Type UpdateICU Bed Availability (current)': 'ICU_Bed_Availability_current_lastupdate',
    'Status Type UpdateMed/Surgical Bed Availability (current)': 'MedSurgical_Bed_Availability_current_lastupdate',
    'Status Type UpdateNICU - Available (current)': 'NICU_Bed_Available_current_lastupdate',
    'Status Type UpdatePICU - Available (current)': 'PICU_Bed_Available_current_lastupdate',
    'Status Type UpdatePed Med/Surgical - Available (current)': 'Ped_MedSurgical_Bed_Available_current_lastupdate',
    'Status Type UpdateTotal # of Acute Care Beds': 'Total_Num_Acute_Care_Beds_lastupdate',
    'Status Type UpdateTotal # of ICU Capable Beds': 'Total_Num_ICU_Capable_Beds_lastupdate',
    'Status Type UpdateTotal # of NICU Beds': 'Total_Num_NICU_Beds_lastupdate',
    'Status Type UpdateTotal # of PICU Beds': 'Total_Num_PICU_Beds_lastupdate',
    'Status Type UpdateTotal # of Staffed ICU Beds': 'Total_Num_Staffed_ICU_Beds_lastupdate',
    # Add more mappings as needed
}

# Rename columns
transposed_df = transposed_df.rename(columns=column_rename_mapping)

# Define the desired order of columns
desired_columns_order = [
    'Resource',
    'Confirmed_COVID19_Hospitalized_Age0to4_lastupdate',
    'Confirmed_COVID19_Hospitalized_Age5to17_lastupdate',
    'Num_Confirmed_COVID19_lastupdate',
    'NICU_Bed_Available_current_lastupdate',
    'Total_Num_NICU_Beds_lastupdate',
    'PICU_Bed_Available_current_lastupdate',
    'Total_Num_PICU_Beds_lastupdate',
    'Ped_MedSurgical_Bed_Available_current_lastupdate',
    'Total_Num_ICU_Capable_Beds_lastupdate',
    'Total_Num_Staffed_ICU_Beds_lastupdate',
    'ICU_Bed_Availability_current_lastupdate',
    'Baseline_Staffed_Bed_Capacity_lastupdate',
    'Total_Num_Acute_Care_Beds_lastupdate',
    'Acute_Care_Inpatient_Beds_In_Use_lastupdate',
    'MedSurgical_Bed_Availability_current_lastupdate',
]
# 'Resource' comes first, followed by the renamed *_lastupdate columns

# Reindex the DataFrame with the desired column order
transposed_df_rename = transposed_df.reindex(columns=desired_columns_order)

# transposed_df_rename now has the columns renamed and rearranged as specified above

# Write the renamed and reordered data to a new Excel file
transposed_df_rename.to_excel('transposed_df_rename.xlsx', index=False)


**5. Renaming some of the columns to match the SQL names**

This code does several things with Excel files using Python's pandas library:

1. It reads data from two Excel files into pandas DataFrames (`transposed_df_rename` and `event_df`). **For `event_df`, you will need to download the Event Snapshot data from EMR: go to EMR --> Report --> Event Snapshot --> set the start date to yesterday's date, the end date to today's date, and the report format to xlsx --> select "Colorado COVID-19_Healthcare_Update DAILY" --> click Next --> set the snapshot hour to 10 and the snapshot minutes to 00 --> Generate report --> download.**
2. It defines a dictionary (`column_rename_mapping`) that maps old column names to new ones.
3. It renames columns in the `event_df` DataFrame based on the mappings defined in `column_rename_mapping`.
4. It defines the desired order of columns in the DataFrame (`desired_columns_order`).
5. It rearranges the columns in the `event_df` DataFrame based on the desired order.
6. It merges the two DataFrames (`transposed_df_rename` and `event_rearranged_df`) on the 'Resource' column. `pd.merge` performs an inner join by default, so only resources present in both files are kept.
7. It sorts the merged DataFrame (`stacked_df`) by the 'Resource' column in ascending order.
8. It writes the sorted data to a new Excel file named 'sorted_merged_df.xlsx'.
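Since the merge is an inner join, a resource that appears in only one of the two reports is dropped without any message. The following sketch uses made-up data to show one way to list such resources with the `how='outer'` and `indicator=True` options of `pd.merge`. The hospital names and values are invented, and this check is not part of the original notebook.

```python
import pandas as pd

# Toy stand-ins for the two inputs of the merge step.
updates_df = pd.DataFrame({
    "Resource": ["Hospital A", "Hospital B"],
    "Total_Num_NICU_Beds_lastupdate": ["2024-04-01 08:00", "2024-04-01 09:30"],
})
event_df = pd.DataFrame({
    "Resource": ["Hospital B", "Hospital C"],
    "Total_Num_NICU_Beds": [12, 8],
})

# Default merge (inner join): only Hospital B survives.
inner = pd.merge(updates_df, event_df, on="Resource")

# Outer join with an indicator column marks where each row came from:
# 'left_only', 'right_only', or 'both'.
check = pd.merge(updates_df, event_df, on="Resource", how="outer", indicator=True)
unmatched = check.loc[check["_merge"] != "both", ["Resource", "_merge"]]

print(inner)
print(unmatched)
```

If `unmatched` is empty, both reports cover the same set of resources and the inner join loses nothing.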

In [20]:
# Read the Excel files into DataFrames
transposed_df_rename = pd.read_excel('/content/transposed_df_rename.xlsx')
event_df = pd.read_excel('/content/report (6).xlsx', skiprows=1)

# Define a dictionary to map old column names to new ones
column_rename_mapping = {
    'Confirmed COVID Hospitalized - Age 0-4': 'Confirmed_COVID19_Hospitalized_Age0to4',
    'Confirmed COVID Hospitalized - Age 5-17': 'Confirmed_COVID19_Hospitalized_Age5to17',
    'Confirmed COVID-19 - Total': 'Num_Confirmed_COVID19',
    'NICU - Available (current)': 'NICU_Bed_Available_current',
    'Total # of NICU Beds': 'Total_Num_NICU_Beds',
    'PICU - Available (current)': 'PICU_Bed_Available_current',
    'Total # of PICU Beds': 'Total_Num_PICU_Beds',
    'Ped Med/Surgical - Available (current)': 'Ped_MedSurgical_Bed_Available_current',
    'Total # of ICU Capable Beds': 'Total_Num_ICU_Capable_Beds',
    'Total # of Staffed ICU Beds': 'Total_Num_Staffed_ICU_Beds',
    'ICU Bed Availability (current)': 'ICU_Bed_Availability_current',
    'Baseline Staffed-bed Capacity': 'Baseline_Staffed_Bed_Capacity',
    'Total # of Acute Care Beds': 'Total_Num_Acute_Care_Beds',
    'Acute Care Inpatient Beds In-use': 'Acute_Care_Inpatient_Beds_In_Use',
    'Med/Surgical Bed Availability (current)': 'MedSurgical_Bed_Availability_current',
    # Add more mappings as needed
}

# Rename columns
event_rename_df = event_df.rename(columns=column_rename_mapping)

# Define the desired order of columns
desired_columns_order = ['Resource',
    'Confirmed_COVID19_Hospitalized_Age0to4',
    'Confirmed_COVID19_Hospitalized_Age5to17',
    'Num_Confirmed_COVID19',
    'NICU_Bed_Available_current',
    'Total_Num_NICU_Beds',
    'PICU_Bed_Available_current',
    'Total_Num_PICU_Beds',
    'Ped_MedSurgical_Bed_Available_current',
    'Total_Num_ICU_Capable_Beds',
    'Total_Num_Staffed_ICU_Beds',
    'ICU_Bed_Availability_current',
    'Baseline_Staffed_Bed_Capacity',
    'Total_Num_Acute_Care_Beds',
    'Acute_Care_Inpatient_Beds_In_Use',
    'MedSurgical_Bed_Availability_current',
    'License #  (State will populate)',
    'Last Update',
]

# Rearrange the columns into the desired order
event_rearranged_df = event_rename_df[desired_columns_order]

# Merge the DataFrames on the 'Resource' column (inner join by default)
stacked_df = pd.merge(transposed_df_rename, event_rearranged_df, on='Resource')

# Sort stacked_df by ascending order based on the 'Resource' column
stacked_df_sorted = stacked_df.sort_values(by='Resource', ascending=True)

# Write the sorted data to a new Excel file
stacked_df_sorted.to_excel('sorted_merged_df.xlsx', index=False)


  warn("Workbook contains no default style, apply openpyxl's default")
