**1. Import modules**

In [None]:
import pandas as pd

This notebook finds what has changed in the Hospital Name Tracker spreadsheet https://docs.google.com/spreadsheets/d/1A_WDWfahKhdZyYD0Bp7I_BNuP2CaneuF/edit#gid=1752769994 (created by someone else), so we can see what needs to be updated in our SQL database. The SQL hospital name data was given to me by Himaja, so ask her how she got that data.

The script below compares two Excel files and writes a new Excel file describing the name changes. Step by step:

1. **Reading Excel Files**: It starts by reading two Excel files from `/content`. The first is named "Hospital Name Tracker.xlsx", and the second is named "Resource Name Change Log- SQL Database (1).xlsx". If either file is missing or cannot be read, it prints an error message and stops.

2. **Data Processing**:
   - It left-merges the tracker data with the SQL data on the common column 'RegulatoryID', so every tracker row is kept even when it has no match in the SQL file.
   - It sorts the merged data by the 'Date' column in descending order (from newest to oldest).
   - It removes duplicate rows based on the 'Resource facility name', keeping only the first occurrence (the one with the latest date).

3. **Comparison and Description**:
   - It compares the 'Resource facility name' column with the 'Changed name' column in each row.
   - If they are the same, it adds "No change" to a new column named 'Change Description'.
   - If they are different, it adds a description stating that the name has changed from the old name to the new name.

4. **Saving Output**:
   - It saves the processed data to a new Excel file named 'output.xlsx', including columns: 'Date', 'RegulatoryID', 'Resource facility name', 'Changed name', and 'Change Description'.

5. **Printing Output**: It prints a message confirming that the output has been saved to 'output.xlsx'.

This script essentially compares names in the Excel files and generates a report of changes between them.
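
The merge, sort, de-duplicate, and compare steps above can be sketched on toy data. This is only an illustration: the rows are made up, and which file holds the 'Date', 'Changed name', and 'Resource facility name' columns is an assumption, since the real spreadsheets are not shown here.

```python
import pandas as pd

# Hypothetical stand-ins for the two Excel files (column placement is assumed).
tracker = pd.DataFrame({
    "RegulatoryID": [101, 101, 102],
    "Date": pd.to_datetime(["2023-01-10", "2024-03-05", "2023-07-01"]),
    "Changed name": ["Alpha Medical", "Alpha Health", "Beta Clinic"],
})
sql_names = pd.DataFrame({
    "RegulatoryID": [101, 102],
    "Resource facility name": ["Alpha Hospital", "Beta Clinic"],
})

# Left merge on the shared key, newest rows first, one row per facility name.
merged = pd.merge(tracker, sql_names, on="RegulatoryID", how="left")
merged = merged.sort_values(by="Date", ascending=False)
latest = merged.drop_duplicates(subset="Resource facility name", keep="first")
latest = latest.reset_index(drop=True)

# Same comparison rule as the script: equal names mean "No change".
descriptions = [
    "No change" if old == new else f"Changed from '{old}' to '{new}'"
    for old, new in zip(latest["Resource facility name"], latest["Changed name"])
]
print(descriptions)
```

Facility 101 appears twice in the toy tracker; after sorting by date and dropping duplicates, only its most recent entry remains.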

In [None]:
def compare_names(old_name, new_name):
    if old_name == new_name:
        return "No change"
    else:
        return f"Changed from '{old_name}' to '{new_name}'"

def main():
    # Read the first Excel file
    input_file_path = '/content/Hospital Name Tracker.xlsx'
    output_file_path = 'output.xlsx'

    try:
        df = pd.read_excel(input_file_path)
    except FileNotFoundError:
        print(f"Error: File '{input_file_path}' not found.")
        return
    except Exception as e:
        print(f"Error: {e}")
        return

    # Read the second Excel file for regulatoryID
    regulatory_file_path = '/content/Resource Name Change Log- SQL Database (1).xlsx'
    try:
        regulatory_df = pd.read_excel(regulatory_file_path)
    except FileNotFoundError:
        print(f"Error: File '{regulatory_file_path}' not found.")
        return
    except Exception as e:
        print(f"Error: {e}")
        return

    # Left-merge on the shared 'RegulatoryID' column (keeps every row of the tracker file)
    merged_df = pd.merge(df, regulatory_df, on='RegulatoryID', how='left')

    # Remove duplicates based on 'Resource facility name' keeping the one with the latest date
    merged_df.sort_values(by='Date', ascending=False, inplace=True)
    merged_df.drop_duplicates(subset='Resource facility name', keep='first', inplace=True)

    # Compare names and create a description column
    merged_df['Change Description'] = merged_df.apply(lambda row: compare_names(row['Resource facility name'], row['Changed name']), axis=1)

    # Save the output to an Excel file
    merged_df[['Date', 'RegulatoryID', 'Resource facility name', 'Changed name', 'Change Description']].to_excel(output_file_path, index=False)

    print(f"Output saved to '{output_file_path}'")

if __name__ == "__main__":
    main()


Output saved to 'output.xlsx'
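
One caveat with `compare_names`: after a left merge, rows with no matching 'RegulatoryID' hold NaN in the columns from the second file, and NaN never compares equal to anything, so those rows are reported as changed. Below is a minimal sketch of a more defensive variant. `is_missing` and `compare_names_safe` are hypothetical names that are not part of the script above, and the "Missing name" label and the whitespace stripping are assumptions about what the report should show. It also assumes missing cells arrive as `None` or a float NaN, which is what pandas typically produces for text columns.

```python
import math

def is_missing(value):
    # Treat None and float NaN as missing (assumed representation of empty cells).
    return value is None or (isinstance(value, float) and math.isnan(value))

def compare_names_safe(old_name, new_name):
    # Flag rows where either name is absent instead of calling them a change.
    if is_missing(old_name) or is_missing(new_name):
        return "Missing name"
    # Ignore leading/trailing whitespace when deciding whether the name changed.
    if str(old_name).strip() == str(new_name).strip():
        return "No change"
    return f"Changed from '{old_name}' to '{new_name}'"

print(compare_names_safe("Beta Clinic", "Beta Clinic "))
print(compare_names_safe("Alpha Hospital", float("nan")))
```

It could be passed to `merged_df.apply` in place of `compare_names` without other changes to the script.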
