Skip to content

Using Python open-source libraries, organize and modify information from excel and xml files for better readability.

Notifications You must be signed in to change notification settings

Dipto9999/Excel_Spreadsheet_Organizer

Repository files navigation

Excel Spreadsheet Organizer

Contents

Overview

This was a short task completed for a confidential corporate organizer where information from an Excel Spreadsheet was modified and combined with information extracted from an XML file to create a table in another Excel Spreadsheet. This was done through the use of open-source libraries in Python and run in Jupyter Notebook. This allowed me to further explore the Pandas Library and become more familiar working with DataFrames.

Note : All sensitive information has been modified and replaced.

Solution Details

A few custom functions were built to shorten strings with a common prefixes, as shown here :

# This function shortens an ID from the Excel file.
def shorten_id(original_column) :
    length_to_cut = len('Delete Account ')
    length_total = len(original_column)
    truncated_column = original_column[length_to_cut: length_total]

    return truncated_column
# This function shortens the role information from the XML file.
def shorten_role(original_column) :
    length_to_cut = len('Role=')
    length_total = len(original_column)
    truncated_column = original_column[length_to_cut: length_total]

    return truncated_column

The XML file was iterated through using ElementTree Library built-in functions, as shown here :

# Iterate through XML file to filter and organize relevant information.
for account in root.iter('account') :
    id = account.get('id', default = None)

    for i in range(id_series.size) :
        # If ID is found in the ID series, execute code.
        if (id == id_series[i]) :
            # Add this unordered ID to the list.
            xml_ids.append(id)

            # Assign role column to have the role information.
            for attribute in account.iter('attribute') :
                if attribute.get('name') == 'Role' :
                    attributeValueRef_id = str()

                    for attributeValueRef in attribute.iter('attributeValueRef') :
                        # Remove the 'Role=' using the custom function before acquiring the information.
                        attributeValueRef_id += shorten_role(attributeValueRef.get('id')) + ' \n'
                    # Append each string to the list if the ID is found in the ID series.
                    role_column.append(attributeValueRef_id)

The ID and role columns were matched by iterating through the respective DataFrame and List.

# Organize role column to match the ordered ID series.
for organized_index in range(id_series.size) :
    # Add an empty string to end of list to account for IDs with blank roles.
    organized_role_column.append('')
    for unorganized_index in range(len(xml_ids)) :
        if (id_series[organized_index] == xml_ids[unorganized_index]) :
            # Replace empty string for IDs with roles in the XML file.
            organized_role_column.insert(organized_index, (role_column[unorganized_index]))

There is also a Python Script written with additional comments to further understand the procedure of developing this organizer.

Note : This Exploration Took a Weekend to Complete, Spanning Approximately 10 Hours Altogether.

About

Using Python open-source libraries, organize and modify information from excel and xml files for better readability.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published