Excel Spreadsheet Organizer

Overview

This was a short task completed for a confidential corporate organizer where information from an Excel Spreadsheet was modified and combined with information extracted from an XML file to create a table in another Excel Spreadsheet. This was done through the use of open-source libraries in Python and run in Jupyter Notebook. This allowed me to further explore the Pandas Library and become more familiar working with DataFrames.

Note : All sensitive information has been modified and replaced.

Solution Details

A few custom functions were built to shorten strings with a common prefixes, as shown here :

# This function shortens an ID from the Excel file.
def shorten_id(original_column) :
    length_to_cut = len('Delete Account ')
    length_total = len(original_column)
    truncated_column = original_column[length_to_cut: length_total]

    return truncated_column

# This function shortens the role information from the XML file.
def shorten_role(original_column) :
    length_to_cut = len('Role=')
    length_total = len(original_column)
    truncated_column = original_column[length_to_cut: length_total]

    return truncated_column

The XML file was iterated through using ElementTree Library built-in functions, as shown here :

# Iterate through XML file to filter and organize relevant information.
for account in root.iter('account') :
    id = account.get('id', default = None)

    for i in range(id_series.size) :
        # If ID is found in the ID series, execute code.
        if (id == id_series[i]) :
            # Add this unordered ID to the list.
            xml_ids.append(id)

            # Assign role column to have the role information.
            for attribute in account.iter('attribute') :
                if attribute.get('name') == 'Role' :
                    attributeValueRef_id = str()

                    for attributeValueRef in attribute.iter('attributeValueRef') :
                        # Remove the 'Role=' using the custom function before acquiring the information.
                        attributeValueRef_id += shorten_role(attributeValueRef.get('id')) + ' \n'
                    # Append each string to the list if the ID is found in the ID series.
                    role_column.append(attributeValueRef_id)

The ID and role columns were matched by iterating through the respective DataFrame and List.

# Organize role column to match the ordered ID series.
for organized_index in range(id_series.size) :
    # Add an empty string to end of list to account for IDs with blank roles.
    organized_role_column.append('')
    for unorganized_index in range(len(xml_ids)) :
        if (id_series[organized_index] == xml_ids[unorganized_index]) :
            # Replace empty string for IDs with roles in the XML file.
            organized_role_column.insert(organized_index, (role_column[unorganized_index]))

There is also a Python Script written with additional comments to further understand the procedure of developing this organizer.

Note : This Exploration Took a Weekend to Complete, Spanning Approximately 10 Hours Altogether.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
.gitignore		.gitignore
README.md		README.md
test_excel_1.xlsx		test_excel_1.xlsx
test_excel_2.xlsx		test_excel_2.xlsx
test_notebook.ipynb		test_notebook.ipynb
test_script.py		test_script.py
test_xml.xml		test_xml.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Excel Spreadsheet Organizer

Contents

Overview

Solution Details

About

Releases

Packages

Languages

Dipto9999/Excel_Spreadsheet_Organizer

Folders and files

Latest commit

History

Repository files navigation

Excel Spreadsheet Organizer

Contents

Overview

Solution Details

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages