Built on Anaconda3 (Python version 3.4) on Windows 10 on 2015-08-10.

This notebook analyzes an Excel file that shows a one-to-many relationship between the text in one column (named 'Company Name' in my work) and the text in another (named 'Ownership Status' in my work). The heart of this work is reversing the direction of that relationship: finding each unique member of the 'many' side and showing its relationship to all members of the 'one' side. In effect, the original one-to-many relationship (one company, many owners) becomes a new one-to-many relationship (one owner, many companies).

I'm sure there is a database term for this, but I am not a database guru. I'm just a guy who needed to find this new relationship in an existing Excel spreadsheet. I have always wanted to learn more about Pandas, since I expect to be doing more analysis of spreadsheets in the coming weeks, and I dread the idea of using VBA to do the work.
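To make the reversal concrete, here is a minimal plain-Python sketch. The company and owner names are hypothetical stand-ins, not data from the spreadsheet, and the original relationship is assumed to already be a dictionary of lists.

```python
# Hypothetical original relationship: each company maps to the owners listed for it
company_to_owners = {
    'Acme Aero': ['Fund A', 'Fund B'],
    'Beta Defense': ['Fund B'],
}

# Reverse it: each owner maps to every company it holds
owner_to_companies = {}
for company, owners in company_to_owners.items():
    for owner in owners:
        # setdefault creates the empty list the first time an owner is seen
        owner_to_companies.setdefault(owner, []).append(company)

print(owner_to_companies)
# {'Fund A': ['Acme Aero'], 'Fund B': ['Acme Aero', 'Beta Defense']}
```

'Fund B' appears once per company it owns, so its list collects both companies.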

In [41]:
import pandas as pd

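Open the file. Note the leading "r" on the file name: it makes the path a raw string, so Python keeps every backslash literally. Without it, the "\U" in "C:\Users" would be read as the start of a Unicode escape and the line would fail with a SyntaxError. The sheet named 'Screening' contains the data I'm using in this analysis.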

In [42]:
filename = r'C:\Users\Paul\Documents\03 - Professional\Private Equity\Aerospace and Defense Industry Research\Spaven Analysis\PE_backed_A&D_companies.xlsx'
df = pd.read_excel(filename, sheet_name='Screening')  # older pandas releases called this argument 'sheetname'
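A small sketch of why the "r" prefix matters. The path here is a shortened hypothetical one; the second part compiles a plain (non-raw) literal as source text to show that it is rejected.

```python
# A raw string keeps each backslash as a literal character
raw_path = r'C:\Users\Paul\Downloads'
escaped_path = 'C:\\Users\\Paul\\Downloads'  # same text, backslashes doubled by hand
same_text = raw_path == escaped_path

# Without the r prefix, '\U' starts an 8-hex-digit Unicode escape, so this source fails to compile
try:
    compile("path = 'C:\\Users\\Paul'", '<example>', 'exec')
    plain_literal_compiles = True
except SyntaxError:
    plain_literal_compiles = False

print(same_text, plain_literal_compiles)  # True False
```

Doubling the backslashes or using forward slashes also works, but the raw string keeps a path copied from Windows Explorer readable.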

Take a look at the column names in this file, to find the one that holds the owners' names.
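One way to do this in pandas is `df.columns` for the labels and `df.head()` for the first few rows. The sketch below uses a tiny hypothetical DataFrame in place of the 'Screening' sheet so it runs without the Excel file.

```python
import pandas as pd

# Hypothetical stand-in for the 'Screening' sheet; the real sheet has more rows and columns
df = pd.DataFrame({
    'Company Name': ['Acme Aero', 'Beta Defense'],
    'Ownership Status': ['Fund A; Fund B', 'Fund B'],
})

# Column labels, converted from a pandas Index to a plain list
print(list(df.columns))  # ['Company Name', 'Ownership Status']

# First rows, to see what each column actually holds
print(df.head())
```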


Looking at the spreadsheet, I confirm that the column named 'Ownership Status' contains the "many" side of the relationship that I need to reverse. A single cell can list several owners, separated by a semicolon and a space.
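Parsing one cell is where messy spreadsheets tend to bite. Here is a hedged sketch of a hypothetical helper, `split_owners`, assuming that empty cells can occur (pandas reads them as NaN) and that the spacing after the semicolon may not always be consistent.

```python
def split_owners(cell):
    """Return the owner names in one 'Ownership Status' cell (hypothetical helper)."""
    # pandas reads an empty Excel cell as NaN, a float, which has no .split()
    if not isinstance(cell, str):
        return []
    # Split on ';' and trim spaces so 'A;B' and 'A; B' give the same names
    return [name.strip() for name in cell.split(';') if name.strip()]

print(split_owners('Fund A; Fund B'))  # ['Fund A', 'Fund B']
print(split_owners('Fund A;Fund B '))  # ['Fund A', 'Fund B']
print(split_owners(float('nan')))      # []
```

Splitting on the exact string '; ', as the notebook does, is fine when every cell follows that convention. The looser split is only a safeguard.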

Now that I have found the data of interest, I will go through it row by row and build a Python dictionary. Keys are the names of owning companies, and values are lists of the names of companies in which they have an ownership interest. This is the heart of the notebook: the step where the one-to-many relationship is reversed.

In [45]:
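owner_to_holdings = {}
# Walk both columns together so each owner is paired with the company in the same row
for company, owners in zip(df['Company Name'], df['Ownership Status']):
    # Skip empty cells: pandas reads them as NaN, which has no .split()
    if not isinstance(owners, str):
        continue
    # A cell may list several owners separated by '; '
    for owner in owners.split('; '):
        # Create the owner's list on first sight, then add this row's company
        owner_to_holdings.setdefault(owner, []).append(company)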

 

Now that I am happy the one-to-many relationship has been reversed, I will convert the dictionary to a list of [owner, number of holdings] pairs.

In [47]:
list_of_owners = []
for key, value in owner_to_holdings.items():
    list_of_owners.append([key, len(value)])
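The same list can be built with a list comprehension. The sketch below uses a hypothetical two-owner dictionary and also sorts the pairs so the largest owners come first. The sort is optional and not part of the original notebook.

```python
# Hypothetical result of the reversal step
owner_to_holdings = {
    'Fund A': ['Acme Aero'],
    'Fund B': ['Acme Aero', 'Beta Defense'],
}

# Same [owner, count] pairs as the loop, in one expression
list_of_owners = [[owner, len(holdings)] for owner, holdings in owner_to_holdings.items()]

# Optional: largest owners first, ordered by the count in position 1
list_of_owners.sort(key=lambda pair: pair[1], reverse=True)
print(list_of_owners)  # [['Fund B', 2], ['Fund A', 1]]
```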
    

Now create a Pandas DataFrame with the list of owners and the number of their holdings.

In [54]:
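# Name the columns so the Excel output gets readable headers instead of 0 and 1
df_owner = pd.DataFrame(list_of_owners, columns=['Owner', 'Number of Holdings'])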
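Counting holdings drops the company names collected in the dictionary. The following is a hedged sketch with a hypothetical dictionary and hypothetical column names. It keeps those names as a '; '-joined text column, so the output shows each owner's relationship to its companies, and sorts the largest owners to the top.

```python
import pandas as pd

# Hypothetical reversed mapping, standing in for owner_to_holdings
owner_to_holdings = {
    'Fund A': ['Acme Aero'],
    'Fund B': ['Acme Aero', 'Beta Defense'],
}

# One row per owner: name, number of holdings, and the holdings joined into one cell
df_owner = pd.DataFrame(
    [[owner, len(companies), '; '.join(companies)]
     for owner, companies in owner_to_holdings.items()],
    columns=['Owner', 'Number of Holdings', 'Holdings'],
)

# Largest owners first; reset the index so row labels run 0, 1, ... in the new order
df_owner = df_owner.sort_values('Number of Holdings', ascending=False).reset_index(drop=True)
print(df_owner)
```

Joining the names back with '; ' mirrors the convention the source spreadsheet already uses for multiple owners.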

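Finally, write the owner table to a new Excel file.

In [52]:
# The with-block saves and closes the workbook when it ends (pandas 2.0 removed ExcelWriter.save())
with pd.ExcelWriter(r'C:\Users\Paul\Downloads\PandasOutput.xlsx') as writer:
    df_owner.to_excel(writer, sheet_name='PandasOutput')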

I am sure there are more elegant and Pythonic ways to do this, but I'm not a programmer, just an analyst happy to add a new tool to my kit, even if I don't know how best to use it yet :)
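One of the more idiomatic pandas routes skips the explicit loops. It splits each ownership cell into a list, uses `explode` to turn the lists into one row per (company, owner) pair, and then groups by owner. This is a sketch under stated assumptions: it needs pandas 0.25 or later for `DataFrame.explode`, which is newer than the 2015 setup above, and the data is a hypothetical stand-in.

```python
import pandas as pd

# Hypothetical stand-in data; the empty ownership cell is included on purpose
df = pd.DataFrame({
    'Company Name': ['Acme Aero', 'Beta Defense', 'Gamma Systems'],
    'Ownership Status': ['Fund A; Fund B', 'Fund B', None],
})

# One row per (company, owner) pair: split each cell into a list, then explode the list
pairs = (
    df.assign(Owner=df['Ownership Status'].str.split('; '))
      .explode('Owner')
      .dropna(subset=['Owner'])  # drop rows whose ownership cell was empty
)

# Reverse the relationship by grouping the pairs on the owner
owner_to_holdings = pairs.groupby('Owner')['Company Name'].apply(list).to_dict()
holding_counts = pairs.groupby('Owner').size()

print(owner_to_holdings)  # {'Fund A': ['Acme Aero'], 'Fund B': ['Acme Aero', 'Beta Defense']}
print(holding_counts)
```

The intermediate `pairs` table is a useful by-product. It is the flat owner/company list that a database would store for this kind of relationship.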