## TrumpWorld - analysing companies by country

I'll try to cross-reference the TrumpWorld dataset (specifically, the org-org one) with OpenCorporates.com to gather as much country information as possible for these companies and assign each of them a country.

In [1]:
import sys
import json
import urllib2 as url
import pandas as pd
import numpy as np
from difflib import SequenceMatcher

In [10]:
# First step is loading the .csv and gathering all unique company names
tw_orgorg = pd.read_csv('org-org-connections.csv')
tw_perorg = pd.read_csv('person-org-connections.csv')

In [45]:
org_a_uni = tw_orgorg['Organization A'].unique()
org_b_uni = tw_orgorg['Organization B'].unique()
org_c_uni = tw_perorg[tw_perorg.Person == 'DONALD J. TRUMP']['Organization'].unique()

org_uni = np.array(list(set(np.concatenate((org_a_uni, org_b_uni, org_c_uni)))))
org_uni.sort()

In [None]:
# Now for each of them run a query on OpenCorporates.com
from urllib import quote_plus  # Python 2 location; escapes '&', '/', etc. in names

query_header = 'https://api.opencorporates.com/v0.4/companies/search?q={0}'
ocorp_data = {}

for i, ou in enumerate(org_uni):
    sys.stdout.write("\rOrganisation {0:2d}/{1}".format(i+1, len(org_uni)))
    sys.stdout.flush()
    # Collapse whitespace, then URL-encode the name (spaces become '+')
    ou_q = query_header.format(quote_plus(' '.join(ou.lower().split())))
    try:
        resp = json.loads(url.urlopen(ou_q).read())
    except Exception:
        # Network/HTTP error or malformed JSON: skip this organisation
        # (unlike a bare except, this still lets Ctrl-C interrupt the loop)
        continue
    comp_list = resp['results']['companies']
    # Quick clean-up: when there are several hits, drop dissolved companies
    if len(comp_list) > 1:
        comp_list = [c for c in comp_list if c['company']['dissolution_date'] is None]
    # Did we find anything? (checked after filtering, which may empty the list)
    if len(comp_list) < 1:
        continue
    # Second, what are the names?
    names = [c['company']['name'] for c in comp_list]
    # Let's check which one fits best, if there is more than one
    if len(comp_list) == 1:
        comp_i = 0
    else:
        # Compare case-insensitively so capitalisation does not lower the score
        match = [SequenceMatcher(None, ou.lower(), n.lower()).ratio() for n in names]
        comp_i = int(np.argmax(match))
        # NOTE: needs improvement as sometimes multiple matches can have the same exact name...
    # But do we have address info?
    address = comp_list[comp_i]['company']['registered_address']
    if address is None or address['country'] is None:
        continue
    ocorp_data[ou] = address['country']
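
The NOTE in the loop above points out that several search hits can carry exactly the same name, in which case `np.argmax` silently keeps the first one. A minimal sketch of one way to at least detect such ties is shown here; the helper `best_matches`, its `tol` parameter and the candidate names are hypothetical and only illustrate the idea, they are not part of the original notebook or of the OpenCorporates API.

```python
from difflib import SequenceMatcher


def best_matches(query, candidates, tol=1e-9):
    """Return (best_score, indices of every candidate within tol of it)."""
    q = query.lower()
    scores = [SequenceMatcher(None, q, c.lower()).ratio() for c in candidates]
    if not scores:
        return 0.0, []
    best = max(scores)
    return best, [i for i, s in enumerate(scores) if best - s <= tol]


# Hypothetical candidate names, not real API output
names = ["Example Holdings LLC", "EXAMPLE HOLDINGS LLC", "Example Holding Ltd"]
score, tied = best_matches("EXAMPLE HOLDINGS LLC", names)
print(score, tied)
```

When `tied` contains more than one index, the match is ambiguous. One option would be to skip the organisation, another to keep it only if all tied candidates report the same country; which is preferable depends on how much noise the analysis can tolerate.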

### Distance analysis

To help interpret the results, let's estimate how far each company is from Donald J. Trump, measured as the number of links on the shortest path between them in the network of connections.
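
To make this notion of distance concrete, here is a minimal standard-library sketch of a breadth-first search over an undirected graph. It is only an illustration of what a shortest-path length measures: the function `bfs_distances` and the toy edge list are hypothetical, and the notebook itself relies on networkx for the real computation.

```python
from collections import deque


def bfs_distances(edges, source):
    """Map every node reachable from source to its hop count."""
    # Build an undirected adjacency map from the edge list
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in dist:
                # First visit is along a shortest path in an unweighted graph
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


# Hypothetical toy network, not taken from the dataset
toy_edges = [("PERSON", "ORG A"), ("ORG A", "ORG B"), ("ORG C", "ORG D")]
toy_dist = bfs_distances(toy_edges, "PERSON")
print(toy_dist.get("ORG B"))  # 2
print(toy_dist.get("ORG C"))  # None: no path from PERSON
```

An organisation at distance 1 is linked to the person directly, one at distance 2 through a single intermediate organisation, and one missing from the result has no path to the person at all.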

In [69]:
import networkx as nx

holdings_graph = nx.Graph()
# Add a special node, Donald J. Trump
holdings_graph.add_node('DONALD J. TRUMP')
# Add all organizations as nodes
holdings_graph.add_nodes_from(org_uni)
# Now connections. First, direct Donald-to-organization ones
for o in tw_perorg[tw_perorg.Person == 'DONALD J. TRUMP'].Organization:
    holdings_graph.add_edge('DONALD J. TRUMP', o)
# Then between organizations
for oa, ob in zip(tw_orgorg['Organization A'], tw_orgorg['Organization B']):
    holdings_graph.add_edge(oa, ob)

In [74]:
# Now compute distances
for ou in org_uni:
    try:
        dist = nx.shortest_path_length(holdings_graph, 'DONALD J. TRUMP', ou)
    except nx.NetworkXNoPath:
        dist = None
    if dist is not None:
        print("Distance of {0} from DJT: {1}".format(ou, dist))
    else:
        # No path at all, whether direct or through other organizations
        print("No connection found between {0} and DJT".format(ou))

Distance of 1290 AVENUE OF THE AMERICAS, A TENANCY-IN-COMMON from DJT: 3
Distance of 1291 AVENUE OF THE AMERICAS, A TENANCY-IN-COMMON from DJT: 3
Distance of 1292 AVENUE OF THE AMERICAS, A TENANCY-IN-COMMON from DJT: 5
Distance of 1293 AVENUE OF THE AMERICAS, A TENANCY-IN-COMMON from DJT: 2
Distance of 3126 CORPORATION from DJT: 1
Distance of 4 SHADOW TREE LANE LLC from DJT: 1
Distance of 4 SHADOW TREE LANE MEMBER CORP. from DJT: 1
Distance of 40 WALL DEVELOPMENT ASSOCIATES LLC from DJT: 1
Distance of 40 WALL STREET COMMERCIAL LLC from DJT: 1
Distance of 40 WALL STREET LLC from DJT: 1
Distance of 40 WALL STREET MEMBER CORP. from DJT: 1
Distance of 401 MEZZ VENTURE LLC from DJT: 1
Distance of 401 NORTH WABASH VENTURE LLC from DJT: 1
Distance of 42FLOORS from DJT: 7
Distance of 55 WALL DEVELOPMENT CORP. from DJT: 1
Distance of 767 MANAGER LLC from DJT: 1
Distance of 809 NORTH CANON LLC from DJT: 1
Distance of 809 NORTH CANON MEMBER CORPORATION from DJT: 1
Distance of 81 PINE NOTE HOLDER