# DATA620 Final Project A
## Network Analysis
### Euclid Zhang ~ Sam Reeves ~ David Moste

# Project Description
***

We would like to find businesses (or, ideally, people) that are influential in drawing foreign and domestic investment into Myanmar. Using basic network analysis techniques, we hope to find a clique of people or companies with common investment sources or common activities.

# Data Description
***

This data was scraped from official government sources and released anonymously in two leaks called Myanmar Financials and Myanmar Investments. The former contains incorporation documents for ~125k companies, and the latter contains information from investment proposals for about 10k companies.

# Known Challenges
***

1. About 1/4 of the companies in Myanmar Financials paid somebody to approve their incorporation documents without addresses or names.

2. Comparatively few companies in the first leak have applied for investment permission.

3. People in Myanmar are named for astrological information pertaining to their birth.  There is no family name, and many people have the same names.

4. A small number of names are given in Burmese script, which UTF-8 can encode but which our team cannot read.
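
One way to gauge the fourth challenge is to flag the names that contain Burmese script so they can be counted or set aside. The snippet below is a minimal sketch, not part of the original notebook: `has_burmese_script` is a hypothetical helper, and it only checks the main Myanmar Unicode block (U+1000 to U+109F), ignoring the extended blocks.

```python
import re

# Main Myanmar block in Unicode: U+1000 to U+109F.
# Assumption: the extended Myanmar blocks do not matter for this data.
BURMESE_CHARS = re.compile(r'[\u1000-\u109F]')

def has_burmese_script(text):
    """Return True if text is a string with at least one Myanmar-block character."""
    return isinstance(text, str) and bool(BURMESE_CHARS.search(text))

# One Latin-script name and one Burmese-script fragment taken from the sample record.
names = ['ETERNAL WHITE FLOWER WAY COMPANY LIMITED', 'ထာဝရ အဖြူရောင်']
flags = [has_burmese_script(n) for n in names]
```

Applied to a name column with `Series.map`, the same helper would give a boolean mask for filtering.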

# Method
***

1. Clean the data!
2. Create an edge list:
> | Company Name | People |
> | --- | --- |
> | companyNameInMyanmar | landOwner, nameOfInvestor, officers |
3. Project bipartite graph, view statistics
4. Trim edges
5. Visualize
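
Steps 2 to 4 can be sketched on toy data as follows. This is an illustration under stated assumptions rather than the project's final code: the company and person labels are placeholders, and the trimming threshold `min_shared` is a hypothetical parameter. It builds a bipartite graph from (company, person) pairs, projects it onto the people, and keeps only the edges whose weight reaches the threshold.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Hypothetical (company, person) pairs; real pairs would come from the cleaned data.
edges = [
    ('company_a', 'person_1'),
    ('company_a', 'person_2'),
    ('company_b', 'person_2'),
    ('company_b', 'person_3'),
    ('company_c', 'person_2'),
    ('company_c', 'person_3'),
]
company_nodes = {c for c, _ in edges}
people_nodes = {p for _, p in edges}

# Build the bipartite graph with the two node sets labelled.
B = nx.Graph()
B.add_nodes_from(company_nodes, bipartite=0)
B.add_nodes_from(people_nodes, bipartite=1)
B.add_edges_from(edges)

# Project onto people: each edge weight is the number of companies two people share.
P = bipartite.weighted_projected_graph(B, people_nodes)

# Trim edges: keep only pairs sharing at least min_shared companies (assumed threshold).
min_shared = 2
trimmed = nx.Graph()
trimmed.add_weighted_edges_from(
    (u, v, d['weight']) for u, v, d in P.edges(data=True) if d['weight'] >= min_shared
)
```

Here only `person_2` and `person_3` remain connected after trimming, since they are the only pair sharing two companies.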

In [1]:
import os
from pathlib import Path
import json
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt
from bs4 import BeautifulSoup
import re

In [2]:
# Company incorporation documents
com_dir = Path('/home/s/fpa/data/company_info')

# Investment proposals and information about real projects
inv_dir = Path('/home/s/fpa/data/investment_info')

# Helper Functions
***
#### pathToList()
- Takes a Path object for a directory full of JSON files
- Returns a list containing a dict for each file read in

#### companyInfo()
- Takes a list of dicts
- Extracts Company Name, Addresses, Officers, and Officer Titles
- Returns a DataFrame with this information

#### cleanDF()
- Takes a DataFrame
- Converts all letters to lowercase
- Substitutes ltd with limited
- Removes all punctuation
- Removes spaces
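
`cleanDF()` is described above but not defined in the cells that follow, so here is one possible sketch of it. The order of operations follows the bullet list; `clean_text` is a hypothetical per-cell helper, and only ASCII punctuation is stripped so that Burmese-script values are not damaged (an assumption, since the original rules for non-Latin text are not stated).

```python
import re
import string

import pandas as pd

# Translation table that deletes ASCII punctuation only.
_PUNCT_TABLE = str.maketrans('', '', string.punctuation)

def clean_text(value):
    """Normalise one cell; non-string cells (NaN, None) pass through unchanged."""
    if not isinstance(value, str):
        return value
    value = value.lower()                          # lowercase
    value = re.sub(r'\bltd\b', 'limited', value)   # ltd -> limited
    value = value.translate(_PUNCT_TABLE)          # remove punctuation
    return re.sub(r'\s+', '', value)               # remove spaces

def cleanDF(df):
    """Apply clean_text to every cell and return a new DataFrame."""
    return df.apply(lambda col: col.map(clean_text))

# Example using a company name from the data plus a missing value.
example = pd.DataFrame({'companyNameInMyanmar': ['VOLUME TABLEWARE -(MYANMAR) LTD.', None]})
cleaned = cleanDF(example)
```

Removing spaces makes names such as `PANN TIE THIT` and `PANNTIETHIT` compare equal, at the cost of readability in the cleaned column.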

In [3]:
def pathToList(path_obj):
    # Read every JSON file in the directory into a list of dicts
    file_list = []
    for file_path in path_obj.iterdir():
        data = json.loads(file_path.read_bytes())
        file_list.append(data)
    return file_list

def companyInfo(com_list):
    info_list = []

    # Iterate over all companies in the list
    for company in com_list:
        info = {}

        # Extract Company Name
        info['companyNameInMyanmar'] = company['Corp']['CompanyName']

        # Extract Company Addresses, stripping the HTML markup
        for a, address in enumerate(company['Addresses']):
            info['address' + str(a)] = BeautifulSoup(address['UIFormattedAddress'], 'html.parser').get_text()

        # Extract Officer Information
        for b, officer in enumerate(company['Officers']):
            info['officer' + str(b) + 'name'] = officer['FullNameNormalized']
            info['officer' + str(b) + 'type'] = officer['OfficerType']

        info_list.append(info)

    # Convert to DataFrame
    df = pd.DataFrame(info_list)
    return df



# Preprocessing
***

In [4]:
companies = companyInfo(pathToList(com_dir))
inv_list = pathToList(inv_dir)
os.listdir(inv_dir)

['commercialoperationsoverdue.json',
 'approved.json',
 'proposals.json',
 'actuals.json',
 'landlease.json',
 'monitor.json']

In [5]:
# Keep companies that list at least one address or one officer
com_with_info = companies.dropna(subset = ['address0', 'officer0name'], how = 'all')

# Positions in inv_list assume the files were read in the order listed above
overdue = pd.DataFrame(inv_list[0]['data'])
approved = pd.DataFrame(inv_list[1]['data'])
proposals = pd.DataFrame(inv_list[2]['data'])
actuals = pd.DataFrame(inv_list[3]['data'])
landlease = pd.DataFrame(inv_list[4]['data'])
monitor = pd.DataFrame(inv_list[5]['data'])
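
Indexing `inv_list` by position depends on the order in which the directory happens to be listed, which can differ between machines. A more defensive alternative, sketched here with a hypothetical helper `load_by_name`, keys each DataFrame by its file name instead. It assumes, as the cell above does, that every file has a top-level `'data'` entry; the demonstration uses a temporary directory with made-up records.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

def load_by_name(dir_path):
    """Map each JSON file's stem (e.g. 'approved') to a DataFrame of its 'data' records."""
    frames = {}
    for file_path in sorted(Path(dir_path).glob('*.json')):
        payload = json.loads(file_path.read_text(encoding='utf-8'))
        frames[file_path.stem] = pd.DataFrame(payload['data'])
    return frames

# Demonstration on a throwaway directory; the records are invented.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, 'approved.json').write_text(
        json.dumps({'data': [{'id': 1}, {'id': 2}]}), encoding='utf-8')
    Path(tmp, 'proposals.json').write_text(
        json.dumps({'data': [{'id': 3}]}), encoding='utf-8')
    demo_frames = load_by_name(tmp)
```

With the real directory, `load_by_name(inv_dir)['approved']` would then refer to `approved.json` regardless of listing order.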

In [45]:
pathToList(com_dir)[0]

{'Corp': {'CorpId': '713f82b3-705d-4ff3-93ee-4bfdca631bca',
  'CompanyName': 'ETERNAL WHITE FLOWER WAY COMPANY LIMITED',
  'RegistrationNumber': '115788760',
  'PriorRegistrationNumber': '2499/2014-2015',
  'CompanyType': 'Private Company Limited by Shares',
  'CompanyTypeId': 1,
  'CorpTypeId': 1,
  'Status': 'Struck-Off',
  'RegistrationDate': '8/19/2014 12:00:00 AM +00:00',
  'RegistrationDateFormatted': '19/08/2014',
  'AltName': 'ထာဝရ အဖြူရောင် ပန်းခင်းလမ်း ကုမ္ပဏီ လီမိတက်',
  'IsForeign': False,
  'IsSmall': 1,
  'TotalShares': 10000,
  'ShareCurrency': 'MMK',
  'HoldingCompanyName': '',
  'HoldingCompanyRegNumber': '',
  'HoldingCompanyJurisdiction': '',
  'CategoryOfAssociation': '',
  'CategoryOfAssociationId': 0,
  'AnnualReturnDueDate': '9/19/2019 12:00:00 AM +00:00',
  'AnnualReturnDueDateFormatted': '19/09/2019',
  'FinancialStatementDueDate': '',
  'FinancialStatementDueDateFormatted': '',
  'RegNumberInJurisdictionOfIncorporation': '',
  'RegisteredOfficeAddress': '<div>

In [9]:
com_with_info

Unnamed: 0,companyNameInMyanmar,address0,address1,officer0name,officer0type,officer1name,officer1type,officer2name,officer2type,officer3name,...,officer63type,officer64name,officer64type,officer65name,officer65type,officer66name,officer66type,officer67name,officer67type,address4
0,ETERNAL WHITE FLOWER WAY COMPANY LIMITED,THUNANDAR 1ST STREET H QUARTER/NORTH OKKALAPA ...,"THUNANDAR 1ST STREET714ZA QUARTER , YANGON, M...",DAWMYINTMYINTTHAN,Director,DAWZARNITHWE,Director,UAUNGMYOTHANT,Director,UMYOTHU,...,,,,,,,,,,
2,PANN TIE THIT COMPANY LIMITED,MYANMAR,"(8) MILE JUNCTION , SHWE BO - MYITKYINA ROAD M...",PHYOMAUNGMAUNG,Director,SUMYATMAUNGMAUNG,Director,,,,...,,,,,,,,,,
4,CENTER Y RESOURCES LIMITED,"135A-1,THAN LWIN ROADKAMAYUT TOWNSHIP , YANGON...","135A-1,THAN LWIN ROADKAMAYUT TOWNSHIP , YANGON...",MRDENGENHUA,Director,MRLIUXINYONG,Director,MRSUHAILIANG,Director,,...,,,,,,,,,,
5,SARA FUTURE TRADING COMPANY LIMITED,MYANMAR,"THIRI 3RD STREETNO. 124WARD NO. 2, HLAING TOWN...",MRMUTHURAMALINGATHEVARSARAVANAN,Director,MRSEETHALAXMIPITCHAMNAIDUSEENIVASAN,Director,UHLAWINBYARYAR,Director,,...,,,,,,,,,,
7,VOLUME TABLEWARE -(MYANMAR) LTD.,"HOLDING NO. 46,47,48,58,59 (KA), 132KWIN NO. 1...",KANBAWZA AVENUE ROADNO. 7 (A)GOLDEN VALLEY AVE...,MRADAMJOHNMONTGOMERY,Director,MRDEANALEXANDERMONTGOMERY,Director,MRJAMESALEXANDERMEISENHEIMER,Director,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
125696,SHWE (RURAL&URBAN) DEVELOPMENT BANK LIMITED,MYANMAR,"NO (66-76), CORNER OF MERCHANT ROAD & PANSODAN...",DAWCHANMYAESANDAWCHANMYAEKYAWWIN,Director,DAWNANMOUKLAUNGSEINGDAWSANSANAYE,Director,DAWTHAENOESANDAWTHAZINKYAWWIN,Director,UKYAWWIN,...,,,,,,,,,,
125697,LO TINE YA TAUNG TINE YA GEMS COMPANY LIMITED,MYANMAR,"NO.(49), TATAING HMWE STREET, MYA KHWAR NYO ...",KHUNBU,Director,MAYTHUAUNG,Director,SOEMOEAUNG,Director,,...,,,,,,,,,,
125699,HALLIBURTON MYANMAR ENERGY SERVICES PTE. LTD.,MYANMAR,CORNER OF PYAY ROAD AND HLEDAN ROADUNIT#518-51...,THANTZINTUN,Authorised Officer,BHARATHWAJKANNANSRINIVAS,Director,CHOWFARNHUAN,Director,QUEKKWANGCHYEWILLIAM,...,,,,,,,,,,
125700,TOE TATT EAIN COMPANY LIMITED,MYANMAR,"SWAL TAW STREETMYOTHIT QUARTER,MONYWA CITY, S...",DAWMAYKHINKYAW,Director,,,,,,...,,,,,,,,,,
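
The table above is wide, with one `officerNname` column per officer, whereas the edge list in the Method section needs one row per (company, person) pair. A possible reshaping with `DataFrame.melt` is sketched below on a hypothetical miniature of the table; the company and person labels are placeholders, not real records.

```python
import pandas as pd

# Hypothetical miniature of the wide table.
wide = pd.DataFrame({
    'companyNameInMyanmar': ['company_a', 'company_b'],
    'officer0name': ['person_1', 'person_2'],
    'officer0type': ['Director', 'Director'],
    'officer1name': ['person_2', None],
    'officer1type': ['Director', None],
})

# Select only the officer name columns (officer0name, officer1name, ...).
name_cols = [c for c in wide.columns if c.startswith('officer') and c.endswith('name')]

# One row per (company, person) pair; empty officer slots are dropped.
edge_list = (
    wide.melt(id_vars='companyNameInMyanmar', value_vars=name_cols, value_name='person')
        .dropna(subset=['person'])
        .loc[:, ['companyNameInMyanmar', 'person']]
        .drop_duplicates()
        .reset_index(drop=True)
)
```

The resulting two-column frame can be passed to `nx.from_pandas_edgelist` with `source='companyNameInMyanmar'` and `target='person'`.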


# Network Analysis
***

# Conclusions
***