# DATA620 Final Project A
## Network Analysis
### Euclid Zhang ~ Sam Reeves ~ David Moste

# Project Description
***

We would like to find businesses (or ideally people) who are influential in drawing foreign and domestic investment in Myanmar.  Through basic network analysis techniques, we hope to find a clique of people or companies with common investment sources or common activities.
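One way to make "common investment sources or common activities" concrete is a bipartite graph with companies on one side and shared attributes (here, people) on the other, projected onto the companies. The sketch below uses a small made-up edge list. The company and person labels are hypothetical, and the choice of a weighted projection, maximal cliques and degree centrality is our assumption rather than a finished method.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Hypothetical (company, person) pairs standing in for officer records.
edges = [
    ("Company A", "U Aung"),
    ("Company B", "U Aung"),
    ("Company B", "Daw Mya"),
    ("Company C", "Daw Mya"),
    ("Company C", "U Aung"),
    ("Company D", "U Kyaw"),
]

def company_cliques(pairs, min_size=3):
    """Return (projected company graph, maximal cliques with >= min_size companies)."""
    B = nx.Graph()
    companies = {c for c, _ in pairs}
    people = {p for _, p in pairs}
    B.add_nodes_from(companies, bipartite=0)
    B.add_nodes_from(people, bipartite=1)
    B.add_edges_from(pairs)
    # Two companies are linked if they share at least one person;
    # the edge weight counts how many people they share.
    G = bipartite.weighted_projected_graph(B, companies)
    cliques = [sorted(c) for c in nx.find_cliques(G) if len(c) >= min_size]
    return G, cliques

G, cliques = company_cliques(edges)
print(cliques)
# Degree centrality is one simple proxy for "influential" companies.
print(nx.degree_centrality(G))
```

With real data the pairs would come from the officer columns produced by `companyInfo()`. Because many Burmese names are shared, name-only person nodes may merge different people and inflate clique sizes.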

# Data Description
***

This data was scraped from official government sources and released anonymously in two leaks, called Myanmar Financials and Myanmar Investments. The former contains incorporation documents for about 125k companies; the latter contains information from investment proposals for about 10k companies.

# Known Challenges
***

1. About a quarter of the companies in Myanmar Financials paid someone to approve incorporation documents that list no addresses or names.

2. Comparatively few of the companies in the first leak (Myanmar Financials) have applied for investment permission.

3. Burmese personal names are chosen using astrological information about the person's birth. There are no family names, so many different people share the same name.

4. A small number of names are written in Burmese script. UTF-8 encodes these correctly, but our team cannot read them.
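Challenges 3 and 4 can at least be measured before building a graph. The sketch below is a hedged starting point, not a fix: it flags names containing characters from the basic Unicode Myanmar block (U+1000 to U+109F; the extended Myanmar blocks are ignored) and counts how many distinct companies each name appears in, since a name shared by unusually many companies may stand for several different people. The helper names and sample records are hypothetical.

```python
from collections import Counter

def has_myanmar_script(name):
    """True if any character falls in the basic Unicode Myanmar block."""
    return any("\u1000" <= ch <= "\u109f" for ch in name)

def name_company_counts(pairs):
    """Count distinct companies per officer name from (company, name) pairs."""
    distinct = set(pairs)  # drop duplicate (company, name) rows
    return Counter(name for _, name in distinct)

# Hypothetical sample records.
pairs = [
    ("Company A", "U Aung"),
    ("Company B", "U Aung"),
    ("Company B", "U Aung"),  # duplicate row, counted once
    ("Company C", "\u1019\u1031\u102c\u1004\u103a"),  # "Maung" in Burmese script
]

counts = name_company_counts(pairs)
burmese = [name for _, name in pairs if has_myanmar_script(name)]
print(counts.most_common(1))
print(len(burmese))
```

Names with very high counts are candidates for manual review rather than automatic merging.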

```python
import os
import json
from pathlib import Path

import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt
from bs4 import BeautifulSoup
```

```python
# Company incorporation documents
com_dir = Path('/home/s/fpa/data/company_info')

# Investment proposals and information about real projects
inv_dir = Path('/home/s/fpa/data/investment_info')

# Definitions for codes found in the above documents
#def_dir = Path('/home/s/fpa/data/definitions')
```

# Helper Functions
***
#### pathToList()
- Takes a Path object for a directory full of JSON files
- Returns a list containing a dict for each file read in

#### companyInfo()
- Takes a list of dicts
- Extracts Company Name, Addresses, Officers, and Officer Titles
- Returns a DataFrame with this information

#### investmentInfo()
- Takes a list of dicts (one per investment file, in the order read by `pathToList()`)
- Extracts the `data` records from each of the six investment files
- Returns a dict of DataFrames keyed by file type

```python
def pathToList(path_obj):
    """Read every JSON file in a directory into a list of dicts."""
    file_list = []
    for file_path in path_obj.iterdir():
        file_list.append(json.loads(file_path.read_bytes()))
    return file_list

def companyInfo(com_list):
    """Flatten incorporation records into one row per company."""
    info_list = []

    # Iterate over all companies in the list
    for com in com_list:
        info = {}

        # Extract Company Name to "title"
        info['title'] = com['Corp']['CompanyName']

        # Extract Company Addresses, stripping the HTML formatting
        for a, addr in enumerate(com['Addresses']):
            info['address' + str(a)] = BeautifulSoup(
                addr['UIFormattedAddress'], 'html.parser').get_text()

        # Extract Officer names and titles
        for b, officer in enumerate(com['Officers']):
            info['officer' + str(b) + 'name'] = officer['FullNameNormalized']
            info['officer' + str(b) + 'type'] = officer['OfficerType']

        info_list.append(info)

    # Build the DataFrame; missing addresses or officers become NaN
    return pd.DataFrame(info_list)

def investmentInfo(inv_list):
    """Load each investment file's 'data' records into a DataFrame."""
    # These positions assume pathToList() read the files in the order shown by
    # os.listdir(inv_dir) in the Preprocessing section. iterdir() order is not
    # guaranteed, so check it before relying on these labels.
    names = ['overdue', 'approved', 'proposals', 'actuals', 'landlease', 'monitor']
    return {name: pd.DataFrame(inv_list[i]['data']) for i, name in enumerate(names)}
```

# Preprocessing
***

```python
companies = companyInfo(pathToList(com_dir))
inv_list = pathToList(inv_dir)
os.listdir(inv_dir)
```

```
['commercialoperationsoverdue.json',
 'approved.json',
 'proposals.json',
 'actuals.json',
 'landlease.json',
 'monitor.json']
```

```python
# Keep companies that list at least one address or at least one officer
com_with_info = companies.dropna(subset = ['address0', 'officer0name'], how = 'all')
```
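`companyInfo()` stores officers in wide columns (`officer0name`, `officer0type`, `officer1name`, ...), which is awkward for graph building. A minimal sketch of reshaping that layout into a long (company, officer, type) edge list follows. It assumes only the column naming used by `companyInfo()`; the helper `officer_edges` is hypothetical, and it runs here on a tiny made-up frame rather than on `com_with_info`.

```python
import re
import pandas as pd

def officer_edges(df):
    """Reshape officerNname/officerNtype columns into one row per officer."""
    # Find the officer indices actually present, e.g. 0, 1, 2, ...
    matches = (re.fullmatch(r"officer(\d+)name", col) for col in df.columns)
    idx = sorted(int(m.group(1)) for m in matches if m)
    if not idx:
        return pd.DataFrame(columns=["company", "officer", "type"])
    parts = [
        pd.DataFrame({
            "company": df["title"],
            "officer": df[f"officer{i}name"],
            "type": df[f"officer{i}type"],
        })
        for i in idx
    ]
    edges = pd.concat(parts, ignore_index=True)
    # Companies with fewer officers have NaN in the higher-numbered columns.
    return edges.dropna(subset=["officer"]).reset_index(drop=True)

# Hypothetical sample in the companyInfo() layout.
sample = pd.DataFrame({
    "title": ["Company A", "Company B"],
    "officer0name": ["U Aung", "Daw Mya"],
    "officer0type": ["Director", "Director"],
    "officer1name": ["Daw Mya", None],
    "officer1type": ["Secretary", None],
})
edges = officer_edges(sample)
print(edges)
```

The resulting pairs can then be added to a graph with `nx.Graph().add_edges_from(zip(edges["company"], edges["officer"]))`.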

```python
# Number of records in the first investment file read by pathToList()
len(inv_list[0]['data'])
```

```
819
```

# Network Analysis
***

# Conclusions
***