### Release note:  
1. Use Nominatim (OpenStreetMap) to look up the longitude and latitude of each address. An Address entity is defined as a location: addresses that Nominatim resolves to the same longitude and latitude are treated as the same Address.  
2. Compress first and last names by keeping only alphanumeric characters.  
3. Treat two directors as the same person if they have the same compressed name and the same Address (or the same postal code, if Nominatim cannot find the address).  
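
The matching rules above can be sketched roughly as follows. This is only an illustration, not the code used to build the CSV files: the function names are hypothetical, and lower-casing the name, rounding coordinates to 6 decimals, and stripping whitespace from the postal code are all assumptions.

```python
import re


def compress_name(first: str, last: str) -> str:
    """Keep only ASCII letters and digits from first + last name.

    Lower-casing is an assumption; the release note only says 'alphanumeric'.
    """
    return re.sub(r"[^A-Za-z0-9]", "", first + last).lower()


def address_key(lon, lat, postal: str):
    """Use (lon, lat) when geocoding succeeded, otherwise fall back to the postal code."""
    if lon is not None and lat is not None:
        # Rounding to 6 decimals is an assumption to avoid float noise.
        return ("geo", round(lon, 6), round(lat, 6))
    return ("postal", re.sub(r"\s+", "", postal).upper())


def director_key(first: str, last: str, lon, lat, postal: str):
    """Two director records with equal keys are treated as the same person."""
    return (compress_name(first, last), address_key(lon, lat, postal))
```

For example, `director_key("Mary-Ann", "O'Neil", -79.38, 43.65, "M5V 1A1")` and `director_key("Mary Ann", "ONeil", -79.38, 43.65, "m5v1a1")` produce the same key, so both records would collapse into one Director.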

### Known issues:  
1. Nominatim does not recognize postal codes, so many locations cannot be found.  
2. Nominatim returns the same address when the street number is not found, so many places share one address even though their street numbers differ.  
3. The current version does not consider the time dimension, so the same director may appear more than once if he or she has moved.  
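
One way to gauge how often issue 2 happens is to look for coordinates that were returned for several different street numbers. The sketch below is a hypothetical check, not part of the current pipeline; it assumes each input row is a `(raw_address, lon, lat)` tuple and that the street number is the leading run of digits in the raw address.

```python
import re
from collections import defaultdict


def leading_street_number(address: str):
    """Return the leading digits of an address, or None if it has none."""
    match = re.match(r"\s*(\d+)", address)
    return match.group(1) if match else None


def collapsed_locations(rows):
    """Map each (lon, lat) to its street numbers when more than one was seen.

    rows: iterable of (raw_address, lon, lat) tuples (assumed layout).
    """
    numbers = defaultdict(set)
    for address, lon, lat in rows:
        number = leading_street_number(address)
        if number is not None:
            numbers[(lon, lat)].add(number)
    return {
        location: sorted(found, key=int)
        for location, found in numbers.items()
        if len(found) > 1
    }
```

Any location this returns is a candidate for an Address node that actually represents several distinct buildings.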

### What to do next:    
1. Data cleansing: Merge nodes such as directors with similar names within a connected cluster (see David and D. Dave in the graph file).  
2. Data import: Improve Nominatim's accuracy by adding more data. The geometric data format is a bit tricky: I have defined the placeid and then the attributes, and learning the rest may take some time.  
3. Data analysis: Tools are needed to explore the graph and to identify information brokers (e.g. Davide).  
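
For the data cleansing item, a possible starting point is to compare every pair of director names inside one connected cluster and flag the pairs that look alike. The sketch below uses `difflib.SequenceMatcher` from the Python standard library as a stand-in similarity measure. The function name and the 0.5 threshold are assumptions for illustration, and the cluster itself is assumed to have been extracted from the graph already.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def _compress(name: str) -> str:
    """Keep only letters and digits, lower-cased (same idea as the name compression step)."""
    return re.sub(r"[^A-Za-z0-9]", "", name).lower()


def similar_name_pairs(names, threshold=0.5):
    """Return pairs of names in one cluster whose compressed forms look alike.

    names: director names belonging to a single connected cluster.
    threshold: minimum SequenceMatcher ratio (0.5 is an arbitrary assumption).
    """
    pairs = []
    for left, right in combinations(names, 2):
        score = SequenceMatcher(None, _compress(left), _compress(right)).ratio()
        if score >= threshold:
            pairs.append((left, right))
    return pairs
```

Flagged pairs would still need a manual review before merging, since a string ratio alone cannot tell two different people with similar names apart.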

### Source code:
The Cypher code to import the data is as follows:  

USING PERIODIC COMMIT  
LOAD CSV WITH HEADERS FROM "file:///addr_node.csv" AS row  
CREATE (:Address {addr_id: row.an_id, addr_addr: row.addr_addr, addr_lon: row.addr_lon, addr_lat: row.addr_lat});  

USING PERIODIC COMMIT  
LOAD CSV WITH HEADERS FROM "file:///n_directors.csv" AS row  
CREATE (:Director {dr_id: row.ds_id, first: row.ds_first, last: row.ds_last, addr_id: row.ds_an_id, postal: row.ds_postal, address: row.ds_fulladdr});  

USING PERIODIC COMMIT  
LOAD CSV WITH HEADERS FROM "file:///n_corp.csv" AS row  
CREATE (:Corporation {corp_id: row.ci_corp_id, addr_id: row.addr_id, postal: row.ds_postal, address: row.ds_fulladdr});  

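CREATE INDEX ON :Address(addr_id);  
CREATE INDEX ON :Director(dr_id);  
CREATE INDEX ON :Director(addr_id);  
CREATE INDEX ON :Corporation(corp_id);  
CREATE INDEX ON :Corporation(addr_id);  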

USING PERIODIC COMMIT  
LOAD CSV WITH HEADERS FROM "file:///n_dir_corp.csv" AS row  
MATCH (dr:Director {dr_id: row.dr_id})  
MATCH (corp:Corporation {corp_id: row.dr_corp_id})  
MERGE (dr)-[:WORKIN]->(corp);  

USING PERIODIC COMMIT  
LOAD CSV WITH HEADERS FROM "file:///n_corp.csv" AS row  
MATCH (addr:Address {addr_id: row.addr_id})  
MATCH (corp:Corporation {corp_id: row.ci_corp_id})  
MERGE (corp)-[:RESIDE]->(addr);  

USING PERIODIC COMMIT  
LOAD CSV WITH HEADERS FROM "file:///n_directors.csv" AS row  
MATCH (addr:Address {addr_id: row.ds_an_id})  
MATCH (dr:Director {dr_id: row.ds_id})  
MERGE (dr)-[:LIVEIN]->(addr);  


Orange is the Corporation  
Purple is the Address  
Blue is the Director  

### Cypher Query
MATCH p=()--()--()--() RETURN p LIMIT 200

CALL algo.betweenness(
'MATCH (n) RETURN id(n) as id',
'MATCH (c)-[:WORKIN]->(d) RETURN id(c) as source, id(d) as target',
{graph:'cypher',  write: true, concurrency:7, direction:'BOTH', writeProperty:'betweenness'});

/* OR  pagerank*/
CALL algo.pageRank(
'MATCH (n) RETURN id(n) as id',
'MATCH (c)-[]-(d) RETURN id(c) as source, id(d) as target, count(*) as weight',
{graph:'cypher', iterations:10, write: true});

#### Return directors who work in more than 20 companies

match (m:Director) where size((:Corporation)--(m)) > 20 return m limit 5

<img src="img/David.png" title="img" />