# Loading the "super-secure" PSC control relationships

Some companies file a record stating that their person with significant control (PSC) is "super-secure". Here we build the control relationships for those records and link the matching companies to them.

In [1]:
import pandas as pd
import json
from pandas.io.json import json_normalize  # in pandas >= 1.0 use: from pandas import json_normalize

import blaze as bz



First, load the data and delete the raw frame once it has been flattened, so we don't waste RAM.

In [3]:
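# Load the snapshot: each row holds a company number and a nested 'data' record
original_psc_data = pd.read_json('../data/psc_snapshot-2017-09-08.json')
# Flatten the nested records into columns and put the company number alongside
all_records_psc = pd.concat([original_psc_data['company_number'], json_normalize(original_psc_data['data'])], axis=1)
# Free the raw frame; only the flattened copy is needed from here on
del original_psc_data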
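
The flattening step can be tried on a tiny frame before loading the full snapshot. This is a minimal sketch, assuming pandas 1.0 or later (for `pd.json_normalize`); the two rows and their company numbers are made up and only mimic the shape of the real data.

```python
import pandas as pd

# Two made-up rows shaped like the snapshot: a company number plus a nested "data" dict.
sample = pd.DataFrame({
    "company_number": ["00000001", "00000002"],
    "data": [
        {"kind": "exemptions",
         "links": {"self": "/company/00000001/exemptions"}},
        {"kind": "super-secure-person-with-significant-control",
         "links": {"self": "/company/00000002/persons-with-significant-control/super-secure/x"}},
    ],
})

# json_normalize turns nested keys into dotted column names such as "links.self".
flat = pd.concat(
    [sample["company_number"], pd.json_normalize(sample["data"].tolist())],
    axis=1,
)
print(list(flat.columns))
```

Nested keys become dotted column names, which is why later cells refer to a `links.self` column.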

In [4]:
all_records_psc.kind.value_counts()

individual-person-with-significant-control          4225140
persons-with-significant-control-statement           404603
corporate-entity-person-with-significant-control     344866
legal-person-person-with-significant-control           5490
super-secure-person-with-significant-control            186
exemptions                                               37
totals#persons-of-significant-control-snapshot            1
Name: kind, dtype: int64

So there are 186 super-secure PSC records in the snapshot. Below we look at the first few, keeping only the columns that have no missing values.

In [5]:
all_records_psc[all_records_psc.kind == 'super-secure-person-with-significant-control'].dropna(axis=1).head()

| | company_number | description | etag | kind | links.self |
|---|---|---|---|---|---|
| 14869 | 10264175 | super-secure-persons-with-significant-control | af4df16232440267ba71f671ea0ffa525133a89b | super-secure-person-with-significant-control | /company/08208688/persons-with-significant-con... |
| 58172 | 5875447 | super-secure-persons-with-significant-control | 2416cda91df75d6743f434d135cd957a64f0314e | super-secure-person-with-significant-control | /company/08594248/persons-with-significant-con... |
| 79679 | 4474577 | super-secure-persons-with-significant-control | 9f07537f806c2b7dbd35dccf641294f99642860a | super-secure-person-with-significant-control | /company/03601075/persons-with-significant-con... |
| 100217 | 10284681 | super-secure-persons-with-significant-control | f9ccb32f7142e1f650013e810f9234200269caac | super-secure-person-with-significant-control | /company/09781806/persons-with-significant-con... |
| 241490 | 5889494 | super-secure-persons-with-significant-control | 2fa2cb97768235ddb365b0f9ff66e5aa6f330c1b | super-secure-person-with-significant-control | /company/05953764/persons-with-significant-con... |


## Inserting the Super-Secure Controlling Entity into Neo4j

Adding a super-secure ControllingEntity node.

In [11]:
from neo4j import GraphDatabase  # older 1.x drivers use: from neo4j.v1 import GraphDatabase
driver = GraphDatabase.driver("bolt://10.0.0.1:7687", auth=("myusername", "mypassword"))

In [12]:
kind = 'super-secure-person-with-significant-control'
with driver.session() as session:
    # $kind is a query parameter; the older {kind} form is not accepted by newer Neo4j servers
    session.run("CREATE (ce:ControllingEntity {type: $kind})", kind=kind)
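
Because `CREATE` adds a new node every time it runs, re-executing the cell above would leave duplicate `ControllingEntity` nodes. A minimal sketch of an idempotent alternative using `MERGE` follows. The helper name `ensure_controlling_entity` and the `RecordingSession` stand-in are hypothetical, and the `$kind` parameter syntax assumes a server that accepts `$`-style parameters.

```python
def ensure_controlling_entity(session, kind):
    """Create the ControllingEntity node only if it does not exist yet."""
    # MERGE matches an existing node with this type, or creates one if none is found.
    query = "MERGE (ce:ControllingEntity {type: $kind})"
    return session.run(query, kind=kind)


class RecordingSession:
    """Hypothetical stand-in for a driver session; it records calls instead of running them."""

    def __init__(self):
        self.calls = []

    def run(self, query, parameters=None, **kwargs):
        merged = dict(parameters or {})
        merged.update(kwargs)
        self.calls.append((query, merged))


fake = RecordingSession()
ensure_controlling_entity(fake, "super-secure-person-with-significant-control")
print(fake.calls[0])
```

With a real driver the call would sit inside `with driver.session() as session:` in the same way as the cell above.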

## Now to connect companies to statements

1. First, let's define a function that takes company records and creates the relationship back to the super-secure ControllingEntity.
2. Then we take all companies that have declared a super-secure PSC and insert them into the Neo4j database.

In [9]:
def write_super_secure_control(input_data):
    """Write super-secure records to the Neo4j database."""
    with driver.session() as session:
        # UNWIND sends all rows in one query; MERGE avoids duplicate nodes and relationships
        session.run(("UNWIND $list AS d "
                     "MERGE (c:Company {uid: d.company_id}) "
                     "MERGE (ce:ControllingEntity {type: d.kind}) "
                     "MERGE (c)-[:CONTROLS]->(ce);"),
                    {"list": input_data})

We don't have too many super-secure people, so we can do this in a single insert.

In [8]:
# .copy() gives an independent frame, so adding a column does not raise SettingWithCopyWarning
super_secure = all_records_psc[all_records_psc.kind == 'super-secure-person-with-significant-control'].copy()
super_secure['company_id'] = super_secure['links.self'].map(lambda x: x.split('/')[2])


In [13]:
# One dict per row, e.g. {'company_id': ..., 'kind': ...}
input_data = super_secure[['company_id', 'kind']].to_dict('records')
write_super_secure_control(input_data)
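
With only 186 rows a single `UNWIND` is fine. For a much larger `kind` the rows would likely need to be sent in batches, and the sketch below shows one way to slice the list. The helper name `chunked` and the batch size are assumptions, not something this notebook defines.

```python
def chunked(rows, size):
    """Yield consecutive slices of rows, each with at most size items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


# Made-up rows in the same shape as input_data.
rows = [{"company_id": "%08d" % i, "kind": "exemptions"} for i in range(5)]
batches = list(chunked(rows, 2))
print([len(b) for b in batches])  # [2, 2, 1]
```

Each batch could then be passed to the writer function in turn, for example `write_super_secure_control(batch)`.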

## Considering the "exemptions"

Exemptions behave in a very similar way, so we handle them here too, but with an "IS_EXEMPT" relationship.

In [14]:
kind = 'exemptions'
with driver.session() as session:
    # $kind is a query parameter; the older {kind} form is not accepted by newer Neo4j servers
    session.run("CREATE (ce:ControllingEntity {type: $kind})", kind=kind)

In [15]:
def write_exemption_control(input_data):
    """Write exemption records to the Neo4j database."""
    with driver.session() as session:
        # Same pattern as the super-secure writer, but with an IS_EXEMPT relationship
        session.run(("UNWIND $list AS d "
                     "MERGE (c:Company {uid: d.company_id}) "
                     "MERGE (ce:ControllingEntity {type: d.kind}) "
                     "MERGE (c)-[:IS_EXEMPT]->(ce);"),
                    {"list": input_data})

In [16]:
# .copy() gives an independent frame, so adding a column does not raise SettingWithCopyWarning
exempt = all_records_psc[all_records_psc.kind == 'exemptions'].copy()
exempt['company_id'] = exempt['links.self'].map(lambda x: x.split('/')[2])
  


In [17]:
# One dict per row, e.g. {'company_id': ..., 'kind': ...}
input_data = exempt[['company_id', 'kind']].to_dict('records')
write_exemption_control(input_data)
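
The two writer functions differ only in the relationship type. Cypher does not accept a parameter in place of a relationship type, so a shared helper has to build the query text itself. The sketch below does that against a fixed allow-list. `build_link_query` and `ALLOWED_RELATIONSHIPS` are hypothetical names, and `$list` assumes a server that accepts `$`-style parameters.

```python
ALLOWED_RELATIONSHIPS = {"CONTROLS", "IS_EXEMPT"}


def build_link_query(rel_type):
    """Return the UNWIND/MERGE query for one of the allowed relationship types."""
    # Only names from the allow-list are interpolated, never free-form text.
    if rel_type not in ALLOWED_RELATIONSHIPS:
        raise ValueError("unsupported relationship type: %r" % (rel_type,))
    return ("UNWIND $list AS d "
            "MERGE (c:Company {uid: d.company_id}) "
            "MERGE (ce:ControllingEntity {type: d.kind}) "
            "MERGE (c)-[:%s]->(ce)" % rel_type)


print(build_link_query("IS_EXEMPT"))
```

A single writer could then call `session.run(build_link_query("IS_EXEMPT"), {"list": input_data})` for exemptions and pass `"CONTROLS"` for the super-secure records.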