# General Dutch Survey

This notebook searches the Offshore Leaks database for entities and officers with a connection to the Netherlands.

## Preliminaries

Please note the general set-up requirements described in the repo readme and the _showcase_ notebook.
The database needs to be running locally for this notebook to work.

In [None]:
#imports
import os  #to find the settings file(s)
import csv #to process the settings file(s)
import shutil #to copy the settings file (if needed)
from neo4j import GraphDatabase
import pandas as pd

In [None]:
#get settings
settings_dir = os.path.join("..","settings")
personal_settings = os.path.join(settings_dir,"personal_settings.csv")
if not os.path.exists(personal_settings):
    default_settings = os.path.join(settings_dir,"default_settings.csv")
    shutil.copy(default_settings, personal_settings)
    print("Created new personal settings file, this probably needs to be edited before proceeding.")
with open(personal_settings, mode = 'r') as file:
    user_settings = {}
    for line in csv.DictReader(file):
        user_settings[line['setting']] = line['value']
db_uri = "bolt://localhost:" + str(user_settings['port_number'])
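
The settings cell above expects a CSV file with a `setting` column and a `value` column. The sketch below shows that parsing step on an in-memory example. The keys (`port_number`, `username`, `password`, `db_name`) are the ones this notebook reads; all values, and the names `example_settings_csv` and `read_settings`, are hypothetical placeholders for illustration only.

```python
import csv
import io

# Hypothetical file contents; the real values live in settings/personal_settings.csv.
example_settings_csv = """setting,value
port_number,7687
username,neo4j
password,change-me
db_name,neo4j
"""


def read_settings(file_obj):
    """Turn rows of (setting, value) into a plain dict, as the settings cell does."""
    return {row["setting"]: row["value"] for row in csv.DictReader(file_obj)}


example_settings = read_settings(io.StringIO(example_settings_csv))
example_uri = "bolt://localhost:" + str(example_settings["port_number"])
print(example_uri)
```

All values come back as strings, which is why the port number can be concatenated into the URI directly.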

In [None]:
#data path
data_root = os.path.join("..","data")
data_david = os.path.join(data_root,"extracts","david")
data_david

In [None]:
db_connection = GraphDatabase.driver(db_uri, auth=(user_settings['username'],user_settings['password']))

In [None]:
db_session = db_connection.session(database=user_settings['db_name'])

## Find Dutch Addresses

In [None]:
#look at entities
query = "match (n:Entity) where n.countries ends with 'Netherlands' return n"
query_response = db_session.run(query)
entities_nl = pd.DataFrame([dict(record.data()['n']) for record in query_response])
entities_nl

In [None]:
#look at officers
query = "match (n:Officer) where n.countries ends with 'Netherlands' return n"
query_response = db_session.run(query)
officers_nl = pd.DataFrame([dict(record.data()['n']) for record in query_response])
officers_nl
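
The two queries above differ only in the node label, and the TODO list at the end of this notebook suggests trying `contains` instead of `ends with`. A minimal sketch of a shared query builder follows. It assumes that `countries` is a plain string property, and the helper name `build_country_query` is hypothetical. The country is passed as a Cypher parameter (`$country`); labels and operators cannot be parameterised in Cypher, so they are checked against a fixed allow-list before being placed in the query text.

```python
ALLOWED_LABELS = {"Entity", "Officer"}         # the two labels queried in this notebook
ALLOWED_OPERATORS = {"ENDS WITH", "CONTAINS"}  # Cypher string-matching operators


def build_country_query(label, operator="CONTAINS"):
    """Return Cypher text that filters nodes of `label` on their countries property."""
    if label not in ALLOWED_LABELS:
        raise ValueError(f"unexpected label: {label!r}")
    operator = operator.upper()
    if operator not in ALLOWED_OPERATORS:
        raise ValueError(f"unexpected operator: {operator!r}")
    return f"MATCH (n:{label}) WHERE n.countries {operator} $country RETURN n"


query = build_country_query("Entity")
params = {"country": "Netherlands"}
print(query)

# With a live session this would be run as:
# query_response = db_session.run(query, params)
```

If `countries` can hold more than one country in a single string, `CONTAINS` would also match records where the Netherlands is not listed last, which `ENDS WITH` would miss.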

## Load David's Results

David separately extracted data on entities and officers based in the Netherlands.

In [None]:
#file names
entities_file_david = "entities_nl_address.csv"
officers_file_david = "officers_nl_address.csv"

In [None]:
#entities david
entities_david = pd.read_csv(os.path.join(data_david,entities_file_david))
entities_david

In [None]:
#officers david
officers_david = pd.read_csv(os.path.join(data_david,officers_file_david))
officers_david
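
One possible way to compare the query results with David's extracts is to compare the sets of record identifiers on each side. The sketch below uses toy identifiers only. It assumes that both data sets share an identifier column, for example `node_id`, which is an assumption and not confirmed by this notebook; the helper name `compare_ids` is likewise hypothetical.

```python
def compare_ids(own_ids, other_ids):
    """Split two collections of identifiers into shared and one-sided sets."""
    own, other = set(own_ids), set(other_ids)
    return {
        "both": own & other,        # found by both extractions
        "only_own": own - other,    # found only by the queries in this notebook
        "only_other": other - own,  # found only in the other extract
    }


# Toy identifiers, purely for illustration.
comparison = compare_ids([1, 2, 3], [2, 3, 4])
print({key: sorted(value) for key, value in comparison.items()})

# With the DataFrames in this notebook, and assuming a shared `node_id` column:
# compare_ids(entities_nl["node_id"], entities_david["node_id"])
```

The identifiers on both sides need the same type (for example, both integers) for the set operations to match them up.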

In [None]:
#TODO: compare data with David's explicitly
#TODO: look for 'contains' rather than 'ends with'
#TODO: make composite search that finds all with Dutch connection
#TODO: make 2nd generation (or further) matching based on full Dutch datasets
#TODO: have summary statistics to compare prevalence of NL in the datasets (ideally by dataset)