# Dashboard Data Generator

This notebook generates the data that powers the dashboard.
It writes CSV extracts from the full dataset that contain only the data the dashboard needs.

## Preliminaries

Please note the general set-up requirements described in the repo readme and the _showcase_ notebook.
The database must be running locally for this notebook to work.

In [24]:
#imports
import os  #to find the settings file(s)
import csv #to process the settings file(s)
import shutil #to copy the settings file (if needed)
from neo4j import GraphDatabase
import pandas as pd
from geopy.geocoders import Nominatim

In [2]:
#get settings
settings_dir = os.path.join("..","settings")
personal_settings = os.path.join(settings_dir,"personal_settings.csv")
if "personal_settings.csv" not in os.listdir(settings_dir):
    default_settings = os.path.join(settings_dir,"default_settings.csv")
    shutil.copy(default_settings, personal_settings)
    print("Created new personal settings file; this probably needs to be edited before proceeding.")
with open(personal_settings, mode = 'r') as file:
    user_settings = {}
    for line in csv.DictReader(file):
        user_settings[line['setting']] = line['value']
db_uri = "bolt://localhost:" + str(user_settings['port_number'])
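The settings cell above assumes a CSV with `setting` and `value` columns, and later cells read the keys `port_number`, `username`, `password` and `db_name`. A minimal sketch of the same parsing step with an early check for missing keys might look like this. `load_settings`, `REQUIRED_SETTINGS` and the sample values are illustrative placeholders, not part of the repo.

```python
import csv
import io

# Keys that later cells of this notebook read from the settings file.
REQUIRED_SETTINGS = ("port_number", "username", "password", "db_name")

def load_settings(file_obj, required=REQUIRED_SETTINGS):
    """Parse a 'setting,value' CSV into a dict and fail early on missing keys."""
    settings = {row["setting"]: row["value"] for row in csv.DictReader(file_obj)}
    missing = [key for key in required if key not in settings]
    if missing:
        raise KeyError("Missing settings: " + ", ".join(missing))
    return settings

# Placeholder content standing in for personal_settings.csv (values are made up).
sample_csv = io.StringIO(
    "setting,value\n"
    "port_number,7687\n"
    "username,neo4j\n"
    "password,change-me\n"
    "db_name,neo4j\n"
)
sample_settings = load_settings(sample_csv)
sample_uri = "bolt://localhost:" + sample_settings["port_number"]
```

Failing at load time makes a half-edited settings file show up here rather than as a connection error further down.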

In [10]:
#data path
data_output = os.path.join("..","dashboard","data")
os.listdir(data_output)

['temp_data.csv']

In [11]:
db_connection = GraphDatabase.driver(db_uri, auth=(user_settings['username'],user_settings['password']))

In [12]:
db_session = db_connection.session(database=user_settings['db_name'])

In [38]:
#functions to add coordinates to an address
geolocator = Nominatim(user_agent="nl-application")  #initiate external tool to get coordinates from addresses

def add_coordinates(row):  #row-wise helper (for DataFrame.apply) that adds coordinates to a row in a dataframe
    temp_location = geolocator.geocode(row['address'])
    if temp_location:  #geocode returns None when the address is not found
        row['longitude'] = temp_location.longitude
        row['latitude'] = temp_location.latitude
    return row
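The helper above only sets the coordinate fields when a match is found and sends requests as fast as `apply` calls it. As far as we know, the public Nominatim service asks for at most one request per second, so a larger run probably needs a pause between calls. The sketch below is one way to do that; `make_coordinate_adder`, `add_coordinates_safe` and `fake_geocode` are hypothetical names, and the stand-in geocoder only exists so the example runs offline.

```python
import time
from types import SimpleNamespace

def make_coordinate_adder(geocode, min_delay_seconds=1.0, sleep=time.sleep):
    """Return a row function that geocodes row['address'].

    geocode: callable taking an address string and returning an object with
             .latitude and .longitude, or None (for example geolocator.geocode).
    """
    def add_coordinates_safe(row):
        location = geocode(row["address"])
        # Always set both fields so rows without a match carry explicit None values.
        row["longitude"] = location.longitude if location else None
        row["latitude"] = location.latitude if location else None
        sleep(min_delay_seconds)  # wait before the next request
        return row
    return add_coordinates_safe

# Offline demonstration with a stand-in geocoder, so no network call is made.
def fake_geocode(address):
    if "Sittard" in address:
        return SimpleNamespace(latitude=51.017214, longitude=5.8690725)
    return None

adder = make_coordinate_adder(fake_geocode, min_delay_seconds=0, sleep=lambda seconds: None)
found = adder({"address": "10 Langs de Heij Sittard, The Netherlands"})
not_found = adder({"address": "31 MAIN STREET EDENHAM; BOURNE LINCS; PE10 OLL"})
```

With the real geocoder this would be used as `short_addresses.apply(make_coordinate_adder(geolocator.geocode), axis=1)`. geopy also ships `geopy.extra.rate_limiter.RateLimiter`, which may be a better fit than a hand-rolled delay.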

## Create File Listing NL Addresses by Leak

In [63]:
query = "MATCH (n:Address) WHERE n.country_codes CONTAINS 'NLD' RETURN n"
query_response = db_session.run(query)
addresses_nl = pd.DataFrame([dict(record.data()['n']) for record in query_response])

In [41]:
addresses_nl = addresses_nl[['address','leak']]

In [42]:
addresses_nl.head(5)

| | address | leak |
|---|---|---|
| 0 | 10 Langs de Heij Sittard, The Netherlands | Panama Papers |
| 1 | 31 MAIN STREET EDENHAM; BOURNE LINCS; PE10 OLL | Panama Papers |
| 2 | 35 Konijnenlaan; Wassenaar; The Netherlands | Panama Papers |
| 3 | 4 OF GALON STR.; WOLFRATESHOUSEN 82515; THE NE... | Panama Papers |
| 4 | 5 Konijnenlaan; Wassenaar; The Netherlands | Panama Papers |


In [43]:
location = geolocator.geocode(addresses_nl.iloc[0]['address'])
location

Location(10, Langs de Heij, Noord, Sittard, Sittard-Geleen, Limburg, Nederland, 6136 KR, Nederland, (51.017214, 5.8690725, 0.0))

In [29]:
location.longitude

5.8690725

In [30]:
location.latitude

51.017214

In [44]:
temp_row = addresses_nl.iloc[0].copy()  #work on a copy so the source dataframe is not modified
add_coordinates(temp_row)

address      10 Langs de Heij Sittard, The Netherlands
leak                                     Panama Papers
longitude                                     5.869072
latitude                                     51.017214
Name: 0, dtype: object

In [53]:
short_addresses = addresses_nl.iloc[0:100]  #limit to the first 100 addresses for a trial run

In [58]:
short_addresses = short_addresses.apply(add_coordinates, axis=1)

In [59]:
short_addresses

| | address | latitude | leak | longitude |
|---|---|---|---|---|
| 0 | 10 Langs de Heij Sittard, The Netherlands | 51.017214 | Panama Papers | 5.869072 |
| 1 | 31 MAIN STREET EDENHAM; BOURNE LINCS; PE10 OLL | | Panama Papers | |
| 2 | 35 Konijnenlaan; Wassenaar; The Netherlands | 52.124550 | Panama Papers | 4.368802 |
| 3 | 4 OF GALON STR.; WOLFRATESHOUSEN 82515; THE NE... | | Panama Papers | |
| 4 | 5 Konijnenlaan; Wassenaar; The Netherlands | 52.129700 | Panama Papers | 4.377273 |
| ... | ... | ... | ... | ... |
| 95 | KAYA RICHARD J BEAUJON CURACAO THE NETHERLANDS | | Panama Papers | |
| 96 | Keizersgracht 369B; 1016 EJ Amsterdam; The Net... | 52.368339 | Panama Papers | 4.884904 |
| 97 | KERKSTRASD 43 A; 2271 CR VOORBURG; THE NETHERL... | | Panama Papers | |
| 98 | Klagerstuin 47; 1689; JP Zwaag; Netherlands | 52.666846 | Panama Papers | 5.058582 |


In [64]:
#addresses_nl.apply(add_coordinates, axis=1)  #full run over all NL addresses, left disabled for now

In [60]:
short_addresses.to_csv(os.path.join(data_output,"addresses_nl.csv"))

In [65]:
#TODO: run on more addresses
#TODO: include more data columns in the addresses data set
#TODO: deal with addresses not found automatically
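One possible starting point for the last TODO above is to retry a failed lookup with a tidied-up address string, since many of the raw addresses separate their parts with semicolons. This is only a sketch under that assumption. `normalise_address`, `geocode_with_fallback` and `comma_only_geocode` are hypothetical names, and whether the cleaned form actually improves the Nominatim hit rate has not been checked here.

```python
def normalise_address(raw_address):
    """Turn ';'-separated address parts into a comma-separated string."""
    # Collapse repeated whitespace inside each part and drop empty parts.
    parts = [" ".join(part.split()) for part in raw_address.split(";")]
    return ", ".join(part for part in parts if part)

def geocode_with_fallback(geocode, raw_address):
    """Try the raw address first, then the normalised form if nothing was found."""
    location = geocode(raw_address)
    if location is None:
        cleaned = normalise_address(raw_address)
        if cleaned != raw_address:
            location = geocode(cleaned)
    return location

# Offline stand-in geocoder that only "finds" addresses without semicolons.
def comma_only_geocode(address):
    return None if ";" in address else ("matched", address)

cleaned_example = normalise_address("35 Konijnenlaan;  Wassenaar; The Netherlands")
fallback_example = geocode_with_fallback(
    comma_only_geocode, "35 Konijnenlaan; Wassenaar; The Netherlands"
)
```

Addresses that still return nothing after the retry could be written to a separate file for manual review instead of being dropped silently.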