Linking IDs
==

The goal of this notebook is to take the existing database of stations and add a new column holding the Google place ID for each station. This is done with the Google Nearby Search API, issuing one request per station.

In [3]:
import config
from urllib.parse import urlencode
import requests
import pandas as pd
import json

Fetch the existing station data. Each station's latitude and longitude will be used to try to find the Google place for that station.

In [25]:
stations = pd.read_sql("stations", config.CONNECTION_STRING)
stations = stations.set_index("number")
stations

number,name,address,latitude,longitude
42,SMITHFIELD NORTH,Smithfield North,53.349562,-6.278198
30,PARNELL SQUARE NORTH,Parnell Square North,53.353462,-6.265305
54,CLONMEL STREET,Clonmel Street,53.336021,-6.262980
108,AVONDALE ROAD,Avondale Road,53.359405,-6.276142
56,MOUNT STREET LOWER,Mount Street Lower,53.337960,-6.241530
...,...,...,...,...
39,WILTON TERRACE,Wilton Terrace,53.332383,-6.252717
83,EMMET ROAD,Emmet Road,53.340714,-6.308191
92,HEUSTON BRIDGE (NORTH),Heuston Bridge (North),53.347802,-6.292432
21,LEINSTER STREET SOUTH,Leinster Street South,53.342180,-6.254485


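I imagine there is an easier way than this shorthand... `stations.loc[number, field]` performs the same lookup, assuming station numbers are unique in the index.

In [28]:
def station_value(number, field):
    # Label-based lookup on the "number" index; returns a scalar when the index is unique
    return stations.loc[number, field]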

station_value(88, "name")

'BLACKHALL PLACE'

For each station, search for nearby Google places. Check that the place matches, i.e. that the name, address and place type are all as expected. Store the place IDs.

Also, check for duplicates, in case the above isn't enough to uniquely specify a place.
(It actually was enough, so this step could have been omitted.)

In [88]:
def check_entry(entry, station_number):
    
    if entry["name"] != "Dublinbikes":
        print("Unexpected entry name")
        print("Expected: Dublinbikes")
        print("Actual:", entry["name"])
        return False
    
    expected_vicinity = station_value(station_number, "name").lower()
    actual_vicinity = entry["vicinity"].lower()
    
    vicinity_pass = expected_vicinity in actual_vicinity
    
    if not vicinity_pass:
        
        # Applied in order as plain substring replacements
        exceptions = [
            ["rd", "road"],
            ["haroadwicke", "hardwicke"], # undoes the "rd" -> "road" replacement, which also hits the "rd" inside "hardwicke"
            ["st", "street"]
        ]

        for item, replacement in exceptions:
            actual_vicinity = actual_vicinity.replace(item, replacement)
            
        vicinity_pass = expected_vicinity in actual_vicinity
    
    if not vicinity_pass:
        print("Entry vicinity not recognized")
        print("Expected vicinity to contain:", expected_vicinity)
        print("Actual:", actual_vicinity)
        return False
    
    if entry["types"] != ['point_of_interest', 'establishment']:
        print("Unexpected entry types")
        print("Entry types:", entry["types"])
        return False
    
    return True


def parse_response(response, station_number):
    
    entries = response["results"]
    if len(entries) == 0:
        print("No api results")
        return
    
    if check_entry(entries[0], station_number):
        
        print("PASS (Tentative)")
        
        print("Checking alternatives:")
        for entry in entries[1:]:
            if check_entry(entry, station_number):
                print("Duplicate suitable entries")
                return None
        
        print("No Duplicates")
        print("PASS (Confirmed)")
        
        return entries[0]["place_id"]
    
    else:
        
        print("FAIL")
    
    
def query_api():
    
    URL = "https://maps.googleapis.com/maps/api/place/nearbysearch/json"

    params = {
        "key": config.GCP_API_KEY,
        "keyword": "Dublinbikes",
        "radius": 2
    }
    
    results = list()
    
    for number in stations.index:
        
        print("Querying for station number: ", number)
        
        lat = station_value(number, "latitude")
        lon = station_value(number, "longitude")
        
        # Encode parameters to fit required format
        params["location"] = f"{lat},{lon}"
        params_encoded = urlencode(params)

        response = requests.get(f"{URL}?{params_encoded}")
        result = parse_response(response.json(), number)
        
        results.append(result)
        
        if result is None:
            print("Query failed")
        
        print()
            
    return results
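
The loop above assumes every response carries a usable `results` list. As a rough sketch only, the top-level `status` field of the JSON payload could be inspected before parsing, so that a rejected request is not mistaken for a station with no match. The status strings below reflect my reading of the Places API documentation, and `classify_response` is a hypothetical helper that does not appear in this notebook.

```python
def classify_response(payload):
    """Return 'ok', 'empty', or 'error' for a Nearby Search JSON payload.

    Assumption: the payload has a top-level "status" string such as
    "OK", "ZERO_RESULTS" or "REQUEST_DENIED", plus a "results" list.
    """
    status = payload.get("status")
    has_results = bool(payload.get("results"))

    if status == "OK" and has_results:
        return "ok"
    if status == "ZERO_RESULTS" or (status == "OK" and not has_results):
        return "empty"
    # Anything else (denied key, quota exceeded, malformed request, ...)
    return "error"


# Hand-written sample payloads, not real API output
print(classify_response({"status": "OK", "results": [{"place_id": "abc"}]}))
print(classify_response({"status": "ZERO_RESULTS", "results": []}))
print(classify_response({"status": "REQUEST_DENIED", "results": []}))
```

With something like this in place, `parse_response` would only need to run for the "ok" case, and the "error" case could stop the loop early instead of burning through the remaining stations.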
        

This takes quite a while to run, and outputs lots of debug text.

In [89]:
#results = query_api()
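
Since the query is slow, one option is to save the results to disk once and reload them in later sessions. The following is a minimal sketch using the standard `json` module; the file name and the `save_results` / `load_results` helpers are hypothetical, and the place IDs are placeholders rather than real values.

```python
import json
import os
import tempfile


def save_results(path, numbers, place_ids):
    """Write a {station_number: place_id} mapping to a JSON file."""
    # JSON object keys must be strings, so station numbers are converted
    mapping = {str(n): pid for n, pid in zip(numbers, place_ids)}
    with open(path, "w") as f:
        json.dump(mapping, f, indent=2)


def load_results(path, numbers):
    """Read the mapping back, returning place IDs in the order of `numbers`."""
    with open(path) as f:
        mapping = json.load(f)
    return [mapping.get(str(n)) for n in numbers]


# Placeholder data for illustration only
numbers = [42, 30, 54]
place_ids = ["id-a", None, "id-c"]

path = os.path.join(tempfile.mkdtemp(), "place_ids.json")
save_results(path, numbers, place_ids)
restored = load_results(path, numbers)
print(restored)
```

Keying the file by station number rather than storing a bare list means the IDs can still be matched up if the row order of `stations` changes between runs.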

Print out the results. By checking which entries are None and reading the debug text above, identify issues and correct them in the query and checking code.

At the time of writing, the only issue was that some of the address names don't quite match up (things like "rd" vs "road"). Adding some exceptions to the check function to allow for these solved that.
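
One thing to watch with substring replacement is that "rd" also occurs inside words such as "hardwicke". As an alternative sketch (not what this notebook does), the abbreviations could be expanded only where they stand as whole words, using a regular expression with word boundaries. The `ABBREVIATIONS` table and `expand_abbreviations` helper are hypothetical names.

```python
import re

# Hypothetical abbreviation table; extend as mismatches turn up
ABBREVIATIONS = {"rd": "road", "st": "street"}

# \b anchors the match to word boundaries, so "rd" inside "hardwicke" is skipped
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(a) for a in ABBREVIATIONS) + r")\b"
)


def expand_abbreviations(text):
    """Lower-case `text` and expand whole-word abbreviations."""
    return _PATTERN.sub(lambda m: ABBREVIATIONS[m.group(1)], text.lower())


# Plain substring replacement also rewrites the middle of "hardwicke"
print("hardwicke st".replace("rd", "road"))         # haroadwicke st

# Word-boundary replacement leaves it alone
print(expand_abbreviations("Hardwicke St"))         # hardwicke street
print(expand_abbreviations("Avondale Rd, Dublin"))  # avondale road, dublin
```

This removes the need for a corrective entry per affected street name, at the cost of a slightly less obvious pattern.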

In [91]:
for n, v in zip(stations.index, results):
    print(n, "-", v)

42 - ChIJl6nGmC4MZ0gRq6m5nCcTmCY
30 - ChIJpzSJ4YAOZ0gRcS_0fRVA1vw
54 - ChIJWx42l_UPZ0gRqDH4gKKc-1E
108 - ChIJ__CpR9QNZ0gRTXOjB2BzBhE
56 - ChIJU9evBpUOZ0gR2TNeEdKoEf8
6 - ChIJZZ2zvicMZ0gRuoXtGdylX2g
18 - ChIJc8g9AaAOZ0gRYRbEdVADDdQ
32 - ChIJebuNs5EOZ0gR2Qxq5KKDY3c
52 - ChIJDaMSk54OZ0gRPE5WS70vVTc
48 - ChIJi1CmCo0OZ0gR_eHhaTuaehc
13 - ChIJBxpeGpgOZ0gRiU9p0tIESwI
43 - ChIJtXKNYB4MZ0gRa3Ppb3v5En0
31 - ChIJIwZbY4EOZ0gRURYyCk07XAU
98 - ChIJf8EUNZoOZ0gRYK0XymT6q0g
23 - ChIJ03peUI4OZ0gRJNCRMlgbOMI
106 - ChIJDzge62INZ0gRIkq3fKokrjE
112 - ChIJV_dvFGIOZ0gRD_8GpooXZHQ
68 - ChIJkdnFAe0OZ0gRmjD_HV4iv_4
74 - ChIJuTGlWyUMZ0gRSrk2-b0CIWg
87 - ChIJGXJoujEMZ0gRaVKWHprVhiI
84 - ChIJ_6Ijx0YMZ0gRdrjPSqb3k0Q
90 - ChIJKUnILOwOZ0gR3tGW7SfGroM
11 - ChIJD8uzA6IOZ0gRvRwSC14fgFY
17 - ChIJAyXEXicMZ0gRYC-VnODqKR4
45 - ChIJa03uq4gOZ0gRDszTCWwb7Jo
114 - ChIJW9bFC70OZ0gRj5TIF2L1AEA
72 - ChIJ96CuvSUMZ0gRzQL0R7BTCtM
63 - ChIJf8O3zpsPZ0gRKA5WsA5x6l8
113 - ChIJY4Uw-5YOZ0gRFBucobh9Jyc
91 - ChIJi_POs-4OZ0gRL6_IWmTgIwA
99 - C
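
Rather than scanning the printed list by eye, the two conditions of interest (stations with no match, and place IDs assigned to more than one station) can be pulled out directly. This is a small sketch on placeholder data; `unmatched_stations` and `shared_place_ids` are hypothetical helpers.

```python
from collections import Counter


def unmatched_stations(numbers, place_ids):
    """Station numbers whose lookup returned None."""
    return [n for n, pid in zip(numbers, place_ids) if pid is None]


def shared_place_ids(place_ids):
    """Place IDs that were assigned to more than one station."""
    counts = Counter(pid for pid in place_ids if pid is not None)
    return sorted(pid for pid, count in counts.items() if count > 1)


# Placeholder data for illustration only
numbers = [42, 30, 54, 108]
place_ids = ["id-a", None, "id-a", "id-c"]

print(unmatched_stations(numbers, place_ids))  # [30]
print(shared_place_ids(place_ids))             # ['id-a']
```

In the notebook this would be called as `unmatched_stations(stations.index, results)`; both lists should be empty before the column is written back.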

In [77]:
stations["google_place_id"] = results

In [78]:
stations

number,name,address,latitude,longitude,google_place_id
42,SMITHFIELD NORTH,Smithfield North,53.349562,-6.278198,ChIJl6nGmC4MZ0gRq6m5nCcTmCY
30,PARNELL SQUARE NORTH,Parnell Square North,53.353462,-6.265305,ChIJpzSJ4YAOZ0gRcS_0fRVA1vw
54,CLONMEL STREET,Clonmel Street,53.336021,-6.262980,ChIJWx42l_UPZ0gRqDH4gKKc-1E
108,AVONDALE ROAD,Avondale Road,53.359405,-6.276142,ChIJ__CpR9QNZ0gRTXOjB2BzBhE
56,MOUNT STREET LOWER,Mount Street Lower,53.337960,-6.241530,ChIJU9evBpUOZ0gR2TNeEdKoEf8
...,...,...,...,...,...
39,WILTON TERRACE,Wilton Terrace,53.332383,-6.252717,ChIJVwWa36IOZ0gRgl0BvRxLWOI
83,EMMET ROAD,Emmet Road,53.340714,-6.308191,ChIJg3ZvJEUMZ0gRiYTQYicmPz8
92,HEUSTON BRIDGE (NORTH),Heuston Bridge (North),53.347802,-6.292432,ChIJKUXe7DYMZ0gR2HthL72fBTc
21,LEINSTER STREET SOUTH,Leinster Street South,53.342180,-6.254485,ChIJX2c5bpoOZ0gRbYbhn1Btnw0


Finally, commit the results to the database. 

In [86]:
# stations.to_sql("stations", config.CONNECTION_STRING, if_exists="replace")
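
Note that `if_exists="replace"` drops the existing table and recreates it from the DataFrame, so keys or column types defined outside pandas may not survive. A more conservative alternative, sketched here under assumptions, is to add the column and update rows in place. The example uses an in-memory SQLite database purely for illustration; the real database behind `config.CONNECTION_STRING` is unknown, and the place IDs are placeholders.

```python
import sqlite3

# Stand-in for the real database: an in-memory table with two stations
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stations (number INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO stations VALUES (?, ?)",
    [(42, "SMITHFIELD NORTH"), (30, "PARNELL SQUARE NORTH")],
)

# Add the new column without touching the existing schema
conn.execute("ALTER TABLE stations ADD COLUMN google_place_id TEXT")

# Placeholder IDs keyed by station number
place_ids = {42: "placeholder-id-a", 30: "placeholder-id-b"}
conn.executemany(
    "UPDATE stations SET google_place_id = ? WHERE number = ?",
    [(pid, number) for number, pid in place_ids.items()],
)
conn.commit()

rows = conn.execute(
    "SELECT number, name, google_place_id FROM stations ORDER BY number"
).fetchall()
print(rows)
```

The primary key on `number` is kept here, which is the main thing a drop-and-recreate would put at risk.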