### Place Name Lookup with Geocoding API
    Project: FamilySearch Tests
    Author:  Benedikt Graf
    Version: 10-20-2020

I am using Google Maps' Geocoding API to identify the country and county (federal state) of each place observation, which is a free-text place string. <BR>
These data are used to identify the place (county) of birth and death of individuals for a study of transgenerational health effects. <BR>
I included a sample of 1000 queries in the "places_API" file.

#### Importing Modules

In [2]:
import numpy # for arrays
import pandas # for panel data
import requests # for handling API requests
from urllib.parse import urlencode # to encode query parameters into a URL query string
import json # working with json files
import geocoder # Simple and consistent geocoding library
import os # miscellaneous operating system interfaces
from IPython.display import display, HTML # to adjust display preferences

In [3]:
# lay out consecutive outputs side by side (e.g. the two tables displayed below)
CSS = """
.output {
    flex-direction: row;
}
"""

HTML('<style>{}</style>'.format(CSS))

#### Importing the places16 Data

In [19]:
os.getcwd() # current working directory
places16 = pandas.read_stata(r"G:\My Drive\Research\places16corrected.dta") # importing place name data (raw string, so backslashes are not escape sequences)
places16 = places16.rename(columns={"country": "country_corrected", "county": "county_corrected"}) # renaming some columns
places16_sample_wcorr = places16.sample(1000) # this creates a sample from places16
places16_sample = places16_sample_wcorr.drop(columns=['country_corrected', 'county_corrected']) 
# this drops the existing country/county variable in the sample which simplifies sending it to Google Maps
display(places16_sample.head())
display(places16_sample_wcorr.head())

|        | place |
|--------|-------|
| 12656  | Blixtorpsbacke: Fridene: Skaraborg: Sweden |
| 50902  | Kafvsjmla Sdragrd: lmeboda: Kronoberg: Sweden |
| 54016  | Kllbyn: Varmlands: Sweden |
| 122336 | lebckss: Ljuder: Kronoberg: Sweden |
| 102129 | Stocken: Morlanda: Gteborg och Bohus: Sweden |


|        | place | country_corrected | county_corrected |
|--------|-------|-------------------|------------------|
| 12656  | Blixtorpsbacke: Fridene: Skaraborg: Sweden | Sweden | Skaraborg |
| 50902  | Kafvsjmla Sdragrd: lmeboda: Kronoberg: Sweden | Sweden | Kronoberg |
| 54016  | Kllbyn: Varmlands: Sweden | Sweden | Varmland |
| 122336 | lebckss: Ljuder: Kronoberg: Sweden | Sweden | Kronoberg |
| 102129 | Stocken: Morlanda: Gteborg och Bohus: Sweden | Sweden | Goteborg och Bohus |
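`places16.sample(1000)` draws a different random sample every time the cell runs, so the rows sent to the API (and billed) change between runs. A minimal sketch, assuming a fixed seed is acceptable for the study: passing `random_state` to `DataFrame.sample` makes the draw repeatable. The toy data frame and the seed value 2020 are hypothetical.

```python
import pandas

# hypothetical stand-in for places16 with the two corrected columns
places = pandas.DataFrame({
    "place": [f"place {i}" for i in range(100)],
    "country_corrected": ["Sweden"] * 100,
    "county_corrected": ["Kronoberg"] * 100,
})

# the same random_state always selects the same rows
sample_wcorr = places.sample(10, random_state=2020)
sample_again = places.sample(10, random_state=2020)

# drop the corrected columns before geocoding, as in the cell above
sample = sample_wcorr.drop(columns=["country_corrected", "county_corrected"])
print(sample_wcorr.index.equals(sample_again.index))  # True
```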


#### Geocoding places16
I am using the geocoder wrapper instead of my own code (see "My Code" below) because it is more convenient: it parses the JSON response for me. My own code works too, though!

In [23]:
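def geocode_place(address):
    # one request per address; GOOGLE_MAPS_API_KEY is a hypothetical environment variable holding the key
    result = geocoder.google(address, key=os.environ["GOOGLE_MAPS_API_KEY"])
    # country and first-level administrative area (county in Sweden, state in the US)
    return result.country_long, result.state_long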

In [34]:
# apply geocode_place to the place string of every row (one API request per row)
places16_sample["country_API"] = places16_sample["place"].apply(geocode_place)
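Place strings may repeat across individuals, and each request is billed. A minimal sketch, assuming identical strings should get identical answers: geocode each distinct string once and map the results back onto all rows. `geocode_stub` is a hypothetical offline stand-in for `geocode_place` so the example runs without an API key.

```python
import pandas

calls = []

def geocode_stub(address):
    """Hypothetical stand-in for geocode_place; records every address it is asked for."""
    calls.append(address)
    return ("Sweden", "stub county")

places = pandas.Series([
    "Eneby: Stockholm: Sweden",
    "Kllbyn: Varmlands: Sweden",
    "Eneby: Stockholm: Sweden",
])

# geocode each distinct place string once ...
lookup = {address: geocode_stub(address) for address in places.unique()}
# ... then map the (country, county) tuples back onto every row
country_county = places.map(lookup.get)
print(len(calls))  # 2 requests for 3 rows
```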

#### Data Cleaning
Because every request is billed, we don't want to query the API separately for country and county. <br> 
The function above therefore returns country and county together as a tuple, and I split it into two columns below.
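As an alternative to converting the tuples to strings and stripping characters afterwards, the tuples can be expanded straight into two columns. A minimal sketch, assuming `geocode_place` returns a 2-tuple for every row; the three tuples are copied from the `country_API` column shown in this section. A side benefit is that a missing county stays `None` instead of becoming the literal string `"None"` after `astype(str)`.

```python
import pandas

# (country, county) tuples as returned by geocode_place
country_county = pandas.Series(
    [("Sweden", "Västra Götaland County"), ("Sweden", "Kronoberg County"), ("United States", "Minnesota")],
    index=[12656, 50902, 40320],
)

# expand each tuple into its own columns, keeping the original row index for later merges
split = pandas.DataFrame(
    country_county.tolist(),
    index=country_county.index,
    columns=["country_API", "county_API"],
)
print(split)
```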

In [25]:
places16_sample

|        | place | country_API |
|--------|-------|-------------|
| 12656  | Blixtorpsbacke: Fridene: Skaraborg: Sweden | (Sweden, Västra Götaland County) |
| 50902  | Kafvsjmla Sdragrd: lmeboda: Kronoberg: Sweden | (Sweden, Kronoberg County) |
| 54016  | Kllbyn: Varmlands: Sweden | (Sweden, Varmland County) |
| 122336 | lebckss: Ljuder: Kronoberg: Sweden | (Sweden, Kronoberg County) |
| 102129 | Stocken: Morlanda: Gteborg och Bohus: Sweden | (Sweden, Västra Götaland County) |
| 40320  | Hammond: Olmsted: Minnesota | (United States, Minnesota) |
| 5660   | Arendal: Halland: Swed. | (Sweden, Västra Götaland County) |
| 96084  | Skjulerod: Hogdal: G&B: Sweden | (Sweden, Västra Götaland County) |
| 26844  | Eneby: Stockholm: Sweden | (Sweden, Stockholm County) |
| 38483  | Gunnismark: Vsterbotten: Sweden | (Sweden, Västerbotten County) |


In [35]:
# change data type to string
places16_sample["country_API"] = places16_sample["country_API"].astype(str)
places16_sample.head()

|        | place | country_API |
|--------|-------|-------------|
| 12656  | Blixtorpsbacke: Fridene: Skaraborg: Sweden | ('Sweden', 'Västra Götaland County') |
| 50902  | Kafvsjmla Sdragrd: lmeboda: Kronoberg: Sweden | ('Sweden', 'Kronoberg County') |
| 54016  | Kllbyn: Varmlands: Sweden | ('Sweden', 'Varmland County') |
| 122336 | lebckss: Ljuder: Kronoberg: Sweden | ('Sweden', 'Kronoberg County') |
| 102129 | Stocken: Morlanda: Gteborg och Bohus: Sweden | ('Sweden', 'Västra Götaland County') |


In [None]:
#places16_sample["country_API"] = places16_sample["country_API"].map(lambda x: x.lstrip('(').rstrip(')'))

In [37]:
# split the country/county variable into two columns at the first comma
split_cc = places16_sample.country_API.str.split(",", n=1, expand=True)
places16_sample["country_API"] = split_cc[0]
places16_sample["county_API"]  = split_cc[1]

In [38]:
places16_sample

|        | place | country_API | county_API |
|--------|-------|-------------|------------|
| 12656  | Blixtorpsbacke: Fridene: Skaraborg: Sweden | ('Sweden' | 'Västra Götaland County') |
| 50902  | Kafvsjmla Sdragrd: lmeboda: Kronoberg: Sweden | ('Sweden' | 'Kronoberg County') |
| 54016  | Kllbyn: Varmlands: Sweden | ('Sweden' | 'Varmland County') |
| 122336 | lebckss: Ljuder: Kronoberg: Sweden | ('Sweden' | 'Kronoberg County') |
| 102129 | Stocken: Morlanda: Gteborg och Bohus: Sweden | ('Sweden' | 'Västra Götaland County') |
| 40320  | Hammond: Olmsted: Minnesota | ('United States' | 'Minnesota') |
| 5660   | Arendal: Halland: Swed. | ('Sweden' | 'Västra Götaland County') |
| 96084  | Skjulerod: Hogdal: G&B: Sweden | ('Sweden' | 'Västra Götaland County') |
| 26844  | Eneby: Stockholm: Sweden | ('Sweden' | 'Stockholm County') |
| 38483  | Gunnismark: Vsterbotten: Sweden | ('Sweden' | 'Västerbotten County') |


In [39]:
# remove unwanted characters (parentheses, quotation marks, surrounding spaces)
# from the two API columns only, so the original place strings stay untouched
for col in ["country_API", "county_API"]:
    places16_sample[col] = places16_sample[col].str.strip(" '()")

In [40]:
places16_sample

|        | place | country_API | county_API |
|--------|-------|-------------|------------|
| 12656  | Blixtorpsbacke: Fridene: Skaraborg: Sweden | Sweden | Västra Götaland County |
| 50902  | Kafvsjmla Sdragrd: lmeboda: Kronoberg: Sweden | Sweden | Kronoberg County |
| 54016  | Kllbyn: Varmlands: Sweden | Sweden | Varmland County |
| 122336 | lebckss: Ljuder: Kronoberg: Sweden | Sweden | Kronoberg County |
| 102129 | Stocken: Morlanda: Gteborg och Bohus: Sweden | Sweden | Västra Götaland County |
| 40320  | Hammond: Olmsted: Minnesota | United States | Minnesota |
| 5660   | Arendal: Halland: Swed. | Sweden | Västra Götaland County |
| 96084  | Skjulerod: Hogdal: G&B: Sweden | Sweden | Västra Götaland County |
| 26844  | Eneby: Stockholm: Sweden | Sweden | Stockholm County |
| 38483  | Gunnismark: Vsterbotten: Sweden | Sweden | Västerbotten County |


#### Merging with Corrected places16
Here, I am merging the API results with the corrected variables for comparison. 

In [42]:
cols_to_use = places16_sample.columns.difference(places16_sample_wcorr.columns)
merged_places16 = places16_sample_wcorr.merge(places16_sample[cols_to_use], left_index=True, right_index=True)
merged_places16

|        | place | country_corrected | county_corrected | country_API | county_API |
|--------|-------|-------------------|------------------|-------------|------------|
| 12656  | Blixtorpsbacke: Fridene: Skaraborg: Sweden | Sweden | Skaraborg | Sweden | Västra Götaland County |
| 50902  | Kafvsjmla Sdragrd: lmeboda: Kronoberg: Sweden | Sweden | Kronoberg | Sweden | Kronoberg County |
| 54016  | Kllbyn: Varmlands: Sweden | Sweden | Varmland | Sweden | Varmland County |
| 122336 | lebckss: Ljuder: Kronoberg: Sweden | Sweden | Kronoberg | Sweden | Kronoberg County |
| 102129 | Stocken: Morlanda: Gteborg och Bohus: Sweden | Sweden | Goteborg och Bohus | Sweden | Västra Götaland County |
| 40320  | Hammond: Olmsted: Minnesota | US |  | United States | Minnesota |
| 5660   | Arendal: Halland: Swed. | Sweden | Halland | Sweden | Västra Götaland County |
| 96084  | Skjulerod: Hogdal: G&B: Sweden | Sweden | Goteborg och Bohus | Sweden | Västra Götaland County |
| 26844  | Eneby: Stockholm: Sweden | Sweden | Stockholm | Sweden | Stockholm County |
| 38483  | Gunnismark: Vsterbotten: Sweden | Sweden | Vasterbotten | Sweden | Västerbotten County |
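To turn the side-by-side columns into a comparison, the two county columns need a common form: the API appends " County" and uses Swedish characters, while the corrected data use plain ASCII (e.g. Vasterbotten vs. Västerbotten County). A minimal sketch, assuming those are the main differences; the rows are copied from the merged table. Exact string matching cannot capture historical counties that no longer exist, such as Skaraborg or Göteborg och Bohus, which were merged into Västra Götaland in 1998; those rows would need a separate crosswalk.

```python
import unicodedata

import pandas

def normalize_county(name):
    """Lower-case, drop a trailing ' County' and strip diacritics (Västerbotten -> vasterbotten)."""
    if not isinstance(name, str):
        return ""
    name = name.strip()
    if name.endswith(" County"):
        name = name[: -len(" County")]
    # NFKD splits 'ä' into 'a' plus a combining diaeresis, which is then dropped
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()

merged = pandas.DataFrame({
    "county_corrected": ["Kronoberg", "Vasterbotten", "Skaraborg", ""],
    "county_API": ["Kronoberg County", "Västerbotten County", "Västra Götaland County", "Minnesota"],
})
merged["county_match"] = (
    merged["county_corrected"].map(normalize_county) == merged["county_API"].map(normalize_county)
)
print(merged["county_match"].tolist())  # [True, True, False, False]
```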


#### Exporting the New Dataframe

In [2]:
merged_places16.to_csv("places_API", index=True)
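The county names returned by the API contain non-ASCII characters (e.g. Västra Götaland). A minimal sketch of writing the file and reading it back, assuming the file may be opened in a spreadsheet program: `encoding="utf-8-sig"` adds a byte-order mark that helps programs such as Excel recognise UTF-8, and `index_col=0` restores the saved row index; without it, pandas reads the index back as an ordinary column named `Unnamed: 0`. The file name `places_API.csv` and the temporary directory are assumptions for the example.

```python
import os
import tempfile

import pandas

# two rows in the shape of merged_places16 (values copied from the table above)
df = pandas.DataFrame(
    {"county_corrected": ["Skaraborg", "Kronoberg"],
     "county_API": ["Västra Götaland County", "Kronoberg County"]},
    index=[12656, 50902],
)

path = os.path.join(tempfile.mkdtemp(), "places_API.csv")  # hypothetical file name
df.to_csv(path, index=True, encoding="utf-8-sig")  # BOM so spreadsheet programs detect UTF-8

round_trip = pandas.read_csv(path, index_col=0, encoding="utf-8-sig")  # saved index becomes the index again
without_index_col = pandas.read_csv(path, encoding="utf-8-sig")  # saved index becomes a regular column
print(without_index_col.columns.tolist())  # ['Unnamed: 0', 'county_corrected', 'county_API']
```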

#### My Code
This is the code I originally wrote and no longer use.

In [15]:
# this is a sample dataframe I used for testing the function, please ignore
sample_df = pandas.DataFrame({'place':["Bunche Hall 315 Portola Plaza Los Angeles"]})

In [16]:
# this is the base code that I use for the function below, please ignore
api_key = os.environ["GOOGLE_MAPS_API_KEY"]  # hypothetical variable name; keeps the key out of the notebook
data_type = "json"
endpoint = f"https://maps.googleapis.com/maps/api/geocode/{data_type}"
params = {"address": "Bunche Hall 315 Portola Plaza Los Angeles", "key": api_key}
url_params = urlencode(params)
url = f"{endpoint}?{url_params}"
print(url)

https://maps.googleapis.com/maps/api/geocode/json?address=Bunche+Hall+315+Portola+Plaza+Los+Angeles&key=YOUR_API_KEY
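The historical place strings contain Swedish characters and colons, which `urlencode` percent-encodes as UTF-8 for the query string; spaces become `+`. A short check using an example place string and a placeholder key (`YOUR_API_KEY` is hypothetical). Passing the dictionary as `requests.get(endpoint, params=params)` would let `requests` do this encoding itself.

```python
from urllib.parse import urlencode

endpoint = "https://maps.googleapis.com/maps/api/geocode/json"
params = {"address": "Göteborg och Bohus: Sweden", "key": "YOUR_API_KEY"}  # placeholder key

# 'ö' -> %C3%B6 (UTF-8), ':' -> %3A, ' ' -> '+'
query = urlencode(params)
url = f"{endpoint}?{query}"
print(query)  # address=G%C3%B6teborg+och+Bohus%3A+Sweden&key=YOUR_API_KEY
```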


In [19]:
# this is the function I made for extracting the json file
# I stopped using this because the geocoder module handles the JSON indices for me
def get_place(address, data_type="json"):
    api_key = os.environ["GOOGLE_MAPS_API_KEY"]  # hypothetical variable name; keeps the key out of the notebook
    endpoint = f"https://maps.googleapis.com/maps/api/geocode/{data_type}"
    params = {"address": address, "key": api_key}
    url = f"{endpoint}?{urlencode(params)}"
    r = requests.get(url)
    if not 200 <= r.status_code < 300:
        return {}
    components = r.json()["results"][0]["address_components"]
    # indices 6 and 7 hold the state and the country for this particular address;
    # the number and order of address_components differ between results
    return components[6]["long_name"], components[7]["long_name"]

In [20]:
get_place("Bunche Hall 315 Portola Plaza Los Angeles")

('California', 'United States')
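`get_place` relies on fixed positions (6 and 7) in `address_components`, which only hold for addresses with exactly that structure; it also returns the state before the country, the reverse of `geocode_place`. A minimal sketch of a position-independent alternative, assuming the documented shape of a Geocoding API response (a top-level `status`, and `results[0].address_components` entries with `long_name` and a `types` list): components are picked by type, using `country` and `administrative_area_level_1` (the first-level division, i.e. a US state or a Swedish county). The payload below is hand-written and abbreviated, not a real API response.

```python
def component_by_type(components, wanted_type):
    """Return long_name of the first component whose 'types' list contains wanted_type, else None."""
    for component in components:
        if wanted_type in component.get("types", []):
            return component["long_name"]
    return None

def extract_country_state(payload):
    """Return (country, first-level admin area) from a Geocoding API JSON payload, or (None, None)."""
    # the API can answer HTTP 200 with e.g. status "ZERO_RESULTS" and an empty results list
    if payload.get("status") != "OK" or not payload.get("results"):
        return None, None
    components = payload["results"][0]["address_components"]
    return (component_by_type(components, "country"),
            component_by_type(components, "administrative_area_level_1"))

# hand-written, abbreviated payload in the shape of a Geocoding API response
sample_payload = {
    "status": "OK",
    "results": [{
        "address_components": [
            {"long_name": "Los Angeles", "short_name": "Los Angeles", "types": ["locality", "political"]},
            {"long_name": "California", "short_name": "CA", "types": ["administrative_area_level_1", "political"]},
            {"long_name": "United States", "short_name": "US", "types": ["country", "political"]},
        ]
    }],
}
print(extract_country_state(sample_payload))  # ('United States', 'California')
```

With this helper, `get_place` could return `extract_country_state(r.json())` instead of indexing positions 6 and 7.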