# Elevation Requests

This short notebook counts the records with missing elevation in the data set and sends requests to the Google Maps Elevation API to retrieve as many of those elevations as it can.

Because Google limits (or charges for) requests beyond a certain number, this notebook is not meant to be run more than once. The retrieved data is stored in a separate CSV file, which the main project notebook opens.
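
One way to enforce the run-once intent is to check for the output file before issuing any requests. The sketch below is an assumption, not part of the original workflow; `load_or_build` and its `build` callback are hypothetical names introduced for illustration.

```python
from pathlib import Path

import pandas as pd


def load_or_build(cache_path, build):
    """Return the cached dataframe if it exists; otherwise build and save it.

    `build` is a hypothetical zero-argument callable that performs the
    (possibly billed) API requests and returns a dataframe.
    """
    cache_path = Path(cache_path)
    if cache_path.exists():
        # A previous run already made the requests, so reuse its output.
        return pd.read_csv(cache_path)
    df = build()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(cache_path, index=False)
    return df
```

With a guard like this, rerunning the notebook by accident would read the saved file instead of contacting the API again.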

The resulting dataframe will have one more column than the original. In the project notebook, the original "gps_height" column will be dropped in favor of using the "elevation" column created here.

In [1]:
import pandas as pd
import json
from pathlib import Path
import requests

In [2]:
# retrieve the api key from secret local folder
with open("/Users/stubbletrouble/.secret/googlemaps_api.json") as f:
    api_key = json.load(f)['api_key']

# use this site in a future function
url_stem = 'https://maps.googleapis.com/maps/api/elevation/json'

# open the file we're going to modify
df = pd.read_csv('../data/training_set_values.csv')

Below, we will explore what data is missing, both from the larger set and from just the set of wells funded by the government.

In [3]:
# convert this column to lowercase in order to make searching simpler
df['funder'] = df['funder'].str.lower()
# relabel 'tanzania' and the variant 'tanza' as 'government of tanzania'
df['funder'] = df['funder'].replace({'tanzania': 'government of tanzania', 'tanza': 'government of tanzania'})

total = len(df)
total_missing_elev = len(df[df.gps_height == 0])
total_irretrievable_elev = len(df[(df.longitude == 0) & (df.gps_height == 0)])
gov = len(df[df.funder == 'government of tanzania'])
gov_missing_elev = len(df[(df.gps_height == 0) & (df.funder == 'government of tanzania')])
gov_irretrievable_elev = len(df[(df.longitude == 0) & (df.gps_height == 0) & (df.funder == 'government of tanzania')])

print('There are', total, 'records.\n')
print(total_missing_elev, 'are missing elevation, and', total_irretrievable_elev, 'of those are irretrievable.\n')
print('There are', gov, 'records with the government as funder.\n')
print(gov_missing_elev, 'are missing elevation, and', gov_irretrievable_elev, 'of those are irretrievable.')

There are 59400 records.

20438 are missing elevation, and 1812 of those are irretrievable.

There are 9190 records with the government as funder.

2607 are missing elevation, and 242 of those are irretrievable.
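
The same kind of count can be built from reusable boolean masks, which avoids repeating each condition. This is a minimal sketch on a made-up three-row frame; the values are invented for illustration and are not taken from the data set.

```python
import pandas as pd

# Toy stand-in for the training data; values are invented for illustration.
toy = pd.DataFrame({
    'funder': ['government of tanzania', 'government of tanzania', 'other'],
    'longitude': [35.0, 0.0, 36.5],
    'gps_height': [0, 0, 1200],
})

missing = toy['gps_height'] == 0
irretrievable = missing & (toy['longitude'] == 0)
gov = toy['funder'] == 'government of tanzania'

# Summing a boolean mask counts its True values.
summary = {
    'gov_missing': int((gov & missing).sum()),
    'gov_irretrievable': int((gov & irretrievable).sum()),
    'gov_retrievable': int((gov & missing & ~irretrievable).sum()),
}
```

The `gov_retrievable` entry is the number of records for which an API request could actually recover an elevation.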


Now we're ready to retrieve and add the data.

In [4]:
# request an elevation from the api for government-funded records with usable coordinates
# note: this queries every such record, not only those where gps_height is 0
def get_elev(long, lat, elev, funder):
    # we won't bother with records for other funders
    if funder != 'government of tanzania':
        return elev
    # if longitude is zero, then we know this elevation record is irretrievable and we can do nothing
    elif long == 0:
        return elev
    # otherwise, request the elevation; the api expects locations as "latitude,longitude"
    else:
        params = {'locations': '{},{}'.format(lat, long), 'key': api_key}
        response = requests.get(url_stem, params=params, timeout=10)
        return response.json()['results'][0]['elevation']
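
The function above assumes every response contains at least one result. The Elevation API reports failures through a `status` field in the JSON body, so a more defensive version could validate the payload before indexing into it. The following is a sketch only; `parse_elevation` is a hypothetical helper name.

```python
def parse_elevation(payload):
    """Extract the first elevation value from a decoded API response.

    Raises ValueError when the status is not 'OK' or no results came back,
    instead of failing later with an opaque KeyError or IndexError.
    """
    status = payload.get('status')
    results = payload.get('results') or []
    if status != 'OK' or not results:
        raise ValueError('elevation request failed with status {!r}'.format(status))
    return results[0]['elevation']
```

In `get_elev`, a call such as `parse_elevation(response.json())` could stand in for the direct indexing, so that a quota or key problem surfaces as a clear error.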

In [5]:
# add a feature with all the elevation values that were available
df['elevation'] = df.apply(lambda x: int(get_elev(x.longitude, x.latitude, x.gps_height, x.funder)), axis=1)
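
Applying the function row by row sends one HTTP request per queried record. The Elevation API also accepts several `latitude,longitude` pairs in a single `locations` parameter, separated by `|`, which could reduce the number of requests. The sketch below only builds those parameter strings; `batch_locations` is a hypothetical helper, and the default batch size of 100 is an assumed value that should be checked against the documented per-request limit.

```python
def batch_locations(coords, batch_size=100):
    """Yield `locations` strings, each covering up to `batch_size` points.

    `coords` is a sequence of (latitude, longitude) tuples.
    """
    for start in range(0, len(coords), batch_size):
        chunk = coords[start:start + batch_size]
        yield '|'.join('{},{}'.format(lat, lng) for lat, lng in chunk)


# Example with three invented points and a batch size of two.
coords = [(-6.8, 39.3), (-3.4, 36.7), (-8.9, 33.5)]
batches = list(batch_locations(coords, batch_size=2))
```

Each string would be sent as the `locations` value of one request, and the `results` list in the response would then hold one entry per point.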

In [6]:
# write the dataframe with new elevation data to a new csv file
# index=False keeps the row index from being saved as an extra unnamed column
filepath = Path('../data/vals_with_elevation.csv')
filepath.parent.mkdir(parents=True, exist_ok=True)
df.to_csv(filepath, index=False)