# Let's geocode some addresses

Geocoding is the process of turning an address -- `'141 Neff Annex, Columbia, MO 65211'` -- into a latitude/longitude coordinate pair -- `(38.9480979,-92.3303756)`. You'd need to do this if you wanted to run a GIS analysis or make an interactive map, for example.

**Our task**: To geocode the addresses of your internship newsrooms.

**Our data** can be found here: `../../data/djnf_data_2018.csv`.

For this task, we shall use a library called [`geopy`](https://geopy.readthedocs.io/en/stable/) with an open-source geocoding engine called [OpenCage](https://geocoder.opencagedata.com/api). Before setting up this script, [I signed up for an API key](https://geocoder.opencagedata.com/users/sign_up) and saved that key as an [environment variable](https://en.wikipedia.org/wiki/Environment_variable) on this machine. 

The `geopy` library also supports other geocoding engines, including Google's, which is (in my experience) the most accurate. But [recent changes to its service](https://cloud.google.com/maps-platform/user-guide/account-changes/#no-plan) mean that you must now supply a credit card when you apply for an API key, which is why we're not using it here. It's still worth exploring.

We'll also use our friend `pandas` and a couple of built-in Python modules: [`os`](https://docs.python.org/3/library/os.html) (to access the API key) and [`time`](https://docs.python.org/3/library/time.html) (to pause between API requests).

Our steps:
1. Import the modules we need
2. Grab the environment variable that your computer is keeping track of -- a key that will allow us to use the OpenCage API
3. Set up the geocoder
4. Read the data into a pandas dataframe
5. Write a short function to geocode an address and return the coordinates
6. Apply that function to each row of our dataframe

### 1. Import the modules we need

In [9]:
import time
import os

import pandas as pd
from geopy.geocoders import OpenCage

### 2. Grab the environment variable

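`os.environ` is a dictionary-like object that holds the environment variables on your computer. Its `get()` method returns the value stored under the name you give it, or `None` if no variable by that name is set.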

In [10]:
OPENCAGE_GEOCODE_KEY = os.environ.get('OPENCAGE_GEOCODE_KEY')
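
Because `get()` returns `None` when a variable is missing, a typo in the variable name might not surface until the first API request fails. One way to catch that earlier is a small check right after reading the key. This is a sketch, not part of the original notebook, and the helper name `require_env` is made up for illustration.

```python
import os


def require_env(name):
    '''return the value of an environment variable, or raise a clear error if it is not set'''
    value = os.environ.get(name)

    # treat a missing or empty value as a setup problem
    if not value:
        raise RuntimeError('Environment variable {} is not set.'.format(name))

    return value
```

With this helper, the line above could become `OPENCAGE_GEOCODE_KEY = require_env('OPENCAGE_GEOCODE_KEY')`.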

### 3. Set up the geocoder

In [11]:
geolocator = OpenCage(api_key=OPENCAGE_GEOCODE_KEY)

### 4. Read the data into pandas

In [12]:
intern_data = pd.read_csv('../../data/djnf_data_2018.csv')

In [13]:
intern_data.head()

| Unnamed: 0 | intern | university | site | address |
| --- | --- | --- | --- | --- |
| 0 | Orlaith McCaffrey | Binghamton University | Wall Street Journal | 1211 6th Ave, New York, NY 10036 |
| 1 | Ravinarayanan Lakshmanan | Columbia University | Financial Planning | 1 State St. Plaza, 27th floor, New York NY 10004 |
| 2 | Adriana Navarro | Ohio University | AccuWeather | 385 Science Park Road, State College, PA 16801 |
| 3 | Nishant Mohan | University of Idaho | The Wall Street Journal | 1211 Avenue of the Americas, New York NY 10036 |
| 4 | Lindsay Huth | University of Maryland, College Park | Kansas City Public Television (KCPT) | 125 E. 31st St. Kansas City, MO 64108 |
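
Before sending anything to the geocoder, it can help to check that every row actually has an address, since a blank cell in the CSV shows up in pandas as `NaN`. Here is a small sketch using a made-up two-row dataframe. `sample` is a stand-in for `intern_data`, and the second row is invented for the example.

```python
import pandas as pd

# hypothetical stand-in for intern_data, with one missing address
sample = pd.DataFrame({
    'site': ['AccuWeather', 'Example Newsroom'],
    'address': ['385 Science Park Road, State College, PA 16801', None]
})

# count the rows where the address is missing
missing_count = sample['address'].isna().sum()
print('{} row(s) have no address'.format(missing_count))

# keep only the rows that have an address to geocode
has_address = sample[sample['address'].notna()]
```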


### 5. Write a function to process one row of data

Our function will accept one row of data from our dataframe. It will take the value in the `address` column and geocode it using the geocoder object we created earlier, then grab the `latitude` and `longitude` values from the object that gets returned. It will print the original address and these coordinates, pause ("sleep") for two seconds to avoid getting banned by OpenCage, and return the coordinates.

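An `if/else` branch handles cases where the geocoder is unsuccessful -- in that case, it prints a message and returns `(0, 0)` as a placeholder. Keep in mind that `(0, 0)` is itself a real point, in the ocean off the coast of West Africa, so you'd want to filter those records out before mapping.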

![0,0](../../img/middle-of-the-world.png "0,0")

👉 For more details on functions, see [this notebook](../../reference/Functions.ipynb).

👉 For more details on if statements, see [this notebook](../../reference/Python%20data%20types%20and%20basic%20syntax.ipynb#if-statements).

In [14]:
def get_coordinates(row):
    '''given a row of data, geocode the address and return the coordinates'''
    
    # geocode the value in the `address` column
    location = geolocator.geocode(row['address'])
    
    # make sure it worked with an if/else branch
    if location:
        # get the lat/lng coordinates
        coords = (location.latitude, location.longitude)

        # print to show us that it's working
        print(row['address'], coords)
    else:
        print(row['address'], 'could not be geocoded.')
        coords = (0, 0)
    
    # pause for 2 seconds
    time.sleep(2)

    # return the coordinates
    return coords
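
If the same address shows up more than once in the data, each copy costs another API request and another two-second pause. One option is to remember results in a dictionary and call the geocoder only for addresses we haven't seen yet. The sketch below is a variation under stated assumptions, not the notebook's method. `make_cached_lookup` and `fake_geocode` are hypothetical names, and `fake_geocode` stands in for the real `geolocator.geocode` call so the example runs without an API key.

```python
def make_cached_lookup(geocode_func):
    '''wrap a geocoding function so each distinct address is only looked up once'''
    cache = {}

    def lookup(address):
        # only call the geocoder the first time we see this address
        if address not in cache:
            cache[address] = geocode_func(address)
        return cache[address]

    return lookup


# hypothetical stand-in for the real geocoder: it records each call
# and returns a made-up coordinate pair
calls = []

def fake_geocode(address):
    calls.append(address)
    return (1.0, 2.0)


cached_geocode = make_cached_lookup(fake_geocode)

first = cached_geocode('919 Congress Ave., 6th Floor Austin TX 78701')
second = cached_geocode('919 Congress Ave., 6th Floor Austin TX 78701')

# two lookups, but the geocoder itself was only called once
print(len(calls))
```

In a real run, the `time.sleep()` pause would belong inside the function being wrapped, so that repeated addresses skip the wait as well as the request.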

### 6. Apply this function to each row in the dataframe

... using the [`apply()`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.apply.html) method, specifying `axis=1` to tell pandas that we want to apply this function to each _row_ of data.

In [15]:
print('Geocoding {} records ...'.format(len(intern_data)))

intern_data['coords'] = intern_data.apply(get_coordinates, axis=1)

Geocoding 12 records ...
1211 6th Ave, New York, NY 10036 (40.758469, -73.9819579)
1 State St. Plaza, 27th floor, New York NY 10004 (40.6964, -74.0253)
385 Science Park Road, State College, PA 16801 (40.7925, -77.8523)
1211 Avenue of the Americas, New York NY 10036 (40.7586694, -73.9823756)
125 E. 31st St. Kansas City, MO 64108 (39.0837, -94.5868)
919 Congress Ave., 6th Floor Austin TX 78701 (30.2713, -97.7426)
4400 Massachusetts Ave., N.W. Washington DC 20016 (38.9381, -77.086)
7950 Jones Branch Drive McLean, VA 22108 (38.9325446, -77.2178005)
919 Congress Ave., 6th Floor Austin TX 78701 (30.2713, -97.7426)
1201 K St., Suite 1200 Sacramento, CA 95814 (38.5804, -121.4922)
36 Russ St., Hartford, CT 06106 (41.761349, -72.683941)
2340 Eighth Avenue, New York NY 10027 (43.216233, -74.1823014)
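
It's worth eyeballing results like these. The last address is in New York City, but its latitude (43.2) is more than two degrees north of the other New York results, which suggests the geocoder matched a different place. A rough way to spot this kind of thing is to measure the distance between two points that should be close together. The sketch below uses the haversine formula with an approximate Earth radius of 6,371 km. The function name `distance_km` is my own, and the two coordinate pairs are copied from the output above.

```python
from math import radians, sin, cos, asin, sqrt


def distance_km(coords_a, coords_b):
    '''approximate great-circle distance between two (lat, lng) pairs, in kilometers'''
    lat_a, lng_a = coords_a
    lat_b, lng_b = coords_b

    # convert the differences and the latitudes from degrees to radians
    d_lat = radians(lat_b - lat_a)
    d_lng = radians(lng_b - lng_a)
    lat_a, lat_b = radians(lat_a), radians(lat_b)

    # haversine formula
    a = sin(d_lat / 2) ** 2 + cos(lat_a) * cos(lat_b) * sin(d_lng / 2) ** 2

    # 6371 is an approximate radius of the Earth in kilometers
    return 2 * 6371 * asin(sqrt(a))


midtown = (40.758469, -73.9819579)   # 1211 6th Ave, New York, NY 10036
suspect = (43.216233, -74.1823014)   # 2340 Eighth Avenue, New York NY 10027

gap = distance_km(midtown, suspect)
print('{:.0f} km apart'.format(gap))
```

Two Manhattan addresses should be within a few kilometers of each other, so a gap in the hundreds of kilometers is a sign that the record needs a second look.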


In [16]:
intern_data.head()

| Unnamed: 0 | intern | university | site | address | coords |
| --- | --- | --- | --- | --- | --- |
| 0 | Orlaith McCaffrey | Binghamton University | Wall Street Journal | 1211 6th Ave, New York, NY 10036 | (40.758469, -73.9819579) |
| 1 | Ravinarayanan Lakshmanan | Columbia University | Financial Planning | 1 State St. Plaza, 27th floor, New York NY 10004 | (40.6964, -74.0253) |
| 2 | Adriana Navarro | Ohio University | AccuWeather | 385 Science Park Road, State College, PA 16801 | (40.7925, -77.8523) |
| 3 | Nishant Mohan | University of Idaho | The Wall Street Journal | 1211 Avenue of the Americas, New York NY 10036 | (40.7586694, -73.9823756) |
| 4 | Lindsay Huth | University of Maryland, College Park | Kansas City Public Television (KCPT) | 125 E. 31st St. Kansas City, MO 64108 | (39.0837, -94.5868) |
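
Each value in the new `coords` column is a tuple, which can be awkward for mapping tools that expect separate latitude and longitude columns. One possible follow-up, sketched here on a tiny made-up dataframe rather than the real data, is to split the tuples into two columns and flag the rows that fell back to `(0, 0)`. The column names `lat`, `lng` and `geocoded` are my own choices.

```python
import pandas as pd

# hypothetical stand-in for intern_data after geocoding
sample = pd.DataFrame({
    'address': ['385 Science Park Road, State College, PA 16801', 'not a real address'],
    'coords': [(40.7925, -77.8523), (0, 0)]
})

# split each (lat, lng) tuple into two new columns
sample['lat'] = sample['coords'].apply(lambda pair: pair[0])
sample['lng'] = sample['coords'].apply(lambda pair: pair[1])

# flag rows where the geocoder fell back to (0, 0);
# the comparison happens inside apply() so each tuple is checked as a whole
sample['geocoded'] = sample['coords'].apply(lambda pair: pair != (0, 0))
```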
