# Let's geocode some addresses

Geocoding is the process of turning an address -- `'141 Neff Annex, Columbia, MO 65211'` -- into a latitude/longitude coordinate pair -- `(38.9480979,-92.3303756)`. You'd need to do this if you wanted to do some GIS analysis or make an interactive map or whatever.
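Without the network call, a geocoder is just a function that takes an address string and returns a `(latitude, longitude)` tuple, or nothing if it can't find a match. The sketch below is a toy illustration only. It fakes that behavior with a hard-coded dictionary, and `TOY_LOOKUP` and `toy_geocode` are made-up names that aren't part of any library. A real geocoder like Google's searches an enormous database and copes with messy input.

```python
# A toy stand-in for a geocoder: NOT a real service, just a hard-coded lookup
# table that shows the shape of the input (an address string) and the output
# (a latitude/longitude tuple, or None if the address isn't known).
TOY_LOOKUP = {
    '141 Neff Annex, Columbia, MO 65211': (38.9480979, -92.3303756),
}


def toy_geocode(address):
    '''Return (latitude, longitude) for a known address, else None.'''
    # strip surrounding whitespace so ' 141 Neff Annex, ... ' still matches
    return TOY_LOOKUP.get(address.strip())


print(toy_geocode('141 Neff Annex, Columbia, MO 65211'))  # (38.9480979, -92.3303756)
print(toy_geocode('1 Nowhere Lane'))                      # None
```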

**Our task**: To geocode the addresses of your internship newsrooms.

**Our data** can be found here: `../../data/djnf_data_2018.csv`.

For this task, we shall use a library called [`geopy`](https://geopy.readthedocs.io/en/stable/). We'll also use our friend `pandas` and a couple of built-in Python modules: [`time`](https://docs.python.org/3/library/time.html) and [`os`](https://docs.python.org/3/library/os.html). Our steps:
- Import the libraries we need
- Grab an [environment variable](https://en.wikipedia.org/wiki/Environment_variable) that your computer is keeping track of -- a key that will allow us to use the Google Geocoding API
- Read the data into a pandas dataframe
- Write a short function to geocode an address and return the coordinates
- Apply that function to each row of our dataframe

### Import modules

```python
import time
import os

import pandas as pd
from geopy.geocoders import GoogleV3
```

### Grab the geocoder API key and set up the geocoder

I followed [these instructions](https://developers.google.com/maps/documentation/geocoding/get-api-key) to get an API key. Then I saved that key as an environment variable on my computer called `GOOGLE_GEOCODE_KEY`.

Your computer should already have this variable set.

```python
GOOGLE_GEO_API_KEY = os.environ.get('GOOGLE_GEOCODE_KEY')
```
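One caveat: if the variable isn't set, `os.environ.get()` quietly returns `None`, and you won't find out until the first geocoding request fails. A small check like the one below fails early with a readable message instead. `require_env` is a hypothetical helper we made up, not part of the `os` module.

```python
import os


def require_env(name):
    '''Return the value of environment variable `name`, or raise a clear error.'''
    value = os.environ.get(name)
    # treat a missing variable and an empty string the same way
    if not value:
        raise RuntimeError(
            'Environment variable {} is not set. Did you save your API key?'.format(name)
        )
    return value


# usage in this notebook:
# GOOGLE_GEO_API_KEY = require_env('GOOGLE_GEOCODE_KEY')
```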

```python
# Make a geolocator object
# Set the `timeout` keyword argument to 5 (seconds)
# https://geopy.readthedocs.io/en/stable/#geopy.geocoders.GoogleV3
geolocator = GoogleV3(api_key=GOOGLE_GEO_API_KEY, timeout=5)
```
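The `timeout` only controls how long geopy waits for an answer. When a request takes longer than that, geopy raises an exception (`geopy.exc.GeocoderTimedOut`), and inside `apply()` that would stop the whole run partway through. A common pattern is to retry a few times, waiting a bit longer after each failure. The sketch below is a generic helper under that assumption. `call_with_retries` is our own hypothetical name, not a geopy function. In this notebook you would pass `retry_on=(GeocoderTimedOut,)`.

```python
import time


def call_with_retries(func, *args, attempts=3, wait=1.0, retry_on=(Exception,), **kwargs):
    '''Call func(*args, **kwargs), trying up to `attempts` times in total.

    Waits `wait` seconds after the first failure, then doubles the wait
    after each later failure (simple exponential backoff). Only exceptions
    listed in `retry_on` trigger a retry; any other exception, or a failure
    on the final attempt, is re-raised.
    '''
    for attempt in range(1, attempts + 1):
        try:
            return func(*args, **kwargs)
        except retry_on:
            if attempt == attempts:
                raise
            time.sleep(wait)
            wait *= 2


# hypothetical usage with geopy:
# from geopy.exc import GeocoderTimedOut
# location = call_with_retries(geolocator.geocode, row['address'],
#                              retry_on=(GeocoderTimedOut,))
```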

### Read in the data

```python
intern_data = pd.read_csv('../../data/djnf_data_2018.csv')
```

```python
intern_data.head()
```

### Write a function to process one row of data

Our function will accept one row of data from our dataframe. It will take the value in the `address` column and geocode it using the geolocator object we created earlier. It will grab the `latitude` and `longitude` values from the object that gets returned. It will print the original address and these coordinates, pause ("sleep") for two seconds so we don't send requests to Google faster than it allows, then return the coordinates.

👉 For more details on functions, see [this notebook](../../reference/Functions.ipynb).
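A fixed `time.sleep(2)` after every request is simple and safe. A slightly more efficient variation is to wait only for whatever is *left* of the minimum gap since the previous request, so time already spent waiting on the network counts toward the pause. This is a sketch, not something geopy or Google requires, and `Throttle` is a hypothetical helper. geopy also ships its own wrapper for spacing out calls, `geopy.extra.rate_limiter.RateLimiter`.

```python
import time


class Throttle:
    '''Enforce a minimum gap (in seconds) between successive calls to wait().'''

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._last = None  # monotonic timestamp of the previous wait()

    def wait(self):
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            # only sleep if the previous call was less than min_interval ago
            if remaining > 0:
                time.sleep(remaining)
        # record when this call was allowed through
        self._last = time.monotonic()


# hypothetical usage with get_coordinates():
# throttle = Throttle(2)   # create once, outside the function
# throttle.wait()          # call right before geolocator.geocode(...)
```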

```python
def get_coordinates(row):
    '''given a row of data, geocode the address and return the coordinates'''

    # geocode the value in the `address` column
    location = geolocator.geocode(row['address'])

    # geocode() returns None if it can't find a match, so check before
    # reading the coordinates (otherwise this would raise an error)
    if location is None:
        coords = None
    else:
        # get the lat/lng coordinates
        coords = (location.latitude, location.longitude)

    # print to show us that it's working
    print(row['address'], coords)

    # pause for 2 seconds
    time.sleep(2)

    # return the coordinates
    return coords
```
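Because `get_coordinates()` calls Google every time it runs, it's awkward to test. One option is to pass the geocoder in as an argument and swap in a fake one while you check your logic. That's our suggestion, not how this notebook is set up. Below, `FakeGeocoder` and `get_coordinates_with` are made-up names. The fake's `geocode()` returns an object with `.latitude` and `.longitude` attributes (the same attributes we read from geopy's results), or `None` for an unknown address, which is what geopy's `geocode()` returns when it finds no match.

```python
import time
from types import SimpleNamespace


class FakeGeocoder:
    '''Hypothetical offline stand-in for a geopy geocoder, for testing only.'''

    def __init__(self, known):
        # map of address -> (latitude, longitude)
        self.known = known

    def geocode(self, address):
        if address not in self.known:
            return None  # mimic "no match found"
        lat, lng = self.known[address]
        # mimic a result object exposing .latitude and .longitude
        return SimpleNamespace(latitude=lat, longitude=lng)


def get_coordinates_with(geocoder, row, pause=2):
    '''Like get_coordinates(), but with the geocoder and pause passed in.'''
    location = geocoder.geocode(row['address'])
    coords = None if location is None else (location.latitude, location.longitude)
    print(row['address'], coords)
    time.sleep(pause)
    return coords


fake = FakeGeocoder({'141 Neff Annex, Columbia, MO 65211': (38.9480979, -92.3303756)})
get_coordinates_with(fake, {'address': '141 Neff Annex, Columbia, MO 65211'}, pause=0)
get_coordinates_with(fake, {'address': 'nowhere'}, pause=0)
```

A plain `dict` works as a stand-in row here because the function only does `row['address']`, which a pandas row supports too.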

### Apply this function to each row in the dataframe

... using the [`apply()`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.apply.html) method, specifying `axis=1` to tell pandas that we want to apply this function to each _row_ of data. (The default, `axis=0`, would apply it to each _column_ instead.)
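Conceptually, `apply()` with `axis=1` hands your function one row at a time and collects whatever it returns, in row order, so the results can become a new column. The loop below is a plain-Python approximation of that idea. It uses a list of dictionaries as a hypothetical stand-in for the dataframe and a simple stand-in function (the length of each address) in place of the geocoder. It isn't how pandas implements `apply()` internally, and pandas passes each row as a `Series` rather than a `dict`.

```python
def apply_to_rows(rows, func):
    '''Call func on each row (a dict) and return the results in row order.'''
    results = []
    for row in rows:
        results.append(func(row))
    return results


# toy rows standing in for the dataframe
rows = [
    {'address': '141 Neff Annex, Columbia, MO 65211'},
    {'address': '1 Nowhere Lane'},
]

# attach each result to its row, like intern_data['coords'] = ...
for row, value in zip(rows, apply_to_rows(rows, lambda r: len(r['address']))):
    row['address_length'] = value

print(rows)
```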

```python
print('Geocoding {} records ...'.format(len(intern_data)))

intern_data['coords'] = intern_data.apply(get_coordinates, axis=1)
```
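If several interns work in the same newsroom, the same address shows up more than once, and each copy costs another request plus another two-second pause. One way around that is to remember results you've already looked up. This is an optimization we're suggesting, not part of the original workflow. Python's built-in `functools.lru_cache` does this for any function whose arguments are hashable, such as an address string. If the `time.sleep()` lives inside the cached function, repeats skip the pause too. `geocode_address` is a hypothetical wrapper. To show the cache at work, it records each address in a list and returns placeholder coordinates instead of calling the real geocoder.

```python
from functools import lru_cache

lookups = []  # records every address that actually reaches the "geocoder"


@lru_cache(maxsize=None)
def geocode_address(address):
    '''Return coordinates for an address, caching repeat lookups.

    In the notebook, the body would call geolocator.geocode(address) and
    return (location.latitude, location.longitude); here we fake a result.
    '''
    lookups.append(address)
    return (0.0, 0.0)  # placeholder coordinates, not a real location


for addr in ['1 Main St', '2 Elm St', '1 Main St']:
    geocode_address(addr)

print(lookups)  # ['1 Main St', '2 Elm St'] (the repeat came from the cache)
```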

```python
intern_data.head()
```