# Let's geocode some addresses

Geocoding is the process of turning an address -- `'141 Neff Annex, Columbia, MO 65211'` -- into a latitude/longitude coordinate pair -- `(38.9480979,-92.3303756)`. You'd need to do this if you wanted to do some GIS analysis or make an interactive map or whatever.

**Our task**: To geocode the addresses of your internship newsrooms.

**Our data** can be found here: `../../data/djnf_data_2018.csv`.

For this task, we'll use a library called [`geopy`](https://geopy.readthedocs.io/en/stable/) with [OpenCage](https://geocoder.opencagedata.com/api), a geocoding service built on open data. Before setting up this script, [I signed up for an API key](https://geocoder.opencagedata.com/users/sign_up) and saved that key as an [environment variable](https://en.wikipedia.org/wiki/Environment_variable) on this machine.

The `geopy` library also works with other geocoding engines, including Google's, which is (in my experience) the most accurate. But [recent changes to its service](https://cloud.google.com/maps-platform/user-guide/account-changes/#no-plan) mean that you now must supply a credit card when you apply for an API key, which is why we're not using it here. It's still worth exploring.

We'll also use our friend `pandas` and a couple of built-in Python modules: [`os`](https://docs.python.org/3/library/os.html) (to access the API key) and [`time`](https://docs.python.org/3/library/time.html) (to pause between API requests).

Our steps:
1. Import the modules we need
2. Grab the environment variable that your computer is keeping track of -- a key that will allow us to use the OpenCage API
3. Set up the geocoder
4. Read the data into a pandas dataframe
5. Write a short function to geocode an address and return the coordinates
6. Apply that function to each row of our dataframe

### 1. Import the modules we need

In [None]:
# import time, os, pandas, opencage geocoder from geopy


### 2. Grab the environment variable

`os.environ` is a dictionary of environment variables on your computer.

In [None]:
# nab the opencage geocoder key from your computer's environment
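
A minimal sketch of grabbing the key, assuming it was saved under the name `OPENCAGE_KEY`. That name and the helper `get_api_key` are hypothetical, so swap in whatever name you used when you saved yours. Using `os.environ.get()` instead of square brackets returns `None` when the variable is missing, which lets us fail with a clearer message.

```python
import os

# 'OPENCAGE_KEY' is a hypothetical name -- use the name you saved your key under
KEY_NAME = 'OPENCAGE_KEY'


def get_api_key(name=KEY_NAME):
    '''return the API key stored in the environment variable `name`'''

    # .get() returns None if the variable doesn't exist
    key = os.environ.get(name)

    if key is None:
        raise RuntimeError(f'No environment variable called {name} found')

    return key
```

Keeping the key in the environment, rather than pasting it into the notebook, means you can share the notebook without sharing the key.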


### 3. Set up the geocoder

In [None]:
# set up the geocoder
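
One way the setup might look, as a sketch: `geopy` provides an `OpenCage` class in `geopy.geocoders` that takes the API key as its first argument. The helper name `build_geocoder` is made up for illustration. In the notebook you would probably just import `OpenCage` at the top with everything else.

```python
def build_geocoder(api_key):
    '''return a geopy geocoder object that talks to OpenCage'''

    # geopy is a third-party library (pip install geopy); it's imported
    # inside the function here so that defining the helper doesn't need it
    from geopy.geocoders import OpenCage

    return OpenCage(api_key)
```

Assuming the key lives in an environment variable called `OPENCAGE_KEY` (a hypothetical name), you could then create the geocoder with `geocoder = build_geocoder(os.environ['OPENCAGE_KEY'])`.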


### 4. Read the data into pandas

In [None]:
# read in the DJNF intern data csv


In [None]:
# check the output with `head()`


### 5. Write a function to process one row of data

Our function will accept one row of data in our data frame. It will take the value in the "address" field and geocode it using the geocoder object we created earlier. It will grab the `latitude` and `longitude` values from the object that gets returned. It will print the original address and these coordinates, pause ("sleep") for two seconds -- this is to avoid getting banned by OpenCage -- then return the coordinates.

An `if/else` branch handles cases where the geocoder is unsuccessful -- in that case, it returns `(0, 0)`.

![0,0](../../img/middle-of-the-world.png "0,0")

👉 For more details on functions, see [this notebook](../../reference/Functions.ipynb).

👉 For more details on if statements, see [this notebook](../../reference/Python%20data%20types%20and%20basic%20syntax.ipynb#if-statements).

In [None]:
# define the geocoding function

    '''given a row of data, geocode the address and return the coordinates'''
    
    # geocode the value in the `address` column

    
    # make sure it worked with an if/else branch

        # get the lat/lng coordinates


        # print to show us that it's working

    # if not, what do we do?

        # print a message showing that it didn't work
        # set the coordinates to the eastern Atlantic (0, 0)

    
    # pause for 2 seconds


    # return the coordinates
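
Here is a sketch of how the finished function might look. It differs from the description above in one respect, which is an assumption made for illustration: instead of reaching for the geocoder object created earlier, it accepts a `geocode` callable as a second argument, so it can be tried out with a stand-in (`fake_geocode`, also hypothetical) that needs no API key or network connection. With the real thing, you would pass the `geocode` method of your geopy geocoder, which returns `None` when it finds no match.

```python
import time
from collections import namedtuple


def geocode_row(row, geocode, pause=2):
    '''given a row of data, geocode the address and return the coordinates'''

    # geocode the value in the `address` column
    location = geocode(row['address'])

    # make sure it worked: the geocoder hands back None if it found nothing
    if location:
        # get the lat/lng coordinates
        coords = (location.latitude, location.longitude)

        # print to show us that it's working
        print(row['address'], coords)
    else:
        # print a message and fall back to (0, 0)
        print('Could not geocode:', row['address'])
        coords = (0, 0)

    # pause between requests (2 seconds by default)
    time.sleep(pause)

    # return the coordinates
    return coords


# a stand-in geocoder so the sketch runs without a key or a connection
FakeLocation = namedtuple('FakeLocation', ['latitude', 'longitude'])


def fake_geocode(address):
    '''pretend to geocode: only knows the example address from the intro'''
    if address == '141 Neff Annex, Columbia, MO 65211':
        return FakeLocation(38.9480979, -92.3303756)
    return None


# try it on one address that works and one that doesn't
print(geocode_row({'address': '141 Neff Annex, Columbia, MO 65211'}, fake_geocode, pause=0))
print(geocode_row({'address': 'nowhere in particular'}, fake_geocode, pause=0))
```

Because `apply()` passes extra keyword arguments through to the function, a version like this could be applied to a dataframe with something like `df.apply(geocode_row, axis=1, geocode=geocoder.geocode)`.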


### 6. Apply this function to each row in the dataframe

... using the [`apply()`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.apply.html) method, specifying `axis=1` to tell pandas that we want to apply this function to each _row_ of data.

In [None]:
# print a message letting us know what we're up to at the outset


# apply the function across the data into a new column
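
To see what `apply()` with `axis=1` does without spending any API requests, here is a small sketch on a stand-in dataframe. The two rows, the column name `coords` and the function `fake_geocode_row` are all made up for illustration. Only the `address` column name and the example address come from this notebook.

```python
import pandas as pd

# a tiny stand-in dataframe; the real data comes from the CSV
df = pd.DataFrame({
    'address': [
        '141 Neff Annex, Columbia, MO 65211',
        'an address the geocoder will not recognize',
    ]
})


def fake_geocode_row(row):
    '''stand-in for the real geocoding function: no API calls, no pause'''
    if row['address'] == '141 Neff Annex, Columbia, MO 65211':
        return (38.9480979, -92.3303756)
    return (0, 0)


# print a message letting us know what we're up to at the outset
print('Geocoding', len(df), 'addresses ...')

# axis=1 hands the function one row at a time;
# the returned tuples land in a new column
df['coords'] = df.apply(fake_geocode_row, axis=1)

print(df.head())
```

Each value in the new column is a `(latitude, longitude)` tuple, with `(0, 0)` marking the rows that failed.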


In [None]:
# check the output with `head()`
