# Geocoding data with Python and OpenStreetMap
## This example also shows how to read and save CSV files
In this section we geocode the data we collected from Twitter using the [Twitter interface](https://jhub.aup.edu/hub/user-redirect/git-pull?repo=https://github.com/aup-cs1091/MyClassNotebooks&branch=master&subPath=TwitterInterface) and cleaned with a [simple data cleaning procedure](https://jhub.aup.edu/hub/user-redirect/git-pull?repo=https://github.com/aup-cs1091/MyClassNotebooks&branch=master&subPath=CleaningData). You can find the clean data in the same folder as this file.
To understand this code, you should review the [section on geocoding with OpenStreetMap](https://jhub.aup.edu/hub/user-redirect/git-pull?repo=https://github.com/aup-cs1091/MyClassNotebooks&branch=master&subPath=geocoding/SimpleGeocodingWithOpenStreetMap.ipynb) and the section on [Pandas DataFrames](https://jhub.aup.edu/hub/user-redirect/git-pull?repo=https://github.com/aup-cs1091/MyClassNotebooks&branch=master&subPath=PandasDataFrames.ipynb), in particular the last example, which shows the use of lambda functions.

In [88]:
import pandas as pd
import requests
import json

# URL of the OpenStreetMap (Nominatim) search endpoint used for geocoding
url = 'https://nominatim.openstreetmap.org/search'

In [89]:
# Read the csv file with the clean Twitter data
df=pd.read_csv('cleanDataFromTwitter.txt')
df.head()

Unnamed: 0.1,Unnamed: 0,screen_name,name,followers_count,location
0,0,TuckerCarlson,Tucker Carlson,1064787,"Washington, DC"
1,3,WhiteHouse,The White House,15327283,"Washington, DC"
2,5,KellyannePolls,Kellyanne Conway,1664393,"Washington, DC"
3,6,Reince,Reince Priebus,941927,"Kenosha, WI"
4,10,RealRomaDowney,Roma Downey,192126,Malibu
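The `Unnamed: 0` style columns in the output above are most likely leftovers of row labels that were written to the file each time the data was saved. The sketch below reproduces that effect on a tiny made-up DataFrame (the names `demo`, `with_index`, `reread`, `clean`, and `restored` are illustrative, not part of the class data) and shows two common ways to avoid it. It uses an in-memory buffer instead of a real file so that it runs anywhere.

```python
import io
import pandas as pd

# A tiny made-up DataFrame standing in for the Twitter data
demo = pd.DataFrame({'screen_name': ['a', 'b'],
                     'location': ['Malibu', 'USA']})

# By default to_csv also writes the row labels, as a first column with an empty header
with_index = demo.to_csv()

# Reading that text back turns the unnamed first column into 'Unnamed: 0'
reread = pd.read_csv(io.StringIO(with_index))
print(list(reread.columns))    # ['Unnamed: 0', 'screen_name', 'location']

# Option 1: do not write the row labels when saving
clean = pd.read_csv(io.StringIO(demo.to_csv(index=False)))
print(list(clean.columns))     # ['screen_name', 'location']

# Option 2: tell read_csv that the first column holds the row labels
restored = pd.read_csv(io.StringIO(with_index), index_col=0)
print(list(restored.columns))  # ['screen_name', 'location']
```

With a real file, the same `index=False` and `index_col=0` arguments apply to `df.to_csv('somefile.csv', ...)` and `pd.read_csv('somefile.csv', ...)`.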


### We have seen that the following code allows us to query OpenStreetMap
![alt text](geocodingWithOpenStreetMap.png "OpenStreetMap geocoding example")
### Following this example, we create two lambda functions that we will apply to each location in the DataFrame to geocode it

In [90]:
# Based on the code we have seen in geocoding, we define two lambda functions to access latitude and longitude
getLat = lambda x: requests.get(url, params={'q': x, 'format': 'json'}).json()[0]['lat']
getLon = lambda x: requests.get(url, params={'q': x, 'format': 'json'}).json()[0]['lon']
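The two lambdas above each send their own request, return the coordinates as strings, and raise an error when the service finds no match. As a possible alternative, here is a minimal sketch of a single function that fetches both coordinates in one request. The names `first_lat_lon`, `geocode`, and `NOMINATIM_URL`, and the `user_agent` value, are assumptions made for this illustration; the Nominatim usage policy asks clients to identify themselves, so replace the placeholder with something that describes your own project.

```python
import requests

NOMINATIM_URL = 'https://nominatim.openstreetmap.org/search'

def first_lat_lon(results):
    """Return (lat, lon) as floats from a Nominatim JSON result list,
    or (None, None) when the list is empty."""
    if not results:
        return (None, None)
    best = results[0]
    return (float(best['lat']), float(best['lon']))

def geocode(place, user_agent='my-class-notebook-example'):
    """Query Nominatim once and return (lat, lon) for the best match.
    The user_agent default is a placeholder, not an official value."""
    response = requests.get(
        NOMINATIM_URL,
        params={'q': place, 'format': 'json', 'limit': 1},
        headers={'User-Agent': user_agent},
        timeout=10,
    )
    response.raise_for_status()
    return first_lat_lon(response.json())

# Offline check of the parsing step, using the values shown for Washington, DC
sample = [{'lat': '38.8950092', 'lon': '-77.0365625'}]
print(first_lat_lon(sample))  # (38.8950092, -77.0365625)
print(first_lat_lon([]))      # (None, None)
```

Calling `geocode('Washington, DC')` would contact the live service, so its exact result depends on the current OpenStreetMap data.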

In [91]:
# Example of how getLat works
getLat('Washington, DC')

'38.8950092'

In [92]:
# Example of how getLon works
getLon('Washington, DC')

'-77.0365625'

### Before applying the lambda functions to the clean Twitter data, we verify that they work on a simple DataFrame

In [93]:
# We build the DataFrame
U_Games={'year':[2016, 2016, 2017, 2017, 2018, 2018, 2019],
        'semester': ['Spring', 'fall', 'Spring', 'fall', 'Spring', 'fall', 'Spring'],
        'location': ['Washington, DC', 'Kenosha, WI', 'Malibu', 'New York, NY', 'Washington, DC', 'Kenosha, WI', 'Malibu']}
df1 = pd.DataFrame(data=U_Games)
df1

Unnamed: 0,year,semester,location
0,2016,Spring,"Washington, DC"
1,2016,fall,"Kenosha, WI"
2,2017,Spring,Malibu
3,2017,fall,"New York, NY"
4,2018,Spring,"Washington, DC"
5,2018,fall,"Kenosha, WI"
6,2019,Spring,Malibu


In [94]:
# We apply the lambda function getLat to the column location
df1['location'].apply(getLat)

# an alternative way to do the same thing
#df1['location'].map(getLat)

0    38.8950092
1    42.5846773
2     34.035591
3    40.7308619
4    38.8950092
5    42.5846773
6     34.035591
Name: location, dtype: object

In [95]:
# We add a column with the latitude
df1['lat']=df1['location'].apply(getLat)
df1

Unnamed: 0,year,semester,location,lat
0,2016,Spring,"Washington, DC",38.8950092
1,2016,fall,"Kenosha, WI",42.5846773
2,2017,Spring,Malibu,34.035591
3,2017,fall,"New York, NY",40.7308619
4,2018,Spring,"Washington, DC",38.8950092
5,2018,fall,"Kenosha, WI",42.5846773
6,2019,Spring,Malibu,34.035591


In [96]:
# We also add a column with the longitude
df1['lon']=df1['location'].apply(getLon)
df1

Unnamed: 0,year,semester,location,lat,lon
0,2016,Spring,"Washington, DC",38.8950092,-77.0365625
1,2016,fall,"Kenosha, WI",42.5846773,-87.8212263
2,2017,Spring,Malibu,34.035591,-118.689423
3,2017,fall,"New York, NY",40.7308619,-73.9871558
4,2018,Spring,"Washington, DC",38.8950092,-77.0365625
5,2018,fall,"Kenosha, WI",42.5846773,-87.8212263
6,2019,Spring,Malibu,34.035591,-118.689423
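Note that the same locations appear several times, and `apply` sends a new request for every row and for each of the two columns. One way to reduce the number of requests is to geocode each distinct location once and merge the result back. The sketch below illustrates the idea under some assumptions: `geocode_unique`, `fake_geocode`, `known`, and `games` are names invented for this example, and the stand-in geocoder returns the coordinates shown in the output above so that the code runs without network access. The pause between calls reflects the Nominatim usage policy of at most one request per second.

```python
import time
import pandas as pd

def geocode_unique(locations, geocode_fn, pause_seconds=1.0):
    """Geocode each distinct location once.
    geocode_fn must take a place name and return a (lat, lon) pair."""
    rows = []
    for place in locations.dropna().unique():
        lat, lon = geocode_fn(place)
        rows.append({'location': place, 'lat': lat, 'lon': lon})
        time.sleep(pause_seconds)  # be polite to the geocoding service
    return pd.DataFrame(rows, columns=['location', 'lat', 'lon'])

# Stand-in geocoder (an assumption for this sketch): looks up known values
# and records every call so we can count them
known = {'Washington, DC': (38.8950092, -77.0365625),
         'Kenosha, WI': (42.5846773, -87.8212263),
         'Malibu': (34.035591, -118.689423)}
calls = []

def fake_geocode(place):
    calls.append(place)
    return known[place]

games = pd.DataFrame({'year': [2016, 2016, 2017, 2018],
                      'location': ['Washington, DC', 'Kenosha, WI',
                                   'Malibu', 'Washington, DC']})

lookup = geocode_unique(games['location'], fake_geocode, pause_seconds=0)
games = games.merge(lookup, on='location', how='left')

print(len(calls))  # 3 lookups for 4 rows
print(games)
```

To use it with the live service, pass a function that returns a `(lat, lon)` pair for a place name and keep the default pause.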


### Applying the lambda functions to the clean Twitter data does not work ... we need more cleaning

In [None]:
# If we try to apply the getLat function to the Twitter data, we get an error
df['location'].apply(getLat)

In [98]:
# Something is wrong with the data: with some exploration we find two rows that cannot be geocoded
# Here is an example of the exploration procedure: rows 1 to 13 work, so the problem is in row 14
df[1:14]['location'].apply(getLat)
df[1:15]['location']

1      Washington, DC
2      Washington, DC
3         Kenosha, WI
4              Malibu
5        New York, NY
6      Washington, DC
7                 USA
8                 USA
9        New York, NY
10       New York, NY
11     Washington, DC
12    Los Angeles, CA
13                USA
14       Peace Within
Name: location, dtype: object
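Exploring slices by hand works, but it becomes tedious on larger files. A hedged alternative is to try every row and collect the labels of those that fail. In the sketch below, `find_ungeocodable` and `fake_getLat` are hypothetical names; the stand-in mimics how `getLat` behaves when the service returns an empty list (indexing `[0]` raises an `IndexError`), and the small sample Series is made up apart from the 'Peace Within' entry seen above.

```python
import pandas as pd

def find_ungeocodable(locations, geocode_fn):
    """Return the index labels of the locations for which geocode_fn
    raises IndexError or KeyError (no usable result)."""
    failed = []
    for label, place in locations.items():
        try:
            geocode_fn(place)
        except (IndexError, KeyError):
            failed.append(label)
    return failed

# Stand-in for getLat (an assumption for this sketch): an empty result
# list for 'Peace Within' makes [0] fail, as it would with the real lambda
def fake_getLat(place):
    results = [] if place == 'Peace Within' else [{'lat': '0.0'}]
    return results[0]['lat']

sample = pd.Series(['USA', 'Peace Within', 'Malibu'], index=[13, 14, 15])

bad_rows = find_ungeocodable(sample, fake_getLat)
print(bad_rows)  # [14]

# The failing labels can then be dropped in one step
cleaned = sample.drop(bad_rows)
print(list(cleaned.index))  # [13, 15]
```

With the real data, `find_ungeocodable(df['location'], getLat)` would send one request per row, so it is worth running only once and reusing the returned list.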

In [99]:
# We drop rows 14 and 52, the two rows that cannot be geocoded
df=df.drop([14,52])

### Finally we can apply the lambda functions to our data

In [None]:
df['lat']=df['location'].apply(getLat)
df['lon']=df['location'].apply(getLon)

In [103]:
df.head()

Unnamed: 0.1,Unnamed: 0,screen_name,name,followers_count,location,lat,lon
0,0,TuckerCarlson,Tucker Carlson,1064787,"Washington, DC",38.8950092,-77.0365625
1,3,WhiteHouse,The White House,15327283,"Washington, DC",38.8950092,-77.0365625
2,5,KellyannePolls,Kellyanne Conway,1664393,"Washington, DC",38.8950092,-77.0365625
3,6,Reince,Reince Priebus,941927,"Kenosha, WI",42.5846773,-87.8212263
4,10,RealRomaDowney,Roma Downey,192126,Malibu,34.035591,-118.689423


### We then save the data to a CSV file so we can import it into QGIS

In [104]:
# Write the DataFrame directly to a CSV file, without the row labels
df.to_csv("geocodedTwitterData.csv", index=False)