# Geolocation Data

Geospatial information is data referenced by spatial or geographic coordinates. This lesson works with vector data: features represented by points, lines, and polygons.
 - **Points** are defined by a pair of (x,y) coordinates. They usually represent locations, place names, and other objects on the ground.
 - **Lines** connect two or more points. They can have properties such as length, direction, and flow.
 - **Polygons** are a series of lines connected to form a closed shape. They can have properties such as area, perimeter, and centroid.
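
As a minimal sketch of these three feature types, the cell below builds one of each with shapely (the geometry library that geopandas relies on). The coordinates are made up for illustration and are not taken from any dataset in this lesson.

```python
from shapely.geometry import Point, LineString, Polygon

# a point is a single (x, y) coordinate pair
pt = Point(1.0, 2.0)

# a line connects two or more points (illustrative coordinates)
line = LineString([(0, 0), (3, 4)])

# a polygon is a closed ring of connected lines; here a 4 x 3 rectangle
poly = Polygon([(0, 0), (4, 0), (4, 3), (0, 3)])

print(pt.x, pt.y)      # coordinates of the point
print(line.length)     # length of the line
print(poly.area)       # area of the polygon
print(poly.length)     # for a polygon, length is the perimeter
print(poly.centroid)   # centroid is returned as a Point
```

The units of length and area are whatever units the coordinates use, so values computed on raw latitude/longitude degrees are not distances in meters.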
 
This notebook requires the [geopandas](https://anaconda.org/conda-forge/geopandas) and [GeoPy](https://anaconda.org/conda-forge/geopy) libraries, so install both before running the cells. Also, download [this text file](https://automating-gis-processes.github.io/CSC18/_static/data/L3/addresses.txt) to use in the example code below.

In [1]:
import pandas as pd
import geopandas as gpd #used for transforming geolocation data
import matplotlib.pyplot as plt

from datetime import datetime  #parse dates that fall outside the range pandas.to_datetime can handle
from shapely.geometry import Point  #transform latitude/longitude to geo-coordinate data
from geopandas.tools import geocode #get the latitude/longitude for a given address
from geopandas.tools import reverse_geocode  #get the address for a location using latitude/longitude

%matplotlib inline

### Geocoding and Reverse Geocoding

Geocoding takes the address of a location and returns its latitude and longitude. Reverse geocoding does the opposite: it takes the latitude and longitude of a location and returns the physical address.

In [2]:
#take an address and return coordinates
#returned variable is a geo-dataframe with 2 columns, geometry (the geographical shape) and the full physical address
ex1_geo = geocode("12 South Summit Avenue, Gaithersburg, MD", provider='nominatim')
ex1_geo



Unnamed: 0,geometry,address
0,POINT (-77.19392100403969 39.14089855),"12, South Summit Avenue, Gaithersburg, Montgom..."


In [3]:
#structure of full address
#each API structures addresses differently
ex1_geo['address'].iloc[0]

'12, South Summit Avenue, Gaithersburg, Montgomery County, Maryland, 20877, USA'
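
Because the address comes back as one comma-separated string, individual parts can be pulled out by splitting it. The sketch below assumes the layout shown in the output above (most specific part first, country last); other providers, and other addresses from the same provider, may order or omit fields differently, so the positions used here are an assumption, not a rule.

```python
# address string copied from the output above
address = '12, South Summit Avenue, Gaithersburg, Montgomery County, Maryland, 20877, USA'

# split on commas and trim surrounding whitespace
parts = [part.strip() for part in address.split(',')]

# positions counted from the end; valid only for this particular layout
country = parts[-1]
postcode = parts[-2]
state = parts[-3]

print(country, postcode, state)
```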

In [4]:
#use latitude and longitude to get physical address
#pass the coordinates as a Point geometry, in (longitude, latitude) order
#also returns geo-dataframe with geometry and full physical address
ex2_geo = reverse_geocode([Point(-77.15879730243169, 39.0985195)], provider='nominatim')
ex2_geo

Unnamed: 0,geometry,address
0,POINT (-77.15879730243169 39.0985195),"Montgomery College, 51, Mannakee Street, Westm..."
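
Note that `Point` takes x first and y second, so longitude comes before latitude. The sketch below makes that order explicit with a small helper; `point_from_lat_lon` is a hypothetical name introduced here for illustration and is not part of shapely or geopandas.

```python
from shapely.geometry import Point

def point_from_lat_lon(latitude, longitude):
    """Hypothetical helper: build a Point from values given in (lat, lon) order."""
    # shapely expects (x, y), which is (longitude, latitude)
    return Point(longitude, latitude)

# coordinates from the reverse geocoding example above
campus = point_from_lat_lon(39.0985195, -77.15879730243169)

print(campus.x)  # longitude
print(campus.y)  # latitude
```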


#### Geocode a Dataframe column

In [5]:
#dataset of addresses in Finland
location = "datasets/addresses.txt"

#load data into dataframe
#separator between values in the file is a semicolon
finland_df = pd.read_csv(location, sep=";")
finland_df.head()

Unnamed: 0,id,addr
0,1000,"Itämerenkatu 14, 00101 Helsinki, Finland"
1,1001,"Kampinkuja 1, 00100 Helsinki, Finland"
2,1002,"Kaivokatu 8, 00101 Helsinki, Finland"
3,1003,"Hermannin rantatie 1, 00580 Helsinki, Finland"
4,1005,"Tyynenmerenkatu 9, 00220 Helsinki, Finland"


In [6]:
#geocode an entire column in a dataframe
geo_addr = geocode(finland_df['addr'], provider='nominatim')

In [7]:
#first 5 rows of geo-dataframe of Finland addresses
geo_addr.head()

Unnamed: 0,geometry,address
0,POINT (24.9155624 60.1632015),"Ruoholahti, 14, Itämerenkatu, Ruoholahti, Läns..."
1,POINT (24.9316914 60.1690222),"Kamppi, 1, Kampinkuja, Kamppi, Eteläinen suurp..."
2,POINT (24.9416849 60.1699637),"Bangkok9, 8, Kaivokatu, Keskusta, Kluuvi, Etel..."
3,POINT (24.9733884 60.1961621),"Hermannin rantatie, Kyläsaari, Hermanni, Helsi..."
4,POINT (24.9216003 60.1566475),"Hesburger, 9, Tyynenmerenkatu, Jätkäsaari, Län..."


In [8]:
#add the geometry column (coordinates) to the original dataframe 
finland_df['geo_addr'] = geo_addr['geometry']
finland_df.head()

Unnamed: 0,id,addr,geo_addr
0,1000,"Itämerenkatu 14, 00101 Helsinki, Finland",POINT (24.9155624 60.1632015)
1,1001,"Kampinkuja 1, 00100 Helsinki, Finland",POINT (24.9316914 60.1690222)
2,1002,"Kaivokatu 8, 00101 Helsinki, Finland",POINT (24.9416849 60.1699637)
3,1003,"Hermannin rantatie 1, 00580 Helsinki, Finland",POINT (24.9733884 60.1961621)
4,1005,"Tyynenmerenkatu 9, 00220 Helsinki, Finland",POINT (24.9216003 60.1566475)
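
If separate numeric columns are more convenient than `Point` objects, the x and y values can be read back out of each point. This is a minimal sketch on a small stand-in dataframe built from two of the coordinates shown above; `sample_df` and the new column names are illustrative.

```python
import pandas as pd
from shapely.geometry import Point

# stand-in for the geocoded dataframe, using coordinates from the output above
sample_df = pd.DataFrame({"id": [1000, 1001]})
coords = pd.Series([(24.9155624, 60.1632015), (24.9316914, 60.1690222)])
sample_df["geo_addr"] = coords.apply(Point)

# x is the longitude and y is the latitude of each point
sample_df["longitude"] = sample_df["geo_addr"].apply(lambda p: p.x)
sample_df["latitude"] = sample_df["geo_addr"].apply(lambda p: p.y)

print(sample_df[["id", "longitude", "latitude"]])
```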


### NASA Meteorite Landings

At the end of your notebook for the NASA Meteorite data exercise, add a new cell with the code `df.to_csv("datasets/NASAmeteorite.csv")`. Run that cell to create a csv file of your meteorite data to use in the following examples. Then [download the `continents.json` GeoJSON file](https://notebooks.azure.com/priesterkc/projects/DABmaterial/tree/Lv2%20Data%20Analytics) for the world map that will be charted.

In [13]:
#load meteorite data collected from NASA Open Data API
meteor_df = pd.read_csv('datasets/NASAmeteorite.csv')
meteor_df.head()

Unnamed: 0,id,year,fall,name,name_type,mass,latitude,longitude,type
0,1,1880-01-01T00:00:00.000,Fell,Aachen,Valid,21.0,50.775,6.08333,Point
1,2,1951-01-01T00:00:00.000,Fell,Aarhus,Valid,720.0,56.18333,10.23333,Point
2,6,1952-01-01T00:00:00.000,Fell,Abee,Valid,107000.0,54.21667,-113.0,Point
3,10,1976-01-01T00:00:00.000,Fell,Acapulco,Valid,1914.0,16.88333,-99.9,Point
4,370,1902-01-01T00:00:00.000,Fell,Achiras,Valid,780.0,-33.16667,-64.95,Point


In [None]:
#data type of each column
meteor_df.dtypes

In [None]:
#keep only the rows with non-null year values
#.copy() makes an independent dataframe so the assignment below does not modify a slice
meteor_df = meteor_df.loc[meteor_df['year'].notnull()].copy()

#change year column into a string
#the getYear function below expects a string
meteor_df['year'] = meteor_df['year'].astype(str)

In [None]:
#function to split apart the date from the timestamp
def getYear(col):
    #get YYYY-MM-DD value
    date = col.split("T")[0]
    
    #extract year from date
    dt = datetime.strptime(date, '%Y-%m-%d')
    return dt.year
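
To see why this approach uses `datetime` rather than pandas, recall that pandas nanosecond timestamps only cover roughly the years 1677 to 2262, while `datetime` accepts any year from 1 to 9999. The sketch below restates the same logic as a standalone function (`get_year` is an illustrative name) and runs it on two sample strings; the second, very early date is a made-up value in the same format.

```python
from datetime import datetime

def get_year(timestamp):
    """Return the year from a 'YYYY-MM-DDTHH:MM:SS.fff' style string."""
    # keep only the YYYY-MM-DD part before the "T"
    date = timestamp.split("T")[0]
    return datetime.strptime(date, "%Y-%m-%d").year

recent = get_year("1880-01-01T00:00:00.000")
early = get_year("0861-01-01T00:00:00.000")  # hypothetical early date

print(recent, early)
```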

In [None]:
#replace the year timestamp data with only the year (using the getYear function)
meteor_df['year'] = meteor_df['year'].apply(getYear)
meteor_df.head()

In [None]:
#see columns with null values
meteor_df.count()

In [None]:
#only include rows with non-null latitudes (which means longitude is also not null) and non-null mass
#.copy() avoids a SettingWithCopyWarning when new columns are added later
meteor_df = meteor_df.loc[meteor_df['latitude'].notnull() & meteor_df['mass'].notnull()].copy()
meteor_df.count()
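
Non-null does not necessarily mean valid. As a hedged sketch, the cell below also drops coordinates outside the valid latitude/longitude ranges and rows sitting exactly at (0, 0). Treating (0, 0) as a placeholder is an assumption made for this example; check the data before applying it. The `sample` dataframe holds made-up rows.

```python
import pandas as pd

# made-up rows: one valid, one at (0, 0), one out of range, one valid
sample = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "latitude": [50.775, 0.0, 95.0, -33.16667],
    "longitude": [6.08333, 0.0, 10.0, -64.95],
})

# latitude must be within [-90, 90] and longitude within [-180, 180]
in_range = sample["latitude"].between(-90, 90) & sample["longitude"].between(-180, 180)

# assumption: a row at exactly (0, 0) is a placeholder, not a real location
not_origin = ~((sample["latitude"] == 0) & (sample["longitude"] == 0))

clean = sample.loc[in_range & not_origin].copy()
print(clean)
```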

In [None]:
#make a new column to hold the longitude & latitude as a list
meteor_df['coordinates'] = list(meteor_df[['longitude', 'latitude']].values)

In [None]:
#see new coordinates column
meteor_df.head()

In [None]:
#the values in the coordinates column are stored as the generic object type
meteor_df['coordinates'].dtypes

In [None]:
#convert the coordinates to a geolocation type
meteor_df['coordinates'] = meteor_df['coordinates'].apply(Point)

In [None]:
#coordinates column now has POINT next to each coordinate pair value
meteor_df.head()

In [None]:
#coordinates column with geolocation data is just a regular pandas Series type
type(meteor_df['coordinates'])

In [None]:
#create a geolocation dataframe type using the coordinates column as the geolocation data
geo_meteor = gpd.GeoDataFrame(meteor_df, geometry='coordinates')
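
An alternative sketch of the same conversion uses `gpd.points_from_xy`, which builds the points directly from the longitude and latitude columns, and sets a coordinate reference system at the same time. The `crs="EPSG:4326"` value assumes the coordinates are plain WGS84 degrees, which the source data does not state explicitly. The `sample` rows are copied from the first two rows of the output above.

```python
import pandas as pd
import geopandas as gpd

# first two rows of the meteorite data shown earlier
sample = pd.DataFrame({
    "name": ["Aachen", "Aarhus"],
    "latitude": [50.775, 56.18333],
    "longitude": [6.08333, 10.23333],
})

# x values (longitude) first, then y values (latitude)
geo_sample = gpd.GeoDataFrame(
    sample,
    geometry=gpd.points_from_xy(sample["longitude"], sample["latitude"]),
    crs="EPSG:4326",  # assumption: coordinates are WGS84 degrees
)

print(geo_sample.crs)
print(geo_sample.geometry.head())
```

Setting the CRS matters when layers are overlaid: points and a base map only line up if both use the same coordinate reference system.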

In [None]:
#geo-dataframe looks the same as regular dataframe
geo_meteor.head()

In [None]:
#verify coordinates column is geolocation data type
type(geo_meteor['coordinates'])

In [None]:
#import the file that contains the world map polygons
#will use to plot the coordinates of meteorite landings
filepath = "datasets/continents.json"

#data contains polygon shape coordinates for different map body types (continents, etc.)
map_df = gpd.read_file(filepath)
map_df.head()

In [None]:
#map graph
map_df.plot()

In [None]:
#plot the coordinates (no map)
geo_meteor.plot()

In [None]:
#plot coordinates on top of map graph

#create the figure and axes; figsize sets the figure size in inches
fig, ax = plt.subplots(1, figsize=(15,10))

#this is the map
basemap = map_df.plot(ax=ax)

#plot coordinates on top of map graph
geo_meteor.plot(ax=basemap, color='darkred', marker=".", markersize=10)

#take off axis numbers
ax.axis('off')

#put title on map
ax.set_title("NASA Meteorite Landings", fontsize=25, fontweight=3)