# Introduction

In this tutorial, you'll learn about two common manipulations for geospatial data: **geocoding** and **spatial joins**.

# Geocoding

**Geocoding** is the process of converting the name of a place or an address to a location on a map.

![](https://i.imgur.com/1IrgZQq.png)

Many providers offer geocoding services, including Google, Bing, and Yahoo.  To avoid having to create an API key, we'll use the `geocode` function from geopandas with a provider that doesn't require one.  We begin by importing the function.

In [None]:
from geopandas.tools import geocode

Then, to use the geocoder, we need only provide: 
- the name or address as a Python string, and
- the name of the provider; in this case, we use the [OpenStreetMap Nominatim geocoder](https://nominatim.openstreetmap.org/).

If the geocoding is successful, it returns a GeoDataFrame with two columns:
- the "geometry" column contains the location as a `Point`, with longitude stored as `x` and latitude as `y`, and
- the "address" column contains the full address.

In [None]:
result = geocode("1600 Pennsylvania Avenue, Washington, DC", provider="nominatim")
result

The entry in the "geometry" column is a `Point` object, and we can get the latitude and longitude from the `y` and `x` attributes, respectively.

In [None]:
point = result.geometry.iloc[0]
print("Latitude:", point.y)
print("Longitude:", point.x)
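
Coordinate order is easy to mix up: a `Point` stores `x` (longitude) first and `y` (latitude) second, while mapping libraries such as folium expect `[latitude, longitude]`. The sketch below shows one way to make the conversion explicit. The helper name `to_lat_lon` is hypothetical, and a `SimpleNamespace` stands in for a real `Point` so the snippet runs without any geospatial libraries. The coordinates are approximate and used only for illustration.

```python
from types import SimpleNamespace

def to_lat_lon(point):
    """Return (latitude, longitude) from an object with x/y attributes."""
    lat, lon = point.y, point.x
    # Reject values outside the valid ranges; this catches obviously
    # invalid input, though not every swapped pair
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError(f"Coordinates out of range: lat={lat}, lon={lon}")
    return lat, lon

# Stand-in for a shapely Point (assumption): x is longitude, y is latitude
example_point = SimpleNamespace(x=-77.0365, y=38.8977)
print(to_lat_lon(example_point))
```

Keeping the conversion in one small function means the order only has to be right in one place.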

It's often the case that we'll need to geocode many different addresses.  For instance, say we want to obtain the locations of the top 100 universities in Europe.

In [None]:
import pandas as pd

universities = pd.read_csv("../input/geospatial-course-data/top_universities.csv")
universities.head()

Then we can use a lambda function to apply the geocoder to every row in the DataFrame.  (We use a try/except statement to account for the case that the geocoding is unsuccessful.)

In [None]:
import numpy as np

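def geo_locate(name):
    # Geocode a single place name; return NaN coordinates if the lookup fails
    try:
        point = geocode(name, provider='nominatim').geometry.iloc[0]
        return pd.Series({'Latitude': point.y, 'Longitude': point.x})
    except Exception:
        # Returning a Series with the same keys keeps the output shape consistent
        return pd.Series({'Latitude': np.nan, 'Longitude': np.nan})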

universities[['Latitude', 'Longitude']] = universities.apply(lambda x: geo_locate(x['Name']), axis=1)

print("{}% of addresses were geocoded!".format(
    (1 - sum(np.isnan(universities["Latitude"])) / len(universities)) * 100))
universities.head()
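
The pattern above (try the lookup, record a missing value on failure, then report the success rate) can be sketched independently of any particular geocoder. In the minimal example below, `geocode_all` and `fake_geocoder` are hypothetical names, and the stand-in geocoder returns made-up coordinates for two placeholder names. The `delay` parameter is an assumption, added because public services such as Nominatim limit how quickly requests may be sent.

```python
import time

def geocode_all(names, geocoder, delay=0.0):
    """Map each name to (lat, lon), or None when the lookup fails."""
    results = {}
    for name in names:
        try:
            results[name] = geocoder(name)
        except Exception:
            # Record the failure instead of stopping the whole batch
            results[name] = None
        time.sleep(delay)  # pause between requests to respect rate limits
    return results

def fake_geocoder(name):
    """Hypothetical stand-in for a real geocoding call."""
    known = {"Place A": (51.75, -1.25), "Place B": (52.2, 0.12)}
    return known[name]  # raises KeyError for unknown names

locations = geocode_all(["Place A", "Place B", "Nowhere"], fake_geocoder)
success_rate = 100 * sum(v is not None for v in locations.values()) / len(locations)
print(f"{success_rate:.1f}% of names were geocoded")
```

Passing the geocoder in as an argument makes the failure handling easy to check without network access. With a real service, a non-zero `delay` would likely be needed.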

Next, we visualize all of the locations that were returned by the geocoder.  Notice that a few of the locations are certainly inaccurate, as they're not in Europe!

In [None]:
# Function for displaying the map
def embed_map(m, file_name):
    from IPython.display import IFrame
    m.save(file_name)
    return IFrame(file_name, width='100%', height='500px')

In [None]:
import folium

# Create a map
m_1 = folium.Map(location=[54, 15], tiles='openstreetmap', zoom_start=2)

# Add points to the map
for idx, row in universities.iterrows():
    if pd.notna(row['Latitude']):
        folium.Marker([row['Latitude'], row['Longitude']], popup=row['Name']).add_to(m_1)

# Display the map
embed_map(m_1, 'm_1.html')
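
One simple way to flag the out-of-Europe results noted above is a bounding-box check on the coordinates. The sketch below is only illustrative: the function name `in_rough_europe_box` is hypothetical, and the box limits are a rough assumption rather than an authoritative boundary, so points near the edges may be misclassified.

```python
# Rough bounding box for Europe (an assumption for illustration only)
LAT_MIN, LAT_MAX = 34.0, 72.0
LON_MIN, LON_MAX = -25.0, 45.0

def in_rough_europe_box(lat, lon):
    """Return True if (lat, lon) falls inside the rough bounding box."""
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX

# Approximate coordinates, used only as examples
print(in_rough_europe_box(48.86, 2.35))    # near Paris
print(in_rough_europe_box(40.71, -74.01))  # near New York City
```

A check like this could be applied to the `Latitude` and `Longitude` columns to list suspicious rows before plotting them.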

# Spatial join