Last Updated: 7-29-2017

# Table of Contents
- [1 Merging Data](#Merging-Data)
  - [1.1 Attribute Joins](#Attribute-Joins)
  - [1.2 Spatial Joins](#Spatial-Joins)

# Merging Data

- There are two ways to combine datasets in geopandas - attribute joins and spatial joins.
 
- In an attribute join, a ```GeoSeries``` or ```GeoDataFrame``` is combined with a regular pandas ```Series``` or ```DataFrame``` based on a common variable. This is analogous to normal merging or joining in pandas.

- In a spatial join, observations from two ```GeoSeries``` or ```GeoDataFrame``` objects are combined based on their spatial relationship to one another.

- In the following examples, we use these datasets:



In [1]:
import geopandas as gpd
import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib
matplotlib.style.use('ggplot')
matplotlib.rcParams['figure.figsize'] = (16, 20)
from shapely.geometry import Point, Polygon

In [2]:
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
cities = gpd.read_file(gpd.datasets.get_path('naturalearth_cities'))

In [3]:
# For attribute join
country_shapes = world[['geometry', 'iso_a3']]
country_names = world[['name', 'iso_a3']]

In [4]:
country_shapes.head()

|   | geometry | iso_a3 |
|---|---|---|
| 0 | POLYGON ((61.21081709172574 35.65007233330923,... | AFG |
| 1 | (POLYGON ((16.32652835456705 -5.87747039146621... | AGO |
| 2 | POLYGON ((20.59024743010491 41.85540416113361,... | ALB |
| 3 | POLYGON ((51.57951867046327 24.24549713795111,... | ARE |
| 4 | (POLYGON ((-65.50000000000003 -55.199999999999... | ARG |


In [5]:
country_names.head()

|   | name | iso_a3 |
|---|---|---|
| 0 | Afghanistan | AFG |
| 1 | Angola | AGO |
| 2 | Albania | ALB |
| 3 | United Arab Emirates | ARE |
| 4 | Argentina | ARG |


In [6]:
# For Spatial Join
countries = world[['geometry', 'name']]

countries = countries.rename(columns={'name': 'country'})

## Attribute Joins

- Attribute joins are accomplished using the ```merge``` method. In general, it is recommended to use the ```merge``` method called from the spatial dataset.

- With that said, the standalone ```merge``` function will work if the GeoDataFrame is in the ```left``` argument; if a DataFrame is in the ```left``` argument and a GeoDataFrame is in the ```right``` position, the result will no longer be a GeoDataFrame.
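- The difference between the two call styles can be checked on toy data. The following is a minimal sketch, not taken from the datasets above: the frames ```shapes``` and ```names``` and their values are hypothetical, and the standalone function is assumed to be ```pandas.merge```.

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical toy data: two point geometries keyed by an ISO-like code.
shapes = gpd.GeoDataFrame(
    {"iso_a3": ["AAA", "BBB"]},
    geometry=[Point(0, 0), Point(1, 1)],
)
names = pd.DataFrame({"iso_a3": ["AAA", "BBB"], "name": ["Alpha", "Beta"]})

# Method called from the spatial dataset: the result stays a GeoDataFrame.
via_method = shapes.merge(names, on="iso_a3")

# Standalone function with the GeoDataFrame on the left: still a GeoDataFrame.
geo_left = pd.merge(shapes, names, on="iso_a3")

# Standalone function with the plain DataFrame on the left: the result is a
# plain DataFrame that merely carries a geometry column.
geo_right = pd.merge(names, shapes, on="iso_a3")

print(type(via_method).__name__)
print(type(geo_left).__name__)
print(type(geo_right).__name__)
```

- If the plain ```DataFrame``` result is what you end up with, wrapping it as ```gpd.GeoDataFrame(geo_right, geometry="geometry")``` should restore the spatial methods.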

In [7]:
country_shapes.head()

|   | geometry | iso_a3 |
|---|---|---|
| 0 | POLYGON ((61.21081709172574 35.65007233330923,... | AFG |
| 1 | (POLYGON ((16.32652835456705 -5.87747039146621... | AGO |
| 2 | POLYGON ((20.59024743010491 41.85540416113361,... | ALB |
| 3 | POLYGON ((51.57951867046327 24.24549713795111,... | ARE |
| 4 | (POLYGON ((-65.50000000000003 -55.199999999999... | ARG |


In [8]:
country_names.head()

|   | name | iso_a3 |
|---|---|---|
| 0 | Afghanistan | AFG |
| 1 | Angola | AGO |
| 2 | Albania | ALB |
| 3 | United Arab Emirates | ARE |
| 4 | Argentina | ARG |


- Merge with the ```merge``` method on the shared variable (the ISO codes in ```iso_a3```):


In [9]:
country_shapes = country_shapes.merge(country_names, on='iso_a3')
country_shapes.head()

|   | geometry | iso_a3 | name |
|---|---|---|---|
| 0 | POLYGON ((61.21081709172574 35.65007233330923,... | AFG | Afghanistan |
| 1 | (POLYGON ((16.32652835456705 -5.87747039146621... | AGO | Angola |
| 2 | POLYGON ((20.59024743010491 41.85540416113361,... | ALB | Albania |
| 3 | POLYGON ((51.57951867046327 24.24549713795111,... | ARE | United Arab Emirates |
| 4 | (POLYGON ((-65.50000000000003 -55.199999999999... | ARG | Argentina |


## Spatial Joins

- In a Spatial Join, two geometry objects are merged based on their spatial relationship to one another.

In [10]:
# One GeoDataFrame of countries, one of Cities.
# Want to merge so we can get each city's country.
countries.head()

|   | geometry | country |
|---|---|---|
| 0 | POLYGON ((61.21081709172574 35.65007233330923,... | Afghanistan |
| 1 | (POLYGON ((16.32652835456705 -5.87747039146621... | Angola |
| 2 | POLYGON ((20.59024743010491 41.85540416113361,... | Albania |
| 3 | POLYGON ((51.57951867046327 24.24549713795111,... | United Arab Emirates |
| 4 | (POLYGON ((-65.50000000000003 -55.199999999999... | Argentina |


In [11]:
cities.head()

|   | geometry | name |
|---|---|---|
| 0 | POINT (12.45338654497177 41.90328217996012) | Vatican City |
| 1 | POINT (12.44177015780014 43.936095834768) | San Marino |
| 2 | POINT (9.516669472907267 47.13372377429357) | Vaduz |
| 3 | POINT (6.130002806227083 49.61166037912108) | Luxembourg |
| 4 | POINT (158.1499743237623 6.916643696007725) | Palikir |


In [12]:
# Execute Spatial Join
cities_with_country = gpd.sjoin(cities, countries, how="inner", op="intersects")
cities_with_country.head()

|   | geometry | name | index_right | country |
|---|---|---|---|---|
| 0 | POINT (12.45338654497177 41.90328217996012) | Vatican City | 79 | Italy |
| 1 | POINT (12.44177015780014 43.936095834768) | San Marino | 79 | Italy |
| 192 | POINT (12.481312562874 41.89790148509894) | Rome | 79 | Italy |
| 2 | POINT (9.516669472907267 47.13372377429357) | Vaduz | 9 | Austria |
| 184 | POINT (16.36469309674374 48.20196113681686) | Vienna | 9 | Austria |
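- The ```op="intersects"``` argument in the join above selects a binary spatial predicate. As a rough illustration of how "intersects", "within" and "contains" behave on individual geometries, the sketch below calls the corresponding shapely methods directly. The square and the two points are made-up values chosen so that one point lies exactly on the boundary.

```python
from shapely.geometry import Point, box

square = box(0, 0, 2, 2)   # polygon with corners (0, 0) and (2, 2)
inside = Point(1, 1)       # strictly inside the square
on_edge = Point(0, 1)      # exactly on the square's boundary

# For an interior point, "intersects" and "within" agree.
print(inside.intersects(square), inside.within(square))    # True True

# A boundary point intersects the square but is not within it.
print(on_edge.intersects(square), on_edge.within(square))  # True False

# "contains" is read from the left geometry: a polygon can contain
# a point, but a point cannot contain a polygon.
print(square.contains(inside), inside.contains(square))    # True False
```

- In a spatial join the predicate is evaluated with the left frame's geometry as the caller, so the order of the two frames matters for "within" and "contains".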


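- The ```op``` option determines the type of join operation to apply and can be set to "intersects", "within" or "contains". For the point-to-polygon join above, "intersects" and "within" give the same result except for points lying exactly on a polygon boundary, while "contains" only produces matches when the containing geometries (here the polygons) are the left argument. The choices differ more widely when joining polygons to other polygons or lines. In geopandas 0.10 and later this argument is named ```predicate```.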

- Note that more complex spatial relationships can be studied by combining geometric operations with a spatial join. To find all polygons within a given distance of a point, for example, first use the ```buffer``` method to expand each point into a circle of the appropriate radius, then intersect those buffered circles with the polygons in question.
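- The buffer-then-join idea can be sketched on toy data as follows. The frames ```sites``` and ```zones```, their column names, and the radius of 2 are hypothetical, and the coordinates are treated as planar, so the radius is in the same units as the coordinates. The call relies on "intersects" being the default join operation, so neither ```op``` nor ```predicate``` is passed.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical points of interest.
sites = gpd.GeoDataFrame(
    {"site": ["a", "b"]},
    geometry=[Point(0, 0), Point(10, 10)],
)

# Hypothetical polygons to search among.
zones = gpd.GeoDataFrame(
    {"zone": ["near", "far"]},
    geometry=[box(1, -1, 3, 1), box(20, 20, 22, 22)],
)

# Step 1: expand each point into a circle of radius 2.
buffered = sites.copy()
buffered["geometry"] = sites.geometry.buffer(2.0)

# Step 2: join the circles to the polygons they intersect.
nearby = gpd.sjoin(buffered, zones, how="inner")

# Each remaining row pairs a site with a zone that lies within the radius.
print(nearby[["site", "zone"]])
```

- With real longitude/latitude data such as the datasets above, the frames would likely need to be reprojected to a projected coordinate system first so that the buffer radius is a meaningful distance.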

