# 5.2 Attribute join and spatial join


<br></br>
<font size="3">There are two ways to combine datasets in `geopandas` – attribute joins and spatial joins.
<br></br>
<br></br>
In an <b>attribute join</b>, a `GeoSeries` or `GeoDataFrame` is combined with a regular pandas `Series` or `DataFrame` based on a <b>common variable</b>. This is analogous to normal merging or joining in pandas. An example would be joining student grades to student information using the student ID as the <b>key</b>.
<br></br>
<br></br>
In a <b>spatial join</b>, observations from two `GeoSeries` or `GeoDataFrame` objects are combined based on their <b>spatial relationship</b> to one another. An example would be finding the zipcode in which a crime incident happened by checking which zipcode polygon contains the incident location. This operation requires more computation than an attribute join, but `geopandas` handles it under the hood.

<br></br>
<br></br>
In the examples below, we will use sample datasets from the `geopandas` library to demonstrate both kinds of join.
</font>
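Before turning to the sample datasets, the student example above can be sketched with plain pandas. This is a minimal illustration only: the table contents and the column names `student_id`, `name`, and `grade` are made up for the sketch.

```python
import pandas as pd

# Hypothetical student tables, made up for illustration
students = pd.DataFrame({"student_id": [1, 2, 3],
                         "name": ["Ana", "Ben", "Chloe"]})
grades = pd.DataFrame({"student_id": [1, 2, 3],
                       "grade": ["A", "B", "A"]})

# Attribute join: rows are matched on the shared key column
report = students.merge(grades, on="student_id")
print(report)
```

An attribute join in `geopandas` follows the same pattern, except that one of the two tables also carries a geometry column.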

In [8]:
import geopandas
%matplotlib inline

## Attribute Joins
<br></br>
<font size="3">Attribute joins are accomplished using the `merge` method. In general, it is recommended to call `merge` as a method of the spatial dataset. That said, the stand-alone `merge` function will also work if the GeoDataFrame is the left argument; if a DataFrame is the left argument and a GeoDataFrame is the right argument, the result will no longer be a GeoDataFrame.
<br></br>
<br></br>
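The effect of argument order can be checked on a small example. The sketch below uses toy data made up for illustration (the `code` and `name` columns and their values are assumptions, not part of the sample datasets) and simply prints the type of each result.

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point

# Toy data, made up for illustration
shapes = gpd.GeoDataFrame({"code": ["AAA", "BBB"]},
                          geometry=[Point(0, 0), Point(1, 1)])
names = pd.DataFrame({"code": ["AAA", "BBB"],
                      "name": ["Aland", "Bland"]})

from_method = shapes.merge(names, on="code")    # method called on the GeoDataFrame
geo_left = pd.merge(shapes, names, on="code")   # GeoDataFrame as the left argument
geo_right = pd.merge(names, shapes, on="code")  # DataFrame as the left argument

for label, result in [("from_method", from_method),
                      ("geo_left", geo_left),
                      ("geo_right", geo_right)]:
    print(label, type(result).__name__)

# A plain DataFrame that still holds a geometry column can be converted back
restored = gpd.GeoDataFrame(geo_right, geometry="geometry")
```

If the spatial type is lost, wrapping the result in `GeoDataFrame` again, as in the last line, is one way to recover it.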






<font size="3">For example, consider the following merge that adds full names to a GeoDataFrame that initially has only ISO codes for each country by merging it with a pandas DataFrame.</font> 


In [9]:
world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))

country_shapes = world[['geometry', 'iso_a3']]
country_names = world[['name', 'iso_a3']]

<font size="3">The first table, `country_shapes`, contains the geometry and ISO code of each country. </font> 

In [10]:
country_shapes.head()

|   | geometry | iso_a3 |
|---|---|---|
| 0 | (POLYGON ((180 -16.06713266364245, 180 -16.555... | FJI |
| 1 | POLYGON ((33.90371119710453 -0.950000000000000... | TZA |
| 2 | POLYGON ((-8.665589565454809 27.65642588959236... | ESH |
| 3 | (POLYGON ((-122.84 49.00000000000011, -122.974... | CAN |
| 4 | (POLYGON ((-122.84 49.00000000000011, -120 49.... | USA |


<font size="3">The second table, `country_names`, contains the name and ISO code of each country. </font> 

In [11]:
country_names.head()

|   | name | iso_a3 |
|---|---|---|
| 0 | Fiji | FJI |
| 1 | Tanzania | TZA |
| 2 | W. Sahara | ESH |
| 3 | Canada | CAN |
| 4 | United States of America | USA |


<font size="3"> Let's merge the two tables, `country_shapes` and `country_names`, using the `merge` method on their shared variable (the ISO code column `iso_a3`). The resulting table has both the name and the geometry of each country. </font> 

In [12]:
country_merged = country_shapes.merge(country_names, on='iso_a3')
country_merged.head()

|   | geometry | iso_a3 | name |
|---|---|---|---|
| 0 | (POLYGON ((180 -16.06713266364245, 180 -16.555... | FJI | Fiji |
| 1 | POLYGON ((33.90371119710453 -0.950000000000000... | TZA | Tanzania |
| 2 | POLYGON ((-8.665589565454809 27.65642588959236... | ESH | W. Sahara |
| 3 | (POLYGON ((-122.84 49.00000000000011, -122.974... | CAN | Canada |
| 4 | (POLYGON ((-122.84 49.00000000000011, -120 49.... | USA | United States of America |


## Spatial Joins
<br></br>
<font size="3"> Spatial joins combine datasets based on their geometries. In this example, we have a list of cities whose coordinates we know, but we don't know which country each city lies in. A spatial join can answer this question. The `geopandas` function for it is `sjoin`.
<br></br>
<br></br>






In [13]:
cities = geopandas.read_file(geopandas.datasets.get_path('naturalearth_cities'))

In [16]:
cities.head()

|   | name | geometry |
|---|---|---|
| 0 | Vatican City | POINT (12.45338654497177 41.90328217996012) |
| 1 | San Marino | POINT (12.44177015780014 43.936095834768) |
| 2 | Vaduz | POINT (9.516669472907267 47.13372377429357) |
| 3 | Luxembourg | POINT (6.130002806227083 49.61166037912108) |
| 4 | Palikir | POINT (158.1499743237623 6.916643696007725) |


<br></br>
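<font size="3"> Next, we prepare the country polygons: we keep only the geometry and name columns of `world`, and rename `name` to `country` so that it does not clash with the `name` column of `cities`.</font>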

In [17]:
countries = world[['geometry', 'name']]
countries = countries.rename(columns={'name':'country'})

In [18]:
countries.head()

|   | geometry | country |
|---|---|---|
| 0 | (POLYGON ((180 -16.06713266364245, 180 -16.555... | Fiji |
| 1 | POLYGON ((33.90371119710453 -0.950000000000000... | Tanzania |
| 2 | POLYGON ((-8.665589565454809 27.65642588959236... | W. Sahara |
| 3 | (POLYGON ((-122.84 49.00000000000011, -122.974... | Canada |
| 4 | (POLYGON ((-122.84 49.00000000000011, -120 49.... | United States of America |


<br></br>
<font size="3"> Let's execute the spatial join with the `sjoin()` function.</font>

In [20]:
# Execute spatial join
cities_with_country = geopandas.sjoin(cities, countries, how="inner", op='intersects')
cities_with_country.head()

|   | name | geometry | index_right | country |
|---|---|---|---|---|
| 0 | Vatican City | POINT (12.45338654497177 41.90328217996012) | 141 | Italy |
| 1 | San Marino | POINT (12.44177015780014 43.936095834768) | 141 | Italy |
| 192 | Rome | POINT (12.481312562874 41.89790148509894) | 141 | Italy |
| 2 | Vaduz | POINT (9.516669472907267 47.13372377429357) | 114 | Austria |
| 184 | Vienna | POINT (16.36469309674374 48.20196113681686) | 114 | Austria |


<br></br>
<font size="3"> Each city is now joined with the country whose polygon it intersects, and the `index_right` column records the index of the matching row in `countries`. For example, Vatican City is joined with Italy.</font>
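The same pattern applies to any pair of point and polygon layers. As a minimal, self-contained sketch of the zipcode example from the introduction, the code below uses made-up zones and incidents (the `zipcode` and `incident` columns and all coordinates are assumptions for illustration). It relies on the default join condition of `sjoin`, which is `intersects`.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Two hypothetical zipcode polygons, side by side
zipcodes = gpd.GeoDataFrame({"zipcode": ["10001", "10002"]},
                            geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)])

# Three hypothetical incidents; the last one falls outside both polygons
incidents = gpd.GeoDataFrame({"incident": ["theft", "fraud", "arson"]},
                             geometry=[Point(0.5, 0.5), Point(1.5, 0.5), Point(5, 5)])

# Inner spatial join: keep only incidents that intersect a zipcode polygon
joined = gpd.sjoin(incidents, zipcodes, how="inner")
print(joined[["incident", "zipcode"]])
```

Because this is an inner join, the incident that lies outside every polygon is dropped from the result.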

## Join Arguments
<br></br>
<font size="3"> Note that the `sjoin()` call above takes two keyword arguments: `how="inner"` and `op='intersects'`.

### op
<br></br>
The `op` argument specifies how geopandas decides whether or not to join the attributes of one object to another. (In geopandas 0.10 and later, this argument is named `predicate`, and `op` is deprecated.) There are three join options:

* `intersects`: The attributes will be joined if the boundary or interior of the object intersects in any way with the boundary or interior of the other object.
* `within`: The attributes will be joined if no point of the object lies in the exterior of the other object and the interiors of the two objects share at least one point.
* `contains`: The inverse of `within`. The attributes will be joined if no point of the other object lies in the exterior of the object and the interiors of the two objects share at least one point.

You can read more about each join type in the Shapely documentation.
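The difference between the three options shows up when a point sits exactly on a polygon boundary. The sketch below uses made-up geometries and assumes geopandas 0.10 or later, where the argument is spelled `predicate`; on older versions, replace `predicate=` with `op=`.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# One square and three points: inside it, on its edge, and outside it
square = gpd.GeoDataFrame({"zone": ["square"]}, geometry=[box(0, 0, 2, 2)])
points = gpd.GeoDataFrame({"label": ["inside", "on_edge", "outside"]},
                          geometry=[Point(1, 1), Point(2, 1), Point(3, 3)])

# Assumption: geopandas >= 0.10 (keyword `predicate`, formerly `op`)
intersects = gpd.sjoin(points, square, how="inner", predicate="intersects")
within = gpd.sjoin(points, square, how="inner", predicate="within")
# `contains` is tested from the left layer's side, so the square goes first
contains = gpd.sjoin(square, points, how="inner", predicate="contains")

print("intersects:", sorted(intersects["label"]))
print("within:    ", sorted(within["label"]))
print("contains:  ", sorted(contains["label"]))
```

The point on the edge matches only under `intersects`, since it touches the boundary of the square but not its interior.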

### how
<br></br>
The `how` argument specifies the type of join that will occur and which geometry is retained in the resulting GeoDataFrame. It accepts the following options:

* `left`: use the index from the first (or `left_df`) GeoDataFrame that you provide to `sjoin`; retain only the `left_df` geometry column
* `right`: use the index from the second (or `right_df`) GeoDataFrame; retain only the `right_df` geometry column
* `inner`: use the intersection of index values from both GeoDataFrames; retain only the `left_df` geometry column

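A small sketch can make the three options concrete. The data below is made up for illustration (the `city` and `region` columns are assumptions): one point falls inside a region, one point falls outside all regions, and one region contains no point.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# City "A" lies inside region "P"; city "B" lies outside every region
points = gpd.GeoDataFrame({"city": ["A", "B"]},
                          geometry=[Point(0.5, 0.5), Point(9, 9)])
# Region "Q" contains no city
regions = gpd.GeoDataFrame({"region": ["P", "Q"]},
                           geometry=[box(0, 0, 1, 1), box(3, 3, 4, 4)])

left_join = gpd.sjoin(points, regions, how="left")    # every point is kept
inner_join = gpd.sjoin(points, regions, how="inner")  # only matched pairs are kept
right_join = gpd.sjoin(points, regions, how="right")  # every region is kept

for label, result in [("left", left_join), ("inner", inner_join), ("right", right_join)]:
    print(label, len(result), sorted(set(result.geometry.geom_type)))
```

In the left and right joins, the rows without a match are kept and their columns from the other layer are filled with missing values.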
Note that more complicated spatial relationships can be studied by combining geometric operations with a spatial join. To find all polygons within a given distance of a point, for example, one can first use the `buffer` method to expand each point into a circle of appropriate radius, then intersect those buffered circles with the polygons in question.
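
The buffer approach can be sketched as follows. The geometries, the `site` and `parcel` columns, and the `search_radius` value are all made up for illustration, and the coordinates are treated as planar units (for real data, a projected coordinate system would be needed for the distance to be meaningful). The sketch also assumes geopandas 0.10 or later for the `predicate` keyword.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# One hypothetical site and two parcels at different distances from it
points = gpd.GeoDataFrame({"site": ["A"]}, geometry=[Point(0, 0)])
polygons = gpd.GeoDataFrame({"parcel": ["near", "far"]},
                            geometry=[box(1, -0.5, 2, 0.5), box(5, 5, 6, 6)])

search_radius = 1.5  # hypothetical distance, in the same units as the coordinates

# Step 1: expand each point into a circle of the chosen radius
buffered = points.copy()
buffered["geometry"] = points.geometry.buffer(search_radius)

# Step 2: join the polygons to the circles they intersect
nearby = gpd.sjoin(polygons, buffered, how="inner", predicate="intersects")
print(nearby[["parcel", "site"]])
```

Only the parcel whose nearest edge is closer to the site than `search_radius` appears in the result.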