**Spatial operations**

***Projections***

We will use the EPSG:32618 projection (WGS 84 / UTM zone 18N), whose unit of measurement is the meter and which covers the NYC area. Reprojecting our data is easy with GeoPandas: we simply pass the EPSG code to the to_crs method.

In [None]:
# Reproject to WGS 84 / UTM zone 18N (units: meters); the {'init': ...} dict form is deprecated
nyc_gdf_proj = nyc_gdf.to_crs(epsg=32618)
nyc_gdf_proj.head()
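
To see what the reprojection does to the coordinates, here is a minimal sketch on a single hand-made point. The point and the names `sample` and `sample_proj` are illustrative assumptions and are not part of the Foursquare dataset.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical sample point (longitude, latitude) in Manhattan, stored in WGS 84.
sample = gpd.GeoDataFrame(
    {"name": ["sample venue"]},
    geometry=[Point(-73.9857, 40.7484)],
    crs="EPSG:4326",
)

# Reproject to UTM zone 18N; coordinates become eastings/northings in meters.
sample_proj = sample.to_crs(epsg=32618)

print(sample_proj.crs.to_epsg())     # EPSG code of the new CRS
print(sample_proj.geometry.iloc[0])  # projected coordinates of the point
```

The printed coordinates are six- and seven-digit meter values rather than degrees, which is what makes distance-based operations such as buffering meaningful.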

**Buffer analysis**

Now that our location data is in a meter-based projection, let's explore buffer analysis, one of the most widely used GIS spatial operations. A buffer creates a zone around a point, line, or polygon geometry that extends out to a specified buffer distance.

In [None]:
point1 = nyc_gdf_proj[:1]
buf10 = point1.buffer(10)
buf50 = point1.buffer(50)
buf100 = point1.buffer(100)
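
As a quick sanity check on what a point buffer returns, the sketch below compares buffer areas with the area of a true circle. The coordinates and the names `pt` and `areas` are assumptions chosen for illustration; the buffer is a polygon that approximates a circle, so its area should come out slightly below pi times the radius squared.

```python
import math

import geopandas as gpd
from shapely.geometry import Point

# Hypothetical point at an arbitrary UTM 18N coordinate (meters).
pt = gpd.GeoSeries([Point(585000, 4511000)], crs="EPSG:32618")

areas = {}
for radius in (10, 50, 100):
    buf = pt.buffer(radius)              # GeoSeries holding one polygon
    areas[radius] = buf.area.iloc[0]     # area in square meters
    circle_area = math.pi * radius ** 2  # area of the ideal circle
    print(radius, buf.geom_type.iloc[0], round(areas[radius], 1), round(circle_area, 1))
```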

Let's plot all the buffered points and the original point together. We pass every plot to the same axis so that they are overlaid on top of each other, drawing the largest buffer first so that it does not hide the smaller ones:

In [None]:
fig, ax = plt.subplots(figsize=(12, 10))
buf100.plot(ax=ax, color='red')
buf50.plot(ax=ax, color='yellow')
buf10.plot(ax=ax, color='gray')
point1.plot(ax=ax, color='black')
ax.set_xticklabels([])
ax.set_yticklabels([])
plt.show()

Let's explore further how we can use buffer analysis on a subset of our data. We might be interested only in subway venues, so we select the rows where VenueCategoryName equals 'Subway' and apply a 1,000-meter buffer with .buffer. The result can be plotted in the same way as the buffers above.

In [None]:
subway = nyc_gdf_proj[nyc_gdf_proj['VenueCategoryName']== 'Subway']
subwayBuf = subway.buffer(1000)
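
Buffers of nearby stations overlap, so summing their areas counts the shared space twice. A minimal sketch of merging buffers into a single coverage area follows; the two stations and the names `stations`, `station_buf`, and `coverage` are hypothetical and stand in for the subway subset.

```python
import geopandas as gpd
from shapely.geometry import Point
from shapely.ops import unary_union

# Two hypothetical stations 1,000 m apart, in UTM 18N coordinates (meters).
stations = gpd.GeoSeries(
    [Point(585000, 4511000), Point(586000, 4511000)], crs="EPSG:32618"
)
station_buf = stations.buffer(1000)

# Merge the overlapping circles so the shared area is counted only once.
coverage = unary_union(list(station_buf))

print(round(station_buf.area.sum()))  # summed area, overlap counted twice
print(round(coverage.area))           # merged coverage area
```

The same idea should carry over to the real subway buffers, where the merged geometry may be a MultiPolygon if some stations are isolated.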

**Spatial joins**

The table join is a classical query operation in which two separate tables that share a column (a key, such as a foreign ID) are merged based on that column. A table join involves only table attributes, not geographic relationships. With GeoDataFrames, however, we can also perform a spatial join, which merges two sets of geometries based on their locations.

Let's look at an example of this. We will add a new dataset of NYC districts with polygon geometry. We will read the data directly from the server URL as GeoJSON and look at the first five rows, as follows:

In [None]:
url_dist = 'http://services5.arcgis.com/GfwWNkhOj9bNBqoJ/arcgis/rest/services/nyad/FeatureServer/0/query?where=1=1&outFields=*&outSR=4326&f=geojson'
nyc_dist = gpd.read_file(url_dist)
nyc_dist.head()

To convert the CRS of a GeoDataFrame, GeoPandas provides the to_crs method, which accepts the target projection as an EPSG code. We will use EPSG:32618 again so that the districts match the projected points, as shown in the following code:

In [None]:
# Convert to a UTM, meter-based projection: https://epsg.io/32618
nyc_dist_proj = nyc_dist.to_crs(epsg=32618)

Now that both datasets have the same CRS, let's overlay the point data and the polygon data to check whether their locations match. The locations match, but as you can see, the Foursquare dataset is not limited to NYC: it extends beyond the NYC district boundaries. It is not uncommon to work with datasets that do not fit perfectly within your desired boundaries.

The following code overlays the points and the boundary datasets. Once we create the figure and axis, we can pass any plot to the same axis to overlay it:

In [None]:
fig, ax = plt.subplots(figsize=(12,12))
nyc_dist_proj.plot(ax=ax, color='gray')
nyc_gdf_proj.plot(ax=ax, markersize=0.01, color='black')

Here is where the spatial join helps in your location data preprocessing and analysis. We will first select the points that fall within the NYC district boundaries. A spatial join merges two sets of geometries based on their spatial relationship. In GeoPandas, we can carry out a spatial join with the sjoin function, which takes two GeoDataFrames and an operation type. The operation type determines the spatial relationship to test: intersects, within, or contains. It can be applied to different geometry combinations, such as points with polygons, lines with polygons, or points with lines.

To illustrate this, let's keep only the points that lie within NYC districts. We call sjoin and pass the two GeoDataFrames, nyc_gdf_proj and nyc_dist_proj. We also need to provide the operation; here, within means that we keep only the points inside the district boundaries:

In [None]:
# GeoPandas 0.10 and later use `predicate`; older releases call this argument `op`
nyc_points = gpd.sjoin(nyc_gdf_proj, nyc_dist_proj, predicate='within')
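
The effect of the within operation is easier to see on a toy dataset. The sketch below uses one square polygon and three points; all names and coordinates are made up for illustration, and it assumes a GeoPandas version that accepts the `predicate` argument.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Hypothetical data: one square "district" and three points, all in EPSG:32618.
districts = gpd.GeoDataFrame(
    {"district": ["A"]},
    geometry=[Polygon([(0, 0), (100, 0), (100, 100), (0, 100)])],
    crs="EPSG:32618",
)
points = gpd.GeoDataFrame(
    {"venue": ["inside_1", "inside_2", "outside"]},
    geometry=[Point(10, 10), Point(50, 90), Point(150, 50)],
    crs="EPSG:32618",
)

# Inner join (the default): points that fall in no polygon are dropped.
joined = gpd.sjoin(points, districts, how="inner", predicate="within")
print(joined[["venue", "district"]])
```

Each surviving point also picks up the attribute columns of the polygon that contains it, which is how the district information ends up attached to the venues.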

We'll save this data to a new GeoJSON file, which saves preprocessing time if we later want to analyze only this subset. We first define the output path, which in this case points to a GeoJSON file, and then use the .to_file method to write the data to it:

In [None]:
out = r"data/nyc_foursquare.geojson"
# Name the driver explicitly; older GeoPandas versions default to ESRI Shapefile
nyc_points.to_file(out, driver='GeoJSON')