# Introduction

In this tutorial, you'll explore several techniques for **proximity analysis**.  In particular, you'll learn how to:
- measure the distance between points on a map, and
- select all points within some radius of a feature.

In [None]:
import folium
from folium import Marker
from folium.plugins import HeatMap

import pandas as pd
import geopandas as gpd

# Function for displaying the map
def embed_map(m, file_name):
    from IPython.display import IFrame
    m.save(file_name)
    return IFrame(file_name, width='100%', height='500px')

You'll work with a dataset from the US Environmental Protection Agency (EPA) that tracks releases of toxic chemicals in Philadelphia, Pennsylvania, USA.

In [None]:
releases = gpd.read_file("../input/geospatial-course-data/toxic_release_pennsylvania/toxic_release_pennsylvania.shp") 
releases.head()

You'll also work with a dataset that contains readings from air quality monitoring stations in the same city.

In [None]:
all_stations = gpd.read_file("../input/geospatial-course-data/PhillyHealth_Air_Monitoring_Stations/PhillyHealth_Air_Monitoring_Stations.shp")
all_stations.head()

# Measuring distance

If we want to measure distances between points from two different GeoDataFrames, we first have to make sure that they use the same coordinate reference system (CRS).  Thankfully, this is the case here, where both use EPSG 2272.

In [None]:
print(all_stations.crs)
print(releases.crs)

We also check the CRS to see which units it uses (meters, feet, or something else).  In this case, EPSG 2272 has units of feet.  (_If you like, you can check this [here](https://epsg.io/2272)._)
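
Since distances in this CRS come back in feet, it can help to convert them to more familiar units. Below is a minimal sketch in plain Python. The helper names are hypothetical, and the meter conversion assumes the US survey foot (1200/3937 m), the unit in which EPSG 2272 is defined.

```python
# Conversion constants
FEET_PER_MILE = 5280
METERS_PER_US_SURVEY_FOOT = 1200 / 3937  # assumption: the CRS unit is the US survey foot


def feet_to_miles(feet):
    """Convert a distance in feet to miles."""
    return feet / FEET_PER_MILE


def feet_to_meters(feet):
    """Convert a distance in US survey feet to meters."""
    return feet * METERS_PER_US_SURVEY_FOOT


# A 2-mile radius expressed in the units of the CRS
two_miles_in_feet = 2 * FEET_PER_MILE
print(two_miles_in_feet)
print(feet_to_miles(two_miles_in_feet))
print(feet_to_meters(two_miles_in_feet))
```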

It's relatively straightforward to compute distances in GeoPandas.  The code cell below selects the station with the worst air quality, stores it in `worst_station`, and calculates the distance (in feet) from that station to every point in the `releases` GeoDataFrame.

In [None]:
# Select station with worst air quality
worst_station = all_stations.iloc[4]

# Measure distance from station to each release incident
all_distances = releases.geometry.distance(worst_station.geometry)
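
For point geometries in a projected CRS, this kind of distance is the ordinary straight-line (planar) distance, expressed in the units of the CRS. The sketch below reproduces the idea in plain Python. The coordinates are made up for illustration and do not come from the datasets.

```python
import math


def planar_distance(p, q):
    """Straight-line distance between two (x, y) points in the same projected CRS."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


# Hypothetical coordinates in feet (not taken from the real data)
station = (2_700_000.0, 250_000.0)
incidents = [
    (2_700_300.0, 250_400.0),
    (2_706_000.0, 258_000.0),
]

# One distance per incident, analogous to the Series returned by GeoPandas
distances = [planar_distance(station, point) for point in incidents]
print(distances)
```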

Using the calculated distances, we can obtain statistics like the mean distance to a release incident.  Or, we can print the details of the closest recorded release incident.

In [None]:
print('Mean distance to release incidents: {} feet\n'.format(all_distances.mean()))

print('Closest release incident ({} feet):'.format(all_distances.min()))
# idxmin() returns an index label, so look the row up with .loc
print(releases.loc[all_distances.idxmin()][["CHEMICAL", "UNIT_OF_ME", "TOTAL_RELE"]])

# Creating a buffer

If we want to find all points on a map that lie within some radius of a given point, the simplest way to accomplish this is by creating a buffer.

The code cell below creates a GeoSeries `two_mile_radius` containing 12 different `Polygon` objects.  Each polygon is centered at a different air quality monitoring station and has a radius of 2 miles (or, 2\*5280 feet).

In [None]:
two_mile_radius = all_stations.geometry.buffer(2*5280)
two_mile_radius.head()
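
Each buffer is a polygon that approximates a circle, not a perfect circle. The sketch below illustrates the idea in plain Python by placing vertices evenly around a circle and comparing the polygon's area with the true circle area. The function names and the choice of 64 segments are assumptions made for illustration; the number of segments GeoPandas actually uses may differ.

```python
import math


def circle_as_polygon(cx, cy, radius, n_segments=64):
    """Return n_segments vertices evenly spaced on a circle around (cx, cy)."""
    return [
        (cx + radius * math.cos(2 * math.pi * k / n_segments),
         cy + radius * math.sin(2 * math.pi * k / n_segments))
        for k in range(n_segments)
    ]


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


radius_ft = 2 * 5280  # 2 miles, in feet
buffer_vertices = circle_as_polygon(0.0, 0.0, radius_ft)

# Ratio of polygon area to true circle area (slightly below 1)
ratio = polygon_area(buffer_vertices) / (math.pi * radius_ft ** 2)
print(len(buffer_vertices), ratio)
```

Because the polygon sits just inside the true circle, a point very close to the 2-mile boundary could be classified differently by a polygon test than by an exact distance test.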

In [None]:
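# Create a base map centered on Philadelphia
m = folium.Map(location=[39.9526,-75.1652], tiles='openstreetmap', zoom_start=11)

# Heatmap of release incidents, plus one marker per monitoring station
HeatMap(data=releases[['LATITUDE', 'LONGITUDE']], radius=15).add_to(m)
for idx, row in all_stations.iterrows():
    Marker([row['LATITUDE'], row['LONGITUDE']]).add_to(m)

# Plot the buffers; folium expects latitude/longitude, so convert to EPSG 4326 first
folium.GeoJson(two_mile_radius.to_crs(epsg=4326)).add_to(m)

# Show the map
embed_map(m, 'm.html')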

In [None]:
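# Merge the 12 buffers into a single (multi)polygon
my_union = two_mile_radius.geometry.unary_union
# Keep only the releases that fall inside the merged area
inside_range = releases.loc[releases["geometry"].apply(lambda x: my_union.contains(x))]
# Most common chemicals among releases within 2 miles of a station
inside_range.CHEMICAL.value_counts().head()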
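
Testing whether a point falls inside the union of all buffers amounts to asking whether it lies within 2 miles of at least one station. The sketch below expresses that logic in plain Python with exact circular distances, then counts the matching chemicals. The station coordinates, release records, and helper name are all hypothetical and serve only to illustrate the idea.

```python
import math
from collections import Counter

RADIUS_FT = 2 * 5280  # 2 miles, in feet

# Hypothetical station locations (x, y) in feet
stations = [(0.0, 0.0), (30_000.0, 0.0)]

# Hypothetical release records: (chemical name, (x, y) location in feet)
sample_releases = [
    ("TOLUENE", (3_000.0, 4_000.0)),    # 5,000 ft from the first station
    ("TOLUENE", (30_000.0, 9_000.0)),   # 9,000 ft from the second station
    ("AMMONIA", (15_000.0, 0.0)),       # 15,000 ft from both stations
    ("AMMONIA", (0.0, -10_000.0)),      # 10,000 ft from the first station
]


def near_any_station(point, stations, radius):
    """True if the point is within `radius` of at least one station."""
    return any(
        math.hypot(point[0] - sx, point[1] - sy) <= radius
        for sx, sy in stations
    )


# Keep releases in range of any station, then count them by chemical
inside = [chem for chem, pt in sample_releases
          if near_any_station(pt, stations, RADIUS_FT)]
counts = Counter(inside)
print(counts.most_common())
```

The union-based approach in the tutorial tests polygons rather than exact circles, so results for points very close to the 2-mile boundary may differ slightly from this distance-based version.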

# Your turn