In [None]:
# Install GDAL (Geospatial Data Abstraction Library) and GeoPandas
# Reference: https://gdal.org/api/python.html

!apt install gdal-bin python-gdal python3-gdal --quiet
!apt install python3-rtree --quiet
!pip install git+https://github.com/geopandas/geopandas.git --quiet
!pip install descartes --quiet
!pip install folium

Reading package lists...
Building dependency tree...
Reading state information...
gdal-bin is already the newest version (2.2.3+dfsg-2).
python-gdal is already the newest version (2.2.3+dfsg-2).
The following additional packages will be installed:
  python3-numpy
Suggested packages:
  python-numpy-doc python3-nose python3-numpy-dbg
The following NEW packages will be installed:
  python3-gdal python3-numpy
0 upgraded, 2 newly installed, 0 to remove and 39 not upgraded.
Need to get 2,288 kB of archives.
After this operation, 13.2 MB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic/main amd64 python3-numpy amd64 1:1.13.3-2ubuntu1 [1,943 kB]
Get:2 http://archive.ubuntu.com/ubuntu bionic/universe amd64 python3-gdal amd64 2.2.3+dfsg-2 [346 kB]
Fetched 2,288 kB in 0s (15.6 MB/s)
Selecting previously unselected package python3-numpy.
(Reading database ... 155455 files and directories currently installed.)
Preparing to unpack .../python3-numpy_1%3a1.13.3-2ubu

In [None]:
!wget https://www.dropbox.com/s/xcxw2hl2zt3fwzg/bike_data.zip

--2022-04-20 04:06:25--  https://www.dropbox.com/s/xcxw2hl2zt3fwzg/bike_data.zip
Resolving www.dropbox.com (www.dropbox.com)... 162.125.6.18, 2620:100:6019:18::a27d:412
Connecting to www.dropbox.com (www.dropbox.com)|162.125.6.18|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: /s/raw/xcxw2hl2zt3fwzg/bike_data.zip [following]
--2022-04-20 04:06:26--  https://www.dropbox.com/s/raw/xcxw2hl2zt3fwzg/bike_data.zip
Reusing existing connection to www.dropbox.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://uc0dd29e96a7591575c246118f6a.dl.dropboxusercontent.com/cd/0/inline/BjscLdnVUVlNChHU0OvfewpZnDwjW-gRTYt17fPY1ZJdWae82wesp8ZQfW0ErhUe5d0FxNI60jLB3850LhNnDyZjARNdeK6BUMPvtwv1ItIZaGxnsWzFeFfBuc0YHI6luGucaf1qwkuS1mfnnDk1ERxaeG1X6cCWr0SebG39Fv_-1Q/file# [following]
--2022-04-20 04:06:26--  https://uc0dd29e96a7591575c246118f6a.dl.dropboxusercontent.com/cd/0/inline/BjscLdnVUVlNChHU0OvfewpZnDwjW-gRTYt17fPY1ZJdWae82wesp8ZQfW0ErhUe5

In [None]:
!unzip bike_data.zip

Archive:  bike_data.zip
  inflating: points.csv              
  inflating: stations.csv            


In [None]:
#!pip install git+git://github.com/geopandas/geopandas.git

!pip install geopandas

Collecting geopandas
  Downloading geopandas-0.10.2-py2.py3-none-any.whl (1.0 MB)
Collecting pyproj>=2.2.0
  Downloading pyproj-3.2.1-cp37-cp37m-manylinux2010_x86_64.whl (6.3 MB)
Collecting fiona>=1.8
  Downloading Fiona-1.8.21-cp37-cp37m-manylinux2014_x86_64.whl (16.7 MB)
Collecting munch
  Downloading munch-2.5.0-py2.py3-none-any.whl (10 kB)
Collecting cligj>=0.5
  Downloading cligj-0.7.2-py3-none-any.whl (7.1 kB)
Collecting click-plugins>=1.0
  Downloading click_plugins-1.1.1-py2.py3-none-any.whl (7.5 kB)
Installing collected packages: munch, cligj, click-plugins, pyproj, fiona, geopandas
Successfully installed click-plugins-1.1.1 cligj-0.7.2 fiona-1.8.21 geopandas-0.10.2 munch-2.5.0 pyproj-3.2.1


In [None]:
import pandas as pd
import geopandas as gpd 
import matplotlib.pyplot as plt
import folium 
from shapely.ops import nearest_points
from shapely.geometry import LineString

**Exploring the Data:**
We read the data with pandas. The first DataFrame, stations, holds attributes such as station name, capacity (dpcapacity) and coordinates.

In [None]:
stations = pd.read_csv("stations.csv")
stations.head()

Unnamed: 0,id,name,dpcapacity,xcoord,ycoord
0,5,State St & Harrison St,19,-87.627739,41.873958
1,13,Wilton Ave & Diversey Pkwy,19,-87.652681,41.9325
2,14,Morgan St & 18th St,15,-87.651073,41.858086
3,15,Racine Ave & 19th St,15,-87.656471,41.856453
4,16,Wood St & North Ave,15,-87.672516,41.910329


The second table holds the randomly generated points. We call this DataFrame points. Besides the unnamed index column, it has only three columns: the id and the X and Y coordinates.

In [None]:
points = pd.read_csv("points.csv")
points.head()

Unnamed: 0,id,xcoord,ycoord
0,1,-87.675992,41.969792
1,2,-87.676702,41.956395
2,3,-87.601501,41.805379
3,4,-87.616656,41.858263
4,5,-87.706869,41.96301


Geographic operations such as nearest neighbour analysis need geometry objects, so first convert both tables into GeoDataFrames with GeoPandas.

The following function converts a DataFrame into a GeoPandas GeoDataFrame.
It adds a geometry column that stores a Point built from each row's X and Y coordinates.

It also assigns a coordinate reference system (CRS) to the dataset. A CRS defines how the coordinate values relate to locations on the Earth.

In this case we use EPSG:4326 (WGS 84), a geographic CRS that is very common in GIS work. Its coordinates are longitude and latitude in degrees.

References:

https://spatialreference.org/ref/epsg/

https://spatialreference.org/ref/epsg/4326/
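Because EPSG:4326 coordinates are angles, a difference of one degree does not correspond to a fixed distance on the ground. The following is a minimal sketch of that point using the haversine formula on a spherical Earth. The function name `haversine_m` and the mean radius of 6,371 km are assumptions made for illustration; they are not part of the notebook or of GeoPandas.

```python
import math


def haversine_m(lon1, lat1, lon2, lat2, radius_m=6_371_000.0):
    """Great-circle distance in metres between two lon/lat points given in degrees.

    Assumes a spherical Earth with the given mean radius (an approximation).
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


# One degree of latitude (north-south) near Chicago.
north = haversine_m(-87.6, 41.0, -87.6, 42.0)
# One degree of longitude (east-west) at roughly Chicago's latitude.
east = haversine_m(-88.0, 41.9, -87.0, 41.9)

print(f"1 degree of latitude  ~ {north / 1000:.1f} km")
print(f"1 degree of longitude ~ {east / 1000:.1f} km")
```

At this latitude a degree of longitude is clearly shorter than a degree of latitude, so distances measured directly in degrees are only a rough proxy for distances on the ground.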


In [None]:
def create_gdf(df, x="xcoord", y="ycoord"):
    # Build Point geometries from the coordinate columns; EPSG:4326 is WGS 84 longitude/latitude.
    return gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df[x], df[y]), crs="EPSG:4326")

In [None]:
stations_gdf = create_gdf(stations)
points_gdf = create_gdf(points)



Visualise the data on a map using the Folium Python library.

**Reference**: 

https://python-visualization.github.io/folium/modules.html#module-folium.map

The map below shows **stations_gdf** in red and **points_gdf** in white.

The goal is to find the closest station for each randomly generated point.


In [None]:
m = folium.Map([41.805379, -87.601501],
               zoom_start=12,
               tiles="CartoDb dark_matter")
locs_stations = zip(stations_gdf.ycoord, stations_gdf.xcoord)
locs_points = zip(points_gdf.ycoord, points_gdf.xcoord)
for location in locs_stations:
    folium.CircleMarker(location=location, color="red", radius=4).add_to(m)
for location in locs_points:
    folium.CircleMarker(location=location, color="white", radius=2).add_to(m)
m.save("map1.html")
m

**Performing Nearest Neighbour Analysis using GeoPandas & Shapely**

1. Find the nearest bike station to each random point.

Use Shapely's **nearest_points** function to find which station geometry is closest to each point.

2. Save other attributes of the matched station, for example its name.

The **calculate_nearest** function below takes the destination dataset (stations_gdf) and the name of the column whose value we want to return from it (for example, the station name).

**Reference**:

https://shapely.readthedocs.io/en/stable/manual.html#shapely.ops.nearest_points
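The two steps above can be sketched without any geospatial library as a brute-force search. This is only an illustration of the idea, not the notebook's method: the helper `nearest_station` and the list `stations_sample` are hypothetical names, and the sample holds just the five stations shown in the stations.head() output, so the result need not match the one obtained from the full station table.

```python
import math

# Rows copied from the stations.head() output: (name, xcoord, ycoord).
stations_sample = [
    ("State St & Harrison St", -87.627739, 41.873958),
    ("Wilton Ave & Diversey Pkwy", -87.652681, 41.9325),
    ("Morgan St & 18th St", -87.651073, 41.858086),
    ("Racine Ave & 19th St", -87.656471, 41.856453),
    ("Wood St & North Ave", -87.672516, 41.910329),
]


def nearest_station(x, y, stations):
    """Return the (name, xcoord, ycoord) tuple with the smallest planar distance to (x, y)."""
    # Step 1: compare the point with every candidate and keep the closest one.
    return min(stations, key=lambda s: math.hypot(s[1] - x, s[2] - y))


# Point id 4 from the points.head() output.
name, sx, sy = nearest_station(-87.616656, 41.858263, stations_sample)

# Step 2: other attributes of the match (here the name) come along with it.
print(name, sx, sy)
```

The distance here is a plain Euclidean distance on the longitude and latitude values, which mirrors how Shapely treats coordinates as points in a plane.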



In [None]:
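def calculate_nearest(row, destination, val, col="geometry"):
    """Return the value of column `val` for the geometry in `destination` nearest to `row[col]`."""
    # 1 - merge all destination geometries into a single geometry
    dest_unary = destination["geometry"].unary_union
    # 2 - find the closest pair of points; element [1] lies on the destination side
    nearest_geom = nearest_points(row[col], dest_unary)
    # 3 - select the destination row(s) whose geometry equals that point
    match_geom = destination.loc[destination.geometry == nearest_geom[1]]
    # 4 - return the requested value from the first match
    match_value = match_geom[val].to_numpy()[0]
    return match_value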

Apply this function (calculate_nearest) to the second dataset, points_gdf, to derive for each random point the nearest geometry in stations_gdf and the name of that station.

In [None]:
# Get the nearest geometry

points_gdf["nearest_geom"] = points_gdf.apply(calculate_nearest, destination=stations_gdf, val="geometry", axis=1)



In [None]:
# Get the nearest Bike station name

points_gdf["nearest_station"] = points_gdf.apply(calculate_nearest, destination=stations_gdf, val="name", axis=1)


The output is the following table, which contains the two additional columns (nearest_geom and nearest_station) created above.

In [None]:
points_gdf.head()

Unnamed: 0,id,xcoord,ycoord,geometry,nearest_geom,nearest_station
0,1,-87.675992,41.969792,POINT (-87.67599 41.96979),POINT (-87.674237 41.96909),Ravenswood Ave & Lawrence Ave
1,2,-87.676702,41.956395,POINT (-87.67670 41.95640),POINT (-87.679259 41.955927),Lincoln Ave & Belle Plaine Ave
2,3,-87.601501,41.805379,POINT (-87.60150 41.80538),POINT (-87.599383 41.809835),Greenwood Ave & 47th St
3,4,-87.616656,41.858263,POINT (-87.61666 41.85826),POINT (-87.619407 41.857611),Calumet Ave & 18th St
4,5,-87.706869,41.96301,POINT (-87.70687 41.96301),POINT (-87.688487 41.966555),Western Ave & Leland Ave


Both nearest_geom and nearest_station are now columns of points_gdf.

For instance, for the point with id 2 the nearest station is Lincoln Ave & Belle Plaine Ave.

To verify the results, create a line GeoDataFrame from geometry and nearest_geom, which makes it possible to explore the matches visually.

In [None]:
# Create LineString Geometry
points_gdf['line'] = points_gdf.apply(lambda row: LineString([row['geometry'], row['nearest_geom']]), axis=1)
points_gdf.head()



Unnamed: 0,id,xcoord,ycoord,geometry,nearest_geom,nearest_station,line
0,1,-87.675992,41.969792,POINT (-87.67599 41.96979),POINT (-87.674237 41.96909),Ravenswood Ave & Lawrence Ave,LINESTRING (-87.6759921188193 41.9697924176359...
1,2,-87.676702,41.956395,POINT (-87.67670 41.95640),POINT (-87.679259 41.955927),Lincoln Ave & Belle Plaine Ave,LINESTRING (-87.6767023973826 41.9563952204838...
2,3,-87.601501,41.805379,POINT (-87.60150 41.80538),POINT (-87.599383 41.809835),Greenwood Ave & 47th St,"LINESTRING (-87.601501134953 41.8053785205414,..."
3,4,-87.616656,41.858263,POINT (-87.61666 41.85826),POINT (-87.619407 41.857611),Calumet Ave & 18th St,LINESTRING (-87.6166556820615 41.8582625122995...
4,5,-87.706869,41.96301,POINT (-87.70687 41.96301),POINT (-87.688487 41.966555),Western Ave & Leland Ave,LINESTRING (-87.7068694739994 41.9630104417897...


In [None]:
# Create Line Geodataframe
line_gdf = points_gdf[["id", "nearest_station", "line"]].set_geometry('line')

In [None]:
# Set the Coordinate reference
line_gdf = line_gdf.set_crs("EPSG:4326")



Now we have the destination stations in **stations_gdf**, the origin points in **points_gdf**, and **line_gdf**, which connects each point to its nearest station.

Let us visualise all of them on one map using folium.Map().

GeoJSON is a format for encoding geographic data structures. It supports the geometry types Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, and GeometryCollection.

Reference: https://geojson.org/
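To make the format concrete, here is a minimal sketch that builds one GeoJSON LineString feature by hand with the standard `json` module. The coordinates and station name are copied from the first row of the points_gdf.head() output; the variable names (`origin`, `nearest`, `feature`, `geojson_text`) are chosen for illustration only.

```python
import json

# Point id 1 and its nearest station, as (longitude, latitude) pairs.
origin = (-87.675992, 41.969792)
nearest = (-87.674237, 41.96909)

# A GeoJSON Feature pairs a geometry with free-form properties.
feature = {
    "type": "Feature",
    "geometry": {
        "type": "LineString",
        # GeoJSON positions are ordered [longitude, latitude].
        "coordinates": [list(origin), list(nearest)],
    },
    "properties": {"id": 1, "nearest_station": "Ravenswood Ave & Lawrence Ave"},
}

geojson_text = json.dumps({"type": "FeatureCollection", "features": [feature]}, indent=2)
print(geojson_text)
```

Note the coordinate order: GeoJSON uses longitude first, whereas the folium.CircleMarker calls in this notebook pass locations as (latitude, longitude).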


In [None]:
m = folium.Map([41.805379, -87.601501],
               zoom_start = 12, 
               tiles="CartoDb dark_matter")
locs_stations = zip(stations_gdf.ycoord, stations_gdf.xcoord)
locs_points = zip(points_gdf.ycoord, points_gdf.xcoord)
for location in locs_stations:
    folium.CircleMarker(location=location, color="red", radius=8).add_to(m)
for location in locs_points:
    folium.CircleMarker(location=location, color="white", radius=4).add_to(m)
folium.GeoJson(line_gdf).add_to(m)
m.save("map2.html")
m

**Nearest Neighbour Visualisation**

The map above shows the bike stations as red circles, the random points as white circles, and the connecting lines in blue.

**Conclusion**

We performed a nearest neighbour analysis on Chicago bike station data using **GeoPandas** and **Shapely**.

We also explored how to **visualise** the results of the analysis with **Folium**.