In this notebook, you'll see how to connect to a Postgres database using the sqlalchemy library.

For this notebook, you'll need both the `sqlalchemy` and `psycopg2` libraries installed.

In [1]:
from sqlalchemy import create_engine

First, we need to create a connection string. The format is

 ```<dialect(+driver)>://<username>:<password>@<hostname>:<port>/<database>```

To connect to the Lahman baseball database, you can use the following connection string.

In [2]:
database_name = 'scooters'    # Fill this in with your scooter database name

connection_string = f"postgresql://postgres:postgres@localhost:5432/{database_name}"

Now, we need to create an engine and use it to connect.

In [3]:
engine = create_engine(connection_string, pool_size=10, max_overflow=20)

Now, we can create our query and pass it into the `.query()` method.

sqlalchemy plays nicely with pandas.

In [4]:
import pandas as pd

For much more information about SQLAlchemy and to see a more “Pythonic” way to execute queries, see Introduction to Databases in Python: https://www.datacamp.com/courses/introduction-to-relational-databases-in-python

##Keep 4 Project Info to Maintain Packages, Variables, Geo Dataframes,and other GEO data more map builds....

##4 Project -- SUMDs can provide alternative transportation and provide "last mile" access to public transit. 
##How often are trips starting near public transit hubs? 

In [5]:
##import packages for mapping
## added busstops_cleaned.csv & zipcodes.geojson from geospatial workshop to data folder 
##(C:\Users\larld\Documents\DA8\projects\scooter-partner1-witch_trial_logic\data)

from shapely.geometry import Point
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import folium
from folium.plugins import MarkerCluster
from folium.plugins import FastMarkerCluster

In [6]:
##4 Project Prep
##read in zipcode data / verify crs / review head

zipcodes = gpd.read_file('../data/zipcodes.geojson')
print(zipcodes.crs)
zipcodes.head()

EPSG:4326


Unnamed: 0,zip,objectid,po_name,shape_stlength,shape_starea,geometry
0,37115,1,MADISON,178783.0248888682,596553400.5788574,"MULTIPOLYGON (((-86.68725 36.31821, -86.68722 ..."
1,37216,3,NASHVILLE,75820.99782140006,188884682.28344727,"MULTIPOLYGON (((-86.73451 36.23774, -86.73425 ..."
2,37204,9,NASHVILLE,93180.2922504256,200664795.51708984,"MULTIPOLYGON (((-86.77914 36.13424, -86.77923 ..."
3,37027,11,BRENTWOOD,159760.6942933173,174978422.04101562,"MULTIPOLYGON (((-86.81258 36.06319, -86.81263 ..."
4,37064,18,FRANKLIN,28995.828320601937,46969608.005737305,"MULTIPOLYGON (((-87.02197 36.01200, -87.02140 ..."


In [10]:
##4 Project Prep
##verify df type / clean out excess columns
zipcodes = zipcodes[['zip', 'po_name', 'geometry']]
zipcodes.head()

Unnamed: 0,zip,po_name,geometry
0,37115,MADISON,"MULTIPOLYGON (((-86.68725 36.31821, -86.68722 ..."
1,37216,NASHVILLE,"MULTIPOLYGON (((-86.73451 36.23774, -86.73425 ..."
2,37204,NASHVILLE,"MULTIPOLYGON (((-86.77914 36.13424, -86.77923 ..."
3,37027,BRENTWOOD,"MULTIPOLYGON (((-86.81258 36.06319, -86.81263 ..."
4,37064,FRANKLIN,"MULTIPOLYGON (((-87.02197 36.01200, -87.02140 ..."


In [11]:
##4 Project Prep
##read in bus stop data / review shape, info, & head

bus_stops= pd.read_csv('../data/busstops_cleaned.csv')
print(bus_stops.shape)
bus_stops.info()
bus_stops.head()

(2524, 5)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2524 entries, 0 to 2523
Data columns (total 5 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   stop      2524 non-null   object 
 1   route     2524 non-null   object 
 2   location  2524 non-null   object 
 3   lat       2524 non-null   float64
 4   lng       2524 non-null   float64
dtypes: float64(2), object(3)
memory usage: 98.7+ KB


Unnamed: 0,stop,route,location,lat,lng
0,GREEN LN & WHITES CREEK PIKE WB,GOLDEN VALLEY,"(36.236249, -86.816722)",36.236249,-86.816722
1,_ 9TH AVE S & EDGEHILL AVE SB,8TH AVENUE SOUTH,"(36.142642, -86.780897)",36.142642,-86.780897
2,DONELSON/DELL STATION OUTBOUND,MURFREESBORO PIKE,"(36.105615, -86.672004)",36.105615,-86.672004
3,17TH AVE S & DOROTHY PL SB,BELMONT,"(36.137623, -86.795609)",36.137623,-86.795609
4,COCKRILL ST & 14TH AVE N,ST. CECILIA - CUMBERLAND,"(36.175944, -86.804242)",36.175944,-86.804242


In [12]:
##4 Project Prep
##add geometry / create GeoDataFrame / verify type
##notes from geospatial workshop state that we must adjust bus_stops df to create GeoDataFrame():
##GeoDataFrame requires: df, crs, geometry
##add geometry: add 'geometry' column make it a point data type / use lambda function to execute on all rows in df
##add coordinate reference system (crs): use zipcodes crs to ensure matching crs

bus_stops['geometry'] = bus_stops.apply(lambda x: Point((x.lng, x.lat)), axis=1)
bus_geo = gpd.GeoDataFrame(bus_stops, crs=zipcodes.crs, geometry=bus_stops['geometry'])
type(bus_geo)

geopandas.geodataframe.GeoDataFrame

In [104]:
bus_geo.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 2524 entries, 0 to 2523
Data columns (total 6 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   stop      2524 non-null   object  
 1   route     2524 non-null   object  
 2   location  2524 non-null   object  
 3   lat       2524 non-null   float64 
 4   lng       2524 non-null   float64 
 5   geometry  2524 non-null   geometry
dtypes: float64(2), geometry(1), object(3)
memory usage: 118.4+ KB


In [13]:
##4 Project Prep
## validate crs
print(bus_geo.crs)

EPSG:4326


In [14]:
##4 Project Prep
##retrieve startlatitude / startlongitude from trips table in SQL
scooter_start_loc=  '''
SELECT startlongitude AS lng, startlatitude AS lat
FROM trips;
'''
scooter_start_loc = pd.read_sql(scooter_start_loc, con = engine)
scooter_start_loc.insert(0, 'scooter_st_loc', 'scooter_start', True)
scooter_start_loc.head()

Unnamed: 0,scooter_st_loc,lng,lat
0,scooter_start,-86.768183,36.158525
1,scooter_start,-86.768269,36.158519
2,scooter_start,-86.796023,36.153056
3,scooter_start,-86.774677,36.159388
4,scooter_start,-86.7705,36.1634


In [15]:
##4 Project Prep
##notes from geospatial workshop state that we must adjust bus_stops df to create GeoDataFrame():
##GeoDataFrame requires: df, crs, geometry
##add geometry: add 'geometry' column make it a point data type / use lambda function to execute on all rows in df
##add coordinate reference system (crs): use zipcodes crs to ensure matching crs
scooter_start_loc['geometry'] = scooter_start_loc.apply(lambda x: Point((x.lng, x.lat)), axis=1)
scooter_start_geo = gpd.GeoDataFrame(scooter_start_loc, crs=zipcodes.crs, geometry=scooter_start_loc['geometry'])
type(scooter_start_geo)


geopandas.geodataframe.GeoDataFrame

##4 Project Prep 
##Reviewing GeoPandas info for spatial join (gpd.sjoin) and found spatial join nearest (gpd_sjoin_nearest)...one of the features in sjoin_nearest is an argument when joining that allows you to set an argument 'max distance'. Max distance can be set to join info that is within the max distance from items on the right table. sjoin_nearest can work with degrees or distance (in meters). If the crs uses lat/lon degrees must be used. The process for using degrees seemed overly complicated and I found a way to change the crs using to_crs() to an epsg that works in meters (epsg = 3857). See links below for more details. 
May Use this if I do a spatial join nearest.....required to adjust max distance in meters 
##https://github.com/geopandas/geopandas/discussions/2797
##https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.sjoin_nearest.html#geopandas.GeoDataFrame.sjoin_nearest
##https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.to_crs.html


In [16]:
##4 Project Prep 
##verify current crs
print(zipcodes.crs)
print(scooter_start_geo.crs)

EPSG:4326
EPSG:4326


In [17]:
##4 Project Prep
##update crs for use w/ 
zipcodes_3857 = zipcodes.to_crs("epsg:3857")
scooter_start_3857 = scooter_start_geo.to_crs("epsg:3857")

In [18]:
##4 Project Prep 
##verify crs update
print(zipcodes_3857.crs)
print(scooter_start_3857.crs)

epsg:3857
epsg:3857


In [19]:
##4 Project Prep 
##update / verify crs
bus_geo_3857 = bus_geo.to_crs("epsg:3857")
print(bus_geo_3857.crs)

epsg:3857


In [20]:
##4 Project Prep 
##Use GeoPandas sjoin_nearest to join point data from bus stops and scooter start locs....
##used the argument max_distance=100 to only capture scooter start locs within 100 meters of bus stop locations
busgeo3857_w_scooterst3857 = gpd.sjoin_nearest(bus_geo_3857, scooter_start_3857, max_distance=100)

In [21]:
##4 Project Prep 
## review after sjoin
busgeo3857_w_scooterst3857.head()

Unnamed: 0,stop,route,location,lat_left,lng_left,geometry,index_right,scooter_st_loc,lng_right,lat_right
1,_ 9TH AVE S & EDGEHILL AVE SB,8TH AVENUE SOUTH,"(36.142642, -86.780897)",36.142642,-86.780897,POINT (-9660405.265 4320266.484),169960,scooter_start,-86.780635,36.141996
1,_ 9TH AVE S & EDGEHILL AVE SB,8TH AVENUE SOUTH,"(36.142642, -86.780897)",36.142642,-86.780897,POINT (-9660405.265 4320266.484),170406,scooter_start,-86.780635,36.141996
1,_ 9TH AVE S & EDGEHILL AVE SB,8TH AVENUE SOUTH,"(36.142642, -86.780897)",36.142642,-86.780897,POINT (-9660405.265 4320266.484),169524,scooter_start,-86.780635,36.141996
1,_ 9TH AVE S & EDGEHILL AVE SB,8TH AVENUE SOUTH,"(36.142642, -86.780897)",36.142642,-86.780897,POINT (-9660405.265 4320266.484),169103,scooter_start,-86.780635,36.141996
3,17TH AVE S & DOROTHY PL SB,BELMONT,"(36.137623, -86.795609)",36.137623,-86.795609,POINT (-9662042.997 4319574.646),462835,scooter_start,-86.79562,36.13761


In [22]:
##4 Project Prep 
## review value count of scooter start locations within 100 meters of a bus stop. value count = 1603
busgeo3857_w_scooterst3857['scooter_st_loc'].value_counts()

scooter_st_loc
scooter_start    1603
Name: count, dtype: int64

In [23]:
##4 Project Prep 
## validat initial df crs was not changed
print(zipcodes.crs)
print(scooter_start_geo.crs)
print(bus_geo.crs)

EPSG:4326
EPSG:4326
EPSG:4326


In [25]:
##4 Project Prep 
##Trimming the zips table down to just the geometry / polygon info
polygon_zips_3857 = zipcodes['geometry']
polygon_zips_3857.shape
polygon_zips_3857

0     MULTIPOLYGON (((-86.68725 36.31821, -86.68722 ...
1     MULTIPOLYGON (((-86.73451 36.23774, -86.73425 ...
2     MULTIPOLYGON (((-86.77914 36.13424, -86.77923 ...
3     MULTIPOLYGON (((-86.81258 36.06319, -86.81263 ...
4     MULTIPOLYGON (((-87.02197 36.01200, -87.02140 ...
5     MULTIPOLYGON (((-87.03553 36.08659, -87.03556 ...
6     MULTIPOLYGON (((-86.86263 36.37811, -86.86264 ...
7     MULTIPOLYGON (((-86.97084 36.11644, -86.97084 ...
8     MULTIPOLYGON (((-86.75361 36.40428, -86.75354 ...
9     MULTIPOLYGON (((-86.80790 36.14643, -86.80605 ...
10    MULTIPOLYGON (((-86.67188 35.98955, -86.67189 ...
11    MULTIPOLYGON (((-86.72012 36.00886, -86.72012 ...
12    MULTIPOLYGON (((-86.97543 36.20877, -86.97221 ...
13    MULTIPOLYGON (((-86.75348 36.16274, -86.75383 ...
14    MULTIPOLYGON (((-86.68705 36.01059, -86.68695 ...
15    MULTIPOLYGON (((-86.85290 36.38014, -86.85327 ...
16    MULTIPOLYGON (((-86.72541 36.00934, -86.72540 ...
17    MULTIPOLYGON (((-86.67355 36.12729, -86.66

In [26]:
##4 Project Prep
##From geospatial workshop: folium maps require a center point for the street map
polygon_zips_3857.geometry.centroid


  polygon_zips_3857.geometry.centroid


0     POINT (-86.69477 36.25433)
1     POINT (-86.72635 36.21605)
2     POINT (-86.77467 36.10566)
3     POINT (-86.78551 36.04148)
4     POINT (-87.02866 36.00026)
5     POINT (-87.03712 36.08997)
6     POINT (-86.86263 36.38001)
7     POINT (-86.97531 36.06283)
8     POINT (-86.76433 36.33621)
9     POINT (-86.80157 36.13335)
10    POINT (-86.63653 35.98113)
11    POINT (-86.72226 36.00938)
12    POINT (-86.91816 36.15605)
13    POINT (-86.73098 36.18090)
14    POINT (-86.69470 36.00387)
15    POINT (-86.85733 36.37882)
16    POINT (-86.72570 36.00942)
17    POINT (-86.66093 36.10779)
18    POINT (-86.82996 36.27880)
19    POINT (-86.78726 36.28724)
20    POINT (-86.89487 36.32034)
21    POINT (-86.68513 35.99434)
22    POINT (-86.64118 36.25192)
23    POINT (-86.98674 36.15655)
24    POINT (-86.78317 36.16682)
25    POINT (-86.68331 35.99201)
26    POINT (-86.76289 36.39920)
27    POINT (-86.89039 36.20505)
28    POINT (-86.52221 36.13926)
29    POINT (-86.67866 35.98840)
30    POIN

In [27]:
##4 Project Prep
##Used Google to find lat/lon center of Nashville (36.1627° N, 86.7816° W)...found a point in the list that was close
center = polygon_zips_3857.geometry.centroid[31]
print(center)

POINT (-86.76850874477554 36.167133451709944)



  center = polygon_zips_3857.geometry.centroid[31]


In [28]:
##4 Project Prep
##From geospatial workshop: Folium requires a location point as an array / latitude first
area_center = [center.y, center.x]
print(area_center)

[36.167133451709944, -86.76850874477554]


##Deliverables:
At the conclusion of this project, your group should deliver a presentation which addresses the following points:
* Are scooter companies in compliance with the required data cleaning? Answered Question 2
* What are typical usage patterns for scooters in terms of time, location, and trip duration?
* Does it appear that scooters are used as "last mile" transportation from public transit hubs to work or school? Answered Question 4
* What are your recommendations for total number of scooters for the city overall and density of scooters by zip code?

In [29]:
## Deliverables Prep
## What are typical usage patterns for scooters in terms of time, location, and trip duration?

##SELECT companyname, ROUND(AVG(tripduration), 2)
##FROM trips
##GROUP BY companyname;

scooter_tripinfo_sql=  '''
SELECT EXTRACT(MONTH from pubtimestamp) :: INT AS month_num, EXTRACT(DAY from pubtimestamp) :: INT AS day_num, 
EXTRACT(HOUR from pubtimestamp) :: INT AS hour_num, ROUND(tripduration, 2) AS tripduration
FROM trips;
'''
scooter_tripinfo_sql = pd.read_sql(scooter_tripinfo_sql, con = engine)
scooter_tripinfo_sql.head()

Unnamed: 0,month_num,day_num,hour_num,tripduration
0,7,24,0,5.0
1,7,24,0,6.0
2,7,24,0,6.0
3,7,24,0,26.0
4,7,24,2,2.0


In [30]:
##https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.describe.html
##describe(): Descriptive statistics include those that summarize the central tendency, dispersion and shape of 
##a dataset’s distribution
print(scooter_tripinfo_sql.describe())

           month_num        day_num       hour_num   tripduration
count  565522.000000  565522.000000  565522.000000  565522.000000
mean        5.841939      15.586952      14.160188      69.770555
std         0.781793       8.918475       7.958576     897.418039
min         5.000000       1.000000       0.000000     -19.360000
25%         5.000000       8.000000       5.000000       5.000000
50%         6.000000      16.000000      17.000000      10.070000
75%         6.000000      24.000000      20.000000      20.240000
max         8.000000      31.000000      23.000000  512619.000000


In [31]:
##Deliverables Prep: Need to Build Map / What are typical usage patterns for scooters in terms of location
##Based on volume of data / will use 1 week sample of data for jun 2-8
##Pull in scooter start and end locations for a 1 week time period 

scooter_st_jun_1wk = '''
SELECT DISTINCT triprecordnum AS st_rcdnum, startlatitude AS st_lat, startlongitude AS st_lng
FROM trips
WHERE pubtimestamp BETWEEN '2019-06-02 00:00:00.00' AND '2019-06-08 23:59:00.00'
ORDER BY triprecordnum; 
'''

scooter_st_jun_1wk = pd.read_sql(scooter_st_jun_1wk, con = engine)
scooter_st_jun_1wk.insert(1, 'scooter_st', 'scooter_start', True)
scooter_st_jun_1wk.head()

scooter_end_jun_1wk = '''
SELECT DISTINCT triprecordnum as end_rcdnum, endlatitude AS end_lat, endlongitude AS end_lng
FROM trips
WHERE pubtimestamp BETWEEN '2019-06-02 00:00:00.00' AND '2019-06-08 23:59:00.00'
ORDER BY triprecordnum; 
'''

scooter_end_jun_1wk = pd.read_sql(scooter_end_jun_1wk, con = engine)
scooter_end_jun_1wk.insert(1, 'scooter_end', 'scooter_end', True)
scooter_end_jun_1wk.head()

Unnamed: 0,end_rcdnum,scooter_end,end_lat,end_lng
0,BOL00001,scooter_end,36.156793,-86.780868
1,BOL00001,scooter_end,36.176659,-86.751759
2,BOL00001,scooter_end,36.124954,-86.785136
3,BOL00001,scooter_end,36.155009,-86.785008
4,BOL00001,scooter_end,36.15038,-86.779316


In [32]:
scooter_st_jun_1wk.head()

Unnamed: 0,st_rcdnum,scooter_st,st_lat,st_lng
0,BOL00001,scooter_start,36.150718,-86.782662
1,BOL00001,scooter_start,36.152978,-86.783875
2,BOL00001,scooter_start,36.161925,-86.779499
3,BOL00001,scooter_start,36.160885,-86.777596
4,BOL00001,scooter_start,36.155232,-86.785049


In [33]:
##notes from geospatial workshop state that we must adjust df to create GeoDataFrame():
##GeoDataFrame requires: df, crs, geometry
##add geometry: add 'geometry' column make it a point data type / use lambda function to execute on all rows in df
##add coordinate reference system (crs): use zipcodes crs to ensure matching crs
scooter_st_jun_1wk['st_geometry'] = scooter_st_jun_1wk.apply(lambda x: Point((x.st_lng, x.st_lat)), axis=1)
scooter_st_jun_1wk_geo = gpd.GeoDataFrame(scooter_st_jun_1wk, crs=zipcodes_3857.crs, geometry=scooter_st_jun_1wk['st_geometry'])
type(scooter_st_jun_1wk_geo)

scooter_end_jun_1wk['end_geometry'] = scooter_end_jun_1wk.apply(lambda x: Point((x.end_lng, x.end_lat)), axis=1)
scooter_end_jun_1wk_geo = gpd.GeoDataFrame(scooter_end_jun_1wk, crs=zipcodes_3857.crs, geometry=scooter_end_jun_1wk['end_geometry'])
type(scooter_end_jun_1wk_geo)

geopandas.geodataframe.GeoDataFrame

In [30]:
type(scooter_st_jun_1wk_geo)

geopandas.geodataframe.GeoDataFrame

In [101]:
##Cluster Map Showing 1 Week's Data for Scooter Start / End Locs w/ Bus Stop locations
##how to change folium/geojson color, changed color of polygon areas: 
##https://stackoverflow.com/questions/58437156/how-to-change-folium-geojson-color

##Stopped presenting map in Jupyter Notebook / send to map folder...Rendered the file too large for github upload 

##Map displays too much info for useful analysis...Otherwise pretty cool!
#Will create a map with a single day used for sampling

cluster_map_nashville_scooter_st_end_rvw1 = folium.Map(location = area_center, zoom_start = 10)

marker_cluster = MarkerCluster().add_to(cluster_map_nashville_scooter_st_end_rvw1)

##color info for polygons
style1 = {'fillColor': '#228B22', 'lineColor': '#228B22'}
style2 = {'fillColor': '#00FFFFFF', 'lineColor': '#00FFFFFF'}

##polygons (polygon_zips_3857: geojson read and geo df create/manipulate in cells below. Not part of initial query))
folium.GeoJson(polygon_zips_3857, style_function=lambda x:style1).add_to(cluster_map_nashville_scooter_st_end_rvw1)
folium.GeoJson(polygon_parks_3857, style_function=lambda x:style2).add_to(cluster_map_nashville_scooter_st_end_rvw1)

##folium.GeoJson(polygon_zips_3857).add_to(cluster_map_nashville_scooter_st_end_rvw1)

##bus stops
for row_index, row_values in bus_geo_3857.iterrows():
    loc = [row_values['lat'], row_values['lng']]
    pop = str(row_values['route'])
    icon = folium.Icon(color='blue', icon='bus', prefix='fa')
    
    marker = folium.Marker(location = loc, popup = pop, icon = icon)
    
    ##marker.add for adding within cluster or directly to map
    ##marker.add_to(cluster_map_nashville_scooter_st_end_rvw1)
    marker.add_to(marker_cluster)

##schools (nashville_school_geo_3857: geojson read and geo df create/manipulate in cells below. Not part of initial query)
for row_index, row_values in nashville_school_geo_3857.iterrows():
    loc = [row_values['lat'], row_values['lng']]
    pop = str(row_values['location_name'])
    icon = folium.Icon(color='blue', icon='university', prefix='fa')
    
    marker = folium.Marker(location = loc, popup = pop, icon = icon)
    
    ##marker.add_to(cluster_map_nashville_scooter_5jun_review)
    marker.add_to(marker_cluster)

##scooter start 
for row_index, row_values in scooter_st_jun_1wk_geo.iterrows():
    loc = [row_values['st_lat'], row_values['st_lng']]
    pop = str(row_values['st_rcdnum'])
    icon = folium.Icon(color='green', icon='star', prefix='fa')
    
    marker = folium.Marker(location = loc, popup = pop, icon = icon)
    
    marker.add_to(marker_cluster)

##scooter end 
for row_index, row_values in scooter_end_jun_1wk_geo.iterrows():
    loc = [row_values['end_lat'], row_values['end_lng']]
    pop = str(row_values['end_rcdnum'])
    icon = folium.Icon(color='red', icon='star', prefix='fa')
    
    marker = folium.Marker(location = loc, popup = pop, icon = icon)
    
    marker.add_to(marker_cluster)
cluster_map_nashville_scooter_st_end_rvw1.save('../maps/cluster_map_nashville_scooter_st_end_rvw1.html')

#cluster_map_nashville_scooter_st_end_rvw1

In [35]:
#Nashville School Info from: https://data.nashville.gov/browse?limitTo=datasets

nashville_school_loc_info = gpd.read_file('../data/Schools_Served_by_Metro_Arts_Grantees.geojson')
nashville_school_loc_info.head()

Unnamed: 0,number_of_students,zip_code,city,state,organization_name,location_name,osm_id,location_type,in_school_or_out_of_school,street_address,grant_category,geometry
0,4,37221,Nashville,TN,Children's House of Nashville,Bellevue,197472,MNPS,,,Basic Operating,
1,4,37206,Nashville,TN,W.O. Smith/Nashville Community Music School,KIPP Nashville College Prep,836042366,MNPS,,3410 Knight Drive,,
2,29,37215,Nashville,TN,Southern Word,Nashville Public Library - Green Hills Branch,19451208,Other,Out of School Time,3701 Benham Avenue,,POINT (-86.80870 36.10935)
3,1000,37013,Nashville,TN,Humanities Tennessee,Thurgood Marshall Middle School,19470265,MNPS,during school,5832 Pettus Rd,Basic Operating,POINT (-86.66473 36.02241)
4,6,37209,Nashville,TN,Eldridge Home School,Carlson Home School,220548207,Home Schooled,During School Hours,204 Cherokee Station Drive,Basic Operating,POINT (-86.83914 36.13882)


In [66]:
##Filter for specific value/string (MNPS): df. loc[df['col1'] == value]
##https://www.statology.org/pandas-select-rows-based-on-column-values/
##Dropping the rows that contain a specific string: df[df[“column_name”].str.contains(“string”)==False]
##https://www.geeksforgeeks.org/how-to-drop-rows-that-contain-a-specific-string-in-pandas/

nashville_school_loc_info = nashville_school_loc_info.loc[nashville_school_loc_info['location_type'] == 'MNPS']
nashville_school_loc_info = nashville_school_loc_info[['location_name', 'location_type', 'geometry']]
nashville_school_loc_info.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
Index: 519 entries, 0 to 1019
Data columns (total 3 columns):
 #   Column         Non-Null Count  Dtype   
---  ------         --------------  -----   
 0   location_name  519 non-null    object  
 1   location_type  519 non-null    object  
 2   geometry       511 non-null    geometry
dtypes: geometry(1), object(2)
memory usage: 16.2+ KB


In [71]:
##Parse geometry to us lat / Lng in For Loop in Folium Map
##Turn geometry column into lat/long columns in Geodataframe:
##https://stackoverflow.com/questions/60922709/turn-geometry-column-into-lat-long-columns-in-geodataframe

nashville_school_loc_info['lat'] = nashville_school_loc_info['geometry'].y
nashville_school_loc_info['lng'] = nashville_school_loc_info['geometry'].x
nashville_school_loc_info.dropna(subset = ['geometry'])
#Forgot to add inplace=True...Fixed in Cell Below

Unnamed: 0,location_name,location_type,geometry,lat,lng
3,Thurgood Marshall Middle School,MNPS,POINT (-86.66473 36.02241),36.022414,-86.664733
8,Dan Mills Elementary School,MNPS,POINT (-86.71749 36.21865),36.218646,-86.717495
10,Cockrill Elementary School,MNPS,POINT (-86.84498 36.15663),36.156626,-86.844980
11,Lakeview Elementary Design Center,MNPS,POINT (-86.62967 36.07848),36.078477,-86.629671
12,Hickman Elementary School,MNPS,POINT (-86.65538 36.16685),36.166855,-86.655375
...,...,...,...,...,...
1004,F.H. Jenkins Preparatory School,MNPS,POINT (-86.79908 36.20476),36.204759,-86.799082
1006,McMurray Middle School,MNPS,POINT (-86.72267 36.05738),36.057380,-86.722669
1014,Julia Green Elementary School,MNPS,POINT (-86.83984 36.10624),36.106244,-86.839842
1016,Paragon Mills Elementary School,MNPS,POINT (-86.70701 36.08705),36.087048,-86.707014


In [78]:
##Convert CRS to Match existing GeoDataFrames being used (epsg: 3857)
##Remove 'None' from geometry column
nashville_school_geo_3857 = nashville_school_loc_info.to_crs("epsg:3857")
nashville_school_geo_3857.dropna(subset = ['geometry'], inplace=True)
nashville_school_geo_3857.head()

Unnamed: 0,location_name,location_type,geometry,lat,lng
3,Thurgood Marshall Middle School,MNPS,POINT (-9647473.947 4303705.942),36.022414,-86.664733
8,Dan Mills Elementary School,MNPS,POINT (-9653347.386 4330748.582),36.218646,-86.717495
10,Cockrill Elementary School,MNPS,POINT (-9667538.952 4322194.325),36.156626,-86.84498
11,Lakeview Elementary Design Center,MNPS,POINT (-9643570.863 4311425.066),36.078477,-86.629671
12,Hickman Elementary School,MNPS,POINT (-9646432.220 4323604.718),36.166855,-86.655375


In [100]:
##Drop Duplicates:https://favtutor.com/blogs/pandas-unique-values-in-column
##print(df.drop_duplicates(subset = "Subjects"))
nashville_school_geo_3857.drop_duplicates(subset = ['location_name'], inplace=True)
nashville_school_geo_3857.head()

Unnamed: 0,location_name,location_type,geometry,lat,lng
3,Thurgood Marshall Middle School,MNPS,POINT (-9647473.947 4303705.942),36.022414,-86.664733
8,Dan Mills Elementary School,MNPS,POINT (-9653347.386 4330748.582),36.218646,-86.717495
10,Cockrill Elementary School,MNPS,POINT (-9667538.952 4322194.325),36.156626,-86.84498
11,Lakeview Elementary Design Center,MNPS,POINT (-9643570.863 4311425.066),36.078477,-86.629671
12,Hickman Elementary School,MNPS,POINT (-9646432.220 4323604.718),36.166855,-86.655375


In [44]:
##Read in Nashville Parks Data
nashville_parks_loc_info = gpd.read_file('../data/Metro_Parks_Boundaries_GIS.geojson')
nashville_parks_loc_info.head()

Unnamed: 0,acres,name,year_estab,common_nam,status,address,lon,lat,descriptio,geometry
0,69.86,Harpeth River Park,1988,Harpeth River,PARK,7820 Coley Davis Rd,-86.9592214,36.07738249,Harpeth River State Park has much historical a...,"MULTIPOLYGON (((-86.95547 36.07930, -86.95578 ..."
1,9.88,Harpeth Knoll Park,1972,Harpeth Knoll,PARK,708 Goodpasture Terrace,-86.93815537,36.05942777,"Huge grassy field with playground, and shade s...","MULTIPOLYGON (((-86.93860 36.06077, -86.93742 ..."
2,17.14,Bellevue Park,1982,Red Caboose,PARK,656 Colice Jeanne Rd,-86.93354496,36.07159731,"Bellevue Park, known as Red Caboose Park, is p...","MULTIPOLYGON (((-86.93207 36.06954, -86.93283 ..."
3,809.59,Bells Bend Park,2007,Bells Bend,PARK,4187 Old Hickory Blvd,-86.92646294,36.15016925,"Located in western Davidson County, this park ...","MULTIPOLYGON (((-86.93987 36.16093, -86.93495 ..."
4,2168.792,Alvin G. Beaman Park,1996,Beaman,PARK,5911 Old Hickory Blvd.,-86.91984207,36.26075833,Beaman Park provides education and awareness p...,"MULTIPOLYGON (((-86.90367 36.27352, -86.90364 ..."


In [46]:
##Convert CRS to Match existing GeoDataFrames being used (epsg: 3857)
nashville_parks_3857 = nashville_parks_loc_info.to_crs("epsg:3857")
type(nashville_parks_3857)

geopandas.geodataframe.GeoDataFrame

In [48]:
##Create Polygon GeoData Frame to use in Folium Map
polygon_parks_3857 = nashville_parks_3857['geometry']
polygon_zips_3857.head()

0    MULTIPOLYGON (((-86.68725 36.31821, -86.68722 ...
1    MULTIPOLYGON (((-86.73451 36.23774, -86.73425 ...
2    MULTIPOLYGON (((-86.77914 36.13424, -86.77923 ...
3    MULTIPOLYGON (((-86.81258 36.06319, -86.81263 ...
4    MULTIPOLYGON (((-87.02197 36.01200, -87.02140 ...
Name: geometry, dtype: geometry

In [49]:
##SQL queries for 1 day's worth of start and stop info

scooter_st_5jun = '''
SELECT DISTINCT triprecordnum AS st_rcdnum, startlatitude AS st_lat, startlongitude AS st_lng
FROM trips
WHERE pubtimestamp BETWEEN '2019-06-05 00:00:00.00' AND '2019-06-05 23:59:00.00'
ORDER BY triprecordnum; 
'''

scooter_st_5jun = pd.read_sql(scooter_st_5jun, con = engine)
scooter_st_5jun.insert(1, 'scooter_st', 'scooter_start', True)
scooter_st_5jun.head()

scooter_end_5jun = '''
SELECT DISTINCT triprecordnum as end_rcdnum, endlatitude AS end_lat, endlongitude AS end_lng
FROM trips
WHERE pubtimestamp BETWEEN '2019-06-05 00:00:00.00' AND '2019-06-05 23:59:00.00'
ORDER BY triprecordnum; 
'''

scooter_end_5jun = pd.read_sql(scooter_end_5jun, con = engine)
scooter_end_5jun.insert(1, 'scooter_end', 'scooter_end', True)
scooter_end_5jun.head()

Unnamed: 0,end_rcdnum,scooter_end,end_lat,end_lng
0,BOL00001,scooter_end,36.155009,-86.785008
1,BOL00002,scooter_end,36.155102,-86.785135
2,BOL00005,scooter_end,36.164349,-86.769384
3,BOL00006,scooter_end,36.175677,-86.773768
4,BOL00007,scooter_end,36.175641,-86.773765


In [87]:
##notes from geospatial workshop state that we must adjust df to create GeoDataFrame():
##GeoDataFrame requires: df, crs, geometry
##add geometry: add 'geometry' column make it a point data type / use lambda function to execute on all rows in df
##add coordinate reference system (crs): use zipcodes crs to ensure matching crs
scooter_st_5jun['st_geometry'] = scooter_st_5jun.apply(lambda x: Point((x.st_lng, x.st_lat)), axis=1)
scooter_st_5jun_geo = gpd.GeoDataFrame(scooter_st_5jun, crs=zipcodes_3857.crs, geometry=scooter_st_jun_1wk['st_geometry'])
type(scooter_st_5jun_geo)

scooter_end_5jun['end_geometry'] = scooter_end_5jun.apply(lambda x: Point((x.end_lng, x.end_lat)), axis=1)
scooter_end_5jun_geo = gpd.GeoDataFrame(scooter_end_5jun, crs=zipcodes_3857.crs, geometry=scooter_end_jun_1wk['end_geometry'])
scooter_end_5jun_geo.info()

<bound method DataFrame.info of      end_rcdnum  scooter_end    end_lat    end_lng   
0      BOL00001  scooter_end  36.155009 -86.785008  \
1      BOL00002  scooter_end  36.155102 -86.785135   
2      BOL00005  scooter_end  36.164349 -86.769384   
3      BOL00006  scooter_end  36.175677 -86.773768   
4      BOL00007  scooter_end  36.175641 -86.773765   
...         ...          ...        ...        ...   
6145      SPI95  scooter_end  36.163299 -86.770108   
6146      SPI96  scooter_end  36.153060 -86.789681   
6147      SPI97  scooter_end  36.159224 -86.780909   
6148      SPI98  scooter_end  36.153552 -86.783920   
6149      SPI99  scooter_end  36.152940 -86.789635   

                                       end_geometry                geometry  
0                      POINT (-86.785008 36.155009)  POINT (-86.781 36.157)  
1                      POINT (-86.785135 36.155102)  POINT (-86.752 36.177)  
2                      POINT (-86.769384 36.164349)  POINT (-86.785 36.125)  
3      

In [103]:
##Cluster Map Showing 1 day usage review for scooter data (5jun2019) 
##Stopped presenting map in Jupyter Notebook...Renders the file too large for github upload
##how to change folium/geojson color, changed color of polygon areas: 
##https://stackoverflow.com/questions/58437156/how-to-change-folium-geojson-color

cluster_map_nashville_scooter_5jun_review = folium.Map(location = area_center, zoom_start = 10)

marker_cluster = MarkerCluster().add_to(cluster_map_nashville_scooter_5jun_review)
#marker_cluster2 = MarkerCluster().add_to(cluster_map_nashville_scooter_5jun_review)

##color info for polygons
style1 = {'fillColor': '#228B22', 'lineColor': '#228B22'}
style2 = {'fillColor': '#00FFFFFF', 'lineColor': '#00FFFFFF'}

##polygons
folium.GeoJson(polygon_zips_3857, style_function=lambda x:style1).add_to(cluster_map_nashville_scooter_5jun_review)
folium.GeoJson(polygon_parks_3857, style_function=lambda x:style2).add_to(cluster_map_nashville_scooter_5jun_review)

##bus stops
for row_index, row_values in bus_geo_3857.iterrows():
    loc = [row_values['lat'], row_values['lng']]
    pop = str(row_values['route'])
    icon = folium.Icon(color='blue', icon='bus', prefix='fa')
    
    marker = folium.Marker(location = loc, popup = pop, icon = icon)
    
    ##marker.add for adding within cluster or directly to map
    ##marker.add_to(cluster_map_nashville_scooter_5jun_review)
    marker.add_to(marker_cluster)

##schools
for row_index, row_values in nashville_school_geo_3857.iterrows():
    loc = [row_values['lat'], row_values['lng']]
    pop = str(row_values['location_name'])
    icon = folium.Icon(color='blue', icon='university', prefix='fa')
    
    marker = folium.Marker(location = loc, popup = pop, icon = icon)
    
    ##marker.add_to(cluster_map_nashville_scooter_5jun_review)
    marker.add_to(marker_cluster)

##scooter start
for row_index, row_values in scooter_st_5jun_geo.iterrows():
    loc = [row_values['st_lat'], row_values['st_lng']]
    pop = str(row_values['st_rcdnum'])
    icon = folium.Icon(color='green', icon='star', prefix='fa')
    
    marker = folium.Marker(location = loc, popup = pop, icon = icon)
    
    marker.add_to(marker_cluster)

##scooter end
for row_index, row_values in scooter_end_5jun_geo.iterrows():
    loc = [row_values['end_lat'], row_values['end_lng']]
    pop = str(row_values['end_rcdnum'])
    icon = folium.Icon(color='red', icon='star', prefix='fa')
    
    marker = folium.Marker(location = loc, popup = pop, icon = icon)
    
    marker.add_to(marker_cluster)
cluster_map_nashville_scooter_5jun_review.save('../maps/cluster_map_nashville_scooter_5jun_review.html')

#cluster_map_nashville_scooter_5jun_review

In [88]:
bus_geo_3857.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 2524 entries, 0 to 2523
Data columns (total 6 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   stop      2524 non-null   object  
 1   route     2524 non-null   object  
 2   location  2524 non-null   object  
 3   lat       2524 non-null   float64 
 4   lng       2524 non-null   float64 
 5   geometry  2524 non-null   geometry
dtypes: float64(2), geometry(1), object(3)
memory usage: 118.4+ KB


In [102]:
nashville_school_geo_3857.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
Index: 161 entries, 3 to 1004
Data columns (total 5 columns):
 #   Column         Non-Null Count  Dtype   
---  ------         --------------  -----   
 0   location_name  161 non-null    object  
 1   location_type  161 non-null    object  
 2   geometry       161 non-null    geometry
 3   lat            161 non-null    float64 
 4   lng            161 non-null    float64 
dtypes: float64(2), geometry(1), object(2)
memory usage: 7.5+ KB


In [90]:
scooter_st_5jun_geo.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 6150 entries, 0 to 6149
Data columns (total 6 columns):
 #   Column       Non-Null Count  Dtype   
---  ------       --------------  -----   
 0   st_rcdnum    6150 non-null   object  
 1   scooter_st   6150 non-null   object  
 2   st_lat       6150 non-null   float64 
 3   st_lng       6150 non-null   float64 
 4   st_geometry  6150 non-null   object  
 5   geometry     6150 non-null   geometry
dtypes: float64(2), geometry(1), object(3)
memory usage: 288.4+ KB


In [91]:
scooter_end_5jun_geo.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 6150 entries, 0 to 6149
Data columns (total 6 columns):
 #   Column        Non-Null Count  Dtype   
---  ------        --------------  -----   
 0   end_rcdnum    6150 non-null   object  
 1   scooter_end   6150 non-null   object  
 2   end_lat       6150 non-null   float64 
 3   end_lng       6150 non-null   float64 
 4   end_geometry  6150 non-null   object  
 5   geometry      6150 non-null   geometry
dtypes: float64(2), geometry(1), object(3)
memory usage: 288.4+ KB
