CUNY-Specific Analytics
1. Which MTA bus routes are most heavily used by CUNY students?
2. How do violation rates compare across CUNY campuses?

In [5]:
import requests
import numpy as np
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point
from pathlib import Path

Data loading/cleaning - CUNY campuses

In [None]:
url = "https://data.ny.gov/resource/irqs-74ez.csv"
campuses = pd.read_csv(url)
campuses.head() 

Unnamed: 0,college_or_institution_type,campus,campus_website,address,city,state,zip,lat,long,georeference
0,Community Colleges,Borough of Manhattan Community College,https://www.bmcc.cuny.edu/,199 Chambers Street,New York,NY,10007-1044,40.717367,-74.012178,POINT (-74.012178 40.717367)
1,Community Colleges,Bronx Community College,https://www.bcc.cuny.edu/,2155 University Avenue,Bronx,NY,10453-2804,40.856673,-73.910127,POINT (-73.910127 40.856673)
2,Community Colleges,Hostos Community College,http://hostos.cuny.edu,500 Grand Concourse,Bronx,NY,10451-5323,40.817828,-73.926862,POINT (-73.926862 40.817828)
3,Community Colleges,Kingsborough Community College,http://kbcc.cuny.edu,2001 Oriental Boulevard,Brooklyn,NY,11235-2333,40.578349,-73.934465,POINT (-73.934465 40.578349)
4,Community Colleges,LaGuardia Community College,https://www.laguardia.edu/,31-10 Thomson Avenue,Long Island City,NY,11101-3007,40.743951,-73.935154,POINT (-73.935154 40.743951)


In [3]:
campuses.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26 entries, 0 to 25
Data columns (total 10 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   college_or_institution_type  26 non-null     object 
 1   campus                       26 non-null     object 
 2   campus_website               26 non-null     object 
 3   address                      26 non-null     object 
 4   city                         26 non-null     object 
 5   state                        26 non-null     object 
 6   zip                          26 non-null     object 
 7   lat                          26 non-null     float64
 8   long                         26 non-null     float64
 9   georeference                 26 non-null     object 
dtypes: float64(2), object(8)
memory usage: 2.2+ KB


In [8]:
campuses[['campus', 'lat', 'long']]

Unnamed: 0,campus,lat,long
0,Borough of Manhattan Community College,40.717367,-74.012178
1,Bronx Community College,40.856673,-73.910127
2,Hostos Community College,40.817828,-73.926862
3,Kingsborough Community College,40.578349,-73.934465
4,LaGuardia Community College,40.743951,-73.935154
5,Queensborough Community College,40.75615,-73.75755
6,Guttman Community College,40.752846,-73.984133
7,Medgar Evers College,40.66624,-73.957349
8,New York City College of Technology,40.695507,-73.987882
9,College of Staten Island,40.608648,-74.153563


In [None]:
# Convert to a GeoDataFrame for mapping (WGS84 longitude/latitude)
campuses_gdf = gpd.GeoDataFrame(
    campuses,
    geometry=gpd.points_from_xy(campuses["long"], campuses["lat"]),
    crs="EPSG:4326",
)

Data loading/cleaning - Bus stops/routes

In [9]:
# GTFS dataset
GTFS_ROOT = Path("../raw_data")

feeds = {"bronx": "gtfs_bronx",
         "brooklyn": "gtfs_brooklyn",
         "manhattan": "gtfs_manhattan",
         "queens": "gtfs_queens",
         "staten_island": "gtfs_staten_island",
         "busco": "gtfs_busco"}

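def load_concat(file_name):
    """Read file_name from every feed folder and stack the rows into one DataFrame."""
    frames = []
    for key, sub in feeds.items():
        p = GTFS_ROOT / sub / file_name
        if p.exists():  # skip feeds that do not include this file
            df = pd.read_csv(p)
            df["feed"] = key  # record which feed each row came from
            frames.append(df)
    if not frames:
        raise FileNotFoundError(f"Could not find {file_name} in any of the feed folders")
    return pd.concat(frames, ignore_index=True)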

stops = load_concat("stops.txt")
routes = load_concat("routes.txt")
trips = load_concat("trips.txt")
stop_times = load_concat("stop_times.txt")

print(stops.shape, routes.shape, trips.shape, stop_times.shape)
stops.head()

(14456, 10) (1532, 10) (230451, 8) (7447673, 9)


Unnamed: 0,stop_id,stop_name,stop_desc,stop_lat,stop_lon,zone_id,stop_url,location_type,parent_station,feed
0,100014,BEDFORD PK BLVD/GRAND CONCOURSE,,40.872562,-73.888156,,,0.0,,bronx
1,100017,PAUL AV/W 205 ST,,40.876836,-73.88971,,,0.0,,bronx
2,100018,PAUL AV/WEST MOSHOLU PKWY SOUTH,,40.880392,-73.886081,,,0.0,,bronx
3,100019,GRAND CONCOURSE/E 138 ST,,40.813496,-73.929489,,,0.0,,bronx
4,100020,GRAND CONCOURSE/E 144 ST,,40.816812,-73.928001,,,0.0,,bronx


Data loading/cleaning - MTA Bus ACE Violations


Linking the datasets
- Identify which bus routes serve each CUNY campus
- Count the violations on those routes (and possibly break them down by violation type?)
- Compare the results across CUNY campuses
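
The three steps above could be sketched roughly as follows. This is only one possible approach, not the notebook's final method: a route is treated as serving a campus if any of its stops lies within a walking radius of the campus, and violation counts are then summed over those routes. The 400 m radius, the helper names (haversine_m, routes_near_campuses, violations_per_campus), the toy rows, and the shape of the violations table (one row per violation, with a route_id column) are all assumptions made for illustration; the real ACE column names may differ.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))


def routes_near_campuses(campuses, stops, stop_times, trips, radius_m=400):
    """Return one row per (campus, route_id) with a stop within radius_m.

    radius_m=400 is an assumed walking distance, not a value from the data.
    """
    # Pairwise campus-to-stop distances, shape (n_campuses, n_stops)
    dist = haversine_m(
        campuses["lat"].to_numpy()[:, None], campuses["long"].to_numpy()[:, None],
        stops["stop_lat"].to_numpy()[None, :], stops["stop_lon"].to_numpy()[None, :],
    )
    ci, si = np.nonzero(dist <= radius_m)
    near = pd.DataFrame({
        "campus": campuses["campus"].to_numpy()[ci],
        "stop_id": stops["stop_id"].to_numpy()[si],
        "feed": stops["feed"].to_numpy()[si],
    })
    # stop -> trip -> route; join on feed as well, in case IDs repeat across feeds
    linked = (near
              .merge(stop_times[["trip_id", "stop_id", "feed"]], on=["stop_id", "feed"])
              .merge(trips[["trip_id", "route_id", "feed"]], on=["trip_id", "feed"]))
    return linked[["campus", "route_id"]].drop_duplicates().reset_index(drop=True)


def violations_per_campus(campus_routes, violations, route_col="route_id"):
    """Sum violation counts over the routes linked to each campus.

    Assumes `violations` has one row per violation and a route column
    named by route_col (hypothetical; adjust to the real ACE schema).
    """
    per_route = (violations.groupby(route_col).size()
                 .reset_index(name="violations")
                 .rename(columns={route_col: "route_id"}))
    merged = campus_routes.merge(per_route, on="route_id", how="left")
    merged["violations"] = merged["violations"].fillna(0).astype(int)
    return merged.groupby("campus", as_index=False)["violations"].sum()


# Toy rows (made up for illustration) using the same column names as above
campuses_demo = pd.DataFrame({"campus": ["Campus A", "Campus B"],
                              "lat": [40.7000, 40.8000],
                              "long": [-74.0000, -73.9000]})
stops_demo = pd.DataFrame({"stop_id": [1, 2, 3],
                           "stop_lat": [40.7010, 40.8005, 40.7500],
                           "stop_lon": [-74.0000, -73.9000, -73.9500],
                           "feed": ["demo"] * 3})
stop_times_demo = pd.DataFrame({"trip_id": ["t1", "t2", "t2", "t3"],
                                "stop_id": [1, 1, 2, 3],
                                "feed": ["demo"] * 4})
trips_demo = pd.DataFrame({"trip_id": ["t1", "t2", "t3"],
                           "route_id": ["R1", "R2", "R3"],
                           "feed": ["demo"] * 3})
violations_demo = pd.DataFrame({"route_id": ["R1", "R1", "R2", "R2", "R2", "R3"]})

campus_routes = routes_near_campuses(campuses_demo, stops_demo,
                                     stop_times_demo, trips_demo)
summary = violations_per_campus(campus_routes, violations_demo)
print(campus_routes)
print(summary)
```

With the real data, the same calls would take campuses, stops, stop_times, and trips as loaded earlier. Note that a route passing several campuses contributes its violations to each of them, so the per-campus totals are not additive across campuses.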

Visualizations/Insights

Recommendations