# FARLAB - Robotable Streets Project 
Developer: @mattwfranchi

Project Members: Matt Franchi, Maria-Teresa Parreira, Frank Bu, Wendy Ju 

As robot deployments become more common, robots will become yet another dancer in the sidewalk ballet. Within urban mapping, transit mobility and walkability scores have emerged as a way to measure the quality of a city's infrastructure for a specific mode of traffic. However, there is no such metric for robots! Here, we aim to envision what a 'robotability' score might look like, and how it might be used to inform urban planning and policy. 

We utilize the following data features in computing a *robotability score*: 
- Sidewalk width 
- Sidewalk quality proxied by 311 complaints 
- Pedestrian density, computed via aggregated dashcam data
- Sidewalk material (concrete, asphalt, cobblestone, etc.) 
- Connectivity: cellular coverage, WiFi availability, IoT network coverage, and GPS coverage 
- Elevation change from beginning to end of road segment 
- Solar radiation levels, for potential solar charging and for potential overheating. 
- Proximity to hypothetical charging stations 
- Grating on sidewalk (e.g., the subway grates in NYC) that might be problematic for robots to navigate 
- Snow buildup 
- Local attitudes towards robots 
- Average illegal parking levels, i.e., cars parked on sidewalks 
- Shade / shadows 
- Overhead covering (scaffolding, awnings, etc., in the case of non-waterproof bots)
- Zoning. My hypothesis: robots are more acceptable in commercial-zoned areas, and less acceptable in majority-residential zoned areas. 
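One way the features above could be combined is a weighted sum of normalized columns. The following is a minimal sketch under stated assumptions, not the project's scoring method: the column names, the weights, and the choice of min-max normalization are all hypothetical placeholders.

```python
import pandas as pd

# Hypothetical weights (not from the project): positive values are assumed to
# help robots, negative values are assumed to hinder them.
EXAMPLE_WEIGHTS = {
    "sidewalk_width_ft": 0.5,
    "pedestrian_density": -0.3,
    "elevation_change_ft": -0.2,
}


def min_max(col: pd.Series) -> pd.Series:
    """Rescale a column to [0, 1]; a constant column maps to all zeros."""
    span = col.max() - col.min()
    if span == 0:
        return pd.Series(0.0, index=col.index)
    return (col - col.min()) / span


def robotability_score(features: pd.DataFrame, weights: dict) -> pd.Series:
    """Weighted sum of min-max normalized feature columns."""
    score = pd.Series(0.0, index=features.index)
    for name, weight in weights.items():
        score = score + weight * min_max(features[name])
    return score


# Toy data: three made-up sidewalk segments.
example = pd.DataFrame({
    "sidewalk_width_ft": [4.0, 6.0, 10.0],
    "pedestrian_density": [0.8, 0.2, 0.5],
    "elevation_change_ft": [0.0, 3.0, 1.0],
})
scores = robotability_score(example, EXAMPLE_WEIGHTS)
```

In this toy data the widest segment ranks highest; real weights would need to be chosen and justified separately.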

### Other Things to Lock In (4/25/24): 
- Study period. Some of this data (311 complaints, pedestrian densities, etc.) should be constrained to a fixed time range. **Limitation: we don't presently have new, free dashcam data.**
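Constraining a time-stamped dataset to a study period could look like the sketch below. The start and end dates and the `created_date` column name are hypothetical; the actual study period has not been fixed yet.

```python
import pandas as pd

# Hypothetical study period; the end bound is exclusive.
STUDY_START = pd.Timestamp("2023-01-01")
STUDY_END = pd.Timestamp("2024-01-01")


def within_study_period(df, date_col, start=STUDY_START, end=STUDY_END):
    """Keep rows whose timestamp falls in [start, end); unparseable dates are dropped."""
    dates = pd.to_datetime(df[date_col], errors="coerce")
    return df[(dates >= start) & (dates < end)]


# Toy records standing in for 311 complaints.
complaints = pd.DataFrame({
    "created_date": ["2022-12-31", "2023-06-15", "2024-01-01", "not a date"],
    "complaint_type": ["a", "b", "c", "d"],
})
filtered = within_study_period(complaints, "created_date")
```

Applying the same function to every time-stamped feature keeps the inputs to the score comparable.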



In [1]:
# class RobotabilityGraph that inherits from Graph class 
import os
import sys 
sys.path.append("/share/ju/urban-fingerprinting")

import osmnx as ox 
import geopandas as gpd 
import pandas as pd 
import numpy as np 

import matplotlib.pyplot as plt 
# enable latex plotting 
plt.rc('text', usetex=True)
plt.rc('font', family='serif')

from glob import glob 
from tqdm import tqdm 

from shapely import wkt, LineString 

import rasterio
from rasterio.enums import Resampling
from rasterio.plot import show 


from src.utils.logger import setup_logger 

logger = setup_logger('robotability-score')
logger.setLevel("INFO")
logger.info("Modules initialized.")

WGS='EPSG:4326'
PROJ='EPSG:2263'

REGEN_SEGMENTIZATION=False
REGEN_TOPOLOGY=True

GEN_INSPECTION_PLOTS=True
INSPECTION_PLOTS="figures/inspection_plots"

os.makedirs(INSPECTION_PLOTS, exist_ok=True)


2024-08-18 16:57:11 - robotability-score - INFO - Modules initialized.


## Loading and Preprocessing Data Features 

### Neighborhood Tabulation Areas (NYC)

In [3]:
# Load the Neighborhood Tabulation Areas (NTAs) dataset 
ntas_nyc = pd.read_csv("data/ntas_nyc.csv")
ntas_nyc = gpd.GeoDataFrame(ntas_nyc, geometry=wkt.loads(ntas_nyc['the_geom']), crs=WGS).to_crs(PROJ)

# Remove redundant columns 
TO_DROP = ['BoroCode','CountyFIPS','NTA2020','NTAAbbrev','CDTA2020','CDTAName']
ntas_nyc = ntas_nyc.drop(columns=TO_DROP)

logger.success("NTAs loaded.")

2024-08-18 15:15:03 - robotability-score - SUCCESS - NTAs loaded.


### Census Blocks (NYC)

In [4]:
cbs_nyc = gpd.read_file("data/nycb2020_24c/nycb2020.shp")

TO_DROP = ['BoroCode', 'CT2020', 'BCTCB2020']

cbs_nyc = cbs_nyc.drop(columns=TO_DROP)
cbs_nyc = cbs_nyc.to_crs(PROJ)

logger.success("Census Blocks loaded.")

2024-08-18 15:15:09 - robotability-score - SUCCESS - Census Blocks loaded.


### Sidewalk Basemap (NYC)

In [5]:
# Load the NYC sidewalk basemap 
sidewalk_nyc = pd.read_csv("data/sidewalks_nyc.csv")
sidewalk_nyc = gpd.GeoDataFrame(sidewalk_nyc, geometry=wkt.loads(sidewalk_nyc['the_geom']), crs=WGS).to_crs(PROJ)

In [6]:
# Take out features we don't need, and add a width column 
TO_DROP = ['SUB_CODE', 'FEAT_CODE', 'STATUS', 'the_geom']
sidewalk_nyc = sidewalk_nyc.drop(columns=TO_DROP)
sidewalk_nyc['SHAPE_Width'] = sidewalk_nyc['SHAPE_Area'] / sidewalk_nyc['SHAPE_Leng']

# Simplify 
sidewalk_nyc['geometry'] = sidewalk_nyc['geometry'].simplify(10)

# Regenerate the segmentized basemap and write it to disk, or load the cached copy 
if REGEN_SEGMENTIZATION:
    # segmentize 
    segmentized = sidewalk_nyc.segmentize(50).extract_unique_points().explode(index_parts=True)

    segmentized = gpd.GeoDataFrame(segmentized).reset_index() 

    segmentized = segmentized.merge(sidewalk_nyc,left_on='level_0',right_index=True).drop(columns=['level_0','level_1','geometry'])
    segmentized['geometry'] = segmentized.iloc[:,0]
    segmentized.drop(segmentized.columns[0],axis=1, inplace=True)
    segmentized = gpd.GeoDataFrame(segmentized, crs=PROJ)

    segmentized.to_file("data/sidewalks_nyc_segmentized.geojson", driver='GeoJSON')
    logger.success("Segmentized sidewalk basemap written to disk.")

else: 
    segmentized = gpd.read_file("data/sidewalks_nyc_segmentized.geojson")
    logger.info("Segmentized sidewalk basemap loaded.")


sidewalk_nyc = segmentized

logger.success("NYC sidewalk basemap loaded.")
logger.info(f"Distribution of sidewalk widths [ft]: \n{sidewalk_nyc['SHAPE_Width'].describe()}")

2024-08-16 00:31:39 - robotability-score - INFO - Segmentized sidewalk basemap loaded.
2024-08-16 00:31:39 - robotability-score - SUCCESS - NYC sidewalk basemap loaded.
2024-08-16 00:31:39 - robotability-score - INFO - Distribution of sidewalk widths [ft]: 
count    2.551208e+06
mean     5.373540e+00
std      1.480766e+00
min      2.710948e-01
25%      4.458703e+00
50%      5.149090e+00
75%      6.177234e+00
max      4.021491e+01
Name: SHAPE_Width, dtype: float64
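The width column divides polygon area by `SHAPE_Leng`. If `SHAPE_Leng` stores the polygon perimeter (an assumption; the notebook does not say), then for a long, thin sidewalk the ratio is close to half the true width. The sketch below shows the exact width of a rectangle recovered from its area and perimeter; treating each sidewalk polygon as a rectangle is itself a simplification, and `rectangle_width` is a hypothetical helper.

```python
import math


def rectangle_width(area: float, perimeter: float) -> float:
    """Short side of a rectangle with the given area and perimeter.

    Solves w * l = area and 2 * (w + l) = perimeter for the smaller root w.
    Shapes more compact than a square give a negative discriminant, which is
    clamped so the result falls back to the side of a square.
    """
    half = perimeter / 2.0
    disc = max(half ** 2 - 4.0 * area, 0.0)
    return (half - math.sqrt(disc)) / 2.0


# Toy example: a 10 ft x 100 ft sidewalk strip.
area, perimeter = 10.0 * 100.0, 2.0 * (10.0 + 100.0)
naive = area / perimeter                    # area / perimeter ratio
exact = rectangle_width(area, perimeter)    # width under the rectangle model
```

For this strip the naive ratio is roughly 4.5 ft while the rectangle model returns 10 ft, so the interpretation of `SHAPE_Leng` is worth confirming before the widths are used.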


In [7]:
if GEN_INSPECTION_PLOTS:
    fig, ax = plt.subplots(1,1,figsize=(10,10))

    ntas_nyc.plot(ax=ax, color='lightgrey', edgecolor='black', alpha=0.5)
    sidewalk_nyc.sample(frac=0.1).plot(ax=ax, color='black', alpha=0.5, markersize=0.5)

    plt.title("NYC Sidewalk Basemap")
    plt.axis('off')
    plt.tight_layout()
    plt.savefig(f"{INSPECTION_PLOTS}/nyc_sidewalk_basemap.png")
    plt.close()

### Topography 

In [8]:
TOPOGRAPHY_NYC = "data/1ft_dem_nyc/DEM_LiDAR_1ft_2010_Improved_NYC_int.tif"

downsample_factor = 10

# Open the raster
with rasterio.open(TOPOGRAPHY_NYC) as src:
    # Calculate new transform and dimensions
    new_transform = src.transform * src.transform.scale(
        downsample_factor,
        downsample_factor
    )
    new_width = src.width // downsample_factor
    new_height = src.height // downsample_factor
    
    # Resample the raster
    topology = src.read(
        out_shape=(src.count, new_height, new_width),
        resampling=Resampling.bilinear
    )

    # Create a new rasterio-like object with updated metadata
    new_meta = src.meta.copy()
    new_meta.update({
        "driver": "GTiff",
        "height": new_height,
        "width": new_width,
        "transform": new_transform
    })

    # Write the new raster to disk
    if REGEN_TOPOLOGY: 
        with rasterio.open("data/1ft_dem_nyc/downsampled_topography.tif", "w", **new_meta) as dst:
            dst.write(topology)



In [9]:
# plot topology 
if GEN_INSPECTION_PLOTS:
    fig, ax = plt.subplots(figsize=(10, 10))
    fig.suptitle(r"\bf Elevation in New York City", fontsize=20)

    # elevation is a 2D array, so we can plot it directly
    show(topology, ax=ax, cmap='terrain', transform=new_transform, )
    #nyc_ct.plot(ax=ax, facecolor='white', edgecolor='black', linewidth=0.5, alpha=0.25)

    ax.set_axis_off()
    plt.savefig(f"{INSPECTION_PLOTS}/topology_nyc.png", dpi=300)
    plt.close()
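The feature list names "elevation change from beginning to end of road segment". A minimal sketch of that lookup against an elevation grid follows. It assumes a north-up raster with square pixels and a single band (for example, band 0 of the array read above); the helper names and the toy grid are hypothetical.

```python
import numpy as np


def sample_elevation(dem, origin_x, origin_y, pixel_size, x, y):
    """Return the value of the DEM cell containing (x, y).

    (origin_x, origin_y) is the top-left corner of the raster, and rows
    increase as y decreases (north-up layout).
    """
    col = int((x - origin_x) // pixel_size)
    row = int((origin_y - y) // pixel_size)
    if not (0 <= row < dem.shape[0] and 0 <= col < dem.shape[1]):
        raise ValueError("point falls outside the raster")
    return float(dem[row, col])


def elevation_change(dem, origin_x, origin_y, pixel_size, start, end):
    """Elevation at the segment end minus elevation at the segment start."""
    z_start = sample_elevation(dem, origin_x, origin_y, pixel_size, *start)
    z_end = sample_elevation(dem, origin_x, origin_y, pixel_size, *end)
    return z_end - z_start


# Toy 3 x 3 grid with 10 ft pixels whose top-left corner is at (0, 30).
dem = np.array([
    [10.0, 12.0, 14.0],
    [9.0, 11.0, 13.0],
    [8.0, 10.0, 12.0],
])
change = elevation_change(dem, 0.0, 30.0, 10.0, start=(5.0, 5.0), end=(25.0, 25.0))
```

With a real raster the origin and pixel size would come from the raster's affine transform rather than being typed in by hand.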

### Satellite Availability 

In [5]:
gso_satellite = pd.read_csv("data/bdc_36_GSOSatellite_fixed_broadband_D23_06aug2024.csv", engine='pyarrow')

gso_satellite['block_geoid'] = gso_satellite['block_geoid'].astype(str)

In [6]:
# merge gso_satellite with cbs_nyc 
gso_satellite = cbs_nyc.merge(gso_satellite, right_on='block_geoid', left_on='GEOID', how='left')
gso_satellite = gpd.GeoDataFrame(gso_satellite, crs=PROJ)
# DataFrame.info() prints its summary and returns None, so call it separately rather than inside the f-string
logger.info("GSOSatellite data merged with Census Blocks.")
gso_satellite.info()

2024-08-18 15:15:48 - robotability-score - INFO - GSOSatellite data merged with Census Blocks.


<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 2470523 entries, 0 to 2470522
Data columns (total 18 columns):
 #   Column                         Dtype   
---  ------                         -----   
 0   CB2020                         object  
 1   BoroName                       object  
 2   GEOID                          object  
 3   Shape_Leng                     float64 
 4   Shape_Area                     float64 
 5   geometry                       geometry
 6   frn                            float64 
 7   provider_id                    float64 
 8   brand_name                     object  
 9   location_id                    float64 
 10  technology                     float64 
 11  max_advertised_download_speed  float64 
 12  max_advertised_upload_speed    float64 
 13  low_latency                    float64 
 14  business_residential_code      object  
 15  state_usps                     object  
 16  block_geoid                    object  
 17  h3_res8_id         

In [8]:
ngso_satellite = pd.read_csv("data/bdc_36_NGSOSatellite_fixed_broadband_D23_06aug2024.csv", engine='pyarrow')
ngso_satellite['block_geoid'] = ngso_satellite['block_geoid'].astype(str)

# merge ngso_satellite with cbs_nyc
ngso_satellite = cbs_nyc.merge(ngso_satellite, right_on='block_geoid', left_on='GEOID', how='left')
ngso_satellite = gpd.GeoDataFrame(ngso_satellite, crs=PROJ)
# DataFrame.info() prints its summary and returns None, so call it separately rather than inside the f-string
logger.info("NGSOSatellite data merged with Census Blocks.")
ngso_satellite.info()

2024-08-18 15:16:52 - robotability-score - INFO - NGSOSatellite data merged with Census Blocks.


<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 826917 entries, 0 to 826916
Data columns (total 18 columns):
 #   Column                         Non-Null Count   Dtype   
---  ------                         --------------   -----   
 0   CB2020                         826917 non-null  object  
 1   BoroName                       826917 non-null  object  
 2   GEOID                          826917 non-null  object  
 3   Shape_Leng                     826917 non-null  float64 
 4   Shape_Area                     826917 non-null  float64 
 5   geometry                       826917 non-null  geometry
 6   frn                            821993 non-null  float64 
 7   provider_id                    821993 non-null  float64 
 8   brand_name                     821993 non-null  object  
 9   location_id                    821993 non-null  float64 
 10  technology                     821993 non-null  float64 
 11  max_advertised_download_speed  821993 non-null  float64 
 12  max_adve
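The merged tables have many more rows than there are census blocks, since the availability data lists several records per block. If one value per block is wanted, the raw availability table could be collapsed before the merge. This is a sketch of one possible aggregation (best advertised download speed per block), not a step taken in this notebook; `best_speed_per_block` is a hypothetical helper.

```python
import pandas as pd


def best_speed_per_block(
    availability: pd.DataFrame,
    block_col: str = "block_geoid",
    speed_col: str = "max_advertised_download_speed",
) -> pd.DataFrame:
    """Collapse the availability table to one row per block, keeping the top speed."""
    return availability.groupby(block_col, as_index=False)[speed_col].max()


# Toy availability records: two for block "A", one for block "B".
toy = pd.DataFrame({
    "block_geoid": ["A", "A", "B"],
    "max_advertised_download_speed": [25.0, 100.0, 50.0],
})
per_block = best_speed_per_block(toy)
```

Merging the collapsed table with the census blocks would then give one row per block.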

### Surveillance Cameras 

In [6]:
surveillance_cameras = pd.read_csv("data/surveillance_cameras/data/cameras_2015-2021.csv")
surveillance_cameras = surveillance_cameras[surveillance_cameras['city'] == 'New York']
surveillance_cameras = surveillance_cameras[surveillance_cameras['camera_count'] > 0]
surveillance_cameras 



| Unnamed: 0 | panoid | heading | lat | lon | city | year | month | distance | camera_count | zone_type |
|---|---|---|---|---|---|---|---|---|---|---|
| 4152 | qbafD92lLAk_SRn2GErWkg | 84 | 40.549514 | -74.151841 | New York | 2018 | 7 | 4.905160 | 1 | commercial |
| 4298 | YaZUN1_WQa2mt6ZHU8f0IA | 145 | 40.550640 | -74.197978 | New York | 2018 | 8 | 2.886381 | 1 | residential |
| 5835 | 2Rh9tnnR1KrW1Xjk20zRlg | 215 | 40.562025 | -74.101158 | New York | 2018 | 6 | 13.419243 | 1 | residential |
| 6481 | 0jNVCy-Er4yEhGr5oBwNkw | 31 | 40.568562 | -74.106132 | New York | 2017 | 7 | 4.582663 | 1 | residential |
| 7335 | rt7iOq7usJr3YM3huiCY1g | 212 | 40.575047 | -74.165660 | New York | 2015 | 9 | 7.510337 | 1 | commercial |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 98816 | dspn02Xm1hsWuvUNi_JDoA | 115 | 40.890967 | -73.828891 | New York | 2018 | 8 | 3.015696 | 1 | residential |
| 99023 | KDtTu6amyjs-VaMRwBnoDg | 202 | 40.893688 | -73.860541 | New York | 2016 | 7 | 4.943950 | 1 | residential |
| 99028 | YUU9BMH1B_cinU2UWSlWXQ | 203 | 40.893713 | -73.858487 | New York | 2016 | 7 | 5.639436 | 1 | residential |
| 99364 | ejcY-mgRFhaooy5h_4rzRw | 352 | 40.898056 | -73.862961 | New York | 2016 | 7 | 11.193857 | 1 | residential |


### Street Furniture 

In [None]:
street_furniture_nyc = pd.read_csv("data/processed/street_furniture_density.csv", engine='pyarrow')

logger.success("Street furniture density data loaded.")

### NYC Zoning (ZOLA)

In [30]:
zoning_nyc = gpd.read_file("data/nyc_zoning/nyzd.shp")

logger.success("ZOLA data loaded.")

2024-08-18 17:18:47 - robotability-score - SUCCESS - ZOLA data loaded.
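To test the zoning hypothesis from the feature list (robots being more acceptable in commercial areas than in residential ones), zoning district codes could be collapsed into broad categories. The sketch below assumes codes of the form `R6B`, `C4-4A`, or `M1-5/R7-3`; which column of the zoning shapefile holds them is not shown in this notebook, and the category names are hypothetical.

```python
def zoning_category(district_code: str) -> str:
    """Map a zoning district code to a broad category by its leading letter."""
    code = district_code.strip().upper()
    if "/" in code:
        # Paired codes such as "M1-5/R7-3" combine two district types.
        return "mixed"
    if code.startswith("R"):
        return "residential"
    if code.startswith("C"):
        return "commercial"
    if code.startswith("M"):
        return "manufacturing"
    return "other"


# Toy codes for illustration.
categories = [zoning_category(c) for c in ["R6B", "C4-4A", "M1-5/R7-3", "PARK"]]
```

The resulting category could then be attached to each sidewalk segment through a spatial join with the zoning polygons.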


### CitiBike Stations 
We simulate charging stations in the environment using the locations of existing CitiBike stations. It is also possible to simulate them by randomly sampling the NYC road network, but we think CitiBike stations might be a more accurate distribution to draw from, as their placement is influenced by population density and zoning patterns. 

In [29]:
citibike_stations_nyc = pd.read_json("data/citibike/station_information.json")
citibike_stations_nyc = pd.json_normalize(citibike_stations_nyc['data']).T 
citibike_stations_nyc = pd.json_normalize(citibike_stations_nyc[0])
citibike_stations_nyc = gpd.GeoDataFrame(citibike_stations_nyc, geometry=gpd.points_from_xy(citibike_stations_nyc['lon'], citibike_stations_nyc['lat']), crs=WGS).to_crs(PROJ)

logger.success("Citibike stations data loaded.")

2024-08-18 17:18:34 - robotability-score - SUCCESS - Citibike stations data loaded.
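With stations standing in for chargers, "proximity to hypothetical charging stations" can be measured as the distance from each sidewalk point to its nearest station. The brute-force sketch below works on plain coordinate arrays in a projected CRS (so distances are in the CRS's units); the function name and toy coordinates are hypothetical. For the full basemap a spatial index or a nearest-neighbor spatial join, such as geopandas' `sjoin_nearest`, would likely be needed, since the pairwise array grows with points times stations.

```python
import numpy as np


def nearest_station_distance(points_xy, stations_xy):
    """Euclidean distance from each point to its closest station.

    Both inputs are sequences of (x, y) pairs in the same projected CRS.
    Builds the full points-by-stations distance matrix, so it only suits
    small inputs or chunked processing.
    """
    points = np.asarray(points_xy, dtype=float)
    stations = np.asarray(stations_xy, dtype=float)
    diffs = points[:, None, :] - stations[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)


# Toy coordinates: two sidewalk points and two stations.
distances = nearest_station_distance(
    [(0.0, 0.0), (10.0, 0.0)],
    [(3.0, 4.0), (10.0, 5.0)],
)
```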


### Street Lighting 

### Curb Ramps 

In [31]:
curb_ramps_nyc = pd.read_csv("data/pedestrian_curb_ramp_nyc.csv")

### Raised Crosswalks 

In [33]:
raised_crosswalks_nyc = pd.read_csv("data/raised_crosswalks_nyc.csv")

### Sidewalk Scorecard 


In [36]:
sidewalk_scorecard_nyc = pd.read_csv("data/Scorecard_Ratings_20240814.csv") 

### Points of Interest 

In [41]:
pois_nyc = pd.read_csv("data/pois_nyc.csv") 
pois_nyc = gpd.GeoDataFrame(pois_nyc, geometry=wkt.loads(pois_nyc['the_geom']), crs=WGS).to_crs(PROJ)

## Features that will not be modeled with empirical data 

### Weather Conditions 
We will assign weights to different types of weather. However, weather conditions may not be considered in use-cases where only the static built-environment is considered. 
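One simple way to express this is a multiplier applied on top of a static score, skipped when only the built environment matters. The multipliers below are made-up placeholders, not values chosen by the project, and `apply_weather` is a hypothetical helper.

```python
# Hypothetical multipliers in [0, 1]; 1.0 means the weather has no effect.
WEATHER_WEIGHTS = {"clear": 1.0, "rain": 0.7, "snow": 0.4}


def apply_weather(static_score, condition=None, weights=WEATHER_WEIGHTS):
    """Scale a static score by a weather multiplier.

    Passing condition=None returns the static score unchanged, which covers
    use cases that only consider the built environment.
    """
    if condition is None:
        return static_score
    return static_score * weights[condition]


static_only = apply_weather(0.8)
in_snow = apply_weather(0.8, "snow")
```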

### Existence of Detailed Digital Maps 

In the case of New York City, the entire city is covered by a DCM. As such, there is no need to bring this indicator in empirically, as the variance across the city is 0. 

## Features that cannot be modeled with empirical data 

### Sidewalk material 
There is no known dataset of sidewalk materials at the per-sidewalk level in NYC. Further, we expect low variance, i.e., the majority of sidewalks are concrete. 

### Street lighting 
There is no known dataset of street lighting / lamp-posts in NYC. While a subset can be derived from the locations of relevant 311 complaints, we do not know whether this subset is representative of the overall distribution. 

## Features that are inferred via proxy datasets / other empirical data 
