## BikeSpace Analysis - Damaged Bicycle Parking Reports

This notebook takes user-submitted reports of damaged bicycle parking from the BikeSpace app and, for each report, returns up to five of the nearest City of Toronto bicycle parking features within a 30-metre search radius. The goal is to identify City bicycle parking that may need to be replaced or repaired.


### TODO

* Improve Excel format
* Sort the bikespace reports by id or date desc?
* Incorporate survey notes
* Add value for link like: `https://dashboard.bikespace.ca/#feed?view_all=1&submission_id=1003`
* Make additional tabs for data tables of parking features and reports
* Handling to drop "" and 0 rows
* Generate lat long geometry columns for city output
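
For the dashboard link item in the list above, a minimal sketch, assuming the URL pattern in the example link holds for every submission id. `dashboard_link` is a hypothetical helper name, not part of the BikeSpace API.

```python
from urllib.parse import urlencode

# Base URL copied from the example link in the TODO list
DASHBOARD_FEED_URL = "https://dashboard.bikespace.ca/#feed"


def dashboard_link(submission_id: int) -> str:
    """Build a dashboard link for one report (hypothetical helper)."""
    # Assumption: the query string follows the fragment, as in the example link
    query = urlencode({"view_all": 1, "submission_id": submission_id})
    return f"{DASHBOARD_FEED_URL}?{query}"


print(dashboard_link(1003))
```

The resulting string could be written into an extra column or row of the report table before export.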

In [55]:
# imports
import datetime
from pathlib import Path
import json

import pandas as pd
import geopandas as gpd
import requests


### Get Data - BikeSpace Reports

The [BikeSpace app](https://bikespace.ca/) allows users to report issues with bicycle parking in Toronto, including parking features that are damaged. User reports can be viewed on the [BikeSpace dashboard](https://dashboard.bikespace.ca/) or downloaded via the API.

Details on the BikeSpace API can be found at [api-dev.bikespace.ca](https://api-dev.bikespace.ca/api/v2/docs).


In [2]:
# get bikespace reports
report_limit = 5000
bikespace_request = requests.get(
  "https://api-dev.bikespace.ca/api/v2/submissions",
  params={"limit": report_limit},
)
bikespace_request.raise_for_status()  # stop here if the API returned an HTTP error
bikespace_response = bikespace_request.json()
bikespace_reports_data = pd.DataFrame(bikespace_response['submissions']).set_index('id')

In [3]:
# convert to geodataframe
bikespace_reports = gpd.GeoDataFrame(bikespace_reports_data, 
  geometry=gpd.points_from_xy(
    bikespace_reports_data['longitude'], 
    bikespace_reports_data['latitude'],
    ),
  crs="EPSG:4326"
  ) #.drop(["latitude", "longitude"], axis=1)

bikespace_reports['issues'].explode().value_counts()

issues
not_provided    512
damaged         400
full            359
other           158
abandoned        16
Name: count, dtype: int64
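
Because `issues` holds a list per report, `explode()` counts issue tags rather than reports, so the totals above can add up to more than the number of reports. A minimal plain-Python sketch of the same tally, using made-up reports (the ids and issue lists are illustrative, not real BikeSpace data):

```python
from collections import Counter
from itertools import chain

# Hypothetical reports: id -> list of issue tags (not real BikeSpace data)
sample_issues = {
    1: ["damaged"],
    2: ["full", "damaged"],
    3: ["not_provided"],
    4: ["other", "full"],
}

# Flatten the per-report lists, then count each tag once per occurrence
issue_counts = Counter(chain.from_iterable(sample_issues.values()))

# Reports (not tags) that mention damage, using a list membership test
damaged_ids = [rid for rid, tags in sample_issues.items() if "damaged" in tags]

print(issue_counts.most_common())
print(damaged_ids)
```

Four reports produce six tags here, which is why report-level filtering is done with a membership test on each list rather than on the exploded counts.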

In [4]:
# get toronto ward boundaries
# https://open.toronto.ca/dataset/city-wards/
toronto_wards = gpd.read_file("https://ckan0.cf.opendata.inter.prod-toronto.ca/dataset/5e7a8234-f805-43ac-820f-03d7c360b588/resource/737b29e0-8329-4260-b6af-21555ab24f28/download/City%20Wards%20Data.geojson")

In [5]:
# bikespace reports within Toronto only
br_toronto = bikespace_reports.sjoin(toronto_wards[["geometry", "AREA_DESC"]], how="inner", predicate="intersects").drop("index_right", axis=1)

### Sources of City Bicycle Parking Data

#### Option A

Uses the source datasets from [open.toronto.ca](https://open.toronto.ca/) as published.

#### Option B

Uses "normalized" City bicycle parking data from [github.com/tallcoleman/new-parking-map/](https://github.com/tallcoleman/new-parking-map/). The normalized data uses consistent fields across all datasets (based on the OpenStreetMap tagging system) and filters out bicycle parking features that the data indicates are not currently present on the street.

Relevant filters applied:

* bicycle-parking-racks: STATUS must be "Installed". ("Delivered", "Approved", "Proposed", and "TBD" features are not included)
* street-furniture-bicycle-parking: STATUS must be "Existing". ("Temporarily Removed" features are not included)
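
The filters above amount to keeping a feature only when its STATUS matches the accepted value for its dataset. A minimal sketch over plain dictionaries, with hypothetical feature records. The normalized data is described as already filtered, so this is illustrative only, and `is_current` is a made-up helper name:

```python
# Accepted STATUS value per dataset, taken from the list above
ACCEPTED_STATUS = {
    "bicycle-parking-racks": "Installed",
    "street-furniture-bicycle-parking": "Existing",
}


def is_current(feature: dict) -> bool:
    """Return True if the feature passes the status filter for its dataset."""
    required = ACCEPTED_STATUS.get(feature["source"])
    # Assumption: datasets without a listed filter are kept unchanged
    return required is None or feature.get("STATUS") == required


# Hypothetical records for illustration
features = [
    {"source": "bicycle-parking-racks", "STATUS": "Installed"},
    {"source": "bicycle-parking-racks", "STATUS": "Proposed"},
    {"source": "street-furniture-bicycle-parking", "STATUS": "Temporarily Removed"},
    {"source": "bicycle-parking-high-capacity-outdoor", "STATUS": None},
]

current = [f for f in features if is_current(f)]
print(len(current))
```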

In [6]:
# read source urls and other metadata from open_toronto_ca_sources.json
city_sources_path = Path("open_toronto_ca_sources.json")
with city_sources_path.open("r") as f:
  city_sources = json.load(f)


In [7]:
# get city bicycle parking data - OPTION A
city_data = {}
for source in city_sources['datasets']:
  city_data[source['dataset_name']] = gpd.read_file(source['download_url'])
  city_data[source['dataset_name']].insert(0, "source", source['dataset_name'])

city_data_all = pd.concat(city_data.values())


In [8]:
# get city bicycle parking data - OPTION B
normalized_data = {}
for source in city_sources['datasets']:
  normalized_data[source['dataset_name']] = gpd.read_file(source['normalized_url'])

normalized_data_all = pd.concat(normalized_data.values())

In [9]:
# convert datetime values to string
for column in normalized_data_all:
  if "datetime" in str(normalized_data_all[column].dtype):
    normalized_data_all[column] = normalized_data_all[column].map(
      lambda x: x.isoformat()
    )

### Damage reports and closest parking feature

**TODO**

* buffer report points
* distance from report?
* top 5 based on distances?

In [10]:
br_toronto_damaged = br_toronto[["damaged" in i for i in br_toronto['issues']]]

In [11]:
# convert crs to allow for distance calculations in metres
br_toronto_damaged_utm17n = br_toronto_damaged.to_crs("EPSG:32617")
city_data_all_utm17n = city_data_all.to_crs("EPSG:32617")
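
Distances computed directly on EPSG:4326 coordinates would be in degrees, and a degree of longitude covers less ground than a degree of latitude at Toronto's latitude. A rough sketch of the difference, assuming a spherical Earth with a mean radius of 6,371,008.8 m (an approximation for illustration, not the UTM projection used in the cell above):

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical approximation


def metres_per_degree(latitude_deg: float) -> tuple[float, float]:
    """Approximate ground length of one degree of latitude and longitude."""
    per_degree_lat = math.pi / 180 * EARTH_RADIUS_M
    per_degree_lon = per_degree_lat * math.cos(math.radians(latitude_deg))
    return per_degree_lat, per_degree_lon


# A latitude within Toronto, taken from one of the BikeSpace reports
lat_m, lon_m = metres_per_degree(43.668834)
print(f"1 degree latitude  ~ {lat_m:,.0f} m")
print(f"1 degree longitude ~ {lon_m:,.0f} m")
```

Projecting both layers to a metre-based CRS avoids this distortion, so a single `search_radius` value means the same thing in every direction.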


In [12]:
# area to search, in metres
search_radius = 30

# option A - nearest join only
nearest_features_damaged = br_toronto_damaged_utm17n.sjoin_nearest(
  city_data_all_utm17n.assign(
    city_geometry=city_data_all_utm17n['geometry']
    ), 
  how="inner", 
  max_distance=search_radius, 
  distance_col="distance"
  )
nearest_features_damaged.columns

Index(['comments', 'issues', 'latitude', 'longitude', 'parking_duration',
       'parking_time', 'geometry', 'AREA_DESC', 'index_right', 'source', '_id',
       'ADDRESS_POINT_ID', 'ADDRESS_NUMBER', 'LINEAR_NAME_FULL',
       'ADDRESS_FULL', 'POSTAL_CODE', 'MUNICIPALITY', 'CITY', 'WARD',
       'PLACE_NAME', 'GENERAL_USE_CODE', 'CENTRELINE_ID', 'LO_NUM',
       'LO_NUM_SUF', 'HI_NUM', 'HI_NUM_SUF', 'LINEAR_NAME_ID', 'ID',
       'PARKING_TYPE', 'FLANKING', 'BICYCLE_CAPACITY', 'SIZE_M',
       'YEAR_INSTALLED', 'BY_LAW', 'DETAILS', 'OBJECTID', 'WARD_NAME',
       'MI_PRINX', 'CAPACITY', 'MULTIMODAL', 'SEASONAL', 'SHELTERED',
       'SURFACE', 'STATUS', 'LOCATION', 'NOTES', 'MAP_CLASS',
       'ADDRESSNUMBERTEXT', 'ADDRESSSTREET', 'FRONTINGSTREET', 'SIDE',
       'FROMSTREET', 'DIRECTION', 'SITEID', 'BIA', 'ASSETTYPE', 'SDE_STATE_ID',
       'city_geometry', 'distance'],
      dtype='object')

#### Try using buffer instead

In [13]:
br_toronto_damaged_utm17n = br_toronto_damaged_utm17n.assign(
  geometry_buffered = br_toronto_damaged_utm17n.buffer(search_radius)
)

data_matches = city_data_all_utm17n.sjoin(
  df=br_toronto_damaged_utm17n[["geometry_buffered"]].set_geometry("geometry_buffered"),
  how='inner',
  predicate='intersects'
)

data_matches.sample(5)

,source,_id,ADDRESS_POINT_ID,ADDRESS_NUMBER,LINEAR_NAME_FULL,ADDRESS_FULL,POSTAL_CODE,MUNICIPALITY,CITY,WARD,...,ADDRESSSTREET,FRONTINGSTREET,SIDE,FROMSTREET,DIRECTION,SITEID,BIA,ASSETTYPE,SDE_STATE_ID,index_right
5460,street-furniture-bicycle-parking,5461,,,,,,,,10,...,Queens Quay W,Queens Quay W,South,Lower Simcoe St,East,,The Waterfront,Ring,0.0,97
87,bicycle-parking-high-capacity-outdoor,88,772833.0,481.0,Bloor St W,481 Bloor St W,M5S 1X9,former Toronto,Toronto,University-Rosedale,...,,,,,,,,,,336
13731,street-furniture-bicycle-parking,13732,,,,,,,,10,...,Queens Quay W,Queens Quay W,South,Bishop Tutu Blvd,East,,The Waterfront,Ring,0.0,814
2665,street-furniture-bicycle-parking,2666,,,,,,,,10,...,Adelaide St W,University Ave,East,Adelaide St W,North,,Financial District,Ring,0.0,1183
104,bicycle-parking-high-capacity-outdoor,105,7792268.0,595.0,Bay St,595 Bay St,M5G 2C2,former Toronto,Toronto,University-Rosedale,...,,,,,,,,,,1247


In [18]:
br_toronto_damaged_utm17n.loc[[1330]]

id,comments,issues,latitude,longitude,parking_duration,parking_time,geometry,AREA_DESC,geometry_buffered
1330,BP-35952 Ring and post is missing a ring. #Tea...,[damaged],43.668834,-79.337012,minutes,"Sat, 23 Mar 2024 16:16:23 GMT",POINT (634068.858 4836435.669),Toronto-Danforth (14),"POLYGON ((634098.858 4836435.669, 634098.713 4..."


In [15]:
# look up the report point for each matched city feature, in the same row order
report_matches = br_toronto_damaged_utm17n['geometry'].loc[data_matches['index_right']]

# row-wise distance in metres between each city feature and its matched report
distances = data_matches['geometry'].distance(report_matches, align=False)
data_matches = data_matches.assign(distance=distances)

In [67]:
data_matches = data_matches[
  ['distance'] + [col for col in data_matches.columns if col != 'distance']
]

,distance,source,_id,ADDRESS_POINT_ID,ADDRESS_NUMBER,LINEAR_NAME_FULL,ADDRESS_FULL,POSTAL_CODE,MUNICIPALITY,CITY,...,ADDRESSSTREET,FRONTINGSTREET,SIDE,FROMSTREET,DIRECTION,SITEID,BIA,ASSETTYPE,SDE_STATE_ID,index_right
8914,19.481449,street-furniture-bicycle-parking,8915,,,,,,,,...,Dufferin St,Dufferin St,East,Temple Ave,North,,,Ring,0.0,1143
9822,19.291355,street-furniture-bicycle-parking,9823,,,,,,,,...,Bloor St W,Bloor St W,North,Brunswick Ave,East,,Bloor Annex,Ring,0.0,241
11184,28.922231,street-furniture-bicycle-parking,11185,,,,,,,,...,Queens Quay W,Queens Quay W,South,Lower Simcoe St,East,,The Waterfront,Ring,0.0,1101
14751,13.638683,street-furniture-bicycle-parking,14752,,,,,,,,...,Bloor St W,Bloor St W,North,Howland Ave,East,,Bloor Annex,Ring,0.0,1117
9331,4.982162,street-furniture-bicycle-parking,9332,,,,,,,,...,Queen St E,Queen St E,North,Bright St,East,,Historic Queen East,Ring,0.0,941


In [68]:
report_city_matches = []

for ix in br_toronto_damaged_utm17n.index:
  report_city_matches.append({
    "report": br_toronto_damaged_utm17n.loc[[ix]],
    "city_features": data_matches[
      data_matches['index_right'] == ix
      ].nsmallest(n=5, columns="distance"),
  })
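
The selection above keeps, for each report, at most five features that fall within the search radius, ordered by distance. A plain-Python sketch of the same rule on projected (metre) coordinates, assuming point features and a simple distance cutoff in place of the buffered polygon. The report point is taken from report 1330 above, while the feature ids and coordinates are made up for illustration:

```python
import heapq
import math

SEARCH_RADIUS_M = 30
MAX_FEATURES = 5

# Report 1330's projected coordinates (EPSG:32617), from the output above
report_xy = (634068.858, 4836435.669)

# Hypothetical feature ids and coordinates, for illustration only
features = {
    "A": (634070.858, 4836435.669),  # 2 m east
    "B": (634068.858, 4836460.669),  # 25 m north
    "C": (634108.858, 4836435.669),  # 40 m east, outside the radius
    "D": (634071.858, 4836439.669),  # 5 m away (3-4-5 triangle)
}

# Distance from the report to every feature, in metres
distances = {
    fid: math.hypot(x - report_xy[0], y - report_xy[1])
    for fid, (x, y) in features.items()
}

# Keep features inside the radius, then take the closest few
in_range = {fid: d for fid, d in distances.items() if d <= SEARCH_RADIUS_M}
nearest = heapq.nsmallest(MAX_FEATURES, in_range.items(), key=lambda kv: kv[1])

print([fid for fid, _ in nearest])
```

A report with no features in range ends up with an empty list, which corresponds to the reports filtered out in the next cell.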



In [74]:
report_city_matches_yes = [x for x in report_city_matches if len(x['city_features']) > 0]
print(len(report_city_matches), len(report_city_matches_yes))

367 274


In [75]:
# output

# set up output excel sheet
writer = pd.ExcelWriter(
  'damage_bikespace_city_matches.xlsx', 
  engine='xlsxwriter',
)
workbook = writer.book
worksheet = workbook.add_worksheet('DamageReports')
writer.sheets['DamageReports'] = worksheet

# write header content
bold = workbook.add_format({'bold': True})
worksheet.write(
  'A1', 
  "Bikespace Analysis - Damaged Bicycle Parking Reports", 
  bold,
)
worksheet.write(
  'A2',
  f"Updated {datetime.datetime.today().strftime('%B %d %Y')}",
)
worksheet.write(
  'A3',
  f"{len(report_city_matches_yes)} BikeSpace damage reports with nearby City bicycle parking features"
)

# write data tables
write_row = 4
for pair in report_city_matches_yes:
  report, city_features = pair.values()
  report = (report
    .reset_index(names=["id"])
    .drop(columns=["geometry", "geometry_buffered"])
    .T
  )
  city_features = city_features.drop(columns=["index_right"]).T.dropna()
  report.to_excel(
    writer, 
    sheet_name='DamageReports', 
    startrow=write_row, 
    startcol=0,
    header=False,
  )
  write_row += len(report) + 1
  city_features.to_excel(
    writer, 
    sheet_name='DamageReports', 
    startrow=write_row, 
    startcol=0,
    header=False,
  )
  write_row += len(city_features) + 2


# close through the writer so pandas saves the workbook and releases the file
writer.close()
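
The `write_row` bookkeeping above leaves one blank row between a report and its city features, and two blank rows before the next report. A small sketch of that arithmetic without writing a file. `block_start_rows` is a hypothetical helper, and the field counts in the example are illustrative:

```python
def block_start_rows(block_sizes, first_row=4):
    """Yield (report_start, features_start) for each report/feature pair.

    block_sizes is a list of (report_rows, feature_rows) tuples.
    """
    row = first_row
    for report_rows, feature_rows in block_sizes:
        report_start = row
        row += report_rows + 1  # one blank row after the report table
        features_start = row
        row += feature_rows + 2  # two blank rows before the next report
        yield report_start, features_start


# Illustrative sizes: 9 report fields, then 12 and 7 feature fields
starts = list(block_start_rows([(9, 12), (9, 7)]))
print(starts)
```

Computing the start rows up front like this could also help with the TODO item on improving the Excel format, since the same positions are needed to apply formatting to each block.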
