This notebook is a development restart of the `streetmapper` library and the `trash-talk` project.

## Data munging

We use a subset of the datasets listed in `README.md`, together with the `streetmapper` library (developed as part of this project), to map trash locations onto (1) blocks and (2) buildings.

In [1]:
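import pandas as pd
import geopandas as gpd

# Reproject building footprints to WGS84 longitude/latitude (EPSG:4326) to match the GeoJSON layers;
# to_crs(epsg=...) replaces the deprecated {"init": "epsg:4326"} form
buildings = gpd.read_file("../../data/Building Footprints").to_crs(epsg=4326)
streets = gpd.read_file("../../data/Streets - Active and Retired.geojson")
blocks = gpd.read_file("../../data/Census 2010_ Blocks for San Francisco.geojson")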

In [2]:
%load_ext autoreload
%autoreload 2

import streetmapper as sm

In [3]:
%time blockfaces = sm.blockfaces_on_blocks(blocks, blocks_uid_col='geoid10')

CPU times: user 21.6 s, sys: 401 ms, total: 22 s
Wall time: 23.9 s


In [16]:
blockfaces.columns = ['blockface_id', 'block_id', 'geometry']
blockfaces.head()

|   | blockface_id | block_id | geometry |
|---|---|---|---|
| 0 | 060750213002002_1 | 60750213002002 | LINESTRING (-122.440616 37.750902, -122.440459... |
| 1 | 060750213002002_2 | 60750213002002 | LINESTRING (-122.440459 37.749301, -122.441538... |
| 2 | 060750213002002_3 | 60750213002002 | LINESTRING (-122.441538 37.74923100000001, -12... |
| 3 | 060750213002002_4 | 60750213002002 | LINESTRING (-122.441687 37.75083600000001, -12... |
| 0 | 060750213002000_1 | 60750213002000 | LINESTRING (-122.438397 37.750226, -122.43832 ... |
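The output suggests that each block's boundary is broken into numbered blockfaces (`<block_id>_1`, `<block_id>_2`, ...). As a rough mental model only, the sketch below splits a block's exterior ring into one blockface per edge. The function `naive_blockfaces` is hypothetical and is not how `sm.blockfaces_on_blocks` is implemented; real block outlines have many more vertices, so the library presumably groups edges differently, which we have not verified.

```python
# Hypothetical, simplified sketch: NOT streetmapper's implementation.
# Treats every edge of a block's exterior ring as its own blockface.
from typing import Dict, List, Tuple

Coord = Tuple[float, float]


def naive_blockfaces(block_id: str, ring: List[Coord]) -> Dict[str, Tuple[Coord, Coord]]:
    """Split a closed exterior ring into edges keyed '<block_id>_<n>', n starting at 1."""
    if ring[0] != ring[-1]:
        ring = ring + [ring[0]]  # close the ring if the caller did not
    faces = {}
    for n, (start, end) in enumerate(zip(ring[:-1], ring[1:]), start=1):
        faces[f"{block_id}_{n}"] = (start, end)
    return faces


# A square "block" yields four faces, mirroring the _1 ... _4 suffixes above.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.0, 0.0)]
faces = naive_blockfaces("060750213002002", square)
print(list(faces))
```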


In [5]:
sm.bldgs_on_blocks

<function streetmapper.pipeline.bldgs_on_blocks(bldgs, blocks, buildings_uid_col='building_id')>

In [9]:
%%time

buildings, _, _ = sm.bldgs_on_blocks(buildings, blocks, buildings_uid_col='globalid')

CPU times: user 1min 7s, sys: 1.52 s, total: 1min 8s
Wall time: 1min 11s


In [10]:
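# Drop the first and last characters of each globalid (presumably enclosing braces)
# and swap hyphens for underscores to form building_id; carry geoid10 over as block_id
buildings = (
    buildings.assign(
        building_id=buildings.globalid.map(lambda v: v[1:-1].replace('-', '_')),
        block_id=buildings.geoid10
    )
    .drop(['geoid10', 'globalid'], axis='columns')
)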
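The lambda above relies on every `globalid` being wrapped in exactly one pair of enclosing characters. Judging from the `building_id` values that appear later (e.g. `BF7B2D3E_9565_4BDB_B3CB_70B7A6CD9B56`), these are probably braced GUIDs. The hypothetical helper below makes that assumption explicit and uses `str.strip("{}")` instead of slicing, so a value without braces is not silently truncated.

```python
# Hypothetical helper mirroring the lambda in the cell above.
# Assumes globalid values are braced GUIDs such as "{BF7B2D3E-9565-...}",
# inferred from the building_id values shown later in this notebook.
def normalize_globalid(value: str) -> str:
    """Drop surrounding braces and swap hyphens for underscores."""
    # strip("{}") only removes brace characters, so unbraced values keep their first/last chars
    return value.strip("{}").replace("-", "_")


print(normalize_globalid("{BF7B2D3E-9565-4BDB-B3CB-70B7A6CD9B56}"))
```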

In [12]:
df = pd.read_excel("../../data/Month 1 Open Source Data.xlsx")

# Build point geometries from the longitude (x) and latitude (y) columns, in WGS84
trash = gpd.GeoDataFrame(
    df, geometry=gpd.points_from_xy(df['long'], df['lat']), crs="EPSG:4326"
)
del df

To keep the matching work manageable, we restrict the analysis to the blocks and buildings roughly along the survey zone, selected with a bounding box around the zone padded by 0.001 degrees.

In [28]:
blocks = blocks.rename(columns={'geoid10': 'block_id'})

In [31]:
from shapely.geometry import box

# Bounding box around the survey zone (min lon, min lat, max lon, max lat),
# padded by 0.001 degrees on every side
selection_area = box(
    -122.422836714479, 37.7849452051136,
    -122.419309755901, 37.7998665826739,
).buffer(0.001)

blocks_of_interest = sm.select_area_of_interest(blocks, selection_area)
buildings_of_interest = sm.select_area_of_interest(buildings, selection_area)
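Because the selection area is in longitude/latitude, `buffer(0.001)` pads the box by 0.001 degrees rather than by a fixed distance. The back-of-the-envelope calculation below assumes a spherical Earth at roughly 111.32 km per degree of latitude. Under that assumption the padding is about 111 m north-south and about 88 m east-west at this latitude, around a box roughly 310 m wide and 1.66 km tall.

```python
import math

# Approximate ground distance covered by 0.001 degrees near the selection area.
# Spherical-Earth approximation: fine for intuition, not for precise measurement.
KM_PER_DEGREE = 111.32
BUFFER_DEG = 0.001
LATITUDE = (37.7849452051136 + 37.7998665826739) / 2  # centre of the box

# One degree of longitude shrinks by cos(latitude) relative to one degree of latitude.
lon_scale = math.cos(math.radians(LATITUDE))

lat_m = BUFFER_DEG * KM_PER_DEGREE * 1000
lon_m = BUFFER_DEG * KM_PER_DEGREE * 1000 * lon_scale

# Size of the unbuffered bounding box, for comparison.
width_m = (-122.419309755901 - -122.422836714479) * KM_PER_DEGREE * 1000 * lon_scale
height_m = (37.7998665826739 - 37.7849452051136) * KM_PER_DEGREE * 1000

print(f"buffer: ~{lat_m:.0f} m north-south, ~{lon_m:.0f} m east-west")
print(f"box: ~{width_m:.0f} m wide, ~{height_m:.0f} m tall")
```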

In [33]:
%%time

frontages_of_interest = sm.frontages_on_blockfaces(
    blocks_of_interest, blockfaces, buildings_of_interest,
    blocks_uid_col='block_id',
    buildings_uid_col='building_id',
    blockfaces_block_uid_col='block_id', 
    buildings_block_uid_col='block_id'
)

100%|██████████| 71/71 [00:09<00:00,  8.53it/s]


CPU times: user 10.2 s, sys: 106 ms, total: 10.3 s
Wall time: 10.3 s


In [35]:
trash_of_interest = sm.points_on_frontages(
    trash.head(100), frontages_of_interest
)

In [36]:
trash_of_interest.head()

|   | itemsTagged | likes | street | type | time | userPrimaryCommunityName | userCity | userCityDistrict | userState | userZipCode | lat | long | totalNumberOfItemsTagged | pickedUp | geometry | frontage_id | block_id | building_id | blockface_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | Filbert St | tobacco | 9/18/2018, 12:45:07 AM | Russian Hill | San Francisco | San Francisco County | California | 94109 | 37.799602 | -122.422209 | 1 | True | POINT (-122.422208928493 37.7996024350634) | BF7B2D3E_9565_4BDB_B3CB_70B7A6CD9B56_0 | 60750102002002 | BF7B2D3E_9565_4BDB_B3CB_70B7A6CD9B56 | 060750102002002_1 |
| 1 | 1 | 0 | Filbert St | other | 9/18/2018, 12:45:05 AM | Russian Hill | San Francisco | San Francisco County | California | 94109 | 37.799808 | -122.422168 | 1 | True | POINT (-122.422168068763 37.7998081601673) | C84A3FBA_F146_4A30_A2D7_FA9892831308_1 | 60750109003000 | C84A3FBA_F146_4A30_A2D7_FA9892831308 | 060750109003000_3 |
| 2 | 1 | 0 | Filbert St | other | 9/18/2018, 12:45:04 AM | Russian Hill | San Francisco | San Francisco County | California | 94109 | 37.799744 | -122.422153 | 1 | True | POINT (-122.422152552981 37.7997437724664) | BF7B2D3E_9565_4BDB_B3CB_70B7A6CD9B56_0 | 60750102002002 | BF7B2D3E_9565_4BDB_B3CB_70B7A6CD9B56 | 060750102002002_1 |
| 3 | 1 | 0 | Filbert St | other | 9/18/2018, 12:45:03 AM | Russian Hill | San Francisco | San Francisco County | California | 94109 | 37.799807 | -122.422091 | 1 | True | POINT (-122.422090792739 37.7998066128486) | F9B7661F_3478_47E1_8B6D_D3AF74B95225_0 | 60750102002002 | F9B7661F_3478_47E1_8B6D_D3AF74B95225 | 060750102002002_1 |
| 4 | 1 | 0 | Polk St | tobacco | 9/18/2018, 12:42:13 AM | Russian Hill | San Francisco | San Francisco County | California | 94109 | 37.799505 | -122.422302 | 1 | True | POINT (-122.422302477379 37.799504567795) | 48D0D3C4_BBA4_4FB0_9BB0_9F03D0A1117D_0 | 60750109003000 | 48D0D3C4_BBA4_4FB0_9BB0_9F03D0A1117D | 060750109003000_2 |
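Judging by the output, each trash point ends up with exactly one `frontage_id`, plus the matching block, building, and blockface. We have not checked how `sm.points_on_frontages` picks that frontage. A common approach is nearest-geometry assignment, sketched below in plain Python under that assumption. The helper names and toy frontages are hypothetical, and the distances are planar, so on raw longitude/latitude they are only a rough proxy for ground distance.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float]
Segment = Tuple[Coord, Coord]


def point_segment_distance(p: Coord, seg: Segment) -> float:
    """Planar distance from point p to the closest point on segment seg."""
    (x1, y1), (x2, y2) = seg
    px, py = p
    dx, dy = x2 - x1, y2 - y1
    length_sq = dx * dx + dy * dy
    if length_sq == 0:  # degenerate segment: both endpoints coincide
        return math.hypot(px - x1, py - y1)
    # Project p onto the segment's line, then clamp to the segment's extent.
    t = max(0.0, min(1.0, ((px - x1) * dx + (py - y1) * dy) / length_sq))
    return math.hypot(px - (x1 + t * dx), py - (y1 + t * dy))


def nearest_frontage(p: Coord, frontages: Dict[str, List[Segment]]) -> str:
    """Return the id of the frontage whose closest segment is nearest to p."""
    return min(
        frontages,
        key=lambda fid: min(point_segment_distance(p, s) for s in frontages[fid]),
    )


# Toy frontages with hypothetical ids and coordinates (not real data).
frontages = {
    "north_0": [((0.0, 1.0), (1.0, 1.0))],
    "south_0": [((0.0, 0.0), (1.0, 0.0))],
}
print(nearest_frontage((0.5, 0.8), frontages))
```

Checking every frontage for every point is quadratic; for the full trash dataset a spatial index would be the natural next step.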


Unfortunately, the best land use data available is thin on detail: every record falls into one of about a dozen coarse categories.

import geopandas as gpd
land_use = gpd.read_file('../../data/Land Use.geojson')

In [39]:
land_use['landuse'].value_counts()

RESIDENT        115169
MIXRES           22826
VACANT            4593
RETAIL/ENT        2726
PDR               2090
MISSING DATA      1985
MIXED             1939
CIE               1367
MIPS              1352
OpenSpace          932
VISITOR            273
MED                213
Right of Way         3
Name: landuse, dtype: int64
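To put a number on how coarse this is, the tally below reuses the counts printed above (no new data). It shows 13 categories in all, with RESIDENT and MIXRES alone covering roughly 89% of records. For most of the city, the data cannot distinguish building types beyond "residential".

```python
from collections import Counter

# Counts copied from the value_counts() output above.
land_use_counts = Counter({
    "RESIDENT": 115169, "MIXRES": 22826, "VACANT": 4593, "RETAIL/ENT": 2726,
    "PDR": 2090, "MISSING DATA": 1985, "MIXED": 1939, "CIE": 1367,
    "MIPS": 1352, "OpenSpace": 932, "VISITOR": 273, "MED": 213,
    "Right of Way": 3,
})

total = sum(land_use_counts.values())
residential = land_use_counts["RESIDENT"] + land_use_counts["MIXRES"]

print(f"{len(land_use_counts)} categories, {total} records")
print(f"RESIDENT + MIXRES: {residential / total:.1%}")
```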

(See also the data sources available through the [Property Information Map](https://sfplanninggis.org/pim/) tool, documented [here](https://sfplanninggis.org/pim/help.html), and the SF Public Works datasets [here](https://bsm.sfdpw.org/datasf/).)

## Ontology

Since the survey zones are small, manually tagging their buildings should take a few person-hours at most, using Google Street View or, better yet, a walk through the neighborhood. This would let us build a more useful, descriptive ontology of buildings than the land use categories above provide.
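One way hand-assigned tags could plug into the pipeline is to store them in a lookup keyed by `building_id`, then tally matched trash per tag. The sketch below assumes that design. The name `manual_tags`, the building ids, and the tags are all invented placeholders, not survey results.

```python
from collections import Counter

# Hypothetical hand-assigned tags keyed by building_id (placeholder ids and tags).
manual_tags = {
    "bldg_a": "corner store",
    "bldg_b": "apartments",
    "bldg_c": "restaurant",
}

# building_id of each matched trash point (placeholder values; "bldg_d" has no tag yet).
matched_building_ids = ["bldg_a", "bldg_b", "bldg_a", "bldg_d", "bldg_c"]

# Count trash items per tag; buildings nobody has tagged fall into "untagged".
trash_per_tag = Counter(manual_tags.get(b, "untagged") for b in matched_building_ids)
print(trash_per_tag.most_common())
```

Within the notebook itself, the equivalent lookup would presumably be `trash_of_interest['building_id'].map(manual_tags)`, which leaves `NaN` for untagged buildings.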