# Clean and Explore SF Permit Data

In [1]:
import pandas as pd
import os

In [22]:
sf_building_permits = pd.read_csv("./data/raw_data/sf_permits.csv")
sf_building_permits.shape

  has_raised = await self.run_ast_nodes(code_ast.body, cell_name,


## Clean

### Format date columns

In [3]:
date_cols = [c for c in sf_building_permits.columns if 'Date' in c]
date_cols

['Permit Creation Date',
 'Current Status Date',
 'Filed Date',
 'Issued Date',
 'Completed Date',
 'First Construction Document Date']

In [4]:
sf_building_permits[date_cols] = sf_building_permits[date_cols].apply(pd.to_datetime)

### Add Columns

In [5]:
sf_building_permits['apn'] = sf_building_permits['Block'] + '/' + sf_building_permits['Lot']
sf_building_permits['new_units'] = sf_building_permits['Proposed Units'].fillna(0) - sf_building_permits['Existing Units'].fillna(0)

## Explore Dataset 

### Analyze Permit Types and Proposed Uses

In [6]:
sf_building_permits[[
    'Permit Type', 'Permit Type Definition'
]].drop_duplicates().sort_values('Permit Type')

Unnamed: 0,Permit Type,Permit Type Definition
1297,1,new construction
230,2,new construction wood frame
58,3,additions alterations or repairs
5,4,sign - erect
3101,5,grade or quarry or fill or excavate
1289,6,demolitions
53,7,wall or painted sign
0,8,otc alterations permit


I think it's overly restrictive to only look at permit type 1 and 2. Instead, we can find increased density by find the difference between proposed units and existing units. In this dataset, 12,981 permits added units:

In [7]:
sum(sf_building_permits['new_units'] > 0)

12295

Units are added for homes seeking permits of **type 1, 2, 3, and 8.**

In [8]:
sf_building_permits.query('new_units > 0')['Permit Type'].value_counts()

3    5346
8    5345
2    1143
1     461
Name: Permit Type, dtype: int64

Thus, looking at just permit types 1 and 2 leads to an undercount. Permit type 3 includes "additions, alterations, or repairs," and additions can count towards RHNA. See for instance the first permit below which added an ADU:

In [18]:
[x for x in sf_building_permits[sf_building_permits['Permit Type'] == 3].Description.sample(3, random_state=99)]

['horizontal addition at 1st fl at rear yard & side yards. add 3 (n) dwelling units in (e) 1st fl. garage & storage space. each (n) unit to be studio w/ 1(n) kitchen & 1(n) bath.',
 'renovation and upgrade of an extg gamewell/fci e3 fire alarm system. ref app#200808149124. n/a for maher ordinance.',
 'provide&install 1 ea 750 kw 480 v 3 phase 4 wire at 1/fl.provide&install 1 ea 1000 kw 480 v 3 phase 4 wire at 1/fl. fuel oil was submitted on 3/14/14 appl 2014-03-14-0753. both generators to be installed in (e) generator room. mech pumbing under ap#2013-1219-4661.']

Permit type 8 is "over the counter" permits, meaning less bureaucracy. They can add units too.

In [60]:
permit_8_sample = sf_building_permits.query('new_units > 0 and `Permit Type` == 8').sample(5, random_state=8).Description
for p in permit_8_sample:
    print(p)

revision to approved plans pa# 2015-1103-1542: relocate unit #3 open space to the roof. reconfigure the p.l. foundation detail due to neighbor's foundation & field conditions. interior layout reconfigure 1st & 4th floor
tenant improvement for temp office & support space. scope includes arch, mep & telecom. interior work only.
deferred submittal of roof truss framing & associated calculations at above address with pa 201312315318 (type b,elev.a) by reference to pa 201407100858 master permit plan,& pa 201403100318 master bldg roof truss plan & calculation. this is an identical reuse of the master bldg & truss permit plans.
revision to pa #201602169672 - revised interior layout of new units. no change to (e) exterior or room count. structural work under separate permit.
nan


One thing that's odd about permits of type 8 is that they often purport to add units without the description mentioning it; sometimes the description just mentions fire springklers. My suspicion is that builders are requesting permit type 1 or 2 and coming back mid-project to get a permit type 8 to put in sprinklers and other finishing touches.

### Find uses that count towards RHNA

In [61]:
sf_building_permits.loc[lambda x: x['Permit Type'].isin([1, 2, 3, 8])]['Proposed Use'].value_counts().index

Index(['1 family dwelling', 'apartments', 'office', '2 family dwelling',
       'retail sales', 'food/beverage hndlng', 'tourist hotel/motel',
       'residential hotel', 'school', 'clinics-medic/dental',
       'warehouse,no frnitur', 'manufacturing', 'artist live/work', 'church',
       'vacant lot', 'health studios & gym', 'barber/beauty salon',
       'lending institution', 'recreation bldg', 'auto repairs',
       'workshop commercial', 'public assmbly other', 'prkng garage/public',
       'theater', 'misc group residns.', 'museum', 'sfpd or sffd station',
       'club', 'prkng garage/private', 'filling/service stn',
       'warehouse, furniture', 'massage parlor', 'day care home gt 12',
       'antenna', 'storage shed', 'parking lot', 'day care, non-res',
       'power plant', 'greenhouse', 'laundry/laundromat', 'automobile sales',
       'animal sale or care', 'nite club', 'dry cleaners', 'day care center',
       'social care facility', 'wholesale sales', 'phone xchnge/equip',


##### Note to self: check that the uses below actually count towards RHNA

In [82]:
relevant_uses = ['apartments', '1 family dwelling', '2 family dwelling', 
                 'residential hotel', 'misc group residns.', 'artist live/work', 
                 'convalescent home', 'accessory cottage', 'nursing home non amb',
                'orphanage', 'r-3(dwg) nursing', 'nursing home gt 6']

### Filter for permits that count towards RHNA

In [105]:
sf_all_construction = sf_building_permits[
    sf_building_permits['new_units'] > 0
    & sf_building_permits['Proposed Use'].isin(relevant_uses)
    & sf_building_permits['Permit Type'].isin([1, 2, 3, 8])
]

In [106]:
pd.set_option('max_columns', 100)

In [107]:
sf_all_construction.shape

(13027, 54)

#### Most lots in this dataset are included multiple times.

That's because, on average, each parcel requested 2.4 permits.

In [108]:
len(sf_all_construction.apn) / len(sf_all_construction.apn.unique())

2.415986646884273

One parcel has 40 permits.

In [109]:
sf_all_construction.apn.value_counts().head(10)

0843/016     40
2347/004G    39
3783/001     35
5457/037     34
0331/028     27
1193/048     27
7331/005     26
1101/020     23
4624/031     22
3738/004     22
Name: apn, dtype: int64

The parcel with the most permits recieved one permit for each street number. Based on google maps, it looks like these actually were separate constructions.

In [110]:
sf_all_construction[sf_all_construction.apn == '0843/016'].head()

Unnamed: 0,Permit Number,Permit Type,Permit Type Definition,Permit Creation Date,Block,Lot,Street Number,Street Number Suffix,Street Name,Street Suffix,Unit,Unit Suffix,Description,Current Status,Current Status Date,Filed Date,Issued Date,Completed Date,First Construction Document Date,Structural Notification,Number of Existing Stories,Number of Proposed Stories,Voluntary Soft-Story Retrofit,Fire Only Permit,Permit Expiration Date,Estimated Cost,Revised Cost,Existing Use,Existing Units,Proposed Use,Proposed Units,Plansets,TIDF Compliance,Existing Construction Type,Existing Construction Type Description,Proposed Construction Type,Proposed Construction Type Description,Site Permit,Supervisor District,Neighborhoods - Analysis Boundaries,Zipcode,Location,Record ID,SF Find Neighborhoods,Current Police Districts,Current Supervisor Districts,Analysis Neighborhoods,DELETE - Zip Codes,DELETE - Fire Prevention Districts,DELETE - Supervisor Districts,DELETE - Current Police Districts,DELETE - Supervisorial_Districts_Waterline_data_from_7pkg_wer3,apn,new_units
91787,201305217457,2,new construction wood frame,2013-05-21,843,16,680,,Page,St,,,"erect 4-story, type 5, 0 basement, 3 dwelling ...",complete,2016-12-30,2013-05-21,2015-06-25,2016-12-30,2015-08-31,,,4.0,,,2018-06-09,650000.0,945000.0,,,apartments,3.0,2.0,,,,5.0,wood frame (5),Y,5.0,Hayes Valley,94117.0,"(37.77306436257559, -122.43203185256074)",1385078504283,26.0,4.0,11.0,9.0,29492.0,15.0,11.0,6.0,10.0,0843/016,3.0
91788,201305217457,2,new construction wood frame,2013-05-21,843,16,682,,Page,St,,,"erect 4-story, type 5, 0 basement, 3 dwelling ...",complete,2016-12-30,2013-05-21,2015-06-25,2016-12-30,2015-08-31,,,4.0,,,2018-06-09,650000.0,945000.0,,,apartments,3.0,2.0,,,,5.0,wood frame (5),Y,5.0,Hayes Valley,94117.0,"(37.77306436257559, -122.43203185256074)",1385079504274,26.0,4.0,11.0,9.0,29492.0,15.0,11.0,6.0,10.0,0843/016,3.0
91789,201305217457,2,new construction wood frame,2013-05-21,843,16,684,,Page,St,,,"erect 4-story, type 5, 0 basement, 3 dwelling ...",complete,2016-12-30,2013-05-21,2015-06-25,2016-12-30,2015-08-31,,,4.0,,,2018-06-09,650000.0,945000.0,,,apartments,3.0,2.0,,,,5.0,wood frame (5),Y,5.0,Hayes Valley,94117.0,"(37.77306436257559, -122.43203185256074)",1385080504273,26.0,4.0,11.0,9.0,29492.0,15.0,11.0,6.0,10.0,0843/016,3.0
91803,201305217463,2,new construction wood frame,2013-05-21,843,16,692,,Page,St,,,"erect 4-story, type 5, 0 basement, 3 dwelling ...",complete,2017-03-07,2013-05-21,2015-06-25,2017-03-07,2015-08-31,,,4.0,,,2018-06-09,650000.0,945000.0,,,apartments,3.0,2.0,,,,5.0,wood frame (5),Y,5.0,Hayes Valley,94117.0,"(37.77306436257559, -122.43203185256074)",1385094504280,26.0,4.0,11.0,9.0,29492.0,15.0,11.0,6.0,10.0,0843/016,3.0
91805,201305217463,2,new construction wood frame,2013-05-21,843,16,694,,Page,St,,,"erect 4-story, type 5, 0 basement, 3 dwelling ...",complete,2017-03-07,2013-05-21,2015-06-25,2017-03-07,2015-08-31,,,4.0,,,2018-06-09,650000.0,945000.0,,,apartments,3.0,2.0,,,,5.0,wood frame (5),Y,5.0,Hayes Valley,94117.0,"(37.77306436257559, -122.43203185256074)",1385096504279,26.0,4.0,11.0,9.0,29492.0,15.0,11.0,6.0,10.0,0843/016,3.0


In [128]:
sf_all_construction.groupby(['apn', 'Street Number']).size().sort_values(ascending=False).head()

apn        Street Number
3783/001   801              21
4991/276   725              20
4172/022   1201             16
4591D/131  1                15
3733/008   250              15
dtype: int64

In [130]:
sf_all_construction.query("apn == '3783/001' and `Street Number` == 801").head()

Unnamed: 0,Permit Number,Permit Type,Permit Type Definition,Permit Creation Date,Block,Lot,Street Number,Street Number Suffix,Street Name,Street Suffix,Unit,Unit Suffix,Description,Current Status,Current Status Date,Filed Date,Issued Date,Completed Date,First Construction Document Date,Structural Notification,Number of Existing Stories,Number of Proposed Stories,Voluntary Soft-Story Retrofit,Fire Only Permit,Permit Expiration Date,Estimated Cost,Revised Cost,Existing Use,Existing Units,Proposed Use,Proposed Units,Plansets,TIDF Compliance,Existing Construction Type,Existing Construction Type Description,Proposed Construction Type,Proposed Construction Type Description,Site Permit,Supervisor District,Neighborhoods - Analysis Boundaries,Zipcode,Location,Record ID,SF Find Neighborhoods,Current Police Districts,Current Supervisor Districts,Analysis Neighborhoods,DELETE - Zip Codes,DELETE - Fire Prevention Districts,DELETE - Supervisor Districts,DELETE - Current Police Districts,DELETE - Supervisorial_Districts_Waterline_data_from_7pkg_wer3,apn,new_units
25631,201309045886,1,new construction,2013-09-04,3783,1,801,,Brannan,St,,,"to erect 6 stories, no basement, 434 dwelling ...",complete,2018-12-07,2013-09-04,2014-09-26,2018-12-07,2015-04-20,,,6.0,,,2020-08-25,112000000.0,127060484.0,,,apartments,434.0,2.0,,,,1.0,constr type 1,Y,6.0,South of Market,94103.0,"(37.771824392588535, -122.40388337311785)",1316337167598,33.0,1.0,10.0,34.0,28853.0,14.0,9.0,2.0,3.0,3783/001,434.0
87493,201505065510,3,additions alterations or repairs,2015-05-06,3783,1,801,,Brannan,St,,,revision to s1. revised foundation pile plans ...,complete,2018-12-03,2015-05-06,2015-08-12,2018-12-03,NaT,,,6.0,,,2018-07-27,150000.0,150000.0,vacant lot,,apartments,434.0,2.0,,,,1.0,constr type 1,,6.0,South of Market,94103.0,"(37.771824392588535, -122.40388337311785)",1380603167598,33.0,1.0,10.0,34.0,28853.0,14.0,9.0,2.0,3.0,3783/001,434.0
99357,201508214980,3,additions alterations or repairs,2015-08-21,3783,1,801,,Brannan,St,,,rev. to addendum 1 ref pa#201309045886. added ...,complete,2017-05-05,2015-08-21,2015-09-08,2017-05-05,NaT,,,6.0,,,2019-08-18,4500000.0,4500000.0,,,apartments,434.0,2.0,,,,1.0,constr type 1,,6.0,South of Market,94103.0,"(37.771824392588535, -122.40388337311785)",1393031167598,33.0,1.0,10.0,34.0,28853.0,14.0,9.0,2.0,3.0,3783/001,434.0
110758,201512013798,8,otc alterations permit,2015-12-01,3783,1,801,,Brannan,St,,,install temporary sprinkler monitors per sffd ...,complete,2018-12-06,2015-12-01,2016-01-14,2018-12-06,NaT,,,6.0,,Y,2017-01-08,12000.0,12000.0,vacant lot,,apartments,434.0,2.0,,,,1.0,constr type 1,,6.0,South of Market,94103.0,"(37.771824392588535, -122.40388337311785)",1404893363268,33.0,1.0,10.0,34.0,28853.0,14.0,9.0,2.0,3.0,3783/001,434.0
112657,201512175338,3,additions alterations or repairs,2015-12-17,3783,1,801,,Brannan,St,,,revision to pa# 2013/09/04/5886 s-2. added one...,complete,2018-12-03,2015-12-17,2016-07-27,2018-12-03,NaT,,,6.0,,,2020-07-06,5700000.0,5700000.0,vacant lot,,apartments,434.0,2.0,,,,1.0,constr type 1,,6.0,South of Market,94103.0,"(37.771824392588535, -122.40388337311785)",1406850167598,33.0,1.0,10.0,34.0,28853.0,14.0,9.0,2.0,3.0,3783/001,434.0


This implies that **apn** is not a unique identifier.

In [131]:
sf_all_construction.query("apn == '0843/016' and `Street Number` == 680").head()

Unnamed: 0,Permit Number,Permit Type,Permit Type Definition,Permit Creation Date,Block,Lot,Street Number,Street Number Suffix,Street Name,Street Suffix,Unit,Unit Suffix,Description,Current Status,Current Status Date,Filed Date,Issued Date,Completed Date,First Construction Document Date,Structural Notification,Number of Existing Stories,Number of Proposed Stories,Voluntary Soft-Story Retrofit,Fire Only Permit,Permit Expiration Date,Estimated Cost,Revised Cost,Existing Use,Existing Units,Proposed Use,Proposed Units,Plansets,TIDF Compliance,Existing Construction Type,Existing Construction Type Description,Proposed Construction Type,Proposed Construction Type Description,Site Permit,Supervisor District,Neighborhoods - Analysis Boundaries,Zipcode,Location,Record ID,SF Find Neighborhoods,Current Police Districts,Current Supervisor Districts,Analysis Neighborhoods,DELETE - Zip Codes,DELETE - Fire Prevention Districts,DELETE - Supervisor Districts,DELETE - Current Police Districts,DELETE - Supervisorial_Districts_Waterline_data_from_7pkg_wer3,apn,new_units
91787,201305217457,2,new construction wood frame,2013-05-21,843,16,680,,Page,St,,,"erect 4-story, type 5, 0 basement, 3 dwelling ...",complete,2016-12-30,2013-05-21,2015-06-25,2016-12-30,2015-08-31,,,4.0,,,2018-06-09,650000.0,945000.0,,,apartments,3.0,2.0,,,,5.0,wood frame (5),Y,5.0,Hayes Valley,94117.0,"(37.77306436257559, -122.43203185256074)",1385078504283,26.0,4.0,11.0,9.0,29492.0,15.0,11.0,6.0,10.0,0843/016,3.0
124289,201604074242,8,otc alterations permit,2016-04-07,843,16,680,,Page,St,,,install new fire sprinkler system per nfpa 13 ...,complete,2016-12-21,2016-04-07,2016-05-23,2016-12-21,NaT,,,4.0,,Y,2017-05-18,50000.0,80000.0,vacant lot,,apartments,3.0,2.0,,,,5.0,wood frame (5),,5.0,Hayes Valley,94117.0,"(37.77306436257559, -122.43203185256074)",1418899504283,26.0,4.0,11.0,9.0,29492.0,15.0,11.0,6.0,10.0,0843/016,3.0
127662,201605066826,8,otc alterations permit,2016-05-06,843,16,680,,Page,St,,,installation of manual and automatice fire ala...,complete,2016-12-21,2016-05-06,2016-05-11,2016-12-21,NaT,,,4.0,,Y,2017-05-06,8000.0,15000.0,vacant lot,0.0,apartments,3.0,2.0,,,,5.0,wood frame (5),,5.0,Hayes Valley,94117.0,"(37.77306436257559, -122.43203185256074)",1422384504283,26.0,4.0,11.0,9.0,29492.0,15.0,11.0,6.0,10.0,0843/016,3.0


In [132]:
sf_all_construction.sort_values(by="Permit Type", axis=0, inplace=True)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  sf_all_construction.sort_values(by="Permit Type", axis=0, inplace=True)


In [133]:
sf_all_construction = sf_all_construction.drop_duplicates(['apn', 'new_units', 'Proposed Units', 'Street Number'])

In [134]:
sf_all_construction.shape

(7480, 54)

In [135]:
len(sf_all_construction.apn.drop_duplicates()) / len(sf_all_construction.apn)

0.720855614973262

They span multiple permit types, and I assume some lots requesting permit type 1 or 2 will get a follow up permit.

In [136]:
sf_all_construction[sf_all_construction.apn.duplicated()]['Permit Type'].value_counts()

3    1233
8     521
2     241
1      93
Name: Permit Type, dtype: int64

In [141]:
sf_construction_post_2015 = sf_all_construction[
    (sf_all_construction['Issued Date'] >= '2015-01-01')
]

In [144]:
if not os.path.isdir('./clean_data'):
    os.mkdir('./clean_data')
    
sf_all_construction.to_csv('./clean_data/sf_all_construction.csv', index=False)
sf_construction_post_2015.to_csv('./clean_data/sf_construction_post_2015.csv', index=False)