# [DataKind Data Dive](https://datadive.datakind.org/), Housing Insecurity Project

Written by Laura Prichard, 18 September 2021

Notebook contains:
- Exploratory Data Analysis on the National Housing Preservation Database (NHPD)
- Some data cleaning to homogenize column formats and variables
- Exploring the following question:
    - **Does Florida reserve more units for certain protected groups compared to other locations?**


- **Conclusions:**
    - For the largest tenant groups (Family, Elderly, Elderly Or Disabled, Disabled, Mixed), Florida broadly tracks the trends seen in the other Southeastern US states and in the rest of the USA (excluding the Southeastern states and US territories).
    - For the smaller groups (bearing in mind small-number statistics), Florida deviates slightly:
        - Florida shows no properties allocated for Group Homes
        - Florida has the highest number of properties allocated for Homeless people relative to the rest of the USA
        - Florida shows no specific housing allocation for Veterans or Low Income housing
    - These effects could be due to differences in the classifications between states.
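
To give a feel for the small-number caveat above, the sketch below estimates the fractional uncertainty of a count. It assumes simple Poisson counting noise, which is an illustrative assumption and not something tested in this notebook; `poisson_relative_error` is a hypothetical helper. The example counts are ones that appear in the outputs further down (1, 11, 30 and 1387 properties).

```python
import math

def poisson_relative_error(count):
    """Approximate fractional uncertainty of a count, assuming Poisson noise (sqrt(n) / n)."""
    if count <= 0:
        return float('inf')  # no information in an empty category
    return 1.0 / math.sqrt(count)

# Counts taken from the tenant-type tables in this notebook
for n in (1, 11, 30, 1387):
    print(n, round(poisson_relative_error(n), 3))
```

Under this assumption, categories with only a handful of properties carry uncertainties of tens of percent or more, so differences between regions in those categories should be read cautiously.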


[**Housing Insecurity Project**](https://docs.google.com/document/d/1ovSvMK39wO6NXqCrH0chQL7aRHR6Lr0vQIzXhmUEBFk/edit) | [Project GitHub](https://github.com/datakind/sep21-housing-insecurity) | [Tasks](https://docs.google.com/spreadsheets/d/1H4KZ31jKkhyBYXAlYm_Bw-RjTNkVoeskye0GwtbPuvM/edit?pli=1#gid=0) | [Data Folder](https://drive.google.com/drive/folders/19B0xzeRyozYJDxwXKlGIPFe3Qnc3nfux)

*Research Question 5 (NHPD): Explore the National Housing Preservation Database, generating summary statistics and maps pertaining to low income housing in Florida. How does Florida compare to the rest of the Southeast, and how does it compare to the rest of the United States as a whole? (A few example questions to get you started: **Does Florida reserve more units for certain protected groups compared to other locations?** Are Florida units more likely to have received violations upon inspection? How does the proportion of units in each building reserved for low-income housing compare to other locations? Does fair market rent have any relation to the number of units available?)*

*Task 2: Open ended EDA 1 - generate numerical summaries, distribution plots, correlation matrices, etc for the columns in this dataset. What is surprising? What is not surprising?*


In [1]:
import numpy as np 
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
%matplotlib inline

In [2]:
# Load in data
df = pd.read_excel('./data/National_Housing_Preservation_Database/Active_and_Inconclusive_Properties.xlsx')
display(df.head())
display(df.info())

Unnamed: 0,NHPDPropertyID,PropertyName,PropertyAddress,City,State,Zip,CBSACode,CBSAType,County,CountyCode,...,NumberActiveMR,NumberInconclusiveMR,NumberInactiveMR,Mr_1_Status,Mr_1_ProgramName,Mr_1_AssistedUnits,Mr_2_Status,Mr_2_ProgramName,Mr_2_AssistedUnits,OldNHPDPropertyID
0,1000000,IVY ESTATES,6729 Zeigler Blvd,Mobile,AL,36608-4253,33660.0,Metropolitan Statistical Area,Mobile,1097.0,...,0,0,0,,,,,,,
1,1000001,RENDU TERRACE WEST,7400 Old Shell Rd,Mobile,AL,36608-4549,33660.0,Metropolitan Statistical Area,Mobile,1097.0,...,0,0,0,,,,,,,
2,1000002,TWB RESIDENTIAL OPPORTUNITIES II,93 Canal Rd,Port Jefferson Station,NY,11776-3024,35620.0,Metropolitan,Suffolk,36103.0,...,0,0,0,,,,,,,
3,1000003,THE DAISY HOUSE,615 Clarissa St,Rochester,NY,14608-2485,40380.0,Metropolitan,Monroe,36055.0,...,0,0,0,,,,,,,
4,1000004,MAIN AVENUE APARTMENTS,105 E Walnut St,Sylacauga,AL,35150-3012,45180.0,Micropolitan Statistical Area,Talladega,1121.0,...,0,0,0,,,,,,,


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 82287 entries, 0 to 82286
Columns: 252 entries, NHPDPropertyID to OldNHPDPropertyID
dtypes: datetime64[ns](42), float64(73), int64(41), object(96)
memory usage: 158.2+ MB


None

In [3]:
display(df.describe())

Unnamed: 0,NHPDPropertyID,CBSACode,CountyCode,CensusTract,Latitude,Longitude,ActiveSubsidies,TotalInconclusiveSubsidies,TotalInactiveSubsidies,TotalUnits,...,NumberInconclusivePBV,NumberInactivePBV,Pbv_1_AssistedUnits,Pbv_2_AssistedUnits,NumberActiveMR,NumberInconclusiveMR,NumberInactiveMR,Mr_1_AssistedUnits,Mr_2_AssistedUnits,OldNHPDPropertyID
count,82287.0,72919.0,82229.0,82224.0,82287.0,82287.0,82287.0,82287.0,82287.0,82287.0,...,82287.0,82287.0,2784.0,173.0,82287.0,82287.0,82287.0,500.0,9.0,58370.0
mean,1074656.0,30447.401281,28953.566759,28952170000.0,38.483402,-90.228069,1.386355,0.077485,0.369098,66.711145,...,0.0,0.0,43.463003,35.011561,0.006234,0.0,0.0,34.208,26.333333,52459.141083
std,40216.2,11096.935896,15256.967835,15254660000.0,4.975471,15.636717,0.895868,0.287468,0.734841,96.200003,...,0.0,0.0,41.703263,29.06228,0.081741,0.0,0.0,25.840049,19.68502,33880.224028
min,1000000.0,10100.0,1001.0,1001020000.0,13.49503,-166.722478,0.0,0.0,0.0,1.0,...,0.0,0.0,11.0,11.0,0.0,0.0,0.0,11.0,12.0,4.0
25%,1039279.0,19740.0,17053.0,17053960000.0,34.983064,-96.380764,1.0,0.0,0.0,18.0,...,0.0,0.0,17.0,14.0,0.0,0.0,0.0,16.0,13.0,24226.25
50%,1073499.0,32580.0,29095.0,29095010000.0,39.312214,-86.490946,1.0,0.0,0.0,40.0,...,0.0,0.0,29.0,24.0,0.0,0.0,0.0,25.0,19.0,49850.5
75%,1108144.0,39300.0,41015.0,41013950000.0,41.799999,-79.05225,2.0,0.0,1.0,82.0,...,0.0,0.0,53.0,46.0,0.0,0.0,0.0,43.0,22.0,78774.75
max,1163400.0,99999.0,69120.0,56045950000.0,65.160556,145.751129,106.0,13.0,24.0,5881.0,...,0.0,0.0,449.0,191.0,5.0,0.0,0.0,187.0,62.0,127185.0


In [11]:
# Investigate state names
states=df['State'].unique()
states.sort()
print(len(states))

# Get number of entries per state and sort
d1 = df[['State']].copy()  # explicit copy avoids pandas' SettingWithCopyWarning
d1['number'] = 1
d1 = d1.groupby('State', as_index=False).sum()
d1 = d1.sort_values('number', ascending=False)
d1


Unnamed: 0,State,number
4,CA,6492
35,NY,5658
44,TX,4047
36,OH,3664
28,NC,3037
9,FL,2987
39,PA,2890
14,IL,2763
24,MO,2569
19,MA,2556


In [12]:
display(df[df['State']=='WP'])  # This state should actually be GU, Tamuning in Guam
display(df[df['State']=='MP'])

Unnamed: 0,NHPDPropertyID,PropertyName,PropertyAddress,City,State,Zip,CBSACode,CBSAType,County,CountyCode,...,NumberActiveMR,NumberInconclusiveMR,NumberInactiveMR,Mr_1_Status,Mr_1_ProgramName,Mr_1_AssistedUnits,Mr_2_Status,Mr_2_ProgramName,Mr_2_AssistedUnits,OldNHPDPropertyID
70965,1118506,GHURA ELDERLY HOUSING,145 TRANKILUDAT ST,TAMUNING,WP,96913,,,,,...,0,0,0,,,,,,,


Unnamed: 0,NHPDPropertyID,PropertyName,PropertyAddress,City,State,Zip,CBSACode,CBSAType,County,CountyCode,...,NumberActiveMR,NumberInconclusiveMR,NumberInactiveMR,Mr_1_Status,Mr_1_ProgramName,Mr_1_AssistedUnits,Mr_2_Status,Mr_2_ProgramName,Mr_2_AssistedUnits,OldNHPDPropertyID
6943,1019578,ROTA,ROTA SECTION 8 SUB,ROTA,MP,96951,,,,69100.0,...,0,0,0,,,,,,,
6945,1019580,KOBLERVILLE,KOBLERVILLE S8 SUBDIVISION,SAIPAN,MP,96950,,,,69110.0,...,0,0,0,,,,,,,
6947,1019582,MIHAVILLE,MIHAVILLE S8 SUBDIVISION,SAIPAN,MP,96950,,,,69110.0,...,0,0,0,,,,,,,
6949,1019584,TINIAN,"TINIAN, SAN JOSE VILLAGE","SAN JOSE, TINIAN",MP,96952,,,,69120.0,...,0,0,0,,,,,,,
59630,1105443,SANDY BEACH HOMES,1 SAN ISIDRO AVE,SAIPAN,MP,96950,,,,,...,0,0,0,,,,,,,
77038,1130111,TASI HOMES,LOT 003 H 44 CHALAN HAGOI & BEACH RD,SAIPAN,MP,96950,,,,,...,0,0,0,,,,,,,
77039,1130112,SAIPAN COMFORT HOMES,CHALAN TUN ANTONIO,SAIPAN,MP,96950,,,,,...,0,0,0,,,,,,,
77040,1130113,BLUE WATER HOMES LLC,MIDDLE RD GUALO RAI,SAIPAN,MP,96950,,,,,...,0,0,0,,,,,,,
81647,1161606,ZEN HOMES,LOT 1877-R3 GUALO RAI RD GUALO RAI,SAIPAN,MP,96950,,,,,...,0,0,0,,,,,,,


In [None]:
# Investigate city names
cities=df['City'].unique()
cities.sort()
len(cities)

In [13]:
# Investigating uniformity of entries
display(df.CBSAType.value_counts())

display(df.CongressionalDistrict.value_counts())

display(df.PropertyStatus.value_counts())

display(df.OwnerType.value_counts())

display(df.ManagerType.value_counts())

display(df.TargetTenantType.value_counts())

display(df.S8_1_Status.value_counts())

Metropolitan Statistical Area    44927
Metropolitan                     15292
Micropolitan Statistical Area     5004
Micropolitan                      4929
Name: CBSAType, dtype: int64

01    9550
02    9383
03    7867
04    6882
07    5546
05    5486
06    4575
08    4067
09    2861
13    2671
00    2652
10    2040
12    1870
15    1717
11    1587
16    1066
14    1065
18     990
17     915
24     692
20     659
21     612
26     599
19     576
27     574
25     553
23     534
34     519
22     416
98     340
35     274
36     262
28     258
29     232
30     223
51     206
33     201
37     175
31     163
47     136
46     112
32     111
44     110
40     105
43      98
50      87
53      79
52      73
49      69
41      63
38      52
45      46
39      46
48      44
42      26
AL       1
Name: CongressionalDistrict, dtype: int64

Active          79525
Inconclusive     2762
Name: PropertyStatus, dtype: int64

Non-Profit          18900
For Profit          18832
Profit Motivated     7824
Public Entity        6344
Multiple             5796
Limited Profit       5795
Limited Dividend      688
Unknown                 3
Name: OwnerType, dtype: int64

Profit Motivated    13616
For Profit           9572
Non-Profit           8565
Public Entity        6482
Multiple             1075
Limited Dividend      199
Profit Seeking          1
Name: ManagerType, dtype: int64

Family                                24836
Elderly                               13738
Elderly or disabled                    5834
Disabled                               4334
Mixed                                  2403
Eldery or Disabled                      408
FAMILY                                  130
Congregate                               49
Group Home                               31
Homeless                                 30
Health Care                              28
Veterans                                 11
MIXED                                     7
Low Income                                6
Senior                                    5
Affordable                                2
Mixed;Link                                2
Indv. families - not eld/ handicap        2
Homeless Veterans                         1
Mixed Income                              1
Family & Elderly                          1
OTHER                                     1
ELDERLY                         

Active          21687
Inconclusive      777
Name: S8_1_Status, dtype: int64

**Data Cleaning**
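
Before collapsing labels by hand, it can help to list values that differ only by case or surrounding whitespace. The following is a minimal sketch of that check, written in plain Python so it runs without the spreadsheet; `find_label_variants` is a hypothetical helper and is not part of the cleaning code below. The example labels are a subset of the `TargetTenantType` values shown above.

```python
from collections import defaultdict

def find_label_variants(labels):
    """Group labels that become identical after trimming and case-folding."""
    groups = defaultdict(set)
    for label in labels:
        groups[label.strip().casefold()].add(label)
    # Keep only the normalized keys that have more than one spelling
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}

labels = ['Family', 'FAMILY', 'Mixed', 'MIXED', 'Elderly', 'Homeless']
variants = find_label_variants(labels)
print(variants)
```

A check like this only catches case and whitespace differences; misspellings such as "Eldery" still need an explicit mapping.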

In [17]:
# Homogenizing column formats
data = df.copy()  # DataFrame.copy() is deep by default, so the `copy` module is not needed
data['PropertyName'] = data['PropertyName'].str.upper()
data['PropertyAddress'] = data['PropertyAddress'].str.title()
data['City'] = data['City'].str.title()
data['County'] = data['County'].str.title()
data['Owner'] = data['Owner'].str.upper()
data['ManagerName'] = data['ManagerName'].str.upper()

# Replace state name for Guam (single .loc call avoids chained assignment)
data.loc[data['State'] == 'WP', 'State'] = 'GU'

# Target Tenant Type cleaning: map variant labels onto the main categories
tenant_type_fixes = {
    "Eldery or Disabled": "Elderly or Disabled",
    "Mixed;Link": "Mixed",
    "Indv. families - not eld/ handicap": "Family",
    "Senior": "Elderly",
    "Family & Elderly": "Mixed",
    "Homeless Veterans": "Veterans",
    "Mixed Income": "Mixed",
    "OTHER": "Mixed",
    "Affordable": "Low Income",
}
for old, new in tenant_type_fixes.items():
    # regex=False so characters such as "." are matched literally
    data['TargetTenantType'] = data['TargetTenantType'].str.replace(old, new, regex=False)
data['TargetTenantType'] = data['TargetTenantType'].str.title()



In [19]:
data.head()

Unnamed: 0,NHPDPropertyID,PropertyName,PropertyAddress,City,State,Zip,CBSACode,CBSAType,County,CountyCode,...,NumberActiveMR,NumberInconclusiveMR,NumberInactiveMR,Mr_1_Status,Mr_1_ProgramName,Mr_1_AssistedUnits,Mr_2_Status,Mr_2_ProgramName,Mr_2_AssistedUnits,OldNHPDPropertyID
0,1000000,IVY ESTATES,6729 Zeigler Blvd,Mobile,AL,36608-4253,33660.0,Metropolitan Statistical Area,Mobile,1097.0,...,0,0,0,,,,,,,
1,1000001,RENDU TERRACE WEST,7400 Old Shell Rd,Mobile,AL,36608-4549,33660.0,Metropolitan Statistical Area,Mobile,1097.0,...,0,0,0,,,,,,,
2,1000002,TWB RESIDENTIAL OPPORTUNITIES II,93 Canal Rd,Port Jefferson Station,NY,11776-3024,35620.0,Metropolitan,Suffolk,36103.0,...,0,0,0,,,,,,,
3,1000003,THE DAISY HOUSE,615 Clarissa St,Rochester,NY,14608-2485,40380.0,Metropolitan,Monroe,36055.0,...,0,0,0,,,,,,,
4,1000004,MAIN AVENUE APARTMENTS,105 E Walnut St,Sylacauga,AL,35150-3012,45180.0,Micropolitan Statistical Area,Talladega,1121.0,...,0,0,0,,,,,,,


**Does Florida reserve more units for certain protected groups compared to other locations?**

How does Florida compare to the rest of the Southeast, and how does it compare to the rest of the United States as a whole?
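
Since the three regions contain very different numbers of properties, one way to approach this comparison is to convert raw counts into within-region shares. The following is a minimal sketch under that assumption, not the method used in this notebook: `region_shares` is a hypothetical helper, the counts are made-up placeholders rather than NHPD values, and plain Python is used so it runs without the spreadsheet.

```python
from collections import defaultdict

def region_shares(counts):
    """Convert {(region, tenant_type): count} into fractions of each region's total."""
    totals = defaultdict(int)
    for (region, _), n in counts.items():
        totals[region] += n
    return {key: n / totals[key[0]] for key, n in counts.items()}

# Placeholder counts for illustration only (not taken from the NHPD)
example_counts = {
    ('FL', 'Family'): 70, ('FL', 'Elderly'): 30,
    ('SE (not FL)', 'Family'): 120, ('SE (not FL)', 'Elderly'): 80,
}
shares = region_shares(example_counts)
for key in sorted(shares):
    print(key, round(shares[key], 2))
```

Shares make it easier to see whether one region leans toward a tenant type, independent of how many properties that region has overall.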

In [47]:
# Get number of properties per tenant type and sort
data1 = data[['TargetTenantType']].copy()  # explicit copy avoids pandas' SettingWithCopyWarning
data1['number'] = 1
data1 = data1.groupby(['TargetTenantType'], as_index=False).sum()
data1 = data1.sort_values('number', ascending=False)
data1['TargetTenantType'].to_list()



['Family',
 'Elderly',
 'Elderly Or Disabled',
 'Disabled',
 'Mixed',
 'Congregate',
 'Group Home',
 'Homeless',
 'Health Care',
 'Veterans',
 'Low Income']

In [56]:
# Get number of properties per tenant type and state, and sort
data2 = data[['TargetTenantType', 'State']].copy()  # explicit copy avoids pandas' SettingWithCopyWarning
data2['Number'] = 1
data2 = data2.groupby(['TargetTenantType', 'State'], as_index=False).sum()
data2 = data2.sort_values('Number', ascending=False)

# Add in Region variable (single .loc calls avoid chained assignment)
se_states = ['AL', 'AR', 'GA', 'KY', 'LA', 'MS', 'NC', 'SC', 'TN', 'VA', 'WV']
data2['Region'] = ''
data2.loc[~data2['State'].isin(['MP', 'GU']), 'Region'] = 'US (non-SE)'  # Remove territories (only MP and GU are filtered here)
data2.loc[data2['State'] == 'FL', 'Region'] = 'FL'                       # Isolate Florida
data2.loc[data2['State'].isin(se_states), 'Region'] = 'SE (not FL)'      # Isolate southeast states

display(data2)
display(data2[data2['State']=='FL'])



Unnamed: 0,TargetTenantType,State,Number,Region
174,Family,CA,2730,US (non-SE)
179,Family,FL,1387,FL
198,Family,NC,1303,SE (not FL)
205,Family,NY,1218,US (non-SE)
214,Family,TX,1053,US (non-SE)
...,...,...,...,...
254,Low Income,MA,1,US (non-SE)
255,Low Income,MO,1,US (non-SE)
256,Low Income,NJ,1,US (non-SE)
257,Low Income,OH,1,US (non-SE)


Unnamed: 0,TargetTenantType,State,Number,Region
179,Family,FL,1387,FL
77,Elderly,FL,397,FL
27,Disabled,FL,155,FL
128,Elderly Or Disabled,FL,136,FL
268,Mixed,FL,107,FL
252,Homeless,FL,29,FL
2,Congregate,FL,1,FL
236,Health Care,FL,1,FL


In [69]:
# Southeast states
selist = ['AL', 'AR', 'GA', 'KY', 'LA', 'MS', 'NC', 'SC', 'TN', 'VA', 'WV']
print('|'.join(selist))

# Splitting category lists
tlist = ['Family', 'Elderly','Elderly Or Disabled','Disabled','Mixed']
tlist1 = ['Congregate','Group Home','Homeless','Health Care','Veterans','Low Income']
print('|'.join(tlist))
print('|'.join(tlist1))

AL|AR|GA|KY|LA|MS|NC|SC|TN|VA|WV
Family|Elderly|Elderly Or Disabled|Disabled|Mixed
Congregate|Group Home|Homeless|Health Care|Veterans|Low Income


In [68]:
# Plot each protected group, allocated housing split by region
# Category lists are repeated here so the cell does not depend on other cells
large_groups = ['Family', 'Elderly', 'Elderly Or Disabled', 'Disabled', 'Mixed']
small_groups = ['Congregate', 'Group Home', 'Homeless', 'Health Care', 'Veterans', 'Low Income']
region_order = ['US (non-SE)', 'SE (not FL)', 'FL']

# One chart for the larger tenant groups, one for the smaller ones
for groups in (large_groups, small_groups):
    # Keep rows with an assigned region whose tenant type is in this set
    subset = data2[(data2['Region'] != '') & data2['TargetTenantType'].isin(groups)]
    fig = px.bar(subset, x='TargetTenantType', y='Number',
                 title="Protected Groups Allocated Housing in USA",
                 color='Region', barmode="group",
                 category_orders={"TargetTenantType": groups, "Region": region_order})
    fig.show()

_________________________
_________________________
**Code from others: Collating useful data**

In [None]:
# Useful Columns from Anabel
# ActiveSubsidies
# TotalInconclusiveSubsidies
# TotalInactiveSubsidies
# TotalUnits
# OwnerType
# ManagerType (ManagerName?)
# ReacScore1
# ReacScore2
# ReacScore3
# StudioOneBedroomUnits
# TwoBedroomUnits
# ThreePlusBedroomUnits
# PercentofELIHouseholds (Percent of Extremely Low Income Households)
# TargetTenantType
# FairMarketRent_2BR
# OccupancyRate
# AverageMonthsOfTenancy
# s202_1_principalbalance

In [20]:
# Code from Merrick Usta
import gdown
#Read housing loss data
#Hillsborough
url = 'https://drive.google.com/uc?id=1abt4fLPO__KxBLz9SXue5VKeZN3cUcCF&export=download'
output = './data/hills_loss.csv'
gdown.download(url, output, quiet=False)
#Miami-Dade
url = 'https://drive.google.com/uc?id=1gLojTGS6HQ1s60gmIxFCq2xObB1634BU&export=download'
output = './data/miami_loss.csv'
gdown.download(url, output, quiet=False)
#Orange
url = 'https://drive.google.com/uc?id=15ee2QrH8a_yuIfptGwAsVF-tWGTEYXCy&export=download'
output = './data/orange_loss.csv'
gdown.download(url, output, quiet=False)
housing_loss=pd.concat([pd.read_csv('./data/hills_loss.csv'),
                       pd.read_csv('./data/miami_loss.csv'),
                       pd.read_csv('./data/orange_loss.csv')])
#Get census tracts, compare with those in NHPD data
housing_loss.rename(columns={'census_tract_GEOID':'CensusTract'},inplace=True)

# LP added
nhpd_data = data.copy()
nhpd_data = nhpd_data.merge(housing_loss,how="inner",on=['CensusTract'])

Downloading...
From: https://drive.google.com/uc?id=1abt4fLPO__KxBLz9SXue5VKeZN3cUcCF&export=download
To: /Users/lprichard/Dropbox/Data_Science/DataKind_18Sep21/data/hills_loss.csv
100%|██████████| 207k/207k [00:00<00:00, 2.41MB/s]
Downloading...
From: https://drive.google.com/uc?id=1gLojTGS6HQ1s60gmIxFCq2xObB1634BU&export=download
To: /Users/lprichard/Dropbox/Data_Science/DataKind_18Sep21/data/miami_loss.csv
100%|██████████| 321k/321k [00:00<00:00, 4.01MB/s]
Downloading...
From: https://drive.google.com/uc?id=15ee2QrH8a_yuIfptGwAsVF-tWGTEYXCy&export=download
To: /Users/lprichard/Dropbox/Data_Science/DataKind_18Sep21/data/orange_loss.csv
100%|██████████| 143k/143k [00:00<00:00, 1.94MB/s]


In [21]:
display(nhpd_data.head())
display(nhpd_data.info())

Unnamed: 0,NHPDPropertyID,PropertyName,PropertyAddress,City,State,Zip,CBSACode,CBSAType,County,CountyCode,...,lien-foreclosure-rate-2019,avg-eviction-rate,ratio-to-mean-foreclosure-rate,ratio-to-mean-eviction-rate,avg-housing-loss-rate,evictions-pct-total-housing-loss,housing-loss-index,county_GEOID,county,state
0,1000163,BUENA VISTA APARTMENTS,521 Sw 6Th St,Miami,FL,33130-2773,33100.0,Metropolitan Statistical Area,Miami-Dade,12086.0,...,0.0,1.444604,0.598472,0.989825,1.439467,0.902985,0.799893,12086,Miami-Dade County,Florida
1,1000165,VILLA BEATRIZ,776 Nw 2Nd St,Miami,FL,33128-1454,33100.0,Metropolitan Statistical Area,Miami-Dade,12086.0,...,0.0,1.444604,0.598472,0.989825,1.439467,0.902985,0.799893,12086,Miami-Dade County,Florida
2,1000613,JOE MORETTI II,535 Sw 6Th St,Miami,FL,33130-2745,33100.0,Metropolitan Statistical Area,Miami-Dade,12086.0,...,0.0,1.444604,0.598472,0.989825,1.439467,0.902985,0.799893,12086,Miami-Dade County,Florida
3,1014273,HUNTER RIVER WALK APARTMENTS,524 Nw 1St St,Miami,FL,33128-1572,33100.0,Metropolitan Statistical Area,Miami-Dade,12086.0,...,0.0,1.444604,0.598472,0.989825,1.439467,0.902985,0.799893,12086,Miami-Dade County,Florida
4,1018226,VILLA SARA,435 Sw 6Th St,Miami,FL,33130-2875,33100.0,Metropolitan Statistical Area,Miami-Dade,12086.0,...,0.0,1.444604,0.598472,0.989825,1.439467,0.902985,0.799893,12086,Miami-Dade County,Florida


<class 'pandas.core.frame.DataFrame'>
Int64Index: 910 entries, 0 to 909
Columns: 324 entries, NHPDPropertyID to state
dtypes: datetime64[ns](42), float64(113), int64(71), object(98)
memory usage: 2.3+ MB


None

In [22]:
# Code from Nepur Neti
# Make lists of columns to sum and those to average at a census tract level
columns_to_sum = ['ActiveSubsidies', 'TotalInconclusiveSubsidies',
       'TotalInactiveSubsidies', 'TotalUnits', 
       'NumberActiveSection8', 'NumberInconclusiveSection8',
       'NumberInactiveSection8', 'NumberActiveSection202','NumberActiveHUDInsured',
       'NumberInconclusiveHUDInsured', 'NumberInactiveHud',
       'NumberActiveLihtc', 'NumberInconclusiveLihtc', 'NumberInactiveLihtc',
       'NumberActiveSection515', 
       'NumberInactiveSection515', 'NumberActiveSection538',
       'NumberActiveHome', 'NumberInconclusiveHome', 'NumberInactiveHome',
       'NumberActivePublicHousing', 
       'NumberInactivePublicHousing', 'NumberActiveState', 'NumberInactiveState', 'NumberActivePBV', 'NumberActiveMR']
columns_to_average = ['TotalUnits',
 'StudioOneBedroomUnits',
 'TwoBedroomUnits',
 'ThreePlusBedroomUnits',
 'FairMarketRent_2BR']

# Create a dictionary to pass to the agg function
agg_dict = dict()

for col in columns_to_sum:
    agg_dict[col] = "sum"
# Note: 'TotalUnits' is in both lists, so its "sum" entry is overwritten by "mean" here
for col in columns_to_average:
    agg_dict[col] = "mean"

# Aggregate the dataframe at a census tract level using the aggregate dictionary we just created
nhpd_fl_census_tract = nhpd_data.groupby(['CensusTract', 'County', 'CountyCode']).agg(agg_dict).reset_index()

In [23]:
nhpd_fl_census_tract

Unnamed: 0,CensusTract,County,CountyCode,ActiveSubsidies,TotalInconclusiveSubsidies,TotalInactiveSubsidies,TotalUnits,NumberActiveSection8,NumberInconclusiveSection8,NumberInactiveSection8,...,NumberActivePublicHousing,NumberInactivePublicHousing,NumberActiveState,NumberInactiveState,NumberActivePBV,NumberActiveMR,StudioOneBedroomUnits,TwoBedroomUnits,ThreePlusBedroomUnits,FairMarketRent_2BR
0,1.205700e+10,Hillsborough,12057.0,8,0,1,55.571429,4,0,0,...,0,0,2,0,2,0,69.50,5.600000,0.000000,1220.0
1,1.205700e+10,Hillsborough,12057.0,4,0,3,63.000000,2,0,0,...,0,0,1,0,0,0,92.00,1.333333,0.000000,1070.0
2,1.205700e+10,Hillsborough,12057.0,1,0,0,84.000000,0,0,0,...,0,0,0,0,1,0,,,,1070.0
3,1.205700e+10,Hillsborough,12057.0,1,0,0,2.000000,0,0,0,...,0,0,0,0,0,0,,1.000000,1.000000,1070.0
4,1.205700e+10,Hillsborough,12057.0,1,0,0,96.000000,0,0,0,...,0,0,1,0,0,0,72.00,24.000000,0.000000,1220.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
349,1.209502e+10,Orange,12095.0,1,1,0,55.000000,0,1,0,...,0,0,1,0,0,0,55.00,0.000000,0.000000,1980.0
350,1.209502e+10,Orange,12095.0,2,0,2,95.500000,0,0,0,...,1,0,0,0,0,1,,0.000000,0.000000,1130.0
351,1.209502e+10,Orange,12095.0,1,0,0,224.000000,0,0,0,...,1,0,0,0,0,0,,,,1500.0
352,1.209502e+10,Orange,12095.0,11,1,2,128.375000,3,1,0,...,1,0,3,0,0,0,50.00,46.250000,36.428571,1160.0
