# Fair Housing Project
### Assigned by Bob Gradeck of the Western PA Data Conservancy
Tara Schroth, Stephen Vandrak, Gloria Givler, Annie Goodwin  

### Project Objective
Currently, the affordable housing initiative struggles to place property bids quickly because of the bureaucracy inherent in the organization (working with government resources, etc.). Our objective with this project is to establish a list of property owners who may be selling multi-unit properties in the near future. That way, the affordable housing initiative can reach out proactively to these owners and potentially strike a deal to purchase properties before they hit the market.

In [1]:
#import all necessary modules and packages
import pandas as pd
pd.set_option('display.max_rows', None)
from sklearn import datasets
from sklearn.feature_selection import chi2
from sklearn.feature_selection import SelectKBest
import string
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder
from sklearn import metrics
from sklearn import tree
import matplotlib.pyplot as plt
!pip install geopy
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="example app")
!pip install plotly
import plotly.express as px



## Importing, cleaning, and merging the datasets

After reading in the sales and assessments datasets and dropping unwanted columns, we wanted to narrow the assessments dataset down to only the properties we are interested in. 

We decided to filter by the feature "USEDESC" (use description). We weren't interested in single-family homes, multi-family homes such as duplexes, condominium units, or any other types of residential properties that could not easily be converted to low-income housing. We kept only properties with the following use descriptions:
- APART:40+ UNITS
- APART:20-39 UNITS
- APART: 5-19 UNITS
- DWG APT CONVERSION


In [2]:
# Read in sales data and look at first 5 rows
sales=pd.read_csv('SalesData.csv', low_memory=False)
sales.head()

Unnamed: 0,PARID,PROPERTYHOUSENUM,PROPERTYFRACTION,PROPERTYADDRESSDIR,PROPERTYADDRESSSTREET,PROPERTYADDRESSSUF,PROPERTYADDRESSUNITDESC,PROPERTYUNITNO,PROPERTYCITY,PROPERTYSTATE,...,MUNIDESC,RECORDDATE,SALEDATE,PRICE,DEEDBOOK,DEEDPAGE,SALECODE,SALEDESC,INSTRTYP,INSTRTYPDESC
0,1075F00108000000,4720,,,HIGHPOINT,DR,,,GIBSONIA,PA,...,Hampton,2012-09-27,2012-09-27,120000.0,15020,356,3,LOVE AND AFFECTION SALE,DE,DEED
1,0011A00237000000,0,,,LOMBARD,ST,,,PITTSBURGH,PA,...,3rd Ward - PITTSBURGH,2015-01-06,2015-01-06,1783.0,TR15,2,2,CITY TREASURER SALE,TS,TREASURER DEED
2,0011J00047000000,1903,,,FORBES,AVE,,,PITTSBURGH,PA,...,1st Ward - PITTSBURGH,2012-10-26,2012-10-26,4643.0,TR13,3,2,CITY TREASURER SALE,TS,TREASURER DEED
3,0113B00029000000,479,,,ROOSEVELT,AVE,,,PITTSBURGH,PA,...,Bellevue,2017-03-27,2017-03-06,0.0,16739,166,3,LOVE AND AFFECTION SALE,CO,CORRECTIVE DEED
4,0119S00024000000,5417,,,NATRONA,WAY,,,PITTSBURGH,PA,...,10th Ward - PITTSBURGH,2015-02-04,2015-02-04,27541.0,TR15,59,GV,GOVERNMENT SALE,TS,TREASURER DEED


In [3]:
# Convert sale date from string to date
sales['SALEDATE']=pd.to_datetime(sales['SALEDATE'])

In [4]:
# Read in assessments data and look at first 5 rows
#change path to work with your file setup
#path='C:\\Users\\Tara\\OneDrive - University of Pittsburgh\\FALL 2022\\ENGR 1171\\Project Housing Data\\'

assessment=pd.read_csv('AssessmentData.csv', low_memory=False)
assessment.head()

FileNotFoundError: [Errno 2] No such file or directory: 'AssessmentData.csv'

In [None]:
# Drop unwanted columns and look at first 5 rows
assessment=assessment[['PARID','PROPERTYHOUSENUM','PROPERTYFRACTION','PROPERTYADDRESS','PROPERTYUNIT','MUNIDESC','OWNERDESC','CLASSDESC', 
'USEDESC', 'LOTAREA','HOMESTEADFLAG','FARMSTEADFLAG','ABATEMENTFLAG','SALEDATE','SALEPRICE','SALEDESC','PREVSALEDATE','PREVSALEPRICE',
 'PREVSALEDATE2','PREVSALEPRICE2','CHANGENOTICEADDRESS1','CHANGENOTICEADDRESS2','CHANGENOTICEADDRESS3','CHANGENOTICEADDRESS4',
'STYLEDESC','STORIES','YEARBLT','CDUDESC',
]]
assessment.head()

In [None]:
# Narrow it down to properties we’re interested in
# Filter by USEDESC (see above explanation of choices)
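assessment=assessment.loc[
    # The 5-19 label is written both with and without a space after the colon in this notebook, so accept both
    (assessment['USEDESC'].isin(['APART:40+ UNITS','APART:20-39 UNITS','APART:5-19 UNITS','APART: 5-19 UNITS','DWG APT CONVERSION']))
]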
assessment.head()

In [None]:
# Merge the two datasets
# Pull the latest sale date for each property (Tara’s code)
salesgrouped=sales[['PARID','SALEDATE']].groupby('PARID').agg({'SALEDATE':'max'}).reset_index().rename(columns={'SALEDATE':'FINALSALEDATE'})
df=pd.merge(assessment,salesgrouped,how='left',on='PARID')

## Determining the owners that have purchased the most properties in the past 2 years

We wanted to determine who has purchased the most properties in the past 2 years, with the goal of seeing which companies are currently growing their real-estate portfolios. Owners who are actively buying are less likely to be selling many properties in the near future.
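As a rough illustration of this idea, the sketch below counts distinct parcels per owner whose latest sale falls inside a two-year window. The rows are made up, and `reference_date` is an assumed stand-in for the analysis date; computing the cutoff from it avoids hard-coding a timestamp.

```python
import pandas as pd

# Made-up rows for illustration only: one row per parcel with its latest sale date
toy = pd.DataFrame({
    "PARID": ["P1", "P2", "P3", "P4"],
    "OwnerInfo": ["ACME LLC", "ACME LLC", "ACME LLC", "SMITH J"],
    "FINALSALEDATE": pd.to_datetime(["2021-06-15", "2022-01-10", "2021-11-30", "2019-05-05"]),
})

# Assumed analysis date; the cutoff is derived from it rather than typed in by hand
reference_date = pd.Timestamp("2022-09-22")
cutoff = reference_date - pd.DateOffset(years=2)

# Keep parcels sold after the cutoff, then count distinct parcels per owner
recent = toy.loc[toy["FINALSALEDATE"] > cutoff]
recent_counts = recent.groupby("OwnerInfo")["PARID"].nunique()
print(recent_counts)
```

Owners with a high count in `recent_counts` are the ones treated as active buyers in the cells that follow.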

In [None]:
# Filter to last 2 years of sales only, look at first 5 rows
last2years=df.loc[df['FINALSALEDATE']>pd.Timestamp('2020-09-22 00:00:00')]
last2years.head()

In [None]:
# Find owners who have purchased at least 3 multiunit properties in the last 2 years and look at df
pd.options.mode.chained_assignment = None  # default='warn'
last2years['OwnerInfo']=last2years['CHANGENOTICEADDRESS1'].astype(str)+' '+last2years['CHANGENOTICEADDRESS2'].astype(str)+' '+last2years['CHANGENOTICEADDRESS3'].astype(str)+' '+last2years['CHANGENOTICEADDRESS4'].astype(str)
buyersofmany=last2years[['PARID','OwnerInfo']].drop_duplicates().groupby('OwnerInfo').size().reset_index(name='NumPropertiesBought')
buyersofmany=buyersofmany.loc[(~buyersofmany['OwnerInfo'].isna())&(buyersofmany['NumPropertiesBought']>2)]
last2years.head()

In [None]:
# Group by the owner to get a list of owners
print('There are', len(buyersofmany), 'owners that have purchased at least three multiunit residential properties in the past 2 years')
buyersofmany.sort_values(['NumPropertiesBought'],ascending=[False])

# Owner's contact info (phone number/primary residence) is not available in the dataset

## Determining the owners that currently own the most properties

We wanted to determine who currently owns the most properties, based on the property assessments.

In [None]:
# Look at all data we have
alltime=df
alltime.head()

In [None]:
# Count the multiunit properties held by each owner, keep owners with more than 5, and look at df
pd.options.mode.chained_assignment = None  # default='warn'
alltime['OwnerInfo']=alltime['CHANGENOTICEADDRESS1'].astype(str)+' '+alltime['CHANGENOTICEADDRESS2'].astype(str)+' '+alltime['CHANGENOTICEADDRESS3'].astype(str)+' '+alltime['CHANGENOTICEADDRESS4'].astype(str)
ownersofmany=alltime[['PARID','OwnerInfo']].drop_duplicates().groupby('OwnerInfo').size().reset_index(name='NumPropertiesOwned')
ownersofmany=ownersofmany.loc[(~ownersofmany['OwnerInfo'].isna())&(ownersofmany['NumPropertiesOwned']>5)]
alltime.head()

In [None]:
# Group by the owner to get a list of owners
print('There are', len(ownersofmany), 'owners that currently own more than 5 multiunit residential properties')
ownersofmany.sort_values(['NumPropertiesOwned'],ascending=[False])

## Removing buyers of many from owners of many
We are interested in companies that currently own a lot of multiunit properties but have not recently been buying many more. We want to eliminate the owners who are actively buying, as they are more likely to be competitors of the fair housing initiative when it comes to placing bids on new properties.
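This exclusion is an anti-join: keep the rows of one table whose key does not appear in another. The sketch below shows two equivalent ways to write it in pandas on made-up owners (the names and counts are invented for illustration).

```python
import pandas as pd

# Made-up owners for illustration only
owners = pd.DataFrame({"OwnerInfo": ["A", "B", "C"], "NumPropertiesOwned": [9, 7, 6]})
buyers = pd.DataFrame({"OwnerInfo": ["B"], "NumPropertiesBought": [4]})

# Option 1: boolean mask, keeps only the columns of `owners`
mask_result = owners.loc[~owners["OwnerInfo"].isin(buyers["OwnerInfo"])]

# Option 2: left merge with an indicator column, then keep rows found only on the left
merged = owners.merge(buyers[["OwnerInfo"]], on="OwnerInfo", how="left", indicator=True)
merge_result = merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")

print(merge_result)
```

The mask version is shorter and never introduces extra columns. The merge version is useful when you also want to inspect which rows matched.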

In [None]:
#merging the dataframes and dropping duplicates, printing the list - changed to eliminate the NA # bought col
# interestingowners=(ownersofmany.merge(buyersofmany, on='OwnerInfo', how='left', indicator=True)
#      .query('_merge == "left_only"')
#      .drop('_merge', 1))
interestingowners=ownersofmany.loc[~ownersofmany['OwnerInfo'].isin(buyersofmany['OwnerInfo'])]
print('There are', len(interestingowners), 'owners that own more than 5 multiunit residential properties and have not bought 3 or more in the last 2 years.')
interestingowners.sort_values(['NumPropertiesOwned'],ascending=[False])

### Note: the code block below is optional!

Only run the code block below if you have new houses and need their coordinates!

In [None]:
# Pull the addresses of all properties owned by the above owners
assessmentcopy=pd.read_csv('AssessmentData1.csv')
ownerproperties=df.loc[df['OwnerInfo'].isin(interestingowners['OwnerInfo'])]
assessmentcopy=assessmentcopy.loc[assessmentcopy['PARID'].isin(ownerproperties['PARID'])]
assessmentcopy['Address']=assessmentcopy['PROPERTYHOUSENUM'].astype(str)+' '+assessmentcopy['PROPERTYADDRESS'].astype(str)+' '+assessmentcopy['PROPERTYCITY'].astype(str)+' '+assessmentcopy['PROPERTYSTATE'].astype(str)+' '+assessmentcopy['PROPERTYZIP'].astype(int).astype(str)
housestomap=assessmentcopy[['PARID','Address']]

In [None]:
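# Use geopy to get the lat and lon coordinates for each house so we can map them
from geopy.extra.rate_limiter import RateLimiter
# Nominatim's usage policy allows at most one request per second
geocode=RateLimiter(geolocator.geocode, min_delay_seconds=1)
housestomap=housestomap.reset_index()
housestomap['Lat']=''
housestomap['Lon']=''
for i in range(len(housestomap['Address'])):
    # geocode returns None when the address is not found or the request fails
    data=geocode(housestomap.loc[i,'Address'])
    if data is not None:
        housestomap.loc[i,'Lat']=data.raw.get("lat")
        housestomap.loc[i,'Lon']=data.raw.get("lon")
housestomap.to_csv('housestomap.csv')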

In [None]:
# We had to manually fill in some of the coordinates in housestomap.csv, hence the read_csv
# To look up a missing coordinate, right-click the pin in Google Maps to get the lat and lon
housestomap=pd.read_csv('housestomap.csv')

In [None]:
# Merge the owner address and number of properties owned with the coordinates
mergedhousestomap=pd.merge(housestomap,df[['PARID','OwnerInfo']],how='left',on='PARID')
# Create a new df that adds on the number of properties owned by each owner
mergedhousestomap2=pd.merge(mergedhousestomap,interestingowners,how='left',on='OwnerInfo')

In [None]:
# Create a regular map of all the houses
fig = px.scatter_mapbox(mergedhousestomap,lat='Lat',lon='Lon', hover_name="OwnerInfo")
fig.update_layout(title = 'Map of Possible Units', title_x=0.5)
fig.update_layout(mapbox_style="open-street-map")
fig.show()

In [None]:
# Map where the houses are, color coded by how many properties their owner owns
fig = px.scatter_mapbox(mergedhousestomap2,lat='Lat',lon='Lon', hover_name="OwnerInfo",color='NumPropertiesOwned',color_continuous_scale='Bluered_r')
fig.update_layout(title = 'Map of Possible Units', title_x=0.5)
fig.update_layout(mapbox_style="open-street-map")
fig.update_layout(showlegend=False)
fig.show()

## Incorporating Community Sites Onto the Map
On the Western Pennsylvania Regional Data Center website, we found a dataset listing community assets in the county. Examples of assets include (but are not limited to) gas stations, barbers, doctor's offices, and grocery stores. We thought it would be useful to incorporate some of these locations into the map view, so that specific properties could be evaluated as future low-income housing sites based on the infrastructure nearby. Not everyone living in low-income housing will have access to a car, so ensuring that some resources are within walking distance will set residents up for less hardship.

We decided to only include the resources we viewed as most critical to daily life, which were the following:
- acha_community_sites (Allegheny County Housing Authority Community Centers)
- achd_clinics (Allegheny County Health Department Clinics)
- child_care_centers
- dentists
- doctors_offices
- family_support_centers
- food_banks
- health_centers
- laundromats
- libraries
- pharmacies
- rec_centers
- senior_centers
- supermarkets
- va_facilities (Veterans Affairs Facilities)
- wic_offices (Women, Infants, and Children Offices)
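One way to make "walkable" concrete is to count the assets within a fixed radius of a property using the haversine (great-circle) distance. The sketch below is illustrative only: the coordinates are made up, and the 800 m radius (roughly a ten-minute walk) and the helper names `haversine_m` and `assets_within` are assumptions, not part of the notebook.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def assets_within(prop, assets, radius_m=800):
    """Return the names of assets no farther than radius_m from the property."""
    return [
        name for name, lat, lon in assets
        if haversine_m(prop[0], prop[1], lat, lon) <= radius_m
    ]


# Made-up coordinates for illustration only
property_location = (40.4400, -79.9900)
toy_assets = [
    ("library", 40.4420, -79.9900),      # a few hundred meters north
    ("pharmacy", 40.4400, -79.9850),     # a few hundred meters east
    ("supermarket", 40.4600, -79.9900),  # a couple of kilometers north
]

nearby = assets_within(property_location, toy_assets)
print(nearby)
```

Straight-line distance understates real walking distance, so a count like this is best read as an upper bound on what is reachable on foot.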

In [None]:
# Read in community assets data, drop unwanted columns, and look at first 5 rows
communityassets=pd.read_csv('CommunityAssets.csv', low_memory=False)
communityassets=communityassets[['name','asset_type','street_address','city','state','latitude','longitude']]
communityassets.head()

In [None]:
# Narrow it down to assets we’re interested in
# Filter by asset_type (see above explanation of choices)
communityassets=communityassets.loc[
    (communityassets['asset_type'].isin(['acha_community_sites','achd_clinics','child_care_centers','dentists', 'doctors_offices',
                                         'family_support_centers','food_banks','health_centers','laundromats','libraries','pharmacies',
                                        'rec_centers','senior_centers','supermarkets','va_facilities','wic_offices']))
]

In [None]:
#Creating a more general variable of asset class
# Map each asset_type to a broader asset_class; types not listed here keep their own name
# (laundromats, pharmacies, and wic_offices are unchanged)
asset_class_map = {
    'acha_community_sites': 'community_centers',
    'libraries': 'community_centers',
    'rec_centers': 'community_centers',
    'senior_centers': 'community_centers',
    'achd_clinics': 'healthcare_providers',
    'dentists': 'healthcare_providers',
    'doctors_offices': 'healthcare_providers',
    'health_centers': 'healthcare_providers',
    'va_facilities': 'healthcare_providers',
    'child_care_centers': 'child_resources',
    'family_support_centers': 'child_resources',
    'food_banks': 'food_sources',
    'supermarkets': 'food_sources',
}
communityassets['asset_class'] = communityassets['asset_type'].replace(asset_class_map)
communityassets.head()

In [None]:
# Create a regular map of all the community assets, colored by asset_class
fig = px.scatter_mapbox(communityassets,lat='latitude',lon='longitude', hover_name="name", color='asset_class')
fig.update_layout(title = 'Map of Community Assets', title_x=0.5)
fig.update_layout(mapbox_style="open-street-map")
fig.show()

In [None]:
#Create new dataframe and rename columns to match the housing dataframe
newcommunityassets=communityassets.copy()
newcommunityassets.rename(columns = {'latitude':'Lat','longitude':'Lon','street_address':'Address'}, inplace = True)
newcommunityassets.head()  

In [None]:
#Merge housing data with community assets (outer merge keeps every house and every asset)
comboassetandhouses = pd.merge(mergedhousestomap2, newcommunityassets, how="outer", on=["Lat", "Lon","Address"])
comboassetandhouses.head()

In [None]:
#Clear NaN Values and rename to match 
comboassetandhouses1=comboassetandhouses.fillna(0)
comboassetandhouses1['asset_class']=comboassetandhouses1['asset_class'].replace(0,'property')

#Create new column to differentiate between houses and community assets
comboassetandhouses1['HouseAssets'] = [3 if x =='property' else 1 for x in comboassetandhouses1['asset_class']]
comboassetandhouses1.head()

In [None]:
# Create a regular map of all the community assets and houses, colored by asset_class and sized by HouseAssets
fig = px.scatter_mapbox(comboassetandhouses1,lat='Lat',lon='Lon', hover_name="Address",color='asset_class',size='HouseAssets',size_max=7)
fig.update_layout(title = 'Map of Community Assets and Properties', title_x=0.5)
fig.update_layout(mapbox_style="open-street-map")
fig.show()

# Potential Challenges & Ethical Implications

After analyzing the data and coming up with the list of properties/owners, we decided it would be beneficial to have a discussion on the potential challenges and ethical dilemmas that arise within the affordable housing community. 

One potential challenge when the non-profit approaches owners about a sale is that the owners could refuse and go with a higher bidder. What is the highest amount that the organization will or can offer for the properties? What is more important: profiting, or creating affordable housing for lower-income households?


## Is Housing a Human Right?

Read this article: https://housingmatters.urban.org/articles/naming-housing-human-right-first-step-solving-housing-crisis

Article 25 of the United Nations Universal Declaration of Human Rights states: 

"Everyone has the right to a standard of living adequate for the health and well-being of himself and of his family, including food, clothing, housing and medical care and..."

If housing has been declared a human right, then why do we treat it as a commodity? What ideas do you have for ensuring access to affordable housing?


