# Equity Loss Analysis for Atlanta MSA

## Data Sources
- Fulton County digest parcel data from 2011 to 2022 (filtered to LUC=101, single-family homes), Excel
- Fulton County digest parcel data for 2022 (for geocoding), GeoJSON
- Fulton County sales data from 2011 to 2022, TXT
- Atlanta Neighborhood Statistical Areas (NSAs) with supplemental data from Census (), 2022, CSV from Neighborhood Nexus
- Neighborhood characteristics? unknown

**Note: NSAs in DeKalb County are excluded because we do not have data for all years.**

Those neighborhoods are:
- Candler Park, Druid Hills
- Lake Claire
- East Lake
- Kirkwood
- Edgewood
- East Atlanta
- Emory University/Center for Disease Control
- Part of Morningside/Lenox Park

This leaves _ neighborhoods (see appendix for list)

## Areas of Analysis
- Corporate power in buying and selling (is the purchase price difference statistically significant?)
- Corporate profits from rentals
- Corporate concentration
- Neighborhood characteristics?

- Sum of buying, selling -> all sales
- Sum of holding -> all parcels
- Create a cumulative measure and normalized by neighborhood context
- Take distribution of all sales to ind, corp and compare to see if statistically significant
- Correlate to neighborhood characteristics
- Predict based on neighborhood characteristics
- Geospatial for each neighborhood
- Foreclosure rate 

In [24]:
import os
import pandas as pd
import geopandas as gpd

pd.set_option('display.max_columns', 150)
pd.options.display.float_format = '{:.2f}'.format

In [25]:
fulton_sales_all = pd.read_parquet("./output/fulton_sales_all.parquet")

In [64]:
digest_full_geo_nbhd = pd.read_parquet("./output/digest_full_geo_nbhd.parquet")

First, let's calculate sales in and out. That requires identifying corporate owners and their size. Size should be an aggregate count of the properties an owner holds in the parcel data in the given year, measured at three scopes: the neighborhood, Atlanta, and Fulton County.

We compute these counts from the parcel data, then attach them to each sale when the parcel is transacted.
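
A minimal sketch of that aggregation on a toy frame, not the real digest. It assumes `Own1` is the owner key (owner address may turn out to be a better key), that a non-null `neighborhood` means the parcel is inside Atlanta, and that sales can be joined on `PARID` and `TAXYR`. The `n_owned_*` column names and the neighborhood labels are hypothetical.

```python
import pandas as pd

# Toy stand-in for digest_full_geo_nbhd (the real frame has many more columns).
parcels = pd.DataFrame({
    "PARID": ["a", "b", "c", "d", "e"],
    "TAXYR": [2021, 2021, 2021, 2021, 2022],
    "Own1": ["ACME LLC", "ACME LLC", "ACME LLC", "SMITH JOHN", "ACME LLC"],
    "neighborhood": ["NSA A", "NSA A", None, "NSA A", "NSA A"],
})
owner = "Own1"  # assumption: owner name identifies the owner

# Fulton: parcels the owner holds anywhere in the county in that tax year.
parcels["n_owned_fulton"] = (
    parcels.groupby(["TAXYR", owner])["PARID"].transform("count")
)

# Atlanta: assumption that a non-null neighborhood marks an Atlanta parcel.
in_atl = parcels["neighborhood"].notna().astype(int)
parcels["n_owned_atl"] = (
    in_atl.groupby([parcels["TAXYR"], parcels[owner]]).transform("sum")
)

# Neighborhood: count only among Atlanta parcels; other rows stay NaN.
atl = parcels[parcels["neighborhood"].notna()]
parcels["n_owned_nbhd"] = (
    atl.groupby(["TAXYR", owner, "neighborhood"])["PARID"].transform("count")
)

# Attach the counts to a sale of the same parcel in the same tax year.
sales = pd.DataFrame({"PARID": ["a"], "TAXYR": [2021], "SALES PRICE": [100000.0]})
count_cols = ["n_owned_fulton", "n_owned_atl", "n_owned_nbhd"]
sales = sales.merge(
    parcels[["PARID", "TAXYR"] + count_cols], on=["PARID", "TAXYR"], how="left"
)
```

Using `transform` keeps one row per parcel, so the counts can be merged onto sales without a separate lookup table.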

In [47]:
digest_full_geo_nbhd.sample(3)

Unnamed: 0,PARID,OBJECTID,geometry,TAXYR,Situs Adrno,Situs Adrdir,Situs Adrstr,Situs Adrsuf,Cityname,Luc,Calcacres,Own1,Own2,Owner Adrno,Owner Adradd,Owner Adrdir,Owner Adrstr,Owner Adrsuf,own_cityname,Statecode,own_zip,D Yrblt,D Effyr,D Yrremod,Sfla,neighborhood
1550937,17 0227 LL0655,196856,"b""\x01\x03\x00\x00\x00\x01\x00\x00\x00\x06\x00...",2021,965.0,,WESTMORELAND,CIR,ATLANTA,101,0.12,CORKER REGINA &,CORKER RASHAD,965.0,,,WESTMORELAND,CIR,ATLANTA,GA,30316,2007,0,0,2556.0,"Carver Hills, Rockdale, Scotts Crossing, West ..."
2678358,12 304008700578,540029,"b""\x01\x03\x00\x00\x00\x01\x00\x00\x00\x10\x00...",2015,205.0,,ROD,CT,FUL,101,0.42,TODD HAROLD B III & KELLY T,,205.0,,,ROD,CT,ALPHARETTA,GA,30022,1996,0,0,3844.0,
1819072,14 015200140092,242291,b'\x01\x03\x00\x00\x00\x01\x00\x00\x00\x05\x00...,2017,1233.0,,LOCKWOOD,DR,ATL,101,0.23,MILLER STANLEY,,1233.0,,,LOCKWOOD,DR,ATLANTA,GA,30311,1955,0,0,920.0,"Fort McPherson, Venetian Hills"


In [None]:
# TODO: drop invalid sale codes or deed types and non-arms length (low sales price)
# TODO: foreclosures?
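
One possible shape for the first TODO, shown on toy rows. The price floor and the set of deed types to keep are placeholders, not values from the county's documentation. Both would need to be checked against the actual sale and deed code definitions.

```python
import pandas as pd

# Toy rows using the column names from fulton_sales_all.
sales = pd.DataFrame({
    "PARID": ["p1", "p2", "p3"],
    "SALES PRICE": [437500.0, 1.0, 700000.0],
    "DEED TYPE": ["WD", "QC", "LW"],
})

MIN_PRICE = 1_000                 # hypothetical floor for an arm's-length sale
ARMS_LENGTH_DEEDS = {"WD", "LW"}  # hypothetical set of deed types to keep

# Keep a sale only if it passes both the price and the deed-type check.
keep = (
    (sales["SALES PRICE"] >= MIN_PRICE)
    & sales["DEED TYPE"].isin(ARMS_LENGTH_DEEDS)
)
sales_clean = sales[keep].reset_index(drop=True)
```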

In [65]:
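# TODO change owner adrno in data processing to int

# Series.where keeps values where the condition holds and fills "0" elsewhere.
# Call it on the column itself: DataFrame.where returns a whole frame, which
# cannot be assigned to a single column.
digest_full_geo_nbhd["Owner Adrno"] = digest_full_geo_nbhd["Owner Adrno"].where(
    digest_full_geo_nbhd["Owner Adrno"] != "", "0",
)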

In [None]:
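# Build a normalized owner address string: number, street, suffix, zip.
# Numeric parts are cast to str before concatenation, and regex=True is
# required for the character-class pattern to be treated as a regex.
(
    digest_full_geo_nbhd["Owner Adrno"].astype("float").astype("int").astype("str") + " " +
    digest_full_geo_nbhd["Owner Adrstr"] + " " +
    digest_full_geo_nbhd["Owner Adrsuf"] + " " +
    digest_full_geo_nbhd["own_zip"].astype("str")
).str.replace("[.,]+", "", regex=True)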

In [29]:
digest_full_geo_nbhd.groupby("Owner Adrstr")["PARID"].count().sort_values(ascending=False).head(10)

Owner Adrstr
PEACHTREE    11616
PIEDMONT      7305
MAIN          6864
PARK          4861
SPALDING      4606
WOODLAND      4281
NORTHSIDE     4259
CASCADE       4224
FAIRBURN      4091
MONROE        4069
Name: PARID, dtype: int64

In [None]:
digest_full_geo_nbhd.groupby("TAXYR")

In [6]:
fulton_sales_all

Unnamed: 0,TAXYR,PARID,Luc,Saledt,SALES PRICE,FAIR MARKET VALUE,DEED TYPE,Costval,Saleval,GRANTOR,GRANTEE
1,2011,06 0310 LL0490,101,07-JUN-2010,794600.0,717100.0,WD,717100,0,CDG HOMES LLC,EDMUNDS KEITH S & KIMBERLY C
2,2011,06 0310 LL0581,101,14-JUL-2010,800000.0,590400.0,WD,590400,0,CAPITAL DESIGN HOMES LLC,MEHDIPOUR MOHAMMADREZ & SADEGHI SHIVA
5,2011,06 0310 LL0771,101,21-SEP-2010,700000.0,700000.0,LW,993600,RE,REGIONS BANK,DAHAN HAIM
6,2011,06 031000020232,101,28-JAN-2010,437500.0,437500.0,WD,465100,0,STUMP BLAIR E.,DEMPSEY JASON
7,2011,06 031100030032,101,13-AUG-2010,475000.0,451300.0,WD,451300,0,MIDDLETON ERIC C & SUZUKI MASAMI A,BROWN EDWARD H
...,...,...,...,...,...,...,...,...,...,...,...
58671,2022,22 545011880240,101,06-OCT-2021,1.0,444600.0,QC,444600,T,MC BRIDE CALEB,GEORGIA DEPARTMENT OF TRANSPORTATION
58672,2022,22 545011880240,101,15-OCT-2021,1.0,444600.0,QC,444600,T,TROUSDALE NICHOLAS,GEORGIA DEPARTMENT OF TRANSPORTATION
58673,2022,22 545011880240,101,15-OCT-2021,1.0,444600.0,QC,444600,T,AGAPE IRRIGATION REPAIR LLC,GEORGIA DEPARTMENT OF TRANSPORTATION
58674,2022,22 545011880240,101,06-OCT-2021,1.0,444600.0,QC,444600,T,BRADLEY JAZMINE,GEORGIA DEPARTMENT OF TRANSPORTATION


### Drop parcels where government institutions or banks were owners

In [None]:
print("Size before: ", atl_df.shape)


def names_with_keywords(df, keywords):
    """Return unique GRANTEE and GRANTOR names containing any of the keywords."""
    names = []
    for col in ['GRANTEE', 'GRANTOR']:
        mask = df[col].apply(lambda x: any(key in str(x) for key in keywords))
        names += df.loc[mask, col].unique().tolist()
    return names


# Fannie Mae and Freddie Mac both have FEDERAL in their legal names
govt_keywords = ['FEDERAL']
govt = names_with_keywords(atl_df, govt_keywords)

bank_keywords = ['BANK', 'MORTGAGE', 'LENDING', 'LOAN', 'FINANCE', 'FUND', 'CREDIT', 'TRUST', 'SERVICES']
banks = names_with_keywords(atl_df, bank_keywords)

# Drop every sale where either party is a flagged government entity or bank
atl_df = atl_df[
    ~(atl_df['GRANTEE'].isin(govt + banks)
    | atl_df['GRANTOR'].isin(govt + banks))
]
print("Size after: ", atl_df.shape)

### Identify corporate owners and classify investor size
- Exclude government institutions, banks, and trusts, then create a corporate-owner flag
- Classify investor size, probably by owner address, and also keep the sum of properties owned as a continuous measure; compare statistical results between the two
- Should it be any corporate owner in the period, or do they need to have bought in during the period?
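
A rough sketch of the flag and the size classes, on toy values. The keyword pattern, the bin edges, and the labels are all hypothetical. A real keyword list would need review against the actual owner names.

```python
import pandas as pd

# Hypothetical keyword pattern; word boundaries avoid matching inside surnames.
CORP_PATTERN = r"\b(?:LLC|INC|CORP|LP|LTD|HOMES|PROPERTIES|HOLDINGS)\b"

owners = pd.Series([
    "CDG HOMES LLC",
    "MILLER STANLEY",
    "CAPITAL DESIGN HOMES LLC",
    "STUMP BLAIR E.",
    None,
])

# Missing names are treated as non-corporate.
is_corp = owners.fillna("").str.contains(CORP_PATTERN, regex=True)

# Categorical size class from a count of properties owned (hypothetical bins).
# Intervals are right-closed: (0, 1], (1, 10], (10, 100], (100, inf].
n_owned = pd.Series([1, 5, 50, 500])
size_class = pd.cut(
    n_owned,
    bins=[0, 1, 10, 100, float("inf")],
    labels=["single", "small", "medium", "large"],
)
```

Keeping both `n_owned` and `size_class` allows the same model to be run with the continuous and the categorical measure.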

### For each sale, create a dummy variable for each sale type: corp purchase from ind, ind purchase from ind, corp sale to ind, ind sale to ind (should be identical to the other ind-to-ind metric)
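
A sketch of the dummies on toy rows. `looks_corporate` is a hypothetical stand-in for whatever corporate-owner flag is eventually used, and the `corp_to_corp` column is an addition so that the four flags cover every sale.

```python
import pandas as pd

sales = pd.DataFrame({
    "GRANTOR": ["SMITH JOHN", "ACME LLC", "DOE JANE", "ACME LLC"],
    "GRANTEE": ["ACME LLC", "DOE JANE", "ROE RICHARD", "BETA LLC"],
})


def looks_corporate(names):
    """Hypothetical placeholder: treat any name containing LLC as corporate."""
    return names.str.contains(r"\bLLC\b", regex=True)


seller_corp = looks_corporate(sales["GRANTOR"])
buyer_corp = looks_corporate(sales["GRANTEE"])

# GRANTOR is the seller and GRANTEE is the buyer.
sales["corp_buys_from_ind"] = (buyer_corp & ~seller_corp).astype(int)
sales["corp_sells_to_ind"] = (seller_corp & ~buyer_corp).astype(int)
sales["ind_to_ind"] = (~buyer_corp & ~seller_corp).astype(int)
sales["corp_to_corp"] = (buyer_corp & seller_corp).astype(int)
```

An individual-to-individual sale is the same event from the buyer's and the seller's side, so a single `ind_to_ind` column covers both descriptions.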

### Aggregate each class of sale

### Get totals for Fulton, then drop non-ATL sales and aggregate by neighborhood, year, and size of investor
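
A sketch of the two aggregation levels on toy rows. It assumes each sale already carries a `neighborhood` column (for example from a join with the parcel data) and that a missing neighborhood means the sale is outside Atlanta. The flag columns and neighborhood labels are hypothetical. Investor size would be one more grouping key.

```python
import pandas as pd

sales = pd.DataFrame({
    "TAXYR": [2021, 2021, 2021, 2022],
    "neighborhood": ["NSA A", "NSA A", None, "NSA A"],
    "corp_buys_from_ind": [1, 0, 1, 1],
    "ind_to_ind": [0, 1, 0, 0],
})
flags = ["corp_buys_from_ind", "ind_to_ind"]

# County-wide totals per year, before any Atlanta filter.
fulton_totals = sales.groupby("TAXYR")[flags].sum()

# Drop non-Atlanta sales, then total by neighborhood and year.
atl_sales = sales.dropna(subset=["neighborhood"])
nbhd_totals = atl_sales.groupby(["neighborhood", "TAXYR"])[flags].sum()
```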

### Track each property after a corporate purchase (or any property owned by a corp at any point in the period) and calculate rental income

### Normalized equity loss measure

### Statistical test of whether FMV - SP differs significantly between ind and corp buyers (ANOVA or regression)
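
A sketch of the ANOVA option using `scipy.stats.f_oneway`. The numbers are made-up toy values, and `buyer_type` is a hypothetical column that would come from the corporate-owner flag. With only two groups, one-way ANOVA is equivalent to a two-sample t-test with equal variances assumed.

```python
import pandas as pd
from scipy import stats

# Toy values only; they do not come from the Fulton sales data.
sales = pd.DataFrame({
    "FAIR MARKET VALUE": [200.0, 210.0, 190.0, 205.0, 200.0, 210.0, 190.0, 205.0],
    "SALES PRICE": [198.0, 212.0, 189.0, 203.0, 180.0, 195.0, 170.0, 188.0],
    "buyer_type": ["ind", "ind", "ind", "ind", "corp", "corp", "corp", "corp"],
})

# Gap between assessed fair market value and what the buyer paid.
sales["fmv_minus_sp"] = sales["FAIR MARKET VALUE"] - sales["SALES PRICE"]

# One array of gaps per buyer type, passed to the one-way ANOVA.
groups = [g["fmv_minus_sp"].to_numpy() for _, g in sales.groupby("buyer_type")]
f_stat, p_value = stats.f_oneway(*groups)
```

A regression on the same gap would allow controls such as year or neighborhood, which the ANOVA above does not include.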

### Create a measure of corp concentration in neighborhood to use as metric for analysis - is it just being a corp that helps, or when there's high concentration?
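
One simple candidate, sketched on toy rows: the share of parcels in a neighborhood and year that are corporate-owned. The `is_corp` column, the `corp_share` name, and the neighborhood labels are hypothetical. Other measures, such as a concentration index across individual owners, are equally possible.

```python
import pandas as pd

parcels = pd.DataFrame({
    "TAXYR": [2021, 2021, 2021, 2021, 2021, 2021],
    "neighborhood": ["NSA A", "NSA A", "NSA A", "NSA A", "NSA B", "NSA B"],
    "is_corp": [True, False, False, False, True, False],
})

# The mean of a boolean column is the fraction of True values.
corp_share = (
    parcels.groupby(["neighborhood", "TAXYR"])["is_corp"]
    .mean()
    .rename("corp_share")
)
```

Interacting `corp_share` with the per-sale corporate flag in a regression is one way to separate the effect of being a corp from the effect of high concentration.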

### Geospatial

### Do neighborhood characteristics predict equity loss

### Foreclosures?