# Equity Loss Analysis for Atlanta MSA

## Data Sources
- Fulton County digest parcel data from 2011 to 2022 (selected for LUC=101, SFHs), excel
- Fulton County digest parcel data for 2022 (for geocoding), geojson
- Fulton County sales data from 2011 to 2022, txt
- Atlanta Neighborhood Statistical Areas with supplemental data from Census (), 2022, csv from Neighborhood Nexus
- Neighborhood characteristics? unknown

**Note: NSAs in DeKalb are excluded because we do not have data for all years.**

Those neighborhoods are:
- Candler Park, Druid Hills
- Lake Claire
- East Lake
- Kirkwood
- Edgewood
- East Atlanta
- Emory University/Center for Disease Control
- Part of Morningside/Lenox Park

This leaves _ neighborhoods (see appendix for list)

## Areas of Analysis
- Corporate power in buying and purchasing (stat significance in purchasing price diff?)
- Corporate profits from rentals
- Corporate concentration
- Neighborhood characteristics?

- Sum of buying, selling -> all sales
- Sum of holding -> all parcels
- Create a cumulative measure and normalized by neighborhood context
- Take distribution of all sales to ind, corp and compare to see if statistically significant
- FLIPPING ACTIVITY
- Correlate to neighborhood characteristics
- Predict based on neighborhood characteristics
- Geospatial for each neighborhood
- Foreclosure rate 

In [37]:
import pandas as pd
import geopandas as gpd

pd.set_option('display.max_columns', 150)
pd.options.display.float_format = '{:.5f}'.format

### Data from process_data.ipynb

In [217]:
# All sales in Fulton for period, LUC == 101
fulton_sales_all = pd.read_parquet("./output/fulton_sales_all.parquet")
# Parcel data for every year and parcel in the period, LUC == 101
digest_full_geo_nbhd = pd.read_parquet("./output/digest_full_geo_nbhd.parquet")

### Initial, basic data cleaning for our research question

PARCEL: ---
- t

SALES: ---
- Only retain sales with valid saleval code (saleval=0)
- Drop sales with low sales price, indicating non-arm's-length transactions (handled by excluding saleval code T)

Notable Saleval codes:
- 0 = valid sale
- T = sale under $1000
- G = deed of gift
- 5 = Foreclosure
- 9 = Unvalidated/Deed stamps
- 3 = Remodeled after sale

Parcel data

In [256]:
# Investigate the cause of TAXYR, PARID duplicate keys
digest_full_geo_nbhd[digest_full_geo_nbhd.duplicated(subset=["TAXYR", "PARID"], keep=False)].sort_values(by=["TAXYR", "PARID"]).head(5)

Unnamed: 0,PARID,OBJECTID,geometry,TAXYR,Situs Adrno,Situs Adrdir,Situs Adrstr,Situs Adrsuf,Cityname,Luc,Calcacres,Own1,Own2,Owner Adrno,Owner Adradd,Owner Adrdir,Owner Adrstr,Owner Adrsuf,own_cityname,Statecode,own_zip,D Yrblt,D Effyr,D Yrremod,Sfla,neighborhood
1306124,06 031200010082,160985,"POLYGON ((-84.270547 33.960675, -84.270835 33....",2010,7615,,NESBIT FERRY,RD,SANDY SPRINGS,101,1.0055,MC BRIDE LAVONNE G & MICHELLE,,7615,,,NESBIT FERRY,RD,ATLANTA,GA,30350,1972,0,0,3975.0,
1306126,06 031200010082,160985,"POLYGON ((-84.270547 33.960675, -84.270835 33....",2010,7615,,NESBIT FERRY,RD,SANDY SPRINGS,101,1.0055,MC BRIDE LAVONNE G & MICHELLE,,7615,,,NESBIT FERRY,RD,ATLANTA,GA,30350,1974,0,0,1520.0,
1339237,06 031200030064,167046,"POLYGON ((-84.271076 33.962745, -84.271477 33....",2010,5020,,SPALDING,DR,SANDY SPRINGS,101,0.8254,GOLDBY FRANCES R & F SCOTT,,5020,,,SPALDING,DR,DUNWOODY,GA,30350,1973,0,0,3669.0,
1339238,06 031200030064,167046,"POLYGON ((-84.271076 33.962745, -84.271477 33....",2010,5020,,SPALDING,DR,SANDY SPRINGS,101,0.8254,GOLDBY FRANCES R & F SCOTT,,5020,,,SPALDING,DR,DUNWOODY,GA,30350,2002,0,0,1003.0,
1324623,06 0338 LL0241,165849,"POLYGON ((-84.300355 33.962623, -84.300056 33....",2010,2100,,DUNWOODY HERITAGE,DR,SANDY SPRINGS,101,1.607,PACETTI MICHAEL K & EILEEN H,,2100,,,DUNWOODY HERITAGE,DR,DUNWOODY,GA,30350,1962,0,0,667.0,


TAXYR, PARID duplicate keys appear to be caused by ADUs; since we are only investigating LUC=101 (detached single-family), this makes sense. Upon confirming from Google Maps, the properties above did have ADUs. The data shows each record refers to a different structure with a different year built and square footage.

We will simply take the row with the largest square footage. First, we can see below that the duplicates are not significant in number. Second, we are only interested in buying and selling activity, as well as rentals by corporates. If a parcel is purchased, all structures on the parcel are purchased. While corporates can rent out ADUs, we will later use Fair Market Value from the sales data to calculate rents, which covers the entire transaction.

In [None]:
init_len = len(digest_full_geo_nbhd)

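# Sort descending so that keep="first" retains the structure with the largest Sfla
digest_full_geo_nbhd = digest_full_geo_nbhd.sort_values(by="Sfla", ascending=False).drop_duplicates(subset=["TAXYR", "PARID"], keep="first")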

print(f"Number of dropped duplicates: {init_len - len(digest_full_geo_nbhd)}")

Sales data

In [226]:
# Count of each saleval code
fulton_sales_all.groupby("Saleval")["Saleval"].count().sort_values(ascending=False).head(10)

Saleval
0     117330
T      36730
G      23159
5      17287
M      13777
9      13131
3      10205
RE      8124
4       7015
4E      6112
Name: Saleval, dtype: int64

In [227]:
# Investigating foreclosure sales as those might be of interest
fulton_sales_all[fulton_sales_all["Saleval"] == "5"].sample(5)

Unnamed: 0,TAXYR,PARID,Luc,Saledt,SALES PRICE,FAIR MARKET VALUE,DEED TYPE,Costval,Saleval,GRANTOR,GRANTEE
33977,2011,14F0124 LL0825,101,07-SEP-2010,134199.0,85500.0,FD,85500,5,WILLIAMS SHARON DENISE,NATIONSTAR MORTGAGE LLC
24870,2012,14 024400010345,101,03-MAY-2011,11400.0,30800.0,DP,30800,5,DAVIS JAMES,BANK OF NEW YORK MELLON THE
20585,2011,14 006000010783,101,02-NOV-2010,40000.0,35300.0,DP,35300,5,BOYD BON,EQUITY TRUST COMPANY & FULL SPECTRUM
9498,2013,11 095100330221,101,04-DEC-2012,412295.0,277400.0,DP,277400,5,ROUSSELL VANESSA L,WELLS FARGO BANK NA
14256,2011,13 0094 LL0969,101,01-JUN-2010,10.0,52000.0,FD,55900,5,JONES CRAIG L,FEDERAL HOME LOAN MORTGAGE CORP


In [None]:
# Cleaning sales
fulton_sales_all = fulton_sales_all[fulton_sales_all["Saleval"] == "0"]

### Basic methodology to identify same owners (needed for next steps)
- Drop any rows without Owner Address
- Create an Owner Address (labeled: "owner_addr") column that is the concatenation of owner address number, owner address string, and owner zip.
- If the address string contains numbers, then it is a PO BOX. However, many are formatted differently, such as P O BOX 123, PO BOX 123, P.O. BOX 123, etc. We retain only the number from the address string and manually prepend PO BOX, so all have an identical format.
- Why: these values give us a highly accurate key for identifying the same owner. The owner address string does not contain suffixes like ST, AVE, etc. that might cause mismatches. Combined with owner address number and owner zip, we can say with high confidence that the address is the same while avoiding many common variations of the same address (ST vs STREET, etc.). This method is preferred over names, which have a higher chance of false positives; large corporations may also operate through differently named subsidiaries. This method may undercount if a company uses multiple addresses, but this is somewhat unlikely, and undercounting is an acceptable limitation: large investors (who would use different addresses) will own so many properties under each subsidiary that they will fall into the correct bin regardless.
- (The best metric would really be a radius, since neighborhood cutoffs are arbitrary; an investor may own the property next door while it falls in a different neighborhood.)
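As a compact illustration of the key-building steps above, here is a minimal sketch on toy data. The helper name `build_owner_addr` and the toy rows are hypothetical; only the column names (`Owner Adrno`, `Owner Adrstr`, `own_zip`) come from the digest data. It assumes that any street string containing `BOX` followed by a digit is a PO box.

```python
import pandas as pd

# Hypothetical toy rows; column names mirror the digest columns used in this notebook
toy = pd.DataFrame({
    "Owner Adrno": [0, 0, 0, 1717, 1717],
    "Owner Adrstr": ["P O BOX 650043", "PO BOX 650043", "P.O. BOX 650043", "MAIN", "MAIN"],
    "own_zip": ["75265", "75265", "75265", "75201", "75201"],
})

def build_owner_addr(df: pd.DataFrame) -> pd.Series:
    """Return a normalized 'number street zip' key (hypothetical helper)."""
    street = df["Owner Adrstr"].astype("string")
    # Assumption: "BOX" followed by a digit anywhere in the string marks a PO box
    is_po_box = street.str.contains(r"BOX.*[0-9]", regex=True, na=False)
    # Keep only the first run of digits and prepend a uniform "PO BOX " prefix
    box_number = street.str.extract(r"([0-9]+)")[0]
    street = street.mask(is_po_box, "PO BOX " + box_number)
    key = (
        df["Owner Adrno"].astype("string") + " "
        + street + " "
        + df["own_zip"].astype("string")
    )
    # Strip dots and commas, collapse repeated whitespace, and uppercase
    return (
        key.str.replace(r"[.,]+", "", regex=True)
        .str.replace(r"\s{2,}", " ", regex=True)
        .str.strip()
        .str.upper()
    )

toy["owner_addr"] = build_owner_addr(toy)
print(toy["owner_addr"].value_counts())
```

The three differently formatted PO box rows collapse to a single key, which is the behavior the grouping steps rely on.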

In [19]:
# Drop rows without an owner address
init_len = len(digest_full_geo_nbhd)
digest_full_geo_nbhd = digest_full_geo_nbhd[digest_full_geo_nbhd["Owner Adrstr"] != ""]
print(f"Number of empty addresses dropped: {init_len - len(digest_full_geo_nbhd)}")

Number of empty addresses dropped: 3571


In [20]:
# Demonstration of PO BOX issue
re_letters_then_numbers = r"^[a-zA-Z ]*[0-9]+"

digest_full_geo_nbhd[
    digest_full_geo_nbhd["Owner Adrstr"].str.contains(re_letters_then_numbers, regex=True)
]["Owner Adrstr"].sample(5)

2404490      P O BOX 162
120563       P O BOX 273
2363527      P O BOX 872
1665146    PO BOX 420253
1696798    P O BOX 41090
Name: Owner Adrstr, dtype: string

In [21]:
# Demonstration that if an address string is a PO BOX, it contains "BOX" and numbers
re_contains_weird_box = r".*B\.O\.X.*"
re_box_and_numbers = r".*BOX.*[0-9].*"

print(len(digest_full_geo_nbhd[
    digest_full_geo_nbhd["Owner Adrstr"].str.contains(re_contains_weird_box, regex=True)
]["Owner Adrstr"]))

digest_full_geo_nbhd[
    digest_full_geo_nbhd["Owner Adrstr"].str.contains(re_box_and_numbers, regex=True)
]["Owner Adrstr"].sample(10)

0


1239581    P.O. BOX 450233
1785379       P O BOX 5122
147946       P.O. BOX 3372
1928186      PO BOX 110144
551749         P.O BOX 129
2239113      PO BOX 211574
1722633     P.O. BOX 93582
309186      P.O. BOX 20306
2454507      PO BOX 133114
2400509    P.O. BOX 161301
Name: Owner Adrstr, dtype: string

In [22]:
# Re-format PO BOXES
re_capture_numbers = r"([0-9]+)"
digest_full_geo_nbhd["mod_own_adrstr"] = digest_full_geo_nbhd["Owner Adrstr"].copy(deep=True)

mask = digest_full_geo_nbhd["mod_own_adrstr"].str.contains(re_box_and_numbers, regex=True)

digest_full_geo_nbhd.loc[mask, "mod_own_adrstr"] = "PO BOX " + digest_full_geo_nbhd.loc[
    mask, "mod_own_adrstr"
].str.extract(re_capture_numbers)[0]

In [23]:
re_po_box_no_number = r"^(?!.*\d)[P]+.* BOX.*"
digest_full_geo_nbhd[digest_full_geo_nbhd["mod_own_adrstr"].str.contains(
    re_po_box_no_number, regex=True
)][["Owner Adrno", "mod_own_adrstr"]]

Unnamed: 0,Owner Adrno,mod_own_adrstr
952415,2458,P.O. BOX
1107161,0,PO BOX
1639088,0,P O BOX
1639089,0,P O BOX
1639090,0,P O BOX
1693568,0,P O BOX NINETY TWO
1693569,0,P O BOX NINETY TWO
1693570,0,P O BOX NINETY TWO
1693571,0,P O BOX NINETY TWO
1693572,0,P O BOX NINETY TWO


There are not enough PO boxes without numbers (fewer than 30) to worry about accounting for this.

In [24]:
digest_full_geo_nbhd[["Owner Adrno", "Owner Adrstr", "mod_own_adrstr"]].sample(20)

Unnamed: 0,Owner Adrno,Owner Adrstr,mod_own_adrstr
925401,840,HIGHMEADE,HIGHMEADE
1031463,10891,BOSSIER,BOSSIER
2521609,445,BUSH,BUSH
2060929,729,WALDEN,WALDEN
856201,4272,BLUEHOUSE,BLUEHOUSE
460187,7125,HARBOUR LANDING,HARBOUR LANDING
2647252,170,PARK EAST,PARK EAST
1520181,1275,WESLEY,WESLEY
1079107,199,CAMDEN,CAMDEN
2311485,391,GRANT PARK,GRANT PARK


Appears to work as expected.

In [25]:
# Regex to clean by replacing dots, commas, and multiple spaces
# Also make all strings uppercase (they should be already)

re_dots_commas = r"[.,]+"
re_multiple_spaces = r"\s{2,}"

digest_full_geo_nbhd["owner_addr"] = (
    digest_full_geo_nbhd["Owner Adrno"].astype(str) + " " +
    digest_full_geo_nbhd["mod_own_adrstr"] + " " +
    digest_full_geo_nbhd["own_zip"]
).str.replace(
    re_dots_commas,
    "",
    regex=True
).str.replace(
    re_multiple_spaces,
    " ",
    regex=True
).str.upper()

In [26]:
# Let's validate the accuracy of this approach
digest_full_geo_nbhd.groupby(
    "owner_addr"
).agg(
    {
        "Own1": lambda x: list(x),
        "owner_addr": "count"
    }
).rename(
    columns={
        "owner_addr": "count"
    }
).sort_values(
    by="count",
    ascending=False
).head(5)

Unnamed: 0_level_0,Own1,count
owner_addr,Unnamed: 1_level_1,Unnamed: 2_level_1
3505 KOGER 30096,"[FYR SFR BORROWER LLC, FYR SFR BORROWER LLC, A...",2513
5001 PLAZA ON THE 78746,"[VM PRONTO LLC, CPI AMHERST SFR PROGRAM OWNER ...",2289
1717 MAIN 75201,"[2018 4 IH BORROWER LP, 2018 4 IH BORROWER LP,...",2022
0 PO BOX 650043 75265,"[FEDERAL NATIONAL MORTGAGE ASSOCIATION, FEDERA...",2008
901 MAIN 75202,"[2014 3 IH BORROWER L P, 2014 3 IH BORROWER L ...",1593


The full method (creating a cleaned owner_addr column to aggregate on) also appears to work as expected.

### Determine the scale of ownership for each parcel owner and year at the neighborhood, city, and county level; create an ownership table
E.g. each parcel will have a column with a sum and percent of properties owned by the parcel owner in the given neighborhood, ATL, and in Fulton county for that TAXYR.

Later we can put these into discrete bins if needed.
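The same count-and-percent calculation is repeated at each geographic level, so it can be expressed once. The sketch below is one possible way to do that, not the notebook's implementation: `ownership_share` is a hypothetical helper, and the toy parcels and neighborhood labels are made up. Output column names follow the naming pattern used in this notebook.

```python
import pandas as pd

def ownership_share(parcels: pd.DataFrame, extra_keys: list, label: str) -> pd.DataFrame:
    """Count parcels per owner within each (TAXYR + extra_keys) group and
    express the count as a percent of all parcels in that group.
    Hypothetical helper, not part of the original notebook."""
    group_keys = ["TAXYR"] + extra_keys
    owned = (
        parcels.groupby(group_keys + ["owner_addr"])
        .size()
        .reset_index(name=f"count_owned_{label}")
    )
    totals = (
        parcels.groupby(group_keys)
        .size()
        .reset_index(name=f"{label}_parcels_taxyr")
    )
    out = owned.merge(totals, on=group_keys, how="left")
    out[f"pct_owned_{label}"] = (
        out[f"count_owned_{label}"] / out[f"{label}_parcels_taxyr"] * 100
    )
    return out

# Toy parcels (made-up values): one owner holds two of four parcels
toy = pd.DataFrame({
    "TAXYR": [2022, 2022, 2022, 2022],
    "PARID": ["a", "b", "c", "d"],
    "owner_addr": ["1717 MAIN 75201", "1717 MAIN 75201", "10 ELM 30310", "12 OAK 30312"],
    "neighborhood": ["NBHD A", "NBHD B", "NBHD A", "NBHD A"],
})

fulton = ownership_share(toy, [], "fulton")
nbhd = ownership_share(toy, ["neighborhood"], "neighborhood")
print(fulton)
print(nbhd)
```

Note that `groupby` drops rows whose grouping key is missing by default, so parcels without a neighborhood would be excluded from the neighborhood-level table in this sketch.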

In [155]:
# Calculate number, percent of parcels owned in all of Fulton in each year

fulton_parcel_count_yr = pd.DataFrame(
    digest_full_geo_nbhd.groupby("TAXYR")["PARID"].count()
).rename(columns={
    "PARID": "fulton_parcels_taxyr"
})

all_fulton = digest_full_geo_nbhd.groupby(
    ["TAXYR", "owner_addr"]
).agg(
    {"owner_addr": "count"}
).rename(
    columns={"owner_addr": "count_owned_fulton"}
).reset_index().merge(
    fulton_parcel_count_yr,
    on="TAXYR",
    how="inner"
)

all_fulton["pct_owned_fulton"] = all_fulton["count_owned_fulton"] / all_fulton["fulton_parcels_taxyr"] * 100

# Calculate number, percent of parcels owned in ATL in each year

atl_parcels_only = digest_full_geo_nbhd[digest_full_geo_nbhd["neighborhood"].notna()]
atl_parcel_count_yr = pd.DataFrame(
    atl_parcels_only.groupby("TAXYR")["PARID"].count()
).rename(columns={
    "PARID": "atl_parcels_taxyr"
})

all_atl = atl_parcels_only.groupby(
    ["TAXYR", "owner_addr"]
).agg(
    {"owner_addr": "count"}
).rename(
    columns={"owner_addr": "count_owned_atl"}
).reset_index().merge(
    atl_parcel_count_yr,
    on="TAXYR",
    how="inner"
)

all_atl["pct_owned_atl"] = all_atl["count_owned_atl"] / all_atl["atl_parcels_taxyr"] * 100

# Calculate number, percent of parcels owned in the parcel's neighborhood in each year

nbhd_parcel_count_yr = pd.DataFrame(
    atl_parcels_only.groupby(
        ["neighborhood", "TAXYR"]
    )["PARID"].count()
).rename(columns={"PARID": "neighborhood_parcels_taxyr"}).reset_index()

all_neighborhood = atl_parcels_only.groupby(
    ["TAXYR", "owner_addr", "neighborhood"]
).agg(
    {"owner_addr": "count"}
).rename(
    columns={"owner_addr": "count_owned_neighborhood"}
).reset_index().merge(
    nbhd_parcel_count_yr,
    on=["TAXYR", "neighborhood"],
    how="inner"
)

all_neighborhood["pct_owned_neighborhood"] = all_neighborhood[
    "count_owned_neighborhood"
] / all_neighborhood[
    "neighborhood_parcels_taxyr"
] * 100

# Note: the inner merge keeps only owners with at least one ATL parcel in that TAXYR
all_ownership_levels = all_fulton.merge(
    all_atl, on=["TAXYR", "owner_addr"], how="inner"
).merge(
    all_neighborhood, on=["TAXYR", "owner_addr"], how="outer"
)

In [166]:
# TODO need to create an ownership table most likely (by year?)
all_ownership_levels.drop_duplicates(
    subset=["TAXYR", "owner_addr"]
).sort_values(
    "pct_owned_fulton", ascending=False
).merge(
    digest_full_geo_nbhd[["TAXYR", "owner_addr", "Own1"]],
    on=["TAXYR", "owner_addr"],
    how="inner"
).head(20)[["TAXYR", "owner_addr", "Own1", "count_owned_fulton", "pct_owned_fulton"]]

Unnamed: 0,TAXYR,owner_addr,Own1,count_owned_fulton,pct_owned_fulton
0,2022,5001 PLAZA ON THE 78746,VM PRONTO LLC,714,0.31826
1,2022,5001 PLAZA ON THE 78746,CPI AMHERST SFR PROGRAM OWNER L L C,714,0.31826
2,2022,5001 PLAZA ON THE 78746,ARVM 5 LLC,714,0.31826
3,2022,5001 PLAZA ON THE 78746,VM PRONTO LLC,714,0.31826
4,2022,5001 PLAZA ON THE 78746,SRMZ 4 ASSET COMPANY 1 LLC,714,0.31826
5,2022,5001 PLAZA ON THE 78746,ARVM 5 LLC,714,0.31826
6,2022,5001 PLAZA ON THE 78746,SAFARI TWO ASSET COMPANY LLC,714,0.31826
7,2022,5001 PLAZA ON THE 78746,CPI AMHERST SFR PROGRAM OWNER L L C,714,0.31826
8,2022,5001 PLAZA ON THE 78746,CPI AMHERST SFR PROGRAM OWNER LLC,714,0.31826
9,2022,5001 PLAZA ON THE 78746,BAF I LLC,714,0.31826


### Determine the scale of buying and selling activity for each owner at the neighborhood, city, and county level; create a sales activity table
E.g. each parcel will have a column with an aggregated sum and percent of properties purchased, sold, and overall activity by the parcel owner in the given neighborhood, ATL, and in Fulton county for that TAXYR.

Later we can put these into discrete bins if needed.

**Method**:
- Sales data does not contain buyer address. We can't simply use GRANTEE name, because names can differ for the same owner corporation (subsidiaries, typos). Instead we identify buyer (and seller, if needed) address by:
    - Match GRANTEE name to parcel data on GRANTEE = Own1 (owner name) and extract owner_addr for that TAXYR
    - Match GRANTOR name to parcel data on GRANTOR = Own1 (owner name) and extract owner_addr for the PREVIOUS TAXYR
    - For names where the GRANTEE or GRANTOR name doesn't match exactly (due to typos, etc.), we can simply take the owner_addr with the same method, but we should record how many cases there are (it is not significant)
    - See below for verification that the sales and parcel data can be correctly matched this way
- Aggregate sales for each year by their owner address, and identify the number of purchase, sale, and total transactions of that owner in the given TAXYR.
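The year alignment in the method above can be sketched on toy data as follows. This is an illustration under assumptions, not the final matching code: the names, addresses, and the `buyer_addr` / `seller_addr` columns are hypothetical, and PARID is included in the join keys on the assumption that a match should be restricted to the parcel being sold.

```python
import pandas as pd

# Toy parcel data (made-up values): the owner changes between TAXYR 2017 and 2018
parcels = pd.DataFrame({
    "TAXYR": [2017, 2018],
    "PARID": ["p1", "p1"],
    "Own1": ["SELLER A", "BUYER LLC"],
    "owner_addr": ["1 OLD 30310", "901 MAIN 75202"],
})
sales = pd.DataFrame({
    "TAXYR": [2018],
    "PARID": ["p1"],
    "GRANTOR": ["SELLER A"],
    "GRANTEE": ["BUYER LLC"],
})

# Buyer: GRANTEE should equal Own1 in the same TAXYR
buyer_lookup = parcels.rename(columns={"Own1": "GRANTEE", "owner_addr": "buyer_addr"})
matched = sales.merge(buyer_lookup, on=["TAXYR", "PARID", "GRANTEE"], how="left")

# Seller: GRANTOR should equal Own1 in the PREVIOUS TAXYR, so shift the parcel
# year forward by one before joining on the sale's TAXYR
seller_lookup = parcels.assign(TAXYR=parcels["TAXYR"] + 1).rename(
    columns={"Own1": "GRANTOR", "owner_addr": "seller_addr"}
)
matched = matched.merge(seller_lookup, on=["TAXYR", "PARID", "GRANTOR"], how="left")

# Record how many sales could not be matched on an exact name
print("unmatched buyers:", matched["buyer_addr"].isna().sum())
print("unmatched sellers:", matched["seller_addr"].isna().sum())
```

One caveat: some parcels are missing a digest year (for example, a parcel with 2015 and 2017 rows but no 2016 row), in which case the previous-year lookup finds nothing and the sale counts as unmatched on the seller side.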

In [None]:
# Minor cleaning on GRANTEE and GRANTOR
# Regex to clean by replacing dots, commas, and multiple spaces
# Also make all strings uppercase (they should be already)
re_dots_commas = r"[.,]+"
re_multiple_spaces = r"\s{2,}"

for col in ["GRANTEE", "GRANTOR"]:
    fulton_sales_all[col] = fulton_sales_all[col].str.replace(
        re_dots_commas, "", regex=True
    ).str.replace(
        re_multiple_spaces, " ", regex=True
    ).str.upper()

In [170]:
# Verify that GRANTEE = Own1 in TAXYR, GRANTOR = Own1 in PREVIOUS TAXYR
# where GRANTEE/GRANTOR from sale data, Own1 from parcel data.
# Take a random PARID with at least one sale, pull up its sales and parcel data, then compare

fulton_sales_all.sample(1)["PARID"]

24943    14 012400100182
Name: PARID, dtype: string

In [176]:
fulton_sales_all[fulton_sales_all["PARID"] == "14 012400100182"][["TAXYR", "PARID", "GRANTOR", "GRANTEE"]].sort_values(by="TAXYR")

Unnamed: 0,TAXYR,PARID,GRANTOR,GRANTEE
24481,2018,14 012400100182,BANNISTER EARLENE V.,HERSHBERGER JAMES
25280,2019,14 012400100182,HERSHBERGER JAMES,GREEN ENERGY LIGHTING LLC
24942,2020,14 012400100182,"PEACHTREE CITY FINANCIAL, LLC",DANLEY DEVELOPMENT GROUP INC
24943,2020,14 012400100182,GREEN ENERGY LIGHTING LLC,PEACHTREE CITY FINANCIAL LLC
29609,2022,14 012400100182,PEACHTREE CITY FINANCIAL LLC,DANLEY DEVELOPMENT GROUP INC
29610,2022,14 012400100182,DANLEY DEVELOPMENT GROUP INC,SCHNEIDER KRISTIN ANNE &


We want to identify the buyer and seller of each transaction and aggregate their activity. However, buyer and seller names can be inconsistent, and subsidiaries would not be counted as one owner.

We therefore want to use owner address, but the sales data does not include addresses.

- Try to match by name, PARID, and TAXYR
- If not, try to match with PARID and TAXYR, IF there is only one transaction in a given year for that property
- Else, try to match GRANTOR or GRANTEE name with any exact match in the parcel data; potentially use GA business registry data


In [177]:
digest_full_geo_nbhd[digest_full_geo_nbhd["PARID"] == "14 012400100182"][["TAXYR", "PARID", "Own1"]].sort_values(by="TAXYR")

Unnamed: 0,TAXYR,PARID,Own1
2469107,2010,14 012400100182,BANNISTER EARLENE V
2469106,2011,14 012400100182,BANNISTER EARLENE V
2469108,2012,14 012400100182,BANNISTER EARLENE V
2469109,2013,14 012400100182,BANNISTER EARLENE V
2469110,2014,14 012400100182,BANNISTER EARLENE V
2469104,2015,14 012400100182,BANNISTER EARLENE V
2469105,2017,14 012400100182,BANNISTER EARLENE V
2469111,2018,14 012400100182,HERSHBERGER JAMES
2469112,2019,14 012400100182,GREEN ENERGY LIGHTING LLC
2469113,2020,14 012400100182,DANLEY DEVELOPMENT GROUP INC


In [215]:
transactions_in_yr = pd.DataFrame(
    fulton_sales_all.groupby(["TAXYR", "PARID"])["PARID"].count()
).rename(columns={"PARID": "transcations_in_yr"})

fulton_sales_all = fulton_sales_all.merge(
    transactions_in_yr,
    on=["TAXYR", "PARID"],
    how="inner"
)

In [212]:
# only merge if the count of sales in the TAXYR for that parcel is <= 1
t = fulton_sales_all[fulton_sales_all["transcations_in_yr"] < 2].merge(
    digest_full_geo_nbhd[["TAXYR", "PARID", "Own1", "owner_addr"]],
    left_on=["TAXYR", "PARID"],
    right_on=["TAXYR", "PARID"],
    how="left"
)

In [None]:
t[t["Own1"].isna()]

In [213]:
t.sample(5)

Unnamed: 0,TAXYR,PARID,Luc,Saledt,SALES PRICE,FAIR MARKET VALUE,DEED TYPE,Costval,Saleval,GRANTOR,GRANTEE,transcations_in_yr,Own1,owner_addr
107073,2019,12 320208910270,101,16-MAR-2018,332000.0,174100.0,LW,174100,3,JORDAN LONGORIA AND HANNAH LONGORIA,CRAWFORD PATRICK &,1,CRAWFORD PATRICK &,9255 BRUMBELOW 30022
149089,2021,14 0231 LL2090,101,18-NOV-2020,206000.0,189700.0,LW,189700,0,DAVIS ALEESHA C.,SHERMAN BARRY,1,SHERMAN BARRY,3112 IMPERIAL 30311
45529,2014,17 0123 LL1732,101,31-JAN-2013,434000.0,395600.0,WD,395600,0,SIMITSES WILLIAM,JARMIN HEATHER E & JEREMIAH KENT,1,JARMIN HEATHER E & JEREMIAH KENT,1045 LANCASTER 30328
144739,2021,14 002100050035,101,01-JUL-2020,470000.0,336900.0,LW,336900,0,"GEOMATRIX, LLC",SEWELL SEAN OLIVER,1,SEWELL SEAN OLIVER,705 BRYAN 30312
36798,2013,22 415010280356,101,27-JUL-2012,400000.0,318000.0,WD,318000,0,WALLACE RONALD G & MARY K,STANFIELD ROBERT C & ELIZABETH M,1,STANFIELD ROBERT C & ELIZABETH M,13140 FREEMANVILLE 30004


In [196]:
fulton_sales_all[fulton_sales_all["PARID"] == "12 206004720791"][["TAXYR", "PARID", "GRANTOR", "GRANTEE"]].sort_values(by="TAXYR")

Unnamed: 0,TAXYR,PARID,GRANTOR,GRANTEE
10087,2021,12 206004720791,"SPH PROPERTY TWO, LLC, A DELAWARE LIMITE",KUROSAW YOSUKE CHRISTOPHER & KRISTIN
10088,2021,12 206004720791,GREENE DOUGLAS WELLS,SPH PROPERTY TWO LLC


In [194]:
digest_full_geo_nbhd[digest_full_geo_nbhd["PARID"] == "12 206004720791"][["TAXYR", "PARID", "Own1"]].sort_values(by="TAXYR")

Unnamed: 0,TAXYR,PARID,Own1
306511,2010,12 206004720791,GREENE DOUGLAS W
306512,2011,12 206004720791,GREENE DOUGLAS W
306514,2012,12 206004720791,GREENE DOUGLAS W
306513,2013,12 206004720791,GREENE DOUGLAS W
306515,2014,12 206004720791,GREENE DOUGLAS W
306516,2015,12 206004720791,GREENE DOUGLAS W
306517,2016,12 206004720791,GREENE DOUGLAS W
306518,2017,12 206004720791,GREENE DOUGLAS W
306519,2018,12 206004720791,GREENE DOUGLAS W
306520,2019,12 206004720791,GREENE DOUGLAS W


In the case of multiple sales in one TAXYR, the last purchaser appears to be recorded in the parcel data as the owner.
If we tried to match an earlier sale in that year, we would get the wrong owner address.
This is a problem because we want the purchaser address for each sale to properly account for flipping activity, for example. We can use the GA Business Registry. This won't account for individuals, but it is unlikely that individuals would be involved in multiple transactions on the same property in one year, and we don't care much about individuals. An individual without a corporation will almost certainly not have the capital to do this for many properties.

- Match to parcel data if exact owner name match
- If not exact match, AND there are not multiple sales for that TAXYR, use the owner from parcel data anyway
- Else: hard to determine, record count, maybe use GA Business Registry
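The tiered rule above could be implemented roughly as follows. This is a sketch on made-up data: the `match_type` labels, the `buyer_addr` column, and the `sales_in_yr` column name are hypothetical (the notebook's own per-year count column is named differently).

```python
import pandas as pd

# Toy data (made-up values): p2 has a misspelled buyer name, p3 was sold twice in one TAXYR
parcels = pd.DataFrame({
    "TAXYR": [2021, 2021, 2021],
    "PARID": ["p1", "p2", "p3"],
    "Own1": ["BUYER ONE LLC", "SMITH JOHN", "FINAL OWNER INC"],
    "owner_addr": ["901 MAIN 75202", "12 OAK 30312", "5 PINE 30318"],
})
sales = pd.DataFrame({
    "TAXYR": [2021, 2021, 2021, 2021],
    "PARID": ["p1", "p2", "p3", "p3"],
    "GRANTEE": ["BUYER ONE LLC", "SMITH JON", "FLIPPER LLC", "FINAL OWNER INC"],
})

# Number of sales recorded for each parcel in each TAXYR
sales["sales_in_yr"] = sales.groupby(["TAXYR", "PARID"])["PARID"].transform("count")
merged = sales.merge(parcels, on=["TAXYR", "PARID"], how="left")

has_parcel = merged["Own1"].notna()
exact = merged["GRANTEE"] == merged["Own1"]
single = merged["sales_in_yr"] == 1

# Tier 3 (default): cannot be resolved from parcel data alone
merged["match_type"] = "unresolved"
# Tier 2: name differs, but it is the only sale that year, so trust the parcel owner
merged.loc[single & ~exact & has_parcel, "match_type"] = "single_sale"
# Tier 1: exact owner name match
merged.loc[exact, "match_type"] = "exact_name"

# Only carry the parcel owner's address over for resolved sales
merged["buyer_addr"] = merged["owner_addr"].where(merged["match_type"] != "unresolved")
print(merged["match_type"].value_counts())
```

Counting the `unresolved` rows gives the number of sales that would need another source, such as the GA Business Registry.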

In [None]:
# Save output

### Drop parcels and sales where government institutions or banks are owners

### Identify corporate owners, create corp owner flag; understand distribution of corporate ownership size
- Should it be any corp owner in the period, or do they need to have bought in during the period?

### For each sale, create a dummy variable for each sale type: corp purchase from ind, ind purchase from ind, corp sale to ind, ind sale to ind (should be identical to the ind purchase from ind metric)
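A minimal sketch of how these dummies might be built, assuming a corporate flag is available for both parties. The flag here is derived from a name-token regex, which is purely an assumed heuristic (the method for identifying corporate owners is not yet decided in this notebook); the pattern, column names, and toy names are all hypothetical. A corp-to-corp dummy is added so that the four categories are exhaustive.

```python
import pandas as pd

# Assumed heuristic: a party is "corporate" if its name contains a common entity token.
# Spaced variants such as "L L C" and "L P" appear in names after dots are stripped.
CORP_PATTERN = r"\b(?:LLC|L L C|INC|LP|L P|CORP|LTD|TRUST)\b"

# Toy sales (made-up names)
sales = pd.DataFrame({
    "GRANTOR": ["SMITH JOHN", "SMITH JANE", "ABC HOMES LLC", "XYZ BORROWER L P"],
    "GRANTEE": ["ABC HOMES LLC", "DOE ALEX", "DOE ALEX", "ABC HOMES LLC"],
})

for col in ["GRANTOR", "GRANTEE"]:
    sales[f"{col.lower()}_is_corp"] = sales[col].str.contains(
        CORP_PATTERN, regex=True, na=False
    )

buyer_corp = sales["grantee_is_corp"]
seller_corp = sales["grantor_is_corp"]

# One dummy per sale type; each sale falls into exactly one category
sales["corp_purchase_from_ind"] = (buyer_corp & ~seller_corp).astype(int)
sales["ind_purchase_from_ind"] = (~buyer_corp & ~seller_corp).astype(int)
sales["corp_sale_to_ind"] = (~buyer_corp & seller_corp).astype(int)
sales["corp_sale_to_corp"] = (buyer_corp & seller_corp).astype(int)

print(sales.filter(like="_ind").sum())
```

A name-based flag will misclassify some parties (for example, individuals holding property through a trust), so the counts should be treated as approximate until the corporate flag is defined properly.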

### Agg each class of sale

### Get totals for Fulton then drop non-ATL and agg by neighborhoods, year, size of investor

### Track each property after purchase (or at all owned by corp during period), calculate rental income

### Normalized equity loss measure

### Statistical test to see if FMV - SP was significant between ind and corp (ANOVA) or regression

### Create a measure of corp concentration in neighborhood to use as metric for analysis - is it just being a corp that helps, or when there's high concentration?

### Geospatial

### Do neighborhood characteristics predict equity loss

### Foreclosures?