# Equity Loss Analysis for Atlanta MSA

## Data Sources
- Fulton County digest parcel data from 2011 to 2022 (selected for LUC=101, SFHs), excel
- Fulton County digest parcel data for 2022 (for geocoding), geojson
- Fulton County sales data from 2011 to 2022, txt
- Atlanta Neighborhood Statistical Areas with supplemental data from Census (), 2022, csv from Neighborhood Nexus
- Neighborhood characteristics? unknown

**Note: NSAs in DeKalb are excluded, we do not have data for all years**

Those neighborhoods are:
- Candler Park, Druid Hills
- Lake Claire
- East Lake
- Kirkwood
- Edgewood
- East Atlanta
- Emory University/Center for Disease Control
- Part of Morningside/Lenox Park

This leaves _ neighborhoods (see appendix for list)

## Areas of Analysis
- Corporate power in buying and purchasing (stat significance in purachsing price diff?)
- Corporate profits from rentals
- Corporate concentration
- Neighborhood characteristics?

- Sum of buying, selling -> all sales
- Sum of holding -> all parcels
- Create a cumulative measure and normalized by neighborhood context
- Take distribution of all sales to ind, corp and compare to see if statistically significant
- FLIPPING ACTIVITY
- Correlate to neighborhood characteristics
- Predict based on neighborhood characteristics
- Geospatial for each neighborhood
- Foreclosure rate 

In [75]:
import pandas as pd
import geopandas as gpd
import plotly.express as px

pd.set_option('display.max_columns', 150)
pd.options.display.float_format = '{:.5f}'.format

### Data from process_data.ipynb

In [2]:
# All sales in Fulton for period, LUC == 101
fulton_sales_all = pd.read_parquet("./output/fulton_sales_all.parquet")
# Parcel data for every year and parcel in the period, LUC == 101
digest_full_geo_nbhd = pd.read_parquet("./output/digest_full_geo_nbhd.parquet")

### Initial, basic data cleaning for our research question

PARCEL: ---
- t

SALES: ---
- Only retain sales with valid saleval code (saleval=0)
- Drop sales with low sales price, indicating non-arms length transcations (handled by excluding saleval code T)

Notable Saleval codes:
- 0 = valid sale
- T = sale under $1000
- G = deed of gift
- 5 = Foreclosure
- 9 = Unvalidated/Deed stamps
- 3 = Remodeled after sale (flipping)

Parcel data

In [3]:
# Investigate the cause of TAXYR, PARID duplicate keys
digest_full_geo_nbhd[digest_full_geo_nbhd.duplicated(subset=["TAXYR", "PARID"], keep=False)].sort_values(by=["TAXYR", "PARID"]).head(5)

Unnamed: 0,PARID,OBJECTID,geometry,TAXYR,Situs Adrno,Situs Adrdir,Situs Adrstr,Situs Adrsuf,Cityname,Luc,Calcacres,Own1,Own2,Owner Adrno,Owner Adradd,Owner Adrdir,Owner Adrstr,Owner Adrsuf,own_cityname,Statecode,own_zip,D Yrblt,D Effyr,D Yrremod,Sfla,neighborhood
1306124,06 031200010082,160985,"POLYGON ((-84.270547 33.960675, -84.270835 33....",2010,7615,,NESBIT FERRY,RD,SANDY SPRINGS,101,1.0055,MC BRIDE LAVONNE G & MICHELLE,,7615,,,NESBIT FERRY,RD,ATLANTA,GA,30350,1972,0,0,3975.0,
1306126,06 031200010082,160985,"POLYGON ((-84.270547 33.960675, -84.270835 33....",2010,7615,,NESBIT FERRY,RD,SANDY SPRINGS,101,1.0055,MC BRIDE LAVONNE G & MICHELLE,,7615,,,NESBIT FERRY,RD,ATLANTA,GA,30350,1974,0,0,1520.0,
1339237,06 031200030064,167046,"POLYGON ((-84.271076 33.962745, -84.271477 33....",2010,5020,,SPALDING,DR,SANDY SPRINGS,101,0.8254,GOLDBY FRANCES R & F SCOTT,,5020,,,SPALDING,DR,DUNWOODY,GA,30350,1973,0,0,3669.0,
1339238,06 031200030064,167046,"POLYGON ((-84.271076 33.962745, -84.271477 33....",2010,5020,,SPALDING,DR,SANDY SPRINGS,101,0.8254,GOLDBY FRANCES R & F SCOTT,,5020,,,SPALDING,DR,DUNWOODY,GA,30350,2002,0,0,1003.0,
1324623,06 0338 LL0241,165849,"POLYGON ((-84.300355 33.962623, -84.300056 33....",2010,2100,,DUNWOODY HERITAGE,DR,SANDY SPRINGS,101,1.607,PACETTI MICHAEL K & EILEEN H,,2100,,,DUNWOODY HERITAGE,DR,DUNWOODY,GA,30350,1962,0,0,667.0,


TAXYR, PARID duplicate keys appear to be caused by ADUs; since we are only investigating LUC=101 (detached single-family), this makes sense. Upon confirming from Google Maps, the properties above did have ADUs. The data shows each record refers to a different structure with a different year built and square footage.

We will simply take the row with the largest square footage. First, we can see below that they are not significant in number. Second, we are only interested buying and selling activity as well as rentals by corporates. If a parcel is purchased, all structures on the parcel are purchased. While corporates can rent out ADUs, we will later use Fair Market Value to from the sales data to calculate rents, which includes the entire transcation.

In [4]:
init_len = len(digest_full_geo_nbhd)

digest_full_geo_nbhd = digest_full_geo_nbhd.sort_values(by="Sfla").drop_duplicates(subset=["TAXYR", "PARID"], keep="first")

print(f"Number of dropped duplicates: {init_len - len(digest_full_geo_nbhd)}")

Number of dropped duplicates: 27680


Sales data

In [5]:
# Count of each saleval code
fulton_sales_all.groupby("Saleval")["Saleval"].count().sort_values(ascending=False).head(10)

Saleval
0     117330
T      36730
G      23159
5      17287
M      13777
9      13131
3      10205
RE      8124
4       7015
4E      6112
Name: Saleval, dtype: int64

In [6]:
# Investigating foreclosure sales as those might be of interest
fulton_sales_all[fulton_sales_all["Saleval"] == "5"].sample(5)

Unnamed: 0,TAXYR,PARID,Luc,Saledt,SALES PRICE,FAIR MARKET VALUE,DEED TYPE,Costval,Saleval,GRANTOR,GRANTEE
11152,2013,12 193403810391,101,05-JUN-2012,385589.0,378500.0,DP,378500,5,BRANTNER AARON A.,CITIMORTGAGE INC
28171,2012,17 002100010321,101,04-JAN-2011,272000.0,346700.0,FD,346700,5,OLGUIN JULIE & ELISEO JR,DEUTSCHE BANK NATIONAL TRUST CO
23091,2012,14 017900030516,101,10-MAY-2011,13000.0,35100.0,DP,35100,5,WARRES STEVEN,REGIONS BANK
29399,2012,17 004600030163,101,07-JUN-2011,247100.0,324500.0,DP,324500,5,SHACKELFORD ELISABETH C,NEISLER JASON
16957,2011,14 001200100724,101,21-JAN-2010,52800.0,115000.0,QC,115000,5,TREST INVESTMENTS LLC,HOWE KEVIN C


In [7]:
# Cleaning sales
fulton_sales_all = fulton_sales_all[fulton_sales_all["Saleval"] == "0"]

### Drop parcels and sales where government institutions or banks are owners

In [8]:
govt_keywords = ['FEDERAL'] # FANNIE AND FREDDIE MAE PUT FEDERAL IN THEIR NAMES
bank_keywords = [
    'BANK', 'MORTGAGE', 'LENDING', 'LOAN',
    'FINANCE', 'FUND', 'CREDIT', 'TRUST', 'SERVICES'
]
govt = []
banks = []

govt += fulton_sales_all[
    fulton_sales_all['GRANTEE'].apply(lambda x: any([key in str(x) for key in govt_keywords]))
]['GRANTEE'].unique().tolist() + fulton_sales_all[
    fulton_sales_all['GRANTOR'].apply(lambda x: any([key in str(x) for key in govt_keywords]))
]['GRANTOR'].unique().tolist() + digest_full_geo_nbhd[
    digest_full_geo_nbhd["Own1"].apply(lambda x: any([key in str(x) for key in govt_keywords]))
]['Own1'].unique().tolist()

banks += fulton_sales_all[
    fulton_sales_all['GRANTEE'].apply(lambda x: any([key in str(x) for key in bank_keywords]))
]['GRANTEE'].unique().tolist() + fulton_sales_all[
    fulton_sales_all['GRANTOR'].apply(lambda x: any([key in str(x) for key in bank_keywords]))
]['GRANTOR'].unique().tolist() + digest_full_geo_nbhd[
    digest_full_geo_nbhd["Own1"].apply(lambda x: any([key in str(x) for key in bank_keywords]))
]['Own1'].unique().tolist()

print("Sales")
print("Size before: ", fulton_sales_all.shape)
fulton_sales_all = fulton_sales_all[
    ~(
        fulton_sales_all['GRANTEE'].isin(govt + banks)
        | fulton_sales_all['GRANTOR'].isin(govt + banks)
    )
]
print("Size after: ", fulton_sales_all.shape)
print("")

print("Digest")
print("Size before: ", digest_full_geo_nbhd.shape)
digest_full_geo_nbhd = digest_full_geo_nbhd[
    ~(digest_full_geo_nbhd['Own1'].isin(govt + banks))
]
print("Size after: ", digest_full_geo_nbhd.shape)
print("")


Sales
Size before:  (117330, 11)
Size after:  (113325, 11)

Digest
Size before:  (2750929, 26)
Size after:  (2692761, 26)



In [9]:
with open("./output/govt.txt", "w") as f:
    f.write("\n".join(govt))
with open("./output/banks.txt", "w") as f:
    f.write("\n".join(banks))

### Basic methodology to identify same owners (needed for next steps)
- Drop any rows without Owner Address
- Create an Owner Address (labeled: "owner_addr") column that is the concatentation of owner address number, owner address string, and owner zip.
- If address string contains numbers, then it is a PO BOX. However, a lot are formatted in different ways, such as P O BOX 123, PO BOX 123, P.O. BOX 123, etc. We can only retain the number from the address string, and manually prepend PO BOX, so all will have an identical format.
- Why: these values get us a highly accurate key for same owner. Owner address string does not contain postfixes like ST, AVE, etc. that might cause issues. Combined with owner number and owner zip, we can say with high confidence that the address is the same while avoiding many common differences amongst the same address (ST vs STREET, etc.). This method is prefered over names which has a higher chance of false positive, and large corporations may operate with differently named subsidaries. This method may also undercount, if a company uses multiple addresses, but this is somewhat unlikely and undercounting is simply an acceptable limitation. It is acceptable since large investors (who would use different addresses) will own so many properties with each subsidary that it will be binned in the correct bin regardless.
- (really the best metric would be a radii, since neighborhood cutoffs are arbitrary and what if investor owns next door but its in a diff nbhd)

In [10]:
# Drop rows without an owner address
init_len = len(digest_full_geo_nbhd)
digest_full_geo_nbhd = digest_full_geo_nbhd[digest_full_geo_nbhd["Owner Adrstr"] != ""]
print(f"Number of empty addresses dropped: {init_len - len(digest_full_geo_nbhd)}")

Number of empty addresses dropped: 3462


In [11]:
# Demonstration of PO BOX issue
re_letters_then_numbers = r"^[a-zA-Z ]*[0-9]+"

digest_full_geo_nbhd[
    digest_full_geo_nbhd["Owner Adrstr"].str.contains(re_letters_then_numbers, regex=True)
]["Owner Adrstr"].sample(5)

2453021     P O BOX 98171
2156823    P O BOX 724453
2036155      P O BOX 1044
136812        14TH AVENUE
1640492      PO BOX 19696
Name: Owner Adrstr, dtype: string

In [12]:
# Demonstration that if an address string is a PO BOX, it contains "BOX" and numbers
re_contains_weird_box = r".*B\.O\.X.*"
re_box_and_numbers = r".*BOX.*[0-9].*"

print(len(digest_full_geo_nbhd[
    digest_full_geo_nbhd["Owner Adrstr"].str.contains(re_contains_weird_box, regex=True)
]["Owner Adrstr"]))

digest_full_geo_nbhd[
    digest_full_geo_nbhd["Owner Adrstr"].str.contains(re_box_and_numbers, regex=True)
]["Owner Adrstr"].sample(10)

0


2140789        P O BOX 1953
1782483        P.O. BOX 102
2477944       P O BOX 45402
164278        P.O. BOX 5888
1495334       P.O. BOX 4698
2469473      P O BOX 161074
2401794        P O BOX 5153
1794398        P.O. BOX 343
1705428    P.O. BOX 1601 76
1224699       P.O. BOX 8569
Name: Owner Adrstr, dtype: string

In [13]:
# Re-format PO BOXES
re_capture_numbers = r"([0-9]+)"
digest_full_geo_nbhd["mod_own_adrstr"] = digest_full_geo_nbhd["Owner Adrstr"].copy(deep=True)

mask = digest_full_geo_nbhd["mod_own_adrstr"].str.contains(re_box_and_numbers, regex=True)

digest_full_geo_nbhd.loc[mask, "mod_own_adrstr"] = "PO BOX " + digest_full_geo_nbhd.loc[
    mask, "mod_own_adrstr"
].str.extract(re_capture_numbers)[0]

In [14]:
re_po_box_no_number = r"^(?!.*\d)[P]+.* BOX.*"
digest_full_geo_nbhd[digest_full_geo_nbhd["mod_own_adrstr"].str.contains(
    re_po_box_no_number, regex=True
)][["Owner Adrno", "mod_own_adrstr"]]

Unnamed: 0,Owner Adrno,mod_own_adrstr
952415,2458,P.O. BOX
1693572,0,P O BOX NINETY TWO
1693571,0,P O BOX NINETY TWO
1693570,0,P O BOX NINETY TWO
1693569,0,P O BOX NINETY TWO
1693568,0,P O BOX NINETY TWO
1639089,0,P O BOX
1639088,0,P O BOX
1639090,0,P O BOX
1948464,0,P.O. BOX


There's not enough PO Boxes without numbers (less than 30) to worry about accounting for this.

In [15]:
digest_full_geo_nbhd[["Owner Adrno", "Owner Adrstr", "mod_own_adrstr"]].sample(20)

Unnamed: 0,Owner Adrno,Owner Adrstr,mod_own_adrstr
1345216,8075,HABERSHAM WATERS,HABERSHAM WATERS
454925,3604,PICKERAL,PICKERAL
2738993,1938,PERRY,PERRY
903250,640,SOUTH PRESTON,SOUTH PRESTON
1903799,3556,LEIGH,LEIGH
552082,15275,WHITE COLUMNS,WHITE COLUMNS
1897002,2716,RIGGS,RIGGS
2402459,1825,VALENCE,VALENCE
1682727,25,HOLLY,HOLLY
886066,3305,WATERS MILL,WATERS MILL


Appears to work as expected.

In [16]:
# Regex to clean by replacing dots, commas, and multiple spaces
# Also make all strings uppercase (they should be already)

re_dots_commas = r"[.,]+"
re_multiple_spaces = r"\s{2,}"

digest_full_geo_nbhd["owner_addr"] = (
    digest_full_geo_nbhd["Owner Adrno"].astype(str) + " " +
    digest_full_geo_nbhd["mod_own_adrstr"] + " " +
    digest_full_geo_nbhd["own_zip"]
).str.replace(
    re_dots_commas,
    "",
    regex=True
).str.replace(
    re_multiple_spaces,
    " ",
    regex=True
).str.upper()

In [17]:
# Lets validate the accuracy of this approach
digest_full_geo_nbhd.groupby(
    "owner_addr"
).agg(
    {
        "Own1": lambda x: list(x),
        "owner_addr": "count"
    }
).rename(
    columns={
        "owner_addr": "count"
    }
).sort_values(
    by="count",
    ascending=False
).head(5)

Unnamed: 0_level_0,Own1,count
owner_addr,Unnamed: 1_level_1,Unnamed: 2_level_1
3505 KOGER 30096,"[RNTR 3 LLC, RNTR 3 LLC, FYR SFR BORROWER LLC,...",2493
5001 PLAZA ON THE 78746,"[ALTO ASSET COMPANY 2 LLC, ALTO ASSET COMPANY ...",2289
1717 MAIN 75201,"[2018 3 IH BORROWER LP, 2018 3 IH BORROWER LP,...",2022
901 MAIN 75202,"[2015 3 IH2 BORROWER LP, 2015 3 IH2 BORROWER L...",1593
4400 WILL ROGERS 73108,"[SECRETARY OF HOUSING & URBAN DEV, SECRETARY O...",1518


Full method (creating a cleaned owner_addr column to aggreggate on) also appears to work as expected.

### Determine the scale of ownership for each parcel owner and year at the neighorhood, city, and county level; create an ownership table
E.g. each parcel will have a column with a sum and percent of properties owned by the parcel owner in the given neighborhood, ATL, and in Fulton county for that TAXYR.

Later we can put these into discrete bins if needed.

In [18]:
# Caculate number, percent of parcels owned in all of Fulton in each year

fulton_parcel_count_yr = pd.DataFrame(
    digest_full_geo_nbhd.groupby("TAXYR")["PARID"].count()
).rename(columns={
    "PARID": "fulton_parcels_taxyr"
})

all_fulton = digest_full_geo_nbhd.groupby(
    ["TAXYR", "owner_addr"]
).agg(
    {"owner_addr": "count"}
).rename(
    columns={"owner_addr": "count_owned_fulton"}
).reset_index().merge(
    fulton_parcel_count_yr,
    on="TAXYR",
    how="inner"
)

all_fulton["pct_owned_fulton"] = all_fulton["count_owned_fulton"] / all_fulton["fulton_parcels_taxyr"] * 100

# Caculate number, percent of parcels owned in ATL in each year

atl_parcels_only = digest_full_geo_nbhd[digest_full_geo_nbhd["neighborhood"].notna()]
atl_parcel_count_yr = pd.DataFrame(
    atl_parcels_only.groupby("TAXYR")["PARID"].count()
).rename(columns={
    "PARID": "atl_parcels_taxyr"
})

all_atl = atl_parcels_only.groupby(
    ["TAXYR", "owner_addr"]
).agg(
    {"owner_addr": "count"}
).rename(
    columns={"owner_addr": "count_owned_atl"}
).reset_index().merge(
    atl_parcel_count_yr,
    on="TAXYR",
    how="inner"
)

all_atl["pct_owned_atl"] = all_atl["count_owned_atl"] / all_atl["atl_parcels_taxyr"] * 100

# Caculate number, percent of parcels owned in the parcel's neighorbohood in each year

nbhd_parcel_count_yr = pd.DataFrame(
    atl_parcels_only.groupby(
        ["neighborhood", "TAXYR"]
    )["PARID"].count()
).rename(columns={"PARID": "neighborhood_parcels_taxyr"}).reset_index()

all_neighborhood = atl_parcels_only.groupby(
    ["TAXYR", "owner_addr", "neighborhood"]
).agg(
    {"owner_addr": "count"}
).rename(
    columns={"owner_addr": "count_owned_neighborhood"}
).reset_index().merge(
    nbhd_parcel_count_yr,
    on=["TAXYR", "neighborhood"],
    how="inner"
)

all_neighborhood["pct_owned_neighborhood"] = all_neighborhood[
    "count_owned_neighborhood"
] / all_neighborhood[
    "neighborhood_parcels_taxyr"
] * 100

all_ownership_levels = all_fulton.merge(
    all_atl, on=["TAXYR", "owner_addr"], how="inner"
).merge(
    all_neighborhood, on=["TAXYR", "owner_addr"], how="outer"
)

Now we have a table that lists each owner and their concentration ownership concentration for every TAXYR in each neighborhood, in Atlanta, and in Fulton

In [19]:
all_ownership_levels.sort_values(by="pct_owned_fulton", ascending=False).head(5)

Unnamed: 0,TAXYR,owner_addr,count_owned_fulton,fulton_parcels_taxyr,pct_owned_fulton,count_owned_atl,atl_parcels_taxyr,pct_owned_atl,neighborhood,count_owned_neighborhood,neighborhood_parcels_taxyr,pct_owned_neighborhood
886300,2022,5001 PLAZA ON THE 78746,714,216467,0.32984,101,75679,0.13346,South River Gardens,4,596,0.67114
886316,2022,5001 PLAZA ON THE 78746,714,216467,0.32984,101,75679,0.13346,"Fairburn Mays, Mays",1,250,0.4
886301,2022,5001 PLAZA ON THE 78746,714,216467,0.32984,101,75679,0.13346,Cascade Avenue/Road,1,835,0.11976
886302,2022,5001 PLAZA ON THE 78746,714,216467,0.32984,101,75679,0.13346,Grove Park,3,1623,0.18484
886303,2022,5001 PLAZA ON THE 78746,714,216467,0.32984,101,75679,0.13346,"Arlington Estates, Ben Hill, Butner/Tell, Elmc...",7,1229,0.56957


In [73]:
# Aggregate all names from parcel data that have been matched to the same address
same_owners = pd.DataFrame(digest_full_geo_nbhd.groupby("owner_addr")["Own1"].unique().apply(lambda x: ' - '.join(x))).merge(
    all_ownership_levels,
    on="owner_addr",
    how="inner"
).rename(columns={"Own1": "all_assoc_names"}).sort_values(by="pct_owned_fulton", ascending=False)[
    [
        'owner_addr', 'TAXYR', 'neighborhood', 'count_owned_neighborhood', 'neighborhood_parcels_taxyr', 'pct_owned_neighborhood',
        'count_owned_atl', 'atl_parcels_taxyr', 'pct_owned_atl', 'count_owned_fulton', 'fulton_parcels_taxyr', 'pct_owned_fulton', 'all_assoc_names'
    ]
]

In [74]:
# Save output
all_ownership_levels.to_csv("./output/all_ownership_levels.csv", index=False)
same_owners.to_csv("./output/same_owners.csv", index=False)

### Determine the scale of buying and selling activity for each owner at the neighborhood, city, and county level; create a sales activity table
E.g. each parcel will have a column with a aggregated sum and percent of properties purchased, sold, and overall activity by the parcel owner in the given neighborhood, ATL, and in Fulton county for that TAXYR.

Later we can put these into discrete bins if needed.

**Method**:
- Sales data does not contain buyer or seller address. We can't simply use GRANTEE or GRANTOR name, because names can be different for the same owner corporation (subsidaries, typos). Instead we identify buyer and seller address by:
    - Match GRANTEE name to parcel data on GRANTEE = Own1 (owner name) and extract owner_addr for CURRENT TAXYR
    - Match GRANTOR name to parcel data on GRANTOR = Own1 (owner name) and extract owner_addr for PREVIOUS TAXYR
    - For names where the GRANTEE or GRANTOR name doesn't match exactly (due to typos, etc.), we can take the owner_addr with the same method ONLY IF there was only one sale in the given TAXYR. In the case of multiple sales in one TAXYR, the last purchaser appears to be recorded in the parcel data as the owner (see evidence below); if we tried to match an earlier sale in that year, we would get the wrong owner address. This is a problem because we want the purchaser address for each sale to appropriately account for flipping activity for example.
    - Else, try to find an exact owner name match from all parcel data, not limited to PARID and TAXYR; use the first match if a match is found. Then try to match owner name with GA Business Registry data. Although this won't account for individuals, but its unlikely individuals would be involved in multiple transcations on same property in one year, and we don't care much about individuals. An individual without a corporation will almost definitely not have the capital to be doing this for many properties.
    - See below for verifcation that the sales and parcel data can be correctly matched this way
- In short:
    - Try to match by owner name, PARID, and TAXYR
    - If no match, get match from just PARID and TAXYR, ONLY IF there is a single transcation in the given TAXYR for that PARID
    - Else, try to find an exact owner name match from all parcel data, not limited to PARID and TAXYR; use the last match if a match is found (last because that is most recent address of company)
    - Maybe try GA Business REGISTRY -----
    - Where none of the above methods work, drop if total count is insignificant
- Aggregate sales for each year by their owner address, identify the number of purchase, sell, and total transcations of that owner in the given TAXYR.

In [21]:
# Minor cleaning on GRANTEE, GRANTOR, and Own1 (parcel data)
# Regex to clean by replacing dots, commas, and multiple spaces
# Also make all strings uppercase (they should be already)
re_dots_commas = r"[.,]+"
re_multiple_spaces = r"\s{2,}"

for col in ["GRANTEE", "GRANTOR"]:
    fulton_sales_all[col] = fulton_sales_all[col].str.replace(
        re_dots_commas, "", regex=True
    ).str.replace(
        re_multiple_spaces, " ", regex=True
    ).str.upper()
    
digest_full_geo_nbhd["Own1"] = digest_full_geo_nbhd["Own1"].str.replace(
    re_dots_commas, "", regex=True
).str.replace(
    re_multiple_spaces, " ", regex=True
).str.upper()

In [22]:
# Verify that GRANTEE = Own1 in TAXYR, GRANTOR = Own1 in PREVIOUS TAXYR
# where GRANTEE/GRANTOR from sale data, Own1 from parcel data.
# Take a random PARID with at least one sale, pull up its sales and parcel data, then compare

# Note: first time this was ran, sample was PARID="14 012400100182"
# fulton_sales_all.sample(1)["PARID"]
fulton_sales_all[fulton_sales_all["PARID"] == "14 012400100182"][["TAXYR", "PARID", "GRANTOR", "GRANTEE"]].sort_values(by="TAXYR")

Unnamed: 0,TAXYR,PARID,GRANTOR,GRANTEE
29610,2022,14 012400100182,DANLEY DEVELOPMENT GROUP INC,SCHNEIDER KRISTIN ANNE &


In [23]:
digest_full_geo_nbhd[digest_full_geo_nbhd["PARID"] == "14 012400100182"][["TAXYR", "PARID", "Own1"]].sort_values(by="TAXYR")

Unnamed: 0,TAXYR,PARID,Own1
2469107,2010,14 012400100182,BANNISTER EARLENE V
2469106,2011,14 012400100182,BANNISTER EARLENE V
2469108,2012,14 012400100182,BANNISTER EARLENE V
2469109,2013,14 012400100182,BANNISTER EARLENE V
2469110,2014,14 012400100182,BANNISTER EARLENE V
2469104,2015,14 012400100182,BANNISTER EARLENE V
2469105,2017,14 012400100182,BANNISTER EARLENE V
2469111,2018,14 012400100182,HERSHBERGER JAMES
2469112,2019,14 012400100182,GREEN ENERGY LIGHTING LLC
2469113,2020,14 012400100182,DANLEY DEVELOPMENT GROUP INC


In [24]:
fulton_sales_all.sample(5)

Unnamed: 0,TAXYR,PARID,Luc,Saledt,SALES PRICE,FAIR MARKET VALUE,DEED TYPE,Costval,Saleval,GRANTOR,GRANTEE
553,2018,06 038400011000,101,20-MAR-2017,459900.0,392500.0,WD,392500,0,TIMMERMAN IV HERBERT H,WILLIAMS LAUREN &
12631,2015,17 005600020401,101,12-SEP-2014,425000.0,210500.0,LW,210500,0,WALKER PAULA M,THIBODEAU SHANNON & CARL
49804,2019,22 479010531333,101,06-AUG-2018,734500.0,734500.0,LW,724600,0,BURNEY PATRICIA MASSEY,DU HOANG LAM & LE LIEN THIKIM
3687,2017,11 122004520614,101,31-OCT-2016,778940.0,674200.0,LW,674200,0,ASHTON ATLANTA RESIDENTIAL LLC,COLLIER CORLISS Y &
2341,2015,11 059102210138,101,10-MAR-2014,363000.0,356700.0,LW,356700,0,SATTERWHITE THOMAS S,SOLEM RANDALL


In [25]:
count_sales_yr = pd.DataFrame(
    fulton_sales_all.groupby(["TAXYR", "PARID"])["PARID"].count()
).rename(columns={"PARID": "count_sales_yr"})

fulton_sales_all = fulton_sales_all.merge(
    count_sales_yr,
    on=["TAXYR", "PARID"],
    how="inner"
)

more_than_one_sale_yr = len(
    fulton_sales_all[fulton_sales_all["count_sales_yr"] > 1].drop_duplicates(
        subset=["TAXYR", "PARID"]
    )
)
print(f"Count of properties that sold multiple times in one year: {more_than_one_sale_yr}")
count_sales_yr.sort_values(by="count_sales_yr", ascending=False).head(5)

Count of properties that sold multiple times in one year: 1177


Unnamed: 0_level_0,Unnamed: 1_level_0,count_sales_yr
TAXYR,PARID,Unnamed: 2_level_1
2014,14 008900040456,4
2013,14 003900070074,4
2014,14 003500030759,4
2011,14 016300160964,4
2015,14 015900040097,4


In [26]:
digest_df = digest_full_geo_nbhd[['PARID', 'TAXYR', 'owner_addr', 'Own1']].copy(deep=True)
# TODO change to use last
for person in ["GRANTEE", "GRANTOR"]:
    if person == "GRANTOR":
        digest_df["TAXYR"] = digest_df["TAXYR"] + 1
    matches = {
        "exact": {"left": ['PARID', 'TAXYR', person], "right": ['PARID', 'TAXYR', 'Own1']},
        "single_sale": {"left": ['PARID', 'TAXYR'], "right": ['PARID', 'TAXYR']},
        "only_exact_name": {"left": [person], "right": ['Own1']}
    }
    for match in matches:
    
        df = fulton_sales_all
        if match == "single_sale":
            df = df[df["count_sales_yr"] == 1]
            matched_df = df.merge(
                digest_df[['PARID', 'TAXYR', 'owner_addr', 'Own1']],
                on=matches[match]["left"],
                how='inner'
            )
        elif match == "only_exact_name":
            matched_df = df.merge(
                digest_df.drop_duplicates(subset=["Own1"], keep="last")[['Own1', 'owner_addr']],
                left_on=matches[match]["left"],
                right_on=matches[match]["right"],
                how='inner'
            )
        else:
            matched_df = df.merge(
                digest_df[['PARID', 'TAXYR', 'owner_addr', 'Own1']],
                left_on=matches[match]["left"],
                right_on=matches[match]["right"],
                how='inner'
            )
        
        if match == "only_exact_name":
            fulton_sales_all = fulton_sales_all.merge(
                matched_df[["Own1", "owner_addr"]].drop_duplicates(),
                left_on=matches[match]["left"],
                right_on=matches[match]["right"],
                how="left"
            ).rename(
                columns={"Own1": f"{person}_{match}", "owner_addr": f"{person}_{match}_addr"}
            )
        else:
            fulton_sales_all = fulton_sales_all.merge(
                matched_df[["TAXYR", "PARID", "Own1", "owner_addr"]],
                on=["TAXYR", "PARID"],
                how="left"
            ).rename(
                columns={"Own1": f"{person}_{match}", "owner_addr": f"{person}_{match}_addr"}
            )
        
        display(fulton_sales_all[["TAXYR", "PARID", f"{person}", f"{person}_{match}"]].sample(5))

Unnamed: 0,TAXYR,PARID,GRANTEE,GRANTEE_exact
56020,2018,11 103203640726,MAYER CHRISTOPHER P &,MAYER CHRISTOPHER P &
102423,2022,11 044001621403,MOSINDI GLORY I,
76246,2020,09F070000263648,CARSON VICTOR,CARSON VICTOR
84977,2020,17 011100080101,WILLIAMS JUSTIN & JESSICA,WILLIAMS JUSTIN & JESSICA
19614,2014,11 090103210223,PERALTA ANTHONY,


Unnamed: 0,TAXYR,PARID,GRANTEE,GRANTEE_single_sale
66439,2019,11 046701910473,KELLY PATRICIA,KELLY PATRICIA &
89218,2021,09F410101720072,BERKLEY RAMON A,BERKLEY RAMON A
92395,2021,13 0193 LL1009,SHAHID YAASMEEN,SHAHID YAASMEEN
30671,2015,12 181103440577,ADAIR CANDACE L,ADAIR CANDACE L
3444,2011,14F0128 LL1696,MASON HERMON L,MASON HERMON L


Unnamed: 0,TAXYR,PARID,GRANTEE,GRANTEE_only_exact_name
34935,2015,17 001100010224,SMITH WILLIAM CALVIN III &,SMITH WILLIAM CALVIN III &
62872,2018,21 575011920474,HOLLOWAY CELENA,HOLLOWAY CELENA
95507,2021,14F0031 LL0983,EVBUOMWAN TYRONE,EVBUOMWAN TYRONE
7562,2012,14 014100030776,BATTLE FRANCINA,BATTLE FRANCINA
12822,2013,13 012900020502,JALALA LLC,JALALA LLC


Unnamed: 0,TAXYR,PARID,GRANTOR,GRANTOR_exact
24810,2014,17 003500030182,DOMINICK CHRISTIAN & JESSICA D,
3177,2011,14 022400010190,VALDIVIA ENRIQUE,VALDIVIA ENRIQUE
76386,2020,09F070000262145,URREGO IVAN ARMANDO &,URREGO IVAN ARMANDO &
26363,2014,17 0184 LL0227,DUVAL-ARNOULD ALEX,
22030,2014,14 003500030759,RBP LLC,MARKETING ADVANTAGE GROUP LLC


Unnamed: 0,TAXYR,PARID,GRANTOR,GRANTOR_single_sale
48000,2017,14 011700050253,EUPHORBIA PROPERTIES LLC,EUPHORBIA PROPERTIES LLC
97893,2021,17 015200020073,LYON ROBERT D,LYON ROBERT D
9699,2012,22 325010070253,MURPHY JAMES & BERTHA ALICIA,MURPHY JAMES & BERTHA ALICIA
12808,2013,13 006700010606,STOVALL LEONARD CHARLES,STOVALL LEONARD C & DENISE J
33221,2015,14 010300010238,WHITE SANDY,WHITE SANDY


Unnamed: 0,TAXYR,PARID,GRANTOR,GRANTOR_only_exact_name
97467,2021,17 010500120129,MICHAEL D HAYFORD & JILL K HAYFORD 1997,
26604,2014,17 022600070500,SOUTHERN DEVELOPMENT GROUP LLC,
55329,2018,11 007200130237,NEETA MIRPURI AKA NEETA SANDERS AND DAVI,
72694,2019,17 008800030261,OSTERMANN GAIL S,
47202,2017,14 004300020156,TURNER BRADLEY,TURNER BRADLEY


In [27]:
for person in ["GRANTEE", "GRANTOR"]:
    print(f"Person: {person} ---")
    fulton_sales_all[f"{person}_match"] = fulton_sales_all[f"{person}_exact"]
    fulton_sales_all[f"{person}_match_addr"] = fulton_sales_all[f"{person}_exact_addr"]
    num_matched = len(fulton_sales_all[fulton_sales_all[f'{person}_match'].notna()])
    print(f"Number exact matched: {num_matched}")
    print(f"Pct exact matched: {num_matched / len(fulton_sales_all)}")
    print("")
    
    for match in ["single_sale", "only_exact_name"]:
        fulton_sales_all[f"{person}_match"] = fulton_sales_all[f"{person}_match"].fillna(
            fulton_sales_all[f"{person}_{match}"]
        )
        fulton_sales_all[f"{person}_match_addr"] = fulton_sales_all[f"{person}_match_addr"].fillna(
            fulton_sales_all[f"{person}_{match}_addr"]
        )
        prev_matched = num_matched
        num_matched = len(fulton_sales_all[fulton_sales_all[f'{person}_match'].notna()])
        print(f"Number of additional matches with {match}: {num_matched - prev_matched}")
        print(f"Number prev matches + {match} matched: {num_matched}")
        print(f"Pct prev matches + {match} matched: {num_matched / len(fulton_sales_all)}")
        print("")
    
    print("")
    print("")

Person: GRANTEE ---
Number exact matched: 101370
Pct exact matched: 0.8927108927108927

Number of additional matches with single_sale: 10960
Number prev matches + single_sale matched: 112330
Pct prev matches + single_sale matched: 0.9892296989071183

Number of additional matches with only_exact_name: 430
Number prev matches + only_exact_name matched: 112760
Pct prev matches + only_exact_name matched: 0.9930164768874447



Person: GRANTOR ---
Number exact matched: 44225
Pct exact matched: 0.3894657120463572

Number of additional matches with single_sale: 54605
Number prev matches + single_sale matched: 98830
Pct prev matches + single_sale matched: 0.8703424832457091

Number of additional matches with only_exact_name: 10150
Number prev matches + only_exact_name matched: 108980
Pct prev matches + only_exact_name matched: 0.95972805650225





Let's briefly investigate and see if the low percent of exact GRANTOR matches is problematic.

In [28]:
fulton_sales_all.sample(15)[["TAXYR", "PARID", "GRANTOR", "GRANTOR_match"]].sort_values(by="TAXYR")

Unnamed: 0,TAXYR,PARID,GRANTOR,GRANTOR_match
20032,2014,12 156103060286,GOETERS DONALD WILLIAM,GOETERS DONALD W & ELIZABETH R
28480,2015,07 150001405503,D R HORTON INC,D R HORTON INC
49089,2017,14 022900010260,GENESIS CONSTRUCTION AND CONSULTING GROU,HUTTO PATRICIA B
62489,2018,17 0228 LL0662,BROCK BUILT HOMES LLC,BROCK BUILT HOMES LLC
59885,2018,14F001600060084,MARCHEL JOHN EDWARD,MARCHEL JOHN EDWARD
71403,2019,14F006400040046,LAMBERT KEISHA D,LAMBERT KEISHA D
72743,2019,17 009300020299,58 SS LLC A GEORGIA LIMITED LIABILITY,FIFTY EIGHT SS LLC
90773,2021,12 160102360440,SINDELAR DOUGLAS KENNETH,SINDELAR DOUGLAS KENNETH & JODI LYNN
89828,2021,11 047001881968,CURRAN TIMOTHY F,CURRAN TIMOTHY F
106863,2022,14 011600080293,VUONG HOMES LLC,VUONG HOMES LLC


**Investigate PARID = "14 015200120227" for 2017 where GRANTOR = KINGDOM REALTY LLC and GRANTOR_match = HUNTER TROY H JR**

In [29]:
def check_parid(parid: str):
    # Where did we get the match from?
    print("Sales data with match info")
    display(fulton_sales_all[fulton_sales_all["PARID"] == parid][
        [
            "TAXYR", "PARID", "Saledt", "GRANTEE", "GRANTOR", "GRANTOR_match", "GRANTOR_exact",
            "GRANTOR_single_sale", "GRANTOR_only_exact_name", "GRANTOR_match_addr", "SALES PRICE"
        ]
    ].sort_values(by="TAXYR"))
    print("")
    print("Digest data with parcel info")
    display(digest_full_geo_nbhd[digest_full_geo_nbhd["PARID"] == parid][
        [
            "PARID", "TAXYR", "Own1", "Own2", "owner_addr"
        ]
    ].sort_values(by="TAXYR"))

In [30]:
check_parid("14 015200120227")

Sales data with match info


Unnamed: 0,TAXYR,PARID,Saledt,GRANTEE,GRANTOR,GRANTOR_match,GRANTOR_exact,GRANTOR_single_sale,GRANTOR_only_exact_name,GRANTOR_match_addr,SALES PRICE
48536,2017,14 015200120227,16-JUN-2016,DIVINE DREAM HOMES LLC,KINGDOM REALTY LLC,HUNTER TROY H JR,,HUNTER TROY H JR,,1306 LOCKHAVEN 30311,39000.0



Digest data with parcel info


Unnamed: 0,PARID,TAXYR,Own1,Own2,owner_addr
1818625,14 015200120227,2010,HUNTER TROY H JR,,1306 LOCKHAVEN 30311
1818626,14 015200120227,2011,HUNTER TROY H JR,,1306 LOCKHAVEN 30311
1818627,14 015200120227,2012,HUNTER TROY H JR,,1306 LOCKHAVEN 30311
1818628,14 015200120227,2013,HUNTER TROY H JR,,1306 LOCKHAVEN 30311
1818629,14 015200120227,2014,HUNTER TROY H JR,,1306 LOCKHAVEN 30311
1818623,14 015200120227,2015,HUNTER TROY H JR,,1306 LOCKHAVEN 30311
1818635,14 015200120227,2016,HUNTER TROY H JR,,1306 LOCKHAVEN 30311
1818624,14 015200120227,2017,DIVINE DREAM HOMES LLC,,2345 CAREY 30315
1818630,14 015200120227,2018,DIVINE DREAM HOMES LLC,,2345 CAREY 30315
1818631,14 015200120227,2019,DIVINE DREAM HOMES LLC,,2345 CAREY 30315


KINGDOM REALTY LLC website: "With our proprietary marketing systems, we find the best properties in foreclosure, bank owned foreclosures, Metro Atlanta investment properties for sale, handyman deals, fixer uppers, discount homes, distressed property, and buy them at great win-win prices for both us and the home seller."

**Investigate PARID = "14 007500040379" for 2019 where GRANTOR = PEACHTREE ASSET MANAGEMENT LLC and GRANTOR_match = ATL 700 800 BLOCK HOLDINGS LLC**

In [31]:
check_parid("14 007500040379")

Sales data with match info


Unnamed: 0,TAXYR,PARID,Saledt,GRANTEE,GRANTOR,GRANTOR_match,GRANTOR_exact,GRANTOR_single_sale,GRANTOR_only_exact_name,GRANTOR_match_addr,SALES PRICE
69812,2019,14 007500040379,12-OCT-2018,SUTIC MILJAN,PEACHTREE ASSET MANAGEMENT LLC,ATL 700 800 BLOCK HOLDINGS LLC,,ATL 700 800 BLOCK HOLDINGS LLC,PEACHTREE ASSET MANAGEMENT LLC,2203 CUMBERLAND 30339,235000.0
93636,2021,14 007500040379,24-JUL-2020,WALKER LAURIE,SUTIC MILJAN,SUTIC MILJAN,SUTIC MILJAN,SUTIC MILJAN,SUTIC MILJAN,581 FORMWALT 30312,330000.0



Digest data with parcel info


Unnamed: 0,PARID,TAXYR,Own1,Own2,owner_addr
2774987,14 007500040379,2010,CMB HOMES LLC,,117 OAKWIND POINTE 30101
2774986,14 007500040379,2011,ATL 700 800 BLOCK HOLDINGS LLC,,117 OAKWIND POINTE 30101
2774988,14 007500040379,2012,ATL 700 800 BLOCK HOLDINGS LLC,,117 OAKWIND POINTE 30101
2774989,14 007500040379,2013,ATL 700 800 BLOCK HOLDINGS LLC,,5033 COLCHESTER 30080
2774990,14 007500040379,2014,ATL 700 800 BLOCK HOLDINGS LLC,,2203 CUMBERLAND 30339
2774984,14 007500040379,2015,ATL 700 800 BLOCK HOLDINGS LLC,,2203 CUMBERLAND 30339
2774996,14 007500040379,2016,ATL 700 800 BLOCK HOLDINGS LLC,,2203 CUMBERLAND 30339
2774985,14 007500040379,2017,ATL 700 800 BLOCK HOLDINGS LLC,,2203 CUMBERLAND 30339
2774991,14 007500040379,2018,ATL 700 800 BLOCK HOLDINGS LLC,,2203 CUMBERLAND 30339
2774992,14 007500040379,2019,SUTIC MILJAN,,581 FORMWALT 30312


In [32]:
digest_full_geo_nbhd[digest_full_geo_nbhd["Own1"] == "PEACHTREE ASSET MANAGEMENT LLC"][["PARID", "TAXYR", "Own1", "Own2", "owner_addr"]].sample(5).sort_values(by="TAXYR")

Unnamed: 0,PARID,TAXYR,Own1,Own2,owner_addr
2380629,14 008600031201,2014,PEACHTREE ASSET MANAGEMENT LLC,,5033 COLCHESTER 30080
2377590,14 007500040213,2017,PEACHTREE ASSET MANAGEMENT LLC,,2203 CUMBERLAND 30339
2377713,14 007500040825,2017,PEACHTREE ASSET MANAGEMENT LLC,,2203 CUMBERLAND 30339
2380624,14 008600031201,2017,PEACHTREE ASSET MANAGEMENT LLC,,5033 COLCHESTER 30080
2377597,14 007500040213,2019,PEACHTREE ASSET MANAGEMENT LLC,,2203 CUMBERLAND 30339


PEACHTREE ASSET MANAGEMENT LLC uses two addresses; ATL 700 800 BLOCK HOLDINGS LLC uses the same two addresses. They can be treated as the same entity based on address, and are by this methodology.

**Investigate PARID = "11 108003863303" for 2022 where GRANTOR = TPG HOMES AT BELLMORE LLC	and GRANTOR_match = JOHNS CREEK 206 LLC**

In [33]:
check_parid("11 108003863303")

Sales data with match info


Unnamed: 0,TAXYR,PARID,Saledt,GRANTEE,GRANTOR,GRANTOR_match,GRANTOR_exact,GRANTOR_single_sale,GRANTOR_only_exact_name,GRANTOR_match_addr,SALES PRICE
103206,2022,11 108003863303,29-JUN-2021,MOGAL GHOUSE BAIG,TPG HOMES AT BELLMORE LLC,JOHNS CREEK 206 LLC,,JOHNS CREEK 206 LLC,TPG HOMES AT BELLMORE LLC,3131 HARVARD 75205,517299.0



Digest data with parcel info


Unnamed: 0,PARID,TAXYR,Own1,Own2,owner_addr
2771936,11 108003863303,2021,JOHNS CREEK 206 LLC,,3131 HARVARD 75205
2771937,11 108003863303,2022,MOGAL GHOUSE BAIG,,1447 CALVERT 30097


JOHNS CREEK 206 LLC owned the property but TPG HOMES AT BELLMORE LLC carried out the transcation

In [34]:
digest_full_geo_nbhd[digest_full_geo_nbhd["Own1"] == "TPG HOMES AT BELLMORE LLC"][["PARID", "TAXYR", "Own1", "Own2", "owner_addr"]].sort_values(by="TAXYR")

Unnamed: 0,PARID,TAXYR,Own1,Own2,owner_addr
2741742,11 108003952064,2019,TPG HOMES AT BELLMORE LLC,,11340 LAKEFIELD 30097
2758795,11 108003852793,2019,TPG HOMES AT BELLMORE LLC,,11340 LAKEFIELD 30097


In [35]:
fulton_sales_all[fulton_sales_all["GRANTOR_match_addr"] == "11340 LAKEFIELD 30097"][["TAXYR", "PARID", "GRANTOR", "GRANTOR_match", "GRANTOR_match_addr"]].sample(5).sort_values(by="TAXYR")

Unnamed: 0,TAXYR,PARID,GRANTOR,GRANTOR_match,GRANTOR_match_addr
56192,2018,11 108003951637,TPG HOMES AT BELLMOORE LLC,TPG HOMES AT BELLMOORE L L C,11340 LAKEFIELD 30097
68209,2019,12 270307481196,THE PROVIDENCE GROUP OF GEORGIA CUSTOM H,THE PROVIDENCE GROUP OF GEORGIA CUSTOM H,11340 LAKEFIELD 30097
90369,2021,11 108003951397,TPG HOMES AT BELLMOORE LLC,TPG HOMES AT BELLMOORE LLC,11340 LAKEFIELD 30097
90484,2021,11 114003951619,TPG HOMES AT BELLMOORE LLC,TPG HOMES AT BELLMOORE L L C,11340 LAKEFIELD 30097
103313,2022,11 114004171886,TPG HOMES AT BELLMOORE LLC,TPG HOMES AT BELLMOORE LLC,11340 LAKEFIELD 30097


TPG HOMES AT BELLMORE LLC has many different entities being picked up together as the same address; the case with JOHNS CREEK 206 LLC is a bit strange

In [36]:
digest_full_geo_nbhd[digest_full_geo_nbhd["Own1"] == "JOHNS CREEK 206 LLC"][["PARID", "TAXYR", "Own1", "Own2", "owner_addr"]].sample(5).sort_values(by="TAXYR")

Unnamed: 0,PARID,TAXYR,Own1,Own2,owner_addr
2758749,11 108003852439,2019,JOHNS CREEK 206 LLC,,3131 HARVARD 75205
2741746,11 108003952007,2019,JOHNS CREEK 206 LLC,,3131 HARVARD 75205
2741697,11 114004081499,2019,JOHNS CREEK 206 LLC,,3131 HARVARD 75205
2771974,11 108003853338,2021,JOHNS CREEK 206 LLC,,3131 HARVARD 75205
2771960,11 108003853254,2021,JOHNS CREEK 206 LLC,,3131 HARVARD 75205


In [37]:
check_parid("11 108003852546")

Sales data with match info


Unnamed: 0,TAXYR,PARID,Saledt,GRANTEE,GRANTOR,GRANTOR_match,GRANTOR_exact,GRANTOR_single_sale,GRANTOR_only_exact_name,GRANTOR_match_addr,SALES PRICE
78326,2020,11 108003852546,26-JUN-2019,SPARKS NICOLE,TPG HOMES AT BELLMOORE LLC,JOHNS CREEK 206 LLC,,JOHNS CREEK 206 LLC,TPG HOMES AT BELLMOORE LLC,3131 HARVARD 75205,627692.0



Digest data with parcel info


Unnamed: 0,PARID,TAXYR,Own1,Own2,owner_addr
2758930,11 108003852546,2019,JOHNS CREEK 206 LLC,,3131 HARVARD 75205
2758931,11 108003852546,2020,SPARKS NICOLE & TIMOTHY,,1189 HANNAFORD 30097
2758932,11 108003852546,2021,SPARKS NICOLE & TIMOTHY,,1189 HANNAFORD 30097
2758933,11 108003852546,2022,SPARKS NICOLE & TIMOTHY,,1189 HANNAFORD 30097


Randomly pulling up another parcel owned by JOHNS CREEK 206 LLC, we can see it was sold to TPG HOMES AT BELLMORE LLC; these companies likely have some sort of uncaptured relationship.

**Investigate PARID = "17 004200030217" for 2010 where GRANTOR = GAKSTATTER FRED VOLKER	and GRANTOR_match = WALZER HELEN S**

In [38]:
check_parid("17 009800030467")

Sales data with match info


Unnamed: 0,TAXYR,PARID,Saledt,GRANTEE,GRANTOR,GRANTOR_match,GRANTOR_exact,GRANTOR_single_sale,GRANTOR_only_exact_name,GRANTOR_match_addr,SALES PRICE
4064,2011,17 009800030467,30-SEP-2010,NELSEN MATTHEW S,GAKSTATTER FRED VOLKER,WALZER HELEN S,,WALZER HELEN S,,2768 BRIDLE RIDGE 30519,252000.0
50700,2017,17 009800030467,04-OCT-2016,KADAVIL JOE &,NELSEN MATTHEW S,NELSEN MATTHEW S,NELSEN MATTHEW S,NELSEN MATTHEW S,NELSEN MATTHEW S,37 LAKELAND 30305,516000.0



Digest data with parcel info


Unnamed: 0,PARID,TAXYR,Own1,Own2,owner_addr
1133725,17 009800030467,2010,WALZER HELEN S,,2768 BRIDLE RIDGE 30519
1133726,17 009800030467,2011,NELSEN MATTHEW S,NELSEN HEATHER M,37 LAKELAND 30305
1133727,17 009800030467,2012,NELSEN MATTHEW S,NELSEN HEATHER M,37 LAKELAND 30305
1133728,17 009800030467,2013,NELSEN MATTHEW S,NELSEN HEATHER M,37 LAKELAND 30305
1133729,17 009800030467,2014,NELSEN MATTHEW S,NELSEN HEATHER M,37 LAKELAND 30305
1133730,17 009800030467,2015,NELSEN MATTHEW S,NELSEN HEATHER M,37 LAKELAND 30305
1133731,17 009800030467,2016,NELSEN MATTHEW S,NELSEN HEATHER M,37 LAKELAND 30305
1133732,17 009800030467,2017,KADAVIL JOE &,KADAVIL ANU,37 LAKELAND 30305
1133723,17 009800030467,2018,KADAVIL JOE &,KADAVIL ANU,37 LAKELAND 30305
1133724,17 009800030467,2019,KADAVIL JOE &,KADAVIL ANU,418 COLONY LAKE ESTATES 77477


Looks like multiple individuals owned the property but only one sold (divorce etc. can explain, such property transfers were excluded), not problematic.

**Investigate PARID = "09F140000803501" for 2017 where GRANTOR = DR HORTON-WPH LLC and GRANTOR_match = NA**

In [39]:
check_parid("09F140000803501")

Sales data with match info


Unnamed: 0,TAXYR,PARID,Saledt,GRANTEE,GRANTOR,GRANTOR_match,GRANTOR_exact,GRANTOR_single_sale,GRANTOR_only_exact_name,GRANTOR_match_addr,SALES PRICE
43007,2017,09F140000803501,10-NOV-2016,HART RONALD L,DR HORTON-WPH LLC,,,,,,145645.0
88720,2021,09F140000803501,11-MAY-2020,WEEKES GAIL,SPH PROPERTY TWO LLC A DELAWARE LIMITE,HART RONALD L,,HART RONALD L,,6352 WOODWELL 30291,170000.0



Digest data with parcel info


Unnamed: 0,PARID,TAXYR,Own1,Own2,owner_addr
167775,09F140000803501,2017,HART RONALD L,,6352 WOODWELL 30291
167776,09F140000803501,2018,HART RONALD L,,6352 WOODWELL 30291
167777,09F140000803501,2019,HART RONALD L,,6352 WOODWELL 30291
167778,09F140000803501,2020,HART RONALD L,,6352 WOODWELL 30291
167779,09F140000803501,2021,WEEKES GAIL,,6352 WOODWELL 30291
167780,09F140000803501,2022,WEEKES GAIL,,6352 WOODWELL 30291


Parcel did not exist in records before 2017 and the company name could not be matched with any other methods. This is fine, we don't want false positives.

**Investigate PARID = "14 015200100104" for 2012 where GRANTOR = CSF ENTERPRISES LLC and GRANTOR_match = CPI HOUSING FUND LLC; and 2015 where GRANTOR = ELKINS INVESTMENT LLC and GRANTOR_match = NA**

In [40]:
check_parid("14 015200100104")

Sales data with match info


Unnamed: 0,TAXYR,PARID,Saledt,GRANTEE,GRANTOR,GRANTOR_match,GRANTOR_exact,GRANTOR_single_sale,GRANTOR_only_exact_name,GRANTOR_match_addr,SALES PRICE
7671,2012,14 015200100104,14-APR-2011,ELKINS INVESTMENTS LLC,CSF ENTERPRISES LLC,CSF ENTERPRISES LLC,,,CSF ENTERPRISES LLC,212 16TH 30363,15500.0
33873,2015,14 015200100104,21-JAN-2014,RHA 1 LLC,ELKINS INVESTMENT LLC,,,,,,34000.0



Digest data with parcel info


Unnamed: 0,PARID,TAXYR,Own1,Own2,owner_addr
1817006,14 015200100104,2010,BABATOPE DAVID A,,0 PO BOX 747 30168
1817004,14 015200100104,2015,RHA 1 LLC,,3505 KOGER 30096
1817016,14 015200100104,2016,RHA 1 LLC,,3505 KOGER 30096
1817005,14 015200100104,2017,RHA 1 LLC,,3505 KOGER 30096
1817011,14 015200100104,2018,RHA 1 LLC,,3505 KOGER 30096
1817012,14 015200100104,2019,FYR SFR BORROWER LLC,,5100 TAMARIND REEF 820
1817013,14 015200100104,2020,FYR SFR BORROWER LLC,,3505 KOGER BLVD 30096
1817014,14 015200100104,2021,FYR SFR BORROWER LLC,,3505 KOGER 30096
1817015,14 015200100104,2022,FYR SFR BORROWER LLC,,3505 KOGER 30096


Looks like CSF Enterprises and CPI Housing Fund are related and the former carried out the transcation for the latter. We are also missing some parcel records for 2012-2014 (see below if this is a common issue)

**Roughly many parcels are missing an entry during a year?**

In [41]:
count_of_records_parcel = pd.DataFrame(digest_full_geo_nbhd.groupby(by="PARID")["PARID"].count()).rename(columns={"PARID": "count_records"})
print(f"Total number of parcels: {len(count_of_records_parcel)}")
print(f"Number of parcels that don't have a record for every year in period: {len(count_of_records_parcel[count_of_records_parcel["count_records"] < 13])}")

tot_parcels_begin = digest_full_geo_nbhd[digest_full_geo_nbhd["TAXYR"] == 2010].merge(
    count_of_records_parcel,
    on="PARID",
    how="inner"
)

count_missing = tot_parcels_begin[tot_parcels_begin["count_records"] < 13]
print(f"Total number of parcels around at start of study period: {len(tot_parcels_begin)}")
print(f"Number of parcels around at start of period that don't have any entry for every year: {len(count_missing)}")

Total number of parcels: 226986
Number of parcels that don't have a record for every year in period: 55174
Total number of parcels around at start of study period: 200925
Number of parcels around at start of period that don't have any entry for every year: 29113


A fair portion of parcels don't have records for every year, but most of those are parcels which were not around during the beginning of the period. They were likely created during the study period. For parcels that were around at the beginning, less than 10% are missing any records.

Validate entire process is working as expected with a sample

In [42]:
fulton_sales_all.sample(20)[
    [
        "TAXYR", "PARID", "GRANTEE", "GRANTEE_match",
        "GRANTOR", "GRANTOR_match"
    ]
].sort_values(by="TAXYR")

Unnamed: 0,TAXYR,PARID,GRANTEE,GRANTEE_match,GRANTOR,GRANTOR_match
4782,2011,21 566011690779,WISE OLIN M & KAREN F,WISE OLIN M & KAREN F,GORDON SEAN H & DZINTRA A,GORDON SEAN H & DZINTRA A
1735,2011,12 292108200153,KALBER RICHARD & SUSAN,KALBER RICHARD & SUSAN,BARR STEPHEN D & NANCY NORTON,BARR STEPHEN D & NANCY N
16100,2013,17 015200010082,ROBY DARRIN J,ROBY DARRIN J,BUSSONE MARC D & ALLISON S,BUSSONE MARC D & ALLISON S
25173,2014,17 006000060690,NEUROTH STEVE &,NEUROTH STEVE &,STUCKEY CHARLOTTE DIANE,STUCKEY CHARLOTTE D
24948,2014,17 004600130229,KOWAL CHRISTOPHER R &,KOWAL CHRISTOPHER R &,VELARDE DAYTON STOUT,VELARDE DAYTON STOUT &
29398,2015,11 004000073121,DACK JOSHUA &,DACK JOSHUA &,SILVERBERG BENJAMIN P,SILVERBERG BENJAMIN P &
38288,2015,22 518003980294,SMITH COREY & TOSHA,SMITH COREY & TOSHA,THRIVE HOMES LLC,THRIVE HOMES LLC
38346,2015,22 537006131325,SRIGANESHA SELLATHURAI,SRIGANESHA SELLATHURAI &,PONDER SCOTT C,PONDER SCOTT C &
38005,2015,22 452002410725,PEACHTREE RESIDENTIAL LLC,PEACHTREE RESIDENTIAL LLC,GDCI GA 4 LP,
33102,2015,14 008700060159,NAVANTI ENTERPRISES LLC,NAVANTI ENTERPRISES LLC,BURRIS DAMES GLOBAL FIRM INC,KANU SAMUEL


In [43]:
# Save output
fulton_sales_all.to_csv("./output/fulton_sales_owner_matches.csv", index=False)

### Identify corporate owners, create corp owner flags for each record (grantee, grantor, and own1 in sales and digest);

### For each sale, create a dummy variable for each sale type: corp purchase from ind, ind purchase from ind, corp sale to ind, ind sale from ind (should be identical to other ind to ind metric)
Flags:
- One for any corp owner
- One for corps who bought from ind after the study period began (depends on sale type matrix)

In [44]:
# Any with risk of false positive like "CO" need to have a space prepended or postpended
corp_keywords = [
    'LLC', ' INC', 'LLP', 'L.L.C', 'L.L.P', 'I.N.C', 'L L C',
    'L L P', ' L P', 'LTD', ' CORP', 'CORPORATION',
    'COMPANY', ' CO ', 'LIMITED', 'PARTNERSHIP', 'PARTNERSHIPS',
    'ASSOCIATION', 'ASSOC', 'INCORPORATED', 'INCORP',
    'L.T.D', 'LTD', "HOME"
]

# Make a list of all corp owners
corps = fulton_sales_all[
    fulton_sales_all['GRANTEE'].apply(lambda x: any([key in str(x) for key in corp_keywords]))
]['GRANTEE'].unique().tolist() + fulton_sales_all[
    fulton_sales_all['GRANTOR'].apply(lambda x: any([key in str(x) for key in corp_keywords]))
]['GRANTOR'].unique().tolist() + digest_full_geo_nbhd[
    digest_full_geo_nbhd["Own1"].apply(lambda x: any([key in str(x) for key in corp_keywords]))
]['Own1'].unique().tolist()

with open("./output/corp_names.txt", "w") as f:
    f.write("\n".join(corps))

In [45]:
# Flag for any corp owner
fulton_sales_all["GRANTEE_corp_flag"] = fulton_sales_all['GRANTEE'].isin(corps).astype(int)
fulton_sales_all["GRANTOR_corp_flag"] = fulton_sales_all['GRANTOR'].isin(corps).astype(int)

digest_full_geo_nbhd["own_corp_flag"] = digest_full_geo_nbhd["Own1"].isin(corps).astype(int)

# Sale type matrix

fulton_sales_all['corp_bought_ind'] = 0
fulton_sales_all['ind_bought_ind'] = 0
fulton_sales_all['corp_sold_ind'] = 0
fulton_sales_all['ind_sold_ind'] = 0

fulton_sales_all.loc[
    (fulton_sales_all["GRANTEE_corp_flag"] == 1) & (fulton_sales_all["GRANTOR_corp_flag"] == 0), 'corp_bought_ind'
] = 1
fulton_sales_all.loc[
    (fulton_sales_all["GRANTEE_corp_flag"] == 0) & (fulton_sales_all["GRANTOR_corp_flag"] == 0), 'ind_bought_ind'
] = 1
fulton_sales_all.loc[
    (fulton_sales_all["GRANTEE_corp_flag"] == 0) & (fulton_sales_all["GRANTOR_corp_flag"] == 1), 'corp_sold_ind'
] = 1
fulton_sales_all.loc[
    (fulton_sales_all["GRANTEE_corp_flag"] == 0) & (fulton_sales_all["GRANTOR_corp_flag"] == 0), 'ind_sold_ind'
] = 1

# Validate sale matrix is correct
fulton_sales_all[[
    "GRANTEE", "GRANTEE_corp_flag", "GRANTOR", "GRANTOR_corp_flag", "corp_bought_ind", "ind_bought_ind",
    "corp_sold_ind", "ind_sold_ind"
]].sample(10)

Unnamed: 0,GRANTEE,GRANTEE_corp_flag,GRANTOR,GRANTOR_corp_flag,corp_bought_ind,ind_bought_ind,corp_sold_ind,ind_sold_ind
93181,NIXON TIFFANY TIERA,0,ATLANTA NEIGHBORHOOD AND DEVELOPMENT PAR,0,0,1,0,1
40943,BOYD DAVID E,0,SECORD LORENE A,0,0,1,0,1
10792,THR GEORGIA LP,0,TRIBBLE DEBORA,0,0,1,0,1
39830,VAN GELDER PHILIP & MORGAN,0,HANKINS RICHARD B & MELISSA A,0,0,1,0,1
101256,BENOIT THERESSA,0,CORNERSTONE FULTON HOME BUILDERS INC,1,0,0,1,0
94141,KEEN OBAID IQBAL ET AL,0,SORAK MARK,0,0,1,0,1
86880,MOHAN KELLEN & LAURA,0,PADEN ASHLEY JONES,0,0,1,0,1
59627,DAVIS JASON JON,0,ARAIM MANAGEMENT 1 LLC,1,0,0,1,0
6093,BURTCHAELL RHONDA C,0,HILLS LAURENCE G,0,0,1,0,1
71172,DIXON GWENDOLYN CROCKETT,0,SHELLI BARNES N/K/A SHELLI BARNES DAVIS,0,0,1,0,1


In [46]:
# Flag for corp owner bought after study period began
parcels_corp_from_ind = set(fulton_sales_all[fulton_sales_all["corp_bought_ind"] == 1]['PARID'].unique())
digest_full_geo_nbhd["corp_bought_after_2010"] = digest_full_geo_nbhd["PARID"].apply(lambda x: 1 if x in parcels_corp_from_ind else 0)

# Validate
digest_full_geo_nbhd[["TAXYR", "PARID", "corp_bought_after_2010"]].sample(10)

Unnamed: 0,TAXYR,PARID,corp_bought_after_2010
1222604,2019,17 010500040335,0
2596007,2017,09F400201620918,0
1444166,2010,17 021400010187,0
35649,2013,22 495110380199,0
1946428,2010,14 021700040394,0
1403006,2015,17 009800010238,0
1104274,2018,17 014600050608,0
690241,2015,12 243205790182,0
1640448,2010,14 011500030059,0
679698,2014,12 303108420046,0


In [47]:
# Validate continued; only last should have been bought by a corp from ind
display(fulton_sales_all[fulton_sales_all["PARID"] == "14 001100110161"][["TAXYR", "PARID", "GRANTEE", "GRANTOR", "corp_bought_ind"]])
display(fulton_sales_all[fulton_sales_all["PARID"] == "17 000200100547"][["TAXYR", "PARID", "GRANTEE", "GRANTOR", "corp_bought_ind"]])
display(fulton_sales_all[fulton_sales_all["PARID"] == "14 017500110080"][["TAXYR", "PARID", "GRANTEE", "GRANTOR", "corp_bought_ind"]])
display(fulton_sales_all[fulton_sales_all["PARID"] == "12 267306750308"][["TAXYR", "PARID", "GRANTEE", "GRANTOR", "corp_bought_ind"]])

Unnamed: 0,TAXYR,PARID,GRANTEE,GRANTOR,corp_bought_ind


Unnamed: 0,TAXYR,PARID,GRANTEE,GRANTOR,corp_bought_ind
34952,2015,17 000200100547,JOYE CHARLES M &,KHAJAVI KAVEH,0


Unnamed: 0,TAXYR,PARID,GRANTEE,GRANTOR,corp_bought_ind


Unnamed: 0,TAXYR,PARID,GRANTEE,GRANTOR,corp_bought_ind
31318,2015,12 267306750308,SRP SUB LLC,MAJID ABAZERI AND ANA L CRUZ,1


### Understand distribution of corporate ownership size
Within Fulton, within Atlanta, within each neighborhood

In [None]:
# Continous, 2022 number of properties owned in Fulton by each owner
fig = px.histogram(df, x="total_bill", y="tip", color="sex",
                   marginal="box", # or violin, rug
                   hover_data=df.columns)
fig.show()

In [None]:
# Bins

### Agg each class of sale

### Get totals for Fulton then drop non-ATL and agg by neighborhoods, year, size of investor

### Track each property after purchase (or at all owned by corp during period), calculate rental income

### Normalized equity loss measure

### Equity loss burden

### Statistical test to see if FMV - SP was significant between ind and corp (ANOVA) or regression

### Create a measure of corp concentration in neighborhood to use as metric for analysis - is it just being a corp that helps, or when there's high concentration?

### Geospatial

### Do neighborhood characteristics predict equity loss

### Foreclosures?