## Combine Fulton County Parcel and Sales Data From 2010-2021

**Output:** one file that contains the Fulton County sales data (used as the base table), with the associated parcel data from the same year attached to each sale.

**Motivation:** the sales and parcel datasets are currently separate, and each year is stored in its own file, which makes the data hard to analyze.

**Data Sources:** provided by Fulton County. Originally named "STANDARDS DIGEST ATL {year}.xlsx", "{NF, SF} {year}", and "STANDARDS SALES {year}.txt"; now stored locally as "parcel_{atl, nf, sf}_{year}" and "sales{year}".

Note: ATL + NF + SF together make up all parcels in Fulton County.

[Link to SharePoint for Data](https://gtvault-my.sharepoint.com/personal/yan74_gatech_edu/_layouts/15/onedrive.aspx?ct=1685705130558&or=OWA%2DNT&cid=208dc2f3%2D7e14%2D99f7%2D2679%2D5df5c365c078&ga=1&id=%2Fpersonal%2Fyan74%5Fgatech%5Fedu%2FDocuments%2FCounty%20Tax%20Assessment%20Data%2FFulton%20%2D%20Conley%2C%20Pinkey&view=0&noAuthRedirect=1)

---


### Process
**1. Append all Parcel data**
- Append all Parcel_NF files for 2010-2022, all Parcel_SF files for 2010-2022, and all Parcel_ATL files for 2010-2022, keeping the three regions separate. _NOTE: read in only the needed variables and declare their datatypes on read to reduce memory use and read time._
- Check for duplicates in nf_appended, sf_appended, and atl_appended.
- Append nf_appended, sf_appended, and atl_appended to each other.
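A minimal sketch of the last step on made-up frames, stacking the three regional tables with `pd.concat(..., ignore_index=True)`. The `Region` column is an illustrative addition (not part of the original pipeline) that records which file each row came from. If the three regions are disjoint, as the note above suggests, no Parid should appear in more than one region.

```python
import pandas as pd

# Hypothetical stand-ins for nf_appended, sf_appended and atl_appended.
nf_appended = pd.DataFrame({"Parid": ["N1"], "Taxyr": ["2015"]})
sf_appended = pd.DataFrame({"Parid": ["S1", "S2"], "Taxyr": ["2015", "2015"]})
atl_appended = pd.DataFrame({"Parid": ["A1"], "Taxyr": ["2015"]})

# Tag each frame with its region, then stack them with a fresh 0..n-1 index.
parcels_all = pd.concat(
    [df.assign(Region=region)
     for region, df in [("NF", nf_appended), ("SF", sf_appended), ("ATL", atl_appended)]],
    ignore_index=True,
)

# Count Parids that show up in more than one region (expected to be 0 if regions are disjoint).
cross_region = parcels_all.groupby("Parid")["Region"].nunique().gt(1).sum()
print(parcels_all)
print(cross_region)
```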
---
**2. Append all Sales data**
- Append Sales files for 2010-2022. _NOTE: read in only the needed variables and declare their datatypes on read to reduce memory use and read time._
- Check for duplicates in sales_appended.
---
**3. Merge Sales and Parcel data.**
- _Discussion:_ We want to merge Sales with the associated Parcel data, using Sales as the left table. This way we will have a record of all sales from 2010-2022, each with that parcel's associated data. However, some parcels have multiple buildings, which can be sold separately, so the identifier used to merge the data must account for this. Likewise, parcel data often has multiple rows for a parcel with multiple buildings: each building is recorded as a separate row with the same ParcelID.
- Use the key [parid, taxyr, salesprice] to merge Parcel data from the given year.
- Check for duplicates in parcel_sales_merged.
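A toy illustration (made-up data, hypothetical parcel IDs) of why the merge key matters: when the parcel table has one row per building, a left merge on [Parid, Taxyr] alone repeats each sale once per building. Passing `validate="many_to_one"` to `DataFrame.merge` makes pandas raise a `MergeError` instead of silently fanning out.

```python
import pandas as pd
from pandas.errors import MergeError

# Hypothetical sales: one sale of parcel "A" in tax year 2015.
sales = pd.DataFrame({"Parid": ["A"], "Taxyr": ["2015"], "SALES PRICE": [250000]})

# Hypothetical parcel rows: parcel "A" has two buildings, so two rows share [Parid, Taxyr].
parcels = pd.DataFrame({
    "Parid": ["A", "A"],
    "Taxyr": ["2015", "2015"],
    "Bldnum": [1, 2],
    "Area": [1200, 800],
})

# A plain left merge fans out: the single sale now appears once per building.
fanned = sales.merge(parcels, how="left", on=["Parid", "Taxyr"])
print(len(fanned))

# validate="many_to_one" requires the right-hand keys to be unique and raises otherwise.
try:
    sales.merge(parcels, how="left", on=["Parid", "Taxyr"], validate="many_to_one")
    fanout_detected = False
except MergeError:
    fanout_detected = True
print(fanout_detected)
```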
---

### Global Functions and Code

In [1]:
import pandas as pd
import os
pd.set_option('display.max_columns', 100)

Helper function to read the specified variables from a given file.

In [12]:
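def read_file(file, folder, usecols, datatypes):
    """Read one Parcel (.xlsx) or Sales (tab-separated .txt) file from ../data/<folder>/."""
    print("Reading file: ", file)

    if file.endswith(".xlsx"):
        df = pd.read_excel(
            os.path.join("../data", folder, file),
            usecols=usecols,   # only the requested columns
            dtype=datatypes
        )
    else:
        # Sales files are listed without their ".txt" extension.
        df = pd.read_csv(
            os.path.join("../data", folder, file + ".txt"),
            sep="\t",
            usecols=usecols,   # None reads every column
            dtype=datatypes
        )

    print("Successfully read file: ", file)
    return df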
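A self-contained sketch of the same idea (read only the needed columns, declare their types on read) on a small in-memory tab-separated sample. The sample text and the chosen dtypes are made up for illustration; the real Sales files may need different types.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample standing in for a "Sales{year}.txt" file.
raw = (
    "Taxyr\tParid\tSALES PRICE\tWho\n"
    "2011\t06 0310 LL0417\t0\tT\n"
    "2011\t06 0310 LL0490\t794600\t0\n"
)

df = pd.read_csv(
    io.StringIO(raw),
    sep="\t",
    usecols=["Taxyr", "Parid", "SALES PRICE"],  # skip columns we do not need
    dtype={"Taxyr": "string", "Parid": "string", "SALES PRICE": "Int64"},  # types declared up front
)
print(df.dtypes)
```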

---

### 1. Append all Parcel data to a single file.

Define all needed files, variables, and datatypes for reading in Parcel data.

In [3]:
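# List the input files in a stable order (os.listdir() returns them in arbitrary order).
path = "C:/Users/nicho/Documents/research/FCS/data/cmr/"
cmr_files = sorted(os.listdir(path))

path = "C:/Users/nicho/Documents/research/FCS/data/sales/"
sales_files = sorted(os.listdir(path))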

In [4]:
# Modify variables as needed (each name listed once; "Yrblt" and "Effyr" were previously repeated)
parcel_vars = ["Taxyr", "Parid", "Nbhd", "SITUS Adrpre", "SITUS Adrno", "SITUS Adrdir", "SITUS Adrstr",
               "SITUS Adrsuf", "SITUS Adrsuf2", "SITUS Cityname", "Class", "Luc", "Livunit", "Calcacres",
               "Chgrsn", "Taxdist", "Own1", "Own2", "Careof", "OWNER Adrno", "OWNER Adrdir", "OWNER Adrstr",
               "OWNER Adrsuf", "OWNER Adrsuf2", "OWNER Cityname", "Statecode", "Country", "Unitdesc", "Unitno",
               "Zip1", "Zip2", "Reascd", "Spcflg", "Aprland", "Aprbldg", "Revcode", "Revreas", "Revland",
               "Revbldg", "Revtot", "Aprtot", "Exmppct", "Exmpval", "Cur", "Areasum", "Bldnum", "Yrblt",
               "Effyr", "Units", "Structure", "Grade", "Area", "Perim", "Stories", "Sf", "Usetype",
               "Rentpct", "Rentsf", "Occupancy", "Msunits"]


In [7]:
#testcsv = pd.read_csv('../output/test.csv', usecols=parcel_vars, dtype=parcel_datatypes)

In [7]:
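# The .info() output further down shows Nbhd as object (not category), so on the last run this
# two-column dict, not the full one, was the one in effect when the files were read.
parcel_datatypes = {"Taxyr": pd.StringDtype(), "Parid": pd.StringDtype()}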

In [6]:
# Keys must match the column headers exactly (e.g. "SITUS Adrno", "OWNER Adrno");
# a key that does not match a header will not be applied to that column.
parcel_datatypes = {"Taxyr": pd.StringDtype(), "Parid": pd.StringDtype(),
                    "Nbhd": pd.CategoricalDtype(), "SITUS Adrno": pd.Int64Dtype(),
                    "SITUS Adrdir": pd.CategoricalDtype(), "SITUS Adrsuf": pd.CategoricalDtype(),
                    "SITUS Adrsuf2": pd.CategoricalDtype(), "SITUS Cityname": pd.CategoricalDtype(),
                    "Zoning": pd.CategoricalDtype(), "Muni": pd.CategoricalDtype(),
                    "Class": pd.CategoricalDtype(), "Luc": pd.CategoricalDtype(), "Livunit": pd.CategoricalDtype(),
                    "Calcacres": pd.CategoricalDtype(), "Location": pd.CategoricalDtype(),
                    "Fronting": pd.CategoricalDtype(), "Street1": pd.CategoricalDtype(),
                    "Util1": pd.CategoricalDtype(), "Util2": pd.CategoricalDtype(), "Util3": pd.CategoricalDtype(),
                    "Parkprox": pd.CategoricalDtype(), "Parkquanit": pd.CategoricalDtype(),
                    "Parktype": pd.CategoricalDtype(), "Note1": pd.StringDtype(),
                    "Note2": pd.StringDtype(), "Note3": pd.StringDtype(),
                    "Note4": pd.StringDtype(), "Notecd1": pd.CategoricalDtype(),
                    "Notecd2": pd.CategoricalDtype(), "Bldgros D": pd.CategoricalDtype(),
                    "Bldgros V": pd.CategoricalDtype(), "Mscbld N": pd.CategoricalDtype(),
                    "Mscbld V": pd.CategoricalDtype(), "Chgrsn": pd.CategoricalDtype(), "Taxdist": pd.CategoricalDtype(),
                    "Own1": pd.StringDtype(), "Own2": pd.StringDtype(),
                    "OWNER Adrno": pd.Int64Dtype(), "OWNER Adradd": pd.CategoricalDtype(),
                    "OWNER Adrdir": pd.CategoricalDtype(), "OWNER Adrstr": pd.StringDtype(),
                    "OWNER Adrsuf": pd.CategoricalDtype(), "OWNER Adrsuf2": pd.CategoricalDtype(),
                    "OWNER Cityname": pd.StringDtype(), "Statecode": pd.CategoricalDtype(),
                    "Country": pd.CategoricalDtype(), "Unitno": pd.Int64Dtype(),
                    "Zip1": pd.CategoricalDtype(), "Reascd": pd.CategoricalDtype(), "Spcflg": pd.CategoricalDtype(),
                    "Aprland": pd.Float64Dtype(), "Aprbldg": pd.Float64Dtype(),
                    "Revcode": pd.CategoricalDtype(), "Revreas": pd.CategoricalDtype(), "Revland": pd.Float64Dtype(),
                    "Revbldg": pd.Float64Dtype(), "Aprtot": pd.Float64Dtype(),
                    "D Card": pd.CategoricalDtype(), "Stories": pd.CategoricalDtype(),
                    "Extwall": pd.CategoricalDtype(), "Style": pd.CategoricalDtype(), "D Yrblt": pd.CategoricalDtype(),
                    "D Effyr": pd.CategoricalDtype(), "D Yrremod": pd.CategoricalDtype(), "Rmtot": pd.CategoricalDtype(),
                    "Rmbed": pd.CategoricalDtype(), "Rmfam": pd.CategoricalDtype(), "Fixbath": pd.CategoricalDtype(),
                    "Fixhalf": pd.CategoricalDtype(), "Fixaddl": pd.CategoricalDtype(), "Fixtot": pd.CategoricalDtype(),
                    "Plumval": pd.Float64Dtype(), "Bsmt": pd.CategoricalDtype(), "Bsmtval": pd.Float64Dtype(),
                    "Heat": pd.CategoricalDtype(), "Fuel": pd.CategoricalDtype(), "Heatsys": pd.CategoricalDtype(), "Heatval": pd.Int32Dtype(),
                    "Attic": pd.CategoricalDtype(), "Atticval": pd.CategoricalDtype(), "Recromarea": pd.Int32Dtype(),
                    "Recval": pd.Float64Dtype(), "Ufeatarea": pd.Int32Dtype(), "Ufeatval": pd.Float64Dtype(),
                    "Wbfp O": pd.CategoricalDtype(), "Wbfp S": pd.CategoricalDtype(), "Wbfp Pf": pd.CategoricalDtype(),
                    "Wbfpval": pd.Float64Dtype(), "Condolvl": pd.CategoricalDtype(), "Condovw": pd.CategoricalDtype(),
                    "Mgfa": pd.Int32Dtype(), "Sfla": pd.Int32Dtype(), "Areafact": pd.Float64Dtype(),
                    "Shfact": pd.CategoricalDtype(), "D Grade": pd.CategoricalDtype(), "D Grdfact": pd.CategoricalDtype(),
                    "Cddesc": pd.CategoricalDtype(), "Cdpct": pd.CategoricalDtype(), "D Cdu": pd.CategoricalDtype(),
                    "Adjfact 2": pd.Float64Dtype(), "D Pctcomplete": pd.CategoricalDtype(), "Cur": pd.CategoricalDtype()}

In [8]:
#test_xl.astype(parcel_datatypes, errors='ignore').dtypes

In [9]:
#test_xl.info(verbose=True)

Read in all Parcel files.

In [8]:
#Took ~15 min to read in all files on previous run.

parcel_dfs = dict()

for file in cmr_files:
    parcel_dfs[file] = read_file(file, 'cmr', parcel_vars, parcel_datatypes)

print("Finished reading in all CMR Parcel files")

Reading file:  cmr2011.xlsx
Successfully read file:  cmr2011.xlsx
Reading file:  cmr2012.xlsx
Successfully read file:  cmr2012.xlsx
Reading file:  cmr2013.xlsx
Successfully read file:  cmr2013.xlsx
Reading file:  cmr2014.xlsx
Successfully read file:  cmr2014.xlsx
Reading file:  cmr2015.xlsx
Successfully read file:  cmr2015.xlsx
Reading file:  cmr2016.xlsx
Successfully read file:  cmr2016.xlsx
Reading file:  cmr2017.xlsx
Successfully read file:  cmr2017.xlsx
Reading file:  cmr2018.xlsx
Successfully read file:  cmr2018.xlsx
Reading file:  cmr2019.xlsx
Successfully read file:  cmr2019.xlsx
Reading file:  cmr2020.xlsx
Successfully read file:  cmr2020.xlsx
Reading file:  cmr2021.xlsx
Successfully read file:  cmr2021.xlsx
Reading file:  cmr2022.xlsx
Successfully read file:  cmr2022.xlsx
Finished reading in all CMR Parcel files


Append each file to the end of the previous one, starting with "cmr2011",  
AND  
calculate the percent of rows that are exact duplicates of an earlier row (the first occurrence is not counted), then drop them, keeping only the first occurrence.

In [None]:
parcel_total = pd.concat(list(parcel_dfs.values()))

prev_len = len(parcel_total.index)
parcel_total.drop_duplicates(inplace=True)

print("Number of entire rows duplicated (auto dropped): ", prev_len - len(parcel_total.index))
print("Percent: ", (prev_len - len(parcel_total.index)) / prev_len * 100)
print("Final size (after dropping entire row duplications): ", parcel_total.shape)

Number of entire rows duplicated (auto dropped):  85585
Percent:  10.698793674604664
Final size (after dropping entire row duplications):  (714365, 60)
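A toy check of the append-and-deduplicate step on made-up data. By default `pd.concat` keeps each file's own row labels, which is why `appended_cmr.info()` further down reports 799,950 entries indexed "0 to 68811"; `ignore_index=True` gives a clean 0..n-1 index instead.

```python
import pandas as pd

# Two hypothetical yearly extracts that share one identical row.
y2011 = pd.DataFrame({"Parid": ["A", "B"], "Taxyr": ["2011", "2011"], "Aprtot": [100, 200]})
y2012 = pd.DataFrame({"Parid": ["B", "A"], "Taxyr": ["2011", "2012"], "Aprtot": [200, 110]})

total = pd.concat([y2011, y2012], ignore_index=True)
before = len(total)

# Drop rows identical in every column, keeping the first occurrence.
total = total.drop_duplicates()
dropped = before - len(total)
pct = dropped / before * 100
print(dropped, pct)
```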


Calculate the percent of rows with a duplicate Parid and with a duplicate [Parid, Taxyr],  
AND  
calculate the number of unique [Parid, Taxyr] keys and the number of unique Parid values (i.e., the number of parcels).  

**VERIFY: is 32,359 parcels the correct number for Fulton County?**

In [None]:
parid_dup = parcel_total.duplicated(subset='Parid').sum()
print("Number of rows with a duplicate Parid: ", parid_dup)
print("Percent: ", parid_dup / len(parcel_total.index) * 100)

print("-------")

parid_taxyr_dup = parcel_total.duplicated(subset=['Parid', 'Taxyr']).sum()
print("Number of rows with a duplicate ['Parid', 'Taxyr']: ", parid_taxyr_dup)
print("Percent: ", parid_taxyr_dup / len(parcel_total.index) * 100)

print("-------")

concat_parid_taxyr = parcel_total.drop_duplicates(subset=['Parid', 'Taxyr'])
print("Number of unique values of [Parid, Taxyr] combination: ", len(concat_parid_taxyr.index))

print("-------")

concat_parid_taxyr = parcel_total.drop_duplicates(subset=['Parid'])
print("Number of unique values of Parid (i.e. number of parcels): ", len(concat_parid_taxyr.index))

Number of rows with a duplicate Parid:  682006
Percent:  95.47024280304886
-------
Number of rows with a duplicate ['Parid', 'Taxyr']:  372901
Percent:  52.20034576162046
-------
Number of unique values of [Parid, Taxyr] combination:  341464
-------
Number of unique values of Parid (i.e. number of parcels):  32359
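The four counts above are consistent with each other: `duplicated(subset=keys)` flags every row after the first for each key, so rows = unique keys + duplicates (714,365 = 341,464 + 372,901 and 714,365 = 32,359 + 682,006). A toy check of that identity on made-up data, using `nunique()` as a shorter way to count distinct parcels:

```python
import pandas as pd

df = pd.DataFrame({
    "Parid": ["A", "A", "A", "B", "B"],
    "Taxyr": ["2011", "2011", "2012", "2011", "2012"],
})

n_rows = len(df)
dup_parid = df.duplicated(subset="Parid").sum()              # rows after the first per Parid
dup_key = df.duplicated(subset=["Parid", "Taxyr"]).sum()     # rows after the first per key
n_parcels = df["Parid"].nunique()                            # distinct parcels
n_keys = len(df.drop_duplicates(subset=["Parid", "Taxyr"]))  # distinct [Parid, Taxyr] keys

# Every row is either the first occurrence of its key or a duplicate of it.
assert n_rows == n_parcels + dup_parid
assert n_rows == n_keys + dup_key
print(n_parcels, n_keys)
```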


We want to use ['Parid', 'Taxyr'] as a key; however, a property may have been sold more than once in a single year.  

A high percent of ['Parid', 'Taxyr'] keys are duplicated in the parcel data. This is not due to properties being sold multiple times in one year. Rather, it appears to be because some parcels have multiple buildings, each recorded as a separate row with the same Parid.

In [None]:
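# DataFrameGroupBy has no .astype(), which raised an AttributeError here.
# Instead, count for each column how many [Parid, Taxyr] groups hold more than one distinct value.
varying_cols = (parcel_total.groupby(['Parid', 'Taxyr']).nunique() > 1).sum()
varying_cols[varying_cols > 0].sort_values(ascending=False)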

In [None]:
parcel_col_change_test = parcel_total.drop(["Area","Luc", "Unitno", "Perim","Sf","Rentsf","Bldnum","Units","Yrblt","Effyr","Structure","Grade","Stories","Usetype","Rentpct", "Revreas", "Revbldg"], axis=1)

In [None]:
parcel_col_change_test.columns

Index(['Taxyr', 'Parid', 'Nbhd', 'SITUS Adrpre', 'SITUS Adrno', 'SITUS Adrdir',
       'SITUS Adrstr', 'SITUS Adrsuf', 'SITUS Adrsuf2', 'SITUS Cityname',
       'Class', 'Livunit', 'Calcacres', 'Chgrsn', 'Taxdist', 'Own1', 'Own2',
       'Careof', 'OWNER Adrno', 'OWNER Adrdir', 'OWNER Adrstr', 'OWNER Adrsuf',
       'OWNER Adrsuf2', 'OWNER Cityname', 'Statecode', 'Country', 'Unitdesc',
       'Zip1', 'Zip2', 'Reascd', 'Spcflg', 'Aprland', 'Aprbldg', 'Revcode',
       'Revland', 'Revtot', 'Aprtot', 'Exmppct', 'Exmpval', 'Cur', 'Areasum',
       'Occupancy', 'Msunits'],
      dtype='object')

In [None]:
dup_after_drop = parcel_col_change_test.duplicated(subset=['Parid', 'Taxyr']).sum()
print("Number of rows with a duplicate ['Parid', 'Taxyr']: ", dup_after_drop)
print("Percent: ", dup_after_drop / len(parcel_col_change_test.index) * 100)

Number of rows with a duplicate ['Parid', 'Taxyr']:  372901
Percent:  52.20034576162046
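The identical count (372,901) is expected: `duplicated(subset=[...])` looks only at the subset columns, so dropping other columns cannot change its result. To test whether the dropped building-level columns explain the duplication, one option is to count full-row duplicates after the drop. A sketch on made-up rows:

```python
import pandas as pd

# Hypothetical: parcel "A" in 2015 has two building rows that differ only in building-level columns.
parcels = pd.DataFrame({
    "Parid": ["A", "A"],
    "Taxyr": ["2015", "2015"],
    "Own1": ["SMITH", "SMITH"],
    "Bldnum": [1, 2],
    "Area": [1200, 800],
})
parcel_level = parcels.drop(columns=["Bldnum", "Area"])

# Same subset, same answer, regardless of which other columns exist.
key_dups_before = parcels.duplicated(subset=["Parid", "Taxyr"]).sum()
key_dups_after = parcel_level.duplicated(subset=["Parid", "Taxyr"]).sum()

# Full-row duplicates after the drop: if this equals the key-duplicate count,
# the remaining (parcel-level) columns are constant within each key.
full_dups_after = parcel_level.duplicated().sum()
print(key_dups_before, key_dups_after, full_dups_after)
```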


---

### 2. Append all Sales Data to a single file.

In [9]:
sales_files = ["Sales2011", "Sales2012", "Sales2013", "Sales2014",
             "Sales2015", "Sales2016", "Sales2017", "Sales2018",
             "Sales2019", "Sales2020", "Sales2021", "Sales2022"]

sales_vars = None # Since we are taking all variables in the file, we don't need to specify.

sales_datatypes = {"Taxyr": pd.StringDtype(), "Parid": pd.StringDtype()}

In [13]:
#Took 6 sec to read in all files on previous run.

sales_dfs = dict()

for file in sales_files:
    sales_dfs[file] = read_file(file, "sales", sales_vars, sales_datatypes)

print("Finished reading in all Sales files")

Reading file:  Sales2011
Successfully read file:  Sales2011
Reading file:  Sales2012
Successfully read file:  Sales2012
Reading file:  Sales2013
Successfully read file:  Sales2013
Reading file:  Sales2014
Successfully read file:  Sales2014
Reading file:  Sales2015
Successfully read file:  Sales2015
Reading file:  Sales2016
Successfully read file:  Sales2016
Reading file:  Sales2017
Successfully read file:  Sales2017
Reading file:  Sales2018
Successfully read file:  Sales2018
Reading file:  Sales2019
Successfully read file:  Sales2019
Reading file:  Sales2020
Successfully read file:  Sales2020
Reading file:  Sales2021
Successfully read file:  Sales2021
Reading file:  Sales2022
Successfully read file:  Sales2022
Finished reading in all Sales files


In [None]:
# sales2015 = pl.    (incomplete scratch cell; `pl` is never imported in this notebook)

In [16]:
appended_cmr = pd.concat(parcel_dfs.values())

appended_cmr.to_csv("../output/appended_cmr.csv")

In [17]:
appended_sales = pd.concat(sales_dfs.values())

appended_sales.to_csv("../output/appended_sales.csv")
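The `Unnamed: 0` (and `Unnamed: 0.1`) columns that show up when CSVs are read back later come from `to_csv` writing the row index as an unlabeled first column. A small round trip on made-up data shows the difference; passing `index=False` when writing avoids the extra column.

```python
import io

import pandas as pd

df = pd.DataFrame({"Parid": ["A", "B"], "Taxyr": ["2011", "2012"]})

# Default: the index is written as an unnamed first column.
with_index = pd.read_csv(io.StringIO(df.to_csv()))
# index=False: only the real columns are written.
without_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))

print(list(with_index.columns))
print(list(without_index.columns))
```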

In [18]:
appended_cmr.info()

<class 'pandas.core.frame.DataFrame'>
Index: 799950 entries, 0 to 68811
Data columns (total 60 columns):
 #   Column          Non-Null Count   Dtype  
---  ------          --------------   -----  
 0   Taxyr           799938 non-null  string 
 1   Parid           799950 non-null  string 
 2   Nbhd            799938 non-null  object 
 3   SITUS Adrpre    0 non-null       float64
 4   SITUS Adrno     799290 non-null  float64
 5   SITUS Adrdir    2351 non-null    object 
 6   SITUS Adrstr    799922 non-null  object 
 7   SITUS Adrsuf    781960 non-null  object 
 8   SITUS Adrsuf2   415953 non-null  object 
 9   SITUS Cityname  790024 non-null  object 
 10  Class           799938 non-null  object 
 11  Luc             799780 non-null  object 
 12  Livunit         698945 non-null  float64
 13  Calcacres       756945 non-null  float64
 14  Chgrsn          735414 non-null  object 
 15  Taxdist         799938 non-null  object 
 16  Own1            799938 non-null  object 
 17  Own2            

In [19]:
appended_sales.info()

<class 'pandas.core.frame.DataFrame'>
Index: 477238 entries, 0 to 58757
Data columns (total 49 columns):
 #   Column               Non-Null Count   Dtype  
---  ------               --------------   -----  
 0   Taxyr                477226 non-null  string 
 1   Saledt: Year (YYYY)  477226 non-null  float64
 2   Saledt: Month (Mon)  477226 non-null  object 
 3   Taxdist              477226 non-null  object 
 4   Parid                477238 non-null  string 
 5   Nbhd                 477226 non-null  object 
 6   Class                477225 non-null  object 
 7   Luc                  477226 non-null  object 
 8   Saledt               477226 non-null  object 
 9   Book                 477198 non-null  object 
 10  Page                 477183 non-null  object 
 11  SALES PRICE          477170 non-null  object 
 12  FAIR MARKET VALUE    477226 non-null  object 
 13  DEED TYPE            477220 non-null  object 
 14  Aprland              477205 non-null  object 
 15  Aprbldg              47

In [20]:
merged = appended_sales.merge(appended_cmr, how="left", on=['Taxyr', 'Parid', 'Livunit'])
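To see how many sales actually found a parcel row, `DataFrame.merge` accepts `indicator=True`, which adds a `_merge` column with values `both`, `left_only` or `right_only`. A sketch on made-up data; it merges on [Parid, Taxyr] only for brevity, whereas the cell above also uses `Livunit`.

```python
import pandas as pd

sales = pd.DataFrame({"Parid": ["A", "B", "C"], "Taxyr": ["2015", "2015", "2016"],
                      "SALES PRICE": [100, 200, 300]})
parcels = pd.DataFrame({"Parid": ["A", "B"], "Taxyr": ["2015", "2014"], "Aprtot": [90, 180]})

merged_check = sales.merge(parcels, how="left", on=["Parid", "Taxyr"], indicator=True)
counts = merged_check["_merge"].value_counts()

# Sales with no parcel row for that [Parid, Taxyr].
unmatched = merged_check[merged_check["_merge"] == "left_only"]
print(counts["both"], counts["left_only"])
```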

In [26]:
# `== None` is compared elementwise and is always False, so it never finds missing values; use .isna().
merged[merged['Areasum_x'].isna()]
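Why a `== None` filter returns no rows: pandas compares each element with `None` and gets `False` every time, while `isna()` treats both `None` and `NaN` as missing. A tiny demonstration on a made-up Series:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, None])  # None is stored as NaN in a float Series

n_eq_none = (s == None).sum()  # elementwise comparison with None is always False
n_isna = s.isna().sum()        # counts both NaN and None
print(n_eq_none, n_isna)
```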


In [21]:
merged.to_csv("sales_parcel_merged.csv")

In [None]:
sales_appended = pd.concat(list(sales_dfs.values()))

prev_len = len(sales_appended.index)
sales_appended.drop_duplicates(inplace=True)

print("Number of entire rows duplicated (auto dropped): ", prev_len - len(sales_appended.index))
print("Percent: ", (prev_len - len(sales_appended.index)) / prev_len * 100)
print("Final size (after dropping entire row duplications): ", sales_appended.shape)

Number of entire rows duplicated (auto dropped):  63
Percent:  0.013200960527032634
Final size (after dropping entire row duplications):  (477175, 49)


In [None]:
parid_dup = sales_appended.duplicated(subset='Parid').sum()
print("Number of rows with the same Parid: ", parid_dup)
print("Percent: ", parid_dup / len(sales_appended.index) * 100)
print("-------")

parid_taxyr_dup = sales_appended.duplicated(subset=['Parid', 'Taxyr']).sum()
print("Number of rows with the same ['Parid', 'Taxyr']: ", parid_taxyr_dup)
print("Percent: ", parid_taxyr_dup / len(sales_appended.index) * 100)

Number of rows with the same Parid:  253529
Percent:  53.13124115890396
-------
Number of rows with the same ['Parid', 'Taxyr']:  96444
Percent:  20.211452821291978


In [None]:
sales_appended.head(4)

Unnamed: 0,Taxyr,Saledt: Year (YYYY),Saledt: Month (Mon),Taxdist,Parid,Nbhd,Class,Luc,Saledt,Book,Page,SALES PRICE,FAIR MARKET VALUE,DEED TYPE,Aprland,Aprbldg,Costval,Saleval,Who,Wen,GRANTOR,GRANTEE,Revcode,Reascd,Adrpre,Adrno,Adrdir,Adrstr,Adrsuf,Adrsuf2,Cityname,Unitno,Livunit,Calcacres,Zoning,Notecd1,Notecd2,Chgrsn,Cur,Cur.1,Whocalc,Wencalc,Saletype,Appraiser ID,Income,Bldgros V,Mscbld V,Val30 SUM,Areasum
0,2011,2010.0,Oct,59,06 0310 LL0417,604,R4,100,20-OCT-2010,49589.0,558,0,36700,WD,36700.0,0.0,36700.0,T,TA_TJONES,09-DEC-2010,MORRISON MARION A,WOODALL LLC,1.0,,,0.0,,OLD LAWRENCEVILLE,RD,,FUL,,0.0,6.2,AG1,,,MN,Y,Y,,,,,,,,,
1,2011,2010.0,Jun,59,06 0310 LL0490,2116,R3,101,07-JUN-2010,49098.0,288,794600,717100,WD,195000.0,522100.0,717100.0,0,FIXSALV,04-MAR-2011,CDG HOMES LLC,EDMUNDS KEITH S & KIMBERLY C,1.0,00,,3916.0,,DAHLWINY,CT,,SANDY SPRINGS,,1.0,0.35,CUP,,,NC,Y,Y,,,,,,,,,
2,2011,2010.0,Jul,59,06 0310 LL0581,2116,R3,101,14-JUL-2010,49244.0,522,800000,590400,WD,127700.0,462700.0,590400.0,0,FIXSALV,04-MAR-2011,CAPITAL DESIGN HOMES LLC,MEHDIPOUR MOHAMMADREZ & SADEGHI SHIVA,,00,,3952.0,,DAHLWINY,CT,,FUL,,1.0,0.35,AG1,,,MN,Y,Y,,,,,,,,,
3,2011,2010.0,Sep,59,06 0310 LL0755,604,R3,100,21-SEP-2010,49422.0,626,200000,97500,LW,97500.0,0.0,104200.0,M,TA_LPRICE,26-OCT-2010,REGIONS BANK,IHD INVESTMENT INC,3.0,SB,,7615.0,,REGENCY,CIR,,FUL,,0.0,0.522,R3C,11.0,,MN,Y,Y,,,,,,,,,


In [None]:
parid_taxyr_dup = sales_appended.duplicated(subset=['Parid', 'Taxyr', 'SALES PRICE']).sum()
print("Number of rows with the same ['Parid', 'Taxyr', 'SALES PRICE']: ", parid_taxyr_dup)
print("Percent: ", parid_taxyr_dup / len(sales_appended.index) * 100)

Number of rows with the same ['Parid', 'Taxyr', 'SALES PRICE']:  30274
Percent:  6.344422905642584


In [None]:
parid_taxyr_dup = sales_appended.duplicated(subset=['Parid', 'Taxyr', 'Saledt']).sum()
print("Number of rows with the same ['Parid', 'Taxyr', 'Saledt']: ", parid_taxyr_dup)
print("Percent: ", parid_taxyr_dup / len(sales_appended.index) * 100)

Number of rows with the same ['Parid', 'Taxyr', 'Saledt']:  47236
Percent:  9.89909362393252
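The last three cells repeat the same check for different candidate keys. A small helper could put them side by side; `dup_report` is a hypothetical name, and the sample rows below are made up.

```python
import pandas as pd


def dup_report(df, candidate_keys):
    """For each candidate key, count rows that repeat an earlier row's values on that key."""
    rows = []
    for keys in candidate_keys:
        n_dup = int(df.duplicated(subset=keys).sum())
        rows.append({"key": ", ".join(keys), "duplicates": n_dup,
                     "percent": n_dup / len(df) * 100})
    return pd.DataFrame(rows)


sales = pd.DataFrame({
    "Parid": ["A", "A", "A", "B"],
    "Taxyr": ["2011", "2011", "2011", "2011"],
    "SALES PRICE": [100, 100, 250, 90],
    "Saledt": ["07-SEP-2010", "07-SEP-2010", "24-FEB-2010", "01-JAN-2010"],
})
report = dup_report(sales, [["Parid", "Taxyr"],
                            ["Parid", "Taxyr", "SALES PRICE"],
                            ["Parid", "Taxyr", "Saledt"]])
print(report)
```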


In [None]:
nbhrs = pd.read_csv('../data/parcel/select_parcels.csv')

In [None]:
nbhrs.head(4)

Unnamed: 0,FID,OBJECTID,ParcelID,Address,Match_addr,X,Y,MatchType_,FIPS,DistoCityc,TaxYear,AddrNumber,AddrPreDir,AddrStreet,AddrSuffix,AddrPosDir,AddrUntTyp,AddrUnit,Owner,OwnerAddr1,OwnerAddr2,TaxDist,TotAssess,LandAssess,ImprAssess,TotAppr,LandAppr,ImprAppr,LUCode,ClassCode,ExCode,LivUnits,LandAcres,NbrHood,Subdiv,SubdivNum,SubdivLot,SubdivBlck,FeatureID,Shape__Area,Shape__Length,neighborhood
0,292925,344032,07 351100620022,115 ANNA AVE,"115 Anne St SE, Atlanta, GA 30315, USA",-84.382553,33.712952,RANGE_INTERPOLATED street_address,131210000000,4658.9771,2023,115,,ANNA,AVE,,,,TRANS AM SFE II LLC,5001 PLAZA ON THE LK # 200,AUSTIN TX 78746,40,46960,8520,38440,117400,21300,96100,101,R3,,1.0,0.5022,7022,,,,,07 351100620022,21918.326,600.68976,South Atlanta
1,292934,344041,07 351100620253,205 ANNA AVE,"205 Anne St SE, Atlanta, GA 30315, USA",-84.381531,33.712879,RANGE_INTERPOLATED street_address,131210000000,4676.0293,2023,205,,ANNA,AVE,,,,CHEEKS AVERY STANLEY,205 ANNA AVE,PALMETTO GA 30268,40,50080,8520,41560,125200,21300,103900,101,R3,,1.0,0.5,7022,,,,,07 351100620253,21899.537,600.40112,South Atlanta
2,12845,14741,07 351100620410,210 ANNA AVE,"210 Anne St SE, Atlanta, GA 30315, USA",-84.382027,33.713051,RANGE_INTERPOLATED street_address,131210000000,4652.165,2023,210,,ANNA,AVE,,,,KENNEDY WILLIAM EDWARD,210 ANNA AVE,PALMETTO GA 30268,40,51200,8520,42680,128000,21300,106700,101,R3,,1.0,0.5,7022,,,,,07 351100620410,21838.982,595.98987,South Atlanta
3,20645,23097,09F170900670015,235 MARGARET ST,"235 Margaret St SE, Atlanta, GA 30315, USA",-84.38134,33.711422,RANGE_INTERPOLATED street_address,131210000000,4838.6426,2023,235,,MARGARET,ST,,,,NGUYEN CHINH H,2451 CUMBERLAND PKWY SE # #3487,ATLANTA GA 30339,25,5560,5560,0,13900,13900,0,100,R3,,1.0,0.345,96215,,,,,09F170900670015,5900.1147,645.52936,South Atlanta


In [None]:
merged = sales_appended.merge(nbhrs, how='inner', left_on=['Parid'], right_on=['ParcelID'])

In [None]:
merged.head(3)

Unnamed: 0,Taxyr,Saledt: Year (YYYY),Saledt: Month (Mon),Taxdist,Parid,Nbhd,Class,Luc,Saledt,Book,Page,SALES PRICE,FAIR MARKET VALUE,DEED TYPE,Aprland,Aprbldg,Costval,Saleval,Who,Wen,GRANTOR,GRANTEE,Revcode,Reascd,Adrpre,Adrno,Adrdir,Adrstr,Adrsuf,Adrsuf2,Cityname,Unitno,Livunit,Calcacres,Zoning,Notecd1,Notecd2,Chgrsn,Cur,Cur.1,Whocalc,Wencalc,Saletype,Appraiser ID,Income,Bldgros V,Mscbld V,Val30 SUM,Areasum,FID,OBJECTID,ParcelID,Address,Match_addr,X,Y,MatchType_,FIPS,DistoCityc,TaxYear,AddrNumber,AddrPreDir,AddrStreet,AddrSuffix,AddrPosDir,AddrUntTyp,AddrUnit,Owner,OwnerAddr1,OwnerAddr2,TaxDist,TotAssess,LandAssess,ImprAssess,TotAppr,LandAppr,ImprAppr,LUCode,ClassCode,ExCode,LivUnits,LandAcres,NbrHood,Subdiv,SubdivNum,SubdivLot,SubdivBlck,FeatureID,Shape__Area,Shape__Length,neighborhood
0,2011,2010.0,Sep,65,07 230000570315,700,R3,101,07-SEP-2010,49440.0,112.0,69555,59700,DP,16700.0,43000.0,59700.0,5,TA_TJONES,28-OCT-2010,CRUSE WANDA M.,EVERHOME MORTGAGE COMPANY,,,,7495.0,,PHILLIPS,RD,,FUL,,1.0,1.0,AG1,,,RV,Y,Y,,,,,,,,,,11306,12766,07 230000570315,7495 PHILLIPS RD,"Phillips Dr SE, Atlanta, GA 30315, USA",-84.35376,33.699902,GEOMETRIC_CENTER route,131210000000,6819.2993,2023,7495,,PHILLIPS,RD,,,,HEPLER SEAN,7495 PHILLIPS RD,PALMETTO GA 30268,65,51560,6080,45480,128900,15200,113700,101,R3,,1.0,1.0,7001,,,,,07 230000570315,45103.582,1005.9736,Thomasville Heights
1,2011,2010.0,Sep,65,07 230000570315,700,R3,101,07-SEP-2010,50106.0,520.0,69555,59700,SW,16700.0,43000.0,59700.0,9,TA_LPRICE,16-JUN-2011,EVERHOME MORTGAG COMPANY,SECRETARY OF HOUSING & URBAN,,,,7495.0,,PHILLIPS,RD,,FUL,,1.0,1.0,AG1,,,RV,Y,Y,,,,,,,,,,11306,12766,07 230000570315,7495 PHILLIPS RD,"Phillips Dr SE, Atlanta, GA 30315, USA",-84.35376,33.699902,GEOMETRIC_CENTER route,131210000000,6819.2993,2023,7495,,PHILLIPS,RD,,,,HEPLER SEAN,7495 PHILLIPS RD,PALMETTO GA 30268,65,51560,6080,45480,128900,15200,113700,101,R3,,1.0,1.0,7001,,,,,07 230000570315,45103.582,1005.9736,Thomasville Heights
2,2013,2012.0,Feb,65,07 230000570315,700,R3,101,24-FEB-2012,50936.0,575.0,35000,52000,SW,12600.0,39400.0,52000.0,2,TA_RAUGUST,06-NOV-2012,SECRETARY OF HOUSING AND URBAN DEVELOPME,ALLEN DIANE M,,,,7495.0,,PHILLIPS,RD,,FUL,,1.0,1.0,AG1,,,MN,Y,Y,LP401,21-MAY-2013,,,,,,,,11306,12766,07 230000570315,7495 PHILLIPS RD,"Phillips Dr SE, Atlanta, GA 30315, USA",-84.35376,33.699902,GEOMETRIC_CENTER route,131210000000,6819.2993,2023,7495,,PHILLIPS,RD,,,,HEPLER SEAN,7495 PHILLIPS RD,PALMETTO GA 30268,65,51560,6080,45480,128900,15200,113700,101,R3,,1.0,1.0,7001,,,,,07 230000570315,45103.582,1005.9736,Thomasville Heights


In [None]:
merged.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2264 entries, 0 to 2263
Data columns (total 91 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   Taxyr                2264 non-null   string 
 1   Saledt: Year (YYYY)  2264 non-null   float64
 2   Saledt: Month (Mon)  2264 non-null   object 
 3   Taxdist              2264 non-null   object 
 4   Parid                2264 non-null   object 
 5   Nbhd                 2264 non-null   object 
 6   Class                2264 non-null   object 
 7   Luc                  2264 non-null   object 
 8   Saledt               2264 non-null   object 
 9   Book                 2264 non-null   object 
 10  Page                 2264 non-null   object 
 11  SALES PRICE          2264 non-null   object 
 12  FAIR MARKET VALUE    2264 non-null   object 
 13  DEED TYPE            2264 non-null   object 
 14  Aprland              2264 non-null   object 
 15  Aprbldg              2264 non-null   o

In [None]:
parid_taxyr_dup = merged.duplicated(subset=['Parid', 'Taxyr']).sum()
print("Number of rows with the same ['Parid', 'Taxyr']: ", parid_taxyr_dup)
print("Percent: ", parid_taxyr_dup / len(merged.index) * 100)

Number of rows with the same ['Parid', 'Taxyr']:  713
Percent:  31.492932862190813


In [None]:
parid_taxyr_dup = nbhrs.duplicated(subset=['ParcelID', 'TaxYear']).sum()
print("Number of rows with the same ['ParcelID', 'TaxYear']: ", parid_taxyr_dup)
print("Percent: ", parid_taxyr_dup / len(nbhrs.index) * 100)

Number of rows with the same ['ParcelID', 'TaxYear']:  0
Percent:  0.0


Suppose a parcel has one building, but it is a condo building and two of its units sold in the same year. We need to link each sale to the specific unit that was sold.

- Eliminate unneeded variables.
- Determine which variables change even within the same parcel; for the changing variables we want to keep, determine how to aggregate them.
- Use key: [Parid, Taxyr, SITUS Address, Unitno]?
- Continue to use Taxyr?
- Geocode.
- Owner address info is only in the parcel data, not the sales data, so it may not be updated at the same time as the sale.
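One way to explore the [Parid, Taxyr, address, unit] key idea is to build a normalized address-plus-unit string and check whether it separates sales that share [Parid, Taxyr]. `add_unit_key` is a hypothetical helper; it assumes address numbers are whole numbers and uses the sales-table column names (`Adrno`, `Adrstr`, `Unitno`). Whether address plus unit really identifies the sold unit is the open question above.

```python
import pandas as pd


def add_unit_key(df, adrno_col, adrstr_col, unit_col):
    """Return a copy of df with a normalized 'unit_key' column: '<number> <STREET> #<UNIT>'."""
    out = df.copy()
    adrno = out[adrno_col].astype("Int64").astype("string").fillna("")  # 8800.0 -> "8800"
    street = out[adrstr_col].astype("string").str.strip().str.upper().fillna("")
    unit = out[unit_col].astype("string").str.strip().str.upper().fillna("")
    out["unit_key"] = adrno + " " + street + " #" + unit
    return out


# Hypothetical: two condo units in the same building sold in the same tax year.
sales = pd.DataFrame({
    "Parid": ["C1", "C1"],
    "Taxyr": ["2015", "2015"],
    "Adrno": [8800.0, 8800.0],
    "Adrstr": ["dunwoody", "DUNWOODY "],
    "Unitno": ["101", "102"],
})
keyed = add_unit_key(sales, "Adrno", "Adrstr", "Unitno")
print(keyed["unit_key"].tolist())
print(keyed.duplicated(subset=["Parid", "Taxyr"]).sum(),
      keyed.duplicated(subset=["Parid", "Taxyr", "unit_key"]).sum())
```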

I need to:
- figure out a unique key for merging parcel and sales data that allows specific units in the same building to be sold separately;
- identify the variables that change within the same [Parid, Taxyr];
- figure out how to aggregate those variables when there are multiple buildings on one parcel;
- merge the data.
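The merged file read in below includes a `BuildingCount` column, which suggests building rows were aggregated to one row per parcel-year at some point. A sketch of one way to do that with `groupby(...).agg`; the aggregation rules (first owner, number of distinct buildings, summed area, earliest year built) are assumptions for illustration, and the data is made up.

```python
import pandas as pd

# Hypothetical parcel rows: parcel "A" has two buildings in 2015.
parcels = pd.DataFrame({
    "Parid": ["A", "A", "B"],
    "Taxyr": ["2015", "2015", "2015"],
    "Own1": ["SMITH", "SMITH", "JONES"],
    "Bldnum": [1, 2, 1],
    "Area": [1200, 800, 950],
    "Yrblt": [1980, 1995, 2001],
})

# Assumed rules: parcel-level columns take the first value; building-level columns are summarized.
parcel_level = (
    parcels.groupby(["Parid", "Taxyr"])
    .agg(
        Own1=("Own1", "first"),
        BuildingCount=("Bldnum", "nunique"),
        Area=("Area", "sum"),
        Yrblt=("Yrblt", "min"),
    )
    .reset_index()
)
print(parcel_level)
```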

How do we detect parcel IDs that change over time?

- Get geocoded full sales and parcel data


In [None]:
parcel_sales = pd.read_csv('../data/parcel_sales_fulton_merged.csv')


In [None]:
parcel_sales.head(3)

Unnamed: 0.1,Unnamed: 0,Taxyr,Parid,Nbhd_x,SITUS Adrpre,SITUS Adrno,SITUS Adrdir,SITUS Adrstr,SITUS Adrsuf,SITUS Adrsuf2,SITUS Cityname,Class_x,Luc_x,Livunit_x,Calcacres_x,Chgrsn_x,Taxdist_x,Own1,Own2,Careof,OWNER Adrno,OWNER Adrdir,OWNER Adrstr,OWNER Adrsuf,OWNER Adrsuf2,OWNER Cityname,Statecode,Country,Unitdesc,Unitno_x,Zip1,Zip2,Reascd_x,Spcflg,Aprland_x,Aprbldg_x,Revcode_x,Revreas,Revland,Revbldg,Revtot,Aprtot,Exmppct,Exmpval,Cur_x,Areasum_x,Bldnum,Effyr,Grade,Area,...,Structure,Stories,BuildingCount,Saledt: Year (YYYY),Saledt: Month (Mon),Taxdist_y,Nbhd_y,Class_y,Luc_y,Saledt,Book,Page,SALES PRICE,FAIR MARKET VALUE,DEED TYPE,Aprland_y,Aprbldg_y,Costval,Saleval,Who,Wen,GRANTOR,GRANTEE,Revcode_y,Reascd_y,Adrpre,Adrno,Adrdir,Adrstr,Adrsuf,Adrsuf2,Cityname,Unitno_y,Livunit_y,Calcacres_y,Zoning,Notecd1,Notecd2,Chgrsn_y,Cur_y,Cur.1,Whocalc,Wencalc,Saletype,Appraiser ID,Income,Bldgros V,Mscbld V,Val30 SUM,Areasum_y
0,0,2011,06 036300010858,C207,,8800.0,,DUNWOODY,PL,,SANDY SPRINGS,C5,2B1,210.0,18.17,MN,59,ROV VI LLC,,,11755.0,,WILSHIRE,BLVD,,LOS ANGELES,CA,,SUITE,1670,90025,,E1,,4200000.0,6150000.0,3.0,OVR,4200000.0,6150000.0,10350000.0,10350000.0,,0.0,Y,199611.0,1,1980.0,B,3317.0,...,211.0,1.0,19.0,2010.0,Nov,59,C207,C5,2B1,02-NOV-2010,49515.0,118.0,16053491,10350000,DP,4200000.0,6150000.0,14441300.0,M,TA_LPRICE,22-NOV-2010,ASLAN LEGACY KEY LLC,ROV VI LLC,3.0,E1,,8800.0,,DUNWOODY,PL,,FUL,,210.0,18.17,AC,,,MN,Y,Y,,,,,,,,,
1,1,2011,06 036300020121,C207,,8800.0,,DUNWOODY,PL,,SANDY SPRINGS,C5,2B1,142.0,11.33,MN,59,ROV VI LLC,,,11755.0,,WILSHIRE,BLVD,,LOS ANGELES,CA,,SUITE,1670,90025,,E1,,2840000.0,4160000.0,3.0,OVR,2840000.0,4160000.0,7000000.0,7000000.0,,0.0,Y,134916.0,1,1980.0,B,5552.0,...,211.0,1.0,8.0,2010.0,Nov,59,C207,C5,2B1,02-NOV-2010,49515.0,118.0,16053491,7000000,DP,2840000.0,4160000.0,9976100.0,M,TA_LPRICE,22-NOV-2010,ASLAN LEGACY KEY LLC,ROV VI LLC,3.0,E1,,8800.0,,DUNWOODY,PL,,FUL,REAR,142.0,11.33,AC,,,MN,Y,Y,,,,,,,,,
2,2,2011,06 036300040343,C207,,8613.0,,ROSWELL,RD,,SANDY SPRINGS,C3,3C3,0.0,0.98,MN,59,JAT PARTNERS LLLP,,,8613.0,,ROSWELL,RD,,SANDY SPRINGS,GA,,BLDG,4,30350,1896.0,E0,,1002200.0,465300.0,3.0,70,1002200.0,465300.0,1467500.0,1467500.0,,0.0,Y,6460.0,1,,B-,2870.0,...,353.0,1.0,2.0,2010.0,Mar,59,C207,C3,3C3,23-MAR-2010,48987.0,672.0,0,1467500,QC,1002200.0,465300.0,1467500.0,M,TA_LPRICE,30-SEP-2010,THOMAS JAMES STARK JR,JAT PARTNERS LLLP,3.0,E0,,8613.0,,ROSWELL,RD,,FUL,,0.0,0.98,C2,,,MN,Y,Y,,,,,,,,,


In [None]:
parcels_atl = pd.read_csv('../data/parcels_in_atl.csv')

In [None]:
parcels_atl = parcels_atl[['parcelid', 'x', 'y', 'NAME', 'geometry']]

In [None]:
parcels_atl.head(3)

Unnamed: 0,parcelid,x,y,NAME,geometry
0,06 033700050642,-84.403067,33.800487,Loring Heights,POINT (-84.403067 33.800487)
1,07 170001370069,-84.404044,33.792496,Loring Heights,POINT (-84.404044 33.792496)
2,07 170001370317,-84.404044,33.792496,Loring Heights,POINT (-84.404044 33.792496)
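
The Process notes recommend reading only the needed variables and declaring their types on read. The same approach applies to `parcels_in_atl.csv`. Instead of loading every column and then subsetting, `usecols` and `dtype` can do both steps at once. They also keep `parcelid` as text, matching the `Parid` key it is joined to later. Below is a minimal sketch that uses an in-memory stand-in for the file. The `extra` column and the leading unnamed index column are hypothetical, included only to show that they get skipped.

```python
import io
import pandas as pd

# Stand-in for '../data/parcels_in_atl.csv': two rows copied from the output above,
# plus an unnamed index column and an unused column that should be skipped.
csv_text = """,parcelid,x,y,NAME,geometry,extra
0,06 033700050642,-84.403067,33.800487,Loring Heights,POINT (-84.403067 33.800487),a
1,07 170001370069,-84.404044,33.792496,Loring Heights,POINT (-84.404044 33.792496),b
"""

cols = ["parcelid", "x", "y", "NAME", "geometry"]
parcels_atl = pd.read_csv(
    io.StringIO(csv_text),
    usecols=cols,  # only these columns are parsed
    dtype={
        "parcelid": "string",  # IDs stay text, never numbers
        "x": "float64",
        "y": "float64",
        "NAME": "category",    # few distinct neighborhood names, so category saves memory
        "geometry": "string",  # WKT text as stored in the CSV
    },
)
print(parcels_atl.dtypes)
```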


In [None]:
parcel_geo = parcel_sales.merge(parcels_atl, left_on='Parid', right_on='parcelid', how='left')

In [None]:
parcel_geo

Unnamed: 0.1,Unnamed: 0,Taxyr,Parid,Nbhd_x,SITUS Adrpre,SITUS Adrno,SITUS Adrdir,SITUS Adrstr,SITUS Adrsuf,SITUS Adrsuf2,SITUS Cityname,Class_x,Luc_x,Livunit_x,Calcacres_x,Chgrsn_x,Taxdist_x,Own1,Own2,Careof,OWNER Adrno,OWNER Adrdir,OWNER Adrstr,OWNER Adrsuf,OWNER Adrsuf2,OWNER Cityname,Statecode,Country,Unitdesc,Unitno_x,Zip1,Zip2,Reascd_x,Spcflg,Aprland_x,Aprbldg_x,Revcode_x,Revreas,Revland,Revbldg,Revtot,Aprtot,Exmppct,Exmpval,Cur_x,Areasum_x,Bldnum,Effyr,Grade,Area,...,Taxdist_y,Nbhd_y,Class_y,Luc_y,Saledt,Book,Page,SALES PRICE,FAIR MARKET VALUE,DEED TYPE,Aprland_y,Aprbldg_y,Costval,Saleval,Who,Wen,GRANTOR,GRANTEE,Revcode_y,Reascd_y,Adrpre,Adrno,Adrdir,Adrstr,Adrsuf,Adrsuf2,Cityname,Unitno_y,Livunit_y,Calcacres_y,Zoning,Notecd1,Notecd2,Chgrsn_y,Cur_y,Cur.1,Whocalc,Wencalc,Saletype,Appraiser ID,Income,Bldgros V,Mscbld V,Val30 SUM,Areasum_y,parcelid,x,y,NAME,geometry
0,0,2011,06 036300010858,C207,,8800.0,,DUNWOODY,PL,,SANDY SPRINGS,C5,2B1,210.0,18.170,MN,59,ROV VI LLC,,,11755.0,,WILSHIRE,BLVD,,LOS ANGELES,CA,,SUITE,1670,90025,,E1,,4200000.0,6150000.0,3.0,OVR,4200000.0,6150000.0,10350000.0,10350000.0,,0.0,Y,199611.0,1,1980.0,B,3317.0,...,59,C207,C5,2B1,02-NOV-2010,49515.0,118.0,16053491,10350000,DP,4200000.0,6150000.0,14441300.0,M,TA_LPRICE,22-NOV-2010,ASLAN LEGACY KEY LLC,ROV VI LLC,3.0,E1,,8800.0,,DUNWOODY,PL,,FUL,,210.0,18.170,AC,,,MN,Y,Y,,,,,,,,,,,,,,
1,1,2011,06 036300020121,C207,,8800.0,,DUNWOODY,PL,,SANDY SPRINGS,C5,2B1,142.0,11.330,MN,59,ROV VI LLC,,,11755.0,,WILSHIRE,BLVD,,LOS ANGELES,CA,,SUITE,1670,90025,,E1,,2840000.0,4160000.0,3.0,OVR,2840000.0,4160000.0,7000000.0,7000000.0,,0.0,Y,134916.0,1,1980.0,B,5552.0,...,59,C207,C5,2B1,02-NOV-2010,49515.0,118.0,16053491,7000000,DP,2840000.0,4160000.0,9976100.0,M,TA_LPRICE,22-NOV-2010,ASLAN LEGACY KEY LLC,ROV VI LLC,3.0,E1,,8800.0,,DUNWOODY,PL,,FUL,REAR,142.0,11.330,AC,,,MN,Y,Y,,,,,,,,,,,,,,
2,2,2011,06 036300040343,C207,,8613.0,,ROSWELL,RD,,SANDY SPRINGS,C3,3C3,0.0,0.980,MN,59,JAT PARTNERS LLLP,,,8613.0,,ROSWELL,RD,,SANDY SPRINGS,GA,,BLDG,4,30350,1896,E0,,1002200.0,465300.0,3.0,70,1002200.0,465300.0,1467500.0,1467500.0,,0.0,Y,6460.0,1,,B-,2870.0,...,59,C207,C3,3C3,23-MAR-2010,48987.0,672.0,0,1467500,QC,1002200.0,465300.0,1467500.0,M,TA_LPRICE,30-SEP-2010,THOMAS JAMES STARK JR,JAT PARTNERS LLLP,3.0,E0,,8613.0,,ROSWELL,RD,,FUL,,0.0,0.980,C2,,,MN,Y,Y,,,,,,,,,,,,,,
3,3,2011,06 036300040343,C207,,8613.0,,ROSWELL,RD,,SANDY SPRINGS,C3,3C3,0.0,0.980,MN,59,JAT PARTNERS LLLP,,,8613.0,,ROSWELL,RD,,SANDY SPRINGS,GA,,BLDG,4,30350,1896,E0,,1002200.0,465300.0,3.0,70,1002200.0,465300.0,1467500.0,1467500.0,,0.0,Y,6460.0,1,,B-,2870.0,...,59,C207,C3,3C3,25-OCT-2010,49497.0,567.0,0,1467500,QC,1002200.0,465300.0,1467500.0,M,TA_LPRICE,16-NOV-2010,THOMAS MICHAEL BARRY,JAT PARTNERS LLLP,3.0,E0,,8613.0,,ROSWELL,RD,,FUL,,0.0,0.980,C2,,,MN,Y,Y,,,,,,,,,,,,,,
4,4,2011,06 036300040343,C207,,8613.0,,ROSWELL,RD,,SANDY SPRINGS,C3,3C3,0.0,0.980,MN,59,JAT PARTNERS LLLP,,,8613.0,,ROSWELL,RD,,SANDY SPRINGS,GA,,BLDG,4,30350,1896,E0,,1002200.0,465300.0,3.0,70,1002200.0,465300.0,1467500.0,1467500.0,,0.0,Y,6460.0,1,,B-,2870.0,...,59,C207,C3,3C3,06-JUL-2010,49502.0,46.0,0,1467500,DP,1002200.0,465300.0,1467500.0,5,TA_LPRICE,16-NOV-2010,SMITH KIYOKO MIQUNO,ROTHMAN SUSAN M & MARC KANNE LLC,3.0,E0,,8613.0,,ROSWELL,RD,,FUL,,0.0,0.980,C2,,,MN,Y,Y,,,,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
29485,29485,2022,22 546012611444,C104,,3225.0,,WEBB BRIDGE,RD,,ALPHARETTA,C4,300,0.0,6.519,MN,10,ALPHARETTA HOTEL INVESTMENTS LLC,,,1000.0,,TOWNE CENTER,BLVD,,POOLER,GA,,STE,503,31322.0,,CO,,2281700.0,0.0,1.0,,,,,2281700.0,,0.0,Y,0.0,,,,,...,10,C104,C4,300,02-SEP-2021,64421.0,546.0,0,2281700,QC,2281700,0,2281700,T,TA_WBRITT,24-FEB-2022,GANDHI KRISHAN,ALPHARETTA HOTEL INVESTMENTS LLC,1.0,CO,,3225.0,,WEBB BRIDGE,RD,,ALPHARETTA,,0.0,6.519,OI,23,,MN,Y,Y,,,,313.0,0,,,2281700.0,0.0,,,,,
29486,29486,2022,22 546012620577,C104,,12150.0,,MORRIS,RD,,ALPHARETTA,C4,300,0.0,7.736,MN,10,MAYFAIR AT WEBB BRIDGE ASSOCIATION INC,,,26.0,,MILTON,AVE,,ALPHARETTA,GA,,,,30009.0,,H2,,105000.0,223300.0,3.0,OVR,105000.0,223300.0,328300.0,328300.0,,0.0,Y,0.0,,,,,...,10,C104,C4,300,17-AUG-2021,64421.0,116.0,0,328300,LW,105000,223300,3607800,M,TA_WBRITT,24-FEB-2022,"WEBB BRIDGE INVESTMENTS LLC, A DELAWARE",WEBB BRIDGE INVESTMENTS II LLC,3.0,H2,,12150.0,,MORRIS,RD,,ALPHARETTA,,0.0,7.736,OI,,,MN,Y,Y,,,,313.0,0,,,3607800.0,0.0,,,,,
29487,29487,2022,22 546012620577,C104,,12150.0,,MORRIS,RD,,ALPHARETTA,C4,300,0.0,7.736,MN,10,MAYFAIR AT WEBB BRIDGE ASSOCIATION INC,,,26.0,,MILTON,AVE,,ALPHARETTA,GA,,,,30009.0,,H2,,105000.0,223300.0,3.0,OVR,105000.0,223300.0,328300.0,328300.0,,0.0,Y,0.0,,,,,...,10,C104,C4,300,17-AUG-2021,64421.0,114.0,0,328300,LW,105000,223300,3607800,M,TA_JBANKS,08-MAR-2022,"WEBB BRIDGE INVESTMENTS LLC, A DELAWARE",MAYFAIR AT WEBB BRIDGE ASSOCIATION INC,3.0,H2,,12150.0,,MORRIS,RD,,ALPHARETTA,,0.0,7.736,OI,,,MN,Y,Y,,,,313.0,0,,,3607800.0,0.0,,,,,
29488,29488,2022,22 546012621484,C108,,12130.0,,MORRIS,RD,,ALPHARETTA,C3,300,0.0,0.560,MN,10,MAYFAIR AT WEBB BRIDGE ASSOCIATION INC,,,26.0,,MILTON,AVE,,ALPHARETTA,GA,,,,30009.0,,E2,,168500.0,0.0,3.0,OVR,168500.0,0.0,168500.0,168500.0,,0.0,Y,0.0,,,,,...,10,C108,C3,300,17-AUG-2021,64421.0,114.0,0,168500,LW,168500,0,168500,M,TA_JBANKS,08-MAR-2022,WEBB BRIDGE INVESTMENTS LLC,MAYFAIR AT WEBB BRIDGE ASSOCIATION INC,3.0,E2,,12130.0,,MORRIS,RD,,ALPHARETTA,,0.0,0.560,OI,,,MN,Y,Y,,,,313.0,0,,,168500.0,0.0,,,,,
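
In the merged output, the Sandy Springs and Alpharetta rows have empty `parcelid`, `x`, `y`, `NAME`, and `geometry`. This is presumably because `parcels_in_atl` covers only Atlanta parcels, so a left merge leaves those sales unmatched. Two `DataFrame.merge` options make this join easier to audit:

- `validate="many_to_one"` raises an error if a `parcelid` repeats in the right table, which would otherwise silently duplicate sales rows.
- `indicator=True` adds a `_merge` column that records whether each row matched.

Below is a sketch on toy stand-ins for `parcel_sales` and `parcels_atl`, using IDs from the outputs above.

```python
import pandas as pd

# Toy stand-ins for parcel_sales and parcels_atl.
parcel_sales = pd.DataFrame({
    "Parid": ["06 036300010858", "14 005700090046", "14 005700090046"],
    "SALES PRICE": [16053491, 160000, 0],
})
parcels_atl = pd.DataFrame({
    "parcelid": ["14 005700090046"],
    "NAME": ["South Atlanta"],
})

# validate="many_to_one" raises pandas.errors.MergeError if parcelid is not unique
# in parcels_atl; indicator=True adds a "_merge" column ("both" / "left_only").
parcel_geo = parcel_sales.merge(
    parcels_atl,
    left_on="Parid",
    right_on="parcelid",
    how="left",
    validate="many_to_one",
    indicator=True,
)

# How many sales rows did or did not receive neighborhood data.
print(parcel_geo["_merge"].value_counts())

# A left merge with a unique right-hand key should not add rows.
assert len(parcel_geo) == len(parcel_sales)
```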


In [None]:
parcel_geo[parcel_geo['NAME'] == 'South Atlanta']

Unnamed: 0.1,Unnamed: 0,Taxyr,Parid,Nbhd_x,SITUS Adrpre,SITUS Adrno,SITUS Adrdir,SITUS Adrstr,SITUS Adrsuf,SITUS Adrsuf2,SITUS Cityname,Class_x,Luc_x,Livunit_x,Calcacres_x,Chgrsn_x,Taxdist_x,Own1,Own2,Careof,OWNER Adrno,OWNER Adrdir,OWNER Adrstr,OWNER Adrsuf,OWNER Adrsuf2,OWNER Cityname,Statecode,Country,Unitdesc,Unitno_x,Zip1,Zip2,Reascd_x,Spcflg,Aprland_x,Aprbldg_x,Revcode_x,Revreas,Revland,Revbldg,Revtot,Aprtot,Exmppct,Exmpval,Cur_x,Areasum_x,Bldnum,Effyr,Grade,Area,...,Taxdist_y,Nbhd_y,Class_y,Luc_y,Saledt,Book,Page,SALES PRICE,FAIR MARKET VALUE,DEED TYPE,Aprland_y,Aprbldg_y,Costval,Saleval,Who,Wen,GRANTOR,GRANTEE,Revcode_y,Reascd_y,Adrpre,Adrno,Adrdir,Adrstr,Adrsuf,Adrsuf2,Cityname,Unitno_y,Livunit_y,Calcacres_y,Zoning,Notecd1,Notecd2,Chgrsn_y,Cur_y,Cur.1,Whocalc,Wencalc,Saletype,Appraiser ID,Income,Bldgros V,Mscbld V,Val30 SUM,Areasum_y,parcelid,x,y,NAME,geometry
949,949,2011,14 0056 LL0070,C901,,1293.0,,MARCY,ST,SE,ATLANTA,C3,2D1,16.0,0.5600,MN,05T,ARS MARCY STREET LLC,,,619.0,,EDGEWOOD,AVE,,ATLANTA,GA,,SUITE,300,30312,,S1,,24000.0,26000.0,3.0,OVR,24000.0,26000.0,50000.0,50000.0,,0.0,Y,9152.0,1,1950.0,C,2392.0,...,05T,C901,C3,2D1,21-APR-2010,48974.0,383.0,50000,120000,QC,24000.0,96000.0,466300.0,5,TA_EDASILV,28-MAR-2011,"O'HALLORAN, RECEIVER KEVIN T.",ARS MARCY STREET LLC,3.0,E1,,1293.0,,MARCY,ST,SE,ATL,,16.0,0.5600,R5,12.0,,MN,Y,Y,,,,,,,,,,14 0056 LL0070,-84.386669,33.719097,South Atlanta,POINT (-84.386669 33.719097)
950,950,2011,14 005700090046,C901,,1534.0,,JONESBORO,RD,SE,ATLANTA,I3,398,,0.4563,MN,5,BRANCH BANKING & TRUST COMPANY,,PROP TAX COMPLIANCE GROUP,,,P.O. BOX 167,,,WINSTON SALEM,NC,,,,27102,167,,,129000.0,68400.0,4.0,,143300.0,24500.0,167800.0,197400.0,,0.0,Y,7068.0,1,,D,1400.0,...,05,C901,I3,398,07-SEP-2010,46373.0,651.0,160000,197400,DP,129000.0,68400.0,163400.0,5,TA_TJONES,29-SEP-2010,LEGACY DEV INVESTMENT GROUP LLC,BRANCH BANKING & TRUST CO,4.0,,,1534.0,,JONESBORO,RD,SE,ATL,,,0.4563,I1,,,MN,Y,Y,,,,,,,,,,14 005700090046,-84.382321,33.712269,South Atlanta,POINT (-84.382321 33.712269)
952,952,2011,14 005700090293,C901,,1518.0,,JONESBORO,RD,SE,ATLANTA,C3,374,,0.2970,RV,5,LOU QINGXIANG &,WU HUIQIN,,400.0,W,PEACHTREE,ST,NW,ATLANTA,GA,,,#1411,30308,3548,SB,,43900.0,35100.0,3.0,,43900.0,35100.0,79000.0,79000.0,,0.0,Y,5856.0,1,,C-,640.0,...,05,C901,C3,374,12-AUG-2010,49344.0,616.0,79000,79000,WD,43900.0,35100.0,189700.0,0,TA_EDASILV,15-APR-2011,WEATHERBY JOYCE GWENDOLYN,LOU QINGXIANG &,3.0,SB,,1518.0,,JONESBORO,RD,SE,ATL,,,0.2970,I1,,,RV,Y,Y,,,,,,,,,,14 005700090293,-84.382460,33.712760,South Atlanta,POINT (-84.38246 33.71276)
2399,2399,2011,22 482512690078,C103,,41.0,,MILTON,AVE,,ALPHARETTA,C3,353,,0.1322,RV,10,SUMMIT SPRINGS FARM LLC,,,1385.0,,SUMMIT,RD,,ALPHARETTA,GA,,,,30004,,SB,,64300.0,225700.0,3.0,,64300.0,225700.0,290000.0,290000.0,,0.0,Y,6164.0,1,,C-,1726.0,...,10,C103,C3,353,05-MAR-2010,48888.0,488.0,0,290000,OT,64300.0,225700.0,529500.0,T,TA_SWILLIN,22-JUN-2010,"FDIC, AS RECEIVER FOR SECURITY BANK OF N",STATE BANK & TRUST COMPANY,3.0,SB,,41.0,,MILTON,AVE,,ALP,,,0.1322,AG,,,RV,Y,Y,,,,,,,,,,22 482512690078,-84.384690,33.711250,South Atlanta,POINT (-84.38469 33.71125)
2400,2400,2011,22 482512690078,C103,,41.0,,MILTON,AVE,,ALPHARETTA,C3,353,,0.1322,RV,10,SUMMIT SPRINGS FARM LLC,,,1385.0,,SUMMIT,RD,,ALPHARETTA,GA,,,,30004,,SB,,64300.0,225700.0,3.0,,64300.0,225700.0,290000.0,290000.0,,0.0,Y,6164.0,1,,C-,1726.0,...,10,C103,C3,353,05-MAR-2010,48888.0,493.0,290000,290000,LW,64300.0,225700.0,529500.0,RE,CLT,04-JAN-2011,STATE BANK AND TRUST COMPANY,SUMMIT SPRINGS FARM LLC,3.0,SB,,41.0,,MILTON,AVE,,ALP,,,0.1322,AG,,,RV,Y,Y,,,,,,,,,,22 482512690078,-84.384690,33.711250,South Atlanta,POINT (-84.38469 33.71125)
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
26299,26299,2022,14 005700090285,C901,,1515.0,,LAKEWOOD,AVE,SE,ATLANTA,C3,300,0.0,0.1647,MN,5,0 LAKEWOOD AVENUE LAND TRUST THE,,,2551.0,,HIDDEN HILLS,DR,,MARIETTA,GA,,,,30066.0,,,,24300.0,0.0,1.0,,,,,24300.0,,0.0,Y,0.0,,,,,...,05,C901,C3,300,14-JUL-2021,64191.0,501.0,0,24300,WD,24300,0,24300,G,TA_VCLARK,08-FEB-2022,TODD ROBERT E &,ESTATE OF HELEN MARIE KEMP,1.0,,,1515.0,,LAKEWOOD,AVE,SE,ATLANTA,,0.0,0.1647,R4A,22,,MN,Y,Y,LP401,01-MAR-2022,1,127.0,0,,,24300.0,0.0,14 005700090285,-84.381044,33.713048,South Atlanta,POINT (-84.381044 33.713048)
26300,26300,2022,14 005700120124,C901,,1553.0,,JONESBORO,RD,SE,ATLANTA,C3,373,,0.2755,RV,5,BAWRE REALTY LLC,,,1553.0,,JONESBORO,RD,SE,ATLANTA,GA,,,,30315.0,,CO,,102400.0,89600.0,4.0,,,,,192000.0,,0.0,Y,2224.0,1,,C-,780.0,...,05,C901,C3,373,01-JUN-2021,63868.0,326.0,0,192000,QC,102400,89600,185700,T,TA_BWILCOX,28-DEC-2021,MERKERSON PHYLLIS,BAWRE REALTY LLC,4.0,CO,,1553.0,,JONESBORO,RD,SE,ATLANTA,,,0.2755,R4,,,RV,Y,Y,,,,127.0,192000,,,192000.0,2224.0,14 005700120124,-84.381830,33.710667,South Atlanta,POINT (-84.38183 33.710667)
26301,26301,2022,14 005700120124,C901,,1553.0,,JONESBORO,RD,SE,ATLANTA,C3,373,,0.2755,RV,5,BAWRE REALTY LLC,,,1553.0,,JONESBORO,RD,SE,ATLANTA,GA,,,,30315.0,,CO,,102400.0,89600.0,4.0,,,,,192000.0,,0.0,Y,2224.0,1,,C-,780.0,...,05,C901,C3,373,01-JUN-2021,63868.0,322.0,200000,192000,LW,102400,89600,185700,0,TA_TBRIDGE,21-JAN-2022,MERKERSON PHYLLIS,BAWRE REALTY LLC,4.0,CO,,1553.0,,JONESBORO,RD,SE,ATLANTA,,,0.2755,R4,,,RV,Y,Y,,,2,127.0,192000,,,192000.0,2224.0,14 005700120124,-84.381830,33.710667,South Atlanta,POINT (-84.38183 33.710667)
27668,27668,2022,14 013100030695,C909,,1296.0,,MARTIN,ST,,EAST POINT,C3,300,0.0,0.3600,MN,20,16TH GROUP OF PROPERTIES LLC,,,2754.0,,BODDIE,PL,,DULUTH,GA,,,,30097.0,,CO,,65900.0,0.0,1.0,,65900.0,0.0,65900.0,65900.0,,0.0,Y,0.0,,,,,...,20,C909,C3,300,30-NOV-2021,64918.0,402.0,10000,65900,LW,65900,0,65900,P,TA_JSMASHU,25-APR-2022,"CAWTHON-HOLLUMS PROPERTIES, INC., A GEOR",16TH GROUP OF PROPERTIES LLC,1.0,CO,,1296.0,,MARTIN,ST,,EAST POINT,,0.0,0.3600,R4,,,MN,Y,Y,,,1,183.0,0,,,65900.0,0.0,14 013100030695,-84.383897,33.719087,South Atlanta,POINT (-84.383897 33.719087)
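
Some rows in the South Atlanta output have a `SITUS Cityname` other than ATLANTA. For example, parcel `22 482512690078` on Milton Ave is listed in ALPHARETTA, and parcel `14 013100030695` is listed in EAST POINT. These rows may point to a problem with the parcel IDs or coordinates in `parcels_in_atl`, and are worth checking before analysis. Below is a minimal sketch for flagging such rows. It assumes that every `NAME` value is a City of Atlanta neighborhood and that `SITUS Cityname` is reliable enough to compare against.

```python
import pandas as pd

# Toy stand-in for parcel_geo, using three rows from the South Atlanta output above.
parcel_geo = pd.DataFrame({
    "Parid": ["14 005700090046", "22 482512690078", "14 013100030695"],
    "SITUS Cityname": ["ATLANTA", "ALPHARETTA", "EAST POINT"],
    "NAME": ["South Atlanta", "South Atlanta", "South Atlanta"],
})

# Rows that received an (assumed) Atlanta neighborhood name but whose situs city
# is not Atlanta are candidates for a manual check of the join.
has_nbhd = parcel_geo["NAME"].notna()
city = parcel_geo["SITUS Cityname"].fillna("").str.strip().str.upper()
suspect = parcel_geo[has_nbhd & (city != "ATLANTA")]
print(suspect[["Parid", "SITUS Cityname", "NAME"]])
```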


In [None]:
# index=False avoids writing the row index as yet another "Unnamed: 0" column
parcel_geo.to_csv('parcel_sales_geo.csv', index=False)