# Real Estate Sales 2001-2016

The Office of Policy and Management maintains a listing of all real estate sales with a sales price of $2,000 or greater that occur between October 1 and September 30 of each year. For each sale record, the file includes: town, property address, date of sale, property type (residential, apartment, commercial, industrial or vacant land), sales price, and property assessment.

Source: <https://data.ct.gov/Housing-and-Development/Real-Estate-Sales-2001-2016/5mzw-sjtu>
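The October 1 to September 30 window means a sale recorded early in a calendar year belongs to the previous list year. A minimal sketch of that mapping follows. The helper name `list_year_for` is hypothetical, and the rule itself is an assumption inferred from the sample rows shown later in this notebook (for example, a sale recorded on 04/17/2002 carries `ListYear` 2001); the published data may contain exceptions.

```python
from datetime import datetime


def list_year_for(date_recorded: str) -> int:
    """Hypothetical helper: map a DateRecorded string to its list year.

    Assumes the list year runs from October 1 through September 30,
    so January-September sales belong to the previous calendar year.
    """
    recorded = datetime.strptime(date_recorded, "%m/%d/%Y %I:%M:%S %p")
    return recorded.year if recorded.month >= 10 else recorded.year - 1


# Dates taken from the sample rows in this notebook
print(list_year_for("10/04/2001 12:00:00 AM"))
print(list_year_for("04/17/2002 12:00:00 AM"))
```

A check like this could be used to spot rows whose `ListYear` disagrees with `DateRecorded` before aggregating.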

**INPUT**
Each row is an individual sale and includes the address, assessed value, and sale price.

**OUTPUT**
The processed dataset contains the number of sales per town and list year in each price bucket (<100k, <200k, <300k, <400k, 400k+), plus county-level totals.

**NOTES** We only care about Single Family homes without any Non-use codes applied to them.
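How sales that land exactly on a bucket boundary are assigned depends on whether the bins are closed on the left or the right. The sketch below uses made-up prices and the short bucket names from the OUTPUT description; `right=False` is one possible choice that makes each bucket include its lower edge, so a $100,000 sale falls in the `<200k` bucket rather than `<100k`.

```python
import pandas as pd

# Illustrative prices sitting on or next to bucket boundaries (not real data)
prices = pd.Series([99_999, 100_000, 399_999, 400_000])

edges = [0, 100_000, 200_000, 300_000, 400_000, float("inf")]
labels = ["<100k", "<200k", "<300k", "<400k", "400k+"]

# right=False makes every bin left-closed: [0, 100k), [100k, 200k), ...
buckets = pd.cut(prices, bins=edges, labels=labels, right=False)
print(buckets.tolist())
```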

In [83]:
import pandas as pd
import csv

In [6]:
raw = pd.read_csv('../raw/Real_Estate_Sales_2001-2016.csv', index_col=0, dtype=str)

In [120]:
town2fips = pd.read_csv('https://raw.githubusercontent.com/CT-Data-Collaborative/ct-town-county-fips-list/master/ct-town-county-fips-list.csv', dtype=str, index_col=0)

### Extract only relevant sales from the dataset

In [105]:
# Single Family homes with Non-Use code = 0
relevant = raw.copy(deep=True)
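# Treat a missing Non-Use code as '0'. Fill before casting: astype(str) can turn NaN into the string 'nan'
relevant['NonUseCode'] = relevant['NonUseCode'].fillna('0').astype(str)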

relevant = relevant[
    (relevant['ResidentialType'] == 'Single Family')
    & (relevant['NonUseCode'] == '0' )
]

# Sale Amount should be a positive float
relevant['SaleAmount'] = relevant['SaleAmount'].astype(float)
relevant = relevant[relevant['SaleAmount'] > 0]

# Remove all unnecessary columns
relevant = relevant.filter(['ListYear', 'Town', 'SaleAmount'])

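# Bins are left-closed (right=False) so that, e.g., a $100,000 sale
# lands in '$100,000 to $199,999' and not in 'Less than $100,000'
relevant['Price Range'] = pd.cut(
    relevant['SaleAmount'],
    [0, 100000, 200000, 300000, 400000, 10000000000],
    right=False,
    labels=[ 'Less than $100,000',
             '$100,000 to $199,999',
             '$200,000 to $299,999',
             '$300,000 to $399,999',
             '$400,000 and Over']
)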
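When treating a blank Non-Use code as `'0'`, the order of filling and casting matters, because casting a missing value to a string can produce the literal text `'nan'` in some pandas versions. A minimal sketch with made-up codes, assuming blanks should count as "no Non-Use code":

```python
import numpy as np
import pandas as pd

# Illustrative Non-Use codes, one of them missing (not real data)
codes = pd.Series(["0", np.nan, "25"], dtype=object)

# Fill the gap first, then cast, so the missing value becomes '0'
filled = codes.fillna("0").astype(str)

# Keep only rows with no Non-Use code applied
no_code = filled[filled == "0"]
print(filled.tolist())
print(len(no_code))
```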

In [121]:
relevant_counts = relevant.groupby(['Town', 'ListYear', 'Price Range']).size()
relevant_counts = relevant_counts.unstack(level=0,fill_value=0).stack().reset_index()

# Calculate Counties counts and append to the main dataframe
relevant_counts['County'] = relevant_counts['Town'].apply(lambda t: town2fips.loc[t]['County'])

# Sum the per-town sale counts (column 0); .count() would only count rows per group
counties = relevant_counts.groupby(['County', 'ListYear', 'Price Range'])[0].sum()
counties = counties.unstack(level=0,fill_value=0).stack().reset_index()
counties['Town'] = counties['County'].apply(lambda c: c + ' County')

relevant_counts_combined = pd.concat([relevant_counts, counties], sort=False)

# Add remaining columns
relevant_counts_combined['FIPS'] = relevant_counts_combined['Town'].apply(lambda t: town2fips.loc[t]['FIPS'])
relevant_counts_combined['Measure Type'] = 'Number'
relevant_counts_combined['Variable'] = 'Number of Home Sales'

relevant_counts_combined.to_csv('../data/single-family-home-sales-2001-2016.csv', index=False,
                       columns=['Town', 'FIPS', 'ListYear', 'Price Range', 'Measure Type', 'Variable', 0],
                       header=['Town/County', 'FIPS', 'Year', 'Price Range', 'Measure Type', 'Variable', 'Value'],
                       quoting=csv.QUOTE_NONNUMERIC)
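The county roll-up can be illustrated on a tiny table. The sketch below is not the notebook's exact code: the sale counts are made up, the column name `Value` and the `town_to_county` lookup are hypothetical, and only the town-to-county pairs come from the FIPS list used above. It maps each town to its county and sums the town counts within each county.

```python
import pandas as pd

# Made-up per-town sale counts for a single year and price bucket
town_counts = pd.DataFrame({
    "Town": ["Andover", "Avon", "Berlin"],
    "ListYear": ["2001", "2001", "2001"],
    "Price Range": ["Less than $100,000"] * 3,
    "Value": [3, 5, 7],
})

# Hypothetical lookup holding a few town -> county pairs
town_to_county = pd.Series(
    {"Andover": "Tolland", "Avon": "Hartford", "Berlin": "Hartford"}
)
town_counts["County"] = town_counts["Town"].map(town_to_county)

# Sum town counts within each county / year / price bucket
county_counts = town_counts.groupby(
    ["County", "ListYear", "Price Range"], as_index=False
)["Value"].sum()
print(county_counts)
```

Using `sum()` here adds up sales across towns, whereas `count()` would report how many town rows each county has.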

In [71]:
town2fips

| Town | County | FIPS |
|---|---|---|
| Andover | Tolland | 0901301080 |
| Ansonia | New Haven | 0900901220 |
| Ashford | Windham | 0901501430 |
| Avon | Hartford | 0900302060 |
| Barkhamsted | Litchfield | 0900502760 |
| Beacon Falls | New Haven | 0900903250 |
| Berlin | Hartford | 0900304300 |
| Bethany | New Haven | 0900904580 |
| Bethel | Fairfield | 0900104720 |
| Bethlehem | Litchfield | 0900504930 |


In [92]:
raw[raw['ListYear'] == '2001']

| ID | SerialNumber | ListYear | DateRecorded | Town | Address | AssessedValue | SaleAmount | SalesRatio | PropertyType | ResidentialType | NonUseCode | Remarks |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 684 | 10173 | 2001 | 04/17/2002 12:00:00 AM | Ansonia | 1-3 EAGLE ST | 63630 | 116000 | 54.8534482758621 | Residential | Two Family | 0 | |
| 694 | 10005 | 2001 | 10/04/2001 12:00:00 AM | Ansonia | 1 CRESTWOOD RD | 76370 | 160000 | 47.73125 | Residential | Single Family | 0 | |
| 697 | 10253 | 2001 | 06/18/2002 12:00:00 AM | Ansonia | 1 DAVIES CT | 97720 | 180000 | 54.2888888888889 | Residential | Single Family | 0 | |
| 698 | 10094 | 2001 | 01/17/2002 12:00:00 AM | Ansonia | 1 DOREL TER | 110600 | 259900 | 42.5548287803001 | Residential | Single Family | 0 | |
| 710 | 10100 | 2001 | 01/30/2002 12:00:00 AM | Ansonia | 1 JAMES ST | 63210 | 132000 | 47.8863636363636 | Residential | Single Family | 0 | |
| 715 | 10268 | 2001 | 06/27/2002 12:00:00 AM | Ansonia | 1 LESTER ST | 82530 | 74500 | 110.778523489933 | Residential | Two Family | 0 | |
| 733 | 10012 | 2001 | 10/11/2001 12:00:00 AM | Ansonia | 1 WESTBROOK AVE | 74830 | 131000 | 57.1221374045802 | Residential | Two Family | 0 | |
| 738 | 10115 | 2001 | 02/22/2002 12:00:00 AM | Ansonia | 10-12 CLIFTON AVE | 60550 | 20000 | 302.75 | Residential | Single Family | 25 | |
| 739 | 10187 | 2001 | 04/29/2002 12:00:00 AM | Ansonia | 10-12 HALL ST | 87710 | 168000 | 52.2083333333333 | Residential | Single Family | 0 | |
| 741 | 10337 | 2001 | 09/03/2002 12:00:00 AM | Ansonia | 10-12 PARKER ST | 112630 | 186500 | 60.3914209115282 | Residential | Two Family | 0 | |
