# Do Duplexes Sell for Less per Square Foot than Single Family Homes?

## Hypotheses
Our null hypothesis is that duplexes do not sell for less per square foot than single family homes.
Our alternative hypothesis is that duplexes do sell for less per square foot than single family homes.

We want 95% confidence in our result, so we set our significance level, alpha, to .05.
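
Written formally, with $\mu_d$ and $\mu_s$ denoting the mean price per square foot of duplexes and of single family homes (symbols introduced here for illustration only), the two hypotheses can be stated as:

$$
H_0: \mu_d \ge \mu_s \qquad H_1: \mu_d < \mu_s
$$

This is a one-sided test, so only a sufficiently negative test statistic counts as evidence against $H_0$.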

#### Possible Errors:
If we make a type 1 error, we would claim that duplexes sell for less per square foot, when in reality they do not.

On the other hand, if we make a type 2 error, we would claim that they do not sell for less, when in fact they do.

In [3]:
# First we import the libraries we will be using.
import os
import sys
module_path = os.path.abspath(os.pardir)
if module_path not in sys.path:
    sys.path.append(module_path)
from src import data_download
import scipy.stats as stats
import statsmodels.stats.power as power
import pandas as pd
import numpy as np

## Load the data

In [4]:
DataFrames = data_download.get_dataframes()
sales = DataFrames['rp_sale']
parcels = DataFrames['parcel']
residences = DataFrames['res_bldg']

Successfully downloaded ZIP file
    https://aqua.kingcounty.gov/extranet/assessor/Parcel.zip
    
Successfully downloaded ZIP file
    https://aqua.kingcounty.gov/extranet/assessor/Residential%20Building.zip
    


Successfully downloaded ZIP file
    https://aqua.kingcounty.gov/extranet/assessor/Real%20Property%20Sales.zip
    


Successfully downloaded ZIP file
    https://aqua.kingcounty.gov/extranet/assessor/Lookup.zip
    


## Filter the data for the subset we want to explore


In [5]:
sales = sales[sales['DocumentDate'].astype(str).str.endswith('2019')]
sales = sales[(sales['SalePrice'] > 120000) & (sales['SalePrice'] < 3000000)]

#### Join the tables and extract the features we want to compare

In [6]:

# Duplexes: parcels with PresentUse code 3, joined to their sales and building records.
duplexs = parcels[parcels['PresentUse'] == 3]
duplexs = duplexs.merge(sales, on = ['Major','Minor']).merge(residences, on = ['Major','Minor'])
duplexs = duplexs[['SalePrice','SqFtTotLiving']]
duplexs['cost_per_sqft'] = duplexs.SalePrice / duplexs.SqFtTotLiving

# Single family homes: parcels with PresentUse code 2, processed the same way.
singlefamily = parcels[parcels['PresentUse'] == 2]
singlefamily = singlefamily.merge(sales, on = ['Major','Minor']).merge(residences, on = ['Major','Minor'])
singlefamily = singlefamily[['SalePrice','SqFtTotLiving']]
singlefamily['cost_per_sqft'] = singlefamily.SalePrice / singlefamily.SqFtTotLiving


### Single family vs duplex: sample size and sample means

In [7]:
sample1 = duplexs['cost_per_sqft']
sample2 = singlefamily['cost_per_sqft']
print(f'In 2019 {len(sample1)} duplexes were sold, and {len(sample2)} single family homes were sold.')

print(f'The mean cost per sqft of our samples for single family homes is {sample2.mean()}')
print(f'The mean cost per sqft of our samples for duplexes is {sample1.mean()}')

In 2019 262 duplexes were sold, and 24710 single family homes were sold.
The mean cost per sqft of our samples for single family homes is 365.5262182329487
The mean cost per sqft of our samples for duplexes is 432.95432652015114


A quick glance at the sample means suggests that duplexes in fact sell for more per square foot than single family homes. Let's test whether the difference is statistically significant, especially since our sample of duplexes is much smaller than our sample of single family homes.

## Testing for statistical significance

We will use a two-sample, one-tailed Welch's t-test to determine the statistical significance of the difference in means. Our critical t value tells us that we need a test statistic below -1.645 to conclude, at the 95% confidence level, that duplexes sell for less per square foot than single family homes. Equivalently, we are looking for a lower-tail p-value of .05 or less.

In [8]:
data_download.tt_ind(sample1, sample2, alpha = .05, equal_var = False, tails = 1)

critical stat is -1.6449146507800436, test stat is 3.255415005345168 with a pvalue of 0.0006402560157514741
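
For readers who want to see what goes into a Welch statistic, here is a minimal sketch using only the Python standard library. It is not the implementation of `data_download.tt_ind`, which is not shown in this notebook; the function name `welch_t` and the toy prices are hypothetical, and the normal quantile is used as a stand-in for the t quantile, which is only reasonable when the degrees of freedom are large.

```python
import math
from statistics import NormalDist, mean, variance

def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each sample mean (sample variance uses n - 1).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_stat, df

# Made-up prices per square foot, for illustration only.
group_a = [310.0, 295.0, 330.0, 342.0, 301.0, 288.0]
group_b = [350.0, 365.0, 340.0, 372.0, 358.0, 361.0, 349.0]

t_stat, df = welch_t(group_a, group_b)

# Lower-tail critical value at alpha = .05 (normal approximation to the t quantile).
critical = NormalDist().inv_cdf(0.05)

print(f'test stat {t_stat:.3f}, df {df:.1f}, critical value {critical:.3f}')
print('reject H0' if t_stat < critical else 'cannot reject H0')
```

The unequal-variance form matters here because the two groups differ greatly in size, so pooling their variances would let the larger group dominate the standard error.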


## We cannot reject the null hypothesis

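Our critical value is ~ -1.64. The test statistic would need to fall below it for us to conclude that duplexes sell for less per square foot than single family homes.

In fact, the test statistic is ~ 3.26, well above the critical value, so we cannot reject our null hypothesis. Note that the reported p-value of ~ .0006 is consistent with the upper tail of the distribution for a statistic of 3.26; it is not evidence for our alternative. For our lower-tail alternative the corresponding p-value is roughly 1 - .0006, or about .999.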
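
To make the direction of the test concrete, the sketch below computes both tail probabilities for the reported test statistic. It assumes the degrees of freedom are large enough for the standard normal to approximate the t distribution, which seems reasonable given the sample sizes but is an assumption, so the values are approximate.

```python
from statistics import NormalDist

t_stat = 3.255415005345168  # test statistic reported in the output above

# Normal approximation to the t distribution (assumes large degrees of freedom).
lower_tail = NormalDist().cdf(t_stat)  # P(T <= t): relevant if H1 is "duplexes sell for less"
upper_tail = 1 - lower_tail            # P(T >= t): relevant if H1 is "duplexes sell for more"

print(f'lower-tail p-value ~ {lower_tail:.4f}')
print(f'upper-tail p-value ~ {upper_tail:.4f}')
```

Only the lower-tail value speaks to our alternative hypothesis; a small upper-tail value points the other way.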

### What are the chances we are wrong?
Let's check the power of our test: the probability that we would detect a lower average price per square foot for duplexes if it were really there.

In [9]:
effect = data_download.cohen_d(sample1, sample2)

testpower = power.tt_ind_solve_power(alpha = .95, 
                         nobs1 = len(sample1), 
                         ratio = len(sample1) / len(sample2),
                         alternative = 'smaller',
                         effect_size = effect)

print(f'The power of our test to discover a lower square foot cost of duplexes, if it is there, is {testpower}')

The power of our test to discover a lower square foot cost of duplexes, if it is there, is 0.9478374526365154


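The reported value is just under .95, but it should be read with caution. The call above passes `alpha = .95` rather than our significance level of .05, and statsmodels defines `ratio` as the size of the second sample relative to the first (`nobs2 = nobs1 * ratio`), whereas the call passes `len(sample1) / len(sample2)`. The effect size is also computed from our observed samples, in which duplexes have the higher mean. As a result, this number does not tell us that we would miss a truly lower duplex price only about 5% of the time, and it does not give us ~ 95% confidence that we are not mistaken. The power of the test at alpha = .05, for a hypothesized negative effect size, still needs to be computed.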
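
For reference, here is a minimal sketch of how power could be estimated for a lower-tail test at alpha = .05 using a normal approximation. The function name `power_lower_tail` and the effect sizes are hypothetical, chosen for illustration; they are not estimates from the King County data. The formula assumes a pooled-variance two-sample test, so it only approximates the Welch test used above. In statsmodels terms it corresponds roughly to calling `tt_ind_solve_power` with `alpha = .05`, a negative `effect_size`, and `ratio = len(sample2) / len(sample1)`.

```python
import math
from statistics import NormalDist

def power_lower_tail(effect_size, n1, n2, alpha=0.05):
    """Approximate power of a one-sided (lower-tail) two-sample test.

    Normal approximation: the test statistic is treated as normal with mean
    effect_size * sqrt(n1 * n2 / (n1 + n2)) and unit variance.
    """
    z = NormalDist()
    critical = z.inv_cdf(alpha)  # reject H0 when the statistic falls below this
    shift = effect_size * math.sqrt(n1 * n2 / (n1 + n2))
    return z.cdf(critical - shift)

n_duplex, n_single = 262, 24710  # sample sizes reported above

# Hypothetical standardized effects (Cohen's d); negative means duplexes are cheaper.
for d in (-0.1, -0.2, -0.3):
    print(f'd = {d}: power ~ {power_lower_tail(d, n_duplex, n_single):.3f}')
```

With an effect size of zero the function returns alpha itself, which is a useful sanity check: when there is no true difference, the test rejects at exactly its false positive rate.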

## It's likely that duplexes do not sell for less per square foot than single family homes.

## Next steps

We recommend testing whether duplexes may in fact sell for more per square foot than single family homes, as our sample means suggest. If so, subdividing a home into a duplex may be a successful way for some homeowners to increase the value of their home.