
## 3. What property characteristics predict an "abnormal" sale?

---

The `SaleCondition` feature indicates the circumstances of the house sale. From the data file, we can see that the possibilities are:

       Normal	Normal Sale
       Abnorml	Abnormal Sale -  trade, foreclosure, short sale
       AdjLand	Adjoining Land Purchase
       Alloca	Allocation - two linked properties with separate deeds, typically condo with a garage unit	
       Family	Sale between family members
       Partial	Home was not completed when last assessed (associated with New Homes)
       
One of the executives at your company has an "in" with higher-ups at the major regional bank. His friends at the bank have made him a proposal: if he can reliably indicate what features, if any, predict "abnormal" sales (foreclosures, short sales, etc.), then in return the bank will give him first dibs on the pre-auction purchase of those properties (at a dirt-cheap price).

He has tasked you with determining (and adequately validating) which features of a property predict this type of sale. 

---

**Your task:**
1. Determine which features predict the `Abnorml` category in the `SaleCondition` feature.
- Justify your results.

This is a challenging task that tests your ability to perform classification analysis in the face of severe class imbalance. Simply running a classifier on the full dataset to predict the category may turn out to be useless: under severe class imbalance, classifiers tend to predict the majority class for every observation.

It is up to you to determine how you will tackle this problem. I recommend doing some research to find out how others have dealt with the problem in the past. Make sure to justify your solution. Don't worry about it being "the best" solution, but be rigorous.

Be sure to indicate which features are predictive (if any) and whether they are positive or negative predictors of abnormal sales.
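
One common remedy for the imbalance described above is to weight each class inversely to its frequency, so that mistakes on the rare class cost more. The sketch below computes such weights by hand with the `n_samples / (n_classes * class_count)` heuristic, which is the one scikit-learn documents for `class_weight='balanced'`. The label counts are made up for illustration and are not taken from the housing data.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}

# Hypothetical labels: 0 = abnormal sale (7 rows), 1 = normal sale (93 rows).
labels = [0] * 7 + [1] * 93
weights = balanced_class_weights(labels)

# The rare class receives a much larger weight than the common one.
for cls in sorted(weights):
    print(cls, round(weights[cls], 3))
```

With these counts an abnormal sale is weighted roughly 13 times as heavily as a normal one, which discourages a model from ignoring the minority class.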

In [104]:
import numpy as np
import scipy.stats as stats
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

sns.set_style('whitegrid')

%config InlineBackend.figure_format = 'retina'
%matplotlib inline

In [105]:
from sklearn.model_selection import train_test_split, KFold, cross_val_score, cross_val_predict
from sklearn.preprocessing import StandardScaler
from sklearn import metrics
from sklearn.linear_model import Lasso, ElasticNet, Ridge, LassoCV, ElasticNetCV, \
RidgeCV, LinearRegression, LogisticRegression, LogisticRegressionCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix, \
mean_squared_error

# conventional aliases: `sm` for the main API (sm.Logit is used below), `smf` for the formula API
import statsmodels.api as sm
import statsmodels.formula.api as smf

In [106]:
pd.set_option('display.max_columns', 100)
# retrieve dataframes
fixeddf = pd.read_pickle("./fixeddf.pkl")
locationdf = pd.read_pickle("./locationdf.pkl")
universaldf =pd.read_pickle("./universaldf.pkl")
variabledf = pd.read_pickle("./variabledf.pkl")
# retrieve stored y-residual variables
%store -r yrestrain
%store -r yrestest
%store -r FixedCoefs

In [107]:
# join all dfs together
dfs = [fixeddf, locationdf, universaldf, variabledf]
dfs = pd.concat(dfs, axis=1)
dfs.shape

(1448, 113)

In [108]:
dfs.SaleCondition.head()

0     Normal
1     Normal
2     Normal
3    Abnorml
4     Normal
Name: SaleCondition, dtype: object

In [109]:
# encode the target: 0 = abnormal sale, 1 = every other sale condition
dfs['SaleCondition'] = np.where(dfs['SaleCondition'].isin(['Abnorml']), 0, 1).astype(float)

In [110]:
# split data into train and test
X = dfs.drop('SaleCondition', axis=1)
Xtrain = X[X.YrSold != 2010]
y_train = dfs.SaleCondition[dfs.YrSold != 2010]
Xtest = X[X.YrSold == 2010]
y_test = dfs.SaleCondition[dfs.YrSold == 2010]

In [111]:
# standardize a chosen subset of columns: fit the scaler on train, then apply it to test
ss = StandardScaler()
def slicex(colcats):
    Xtrn = Xtrain[colcats.columns]
    Xtst = Xtest[colcats.columns]
    # fit on the training rows only so that no test-set information leaks into the scaling
    Xtrnstd = pd.DataFrame(ss.fit_transform(Xtrn), columns=colcats.columns)
    Xtststd = pd.DataFrame(ss.transform(Xtst), columns=colcats.columns)
    return Xtrnstd, Xtststd

In [112]:
X_train, X_test = slicex(X)
X_train.head()

Unnamed: 0,LotFrontage,LotArea,LandContour,LandSlope,YearRemodAdd,MasVnrArea,BsmtHeight,BsmtExposure,TotalBsmtSF,GrLivArea,BedroomAbvGr,KitchenAbvGr,Fireplaces,GarageYrBlt,GarageCars,GarageArea,TotBathAbvGr,BsmtTotBath,LotConfig_CulDSac,LotConfig_FR,LotConfig_Inside,MSSubClass_Multi,MSSubClass_Sone,MSSubClass_Stwo,LotShape_IR2,LotShape_IR3,LotShape_Reg,GarageType_Attchd,GarageType_Basment,GarageType_BuiltIn,GarageType_CarPort,GarageType_Detchd,GarageType_None,MasVnrType_None,MasVnrType_Stone,Foundation_CBlock,Foundation_Other,Foundation_PConc,Condition_PosNorm,Condition_RRNorm,Condition_RdNorm,MSZoning_RH,MSZoning_RL,MSZoning_RM,Neighborhood_Blueste,Neighborhood_BrDale,Neighborhood_BrkSide,Neighborhood_ClearCr,Neighborhood_CollgCr,Neighborhood_Crawfor,...,Neighborhood_Sawyer,Neighborhood_SawyerW,Neighborhood_Somerst,Neighborhood_StoneBr,Neighborhood_Timber,Neighborhood_Veenker,YrSold,SalePrice,SaleType_CWD,SaleType_Con,SaleType_ConLD,SaleType_ConLI,SaleType_ConLw,SaleType_New,SaleType_Oth,SaleType_WD,MoSold_spring,MoSold_summer,MoSold_winter,Functional,BsmtCond,OverallQual,CentralAir,GarageFinish,WoodDeckSF,FireplaceQu,ExterCond,BsmtFinType,OverallCond,HeatingQC,Electrical,GarageCond,OpenPorchSF,GarageQual,RoofShingle,ExterQual,Fence,PavedDrive,EnclosedPorch,KitchenQual,Exterior1st_CemntBd,Exterior1st_HdBoard,Exterior1st_MetalSd,Exterior1st_Other,Exterior1st_Plywood,Exterior1st_VinylSd,Exterior1st_Wd Sdng,Exterior1st_WdShing,RoofStyle_Hip,RoofStyle_Other
0,-0.181416,-0.199217,0.311295,0.22559,0.877459,0.501224,0.574134,-0.631694,-0.476784,0.381329,0.162216,-0.205229,-0.971296,0.291836,0.294166,0.340982,0.879224,1.131016,-0.260351,-0.191195,0.612372,-0.251661,-1.17688,1.507247,-0.167938,-0.084282,0.771362,0.823983,-0.105326,-0.253417,-0.074271,-0.607535,-0.237258,-1.213618,-0.307148,-0.857936,-0.147028,1.105892,-0.132453,-0.165455,-0.31322,-0.112687,0.515629,-0.424722,-0.039621,-0.109066,-0.214247,-0.138453,2.883378,-0.197707,...,-0.218218,-0.193387,-0.255164,-0.126189,-0.162938,-0.09325,0.428469,0.334857,-0.056077,-0.028006,-0.074271,-0.056077,-0.062721,-0.311709,-0.048545,0.39736,-0.606326,-0.891434,2.814796,0.233656,0.110115,0.641103,0.256901,0.313176,-0.755039,-1.027704,-0.247806,0.965343,-0.521959,0.87837,0.310193,0.262221,0.208047,0.259641,0.116201,1.053921,-0.478839,0.300995,-0.348543,0.7305,-0.208173,-0.423435,-0.419567,-0.191195,-0.272166,1.336423,-0.407875,-0.141365,-0.493624,-0.149786
1,0.466745,-0.087199,0.311295,0.22559,-0.434559,-0.569301,0.574134,2.240189,0.50011,-0.500427,0.162216,-0.205229,0.59952,0.231171,0.294166,-0.070599,0.699865,-0.624199,-0.260351,5.230254,-1.632993,-0.251661,0.849704,-0.663461,-0.167938,-0.084282,0.771362,0.823983,-0.105326,-0.253417,-0.074271,-0.607535,-0.237258,0.823983,-0.307148,1.165588,-0.147028,-0.904248,-0.132453,-0.165455,3.192645,-0.112687,0.515629,-0.424722,-0.039621,-0.109066,-0.214247,-0.138453,-0.346815,-0.197707,...,-0.218218,-0.193387,-0.255164,-0.126189,-0.162938,10.723805,-0.460517,-0.006381,-0.056077,-0.028006,-0.074271,-0.056077,-0.062721,-0.311709,-0.048545,0.39736,1.649278,-0.891434,-0.355266,0.233656,0.110115,-0.087345,0.256901,0.313176,1.654929,0.625953,-0.247806,0.274497,2.200191,0.87837,0.310193,0.262221,-0.718356,0.259641,0.116201,-0.708599,-0.478839,0.300995,-0.348543,-0.775345,-0.208173,-0.423435,2.383407,-0.191195,-0.272166,-0.748266,-0.407875,-0.141365,-0.493624,-0.149786
2,-0.051783,0.073522,0.311295,0.22559,0.828866,0.315521,0.574134,0.3256,-0.322791,0.530912,0.162216,-0.205229,0.59952,0.287342,0.294166,0.621605,0.879224,1.131016,-0.260351,-0.191195,0.612372,-0.251661,-1.17688,1.507247,-0.167938,-0.084282,-1.296407,0.823983,-0.105326,-0.253417,-0.074271,-0.607535,-0.237258,-1.213618,-0.307148,-0.857936,-0.147028,1.105892,-0.132453,-0.165455,-0.31322,-0.112687,0.515629,-0.424722,-0.039621,-0.109066,-0.214247,-0.138453,2.883378,-0.197707,...,-0.218218,-0.193387,-0.255164,-0.126189,-0.162938,-0.09325,0.428469,0.524433,-0.056077,-0.028006,-0.074271,-0.056077,-0.062721,-0.311709,-0.048545,0.39736,-0.606326,-0.891434,-0.355266,0.233656,0.110115,0.641103,0.256901,0.313176,-0.755039,0.625953,-0.247806,0.965343,-0.521959,0.87837,0.310193,0.262221,-0.080505,0.259641,0.116201,1.053921,-0.478839,0.300995,-0.348543,0.7305,-0.208173,-0.423435,-0.419567,-0.191195,-0.272166,1.336423,-0.407875,-0.141365,-0.493624,-0.149786
3,-0.397469,-0.092069,0.311295,0.22559,-0.726119,-0.569301,-0.565205,-0.631694,-0.717398,0.395106,0.162216,-0.205229,0.59952,0.280601,1.62995,0.780625,-1.093724,1.131016,-0.260351,-0.191195,-1.632993,-0.251661,-1.17688,1.507247,-0.167938,-0.084282,-1.296407,-1.213618,-0.105326,-0.253417,-0.074271,1.645996,-0.237258,0.823983,-0.307148,-0.857936,-0.147028,-0.904248,-0.132453,-0.165455,-0.31322,-0.112687,0.515629,-0.424722,-0.039621,-0.109066,-0.214247,-0.138453,-0.346815,5.057997,...,-0.218218,-0.193387,-0.255164,-0.126189,-0.162938,-0.09325,-1.349503,-0.530875,-0.056077,-0.028006,-0.074271,-0.056077,-0.062721,-0.311709,-0.048545,0.39736,-0.606326,-0.891434,2.814796,0.233656,1.911476,0.641103,0.256901,-0.809333,-0.755039,1.177172,-0.247806,0.274497,-0.521959,-0.174024,0.310193,0.262221,-0.186814,0.259641,0.116201,-0.708599,-0.478839,0.300995,4.114291,0.7305,-0.208173,-0.423435,-0.419567,-0.191195,-0.272166,-0.748266,2.451732,-0.141365,-0.493624,-0.149786
4,0.639588,0.366715,0.311295,0.22559,0.731679,1.342351,0.574134,1.282895,0.218591,1.341812,1.386997,-0.205229,0.59952,0.285095,1.62995,1.687973,0.879224,1.131016,-0.260351,5.230254,-1.632993,-0.251661,-1.17688,1.507247,-0.167938,-0.084282,-1.296407,0.823983,-0.105326,-0.253417,-0.074271,-0.607535,-0.237258,-1.213618,-0.307148,-0.857936,-0.147028,1.105892,-0.132453,-0.165455,-0.31322,-0.112687,0.515629,-0.424722,-0.039621,-0.109066,-0.214247,-0.138453,-0.346815,-0.197707,...,-0.218218,-0.193387,-0.255164,-0.126189,-0.162938,-0.09325,0.428469,0.859352,-0.056077,-0.028006,-0.074271,-0.056077,-0.062721,-0.311709,-0.048545,0.39736,-0.606326,-0.891434,2.814796,0.233656,0.110115,1.369552,0.256901,0.313176,0.797693,0.625953,-0.247806,0.965343,-0.521959,0.87837,0.310193,0.262221,0.557346,0.259641,0.116201,1.053921,-0.478839,0.300995,-0.348543,0.7305,-0.208173,-0.423435,-0.419567,-0.191195,-0.272166,1.336423,-0.407875,-0.141365,-0.493624,-0.149786


In [113]:
# creating odd list of K for KNN
myList = list(range(1,50,2))

# empty list that will hold cv scores
cv_scores = []

# perform 10-fold cross validation
for k in myList:
    knn = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(knn, X_train, y_train, cv=10, scoring='accuracy')
    cv_scores.append(scores.mean())

In [114]:
cv_scores[:10]


[0.8753672392513436,
 0.9263463160854893,
 0.9326271716035496,
 0.9318336184539433,
 0.9318336184539433,
 0.9318336184539433,
 0.9318336184539433,
 0.9318336184539433,
 0.9318336184539433,
 0.9318336184539433]

In [115]:
print('best score:', max(cv_scores))
# k = 5 gives the best mean CV accuracy (0.9326), barely above the 93.2% share of normal sales

best score: 0.9326271716035496


In [116]:
# loading library
from sklearn.neighbors import KNeighborsClassifier

# instantiate learning model (k = 5)
knn = KNeighborsClassifier(n_neighbors=5)

# fitting the model
knn.fit(X_train, y_train)

# predict the response
pred = knn.predict(X_test)

# evaluate accuracy = (TP+TN)/(P+N)
print(metrics.accuracy_score(y_test, pred))

0.9476744186046512


In [117]:
# calculate baseline: share of abnormal sales (coded 0) in the training set
y_train[y_train==0].count()/len(y_train)
# abnormal sales account for 6.8% of the training set, so always predicting
# "normal" would already score 93.2% accuracy

0.06818181818181818

In [118]:
cm = confusion_matrix(y_test, pred)
cr = classification_report(y_test, pred)
### View classification report 
print(cr)

             precision    recall  f1-score   support

        0.0       0.00      0.00      0.00         9
        1.0       0.95      1.00      0.97       163

avg / total       0.90      0.95      0.92       172





In [119]:
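# confusion matrix (rows = actual class, columns = predicted class)
pd.DataFrame(cm, index=['Abnormal','Normal'], columns=['Abnormal','Normal'])
# the model never predicts an abnormal sale: all 9 abnormal sales in the test set are
# classified as normal, so the 94.8% accuracy is simply the share of normal sales (163/172).
# this is probably due to the data being greatly imbalanced.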

          Abnormal  Normal
Abnormal         0       9
Normal           0     163
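
To make the problem with the KNN result concrete, the confusion matrix above can be turned into per-class precision and recall by hand. The helper below is a minimal sketch written for this note (`per_class_metrics` is not a library function); it assumes rows are actual classes and columns are predicted classes, as in the table above.

```python
def per_class_metrics(cm, labels):
    """cm[i][j] = number of samples whose true class is i and predicted class is j."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    out = {"accuracy": correct / total}
    for i, name in enumerate(labels):
        predicted = sum(row[i] for row in cm)  # column total: predicted as class i
        actual = sum(cm[i])                    # row total: truly class i
        out[name] = {
            "precision": cm[i][i] / predicted if predicted else 0.0,
            "recall": cm[i][i] / actual if actual else 0.0,
        }
    return out

# KNN confusion matrix from the cell above (rows = actual, columns = predicted).
knn_cm = [[0, 9],
          [0, 163]]
knn_metrics = per_class_metrics(knn_cm, ["Abnormal", "Normal"])
print(knn_metrics)
```

Accuracy comes out at 163/172, yet recall for the abnormal class is zero, which is why accuracy alone is a misleading score here.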


In [120]:
# implement the synthetic minority over-sampling technique (SMOTE) to address this.

from imblearn.over_sampling import SMOTE
smote = SMOTE(random_state=0)  # named `smote` to avoid shadowing the standard-library `os` module
columns = X_train.columns
os_data_X, os_data_y = smote.fit_resample(X_train, y_train)
os_data_X = pd.DataFrame(data=os_data_X, columns=columns)
os_data_y = pd.DataFrame(data=os_data_y, columns=['SaleCondition'])
# check the class counts after oversampling (0 = abnormal, 1 = normal)
print("length of oversampled data is ", len(os_data_X))
print("Number of abnormal sales in oversampled data", len(os_data_y[os_data_y['SaleCondition']==0]))
print("Number of normal sales in oversampled data", len(os_data_y[os_data_y['SaleCondition']==1]))
print("Proportion of abnormal sales in oversampled data is ", len(os_data_y[os_data_y['SaleCondition']==0])/len(os_data_X))
print("Proportion of normal sales in oversampled data is ", len(os_data_y[os_data_y['SaleCondition']==1])/len(os_data_X))


length of oversampled data is  2378
Number of abnormal sales in oversampled data 1189
Number of normal sales in oversampled data 1189
Proportion of abnormal sales in oversampled data is  0.5
Proportion of normal sales in oversampled data is  0.5
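
For intuition about where the extra minority rows come from, here is a minimal pure-Python sketch of the interpolation idea behind SMOTE: pick a minority point, pick one of its nearest minority neighbours, and place a new point somewhere on the segment between them. It is a simplified illustration, not imbalanced-learn's implementation; `smote_like` and the toy points are assumptions made up for this note.

```python
import math
import random

def smote_like(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    point and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # rank the other minority points by Euclidean distance to `base`
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(p, base))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Toy minority class: the four corners of the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_like(minority, n_new=6, k=2)
print(new_points)
```

Every synthetic point lies between two real minority points, so no new information is created. In this notebook the training set holds 87 abnormal and 1189 normal sales, so balancing the classes means 1102 of the 1189 abnormal rows above are synthetic.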


In [121]:
# ok, we have addressed the imbalance! 
# now run RFE to identify useful predictive features

from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
logreg = LogisticRegression(solver='liblinear')
rfe = RFE(logreg, n_features_to_select=15)
rfe = rfe.fit(os_data_X, os_data_y.values.ravel())
print(rfe.support_)
print(rfe.ranking_)

[False False False False  True False False False False  True False False
 False False False False False False False False False False  True  True
 False False False False False False False False False False False False
  True False  True False False False False False False False False False
 False False False False False False False False False False False  True
 False False False False False False False False False  True False False
 False False  True  True  True  True False False False False False False
 False False False  True False False False False False False False False
 False False False False False  True False False False False False  True
 False False False False]
[68 38 90 19  1 12 15  5  4  1  6 65 31 62 78 56 52 72 75 28 74  2  1  1
 88 17 93 59 61 60 23 58 57 53 30 35  1 80  1 85 76 92 10  9 54 33 48 36
 45 47 84 43 49 91 42 29 69 34 13  1 44 83 41 46 96 51 37 11  3  1 64 79
 55 81  1  1  1  1 98 20 67 21 97 77 63 95 40  1 22 14 82 18 87  8 39  7
 25 50 32 16 94  1 86 71 

In [122]:
rfetop = list(zip(X.columns, rfe.ranking_))
print(rfetop[:5])
# keep the features that RFE ranked 1, i.e. the 15 selected features
top15 = [name for name, rank in rfetop if rank == 1]
top15

[('LotFrontage', 68), ('LotArea', 38), ('LandContour', 90), ('LandSlope', 19), ('YearRemodAdd', 1)]


['YearRemodAdd',
 'GrLivArea',
 'MSSubClass_Sone',
 'MSSubClass_Stwo',
 'Foundation_Other',
 'Condition_PosNorm',
 'Neighborhood_NridgHt',
 'SalePrice',
 'SaleType_ConLw',
 'SaleType_New',
 'SaleType_Oth',
 'SaleType_WD',
 'FireplaceQu',
 'KitchenQual',
 'Exterior1st_VinylSd']

In [123]:
# run logistic regression
Xlr=os_data_X[top15]
ylr=os_data_y
logit_model=sm.Logit(ylr,Xlr)
result=logit_model.fit()
print(result.summary2())

         Current function value: inf
         Iterations: 35
                                           Results: Logit
Model:                           Logit                         Pseudo R-squared:              -inf   
Dependent Variable:              SaleCondition                 AIC:                           inf    
Date:                            2019-01-17 21:14              BIC:                           inf    
No. Observations:                2378                          Log-Likelihood:                -inf   
Df Model:                        14                            LL-Null:                       -1648.3
Df Residuals:                    2363                          LLR p-value:                   1.0000 
Converged:                       0.0000                        Scale:                         1.0000 
No. Iterations:                  35.0000                                                             
-----------------------------------------------------------------



In [124]:
top15

['YearRemodAdd',
 'GrLivArea',
 'MSSubClass_Sone',
 'MSSubClass_Stwo',
 'Foundation_Other',
 'Condition_PosNorm',
 'Neighborhood_NridgHt',
 'SalePrice',
 'SaleType_ConLw',
 'SaleType_New',
 'SaleType_Oth',
 'SaleType_WD',
 'FireplaceQu',
 'KitchenQual',
 'Exterior1st_VinylSd']

In [125]:
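# drop features whose p-values exceed 0.05
drop_cols = ['SaleType_WD', 'Foundation_Other','Condition_PosNorm','Neighborhood_NridgHt']

# build a new list rather than shadowing the builtin `str` and mutating in a comprehension
top15 = [c for c in top15 if c not in drop_cols]
top15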

['YearRemodAdd',
 'GrLivArea',
 'MSSubClass_Sone',
 'MSSubClass_Stwo',
 'SalePrice',
 'SaleType_ConLw',
 'SaleType_New',
 'SaleType_Oth',
 'FireplaceQu',
 'KitchenQual',
 'Exterior1st_VinylSd']

In [126]:

# run logistic regression
Xlr=os_data_X[top15]
ylr=os_data_y
logit_model=sm.Logit(ylr,Xlr)
result=logit_model.fit()
print(result.summary2())

         Current function value: 0.551536
         Iterations: 35
                               Results: Logit
Model:                   Logit               Pseudo R-squared:    0.204      
Dependent Variable:      SaleCondition       AIC:                 2645.1034  
Date:                    2019-01-17 21:14    BIC:                 2708.6175  
No. Observations:        2378                Log-Likelihood:      -1311.6    
Df Model:                10                  LL-Null:             -1648.3    
Df Residuals:            2367                LLR p-value:         3.0517e-138
Converged:               0.0000              Scale:               1.0000     
No. Iterations:          35.0000                                             
-----------------------------------------------------------------------------
                     Coef.    Std.Err.     z    P>|z|     [0.025     0.975]  
-----------------------------------------------------------------------------
YearRemodAdd          0.7849  



In [127]:
# drop the remaining features whose p-values exceed 0.05
drop_cols = ['MSSubClass_Stwo','SaleType_ConLw','SaleType_New','SaleType_Oth']

top15 = [c for c in top15 if c not in drop_cols]
top15

['YearRemodAdd',
 'GrLivArea',
 'MSSubClass_Sone',
 'SalePrice',
 'FireplaceQu',
 'KitchenQual',
 'Exterior1st_VinylSd']

In [128]:
# run logistic regression
Xlr=os_data_X[top15]
ylr=os_data_y
logit_model=sm.Logit(ylr,Xlr)
result=logit_model.fit()
print(result.summary2())

Optimization terminated successfully.
         Current function value: 0.602149
         Iterations 5
                          Results: Logit
Model:                Logit            Pseudo R-squared: 0.131     
Dependent Variable:   SaleCondition    AIC:              2877.8223 
Date:                 2019-01-17 21:14 BIC:              2918.2404 
No. Observations:     2378             Log-Likelihood:   -1431.9   
Df Model:             6                LL-Null:          -1648.3   
Df Residuals:         2371             LLR p-value:      2.4846e-90
Converged:            1.0000           Scale:            1.0000    
No. Iterations:       5.0000                                       
-------------------------------------------------------------------
                     Coef.  Std.Err.    z    P>|z|   [0.025  0.975]
-------------------------------------------------------------------
YearRemodAdd         0.6786   0.0654 10.3791 0.0000  0.5504  0.8067
GrLivArea           -0.6790   0.0855 -7.9

In [129]:
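# note: this split is taken from the SMOTE-balanced data, so the test fold contains synthetic rows
X_train, X_test, y_train, y_test = train_test_split(Xlr, ylr, test_size=0.2, random_state=0)
logreg = LogisticRegression(solver='liblinear')
# pass y as a 1-D array to avoid a column-vector conversion warning
logreg.fit(X_train, y_train.values.ravel())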



LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)

In [130]:
y_pred = logreg.predict(X_test)
print('Accuracy of logistic regression classifier on test set: {:.2f}'.format(logreg.score(X_test, y_test)))

Accuracy of logistic regression classifier on test set: 0.67


In [131]:
# store the result under a new name so the imported confusion_matrix function is not overwritten
cm = confusion_matrix(y_test, y_pred)
print(cm)

[[153  80]
 [ 77 166]]


In [132]:
print(classification_report(y_test, y_pred))

             precision    recall  f1-score   support

        0.0       0.67      0.66      0.66       233
        1.0       0.67      0.68      0.68       243

avg / total       0.67      0.67      0.67       476



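CONCLUSION: 
1) The target class is 0 (abnormal sale). Precision for that class is 67%: of the sales the classifier flagged as abnormal, 67% really were abnormal and the other 33% were normal sales flagged by mistake.
2) Recall is 66%: the classifier caught 66% of the sales that were actually abnormal and missed the remaining 34%.
3) This is much better than the KNN fitted on the original imbalanced data, which did not identify a single abnormal sale. Note, however, that these scores are measured on a 20% split of the SMOTE-balanced data, which includes synthetic rows, and not on the original 2010 test set.
4) The factors associated with abnormal sales are listed above (see regression summary). Because abnormal sales are coded 0 and normal sales 1, the coefficient signs describe the odds of a normal sale. A more recently remodelled home (`YearRemodAdd`, coefficient 0.68) and a higher `SalePrice` (positive coefficient) make a normal sale more likely, so they are negative predictors of an abnormal sale. More living space (`GrLivArea`, coefficient -0.68) is a positive predictor of an abnormal sale.

N.B. Features were removed by reading the p-values off each regression summary by eye. If the coefficients change for whatever reason (e.g. a different random seed for SMOTE or for the train/test split), the hard-coded lists may remove the wrong features.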
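
The manual p-value pruning described in the note above could be automated with a small backward-elimination loop. The sketch below is one possible approach, not the method used in this notebook: `backward_eliminate` is a hypothetical helper, and `fake_fit` with its made-up p-values stands in for a real model fit so that the example runs on its own.

```python
def backward_eliminate(features, fit_pvalues, alpha=0.05):
    """Repeatedly drop the feature with the largest p-value above alpha.

    fit_pvalues(features) must refit the model on exactly those features
    and return a {feature: p_value} dict.
    """
    kept = list(features)
    while kept:
        pvalues = fit_pvalues(kept)
        worst = max(kept, key=lambda f: pvalues[f])
        if pvalues[worst] <= alpha:
            break  # every remaining feature is significant at level alpha
        kept.remove(worst)
    return kept

# Stand-in for a real model fit: fixed, made-up p-values for illustration only.
FAKE_PVALUES = {"YearRemodAdd": 0.001, "GrLivArea": 0.003,
                "SaleType_WD": 0.40, "Foundation_Other": 0.75}

def fake_fit(features):
    return {f: FAKE_PVALUES[f] for f in features}

selected = backward_eliminate(list(FAKE_PVALUES), fake_fit)
print(selected)
```

With statsmodels, `fit_pvalues` could plausibly be something like `lambda cols: sm.Logit(os_data_y, os_data_X[cols]).fit(disp=0).pvalues.to_dict()`, assuming the fit converges. Dropping one feature per refit also avoids removing several correlated features at once on the basis of a single summary.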