### Classifying Housing Prices

Objective: Given a data set of housing/property prices, create a model that can predict whether a house/property will have a low or high price.

#### How did I operationalize my target?
The original data set contained a price column. I ran some basic descriptive statistics and decided that the median price would serve as a reasonable midpoint for dividing housing prices into a low and a high category. From there, I created an "is_high" column that served as the target by assigning 0 to rows with a price below $273,500 and 1 to rows with a price equal to or above that figure.
(Note: see Jupyter notebook project4_data for a more detailed look at this data transformation process.)
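
As a minimal sketch of the median split described above, the threshold can be computed from the data rather than typed in by hand. The prices below are toy values chosen for illustration; only the 273500.0 cut-off and the `is_high` name come from this project.

```python
import pandas as pd

# Toy prices for illustration only; the real notebook uses df["price"].
prices = pd.Series([150000.0, 200000.0, 273500.0, 400000.0, 1299000.0])

# Derive the cut-off from the data instead of hard-coding 273500.0.
threshold = prices.median()

# 1 for prices at or above the median, 0 for prices below it.
is_high = (prices >= threshold).astype(int)

print(threshold)
print(is_high.tolist())
```

Computing the median in code keeps the target consistent if rows are added or dropped later.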

#### How did I choose which data to include?
Using a combination of techniques (see Jupyter notebook project4_eda), I determined that many of the variables were not strongly correlated with housing price. Baths had the strongest correlation in either direction (in this case positive), followed by square feet, both of which make sense; surprisingly, beds were only slightly correlated.

A few features had a clear relationship with price. Zip code and the status of the house were somewhat negatively correlated with price. (Although, as noted below, zip code had no effect when I modeled it, so I may have done something wrong there.) New construction, something typically highly coveted in today's housing market, was somewhat positively correlated with a higher price. Condos were also highly correlated with a higher price. Auction and foreclosure properties were both negatively correlated with price, as would be expected since those almost always sell under market price. Most other variables appeared to have little to no impact.

In a Seaborn heatmap of the value columns of the transformed data, there was very little correlation among the features, with the exception of beds, baths, and square feet, which were correlated with each other.
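
The correlation check behind these observations can be sketched roughly as follows. The six rows are invented for illustration and are not taken from the housing data; the column names mirror the ones used in this project.

```python
import pandas as pd

# Invented rows, for illustration only.
toy = pd.DataFrame({
    "beds":    [1, 2, 3, 4, 2, 3],
    "baths":   [1.0, 2.0, 2.0, 3.0, 1.0, 3.0],
    "sqft":    [700, 1100, 1500, 2400, 900, 2000],
    "is_high": [0, 0, 1, 1, 0, 1],
})

# Pairwise Pearson correlations between every pair of columns.
corr = toy.corr()

# Correlation of each feature with the target, strongest first.
target_corr = corr["is_high"].drop("is_high").sort_values(ascending=False)
print(target_corr)

# To draw the heatmap (requires seaborn):
# import seaborn as sns; sns.heatmap(corr, annot=True)
```

Reading the `is_high` column of the matrix shows which features track the target, while the off-diagonal feature pairs show which features overlap with each other.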

I tried three basic models. Model 1 used beds, baths, and square feet. Model 2 used those plus zip code (again, I think I need to redo this one). Model 3 used the number of beds, number of baths, square footage, zip code, and dummy variables for all of the housing types.
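
One way to compare the three feature sets side by side is to loop over them with the same estimator and cross-validation setup. This is a sketch on randomly generated data, not the project's data: the rows, the rule used to build the toy target, and the `feature_sets` and `results` names are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Randomly generated stand-in data (assumption, not the real listings).
rng = np.random.default_rng(0)
n = 300
toy = pd.DataFrame({
    "beds": rng.integers(0, 6, n),
    "baths": rng.integers(1, 5, n).astype(float),
    "sqft": rng.integers(400, 4000, n),
    "zip_code": rng.choice([60601, 60614, 60827], n),
    "status": rng.choice(["Condo For Sale", "House For Sale", "Foreclosure"], n),
})
# Toy target loosely driven by size and baths, plus noise.
score = toy["sqft"] + 500 * toy["baths"] + rng.normal(0, 500, n)
target = (score > 3400).astype(int)

# Dummy columns for the categorical status, dropping one level.
status_dummies = pd.get_dummies(toy["status"], drop_first=True).astype(int)
full = pd.concat([toy.drop(columns="status"), status_dummies], axis=1)

feature_sets = {
    "model_1": ["beds", "baths", "sqft"],
    "model_2": ["beds", "baths", "sqft", "zip_code"],
    "model_3": list(full.columns),
}

# Mean 5-fold accuracy for each feature set, using the same estimator settings.
results = {}
for name, cols in feature_sets.items():
    model = LogisticRegression(solver="liblinear")
    results[name] = cross_val_score(model, full[cols], target, cv=5).mean()

print(results)
```

Keeping the estimator and the folds identical across feature sets means any difference in the scores comes from the features themselves.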

Although I didn't try this, some additional improvement in the model might have come from doing something with the address data--trying to further break down each zip code into smaller neighborhoods.

#### How did I transform my data?
First I dropped all the rows with null values. Most of these were in housing-type categories that had small numbers of properties, and the remaining categories covered all the relevant types of properties (e.g. one that got dropped was "coming soon", which only had 15 rows in the dataset and obviously doesn't have complete data since it's not final yet). Due to dropping rows with empty values, the number of housing types dropped from 15 to 12.
Then some columns needed to be converted from strings into numbers (note: I'm sure there's a way to combine all the .replace lines, but I couldn't figure it out).
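
One possible way to combine the `.replace` lines for the price column is a single helper that strips the punctuation with one regular expression and then handles a K or M suffix by multiplying. This is a sketch, not what the notebook does: `parse_price` is a hypothetical helper, and the sample strings other than "$1,299,000" are assumed formats.

```python
import re

# Multipliers for the suffixes seen in listing prices (assumed formats).
_SUFFIX = {"K": 1_000, "M": 1_000_000}

def parse_price(text):
    """Convert a listing price string such as "$1,299,000" to a float."""
    # Remove "$", ",", "+" and any whitespace in one pass.
    cleaned = re.sub(r"[$,+\s]", "", text)
    multiplier = 1
    # Turn a trailing K or M into a multiplier instead of appending zeros,
    # so that decimal amounts like "1.5M" are handled as well.
    if cleaned and cleaned[-1].upper() in _SUFFIX:
        multiplier = _SUFFIX[cleaned[-1].upper()]
        cleaned = cleaned[:-1]
    return float(cleaned) * multiplier

print(parse_price("$1,299,000"))
print(parse_price("$1.5M"))
```

With pandas, the helper could be applied as `df["price"].apply(parse_price)`. An empty or non-numeric string raises `ValueError`, which surfaces unexpected formats instead of silently converting them.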

The "bed_bath" column needed to be split on the " · " divider and then cleaned quite a bit: dropping the non-number parts of the values, converting things like "--" to 0, and converting the word "Studio" to 0.
Then dummy columns needed to be made for all the categorical variables (remembering to drop one).
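
The "bed_bath" cleaning steps can also be sketched as one small parsing function. `parse_bed_bath` and `_to_number` are hypothetical helpers, and the second sample string is an assumed format; the first is taken from the data preview.

```python
def _to_number(token):
    """Turn a token like "2,700" or "--" into a float, treating "--" as 0."""
    token = token.replace(",", "").replace("+", "")
    if token in ("--", ""):
        return 0.0
    return float(token)

def parse_bed_bath(text):
    """Split "3 bds · 4 ba · 2,700 sqft" into (beds, baths, sqft)."""
    # Split on the middle-dot divider and trim the surrounding spaces.
    beds_part, baths_part, sqft_part = [p.strip() for p in text.split("·")]
    # "Studio" means zero bedrooms; otherwise take the leading number.
    if beds_part.lower() == "studio":
        beds = 0.0
    else:
        beds = _to_number(beds_part.split()[0])
    baths = _to_number(baths_part.split()[0])
    sqft = _to_number(sqft_part.split()[0])
    return beds, baths, sqft

print(parse_bed_bath("3 bds · 4 ba · 2,700 sqft"))
```

A value that does not split into exactly three parts (such as the lot-only rows) raises `ValueError`, so those rows still need to be filtered out first.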

The only column from the original data set that stayed relevant without changes was the zip code. On further thought, though, I think it also needs to be turned into categorical variables with get_dummies.
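
A minimal sketch of that zip code treatment, assuming the column is cast to string first so pandas treats it as a category rather than a number. The four rows and the `zip_dummies` and `features` names are illustrative only.

```python
import pandas as pd

# Illustrative rows; zip codes are stored as integers, as in the notebook.
df = pd.DataFrame({
    "zip_code": [60601, 60601, 60827, 60614],
    "beds": [3, 1, 4, 2],
})

# Cast to string so each zip code becomes its own category,
# then drop the first level to avoid a redundant column.
zip_dummies = pd.get_dummies(df["zip_code"].astype(str), prefix="zip", drop_first=True)

# Replace the numeric zip_code column with the dummy columns.
features = pd.concat([df.drop(columns="zip_code"), zip_dummies], axis=1)

print(list(features.columns))
```

With one column per zip code, the model can learn a separate effect for each area instead of a single slope across the zip code number.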

Then I created a few different dataframes to use as matrices for modeling: a df_bbsplit one that had the beds, baths, and square feet, one that merged bbsplit with the zip code, and one that had everything including the categorical variables.
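
When joining these pieces, the row labels have to line up, because the rows dropped earlier leave gaps in the original index. The sketch below illustrates the problem and one fix using `reset_index(drop=True)`; the three rows mirror the first complete listings in the data preview, and `misaligned` and `aligned` are illustrative names.

```python
import pandas as pd

# Index with a gap, as left behind after dropping the row labelled 2.
df = pd.DataFrame({"zip_code": [60601, 60601, 60601]}, index=[0, 1, 3])

# The split-out columns were rebuilt from a list, so they are labelled 0, 1, 2.
df_bbsplit = pd.DataFrame({
    "beds": [3, 1, 2],
    "baths": [4.0, 1.0, 3.0],
    "sqft": [2700, 850, 1902],
})

# Joining on mismatched labels produces extra rows filled with NaN.
misaligned = pd.concat([df[["zip_code"]], df_bbsplit], axis=1)

# Resetting the index first lines the rows up by position.
aligned = pd.concat([df[["zip_code"]].reset_index(drop=True), df_bbsplit], axis=1)

print(misaligned.shape, aligned.shape)
```

Checking the shape and the NaN count after each join is a quick way to confirm that no rows were shifted or duplicated.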


#### What are my findings?

One of my interesting findings is that just the beds, baths, and square feet of a property are reasonably predictive of the price. That model scored .76 without cross-validation (i.e. scored on the same data it was fit on), and this was fairly consistent across various tools: cross_val_score gave me 0.79724656, 0.75516594, and 0.7100814; the folds in GridSearchCV scored between about .71 and .80; and the best estimator scored .76 using an L2 penalty with C=0.1.
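
For reference, the difference between per-fold scores and the grid search summary can be sketched as follows. The data comes from scikit-learn's `make_classification` and is a stand-in, not the housing data; the hyperparameter grid matches the one used in this notebook, and `cv=5` is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score

# Synthetic three-feature stand-in for beds, baths, and sqft.
X, y = make_classification(n_samples=300, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)

# liblinear supports both the l1 and the l2 penalty.
model = LogisticRegression(solver="liblinear")

# One accuracy score per fold.
fold_scores = cross_val_score(model, X, y, cv=5)

# Cross-validated search over penalty type and regularization strength.
grid = GridSearchCV(model, {"penalty": ["l1", "l2"], "C": [0.1, 1.0, 10]}, cv=5)
grid.fit(X, y)

print(fold_scores)
print(grid.best_params_, grid.best_score_)
```

`best_score_` is the mean cross-validated score of the winning parameter combination, which is usually a fairer summary than calling `.score()` on the data the estimator was fit on.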

Since the Seaborn heatmap and correlating the columns with one another showed a high correlation between beds, baths, and square feet (as would be expected), I also tried a model that used just beds, but that turned out much worse (.54). And because baths had somewhat of a correlation with the target, I tried just baths. This scored .66, so it was interesting that baths alone were fairly predictive of property price.

Adding in the zip codes, surprisingly, changed almost nothing (the score stayed at about .758). I'm wondering if something's wrong, because in the correlation EDA zip code had a correlation of -0.29 with the target, so it's strange that the score barely moves. Zip code went into the model as a single numeric column, so the regression treats it as an ordered quantity rather than a set of separate areas; I think I need to do a dummy variables treatment on zip codes.

Using the categorical values improved the model a small amount, from .76 to .78 with the default regression, and the best estimator model also scored .78.

#### How strong do I think my model is?

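Pretty strong for such a simple feature set. The best model scored about .78 on the data it was trained on, and its cross-validation folds ranged from roughly .70 to .82 (see above for scores). Because `.score()` was run on the training data, the cross-validated numbers are the better guide to how the model would do on new listings.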
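
One way to put a number on model strength is to hold out a test set and compare against a majority-class baseline. The sketch below uses synthetic data from `make_classification`, not the housing data, and the 25% test split and variable names are assumptions; `confusion_matrix` is imported in the notebook but not otherwise used there.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in data.
X, y = make_classification(n_samples=400, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)

# Hold out 25% of the rows; stratify keeps the low/high balance in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Baseline that always predicts the most common class.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
model = LogisticRegression(solver="liblinear").fit(X_train, y_train)

baseline_acc = baseline.score(X_test, y_test)
test_acc = model.score(X_test, y_test)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, model.predict(X_test))

print(baseline_acc, test_acc)
print(cm)
```

With a median split the baseline sits near .5, so the gap between the baseline and the held-out accuracy shows how much the features actually contribute.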

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
%matplotlib inline

In [2]:
df = pd.read_csv("housing.csv")
df.head()

Unnamed: 0,address,bed_bath,more_info,price,status,zip_code
0,"360 E Randolph St # 601-602, Chicago, IL","3 bds · 4 ba · 2,700 sqft",http://www.zillow.com/homedetails/360-E-Randol...,"$1,299,000",Condo For Sale,60601
1,"8 E Randolph St UNIT 1006, Chicago, IL",1 bd · 1 ba · 850 sqft,http://www.zillow.com/homedetails/8-E-Randolph...,"$324,900",Condo For Sale,60601
2,,,,,,60601
3,"340 E Randolph St APT 704, Chicago, IL","2 bds · 3 ba · 1,902 sqft",http://www.zillow.com/homedetails/340-E-Randol...,"$1,099,000",Condo For Sale,60601
4,"420 E Waterside Dr UNIT 310, Chicago, IL","2 bds · 3 ba · 1,500 sqft",http://www.zillow.com/homedetails/420-E-Waters...,"$567,770",Condo For Sale,60601


### Data Cleaning

In [3]:
#drops rows with NaN values
df = df.dropna()
#gets rid of $ and , in the prices (regex=False so "$" and "+" are treated as literal characters)
df["price"] = df["price"].str.replace("$", "", regex=False)
df["price"] = df["price"].str.replace(",", "", regex=False)
#expand the M and K suffixes (assumes whole-number amounts such as "2M"; a value like "1.5M" would be mis-parsed)
df["price"] = df["price"].str.replace("M", "000000", regex=False)
df["price"] = df["price"].str.replace("K", "000", regex=False)
df["price"] = df["price"].str.replace("+", "", regex=False)
#change values to floats from strings
df["price"] = df["price"].astype(float)
#drop the 134 rows that have a funky format in bed_bath
#(found using the following code: df.loc[df["bed_bath"].str.contains("lot")].shape)
df = df.loc[~df["bed_bath"].str.contains("lot")]

In [4]:
#turn "--" into 0 in bed_bath and turn "Studio" into "0 bds"
df["bed_bath"] = df["bed_bath"].str.replace("--", "0")
df["bed_bath"] = df["bed_bath"].str.replace("Studio", "0 bds")

In [5]:
#assigning the "bed_bath" splits to a new df (that I can join to the original df later)
df_bbsplit = pd.DataFrame(df["bed_bath"].str.split(" · ").tolist(), columns = ["beds", "baths", "sqft"])

In [6]:
#clean the data in the new df columns so they can be converted to numbers
df_bbsplit["sqft"] = df_bbsplit["sqft"].str.replace(",", "", regex=False)
df_bbsplit["sqft"] = df_bbsplit["sqft"].str.replace("sqft", "", regex=False)
df_bbsplit["sqft"] = df_bbsplit["sqft"].str.replace("+", "", regex=False)
#strip " bds" before " bd" so the plural form is removed cleanly
df_bbsplit["beds"] = df_bbsplit["beds"].str.replace(" bds", "", regex=False)
df_bbsplit["beds"] = df_bbsplit["beds"].str.replace(" bd", "", regex=False)
df_bbsplit["baths"] = df_bbsplit["baths"].str.replace(" ba", "", regex=False)

In [7]:
#convert to numeric types (baths comes out as float)
df_bbsplit["sqft"] = pd.to_numeric(df_bbsplit["sqft"])
df_bbsplit["beds"] = pd.to_numeric(df_bbsplit["beds"])
df_bbsplit["baths"] = pd.to_numeric(df_bbsplit["baths"])

In [8]:
df_bbsplit.tail()

Unnamed: 0,beds,baths,sqft
4787,4,3.0,1775
4788,4,2.0,910
4789,8,3.0,0
4790,3,1.0,1191
4791,4,2.0,1181


In [9]:
#get dummies for status, using drop_first=True to keep n-1 columns
df_status_dummies = pd.get_dummies(df["status"], drop_first=True)

In [10]:
#fixes index problem with status dummies
#index was using the numbers from the original dataset before dropped rows
df_status_dummies.index = range(len(df_status_dummies))

In [11]:
#create operationalized target-- >= to median (273500.0) is high, < is low
df["is_high"] = df["price"].apply(lambda x: 1 if x >=273500.0 else 0)

In [12]:
#create target
target = df["is_high"]

In [13]:
#create version of original df with only relevant columns to join others to
df_zip = df[["zip_code"]]

In [14]:
#fix index to conform with bbsplit
df_zip.index = range(len(df_zip))

In [15]:
#join together first two dfs
housing_data_intm = pd.merge(df_zip, df_bbsplit, how="inner", left_index=True, right_index=True)

In [16]:
housing_data_intm.shape

(4792, 4)

In [17]:
housing_data = pd.merge(housing_data_intm, df_status_dummies, how="inner", left_index=True, right_index=True)

In [18]:
housing_data.tail()

Unnamed: 0,zip_code,beds,baths,sqft,Auction,Co-op For Sale,Coming Soon,Condo For Sale,For Sale by Owner,Foreclosure,House For Sale,Lot/Land For Sale,Make Me Move®,New Construction,Townhouse For Sale
4787,60827,4,3.0,1775,0,0,0,0,0,0,1,0,0,0,0
4788,60827,4,2.0,910,0,0,0,0,0,0,1,0,0,0,0
4789,60827,8,3.0,0,0,0,0,0,0,0,0,0,0,0,0
4790,60827,3,1.0,1191,0,0,0,0,0,0,1,0,0,0,0
4791,60827,4,2.0,1181,0,0,0,0,0,0,1,0,0,0,0


In [19]:
housing_data.shape

(4792, 15)

### Models

In [20]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import confusion_matrix

In [21]:
#first model with just the bed/bath/sqft values as a matrix
log_reg = LogisticRegression()

log_reg.fit(df_bbsplit, target)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)

In [22]:
log_reg.score(df_bbsplit, target)

0.75772120200333892

In [23]:
#trying with the intermediate merge of zip and bb_split
log_reg_2 = LogisticRegression()
log_reg_2.fit(housing_data_intm, target)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)

In [24]:
log_reg_2.score(housing_data_intm, target)

0.75813856427378967

In [25]:
#maybe try one just with beds only...hmmmmm
log_reg_beds = LogisticRegression()
log_reg_beds.fit(df_bbsplit["beds"].values.reshape(-1, 1), target)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)

In [26]:
log_reg_beds.score(df_bbsplit["beds"].values.reshape(-1, 1), target)

0.53547579298831383

In [27]:
#baths had a high correlation with the target, how does that work on its own?
log_reg_baths = LogisticRegression()
log_reg_baths.fit(df_bbsplit["baths"].values.reshape(-1, 1), target)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)

In [28]:
log_reg_baths.score(df_bbsplit["baths"].values.reshape(-1, 1), target)

0.66402337228714525

In [29]:
#try a model with everything: beds, baths, sqft, zip code, categories
log_reg_3 = LogisticRegression()
log_reg_3.fit(housing_data, target)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)

In [30]:
log_reg_3.score(housing_data, target)

0.77900667779632726

### Additional Cross-Validation, Testing, Scoring

##### Model 1: log_reg (uses beds, baths, sqft)

In [31]:
cross_val_score(log_reg, df_bbsplit, target)

array([ 0.79724656,  0.75516594,  0.7100814 ])

In [32]:
hyperparameters = {'penalty': ['l1', 'l2'],
                   'C': [0.1, 1.0, 10]}

grid_search = GridSearchCV(log_reg, hyperparameters, verbose=10)

grid_search.fit(df_bbsplit, target)

grid_search.best_estimator_

Fitting 3 folds for each of 6 candidates, totalling 18 fits
[CV] C=0.1, penalty=l1 ...............................................
[CV] ................ C=0.1, penalty=l1, score=0.796621, total=   0.0s
[CV] C=0.1, penalty=l1 ...............................................
[CV] ................ C=0.1, penalty=l1, score=0.753287, total=   0.0s
[CV] C=0.1, penalty=l1 ...............................................
[CV] ................ C=0.1, penalty=l1, score=0.710708, total=   0.0s
[CV] C=0.1, penalty=l2 ...............................................
[CV] ................ C=0.1, penalty=l2, score=0.799124, total=   0.0s
[CV] C=0.1, penalty=l2 ...............................................
[CV] ................ C=0.1, penalty=l2, score=0.755792, total=   0.0s
[CV] C=0.1, penalty=l2 ...............................................
[CV] ................ C=0.1, penalty=l2, score=0.712586, total=   0.0s
[CV] C=1.0, penalty=l1 ...............................................
[CV] ............

[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed:    0.1s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   4 out of   4 | elapsed:    0.1s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:    0.1s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   6 out of   6 | elapsed:    0.1s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   7 out of   7 | elapsed:    0.1s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   8 out of   8 | elapsed:    0.1s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   9 out of   9 | elapsed:    0.2s remaining:    0.0s


[CV] ................ C=1.0, penalty=l2, score=0.755166, total=   0.0s
[CV] C=1.0, penalty=l2 ...............................................
[CV] ................ C=1.0, penalty=l2, score=0.710081, total=   0.0s
[CV] C=10, penalty=l1 ................................................
[CV] ................. C=10, penalty=l1, score=0.795995, total=   0.0s
[CV] C=10, penalty=l1 ................................................
[CV] ................. C=10, penalty=l1, score=0.752035, total=   0.0s
[CV] C=10, penalty=l1 ................................................
[CV] ................. C=10, penalty=l1, score=0.708203, total=   0.0s
[CV] C=10, penalty=l2 ................................................
[CV] ................. C=10, penalty=l2, score=0.797247, total=   0.0s
[CV] C=10, penalty=l2 ................................................
[CV] ................. C=10, penalty=l2, score=0.755166, total=   0.0s
[CV] C=10, penalty=l2 ................................................
[CV] .

[Parallel(n_jobs=1)]: Done  18 out of  18 | elapsed:    0.3s finished


LogisticRegression(C=0.1, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)

In [33]:
grid_search.best_estimator_.fit(df_bbsplit, target)
grid_search.best_estimator_.score(df_bbsplit, target)

0.76168614357262099

##### Model 2: log_reg_2 (beds, baths, sqft, zip code)

In [34]:
cross_val_score(log_reg, housing_data_intm, target)

array([ 0.79599499,  0.74452098,  0.70757671])

In [35]:
hyperparameters = {'penalty': ['l1', 'l2'],
                   'C': [0.1, 1.0, 10]}

grid_search = GridSearchCV(log_reg, hyperparameters, verbose=10)

grid_search.fit(housing_data_intm, target)

grid_search.best_estimator_

[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    0.1s remaining:    0.0s


Fitting 3 folds for each of 6 candidates, totalling 18 fits
[CV] C=0.1, penalty=l1 ...............................................
[CV] ................ C=0.1, penalty=l1, score=0.796621, total=   0.0s
[CV] C=0.1, penalty=l1 ...............................................
[CV] ................ C=0.1, penalty=l1, score=0.752661, total=   0.0s
[CV] C=0.1, penalty=l1 ...............................................
[CV] ................ C=0.1, penalty=l1, score=0.710081, total=   0.0s
[CV] C=0.1, penalty=l2 ...............................................
[CV] ................ C=0.1, penalty=l2, score=0.796621, total=   0.0s
[CV] C=0.1, penalty=l2 ...............................................
[CV] ................ C=0.1, penalty=l2, score=0.741390, total=   0.0s
[CV] C=0.1, penalty=l2 ...............................................
[CV] ................ C=0.1, penalty=l2, score=0.709455, total=   0.0s
[CV] C=1.0, penalty=l1 ...............................................
[CV] ............

[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed:    0.1s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   4 out of   4 | elapsed:    0.1s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:    0.1s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   6 out of   6 | elapsed:    0.1s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   7 out of   7 | elapsed:    0.2s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   8 out of   8 | elapsed:    0.2s remaining:    0.0s


[CV] ................ C=1.0, penalty=l1, score=0.753287, total=   0.1s
[CV] C=1.0, penalty=l1 ...............................................
[CV] ................ C=1.0, penalty=l1, score=0.708203, total=   0.0s
[CV] C=1.0, penalty=l2 ...............................................
[CV] ................ C=1.0, penalty=l2, score=0.795995, total=   0.0s
[CV] C=1.0, penalty=l2 ...............................................
[CV] ................ C=1.0, penalty=l2, score=0.744521, total=   0.0s
[CV] C=1.0, penalty=l2 ...............................................
[CV] ................ C=1.0, penalty=l2, score=0.707577, total=   0.0s
[CV] C=10, penalty=l1 ................................................
[CV] ................. C=10, penalty=l1, score=0.795369, total=   0.1s
[CV] C=10, penalty=l1 ................................................


[Parallel(n_jobs=1)]: Done   9 out of   9 | elapsed:    0.3s remaining:    0.0s


[CV] ................. C=10, penalty=l1, score=0.752035, total=   0.1s
[CV] C=10, penalty=l1 ................................................
[CV] ................. C=10, penalty=l1, score=0.708203, total=   0.0s
[CV] C=10, penalty=l2 ................................................
[CV] ................. C=10, penalty=l2, score=0.795995, total=   0.0s
[CV] C=10, penalty=l2 ................................................
[CV] ................. C=10, penalty=l2, score=0.744521, total=   0.0s
[CV] C=10, penalty=l2 ................................................
[CV] ................. C=10, penalty=l2, score=0.707577, total=   0.0s


[Parallel(n_jobs=1)]: Done  18 out of  18 | elapsed:    0.6s finished


LogisticRegression(C=0.1, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l1', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)

In [36]:
grid_search.best_estimator_.fit(housing_data_intm, target)
grid_search.best_estimator_.score(housing_data_intm, target)

0.75939065108514192

##### Model 3: log_reg_3 (beds, baths, sqft, zip code, and categorical variables)

In [37]:
cross_val_score(log_reg, housing_data, target)

array([ 0.80851064,  0.76393237,  0.72197871])

In [38]:
hyperparameters = {'penalty': ['l1', 'l2'],
                   'C': [0.1, 1.0, 10]}

grid_search = GridSearchCV(log_reg, hyperparameters, verbose=10)

grid_search.fit(housing_data, target)

grid_search.best_estimator_

[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.0s remaining:    0.0s


Fitting 3 folds for each of 6 candidates, totalling 18 fits
[CV] C=0.1, penalty=l1 ...............................................
[CV] ................ C=0.1, penalty=l1, score=0.812891, total=   0.0s
[CV] C=0.1, penalty=l1 ...............................................
[CV] ................ C=0.1, penalty=l1, score=0.780839, total=   0.0s
[CV] C=0.1, penalty=l1 ...............................................
[CV] ................ C=0.1, penalty=l1, score=0.723231, total=   0.0s
[CV] C=0.1, penalty=l2 ...............................................
[CV] ................ C=0.1, penalty=l2, score=0.808511, total=   0.0s
[CV] C=0.1, penalty=l2 ...............................................
[CV] ................ C=0.1, penalty=l2, score=0.764559, total=   0.0s
[CV] C=0.1, penalty=l2 ...............................................
[CV] ................ C=0.1, penalty=l2, score=0.718848, total=   0.0s
[CV] C=1.0, penalty=l1 ...............................................


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    0.1s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed:    0.1s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   4 out of   4 | elapsed:    0.1s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:    0.1s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   6 out of   6 | elapsed:    0.1s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   7 out of   7 | elapsed:    0.2s remaining:    0.0s


[CV] ................ C=1.0, penalty=l1, score=0.821652, total=   0.1s
[CV] C=1.0, penalty=l1 ...............................................
[CV] ................ C=1.0, penalty=l1, score=0.785848, total=   0.1s
[CV] C=1.0, penalty=l1 ...............................................
[CV] ................ C=1.0, penalty=l1, score=0.726362, total=   0.1s
[CV] C=1.0, penalty=l2 ...............................................
[CV] ................ C=1.0, penalty=l2, score=0.808511, total=   0.0s
[CV] C=1.0, penalty=l2 ...............................................
[CV] ................ C=1.0, penalty=l2, score=0.763932, total=   0.0s
[CV] C=1.0, penalty=l2 ...............................................


[Parallel(n_jobs=1)]: Done   8 out of   8 | elapsed:    0.3s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   9 out of   9 | elapsed:    0.4s remaining:    0.0s


[CV] ................ C=1.0, penalty=l2, score=0.721979, total=   0.0s
[CV] C=10, penalty=l1 ................................................
[CV] ................. C=10, penalty=l1, score=0.821652, total=   0.1s
[CV] C=10, penalty=l1 ................................................
[CV] ................. C=10, penalty=l1, score=0.785848, total=   0.0s
[CV] C=10, penalty=l1 ................................................
[CV] ................. C=10, penalty=l1, score=0.733876, total=   0.0s
[CV] C=10, penalty=l2 ................................................
[CV] ................. C=10, penalty=l2, score=0.808511, total=   0.0s
[CV] C=10, penalty=l2 ................................................
[CV] ................. C=10, penalty=l2, score=0.763932, total=   0.0s
[CV] C=10, penalty=l2 ................................................
[CV] ................. C=10, penalty=l2, score=0.703193, total=   0.0s


[Parallel(n_jobs=1)]: Done  18 out of  18 | elapsed:    0.7s finished


LogisticRegression(C=10, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l1', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)

In [39]:
grid_search.best_estimator_.fit(housing_data, target)
grid_search.best_estimator_.score(housing_data, target)

0.78171953255425708