<a href="https://colab.research.google.com/github/KennethTBarrett/DS-Unit-2-Linear-Models/blob/master/Copy_of_LS_DS_213_assignment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Lambda School Data Science

*Unit 2, Sprint 1, Module 3*

---

# Ridge Regression

## Assignment

We're going back to our other **New York City** real estate dataset. Instead of predicting apartment rents, you'll predict property sales prices.

But not just for condos in Tribeca...

- [x] Use a subset of the data where `BUILDING_CLASS_CATEGORY` == `'01 ONE FAMILY DWELLINGS'` and the sale price was more than 100 thousand and less than 2 million.
- [x] Do train/test split. Use data from January — March 2019 to train. Use data from April 2019 to test.
- [x] Do one-hot encoding of categorical features.
- [x] Do feature selection with `SelectKBest`.
- [x] Fit a ridge regression model with multiple features. Use the `normalize=True` parameter (or do [feature scaling](https://scikit-learn.org/stable/modules/preprocessing.html) beforehand — use the scaler's `fit_transform` method with the train set, and the scaler's `transform` method with the test set)
- [x] Get mean absolute error for the test set.
- [x] As always, commit your notebook to your fork of the GitHub repo.

The [NYC Department of Finance](https://www1.nyc.gov/site/finance/taxes/property-rolling-sales-data.page) has a glossary of property sales terms and NYC Building Class Code Descriptions. The data comes from the [NYC OpenData](https://data.cityofnewyork.us/browse?q=NYC%20calendar%20sales) portal.


## Stretch Goals

Don't worry, you aren't expected to do all these stretch goals! These are just ideas to consider and choose from.

- [ ] Add your own stretch goal(s)!
- [ ] Instead of `Ridge`, try `LinearRegression`. Depending on how many features you select, your errors will probably blow up! 💥
- [ ] Instead of `Ridge`, try [`RidgeCV`](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.RidgeCV.html).
- [ ] Learn more about feature selection:
    - ["Permutation importance"](https://www.kaggle.com/dansbecker/permutation-importance)
    - [scikit-learn's User Guide for Feature Selection](https://scikit-learn.org/stable/modules/feature_selection.html)
    - [mlxtend](http://rasbt.github.io/mlxtend/) library
    - scikit-learn-contrib libraries: [boruta_py](https://github.com/scikit-learn-contrib/boruta_py) & [stability-selection](https://github.com/scikit-learn-contrib/stability-selection)
    - [_Feature Engineering and Selection_](http://www.feat.engineering/) by Kuhn & Johnson.
- [ ] Try [statsmodels](https://www.statsmodels.org/stable/index.html) if you’re interested in a more inferential statistical approach to linear regression and feature selection, looking at p values and 95% confidence intervals for the coefficients.
- [ ] Read [_An Introduction to Statistical Learning_](http://faculty.marshall.usc.edu/gareth-james/ISL/ISLR%20Seventh%20Printing.pdf), Chapters 1-3, for more math & theory, but in an accessible, readable way.
- [ ] Try [scikit-learn pipelines](https://scikit-learn.org/stable/modules/compose.html).

# Setup Code

## Lambda's Setup Code

In [0]:
%%capture
import sys

# If you're on Colab:
if 'google.colab' in sys.modules:
    DATA_PATH = 'https://raw.githubusercontent.com/LambdaSchool/DS-Unit-2-Applied-Modeling/master/data/'
    !pip install category_encoders==2.*

# If you're working locally:
else:
    DATA_PATH = '../data/'
    
# Ignore this Numpy warning when using Plotly Express:
# FutureWarning: Method .ptp is deprecated and will be removed in a future version. Use numpy.ptp instead.
import warnings
warnings.filterwarnings(action='ignore', category=FutureWarning, module='numpy')

In [0]:
import pandas as pd
import pandas_profiling

# Read New York City property sales data
df = pd.read_csv(DATA_PATH+'condos/NYC_Citywide_Rolling_Calendar_Sales.csv')

# Change column names: replace spaces with underscores
df.columns = [col.replace(' ', '_') for col in df]

# SALE_PRICE was read as strings.
# Remove symbols, convert to integer
df['SALE_PRICE'] = (
    df['SALE_PRICE']
    .str.replace('$', '', regex=False)  # regex=False: treat '$' as a literal character
    .str.replace('-', '', regex=False)
    .str.replace(',', '', regex=False)
    .astype(int)
)
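
The cleaning step above can be checked on a few strings before trusting it on the full column. This is a minimal sketch: the sample values are made up for illustration (they are not rows from the dataset), and `.str.strip()` is an extra safety step that is not in the original cell. It passes `regex=False` so that `'$'` is matched as a literal character rather than as a regex anchor.

```python
import pandas as pd

# Hypothetical price strings, written in the style of a raw price column
raw_prices = pd.Series(['$ 510,000', '$ - 0', '$1,100,000'])

clean_prices = (
    raw_prices
    .str.replace('$', '', regex=False)   # drop the currency symbol
    .str.replace('-', '', regex=False)   # drop the dash used for "no price"
    .str.replace(',', '', regex=False)   # drop thousands separators
    .str.strip()                         # drop leftover whitespace
    .astype(int)
)

print(clean_prices.tolist())
```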

In [0]:
# BOROUGH is a numeric column, but arguably should be a categorical feature,
# so it could be converted from a number to a string.
# (The conversion is left commented out here, so BOROUGH stays numeric.)
# df['BOROUGH'] = df['BOROUGH'].astype(str)

In [0]:
# Reduce cardinality for NEIGHBORHOOD feature

# Get a list of the top 10 neighborhoods
top10 = df['NEIGHBORHOOD'].value_counts()[:10].index

# At locations where the neighborhood is NOT in the top 10, 
# replace the neighborhood with 'OTHER'
df.loc[~df['NEIGHBORHOOD'].isin(top10), 'NEIGHBORHOOD'] = 'OTHER'
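
To see what the cardinality reduction above does, here is a small sketch on a toy column. The neighborhood labels are invented, and it keeps the top 2 categories instead of the top 10 so the effect is visible on seven rows.

```python
import pandas as pd

# Toy neighborhood column (made-up labels, not from the dataset)
toy = pd.DataFrame({'NEIGHBORHOOD': ['A', 'A', 'A', 'B', 'B', 'C', 'D']})

# value_counts() sorts by frequency, so head(2) gives the 2 most common labels
top_n = toy['NEIGHBORHOOD'].value_counts().head(2).index

# Everything outside the top labels is collapsed into 'OTHER'
toy.loc[~toy['NEIGHBORHOOD'].isin(top_n), 'NEIGHBORHOOD'] = 'OTHER'

print(toy['NEIGHBORHOOD'].tolist())
```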

## My Setup Code

In [9]:
# Let's see what we're working with!
df


Unnamed: 0,BOROUGH,NEIGHBORHOOD,BUILDING_CLASS_CATEGORY,TAX_CLASS_AT_PRESENT,BLOCK,LOT,EASE-MENT,BUILDING_CLASS_AT_PRESENT,ADDRESS,APARTMENT_NUMBER,ZIP_CODE,RESIDENTIAL_UNITS,COMMERCIAL_UNITS,TOTAL_UNITS,LAND_SQUARE_FEET,GROSS_SQUARE_FEET,YEAR_BUILT,TAX_CLASS_AT_TIME_OF_SALE,BUILDING_CLASS_AT_TIME_OF_SALE,SALE_PRICE,SALE_DATE
0,1,OTHER,13 CONDOS - ELEVATOR APARTMENTS,2,716,1246,,R4,"447 WEST 18TH STREET, PH12A",PH12A,10011.0,1.0,0.0,1.0,10733,1979.0,2007.0,2,R4,0,01/01/2019
1,1,OTHER,21 OFFICE BUILDINGS,4,812,68,,O5,144 WEST 37TH STREET,,10018.0,0.0,6.0,6.0,2962,15435.0,1920.0,4,O5,0,01/01/2019
2,1,OTHER,21 OFFICE BUILDINGS,4,839,69,,O5,40 WEST 38TH STREET,,10018.0,0.0,7.0,7.0,2074,11332.0,1930.0,4,O5,0,01/01/2019
3,1,OTHER,13 CONDOS - ELEVATOR APARTMENTS,2,592,1041,,R4,"1 SHERIDAN SQUARE, 8C",8C,10014.0,1.0,0.0,1.0,0,500.0,0.0,2,R4,0,01/01/2019
4,1,UPPER EAST SIDE (59-79),15 CONDOS - 2-10 UNIT RESIDENTIAL,2C,1379,1402,,R1,"20 EAST 65TH STREET, B",B,10065.0,1.0,0.0,1.0,0,6406.0,0.0,2,R1,0,01/01/2019
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
23035,4,OTHER,01 ONE FAMILY DWELLINGS,1,10965,276,,A5,111-17 FRANCIS LEWIS BLVD,,11429.0,1.0,0.0,1.0,1800,1224.0,1945.0,1,A5,510000,04/30/2019
23036,4,OTHER,09 COOPS - WALKUP APARTMENTS,2,169,29,,C6,"45-14 43RD STREET, 3C",,11104.0,0.0,0.0,0.0,0,0.0,1929.0,2,C6,355000,04/30/2019
23037,4,OTHER,10 COOPS - ELEVATOR APARTMENTS,2,131,4,,D4,"50-05 43RD AVENUE, 3M",,11377.0,0.0,0.0,0.0,0,0.0,1932.0,2,D4,375000,04/30/2019
23038,4,OTHER,02 TWO FAMILY DWELLINGS,1,8932,18,,S2,91-10 JAMAICA AVE,,11421.0,2.0,1.0,3.0,2078,2200.0,1931.0,1,S2,1100000,04/30/2019


In [10]:
print(df['EASE-MENT'].isna().sum()) # All 23,040 rows in this column are NaN
print(df['APARTMENT_NUMBER'].isna().sum()) # Most of this column is NaN (17,839 of 23,040 rows)
print(df['NEIGHBORHOOD'].unique())

23040
17839
['OTHER' 'UPPER EAST SIDE (59-79)' 'UPPER EAST SIDE (79-96)'
 'BOROUGH PARK' 'ASTORIA' 'FOREST HILLS' 'UPPER WEST SIDE (59-79)'
 'BEDFORD STUYVESANT' 'EAST NEW YORK' 'FLUSHING-NORTH' 'GRAMERCY']


In [0]:
# Drop the columns that are entirely or mostly NaN
df.drop(['EASE-MENT', 'APARTMENT_NUMBER'], axis = 1, inplace = True)

In [12]:
df.T

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,...,23000,23001,23002,23003,23004,23005,23006,23007,23008,23009,23010,23011,23012,23013,23014,23015,23016,23017,23018,23019,23020,23021,23022,23023,23024,23025,23026,23027,23028,23029,23030,23031,23032,23033,23034,23035,23036,23037,23038,23039
BOROUGH,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,...,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4
NEIGHBORHOOD,OTHER,OTHER,OTHER,OTHER,UPPER EAST SIDE (59-79),UPPER EAST SIDE (79-96),OTHER,OTHER,OTHER,OTHER,OTHER,OTHER,OTHER,OTHER,OTHER,OTHER,OTHER,OTHER,OTHER,OTHER,OTHER,OTHER,OTHER,OTHER,OTHER,OTHER,OTHER,OTHER,OTHER,OTHER,OTHER,OTHER,OTHER,OTHER,OTHER,BOROUGH PARK,OTHER,OTHER,OTHER,OTHER,...,FLUSHING-NORTH,FLUSHING-NORTH,FLUSHING-NORTH,FLUSHING-NORTH,OTHER,FOREST HILLS,FOREST HILLS,FOREST HILLS,FOREST HILLS,FOREST HILLS,OTHER,OTHER,OTHER,OTHER,OTHER,OTHER,OTHER,OTHER,OTHER,OTHER,OTHER,OTHER,OTHER,OTHER,OTHER,OTHER,OTHER,OTHER,OTHER,OTHER,OTHER,OTHER,OTHER,OTHER,OTHER,OTHER,OTHER,OTHER,OTHER,OTHER
BUILDING_CLASS_CATEGORY,13 CONDOS - ELEVATOR APARTMENTS,21 OFFICE BUILDINGS,21 OFFICE BUILDINGS,13 CONDOS - ELEVATOR APARTMENTS,15 CONDOS - 2-10 UNIT RESIDENTIAL,07 RENTALS - WALKUP APARTMENTS,07 RENTALS - WALKUP APARTMENTS,01 ONE FAMILY DWELLINGS,01 ONE FAMILY DWELLINGS,01 ONE FAMILY DWELLINGS,02 TWO FAMILY DWELLINGS,05 TAX CLASS 1 VACANT LAND,29 COMMERCIAL GARAGES,29 COMMERCIAL GARAGES,29 COMMERCIAL GARAGES,29 COMMERCIAL GARAGES,29 COMMERCIAL GARAGES,32 HOSPITAL AND HEALTH FACILITIES,33 EDUCATIONAL FACILITIES,08 RENTALS - ELEVATOR APARTMENTS,21 OFFICE BUILDINGS,29 COMMERCIAL GARAGES,31 COMMERCIAL VACANT LAND,32 HOSPITAL AND HEALTH FACILITIES,32 HOSPITAL AND HEALTH FACILITIES,32 HOSPITAL AND HEALTH FACILITIES,33 EDUCATIONAL FACILITIES,41 TAX CLASS 4 - OTHER,14 RENTALS - 4-10 UNIT,29 COMMERCIAL GARAGES,31 COMMERCIAL VACANT LAND,31 COMMERCIAL VACANT LAND,03 THREE FAMILY DWELLINGS,02 TWO FAMILY DWELLINGS,02 TWO FAMILY DWELLINGS,07 RENTALS - WALKUP APARTMENTS,22 STORE BUILDINGS,22 STORE BUILDINGS,22 STORE BUILDINGS,22 STORE BUILDINGS,...,01 ONE FAMILY DWELLINGS,10 COOPS - ELEVATOR APARTMENTS,10 COOPS - ELEVATOR APARTMENTS,10 COOPS - ELEVATOR APARTMENTS,01 ONE FAMILY DWELLINGS,01 ONE FAMILY DWELLINGS,10 COOPS - ELEVATOR APARTMENTS,10 COOPS - ELEVATOR APARTMENTS,10 COOPS - ELEVATOR APARTMENTS,10 COOPS - ELEVATOR APARTMENTS,02 TWO FAMILY DWELLINGS,10 COOPS - ELEVATOR APARTMENTS,10 COOPS - ELEVATOR APARTMENTS,02 TWO FAMILY DWELLINGS,10 COOPS - ELEVATOR APARTMENTS,01 ONE FAMILY DWELLINGS,10 COOPS - ELEVATOR APARTMENTS,10 COOPS - ELEVATOR APARTMENTS,01 ONE FAMILY DWELLINGS,10 COOPS - ELEVATOR APARTMENTS,17 CONDO COOPS,01 ONE FAMILY DWELLINGS,01 ONE FAMILY DWELLINGS,13 CONDOS - ELEVATOR APARTMENTS,01 ONE FAMILY DWELLINGS,01 ONE FAMILY DWELLINGS,01 ONE FAMILY DWELLINGS,01 ONE FAMILY DWELLINGS,02 TWO FAMILY DWELLINGS,01 ONE FAMILY DWELLINGS,02 TWO FAMILY DWELLINGS,01 ONE FAMILY DWELLINGS,01 ONE FAMILY DWELLINGS,01 ONE FAMILY DWELLINGS,02 TWO FAMILY DWELLINGS,01 ONE FAMILY DWELLINGS,09 COOPS - WALKUP APARTMENTS,10 COOPS - ELEVATOR APARTMENTS,02 TWO FAMILY DWELLINGS,12 CONDOS - WALKUP APARTMENTS
TAX_CLASS_AT_PRESENT,2,4,4,2,2C,2B,2B,1,1,1,1,1B,4,4,4,4,4,4,4,2,4,4,4,4,4,4,4,4,2A,4,4,4,1,1,1,2A,4,4,4,4,...,1,2,2,2,1,1,2,2,2,2,1,2,2,1,2,1,2,2,1,2,2,1,1,2,1,1,1,1,1,1,1,1,1,1,1,1,2,2,1,2
BLOCK,716,812,839,592,1379,1551,1891,4090,4120,4120,4120,4090,4120,4120,4120,4120,4120,4117,4090,4222,4203,4209,4203,4205,4205,4205,4205,4205,4166,4226,4226,4226,4820,5999,5999,5639,8001,8001,8001,8001,...,5410,4374,5049,5122,6927,3234,2127,2148,2250,2270,3834,3880,3907,10551,10538,13979,11446,1440,10026,3322,3360,13186,8125,16,7623,10745,3104,9412,9430,13215,10162,11612,11808,12295,12536,10965,169,131,8932,1216
LOT,1246,68,69,1041,1402,131,159,37,18,20,19,17,7,8,12,16,17,1,19,84,82,1,81,3,30,55,2,40,4,419,420,422,16,5,22,30,1,4,6,8,...,40,49,19,29,5,103,18,1,1,20,138,97,960,25,10,15,1,1,33,135,1005,54,78,1227,38,4,67,48,18,3,52,73,50,23,38,276,29,4,18,1161
BUILDING_CLASS_AT_PRESENT,R4,O5,O5,R4,R1,C4,C4,A1,A5,A5,B1,V0,G7,G7,G7,G7,G7,I9,W6,D3,O7,G1,V1,I1,I1,I1,W9,Z9,S3,G1,V6,V6,C0,S2,B1,C2,K1,K1,K1,K1,...,A1,D4,D4,D4,A5,A5,D4,D4,D4,D4,B2,D4,D4,B3,D4,A1,D4,D4,A1,D4,R9,A1,A0,R4,A5,A2,A5,S1,B3,A2,B1,A1,A0,A1,B3,A5,C6,D4,S2,R2
ADDRESS,"447 WEST 18TH STREET, PH12A",144 WEST 37TH STREET,40 WEST 38TH STREET,"1 SHERIDAN SQUARE, 8C","20 EAST 65TH STREET, B",354 EAST 89TH STREET,304 WEST 106 STREET,1193 SACKET AVENUE,1215 VAN NEST AVENUE,1211 VAN NEST AVENUE,1213 VAN NEST AVENUE,1190 PIERCE AVENUE,1216 MORRIS PARK AVENUE,1228 MORRIS PARK AVENUE,N/A NEWPORT AVENUE,1219 VAN NEST AVENUE,1217 VAN NEST AVENUE,1250 MORRIS PARK,1196 PIERCE AVENUE,1579 RHINELANDER AVENUE,1201 MORRIS PARK AVENUE,1864 EASTCHESTER ROAD,N/A MORRIS PARK AVENUE,1301 MORRIS PARK AVENUE,1225 MORRIS PARK AVENUE,1410 PELHAM PARKWAY SOUTH,1925 EASTCHESTER ROAD,2025 EASTCHESTER ROAD,2949 MIDDLETOWN ROAD,1842 EASTCHESTER ROAD,1848 EASTCHESTER ROAD,1850 EASTCHESTER ROAD,3969 CARPENTER AVE,8117 FIFTH AVENUE,544 81ST STREET,1043 50TH STREET,8023 FLATLANDS AVENUE,8015 FLATLANDS AVENUE,8009 FLATLANDS AVENUE,8001 FLATLANDS AVENUE,...,43-60 MURRAY STREET,"31-31 138TH STREET, 1L","143-36 BARCLAY AVENUE, 2G","134-54 MAPLE AVENUE, 5K",173-11 69TH AVENUE,67-115 BURNS STREET,"102-18 64TH AVENUE, 6M","105-24 63RD DRIVE, 5R","72-35 112TH STREET, 3D","112-50 78TH AVENUE, 4G",78-60 84TH STREET,"9050 UNION TURNPIKE, 3J","83-05 98TH STREET, 3F",210-02 93RD AVENUE,"87-10 204TH STREET, B66",158-20 81ST STREET,"84-29 155TH AVENUE, LJ","33-24 93RD STREET, 3N",95-31 WALTHAM STREET,"118-11 84TH AVENUE, 601","125-10 QUEENS BLVD, 1211",139-34 231ST STREET,4010 LITTLE NECK PARKWAY,"2-26 50TH AVENUE, 5G",67-57 211TH STREET,222-01 93 AVE,64-58 AUSTIN,110-09 101ST AVENUE,101-30 113TH STREET,244-15 135 AVENUE,104-59 164TH STREET,10919 132ND STREET,135-24 122ND STREET,134-34 157TH STREET,130-26 176 PLACE,111-17 FRANCIS LEWIS BLVD,"45-14 43RD STREET, 3C","50-05 43RD AVENUE, 3M",91-10 JAMAICA AVE,"61-05 39TH AVENUE, F5"
ZIP_CODE,10011,10018,10018,10014,10065,10128,10025,10461,10461,10461,10461,10461,10461,10461,0,10461,10461,10461,10461,10461,10461,10461,0,10461,10461,10461,10461,10461,10461,10461,10461,10461,10466,11209,11209,11219,11236,11236,11236,11236,...,11355,11354,11355,11355,11365,11375,11375,11375,11375,11375,11385,11385,11421,11428,11423,11414,11414,11372,11435,11415,11415,11413,11363,11101,11364,11428,11374,11419,11419,11422,11433,11420,11420,11434,11434,11429,11104,11377,11421,11377
RESIDENTIAL_UNITS,1,0,0,1,1,10,10,1,1,1,2,0,0,0,0,0,0,0,0,130,0,0,0,0,0,0,0,0,3,0,0,0,3,2,2,6,0,0,0,0,...,1,0,0,0,1,1,0,0,0,0,2,0,0,2,0,1,0,0,1,0,0,1,1,1,1,1,1,1,2,1,2,1,1,1,2,1,0,0,2,1


In [0]:
# Define variables with high cardinality...

high_cardinality = ['ADDRESS', 'BLOCK', 'LOT', 'BUILDING_CLASS_AT_TIME_OF_SALE',
                    'ZIP_CODE', 'BUILDING_CLASS_AT_PRESENT', 'SALE_DATE', 'LAND_SQUARE_FEET']

In [0]:
# Drop the tax class columns, which show little to no variation for the subset used below...

df.drop(['TAX_CLASS_AT_TIME_OF_SALE', 'TAX_CLASS_AT_PRESENT'], axis = 1, inplace = True)

# My Work

## Define Subset of Data

In [15]:
# We only need this subset from here on, so redefine df to match the assignment's filter.
# .copy() makes df an independent DataFrame rather than a slice of the original,
# which avoids pandas' SettingWithCopyWarning when the column is dropped below.
df = df[(df['BUILDING_CLASS_CATEGORY'] == '01 ONE FAMILY DWELLINGS') &
        (df['SALE_PRICE'] > 100000) &
        (df['SALE_PRICE'] < 2000000)].copy()

# With only one building class category left, that column is constant, so drop it.
df = df.drop(columns=['BUILDING_CLASS_CATEGORY'])


## Train / Test Split

I used regular expressions on the `SALE_DATE` strings (formatted `MM/DD/YYYY`) to do the train/test split.

In [16]:
# First, the train data: sales dated January through March 2019...
train = df[df['SALE_DATE'].str.contains(r'^0[1-3]/.*/2019$')]

# ... and the test data: sales dated April 2019!
test = df[df['SALE_DATE'].str.contains(r'^04/.*/2019$')]

test.head()

Unnamed: 0,BOROUGH,NEIGHBORHOOD,BLOCK,LOT,BUILDING_CLASS_AT_PRESENT,ADDRESS,ZIP_CODE,RESIDENTIAL_UNITS,COMMERCIAL_UNITS,TOTAL_UNITS,LAND_SQUARE_FEET,GROSS_SQUARE_FEET,YEAR_BUILT,BUILDING_CLASS_AT_TIME_OF_SALE,SALE_PRICE,SALE_DATE
18235,2,OTHER,5913,878,A1,4616 INDEPENDENCE AVENUE,10471.0,1.0,0.0,1.0,5000,2272.0,1930.0,A1,895000,04/01/2019
18239,2,OTHER,5488,48,A2,558 ELLSWORTH AVENUE,10465.0,1.0,0.0,1.0,2500,720.0,1935.0,A2,253500,04/01/2019
18244,3,OTHER,5936,31,A1,16 BAY RIDGE PARKWAY,11209.0,1.0,0.0,1.0,2880,2210.0,1925.0,A1,1300000,04/01/2019
18280,3,OTHER,7813,24,A5,1247 EAST 40TH STREET,11210.0,1.0,0.0,1.0,1305,1520.0,1915.0,A5,789000,04/01/2019
18285,3,OTHER,8831,160,A9,2314 PLUMB 2ND STREET,11229.0,1.0,0.0,1.0,1800,840.0,1925.0,A9,525000,04/01/2019
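
An alternative to matching date strings is to parse them once and compare against a cutoff. The sketch below uses a toy frame with made-up rows in the same `MM/DD/YYYY` format. It assumes, as in this assignment, that the data only spans January through April 2019, so a single cutoff is enough.

```python
import pandas as pd

# Toy frame with made-up rows; only the date format mirrors the dataset
toy = pd.DataFrame({
    'SALE_DATE': ['01/15/2019', '03/31/2019', '04/01/2019', '04/30/2019'],
    'SALE_PRICE': [500000, 650000, 700000, 510000],
})

# Parse the strings into real timestamps, then split on a cutoff date
sale_date = pd.to_datetime(toy['SALE_DATE'], format='%m/%d/%Y')
cutoff = pd.Timestamp('2019-04-01')

toy_train = toy[sale_date < cutoff]    # January through March
toy_test = toy[sale_date >= cutoff]    # April

print(len(toy_train), len(toy_test))
```

Comparing timestamps avoids depending on the exact string layout, at the cost of one parsing step.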


## Use One-Hot Encoding on Categorical Variables

In [0]:
import category_encoders as ce

target = 'SALE_PRICE'
features = train.columns.drop([target] + high_cardinality)

X_train = train[features]
X_test = test[features]

encoder = ce.OneHotEncoder(use_cat_names = True)
X_train = encoder.fit_transform(X_train)
X_test = encoder.transform(X_test)

y_train = train[target]
y_test = test[target]

## Feature Selection


In [18]:
# Import
from sklearn.feature_selection import SelectKBest, f_regression


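# Note: each pass overwrites the previous selector and selected arrays, so only
# the final pass (k = number of columns, i.e. every feature) is kept below.
for k in range(1, len(X_train.columns) + 1):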

  selector = SelectKBest(score_func = f_regression, k = k)

  X_train_selected = selector.fit_transform(X_train, y_train)
  X_test_selected = selector.transform(X_test)

# Features Selected
selected_mask = selector.get_support()
all_names = X_train.columns
selected_names = all_names[selected_mask]

print('Selected Features: \n')
for name in selected_names:
  print(name)
selected_names

Selected Features: 

BOROUGH
NEIGHBORHOOD_OTHER
NEIGHBORHOOD_FLUSHING-NORTH
NEIGHBORHOOD_EAST NEW YORK
NEIGHBORHOOD_BEDFORD STUYVESANT
NEIGHBORHOOD_FOREST HILLS
NEIGHBORHOOD_BOROUGH PARK
NEIGHBORHOOD_ASTORIA
RESIDENTIAL_UNITS
COMMERCIAL_UNITS
TOTAL_UNITS
GROSS_SQUARE_FEET
YEAR_BUILT
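
One way to make a loop over `k` informative is to fit a model and record an error for every `k`, then compare. The following is a rough sketch on synthetic data, not on the NYC dataset. The data, the `results` dictionary, and the use of `LinearRegression` are all assumptions made for illustration.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Synthetic data: only the first 2 of 6 features drive the target
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=200)
X_tr, X_te, y_tr, y_te = X[:150], X[150:], y[:150], y[150:]

results = {}
for k in range(1, X.shape[1] + 1):
    # Select the k best features using the training data only
    selector = SelectKBest(score_func=f_regression, k=k)
    X_tr_k = selector.fit_transform(X_tr, y_tr)
    X_te_k = selector.transform(X_te)

    # Fit and score a model for this particular k
    model = LinearRegression().fit(X_tr_k, y_tr)
    results[k] = mean_absolute_error(y_te, model.predict(X_te_k))

best_k = min(results, key=results.get)
for k, mae in results.items():
    print(f'k={k}: held-out MAE = {mae:.3f}')
print('best k:', best_k)
```

In practice, choosing `k` on a separate validation split (or with cross-validation) keeps the test set untouched for the final error estimate.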


## Ridge Regression Model

In [0]:
# Imports
from IPython.display import display, HTML
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error

In [40]:
for alpha in [0.000001, 0.00001, 0.0001, 0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]:
  # Display which alpha
  display(HTML(f'Ridge Regression Where Alpha = {alpha}'))
  # Fit the Ridge Regression Model
  # (normalize=True needs scikit-learn < 1.2; the parameter was removed in 1.2)
  model = Ridge(alpha = alpha, normalize = True)
  model.fit(X_train_selected, y_train)

  # Predict
  y_pred = model.predict(X_test_selected)
  # Print MAE
  mae = mean_absolute_error(y_test, y_pred)
  display(HTML(f'Test Mean Absolute Error: ${mae:,.0f}'))
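
The `normalize` parameter of `Ridge` was deprecated in scikit-learn 1.0 and removed in 1.2, so on newer versions the assignment's alternative (scaling the features beforehand) is the way to go. Below is a rough sketch of that route using `StandardScaler` in a pipeline. It runs on synthetic stand-in data rather than `X_train_selected`, so the numbers it prints say nothing about the NYC results. `StandardScaler` also scales differently from `normalize=True`, so alpha values are not directly comparable between the two approaches.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data with features on very different scales
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 5)) * [1, 10, 100, 1000, 10000]
coefs = np.array([5.0, 0.5, 0.05, 0.005, 0.0005])
y = X @ coefs + rng.normal(size=300)
X_tr, X_te, y_tr, y_te = X[:200], X[200:], y[:200], y[200:]

maes = {}
for alpha in [0.001, 0.1, 10.0, 1000.0]:
    # The pipeline fits the scaler on the training data only,
    # then applies the same scaling when predicting on the test data
    model = make_pipeline(StandardScaler(), Ridge(alpha=alpha))
    model.fit(X_tr, y_tr)
    maes[alpha] = mean_absolute_error(y_te, model.predict(X_te))
    print(f'alpha={alpha}: test MAE = {maes[alpha]:,.3f}')
```

Wrapping the scaler and the model in one pipeline matches the assignment's instruction to use `fit_transform` on the train set and `transform` on the test set, without having to manage the two calls by hand.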
