<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>

# Neural Network Framework (Keras)

## *Data Science Unit 4 Sprint 2 Assignment 3*

## Use the Keras Library to build a Multi-Layer Perceptron Model on the Boston Housing dataset

- The Boston Housing dataset comes with the Keras library, so use Keras to import it into your notebook.
- Normalize the data (all features should have roughly the same scale).
- Import the type of model and layers that you will need from Keras.
- Instantiate a model object and use `model.add()` to add layers to your model.
- Since this is a regression model, you will have a single output node in the final layer.
- Use activation functions that are appropriate for this task.
- Compile your model.
- Fit your model and report its performance in terms of Mean Squared Error.
- Use the history object that is returned from `model.fit` to make graphs of the model's loss or train/validation accuracies by epoch.
- Run this same data through a linear regression model. Which achieves the lower error?
- Do a little bit of feature engineering and see how that affects your neural network model. (You will need to change your model to accept more inputs.)
- After feature engineering, which model sees the greater improvement from the new features?
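
The model-building steps in the list above can be sketched end to end on synthetic data. This is a minimal illustration, not the assignment solution: the random data, the layer sizes, the epoch count, and the use of an explicit `Input` layer are all assumptions chosen to keep the example small and free of downloads.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input

# Synthetic stand-in data (an assumption for illustration): 200 rows, 5 features in [0, 1).
rng = np.random.default_rng(0)
X = rng.random((200, 5)).astype("float32")
y = (X @ np.array([3.0, -2.0, 1.0, 0.5, 4.0]) + 1.0).astype("float32").reshape(-1, 1)

model = Sequential()
model.add(Input(shape=(5,)))
model.add(Dense(16, activation="relu"))
model.add(Dense(1))  # single output node; Dense is linear when no activation is given
model.compile(loss="mean_squared_error", optimizer="adam")

# validation_split holds out the last 20% of rows so val_loss is recorded per epoch.
history = model.fit(X, y, epochs=5, validation_split=0.2, verbose=0)
train_loss = history.history["loss"]
val_loss = history.history["val_loss"]
preds = model.predict(X, verbose=0)
```

The two lists `train_loss` and `val_loss` hold one value per epoch, so they can be passed straight to a plotting call to produce the loss-by-epoch graph the assignment asks for.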

In [135]:
import pandas as pd
import category_encoders as ce
from sklearn.preprocessing import MinMaxScaler, Normalizer
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from sklearn.impute import SimpleImputer

In [136]:
!pip install category_encoders


In [137]:
# loading the dataset (an Ames house price CSV, used here in place of Boston Housing)
url = 'https://raw.githubusercontent.com/VeraMendes/DS-Unit-4-Sprint-2-Neural-Networks/master/module3-Intro-to-Keras/amesHousePrice.csv'

dataset = pd.read_csv(url, header=0)
dataset.shape

(1460, 81)

In [138]:
# looking at my dataset
dataset.head()

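| | Id | MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | LandContour | Utilities | ... | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition | SalePrice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 60 | RL | 65.0 | 8450 | Pave | NaN | Reg | Lvl | AllPub | ... | 0 | NaN | NaN | NaN | 0 | 2 | 2008 | WD | Normal | 208500 |
| 1 | 2 | 20 | RL | 80.0 | 9600 | Pave | NaN | Reg | Lvl | AllPub | ... | 0 | NaN | NaN | NaN | 0 | 5 | 2007 | WD | Normal | 181500 |
| 2 | 3 | 60 | RL | 68.0 | 11250 | Pave | NaN | IR1 | Lvl | AllPub | ... | 0 | NaN | NaN | NaN | 0 | 9 | 2008 | WD | Normal | 223500 |
| 3 | 4 | 70 | RL | 60.0 | 9550 | Pave | NaN | IR1 | Lvl | AllPub | ... | 0 | NaN | NaN | NaN | 0 | 2 | 2006 | WD | Abnorml | 140000 |
| 4 | 5 | 60 | RL | 84.0 | 14260 | Pave | NaN | IR1 | Lvl | AllPub | ... | 0 | NaN | NaN | NaN | 0 | 12 | 2008 | WD | Normal | 250000 |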


In [139]:
dataset = dataset.drop(['Id', 'Alley'], axis=1)

In [140]:
dataset.head()

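| | MSSubClass | MSZoning | LotFrontage | LotArea | Street | LotShape | LandContour | Utilities | LotConfig | LandSlope | ... | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition | SalePrice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 60 | RL | 65.0 | 8450 | Pave | Reg | Lvl | AllPub | Inside | Gtl | ... | 0 | NaN | NaN | NaN | 0 | 2 | 2008 | WD | Normal | 208500 |
| 1 | 20 | RL | 80.0 | 9600 | Pave | Reg | Lvl | AllPub | FR2 | Gtl | ... | 0 | NaN | NaN | NaN | 0 | 5 | 2007 | WD | Normal | 181500 |
| 2 | 60 | RL | 68.0 | 11250 | Pave | IR1 | Lvl | AllPub | Inside | Gtl | ... | 0 | NaN | NaN | NaN | 0 | 9 | 2008 | WD | Normal | 223500 |
| 3 | 70 | RL | 60.0 | 9550 | Pave | IR1 | Lvl | AllPub | Corner | Gtl | ... | 0 | NaN | NaN | NaN | 0 | 2 | 2006 | WD | Abnorml | 140000 |
| 4 | 60 | RL | 84.0 | 14260 | Pave | IR1 | Lvl | AllPub | FR2 | Gtl | ... | 0 | NaN | NaN | NaN | 0 | 12 | 2008 | WD | Normal | 250000 |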


In [141]:
# train_test split
train, test = train_test_split(dataset)

In [142]:
# separating features from initial df
features = list(dataset)[:-1]

In [143]:
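# setting X_train and X_test df's
X_train = train[features]
X_test = test[features]
# y_train is used below but its defining cell is not shown; assumed to be SalePrice as a 2-D array
y_train = train[['SalePrice']].values
y_test = test[['SalePrice']].values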

In [144]:
dataset.dtypes

MSSubClass         int64
MSZoning          object
LotFrontage      float64
LotArea            int64
Street            object
LotShape          object
LandContour       object
Utilities         object
LotConfig         object
LandSlope         object
Neighborhood      object
Condition1        object
Condition2        object
BldgType          object
HouseStyle        object
OverallQual        int64
OverallCond        int64
YearBuilt          int64
YearRemodAdd       int64
RoofStyle         object
RoofMatl          object
Exterior1st       object
Exterior2nd       object
MasVnrType        object
MasVnrArea       float64
ExterQual         object
ExterCond         object
Foundation        object
BsmtQual          object
BsmtCond          object
                  ...   
BedroomAbvGr       int64
KitchenAbvGr       int64
KitchenQual       object
TotRmsAbvGrd       int64
Functional        object
Fireplaces         int64
FireplaceQu       object
GarageType        object
GarageYrBlt      float64


In [145]:
# encoding dataset columns
encoder = ce.OrdinalEncoder()
X_train_encoded = encoder.fit_transform(X_train)
X_test_encoded = encoder.transform(X_test)
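
As a rough picture of what ordinal encoding does to a string column, the sketch below builds the category-to-integer mapping by hand with pandas. It is not the `category_encoders` implementation: the toy values, the 1-based codes, and the choice of -1 for categories unseen in training are assumptions made for illustration.

```python
import pandas as pd

# Toy columns (made up for illustration); "C (all)" never appears in the training data.
train_toy = pd.DataFrame({"MSZoning": ["RL", "RM", "RL", "FV"]})
test_toy = pd.DataFrame({"MSZoning": ["RM", "C (all)"]})

# Learn the category -> integer mapping from the training column only,
# numbering categories in order of first appearance.
categories = pd.unique(train_toy["MSZoning"])
mapping = {cat: i + 1 for i, cat in enumerate(categories)}

train_codes = train_toy["MSZoning"].map(mapping)
# Categories missing from the mapping become NaN after map(); mark them with -1 here.
test_codes = test_toy["MSZoning"].map(mapping).fillna(-1).astype(int)
```

The key point is that the mapping is fitted on the training split and only applied to the test split, which is the same fit/transform pattern the cell above follows.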

In [146]:
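# fill missing values with the training-set column means; reusing them for the test set avoids leaking test statistics
train_means = X_train_encoded.mean(axis=0)
X_train_clean = X_train_encoded.fillna(train_means)
X_test_clean = X_test_encoded.fillna(train_means)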

In [162]:
X_train_clean.dtypes

MSSubClass         int64
MSZoning           int64
LotFrontage      float64
LotArea            int64
Street             int64
LotShape           int64
LandContour        int64
Utilities          int64
LotConfig          int64
LandSlope          int64
Neighborhood       int64
Condition1         int64
Condition2         int64
BldgType           int64
HouseStyle         int64
OverallQual        int64
OverallCond        int64
YearBuilt          int64
YearRemodAdd       int64
RoofStyle          int64
RoofMatl           int64
Exterior1st        int64
Exterior2nd        int64
MasVnrType         int64
MasVnrArea       float64
ExterQual          int64
ExterCond          int64
Foundation         int64
BsmtQual           int64
BsmtCond           int64
                  ...   
HalfBath           int64
BedroomAbvGr       int64
KitchenAbvGr       int64
KitchenQual        int64
TotRmsAbvGrd       int64
Functional         int64
Fireplaces         int64
FireplaceQu        int64
GarageType         int64


In [147]:
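# normalize the dataset; note that Normalizer rescales each row (sample) to unit norm,
# whereas MinMaxScaler (imported above) would put each feature (column) on a common [0, 1] range
normalizer = Normalizer()
X_train_norm = normalizer.fit_transform(X_train_clean)
X_test_norm = normalizer.transform(X_test_clean)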
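
scikit-learn's `Normalizer` rescales each sample (row) to unit norm, which is not the same as putting every feature (column) on a similar scale. The NumPy arithmetic below is a hand-written illustration of the two operations on a made-up matrix, not a call into scikit-learn.

```python
import numpy as np

# Toy matrix (made up): column 0 is small, column 1 is large, like a count next to an area.
X = np.array([[1.0, 2000.0],
              [3.0, 4000.0],
              [5.0, 6000.0]])

# Row-wise L2 normalization: each sample is divided by its own Euclidean norm.
row_norms = np.linalg.norm(X, axis=1, keepdims=True)
X_row_normalized = X / row_norms

# Column-wise min-max scaling: each feature is mapped onto [0, 1].
col_min = X.min(axis=0)
col_max = X.max(axis=0)
X_minmax = (X - col_min) / (col_max - col_min)
```

After row-wise normalization the large column still dominates every row, while min-max scaling gives both columns the same range. That would explain why a row-normalized output contains many tiny, identical values next to a few large ones.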

In [158]:
type(y_train)

numpy.ndarray

In [153]:
X_train_norm.shape

(1095, 78)

In [159]:
y_train[:5]

array([[ 91000],
       [109500],
       [ 58500],
       [197900],
       [179000]])

In [155]:
X_train_norm[:5]

array([[3.99501929e-02, 2.21945516e-04, 4.66085584e-03, 3.54225044e-01,
        2.21945516e-04, 2.21945516e-04, 2.21945516e-04, 2.21945516e-04,
        2.21945516e-04, 2.21945516e-04, 2.21945516e-04, 2.21945516e-04,
        2.21945516e-04, 2.21945516e-04, 2.21945516e-04, 8.87782065e-04,
        1.10972758e-03, 4.37898504e-01, 4.37898504e-01, 2.21945516e-04,
        2.21945516e-04, 2.21945516e-04, 2.21945516e-04, 2.21945516e-04,
        0.00000000e+00, 2.21945516e-04, 2.21945516e-04, 2.21945516e-04,
        2.21945516e-04, 2.21945516e-04, 2.21945516e-04, 2.21945516e-04,
        1.02538829e-01, 2.21945516e-04, 0.00000000e+00, 0.00000000e+00,
        1.02538829e-01, 2.21945516e-04, 2.21945516e-04, 2.21945516e-04,
        2.21945516e-04, 1.16743342e-01, 1.02538829e-01, 0.00000000e+00,
        2.19282170e-01, 2.21945516e-04, 0.00000000e+00, 2.21945516e-04,
        0.00000000e+00, 4.43891033e-04, 2.21945516e-04, 2.21945516e-04,
        1.10972758e-03, 2.21945516e-04, 0.00000000e+00, 2.219455

In [156]:
model = Sequential(name="nn_test")
model.add(Dense(40, input_dim=78, activation='relu'))
model.add(Dense(20, activation='relu'))
model.add(Dense(20, activation='relu'))
# single linear output node for regression
model.add(Dense(1, activation='linear'))

# regression target, so optimize mean squared error rather than binary cross-entropy
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mean_squared_error'])

model.summary()

Model: "nn_test"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_8 (Dense)              (None, 40)                3160      
_________________________________________________________________
dense_9 (Dense)              (None, 20)                820       
_________________________________________________________________
dense_10 (Dense)             (None, 20)                420       
_________________________________________________________________
dense_11 (Dense)             (None, 1)                 21        
Total params: 4,421
Trainable params: 4,421
Non-trainable params: 0
_________________________________________________________________
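
The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit. The helper name `dense_params` below is hypothetical; the layer sizes are taken from the model definition above.

```python
def dense_params(n_inputs, n_units):
    """Number of trainable parameters in a dense layer: weights plus one bias per unit."""
    return n_inputs * n_units + n_units

# 78 input features, then layers of 40, 20, 20 and 1 units.
layer_sizes = [78, 40, 20, 20, 1]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)
print(per_layer, total)  # [3160, 820, 420, 21] 4421
```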


In [157]:
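# fit on the encoded, imputed and normalized array; the raw X_train DataFrame still holds
# string categories and fails with "ValueError: could not convert string to float: 'RL'"
history = model.fit(X_train_norm, y_train, epochs=150)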
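
For the comparison against linear regression, both models need to be scored with the same metric on the same held-out rows. The sketch below computes Mean Squared Error by hand and fits an ordinary least-squares baseline with NumPy. The synthetic data and the `_toy` variable names are assumptions, and `np.linalg.lstsq` stands in for scikit-learn's `LinearRegression`.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of squared differences; accepts 1-D or (n, 1) shaped inputs."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.mean((y_true - y_pred) ** 2))

# Synthetic, noise-free data (an assumption for illustration).
rng = np.random.default_rng(42)
X_train_toy = rng.random((80, 3))
X_test_toy = rng.random((20, 3))
true_w = np.array([2.0, -1.0, 0.5])
y_train_toy = X_train_toy @ true_w + 3.0
y_test_toy = X_test_toy @ true_w + 3.0

# Ordinary least squares with an intercept column appended to the features.
A_train = np.column_stack([X_train_toy, np.ones(len(X_train_toy))])
coef, *_ = np.linalg.lstsq(A_train, y_train_toy, rcond=None)
A_test = np.column_stack([X_test_toy, np.ones(len(X_test_toy))])
baseline_mse = mean_squared_error(y_test_toy, A_test @ coef)

# Naive reference point: always predict the training mean.
mean_pred = np.full_like(y_test_toy, y_train_toy.mean())
mean_mse = mean_squared_error(y_test_toy, mean_pred)
```

Because MSE is in squared units of the target, sale prices in the hundreds of thousands produce very large raw values. Comparing the network against both the linear baseline and the predict-the-mean reference makes the number easier to interpret.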

## Use the Keras Library to build an image recognition network using the Fashion-MNIST dataset (also comes with Keras)

- Load and preprocess the image data similar to how we preprocessed the MNIST data in class.
- Make sure to one-hot encode your category labels.
- Make sure that your final layer has as many nodes as the number of classes that you want to predict.
- Try different hyperparameters. What is the highest accuracy that you are able to achieve?
- Use the history object that is returned from `model.fit` to make graphs of the model's loss or train/validation accuracies by epoch.
- Remember that neural networks fall prey to randomness, so you may need to run your model multiple times (or use cross-validation) in order to tell whether a change to a hyperparameter is truly producing better results.
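
The preprocessing steps in the list above can be sketched with NumPy alone. Fashion-MNIST images are 28x28 grayscale with 10 classes. The random pixel array and the label values below are stand-ins for the loaded dataset, and the `np.eye` indexing is a hand-rolled one-hot encoding rather than a Keras utility call.

```python
import numpy as np

num_classes = 10  # Fashion-MNIST has 10 clothing categories

# Stand-in for loaded data (an assumption): 6 grayscale 28x28 images with uint8 pixels.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(6, 28, 28), dtype=np.uint8)
labels = np.array([0, 3, 9, 3, 1, 7])

# Flatten each image to a 784-long vector and rescale pixels from [0, 255] to [0, 1].
X = images.reshape(len(images), -1).astype("float32") / 255.0

# One-hot encode: row i of the identity matrix is the encoding of class i.
y_onehot = np.eye(num_classes, dtype="float32")[labels]
```

With labels encoded this way, the final layer needs `num_classes` output nodes so that each node lines up with one column of `y_onehot`.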

In [None]:
##### Your Code Here #####

## Stretch Goals:

- Use Hyperparameter Tuning to make the accuracy of your models as high as possible. (error as low as possible)
- Use Cross Validation techniques to get more consistent results with your model.
- Use GridSearchCV to try different combinations of hyperparameters. 
- Start looking into other types of Keras layers for CNNs and RNNs. For example, try building a CNN model for Fashion-MNIST to see how the results compare.
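
The cross-validation goal can be sketched as a manual k-fold loop: split the rows into k folds, train on k-1 of them, score on the held-out fold, and report the mean and spread of the scores. Everything here is an illustrative assumption, including the helper names `k_fold_indices` and `fit_and_score`, the synthetic data, and the least-squares stand-in model. In practice the body of `fit_and_score` would build and fit a fresh Keras model for each fold.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k shuffled, non-overlapping folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, folds[i]

def fit_and_score(X_tr, y_tr, X_val, y_val):
    """Stand-in model: least-squares fit, scored by validation MSE."""
    A = np.column_stack([X_tr, np.ones(len(X_tr))])
    coef, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
    pred = np.column_stack([X_val, np.ones(len(X_val))]) @ coef
    return float(np.mean((y_val - pred) ** 2))

# Synthetic data with a little noise (an assumption for illustration).
rng = np.random.default_rng(1)
X = rng.random((50, 3))
y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(0, 0.1, size=50)

splits = list(k_fold_indices(len(X), k=5))
scores = [fit_and_score(X[tr], y[tr], X[va], y[va]) for tr, va in splits]
mean_score, std_score = float(np.mean(scores)), float(np.std(scores))
```

Reporting `mean_score` together with `std_score` shows whether a hyperparameter change moves the result by more than the fold-to-fold variation.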