## DS 862 Machine Learning for Business Analysts Fall 2020

### Artificial Neural Network

#### Submitted by:
* Di Wang

For this assignment, you will try out different MLP structures and compare their performance. We will again work on a regression data set and a classification data set.

In [1]:
import numpy as np
import pandas as pd
import tensorflow as tf

# To get reproducible results
from numpy.random import seed
seed(123)
from tensorflow.random import set_seed
set_seed(123)

# Regression

We will use the California housing dataset for our regression exercise. [Here](https://scikit-learn.org/stable/datasets/index.html#california-housing-dataset) are some details about the dataset.
#### Attribute Information:
- MedInc: median income in block
- HouseAge: median house age in block
- AveRooms: average number of rooms
- AveBedrms: average number of bedrooms
- Population: block population
- AveOccup: average house occupancy
- Latitude: house block latitude
- Longitude: house block longitude

In [2]:
from sklearn.datasets import fetch_california_housing
housing = fetch_california_housing()
X = housing.data
y = housing.target

In [3]:
X.shape # We have 8 features and 20640 observations. 

(20640, 8)

In [4]:
pd.DataFrame(X).head()

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| 0 | 8.3252 | 41.0 | 6.984127 | 1.02381 | 322.0 | 2.555556 | 37.88 | -122.23 |
| 1 | 8.3014 | 21.0 | 6.238137 | 0.97188 | 2401.0 | 2.109842 | 37.86 | -122.22 |
| 2 | 7.2574 | 52.0 | 8.288136 | 1.073446 | 496.0 | 2.80226 | 37.85 | -122.24 |
| 3 | 5.6431 | 52.0 | 5.817352 | 1.073059 | 558.0 | 2.547945 | 37.85 | -122.25 |
| 4 | 3.8462 | 52.0 | 6.281853 | 1.081081 | 565.0 | 2.181467 | 37.85 | -122.25 |


In [5]:
y

array([4.526, 3.585, 3.521, ..., 0.923, 0.847, 0.894])

#### Task 1
Split the data into training, validation, and testing sets. Scale the training set and apply the same scaling to the validation and testing sets. Make sure you set a random seed so the result is reproducible.

In [6]:
# Perform data splitting
from sklearn.model_selection import train_test_split

X_train_valid, X_test, y_train_valid, y_test = train_test_split(X, y, 
                                                        test_size = 0.2, random_state = 123)
X_train, X_valid, y_train, y_valid = train_test_split(X_train_valid, y_train_valid, 
                                                      test_size = 0.25, random_state = 123) 
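
The two-stage split above holds out 20% for testing and then 25% of the remaining 80% for validation, which works out to a 60/20/20 split. The following is a small sketch of that arithmetic only; it does not call `train_test_split`, and the rounding it uses is an assumption that happens to be exact for 20,640 rows.

```python
# Arithmetic of the two-stage split for the 20,640 housing rows.
n_total = 20640
test_fraction = 0.2     # first split: held out for testing
valid_fraction = 0.25   # second split: taken from the remaining 80%

n_test = round(n_total * test_fraction)
n_train_valid = n_total - n_test
n_valid = round(n_train_valid * valid_fraction)
n_train = n_train_valid - n_valid

print(n_train, n_valid, n_test)
print(n_train / n_total, n_valid / n_total, n_test / n_total)
```

In the notebook, the same numbers can be checked with `X_train.shape`, `X_valid.shape`, and `X_test.shape`.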

In [7]:
# Scale data
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler() # Instantiate
X_train_s = scaler.fit_transform(X_train)
X_valid_s = scaler.transform(X_valid)
X_test_s = scaler.transform(X_test)
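
To illustrate why the scaler is fit on the training set only, here is a minimal pure-Python sketch of standardization on a single made-up feature. The values in `train` and `test` are hypothetical; the only fact borrowed from scikit-learn is that `StandardScaler` divides by the population standard deviation.

```python
from statistics import mean, pstdev

train = [2.0, 4.0, 6.0, 8.0]   # hypothetical training values of one feature
test = [5.0, 10.0]             # hypothetical test values of the same feature

# "fit": learn the mean and standard deviation from the training data only
mu = mean(train)
sigma = pstdev(train)          # population standard deviation (ddof = 0)

# "transform": apply the training statistics to every split
train_s = [(x - mu) / sigma for x in train]
test_s = [(x - mu) / sigma for x in test]

print(train_s)
print(test_s)
```

The scaled training values have mean 0 and standard deviation 1, while the test values are simply expressed on the training scale and need not have those properties.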

#### Task 2
Now we will fit a neural network. Let's start with a shallow network but with more neurons. Let's try 2 hidden layers, with 15 and 10 neurons respectively. Use the standard inputs for all other hyperparameters (or something you like). Fit the model and calculate the MSE on the test set.

In [8]:
# Set up your neural network here. Use as many boxes as you need.
# Instantiate the model structure
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation

set_seed(123)
Model1 = Sequential([
    Dense(15, input_dim = X_train_s.shape[1], activation="relu"), # input dimension is 8
    Dense(10, activation="relu"),
    Dense(1, activation="relu")
])

In [9]:
Model1.summary() # 15*(8+1), 10*(15+1), 1*(10+1)

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 15)                135       
_________________________________________________________________
dense_1 (Dense)              (None, 10)                160       
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 11        
Total params: 306
Trainable params: 306
Non-trainable params: 0
_________________________________________________________________


In [10]:
# Compile the model
Model1.compile(loss = "mean_squared_error", optimizer = 'adam')

In [11]:
# Fit the model
Model1.fit(X_train_s, y_train, epochs=25, validation_data = (X_valid_s, y_valid),
         batch_size = 32)

Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25
Epoch 6/25
Epoch 7/25
Epoch 8/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
Epoch 12/25
Epoch 13/25
Epoch 14/25
Epoch 15/25
Epoch 16/25
Epoch 17/25
Epoch 18/25
Epoch 19/25
Epoch 20/25
Epoch 21/25
Epoch 22/25
Epoch 23/25
Epoch 24/25
Epoch 25/25


<tensorflow.python.keras.callbacks.History at 0x19357756408>

In [12]:
# Predict on the test set
Model1.predict(X_test_s)

array([[2.0774338],
       [0.8028945],
       [1.1777669],
       ...,
       [0.7444577],
       [1.8713886],
       [3.3993115]], dtype=float32)

In [13]:
# Evaluate on test set
Model1.evaluate(X_test_s, y_test)



0.313221275806427

#### Task 3
Now we will try a different structure. Instead of a shallow network, let's use a deeper network with fewer neurons in each layer. Let's use 5 hidden layers with [7,5,3,2,2] neurons respectively. Use the standard inputs for all other hyperparameters (or something you like, but be consistent with those in Task 2). Fit the model and calculate the MSE on the test set.

In [14]:
# Set up your neural network here. Use as many boxes as you need.

Model2 = Sequential([
    Dense(7, input_dim = X_train_s.shape[1], activation="relu"), 
    Dense(5, activation="relu"),
    Dense(3, activation="relu"),
    Dense(2, activation="relu"),
    Dense(2, activation="relu"),
    Dense(1, activation="relu")
])

In [15]:
# Compile the model
Model2.compile(loss = "mean_squared_error", optimizer = 'adam')

In [16]:
# Fit the model
Model2.fit(X_train_s, y_train, epochs=25, validation_data = (X_valid_s, y_valid),
         batch_size = 32)

Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25
Epoch 6/25
Epoch 7/25
Epoch 8/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
Epoch 12/25
Epoch 13/25
Epoch 14/25
Epoch 15/25
Epoch 16/25
Epoch 17/25
Epoch 18/25
Epoch 19/25
Epoch 20/25
Epoch 21/25
Epoch 22/25
Epoch 23/25
Epoch 24/25
Epoch 25/25


<tensorflow.python.keras.callbacks.History at 0x193589c79c8>

In [17]:
# Evaluate on test set
Model2.evaluate(X_test_s, y_test)



0.3369347155094147

<b>What do you observe?</b>
- Model1 (2 hidden layers) and Model2 (5 hidden layers) take a similar amount of time to train.
- The deeper model does not improve the test MSE on this dataset (0.337 versus 0.313), so adding layers does not always improve performance.
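
One way to put this comparison in context is to count trainable parameters. The sketch below applies the same rule as the comment next to `Model1.summary()` (each dense layer has `units * (inputs + 1)` parameters); the helper name `count_dense_params` is made up for illustration, and the Model2 count is computed here rather than taken from notebook output.

```python
def count_dense_params(n_inputs, layer_units):
    """Total weights and biases in a stack of fully connected layers."""
    total = 0
    n_in = n_inputs
    for units in layer_units:
        total += units * (n_in + 1)  # weights plus one bias per unit
        n_in = units                 # this layer's output feeds the next layer
    return total

model1_params = count_dense_params(8, [15, 10, 1])          # 2 hidden layers + output
model2_params = count_dense_params(8, [7, 5, 3, 2, 2, 1])   # 5 hidden layers + output

print(model1_params, model2_params)
```

Under this count the deeper network has fewer than half the parameters of the shallow one and squeezes everything through layers of only 2 units, which may help explain why extra depth alone does not help here.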

#### Task 4
Now let's try to tune our model. Here are the parameters I want you to tune: the optimizer (SGD vs Adam), number of layers, number of neurons, and the activation functions of the hidden layers.

In [18]:
# First we will create a wrapper function
def build_model(n_hidden = 1, n_neurons = 5, activation="relu", optimizer = 'Adam', input_dim = 8):
    model = Sequential() # Instantiate the model
    options = {"input_dim": input_dim} # Set options 
    for layer in range(n_hidden):
        model.add(Dense(n_neurons, activation = activation, **options)) # Here we are using the input options from before
        options = {} # Now we erase the input options so it won't be included in future layers
    model.add(Dense(1, activation = activation))
    model.compile(loss = "mean_squared_error", optimizer = optimizer)
    return model

In [19]:
# Set up your grid search here. Use as many boxes as you need.
from sklearn.model_selection import GridSearchCV

set_seed(123)
keras_model = tf.keras.wrappers.scikit_learn.KerasRegressor(build_model) 

param = {
    'n_hidden': [1,2,3,4],
    'n_neurons':[5,10,15,20],
    'activation':['relu','selu'],
    'optimizer': ['Adam', 'SGD']
        }

Model3 = GridSearchCV(keras_model, param, cv = 2, n_jobs = -1)
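
Before launching the search it can help to know how many models will be trained. This sketch enumerates the same grid with `itertools.product` instead of scikit-learn, purely to count combinations; the variable names `combinations` and `n_fits` are illustrative.

```python
from itertools import product

# Same grid as the `param` dictionary above.
param = {
    "n_hidden": [1, 2, 3, 4],
    "n_neurons": [5, 10, 15, 20],
    "activation": ["relu", "selu"],
    "optimizer": ["Adam", "SGD"],
}
cv_folds = 2

# Every combination of one value per hyperparameter.
combinations = [dict(zip(param, values)) for values in product(*param.values())]
n_fits = len(combinations) * cv_folds

print(len(combinations), n_fits)
print(combinations[0])
```

With 64 combinations and 2 folds, the search trains 128 networks, plus one final refit of the best combination because `GridSearchCV` refits by default.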

In [20]:
Model3.fit(X_train_s, y_train, epochs = 25,
          validation_data = (X_valid_s, y_valid),
          callbacks = [tf.keras.callbacks.EarlyStopping(patience=3)]) # callbacks are passed as a list

Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25
Epoch 6/25
Epoch 7/25
Epoch 8/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
Epoch 12/25
Epoch 13/25
Epoch 14/25
Epoch 15/25
Epoch 16/25
Epoch 17/25
Epoch 18/25
Epoch 19/25
Epoch 20/25
Epoch 21/25
Epoch 22/25
Epoch 23/25
Epoch 24/25
Epoch 25/25


GridSearchCV(cv=2,
             estimator=<tensorflow.python.keras.wrappers.scikit_learn.KerasRegressor object at 0x0000019358B44908>,
             n_jobs=-1,
             param_grid={'activation': ['relu', 'selu'],
                         'n_hidden': [1, 2, 3, 4], 'n_neurons': [5, 10, 15, 20],
                         'optimizer': ['Adam', 'SGD']})
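
The `EarlyStopping(patience=3)` callback used above stops training once the monitored validation loss has failed to improve for 3 consecutive epochs. Below is a simplified pure-Python sketch of that rule, assuming the default settings (no minimum improvement threshold); the function name and the loss values are hypothetical, and this is not the Keras implementation itself.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss   # new best validation loss: reset the counter
            wait = 0
        else:
            wait += 1     # one more epoch without improvement
            if wait >= patience:
                return epoch
    return None           # ran out of epochs while still improving

plateau = [0.9, 0.7, 0.6, 0.61, 0.62, 0.63, 0.5]
still_improving = [1.0, 0.9, 0.8]

print(early_stopping_epoch(plateau))
print(early_stopping_epoch(still_improving))
```

A run whose validation loss keeps improving never triggers the rule, so it uses all of its epochs.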

<b>What is your best combination? </b>

In [21]:
Model3.best_params_

{'activation': 'relu', 'n_hidden': 2, 'n_neurons': 20, 'optimizer': 'Adam'}

<b>What about the evaluation on the test set? </b>

In [22]:
# Evaluate on test set
from sklearn.metrics import mean_squared_error

mean_squared_error(y_test, Model3.predict(X_test_s)) # arguments are (y_true, y_pred)

0.3106520283419537

#### Task 5
Now compare with one of your favorite regression models, and tell me what you observe.

In [23]:
# Your favorite regression model. Use as many boxes as you need.
from sklearn.linear_model import LinearRegression

scaler = StandardScaler()
X_train_valid_s = scaler.fit_transform(X_train_valid)
X_test_s = scaler.transform(X_test)

Model4 = LinearRegression()
Model4.fit(X_train_valid_s, y_train_valid)
mean_squared_error(y_test, Model4.predict(X_test_s)) # arguments are (y_true, y_pred)

0.5180228655178674

<b>Observation:</b>
- The neural networks perform much better than the traditional linear regression model (test MSE of about 0.31 versus 0.52).
- The MSE is quite small (less than 1). This may be because the housing target (y) is itself small in scale, so the differences between predicted and true values are small as well.
- When I tune my model, training does not stop early because the validation MSE is still improving at epoch 25. With the best combination, I get the smallest test MSE (0.311) among the 4 models. I think I can still improve the model by searching a wider range of values instead of a small fixed grid.
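
To make the MSE values easier to interpret, they can be converted to a root mean squared error in dollars. This sketch assumes the target is expressed in units of $100,000, as described in the scikit-learn documentation for this dataset; the MSE values are copied from the outputs above.

```python
import math

mse_tuned_nn = 0.3106520283419537   # Model3, test set
mse_linear = 0.5180228655178674     # Model4, test set
dollars_per_unit = 100_000          # assumption: target is in units of $100,000

rmse_tuned_nn = math.sqrt(mse_tuned_nn)
rmse_linear = math.sqrt(mse_linear)

print(f"Tuned network RMSE: ${rmse_tuned_nn * dollars_per_unit:,.0f}")
print(f"Linear regression RMSE: ${rmse_linear * dollars_per_unit:,.0f}")
```

On that scale the tuned network is off by roughly $56,000 on a typical block, compared with roughly $72,000 for linear regression.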

# Classification

In the lecture code, we did a multiclass classification. For this assignment, we will do something simpler: a binary classification. For this task, we will use the dataset [here](https://www.kaggle.com/iabhishekofficial/mobile-price-classification?select=train.csv). The original output has levels 0, 1, 2, 3. I have merged levels 0 and 1 into level 0, and levels 2 and 3 into level 1. The task is to predict the price level of a cell phone (low = 0 vs high = 1) given a set of mobile features.
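
The merging described above has already been applied to the provided file. For reference, a minimal sketch of that recoding on a few made-up labels might look like this (the list `original_levels` is hypothetical):

```python
# Hypothetical original price levels (0-3) for six phones.
original_levels = [0, 1, 2, 3, 1, 2]

# Levels 0 and 1 become "low" (0); levels 2 and 3 become "high" (1).
binary_levels = [0 if level <= 1 else 1 for level in original_levels]

print(binary_levels)
```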

#### Attribute Information:
- Total energy a battery can store in one time measured in mAh
- Has bluetooth or not
- Speed at which microprocessor executes instructions
- Has dual sim support or not
- Front Camera mega pixels
- Has 4G or not
- Internal Memory in Gigabytes
- Mobile Depth in cm
- Weight of mobile phone
- Number of cores of processor
- Primary Camera mega pixels
- Pixel Resolution Height
- Pixel Resolution Width
- Random Access Memory in Mega Bytes
- Screen Height of mobile in cm
- Screen Width of mobile in cm
- longest time that a single battery charge will last when you are
- Has 3G or not
- Has touch screen or not
- Has wifi or not
#### Target: price_range

In [24]:
mobile = pd.read_csv('mobile.csv')

In [25]:
y2 = mobile.price_range
del mobile['price_range']
X2 = mobile

In [26]:
X2.head()

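|   | battery_power | blue | clock_speed | dual_sim | fc | four_g | int_memory | m_dep | mobile_wt | n_cores | pc | px_height | px_width | ram | sc_h | sc_w | talk_time | three_g | touch_screen | wifi |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 842 | 0 | 2.2 | 0 | 1 | 0 | 7 | 0.6 | 188 | 2 | 2 | 20 | 756 | 2549 | 9 | 7 | 19 | 0 | 0 | 1 |
| 1 | 1021 | 1 | 0.5 | 1 | 0 | 1 | 53 | 0.7 | 136 | 3 | 6 | 905 | 1988 | 2631 | 17 | 3 | 7 | 1 | 1 | 0 |
| 2 | 563 | 1 | 0.5 | 1 | 2 | 1 | 41 | 0.9 | 145 | 5 | 6 | 1263 | 1716 | 2603 | 11 | 2 | 9 | 1 | 1 | 0 |
| 3 | 615 | 1 | 2.5 | 0 | 0 | 0 | 10 | 0.8 | 131 | 6 | 9 | 1216 | 1786 | 2769 | 16 | 8 | 11 | 1 | 0 | 0 |
| 4 | 1821 | 1 | 1.2 | 0 | 13 | 1 | 44 | 0.6 | 141 | 2 | 14 | 1208 | 1212 | 1411 | 8 | 2 | 15 | 1 | 1 | 0 |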


#### Task 1
Split the data into training, validation, and test sets. Scale the data.

In [27]:
# Split data
X_train_valid2, X_test2, y_train_valid2, y_test2 = train_test_split(X2, y2, 
                                                        test_size = 0.2, random_state = 123)
X_train2, X_valid2, y_train2, y_valid2 = train_test_split(X_train_valid2, y_train_valid2, 
                                                      test_size = 0.25, random_state = 123) 

In [28]:
# Scale data
scaler = StandardScaler() # Instantiate
X_train2_s = scaler.fit_transform(X_train2)
X_valid2_s = scaler.transform(X_valid2)
X_test2_s = scaler.transform(X_test2)

I have given you enough work for the regression task. We will take it easy with the classification. All you have to do is build a neural network with 3 hidden layers. Be careful with the activation function you use. Return the accuracy on the test set.

In [29]:
# Convert the labels into dummy variables
from tensorflow.keras.utils import to_categorical

Y_train2 = to_categorical(y_train2) # Convert the training output
Y_valid2 = to_categorical(y_valid2) # Convert the validation output
Y_test2 = to_categorical(y_test2)
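
For readers unfamiliar with this step, `to_categorical` turns each integer label into a row containing a single 1 in the position of its class. The following pure-Python sketch mimics that idea on a few made-up labels; the helper `one_hot` is hypothetical and is not the Keras function.

```python
def one_hot(labels, num_classes):
    """Encode integer class labels as rows with a single 1.0 per row."""
    return [
        [1.0 if k == label else 0.0 for k in range(num_classes)]
        for label in labels
    ]

labels = [0, 1, 1]             # hypothetical binary labels
encoded = one_hot(labels, 2)

print(encoded)
```

With two classes, each label becomes a pair, which is why the output layer of the network below has 2 units.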

In [30]:
# Build your neural network. Use as many boxes as you need.
set_seed(123)
Model5 = Sequential([
    Dense(15, input_dim = X_train2_s.shape[1], activation="relu"),
    Dense(10, activation="relu"),
    Dense(5, activation="relu"),
    Dense(2, activation="sigmoid")
])

In [31]:
# Compile the model
Model5.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.1),
              loss = 'binary_crossentropy', metrics = ['accuracy'])

In [32]:
# Fit the model
Model5.fit(X_train2_s, Y_train2, epochs=25, validation_data = (X_valid2_s, Y_valid2),
         batch_size = 32)

Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25
Epoch 6/25
Epoch 7/25
Epoch 8/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
Epoch 12/25
Epoch 13/25
Epoch 14/25
Epoch 15/25
Epoch 16/25
Epoch 17/25
Epoch 18/25
Epoch 19/25
Epoch 20/25
Epoch 21/25
Epoch 22/25
Epoch 23/25
Epoch 24/25
Epoch 25/25


<tensorflow.python.keras.callbacks.History at 0x193589be9c8>

In [33]:
# Accuracy on test set
Model5.evaluate(X_test2_s, Y_test2)



[0.07529295235872269, 0.9800000190734863]

Lastly, compare it with your favorite classification model, and tell me what you observe.

In [34]:
# Your favorite classification model. Use as many boxes as you need.
from sklearn.linear_model import LogisticRegression

scaler = StandardScaler()
X_train_valid2_s = scaler.fit_transform(X_train_valid2)
X_test2_s = scaler.transform(X_test2)
Model6 = LogisticRegression()
Model6.fit(X_train_valid2_s, y_train_valid2)
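# Score on the scaled test features, matching the scaling used for training.
# (The 0.5325 recorded below was obtained with the unscaled X_test2.)
Model6.score(X_test2_s, y_test2) # mean accuracy on the given test data and labels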

0.5325

In [35]:
from sklearn.ensemble import GradientBoostingClassifier

Model7 = GradientBoostingClassifier(learning_rate=0.1, n_estimators = 100)
Model7.fit(X_train_valid2_s, y_train_valid2)
np.mean(Model7.predict(X_test2_s) == y_test2)

0.97

<b>Observation:</b>
- Since this is a binary classification (0/1), I use the sigmoid function for the output layer and binary_crossentropy as the loss.
- The accuracy of the neural network (98%) is much higher than that of the logistic regression model (53%). However, the 53% was obtained by scoring on the unscaled X_test2 even though the model was fit on scaled features, so it is not a fair comparison. Gradient boosting also performs well, with an accuracy of around 97%.
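
To show what the sigmoid output and the binary cross-entropy loss compute, here is a small pure-Python sketch for a single prediction. The function names and input values are illustrative; Keras applies the same formulas in vectorized form and averages the loss over samples and output units.

```python
import math

def sigmoid(z):
    """Squash a raw score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def binary_crossentropy(y_true, p):
    """Loss for one label (0 or 1) and one predicted probability p."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

p_confident = sigmoid(2.0)   # raw score well above 0
p_unsure = sigmoid(0.0)      # raw score of exactly 0 gives probability 0.5

print(p_confident, binary_crossentropy(1, p_confident))
print(p_unsure, binary_crossentropy(1, p_unsure))
```

The loss shrinks as the predicted probability for the true class approaches 1, which is the behavior the optimizer exploits during training.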

### Thank you.