## Machine Learning Final Exam

1.  From the perspective of a social scientist, which models did we learn this semester that are useful for ruling out alternative explanations through control variables AND that allow us to observe substantively meaningful information from model coefficients?

Models that rely on control variables to rule out alternative explanations are usually regression models, which estimate the relationship between a target variable and a set of independent variables. Their coefficients can be interpreted directly: each one measures the direction, size, and statistical significance of the association between a predictor and the outcome, holding the other variables constant. The models we covered that serve this purpose best are linear regression and logistic regression without a penalty term.
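As a rough illustration of why control variables matter, the sketch below fits ordinary least squares with and without a control on simulated data. Everything here is an assumption made for the example: the variable names `x`, `y`, `z`, the helper `ols`, and the made-up coefficients (a true effect of 2.0 for `x`).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical data: z confounds the relationship between x and y.
z = rng.normal(size=n)                       # control variable (confounder)
x = 0.8 * z + rng.normal(size=n)             # predictor of interest, correlated with z
y = 2.0 * x + 3.0 * z + rng.normal(size=n)   # true coefficient on x is 2.0


def ols(columns, target):
    """Least-squares coefficients for an intercept plus the given columns."""
    design = np.column_stack([np.ones(len(target))] + list(columns))
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


naive = ols([x], y)           # omits the control variable
controlled = ols([x, z], y)   # includes the control variable

print(f"x coefficient without control: {naive[1]:.2f}")
print(f"x coefficient with control:    {controlled[1]:.2f}")
```

With the control included, the coefficient on `x` should land close to the simulated value of 2.0, while the model that omits `z` attributes part of the effect of `z` to `x` and overstates it.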

2. Describe the main differences between supervised and unsupervised learning.

A supervised learning algorithm learns from labeled training data, while an unsupervised model works with data that are not labeled. In supervised learning, we have prior knowledge of what the output should be, so the goal is to build a model whose predictions closely approximate the expected output values. Unsupervised learning has no target to predict; its goal is to discover structure in the data, such as groups of similar observations or a lower-dimensional representation.

3. Is supervised or unsupervised learning the primary approach that is used by machine learning practitioners?  For whatever approach you think is secondary, why would you use this approach (what's a good reason to use these kinds of models?)

Supervised learning is the more commonly used approach in machine learning, since its results are easy to evaluate against known labels and it applies directly to classification and regression problems. Unsupervised learning is suitable for problems that involve clustering, dimensionality reduction, and feature selection. It is also often easier to obtain unlabeled data, which can be generated or collected automatically, than labeled data, which usually require human annotation.

4. Which unsupervised learning modeling approaches did we cover this semester?  What are the major differences between these techniques?

The unsupervised learning approaches we covered are clustering (K-Means and hierarchical clustering), Principal Components Analysis (PCA), and manifold learning (including multidimensional scaling). PCA is a form of dimensionality reduction: it replaces a large number of observed variables with a smaller number of components while preserving as much variance as possible. Manifold learning has the same goal as PCA, but it can capture nonlinear structure, whereas PCA is limited to linear combinations of the original variables. Clustering instead splits the observations into homogeneous subgroups based on the distances between data points.
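To make the clustering idea concrete, here is a minimal NumPy sketch of K-Means (alternating an assignment step and a centroid update step). The function names, the two simulated blobs, and the deterministic farthest-first initialization are assumptions chosen for the example; library implementations typically use other initialization schemes.

```python
import numpy as np


def farthest_first_init(points, k):
    """Pick the first point, then repeatedly the point farthest from all chosen centroids."""
    centroids = [points[0]]
    for _ in range(k - 1):
        dists = np.min(
            np.stack([np.linalg.norm(points - c, axis=1) for c in centroids]), axis=0
        )
        centroids.append(points[dists.argmax()])
    return np.array(centroids, dtype=float)


def kmeans(points, k, n_iter=20):
    centroids = farthest_first_init(points, k)
    for _ in range(n_iter):
        # Assignment step: each observation joins its nearest centroid.
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:   # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return labels, centroids


# Hypothetical data: two well-separated groups of 50 observations each.
rng = np.random.default_rng(1)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2))
points = np.vstack([blob_a, blob_b])

labels, centroids = kmeans(points, k=2)
print(centroids)
```

No labels are passed to `kmeans`; the grouping is recovered from distances alone, which is the sense in which clustering is unsupervised.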

5.  What are the main benefits of using Principal Components Analysis?

The main advantage of PCA is that it replaces correlated features, which carry largely redundant information, with a smaller set of uncorrelated components. By reducing the number of features, PCA helps mitigate overfitting. Because the first components are the linear combinations of variables with maximum variance, projecting the data onto the first two or three of them also makes high-dimensional data easier to visualize.
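A small sketch of these properties, computing PCA through the singular value decomposition in NumPy. The simulated three-feature dataset (two features driven by a shared hypothetical `latent` variable) is an assumption for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical data: the first two features are strongly correlated, the third is independent.
latent = rng.normal(size=n)
X = np.column_stack([
    latent + 0.1 * rng.normal(size=n),
    2.0 * latent + 0.1 * rng.normal(size=n),
    rng.normal(size=n),
])

# PCA via SVD of the centered data matrix.
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

explained_variance = S**2 / (n - 1)
explained_ratio = explained_variance / explained_variance.sum()

scores = X_centered @ Vt.T     # observations expressed in component coordinates
scores_2d = scores[:, :2]      # two-dimensional projection, e.g. for plotting

component_cov = np.cov(scores, rowvar=False)   # should be (numerically) diagonal

print("share of variance per component:", np.round(explained_ratio, 3))
```

In this setup the first two components should carry nearly all of the variance, and the covariance matrix of the component scores is diagonal, meaning the components are uncorrelated even though the original features were not.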

6. Thinking about neural networks, what are three major differences between a deep multilayer perceptron network and a convolutional neural network model?  Be sure to define any key terms in your explanation.

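One difference between a multilayer perceptron (MLP) and a convolutional neural network (CNN) is the input: an MLP takes a flattened vector, whereas a CNN takes a multidimensional tensor (for an image, height by width by channels), which preserves spatial structure. Because of this, a CNN generally performs better than an MLP on image data. Second, the layers in an MLP are fully connected, meaning every node is connected to every node in the next layer, while each node in a convolutional layer is connected only to a small local region of its input. Third, a CNN shares weights: the same filter (a small matrix of weights) is applied at every position of the input, which reduces the number of parameters and makes image processing more efficient than in an MLP.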
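The effect of weight sharing can be checked with simple arithmetic. The sketch below compares a 3 by 3 convolutional layer with 16 filters on a 10 x 10 x 4 input against a fully connected layer that produces the same number of outputs (8 x 8 x 16). The helper names are hypothetical, and the input size is chosen only to match the CNN code cells later in this exam.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights for each filter plus one bias per filter; independent of image size."""
    return kernel_h * kernel_w * in_channels * filters + filters


def dense_params(n_inputs, n_units):
    """One weight per input-unit pair plus one bias per unit."""
    return n_inputs * n_units + n_units


height, width, channels = 10, 10, 4
filters = 16

# A 'valid' 3x3 convolution shrinks each spatial dimension by 2.
out_h, out_w = height - 2, width - 2

conv = conv2d_params(3, 3, channels, filters)
dense = dense_params(height * width * channels, out_h * out_w * filters)

print("convolutional layer parameters:", conv)
print("equivalent dense layer parameters:", dense)
```

The convolutional layer needs a few hundred parameters, while a dense layer with the same input and output sizes needs several hundred thousand, because it learns a separate weight for every input-output pair.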

7. Write the tf.keras code for a multilayer perceptron neural network with the following structure: Three hidden layers.  50 hidden units in the first hidden layer, 100 in the second, and 150 in the third.  Activate all hidden layers with relu.  The output layer should be built to classify to five categories.  Further, your optimization technique should be stochastic gradient descent.  (This code should simply build the architecture of the model and your approach to compile the model.  You will not run it on real data.)

In [2]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation

model = Sequential([
    Dense(50, input_shape=(100,)),
    Activation('relu'),
    Dense(100),
    Activation('relu'),
    Dense(150),
    Activation('relu'),
    Dense(5),
    Activation('softmax'),
])


model.summary()
model.compile(optimizer='sgd',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 50)                5050      
                                                                 
 activation (Activation)     (None, 50)                0         
                                                                 
 dense_1 (Dense)             (None, 100)               5100      
                                                                 
 activation_1 (Activation)   (None, 100)               0         
                                                                 
 dense_2 (Dense)             (None, 150)               15150     
                                                                 
 activation_2 (Activation)   (None, 150)               0         
                                                                 
 dense_3 (Dense)             (None, 5)                 7

8. Write the tf.keras code for a multilayer perceptron neural network with the following structure: Two hidden layers.  75 hidden units in the first hidden layer and 150 in the second.  Activate all hidden layers with relu.  The output layer should be built to classify a binary dependent variable.  Further, your optimization technique should be stochastic gradient descent. (This code should simply build the architecture of the model and your approach to compile the model.  You will not run it on real data.)

In [3]:
model = Sequential()
model.add(Dense(75, activation='relu', input_dim=100))
model.add(Dense(150, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

model.summary()
model.compile(optimizer='sgd',
              loss='binary_crossentropy',
              metrics=['accuracy']) 

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_9 (Dense)             (None, 75)                7575      
                                                                 
 dense_10 (Dense)            (None, 150)               11400     
                                                                 
 dense_11 (Dense)            (None, 1)                 151       
                                                                 
Total params: 19,126
Trainable params: 19,126
Non-trainable params: 0
_________________________________________________________________
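As a sanity check on the summary above, the parameter counts of a stack of Dense layers can be reproduced by hand: each layer has (inputs x units) weights plus one bias per unit. The helper `mlp_param_counts` is a hypothetical name used only for this sketch.

```python
def mlp_param_counts(input_dim, layer_units):
    """Return the parameter count of each Dense layer in a fully connected stack."""
    counts = []
    n_in = input_dim
    for units in layer_units:
        counts.append(n_in * units + units)   # weights + biases
        n_in = units                          # this layer's output feeds the next layer
    return counts


# Same structure as the model above: 100 inputs, then 75, 150, and 1 units.
counts = mlp_param_counts(100, [75, 150, 1])
print(counts, sum(counts))
```

The per-layer values and their total agree with the `Param #` column and the `Total params` line reported by `model.summary()`.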


9.  Write the tf.keras code for a convolutional neural network with the following structure: Two convolutional layers.  16 filters in the first layer and 28 in the second.  Activate all convolutional layers with relu.  Use max pooling after each convolutional layer with a 2 by 2 filter.  The output layer should be built to classify to ten categories.  Further, your optimization technique should be stochastic gradient descent.  (This code should simply build the architecture of the model and your approach to compile the model.  You will not run it on real data.)

In [3]:
# Import everything from tensorflow.keras so the layers match the tf.keras Sequential model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dense, Activation, Flatten

model = Sequential()
model.add(Conv2D(16, (3, 3),
                 padding='valid', input_shape=(10,10,4)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(28, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
  
model.add(Flatten())
model.add(Dense(10))
model.add(Activation('softmax'))

model.summary()
model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 8, 8, 16)          592       
                                                                 
 activation_4 (Activation)   (None, 8, 8, 16)          0         
                                                                 
 max_pooling2d (MaxPooling2D  (None, 4, 4, 16)         0         
 )                                                               
                                                                 
 conv2d_1 (Conv2D)           (None, 2, 2, 28)          4060      
                                                                 
 activation_5 (Activation)   (None, 2, 2, 28)          0         
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 1, 1, 28)         0         
 2D)                                                  

10.  Write the keras code for a convolutional neural network with the following structure: Two convolutional layers.  32 filters in the first layer and 32 in the second.  Activate all convolutional layers with relu.  Use max pooling after each convolutional layer with a 2 by 2 filter.  Add two fully connected layers with 128 hidden units in each layer and relu activations.  The output layer should be built to classify to six categories.  Further, your optimization technique should be stochastic gradient descent.  (This code should simply build the architecture of the model and your approach to compile the model.  You will not run it on real data.)

In [4]:
model = Sequential()
model.add(Conv2D(32, (3, 3),
                 padding='valid', input_shape=(10,10,4)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
  
model.add(Flatten())
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dense(6))
model.add(Activation('softmax'))

model.summary()
model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_2 (Conv2D)           (None, 8, 8, 32)          1184      
                                                                 
 activation_7 (Activation)   (None, 8, 8, 32)          0         
                                                                 
 max_pooling2d_2 (MaxPooling  (None, 4, 4, 32)         0         
 2D)                                                             
                                                                 
 conv2d_3 (Conv2D)           (None, 2, 2, 32)          9248      
                                                                 
 activation_8 (Activation)   (None, 2, 2, 32)          0         
                                                                 
 max_pooling2d_3 (MaxPooling  (None, 1, 1, 32)         0         
 2D)                                                  
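The output shapes in the two CNN summaries follow from the standard size formulas for 'valid' convolution and pooling. The sketch below traces one spatial dimension of the 10 x 10 input through both convolution and pooling stages; the helper names are hypothetical, and the pooling stride is assumed to equal the pool size, which is the Keras default for `MaxPooling2D`.

```python
def conv_valid(size, kernel=3, stride=1):
    """Output size of a convolution with 'valid' padding."""
    return (size - kernel) // stride + 1


def max_pool(size, pool=2):
    """Output size of max pooling with 'valid' padding and stride equal to the pool size."""
    return (size - pool) // pool + 1


size = 10
trace = [size]
for _ in range(2):            # two convolution + pooling stages
    size = conv_valid(size)
    trace.append(size)
    size = max_pool(size)
    trace.append(size)

filters_last = 32             # filters in the second convolutional layer of the last model
flattened = size * size * filters_last

print("spatial size after each layer:", trace)
print("values passed to the first Dense layer:", flattened)
```

With such a small input the feature map is reduced to a single spatial position after the second pooling layer, so `Flatten` hands the first fully connected layer one value per filter.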