## CSCI 470 Activities and Case Studies

1. For all activities, you are allowed to collaborate with a partner. 
1. For case studies, you should work individually and are **not** allowed to collaborate.

By filling out this notebook and submitting it, you acknowledge that you are aware of the above policies and are agreeing to comply with them.

Some considerations with regard to how these notebooks will be graded:

1. Cells in which "# YOUR CODE HERE" is found are the cells where your graded code should be written.
2. In order to test out or debug your code you may also create notebook cells or edit existing notebook cells other than "# YOUR CODE HERE". We actually highly recommend you do so to gain a better understanding of what is happening. However, during grading, **these changes are ignored**. 
3. You must ensure that all your code for the particular task is available in the cells that say "# YOUR CODE HERE".
4. Every cell that says "# YOUR CODE HERE" is followed by a "raise NotImplementedError". You need to remove that line. During grading, if an error occurs then you will not receive points for your work in that section.
5. If your code passes the "assert" statements, then no output will result. If your code fails the "assert" statements, you will get an "AssertionError". Getting an assertion error means you will not receive points for that particular task.
6. If you edit the "assert" statements to make your code pass, they will still fail when they are graded since the "assert" statements will revert to the original. Make sure you don't edit the assert statements.
7. We may sometimes have "hidden" tests for grading. This means that passing the visible "assert" statements is not sufficient. The "assert" statements are there as a guide but you need to make sure you understand what you're required to do and ensure that you are doing it correctly. Passing the visible tests is necessary but not sufficient to get the grade for that cell.
8. When you are asked to define a function, make sure you **don't** use any variables outside of the parameters passed to the function. You can think of the parameters being passed to the function as a hint. Make sure you're using all of those variables.
9. Finally, **make sure you run "Kernel > Restart and Run All"** and pass all the asserts before submitting. If you don't restart the kernel, some code that you ran and later deleted may still be in use, which may be why your asserts were passing.
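To illustrate the rule about using only a function's parameters, here is a minimal sketch with two hypothetical functions (`count_above_bad` and `count_above_good` are illustrative names, not part of this activity). The first silently reads a notebook-level variable; the second receives everything it needs as parameters.

```python
# Hypothetical example: a notebook-level (global) variable defined in some other cell.
threshold = 0.5

def count_above_bad(values):
    # Depends on the global "threshold": it may pass locally but behave
    # differently when graded, because notebook state can differ.
    return sum(1 for v in values if v > threshold)

def count_above_good(values, threshold):
    # Uses only its parameters, so the result depends solely on what is passed in.
    return sum(1 for v in values if v > threshold)

print(count_above_good([0.2, 0.7, 0.9], 0.5))  # 2
print(count_above_good([0.2, 0.7, 0.9], 0.8))  # 1
```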

# Deep Learning - Autoencoders

In this exercise we'll use an autoencoder to learn a dimensionally reduced representation of the data and compare how well a classifier performs on that representation versus on the original data. You'll learn how to build autoencoders and how to use the Keras functional API.

In [1]:
import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras.layers import Dense
from tensorflow.keras import Model, Input
import sklearn as sk
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline
plt.style.use("ggplot")

np.random.seed(0)
tf.random.set_seed(0)

In [2]:
data = load_breast_cancer()
features = data["data"]
targets = data["target"]
X_train, X_test, y_train, y_test = train_test_split(features, targets, random_state=0)

In [3]:
# Read through the description of the data to better understand it
# What features do we have and what is the target we're trying to predict?
print(data["DESCR"])

.. _breast_cancer_dataset:

Breast cancer wisconsin (diagnostic) dataset
--------------------------------------------

**Data Set Characteristics:**

    :Number of Instances: 569

    :Number of Attributes: 30 numeric, predictive attributes and the class

    :Attribute Information:
        - radius (mean of distances from center to points on the perimeter)
        - texture (standard deviation of gray-scale values)
        - perimeter
        - area
        - smoothness (local variation in radius lengths)
        - compactness (perimeter^2 / area - 1.0)
        - concavity (severity of concave portions of the contour)
        - concave points (number of concave portions of the contour)
        - symmetry
        - fractal dimension ("coastline approximation" - 1)

        The mean, standard error, and "worst" or largest (mean of the three
        worst/largest values) of these features were computed for each image,
        resulting in 30 features.  For instance, field 0 is Mean Radi

In [4]:
X_train.shape

(426, 30)

In [5]:
X_train

array([[1.185e+01, 1.746e+01, 7.554e+01, ..., 9.140e-02, 3.101e-01,
        7.007e-02],
       [1.122e+01, 1.986e+01, 7.194e+01, ..., 2.022e-02, 3.292e-01,
        6.522e-02],
       [2.013e+01, 2.825e+01, 1.312e+02, ..., 1.628e-01, 2.572e-01,
        6.637e-02],
       ...,
       [9.436e+00, 1.832e+01, 5.982e+01, ..., 5.052e-02, 2.454e-01,
        8.136e-02],
       [9.720e+00, 1.822e+01, 6.073e+01, ..., 0.000e+00, 1.909e-01,
        6.559e-02],
       [1.151e+01, 2.393e+01, 7.452e+01, ..., 9.653e-02, 2.112e-01,
        8.732e-02]])

In [6]:
y_train.shape

(426,)

In [7]:
y_train

array([1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1,
       0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1,
       1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0,
       1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1,
       1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0,
       1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0,
       0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1,
       1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0,
       1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1,
       1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1,
       1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1,
       1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1,
       0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0,

### Building a Keras Model with the functional API

In this exercise, instead of using the `Sequential` model, we will use the base `Model` in `tf.keras`. There are two approaches to using `tf.keras.Model`: the functional API and subclassing `Model`. We will use the functional API as outlined in the [`Model` docs](https://www.tensorflow.org/api_docs/python/tf/keras/models/Model).

Note that, unlike the `Sequential` model we defined in the prior Activity, a model built with the base `Model` and the functional API is defined much like a chain of function calls, where each layer is applied to the output of the previous one, e.g.:

b = f1(a)  
c = f2(b)  
d = f3(c)  
etc.

Unlike ordinary Python code, however, these lines do not actually perform any computation on data when they run. Rather, they define a chain of operations that our `Model` will execute later, when we ask it to (for example, by calling `fit` or `predict`).
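A minimal sketch of this deferred behavior, assuming TensorFlow 2.x with `tf.keras` (the layer sizes and the names `a`, `b`, `c`, `model`, `x`, `y` are arbitrary and only for illustration): calling a layer on an `Input` returns a symbolic tensor that has a shape but no values, and data only flows through the layers once the chain is wrapped in a `Model` and called with `predict`.

```python
import numpy as np
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Dense

# Defining the chain: each call applies a layer to a *symbolic* tensor.
# No data flows through the layers yet; these lines only record the operations.
a = Input(shape=(4,))
b = Dense(3, activation="relu")(a)   # b = f1(a)
c = Dense(2)(b)                      # c = f2(b)

print(b.shape)  # (None, 3): the batch size is not known yet

# Wrapping the chain in a Model makes it executable.
model = Model(a, c)

# Only now is a computation performed, on a concrete batch of 5 samples.
x = np.random.default_rng(0).normal(size=(5, 4)).astype("float32")
y = model.predict(x, verbose=0)
print(y.shape)  # (5, 2)
```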

### The architecture of your autoencoder

Below, you'll build an autoencoder with several layers. Recall that an autoencoder is composed of an "encoder" portion and a "decoder" portion. Your encoder will have two layers, reducing the number of features to 10 in the first layer and to 5 (or fewer) in the second layer. The output of that second layer will serve as the encoded (or "embedded") representation, which will later be used as features for an SVM model. Your decoder will also have two layers that undo the encoding, expanding the encoded representation to 10 dimensions in its first layer and back to the original dimensionality in its second layer.

Your autoencoder model will thus have 4 layers of neurons. Some would call this a 5-layer model, counting the input samples as a "layer" as well, although that is not a layer of neurons.

In [8]:
# Determine the number of input dimensions (features) and use that value to
# create a tf.keras.Input object, giving it the variable name "inputs".
#
# Also, select a dimension (<=5) for the encoding/embedding (the number of
# neurons in the "middle" layer of your autoencoder) and give it the
# variable name "embedding_dim".

# YOUR CODE HERE
input_dim = features.shape[1]
embedding_dim = 5

inputs = Input(shape=(input_dim,))



In [9]:
assert inputs.shape[1] == X_train.shape[1]
assert isinstance(embedding_dim, int)
assert embedding_dim > 0 and embedding_dim <= 5

In [10]:
# To start, you'll define the encoding portion of the autoencoder.
#
# Start with "inputs" and chain layer calls to two subsequent Dense layers,
# the first with 10 neurons (units), the second with "embedding_dim" neurons.
# See the tf.keras.Model documentation (linked in the instructional cell
# above). There is a brief example near the top of that webpage.
#
# Use ReLU as the activation function for the first Dense layer
# and do not set an activation for the second Dense layer.
#
# Name the output of the second Dense layer "encoded".

# YOUR CODE HERE
hidden_layer1 = Dense(10, activation='relu')(inputs)
encoded = Dense(embedding_dim)(hidden_layer1)


In [11]:
testM = Model(inputs, encoded)
assert len(testM.layers) == 3
assert encoded.shape[1] == embedding_dim

In [12]:
# Now you'll define the decoding portion of the autoencoder.
#
# Chain layer calls to two more dense layers, the first with 10 neurons and the
# second (final layer) with the same number of neurons as your input (number
# of features).
#
# Use ReLU as the activation function for the first new Dense layer
# and do not set an activation for the second new Dense layer.
#
# Name the output of the final Dense layer "decoded".

# YOUR CODE HERE
hidden_layer2 = Dense(10, activation='relu')(encoded)
decoded = Dense(input_dim)(hidden_layer2)


In [13]:
testM = Model(inputs, decoded)
print(len(testM.layers))
assert len(testM.layers) == 5
assert decoded.shape[1] == 30

5


### Create the autoencoder

Above, you defined the encoder, the decoder, and, collectively, the autoencoder. But we haven't yet created the `Model` objects we'll actually use.

In the cell below, we'll create (instantiate) your autoencoder for you, and also create a separate "encoder" object that shares its layers with the encoder portion of the autoencoder. This makes it easy for us to train the full autoencoder and then use just the encoder portion to convert our original features into a lower-dimensional embedded representation. We'll also create a separate "decoder" object in a similar manner.
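Sharing layers means the encoder and the autoencoder hold references to the same layer objects, not copies, so whatever the autoencoder learns is automatically available to the encoder. The sketch below checks this on a small toy model, assuming TensorFlow 2.x; the `toy_` names and the layer sizes are hypothetical, chosen so the notebook's own variables are not overwritten.

```python
import numpy as np
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Dense

# Hypothetical toy autoencoder: 6 -> 3 -> 2 -> 3 -> 6.
# Keep handles on the encoder layers so we can inspect them later.
toy_enc_hidden = Dense(3, activation="relu")
toy_enc_out = Dense(2)

toy_inputs = Input(shape=(6,))
toy_encoded = toy_enc_out(toy_enc_hidden(toy_inputs))
toy_decoded = Dense(6)(Dense(3, activation="relu")(toy_encoded))

toy_autoencoder = Model(toy_inputs, toy_decoded)
toy_encoder = Model(toy_inputs, toy_encoded)

# Both models contain the very same layer object, not a copy.
print(any(layer is toy_enc_out for layer in toy_encoder.layers))      # True
print(any(layer is toy_enc_out for layer in toy_autoencoder.layers))  # True

# Overwrite the bottleneck layer's weights (zero kernel, bias of ones).
# Training the autoencoder updates these same weights in place.
kernel, bias = toy_enc_out.get_weights()
toy_enc_out.set_weights([np.zeros_like(kernel), np.ones_like(bias)])

# The encoder sees the change immediately: every embedded value is now 1.
x = np.random.default_rng(0).normal(size=(4, 6)).astype("float32")
print(toy_encoder.predict(x, verbose=0))
```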

In [14]:
# Create the autoencoder
autoencoder = Model(inputs, decoded)


# Create the encoder, which takes the same inputs as the
# autoencoder, but stops after the encoding layers. Thus,
# the output of the encoder is the encoded representation.
encoder = Model(inputs, encoded)


# Create the decoder which starts at the encoded output,
# and uses the remaining layers of the autoencoder...
encoded_embedding = Input(shape=(embedding_dim,))

# Get the two decoder layers (the last two layers) from the autoencoder
decoder_layer2 = autoencoder.layers[-2]
decoder_layer3 = autoencoder.layers[-1]

# Chain layer calls
decoder_out = decoder_layer3(decoder_layer2(encoded_embedding))

# Create the decoder
decoder = Model(encoded_embedding, decoder_out)

In [15]:
# View the autoencoder architecture
autoencoder.summary()

Model: "model_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 30)]              0         
                                                                 
 dense (Dense)               (None, 10)                310       
                                                                 
 dense_1 (Dense)             (None, 5)                 55        
                                                                 
 dense_2 (Dense)             (None, 10)                60        
                                                                 
 dense_3 (Dense)             (None, 30)                330       
                                                                 
Total params: 755
Trainable params: 755
Non-trainable params: 0
_________________________________________________________________
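The `Param #` column can be checked by hand: a `Dense` layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. A small sketch of that arithmetic, assuming `embedding_dim = 5` as in the summary above (`dense_param_count` is a hypothetical helper name):

```python
def dense_param_count(n_in, n_out):
    # One weight per (input, unit) pair, plus one bias per unit.
    return n_in * n_out + n_out

# Layer widths of the autoencoder: 30 -> 10 -> 5 -> 10 -> 30 (embedding_dim = 5).
widths = [30, 10, 5, 10, 30]
counts = [dense_param_count(a, b) for a, b in zip(widths[:-1], widths[1:])]
print(counts)       # [310, 55, 60, 330]
print(sum(counts))  # 755
```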


In [None]:
# View the encoder architecture
encoder.summary()

In [None]:
# View the decoder architecture
decoder.summary()

### Training

Below we'll compile and train the model. __Notice that in our call to `fit` we use `X_train` as both the features and the targets__. Our autoencoder is not a traditional supervised model: it uses self-supervised learning, in which the target is the input itself and we want the output to match it. This might seem easy, but the bottlenecked autoencoder architecture forces the model to compress the features down to a much lower dimensionality before reconstructing them. Thus, it may have to learn a complex non-linear function to accomplish this task.
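To make the objective concrete, here is a minimal NumPy sketch of the reconstruction error that `loss="mse"` measures when the target is the input itself. `reconstruction_mse` is a hypothetical helper; Keras averages per sample and then over the batch, which gives the same value here because every sample has the same number of features. Because the features in this dataset are not standardized, columns on large scales (such as area) likely dominate this error.

```python
import numpy as np

def reconstruction_mse(x, x_hat):
    # Mean of the squared differences over every feature of every sample.
    return np.mean((x - x_hat) ** 2)

x = np.array([[1.0, 2.0], [3.0, 4.0]])       # original inputs (also the targets)
x_hat = np.array([[1.0, 2.5], [2.0, 4.0]])   # a hypothetical reconstruction

print(reconstruction_mse(x, x_hat))  # 0.3125
print(reconstruction_mse(x, x))      # 0.0 (a perfect reconstruction)
```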

Note that, as always, we do not use the test set features to train the autoencoder. Test set features are held out for all stages of training including dimensionality reduction.

We'll train for quite a while (1000 epochs). If all goes well, the loss curve afterwards will show why we train for so long.

In [16]:
# Compile the model, using the Adam optimizer for gradient descent,
# and using MSE as the loss function.
autoencoder.compile(optimizer="adam", loss="mse")

# Train our model. Note that X_train serves as both features and targets.
n_epochs = 1000
history = autoencoder.fit(X_train, X_train, epochs=n_epochs)

Epoch 1/1000
Epoch 2/1000
...
Epoch 72/1000
...

In [None]:
# Let's look at the loss values collected during the training/fit session above.

plt.semilogy(np.arange(1, n_epochs+1), history.history['loss'])
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Loss during training session.')

print(f"Loss on the final training epoch was {history.history['loss'][-1]:0.2f}")

### Training convergence

If everything went as planned, you'll see a somewhat chaotic, stairstep-shaped loss curve in the figure above. During training, the model parameters sometimes got stuck in or near a local minimum of the loss function, which is why you see some flatter portions even during the early training epochs. Fortunately, the learning procedure found a way out of those local minima.

### Embedding

Now that you've trained the autoencoder and have direct access to its encoder half, we'll use that encoder (below) to "embed" the original features into a new feature (embedding) space. The resulting new features are typically called "embedded" or "encoded" features.

In [17]:
# Calculate the embedded features using the encoder model
X_train_embed = encoder.predict(X_train)
X_test_embed = encoder.predict(X_test)
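As a sanity check on the encoder/decoder split, passing data through the encoder and then the decoder should reproduce the autoencoder's own output, since all three models use the same layers. A sketch on a toy model, assuming TensorFlow 2.x (the `toy_` names, `x`, and the layer sizes are hypothetical):

```python
import numpy as np
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Dense

# Hypothetical toy autoencoder (6 -> 3 -> 2 -> 3 -> 6), built like the one above.
toy_inputs = Input(shape=(6,))
toy_encoded = Dense(2)(Dense(3, activation="relu")(toy_inputs))
toy_decoded = Dense(6)(Dense(3, activation="relu")(toy_encoded))
toy_autoencoder = Model(toy_inputs, toy_decoded)
toy_encoder = Model(toy_inputs, toy_encoded)

# Decoder: a fresh Input fed through the autoencoder's last two (shared) layers.
toy_code = Input(shape=(2,))
toy_hidden = toy_autoencoder.layers[-2](toy_code)
toy_decoder = Model(toy_code, toy_autoencoder.layers[-1](toy_hidden))

x = np.random.default_rng(0).normal(size=(8, 6)).astype("float32")
x_embed = toy_encoder.predict(x, verbose=0)            # embedded features
x_roundtrip = toy_decoder.predict(x_embed, verbose=0)  # encode, then decode
x_direct = toy_autoencoder.predict(x, verbose=0)       # full autoencoder

print(x_embed.shape)                                   # (8, 2)
print(np.allclose(x_roundtrip, x_direct, atol=1e-5))   # True
```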



### Supervised Learning

Now comes the actual training of a supervised learning model. We have our embedded features, thanks to our autoencoder, along with our original features.

__You'll train two SVM models to predict breast cancer diagnoses: malignant or benign.__  
One SVM will use the original features and one will use the embedded features.  
Which do you think will make better test set predictions?

You may experience some `ConvergenceWarning` messages. That's okay, you can ignore them.
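A `ConvergenceWarning` from `LinearSVC` means its solver stopped at `max_iter` before converging. SVMs are sensitive to feature scale, and the raw features here span very different ranges, so one common remedy is to standardize the features first. The sketch below is an optional variation, not part of the graded task; it assumes scikit-learn's `StandardScaler` and `make_pipeline`, and the names `X_tr`, `X_te`, `y_tr`, `y_te`, and `scaled_svc` are hypothetical so the notebook's own variables are not overwritten.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# The scaler is fit on the training split only (inside the pipeline),
# so the test set stays held out, as with the autoencoder.
scaled_svc = make_pipeline(StandardScaler(), LinearSVC(random_state=0))
scaled_svc.fit(X_tr, y_tr)
print(f"Scaled LinearSVC test accuracy: {scaled_svc.score(X_te, y_te):0.3f}")
```

The same kind of scaling could also be applied to the autoencoder's inputs, which may change how the two SVMs compare.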

In [18]:
# Train two LinearSVC models
#
# Create and fit a model with the original features.
# Name the model "base_model".
#
# Create and fit a model with autoencoder-embedded features.
# Name the model "embed_model".
#
# When you create your LinearSVC models, you may want to
# set random_state to the same value (e.g., 0) for both models
# for a more apples-to-apples comparison.

# YOUR CODE HERE
# Create and fit a model with the original features
base_model = LinearSVC(random_state=0)
base_model.fit(X_train, y_train)

# Create and fit a model with the embedded features that the trained
# encoder already produced above (X_train_embed); no need to rebuild the encoder
embed_model = LinearSVC(random_state=0)
embed_model.fit(X_train_embed, y_train)






In [19]:
assert base_model
assert isinstance(base_model, LinearSVC)
assert base_model.coef_.shape[1] == 30
assert embed_model
assert isinstance(embed_model, LinearSVC)
assert embed_model.coef_.shape[1] == embedding_dim

In [20]:
print(f"The base SVM accuracy score:                  {base_model.score(X_test, y_test):0.3f}")
print(f"The autoencoder embedding SVM accuracy score: {embed_model.score(X_test_embed, y_test):0.3f}")

The base SVM accuracy score:                  0.902
The autoencoder embedding SVM accuracy score: 0.867


### Parting thoughts

Was the test set score of the embedded data model better or worse than that of the original data model?

Ask yourself why it might be better or worse.

 - What happens when you change the activation function(s) in the autoencoder?
 - What happens when you change the embedding_dim to be larger or smaller?
 - Is it sufficient to just use LinearSVC with the default parameters to make any of these conclusions?
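One way to explore the first two questions is to wrap the model definition in a function so that `embedding_dim` and the activation can be varied. The sketch below is a hypothetical helper (`build_autoencoder` is not part of this activity), assuming TensorFlow 2.x; it only builds the models, and each variant would still need to be compiled, trained, and used to fit a `LinearSVC` as above before comparing scores.

```python
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Dense

def build_autoencoder(input_dim, embedding_dim, activation="relu", hidden_units=10):
    """Hypothetical helper: rebuild this activity's 4-Dense-layer autoencoder
    with a configurable embedding size and hidden-layer activation.
    Returns (autoencoder, encoder); the two models share layers."""
    inputs = Input(shape=(input_dim,))
    hidden = Dense(hidden_units, activation=activation)(inputs)
    encoded = Dense(embedding_dim)(hidden)               # linear bottleneck, as above
    hidden = Dense(hidden_units, activation=activation)(encoded)
    decoded = Dense(input_dim)(hidden)                   # linear output layer
    return Model(inputs, decoded), Model(inputs, encoded)

# Build (but do not train) a few variants to compare.
for dim in (2, 5, 8):
    for act in ("relu", "tanh"):
        ae, enc = build_autoencoder(30, dim, activation=act)
        print(f"embedding_dim={dim}, activation={act}: {ae.count_params()} params")
```

For this architecture with 30 input features, the parameter count is 650 + 21 * embedding_dim regardless of the activation, so changing the activation alters what the model can represent without changing its size.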

## Feedback

In [None]:
def feedback():
    """Provide feedback on the contents of this exercise
    
    Returns:
        string
    """
    # YOUR CODE HERE
    raise NotImplementedError()