# **Sparse AutoEncoders**

In the previous step, the representations were constrained only by the size of the hidden layer (128). In such a situation, the hidden layer typically learns an approximation of PCA (principal component analysis). Another way to constrain the representations to be compact is to add a sparsity constraint on the activity of the hidden representations, so that fewer units "fire" at a given time. In Keras, this can be done by adding an activity_regularizer to our Dense layer. The code below outlines how this works. In particular, look at the line in the code that specifies the regularization:

> "activity_regularizer=regularizers.l1(10e-5)"

Remember that L1 regularization adds a penalty to the cost function that is proportional to the sum of the absolute values of the weights:

>$\lambda \sum_{j=1}^p\lvert \beta_j \rvert$ where $p$ is the number of weights

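L1 regularization shrinks the coefficients of the less important features to zero, thus removing some features altogether. Used as an activity_regularizer, as it is here, the same penalty is applied to the outputs of the hidden layer rather than to its weights, which pushes many of the activations to zero.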
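
To make the penalty concrete, here is a minimal NumPy sketch of an L1 term computed on hidden-layer activations. The function name `l1_activity_penalty` and the two small arrays are made up for illustration, and the sketch ignores any per-batch scaling that Keras may apply internally. Note that `10e-5` is the same value as `1e-4`.

```python
import numpy as np

def l1_activity_penalty(activations, lam=1e-4):
    """Return lam * sum(|a|) over all activations (the L1 penalty term)."""
    return lam * np.sum(np.abs(activations))

# Hypothetical hidden-layer outputs: a batch of 2 samples with 4 units each.
dense_code = np.array([[0.5, 1.2, 0.3, 0.8],
                       [0.9, 0.1, 0.7, 0.4]])
sparse_code = np.array([[0.0, 1.2, 0.0, 0.0],
                        [0.9, 0.0, 0.0, 0.4]])

penalty_dense = l1_activity_penalty(dense_code)
penalty_sparse = l1_activity_penalty(sparse_code)

# The code with more zeros pays a smaller penalty, so training is nudged
# towards representations in which fewer units are active.
print(penalty_dense, penalty_sparse)
```

The penalty is added to the reconstruction loss, so the optimizer has to trade reconstruction quality against the number and size of active units.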

In the following example we use the ["binary_crossentropy"](https://towardsdatascience.com/understanding-binary-cross-entropy-log-loss-a-visual-explanation-a3ac6025181a) loss or cost function. Training with this term effectively tries to minimise the log-loss function, which is very similar to the entropy calculations we did when attempting to discretize a continuous variable. In this approach each pixel is treated as a probability of being on or off, rather than matching the pixel intensities directly, which is what "mean_squared_error" does.
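
The difference between the two losses can be sketched with a few lines of NumPy. This is an illustrative re-implementation under simple assumptions (mean over pixels, clipping with a small `eps` to avoid `log(0)`), not the Keras source; the four-pixel arrays are invented for the example.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean log loss over pixels; predictions are clipped to avoid log(0)."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def mean_squared_error(y_true, y_pred):
    """Mean squared difference between pixel intensities."""
    return np.mean((y_true - y_pred) ** 2)

# Hypothetical 4-pixel image and two candidate reconstructions.
pixels = np.array([0.0, 1.0, 1.0, 0.0])
good = np.array([0.1, 0.9, 0.8, 0.2])
bad = np.array([0.9, 0.1, 0.2, 0.8])

bce_good = binary_crossentropy(pixels, good)
bce_bad = binary_crossentropy(pixels, bad)
mse_good = mean_squared_error(pixels, good)
mse_bad = mean_squared_error(pixels, bad)

print("binary_crossentropy:", bce_good, bce_bad)
print("mean_squared_error: ", mse_good, mse_bad)
```

Both losses are lower for the better reconstruction, but the log loss grows much faster as a prediction becomes confidently wrong.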


Now when you print out the hidden feature set you will notice that a number of the features are zero. This means we have introduced sparsity into the autoencoder.
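
Rather than judging sparsity by eye, you can measure it. The sketch below assumes the hidden features are available as a 2-D NumPy array with one row per image; the helper name `sparsity` and the small `codes` array are hypothetical stand-ins for the real encoder output.

```python
import numpy as np

def sparsity(codes, tol=0.0):
    """Fraction of hidden activations whose magnitude is at most tol."""
    return float(np.mean(np.abs(codes) <= tol))

# Hypothetical encoder output: 2 images, 4 hidden units each.
codes = np.array([[0.0, 1.3, 0.0, 0.0],
                  [0.7, 0.0, 0.0, 2.1]])

frac_zero = sparsity(codes)
active_per_sample = np.count_nonzero(codes, axis=1)

print("fraction of zero activations:", frac_zero)
print("active units per image:", active_per_sample)
```

Comparing this fraction with and without the activity_regularizer gives a simple number for how much sparsity the penalty introduces.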

Have a go at changing the loss function and changing the optimizers in the following code. See how your results change.

Share your thoughts in the comment box.



In [None]:
import tensorflow as tf
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K
from keras.utils import plot_model
from keras import regularizers
import numpy as np

(x_train, y_train), (x_test, y_test) = mnist.load_data()

img_rows, img_cols = 28, 28

x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = x_train.reshape((len(x_train), np.prod(x_train.shape[1:])))
x_test = x_test.reshape((len(x_test), np.prod(x_test.shape[1:])))
print(x_train.shape)
print(x_test.shape)

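# To compare with a non-sparse autoencoder, remove
# activity_regularizer=regularizers.l1(10e-5) from the first layer.
# Note that 10e-5 is the same value as 1e-4.

model = Sequential()
# Encoder: 784 pixels -> 32 hidden units, with an L1 penalty on the activations
model.add(Dense(32,activation='relu',activity_regularizer=regularizers.l1(10e-5),input_dim=784))
# Decoder: 32 hidden units -> 784 pixel values in [0, 1]
model.add(Dense(784,activation='sigmoid'))
# Alternative loss to try: loss_choice='mean_squared_error'
loss_choice='binary_crossentropy'
# Alternative optimizer to try:
#model.compile(optimizer='adadelta', loss=loss_choice,metrics = ['accuracy'])
model.compile(loss=loss_choice,
              optimizer='adam',
              metrics = ['accuracy'])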

plot_model(model, to_file='model_plot.png', show_shapes=True, show_layer_names=True)


In [None]:
#print(x_train.shape)

history=model.fit(x_train,x_train,verbose=1,epochs=50,batch_size=256,shuffle=True,)
model.save('auto_en.h5')


We have now trained our algorithm and I want to find the new compressed (hidden-layer) variables. I have used an alternative piece of code to that shown previously. In it we create a new neural network model by mimicking the original neural network up to the hidden layer, and then insert the weights from the original model. We then use the new_model predictions to get the outputs.
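
What the truncated model computes can be sketched in plain NumPy, assuming the hidden layer is a Dense layer with a ReLU activation. Here `kernel` and `bias` are random stand-ins for the two arrays that `get_weights()` returns for that layer (shapes `(784, 32)` and `(32,)`), and `encode` is a hypothetical helper, not a Keras function.

```python
import numpy as np

def encode(x, kernel, bias):
    """Dense layer with ReLU: max(0, x @ kernel + bias)."""
    return np.maximum(0.0, x @ kernel + bias)

rng = np.random.default_rng(0)

# Stand-ins for the trained weights of the hidden layer.
kernel = rng.normal(scale=0.05, size=(784, 32))
bias = np.zeros(32)

# Stand-in for a batch of 5 flattened 28x28 images scaled to [0, 1].
x = rng.random((5, 784))

codes = encode(x, kernel, bias)
print(codes.shape)  # one 32-dimensional code per image
```

Copying the trained weights into a one-layer model and calling predict performs this same computation, which is why the new model reproduces the hidden features of the original network.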

In [None]:
model.layers[0].get_weights()[0][1]

In [None]:
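# Rebuild only the encoder half of the autoencoder
new_model = Sequential()
new_model.add(Dense(32,activation='relu',activity_regularizer=regularizers.l1(10e-5),input_dim=784))
# Copy the trained kernel and bias from the original hidden layer
new_model.set_weights(model.layers[0].get_weights())
new_model.compile(optimizer='adam', loss='categorical_crossentropy')
# The predictions are the 32 hidden-layer activations for each image
output = new_model.predict(x_train)
print(output[0])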

In [None]:
predicted_image = model.predict(x_test)

In [None]:
# use Matplotlib (don't ask)
import matplotlib.pyplot as plt

n = 10  # how many digits we will display
plt.figure(figsize=(20, 4))
for i in range(n):
    # display original
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_test[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    # display reconstruction
    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(predicted_image[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()

In [None]:
import matplotlib.pyplot as plt

# Plot training loss values (no validation data was passed to fit)
plt.plot(history.history['loss'])

plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train'], loc='upper left')
plt.show()