# Neural Networks in Keras

Previously, we hand-wrote a neural network, using JAX to compute gradients by automatic differentiation.

`tensorflow` (from Google) and `pytorch` (from Facebook) are other packages used to build neural networks. `Keras` is a package built on top of all three of these packages that makes building and training neural networks very easy.

Today, we will use `keras` built on `tensorflow` to build a neural network.

To get used to `keras`, let's first rebuild our `JAX` activity's neural network. I wrote this for you, but look over it carefully and play around with changing some parameters to see the effect.

In [None]:
#lots of imports

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

# This is simply an alias for convenience
layers = tf.keras.layers


In [None]:
# a useful function

def plot_learning_curve(history):
    """Plots a learning curve from a training history.
    
    Arguments:
        history (tf.keras.callbacks.History): The object returned by `model.fit()`;
            its `.history` attribute is a dict of per-epoch metrics.
        
    Returns:
        None.
    """
    plt.figure(figsize=(11, 6), dpi=100)
    plt.plot(history.history['loss'], 'o-', label='Training Loss')
    plt.plot(history.history['val_loss'], 'o:', color='r', label='Validation Loss')
    plt.legend(loc='best')
    plt.title('Learning Curve')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    #plt.xticks(range(0, len(history.history['loss'])), range(1, len(history.history['loss']) + 1))
    plt.show()
    


In [None]:
# create same synthetic data as last activity

N = 1200 # number of examples
n_feat = 3
x = np.random.uniform(size=(N,n_feat))

def true(x):
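    # size=len(x) draws independent noise for every example (without it, one scalar offset is shared by all)
    return .5*x[:,0] + .2*x[:,1] + 2*x[:,2] + np.random.normal(scale=.1, size=len(x))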

y = true(x)

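# my same janky train/val/test split
split = int(N*.8)
train_x = x[:split]
train_y = y[:split]
test_x = x[split:]
test_y = y[split:]
# hold out the last 20% of the training examples for validation
split2 = int(len(train_x)*.8)
val_x, val_y = train_x[split2:], train_y[split2:]
train_x, train_y = train_x[:split2], train_y[:split2]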
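
The split above slices the arrays in order. If you want to be sure the three subsets never overlap, one option is to shuffle the indices once and slice those instead. The sketch below is only an illustration: `split_indices` is a hypothetical helper (not part of the original activity), and the 80/20 fractions and the seed are assumptions.

```python
import numpy as np

def split_indices(n, test_frac=0.2, val_frac=0.2, seed=0):
    """Return disjoint (train, val, test) index arrays that cover range(n).

    Hypothetical helper: the fractions and the seed are assumptions.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)               # shuffle once
    n_test = int(n * test_frac)
    n_val = int((n - n_test) * val_frac)   # fraction of what is left after the test set
    test = idx[:n_test]
    val = idx[n_test:n_test + n_val]
    train = idx[n_test + n_val:]
    return train, val, test

train_idx, val_idx, test_idx = split_indices(1200)
print(len(train_idx), len(val_idx), len(test_idx))
```

You would then index the data with these arrays, for example `x[train_idx]` and `y[train_idx]`.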




In [None]:
# now the important part.. building the network

# Build a simple neural network with one hidden layer using keras

model = tf.keras.Sequential()
model.add(layers.Input(shape=(3,)))
#model.add(layers.Input(input_shape=3))

# Add a hidden layer with 4 nodes and sigmoid activation
model.add(layers.Dense(4, activation='sigmoid'))

# Add an output layer with a single node and linear activation for regression
model.add(layers.Dense(1, activation='linear'))

# That was easier, right?!
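
To connect this back to the hand-written version, here is a rough NumPy sketch of what the two `Dense` layers compute in a forward pass. The weights are random stand-ins (Keras uses its own initializers and then learns the values), and the names `W1`, `b1`, `W2`, `b2`, and `forward` are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Stand-in parameters with the same shapes as the Keras layers above
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)   # hidden layer: 3 inputs -> 4 nodes
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # output layer: 4 nodes -> 1 output

def forward(x):
    hidden = sigmoid(x @ W1 + b1)   # Dense(4, activation='sigmoid')
    return hidden @ W2 + b2         # Dense(1, activation='linear')

x_demo = rng.uniform(size=(5, 3))   # 5 examples, 3 features
y_demo = forward(x_demo)

# Count the trainable parameters: weights plus biases in both layers
n_params = W1.size + b1.size + W2.size + b2.size
print(y_demo.shape, n_params)
```

The parameter count is a handy sanity check against what `model.summary()` reports for the Keras model.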

In [None]:
# Now, compile the model with the loss and optimizer details
opt = tf.keras.optimizers.Adam(learning_rate=1e-2)
model.compile(optimizer=opt, loss='MSE')

# Train the model
history = model.fit(train_x, train_y, epochs=100, validation_data=(val_x, val_y))
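
If you are curious what `compile` and `fit` are hiding, the sketch below minimizes an MSE loss with a hand-written Adam update in NumPy. It is a simplification under stated assumptions: a linear model instead of the network, full-batch gradients instead of mini-batches, a noiseless target, and hyperparameters (`beta1`, `beta2`, `eps`) set to the commonly documented Adam defaults.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(size=(200, 3))
y = x @ np.array([0.5, 0.2, 2.0])    # noiseless linear target (assumption)

def mse(w):
    return np.mean((x @ w - y) ** 2)

def mse_grad(w):
    # gradient of mean((x @ w - y)**2) with respect to w
    return 2.0 * x.T @ (x @ w - y) / len(y)

w = np.zeros(3)
m = np.zeros(3)                      # running average of the gradients
v = np.zeros(3)                      # running average of the squared gradients
lr, beta1, beta2, eps = 1e-2, 0.9, 0.999, 1e-7

initial_loss = mse(w)
for t in range(1, 2001):
    g = mse_grad(w)
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g ** 2
    m_hat = m / (1 - beta1 ** t)     # bias correction for the zero initialization
    v_hat = v / (1 - beta2 ** t)
    w -= lr * m_hat / (np.sqrt(v_hat) + eps)
final_loss = mse(w)

print(initial_loss, final_loss, w)
```

Keras does the same bookkeeping for every weight in the network, with the gradients coming from backpropagation.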



In [None]:
# plot loss curve
plot_learning_curve(history)

# plot predictions vs true values for the test set. Your model should work well!
y_pred = model.predict(test_x)
plt.plot(test_y,y_pred,'.')
plt.plot([0,2.5],[0,2.5])
plt.xlabel("y_true")
plt.ylabel("y_pred")
plt.show()
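
A plot is a good first check, but a number or two helps as well. The sketch below computes the MSE and R² with NumPy. `regression_metrics` is a hypothetical helper, and the demo arrays are stand-ins with the same shapes as `test_y` (1D) and the output of `model.predict` (one column). Flattening matters here: subtracting an `(N, 1)` array from an `(N,)` array broadcasts to `(N, N)` and silently gives the wrong answer.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return (mse, r2) after flattening both inputs to 1D (hypothetical helper)."""
    y_true = np.ravel(y_true)
    y_pred = np.ravel(y_pred)
    residual = y_true - y_pred
    mse = np.mean(residual ** 2)
    r2 = 1.0 - np.sum(residual ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return mse, r2

# Stand-in arrays: 1D targets and column-shaped predictions
y_true_demo = np.array([0.5, 1.0, 1.5, 2.0])
y_pred_demo = np.array([[0.6], [0.9], [1.6], [1.9]])

mse_demo, r2_demo = regression_metrics(y_true_demo, y_pred_demo)
print(mse_demo, r2_demo)
```

With the trained model you would call `regression_metrics(test_y, y_pred)`.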

# Autoencoders

We will now use `keras` in `tensorflow` to build an autoencoder. 

We will start with a simple neural network architecture that is composed of an input layer, a lower-dimensional latent space, and an output layer of equal size.

<img src="https://www.jeremyjordan.me/content/images/2018/03/Screen-Shot-2018-03-06-at-3.17.13-PM.png" width="400" />

Autoencoders are an *unsupervised learning* method. We will begin by using an autoencoder to create a latent space representation of handwritten digits. The code below loads the full `MNIST` dataset (28×28 images); swapping in the smaller `digits` dataset, a reduced-dimension version of `MNIST`, would shorten the runtime.



## Data exploration

Again, we begin by loading our data, normalizing it, and putting it into the appropriate format for our model. In this case, we need 1D arrays for our fully connected architecture.

In [None]:
(x_train, _), (x_test, _) = tf.keras.datasets.mnist.load_data()

In [None]:
print('Training Features:\n   Shape: {}\n   Type: {}\n'.format(x_train.shape, x_train.dtype))

In [None]:
plt.figure(figsize=(10, 10))

for i in range(25):
    plt.subplot(5, 5, i + 1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(x_train[i], cmap=plt.cm.binary)
    
plt.show()

### As always, we should rescale the data. Here, I rescaled the images to [0,1]

In [None]:
x_train = x_train / 255.
x_test = x_test / 255.
x_train = x_train.reshape((len(x_train), np.prod(x_train.shape[1:])))
x_test = x_test.reshape((len(x_test), np.prod(x_test.shape[1:])))
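
To see what the rescale and reshape do without loading the real data, here is a small sketch on a stand-in array. The random `images` array is an assumption that mimics the shape and dtype of a few MNIST images.

```python
import numpy as np

# Stand-in for a small batch of images: 5 images, 28x28, uint8 values in [0, 255]
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

scaled = images / 255.0                   # floats in [0, 1]
flat = scaled.reshape(len(scaled), -1)    # (5, 784): one row per image
restored = flat.reshape(-1, 28, 28)       # back to (5, 28, 28), e.g. for plotting

print(flat.shape, flat.min() >= 0.0, flat.max() <= 1.0)
```

Passing `-1` lets NumPy work out the flattened length, which is equivalent to the `np.prod(x_train.shape[1:])` used above.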


Now we build the autoencoder. Ours is a standard feed-forward neural network architecture with three layers as described above.

Let's start by compressing each 784-pixel image into a 128-dimensional latent vector (roughly a factor of six) and see if we can recover our original images.

One way to build an autoencoder is to store each layer into a variable so that we can access the different pieces later. 


In [None]:
latent_dim = 128

#input layer
input_img = layers.Input(#complete me)
#latent layer
latent_layer = layers.Dense(#complete me, activation="??")(input_img)
# we also want to use this as an input once trained, no need to modify this
latent_input = layers.Input(shape=(latent_dim,))
# output layer
output_layer = layers.Dense(#complete me, activation="??")(latent_input)

## Now, we build the encoder and decoder from the same layers.

Let's start with the encoder:

In [None]:
# this model maps an input to its encoded representation
encoder = tf.keras.models.Model(input_img, latent_layer)

Now, for the decoder. This requires slightly more work because we want to have a latent representation as an Input in order to use the decoder as a generator.

In [None]:
# this model maps from a latent space to a reconstructed output
decoder = tf.keras.models.Model(latent_input, output_layer)


# put the layers together to create your Model
autoencoder = tf.keras.models.Model(input_img, decoder(encoder(input_img)))
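
The composition `decoder(encoder(input_img))` is easier to picture with plain functions. The NumPy sketch below mirrors the shapes only: the weights are untrained stand-ins, and the ReLU and sigmoid activations are one common choice that I am assuming here, not necessarily the ones you should pick in the exercise.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, latent_dim = 784, 128

# Untrained stand-in weights; Keras would learn these during fit()
W_enc, b_enc = rng.normal(scale=0.01, size=(input_dim, latent_dim)), np.zeros(latent_dim)
W_dec, b_dec = rng.normal(scale=0.01, size=(latent_dim, input_dim)), np.zeros(input_dim)

def encoder(x):
    # ReLU activation on the latent layer (an assumption)
    return np.maximum(0.0, x @ W_enc + b_enc)

def decoder(z):
    # Sigmoid keeps the reconstruction in (0, 1), like the rescaled pixels (an assumption)
    return 1.0 / (1.0 + np.exp(-(z @ W_dec + b_dec)))

def autoencoder(x):
    # Same pieces, chained: input -> latent -> reconstruction
    return decoder(encoder(x))

x_demo = rng.uniform(size=(4, input_dim))   # 4 fake flattened images
z_demo = encoder(x_demo)
x_rec = autoencoder(x_demo)
print(z_demo.shape, x_rec.shape)
```

Because the encoder and decoder are separate callables, you can encode once and decode later, or feed the decoder a latent vector of your own.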

### Build the model...

In [None]:
opt = tf.keras.optimizers.Adam(learning_rate=1e-4)
autoencoder.compile(optimizer=opt, loss='MSE')

### Now, train!

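The following cell trains the autoencoder to reconstruct its own input, so `x_train` is passed as both the features and the target.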

In [None]:
hist = autoencoder.fit(x_train, x_train,
                epochs=10, batch_size=512, # larger batches mean fewer weight updates per epoch, so training runs faster
                shuffle=True,
                validation_data=(x_test, x_test))
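
To get a feel for the effect of `batch_size`, the quick calculation below counts the weight updates in one epoch over the 60,000 MNIST training images. `steps_per_epoch` is a hypothetical helper, and 32 is included because it is the Keras default when `batch_size` is not given.

```python
import math

def steps_per_epoch(n_examples, batch_size):
    """Number of weight updates in one pass over the data (hypothetical helper)."""
    return math.ceil(n_examples / batch_size)

n_train = 60000  # number of images in the MNIST training set
for batch_size in (32, 512):
    print(batch_size, steps_per_epoch(n_train, batch_size))
```

Fewer, larger steps usually run faster per epoch, but each epoch also makes fewer updates, so it is worth watching the learning curve when you change this.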

In [None]:
plot_learning_curve(hist)

In [None]:
# encode and decode digits from the test set
encoded_imgs = encoder.predict(x_test)
decoded_imgs = decoder.predict(encoded_imgs)


In [None]:

n = 10  # how many digits we will display
plt.figure(figsize=(20, 4))
for i in range(n):
    # display original
    ax = plt.subplot(2, n, i + 1)
    #plt.imshow(x_test[i].reshape(28, 28))
    #plt.gray()    
    plt.imshow(x_test[i].reshape(28, 28), cmap=plt.cm.binary)
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    # display reconstruction
    ax = plt.subplot(2, n, i + 1 + n)
    #plt.imshow(decoded_imgs[i].reshape(28, 28))
    plt.imshow(decoded_imgs[i].reshape(28, 28), cmap=plt.cm.binary)
    #plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()

### Let's try k-means clustering on the latent space to see if it can separate the digits in the test set.

https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
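
Before reaching for the scikit-learn class linked above, it can help to see what k-means actually does. The sketch below is a bare-bones NumPy version of the standard algorithm, run on two made-up blobs that stand in for latent vectors. The `kmeans` function and the demo data are illustrative assumptions; for the activity itself you would presumably fit `sklearn.cluster.KMeans` on `encoded_imgs`.

```python
import numpy as np

def kmeans(points, k, n_iter=50, seed=0):
    """Bare-bones k-means (Lloyd's algorithm); returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    # start from k distinct data points chosen at random
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # assign every point to its nearest center
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each center to the mean of its points (keep it if the cluster is empty)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Stand-in "latent vectors": two well-separated blobs in 8 dimensions
rng = np.random.default_rng(1)
blob_a = rng.normal(loc=0.0, scale=0.1, size=(50, 8))
blob_b = rng.normal(loc=5.0, scale=0.1, size=(50, 8))
latent_demo = np.vstack([blob_a, blob_b])

labels_demo, centers_demo = kmeans(latent_demo, k=2)
print(np.bincount(labels_demo), centers_demo.shape)
```

The cluster numbers are arbitrary, so with real digits you still need to look at a few examples from each cluster to see which digit it mostly contains.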

In [None]:
# your code here



In [None]:
#plot a selection of test examples from various clusters

