# Exploratory data analysis of retinal bipolar data with autoencoders

In this notebook, we will build a neural network that explores the retinal bipolar dataset from Shekhar et al., 2016, without using the manually annotated cell-type labels.

## 1. Imports

In [None]:
!pip install --user scprep

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

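# this notebook uses the TensorFlow 1.x graph API (Session, placeholder, get_variable);
# tensorflow.compat.v1 provides that API under both TF 1.15 and TF 2.x
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()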
import scprep

## 2. Loading the retinal bipolar data

We'll use the same retinal bipolar data we used for the classifier.

In [None]:
scprep.io.download.download_google_drive("1pRYn62SOmmJxwVU0sSW7eBagRL2RJmx0", "shekhar_data.pkl")
scprep.io.download.download_google_drive("1FlNktWuJCka3pXOvNIFfRitGluZy2ftt", "shekhar_clusters.pkl")

In [None]:
data_raw = pd.read_pickle("shekhar_data.pkl")
clusters = pd.read_pickle("shekhar_clusters.pkl")

In [None]:
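# reduce the raw expression matrix to its first 100 principal components
data = scprep.reduce.pca(data_raw, n_components=100, method='dense').to_numpy()
# convert the cell-type names into integer labels (used only to color the plots)
labels, cluster_names = pd.factorize(clusters['CELLTYPE'])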
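
`pd.factorize` turns the cell-type annotations into integer codes plus the list of unique names. A small sketch on made-up annotations (the toy `celltypes` Series is hypothetical, not taken from the dataset) shows how the two outputs relate and why `cluster_names[labels]` recovers the original names:

```python
import pandas as pd

# hypothetical toy annotations standing in for clusters['CELLTYPE']
celltypes = pd.Series(["BC1A", "RBC", "BC1A", "MG", "RBC"])

# codes are assigned in order of first appearance; uniques holds the names
labels, cluster_names = pd.factorize(celltypes)
print(labels)  # [0 1 0 2 1]

# indexing the unique names with the integer codes maps back to one name per cell
print(list(cluster_names[labels]))  # ['BC1A', 'RBC', 'BC1A', 'MG', 'RBC']
```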

## 3. Building an autoencoder

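An **autoencoder** is a network trained to reproduce its input at its output. It has two halves: an *encoder* that compresses the input into a smaller representation, and a *decoder* that reconstructs the input from that representation.

Here we will squeeze the data through a two-dimensional bottleneck, which we can plot directly for visualization. Reducing the dimension from 100 to 2 also forces the network to keep only the information most useful for reconstructing its input, which acts as a form of denoising.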
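
As a rough sketch of the shapes involved, the NumPy code below runs one forward pass through untrained random weights arranged like the network built later (100 -> 100 -> 2 -> 100 -> 100). The toy input `X`, the helper `dense`, and the weight scale are assumptions for illustration only; no training happens here.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    # elementwise max(z, 0), like tf.nn.relu
    return np.maximum(z, 0.0)

def dense(x, W, b, activation=None):
    # same affine map as the notebook's `layer` function: x @ W + b, then optional activation
    out = x @ W + b
    return activation(out) if activation is not None else out

# toy stand-in for 8 cells x 100 PCA components (random numbers, not real data)
X = rng.normal(size=(8, 100))

# untrained random weights for layer sizes 100 -> 100 -> 2 -> 100 -> 100
sizes = [100, 100, 2, 100, 100]
params = [(rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

# encoder: input -> 100 (ReLU) -> 2 (no activation, the bottleneck)
h1 = dense(X, *params[0], activation=relu)
code = dense(h1, *params[1])
# decoder: 2 -> 100 (ReLU) -> 100 (no activation, the reconstruction)
h3 = dense(code, *params[2], activation=relu)
X_hat = dense(h3, *params[3])

# mean-squared reconstruction error, the same quantity as `loss_tf`
mse = np.mean((X - X_hat) ** 2)
print(code.shape, X_hat.shape)  # (8, 2) (8, 100)
print(mse)
```

Training would adjust the weights to drive `mse` down; the two-column `code` array plays the role of the coordinates the notebook plots later.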

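Here, the **bottleneck** is not a special mathematical object: it is simply the narrowest layer of the network (the 2-dimensional `hidden_layer2_tf` below). At this layer every cell is described by just two numbers before the decoder reconstructs it, so the bottleneck's width limits how much information can flow from the input to the output.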
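
To see why a narrow layer caps information, consider the simplest (hypothetical) case of a purely linear decoder: every reconstruction is then a combination of just two vectors, so the reconstructed data matrix has rank at most 2. The codes and weights below are random placeholders; adding a bias would only shift this plane away from the origin. In the notebook's network the decoder's ReLU bends the plane into a curved surface, but that surface is still at most two-dimensional.

```python
import numpy as np

rng = np.random.default_rng(1)

# hypothetical 2-D bottleneck codes for 50 cells
codes = rng.normal(size=(50, 2))

# hypothetical purely linear decoder 2 -> 100 (no bias, no activation)
W_dec = rng.normal(size=(2, 100))
X_hat = codes @ W_dec  # 50 x 100 reconstruction

# each row of X_hat is a combination of the two rows of W_dec,
# so the whole reconstruction has rank at most 2
rank = np.linalg.matrix_rank(X_hat)
print(X_hat.shape, rank)  # (50, 100) 2
```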

#### Create a `Session`

A `Session` is what runs the operations of a TensorFlow 1.x graph. You only have to create it once.

In [None]:
sess = tf.InteractiveSession()

In [None]:
# this function applies the simple feedforward operation
def layer(x, n_dim, name, activation=None):
    # create the weight matrix
    W = tf.get_variable(dtype=tf.float32, shape=[x.get_shape()[-1], n_dim], name='W{}'.format(name))
    # create the bias vector
    b = tf.get_variable(dtype=tf.float32, shape=[n_dim], name='b{}'.format(name))
    # output = x @ W + b (matrix product plus bias)
    output = tf.matmul(x, W) + b
    if activation:
        # nonlinear activation function
        output = activation(output)
    return output

# we'll pass 100 data points at a time through the network
batch_size = 100

# create a placeholder for the input; the input is also the reconstruction target
data_tf = tf.placeholder(shape=[None, data.shape[1]], dtype=tf.float32, name='data_tf')


# layers will be input -> 100 -> 2 -> 100 -> output
# first hidden layer of size 100
hidden_layer1_tf = layer(data_tf, 100, 'hidden_layer1', activation=tf.nn.relu)
# we won't apply a nonlinear activation to the 2D middle layer
hidden_layer2_tf = layer(hidden_layer1_tf, 2, 'hidden_layer2', activation=None)
# last hidden layer of size 100
hidden_layer3_tf = layer(hidden_layer2_tf, 100, 'hidden_layer3', activation=tf.nn.relu)
# the output should be the same size as the input
output_tf = layer(hidden_layer3_tf, data.shape[1], 'output_tf', activation=None)


# use mean-squared-error reconstruction loss
loss_tf = tf.reduce_mean((data_tf - output_tf)**2)

# the optimizer setup is the same as for the classifier
learning_rate = .001
opt = tf.train.AdamOptimizer(learning_rate)

# create an instruction to tell tf to minimize the loss
train_op = opt.minimize(loss_tf)

# initialize variables
sess.run(tf.global_variables_initializer())
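
As a quick check on the architecture, the sketch below counts the trainable parameters that the `layer` calls above create, assuming the input has 100 features (the 100 PCA components). Each call contributes an `n_in x n_out` weight matrix and an `n_out` bias vector:

```python
# layer widths of the autoencoder above, assuming data.shape[1] == 100 (the PCA step)
layer_sizes = [100, 100, 2, 100, 100]  # input, hidden1, hidden2 (bottleneck), hidden3, output

def count_params(sizes):
    """Weights (n_in * n_out) plus biases (n_out) for each consecutive pair of widths."""
    return [n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:])]

per_layer = count_params(layer_sizes)
print(per_layer)       # [10100, 202, 300, 10100]
print(sum(per_layer))  # 20702
```

Most of the roughly 20,700 parameters sit in the two wide layers; the two layers touching the bottleneck hold only 502 of them.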

#### Train the network

In [None]:
# we'll train the network for 10 epochs
step = 0
for epoch in range(10):
    # randomize the order of the data each time through
    random_order = np.random.permutation(data.shape[0])
    data_randomized = data[random_order]

    # train the network on batches of roughly `batch_size` cells
    for data_batch in np.array_split(data_randomized, data_randomized.shape[0] // batch_size):
        step += 1

        # update the network weights to minimize the loss
        sess.run(train_op, {data_tf: data_batch})
        
        # print the loss every 100 steps (batches)
        if step % 100 == 0:
            loss_np = sess.run(loss_tf, {data_tf: data_batch})
            print("Step: {} Loss: {:.3f}".format(step, loss_np))
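
The batching in the training loop uses `np.array_split`, which cuts the shuffled data into `n // batch_size` nearly equal pieces, so each batch holds *about* `batch_size` cells rather than exactly that many. A sketch with a hypothetical dataset of 1,234 cells:

```python
import numpy as np

batch_size = 100
n_cells = 1234  # hypothetical dataset size, chosen for illustration

rng = np.random.default_rng(0)
shuffled = rng.permutation(n_cells)  # shuffled row indices, like the per-epoch permutation
batches = np.array_split(shuffled, n_cells // batch_size)

print(len(batches))                       # 12
print(sorted({len(b) for b in batches}))  # [102, 103]
```

Each epoch therefore takes `n // batch_size` steps, and the loss printed every 100 steps is the loss on the batch just trained on. If the dataset had fewer than `batch_size` rows, `n // batch_size` would be 0 and `np.array_split` would raise a `ValueError`.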

#### Visualize the output

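Rather than using our data to evaluate the model, as we did with the classifier, we can now use the model to explore the data (exploratory data analysis). Because the autoencoder never sees the cell-type labels during training, its 2D bottleneck gives an unsupervised view of the structure in the data; the labels are used only to color the plot.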

In [None]:
# let's get the 2D internal hidden layer and visualize it with a scatter plot
ae_coordinates = sess.run(hidden_layer2_tf, {data_tf: data})

scprep.plot.scatter2d(ae_coordinates, c=cluster_names[labels],
                      label_prefix='AE Coordinate ', discrete=True,
                      legend_anchor=(1,1), figsize=(10,4))

### Discussion

What do you notice about the visualization? How does this compare to the visualizations you have seen with PCA, t-SNE, UMAP and PHATE?

#### _Breakpoint_  - once you get here, please help those around you!

## Exercise 4 - Activation functions on the visualization layer

Notice that we used `activation=None` for the hidden layer we visualized. Repeat the process with other activation functions, such as `tf.nn.relu`, `tf.nn.sigmoid`, and `tf.nn.tanh`. You can find more in the [Tensorflow documentation](https://www.tensorflow.org/versions/r1.15/api_docs/python/tf/keras/activations).

Note how the visualization changes. Has the data changed at all?
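
Each activation function constrains the range of the two plotted coordinates. The NumPy sketch below applies ReLU, sigmoid, and tanh to a few made-up pre-activation values (the array `z` is hypothetical) to show those ranges. In the real network the activation is in place during training, so the network also learns different pre-activation values; the sketch only illustrates the output ranges.

```python
import numpy as np

# hypothetical pre-activation values of the 2-D bottleneck for three cells
z = np.array([[-3.0,  0.5],
              [ 1.5, -0.2],
              [ 4.0,  2.0]])

relu_out = np.maximum(z, 0.0)           # ReLU: negatives become exactly 0, so points pile up on the axes
sigmoid_out = 1.0 / (1.0 + np.exp(-z))  # sigmoid: squashed into (0, 1)
tanh_out = np.tanh(z)                   # tanh: squashed into (-1, 1)

for name, out in [("relu", relu_out), ("sigmoid", sigmoid_out), ("tanh", tanh_out)]:
    print(name, out.min().round(3), out.max().round(3))
```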

In [None]:
# reset everything
sess.close()
# clear the graph before opening the new session so the session is bound to the new graph
tf.reset_default_graph()
sess = tf.InteractiveSession()

# ===================
# Copy the code from above for both building the graph and training
# Change `activation` in `hidden_layer2_tf` from `None` to one of the other options

# ===================

In [None]:
# let's get the 2D internal hidden layer and visualize it with a scatter plot
ae_coordinates = sess.run(hidden_layer2_tf, {data_tf: data})

scprep.plot.scatter2d(ae_coordinates, c=cluster_names[labels],
                      label_prefix='AE Coordinate ', discrete=True,
                      legend_anchor=(1,1), figsize=(10,4))

#### _Breakpoint_  - once you get here, please help those around you!

## Exercise 5 - Activation functions on the wide hidden layers

Now set the activation for the visualization layer back to `None`, but experiment with the activation function for the 100-dimensional layers.

Is there a change? Why?
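
One way to reason about this question: a layer with no activation is an affine map, and stacking affine maps gives another affine map. The sketch below (random, hypothetical weights) checks numerically that two activation-free layers collapse into a single one. Without a nonlinearity in the wide layers, the encoder can therefore only compute a linear projection of the data, much like PCA.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(5, 100))                          # toy inputs
W1, b1 = rng.normal(size=(100, 100)), rng.normal(size=100)
W2, b2 = rng.normal(size=(100, 2)), rng.normal(size=2)

# two stacked layers with no activation ...
two_layers = (X @ W1 + b1) @ W2 + b2
# ... equal one affine layer with merged weights and bias
W, b = W1 @ W2, b1 @ W2 + b2
one_layer = X @ W + b

print(np.allclose(two_layers, one_layer))  # True
```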

In [None]:
# reset everything
sess.close()
# clear the graph before opening the new session so the session is bound to the new graph
tf.reset_default_graph()
sess = tf.InteractiveSession()

# ===================
# Copy the code from above and change `activation` in `hidden_layer1_tf` and
# `hidden_layer3_tf` from `tf.nn.relu` to one of the other options

# ===================

In [None]:
# let's get the 2D internal hidden layer and visualize it with a scatter plot
ae_coordinates = sess.run(hidden_layer2_tf, {data_tf: data})

scprep.plot.scatter2d(ae_coordinates, c=cluster_names[labels],
                      label_prefix='AE Coordinate ', discrete=True,
                      legend_anchor=(1,1), figsize=(10,4))