# Understanding the behavior of Linear Activation Functions in a Simple Autoencoder
Cameron Farzaneh

The goal of this project is to gain insight into why a Linear Activation Function is not able to reconstruct my input data, and why it behaves the way it does when reducing the dimensions from two down to one in latent space. The purpose of this experiment is to gain further insight into autoencoders and the basic structure of neural networks, and to gain a deeper understanding of the mathematics involved in the entire process.

In this experiment, I was not able to successfully reconstruct the Input Data. My goal is to understand why this is the case.

# The Dataset

The dataset consists of points along two lines. Each point is a unit vector scaled by a magnitude between -3 and 3. The two unit vectors are 45 degrees from the X and Y axes and 90 degrees from each other.

This is what the dataset looks like:
![title](img/dataset.png)

To construct this dataset, we first create two basis vectors. They are unit vectors, so we can easily control the magnitude. Our basis vectors, $U_1$ and $U_2$, are:
$$U_1 = \left\langle \frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}} \right\rangle$$
$$U_2 = \left\langle -\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}} \right\rangle$$

Next, we multiply the basis vectors by magnitudes randomly picked between -3 and 3, which produces the dataset above. The dataset contains 10,000 points, which we can simply store in a NumPy array.
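A minimal NumPy sketch of this construction is below. The original code is not shown, so some details are assumptions: the 10,000 points are split evenly between the two lines, the magnitudes are drawn uniformly from [-3, 3), and `make_dataset` is a hypothetical helper name.

```python
import numpy as np

# Unit basis vectors: 45 degrees from the axes and 90 degrees from each other.
U1 = np.array([1.0, 1.0]) / np.sqrt(2)   # purple line
U2 = np.array([-1.0, 1.0]) / np.sqrt(2)  # yellow line


def make_dataset(n=10_000, max_mag=3.0, seed=0):
    """Hypothetical helper: return (points, labels), with points of shape (n, 2).

    Assumptions: points alternate between the two lines (label 0 = purple/U1,
    label 1 = yellow/U2) and magnitudes are uniform on [-max_mag, max_mag).
    """
    rng = np.random.default_rng(seed)
    labels = np.arange(n) % 2                       # even split between the two lines
    mags = rng.uniform(-max_mag, max_mag, size=n)   # random signed magnitudes
    basis = np.stack([U1, U2])                      # row 0 = U1, row 1 = U2
    points = mags[:, None] * basis[labels]          # scale the chosen basis vector
    return points, labels


points, labels = make_dataset()
print(points.shape)  # (10000, 2)
```

Because every point is a signed multiple of a unit vector, its length is exactly the absolute value of the sampled magnitude.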

# The Autoencoder

This is what the autoencoder looks like:
![title](img/network.png)

In this diagram, $W_1$ and $W_2$ are the weights, both initialized randomly. $B_z$, $B_1$, and $B_2$ are our biases, and $X_1$ and $X_2$ are our output neurons. The weights are shared: the same $W_1$ and $W_2$ are reused, transposed, between Z and the reconstruction layer. Altogether, there are 3 biases and 2 weights.

This autoencoder has one neuron in the hidden layer and two neurons in each of the input and output layers. Its goal is to reduce the dimensionality from two (the dataset) to one dimension in latent space, and then reconstruct the same vectors.

The autoencoder takes two inputs, $X_1$ and $X_2$, which represent the X and Y components of a single vector (from either the purple or the yellow line). So $X_1$ could be the Y component and $X_2$ the X component (or vice versa). Because our autoencoder has only one node in the middle, the transformation from the two input nodes to Z is simply a dot product.

**Note: this is only the case because we are reducing from two neurons to one. In general, this step is a matrix multiplication.**
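To make the note concrete, here is a small NumPy sketch. The batch, weights, and bias are hypothetical values chosen for illustration. With one hidden neuron the weights form a single column, so each Z is a dot product; with k hidden neurons the same expression becomes a full matrix multiplication.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(5, 2))  # hypothetical batch of five (X1, X2) inputs

# One hidden neuron: the weights are a single column, so each Z is a dot product.
w = np.array([0.6, -0.4])            # hypothetical W1, W2
b_z = 0.1                            # hypothetical B_z
z_dot = np.array([x[0] * w[0] + x[1] * w[1] for x in X]) + b_z  # X1*W1 + X2*W2 + B_z
z_mat = (X @ w[:, None]).ravel() + b_z                         # same values via (5,2) @ (2,1)

# k hidden neurons: the weights become a (2, k) matrix, a genuine matrix multiplication.
k = 3
W_k = rng.normal(size=(2, k))
b_k = np.zeros(k)
Z_k = X @ W_k + b_k                  # shape (5, k): one latent vector per input

print(np.allclose(z_dot, z_mat), Z_k.shape)  # True (5, 3)
```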

Our formula for Z is:
$$Z = \sum\limits_{i=1}^{2}{X_iW_i} + B_z$$

<center>or</center>

$$Z = X_1W_1+X_2W_2+B_z$$
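A minimal NumPy sketch of the full forward pass under these definitions follows. It assumes the identity (linear) activation everywhere and reads the shared, transposed weights as decoding with $\hat{X}_i = Z W_i + B_i$. The names `forward` and `mse`, the use of mean squared error as the cost, and the zero-initialized biases are assumptions for illustration.

```python
import numpy as np


def forward(X, W, b_z, b_out):
    """Tied-weight linear autoencoder with one latent neuron (hypothetical helper).

    X: (n, 2) inputs; W: (2,) holds W1, W2; b_z: scalar B_z; b_out: (2,) holds B1, B2.
    """
    Z = X @ W + b_z                           # Z = X1*W1 + X2*W2 + B_z, shape (n,)
    X_hat = Z[:, None] * W[None, :] + b_out   # decode with the same (transposed) weights
    return Z, X_hat


def mse(X, X_hat):
    """Assumed cost: mean squared reconstruction error per point."""
    return np.mean(np.sum((X - X_hat) ** 2, axis=1))


rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(4, 2))   # hypothetical inputs
W = rng.normal(size=2)                # random initial W1, W2
b_z, b_out = 0.0, np.zeros(2)         # assumed zero-initialized biases
Z, X_hat = forward(X, W, b_z, b_out)
print(Z.shape, X_hat.shape)           # (4,) (4, 2)
```

Tying the weights keeps the parameter count at the 2 weights and 3 biases described above.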

Now, consider the possible inputs for $X_1$ and $X_2$.
If the input is a point on the purple line, then $X_1$ and $X_2$ would either both be positive, or both be negative.
Similarly, if the input is a point on the yellow line, then $X_1$ is either negative and $X_2$ is positive, or $X_1$ is positive and $X_2$ is negative.

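We can write a point on the yellow line as:
$$\frac{a}{\sqrt{2}}\left\langle -1, 1 \right\rangle$$
where $a$ is its magnitude, between -3 and 3. Likewise, a point on the purple line is $\frac{a}{\sqrt{2}}\left\langle 1, 1 \right\rangle$.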

Because of this, Z takes a different form depending on which line the input point lies on.
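Substituting a purple point ($X_1 = X_2 = \frac{a}{\sqrt{2}}$) and a yellow point ($X_1 = -\frac{a}{\sqrt{2}}$, $X_2 = \frac{a}{\sqrt{2}}$) into $Z = X_1W_1 + X_2W_2 + B_z$ makes the difference explicit:

$$Z_{\text{purple}} = \frac{a}{\sqrt{2}}W_1 + \frac{a}{\sqrt{2}}W_2 + B_z = \frac{a}{\sqrt{2}}\left(W_1 + W_2\right) + B_z$$

$$Z_{\text{yellow}} = -\frac{a}{\sqrt{2}}W_1 + \frac{a}{\sqrt{2}}W_2 + B_z = \frac{a}{\sqrt{2}}\left(W_2 - W_1\right) + B_z$$

Both are linear in the magnitude $a$, but the purple line is scaled by $W_1 + W_2$ and the yellow line by $W_2 - W_1$.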

# The Results

Given how the autoencoder was built, it was not able to reconstruct both vectors. As the figure below shows, only one line was successfully reconstructed. This means the autoencoder was only able to learn one of the two vectors.

To optimize the cost function, I used the Adam optimizer, which was the fastest in comparison to gradient descent and Adagrad.
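For reference, here is a minimal NumPy sketch of the standard Adam update rule. The hyperparameters are the commonly used defaults, not necessarily the settings used in this experiment, and the quadratic below is only a toy smoke test, not the autoencoder's cost function.

```python
import numpy as np


def adam_step(params, grads, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step count used for bias correction."""
    m = beta1 * m + (1 - beta1) * grads          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grads ** 2     # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                 # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                 # bias-corrected second moment
    params = params - lr * m_hat / (np.sqrt(v_hat) + eps)
    return params, m, v


# Toy objective f(p) = sum((p - target)**2), whose gradient is 2 * (p - target).
target = np.array([3.0, -1.0])
p = np.zeros(2)
m = np.zeros(2)
v = np.zeros(2)
for t in range(1, 3001):
    grad = 2 * (p - target)
    p, m, v = adam_step(p, grad, m, v, t)
```

Dividing by the square root of the second moment gives each parameter its own effective step size, which is one common explanation for Adam needing less learning-rate tuning than plain gradient descent.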

<img src="img/results/result1.png" width="600">

In latent space, the input data for the purple line was clearly transformed into one dimension. This is not the case for the yellow line: all of its points appear to be clustered around 0.
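Plugging the yellow parametrization into the formula for Z gives $Z = \frac{a}{\sqrt{2}}(W_2 - W_1) + B_z$, so a plot where every yellow point lands near 0 is consistent with $W_1 \approx W_2$ and $B_z \approx 0$. The sketch below checks this numerically with hypothetical values ($W_1 = W_2 = 0.7$, $B_z = 0$) chosen for illustration; they are not the weights this network actually learned.

```python
import numpy as np

# Hypothetical weights for illustration only; NOT the trained values from this experiment.
W = np.array([0.7, 0.7])   # W1 == W2
b_z = 0.0                  # B_z

a = np.linspace(-3, 3, 7)                                # magnitudes along each line
purple = a[:, None] * np.array([1.0, 1.0]) / np.sqrt(2)
yellow = a[:, None] * np.array([-1.0, 1.0]) / np.sqrt(2)

z_purple = purple @ W + b_z   # a * (W1 + W2) / sqrt(2): spread along the latent axis
z_yellow = yellow @ W + b_z   # a * (W2 - W1) / sqrt(2): every point lands on B_z

print(np.round(z_purple, 3))
print(np.round(z_yellow, 3))
```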

The distance between points in latent space should correspond to the distance between them in the input data. This is why the purple line in latent space looks almost identical to the input data and the reconstruction: the distances between its points were preserved.
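The distance claim can be made precise with the formula for Z. For a purple point with magnitude $a$, $Z(a) = \frac{a}{\sqrt{2}}(W_1 + W_2) + B_z$, so for two purple points with magnitudes $a$ and $a'$ the bias cancels in the difference:

$$\left|Z(a) - Z(a')\right| = \frac{\left|W_1 + W_2\right|}{\sqrt{2}}\,\left|a - a'\right|$$

Since the input distance between the two points is $|a - a'|$ (the basis vectors are unit vectors), latent distances along the purple line are the input distances scaled by a constant factor. The same calculation for the yellow line gives the factor $\frac{|W_2 - W_1|}{\sqrt{2}}$, which is close to 0 when every yellow point lands near the same latent value.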

But why is the yellow line being mapped only to 0? Why isn't the autoencoder able to learn both lines and preserve the distances between points in latent space for both of them?