# Reverse Neural Network

##### Machine Learning Comprehension: Low

##### Time to Read: 15 min

##### Author: Albert Nguyen-Tran

# Intro

At this point you probably have some idea of what neural networks are and, even better, how useful they are for image classification. With artificial neurons, weights, and a few other clever techniques, we can train a model to classify almost anything we want.

But what if you could run the model in reverse? Could you invert the model so that, given a label, it could "generate" some inputs? The answer is not really, but in some cases it can be done, provided certain strict conditions are met.

First, let's walk through how a feed-forward neural network works to lay the groundwork for what's to come.

# 1. Neural Networks 101

Artificial neural network, convolutional neural network, recurrent neural network? What even is a neural network? If you made the biological connection to our own brains, you are already on the right track.

![image.png](attachment:0997a03c-b666-4f01-bdac-95e5c35d10c6.png)

Despite the fact that we don't *completely* understand how our brains work, the motivation behind neural networks is simple: to model the way in which our **neurons** interact.

We understand that neurons:
- Have **one** axon that carries electrical nerve signals
- Branch at the end of that axon into an axon terminal, which houses collateral "axon tips"
- Form synapses between those tips and the dendrites of other neurons

Together, this creates on the order of 10^15 pathways through which our nearly 90 billion neurons interact!


### 1.1 How it Works

If your first intuition was to label our brains as a massive series of logic gates like those in your CPU, you would be vastly underestimating the power of these neurons.

The thing is, each neuron fires through its axon tips if and only if the total strength of its inputs exceeds a certain threshold.

Well, wouldn't that make its output binary, just like a circuit?

No! Because the [synaptic connection between](https://en.wikipedia.org/wiki/Synaptic_plasticity) neurons [can change](https://en.wikipedia.org/wiki/Spike-timing-dependent_plasticity) through:
- [Long-term potentiation (increase)](https://en.wikipedia.org/wiki/Long-term_potentiation), when neurons are activated in a coordinated manner
- [Long-term depression (decrease)](https://en.wikipedia.org/wiki/Long-term_depression), when neurons are active but their activity does not coincide

This changes the likelihood of these neurons interacting in the future! This is the key idea: synaptic strengths are learnable and can affect one another!


### 1.2 The Perceptron

Now that the groundwork has been laid, it's time to discuss how such a neuron can be modelled mathematically.

![image.png](attachment:ce27f94e-961d-4e19-a3cb-720b9c69e6ed.png)

Notice how we are simply grabbing the weighted sum at each cell body from the signals of surrounding neurons to represent the total strength of the "electrical impulses".
Then we use a step function to activate it, introducing "thresholds" as described earlier. 
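As a minimal sketch of the idea above, the perceptron can be written as a weighted sum followed by a step function. The signal values, weights, and thresholds below are made up for illustration, and the function names are hypothetical.

```python
import numpy as np

def step(z, threshold=0.0):
    """Step activation: fire (1) only when z exceeds the threshold."""
    return 1 if z > threshold else 0

def perceptron(x, w, threshold=0.0):
    """Weighted sum of the incoming signals followed by a step activation."""
    z = float(np.dot(w, x))  # total strength of the "electrical impulses"
    return step(z, threshold)

# Hypothetical signals from three upstream neurons and their weights.
x = np.array([1.0, 0.0, 1.0])
w = np.array([0.6, 0.4, -0.1])

# The weighted sum is 0.6 - 0.1 = 0.5, so the outcome depends on the threshold.
fired = perceptron(x, w, threshold=0.3)  # 0.5 exceeds 0.3
quiet = perceptron(x, w, threshold=0.7)  # 0.5 does not exceed 0.7
```

The same inputs either fire or stay quiet depending on the threshold, which is the "all or nothing" behaviour described above.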

By having an activation function at each perceptron, the model as a whole can learn and adapt to the key characteristics of the label it is trying to classify, because each perceptron learns to "fire" only under certain conditions.

This distinguishes neural nets from a stack of purely linear functions, which collapses into a single linear function and so cannot create anything but straight-line decision boundaries. It would accomplish something like this instead:

![image.png](attachment:e51cd31b-b88c-4510-b34a-476be71f526f.png)
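To see why stacked linear functions add no expressive power, here is a small sketch with hand-picked, hypothetical weight matrices. Two linear layers give exactly the same result as one combined matrix, while placing a ReLU between them changes the answer.

```python
import numpy as np

# Two hypothetical "layers" with hand-picked weights.
W1 = np.array([[1.0, -1.0],
               [-1.0, 1.0]])   # 2 inputs -> 2 hidden units
W2 = np.array([[1.0, 1.0]])    # 2 hidden units -> 1 output
x = np.array([1.0, 0.0])

# Without an activation, two layers equal one layer with weights W2 @ W1.
two_layers = W2 @ (W1 @ x)
one_layer = (W2 @ W1) @ x

# With a ReLU in between, the negative hidden value is clipped to 0,
# so the network is no longer equivalent to a single linear map.
with_relu = W2 @ np.maximum(0.0, W1 @ x)
```

Here `two_layers` and `one_layer` agree, while `with_relu` does not, which is the effect the non-linear activation is there to provide.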

Briefly, these are two popular non-linear activation functions:

1. Sigmoid: it looks like an S; to be exact, it is a hyperbolic tangent function rescaled to fit between 0 and 1
    - It rose to popularity due to its near all-or-nothing output between 0 and 1, which is a convenient representation of how neurons fire in the real world
    - However, it is not used as much in practice because it increases the likelihood of [vanishing gradients](https://stats.stackexchange.com/questions/126238/what-are-the-advantages-of-relu-over-sigmoid-function-in-deep-neural-networks): the gradient (the vector of partial derivatives that tells us how to adjust each weight whilst [backpropagating](https://www.analyticsvidhya.com/blog/2023/01/gradient-descent-vs-backpropagation-whats-the-difference/)) shrinks toward zero as it passes back through saturated sigmoid units

2. ReLU (Rectified Linear Unit): looks like a hockey stick, modelled by max(0, x)
    - Inexpensive to compute because it is piecewise linear (0 for negative inputs, x otherwise)
    - Mitigates the vanishing gradient problem
    - But still prone to [exploding gradients and dying ReLU](https://www.linkedin.com/pulse/rectified-linear-unit-non-linear-mukesh-manral/?trk=pulse-article_more-articles_related-content-card)
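The two activation functions above can be sketched in a few lines of NumPy. The helper names are my own, and the sample points are arbitrary; the snippet also checks the tanh relationship and compares the slopes, which is where the vanishing gradient issue shows up.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    # Slope is 1 for positive inputs and 0 otherwise (0 is used at x = 0).
    return (x > 0).astype(float)

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])

# Sigmoid is a tanh rescaled into (0, 1): sigmoid(x) = (tanh(x / 2) + 1) / 2.
same_as_tanh = np.allclose(sigmoid(x), (np.tanh(x / 2.0) + 1.0) / 2.0)

# The sigmoid slope peaks at 0.25 (at x = 0) and becomes tiny for large |x|,
# whereas the ReLU slope stays at exactly 1 for every positive input.
sig_slopes = sigmoid_grad(x)
relu_slopes = relu_grad(x)
```

Multiplying many slopes that are at most 0.25 across layers is one way to picture why gradients vanish with sigmoid, and the zero slope for negative inputs is the source of the dying ReLU problem.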

________________

### 1.3 Feed Forward Neural Network Architecture

So, how do these perceptrons work together? Let's take a look at a feed-forward neural network.

In a feed-forward artificial neural network, information is passed forward through hidden layers, which eventually produce an output. These layers are "hidden" because their values are not directly observed during training (unlike the inputs and the desired output, which are known).

Within each hidden layer is a series of perceptrons which, as described earlier, each take the weighted sum of the signals they receive and pass that value through a non-linear activation function:
    
    y = signal strength of perceptron
    f = non-linear activation function
    w = weight
    x = signal strength of input node

![image.png](attachment:ab06eb01-8ef2-4fd0-a712-3ebe1c89f169.png)![image.png](attachment:47e93e3a-be2e-487b-9edd-5bf252207f83.png)
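Using the symbols defined above (y, f, w, x), one layer of perceptrons can be sketched as a matrix-vector product followed by the activation. The numbers are hypothetical, sigmoid is just one possible choice for f, and bias terms are left out because the definitions above do not include them.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer_forward(x, W, f):
    """Compute y_j = f(sum_i W[j, i] * x[i]) for every perceptron j."""
    return f(W @ x)

x = np.array([0.5, -1.0, 2.0])        # signal strengths of the input nodes
W = np.array([[0.1, 0.2, 0.3],        # weights into perceptron 1
              [-0.4, 0.5, 0.0]])      # weights into perceptron 2

y = layer_forward(x, W, sigmoid)      # signal strength of each perceptron
```

Each row of `W` holds the weights feeding one perceptron, so a whole layer is a single matrix multiplication, and a deeper network simply feeds `y` into the next layer.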

Imagine we were trying to train the model to predict whether a 256x256 pixel image was happy or sad. In that case, each input node (x1, x2, x3...) could represent a single pixel of the image currently being trained on.
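As a rough illustration of that mapping, a 256x256 image can be flattened into one long vector with one entry per input node. The blank grayscale image and the scaling to [0, 1] are assumptions made for this sketch.

```python
import numpy as np

# Hypothetical grayscale image: 256 x 256 pixel intensities in [0, 255].
image = np.zeros((256, 256), dtype=np.uint8)

# Flatten to one value per input node and scale intensities to [0, 1].
inputs = image.reshape(-1).astype(np.float32) / 255.0
n_inputs = inputs.shape[0]   # 256 * 256 input nodes
```

A colour image would have three values per pixel, so the input layer would be three times larger.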

Once we pass the final layer, an output value, or **yhat**, is produced. Since we are solving a [binary classification problem](https://www.kaggle.com/code/ryanholbrook/binary-classification), the single output node (yhat) might hold a probability in the range [0, 1], where 0 is sad and 1 is happy (depending on how you set it up).

If we had multiple classes, we would be solving a [multiclass problem](https://en.wikipedia.org/wiki/Multiclass_classification), which might entail several output nodes, each representing the probability of its respective class.

![image.png](attachment:2038b30e-ef27-457d-a062-a7a84e6bd521.png)
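The text does not say how several output nodes are turned into class probabilities. One common choice, assumed here, is the softmax function, which maps raw output scores to positive values that sum to 1. The scores below are made up.

```python
import numpy as np

def softmax(logits):
    """Convert raw output scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)   # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / np.sum(exp)

logits = np.array([2.0, 1.0, 0.1])      # hypothetical scores for three classes
probs = softmax(logits)
predicted_class = int(np.argmax(probs)) # index of the most probable class
```

For the binary case described earlier, a single sigmoid output plays the same role.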

### 1.4 Initializing Weights

But how are the values of these weights determined? The main idea is still that the weights can change and "learn", but how do we initialize the weights in the first place?

Assuming the data is normalized properly (zero-centered, with every dimension on approximately the same scale), it is reasonable to assume that around half of the weights will end up positive and the other half negative.

Notice how we are able to normalize images through PCA and whitening and still preserve most of the information using the eigenvectors ([CS231n notes](https://cs231n.github.io/neural-networks-2/)). Just compare the whitened images with the original images for reference:
![image.png](attachment:94b2caed-64bb-4b6c-bb5f-c28aac1d4d45.png)
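The normalization steps can be sketched as follows, in the spirit of the approach described in the CS231n notes linked above. The random correlated data is a stand-in for real images, and the small constant `1e-5` is an assumed value used only to avoid division by zero.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 500 samples with 3 correlated features.
mixing = np.array([[2.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.5, 0.5, 0.2]])
X = rng.normal(size=(500, 3)) @ mixing

X = X - X.mean(axis=0)                 # zero-center every feature
cov = X.T @ X / X.shape[0]             # covariance matrix of the data
U, S, _ = np.linalg.svd(cov)           # columns of U are the eigenvectors
X_rot = X @ U                          # PCA: rotate onto the eigenbasis
X_white = X_rot / np.sqrt(S + 1e-5)    # whitening: unit variance per direction

cov_white = X_white.T @ X_white / X_white.shape[0]
```

After whitening, `cov_white` is close to the identity matrix, meaning the features are decorrelated and share the same scale.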

So should we just set all the weights to 0 if roughly half of them will be positive and half will be negative? No, because then every neuron computes the same output and receives the same gradient during backpropagation (see 1.6), so they would all undergo identical updates.

OK, what about setting them all to the same very small negative or positive number? No, for the same reason: we want distinct weights at initialization so that neurons compute different things and receive different updates during backpropagation, which is known as symmetry breaking.

One way to do this is to sample random numbers from a zero-mean, unit-standard-deviation Gaussian ([normal distribution](https://en.wikipedia.org/wiki/Normal_distribution)), usually scaled down so the initial weights are small.
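The difference between constant and random initialization can be sketched like this. The layer sizes, the constant 0.5, and the 0.01 scale factor are illustrative assumptions rather than recommended values.

```python
import numpy as np

rng = np.random.default_rng(42)
n_inputs, n_hidden = 4, 3

# Identical weights: every hidden neuron starts out as a copy of the others.
W_constant = np.full((n_hidden, n_inputs), 0.5)

# Small random weights drawn from a zero-mean, unit-variance Gaussian.
W_random = 0.01 * rng.standard_normal((n_hidden, n_inputs))

x = np.array([1.0, 2.0, 3.0, 4.0])
z_constant = W_constant @ x   # weighted sums are all the same
z_random = W_random @ x       # weighted sums differ per neuron
```

With constant weights every hidden neuron produces the same value and would receive the same update, whereas the random draw gives each neuron a distinct starting point.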

Optimizing to find shorter program https://www.youtube.com/watch?v=9EN_HoEk3KY&t=374s

### 1.5 Cost Function

### 1.6 Backpropagation


To clarify, during backpropagation, the error signal is propagated backwards through the network to update the weights. The update rule for the weights involves multiplying the error signal by the input to the neuron and the derivative of the activation function.

For example, consider a neuron with ReLU activation whose weight w multiplies an input signal x, and let z be the neuron's total weighted input (the sum of every weight times its input). During the forward pass, the output of the neuron is max(0, z). If the error signal for this neuron is denoted as delta, the gradient used to update the weight w is:

delta * x * relu_derivative(z)

where relu_derivative(z) is 0 if z<=0, and 1 if z>0.

Suppose that the weighted input is negative, i.e., z<0. In this case, relu_derivative(z) is 0, which means that the gradient for the weight w becomes:

delta * x * 0 = 0

Since the gradient is 0, the weight will not be updated for that training example. It retains its value unless other training examples, for which z is positive, produce updates. If z stays negative for every example, the neuron never recovers, which is the "dying ReLU" problem.

In summary, the weights feeding a neuron with ReLU activation are not updated during backpropagation whenever that neuron's weighted input is negative; they only learn from examples for which z is positive.
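The ReLU case can be checked numerically with a short sketch. Here `z` denotes the neuron's weighted input sum (the value ReLU is applied to), `x` is the signal arriving on the weight in question, and `delta` is the error signal; all numbers and the `weight_gradient` helper are hypothetical.

```python
def relu_derivative(z):
    """Slope of ReLU: 1 for positive weighted input, 0 otherwise."""
    return 1.0 if z > 0 else 0.0

def weight_gradient(delta, x, z):
    """Gradient for one weight: error signal * input * activation slope."""
    return delta * x * relu_derivative(z)

delta = 0.8   # hypothetical error signal arriving at the neuron
x = 2.0       # hypothetical input carried by this weight

active = weight_gradient(delta, x, z=1.5)     # neuron fired: gradient flows
inactive = weight_gradient(delta, x, z=-1.5)  # neuron silent: gradient is 0
```

When the neuron is silent, the gradient is zero regardless of how large `delta` or `x` is, so a gradient descent step leaves the weight unchanged for that example.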

# 2. Is Backpropagation "reversing" a neural network?

We established how weights and activation functions are used to create these neural networks, but how do we come up with the values of these weights?

Good question, and the answer relies on backpropagation, a technique used to optimize our model by changing the weights in a way that minimizes our cost function and, in turn, improves the predictions we are looking for.
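To make "changing the weights to minimize the cost" concrete, here is a minimal sketch for a single sigmoid neuron. The toy dataset (logical OR), the binary cross-entropy cost, the learning rate, and the number of steps are all assumptions chosen for illustration; a real network applies the same chain rule layer by layer.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(w, b, X, y):
    """Binary cross-entropy, used here as an assumed example cost function."""
    p = sigmoid(X @ w + b)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

# Toy dataset (logical OR), chosen only for illustration.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 1.0, 1.0, 1.0])

rng = np.random.default_rng(0)
w = 0.01 * rng.standard_normal(2)   # small random initial weights
b = 0.0
learning_rate = 0.5

initial_cost = cost(w, b, X, y)
for _ in range(2000):
    p = sigmoid(X @ w + b)           # forward pass
    error = p - y                    # error signal at the output
    grad_w = X.T @ error / len(y)    # chain rule back to each weight
    grad_b = error.mean()
    w -= learning_rate * grad_w      # step against the gradient
    b -= learning_rate * grad_b
final_cost = cost(w, b, X, y)

predictions = (sigmoid(X @ w + b) > 0.5).astype(int)
```

Each pass computes a prediction, measures the error, and nudges the weights in the direction that lowers the cost, so `final_cost` ends up below `initial_cost`.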

#### 2.1 Propagation

This leads us to two core realizations:
1. Artificial neural networks model the synapses of cortical neurons but do not model STDP (spike-timing-dependent plasticity), which is not trivial to reproduce, so we use backpropagation to optimize, or "learn", instead

2. Backpropagation relies on the assumption that a shorter program exists that can model the data we have, so we just have to optimize our model to find it. If no such program exists, the data is completely random noise


# Supplementary Reading

- Backpropagation explained: https://home.agh.edu.pl/~vlsi/AI/backp_t_en/backprop.html
- Convolutional Neural Networks for Visual Recognition: https://cs231n.github.io/neural-networks-1/