# Reverse Neural Network

##### Machine Learning Comprehension: Low

##### Time to Read: 15 min

##### Author: Albert Nguyen-Tran

# Intro

At this point you probably have some idea of what neural networks are and, even better, how useful they are for image classification. With artificial neurons, weights, and a few other clever techniques, we can train a model to classify almost anything we want.

But what if you could run the model in reverse? Could you invert the model so that, given a label, it could "generate" some inputs? The answer is not really, but in some cases it can be done, provided that certain strict conditions are met.

First, let's walk through how a feed-forward neural network works.

# 1. Neural Networks 101

Artificial neural network, convolutional neural network, recurrent neural network? What even is a neural network? If you made the biological connection to our own brains, you are already on the right track.

![image.png](attachment:0997a03c-b666-4f01-bdac-95e5c35d10c6.png)

Despite the fact that we don't *completely* understand how our brains work, the motivation behind neural networks is simple: to model the way in which our **neurons** interact.

We understand that:
- Each neuron has **one** axon that carries electrical nerve impulses
- This axon splits into an axon terminal, housing collateral "axon tips"
- These tips form synapses with the dendrites of other neurons

Together, this creates on the order of 10^15 pathways through which our nearly 90 billion neurons interact!


### 1.1 How it Works

If your first intuition was to label our brains as a massive series of logic gates, like those in your CPU, you would be vastly underestimating the power of these neurons.

The thing is, each neuron fires through its axon tips only if the total strength of its inputs exceeds a certain threshold.

Well, wouldn't that make its output binary, just like a circuit?

No! Because the synaptic connection between *most* neurons [can change](https://en.wikipedia.org/wiki/Synaptic_plasticity) in strength through:
- [Long-term potentiation (increase)](https://en.wikipedia.org/wiki/Long-term_potentiation), when neurons are activated in a coordinated manner
- [Long-term depression (decrease)](https://en.wikipedia.org/wiki/Long-term_depression), when neurons are active but do not coincide

This changes the likelihood of these neurons interacting in the future! This is the key idea: synaptic strengths are learnable, and they control how strongly neurons influence one another!


### 1.2 The Perceptron

Now that the groundwork has been laid, it's time to discuss how such a neuron can be modelled mathematically.

![image.png](attachment:ce27f94e-961d-4e19-a3cb-720b9c69e6ed.png)

Notice how, at each cell body, we simply take the weighted sum of the signals arriving from the axons of surrounding neurons to represent the total strength of the "electrical impulses".
Then we pass that sum through a step function to activate it, introducing the "thresholds" described earlier.
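
To make the weighted sum and the step function concrete, here is a minimal sketch of a single perceptron in plain Python. The names `step` and `perceptron`, the threshold of 0, and the hand-picked weights are assumptions made for illustration only; nothing here is learned yet.

```python
def step(z, threshold=0.0):
    """Step activation: fire (1) only if the input strength exceeds the threshold."""
    return 1 if z > threshold else 0


def perceptron(inputs, weights, bias=0.0):
    """Weighted sum of the inputs plus a bias, passed through the step function."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return step(z)


# Hypothetical, hand-picked values that make the perceptron behave like a logical AND.
and_weights = [1.0, 1.0]
and_bias = -1.5

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", perceptron([a, b], and_weights, and_bias))
```

Only the input `(1, 1)` pushes the weighted sum above the threshold, so it is the only case where this perceptron fires. Training would amount to finding such weights automatically instead of choosing them by hand.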

With an activation function at each perceptron, the model as a whole can learn and adapt to the key characteristics of the label it is trying to classify.

This distinguishes neural nets from a series of linear functions that would accomplish something like this instead (biases + template ...):

![image.png](attachment:e51cd31b-b88c-4510-b34a-476be71f526f.png)
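
One way to see why the non-linearity matters: stacking linear functions without an activation in between just gives you another linear function. The sketch below uses one-dimensional "layers" with made-up weights (`layer1`, `layer2`, and the helper `linear` are hypothetical names for this illustration) and compares the stack against a single collapsed linear function, then against the same stack with a ReLU in the middle.

```python
def linear(w, b):
    """Return the one-dimensional linear function x -> w * x + b."""
    return lambda x: w * x + b


# Two hypothetical "layers" with arbitrary weights and biases.
layer1 = linear(2.0, 1.0)
layer2 = linear(-3.0, 0.5)


def stacked(x):
    """Two linear layers applied back to back, with no activation."""
    return layer2(layer1(x))


# The stack collapses into one linear function: w = w2 * w1, b = w2 * b1 + b2.
collapsed = linear(-3.0 * 2.0, -3.0 * 1.0 + 0.5)


def relu(x):
    return max(0.0, x)


def stacked_with_relu(x):
    """Same two layers, but with a non-linear activation in between."""
    return layer2(relu(layer1(x)))


for x in (-2.0, 0.0, 1.5):
    print(x, stacked(x), collapsed(x), stacked_with_relu(x))
```

The first two columns of results always agree, so the extra linear layer adds no expressive power. The ReLU version differs whenever the first layer's output is negative, which is exactly what lets deeper networks represent more than a single line.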

Briefly, these are two popular non-linear activation functions:

1. Sigmoid: it looks like an S, or, to be exact, a hyperbolic tangent function rescaled to fit between 0 and 1
    - It rose to popularity due to its all-or-nothing principle (outputs saturate toward 0 or 1), which is a convenient representation of how neurons work in the real world
    - However, it is not used as much in practice because of the increased likelihood of [vanishing gradients](https://stats.stackexchange.com/questions/126238/what-are-the-advantages-of-relu-over-sigmoid-function-in-deep-neural-networks) (the gradient is the vector of partial derivatives that tells us which weights to adjust while backpropagating)

2. ReLU (Rectified Linear Unit): looks like a hockey stick, modelled by max(0, x)
    - Inexpensive to compute because it is piecewise linear
    - Mitigates the vanishing gradient problem
    - But still prone to [exploding gradients and dying ReLU](https://www.linkedin.com/pulse/rectified-linear-unit-non-linear-mukesh-manral/?trk=pulse-article_more-articles_related-content-card)

![image.png](attachment:4c5a1a4c-0978-4397-9ae5-86b256da2f0e.png)
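
The two activation functions and their derivatives can be sketched in a few lines of plain Python. The function names are chosen here for illustration, and the derivative of ReLU at exactly 0 is set to 0 by convention in this sketch. Printing the derivatives at a few points gives a rough feel for the vanishing gradient issue: the sigmoid's slope shrinks toward 0 for large positive or negative inputs, while ReLU's slope stays at 1 for any positive input and is 0 for negative ones (the "dying ReLU" case).

```python
import math


def sigmoid(x):
    """Logistic sigmoid: squashes any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Derivative of the sigmoid, which peaks at 0.25 when x == 0."""
    s = sigmoid(x)
    return s * (1.0 - s)


def relu(x):
    """Rectified Linear Unit: max(0, x)."""
    return max(0.0, x)


def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise (0 at x == 0 by convention here)."""
    return 1.0 if x > 0 else 0.0


for x in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(
        f"x={x:6.1f}  sigmoid={sigmoid(x):.5f}  d_sigmoid={sigmoid_grad(x):.5f}  "
        f"relu={relu(x):4.1f}  d_relu={relu_grad(x):.1f}"
    )
```

The sigmoid is also a shifted and scaled hyperbolic tangent: `sigmoid(x)` equals `(math.tanh(x / 2) + 1) / 2` up to floating-point error.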