# Exercise 3: More Advanced AD with Convolutional Auto-Encoders, Non-Standard Losses, And Generative Adversarial Network Anomaly Detection

### Goals of the Exercise

- Make a 2D convolutional autoencoder for events
- Explore some non-standard losses for 2D images/autoencoders
- Create a Generative Adversarial Network for anomaly detection

### More Anomaly Detection With Autoencoders

A very common form of data in particle physics (and everywhere else) is images: 2D maps of some measured quantity. There are, of course, autoencoders for this too. "Convolutional autoencoders" use the same filter-based features as a traditional convolutional neural network image classifier.
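Here is what a "filter-based feature" is. A convolutional layer slides a small kernel over the image and takes a weighted sum at each position. The following is a minimal NumPy sketch of a single "valid" (no padding, stride 1) 2D cross-correlation, which is what deep-learning libraries call convolution. The 3x3 kernel is hand-picked for illustration; in a real network the kernel weights are learned.

```python
import numpy as np

def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide `kernel` over `image` with no padding and stride 1 (cross-correlation)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum of the image patch currently under the kernel
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 5x5 "calorimeter" image with a single bright cell in the middle
image = np.zeros((5, 5))
image[2, 2] = 10.0

# Hand-chosen Laplacian-style kernel: responds strongly to isolated peaks
kernel = np.array([[0, -1, 0],
                   [-1, 4, -1],
                   [0, -1, 0]], dtype=float)

features = conv2d_valid(image, kernel)
print(features)
```

The 3x3 output is 40 at the centre, where the kernel sits right on the peak, and -10 at the four edge-adjacent positions. A convolutional autoencoder stacks many such learned filters in its encoder and mirrors them in its decoder.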

Very often, data comes in 2D grids (especially from something like a detector or calorimeter), or a 2D image can be made by imposing a grid-based structure on spatial data. (Note: this is actually a bit non-ideal, as it can lead to sparse data representations and a whole lot of math on a whole lot of nothing, a problem that the graph auto-encoders in the next exercise solve.)
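The next sketch imposes a grid on spatial data. It assumes each event is a list of particles with hypothetical `(eta, phi, pT)` values. `np.histogram2d` sums the pT into fixed bins, which produces a calorimeter-like image. The bin count and eta range are illustrative choices, not values from this exercise.

```python
import numpy as np

def event_to_image(eta, phi, pt, n_bins=32, eta_range=(-2.5, 2.5)):
    """Bin particles into an (n_bins, n_bins) image of summed pT (illustrative binning)."""
    image, _, _ = np.histogram2d(
        eta, phi,
        bins=n_bins,
        range=[eta_range, (-np.pi, np.pi)],
        weights=pt,  # each cell holds the total pT of the particles that fall into it
    )
    return image

# Three toy particles (hypothetical values); the first two are close together
eta = np.array([0.1, 0.12, -1.5])
phi = np.array([0.5, 0.52, -2.0])
pt = np.array([20.0, 5.0, 10.0])

img = event_to_image(eta, phi, pt)
print(img.shape, np.count_nonzero(img), img.sum())
```

Only 2 of the 1024 cells are non-zero: the two nearby particles merge into one cell. That shows both the sparsity problem and the loss of resolution that gridding introduces.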

At this point you know the drill: we're going to make some auto-encoders for this and see how they do!

### Esoteric Losses

This section is a little more freeform. I wanted to take some time to introduce loss functions other than Mean Squared Error. Mean Squared Error is not a bad loss, and it should be your first stop when building reconstruction losses. If you aren't careful, though, it can also reward "peak memorization" and the reconstruction of out-of-set data. For images like the ones we're working with, you do have other options. These are losses I am experimenting with myself, so I figured it might be interesting for you to try them too and see how they work for this exercise. No guarantees from my side that they will be suitable or usable. It is good practice to try implementing your own loss functions, though. The loss function shapes the behavior of a neural network as much as the layers you put in it, so you may be called on to explore it as thoroughly as you would explore adding more or fancier layers.

##### Huber Loss
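Huber loss behaves like MSE for small residuals and like mean absolute error for large ones, so a few badly reconstructed pixels don't dominate the loss. Below is a minimal NumPy sketch of the standard definition, averaged over pixels. `delta` sets where the loss switches from quadratic to linear. Keras also ships a built-in version (`keras.losses.Huber`), which is what you would normally use inside a model.

```python
import numpy as np

def huber_loss(y_true, y_pred, delta=1.0):
    """Mean Huber loss: 0.5*r^2 for |r| <= delta, delta*(|r| - 0.5*delta) beyond it."""
    residual = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    quadratic = 0.5 * residual ** 2
    linear = delta * (residual - 0.5 * delta)
    return float(np.mean(np.where(residual <= delta, quadratic, linear)))

y_true = np.zeros((2, 2))
y_pred = np.array([[0.5, 0.0], [0.0, 3.0]])
print(huber_loss(y_true, y_pred))  # 0.65625, versus an MSE of 2.3125
```

The single bad pixel (residual 3) contributes 2.5 here instead of 9 under MSE. That is exactly the outlier-damping behaviour that makes Huber worth trying.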

##### Normalized Cross Correlation
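Normalized cross correlation (NCC) compares the *shape* of two images after removing each image's mean and scaling by its spread. It ranges from -1 to 1, so `1 - NCC` gives a loss that is 0 for a perfect match. The following is a minimal per-image NumPy sketch. The small `eps` guarding against constant images is an assumption of this sketch.

```python
import numpy as np

def ncc_loss(y_true, y_pred, eps=1e-8):
    """1 - normalized cross correlation between two images (0 = perfectly correlated)."""
    a = np.asarray(y_true, dtype=float).ravel()
    b = np.asarray(y_pred, dtype=float).ravel()
    a = a - a.mean()  # remove brightness offset
    b = b - b.mean()
    ncc = np.sum(a * b) / (np.sqrt(np.sum(a ** 2) * np.sum(b ** 2)) + eps)
    return float(1.0 - ncc)

img = np.array([[0.0, 1.0], [2.0, 3.0]])
print(ncc_loss(img, 2.0 * img + 5.0))  # scaled and shifted copy: loss ~ 0
print(ncc_loss(img, -img))             # inverted copy: loss ~ 2
```

Note the design trade-off: NCC ignores overall scale and offset. For energy-deposit images, where the absolute amount of energy may matter, you might want to combine it with an MSE-style term.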

##### Structural Similarity Index
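The Structural Similarity Index (SSIM) compares two images through their means (luminance), variances (contrast), and covariance (structure). The constants `C1 = (0.01 * L)^2` and `C2 = (0.03 * L)^2` use the data range `L` to keep the ratios stable. Below is a deliberately simplified sketch that treats the whole image as a single window. The standard version computes SSIM over small local (typically Gaussian) windows and averages the results. `1 - SSIM` can then be used as a loss.

```python
import numpy as np

def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """SSIM over the whole image as one window (a simplification of windowed SSIM)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    contrast_structure = (2 * cov_xy + c2) / (var_x + var_y + c2)
    return float(luminance * contrast_structure)

def ssim_loss(y_true, y_pred, data_range=1.0):
    return 1.0 - global_ssim(y_true, y_pred, data_range)

x = np.array([[0.0, 0.5], [0.5, 1.0]])
flat = np.full_like(x, 0.5)  # same mean brightness, but no structure at all
print(global_ssim(x, x), global_ssim(x, flat))
```

The flat image has the same mean as `x` but scores close to 0, because SSIM penalises the missing structure. MSE would only see a modest pixel-wise error. For the standard windowed version in TensorFlow, `tf.image.ssim(img1, img2, max_val)` is available.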

### Different Networks for Anomaly Detection: Generative Adversarial Networks
Up to this point we have focused exclusively on different types of autoencoder, because they are the most common and easiest-to-use idea in anomaly detection. They are not the only unsupervised technique in neural networks, though, which means they are not the only technique in anomaly detection. The other big kind of network is the "Generative Adversarial Network" (GAN). The idea goes like this: a Generative Adversarial Network actually uses a pair of models. The first model (the generator) takes a bunch of noise/random numbers and, from these, creates data that looks like the input dataset you are doing unsupervised learning on. The second model (the discriminator) is a classifier, and its job is to judge whether the images it receives genuinely come from the dataset or are fakes. The generator is trained to fool the classifier, while the classifier is trained to find the genuine article and not get fooled.
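For reference, this two-player game is usually written as the minimax objective from the original GAN formulation. Here $D(x)$ is the classifier's estimated probability that $x$ is real, and $G(z)$ is the generator's output for a noise vector $z$:

$$
\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
$$

The discriminator pushes $D(x) \to 1$ on real data and $D(G(z)) \to 0$ on fakes, while the generator pushes $D(G(z)) \to 1$. In practice the generator is often trained to maximize $\log D(G(z))$ instead of minimizing $\log(1 - D(G(z)))$, because this gives stronger gradients early in training.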

After both models have been trained, the classifier should be relatively picky and serve as a good judge of whether something belongs to the dataset it has seen or is anomalous.
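The following is a minimal sketch of using the trained classifier as an anomaly detector. It assumes a hypothetical `discriminator` callable that returns the probability that each input is real. Each event is scored by `1 - D(x)`. Events are flagged if their score exceeds a threshold chosen on known-normal (background) data, here the 99th percentile. The toy discriminator and the 1% false-positive rate are illustrative assumptions, not part of the exercise.

```python
import numpy as np

def anomaly_scores(discriminator, events):
    """Higher score = less 'real-looking' to the discriminator = more anomalous."""
    return 1.0 - np.asarray(discriminator(events), dtype=float).ravel()

def fit_threshold(background_scores, false_positive_rate=0.01):
    """Score above which only `false_positive_rate` of background events fall."""
    return float(np.quantile(background_scores, 1.0 - false_positive_rate))

# Hypothetical stand-in for a trained discriminator: events near the origin look "real"
def toy_discriminator(x):
    return 1.0 / (1.0 + np.exp(np.abs(x).sum(axis=1) - 3.0))

rng = np.random.default_rng(0)
background = rng.normal(0.0, 0.5, size=(1000, 4))
signal = rng.normal(3.0, 0.5, size=(10, 4))

bg_scores = anomaly_scores(toy_discriminator, background)
threshold = fit_threshold(bg_scores)
flagged = anomaly_scores(toy_discriminator, signal) > threshold
print(flagged.mean(), (bg_scores > threshold).mean())
```

Fixing the threshold on background data ties it directly to a false-positive rate you can reason about. A raw cut at 0.5 would depend on how well calibrated the discriminator is.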

Because the training loop is a bit unusual, we can't use the standard `fit` function. I will provide the training loop; you provide the models.

Generative Adversarial Networks are an interesting idea and have some great upsides (a classifier trained directly on almost exactly the task we are after, a generative model for free, etc.), but of course they are not without their downsides. The big one is the training loop, which has some problems that stem from game theory. Because these two networks are competing, they can end up in what is called a [Nash Equilibrium](https://en.wikipedia.org/wiki/Nash_equilibrium). A Nash Equilibrium is a state where neither network stands to gain from changing anymore. If the generator changes, it will not fool the classifier any more than it already does (and may get worse), and the classifier cannot do any better on the data it is seeing than it already does. Note that being in this state does not guarantee that the generated images are any good, nor that the classifier is an accurate judge of our underlying dataset!

There are some techniques to try and get around this. One is [SpectralNormalization](https://keras.io/api/layers/preprocessing_layers/numerical/spectral_normalization/) of layers. Another is adding some "fuzz" or "jitter" to the labels of the dataset (i.e. not using 0 or 1, but removing or adding a tiny bit to each) to try and force the networks out of stable states.
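The following is a minimal sketch of the label "fuzz" idea. It assumes labels of 1 for real and 0 for fake, and draws each label from a small interval just inside its nominal value instead of using exact 0s and 1s. The `jitter` width of 0.1 is an illustrative choice. A related, commonly used variant is one-sided label smoothing, which softens only the real labels.

```python
import numpy as np

def jittered_labels(n, real, jitter=0.1, rng=None):
    """Labels near 1 (real) or 0 (fake), each nudged by up to `jitter` toward the middle."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.uniform(0.0, jitter, size=n)
    # Real labels land in (1 - jitter, 1], fake labels in [0, jitter)
    return 1.0 - noise if real else noise

rng = np.random.default_rng(42)
real_labels = jittered_labels(64, real=True, rng=rng)
fake_labels = jittered_labels(64, real=False, rng=rng)
print(real_labels.min(), fake_labels.max())
```

Keeping the noisy labels inside [0, 1] means they remain valid targets for a binary cross-entropy loss. The classifier can no longer drive its outputs all the way to certainty.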

We won't have time to explore them here. That's homework.

### Wrap up

Barring some kind of LLM-based autoencoder, we've covered the basics of most modern neural-network methods for anomaly detection. The only remaining major technique is Graph Neural Networks, which we'll talk about in the next exercise if we get time.