# Weekly Report - June 23
## Summary of Contents
- Goals
- Results & Findings
- Plan of the Next Week
- Appendix

## Goals
My goals for the week were:
- Implement the following four **Anomaly Detectors** and compare them
    - Detector based on the Reconstruction Error with PCA
    - Detector based on the Multivariate Gaussian Distribution with PCA
    - Detector based on the Reconstruction Error with Deep Autoencoder
    - Detector based on the Multivariate Gaussian Distribution with Deep Autoencoder
- Apply the same evaluation to a different dataset to test whether the conclusions generalize

## Results & Findings
### Dataset 1: Face Data from Yale
The Reconstruction Error-based methods work well on this dataset. In comparison, the PCA-based method achieves a better result than the Deep Autoencoder-based method. However, I suspect this may be due to the small size of the dataset (it contains only around 600 images).

#### Comparison of the two Reconstruction Error-based methods
> Table of Results (Precision @ K) on the Test set

|Method|Precision @ K = 50|
|------|-------------|
|Reconstruction Error with PCA (N = 50)|100%|
|Reconstruction Error with PCA (N = 20)|70%|
|Reconstruction Error with Deep Autoencoder|65%|

*Note*: the `N` in the table is the number of Principal Components selected during encoding. 
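
For reference, Precision @ K can be computed as sketched below. This assumes the usual definition (the fraction of true anomalies among the K points with the highest anomaly score). The function name `precision_at_k` and the 0/1 label encoding are illustrative and are not taken from the notebooks.

```python
import numpy as np

def precision_at_k(scores, labels, k=50):
    """Fraction of true anomalies among the k highest-scoring points.

    scores: higher means more anomalous (e.g. reconstruction error).
    labels: 1 for anomaly, 0 for normal (assumed encoding).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    k = min(k, len(scores))
    top_k = np.argsort(scores)[::-1][:k]  # indices of the k largest scores
    return labels[top_k].mean()
```

For a probability-based detector, where low values indicate anomalies, the negated probability can be passed as `scores`.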

<br>

> Plots of Reconstruction Error for each encoding method

The following plots show the data points sorted according to their reconstruction error. The points are colored based on their labels: anomaly points are black and normal points are yellow. 

Clearly, **in all three methods, most of the anomaly points have high reconstruction error**, which is desirable.

Reconstruction Error with PCA (N = 50): 
![PCA_N=50](Screenshots/623_re_pca_50.png)

Reconstruction Error with PCA (N = 20): 
![PCA_N=20](Screenshots/623_re_pca_20.png)

Reconstruction Error with Deep Autoencoder: 
![DA](Screenshots/623_re_da.png)
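
The per-point reconstruction error plotted above can be computed roughly as follows for the PCA case. This is a minimal NumPy sketch, not the notebook code: it assumes the error is the squared Euclidean distance between an image and its reconstruction, and that PCA is fitted on a separate training matrix. The names `X_fit` and `X_eval` are hypothetical.

```python
import numpy as np

def pca_reconstruction_error(X_fit, X_eval, n_components):
    """Squared reconstruction error of each row of X_eval under a PCA
    with n_components components fitted on X_fit (rows are samples)."""
    mean = X_fit.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X_fit - mean, full_matrices=False)
    W = Vt[:n_components]          # shape: (n_components, n_features)
    Z = (X_eval - mean) @ W.T      # encode
    X_rec = Z @ W + mean           # decode
    return np.sum((X_eval - X_rec) ** 2, axis=1)
```

Sorting the returned array and coloring each point by its label gives a plot of the same kind as the ones above.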

#### Failure of Gaussian-based methods
With the PCA encoder, the Gaussian method fails to identify the anomalies. Below is a plot of the data points sorted by their probability under the Gaussian distribution. **Ideally, the anomaly points should have very low probability**, as the distribution is fitted only to the normal data. However, the long, flat tail makes it very difficult to detect anomalies precisely. 

The Precision @ 50 on the test set is only 4.0%.

![GA-PCA](Screenshots/623_ga_pca.png)
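
The Gaussian scoring described above can be sketched as follows. This is an illustration under assumptions, not the notebook code: it fits a mean and full covariance to the encoded normal data and returns the log-density instead of the density, which gives the same ranking of points. The function names are hypothetical.

```python
import numpy as np

def fit_gaussian(Z_normal):
    """Estimate mean and covariance from encoded normal data (rows are samples)."""
    mu = Z_normal.mean(axis=0)
    cov = np.atleast_2d(np.cov(Z_normal, rowvar=False))
    return mu, cov

def gaussian_log_density(Z, mu, cov):
    """Log-density of each row of Z under N(mu, cov)."""
    d = mu.shape[0]
    diff = Z - mu
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance is not positive definite")
    # Squared Mahalanobis distance of each row, without forming the inverse.
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)
```

Points with the lowest log-density are the ones flagged as anomalies.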

As for the Deep Autoencoder, there is a major problem: **the covariance matrix of the encoded data becomes singular**, which makes the Gaussian method unusable. This problem occurs on both the Face dataset and the MNIST handwritten digits dataset. The cause is still unknown. Dusan suggested that the training data may not be large enough, but I used the entire MNIST dataset for training.
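
A covariance matrix is singular when some encoded dimensions are constant or linearly dependent on others, or when there are fewer samples than dimensions. None of these has been confirmed as the cause here. The sketch below shows one way to check for the first two cases and a common workaround (adding a small value to the diagonal). The helper names and the value of `eps` are assumptions for illustration.

```python
import numpy as np

def covariance_report(Z):
    """Summarize why the covariance of encoded data Z (rows are samples)
    might be singular."""
    cov = np.atleast_2d(np.cov(Z, rowvar=False))
    return {
        "dim": cov.shape[0],
        "rank": int(np.linalg.matrix_rank(cov)),   # rank < dim means singular
        "n_constant_dims": int(np.sum(Z.std(axis=0) == 0)),
    }

def regularized_covariance(Z, eps=1e-6):
    """Covariance with eps added to the diagonal so that it is invertible."""
    cov = np.atleast_2d(np.cov(Z, rowvar=False))
    return cov + eps * np.eye(cov.shape[0])
```

If `n_constant_dims` is non-zero, dropping those dimensions before fitting the Gaussian is an alternative to regularization.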

### Dataset 2: MNIST
The MNIST dataset contains an (almost) equal number of handwritten images for each of the 10 digits. I first selected the images of digit 0 as the anomalies and the others as normal. The result was very bad. I then changed the target from digit 0 to digit 2, and the result became very good. 

#### Comparison of the two Reconstruction Error-based methods
> Table of Results (Precision @ K) on the Test set

|Method|Precision @ K = 50|
|------|-------------|
|Reconstruction Error with PCA (N = 200)|72%|
|Reconstruction Error with Deep Autoencoder|86%|

*Note*: the `N` in the table is the number of Principal Components selected during encoding. 

<br>

> Plots of Reconstruction Error for each encoding method

The following plots show the data points sorted according to their reconstruction error. The points are colored based on their labels: anomaly points are black and normal points are yellow. 

Clearly, **in both methods, most of the anomaly points have high reconstruction error**, which is desirable.

Reconstruction Error with PCA: 
![PCA](Screenshots/623_re_pca_mnist.png)

Reconstruction Error with Deep Autoencoder: 
![DA](Screenshots/623_re_da_mnist.png)


#### Failure of Gaussian-based methods
The Gaussian-based methods fail on this dataset as well. 

With the autoencoder, the problem remains the same: the singular covariance matrix prevents the use of the Gaussian distribution. 

With PCA, the problem is strange: all of the data points have a probability of 0 (as shown below). I am still investigating the cause; it is probably a bug in the code. 

![GA-PCA-mnist](Screenshots/623_ga_pca_mnist.png)
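
One candidate cause worth ruling out, which has not been verified against the notebook, is floating-point underflow: in a high-dimensional space the Gaussian density can be smaller than the smallest representable double and is then rounded to exactly 0. The sketch below reproduces this on synthetic data with 200 dimensions and shows that the log-density stays finite. The data, its scale, and the use of `scipy.stats.multivariate_normal` are assumptions for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
d = 200                                       # same order as N = 200 above
# Synthetic stand-in for encoded data with a large per-dimension spread.
Z = rng.normal(scale=100.0, size=(2000, d))

mu = Z.mean(axis=0)
cov = np.cov(Z, rowvar=False)
dist = multivariate_normal(mean=mu, cov=cov)

p = dist.pdf(Z[:5])        # densities: underflow to 0 for this data
logp = dist.logpdf(Z[:5])  # log-densities: finite, still usable for ranking

print(p)
print(logp)
```

If the same effect occurs with the real encoded data, ranking points by `logpdf` instead of `pdf` would avoid it.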

## Plan of the Next Week
1. For the PCA-based methods, I want to run a quantitative test to evaluate the effect of N (the number of principal components selected) on the final result (Precision @ K).
2. For the Gaussian method with the Deep Autoencoder, I want to identify the cause(s) of the singularity problem and find out how to solve it. 
3. For the Deep Autoencoder-based methods in general, I want to test different configurations (number of layers, layer size, dropout, etc.).
4. I will find 1-2 more datasets and compare the performance of the four detectors on them. 
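
The quantitative test in item 1 could look like the sketch below. It is a possible setup, not existing code: it computes the SVD once, then evaluates Precision @ K of the reconstruction-error detector for each candidate N. The function name, the 0/1 label encoding, and the squared-error score are assumptions.

```python
import numpy as np

def precision_vs_n(X_fit, X_eval, labels, n_values, k=50):
    """Precision @ k of the PCA reconstruction-error detector for each N.

    X_fit: data used to fit PCA; X_eval: data to score (rows are samples).
    labels: 1 for anomaly, 0 for normal, aligned with X_eval.
    """
    mean = X_fit.mean(axis=0)
    # One SVD serves every N: the first n rows of Vt are the top n components.
    _, _, Vt = np.linalg.svd(X_fit - mean, full_matrices=False)
    centered = X_eval - mean
    labels = np.asarray(labels)
    k = min(k, len(labels))
    results = {}
    for n in n_values:
        W = Vt[:n]
        residual = centered - (centered @ W.T) @ W
        err = np.sum(residual ** 2, axis=1)
        top_k = np.argsort(err)[::-1][:k]     # k largest errors
        results[n] = float(labels[top_k].mean())
    return results
```

Plotting the returned values against N would show how sensitive the result is to the number of components.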

## Appendix
### Links to my code
- [Work on the Yale Faces Dataset](https://github.com/Ivan-Zhou/Anomaly_Detection/tree/master/Yale_Faces_Data)
    - [PCA & Reconstruction Error (N = 50)](https://github.com/Ivan-Zhou/Anomaly_Detection/blob/master/Yale_Faces_Data/PCA_Reconstruction_Error_k%3D50.ipynb)
    - [PCA & Reconstruction Error (N = 20)](https://github.com/Ivan-Zhou/Anomaly_Detection/blob/master/Yale_Faces_Data/PCA_Reconstruction_Error_k%3D20.ipynb)
    - [PCA & Gaussian](https://github.com/Ivan-Zhou/Anomaly_Detection/blob/master/Yale_Faces_Data/PCA_Multivariate%20Gaussian.ipynb)
    - [Deep Autoencoder - Training](https://github.com/Ivan-Zhou/Anomaly_Detection/blob/master/Yale_Faces_Data/Autoencoder_training_deep.py)
    - [Deep Autoencoder - Two Methods](https://github.com/Ivan-Zhou/Anomaly_Detection/blob/master/Yale_Faces_Data/Autoencoder%20Anomaly%20Detection-Deep.ipynb)
- [Work on the MNIST](https://github.com/Ivan-Zhou/Anomaly_Detection/tree/master/MNIST)
    - [PCA & Reconstruction Error - 0 as Target Anomaly](https://github.com/Ivan-Zhou/Anomaly_Detection/blob/master/MNIST/PCA_Reconstruction_Error_Target_0.ipynb)
    - [PCA & Reconstruction Error - 2 as Target Anomaly](https://github.com/Ivan-Zhou/Anomaly_Detection/blob/master/MNIST/PCA_Reconstruction_Error_Target_2.ipynb)
    - [PCA & Gaussian](https://github.com/Ivan-Zhou/Anomaly_Detection/blob/master/MNIST/PCA_Gaussian_Based_Model_Target_2.ipynb)
    - [Deep Autoencoder - Training](https://github.com/Ivan-Zhou/Anomaly_Detection/blob/master/MNIST/autoencoder_training.py)
    - [Deep Autoencoder & Reconstruction Error](https://github.com/Ivan-Zhou/Anomaly_Detection/blob/master/MNIST/Autoencoder_Reconstruction_Error.ipynb)
    - [Deep Autoencoder & Gaussian](https://github.com/Ivan-Zhou/Anomaly_Detection/blob/master/MNIST/Autoencoder_Multivariate_Gaussian.ipynb)
    
### Sample Images from PCA & Autoencoder
#### Faces Data - Decoded from PCA
![face_pca](Screenshots/623_decoded_pca_faces.png)

#### Faces Data - Decoded from Deep Autoencoder
![face_da](Screenshots/623_decoded_da_faces.png)

#### MNIST Data - Decoded from PCA
![mnist_pca](Screenshots/623_decoded_pca_mnist.png)

#### MNIST Data - Decoded from Deep Autoencoder
![mnist_da](Screenshots/623_decoded_da_mnist.png)