# Problem Statement

Given a reference set of images known to be either real or synthetic, can we classify a new image as real or synthetic using statistical methods?


# Concept

We treat this as a binary hypothesis testing problem in which the observation is the histogram of tonal values of a given image.

* Hypothesis H1: Image is a natural image. 
* Hypothesis H0: Image is a synthetic image. 

The figure below compares the tonal histogram of a real image with that of a synthetic image. Visually, the histogram of a synthetic image has sharper, more well-defined peaks than that of a real image.

Hence, the feature we feed to the classifier is derived from the image's histogram: specifically, the maximum number of pixels at any single tonal value.

<img src="histogram2.png" alt="Histogram" style="width: 600px;"/>



### Likelihood Ratio Test

This test distinguishes natural from synthetic images by comparing the likelihood ratio of the observed feature against a threshold.
The sample mean and variance of the feature are computed for both the synthetic and the natural images, and each class is modelled as a Gaussian distribution. Because the two class means are widely separated, we consider this a reasonable modelling assumption.


\begin{equation}
\frac{\Pr(x \mid H_1)}{\Pr(x \mid H_0)} > \text{threshold}
\end{equation}

As both kinds of images are characterised by the same single parameter, we set the threshold to 1. If the likelihood ratio is greater than 1, the image is classified as natural; otherwise it is classified as synthetic. The threshold can be adjusted to trade off the probability of false alarm against the probability of detection.

We assume that the peak intensities for both classes of data are distributed in a Gaussian fashion. 
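
A minimal sketch of the test described above, assuming the feature is a single number per image and each class is fitted with a Gaussian from its sample mean and variance. The function names and the training values are made up for illustration and do not come from our data set. The comparison is done on log-likelihoods to avoid numerical underflow, which is equivalent to comparing the ratio itself with the threshold.

```python
import math
from statistics import mean, variance


def fit_gaussian(features):
    """Return (sample mean, sample variance) of a list of feature values."""
    return mean(features), variance(features)


def gaussian_log_pdf(x, mu, var):
    """Log of the Gaussian density N(mu, var) evaluated at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def log_likelihood_ratio(x, natural, synthetic):
    """log[ Pr(x | H1) / Pr(x | H0) ], with H1 = natural and H0 = synthetic."""
    return gaussian_log_pdf(x, *natural) - gaussian_log_pdf(x, *synthetic)


def classify(x, natural, synthetic, threshold=1.0):
    """Decide 'natural' when the likelihood ratio exceeds the threshold."""
    if log_likelihood_ratio(x, natural, synthetic) > math.log(threshold):
        return "natural"
    return "synthetic"


# Hypothetical peak-count features for a handful of training images.
natural_peaks = [900, 1100, 1000, 950, 1050]
synthetic_peaks = [4000, 5000, 4500, 4200, 4800]

natural_model = fit_gaussian(natural_peaks)
synthetic_model = fit_gaussian(synthetic_peaks)

print(classify(1020, natural_model, synthetic_model))
print(classify(4400, natural_model, synthetic_model))
```

Raising `threshold` above 1 demands stronger evidence before an image is declared natural, which lowers both the probability of detection and the probability of false alarm.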

### Kolmogorov–Smirnov Test

The Kolmogorov–Smirnov (KS) test is a non-parametric test of whether two samples come from the same probability distribution. It serves as a goodness-of-fit test and returns a p-value, on the basis of which we either reject or fail to reject the null hypothesis that the two samples share a distribution.
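
For reference, the standard two-sample KS statistic is the largest gap between the two empirical cumulative distribution functions. The symbols $F_1$ and $F_2$ (the empirical CDFs of the two samples) and $D$ are introduced here for illustration and are not used elsewhere in this write-up:

\begin{equation}
D = \sup_x \left| F_1(x) - F_2(x) \right|
\end{equation}

A large $D$ corresponds to a small p-value, which is evidence against the two samples sharing a distribution.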


## Image Processing Method

### Likelihood Ratio Test


The image was processed in RGB. The _peak pixel value is taken as the feature for the classifier_. Since we are only concerned with the peak pixel value, we resized each image to 300 x 150 pixels to reduce the computational cost.
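
The feature extraction can be sketched as follows, assuming the image has already been loaded and resized and is supplied as a flat sequence of 8-bit tonal values. How the three RGB channels are combined is not specified above, so the sketch leaves that to the caller. `tonal_histogram` and `peak_count` are hypothetical helper names, and the two patches are toy data rather than real images.

```python
def tonal_histogram(pixels, levels=256):
    """Count how many pixels fall on each tonal value 0..levels-1."""
    counts = [0] * levels
    for value in pixels:
        counts[value] += 1
    return counts


def peak_count(pixels, levels=256):
    """Height of the tallest histogram bin: the most pixels at one tonal value."""
    return max(tonal_histogram(pixels, levels))


# Toy data: a patch dominated by one tone versus an evenly spread one.
flat_patch = [200] * 90 + [10] * 10
spread_patch = list(range(100))

print(peak_count(flat_patch))    # 90
print(peak_count(spread_patch))  # 1
```

Because every image is resized to the same dimensions, raw peak counts are directly comparable across images.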

Next, our hypothesis H1 was that the peak pixel values of real images are clustered around a sample mean and distributed with a sample variance. We assumed that the peak values of real images would be lower than those of synthetic images, as the synthetic images had more vibrant colors.

For synthetic images, our hypothesis H0 was similar to H1, just with a different mean and variance. 

<img src="histogram.png" alt="Histogram" style="width: 600px;"/>

For a given test image, we ran the LRT to determine which class it was more likely to belong to.

### Kolmogorov–Smirnov Test

For this method, the input was the entire histogram, normalized to give the probability mass function of the pixel intensities in the image. Each input image to be classified was then tested against every image in our data set.
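
A rough sketch of this comparison, assuming each image is represented by its histogram counts. The KS statistic is computed directly from the two normalized histograms as the largest gap between their cumulative sums. How the pairwise results are combined into a final label is not stated above, so the nearest-reference rule in `nearest_label` is purely an illustrative assumption, as are all function names and the four-level toy histograms.

```python
from itertools import accumulate


def normalize(counts):
    """Turn histogram counts into a probability mass function."""
    total = sum(counts)
    return [c / total for c in counts]


def ks_statistic(pmf_a, pmf_b):
    """Largest absolute gap between the two cumulative distributions."""
    cdf_a = accumulate(pmf_a)
    cdf_b = accumulate(pmf_b)
    return max(abs(a - b) for a, b in zip(cdf_a, cdf_b))


def nearest_label(test_pmf, references):
    """Assumed decision rule: label of the reference with the smallest KS statistic.

    `references` is a list of (label, pmf) pairs.
    """
    label, _ = min(references, key=lambda item: ks_statistic(test_pmf, item[1]))
    return label


# Toy histograms with only four tonal levels.
natural_ref = normalize([25, 25, 25, 25])
synthetic_ref = normalize([0, 90, 0, 10])
references = [("natural", natural_ref), ("synthetic", synthetic_ref)]

test_pmf = normalize([5, 80, 5, 10])
print(nearest_label(test_pmf, references))
```

This sketch stops at the statistic and does not compute a p-value. For two raw samples, a library routine such as `scipy.stats.ks_2samp` returns both the statistic and the p-value.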


## Results

Using the likelihood ratio test with the threshold set to 1, and the Kolmogorov–Smirnov test, we obtain the following performance:

| Method | Real (Classification Rate) | Synthetic (Classification Rate) |
| ------------- |:-------------:| -----:|
| Likelihood Ratio Test | 95.55% | 82.5% |
| Kolmogorov–Smirnov Test | 97.7% | 100% |

<!---


[//]: <> Natural: 95.55% of the images were detected correctly. 

[//]: <> Synthetic: 82.5% of the images were detected correctly. 

[//]: <> _Probability of Detection: 0.9555_

[//]: <> Probability of False Alarm: 0.175

[//]: <> Probability of Miss: 1 -Probability of Detection = 0.0445

-->
