In recent years, images generated by artificial intelligence have become both more prevalent and more realistic. Their rise poses ethical questions concerning misinformation, artistic expression, and identity theft, among others. At the heart of many of these questions is the difficulty of distinguishing real images from fake ones. It is therefore important to develop tools that can detect AI-generated images, especially when those images are too realistic for the human eye to identify as fake. This paper proposes a dual-branch neural network architecture that takes both an image and its Fourier frequency decomposition as inputs. Both branches use standard CNN-based methods, as described in Stuchi et al. [7], followed by fully connected layers. Our proposed model achieves an accuracy of 94% on the CIFAKE dataset, clearly outperforming classic ML methods and a basic CNN and performing comparably to some state-of-the-art architectures, such as ResNet.
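
To make the dual-branch idea concrete, the following is a minimal PyTorch sketch rather than the authors' implementation. It assumes 32x32 RGB inputs, a centred log-magnitude spectrum as the frequency representation, and small illustrative layer sizes; the names `conv_branch`, `log_magnitude_spectrum`, and `DualBranchNet` are hypothetical, and none of these choices is specified in the text above.

```python
import torch
import torch.nn as nn


def conv_branch(in_channels: int) -> nn.Sequential:
    """Small CNN feature extractor; layer sizes are illustrative assumptions."""
    return nn.Sequential(
        nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(2),          # halves the spatial resolution
        nn.Conv2d(32, 64, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.AdaptiveAvgPool2d(1),  # -> (N, 64, 1, 1)
        nn.Flatten(),             # -> (N, 64)
    )


def log_magnitude_spectrum(images: torch.Tensor) -> torch.Tensor:
    """Per-channel 2D FFT, shifted so low frequencies sit at the centre.

    Using the log-compressed magnitude is an assumption made for this sketch.
    """
    spectrum = torch.fft.fftshift(torch.fft.fft2(images), dim=(-2, -1))
    return torch.log1p(spectrum.abs())


class DualBranchNet(nn.Module):
    """Hypothetical dual-branch classifier: pixel branch + frequency branch."""

    def __init__(self, in_channels: int = 3):
        super().__init__()
        self.spatial = conv_branch(in_channels)
        self.frequency = conv_branch(in_channels)
        # Fully connected head on the concatenated branch features (64 + 64).
        self.classifier = nn.Sequential(
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 1),     # single logit: real vs. AI-generated
        )

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        spatial_features = self.spatial(images)
        frequency_features = self.frequency(log_magnitude_spectrum(images))
        features = torch.cat([spatial_features, frequency_features], dim=1)
        return self.classifier(features).squeeze(1)
```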
Model | Precision | Recall | F1-Score | Accuracy |
---|---|---|---|---|
SVM | 0.8020 | 0.8222 | 0.8120 | 0.8143 |
CNN | 0.8734 | 0.8574 | 0.8653 | 0.8640 |
ResNet | 0.9917 | 0.9066 | 0.9472 | 0.9495 |
VGGNet | 0.9657 | 0.9547 | 0.9602 | 0.9600 |
DenseNet | 0.9769 | 0.9779 | 0.9774 | 0.9774 |
Our Model | 0.9351 | 0.9471 | 0.9410 | 0.9407 |
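
As a quick consistency check on the table, F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short Python sketch below recomputes it from the reported precision and recall values; the variable and function names are illustrative. Because the inputs are already rounded to four decimals, agreement with the reported F1 can only be expected to within about 1e-4.

```python
# (precision, recall, reported F1) copied from the results table above.
results = {
    "SVM": (0.8020, 0.8222, 0.8120),
    "CNN": (0.8734, 0.8574, 0.8653),
    "ResNet": (0.9917, 0.9066, 0.9472),
    "VGGNet": (0.9657, 0.9547, 0.9602),
    "DenseNet": (0.9769, 0.9779, 0.9774),
    "Our Model": (0.9351, 0.9471, 0.9410),
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


for name, (precision, recall, reported_f1) in results.items():
    computed_f1 = f1_score(precision, recall)
    print(f"{name:<10} computed={computed_f1:.4f} reported={reported_f1:.4f}")
```

Every row agrees with its reported F1 to within that rounding tolerance.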