In [16]:
import pandas as pd
import numpy as np 
import matplotlib.pyplot as plt
import os
from PIL import Image
import imagehash
import torch

# **AI-generated vs. real image detection: classifying deepfake and real images**

## **ABSTRACT**

This will be added in the end........


## **1. Introduction**

Rapid advances in artificial intelligence have transformed the field of image generation (Stephan Bohm et al., n.d.)[1]. Modern generative models, such as Generative Adversarial Networks (Goodfellow et al., 2014)[2] and Latent Diffusion Models (Rombach et al., 2021)[3], can produce highly realistic images that are often indistinguishable from those created by humans. This technology has enabled innovative applications in art, design, and entertainment (Van Waning, 2024)[4]. However, it also raises significant concerns, particularly around misinformation, intellectual property, ethical usage, digital security, and public trust (Kamali et al., 2024)[5].

AI-generated images are a double-edged sword. While they empower creative industries and expand access to content creation tools, they also pose serious societal risks (Stephan Bohm et al., n.d.)[1]. Misinformation is a major concern, as fake images can be used to spread false information, influence opinions, or fabricate evidence. For example, an AI-generated image could falsely show someone committing a crime, damaging their reputation or affecting legal cases (Bird & Lotfi, 2023)[6].

The risks extend to cybersecurity, where AI-generated human faces and biometric data have been used to bypass security systems. Additionally, the rise of deepfake technology amplifies these challenges, eroding trust in visual media and complicating the work of journalists, law enforcement, and other stakeholders who rely on image authenticity (Bird & Lotfi, 2023)[6].

Although people can often notice small visual clues in fake images, modern AI is becoming so advanced that the difference is increasingly hard to spot. Unlike the output of earlier generative models, which exhibited obvious flaws, modern synthetic images closely mimic real-world details. However, certain artifacts, such as anatomical errors (e.g., malformed hands), stylistic inconsistencies (e.g., unnatural lighting), and sociocultural inaccuracies (e.g., cultural norm violations), can still serve as critical indicators for detection (Kamali et al., 2024)[5] (Ediboglu & Akyol, 2023)[7].

This project aims to ................................................

## **2. Related work**

Various methods have been proposed for distinguishing AI-generated images from real ones. ResNets (He et al., 2015)[8], a type of convolutional neural network (CNN) (O’Shea & Nash, 2015)[9], are widely used because they can learn high-level features while mitigating the vanishing gradient problem through residual connections. These properties often allow ResNets to outperform other models in classification tasks by learning intricate patterns and textures in images (Ediboglu & Akyol, 2023)[7] (Bird & Lotfi, 2023)[6].
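As a rough illustration of the residual connections mentioned above, the following minimal PyTorch sketch shows a basic residual block. The class name `ResidualBlock`, the channel count, and the input size are illustrative assumptions; this is not the architecture used in the cited papers or in this project.

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Two 3x3 convolutions with an identity shortcut around them."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        identity = x                                  # shortcut path
        out = self.relu(self.bn1(self.conv1(x)))      # first conv stage
        out = self.bn2(self.conv2(out))               # second conv stage
        # Adding the input back lets gradients flow through the shortcut,
        # which is what eases the vanishing gradient problem in deep stacks.
        return self.relu(out + identity)


block = ResidualBlock(channels=16)        # hypothetical channel count
x = torch.randn(4, 16, 32, 32)            # random stand-in for a batch of feature maps
y = block(x)
print(y.shape)
```

Because the shortcut is a plain addition, the block keeps the spatial size and channel count of its input, so several such blocks can be stacked without further changes.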

CNNs more broadly remain a reliable choice for feature extraction and classification. Their ability to process spatial hierarchies in image data makes them highly effective for identifying subtle differences between real and AI-generated images. CNNs are frequently used as a foundation for more advanced models (Maruthiram et al., 2024)[10].

Variational Autoencoders (VAEs) (Kingma & Welling, 2013)[11] have also been explored, primarily for anomaly detection. By reconstructing input images and identifying deviations from natural patterns, VAEs can highlight inconsistencies in synthetic images. However, they tend to underperform compared to discriminative models like ResNets when applied to complex datasets such as CIFAKE (Ediboglu & Akyol, 2023)[7].
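The reconstruction-based idea described above can be sketched as follows. This is only an illustration under stated assumptions: the `reconstruct` function here is a linear projection standing in for a trained VAE, the data is random, and the helper names and the 0.95 quantile threshold are hypothetical choices.

```python
import numpy as np


def reconstruction_scores(images, reconstruct):
    """Mean squared reconstruction error for each (flattened) image."""
    recon = reconstruct(images)
    return ((images - recon) ** 2).reshape(len(images), -1).mean(axis=1)


def flag_anomalies(scores, reference_scores, quantile=0.95):
    """Flag scores above a quantile of the errors seen on reference data."""
    threshold = np.quantile(reference_scores, quantile)
    return scores > threshold


rng = np.random.default_rng(0)

# Stand-in "model": projection onto a 4-dimensional subspace of 64-dim vectors.
# A trained VAE (encode, then decode) would take this place in practice.
basis = np.linalg.qr(rng.normal(size=(64, 4)))[0]


def reconstruct(x):
    return x @ basis @ basis.T


# Reference data lies close to the subspace; the "odd" samples do not.
real = rng.normal(size=(200, 4)) @ basis.T + 0.01 * rng.normal(size=(200, 64))
odd = real[:20] + 0.5 * rng.normal(size=(20, 64))

ref_scores = reconstruction_scores(real, reconstruct)
flags = flag_anomalies(reconstruction_scores(odd, reconstruct), ref_scores)
print(flags.sum(), "of", len(flags), "samples flagged")
```

The threshold is taken from the errors on reference data only, so the detector needs no labeled synthetic images, which is the appeal of the anomaly-detection framing.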

Transformer-based models, such as the Vision Transformer (Dosovitskiy et al., 2020)[12], have gained traction for their attention mechanisms, which enable them to capture global dependencies in image data. This makes them particularly effective at handling diverse and complex datasets. Models like Swin Transformers further enhance performance by combining hierarchical feature extraction with the efficiency of attention-based architectures (Ediboglu & Akyol, 2023)[7] (Bird & Lotfi, 2023)[6] (Maruthiram et al., 2024)[10].
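To make the notion of "global dependencies" more concrete, here is a simplified single-head scaled dot-product self-attention over image patch tokens, written in NumPy. The random projection matrices, the patch count, and the embedding size are assumptions for illustration; a real Vision Transformer additionally uses multiple heads, learned weights, positional embeddings, and a class token.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    tokens: (num_patches, dim) array of patch embeddings.
    Returns the attended tokens and the (num_patches, num_patches) weights.
    """
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])   # similarity of every patch pair
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ v, weights


rng = np.random.default_rng(0)
num_patches, dim = 16, 8                      # e.g. a 4x4 grid of patches
tokens = rng.normal(size=(num_patches, dim))
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) for _ in range(3))

out, attn = self_attention(tokens, w_q, w_k, w_v)
print(out.shape, attn.shape)
```

Every row of `attn` spans all patches, so each output token can draw on any region of the image in a single layer, in contrast to the local receptive field of a convolution.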

Finally, explainability tools, such as Grad-CAM (Selvaraju et al., 2019)[13], help to interpret model decisions by visualizing the image regions most influential in classifications. These tools often reveal that classifiers rely on subtle imperfections or artifacts in AI-generated images, features that are difficult for humans to detect but crucial for accurate classification (Maruthiram et al., 2024)[10].
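A minimal sketch of the Grad-CAM computation is shown below, assuming the classifier can be split into a convolutional part (`features`) and a classification part (`head`). The function name `grad_cam`, the toy two-class model, and the random input are hypothetical and serve only to make the example runnable.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def grad_cam(features: nn.Module, head: nn.Module,
             image: torch.Tensor, target_class: int) -> torch.Tensor:
    """Return a Grad-CAM heatmap of shape (H, W), scaled to [0, 1]."""
    activations = features(image)                      # (1, C, h, w)
    logits = head(activations)                         # (1, num_classes)
    score = logits[0, target_class]
    # Gradient of the class score with respect to the feature maps.
    grads = torch.autograd.grad(score, activations)[0]
    # Channel weights: gradients averaged over the spatial dimensions.
    weights = grads.mean(dim=(2, 3), keepdim=True)     # (1, C, 1, 1)
    cam = F.relu((weights * activations).sum(dim=1, keepdim=True))
    # Upsample the coarse map to the input resolution.
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear",
                        align_corners=False)
    cam = cam[0, 0].detach()
    cam = cam - cam.min()
    return cam / (cam.max() + 1e-8)


torch.manual_seed(0)
# Toy two-class model (an assumption, not the project's classifier).
features = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 2))

image = torch.rand(1, 3, 32, 32)                       # random stand-in image
heatmap = grad_cam(features, head, image, target_class=1)
print(heatmap.shape)
```

The resulting heatmap can be overlaid on the input image, for instance with `plt.imshow(heatmap, alpha=0.5)`, to see which regions pushed the score of the chosen class upward.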

This project will build on these approaches, which have previously been applied to other datasets (CIFAKE or AiARt). We will ........................................

## **3. Data acquisition, exploration and working with images**

### **Data acquisition**

The data was obtained from Kaggle and is free to use. The dataset consists of authentic images sampled from the Shutterstock platform across various categories, including a balanced selection where one-third of the images feature humans. These authentic images are paired with their equivalents generated using state-of-the-art generative models. This structured pairing enables a direct comparison between real and AI-generated content, providing a robust foundation for developing and evaluating image authenticity detection systems. This is the official dataset for the 2025 Women in AI Kaggle Competition.

AI vs. Human-Generated Images; By: Alessandra Sala, Margarita Pitsiani, Manuela Jeyaraj, Toma Ijatomi; License: Apache 2.0; Link to data: https://www.kaggle.com/datasets/alessandrasala79/ai-vs-human-generated-dataset/data

Note: none of the works cited above used this dataset.
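As a first exploration step, the downloaded images can be indexed with Pillow to collect their dimensions and color modes. The sketch below is a possible starting point only: the helper name `index_images` and the extension list are assumptions, the actual folder layout of the Kaggle download is not assumed here, and the demo runs on a temporary folder with two generated images instead of the real data.

```python
import os
import tempfile

from PIL import Image

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}   # assumed set of file types


def index_images(root):
    """Walk `root` and return one record per image file found."""
    records = []
    for folder, _, files in os.walk(root):
        for name in sorted(files):
            if os.path.splitext(name)[1].lower() not in IMAGE_EXTENSIONS:
                continue
            path = os.path.join(folder, name)
            # Image.open reads the header lazily, so this stays cheap.
            with Image.open(path) as img:
                width, height = img.size
                records.append({"path": path, "width": width,
                                "height": height, "mode": img.mode})
    return records


# Demo on a throwaway folder; replace it with the local dataset directory.
with tempfile.TemporaryDirectory() as demo_root:
    Image.new("RGB", (64, 48)).save(os.path.join(demo_root, "a.jpg"))
    Image.new("L", (32, 32)).save(os.path.join(demo_root, "b.png"))
    records = index_images(demo_root)

for record in records:
    print(record["width"], record["height"], record["mode"])
```

The list of records can be passed directly to `pd.DataFrame(records)` to summarize image sizes and modes before deciding on resizing or color conversion.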

## **References**

[1] Stephan Bohm, S., CAEBUS Center of Advanced E-Business Studies, & RheinMain University of Applied Sciences. (n.d.). Human Perception and Classification of AI-Generated Images: A Pre-Study based on a sample from the media sector in Germany. In ThinkMind [Research paper]. The First International Conference on Generative Pre-trained Transformer Models and Beyond, Wiesbaden, Germany. https://www.thinkmind.org/articles/gptmb_2024_1_20_38004.pdf

[2] Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014, June 10). Generative adversarial networks. arXiv.org. https://arxiv.org/abs/1406.2661

[3] Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2021, December 20). High-Resolution Image Synthesis with Latent Diffusion Models. arXiv.org. https://arxiv.org/abs/2112.10752

[4] Van Waning, J. (2024). Human Art vs AI Art, a Potential Danger for Artist? : Artistic and Economic Evaluations across Multiple Art Genres. In B. Liefooghe, Social, Health and Organizational Psychology. https://studenttheses.uu.nl/bitstream/handle/20.500.12932/47294/Thesis_JoA%cc%83%c2%ablvanWaning%202.pdf?sequence=1&isAllowed=y

[5] Kamali, N., Nakamura, K., Chatzimparmpas, A., Hullman, J., & Groh, M. (2024, June 12). How to Distinguish AI-Generated Images from Authentic Photographs. arXiv.org. https://arxiv.org/abs/2406.08651

[6] Bird, J. J., & Lotfi, A. (2023, March 24). CIFAKE: Image Classification and Explainable Identification of AI-Generated Synthetic Images. arXiv.org. https://arxiv.org/abs/2303.14126

[7] Ediboglu Bartos, G., & Akyol, S. (2023). Deep learning for image authentication [Alba Regia Technical Faculty Obuda University]. https://www.researchgate.net/publication/375952278_Deep_Learning_for_Image_Authentication_A_Comparative_Study_on_Real_and_AI-Generated_Image_Classification

[8] He, K., Zhang, X., Ren, S., & Sun, J. (2015, December 10). Deep residual learning for image recognition. arXiv.org. https://arxiv.org/abs/1512.03385

[9] O’Shea, K., & Nash, R. (2015, November 26). An introduction to convolutional neural networks. arXiv.org. https://arxiv.org/abs/1511.08458

[10] Maruthiram, B., Venkataramireddy, G., & Klick, M. K. (2024). Real VS AI Generated Image Detection and Classification. International Journal of Innovative Research in Technology (IJIRT), 11(2), ISSN: 2349-6002. https://ijirt.org/publishedpaper/IJIRT166462_PAPER.pdf

[11] Kingma, D. P., & Welling, M. (2013, December 20). Auto-Encoding variational Bayes. arXiv.org. https://arxiv.org/abs/1312.6114

[12] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., & Houlsby, N. (2020, October 22). An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv.org. https://arxiv.org/abs/2010.11929

[13] Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., & Batra, D. (2019). Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. International Journal of Computer Vision, 128(2), 336–359. https://doi.org/10.1007/s11263-019-01228-7