# AI-Generated Image Detection

Generative artificial intelligence (GenAI) has evolved rapidly within the last few years.
A central achievement is the ability to generate highly realistic images from a simple text prompt.
While this technology can support us in productive and creative tasks, it also holds great potential for misuse.
AI-generated images are increasingly being used to commit fraud and spread harmful disinformation.

In this assignment, we will take a closer look at the internals of a generative model and create our own images.
Then, we will exploit the model's architecture to detect AI-generated images using [AEROBLADE](https://arxiv.org/abs/2401.17879).

**Note:** If your local setup does not have a GPU, it is highly recommended to run this notebook on Google Colab (see this repository's README for instructions).

In [None]:
# if you are running this notebook on Google Colab, uncomment and run the following commands

# !pip install lpips --no-deps
# !mkdir data
# !wget https://github.com/aisoc-lab/ai-security-summer-school/blob/6df293b9cb46bddda293237af74b592457219c88/tutorial_03_deepfakes/data/real_elephant.jpg?raw=True -O data/real_elephant.jpg
# !wget https://github.com/aisoc-lab/ai-security-summer-school/blob/6df293b9cb46bddda293237af74b592457219c88/tutorial_03_deepfakes/data/fake_elephant.png?raw=True -O data/fake_elephant.png


In [None]:
import lpips
import matplotlib.pyplot as plt
import numpy as np
import torch
from diffusers import AutoPipelineForText2Image
from diffusers.utils import make_image_grid, pt_to_pil
from PIL import Image
from torchvision.transforms.functional import to_tensor

## Generating Images Using Stable Diffusion XL Turbo

The seminal work by [Rombach et al.](https://arxiv.org/abs/2112.10752), which sparked the development of [Stable Diffusion](https://github.com/CompVis/stable-diffusion), marks a milestone in text-to-image synthesis.
In this assignment, we will be working with its successor [Stable Diffusion XL Turbo](https://arxiv.org/abs/2307.01952), which allows for the generation of highly realistic images within a single step.

A convenient way to use diffusion models is the [diffusers](https://huggingface.co/docs/diffusers/index) library.
In the following, we create a *pipeline* from the pre-trained SDXL-Turbo model.

In [None]:
pipeline = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
)

# performance tweaks
pipeline.enable_sequential_cpu_offload()
pipeline.upcast_vae()

We can now take a look at the model's architecture by simply printing the pipeline.
It mostly resembles that of the original Stable Diffusion and features a U-Net, a text encoder, and a variational autoencoder (VAE).
The VAE is used to transform an image to the latent space, in which the actual generation takes place, and back.
It will be important later, since it can be leveraged to determine whether an image is real or AI-generated.

<center>
<img src="https://github.com/aisoc-lab/ai-security-summer-school/blob/6df293b9cb46bddda293237af74b592457219c88/tutorial_03_deepfakes/data/stable-diffusion-architecture.png?raw=true" alt="Architecture of Stable Diffusion" width="600"/>

*Source: [https://arxiv.org/abs/2112.10752](https://arxiv.org/abs/2112.10752).*
</center>

In [None]:
pipeline

With diffusers, generating an image is straightforward: call the pipeline with your prompt and some additional parameters and wait for the result.
Enter a prompt below and generate some images!
While the guidance scale should be set to 0.0, you can vary the number of inference steps.
Values of 2, 3, or 4 should yield better image quality at the cost of longer runtime.

In [None]:
prompt = # Your prompt here
img = pipeline(prompt=prompt, guidance_scale=0.0, num_inference_steps=1).images[0]
img

## Using the VAE

We can now use the VAE to transform our generated image into the model's latent space.
To do so, we need to first apply some pre-processing.
Then, we can call the VAE's `encode` method on our image.
Since a VAE outputs a distribution, we need to sample from it to obtain the actual latents.
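
To make the sampling step more concrete, here is a minimal NumPy sketch of drawing from a diagonal Gaussian given a mean and a log-variance. It is an illustration only, not the diffusers implementation; the array names (`mean`, `logvar`) and the tiny toy shapes are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical encoder output for a tiny 4-channel, 2x2 latent grid:
# one mean and one log-variance per latent entry.
mean = np.zeros((4, 2, 2))
logvar = np.full((4, 2, 2), -2.0)

# Reparameterized sample: mean + std * noise, with std = exp(0.5 * logvar).
std = np.exp(0.5 * logvar)
sample = mean + std * rng.standard_normal(mean.shape)

print(sample.shape)  # (4, 2, 2)
```

Because noise is drawn on every call, two samples from the same distribution will generally differ slightly.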

In [None]:
with torch.no_grad():  # avoid gradient computation
    preprocessed_img = pipeline.image_processor.preprocess(img)
    latents = pipeline.vae.encode(preprocessed_img).latent_dist.sample()

You can now have a look at the latents.

First, inspect their dimensions.
How do they differ from those of the image?

Second, visualize the latents. You can make use of the `plot_tensor` function defined below. Since the channels in the latent space do not directly map to color channels, you should plot each channel independently.

In [None]:
def plot_tensor(tensor: torch.Tensor, vmin: float | None = None, vmax: float | None = None) -> None:
    """
    Plot a tensor using matplotlib's imshow. Optionally set the minimum/maximum value of the color scale for better visualization.
    """
    # move channel dimension to the end for RGB tensors (C x H x W -> H x W x C)
    if tensor.ndim == 3 and tensor.shape[0] == 3:
        tensor = tensor.permute(1, 2, 0)
    # detach and move to CPU as float32 so that GPU or half-precision tensors can be plotted as well
    plt.imshow(tensor.detach().cpu().float().numpy(), vmin=vmin, vmax=vmax)
    plt.colorbar()
    plt.show()

In [None]:
# Your code here

Using the VAE's decoder, we can now reconstruct the original image from the latents.
Try this on your own. Basically, you need to reverse the steps we took to encode an image, so first call the VAE's `decode` method and then apply post-processing using the `postprocess` method of the `image_processor`.

**Note:** Due to model internals, you need to change the data type of the VAE to `float32` before calling its decode method. This can be done using `pipeline.vae.to(dtype=torch.float32)`. Moreover, the `decode` function returns a data structure. You can access the actual sample using `.sample`.

In [None]:
# Your code here

While the original image and its reconstruction look identical, they are not.
Since the latent space is significantly smaller than the image space (to speed up generation), it cannot preserve every detail of an image.
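
To get a feeling for the size difference, the following back-of-the-envelope sketch compares the number of values in an image with the number of values in its latent representation. The concrete numbers (a 512x512 RGB image, 4 latent channels, 8x downsampling per spatial dimension) are assumptions for illustration and are not read from the pipeline; compare them with the shapes you inspected above.

```python
import math

# Assumed values, not read from the pipeline: a 512x512 RGB image,
# 4 latent channels, and 8x downsampling per spatial dimension.
image_shape = (3, 512, 512)
latent_channels, downsample = 4, 8

latent_shape = (
    latent_channels,
    image_shape[1] // downsample,
    image_shape[2] // downsample,
)

# Ratio between the number of image values and the number of latent values.
ratio = math.prod(image_shape) / math.prod(latent_shape)
print(latent_shape, ratio)  # (4, 64, 64) 48.0
```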

To show this, compute the absolute error between original and reconstruction and visualize the average error across all color channels.
The `to_tensor` function might come in handy here.

Have a look at what parts of the image can be reconstructed well, and what parts suffer from larger reconstruction errors.
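
As a hint on the mechanics, here is a minimal sketch of a channel-averaged absolute error map on toy NumPy arrays. The arrays `original` and `reconstruction` are made-up stand-ins in channels-first layout, not the actual images from this notebook; the same operations exist for PyTorch tensors.

```python
import numpy as np

# Two toy "images" in channels-first layout (3 x 2 x 2), values in [0, 1].
original = np.zeros((3, 2, 2))
reconstruction = np.zeros((3, 2, 2))
reconstruction[:, 0, 0] = [0.3, 0.0, 0.6]  # only the top-left pixel differs

# Absolute error per channel, then the mean over the channel axis.
error_map = np.abs(original - reconstruction).mean(axis=0)

print(error_map.shape)  # (2, 2)
```

The result has one value per pixel, so it can be plotted as a single-channel heat map.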

In [None]:
# Your code here

## Detecting AI-Generated Images Using AEROBLADE

The previous observation is the core idea of AEROBLADE.
It exploits the fact that generated images can be reconstructed relatively well.
For each generated image, there is a distinct location in the VAE's latent space, which can be seen as its origin.

In contrast, real images do not have an exact origin.
They are mapped to the closest point in the latent space.
Due to this shift, they exhibit a higher reconstruction error.

Thus, if an image can be reconstructed well using a model's VAE, chances are high that it was generated by this particular model.
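
One way to write this down, as a sketch in our own notation: let $E$ and $D$ denote the VAE's encoder and decoder (symbols introduced here for illustration), and let $d$ be a distance between two images. The reconstruction error of an image $x$ is then

$$
\Delta(x) = d\bigl(x,\, D(E(x))\bigr)
$$

A small $\Delta(x)$ suggests that $x$ lies close to what the decoder can produce, while a large $\Delta(x)$ suggests that it does not.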

The following figure further illustrates this idea.
Note that we can use different distance/error metrics for $d$.
Experiments have shown that [LPIPS](https://arxiv.org/abs/1801.03924) performs particularly well.
<center>
<img src="https://github.com/aisoc-lab/ai-security-summer-school/blob/6df293b9cb46bddda293237af74b592457219c88/tutorial_03_deepfakes/data/aeroblade.png?raw=true" alt="Concept behind AEROBLADE" width="600"/>

*Source: [https://arxiv.org/abs/2401.17879](https://arxiv.org/abs/2401.17879).*
</center>

In `data/real_elephant.jpg` and `data/fake_elephant.png` you will find two images we can use to demonstrate AEROBLADE.
For both images, apply the steps from above to create their reconstructions.
Then, use the LPIPS metric to compute the distance and visualize it.

You should initialize LPIPS using `lpips_fn = lpips.LPIPS(net="vgg", spatial=True)` and call it with `lpips_fn(..., normalize=True)`.

**Hints:**
- You can use `Image.open("path/to/file")` to load images with PIL.
- LPIPS expects inputs to be 4-dimensional (samples x channels x height x width).
- Using the `vmin`/`vmax` arguments of `plot_tensor` can improve visualization, especially to align the color bars for both images.
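
Regarding the input layout, the following sketch shows how a single image stored as height x width x channels can be rearranged into a batched, channels-first array. It uses a toy NumPy array (`hwc` is a made-up stand-in, not one of the elephant images) and assumes the channels-first convention that PyTorch models commonly use.

```python
import numpy as np

# Toy stand-in for an RGB image: height 2, width 3, 3 channels.
hwc = np.arange(2 * 3 * 3, dtype=np.float32).reshape(2, 3, 3)

# Move channels first (C x H x W), then add a leading batch axis.
nchw = hwc.transpose(2, 0, 1)[np.newaxis]

print(nchw.shape)  # (1, 3, 2, 3)
```

With PyTorch tensors, `permute` and `unsqueeze(0)` play the same roles as `transpose` and `np.newaxis` here.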

In [None]:
# Your code here

As you can see, the reconstruction error for the real image is significantly larger than that of the fake image.
By taking the spatial average over all pixels, we obtain a single value for simple threshold-based detection of AI-generated images!
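
The thresholding step could look like the sketch below. The helper names, the toy error maps, and the threshold of 0.1 are all hypothetical; in practice, a threshold would have to be calibrated on known real and generated images.

```python
import numpy as np


def mean_error(error_map: np.ndarray) -> float:
    """Collapse a spatial error map into a single score."""
    return float(np.mean(error_map))


def looks_generated(error_map: np.ndarray, threshold: float) -> bool:
    """A low reconstruction error suggests the image came from this model."""
    return mean_error(error_map) < threshold


# Toy error maps standing in for the LPIPS maps of a fake and a real image.
fake_map = np.full((4, 4), 0.02)
real_map = np.full((4, 4), 0.20)
threshold = 0.1  # hypothetical value, for illustration only

print(looks_generated(fake_map, threshold), looks_generated(real_map, threshold))  # True False
```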

**Congratulations, you made it to the end of the notebook!**