<a href="https://colab.research.google.com/github/HLCV-23/Inpainting-Detection/blob/Christian/project_prototype.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Related Work
## Inpainting
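- https://arxiv.org/pdf/2102.12092.pdf  
  DALL-E (Zero-Shot Text-to-Image Generation)
- https://arxiv.org/pdf/2112.10752.pdf  
  Stable Diffusion (High-Resolution Image Synthesis with Latent Diffusion Models)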

## GAN-generated images detection
- https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8397040  
  Detection of GAN-generated images; does not cover inpainting.
- https://arxiv.org/pdf/2202.07145.pdf  
  Review of several GAN detection algorithms.

## Diffusion-generated images detection
- https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10095167  
  Studies how well GAN-detection models perform on images generated by diffusion models.

## Inpainting Detection
- https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9410590  
- https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9506778&tag=1  
  Similar to our detection approach, but apparently with random masks (still to be confirmed). It compares detection performance on deep-learning-based and traditional inpainting techniques, which might be of interest to us for both training and evaluation. We could also use the network architecture proposed here. It also points to an additional open-source dataset we might want to use.
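
If we want to compare against a random-mask baseline like the one this paper seems to use, a minimal sketch could look as follows. The sampling scheme (a single axis-aligned rectangle), the function name `random_rect_mask` and the `min_frac`/`max_frac` parameters are assumptions for illustration; the paper's actual mask generation has not been checked.

```python
import numpy as np

def random_rect_mask(height, width, rng, min_frac=0.1, max_frac=0.5):
    """Return a boolean (height, width) mask with one random rectangle set to True.

    Hypothetical helper: side lengths are drawn as a fraction of the image sides.
    """
    rect_h = max(1, int(height * rng.uniform(min_frac, max_frac)))
    rect_w = max(1, int(width * rng.uniform(min_frac, max_frac)))
    # Sample the top-left corner so the rectangle stays inside the image.
    top = rng.integers(0, height - rect_h + 1)
    left = rng.integers(0, width - rect_w + 1)
    mask = np.zeros((height, width), dtype=bool)
    mask[top:top + rect_h, left:left + rect_w] = True
    return mask

rng = np.random.default_rng(0)
mask = random_rect_mask(512, 512, rng)
print(mask.shape, float(mask.mean()))  # mask size and fraction of masked pixels
```

Such a mask can be converted to an image with `Image.fromarray((mask * 255).astype(np.uint8))`, in the same way as the segmentation masks used later in this notebook.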

## Image/Network Watermarking
- https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=650120  
  Not directly related, but in watermarking it is common to test the robustness of image watermarking techniques against common image preprocessing (rescaling, compression, etc.). We might want to do that in our experiments too.
- https://www.usenix.org/system/files/conference/usenixsecurity18/sec18-adi.pdf  
- https://tianweiz07.github.io/Papers/21-aamas.pdf  
- https://arxiv.org/abs/2305.20030  
  Fourier-transform-based technique to robustly watermark diffusion model outputs.
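
The robustness check suggested in the first item of this list could be sketched as below. This is only an illustration in plain NumPy: `rescale_nearest` and `quantize` are hypothetical helpers, and the quantization step is a crude stand-in for real lossy compression (an actual experiment would more likely re-encode the images as JPEG). The random array stands in for an inpainted image.

```python
import numpy as np

def rescale_nearest(image, factor):
    """Downscale by an integer factor, then upscale back by repeating pixels."""
    small = image[::factor, ::factor]
    restored = np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)
    # Crop in case the original size is not a multiple of the factor.
    return restored[:image.shape[0], :image.shape[1]]

def quantize(image, levels):
    """Reduce each channel to at most `levels` values (stand-in for compression)."""
    step = 256 // levels
    return (image // step) * step

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(65, 65, 3), dtype=np.uint8)  # placeholder image

perturbed = {
    "rescale_x4": rescale_nearest(image, 4),
    "quantize_8": quantize(image, 8),
}
for name, img in perturbed.items():
    # Mean absolute pixel change introduced by each perturbation.
    diff = np.abs(img.astype(np.int16) - image.astype(np.int16)).mean()
    print(f"{name}: shape={img.shape}, mean abs diff={diff:.2f}")
```

A detector would then be evaluated once on the original images and once on each perturbed variant, so that any drop in accuracy can be attributed to a specific preprocessing step.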
  

In [None]:
# Install dependencies before importing them
!pip install transformers diffusers accelerate xformers

import gc
import os
import time

import matplotlib.pyplot as plt
import numpy as np
import requests
import torch
import transformers
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image, ImageFilter
from torch import cuda, bfloat16
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    SamModel,
    SamProcessor,
    StoppingCriteria,
    StoppingCriteriaList,
    pipeline,
)

In [None]:
device = torch.device("cuda:0")
# Segment Anything (SAM): proposes candidate regions (masks) for inpainting
generator = pipeline("mask-generation", model="facebook/sam-vit-huge", device=device)
# Image captioning model: describes the input image in one sentence
image_to_text = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")
# MagicPrompt: expands the caption into a Stable Diffusion style prompt
tokenizer = AutoTokenizer.from_pretrained("Gustavosta/MagicPrompt-Stable-Diffusion")
model = AutoModelForCausalLM.from_pretrained("Gustavosta/MagicPrompt-Stable-Diffusion")
gpt2_pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, device=device)

In [None]:
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16,
    safety_checker=None,
    # Had to turn it off due to error with accelerate
    low_cpu_mem_usage = False
)
pipe.to("cuda:0")

### Generate Sample Images

Generate multiple versions of an image, each with a different region inpainted, and show the results as a matplotlib figure. Additionally, create a second figure containing the masks of the regions that were inpainted.


In [None]:
import numpy as np
import requests
from PIL import Image
from matplotlib import pyplot as plt

# Fetch the image once, as it doesn't change
img_url = "https://datasets-server.huggingface.co/assets/biglam/dating-historical-color-images/--/default/train/4/image/image.jpg"
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert("RGB").resize((512, 512))

# `masks` and `prompt` were not defined in this cell. As an assumption, they are
# derived here the same way as in generate_datapoint_single_mask further down.
masks = generator(raw_image)["masks"]
caption = image_to_text(raw_image)[0]["generated_text"]
prompt = gpt2_pipe(caption)[0]["generated_text"].split(",")[0]

# Calculate areas of all masks and store them with their indices
mask_areas = [(mask.sum(), idx) for idx, mask in enumerate(masks)]
# Sort mask_areas by area (in descending order) and take the top 16
largest_masks = sorted(mask_areas, key=lambda x: x[0], reverse=True)[:16]

fig, axs = plt.subplots(4, 4, figsize=(20, 20))
for ax in axs.flat:
    ax.axis('off')  # also hides unused cells if fewer than 16 masks exist

for i, (_, mask_id) in enumerate(largest_masks):
    # The pipeline expects the mask as an image; white marks the region to repaint
    mask_img = Image.fromarray((masks[mask_id] * 255).astype(np.uint8).squeeze())
    mask_img = mask_img.convert("RGB")

    image = pipe(prompt=prompt, image=raw_image, mask_image=mask_img, height=512, width=512).images[0]
    print(f"Finished generating image {i + 1}/{len(largest_masks)}")
    axs[i // 4, i % 4].imshow(image)

plt.savefig("mask_images_grid.png")
plt.show()

# Create a separate 4x4 grid for the masks
fig, axs = plt.subplots(4, 4, figsize=(20, 20))
for ax in axs.flat:
    ax.axis('off')

for i, (_, mask_id) in enumerate(largest_masks):
    mask_img = Image.fromarray((masks[mask_id] * 255).astype(np.uint8).squeeze())
    axs[i // 4, i % 4].imshow(mask_img, cmap='gray')  # cmap='gray' for grayscale

plt.savefig("mask_grid.png")
plt.show()

# Dataset Generation

In [None]:
# A Hugging Face token is required to load gated datasets (for example
# https://huggingface.co/datasets/imagenet-1k/viewer/default/train).
# Create a token in your account settings and paste it at the login prompt.
!pip install datasets
!huggingface-cli login

Collecting datasets
  Downloading datasets-2.13.1-py3-none-any.whl (486 kB)
Collecting dill<0.3.7,>=0.3.0 (from datasets)
  Downloading dill-0.3.6-py3-none-any.whl (110 kB)
Collecting xxhash (from datasets)
  Downloading xxhash-3.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (212 kB)
Collecting multiprocess (from datasets)
  Downloading multiprocess-0.70.14-py310-none-any.whl (134 kB)

In [None]:
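# Stream the scene_parse_150 (ADE20K scene parsing) dataset.
# Only load_dataset is imported: importing datasets.Image as well would
# shadow PIL.Image, which the generation code below relies on.
from datasets import load_dataset
import pandas as pd
import os

dataset = load_dataset("scene_parse_150", use_auth_token=True, streaming=True, split="train")
dataset = iter(dataset.shuffle())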




In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
# Create the output folders on Google Drive; -p makes the cell safe to re-run
!mkdir -p drive/MyDrive/Inpainted_Scenes/Images
!mkdir -p drive/MyDrive/Inpainted_Scenes/Masks
!touch drive/MyDrive/Inpainted_Scenes/Prompts.csv
!pwd

/content


In [None]:
def generate_datapoint_single_mask(data, id, csv_data, path):
    """Inpaint one randomly chosen SAM region of an image; save image, mask and prompt."""
    # Convert to RGB in case the dataset contains grayscale images
    data = data["image"].convert("RGB").resize((512, 512))
    image_area = 512 * 512
    masks = generator(data)["masks"]

    # Only masks with an area not too large or too small are eligible candidates for inpainting
    valid_masks = [mask for mask in masks if image_area * 0.025 <= mask.sum() <= image_area * 0.5]
    if not valid_masks:
        # np.random.choice raises a ValueError on an empty range, so skip this image
        print(f"No valid mask found for image {id}, skipping")
        return

    choice = np.random.choice(len(valid_masks))
    mask = valid_masks[choice]

    mask_img = Image.fromarray((mask * 255).astype(np.uint8).squeeze())
    mask_img = mask_img.convert("RGB")

    # Generate prompt for inpainting model: caption the image, expand the caption
    # with MagicPrompt, and keep only the text before the first comma
    caption = image_to_text(data)[0]["generated_text"]
    prompts = gpt2_pipe(caption)[0]["generated_text"]
    prompt = prompts.split(",")[0]

    # Inpaint the masked region
    inpainted_img = pipe(prompt=prompt, image=data, mask_image=mask_img, height=512, width=512).images[0]

    # Get the edges of the mask and widen them into a band with a 7x7 max filter
    edges = mask_img.filter(ImageFilter.FIND_EDGES).filter(ImageFilter.MaxFilter(7)).convert("1")

    # Blur the inpainted image
    inpainted_img_blur = inpainted_img.filter(ImageFilter.GaussianBlur(radius=1))

    # Use the blurred pixels only along the border of the inpainted region to smooth the transition
    inpainted_img_smooth = Image.composite(inpainted_img_blur, inpainted_img, edges).convert("RGB")

    inpainted_img_smooth.save(path + f"/Images/{id}.png")
    mask_img.save(path + f"/Masks/{id}.png")
    csv_data.at[id, "Prompt"] = prompt
    print(f"Saved processed image {id}.jpeg")
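
The area criterion used above (a mask must cover between 2.5% and 50% of the image) can be tried out in isolation on synthetic masks. The following is a small sketch; `filter_masks_by_area` and `make_square_mask` are hypothetical helper names that are not used elsewhere in this notebook.

```python
import numpy as np

def filter_masks_by_area(masks, min_frac=0.025, max_frac=0.5):
    """Keep masks whose foreground covers between min_frac and max_frac of the image."""
    kept = []
    for mask in masks:
        frac = mask.sum() / mask.size
        if min_frac <= frac <= max_frac:
            kept.append(mask)
    return kept

def make_square_mask(size, side):
    """Synthetic test mask: a side x side square in the top-left corner."""
    mask = np.zeros((size, size), dtype=bool)
    mask[:side, :side] = True
    return mask

# Squares covering roughly 0.1%, 6%, 34% and 88% of a 512x512 image
masks = [make_square_mask(512, s) for s in (16, 128, 300, 480)]
valid = filter_masks_by_area(masks)
print([int(m.sum()) for m in valid])  # areas of the masks that pass the filter
```

Only the two middle squares pass. The returned list can be empty when every mask is too small or too large, so the caller has to handle that case before sampling a mask from it.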

In [None]:
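path = "drive/MyDrive/Inpainted_Scenes"
csv_path = path + "/Prompts.csv"

try:
    csv_data = pd.read_csv(csv_path, index_col="idx")
except (FileNotFoundError, pd.errors.EmptyDataError):
    # The file is missing or was only created with `touch`: start with an empty table
    csv_data = pd.DataFrame(columns=["Prompt"])

for id in range(0, 100):
    generate_datapoint_single_mask(next(dataset), id, csv_data, path)

# Write the image index as the "idx" column so the file can be read back with index_col="idx"
csv_data.to_csv(csv_path, index_label="idx")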

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

Saved processed image 0.jpeg
Saved processed image 1.jpeg
Saved processed image 2.jpeg
Saved processed image 3.jpeg
Saved processed image 4.jpeg
Saved processed image 5.jpeg
Saved processed image 6.jpeg
Saved processed image 7.jpeg
Saved processed image 8.jpeg
Saved processed image 9.jpeg
Saved processed image 10.jpeg
Saved processed image 11.jpeg
Saved processed image 12.jpeg
Saved processed image 13.jpeg
Saved processed image 14.jpeg
Saved processed image 15.jpeg
Saved processed image 16.jpeg
Saved processed image 17.jpeg
Saved processed image 18.jpeg
Saved processed image 19.jpeg
Saved processed image 20.jpeg
Saved processed image 21.jpeg
Saved processed image 22.jpeg