This experiment aims to achieve **high fidelity** and better image quality (resolution, sharpness, smoothness, etc.) for the image used as a background in the Godot game. To do this, I used image-to-image diffusion models that take the original image (in our case, the hand-drawn image) as input and output a slightly or heavily modified version with the desired features.

Below is the code I ran for this purpose. For each run, I passed the corresponding diffusion model's checkpoint ID to `AutoPipelineForImage2Image.from_pretrained`.

(Code from: https://huggingface.co/docs/diffusers/main/en/using-diffusers/img2img)

In [None]:
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import make_image_grid, load_image

In [None]:
# checkpoint under test; swap this ID to try another diffusion model
model_id = "stabilityai/stable-diffusion-xl-refiner-1.0"
pipeline = AutoPipelineForImage2Image.from_pretrained(
    model_id, torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)

# offload models to CPU to reduce memory usage
pipeline.enable_model_cpu_offload()

# load the hand-drawn source image
init_image = load_image("test.jpg")

# earlier prompt variants, kept for reference
#prompt = "Same image as original, detailed, 8k, high resolution, unchanged"
#prompt = "Same image as original, detailed, 8k, high resolution, unchanged, no humans"
prompt = "Same image as original, detailed, 8k, high resolution, unchanged, no humans, vibrant colours"

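# pass prompt and image to the pipeline; a lower strength keeps the output closer
# to the input (I used strength values between 0.2 and 0.3)
image = pipeline(prompt, image=init_image, strength=0.2).images[0]
# show the original (left) and the result (right) side by side
make_image_grid([init_image, image], rows=1, cols=2)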
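
The code above was re-run with a different diffusion model each time. The following is a minimal sketch of automating that loop, under assumptions not in the original: `CANDIDATE_MODELS`, `output_path` and `run_for_model` are hypothetical names, only the SDXL refiner ID comes from this notebook (the Kandinsky ID is an example from the diffusers img2img docs), and each result is saved to an `outputs/` folder so runs do not overwrite each other.

```python
import re
from pathlib import Path

# Illustrative list (assumption): only the SDXL refiner is confirmed by this
# notebook; the Kandinsky ID is an example from the diffusers img2img docs.
CANDIDATE_MODELS = [
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    "kandinsky-community/kandinsky-2-2-decoder",
]


def output_path(model_id: str, strength: float, out_dir: str = "outputs") -> Path:
    """Hypothetical helper: build one file name per (model, strength) pair."""
    # Replace characters that are unsafe in file names, e.g. the "/" in the repo ID.
    safe_id = re.sub(r"[^A-Za-z0-9._-]+", "--", model_id)
    return Path(out_dir) / f"{safe_id}_s{strength:.2f}.png"


def run_for_model(model_id: str, image_path: str, prompt: str, strength: float = 0.2) -> Path:
    """Hypothetical helper: run one img2img pass with one checkpoint and save it."""
    # Heavy imports are deferred so the helpers above work without torch installed.
    import torch
    from diffusers import AutoPipelineForImage2Image
    from diffusers.utils import load_image

    # variant="fp16" is omitted because not every checkpoint publishes that variant.
    pipeline = AutoPipelineForImage2Image.from_pretrained(
        model_id, torch_dtype=torch.float16, use_safetensors=True
    )
    pipeline.enable_model_cpu_offload()

    init_image = load_image(image_path)
    image = pipeline(prompt, image=init_image, strength=strength).images[0]

    path = output_path(model_id, strength)
    path.parent.mkdir(parents=True, exist_ok=True)
    image.save(path)

    # Release the pipeline before the next checkpoint is loaded.
    del pipeline
    torch.cuda.empty_cache()
    return path
```

For example, `[run_for_model(m, "test.jpg", prompt) for m in CANDIDATE_MODELS]` would save one result per checkpoint, named after the model and strength, which makes side-by-side comparison easier.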

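The strength range of 0.2 to 0.3 is easier to interpret in terms of denoising steps. In the diffusers img2img pipelines, `strength` sets how much noise is added to the input image and therefore how many of the `num_inference_steps` are actually run. The function below is a standalone re-implementation of that arithmetic (modelled on the pipelines' timestep-trimming logic, not imported from diffusers); `effective_denoising_steps` is a hypothetical name, and the 50-step value assumes the SDXL img2img default.

```python
def effective_denoising_steps(num_inference_steps: int, strength: float) -> int:
    """Denoising steps an img2img call runs for a given strength.

    Mirrors the trimming arithmetic in diffusers' img2img pipelines: the input
    image is noised up to step `init_timestep`, and only those steps are denoised.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)
    # The schedule is sliced from t_start, so this many steps remain.
    return num_inference_steps - t_start


# Assumed default: 50 inference steps, as in the SDXL img2img pipeline.
for strength in (0.2, 0.25, 0.3):
    print(strength, effective_denoising_steps(50, strength))
```

With 50 steps, strengths 0.2, 0.25 and 0.3 run 10, 12 and 15 steps respectively, so most of the hand-drawn structure survives. At `strength=1.0`, all 50 steps run and the input image is largely ignored.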