# Pixel Art Generation

This notebook shows how to generate a pixel art image from a user's prompt. The implementation uses the Hugging Face *diffusers* library and its pipeline interface, and supports hardware acceleration on a MacBook (MPS) as well as on CUDA devices.

Model used: **nerijs/pixel-art-xl**, based on **stabilityai/stable-diffusion-xl-base-1.0**.

In [None]:
# Run this cell to install all required libraries. It assumes that no Python libraries are installed in the environment yet.
%pip install -r requirements.txt

Let's start by importing the necessary libraries. We use `DiffusionPipeline` from the `diffusers` library and `torch` to check which hardware is available.

In [None]:
import torch

from diffusers import DiffusionPipeline

Load the base `DiffusionPipeline` with `torch_dtype=torch.float16`, which loads the weights in half precision to reduce memory use.

In [None]:
pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16)

Select the device based on CUDA and MPS availability: use CUDA if it is available, otherwise MPS, and fall back to the CPU if neither is present.

In [None]:
device = "cpu"

if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"

print(f"Using device: {device}")
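
The selection order above can be captured in a small helper that also suggests a dtype for the chosen device. This is only a sketch: the helper name `choose_device_and_dtype` is hypothetical, and falling back to `float32` on the CPU is an assumption (half precision is often slow or unsupported for some CPU operations), not something this notebook does.

```python
def choose_device_and_dtype(cuda_available: bool, mps_available: bool) -> tuple[str, str]:
    """Return (device, dtype_name) using the same priority as the cell above."""
    if cuda_available:
        return "cuda", "float16"
    if mps_available:
        return "mps", "float16"
    # Assumption: use full precision when no accelerator is available.
    return "cpu", "float32"


# Plain booleans are used here so the sketch runs without torch installed.
# In the notebook they would come from torch.cuda.is_available() and
# torch.backends.mps.is_available().
device_name, dtype_name = choose_device_and_dtype(cuda_available=False, mps_available=True)
print(device_name, dtype_name)  # mps float16
```

With `torch` imported, the dtype name could be resolved with `getattr(torch, dtype_name)` and passed as `torch_dtype` when loading the pipeline.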

Move the pipeline to the chosen device. 

In [None]:
pipe = pipe.to(device)

Load the LoRA weights. They adapt the base model to behave in a specific way, in this example to generate pixel art.

Diffusers automatically loads these weights onto the device where the pipeline resides.

In [None]:
pipe.load_lora_weights("nerijs/pixel-art-xl")

Specify the prompt. This model doesn't need any specific phrases to generate pixel art, but you can add them. You can also specify a negative prompt to steer the image away from unwanted features.

In [None]:
prompt = "pixel art, cyberpunk hacker women working intensely on a computer in a dimly lit room at night, neons, simple, flat colors"

Generate the image using the pipeline. You can control generation by setting the number of steps with `num_inference_steps` or the LoRA strength with `cross_attention_kwargs`. This is also where you can pass your negative prompt.

In [None]:
image = pipe(
    prompt,
    # negative_prompt=negative_prompt,  # define negative_prompt first and pass it by keyword
    num_inference_steps=50, # 50 is default number of steps, adjust as desired
    cross_attention_kwargs={"scale": 1} # Default scale is 1, adjust LoRA scale as needed (e.g., 0.7-1.0 is common)
).images[0]

# image = pipe(prompt).images[0]
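
As a sketch of how these options fit together, the helper below collects them into a dictionary of keyword arguments for the pipeline call. The helper name `build_generation_kwargs` and the example negative prompt are hypothetical; the keys `prompt`, `negative_prompt`, `num_inference_steps`, and `cross_attention_kwargs` are the parameters discussed above.

```python
def build_generation_kwargs(prompt, negative_prompt=None, steps=50, lora_scale=1.0):
    """Collect the generation options into keyword arguments for the pipeline call."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    kwargs = {
        "prompt": prompt,
        "num_inference_steps": steps,
        # The LoRA strength is passed through cross_attention_kwargs.
        "cross_attention_kwargs": {"scale": lora_scale},
    }
    # Only include the negative prompt when one was provided.
    if negative_prompt:
        kwargs["negative_prompt"] = negative_prompt
    return kwargs


generation_kwargs = build_generation_kwargs(
    "pixel art, a cute corgi in a park, flat colors",
    negative_prompt="blurry, photo, 3d render",  # hypothetical example values
    steps=30,
    lora_scale=0.8,
)
print(generation_kwargs)
```

In the notebook, the result could then be used as `image = pipe(**generation_kwargs).images[0]`.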

In [None]:
image

## Saving every step of image generation

Using the same pipeline, we can hook into its callback mechanism. We will implement logic that saves an image for every step of a new generation to a given directory.

To do that, we follow the Hugging Face documentation. The step images are computed directly from the latents with a cheap linear approximation instead of a full decode, so they are smaller than the final image and fast to produce.

In [None]:
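import os

from PIL import Image

# Create the output directory up front; image.save() does not create it.
os.makedirs("corgi", exist_ok=True)

def latents_to_rgb(latents):
    # Linear approximation that maps the 4 latent channels to RGB.
    # Each row holds the weights for one output channel (R, G, B).
    weights = (
        (60, -60, 25, -70),
        (60,  -5, 15, -50),
        (60,  10, -5, -35)
    )

    weights_tensor = torch.t(torch.tensor(weights, dtype=latents.dtype).to(latents.device))
    biases_tensor = torch.tensor((150, 140, 130), dtype=latents.dtype).to(latents.device)
    # Project every latent pixel to RGB and add the per-channel bias.
    rgb_tensor = torch.einsum("...lxy,lr -> ...rxy", latents, weights_tensor) + biases_tensor.unsqueeze(-1).unsqueeze(-1)
    # Clamp to the 8-bit range, take the first image in the batch, and move it to the CPU.
    image_array = rgb_tensor.clamp(0, 255)[0].byte().cpu().numpy()
    # Reorder from (channels, height, width) to (height, width, channels) for PIL.
    image_array = image_array.transpose(1, 2, 0)

    return Image.fromarray(image_array)

def decode_tensors(pipe, step, timestep, callback_kwargs):
    # Called by the pipeline at the end of every denoising step.
    latents = callback_kwargs["latents"]
    image = latents_to_rgb(latents)
    # Zero-pad the step number so a file-name glob returns the frames in order.
    image.save(f"./corgi/{step:03d}.png")
    return callback_kwargs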


image = pipe(
    prompt="pixel art, a cute corgi in a park, flat colors.",
    callback_on_step_end=decode_tensors,
    callback_on_step_end_tensor_inputs=["latents"],
).images[0]
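
To make the `einsum` in `latents_to_rgb` more concrete, here is a minimal pure-Python sketch of the same projection for a single latent pixel. It reuses the weights and biases from the cell above; the function name `latent_pixel_to_rgb` is hypothetical and exists only for illustration.

```python
# Same weights and biases as in latents_to_rgb: one row per output channel (R, G, B).
WEIGHTS = (
    (60, -60, 25, -70),
    (60,  -5, 15, -50),
    (60,  10, -5, -35),
)
BIASES = (150, 140, 130)


def latent_pixel_to_rgb(latent):
    """Map one 4-channel latent pixel to a clamped 8-bit RGB triple."""
    rgb = []
    for channel_weights, bias in zip(WEIGHTS, BIASES):
        # Weighted sum over the four latent channels, plus the channel bias.
        value = sum(w * x for w, x in zip(channel_weights, latent)) + bias
        # Clamp to [0, 255] and truncate to an integer.
        rgb.append(int(min(255, max(0, value))))
    return tuple(rgb)


print(latent_pixel_to_rgb((0.0, 0.0, 0.0, 0.0)))  # (150, 140, 130)
print(latent_pixel_to_rgb((1.0, 0.0, 0.0, 0.0)))  # (210, 200, 190)
print(latent_pixel_to_rgb((2.0, 0.0, 0.0, 0.0)))  # (255, 255, 250)
```

An all-zero latent maps to the bias color, and large values saturate at 255, which is the effect of the `clamp(0, 255)` call in the tensor version.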

In [None]:
image

Generate a GIF showing the whole process, from random noise to the final image, using the saved step images.

In [None]:
!ffmpeg -framerate 30 -pattern_type glob -i 'corgi/*.png' -r 15 out3.gif
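
File-name globs order names lexicographically, so an unpadded `10.png` sorts before `2.png`. As an alternative to ffmpeg, the sketch below sorts the frames by their numeric step and writes the GIF with Pillow. The helper names `sorted_frames` and `frames_to_gif` and the 66 ms frame duration are assumptions for illustration, not part of the original notebook.

```python
import tempfile
from pathlib import Path


def sorted_frames(directory):
    """Return the PNG paths in a directory, ordered by the step number in the file name."""
    return sorted(Path(directory).glob("*.png"), key=lambda p: int(p.stem))


def frames_to_gif(directory, output_path, frame_ms=66):
    """Write the ordered frames to an animated GIF using Pillow."""
    from PIL import Image  # imported lazily so sorted_frames works without Pillow

    paths = sorted_frames(directory)
    if not paths:
        raise ValueError(f"no PNG frames found in {directory}")
    frames = [Image.open(p) for p in paths]
    # save_all with append_images writes every frame; loop=0 repeats forever.
    frames[0].save(
        output_path,
        save_all=True,
        append_images=frames[1:],
        duration=frame_ms,
        loop=0,
    )


# Demonstrate the ordering with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for step in (0, 1, 2, 10, 11):
        (Path(tmp) / f"{step}.png").touch()
    print([p.name for p in sorted_frames(tmp)])
    # ['0.png', '1.png', '2.png', '10.png', '11.png']
```

In the notebook, `frames_to_gif("corgi", "out3.gif")` would build the animation from the saved step images.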