# Stable Diffusion XL with Neuronx: LoRA adapters

`🤗 Optimum` extends `🤗 Diffusers` to support inference on the second generation of Neuron devices (powering Trainium and Inferentia 2). It aims to bring the ease of use of Diffusers to Neuron.

Before you get started, make sure you have run the scripts `install-drivers.sh` and `install-pytorch-neuron.sh`, and that the notebook is running on the torch-neuronx kernel.
For more details, see [how to configure your inf2 / trn1 instance](https://huggingface.co/docs/optimum-neuron/installation).

Let's start by installing `optimum` for easy inference and `peft` for fine-tuning:

In [None]:
%pip install "optimum[neuronx, diffusers]"
%pip install -U peft

## Compilation

To deploy SDXL models, we first compile them. The following pipeline components can be exported to speed up inference:

* Text encoder
* Second text encoder
* U-Net (three times larger than the UNet in the Stable Diffusion pipeline)
* VAE encoder
* VAE decoder

You can compile and export a Stable Diffusion XL checkpoint either via the CLI or with the `NeuronStableDiffusionXLPipeline` class.
In this tutorial, we will export [`stabilityai/stable-diffusion-xl-base-1.0`](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) with the Python API.

In [None]:
from optimum.neuron import NeuronStableDiffusionXLPipeline

model_id = "stabilityai/stable-diffusion-xl-base-1.0"
adapter_id = "lora"  # local path or Hub repo id of the LoRA adapter
input_shapes = {"batch_size": 1, "height": 1024, "width": 1024, "num_images_per_prompt": 1}

# Compile
pipe = NeuronStableDiffusionXLPipeline.from_pretrained(
    model_id,
    export=True,
    inline_weights_to_neff=True,  # Caveat: performance drops if the NEFF and weights are separated; a future Neuron SDK release is expected to improve this.
    lora_model_ids=adapter_id,
    lora_weight_names="pytorch_lora_weights.safetensors",
    lora_adapter_names="sttirum",
    **input_shapes,
)

# Save locally or upload to the Hugging Face Hub
save_directory = "sd_neuron_xl/"

pipe.save_pretrained(save_directory) 


We recommend `inf2.8xlarge` or larger for compilation. You can also compile the models on a CPU-only instance *(needs ~92GB memory)* using the CLI with `--disable-validation`, which skips the validation of inference on Neuron devices.
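
For reference, the CLI route mentioned above can be sketched by assembling an `optimum-cli export neuron` command in Python. This is a minimal sketch rather than the official recipe: `build_export_command` is a hypothetical helper, the shape flags mirror the `input_shapes` dict used with the API, and the LoRA flags (`--lora_model_ids`, `--lora_weight_names`, `--lora_adapter_names`) are assumed to mirror the Python keyword arguments, so check `optimum-cli export neuron --help` for your installed version before running it.

```python
import shlex


def build_export_command(model_id, output_dir, input_shapes, lora=None, disable_validation=False):
    """Assemble an `optimum-cli export neuron` command as an argument list (hypothetical helper)."""
    cmd = ["optimum-cli", "export", "neuron", "--model", model_id]
    # Static input shapes, using the same keys as the `input_shapes` dict of the Python API.
    for name, value in input_shapes.items():
        cmd += [f"--{name}", str(value)]
    # Assumption: the LoRA flags are named after the Python keyword arguments.
    for name, value in (lora or {}).items():
        cmd += [f"--{name}", str(value)]
    if disable_validation:
        # Skip validating inference on Neuron devices, e.g. on a CPU-only instance.
        cmd.append("--disable-validation")
    cmd.append(output_dir)
    return cmd


command = build_export_command(
    "stabilityai/stable-diffusion-xl-base-1.0",
    "sd_neuron_xl/",
    {"batch_size": 1, "height": 1024, "width": 1024, "num_images_per_prompt": 1},
    lora={
        "lora_model_ids": "lora",
        "lora_weight_names": "pytorch_lora_weights.safetensors",
        "lora_adapter_names": "sttirum",
    },
    disable_validation=True,
)
# Print the command for review before running it.
print(shlex.join(command))
```

Once the printed command looks right, it could be launched with `subprocess.run(command, check=True)`.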

In the following section, we will run the pre-compiled model on Neuron devices. To reduce costs, you can run inference on an `inf2.xlarge` instance.

## Text-to-image Inference

If you have pre-compiled Stable Diffusion XL models, you can load them directly to skip the compilation: 

In [None]:
from optimum.neuron import NeuronStableDiffusionXLPipeline

stable_diffusion_xl = NeuronStableDiffusionXLPipeline.from_pretrained("sd_neuron_xl")  # Pass a local path or your repo id on the Hugging Face Hub.
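
The choice between loading and compiling can be made explicit with a small check. The sketch below assumes that a non-empty local directory means an earlier `save_pretrained` call completed; `has_compiled_pipeline` is a hypothetical helper, not part of Optimum, and it does not verify that the artifacts are valid.

```python
from pathlib import Path


def has_compiled_pipeline(directory):
    """Return True if `directory` exists and is not empty (a heuristic, not a validity check)."""
    path = Path(directory)
    return path.is_dir() and any(path.iterdir())


save_directory = "sd_neuron_xl"
if has_compiled_pipeline(save_directory):
    print(f"Found artifacts in {save_directory}: load them with from_pretrained.")
else:
    print(f"Nothing found in {save_directory}: compile with export=True first.")
```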

Run the pipeline with a prompt that contains the unique identifier used when the model was fine-tuned.
Edit the prompt below to generate multiple avatars.

In [None]:
import torch

# Run pipeline
prompt = """
photo of <<TOK>> pencil sketch, young and beautiful, face front, centered
"""

negative_prompt = """
ugly, tiling, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, extra limbs, disfigured, deformed, body out of frame, blurry, bad anatomy, blurred, 
watermark, grainy, signature, cut off, draft, amateur, multiple, gross, weird, uneven, furnishing, decorating, decoration, furniture, text, poor, low, basic, worst, juvenile, 
unprofessional, failure, crayon, oil, label, thousand hands
"""

seed = 491057365
generator = [torch.Generator().manual_seed(seed)]

image = stable_diffusion_xl(
    prompt,
    num_inference_steps=50,
    guidance_scale=7,
    negative_prompt=negative_prompt,
    generator=generator,
).images[0]

image
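
To generate several avatars, one option is to build one prompt per style and pair each with its own seed. This is a minimal sketch: `build_avatar_prompts` and the style list are illustrative assumptions, and `<<TOK>>` stands for whatever identifier your adapter was trained with.

```python
def build_avatar_prompts(identifier, styles, base_seed):
    """Return (prompt, seed) pairs, one per style, each with a distinct reproducible seed."""
    pairs = []
    for offset, style in enumerate(styles):
        prompt = f"photo of {identifier} {style}, young and beautiful, face front, centered"
        pairs.append((prompt, base_seed + offset))
    return pairs


# Illustrative styles; replace them with the looks you want to try.
styles = ["pencil sketch", "watercolor painting", "charcoal drawing"]
for prompt, seed in build_avatar_prompts("<<TOK>>", styles, base_seed=491057365):
    print(seed, prompt)
```

Each pair can then be passed to the pipeline as in the cell above, with `generator=[torch.Generator().manual_seed(seed)]`, keeping `.images[0]` from every call.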

## Clean-up

After you are done, stop the EC2 instance to avoid further costs. You can download the compiled model from `sd_neuron_xl/` and load it for inference later.